Abstract
Background:
Systemic lupus erythematosus (SLE) is a chronic autoimmune condition with complex causes involving genetic and environmental factors. While genome-wide association studies (GWASs) have identified genetic loci associated with SLE, the functional genomic elements responsible for disease development remain largely unknown. Mendelian Randomization (MR) is an instrumental variable approach to causal inference based on data from observational studies, where genetic variants are employed as instrumental variables (IVs).
Methods:
This study utilized a two-step strategy to identify causal genes for SLE. In the first step, the classical MR method was employed, assuming the absence of horizontal pleiotropy, to estimate the causal effect of gene expression on SLE. In the second step, advanced probabilistic MR methods (PMR-Egger, MRAID, and MR-MtRobin) were applied to the genes identified in the first step, considering horizontal pleiotropy, to filter out false positives. PMR-Egger and MRAID analyses utilized whole blood expression quantitative trait loci (eQTL) and SLE GWAS summary data, while MR-MtRobin analysis used an independent eQTL dataset from multiple immune cell types along with the same SLE GWAS data.
Results:
The initial MR analysis identified 142 genes, including 43 outside of chromosome 6. Subsequently, applying the advanced MR methods reduced the number of genes with significant causal effects on SLE to 66. PMR-Egger, MRAID, and MR-MtRobin identified 13, 7, and 16 non-chromosome 6 genes with significant causal effects, respectively. All three methods identified expression of the PHRF1 gene as causal for SLE. A comprehensive literature review was conducted to enhance understanding of the functional roles and mechanisms of the identified genes in SLE development.
Conclusions:
The three MR methods identified overlapping sets of genes with causal effects on SLE, demonstrating consistent results. However, each method also uncovered unique genes owing to different modelling assumptions and technical factors, highlighting the complementary nature of the approaches. Importantly, MRAID yielded a reduced percentage of causal genes from the major histocompatibility complex (MHC) region on chromosome 6, indicating its potential for minimizing false-positive findings. This study contributes to unraveling the mechanisms underlying SLE by employing advanced probabilistic MR methods to identify causal genes, thereby enhancing our understanding of SLE pathogenesis.
Keywords: systemic lupus erythematosus (SLE), Mendelian Randomization (MR), GWAS, eQTL, gene expression, genetics, causal inference, PHRF1
INTRODUCTION
Genome-wide association studies (GWASs) have identified thousands of genetic loci associated with common diseases and disease-related traits (www.ebi.ac.uk/gwas/). However, the functional genomic elements which exert causal effects on the phenotypes remain largely unknown. Genetic variants can causally affect a disease phenotype by altering gene product structure or quantitative levels of gene products. For Systemic Lupus Erythematosus (SLE), for example, nearly 200 apparently independent loci have been identified (Harley and Sawalha, 2022). Thousands of changes in gene expression are associated with the variants at these loci. Many of these relationships define mechanisms of gene regulation, but do not help determine whether they are related in any way to disease causality.
When the underlying assumptions are met, newer analytical methods of Mendelian Randomization (MR) offer the possibility of identifying causal relations between gene expression and disease. Herein we have applied multiple MR methods (Gleason et al., 2021; Yuan et al., 2022, 2020; Zhu et al., 2016) using genetic data from SLE (Bentham et al., 2015) and multiple sources of expression quantitative trait loci (eQTL) data (Lappalainen et al., 2013; Schmiedel et al., 2018; Võsa et al., 2021).
In this work, therefore, we are concerned only with the modeling of a causal effect for levels of gene products, as indirectly inferred from mRNA gene expression data. Other models of causation, such as alleles leading to alternate protein sequence, are not evaluated. The genetic variants modulate gene expression levels, which in turn exert causal effects on the disease phenotype. Since the genetic variants are the underlying reason for the heritability of the disease phenotype, we naturally initiate the investigation with the potentially causal variants. However, due to linkage disequilibrium (LD), the causal potential for disease associated GWAS genetic variants is difficult to interpret. Thousands of variants are potentially causal for disease (candidate causal variants). Despite these problems, many statistical methods have been developed for causal variant discovery (Schaid et al., 2018). However, due to the large number of genetic variants in the genome, these methods often have low statistical power to identify causal variants. These challenges have thus motivated the development of methods to prioritize candidate causal genes at GWAS loci; the resulting methods are potentially more statistically powerful since there is a smaller set of candidate genes, rather than considering millions of genetic variants in the genome.
Transcriptome-wide association studies (TWAS) leverage expression reference panels (eQTL data) to identify genes whose expression is associated with a disease phenotype. However, TWAS can only identify gene-disease associations, and association does not imply causation. Mendelian Randomization (MR) is an instrumental variable approach to causal inference based on data from observational studies, where genetic variants serve as instrumental variables (IVs). The goal of this method is to estimate the causal effect (variable α in Figure 1A) of exposure X on the outcome Y, in the presence of unmeasured confounders C of the X-Y association. If the three IV assumptions shown in Figure 1B hold true, the classical MR method provides an unbiased estimate of α (see Materials and Methods for details). However, if the IV assumptions are violated by horizontal pleiotropy, which is widespread given the complexity of the genetics of living species (Verbanck et al., 2018), then the estimate of the causal effect size obtained with the classical MR method is biased. Horizontal pleiotropy occurs when the genetic variant has an effect on the outcome/disease outside of its effect on the “Exposure” (see Figure 1) in MR. There are two types of horizontal pleiotropy: uncorrelated pleiotropy, where the effects of genetic variants (G) on Y are uncorrelated with the effects of G on X, and correlated pleiotropy, where the effects of G on Y are correlated with the effects of G on X through confounders (C) (Figure 1). Newer MR methods accommodate both types of pleiotropy (Morrison et al., 2020; Yuan et al., 2022, 2020). Additional advances even allow for analysis at a tissue-specific level (Materials and Methods) (Gleason et al., 2021).
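The effect of horizontal pleiotropy on the classical ratio estimate can be illustrated with a small simulation. The sketch below is not part of the study's analysis; it assumes a single SNP instrument, linear effects, and hypothetical parameter values (`alpha`, `beta`, `gamma`, allele frequency 0.3) chosen only for illustration. Under these assumptions the ratio estimate is expected to approach α when the direct G-on-Y effect `gamma` is zero, and α + γ/β otherwise.

```python
import random


def ratio_estimate(alpha, beta, gamma, n=50_000, seed=0):
    """Simulate G, confounder C, exposure X, outcome Y; return cov(G,Y)/cov(G,X).

    alpha: causal effect of X on Y (the quantity MR tries to recover)
    beta:  effect of the SNP G on X
    gamma: direct effect of G on Y (uncorrelated horizontal pleiotropy);
           gamma = 0 means the IV assumptions hold in this toy model.
    All values are hypothetical and only for illustration.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)  # genotype 0/1/2
        c = rng.gauss(0.0, 1.0)                                  # unmeasured confounder
        xi = beta * gi + c + rng.gauss(0.0, 1.0)
        yi = alpha * xi + gamma * gi + c + rng.gauss(0.0, 1.0)
        g.append(gi)
        x.append(xi)
        y.append(yi)

    def cov(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

    # Ratio of the G-on-Y slope to the G-on-X slope (var(G) cancels).
    return cov(g, y) / cov(g, x)


valid = ratio_estimate(alpha=0.3, beta=0.5, gamma=0.0)   # IV assumptions hold
pleio = ratio_estimate(alpha=0.3, beta=0.5, gamma=0.1)   # pleiotropy present
print(f"no pleiotropy: {valid:.3f}; with pleiotropy: {pleio:.3f}")
```

In this toy setting the second estimate drifts toward 0.3 + 0.1/0.5 = 0.5 rather than the true value 0.3, which is the kind of bias the probabilistic MR methods discussed later are designed to address.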
By applying these methods, with existing SLE loci as instruments (IVs) and expression data as the Exposure, we show that the three approaches suggest causation for a subset of the locus-gene expression dyads, thereby taking a first step toward elucidating mechanisms for SLE with MR approaches.
FIGURE 1.

A schematic of the classical Mendelian Randomization (MR) method. (A) A graphical model of classical MR. The directed acyclic graph represents the probabilistic dependencies of the random variables shown. The goal of the MR method is to estimate the causal effect α of Exposure X on the Outcome (trait Y), using marginal effect sizes from the Exposure and Outcome GWAS (see Materials and Methods for details). Horizontal pleiotropy (green dotted arrows) occurs when the genetic variant has an effect on Outcome/disease outside of its effect on the Exposure X. There are two types of horizontal pleiotropy: uncorrelated pleiotropy, where effects of genetic variants G on Y are uncorrelated with effects of G on X, and correlated pleiotropy, where effects of genetic variants G on Y (directed path G→C→Y) are correlated with effects of G on X (directed path G→C→X). (B) If the Instrumental Variable (IV) assumptions hold, and a linear relation between the variables in the model is assumed, the MR method provides an unbiased estimate of the causal effect size as the ratio of G-on-Y to G-on-X effect sizes. When one or more of the IV assumptions are violated, such as when correlated and/or uncorrelated horizontal pleiotropic effects are present, the naïve estimate is biased. The second IV assumption, ‘G is marginally independent of C’, can be mathematically described as follows. Panel A’s graphical model corresponds to a joint probability distribution P(G, C, X, Y) of four random variables. Summing over all possible outcomes of X and Y produces a marginal probability distribution of G and C: P(G, C) = ∑X,Y P(G, C, X, Y). If the marginal distribution factorizes as P(G, C) = P(G)P(C), then G and C are said to be marginally independent. Similarly, the third IV assumption, ‘G and Y are independent given X and C’, can be understood as a factorization of the conditional probability distribution: P(G, Y|X, C) = P(G|X, C)P(Y|X, C).
In the last few years, MR methods have become increasingly sophisticated, making heavy use of complex concepts and methods from theoretical statistics and statistical learning theory. In this work, we attempt a pedagogical exposition of the statistical theory behind the MR methods used in our analyses, with the hope that a wider range of investigators in genetics will be able to exploit the inner workings of these methods to gain insight into disease mechanisms.
RESULTS
Single-SNP summary data-based MR analysis
We used a two-step strategy to identify causal genes. In the first step, we assumed that horizontal pleiotropic effects are absent and applied the classical MR method to estimate causal effect α of the gene expression (Exposure X) on SLE (Outcome Y) (as in Figure 1). In the second step, we applied advanced probabilistic MR methods to the genes identified in Step-1, without restrictive assumptions on the horizontal pleiotropic effects.
Let $\hat{\beta}_x$ be the marginal effect size of a SNP (single nucleotide polymorphism; Instrumental Variable G) on the gene expression (Exposure X) from the eQTL summary statistics data (Figure 1). Let $\hat{\beta}_y$ be the marginal effect size of the same SNP (Instrumental Variable G) on SLE (Outcome Y) from the SLE GWAS summary statistics data. When the Instrumental Variable (IV) assumptions hold, the classical MR method provides an unbiased estimate of the causal effect size, $\hat{\alpha} = \hat{\beta}_y / \hat{\beta}_x$ (Materials and Methods) (Didelez and Sheehan, 2007; Smith and Ebrahim, 2003). We applied the classical MR method, as implemented in the two-sample MR method SMR (Summary data-based Mendelian Randomization) (Zhu et al., 2016), to the whole-blood eQTL (expression quantitative trait locus) summary statistics data from the eQTLGen study (Võsa et al., 2021) (sample size = 31,684 persons of European ancestry), and the SLE GWAS summary statistics data from (Bentham et al., 2015) (sample size = 14,267 persons of European ancestry; 5,201 persons diagnosed with SLE and 9,066 persons without known SLE diagnosis; GWAS summary data were preprocessed using the QC algorithm DENTIST (Chen et al., 2021), Materials and Methods). In two-sample MR, the Exposure and Outcome variables are measured on two non-overlapping sets of individuals.
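The single-SNP ratio estimate and its approximate test can be sketched as follows. This is a minimal illustration, not the SMR software itself: it uses the delta-method standard error and the one-degree-of-freedom chi-square statistic described by Zhu et al. (2016), and the function name, argument names, and input numbers are hypothetical.

```python
import math


def smr_ratio_test(b_gx, se_gx, b_gy, se_gy):
    """Ratio estimate of the causal effect from one SNP, with an approximate test.

    b_gx, se_gx: SNP effect on gene expression (eQTL) and its standard error
    b_gy, se_gy: SNP effect on the disease (GWAS) and its standard error
    Returns (alpha_hat, se_alpha, t_stat, p_value).
    """
    alpha_hat = b_gy / b_gx
    # Delta-method approximation to the variance of a ratio of two
    # independent estimates (two-sample setting).
    var_alpha = se_gy**2 / b_gx**2 + (b_gy**2 * se_gx**2) / b_gx**4
    se_alpha = math.sqrt(var_alpha)
    # Equivalent to z_gx^2 * z_gy^2 / (z_gx^2 + z_gy^2); approximately
    # chi-square with 1 degree of freedom under the null.
    t_stat = alpha_hat**2 / var_alpha
    # Survival function of a 1-df chi-square via the complementary error function.
    p_value = math.erfc(math.sqrt(t_stat / 2.0))
    return alpha_hat, se_alpha, t_stat, p_value


# Hypothetical summary statistics for one top cis-eQTL SNP.
alpha_hat, se_alpha, t_stat, p_value = smr_ratio_test(0.5, 0.05, 0.1, 0.02)
print(alpha_hat, se_alpha, t_stat, p_value)
```

Because the statistic combines the eQTL and GWAS z-scores, a gene can only reach significance when the chosen SNP is strongly associated with both expression and disease.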
For the SMR analysis, we chose genes with at least one significant cis-eQTL (PeQTL < 5e-8) and with transcription start sites (TSS) located within 500kb of GWAS-significant SNPs (PGWAS < 5e-8) (Materials and Methods). For each gene chosen for the SMR analysis, its top cis-eQTL SNP was used as the Instrumental Variable. The SMR method identified the causal effects of the expression of 142 genes as statistically significant (Bonferroni corrected p-value < 0.05; Nominal p-value < 9e-5) (Figure 2 and Supplementary Table S1). Of these genes, 99 are from chromosome 6 and 43 are from the rest of the genome. Of the chromosome 6 genes, 95 are from the extended MHC region (xMHC, hg19 region chr6:25Mb-34Mb), and 68 of these are from the classical MHC region (MHC, hg19 region chr6:28.5Mb-33.4Mb).
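The nominal threshold of about 9e-5 follows from a Bonferroni correction of 0.05 over the 531 tests stated in the Figure 2 caption. The short sketch below only reproduces that arithmetic; the helper name is hypothetical.

```python
def bonferroni_threshold(family_wise_alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate."""
    return family_wise_alpha / n_tests


def passes_bonferroni(p_value, n_tests=531, family_wise_alpha=0.05):
    """True if a nominal p-value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(family_wise_alpha, n_tests)


threshold = bonferroni_threshold(0.05, 531)
print(f"{threshold:.2e}")
```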
FIGURE 2.

Manhattan Plot of SMR-significant genes. Named are the 43 non-chromosome 6 and 3 chromosome 6 genes whose expression has statistically significant causal effect on SLE according to the single-SNP SMR method, which presupposes that the Instrumental Variable (IV) assumptions hold (see Figure 1B). The dotted horizontal line is at the nominal causal effect p-value threshold of 0.05/531 (Bonferroni correction for 531 multiple statistical tests). Only 3 important genes from chromosome 6 are labeled, and the rest are not labeled to avoid clutter. The genes’ order on the chromosome is maintained, but their location is not shown to scale for clarity.
Genetic variants are known to exhibit widespread horizontal pleiotropic effects (Verbanck et al., 2018). The IV assumptions underpinning the classical MR method are thus likely to be violated, with the implication that there may also be some false positives among the 142 putative causal genes identified by the SMR method. Thus, we applied advanced MR methods capable of performing inference in the presence of invalid Instrumental Variables to analyze the 142 genes identified earlier. Our two-step strategy can be motivated as follows. The SMR algorithm assumes that horizontal pleiotropy is absent, which can lead to false positives in gene discovery. Addressing these false positives is our primary concern, so we applied additional filtering to the genes identified by SMR in step-1. Our aim was to employ advanced MR methods as filters. Applying these advanced MR methods on the entire dataset instead would have introduced challenges related to multiple testing due to the large number of genes.
Modelling uncorrelated horizontal pleiotropy with PMR-Egger
If the IV assumptions of the classical MR method can be violated by horizontal pleiotropy, why not explicitly model it? This is the approach taken in recent studies (Bowden et al., 2015; Cheng et al., 2022, 2020; Morrison et al., 2020; Xu et al., 2021; Yuan et al., 2022, 2020). For a recent review of these methods, see (Boehm and Zhou, 2022). The idea is to jointly estimate the parameters characterizing the horizontal pleiotropy and the causal effect size using summary statistics from the Exposure and the Outcome GWAS. PMR-Egger (Yuan et al., 2020), a probabilistic MR method, explicitly models uncorrelated horizontal pleiotropy (see Materials and Methods, and Figure 3 for a concise description of the method). The European whole-blood eQTL summary statistics data from the eQTLGen study (Võsa et al., 2021), the European population SLE GWAS summary statistics data from (Bentham et al., 2015), and the LD structure from the European 1KG data were used as input data for the PMR-Egger method.
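The idea of jointly estimating a pleiotropy parameter and the causal effect can be illustrated with a deliberately simplified stand-in. The sketch below is not PMR-Egger's probabilistic mixed-model inference: it assumes SNPs in linkage equilibrium (so no LD matrix is needed), simulates marginal effect sizes with a constant pleiotropic offset `gamma`, and fits an ordinary least-squares line with an intercept (an Egger-style regression). All parameter values and function names are hypothetical.

```python
import random


def simulate_summary_stats(alpha, gamma, p=200, sigma_beta=0.2,
                           noise_sd=0.005, seed=1):
    """Simulate marginal SNP effects on exposure (b_x) and outcome (b_y).

    Assumes independent SNPs; each outcome effect is alpha * beta_j plus a
    constant pleiotropic offset gamma, plus noise.
    """
    rng = random.Random(seed)
    beta = [rng.gauss(0.0, sigma_beta) for _ in range(p)]
    b_x = [b + rng.gauss(0.0, noise_sd) for b in beta]
    b_y = [alpha * b + gamma + rng.gauss(0.0, noise_sd) for b in beta]
    return b_x, b_y


def egger_style_fit(b_x, b_y):
    """Least-squares slope (causal effect) and intercept (pleiotropy)."""
    p = len(b_x)
    mx, my = sum(b_x) / p, sum(b_y) / p
    sxx = sum((x - mx) ** 2 for x in b_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(b_x, b_y))
    slope = sxy / sxx
    return slope, my - slope * mx


def top_snp_ratio(b_x, b_y):
    """Single-SNP ratio estimate using the SNP with the largest |b_x|."""
    j = max(range(len(b_x)), key=lambda i: abs(b_x[i]))
    return b_y[j] / b_x[j]


b_x, b_y = simulate_summary_stats(alpha=0.3, gamma=0.1)
slope, intercept = egger_style_fit(b_x, b_y)
single = top_snp_ratio(b_x, b_y)
print(f"slope={slope:.3f} intercept={intercept:.3f} top-SNP ratio={single:.3f}")
```

In this toy model the intercept absorbs the pleiotropic offset, so the slope stays close to the simulated causal effect, whereas the single-SNP ratio is shifted by roughly gamma divided by that SNP's effect on the exposure.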
FIGURE 3.

A description of the PMR-Egger method. (A) A graphical representation of the statistical model. The model is used to estimate the causal effect α of gene expression X on the trait Y of interest, in the presence of the uncorrelated horizontal pleiotropic effect γ. Correlated pleiotropic effects are assumed to be absent. For each j = 1, …, p, the random variable βj represents the effect size of genetic variant (cis-SNP) Gj on the gene expression X and is assumed to follow a normal distribution with mean zero and variance $\sigma_\beta^2$. These random variables are assumed to be independent of each other. (B) The matrix equations representing the mixed-effects statistical model from panel A. For example, the first equation states that the vector of marginal effect sizes of p cis-SNPs on exposure variable X (gene expression) is equal to the matrix-product of matrix R (SNP-SNP genotype correlation matrix) and the vector β of effect sizes of the p cis-SNPs on exposure X, plus a noise vector. The matrix and vectors in the equations are emphasized in boldface. (C) A detailed description of some variables in the equations from panel B. Column vectors of dimension p are represented as a transpose (‘T’) of row vectors. The noise terms εx and εy follow multivariate normal distributions with mean zero and covariance matrices that depend on R, the SNP-SNP genotype correlation matrix.
From the set of 142 genes identified by SMR method, we selected 97 genes based on the criterion that the SNP set Sg for each gene g contains at least 25 statistically significant (PeQTL < 5e-8) eQTL SNPs (for the details and a heuristic motivation for the cutoff of 25 SNPs, see Materials and Methods). We performed PMR-Egger analysis on each of these 97 genes using the eQTL and SLE GWAS summary statistics data, and the LD data restricted to SNPs from the set Sg for each gene g.
The PMR-Egger method identified 13 non-chromosome 6 and 34 chromosome 6 genes with statistically significant causal effect sizes (causal effect p-value < 0.05) (see Figure 4, Table 1 and Supplementary Table S1). Due to the complexity of the MHC locus on chromosome 6, the results for chromosome 6 are likely to be unreliable. For a discussion of this issue, see the ‘Comparison of analysis results from three MR methods’ section below.
FIGURE 4.

A comparison of PMR-Egger and SMR causal effect sizes. A scatter plot depicting SLE causal effect sizes of non-chromosome 6 genes that are statistically significant according to the probabilistic MR method (PMR-Egger). The signs of the causal effect sizes estimated using the two methods agree. The PMR-Egger method models uncorrelated horizontal pleiotropic effects, but assumes that correlated horizontal pleiotropy is absent.
TABLE 1.
Candidate causal genes from outside of the chromosome 6 identified in this study. See also Figure 13A. Cells with p-value < 0.05 in the table are highlighted in light blue color. The table column names are described as follows. Gene: Symbol representing the gene; SNP: ID of the most significant eQTL SNP associated with the gene; Chrom: ID of chromosome where the gene and SNPs are located; Position: Chromosomal position (base-pair) of the top eQTL SNP associated with the gene; p_GWAS: SLE GWAS p-value of the top eQTL SNP; p_eQTL: eQTL p-value of the top eQTL SNP; p_SMR: p-value for the gene causal effect size estimate by SMR method; p_PMR: p-value for the gene causal effect size estimate by PMR-Egger method; p_MRAID: p-value for the gene causal effect size estimate by MRAID method; p_MtRobin: p-value for the gene causal effect size estimate by MR-MtRobin method. ‘NA’ in the cells: data not available because of technical issues such as ‘algorithm generated errors’ and ‘insufficient number of genetic variants at the locus to reliably estimate causal effect size’.
| Gene | SNP | Chrom | Position | p_GWAS | p_eQTL | p_SMR | p_PMR | p_MRAID | p_MtRobin |
|---|---|---|---|---|---|---|---|---|---|
| GPX3 | rs3792789 | 5 | 150445968 | 3.0E-06 | 7.5E-80 | 5.7E-06 | NA | 9.0E-01 | 4.0E-04 |
| IRF5 | rs6467223 | 7 | 128674666 | 2.3E-13 | 0.0E+00 | 3.3E-13 | 7.4E-04 | 2.7E-01 | 0.0E+00 |
| TNPO3 | rs6467223 | 7 | 128674666 | 2.3E-13 | 0.0E+00 | 4.7E-13 | 9.1E-14 | 6.0E-01 | 2.8E-01 |
| RP11–128A6.2 | rs6467223 | 7 | 128674666 | 2.3E-13 | 1.4E-15 | 6.6E-08 | NA | NA | 1.3E-02 |
| SMO | rs74942545 | 7 | 128751444 | 1.5E-08 | 2.2E-10 | 2.4E-05 | NA | NA | 2.0E-06 |
| XKR6 | rs4618656 | 8 | 10969235 | 2.0E-06 | 1.3E-95 | 3.6E-06 | 2.3E-04 | 9.6E-01 | 1.0E-06 |
| AF131215.9 | rs4618656 | 8 | 10969235 | 2.0E-06 | 1.8E-290 | 2.4E-06 | NA | 2.0E-06 | 0.0E+00 |
| AF131215.2 | rs4618656 | 8 | 10969235 | 2.0E-06 | 6.4E-298 | 2.4E-06 | NA | 8.5E-09 | 0.0E+00 |
| FAM167A | rs2736345 | 8 | 11352485 | 1.5E-13 | 0.0E+00 | 1.9E-13 | NA | 6.9E-10 | 0.0E+00 |
| BLK | rs2736345 | 8 | 11352485 | 1.5E-13 | 0.0E+00 | 2.8E-13 | NA | 1.0E-08 | 0.0E+00 |
| RP11–148O21.6 | rs11250144 | 8 | 11386276 | 2.8E-08 | 1.0E-14 | 6.5E-06 | 6.6E-04 | 6.5E-01 | 0.0E+00 |
| RP11–148O21.4 | rs2736345 | 8 | 11352485 | 1.5E-13 | 7.4E-105 | 2.6E-12 | 7.1E-04 | 1.8E-02 | 0.0E+00 |
| RP11–148O21.2 | rs2736345 | 8 | 11352485 | 1.5E-13 | 1.1E-33 | 2.9E-10 | 0.0E+00 | NA | 0.0E+00 |
| PHRF1 | rs6598008 | 11 | 618172 | 6.7E-10 | 2.8E-19 | 3.7E-07 | 3.9E-06 | 4.7E-03 | 9.9E-03 |
| IRF7 | rs1051390 | 11 | 613165 | 8.8E-11 | 4.6E-81 | 8.2E-10 | 1.7E-02 | 1.0E+00 | 3.2E-01 |
| TMEM80 | rs12277188 | 11 | 688091 | 9.8E-08 | 0.0E+00 | 1.0E-07 | NA | 7.6E-02 | 1.2E-03 |
| RP11–542M13.3 | rs12149636 | 16 | 85971220 | 1.1E-07 | 7.3E-52 | 5.4E-07 | NA | NA | 8.6E-03 |
| RP11–542M13.2 | rs9308364 | 16 | 86003446 | 3.4E-07 | 9.7E-17 | 1.4E-05 | NA | NA | 3.4E-02 |
| RP11–94L15.2 | rs12936231 | 17 | 38029120 | 1.8E-05 | 4.1E-73 | 3.0E-05 | 6.9E-05 | 7.2E-02 | 4.8E-01 |
| GSDMB | rs12936231 | 17 | 38029120 | 1.8E-05 | 0.0E+00 | 1.8E-05 | 1.6E-03 | 2.0E-01 | NA |
| ORMDL3 | rs12936231 | 17 | 38029120 | 1.8E-05 | 0.0E+00 | 1.8E-05 | 1.9E-02 | 2.8E-01 | NA |
| TYK2 | rs11085725 | 19 | 10462513 | 9.6E-13 | 1.1E-163 | 5.1E-12 | 4.2E-03 | 9.7E-01 | 6.4E-01 |
| UBE2L3 | rs2070512 | 22 | 21949411 | 1.5E-13 | 0.0E+00 | 1.9E-13 | 7.0E-15 | 4.2E-02 | 4.9E-01 |
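
The per-method gene counts can be re-derived from the p-value columns of Table 1. The sketch below is only a tally of the values as transcribed from the table: ‘NA’ entries are stored as `None` and treated as not significant, gene IDs are typed with plain hyphens, and the 0.05 cutoff is the one quoted in the text.

```python
# (p_PMR, p_MRAID, p_MtRobin) per gene, transcribed from Table 1; None = 'NA'.
TABLE_1 = {
    "GPX3": (None, 9.0e-1, 4.0e-4),
    "IRF5": (7.4e-4, 2.7e-1, 0.0),
    "TNPO3": (9.1e-14, 6.0e-1, 2.8e-1),
    "RP11-128A6.2": (None, None, 1.3e-2),
    "SMO": (None, None, 2.0e-6),
    "XKR6": (2.3e-4, 9.6e-1, 1.0e-6),
    "AF131215.9": (None, 2.0e-6, 0.0),
    "AF131215.2": (None, 8.5e-9, 0.0),
    "FAM167A": (None, 6.9e-10, 0.0),
    "BLK": (None, 1.0e-8, 0.0),
    "RP11-148O21.6": (6.6e-4, 6.5e-1, 0.0),
    "RP11-148O21.4": (7.1e-4, 1.8e-2, 0.0),
    "RP11-148O21.2": (0.0, None, 0.0),
    "PHRF1": (3.9e-6, 4.7e-3, 9.9e-3),
    "IRF7": (1.7e-2, 1.0, 3.2e-1),
    "TMEM80": (None, 7.6e-2, 1.2e-3),
    "RP11-542M13.3": (None, None, 8.6e-3),
    "RP11-542M13.2": (None, None, 3.4e-2),
    "RP11-94L15.2": (6.9e-5, 7.2e-2, 4.8e-1),
    "GSDMB": (1.6e-3, 2.0e-1, None),
    "ORMDL3": (1.9e-2, 2.8e-1, None),
    "TYK2": (4.2e-3, 9.7e-1, 6.4e-1),
    "UBE2L3": (7.0e-15, 4.2e-2, 4.9e-1),
}
METHODS = ("PMR-Egger", "MRAID", "MR-MtRobin")


def significant_genes(table, column, cutoff=0.05):
    """Genes whose p-value in the given column is available and below cutoff."""
    return {gene for gene, pvals in table.items()
            if pvals[column] is not None and pvals[column] < cutoff}


sig = {name: significant_genes(TABLE_1, i) for i, name in enumerate(METHODS)}
all_three = sig["PMR-Egger"] & sig["MRAID"] & sig["MR-MtRobin"]
for name in METHODS:
    print(name, len(sig[name]))
print(sorted(all_three))
```

On these transcribed values the tally gives 13, 7, and 16 genes for PMR-Egger, MRAID, and MR-MtRobin, matching the counts reported in the text, and PHRF1 is in the three-way intersection (in this transcription RP11-148O21.4 also falls below 0.05 in all three columns).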
The PMR-Egger method estimates the causal effect size α, the uncorrelated horizontal pleiotropy level γ (see Figure 3), and the corresponding p-values Pα and Pγ. Interestingly, we found no evidence of uncorrelated pleiotropy for the 13 significant non-chromosome 6 genes (Pα < 0.05 and median Pγ = 0.2). In contrast, the median pleiotropy p-value (Pγ) for the statistically non-significant genes (those with causal effect p-value Pα > 0.05) is 3e-6. This means that genes whose causal effects on SLE are not significant according to PMR-Egger have high levels of uncorrelated pleiotropy. The SMR method incorrectly identified these genes as causal because its IV assumption that horizontal pleiotropy is absent does not hold for them. The PMR-Egger method, by explicitly accounting for uncorrelated horizontal pleiotropy in the statistical model, shows that the expression of many genes does not have a statistically significant causal effect on SLE under the model being tested.
Despite being statistically significant, the causal effect sizes estimated using the PMR-Egger method are small (Figure 4 and Supplementary Table S1). However, absolute values of causal effect sizes should not be taken literally, because the SLE GWAS summary statistics were calculated using logistic regression for a binary trait (case-control study), whereas almost all MR methods, including those used in this work, assume continuous trait values in linear models. The MR methods thus treat binary trait values as continuous, which is not fully justified. It may therefore be more appropriate to focus on the statistical significance level (p-value) and interpret the causal effect size only semi-quantitatively. Nevertheless, despite these technical limitations, our findings demonstrate consistent estimates of the direction of causal effects across all four MR methods for the majority of genes identified (Figures 4, 6, and 10).
FIGURE 6.

A comparison of MRAID and SMR causal effect sizes. A scatter plot depicting SLE causal effect sizes of non-chromosome 6 genes that are statistically significant according to the MRAID method. The MRAID method models both uncorrelated and correlated horizontal pleiotropic effects. The signs of the causal effect sizes estimated using the two methods agree.
FIGURE 10.

A comparison of MR-MtRobin and SMR causal effect sizes. A scatter plot depicting SLE causal effect sizes of non-chromosome 6 genes that are statistically significant according to the MR-MtRobin method applied to the DICE and LCL eQTL data. This method implicitly models both uncorrelated and correlated horizontal pleiotropic effects. The signs of the causal effect sizes estimated using the two methods agree for all but four genes.
Modelling correlated and uncorrelated horizontal pleiotropic effects using MRAID
The PMR-Egger statistical model described above imposes a restrictive assumption: the absence of correlated pleiotropic effects. Correlated pleiotropy is present when the effects of genetic variants on Outcome Y are correlated with their effects on Exposure X [Figure 1 and (Morrison et al., 2020)]. MRAID (MR with Automated Instrument Determination) is a probabilistic MR method for causal inference with correlated SNP instruments in the presence of both correlated and uncorrelated horizontal pleiotropic effects (Yuan et al., 2022). For a concise description of the MRAID model, see Figure 5 and Materials and Methods. MRAID was originally developed for causal inference with complex trait exposures but, to the best of our knowledge, has not yet been applied to gene expression exposures in published work.
FIGURE 5.

A description of the MRAID method. (A) A graphical representation of the statistical model. The model is used to estimate the causal effect α of gene expression X on the trait Y of interest, in the presence of the uncorrelated and correlated horizontal pleiotropic effects. In this mixed-effects model, the variables α and ρ are fixed effects, and the other variables are random effects. (B) The random effect variables in the model follow mixture probability distributions. For instance, with the probability πβ, the random variable βj follows a normal distribution and is identically equal to zero with the probability 1 − πβ. In the latter case, the genetic variant Gj does not directly affect the gene expression X, but has a direct uncorrelated pleiotropic effect on the outcome Y. With the probability 1 − πc, the random variable is equal to zero, which results in the vanishing correlated horizontal pleiotropy random variable . When , the effect size of genetic variant (cis-SNP) Gj on the gene expression X is non-zero and is equal to βj. (C) The matrix equations representing the mixed-effects statistical model from panel A. The equation for is formally identical to the corresponding equation from the PMR-Egger method (see Figure 3B). In the linear equation for , the first three terms on the right-hand side are of the form: matrix R times a vector random variable. Thus, the variables are a priori not distinguishable. However, thanks to assumptions on distributions of the random variables (see panels A and B), the variables are distinguishable and the method can infer the parameters of the probability distributions. (D) A heuristic derivation of mixed-model equations from panel C when all SNPs are in linkage equilibrium. In the latter case, SNPs are uncorrelated and the SNP-SNP genotype correlation matrix R becomes the identity matrix, and R can then be erased from the equations. The right-hand side of equation for can be understood as follows. 
In the graph from panel A, there are two directed paths from Gj to X: G→C→X and G→X. The marginal effect size of Gj on X is the sum of the contributions from these two paths and the noise term. The value of the directed path G→C→X is the product of the values of the directed paths G→C and C→X. Similarly, the marginal effect size of Gj on Y is the sum of the contributions from the directed paths G→C→Y, G→C→X→Y, G→X→Y and G→Y, and the noise term. The equations in the general case of correlated SNPs can be understood as R-weighted contributions to the marginal effect size of a SNP from the tagged SNPs that are in LD with it.
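The mixture distribution described for βj in panel B (normal with probability πβ, exactly zero otherwise) can be sampled as below. This is only an illustration of that one prior, not of MRAID's inference; the parameter values `pi_beta` and `sigma_beta` are hypothetical.

```python
import random


def sample_mixture_effects(p, pi_beta, sigma_beta, seed=0):
    """Draw p SNP effect sizes from a point-mass-at-zero / normal mixture.

    With probability pi_beta an effect is drawn from N(0, sigma_beta^2);
    with probability 1 - pi_beta it is exactly zero (SNP has no effect on X).
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_beta) if rng.random() < pi_beta else 0.0
            for _ in range(p)]


effects = sample_mixture_effects(p=10_000, pi_beta=0.1, sigma_beta=0.2)
fraction_nonzero = sum(e != 0.0 for e in effects) / len(effects)
print(f"fraction of SNPs with non-zero effect: {fraction_nonzero:.3f}")
```

A prior of this form lets the model treat most candidate cis-SNPs as having no effect on expression while still allowing a minority of them to act as instruments.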
For the MRAID analysis, we used the same whole-blood eQTL, SLE GWAS summary statistics, and LD structure data as described earlier for the PMR-Egger analysis. From the set of 142 genes identified by the SMR method, we selected 97 genes and the corresponding sets (Sg for each gene g) of statistically significant (PeQTL < 5e-8) eQTL SNPs (see Materials and Methods). We performed MRAID analysis on each of these 97 genes using the eQTL and SLE GWAS summary statistics data, and the LD data restricted to SNPs from the set Sg for each gene g. The MRAID method identified 7 non-chromosome 6 and 6 chromosome 6 genes with statistically significant causal effect sizes (causal effect p-value < 0.05) (see Figure 6, Table 1 and Supplementary Table S1).
Multi-cell type MR analysis
The MR analyses described so far used whole-blood eQTL data generated from over 31 thousand individuals in the eQTLGen project (Võsa et al., 2021). We sought to replicate our findings in an independent data set from different immune cell types. To this end, we applied MR-MtRobin method (Gleason et al., 2021) to the eQTL datasets from the DICE project (15 immune cell types from 90 Europeans) and GEUVADIS lymphoblastoid cell lines (LCLs from 445 Europeans) (Lappalainen et al., 2013). We included LCL data in our analysis because these cell lines are infected with Epstein-Barr virus (EBV), which is a strong etiologic candidate for causing SLE and Multiple Sclerosis (MS) (Bjornevik et al., 2022; Harley and James, 2006; Laurynenka et al., 2022). LCLs are stable transformed cell lines that express EBV’s Latency III program. Notably, the EBV gene product and transcription co-factor, EBNA2, is enriched at the genetic loci associated with the risk of both SLE and MS (Harley et al., 2018). A comparison of the gene expression profile of SLE risk genes across 459 different cell/tissue types revealed that EBV-infected B cells (LCLs) had the strongest representation of highly expressed SLE risk genes (Afrasiabi et al., 2022).
For a concise description of the MR-MtRobin model, see Figure 7 and Materials and Methods. The MR-MtRobin method uses a mixed-effects linear statistical model that relates cell-type specific eQTL effect sizes (dependent variables) to GWAS effect sizes (independent variables) in what effectively amounts to a weighted reverse regression analysis (‘reverse’ because of the inversion in the roles of dependent and independent variables), with the weights given by the reciprocals of the squared standard errors of the estimated eQTL effect sizes (Figures 7–9 and Materials and Methods). The MR-MtRobin method identified 16 non-chromosome 6 and 21 chromosome 6 genes with statistically significant causal effect sizes (causal effect p-value < 0.05) (see Figure 10, Table 1 and Supplementary Table S1).
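The weighted reverse regression idea can be sketched in a reduced form. The code below is not the MR-MtRobin implementation: it keeps only the fixed slope θ, fits a line through the origin, and ignores the SNP-specific random slopes and the LD-dependent noise correlations of the full mixed model. The function name and the input numbers are hypothetical.

```python
def weighted_reverse_regression(gwas_effects, eqtl_effects, eqtl_ses):
    """Regress cell type-specific eQTL effects on GWAS effects through the origin.

    gwas_effects[j]:    GWAS effect size of SNP j
    eqtl_effects[j][m]: eQTL effect size of SNP j in cell type m
    eqtl_ses[j][m]:     standard error of that eQTL effect
    Weights are 1 / se^2. Returns (theta, alpha) with alpha = 1 / theta.
    """
    num = 0.0
    den = 0.0
    for b_gwas, effects, ses in zip(gwas_effects, eqtl_effects, eqtl_ses):
        for b_eqtl, se in zip(effects, ses):
            weight = 1.0 / se**2
            num += weight * b_gwas * b_eqtl
            den += weight * b_gwas * b_gwas
    theta = num / den
    return theta, 1.0 / theta


# One SNP measured in two cell types; the more precise estimate dominates.
theta, alpha = weighted_reverse_regression(
    gwas_effects=[1.0],
    eqtl_effects=[[1.0, 3.0]],
    eqtl_ses=[[1.0, 0.5]],
)
print(theta, alpha)
```

Because the weights grow as the eQTL standard errors shrink, a cell type with a much larger sample size can dominate the fitted slope.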
FIGURE 7.

A description of the MR-MtRobin method. (A) An illustrative plot depicting a weighted linear regression of cell type-specific eQTL effect sizes against the effect sizes from the trait Y GWAS. Artificially generated data points for three SNPs in four cell types are shown. The error bars represent standard errors of effect sizes from the cell type-specific eQTL summary statistics data. The effect sizes of SNP-1 eQTL are statistically significant in four cell types, while those of SNP-2 and SNP-3 are significant in three cell types only. Each vertical cluster of data points corresponds to a single SNP. A blue line connecting the origin with a point in a vertical cluster for each SNP represents a statistical fit of the weighted linear regression model described in panel B, with the estimated SNP-specific slope parameter equal to θ + θk for SNP-k. For each SNP, the data points with smaller eQTL effect size standard errors receive larger weights (the end points of the blue lines are closer to such data points). The red dotted line with the slope θ, which is the reciprocal 1/α of the X-on-Y causal effect size α, represents the overall linear relationship between the eQTL effect sizes and the GWAS effect sizes. (B) The mixed-effects linear model. In the linear relationship between cell type-specific eQTL effect sizes and GWAS effect sizes, θ is a fixed effect and θj are SNP-specific random effects. The noise term εjm in the equation is cell type specific and depends on the structure of SNP-SNP genotype correlations. Specifically, for the cell type m, the vector εm follows a multivariate normal distribution with mean zero and a covariance matrix whose elements are the products of eQTL standard errors in cell type m and SNP genotype correlation matrix elements.
FIGURE 9.

A SNP-centric view of MR-MtRobin analysis for the causal effect of BLK gene expression on SLE. To ensure clarity, only a selection of the top GWAS SNPs is depicted.
Interestingly, among the 16 non-chromosome 6 genes, MR-MtRobin and SMR methods demonstrated a discrepancy in the direction/sign of causal effect size estimates for four genes (see Figure 10). To investigate the reason for this inconsistency, we conducted an analysis of scatter plots comparing multi-cell type eQTL effect sizes versus GWAS effect sizes (Figure 11). Notably, the LCL cell line (shown in red color) has the most substantial impact on the causal effect size estimates by MR-MtRobin due to its larger sample size (n = 445) compared to the smaller sample sizes (n = 90) of eQTL data from other cell types. When the MR-MtRobin analysis included the LCL eQTL data alongside other cell types’ eQTL data, the causal effect of PHRF1 was positive, equaling 3.5 (p-value = 0.099, Supplementary Table S1), which aligns with the red data points following a positive slope line (red line in Figure 11A). Conversely, when the analysis excluded the LCL eQTL data, the causal effect of PHRF1 became negative, equaling −0.5 (although statistically not significant: p-value = 0.33), in agreement with the data points for non-LCL cell types following a negative slope line (blue dashed line in Figure 11A) and consistent with the direction of the causal effect estimate obtained by SMR analysis using whole blood eQTL data. However, for the other three genes (IRF5, GPX3, and RP11–542M13.2), the exclusion of LCL eQTL data did not lead to a reversal of the causal effect estimates by MR-MtRobin [Causal effect estimates of these genes with LCL data included in MR-MtRobin analysis: 1.1 (p-value = 0), −0.77 (p-value = 0.0004) and 1.6 (p-value = 0.034), see Supplementary Table S1; Causal effect estimates when LCL data are excluded from the analysis: 1.6 (p-value = 0.57), −0.64 (p-value = 0.0004) and 4.1 (p-value = 0.7)] (Figures 11B,C,D).
The direction/sign of causal effect size estimates for the remaining 12 genes shown in Figure 10 stay consistent between SMR and MR-MtRobin when the LCL data is excluded from the MR-MtRobin analysis. Achieving more consistent estimates of the direction of causal effects would necessitate larger and more balanced multi-cell type eQTL data sets, and advanced MR methods capable of incorporating cell type-specific eQTL effects in statistical models. For completeness, we report the Venn diagram comparisons of statistically significant (p-value < 0.05) genes identified by MR-MtRobin method with and without LCL eQTL data (Figure 12).
FIGURE 11.

Reversal of causal effect direction estimates by MR-MtRobin due to cell type-specific eQTL effects and unbalanced eQTL sample sizes. MR-MtRobin weighted linear regression analysis for the causal effects of expressions of four genes on SLE. For a description of the model, see Figure 7 and Materials and Methods, and for a description of the scatter plots, see Figure 8 legend. The largest contribution to the causal effect size α estimate is from the LCL cell line (data points shown in red color) due to the larger sample size (n = 445) of the LCL eQTL data compared to the smaller sample sizes (n = 90) of eQTL data for other cell types. The slope of the red line for each gene represents the reciprocal of the causal effect size (1/α) estimate with LCL cell line data included in the MR-MtRobin analysis, while the blue dashed line shows the same with LCL data excluded from the analysis. (A) For the PHRF1 gene, the data points for LCL (red circles) are aligned along the red line with a positive slope. The data points for other cell types are more consistent with the negative slope blue dashed line. The MR-MtRobin causal effect sizes of PHRF1 with and without LCL eQTL data included in the MR-MtRobin analysis are α = 3.5 and α = −0.5, respectively (see Supplementary Table S1). (B-D) Scatter plots for IRF5, GPX3 and RP11–542M13.2 genes, respectively.
FIGURE 12.

Venn diagram comparison of statistically significant causal genes identified by MR-MtRobin method using two different sets of eQTL data: DICE + LCL (with LCL) and DICE alone (without LCL). (A) Genes from outside of chromosome 6. (B) Genes from chromosome 6.
Comparison of analysis results from three MR methods
Non-chromosome 6 genes
In this study, we employed three different MR methods, namely MRAID, PMR-Egger, and MR-MtRobin, to identify SLE causal genes. Specifically, excluding chromosome 6, MRAID identified a total of 7 genes, while PMR-Egger detected 13 genes, and MR-MtRobin identified 16 genes (Figure 13A). Among these, 3 genes were found to be common between MRAID and PMR-Egger, whereas 6 genes were shared between MRAID and MR-MtRobin. Interestingly, we observed that 6 genes were common between PMR-Egger and MR-MtRobin, and 2 genes were identified by all three methods (Figure 13A).
FIGURE 13.

Venn diagrams of statistically significant causal genes identified by MRAID, PMR-Egger and MR-MtRobin (the latter with DICE + LCL eQTL data) methods. (A) Genes from outside of chromosome 6. (B) Genes from chromosome 6.
To shed light on the reasons behind the discrepancies in gene identification among the methods, it is crucial to consider various factors. Technical issues played a significant role in certain genes being deemed significant by one method but not by another. Notably, four genes (AF131215.9, AF131215.2, FAM167A, and BLK) identified as significant by MRAID were not detected by PMR-Egger due to errors that occurred during the analysis (‘Cholesky decomposition failed in PMR_summary_Egger_CPP function’ error message was reported when Cholesky decomposition of a matrix constructed from LD SNP-SNP correlation matrix was performed by PMR-Egger). Similarly, six genes specific to MR-MtRobin were not identified by either MRAID or PMR-Egger. Specifically, four of these genes (RP11−128A6.2, SMO, RP11−542M13.3, and RP11−542M13.2) were excluded from the analysis of PMR-Egger and MRAID due to having fewer than 25 SNPs in Sg (as explained in the Materials and Methods section). Additionally, errors (‘Cholesky decomposition failed in PMR_summary_Egger_CPP function’) were encountered during the analysis of TMEM80 and GPX3 genes by PMR-Egger. Furthermore, the absence of GSDMB and ORMDL3 genes from the MR-MtRobin list resulted from errors generated by a non-linear optimization (NLopt) step in the MR-MtRobin algorithm. Thus, it is evident that these three MR methods serve as complementary approaches for inferring causal genes, as they may not be able to analyze the same set of genes.
For the remaining genes, the precise reasons behind their identification by one method but not the others are yet to be determined. It is important to note that the three MR methods operate based on different assumptions, leading to distinct regimes of validity of the methods (for assumptions, see Materials and Methods). To illustrate this, consider a hypothetical scenario where a gene is deemed significant by MR method A but not by MR method B. If the assumptions underlying method A are violated by that particular gene, it is likely to be a false positive for method A while being a true negative for method B. Conversely, if the assumptions for method A are valid but those for method B are not, the gene would be classified as a false negative for method B. Therefore, understanding the assumptions and limitations of each MR method is crucial in interpreting the discrepancies in gene identification results. Performing a comprehensive examination of the impact of MR modeling assumptions on the false positive and false negative rates in causal gene discovery would necessitate extensive simulation of in silico datasets using complex probabilistic models. These models would need to incorporate factors like horizontal pleiotropic effects and cell-type specificity. However, undertaking such an extensive in silico analysis is beyond the scope of the present study.
Chromosome 6 genes
On chromosome 6, MRAID identified a total of 6 genes, while PMR-Egger detected 34 genes, and MR-MtRobin identified 21 genes (Figure 13B). Among these, 5 genes were found to be common between MRAID and PMR-Egger, whereas only one gene was shared between MRAID and MR-MtRobin. Interestingly, we observed that 12 genes were common between PMR-Egger and MR-MtRobin, and none were identified together by all three methods (Figure 13B).
Interestingly, only 46% of causal genes identified by MRAID are from chromosome 6, compared to 70% for SMR, 72% for PMR-Egger and 57% for MR-MtRobin. This suggests that MRAID, by modelling both uncorrelated and correlated pleiotropic effects, and using a richer probabilistic model than other MR methods, was able to reduce false positive ‘causal’ genes from chromosome 6. The PMR-Egger method makes a simplistic assumption that correlated horizontal pleiotropy is absent. Furthermore, the PMR-Egger method makes a simplifying assumption that horizontal pleiotropic effect sizes of all instrumental SNPs are equal to a single unknown parameter γ (see Figure 3 and Materials and Methods). By contrast, the MRAID model is general and the instrumental SNPs in the model are not constrained to have the same value of horizontal pleiotropic effect (Figure 5 and Materials and Methods).
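The gene counts and chromosome 6 percentages quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is only a consistency check on the numbers reported in the text (Figure 13A,B) and in the abstract (142 SMR genes, 43 of them outside chromosome 6); the helper name `union_size` is introduced here for illustration and is not part of any MR software.

```python
def union_size(a, b, c, ab, ac, bc, abc):
    """Size of the union of three sets by inclusion-exclusion."""
    return a + b + c - ab - ac - bc + abc

# Set order: MRAID (a), PMR-Egger (b), MR-MtRobin (c).
# Pairwise overlaps include the genes shared by all three methods.
non_chr6_total = union_size(7, 13, 16, ab=3, ac=6, bc=6, abc=2)
chr6_total = union_size(6, 34, 21, ab=5, ac=1, bc=12, abc=0)

# method -> (chromosome 6 genes, non-chromosome 6 genes)
genes = {
    "SMR": (142 - 43, 43),
    "PMR-Egger": (34, 13),
    "MRAID": (6, 7),
    "MR-MtRobin": (21, 16),
}

# Percentage of each method's genes that lie on chromosome 6.
chr6_percent = {
    method: round(100 * c6 / (c6 + other))
    for method, (c6, other) in genes.items()
}

print(non_chr6_total, chr6_total)
print(chr6_percent)
```

The union sizes agree with the 23 non-chromosome 6 and 43 chromosome 6 genes reported for the three probabilistic methods combined, and the rounded percentages agree with those quoted in the text.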
Due to the high levels of linkage disequilibrium (LD) at the MHC locus on chromosome 6, this region is commonly excluded from Mendelian randomization (MR) analyses. For instance, the SMR study (Zhu et al., 2016) excluded this locus due to LD. The elevated LD levels in this region are likely to result in violations of the standard MR assumptions. Consequently, we anticipate that a typical MR method would exhibit a higher false positive rate in identifying causal genes on chromosome 6 compared to non-chromosome 6 regions.
In contrast, the MRAID method effectively eliminated most chromosome 6 genes, proving valuable in reducing false positives. Conversely, the other MR methods identified a significant number of chromosome 6 genes, suggesting that these methods potentially have higher false positive rates when applied to chromosome 6. This finding underscores the challenges posed by LD in this region. However, beyond chromosome 6, LD levels are lower, indicating that different MR methods likely have comparable false positive rates. Nonetheless, false negative rates may vary, highlighting the complementary nature of the three probabilistic MR approaches (MRAID, PMR-Egger, and MR-MtRobin) utilized in this study. Thus, the combination of these methods provides a comprehensive assessment of causal gene discovery in both chromosome 6 and non-chromosome 6 regions, taking into account the varying levels of LD and the potential for false positives and false negatives.
Multivariable Mendelian Randomization (MVMR) analysis to disentangle causal effects at the FAM167A-BLK locus
The analysis using three single-exposure-variable Mendelian Randomization (MR) methods, namely PMR-Egger, MRAID, and MR-MtRobin, has identified specific genes from the FAM167A-BLK locus as potential causal factors for SLE. The neighboring genes BLK and FAM167A have been found to have causal effects in opposite directions (Figures 6, 10 and Supplementary Table S1). This finding aligns with previous research indicating that reduced expression of BLK and elevated expression of FAM167A are associated with an increased risk of SLE (Hom et al., 2008; Saint Just Ribeiro et al., 2022).
In addition to BLK and FAM167A, other genes within the FAM167A-BLK locus, namely RP11–148O21.2, RP11–148O21.4, and RP11–148O21.6, have also been identified as potential causal factors for SLE by one or more of the single-variable MR methods (see Table 1). Given the high linkage disequilibrium at this locus, it is important to investigate whether the causal effects of these five genes are independent of each other. To this end, we employed a Multivariable Mendelian Randomization (MVMR) method (Burgess et al., 2015; Burgess and Thompson, 2015). The single-variable MR methods used in this study model horizontal pleiotropy in an abstract way, without exploring the precise underlying mechanisms driving these effects. In contrast, the MVMR method offers a more explicit modeling of horizontal pleiotropy by considering the pathways from genetic variants to the expression patterns of a subset of genes within a gene set during causal inference. To illustrate this concept, we can reinterpret the MVMR model incorporating the five mentioned genes as a single-variable MR analysis with a specific focus on the BLK gene, while interpreting the expressions of the other four genes as mediators of horizontal pleiotropic effects. For a concise description of the MVMR method, see Materials and Methods.
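The weighted-regression idea behind MVMR can be sketched as follows. This is a minimal illustration in Python, assuming independent instruments (no LD between SNPs), inverse-variance weights taken from the GWAS standard errors, and a regression without intercept in the spirit of Burgess and Thompson (2015). The function names `solve` and `mvmr_ivw` and all toy numbers are hypothetical; the sketch does not reproduce the analysis or the estimates reported in this study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and non-singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def mvmr_ivw(bx, by, se_y):
    """Joint causal effect estimates for K exposures (genes).

    bx   : one row per SNP, each row holding its eQTL effects on the K genes
    by   : GWAS effect of each SNP on the outcome
    se_y : standard error of each GWAS effect
    Minimises sum_j w_j * (by_j - sum_k alpha_k * bx_jk)^2 with w_j = 1/se_y_j^2.
    """
    n, k = len(by), len(bx[0])
    w = [1.0 / s ** 2 for s in se_y]
    xtwx = [[sum(w[j] * bx[j][a] * bx[j][b] for j in range(n)) for b in range(k)]
            for a in range(k)]
    xtwy = [sum(w[j] * bx[j][a] * by[j] for j in range(n)) for a in range(k)]
    return solve(xtwx, xtwy)

# Toy, noise-free data for two genes with opposite causal effects (hypothetical).
alpha_true = [-0.5, 0.5]
bx = [[0.3, 0.1], [0.1, 0.4], [0.2, -0.2], [-0.1, 0.3]]
by = [sum(a * x for a, x in zip(alpha_true, row)) for row in bx]
se_y = [0.05, 0.04, 0.06, 0.05]

alpha_hat = mvmr_ivw(bx, by, se_y)
print(alpha_hat)
```

Because each gene's coefficient is estimated conditional on the other genes in the model, an effect that is only a by-product of shared instruments with a neighboring gene tends to shrink toward zero in the joint fit.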
A joint MVMR analysis of the five genes at the FAM167A-BLK locus revealed that the causal effect sizes of two genes were statistically significant. Specifically, the BLK gene exhibited a significant effect (causal effect size = −0.635, p-value = 0.004), suggesting its direct causal relationship with SLE. Similarly, the RP11–148O21.2 gene also demonstrated a significant effect (causal effect size = 0.637, p-value = 0.002). On the other hand, the causal effects of the remaining three genes did not reach statistical significance, indicating that they may not play a significant role in the development of SLE. However, it is crucial to note that this conclusion hinges on the validity of the underlying assumptions in the MVMR analysis. As such, we cannot assert with absolute certainty that FAM167A, RP11–148O21.4, and RP11–148O21.6 are not important in the etiology of SLE.
HEIDI test of horizontal pleiotropy
In this study, we employed three probabilistic MR methods (PMR-Egger, MRAID, and MR-MtRobin) that explicitly model horizontal pleiotropic effects. We conducted a comparison between the results obtained using these probabilistic MR methods and the findings generated by HEIDI test on the set of 142 genes identified using the SMR method. HEIDI (HEterogeneity In Dependent Instruments) test is a statistical method used in Mendelian randomization (MR) analysis to detect horizontal pleiotropy that may arise due to the linkage disequilibrium between single-nucleotide polymorphisms (SNPs) (Zhu et al., 2016).
A challenge in MR analysis using expression quantitative trait loci (eQTLs) as exposure variables is the limited availability of independent cis-eQTL single-nucleotide polymorphisms (SNPs) for most genes in the genome. To overcome this challenge and increase the number of instrumental variables (SNPs), we included correlated SNPs in our analysis. However, we ensured that the correlation between these SNPs (IVs) remained below a specific threshold (a linkage disequilibrium (LD) r-squared value of less than 0.9). These correlated SNPs were considered when applying HEIDI test.
It is important to note that the correlation among instrumental variables can increase the likelihood of false positive results in the HEIDI test, potentially leading to incorrect indications of horizontal pleiotropy. Consequently, HEIDI test may erroneously reject causal genes identified by the SMR algorithm as non-causal (i.e., false negatives in gene discovery). To address this concern, we chose not to rely on HEIDI test in this study. Instead, we employed the three aforementioned probabilistic MR methods, which explicitly account for horizontal pleiotropy, to filter the candidate causal genes identified by the SMR method.
Nevertheless, for completeness, we report the results of HEIDI test (see p_HEIDI p-values in Supplementary Table S1). Among the set of 142 genes identified by the SMR algorithm, the HEIDI test detected heterogeneity in 115 genes at a significance level of p_HEIDI < 0.05. Out of the remaining 27 genes that passed HEIDI test (p_HEIDI >= 0.05), seven genes are located on chromosome 6, while the remaining 20 are located on other chromosomes. Interestingly, four (HIST1H2BK, HIST1H4K, APOM and DEF6) out of the seven genes from chromosome 6 that passed HEIDI test were among the 43 chromosome 6 genes identified in our study. Similarly, ten (RP11–128A6.2, AF131215.9, AF131215.2, FAM167A, BLK, PHRF1, TMEM80, RP11–542M13.2, GSDMB and UBE2L3) out of the 20 non-chromosome 6 genes that passed HEIDI test were among the 23 non-chromosome 6 genes identified in our study. It is worth considering that HEIDI test, with its stringent p-value cutoff of p_HEIDI >= 0.05, may be overly conservative for gene discovery and could result in the exclusion of many potentially causal genes (i.e., false negatives).
Relation to other Mendelian randomization studies on autoimmune diseases
Previous work on Mendelian Randomization (MR) methods in understanding autoimmune and inflammatory diseases encompasses various studies. These include investigations into the association of atopic dermatitis with autoimmune diseases using a bidirectional and multivariable two-sample MR (Zhou et al., 2023), the causal association between atopic eczema and inflammatory bowel disease using a two-sample bidirectional MR (Wang et al., 2023), and the causal associations between Vitamin D levels and psoriasis, atopic dermatitis, and vitiligo using a bidirectional two-sample MR (Ren et al., 2022). Additionally, studies have explored the causal relation between telomere length and the development of SLE using MR methods (Wang et al., 2022), the causal relationship between vitamin D levels and the risk of juvenile idiopathic arthritis (Clarke et al., 2023), and the potential therapeutic targeting of TYK2 for autoimmune diseases (Yuan et al., 2023).
Of particular relevance to our study are two MR analyses. One study (Yazar et al., 2022) employed MR analysis using single-cell eQTL data from peripheral blood mononuclear cells (PBMCs) collected from 982 donors and identified cell type-specific causal genes in seven autoimmune diseases. The study identified 19 candidate causal genes for SLE, 16 of which were from chromosome 6 and 3 were non-chromosome 6 genes. After applying the HEIDI test filter (p_HEIDI >= 0.05), 8 significant causal genes remained (5 from chromosome 6 and 3 from other chromosomes). All three non-chromosome 6 genes identified in (Yazar et al., 2022) (BLK, FAM167A, and UBE2L3) were also identified as causal in our study using four combined methods (SMR and one or more of PMR-Egger, MRAID and MR-MtRobin). Furthermore, 13 of the 19 genes identified in (Yazar et al., 2022) were found in our SMR analysis. Notably, the study (Yazar et al., 2022) revealed cell type-specific effects, such as the causal effect of BLK being restricted to immature naïve B cells, memory B cells, and CD4+ T cells, while the causal effect of FAM167A was restricted to memory B cells. The causal effect of UBE2L3 was found in CD4+ and CD8+ T cells, as well as in mature natural killer (NK) cells. Additionally, the causal effect of C6orf48 was observed in immature naïve B cells, CD4+ T cells, CD4+ central memory T cells, and CD8+ T cells. The causal effect of BTN3A2 was restricted to the same cell types as C6orf48, with additional effects seen in memory B cells, CD8+ effector T cells, and NK cells.
The second relevant study (Mo et al., 2020) utilized the SMR Mendelian randomization method and HEIDI filtering (p_HEIDI > 0.05) on three whole-blood eQTL datasets from European individuals [Westra data (n = 5311) (Westra et al., 2013), CAGE data (n = 2765) (Lloyd-Jones et al., 2017) and GTEx data (GTEx Consortium et al., 2017)] to identify causal genes in SLE. Their analysis identified 21 genes, with 12 from chromosome 6 and 9 from outside of chromosome 6. The majority of these genes showed statistically significant causal effects in only one of the three eQTL datasets. However, four non-chromosome 6 genes (FAM167A, BLK, IRF7, and UBE2L3) and four chromosome 6 genes (HCP5, C6orf48, C4A, and RNF5) identified in (Mo et al., 2020) were also identified as causal in our study using four combined methods. When considering only our SMR results, we found overlap with the same four non-chromosome 6 genes and ten chromosome 6 genes identified in (Mo et al., 2020).
In summary, our study confirms and extends the findings of previous Mendelian Randomization (MR) studies, providing further support for the role of specific genes in autoimmune diseases. We observed overlap in the identification of causal genes, particularly those showing significant effects across multiple datasets. Moreover, our research emphasizes the significance of considering cell type-specific effects, offering insights into the involvement of different immune cell populations in autoimmune diseases.
Our investigation utilized a powerful whole blood expression quantitative trait loci (eQTL) dataset comprising data from almost 32,000 individuals. This dataset provided valuable insights into the causal genes associated with autoimmune and inflammatory diseases. However, given the mixed nature of whole blood, which encompasses various cell types, it remains challenging to discern cell type-specific effects using this dataset alone. Unfortunately, cell type-specific datasets of comparable sample size are not currently available. Although a previous study we discussed employed single-cell eQTL data, the sample size was significantly smaller, around 1,000 individuals, limiting the comprehensive exploration of cell type-specific effects. To gain a more comprehensive understanding of the specific roles played by different cell types in autoimmune and inflammatory diseases, future investigations should incorporate larger single-cell eQTL datasets.
Disease relevance of candidate causal genes identified in this study
By utilizing statistical MR methods, we have identified 23 genes outside of chromosome 6 whose expression may play a causative role in the development of SLE (see Table 1 and Figure 13A). Next, we collected extensive literature evidence to strengthen their association with the development of this disease (see Appendix A). Based on literature evidence, the genes identified as potentially causal for SLE in this study are implicated in diverse pathways and mechanisms (see Figure 14). Misexpression of these genes contributes to dysregulated type I interferon and IL-12/23 signaling, dysregulation of antibody class switch recombination, dysregulation of B-cell signaling and function, breakdown of self-tolerance, NF-kB hyperactivity, immune dysregulation, inflammatory response, tissue damage, oxidative stress, Epstein-Barr virus infection, and dysregulation in lymphocyte development.
FIGURE 14.

Tentative SLE disease mechanisms for genes identified in this study. This figure illustrates potential disease mechanisms associated with systemic lupus erythematosus (SLE) genes. The light-blue shapes represent 21 of the 23 non-chromosome 6 genes identified as plausible causes in this study. Depending on the context within the figure, these shapes can also represent gene products, such as proteins and mRNAs. The unfilled ovals represent the SLE genes IKZF3 and IRF8, which were not identified in this study. The number of asterisks next to each gene shape indicates the overall level of confidence in the evidence supporting its disease mechanism, ranging from 1 star (*) for the least confident evidence to 4 stars (****) for the most confident evidence. For IRF8, there are two pathways associated with it, and the number of asterisks shown above each pathway represents the confidence level in the evidence supporting that particular pathway.
MATERIALS AND METHODS
SLE GWAS summary statistics data preprocessing
The European population SLE GWAS summary statistics data from (Bentham et al., 2015) (5201 cases and 9066 controls) was downloaded from the European Bioinformatics Institute (ebi.ac.uk) under the study ID GCST003156. The SNPs with effect size standard error (SE) identically equal to zero were removed from the GWAS summary statistics data. The reference genotype data from European individuals was obtained from the International Genome Sample Resource (IGSR, https://www.internationalgenome.org), formerly the 1000 Genomes Project (1KG). The 1KG European genotypes were filtered using plink2 (C. C. Chang et al., 2015) with parameters hwe = 1e-6 (Hardy-Weinberg equilibrium test p-value cutoff) and maf = 0.001 (minor-allele frequency cutoff). The effect sizes were sign-flipped (β → −β) where necessary so that they are relative to the minor allele of each SNP, based on minor allele frequency in the 1KG European population. The quality control (QC) of the SLE GWAS summary statistics data was done using the DENTIST (Detecting Errors iN analyses of summary staTISTics) algorithm. DENTIST is a GWAS summary statistics quality control method that leverages LD among genetic variants to detect and eliminate errors in GWAS or LD reference and heterogeneity between the two (Chen et al., 2021). DENTIST was run with default parameter settings. DENTIST removed around 5% of SNPs from the SLE GWAS summary statistics data. We name the resulting summary statistics data ‘Bentham-SLE-GWAS’. We found the quality control of GWAS summary statistics by the DENTIST method to be an essential data preprocessing step. Without it, the probabilistic MR algorithm MRAID (which is described below) runs into numerical errors such as very large (>1e+10) estimates of some model parameters.
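The first two preprocessing steps (removal of SNPs with zero standard error and re-expression of effect sizes relative to the minor allele) can be sketched in Python as follows. The record layout, the function name `preprocess_gwas`, and the decision to drop SNPs whose alleles cannot be matched to the reference panel are assumptions made for this illustration; the plink2 filtering and the DENTIST step are not shown.

```python
def preprocess_gwas(rows, minor_allele):
    """Drop SNPs with SE == 0 and express effects relative to the minor allele.

    rows         : dicts with keys 'snp', 'effect_allele', 'other_allele',
                   'beta', 'se' (hypothetical layout)
    minor_allele : SNP id -> minor allele in the reference panel
    """
    cleaned = []
    for row in rows:
        if row["se"] == 0:
            continue  # zero SE makes z statistics and weights undefined
        minor = minor_allele.get(row["snp"])
        if minor is None:
            continue  # assumption: SNPs absent from the reference are dropped
        out = dict(row)
        if minor == row["other_allele"]:
            # Effect was reported for the major allele: swap alleles, flip sign.
            out["effect_allele"] = row["other_allele"]
            out["other_allele"] = row["effect_allele"]
            out["beta"] = -row["beta"]
        elif minor != row["effect_allele"]:
            continue  # assumption: allele mismatch with the reference, dropped
        cleaned.append(out)
    return cleaned

# Toy records (hypothetical, not taken from the study data).
gwas = [
    {"snp": "rs1", "effect_allele": "A", "other_allele": "G", "beta": 0.2, "se": 0.05},
    {"snp": "rs2", "effect_allele": "C", "other_allele": "T", "beta": -0.1, "se": 0.04},
    {"snp": "rs3", "effect_allele": "A", "other_allele": "C", "beta": 0.3, "se": 0.0},
]
reference_minor = {"rs1": "A", "rs2": "T", "rs3": "A"}

harmonized = preprocess_gwas(gwas, reference_minor)
for record in harmonized:
    print(record)
```

Applying the same convention to both the GWAS and the eQTL data ensures that the two effect sizes entering an MR estimate refer to the same allele.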
Preprocessing of the eQTL data
The European whole-blood eQTL summary statistics data from the eQTLGen study (Võsa et al., 2021) (sample size = 31,684) was downloaded from the eQTLGen Consortium website (https://eqtlgen.org/). The eQTL effect sizes were sign-flipped (β → −β) where necessary so that they are relative to the minor allele of each SNP, based on minor allele frequency in the 1KG European population. The eQTLGen summary statistics data file was reformatted into file formats suitable for SMR, PMR-Egger and MRAID algorithms using custom Perl and R scripts. The European eQTL summary statistics data from 15 immune cell types (sample size = 90) was downloaded from the DICE (Database of Immune Cell Expression, Expression quantitative trait loci (eQTLs) and Epigenomics) project website (https://dice-database.org/) (Schmiedel et al., 2018). The 15 immune cell types in the DICE dataset are: naïve B cell, classical monocytes, non-classical monocytes, CD56dim CD16+ NK cells, various CD4 T cell types (Tfh, Th1, Th17, Th1/17, Th2, memory Treg, naïve Treg, naïve CD4 T cell, activated naïve CD4 T cell), naïve CD8 T cells and activated naïve CD8 T cells. The processed GEUVADIS LCL eQTL summary statistics data (Lappalainen et al., 2013) was obtained from (Kerimov et al., 2021) (European sample size = 445). The DICE and LCL eQTL summary statistics data files were reformatted into file formats suitable for the MR-MtRobin algorithm using custom Perl and R scripts.
Description of SMR method
If the Instrumental Variable (IV) assumptions hold, the classical MR method can unbiasedly estimate causal effect α (Figure 1) (Didelez and Sheehan, 2007; Smith and Ebrahim, 2003). A two-stage least-squares regression procedure then yields a statistical estimate of the causal effect as the ratio of the GWAS and eQTL marginal effect sizes of the SNP. For single-SNP MR analysis, we used the SMR (summary data-based MR) method, which was specifically developed for causal gene inference from eQTL and GWAS summary statistics data (Zhu et al., 2016). In the SMR method, the p-value of the causal effect is computed using an approximate chi-squared test statistic $T_{SMR} = z_{zx}^2 z_{zy}^2 / (z_{zx}^2 + z_{zy}^2)$, where $z_{zx}$ and $z_{zy}$ are the z statistics from the eQTL and GWAS study, respectively.
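The single-SNP calculation can be sketched in Python as follows, assuming the test statistic takes the form given by Zhu et al. (2016), $T_{SMR} = z_{zx}^2 z_{zy}^2 / (z_{zx}^2 + z_{zy}^2)$, with one degree of freedom, and using a delta-method approximation for the standard error of the ratio estimate. The function name `smr_test` and the input values are hypothetical; this is not the SMR software itself.

```python
import math

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Ratio estimate of the causal effect with an approximate chi-squared test.

    b_zx, se_zx : eQTL effect of the SNP on gene expression and its SE
    b_zy, se_zy : GWAS effect of the SNP on the trait and its SE
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # ratio (Wald) estimate of the causal effect
    # Delta-method variance of the ratio of two independent estimates.
    var_xy = se_zy ** 2 / b_zx ** 2 + (b_zy ** 2) * (se_zx ** 2) / b_zx ** 4
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, math.sqrt(var_xy), t_smr, p_value

# Hypothetical inputs: a strong eQTL (z = 10) and a moderate GWAS signal (z = 4).
b_xy, se_xy, t_smr, p_value = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.2, se_zy=0.05)
print(b_xy, se_xy, t_smr, p_value)
```

Under these approximations the statistic equals the squared ratio estimate divided by its delta-method variance, so a weak eQTL (small $z_{zx}$) caps the attainable significance regardless of the GWAS signal.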
SMR analysis
From the eQTLGen whole-blood eQTL data, we removed SNP-gene associations with the nominal p-value > 5e-8. From the retained data, we kept only rows with genes whose transcription start sites (TSS) are located within 500kb of any GWAS significant SNP (p-value < 5e-8) from the Bentham-SLE-GWAS summary data. The TSS genomic coordinates information was obtained from the Ensembl database (www.ensembl.org, version GRCh37-p13). The classical MR method, as implemented in a two-sample MR method SMR (Zhu et al., 2016), was applied to the European whole-blood eQTL summary statistics data from the eQTLGen study (Võsa et al., 2021) (sample size = 31,684), and the European population SLE GWAS summary statistics data from (Bentham et al., 2015) (5201 cases and 9066 controls).
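The candidate selection described above can be sketched as a two-step filter. In this Python illustration the record layouts, the function name `select_candidates`, and the toy coordinates are hypothetical; the thresholds (5e-8 and 500 kb) are those stated in the text.

```python
P_CUTOFF = 5e-8      # significance threshold for both eQTL and GWAS associations
WINDOW_BP = 500_000  # maximum TSS-to-GWAS-hit distance

def select_candidates(eqtl_rows, gwas_rows, tss):
    """Keep significant SNP-gene associations for genes near a GWAS hit.

    eqtl_rows : dicts with keys 'snp', 'gene', 'p'
    gwas_rows : dicts with keys 'chr', 'pos', 'p'
    tss       : gene -> (chromosome, TSS position)
    """
    # Positions of genome-wide significant GWAS SNPs, grouped by chromosome.
    hits = {}
    for g in gwas_rows:
        if g["p"] < P_CUTOFF:
            hits.setdefault(g["chr"], []).append(g["pos"])

    def near_hit(gene):
        chrom, pos = tss[gene]
        return any(abs(pos - h) <= WINDOW_BP for h in hits.get(chrom, []))

    return [
        e for e in eqtl_rows
        if e["p"] <= P_CUTOFF and e["gene"] in tss and near_hit(e["gene"])
    ]

# Toy inputs (hypothetical gene names and coordinates).
tss = {"G1": ("8", 11_350_000), "G2": ("8", 13_000_000), "G3": ("1", 100)}
gwas = [
    {"chr": "8", "pos": 11_400_000, "p": 1e-12},
    {"chr": "1", "pos": 200, "p": 1e-3},
]
eqtl = [
    {"snp": "rsA", "gene": "G1", "p": 1e-20},
    {"snp": "rsB", "gene": "G1", "p": 1e-5},
    {"snp": "rsC", "gene": "G2", "p": 1e-30},
    {"snp": "rsD", "gene": "G3", "p": 1e-30},
]

candidates = select_candidates(eqtl, gwas, tss)
print(candidates)
```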
Description of PMR-Egger statistical model
PMR-Egger (Yuan et al., 2020) is a probabilistic Mendelian randomization (MR) method for performing two-sample MR analysis with correlated SNP instruments in the presence of uncorrelated horizontal pleiotropy which violates one of the IV assumptions (see Figure 1). PMR-Egger examines one gene at a time and estimates causal effect α of gene expression (exposure X) on the trait (outcome Y) of interest (Figure 3A). PMR-Egger is a mixed-effects statistical model containing both fixed effects (α and γ) and random effects (βj) (see Figure 3B for a concise description of the model). The random variables βj, which represent the effect sizes of SNPs j on the exposure variable X, are assumed to be independent and follow the same normal distribution (Figure 3A). In order to avoid the problem of overfitting the data, the PMR-Egger method makes a simplifying assumption that horizontal pleiotropic effect sizes of all instrumental SNPs are equal to a single unknown parameter γ. The marginal effect size estimates $\hat{\beta}_X$ and $\hat{\beta}_Y$ from the exposure (eQTL) and outcome (GWAS) studies, and SNP genotype correlation matrix R (a measure of SNP-SNP linkage-disequilibrium levels) are used as the input data to perform a maximum likelihood inference of model parameters. In the eQTL summary statistics data, the marginal effect size of a SNP on the gene expression was estimated using a univariate linear regression analysis. The marginal effect size of a SNP consists of the ‘functional’ effect size of the SNP and LD-weighted contributions from the tagged SNPs which are in linkage disequilibrium (LD) with the SNP. This intuitively explains the vectorial equation $\hat{\beta}_X = R\beta$, which reads in the component form as $\hat{\beta}_{X,j} = \sum_k R_{jk}\beta_k$. When SNPs are in linkage equilibrium, the genotype correlation matrix R becomes the identity matrix, and the equations from Figure 3B simplify.
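The relationship between ‘functional’ and marginal effect sizes can be illustrated with a small numerical example. The Python sketch below assumes the noise-free relationship in which the vector of marginal effects equals the genotype correlation matrix R applied to the vector of functional effects; the matrix, the effect sizes, and the function name `marginal_effects` are hypothetical.

```python
def marginal_effects(R, beta):
    """Marginal effect of SNP j = sum over k of R[j][k] * beta[k]."""
    n = len(beta)
    return [sum(R[j][k] * beta[k] for k in range(n)) for j in range(n)]

# SNP 2 has no functional effect but is in LD (r = 0.5) with SNP 1.
R = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
beta = [0.4, 0.0, -0.2]  # functional effect sizes

marginal = marginal_effects(R, beta)
print(marginal)

# Under linkage equilibrium R is the identity and marginal == functional.
identity = [[1.0 if j == k else 0.0 for k in range(3)] for j in range(3)]
print(marginal_effects(identity, beta))
```

In this toy example the second SNP acquires a non-zero marginal effect purely through LD with the first SNP, which is why correlated instruments have to be modelled jointly with R rather than treated as independent evidence.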
PMR-Egger analysis
For each gene g from the set of 142 genes identified using SMR method, we considered the set Sg of all its significant eQTLs (PeQTL < 5e-8) located within 500kb of the gene’s topmost significant eQTL and with linkage disequilibrium r2 < 0.9 in order to avoid inclusion of highly correlated (and hence uninformative) SNPs in the analysis. For the analyses using PMR-Egger and MRAID, we selected 97 genes whose set Sg contains at least 25 SNPs in order to be able to reliably estimate the parameters of the mixed-effects linear model from the eQTL and GWAS summary statistics data for the SNPs in the set Sg. The cutoff of 25 SNPs can be heuristically motivated as follows: PMR-Egger and MRAID models contain many parameters, including the causal effect size alpha. Bayesian averaging was applied over all parameters except alpha, simplifying the models to focus on estimating alpha. This, combined with the ‘one in ten rule’ (Peduzzi et al., 1996) from the logistic regression method, which indicates the need for at least 10 data points to reliably estimate one model parameter, and considering that the number of SNPs in the model corresponds to the number of rows (i.e., data points) in the matrix equations of PMR-Egger and MRAID statistical models shown in Figures 3B and 5C, led us to conservatively select a lower bound of 25 SNPs. We performed PMR-Egger analysis on each of these 97 genes using the eQTL and SLE GWAS summary statistics data, and the LD data restricted to SNPs from the set Sg.
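The construction of the instrument set Sg can be sketched as follows. The text does not specify how the r2 < 0.9 constraint was enforced, so this Python illustration assumes a greedy procedure that visits SNPs in order of increasing p-value and keeps a SNP only if its r2 with every SNP already kept is below the threshold. The function names, the record layout, and the toy LD values are hypothetical.

```python
MIN_SNPS = 25  # minimum size of Sg required for PMR-Egger and MRAID

def select_instruments(eqtls, r2, window_bp=500_000, p_cutoff=5e-8, r2_max=0.9):
    """Build the instrument set for one gene.

    eqtls : dicts with keys 'snp', 'pos', 'p' for a single gene
    r2    : function (snp_a, snp_b) -> LD r-squared between two SNPs
    """
    significant = sorted(
        (e for e in eqtls if e["p"] < p_cutoff), key=lambda e: e["p"]
    )
    if not significant:
        return []
    top = significant[0]  # the gene's most significant eQTL
    kept = []
    for e in significant:
        if abs(e["pos"] - top["pos"]) > window_bp:
            continue  # outside the 500 kb window around the top eQTL
        # Assumed greedy pruning: compare against SNPs already kept.
        if all(r2(e["snp"], k["snp"]) < r2_max for k in kept):
            kept.append(e)
    return kept

def enough_instruments(kept):
    return len(kept) >= MIN_SNPS

# Toy example (hypothetical SNPs and LD values).
ld = {frozenset(("rs1", "rs2")): 0.95, frozenset(("rs1", "rs3")): 0.30}

def toy_r2(a, b):
    return ld.get(frozenset((a, b)), 0.0)

eqtls = [
    {"snp": "rs1", "pos": 1_000, "p": 1e-30},
    {"snp": "rs2", "pos": 2_000, "p": 1e-20},
    {"snp": "rs3", "pos": 3_000, "p": 1e-10},
    {"snp": "rs4", "pos": 900_000, "p": 1e-12},
    {"snp": "rs5", "pos": 4_000, "p": 1e-3},
]

s_g = select_instruments(eqtls, toy_r2)
print([e["snp"] for e in s_g], enough_instruments(s_g))
```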
Description of MRAID statistical model
MRAID (MR with Automated Instrument Determination) is a probabilistic MR method for causal inference with correlated SNP instruments in the presence of IV assumptions-violating correlated and uncorrelated horizontal pleiotropic effects (Yuan et al., 2022). For a concise description of the MRAID model, see Figure 5. MRAID is a mixed-effects statistical model containing fixed effect (α and ρ) and random effect (β, ηu, ηc) variables. The random variables in the model are assumed to follow mixture probability distributions shown in Figures 5A,B. The use of random effect variables in the model can be motivated as follows. MRAID takes marginal effect sizes for a set of SNPs from exposure X and outcome Y summary statistics data as input and estimates various parameters in the model. For p SNPs, 3p parameters would have been required to parametrize the model if these were fixed-effect variables. In MRAID, these parameters are treated as independent random variables drawn from mixture distributions parametrized by a small number of hyper-parameters (πβ, σβ, etc.), thus circumventing the problem of overfitting the input data. Using an explicit formula for the posterior likelihood, Gibbs sampling can be performed to estimate parameters characterizing various distributions in the MRAID model (Yuan et al., 2022).
MRAID analysis
For the analysis using MRAID, we selected the same 97 genes as in the PMR-Egger analysis. We performed MRAID analysis on each of these 97 genes using the eQTL and SLE GWAS summary statistics data, and the LD data restricted to SNPs from the set Sg. The MRAID algorithm was run with default settings for all parameters except Gibbsnumber (the number of Gibbs sampling iterations), which was set to 1e6.
Description of the MR-MtRobin statistical model
The MR-MtRobin (Multi-tissue TWMR method ROBust to Invalid IV) method takes as input summary-level GWAS and multi-cell type eQTL statistics, and performs transcriptome-wide MR (TWMR) inference in the presence of invalid IVs (Gleason et al., 2021). It leverages multi-cell type eQTL data in a mixed-effects statistical model, which makes the SNP-specific random effects due to pleiotropy identifiable from the standard errors of the eQTL summary statistics, and provides inference of the causal effect of gene expression on the outcome trait.
If the instrumental variable (IV) assumptions hold, an unbiased MR estimate of the causal effect size is given by $\alpha = \beta_Y / \beta_X$, where $\beta_X$ and $\beta_Y$ denote the effect sizes of the SNP on the exposure (gene expression) and on the outcome, respectively (see Figure 1). Equivalently, $\beta_X = \theta \beta_Y$, where θ ≡ 1/α. If the IV assumptions are violated due to horizontal pleiotropy, the equation will include bias terms: $\beta_{X,j} = (\theta + \theta_j)\,\beta_{Y,j}$ for SNP j. For a concise description of the MR-MtRobin mixed-effects linear model, see Figure 7B. In the linear relationship between cell type-specific eQTL effect sizes $\hat{\beta}_{X,jm}$ and GWAS effect sizes $\hat{\beta}_{Y,j}$, namely $\hat{\beta}_{X,jm} = (\theta + \theta_j)\,\hat{\beta}_{Y,j} + \varepsilon_{jm}$, θ is a fixed effect and θj are SNP-specific random effects. The noise term εjm in the equation is cell type specific and depends on the structure of SNP-SNP genotype correlations. Specifically, for the cell type m, the vector εm follows a multivariate normal distribution with mean zero and the covariance matrix whose elements are the products of eQTL standard errors and SNP genotype correlation matrix elements.
The MR-MtRobin method is based on a generalized InSIDE (G-InSIDE) assumption, which is a more general version of the InSIDE assumption used in an earlier MR method called MR-Egger (Bowden et al., 2015). MR-Egger provides consistent causal effect estimates when the Instrument Strength Independent of Direct Effect (InSIDE) assumption holds. The InSIDE assumption is met when there is no correlation between the direct effects of the pleiotropic Instrumental Variable on the Outcome and its effects on the Exposure variable (represented by ηu and G → X in Figure 5A, respectively). The G-InSIDE assumption used in MR-MtRobin is a more complex version of InSIDE (Gleason et al., 2021).
MR-MtRobin analysis
The SNP Instrumental Variables (IVs) were selected using the ‘select_IV’ function with the following values of the parameters: nTiss_thresh = 2 (minimum number of cell types in which a candidate IV must have eQTL p-value < 0.05) and ld_thresh = 0.5 (pairwise LD threshold r2 < 0.5). The main MR-MtRobin algorithm, MR_MtRobin, was run with the parameter pval_thresh = 0.05 (p-value threshold for Instrumental Variables). The p-values of gene expression causal effects were estimated using the ‘MR_MtRobin_resample’ function with the parameter nsamp = 1e6 (number of resamples used in estimating the causal effect p-value).
The MR-MtRobin gene causal effect sizes shown in Figure 10 were calculated as follows. The MR_MtRobin algorithm does not explicitly return the gene causal effect sizes. However, it returns lme_res, an R object produced by the linear mixed-effects modeling package lme4 (https://CRAN.R-project.org/package=lme4). From lme_res, the fixed effect θ (see Figure 7) was extracted. The causal effect size was then computed as α = 1/θ.
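The relationship α = 1/θ can be illustrated with a deliberately simplified sketch. The code below is not MR-MtRobin: it drops the SNP-specific random effects θj and the SNP-SNP correlation structure, and fits only the fixed slope θ by weighted least squares through the origin, regressing eQTL effect sizes on GWAS effect sizes. The function names, the inverse-variance weights 1/se², and the toy numbers are assumptions made for illustration.

```python
def theta_fixed_effect(beta_gwas, beta_eqtl, se_eqtl):
    """Weighted least-squares slope through the origin.

    Regresses eQTL effect sizes (response) on GWAS effect sizes (predictor),
    weighting each observation by 1 / se_eqtl**2 (assumed weighting).
    Random effects are ignored, so this is only a fixed-effect stand-in.
    """
    numerator = 0.0
    denominator = 0.0
    for b_y, b_x, se in zip(beta_gwas, beta_eqtl, se_eqtl):
        weight = 1.0 / se ** 2
        numerator += weight * b_x * b_y
        denominator += weight * b_y * b_y
    if denominator == 0.0:
        raise ValueError("all GWAS effect sizes are zero; slope is undefined")
    return numerator / denominator


def causal_effect_from_theta(theta):
    """Convert the fitted slope theta into the causal effect alpha = 1 / theta."""
    if theta == 0.0:
        raise ValueError("theta is zero; alpha = 1/theta is undefined")
    return 1.0 / theta


# Toy data constructed so that beta_eqtl = 2 * beta_gwas, i.e. theta = 2.
toy_beta_gwas = [0.10, 0.20, -0.10]
toy_beta_eqtl = [0.20, 0.40, -0.20]
toy_se_eqtl = [0.05, 0.10, 0.05]

theta_hat = theta_fixed_effect(toy_beta_gwas, toy_beta_eqtl, toy_se_eqtl)
alpha_hat = causal_effect_from_theta(theta_hat)
```

One consequence of the reciprocal form is that α becomes unstable when θ is close to zero, which is one reason the effect sizes obtained this way are best read together with their p-values.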
Description of Multivariable Mendelian Randomization (MVMR) method
Multivariable Mendelian randomization (MVMR) extends the scope of the single-variable Mendelian randomization method by addressing genetic variants that are linked to multiple Exposure variables or risk factors (Burgess et al., 2015; Burgess and Thompson, 2015). In the present context, the method models horizontal pleiotropy more explicitly, by including in the causal inference the pathways from genetic variants to the expression of several genes within a gene set.
The MVMR approach relies on specific assumptions known as extended Instrumental Variable (IV) assumptions. Firstly, it assumes that the genetic variant is associated with one or more of the Exposure variables. Secondly, the genetic variant should not be linked to any confounding factor that may influence the associations between the Exposure variables and the Outcome. Finally, the genetic variant is conditionally independent of the Outcome given the Exposure variables and Confounders.
It is important to note that not every genetic variant needs to be associated with every Exposure variable in the set. However, a variant cannot have associations with the Outcome except through the Exposure variables of interest. These assumptions guide the application of MVMR and ensure the validity of causal inference in the analysis.
The MVMR method, as implemented in the “MendelianRandomization” R package (https://cran.r-project.org/package=MendelianRandomization), leverages generalized multivariable weighted linear regression to analyze correlated genetic variants. This approach enables estimation of causal effects by regressing the associations of genetic variants with the Outcome variable onto the associations of genetic variants with the Exposure variables. The weighted regression is performed with the intercept set to zero, and the weights are determined by the inverse-variances of the associations of genetic variants with the Outcome.
The resulting causal effect estimates represent the direct causal effect of each exposure variable individually, while considering the other exposure variables as fixed. This allows for a comprehensive understanding of the specific causal effects associated with each exposure variable within the context of the others. The MVMR method provides a robust framework for analyzing the relationships between genetic variants, Exposure variables, and the Outcome, offering valuable insights into the direct causal effects in a multivariable setting.
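The regression just described can be sketched for the simplest case of uncorrelated variants. This is an illustrative sketch under stated assumptions, not the “MendelianRandomization” implementation: it omits the generalized (LD-aware) weighting used for correlated variants and the random-effects variance adjustment, and the function names and toy numbers are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: exposure effects are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mvmr_ivw(beta_exposures, beta_outcome, se_outcome):
    """Multivariable IVW estimate for uncorrelated variants (simplifying assumption).

    beta_exposures: one row per variant, one column per Exposure.
    Fits a zero-intercept regression of variant-Outcome associations on
    variant-Exposure associations with weights 1 / se_outcome**2.
    """
    n_exposures = len(beta_exposures[0])
    xtwx = [[0.0] * n_exposures for _ in range(n_exposures)]
    xtwy = [0.0] * n_exposures
    for b_x, b_y, se in zip(beta_exposures, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        for a in range(n_exposures):
            xtwy[a] += weight * b_x[a] * b_y
            for b in range(n_exposures):
                xtwx[a][b] += weight * b_x[a] * b_x[b]
    return solve_linear_system(xtwx, xtwy)


# Toy data generated with direct effects 0.5 (exposure 1) and -0.2 (exposure 2).
toy_beta_exposures = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1], [0.2, -0.1]]
toy_beta_outcome = [0.05, -0.04, 0.03, 0.12]
toy_se_outcome = [0.01, 0.02, 0.01, 0.02]

direct_effects = mvmr_ivw(toy_beta_exposures, toy_beta_outcome, toy_se_outcome)
```

Each returned coefficient is the direct effect of one Exposure with the others held fixed. If two Exposures had proportional variant associations, the normal equations would be singular and the sketch raises an error instead of returning unstable estimates.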
MVMR analysis
The input data for the MVMR analysis was prepared in the following manner. First, for each of the five genes (BLK, FAM167A, RP11–148O21.2, RP11–148O21.4, and RP11–148O21.6), the eQTLGen expression Quantitative Trait Locus (eQTL) SNPs with a significance level of PeQTL < 0.001 were selected. Subsequent analysis was restricted to the shared significant eQTL SNPs across these genes.
To avoid including highly correlated genetic variants in the MVMR analysis, LD (Linkage Disequilibrium) variant pruning was performed using the plink2 algorithm (C. C. Chang et al., 2015) with the parameter “indep-pairwise 50 0.9” (a window size of 50kb and a threshold of r-squared = 0.9) and the input IGSR reference genotype data from European individuals described earlier in Materials and Methods.
From the resulting list of SNPs, GWAS SNPs that passed the DENTIST-filtering with a significance level of PGWAS < 0.001 were selected. This resulted in a final list of 196 SNPs spanning a genomic region of 1.3 megabases (hg19 coordinates chr8:10.5Mb-11.8Mb) at the FAM167A-BLK locus. A matrix of LD correlations between these SNPs was then calculated using the SMR algorithm (Zhu et al., 2016) with the parameters “--make-bld --r --ld-wind 4000”.
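The two p-value filters in this preparation can be sketched as set operations. The sketch is only a schematic of the selection logic: it leaves out the LD pruning with plink2 and the DENTIST filtering, and the function names, gene labels and p-values below are invented for illustration.

```python
def shared_significant_snps(eqtl_pvalues_by_gene, p_max=1e-3):
    """SNPs that are significant eQTLs (p < p_max) for every gene in the set."""
    per_gene = [
        {snp for snp, p in pvalues.items() if p < p_max}
        for pvalues in eqtl_pvalues_by_gene.values()
    ]
    return set.intersection(*per_gene) if per_gene else set()


def filter_by_gwas(snps, gwas_pvalues, p_max=1e-3):
    """Keep SNPs that are present in the GWAS data with p < p_max."""
    return {s for s in snps if s in gwas_pvalues and gwas_pvalues[s] < p_max}


# Toy inputs: two hypothetical genes instead of the five used in the study.
toy_eqtl = {
    "geneA": {"rs1": 1e-8, "rs2": 1e-5, "rs3": 0.2},
    "geneB": {"rs1": 1e-4, "rs2": 1e-6, "rs3": 1e-9},
}
toy_gwas = {"rs1": 1e-6, "rs2": 0.04}

shared = shared_significant_snps(toy_eqtl)     # significant for both genes
final_snps = filter_by_gwas(shared, toy_gwas)  # also significant in the GWAS
```

Requiring a SNP to be significant for every gene keeps the exposure-association matrix complete, at the cost of discarding variants that are informative for only some of the genes.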
Using the SLE GWAS effect sizes and standard errors (Bentham et al., 2015), eQTLGen study eQTL effect sizes and standard errors (Võsa et al., 2021), as well as the LD correlation matrix, the input object for the MVMR analysis was created using the “mr_mvinput” function from the “MendelianRandomization” R package.
Finally, the MVMR analysis was performed using the “mr_mvivw” function (Multivariable inverse-variance weighted method) from the “MendelianRandomization” R package with the following parameters: model = “random”, correl = TRUE, distribution = “normal”. This analysis allowed for the assessment of the causal effects in a multivariable setting, taking into account the correlations between variables.
Data visualization
We used three R (https://cran.r-project.org/) packages for data visualization: (1) ggmanh for the Manhattan plot in Figure 2 (source: https://bioconductor.org/packages/release/bioc/html/ggmanh.html), (2) ggplot2 for the scatter plots in Figures 4,6,8–11 (source: https://CRAN.R-project.org/package=ggplot2), and (3) ggvenn for the Venn diagrams in Figures 12,13 (source: https://CRAN.R-project.org/package=ggvenn).
FIGURE 8.

MR-MtRobin weighted linear regression analysis for the causal effect of BLK gene expression on SLE. For a description of the model, see Figure 7 and Materials and Methods. A scatter plot of SLE GWAS versus cell type-specific eQTL effect sizes of cis-SNPs in the neighborhood of the BLK gene. Each colored circle represents a SNP in a particular cell type (see color to cell type dictionary on the right), with the size of the circle being proportional to the weight $1/\sigma_{jm}^2$, where $\sigma_{jm}$ is the standard error of the estimate for the eQTL effect size of SNP j in cell type m (see Figure 7) – the more accurate the estimate is, the larger the weight is. The weights are largest for LCL eQTLs due to the larger sample size (n = 445) of the LCL eQTL study. The slope of the black line through the origin represents the fixed effect θ of the model (see Figure 7).
DISCUSSION
In this study, a two-step strategy was employed to identify causal genes for systemic lupus erythematosus (SLE). The first step used a classical Mendelian randomization (MR) method, which assumes the absence of horizontal pleiotropic effects, to estimate the causal effect of gene expression on SLE, resulting in the identification of 142 genes, including 43 from outside of chromosome 6. In the second step, advanced probabilistic MR methods that account for horizontal pleiotropy, namely PMR-Egger, MRAID and MR-MtRobin, were applied to the genes identified in the first step to filter out false positives.
Using PMR-Egger, which models uncorrelated horizontal pleiotropy, 13 non-chromosome 6 genes and 34 chromosome 6 genes with statistically significant causal effects were identified. MRAID, which models both correlated and uncorrelated horizontal pleiotropic effects, revealed 7 non-chromosome 6 genes and 6 chromosome 6 genes with statistically significant causal effects. To validate the findings, an independent dataset from different immune cell types was utilized, and the MR-MtRobin method identified 16 non-chromosome 6 genes and 21 chromosome 6 genes with statistically significant causal effects.
Although there were overlaps between the genes identified by the three MR methods, some genes were identified by only one or two methods due to different modelling assumptions and technical factors. These discrepancies highlight the complementary nature of the three MR methods and the importance of understanding their assumptions and limitations. Notably, MRAID showed a lower percentage of causal genes from chromosome 6 compared to other methods, suggesting its ability to reduce false positive causal genes.
A Multivariable Mendelian Randomization (MVMR) method was used to investigate the independence of causal effects of genes at the FAM167A-BLK locus. The joint analysis revealed significant effects for BLK and RP11–148O21.2, while the other three genes at the locus did not reach statistical significance. However, the certainty of these findings depends on the validity of the underlying assumptions of the MVMR method.
Following the identification of causal genes using the MR methods, an extensive review of the literature was conducted to provide additional evidence supporting their association with the development of systemic lupus erythematosus (SLE) (see Appendix A). This literature review aimed to strengthen the understanding of the functional roles and mechanisms by which these genes contribute to the pathogenesis of the disease.
The extensive literature supports the notion that misexpression of genes identified as potentially causal for SLE in this study contributes to dysregulated immune responses, Epstein-Barr virus infection, dysregulated type I interferon, IL-12/23 and B-cell signaling, dysregulation of antibody class switch recombination, breakdown of self-tolerance, NF-kB hyperactivity, dysregulation in lymphocyte development, inflammatory response, tissue damage and oxidative stress (Figure 14). By integrating the findings from the MR methods with the extensive literature evidence, this study aimed to provide a comprehensive understanding of the functional roles and disease relevance of the identified causal genes in the context of SLE. This collective knowledge serves to strengthen the association between these genes and the development of the disease, paving the way for further research and potential therapeutic targets.
Under the assumption of valid instruments, the SMR method initially identified 142 genes as statistically significant. However, the subsequent use of more advanced probabilistic MR methods revealed that many of these genes were not statistically significant. This suggests that the presence of horizontal pleiotropic effects may have led to false positives among the genes identified by SMR. Nevertheless, it would be premature to conclude that the genes identified as ‘not causal’ by the advanced methods are indeed not involved in the development of SLE. We believe that a significant number of the 142 genes are genuinely causal, but demonstrating causality will require more powerful datasets and more advanced MR methods. Drawing an analogy with the legal principle “presumption of innocence until proven guilty,” our approach adopts the “presumption of non-causal until proven causal” in gene discovery. We applied advanced MR methods to filter out potential false positives among the 142 genes identified by the simpler SMR method. Although conclusive proof of causality is lacking for the remaining genes, we remain optimistic that future advances in MR methods and richer data will make it possible to demonstrate their causality.
To increase the statistical power in detecting causal gene expression for SLE, it will be necessary to utilize large sample size eQTL, mQTL (methylation QTL), and other molecular data from diverse immune cell types. Additionally, sophisticated probabilistic MR methods that can integrate molecular data from various cell types, perform multi-variable MR (MVMR) analyses, account for correlated SNPs, and remain robust to the presence of invalid Instrumental Variables will be indispensable.
Our focus on European GWAS and eQTL data stems from the extensive sample size (n = 32k individuals of European ancestry) available in the eQTLGen study. Accurate estimation of parameters in probabilistic MR models relies on data from GWAS and eQTL studies with substantial sample sizes. Unfortunately, eQTL studies with similar sample sizes to eQTLGen are currently lacking for non-European populations. The significance of having data from diverse ethnic populations cannot be overstated, as demonstrated by the value of trans-ethnic study design approaches to boost statistical power in fine-mapping causal genetic variants (Li and Keating, 2014).
It is important to note that most MR methods, including the ones employed in this study, assume continuous trait values in linear models. However, the SLE GWAS summary statistics were calculated using logistic regression for a binary trait in a case-control study. Therefore, treating binary trait values as continuous in MR methods is not entirely justified, and the interpretation of causal effect size estimates should be considered semi-quantitative at best. The development of probabilistic MR methods that can appropriately handle binary traits is crucial, as demonstrated by recent progress (Allman et al., 2021).
Although MR-MtRobin enhances statistical power by utilizing eQTL data from multiple cell types, it adopts a consensus approach where only eQTLs with consistent effects across cell types are utilized. However, the etiology of diseases often involves cell-type specific effects (Hekselman and Yeger-Lotem, 2020). Therefore, the development of advanced MR methods that can model tissue-specific contributions to diseases should incorporate statistical approaches for estimating the causal tissues for complex traits and diseases (Arvanitis et al., 2022; Finucane et al., 2018; Hu et al., 2011; Ongen et al., 2017). By doing so, we can better understand the tissue-specific mechanisms underlying complex traits and diseases, leading to more accurate MR analyses.
Supplementary Material
TABLE S1. Candidate causal genes identified by the SMR method. Cells with p-value < 0.05 in the table are highlighted in light blue color. The table column names are described as follows. Gene: Symbol representing the gene; SNP: ID of the most significant eQTL SNP associated with the gene; Chrom: ID of chromosome where the gene and SNPs are located; Position: Chromosomal position (base-pair) of the top eQTL SNP associated with the gene; p_GWAS: SLE GWAS p-value of the top eQTL SNP; p_eQTL: eQTL p-value of the top eQTL SNP; alpha_SMR: gene causal effect size estimate by SMR method; p_SMR: p-value for the gene causal effect size estimate by SMR method; alpha_PMR: gene causal effect size estimate by PMR-Egger method; p_PMR: p-value for the gene causal effect size estimate by PMR-Egger method; alpha_MRAID: gene causal effect size estimate by MRAID method; p_MRAID: p-value for the gene causal effect size estimate by MRAID method; alpha_MtRobin: gene causal effect size estimate by MR-MtRobin method; p_MtRobin: p-value for the gene causal effect size estimate by MR-MtRobin method. ‘NA’ in the cells: data not available because of technical issues such as ‘algorithm generated errors’ and ‘insufficient number of genetic variants at the locus to reliably estimate causal effect size’; p_HEIDI: p-value for HEIDI test of heterogeneity
ACKNOWLEDGMENTS
Some analyses described in this work were performed on the Owens computing cluster from the Ohio Supercomputer Center. This work was presented in part at the American Society of Human Genetics international meeting, October 25-29, 2022.
FUNDING
This work was supported by a Career Development Award and a K Bridge Award from the Rheumatology Research Foundation to ITWH, and Merit Awards to JBH (I01 BX001834 and I01 BX006254), NIH R01 AI024717, and the Burroughs Wellcome Fund.
Appendix A. Literature evidence for disease relevance of candidate causal genes
We collected extensive literature evidence to strengthen the association of 23 genes outside of chromosome 6 with the development of SLE. The following discussion is organized based on the MR methods utilized to identify these genes. See Figure 14 as a guide to the discussion.
Genes identified by PMR-Egger
IRF5 and TNPO3:
SMR, the single-SNP Mendelian randomization method, identified these two genes as potentially causal, using the SLE risk variant, rs6467223, as an Instrumental Variable (IV) (see Table 1). The IRF5-TNPO3 region harbors at least two independent SLE genetic association signals, one adjacent to the promoter of IRF5 and a more distal signal that spans the TNPO3 gene and includes variants within the promoter (Kottyan et al., 2015). Further, variants in the TNPO3 promoter exhibit allele specific enhancer activity for IRF5 independent of TNPO3 expression (Thynn et al., 2020). A plausible role for IRF5 as a modulator of type I interferon signaling in SLE is clear (Gallucci et al., 2021). While the specific mechanism through which TNPO3 may impact SLE risk is still uncertain, it is worth mentioning that TNPO3 plays a crucial role as a host factor in facilitating the entry of various viruses into the nucleus. The most well studied of these is HIV-1 (Bhargava et al., 2018). Several lines of evidence point to Epstein-Barr virus infection as an etiologic environmental trigger of SLE (Harley et al., 2018; Harley and James, 2006; Laurynenka et al., 2022). Considering the importance of nuclear import for the proper functioning of several proteins encoded by EBV (Li et al., 2021) and the role of TNPO3 in nuclear import, it is plausible that TNPO3 could potentially influence the risk of developing SLE through this mechanism.
XKR6:
The XKR6 gene, located near the FAM167A-BLK locus on chromosome 8, harbors variants whose association with SLE is influenced by the 8p23 polymorphic inversion (Namjou et al., 2014). The frequency of this ~4.5Mbp inversion variant varies significantly among human global ancestral populations (Salm et al., 2012). While the function of XKR6 remains incompletely defined, it is a member of the XKR family of phospholipid scramblases (Kodigepalli et al., 2015). The closest orthologue of known function is Ced-8, a C. elegans gene that regulates the timing of apoptosis (Stanfield and Horvitz, 2000). Two recent studies found that some members of the XKR family promote exposure of phosphatidylserine (PS) in apoptotic cells (Suzuki et al., 2014, 2013). PS exposure is a key signal for phagocytes to clear dying cells. Inappropriate clearance of apoptotic and necrotic cellular debris is thought to be a key immunogenic context by which exposure to self-antigen breaches tolerance in SLE (Munoz et al., 2008). While it may be tempting to speculate that XKR6 plays a similar role in humans, limited investigations into its involvement in this process have not shown a conservation of this function within the XKR family (Suzuki et al., 2014).
RP11–148O21.2, RP11–148O21.4 and RP11–148O21.6:
These three genes, located in the FAM167A-BLK locus, have been identified as causal using two distinct variants, rs2736345 (RP11–148O21.2 & RP11–148O21.4) and rs11250144 (RP11–148O21.6), as Instrumental Variables by the SMR method (see Table 1). While BLK and FAM167A exhibit functional effects that could plausibly impact the etiology of SLE (see below: Genes identified by MRAID), it is noteworthy that the other three genes within the locus are classified as long intergenic non-coding RNA (lincRNA) and their specific functions remain unknown. Identification of lincRNAs as causal for SLE risk at this locus does not ipso facto invalidate dysregulated expression of FAM167A and/or BLK as causal at this locus. This is because lincRNAs can act to regulate expression of nearby genes. A notable example is the association of IRF8 with SLE. At that locus, the SLE risk variant alters expression of the long non-coding RNA AC092723.1. AC092723.1, in turn, recruits TET1 to the IRF8 promoter and regulates its expression. Thus, the nomination by PMR-Egger of RP11–148O21.2, RP11–148O21.4 and RP11–148O21.6 as causal remains consistent with dysregulation of FAM167A and/or BLK ultimately mediating disease risk. Further, RP11–148O21.2, RP11–148O21.4 and RP11–148O21.6 all exhibit increased expression in B cells relative to other immune cells [DICE database, https://dice-database.org/genes, (Schmiedel et al., 2018)]. This finding aligns with the importance of B cells in SLE etiopathogenesis (Parodis et al., 2022). Consistent with the predictions of the PMR-Egger method, these genes have also been described to be co-regulated by SLE- and RA-associated risk variants (Lodde et al., 2020).
PHRF1 and IRF7:
These two genes are adjacent to one another, but the two IV genetic variants used by SMR to identify these genes are different (see Table 1). The SLE relevance of IRF7 function is clear as related to IRF7 impacting the type I interferon signaling pathway (Fu et al., 2011). Importantly, one of the proposed mechanisms of the SLE drug, mycophenolate mofetil, is through IRF7 (Shigesaka et al., 2020). In contrast, the role of PHRF1 is not immediately obvious, since few publications address the role of this gene [Plant HomeoDomain (PHD) and Ring Finger 1]. PHRF1 encodes an E3-ubiquitin ligase similar to another SLE risk gene, TNFAIP3, which encodes A20. Whereas TNFAIP3 modulates NF-kB signaling (Ma and Malynn, 2012), PHRF1 has been shown to modulate Non-homologous end joining (NHEJ) of double stranded DNA breaks (C.-F. Chang et al., 2015). A functional NHEJ pathway is necessary for class switch recombination (CSR) to IgA, IgG and IgE (Xu et al., 2022). Thus, modulation of NHEJ provides a plausible mechanism through which PHRF1 expression may impact SLE risk. Specifically, class switch from IgM to IgG is characteristic of the pathogenic autoantibodies of SLE patients, in whom CSR appears to be dysregulated (Liu et al., 2004).
ORMDL3, GSDMB and RP11–94L15.2:
The same IV genetic variant, rs12936231, was used by SMR to identify these three genes. In addition to association with SLE, this variant has been associated with several SLE related immune traits [GWAS Atlas PheWAS: https://atlas.ctglab.nl/PheWAS (Watanabe et al., 2019)].
Curiously, there are cell-state specific expression quantitative trait loci (eQTL) at this locus that show an association with ORMDL3 expression (Nathan et al., 2022). Plausible immune cell mechanisms for ORMDL3 include the modulation of sphingolipid synthesis leading to skewed CD4+ T cell development (Luthers et al., 2020). Several mechanisms that link GSDMB (Gasdermin B) function to immune phenotypes, which are expected to have an impact on SLE, have been described (Ivanov et al., 2023). However, there is some controversy regarding its role compared to other better studied members of the pore-forming gasdermin family. This is partly attributed to the absence of a GSDMB orthologue in mice, which limits in vivo studies and further complicates the understanding of its specific function (Ruan, 2023). RP11–94L15.2 encodes a long intergenic non-coding RNA (lincRNA) of unknown function. Notably, it is immediately adjacent to/overlapping with IKZF3, which encodes Aiolos, an SLE candidate in its own right and the target (together with Ikaros) of iberdomide, an SLE therapy currently in clinical trials (Merrill et al., 2022).
TYK2:
In addition to its association with SLE, the top eQTL variant, rs11085725, used in the identification of this gene through the SMR method, has also been associated with several SLE related immune traits [GWAS Atlas PheWAS: https://atlas.ctglab.nl/PheWAS (Watanabe et al., 2019)]. Curiously, several variants altering the amino acid sequence of TYK2 are also associated with SLE, RA and IBD with limited effect on non-immune mediated traits (Diogo et al., 2015). The association at TYK2 may be similar to that observed at ITGAM, another SLE associated gene. The ITGAM risk variant that alters the amino acid sequence has also been shown to act as an enhancer for ITGAM mRNA expression (Maiti et al., 2014). Indeed, since exonic transcription factor binding sites (Stergachis et al., 2013) are relatively common, this may represent a more general genetic risk mechanism than one might appreciate. Regardless, Tyk2 is a Janus Kinase immediately downstream of several cytokine receptors. This notably includes IL-12/23 and type I interferon signaling pathways. Recent FDA approval has been granted for deucravacitinib, a selective Tyk2 inhibitor for treatment of psoriasis (Hoy, 2022). Furthermore, a recent phase II clinical trial of this agent in SLE showed increased response rates over placebo and phase III trials are underway (Morand et al., 2023).
UBE2L3:
Along with its association with SLE, the top eQTL variant, rs2070512, utilized in identifying this gene through the SMR method, has also shown associations with multiple immune traits relevant to SLE [GWAS Atlas PheWAS: https://atlas.ctglab.nl/PheWAS (Watanabe et al., 2019)]. Prior work has shown increased mRNA (UBE2L3) and protein (UBCH7) level expression in SLE patients bearing the risk haplotype (Wang et al., 2012). This increase in expression appears to depend on YY1-mediated enhancement of an interaction between the UBE2L3 and adjacent YDJC promoter (Gopalakrishnan et al., 2022). The lead SLE risk variant at this locus has been shown to regulate basal NF-kB in monocytes and B cells. This variant also regulates responsiveness to TNF-family signals: TNF in myeloid cells and CD40 in B cells (M. J. Lewis et al., 2015). Genotype-associated correlates in B cell subsets of SLE patients and controls, and a predilection for expression in plasma cells, have also been reported (M. Lewis et al., 2015). In addition to CD40- and TNF-stimulus induced NF-kB activity, TLR7-stimulus induced NF-kB activity has been attributed to UBE2L3 (Mauro et al., 2023). Furthermore, UBE2L3 is a proposed target of dimethyl fumarate (DMF), a therapy approved for both multiple sclerosis and psoriasis. Several additional proposed mechanisms of DMF include modulation of Nrf2-dependent, Nrf2-independent, Th2, type 2 myeloid and Th17 pathways (Ebihara et al., 2016; Tollenaere et al., 2021; Yadav et al., 2019). The relative contribution of each of these mechanisms in current clinical indications remains incompletely clear.
Genes identified by MRAID
AF131215.9 and AF131215.2:
These two genes are lncRNAs of unknown function located within intron 1 of the canonical [according to the MANE project (Morales et al., 2022)] transcript of XKR6. The specific functions of these genes and their impact on SLE-related immune phenotypes remain to be determined. However, the SLE association near IRF8 (see below) provides a potential general mechanism for variants that impact lncRNA expression. In this scenario, lncRNA expression modulates expression of one or more nearby genes in close physical proximity to the lncRNA. Applying this model to AF131215.9 and AF131215.2, it may be that they regulate expression of XKR6 or another nearby SLE risk gene, such as FAM167A or BLK. Alternatively, they may modify SLE risk through another mechanism.
FAM167A, BLK and RP11–148O21.4:
RP11–148O21.4 was also identified by PMR-Egger (see discussion above). As for BLK and FAM167A, previous studies have demonstrated that the SLE risk variants reciprocally regulate the expression of these two genes (Hom et al., 2008; Saint Just Ribeiro et al., 2022). At first glance, decreased expression or activity of BLK seems paradoxical in light of the B cell hyperactivity observed in SLE patients (Peng, 2009) and the observation that increased BLK activity enhances B cell receptor-signaling (DeFranco, 1997). Nonetheless, BLK also remains a strong candidate causal gene for the associated autoimmune diseases. First, the correspondence of decreased BLK expression in B cell lines and primary B cells and disease risk variants has been described in SLE, SS and RA (Guthridge et al., 2014; Lindén et al., 2017; Simpfendorfer et al., 2012; Thalayasingam et al., 2018). Second, despite the decreased BLK expression in B cells bearing autoimmune disease risk alleles, B cells from individuals carrying the risk haplotype are more responsive to B cell receptor stimulus (Simpfendorfer et al., 2012). This increased responsiveness in the presence of decreased BLK expression/function is paralleled in marginal zone and follicular B cell subsets of Blk-deficient mice (Samuelson et al., 2012). Third, Blk-deficiency and/or haploinsufficiency leads to worsened disease phenotypes in murine models of lupus-like disease (Samuelson et al., 2014; Wu et al., 2015). Fourth, rare, loss-of-function variants in BLK causing reduced kinase activity have been observed in SLE patients, and introduction of one such orthologous Blk variant increases pathogenic lymphocyte accumulation in the MRL.Faslpr murine lupus-like disease model (Jiang et al., 2019). Finally, in parallel to the accumulation of splenic B1a cells in murine Blk-deficiency, healthy persons carrying the SLE/RA risk allele at rs2736340 were found to have increased B1-like cells and IgG anti-dsDNA in their peripheral blood (Wu et al., 2015).
Taken together, these data support the hypothesis that BLK is a causal gene contributing to autoimmune disease risk at this risk locus. While the long-standing observation that BLK contributes to B cell receptor signaling (Burkhardt et al., 1991; DeFranco, 1997) makes this support seem paradoxical, data clearly demonstrate differential requirement for Blk and effect on signaling at various stages of B cell development, whether at the pre-B cell stage (Saijo et al., 2003) or in mature B cells (Samuelson et al., 2015).
As for how FAM167A might impact SLE risk, recent data implicate the product of this gene, DIORA-1, in NF-κB signaling (Mentlein et al., 2018; Yang et al., 2022), a pathway of some importance in the generation of lupus autoimmunity (Mauro et al., 2023). Thus, both genes impact biological pathways relevant to the development of SLE. This biological relevance, along with the findings from MRAID and MR-MtRobin, strengthens the causal connection between the FAM167A-BLK gene dyad and SLE.
PHRF1 and UBE2L3:
These genes were also identified by PMR-Egger, and the impact of both on pathways relevant to SLE risk is discussed above.
Genes identified by MR-MtRobin
Altogether, the 16 non-chromosome 6 genes identified by MR-MtRobin map to five regions of the genome.
GPX3:
This gene encodes glutathione peroxidase 3 and is located adjacent to TNIP1. The product of this gene is a plasma selenoprotein that regulates oxidative stress by reducing lipid peroxides and hydrogen peroxide. How glutathione peroxidase 3 might impact SLE risk is not immediately obvious to us. However, NCF1 (Zhao et al., 2017) and NCF2 (Jacob et al., 2012) are both SLE risk genes in which nonsynonymous variants have been implicated as causal. The NADPH oxidase subunits encoded by these genes act upstream of glutathione peroxidase by generating superoxide, which is converted to hydrogen peroxide by superoxide dismutase. Thus, GPX3 could act in the same pathway as these other two SLE risk genes.
IRF5, RP11–128A6.2, SMO:
IRF5 was identified as causal by PMR-Egger (see discussion above). RP11–128A6.2 is an ornithine decarboxylase pseudogene located within one of the exons of TNPO3. In contrast, SMO encodes smoothened, a G-protein coupled receptor that becomes activated when sonic hedgehog (Shh) or another ligand binds Patched Homologue 1 (PTCH1) (Zhang and Beachy, 2023). How this core developmental pathway might impact SLE-relevant biology is not directly apparent. However, hedgehog signaling is important at several stages of lymphocyte development and function, including B lymphopoiesis (Cooper et al., 2012), germinal center formation (Sacedón et al., 2005), positive selection of T cell clones in the thymus (Outram et al., 2000), and the activation (Lowrey et al., 2002) and proliferation (Chan et al., 2006) of CD4+ T lymphocytes.
XKR6, AF131215.9, AF131215.2, FAM167A, BLK, RP11–148O21.6, RP11–148O21.4, RP11–148O21.2:
These genes were identified as causal by either PMR-Egger or MRAID (see discussion above).
PHRF1 and TMEM80:
PHRF1 was identified as causal by both PMR-Egger and MRAID (see discussion above). As for TMEM80, the product of this gene is associated with the MKS signaling module of the primary cilium (Li et al., 2016). Mechanistic links between this gene and SLE etiology are also unclear.
RP11–542M13.3 and RP11–542M13.2:
These genes were implicated as causal by the SMR method, using two distinct variants as instrumental variables (IVs). The genes encode two adjacent lncRNAs in the SLE risk region near IRF8. Prior work has demonstrated that the lncRNA encoded by RP11–542M13.2, also known as AC092723.1, regulates IRF8 expression by recruiting a cell-type-specific enhancer complex (Zhou et al., 2022).
DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.
REFERENCES
- Afrasiabi A, Keane JT, Ong LTC, Alinejad-Rokny H, Fewings NL, Booth DR, Parnell GP, Swaminathan S, 2022. Genetic and transcriptomic analyses support a switch to lytic phase in Epstein Barr virus infection as an important driver in developing Systemic Lupus Erythematosus. J. Autoimmun 127, 102781. 10.1016/j.jaut.2021.102781 [DOI] [PubMed] [Google Scholar]
- Allman PH, Aban I, Long DM, Bridges SL, Srinivasasainagendra V, MacKenzie T, Cutter G, Tiwari HK, 2021. A novel Mendelian randomization method with binary risk factor and outcome. Genet. Epidemiol 45, 549–560. 10.1002/gepi.22387 [DOI] [PubMed] [Google Scholar]
- Arvanitis M, Tayeb K, Strober BJ, Battle A, 2022. Redefining tissue specificity of genetic regulation of gene expression in the presence of allelic heterogeneity. Am. J. Hum. Genet 109, 223–239. 10.1016/j.ajhg.2022.01.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bentham J, Morris DL, Graham DSC, Pinder CL, Tombleson P, Behrens TW, Martín J, Fairfax BP, Knight JC, Chen L, Replogle J, Syvänen A-C, Rönnblom L, Graham RR, Wither JE, Rioux JD, Alarcón-Riquelme ME, Vyse TJ, 2015. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet 47, 1457–1464. 10.1038/ng.3434 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bhargava A, Lahaye X, Manel N, 2018. Let me in: Control of HIV nuclear entry at the nuclear envelope. Cytokine Growth Factor Rev. 40, 59–67. 10.1016/j.cytogfr.2018.02.006 [DOI] [PubMed] [Google Scholar]
- Bjornevik K, Cortese M, Healy BC, Kuhle J, Mina MJ, Leng Y, Elledge SJ, Niebuhr DW, Scher AI, Munger KL, Ascherio A, 2022. Longitudinal analysis reveals high prevalence of Epstein-Barr virus associated with multiple sclerosis. Science 375, 296–301. 10.1126/science.abj8222 [DOI] [PubMed] [Google Scholar]
- Boehm FJ, Zhou X, 2022. Statistical methods for Mendelian randomization in genome-wide association studies: A review. Comput. Struct. Biotechnol. J 20, 2338–2351. 10.1016/j.csbj.2022.05.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bowden J, Davey Smith G, Burgess S, 2015. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol 44, 512–525. 10.1093/ije/dyv080 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burgess S, Dudbridge F, Thompson SG, 2015. Re: “Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects.” Am. J. Epidemiol 181, 290–291. 10.1093/aje/kwv017 [DOI] [PubMed] [Google Scholar]
- Burgess S, Thompson SG, 2015. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am. J. Epidemiol 181, 251–260. 10.1093/aje/kwu283 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burkhardt AL, Brunswick M, Bolen JB, Mond JJ, 1991. Anti-immunoglobulin stimulation of B lymphocytes activates src-related protein-tyrosine kinases. Proc. Natl. Acad. Sci. U. S. A 88, 7410–7414. 10.1073/pnas.88.16.7410 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chan VSF, Chau S-Y, Tian L, Chen Y, Kwong SKY, Quackenbush J, Dallman M, Lamb J, Tam PKH, 2006. Sonic hedgehog promotes CD4+ T lymphocyte proliferation and modulates the expression of a subset of CD28-targeted genes. Int. Immunol 18, 1627–1636. 10.1093/intimm/dxl096 [DOI] [PubMed] [Google Scholar]
- Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ, 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7. 10.1186/s13742-015-0047-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chang C-F, Chu P-C, Wu P-Y, Yu M-Y, Lee J-Y, Tsai M-D, Chang M-S, 2015. PHRF1 promotes genome integrity by modulating non-homologous end-joining. Cell Death Dis. 6, e1716. 10.1038/cddis.2015.81 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen W, Wu Y, Zheng Z, Qi T, Visscher PM, Zhu Z, Yang J, 2021. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat. Commun 12, 7117. 10.1038/s41467-021-27438-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cheng Q, Qiu T, Chai X, Sun B, Xia Y, Shi X, Liu J, 2022. MR-Corr2: a two-sample Mendelian randomization method that accounts for correlated horizontal pleiotropy using correlated instrumental variants. Bioinforma. Oxf. Engl 38, 303–310. 10.1093/bioinformatics/btab646 [DOI] [PubMed] [Google Scholar]
- Cheng Q, Yang Y, Shi X, Yeung K-F, Yang C, Peng H, Liu J, 2020. MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy. NAR Genomics Bioinforma 2, lqaa028. 10.1093/nargab/lqaa028 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clarke SLN, Mitchell RE, Sharp GC, Ramanan AV, Relton CL, 2023. Vitamin D Levels and Risk of Juvenile Idiopathic Arthritis: A Mendelian Randomization Study. Arthritis Care Res. 75, 674–681. 10.1002/acr.24815 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cooper CL, Hardy RR, Reth M, Desiderio S, 2012. Non-cell-autonomous hedgehog signaling promotes murine B lymphopoiesis from hematopoietic progenitors. Blood 119, 5438–5448. 10.1182/blood-2011-12-397976 [DOI] [PMC free article] [PubMed] [Google Scholar]
- DeFranco AL, 1997. The complexity of signaling pathways activated by the BCR. Curr. Opin. Immunol 9, 296–308. 10.1016/s0952-7915(97)80074-x [DOI] [PubMed] [Google Scholar]
- Didelez V, Sheehan N, 2007. Mendelian randomization as an instrumental variable approach to causal inference. Stat. Methods Med. Res 16, 309–330. 10.1177/0962280206077743 [DOI] [PubMed] [Google Scholar]
- Diogo D, Bastarache L, Liao KP, Graham RR, Fulton RS, Greenberg JD, Eyre S, Bowes J, Cui J, Lee A, Pappas DA, Kremer JM, Barton A, Coenen MJH, Franke B, Kiemeney LA, Mariette X, Richard-Miceli C, Canhão H, Fonseca JE, de Vries N, Tak PP, Crusius JBA, Nurmohamed MT, Kurreeman F, Mikuls TR, Okada Y, Stahl EA, Larson DE, Deluca TL, O’Laughlin M, Fronick CC, Fulton LL, Kosoy R, Ransom M, Bhangale TR, Ortmann W, Cagan A, Gainer V, Karlson EW, Kohane I, Murphy SN, Martin J, Zhernakova A, Klareskog L, Padyukov L, Worthington J, Mardis ER, Seldin MF, Gregersen PK, Behrens T, Raychaudhuri S, Denny JC, Plenge RM, 2015. TYK2 protein-coding variants protect against rheumatoid arthritis and autoimmunity, with no evidence of major pleiotropic effects on non-autoimmune complex traits. PloS One 10, e0122271. 10.1371/journal.pone.0122271 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ebihara S, Tajima H, Ono M, 2016. Nuclear factor erythroid 2-related factor 2 is a critical target for the treatment of glucocorticoid-resistant lupus nephritis. Arthritis Res. Ther 18, 139. 10.1186/s13075-016-1039-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Finucane HK, Reshef YA, Anttila V, Slowikowski K, Gusev A, Byrnes A, Gazal S, Loh P-R, Lareau C, Shoresh N, Genovese G, Saunders A, Macosko E, Pollack S, Brainstorm Consortium, Perry JRB, Buenrostro JD, Bernstein BE, Raychaudhuri S, McCarroll S, Neale BM, Price AL, 2018. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet 50, 621–629. 10.1038/s41588-018-0081-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fu Q, Zhao J, Qian X, Wong JLH, Kaufman KM, Yu CY, Hwee Siew Howe, Tan Tock Seng Hospital Lupus Study Group, Mok MY, Harley JB, Guthridge JM, Song YW, Cho S-K, Bae S-C, Grossman JM, Hahn BH, Arnett FC, Shen N, Tsao BP, 2011. Association of a functional IRF7 variant with systemic lupus erythematosus. Arthritis Rheum. 63, 749–754. 10.1002/art.30193 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gallucci S, Meka S, Gamero AM, 2021. Abnormalities of the type I interferon signaling pathway in lupus autoimmunity. Cytokine 146, 155633. 10.1016/j.cyto.2021.155633 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gleason KJ, Yang F, Chen LS, 2021. A robust two-sample transcriptome-wide Mendelian randomization method integrating GWAS with multi-tissue eQTL summary statistics. Genet. Epidemiol 45, 353–371. 10.1002/gepi.22380 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gopalakrishnan J, Tessneer KL, Fu Y, Pasula S, Pelikan RC, Kelly JA, Wiley GB, Gaffney PM, 2022. Variants on the UBE2L3/YDJC Autoimmune Disease Risk Haplotype Increase UBE2L3 Expression by Modulating CCCTC-Binding Factor and YY1 Binding. Arthritis Rheumatol. Hoboken NJ 74, 163–173. 10.1002/art.41925 [DOI] [PMC free article] [PubMed] [Google Scholar]
- GTEx Consortium, Laboratory, Data Analysis &Coordinating Center (LDACC)—Analysis Working Group, Statistical Methods groups—Analysis Working Group, Enhancing GTEx (eGTEx) groups, NIH Common Fund, NIH/NCI, NIH/NHGRI, NIH/NIMH, NIH/NIDA, Biospecimen Collection Source Site—NDRI, Biospecimen Collection Source Site—RPCI, Biospecimen Core Resource—VARI, Brain Bank Repository—University of Miami Brain Endowment Bank, Leidos Biomedical—Project Management, ELSI Study, Genome Browser Data Integration &Visualization—EBI, Genome Browser Data Integration &Visualization—UCSC Genomics Institute, University of California Santa Cruz, Lead analysts:, Laboratory, Data Analysis &Coordinating Center (LDACC):, NIH program management:, Biospecimen collection:, Pathology:, eQTL manuscript working group:, Battle A, Brown CD, Engelhardt BE, Montgomery SB, 2017. Genetic effects on gene expression across human tissues. Nature 550, 204–213. 10.1038/nature24277 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guthridge JM, Lu R, Sun H, Sun C, Wiley GB, Dominguez N, Macwana SR, Lessard CJ, Kim-Howard X, Cobb BL, Kaufman KM, Kelly JA, Langefeld CD, Adler AJ, Harley ITW, Merrill JT, Gilkeson GS, Kamen DL, Niewold TB, Brown EE, Edberg JC, Petri MA, Ramsey-Goldman R, Reveille JD, Vilá LM, Kimberly RP, Freedman BI, Stevens AM, Boackle SA, Criswell LA, Vyse TJ, Behrens TW, Jacob CO, Alarcón-Riquelme ME, Sivils KL, Choi J, Joo YB, Bang S-Y, Lee H-S, Bae S-C, Shen N, Qian X, Tsao BP, Scofield RH, Harley JB, Webb CF, Wakeland EK, James JA, Nath SK, Graham RR, Gaffney PM, 2014. Two functional lupus-associated BLK promoter variants control cell-type- and developmental-stage-specific transcription. Am. J. Hum. Genet 94, 586–598. 10.1016/j.ajhg.2014.03.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harley ITW, Sawalha AH, 2022. Systemic lupus erythematosus as a genetic disease. Clin. Immunol. Orlando Fla 236, 108953. 10.1016/j.clim.2022.108953 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harley JB, Chen X, Pujato M, Miller D, Maddox A, Forney C, Magnusen AF, Lynch A, Chetal K, Yukawa M, Barski A, Salomonis N, Kaufman KM, Kottyan LC, Weirauch MT, 2018. Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity. Nat. Genet 50, 699–707. 10.1038/s41588-018-0102-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Harley JB, James JA, 2006. Epstein-Barr virus infection induces lupus autoimmunity. Bull. NYU Hosp. Jt. Dis 64, 45–50. [PubMed] [Google Scholar]
- Hekselman I, Yeger-Lotem E, 2020. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nat. Rev. Genet 21, 137–150. 10.1038/s41576-019-0200-9 [DOI] [PubMed] [Google Scholar]
- Hom G, Graham RR, Modrek B, Taylor KE, Ortmann W, Garnier S, Lee AT, Chung SA, Ferreira RC, Pant PVK, Ballinger DG, Kosoy R, Demirci FY, Kamboh MI, Kao AH, Tian C, Gunnarsson I, Bengtsson AA, Rantapää-Dahlqvist S, Petri M, Manzi S, Seldin MF, Rönnblom L, Syvänen A-C, Criswell LA, Gregersen PK, Behrens TW, 2008. Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N. Engl. J. Med 358, 900–909. 10.1056/NEJMoa0707865 [DOI] [PubMed] [Google Scholar]
- Hoy SM, 2022. Deucravacitinib: First Approval. Drugs 82, 1671–1679. 10.1007/s40265-022-01796-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu X, Kim H, Stahl E, Plenge R, Daly M, Raychaudhuri S, 2011. Integrating autoimmune risk loci with gene-expression data identifies specific pathogenic immune cell subsets. Am. J. Hum. Genet 89, 496–506. 10.1016/j.ajhg.2011.09.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ivanov AI, Rana N, Privitera G, Pizarro TT, 2023. The enigmatic roles of epithelial gasdermin B: Recent discoveries and controversies. Trends Cell Biol. 33, 48–59. 10.1016/j.tcb.2022.06.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jacob CO, Eisenstein M, Dinauer MC, Ming W, Liu Q, John S, Quismorio FP, Reiff A, Myones BL, Kaufman KM, McCurdy D, Harley JB, Silverman E, Kimberly RP, Vyse TJ, Gaffney PM, Moser KL, Klein-Gitelman M, Wagner-Weiner L, Langefeld CD, Armstrong DL, Zidovetzki R, 2012. Lupus-associated causal mutation in neutrophil cytosolic factor 2 (NCF2) brings unique insights to the structure and function of NADPH oxidase. Proc. Natl. Acad. Sci. U. S. A 109, E59–67. 10.1073/pnas.1113251108 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jiang SH, Athanasopoulos V, Ellyard JI, Chuah A, Cappello J, Cook A, Prabhu SB, Cardenas J, Gu J, Stanley M, Roco JA, Papa I, Yabas M, Walters GD, Burgio G, McKeon K, Byers JM, Burrin C, Enders A, Miosge LA, Canete PF, Jelusic M, Tasic V, Lungu AC, Alexander SI, Kitching AR, Fulcher DA, Shen N, Arsov T, Gatenby PA, Babon JJ, Mallon DF, de Lucas Collantes C, Stone EA, Wu P, Field MA, Andrews TD, Cho E, Pascual V, Cook MC, Vinuesa CG, 2019. Functional rare and low frequency variants in BLK and BANK1 contribute to human lupus. Nat. Commun 10, 2201. 10.1038/s41467-019-10242-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kerimov N, Hayhurst JD, Peikova K, Manning JR, Walter P, Kolberg L, Samoviča M, Sakthivel MP, Kuzmin I, Trevanion SJ, Burdett T, Jupp S, Parkinson H, Papatheodorou I, Yates AD, Zerbino DR, Alasoo K, 2021. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat. Genet 53, 1290–1299. 10.1038/s41588-021-00924-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kodigepalli KM, Bowers K, Sharp A, Nanjundan M, 2015. Roles and regulation of phospholipid scramblases. FEBS Lett. 589, 3–14. 10.1016/j.febslet.2014.11.036 [DOI] [PubMed] [Google Scholar]
- Lappalainen T, Sammeth M, Friedländer MR, ‘t Hoen PAC, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, van Iterson M, Almlöf J, Ribeca P, Pulyakhina I, Esser D, Giger T, Tikhonov A, Sultan M, Bertier G, MacArthur DG, Lek M, Lizano E, Buermans HPJ, Padioleau I, Schwarzmayr T, Karlberg O, Ongen H, Kilpinen H, Beltran S, Gut M, Kahlem K, Amstislavskiy V, Stegle O, Pirinen M, Montgomery SB, Donnelly P, McCarthy MI, Flicek P, Strom TM, Geuvadis Consortium, Lehrach H, Schreiber S, Sudbrak R, Carracedo A, Antonarakis SE, Häsler R, Syvänen A-C, van Ommen G-J, Brazma A, Meitinger T, Rosenstiel P, Guigó R, Gut IG, Estivill X, Dermitzakis ET, 2013. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511. 10.1038/nature12531 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kottyan LC, Zoller EE, Bene J, Lu X, Kelly JA, Rupert AM, Lessard CJ, Vaughn SE, Marion M, Weirauch MT, Namjou B, Adler A, Rasmussen A, Glenn S, Montgomery CG, Hirschfield GM, Xie G, Coltescu C, Amos C, Li H, Ice JA, Nath SK, Mariette X, Bowman S, UK Primary Sjögren’s Syndrome Registry, Rischmueller M, Lester S, Brun JG, Gøransson LG, Harboe E, Omdal R, Cunninghame-Graham DS, Vyse T, Miceli-Richard C, Brennan MT, Lessard JA, Wahren-Herlenius M, Kvarnström M, Illei GG, Witte T, Jonsson R, Eriksson P, Nordmark G, Ng W-F, UK Primary Sjögren’s Syndrome Registry, Anaya J-M, Rhodus NL, Segal BM, Merrill JT, James JA, Guthridge JM, Scofield RH, Alarcon-Riquelme M, Bae S-C, Boackle SA, Criswell LA, Gilkeson G, Kamen DL, Jacob CO, Kimberly R, Brown E, Edberg J, Alarcón GS, Reveille JD, Vilá LM, Petri M, Ramsey-Goldman R, Freedman BI, Niewold T, Stevens AM, Tsao BP, Ying J, Mayes MD, Gorlova OY, Wakeland W, Radstake T, Martin E, Martin J, Siminovitch K, Moser Sivils KL, Gaffney PM, Langefeld CD, Harley JB, Kaufman KM, 2015. The IRF5-TNPO3 association with systemic lupus erythematosus has two components that other autoimmune disorders variably share. Hum. Mol. Genet 24, 582–596. 10.1093/hmg/ddu455 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laurynenka V, Ding L, Kaufman KM, James JA, Harley JB, 2022. A High Prevalence of Anti-EBNA1 Heteroantibodies in Systemic Lupus Erythematosus (SLE) Supports Anti-EBNA1 as an Origin for SLE Autoantibodies. Front. Immunol 13, 830993. 10.3389/fimmu.2022.830993 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lewis M, Vyse S, Shields A, Boeltz S, Gordon P, Spector T, Lehner P, Walczak H, Vyse T, 2015. Effect of UBE2L3 genotype on regulation of the linear ubiquitin chain assembly complex in systemic lupus erythematosus. Lancet Lond. Engl 385 Suppl 1, S9. 10.1016/S0140-6736(15)60324-5 [DOI] [PubMed] [Google Scholar]
- Lewis MJ, Vyse S, Shields AM, Boeltz S, Gordon PA, Spector TD, Lehner PJ, Walczak H, Vyse TJ, 2015. UBE2L3 polymorphism amplifies NF-κB activation and promotes plasma cell development, linking linear ubiquitination to multiple autoimmune diseases. Am. J. Hum. Genet 96, 221–234. 10.1016/j.ajhg.2014.12.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li C, Jensen VL, Park K, Kennedy J, Garcia-Gonzalo FR, Romani M, De Mori R, Bruel A-L, Gaillard D, Doray B, Lopez E, Rivière J-B, Faivre L, Thauvin-Robinet C, Reiter JF, Blacque OE, Valente EM, Leroux MR, 2016. MKS5 and CEP290 Dependent Assembly Pathway of the Ciliary Transition Zone. PLoS Biol. 14, e1002416. 10.1371/journal.pbio.1002416 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li J, Guo Y, Deng Y, Hu L, Li B, Deng S, Zhong J, Xie L, Shi S, Hong X, Zheng X, Cai M, Li M, 2021. Subcellular Localization of Epstein-Barr Virus BLLF2 and Its Underlying Mechanisms. Front. Microbiol 12, 672192. 10.3389/fmicb.2021.672192 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li YR, Keating BJ, 2014. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Med. 6, 91. 10.1186/s13073-014-0091-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lindén M, Ramírez Sepúlveda JI, James T, Thorlacius GE, Brauner S, Gómez-Cabrero D, Olsson T, Kockum I, Wahren-Herlenius M, 2017. Sex influences eQTL effects of SLE and Sjögren’s syndrome-associated genetic polymorphisms. Biol. Sex Differ 8, 34. 10.1186/s13293-017-0153-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu S, Cerutti A, Casali P, Crow MK, 2004. Ongoing immunoglobulin class switch DNA recombination in lupus B cells: analysis of switch regulatory regions. Autoimmunity 37, 431–443. 10.1080/08916930400010611 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lloyd-Jones LR, Holloway A, McRae A, Yang J, Small K, Zhao J, Zeng B, Bakshi A, Metspalu A, Dermitzakis M, Gibson G, Spector T, Montgomery G, Esko T, Visscher PM, Powell JE, 2017. The Genetic Architecture of Gene Expression in Peripheral Blood. Am. J. Hum. Genet 100, 228–237. 10.1016/j.ajhg.2016.12.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lodde V, Murgia G, Simula ER, Steri M, Floris M, Idda ML, 2020. Long Noncoding RNAs and Circular RNAs in Autoimmune Diseases. Biomolecules 10, 1044. 10.3390/biom10071044 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lowrey JA, Stewart GA, Lindey S, Hoyne GF, Dallman MJ, Howie SEM, Lamb JR, 2002. Sonic hedgehog promotes cell cycle progression in activated peripheral CD4(+) T lymphocytes. J. Immunol. Baltim. Md 1950 169, 1869–1875. 10.4049/jimmunol.169.4.1869 [DOI] [PubMed] [Google Scholar]
- Luthers CR, Dunn TM, Snow AL, 2020. ORMDL3 and Asthma: Linking Sphingolipid Regulation to Altered T Cell Function. Front. Immunol 11, 597945. 10.3389/fimmu.2020.597945 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ma A, Malynn BA, 2012. A20: linking a complex regulator of ubiquitylation to immunity and human disease. Nat. Rev. Immunol 12, 774–785. 10.1038/nri3313 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maiti AK, Kim-Howard X, Motghare P, Pradhan V, Chua KH, Sun C, Arango-Guerrero MT, Ghosh K, Niewold TB, Harley JB, Anaya J-M, Looger LL, Nath SK, 2014. Combined protein- and nucleic acid-level effects of rs1143679 (R77H), a lupus-predisposing variant within ITGAM. Hum. Mol. Genet 23, 4161–4176. 10.1093/hmg/ddu106 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mauro D, Manou-Stathopoulou S, Rivellese F, Sciacca E, Goldmann K, Tsang V, Lucey-Clayton I, Pagani S, Alam F, Pyne D, Rajakariar R, Gordon PA, Whiteford J, Bombardieri M, Pitzalis C, Lewis MJ, 2023. UBE2L3 regulates TLR7-induced B cell autoreactivity in Systemic Lupus Erythematosus. J. Autoimmun 136, 103023. 10.1016/j.jaut.2023.103023 [DOI] [PubMed] [Google Scholar]
- Mentlein L, Thorlacius GE, Meneghel L, Aqrawi LA, Ramírez Sepúlveda JI, Grunewald J, Espinosa A, Wahren-Herlenius M, 2018. The rheumatic disease-associated FAM167A-BLK locus encodes DIORA-1, a novel disordered protein expressed highly in bronchial epithelium and alveolar macrophages. Clin. Exp. Immunol 193, 167–177. 10.1111/cei.13138 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Merrill JT, Werth VP, Furie R, van Vollenhoven R, Dörner T, Petronijevic M, Velasco J, Majdan M, Irazoque-Palazuelos F, Weiswasser M, Korish S, Ye Y, Gaudy A, Schafer PH, Liu Z, Agafonova N, Delev N, 2022. Phase 2 Trial of Iberdomide in Systemic Lupus Erythematosus. N. Engl. J. Med 386, 1034–1045. 10.1056/NEJMoa2106535 [DOI] [PubMed] [Google Scholar]
- Mo X, Guo Y, Qian Q, Fu M, Lei S, Zhang Y, Zhang H, 2020. Mendelian randomization analysis revealed potential causal factors for systemic lupus erythematosus. Immunology 159, 279–288. 10.1111/imm.13144 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morales J, Pujar S, Loveland JE, Astashyn A, Bennett R, Berry A, Cox E, Davidson C, Ermolaeva O, Farrell CM, Fatima R, Gil L, Goldfarb T, Gonzalez JM, Haddad D, Hardy M, Hunt T, Jackson J, Joardar VS, Kay M, Kodali VK, McGarvey KM, McMahon A, Mudge JM, Murphy DN, Murphy MR, Rajput B, Rangwala SH, Riddick LD, Thibaud-Nissen F, Threadgold G, Vatsan AR, Wallin C, Webb D, Flicek P, Birney E, Pruitt KD, Frankish A, Cunningham F, Murphy TD, 2022. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315. 10.1038/s41586-022-04558-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morand E, Pike M, Merrill JT, van Vollenhoven R, Werth VP, Hobar C, Delev N, Shah V, Sharkey B, Wegman T, Catlett I, Banerjee S, Singhal S, 2023. Deucravacitinib, a Tyrosine Kinase 2 Inhibitor, in Systemic Lupus Erythematosus: A Phase II, Randomized, Double-Blind, Placebo-Controlled Trial. Arthritis Rheumatol. Hoboken NJ 75, 242–252. 10.1002/art.42391 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morrison J, Knoblauch N, Marcus JH, Stephens M, He X, 2020. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat. Genet 52, 740–747. 10.1038/s41588-020-0631-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Munoz LE, van Bavel C, Franz S, Berden J, Herrmann M, van der Vlag J, 2008. Apoptosis in the pathogenesis of systemic lupus erythematosus. Lupus 17, 371–375. 10.1177/0961203308089990 [DOI] [PubMed] [Google Scholar]
- Namjou B, Ni Y, Harley ITW, Chepelev I, Cobb B, Kottyan LC, Gaffney PM, Guthridge JM, Kaufman K, Harley JB, 2014. The effect of inversion at 8p23 on BLK association with lupus in Caucasian population. PloS One 9, e115614. 10.1371/journal.pone.0115614 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nathan A, Asgari S, Ishigaki K, Valencia C, Amariuta T, Luo Y, Beynor JI, Baglaenko Y, Suliman S, Price AL, Lecca L, Murray MB, Moody DB, Raychaudhuri S, 2022. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128. 10.1038/s41586-022-04713-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ongen H, Brown AA, Delaneau O, Panousis NI, Nica AC, GTEx Consortium, Dermitzakis ET, 2017. Estimating the causal tissues for complex traits and diseases. Nat. Genet 49, 1676–1683. 10.1038/ng.3981 [DOI] [PubMed] [Google Scholar]
- Outram SV, Varas A, Pepicelli CV, Crompton T, 2000. Hedgehog signaling regulates differentiation from double-negative to double-positive thymocyte. Immunity 13, 187–197. 10.1016/s1074-7613(00)00019-4 [DOI] [PubMed] [Google Scholar]
- Parodis I, Gatto M, Sjöwall C, 2022. B cells in systemic lupus erythematosus: Targets of new therapies and surveillance tools. Front. Med 9, 952304. 10.3389/fmed.2022.952304 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR, 1996. A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol 49, 1373–1379. 10.1016/s0895-4356(96)00236-3 [DOI] [PubMed] [Google Scholar]
- Peng SL, 2009. Altered T and B lymphocyte signaling pathways in lupus. Autoimmun. Rev 8, 179–183. 10.1016/j.autrev.2008.07.040 [DOI] [PubMed] [Google Scholar]
- Ren Y, Liu J, Li W, Zheng H, Dai H, Qiu G, Yu D, Yao D, Yin X, 2022. Causal Associations between Vitamin D Levels and Psoriasis, Atopic Dermatitis, and Vitiligo: A Bidirectional Two-Sample Mendelian Randomization Analysis. Nutrients 14, 5284. 10.3390/nu14245284 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ruan J, 2023. Regulating GSDMB pore formation: to ignite or inhibit? Cell Death Differ. 10.1038/s41418-023-01163-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sacedón R, Díez B, Nuñez V, Hernández-López C, Gutierrez-Frías C, Cejalvo T, Outram SV, Crompton T, Zapata AG, Vicente A, Varas A, 2005. Sonic hedgehog is produced by follicular dendritic cells and protects germinal center B cells from apoptosis. J. Immunol. Baltim. Md 1950 174, 1456–1461. 10.4049/jimmunol.174.3.1456 [DOI] [PubMed] [Google Scholar]
- Saijo K, Schmedt C, Su I-H, Karasuyama H, Lowell CA, Reth M, Adachi T, Patke A, Santana A, Tarakhovsky A, 2003. Essential role of Src-family protein tyrosine kinases in NF-kappaB activation during B cell development. Nat. Immunol 4, 274–279. 10.1038/ni893 [DOI] [PubMed] [Google Scholar]
- Saint Just Ribeiro M, Tripathi P, Namjou B, Harley JB, Chepelev I, 2022. Haplotype-specific chromatin looping reveals genetic interactions of regulatory regions modulating gene expression in 8p23.1. Front. Genet 13, 1008582. 10.3389/fgene.2022.1008582 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Salm MPA, Horswell SD, Hutchison CE, Speedy HE, Yang X, Liang L, Schadt EE, Cookson WO, Wierzbicki AS, Naoumova RP, Shoulders CC, 2012. The origin, global distribution, and functional impact of the human 8p23 inversion polymorphism. Genome Res. 22, 1144–1153. 10.1101/gr.126037.111 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Samuelson EM, Laird RM, Maue AC, Rochford R, Hayes SM, 2012. Blk haploinsufficiency impairs the development, but enhances the functional responses, of MZ B cells. Immunol. Cell Biol 90, 620–629. 10.1038/icb.2011.76 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Samuelson EM, Laird RM, Papillion AM, Tatum AH, Princiotta MF, Hayes SM, 2014. Reduced B lymphoid kinase (Blk) expression enhances proinflammatory cytokine production and induces nephrosis in C57BL/6-lpr/lpr mice. PloS One 9, e92054. 10.1371/journal.pone.0092054 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schaid DJ, Chen W, Larson NB, 2018. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet 19, 491–504. 10.1038/s41576-018-0016-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmiedel BJ, Singh D, Madrigal A, Valdovino-Gonzalez AG, White BM, Zapardiel-Gonzalo J, Ha B, Altay G, Greenbaum JA, McVicker G, Seumois G, Rao A, Kronenberg M, Peters B, Vijayanand P, 2018. Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression. Cell 175, 1701–1715.e16. 10.1016/j.cell.2018.10.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shigesaka M, Ito T, Inaba M, Imai K, Yamanaka H, Azuma Y, Tanaka A, Amuro H, Nishizawa T, Son Y, Satake A, Ozaki Y, Nomura S, 2020. Mycophenolic acid, the active form of mycophenolate mofetil, interferes with IRF7 nuclear translocation and type I IFN production by plasmacytoid dendritic cells. Arthritis Res. Ther 22, 264. 10.1186/s13075-020-02356-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- Simpfendorfer KR, Olsson LM, Manjarrez Orduño N, Khalili H, Simeone AM, Katz MS, Lee AT, Diamond B, Gregersen PK, 2012. The autoimmunity-associated BLK haplotype exhibits cis-regulatory effects on mRNA and protein expression that are prominently observed in B cells early in development. Hum. Mol. Genet 21, 3918–3925. 10.1093/hmg/dds220 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith GD, Ebrahim S, 2003. “Mendelian randomization”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol 32, 1–22. 10.1093/ije/dyg070 [DOI] [PubMed] [Google Scholar]
- Stanfield GM, Horvitz HR, 2000. The ced-8 gene controls the timing of programmed cell deaths in C. elegans. Mol. Cell 5, 423–433. 10.1016/s1097-2765(00)80437-2 [DOI] [PubMed] [Google Scholar]
- Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S, LeProust EM, Akey JM, Stamatoyannopoulos JA, 2013. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372. 10.1126/science.1243490 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Suzuki J, Denning DP, Imanishi E, Horvitz HR, Nagata S, 2013. Xk-related protein 8 and CED-8 promote phosphatidylserine exposure in apoptotic cells. Science 341, 403–406. 10.1126/science.1236758 [DOI] [PubMed] [Google Scholar]
- Suzuki J, Imanishi E, Nagata S, 2014. Exposure of phosphatidylserine by Xk-related protein family members during apoptosis. J. Biol. Chem 289, 30257–30267. 10.1074/jbc.M114.583419 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thalayasingam N, Nair N, Skelton AJ, Massey J, Anderson AE, Clark AD, Diboll J, Lendrem DW, Reynard LN, Cordell HJ, Eyre S, Isaacs JD, Barton A, Pratt AG, 2018. CD4+ and B Lymphocyte Expression Quantitative Traits at Rheumatoid Arthritis Risk Loci in Patients With Untreated Early Arthritis: Implications for Causal Gene Identification. Arthritis Rheumatol. Hoboken NJ: 70, 361–370. 10.1002/art.40393 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thynn HN, Chen X-F, Hu W-X, Duan Y-Y, Zhu D-L, Chen H, Wang N-N, Chen H-H, Rong Y, Lu B-J, Yang M, Jiang F, Dong S-S, Guo Y, Yang T-L, 2020. An Allele-Specific Functional SNP Associated with Two Systemic Autoimmune Diseases Modulates IRF5 Expression by Long-Range Chromatin Loop Formation. J. Invest. Dermatol 140, 348–360.e11. 10.1016/j.jid.2019.06.147 [DOI] [PubMed] [Google Scholar]
- Tollenaere M. a. X., Hebsgaard J, Ewald DA, Lovato P, Garcet S, Li X, Pilger SD, Tiirikainen ML, Bertelsen M, Krueger JG, Norsgaard H, 2021. Signalling of multiple interleukin (IL)-17 family cytokines via IL-17 receptor A drives psoriasis-related inflammatory pathways. Br. J. Dermatol 185, 585–594. 10.1111/bjd.20090 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Verbanck M, Chen C-Y, Neale B, Do R, 2018. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet 50, 693–698. 10.1038/s41588-018-0099-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B, Kirsten H, Saha A, Kreuzhuber R, Yazar S, Brugge H, Oelen R, de Vries DH, van der Wijst MGP, Kasela S, Pervjakova N, Alves I, Favé M-J, Agbessi M, Christiansen MW, Jansen R, Seppälä I, Tong L, Teumer A, Schramm K, Hemani G, Verlouw J, Yaghootkar H, Sönmez Flitman R, Brown A, Kukushkina V, Kalnapenkis A, Rüeger S, Porcu E, Kronberg J, Kettunen J, Lee B, Zhang F, Qi T, Hernandez JA, Arindrarto W, Beutner F, BIOS Consortium, i2QTL Consortium, Dmitrieva J, Elansary M, Fairfax BP, Georges M, Heijmans BT, Hewitt AW, Kähönen M, Kim Y, Knight JC, Kovacs P, Krohn K, Li S, Loeffler M, Marigorta UM, Mei H, Momozawa Y, Müller-Nurasyid M, Nauck M, Nivard MG, Penninx BWJH, Pritchard JK, Raitakari OT, Rotzschke O, Slagboom EP, Stehouwer CDA, Stumvoll M, Sullivan P, ‘t Hoen PAC, Thiery J, Tönjes A, van Dongen J, van Iterson M, Veldink JH, Völker U, Warmerdam R, Wijmenga C, Swertz M, Andiappan A, Montgomery GW, Ripatti S, Perola M, Kutalik Z, Dermitzakis E, Bergmann S, Frayling T, van Meurs J, Prokisch H, Ahsan H, Pierce BL, Lehtimäki T, Boomsma DI, Psaty BM, Gharib SA, Awadalla P, Milani L, Ouwehand WH, Downes K, Stegle O, Battle A, Visscher PM, Yang J, Scholz M, Powell J, Gibson G, Esko T, Franke L, 2021. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet 53, 1300–1310. 10.1038/s41588-021-00913-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang L, Wang Y, Wang XE, Chen B, Zhang L, Lu X, 2023. Causal association between atopic eczema and inflammatory bowel disease: A two-sample bidirectional Mendelian randomization study of the East Asian population. J. Dermatol 50, 327–336. 10.1111/1346-8138.16642 [DOI] [PubMed] [Google Scholar]
- Wang S, Adrianto I, Wiley GB, Lessard CJ, Kelly JA, Adler AJ, Glenn SB, Williams AH, Ziegler JT, Comeau ME, Marion MC, Wakeland BE, Liang C, Kaufman KM, Guthridge JM, Alarcón-Riquelme ME, BIOLUPUS and GENLES Networks, Alarcón GS, Anaya J-M, Bae S-C, Kim J-H, Joo YB, Boackle SA, Brown EE, Petri MA, Ramsey-Goldman R, Reveille JD, Vilá LM, Criswell LA, Edberg JC, Freedman BI, Gilkeson GS, Jacob CO, James JA, Kamen DL, Kimberly RP, Martin J, Merrill JT, Niewold TB, Pons-Estel BA, Scofield RH, Stevens AM, Tsao BP, Vyse TJ, Langefeld CD, Harley JB, Wakeland EK, Moser KL, Montgomery CG, Gaffney PM, 2012. A functional haplotype of UBE2L3 confers risk for systemic lupus erythematosus. Genes Immun. 13, 380–387. 10.1038/gene.2012.6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang X-F, Xu W-J, Wang F-F, Leng R, Yang X-K, Ling H-Z, Fan Y-G, Tao J-H, Shuai Z-W, Zhang L, Ye D-Q, Leng R-X, 2022. Telomere Length and Development of Systemic Lupus Erythematosus: A Mendelian Randomization Study. Arthritis Rheumatol. Hoboken NJ: 74, 1984–1990. 10.1002/art.42304 [DOI] [PubMed] [Google Scholar]
- Watanabe K, Stringer S, Frei O, Umićević Mirkov M, de Leeuw C, Polderman TJC, van der Sluis S, Andreassen OA, Neale BM, Posthuma D, 2019. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet 51, 1339–1348. 10.1038/s41588-019-0481-0 [DOI] [PubMed] [Google Scholar]
- Westra H-J, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, Christiansen MW, Fairfax BP, Schramm K, Powell JE, Zhernakova A, Zhernakova DV, Veldink JH, Van den Berg LH, Karjalainen J, Withoff S, Uitterlinden AG, Hofman A, Rivadeneira F, Hoen P.A.C.‘t, Reinmaa E, Fischer K, Nelis M, Milani L, Melzer D, Ferrucci L, Singleton AB, Hernandez DG, Nalls MA, Homuth G, Nauck M, Radke D, Völker U, Perola M, Salomaa V, Brody J, Suchy-Dicey A, Gharib SA, Enquobahrie DA, Lumley T, Montgomery GW, Makino S, Prokisch H, Herder C, Roden M, Grallert H, Meitinger T, Strauch K, Li Y, Jansen RC, Visscher PM, Knight JC, Psaty BM, Ripatti S, Teumer A, Frayling TM, Metspalu A, van Meurs JBJ, Franke L, 2013. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet 45, 1238–1243. 10.1038/ng.2756 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu Y-Y, Georg I, Díaz-Barreiro A, Varela N, Lauwerys B, Kumar R, Bagavant H, Castillo-Martín M, El Salem F, Marañón C, Alarcón-Riquelme ME, 2015. Concordance of increased B1 cell subset and lupus phenotypes in mice and humans is dependent on BLK expression levels. J. Immunol. Baltim. Md 1950 194, 5692–5702. 10.4049/jimmunol.1402736 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu S, Fung WK, Liu Z, 2021. MRCIP: a robust Mendelian randomization method accounting for correlated and idiosyncratic pleiotropy. Brief. Bioinform 22, bbab019. 10.1093/bib/bbab019 [DOI] [PubMed] [Google Scholar]
- Xu Y, Zhou H, Post G, Zan H, Casali P, 2022. Rad52 mediates class-switch DNA recombination to IgD. Nat. Commun 13, 980. 10.1038/s41467-022-28576-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yadav SK, Soin D, Ito K, Dhib-Jalbut S, 2019. Insight into the mechanism of action of dimethyl fumarate in multiple sclerosis. J. Mol. Med. Berl. Ger 97, 463–472. 10.1007/s00109-019-01761-5 [DOI] [PubMed] [Google Scholar]
- Yang T, Sim K-Y, Ko G-H, Ahn J-S, Kim H-J, Park S-G, 2022. FAM167A is a key molecule to induce BCR-ABL-independent TKI resistance in CML via noncanonical NF-κB signaling activation. J. ExpClin. Cancer Res. CR 41, 82. 10.1186/s13046-022-02298-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yazar S, Alquicira-Hernandez J, Wing K, Senabouth A, Gordon MG, Andersen S, Lu Q, Rowson A, Taylor TRP, Clarke L, Maccora K, Chen C, Cook AL, Ye CJ, Fairfax KA, Hewitt AW, Powell JE, 2022. Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science 376, eabf3041. 10.1126/science.abf3041 [DOI] [PubMed] [Google Scholar]
- Yuan S, Wang L, Zhang H, Xu F, Zhou X, Yu L, Sun J, Chen J, Ying H, Xu X, Yu Y, Spiliopoulou A, Shen X, Wilson J, Gill D, Theodoratou E, Larsson SC, Li X, 2023. Mendelian randomization and clinical trial evidence supports TYK2 inhibition as a therapeutic target for autoimmune diseases. EBioMedicine 89, 104488. 10.1016/j.ebiom.2023.104488 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yuan Z, Liu L, Guo P, Yan R, Xue F, Zhou X, 2022. Likelihood-based Mendelian randomization analysis with automated instrument selection and horizontal pleiotropic modeling. Sci. Adv 8, eabl5744. 10.1126/sciadv.abl5744 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yuan Z, Zhu H, Zeng P, Yang S, Sun S, Yang C, Liu J, Zhou X, 2020. Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat. Commun 11, 3861. 10.1038/s41467-020-17668-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Y, Beachy PA, 2023. Cellular and molecular mechanisms of Hedgehog signalling. Nat. Rev. Mol. Cell Biol 10.1038/s41580-023-00591-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao J, Ma J, Deng Y, Kelly JA, Kim K, Bang S-Y, Lee H-S, Li Q-Z, Wakeland EK, Qiu R, Liu M, Guo J, Li Z, Tan W, Rasmussen A, Lessard CJ, Sivils KL, Hahn BH, Grossman JM, Kamen DL, Gilkeson GS, Bae S-C, Gaffney PM, Shen N, Tsao BP, 2017. A missense variant in NCF1 is associated with susceptibility to multiple autoimmune diseases. Nat. Genet 49, 433–437. 10.1038/ng.3782 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou T, Zhu X, Ye Z, Wang Y-F, Yao C, Xu N, Zhou M, Ma J, Qin Y, Shen Y, Tang Y, Yin Z, Xu H, Zhang Y, Zang X, Ding H, Yang W, Guo Y, Harley JB, Namjou B, Kaufman KM, Kottyan LC, Weirauch MT, Hou G, Shen N, 2022. Lupus enhancer risk variant causes dysregulation of IRF8 through cooperative lncRNA and DNA methylation machinery. Nat. Commun 13, 1855. 10.1038/s41467-022-29514-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou W, Cai J, Li Z, Lin Y, 2023. Association of atopic dermatitis with autoimmune diseases: A bidirectional and multivariable two-sample mendelian randomization study. Front. Immunol 14, 1132719. 10.3389/fimmu.2023.1132719 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM, Yang J, 2016. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet 48, 481–487. 10.1038/ng.3538 [DOI] [PubMed] [Google Scholar]
Supplementary Materials
TABLE S1. Candidate causal genes identified by the SMR method. Cells with a p-value < 0.05 are highlighted in light blue. The columns are as follows. Gene: gene symbol; SNP: ID of the most significant eQTL SNP associated with the gene (the top eQTL SNP); Chrom: chromosome on which the gene and its SNPs are located; Position: chromosomal position (base pair) of the top eQTL SNP; p_GWAS: SLE GWAS p-value of the top eQTL SNP; p_eQTL: eQTL p-value of the top eQTL SNP; alpha_SMR: causal effect size estimate for the gene from the SMR method; p_SMR: p-value for the SMR causal effect size estimate; alpha_PMR: causal effect size estimate for the gene from the PMR-Egger method; p_PMR: p-value for the PMR-Egger causal effect size estimate; alpha_MRAID: causal effect size estimate for the gene from the MRAID method; p_MRAID: p-value for the MRAID causal effect size estimate; alpha_MtRobin: causal effect size estimate for the gene from the MR-MtRobin method; p_MtRobin: p-value for the MR-MtRobin causal effect size estimate; p_HEIDI: p-value for the HEIDI test of heterogeneity. ‘NA’ indicates that data are not available because of technical issues, such as errors generated by the algorithm or an insufficient number of genetic variants at the locus to reliably estimate the causal effect size.
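As an illustration of how the columns in Table S1 might be read programmatically, the following minimal Python sketch lists, for each gene, the methods whose p-value falls below 0.05, treating ‘NA’ cells as missing. It assumes the table has been exported as tab-separated text with the column names given in the legend. The gene names and values in the sketch are invented placeholders, not results from the study, and the helper names (`parse_p`, `significant_methods`, `summarize`) are hypothetical.

```python
import csv
import io

# Column names follow the Table S1 legend. The rows are invented
# placeholders for illustration only, not values from the study.
EXAMPLE_TSV = (
    "Gene\tp_SMR\tp_PMR\tp_MRAID\tp_MtRobin\n"
    "GENE_A\t1e-6\t0.003\t0.2\tNA\n"
    "GENE_B\t2e-5\tNA\t0.01\t0.04\n"
)

# Map each method name to its p-value column in the table.
P_COLUMNS = {
    "SMR": "p_SMR",
    "PMR-Egger": "p_PMR",
    "MRAID": "p_MRAID",
    "MR-MtRobin": "p_MtRobin",
}


def parse_p(cell):
    """Return the p-value as a float, or None for 'NA' or empty cells."""
    cell = cell.strip()
    if cell in ("", "NA"):
        return None
    return float(cell)


def significant_methods(row, threshold=0.05):
    """List the methods whose p-value in this row is below the threshold."""
    hits = []
    for method, column in P_COLUMNS.items():
        p = parse_p(row[column])
        if p is not None and p < threshold:
            hits.append(method)
    return hits


def summarize(tsv_text, threshold=0.05):
    """Map each gene symbol to the list of methods that pass the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Gene"]: significant_methods(row, threshold) for row in reader}


if __name__ == "__main__":
    for gene, methods in summarize(EXAMPLE_TSV).items():
        print(gene, methods)
```

The 0.05 cutoff here mirrors the highlighting rule in the legend only; it is not a multiple-testing-corrected threshold.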
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.
