Abstract
Identifying whether potential causal variants for related diseases are shared can identify overlapping etiologies of multifactorial disorders. Colocalization methods disentangle shared and distinct causal variants. However, existing approaches require independent datasets. Here we extend two colocalization methods to allow for the shared control design commonly used in comparison of genome-wide association study results across diseases. Our analysis of four autoimmune diseases, type 1 diabetes (T1D), rheumatoid arthritis, celiac disease and multiple sclerosis, revealed 90 regions that were associated with at least one disease, 33 (37%) of which with two or more disorders. Nevertheless, for 14 of these 33 shared regions there was evidence that causal variants differed. We identified novel disease associations in 11 regions previously associated with one or more of the other three disorders. Four of eight T1D-specific regions contained known type 2 diabetes candidate genes: COBL, GLIS3, RNLS and BCAR1, suggesting a shared cellular etiology.
Introduction
Overlaps of genetic association to different diseases have been widely observed, and are thought to reflect shared etiology between diseases.1 However, showing that a variant is associated with two traits does not demonstrate that it is causal for both: this may be due to distinct variants in linkage disequilibrium.2 Colocalization analyses are used to study whether potential causal variants are shared by combining information across multiple single nucleotide polymorphisms (SNPs) in a region. The proportional approach3 tests a null hypothesis of proportionality under which, if causal variants are shared, we expect to see that the effects of any set of SNPs on the two diseases are proportional to each other. A weakness of this approach is interpretation. Failure to reject the null hypothesis does not only imply colocalization, but could also be caused by either disease being not associated, or by insufficient power owing to too few samples analysed and/or an incomplete genetic map4 (Supplementary Fig. 1). We have no way of measuring how likely colocalization is. A strength is that no assumptions are made about the number of causal variants: the null hypothesis corresponds to complete sharing across all causal variants. An alternative is to use a Bayesian framework,5 to generate posterior probabilities for colocalization and distinct causal variants, as competing hypotheses. However, a weakness of this approach, as currently developed, is that it assumes only a single causal variant for each trait within any region.
Existing colocalization methods require that genetic association with the two traits of interest has been tested in distinct samples. However, this requirement restricts the applicability of the approach to related diseases since each set of case samples must have a corresponding distinct set of control samples, enabling a logistic binomial model to be used independently upon each disease. In contrast, many studies use a common set of controls for different diseases to increase efficiency. Here, we extend both colocalization methods to allow for the use of multinomial logistic regression, the natural model for shared controls.
Previous studies have identified many regions associated with multiple autoimmune or autoinflammatory diseases, including type 1 diabetes (T1D) and celiac disease (CEL).6,1 Such multi–disease association led to the development of the ImmunoChip,7 a custom genotyping chip with 196,000 SNPs designed to densely cover 186 regions known to associate with at least one immune disease on the basis of GWAS p–value < 10−8. The ImmunoChip consortium genotyped a common control set, to which some disease groups added their own controls. We applied our extended methods to ImmunoChip raw genotyping data for a total of 36,030 samples, including one set of controls and four disease cohorts, in order to better understand the extent of shared genetic etiology in these diseases.
Results
Overview of Method
The Bayesian method derives the posterior support for each of five hypotheses describing the possible association of the region with both diseases. Of greatest interest are:
: Both diseases are associated with the region, with different causal variants.
: Both diseases are associated with the region, and share a single causal variant.
Association with both traits corresponds to or ; colocalization corresponds to . This method requires specification of prior probabilities for each hypothesis. We calibrated priors to match our expectations that about 50% of regions associated with two immune–mediated diseases correspond to a shared causal variant (Supplementary Fig. 2), which is close to the proportion found in a manually curated summary of association to six immune–mediated diseases8 (58%). For rheumatoid arthritis (RA)9 and multiple sclerosis (MS),10 for which only UK subsets of international cohorts were analyzed, we modified priors in regions with published associations to reflect this additional information from the published papers. Where a region was annotated in ImmunoBase as associated with RA or MS, we shrunk our priors for hypotheses corresponding to no association for the disease close towards 0, and increased our priors for the remaining hypotheses (Supplementary Methods).
One hundred and twenty six ImmunoChip regions assigned to at least one of the diseases (based upon knowledge when the chip was designed or identified in subsequent papers and curated in ImmunoBase, accessed 12/11/13) were analyzed using both approaches for all six pairwise comparisons of the four diseases. Sample and SNP QC is described in the Methods; we excluded low frequency variants (MAF< 1%) to reduce the number of models to be considered and because genotyping errors are more common amongst this group of SNPs, and we did not have cluster plots available for all diseases. Although GWAS studies typically have sufficient power to detect association only with more common SNPs, some rarer variants (for example, in TYK211) have been reported with these diseases which will be missed in our analysis.
Overview of Results
The Bayesian approach assumes a single causal variant per trait in any region. To allow for multiple causal variants, we used a stepwise method. In the overwhelming majority of cases (740 out of 756 pairwise comparisons, or 98%) data were consistent with at most one causal variant per trait in a region. Ninety of the 126 regions (71%) showed association with at least one disease; in 33 regions, the association was shared between at least two diseases (Fig 1). Complete results are given in Supplementary Table 1, Supplementary Table 2 and Supplementary Table 3. For fifty–seven regions, the greatest support was for association with precisely one of the four diseases; in 21 cases, we know of no other immune–mediated diseases that have reported association to these regions and therefore hypothesize these may be disease specific among autoimmune diseases (Table 1).
Table 1.
Chromosome | Position | Disease Association | Posterior Probability of Single Association | Candidate Causal Gene(s)/ Genes in Region |
---|---|---|---|---|
1p22.1 | 92023171–93311800 | MS | 1.00 | EVI5 |
1p21.2 | 100982239–101455699 | MS | 0.57 | EXTL2 VCAM1 SLC30A7 |
1p13.1 | 116831830–116911865 | MS | 1.00 | CD58 |
3p24.1 | 28015774–28105476 | MS | 0.99 | (CMC1) |
3q13.33 | 122818149–123329522 | MS | 1.00 | IQCB1 SLC15A2 CD86 |
5q21.1 | 102062861–102777130 | RA | 0.58 | C5orf30 |
6q23.3 | 137348296–137587799 | MS | 1.00 | IL22RA2 |
7p12.2 | 50337180–50662811 | T1D | 0.97 | 3′ IKZF1* region |
7p12.2 | 50866661–51640000 | T1D | 1.00 | COBL |
8q21.12 | 79575897–79914680 | MS | 1.00 | ZC2HC1A |
8q24.21 | 129187117–129368419 | MS | 0.51 | PVT1 MIR1208 |
9p24.2 | 4218549–4311558 | T1D | 1.00 | GLIS3 |
10q23.31 | 89998026–90268360 | T1D | 0.87 | RNLS |
11p15.5 | 2024999–2264880 | T1D | 1.00 | INS |
12q24.31 | 121926103–122574026 | MS | 0.59 | PITPNM2 |
14q32.2 | 100357783–100398492 | T1D | 0.98 | DLK1 |
16q23.1 | 73760230–74086012 | T1D | 1.00 | BCAR1 |
19p13.3 | 6564831–6636304 | MS | 1.00 | TNFSF14 |
19p13.11 | 16300497–16612240 | MS | 1.00 | (EPS15L1, CALR3, MED26, C19orf44, CHERP, SLC35E1) |
19p13.11 | 17905598–18272802 | MS | 0.87 | MPV17L2 IFI30 |
20p13 | 1444472–1707590 | T1D | 0.99 | (SIRPD, SIRPB1, SIRPG) |
There are two ImmunoChip regions which overlap IKZF1 and are separated by a recombination hotspot. The region towards the 5′ end has colocalizing associations with MS and T1D while the region towards the 3′ end appears specific to T1D, as shown in Supplementary Figure 7. Note we provide coordinates of the region, and not an index SNP as is conventional in gwas studies because the method synthesises information across the whole region and does not, in most cases, highlight a single SNP responsible for the association.
In the Bayesian approach, when the posterior probability of a hypothesis is close to 0.5, assignment cannot be made with confidence to any single hypothesis. However, in the 30 instances in which both diseases showed very strong evidence of association , the Bayesian and proportional approaches produced consistent results. For these 30 cases, the proportional null was rejected only in cases in which the Bayesian analysis favored , and not rejected in cases where was favored. Focusing on these, the data strongly supported that the same causal variants underlie all diseases in ten cases, while seven showed strong evidence for distinct variants, suggesting that just under half, 42%, of overlapping association signals reflect distinct causal variants. In total, fourteen regions showed evidence of separate SNP effects , see Table 2).
Table 2.
Chromosome | Position | Associations | Evidence | Candidate Causal Gene |
---|---|---|---|---|
2p16.1 | 60722116–61952276 | C—M | CM: ~0.65 | REL |
2q32.2 | 191412527–191739472 | RC—M | RM: ~0.51 | STAT1 STAT4 |
2q33.1 | 202920548–204528303 | D—C—R | DR: ~0.98 RC: ~0.91 | CD28 CTLA4 ICOS |
3p21.31 | 45812888–46633741 | D—C | DC: ~0.92 | CCR3 CCR1 CCR5 |
3q25.33 | 160950948–161389020 | C—M | CM: ~0.96 | IL12A |
4q27 | 123121079–124497235 | D—C | DC: ~1.00 | IL2 IL21 |
6q23.3 | 137914792–138345363 | DRC—M | RM: ~0.75 CM: ~0.85 | TNFAIP3 |
10p15.1 | 6068495–6237542 | D—M | DM: ~1.00 | IL2RA |
11q23.3 | 117805448–118403529 | C—M | CM: ~0.82 | CXCR5 |
13q32.3 | 98723872–99034738 | D—C | DC: ~0.67 | GPR183 |
16p13.13 | 10831557–11408130 | DM—C | DC: ~0.51 | DEXI SOCS1 |
18p11.21 | 12407903–12919721 | D—C | DC: ~0.58 | PTPN2 |
19p13.2 | 10081000–11019034 | DRM—C | DC: ~0.53 | ICAM1 ICAM3 TYK2 |
21q22.3 | 42681877–42771181 | D—R—C | DR: ~0.77 DC: ~0.99 RC: ~0.69 | UBASH3A |
Disentangling Patterns of Association
For colocalized disease regions, the two diseases generally have consistent directions of effect (Fig 2) with the exception of the 6q25.3 region containing candidate gene TAGAP, which is associated in our analysis with CEL and MS only: the risk allele for CEL is protective for MS and vice versa (Supplementary Fig. 3). This opposing effect of TAGAP alleles has been previously described for T1D and CEL,6 although the region did not provide sufficient evidence for association with T1D in the data available to us. A similar effect for the 2q12.1 region containing candidate gene IL18RAP has also been reported.6 However, later data12 have not offered support for T1D association to 2q12.1, and, in our analysis, the posterior support is concentrated on CEL association alone.
Patterns of association with multiple diseases can be complex. In the 2q33 region containing established candidate gene CTLA4, as well as the equally strong functional candidate genes, CD28 and ICOS, three potential causal variants appear to be partially shared between T1D, RA and CEL. The strongest association with T1D is at rs3087243 (which has previously been called CT60), while the strongest association with CEL is with rs231775 (which alters the amino acid at position 17 of CTLA4, Ala17Thr, and has previously been called CT42). The two SNPs have r2 = 0.5, and haplotype analysis has previously suggested CT60 and not CT42 is causal for Graves’ disease.13 For RA, the strongest single SNP signal is at rs1980422, which is not in LD with either CT42 or CT60 (r2 < 0.1). We fitted the 512 possible standard multinomial models involving these three SNPs for the three diseases, and computed approximate Bayes factors for each. Assuming each model to be equally likely a priori, the model with highest posterior probability has rs1980422/rs3087243 (CT60) signals for CEL and rs231775 (CT42)/rs1980422 for both T1D and RA, although whilst rs231775 (CT42) is the strongest effect for T1D, rs1980422 is strongest for RA (Fig. 3). We note that our analysis is based on SNPs selected through a stepwise process and that without fine mapping analysis we cannot claim that any one of these models correctly reflects the causal variants for any disease. These results do, however, clearly illustrate the different patterns of association with the three disorders and emphasize the potential complexity that can arise in regions of multiple association signals. They motivate the future extension of the colocalization approach developed here to allow model search strategies that do not require stepwise assumptions.
Discovery of Novel Associations
Two regions were associated with all four diseases (Fig. 1). One was the 6q23.3 region containing candidate gene TNFAIP3, known to be associated with RA and CEL. There has been some published evidence that T1D is associated with this region,14 although not at genome–wide significant levels. Our results identify a T1D signal, colocalized with that for RA and CEL, suggesting a single shared causal variant affecting the three diseases. There is also evidence of MS association, driven by a distinct causal variant (in the CEL–MS analysis, , Fig. 4).
The second region was 19p13.2, known to be associated with T1D, RA and MS, containing the strong functional candidate gene TYK2, although immune adhesion genes ICAM1 and ICAM3 are also good candidate genes. Our analysis supports these associations, with a posterior probability of colocalization approaching 1. We also find evidence for a novel CEL association. In each of the pairwise analyses involving CEL, the probability of both diseases being associated ≃ 0.88, although this could be a distinct signal: we have (Supplementary Fig. 4). In total, 11 regions showed strong evidence of novel association with (Table 3).
Table 3.
Chromosome | Position | Prior Associations | Associations Found | Posterior probability both diseases are associated | Posterior probability shared causal variant given joint association | Candidate Causal Genes/Genes in Region |
---|---|---|---|---|---|---|
1q24.3 | 170882016–171208336 | C | DC | DC:0.75 | DC:0.95 | FASLG |
2p14 | 65246601–65570598 | R | RM | RM:0.86 | RM:0.72 | SPRED2 |
2q11.2 | 99883120–100415547 | DR | DRC | DC:0.98 RC:1.00 | DC:0.57 RC:0.90 | AFF3 |
2q37.1 | 230758228–230962304 | M | CM | CM:0.94 | CM:0.90 | SP140 |
5q11.2 | 55450712–55492884 | RM | DRM | DR:0.71 DM:0.71 | DR:1.00 DM:1.00 | ANKRD55 |
6q23.3 | 137914792–138345363 | RCM | DRC—M | DR:0.80 DC:0.77 | DR:0.94 DC:0.93 | TNFAIP3 |
7p14.2 | 37323488–37406978 | CM | RCM | RC:0.80 RM:0.77 | RC:0.84 RM:0.83 | ELMO1 |
7p12.2 | 50222360–50335957 | M | DM | DM:0.73 | DM:0.70 | 5′ IKZF1* region |
13q32.3 | 98723872–99034738 | DM | D—C | DC:0.67 | DC:0.00 | GPR183 |
15q25.1 | 76773859–77050416 | DM | DC | DC:0.82 | DC:0.99 | CTSH |
19p13.2 | 10081000–11019034 | DRM | DRM—C | DC:0.87 RC:0.87 CM:0.88 | DC:0.40 RC:0.46 CM:0.57 | ICAM1 ICAM3 TYK2 |
An association of T1D in a region 3′ of IKZF1, for which it is hypothesised that IKZF1 is the candidate causal gene is already known33 (see Table 1). The novel association we report here is in a region 5′ of IKZF1, and independent of the established association. Note we provide coordinates of the region, and not an index SNP as is conventional in gwas studies because the method synthesises information across the whole region and does not, in most cases, highlight a single SNP responsible for the association.
In regions with colocalizing novel associations, effect sizes tended to be smaller in the new disease (Fig. 2). This could indicate that the stronger effect is in the previously known association, or it could be due to Winner’s Curse,15 with the previously known associations displaying inflated effect size estimates. In general for colocalized signals, the coefficient of proportionality is centered about 1.
One novel association found was in the chromosome 1q24.3 region, known to be associated with CEL and containing candidate gene FASLG. Pathway analysis also produced evidence for a T1D–associated variant here,16 although no SNP has reached the genome–wide significance threshold. Our results support a shared causal variant for T1D and CEL (posterior probability 0.71). Our Bayesian approach also enables fine–mapping when dense genotyping data are available, as is the case here. We identified a single likely causal variant lying in a region with strong evidence of predicted regulatory activity, rs78037977 (Supplementary Fig. 5), with a posterior probability of being causal amongst all genotyped variants, given the colocalization hypothesis, of 0.99. Note that rs78037977 was removed from the CEL data in the original analysis17 owing to failing a missingness check (the call rate of 99.942% was just below the 99.95% cut–off). Plots of the signal clouds for our samples at this SNP are given in Supplementary Figure 6. The clustering shown here is of good quality, implying that the rs78037977 genotype can be considered reliable.
Prior Sensitivity
We tested prior sensitivity by varying p12 (the probability that an arbitrary SNP is associated with both diseases) from p12 = 10−5 to 10−7, while keeping p1 and p2 (the probability that this SNP is associated with only trait 1, or only trait 2) constant at 10−4 (Supplementary Table 4). Whether a region is disease specific is largely unaffected by choice of p12 and, for the five regions discussed in detail in this paper (1q24.3/FASLG; 2q33.1/CTLA4; 6q23.3/TNFAIP3; 6q25.3/TAGAP and 19p13.2/TYK2), the prior does not change which diseases are associated. However, the posterior odds for does vary with p12. Under p12 = 10−7, neither 1q24.3/FASLG nor 6q23.3/TNFAIP3 had strong posterior support as a novel T1D region since the evidence for novel association in these regions comes about due to colocalization with the stronger previously known association. This dependence on prior belief is a strength of Bayesian methods, but they require that priors be carefully calibrated. Whilst our prior belief is that about 50% of regions associated with two immune–mediated diseases are likely to correspond to a shared causal variant, others may disagree. The results given in Supplementary Table 2 can be used to calculate the posterior under any alternative p12 using the formula given in Supplementary Material.
Discussion
Colocalization methods so far have allowed for the simultaneous analysis of only two traits: a potential weakness when considering more than two diseases, as investigated here. The Bayesian approach could be extended to arbitrarily many traits, at the cost of increased computational complexity and spreading the posterior over an exponentially increasing hypothesis space, potentially making it difficult to draw firm conclusions. Wen et al, in their description of an alternative method for partitioning the association of a single SNP amongst multiple related quantitative traits,18 suggest dealing with this complexity by considering only the extremes – a SNP is associated to all traits, exactly one, or none. Such reduction is impractical when analyzing regions, since it does not allow for overlapping but distinct signals. Although we have extended our software to consider three diseases simultaneously, we have chosen for practical reasons to focus on pairwise analyses with manual curation of the 11 cases (9%) for which more than two diseases showed association.
Giambartolomei et al5 showed that inference is consistent when the causal variant is directly genotyped or well imputed. The decision was taken when the ImmunoChip was designed not to thin by LD, but instead target all SNPs and small indels known at that time in 1000 Genomes European samples and it has since been shown that common variants may be very accurately imputed using ImmunoChip.19 Therefore we are likely to be very close to the situation where causal variants are directly genotyped. The application of our method to the less complete coverage provided by genome–wide SNP arrays would require an imputation step to allow consistent inference to be made. The Bayesian colocalization analysis assumes a single causal variant per region, which could be restrictive, and we addressed this using a stepwise approach, attempting to colocalize the individual signals for each disease where there was evidence for more than one. The agreement between our results with this approach and using the proportional colocalization approach which does not make this assumption confirms the appropriateness of the stepwise approach in the cases we consider.
We identified 21 regions that appeared associated to only one autoimmune disease. One challenge in interpretation when defining disease unique signals is exemplified by a region on chromosome 7p12.2 which contains the candidate causal gene IKZF1. This gene overlaps two ImmunoChip regions separated by a recombination hotspot, one 5′ of IKZF1 and one 3′ of IKZF1. The 5′ region contains a colocalized signal for MS and T1D, whilst the 3′ region contains only a T1D signal (Supplementary Fig. 7). Our analysis has been based on regions, as defined in the design of the ImmunoChip and based on recombination hot spots. However, whilst the T1D signals in these regions are independent and the 3′ region of IKZF1 appears unique to T1D, it is plausible that the causal variants in both regions act through the same gene, IKZF1. Another challenge is to deal with the effects of power, given the established influence of sample size on power to detect associations.20 Many of the regions in Table 1 contain genes linked to immune function, and we expect a number of apparent disease–specific results to associate with other diseases as sample sizes for each disease continue to increase. Indeed, the chromosome 19p13.11 region, associated only with MS in our analysis, has previously been associated with lymphocyte count,21 with high LD between the peak MS SNP (rs1870071) and the lymphocyte count SNP (rs11878602, r2 = 0.99), suggesting an immune mechanism for the association.
However, in the case of T1D, three disease–unique regions overlap known type 2 diabetes (T2D) regions. Chromosome 9p24.2, containing the candidate gene GLIS3, has been associated with T2D22 and fasting glucose23 with high LD between the peak SNP for T1D (rs10814914) and these other traits (rs7041847, r2 > 0.9). GLIS3 and its causal allele alter disease risk by altering pancreatic beta–cell function, probably by increasing beta–cell apoptosis.24 Chromosome 16q23.1, containing the candidate gene BCAR1, is associated with T1D in our analysis and T2D,22 and the T2D alleles in this region have been associated with reduced beta cell function,25 again with high LD between the peak SNPs for T1D (rs8056814) and T2D (rs7202877, r2 = 0.81). Inspecting the distribution of T2D GWAS p values at the peak SNPs in our T1D associated regions (Supplementary Fig. 8), we note that the peak SNP in the T1D associated region 6q22.32, rs17754780, also shows association to T2D (p = 7.9 × 10−5) and is in tight LD with peak T2D SNP in the region (rs9385400, r2 = 0.97). This region has been reported as associated with T2D at genome–wide significance in a larger study.26 Chromosome 6q22.3 is not uniquely associated to T1D in our analysis because it overlaps an established Crohn’s disease region,27 but the lead Crohn’s SNP (rs9491697) is not in LD with the T1D SNP (r2 = 0.03). This is then likely to be a third shared signal between T1D and T2D. The nearest genes are MIR588 about which little appears to be known and CENPW (centromere protein W) which has no obvious functional candidacy. This genetic overlap between T1D and T2D (Supplementary Table 5) emphasizes that T1D results from an interaction between the immune system and beta cells, and it is probable that some of our other apparent disease unique regions will also prove to be specific to the target of autoimmune destruction in MS and RA.
By analyzing regions known to associate with one disease, we were able to link 11 to additional disorders: in most cases (8/11) the novel disease association was clearly colocalized with a previously known signal, whilst in one case, GPR183, the evidence supported a distinct causal variant for the novel association. In others (3/11) the evidence for colocalization was more equivocal, even with evidence for pairwise association.
In a standard GWAS analysis, a p–value significance threshold of 5 × 10−8 is used in absence of replication data, due to a desire to minimise reporting of false positive results, although a relaxation of this threshold has been suggested.29 However, since autoimmune diseases are known to share etiology, conditioning upon association for one autoimmune disease, we should require a less stringent threshold to believe it significant for another. Indeed, whilst the question of whether the ImmunoChip significance threshold should be somewhat relaxed remains,8 examination of p–values in the regions in which we observe novel associations (Supplementary Fig. 9) suggests that a threshold between 10−5 and 10−6 for SNPs that are confirmed index SNPs for another disease might be more appropriate. We estimated that 42% of overlapping and genome–wide significant immune–mediated disease signals relate to distinct causal variants. In these regions, therefore, there appear to be distinct causal variants for two or more autoimmune diseases which are physically proximal but in low LD. We suggest that physical proximity to a known associated variant in a related disease, and not only LD with it, may prove to be an appropriate criterion with which to alter interpretation of a small but not genome–wide significance threshold. Variants meeting such thresholds might be prioritised for genotyping in replication samples. We note, also, that the four diseases we studied are all characterized by the presence of autoantibodies. Had we included autoantibody negative diseases we might have found a higher proportion of discordant associations as reported in a previous manual curation of ImmunoChip studies,8 given there remains considerable overlap in location of association signals. Although a careful and detailed manual curation of several studies has been conducted,8 the ability of colocalization methods to distinguish shared from distinct causal variants allows clearer interpretation of genetic results.
In summary, we have developed a methodology for examining shared genetic etiology between diseases in the case of common control datasets, extending previous work.2,3 This enables the discovery of new disease associations and the exploration of complex association patterns. Although these methods have been presented in this paper to analyze autoimmune diseases, the prior is user defined, and could be used to analyze any pair of related diseases.
Online Methods
Samples
All samples included in this analysis were gathered in the United Kingdom, and have reported or self declared European ancestry. Informed consent was obtained from all subjects. Detailed summaries of the sample cohorts are given in the ImmunoChip papers for CEL,17 RA,9 MS10 and T1D.30 For the RA and MS cases, we used the subset of cases from the UK. Sample exclusions were applied as described in each paper, and in total, 6,691 T1D, 3,870 RA, 7,987 CEL, 5,112 MS and 12,370 control samples were analyzed. SNPs were filtered according to the following criteria: call rate > 0.99; minor allele frequency > 0.01; Hardy–Weinberg ∣Z∣ < 5. SNPs which passed these thresholds in controls and any specific pair of cases were used for that pairwise analysis.
Using only UK cases and controls means that we expect any effects of population stratification to be very limited, as evidenced by the low genomic inflation factors in published UK ImmunoChip analyses31 and we did not take any further specific actions to limit effects of population stratification.
Selection of Regions for Analysis
We considered all regions annotated in ImmunoBase (accessed on 12/11/13) as associated with at least one of our diseases. Where regions overlapped, we formed the union. Regions containing fewer than ten SNPs or with a SNP density < 1 SNP/kb were excluded. The MHC (chr6:29797978–33606563 hg18) was removed from the analysis, since this region is known to have complex multi–SNP effects. A full list of the 126 regions analyzed, together with our resulting associations, can be found in Supplementary Table 1.
Colocalization Analysis
Two colocalization methods were applied to each of the 126 regions (see Supplementary Fig. 1).
Bayesian Approach
The first approach is based upon a Bayesian approach proposed by Giambartolomei et al.5 All models in which each trait is caused by at most one variant are considered, and approximate Bayes factors computed for each. Our extension follows the same framework, but, in order to extend this method to the case of a common control, a multinomial model was used. Bayes factors were computed using a Laplace approximation32 as implemented in the R package mlogitBMA. Each of these models is contained within precisely one of the following sets:
: No SNP is associated with either trait
: There is a SNP associated with trait 2, but no SNP is associated with trait 1.
: There is a SNP associated with trait 1, but no SNP is associated with trait 2.
: Both the traits are associated with the region, with different causal variants.
: Both the traits are associated with the region, and share a single causal variant.
By summing the Bayes factors generated for all models in the set, a posterior possibility can be computed for each of the hypotheses, and hence for colocalization (). Similarly, the posterior probability of any given model, given a specific hypothesis and equal prior probability of each model, is proportional to the Bayes factor for that model. Since a Bayes factor is assigned to each model independently, it is straightforward to calculate the conditional probability of each SNP being causal, given association, as proportional to the Bayes factor for the relevant model.
This approach assumes a single causal variant at any region. In regions with strong evidence of association we performed conditional analysis. Firstly, all plausibly important SNPs were discovered by iteratively conditioning on the most likely set of SNPs to cause the associations seen, until there was no longer strong evidence of additional association. In those cases where multiple SNPs were considered relevant, all but a pair (one potentially causal for the first trait, and one for the second) were conditioned upon, in order to discover the colocalization (or not) of the effects at this pair alone.
Proportional Approach
A second method based upon the proportional approach2,3 was also used. Phenotypes are modeled using multinomial logistic regression, producing maximum likelihood estimates b1 and b2 of regression coefficients β1 and β2. Since the samples sizes can be large, the asymptotic normality of maximum likelihood estimators is used to approximate:
for some variance–covariance matrix V.
The proportional method3 assumes that b1, b2 are independent (i.e. V 12 = V 21 = O). However in the extension to a common control dataset, we cannot assume this, and proceed with a fully unknown V.
The null hypothesis corresponds to the existence of a constant η such that:
Under this hypothesis, and given η,
This is used as our test statistic. However, since the value of η was unknown, a posterior predictive p–value is generated instead, by integrating the p–values associated with the test statistic over the posterior distribution of η. To avoid bias in regression coefficients due to selection of SNPs on the basis of their strength of association, Bayesian model averaging was used to average inference over all plausible two SNP models.
Further details of the colocalization methods can be found in the Supplementary Methods section.
Identification of disease specific regions
To examine evidence for GWAS association with other traits, we took the index SNP with smallest p values in a region, and then identified proxy SNPs based on LD (r2 ≥ 0.9) using 1000 genomes EUR data. We used this as a query SNP set to examine associations annotated in the NIHR GWAS catalog (accessed 07/10/2014)
We identified disease specific regions for which: the posterior probability for single SNP association was >0.5; posterior probability of association with any other disease was <0.2; the region was not annotated as associated with any other autoimmune disease in ImmunoBase; and no proxies for the index SNP were associated with any other autoimmune disease in the NIHR GWAS catalog.
Type 2 diabetes data
Summary from a T2D GWAS meta–analysis22 was downloaded from the DIAGRAM website (accessed 20/10/14).
Code availability
The code used is given in the colocCommonControl R package, which can be found at https://github.com/mdfortune/colocCommonControl
Supplementary Material
Acknowledgments
MF is funded by the Wellcome Trust (099772). CW and HG are funded by the Wellcome Trust (089989).
This work was funded by the JDRF (9–2011–253), the Wellcome Trust (091157) and the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre. The Cambridge Institute for Medical Research (CIMR) is in receipt of a Wellcome Trust Strategic Award (100140). ImmunoBase.org is supported by Eli Lilly and Company.
We thank the UK Medical Research Council and Wellcome Trust for funding the collection of DNA for the British 1958 Birth Cohort (MRC grant G0000934, WT grant 068545/Z/02). DNA control samples were prepared and provided by S. Ring, R. Jones, M. Pembrey, W. McArdle, D. Strachan and P. Burton.
This research utilizes resources provided by the Type 1 Diabetes Genetics Consortium, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID), National Human Genome Research Institute (NHGRI), National Institute of Child Health and Human Development (NICHD), and the JDRF and supported by U01 DK062418.
Collection of the rheumatoid arthritis data was funded by the Arthritis Foundation and the US National Institutes Health.
We are grateful to G. Trynka and D. van Heel for use of the celiac disease data. Funding was provided by the Wellcome Trust, by grants from the celiac disease Consortium and an Innovative Cluster approved by the Netherlands Genomics Initiative, by the Dutch Government (BSIK03009 to C. Wijmenga) and the Netherlands Organisation for Scientific Research (NWO, grant 918.66.620) and by the US National Institutes of Health grant 1R01CA141743 and Fondo de Investigacin Sanitaria grants FIS08/1676 and FIS07/0353.
Multiple sclerosis data was provided by S. Sawcer and the International Multiple Sclerosis Genetics Consortium. Funding was provided by the US National Institutes of Health, the Wellcome Trust, the UK MS Society, the UK Medical Research Council, the US National MS Society, the Cambridge National Institute for Health Research (NIHR) Biomedical Research Centre, DeNDRon, the Bibbi and Niels Jensens Foundation, the Swedish Brain Foundation, the Swedish Research Council, the Knut and Alice Wallenberg Foundation, the Swedish Heart–Lung Foundation, the Foundation for Strategic Research, the Stockholm County Council, Karolinska Institutet, INSERM, Fondation d’Aide pour la Recherche sur la Sclrose en Plaques, Association Franaise contre les Myopathies, Infrastrutures en Biologie Sant et Agronomie (GIS–IBISA), the German Ministry for Education and Research, the German Competence Network MS, Deutsche Forschungsgemeinschaft, Munich Biotec Cluster M4, the Fidelity Biosciences Research Initiative, Research Foundation Flanders, Research Fund KU Leuven, the Belgian Charcot Foundation, Gemeinntzige Hertie Stiftung, University Zurich, the Danish MS Society, the Danish Council for Strategic Research, the Academy of Finland, the Sigrid Juselius Foundation, Helsinki University, the Italian MS Foundation, Fondazione Cariplo, the Italian Ministry of University and Research, the Torino Savings Bank Foundation, the Italian Ministry of Health, the Italian Institute of Experimental Neurology, the MS Association of Oslo, the Norwegian Research Council, the South–Eastern Norwegian Health Authorities, the Australian National Health and Medical Research Council, the Dutch MS Foundation and Kaiser Permanente.
Marina Evangelou is thanked for motivating the investigation of the FASLG association.
Footnotes
Competing Financial Interests
The JDRF/Wellcome Trust Diabetes and Inflammation Laboratory receives funding from Hoffmann La Roche and Eli Lilly and Company. ImmunoBase, for which O.B. is a principal investigator ,is funded in part by Eli Lilly and Company.
URLs
ImmunoBase http://www.immunobase.org
mlogitBMA R package http://cran.r–project.org/web/packages/mlogitBMA/index.html
NIHR GWAS catalog http://www.genome.gov/admin/gwascatalog.txt
DIAGRAM http://diagram–consortium.org/
References
- 1.Cotsapas C, et al. Pervasive sharing of genetic effects in autoimmune disease. PLoS Genet. 2011;7 doi: 10.1371/journal.pgen.1002254. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Plagnol V, Smyth DJ, Todd JA, Clayton DG. Statistical independence of the colocalized association signals for type 1 diabetes and RPS26 gene expression on chromosome 12q13. Biostatistics. 2009;10:327–334. doi: 10.1093/biostatistics/kxn039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Wallace C, et al. Statistical colocalization of monocyte gene expression and genetic risk variants for type 1 diabetes. Hum Mol Genet. 2012;44:1–35. doi: 10.1093/hmg/dds098. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Wallace C. Statistical testing of shared genetic control for potentially related traits. Genet Epidemiol. 2013;37:802–813. doi: 10.1002/gepi.21765. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Giambartolomei C, et al. Bayesian test for colocalization between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Smyth DJ, et al. Shared and distinct genetic variants in type 1 diabetes and celiac disease. The New England journal of medicine. 2008;359:2767–2777. doi: 10.1056/NEJMoa0807917. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Cortés A, Brown MA. Promise and pitfalls of the Immunochip. Arthritis research & therapy. 2011;13:101. doi: 10.1186/ar3204. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Parkes M, Cortés A, van Heel DA, Brown MA. Genetic insights into common pathways and complex relationships among immune–mediated diseases. Nat Rev Genet. 2013;14:661–673. doi: 10.1038/nrg3502. [DOI] [PubMed] [Google Scholar]
- 9.Eyre S, et al. High–density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat Genet. 2012;44:1336–40. doi: 10.1038/ng.2462. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Beecham AH, et al. Analysis of immune–related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet. 2013;45:1353–60. doi: 10.1038/ng.2770. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Mero IL, et al. A rare variant of the TYK2 gene is confirmed to be associated with multiple sclerosis. Eur J Hum Genet. 2010;18:502–4. doi: 10.1038/ejhg.2009.195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Barrett JC, et al. Genome–wide association study and meta–analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet. 2009;41:703–7. doi: 10.1038/ng.381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Ueda H, et al. Association of the T–cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature. 2003;423:506–11. doi: 10.1038/nature01621. [DOI] [PubMed] [Google Scholar]
- 14.Fung EYMG, et al. Analysis of 17 autoimmune disease–associated variants in type 1 diabetes identifies 6q23/TNFAIP3 as a susceptibility locus. Genes and immunity. 2009;10:188–91. doi: 10.1038/gene.2008.99. [DOI] [PubMed] [Google Scholar]
- 15.Ioannidis JPA. Why most discovered true associations are inflated. Epidemiology. 2008;19:640–648. doi: 10.1097/EDE.0b013e31818131e7. (Cambridge, Mass) [DOI] [PubMed] [Google Scholar]
- 16.Evangelou M, et al. A method for gene–based pathway analysis using genomewide association study summary statistics reveals nine new type 1 diabetes associations. Genet Epidemiol. doi: 10.1002/gepi.21853. (in press) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Trynka G, et al. Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nat Genet. 2012;43:1193–1201. doi: 10.1038/ng.998. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Flutre T, Wen X, Pritchard J, Stephens M. A Statistical Framework for Joint eQTL Analysis in Multiple Tissues. PLoS Genet. 2013;9 doi: 10.1371/journal.pgen.1003486. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Deelen P, et al. Improved imputation quality of low–frequency and rare variants in European samples using the Genome of The Netherlands. 2014. [DOI] [PMC free article] [PubMed]
- 20.Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Nalls MA, et al. Multiple loci are associated with white blood cell phenotypes. PLoS Genet. 2011;7:e1002113. doi: 10.1371/journal.pgen.1002113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Morris AP, et al. Large–scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet. 2012;44:981–90. doi: 10.1038/ng.2383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Dupuis J, et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet. 2010;42:105–16. doi: 10.1038/ng.520. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Nogueira TC, et al. GLIS3, a susceptibility gene for type 1 and type 2 diabetes, modulates pancreatic beta cell apoptosis via regulation of a splice variant of the BH3–only protein Bim. PLoS Genet. 2013;9:e1003532. doi: 10.1371/journal.pgen.1003532. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Harder MN, et al. Type 2 diabetes risk alleles near BCAR1 and in ANK1 associate with decreased β–cell function whereas risk alleles near ANKRD55 and GRB14 associate with decreased insulin sensitivity in the Danish Inter99 cohort. The Journal of clinical endocrinology and metabolism. 2013;98:E801–6. doi: 10.1210/jc.2012-4169. [DOI] [PubMed] [Google Scholar]
- 26.Scott RA, et al. Genome–wide association study imputed to 1000 genomes reveals 18 novel associations with type 2 diabetes. American Society of Human Genetics. 2014 [Google Scholar]
- 27.Jostins L, et al. Host–microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491:119–24. doi: 10.1038/nature11582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Sawcer S, et al. Genetic risk and a primary role for cell–mediated immune mechanisms in multiple sclerosis. Nature. 2011;476:214–9. doi: 10.1038/nature10251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Panagiotou OA, Ioannidis JPA. What should the genome–wide significance threshold be? Empirical replication of borderline genetic associations. Int J Epidemiol. 2012;41:273–86. doi: 10.1093/ije/dyr178. [DOI] [PubMed] [Google Scholar]
Methods-Only References
- 30.Onengut–Gumuscu S, et al. Fine Mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat Genet. 2014 doi: 10.1038/ng.3245. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Cooper JD, et al. Seven newly identified loci for autoimmune thyroid disease. Hum Mol Genet. 2012;21:5202–8. doi: 10.1093/hmg/dds357. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Raftery AE. Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika. 1996;83:251–266. [Google Scholar]
- 33.Swafford ADE, et al. An allele of IKZF1 (Ikaros) conferring susceptibility to childhood acute lymphoblastic leukemia protects against type 1 diabetes. Diabetes. 2011;60:1041–4. doi: 10.2337/db10-0446. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.