Abstract
We tested 310,605 single-nucleotide polymorphisms (SNPs) for association in 778 celiac disease cases and 1422 controls. Outside the HLA region, the most significant finding (rs13119723, P=2.0 × 10⁻⁷; empirical genome-wide significance P=0.045) was in the KIAA1109/Tenr/IL2/IL21 linkage disequilibrium block. Association was independently confirmed in two further collections (strongest at rs6822844, 24 kb 5′ of IL21; meta-analysis P=1.3 × 10⁻¹⁴, OR 0.63), suggesting that genetic variation in this region predisposes to celiac disease.
Celiac disease is a common (1% prevalence) inflammatory condition of the small intestine, induced by dietary wheat, rye and barley. However, despite high heritability (estimated at 87% from twin studies¹), no non-HLA genetic factors have been identified and convincingly replicated. Most patients with celiac disease carry HLA-DQ2 (the remainder mostly carry HLA-DQ8²), and how HLA-DQ2 presents cereal peptides to intestinal T cells is understood³. However, HLA-DQ2 is also common in healthy individuals, demonstrating that it is necessary but not sufficient for disease development.
We therefore designed a genome-wide association study to identify genetic factors predisposing to celiac disease. We genotyped samples using Illumina BeadChips (Supplementary Methods). After quality control, association analysis was performed on 310,605 SNPs with minor allele frequency >1%, genotyped in 778 UK celiac cases and 1422 UK population controls (Supplementary Table 1). The overall SNP call rate was 99.87%. Single-SNP association statistics are presented in Supplementary Figure 1.
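The per-SNP quality-control filter described above can be sketched as follows. This is an illustration only: the `SnpCounts` container and `passes_qc` helper are hypothetical names, the MAF >1% threshold comes from the text, and the 95% call-rate cutoff is a placeholder assumption (the actual QC criteria are in the Supplementary Methods).

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Hypothetical container for genotype counts at one biallelic SNP."""
    rsid: str
    hom_ref: int  # individuals with 0 copies of the alternate allele
    het: int      # individuals with 1 copy
    hom_alt: int  # individuals with 2 copies
    missing: int  # no-calls

    @property
    def called(self) -> int:
        return self.hom_ref + self.het + self.hom_alt

    def call_rate(self) -> float:
        total = self.called + self.missing
        return self.called / total if total else 0.0

    def minor_allele_frequency(self) -> float:
        if self.called == 0:
            return 0.0
        alt = (2 * self.hom_alt + self.het) / (2 * self.called)
        return min(alt, 1.0 - alt)


def passes_qc(snp: SnpCounts, min_maf: float = 0.01, min_call_rate: float = 0.95) -> bool:
    # MAF > 1% is stated in the text; the call-rate threshold is an assumed placeholder.
    return snp.minor_allele_frequency() > min_maf and snp.call_rate() >= min_call_rate


print(passes_qc(SnpCounts("snp_a", 900, 95, 5, 0)))  # MAF 5.25%, fully called
```

Note that MAF is computed only over called genotypes, so the two filters are independent: a SNP can fail on call rate even when its MAF is comfortably above 1%.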
Highly significant association was seen across the HLA region, as expected. Association was strongest at rs2187668, which maps to the first intron of HLA-DQA1 (χ²=769.1, P<10⁻¹⁹; A allele frequency 13.8% in controls and 53.1% in cases; odds ratio (OR) 7.04 [95% CI 6.08–8.15]). When compared with classical HLA typing (Supplementary Methods), the rs2187668-A allele efficiently tagged HLA-DQ2.5cis (r²=0.97; Supplementary Table 2). HLA-DQ2.5cis, in which the two chains of the DQ2 heterodimer are encoded on the same chromosome, is the HLA-DQ2 haplotype most commonly associated with celiac disease. One or two copies of HLA-DQ2.5cis (inferred from rs2187668 genotype) were present in 89.2% of UK celiac patients versus 25.5% of population controls. To identify other HLA variants that predispose to disease in the presence or absence of HLA-DQ2.5cis, we performed further analyses stratified by rs2187668 genotype. Among rs2187668-AG individuals (558 cases, 331 controls), association peaked at rs9357152 (P=5.2 × 10⁻¹⁴); among rs2187668-GG individuals (83 cases, 1059 controls), it peaked at rs9275141 (P=3.9 × 10⁻¹⁶). Numbers of rs2187668-AA controls (n=31) were too small for analysis. The finding that rs2187668, rs9275141 and rs9357152 all map within or adjacent to HLA-DQA1 and HLA-DQB1 underscores the critical role of HLA-DQ2/8 in antigen presentation in celiac disease.
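As a worked check of the carrier and allele frequencies, take the control genotype counts at rs2187668 as 331 AG and 1059 GG (both stated above) plus 31 AA, the AA value consistent with the reported 13.8% allele frequency and 25.5% carrier frequency. The sketch treats carriage of rs2187668-A as a proxy for carrying HLA-DQ2.5cis, as the text does; the function name is hypothetical.

```python
def allele_and_carrier_freq(n_aa: int, n_ag: int, n_gg: int) -> tuple[float, float]:
    """Return (A-allele frequency, fraction carrying at least one A allele)."""
    n = n_aa + n_ag + n_gg
    allele_a = (2 * n_aa + n_ag) / (2 * n)  # each AA contributes 2 A alleles, each AG 1
    carriers = (n_aa + n_ag) / n            # one or two copies, i.e. inferred DQ2.5cis carriers
    return allele_a, carriers


# Control genotype counts at rs2187668 (AA value as described in the lead-in).
controls = {"AA": 31, "AG": 331, "GG": 1059}
a_freq, carrier = allele_and_carrier_freq(controls["AA"], controls["AG"], controls["GG"])
print(f"A allele {a_freq:.1%}, carriers {carrier:.1%}")
```

Stratifying by rs2187668 genotype then simply means repeating the association test within each of the AG and GG groups, so that any residual HLA signal cannot be driven by DQ2.5cis dosage itself.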
Outside the HLA region, we observed more significant SNPs than expected by chance, with 56 SNPs showing association at P<10⁻⁴ (Supplementary Table 3). Many of these SNPs lie close to one another, suggesting that part of the excess of low P values may reflect true disease associations detected by multiple SNPs in linkage disequilibrium with nearby disease variants. We therefore prioritised these findings for rapid replication (interim results in Supplementary Table 3), whilst designing a more extensive SNP replication study. We noted weak evidence for association in the previously reported CD28-CTLA4-ICOS region⁴ (rs4675374 P=0.007, rs11681040 P=0.008) but not in the MYO9B region⁵.
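A rough way to see why 56 SNPs at P<10⁻⁴ is an excess: if all 310,605 tests were independent and null, about 31 would be expected to pass that threshold by chance, and a Poisson approximation gives the probability of seeing 56 or more. This is a naive sketch under stated assumptions: the 310,605 count includes HLA SNPs, and linkage disequilibrium makes tests correlated (inflating the variance), which is exactly why the text attributes part of the excess to clusters of nearby SNPs.

```python
import math


def expected_null_hits(n_tests: int, alpha: float) -> float:
    """Expected number of tests with P < alpha if every null hypothesis is true."""
    return n_tests * alpha


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), as 1 minus the lower CDF."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


lam = expected_null_hits(310_605, 1e-4)
tail = poisson_upper_tail(56, lam)
print(f"expected ~{lam:.1f} SNPs at P<1e-4; P(>= 56 | independent nulls) ~ {tail:.1e}")
```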
The most significant non-HLA finding was rs13119723 (P=2.0 × 10⁻⁷; G allele frequency 15.8% in controls and 10.1% in cases). Permutation of case-control labels demonstrated genome-wide significance: in 9 of 200 permutations (P=0.045), the most significant permuted P value was ≤2.0 × 10⁻⁷. The location of rs13119723 close to IL2 and IL21 made this region a highly plausible celiac disease candidate. We found no evidence for statistical interaction between rs13119723 genotype and inferred HLA-DQ2.5cis genotype (P>0.20). We then confirmed the association of rs13119723 with celiac disease in two independent collections (Table 1). The G allele of rs13119723 was more common in controls in each collection, and meta-analysis of all 4,680 samples established highly significant disease association at rs13119723 (P=4.8 × 10⁻¹¹).
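The genome-wide empirical P value above compares the observed best statistic with the best statistic obtained after shuffling case-control labels. A minimal sketch of that max-statistic permutation is below, assuming an allelic 1-df χ² per SNP (maximising χ² is equivalent to minimising P). Genotypes are hypothetical 0/1/2 allele dosages, and the empirical P is computed as hits/permutations to match the 9/200 convention in the text; (hits+1)/(permutations+1) is a common, more conservative alternative. All names are illustrative.

```python
import random


def allelic_chi2(dosages: list[int], labels: list[int]) -> float:
    """1-df allelic chi-square from 0/1/2 allele dosages and 0/1 (control/case) labels."""
    a = b = c = d = 0
    for g, y in zip(dosages, labels):
        if y == 1:
            a += g      # tested allele in cases
            b += 2 - g  # other allele in cases
        else:
            c += g      # tested allele in controls
            d += 2 - g  # other allele in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi2(genotypes: list[list[int]], labels: list[int]) -> float:
    """Largest allelic chi-square across all SNPs (rows of genotypes)."""
    return max(allelic_chi2(snp, labels) for snp in genotypes)


def genome_wide_empirical_p(genotypes, labels, observed_chi2, n_perm=200, seed=1):
    """Fraction of label permutations whose best SNP is at least as extreme as observed."""
    rng = random.Random(seed)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break genotype-phenotype link, keep genotype correlation (LD)
        if max_chi2(genotypes, perm) >= observed_chi2:
            hits += 1
    return hits / n_perm
```

Permuting labels rather than genotypes preserves the linkage disequilibrium between SNPs, so the null distribution of the maximum automatically accounts for correlated tests.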
Table 1. Chromosome 4q27 markers in the UK genome-wide association scan and in the Dutch and Irish replication collections. Case and control columns give allele frequencies; blank cells are not reported.

| SNP | UK cases (n=778) | UK controls (n=1422) | UK P | Dutch cases (n=508) | Dutch controls (n=929) | Dutch P | Irish cases (n=483) | Irish controls (n=560) | Irish P | Meta-analysis OR [95% CI] | Meta-analysis P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs6835946 | 30.3% | 29.6% | 0.63 | 25.1% | 25.5% | 0.81 | 31.4% | 33.5% | 0.32 | | |
| rs11938795 | 21.2% | 26.3% | 0.00017 | | | | | | | | |
| rs4374642 | 10.5% | 8.4% | 0.020 | 12.1% | 8.4% | 0.0015 | 8.3% | 6.7% | 0.17 | | |
| rs13151961 | 12.6% | 17.9% | 5.2 × 10⁻⁶ | 12.2% | 19.1% | 2.2 × 10⁻⁶ | 14.8% | 19.4% | 0.0056 | 0.65 [0.58–0.73] | 1.3 × 10⁻¹² |
| rs4505848 | 34.4% | 32.3% | 0.15 | 37.6% | 34.9% | 0.15 | 31.3% | 27.8% | 0.084 | | |
| rs4288027 | 10.4% | 8.4% | 0.02 | | | | | | | | |
| rs7683061 | 40.6% | 38.0% | 0.090 | | | | | | | | |
| rs13119723 | 10.1% | 15.8% | 2.0 × 10⁻⁷ | 11.5% | 16.4% | 0.00042 | 12.8% | 16.2% | 0.030 | 0.66 [0.58–0.74] | 4.8 × 10⁻¹¹ |
| rs11734090 | 21.5% | 26.5% | 0.00023 | | | | | | | | |
| rs7699742 | 34.9% | 32.2% | 0.066 | | | | | | | | |
| rs1127348 | 24.8% | 21.6% | 0.016 | 25.7% | 24.0% | 0.34 | 21.7% | 16.9% | 0.0053 | | |
| rs7678445 | 9.9% | 7.7% | 0.013 | | | | | | | | |
| rs7684187 | 25.2% | 30.1% | 0.00056 | | | | | | | | |
| rs11732095 | 8.8% | 8.6% | 0.79 | 9.9% | 9.1% | 0.45 | 8.5% | 7.7% | 0.50 | | |
| rs716501 | 34.8% | 32.2% | 0.087 | | | | | | | | |
| rs10857092 | 5.2% | 6.6% | 0.069 | 7.8% | 7.3% | 0.60 | 7.0% | 7.7% | 0.53 | | |
| rs6848139 | 10.2% | 8.5% | 0.050 | | | | | | | | |
| rs6852535 | 29.7% | 29.4% | 0.84 | | | | | | | | |
| rs12642902 | 28.5% | 34.6% | 3.3 × 10⁻⁵ | | | | | | | | |
| rs6822844 | 12.6% | 17.9% | 4.6 × 10⁻⁶ | 12.4% | 18.5% | 2.1 × 10⁻⁵ | 14.2% | 19.7% | 0.0013 | 0.63 [0.57–0.71] | 1.3 × 10⁻¹⁴ |
| rs4492018 | 26.6% | 26.5% | 0.93 | 22.7% | 23.1% | 0.80 | 29.1% | 30.2% | 0.60 | | |
| rs975405 | 38.6% | 42.8% | 0.0071 | 41.5% | 43.3% | 0.34 | 39.9% | 43.9% | 0.070 | | |
| rs7682241 | 36.6% | 34.4% | 0.13 | | | | | | | | |
| rs17005931 | 26.5% | 26.2% | 0.82 | | | | | | | | |
| rs1398553 | 34.1% | 30.6% | 0.016 | 35.2% | 32.6% | 0.18 | 30.6% | 25.7% | 0.015 | | |
| rs2893008 | 10.0% | 8.0% | 0.021 | 11.0% | 7.4% | 0.0011 | 7.3% | 6.5% | 0.50 | | |
| rs6840978 | 16.3% | 21.5% | 3.8 × 10⁻⁵ | 15.2% | 21.9% | 2.0 × 10⁻⁵ | 19.2% | 24.0% | 0.0083 | 0.70 [0.63–0.78] | 1.1 × 10⁻¹⁰ |
P values are from χ² tests of allele counts. All tests are two-tailed.
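The allelic χ² test named in the footnote can be sketched on a 2×2 table of allele counts, with the odds ratio's 95% CI from Woolf's log method. This is a sketch, not the authors' code: Table 1 gives only rounded frequencies, so counts are reconstructed approximately as frequency × 2n; whether a continuity correction was used is not stated, and none is applied. Function names are illustrative.

```python
import math


def allele_counts_from_freq(freq_cases, freq_controls, n_cases, n_controls):
    """Approximate 2x2 allele counts (a, b, c, d) from allele frequencies.

    a/b: tested/other allele in cases; c/d: tested/other allele in controls.
    """
    a = freq_cases * 2 * n_cases
    c = freq_controls * 2 * n_controls
    return a, 2 * n_cases - a, c, 2 * n_controls - c


def allelic_test(a, b, c, d):
    """Return (chi2, two-sided P, odds ratio, (lower, upper) 95% CI)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p, odds_ratio, (lo, hi)


# rs13119723, UK scan (Table 1): G allele 10.1% in 778 cases, 15.8% in 1422 controls.
chi2, p, or_, ci = allelic_test(*allele_counts_from_freq(0.101, 0.158, 778, 1422))
print(f"chi2={chi2:.1f}  P={p:.1e}  OR={or_:.2f} [{ci[0]:.2f}-{ci[1]:.2f}]")
```

With these rounded inputs the rs13119723 row gives χ² of roughly 27.5 and P on the order of 10⁻⁷, close to the reported 2.0 × 10⁻⁷; the small gap is what one would expect from reconstructing counts from rounded frequencies.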
The rs13119723 SNP maps to a region of strong linkage disequilibrium (Supplementary Fig. 2). We had genotyped 27 SNPs in this 4q27 region, spanning ∼480 kb from rs6835946 to rs6840978. In addition to rs13119723, four other SNPs showed association with celiac disease at P<10⁻⁴ in the UK dataset (Fig. 1, Table 1). We then genotyped rs6822844, rs13151961 and rs6840978 (all strongly correlated with rs13119723; Fig. 1) in the Dutch and Irish collections and replicated the UK associations (Table 1, Supplementary Table 4). The strongest association overall was at rs6822844, approximately 24 kb 5′ of IL21 (meta-analysis P=1.3 × 10⁻¹⁴, OR 0.63 [0.57–0.71]).
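The text does not say which method produced the meta-analysis odds ratios in Table 1. As an assumption, the sketch below uses the Mantel–Haenszel pooled OR, one standard way to combine 2×2 allele tables across collections while keeping each collection as its own stratum. Applied to the rounded rs6822844 frequencies, it should land near, but not exactly on, the published 0.63. Function names are illustrative.

```python
def allele_table(freq_cases, freq_controls, n_cases, n_controls):
    """Approximate allele counts (a, b, c, d); a/b cases, c/d controls (tested/other)."""
    a = freq_cases * 2 * n_cases
    c = freq_controls * 2 * n_controls
    return a, 2 * n_cases - a, c, 2 * n_controls - c


def mantel_haenszel_or(strata):
    """Pooled OR = sum(a*d/n) / sum(b*c/n) over per-collection 2x2 tables."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# rs6822844 allele frequencies and sample sizes from Table 1.
strata = [
    allele_table(0.126, 0.179, 778, 1422),  # UK
    allele_table(0.124, 0.185, 508, 929),   # Dutch
    allele_table(0.142, 0.197, 483, 560),   # Irish
]
pooled = mantel_haenszel_or(strata)
print(f"Mantel-Haenszel pooled OR for rs6822844: {pooled:.2f}")
```

Stratifying by collection prevents differences in allele frequency between the UK, Dutch and Irish populations from masquerading as association, which a naive pooling of all 4,680 samples into one table would not guarantee.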
Markers on the HumanHap300 BeadChip (Illumina) are haplotype-tagging SNPs. The 27 SNPs genotyped in the UK collection captured the common genetic variation in the ∼480 kb region highly efficiently (161 of 165 common HapMap phase I+II SNPs were tagged pairwise at r²>0.8 in the CEU population⁶; Supplementary Methods). Genotyping further markers in the UK collection was therefore unlikely to contribute substantial additional information.
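Pairwise tagging coverage of the kind quoted above (161 of 165 HapMap SNPs at r²>0.8) can be sketched as follows: for each target SNP, take its best r² with any genotyped SNP and count it as captured if that exceeds the threshold. The sketch assumes phased haplotypes coded 0/1 per allele; the haplotypes in the usage line are toy data, not HapMap, and the function names are hypothetical.

```python
def r_squared(h1: list[int], h2: list[int]) -> float:
    """LD r^2 between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(h1)
    p_a = sum(h1) / n
    p_b = sum(h2) / n
    p_ab = sum(x * y for x, y in zip(h1, h2)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom    # monomorphic SNPs get r^2 = 0


def tagging_coverage(tags, targets, threshold=0.8):
    """Return (captured, total): targets whose best r^2 with any tag exceeds threshold."""
    captured = sum(
        1 for t in targets if max(r_squared(t, g) for g in tags) > threshold
    )
    return captured, len(targets)


tags = [[0, 0, 1, 1, 0, 1]]
targets = [[0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
print(tagging_coverage(tags, targets))
```

Because r² is symmetric in allele coding, a target whose alleles are perfectly anti-correlated with a tag (the second toy target) is captured just as well as an identical one.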
Finer analysis of haplotype structure across the ∼480 kb region in the UK collection showed subdivision into two closely correlated haplotype blocks of ∼439 kb and ∼40 kb (using strict criteria⁷). The rs13119723-G allele was found on a single strongly associated haplotype in both blocks (Supplementary Fig. 2), with haplotype frequencies of 10.1% in cases and 15.3% in controls in the 439 kb block (P=2.1 × 10⁻⁶), and 16.3% in cases and 21.5% in controls in the 40 kb block (P=4.3 × 10⁻⁵). In the Dutch and Irish collections, we genotyped 10 additional SNPs (beyond the 4 SNPs already tested) to tag haplotypes with frequency >5%, and found similar haplotype structure and association across all three populations (Supplementary Table 4). Owing to extensive linkage disequilibrium, these analyses could not identify the causal variant in the 4q27 region. The population-specific genetic variance at the associated 4q27 markers (CEU HapMap data) is relatively high, suggesting possible selection in the Northern European population.
The celiac disease-associated 4q27 region contains three known protein-coding genes (NP_640336.1/Tenr, IL2 and IL21) and a predicted gene of unknown function (KIAA1109/Q6ZS70). We manually annotated the human genome sequence in the region (not shown) but did not identify further genes. IL-2, secreted in an autocrine fashion by antigen-stimulated T cells, is a key cytokine for T cell activation and proliferation. IL-21, another T cell-derived cytokine, enhances B, T and NK cell proliferation and interferon-γ production. Both cytokines are implicated in the mechanisms of other intestinal inflammatory diseases⁸,⁹. We examined expression profiles of the four genes across multiple cell and tissue types in the GNF SymAtlas database (Supplementary Methods). Tenr is expressed specifically in testis and is therefore an unlikely candidate for the causal celiac disease susceptibility gene. The function of KIAA1109 is largely unknown¹⁰, although it is widely expressed, as multiple splice variants, across many tissue types. We also examined gene expression specifically in duodenal tissue from controls and from individuals with celiac disease, both treated (normal histology) and untreated (villous atrophy). Tenr expression was mostly undetectable. No differences were seen between controls and treated celiac patients for KIAA1109, IL2 or IL21. In the presence of inflammation (untreated celiac disease), KIAA1109 and IL2 levels were modestly reduced and IL21 levels increased (Supplementary Fig. 3). The mouse region syntenic to human 4q27 (Idd3) determines susceptibility to multiple autoimmune diseases in the NOD mouse model, through a mechanism influencing IL-2 mRNA/protein levels and CD4⁺CD25⁺ regulatory T cell activity¹¹. However, further studies are required to identify the human celiac disease susceptibility gene in this region.
Our genome-wide association study has identified genetic variation in a linkage disequilibrium block encompassing the KIAA1109/Tenr/IL2/IL21 genes as a novel susceptibility factor for celiac disease. Beyond further investigation of this 4q27 region, the next steps in dissecting the genetic causes of celiac disease include larger-scale replication of other putative associations and additional genome-wide analyses (e.g. of copy number variation¹²).
Supplementary Material
Acknowledgments
We acknowledge funding from Coeliac UK; the Coeliac Disease Consortium (an innovative cluster approved by the Netherlands Genomics Initiative and partly funded by the Dutch Government, grant BSIK03009); the Netherlands Genomics Initiative (grant 050-72-425); the Netherlands Organization for Scientific Research (grant 901-04-219); Science Foundation Ireland; and the Wellcome Trust (GR068094MA Clinician Scientist Fellowship to D.A.v.H.; New Blood Fellowship to R.M.; support for the work of R.McG. and P.D.). The authors acknowledge use of DNA from the British 1958 Birth Cohort collection, funded by the UK Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02.
We thank C.A. Mein and the Barts and The London Genome Centre for advice and genotyping support; D. Simpkin, T. Dibling and C. Hand for genotyping (Sanger Institute); M.J. Caulfield for advice on study design; D.P. Kelsell for comments on the manuscript; D. Strachan and W.L. McArdle for 1958 Birth Cohort samples; A. Monsuur for patient recruitment, G. Meijer and J. Meijer for histology review, K. Duran for DNA extraction, H. van Someren for clinical database management (the Netherlands); A. Ryan, G. Turner, M. Abuzakouk, N. Kennedy, F. Stevens and C. O'Morain for patient and control recruitment and sample management (Ireland). We thank J. Loveland for EST annotation and checking. We thank the Wellcome Trust Centre for Human Genetics, University of Oxford for provision of computing facilities. We thank all celiac patients and controls for participating in this study.
References
- 1. Nistico L, et al. Gut. 2006;55:803–8. doi: 10.1136/gut.2005.083964.
- 2. Karell K, et al. Hum Immunol. 2003;64:469–77. doi: 10.1016/s0198-8859(03)00027-2.
- 3. van Heel DA, West J. Gut. 2006;55:1037–46. doi: 10.1136/gut.2005.075119.
- 4. Hunt KA, et al. Eur J Hum Genet. 2005;13:440–4. doi: 10.1038/sj.ejhg.5201357.
- 5. Monsuur AJ, et al. Nat Genet. 2005;37:1341–4. doi: 10.1038/ng1680.
- 6. Altshuler D, et al. Nature. 2005;437:1299–320.
- 7. Gabriel SB, et al. Science. 2002;296:2225–9. doi: 10.1126/science.1069424.
- 8. Sadlack B, et al. Cell. 1993;75:253–61. doi: 10.1016/0092-8674(93)80067-o.
- 9. Monteleone G, et al. Gastroenterology. 2005;128:687–94. doi: 10.1053/j.gastro.2004.12.042.
- 10. He QY, et al. Bioinformatics. 2006;22:2189–91. doi: 10.1093/bioinformatics/btl123.
- 11. Yamanouchi J, et al. Nat Genet. 2007.
- 12. Redon R, et al. Nature. 2006;444:444–54. doi: 10.1038/nature05329.