Abstract
Family studies for Crohn disease (CD) report extensive linkage on chromosome 16q and pinpoint NOD2 as a possible causative locus. However, linkage is also observed in families that do not bear the most frequent NOD2 causative mutations, but no other signals on 16q have been found so far in published genome-wide association studies. Our aim is to identify this missing genetic contribution. We apply a powerful genetic mapping approach to the Wellcome Trust Case-Control Consortium and the National Institute of Diabetes and Digestive and Kidney Diseases genome-wide association data on CD. This method takes into account the underlying structure of linkage disequilibrium (LD) by using genetic distances from LD maps and provides a location for the causal agent. We find genetic heterogeneity within the NOD2 locus and also show an independent and unsuspected involvement of the neighboring gene, CYLD. We find associations with the IRF8 region and the region containing CDH1 and CDH3, as well as substantial phenotypic and genetic heterogeneity for CD itself. The genes are known to be involved in inflammation and immune dysregulation. These findings provide insight into the genetics of CD and suggest promising directions for understanding disease heterogeneity. The application of this method thus paves the way for understanding complex inheritance in general, leading to the dissection of different pathways and ultimately, personalized treatment.
Main Text
Crohn disease (CD) is a subclassification of idiopathic inflammatory bowel disease (IBD [MIM 266600]) and has a disease prevalence of about 100–150 per 100,000 people in populations of European ancestry.1 It is characterized by transmural and segmental inflammation, chiefly located in the ileocecal region of the gastrointestinal tract. There is disease heterogeneity: Some patients have involvement of other intestinal regions and, in many cases, extraintestinal manifestations2 as well.
Both CD and ulcerative colitis (UC), the more prevalent form of IBD, arise primarily from a faulty immune-defense system of the gut. Twin and family studies suggest that, unlike most other complex disorders, CD has a high heritability of 50%–60%3 and an estimated individual sibling recurrence risk ratio (λS) ranging from 15 to 30.4 The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) IBD Genetics Consortium (IBDGC) has made progress in mapping genes involved in CD, and combined genome-wide association studies (GWASs) have identified 32 initial susceptibility loci,5 followed by an additional 39.6 Several of these loci are implicated in other diseases involving inflammatory and immune dysregulation (e.g., PTGER4 [MIM 601586] on chromosome 5p is common to CD, UC,7 and multiple sclerosis [MIM 126200]8). However, despite the large number of patients (∼22,000 cases),6 the latter study reported that the 71 loci accounted for only 23.2%6 of the genetic risk. The well-known problem of “missing heritability” thus applies to CD.
The missing heritability might partly be due to the fact that current methods miss rare variants with large effects. We suggest it is likely that both common and rare variants are missed by the single-SNP approach, which makes the unrealistic assumption that one of the SNPs in the genotyping platform either is the causal agent or is in almost complete linkage disequilibrium (LD) with the causal agent. There are also severe problems associated with the stringent significance levels required as a result of multiple testing.
Here, we revisit the genome-wide association (GWA) database in an attempt to shed light on the missing genetic contribution to CD by applying a powerful multimarker mapping approach.9 Unlike single-SNP tests, our procedure does not assume that the causal variant is in strong LD with a particular SNP on the platform. Instead, we test evidence for association by using a composite likelihood approach in which we use multiple SNPs within a genomic region. This method gives an estimate of the most likely causal location by taking into account the structure of LD in the human genome. For this, we used genetic distances from LD maps rather than physical locations in kilobases (kb). We focused on chromosome 16q, which had previously been reported to show linkage over an extensive region.10 The discovery in 2001 that variants in NOD2 (16q12.1 [MIM 605956]), which encodes a protein required for pathogen recognition through its leucine-rich repeats (LRRs), were associated with CD was a crucial step in providing a direct link between CD and a genetically altered immune system.11 It was, however, subsequently reported that there is no direct relationship between the prevalence of the three most common causal variants (rs17860491, rs17860492, rs17860493) in the general population and CD frequency.12 Indeed, NOD2 causal mutations have not been found in all patients,13 but this finding is not surprising in complex inheritance. Linkage was nevertheless still observed in families who did not bear the most frequent NOD2 mutations,14 yet no other signals on 16q have been found so far in published GWASs. To add to the complexity, NOD2 mutations are associated with ileal CD but not with perianal or colonic disease.15
The strong evidence of linkage on 16q cannot, in our view, be fully accounted for by the one gene, NOD2, upon which so much attention has been focused. Indeed, with our method we found several distinct association signals of high significance on chromosome 16q alone. Here, we report three of these regions that we have studied in detail. Each emerges from the Wellcome Trust Case Control Consortium (WTCCC) data and, quite independently, from the NIDDK database, which contains information on Jewish ancestry as well as detailed phenotypic data. We analyzed 1,698 CD cases and 2,948 controls from the GWA scan of the UK WTCCC study,16 which used the Affymetrix 500K array. Half of the ∼3,000 nationally ascertained controls came from the 1958 British birth cohort collection and the remainder from the UK Blood Services collection of common controls.16 The cases were confirmed to be patients of any subtype of CD via endoscopic and radiological procedures and histopathological criteria. The cases were not specifically enriched for early age of onset or family history, and they were from a variety of IBD clinics. For the replication study, we analyzed 813 North American patients with CD and 947 controls made available by the NIDDK IBDGC. The NIDDK IBDGC GWA scan, which has a smaller sample size, was based on the Illumina HumanHap300 array,17 whose SNP set overlaps only partially with that of the WTCCC array. However, the NIDDK has more phenotypic data than the WTCCC and includes information on the involvement of other intestinal locations. Details are available in other reports.16, 17
Our approach uses genetic locations from a high-resolution LD map that can describe the underlying structure of LD. We can visualize LD maps18 by plotting marker locations in LD units (LDU) against distances in kilobases (kb), which demonstrates the nonlinear relationship between physical distance and LD. These metric genetic maps are analogous to linkage maps in cM.19 A high-density LDU map for the whole of chromosome 16 was constructed with the CEU (Utah residents with Northern and Western European ancestry from the CEPH collection) phase II data from the HapMap Project.20 Figure 1A shows the “block-step” structure of a 225 kb region that flanks NOD2, and Figures 2 and 3 show similar maps for the other two gene regions. Blocks of conserved LD (horizontal line, Figure 1A) are areas of reduced haplotype diversity, whereas steps represent LD breakdown due to recombination (i.e., cross-over profiles agree with LDU patterns).21 We also constructed LDU maps for the three gene regions from the latest HapMap III CEU data and the WTCCC controls, and all maps yielded the same LDU patterns (results not shown). We nevertheless used the HapMap Phase II data to obtain genetic distances in LDU because their estimation is based on a higher resolution as a result of a denser SNP coverage than the HapMap III data. The methodology for constructing LDU maps is based on the Malécot model, which describes the decline of LD as a function of distance18 and is an extension of earlier work.22
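For orientation, the Malécot model is usually written in the LD-map literature in the form below. This is a sketch recalled from that literature, not a formula quoted from the present text; the symbol names (ρ, M, L, ε, d) follow common usage there and should be checked against reference 18.

```latex
% rho : association between a pair of SNPs
% d   : physical distance between them (kb)
% M   : intercept (association at zero distance)
% L   : asymptote (residual association at large distance)
% eps : rate of exponential decline of association with distance
\rho = (1 - L)\, M\, e^{-\varepsilon d} + L
\qquad\text{and}\qquad
\mathrm{LDU}_k = \sum_{i=1}^{k} \varepsilon_i\, d_i
```

Under this reading, each interval between adjacent SNPs contributes ε_i d_i LDU, so an LD block (ε_i close to 0) adds almost nothing to the map length, whereas a region of LD breakdown produces a step.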
Figure 1.
NOD2, CYLD, and Genetic Heterogeneity
(A) Analysis of all patients (unstratified data); SNP data are grouped in two separate windows, covering either NOD2 or CYLD. The vertical lines are the estimated locations (Ŝ) for the windows including NOD2 alone and CYLD alone. The LD map, which plots LDUs (y axis) against physical distance (x axis), is shown for the region surrounding the estimates of Ŝ.
(B) Stratified data are shown for carriers and noncarriers of the most frequent NOD2 causal variants. (B) includes SNPs for both genes (NOD2 and CYLD windows). See text for details on the stratification procedure. The SNPs rs17860491, rs17860492, and rs17860493 correspond to the reported functional SNPs in NOD2. This figure also depicts the position of a putative enhancer (see text).
Figure 2.
Localization within the CDH3 and CDH1 Region
The LD map, which plots HapMap LDUs (y axis) against kb (x axis), is shown for the region surrounding Ŝ. The red vertical arrow is the estimated location Ŝ for both data sets, but because of the very long LD block, the causal location(s) could reside anywhere in this block. rs16260 is a functional SNP within this block and is located 160 nucleotides upstream of the transcription start site for CDH1. ∗NIDDK data only contain ileal CD with the involvement of at least one extraileal intestinal location.
Figure 3.
Localization within the IRF8 Region
The LD map, which plots HapMap LDUs (y axis) against kb (x axis), is shown for the region surrounding the Ŝ. The red vertical arrow is the estimated location Ŝ. ∗NIDDK data only contain ileal CD with an involvement of at least one extraileal intestinal location. Two SNPs have been identified for UC and MS in GWA meta-analyses.
The entire 16q was divided into nonoverlapping windows on the basis of the LDU map; each window had a minimum length of 10 LDU and contained a minimum of 30 SNPs. For each window, the multimarker method9 returned a p value from a composite test statistic and an estimated location of the causal variant (Ŝ). The association mapping applied to each window was based on the same Malécot model used for constructing LDU maps, but here we modeled the decline of the association between affection status and each SNP as a function of genetic distance in LDU by using HapMap LDU locations.9 We tested each window for an association with CD by using a composite likelihood (Λ) that combined information from all single-SNP tests within the window and thereby avoided undue multiple-testing correction. The significance for each window is based on an F statistic, which we formulated by comparing Λ from the null model, which assumes no association, with Λ from the alternative model, in which Ŝ is estimated iteratively. The p value takes into account the different degrees of freedom for each window. For convenience, the F statistic was converted to a χ² with 1 degree of freedom (χ²₁).9 The 95% confidence interval (CI) for the estimated location Ŝ was obtained as Ŝ ± t × SE, where t is the tabulated value of Student's t distribution and SE is the standard error of the parameter Ŝ. We obtained the estimates of Ŝ and the 95% CIs by fitting the LDU genetic distances, given that this approach increases the power of association.9 For convenience, we converted these estimates back to kb (NCBI35 assembly) by linear interpolation between the two flanking SNPs in HapMap. When Ŝ is in an LD block (horizontal line), all markers within that block have the same LDU location. In such cases, we took the midpoint of that block as the estimate of Ŝ in kb. Consequently, the CIs measured in kb need not be symmetrical about Ŝ, because of the “block-step” structure of LD in the human genome.
Other details are given by Maniatis et al.,9 who provide evidence of the power and resolution of this approach over single SNPs. A more recent study on a disease case showed similar results.23
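The windowing and the LDU-to-kb conversion described above can be sketched as follows. This is a minimal illustration rather than the authors' software: the function names, the greedy rule used to close a window, and the toy map in the usage note are all assumptions made for the example, and the composite-likelihood fit itself is not reproduced.

```python
from bisect import bisect_left, bisect_right


def make_windows(ldu, min_ldu=10.0, min_snps=30):
    """Split SNPs (sorted by LDU) into nonoverlapping windows.

    Hypothetical greedy rule: a window is closed as soon as it spans at
    least `min_ldu` LDU and holds at least `min_snps` SNPs; SNPs left over
    at the end are merged into the last window.
    Returns a list of (first_index, last_index) pairs.
    """
    windows, start = [], 0
    for i in range(len(ldu)):
        if ldu[i] - ldu[start] >= min_ldu and i - start + 1 >= min_snps:
            windows.append((start, i))
            start = i + 1
    if start < len(ldu):
        if windows:
            windows[-1] = (windows[-1][0], len(ldu) - 1)
        else:
            windows.append((start, len(ldu) - 1))
    return windows


def ldu_to_kb(s_ldu, ldu, kb):
    """Convert a location in LDU to kb using the flanking map SNPs.

    `ldu` must be non-decreasing and `kb` strictly increasing.
    If `s_ldu` equals the LDU value of one or more SNPs (an LD block),
    the midpoint of that block is returned; otherwise the value is
    linearly interpolated between the two flanking SNPs.
    """
    lo = bisect_left(ldu, s_ldu)
    hi = bisect_right(ldu, s_ldu)
    if hi > lo:  # inside a block (or exactly on a single SNP)
        return (kb[lo] + kb[hi - 1]) / 2.0
    if lo == 0 or lo == len(ldu):
        raise ValueError("location lies outside the map")
    frac = (s_ldu - ldu[lo - 1]) / (ldu[lo] - ldu[lo - 1])
    return kb[lo - 1] + frac * (kb[lo] - kb[lo - 1])


def ci_in_kb(s_ldu, se, t, ldu, kb):
    """Return (lower, estimate, upper) in kb for the interval s_ldu +/- t*se,
    where the interval is computed in LDU and each bound is converted."""
    return (
        ldu_to_kb(s_ldu - t * se, ldu, kb),
        ldu_to_kb(s_ldu, ldu, kb),
        ldu_to_kb(s_ldu + t * se, ldu, kb),
    )
```

With a toy map `ldu = [0.0, 1.0, 1.0, 1.0, 2.0]` and `kb = [100, 110, 120, 130, 150]`, an estimate at 1.0 LDU falls in the three-SNP block and is reported at its midpoint. An interval that is symmetric in LDU then becomes asymmetric in kb, because the steps on either side of the block have different physical lengths.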
Initially, using the WTCCC data, we identified three major signals on chromosome 16q in the regions of NOD2 and CYLD (MIM 605018), CDH3 (MIM 114021) and CDH1 (MIM 192090), and IRF8 (MIM 601565) (Table 1). We then used the NIDDK GWA data to replicate this study. Table 1 and Figure 1A show the significance and estimated locations (Ŝ) for the genomic region that harbors NOD2 and CYLD for both datasets. Two independent signals of association were mapped to NOD2 and CYLD. An analysis of the window that only included marker information from NOD2 yielded a highly significant association with CD for the WTCCC dataset; this association was replicated with the NIDDK dataset. The estimated (Ŝ) location was identical for both datasets (49,306.7) and was within an LDU block that spans 16 kb. The block includes exons 4 and 8, which harbor two of the most frequent causative variants (rs17860491-R702W and rs17860492-G908R, respectively) within NOD2. The third most frequent variant (rs17860493-1007fs) is on a neighboring LD block and has a slightly different LDU location (see LDU block, Figure 1A). The analysis of the WTCCC data for the CYLD window yielded an estimated location 11 kb downstream of the gene (Figure 1A). This signal was replicated with the NIDDK data and had an estimated location 234 base pairs downstream of the WTCCC Ŝ.
Table 1.
Association Statistics and Estimated Location of the Causal Variation for Two Different Windows Covering NOD2 and CYLD, in Relation to the Locations on the Human Genome Sequence
Data | Windowᵃ | Cases | χ²₁ᵇ | p Value | Estimated Location (kb) | 95% Confidence Interval (kb) |
---|---|---|---|---|---|---|
WTCCC | NOD2 | 1,698 | 62.6 | 3 × 10⁻¹⁵ | 49,306.7 | 49,265–49,324 |
NIDDK | NOD2 | 813 | 37.1 | 1 × 10⁻⁹ | 49,306.7 | 49,265–49,396 |
WTCCC | CYLD | 1,698 | 54.4 | 2 × 10⁻¹³ | 49,403.8 | 49,397–49,408 |
NIDDK | CYLD | 813 | 12.5 | 4 × 10⁻⁴ | 49,404.0 | 49,404–49,408 |
The coordinates for NOD2 and CYLD are 49,289–49,324 and 49,334–49,393 kb, respectively.
ᵃ Window with marker information covering NOD2 or the adjacent window covering CYLD.
ᵇ A χ² determined via the composite-likelihood method for each window.
Figure 1A shows the estimates of Ŝ for the two different NOD2 and CYLD windows and datasets. For both datasets, the CYLD window was significantly associated with CD (p = 2 × 10⁻¹³ and 4 × 10⁻⁴ for WTCCC and NIDDK, respectively; Table 1). The lower significance in the NIDDK data, as compared to the WTCCC result, is probably not only due to the difference in sample size but also to the fact that WTCCC included any subtype of CD and not just the ileal form of the condition. The 95% CI for the CYLD window is very narrow (8.3 kb) for both datasets because Ŝ is within a region of LD breakdown caused by a recombination hot spot. Remarkably, this location is approximately 2 kb away from a predicted enhancer region (coordinates: 49,405.9–49,407.2 kb), which is within the 95% CI (Table 1). This “enhancer element 85” has been identified in a bioinformatics study that combined sequence conservation and functional studies.24
These results show that the NOD2 region is more complex than was previously thought. The two different Ŝ locations within the region containing NOD2 and CYLD are an indication of the existence of different risk genes in different patients (genetic heterogeneity). We further investigated the relationship of these two genes by analyzing a window that included marker information from both NOD2 and CYLD and by stratifying the data (Figure 1B). The three frequent causal NOD2 mutations11, 12, 25 were not included in the genotyping platforms for either WTCCC or NIDDK, and we could not directly identify which of those patients bore the mutations. For genetic stratification with respect to carrier status for the most common NOD2 mutations, we used the SNP rs2076756 in the WTCCC dataset (G was the minor allele and had a frequency of 0.24 in controls and 0.32 in cases) and rs5743289 in the NIDDK dataset (T was the minor allele and had a frequency of 0.17 in controls and 0.24 in cases). The stratification by SNP separated, as far as possible, the patients with NOD2 mutations from those without. This stratification was feasible because the two SNPs (100 bp apart) were in complete LD and were in a region of conserved LD that contained the three NOD2 mutations (see LDU map in Figure 1A; also previously reported for rs2076756 [ref. 26]). The group that included all the carriers of the disease-associated allele (carriers were individuals who were heterozygous or homozygous for the minor allele, i.e., they had AG/GG or CT/TT genotypes for the rs2076756 [WTCCC] or rs5743289 [NIDDK] SNPs, respectively) yielded much higher significance levels than before, even though the number of patients was much smaller than that of the full dataset (Table 2). This finding gives evidence that these two groups of patients do represent the majority of cases with the functional causative NOD2 mutations.
In addition, Ŝ for the carrier groups was within NOD2, despite the fact that we analyzed both genes in the same window (NOD2 and CYLD window, Table 2, Figure 1B). Its position (49,306.7 kb) was exactly the same as that obtained for the NOD2 window with the unstratified data (Figure 1A).
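The genotype-based stratification can be illustrated with a short sketch. It assumes, purely for the example, that genotypes are stored as two-character strings keyed by sample identifier; the function name and the sample identifiers are hypothetical, and the sketch only forms the strata on which a separate analysis would then be run.

```python
def stratify_by_genotype(genotypes, major, minor):
    """Group sample IDs by the number of minor alleles at a tag SNP.

    `genotypes` maps a sample ID to a two-character genotype such as "AG".
    Anything that is not made of the two expected alleles is treated as a
    missing call and skipped (an assumption of this sketch).
    """
    names = ("noncarrier", "heterozygous", "homozygous")
    groups = {name: [] for name in names}
    for sample_id, genotype in genotypes.items():
        if (
            genotype is None
            or len(genotype) != 2
            or any(allele not in (major, minor) for allele in genotype)
        ):
            continue  # missing or unexpected call
        groups[names[genotype.count(minor)]].append(sample_id)
    return groups


# Toy data, using the alleles reported for rs2076756 (A major, G minor).
example = {"c1": "AA", "c2": "AG", "c3": "GA", "c4": "GG", "c5": "00"}
strata = stratify_by_genotype(example, major="A", minor="G")
carriers = strata["heterozygous"] + strata["homozygous"]
```

Here `strata["noncarrier"]` corresponds to the AA (or CC) rows of Table 2, and `carriers` pools the heterozygous and homozygous groups.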
Table 2.
Association Statistics and Estimated Location of the Causal Variation for Three Different Windows Covering NOD2 and CYLD, in Relation to the Locations on the Human Genome Sequence
Data | Windowᵃ | Cases | χ²₁ᵇ | p Value | Estimated Location (kb) | Signal |
---|---|---|---|---|---|---|
WTCCCᶜ | | | | | | |
AA | NOD2 and CYLD | 805 | 46.0 | 1 × 10⁻¹¹ | 49,403.8 | CYLD |
AG | NOD2 and CYLD | 665 | 124.6 | 6 × 10⁻²⁹ | 49,306.7 | NOD2 |
GG | NOD2 and CYLD | 199 | 82.8 | 9 × 10⁻²⁰ | 49,306.7 | NOD2 |
NIDDKᶜ | | | | | | |
CC | NOD2 and CYLD | 482 | 13.5 | 2 × 10⁻⁴ | 49,403.8 | CYLD |
CT | NOD2 and CYLD | 266 | 60.5 | 7 × 10⁻¹⁵ | 49,306.7 | NOD2 |
TT | NOD2 and CYLD | 60 | 103.3 | 3 × 10⁻²⁴ | 49,306.7 | NOD2 |
The coordinates for NOD2 and CYLD are 49,289–49,324 and 49,334–49,393 kb, respectively.
ᵃ Window with marker information from both genes.
ᵇ A χ² determined via the composite-likelihood method for each window.
ᶜ Data stratified on the basis of the AA, AG, and GG genotypes for the rs2076756 SNP from the WTCCC data (49,314.4 kb) and CC, CT, and TT genotypes for the rs5743289 SNP from the NIDDK data (49,314.3 kb).
The analyses of the noncarrier cases for the WTCCC (AA for rs2076756) and the NIDDK (CC for rs5743289) produced essentially identical results; both pointed toward a location approximately 11 kb downstream of CYLD (Table 2, Figure 1B). This genetic stratification reveals heterogeneity among patients with CD and indicates that CYLD plays a larger role in patients who do not carry NOD2 functional mutations.
NOD2 interacts with nuclear factor-κB (NF-κB) by signaling in a complex way, which includes ubiquitinylation.27 Interestingly, CYLD is a deubiquitinating (ubiquitin-removing) enzyme that has been shown to regulate cell proliferation, cell survival, and inflammatory responses28 and that is also involved in NF-κB signaling. Dysregulation of NF-κB signaling leads to a defective immune system, causing an immunodeficient or autoimmune phenotype, depending on whether NF-κB function is impaired or persistent.29 CYLD is important in immune homeostasis because it prevents the spontaneous activation of NF-κB in peripheral T and B lymphocytes. The peripheral T cells from CYLD-deficient mice have increased sensitivity and a heightened response to T cell receptor (TCR) stimulation; this heightened response leads to spontaneous inflammation in the colon30 and colitis-associated tumorigenesis.31 Inflammation is the major underlying phenotype of CD, and some CD patients develop colon cancer at a later stage in life. Furthermore, genome-wide cDNA microarray analysis demonstrates that CYLD expression is downregulated in CD.32
The WTCCC dataset contains patients who have any subtype of CD but who have no subclassification in the database. On the other hand, the NIDDK database contains additional disease-related information—that on a possible extraileal intestinal involvement in particular—and also classifies the patients and controls according to ancestry (Jewish or non-Jewish). Given that there was a prior expectation of genetic differentiation across these categories,33 we exploited this extra information to stratify the analyses of the CDH3, CDH1, and IRF8 regions.
Table 3 shows the significance and estimated locations we found for the signals in the window between CDH3 and CDH1 and in the IRF8 window. For the analyses of the WTCCC data, both windows were significantly associated with CD (p = 1 × 10⁻⁸ and 4 × 10⁻⁹, respectively). These two signals were not initially replicated with the NIDDK GWA scan; however, when we reanalyzed the windows on the basis of a subset of the data by using phenotypic information given by the NIDDK IBDGC, they showed significant association. This subset included patients who had ileal CD and an involvement of at least one extraileal intestinal location, i.e., jejunal, colorectal, or perianal. We replicated the WTCCC signal in the CDH3 and CDH1 window by using this subset of the NIDDK data, despite the much smaller number of cases (Table 3). Thus, phenotypic heterogeneity is clearly important. For both GWA scans, the estimated location Ŝ for the CDH3 and CDH1 window was between the two genes, within an LDU block spanning 65 kb (Figure 2). The causal locus could be anywhere within this block, which included the 3′ intron 2 region between CDH3 and CDH1 (Figure 2). Also within this block was a functional promoter SNP, rs16260 (ref. 34; Figure 2), that had previously been associated with postinfectious IBD.35 CDH3 and CDH1 encode cadherin proteins that participate in cell recognition, signaling, morphogenesis, and tumor progression.
CDH1 encodes an epithelial cadherin (E-cadherin) that is expressed in the intestine and which has essential functions in intestinal homeostasis.36 The loss of E-cadherin expression leads to apoptosis and cell shedding and to disruption of the maturation of paneth and goblet cells, which are important to the innate immune system and to microbial defense.36 E-cadherin helps to maintain the intestinal epithelial defense system, and reduced CDH1 expression is a feature of CD and UC patients with an inflamed intestinal epithelium.37 A genome-wide linkage analysis of CD reported evidence of linkage on 16q in families that did not carry the NOD2 mutations.14 Notably, CDH3 and CDH1 are at that linkage peak. However, GWASs and meta-analyses of CD did not detect these genes, despite the large number of samples and imputations based on the latest HapMap samples. A GWAS on UC, on the other hand, has recently reported CDH1 as a susceptibility locus38 and has found that the most significant SNP for UC is 180 kb upstream of the gene. Here, we demonstrate that the most likely location of the causal variant is within an interval that is flanked by CDH3 and CDH1. CDH1 was also detected in a GWA meta-analysis of colorectal cancer (MIM 114500), and the reported SNPs fall within our confidence interval.39 CDH3, which encodes placental-cadherin (P-cadherin), is also implicated in colorectal carcinomas.40
Table 3.
Association Statistics and Estimated Location of the Causal Variation for Two Different Windows Covering CDH3, CDH1, and IRF8, in Relation to the Locations on the Human Genome Sequence
Data | Window | Cases | χ²₁ᵃ | p Value | Estimated Location (kb) | 95% Confidence Interval (kb) |
---|---|---|---|---|---|---|
WTCCC | CDH3/CDH1 | 1,698 | 32.6 | 1 × 10⁻⁸ | 67,303.7 | 67,239–67,393 |
NIDDKᵇ | CDH3/CDH1 | 315 | 6.2 | 1 × 10⁻² | 67,303.7 | 67,241–67,393 |
WTCCC | IRF8 | 1,698 | 34.7 | 4 × 10⁻⁹ | 84,539.8 | 84,539–84,541 |
NIDDKᵇ, Jewishᶜ | IRF8 | 38 | 20.9 | 5 × 10⁻⁶ | 84,515.7 | 84,506–84,519 |
NIDDKᵇ, non-Jewishᶜ | IRF8 | 277 | 5.3 | 2 × 10⁻² | 84,515.7 | 84,492–84,519 |
The coordinates for CDH3, CDH1, and IRF8 are 67,236–67,290, 67,329–67,427, and 84,490–84,514 kb, respectively.
ᵃ A χ² determined via the composite-likelihood method for each window. Note that the 95% CIs for IRF8 for the UK and North American data are nonoverlapping, suggesting heterogeneity of the location of the causative change, i.e., allelic heterogeneity.
ᵇ Ileal CD with the involvement of at least one other extraileal intestinal location.
ᶜ 432 Jewish and 515 non-Jewish controls were used for these analyses.
For the IRF8 window, the WTCCC data yield an Ŝ location about 26 kb downstream of the gene, within a small block that is flanked by LD breakdown (Figure 3). IRF8 encodes the transcription factor also known as interferon consensus sequence-binding protein, which plays a negative regulatory role in cells of the immune system. The analysis of the NIDDK data for patients with any extraileal intestinal involvement showed a signal that is 1.7 kb downstream of IRF8 within a region of LD breakdown (Figure 3). This signal was revealed to be significant after we further separated the data into Jewish and non-Jewish patients. The Jewish data alone yielded a substantially higher significance than the non-Jewish data, although it was derived from only 38 cases and 432 controls as opposed to 277 cases and 515 controls for the non-Jewish data (Table 3). This effect was only observed for the IRF8 window but shows the importance of considering ancestral heterogeneity. These two NIDDK datasets yielded essentially identical Ŝ locations. The 95% CIs for both the Jewish and non-Jewish data included part of the IRF8 gene and had an estimated location 24 kb upstream of the WTCCC signal (Table 3, Figure 3), further suggesting allelic heterogeneity. A recent meta-analysis of GWA data on multiple sclerosis, an immune-dysregulation condition, identified a marker SNP (rs17445836) 61 kb away from IRF8 (Figure 3). Another study on UC has identified a SNP (rs16940202) 58 kb downstream of IRF8. IRF8 expression is crucial in bone metabolism: IRF8-deficient mice display extensive osteoporosis, one of the extraintestinal manifestations associated with CD.41
Here, by using a multimarker genetic-mapping approach on available WTCCC data, we found three additional significant signals of association with CD, and all three signals were replicated via the NIDDK GWA scan. Furthermore, we show that there is substantial heterogeneity in the NOD2 region and demonstrate the independent involvement of the neighboring gene, CYLD. We also show two additional signals within the IRF8 gene region and the region containing CDH3 and CDH1. The analytical method used here avoids undue Bonferroni correction. Significance for all analyses of the WTCCC data passed the Bonferroni threshold for all 98 windows tested. These signals were replicated with the NIDDK data, which has a smaller sample size and less SNP coverage. Using the same method but expressing the distances in kb instead of LDU yielded a lower χ²₁ (data not shown), which agrees with the findings of previous studies.42 Also, a kb map provides poorer estimates of Ŝ than does an LDU map because it does not consider the LD structure. Our analysis with LDU maps provides insight into the genetic and phenotypic heterogeneity of CD and might lead the way to studying other frequent and complex diseases.
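The Bonferroni statement can be checked with a few lines of arithmetic. The sketch below converts the χ²₁ values reported for the WTCCC windows in Tables 1 and 3 into p values and compares them with a threshold of α/98. The nominal level α = 0.05 is an assumption of this example (the text does not state it), and the function names are illustrative.

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for 1 df only.
    """
    return math.erfc(math.sqrt(x / 2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Chi-square values for the WTCCC windows, copied from Tables 1 and 3.
wtccc_chi2 = {"NOD2": 62.6, "CYLD": 54.4, "CDH3/CDH1": 32.6, "IRF8": 34.7}

# alpha = 0.05 is assumed here; 98 is the number of windows tested on 16q.
threshold = bonferroni_threshold(0.05, 98)
passed = {
    window: chi2_1df_pvalue(x) < threshold for window, x in wtccc_chi2.items()
}
```

Under these assumptions the threshold is about 5 × 10⁻⁴, and every WTCCC window listed clears it by several orders of magnitude.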
In recent years, the emphasis of genetic studies has shifted from hypothesis-driven research to GWASs, largely as a result of technological advancement. This shift has been very expensive and has been criticized for its slow progress. Even so, expensive data are now becoming available, and here, we use an approach to extract more information from such data. We find that complex inheritance can be dissected more effectively than before and also that detailed records of disease phenotype and ancestry are more useful than larger datasets. Our approach brings localization—as opposed to regions—and greater information to mapping and to the understanding of the disease phenotype. It indicates that regulatory regions such as enhancers are often important43, 44 and calls into question the current trend of focusing on exome sequencing.
Acknowledgments
We are very grateful to the Wellcome Trust, the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK, USA), and the International HapMap Project for making their invaluable data available to the scientific community. The data from the Wellcome Trust Case-Control Consortium (WTCCC, UK) was funded by the Wellcome Trust. A full list of the investigators who contributed to the generation of the data is available on the WTCCC website. The NIDDK IBD Genetics Consortium Crohn Disease GWAS was conducted by Judy H. Cho (Yale University), Steven Brant (Johns Hopkins University), Richard Duerr (University of Pittsburgh), Huiying Yang (Cedars-Sinai Medical Center), John Rioux (University of Montreal), and Mark Silverberg (University of Toronto), with support from the NIDDK. This manuscript was not prepared in collaboration with the labs of any of the investigators responsible for generating the data and does not necessarily reflect the views or opinions of these investigators or the NIDDK. The datasets used were obtained from the database of Genotypes and Phenotypes (dbGaP) under accession number phs000130. We thank Newton E. Morton (University of Southampton) and Steve Jones (UCL) for their useful comments and suggestions.
Published online: December 8, 2011
Web Resources
The URLs for data presented herein are as follows:
International HapMap Project, http://hapmap.ncbi.nlm.nih.gov/
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), www.niddk.nih.gov/
Online Mendelian Inheritance in Man (OMIM), http://www.omim.org
UCSC Genome Browser, http://genome.ucsc.edu/
Wellcome Trust Case Control Consortium (WTCCC), http://www.wtccc.org.uk/
References
- 1.Loftus E.V., Jr. Clinical epidemiology of inflammatory bowel disease: Incidence, prevalence, and environmental influences. Gastroenterology. 2004;126:1504–1517. doi: 10.1053/j.gastro.2004.01.063. [DOI] [PubMed] [Google Scholar]
- 2.Ardizzone S., Puttini P.S., Cassinotti A., Porro G.B. Extraintestinal manifestations of inflammatory bowel disease. Dig. Liver Dis. 2008;40(Suppl 2):S253–S259. doi: 10.1016/S1590-8658(08)60534-4. [DOI] [PubMed] [Google Scholar]
- 3.Tysk C., Lindberg E., Järnerot G., Flodérus-Myrhed B. Ulcerative colitis and Crohn's disease in an unselected population of monozygotic and dizygotic twins. A study of heritability and the influence of smoking. Gut. 1988;29:990–996. doi: 10.1136/gut.29.7.990. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Schreiber S., Rosenstiel P., Albrecht M., Hampe J., Krawczak M. Genetics of Crohn disease, an archetypal inflammatory barrier disease. Nat. Rev. Genet. 2005;6:376–388. doi: 10.1038/nrg1607. [DOI] [PubMed] [Google Scholar]
- 5.Barrett J.C., Hansoul S., Nicolae D.L., Cho J.H., Duerr R.H., Rioux J.D., Brant S.R., Silverberg M.S., Taylor K.D., Barmada M.M., et al. NIDDK IBD Genetics Consortium. Belgian-French IBD Consortium. Wellcome Trust Case Control Consortium Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat. Genet. 2008;40:955–962. doi: 10.1038/NG.175. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Franke A., McGovern D.P., Barrett J.C., Wang K., Radford-Smith G.L., Ahmad T., Lees C.W., Balschun T., Lee J., Roberts R., et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat. Genet. 2010;42:1118–1125. doi: 10.1038/ng.717. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Anderson C.A., Boucher G., Lees C.W., Franke A., D'Amato M., Taylor K.D., Lee J.C., Goyette P., Imielinski M., Latiano A., et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat. Genet. 2011;43:246–252. doi: 10.1038/ng.764. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. De Jager P.L., Jia X., Wang J., de Bakker P.I., Ottoboni L., Aggarwal N.T., Piccio L., Raychaudhuri S., Tran D., Aubin C., et al.; International MS Genetics Consortium. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat. Genet. 2009;41:776–782. doi: 10.1038/ng.401.
- 9. Maniatis N., Collins A., Morton N.E. Effects of single SNPs, haplotypes, and whole-genome LD maps on accuracy of association mapping. Genet. Epidemiol. 2007;31:179–188. doi: 10.1002/gepi.20199.
- 10. Cavanaugh J.; IBD International Genetics Consortium. International collaboration provides convincing linkage replication in complex disease through analysis of a large pooled data set: Crohn disease and chromosome 16. Am. J. Hum. Genet. 2001;68:1165–1171. doi: 10.1086/320119.
- 11. Hugot J.P., Chamaillard M., Zouali H., Lesage S., Cézard J.P., Belaiche J., Almer S., Tysk C., O'Morain C.A., Gassull M., et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature. 2001;411:599–603. doi: 10.1038/35079107.
- 12. Hugot J.P., Zaccaria I., Cavanaugh J., Yang H., Vermeire S., Lappalainen M., Schreiber S., Annese V., Jewell D.P., Fowler E.V., et al.; for the IBD International Genetics Consortium. Prevalence of CARD15/NOD2 mutations in Caucasian healthy people. Am. J. Gastroenterol. 2007;102:1259–1267. doi: 10.1111/j.1572-0241.2007.01149.x.
- 13. Lesage S., Zouali H., Cézard J.P., Colombel J.F., Belaiche J., Almer S., Tysk C., O'Morain C., Gassull M., Binder V., et al.; EPWG-IBD Group; EPIMAD Group; GETAID Group. CARD15/NOD2 mutational analysis and genotype-phenotype correlation in 612 patients with inflammatory bowel disease. Am. J. Hum. Genet. 2002;70:845–857. doi: 10.1086/339432.
- 14. van Heel D.A., Dechairo B.M., Dawson G., McGovern D.P., Negoro K., Carey A.H., Cardon L.R., Mackay I., Jewell D.P., Lench N.J. The IBD6 Crohn's disease locus demonstrates complex interactions with CARD15 and IBD5 disease-associated variants. Hum. Mol. Genet. 2003;12:2569–2575. doi: 10.1093/hmg/ddg281.
- 15. Ahmad T., Armuzzi A., Bunce M., Mulcahy-Hawes K., Marshall S.E., Orchard T.R., Crawshaw J., Large O., de Silva A., Cook J.T., et al. The molecular classification of the clinical manifestations of Crohn's disease. Gastroenterology. 2002;122:854–866. doi: 10.1053/gast.2002.32413.
- 16. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
- 17. Rioux J.D., Xavier R.J., Taylor K.D., Silverberg M.S., Goyette P., Huett A., Green T., Kuballa P., Barmada M.M., Datta L.W., et al. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat. Genet. 2007;39:596–604. doi: 10.1038/ng2032.
- 18. Maniatis N., Collins A., Xu C.F., McCarthy L.C., Hewett D.R., Tapper W., Ennis S., Ke X., Morton N.E. The first linkage disequilibrium (LD) maps: Delineation of hot and cold blocks by diplotype analysis. Proc. Natl. Acad. Sci. USA. 2002;99:2228–2233. doi: 10.1073/pnas.042680999.
- 19. Tapper W., Collins A., Gibson J., Maniatis N., Ennis S., Morton N.E. A map of the human genome in linkage disequilibrium units. Proc. Natl. Acad. Sci. USA. 2005;102:11835–11839. doi: 10.1073/pnas.0505262102.
- 20. Lau W., Kuo T.Y., Tapper W., Cox S., Collins A. Exploiting large scale computing to construct high resolution linkage disequilibrium maps of the human genome. Bioinformatics. 2007;23:517–519. doi: 10.1093/bioinformatics/btl615.
- 21. Webb A.J., Berg I.L., Jeffreys A. Sperm cross-over activity in regions of the human genome showing extreme breakdown of marker association. Proc. Natl. Acad. Sci. USA. 2008;105:10471–10476. doi: 10.1073/pnas.0804933105.
- 22. Morton N.E., Zhang W., Taillon-Miller P., Ennis S., Kwok P.Y., Collins A. The optimal measure of allelic association. Proc. Natl. Acad. Sci. USA. 2001;98:5217–5221. doi: 10.1073/pnas.091062198.
- 23. Andrew T., Maniatis N., Carbonaro F., Liew S.H., Lau W., Spector T.D., Hammond C.J. Identification and replication of three novel myopia common susceptibility gene loci on chromosome 3q26 using linkage and linkage disequilibrium mapping. PLoS Genet. 2008;4:e1000220. doi: 10.1371/journal.pgen.1000220.
- 24. Pennacchio L.A., Ahituv N., Moses A.M., Prabhakar S., Nobrega M.A., Shoukry M., Minovitsky S., Dubchak I., Holt A., Lewis K.D., et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006;444:499–502. doi: 10.1038/nature05295.
- 25. Hampe J., Cuthbert A., Croucher P.J., Mirza M.M., Mascheretti S., Fisher S., Frenzel H., King K., Hasselmeyer A., MacPherson A.J., et al. Association between insertion mutation in NOD2 gene and Crohn's disease in German and British populations. Lancet. 2001;357:1925–1928. doi: 10.1016/S0140-6736(00)05063-7.
- 26. Glas J., Seiderer J., Tillack C., Pfennig S., Beigel F., Jürgens M., Olszak T., Laubender R.P., Weidinger M., Müller-Myhsok B., et al. The NOD2 single nucleotide polymorphisms rs2066843 and rs2076756 are novel and common Crohn's disease susceptibility gene variants. PLoS ONE. 2010;5:e14466. doi: 10.1371/journal.pone.0014466.
- 27. Abbott D.W., Wilkins A., Asara J.M., Cantley L.C. The Crohn's disease protein, NOD2, requires RIP2 in order to induce ubiquitinylation of a novel site on NEMO. Curr. Biol. 2004;14:2217–2227. doi: 10.1016/j.cub.2004.12.032.
- 28. Brummelkamp T.R., Nijman S.M.B., Dirac A.M.G., Bernards R. Loss of the cylindromatosis tumour suppressor inhibits apoptosis by activating NF-kappaB. Nature. 2003;424:797–801. doi: 10.1038/nature01811.
- 29. Courtois G., Gilmore T.D. Mutations in the NF-kappaB signaling pathway: Implications for human disease. Oncogene. 2006;25:6831–6843. doi: 10.1038/sj.onc.1209939.
- 30. Reiley W.W., Jin W., Lee A.J., Wright A., Wu X., Tewalt E.F., Leonard T.O., Norbury C.C., Fitzpatrick L., Zhang M., Sun S.C. Deubiquitinating enzyme CYLD negatively regulates the ubiquitin-dependent kinase Tak1 and prevents abnormal T cell responses. J. Exp. Med. 2007;204:1475–1485. doi: 10.1084/jem.20062694.
- 31. Zhang J., Stirling B., Temmerman S.T., Ma C.A., Fuss I.J., Derry J.M., Jain A. Impaired regulation of NF-kappaB and increased susceptibility to colitis-associated tumorigenesis in CYLD-deficient mice. J. Clin. Invest. 2006;116:3042–3049. doi: 10.1172/JCI28746.
- 32. Costello C.M., Mah N., Häsler R., Rosenstiel P., Waetzig G.H., Hahn A., Lu T., Gurbuz Y., Nikolaus S., Albrecht M., et al. Dissection of the inflammatory bowel disease transcriptome using genome-wide cDNA microarrays. PLoS Med. 2005;2:e199. doi: 10.1371/journal.pmed.0020199.
- 33. Need A.C., Kasperaviciute D., Cirulli E.T., Goldstein D.B. A genome-wide genetic signature of Jewish ancestry perfectly separates individuals with and without full Jewish ancestry in a large random sample of European Americans. Genome Biol. 2009;10:R7. doi: 10.1186/gb-2009-10-1-r7.
- 34. Li L.C., Chui R.M., Sasaki M., Nakajima K., Perinchery G., Au H.C., Nojima D., Carroll P., Dahiya R. A single nucleotide polymorphism in the E-cadherin gene promoter alters transcriptional activities. Cancer Res. 2000;60:873–876.
- 35. Villani A.C., Lemire M., Thabane M., Belisle A., Geneau G., Garg A.X., Clark W.F., Moayyedi P., Collins S.M., Franchimont D., Marshall J.K. Genetic risk factors for post-infectious irritable bowel syndrome following a waterborne outbreak of gastroenteritis. Gastroenterology. 2010;138:1502–1513. doi: 10.1053/j.gastro.2009.12.049.
- 36. Schneider M.R., Dahlhoff M., Horst D., Hirschi B., Trülzsch K., Müller-Höcker J., Vogelmann R., Allgäuer M., Gerhard M., Steininger S., et al. A key role for E-cadherin in intestinal homeostasis and Paneth cell maturation. PLoS ONE. 2010;5:e14325. doi: 10.1371/journal.pone.0014325.
- 37. Gassler N., Rohr C., Schneider A., Kartenbeck J., Bach A., Obermüller N., Otto H.F., Autschbach F. Inflammatory bowel disease is associated with changes of enterocytic junctions. Am. J. Physiol. Gastrointest. Liver Physiol. 2001;281:G216–G228. doi: 10.1152/ajpgi.2001.281.1.G216.
- 38. Barrett J.C., Lee J.C., Lees C.W., Prescott N.J., Anderson C.A., Phillips A., Wesley E., Parnell K., Zhang H., Drummond H., et al.; UK IBD Genetics Consortium; Wellcome Trust Case Control Consortium 2. Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat. Genet. 2009;41:1330–1334. doi: 10.1038/ng.483.
- 39. Houlston R.S., Webb E., Broderick P., Pittman A.M., Di Bernardo M.C., Lubbe S., Chandler I., Vijayakrishnan J., Sullivan K., Penegar S., et al.; Colorectal Cancer Association Study Consortium; CoRGI Consortium; International Colorectal Cancer Genetic Association Consortium. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 2008;40:1426–1435. doi: 10.1038/ng.262.
- 40. Milicic A., Harrison L.A., Goodlad R.A., Hardy R.G., Nicholson A.M., Presz M., Sieber O., Santander S., Pringle J.H., Mandir N., et al. Ectopic expression of P-cadherin correlates with promoter hypomethylation early in colorectal carcinogenesis and enhanced intestinal crypt fission in vivo. Cancer Res. 2008;68:7760–7768. doi: 10.1158/0008-5472.CAN-08-0020.
- 41. Zhao B., Takami M., Yamada A., Wang X., Koga T., Hu X., Tamura T., Ozato K., Choi Y., Ivashkiv L.B., et al. Interferon regulatory factor-8 regulates bone metabolism by suppressing osteoclastogenesis. Nat. Med. 2009;15:1066–1071. doi: 10.1038/nm.2007.
- 42. Maniatis N., Morton N.E., Gibson J., Xu C.F., Hosking L.K., Collins A. The optimal measure of linkage disequilibrium reduces error in association mapping of affection status. Hum. Mol. Genet. 2005;14:145–153. doi: 10.1093/hmg/ddi019.
- 43. Hindorff L.A., Sethupathy P., Junkins H.A., Ramos E.M., Mehta J.P., Collins F.S., Manolio T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA. 2009;106:9362–9367. doi: 10.1073/pnas.0903103106.
- 44. Goode D.L., Cooper G.M., Schmutz J., Dickson M., Gonzales E., Tsai M., Karra K., Davydov E., Batzoglou S., Myers R.M., Sidow A. Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res. 2010;20:301–310. doi: 10.1101/gr.102210.109.