Abstract
Crohn disease is a complex, multigenic, chronic inflammatory bowel disease of uncertain etiology. Recent advances in genetics, including high-throughput single-nucleotide polymorphism typing platforms and deep sequencing technologies, have begun to shed light on disease predisposition and pathogenesis. Autophagy is emerging as a key player in both innate and adaptive immunity, as well as in tissue homeostasis and development in the gut. Here we describe our recent studies into the Crohn disease-associated Immunity-Related GTPase family, M (IRGM) gene and our discovery of a large risk-conferring upstream deletion. We discuss the effects of this deletion upon expression levels of IRGM alleles and how tissue-specific expression might be affected by the promoter polymorphism. In addition, we comment upon the potential roles of IRGM in autophagy of intracellular pathogens, and the challenges ahead for further elucidating IRGM function.
Keywords: Crohn disease, inflammation, infection, bacteria, host-pathogen interaction, innate immunity
Crohn disease (CD) is a chronic inflammatory bowel disease, characterized by an ongoing inflammatory immune response in the intestine. The current understanding of Crohn’s pathogenesis points to a dysregulation in immune responses to the “normal” microbial flora in the gut, leading to a vicious cycle of inflammation, tissue damage, and further microbial exposure.1 This model is supported by clinical observations that both antibiotic treatment and immune/inflammatory inhibition can initiate disease remission, although complete control of the disease is often elusive.
CD has a known genetic component, first identified via the elevated risks seen in family members of CD patients. Genetic linkage studies identified several loci, including NOD2 (a pathogen-recognition receptor responsive to the bacterial cell wall component muramyl dipeptide), as well as associations within the major histocompatibility complex (MHC) and other loci.2,3 The advent of large-scale single-nucleotide-polymorphism typing (SNP-typing) platforms, combined with high-throughput sequencing technologies, has enabled the current state-of-the-art genome-wide association studies. Using a series of genome-wide SNPs identified from the HapMap project, SNP-typing arrays have been able to identify SNPs associated with a number of diseases including CD.4–10 Subsequent linkage analysis and resequencing often enable direct association of single SNPs with the majority of haplotype-conferred risk. In some cases these risk-conferring SNPs induce amino acid changes in the resultant proteins, but more commonly they occur in regions outside of coding sequences, perhaps resulting in altered gene regulation. The number of CD-associated gene loci currently stands at 32, a 16-fold increase in just two years.11
These novel associations have opened several avenues of research into CD pathogenesis, one of the most interesting being the role of autophagy in mediating immune responses and controlling host-bacterial interactions in the gut.12 Two of the most strongly CD-associated SNPs are located within or near genes with roles in autophagy, namely ATG16L1 and IRGM.7–9 The CD-associated SNP at ATG16L1 is a nonsynonymous coding SNP that results in a single amino acid change in the mature protein. For such variants it is reasonable to propose alteration in protein function as a result of the amino acid substitution. However, the CD-associated SNP at IRGM was outside the coding region, and resequencing did not reveal coding variation in linkage disequilibrium with the CD-associated SNP. Thus, the reason for the upstream IRGM SNP association with CD was unclear.
In our recent study we used a novel SNP copy-number variation array13 to examine the genomic region around the IRGM locus. These data revealed that many people’s genomes lack a large region of sequence directly upstream of the IRGM locus. This was confirmed by quantitative PCR, which showed that individuals had 0, 1 or 2 copies of the sequence in this region. This insertion/deletion polymorphism was in perfect linkage disequilibrium (i.e., perfectly correlated across a large population sample) with the CD-associated SNP in all HapMap samples studied (HapMap being a large, diverse archive of tissue and data from SNP-typed human samples), with the deletion corresponding to the risk allele. This made the upstream deletion polymorphism a candidate to explain the CD association at the locus.
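As a minimal illustration of what perfect linkage disequilibrium means numerically, the sketch below computes the squared correlation (r²) between two biallelic polymorphisms from phased haplotypes. It is not the analysis used in the study: the function name and the haplotype counts are hypothetical and are not HapMap data, and alleles are simply coded 1 for the risk-associated state (deletion, or risk SNP allele) and 0 otherwise.

```python
from collections import Counter


def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic sites.

    `haplotypes` is a list of (site_a, site_b) tuples, one per phased
    chromosome, with each allele coded as 0 or 1 (assumed coding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site B
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")
    return d * d / denom


# Hypothetical chromosomes: the deletion always travels with the risk SNP allele.
perfect = [(1, 1)] * 3 + [(0, 0)] * 7
# Hypothetical chromosomes in which one deletion chromosome carries the other SNP allele.
imperfect = [(1, 1)] * 2 + [(1, 0)] * 1 + [(0, 0)] * 7

print(round(ld_r_squared(perfect), 3))
print(round(ld_r_squared(imperfect), 3))
```

An r² of 1 means that the genotype at one site fully predicts the genotype at the other, which is why a SNP-typing array can tag the deletion without measuring it directly.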
We were subsequently able to map the breakpoint of the polymorphism to 200 bp resolution and determined through sequencing that the deletion removes 20,103 nucleotides, replacing them with just seven nucleotides. The right breakpoint of the deletion was within 123 bp of the CD-associated SNP, explaining the strong linkage disequilibrium between the deletion and this SNP. The common deletion allele involved loss of genomic sequence to within 2.5 kb of the IRGM transcriptional start.
IRGM therefore shows an interesting pattern of genetic variation, in which the gene segregates in the population with the same protein-coding sequence but with alternate upstream sequences. The striking variation in the genomic sequence upstream of IRGM, together with the fact that additional genetic data implicated the upstream region as harboring the causal variation in CD, led us to postulate a polymorphism-associated alteration in expression of IRGM. We compared the ability of the CD-risk and CD-protective haplotypes to activate IRGM expression by measuring the relative abundance of transcripts derived from the two haplotypes in cells that were heterozygous for them. We were able to monitor the expression of the two IRGM alleles within heterozygous cell lines by using an exonic, synonymous SNP that was in linkage disequilibrium with the observed CD-associated SNP and upstream deletion polymorphism. Because of the strong linkage disequilibrium among these polymorphisms, transcripts arising from the CD risk-associated haplotype (with the deletion) always carry the T allele of the linked SNP, and transcripts arising from the CD-protective haplotype (which lacks the deletion) always carry the C allele (Fig. 1A). The relative abundances of these diagnostic transcript alleles were determined using two independent technologies: a TaqMan platform and a mass-extension analysis (Sequenom). Both platforms were able to robustly detect relative transcript variation and produced equivalent data.
Figure 1.
IRGM locus variation and putative isoforms. (A) A large insertion/deletion polymorphism exists upstream of the IRGM locus; linkage between the deletion and CD-risk SNPs suggests this polymorphism may be causal for the IRGM SNP-associated CD risk. An exonic SNP (T or C, as shown) in strong linkage disequilibrium with this deletion allows unambiguous identification of transcripts resulting from the risk or protective alleles. (B) There are five putative IRGM isoforms, although it is unclear which result in mature proteins. Putative mature mRNAs are shown; all isoforms are thought to share a common start codon, and untranslated exons and ORFs are shown in alternating shading to delineate intron/exon boundaries. Isoforms (c–e) all have intron-exon boundaries after the stop codon, usually indicative of targets for nonsense-mediated decay. All isoforms except (a) possess a putative second translated exon encoded on the genome approximately 30 kb downstream of the first exon; subsequent intron/exon boundaries are an additional 17 to 20 kb distant.
The data from both techniques established that the two IRGM haplotypes were differentially expressed in heterozygous cell lines. Indeed, HeLa cells (which contain both alleles in their genomic DNA) expressed the C allele almost exclusively; in contrast, HCT116 cells expressed the T allele at close to six times the level of C allele expression. Primary smooth muscle cells derived from human bronchus also showed six-fold higher T versus C allele expression, whereas lymphoblastoid cell lines (from ten individuals) all tended to express the C allele at approximately twice the level of the T allele. These data demonstrate that the two IRGM haplotypes have different transcriptional potency in different cell types, and that in many cell types these differences are extremely strong.
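To make the allelic comparison concrete, the sketch below converts a pair of allele-specific transcript signals into a T/C ratio, the fraction of transcripts carrying the T allele, and a log2 ratio. This is an illustration under stated assumptions rather than the study's analysis pipeline: the function name is hypothetical, the inputs are assumed to be already-normalized positive signals, and the example values are chosen only to mirror the approximate fold differences quoted above (about six-fold T over C, and about two-fold C over T).

```python
import math


def allelic_summary(t_signal, c_signal):
    """Summarize relative expression of the T and C transcript alleles.

    Both arguments are assumed to be normalized, strictly positive
    allele-specific signals from a heterozygous sample.
    """
    if t_signal <= 0 or c_signal <= 0:
        raise ValueError("signals must be strictly positive")
    ratio = t_signal / c_signal
    return {
        "t_over_c": ratio,                                # fold difference, T relative to C
        "t_fraction": t_signal / (t_signal + c_signal),   # share of transcripts carrying T
        "log2_ratio": math.log2(ratio),                   # symmetric around 0 for balanced expression
    }


# Illustrative inputs only, not measured values.
six_fold_t = allelic_summary(6.0, 1.0)   # T at roughly six times C
two_fold_c = allelic_summary(1.0, 2.0)   # C at roughly twice T

print(six_fold_t)
print(two_fold_c)
```

Working on the log2 scale puts a six-fold excess of T and a two-fold excess of C on the same axis with opposite signs, which makes the direction of imbalance easy to compare between cell types.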
Since IRGM transcript expression levels are altered by the identified haplotypes, we sought to understand whether IRGM expression level influences autophagic function, a mechanism that might explain the association with CD. Previous studies demonstrated a role for IRGM in the autophagy-mediated clearance of Mycobacterium tuberculosis14,15 and both our own and other studies have demonstrated that autophagy is involved in clearance of Salmonella typhimurium.7,16 Therefore we performed functional studies of anti-Salmonella autophagy in HeLa cells undergoing both siRNA-mediated IRGM knockdown and IRGM overexpression. IRGM knockdown significantly attenuated the autophagy of internalized S. typhimurium, reducing the proportion of bacteria captured within autophagosomes from 30% to less than 15%. Increasing levels of IRGM overexpression showed a dose-dependent increase in autophagic efficiency—apparently saturating at 200 ng of transfected IRGM plasmid, with over 50% of internalized bacteria within autophagosomes after one hour of infection.
Compared to the murine IRGM gene, the human sequence lacks interferon-responsive elements and is thought to encode five isoforms, although it is unknown whether all of these result in protein products (Fig. 1B).17 Three of these five isoforms appear to be strong candidates for nonsense-mediated RNA decay, due to the presence of exon-intron boundaries 3' to the termination codon. So far, attempts to detect protein products in human cell lines have yielded conflicting data—antibodies raised to recombinant human IRGM isoform (a) were unable to detect endogenous protein,18 whereas antipeptide antibodies directed against the putative IRGM C terminus appeared to detect a specific band in human cell lines.14 For our experiments we utilized IRGM cDNA corresponding to the putative isoform (a), and siRNAs capable of targeting all five putative isoforms. IRGM isoform (a) appears to possess a truncated G-domain, and recombinant protein failed to demonstrate GTPase activity.18 The alternative, longer isoforms possess additional sequences unrelated to known GTPase domains. Thus, the functional significance and regulation of the different IRGM isoforms are currently unclear. Future research must be directed to clearly establish the expression profile of all IRGM isoforms and their protein products, as well as define roles for specific isoforms, both in autophagy and other cellular processes. Until such data are replicated across laboratories and with independent reagents, it is unlikely that these discrepancies can be resolved.
Overall, both our data and those of others strongly support the conclusion that IRGM is involved in mediating autophagy of intracellular bacteria, and that differences in expression level yield functional alterations in the ability of cells to initiate and sustain antibacterial autophagy. While it is unlikely that CD is due to a microbial pathogen, it is possible that autophagy of microbial flora forms a key part of gut homeostasis and may influence both innate and adaptive immune responses as well as wound healing. Thus, by using model organisms such as Salmonella to challenge the autophagic apparatus we may be able to determine the limits of autophagy and whether CD-associated genotypes exhibit altered autophagic function.
In addition, this study provided the first evidence of a structural polymorphism outside of a coding sequence associated with human disease risk. As with the initial advent of genome-wide association and SNP-typing technologies, the adoption of copy-number variation analyses will likely increase our understanding of genetic predisposition to CD, as well as drive fruitful new avenues of research into treatment. Recent analysis of copy-number variation suggests that this variation is more limited in genomic coverage than previously estimated. Both common and low-frequency copy-number variants appear to segregate on specific SNP haplotypes. This overall segregation of copy-number variants to specific haplotypes mimics that seen at the IRGM locus, suggesting that combining SNP haplotypes with limited copy-number variant data represents a generalizable model for copy-number variation discovery and analysis.13,19
References
- 1. Xavier RJ, Podolsky DK. Unravelling the pathogenesis of inflammatory bowel disease. Nature. 2007;448:427–434. doi: 10.1038/nature06005.
- 2. Ogura Y, Bonen DK, Inohara N, Nicolae DL, Chen FF, Ramos R, Britton H, Moran T, Karaliuskas R, Duerr RH, Achkar JP, Brant SR, Bayless TM, Kirschner BS, Hanauer SB, Nunez G, Cho JH. A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease. Nature. 2001;411:603–606. doi: 10.1038/35079114.
- 3. van Heel DA, Fisher SA, Kirby A, Daly MJ, Rioux JD, Lewis CM. Inflammatory bowel disease susceptibility loci defined by genome scan meta-analysis of 1,952 affected relative pairs. Hum Mol Genet. 2004;13:763–770. doi: 10.1093/hmg/ddh090.
- 4. Wellcome Trust Case Control Consortium (WTCCC). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
- 5. Todd JA, Walker NM, Cooper JD, Smyth DJ, Downes K, Plagnol V, Bailey R, Nejentsev S, Field SF, Payne F, Lowe CE, Szeszko JS, Hafler JP, Zeitels L, Yang JH, Vella A, Nutland S, Stevens HE, Schuilenburg H, Coleman G, Maisuria M, Meadows W, Smink LJ, Healy B, Burren OS, Lam AA, Ovington NR, Allen J, Adlem E, Leung HT, Wallace C, Howson JM, Guja C, Ionescu-Tirgoviste C, Simmonds MJ, Heward JM, Gough SC, Dunger DB, Wicker LS, Clayton DG. Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nature Genet. 2007;39:857–864. doi: 10.1038/ng2068.
- 6. Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, Dixon RJ, Meitinger T, Braund P, Wichmann HE, Barrett JH, Konig IR, Stevens SE, Szymczak S, Tregouet DA, Iles MM, Pahlke F, Pollard H, Lieb W, Cambien F, Fischer M, Ouwehand W, Blankenberg S, Balmforth AJ, Baessler A, Ball SG, Strom TM, Braenne I, Gieger C, Deloukas P, Tobin MD, Ziegler A, Thompson JR, Schunkert H. Genomewide association analysis of coronary artery disease. N Engl J Med. 2007;357:443–453. doi: 10.1056/NEJMoa072366.
- 7. Rioux JD, Xavier RJ, Taylor KD, Silverberg MS, Goyette P, Huett A, Green T, Kuballa P, Barmada MM, Datta LW, Shugart YY, Griffiths AM, Targan SR, Ippoliti AF, Bernard EJ, Mei L, Nicolae DL, Regueiro M, Schumm LP, Steinhart AH, Rotter JI, Duerr RH, Cho JH, Daly MJ, Brant SR. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nature Genet. 2007;39:596–604. doi: 10.1038/ng2032.
- 8. Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, Roberts RG, Nimmo ER, Cummings FR, Soars D, Drummond H, Lees CW, Khawaja SA, Bagnall R, Burke DA, Todhunter CE, Ahmad T, Onnie CM, McArdle W, Strachan D, Bethel G, Bryan C, Lewis CM, Deloukas P, Forbes A, Sanderson J, Jewell DP, Satsangi J, Mansfield JC, Cardon L, Mathew CG. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn’s disease susceptibility. Nature Genet. 2007;39:830–832. doi: 10.1038/ng2061.
- 9. Hampe J, Franke A, Rosenstiel P, Till A, Teuber M, Huse K, Albrecht M, Mayr G, De La Vega FM, Briggs J, Gunther S, Prescott NJ, Onnie CM, Hasler R, Sipos B, Folsch UR, Lengauer T, Platzer M, Mathew CG, Krawczak M, Schreiber S. A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nature Genet. 2007;39:207–211. doi: 10.1038/ng1954.
- 10. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, Dassopoulos T, Bitton A, Yang H, Targan S, Datta LW, Kistner EO, Schumm LP, Lee AT, Gregersen PK, Barmada MM, Rotter JI, Nicolae DL, Cho JH. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science. 2006;314:1461–1463. doi: 10.1126/science.1135245.
- 11. Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, Rioux JD, Brant SR, Silverberg MS, Taylor KD, Barmada MM, Bitton A, Dassopoulos T, Datta LW, Green T, Griffiths AM, Kistner EO, Murtha MT, Regueiro MD, Rotter JI, Schumm LP, Steinhart AH, Targan SR, Xavier RJ, Libioulle C, Sandor C, Lathrop M, Belaiche J, Dewit O, Gut I, Heath S, Laukens D, Mni M, Rutgeerts P, Van Gossum A, Zelenika D, Franchimont D, Hugot JP, de Vos M, Vermeire S, Louis E, Cardon LR, Anderson CA, Drummond H, Nimmo E, Ahmad T, Prescott NJ, Onnie CM, Fisher SA, Marchini J, Ghori J, Bumpstead S, Gwilliam R, Tremelling M, Deloukas P, Mansfield J, Jewell D, Satsangi J, Mathew CG, Parkes M, Georges M, Daly MJ. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nature Genet. 2008;40:955–962. doi: 10.1038/NG.175.
- 12. Xavier RJ, Huett A, Rioux JD. Autophagy as an important process in gut homeostasis and Crohn’s disease pathogenesis. Gut. 2008;57:717–720. doi: 10.1136/gut.2007.134254.
- 13. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Veitch J, Collins PJ, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones KW, Rava R, Daly MJ, Gabriel SB, Altshuler D. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genet. 2008;40:1166–1174. doi: 10.1038/ng.238.
- 14. Singh SB, Davis AS, Taylor GA, Deretic V. Human IRGM induces autophagy to eliminate intracellular mycobacteria. Science. 2006;313:1438–1441. doi: 10.1126/science.1129577.
- 15. Gutierrez MG, Master SS, Singh SB, Taylor GA, Colombo MI, Deretic V. Autophagy is a defense mechanism inhibiting BCG and Mycobacterium tuberculosis survival in infected macrophages. Cell. 2004;119:753–766. doi: 10.1016/j.cell.2004.11.038.
- 16. Birmingham CL, Smith AC, Bakowski MA, Yoshimori T, Brumell JH. Autophagy controls Salmonella infection in response to damage to the Salmonella-containing vacuole. J Biol Chem. 2006;281:11374–11383. doi: 10.1074/jbc.M509157200.
- 17. Bekpen C, Hunn JP, Rohde C, Parvanova I, Guethlein L, Dunn DM, Glowalla E, Leptin M, Howard JC. The interferon-inducible p47 (IRG) GTPases in vertebrates: loss of the cell autonomous resistance mechanism in the human lineage. Genome Biol. 2005;6:R92. doi: 10.1186/gb-2005-6-11-r92.
- 18. Bekpen C. Evolutionary and functional studies of p47 GTPases involved in cell autonomous immunity. Universität zu Köln. 2006.
- 19. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, Cawley S, Hubbell E, Veitch J, Collins PJ, Darvishi K, Lee C, Nizzari MM, Gabriel SB, Purcell S, Daly MJ, Altshuler D. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genet. 2008;40:1253–1260. doi: 10.1038/ng.237.