Author manuscript; available in PMC: 2024 Dec 5.
Published in final edited form as: Gastroenterology. 2021 Nov 13;162(3):859–876. doi: 10.1053/j.gastro.2021.11.014

An integrated taxonomy for monogenic inflammatory bowel disease

Chrissy Bolton 1,2,#, Christopher S Smillie 3,#, Rasa Elmentaite 4, Gabrielle Wei 5, Carmen Argmann 5, Dominik Aschenbrenner 1, Kylie R James 4, Dermot PB McGovern 6, Marina Macchi 1, Judy Cho 5, Dror Shouval 7, Jochen Kammermeier 8, Sibylle Koletzko 9, Lauren Peters 5, Simon PL Travis 1,10, Luke Jostins 11, Carl A Anderson 4, Scott Snapper 12, Christoph Klein 9, Eric Schadt 5, Matthias Zilbauer 13,14, Ramnik Xavier 3, Sarah Teichmann 4,15,16, Aleixo M Muise 17,18,19, Aviv Regev 3,20,21, Holm H Uhlig 1,10,22,*
PMCID: PMC7616885  EMSID: EMS152101  PMID: 34780721

Abstract

Complex diseases can have monogenic and polygenic forms that may inform on independent as well as converging mechanisms. Monogenic forms of inflammatory bowel disease (IBD) are caused by genetic defects that disrupt essential networks of intestinal homeostasis. Here, we use a quantitative framework to build a systematic taxonomy of 136 disorders and syndromes. We classify 82 of the most penetrant disorders by their overlapping syndromic features; response to hematopoietic stem cell transplantation; bulk gene expression across 32 tissues; and single-cell RNA-seq profiles of over 50 cell subsets from the intestine of healthy individuals and IBD patients, both children and adults. We define a vast landscape of monogenic IBD gene expression across diverse epithelial, mesenchymal, endothelial, and immune cells, which is enriched in neutrophils and regulatory T cells. Although only a few genes are implicated in both monogenic and polygenic IBD, shared pathways exist within intestinal cell types. Overall, our work provides a clinically relevant framework for the classification and management of monogenic IBD, while revealing the shared and unique features of monogenic and polygenic IBD.

Keywords: inflammatory bowel disease, Crohn’s disease, Ulcerative colitis, unclassified colitis, indeterminate colitis, immunodeficiency, IBD unclassified, genetics, next-generation sequencing, exome sequencing, clinical genomics


Graphic abstract.


Abbreviations

BGRNs: Bayesian gene regulatory networks
CD: Crohn’s disease
CI: Confidence interval
DCs: dendritic cells
eQTL: expression quantitative trait loci
GOF: Gain of function
GWAS: genome-wide association study
IBD: inflammatory bowel disease
IBDU: IBD unclassified
IEC: intestinal epithelial cell
IPEX: Immune dysregulation, polyendocrinopathy, enteropathy, X-linked syndrome
HSCT: Hematopoietic stem cell transplantation
LOF: Loss of function
M cells: Microfold cells
NMF: non-negative matrix factorization
PCA: principal component analysis
scRNA-seq: single-cell RNA-seq
SNPs: single-nucleotide polymorphisms
TA: transit-amplifying
Treg: Regulatory T cells
UC: Ulcerative colitis

Introduction

Inflammatory bowel diseases (IBD) arise from breakdown in mucosal barrier function and immune homeostasis1,2. IBD has been conventionally classified as Crohn’s disease (CD), ulcerative colitis (UC) and unclassified IBD (IBDU). Through Montreal and Paris criteria, these disease forms are endo-phenotyped with regard to histological, endoscopic and demographic information3,4. However, although these IBD classifications are commonly used in clinical research, in routine clinical practice they lack strong predictive power of disease course, have a limited impact on management across the wide spectrum of patients and a restricted ability to delineate molecular etiology.

Substantial progress has been made in understanding the genetic and immunological etiology of IBD through genome-wide association studies (GWAS) and single-cell RNA-seq (scRNA-seq)5,6. However, while over 240 loci have been associated with risk of polygenic IBD5, only around 10% of these loci have been mapped to causal variants7. Given the multifactorial nature of polygenic IBD, individual risk variants have little predictive diagnostic power. ScRNA-seq has started to reveal how interactions between immune, parenchymal and stromal cells might modify cellular or molecular responses to exert a protective or pathogenic effect6,8–12. Combining single-cell profiles of RNA and proteins provides new avenues for disease classification and opportunities to understand the heterogeneity of all cell subsets in a tissue over the course of disease11,13. Ongoing investigation is required to understand the clinical application of such discoveries.

A complementary approach to understanding the complex etiology of intestinal inflammation is through the lens of monogenic forms of IBD. There is a clinical need for, as well as conceptual advantages to, systematically investigating monogenic IBD: (1) patients present with more severe and distinguishable phenotypes; (2) the precise genetic cause of the disease and the molecular pathways, identified from individual protein-coding variants, are potentially easier to map; and (3) pathway-specific therapies can provide functional evidence. Indeed, for some forms of monogenic IBD, a genetic diagnosis will indicate pathway-specific therapies, predict postoperative recurrence or complications, inform screening for malignancies or infections, facilitate genetic counselling and help to avoid an extremely prolonged course of intractable inflammation14,15. Nonetheless, the mechanistic model we have for different monogenic IBD disorders is limited by a strong bias towards a narrow subset of disorders, implicating a relatively small number of immune cells in functional models15, such that the relationship between polygenic and monogenic IBD is not clear.

Here, we tackle this challenge by building a comprehensive taxonomy of 136 disorders and syndromes, enabling a systematic analysis of these rare disease forms, and their relationship to polygenic IBD. Starting from an extensive literature review, we identify 82 monogenic disorders that are most strongly associated with IBD. We use this for a comprehensive analysis of the cellular and genetic networks and mechanisms that drive inflammation, enabling comparison between monogenic and polygenic IBD.

Results

Stratification of disorders based on the penetrance of monogenic IBD

We investigated a diverse group of 136 Mendelian disorders and syndromes based on a comprehensive literature review to identify patients with monogenic defects causing intestinal inflammation (Fig. 1a, Extended Data Fig. 1). To ensure high-confidence gene mapping, we only included reports of patients with histologically proven intestinal inflammation attributed to validated monogenic defects. We excluded disorders associated with large chromosomal defects or mosaicism, as well as common genetic variants, where the minor allele frequency did not support a causal monogenic relationship.

Figure 1. Approach to investigating syndromes and Mendelian disorders associated with monogenic inflammatory bowel disease (IBD).


Monogenic gene defects were classified according to whether they had (b) high-penetrance, (c) moderate-penetrance or (d) insufficient evidence of intestinal inflammation; 90% confidence intervals are shown. For each gene, the total number of patients with IBD-like intestinal inflammation is presented above as an additional marker of clinical confidence.

We classified the gene defects by the penetrance of inflammation, using a statistical model that accounts for effect size and integrates patient numbers (Fig. 1b-d, Extended Data Fig. 2). As a pragmatic and conservative threshold, we defined high-penetrance disorders as having a penetrance of IBD that was greater than 5%. This matches the penetrance of biallelic variants in NOD2, which are associated with the strongest genetic susceptibility to polygenic CD16–18. We defined moderate-penetrance disorders as those with an IBD penetrance between 1-5% (as per the confidence interval), based on the highest estimated baseline risk of IBD in the population of 1%19 (Fig. 1c). Rather than solely integrating the number of patients with a gene defect and IBD as evidence, this model allowed the inclusion of very rare defects, where even a single robustly-validated case report could be included. Interestingly, this model also allowed exclusion of more common Mendelian disorders (such as cystic fibrosis, due to mutations in CFTR), where many IBD patients have been reported, but the penetrance was not significantly higher than the population risk (Fig. 1d).
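The thresholding described above can be illustrated with a minimal sketch. It assumes a Wilson score interval at 90% confidence and a decision rule based on the interval's lower bound; the statistical model actually used is not specified in this section and may differ. The function names and any counts passed to them are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(cases: int, total: int, confidence: float = 0.90):
    """Two-sided Wilson score interval for a binomial proportion.

    cases: carriers of the gene defect with IBD-like inflammation
    total: all reported carriers of the gene defect
    """
    if total <= 0 or not 0 <= cases <= total:
        raise ValueError("need total > 0 and 0 <= cases <= total")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = cases / total
    denom = 1 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / total
                    + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def classify_penetrance(cases: int, total: int,
                        high: float = 0.05, baseline: float = 0.01) -> str:
    """Assumed rule: compare the lower confidence bound with the thresholds.

    high     -> 5% cut-off for high penetrance (from the text)
    baseline -> 1% highest estimated population risk (from the text)
    """
    lower, _ = wilson_interval(cases, total)
    if lower > high:
        return "high"
    if lower > baseline:
        return "moderate"
    return "insufficient"
```

Under these assumptions, a single validated case in a single reported carrier, `classify_penetrance(1, 1)`, is still classified as high because the lower bound is about 0.27, whereas 5 cases among 1,000 carriers, `classify_penetrance(5, 1000)`, does not exceed the 1% baseline. This mirrors how an interval-based rule can retain very rare defects while excluding common disorders with many reported but proportionally few IBD cases.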

Using these criteria, we distilled hundreds of heterogeneous case reports down to 56 disorders with high-penetrance IBD, 26 disorders with moderate-penetrance, and 22 disorders with insufficient evidence for IBD penetrance greater than 1% (Fig. 1b-d). Reflecting the rarity of the disorders, 39% were associated with at least 10 IBD cases, 16% with 5-9 IBD cases and 45% with 1-4 IBD cases.

Monogenic IBD gene identification is supported by population-based analysis

Because Mendelian disorders are expected to be rare within a population, we reasoned that loss-of-function (LOF) defects in the 82 monogenic IBD genes should also be rare. To test this, we examined the frequencies of LOF variants for monogenic IBD genes in the Exome Aggregation Consortium (ExAC)20, considering autosomal and X-linked genes separately due to their distinct patterns of inheritance.

The low median frequencies of LOF alleles for monogenic IBD genes in ExAC (6.96 × 10^-5 for autosomal and 1.28 × 10^-5 for X-linked; Extended Data Fig. 3c) were consistent with our selection criteria. Across all monogenic IBD genes, the combined frequencies of alleles with high-confidence LOF variants were below 10^-3 and 10^-4 for autosomal and X-linked genes, respectively, well below that of the reference gene NOD2 (Extended Data Fig. 3a,c). Conversely, genes that did not pass our rigorous selection criteria (Extended Data Table 2) had higher LOF allele frequencies, including DUOX2 and NOX1 (Extended Data Fig. 3a,b), genes previously discussed as causative defects.

Gene-Phenotype-Ontology of monogenic IBD relates genes to age of onset, intestinal inflammation and extra intestinal features

We next investigated available phenotypes for high- and moderate-penetrance monogenic IBD genes (Fig. 2).

Figure 2. Phenotypic characteristics of people with monogenic IBD gene defects indicate a spectrum of onset age and syndromic associations.


(a) Onset age (circles) of individuals’ intestinal inflammation of monogenic IBD (n=338). Dashed line denotes the median. The age of onset in polygenic IBD is shown for comparison (grey, right, n=1608).

(b) Median age of onset for high- and moderate-penetrance IBD genes (P= 0.0065, Mann Whitney test, one-tailed) with individual gene defects represented (circle).

(c) Syndromic gene-phenotype associations and outcomes of intestinal inflammation following hematopoietic stem cell transplant (HSCT). SCID= severe combined immunodeficiency

Focusing on age of onset (n = 338 total records), patients with high-penetrance IBD gene defects were significantly younger than those with moderate-penetrance IBD defects (P=0.0065, one-sided Mann-Whitney test; Fig. 2b), and both were significantly younger than polygenic IBD patients (P<0.0001, one-sided Mann-Whitney test; Fig. 2a). In total, infantile IBD was associated with variants in 30 genes (37%), and very early onset IBD in 45 genes (55%). Many high-penetrance disorders presented exclusively with infantile onset (e.g., IL-10 signaling defects), while others presented with a broader age of onset, including adulthood. However, even among high-penetrance genes, 37% of defects were associated with a median age of onset above 6 years of age (Fig. 2b).

For many gene defects spanning a range of shared syndromic phenotypes, patients were diagnosed with both CD and UC (Fig. 2c), showing the limited utility of the Montreal classification in distinguishing diverse monogenic IBD mechanisms. Some gene defects were associated with intestinal inflammation predominantly resembling CD (with granulomas or perianal/fistulizing disease) and others UC. Disorders with CD-like inflammation included defects in the NOD2 pathway of pathogen recognition and antimicrobial autophagy (XIAP, TRIM22, NPC1); chronic granulomatous disease; IL-10 signaling defects; autoinflammatory disorders (NLRC4, MVK); and congenital diarrhea (GUCY2C, SLC9A4) (Fig. 2c). The small group of genes associated with UC-like inflammation included FERMT1, ICOS, ADA2, MALT1 and MASP2 (Fig. 2c). Montreal/Paris sub-phenotypes and disease extent were not consistently reported in patients with monogenic IBD, which precluded their systematic analysis. In terms of extra-intestinal features, 67% of monogenic IBD genes could be curated into 15 syndromic phenotypic categories, each shared by at least 2 genes (Fig. 2c, Extended Data Table 2).

Hematopoietic stem cell transplantation (HSCT) identifies immunological mechanisms of monogenic IBD

The response to targeted therapeutic interventions can provide key information regarding disease mechanisms. Sustained resolution of intestinal inflammation after allogeneic HSCT suggests an underlying deficiency in hematopoietic cells, whereas non-response implicates non-hematopoietic cellular compartments. HSCT data were available for 23 disorders (Fig. 2c, Extended Data Table 2). HSCT is efficacious for IL-10 signaling defects, chronic granulomatous disease and IPEX (“Immune dysregulation, Polyendocrinopathy, Enteropathy, X-linked”) syndromes. For TTC7A defects, HSCT did not cure intestinal inflammation, while for IKBKG deficiency, HSCT was not consistently curative, although it may cure the immunodeficiency traits (Fig. 2c, Extended Data Table 2).

Enriched expression of monogenic IBD genes in intestinal and lymphoid tissues

To investigate the tissue compartments affected by monogenic IBD genes, we analyzed bulk expression profiles of 32 tissues from the Human Protein Atlas22 (Extended Data Fig. 4). Unsupervised clustering organized the monogenic IBD genes into organ-specific subgroups, with most genes enriched in intestinal or primary/secondary lymphoid tissue (Extended Data Fig. 4).

Monogenic IBD gene expression in cellular compartments of the healthy small intestine and colon

To elucidate the cell types that may be affected by monogenic IBD disorders, we combined scRNA-seq spanning over 50 cell types from the healthy adult colon23 (Fig. 3a,b, 12 subjects) and healthy pediatric ileum24 (Fig. 3c,d, 8 subjects). These data span much of the cellular diversity of the intestinal mucosa, with the exception of neutrophils, which have eluded previous scRNA-seq studies of IBD. To mitigate this weakness, we added scRNA-seq data from blood-derived neutrophils of healthy human donors25 to the healthy colonic data (Fig. 3a) and validated this approach using a separate immune proteomic dataset13 (Fig. 3e). Overall, monogenic IBD genes were enriched in colonic and ileal cell types from epithelial, mesenchymal, endothelial, myeloid, T cell, and B cell compartments (Fig. 3a,c).

Figure 3. scRNA-seq expression of monogenic IBD genes from healthy participants shows specificity to cellular compartments.


(a-d) Monogenic IBD genes are enriched in specific cell subsets from healthy adult colon (n=12) (a), healthy pediatric ileum samples (n=8) (c) and their corresponding 1st and 2nd principal components ((b) and (d) respectively). Scaled mean expression of monogenic IBD genes (columns) across healthy cell subsets (rows) from different cell lineages (color legend), black outlines: q<0.05.

(e) Unsupervised clustering of mean protein copy numbers (encoded by monogenic IBD genes) from 3-4 donors of peripheral hematopoietic cells, by quantitative proteomics of cells sorted by fluorescence-activated cell sorting (Rieckmann et al 2017). No data were available for n=14 monogenic IBD genes; isoforms are specified.

(f) Mean protein copy numbers of 3-4 donors in neutrophils vs classical monocytes in healthy participants.

ILCs= innate lymphoid cells; Tregs= regulatory T cells; NKs= natural killer cells; CD4+LP= CD4+ lamina propria T cells; GC=germinal centre cells, DC= dendritic cells; M cell= ‘Microfold’ cells; TA= transit amplifying; WNT…= fibroblast subsets; MAIT cell= mucosal-associated invariant T cells

In the intestinal epithelial compartment, two gene expression patterns emerged. One subset of genes was expressed in stem, crypt and transit-amplifying cells (ANKZF1, HPS4, IRFBP2, DKC1, MVK and FERMT1; Fig. 3a,c). The other was expressed in goblet cells and mature enterocytes (ALPI, SLC26A3, SLC9A3, and TTC7A; Fig. 3a,c). Indeed, many of these genes were not detectable at the protein level in peripheral immune cells (Fig. 3e).

Endothelial cells showed distinctly high expression of TGFBR2 and SLCO2A1 (Fig. 3a,c), with the highest expression in capillary cells and microvascular cells.

A distinct gene module was less clear for the mesenchymal compartment, where specific expression was displayed by COL7A1, highlighting inflammatory fibroblasts, WNT5B/2B+ fibroblasts, and smooth muscle cells (Fig. 3a,c).

In the hematopoietic compartment, a phagocyte-predominant group of genes showed particularly high expression in neutrophils and in a subset of monocytes, conventional type 2 dendritic cells (DC2) and macrophages (Fig. 3a,c). The distinctly strong expression of multiple monogenic IBD genes in neutrophils (including DOCK2, ITGB2, NCF2/4, NFAT5, PIK3CD, STAT1, WAS and WIPF1) highlights their exceptional role in several monogenic IBD disorders (Fig. 3a). Monocyte, macrophage and DC2 subsets expressed high levels of CECR1, NLRC4, HPS3, IL10RA and LACC1 for example.

The enrichment of these genes in neutrophils was replicated in an independent immune proteomic dataset13 (Fig. 3e), which also confirmed the large overlap of monogenic IBD gene products expressed amongst neutrophils and monocytes (Fig. 3e, Extended Data Fig. 5b). Most monogenic IBD genes had ~10-fold higher protein levels in neutrophils than in activated classical monocytes, but the levels were highly correlated across the cell types (Spearman r =0.83) (Fig. 3f).

In contrast, LACC1, RTEL1 and NFAT5 showed >100-fold higher levels in classical monocytes than in neutrophils (Fig. 3f, Extended Data Fig. 5b). Divergent transcriptomic and proteomic signals for the glycolysis-associated genes G6PC3 and SLC37A4 in neutrophils were consistent with the suggestion that glycolysis in neutrophils is regulated by translational and post-transcriptional mechanisms25.

A set of monogenic IBD genes, including ZAP70, ICOS, CD40LG, CD3G and IL2RG, was highly expressed across T lymphocytes (Fig. 3a,c). Regulatory T cells (Tregs) showed the strongest monogenic IBD gene enrichment in this compartment, with distinctively high expression of CTLA4, FOXP3, ICOS, IL10 and IL2RA (Fig. 3a,c). Although not exclusive to B cells, ADA, BACH2, NCF1, DCLRE1C and CYBC1 showed highest mRNA expression in B cell subsets (Fig. 3a,c).

The single-cell profiles of monogenic IBD genes in intestinal hematopoietic cells were largely consistent with their protein expression in peripheral blood immune cell subsets (Fig. 3e). In addition to high expression of FOXP3 in CD4+ Tregs, gene products for IPEX-like syndromes (e.g., CTLA4, STAT1, STAT3, IL2RA, and LRBA) were highly expressed in Tregs and activated T cells (Fig. 3e).

Monogenic IBD gene expression during intestinal inflammation

We next investigated the expression of monogenic IBD genes in cell types during inflammation, using samples from the colon of 18 adult UC patients and the ileum of 7 pediatric CD patients. Similar to the healthy samples, we observed the compartmentalized expression of monogenic IBD genes among cell types (Fig. 4a-d). PCA showed less differentiation between cellular compartments, suggesting inflammation-induced effects on multiple cell types (Fig. 4b,d).

Figure 4. Cell-type specific monogenic IBD gene expression in IBD.


(a-d) Monogenic IBD genes are enriched in specific cell subsets from inflamed adult colon (n=18) (a), inflamed paediatric ileum (n=7) (c) and their corresponding 1st and 2nd principal components ((b) and (d) respectively). Scaled mean expression of monogenic IBD genes (columns) across inflamed cell subsets (rows) from different cell lineages (color legend), black outlines: q<0.05.

(e) Differentially-expressed (DE) genes (columns) in inflamed vs. healthy samples across cell subsets (rows) annotated by cell lineage (color legend) from colonic samples. Dot size: fraction of expressing cells from inflamed samples; dot color: DE model coefficients (q < 0.05; discrete model coefficient from MAST).

(f) Cytokine/receptor cascades involving monogenic IBD genes based on scRNA-seq of non-inflamed colon. Monogenic IBD genes are indicated with an asterisk.

Monogenic IBD genes displayed significant up- and down-regulation during inflammation across a number of cell types (Fig. 4e). The cell types with the most differentially expressed genes between inflamed and healthy samples included mature enterocytes (SLC26A3, HPS1, GUCY2C, XIAP, BCL10, CD55 and SLCO2A1), macrophages (WIPF1, IRFBP2, NCD2, CYBB, NCF4 and ARPC1B) and CD8+ intra-epithelial lymphocytes (IL10RA, TGFB1, ITGB2, CD3G and ARPC1B) (Fig. 4e).

ScRNA-seq analyses highlight cell type-specific pathways and biological processes involving monogenic IBD disease genes

We next utilized the cell type-specific expression of monogenic IBD genes to explore cellular processes and networks in disease pathology, investigate cytokine signaling pathways (Fig. 4f), and group genes into biological processes described by Gene Ontology (GO) terms (Extended Data Fig. 6).

Cytokine networks forming amplification cascades and feedback loops regulate inflammation across diverse cell types and are involved in the pathogenesis of monogenic IBD (Fig. 4f). Using our data, we reconstructed key parts of a cytokine network formed by monogenic IBD genes, as follows. First, IL-2 produced by T lymphocytes acts on IL-2 receptor-expressing macrophages, monocytes, and Tregs (Fig. 4f). Tregs also express the checkpoint inhibitor, CTLA4, and, together with monocytes, are major producers of IL-10. IL-10 may primarily exert its anti-inflammatory effects via monocytes and macrophages, both of which express TNF and IL-1. TNF, in turn, acts on a large number of immune and non-immune cells including enterocytes, fibroblasts and endothelial cells, where its effects are modulated by the expression of TNFAIP3 (A20) (Fig. 4f). In contrast, IL-1 primarily acts on a mesenchymal cytokine network (Fig. 4f). The therapeutic relevance of these multicellular signaling networks is highlighted by the efficacy of IL-1 receptor antagonists in patients with IL-10 signaling defects26. This supports a model where defects in different layers of cytokine regulation drive inflammation.

To investigate which intra-cellular pathways may be targeted by monogenic IBD, we performed pathway enrichment analysis using Gene Ontology (GO) terms for co-expressed monogenic IBD genes with increased expression in the colon during colitis (Extended Data Fig. 6a). This analysis highlighted significantly enriched terms in hematopoietic cells such as DC2s, inflammatory monocytes, cycling T cells, activated FOS-hi CD4+T cells, CD8+IL17+ T cells and Tregs (Extended Data Fig. 6c, Extended Data Table 5), as well as in non-hematopoietic cells, including WNT5B+ fibroblasts, endothelial cells, post-capillary venules, Microfold(M)-like cells and inflammatory fibroblasts (Extended Data Fig. 6b). This suggests that subsets of monogenic IBD genes are not only expressed in similar cell types, but that they form functional modules and pathways within these cells.

Monogenic and common polygenic IBD intersect at the network level

We next investigated the genetic and functional relationship between monogenic and polygenic IBD. Although the 81 monogenic and 278 polygenic IBD risk loci (confidently mapped from genome-wide association studies (GWAS))5,7,27,28 only targeted an overlapping set of 13 genes, the extent of this overlap was significant (P < 3.04 × 10^-5; hypergeometric test; Fig. 5a).
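As an illustration of the overlap statistics reported here, the following minimal sketch computes a representation factor and an upper-tail hypergeometric probability using only the Python standard library. The function names are hypothetical, and the size of the background gene universe is a caller-supplied assumption because it is not stated in this section.

```python
from math import comb


def representation_factor(overlap: int, size_a: int, size_b: int,
                          background: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    expected = size_a * size_b / background
    return overlap / expected


def hypergeom_upper_tail(overlap: int, size_a: int, size_b: int,
                         background: int) -> float:
    """P(X >= overlap) when size_b genes are drawn without replacement
    from `background` genes, of which size_a belong to the first set."""
    largest = min(size_a, size_b)
    total = comb(background, size_b)
    favourable = sum(
        comb(size_a, k) * comb(background - size_a, size_b - k)
        for k in range(overlap, largest + 1)
    )
    # Exact integer arithmetic, converted to float only at the end.
    return favourable / total
```

For example, `representation_factor(13, 81, 278, 11259)` evaluates to 6.5, the value reported in Fig. 5a. The background of 11,259 genes is back-calculated from that reported factor for illustration only; it should not be read as the gene universe the analysis actually used, and the resulting tail probability depends directly on that choice.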

Figure 5. Overlap in genes, cell types, cell modules and transcriptional regulatory networks between polygenic and monogenic IBD.


(a) There is a significant overlap of 13 genes between 81 monogenic IBD genes and 278 polygenic IBD candidate loci (representation factor: 6.5; p < 3.034e-05).

(b) Monogenic and polygenic IBD genes are enriched in overlapping cell types. Mean expression of polygenic IBD genes (y axis) vs. monogenic IBD genes (x axis) for each cell type from the healthy human colon, with select cell types annotated and colored by lineage (legend).

(c) Monogenic and polygenic IBD genes are co-expressed in gene modules. For gene modules of 250 co-expressed genes in healthy and inflamed cells from the colon (top) and ileum (bottom), shown is the top enriched KEGG term (y axis), the number of monogenic and polygenic IBD genes (left), the cell type distribution colored by cell lineage (middle), and annotated monogenic genes (right, black) and polygenic genes (right, grey) in the module.

(d) For monogenic IBD genes (dark grey) or a background set of genes with the same expression statistics (light grey), the probability that genes from the gene set are co-expressed with polygenic IBD genes in the same cell type (left) or module (right) for colon and ileum cells (x axis). Error bars: SEM, P-values, *** P < 0.001.

(e-f) 40 monogenic IBD genes (the ‘seed set’) were found within a Bayesian Gene Regulatory Network of gut biopsy transcriptomes and paediatric CD genotyping from the RISK cohort (Peters et al 2017), which integrated 7568 nodes (genes) and 14389 edges. Shown are the networks formed by adding (e) 1 or (f) 2 additional network layers to the seed set of these 40 monogenic IBD genes. 39 of the 40 genes were connected to each other in (f), along with many adult GWAS polygenic genes, with significant enrichment found, suggesting a common transcriptional landscape.

Moreover, there was a strong correlation across cell types when scored for the expression of monogenic and polygenic IBD genes (Fig. 5b; Spearman’s ρ = 0.61; P < 1 × 10^-5). In particular, both monogenic and polygenic IBD genes were highly expressed in phagocytic cells (cycling monocytes, neutrophils, DC2, macrophages) and activated T cells (CD8+IL-17+, cycling T cells) (Fig. 5b). However, there were also differences. Monogenic IBD genes were enriched in Tregs, inflammatory monocytes and exhausted T cells (CD4+PD1+ cells), whereas polygenic IBD genes showed stronger enrichment in enterocytes and mesenchymal cells (Fig. 5b).
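The cell-type correlation above is a Spearman rank correlation between two per-cell-type scores. A minimal standard-library sketch is given below, assuming each cell type has already been assigned one mean expression score per gene set; the function names are hypothetical and the scoring step itself is not shown.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs.

    Raises ZeroDivisionError if either input is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Here `x` would hold the monogenic score and `y` the polygenic score for the same ordered list of cell types. Because only ranks enter the calculation, the coefficient is insensitive to the scale on which expression is measured, which suits comparisons between gene sets of different sizes.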

To determine whether monogenic and polygenic IBD genes may impact the same pathways, we used non-negative matrix factorization29 (NMF) to learn modules of co-expressed genes in the healthy and inflamed colon and ileum (Fig. 5c) and tested them for enrichment in monogenic and polygenic IBD genes. Several modules were significantly enriched for the expression of monogenic and polygenic IBD genes, in both the ileal and colonic samples (FDR ≤ 1 × 10^-4 for each tissue; permutation test). These modules, which were enriched for KEGG terms related to T cell signaling (modules 86 and 35), phagocytosis (modules 13 and 71), and effector T cell cytotoxicity (modules 21 and 61), collectively spanned more than 26 monogenic (35%) and 24 polygenic (23%) IBD genes. These results suggest that both monogenic and polygenic genes may converge onto a shared set of pathways. To quantitatively assess the extent of their co-expression, we next measured the probability that monogenic and polygenic IBD genes were found within the same cell types or gene modules (Fig. 5d). Both gene sets were strongly enriched in the same cell types and gene modules, suggesting overlapping gene networks.

We also tested for overlap between monogenic and polygenic IBD genes in an independent Bayesian gene regulatory network of polygenic IBD30 that was built from bulk expression profiles and expression quantitative trait loci (eQTL) priors. After excluding genes that were shared by both diseases, high-confidence polygenic IBD genes were 3-fold enriched in the subnetwork of genes that were < 2 degrees of separation from the 40 high-penetrance monogenic IBD genes (P < 1.7 × 10^-4; Fisher test, Fig. 5f). Thus, the transcriptional networks governing monogenic and polygenic IBD genes interrelate and are highly connected.
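The subnetwork analysis can be sketched as a breadth-first expansion from a seed set followed by a fold-enrichment ratio. The sketch below treats the network as undirected, which is a simplification (Bayesian networks are directed), and the function names and inputs are hypothetical. It illustrates the idea only and does not reproduce the published analysis or its Fisher test.

```python
from collections import deque


def neighbourhood(edges, seeds, max_hops):
    """Map each node within max_hops of any seed to its hop distance."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Seeds absent from the network are ignored.
    distance = {s: 0 for s in seeds if s in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the requested number of layers
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


def fold_enrichment(subnetwork, all_nodes, target_genes, exclude=()):
    """Ratio of the target-gene fraction inside the subnetwork to the
    fraction in the whole network, after removing `exclude`."""
    excluded = set(exclude)
    targets = set(target_genes)
    sub = set(subnetwork) - excluded
    universe = set(all_nodes) - excluded
    in_sub = len(sub & targets) / len(sub)
    in_universe = len(universe & targets) / len(universe)
    return in_sub / in_universe
```

In this framing, `seeds` would be the monogenic IBD genes present in the network, `max_hops` the number of added network layers, `target_genes` the polygenic IBD genes, and `exclude` the seed genes themselves, so that genes shared by both sets do not inflate the ratio.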

Monogenic IBD classification by cell and module expression aligns with clinical and therapeutic distinctions

We next integrated these diverse genetic, clinical and molecular features into a comprehensive taxonomy of monogenic IBD (Table 1), combining (1) IBD penetrance; (2) syndromic phenotype; (3) therapeutic outcome after allogeneic HSCT; (4) organ expression enrichment (e.g., lymphoid vs. intestinal tissue); and (5) enriched cellular compartments during inflammation (using scRNA-seq).

Table 1. Integrated Monogenic IBD Gene Taxonomy.

Genes are organised according to syndromal phenotypes, organ-specific gene expression, single cell gene expression and outcome of hematopoietic stem cell transplant (HSCT). The cellular compartment with the highest mean scRNA-seq expression (inflamed) for that gene is shown in columns ‘monocyte/macrophage’ to ‘endothelial’. Genes defying the syndromic group trend show 2nd highest expressed compartment in grey. Genes with the highest neutrophil and regulatory T cell expression are shown. Impact of HSCT on intestinal inflammation is indicated by circle colours; red= effective, strong evidence; pink = effective, weak evidence; black = ineffective. Red asterisks indicate where experimental data suggests a cell-specific role for the gene.

DC= dendritic cell, IL= interleukin, CID= Combined immunodeficiency, SCID = severe combined immunodeficiency, TCR= T cell receptor.

We identified several distinct patterns of disease and cell type association (Extended Data Fig. 7). Many disorders were significantly associated with a distinct cell type, such as IPEX/IPEX-like syndromes, whose genes largely mapped to Tregs (Extended Data Fig. 7a), or could be associated with either a hematopoietic or a non-hematopoietic expression pattern (Extended Data Fig. 7a).

Other disorders had more complex phenotypes involving hematopoietic and non-hematopoietic cells, where a single pathogenic cellular compartment was not implicated, but immunological mechanisms were plausible (Extended Data Fig. 7b). This is exemplified by defects in the STAT3 gene, which despite being highly expressed in endothelial cells was classified with other IPEX-like disorders, where Treg and T lymphocyte-specific gene expression is predominant (Table 1), consistent with the role of STAT3 in regulating Th17 cell and Treg differentiation31. In some of these instances, similar phenotypes were associated with divergent patterns of cellular expression that could potentially be explained by signaling pathways (e.g., stromal-epithelial or cytokine-receptor interactions) (Extended Data Fig. 6a, 7c, Fig. 4f). In other instances, single-cell expression profiles may miss relevant cell types (e.g., congenital neutropenia genes). Lastly, in all disorders it is important to assess the directionality of the functional defect as illustrated by the contrasting effects of LOF and GOF mutations in PIK3R1, PIK3CD, STAT1 or in STAT3 (Fig. 1, Extended Data Fig. 7e).

Orthogonal therapeutic and phenotypic evidence supports the mapping of monogenic IBD disorders and syndromes to specific cell types and pathways.

Disorders caused by genes expressed in hematopoietic cells were likely to respond to HSCT (Table 1). Indeed, for 71% of genetic disorders responsive to HSCT, the genes were predominantly expressed in hematopoietic compartments, although this association was not significant given the paucity of HSCT treatment data for genes not associated with immunodeficiency. The shared expression patterns, shared syndromic features and available clinical data suggest that additional gene defects related to chronic granulomatous disease, Wiskott Aldrich-like syndromes and IPEX-like syndromes might respond to HSCT (Table 1).

As further validation for our approach, we hypothesized that defects mapping to the same cell types or pathways should be more likely to lead to similar intestinal and extra-intestinal phenotypes. Indeed, genes associated with the same syndromic phenotype were significantly enriched in the same cell types and gene modules in colon and ileum (Extended Data Fig. 8). This enrichment was stronger than would be expected for either all monogenic IBD genes or a control set of genes with similar expression levels (Extended Data Fig. 8).

An integrated evidence-based taxonomy of monogenic IBD

The integrated taxonomy partitioned the monogenic IBD genes into 24 key disease subgroups (Table 1, Extended Data Fig. 7 and 9). This included genes with IL-10 signaling defects, conferring the strongest susceptibility to monogenic IBD (Fig. 1b). A phagocyte-enriched group was associated with defective antimicrobial autophagy and leucocyte migration (Table 1, Extended Data Fig. 7a), lysosomal defects, defective vesicle transport, and impaired glucose 6-phosphate and fatty acid metabolism. Genes associated with impaired actin polymerization were highly expressed in diverse cells, but were some of the most highly expressed genes in myeloid cells (Fig. 3f). Neutropenia itself does not consistently cause intestinal inflammation, but instead dysfunctional phagocytic activity may trigger a pathogenic myeloid-stromal inflammatory network6,8,32.

A group of disorders with causal genes with lymphocyte-dominant expression impacts Treg activity, T cell development, T cell activation, and phosphatidyl-3-phosphate signaling induced tolerance (Table 1, Extended Data Fig. 9). Monogenic disorders may additionally point to non-canonical cellular functions. For example, the significance of the ~100-fold increase in protein levels of the NADPH-oxidase subunit genes (NCF1, NCF4 and CYBB) in Tregs compared to naïve CD4+ T cells requires further investigation (Extended Data Fig. 7a), since a role in Treg suppression has been postulated33.

A group of disorders with causal genes predominantly expressed in intestinal epithelial cells were characterized by electrolyte-related defects in intraluminal milieu, epithelial polarization and brush border enzymes (Table 1). The strong enrichment of SLC26A3, SLC9A3 and GUCY2C in ileal enterocytes in particular may account for a number of patients showing ileal ulcers or ileitis with these gene defects34,35 (Fig. 4a,c). Other defects in epithelial-dominant FERMT1 or mesenchymal-dominant COL7A1 disrupt intestinal epithelial adhesion (Table 1).

Defects in genes enriched in endothelial cells were associated with multiple chronic, nonspecific ulcers and strictures of the small intestine (SLCO2A1) and defective TGF-β signaling. This is consistent with the key vascular manifestations of aortic aneurysms and dissection in Loeys-Dietz connective tissue disorder. The different phenotype of biallelic TGFB1 deficiency (LOF) and Loeys-Dietz syndrome caused by dominant TGFβ receptor variants is likely related to functional differences, whereby TGFBR1/2 LOF variants lead to paradoxically enhanced TGFβ signaling. A complex group of disorders with genes with both hematopoietic and non-hematopoietic cell expression was characterized by autoinflammation due to inflammasome activation, innate NF-κB signaling defects with variable phenotype, or RIPK1 deficiency (Table 1). Several gene defects could not be grouped into phenotypic or functional groups (Fig. 4e).

Discussion

This taxonomy provides data-driven insights into monogenic IBD, which may guide focused gene panel testing, aid mechanistic understanding and therapeutic considerations such as HSCT or targeted therapies for severe forms of monogenic IBD. With growing understanding of molecular pathways, our classification supports the increasing application of pathway-specific therapies in monogenic IBD, which have otherwise not been efficacious in polygenic IBD populations36. For example, studies suggest that many patients with pathogenic variants in CTLA4 and LRBA (which regulates CTLA-4 turnover and is expressed in Treg cells) respond to abatacept, a CTLA-4 mimic that partially restores Treg function37,38. Individual patients with genetic disorders associated with monogenic inflammasome activation, such as MVK deficiency or NLRC4 defects, responded to IL1 or IL18 targeting therapies39–41.

The finding that monogenic and polygenic disorders have an overlapping functional basis has several implications. It supports a model of partially shared inflammatory cascades in subsets of patients with monogenic and polygenic disease. For instance, hyper-inflammatory monocytes have been implicated in subsets of patients with monogenic and polygenic IBD6,23,32. In polygenic IBD, increased expression of the Oncostatin M cytokine and receptor correlated with disease severity and anti-TNF-resistance. Elevated expression of Oncostatin M has been linked to IL-23 producing inflammatory monocytes, which have also been associated with anti-TNF resistance6,32. Meanwhile, IL-10 suppresses several cytokines including IL-1 dependent IL-23 production and a functional resistance to IL-1 can be induced by contact of monocytes with bacteria32. Since individual patients with monogenic defects in the IL10-pathway responded to IL-1 blockade26, it will be interesting to see whether patients with a transcriptional profile suggestive of functional IL-10 resistance might also respond to these therapies. The limited number of rare variants of monogenic IBD genes found amongst patients with classical IBD is also in line with the hypothesis of overlapping pathogenesis.

Consistent with proteomic studies in classical IBD43,44, we provide essential transcriptomic and proteomic evidence for the significant role of neutrophils in monogenic IBD disorders. Our work also supports a role for other cell subsets that were only recently highlighted in polygenic IBD, including endothelial cells, enterocytes and DC2s6,10,23. We note that there is a noticeable absence of monogenic IBD gene expression in other cell types, including goblet cells, Paneth cells, and glial cells. These cells have been implicated in polygenic IBD or animal models of intestinal inflammation10,24,45. Our findings may either reflect a reduced role for these cell types in monogenic IBD compared to polygenic IBD (Fig. 5b), or reflect our more limited understanding of these cell types.

Future classifications would benefit from integrating several additional criteria. A single-cell proteomic approach is desired because a poor correlation between gene and protein expression has been observed for some genes46. Spatial transcriptomics or proteomics can further refine cell classifications and identify correlated cellular communities. Further consideration is also warranted for the emerging role of the intestinal microbiota in the development and maintenance of gut homeostasis, for example through epigenetic modification and induction of protective immune cell specialization47,48. The role of the microbiome in monogenic IBD requires further clarification to exclude confounders such as antibiotic use and pathogen colonisation49. Developmental aspects of the immune system likely contribute to the pathogenesis of polygenic and monogenic IBD24,45,47, given the very early onset in many cases (Fig. 2a), which could be incorporated as single cell atlases of the developing gut are assembled (in preparation50). Although we have focused on disorders with evidence for Mendelian inheritance, somatic variants might contribute to intestinal inflammation and inflammation response in a cell type specific manner.

In summary, our findings from monogenic IBD support a model where single genes and pathogenic cell types may cause IBD, due to the amplification of interconnected inflammatory networks, across multiple cellular compartments.

Materials and Methods

Gene-phenotype dataset

A literature search was performed to identify Mendelian and syndromal disorders associated with “inflammatory bowel disease”, “Crohn’s disease”, “Ulcerative Colitis” and “colitis” in the Pubmed, OMIM, and ClinVar databases (last accessed 31st September 2018). Poster abstracts were also reviewed from the 2016 Clinical Immunology Society Annual Meeting and the 2014, 2016 and 2017 European Society for Immunodeficiencies meetings. We did not include Mendelian and syndromal disorders where intestinal inflammation arises de-novo due to a known iatrogenic mechanism, e.g. i) after solid organ transplantation due to treatment with mycophenolate; ii) after treatment with checkpoint inhibitors or iii) after surgery that induces diversion colitis. Definitions and concepts of the monogenic IBD classification were agreed after interdisciplinary expert consensus of pediatric and adult gastroenterologists, clinical and basic immunologists, and geneticists (Extended Data Methods, Extended Data Table 1). Gene names were recorded according to Human Gene Organization Nomenclature classification, together with the mode of inheritance (X-linked (X), autosomal recessive (AR), or autosomal dominant (AD)) and the functional directionality of the gene defect (i.e. LOF or GOF).

Classification of Mendelian disorders according to IBD penetrance

As a comparable quantitative estimate for the strength of the disease association, the penetrance of the IBD-like phenotype was determined. Literature searches on Pubmed and Google Scholar were performed for the largest, recent cohort descriptions summarizing the genotype-phenotype associations of each gene. In order to avoid penetrance inflation, we did not focus on studies that solely genotyped IBD patients. Personal correspondence with authors of published studies facilitated evaluation of variant status, variant validation, IBD prevalence and intestinal phenotype. If studies described fewer than 10 individuals with a given gene defect (exceptionally rare or newly discovered gene defects), multiple cohorts were summated. IBD penetrance was counted for the most extreme genotype; for example, for genes with AR inheritance, only patients with biallelic variants were counted. Case reports of patients likely to have a bigenic cause were not included. For inclusion, patients required a stated IBD/CD/UC diagnosis or IBD-like intestinal inflammation based on endoscopic and/or histological evidence in the absence of causal infection.

To maintain a quantitative benchmark that differentiates genes of high impact from polygenic IBD risk factors, we defined high impact genes as those with stronger penetrance of intestinal inflammation than NOD2 variants (higher than 4.9% penetrance). NOD2 is the strongest risk factor for CD in European ancestry populations and serves as a biological benchmark. The penetrance of CD for homozygosity or compound heterozygosity of the three polygenic NOD2 variants (p.Arg702Trp rs2066844; p.Gly908Arg rs2066845; and p.Leu1007fsinsC rs5743293) in population-based studies is variable, estimated at 1% 51, 1.5% 52 and 4.9% 53,54.

Penetrance and CIs were based on the modified Wald equation (where m=number of patients with Mendelian IBD; n=number of patients with the pathogenic gene defect; and z=1.645, z=1.96 and z=2.576 for the 90%, 95% and 99% CI, respectively).

We estimated penetrance $p = \dfrac{m + 0.5z^{2}}{n + z^{2}}$

and confidence interval $\mathrm{CI} = p \pm z\sqrt{\dfrac{p(1-p)}{n + z^{2}}}$

The penetrance of intestinal inflammation of gene defects was classified as:

  • High-penetrance: genes with a CI for estimated penetrance exceeding 4.9%

  • Moderate-penetrance: genes with an estimated lower penetrance interval above 1%, the highest estimated baseline population risk in Western countries19.

  • Insufficient or contradictory evidence: This group includes genes with CIs extending below 1% penetrance. In addition, it involves all syndromic and Mendelian disorders with insufficient or contradictory evidence (Fig. 1, Extended Data Fig. 3).
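The penetrance estimate and the three-way classification above can be illustrated with a minimal Python sketch. The function names, the default z=1.96 and the choice to compare the lower CI bound against the 4.9% and 1% thresholds are assumptions made for illustration, not a description of the code used in this study.

```python
import math

def modified_wald(m, n, z=1.96):
    """Modified Wald estimate for m affected patients among n carriers.

    Returns (p, lower, upper), with the interval clipped to [0, 1].
    """
    z2 = z * z
    p = (m + 0.5 * z2) / (n + z2)
    half_width = z * math.sqrt(p * (1.0 - p) / (n + z2))
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def classify_penetrance(m, n, z=1.96, high=0.049, moderate=0.01):
    """Hypothetical helper: classify a gene by the lower CI bound (assumption)."""
    _, lower, _ = modified_wald(m, n, z)
    if lower > high:
        return "high"
    if lower > moderate:
        return "moderate"
    return "insufficient"

# Example: 3 patients with IBD among 50 carriers of a gene defect.
p, lower, upper = modified_wald(3, 50)
print(f"p={p:.3f}, CI=({lower:.3f}, {upper:.3f}), class={classify_penetrance(3, 50)}")
```

Under these assumptions, a small case series with no affected patients falls into the insufficient-evidence group even though its point estimate is above zero, because the lower bound of the interval reaches 0%.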

To further rate the clinical confidence in the disease association, we looked for the total number of patients described with IBD-like phenotypes for each gene. Snowballing of papers cited in the case series, author correspondence and literature searches were conducted to identify additional cases of intestinal inflammation for genes in high and moderate-penetrance groups.

Gene damage intolerance analysis and allele frequency in reference cohorts

Allele frequency of cumulated nonsense variants (LOF variants; sum of stop-loss/stop-gain and frameshift variants) was based on the ExAC variant server (Extended Data Fig. 3). The sum of essential LOF variants includes not only the relative number of variants normalized to the gene size (pLI score) but also the minor allele frequency in aggregated population data. For GOF or hypomorphic variants, cumulated LOF metrics do not apply.

Phenotype assessment

Key phenotype disease characteristics were recorded from the sample of included patients that had intestinal inflammation, where data were available (Fig. 2). The presence of the characteristic in at least 1 patient warranted scoring as positive. For large patient samples, up to 10 patient cases were reviewed. The following were recorded:

  1. The age of IBD diagnosis according to endoscopic investigation or the time of IBD-symptom onset if there was >1 year delay prior to endoscopy. As per the Paris Classification, pediatric-onset IBD is defined as starting before 17 years of age and very early onset IBD before 6 years of age.

  2. Intestinal phenotype (CD, UC or IBDU); disease location (oral, duodenitis, ileitis, colitis); perianal disease (fistulas and/or abscess formation); penetrating disease; strictures; histological features of granuloma or defined epithelial defects such as tufting and apoptosis.

  3. Searches for published reports of patients with gene defects and IBD were performed to identify those who had received allogenic HSCT. For inclusion, patients required a genotyped variant, which was at least ‘likely pathogenic’ and intestinal inflammation not attributable to an infectious etiology. The effectiveness of HSCT on IBD was assessed according to the need for further immunomodulating IBD treatment.

  4. Due to lack of standardized studies, effects of biologic or immunomodulatory treatments were not assessed.

Organ RNA expression

mRNA expression data of 32 different human tissues were analyzed based on the organ expression of the Human Protein Atlas project (Extended Data Fig. 4, https://www.proteinatlas.org)22. For PCA, unit variance scaling was applied to rows; singular value decomposition with imputation was used to calculate principal components.

scRNA-seq analysis

Detailed methodology for previously collected scRNA-seq analyses is provided in their original publications (Fig. 3a-d)8,24,25. Briefly, for colonic scRNA-seq we used published data (which we collected previously) from 366,650 cells of the mucosa, taken from location-matched samples of 12 healthy patients (Fig. 3a,b) and 18 patients with UC (Fig. 4a,b). “Epithelial” and “lamina propria” fractions were separated from each sample, with clustering of cells into immune, epithelial and stromal compartments, as previously described. Transcriptionally distinct sub-clusters of cells were identified and organized into subsets with known lineage relationships, as previously described. We used the published clusters, annotations, and expression data to analyze mean pure expression level (log2(TP10K+1)) with reference to high- and moderate-penetrance monogenic IBD genes for all expressing and non-expressing cells (Fig. 3a,b), after normalization according to the mean expression of the gene across all 51 cell subsets. scRNA-seq data of blood-derived neutrophils were added into the healthy colonic dataset (Fig. 3a)25.
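As an illustration of the expression unit used here, the following Python sketch computes log2(TP10K+1) for one gene in each cell and averages it per cell subset, including cells in which the gene is not detected. The input layout and function names are hypothetical, and averaging the log-transformed values (rather than transforming the mean) is an assumption.

```python
import math

def log2_tp10k(gene_count, total_count):
    """log2(TP10K + 1): gene counts scaled to 10,000 transcripts per cell."""
    tp10k = 1e4 * gene_count / total_count
    return math.log2(tp10k + 1.0)

def mean_expression_by_subset(cells):
    """Mean log2(TP10K+1) of one gene per cell subset.

    cells: iterable of (subset, gene_count, total_count), one entry per cell.
    Cells with zero counts for the gene are kept, so both expressing and
    non-expressing cells contribute to the mean.
    """
    sums, sizes = {}, {}
    for subset, gene_count, total_count in cells:
        sums[subset] = sums.get(subset, 0.0) + log2_tp10k(gene_count, total_count)
        sizes[subset] = sizes.get(subset, 0) + 1
    return {subset: sums[subset] / sizes[subset] for subset in sums}

# Toy example with three cells (hypothetical counts).
toy_cells = [("Treg", 1, 10000), ("Treg", 0, 5000), ("Enterocyte", 3, 10000)]
print(mean_expression_by_subset(toy_cells))
```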

For pediatric ileal samples (Fig. 3c-d, 4c-d), raw sequence reads in FASTQ format were obtained (described further in Ref. 24) and re-aligned to the GRCh38-3.0.0 human reference transcriptome using the CellRanger v3.1.0 pipeline (10x Genomics) with default parameters. The resulting gene expression matrices were analyzed using Scanpy package v1.5.155. After quality control and doublet exclusion by scrublet56, terminal ileum scRNA-seq data included 58,900 cells from 8 healthy pediatric patients (Fig. 3c,d) and 7 patients with CD (Fig. 4c,d). Healthy and CD cells were clustered and annotated together, and annotations were further refined after integration with fetal and healthy adult samples as described in Ref. 24. Analysis and visualization of mean expression levels of monogenic IBD genes followed the analysis described for the colonic cells.

Proteomic analysis

Quantitative protein levels of 28 hematopoietic cells from peripheral blood (43 activated or steady state types) were obtained from the ImmProt database13 (Fig. 3e-f, Extended Data Fig. 5, Extended Data Fig. 7a, http://www.immprot.org/). Levels were based on mass spectrometry of fluorescence-activated cell sorted immune cells from 3-4 healthy donors. For clustering analysis columns are centered; unit variance scaling is applied to columns. Columns are clustered using correlation distance and average linkage.
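The scaling and distance measure named above can be sketched in a few lines of Python. This is an illustrative reading only: the function names are hypothetical, the sample standard deviation (n-1) is an assumption, and the average-linkage clustering step itself is omitted.

```python
import math
from statistics import mean, stdev

def unit_variance_scale(values):
    """Center a vector and scale it to unit variance (sample SD, an assumption)."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]

def correlation_distance(a, b):
    """Correlation distance: 1 minus the Pearson correlation of two vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / norm

# Two profiles that rise together are close; opposite profiles are far apart.
print(correlation_distance([1, 2, 3], [2, 4, 6]))
print(correlation_distance([1, 2, 3], [3, 2, 1]))
```

Because the Pearson correlation is unchanged by centering and scaling, the correlation distance between two vectors does not depend on whether those vectors were scaled beforehand; scaling mainly affects the heatmap display and any distance computed across the scaled dimension.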

Differential expression analysis

Differential expression (DE) tests were performed using MAST57, which fits a hurdle model to the expression of each gene, consisting of logistic regression for the zero process (i.e. whether the gene is expressed) and linear regression for the continuous process (i.e. the expression level) (Fig. 4e). To reduce the size of the inference problem, separate models were fit for each annotated cell subset, comparing cells within the given cell subset to all other cells. The regression model includes terms to capture the effects of the cell subset and the disease state on gene expression, while controlling for cell complexity (i.e. the number of genes detected per cell).

Specifically, we used the regression formula, Yi ~ X + D + N, where Yi is the standardized log2(TP10K+1) expression vector for gene i across all cells, X is a binary variable reflecting cell subset membership (e.g. Tregs vs. non-Tregs), D is the disease state associated with each cell, and N is the number of genes detected in each cell. To identify genes that are specific to cell subsets in healthy subjects and IBD (i.e. UC or CD) patients, we used two disease states: Healthy and IBD. Additionally, a few heuristics were used to increase the speed of the tests: we required all tested genes to have a minimum fold change of 1.2 and to be expressed by at least 1% of the cells within the group of interest, and cells were evenly downsampled across groups so that a maximum of 2,500 cells were tested for each cell subset. In all cases, the discrete and continuous coefficients of the model were retrieved and p-values were calculated using the likelihood ratio test in MAST. Q-values were separately estimated for each cell subset comparison using the Benjamini-Hochberg FDR. Unless otherwise indicated, all reported DE coefficients and q-values correspond to the discrete component of the model (i.e. the logistic regression).
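For readers unfamiliar with the q-value step, the Benjamini-Hochberg adjustment applied to the p-values of one cell subset comparison can be sketched as follows. This is a generic implementation of the procedure written for illustration, not the code used in this study; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values

# Example with four hypothetical p-values from one cell subset comparison.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Estimating q-values separately for each cell subset comparison, as described above, corresponds to calling such a function once per subset.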

Gene modules

Genes associated with risk of polygenic IBD were identified from multiple genome-wide association studies, as described previously8, for the analysis in Fig. 5b-d. To build gene modules for the colon and ileum datasets (Fig. 5c), we subsampled 1,000 cells from each cell subset to create a dataset with a more balanced cell subset distribution. For subsets containing fewer than 1,000 cells, we retained all cells belonging to that subset. We next used consensus non-negative matrix factorization (cNMF)29 to estimate 100 factors (i.e. gene modules). cNMF was run on a subset of 2,000 variable genes, which were estimated from the linear relationship between the mean and the coefficient of variation of gene expression8, but was re-fit to include estimates for all genes. We assessed membership in gene modules using the top 250 scoring genes from each module.
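The balancing step that precedes module estimation can be illustrated with the short Python sketch below, which keeps at most 1,000 cells per subset and retains all cells of smaller subsets. The data layout, function name and fixed random seed are assumptions for illustration; the cNMF fit itself is not shown.

```python
import random
from collections import defaultdict

def subsample_per_subset(cell_subsets, max_cells=1000, seed=0):
    """Keep at most max_cells randomly chosen cells from each cell subset.

    cell_subsets: dict mapping cell identifier -> cell subset label.
    Subsets with max_cells or fewer cells are retained in full.
    """
    rng = random.Random(seed)
    by_subset = defaultdict(list)
    for cell_id, subset in cell_subsets.items():
        by_subset[subset].append(cell_id)
    kept = []
    for subset in sorted(by_subset):
        cells = by_subset[subset]
        if len(cells) > max_cells:
            cells = rng.sample(cells, max_cells)
        kept.extend(cells)
    return kept

# Toy example: 5 "Treg" cells and 20 "Enterocyte" cells, capped at 10 per subset.
toy = {f"cell{i}": ("Treg" if i < 5 else "Enterocyte") for i in range(25)}
print(len(subsample_per_subset(toy, max_cells=10)))
```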

Enriched expression and co-expression of gene sets within cell subsets and gene modules

To identify cell subsets that were statistically enriched for the expression of monogenic IBD genes, we computed the mean log2(TP10K+1) expression of each gene across all cell subsets, then discretized these expression levels using an expression cutoff of 0.25, which results in ~2,215 expressed genes per cell subset (other cutoffs yield congruent results). We then scored each cell subset according to the number of monogenic IBD genes it expressed. To identify gene modules that were statistically enriched for the expression of monogenic IBD genes, we calculated an enrichment score as the number of genes from the monogenic IBD gene set that were in the top 250 genes of each gene module (Fig. 5c). To estimate significance, we compared these enrichment scores to a null distribution that was estimated from 100 background sets of genes. Each background gene set was selected to have matching expression levels, using 20 equal-frequency expression bins that were defined across all cells in the dataset (Fig. 5d, Extended Data Fig. 8). To determine whether monogenic IBD genes were significantly co-expressed within cell subsets and gene modules, we examined all pairs of genes within the monogenic IBD gene set, and compared their frequency of co-expression to a null distribution that was estimated as previously described (Extended Data Fig. 8).
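A minimal Python sketch of the expression-matched null distribution is given below. It assigns genes to 20 equal-frequency bins by mean expression, draws background sets with the same bin composition as the gene set of interest, and compares the observed module overlap with the overlaps of 100 background sets. All names are hypothetical; drawing background genes from outside the gene set, and the pseudocount in the empirical p-value, are assumptions rather than details stated in the text.

```python
import random

def equal_frequency_bins(mean_expr, n_bins=20):
    """Assign each gene to one of n_bins bins holding similar numbers of genes."""
    ranked = sorted(mean_expr, key=mean_expr.get)
    return {gene: i * n_bins // len(ranked) for i, gene in enumerate(ranked)}

def matched_background(gene_set, bins, rng):
    """Draw one background set with the same number of genes per expression bin."""
    pool, needed = {}, {}
    for gene, b in bins.items():
        if gene in gene_set:
            needed[b] = needed.get(b, 0) + 1
        else:
            pool.setdefault(b, []).append(gene)
    background = set()
    for b, k in needed.items():
        background.update(rng.sample(pool[b], k))
    return background

def enrichment_p_value(gene_set, module_top_genes, bins, n_background=100, seed=0):
    """Observed overlap with a module and its empirical p-value (with pseudocount)."""
    rng = random.Random(seed)
    observed = len(gene_set & module_top_genes)
    null = [len(matched_background(gene_set, bins, rng) & module_top_genes)
            for _ in range(n_background)]
    p = (1 + sum(score >= observed for score in null)) / (1 + n_background)
    return observed, p
```

With 100 background sets, the smallest p-value this sketch can return is 1/101, which is one reason a larger number of background sets may be preferred when finer resolution is needed.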

Taxonomy

We generated an integrated taxonomy (Table 1) using 5 main classes of data (Fig. 1-3, Extended Data Table 2). Genes were first categorized according to syndromic phenotype (Fig. 2c). Compartments of cell types were differentiated (Fig. 3b,d). For example, endothelial cells, microvascular cells and post-capillary venules contributed to the endothelial compartment. The highest expression of genes in cells of each compartment was calculated according to inflamed scRNA-seq biopsies (as per Fig. 4a), and represented by color within the taxonomy. The second strongest expressing cellular subset was shown in grey, where genes defied the trend of their syndromic group or there was no trend to align with (mechanism unclear). Syndromic groups were ordered according to the compartment in which the gene group was predominantly expressed. The highest expressed genes are also shown for Tregs and neutrophils (n=10).

Identifying candidate genes within IBD loci

To identify credible candidate genes in IBD-associated loci (Fig. 5a,e,f, Extended Data Table 4), we gathered data from four sources: three polygenic IBD meta-analyses5,27,28 and one fine-mapping study7. The candidate genes were identified by bioinformatic prioritization of candidate genes within IBD loci (using eQTLs and coding SNPs, as well as network prioritization algorithms GRAIL and DAPPLE)5,27,28, alongside fine-mapping of IBD loci to localize signals to single genes5,7. All candidate genes from those sources were combined to provide a long-list of candidates for each locus. In total, we combined data on 223 non-MHC loci, with a total of 343 candidate genes prioritized. 167 loci included at least one candidate gene, and 84 had exactly one prioritized gene. In addition, we assigned a subset of loci as having a high-confidence candidate gene, defined according to bioinformatic evidence, fine-mapping and manual annotation.

  1. Bioinformatic: Multiple sources of bioinformatic evidence that uniquely implicate a single gene (i.e. there is exactly one gene that is identified by at least two prioritization techniques).

  2. Fine-mapping: the fine-mapping localizes the signal to a region that contains exactly one gene.

  3. Manual: Several well-established genes were manually assigned to high-confidence, based on existing functional studies. These genes were: PTPN22, ATG16L1, NOD2, FUT2, IRF5, ITGAL, IL23R, IL10, IFIH1, IRGM, CARD9 and TYK2.

By this analysis, a total of 65 loci had exactly one high-confidence candidate gene. One additional locus was removed from the high-confidence list because of a clash between sources: fine-mapping and bioinformatic analysis around the SNP rs10065637 gave conflicting results, with bioinformatic analyses converging on the gene IL6ST, but fine-mapping localizing the signal to the nearby ANKRD55. In all other cases, converging bioinformatic information, fine-mapping and manual annotation agreed. A total of four genes were included in the high-confidence list owing to manual annotation only (FUT2, IFIH1, IL10 and IRGM). These genes were all prioritized by bioinformatic analysis, but would not otherwise have been included in the high-confidence list, either because they were supported by only one source of annotation (FUT2 by a coding SNP, IFIH1 and IRGM by GRAIL), or because multiple genes in the locus were supported by two or more prioritization sources (IL10, which was joined by nearby genes IL19, IL20 and IL24).
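The three routes to a high-confidence assignment, and the handling of conflicts between them, can be summarised in a small Python sketch. The function, its arguments and the placeholder gene names are hypothetical and serve only to make the decision rule explicit; the sketch assumes that a locus is left without a high-confidence gene whenever the sources point to different genes.

```python
def high_confidence_gene(prioritized, fine_mapped, manual=frozenset()):
    """Return the single high-confidence gene for a locus, or None.

    prioritized: dict mapping gene -> set of prioritization techniques
                 (e.g. "eQTL", "coding SNP", "GRAIL", "DAPPLE") supporting it.
    fine_mapped: set of genes lying in the fine-mapped region.
    manual:      set of manually annotated, well-established genes.
    """
    candidates = set()
    # Bioinformatic rule: exactly one gene supported by at least two techniques.
    multi_supported = [g for g, techniques in prioritized.items() if len(techniques) >= 2]
    if len(multi_supported) == 1:
        candidates.add(multi_supported[0])
    # Fine-mapping rule: the localized region contains exactly one gene.
    if len(fine_mapped) == 1:
        candidates |= set(fine_mapped)
    # Manual rule: a well-established gene present at this locus.
    candidates |= (set(prioritized) | set(fine_mapped)) & set(manual)
    # Conflicting sources (more than one candidate) yield no assignment.
    return next(iter(candidates)) if len(candidates) == 1 else None

# Placeholder locus: one gene supported by two techniques, no fine-mapping signal.
print(high_confidence_gene({"GENE_A": {"eQTL", "GRAIL"}, "GENE_B": {"DAPPLE"}}, set()))
```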

Analysis of enriched biological processes

Enrichment of cells’ biological processes under inflamed conditions (Extended Data Fig. 6) was tested using the STRING network, by including cell-type specific monogenic IBD genes that were expressed more than the mean (nominal expression level log2(TP10K+1)) from diseased state UC samples (based on data represented in Fig. 4a). Biological processes enriched with a conservative false discovery rate of 0.00098 (0.05/51) were included. Manual filtering out of non-informative terms like ‘immune response’ was performed and data was transformed by ln(x-1) for unsupervised hierarchical clustering (Extended Data Fig. 6). No scaling is applied to rows. Rows are clustered using correlation distance and average linkage. Columns are clustered using Euclidean distance and average linkage.

Gene networks

Bayesian gene regulatory networks (BGRN) were generated as previously published30 using the RISK pediatric CD ileal intestinal biopsy RNA-seq data (Fig. 5f). High-penetrance monogenic IBD gene-centric subnetwork(s) were generated by selecting high-penetrance monogenic IBD genes from the BGRN and expanding out one or two path lengths (undirected) to obtain the nearest neighbors. Genes associated with IBD due to hypomorphic functionality and complete LOF were included once (e.g., ZAP70). The most connected subnetwork at two path lengths was then extracted and tested for enrichment in either high or moderate-penetrance monogenic IBD genes; IBD GWAS genes (high and low confidence genes) and functional enrichment (Bioplanet58) using Fisher’s exact test (FET) and a Benjamini Hochberg multiple test correction in R (package version 1.0.12.). BGRNs were visualized in Cytoscape version 3.7.259.
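The expansion of seed genes by one or two undirected path lengths can be illustrated with a breadth-first search. The sketch below is a generic graph routine using placeholder node names; it is not the BGRN code used in this study, and the edge-list input format is an assumption.

```python
from collections import deque

def expand_subnetwork(edges, seeds, max_path_length=2):
    """Return seed nodes plus all nodes within max_path_length undirected steps.

    edges: iterable of (node_a, node_b) pairs; direction is ignored.
    seeds: iterable of seed nodes (e.g. high-penetrance genes in the network).
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Seeds absent from the network are skipped.
    distance = {seed: 0 for seed in seeds if seed in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_path_length:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)

# Placeholder chain A-B-C-D-E, expanded from seed A.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(sorted(expand_subnetwork(chain, {"A"}, max_path_length=2)))
```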

Statistics

Calculations of significance for differences in continuous data were performed with the Mann-Whitney U test, Fisher exact test, ANOVA and Spearman correlation when applicable.

For the gene set overlap analysis (Fig. 5a), the representation factor of two gene sets was calculated based on a total number of 18,000 human genes.
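The text does not spell out the formula, so the sketch below assumes the commonly used definition of the representation factor: the observed number of shared genes divided by the number expected for two independent gene sets drawn from a background of 18,000 genes. The function name and toy gene identifiers are hypothetical.

```python
def representation_factor(set_a, set_b, total_genes=18000):
    """Observed overlap divided by the overlap expected by chance."""
    expected = len(set_a) * len(set_b) / total_genes
    return len(set_a & set_b) / expected

# Toy example: sets of 100 and 180 genes sharing 5 genes (expected overlap is 1).
genes_a = {f"g{i}" for i in range(100)}
genes_b = {f"g{i}" for i in range(95, 275)}
print(representation_factor(genes_a, genes_b))
```

Under this definition, a value above 1 indicates more overlap than expected by chance and a value below 1 indicates less.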

Extended Data

Extended Data Figure 1. Selection for investigating monogenic syndromes and Mendelian disorders associated with IBD.

Extended Data Figure 2. Graphs showing the relationship between the observed/reported penetrance and minimum patient number required to identify monogenic disorders with 5% penetrance.

(a) Minimal effect size (observed penetrance) required to detect a >5% penetrance with 90%, 95% or 99% confidence depending on the size of the case series (number of patients with a gene defect).

(b) The minimal number of individuals with IBD phenotype in relation to the total number of patients with gene defect in case series to confidently detect a >5% penetrance. Nominal penetrance and confidence margins were calculated according to adjusted Wald equation.

CI= confidence interval

Extended Data Figure 3. Nonsense allele frequencies of monogenic IBD genes reveal the relative rarity of nonsense mutations compared to polygenic risk variant NOD2 and excluded genes.

(a-b) Nonsense allele frequency analysis of (a) autosomal and (b) X-linked monogenic IBD genes in the ExAC database. Minor allele frequency of nonsense variants (LOF, i.e. stop gain/stop loss, and frameshift variants) are shown at the respective amino acid position of each gene. Examples of monogenic IBD genes (IL10, IL10RA, XIAP, FOXP3, CYBB) were compared to the benchmark gene NOD2 and genes that are unlikely to be associated with high-penetrance IBD (NOX1, DUOX2).

(c) Cumulated allele frequency for nonsense variants vs penetrance of IBD associated with different genetic variants.

AA= amino acid, LOF= loss of function

Extended Data Figure 4. Bulk RNA-seq expression of monogenic IBD genes in multiple organs identifies hematopoietic and intestinal-enriched gene clusters.

(a) Unsupervised hierarchical clustering and (b) principal component analysis of bulk RNA expression (FPKM) of monogenic IBD genes in 32 different human tissues (Uhlen et al 2015). (a) Heatmap rows are centered; unit variance scaling is applied to rows. Both rows and columns are clustered using correlation distance and average linkage.

(c) Examples of differentially expressed genes from bulk RNA sequencing that are enriched in intestinal (GUCY2C, SLC9A3) and lymphoid tissues (IL10RA, PIK3CD).

Extended Data Figure 5. Protein copy numbers of monogenic IBD genes in peripheral blood immune cells of healthy donors distinguishes compartment-specific gene groups.

(a) Principal component analysis of regulatory (orange), activated (red) and resting/steady state T cells (green), according to protein levels of monogenic IBD genes (Rieckmann et al 2017).

(b) Hierarchical clustering of normalised protein levels (encoded by monogenic IBD genes) for each donor sample (n=3-4) showing different proteomic profiles between neutrophils and monocytes (Rieckmann et al 2017).

DC= dendritic cell, MO= monocyte, CM= central memory, EM= effector memory, EMRA= terminally-differentiated effector memory, ss= steady state

Extended Data Figure 6. Cell-type specific enriched biological processes according to monogenic IBD gene expression.

(a-c) Unsupervised hierarchical clustering of biological processes, according to above average monogenic IBD gene expression in cellular subsets, under inflamed colonic conditions implicates distinct cellular modules.

Extended Data Figure 7. Patterns of inflamed colonic monogenic IBD single-cell gene expression according to syndromic phenotypes.

(a-e) Heatmaps showing scRNAseq data from colonic samples of UC patients, with plot in (a) highlighting NADPH oxidase proteins enriched in Treg vs naïve CD4+ T cells.

Extended Data Figure 8. Enrichment of monogenic IBD genes in the same cell types and gene modules is strongest in genes sharing a syndromic phenotype.

For all monogenic IBD genes (blue, “Monogenic”), monogenic IBD genes with the same syndromic phenotype (blue, “Syndromic”), or background sets of genes that were selected to have matching expression statistics (light grey), the probability that genes from the gene set are co-expressed (y axis) within the same cell type (left panel) or gene module (right panel) across all colon and ileum cells (x axis). Error bars: SEM. P-values: NS: not significant, * P < 0.05, *** P < 0.001.

Extended Data Figure 9. Graphical representation of key innate and adaptive immune cell types and processes associated with monogenic IBD.

Key cellular subtypes associated with monogenic IBD are shown (red), with distinction of hematopoietic (green background) and non-hematopoietic compartments (cream background). Processes are shown in blue boxes and associated monogenic IBD genes in black.

GC= germinal centre, DC= dendritic cell, Tregs= regulatory T cells, M cells= ‘microfold’ cells

Supplementary Material

Supplementary File 1
Supplementary File 2
Table S5

Acknowledgements

We thank many authors of original papers and the UK Cystic Fibrosis Trust Registry for clarifications and kindly providing additional unpublished information.

Declarations

Author Contributions:

CB, CS, RE, GW, CA, DA, KRJ, JK, AR and HHU provided acquisition and analysis of the data. CB, CS, CA, DA, KRJ, JK, DM, JC, DS, SK, ST, LJ, SS, CK, ES, RX, ST, AM, AV, AR and HHU contributed to interpretation of the data. HHU and CB drafted the first version of the manuscript. CB, CS, RE, CA, DA, KRJ, AR and HHU contributed to data visualisation. All authors read and provided feedback on the final manuscript. HHU is the guarantor.

Competing Interests statement

None of the authors have a conflict of interest related to this article. HHU received research support or consultancy fees from Eli Lilly, UCB Pharma, Celgene, Boehringer Ingelheim, Pfizer and AbbVie. SPLT has been an adviser to, in receipt of educational or research grants from, or invited lecturer for AbbVie; Amgen; Asahi; Biogen; Boehringer Ingelheim; BMS; Cosmo; Elan; Enterome; Ferring; FPRT Bio; Genentech/Roche; Genzyme; Glenmark; GW Pharmaceuticals; Immunocore; Immunometabolism; Janssen; Johnson & Johnson; Lilly; Merck; Novartis; Novo Nordisk; Ocera; Pfizer; Shire; Santarus; Sensyne; SigmoidPharma; Synthon; Takeda; Tillotts; Topivert; Trino Therapeutics with Wellcome Trust; UCB Pharma; Vertex; VHsquared; Vifor; Warner Chilcott and Zeria. A.R. is a co-founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas, and was an SAB member of ThermoFisher Scientific, Syros Pharmaceuticals, Neogene Therapeutics and Asimov until July 31, 2020. From August 1, 2020, A.R. is an employee of Genentech.

Online resources / URLs

The following online data sources have been accessed:

Online Mendelian Inheritance in Man (OMIM): http://www.omim.org

STRING: http://string-db.org

ExAC browser: http://exac.broadinstitute.org

ClinVar: www.ncbi.nlm.nih.gov/clinvar/

ClinGen: https://clinicalgenome.org

Single cell transcriptomics portal: https://portals.broadinstitute.org/single_cell/

Cytoscape version 3.7.2: https://cytoscape.org/

Human Protein Atlas project: https://www.proteinatlas.org

Contributor Information

Chrissy Bolton, Email: Chrissybolton0@gmail.com.

Christopher S. Smillie, Email: csmillie@mit.edu.

Rasa Elmentaite, Email: re5@sanger.ac.uk.

Gabrielle Wei, Email: gabbie.wei@icahn.mssm.edu.

Carmen Argmann, Email: carmen.argmann@mssm.edu.

Dominik Aschenbrenner, Email: dominik.aschenbrenner@ndm.ox.ac.uk.

Kylie R James, Email: kj7@sanger.ac.uk.

Dermot P.B McGovern, Email: Dermot.McGovern@cshs.org.

Marina Macchi, Email: kj7@sanger.ac.uk.

Judy Cho, Email: judy.cho@mssm.edu.

Dror Shouval, Email: dror.shouval@gmail.com.

Jochen Kammermeier, Email: Jochen.Kammermeier@gstt.nhs.uk.

Sibylle Koletzko, Email: Sibylle.Koletzko@med.uni-muenchen.de.

Simon P.L. Travis, Email: simon.travis@ndm.ox.ac.uk.

Luke Jostins, Email: luke.jostins@kennedy.ox.ac.uk.

Carl A. Anderson, Email: ca3@sanger.ac.uk.

Scott Snapper, Email: Scott.Snapper@childrens.harvard.edu.

Christoph Klein, Email: Christoph.Klein@med.uni-muenchen.de.

Eric Schadt, Email: eric.schadt@mssm.edu.

Ramnik Xavier, Email: xavier@molbio.mgh.harvard.edu.

Sarah Teichmann, Email: st9@sanger.ac.uk.

Aleixo M. Muise, Email: aleixo.muise@sickkids.ca.

Aviv Regev, Email: aregev@broadinstitute.org.

Funding

We acknowledge the contribution of the BRC Gastrointestinal biobank (11/YH/0020, 16/YH/0247), supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). This work was supported by the Leona M. and Harry B. Helmsley Charitable Trust (CK, SS, AMM, DMG, JC, GW, CA, ES, and HHU), the Manton Foundation (RX and AR), the Klarman Cell Observatory (AR), and HHMI (AR). AMM is funded by a Canada Research Chair (Tier 1) in Pediatric IBD, a CIHR Foundation Grant and an NIDDK grant (RC2DK118640). ST, KJ and HHU are supported by the Wellcome Trust Human Cell Atlas grant.

Data Accessibility Statement

Colonic scRNA-seq data are available from the controlled-access data repository DUOS (https://duos.broadinstitute.org). Web links for publicly available datasets used in the study are included. Figures that have associated raw data are publicly available as specified in Online Methods. Data from ref. 24 and from Elmentaite et al.’s manuscript in preparation will be available on publication at gutcellatlas.org. Code used for analysis is available at https://www.github.com/cssmillie/ulcerative_colitis

References

  • 1. Uhlig HH, Powrie F. Translating Immunology into Therapeutic Concepts for Inflammatory Bowel Disease. Annu Rev Immunol. 2018;36:755–781. doi: 10.1146/annurev-immunol-042617-053055.
  • 2. Graham DB, Xavier RJ. Pathway paradigms revealed from the genetics of inflammatory bowel disease. Nature. 2020;578:527–539. doi: 10.1038/s41586-020-2025-2.
  • 3. Levine A, et al. Pediatric modification of the Montreal classification for inflammatory bowel disease: The Paris classification. Inflamm Bowel Dis. 2011;17:1314–1321. doi: 10.1002/ibd.21493.
  • 4. Hyams JS. Standardized recording of parameters related to the natural history of inflammatory bowel disease: From Montreal to Paris. Dig Dis. 2014;32:337–344. doi: 10.1159/000358133.
  • 5. de Lange KM, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet. 2017;49:256–261. doi: 10.1038/ng.3760.
  • 6. Martin JC, et al. Single-Cell Analysis of Crohn’s Disease Lesions Identifies a Pathogenic Cellular Module Associated with Resistance to Anti-TNF Therapy. Cell. 2019;178:1493–1508.e20. doi: 10.1016/j.cell.2019.08.008.
  • 7. Huang H, et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature. 2017;547:173–178. doi: 10.1038/nature22969.
  • 8. Smillie CS, et al. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell. 2019;178:714–730.e22. doi: 10.1016/j.cell.2019.06.029.
  • 9. Kinchen J, et al. Structural Remodeling of the Human Colonic Mesenchyme in Inflammatory Bowel Disease. Cell. 2018;175:372–386.e17. doi: 10.1016/j.cell.2018.08.067.
  • 10. Parikh K, et al. Colonic epithelial cell diversity in health and inflammatory bowel disease. Nature. 2019;567:49–55. doi: 10.1038/s41586-019-0992-y.
  • 11. Corridoni D, et al. Single-cell atlas of colonic CD8+ T cells in ulcerative colitis. Nat Med. 2020:1–11. doi: 10.1038/s41591-020-1003-4.
  • 12. Boland BS, et al. Heterogeneity and clonal relationships of adaptive immune cells in ulcerative colitis revealed by single-cell analyses. Sci Immunol. 2020;5. doi: 10.1126/sciimmunol.abb4432.
  • 13. Rieckmann JC, et al. Social network architecture of human immune cells unveiled by quantitative proteomics. Nat Immunol. 2017;18:583–593. doi: 10.1038/ni.3693.
  • 14. Uhlig HH, et al. The diagnostic approach to monogenic very early onset inflammatory bowel disease. Gastroenterology. 2014;147:990–1007.e3. doi: 10.1053/j.gastro.2014.07.023.
  • 15. Uhlig HH. Monogenic diseases associated with intestinal inflammation: implications for the understanding of inflammatory bowel disease. Gut. 2013;62:1795–1805. doi: 10.1136/gutjnl-2012-303956.
  • 16. Hugot JP, et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn’s disease. Nature. 2001;411:599–603. doi: 10.1038/35079107.
  • 17. Ogura Y, et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease. Nature. 2001;411:603–606. doi: 10.1038/35079114.
  • 18. Horowitz JE, et al. Mutation spectrum of NOD2 reveals recessive inheritance as a main driver of Early Onset Crohn’s Disease. bioRxiv. 2017:098574. doi: 10.1038/s41598-021-84938-8.
  • 19. Ng SC, et al. Worldwide incidence and prevalence of inflammatory bowel disease in the 21st century: a systematic review of population-based studies. Lancet. 2017;390:2769–2778. doi: 10.1016/S0140-6736(17)32448-0.
  • 20. Karczewski KJ, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
  • 21. Schwerd T, et al. NOX1 loss-of-function genetic variants in patients with inflammatory bowel disease. Mucosal Immunol. 2018;11:562–574. doi: 10.1038/mi.2017.74.
  • 22. Uhlen M, et al. Tissue-based map of the human proteome. Science. 2015;347:1260419. doi: 10.1126/science.1260419.
  • 23. Smillie CS, Biton M, Ordovas J, Shalek AK, Xavier RJ. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. doi: 10.1016/j.cell.2019.06.029.
  • 24. Elmentaite R, et al. Single-Cell Sequencing of Developing Human Gut Reveals Transcriptional Links to Childhood Crohn’s Disease. Dev Cell. 2020. doi: 10.1016/j.devcel.2020.11.010.
  • 25. Xie X, et al. Single-cell transcriptome profiling reveals neutrophil heterogeneity in homeostasis and infection. Nat Immunol. 2020;21:1119–1133. doi: 10.1038/s41590-020-0736-z.
  • 26. Shouval DS, et al. Interleukin 1β Mediates Intestinal Inflammation in Mice and Patients With Interleukin 10 Receptor Deficiency. Gastroenterology. 2016;151:1100–1104. doi: 10.1053/j.gastro.2016.08.055.
  • 27. Jostins L, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491:119–124. doi: 10.1038/nature11582.
  • 28. Liu JZ, et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet. 2015;47:979–986. doi: 10.1038/ng.3359.
  • 29. Kotliar D, et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. eLife. 2019;8. doi: 10.7554/eLife.43803.
  • 30. Peters LA, et al. A functional genomics predictive network model identifies regulators of inflammatory bowel disease. Nat Genet. 2017;49:1437–1449. doi: 10.1038/ng.3947.
  • 31. Gaffen SL, Jain R, Garg AV, Cua DJ. The IL-23-IL-17 immune axis: From mechanisms to therapeutic testing. Nat Rev Immunol. 2014;14:585–600. doi: 10.1038/nri3707.
  • 32. Aschenbrenner D, et al. Deconvolution of monocyte responses in inflammatory bowel disease reveals an IL-1 cytokine network that regulates IL-23 in genetic and acquired IL-10 resistance. Gut. 2020. doi: 10.1136/gutjnl-2020-321731.
  • 33. Efimova O, Szankasi P, Kelley TW. Ncf1 (p47phox) is essential for direct regulatory T cell mediated suppression of CD4+ effector T cells. PLoS One. 2011;6. doi: 10.1371/journal.pone.0016013.
  • 34. Müller T, et al. No Title. Gut. 2016;65:1306–1313. doi: 10.1136/gutjnl-2015-309441.
  • 35. Fiskerstrand T, et al. Familial Diarrhea Syndrome Caused by an Activating GUCY2C Mutation. N Engl J Med. 2012;366:1586–1595. doi: 10.1056/NEJMoa1110132.
  • 36. Rossi CP, et al. Interferon beta-1a for the maintenance of remission in patients with Crohn’s disease: Results of a phase II dose-finding study. BMC Gastroenterol. 2009;9. doi: 10.1186/1471-230X-9-22.
  • 37. Tesch VK, et al. Long-term outcome of LRBA deficiency in 76 patients after various treatment modalities as evaluated by the immune deficiency and dysregulation activity (IDDA) score. J Allergy Clin Immunol. 2020;145:1452–1463. doi: 10.1016/j.jaci.2019.12.896.
  • 38. Schwab C, et al. Phenotype, penetrance, and treatment of 133 cytotoxic T-lymphocyte antigen 4-insufficient subjects. J Allergy Clin Immunol. 2018;142:1932–1946. doi: 10.1016/j.jaci.2018.02.055.
  • 39. Canna SW, et al. An activating NLRC4 inflammasome mutation causes autoinflammation with recurrent macrophage activation syndrome. Nat Genet. 2014;46:1140–1146. doi: 10.1038/ng.3089.
  • 40. Canna SW, et al. Life-threatening NLRC4-associated hyperinflammation successfully treated with IL-18 inhibition. J Allergy Clin Immunol. 2017;139:1698–1701. doi: 10.1016/j.jaci.2016.10.022.
  • 41. Michael L, Camille J, Brigitte B. PW02-020 - Colitis revealing mevalonate kinase deficiency. Pediatr Rheumatol. 2013;11:1–1.
  • 42. Légeret C, et al. JAK Inhibition in a Patient with X-Linked Reticulate Pigmentary Disorder. J Clin Immunol. 2020:1–5. doi: 10.1007/s10875-020-00867-7.
  • 43. Bennike TB, et al. Neutrophil extracellular traps in ulcerative colitis: A proteome analysis of intestinal biopsies. Inflamm Bowel Dis. 2015;21:2052–2067. doi: 10.1097/MIB.0000000000000460.
  • 44. Schniers A, et al. Ulcerative colitis: Functional analysis of the in-depth proteome. Clin Proteomics. 2019;16:4. doi: 10.1186/s12014-019-9224-6.
  • 45. Allaire JM, et al. The Intestinal Epithelium: Central Coordinator of Mucosal Immunity. Trends Immunol. 2018;39:677–696. doi: 10.1016/j.it.2018.04.002.
  • 46. Trzupek D, et al. Discovery of CD80 and CD86 as recent activation markers on regulatory T cells by protein-RNA single-cell analysis. Genome Med. 2020;12:55. doi: 10.1186/s13073-020-00756-z.
  • 47. Al Nabhani Z, et al. A Weaning Reaction to Microbiota Is Required for Resistance to Immunopathologies in the Adult. Immunity. 2019;50:1276–1288.e5. doi: 10.1016/j.immuni.2019.02.014.
  • 48. Amatullah H, Jeffrey KL. Epigenome-metabolome-microbiome axis in health and IBD. Curr Opin Microbiol. 2020;56:97–108. doi: 10.1016/j.mib.2020.08.005.
  • 49. Bracaglia C, et al. P2068 Microbiota transplant to control inflammation in a NLRC4-related disease patient with recurrent hemophagocytic lymphohistiocytosis (HLH); 10th Congress of International Society of Systemic Auto-Inflammatory Diseases (ISSAID); Springer Science and Business Media LLC; 2019. 18
  • 50. Elmentaite R, et al. The human gastrointestinal tract through space and time. (manuscript in preparation)
  • 51. Zhou Z, et al. Variation at NOD2/CARD15 in familial and sporadic cases of Crohn’s disease in the Ashkenazi Jewish population. Am J Gastroenterol. 2002;97:3095–3101. doi: 10.1111/j.1572-0241.2002.07105.x.
  • 52. Yazdanyar S, et al. Penetrance of NOD2/CARD15 genetic variants in the general population. CMAJ. 2010;182:661–665. doi: 10.1503/cmaj.090684.
  • 53. Brant SR, et al. A Population-Based Case-Control Study of CARD15 and Other Risk Factors in Crohn’s Disease and Ulcerative Colitis. Am J Gastroenterol. 2007;102:313–323. doi: 10.1111/j.1572-0241.2006.00926.x.
  • 54. Silver J. The Importance of Penetration. Inflamm Bowel Dis. 2003;9:341. doi: 10.1097/00054725-200309000-00012.
  • 55. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0.
  • 56. Wolock SL, Lopez R, Klein AM. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst. 2019;8:281–291.e9. doi: 10.1016/j.cels.2018.11.005.
  • 57. Finak G, et al. MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16:278. doi: 10.1186/s13059-015-0844-5.
  • 58. Huang R, et al. The NCATS BioPlanet – An Integrated Platform for Exploring the Universe of Cellular Signaling Pathways for Toxicology, Systems Biology, and Chemical Genomics. Front Pharmacol. 2019;10:445. doi: 10.3389/fphar.2019.00445.
  • 59. Bindea G, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25:1091–1093. doi: 10.1093/bioinformatics/btp101.
