Abstract
We describe a precision medicine workflow, the integrated single nucleotide polymorphism network platform (iSNP), designed to determine the mechanisms by which SNPs affect cellular regulatory networks, and how SNP co-occurrences contribute to disease pathogenesis in ulcerative colitis (UC). Using SNP profiles of 378 UC patients we map the regulatory effects of the SNPs to a human signalling network containing protein-protein, miRNA-mRNA and transcription factor binding interactions. With unsupervised clustering algorithms we group these patient-specific networks into four distinct clusters driven by PRKCB, HLA, SNAI1/CEBPB/PTPN1 and VEGFA/XPO5/POLH hubs. The pathway analysis identifies calcium homeostasis, wound healing and cell motility as key processes in UC pathogenesis. Using transcriptomic data from an independent patient cohort, with three complementary validation approaches focusing on the SNP-affected genes, the patient specific modules and affected functions, we confirm the regulatory impact of non-coding SNPs. iSNP identified regulatory effects for disease-associated non-coding SNPs, and by predicting the patient-specific pathogenic processes, we propose a systems-level way to stratify patients.
Subject terms: Systems biology, Ulcerative colitis
Single Nucleotide Polymorphisms (SNPs) affect cellular regulatory networks, and SNP co-occurrences contribute to disease pathogenesis in ulcerative colitis (UC). Here the authors introduce iSNP, a precision medicine pipeline that combines genomics and network biology approaches to uncover patient specific pathways affected in complex diseases.
Introduction
Precision medicine is a key clinical goal for the effective treatment of heterogeneous, complex diseases such as inflammatory bowel disease (IBD). Complex, multilayered, integrative techniques are required to identify the individual patients’ complex pathogenic pathways1,2. With IBD, the interlinked facets leading to disease are a dysfunctional immune system and response to environmental triggers, including constituents of the intestinal microbiota and dietary factors, in a genetically susceptible host3. Focusing solely on genetic susceptibility, genome-wide association studies (GWAS) and subsequent fine mapping of identified regions defined causal disease-associated single nucleotide polymorphisms (SNPs)4,5. However, the clinical impact of these SNPs has yet to be realised. A promising approach to assess the functional role of SNPs, and advise clinical practice, is to examine patient-specific sets in combination with systems-level approaches6.
Exome sequencing and protein structural biology have already contributed to the functional annotation of SNPs in protein-coding regions (that alter the amino acid composition and the function of the translated proteins), and how they impact diseases such as obesity7, IBD5 and lung cancer8. Computational workflows prioritise such coding SNPs for further analysis9. These approaches include artificial intelligence methodologies (such as machine learning and deep neural networks) to identify and quantify deleterious regulatory impacts of SNPs using chromatin accessibility and transcription factor binding affinities10, and high-throughput chromatin interaction studies11. This allows for the identification of SNPs of interest. However, understanding the function of SNPs in non-coding regions of the DNA remains challenging, principally because many disease-causing SNPs are in areas yet to be annotated5.
In ulcerative colitis (UC), a form of IBD, coding SNPs comprise less than 10% of the total UC-associated SNPs12. These coding SNPs are not causally related to impaired intestinal barrier function or inflammation, which are hallmark pathognomonic features of UC13. Understanding of the phenotypic effects of SNPs in IBD has involved the study of monogenic IBD in paediatrics that analysed the deleterious nature of non-coding SNPs14, although in adult-onset IBD these rare individual phenotypic SNPs have not been identified15. A broader and deeper understanding of the function of SNPs in this complex genetic disease is therefore needed.
We propose that functional annotation at the molecular and systems-level of the remaining 90% of SNPs, located in non-coding regions, would expand the utility of these disease-associated SNPs. The proposed gap-filling systems-level analysis is essential, as individual SNPs may have subtle phenotypic effects, but in combination, they may have a pathological impact. Integrated analysis of these non-coding SNPs allows the identification of novel pathogenic pathways, and potentially patient-specific disease susceptibility, thus facilitating precision therapy.
For functional annotation of SNPs in non-coding regions, a key question is whether the SNPs affect gene expression by, for example, affecting long non-coding RNAs16–20, microRNA-target sites (miRNA-TS)21, splicing22–25 or transcription factor (TF) binding sites (TFBS)26 in promoter regions and within the first introns27, which has been reported in complex diseases such as diabetes, schizophrenia, coronary heart disease and Crohn’s disease28–31. In this study, we focused on two regulatory effects as examples: SNPs occurring in transcription factor binding sites and in miRNA target sites, as they can be validated by published studies.
To identify the effect of non-coding SNPs, we have built on the concepts identified by Boyle et al. to track the cumulative effects of multiple regulatory SNPs as an ‘omnigenic’ model32. Using network biology approaches that we have previously exploited to uncover novel and important proteins in cancer biology33, we aimed to further understand the pathogenic pathways of UC and to identify novel and previously hidden disease-associated proteins. These proteins are often undetected or hidden in conventional mutation and expression screens as they mostly act as direct interactors (first neighbours) of the proteins affected by a disease-associated SNP. Using first neighbours gives an optimal trade-off to keep specificity while reconstructing a connected graph. Similar studies have utilised the concept of first neighbour proteins in both type 2 diabetes34 and juvenile idiopathic arthritis35. Systems biology approaches have been utilised with predictive network models that identified proteins involved in the pathogenesis of IBD in general36–38 but these approaches are unable to account for regulatory and downstream effects of non-coding SNPs. Therefore, by identifying first neighbour proteins in UC, we aimed to expand current research and identify additional pathogenic pathways of pharmacological use in UC that have been previously undetected or hidden due to a lack of connection with non-coding UC-associated SNPs. As UC is highly heterogeneous, we used individual patient data to identify patient cohorts with similar or different pathogenic pathways of UC.
Connecting non-coding SNPs to pathways, especially in a patient-specific manner, is a much needed but highly challenging approach. To achieve this, we developed a workflow, named the integrative SNP Network Platform (iSNP) by combining systems genomics and network biology approaches into a scalable system. We demonstrated its applicability by analysing a UC-associated signalling network and by identifying patient clusters with distinct pathomechanisms contributing to UC. Within these clusters, we highlighted cluster-specific key players, identifying known and additional proteins as well as patient-specific pathways to the disease. These predicted pathogenic effects were then validated using transcriptomic data from an independent patient cohort39. Integrating systems genomics and network biology data and analysis offers unique biological insights and enables the scalable examination of patient-specific datasets for precision medicine.
Results
Constructing the UC-associated signalling network
The integrative SNP Network Platform (iSNP) was developed to assess the regulatory effects of non-coding SNPs. The iSNP workflow constructs an integrated network based on identifying the proteins whose expression could be directly affected by the SNPs (termed SNP-affected proteins) and their interactors (first neighbours) through protein–protein interactions (Fig. 1, Supplementary Fig. 1). We used the UK East Anglia cohort of 378 patients from the UK IBD genetics consortium for this analysis.
Patients from this cohort had a total of 40 individual UC-associated SNPs from which we identified 22 UC-associated regulatory SNPs localised within TFBS or miRNA-TS. These SNPs were annotated to occur within 20 TFBSs and 4 individual miRNA-TSs (Table 1, Supplementary Data 1). Eleven of the affected TFBSs were in enhancer regions and 3 were in both enhancer and promoter regions. Each of these affected TFBSs and miRNA-TSs has multiple TFs and miRNAs binding to it, resulting in 264 transcription factors and 405 miRNAs whose regulatory function is affected by these non-coding SNPs (Supplementary Data 1). These regulators are involved in a total of 1490 regulatory interactions (923 TF-TFBS and 550 miRNA-miRNA-TS interactions). The identified regulatory interactions affected by the non-coding SNPs led us to determine the genes whose expression could be impacted by a SNP. These regulatory interactions potentially affected 48 genes.
Table 1.
| SNP | Target gene name | Regulatory annotation of the SNP |
|---|---|---|
| rs11041476 | LSP1 | TFBS in an enhancer, miRNA-TS in the first intron |
|  | TNNI2 | TFBS in an enhancer |
| rs11168249 | RAPGEF3 | TFBS in an enhancer |
|  | HDAC7 | TFBS in an enhancer, miRNA-TS in the first intron |
| rs11676348 | ARPC2 | TFBS in an enhancer |
|  | CXCR1 |  |
|  | CXCR2 |  |
|  | SLC11A1 |  |
|  | CTDSP1 |  |
| rs12254167 | CCNY | TFBS in an enhancer |
| rs1598859 | NFKB1 | TFBS in an enhancer |
|  | CISD2 |  |
| rs17085007 | RPL21 | TFBS in an enhancer |
|  | GTF3A |  |
| rs1801274 | FCGR2A | miRNA-TS in an exon |
| rs3774937 | NFKB1 | miRNA-TS in an intron |
| rs477515 | HLA-DQA2 | TFBS in an enhancer |
|  | HLA-DQB1 |  |
|  | HLA-DQB2 |  |
|  | C4A |  |
|  | HSPA1B |  |
|  | HLA-DPA1 |  |
|  | AGER |  |
|  | NOTCH4 |  |
| rs543104 | CCDC82 | TFBS in an enhancer |
| rs559928 | RPS6KA4 | TFBS in an enhancer |
| rs6087990 | DNMT3B | TFBS in a promoter |
| rs7404095 | PRKCB | miRNA-TS in an intron |
| rs907611 | LSP1 | TFBS in a promoter |
| rs913678 | SNAI1 | TFBS in an enhancer |
|  | CEBPB |  |
|  | PTPN1 |  |
| rs943072 | VEGFA | TFBS in an enhancer |
|  | XPO5 |  |
|  | POLH |  |

Details of each interaction are provided in Supplementary Table 1. Cluster-driving SNPs affecting the regulation of a high number of proteins directly or through their first neighbours are shown in bold.
The products of the genes predicted to be affected by the SNPs were filtered for proteins present in the OmniPath network, an integrated and comprehensive resource of manually curated signalling interaction databases. Of the 48 SNP-affected proteins, 33 were in the OmniPath network40,41 and were regulated by 169 TFs and 247 miRNAs. To uncover the larger effect space of the non-coding SNPs, we identified the first neighbour interactors of the 33 SNP-affected proteins. In total, the UC-associated signalling network consisted of 686 protein nodes connected by 6808 protein–protein interactions, together with 758 regulatory interactions (Fig. 2a).
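The first-neighbour expansion described above can be illustrated with a minimal sketch. It is not the iSNP implementation: the function name, the toy protein identifiers and the choice to keep interactions between two first neighbours are all assumptions made for illustration.

```python
from collections import defaultdict

def first_neighbour_network(edges, seeds):
    """Return (nodes, edges) spanned by seed proteins and their direct interactors.

    edges: iterable of undirected (protein_a, protein_b) pairs
    seeds: SNP-affected proteins (hypothetical input for this sketch)
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Seeds that are absent from the interaction network are dropped,
    # mirroring the filtering of SNP-affected proteins against the network.
    present = {s for s in seeds if s in adjacency}
    nodes = set(present)
    for s in present:
        nodes |= adjacency[s]
    # Assumption: every interaction with both endpoints in the node set is kept.
    sub_edges = {tuple(sorted((a, b))) for a, b in edges if a in nodes and b in nodes}
    return nodes, sub_edges

# Made-up identifiers, not real interaction data.
toy_edges = [("SEED_A", "N1"), ("SEED_A", "N2"), ("N1", "N2"), ("SEED_B", "N2"), ("X1", "X2")]
nodes, sub_edges = first_neighbour_network(toy_edges, seeds={"SEED_A", "SEED_B", "SEED_C"})
```

In this toy case `SEED_C` has no interactions and is filtered out, while `X1` and `X2` are excluded because neither is a seed nor a direct interactor of one.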
The UC-associated signalling network contains three major parts or modules, each over-represented with functions relevant in UC: (1) calcium homeostasis; (2) cell motility and adhesion; (3) stress regulation. Two additional modules were identified, one containing HLA receptors involved in antigen-presentation and one containing other proteins such as MAPKs or HDAC7.
The network visualisation shown in Fig. 2 highlights the weighting of each SNP in the iSNP workflow: if a single nucleotide polymorphism is in a miRNA-TS, enhancer or promoter of a hub protein, which has a high number of neighbours, then it has a larger effect on the network than SNPs affecting other proteins. This is particularly apparent for the two main SNPs that are the driving force behind the constructed network: rs7404095 and rs913678. rs7404095 affects the PRKCB gene through a miRNA-TS, whereas rs913678 affects the PTPN1, CEBPB and SNAI1 genes through a TFBS in an enhancer region.
The UC-associated signalling network uncovered interesting regulatory feedback loops (Fig. 2a). In these loops, TFs (Fig. 2a, listed in Supplementary Data 2A) regulate genes encoding proteins that interact with the same TF at the protein–protein level. The TFs include key stress response regulators, such as MYC, JUN, PPARA, PPARG, CEBPA and HIF1A. A Gene Ontology biological process enrichment test on the whole feedback loops showed enrichment for cell proliferation, wound healing, angiogenesis regulation, stress response and cytokine response (Supplementary Data 2C). Thus, these feedback loops are affected by UC-associated regulatory SNPs and systematically perturb cellular processes critical in UC pathogenesis.
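For readers unfamiliar with over-representation testing, the sketch below shows one common form of a Gene Ontology enrichment test, the one-sided hypergeometric test. The text here does not state which statistic was used, so this choice, the function name and the parameter names are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a one-sided hypergeometric over-representation test.

    population: number of background genes
    annotated:  background genes carrying the GO term
    sample:     genes in the tested set (e.g. a feedback loop)
    hits:       genes in the tested set carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible configurations add nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, with a background of 10 genes of which 4 carry a term, drawing 3 genes and observing at least 2 annotated ones has probability 40/120, i.e. one third. In practice the resulting p-values would also need multiple-testing correction across all tested terms.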
Identification of patient-specific clusters based on the UC-associated network
We then investigated how the UC-associated signalling network was different in each of the 378 UC patients. Based on the set of SNPs present in each patient, we defined patient-specific UC-associated signalling networks, called ‘network footprints’. Unsupervised hierarchical clustering of the 378 patients, using different linkage algorithms, stratified the patient-specific network footprints into four distinct clusters (Fig. 3a). The distribution of patients in the four clusters is presented in Supplementary Table 1.
SNP-affected proteins with many protein interactions, often designated as hub proteins in network biology, drove the clustering of patients. In our analysis, we defined these proteins as ‘cluster driving proteins’ and the SNPs affecting them are identified in Table 1 (bold text). The SNP rs7404095 affecting the PRKCB gene had the largest effect in clustering the patients, as PRKCB has 305 interactor partners in the network. PRKCB has been implicated in the pathogenesis of IBD due to its effects on the colonic mucosa42, colonic microbiota43 and cell junction complexes44,45. This SNP divides the patient cohort into two different clusters (Fig. 3a, b). The secondary divider for clusters is the SNP rs913678, which is in the enhancer region of SNAI1, PTPN1 and CEBPB. SNAI1 is a transcription factor involved in epithelial-mesenchymal transition46. In dextran sulphate sodium (DSS)-induced colitis, it was shown that SNAI1 augmented the effects of MIST1 on the inflammasome protein NLRP3, promoting inflammation47. PTPN1 is a phosphatase that inhibits many receptor tyrosine kinases such as EGFR48 or PDGFR49,50. Inhibiting PTPN1 increases angiogenesis and decreases inflammation51. CEBPB is a transcription factor overexpressed in both DSS- and beta caryophyllene-induced colitis52. Tertiary drivers are the SNP rs477515 affecting TFBSs in the enhancer region of HLA genes, and the SNP rs943072 affecting TFBS in a shared enhancer region of VEGFA, XPO5 and POLH.
We used two additional network resources (Reactome53 and STRING54) to validate the clustering of the patients. Of the 48 SNP-affected proteins, 23 were present in Reactome and 33 in STRING. The UC-associated signalling networks were not similar, due to the complementarity of the three networks used (Supplementary Fig. 2). In Reactome and STRING, the clusters were driven primarily by the various HLA proteins, which were the main hubs in these networks, and the secondary drivers were the VEGFA, XPO5 and POLH proteins (Supplementary Data 4). In the OmniPath network-based clustering, these SNP-affected proteins were the tertiary and quaternary dividers of the patient clusters (Supplementary results). The similarity of the patient clusters was low (adjusted Rand index <0.05; Supplementary Fig. 1) but the Gene Ontology Biological Processes enriched in the networks were similar in all three networks, highlighting various immune functions (Supplementary Fig. 3, Supplementary Data 5).
Looking at the distribution of affected proteins in the patient cohort (Fig. 3c), we identified processes and proteins frequently affected in UC patients as well as more specific processes that were affected only in a smaller group of patients. In particular, we found that 63 proteins, involved in various immune system processes, autophagy and NFKB signalling, were affected in 79.5% of the patients (300 patients) (Supplementary Fig. 4, Supplementary Data 6). In addition, 114 proteins, involved in cellular adhesion, angiogenesis and transmembrane receptor tyrosine kinase activity, were affected in fewer than 170 patients (Supplementary Fig. 4 and Supplementary Data 6).
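A minimal sketch of how patient-specific footprints could be built and clustered is given below. It is illustrative only: the Jaccard distance, the average linkage, the naive merging loop and all identifiers are assumptions, since the text states only that hierarchical clustering with different linkage algorithms was used.

```python
from itertools import combinations

def footprint(patient_snps, snp_to_proteins):
    """Union of proteins (SNP-affected plus first neighbours) reached by a patient's SNPs."""
    proteins = set()
    for snp in patient_snps:
        proteins |= snp_to_proteins.get(snp, set())
    return proteins

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty footprints are treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def average_linkage(footprints, n_clusters):
    """Naive agglomerative clustering; returns a list of sets of patient ids."""
    clusters = [{pid} for pid in sorted(footprints)]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(jaccard_distance(footprints[p], footprints[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Made-up data: snp_hub reaches a hub with several neighbours, snp_small reaches few proteins.
snp_to_proteins = {"snp_hub": {"HUB", "n1", "n2", "n3"}, "snp_small": {"P", "n4"}}
patients = {"p1": {"snp_hub"}, "p2": {"snp_hub", "snp_small"}, "p3": {"snp_small"}, "p4": {"snp_small"}}
footprints = {pid: footprint(snps, snp_to_proteins) for pid, snps in patients.items()}
clusters = average_linkage(footprints, n_clusters=2)
```

In the toy data, the SNP that reaches the hub contributes the most proteins to a footprint, so carriers of it (p1, p2) separate from non-carriers (p3, p4). This loop scales poorly with cohort size; an optimised library routine would be the practical choice for hundreds of patients.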
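The adjusted Rand index used to compare patient clusterings can be computed from the contingency table of two label assignments. The following is a small self-contained sketch of the standard formula; the function name and the toy labels are illustrative, and the labels are not data from this study.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items (needs >= 2 items)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    # Pairs of items placed together in both clusterings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # identical partitions, relabelled
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that never agree on a pair
```

A value of 1 means identical partitions regardless of label names, values near 0 correspond to chance-level agreement, and negative values indicate less agreement than expected by chance.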
Validating the iSNP clusters using an independent cohort
To validate the iSNP methodology, we used the TAMMA resource55, which is the largest available transcriptomic resource in IBD where the origin of the patient biopsy is available. We identified the study GSE109142 (ref. 39), containing 206 juvenile, treatment-naive UC samples from their index colonoscopy (at diagnosis with active disease) and 20 juvenile controls. The data came from the PROTECT study56. We defined whether a gene is differentially expressed in the UC patients compared to controls using fold change as a simple metric and developed three validation approaches (Fig. 4a): (1) Using the SNP-affected genes to determine whether they are differentially expressed in the transcriptomic dataset; (2) Examining differentially expressed genes from the UC-associated signalling network in the transcriptomic dataset; (3) Comparing overlapping Gene Ontology Biological Processes of the SNP-affected proteins with the Gene Ontology Biological Processes of the differentially expressed genes from the transcriptomic dataset.
The first validation approach revealed that the SNP-affected genes were differentially expressed on average in 63.24% of patients (SD = 39.58%) (Fig. 4b). Of the cluster-driving SNP-affected genes, PRKCB and two HLA genes, HLA-DQB1 and HLA-DPA1, were differentially expressed in all patients in the validation cohort, whereas VEGFA and CEBPB were differentially expressed in 97.6% and 93.2% of the patients, respectively. This validation analysis demonstrated that the SNP-affected genes we have functionally annotated (predicted) were also differentially expressed in an independent cohort of UC patients.
The second approach (Fig. 4c) used the gene expression of the cluster-driving proteins and their first neighbours to compare the patient clusters generated from the transcriptomic measurements with the patient clusters generated from the iSNP pipeline. Two clusters were similar between the analyses derived from the transcriptomic and the genomic datasets. The first had all the SNP-affected genes and their first neighbours differentially expressed (the red cluster in Figs. 3b, 4c), and the second contained only a few differentially expressed genes (the purple cluster in Figs. 3b, 4c). These clusters matched clusters 1 and 4 in the iSNP study analysis, respectively. The most differentially expressed genes in the analysis were genes that were first neighbours of more than one cluster-driving protein, or the NFKB1-related first neighbours impacting the clustering of the transcriptome analysis. These results imply that the cluster-driving proteins highlighted by the iSNP workflow are also identified as being important in an independent cohort of UC patients. Moreover, we replicated the patient clustering with an independent cohort, using only transcriptomic data with no genotype (SNP) data, further validating the power of the iSNP approach.
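The per-gene percentages reported above can be pictured with the following sketch, which flags a gene as differentially expressed in a patient when its fold change against the mean control expression passes a cutoff. The cutoff of one log2 unit, the function name and the toy values are assumptions; the text specifies only that fold change was used as a simple metric.

```python
from math import log2
from statistics import mean

def percent_patients_de(patient_values, control_values, threshold=1.0):
    """Percentage of patients whose |log2 fold change| vs mean control expression
    reaches `threshold` (expression values must be positive)."""
    baseline = mean(control_values)
    flags = [abs(log2(value / baseline)) >= threshold for value in patient_values]
    return 100.0 * sum(flags) / len(flags)

# Made-up expression values for one gene: log2 fold changes are 2, 0, -2 and about 0.26.
pct = percent_patients_de(patient_values=[40.0, 10.0, 2.5, 12.0], control_values=[10.0, 10.0])
```

Repeating this for every SNP-affected gene and then taking the mean and standard deviation of the percentages would give summary figures of the kind quoted in the text.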
The third validation approach showed that the biological functions which we have identified using the UC-associated signalling network were also differentially regulated in the independent cohort. We identified the Gene Ontology biological processes that were overlapping between the differentially expressed genes and first neighbours of SNP-affected proteins (Fig. 4d, Supplementary Data 3). These included unspecific functions, such as metabolic process, regulation of signalling or cell motility. The overlapping biological processes which were not differentially expressed were upstream regulatory functions, such as MAPK cascade or response to insulin. Specific over-represented processes from the iSNP network analysis were upstream processes such as interleukin-6 mediated signalling, wound healing, and Notch signalling. The specific processes over-represented based on the differentially expressed genes from the validation cohort were downstream, inflammation-related processes including immune cell activation (e.g. T cell differentiation, neutrophil activation, macrophage activation). We also compared the over-represented gene ontology biological processes in the differentially expressed genes with those biological processes which were over-represented in the first neighbours of the cluster-driving proteins (side stacked bar chart in Fig. 4d). On average, 37.7% of the enriched biological processes were similar between the differentially expressed genes and the first neighbours of the cluster-driving SNPs.
Our validation approaches confirmed that the iSNP analysis identified the known genes involved in active UC. Moreover, with the increased coverage from the first neighbours of the SNP-affected proteins, iSNP enabled the identification of those genes and proteins that are involved in UC pathogenesis that would not have been identified by conventional genetic or transcriptomic analysis alone.
Discussion
We have designed an integrated systems genomics workflow (Fig. 1, Supplementary Fig. 1), termed iSNP, to layer patient data from population-wide genomics with network biology and transcriptomics using UC as a model of a complex genetic disease. Our aim was to resolve the complex genetic background contributing to disease pathogenesis for an individual patient. To achieve this, we first identified so far hidden proteins involved in UC pathogenesis, second we identified key pathogenic pathways for UC and third we determined if patients had similar or different pathological processes in disease development. This was done with a view to providing insights that could advance personalised medicine for patients with UC. This study used functional annotation of non-coding SNPs with the integration of transcriptomics and protein–protein interactions at an individual patient level.
There are significant challenges in designing and executing computational pipelines for functional analysis of genetic data, particularly on an individual patient basis (see Supplementary Discussion for more detailed discussion). To overcome input challenges, we accessed high-quality individual patient genetic information from the UK IBD Genetic Consortium. This comprises preprocessed and quality-controlled immunochip data57, giving individual patient alleles present at SNP sites. This allowed us to functionally annotate UC-associated SNPs on a patient-by-patient basis. A binary approach was used for determining whether a SNP affected the regulation of a gene or protein, allowing us to identify when a SNP weakly affects the binding of a transcription factor (TFBS) or miRNA target site (miRNA-TS), but does not eliminate the site completely, giving a broader overview of SNP functional annotation.
For functional annotation of SNPs within TFBS, we utilised two widely cited, validated tools, Regulatory Sequence Analysis Tools (RSAT)58,59 and Find Individual Motif Occurrences (FIMO)60. We considered the length of the TFBS query sequence to include promoters and enhancers. We acknowledge that not all TFBS in enhancer regions will be active, and that recently artificial intelligence techniques have integrated predictions of chromatin interaction with SNP data to identify SNPs in areas of active chromatin10,11. A switch mechanism to identify which TFBS were active or inactive was not available during the development or expansion of the iSNP workflow, so we adopted a simple approach: if a TFBS in an enhancer site was affected by a SNP with a target gene in the Human Enhancer Disease Database (HEDD), it was retained within the network.
In terms of the miRNA-TS identification algorithm, both miRanda61 and TargetScan62 were trialled for inclusion in the pipeline. Both performed well; however, as TargetScan requires a genome assembly to work, it was not feasible to integrate it into a functional annotation pipeline. Although SNPs may impact other parts of miRNA biogenesis and action, we utilised the site of SNP impact with the largest wealth of experimental data.
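To make the idea of a binary SNP effect on a TFBS concrete, the sketch below scores the reference and alternative flanking sequences against a position weight matrix and flags the SNP when the best match changes. This is not how RSAT or FIMO are invoked and does not reproduce their statistics; the scoring scheme, the threshold, the function names and the toy motif are all assumptions.

```python
from math import log2

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append(
            {base: log2(((column.get(base, 0) + pseudocount) / total) / background) for base in "ACGT"}
        )
    return matrix

def best_score(sequence, matrix):
    """Highest motif score over all windows of the sequence (forward strand only)."""
    width = len(matrix)
    return max(
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def snp_affects_site(ref_seq, alt_seq, matrix, threshold):
    """Binary call: at least one allele has a site above threshold AND the alleles
    score differently, so weakened (not only abolished) sites are flagged."""
    ref, alt = best_score(ref_seq, matrix), best_score(alt_seq, matrix)
    return max(ref, alt) >= threshold and abs(ref - alt) > 1e-9

# Toy 3-bp motif "ACG" built from made-up counts; the SNP changes C to T inside the site.
toy_matrix = log_odds_matrix([{"A": 8}, {"C": 8}, {"G": 8}])
affected = snp_affects_site("TTACGTT", "TTATGTT", toy_matrix, threshold=3.0)
```

A production analysis would also scan the reverse strand and use calibrated p-values rather than a raw score cutoff, which is what dedicated motif scanners provide.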
The UC-associated signalling network identified mechanisms of transcriptional and post-transcriptional regulation impacted by non-coding SNPs. There was more transcriptional regulation of SNP-affected genes than miRNA-based regulation (Fig. 2b) due to the significant number of SNPs annotating within TFBS in enhancer regions (Table 1). Each enhancer influences multiple genes and multiple transcription factors were predicted to bind to any given enhancer, meaning that each SNP had a pleiotropic but individually minor effect on the expression of various genes.
In contrast, the SNPs in miRNA-TSs have a specific effect on their individual target genes. Due to the fine-tuning role of miRNAs, the gain or loss of a miRNA-TS by itself has a small effect on the regulation of a cell63. iSNP mapped both the specific and pleiotropic regulatory changes one step further using a protein–protein interaction network. This has an inherent risk of increased noise within the network, and to reduce this we utilised the sparse OmniPath network, which integrates experimentally validated protein–protein interactions from 44 sources40. From studies of cancer-related signalling networks, we have shown that information regarding pathogenic pathways to disease can be gleaned from the direct protein–protein interactors for a protein of interest33. By integrating the protein–protein interaction and regulatory SNP effects, the iSNP method highlighted key pathogenesis pathways including calcium homeostasis, cell adhesion, stress response and cytokine signalling (Fig. 2a, b). We also compared the results obtained using the OmniPath network with those from two other protein–protein interaction networks, and we found similar functions affected by SNPs. This confirmed that our findings did not depend on the specific network resource we used in the study.
The calcium homeostasis signalling pathway has not been identified previously as a driver of inflammation in UC. Intracellular calcium levels have been described as altered in ulcerative colitis64 and as a mechanism involved in DSS-induced colitis in vitro65. However, closely linked with calcium homeostasis are Vitamin D signalling pathways, which have been hypothesised as a link between aberrant colonic mucosal vitamin D metabolism and the development of IBD66,67. Calcium homeostasis is likely linked to osteopenia and osteoporosis in IBD. Further investigation is required to decide what part of the intracellular or systemic calcium metabolism is affected in UC. There was not enough granularity in the clinical data, or a large enough population size, to determine if the cohorts of patients with affected calcium homeostasis had alterations in their bone mineral density compared to those patients without this pathway involvement, or to remove confounders such as recurrent corticosteroid therapy.
Pathways involved in the regulation and cellular response to stress, including wound healing and stress-related TFs, such as PPARs, were identified via NFKB1. Wound healing is complex and in the intestine involves multiple cell types, including immune cells, macrophages, fibroblasts, endothelial cells, intestinal epithelial cells and stem cells. Intracellularly, these pathways are also complex, but within the UC-associated signalling network, we identified the involvement of proteins integral to inflammasomes and peroxisomes. Specifically, within the UC-associated signalling network, we identified SNAI1, which is a regulator of the NLRP3 inflammasome47. There has been extensive analysis of the NLRP3 inflammasome and its role in IBD in both animal and in vitro studies, but the results are inconsistent, with the NLRP3 inflammasome being deleterious or protective depending on the colitis model used, the gut microbiota, or the means of inducing colitis in animal models68.
Pathways impacting immune cell motility and cellular adhesion in UC form the basis of therapeutic management with vedolizumab (α4β7 integrin inhibitor) and etrolizumab (β7 integrin subunit inhibitor). Neither gene was affected by a SNP within the network, nor in the first neighbours, but cell motility and adhesion pathways feature in a distinct subset of patients, indicating a potential mechanism and explanation by which therapies that impact these pathways may be more or less successful in certain subsets of patients. This needs to be examined more closely and validated in a large clinical cohort, as it may be a means for personalising therapeutic strategies based on patient-specific underlying pathogenic mechanisms in UC.
From the individual patient networks, we undertook unsupervised clustering, which was driven by the highest degree nodes (hubs) using distance metrics within a hierarchical agglomeration method. This allowed us to identify structures within the networks that were hitherto unknown. One limitation of this approach is a potential bias towards promiscuous hubs, which have high numbers of curated interactions within the interactome networks. An example of this is PRKCB. Conversely, these large hub proteins are very important to the network69 as they identify where a SNP has a wider effect on signalling pathways, and from this, we can identify particular pathways unique to clusters of patients which aim to correlate with therapeutic response or disease process. However, no significant differences based on the cohorts (Chi-square tests p > 0.05, one-way ANOVA p > 0.05, Supplementary Table 1) were found. This is not unexpected as it required nearly 30,000 patients for Cleynen and colleagues to identify NOD2, MHC and 3p21 as being associated with the age of disease onset and disease location in IBD70.
Our analysis identified multiple genes whose translated proteins were hubs within the network, including NFKB1, which is a central player in inflammatory signalling cascades, immune-mediated processes and tight junction regulation, but in our network was shown not to be a cluster-driving protein. The HLA proteins were cluster-driving proteins within the network, but did not include the known IBD HLA serotypes71 (HLA-DQB1 with Crohn’s72), with HLA-DQB2 and HLA-DPA1 being associations identified here. Unexpected cluster-driving proteins were identified that have clear links with IBD, such as PRKCB and VEGFA73,74, as well as proteins that have not been previously associated with UC, including Exportin 5 and DNA polymerase eta. The involvement of Exportin 5 (a required protein for canonical miRNA biogenesis75), as well as the multitude of miRNA-TSs identified, adds weight to UC being a disease whose pathogenesis is intrinsically complex, with multiple small impacts on upstream gene regulation as opposed to singular high impact phenotypic mutations.
Whilst we have used UC as a use case study for iSNP, the pipeline is not disease-specific. We have made iSNP accessible and tailorable, accounting for the importance of functional annotation and downstream analysis of non-coding SNP effects for complex genetic diseases. iSNP is a dockerised pipeline that can be interfaced using the command line. Each of the analytical modules of the pipeline can be run independently of each other or run from start to finish. The parameters for each analytical module can be tuned by the user based on the input data. It is available on GitHub at https://github.com/korcsmarosgroup/iSNP.
The integrative SNP Network Platform (iSNP) is a workflow to functionally annotate non-coding SNPs, identify the first neighbour interactions within a disease-specific network and identify signalling pathways in which these SNPs and interactors are over-represented. iSNP has the functionality to allow this to be done on a broad scale to identify disease-associated pathways, and on an individual level to identify patient-specific affected pathways. Using UC as an example of a complex genetic disease, iSNP has identified how patients have differing mechanisms of pathogenesis. We identified pathways regulating the cellular response to stress, cell motility and calcium homeostasis as being over-represented in the UC-associated signalling network. Further work now needs to be done on larger cohorts and with multi-omics datasets to confirm the potential for iSNP to be used for precision therapy based on patient-specific genetics.
Methods
Sources of SNP data
UC-associated index SNPs were identified from the UK IBD Genetics Consortium Immunochip data12 and the Broad Institute Repository76. If no fine mapping was available for an index SNP (the immunochip fine-mapped SNP had an R² < 0.8), then the highest-ranking proxy partners (based on tightest linkage disequilibrium and distance) were identified using a SNP proxy search and were included in the analysis. Each SNP was annotated using Ensembl from the rsID using the genome map GRCh38.p7. Disease-associated SNPs were retrieved from the original data source.
After obtaining ethics approval from the University of East Anglia Faculty of Medicine and Health Science ethics committee (ref 02-01-16), anonymised individual patient immunochip data and clinical parameters for 378 patients were retrieved from the UK IBD Genetics Consortium from seven centres across East Anglia, UK (Cambridge, Norwich, Ipswich, Stevenage, Luton, Bedford and West-Suffolk). Informed consent of the patients was obtained by the IBD Bio-resource team. The patients have consented to the IBD Bio-resource consent form version 2. We included patients between 16 years and 83 years of age at diagnosis to account for the bimodal age prevalence of UC (See Supplementary Table 1 for patient demographics). SNPs were characterised into different types depending on their location in the genome: exonic (missense, synonymous), intronic/untranslated regions and intergenic. Flanking nucleotide sequences were obtained from the downloaded September 2017 version of dbSNP77. For the list of analysed SNPs and their effect, see Supplementary Data 1.
Assessing the effect of SNPs on transcription factor binding sites and miRNA-TS
From the JASPAR database, the binding profiles of 746 human transcription factors, represented as Position Specific Scoring Matrices (PSSMs), were downloaded78. The JASPAR-format PSSMs were converted to the TRANSFAC format to ease handling of results. To assess the effect of each SNP on the gain or loss of putative TF binding sites (TFBS), flanking sequences 50 bases upstream and downstream of the SNPs were extracted. The Regulatory Sequence Analysis Tool (RSAT) matrix-scan58 was used to search for potential TFBS in the ancestral and patient-specific mutant alleles. The background model was estimated using residue probabilities from the GRCh38.p7 sequences of all promoters, defined as the 5 kb upstream of the TSS based on the UCSC genome table browser79, and of all enhancers from the HEDD database80. A Markov order of 1 was used in calculating the background probabilities. Both strands of the sequences were searched. Hits with a P-value ≤1e-05 were considered binding sites. Other parameters were set at default values.
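The allele comparison described above can be illustrated with a minimal Python sketch that scores every motif-width window overlapping the SNP, on both strands, for the ancestral and the mutant sequence. The toy count matrix, the uniform background, the score threshold of 6.0 and all function names are assumptions made for illustration; the published analysis used RSAT matrix-scan with a first-order Markov background and a P-value cut-off, which this sketch does not reproduce.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_matrix(counts, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores.

    A uniform background (0.25 per base) is assumed here for simplicity.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount * 0.25) / total) / 0.25)
            for b in BASES
        })
    return matrix

def best_score_one_strand(sequence, snp_index, matrix):
    """Best score among windows on one strand that cover the SNP position."""
    width = len(matrix)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(sequence) - width)
    best = float("-inf")
    for start in range(first, last + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        best = max(best, score)
    return best

def best_score(sequence, snp_index, matrix):
    """Best overlapping score over the forward and reverse-complement strands."""
    reverse = sequence.translate(COMPLEMENT)[::-1]
    reverse_index = len(sequence) - 1 - snp_index
    return max(
        best_score_one_strand(sequence, snp_index, matrix),
        best_score_one_strand(reverse, reverse_index, matrix),
    )

def classify_site(ancestral_seq, mutant_seq, snp_index, matrix, threshold):
    """Label the SNP effect on one motif as 'gain', 'loss' or 'neutral'."""
    in_ancestral = best_score(ancestral_seq, snp_index, matrix) >= threshold
    in_mutant = best_score(mutant_seq, snp_index, matrix) >= threshold
    if in_ancestral and not in_mutant:
        return "loss"
    if in_mutant and not in_ancestral:
        return "gain"
    return "neutral"

# Toy motif that strongly prefers T-G-A-C (hypothetical, not a JASPAR profile).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
effect = classify_site("AATGACAA", "AATGGCAA", 4, toy_matrix, threshold=6.0)
```

In this toy case the SNP at index 4 disrupts the only high-scoring window, so the site is classified as a loss relative to the ancestral allele. In practice the flanks would be 50 bases on each side, as stated in the text.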
As a complementary TF binding site prediction algorithm, FIMO was used60. FIMO predicts transcription factor target sites using a matrix-based sequence scanning algorithm without a hidden Markov model, unlike the previous tool, RSAT matrix-scan. It calculates log-odds scores comparing random and test sequences, followed by a Benjamini-Hochberg-based false discovery rate correction of the P-values. The false discovery rate cut-off was 0.1.
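For readers unfamiliar with the correction step, the following short Python sketch shows a standard Benjamini-Hochberg adjustment applied to a list of P-values, followed by the 0.1 cut-off given above. It is a generic illustration and does not reproduce FIMO's internal q-value computation; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values stay monotonic in the rank of the raw P-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

FDR_CUTOFF = 0.1  # cut-off stated in the text

example_p = [0.01, 0.04, 0.03, 0.20]  # hypothetical motif-hit P-values
example_q = benjamini_hochberg(example_p)
kept = [i for i, q in enumerate(example_q) if q <= FDR_CUTOFF]
```

With these example values the first three hits pass the cut-off and the fourth does not.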
To increase the coverage of the TF binding sites, enhancer regions were added using the Human Enhancer Disease Database (HEDD)81. HEDD contains the enhancers from ENCODE82, FANTOM5 (refs. 83,84) and the Epigenomics RoadMap85. To assess the effect of the SNPs on miRNA-TSs, the 22 bp sequences of mature miRNAs were retrieved from miRBase86,87. The flanking sequences of SNPs were assessed for the presence of miRNA-TSs using miRanda88. Hits occurring in the seed region (2’–8’) of the miRNAs, with alignment scores ≥90 and an energy threshold ≤ −16 kcal/mol, were considered as TSs. Other parameters were set to default settings. TSs in the coding region or in the first intronic region were kept. A final manual check was performed to ensure that the SNPs overlapped with the predicted TFBS or miRNA target sites. For the miRNA-TS predictions, miRanda was chosen as it predicts and characterises miRNA binding sites using entropy-based binding energy scores instead of traditional conservation-based methods88. Gain or loss of regulatory interactions between TFs and protein-coding genes was also considered where the affected site lay in the promoter or enhancer region of the protein-coding gene. We defined the promoter region as extending from 5 kb upstream of the transcription start site to the first exon of the gene. This information was retrieved using the feature retrieval function of the UCSC genome table browser79. The effect of SNPs on the uncovered TFBS or miRNA-TSs was classified as a gain of a binding site/target site, a loss, or a neutral change. Only those sites identified as a loss or gain relative to the ancestral allele were considered for subsequent analysis. We referred to genes corresponding to such SNPs as ‘SNP-affected genes’.
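The thresholding and gain/loss classification described above can be sketched in Python as follows. The hit records, their field names (`regulator`, `score`, `energy`) and the miRNA names are hypothetical; parsing of miRanda output and the seed-region check are deliberately left out, so this is an illustration of the logic rather than the pipeline's implementation.

```python
SCORE_MIN = 90.0     # alignment score threshold from the text
ENERGY_MAX = -16.0   # binding energy threshold in kcal/mol from the text

def passing_regulators(hits):
    """Names of regulators whose predicted site passes both thresholds."""
    return {
        hit["regulator"]
        for hit in hits
        if hit["score"] >= SCORE_MIN and hit["energy"] <= ENERGY_MAX
    }

def classify_changes(ancestral_hits, mutant_hits):
    """Map each regulator to 'gain', 'loss' or 'neutral' relative to the ancestral allele."""
    ancestral = passing_regulators(ancestral_hits)
    mutant = passing_regulators(mutant_hits)
    changes = {}
    for regulator in ancestral | mutant:
        if regulator in ancestral and regulator in mutant:
            changes[regulator] = "neutral"
        elif regulator in ancestral:
            changes[regulator] = "loss"
        else:
            changes[regulator] = "gain"
    return changes

def non_neutral(changes):
    """Keep only gains and losses, which are carried into later analysis."""
    return {name: effect for name, effect in changes.items() if effect != "neutral"}

# Hypothetical predictions for the two alleles of one SNP flank.
ancestral_hits = [
    {"regulator": "miR-A", "score": 140.0, "energy": -20.1},
    {"regulator": "miR-B", "score": 95.0, "energy": -17.3},
]
mutant_hits = [
    {"regulator": "miR-A", "score": 138.0, "energy": -19.4},
    {"regulator": "miR-B", "score": 95.0, "energy": -12.0},   # fails the energy cut-off
    {"regulator": "miR-C", "score": 120.0, "energy": -18.8},
]
changes = classify_changes(ancestral_hits, mutant_hits)
```

The same comparison applies to TF binding sites if the hit records are taken from the motif scan instead of miRanda.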
Network construction and analysis
Protein–protein interactions of the proteins encoded by SNP-affected genes were obtained from OmniPath on 10 January 2020 (refs. 40,41). For the STRING network, we used stringent parameters, taking only the physical protein–protein interactions: values >0 in the experimental and database channels of the physical links file downloaded on 28 October 2021 (ref. 54). For the Reactome interactions, we used the Homo sapiens mitab interaction file downloaded on 28 October 2021 (ref. 89). All interactions were translated to UniProt Accession numbers90 using the UniProt mapping tool with a Python script. For each patient, the set of proteins encoded by SNP-affected genes and their first interactors (first neighbours) were defined as the UC-associated network footprint of a particular patient. The union of all network footprints, the UC-associated signalling network, was analysed and visualised in Cytoscape 3.3.0 (ref. 91) using the inverted self-organising map layout. We retained only those SNP-affected genes which were present in the OmniPath resource and which formed a giant component with their interactors. Patient-specific networks were constructed using the Cytoscape CyRestClient 0.6 in Python 3.6 (ref. 92).
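The definition of a patient's network footprint, and of the union network, can be made concrete with a small Python sketch. It uses plain sets rather than Cytoscape, omits the giant-component filter, and all identifiers (`P1`, `patient_a` and so on) are hypothetical placeholders rather than real UniProt accessions or patient IDs.

```python
from collections import defaultdict

def build_adjacency(interactions):
    """Undirected adjacency map from an iterable of (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for source, target in interactions:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency

def network_footprint(affected_proteins, adjacency):
    """SNP-affected proteins present in the network plus their first neighbours."""
    present = {protein for protein in affected_proteins if protein in adjacency}
    footprint = set(present)
    for protein in present:
        footprint |= adjacency[protein]
    return footprint

def union_network(footprints):
    """Union of all patient footprints (the disease-associated network nodes)."""
    return set().union(*footprints.values())

# Hypothetical interaction list and per-patient SNP-affected proteins.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
affected = {"patient_a": {"P1"}, "patient_b": {"P4", "P9"}}

adjacency = build_adjacency(interactions)
footprints = {
    patient: network_footprint(proteins, adjacency)
    for patient, proteins in affected.items()
}
all_nodes = union_network(footprints)
```

Here `P9` is dropped because it has no interaction in the toy network, mirroring the restriction to SNP-affected genes present in the interaction resource.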
Module analysis was carried out using the Clustermaker2 1.1.0 Cytoscape app93 implementing the GLay clustering method94, which is an implementation of the Girvan-Newman clustering algorithm95. Briefly, the clustering method deletes the highest-betweenness edges from the network until the network collapses into non-connected components, and these components form the clusters. We used this clustering method because it is computationally fast and gives biologically meaningful clusters (for further discussion see Supplementary Discussion). We call the network clusters ‘modules’ to distinguish them from patient clusters.
Hierarchical clustering and statistical analysis
The scikit-learn (v 0.23.) package was used for hierarchical clustering of the patient-specific networks96. The distance matrix between patients was based on the Hamming distance97. If a protein was directly or indirectly affected by a SNP, it was assigned a value of “1” for a patient. If the protein was not affected, it was scored as “0”. The cluster similarity was measured using the adjusted Rand index from the Python scikit-learn package96.
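To show what these steps compute, the sketch below implements the Hamming distance on binary patient profiles, a naive agglomerative clustering, and the adjusted Rand index in plain Python. The published analysis used scikit-learn; this stand-alone version is only illustrative. Average linkage is an assumption, since the linkage criterion is not stated in the text, and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import comb

def hamming(u, v):
    """Fraction of positions at which two equal-length binary profiles differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering; returns one cluster label per profile."""
    clusters = [[i] for i in range(len(profiles))]

    def distance(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(hamming(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; a < b, so deleting b keeps a valid.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items (needs >= 2 items)."""
    cells, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        cells[(a, b)] = cells.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    sum_cells = sum(comb(n, 2) for n in cells.values())
    sum_rows = sum(comb(n, 2) for n in rows.values())
    sum_cols = sum(comb(n, 2) for n in cols.values())
    expected = sum_rows * sum_cols / comb(len(labels_a), 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

# Rows are patients, columns are proteins (1 = affected, 0 = not affected).
profiles = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
labels = average_linkage(profiles, n_clusters=2)
```

The adjusted Rand index is invariant to how clusters are numbered, which is why it is suitable for comparing clusterings obtained from different network resources.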
Gene Ontology analysis
The Gene Ontology analysis was performed using the GOrilla tool98. The Gene Ontology biological processes were visualised using REVIGO99. For the overrepresentation test, the background was the giant component of the specific network resource (OmniPath, Reactome, or STRING). The tests were corrected for false discovery using the Benjamini-Hochberg method. We considered a Gene Ontology Biological Process term representative for a cluster if it was enriched with a corrected q < 0.05.
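As a worked illustration of an overrepresentation test against a defined background, the following Python sketch computes a one-sided hypergeometric P-value for a single term. It is a generic textbook formulation, not a reproduction of the statistics implemented in the enrichment tool, and the counts are hypothetical. Multiple-testing correction across terms would follow as a separate step.

```python
from math import comb

def enrichment_p_value(background_size, annotated_in_background,
                       cluster_size, annotated_in_cluster):
    """One-sided hypergeometric P-value, P(X >= annotated_in_cluster).

    background_size: proteins in the background (e.g. the giant component)
    annotated_in_background: background proteins carrying the term
    cluster_size: proteins in the tested set
    annotated_in_cluster: tested proteins carrying the term
    """
    upper = min(annotated_in_background, cluster_size)
    total = comb(background_size, cluster_size)
    tail = sum(
        comb(annotated_in_background, k)
        * comb(background_size - annotated_in_background, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 background proteins, 3 annotated, 3 tested, 3 annotated hits.
p_all_hits = enrichment_p_value(10, 3, 3, 3)
```

In this toy case all three tested proteins carry the term, giving a P-value of 1/120.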
Validation cohort analysis
The IBD TaMMA transcriptomics collection datasets were downloaded on 14 June 2021 (ref. 55). After examining the metadata, the study GSE109142 (ref. 39) was used as it had annotated source tissue and an adequate number of patients and controls (206 and 20, respectively). Expression tables were assembled from the gene-specific expression values, retaining those genes with 10 or more read counts, and the samples were normalised using the limma package (version 3.50.1)100, which implements voom101. The log2-normalised counts were used for further analysis. On a patient-by-patient basis, fold change values were calculated by comparison with the average of the control samples. If the absolute fold change was >1, the gene was considered to be differentially expressed in that patient. The resulting binary matrix was used for clustering and visualisation.
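The per-patient binarisation can be sketched in Python as follows. The sketch assumes that the fold change is taken on the log2 scale, that is, as the difference between a patient's log2-normalised value and the mean of the controls; this reading, the data layout and the sample and gene names are all assumptions made for illustration.

```python
def binary_de_matrix(patient_log2, control_log2, cutoff=1.0):
    """Flag genes whose log2 difference from the control mean exceeds the cutoff.

    patient_log2 and control_log2 map sample IDs to {gene: log2 expression}.
    Returns {patient: {gene: 0 or 1}} for genes measured in every control.
    """
    genes = sorted(set.intersection(*(set(sample) for sample in control_log2.values())))
    control_mean = {
        gene: sum(sample[gene] for sample in control_log2.values()) / len(control_log2)
        for gene in genes
    }
    matrix = {}
    for patient, values in patient_log2.items():
        matrix[patient] = {
            gene: int(abs(values[gene] - control_mean[gene]) > cutoff)
            for gene in genes
            if gene in values
        }
    return matrix

# Hypothetical log2-normalised expression values.
controls = {"c1": {"G1": 5.0, "G2": 3.0}, "c2": {"G1": 7.0, "G2": 3.0}}
patients = {"p1": {"G1": 8.5, "G2": 3.5}, "p2": {"G1": 5.5, "G2": 1.0}}
de_matrix = binary_de_matrix(patients, controls)
```

Each row of the resulting matrix is a binary profile for one patient and can be passed to the same distance-based clustering used for the SNP-derived profiles.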
For case one, only the SNP-affected genes in the OmniPath database were used (Table 1). For case two, the UC-associated signalling network was used with the proteins grouped by the hub SNPs. For case three, the Gene Ontology Biological Processes enriched among the differentially expressed genes in GSE109142 were compared with those enriched among the first neighbours of the SNP-affected genes. Differentially expressed genes were defined as those with |FC| > 1 and q < 0.05 (Benjamini-Hochberg corrected moderated t-test) using the standard limma analysis pipeline100.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
J.B.W. was funded by a Wellcome Trust Clinical Training Fellowship. D.M. and A.B. were funded by a European Research Council Starting Grant (336159). P.S. and S.V. were supported by the European Research Council Advanced Grant (ERC-2015-AdG, 694679, CrUCCial). The work of D.M., P.S., M.S.B., L.J.H., S.R.C. and T.K. was supported by the BBSRC Gut Microbes and Health Institute Strategic Programme BB/R012490/1 and its constituent projects BBS/E/F/000PR10353 and BBS/E/F/000PR10355. L.J.H. is also funded by Wellcome Trust Investigator Awards (100974/Z/13/Z and 220876/Z/20/Z). A.W. is funded by the BB/K018256/1 grant. D.M., P.S. and T.K. were also supported by a BBSRC Core Strategic Programme Grant for Genomes to Food Security (BB/CSP1720/1) and its constituent work packages, BBS/E/T/000PR9819 and BBS/E/T/000PR9817. The work of J.B.W., T.K. and S.R.C. was supported by a Norwich Research Park Translational Fund grant (NRP/TF/5.3). O.K. is funded by the National Research, Development and Innovation Fund of Hungary under Grant FK 13426. B.V. is funded by the Clinical Research Fund (KOOR), University Hospitals, Leuven, Belgium. J.P.T. is funded by an Academic Clinical Fellowship supported by the National Institute of Health Research (NIHR) and has been awarded funding through the Health Education England (HEE) Genomics Education Programme. M.M. is supported by the BBSRC Norwich Research Park Biosciences Doctoral Training Partnership (grant numbers BB/M011216/1 and BB/S50743X/1).
Author contributions
J.B.W., S.R.C. and T.K. designed the iSNP workflow, and wrote the manuscript with D.M. J.B.W., D.M., P.S., M.S.B., D.F., B.B. and M.M. developed and automated the workflow. M.P. provided the East Anglian SNP data and metadata. D.M. carried out network analysis and the GSEA. P.S., O.K., A.Z. and D.M. were involved in data analysis and interpretation. J.P.T. and L.J.H. contributed to writing the manuscript and interpreting the biological data. J.B.W., M.P., A.W., M.T., B.V., S.V. and B.M. provided clinical insight and/or clinical data analysis, and all contributed to writing the manuscript. A.B. supervised the work of D.M. and A.Z., and contributed to writing the manuscript. All the authors read and approved the final version of the manuscript.
Peer review
Peer review information
Nature Communications and the authors thank the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The immunochip SNP data were retrieved from the IBD Bio-resource database https://www.ibdbioresource.nihr.ac.uk/. The data are available under restricted access because they are clinical, and therefore sensitive, in nature. Access can be obtained by applying to the IBD Bio-resource through https://www.ibdbioresource.nihr.ac.uk/index.php/resources/applying-for-access-to-the-ibd-bioresource-panel-2/. The output of the pipeline is available in Supplementary Data 7, which contains internal patient IDs, SNP-affected genes, and the affected transcription factors and miRNAs. The transcriptomic data were downloaded from the GEO database under accession GSE109142.
Code availability
The iSNP pipeline is available on the project GitHub page, https://github.com/korcsmarosgroup/iSNP, and is archived on Zenodo (doi: 10.5281/zenodo.6346651).
Competing interests
J.B.W., T.K. and S.R.C. are named inventors on the granted patent PCT/GB2019/053128, INT.class: G16B 5/00. The patent was applied for by the Earlham Institute and Quadram Institute and it covers the iSNP workflow to create disease-specific networks from SNP data. J.B.W. received lecture fees from Falk Pharma and financial support for research from AbbVie. B.V. and S.V. received financial support for research from MSD, Abbvie, Janssen, Takeda and Pfizer; lecture fees from Abbott, Abbvie, Merck Sharp & Dohme, Ferring Pharmaceuticals, Pfizer, Takeda, Galapagos/Gilead and UCB Pharma; consultancy fees from Pfizer, Ferring Pharmaceuticals, Shire Pharmaceuticals Group, Merck Sharp & Dohme, Abbvie, Takeda, Prodigest, Celgene, Galapagos, Gilead, Arena Pharmaceuticals, Genentech/Roche, Abivax and AstraZeneca Pharmaceuticals. D.M. received consultancy fees from HEALX and IOTA Pharmaceuticals. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Johanne Brooks-Warburton, Dezso Modos, Padhmanand Sudhakar.
Contributor Information
Simon R. Carding, Email: Simon.Carding@quadram.ac.uk
Tamas Korcsmaros, Email: Tamas.Korcsmaros@earlham.ac.uk
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-29998-8.
References
- 1. Seyed Tabib NS, et al. Big data in IBD: big progress for clinical practice. Gut. 2020;69:1520–1532. doi: 10.1136/gutjnl-2019-320065.
- 2. Grapov D, Fahrmann J, Wanichthanarak K, Khoomrung S. Rise of deep learning for genomic, proteomic, and metabolomic data integration in precision medicine. OMICS. 2018;22:630–636. doi: 10.1089/omi.2018.0097.
- 3. de Souza HSP, Fiocchi C, Iliopoulos D. The IBD interactome: an integrated view of aetiology, pathogenesis and therapy. Nat. Rev. Gastroenterol. Hepatol. 2017;14:739–749. doi: 10.1038/nrgastro.2017.110.
- 4. de Lange KM, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 2017;49:256–261. doi: 10.1038/ng.3760.
- 5. Huang H, et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature. 2017;547:173–178. doi: 10.1038/nature22969.
- 6. Pai S, Bader GD. Patient similarity networks for precision medicine. J. Mol. Biol. 2018;430:2924–2938. doi: 10.1016/j.jmb.2018.05.037.
- 7. Cheng M, et al. Computational analyses of obesity associated loci generated by genome-wide association studies. PLoS ONE. 2018;13:e0199987. doi: 10.1371/journal.pone.0199987.
- 8. McKay JD, et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat. Genet. 2017;49:1126–1132. doi: 10.1038/ng.3892.
- 9. Zhou L, Zhao F. Prioritization and functional assessment of noncoding variants associated with complex diseases. Genome Med. 2018;10:53. doi: 10.1186/s13073-018-0565-y.
- 10. Xu C, et al. Quantifying functional impact of non-coding variants with multi-task Bayesian neural network. Bioinformatics. 2020;36:1397–1404. doi: 10.1093/bioinformatics/btz767.
- 11. Meng X-H, Xiao H-M, Deng H-W. Combining artificial intelligence: deep learning with Hi-C data to predict the functional effects of non-coding variants. Bioinformatics. 2021;37:1339–1344. doi: 10.1093/bioinformatics/btaa970.
- 12. Jostins L, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491:119–124. doi: 10.1038/nature11582.
- 13. Prager M, Buettner J, Buening C. Genes involved in the regulation of intestinal permeability and their role in ulcerative colitis. J. Dig. Dis. 2015;16:713–722. doi: 10.1111/1751-2980.12296.
- 14. Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 2014;46:310–315. doi: 10.1038/ng.2892.
- 15. Uhlig HH, Muise AM. Clinical genomics in inflammatory bowel disease. Trends Genet. 2017;33:629–641. doi: 10.1016/j.tig.2017.06.008.
- 16. Mirza AH, Kaur S, Brorsson CA, Pociot F. Effects of GWAS-associated genetic variants on lncRNAs within IBD and T1D candidate loci. PLoS ONE. 2014;9:e105723. doi: 10.1371/journal.pone.0105723.
- 17. Peng C, Han S, Zhang H, Li Y. RPITER: a hierarchical deep learning framework for ncRNA-protein interaction prediction. Int. J. Mol. Sci. 2019;20:1070. doi: 10.3390/ijms20051070.
- 18. Pyfrom SC, Luo H, Payton JE. PLAIDOH: a novel method for functional prediction of long non-coding RNAs identifies cancer-specific LncRNA activities. BMC Genomics. 2019;20:137. doi: 10.1186/s12864-019-5497-4.
- 19. Lin J, et al. Pipelines for cross-species and genome-wide prediction of long noncoding RNA binding. Nat. Protoc. 2019;14:795–818. doi: 10.1038/s41596-018-0115-5.
- 20. Shen C, Ding Y, Tang J, Guo F. Multivariate information fusion with fast kernel learning to kernel ridge regression in predicting LncRNA-protein interactions. Front. Genet. 2018;9:716. doi: 10.3389/fgene.2018.00716.
- 21. Wu D, et al. Genome-wide association study combined with biological context can reveal more disease-related SNPs altering microRNA target seed sites. BMC Genomics. 2014;15:669. doi: 10.1186/1471-2164-15-669.
- 22. Cheung R, et al. A multiplexed assay for exon recognition reveals that an unappreciated fraction of rare genetic variants cause large-effect splicing disruptions. Mol. Cell. 2019;73:183–194.e8. doi: 10.1016/j.molcel.2018.10.037.
- 23. Zuallaert J, et al. SpliceRover: interpretable convolutional neural networks for improved splice site prediction. Bioinformatics. 2018;34:4180–4188. doi: 10.1093/bioinformatics/bty497.
- 24. Wen J, Wang J, Zhang Q, Guo D. A heuristic model for computational prediction of human branch point sequence. BMC Bioinformatics. 2017;18:459. doi: 10.1186/s12859-017-1864-9.
- 25. Meher PK, Sahu TK, Rao AR, Wahi SD. A statistical approach for 5’ splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics. 2014;15:362. doi: 10.1186/s12859-014-0362-6.
- 26. Nishizaki SS, et al. Predicting the effects of SNPs on transcription factor binding affinity. Bioinformatics. 2020;36:364–372. doi: 10.1093/bioinformatics/btz612.
- 27. Schwartz AM, et al. Multiple single nucleotide polymorphisms in the first intron of the IL2RA gene affect transcription factor binding and enhancer activity. Gene. 2017;602:50–56. doi: 10.1016/j.gene.2016.11.032.
- 28. Gong Y, et al. Polymorphisms in microRNA target sites influence susceptibility to schizophrenia by altering the binding of miRNAs to their targets. Eur. Neuropsychopharmacol. 2013;23:1182–1189. doi: 10.1016/j.euroneuro.2012.12.002.
- 29. Brest P, et al. A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat. Genet. 2011;43:242–245. doi: 10.1038/ng.762.
- 30. Liu C, et al. MicroRNA-34b inhibits pancreatic cancer metastasis through repressing Smad3. Curr. Mol. Med. 2013;13:467–478. doi: 10.2174/1566524011313040001.
- 31. Maurano MT, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794.
- 32. Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038.
- 33. Módos D, et al. Neighbours of cancer-related proteins have key influence on pathogenesis and could increase the drug target space for anticancer therapies. npj Syst. Biol. Appl. 2017;3:2. doi: 10.1038/s41540-017-0003-6.
- 34. Ali S, et al. Understanding genetic heterogeneity in type 2 diabetes by delineating physiological phenotypes: SIRT1 and its gene network in impaired insulin secretion. Rev. Diabet. Stud. 2016;13:17–34. doi: 10.1900/RDS.2016.13.17.
- 35. Donn R, De Leonibus C, Meyer S, Stevens A. Network analysis and juvenile idiopathic arthritis (JIA): a new horizon for the understanding of disease pathogenesis and therapeutic target identification. Pediatr. Rheumatol. Online J. 2016;14:40. doi: 10.1186/s12969-016-0078-4.
- 36. Gazouli M, et al. Differential genetic and functional background in inflammatory bowel disease phenotypes of a Greek population: a systems bioinformatics approach. Gut Pathog. 2019;11:31. doi: 10.1186/s13099-019-0312-y.
- 37. Peters LA, et al. A functional genomics predictive network model identifies regulators of inflammatory bowel disease. Nat. Genet. 2017;49:1437–1449. doi: 10.1038/ng.3947.
- 38. Thomas JP, Modos D, Korcsmaros T, Brooks-Warburton J. Network biology approaches to achieve precision medicine in inflammatory bowel disease. Front. Genet. 2021;12:760501. doi: 10.3389/fgene.2021.760501.
- 39. Haberman Y, et al. Ulcerative colitis mucosal transcriptomes reveal mitochondriopathy and personalized mechanisms underlying disease severity and treatment response. Nat. Commun. 2019;10:38. doi: 10.1038/s41467-018-07841-3.
- 40. Türei D, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Mol. Syst. Biol. 2021;17:e9923. doi: 10.15252/msb.20209923.
- 41. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods. 2016;13:966–967. doi: 10.1038/nmeth.4077.
- 42. Larivée P, et al. Platelet-activating factor induces airway mucin release via activation of protein kinase C: evidence for translocation of protein kinase C to membranes. Am. J. Respir. Cell Mol. Biol. 1994;11:199–205. doi: 10.1165/ajrcmb.11.2.8049080.
- 43. Maloy KJ, Powrie F. Intestinal homeostasis and its breakdown in inflammatory bowel disease. Nature. 2011;474:298–306. doi: 10.1038/nature10208.
- 44. Koizumi J, et al. Protein kinase C enhances tight junction barrier function of human nasal epithelial cells in primary culture by transcriptional regulation. Mol. Pharmacol. 2008;74:432–442. doi: 10.1124/mol.107.043711.
- 45. Weiler F, Marbe T, Scheppach W, Schauber J. Influence of protein kinase C on transcription of the tight junction elements ZO-1 and occludin. J. Cell. Physiol. 2005;204:83–86. doi: 10.1002/jcp.20268.
- 46. Carver EA, Jiang R, Lan Y, Oram KF, Gridley T. The mouse snail gene encodes a key regulator of the epithelial-mesenchymal transition. Mol. Cell. Biol. 2001;21:8184–8188. doi: 10.1128/MCB.21.23.8184-8188.2001.
- 47. Wang T, et al. Mist1 promoted inflammation in colitis model via K+-ATPase NLRP3 inflammasome by SNAI1. Pathol. Res. Pract. 2021;224:153511. doi: 10.1016/j.prp.2021.153511.
- 48. Flint AJ, Tiganis T, Barford D, Tonks NK. Development of “substrate-trapping” mutants to identify physiological substrates of protein tyrosine phosphatases. Proc. Natl Acad. Sci. USA. 1997;94:1680–1685. doi: 10.1073/pnas.94.5.1680.
- 49. Haj FG, Markova B, Klaman LD, Bohmer FD, Neel BG. Regulation of receptor tyrosine kinase signaling by protein tyrosine phosphatase-1B. J. Biol. Chem. 2003;278:739–744. doi: 10.1074/jbc.M210194200.
- 50. Sangwan V, et al. Regulation of the Met receptor-tyrosine kinase by the protein-tyrosine phosphatase 1B and T-cell phosphatase. J. Biol. Chem. 2008;283:34374–34383. doi: 10.1074/jbc.M805916200.
- 51. Figueiredo A, Leal EC, Carvalho E. Protein tyrosine phosphatase 1B inhibition as a potential therapeutic target for chronic wounds in diabetes. Pharmacol. Res. 2020;159:104977. doi: 10.1016/j.phrs.2020.104977.
- 52. Cho JY, et al. β-Caryophyllene attenuates dextran sulfate sodium-induced colitis in mice via modulation of gene expression associated mainly with colon inflammation. Toxicol. Rep. 2015;2:1039–1045. doi: 10.1016/j.toxrep.2015.07.018.
- 53. Jassal B, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48:D498–D503. doi: 10.1093/nar/gkz1031.
- 54. Szklarczyk D, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49:D605–D612. doi: 10.1093/nar/gkaa1074.
- 55. Massimino L, et al. Inflammatory bowel disease transcriptome and metatranscriptome meta-analysis (IBD TaMMA) framework. Res. Sq. 2021. doi: 10.21203/rs.3.rs-478844/v1.
- 56. Hyams JS, et al. Factors associated with early outcomes following standardised therapy in children with ulcerative colitis (PROTECT): a multicentre inception cohort study. Lancet Gastroenterol. Hepatol. 2017;2:855–868. doi: 10.1016/S2468-1253(17)30252-2.
- 57. Márquez A, et al. Meta-analysis of Immunochip data of four autoimmune diseases reveals novel single-disease and cross-phenotype associations. Genome Med. 2018;10:97. doi: 10.1186/s13073-018-0604-8.
- 58. Turatsinze J-V, Thomas-Chollier M, Defrance M, van Helden J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 2008;3:1578–1588. doi: 10.1038/nprot.2008.97.
- 59. Medina-Rivera A, et al. RSAT 2015: regulatory sequence analysis tools. Nucleic Acids Res. 2015;43:W50–W56. doi: 10.1093/nar/gkv362.
- 60. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
- 61. Betel D, Wilson M, Gabow A, Marks DS, Sander C. The microRNA.org resource: targets and expression. Nucleic Acids Res. 2008;36:D149–D153. doi: 10.1093/nar/gkm995.
- 62.Agarwal, V., Bell, G. W., Nam, J.-W. & Bartel, D. P. Predicting effective microRNA target sites in mammalian mRNAs. eLife4, 10.7554/eLife.05005 (2015). [DOI] [PMC free article] [PubMed]
- 63.Sevignani C, Calin GA, Siracusa LD, Croce CM. Mammalian microRNAs: a small world for fine-tuning gene expression. Mamm. Genome. 2006;17:189–202. doi: 10.1007/s00335-005-0066-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Schmidt C, Kosché E, Baumeister B, Vetter H. Arachidonic acid metabolism and intracellular calcium concentration in inflammatory bowel disease. Eur. J. Gastroenterol. Hepatol. 1995;7:865–869. [PubMed] [Google Scholar]
- 65.Samak G, et al. Calcium/Ask1/MKK7/JNK2/c-Src signalling cascade mediates disruption of intestinal epithelial tight junctions by dextran sulfate sodium. Biochem. J. 2015;465:503–515. doi: 10.1042/BJ20140450. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Kellermann L, et al. Mucosal vitamin D signaling in inflammatory bowel disease. Autoimmun. Rev. 2020;19:102672. doi: 10.1016/j.autrev.2020.102672. [DOI] [PubMed] [Google Scholar]
- 67.Cross HS, Nittke T, Kallay E. Colonic vitamin D metabolism: implications for the pathogenesis of inflammatory bowel disease and colorectal cancer. Mol. Cell. Endocrinol. 2011;347:70–79. doi: 10.1016/j.mce.2011.07.022. [DOI] [PubMed] [Google Scholar]
- 68.Zhen Y, Zhang H. NLRP3 inflammasome and inflammatory bowel disease. Front. Immunol. 2019;10:276. doi: 10.3389/fimmu.2019.00276. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Hu, G., Wu, Z., Uversky, V. N. & Kurgan, L. Functional analysis of human hub proteins and their interactors involved in the intrinsic disorder-enriched interactions. Int. J. Mol. Sci. 18, 2761 (2017). 10.3390/ijms18122761. [DOI] [PMC free article] [PubMed]
- 70. Cleynen I, et al. Inherited determinants of Crohn’s disease and ulcerative colitis phenotypes: a genetic association study. Lancet. 2016;387:156–167. doi: 10.1016/S0140-6736(15)00465-1.
- 71. Stokkers PC, Reitsma PH, Tytgat GN, van Deventer SJ. HLA-DR and -DQ phenotypes in inflammatory bowel disease: a meta-analysis. Gut. 1999;45:395–401. doi: 10.1136/gut.45.3.395.
- 72. Cariappa A, et al. Analysis of MHC class II DP, DQ and DR alleles in Crohn’s disease. Gut. 1998;43:210–215. doi: 10.1136/gut.43.2.210.
- 73. Scaldaferri F, et al. VEGF-A links angiogenesis and inflammation in inflammatory bowel disease pathogenesis. Gastroenterology. 2009;136:585–95.e5. doi: 10.1053/j.gastro.2008.09.064.
- 74. Stürzl M, Kunz M, Krug SM, Naschberger E. Angiocrine regulation of epithelial barrier integrity in inflammatory bowel disease. Front. Med. (Lausanne) 2021;8:643607. doi: 10.3389/fmed.2021.643607.
- 75. Kim Y-K, Kim B, Kim VN. Re-evaluation of the roles of DROSHA, Exportin 5, and DICER in microRNA biogenesis. Proc. Natl Acad. Sci. USA. 2016;113:E1881–E1889. doi: 10.1073/pnas.1602532113.
- 76. Farh KK-H, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518:337–343. doi: 10.1038/nature13835.
- 77. Sherry ST, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. doi: 10.1093/nar/29.1.308.
- 78. Mathelier A, et al. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2016;44:D110–D115. doi: 10.1093/nar/gkv1176.
- 79. Navarro Gonzalez J, et al. The UCSC Genome Browser database: 2021 update. Nucleic Acids Res. 2021;49:D1046–D1057. doi: 10.1093/nar/gkaa1070.
- 80. Qi Y, et al. HEDD: the human epigenetic drug database. Database (Oxford). 2016;2016:baw159.
- 81. Wang Z, et al. HEDD: human enhancer disease database. Nucleic Acids Res. 2018;46:D113–D120. doi: 10.1093/nar/gkx988.
- 82. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 83. Lizio M, et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 2015;16:22. doi: 10.1186/s13059-014-0560-6.
- 84. Abugessaisa I, et al. FANTOM enters 20th year: expansion of transcriptomic atlases and functional annotation of non-coding RNAs. Nucleic Acids Res. 2021;49:D892–D898. doi: 10.1093/nar/gkaa1054.
- 85. Roadmap Epigenomics Consortium, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
- 86. Schultz MD, et al. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015;523:212–216. doi: 10.1038/nature14465.
- 87. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–D157. doi: 10.1093/nar/gkq1027.
- 88. Enright AJ, et al. MicroRNA targets in Drosophila. Genome Biol. 2003;5:R1. doi: 10.1186/gb-2003-5-1-r1.
- 89. Fabregat A, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2018;46:D649–D655. doi: 10.1093/nar/gkx1132.
- 90. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–D515. doi: 10.1093/nar/gky1049.
- 91. Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
- 92. Ono K, Muetze T, Kolishovski G, Shannon P, Demchak B. CyREST: turbocharging Cytoscape access for external tools via a RESTful API. [version 1; peer review: 2 approved] F1000Res. 2015;4:478. doi: 10.12688/f1000research.6767.1.
- 93. Morris JH, et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics. 2011;12:436. doi: 10.1186/1471-2105-12-436.
- 94. Su G, Kuchinsky A, Morris JH, States DJ, Meng F. GLay: community structure analysis of biological networks. Bioinformatics. 2010;26:3135–3137. doi: 10.1093/bioinformatics/btq596.
- 95. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA. 2002;99:7821–7826. doi: 10.1073/pnas.122653799.
- 96. Pedregosa F, et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
- 97. Hamming RW. Error detecting and error correcting codes. Bell Syst. Tech. J. 1950;29:147–160. doi: 10.1002/j.1538-7305.1950.tb00463.x.
- 98. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48. doi: 10.1186/1471-2105-10-48.
- 99. Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE. 2011;6:e21800. doi: 10.1371/journal.pone.0021800.
- 100. Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. doi: 10.1093/nar/gkv007.
- 101. Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15:R29. doi: 10.1186/gb-2014-15-2-r29.
Data Availability Statement
The Immunochip SNP data were retrieved from the IBD BioResource database (https://www.ibdbioresource.nihr.ac.uk/). The data are available under restricted access because of their clinical, and therefore sensitive, nature. Access can be obtained by applying to the IBD BioResource through https://www.ibdbioresource.nihr.ac.uk/index.php/resources/applying-for-access-to-the-ibd-bioresource-panel-2/. The output of the pipeline is available in Supplementary Data 7, which contains internal patient IDs, SNP-affected genes, and the associated transcription factors and miRNAs. The transcriptomic data were downloaded from the GEO database under accession GSE109142.
The iSNP pipeline is available on the project GitHub page (https://github.com/korcsmarosgroup/iSNP) and is archived on Zenodo (doi: 10.5281/zenodo.6346651).