Human Molecular Genetics. 2016 Sep 23;25(23):5254–5264. doi: 10.1093/hmg/ddw325

Finding lost genes in GWAS via integrative-omics analysis reveals novel sub-networks associated with preterm birth

Douglas Brubaker 1, Yu Liu 1, Junye Wang 2, Huiqing Tan 2, Ge Zhang 3,4, Bo Jacobsson 5, Louis Muglia 4, Sam Mesiano 2, Mark R Chance 1,*
PMCID: PMC6078636  PMID: 27664809

Abstract

Maternal genome influences associate with up to 40% of spontaneous preterm births (PTB). Multiple genome wide association studies (GWAS) have been completed to identify genetic variants associated with PTB. Disappointingly, no highly significant SNPs have replicated in independent cohorts so far. We developed an approach combining protein-protein interaction (PPI) network data with tissue specific gene expression data to “find” SNPs of modest significance to identify candidate genes of functional importance that would otherwise be overlooked. This approach is based on the assumption that “high-ranking” SNPs falling short of genome wide significance may nevertheless indicate genes that have substantial biological value in understanding PTB. We mapped highly-ranked candidate SNPs from a meta-analysis of PTB-GWAS to coding genes and developed a PPI network enriched with PTB-SNP carrying genes. This network was scored with gene expression data from term and preterm myometrium to identify subnetworks of PTB-SNP associated genes coordinately expressed with labour onset in myometrial tissue. Our analysis consistently identified significant sub-networks associated with the interacting transcription factors MEF2C and TWIST1, genes not previously associated with PTB, both of which regulate processes clearly relevant to birth timing. Other genes in the significant sub-networks were also associated with inflammatory pathways, as well as muscle function and ion channels. Gene expression level dysregulation was confirmed for eight of these networks by qRT-PCR in an independent set of term and pre-term subjects. Our method identifies novel genes dysregulated in PTB and provides a generalized framework to identify GWAS SNPs that would otherwise be overlooked.

Introduction

Spontaneous preterm birth (PTB) is a complex disorder that accounts for the majority of neonatal mortality worldwide (1). Multiple biological and environmental factors are believed to converge upon common signalling pathways that result in either spontaneous premature rupture of foetal membranes or spontaneous preterm uterine contractions or both (2). Though environmental factors are significant, maternal genome influences are associated with up to 40% of all preterm births (2–6). Many genome wide association studies (GWAS) have attempted to identify single nucleotide polymorphisms (SNPs) associated with PTB (2,7–9). Though some SNPs achieve genome wide statistical significance within study cohorts, no studies have identified any SNP that replicates in an independent cohort subsequent to the usual multiple hypothesis testing corrections (2,7,9).

Although GWAS studies have identified many genes of functional interest across many diseases, these studies, as in the case of PTB, often identify only a few significant genes and thus provide minimal functionally relevant data to drive the development of biomarkers or novel targets. The conventional understanding of this phenomenon is that “common” variants, which are targeted in most GWAS studies that include thousands of subjects, are associated with modest functional effects and that increased cohort sizes, higher resolution mapping of the genome (e.g. sequencing), or both are needed to better understand the genetic basis of complex disease phenotypes having substantial heritable components (2,10,11). (In the case of PTB, these limitations are compounded by the difficulties in defining the phenotype precisely.) Although this view has considerable merit, we suggest that the cellular interactions of common genetic variants, particularly in the context of tissue-specific expression effects, are important to driving phenotype in patient sub-sets. In this scenario, top-scoring SNPs associated with disease phenotypes in GWAS are enriched for functions relevant to the phenotype, providing key leads to reveal the basic biology of the disease.

Network-based approaches are an effective platform for integrating diverse molecular data types for integrative omics analysis (12–15). Protein–protein interaction (PPI) networks are particularly powerful as an analysis framework since PPIs directly reflect functional actions of genes and may contain potentially uncharacterized functional relationships (15). As examples, PPI network frameworks have been successfully used to identify candidate genes and sub-networks associated with diseases from breast cancer to Alzheimer’s disease (16,17). SNPs associated with complex disorders (especially non-coding SNPs) may perturb gene expression preferentially in affected tissues (18–21), and in the context of PTB we suggest that these SNP-expression perturbations may be identified through an exploration of subnetwork modules of a PPI network. In total, our work suggests a novel method to “rescue” modestly significant SNPs in GWAS studies through an examination of tissue specific dysregulation in a PPI network specific context. In this case the approach identifies novel candidate genes, in particular MEF2C and TWIST1, as important targets for future investigation in the control of birth timing.

Results

Tissue agnostic PTB-SNP enriched PPI network

Candidate SNPs were obtained from a meta-analysis of three independent preterm birth GWAS constituting a cohort of 3,485 mother-child pairs (11). We selected 250 SNPs with P-values between 10⁻⁷ and 10⁻⁴ that are associated with 236 genes within 20kb. A gene ontology (GO) analysis of these candidate genes for the enrichment of biological processes and cellular component functions revealed significant (Bonferroni P < 0.05) enrichments of: single-organism cellular process, single-organism process, cell proliferation, cell part, and cell (Supplementary Material, Table S1) (22,23). As the PTB-SNP gene set identified only general biological pathways, we expanded this list to include interacting encoded proteins in a protein interaction network.

The 236 PTB-SNP candidate genes were mapped onto a high confidence STRING (Search Tool for Retrieval of Interacting Genes/Proteins) network. Of the 236 candidate genes, 56 encoded either microRNAs or did not have any associated protein in the full STRING database and 66 were excluded due to not being part of the filtered set of 10,174 proteins with interaction confidence greater than 0.5. The remaining 114 candidate genes were mapped to STRING and were then linked using a Steiner Tree algorithm by adding 91 topologically related Steiner Nodes where 327 edges were required to minimally connect the 205 total genes (Fig. 1). This connected network contrasts with the sparse disconnected network of the original 236 candidate genes where only 9 interactions among 16 genes were present (Supplementary Material, Fig. S1). Further, a GO analysis of the 205 PTB-SNP genes and topologically related genes showed significant enrichment (Bonferroni P < 0.05) of 629 biological and cellular processes (Supplementary Material, Table S2) (22,23). This list of enriched biological and cellular processes is too large to be interrogated efficiently and thus required further refinement through the integration of functional mRNA expression data.

Figure 1.

PPI network of SNP carrying seed genes and interacting partners recruited by the Steiner Tree algorithm. Green nodes indicate PTB-SNP genes mapped onto the STRING PPI Network (114 genes) and red nodes indicate proteins added by the Steiner Tree algorithm (91 genes).

PTB-SNP-enriched subnetworks in term and preterm labour

To identify dysregulated modules within this PTB network at the level of tissue specific gene expression, we used SASSy, a Subnetwork Analysis and Scoring System (15), which aggregates gene expression across selected modules and evaluates dysregulation using mutual information (MI). Using a transcriptome dataset from either term labour tissues (5 term non-labour, TNL, vs. 5 term in-labour, TIL, samples) (24) or preterm labour tissues (6 preterm non-labour, PTNL, vs. 6 preterm in-labour, PTIL, samples), we evaluated all possible combinations of 2–5 genes within the network of Fig. 1 for dysregulation with respect to the labouring phenotype (25,26). Subnetworks that associated with the phenotypes of term and preterm labour were considered significant if the MI of the subnetworks passed both a phenotype and gene set permutation test with P < 0.05 (full details in Methods, “Subnetwork analysis and scoring algorithm”).

This analysis using SASSy identified 22 significant subnetworks of 2-5 genes associated with term labour, 15 of which included myocyte enhancer factor-2C (MEF2C). MEF2C is a transcription factor known to suppress inflammatory pathways in endothelial cells (27) and is downregulated with the onset of labour in myometrial transcriptome datasets consistent with the established theme that increases in inflammatory signalling promote labour (24–26). MEF2C is not within 20kb of a PTB-SNP, but was coordinately downregulated with several PTB-SNP carrying genes with known biological functions associated with muscle function, ion channel, prostaglandin, and inflammation in labour (Fig. 2) (28–30). These co-regulated genes are not direct transcriptional targets of MEF2C. Instead, MEF2C acts to modulate these downstream genes through the intermediate nodes, EP300, HDAC9, and HDAC5, all of which are direct transcriptional targets of MEF2C (28,29).

Figure 2.

Term transcriptome MEF2C networks. MEF2C networks are shown grouped by associated function of coordinately regulated genes. Green nodes carry PTB SNPs and red nodes were recruited by the Steiner tree algorithm. The subnetwork functional categories are (A) Ion Channel Subnetworks, (B) Muscle Function Subnetworks and (C) KLF/Inflammation Subnetworks.

Sub-networks including MEF2C and the genes CACNB2, DPP6, KCNAB1, and KCNJ9 were associated with ion channel function (Fig. 2A) while HDAC9 and CACNB2 also scored highly as an ion channel associated subnetwork (28,30) (Fig. 2A). Muscle function associated networks included those containing MEF2C and the genes FHL1 (known to be important in skeletal muscle), GRP (smooth muscle), LGALS2 (cardiac muscle), and MYBPC1 (skeletal muscle) (28,30) (Fig. 2B). FHL1 is also associated with ion channel binding (28,30). The inflammation and prostaglandin associated MEF2C sub-networks included those containing KLF12, RPS6KA5, and PLA2G4C (28,30) (Fig. 2C). Within the inflammatory process associated networks the genes KLF12 and RPS6KA5 also scored highly as a separate subnetwork in addition to being co-regulated with MEF2C (Fig. 2C). All of these subnetworks are functionally associated with myometrial contractions in labour and together contain 12 PTB associated SNPs. These sub-networks may be important to regulating term labour and their functional expression may be altered in the presence of the associated PTB-SNPs.

SASSy also identified 38 significant subnetworks associated with preterm labour. Two subnetworks contained MEF2C and four contained a repressor of MEF2C, TWIST1 (Twist related protein 1) (Fig. 3). TWIST1 acts to modulate downstream genes in the subnetworks through its direct transcriptional targets EP300 and RELA (28,29). TWIST1 carries a preterm birth associated SNP (P = 2.74 × 10⁻⁵), is upregulated in term labour, and is part of a negative feedback loop for the cytokines TNF-α and IL-1β in the NF-κB signalling pathway (31). In birth timing, this up-regulation could be important to suppressing MEF2C to enhance the effect of inflammatory signalling and to promote labour and emptying of the uterus. However, in the preterm labour cohorts, TWIST1 is downregulated along with the genes DLG1, PAX6, COG4, PLAT, NR2C2, and CDC42 (Fig. 3A). These genes play diverse roles in ion channel binding, transcriptional regulation, and stress response (28,30). Both preterm labour MEF2C associated subnetworks contained a PTB-SNP carrying gene, the calcium independent phospholipase PLA2G4C (P = 1.85 × 10⁻⁵). PLA2G4C has been previously associated with preterm birth as a prostaglandin synthesis disruptor that acts independent of pro-labour oxytocin signalling (32). The preterm labour networks also contained the muscle function gene LGALS2 and transcription factor RUNX1, both of which carry PTB-SNPs (Fig. 3B) (28,30).

Figure 3.

Preterm transcriptome TWIST1 and MEF2C networks. Six networks were found to be coordinately regulated in preterm myometrium associated with function in term myometrium. Green nodes carry PTB-SNPs and red nodes were recruited by the Steiner Tree algorithm in the (A) TWIST1 networks and (B) MEF2C networks.

Confirmation of transcriptional activity in a MEF2C module enriched with eight PTB-SNP genes

We performed qRT-PCR to evaluate the expression of significant genes in SASSy subnetworks in two independent cohorts of term (5 TNL, 5 TIL) and preterm (5 PTNL, 3 PTIL) myometrium samples. Our aim was to confirm the dysregulated individual gene expression and aggregated subnetwork activity identified by SASSy in the original transcriptome datasets (Fig. 4). Since gene expression data from whole genome approaches differ from qRT-PCR data in the source and magnitude of errors and noise, different normalization approaches are required for the data (33). For the qRT-PCR sample cohort, we assessed individual gene differential expression and network differential expression using a Wilcoxon Mann-Whitney test (P < 0.10) on relative individual gene expression and on aggregated network activity defined by a network activity norm (NAN) (for full details on defining the NAN, see Methods, “Network expression confirmation by qRT-PCR”).
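The rank-based comparison described above can be illustrated with a minimal sketch of an exact two-sided Wilcoxon Mann-Whitney test. The function names and the expression values are hypothetical and not taken from the study, and the text does not state whether an exact or an approximate version of the test was used. With 5 versus 5 samples, complete separation of the groups gives the smallest attainable two-sided value, 2/252 ≈ 0.008.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def exact_rank_sum_p(group_a, group_b):
    """Two-sided exact P-value: share of all relabellings whose rank sum is
    at least as far from its expectation as the observed rank sum."""
    pooled = list(group_a) + list(group_b)
    n_a, n = len(group_a), len(pooled)
    ranks = midranks(pooled)
    expected = n_a * (n + 1) / 2
    observed = abs(sum(ranks[:n_a]) - expected)
    extreme = total = 0
    for idx in combinations(range(n), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed:
            extreme += 1
    return extreme / total

# Made-up relative expression values (not data from the study).
non_labour = [1.00, 1.20, 0.90, 1.10, 1.30]
in_labour = [0.20, 0.30, 0.25, 0.40, 0.35]
p_value = exact_rank_sum_p(in_labour, non_labour)
print(round(p_value, 3))  # 0.008
```

Enumerating every relabelling is only practical for small cohorts such as these; larger samples would normally rely on a normal approximation.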

Figure 4.

Heatmaps of significant subnetwork genes. Heatmaps showing the expression patterns of genes identified by SASSy in term (left) and preterm labour (right) gene expression data.

After two attempts, the primers for PAX6, DLG1, KCNJ9, MYBPC1, and GRP failed to generate identical melt curves. These genes were excluded from further analysis. Tables 1 and 2 show the predicted gene expression fold changes (IL/NL) and qRT-PCR fold changes of genes in the term and preterm labour subnetworks respectively. Four individual genes, MEF2C (P∼0.03), HDAC9 (P∼0.095), KLF12 (P∼0.007), and CACNB2 (P∼0.055), were differentially expressed in term labour networks and none were differentially expressed in preterm labour subnetworks (Tables 1 and 2).

Table 1.

Term labour subnetwork individual gene differential expression. qRT-PCR results for the term myometrium subnetworks. Fold changes are shown for each gene in the original transcriptome study and qRT-PCR cohort. All fold changes are calculated as labouring expression relative to non-labouring expression. The significance of the qRT-PCR cohort fold change is assessed by the Wilcoxon Mann Whitney test with P < 0.1 considered significant.

Gene | Chan et al. Fold Change | qRT-PCR Fold Change | P-value
CACNB2 | 0.446 | 0.467 | 0.056
DPP6 | 0.355 | 0.586 | 0.15
FHL1 | 0.358 | 0.601 | 0.22
HDAC9 | 0.254 | 0.585 | 0.095
KCNAB1 | 0.385 | 0.376 | 0.22
KLF12 | 0.289 | 0.538 | 0.008
LGALS2 | 1.43 | 0.687 | 0.15
MEF2C | 0.528 | 0.623 | 0.032
PLA2G4C | 0.524 | 0.752 | 0.31
RPS6KA5 | 0.316 | 0.432 | 0.15

Table 2.

Preterm labour subnetwork individual gene differential expression. qRT-PCR results for the preterm myometrium subnetworks. Fold changes are shown for each gene in the original transcriptome study and qRT-PCR cohort. All fold changes are calculated as labouring expression relative to non-labouring expression. The significance of the qRT-PCR cohort fold change is assessed by the Wilcoxon Mann Whitney test with P < 0.1 considered significant.

Gene | Weiner et al. Fold Change | Bethin et al. Fold Change | qRT-PCR Fold Change | P-value
CDC42 | 1.15 | 0.832 | 0.849 | 0.39
COG4 | 0.951 | 0.949 | 0.845 | 0.57
LGALS2 | 1.41 | 2.11 | 1.20 | 0.39
MEF2C | 1.13 | 0.508 | 0.945 | 0.79
NR2C2 | 0.887 | 0.977 | 0.846 | 0.57
PLA2G4C | 0.944 | 0.851 | 1.30 | 0.57
PLAT | 1.047 | 1.49 | 0.604 | 0.14
PYGO1 | 1.017 | 0.953 | 1.01 | 0.79
RUNX1 | 1.24 | 1.54 | 1.43 | 0.79
TWIST1 | 0.967 | 1.20 | 0.697 | 0.14

Analysis of sub-network level differential expression revealed that eight of the term myometrium networks were confirmed as differentially expressed, with P-values ranging from 0.008 to 0.056 (Table 3). Four confirmed sub-networks were associated with ion channel function (MEF2C-DPP6, MEF2C-CACNB2, MEF2C-KCNAB1, HDAC9-CACNB2), one was associated with muscle function (MEF2C-LGALS2), and three were associated with inflammation (MEF2C-RPS6KA5, KLF12-RPS6KA5, MEF2C-KLF12). The confirmed sub-networks were merged into one combined sub-network and coloured by differential expression in term labour (Fig. 5). PLA2G4C is included despite not being coordinately expressed with MEF2C, as it is one of two subnetwork genes that replicate in term and preterm sub-networks and it has a prior association with PTB (32).

Table 3.

Assessment of network differential expression. Term and preterm myometrium network significance as assessed by aggregating subnetwork activity for each patient with the NAN and tested with the Wilcoxon Mann–Whitney test (P < 0.10 significant).

Network | Phenotype | P-value
MEF2C-CACNB2 | Term | 0.056
MEF2C-DPP6 | Term | 0.016
MEF2C-KCNAB1 | Term | 0.031
MEF2C-FHL1 | Term | 0.22
HDAC9-CACNB2 | Term | 0.056
MEF2C-LGALS2 | Term | 0.032
MEF2C-PLA2G4C | Term | 0.22
MEF2C-KLF12 | Term | 0.008
KLF12-RPS6KA5 | Term | 0.008
MEF2C-RPS6KA5 | Term | 0.031
TWIST1-DLG1-PYGO1 | Preterm | 1.00
TWIST1-PAX6-DLG1 | Preterm | 0.14
TWIST1-COG4-PLAT | Preterm | 0.57
TWIST1-NR2C2-CDC42 | Preterm | 0.25
MEF2C-RUNX1-PLA2G4C | Preterm | 0.79
MEF2C-PLA2G4C-LGALS2 | Preterm | 0.79

Figure 5.

Confirmed MEF2C module. Summary of the confirmed coordinate expression of MEF2C and downstream labour associated genes merged into one network. (A) Predicted regulation (red-upregulated, blue-downregulated) of MEF2C associated genes based on SASSy results. (B) Measured regulation of MEF2C associated genes based on qRT-PCR. (C) PTB-SNP status of genes in the MEF2C module. Green nodes carry a PTB-SNP while red nodes were recruited by the Steiner Tree algorithm.

Discussion

We performed an integrative PPI network analysis that included preterm birth GWAS and myometrium tissue transcriptome data to identify key sub-networks of genes and proteins potentially regulating the onset of term and preterm labour. This analysis of GWAS data within a PPI network revealed novel functional connections between coding genes close to SNPs that have suggestive P-values in multiple GWAS studies but do not pass the threshold of replicable genome wide significance. The subnetworks identified in the preterm and term myometrium include the transcription factors TWIST1 and MEF2C, which are functionally related. TWIST1 is a direct repressor of MEF2C via EP300 (28,30). The observed up-regulation of TWIST1 in term labour is consistent with the observed de-repression of MEF2C and is consistent with the enhanced inflammatory signalling associated with birth timing. The downstream effects of TWIST1 and MEF2C also include modulation of ion channel function, muscle cell functions, and prostaglandin synthesis (28,30), all of which are involved with the transition of the myometrium from the quiescent to labouring state. TWIST1 and MEF2C have not previously been associated with birth timing and this analysis suggests that a subset of preterm birth associated SNPs may act through TWIST1 and MEF2C associated mechanisms that are manifested in premature uterine contractions. For the eight MEF2C associated subnetworks we confirmed coordinate downregulation with the onset of term labour (Fig. 5). Though we do not know how this combined module functions in preterm labour, the enrichment of this term labour-associated subnetwork including eight modestly significant PTB-SNP carrying genes suggests that dysregulation of this module has the potential to contribute to preterm labour triggers.

We assessed whether the PTB-SNPs in the MEF2C modules were known expression quantitative trait loci (eQTL) by searching for the SNPs in the Genotype-Tissue Expression (GTEx) project website (34). While none of the PTB-SNPs were found to be eQTLs, we also assessed whether they were in linkage disequilibrium (LD) with any other known eQTLs. We searched for pairwise LD between the SNPs in the MEF2C module and all known eQTLs for the module genes using the SNP Annotation and Proxy Search (SNAP) tool (35). Though none of the MEF2C module SNPs were in LD with known eQTLs, GTEx does not currently have eQTL data for either smooth muscle or myometrium tissues (34). Further studies are required to assess whether the SNPs in the MEF2C module constitute novel predicted eQTLs.

The term myometrium subnetworks of MEF2C-LGALS2 and MEF2C-PLA2G4C are the only ones to also be identified by SASSy as dysregulated in the preterm myometrium. While these genes were all downregulated with the onset of term labour, in preterm labour both LGALS2 and PLA2G4C were found to be upregulated by qRT-PCR. Additionally, the preterm MEF2C, LGALS2, and PLA2G4C 3-gene subnetwork suggests stronger co-regulation of these genes in the preterm myometrium than the term myometrium where they formed two distinct subnetworks. This suggests that this module of MEF2C, PLA2G4C, and LGALS2 may be worthy of further investigation as a differentially regulated sub-network in term and preterm labour.

PLA2G4C is a calcium-independent phospholipase that has been shown to regulate prostaglandin synthesis independent of other labour signals such as oxytocin (32). Disruption of PLA2G4C function both from internal SNPs and the upstream disruption of MEF2C by TWIST1 SNPs may produce a synergistic combined effect to prime the myometrium for preterm labour. Though PLA2G4C acts independent of intracellular calcium signalling pathways, our results suggest that SNPs on upstream regulators of PLA2G4C could influence calcium and potassium signalling pathways in parallel to PLA2G4C to prematurely initiate the myometrium’s contractile phenotype.

As we focused on identifying dysregulated sub-networks as opposed to single genes, we introduced a metric for quantifying sub-network activity for significance testing called the network activity norm (NAN). Since our study focused on sub-networks of genes where the expected differential expression pattern was in the same direction for all genes, the NAN is an appropriate metric for capturing the overall activation or deactivation of a subnetwork between phenotypes. Sub-networks where some genes increase and others decrease expression between phenotypes would not be appropriate candidates for evaluation via the NAN. One can easily see that a two-gene subnetwork where one gene activates and the other deactivates between phenotypes could produce the same NAN in each phenotype despite a drastic change in network component activity. A possible solution to this problem would be to subdivide networks into smaller units where the genes have the same differential expression pattern.
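The cancellation problem described above can be made concrete with a small sketch. It assumes, purely for illustration, that the activity measure is the Euclidean norm of a sample's subnetwork expression vector; the function name `activity_norm` and the expression values are hypothetical and are not the authors' implementation of the NAN.

```python
from math import sqrt

def activity_norm(expression):
    """Euclidean norm of one sample's subnetwork expression vector (assumed form)."""
    return sqrt(sum(x * x for x in expression))

# Hypothetical relative expression of a two-gene subnetwork (gene 1, gene 2).
opposed_nl, opposed_il = (2.0, 0.5), (0.5, 2.0)    # genes move in opposite directions
coherent_nl, coherent_il = (2.0, 2.0), (0.5, 0.5)  # genes move in the same direction

opposed_change = activity_norm(opposed_il) - activity_norm(opposed_nl)
coherent_change = activity_norm(coherent_il) - activity_norm(coherent_nl)
print(opposed_change, coherent_change)
```

Under this assumed form, the opposed pair yields no change in the norm although both genes changed fourfold, whereas the coherent pair yields a clear drop.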

There are several possible improvements and extensions to our study. A larger cohort of patients or the recruitment of additional cohorts to a preterm birth GWAS meta-analysis would increase the power of the analysis and enable identification of additional candidate genes for network construction and scoring. In addition, use of other network frameworks with more interactions would increase mapping of genes to PPIs, and an integration of miRNA associated SNPs with their potential downstream targets would enrich the network framework. One could also be stricter or more lenient with the 20kb threshold of associating candidate SNPs with genes or consider trans SNP effects (eSNPs) as well as cis-based associations. Another opportunity includes the potential for direct proteomics measurements to extend the exploration of functional dysregulation to the level of protein expression. The challenges of detecting low abundance proteins and the modest coverage of proteomics data necessitated using transcriptome data to score the subnetworks in this study. A guiding principle of our approach was that though mRNA expression is not always well correlated with protein activity, PPI subnetworks with dysregulated mRNA activity have very high correlations with protein level dysregulation for the same subnetworks (15,36,37), motivating our approach to confirming the subnetworks at the mRNA level in this initial study. Based on this proposed correlation, attempts to confirm protein level dysregulation are likely to be fruitful.

Further, though we focused on the myometrium and preterm labour as defined by premature uterine contractions, premature rupture of foetal membranes accounts for a significant portion of preterm birth (4). The advantage of our approach is that the PPI network we constructed, which is enriched with PTB-SNP-carrying candidate genes, is tissue agnostic and can be reused for studies in other tissues. For example, the PTB-SNP enriched PPI network could be scored with other maternal gene expression data (e.g. decidua, cervix, etc.) to identify other tissue specific PPI sub-networks associated with preterm birth. However, the available transcriptome data for the non-labouring and labouring foetal membranes in the Gene Expression Omnibus (38) are sparse, which makes conducting such an analysis impossible at this time. This network is thus a valuable resource for the preterm birth community and is available as part of this manuscript’s Supplementary Material.

The overall hypothesis of our approach is that a complex disorder like preterm birth is not the product of a single large genetic disturbance but the product of several perturbations acting through common functional pathways and networks. Our identification of a set of overlapping MEF2C-TWIST1 subnetworks and a set of eight MEF2C regulatory sub-networks (Fig. 5) implicates these transcription factors as key drivers of parturition. Further studies are required to assess the therapeutic potential of targeting this pathway, in particular TWIST1, LGALS2, PLA2G4C, and MEF2C, to prevent preterm labour.

Furthermore, our approach demonstrates that genes in the GWAS analysis with a modest significance of associated SNPs may in fact be highly enriched for relevant candidate genes that may explain a complex disorder. We have presented a generalizable approach for “rescuing” genes that traditional GWAS analysis would overlook and shown how the integration of these genes with PPI network and functional genomics data leads to novel associations and hypotheses to explain complex disorders.

Materials and Methods

Multiple high dimensional PTB datasets were analysed in a protein-protein interaction network framework to identify genetically driven subnetworks associated with the preterm myometrial contraction phenotype (Fig. 6). Candidate genes (within 20kb of PTB-SNPs) were used to seed a PPI network enriched with these genes and their interacting partners. The search space was thus constrained to a set of high confidence protein interactions in the network neighbourhood of the PTB SNPs. This tissue agnostic network was then scored with transcriptome data from term and preterm myometrium to identify subnetworks associated with the onset of the term and preterm labour phenotypes. Subnetworks were then confirmed with qRT-PCR and compared against dbPTB to search for known associations with PTB.

Figure 6.

Workflow diagram. Candidate preterm birth associated SNP carrying genes are mapped onto a protein–protein interaction network and connected along the shortest paths between them using a Steiner tree algorithm. The resulting network is enriched for the preterm birth associated candidates and includes topologically associated interacting proteins. Transcriptome data are used to score the enriched PPI network and identify groups of genes whose expression discriminated between in-labour and non-labouring samples in datasets of term and preterm myometrium samples.

Datasets

A list of 250 SNPs (P-values between 10⁻⁷ and 10⁻⁴, 236 genes within 20kb) was obtained from a meta-analysis of three GWAS of preterm birth (11) (Supplementary Material, Table S3). The meta-analysis analysed cohorts of women from Denmark (8), Norway (7), and Finland (9). Though the SNPs were not statistically significant at the genome wide level, they served as a list of “genomic seed genes” that had the highest statistical significance in the meta-analysis.
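The seed-gene selection can be sketched as a P-value filter followed by a positional lookup. This is a minimal illustration rather than the pipeline used in the study: the record types, the helper names `select_snps` and `map_snps_to_genes`, and all coordinates are hypothetical; only the P-value window and the 20kb distance come from the text.

```python
from typing import NamedTuple

class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    p_value: float

class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int

def select_snps(snps, p_min=1e-7, p_max=1e-4):
    """Keep SNPs whose meta-analysis P-value lies within [p_min, p_max]."""
    return [s for s in snps if p_min <= s.p_value <= p_max]

def map_snps_to_genes(snps, genes, window=20_000):
    """Return {gene symbol: [rsids]} for SNPs inside a gene or within `window` bp of it."""
    hits = {}
    for snp in snps:
        for gene in genes:
            same_chrom = snp.chrom == gene.chrom
            if same_chrom and gene.start - window <= snp.pos <= gene.end + window:
                hits.setdefault(gene.symbol, []).append(snp.rsid)
    return hits

# Made-up records purely for illustration.
snps = [
    Snp("rsA", "1", 105_000, 3e-5),  # 5 kb beyond the end of GENE1
    Snp("rsB", "1", 500_000, 2e-6),  # passes the filter but is far from any gene
    Snp("rsC", "2", 50_000, 0.01),   # inside GENE2 but fails the P-value window
]
genes = [Gene("GENE1", "1", 80_000, 100_000), Gene("GENE2", "2", 40_000, 60_000)]

seed_genes = map_snps_to_genes(select_snps(snps), genes)
print(seed_genes)  # {'GENE1': ['rsA']}
```

The nested loop is adequate for a few hundred SNPs; a genome-scale lookup would normally use an interval index instead.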

Data from three transcriptome studies of the human pregnancy myometrium, one RNA-seq (24) and two microarray datasets (25,26), were obtained to assemble a cohort of five term non-labouring (TNL), five term in-labour (TIL), six preterm non-labouring (PTNL), and six preterm in-labour (PTIL) samples. We selected probes common to both microarray datasets and converted expression values to z-scores. The two cohorts were then merged into one gene expression matrix for subnetwork scoring.
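A minimal sketch of the standardization and merging step follows. It assumes that z-scores are computed per gene within each dataset before the datasets are concatenated on their shared genes; the text does not specify the exact order or axis of standardization, and the function names and intensity values here are hypothetical.

```python
from statistics import mean, stdev

def zscore_rows(matrix):
    """Z-score each gene across the samples of one dataset: (x - mean) / SD."""
    scaled = {}
    for gene, values in matrix.items():
        centre, spread = mean(values), stdev(values)
        # A gene with no variation carries no signal; map it to zeros.
        scaled[gene] = [(v - centre) / spread if spread else 0.0 for v in values]
    return scaled

def merge_common(first, second):
    """Concatenate samples for the genes present in both datasets."""
    return {g: first[g] + second[g] for g in sorted(set(first) & set(second))}

# Made-up intensities on different scales for two hypothetical datasets.
dataset_a = {"MEF2C": [8.0, 9.0, 10.0], "TWIST1": [5.0, 5.5, 6.0], "ONLY_A": [1.0, 2.0, 3.0]}
dataset_b = {"MEF2C": [100.0, 120.0], "TWIST1": [30.0, 50.0], "ONLY_B": [1.0, 2.0]}

merged = merge_common(zscore_rows(dataset_a), zscore_rows(dataset_b))
print(sorted(merged), len(merged["MEF2C"]))
```

Standardizing within each dataset puts measurements from different platforms on a comparable scale before they share one expression matrix.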

Expanded network construction

A PPI network neighbourhood was constructed from the seed genes using the STRING Protein-Protein Interaction Network Database (28). STRING curates thousands of protein interactions and assesses the confidence of each based on the type and amount of evidence for the interactions. We obtained the full database and filtered for medium confidence edges (edge weight > 0.5) to obtain a network of 51,256 protein-protein interactions between 10,174 proteins where proteins were represented by their coding genes. Seed genes from the GWAS meta-analysis were mapped onto the STRING PPI network and connected along shortest paths using a Steiner Tree algorithm (39) such that a minimum number of recruited genes, Steiner Nodes, were added to the network. The resulting network contains both seed genes carrying PTB associated SNPs and potentially important interacting partners not in our original list. The network was visualized using Cytoscape (40).
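The construction can be sketched with a simple shortest-path heuristic for the Steiner tree problem. This is a stand-in for illustration and not necessarily the algorithm of reference (39): it grows a tree from one seed and repeatedly attaches the nearest unconnected seed. The node names, edge confidences and function names are hypothetical.

```python
from collections import deque

def build_graph(edges, min_conf=0.5):
    """Adjacency sets from (a, b, confidence) triples, keeping confidence > min_conf."""
    adj = {}
    for a, b, conf in edges:
        if conf > min_conf:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def steiner_heuristic(adj, seeds):
    """Grow a tree from the first mappable seed, repeatedly attaching the nearest
    unconnected seed along a shortest path (multi-source BFS from the tree)."""
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        return set(), set()
    tree_nodes, tree_edges = {seeds[0]}, set()
    remaining = set(seeds[1:])
    while remaining:
        parent = {n: None for n in tree_nodes}
        queue = deque(sorted(tree_nodes))
        found = None
        while queue and found is None:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v not in parent:
                    parent[v] = u
                    if v in remaining:
                        found = v
                        break
                    queue.append(v)
        if found is None:
            break  # the remaining seeds cannot be reached from the tree
        node = found
        while parent[node] is not None:  # walk back until an existing tree node
            tree_edges.add(frozenset((node, parent[node])))
            tree_nodes.add(node)
            node = parent[node]
        remaining -= tree_nodes
    return tree_nodes, tree_edges

# Hypothetical interactions; the A-C edge falls below the confidence cut-off.
edges = [("A", "X", 0.9), ("X", "B", 0.8), ("B", "C", 0.7),
         ("A", "C", 0.3), ("C", "Y", 0.9), ("Y", "D", 0.6)]
seeds = ["A", "B", "D"]
nodes, links = steiner_heuristic(build_graph(edges), seeds)
steiner_nodes = nodes - set(seeds)
print(sorted(steiner_nodes), len(links))
```

In this toy graph the seeds A, B and D are joined by recruiting X, C and Y, which play the role of the Steiner Nodes described above.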

Subnetwork analysis and scoring algorithm

We mined the PTB-SNP enriched PPI network for subnetworks of genes coordinately differentially expressed in labour using the Subnetwork Analysis and Scoring System (SASSy) algorithm (13–15). SASSy first aggregated the activity of a subnetwork using transcriptome data from term or preterm myometrium and then computed the mutual information (MI) between subnetwork activity and phenotype. Here, MI serves as a measure of the dependence between subnetwork activity and phenotype (labouring vs. non-labouring myometrium). Subnetworks with high MI are likely to be important drivers of phenotypic change from a relaxed to the contractile uterus since the higher the MI the better the subnetwork distinguishes between phenotypes. We searched for such subnetworks in term labour (TNL vs. TIL) and preterm labour (PTNL vs. PTIL). SASSy used the term myometrium transcriptome data to assess MI between subnetwork activity and the phenotypes of TNL and TIL. This was then repeated for the preterm myometrium transcriptome data and the phenotypes of PTNL and PTIL. The resulting subnetworks were then compared to assess functional connections and differences between term and preterm labour.

Coordinate differential expression of a subnetwork was assessed by aggregating the gene expression (inferred from mRNA abundance) in the subnetwork to compute a sub-network activity, and comparing the expression across phenotypes. Sub-network activity was defined as the aggregated gene expression (i.e., mRNA abundance) of the subnetwork of genes for a given sample. SASSy searched the PTB-SNP enriched network for all possible combinations of 2-5 genes and computed the sub-network activity of that group of genes (13,14). The restriction to small subnetworks of coordinately differentially expressed genes was purely for computational reasons and in principle one could search for larger subnetworks with sufficient computing power.
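The scoring idea can be sketched as follows. The sketch assumes that sub-network activity is the mean expression of the member genes and that activity is split at its median before computing mutual information; these choices, the function names and the expression values are illustrative assumptions, not the SASSy implementation.

```python
from collections import Counter
from math import log2
from statistics import mean, median

def subnetwork_activity(expr, genes):
    """Per-sample activity: mean expression of the member genes (assumed aggregation)."""
    n_samples = len(expr[genes[0]])
    return [mean(expr[g][i] for g in genes) for i in range(n_samples)]

def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def score(expr, genes, labels):
    """MI between median-split sub-network activity and phenotype labels."""
    activity = subnetwork_activity(expr, genes)
    cut = median(activity)
    return mutual_information([a > cut for a in activity], labels)

# Made-up z-scores for three non-labouring (NL) and three in-labour (IL) samples.
labels = ["NL"] * 3 + ["IL"] * 3
expr = {
    "G1": [1.0, 1.2, 0.9, -1.0, -1.1, -0.8],
    "G2": [0.8, 1.1, 1.0, -0.9, -1.2, -1.0],
    "G3": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
coordinated = score(expr, ["G1", "G2"], labels)
uninformative = score(expr, ["G3"], labels)
print(coordinated, uninformative)
```

The coordinately downregulated pair separates the phenotypes perfectly and reaches the maximum of 1 bit for two balanced groups, while the alternating gene scores close to zero.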

Two permutation tests were performed to assess the significance of the mutual information score for a subnetwork and to test the null hypothesis that a subnetwork did not associate with a particular phenotype. The first test permuted the sample labels 100,000 times between phenotypes to randomize the patient groups (labouring and non-labouring) while preserving the expression correlations between genes for a given sample. The second test permuted the gene labels 1,000,000 times while preserving the patients in their respective phenotype groups. In each test, a null distribution for mutual information was estimated for each subnetwork size (2–5 genes), and the cumulative distribution function (CDF) was computed for the distribution. The significance of the mutual information for a real network was then determined by evaluating the CDF at that value of mutual information (15). For example, a value of 95% from the CDF indicates that there is a 5% chance (P = 0.05) of observing a higher mutual information value under the null hypothesis. Subnetworks were considered significant only if they passed both permutation tests with P < 0.05.
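The two permutation tests can be sketched as follows. For brevity the example scores a subnetwork by the absolute difference in mean activity between groups, which is only a stand-in for the mutual information score, and it uses far fewer permutations than described above. The helper names and data are hypothetical.

```python
import random
import statistics

def empirical_p(observed, null_scores):
    """Share of null scores at least as large as the observed score."""
    return sum(s >= observed for s in null_scores) / len(null_scores)

def mean_gap(activity, labels):
    """Stand-in score: absolute difference in mean activity between groups."""
    g0 = [a for a, lab in zip(activity, labels) if lab == 0]
    g1 = [a for a, lab in zip(activity, labels) if lab == 1]
    return abs(statistics.mean(g1) - statistics.mean(g0))

def mean_activity(expr, genes):
    """Per-sample mean expression over the chosen genes."""
    n_samples = len(next(iter(expr.values())))
    return [statistics.mean(expr[g][i] for g in genes) for i in range(n_samples)]

def sample_label_test(activity, labels, score_fn, n_perm=1000, seed=0):
    """Null distribution from shuffling phenotype labels across samples."""
    rng = random.Random(seed)
    observed = score_fn(activity, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(score_fn(activity, shuffled))
    return empirical_p(observed, null)

def gene_label_test(expr, genes, labels, activity_fn, score_fn, n_perm=1000, seed=0):
    """Null distribution from random gene sets of the same size."""
    rng = random.Random(seed)
    observed = score_fn(activity_fn(expr, genes), labels)
    pool = sorted(expr)
    null = []
    for _ in range(n_perm):
        random_genes = rng.sample(pool, len(genes))
        null.append(score_fn(activity_fn(expr, random_genes), labels))
    return empirical_p(observed, null)

# Hypothetical activities for five non-labouring (0) and five labouring (1) samples.
activity = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90, 1.00, 0.95, 0.92, 0.98]
labels = [0] * 5 + [1] * 5
p_samples = sample_label_test(activity, labels, mean_gap, n_perm=2000)
```

A subnetwork would be retained only if both `sample_label_test` and `gene_label_test` returned values below 0.05, mirroring the two-test criterion described above.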

SASSy sometimes identified coordinately regulated genes that were not direct neighbours in the Steiner network used for scoring (i.e., the PTB-SNP enriched PPI network). When this occurred, the coordinately regulated candidate genes were connected along all possible shortest paths in the PTB-SNP enriched PPI network.
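Connecting non-adjacent genes along all shortest paths can be sketched with a breadth-first search that records every predecessor lying on a shortest path. This is an illustrative implementation with hypothetical function names and a toy network, not the code used in the study.

```python
from collections import deque

def all_shortest_paths(adj, source, target):
    """Return every shortest path from source to target in an unweighted graph."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break  # all predecessors of target have already been processed
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                preds[nb] = [node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:
                preds[nb].append(node)  # another equally short route to nb
    if target not in dist:
        return []

    def unwind(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in unwind(p)]

    return unwind(target)

def connect_genes(adj, genes):
    """Union of the nodes on all shortest paths between every pair of genes."""
    nodes = set(genes)
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            for path in all_shortest_paths(adj, a, b):
                nodes.update(path)
    return nodes

# Hypothetical network: A and B are joined by two 2-step routes and one 3-step route.
adj = {"A": ["X", "Y", "Z"], "X": ["A", "B"], "Y": ["A", "B"],
       "Z": ["A", "W"], "W": ["Z", "B"], "B": ["X", "Y", "W"]}
linked = connect_genes(adj, ["A", "B"])
```

Here both two-step routes are kept, while the longer route through Z and W is excluded.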

Network expression confirmation by qRT-PCR

Total RNA was extracted from uterine tissue obtained from the lower uterine segment at the time of caesarean delivery. Samples were collected at term (≥ 37 weeks) before (n = 5) and after (n = 5), and preterm (≤ 37 weeks) before (n = 5) and after (n = 3), the onset of active labour, defined by forceful and rhythmic uterine contractions and ≥ 4 cm cervical dilation. Tissue was collected with patient consent (IRB approval # 11-04-06) at MacDonald Women’s, University Hospitals of Cleveland. Total RNA was isolated as previously described (41). Genomic DNA was degraded by DNase treatment (Applied Biosystems). RNA was ethanol precipitated, resuspended in water, and quantified by absorbance at 260 nm. For quantitative RT-PCR, total RNA (400 ng) was reverse transcribed with random primers using Superscript II reverse transcriptase (Life Technologies). Primers for specific target mRNAs were designed using the Primer Express software (Applied Biosystems) based on published sequences (Supplementary Material, Table S4). Assays were optimized and validated for all primer sets by confirming that single amplicons of appropriate size and sequence were generated and that the priming and amplification efficiencies of all primer pairs were identical. PCR was performed in the presence of SYBR Green (Applied Biosystems) in an ABI PRISM 7500 Sequence Detector (Applied Biosystems). The cycling conditions were 50°C for 2 min, 95°C for 10 min, and 40 cycles of 95°C for 15 s and 60°C for 1 min. The cycle at which the fluorescence reached a preset threshold (cycle threshold = CT) was used for quantitative analyses. The threshold in each assay was set at a level where the rate of exponential increase in amplicon abundance was approximately parallel between all samples.
Messenger RNA abundance data were expressed relative to the abundance of the constitutively expressed glyceraldehyde 3-phosphate dehydrogenase (GAPDH) using the ΔCT method (i.e., relative mRNA abundance = $2^{-(C_T^{\text{gene of interest}} - C_T^{\text{18S rRNA}})}$).
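The ΔCT calculation amounts to a single expression. The sketch below assumes perfect doubling of amplicon per cycle, in line with the equal amplification efficiencies noted above; the function name and CT values are hypothetical.

```python
def relative_abundance(ct_target, ct_reference):
    """Delta-CT method: 2 ** -(CT of target - CT of reference transcript)."""
    return 2.0 ** -(ct_target - ct_reference)

# Hypothetical CT values: the target crosses the threshold 5 cycles after the reference.
abundance = relative_abundance(ct_target=25.0, ct_reference=20.0)  # 2 ** -5
```

Because each additional cycle corresponds to a halving, a target that crosses the threshold five cycles after the reference has 1/32 of its abundance.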

The significance of the qRT-PCR results was assessed at both the individual gene and network levels. Although qRT-PCR is more targeted and more sensitive for detecting the expression of an individual gene, it is vulnerable to amplifying errors and noise exponentially in a way that whole genome approaches (RNA-seq or microarray) are not (33). SASSy was designed for, and has only been applied to, scoring networks using data from whole genome approaches (13–15) and does not account for the potential sources of error particular to qRT-PCR measurements. Therefore, a statistical approach resistant to potential outliers and appropriate for smaller cohort sizes was required to assess significance. Individual gene differential expression was assessed using the Wilcoxon-Mann-Whitney test on the relative mRNA abundance values between the labouring and non-labouring samples.
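For groups as small as those used here, the Wilcoxon-Mann-Whitney test can be computed exactly by enumerating every relabelling of the pooled samples. The sketch below is one such exact two-sided version; whether the original analysis used an exact or an approximate p-value is not stated, so this is an assumption, and the function names and values are hypothetical.

```python
from itertools import combinations

def u_statistic(x, y):
    """Number of (x_i, y_j) pairs with x_i > y_j, counting ties as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)

def exact_mann_whitney(x, y):
    """Exact two-sided p-value from all ways of splitting the pooled samples."""
    pooled = list(x) + list(y)
    centre = len(x) * len(y) / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical relative abundances: five non-labouring vs. three labouring samples.
non_labouring = [0.11, 0.09, 0.14, 0.10, 0.12]
labouring = [0.31, 0.27, 0.35]
p_value = exact_mann_whitney(non_labouring, labouring)
```

With group sizes of five and three there are only 56 possible splits, so the smallest attainable two-sided p-value is 2/56 (about 0.036); with five and five it is 2/252 (about 0.008).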

To assess subnetwork differential expression for the qRT-PCR data, we introduced a metric for quantifying subnetwork activity for significance testing called the network activity norm. The activity of the subnetwork for each sample was aggregated into a network activity norm (NAN), defined as the Euclidean norm of the relative mRNA abundance values for each gene in the network. Let $n$ be a vector of abundance values for a $k$-gene subnetwork. The NAN of $n$ is defined as $\lVert n \rVert = \sqrt{g_1^2 + g_2^2 + \cdots + g_k^2}$, where $g_i$ is the expression of gene $i$ in subnetwork $n$. By squaring the expression values of each gene, values that are between 0 and 1, we can reduce the contribution of amplification errors to the ultimate value of $\lVert n \rVert$.

A Wilcoxon-Mann-Whitney test was performed between labouring and non-labouring samples using the NANs for each subnetwork identified by SASSy. Since not all primers succeeded in measuring gene expression in the independent cohort, we calculated the NANs with whichever genes were successfully measured by qRT-PCR. In both the single gene and network cases, P < 0.1 was considered confirmatory for the qRT-PCR.
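A minimal sketch of the NAN calculation follows, with failed primers represented as `None` and skipped, in line with computing the norm from whichever genes were measured. The function name, the `None` convention and the abundance values are hypothetical.

```python
import math

def network_activity_norm(abundances):
    """Euclidean norm of the measured relative abundances for one sample."""
    measured = [g for g in abundances if g is not None]
    return math.sqrt(sum(g * g for g in measured))

# Hypothetical relative abundances for a three-gene subnetwork in two samples;
# the second gene's primer failed in the second sample.
sample_1 = [0.3, 0.1, 0.4]
sample_2 = [0.3, None, 0.4]
nans = [network_activity_norm(s) for s in (sample_1, sample_2)]
```

The per-sample NANs for the labouring and non-labouring groups would then be the two inputs to the rank test described above.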

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

This manuscript benefitted greatly from comments by Mehmet Koyuturk and Scott Williams. We would also like to thank Olivia Corradin, Alethea Barbaro and Jill Barnholtz-Sloan for their invaluable discussions.

Conflict of Interest statement. None declared.

Funding

This work was supported by NIH Grant T32HL007567, the Clinical and Translational Science Collaborative of Cleveland, UL1TR000439 from the National Center for Advancing Translational Sciences (NCATS) component of the National Institutes of Health and NIH roadmap for Medical Research, NIH R01LM01124, the March of Dimes Prematurity Research Center Ohio Collaborative, The Global Alliance to Prevent Prematurity and Stillbirth and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (HD069819).

References

  • 1. Ananth C.V., Vintzileos A.M. (2006) Epidemiology of preterm birth and its clinical subtypes. J. Matern. Fetal Neonatal Med., 19, 773–782.
  • 2. Zhang H., Baldwin D.A., Bukowski R.K., Parry S., Xu Y., Song C., Andrews W.W., Saade G.R., Esplin M.S., Sadovsky Y., et al. (2015) A genome-wide association study of early spontaneous preterm delivery. Genet. Epidemiol., 39, 217–226.
  • 3. Chaudhari B.P., Plunkett J., Ratajczak C.K., Shen T.T., DeFranco E.A., Muglia L.J. (2008) The genetics of birth timing: insights into a fundamental component of human development. Clin. Genet., 74, 493–501.
  • 4. Plunkett J., Borecki I., Morgan T., Stamilio D., Muglia L.J. (2008) Population-based estimate of sibling risk for preterm birth, preterm premature rupture of membranes, placental abruption and pre-eclampsia. BMC Genetics, 9, 44.
  • 5. Plunkett J., Feitosa M.F., Trusgnich M., Wangler M.F., Palomar L., Kistka Z.A., DeFranco E.A., Shen T.T., Stormo A.E., Puttonen H., et al. (2009) Mother's genome or maternally-inherited genes acting in the fetus influence gestational age in familial preterm birth. Hum. Hered., 68, 209–219.
  • 6. Svensson A.C., Sandin S., Cnattingius S., Reilly M., Pawitan Y., Hultman C.M., Lichtenstein P. (2009) Maternal effects for preterm birth: a genetic epidemiologic study of 630,000 families. Am. J. Epidemiol., 170, 1365–1372.
  • 7. Myking S., Boyd H.A., Myhre R., Feenstra B., Jugessur A., Devold Pay A.S., Ostensen I.H., Morken N.H., Busch T., Ryckman K.K., et al. (2013) X-chromosomal maternal and fetal SNPs and the risk of spontaneous preterm delivery in a Danish/Norwegian genome-wide association study. PLoS One, 8, e61781.
  • 8. Olsen J., Melbye M., Olsen S.F., Sorensen T.I., Aaby P., Andersen A.M., Taxbol D., Hansen K.D., Juhl M., Schow T.B., et al. (2001) The Danish National Birth Cohort–its background, structure and aim. Scand. J. Public Health, 29, 300–307.
  • 9. Plunkett J., Doniger S., Orabona G., Morgan T., Haataja R., Hallman M., Puttonen H., Menon R., Kuczynski E., Norwitz E., et al. (2011) An evolutionary genomic approach to identify genes involved in human birth timing. PLoS Genetics, 7, e1001365.
  • 10. Visscher P.M., Brown M.A., McCarthy M.I., Yang J. (2012) Five Years of GWAS Discovery. Am. J. Hum. Genet., 90, 7–24.
  • 11. Zhang G., Bacelis J., Lengyel C., Teramo K., Hallman M., Helgeland O., Johansson S., Myhre R., Sengpiel V., Njolstad P.R., et al. (2015) Assessing the Causal Relationship of Maternal Height on Birth Size and Gestational Age at Birth: A Mendelian Randomization Analysis. PLoS Medicine, 12, e1001865.
  • 12. Liu Y., Maxwell S., Feng T., Zhu X., Elston R.C., Koyuturk M., Chance M.R. (2012) Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data. BMC Syst. Biol., 6 Suppl 3, S15.
  • 13. Liu Y., Koyuturk M., Maxwell S., Zhao Z., Chance M.R. (2012) Integrative analysis of common neurodegenerative diseases using gene association, interaction networks and mRNA expression data. AMIA Jt Summits Transl Sci Proc., 2012, 62–71.
  • 14. Liu Y., Patel S., Nibbe R., Maxwell S., Chowdhury S.A., Koyuturk M., Zhu X., Larkin E.K., Buxbaum S.G., Punjabi N.M., et al. (2011) Systems biology analyses of gene expression and genome wide association study data in obstructive sleep apnea. Pac Symp Biocomput., 14–25.
  • 15. Nibbe R.K., Markowitz S., Myeroff L., Ewing R., Chance M.R. (2009) Discovery and scoring of protein interaction subnetworks discriminative of late stage human colon cancer. Mol. Cell. Proteomics, 8, 827–845.
  • 16. Auffray C. (2007) Protein subnetwork markers improve prediction of cancer outcome. Mol. Syst. Biol., 3, 141.
  • 17. Vanunu O., Magger O., Ruppin E., Shlomi T., Sharan R. (2010) Associating Genes and Protein Complexes with Disease via Network Propagation. PLoS Comput. Biol., 6, e1000641.
  • 18. Chen Y., Zhu J., Lum P.Y., Yang X., Pinto S., MacNeil D.J., Zhang C., Lamb J., Edwards S., Sieberts S.K., et al. (2008) Variations in DNA elucidate molecular networks that cause disease. Nature, 452, 429–435.
  • 19. Dixon A.L., Liang L., Moffatt M.F., Chen W., Heath S., Wong K.C., Taylor J., Burnett E., Gut I., Farrall M., et al. (2007) A genome-wide association study of global gene expression. Nat. Genet., 39, 1202–1207.
  • 20. Schadt E.E., Molony C., Chudin E., Hao K., Yang X., Lum P.Y., Kasarskis A., Zhang B., Wang S., Suver C., et al. (2008) Mapping the genetic architecture of gene expression in human liver. PLoS Biology, 6, e107.
  • 21. Nicolae D.L., Gamazon E., Zhang W., Duan S., Dolan M.E., Cox N.J. (2010) Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genetics, 6, e1000888.
  • 22. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., Harris M.A., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
  • 23. Gene Ontology Consortium: going forward. in press.
  • 24. Chan Y.W., van den Berg H.A., Moore J.D., Quenby S., Blanks A.M. (2014) Assessment of myometrial transcriptome changes associated with spontaneous human labour by high-throughput RNA-seq. Exp. Physiol., 99, 510–524.
  • 25. Bethin K.E., Nagai Y., Sladek R., Asada M., Sadovsky Y., Hudson T.J., Muglia L.J. (2003) Microarray analysis of uterine gene expression in mouse and human pregnancy. Mol. Endocrinol., 17, 1454–1469.
  • 26. Weiner C.P., Mason C.W., Dong Y., Buhimschi I.A., Swaan P.W., Buhimschi C.S. (2010) Human effector/initiator gene sets that regulate myometrial contractility during term and preterm labor. Am. J. Obstet. Gynecol., 202, 474.e471–420.
  • 27. Xu Z., Yoshida T., Wu L., Maiti D., Cebotaru L., Duh E.J. (2015) Transcription factor MEF2C suppresses endothelial cell inflammation via regulation of NF-kappaB and KLF2. J. Cell. Physiol., 230, 1310–1320.
  • 28. Szklarczyk D., Franceschini A., Kuhn M., Simonovic M., Roth A., Minguez P., Doerks T., Stark M., Muller J., Bork P., et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res., 39, D561–D568.
  • 29. Szklarczyk D., Franceschini A., Wyder S., Forslund K., Heller D., Huerta-Cepas J., Simonovic M., Roth A., Santos A., Tsafou K.P., et al. (2015) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res., 43, D447–D452.
  • 30. Rebhan M., Chalifa-Caspi V., Prilusky J., Lancet D. (1997) GeneCards: integrating information about genes, proteins and diseases. Trends Genet., 13, 163.
  • 31. O'Brien M., Morrison J.J., Smith T.J. (2008) Upregulation of PSCDBP, TLR2, TWIST1, FLJ35382, EDNRB, and RGS12 gene expression in human myometrium at labor. Reprod. Sci., 15, 382–393.
  • 32. Plunkett J., Doniger S., Morgan T., Haataja R., Hallman M., Puttonen H., Menon R., Kuczynski E., Norwitz E., Snegovskikh V., et al. (2010) Primate-specific evolution of noncoding element insertion into PLA2G4C and human preterm birth. BMC Med. Genomics, 3, 62.
  • 33. Morey J.S., Ryan J.C., Van Dolah F.M. (2006) Microarray validation: factors influencing correlation between oligonucleotide microarrays and real-time PCR. Biol. Proced. Online, 8, 175–193.
  • 34. Lonsdale J., Thomas J., Salvatore M., Phillips R., Lo E., Shad S., Hasz R., Walters G., Garcia F., Young N., et al. (2013) The Genotype-Tissue Expression (GTEx) project. Nat. Genet., 45, 580–585.
  • 35. Johnson A.D., Handsaker R.E., Pulit S.L., Nizzari M.M., O'Donnell C.J., de Bakker P.I.W. (2008) SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics, 24, 2938–2939.
  • 36. Nibbe R.K., Koyuturk M., Chance M.R. (2010) An integrative -omics approach to identify functional sub-networks in human colorectal cancer. PLoS Comput. Biol., 6, e1000639. doi: 10.1371/journal.pcbi.1000639.
  • 37. Patel V.N., Gokulrangan G., Chowdhury S.A., Chen Y., Sloan A.E., Koyuturk M., Barnholtz-Sloan J., Chance M.R. (2013) Network signatures of survival in glioblastoma multiforme. PLoS Comput. Biol., 9, e1003237. doi: 10.1371/journal.pcbi.1003237.
  • 38. Edgar R., Domrachev M., Lash A.E. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res., 30, 207–210.
  • 39. Klein P., Ravi R. (1995) A nearly best-possible approximation algorithm for node-weighted Steiner trees. J. Algorithms, 19, 104–115.
  • 40. Smoot M.E., Ono K., Ruscheinski J., Wang P.L., Ideker T. (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics, 27, 431–432.
  • 41. Merlino A.A., Welsh T.N., Tan H., Yi L.J., Cannon V., Mercer B.M., Mesiano S. (2007) Nuclear progesterone receptors in the human pregnancy myometrium: evidence that parturition involves functional progesterone withdrawal mediated by increased expression of progesterone receptor-A. J. Clin. Endocrinol. Metab., 92, 1927–1933.

Articles from Human Molecular Genetics are provided here courtesy of Oxford University Press
