Abstract
Motivation: Time-evolving differential protein–protein interaction (PPI) networks are essential to understand serial activation of differentially regulated (up- or downregulated) cellular processes (DRPs) and their interplays over time. Despite developments in the network inference, current methods are still limited in identifying temporal transition of structures of PPI networks, DRPs associated with the structural transition and the interplays among the DRPs over time.
Results: Here, we present a probabilistic model for estimating Time-Evolving differential PPI networks with MultiPle Information (TEMPI). This model describes probabilistic relationships among network structures, time-course gene expression data and Gene Ontology biological processes (GOBPs). By maximizing the likelihood of the probabilistic model, TEMPI estimates jointly the time-evolving differential PPI networks (TDNs) describing temporal transition of PPI network structures together with serial activation of DRPs associated with transiting networks. This joint estimation enables us to interpret the TDNs in terms of temporal transition of the DRPs. To demonstrate the utility of TEMPI, we applied it to two time-course datasets. TEMPI identified the TDNs that correctly delineated temporal transition of DRPs and time-dependent associations between the DRPs. These TDNs provide hypotheses for mechanisms underlying serial activation of key DRPs and their temporal associations.
Availability and implementation: Source code and sample data files are available at http://sbm.postech.ac.kr/tempi/sources.zip.
Contact: seungjin@postech.ac.kr or dhwang@dgist.ac.kr
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
Many cellular events involve serial activation of cellular processes during which genes/proteins associated with the processes are up- or downregulated. Differential protein–protein interaction (PPI) networks (DNs) have been used to delineate PPIs (edges) among differentially regulated nodes (DRNs), such as up- or downregulated genes or proteins. The DNs have been considered more effective for understanding the differences between two conditions, compared with non-DNs (de la Fuente, 2010). However, the DNs delineate no temporal transition of the DRNs and/or their edges to represent serial activation of cellular processes over time. Thus, time-evolving differential PPI networks (TDNs) have been introduced to delineate (i) temporal changes in abundances or activities of DRNs (node transition), and/or (ii) formation of new edges for the DRNs and disappearance of existing edges over time (edge transition). The TDNs are essential to understand serial activation of differentially regulated cellular processes (DRPs) during a cellular event and their underlying mechanisms.
Time-course gene expression analysis can provide temporal changes in abundances of the DRNs (Hwang et al., 2009). Several interaction assays, such as yeast two-hybrid (Ito et al., 2001; Uetz et al., 2000; Yu et al., 2008) and mass spectrometry-based tandem affinity purification (Collins et al., 2007) can be used to measure PPIs among the DRNs. However, it is still challenging to experimentally identify temporal transition of the edges among the DRNs because of the limited coverage of the interactomes detected by these assays (von Mering et al., 2002).
The limitation of the experimental methods prompted us to develop a computational method to estimate TDNs. Many methods for estimation of dynamic gene regulatory networks have been developed (Kim et al., 2014). However, estimation of temporal transitions of differential PPI networks (i.e. TDNs) has been rarely studied. A couple of methods have been developed to identify differential PPI networks, which can be then used to estimate TDNs. First, a simple method to infer DNs using time-course gene expression data identifies DRNs over time and constructs a template PPI network with the known PPIs among all the DRNs (Hwang et al., 2009; Przytycka and Kim, 2010). TDNs can be then constructed by selecting the interacting DRNs with significant expression changes at each t from the template PPI network. Second, principal network analysis (PNA) identifies differential expression patterns over time and then selects DRNs and their edges (known PPIs between the DRNs) showing the differential expression patterns (Kim et al., 2011). A principal subnetwork (PS) is then constructed using both DRNs and edges selected for each differential expression pattern. Finally, TDNs can be constructed by selecting the edges in PSs for which the linked DRNs show significant expression changes at each t.
Functional interpretation of the inferred TDNs is important to understand temporal transition of the DRPs. In most of the network inference methods, it is commonly performed independently from network inference using post hoc analyses of Gene Ontology biological processes (GOBPs) of the nodes in the inferred networks (Kim et al., 2014). For example, the method proposed by Park and Bader (2012) clusters the nodes in time-evolving networks based on the similarity of temporal transitions of their edges and then links these clusters to cellular functions using GOBPs. However, none of the methods estimating DNs or TDNs integrates functional information, such as GOBPs, during the network inference such that the inferred TDNs can represent directly temporal transition of differentially regulated GOBPs and time-dependent interplays between the GOBPs, thereby facilitating functional interpretation of the TDNs.
Here, we introduce a probabilistic model for estimating Time-Evolving differential PPI networks with MultiPle Information (TEMPI). Although many methods have used probabilistic modeling for estimating network structures (Friedman et al., 2000; Ong et al., 2002; Song et al., 2009), a unique aspect of our model is that it models additionally probabilistic dependencies of GOBPs with network structures and time-course global data. By maximizing the likelihood function of the probabilistic model, TEMPI jointly estimates the TDNs showing temporal transitions of network structures with temporal activation of the GOBPs and their temporal interplays. During the network inference, TEMPI infers edges not included in the known PPIs, whereas most of the previous methods (e.g. PNA) select a subset of PPIs for estimation of TDNs from the known PPIs.
2 FRAMEWORK OF TEMPI
TEMPI uses the observed data (time-course gene expression data, known PPIs and GOBPs) as the input variables, estimates the output variables based on the probabilistic graphical model describing probabilistic dependencies among the input and output variables, and then infers TDNs using the estimated output variables. First, TEMPI uses the following three observed variables as the inputs. As the first input, TEMPI uses the time-course gene expression log2-fold-changes (dynamic data) of n nodes at T time points, with R biological replicates (an array E in Supplementary Fig. S1, bottom left). Estimation of the TDNs only using E can be an underdetermined problem (De Smet and Marchal, 2010). To reduce this issue, as the second input, TEMPI uses known PPI data (an adjacency matrix ; static data) of n nodes (Supplementary Fig. S1, top left). In TEMPI, the are converted into positions of n nodes in a p-dimensional latent space (a positional matrix) using multidimensional scaling (MDS; Supplementary Information S1.1; Higham et al., 2008; You et al., 2010). In this study, we used the 2D latent space (Supplementary Information S1.1). MDS locates the interacting nodes closely in the latent space: for example, for nodes A–E in (Supplementary Fig. S1, top center), MDS located A–D closely, but E distantly from A–D in a 2D latent space. To identify TDNs, TEMPI then selects m DRNs from time-course gene expression data (Supplementary Information S1.2). TEMPI uses the for the selected DRNs as an input. Finally, as the third input, TEMPI further uses the GOBP data (l GOBP terms assigned to m DRNs; static data), a binary node-GOBP matrix T in which = 1 when node i has GOBP k (e.g. a T for nodes A–F in Supplementary Fig. S1, top right).
Second, TEMPI uses a probabilistic graphical model (Fig. 1) that describes probabilistic dependencies (see Section 2.1) among the input observed variables () and the following output hidden variables for m DRNs at each t : (i) a positional matrix representing positions of n nodes in a p-dimensional space at t, (ii) an adjacency matrix representing the presence of edges between m nodes at t and (iii) a node-GOBP matrix representing differentially regulated GOBPs at t, in which when GOBP k is estimated to be differentially regulated for node i; otherwise, . The output hidden variables were then estimated through the optimization (see Section 2.2). Third, using the estimated outputs (), the TDNs can be constructed as described in Section 2.3 (‘Inferred TDNs’ of Supplementary Fig. S1, bottom right). The resulting TDNs have the following characteristics: (i) a pair of interacting DRNs according to are likely to be linked when they share expression changes and GOBPs [e.g. the interacting A–B in (Supplementary Fig. S1, top left), both of which were upregulated (Supplementary Fig. S1, bottom left) and share GOBPs 3 and 6 (Supplementary Fig. S1, top right), were linked in at (Supplementary Fig. S1, bottom right)]; (ii) the links between non-interacting DRNs according to can be inferred when they share expression changes and GOBPs (e.g. non-interacting B–D in , both of which were upregulated and share GOBPs 3 and 4, were linked in at ); and (iii) the GOBPs assigned to the interacting DRNs in T are likely to be differentially regulated (e.g. the interacting A–B at have GOBPs 3 and 6 in T, which are co-differentially regulated in the estimated ).
2.1 Probabilistic graphical model
The probabilistic graphical model (Fig. 1) was constructed to include the following dependencies between the input observed and the output hidden variables : (i) depends on the initial positions of nodes () and their positions at () to achieve smooth transition of TDNs over time; (ii) , a geometric graph, depends on the distances between the nodes in the latent space and thus on ; (iii) E depends on the interactions () between the nodes based on the observation in real PPI networks that the nodes with similar expression changes are likely to interact (Grigoriev, 2001; Supplementary Fig. S2A); and (iv) depends on and the node-GOBP matrix T, based on the observation in real PPI networks that the nodes with the same GOBPs are likely to interact (Sharan et al., 2007; Supplementary Fig. S2B).
Based on these dependencies, the probabilistic model was defined by the following four submodels (see Supplementary Information S1.3 for further details of the four submodels):
Transition model () for m nodes is defined as a product of m Gaussian distributions: where and . and I are the positional vector for node i at t and the identity matrix, respectively. and control the penalties of displacement of nodes from positions () at t − 1 and initial positions (), respectively.
Link model () is modeled as a product of Bernoulli distributions for pairs of m nodes: where link probability () of nodes i and j is defined by with . decreases as increases, and its decreasing rate is controlled by
Expression model () at t is defined as a weighted product of for all time points τ, where is the log2-fold-change matrix for m nodes, with R biological replicates at time τ, and is column r of . is then modeled as the product of the mixtures of Gamma distributions for all pairs of m nodes with two sets of parameters, and :
where . k and θ are determined to produce a higher probability in the first Gamma distribution than in the second one for a small . Finally, we used a radial basis function kernel, , to weight in its weighted product (Song et al., 2009):
This weighting scheme ensures (i) smooth transitions of TDNs, and (ii) a more significant dependency of on at τ = t than other τs ().
Ontology model () is factorized into and . First, is defined by the product of Bernoulli distributions for l GOBPs of m nodes:
where (ϵ = 1×10−4), given an inverse function frequency that penalizes general GOBPs. The normalization constant s was defined as the maximum value of . Second, was modeled as the mixture of Gamma distributions for all pairs of m nodes using two sets of parameters, and :
where k and θ are determined to produce a higher probability in the first Gamma function than in the second one for a large .
Many previous methods have used probabilistic models for estimating the network structures using E (Friedman et al., 2000; Ong et al., 2002; Song et al., 2009). However, a unique aspect of our model is that it includes the ontology model to integrate GOBP data ( and T) during the probabilistic estimation of TDNs.
2.2 Optimization of the likelihood function
The joint probability of the graphical model at each t is defined by . The output hidden variables () are then estimated by maximizing the log-likelihood function of the joint distribution, using variational inference (Beal, 2003). Briefly, for each t, we first calculated the lower bound of the marginal log-likelihood function, , which can be obtained by integrating the joint probability with respect to (Supplementary Information S1.4): for distributions of , , . We then used a variational approximation (Beal, 2003) to estimate the variational distribution , assuming the independency among the variables: where and In these distributions, , , and are the variational parameters (Supplementary Information S1.5). Finally, we determined the variational parameters (, and ) such that they maximize the lower bound of the marginal likelihood function as described in Supplementary Information S1.5.
2.3 Construction of TDNs
Using these variational parameters (, and ), the output hidden variables () were finally estimated. At each t, first, the position , column i of of node i was determined as the expected value of the posterior probability of given the observed variables, P(). The variational distribution with the estimated variational parameters approximates the posterior probability distribution of the hidden variables given the values of the observed variables: . With the independency among the hidden variables, , and . Thus, was determined as , the expected value of . Second, nodes i and j were determined to be linked (i.e. of = 1) when ≥ 0.5. Using the estimated and , TDNs can be constructed as geometric graphs at individual time points.
Finally, for functional interpretation of the inferred TDNs, GOBP k was determined to be differentially regulated (positively or negatively activated) at t for node i ( of = 1) when the log-likelihood ratio of posterior and prior probabilities of , = , was significantly (P < 0.01) larger than zero. The P-value was computed using one-tailed t-test (degree of freedom = the number of nodes with GOBP k). Using the estimated , the activation degree of GOBP k at t was estimated as the fraction of the nodes with activated GOBP k () in the network at t among all nodes with GOBP k (): , where is the sign of and r is the index of biological replicates. The sign was multiplied to distinguish positive and negative activation of GOBP k for node i at t. Furthermore, the interaction degree for two activated GOBPs k and l at t was estimated as the fraction of the inferred edges between the two sets of the nodes with activated GOBP k 1) and activated GOBP l 1), respectively, in the network at t among all possible edges among the two sets of nodes with activated GOBPs k and l:
The sum of the signs of the linked nodes i and j was included such that the edge with different signs of the linked nodes should have no contribution to the interaction degree.
3 A SYNTHETIC TDN MODEL
To demonstrate the performance of TEMPI, we generated a template PPI network and GOBPs to simulate characteristics of real yeast PPI networks and then sampled the synthetic TDN model from the template PPI network for which temporal transitions of (i) network structures and (ii) differentially regulated GOBPs associated with the TDN model are known. First, to generate a template PPI network with the characteristics of real yeast PPI networks, we used a geometric graph model with gene duplication and mutation (GEO-GD expansion model; Przulj et al., 2010; Supplementary Fig. S3A). See Supplementary Information S2.1 for the detailed procedure. The resulting template PPI network included 26 454 edges for the 3258 nodes (Supplementary Fig. S4A). This network was also used as the input PPI network (GI). Second, to generate GOBPs with the characteristics of real yeast PPI networks, we assigned 306 GOBP labels (T) to the 3258 nodes in the template PPI network using a modified version of network module (NeMo; Rivera et al., 2010; Supplementary Fig. S3B). See Supplementary Information S2.2 for the detailed procedure. Among the 306 GOBPs, we used 103 after removing 203 GOBPs assigned to >100 or <5 DRNs, which can be too general or non-meaningful, respectively, for functional interpretation. Third, we then sampled a TDN model from the template PPI network by selecting the nodes with (i) GOBPs 60, 67 and 218; (ii) GOBPs 178, 228 and 206; and (iii) GOBPs 94, 146 and 246 at individual time points based on predefined fractions of the linked edges among the selected nodes over time (Supplementary Figs S3C and S4B). See Supplementary Information S2.3 for the detailed procedure. Finally, we generated time-course gene expression log2-fold-changes that reflect temporal transitions of the synthetic TDN model using Metropolis–Hastings algorithm (Hastings, 1970; Supplementary Figs S3D and S4C). See Supplementary Information S2.4 for the detailed procedure.
4 RESULTS AND DISCUSSION
4.1 Application of TEMPI to the synthetic data
To evaluate performance of TEMPI, we applied it to the synthetic input data (). As the input PPI data, we used the PPIs in the template PPI network (GI; see Section 3). We first applied MDS to for the 3258 nodes to compute in 2D latent space. To identify TDNs, we then identified 616 DRNs with false discovery rates (FDRs) < 0.1 using a modified version of repeated measure-analysis of variance (RM-ANOVA) test previously reported (ElBakry et al., 2012) and maximum log2-fold-changes > 0.58 (1.5-fold) at least at one time point (Supplementary Information S1.2) and used the for the 616 DRNs as an input data. For the synthetic GOBP data, we used 103 GOBPs (T) for the DRNs as described in Section 3. Finally, we used the 616×6×3 synthetic log2-fold-changes for the DRNs as the input expression data (E). After applying TEMPI to these synthetic input data () for the 616 DRNs, the output variables () were estimated by the optimization of the likelihood function of the probabilistic graphical model (see Section 2.2). Using the and , TDNs (TEMPI- ) were inferred at individual time points (see Section 2.3).
For functional interpretation of TEMPI-, we first examined temporal activation of the 103 GOBPs represented by TEMPI- (Fig. 2A) based on the activation degrees of the 103 GOBPs computed using the estimated (see Section 2.3). The activation degrees revealed that three groups of GOBPs (Groups 1–3 in Fig. 2A, left panels), among the 103 GOBPs, were differentially regulated early, mid and late over time, respectively. Notably, Groups 1–3 included GOBPs 60, 178 and 94 (Group 1); 67, 228 and 146 (Group 2); and 218, 206 and 246 (Group 3), respectively, consistent to the predefined differential regulation of the three sets of the GOBPs (Supplementary Information S2.3). For example, the high activation degree of Group 1 at t = 1 indicates that a large number of the nodes with Group 1 are linked at t = 1. The decrease of the activation degree from t = 2 indicates that decreasing numbers of the nodes with Group 1 are linked from t = 2. Moreover, the descendants of the predefined GOBPs in Groups 1–3, respectively, were partially differentially regulated. This is expected because the nodes assigned with the descendant GOBPs also have their parent GOBPs (Supplementary Fig. S3). We then examined temporal associations among the GOBPs in Groups 1–3 represented by TEMPI- based on the interaction degrees of the GOBPs computed using the estimated and (Supplementary Fig. 2B; see Section 2.3). The interaction degrees revealed that (i) Group 1 and their descendants; (ii) Group 2 and their descendants; and (iii) Group 3 and their descendants showed early, mid and late associations among them, respectively, consistent to the predefined differential regulation of their parent GOBPs.
Finally, TEMPI- (Fig. 2C) showed the transitions of nodes and edges over time from the initial network (; Supplementary Fig. S4A). Of note, because of the geometric representation of the TDNs, the linked nodes at t in TEMPI- were moved closely to each other. The TDNs (Fig. 2C) for GOBP60 (Group 1), GOBP67 (Group 2) and GOBP218 (Group 3) correctly captured early, middle and late transitions defined in their true TDNs (Supplementary Fig. S4C).
4.2 Comparison of TEMPI with previous methods
To quantitatively assess the relative performance of TEMPI, we applied the two methods, the simple method (Hwang et al., 2009) and PNA (Kim et al., 2011) described in Section 1, to the synthetic data and identified TDNs as follows. For the simple method, as the input data, we used the synthetic E and GI in the template PPI network. We first identified the DRNs as the nodes with the maximum fold-changes > a cutoff of 2 or 1.5. For the DRNs, we constructed a DN using the GI. We then estimated the TDNs (Simple-) by selecting the interacting DRNs with fold-changes > the cutoff in the DN at each t or either of its neighboring time points (t − 1 or t + 1) to reflect significant smooth transitions and then by linking the selected interacting DRNs (Supplementary Information S3.1). For the PNA application, as the input data, we used the same synthetic E and GI. PNA identified three activation patterns (H1-3 in Supplementary Fig. S5A) and then generated the three PSs (PS1-3 in Supplementary Fig. S5B) that describe interactions among the DRNs showing the three activation patterns based on the input PPIs. By selecting both DRNs and edges showing PS1-3 at each t and then combining them, we identified TDNs (PNA-) at –6 (Supplementary Information S3.2).
We then evaluated the performance measures (precisions, recalls and F1 scores) by comparing the TDNs inferred by the three methods with the true TDNs and compared the performance measures of TEMPI with those of these two methods. Moreover, in many species, known PPIs (GI) are incomplete (Beyer et al., 2007). Unlike TEMPI, both the simple method and PNA predict no edges not included in GI. Their performance can thus depend on completeness of GI. Thus, we further examined robustness in accuracy of the inferred TDNs against incompleteness in GI by inferring the TDNs using the three methods as randomly removing 10–90% of the PPIs in the template PPI network (GI). Furthermore, to understand how the capability of TEMPI to predict edges not in GI can contribute to the performance, we also compared the performance of TEMPI after removing the edge prediction capability by fixing to 0 when during the optimization of TEMPI. PNA resulted in the highest precisions, but the lowest recalls, indicating that PNA- had less false positives (FPs), but more false negatives (FNs), compared with TEMPI and the simple method (Fig. 3A and Supplementary Table S1). By contrast, the simple method had lower precisions (more FPs), but high recalls (less FNs), compared with TEMPI and PNA. Interestingly, precisions of the three methods are robust to the amount of the removed PPIs, indicating the robust sensitivity of the methods for identifying the true links against the removal of PPIs. On the other hand, recalls of the simple method, PNA and TEMPI with no edge prediction capability linearly decreased with the increase of the amount of the removed PPIs. However, importantly, recall of TEMPI was robust up to 60% removal of the input PPIs. Based on the overall performance measure, F1 score defined as the harmonic mean of precision and recall, PNA was the best or comparable with TEMPI when the amount of the removed PPIs is <30%. However, TEMPI outperformed the other methods when the amount of the removed PPIs is >30% and showed the robust performance against the removal of the PPIs. Also, TEMPI with no edge prediction capability achieved the similar performance to PNA, and the robustness against the PPI removal disappeared. These data indicate that the edge prediction capability of TEMPI recovered the removed links in TEMPI-.
In addition, it is important to assess whether the three methods correctly estimate temporal transitions of topological properties in the true TDNs. Thus, we next examined the performance of the three methods in estimating transitions of the topological properties. To this end, we first selected TDNs reconstructed by the three methods when percentage of removal in input PPIs is 30% because they achieved similar F1-scores. Then, for the nodes in the estimated by each method, we computed (i) degree and (ii) clustering coefficient(CC) profiles over time. Then, we calculated correlations of the degree and clustering coefficient profiles with those of the true TDNs (Fig. 3B–C). The comparisons revealed that temporal transitions of the topological properties in TEMPI-, compared with PNA- and Simple-, agreed best with those in the true TDNs. Structural changes in the s should be linked to transitions of cellular functions represented by the s over time. Thus, we examined how well structural changes in the s estimated by the three methods were linked to differential regulation of GOBPs 60, 67 and 218 represented in the true TDNs (Supplementary Fig. S6 for the other predefined six GOBPs). For TEMPI, as described in Section 4.1, the activation degrees of GOBPs showed that structural changes in the s well-represented early, mid and late differential regulation of the three GOBPs in the true TDNs (Fig. 2A). As another subjective measure previously reported (Song et al., 2009), we further defined ‘interactivity’ for each GOBP as the average number of the edges in among the nodes with the GOBP. For the GOBPs 60, 67 and 218, temporal changes of the interactivities obtained from TEMPI- achieved the highest correlation with those of the true TDNs, compared with PNA- and Simple- (Fig. 3D). All these data together indicate that TEMPI- represents effectively the temporal transitions in the true TDNs in the structural (Fig. 3A), topological (Fig. 3B–C) and functional (Fig. 3D) aspects, compared with the two previous methods.
4.3 Application of TEMPI to the cell cycle data
Cell cycle is a representative time-varying cellular process. Although small-scale TDNs for several molecules involved in cell cycle have been studied, large-scale TDNs for a comprehensive set of molecules for cell cycle are still largely unknown. Thus, we obtained time-course gene expression data (GSE8799) collected during the cell cycle of wild-type yeasts (Orlando et al., 2008). We first applied MDS to the high quality of known yeast PPI data (; Supplementary Information S4.1) to calculate in a 2D latent space for 3258 nodes. We selected the 755 DRNs with FDRs < 0.1 using the modified RM ANOVA test and maximum log2-fold-changes > 0.58 (1.5-fold) at one time point at least. To identify TDNs, we then obtained the following input data for the 755 DRNs: (i) log2-fold-changes (E) at six time points with two replicates, (ii) positions () in a 2D space and (iii) 664 GOBPs (T) assigned to the 755 nodes (Supplementary Information S4.1).
TEMPI generated to over six time points (G0, G1, S, G2, G2/M and M phases) during the cell cycle. To understand functional transition represented by TEMPI-, we first examined activation degrees of the 664 GOBPs. Among them, we focused on the 14 cell cycle–related GOBPs with pulsed changes of activation degrees, a characteristic of the cell cycle (Fig. 4A; Supplementary Information S4.2). They can be categorized into the following three groups: Group 1, early activated GOBPs with the peaks at G0 to G1 phase (DNA-dependent DNA replication initiation, DNA-dependent DNA replication, DNA strand elongation involved in DNA replication, DNA integrity checkpoint and telomere maintenance via telomerase); Group 2, middle activated GOBPs with the peaks at S to G2 phase (post-replication repair, sister chromatid cohesion, DNA packaging, spindle organization, spindle checkpoint, regulation of G2/M transition of mitotic cell cycle and nuclear division); and Group 3, late activated GOBPs with the peaks at G2/M to M phase [cytokinetic cell separation (CCS) and organelle inheritance]. The activation kinetics of these 14 GOBPs was largely consistent with their known kinetics during the cell cycle (Orlando et al., 2008; Simon et al., 2001; Spellman et al., 1998). For example, the expression of genes involved in DNA replication and repair reached peaks in G1 or S phases (Spellman et al., 1998), consistent to the kinetics of the GOBPs of DNA replication and post-replication repair in Figure 4A.
We then examined the interaction degrees of the GOBPs to understand time-dependent associations among GOBPs represented by TEMPI-. In this analysis, we focused on the following five GOBPs: GOBPs 1–2, DNA-dependent DNA replication (DR) and DNA integrity checkpoint (DIC) in Group 1; GOBPs 3–4, spindle checkpoint (SC) and regulation of G2/M transition of mitotic cell cycle (G2M) in Group 2; and GOBPs 5, CCS in Group 3. The interaction degrees revealed the following temporal interplays between the five GOBPs (Fig. 4B): (i) DIC and DR in Groups 1 and 2, respectively, during G0, G1 and S phases (–3); (ii) DIC and SC, as well as DIC and G2M, in Group 1 at S phase (); (iii) SC and G2M in Group 1 at S and G2 phases (–4); and (iv) SC and CCS in Group 4 at G2 phase (). Some of these interplays have been previously reported. Noh et al. (2009) reported that downregulation of G2/M transition-related genes (e.g. PLK1 and SURVIVIN) led to a defect in mitotic spindle, and Gardner et al. (1999) reported that two DIC-related genes, RAD53 and DUN1, are required for establishment and maintenance of G2/M arrest. TEMPI- showed temporal transition of the nodes with the five GOBPs and their edges (G2M and CCS in first row of Fig. 4C, SC and G2M in second row of Fig. 4C and Supplementary Fig. S7 for individual GOBPs), consistent to the transitions of the GOBPs in Figure 4A and B.
These findings can provide novel insights into the interplays among the GOBPs during the cell cycle. TEMPI- at S phase () showed the strongest associations among the GOBPs. To investigate novel insights represented by the associations, we built a subnetwork (Fig. 4D) including the nodes with the four GOBPs (DIC, DR, SC and G2M) strongly interacting at S phase (). To reduce the complexity, we focused on the RAD53 subnetwork (Fig. 4D) including RAD53, a key molecule in DIC and DR, and its 36 interactors involved in the four GOBPs. Of the 36 interactors, only three are included in the input PPIs (), while the others are predicted. To assess the reliability of the predicted interactors of RAD53, we obtained 9850 interactions among the 755 DRNs from an independent PPI database, BioGrid (Stark et al., 2006), and confirmed that 18 of the 33 predicted interactors were previously detected by high-throughput interactome analyses, supporting the validity of TEMPI in the edge prediction. For example, TEMPI estimated the interactions of RAD53 with the 23 nodes (‘×’ nodes in Fig. 4D) involved in DR. Among them, RAD53-DBF4 and DUN1 was included in , while the other interactions are newly predicted. Of the 21 predicted interactors, 14 (e.g. CTF4, CDC45 and CDC7) are reported to interact with RAD53 according to BioGrid. Moreover, TEMPI newly predicted the seven interactors of RAD53 involved in G2M (‘*’ nodes in Fig. 4D), where three of them (CLB2, CLB5 and IPL1) are reported to interact with RAD53 according to BioGrid. Similarly, TEMPI identified seven interactors of RAD53 involved in DIC (‘□’ nodes in Fig. 4D). Two of them (DUN1 and RAD9) were included in , and the other five were reported to interact with RAD53 according to BioGrid. The molecules with newly predicted interactions in the RAD53 subnetwork are known to be involved in DIC, DR, SC or G2M, independently of RAD53-dependent regulation of the cell cycle at S phase. Thus, all these data indicate novel insights into potential roles of these molecules in the RAD53-dependent regulation of the cell cycle at S phase. Many subnetworks can be analyzed in the same way to generate hypotheses for novel mechanisms underlying dynamic regulation of cell cycle.
We further compared the performance of TEMPI on the cell cycle data with that of PNA. PNA produced the 10 PSs. By combining them, we generated TDNs as described above (PNA-Gs in Supplementary Fig. S8). PNA-Gs were significantly sparse, compared with TEMPI-Gs (Supplementary Fig. S8), because PNA used only the sparse real PPIs (1425 PPIs between 755 nodes) in GI, whereas TEMPI predicted a significant number of novel edges among the DRNs (Supplementary Fig. S9A). As described above, we assessed the reliability of the predicted PPIs by examining how many of them are reported in the BioGrid database (Supplementary Fig. S9B–C). The fraction of the predicted PPIs reported in BioGrid (0.254) was significantly larger than the random expectation (0.0346). Moreover, TEMPI- captured the larger number of cell cycle–related GOBPs (120 GOBPs) with pulsed activation patterns than PNA- (16 GOBPs), suggesting that TEMPI- more effectively captured cell cycle–related functional transition than PNA- (Supplementary Fig. S9D–F). The comparison of the distribution of degrees and CC also showed that TEMPI- is more dense and modular than PNA- (Supplementary Fig. S9G–H). See Supplementary Information S4.3 for further details. Also, we applied TEMPI to the gene expression data collected from the mutant yeasts with defects in cell cycle, compared activation/interaction degrees of cell cycle–related GOBPs between wild-type and mutant yeasts, and examined deregulation of wild-type TEMPI- in mutant yeasts (Supplementary Information S4.4).
4 CONCLUSIONS
In this study, we developed TEMPI that effectively estimates TDNs associated with activated GOBPs over time by integrating time-course gene expression, PPI and GOBP data. TEMPI provides activation and interaction degrees of GOBPs, facilitating the interpretation of temporal activations and interplays of GOBPs represented by the estimated TDNs. This interpretation leads to generation of TDN-driven hypotheses for key pathways regulating cellular events under investigation (see Section 4.3). Thus, TEMPI can serve as a useful tool that provides hypotheses for the mechanisms underlying functional transitions in various problems in time-varying biological systems. See Supplementary Information S5 and S6 for implementation and limitations of TEMPI, respectively, which include potential limited applicability of TEMPI to other types of interactions than PPIs, sensitivity to completeness of the input PPIs and the scalability issue.
Funding: This study was supported by National Research Foundation (NRF) of Korea (NRF-2013R1A2A2A01067464), the IT R&D Program of MSIP/KEIT (14-824-09-014, Machine Learning Center), Institute for Basic Science (CA1308), the Next-generation Biogreen21 Program (PJ009072) and Proteogenomic Research Program.
Conflict of interest: none declared.
Supplementary Material
REFERENCES
- Beal MJ. London: Ph.D. Thesis, The Gatsby Computational Neuroscience Unit, University College; 2003. Variational Algorithms for Approximate Bayesian Inference. [Google Scholar]
- Beyer A, et al. Integrating physical and genetic maps: from genomes to interaction networks. Nat. Rev. Genet. 2007;8:699–710. doi: 10.1038/nrg2144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Collins SR, et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics. 2007;6:439–450. doi: 10.1074/mcp.M600381-MCP200. [DOI] [PubMed] [Google Scholar]
- de la Fuente A. From ‘differential expression' to ‘differential networking' - identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010;26:326–333. doi: 10.1016/j.tig.2010.05.001. [DOI] [PubMed] [Google Scholar]
- De Smet R, Marchal K. Advantages and limitations of current network inference methods. Nat. Rev. Microbiol. 2010;8:717–729. doi: 10.1038/nrmicro2419. [DOI] [PubMed] [Google Scholar]
- ElBakry O, et al. Identification of differentially expressed genes for time-course microarray data based on modified RM ANOVA. IEEE/ACM Trans. Comput. Biol. Bioinform. IEEE ACM. 2012;9:451–466. doi: 10.1109/TCBB.2011.65. [DOI] [PubMed] [Google Scholar]
- Friedman N, et al. Using Bayesian networks to analyze expression data. J. Comp. Biol. 2000;7:601–620. doi: 10.1089/106652700750050961. [DOI] [PubMed] [Google Scholar]
- Gardner R, et al. RAD53, DUN1 and PDS1 define two parallel G2/M checkpoint pathways in budding yeast. EMBO J. 1999;18:3173–3185. doi: 10.1093/emboj/18.11.3173. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grigoriev A. A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 2001;29:3513–3519. doi: 10.1093/nar/29.17.3513. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57:97–109. [Google Scholar]
- Higham DJ, et al. Fitting a geometric graph to a protein-protein interaction network. Bioinformatics. 2008;24:1093–1099. doi: 10.1093/bioinformatics/btn079. [DOI] [PubMed] [Google Scholar]
- Hwang D, et al. A systems approach to prion disease. Mol. Syst. Biol. 2009;5:252. doi: 10.1038/msb.2009.10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ito T, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA. 2001;98:4569–4574. doi: 10.1073/pnas.061034498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim Y, et al. Principal network analysis: identification of subnetworks representing major dynamics using gene expression data. Bioinformatics. 2011;27:391–398. doi: 10.1093/bioinformatics/btq670. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kim Y, et al. Inference of dynamic networks using time-course data. Brief. Bioinform. 2014;15:212–228. doi: 10.1093/bib/bbt028. [DOI] [PubMed] [Google Scholar]
- Noh EJ, et al. An HDAC inhibitor, trichostatin A, induces a delay at G2/M transition, slippage of spindle checkpoint, and cell death in a transcription-dependent manner. Biochem. Biophys. Res. Commun. 2009;378:326–331. doi: 10.1016/j.bbrc.2008.11.057. [DOI] [PubMed] [Google Scholar]
- Ong IM, et al. Modelling regulatory pathways in E. coli from time series expression profiles. Bioinformatics. 2002;18(Suppl. 1):S241–S248. doi: 10.1093/bioinformatics/18.suppl_1.s241. [DOI] [PubMed] [Google Scholar]
- Orlando DA, et al. Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature. 2008;453:944–947. doi: 10.1038/nature06955. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park Y, Bader JS. How networks change with time. Bioinformatics. 2012;28:i40–i48. doi: 10.1093/bioinformatics/bts211. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Przulj N, et al. Geometric evolutionary dynamics of protein interaction networks. Pac. Symp. Biocomput. 2010:178–189. doi: 10.1142/9789814295291_0020. [DOI] [PubMed] [Google Scholar]
- Przytycka TM, Kim YA. Network integration meets network dynamics. BMC Biol. 2010;8:48. doi: 10.1186/1741-7007-8-48. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rivera CG, et al. NeMo: Network Module identification in Cytoscape. BMC Bioinformatics. 2010;11(Suppl. 1):S61. doi: 10.1186/1471-2105-11-S1-S61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sharan R, et al. Network-based prediction of protein function. Mol. Syst. Biol. 2007;3:88. doi: 10.1038/msb4100129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Simon I, et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell. 2001;106:697–708. doi: 10.1016/s0092-8674(01)00494-9. [DOI] [PubMed] [Google Scholar]
- Song L, et al. KELLER: estimating time-varying interactions between genes. Bioinformatics. 2009;25:i128–i136. doi: 10.1093/bioinformatics/btp192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Spellman PT, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell. 1998;9:3273–3297. doi: 10.1091/mbc.9.12.3273. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stark C, et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34:D535–D539. doi: 10.1093/nar/gkj109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Uetz P, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. doi: 10.1038/35001009. [DOI] [PubMed] [Google Scholar]
- von Mering C, et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417:399–403. doi: 10.1038/nature750. [DOI] [PubMed] [Google Scholar]
- You ZH, et al. Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data. Bioinformatics. 2010;26:2744–2751. doi: 10.1093/bioinformatics/btq510. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yu H, et al. High-quality binary protein interaction map of the yeast interactome network. Science. 2008;322:104–110. doi: 10.1126/science.1158684. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.