Abstract
Background
Linear regression models are important tools for learning regulatory networks from gene expression time series. A conventional assumption for non-homogeneous regulatory processes on a short time scale is that the network structure stays constant across time, while the network parameters are time-dependent. The objective is then to learn the network structure along with changepoints that divide the time series into time segments. An uncoupled model learns the parameters separately for each segment, while a coupled model enforces the parameters of any segment to stay similar to those of the previous segment. In this paper, we propose a new consensus model that infers for each individual time segment whether it is coupled to (or uncoupled from) the previous segment.
Results
The results show that the new consensus model is superior to the uncoupled and the coupled model, as well as superior to a recently proposed generalized coupled model.
Conclusions
The newly proposed model has the uncoupled and the coupled model as limiting cases, and it is able to infer the best trade-off between them from the data.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-021-03998-9.
Keywords: Bayesian piece-wise linear regression, Gene regulatory networks, Network reconstruction, Segment-wise parameter coupling
Background
Non-homogeneous dynamic Bayesian networks have become a popular tool for learning the structures of cellular regulatory networks from gene expression and protein concentration data. The traditional (homogeneous) dynamic Bayesian network models assume the network parameters to stay constant across time. This can lead to biased results and wrong conclusions, as cellular regulatory processes can change in time. It was therefore proposed to combine dynamic Bayesian network models with Bayesian changepoint processes, see, e.g., [1–3]. A multiple changepoint process is then used to divide the temporal data into disjoint segments, and the data within each segment are modelled by linear regression models. For most cellular processes on a short time scale it is not realistic to assume that the network structure changes over time. The network structure is therefore usually assumed to stay unchanged and only the network parameters are assumed to be time-varying. As a motivation for this assumption consider a gene regulatory network, in which an edge from one gene to another typically indicates that the first gene codes for a transcription factor that can bind to the promoter of the second gene, so that the transcription of the second gene is initiated. The ability to bind to the promoter (= the edge connection) is unlikely to change within a short time period, whereas the extent of binding (= the network interaction parameter) can undergo quick temporal changes. Regarding our two real-life applications to S. cerevisiae (yeast) and A. thaliana (plant) gene expression data, the assumption of a fixed network structure therefore seems more faithful.
The uncoupled model, akin to the models proposed by Lèbre et al. [1] and Dondelinger et al. [3], learns the segment-specific network parameters for each segment separately. To allow for information-sharing with respect to the network parameters, models with globally [4] and sequentially [5] coupled network parameters were proposed. As sequential information-sharing seems more suitable for temporal time segments, we focus here on the sequential coupling. The underlying idea is that the network parameters of each segment should be enforced to stay similar to those of the previous segment. Grzegorczyk and Husmeier [5] proposed a coupled model, in which the posterior expectations of the network parameters of segment h are used as prior expectations for the next segment . The strength of the coupling, i.e. the variance of the network parameter priors, is regulated by a coupling parameter. Although it was shown that this is very useful for applications where the network parameters stay similar over time, the fully coupled model has the drawback that it enforces coupling and does not feature any possibility for uncoupling. In this paper we therefore propose a partially segment-wise coupled model, which can be seen as a consensus model between the uncoupled and the fully coupled model. Discrete binary indicator variables indicate for each segment h whether it is coupled to the previous segment () or uncoupled from it (). Along with the network structure and the data segmentation the values of those indicator variables are inferred from the data. The new partially coupled model reaches the original models in the limit: If it couples all segments ( for all ), it becomes the fully coupled model. If it uncouples all segments ( for all h), it becomes the uncoupled model.
In our earlier work [6] we have proposed a new generalized fully coupled model. While the fully coupled model from [5] couples all neighbouring segments with the same coupling strength , the generalised (fully) coupled model from [6] uses for each pair of neighbouring segments a segment-specific coupling strength parameter . This leads to a higher model flexibility, but like the coupled model the generalized coupled model still does not allow for uncoupling. In our comparative evaluation study, we will compare the new partially coupled model with the three competing models: the uncoupled model, the (fully) coupled model, and the generalized (fully) coupled model.
In recent works alternative model refinements have been proposed [7, 8]. These models distinguish coupled from uncoupled network edges rather than distinguishing coupled from uncoupled time segments. The partially non-homogeneous model from Shafiee Kamalabad et al. [7] builds on the idea that only some network parameters (i.e. some edges) might be subject to changes, while other network parameters (i.e. edges) might stay constant. The model has been designed for analysing data that have been measured under different experimental conditions, so that it does not allow the segmentation of a time series to be inferred. The non-homogeneous model from Shafiee Kamalabad and Grzegorczyk [8] distinguishes between two groups of edges: (i) edges that are fully coupled among all segments and (ii) edges that are uncoupled among all segments. The new model that we propose here is conceptually related, but complementary in that it replaces the concept of partially coupled edges by the concept of partially coupled time segments.
We note that network reconstruction is a topical research field in the computational biology literature and that many different network reconstruction approaches have been proposed over the years. However, most of the proposed models do not focus on non-homogeneous regulatory processes but rely on a homogeneity of the regulatory processes. For some applications this assumption of homogeneity can be too restrictive; compare, e.g., our data applications. In response to one of the reviewers of our paper, we here briefly discuss a few recently proposed network reconstruction methods. Vignes et al. [9] investigated and compared a wide variety of methods, ranging from Bayesian networks to penalised linear regression based models, and proposed a meta-analysis based on Fisher’s Inverse Chi-Square meta-test for combining different approaches. Huang et al. [10] proposed to apply Bayesian model averaging for linear regression methods. The method uses a closed form solution to compute the edge posterior probabilities within a hybrid framework of Bayesian model averaging and linear regression. Xing et al. [11] proposed a Candidate Auto Selection algorithm based on the pairwise mutual information and breakpoint detection. A greedy search algorithm is then used to find the best network topology. Unlike the above-mentioned models, Fan et al. [12] propose to impose a prior on the topology information in their inference process. Incorporating this prior information can partially compensate for the lack of reliable data. They then developed a Bayesian group lasso with spike and slab prior approach based on non-parametric models. Xu et al. [13] propose to employ a series of linear regression problems to model the relationship between the network nodes. They use an efficient variational Bayes method for optimization and inference of the unknown network parameters.
Methods
Learning dynamic networks with time-varying parameters
Consider N random variables that are the nodes of a network. Let denote an N-by- data matrix, whose N rows correspond to the variables and whose columns correspond to time points . The element in the ith row and tth column, , is the value of at time point t. For temporal data it is typically assumed that the regulatory interactions are subject to a lag of one time point. For example, an edge indicates that ( at ) depends on ( at t). The variable is then called a parent (node) of .
Because of the lag, there is no need for any acyclicity constraint, and for each node () the parent nodes can be learned separately. This has computational advantages, since the ‘network learning task’ can be separated into N independent ‘parent learning tasks’. Hence, when a computer cluster is available, the N parent sets can be learned in parallel, so that the inference algorithms scale up well.
A popular method is to apply linear regression, where is the response and are potential covariates (with ). Because of the lag, time points yield T observations for the linear regression model. Each observation consists of a response value and the shifted covariate values: , where .
Having inferred a covariate set for each , a network is built by merging the covariate sets: . There is the edge in if and only if .
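As a minimal sketch of the lag-one regression setup described above, the following Python snippet builds the response and covariate values for one node and merges node-wise covariate sets into a list of directed edges. The function names, the 0-based node indices, the reading that T+1 time points yield T observations, and the exclusion of self-loops are assumptions made for illustration; they are not taken from the original implementation.

```python
def lagged_regression_data(data, target):
    """Build lag-one regression data for one node.

    data   : list of N rows (one per node), each holding T+1 time points
    target : 0-based index of the response node
    Returns (y, X): y[t] is the target at time t+1, and X[t] holds the
    values of all other nodes at time t (self-loops excluded by assumption).
    """
    n_nodes = len(data)
    n_obs = len(data[0]) - 1  # T+1 time points yield T observations
    y = [data[target][t + 1] for t in range(n_obs)]
    X = [[data[j][t] for j in range(n_nodes) if j != target]
         for t in range(n_obs)]
    return y, X


def merge_parent_sets(parent_sets):
    """Merge node-wise covariate sets into a sorted list of directed edges.

    parent_sets[i] is the set of parents inferred for node i; the edge
    (j, i) is in the network if and only if j is in parent_sets[i].
    """
    return sorted((j, i) for i, parents in enumerate(parent_sets)
                  for j in parents)
```

Because each call to `lagged_regression_data` only touches one response node, the N regression problems can be dispatched independently, which is what makes the parallel learning of the parent sets possible.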
As the same linear regression approaches are used for each , we describe the models using a general terminology: Let Y be the response and let be the covariates of the linear regression model.
To allow for time-dependent regression coefficients, a piece-wise linear regression model can be used. Changepoints with divide the observations into disjoint segments containing consecutive data points, so that: . Observation () belongs to segment h if , where and are two pseudo changepoints.
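The segmentation induced by a changepoint set can be illustrated with a short Python sketch. The exact values of the two pseudo changepoints are not recoverable here, so the convention used below (pseudo changepoints 0 and T, and segment h containing the observations that lie strictly after changepoint h-1 and up to and including changepoint h) is an assumption, as is the function name.

```python
from bisect import bisect_left


def segment_labels(n_obs, changepoints):
    """Assign each observation t = 1..n_obs to a segment h = 1..H.

    Assumed convention: observation t belongs to segment h if
    tau_{h-1} < t <= tau_h, with pseudo changepoints tau_0 = 0 and
    tau_H = n_obs. `changepoints` holds the H-1 inner changepoints.
    """
    taus = sorted(changepoints)
    if taus and not (0 < taus[0] and taus[-1] < n_obs):
        raise ValueError("changepoints must lie strictly between 0 and n_obs")
    # bisect_left counts the changepoints that are strictly smaller than t
    return [bisect_left(taus, t) + 1 for t in range(1, n_obs + 1)]
```

For example, under this convention seven observations with inner changepoints 3 and 5 are split into segments of lengths 3, 2 and 2.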
We assume all covariate sets with up to covariates to be equally likely a priori, , while parent sets with more than covariates get a zero prior probability (‘fan-in restriction’). Further we assume that the distance between changepoints is geometrically distributed with hyperparameter , so that
With being the set of segment-specific response vectors, implied by the changepoint set , the posterior distribution takes the form:
| 1 |
where denotes the set of all model parameters, including segment-specific parameters as well as parameters that are shared among segments.
In the following subsections we assume and the segmentation , induced by , to be fixed, and we do not make and explicit anymore. Without loss of generality, we assume that contains the first k covariates: . For fixed and , Eq. (1) reduces to:
A generic Bayesian piece-wise linear regression model
Consider a Bayesian linear regression model, where Y is the response and are the covariates. We assume that T observations have been made at equidistant time points and that the data can be subdivided into disjoint segments , where segment h contains data points and has a segment-specific regression coefficient vector . Let be the response vector and be the design matrix for segment h, where each includes a first column of 1’s for the intercept. For each segment we assume a Gaussian likelihood:
| 2 |
where is the identity matrix, and is a noise variance parameter that is shared among all segments. We impose an inverse Gamma prior on , , and we assume that the vectors have Gaussian priors:
| 3 |
where is a (k+1)-dimensional vector, and is a positive definite -by- matrix. Re-using the parameter in Eq. (3), yields a fully-conjugate prior in both and (see, e.g., Sections 3.3 and 3.4 in Gelman [14]). Figure 1 shows a graphical model representation of this generic model. For notational convenience we define:
Fig. 1.

Graphical representation of the generic model. Parameters that have to be inferred are represented by white circles. The data and the fixed hyperparameters are represented by grey circles. Circles within the plate are specific for segment h
The full conditional distribution of is (cp. Section 3.3 in [15]):
| 4 |
and the segment-specific marginal likelihoods with integrated out are:
| 5 |
where (cp. Section 3.3 in [15]). From Eq. (5) we get:
where and . The shape of implies:
| 6 |
For the marginal likelihood, with () and integrated out, we apply the rule from Section 2.3.7 of Bishop [15]:
| 7 |
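To make the closed-form computations of the generic model concrete, the following Python sketch evaluates the posterior of a segment-specific coefficient vector and the log marginal likelihood with all coefficient vectors and the noise variance integrated out. It assumes the standard conjugate setup described above: a Gaussian likelihood, Gaussian priors that re-use the noise variance, and a Gamma prior with shape `a` and rate `b` on the inverse noise variance. The function and variable names are hypothetical, and the prior means and covariance matrices are treated as fixed inputs.

```python
import math
import numpy as np


def posterior_beta(X, y, mu, Sigma):
    """Posterior mean and scaled covariance of one segment's coefficients.

    Prior: beta ~ N(mu, sigma2 * Sigma); likelihood: y ~ N(X beta, sigma2 * I).
    Returns (mean, S) such that beta | y, sigma2 ~ N(mean, sigma2 * S).
    """
    Sigma_inv = np.linalg.inv(Sigma)
    S = np.linalg.inv(Sigma_inv + X.T @ X)
    mean = S @ (Sigma_inv @ mu + X.T @ y)
    return mean, S


def log_marginal_likelihood(segments, a, b):
    """Log marginal likelihood with all beta_h and sigma2 integrated out.

    segments : list of (X_h, y_h, mu_h, Sigma_h), with the prior
               hyperparameters mu_h and Sigma_h treated as fixed
    a, b     : shape and rate of the Gamma prior on the precision 1/sigma2
    """
    T = sum(len(y) for _, y, _, _ in segments)
    delta2 = 0.0      # sum of squared Mahalanobis distances over segments
    logdet_sum = 0.0  # sum of log det(C_h)
    for X, y, mu, Sigma in segments:
        C = np.eye(len(y)) + X @ Sigma @ X.T  # y_h | sigma2 ~ N(X mu, sigma2 C)
        resid = y - X @ mu
        delta2 += resid @ np.linalg.solve(C, resid)
        logdet_sum += np.linalg.slogdet(C)[1]
    return (math.lgamma(a + T / 2) - math.lgamma(a)
            - 0.5 * T * math.log(2 * math.pi) - 0.5 * logdet_sum
            + a * math.log(b) - (a + T / 2) * math.log(b + 0.5 * delta2))
```

Working on the log scale avoids numerical underflow for longer time series, and the result is the log density of a multivariate Student-t distribution, as expected when a Gamma-distributed precision is integrated out of a Gaussian.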
When all parameters in are fixed, the marginal likelihood of the piece-wise linear regression model can be computed in closed form. In typical models the (hyper-)hyperparameters in depend on hyperparameters with their own hyperprior distributions. From now on we will only include the free hyperparameters in . In the following subsections we describe four possible model instantiations, namely: the uncoupled model (M1), the coupled model (M2), the newly proposed partially coupled model (M3), and the generalized coupled model (M4). In the forthcoming subsections we will introduce further mathematical symbols. For convenience, Table 1 lists the mathematical symbols that we will use in this paper.
Table 1.
List of mathematical symbols
| Symbol | Description | Prior distribution |
|---|---|---|
| N | Total number of nodes (genes) | – |
| n | Number of potential parent nodes, here | – |
| h | Data segment h | – |
| H | Total number of data segments | – |
| k | Number of covariates in covariate set | – |
| t | Data point t | – |
| | Noise variance parameter | |
| | Coupling strength parameter, | |
| | SNR parameter, | |
| | hth coupling strength parameter (M4 model) | |
| | hth coupling indicator variable (M3 model) | , |
| T | Total number of data points | – |
| | Number of data points in segment h | – |
| | ith data point | – |
| | ith network node | – |
| | Parent (covariate) set of ith node, | , |
| | Changepoint set | |
| | Changepoint h | – |
| | ith covariate | – |
| | Design matrix of segment h | – |
| | Response vector of segment h | |
| | Regression coefficient vector of segment h | |
| | Posterior expectation of | – |
Model M1: the uncoupled model
A standard approach, akin to the models of Lèbre et al. [1] and Dondelinger et al. [3], is to set and to assume that the matrices are diagonal matrices , where the parameter is shared among segments and assumed to be inverse Gamma distributed, . In the supplementary material we provide a graphical model representation of the uncoupled model (M1). Using the notation of the generic model, we have:
| 8 |
For the posterior distribution of the uncoupled model we have:
| 9 |
where . From Eq. (9) it follows for the full conditional distribution of :
and the shape of the latter density implies:
| 10 |
Since the full conditional distribution of depends on and , those parameters have to be sampled first. From Eq. (6) a value of can be sampled via a collapsed Gibbs-sampling step, with the ’s being integrated out. Subsequently, given , Eq. (4) can be used to sample the vectors ’s. Finally, for each sampled from Eq. (10) the marginal likelihood, , can be computed by plugging in the expressions from Eq. (8) into Eq. (7).
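The sampling order described above can be sketched as one Gibbs sweep in Python. The sketch assumes the uncoupled prior with zero prior mean and a diagonal covariance matrix scaled by a single shared parameter (called `lam_u` here), Gamma distributions parameterised by shape and rate, and full conditionals that follow from the standard conjugate derivation. All names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def gibbs_sweep_uncoupled(segments, lam_u, a_sigma, b_sigma, a_u, b_u, rng):
    """One Gibbs sweep for the uncoupled model (mu_h = 0, Sigma_h = lam_u * I).

    segments : list of (X_h, y_h) with X_h of shape (T_h, k+1)
    Gamma distributions are parameterised by shape and rate.
    Returns (sigma2, betas, new_lam_u).
    """
    T = sum(len(y) for _, y in segments)
    # 1) Collapsed step: sample sigma2 with the beta_h integrated out.
    delta2 = 0.0
    for X, y in segments:
        C = np.eye(len(y)) + lam_u * X @ X.T
        delta2 += y @ np.linalg.solve(C, y)
    precision = rng.gamma(a_sigma + T / 2, 1.0 / (b_sigma + 0.5 * delta2))
    sigma2 = 1.0 / precision
    # 2) Given sigma2, sample each beta_h from its Gaussian full conditional.
    betas = []
    for X, y in segments:
        k1 = X.shape[1]
        S = np.linalg.inv(np.eye(k1) / lam_u + X.T @ X)
        betas.append(rng.multivariate_normal(S @ X.T @ y, sigma2 * S))
    # 3) Given sigma2 and the beta_h, sample lam_u (inverse Gamma conditional).
    sq_norm = sum(beta @ beta for beta in betas)
    n_coef = sum(len(beta) for beta in betas)
    inv_lam = rng.gamma(a_u + n_coef / 2, 1.0 / (b_u + 0.5 * sq_norm / sigma2))
    return sigma2, betas, 1.0 / inv_lam
```

Note that NumPy's `Generator.gamma` takes a shape and a scale, so the rate parameters are inverted before sampling.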
Model M2: the (fully) coupled model
The (fully) coupled model, proposed by Grzegorczyk and Husmeier [5], uses the posterior expectation of as prior expectation for . Only the first segment has an uninformative prior:
| 11 |
where is the posterior expectation of (cp. Eq. (4)):
The parameter has been called the ’coupling parameter’ and it has been assumed that it has an inverse Gamma prior distribution, . Using the notation from the generic model (see Fig. 1), we note that Eq. (11) corresponds to:
with , and . As is treated like a fixed hyperparameter when used as input for segment h, we exclude the parameters from .
In the supplementary material we provide a graphical model representation of the coupled M2 model. For the posterior we have:
| 12 |
In analogy to the derivations in the previous subsection one can derive (cp. [5]):
| 13 |
| 14 |
where and .
For each the marginal likelihood, , can be computed by plugging the expressions and into Eq. (7).
Model M3: the new partially segment-wise coupled model
We propose a new ‘consensus’ model between the M1 and the M2 model. The new model (M3) allows each segment either to couple to or to uncouple from the preceding segment . We use an uninformative prior for the first segment , and for all segments we introduce a binary variable which indicates whether segment h is coupled to () or uncoupled from () the preceding segment :
| 15 |
where is the posterior expectation of . The new priors from Eq. (15) yield for the following posterior expectations (cp. Eq. (4)):
with , , we have in the generic model notation:
We assume the binary variables to have Bernoulli prior distributions, , with a joint hyperparameter having a Beta hyperprior distribution, . We note that
() gives model M1 with for all h
() gives model M2 with for .
The new partially segment-wise coupled model infers the variables () from the data. It searches for the best trade-off between the models M1 and M2.
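The way the indicator variables switch between the two priors can be sketched in Python as follows. The sketch assumes that an uncoupled segment (and the first segment) receives a zero-mean prior with covariance scaled by `lam_u`, and that a coupled segment receives the posterior expectation of the preceding segment as prior mean with covariance scaled by `lam_c`. The function name and argument names are hypothetical.

```python
import numpy as np


def partially_coupled_priors(segments, delta, lam_u, lam_c):
    """Build the prior mean and scaled covariance for every segment.

    segments : list of (X_h, y_h) in temporal order
    delta    : list of 0/1 indicators; delta[h] is ignored for the first
               segment, which always gets the uninformative prior
    Returns a list of (mu_h, Sigma_h) with beta_h ~ N(mu_h, sigma2 * Sigma_h).
    """
    k1 = segments[0][0].shape[1]
    priors = []
    post_mean = None  # posterior expectation of the previous segment
    for h, (X, y) in enumerate(segments):
        if h > 0 and delta[h] == 1:
            mu, Sigma = post_mean, lam_c * np.eye(k1)  # coupled
        else:
            mu, Sigma = np.zeros(k1), lam_u * np.eye(k1)  # uncoupled
        priors.append((mu, Sigma))
        # Posterior expectation of beta_h; it does not depend on sigma2.
        Sigma_inv = np.linalg.inv(Sigma)
        post_mean = np.linalg.solve(Sigma_inv + X.T @ X,
                                    Sigma_inv @ mu + X.T @ y)
    return priors
```

Setting all indicators to 0 reproduces the uncoupled priors for every segment, and setting all indicators to 1 reproduces the fully coupled chain of priors, which mirrors the two limiting cases stated above.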
A graphical model presentation of the partially coupled model is shown in Fig. 2. For with the joint marginal density of is:
| 16 |
For the posterior distribution of the partially segment-wise coupled model we get:
For the full conditional distributions of and we have:
where fixed. And it follows from the shapes of the densities:
where is the number of coupled segments, is the number of uncoupled segments, so that , and
For each parameter instantiation the marginal likelihood, , can be computed with Eq. (7), where was defined above, and
We have for each binary variable ():
so that its full conditional distribution is:
Each () can therefore be sampled with a collapsed Gibbs sampling step, where , and have been integrated out.
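The collapsed treatment of the indicator variables can be illustrated with a small Python sketch. It assumes Bernoulli indicators with a shared success probability that has a Beta(a, b) hyperprior, so that integrating out the probability yields a ratio of Beta functions. The function names are hypothetical, and the two log marginal likelihood values are assumed to be supplied by a separate routine.

```python
import math


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_indicator_marginal(delta, a, b):
    """Log joint marginal of the indicators with the Bernoulli parameter
    integrated out under a Beta(a, b) hyperprior."""
    n1 = sum(delta)
    n0 = len(delta) - n1
    return log_beta(a + n1, b + n0) - log_beta(a, b)


def prob_coupled(loglik_coupled, loglik_uncoupled, others, a, b):
    """Collapsed full conditional probability that one indicator equals 1.

    loglik_* : log marginal likelihoods of the data with the indicator set
               to 1 or 0 (all other parameters held fixed)
    others   : values of the remaining indicators
    """
    n1 = sum(others)
    n0 = len(others) - n1
    log_w1 = loglik_coupled + math.log(a + n1)
    log_w0 = loglik_uncoupled + math.log(b + n0)
    m = max(log_w1, log_w0)  # subtract the maximum for numerical stability
    w1, w0 = math.exp(log_w1 - m), math.exp(log_w0 - m)
    return w1 / (w1 + w0)
```

The prior factors `a + n1` and `b + n0` arise because the ratio of the two Beta-function terms for the indicator being 1 versus 0 simplifies to (a + n1) / (b + n0), where the counts are taken over the other indicators.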
Fig. 2.
Graphical representation of the new partially coupled model (M3). Parameters that have to be inferred are represented by white circles. The data and the fixed hyperparameters are represented by grey circles. The two rectangles indicate definitions, which depend on the parent nodes. Circles and definitions within the plate are segment-specific. For each segment the model infers if the prior for is coupled to () or uncoupled from () the preceding segment
Model M4: the generalised (fully) coupled model
In [6] we proposed to generalise the (fully) coupled model (i.e. the M2 model) by introducing a segment-specific coupling parameter for each segment . This yields:
| 17 |
where is the posterior expectation of . For the parameters we have assumed that they are inverse Gamma distributed, , with hyperparameters and . In the supplementary material we provide a graphical model representation of the M4 model. Recalling the generic notation and setting and , Eq. (17) gives:
For the posterior we have:
| 18 |
For it follows:
where and .
For each the marginal likelihood, , can be computed with Eq. (7); using the expressions and defined above.
Unlike the proposed partially coupled M3 model, the generalized coupled M4 model does not feature any mechanism to uncouple neighbouring segments. Like the fully coupled M2 model, the M4 model has been designed such that it has to couple all neighbouring segments. The only advantage over the M2 model is that the M4 model introduces segment-specific coupling parameters, so that the coupling strength(s) can vary over time.
Reversible jump Markov chain Monte Carlo inference
We use Reversible Jump Markov Chain Monte Carlo simulations to generate posterior samples . In each iteration we re-sample the parameters in from their full conditional distributions (Gibbs sampling), and we perform two Metropolis-Hastings moves; one on the covariate set and one on the changepoint set . For the four models (M1–M4) Eq. (1) takes the form:
All likelihood terms, , are marginalized over and and for the new M3 model also the Bernoulli parameter has been integrated out.
For the models M1–M2 the dimension of does not depend on , while for the models M3–M4 the dimension of does depend on . The M3 model has a discrete parameter and the M4 model has a continuous parameter for each .
The model-specific full conditional distributions for the Gibbs sampling steps have been provided above. For sampling we implement 3 moves: covariate ‘removal (R)’, ‘addition (A)’, and ‘exchange (E)’. Each move proposes to replace by a new covariate set having one covariate more (A) or less (R) or exchanged (E). When randomly selecting the move type and the involved covariate(s), we get for all models the acceptance probability:
For sampling we also implement 3 move types: changepoint ‘birth (B)’, ‘death (D)’, and ‘re-allocation (R)’ moves. Each move proposes to replace by a new changepoint set having one changepoint added (B) or deleted (D) or re-allocated (R). When randomly selecting the move type, the involved changepoint and the new changepoint location, we get for M1 and M2:
For the models M3 (proposed here) and the model M4 from [6] the changepoint moves also affect the numbers of parameters in and , respectively. For all segments that stay identical we keep the parameters unchanged. For all new segments we re-sample the corresponding parameters. For the new model M3 we flip coins to get candidates for the involved ’s. This yields:
where for birth, for death, and for re-allocation moves. For the model M4 we follow [6] and re-sample the involved ’s from their priors . We obtain:
Note that the additional factor of the Hastings ratio has been canceled with the prior ratio .
Edge scores and areas under precision recall curves (AUC)
For a network with N variables we infer N separate regression models. For each we get a sample from the ith posterior. From the covariate sets we form a sample of graphs . For each edge the edge posterior probability (edge score) is:
If the true network is known and has M edges, we can quantify the network reconstruction accuracy. For each threshold we extract the edges whose scores exceed , and we count the number of true positives among them. Plotting the precisions against the recalls , gives the precision-recall curve. We refer to the area under the curve as AUC value.
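A minimal Python sketch of the edge scores and the area under the precision-recall curve is given below. The interpolation used to compute the area is not specified above, so the trapezoidal rule, the use of the distinct score values as thresholds (selecting edges whose score is at least the threshold), and the starting point of the curve are assumptions; the function names are hypothetical.

```python
def edge_scores(graph_samples):
    """Edge posterior probabilities: fraction of sampled graphs with each edge.

    graph_samples : list of sets of directed edges (parent, child)
    """
    counts = {}
    for graph in graph_samples:
        for edge in graph:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / len(graph_samples) for edge, c in counts.items()}


def precision_recall_auc(scores, true_edges):
    """Area under the precision-recall curve via the trapezoidal rule.

    scores     : dict mapping every candidate edge to its score
    true_edges : set of edges of the true network (M = len(true_edges))
    """
    points = []
    for thr in sorted(set(scores.values()), reverse=True):
        selected = [e for e, s in scores.items() if s >= thr]
        tp = sum(1 for e in selected if e in true_edges)
        points.append((tp / len(true_edges), tp / len(selected)))
    # Start the curve at recall 0 with the precision of the first point.
    prev_recall, prev_precision = 0.0, points[0][1]
    auc = 0.0
    for recall, precision in points:
        auc += (recall - prev_recall) * 0.5 * (precision + prev_precision)
        prev_recall, prev_precision = recall, precision
    return auc
```

Edges that never appear in the sampled graphs are absent from the output of `edge_scores`; they should be added with score 0 before calling `precision_recall_auc` if the curve is meant to reach full recall.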
Hyperparameter settings and simulation details
The hyperparameters of the priors and hyperpriors of the four NH-DBN models (M1–M4) have to be specified in advance, and we note that the hyperparameter setting can have an effect on the resulting posterior distributions and so on the network reconstruction results. Selecting appropriate hyperparameters is therefore a crucial task. In the absence of genuine prior knowledge (e.g. from experts or from the literature), we re-use the rather uninformative (and thus generic) parameter settings from earlier publications. Re-using those hyperparameters also has the advantage that our empirical results can be compared with earlier reported results. More specifically, we proceed as follows:
For the models M1, M2 and M4 we re-use the hyperparameters from the earlier works by Lèbre et al. [1], Grzegorczyk and Husmeier [5], and Shafiee Kamalabad and Grzegorczyk [6]: with , , and . For the new partially coupled model M3 we use the same setting with the extension: with , which seems to be a very natural choice. For the M3 model we also tested several alternative hyperparameter settings, but we did not observe significantly deviating results, indicating that the M3 model is rather robust with respect to the hyperparameter settings. For more thorough studies on how the hyperparameter setting affects the network reconstruction results, we refer to the work by Grzegorczyk and Husmeier [5].
For all models M1–M4 we run each reversible jump Markov chain Monte Carlo simulation for iterations. Setting the burn-in phase to 0.5V (50%) and thinning out by the factor 10 during the sampling phase yields samples from each posterior. To check for convergence, we compared the samples of independent simulations, using standard trace plot diagnostics as well as scatter plots of the estimated edge scores. For most of the data sets analysed here, the diagnostics indicated almost perfect convergence already after iterations; see Fig. 7a for an example.
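The burn-in and thinning scheme can be written in a few lines of Python. With a 50% burn-in and a thinning factor of 10, a chain of V iterations leaves V/20 samples. The function name and its default arguments are illustrative.

```python
def thin_chain(chain, burn_in_fraction=0.5, thin=10):
    """Discard the burn-in phase and keep every `thin`-th remaining state."""
    start = int(len(chain) * burn_in_fraction)
    return chain[start::thin]
```

For instance, applying `thin_chain` to a chain of 1000 stored states keeps 50 of them, starting with the first state after the burn-in phase.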
Fig. 7.
Analysis of the real yeast data. (a) For each run length, we performed 15 RJMCMC simulations with the partially coupled model (M3). We used the hyperparameter for the changepoint prior. For each V there is a scatter plot where the simulation-specific edge scores (vertical axis) are plotted against the average scores for that V (horizontal axis). (b) We implemented the models M1–M4 with different hyperparameters p of the geometric distribution for the distance between changepoints. For each p the bars show the model-specific average AUC scores. The error bars indicate standard deviations
Data
Synthetic network data
For model comparisons we generated various synthetic network data sets. We report here on two studies with realistic network topologies, shown in Figs. 3 and 4. In both studies we assumed the data segmentation to be known. Hence, we kept the changepoints in fixed at their right locations and did not perform reversible jump Markov chain Monte Carlo moves on .
Fig. 3.

Yeast networks. Left: the true yeast network with nodes and edges. Right: yeast network prediction obtained with model M3. The grey (dotted) edges correspond to false positives (negatives)
Fig. 4.

RAF network. RAF pathway with nodes and edges
Study 1 For the RAF pathway with nodes and edges, shown in Fig. 4 and taken from Sachs et al. [16], we generated data with segments having data points each. For each node and its parent nodes in we sampled the regression coefficients for from standard Gaussian distributions and collected them in a vector which we normalised to Euclidean norm 1, . For the segments we use: (, coupled) or (, uncoupled). The design matrices contain a first column of 1’s for the intercept and the segment-specific values of the parent nodes, shifted by one time point. To the segment-specific values of : we element-wise added Gaussian noise with standard deviation . For all coupling scenarios , we generated 25 data sets having different regression coefficients.
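The generation of segment-specific regression coefficients under a given coupling scenario can be sketched in Python as follows. The sketch assumes that a coupled segment re-uses the coefficient vector of the preceding segment unchanged and that an uncoupled segment draws a fresh unit-norm vector. The function names are hypothetical, and the generation of the covariate values themselves is left out.

```python
import numpy as np


def sample_segment_coefficients(n_coef, scenario, rng):
    """Draw one unit-norm coefficient vector per segment.

    scenario : sequence of 0/1 flags, one per segment; the first entry is
               ignored, 1 re-uses the previous vector (coupled), and
               0 draws a fresh vector (uncoupled)
    """
    betas = []
    for h, coupled in enumerate(scenario):
        if h > 0 and coupled == 1:
            betas.append(betas[-1].copy())
        else:
            beta = rng.standard_normal(n_coef)
            betas.append(beta / np.linalg.norm(beta))
    return betas


def noisy_response(X, beta, noise_sd, rng):
    """Response values X @ beta with element-wise Gaussian noise added."""
    return X @ beta + rng.normal(0.0, noise_sd, size=X.shape[0])
```

A scenario such as `[0, 1, 0, 1]` then yields two pairs of identical coefficient vectors with a fresh draw between the second and the third segment.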
Study 2 This study is similar to the first one with three changes: (i) We used the yeast network with nodes and edges, shown in the left panel of Fig. 3 and taken from Cantone et al. [17]. (ii) Again we generated data with segments, but we varied the number of time points per segment . (iii) We focused on one scenario: For each node and its parent nodes in we generated two vectors and with standard Gaussian distributed entries. We re-normalised the first vector to Euclidean norm 1, , and the 2nd vector to norm 0.5, . We set so that the segments and are coupled, and , so that the segments and are coupled, while the coupling between and is ‘moderate’. For each m we generated 25 data matrices with different regression coefficients.
Yeast gene expression data
Cantone et al. [17] synthetically designed a network in S. cerevisiae (yeast) with genes, and measured gene expression data under galactose- and glucose-metabolism: 16 measurements were taken in galactose and 21 measurements were taken in glucose, with 20-minute intervals between measurements. Although the network is small, it is an ideal benchmark data set: The network structure is known, so that network reconstruction methods can be cross-compared on real wet-lab data. We follow Grzegorczyk and Husmeier and pre-process the data as described in [5]. The true network structure is shown in the left panel of Fig. 3. As an example, a network prediction obtained with the partially coupled model (M3) is shown in the right panel. For the prediction we extracted the 8 edges with the highest scores.
Arabidopsis gene expression data
The circadian clock in Arabidopsis thaliana optimizes the gene regulatory processes with respect to the daily dark:light cycles (photo periods). In four experiments Arabidopsis plants were entrained in different dark:light cycles, before gene expression data were measured under constant light condition over 24- and 48-h time intervals. We follow Grzegorczyk and Husmeier [5] and merge the four time series to one single data set with data points and focus our attention on the core genes: LHY, TOC1, CCA1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3.
Results
In this section we present the results of a comparative evaluation study, in which we compare the performance of the new partially coupled model (M3) with the competing models M1, M2 and M4. Throughout this section we use the new M3 model as reference model.
Results for synthetic network data
We start with the RAF-pathway for which we generated network data for 8 different coupling scenarios. Figure 5a compares the network reconstruction accuracies in terms of average AUC value differences. For 6 out of 8 scenarios the three AUC differences are clearly and significantly in favour of M3. Not surprisingly, for the two extreme scenarios, where all segments are either coupled (‘0111’) or uncoupled (‘0000’), M3 performs slightly worse than the fully coupled models (M2 and M4) or the uncoupled model (M1), respectively. But unlike the uncoupled model (M1) for coupled data (‘0111’), and unlike the coupled models (M2 and M4) for uncoupled data (‘0000’), the partially coupled model (M3) never performs significantly worse than the respective ‘gold-standard’ model. For the partially coupled model, Fig. 5b shows the posterior probabilities that the segments are coupled. The trends are in good agreement with the true coupling mechanism. Model M3 correctly infers whether the regression coefficients stay similar (identical) or change (substantially). The generalised coupled model (M4) can only adjust the segment-specific coupling strengths, but has no option to uncouple. Like the coupled model (M2), it fails when the parameters are subject to drastic changes. When comparing the coupled model (M2) with the generalised coupled model (M4), we see that M2 performs better when only one segment is coupled, while the new M4 model is superior to M2 if two segments are coupled, see the scenarios ‘0011’, ‘0110’, and ‘0101’.
Fig. 5.
Results for synthetic RAF pathway data. We distinguish 8 coupling scenarios . a Each histogram has three bars for the average AUC differences between the partially coupled model (M3) and the other models: ‘M3 versus M2 [= Coupled]’ (white), ‘M3 versus M4 [= Generalised]’ (black), and ‘M3 versus M1 [= Uncoupled]’ (grey). The error bars indicate t-test confidence intervals. b Diagnostic for the partially coupled model (M3): The bars give the posterior probabilities that segment h is coupled to ()
For the yeast network we generated data corresponding to a ‘0101’ coupling scheme and the change of the parameters (from the 2nd to the 3rd segment) is less drastic than for the RAF pathway data. Figure 6 shows how the AUC differences vary with the number of time points T, where and m is the number of data points per segment. For sufficiently many data points the effect of the prior diminishes and all models yield high AUC values (see bottom right panel). There are then no significant differences between the AUC values anymore. However, for the lower sample sizes again the new partially coupled model (M3) performs clearly best. For model M3 is significantly superior to all other models and for it still significantly outperforms the uncoupled (M1) and the coupled (M2) model. The performance of the generalised model (M4) is comparable to the performance of the uncoupled model. For moderate sample sizes () model M4 is significantly better than the fully coupled model (M2).
Fig. 6.
Results for synthetic yeast data, generated under coupling scenario (0, 1, 1, 1). Five panels show the average AUC differences plotted against the numbers of data points T. The error bars indicate t test confidence intervals. The bottom right panel shows the model-specific average AUC values
Results for yeast gene expression data
For the yeast gene expression data we assume the changepoint(s) to be unknown and we infer the segmentation from the data. Figure 7a shows convergence diagnostics for the partially coupled model (M3). It can be seen from the scatter plots that RJMCMC iterations yield already almost perfect convergence. The edge scores of 15 independent MCMC runs are almost identical to each other.
The average AUC scores of the models M1–M4 are shown in Fig. 7b. Since the number of inferred changepoints grows with the hyperparameter p of the geometric distribution on the distance between changepoints, we implemented the models with different p’s. The uncoupled model is superior to the coupled model for the lowest p () only, but becomes more and more inferior to the coupled model, as p increases. This result is consistent with the finding in Grzegorczyk and Husmeier [5] and can be explained as follows: As the hyperparameter of the changepoint prior increases, the number of inferred data segments H grows so that the individual data segments get shorter. The individual segments h then cover fewer data points and are thus less informative. The coupling scheme allows for information-sharing among segments. The information content of large segments is sufficient for inference, so that coupling does not provide any noteworthy advantage. But for short (uninformative) segments information coupling improves the inference certainty, as coupling allows for the incorporation of information from the preceding segment(s). Therefore the potential improvement that can be gained by coupling grows with the hyperparameter p.
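The link between p and the number of segments can be made concrete with a small simulation from the prior alone. The sketch assumes that the distances between successive changepoints are independent draws from a geometric distribution on {1, 2, ...} with success probability p, so that the expected distance is 1/p; the exact parameterisation used in the models is not restated in this section. The function name, the series length of 100, and the chosen p values are hypothetical.

```python
import math
import random


def sample_changepoints(n_points, p, rng):
    """Draw changepoint locations whose successive distances are Geometric(p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    changepoints = []
    position = 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Inverse-transform sample with P(distance > k) = (1 - p) ** k.
        distance = int(math.log(u) / math.log(1.0 - p)) + 1
        position += distance
        if position >= n_points:
            return changepoints
        changepoints.append(position)


rng = random.Random(0)
mean_counts = {}
for p in (0.05, 0.2):
    draws = [sample_changepoints(100, p, rng) for _ in range(200)]
    mean_counts[p] = sum(len(d) for d in draws) / len(draws)
    print(f"p = {p}: on average {mean_counts[p]:.1f} changepoints a priori")
```

Larger p values place more prior mass on short distances, which is why the segments become shorter and less informative as p grows.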
The new partially coupled model (M3) performs consistently better than the uncoupled and the coupled model (M1–M2). The only exception occurs for where the coupled model (M2) appears to perform slightly (but not significantly) better than M3. For p’s up to the fully coupled (M2) and the generalised fully coupled model (M4) perform approximately equally well. However, for the three highest p’s the M4 model performs better than the coupled model (M2) and even outperforms the new partially coupled model (M3). While the performances of the models M1–M3 decrease with the number of changepoints, the performance of the model M4 stays rather robust.
Subsequently, we re-analysed the yeast data with fixed changepoints. Figure 8a, b shows the average AUC scores and the AUC score differences in favour of the partially coupled model (M3). Panel (a) reveals that the new partially coupled model (M3) again reaches the highest network reconstruction accuracy. Panel (b) shows that the superiority of M3 is significant, with only one exception: For the uncoupled model M1 does not perform worse than the partially coupled model (M3).
Fig. 8.
Results for real yeast data with fixed changepoints. We imposed changepoints and kept them fixed. K changepoints yield segments. For each K we used the first changepoint to separate the two parts of the time series (galactose vs. glucose metabolism). Successively we located the next changepoint in the middle of the longest segment to divide it into 2 segments, until K changepoints were set. a shows the model-specific average total AUC scores with error bars indicating standard deviations. b shows the AUC score differences in favour of the partially coupled model (M3). Here the error bars indicate t-test confidence intervals
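The changepoint placement described in this caption can be sketched as follows. The sketch assumes that time points are indexed from 0, that a changepoint marks the first time point of a new segment, that the middle of a segment is taken by integer division, and that ties between equally long segments are resolved in favour of the earliest one; none of these conventions are stated in the text. The function name and the example values (20 time points, first changepoint at 8) are hypothetical.

```python
def place_changepoints(n_points, first_cp, k):
    """Place k fixed changepoints by repeatedly halving the longest segment."""
    if k < 1:
        return []
    if not 0 < first_cp < n_points:
        raise ValueError("first changepoint must lie inside the time series")
    changepoints = [first_cp]
    while len(changepoints) < k:
        bounds = [0] + sorted(changepoints) + [n_points]
        segments = zip(bounds[:-1], bounds[1:])
        # max() returns the first maximal element, i.e. the earliest longest segment.
        start, end = max(segments, key=lambda seg: seg[1] - seg[0])
        if end - start < 2:
            break  # no segment can be split any further
        changepoints.append(start + (end - start) // 2)
    return sorted(changepoints)


# Hypothetical series of 20 time points with the first changepoint at index 8.
cps_three = place_changepoints(20, 8, 3)
cps_four = place_changepoints(20, 8, 4)
print(cps_three, cps_four)
```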
Subsequently, we also investigated the segment-specific coupling posterior probabilities () for the new partially coupled model (M3) and the posterior distributions of the coupling parameters for the generalised model (M4), but we could not find clear trends for any gene. As an example, we provide the results for gene ASH1 in Fig. 9a, b. Panel (a) shows that the coupling posterior probabilities of model M3 do not have a clear pattern. However, it becomes obvious that the partially coupled model makes use of segment-wise switches between the uncoupled and the coupled approach. Panel (b) shows that the distributions of the segment-specific coupling parameters, , of model M4 stay rather similar among segments. This explains why the generalised coupled model (M4) is not superior to the fully coupled model (M2).
Fig. 9.
Results for real yeast data with fixed changepoints. We imposed changepoints and kept them fixed. K changepoints yield segments. For each K we used the first changepoint to separate the two parts of the time series (galactose vs. glucose metabolism). Successively we located the next changepoint in the middle of the longest segment to divide it into 2 segments, until K changepoints were set. a Diagnostic for the partially coupled model (M3): The bars give the posterior probabilities that segment h is coupled to () for target gene ASH1. b Diagnostic for the generalised coupled model (M4): In each panel there is a boxplot for each segment showing the distributions of the logarithmic coupling parameters for target gene ASH1
Application to Arabidopsis gene expression data
For the Arabidopsis gene expression data we cannot objectively compare the network reconstruction accuracies of the four models, since the true circadian clock network is not known. We therefore only applied the new partially coupled model (M3), which we had found to be the best model in our earlier studies. Figure 10 shows the Arabidopsis network, which was reconstructed using the hyperparameter for the geometric distribution on the distance between changepoints. To obtain a network prediction, we extracted the 20 edges with the highest edge scores. Although a proper evaluation of the network prediction is beyond the scope of this paper, we note that several features of the network are consistent with the plant biology literature. E.g. the feedback loop between LHY and TOC1 is the most important key feature of the circadian clock network (see, e.g., the work by Locke et al. [18]). Many of the other predicted edges have been reported in more recent works. E.g. the edges , , , and can all be found in the circadian clock network (hypothesis) of Herrero et al. [19].
Fig. 10.

Prediction of the circadian clock network in Arabidopsis thaliana. The prediction was obtained with the proposed partially coupled model (M3), using the hyperparameter for the geometric distribution on the distance between changepoints. The network shows the 20 edges with the highest edge scores. We have added the label ‘L’ to those edges that have already been reported in the biology literature. For more details see the main text
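Extracting a network prediction from edge scores amounts to ranking the candidate edges and keeping the top ones. The following minimal sketch assumes that the edge scores are stored in a dictionary keyed by (regulator, target) pairs; this data layout, the function name, the gene label `geneA`, and all score values are hypothetical and only serve as an illustration.

```python
def top_edges(edge_scores, k):
    """Return the k edges with the highest scores, best first."""
    ranked = sorted(edge_scores.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, _ in ranked[:k]]


# Hypothetical edge scores (not taken from the paper).
example_scores = {
    ("LHY", "TOC1"): 0.95,
    ("TOC1", "LHY"): 0.90,
    ("geneA", "LHY"): 0.40,
    ("TOC1", "geneA"): 0.15,
}
predicted = top_edges(example_scores, k=2)
print(predicted)
```

With k set to 20, the same call would yield a prediction of the size shown in Fig. 10.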
Discussion and conclusions
We have proposed a new Bayesian piece-wise linear regression model for reconstructing regulatory networks from gene expression time series. The new partially coupled model (M3), whose graphical model representation is given in Fig. 2, is a consensus model between the uncoupled model (M1) and the fully coupled model (M2). In the uncoupled model (M1) the segment-specific regression coefficients have to be learned for each segment separately. In the fully coupled model (M2) each segment is compelled to be coupled to the previous one. The new partially coupled model (M3) combines features of the uncoupled and the fully coupled model, and it can infer for each individual time segment whether it is coupled to (or uncoupled from) the preceding segment.
We have cross-compared the new model (M3) with the two established models (M1–M2) as well as with the generalised coupled model (M4) that makes use of segment-specific coupling parameters [6]. In our data applications, the new partially coupled model (M3) reached significantly better network reconstruction accuracies than its competitors (M1, M2, and M4).
In an earlier work [6], we found that the performances of the fully coupled model (M2) and of the generalised fully coupled model (M4) can be improved by imposing additional hyperpriors on the hyperparameters of the coupling strength parameter. In our future work we will therefore investigate whether either the use of hyperpriors or the use of segment-specific continuous (coupling/SNR) parameters along the lines of the M4 model can improve the new partially coupled model (M3). Moreover, in our future work we will also try to combine the concept of partially coupled time segments of the proposed model (M3) with the recently proposed concept of partially coupled edges [8]. The combination of both concepts will yield a highly flexible novel NH-DBN model, in which each individual network edge is partially segment-wise coupled. We will empirically test whether this new hybrid model leads to improved network reconstruction results or whether it suffers from model over-flexibility.
Supplementary information
Additional file 1. Graphical model representations of the three competing models are provided as additional files. Figure 11 shows a graphical model representation of the M1 model. Figure 12 shows a graphical model representation of the M2 model. Figure 13 shows a graphical model representation of the M4 model.
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 22, Supplement 2 2021: 15th and 16th International Conference on Computational Intelligence methods for Bioinformatics and Biostatistics (CIBB 2018-19). The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-22-supplement-2.
Abbreviations
- DBN
Dynamic Bayesian network
- NH-DBN
Non-homogeneous dynamic Bayesian network
- MCMC
Markov chain Monte Carlo
- RJMCMC
Reversible jump Markov chain Monte Carlo
- SNR
Signal-to-noise ratio
- AUC
Area under the precision-recall curve
Authors' contributions
Both authors contributed equally to the methodological work. MSK performed the computational work and drafted the manuscript. MG supervised the project and revised the draft version of the manuscript. All authors read and approved the final manuscript.
Funding
Not applicable.
Availability of data and materials
The datasets analysed during the current study are available in the figshare repository, https://figshare.com/s/96f578777aa6b43f3638
We note that the data stem from earlier publications [5, 17].
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Mahdi Shafiee Kamalabad, Email: m.shafiee@tilburguniversity.edu.
Marco Grzegorczyk, Email: m.a.gzegorczyk@rug.nl.
References
- 1. Lèbre S, Becq J, Devaux F, Lelandais G, Stumpf MPH. Statistical inference of the time-varying structure of gene-regulation networks. BMC Syst Biol. 2010;4:130. doi: 10.1186/1752-0509-4-130.
- 2. Grzegorczyk M, Husmeier D. Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes. Bioinformatics. 2011;27(5):693–699. doi: 10.1093/bioinformatics/btq711.
- 3. Dondelinger F, Lèbre S, Husmeier D. Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure. Mach Learn. 2012;90:191–230. doi: 10.1007/s10994-012-5311-x.
- 4. Grzegorczyk M, Husmeier D. Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Mach Learn. 2013;91:105–154. doi: 10.1007/s10994-012-5326-3.
- 5. Grzegorczyk M, Husmeier D. A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology. Stat Appl Genet Mol Biol SAGMB. 2012;11(4) (Article 7).
- 6. Shafiee Kamalabad M, Grzegorczyk M. Improving nonhomogeneous dynamic Bayesian networks with sequentially coupled parameters. Stat Neerl. 2018;72(3):281–305. doi: 10.1111/stan.12136.
- 7. Shafiee Kamalabad M, Heberle AM, Thedieck K, Grzegorczyk M. Partially non-homogeneous dynamic Bayesian networks based on Bayesian regression models with partitioned design matrices. Bioinformatics. 2019;35(12):2108–2117. doi: 10.1093/bioinformatics/bty917.
- 8. Shafiee Kamalabad M, Grzegorczyk M. Non-homogeneous dynamic Bayesian networks with edge-wise sequentially coupled parameters. Bioinformatics. 2020;36(4):1198–1207. doi: 10.1093/bioinformatics/btz690.
- 9. Vignes M, Vandel J, Allouche D, Ramadan-Alban N, Cierco-Ayrolles C, Schiex T, Mangin B, De Givry S. Gene regulatory network reconstruction using Bayesian networks, the Dantzig selector, the Lasso and their meta-analysis. PLoS ONE. 2011;6(12):29165. doi: 10.1371/journal.pone.0029165.
- 10. Huang X, Zi Z. Inferring cellular regulatory networks with Bayesian model averaging for linear regression (BMALR). Mol Biol Syst. 2014;10(8):2023–2030. doi: 10.1039/c4mb00053f.
- 11. Xing L, Guo M, Liu X, Wang C, Wang L, Zhang Y. An improved Bayesian network method for reconstructing gene regulatory network based on candidate auto selection. BMC Genom. 2017;18(9):17–30. doi: 10.1186/s12864-017-4228-y.
- 12. Fan Y, Wang X, Peng Q. Inference of gene regulatory networks using Bayesian nonparametric regression and topology information. Comput Math Methods Med. 2017;2017:8307530. doi: 10.1155/2017/8307530.
- 13. Xu S, Zhang C-X, Wang P, Zhang J. Variational Bayesian complex network reconstruction. CoRR 2018.
- 14. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. 2. London: Chapman and Hall/CRC; 2004.
- 15. Bishop CM. Pattern recognition and machine learning. Singapore: Springer; 2006.
- 16. Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Protein-signaling networks derived from multiparameter single-cell data. Science. 2005;308:523–529. doi: 10.1126/science.1105809.
- 17. Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma MP. A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell. 2009;137:172–181. doi: 10.1016/j.cell.2009.01.055.
- 18. Locke JCW, Kozma-Bognár L, Gould PD, Fehér B, Kevei E, Nagy F, Turner MS, Hall A, Millar AJ. Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana. Mol Syst Biol. 2006;2(1):59. doi: 10.1038/msb4100102.
- 19. Herrero E, Kolmos E, Bujdoso N, Yuan Y, Wang M, Berns MC, Uhlworm H, Coupland G, Saini R, Jaskolski M, Webb A, Concalves J, Davis SJ. EARLY FLOWERING4 recruitment of EARLY FLOWERING3 in the nucleus sustains the Arabidopsis circadian clock. Plant Cell. 2012;24(2):428–443. doi: 10.1105/tpc.111.093807.