Skip to main content
Bioinformatics logoLink to Bioinformatics
. 2018 Nov 5;35(12):2108–2117. doi: 10.1093/bioinformatics/bty917

Partially non-homogeneous dynamic Bayesian networks based on Bayesian regression models with partitioned design matrices

Mahdi Shafiee Kamalabad 1, Alexander Martin Heberle 2, Kathrin Thedieck 2,3, Marco Grzegorczyk 1,
Editor: Jonathan Wren
PMCID: PMC6581439  PMID: 30395165

Abstract

Motivation

Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and not rarely only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices. With regard to our main application to semi-quantitative (immunoblot) timecourse data from mammalian target of rapamycin complex 1 (mTORC1) signalling, we also propose a Gaussian process-based method to solve the problem of non-equidistant time series measurements.

Results

On synthetic network data and on yeast gene expression data the new model leads to improved network reconstruction accuracies. We then use the new model to reconstruct the topologies of the circadian clock network in Arabidopsis thaliana and the mTORC1 signalling pathway. The inferred network topologies show features that are consistent with the biological literature.

Availability and implementation

All datasets have been made available with earlier publications. Our Matlab code is available upon request.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Dynamic Bayesian networks (DBNs) have become a popular tool for learning the topologies of cellular regulatory networks from time series data. The classical (homogeneous) DBN models assume that the network parameters stay constant in time, so that the network structure is inferred along with one single set of network parameters (Friedman et al., 2000). Many regulatory processes are non-stationary so that this homogeneity assumption is too restrictive. To allow for time-dependent parameters, many authors have proposed to combine DBNs with multiple changepoint processes (Ahmed and Xing, 2009; Dondelinger et al., 2013; Grzegorczyk and Husmeier, 2013; Husmeier et al., 2010; Lèbre et al., 2010; Robinson and Hartemink, 2010) or with hidden Markov models (Grzegorczyk, 2016). In those models a multiple changepoint process (or a hidden Markov model) divides the temporal data into disjoint components with component-specific network parameters. The network structure, the data segmentation and the component-specific network parameters are inferred from the data. Those models are often referred to as non-homogeneous DBNs (NH-DBNs).

In many real-world applications, in particular in systems biology, data are often collected under different experimental conditions. That is, instead of one single (long) time series that has to be segmented, there are K (short) time series. The data are then intrinsically divided into K unordered components, and there is no need for inferring the segmentation. In this situation, it is normally not clear a priori whether the network parameters stay constant across components (conditions) or whether they vary from component to component (with the conditions). Three biological systems that we will consider in this article are: Section 4.4: The parameters of a metabolism-related regulatory network in Saccharomyces cerevisiae can depend on the medium, in which yeast is cultured (glucose versus galactose). Section 4.5: The parameters of the circadian clock network in Arabidopsis thaliana can depend on the dark: light cycles, to which the plants were earlier exposed. Section 4.6: The parameters of the mammalian target of rapamycin complex 1 (mTORC1) protein signalling network can change in the presence of insulin. For more examples and a thorough discussion on the integration of single-cell data from multiple experimental conditions we refer to Geissen et al. (2016).

If the parameters stay constant, all data can be merged and analysed with one single homogeneous DBN. If the parameters are component-specific, then the data should be analysed by a NH-DBN. The disadvantage of both approaches is that all parameters are assumed to be either constant (DBN) or component-specific (NH-DBN). In real-world applications there can be both types of parameters, so that both models are inappropriate. For example, if a variable Y is regulated by two other variables, X1YX2, then the interaction X1Y can stay constant, while X2Y might be component-specific, e.g. for K = 2 and in terms of a regression model:

E[Y|X1=x1,X2=x2]={αx1+βx2ifk=1αx1+γx2ifk=2 (1)

A DBN ignores that β and γ are different. A NH-DBN has to infer the same parameter α two times separately. Both model misspecifications can increase the inference uncertainty, and are thus critical for sparse data.

No tailor-made model for the situation in (1) has been proposed yet. To fill this gap, we propose a partially NH-DBN model, which infers the best trade-off between a DBN and a NH-DBN. Unlike all earlier proposed NH-DBNs, the new partially NH-DBN model operates on the individual interactions (network edges). For each interaction there is a parameter, and the model infers from the data whether the parameter is constant or component-specific. We implement the new model in a hierarchical Bayesian regression framework, since this model class reached the highest network reconstruction accuracy in the cross-method comparison by Aderhold et al. (2014). But we note that the underlying idea is generic and could also be implemented in other frameworks, e.g. via L1-regularized regression models (‘LASSO’).

In Section 2.5 we propose a Gaussian process (GP)-based method to deal with the problem of non-equidistant measurements. The standard assumption for all NH-DBNs is that data are measured at equidistant time points. For applications where this assumption is not fulfilled, we propose to use a GP to predict the values at equidistant data points and to replace the non-equidistant values by predicted equidistant values. We will make use of the GP method when analysing the mTORC1 data in Section 4.6.

2 Materials and methods

DBNs and NH-DBNs are used to infer networks showing the regulatory interactions among variables Z1,,ZN. The interactions are subject to a time lag, so that there is no need for an acyclic network structure. Hence, dynamic network inference can be thought of as inferring the covariate sets for N independent regression models. In the ith model, Zi is the response and the remaining N*:=N1 variables Z1,,Zi1,Zi+1,,ZN at time point t − 1 are used as potential covariates for Zi at time point t. The goal is to infer a covariate set for each Zi, and the system of covariate sets describes a network; see Section 2.6 for details. As the same regression model is applied to each Zi separately, we describe it using a general notation, where Y is the response and X1,,Xn are the covariates.

2.1 Bayesian regression with partitioned design matrix

We consider a regression model with response Y and covariates X1,,Xn. We assume that data were measured under K experimental conditions, which we refer to as K components. We further assume that the data for each component k{1,,K} were measured at equidistant time points t=1,,Tk. Let yk,t and xi,k,t denote the values of Y and Xi at the tth time point of component k. In dynamic networks, the interactions are subject to a time lag O, which is usually set to one time point. That is, the values x1,k,t,,xn,k,t correspond to the response value yk,t+1. For each component k we build a component-specific response vector yk and the corresponding design matrix Xk, where Xk includes a first column of 1’s for the intercept:

yk=(yk,2,,yk,Tk)T,Xk=(1x1,kxn,k)

where xi,k=(xi,k,1,,xi,k,Tk1)T

For each k we could assume a separate Gaussian likelihood:

ykNTk1(Xkβk,σk2I)(k=1,,K) (2)

where I is the identity matrix, βk=(βk,0,βk,1,,βk,n)T is the component-specific vector of regression coefficients, and σk2 is the component-specific noise variance. Imposing independent priors on each pair {βk,σk2}, leads to K independent models. Alternatively, we could merge the data y:=(y1T,yKT)T and X:=(X1T,,XKT)T and employ one model for the merged data:

yNT(Xβ,σ2I)whereT:=k=1K(Tk1) (3)

so that β=(β0,β1,,βn)T would apply to all components.

When some covariates have a component-specific and other covariates have a constant regression coefficient, both likelihoods (2) and (3) are suboptimal. For this situation, we propose a new partially non-homogeneous regression model that infers the best trade-off from the data. The key idea is to use a likelihood with a partitioned design matrix.

For now, we assume that we know for each coefficient whether it is component-specific or constant. Let the intercept and the first n1<n coefficients stay constant while the remaining n2=nn1 coefficients are component-specific. We then have the regression equation:

yk,t+1=β0+i=1n1βi·xi,k,t+i=n1+1nβk,i·xi,k,t+ϵk,t+1

where ϵk,t+1 N(0,σ2), and the likelihood takes the form:

yNT(XBβB,σ2I) (4)

where βB is a vector of (1+n1+K·n2) regression coefficients, and XB is a partitioned matrix with T=(Tk1) rows and (1+n1)+(K·n2) columns. For example, for K = 2 the matrix XB has the structure:

(1x1,1xn1,1xn1+1,1xn,1001x1,2xn1,200xn1+1,2xn,2),

where xi,k=(xi,k,2,,xi,k,Tk1)T, and βB is of the form:

((β0,β1,,βn1),(βn1+1,1,,βn,1),(βn1+1,2,,βn,2))T

The first subvector of βB is the vector β*:=(β0,β1,,βn1)T of the regression coefficients that stay constant, and then there is a subvector βk:=(βn1+1,k,,βn,k)T for each component k with the component-specific regression coefficients. For the noise variance parameter σ2 we use an inverse Gamma prior, σ2Gam(a,b), and on β* we impose a Gaussian prior with zero mean vector:

β*Nn1+1(0,σ2λ*2I) (5)

For the component-specific vectors β1,,βK we adapt the idea from Grzegorczyk and Husmeier (2013), and impose a hyperprior:

βkNn2(μ,σ2λ2I)(k=1,,K)andμNn2(μ0,Σ0) (6)

The hyperprior couples the vectors β1,,βK hierarchically and encourages them to stay similar across components. Re-using the variance parameter σ2 in (5 and 6) allows the regression coefficient vectors and the noise variance to be integrated out in the likelihood, i.e. the marginal likelihood p(y|λ*2,λ2,μ) to be computed analytically (see below). For λ*2 and λ2 we also use inverse Gamma priors:

λ*2Gam(α*,β*)and λ2Gam(α,β)

The prior of βB=(β*T,β1T,,βKT)T is a product of Gaussians:

p(βB|σ2,λ2,λ*2,μ)=p(β*|σ2,λ*2)·k=1Kp(βk|σ2,λ2,μ)

Given σ2,λ2,λ*2, and μ, the Gaussians are independent, so that:

βB|(σ2,λ2,λ*2,μ)N1+n1+K·n2(μ˜,σ2Σ˜)
with:μ˜=(0T,μT,,μT)TandΣ˜=(λ*2I*00λ2I)

where I* is the (n1+1)-dimensional and I the (K·n2)-dimensional identity matrix. We have for the posterior distribution:

p(βB,σ2,λ*2,λ2,μ|y)p(y|σ2,βB)·p(βB|σ2,λ2,λ*2,μ)·p(μ)·p(σ2)·p(λ*2)·p(λ2) (7)

A graphical model representation of the new regression model is provided in Figure 1. The full conditional distributions of βB,σ2,λ*2,λ2 and μ can be computed analytically, so that Gibbs-sampling can be applied to generate a posterior sample. As the derivations are mathematically involved, we relegate them to Part A of the Supplementary Material.

Fig. 1.

Fig. 1.

Graphical model representation of the regression model with partitioned design matrix. Variables that have to be inferred are in white circles. The data and the fixed hyperparameters are in grey circles. The vector βB deterministically depends on β* and β1,,βK. The vector βk in the plate is condition-specific

The marginalization rule from Section 2.3.7 of Bishop (2006) yields:

p(y|λ2,λ*2,μ)=Γ(T2+a)Γ(a)·πT2(2b)adet(C)1/2··(2b+(yXBμ˜)TC1(yXBμ˜))(T2+a)where T:=k=1K(Tk1),andC:=I+XBΣ˜XBT. (8)

2.2 Inferring the relevant covariates and their types

In typical applications, there is a set of N* variables, and the subset of the relevant covariates has to be inferred from the data. Each covariate can be either constant (δ = 1) or component-specific (δ = 0). Let Π={X1,,Xn} be a subset of the N* variables, and let δ=(δ0,δ1,,δn)T be a vector of binary variables, where δi indicates whether Xi has a constant (δi=1) or component-specific (δi=0) regression coefficient. The first element, δ0, refers to the intercept.

The goal is then to infer the covariate set Π and the corresponding indicator vector δ from the data. For any combination of Π and δ, the partitioned design matrix XB=XB,Π,δ can be built, and the marginal likelihood p(y|λ2,λ*2,μ,Π,δ) can be computed with (8). We get for the posterior:

p(Π,δ,λ*2,λ2,μ|y)p(y|λ2,λ*2,μ,Π,δ)·p(Π)·p(δ|Π)··p(μ|Π,δ)·p(λ*2)·p(λ2)

where p(μ|Π,δ) is a Gaussian, whose dimension is the number of component-specific coefficients. For the covariate sets, Π, we follow Grzegorczyk and Husmeier (2013) and assume a uniform distribution, truncated to |Π|3. The prior p(δ|Π) will be specified in Section 2.5. To generate samples from the posterior, we use a Markov Chain Monte Carlo (MCMC) algorithm, which combines the Gibbs-sampling steps for βB,σ2,λ*2,λ2 and μ with two blocked Metropolis Hastings (MHs) moves. In the first MH move the vector δ is sampled jointly with μ, and in the second MH move Π is sampled jointly with δ and μ. As the implementation of the MCMC algorithm is involved, we relegate the mathematical details to Parts B and C of the Supplementary Material.

2.3 Competing models

A homogeneous model merges all data, while a non-homogeneous model assumes each component k to have specific parameters; see (2). The new partially non-homogeneous model infers the best trade-off: Each regression coefficient can be either constant or component-specific.

For a fair comparison, we also allow the non-homogeneous model to switch between a homogeneous and a non-homogeneous state. However, like all models that have been proposed so far, it operates on the covariate sets. All covariates have either component-specific (S = 0) or constant (S = 1) regression coefficients. In our method comparison, we include:

  • DBN: A homogeneous model that merges all data, see (3).

  • NH-DBN: The NH-DBN model switches between two states. We have a DBN for S = 1, and the likelihood takes the form of (2) for S = 0.

  • coupled NH-DBN: This model from Grzegorczyk and Husmeier (2013) is an NH-DBN that globally couples the regression coefficients.

2.4 Specifying the covariate type prior

The NH-DBNs can switch between: ‘all covariates are constant’ (S = 1) versus ‘all covariates are component-specific’ (S = 0). Those states refer to δ=1 and δ=0 of the partially NH-DBN. To match the priors, we set:

p(S=1)p(S=0)=p(δ=1|Π)p(δ=0|Π) (9)

For Π={X1,,Xn},δ contains n + 1 binary elements, which we assume to be independently Bernoulli distributed. To fulfil (9) the Bernoulli parameter must depend on n=|Π|. We get: p(δ=1|Π)=θnn+1 and p(δ=0|Π)=(1θn)n+1. From (9) we obtain:

r:=p(S=1)p(S=0)=θnn+1(1θn)n+1θn=(r1+r)1/(n+1)andp(δ|Π)=θni=0nδi·(1θn)1=0n(1δi)

For mixture models it is often assumed that the number of components K˜ has a Poisson distribution (Green, 1995). We truncate it to K˜{1,K}:

p(S=0)=q(K)q(1)+q(K) and p(S=1)=q(1)q(1)+q(K)

where q(.) is the density of the Poisson distribution with parameter θ = 1.

2.5 GP smoothing for non-equidistant data

The regression models assume that the time lag O between the response value yk,t+1 and the covariate values x1,k,t,,xn,k,t is the same for all t. If the data within a component k were measured at time points t1,,tTk, with varying distances Oi:=titi1, the models lead to biased results. For this scenario, we propose to replace the observed non-equidistant response values by predicted equidistant response values. We propose the following GP-based method:

  • Determine the lowest time lag O*=min{O2,,OTk}, where Oi:=titi1.

  • Given the observed data points {(t,yk,t):t=t1,,tTk}, use a GP to predict the whole curve {(t,yk,t)}t0.

  • Extract the response values at the time points: t1+O*,,tTk+O*.

  • Build the response vector and design matrix such that the values x1,k,ti,,xn,k,ti are used to explain the predicted response value yk,ti+O* (i=1,,Tk). The new lag is then constant; Ot=O*.

A GP is a stochastic process {Yk,t}t0, here indexed by time, such that every finite subset of the random variables has a Gaussian distribution. A GP can be used to estimate a non-linear curve (t,yk,t)t0 from noisy observations. We here assume the relationship: yk,t=f(t)+ϵt where ϵtN(0,σ2) is observational noise, and the non-linear function f(.) is unknown. We estimate f(.) by fitting a GP to the observed data. The GP defines a distribution over the functions f(.), which transforms the input (t1,,tTk) into output (yk,t1,,yk,tTK), such that

(Yk,t1,,Yk,tTK)TNTk(0,K+σ2I) (10)

where I is the identity matrix, and the elements of the Tk-by-Tk covariance matrix, K, are defined through a kernel function: Ki,j=ξ2·k(ti,tj) with signal variance parameter ξ2. The kernel function k(.,.) is typically chosen such that similar inputs ti and tj yield correlated variables Yti and Ytj. A popular and widely used kernel is the squared exponential kernel with: k(ti,tj)=exp(12·(titj)2l2) where l is the length scale. For the unobserved vector yk,*:=(yk,t1+O*,,yk,tTk+O*)T we then have the predictive distribution:

yk,*N(y^k,*,Σ^k,*) (11)

with

y^k,*:=(K*+σ2I)·(K+σ2I)1·yΣ^k,*:=(K**+σ2I)(K*+σ2I)(K+σ2I)1(K*+σ2I)T (12)

where y:=(yk,t1,,yk,tTK)T is the observed response vector, and K* and K** are Tk-by-Tk matrices, whose elements are given by: (K*)i,j:=ξ2·k(ti+O*,tj) and (K**)i,j:=ξ2·k(ti+O*,tj+O*). Before inferring the GP, we standardize y to mean 0, and we impose log-uniform priors on the GP parameters (σ2,ξ2 and l). For predicting the unobserved response vector, we have to make two decisions:

  1. The GP parameters can either be sampled via MCMC simulations or their maximum a posteriori (MAP) estimates can be computed.

  2. Given GP parameters, the vector yk,* can be sampled from (11) or it can be set equal to its predictive expectation, y^k,*, defined in (12).

We have implemented and cross-compared all four combinations. For lack of space, we here report the results obtained for predictive expectations based on MAP estimates. A comparison of the four approaches can be found in Part D of the Supplementary Material.

2.6 Learning topologies of regulatory networks

Assume that the variables Z1,,ZN interact with each other in form of a network and that data were collected under K conditions and that the conditions influence some of the interactions. Let Dk denote the N-by-Tk data matrix which was measured under condition k. The rows of Dk correspond to the variables and the columns of Dk correspond to Tk time points. Dk,i,t denotes the value of Zi at time point t under condition k.

The goal is to infer the network structure. Interactions for temporal data are usually modelled with a time lag, e.g. of order O=1. An edge, ZjZi, indicates that Zj has an effect on Zi in the following sense: For all k the value Di,k,t+1 (Zi at t + 1) depends on Dj,k,t (Zj at t).

There is no acyclicity constraint, and DBN inference can be thought of as inferring N separate regression models and combining the results. In the ith model Y:=Zi is the response. The remaining N*:=N1 variables Z1,,Zi1,Zi+1,,ZN are the potential covariates. For each Y:=Zi we infer a covariate set Πi, and the covariate sets Π1,,ΠN describe a network N. There is the edge ZjZi in the network N if and only if ZjΠi.

We can thus apply the partially non-homogeneous model to each Y = Zi separately, to generate posterior samples. We extract the covariate sets, Πi(1),,Πi(R) (i=1,,N), and we merge them to a network sample N(1),,N(R). The rth network N(r) possesses the edge ZjZi if and only if ZjΠi(r). For each edge ZjZi we can then estimate its marginal posterior probability (‘score’):

s^j,i=1Rr=1RIji(N(r))whereIji(N(r))={1ifZjΠi(r)0ifZjΠi(r)

When the true network is known, we can evaluate the network reconstruction accuracy with precision-recall curves. For each ψ[0,1] we extract the n(ψ) edges whose scores s^j,i exceed ψ, and we count the number of true positives T(ψ) among them. Plotting the precisions P(ψ):=T(ψ)/n(ψ) against the recalls R(ψ):=T(ψ)/M, where M is the number of edges in the true network, gives the precision–recall curve (Davis and Goadrich, 2006). We refer to the area under the curve as AUC value. The higher the AUC, the higher the reconstruction accuracy.

3 Implementation

For the inverse Gamma distributed parameters (σ2,λ*2,λ2) we use shape and rate parameters from earlier works, e.g. in Grzegorczyk and Husmeier (2013) and Lèbre et al. (2010): σ2Gam(0.005,0.005) and λ*2,λ2Gam(2,0.2) and for the hyperprior on μ we use μ0=0 and Σ0=I. Other settings led to comparable results what indicates robustness w.r.t. those hyperparameters. To ensure a fair comparison we use the same hyperparameters for the competing models; cf. Section 2.3.

For generating posterior samples, we run the MCMC algorithm from Section 2.2 for 100 000 (100k) iterations. We set the burn-in phase to 50k and we sample every 100th graph during the sampling phase. This yields R = 500 posterior samples for each response Y = Zi. We merge the individual covariate sets Πi(r) (i=1,,N; r=1,,R) to a network sample N(1),,N(R), as explained in Section 2.6. For each edge ZjZi we then compute its edge score s^j,i.

We used scatter plots of edge scores from independent simulations to monitor convergence. In Section 4.3 we study convergence, scalability and the computational costs for model inference.

We implement the GP method with the squared exponential kernel and used the Matlab package ‘GPstuff’ (Vanhatalo et al., 2013) to numerically determine the MAP estimates of the parameters via scaled conjugate gradient optimization. We also tested other kernels, such as the Matern 3/2 and 5/2 kernel, and for them we obtained very similar results.

4 Empirical results

4.1 Pre-study 1: GP smoothing

Our first objective is to provide empirical evidence that the proposed GP method from Section 2.5 can yield substantial improvements. To this end, we generate values for 10 autoregressive (AR) variables:

Xi,t=1η·Xi,t1+η·ϵi,t(t=0,,120;i=1,,10) (13)

where all ϵi,t’s are independently N(0, 1) distributed, Xi,1N(0,1) for all i, and η(0,1). This yields: Xi,tN(0,1) for all t and all i. We further assume that X1 and X2 are covariates for:

Yt+1=β0+β1X1,t+β2X2,t+ϵy,t+1 (14)

where ϵy,t+1N(0,0.012).

In a second scenario we replace (13) by moving averages (MA):

Xi,t=j=tqtϵi,j(t=0,1,,120;i=1,,10) (15)

where all ϵi,t’s are independently N(0,(q+1)1) distributed, so that again Xi,tN(0,1) for all i and t.

We generate data for both scenarios (AR and MA) with different parameter settings (β0,β1,β2) in (14) and η in (13), respective q in (15). We thin the data out and keep only the observations at the time points t{0,1,3,5,10,15,30,45,60,120}, as the same time points were measured for the mTORC1 data; see Section 4.6. The standard regression approach uses the covariate values at ti for explaining Y at ti+1, although the time lag steadily increases. The GP method from Section 2.5 predicts the response values at ti+O*, and replaces yti+1 (observed Y at ti+1) by y^ti+O* (predicted Y at ti+O*), where O*=1.With both approaches we run MCMC simulations on each dataset, and from the MCMC samples we compute for each covariate Xi the score that Xi is a covariate for Y. Our results show that the proposed GP method finds the true covariates X1 and X2, while the standard approach cannot clearly distinguish them from the irrelevant variables X3,,X8. Figure 2 shows histograms of the average covariate scores for AR data with βi=1 and η=0.2, and for MA data with βi=1 and q = 10.

Fig. 2.

Fig. 2.

Average scores (posterior probabilities). In each histogram, the dark grey bars refer to the scores of the true covariates, and the light grey bars refer to the irrelevant variables. Covariate values were generated via AR (top) and MA processes (bottom). The left histograms show the scores of a standard regression (without GP processing). The right histograms show the scores when the proposed GP method is used. Error bars indicate SDs

4.2 Pre-study 2: Synthetic RAF-pathway data

The RAF pathway, as reported in Sachs et al. (2005), consists of N = 11 nodes and 20 directed edges. The topology of the RAF pathway is shown in Part E of the Supplementary Material. We generate data with K = 2 components and Tk = 10 data points each. The parent nodes of each node Zi build its covariate set Πi. We assume a linear model with component-specific regression coefficients:

zi,k,t+1=βk,0i+j:ZjΠiβk,ji·zj,k,t+ek,ti(k=1,2)

where zi,k,t denotes the value of node Zi at time point t in component k, and βk,ji is the regression coefficient for ZjZi in component k. The noise values ek,ti and the initial values zi,k,1 are sampled from independent N(0,0.052) distributions. For Zi there are 2(|Πi|+1) component-specific regression coefficients. For each Zi we collect them in two vectors βki (k = 1, 2), and we sample the elements of βki from N(0, 1) Gaussian distributions. We then re-normalize the vectors to Euclidean norm one: βkiβki/|βki| (k = 1, 2). We distinguish six scenarios:

  • (S1) Identical: We withdraw β2i and assume that the same regression coefficients apply to both components. We set: β2i=β1i for all i.

  • (S2) Identical signs (correlated): We enforce the coefficients to have the same signs, i.e. we replace β2,ji by: β2,ji:=sign(β1,ji)·|β2,ji| for all i and j.

  • (S3) Uncorrelated: We use the vectors βki for component k (k = 1, 2). The component-specific coefficients β1,ji and β2,ji are then uncorrelated for all i and all j.

  • (S4) Opposite signs (negatively correlated): We withdraw the vector β2i and we set: β2,ji=(1)·β1,ji. The coefficients β1,ji and β2,ji are then negatively correlated.

  • Mixture of (S1) and (S3): We assume that 50% of the coefficients are identical for both k, while the other 50% are uncorrelated. We randomly select 50% of the coefficients and set: β2,ji=β1,ji. The other 50% of the coefficients stay unchanged (uncorrelated).

  • Mixture of (S1) and (S4): We withdraw β2i and we assume that 50% of the coefficients are identical for both k, while the other 50% have an opposite sign. We randomly select 50% of the coefficients and set: β2,ji=β1,ji. For the other coefficients we set β2,ji=(1)·β1,ji.

For each scenario we generate 25 datasets. We then analyse every dataset with each model. Figure 3 shows the average AUC values for reconstructing the RAF pathway. Only for scenario (S1), where all coefficients are constant, the models perform equally well. For (S2–S6) the homogeneous DBN is substantially worse than the NH-DBNs. The coupled NH-DBN is slightly superior to the (non-coupled) NH-DBN. The proposed partially NH-DBN yields the highest average AUC scores.

Fig. 3.

Fig. 3.

Network reconstruction accuracy for RAF pathway. The histograms show the average precision-recall AUC values. Each AUC is averaged across 25 datasets and the error bars indicate SDs. The bars refer to: the homogeneous DBN (white), the NH-DBN model (light-grey), the coupled NH-DBN (dark-grey) and the partially NH-DBN (black). For (S2–5) the AUC differences are significantly in favour of the new partially NH-DBN (two-sided paired t-test P-values < 0.05)

4.3 Pre-study 3: Scalability and computational costs

To study the scalability of the new network reconstruction method, we generate data for random network structures with N{10,25,50,100} nodes. For each node Zi we first sample the number of parents xi from a Poisson distribution with parameter λ = 1 (‘Poisson in-degree distribution’), before we randomly draw a parent set Πi from a uniform distribution over the system of all parent sets with cardinality xi, {Πi:|Πi|=xi}. Given the network structure, we generate data as described in Section 4.2, i.e. via regression relationships using K = 2 and Tk = 10. Here we present and discuss the results for the scenario: ‘Mixture of (S1) and (S3)’. For each N we generate 10 independent datasets, i.e. 40 in total. Next, we measure how many MCMC iterations W we can perform in 1 h. With our Matlab implementation on a desktop computer with Intel Xeon 2.5 GHz processor and 8GB of RAM, the average numbers of iterations per hour are: W = 208 637 (N = 10), W = 83 615 (N = 25), W = 41 666 (N = 50) and W = 19 855 (N = 100).

To monitor convergence and network reconstruction accuracy in real-time, we perform long MCMC simulations. For each N we select the numbers of iterations such that the simulation takes 16 h. During the simulations we sample 200 equidistant networks per hour (i.e. 3200 networks in total). When withdrawing the first 50% of the networks (‘burn-in’), we have Rh=100h networks after h hours. We use those Rh networks to assess the performance after h hours of computational time. For running two independent 16 h long MCMC simulations on each of the 40 datasets (computational time: 1280 h), we use a computer cluster.

To assess convergence, we consider scatter plots of the edge scores of two independent MCMC simulations on the same dataset. For the largest networks with N = 100, Figure 4a shows superimposed scatter plots for different computational times. It can be seen that after 2–4 h only few edges points deviate from the diagonal, i.e. only few edge scores differ between independent simulations. This is a good indication of convergence. The corresponding scatter plots for the smaller networks with N{10,25,50} can be found in Part F of the Supplementary Material. As expected, the Supplementary Figures show that the rate of convergence decreases with the size of the network N.

Fig. 4.

Fig. 4.

Convergence analysis. (a) For each of 10 datasets two independent MCMC simulations have been performed. The edge scores of the two simulations can be plotted against each other. Each panels refer to a computational time and show 10 superimposed scatter plots. (b) Both panels show curves of the average AUC value against the computational time. In the upper panel the network size N varies, while the no. of data points is kept fixed. In the lower panel for N = 100 the no. of data points is varied

The upper panel of Figure 4b monitors the average precision-recall AUC scores along the computational time. The four AUC curves run into plateaus. Already after 1–2 h the AUCs (i.e. the average network reconstruction accuracy) does not improve anymore. The two curves for N = 10 and = 25 converge to AUC = 0.72, while the AUC limits are lower for N = 50 (AUC = 0.58) and N = 100 (AUC = 0.49). The individual AUC curves are shown in Part F of the Supplementary Material. Taking into account that a network among N = 100 nodes had to be inferred from 20 data points (K = 2 conditions with Tk = 10 data points each), the rather low network reconstruction accuracy (AUC = 0.49) is not surprising. To show that higher AUCs can be reached for networks with N = 100 nodes, we repeat the study with Tk = 20 and = 40. The bottom panel of Figure 4b monitors the average AUC scores for N = 100 and Tk{10,20,40}. Here, we had to adapt the numbers of MCMC iterations per hour to W = 12 707 (Tk = 20), and W = 9343 (Tk=40). The new curves also run into plateaus and reach higher limits: AUC = 0.70 and = 0.83. The results of this section are compactly summarized in Table 1.

Table 1.

Summary of scalability results

Nodes N 10 25 50 100 100 100
Data points 2·Tk 20 20 20 20 40 80
Iterations per hour 208 637 83 615 41 666 19 855 12 707 9 343
AUC limit 0.72 0.72 0.58 0.49 0.70 0.83

Note: See Section 4.3 for further details.

4.4 Reconstructing the yeast gene network topology

By means of synthetic biology, Cantone et al. (2009) designed a network with N = 5 genes in S.cerevisiae (yeast); Figure 5 shows the true network. With quantitative Real-Time Polymerase Chain Reaction, Cantone et al. (2009) then measured in vivo gene expression data: under galactose- (k = 1) and glucose-metabolism (k = 2). T1=16 measurements were taken in galactose and T2=21 in glucose. The data have become a benchmark application, as the network reconstruction accuracies can be cross-compared on real in vivo gene expression data. Figure 5 shows the results, and again a clear trend can be seen: The homogeneous DBN yields the lowest AUC value. The NH-DBN model yields higher AUCs and can be further improved by coupling the regression coefficients (coupled NH-DBN). The proposed partially NH-DBN reaches the highest network reconstruction accuracy. The results are thus consistent with the results for the RAF-pathway data in Section 4.2.

Fig. 5.

Fig. 5.

Network reconstruction accuracy for yeast gene expression data. The histogram shows the average precision-recall AUC values, averaged across 25 MCMC simulations, with error bars indicating SDs. The AUCs are: 0.61 (DBN), 0.69 (NH-DBN), 0.81 (coupled NH-DBN) and 0.87 (new NH-DBN). All three AUC differences are significant in terms of two-sided t-tests (P < 10–3)

4.5 The circadian clock network in A.thaliana

The circadian clock network in A.thaliana orchestrates the gene regulatory processes, related to the plant metabolism, with respect to the daily changing dark: light cycles of the solar day. The mechanism of internal time-keeping allows the plant to anticipate each new day, at the molecular level, and to optimize its growth. In K = 4 experiments Arabidopsis plants were entrained in different dark: light cycles, before the gene expressions of N = 9 circadian clock genes were measured under experimentally controlled constant light condition. The numbers of observed time points are T1=12 and Tk = 13 for k = 2, 3, 4. For further details on the experimental protocols we refer to Grzegorczyk (2016). Figure 6a shows a network that was inferred with the new partially coupled model. For the prediction, we extracted the 21 edges with edge scores higher than the threshold ψ=0.5. A proper biological evaluation of the network is hindered and beyond the scope of this article, as the true circadian clock network has not been fully discovered yet. However, our predicted network in Figure 6a contains many edges that are consistent with hypotheses from the plant biology literature. In particular, the high-scoring feedback loop LHY ↔TOC1 seems to be the most important key feature of the circadian clock network (see, e.g. Locke et al., 2006). Moreover, it has been reported that LHY is a regulator of the genes ELF3 and ELF4 (Alabadi et al., 2001; Kikis et al., 2005). Also the edge LHY →CCA1 is not unexpected, as LHY and CCA1 are known to be biological homologues (Miwa et al., 2007). Four more edges, for which we could find biological literature references, are: GI →TOC1 (Locke et al., 2005), ELF3 →LHY (Kikis et al., 2005), ELF3 →TOC1 (Dixon et al., 2011) and ELF3 →PRR9 (Chow et al., 2012).

Fig. 6.

Fig. 6.

Shown are the edges whose scores exceeded the threshold ψ=0.5; edges are labelled with their scores. Edges with scores higher than ψ=0.8 are in bold. (a) Inferred circadian clock network in Arabidopsis thaliana. (b) Inferred topology of the mTORC1 signalling pathway

4.6 The mTORC1 network

The mammalian target of rapamycin complex 1 (mTORC1) is a serine/threonine kinase which is evolutionary conserved and essential in all eukaryotes (Saxton and Sabatini, 2017). mTORC1 is at the centre of a multiply wired, complex signalling network, whose topology is well studied and contains several well-characterized feedback loops (Saxton and Sabatini, 2017). Hence, we used the mTORC1 network as a surrogate based on which we can objectively evaluate the predictive power of our partially NH-DBN model for learning network structures. The signalling network converging on mTORC1 is built by kinases, which inactivate or activate each other by phosphorylation. Thus, a protein can be phosphorylated at one or several sites, and the phosphorylations at these positions determine its activity. Signalling through the mTORC1 network is elicited by external signals like insulin or amino acids. Dalle Pezze et al. (2016) relatively quantified 11 phosphorylation states of 8 key proteins across the mTORC1 signalling network by immunoblotting; for an overview see Table 2. Dynamic time course data were obtained under two experimental conditions, namely upon stimulation with amino acids only (k = 1), and with amino acids plus insulin (k = 2). The phosphorylation states were measured at Tk = 10 time points: t=0,1,3,5,10,15,30,45,60,120 minutes, so that the time lag increases from 1 to 60. We therefore apply the GP method from Section 2.5 to predict equidistant response values, before analysing the data with the proposed partially NH-DBN. The 12 edges with scores higher than ψ=0.5 yield the network topology shown in Figure 6b. A literature review shows that 11 out of the 12 edges have been reported earlier.

Table 2.

mTORC1 timecourse data

Protein Full name Sites
mTOR mammalian target of rapamycin pS2481, pS2448
PRAS40 proline-rich AKT/PKB substrate 40 kDa pT246, pS183
AKT Protein kinase B pT308, pS473
IRS1 insulin receptor substrate 1 pS636
IR-beta insulin receptor beta pY1146
AMPK AMP-dependent protein kinase pT172
TSC2 tuberous sclerosis 2 protein pS1387
p70-S6K Ribosomal protein S6 kinase beta-1 pT389

Note: Overview to the 8 proteins and the 11 measured phosphorylation sites.

We focus first on the five interactions with the highest scores ψ>0.8. Two out of these five interactions are enzyme-substrate relationships: p70-S6K is a kinase which is directly activated by mTORC1 through phosphorylation at threonine 389 [p70-S6K-pT389] (Saxton and Sabatini, 2017). Thus, p70-S6K-pT389 represents a direct readout of mTORC1 activity. p70-S6K phosphorylates IRS1 at serine 636, [IRS1-pS636] (Tzatsos and Kandor, 2006) and mTOR at serine 2448 [mTOR-pS2448] (Dibble and Cantley, 2015), and both edges are correctly identified by our model [p70-S6K-pT389→IRS1-pS636, p70-S6K-pT389→mTOR-pS2448]. Two other interactions with a high score are between AKT-pT308 ↔AKT-pS473. The two phosphorylations are predicted by our model to influence each other, and a positive feedback between phosphorylation events on S473 and T308 of AKT has indeed been demonstrated biochemically (Manning and Toker, 2017). Another high score prediction is between IRS1-pS636 and p70-S6K-pT389 [IRS1-pS636 →p70-S6K-pT389]. Phosphorylation at S636 inhibits IRS1, thereby leading to inhibition of mTORC1 and its substrate p70-S6K-T389 (Tzatsos and Kandor, 2006). Thus, the negative feedback between IRS1-pS639 and p70-S6K-pT389 explains the learned edge between them [IRS1-pS636→p70-S6K-pT389]. In addition, IRS1 inhibition by phosphorylation at S636 results in reduced phosphorylation of AKT at threonine 308, which is in agreement with the learned edge between IRS1-pS636 and AKT-pT308 [IRS1-pS636→AKT-pT308].

We could also find evidence for 6 of the remaining 7 edges with scores in between 0.5 and 0.8. PRAS40 is an endogenous mTORC1 inhibitor (Saxton and Sabatini, 2017). The edge from PRAS40-pT246 to PRAS40-pS183 corresponds to a well-described mechanism of PRAS40 regulation: AKT phosphorylates PRAS40 at T246 [PRAS40-pT246], which allows subsequent phosphorylation of PRAS40-S183 by mTORC1 (Nascimento et al., 2010). This interaction is accurately resembled by our model [PRAS40-pT246→ PRAS40-pS183]. PRAS40’s double phosphorylation dissociates PRAS40 from mTORC1, leading to its derepression (Nascimento et al., 2010). This mechanism is resembled by the edge between PRAS40-S183 and mTOR-S2481 [PRAS40-pS183→mTOR-pS2481], the latter being an autophosphorylation site which directly monitors mTOR activity (Soliman et al., 2010). Furthermore, the model suggests an edge between p70-S6K-pT389 and PRAS40-pS183 [p70-S6K-pT389 →PRAS40-pS183]. Both are mTORC1 substrate sites (Nascimento et al., 2010; Saxton and Sabatini, 2017) and are therefore often targeted in parallel. The only predicted edge for which there is to the best of our knowledge no literature evidence is between mTOR-pS2448 and TSC2-pS1387 [mTOR-pS2448 →TSC2-pS1387]. TSC2 is activated by phosphorylation at S1387 and inhibits mTORC1 (Hindupur et al., 2015). Our model prediction that mTORC1—when phosphorylated at S2448 by p70-S6K—regulates TSC2 remains to be experimentally tested.

After having identified 11 of 12 edges as true positives, we performed a literature review to find false negative edges, i.e. edges that our model did not extract, although it has been reported that they exist. This way, we could identify two false negative edges, namely: IR-beta-pY1146→AKT-pT308 and AMPK-pT172→TSC2-pS1387, which were reported in Vigneri et al. (2016) and Mihaylova and Shaw (2012), respectively. Finally, we note that this does not imply that the remaining edges that our model did not extract can be assumed to be true negatives. Nowadays incomplete knowledge about the topology of the mTORC1 pathway renders the safe identification of true negative edges impossible. The absence of literature reports on an edge does not necessarily imply that it does not exist.

5 Conclusion and discussion

We propose a new partially NH-DBN model for learning network structures. When data are measured under different experimental conditions, it is rarely clear whether the data can be merged and analysed within one single model, or whether there is need for a NH-DBN model that allows the network parameters to depend on the condition. The new partially NH-DBN has been designed such that it can infer the best trade-off from the data. It infers for each individual edge whether the corresponding interaction parameter is constant or condition-specific. Our applications to synthetic RAF pathway data as well as to yeast gene-expression data have shown that the partially NH-DBN model improves the network reconstruction accuracy. We have used the partially NH-DBN model to predict the structure of the mTORC1 signalling network. As the measured mTORC1 data are non-equidistant, we have applied a GP-based method to predict the missing equidistant values. Results on synthetic data (see Section 4.1) show that the proposed GP-method (see Section 2.5) can lead to substantially improved results.

All but one of the predicted interactions across the mTORC1 network are reflected in experiments reported in the biological literature. Dalle Pezze et al. (2016) built an ODE-based dynamic model which allows to predict signalling responses to perturbations. Like for many ODE-based models, the topology of this model was defined by the authors, based on literature-knowledge. The ODE model simulations could reproduce the measured mTORC1 timecourse data. Interestingly, all the connections predicted by our new partially NH-DBN model form part of the core model by Dalle Pezze et al. (2016). Hence, we present an alternative unsupervised learning approach, in which the topology of signalling networks is inferred directly from the data. The new model is thus a complementary tool that enhances dynamic model building by predicting the network’s topology in a purely data-driven manner.

Although it worked well for the mTORC1 data, we note that the GP method from Section 2.5 requires the time series to be sufficiently smooth. For non-smooth time series the method might not be able to properly predict the values at unobserved time points, leading to biased response values. Then, network reconstruction methods, such as the new partially NH-DBN, will inevitably infer distorted network topologies and wrong conclusions might be drawn. The assumption of smoothness is therefore crucial for the complete analysis pipeline to work. Our results in Section 4.3 show that the new partially NH-DBN can also be used to infer larger networks. However, our results suggest that there is then need for more data points (or higher signal-to-noise ratios, respectively) to reach accurate network predictions. A conceptual advantage of our partially NH-DBN is that it has two established models, namely the homogeneous DBN (δ=1) and the globally coupled NH-DBN (δ=0) as limiting cases. The new model operates between them, and as we follow a model averaging approach, it is less susceptible to over-fitting. The edge scores of the partially NH-DBN take the two established models as well as all ‘in-between’ models into account. For sparse data, we would thus expect low edge scores, indicating that we might average across too many models.

Funding

M.S.K. and M.G. are supported by the European Cooperation in Science and Technology (COST) [COST Action CA15109 European Cooperation for Statistics of Network Data Science (COSTNET)]. K.T. acknowledges support from the BMBF e: Med initiatives GlioPATH [01ZX1402B] and MAPTor-NET (031A426B), the German Research Foundation [TH 1358/3-1], the Stichting TSC Fonds (calls 2015 and 2017) and the MESI-STRAT project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under [grant agreement No 754688]. K.T. is recipient of a Rosalind-Franklin-Fellowship of the University of Groningen and of the Research Award 2017 of the German Tuberous Sclerosis Foundation.

Conflict of Interest: none declared.

Supplementary Material

bty917_Supplementary_Data

References

  1. Aderhold A. et al. (2014) Statistical inference of regulatory networks for circadian regulation. Stat. Appl. Genet. Mol. Biol., 13, 227–273. [DOI] [PubMed] [Google Scholar]
  2. Ahmed A., Xing E. (2009) Recovering time-varying networks of dependencies in social and biological studies. Proc. Natl. Acad. Sci. USA, 106, 11878–11883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Alabadi D. et al. (2001) Reciprocal regulation between TOC1 and LHY/CCA1 within the Arabidopsis circadian clock. Science, 293, 880–883. [DOI] [PubMed] [Google Scholar]
  4. Bishop C.M. (2006). Pattern Recognition and Machine Learning. Springer, Singapore. [Google Scholar]
  5. Cantone I. et al. (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell, 137, 172–181. [DOI] [PubMed] [Google Scholar]
  6. Chow B. et al. (2012) ELF3 recruitment to the PRR9 promoter requires other evening complex members in the Arabidopsis circadian clock. Plant Signal Behav., 7, 170–173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Dalle Pezze P. et al. (2016) A systems study reveals concurrent activation of AMPK and mTOR by amino acids. Nat. Commun., 7, 1–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Davis J., Goadrich M. (2006). The relationship between precision-recall and ROC curves In: ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, pages 233–240. ACM, New York, NY, USA. [Google Scholar]
  9. Dibble C., Cantley L. (2015) Regulation of mTORC1 by PIP3K signaling. Trends Cell Biol., 25, 545–555. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Dixon L. et al. (2011) Temporal repression of core circadian genes is mediated through EARLY FLOWERING 3 in Arabidopsis. Curr. Biol., 21, 120–125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Dondelinger F. et al. (2013) Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure. Mach. Learn., 90, 191–230. [Google Scholar]
  12. Friedman N. et al. (2000) Using Bayesian networks to analyze expression data. J. Comput. Biol., 7, 601–620. [DOI] [PubMed] [Google Scholar]
  13. Geissen E. et al. (2016) MEMO: multi-experiment mixture model analysis of censored data. Bioinformatics, 32, 2464–2472. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Green P. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732. [Google Scholar]
  15. Grzegorczyk M. (2016) A non-homogeneous dynamic Bayesian network with a hidden Markov model dependency structure among the temporal data points. Mach. Learn., 102, 155–207. [Google Scholar]
  16. Grzegorczyk M., Husmeier D. (2013) Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Mach. Learn., 91, 105–154. [Google Scholar]
  17. Hindupur S. et al. (2015) The opposing actions of target of rapamycin and AMP-activated protein kinase in cell growth control. Cold Spring Harb. Perspect. Biol., 7, a019141.. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Husmeier D. et al. (2010). Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks In: Lafferty J. E. A. (ed.) Proceedings of the Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS), Vol. 23, pp. 901–909. Curran Associates. [Google Scholar]
  19. Kikis E. et al. (2005) ELF4 is a phytochrome-regulated component of a negative-feedback loop involving the central oscillator components CCA1 and LHY. Plant J., 44, 300–313. [DOI] [PubMed] [Google Scholar]
  20. Lèbre S. et al. (2010) Statistical inference of the time-varying structure of gene-regulation networks. BMC Syst. Biol., 4, [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Locke J. et al. (2005) Extension of a genetic network model by iterative experimentation and mathematical analysis. Mol. Syst. Biol., 1, 13. Doi: 10.1038/msb4100018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Locke J.C.W. et al. (2006) Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana. Mol. Syst. Biol., 2, 59. Doi: 10.1038/msb4100102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Manning B., Toker A. (2017) AKT/PKB Signaling: navigating the Network. Cell, 169, 381–405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Mihaylova M.M., Shaw R. (2012) The AMP-activated protein kinase (AMPK) signaling pathway coordinates cell growth, autophagy, & metabolism. Nat. Cell Biol., 13, 1016–1023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Miwa K. et al. (2007) Genetic linkages of the circadian clock-associated genes, TOC1, CCA1 and LHY, in the photoperiodic control of flowering time in Arabidopsis thaliana. Plant Cell Physiol., 48, 925–937. [DOI] [PubMed] [Google Scholar]
  26. Nascimento E. et al. (2010) Phosphorylation of PRAS40 on Thr246 by PBK/AKT facilitates efficient phosphorylation of Ser183 by mTORC1. Cell. Signal., 22, 961–967. [DOI] [PubMed] [Google Scholar]
  27. Robinson J., Hartemink A. (2010) Learning non-stationary dynamic Bayesian networks. J. Mach. Learn. Res., 11, 3647–3680. [Google Scholar]
  28. Sachs K. et al. (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308, 523–529. [DOI] [PubMed] [Google Scholar]
  29. Saxton R., Sabatini D. (2017) mTOR signaling in growth, metabolism, and disease. Cell, 168, 960–976. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Soliman G. et al. (2010) mTOR Ser-2481 autophosphorylatyion monitors mTORC-specific catalytic activity and clarifies rapamycin mechanism of action. J. Biol. Chem., 285, 7866–7879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Tzatsos A., Kandror K.V. (2006) Nutrients suppress phosphatidylinositol 3-kinase/AKT signaling via raptor-dependent mTOR-mediated insulin receptor substrate 1 phosphorylation. Mol. Cell Biol., 26, 63–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Vanhatalo J. et al. (2013) GPstuff: Bayesian modeling with Gaussian processes. J. Mach. Learn. Res., 14, 1175–1179. [Google Scholar]
  33. Vigneri R. et al. (2016) Insulin, insulin receptors, and cancer. J. Endocrinol. Investig., 39, 1365–1376. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

bty917_Supplementary_Data

Articles from Bioinformatics are provided here courtesy of Oxford University Press

RESOURCES