Skip to main content
Bioinformatics Advances logoLink to Bioinformatics Advances
. 2024 Feb 29;4(1):vbae034. doi: 10.1093/bioadv/vbae034

Inference of differential gene regulatory networks using boosted differential trees

Gihanna Galindez 1,2, Markus List 3, Jan Baumbach 4,5, Uwe Völker 6, Ulrike Mäder 7, David B Blumenthal 8,2, Tim Kacprowski 9,10,2,
Editor: Yoshihiro Yamanishi
PMCID: PMC10948285  PMID: 38505804

Abstract

Summary

Diseases can be caused by molecular perturbations that induce specific changes in regulatory interactions and their coordinated expression, also referred to as network rewiring. However, the detection of complex changes in regulatory connections remains a challenging task and would benefit from the development of novel nonparametric approaches. We develop a new ensemble method called BoostDiff (boosted differential regression trees) to infer a differential network discriminating between two conditions. BoostDiff builds an adaptively boosted (AdaBoost) ensemble of differential trees with respect to a target condition. To build the differential trees, we propose differential variance improvement as a novel splitting criterion. Variable importance measures derived from the resulting models are used to reflect changes in gene expression predictability and to build the output differential networks. BoostDiff outperforms existing differential network methods on simulated data evaluated in four different complexity settings. We then demonstrate the power of our approach when applied to real transcriptomics data in COVID-19, Crohn’s disease, breast cancer, prostate adenocarcinoma, and stress response in Bacillus subtilis. BoostDiff identifies context-specific networks that are enriched with genes of known disease-relevant pathways and complements standard differential expression analyses.

Availability and implementation

BoostDiff is available at https://github.com/scibiome/boostdiff_inference.

1 Introduction

Gene regulation is a fundamental biological process that underlies various cellular functions, including developmental, environmental, and disease contexts. The regulatory relationships in a biological sample can be represented by gene regulatory networks (GRNs), where two gene nodes with a regulatory relationship are connected by an edge (Karlebach and Shamir 2008). GRN inference remains a challenging task because of the inherent complexity of transcriptional regulation, as well as the high dimensionality and noise in biological datasets. Furthermore, GRNs are dynamic and context-specific (Ideker and Krogan 2012, Nagy-Staron et al. 2021), i.e. some regulatory processes are active only in certain cell types, tissues, conditions, or in response to specific stimuli. Changes in these pairwise dependencies have been associated with the development of complex diseases (de la Fuente 2010). Differential network analysis, which aims to detect altered connectivity between different conditions or disease states, has recently emerged as a powerful complement to standard differential expression (DE) analysis and is more suitable for detecting context-specific GRNs (de la Fuente 2010, Marbach et al. 2012). Exploring how GRN structures are rewired between two different states can reveal molecular mechanisms that drive disease development and progression and identify more relevant therapeutic targets.

Various approaches for deriving differential networks have been the focus of recent studies (Bhuva et al. 2019, Basha et al. 2020, Baur et al. 2020). Representative methods are shown in Table 1. The z-score method performs Fisher transformation of Pearson’s correlation coefficients between two conditions. The resulting z-scores are modeled as a normal distribution, followed by a z-test to detect significant pairwise edges (Zhang et al. 2007). Diffcoex, a framework based on weighted gene co-expression analysis (WGGNA), first builds a correlation matrix for each condition being compared and then calculates their difference. From the difference matrix, a dissimilarity matrix is calculated based on a topological overlap measure and hierarchical clustering is applied to find the differentially co-expressed gene clusters (Tesson et al. 2010). Another approach, a Gaussian graphical model (GGM)-based method, learns the differential network from conditional associations (Chu et al. 2011). EBcoexpress relies on empirical Bayes’ estimation to calculate the posterior probability that an edge is differentially co-expressed (Dawson and Kendziorski 2012, Dawson et al. 2012).

Table 1.

Overview of methods used to derive differential networks and patient-specific networks.a

Method Algorithmic approach Test Directionality No. of conditions Network type Reference
BoostDiff Tree-based Yes Two Differential network This paper
z-score Correlation-based z-test No Two Differential network Zhang et al. (2007)
EBcoexpress Empirical Bayes; correlation No Multiple Differential network Dawson and Kendziorski, (2012)
Diffcoex Correlation-based Permutation test No Multiple Differential network Tesson et al. (2010)
GGM-based GGM; posterior odds Permutation test No Two Differential network Chu et al. (2011)
chNet GGM; differential expression t-test No Two Differential network Tu et al. (2021)
LIONESS Linear interpolation No Patient-specific Kuijjer et al. (2019)
Tanaka et al. (2020) Nonparametric Bayesian network; Δ Edge contribution value t-test Yes Patient-specific Tanaka et al. (2020)
DEVC-net Differential expression variance covariance (DEVC) t-test; Wilcoxon rank sum test No Two Differential network; patient-specific Yu et al. (2015)
a

Adapted from Bhuva et al. (2019).

The differential network methods described above measure linear relationships or rely on joint normality assumptions, which may not hold in practice (Shojaie 2021). Nonlinear regulatory dependencies are common mechanisms that occur in biology (Brunel et al. 2010, Costello et al. 2014) and these complex relationships from real datasets may be difficult to detect using correlation- or GGM-based methods. As discussed in a recent review, new methods for differential network analysis for non-Gaussian data are needed (Shojaie 2021). Some methods for inferring patient-specific networks have also been developed. To measure the sample-specific edge contributions, DEVC-net (Yu et al. 2015) uses a method that calculates an integrated score called DEVC that considers the differential expression of a gene, differential expression variance of a gene, and covariance of gene pairs. This method uses a template network to identify differential network features from gene expression data and can be used to obtain sample-specific networks. Linear Interpolation to Obtain Network Estimates for Single Samples (LIONESS) reverse engineers sample-specific networks from an aggregate network by applying linear interpolation on the edge weights from the aggregate networks (Kuijjer et al. 2019). This approach can be used in combination with existing network inference methods to also capture nonlinear dependencies. Tanaka et al. (2020) infer nonparametric Bayesian networks using B-spline regression. From the model parameters, the method derives a value representing the contribution of each edge to the expression in a particular sample, called the edge contribution value (ECv). The absolute differences in ECvs (ΔECv) between certain samples are then used to build sample-specific networks. An alternative to the abovementioned approaches are tree-based strategies, which offer the advantage of more relaxed model assumptions. While examples such as GENIE3 and derived tools continue to be successfully applied in various biological settings (Huynh-Thu et al. 2010, Moerman et al. 2019, Van de Sande et al. 2020), they cannot be used to directly compare two different biological conditions.

We introduce BoostDiff, a nonparametric tree-based approach for reconstructing directed differential networks (Fig. 1). The main component of BoostDiff is the differential tree, a modified version of the standard regression tree that can be used to identify gene pairs exhibiting changes in regulatory dependencies between two biological conditions. To build the differential trees, we use a novel splitting criterion called the differential variance improvement (DVI), which measures the difference in predictive value of a feature on the gene expression levels between two conditions. We demonstrate that boosting the differential trees with respect to samples belonging to a specific target condition is an important step for promoting condition specificity of the output networks. Tree-based variable importance measures are then calculated to obtain a ranking of regulators and to derive the differential network. Using simulated gene expression data, we evaluate the performance of BoostDiff in comparison to baseline methods and existing differential network tools. We further apply BoostDiff to multiple transcriptomics datasets, including COVID-19, Crohn’s disease, prostate adenocarcinoma, and breast cancer data, as well as stress response data in the bacterium Bacillus subtilis.

Figure 1.

Figure 1.

Overview of the BoostDiff algorithm. As input, we require two gene expression matrices corresponding to a target condition (e.g. disease) and a baseline condition (e.g. control). For each of p total genes, a learning subsample (LS) is drawn from the two datasets, after which an AdaBoost ensemble of differential trees is built to identify the features that are more predictive of the gene expression levels in the target condition. By setting a target condition, BoostDiff can be used to identify regulatory relationships that are more pronounced in condition A (e.g. disease state) and condition B (e.g. control/healthy), thereby providing a differential network capturing context-specific regulatory changes. In the overall workflow, the BoostDiff algorithm is run twice, once with condition A as target condition and subsequently with B as target condition. The results are then combined to obtain the final differential network. Most notably, while existing approaches aim for the reconstruction of whole genome-scale GRNs, BoostDiff concentrates on maximizing the precision for those parts of the regulatory network that actually characterize the difference between the two phenotypes.

2 Methods

2.1 Overview of the differential network inference approach

The overall strategy of BoostDiff is inspired by the popular network inference tool GENIE3 (Huynh-Thu et al. 2010). GENIE3 aims to reconstruct a network using as input a gene expression matrix derived from a single biological condition. For this, GENIE3 fits an ensemble of regression trees, using the expression levels of the target genes as the output variable and the expression levels of potential regulators as the features. The splitting criterion for a node in the regression tree is the reduction

|S|·Var(S)|SL|·Var(SL)|SR|·Var(SR),

in variance (impurity) of the target gene’s expression caused by the split. Here, S denotes the set of samples that reach a node to be split, SL and SR are the subsets of samples in the left and right branches of the split, respectively, and Var(S) is the variance of the output variable in a set of samples S{S,SL,SR}. The importance of one variable in a single regression tree is calculated by summing up the reductions in variance of all tree nodes where this variable is used as a splitting feature. The overall importance of a feature in the ensemble is then averaged across all trees. One ensemble model is learned for each target gene in the dataset. The tree-based feature importances based on variance reduction are interpreted as edge weights of candidate regulator-target gene links. A high edge weight indicates that a regulator has a large contribution in predicting the expression level of a target gene. The edges with the highest weights are returned as the inferred GRN. Biologically, the top-ranked edges inferred by GENIE3 represent the relationships that are dominant in the condition of interest.

Similarly, the differential network inference problem can be decomposed into p independent regression subproblems, where p is the total number of genes in the expression data. Our strategy assumes that, in a given biological context, the expression level of a gene can be modeled as a function of the expression levels of other genes (Fig. 1). The crucial difference between BoostDiff and GENIE3 is that we simultaneously take into account two datasets for inferring a differential network. More precisely, our approach requires the availability of (1) gene expression data matrix

XD=(xi,gD)(i,g)=(1,1)(ND,p)

for ND measurements from a disease condition and (2) the matrix

XC=(xi,gC)(i,g)=(1,1)(NC,p)

for NC measurements from a control condition. Both matrices have p columns (total number of genes). That is, xi,gD contains the expression of gene g in measurement (patient) i from the disease condition.

The inference task is modeled as a feature selection problem that aims to find features that are more predictive of expression levels in a target condition than in the baseline condition. For this, differential network analysis is performed by solving the regression problem while taking into account information from two distinct labels. We use the AdaBoost algorithm (Drucker 1997) using differential trees as base learners to drive the improved prediction of expression levels in the target condition. The trained model provides a ranking of the edges by computing a feature importance weight for each regulator (see details below).

For each target gene g=1,,p, we define regression problems LSD(g)=(XgD,X,gD) and LSC(g)=(XgC,X,gC). The design matrices XgD and XgC are obtained by deleting the gth columns X,gD and X,gC from XD and XC, respectively. The target variables are set to the deleted columns X,gD and X,gC. The inference of our differential network is then performed as follows:

  1. For each target gene g=1,,p:

    1. Generate the design matrices and target variable vectors for the regression problems LSD(g) and LSC(g) for gene g for the disease and the control conditions.

    2. Train an ensemble of differential regression trees with the disease selected as target condition, i.e. tree ensembles that yields a good fit for LSD(g) but not for LSC(g).

    3. From the ensemble of differential trees, extract weights for all potential regulators of g which are large if the expression of the regulator is highly predictive of g in the disease condition but not in the control condition.

  2. Aggregate and sort the p gene rankings to obtain a global ranking of all differential regulatory edges.

  3. Repeat the steps 1 and 2 with reversed roles of D and C to identify regulatory links that are present in the control but not in the disease condition.

In the following, we describe the details of the algorithm.

2.2 Step 1a: Growing a differential tree

We first explain how to build a single differential tree, the key ingredient of step 1a of the algorithm outlined above. We describe how to build the differential tree for the case where the disease has been selected as the target condition. The workflow with control as target condition is analogous.

A differential tree is built through binary recursive partitioning. The key difference to standard regression trees is that, to determine the features (i.e. genes) used for splitting the samples at the inner nodes of our trees, we use a novel split criterion called differential variance improvement (DVI) instead of variance reduction. At each node of the differential tree, we maintain subsets SD{1,,ND} and SC{1,,NC} of the disease and control samples, respectively. Given a possible split feature (i.e. candidate predictor gene) g, we define DVI(g) as follows:

DVI(g)=maxτVarRed(g,τ,SD)maxτVarRed(g,τ,SC)

For a fixed g and splitting threshold τ, the variance reduction for the disease samples is given by

VarRed(g,τ,SD)=MSE(X,gD[SD])|SLD(g,τ)||SD|·MSE(X,gD[SLD(g,τ)])|SRD(g,τ)||SD|·MSE(X,gD[SRD(g,τ)]),

where MSE(·) is the mean squared error from the sample mean, X,gD[S]={xi,gD|iS} is the restriction of the target variable to the disease samples (rows) contained in a set of samples S, and SLD(g,τ)={iSD|xi,gDτ}, and SRD(g,τ)={iSD|xi,gD>τ} are the subsets of disease samples SD available at the current node whose split gene expression levels are, respectively, smaller or larger than the threshold τ. Variance reduction for the control samples is defined analogously. A positive value of the DVI hence means that the gene g is more predictive of g’s expression level in the disease condition than in the control condition, whereas a negative DVI value indicates that the opposite is the case.

When building our differential trees, we use genes g maximizing DVI(g) as split features (see details below). If disease has been selected as the target condition, these genes correspond to regulators of g which are much more active in disease samples than in control samples. If control has been selected as the target condition, they correspond to regulators in healthy individuals that have lost their regulatory activity in the disease.

Given the regression problems LSD(g) and LSC(g) for the disease and control conditions, respectively, we construct a differential regression tree whose nodes are 5-tuples

v=(SD,SC,g,τD,τC),

where g is the split gene, τD is the split threshold for the disease samples, and τC is the split threshold for the control samples. Note that we use two different thresholds, since using a single threshold for both conditions while optimizing the DVI leads to a skewed expression distribution in each side of the split, with one side favoring disease samples and the other side favoring control samples. The construction is done as follows:

  1. Initialize root as
    r=(SD={1,,ND},SC={1,,NC},,,),

    where ‘’ is a placeholder for not yet defined split genes and thresholds.

  2. Starting at the root, recursively construct a differential tree via binary partitioning as follows:

  3. At the current node v=(SD,SC,,,) of the tree under construction, do the following:

    1. If a suitable termination criterion (the maximum depth of the tree has been reached or the number of target or baseline samples reaching in a node has been reached) has been satisfied or maxgDVI(g)0, label v as leaf and traceback.

    2. Otherwise, set the node v’s split gene to
      g=argmaxgDVI(g),
      its disease threshold to
      τD=argmaxτVarRed(g,τ,SD),
      and its control threshold to
      τC=argmaxτVarRed(g,τ,SC).
    3. Initialize v’s left child as
      vL=(SLD(g,τD),SLC(g,τC),,,),
      its right child as
      vR=(SRD(g,τD),SRC(g,τC),,,),

      and continue with processing vL and vR.

In the regression trees described by (Breiman et al. 1984), the prediction for a sample is determined by traversing the tree until a leaf node is reached. Here, we are more interested in predicting the expression values of the samples in the target condition; thus, prediction is performed only for target samples using the identified splitting thresholds τD. The final prediction is calculated as the expected value of the expression levels of the target samples assigned to the leaf nodes after fitting the differential tree.

2.3 Step 1b: Boosting the differential trees

Inspired by GRNBoost2 (Moerman et al. 2019), we implement a boosting algorithm that derives a strong prediction model by sequentially training a pool of differential trees as the weak learners. AdaBoost for regression is typically used for solving problems where the output is a continuous variable (i.e. expression levels) without explicitly considering the class of the samples. Here, we adapted the AdaBoost.R2 algorithm (Drucker 1997) to handle the regression problem, given labels from two classes (i.e. conditions). Using the differential trees as base learners, the modified algorithm performs the boosting with respect to samples belonging to the specified target condition. In this way, BoostDiff attempts to find a model that is more predictive of the target condition compared to the baseline condition. In each tree, only the target samples are re-weighted in subsequent boosting iterations, while samples from the baseline condition retain uniform weight. In particular, target samples that are more difficult to predict are selected with higher weights during the bootstrapping step and will always be compared to a uniform sample from the baseline condition. To avoid overfitting, we set a low number of trees and in practice find that 50–100 differential trees in the ensemble are sufficient for real transcriptomics datasets. The algorithm is described in detail in the Supplementary Methods section.

2.4 Step 1c: Computing variable importances to rank the candidate predictors

Tree-based methods allow for the calculation of a variable importance measure that can be used to rank the features according to their relevance for predicting the output. In GENIE3, the importance of a predictor gene g is calculated as the sum of the variance reduction across all nodes where g is used as the splitting feature, averaged over all trees in the ensemble. For BoostDiff, we derive a similar measure by considering the samples belonging to the target condition (step 1c in the high-level description of BoostDiff given above). If disease is selected as target condition, the importance VIM(g) attributed to a candidate predictor gene g is calculated as the weighted variance reduction for the disease samples across all M trees in the ensemble:

VIM(g)=m=1Mαm·vVm(g)VarRed(g,τvD,SvD)

Here, m is the boosting iteration, αm is the weight of the differential tree returned by AdaBoost, Vm(g) is the set of nodes in tree m where g was used as the split feature, and VarRed(g,τvD,SvD) is the variance reduction for the disease samples at a node v=(SvD,,g,τvD,)Vm(g). Since each node in a differential tree has two independent thresholds, interpreting the tree becomes more abstract with increasing depth. Boosting using shallow differential trees (e.g. differential tree stumps) thus favors greater interpretability of the variable importance measure.

2.5 Steps 2 and 3: Edge ranking and filtering from boosted differential trees

Each modified AdaBoost model yields a separate ranking of the regulators. However, simply ordering the regulatory links according to the weights leads to a bias toward highly variable target genes. To avoid this, we first scale the expression levels of each target gene to unit variance, similarly implemented in GENIE3 (Huynh-Thu et al. 2010).

Boosting with respect to a target condition does not necessarily produce a model that predicts a gene’s expression in the target condition better than its expression in the baseline condition. To illustrate, sample plots of the training progression are shown in Supplementary Fig. S1. To restrict the results to differential edges, we recommend examining the distributions of the mean difference in prediction error. Sample distributions of these values from the simulated and real transcriptomics data are shown in Supplementary Figs S2 and S3, respectively. Based on these generated plots, users can filter for target genes with lower mean prediction error in the target condition than the baseline condition by applying a threshold. Alternatively, users can select the top edges with the lowest mean difference in prediction error or input a user-defined percentile. After filtering, the edges are re-ranked based on the variable importance measure used as the edge weight. The procedure is performed for each of the two sub-analyses (disease or control as target condition). The top edges (user-specified parameter) from each sub-analysis are then merged to obtain the final network.

Given the ensemble nature of BoostDiff where each differential tree is constructed differently from randomly selected predictors, deriving an explanation of the model’s predictions is challenging. To obtain more mechanistic and biologically meaningful results, we can rely on the topology of the differential network and use community detection algorithms that aim to find highly connected subnetworks that could be potentially associated with the disease of interest.

2.6 Compared methods

We evaluate the performance of BoostDiff against baseline methods and existing differential networks methods. To verify the condition specificity of the networks output by BoostDiff, we compare its performance to those of two GRN inference methods, GENIE3 and ARACNE (Margolin et al. 2006). GENIE3 is run using the corresponding R package. ARACNE is run using the implementation provided in the R package minet (Meyer et al. 2008). For both GENIE3 and ARACNE, only the gene expression matrix from the disease condition is used as input. We also compare BoostDiff to two additional baseline methods: (i) random forests of differential trees (‘RF Difftrees’) and (ii) a ‘Differential GENIE3’ approach, where we separately infer the networks from each condition using GENIE3 and subtract the calculated edge weights derived from the control condition from the weights derived from the disease condition and vice versa. For details, an AIMe report is available at https://aime.report/656I3Z/2 (Matschinske et al. 2021).

Next, we compare the performance of BoostDiff to other differential network methods from the benchmarking study in Bhuva et al. (2019). These methods include z-score, EBcoexpress, Diffcoex, and a GGM-based method, which are implemented in the dcanr package in R. Additionally, we run the chNet algorithm (Tu et al. 2021), which considers significant changes in both partial correlations of edges and differential expression. To facilitate comparability and given that only the tree-based methods provide directionality information among the methods examined here, we convert directed edges to undirected edges (Bhuva et al. 2019).

2.7 Evaluation using simulated datasets

Gene expression data for disease and control conditions are simulated by adapting the SimulatorGRN approach (Bhuva et al. 2019), a dynamical systems modeling tool to simulate differential co-expression patterns. SimulatorGRN induces differential co-expression by perturbing a reference network via gene knockdown to reduce their expression levels and models activation/repression of a gene by a set of regulators using normalized Hill equations. Knockdown leads to complex nonlinear differences between the generated disease and control samples (Supplementary Fig. S4). By default, known regulatory interactions in Saccharomyces cerevisiae are used to create a reference GRN. To obtain the ‘true’ differentially co-expressed edges and allow performance evaluation, genes in the network whose abundance are affected by the knockdown perturbation and thus exhibit regulatory dependency are determined via sensitivity analysis.

In the original SimulatorGRN framework, a sample can have multiple genes knocked down, even though the evaluation considers each knockdown gene separately. To explicitly eliminate the confounding effect of additional knockdown genes in our experiments, we generate the expression data in the perturbed condition such that exactly one randomly selected input gene is knocked down. We evaluate the different tools across four scenarios: (i) 50 nodes; 500 simulations, (ii) 150 nodes; 500 simulations, (iii) 300 nodes; 500 simulations, and (iv) 500 nodes; 250 simulations. In each simulation, 100 samples are generated per condition. The final disease samples are those which have the gene knockdown, whereas the control samples are ‘wild type’ (Bhuva et al. 2019). The edges from the ‘association network’ of each simulation are used as the ground truth edges for evaluating performance. The parameters for generating the simulated data are shown in Supplementary Table S1.

In the analyses on simulated data, all genes except the target gene are considered as potential regulators. The z-score method, chNet, Diffcoex, and the GGM-based method are run with the default parameters. The parameters used for the RF Difftrees and BoostDiff are provided in Supplementary Table S2. For each simulation, we filter for the target genes belonging to the 3rd percentile based on the mean difference in prediction errors. The parameters for the EBcoexpress runs are shown in Supplementary Table S3.

The different tools have different statistical methods and cutoffs for determining the differentially coexpressed edges depending on how the algorithm works. To facilitate comparability, we show the top k predicted edges output by each method (except for chNet, wherein the number of predicted differential edges depends on the tuning parameter and is variable for each simulation; thus, extracting the top k edges cannot be consistently applied across simulations). For visualization, we show the results based on the top 100 predicted edges output by each method for the simulated data. We report the performance using precision, recall, and F1 score as the evaluation metrics.

2.8 Evaluation using real datasets

For evaluation on real datasets, BoostDiff, Diffcoex, EBcoexpress, the z-score method, and Differential GENIE3 are run on gene expression data for four human diseases (COVID-19, Crohn’s disease, breast cancer, and prostate adenocarcinoma) and for the stress response in B.subtilis. For the human diseases, we extract the top 1000 edges to generate the final network. For the B.subtilis data, we use the top 300 edges.

For the COVID-19 RNA-Seq dataset, raw gene counts are downloaded from the Gene Expression Omnibus (GEO) database under the accession number GSE156063 (Mick et al. 2020). We use data generated from nasal swab samples from COVID-19 (n=93) and uninfected patients (n=100). Count data are normalized using the DESeq2 package in R with the variance stabilizing transformation (vst}) function. Differentially expressed genes (DEGs) are obtained using DESeq2 (Love et al. 2014).

For the Crohn’s disease (CD) dataset, normalized microarray data are downloaded from the GEO database under the accession number GSE126124 (Palmer et al. 2019), which were generated from colon biopsies of individuals with CD (n=37) and healthy controls (n=19). Differential expression analysis is performed using limma (Ritchie et al. 2015).

To evaluate BoostDiff on data from The Cancer Genome Atlas (TCGA), preprocessed RNA-Seq gene expression data and DEGs from primary tumors and normal solid tissues are obtained from https://xenabrowser.net/ (Goldman et al. 2020). We use gene expression data for two cancer types, namely, prostate adenocarcinoma (PRAD) and breast cancer (BRCA), both of which contain at least 50 samples for the normal tissues.

For the human datasets, Illumina IDs are converted to HGNC symbols using the R package biomaRt where applicable (Durinck et al. 2009). Expression levels corresponding to probes mapped to the same gene symbol are averaged. The top 20% of genes with the lowest variance are filtered out. To generate the final GRN and facilitate comparability with BoostDiff, we filter the outputs for edges containing a TF, using the list of human transcription factors downloaded from http://humantfs.ccbr.utoronto.ca/ (Lambert et al. 2018). These TFs are also used as the candidate regulators for BoostDiff. Genes whose mean difference in prediction error of the models are more extreme than the threshold identified from the 3rd percentiles of the distributions are retained. For the CD dataset, a more stringent filtering step is performed because of the small sample size available for inference. All the outputs from the different methods are filtered for the top 1000 edges (except for chNet). For BoostDiff, the final network is thus comprised of the combined top 500 edges from the run where the disease condition is set as the target condition and the top 500 edges from the run where the control condition is set as the target condition. To identify more specific candidate network modules and facilitate the interpretability of the resulting networks, we apply the Louvain community detection algorithm using the python-louvain package (https://github.com/taynaud/python-louvain). The enrichr module of the gseapy v. 1.1.0 package is used to identify enriched KEGG pathways in the output networks (Subramanian et al. 2005, Kuleshov et al. 2016). KEGG terms are considered to be significantly enriched in a network if the overlap with the gene set is at least three genes.

We further test the performance of BoostDiff using transcriptome data related to the well-characterized general stress response in B.subtilis (Nicolas et al. 2012). We examine three contexts, namely, response to salt stress, response to heat stress, and response to ethanol stress, all of which are known to induce the SigB regulon. Expression data for all protein-coding genes and RNA features (S1 to S1583) of B.subtilis from the corresponding samples were downloaded from the GEO database (accession number GSE27219). Considering that bacterial adaptation is characterized by smaller variation across samples than in animal experiments or experiments with human specimens, gene expression datasets for bacterial adaptation generally have low number of replicates. Therefore, we generate additional synthetic samples using Synthetic Minority Over-sampling TEchnique (SMOTE) (https://github.com/akazs/pySMOTE/) to obtain 48 samples per condition to be available for inference. We note that a large sample size from experimental data still remains ideal for using BoostDiff; SMOTE is used as a workaround in order to be able to run the compared network inference methods on the B.subtilis datasets. Genes with low variance are filtered out using a variance cutoff of 0.4. For the BoostDiff runs, the final network is obtained by filtering the differentially predicted target genes based on the 3rd percentile and combining the top 150 edges in the differential network for each of the two sub-analyses. The results of the compared methods are filtered for the top 300 edges to obtain the differential network. Enrichment analyses for the B.subtilis outputs is performed using the ShinyGO v. 0.77 (Ge et al. 2020).

3 Results and discussion

3.1 BoostDiff outperforms existing methods on complex simulated settings

As expected, BoostDiff better identifies the differential edges than GENIE3 and ARACNE, both of which infer a static network from the disease condition alone (Fig. 2). The Differential GENIE3 method shows similar performance to those of the baseline methods, despite using the expression data from both disease and control conditions. These findings indicate that the feature importances calculated from the trees built using DVI as the splitting criterion are useful for identifying the differential edges. In addition, in all but the smallest simulation scenarios, the boosting scheme performs significantly better than the random forest of differential trees (see results for BoostDiff and RF Difftrees in Fig. 2): While RF Difftrees outperformed all the other methods in the 50-gene setting, BoostDiff is the best performing methods in the more complex 150-, 300-, and 500-gene settings. Results are similar for varying cutoffs of the number of top edges (k{50,100,150,200}) (Supplementary Fig. S5).

Figure 2.

Figure 2.

Performance of differential trees and BoostDiff compared to baseline methods and other differential network tools using simulated data comprising (a) 50 genes; (b) 150 genes; (c) 300 genes; (d) 500 genes. For all methods, the top 100 edges were considered (except for chNet). The one-sided Mann–Whitney U test is used to calculate the P-values.

We then determine the rank of each method in terms of the F1 score across all simulations in the four simulation settings. The results show that BoostDiff is the most frequently top-ranked method in the majority of simulations, more notably in the simulation settings with larger numbers of genes (Supplementary Fig. S6). In addition, except for when a high number of base learners is used in the ensemble, the performance of BoostDiff is consistent across different hyperparameter combinations (no. of features, maximum tree depth) (Supplementary Fig. S7).

3.2 Results on real datasets

3.2.1 COVID-19

The network inferred by BoostDiff shows stronger enrichment of COVID-19-related pathways compared to those of Diffcoex and the z-score method, whereas all edges in the EBcoexpress output are assigned zero posterior probabilities (Fig. 3 and Supplementary Tables S4–S7). In particular, the Diffcoex and z-score networks are enriched in less specific terms, such as ‘Influenza A,’ ‘Measles,’ and ‘Human T-cell leukemia virus 1 infection.’ Differential GENIE3 returned ‘Coronavirus disease,’ ‘Neutrophil extracellular trap formation,’ and ‘Phagosome.’ In addition to terms related to various pathogenic infections, the BoostDiff network is significantly enriched in COVID-19-relevant pathways, such as ‘Coronavirus disease,’ ‘TNF signaling pathway,’ ‘Th17 cell differentiation pathway,’ ‘NF-κB signaling pathway,’ ‘NOD-like receptor pathway,’ ‘Toll-like receptor signaling pathway,’ and ‘Viral protein interaction with cytokine and cytokine receptor’ (Fig. 3). However, similar to BoostDiff, ‘Herpes simplex I infection’ is a top enriched term in the networks output by both methods, which could indicate the presence of shared genes activated in response to viral infection. For the BoostDiff network, examining the differential edges when considering the two sub-analyses separately shows generally similar results, indicating the enrichment of infection-related pathways (Supplementary Fig. S8). We also compare the BoostDiff network to the list of DEGs. While the overlap between the differential network nodes and DEGs is significant, it is quite low (Jaccard similarity=0.129). Further, removing the DEGs from the genes in the differential network retains the enrichment of COVID-19-related pathways (Supplementary Fig. S9), indicating that these dysregulated genes identified by BoostDiff are missed by standard DE analysis.

Figure 3.

Figure 3.

Enriched KEGG pathways in the networks inferred by BoostDiff for the COVID-19 dataset. The resulting differential networks were strongly enriched in inflammation-related and other relevant pathways, consistent with known disease pathophysiology.

To further examine the differential networks, we apply the Louvain community detection algorithm (Blondel et al. 2008). BoostDiff, Diffcoex, z-score, and Differential GENIE3, identifies modules that are enriched in viral infection terms such as ‘Influenza A,’ ‘Coronavirus disease,’ ‘Hepatitis C,’ as well as signaling pathways ‘NOD-like receptor signaling pathway,’ ‘RIG-1-like receptor signaling pathway,’ and ‘Toll-like receptor signaling pathway’ (Supplementary Figs S10–S12 and Supplementary Tables S8–S11). However, the BoostDiff module is additionally enriched in the terms ‘Cytosolic DNA-sensing pathway’ and ‘Chemokine signaling pathway’ (Fig. 4 and Supplementary Tables S10 and S11). Notable COVID-19-related genes in the BoostDiff module include CXCL10, DDX58, STAT1, STAT2, EIF2AK2, and ISG15. Other additionally known genes involved in pathogen response include IFIT1, IFIT2, IFIT3, CXCL11, and CXCL9. Chemokines are produced in response to a range of viral infections. In COVID-19, chemokine signaling has been linked to acute respiratory distress syndrome (Zhou et al. 2020). DDX58 (RIG-1) is involved in the production of interferons in response to COVID-19 (Yamada et al. 2021). Interferon signaling mediated by STAT1 and STAT2 is a key antiviral defense mechanism, and SARS-CoV-2 was demonstrated to inhibit nuclear translocation of STAT1 and STAT2 (Mu et al. 2020). OASL, OAS2, and OAS3 are members of the oligoadenylate synthetase family of proteins, which are involved in the antiviral activity of interferons and have been reported to exacerbate cardiac effects of COVID-19 (Gao et al. 2023). JAK inhibitors have been explored for the treatment and management of COVID-19 (Levy et al. 2022) and is known to regulate the release of pro-inflammatory cytokines and chemokines. For instance, the chemokines CXCL9, CXCL10 and CXCL11 are known to be upregulated in the COVID-19 response (Callahan et al. 2021). IFIT1, IFIT2, and IFIT3 form a functional complex and participate in interferon-induced broad viral response (Mears and Sweeney 2018, Singh et al. 2022). ISG15 encodes a ubiquitin-like protein whose activation triggers the release of various pro-inflammatory cytokines and chemokines (Cao 2021).

Figure 4.

Figure 4.

Dysregulated module identified from the COVID-19 differential network inferred by BoostDiff using the Louvain algorithm. Some notable genes in the module include JAK2, CXCL9, CXCL10, CXCL11, DDX58, STAT1, STAT2, IFIT1, IFIT2, IFIT3, IFITM3, OASL, OAS2, OAS3, and ISG15.

In addition to the enrichment of COVID-19-related genes that are consistent with literature, BoostDiff also identified edges between TREX1 and IFIT1, IFIT3, IFITM3, ISG15, and OAS3, all of which have been implicated with COVID-19. While TREX1 has not been directly linked to COVID-19 pathogenesis, it encodes an exonuclease that is involved in host defense and has also been implicated in autoimmune diseases (Stetson et al. 2008). In particular, TREX1 regulates interferon-mediated signaling in antiviral response (Hasan et al. 2013). Interestingly, a recent genome-wide screen identified TREX1 as a pro-viral factor in viral replication during influenza infection (King et al. 2023). Given that nucleic acid detection is a key component of the immune response, TREX1 could be a candidate player involved in immune response to viral pathogens, including SARS-CoV-2, that can be examined in future investigations.

3.2.2 Crohn’s disease

Notably, the BoostDiff network shows stronger enrichment of CD-related pathways than the networks inferred by the other methods (Fig. 5 and Supplementary Tables S12–S15). The network returned by Differential GENIE3 does not show any significant enrichment. In addition to infection pathways, the network output by BoostDiff for the CD dataset is enriched in CD-relevant pathways, including ‘TNF signaling pathway,’ ‘NOD-like receptor signaling pathway,’ ‘Th17 cell differentiation,’ ‘IL-17 signaling pathway,’ and ‘NF-κB signaling,’ as well as ‘Inflammatory bowel disease.’ NOD-like receptors play important roles in host defense and homeostasis by acting as sensors of microbial pathogens in the intestinal mucosa. IBD has been associated with abnormal gut microbiota composition, which promotes NF-κB signaling and downstream inflammatory responses (Hug et al. 2018). The TF NF-κB functions in maintaining intestinal homeostasis, and dysregulation of the NF-κB pathway leads to sustained inflammatory state characteristic of IBD patients (Zaidi and Wine 2018). NF-κB signaling activation has been associated with more severe clinical manifestations in CD patients (Han et al. 2017, Zaidi and Wine 2018). The Th17 subset of CD4+ T cells have well recognized roles in IBD pathogenesis. In CD, IL-17 signaling mediates the activation of Th17 cells, which further drive pro-inflammatory cascades via the production of IL-21, IL-22, IFN-γ, and TNF (Schmitt et al. 2021). The differential edges obtained from the two BoostDiff sub-analyses where the disease or control states are used as the target condition also show enrichment of CD-relevant pathways (Supplementary Fig. S13). The nodes in the BoostDiff network have a small overlap with the list of DEGs (Jaccard similarity=0.008). Enrichment results after removal of DEGs are similar to those of the original network (Supplementary Fig. S14). The Diffcoex network is enriched in only the term ‘Herpes simplex virus 1 infection,’ while the z-score network is enriched in many cancer-related terms (Supplementary Tables S12 and S13). The EBcoexpress network is also enriched in CD-related terms, such as ‘IL-17 signaling pathway,’ ‘NOD-like receptor signaling pathway,’ ‘TNF signaling pathway’ and ‘NF-κB signaling pathway.’ (Supplementary Table S15).

Figure 5.

Figure 5.

Enriched KEGG pathways in the networks inferred by BoostDiff for the CD dataset. The resulting differential networks are strongly enriched in inflammation-related and other relevant pathways, consistent with known disease pathophysiology.

The Louvain algorithm is then applied on the differential networks to refine the results. For Diffcoex and Differential GENIE3, none of the modules show significant CD-related enrichment. For z-score, one module is enriched in metabolism-related genes (Supplementary Fig. S15 and Supplementary Table S16). The EBcoexpress network contains a module that is enriched in inflammation-related terms ‘Natural killer cell mediated cytotoxicity,’ ‘NF-κB signaling pathway,’ and ‘TNF signaling pathway’ (Supplementary Fig. S16 and Supplementary Table S17). On the other hand, the BoostDiff network contains a module enriched in the terms ‘Necroptosis,’ ‘Apoptosis,’ and ‘TNF signaling pathway’ (Fig. 6 and Supplementary Table S18). CD is characterized by inflammation in the intestinal mucosa leading to necroptosis, autophagy, and apoptosis. TLRs, such as TLR3, are central players in microbial defense in the gut that trigger downstream inflammatory signaling pathways, such as NF-κB (Lu et al. 2018). The complex roles of caspases, including CASP1, CASP8, and CASP10, are also known to drive gut inflammation and tissue damage (Mandal et al. 2018). CHUCK is a member of the NF-κB signaling pathway that functions in chemokine signaling (Awane et al. 1999). Aberrant STAT1 and STAT3 signaling in IBD pathogenesis has been previously established (Nieminen et al. 2013). The protective role of chemokine-like receptor 1 (CMKLR1) against microbiota-driven colitis has been reported (Lin et al. 2022). TRIM40 drives IBD pathogenesis by targeting cell-cell junctions in the mucosal epithelial barrier and promoting inflammation in the gut (Noguchi et al. 2011, Kang et al. 2023). Disturbances in the circadian rhythm are also linked to IBD manifestations (Giebfried and Lorentz 2023); for instance, CLOCK expression is disrupted in IBD patients (Weintraub et al. 2020, Giebfried and Lorentz 2023). Interestingly, BoostDiff identifies an edge between CLOCK and STAT3, which could point to a candidate regulatory mechanism where the circadian clock influences STAT3, a known driver of the CD phenotype. Thus, this dependency could point to further research in this direction.

Figure 6.

Figure 6.

Dysregulated module identified from the CD differential network output by BoostDiff. Notable genes include CLOCK, TRIM40, IL7, TLR3, CASP1, STAT3, STAT1, CASP8, CASP10, CMKLR1, and CHUK.

3.2.3 Stress response in Bacillus subtilis

BoostDiff also outperforms its competitors on the B.subtilis data (Supplementary Tables S19–S28). The networks inferred by BoostDiff are found to be enriched in stress-related pathways for all three conditions (Supplementary Fig. S17). On the other hand, only the Differential GENIE3 network in the salt stress condition is enriched in stress-related pathways. For Diffcoex, only one of the networks (heat treatment) is significantly enriched in stress-related pathways.

Next, we examine the networks based on the top hub genes. For the salt stress experiment, all the top ten hub genes in the differential network (csbD, S536, ohrB, yuzA, ywjC, S1199, S1379, yfhK, yycD, ywmE, and ybyB) are part of the SigB regulon (Fig. 7). Similarly, for the heat stress experiment, BoostDiff identifies members of the SigB regulon, namely, csbD, ohrB, S536, yuzA, S1199, yycD, ywjC, yfhK, ywmE, and nhaX, as hub genes. For the ethanol stress experiment, BoostDiff idebtifies yuzA, ydaD, yycD, ydaE, yocK, yjgB, yhdF, S1477, and ytaB as the top ten hubs. Consistent with the applied stressors, the abovementioned genes encoding response proteins are part of the SigB regulon. The general stress response is controlled by the alternative sigma factor SigB, which induces the transcription of >100 genes across a wide range of stress stimuli (Nannapaneni et al. 2012). On the other hand, the other methods identify fewer relevant hub genes or do not produce meaningful outputs. EBcoexpress assigns a total of 68 532 edges and 1 080 620 edges with a posterior probability of 1.0 for the salt stress and heat stress experiments, respectively, thereby hindering biological interpretation. For the ethanol stress experiment, EBcoexpress did not run even after stringent filtering because of strong correlations in the dataset. The z-score method only identifies ctc as the relevant hub gene in the salt stress condition and no relevant genes in the heat stress condition; for the ethanol experiment, the z-score method identifies the ribosomal gene rplJ as a hub, which is not directly involved in the stress response (Supplementary Fig. S18). For Diffcoex, only three members of the SigB regulon (ydbD, yqjL, and yvgO) are identified as hubs in the heat stress condition and one relevant hub (S1290) is identified in the salt stress condition; no relevant hub gene is identified in the ethanol stress condition (Supplementary Fig. S19). Differential GENIE3 identifies only yflD as a hub in the salt stress condition (Supplementary Fig. S20).

Figure 7.

Figure 7.

The differential networks identified by BoostDiff for the stress response experiments in B.subtilis. (a) The differential network inferred with the salt stress versus Spizizen’s minimal medium control comparison. (b) The differential network inferred with the heat stress versus Belitsky Minimal Medium (BMM) control comparison (c). The differential network inferred with the ethanol stress versus BMM control comparison. In all three conditions, the top ten hub genes identified by BoostDiff are members of the SigB regulon, which is induced in diverse stresses in B.subtilis.

3.2.4 TCGA—breast cancer

Again, the network inferred by BoostDiff shows stronger enrichment of disease-related pathways (including ‘Transcriptional misregulation in cancer,’ ‘Pathways in cancer,’ ‘Proteoglycans in cancer,’ and ‘Signaling pathways regulating pluripotency of stem cells’) than networks inferred using Diffcoex, EBcoexpress, Differential GENIE3, or the z-score method (Fig. 8 and Supplementary Tables S29–S32). Only the Diffcoex network is enriched in inflammatory and cancer-related pathways. Ebcoexpress assigns a posterior probability of 1 to 375 873 edges, while the z-score network is enriched in one term, ‘Tight junction’ (Supplementary Tables S30 and S31). The network inferred by Differential GENIE3 is enriched in terms related to olfactory transduction and metabolism (Supplementary Table S32). For BoostDiff, the results of enrichment analysis when considering the two sub-analyses separately shows generally similar results (Supplementary Fig. S21). The nodes in the BoostDiff network have a small overlap with the list of DEGs (Jaccard similarity=0.126). Enrichment results after removal of DEGs are similar to those of the original network (Supplementary Fig. S22).

Figure 8.

Figure 8.

Enriched KEGG pathways in the networks inferred by BoostDiff for the TCGA BRCA dataset.

No BRCA-related module is found in the Differential GENIE3 network. Module analysis for Diffcoex identifies a module that is primarily enriched in generic inflammation-related pathways, such as ‘IL-17 signaling pathway,’ ‘TNF signaling pathway,’ and ‘JAK-STAT signaling pathway.’ (Supplementary Fig. S23 and Supplementary Table S33). BoostDiff results are more specific: Here, we identify a module that is enriched in the terms ‘PPAR signaling pathway,’ ‘Insulin resistance,’ ‘AMPK signaling pathway,’ ‘Adipocytokine signaling pathway,’ ‘Pyruvate metabolism,’ and ‘Regulation of lipolysis in adipocytes’ (Fig. 9 and Supplementary Table S34). Adipose tissue plays an important role in tumor progression because it provides nutrients, metabolites, and adipokines to the tumor microenvironment of breast cancer cells (DeBerardinis et al. 2008, Guaita-Esteruelas et al. 2017). PPAR is a ligand-activated TF that is involved in fatty acid oxidation and lipid metabolism (Lee et al. 2019). In the context of BRCA, PPAR signaling is known to contribute to metastasis and promote survival in harsh metabolic conditions (Saez et al. 2004, Wang et al. 2016).

Figure 9.

Figure 9.

Dysregulated Louvain module for the TCGA BRCA dataset. Notable genes include PPARG, PLIN1, PLIN4, CD36, ESR2, ADIPOQ, FABP4, PCK1, CEBPA, MEOX1, and MEOX2.

The BoostDiff results not only make sense in light of the enriched pathways, but also when zooming-in on the individual edges and hub nodes: For instance, BoostDiff identifies a relationship between PPARG and PLIN4, which are already known to be associated in the context of adipogenesis (Nielsen et al. 2008). Perilipins comprise lipid droplet-associated proteins that modulate lipid homeostasis. In BRCA, PLIN4 upregulation is associated with poor prognosis (Kim et al. 2015, Zhang et al. 2021). BoostDiff also predicts an edge between PPARG and PCK1, which is also part of the PPAR pathway. PCK1 encodes a protein kinase that has multiple known metabolic roles, including lipogenesis and tumorigenesis (Tang et al. 2021). BoostDiff identifies CD36 as a hub gene in the module. In BRCA, CD36 has been reported to be involved in metabolic rewiring and reprogramming favoring fatty acid oxidation and has been proposed as a therapeutic target in breast cancer (Feng et al. 2019). BoostDiff further identified an edge between CD36 and FABP4. In BRCA, CD36 was shown to directly interact with FABP4 to influence transport and metabolism of fatty acids (Gyamfi et al. 2021). Further, CD36 is upregulated along with other lipid metabolism genes PLIN1 and ADIPOQ (adiponectin) in breast cancer (Marino et al. 2020).

The correlation between ESR2 and ADIPOQ in adipose tissue is also established (Ahmed et al. 2022). While this specific relationship has not been described in breast cancer, this could be investigated in future studies, given the established role of estrogens and adiposity in breast cancer (Bhardwaj and Brown 2021). Along with PPARG, CEBPA is a known master regulator of adipogenesis that influences lipid accumulation (Lee et al. 2019). Here, CEBPA is associated with ADIPOQ, another adipocyte marker (Matulewicz et al. 2017). Other identified TFs include MEOX1 and MEOX2, which are involved in angiogenesis and vascular development (Schönrock et al. 2022, Wang et al. 2022, Zeng et al. 2023). For instance, MEOX1 expression was found to be associated with tumor growth triple-negative BRCA (Zeng et al. 2023). Based on their emerging roles in neoplastic diseases, MEOX1 and MEOX2 could be candidate targets in breast cancer. Overall, our results identify genes that can be further explored and reinforce the important role of adipogenesis in breast cancer.

3.2.5 TCGA—prostate adenocarcinoma

For PRAD, the EBcoexpress network is too large to analyze, as 375 873 edges are assigned with posterior probability of 1, while the z-score network is not significantly enriched in any terms. The Diffcoex network is enriched in infection- and cancer-related terms, while the Differential GENIE3 network is enriched in terms related to metabolism. On the other hand, the BoostDiff network shows overall stronger enrichment of cancer-related terms (Fig. 10 and Supplementary Tables S35–S37). The nodes in the BoostDiff network have a small overlap with the list of DEGs (Jaccard similarity=0.114). The results of enrichment analysis when considering the two sub-analyses separately shows generally similar results (Supplementary Fig. S24). Enrichment results after removal of DEGs are similar to those of the original network (Supplementary Fig. S25).

Figure 10.

Figure 10.

Enriched KEGG pathways in the networks inferred by BoostDiff for the TCGA PRAD dataset.

For Diffcoex, we identify a cancer-associated module enriched in terms such as ‘Cell cycle,’ ‘Cellular senescence,’ and ‘Bladder cancer’ (Supplementary Fig. S26 and Supplementary Table S38). None of the modules in the z-score and Differential GENIE3 networks show significant enrichment of cancer-related terms. For BoostDiff, we identify a module enriched in the terms ‘Transcriptional misregulation in cancer,’ ‘Adrenergic signaling in cardiomyocytes,’ ‘Dilated cardiomyopathy,’ ‘Thyroid hormone signaling pathway,’ ‘Cardiac muscle contraction,’ and ‘Pathways in cancer’ (Fig. 11 and Supplementary Table S39). The module contains MEF2C, a TF known to be involved in oncogenesis and tumor proliferation in many types of cancers (Bai et al. 2015, Di Giorgio et al. 2017). TEAD4 was found to be an important prognostic marker in PRAD and is a known effector of oncogenic pathways (Chen et al. 2021). The potential involvement of MYH7 in PRAD has been recently described; in particular (Liang et al. 2022), found a higher frequency of MYH7 variants in prostate cancer patients. CAMK2A is known to promote tumor progression in prostate cancer (Rokhlin et al. 2007, He and Li 2021). In an earlier study, protein-protein interaction network-based analyses similarly identified ACTN2, MYL1, MYL2, and MYH7 as hub genes associated with PTEN-associated prostate cancer (Sun et al. 2019). The dysregulation of ACTN2 and MYH7 has been described in the context of other cancers (Ju et al. 2023). Our results suggest that candidate genes in the identified module can be further examined in the context of prostate cancer.

Figure 11.

Figure 11.

Dysregulated Louvain module for the TCGA PRAD dataset. Notable genes include MEF2C, MYL1, MYH6, MYH7, MYL2, ACTN2, CAMK2A, and TEAD4.

3.2.6 Expression distributions of top-ranked edges identified by BoostDiff reflect shifts in correlation distributions

We additionally examine the Pearson correlations of the top edges from the differential networks identified by BoostDiff using the original expression data. This procedure is performed separately for the results of the two sub-analyses, namely, when the disease condition is used as the target condition, and when the control condition is used as the target condition. For the COVID-19 and CD datasets, for the same set of edges, we observe a unimodal distribution of correlation values in the nontarget condition and a bimodal distribution in the target condition, where strong positive correlation values suggest target-condition-specific activating regulator-target relationship, while negative values indicate target-condition-specific inhibitory relationships (Fig. 12). Similar observations can be derived for the B.subtilis stress adaptation and TCGA datasets (Supplementary Fig. S27). That is, expression levels of genes incident with differential edges returned by BoostDiff tend to exhibit highly condition-specific correlation patterns, indicating that BoostDiff can indeed uncover disease-specific gene dysregulation. Importantly, this striking observation does not hold for the set of all pairwise edges whose endpoints are DEGs or for randomly selected edges. For Differential GENIE3, similar patterns were observed for the salt stress and ethanol stress networks (Supplementary Fig. S28). However, no consistent patterns are observed when the same analysis is performed on the networks inferred from the human datasets (Supplementary Fig. S29).

Figure 12.

Figure 12.

Violin plots showing that the top 500 edges in the differential network predicted by BoostDiff for each sub-analysis tend to exhibit changes in correlation distributions between the disease and control expression data, indicating dysregulation in pairwise relationships. Correlations between predicted differential edges are compared to correlations between all pairwise combinations of DEGs, as well as randomly selected edges. Results are shown for (a) COVID-19, (b) CD, (c) TCGA—BRCA, and (d) TCGA—PRAD datasets.

4 Conclusions

Gene regulation is a complex process that changes under different biological contexts. Differential network biology explores the rewiring of these regulatory interaction landscapes that are fundamentally distinct from the static networks that are inferred in most standard GRN inference methods (Ideker and Krogan 2012). By additionally considering the regulatory dependencies from a baseline condition, we can uncover a more refined picture underlying the molecular processes that are perturbed in a condition of interest, such as disease.

Inference of networks from biological expression data is a challenging task. The novelty of BoostDiff is 2-fold: (i) We use differential variance improvement as the splitting measure in a tree-based algorithm that can explicitly compare two datasets with a continuous output variable; (ii) BoostDiff adapts the AdaBoost algorithm to use differential trees as the base learner. Boosting the differential trees with respect to samples belonging to the target condition is a crucial step that significantly improves the detection of differential edges.

BoostDiff outperforms existing differential network methods on simulated data in more complex settings with higher dimensionality. BoostDiff yields biologically meaningful results when evaluated on five real-world transcriptomics datasets. We show that BoostDiff infers differential networks that are consistent with the pathophysiology of COVID-19, Crohn’s disease, and two types of cancer, namely, breast cancer and prostate adenocarcinoma, using TCGA data. We further validate BoostDiff using three different stress conditions evaluating the response of B.subtilis to salt, heat, and ethanol stressors, where BoostDiff correctly identifies members of the SigB regulon as hub genes in the differential networks. On the other hand, the compared methods recover few stress-related response genes. BoostDiff is also scalable and has reasonable runtime since it builds one model for each gene and can hence easily be parallelized. To facilitate biological interpretation and further refine the results, various graph-based algorithms can be used to find highly connected subnetworks. In this study, we apply the Louvain community detection algorithm, although other graph-based tools can also be used to find perturbed modules.

Nevertheless, our method has several limitations. First, BoostDiff can only compare two conditions at a time. Moreover, BoostDiff is similar to GENIE3 in that it does not perform statistical testing. Instead, scores are assigned to individual edges by calculating tree-based variable importance measures; thus, only the ranking of the edge weights is considered. Further, the AdaBoost algorithm can be prone to overfitting. In our experiments, BoostDiff performance decreased when setting the number of trees to 300. Thus, to avoid overfitting, we recommend users to set a low number of base differential trees (e.g. 50 trees).

The application of BoostDiff is not limited to gene expression data; the proposed feature selection approach can be generalized to other omics datasets. For instance, BoostDiff can be applied to proteomics or metabolomics studies that aim to detect changes in dependencies of proteins or metabolites. Moreover, the simple but effective strategy implemented in BoostDiff is an algorithmic advancement that can be further extended to other problems that aim to extract differentially predictive features. Adapting BoostDiff for analyzing time-series datasets is also a promising research direction.

Supplementary Material

vbae034_Supplementary_Data

Acknowledgements

We are grateful to Julian Beier for technical support. We thank the anonymous reviewers for their valuable suggestions.

Contributor Information

Gihanna Galindez, Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, 38106, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, 38106, Germany.

Markus List, Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, 85354, Germany.

Jan Baumbach, Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany; Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, 5230, Denmark.

Uwe Völker, Department of Functional Genomics, Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, 17475, Germany.

Ulrike Mäder, Department of Functional Genomics, Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, 17475, Germany.

David B Blumenthal, Biomedical Network Science Lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, 91052, Germany.

Tim Kacprowski, Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, 38106, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, 38106, Germany.

Author contributions

G.G.: Conceptualization, Formal analysis, Methodology, Software, Writing—original draft. M.L.: Conceptualization, Writing—Review & Editing. J.B.: Conceptualization, Writing—Review & Editing. U.V.: Validation, Writing—Review & Editing. U.M.: Validation, Writing—Review & Editing. D.B.B.: Conceptualization, Methodology, Supervision, Writing—Review & Editing. T.K.: Conceptualization, Methodology, Supervision, Writing—Review & Editing.

Supplementary data

Supplementary data are available at Bioinformatics Advances online.

Conflict of interest

None declared.

Funding

This work was supported by the German Federal Ministry of Education and Research (BMBF) [031L0309A to D.B.B.]; and the Open Access Publication Funds of Technische Universität Braunschweig.

Data availability

The data used in this article are publicly available. Transcriptomics data used for analyzing the COVID-19, Crohn’s disease, and Bacillus subtilis stress response can be downloaded from Gene Expression Omnibus under the accession numbers GSE156063, GSE126124, and GSE27219, respecively. The TCGA datasets are available at https://xenabrowser.net/datapages/. The simulated gene expression data are deposited at https://zenodo.org/records/10214950. Our BoostDiff implementation, as well as sample scripts and tutorials are available on GitHub: https://github.com/scibiome/boostdiff_inference.

References

  1. Ahmed F, Hetty S, Vranic M. et al. ESR2 expression in subcutaneous adipose tissue is related to body fat distribution in women, and knockdown impairs preadipocyte differentiation. Adipocyte 2022;11:434–47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Awane M, Andres PG, Li DJ. et al. Nf-κb-inducing kinase is a common mediator of il-17-, tnf-α-, and il-1β-induced chemokine promoter activation in intestinal epithelial cells. J Immunol 1999;162:5337–44. [PubMed] [Google Scholar]
  3. Bai XL, Zhang Q, Ye LY. et al. Myocyte enhancer factor 2C regulation of hepatocellular carcinoma via vascular endothelial growth factor and Wnt/β-catenin signaling. Oncogene 2015;34:4089–97. [DOI] [PubMed] [Google Scholar]
  4. Basha O, Argov CM, Artzy R. et al. Differential network analysis of multiple human tissue interactomes highlights tissue-selective processes and genetic disorder genes. Bioinformatics 2020;36:2821–8. [DOI] [PubMed] [Google Scholar]
  5. Baur B, Shin J, Zhang S. et al. Data integration for inferring context-specific gene regulatory networks. Curr Opin Syst Biol 2020;23:38–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Bhardwaj P, Brown KA.. Obese adipose tissue as a driver of breast cancer growth and development: update and emerging evidence. Front Oncol 2021;11:638918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bhuva DD, Cursons J, Smyth GK. et al. Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer. Genome Biol 2019;20:236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Blondel VD, Guillaume J-L, Lambiotte R. et al. Fast unfolding of communities in large networks. J Stat Mech2008;2008:P10008. [Google Scholar]
  9. Breiman L, Friedman J, Olshen R. et al. Classification and regression trees. Wadsworth Int Group 1984;37:237–51. [Google Scholar]
  10. Brunel H, Gallardo-Chacón J-J, Buil A. et al. MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics 2010;26:1811–8. [DOI] [PubMed] [Google Scholar]
  11. Callahan V, Hawks S, Crawford MA. et al. The pro-inflammatory chemokines CXCL9, CXCL10 and CXCL11 are upregulated following SARS-CoV-2 infection in an AKT-dependent manner. Viruses 2021;13:1062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cao X. ISG15 secretion exacerbates inflammation in SARS-CoV-2 infection. Nat Immunol 2021;22:1360–2. [DOI] [PubMed] [Google Scholar]
  13. Chen C-L, Hsu S-C, Chung T-Y. et al. Arginine is an epigenetic regulator targeting tead4 to modulate oxphos in prostate cancer cells. Nat Commun 2021;12:2398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Chu J-H, Lazarus R, Carey VJ. et al. Quantifying differential gene connectivity between disease states for objective identification of disease-relevant genes. BMC Syst Biol 2011;5:89. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Costello JC, Heiser LM, Georgii E. et al. ; NCI DREAM Community. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol 2014;32:1202–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Dawson JA, Kendziorski C.. An empirical Bayesian approach for identifying differential coexpression in high-throughput experiments. Biometrics 2012;68:455–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Dawson JA, Ye S, Kendziorski C.. R/EBcoexpress: an empirical Bayesian framework for discovering differential co-expression. Bioinformatics 2012;28:1939–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. DeBerardinis RJ, Lum JJ, Hatzivassiliou G. et al. The biology of cancer: metabolic reprogramming fuels cell growth and proliferation. Cell Metab 2008;7:11–20. [DOI] [PubMed] [Google Scholar]
  19. de la Fuente A. From ‘differential expression’ to ‘differential networking’ – identification of dysfunctional regulatory networks in diseases. Trends Genet 2010;26:326–33. [DOI] [PubMed] [Google Scholar]
  20. Di Giorgio E, Franforte E, Cefalù S. et al. The co-existence of transcriptional activator and transcriptional repressor MEF2 complexes influences tumor aggressiveness. PLoS Genet 2017;13:e1006752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Drucker H. Improving regressors using boosting techniques. In: ICML, July 8–12, 1997; Nashville, Tennessee, USA: Morgan Kaufmann, Vol. 97, 1997, 107–15. [Google Scholar]
  22. Durinck S, Spellman PT, Birney E. et al. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomart. Nat Protoc 2009;4:1184–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Feng WW, Wilkins O, Bang S. et al. CD36-Mediated metabolic rewiring of breast cancer cells promotes resistance to HER2-Targeted therapies. Cell Rep 2019;29:3405–20.e5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Gao L-J, He Z-M, Li Y-Y. et al. Role of OAS gene family in COVID-19 induced heart failure. J Transl Med 2023;21:212. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Ge SX, Jung D, Yao R.. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 2020;36:2628–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Giebfried J, Lorentz A.. Relationship between the biological clock and inflammatory bowel disease. Clocks Sleep 2023;5:260–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Goldman MJ, Craft B, Hastie M. et al. Visualizing and interpreting cancer genomics data via the xena platform. Nat Biotechnol 2020;38:675–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Guaita-Esteruelas S, Bosquet A, Saavedra P. et al. Exogenous FABP4 increases breast cancer cell proliferation and activates the expression of fatty acid transport proteins. Mol Carcinog 2017;56:208–17. [DOI] [PubMed] [Google Scholar]
  29. Gyamfi J, Yeo JH, Kwon D. et al. Interaction between CD36 and FABP4 modulates adipocyte-induced fatty acid import and metabolism in breast cancer. NPJ Breast Cancer 2021;7:129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Han YM, Koh J, Kim JW. et al. NF-kappa B activation correlates with disease phenotype in Crohn’s disease. PLoS One 2017;12:e0182071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Hasan M, Koch J, Rakheja D. et al. Trex1 regulates lysosomal biogenesis and interferon-independent activation of antiviral genes. Nat Immunol 2013;14:61–71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. He Q, Li Z.. The dysregulated expression and functional effect of CaMK2 in cancer. Cancer Cell Int 2021;21:326. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Hug H, Mohajeri MH, La Fata G.. Toll-Like receptors: regulators of the immune response in the human gut. Nutrients 2018;10:203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Huynh-Thu VA, Irrthum A, Wehenkel L. et al. Inferring regulatory networks from expression data using tree-based methods. PLoS One 2010;5:203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Ideker T, Krogan NJ.. Differential network biology. Mol Syst Biol 2012;8:565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Ju G, Yao Z, Zhao Y. et al. Data mining on identifying diagnosis and prognosis biomarkers in head and neck squamous carcinoma. Sci Rep 2023;13:10020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Kang S, Kim J, Park A. et al. TRIM40 is a pathogenic driver of inflammatory bowel disease subverting intestinal barrier integrity. Nat Commun 2023;14:700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Karlebach G, Shamir R.. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 2008;9:770–80. [DOI] [PubMed] [Google Scholar]
  39. Kim S, Lee Y, Koo JS.. Differential expression of lipid metabolism-related proteins in different breast cancer subtypes. PLoS One 2015;10:e0119473. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. King CR, Liu Y, Amato KA. et al. Pathogen-driven CRISPR screens identify TREX1 as a regulator of DNA self-sensing during influenza virus infection. Cell Host Microbe 2023;31:1552–67.e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Kuijjer ML, Tung MG, Yuan G. et al. Estimating Sample-Specific regulatory networks. iScience 2019;14:226–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Kuleshov MV, Jones MR, Rouillard AD. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res 2016;44:W90–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Lambert SA, Jolma A, Campitelli LF. et al. The human transcription factors. Cell 2018;175:598–9. [DOI] [PubMed] [Google Scholar]
  44. Lee J-E, Schmidt H, Lai B. et al. Transcriptional and epigenomic regulation of adipogenesis. Mol Cell Biol 2019;39:e00601-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Levy G, Guglielmelli P, Langmuir P. et al. JAK inhibitors and COVID-19. J Immunother Cancer 2022;10:e002838. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Liang Y, Chiu PK-F, Zhu Y. et al. Whole-exome sequencing reveals a comprehensive germline mutation landscape and identifies twelve novel predisposition genes in Chinese prostate cancer patients. PLoS Genet 2022;18:e1010373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Lin Y, Cai Q, Luo Y. et al. Epithelial chemerin–CMKLR1 signaling restricts microbiota-driven colonic neutrophilia and tumorigenesis by up-regulating lactoperoxidase. Proc Natl Acad Sci USA 2022;119:e2205574119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Love MI, Huber W, Anders S.. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15:550. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Lu Y, Li X, Liu S. et al. Toll-like receptors and inflammatory bowel disease. Front Immunol 2018;9:72. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Mandal P, Feng Y, Lyons JD. et al. Caspase-8 collaborates with caspase-11 to drive tissue damage and execution of endotoxic shock. Immunity 2018;49:42–55.e6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Marbach D, Costello JC, Küffner R. et al. ; DREAM5 Consortium. Wisdom of crowds for robust gene network inference. Nat. Methods 2012;9:796–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Margolin AA, Nemenman I, Basso K. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006;7:S7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Marino N, German R, Rao X. et al. Upregulation of lipid metabolism genes in the breast prior to cancer diagnosis. NPJ Breast Cancer 2020;6:50. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Matschinske J, Alcaraz N, Benis A. et al. The AIMe registry for artificial intelligence in biomedical research. Nat Methods 2021;18:1128–31. [DOI] [PubMed] [Google Scholar]
  55. Matulewicz N, Stefanowicz M, Nikolajuk A. et al. Markers of adipogenesis, but not inflammation, in adipose tissue are independently related to insulin sensitivity. J Clin Endocrinol Metab 2017;102:3040–9. [DOI] [PubMed] [Google Scholar]
  56. Mears HV, Sweeney TR.. Better together: the role of IFIT protein–protein interactions in the antiviral response. J Gen Virol 2018;99:1463–77. [DOI] [PubMed] [Google Scholar]
  57. Meyer PE, Lafitte F, Bontempi G.. minet: a R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 2008;9:461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Mick E, Kamm J, Pisco AO. et al. Upper airway gene expression reveals suppressed immune responses to SARS-CoV-2 compared with other respiratory viruses. Nat Commun 2020;11:5854. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Moerman T, Aibar Santos S, Bravo González-Blas C. et al. GRNBoost2 and arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics 2019;35:2159–61. [DOI] [PubMed] [Google Scholar]
  60. Mu J, Fang Y, Yang Q. et al. SARS-CoV-2 N protein antagonizes type I interferon signaling by suppressing phosphorylation and nuclear translocation of STAT1 and STAT2. Cell Discov 2020;6:65. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Nagy-Staron A, Tomasek K, Caruso Carter C. et al. Local genetic context shapes the function of a gene regulatory network. Elife 2021;10:e65993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Nannapaneni P, Hertwig F, Depke M. et al. Defining the structure of the general stress regulon of Bacillus subtilis using targeted microarray analysis and random forest classification. Microbiology (Reading) 2012;158:696–707. [DOI] [PubMed] [Google Scholar]
  63. Nicolas P, Mäder U, Dervyn E. et al. Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis. Science 2012;335:1103–6. [DOI] [PubMed] [Google Scholar]
  64. Nielsen R, Pedersen TA, Hagenbeek D. et al. Genome-wide profiling of PPARgamma: RXR and RNA polymerase II occupancy reveals temporal activation of distinct metabolic pathways and changes in RXR dimer composition during adipogenesis. Genes Dev 2008;22:2953–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Nieminen JK, Niemi M, Sipponen T. et al. Dendritic cells from Crohn’s disease patients show aberrant STAT1 and STAT3 signaling. PLoS One 2013;8:e70738. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Noguchi K, Okumura F, Takahashi N. et al. TRIM40 promotes neddylation of IKKγ and is downregulated in gastrointestinal cancers. Carcinogenesis 2011;32:995–1004. [DOI] [PubMed] [Google Scholar]
  67. Palmer NP, Silvester JA, Lee JJ. et al. Concordance between gene expression in peripheral whole blood and colonic tissue in children with inflammatory bowel disease. PLoS One 2019;14:e0222952. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Ritchie ME, Phipson B, Wu D. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Rokhlin OW, Taghiyev AF, Bayer KU. et al. Calcium/calmodulin-dependent kinase II plays an important role in prostate cancer cell survival. Cancer Biol Ther 2007;6:732–42. [DOI] [PubMed] [Google Scholar]
  70. Saez E, Rosenfeld J, Livolsi A. et al. PPAR gamma signaling exacerbates mammary gland tumor development. Genes Dev 2004;18:528–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Schmitt H, Neurath MF, Atreya R.. Role of the IL23/IL17 pathway in Crohn’s disease. Front Immunol 2021;12:622934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Schönrock A, Heinzelmann E, Steffl B. et al. MEOX2 homeobox gene promotes growth of malignant gliomas. Neuro Oncol 2022;24:1911–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Shojaie A. Differential network analysis: a statistical perspective. Wiley Interdiscip Rev Comput Stat 2021;13:e1508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Singh DK, Aladyeva E, Das S. et al. Myeloid cell interferon responses correlate with clearance of SARS-CoV-2. Nat Commun 2022;13:679. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Stetson DB, Ko JS, Heidmann T. et al. Trex1 prevents cell-intrinsic initiation of autoimmunity. Cell 2008;134:587–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Subramanian A, Tamayo P, Mootha VK. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005;102:15545–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Sun J, Li S, Wang F. et al. Identification of key pathways and genes in PTEN mutation prostate cancer by bioinformatics analysis. BMC Med Genet 2019;20:191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Tanaka Y, Tamada Y, Ikeguchi M. et al. System-based differential gene network analysis for characterizing a sample-specific subnetwork. Biomolecules 2020;10:306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Tang K, Zhu L, Chen J. et al. Hypoxia promotes breast cancer cell growth by activating a glycogen metabolic program. Cancer Res 2021;81:4949–63. [DOI] [PubMed] [Google Scholar]
  80. Tesson BM, Breitling R, Jansen RC.. DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules. BMC Bioinformatics 2010;11:497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  81. Tu J-J, Ou-Yang L, Zhu Y. et al. Differential network analysis by simultaneously considering changes in gene interactions and gene expression. Bioinformatics 2021;37:4414–23. [DOI] [PubMed] [Google Scholar]
  82. Van de Sande B, Flerin C, Davie K. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 2020;15:2247–76. [DOI] [PubMed] [Google Scholar]
  83. Wang H, Tang Y, Yang X. et al. A crucial angiogenesis-associated gene meox2 could be a promising biomarker candidate for breast cancer. Front Oncol 2022;12:759300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Wang X, Wang G, Shi Y. et al. PPAR-Delta promotes survival of breast cancer cells in harsh metabolic conditions. Oncogenesis 2016;5:e232. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Weintraub Y, Cohen S, Chapnik N. et al. Clock gene disruption is an initial manifestation of inflammatory bowel diseases. Clin Gastroenterol Hepatol 2020;18:115–22.e1. [DOI] [PubMed] [Google Scholar]
  86. Yamada T, Sato S, Sotoyama Y. et al. RIG-I triggers a signaling-abortive anti-SARS-CoV-2 defense in human lung cells. Nat Immunol 2021;22:820–8. [DOI] [PubMed] [Google Scholar]
  87. Yu X, Zeng T, Wang X. et al. Unravelling personalized dysfunctional gene network of complex diseases based on differential network model. J Transl Med 2015;13:189–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Zaidi D, Wine E.. Regulation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κβ) in inflammatory bowel diseases. Front Pediatrics 2018;6:317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Zeng G, Liu X, Su X. et al. The role of MEOX1 in non-neoplastic and neoplastic diseases. Biomed Pharmacother 2023;158:114068. [DOI] [PubMed] [Google Scholar]
  90. Zhang J, Ji Y, Zhang L.. Extracting three-way gene interactions from microarray data. Bioinformatics 2007;23:2903–9. [DOI] [PubMed] [Google Scholar]
  91. Zhang X, Su L, Sun K.. Expression status and prognostic value of the perilipin family of genes in breast cancer. Am J Transl Res 2021;13:4450–63. [PMC free article] [PubMed] [Google Scholar]
  92. Zhou Z, Ren L, Zhang L. et al. Heightened innate immune responses in the respiratory tract of COVID-19 patients. Cell Host Microbe 2020;27:883–90.e2. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

vbae034_Supplementary_Data

Data Availability Statement

The data used in this article are publicly available. Transcriptomics data used for analyzing the COVID-19, Crohn’s disease, and Bacillus subtilis stress response can be downloaded from Gene Expression Omnibus under the accession numbers GSE156063, GSE126124, and GSE27219, respecively. The TCGA datasets are available at https://xenabrowser.net/datapages/. The simulated gene expression data are deposited at https://zenodo.org/records/10214950. Our BoostDiff implementation, as well as sample scripts and tutorials are available on GitHub: https://github.com/scibiome/boostdiff_inference.


Articles from Bioinformatics Advances are provided here courtesy of Oxford University Press

RESOURCES