Bioinformatics. 2024 Jul 4;40(7):btae435. doi: 10.1093/bioinformatics/btae435

Reverse network diffusion to remove indirect noise for better inference of gene regulatory networks

Jiating Yu 1,2,3, Jiacheng Leng 4,5,6, Fan Yuan 7,8, Duanchen Sun 9, Ling-Yun Wu 10,11
Editor: Pier Luigi Martelli
PMCID: PMC11236096  PMID: 38963312

Abstract

Motivation

Gene regulatory networks (GRNs) are vital tools for delineating regulatory relationships between transcription factors and their target genes. The boom in computational biology and various biotechnologies has made inferring GRNs from multi-omics data a hot topic. However, when networks are constructed from gene expression data, they often suffer from the false-positive problem due to the transitive effects of correlation. The presence of spurious noise edges obscures the real gene interactions, which makes downstream analyses, such as detecting gene function modules and predicting disease-related genes, difficult and inefficient. Therefore, there is an urgent and compelling need to develop network denoising methods to improve the accuracy of GRN inference.

Results

In this study, we proposed a novel network denoising method named REverse Network Diffusion On Random walks (RENDOR). RENDOR is designed to enhance the accuracy of GRNs afflicted by indirect effects. RENDOR takes noisy networks as input, models higher-order indirect interactions between genes by transitive closure, eliminates false-positive effects using the inverse network diffusion method, and produces refined networks as output. We conducted a comparative assessment of GRN inference accuracy before and after denoising on simulated networks and real GRNs. Our results emphasized that the network derived from RENDOR more accurately and effectively captures gene interactions. This study demonstrates the significance of removing network indirect noise and highlights the effectiveness of the proposed method in enhancing the signal-to-noise ratio of noisy networks.

Availability and implementation

The R package RENDOR is provided at https://github.com/Wu-Lab/RENDOR, and other source code and data are available at https://github.com/Wu-Lab/RENDOR-reproduce.

1 Introduction

Gene regulatory networks (GRNs) are essential components of the cellular machinery that govern gene expression and control various biological processes. GRNs are composed of complex interactions among genes, transcription factors (TFs), and various regulatory elements (Li et al. 2024, Yuan and Duren 2024). The rapid growth of computational biology and biotechnology has made inferring GRNs from multi-omics data a prominent research area (Badia-i-Mompel et al. 2023, Li et al. 2023). Deciphering GRNs is fundamental for understanding how genes are controlled and coordinated, which in turn impacts various cellular processes and contributes to unraveling the molecular mechanisms underlying development and diseases (Conard et al. 2021, Kloesch et al. 2022).

However, when networks are constructed from data through computational inference methods (e.g. calculating pairwise correlations), they are prone to the false-positive problem due to transitive effects (Feizi et al. 2013, Xiao et al. 2022). This can result in the observed network being affected by indirect noise in two ways. First, consider the scenario where gene A regulates gene B, and gene B regulates gene C. In this case, a strong correlation between the expression of genes A and C can also be observed, even though there is no direct connection between them (Barzel and Barabási 2013, Feizi et al. 2013). Second, transitive effects can also lead to overestimating the weights of edges linked by multiple indirect network paths. These spurious indirect effects can perturb the true underlying network structure, complicating the analyses of gene interaction patterns. Therefore, it is crucial to decipher the direct relationships between genes from observed networks containing both direct influences (true signals) and indirect influences (noise).

To address the challenge of indirect noise in GRN inference, methodologies typically fall into two main categories. The first involves inferring direct regulatory networks from gene expression data. For example, wpLogicNet (Malekpour et al. 2023) infers directed GRN structures and logic gates among genes by using a Bayesian mixture model to estimate target gene profiles. CMI2NI (Zhang et al. 2015) and CN (Aghdam et al. 2015) utilize Conditional Mutual Information (CMI) to compute causal gene associations. However, CMI is known for its tendency to underestimate, potentially leading to false negatives (Zhao et al. 2021). The Part Mutual Information method was also proposed to infer the partial independence of variables (Zhao et al. 2016). Additionally, RSNET (Jiang and Zhang 2022) leverages Mutual Information and recursive optimization for network redundancy reduction. These information-theory-based methods perform effectively with discrete data, but their efficacy is reduced with continuous data due to the requirement for larger datasets, posing a limitation to their applicability.

Alternatively, the second strategy involves denoising an inferred GRN to improve its accuracy. Notably, Network Deconvolution (ND) (Feizi et al. 2013) deduces direct dependencies from an observed network using an eigenvalue reweighting technique. Despite its effectiveness, ND lacks a robust physical interpretation, as it represents higher-order indirect influences merely by raising the weight matrix $A$ to $A^k$, falling short of capturing the underlying network dynamics. In a similar vein, Silencer (Barzel and Barabási 2013) was formulated to eliminate indirect noise in correlation networks; it treats the observed correlation perturbations as the cumulative outcome of local perturbations. This model, however, is constrained to scenarios where the input matrix is a correlation matrix. In addition, Network Enhancement (NE) (Wang et al. 2018) employs a diffusion-based mechanism to denoise biological networks by enhancing the signal intensity through a nonlinear operator. NE defines strong and well-organized edges as network signals and then enhances their weights, an assumption that may not hold across diverse GRNs. Furthermore, NSRGRN refines GRNs by integrating topological properties and edge importance measures (Liu et al. 2023). Graph-MRcML recovers the direct causal network using a graph deconvolution algorithm (Lin et al. 2023). Classical methods such as partial correlation (Kim 2015) also provide alternative strategies to remove indirect influences, yet they typically rely on a linearity assumption, which may not hold for nonlinear variable interactions, and are generally limited to low-order interactions.

In this study, we proposed a novel network denoising method, named REverse Network Diffusion On Random walks (RENDOR), based on our previous work (Yu et al. 2023). RENDOR formulates a network diffusion model under the graph-theory framework to capture indirect noises and attempts to remove these noises by applying reverse network diffusion (Fig. 1). RENDOR excels in modeling high-order indirect influences, since it normalizes the product of edge weights by the degree of the nodes in the path, thereby diminishing the significance of paths with higher intermediate node degrees (Yu et al. 2023). The underlying assumption of RENDOR is that the observed noisy network can be conceptualized as an outcome of diffusion from an underlying true network. Consequently, we can use the inverse diffusion to denoise GRNs to improve their inference accuracy accordingly. The rationale for modeling indirect effects as information diffusion lies in the consideration that indirect influences can be deconstructed into composite outcomes resulting from second-order, third-order, and higher-order effects. Global network diffusion provides a better way to account for the influences of different network orders.

Figure 1.

The framework of RENDOR. RENDOR takes a noisy network as input, which is often affected by indirect effects, and outputs a denoised network containing only direct effects. The core of the RENDOR method lies in employing a reverse network diffusion approach based on random walks.

2 Materials and methods

As a network denoising method, RENDOR is designed to adjust edge weights for improving the signal-to-noise ratio (SNR) of the input network. The network outputted by RENDOR is a weighted complete graph, meaning that edges not present in the original network (with an initial weight of 0) may also acquire a small weight after denoising. Its primary goal is to amplify the weights of direct relationships between variables, while diminishing those of indirect noisy edges, thus enhancing the accuracy of network inference.

We first present an overview of RENDOR in Fig. 1. It takes a noisy network afflicted by indirect effects as input, models higher-order indirect interactions between nodes by transitive closure, eliminates false-positive edges through inverse network diffusion, and ultimately yields the refined network as output.

Specifically, RENDOR decomposes the noisy observed network $G_{obs}$ into the sum of a direct network $G_{dir}$ and an indirect network $G_{indir}$. It leverages a network diffusion approach, named Network Refinement (NR), to model indirect influences (Yu et al. 2023):

$$G_{obs} = G_{dir} + G_{indir} = \mathrm{NR}(G_{dir})$$

The underlying assumption is that the observed network $G_{obs}$ can be obtained through NR diffusion on a true network $G_{dir}$, which encapsulates only direct relationships. The NR diffusion operator is algebraically articulated as a power series of the transition probability matrix associated with $G_{dir}$, with parameters constrained to guarantee series convergence.

In contrast, for the reconstruction of $G_{dir}$ from $G_{obs}$, the inverse diffusion process of NR, termed RENDOR, is applied to eliminate indirect influences, as denoted by:

$$G_{dir} = \mathrm{RENDOR}(G_{obs})$$

Next, we will introduce the detailed mathematical formulas and model descriptions of the forward network diffusion method NR, and the reverse network diffusion method RENDOR.

2.1 Random walk on graph

Consider a weighted undirected graph $G = (V, E, W)$, where $V$ represents the node set of $G$, $E$ represents the edge set of $G$, and $W$ captures the weights of the edges in $E$. A random walk on the graph $G$ can be defined as a finite Markov chain with state space $V$ and transition matrix $P = D^{-1}W$, where $D = \mathrm{diag}(W\mathbf{1})$ is the degree matrix of $G$. That is to say, if the random variable is currently at state (node) $i$, then it moves to the neighboring state (node) $j$ in the next step with probability $P_{ij} = W_{ij} / \sum_{j} W_{ij}$. According to the Chapman–Kolmogorov equation (Kulik and Soulier 2020), the $(i,j)$ element of the $k$-th power of the transition probability matrix, $(P^k)_{ij}$, gives the probability of moving from node $i$ to node $j$ through a walk of length $k$.
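As a small illustration of these definitions (a hypothetical 4-node path graph, not an example from the paper), the transition matrix and its powers can be computed directly in R:

# Toy example: random walk on the path graph 1-2-3-4
W <- matrix(0, 4, 4)
W[1, 2] <- W[2, 3] <- W[3, 4] <- 1
W <- W + t(W)                  # undirected graph: symmetric adjacency matrix
P <- W / rowSums(W)            # transition matrix P = D^{-1} W, P_ij = W_ij / sum_j W_ij
P2 <- P %*% P                  # two-step transition probabilities (Chapman-Kolmogorov)
P2[1, 3]                       # probability of reaching node 3 from node 1 in exactly two steps (0.5)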

2.2 Reverse network diffusion on random walks

RENDOR takes a noisy network, denoted as $G_{in}(V, E, W)$, as model input and outputs a denoised network, represented as $G_{out}(V, E, W)$. The core of the RENDOR method is the graph transformation operator $B_m$, which is a composite graph operator consisting of three mathematical operators: $f_m^{-1}$, $g$, and $h$. More specifically, the operator $f_m^{-1}$ models the inverse process of diffusion, while the operators $g$ and $h$ facilitate the mapping of this inverse diffusion process onto the underlying graph. The subscript $m$ serves as a hyper-parameter that controls the strength of diffusion, and its role will be further elucidated in the subsequent discussion.

2.2.1 Network diffusion on random walks and its inverse process

Denote $\mathcal{P}$ as the set of transition matrices and $\mathcal{W}$ as the set of (weighted) adjacency matrices of undirected graphs, where $P \in \mathcal{P}$ represents a random walk on a graph and $W \in \mathcal{W}$ represents a graph. The diffusion process $f_m$ transforms one transition matrix into another by summing the probabilities of all paths of different lengths $k$ joining two nodes, with a smaller weight coefficient $1/m^k$ for a longer path of length $k$ (Yu et al. 2023):

$$f_m: \mathcal{P} \to \mathcal{P}$$
$$f_m(P) = \frac{1}{\sum_{k=1}^{\infty} 1/m^k}\left(\frac{P}{m} + \frac{P^2}{m^2} + \frac{P^3}{m^3} + \cdots\right) = \frac{1}{\sum_{k=1}^{\infty} 1/m^k} \sum_{k=1}^{\infty} \frac{P^k}{m^k} \quad (1)$$
$$= (m-1)\,P\,(mI - P)^{-1} \quad (2)$$

where $\sum_{k=1}^{\infty} 1/m^k$ is a normalization factor, and $m > 1$ ensures that the series converges as $k$ approaches infinity.

It is easy to check that the spectral radius $\rho$ (the supremum of the absolute values of the eigenvalues) of $P/m$ satisfies $\rho(P/m) < 1$ when $m > 1$, which guarantees the convergence of the infinite series: $f_m(P) = (m-1)\,P\,(mI - P)^{-1}$.
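The equivalence between the series form in Equation (1) and the closed form in Equation (2) can also be checked numerically. The following R snippet (using an arbitrary random symmetric weight matrix and m = 4, both illustrative choices) compares the closed form with a truncated version of the series:

set.seed(1)
m <- 4
W <- matrix(runif(25), 5, 5); W <- W + t(W); diag(W) <- 0            # random symmetric weights
P <- W / rowSums(W)                                                  # transition matrix
f_m <- function(P, m) (m - 1) * P %*% solve(m * diag(nrow(P)) - P)   # Equation (2)
series <- matrix(0, nrow(P), ncol(P)); term <- diag(nrow(P))
for (k in 1:50) {                        # truncate the infinite sum at k = 50
  term <- term %*% (P / m)               # term equals (P/m)^k
  series <- series + term
}
max(abs(f_m(P, m) - (m - 1) * series))   # should be numerically negligible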

Now we focus on the inverse of the diffusion process defined by $f_m$. Considering that diffusion causes indirect effects, we assume that the random walk on the noisy network is generated by diffusion on an underlying network; we can then use the inverse process of $f_m$ to eliminate the indirect effects (noise). That is to say, we can treat:

$$P_{obs} = f_m(P_{dir}) \quad (3)$$

where $P_{obs}$ represents the random walk on the observed noisy network, and $P_{dir}$ represents the random walk on the underlying true network. The inverse operator of $f_m$ can then recover $P_{dir}$ from $P_{obs}$ by removing the indirect effects of paths of all lengths between two nodes.

In detail, the inverse diffusion process $f_m^{-1}$ is defined as follows:

$$f_m^{-1}(P) = m\,\big((m-1)I + P\big)^{-1} P \quad (4)$$

It is easy to verify that the operators $f_m$ defined in Equation (2) and $f_m^{-1}$ defined in Equation (4) are inverses of each other (see Theorem 1 and its proof in the Supplementary Data). Notice that the operator $f_m^{-1}$ shares the same properties as the operator $f_m$ (Yu et al. 2023) (see Theorem 2 and its proof in the Supplementary Data). Besides, we need to ensure that the operator $f_m^{-1}$ also transforms a transition matrix into another, that is, $f_m^{-1}(P) \in \mathcal{P}$ for $P \in \mathcal{P}$. First, each row of the matrix $f_m^{-1}(P)$ sums to 1 for $P \in \mathcal{P}$ (see Theorem 3 and its proof in the Supplementary Data).
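A quick numerical sanity check of this inverse relationship can be written in a few lines of R; the random test matrix and m = 4 are illustrative choices only:

set.seed(1)
m <- 4
f_m     <- function(P, m) (m - 1) * P %*% solve(m * diag(nrow(P)) - P)   # Equation (2)
f_m_inv <- function(P, m) m * solve((m - 1) * diag(nrow(P)) + P, P)      # Equation (4)
W <- matrix(runif(25), 5, 5); W <- W + t(W); diag(W) <- 0
P <- W / rowSums(W)
max(abs(f_m_inv(f_m(P, m), m) - P))      # should be close to machine precision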

We point out that the non-negativity of $f_m^{-1}(P)$ is not always guaranteed, which prevents it from being a standard transition matrix. This is because the basic assumption behind using the operator $f_m^{-1}$ to remove all indirect effects is that the random walk on the noisy input network is (approximately) generated by the diffusion process defined by the operator $f_m$. When the truth is far from this assumption, the mismatch can lead to abnormal results. In particular, removing the indirect effects of a sparse network is challenging, and negative entries occur when there is no edge between two nodes or when the edge weight between two nodes is too small to account for the sum over all paths of different lengths joining them. To solve this ill-conditioned problem, which prevents us from guaranteeing that $f_m^{-1}(P) \in \mathcal{P}$, we provide a corrective approach in the practical implementation, as detailed in Section 2.2.4 below.

2.2.2 Auxiliary operators mapping between graph and random walk

To facilitate the transformation between the graph and the random walk on the graph, we introduce two auxiliary operators, $g$ and $h$. These two operators enable us to map the diffusion of the random walk defined by the operators $f_m$ and $f_m^{-1}$ to the diffusion of the graph, as described in our previous work (Yu et al. 2023).

Denote $D$ as the diagonal degree matrix of $W$; then $g(W)$ defines a random walk on the graph whose (weighted) adjacency matrix is $W$:

$$g: \mathcal{W} \to \mathcal{P}$$
$$g(W) = D^{-1} W \quad (5)$$

The operator $h$ has the opposite effect of $g$: it recovers the underlying graph of the random walk defined by the transition matrix $P$:

$$h: \mathcal{P} \to \mathcal{W}$$
$$h(P) = \alpha \cdot \mathrm{diag}(\pi_P)\, P \quad (6)$$

where $\pi_P = (\pi_1, \ldots, \pi_n)$ is the stationary distribution of the transition matrix $P$, i.e. $\pi_P P = \pi_P$; $\mathrm{diag}(x)$ denotes the diagonal matrix whose diagonal is the vector $x$; and $\alpha$ is a constant that controls the sum of the weight matrix $h(P)$. The operator $h$ multiplies the transition probability $P_{ij}$ by the stationary probability of node $i$, which reflects the degree information of the graph.
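A minimal R sketch of the two auxiliary operators is given below; taking $\alpha = 1$ and computing the stationary distribution from the leading left eigenvector of $P$ are illustrative choices made for this sketch:

g_op <- function(W) W / rowSums(W)                           # g(W) = D^{-1} W, Equation (5)
h_op <- function(P, alpha = 1) {
  ev <- eigen(t(P))                                          # left eigenvectors of P
  pi_vec <- Re(ev$vectors[, which.min(abs(ev$values - 1))])  # eigenvector with eigenvalue closest to 1
  pi_vec <- pi_vec / sum(pi_vec)                             # normalize to a probability vector
  alpha * diag(pi_vec) %*% P                                 # h(P) = alpha * diag(pi_P) %*% P, Equation (6)
}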

2.2.3 Composite graph operator for (reverse) network diffusion

The forward and backward diffusion processes on random walks defined by $f_m$ and $f_m^{-1}$ can now be mapped onto the graph. The composite graph operator $F_m$, which is related to $f_m$, is what we previously referred to as the NR diffusion method:

$$F_m: \mathcal{W} \to \mathcal{W}$$
$$F_m(W) = h\big(f_m(g(W))\big) \quad (7)$$

Similarly, we define the composite graph operator $B_m$ related to $f_m^{-1}$ as:

$$B_m: \mathcal{W} \to \mathcal{W}$$
$$B_m(W) = h\big(f_m^{-1}(g(W))\big) \quad (8)$$

RENDOR employs the operator $B_m$ to modify the edge weights of noisy networks. $B_m$ aims to remove the combined indirect effects of paths of all lengths connecting two nodes. Assuming that the noisy network is approximately generated by the network diffusion defined by the operator $F_m$, we can then hope to remove all the indirect effects by applying the operator $B_m$.

2.2.4 Model implementation

To denoise an observed network $G_{obs}$ with adjacency matrix $W_{obs}$ using the RENDOR method, we first define a random walk $g(W_{obs})$ on the graph, then apply the inverse diffusion to obtain $f_m^{-1}(g(W_{obs}))$, and finally recover the denoised graph $h(f_m^{-1}(g(W_{obs})))$.

We have mentioned that the matrix $f_m^{-1}(g(W_{obs}))$ may not always be non-negative, which is a prerequisite for it to be a standard transition matrix. To address this ill-conditioned issue, we preprocess the input network by modifying it to $W_{obs} = W_{obs} + \varepsilon_1 J + \varepsilon_2 I$ before applying RENDOR. Here, $J$ is the all-ones matrix and $I$ is the identity matrix. The parameters $\varepsilon_1$ and $\varepsilon_2$ control the extent of modification to the input network. This preprocessing step can be understood as a recovery mechanism if we assume that the input network was obtained by thresholding a network and removing the diagonal entries. This harmless modification does not destroy the structure of the original network and helps to make $f_m^{-1}(g(W_{obs}))$ non-negative if the parameters $\varepsilon_1$ and $\varepsilon_2$ are taken large enough. In order not to affect the properties of the input network, we take $\varepsilon_1 = \varepsilon_2 = 1$ for unweighted graphs and the minimum non-zero edge weight for weighted graphs.

However, when the input network is weighted, a preprocessing step such as $W_{obs} = W_{obs} + \varepsilon_1 J + \varepsilon_2 I$ may not be adequate to guarantee the non-negativity of $f_m^{-1}(g(W_{obs}))$ owing to the complexity of the edge-weight distribution. Therefore, we apply a postprocessing step to the matrix $f_m^{-1}(g(W_{obs}))$ to ensure its non-negativity. Specifically, we adjust each row containing negative entries in $f_m^{-1}(g(W_{obs}))$ by subtracting the smallest negative value in that row. Denoting $f_m^{-1}(g(W_{obs}))$ as $P_{dir}$, this postprocessing step can be formalized as:

$$P_{dir} = P_{dir} - (\beta_1\mathbf{1}, \ldots, \beta_n\mathbf{1})^T$$

where $\mathbf{1}$ is the column vector with all elements equal to 1, $\beta_i = 0$ if the $i$-th row of $P_{dir}$ has no negative elements, $\beta_i = \min_j\{(P_{dir})_{ij}\}$ if the $i$-th row of $P_{dir}$ has at least one negative element, and $j$ refers to the column index within the $i$-th row of $P_{dir}$.

We present the pseudocode of RENDOR in Table 1 so that the readers can understand our method more clearly.

Table 1.

The pseudocode for the RENDOR method.

Pseudocode for RENDOR
Input: $W_{obs}$: weighted adjacency matrix of the observed network;
$m$: diffusion intensity parameter;
$\varepsilon_1$, $\varepsilon_2$: preprocessing parameters.
Output: $W_{dir}$: denoised adjacency matrix of the direct network.
1. $W_{obs} = W_{obs} + \varepsilon_1 J + \varepsilon_2 I$
2. $P_{obs} = g(W_{obs}) = \mathrm{diag}(W_{obs}\mathbf{1})^{-1} W_{obs}$
3. $P_{dir} = f_m^{-1}(P_{obs}) = m\,\big((m-1)I + P_{obs}\big)^{-1} P_{obs}$
4. for $i = 1, \ldots, n$:
   if $\min_j\{(P_{dir})_{ij}\} \ge 0$: $\beta_i = 0$
   else: $\beta_i = \min_j\{(P_{dir})_{ij}\}$
5. $P_{dir} = P_{dir} - (\beta_1\mathbf{1}, \ldots, \beta_n\mathbf{1})^T$
6. $W_{dir} = h(P_{dir}) = \mathrm{diag}(\pi_{P_{dir}})\, P_{dir}$

The computational complexity of RENDOR is O(n3), primarily due to the matrix inversion and multiplication steps involved in processing the input network, where n is the number of nodes in the network (Supplementary Data). RENDOR exhibited relatively fast performance, taking approximately 10 seconds to process a network with 1500 nodes, which is considered acceptable (Supplementary Fig. S1).
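The six pseudocode steps of Table 1 can also be put together in a compact R sketch. This is only an illustration under stated assumptions (the default parameter values and the eigenvector-based computation of the stationary distribution are our choices), not the released RENDOR package:

rendor_sketch <- function(W_obs, m = 4, eps1 = 1, eps2 = 1) {
  n <- nrow(W_obs)
  W <- W_obs + eps1 * matrix(1, n, n) + eps2 * diag(n)       # step 1: preprocessing
  P_obs <- W / rowSums(W)                                    # step 2: P_obs = g(W_obs)
  P_dir <- m * solve((m - 1) * diag(n) + P_obs, P_obs)       # step 3: inverse diffusion f_m^{-1}
  beta <- pmin(apply(P_dir, 1, min), 0)                      # step 4: beta_i is negative only for rows with negative entries
  P_dir <- sweep(P_dir, 1, beta, "-")                        # step 5: shift those rows to restore non-negativity
  ev <- eigen(t(P_dir))                                      # step 6: W_dir = diag(pi) %*% P_dir
  pi_vec <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
  pi_vec <- pi_vec / sum(pi_vec)
  diag(pi_vec) %*% P_dir
}

Following Section 2.2.4, $\varepsilon_1 = \varepsilon_2 = 1$ would be used for unweighted inputs and the minimum non-zero edge weight for weighted inputs; the defaults above are merely illustrative.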

3 Results

We evaluated the denoising performance of RENDOR on both simulated and real GRNs. In each experiment, we examined how applying RENDOR as a denoising step improved the accuracy of network inference. This was validated by comparing the Area Under the Receiver Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPR) scores (Supplementary Data).

3.1 Comparable methods

We compared the denoising performance of RENDOR with four other state-of-the-art GRN denoising methods: ND (Feizi et al. 2013), NE (Wang et al. 2018), Silencer (Barzel and Barabási 2013), and inverse correlation matrix (ICM) (Alipanahi and Frey 2013).

3.1.1 Network deconvolution

Given the observed similarity matrix $G_{obs}$, ND obtains the direct dependencies $G_{dir}$ from the following equation:

$$G_{dir} = G_{obs}\,(I + G_{obs})^{-1}$$

In the practical implementation, ND uses matrix eigendecomposition to derive $G_{dir}$. Specifically, ND first scales the observed matrix $G_{obs}$ to ensure that the spectral radius of $G_{dir}$ is less than 1, meaning that all eigenvalues fall within the range of −1 to 1. Subsequently, ND performs an eigendecomposition of $G_{obs}$ as follows:

$$G_{obs} = U\,\Sigma_{obs}\,U^{-1} = U\,\mathrm{diag}(\lambda_1^{obs}, \lambda_2^{obs}, \ldots)\,U^{-1}$$

Here, $U$ represents the matrix of eigenvectors of $G_{obs}$, and $\lambda_1^{obs}, \lambda_2^{obs}, \ldots$ are the corresponding eigenvalues of $G_{obs}$.

In the final deconvolution step, the output matrix is obtained through the following eigenvalue reweighting:

$$\lambda_i^{dir} = \frac{\lambda_i^{obs}}{1 + \lambda_i^{obs}}$$

resulting in:

$$G_{dir} = U\,\Sigma_{dir}\,U^{-1} = U\,\mathrm{diag}(\lambda_1^{dir}, \lambda_2^{dir}, \ldots)\,U^{-1}$$
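For reference, this eigenvalue reweighting can be sketched in a few lines of R; the sketch assumes a symmetric input matrix (so the eigendecomposition is real) and omits the initial scaling step:

nd_sketch <- function(G_obs) {
  eig <- eigen(G_obs, symmetric = TRUE)
  lambda_dir <- eig$values / (1 + eig$values)            # lambda_i^dir = lambda_i^obs / (1 + lambda_i^obs)
  eig$vectors %*% diag(lambda_dir) %*% t(eig$vectors)    # G_dir = U Sigma_dir U^T
}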

3.1.2 Network enhancement

NE employs a diffusion-based mechanism to denoise biological networks. Specifically, NE takes a weighted matrix $W$ as input and outputs a denoised matrix whose SNR is enhanced. Mathematically, it first encodes the local structures of $W$ in a matrix $T$:

$$P_{ij} = \frac{W_{ij}}{\sum_{k \in N_i} W_{ik}}\,\mathbb{I}\{j \in N_i\}, \qquad T_{ij} = \sum_{k=1}^{n} \frac{P_{ik} P_{jk}}{\sum_{v=1}^{n} P_{vk}}$$

where $N_i$ denotes the $k$ nearest neighbors of node $i$, and $\mathbb{I}\{\cdot\}$ is the indicator function. NE then obtains the output matrix by defining an iterative diffusion process on $T$:

$$W_{t+1} = \alpha\, T \times W_t \times T + (1 - \alpha)\, T$$
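A simplified sketch of these two NE formulas is shown below; the neighborhood size k, the value of alpha, the number of iterations, and initializing the iteration at T are illustrative assumptions rather than the settings of the original NE implementation:

ne_sketch <- function(W, k = 10, alpha = 0.9, n_iter = 2) {
  n <- nrow(W)
  P <- matrix(0, n, n)
  for (i in 1:n) {                                       # row-normalize weights over the k nearest neighbors
    nb <- order(W[i, ], decreasing = TRUE)[seq_len(min(k, n))]
    if (sum(W[i, nb]) > 0) P[i, nb] <- W[i, nb] / sum(W[i, nb])
  }
  s <- colSums(P); s[s == 0] <- 1                        # guard against empty columns
  T_mat <- P %*% t(sweep(P, 2, s, "/"))                  # T_ij = sum_k P_ik P_jk / sum_v P_vk
  W_t <- T_mat
  for (t in 1:n_iter) {                                  # iterative diffusion on T
    W_t <- alpha * T_mat %*% W_t %*% T_mat + (1 - alpha) * T_mat
  }
  W_t
}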

3.1.3 Silencer

The Silencer method defines the input correlation matrix $C$ as the global response matrix, encompassing both direct and indirect effects. It calculates the local response matrix $S$, which eliminates the contribution of indirect effects, using the following equation:

$$S = \big(C - I + \mathcal{D}\big((C - I)\,C\big)\big)\, C^{-1}$$

where $I$ represents the identity matrix, and $\mathcal{D}(X)$ sets the off-diagonal terms of $X$ to zero.
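The Silencer formula translates directly into R (assuming the correlation matrix C is invertible):

silencer_sketch <- function(C) {
  n <- nrow(C)
  I_mat <- diag(n)
  D_term <- diag(diag((C - I_mat) %*% C))     # D((C - I) C): keep only the diagonal terms
  (C - I_mat + D_term) %*% solve(C)           # S = (C - I + D((C - I) C)) C^{-1}
}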

3.1.4 Inverse correlation matrix

Since the Silencer method essentially relies on a scaled version of the ICM (Alipanahi and Frey 2013), we also employed this well-established method as a baseline approach (referred to as ICM):

$$S = C^{-1}, \qquad P_{ij} = \begin{cases} -\dfrac{S_{ij}}{\sqrt{S_{ii} S_{jj}}}, & i \ne j \\ 1, & i = j \end{cases}$$

When the input $C$ is a correlation matrix, $P$ represents the partial correlation matrix, and $S$ is commonly known as the precision matrix.

As outlined in Section 1, one can either infer direct GRNs from gene expression data or denoise existing GRNs. Partial correlation has been widely used in GRN inference to infer direct connections (Guo et al. 2017). Network inference methods based on partial correlation can be regarded as applying correlation followed by this ICM network denoising step. Therefore, ICM is capable of refining networks containing indirect effects to further improve their accuracy.
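A minimal R version of this partial-correlation baseline (assuming C is an invertible correlation matrix) is:

icm_sketch <- function(C) {
  S <- solve(C)                              # precision matrix S = C^{-1}
  P <- -S / sqrt(outer(diag(S), diag(S)))    # P_ij = -S_ij / sqrt(S_ii * S_jj)
  diag(P) <- 1                               # P_ii = 1
  P                                          # partial correlation matrix
}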

3.2 Simulation experiments

We first generated simulated networks containing indirect noise to evaluate the denoising effectiveness of RENDOR. In these datasets, we had ground-truth labels distinguishing noise (indirect edges) from signals (direct edges), allowing us to accurately assess the predictive performance of edge confidence scores.

Specifically, we take circular graphs, Erdős–Rényi (ER) random graphs, and Barabási–Albert (BA) graphs (Barabási and Albert 1999) as the original true networks, and introduce simulated indirect edges to obtain noisy networks. The method for adding noisy edges is based on the principle that the more paths connecting two nodes and the shorter these paths, the higher the probability of adding an edge between the two nodes. Details for generating the simulated networks are provided in the Supplementary Data. We can control the number of indirect noisy edges added to generate simulated networks with varying levels of noise (Supplementary Figs S2 and S3). Next, we apply the network denoising methods to these noisy networks. We compare how well the denoised networks recover the original network and assess their predictive capability regarding noisy/signal edges.
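As a hedged illustration of this principle only (the exact procedure is given in the Supplementary Data), one could score each currently absent pair by a length-discounted count of short paths and sample spurious edges proportionally to that score:

add_indirect_noise <- function(A, n_noise = 15, decay = 0.5) {
  score <- decay * (A %*% A) + decay^2 * (A %*% A %*% A)   # 2- and 3-step path counts, shorter paths weighted more
  score <- score * (A == 0)                                # consider only pairs without an existing edge
  diag(score) <- 0
  score[lower.tri(score, diag = TRUE)] <- 0                # undirected: keep one triangle
  idx <- which(score > 0)
  pick <- idx[sample.int(length(idx), min(n_noise, length(idx)), prob = score[idx])]
  A_noisy <- A
  A_noisy[pick] <- 1
  pmax(A_noisy, t(A_noisy))                                # symmetrize the noisy adjacency matrix
}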

To visually illustrate the denoising effect of RENDOR, we first created a noisy network by adding 15 indirect edges to a 20-node circular graph with 20 edges (Fig. 2a). After applying the five network denoising methods and thresholding the edges to match the original network's edge count, the results in Fig. 2a show that RENDOR more accurately reconstructs the original network structure. Additionally, the RENDOR-denoised network contains a higher number of true positive (TP) and true negative (TN) edges, and a lower number of false positive (FP) and false negative (FN) edges (Fig. 2b). This demonstrates the effectiveness of RENDOR in noise elimination.

Figure 2.

The denoising performance of RENDOR on the simulated networks. (a) Six subfigures present the noisy graph with simulated indirect edges and the denoised graphs obtained by applying the five denoising methods. Noisy edges are marked in red; true edges are marked in black. (b) The number of FN, TN, TP, and FP edges of the five denoised networks. (c) The AUROC and AUPR scores (y-axis) of applying the various denoising methods to the noisy networks under different noise levels (x-axis) generated based on the ER graphs and BA graphs.

We further conducted a comparative analysis of the five denoising methods on noisy networks generated from an ER graph with node size N = 50 and edge-formation probability P = 0.3, as well as a BA graph with node size N = 50 and e = 3 (the number of edges added at each step). To introduce varying difficulty levels in the denoising tasks, we added indirect noisy edges with different intensities; the proportion of noisy edges in the noisy networks ranges from 0.1 to 0.5. We then applied the five denoising methods to these noisy networks and computed AUPR and AUROC scores. For each proportion of noisy edges, we present the average score over 100 repeated tests. The results in Fig. 2c demonstrate that the RENDOR-denoised networks exhibit higher predictive accuracy, and RENDOR consistently outperforms the other methods across different experimental settings. Furthermore, RENDOR displays a smaller variance, underscoring its robustness in handling varying noise levels.

In summary, the simulation experiments designed in this section provide evidence that RENDOR can effectively and robustly eliminate indirect noise, leading to a more accurate reconstruction of the underlying true network structure.

3.3 DREAM benchmark

We then evaluated the denoising performance of RENDOR on the GRNs constructed from the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project (Marbach et al. 2012), which serves as a comprehensive platform for assessing the performance of various GRN inference methods (Supplementary Data). Specifically, we obtained GRNs from various network inference algorithms and used them as input for five denoising methods, including RENDOR. We then calculated and compared AUROC and AUPR scores before and after denoising to assess the effectiveness of these methods in enhancing GRN inference.

The GRN inference algorithms tested included correlation-based methods [ARACNE (Margolin et al. 2006), Pearson, Spearman], information-theory-based methods [CLR (Faith et al. 2007), OIPCQ and OIPCQ2 (Mahmoodi et al. 2021)], machine learning approaches [GENIE3 (Huynh-Thu et al. 2010), TIGRESS (Haury et al. 2012), GRNBoost2 (Moerman et al. 2019)], and statistical approaches [Inferelator (Bonneau et al. 2006), ANOVA (Küffner et al. 2012), wpLogicNet (Malekpour et al. 2023)].

3.3.1 RENDOR improves inference accuracy of GRNs

We first utilized the DREAM3 benchmark alongside 13 GRN inference algorithms to assess the denoising efficacy of RENDOR. As illustrated in Supplementary Fig. S4, some network inference methods (such as OIPCQ and OIPCQ2) already exhibit excellent performance before denoising and show further improvement after applying RENDOR. Across the 13 network inference algorithms tested, nearly all GRNs showed improved performance after RENDOR denoising, with RENDOR yielding the greatest overall improvement among the denoising methods compared.

We further tested the denoising performance of RENDOR on the DREAM5 in silico benchmark. As previously mentioned, RENDOR has a parameter m that modulates the denoising intensity; we now discuss the impact of different values of m on the denoising outcomes. As depicted in Fig. 3a and b, for most values of m, RENDOR exhibits favorable denoising effects. However, when m takes very small values, it leads to a stronger modification of edge weights, which may produce undesirable results: if the original network is of high quality, implying a lower noise level, excessive denoising is not recommended; conversely, if the original network is of lower quality, denoising on top of it is theoretically unlikely to yield good results. Therefore, opting for a more conservative m value is strongly recommended. In the subsequent experiments, we present results with m set to 4.

Figure 3.

The denoising performance of RENDOR on GRNs inferred from the DREAM5 dataset. (a) and (b) illustrate the denoising performance of RENDOR on ten GRNs (x-axis) for varying values of RENDOR's parameter m (y-axis). The heatmap colors represent the degree of improvement in scores after denoising compared to before denoising. The symbol "+" indicates an enhancement in network accuracy after denoising, whereas "-" denotes a reduction. (c) and (d) present the AUPR and AUROC scores for the ten GRNs before denoising (raw) and after denoising using the five methods. The parameter m of RENDOR was set to 4. (e) Boxplot showing the improvement of AUROC and AUPR scores (y-axis) of GRNs derived from various denoising methods (x-axis), as compared to the original GRNs. Each dot corresponds to a GRN.

In Fig. 3c and d, we present the inference accuracy of 10 GRNs obtained from various network inference algorithms before denoising (raw) and after denoising using five different methods: RENDOR, ND, NE, ICM, and Silencer. The RENDOR-denoised networks substantially improve the inference accuracy of the input networks by diminishing the influence of indirect noise, and RENDOR outperforms the other methods. Overall, the AUROC score increased by 4.6% for RENDOR but decreased by 0.01% for ND, while the AUPR score increased by 17.0% for RENDOR compared with 7.0% for ND (Fig. 3e). The denoising performance of RENDOR evaluated using the F1-score is presented in Supplementary Fig. S5.

Moreover, the denoising performance of both ICM and Silencer was unsatisfactory. This can be attributed to two potential reasons: first, the input matrices were obtained through different methods rather than all being derived from correlations; second, the input matrices only give association values between TFs and genes, rather than between all genes, so we supplemented the missing values between non-TF genes with zeros to create a square input matrix. It can be concluded that ND and RENDOR are not sensitive to this data preprocessing step, whereas ICM and Silencer were ineffective at removing indirect effects under this scenario.

3.3.2 Denoising GENIE3-inferred GRN

Furthermore, we observed that the GENIE3 method exhibits the best performance in GRN inference on the DREAM5 benchmark, and RENDOR denoising is also more effective on this high-quality GRN. Therefore, we conducted a detailed analysis of the network structure of the GRN inferred by GENIE3 and further denoised by RENDOR.

Specifically, for the GRN inferred by GENIE3 on the DREAM5 in silico dataset and the denoised networks obtained by applying the RENDOR and ND methods to this network, we kept the top 100–1500 edges with the highest weights and compared the number of correctly inferred edges. Because true edges are prioritized and ranked higher in the network denoised by RENDOR, we observed a substantially higher proportion of TP edges when retaining the edges with the highest weights. As shown in Table 2, when retaining the top 200 edges with the highest confidence scores, all edges identified by RENDOR correspond to true regulatory relationships, raising the network inference accuracy of GENIE3 from 88.5% to 100%. As the number of retained edges increases, the overall inference accuracy decreases. Nevertheless, the GRN denoised by RENDOR consistently demonstrates superior inference performance compared with both the original GRN inferred by GENIE3 and the GRN denoised by ND.

Table 2.

Comparison of the number of TP edges when different numbers of edges (100–1500) are retained in the original weighted network inferred by GENIE3 and in the RENDOR- and ND-denoised networks.

#Kept_edges GENIE3 GENIE3 + ND GENIE3 + RENDOR
100 97 99 100
200 177 191 200
300 255 275 299
400 334 354 395
500 405 429 490
1000 711 750 869
1200 794 851 976
1500 901 942 1113

Furthermore, we visualized the RENDOR-denoised GRNs in Supplementary Fig. S6. We observed that they consistently maintain the hub structure in the network without disrupting the scale-free structure present in the original GRN inferred by GENIE3. This indicates that the edge weight adjustments made by RENDOR to the input network are indeed beneficial.

In summary, RENDOR improved network inference accuracy significantly more than the other network denoising methods compared, and thus achieved better GRN inference performance.

4 Discussion

We acknowledge the limitations of denoising when dealing with networks of inferior quality, such as those whose inferred edges are nearly indistinguishable from random guesses. In these cases, the effectiveness of even the most advanced denoising methods is constrained, a scenario described as "garbage in, garbage out." We used the E. coli dataset from DREAM5 for further illustration. Notably, the accuracy of GRNs inferred on this dataset is lower, which suggests that applying denoising methods may not substantially enhance network quality or provide convincingly better results. Nonetheless, RENDOR is still capable of enhancing the GRN performance of most network inference methods, with an average improvement higher than that of the other denoising methods (Supplementary Fig. S7). Furthermore, RENDOR's weight adjustment for edges with higher confidence can effectively prioritize some actual existing edges, thus enhancing inference accuracy (Supplementary Fig. S8). When denoising networks of inferior quality, we recommend using RENDOR to adjust only the weights of edges with higher confidence scores, while preserving the original weights of edges with lower scores.

Before applying denoising methods, it is necessary to verify the appropriateness of the denoising scenario. Misapplying denoising methods in inappropriate contexts can lead to suboptimal or even adverse denoising effects. In the context of denoising GRNs, as shown in Supplementary Fig. S9, using other denoising methods designed for better community detection can dramatically decrease the accuracy of GRN inference. The inferior performance of these denoising methods arises because they enhance connections among nodes with self-organizing properties, thereby making the network's triangular structures denser, which is contrary to the goal of removing indirect influences (breaking down triangular structures).

Additionally, the preprocessing and postprocessing steps of RENDOR are heuristic, aimed at ensuring feasible solutions. Their impact on the denoising effects is not fully understood, warranting further research to explore potentially better approaches. Furthermore, the selection of algorithm hyperparameters is empirical. A more comprehensive evaluation and finer selection methods might be necessary.

5 Conclusion

In conclusion, this study introduced RENDOR, a novel denoising approach for improving the accuracy of network inference. This method is designed to handle noisy networks affected by indirect effects. It effectively models higher-order indirect interactions between nodes through network diffusion, employs reverse network diffusion to eliminate indirect effects, and outputs refined networks containing only direct signal edges.

Through comprehensive evaluations on both simulated noisy networks and real GRNs, we demonstrated that RENDOR consistently outperforms alternative denoising methods for GRN inference, enhancing the inference accuracy by effectively mitigating the impact of indirect noise. Furthermore, our experiments showcased RENDOR’s robustness across various noise levels, reinforcing its applicability in diverse biological contexts. RENDOR offers a valuable contribution to the field of network inference and provides researchers with a powerful tool for uncovering more accurate and reliable biological network structures.

Supplementary Material

btae435_Supplementary_Data

Contributor Information

Jiating Yu, School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China; IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Jiacheng Leng, IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Zhejiang Lab, Hangzhou 311121, China.

Fan Yuan, IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Duanchen Sun, School of Mathematics, Shandong University, Jinan 250100, China.

Ling-Yun Wu, IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

The authors declare that they have no competing interests.

Funding

This work was supported by the National Key Research and Development Program of China [No. 2020YFA0712402, No. 2022YFA1004803]; National Natural Science Foundation of China [No. 12231018, No. 62202269]; Science Foundation Program of the Shandong Province (No. 2023HWYQ-012); the Program of Qilu Young Scholars of Shandong University; and the Startup Foundation for Introducing Talent of Nanjing University of Information Science & Technology, China (No. 2024r088).

References

1. Aghdam R, Ganjali M, Zhang X et al. CN: a consensus algorithm for inferring gene regulatory networks using the SORDER algorithm and conditional mutual information test. Mol Biosyst 2015;11:942–9.
2. Alipanahi B, Frey BJ. Network cleanup. Nat Biotechnol 2013;31:714–5.
3. Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24:739–54.
4. Barabási AL, Albert R. Emergence of scaling in random networks. Science 1999;286:509–12.
5. Barzel B, Barabási AL. Network link prediction by global silencing of indirect correlations. Nat Biotechnol 2013;31:720–5.
6. Bonneau R, Reiss DJ, Shannon P et al. The inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol 2006;7:R36.
7. Conard AM, Goodman N, Hu Y et al. TIMEOR: a web-based tool to uncover temporal regulatory mechanisms from multi-omics data. Nucleic Acids Res 2021;49:W641–53.
8. Faith JJ, Hayete B, Thaden JT et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 2007;5:e8.
9. Feizi S, Marbach D, Médard M et al. Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol 2013;31:726–33.
10. Guo W, Calixto CPG, Tzioutziou N et al. Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size. BMC Syst Biol 2017;11:62.
11. Haury A-C, Mordelet F, Vera-Licona P et al. TIGRESS: trustful inference of gene REgulation using stability selection. BMC Syst Biol 2012;6:145.
12. Huynh-Thu VA, Irrthum A, Wehenkel L et al. Inferring regulatory networks from expression data using tree-based methods. PLoS One 2010;5:1–10.
13. Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022;23:165.
14. Kim S. Ppcor: an R package for a fast calculation to semi-partial correlation coefficients. Commun Stat Appl Methods 2015;22:665–74.
15. Kloesch B, Ionasz V, Paliwal S et al. A GATA6-centred gene regulatory network involving HNFs and ΔNp63 controls plasticity and immune escape in pancreatic cancer. Gut 2022;71:766–77.
16. Küffner R, Petri T, Tavakkolkhah P et al. Inferring gene regulatory networks by ANOVA. Bioinformatics 2012;28:1376–82.
17. Kulik R, Soulier P. Markov chains. In: Heavy-Tailed Time Series. New York, NY: Springer New York, 2020, 373–423.
18. Li S, Liu Y, Shen L-C et al. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024;25:bbad529.
19. Li Z, Nagai JS, Kuppe C et al. scMEGA: single-cell multi-omic enhancer-based gene regulatory network inference. Bioinform Adv 2023;3:vbad003.
20. Lin Z, Xue H, Pan W et al. Combining mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data. PLoS Genet 2023;19:e1010762.
21. Liu W, Yang Y, Lu X et al. NSRGRN: a network structure refinement method for gene regulatory network inference. Brief Bioinform 2023;24:bbad129.
22. Mahmoodi SH, Aghdam R, Eslahchi C et al. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021;11:7605.
23. Malekpour SA, Shahdoust M, Aghdam R et al. wpLogicNet: logic gate and structure inference in gene regulatory networks. Bioinformatics 2023;39:btad072.
24. Marbach D, Costello JC, Küffner R et al. Wisdom of crowds for robust gene network inference. Nat Methods 2012;9:796–804.
25. Margolin AA, Nemenman I, Basso K et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006;7(Suppl 1):S7.
26. Moerman T, Aibar Santos S, Bravo González-Blas C et al. GRNBoost2 and arboreto: efficient and scalable inference of gene regulatory networks. Bioinformatics 2019;35:2159–61.
27. Wang B, Pourshafeie A, Zitnik M et al. Network enhancement as a general method to denoise weighted biological networks. Nat Commun 2018;9:3108.
28. Xiao N, Zhou A, Kempher ML et al. Disentangling direct from indirect relationships in association networks. Proc Natl Acad Sci USA 2022;119:e2109995119.
29. Yu J, Leng J, Sun D et al. Network refinement: denoising complex networks for better community detection. Physica A 2023;617:128681.
30. Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2024. doi: 10.1038/s41587-024-02182-7.
31. Zhang X, Zhao J, Hao J-K et al. Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic Acids Res 2015;43:e31.
32. Zhao J, Zhou Y, Zhang X et al. Part mutual information for quantifying direct associations in networks. Proc Natl Acad Sci USA 2016;113:5130–5.
33. Zhao M, He W, Tang J et al. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021;22:bbab009.

