Scientific Reports. 2024 Jul 23;14:16881. doi: 10.1038/s41598-024-67442-7

Target control of linear directed networks based on the path cover problem

Wataru Someya 1, Tatsuya Akutsu 2, Jose C Nacher 1
PMCID: PMC11266607  PMID: 39043768

Abstract

Securing complete control of complex systems comprised of tens of thousands of interconnected nodes holds immense significance across various fields, spanning from cell biology and brain science to human-engineered systems. However, depending on specific functional requirements, it can be more practical and efficient to focus on a pre-defined subset of nodes for control, a concept known as target control. While some methods have been proposed to find the smallest driver node set for target control, they either rely on heuristic approaches based on k-walk theory, lacking a guarantee of optimal solutions, or they are overly complex and challenging to implement in real-world networks. To address this challenge, we introduce a simple and elegant algorithm, inspired by the path cover problem, which identifies the nodes required to control a target node set in polynomial time. To apply the algorithm in real-world systems, we have selected several networks in which a specific set of nodes with functional significance can be designated as a target control set. The analysed systems include the complete connectome of the nematode worm C. elegans, the recently disclosed connectome of the Drosophila larval brain, as well as dozens of genome-wide metabolic networks spanning major plant lineages. The target control analysis sheds light on distinctions between the neural systems of nematode worms and insect larval brains, particularly concerning the number of nodes necessary to regulate specific functional systems. Furthermore, our analysis uncovers evolutionary trends within plant lineages, notably when examining the proportion of nodes required to control functional pathways.

Subject terms: Complex networks, Network topology

Introduction

Network controllability methods have been developed by integrating control theory concepts with traditional network features. This innovative approach potentially enables us to manipulate large-scale systems using only a limited number of nodes as input signals1–3. These crucial nodes, connected to external signals, are referred to as driver nodes. In the field of structural controllability in complex networks, the primary goal is to identify the minimum set of driver nodes. By controlling the states of these nodes, we are able to guide the states of the remaining nodes toward desired states in finite time.

Liu et al. proposed the Maximum Matching (MM)-based model to address network controllability1. They demonstrated that for networks with linear dynamics, determining a minimum set of driver nodes involves computing a maximum bipartite matching1. Nacher and Akutsu established a relationship between a Minimum Dominating Set (MDS) and structural controllability for linear and non-linear systems2. Both models have been extensively applied to analyse real-world networks. For example, Wuchty conducted an in-depth analysis of controllability in protein–protein interaction networks in human and yeast organisms using the MDS approach. By examining oncogenes and tumor suppressor genes, a significant association with the identified MDS of proteins was observed4. Moreover, various algorithms have been devised to identify MDS and classify their control categories, yielding substantial biological insights into disease-related genes and critical control proteins across diverse biological systems5–10.

On the other hand, the MM model has also played a key role in uncovering associations between control nodes and biological characteristics like essential and cancer genes in protein interaction networks1,11. Mutations in driver genes linked to these nodes are thought to drive the transition from normal to diseased states. Moreover, the MM model has provided insights into specific neuron functions in nematode neuronal circuitry through combined ablation experiments and controllability analysis12. Additionally, the MM model has been applied to understand the control mechanisms of cancer genes, particularly within the MYC oncogene family13. Indeed, structural controllability has been widely used to analyze many different types of complex biological networks14–19.

However, under certain functional requirements, it may be more convenient or practical to concentrate on a specific subset of nodes, known as target nodes, for control purposes. Hereafter, we adopt the Maximum Matching (MM) approach to address the problem of target controllability. Target structural controllability introduced by Gao et al. focuses on determining the minimum number of driver nodes necessary to steer the states of the target nodes to their desired states within a finite timeframe20, building upon the foundation of structural controllability and utilizing the maximum matching framework as the primary controllability method1, which was used in subsequent related research21,22.

We define a directed graph $G(V, E)$, where $V$ is the node set and $E$ is the edge set. Each node $v_i$ in $V$ is assigned a state value $x_i(t)$ that can depend on time. Therefore, the state vector $x(t) = [x_1(t), \ldots, x_N(t)]^T$ represents the states of $V = \{v_1, \ldots, v_N\}$. Then, the linear system (A, B, C) is described by the equations:

$$\begin{aligned} \frac{dx(t)}{dt} &= A x(t) + B u(t) \\ y(t) &= C x(t) \end{aligned} \tag{1}$$

The upper equation indicates that, taking $x_0$ and $x_F$ as the initial and final states, respectively, and $u(t) = [u_1(t), \ldots, u_M(t)]^T$ as a time-dependent external control input vector of M external signals, the system is driven from $x_0$ to $x_F$ in finite time. Here, A is the N × N adjacency matrix of the network, and the coupling strength between nodes and external control nodes is given by the real N × M matrix B. To address target control, the lower equation for y(t) is added to the system (see Eq. 1). In this context, the triplet (A, B, C) is said to be target controllable with respect to a designated target node set S, where |S| is the cardinality of S with $|S| \le N$, and S is a subset of the set of internal nodes V, $S \subseteq V$ (corresponding to the state vector x)20–22. Target controllability means that there exists a time-dependent input vector u(t) capable of driving the state of the target nodes to any desired final state within a finite timeframe. Therefore, $y(t) = [y_1(t), \ldots, y_{|S|}(t)]^T$ denotes the output vector of the target set S, and C is an output matrix with |S| × N dimensions. Furthermore, Gao et al. introduced an algorithm designed to identify the smallest driver node set, corresponding to the input vector u, within the framework of structural controllability. It is important to note that their algorithm is heuristic in nature and draws from k-walk theory, which means it may not always produce optimal solutions20. In fact, Czeizler et al. demonstrated that this problem is NP-hard in general21, which validates the pursuit of heuristic algorithms in addressing it. On the other hand, Li et al. approached the target control problem by formulating it as the path cover problem and devised a polynomial-time algorithm rooted in network flow analysis22. In this context, within a directed graph G(V, E) and a set of target nodes S ⊆ V, a path cover is defined as a set of disjoint paths and cycles that collectively encompass all nodes within S. The objective of the path cover problem is to identify a path cover with the minimal number of paths. While this definition does not precisely mirror the original target control problem, it does establish a sufficient condition for addressing the original problem22. In the discussion that follows, we use P to represent the set of paths and Q to denote the set of cycles within a path cover. Refs. 20–22 formulate the target control problem in distinct ways; in particular, Czeizler et al. assume reachability in dealing with cycles while Li et al. do not. Hence, there is no contradiction between the NP-hard nature of the original problem and the polynomial-time algorithm devised by Li et al.22. Nevertheless, it is worth noting that the algorithm outlined in Ref. 22 is intricate in nature. Consequently, we have devised a much simpler algorithm for the path cover problem, which is based on the concept of maximum matching. Subsequently, we have applied this algorithm to several biological networks, with the aim of targeting and controlling specific sets of nodes associated with important biological functions.
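For one fixed numerical choice of the matrices in Eq. 1, target controllability can be checked with the Kalman-type rank condition for output controllability, rank[CB, CAB, ..., CA^(N-1)B] = |S|. The Python sketch below is only an illustration under stated assumptions: the rank condition is the standard one from linear control theory rather than a result derived in this section, the helper names (mat_mul, rank, is_target_controllable) are hypothetical, and the convention that A[i][j] is nonzero when node j acts on node i is assumed here. Structural controllability concerns generic edge weights, whereas this check applies to a single concrete weight assignment.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


def is_target_controllable(A, B, C):
    """Check rank[CB, CAB, ..., CA^(N-1)B] == |S| (number of rows of C)."""
    n = len(A)
    blocks = []
    AkB = B  # holds A^k B, starting with k = 0
    for _ in range(n):
        blocks.append(mat_mul(C, AkB))
        AkB = mat_mul(A, AkB)
    # Concatenate the blocks horizontally, row by row.
    K = [sum((blk[i] for blk in blocks), []) for i in range(len(C))]
    return rank(K) == len(C)


# Hypothetical chain 1 -> 2 -> 3, one input attached to node 1, target {3}.
A_chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
B_chain = [[1], [0], [0]]
C_chain = [[0, 0, 1]]

# Hypothetical star 1 -> 2, 1 -> 3, one input attached to node 1, targets {2, 3}.
A_star = [[0, 0, 0], [1, 0, 0], [1, 0, 0]]
B_star = [[1], [0], [0]]
C_star = [[0, 1, 0], [0, 0, 1]]
```

In these toy instances the chain target is reachable through a single path and passes the check, while the two leaves of the star cannot be steered independently from one input, which is the situation that forces additional driver nodes.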

Before explaining our computational results and methodology, it is essential to provide a more detailed overview of the existing methodologies. As highlighted earlier, two seemingly contradictory results have been put forth. On one hand, Czeizler et al. asserted that the problem of target structural controllability is NP-hard in general21. On the other hand, Li et al. demonstrated that this problem can be solved in polynomial time through a reduction to the maximum flow problem22. Notably, there is no inherent inconsistency between these findings, as discussed above. This scenario mirrors the distinction in the technical treatment of driven nodes versus driver nodes discussed in a previous work23. Furthermore, there are slight differences between Gao et al. and Czeizler et al.: while Czeizler et al. assume the concept of reachability, the approach of Gao et al. does not rely on it.

In this study, we derive a theorem that enables us to mathematically characterize the problem at hand and, importantly, to demonstrate the correctness of the proposed algorithm, which solves the target structural controllability problem in polynomial time. Furthermore, our algorithm offers a notably simpler solution compared to the one outlined in Li et al.’s work22.

In addition to the development of a new algorithm for computing target controllability, this study centers on the application of target controllability methodology to analyze complex real-world biological networks. To put the SimpleTarget algorithm into practical use within real systems, we have selected various networks in which a specific set of nodes can be designated as a target control set. Our selection includes the comprehensive connectome of the nematode worm C. elegans, the recently assembled connectome of the Drosophila larval brain, and 70 genome-wide metabolic networks spanning significant plant lineages, which were constructed from the Plant Metabolic Network database. Our findings bring to light notable distinctions between the neural systems associated with nematode worms and larval brain insects in terms of the quantity of nodes necessary to control specific functional systems. Furthermore, our analysis uncovers evolutionary trends within the plant lineages, particularly when examining the proportion of nodes required to regulate functional pathways.

Results

Theoretical results

In this work, we propose a novel algorithm for target controllability. The SimpleTarget algorithm introduces several innovations compared to the standard maximum matching-based algorithm1, tailored for achieving target controllability in networks. One distinctive feature is the inclusion of self-loops: SimpleTarget adds self-loops to all non-target nodes during a preprocessing step. Unlike the maximum matching-based algorithm1, which focuses on controlling entire networks, SimpleTarget specifically identifies the minimum number of driver nodes required for target controllability. This approach not only relies on a simple implementation but also runs in polynomial time, making it a practical solution for target controllability.

In this section, we summarize our theoretical findings, comprising a mathematical theorem (see Methods section for details). This theorem furnishes a rigorous mathematical proof for the correctness and computational complexity of the proposed algorithm. Our algorithm was developed based on the concept of the path cover problem. Before presenting the path cover problem definition, we will introduce the necessary mathematical notation.

Let G(V, E) be a directed graph where V is a set of nodes and E is a set of directed edges. A sequence of nodes $(v_{i_1}, v_{i_2}, \ldots, v_{i_k})$ is called a path if $v_{i_p} \neq v_{i_q}$ holds for all $p \neq q$ and $(v_{i_p}, v_{i_{p+1}}) \in E$ holds for all $p = 1, \ldots, k-1$. Similarly, a sequence of nodes $(v_{i_1}, v_{i_2}, \ldots, v_{i_k}, v_{i_1})$ is called a cycle if $v_{i_p} \neq v_{i_q}$ holds for all $p \neq q$, $(v_{i_p}, v_{i_{p+1}}) \in E$ holds for all $p = 1, \ldots, k-1$, and $(v_{i_k}, v_{i_1}) \in E$ holds. Then, the path cover problem is defined as below22.

Definition 1 (Path Cover Problem)

Given a directed graph G(V, E) and a set of target nodes S ⊆ V, find a set of disjoint cycles and paths with the minimum number of paths that include all nodes in S.

Note that initial nodes in distinct paths should be controlled by distinct external nodes whereas all cycles can be controlled by one external node, as assumed in Refs. 1,20,22. An example of the path cover problem is shown in Fig. 1. The network consists of eight target nodes that can be controlled by two external control nodes u1 and u2. In this particular example, the target nodes are covered by a combination of two cycles and two paths. Following the methodology described in Ref.1, we assume that all cycles and one path can be controlled by a single external node. Consequently, in this example we only require two external control nodes u1 and u2.
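To make Definition 1 and the counting convention concrete, the following Python sketch checks whether a proposed collection of paths and cycles is a valid path cover of a target set and, if so, returns the number of external control nodes it implies, taken here as max(number of paths, 1) in line with the assumption that all cycles and one path share a single external node. The function name check_path_cover and the small example graph are hypothetical and do not reproduce Fig. 1.

```python
def check_path_cover(edges, targets, paths, cycles):
    """Return the implied number of external control nodes, or None if the
    given paths and cycles are not a valid path cover of `targets`.

    edges   : iterable of directed edges (u, v)
    targets : iterable of target nodes S
    paths   : list of node sequences, each a simple directed path
    cycles  : list of node sequences, each a directed cycle listed without
              repeating its first node at the end
    """
    edge_set = set(edges)
    used = []

    for p in paths:
        # Consecutive nodes of a path must be joined by a directed edge.
        if any((a, b) not in edge_set for a, b in zip(p, p[1:])):
            return None
        used.extend(p)

    for c in cycles:
        closed = list(c) + [c[0]]  # add the closing edge back to the start
        if any((a, b) not in edge_set for a, b in zip(closed, closed[1:])):
            return None
        used.extend(c)

    # Paths and cycles must be node-disjoint (no node used twice).
    if len(used) != len(set(used)):
        return None

    # Every target node must be covered.
    if not set(targets) <= set(used):
        return None

    # All cycles and one path can share one external node (assumed convention).
    return max(len(paths), 1)


# Hypothetical example: a 3-cycle a -> b -> c -> a plus an edge d -> e.
example_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
example_targets = {"a", "b", "c", "e"}
n_external = check_path_cover(
    example_edges, example_targets, paths=[["d", "e"]], cycles=[["a", "b", "c"]]
)  # one path and one cycle, so a single external node suffices
```

The checker only validates a given cover; finding a cover with the minimum number of paths is the optimization problem that the algorithm of this paper addresses.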

Figure 1.


Example of the path cover problem. In this figure, filled (red) nodes represent target nodes and dotted nodes (i.e., u1, and u2) represent external control nodes. In this case, target nodes are covered by two cycles and two paths. As in Refs. 1,20,22, it is assumed that all cycles and one path are controlled by one external node. Therefore, we only need two external control nodes.

By utilizing the aforementioned path cover concept, we have devised a novel algorithm named SimpleTarget, which offers a simpler and efficient solution to the target structural controllability problem. The full details of the SimpleTarget algorithm are outlined in the Methods section. Then, by combining this path cover concept with the maximum matching-based controllability approach1, we establish the following theorem:

Theorem 1

Algorithm SimpleTarget solves the target structural controllability problem (in the sense of the path cover problem shown in Ref. 22) in polynomial time.

A detailed proof of the theorem can be found in the Methods section.

We have computed the simulation results (state evolution trajectories) to show that the network can indeed be target controlled. This demonstrates that the nodes identified by the proposed SimpleTarget algorithm are the ones that drive the target node states to the desired final state. The detailed analysis is provided in the Supplementary Information file (Supplementary Fig. S1).

Computational results derived from data analysis

Metrics to evaluate target control

Gao et al. primarily focused their analysis of target controllability on artificially generated networks, rather than on experimental biological networks20. Consequently, they adopted two distinct methodologies for selecting a set of target nodes. In one approach, they randomly selected a fraction f of nodes. In a different scenario, they implemented a so-called local schema, which still involved random selection of a fraction f of nodes, but with the additional condition that these nodes must be adjacent. Here f is defined as f = NT/N where NT is the number of target nodes and N the total number of nodes. To gauge the efficiency of target control for a specific fraction f, they introduced the target controllability parameter, denoted as αD. This parameter is defined as αD = PD/ND, where PD represents the minimum number of driver nodes required to control a fraction f of target nodes, while ND represents the minimum number of driver nodes needed to control the entire network20. As such, αD serves as a measure of how efficiently target control operates in comparison to full control over the entire network. It quantifies the ratio of driver nodes needed for targeted control to those needed for controlling the entire network. This parameter was employed to evaluate both random and local control schemas in the analyzed scale-free (SF) and Erdős-Rényi (ER) networks 20. In our data analysis, we adopt αD to assess target control in biological networks. Additionally, we introduce βD, defined as βD = αD/f, to assess how target control efficiency compares to a neutral expectation. When βD is less than 1, it signifies that target control is more efficient than neutral expectation. If target control operates as efficiently as the neutral expectation (βD = 1), it implies that

$$P_D = f N_D \quad \text{and} \quad \alpha_D = f.$$
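The definitions f = NT/N, αD = PD/ND and βD = αD/f can be restated as a small helper. The Python sketch below uses a hypothetical function name and made-up counts purely for illustration; none of the numbers come from the networks analysed in this paper.

```python
def target_control_metrics(n_total, n_target, p_d, n_d):
    """Return (f, alpha_D, beta_D) from node and driver counts.

    n_total  : N, total number of nodes in the network
    n_target : N_T, number of target nodes
    p_d      : P_D, minimum number of driver nodes for the target set
    n_d      : N_D, minimum number of driver nodes for the whole network
    """
    f = n_target / n_total   # fraction of nodes that are targets
    alpha_d = p_d / n_d      # target control relative to full control
    beta_d = alpha_d / f     # values below 1 beat the neutral expectation
    return f, alpha_d, beta_d


# Hypothetical counts: N = 200, N_T = 40, P_D = 3, N_D = 30.
f, alpha_d, beta_d = target_control_metrics(200, 40, 3, 30)
```

With these made-up counts, the target set is 20% of the network but needs only 10% of the drivers required for full control, so βD is 0.5 and target control is twice as efficient as the neutral expectation.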

In our research, our objective is to apply the target controllability algorithm to real biological networks. Specifically, we analyze neural networks from the C. elegans worm12,24,25 and the Drosophila fly insect26, as well as metabolic networks associated with 70 different plant organisms (further details are available in the Methods section)27,28. To achieve this, we must carefully select sets of target nodes related to specific biological functions or pathways of interest.

Results for the analysis of connectome networks

The initial segment of our analysis focuses on connectome networks. The neuronal network of the C. elegans worm is composed of 378 nodes and 5256 directed links, and consists of various functional neuron classes, including motor, interneuron, poly-modal, sensory, and muscle neurons12,24,25. As part of our analysis, we designate each of these classes as a set of target nodes for control purposes. Another dataset is derived from the recently discovered connectome of the larval brain of the Drosophila melanogaster insect, comprising a comprehensive network of 2952 neurons and 110,140 directed links26. Similar to the approach taken with the C. elegans worm, we establish specific sets of target nodes based on the functional neuron classification in this network. These classes are further categorized into three major groups: brain inputs, interneurons, and brain outputs. From a target control perspective, it is intuitive to consider the brain outputs as potentially more desirable for control.

In the case of C. elegans, when we consider muscles as the target control set, the SimpleTarget algorithm yields αD = 0.989 (as shown in Fig. 2). Shifting our focus to the Drosophila brain network, we select the brain output neuron classes as the target control sets, with the following outcomes: Ring Gland Neurons (RGN) yield αD = 0.391, Descending Neurons to Subesophageal Zone (DN-SEZ) yield αD = 0.043, and Descending Neurons to Ventral Nerve Cord (DN-VNC) yield αD = 0.115 (see Fig. 3). These results suggest that the Drosophila connectome is more optimized for the control of brain output neurons than the C. elegans connectome is for the muscle class. When assessed against the neutral expectation, DN-SEZ gives βD = 0.782, which is below the neutral threshold of 1. In the case of C. elegans, the result for muscles as the target control set is βD = 3.857, well above the neutral threshold of 1. The DN-VNC and RGN neuronal classes likewise generate values exceeding one, with βD = 1.88 and 21.39, respectively. While a precise quantitative comparison between the connectomes of both organisms proves challenging due to their intrinsic biological differences, a qualitative assessment suggests that C. elegans demonstrates superior efficiency in controlling input neurons related to sensory classes compared to Drosophila (see brain inputs and sensory classes in Figs. 2 and 3). Intuitively, it seems reasonable that αD for sensory neurons in Drosophila is large, since these neurons primarily receive stimuli from the external environment and may not have internal input edges. However, in C. elegans the small value of αD can be attributed to information transfer through sensory-sensory connections, or between sensory nodes and other neurons, which gives rise to a higher abundance of loops. The prevalence of loops in this particular class may contribute to the enhanced efficiency of input neuron control observed in C. elegans as compared to Drosophila. In contrast, optimization of interneuron control is observed in both Drosophila and C. elegans, requiring minimal external control nodes (see Fig. 4), which originates from the existence of loops. Furthermore, as discussed above, when evaluating output neurons, the Drosophila connectome appears to exhibit greater efficiency than its C. elegans counterpart.

Figure 2.


Computed αD and βD metrics for each functional neuron class considered as controllability target set. Notably, muscle neurons display a discernible trend of being accessible to targeting by a set of driver nodes, which is plausible for real-world applications.

Figure 3.


Computed αD and βD metrics for each functional neuron class considered as controllability target set in the brain of the fruit fly insect (Drosophila). All classes can be categorized into three main groups: input neurons, interneurons, and output neurons. Many classes exhibit controllability by a single external control node (PD = 1) or few nodes, reflecting the existence of loops in the internal network structure. Notably, all three classes belonging to the output neuron group (RGN, DN-SEZ, DN-VNC) can be target-controlled by a larger set of driver nodes. The dashed red line represents βD = 1.

Figure 4.


Visualization of the entire Drosophila connectome with highlighted target control sets (red) and the driver node set (blue). (a) The target control set of interneurons. (b) The target control set of brain output neurons. Notably, the brain output neurons are mapped into more distributed locations and require more driver nodes. In contrast, interneurons are more interconnected, which favors the presence of loops.

The elevated αD observed in sensory nodes of Drosophila is due to their limited number of input edges. Conversely, in C. elegans, the high αD observed in muscles is a consequence of their sparse output edges. Although the result, a large αD, is the same, the underlying mechanisms differ between the two scenarios. This nuanced contrast, where a high αD may arise from either a scarcity of input edges or a shortage of output edges (see Fig. 5), adds an interesting dimension to the comparative analysis.

Figure 5.


In the example, both (a) and (b) have PD = 5, ND = 5, so αD= 1. However, (a) has output nodes as the target, while (b) has input nodes as the target.

Notably, it is intriguing to observe that several target node sets linked to functional categories, such as interneurons, within C. elegans and Drosophila exhibit PD = 1. This observation suggests that these target neuronal sets themselves inherently possess controllability. That is, these subsets of neurons contain loops and, therefore, can be controlled by a single external node.

To further investigate the topological features of the identified driver nodes, we compared the mean degree of driver nodes < kD > to the mean degree of each functional neuron class (i.e., target node set) < kT > in the C. elegans neuronal network and the Drosophila brain connectome. The results shown in Fig. S2 (Supplementary Information) indicate that relatively low-degree nodes are used to control specific target systems. In other words, the driver nodes tend to avoid hubs.

Results for the analysis of metabolic networks from 70 plant species

In plant metabolic networks, we may also set specific pathways or enzyme classes as target nodes27,28. Figure 6 displays the αD and βD metrics computed for plant metabolic networks. The target sets represent specific functional pathways and the enzymes/reactions associated with them. An intriguing trend is observed, suggesting an evolutionary tendency. The αD and βD metrics appear to increase as we move from the pathways of eudicots/monocots (more modern) to those of basal plants and green algae (more primitive). Moreover, overall, the values of αD are much smaller compared to those of connectome networks, suggesting an abundance of loops.

Figure 6.


Computed αD and βD for plant metabolic networks. The target sets are specific functional pathways and the enzymes/reactions involved in them. Colors denote the four major lineages. It seems there is an evolutionary tendency because αD and βD metrics tend to increase in most pathways from eudicots/monocots (most modern) plants to basal plants and green algae (more primitive).

On the other hand, Fig. 7 shows a scatter plot that illustrates the relationship between the αD metric and f calculated for the full set of plant metabolic networks. In the left panel, each dot corresponds to a pathway, resulting in a total of 70 × 14 data points, with colors representing the main lineage types. The right panel uses colors to differentiate the data points based on the functional class of the pathways. While it may not be apparent that plants from major lineage groups follow a distinct pattern (Fig. 7(left)), the scatter plot that includes indications for functional pathways reveals a more clustered trend (Fig. 7(right)). In other words, pathways with the same biological functions tend to exhibit similar αD versus f values. Moreover, most of them tend to be more efficient than the neutral expectation (αD < f), with some samples of specialized metabolism being exceptions.

Figure 7.


Scatter plot of the αD metric versus f computed for plant metabolic networks. (Left) Each dot corresponds to a pathway and there are 70 plants, so we have 70 × 14 data points. They are colored by main lineage types. (Right) The dots are colored according to the pathway functional class indicated in the legend.

To gain deeper insight into the network features of the identified driver nodes, we compared the mean degree of driver nodes < kD > to the mean degree of functional pathways (i.e., target node sets) < kT > in the analyzed plant metabolic networks. Similar to the results observed in neural networks (Fig. S2), the findings shown in Fig. S3 (Supplementary Information) suggest that high-degree nodes are not commonly selected as driver nodes for controlling specific plant metabolic pathways. This observation becomes clearer when comparing the degree distributions of driver nodes with those of the target nodes associated with energy metabolism. As shown in Fig. S4 (Supplementary Information file), the degree distribution of plant metabolic networks across main lineages follows a power-law distribution. This implies that a small fraction of nodes are highly connected (i.e., hubs). Interestingly, our results show that these nodes are typically not chosen as driver nodes (see Figs. S5–S7 in the Supplementary Information). Instead, most sets of driver nodes are small in size and tend to exhibit low or medium-degree values.

Discussion and conclusion

In this work, we proposed a simple yet efficient algorithm for target control based on the path cover problem. The algorithm was applied to conduct an extensive data analysis of real-world biological networks, focusing on specific sets of functional nodes as target control sets. Notably, our study used the recently available comprehensive data on the entire larval brain of the Drosophila insect, offering a unique comparison with other neuronal systems such as that of the C. elegans worm, along with plant metabolic networks from various major lineages.

In conclusion, the observed large αD for sensory neurons in Drosophila aligns with their primary role in receiving stimuli from the external environment, potentially indicating a lack of internal input edges. Conversely, the small αD in C. elegans suggests efficient information transfer within sensory-sensory connections, fostering a higher abundance of loops and interactions with other neurons. This prevalence of loops likely contributes to the superior efficiency of input neuron control in C. elegans compared to that in Drosophila. While both organisms demonstrate optimized interneuron control, requiring minimal external control nodes due to the presence of loops, Drosophila’s connectome exhibits greater efficiency in output neurons than its C. elegans counterpart, highlighting varied control strategies in different neuronal classes.

In the analysis of metabolic pathways, an evolutionary trend is evident as both αD and βD metrics consistently decrease across most pathways. This trend is observed from more primitive organisms, such as basal plants and green algae, towards the more modern eudicots/monocots plants, with specialized metabolism possibly being an exception. The decrease in αD suggests an enhancement in target control efficiency across evolution. Furthermore, across lineages, almost all pathways exhibit βD values smaller than 1, indicating that target control for each pathway is more efficient than the neutral expectation. Notably, this efficiency appears to have been enhanced throughout evolution.

As mentioned above, some specialized metabolic pathways show an exception to the observed trend. Specialized metabolic pathways, also known as secondary metabolism, exhibit greater diversity compared to primary metabolic pathways for several reasons. They generate structurally varied compounds linked to specific ecological roles such as defense, signaling, and adaptation to environmental stresses, resulting in higher chemical complexity29,30. Unlike the primary metabolism, secondary metabolism shows increased complexity, with pathways forming distinct modules that can evolve independently, leading to significant diversity31. Furthermore, enzymes involved in specialized metabolism often display high substrate specificity, and gene duplication followed by divergence enhances the evolution and complexity of these pathways32. These significant differences between secondary metabolism and primary metabolism may have reshaped the topology of these pathways, influencing their controllability in distinct ways, as observed in our study.

In summary, by combining a novel and efficient algorithm for the target control problem with data analysis of connectomes and plant metabolic pathways, our data-driven study provides insights into control features and strategies for achieving precise target control of specific functional segments in complex neuronal networks and metabolic pathways. The approach, with its easy implementation and fast computation, holds promise for future applications in diverse biological networks and genome-scale pathways.

Methods

Datasets

The neural network of the C. elegans worm comprises 378 nodes and 5256 directed links, encompassing diverse functional neuron classes such as motor, interneuron, poly-modal, sensory, and muscle neurons. It was collected from the WormAtlas database25. In our analysis, we categorize each of these classes as a set of target nodes for control objectives. Additionally, we utilize another dataset extracted from the recently unveiled connectome of the larval brain of the Drosophila melanogaster insect. This connectome presents a comprehensive network featuring 2952 neurons and 110,140 directed links and was downloaded from the supplementary material of Ref. 26. Note that multi-links connecting the same pairs of neurons were excluded, and each directed edge in our analysis was assigned a weight of one. The data for plant metabolic networks are publicly available in the PMN database28. We constructed enzyme-reaction-centric metabolic networks for 70 plant species, encompassing four major lineages: 2 algae, 6 basal land plants, 12 monocots, and 50 eudicots.

SimpleTarget algorithm

Our proposed algorithm is quite simple. The algorithm is called SimpleTarget and its pseudocode is given below.

Step 1: Add self-loops to all non-target nodes.

Step 2: Determine the minimum number of driver nodes using the maximum matching-based algorithm in Ref. 1.

This algorithm works in polynomial time. The algorithm is illustrated in Fig. 8a, and some graph examples are also shown in Fig. 8b–f.
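The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released implementation: the function names are hypothetical, and the maximum matching is computed with a simple recursive augmenting-path search (Kuhn's algorithm) chosen for brevity, which may differ from the matching routine used in the published code and is not suited to very large graphs because of Python's recursion limit.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def maximum_matching(nodes: List[Node], edges: Iterable[Edge]) -> Dict[Node, Node]:
    """Augmenting-path maximum matching on the bipartite copy of a digraph.

    Returns a dict mapping each matched right-side copy v to the left-side
    copy u it is matched with, i.e. the directed edge u -> v is in the matching.
    """
    succ: Dict[Node, List[Node]] = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_from: Dict[Node, Node] = {}

    def augment(u: Node, seen: Set[Node]) -> bool:
        # Try to match left copy u, re-routing earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_from or augment(matched_from[v], seen):
                matched_from[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    return matched_from


def simple_target(nodes: List[Node], edges: Iterable[Edge], targets: Set[Node]) -> Set[Node]:
    """Sketch of SimpleTarget: returns one minimum set of driver nodes."""
    edge_set = set(edges)
    # Step 1: add a self-loop to every non-target node.
    edge_set |= {(v, v) for v in nodes if v not in targets}
    # Step 2: nodes whose right-side copy is unmatched are the driver nodes.
    matched_from = maximum_matching(nodes, edge_set)
    return {v for v in nodes if v not in matched_from}


# Hypothetical toy graph: node 1 points to nodes 2 and 3.
star_nodes = [1, 2, 3]
star_edges = [(1, 2), (1, 3)]
print(len(simple_target(star_nodes, star_edges, {1, 2, 3})))  # full control: 2
print(len(simple_target(star_nodes, star_edges, {2})))        # target {2}: 1
```

Which particular node ends up unmatched can depend on the search order, so only the size of the returned set should be compared across runs. If the returned set is empty (every node is covered by cycles), Ref. 1 still counts one driver node by convention.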

Figure 8.

(a) Illustration of the proposed algorithm for target controllability. Step 1 adds self-loops to all non-target nodes. Step 2 determines the minimum set of driver nodes using the maximum matching-based algorithm of Ref. 1. (b–e) Several examples of directed networks for which our algorithm identifies a solution. (f) An illustrative graph instance in which the number of driver nodes PD is one. Interestingly, we observed that several target node sets associated with functional categories of C. elegans and Drosophila exhibit PD = 1 (see Figs. 2 and 3).

For completeness, we briefly explain the maximum matching-based algorithm in Ref. 1. We construct a bipartite graph $BG(V_L, V_R, E_B)$ from $G(V, E)$ by $V_L = \{v_i^L \mid v_i \in V\}$, $V_R = \{v_i^R \mid v_i \in V\}$, and $E_B = \{(v_i^L, v_j^R) \mid (v_i, v_j) \in E\}$ (see Fig. 8a). A subset of edges $M \subseteq E_B$ is called a matching (in a bipartite graph) if no two edges in $M$ share an endpoint. A maximum matching is a matching with the maximum number of edges. It is well known that a maximum matching can be computed for a bipartite graph (and also for an undirected graph) in polynomial time33. Finally, the unmatched nodes in $V_R$ correspond to the driver nodes, which should be controlled by distinct external nodes.
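As a small worked illustration of this construction, consider a hypothetical three-node graph (not one of the networks analysed here) with edges from v1 to v2 and from v1 to v3. The first part treats every node as a target; the second part assumes the target set is {v2}, so that self-loops are added to v1 and v3 as in Step 1 of SimpleTarget.

```latex
\begin{align*}
&\textbf{All nodes are targets:}\\
&E_B = \{(v_1^L, v_2^R),\ (v_1^L, v_3^R)\}, \qquad
  M = \{(v_1^L, v_2^R)\}, \quad |M| = 1,\\
&\text{unmatched nodes in } V_R:\ \{v_1^R, v_3^R\}
  \;\Rightarrow\; \text{two driver nodes.}\\[6pt]
&\textbf{Target set } \{v_2\} \textbf{ (self-loops added to } v_1 \text{ and } v_3\textbf{):}\\
&E_B' = E_B \cup \{(v_1^L, v_1^R),\ (v_3^L, v_3^R)\}, \qquad
  M' = \{(v_1^L, v_2^R),\ (v_3^L, v_3^R)\}, \quad |M'| = 2,\\
&\text{unmatched nodes in } V_R:\ \{v_1^R\}
  \;\Rightarrow\; \text{one driver node.}
\end{align*}
```

In the first case both edges share the endpoint v1 on the left side, so no matching can contain more than one of them. In the second case the self-loop on v3 absorbs the non-target node, and a single input at v1 reaches the target v2.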

The detailed pseudocode of the algorithm is also given as follows:

[Figure: pseudocode of Algorithm SimpleTarget]

Note that the full code, including the code implementing the maximum matching-based algorithm of Ref. 1, is publicly available on GitHub. See the Data Availability section for details.

Theorem 1

Algorithm SimpleTarget solves the target structural controllability problem (in the sense of the path cover problem shown in Ref. 22) in polynomial time.

Proof

It is straightforward to observe that the algorithm works in polynomial time (since the algorithm in Ref. 1 operates in polynomial time). It therefore remains to prove that SimpleTarget always finds an optimal path cover (i.e., a path cover with the minimum number of paths). Note that the algorithm in Ref. 1 always finds a path cover with the minimum number of paths for a given directed graph when the target set S satisfies S = V.

In this proof, let G(V, E) and S0 be the input graph and the target node set in the original path cover problem, respectively. First, let G(V, E′) be the directed graph obtained from G(V, E) by adding self-loops to all nodes not appearing in S0. Let (P0, Q0) be an optimal solution found by the algorithm in Ref. 1 for G(V, E′). Then, we remove from Q0 the self-loops of the nodes not appearing in S0. Let Q1 be the resulting set of cycles. Consequently, (P0, Q1) forms a path cover for G(V, E) and S0. This means that SimpleTarget gives a solution (not yet shown to be optimal) for the path cover problem.

Next, let (P, Q) be an optimal solution for the path cover problem for G(V, E) and S0. Then, we add self-loops to the nodes not appearing in P ∪ Q. Let G(V, E′′) be the resulting graph. Given that all target nodes appear in P ∪ Q, E′′ ⊆ E′ holds. Let Q′ be the set of added self-loops. Then, (P, Q ∪ Q′) is clearly a path cover for G(V, E′) with S = V. As (P0, Q0) is an optimal path cover for G(V, E′) with S = V, |P0| ≤ |P| holds. Therefore, (P0, Q1) is an optimal path cover for G(V, E) and S0, which means that SimpleTarget provides an optimal solution for the path cover problem. q.e.d.
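The last step of the proof combines two inequalities on the number of paths. Written out explicitly, with the same symbols as in the proof, the argument reads as follows.

```latex
\begin{align*}
|P_0| &\le |P|
  && \text{$(P, Q \cup Q')$ is a path cover of $G(V, E')$ with $S = V$,
     and $(P_0, Q_0)$ is optimal for that instance;}\\
|P| &\le |P_0|
  && \text{$(P_0, Q_1)$ is a path cover of $G(V, E)$ with target set $S_0$,
     and $(P, Q)$ is optimal for that instance;}\\
\Rightarrow\quad |P_0| &= |P|.
\end{align*}
```

Hence the cover (P0, Q1) returned by SimpleTarget uses exactly as many paths as an optimal cover of the original instance.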

Acknowledgements

T.A. was partially supported by JSPS KAKENHI Grant Numbers 22H00532 and 22K19830. This research was also supported in part by the Research Collaboration Projects of the Institute for Chemical Research, Kyoto University (2024-32).

Author contributions

W.S. and J.C.N. conducted the principal research. W.S. analysed the empirical datasets, implemented the algorithms and prepared figures. J.C.N. and T.A. designed research, contributed to new analytical and theoretical tools, and wrote the paper. All authors have read and approved the final manuscript.

Data availability

The custom code for the algorithm developed in this study is publicly available from: https://github.com/wataru-s1/SimpleTarget. Our study did not generate any new biological dataset; therefore, the manuscript does not report biological data generation. The data used in this study is publicly available from the WormAtlas25 and the PMN28 databases, and from a previous publication (Supplemental Material section) referenced in the text26: 10.1126/science.add9330

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-67442-7.

References

  • 1. Liu, Y.-Y., Slotine, J.-J. & Barabási, A.-L. Controllability of complex networks. Nature 473, 167–173 (2011). 10.1038/nature10011
  • 2. Nacher, J. C. & Akutsu, T. Dominating scale-free networks with variable scaling exponent: Heterogeneous networks are not difficult to control. New J. Phys. 14, 073005 (2012). 10.1088/1367-2630/14/7/073005
  • 3. Mochizuki, A., Fiedler, B., Kurosawa, G. & Saito, D. Dynamics and control at feedback vertex sets. II: A faithful monitor to determine the diversity of molecular activities in regulatory networks. J. Theoret. Biol. 335, 130–146 (2013). 10.1016/j.jtbi.2013.06.009
  • 4. Wuchty, S. Controllability in protein interaction networks. Proc. Natl. Acad. Sci. USA 111, 7156–7160 (2014). 10.1073/pnas.1311231111
  • 5. Basler, G., Nikoloski, Z., Larhlimi, A., Barabási, A.-L. & Liu, Y.-Y. Control of fluxes in metabolic networks. Genome Res. 26, 956–968 (2016). 10.1101/gr.202648.115
  • 6. Ishitsuka, M., Akutsu, T. & Nacher, J. C. Critical controllability in proteome-wide protein interaction network integrating transcriptome. Sci. Rep. 6, 23541 (2016). 10.1038/srep23541
  • 7. Wuchty, S. et al. Proteome data improves protein function prediction in the interactome of Helicobacter pylori. J. Mol. Cell. Proteom. 17(5), 961 (2018). 10.1074/mcp.RA117.000474
  • 8. Guo, W.-F. et al. A novel network control model for identifying personalized driver genes in cancer. PLoS Comput. Biol. 15(11), e1007520 (2019). 10.1371/journal.pcbi.1007520
  • 9. Schwartz, J. M., Otokuni, H., Akutsu, T. & Nacher, J. C. Probabilistic controllability approach to metabolic fluxes in normal and cancer tissues. Nat. Commun. 10, 2725 (2019). 10.1038/s41467-019-10616-z
  • 10. Wang, P. et al. Deciphering driver regulators of cell fate decisions from single-cell transcriptomics data with CEFCON. Nat. Commun. 14, 8459 (2023). 10.1038/s41467-023-44103-3
  • 11. Vinayagam, A. et al. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc. Natl. Acad. Sci. USA 113(18), 4976–4981 (2016). 10.1073/pnas.1603992113
  • 12. Yan, G. et al. Network control principles predict neuron function in the Caenorhabditis elegans connectome. Nature 550, 519–523 (2017). 10.1038/nature24056
  • 13. Pan, C. et al. Control analysis of protein-protein interaction network reveals potential regulatory targets for MYCN. Front. Oncol. 11, 633579 (2021). 10.3389/fonc.2021.633579
  • 14. Liu, X. & Pan, L. Identifying driver nodes in the human signaling network using structural controllability analysis. IEEE/ACM Trans. Comput. Biol. Bioinform. 12(2), 467–472 (2015). 10.1109/TCBB.2014.2360396
  • 15. Sun, P. G. Co-controllability of drug-disease-gene network. New J. Phys. 17, 085009 (2015). 10.1088/1367-2630/17/8/085009
  • 16. Kanhaiya, K. et al. NetControl4BioMed: A pipeline for biomedical data acquisition and analysis of network controllability. BMC Bioinform. 19(Suppl. 7), 185 (2018). 10.1186/s12859-018-2177-3
  • 17. Wu, L., Li, M., Wang, J.-X. & Wu, F.-X. Controllability and its applications to biological networks. J. Comput. Sci. Technol. 34(1), 16–34 (2019). 10.1007/s11390-019-1896-x
  • 18. Liu, S., Xu, Q., Chen, A. & Wang, P. Structural controllability of dynamic transcriptional regulatory networks for Saccharomyces cerevisiae. Phys. A Stat. Mech. Appl. 537, 122772 (2020). 10.1016/j.physa.2019.122772
  • 19. Zhihua, C., Siyuan, C. & Xiaoli, Q. Identification of biomarker in brain-specific gene regulatory network using structural controllability analysis. Front. Bioinform. 2, 812314 (2022). 10.3389/fbinf.2022.812314
  • 20. Gao, J. et al. Target control of complex networks. Nat. Commun. 5, 5415 (2014). 10.1038/ncomms6415
  • 21. Czeizler, E. et al. Structural target controllability of linear networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 15(4), 1217–1228 (2018). 10.1109/TCBB.2018.2797271
  • 22. Li, G. et al. Target control of directed networks based on network flow problems. IEEE Trans. Control Netw. Syst. 7(2), 673–685 (2020). 10.1109/TCNS.2019.2939641
  • 23. Shinzawa, Y., Akutsu, T. & Nacher, J. C. Uncovering and classifying the role of driven nodes in control of complex networks. Sci. Rep. 11, 9627 (2021). 10.1038/s41598-021-88295-4
  • 24. Varshney, L. R., Chen, B. L., Paniagua, E., Hall, D. H. & Chklovskii, D. B. Structural properties of the C. elegans neuronal network. PLoS Comput. Biol. 3(7), e1001066 (2011). 10.1371/journal.pcbi.1001066
  • 25. Altun, Z. F. et al. (eds) 2002–2023. WormAtlas: https://www.wormatlas.org/neuronalwiring.html.
  • 26. Winding, M. et al. The connectome of an insect brain. Science 379, 6636 (2023). 10.1126/science.add9330
  • 27. Chae, L., Kim, T., Nico-Poyanco, R. & Rhee, S. Y. Genomic signatures of specialized metabolism in plants. Science 344, 510–513 (2014). 10.1126/science.1252076
  • 28. Rhee, S. Y. Plant Metabolic Network Database (PMN): https://www.plantcyc.org.
  • 29. Wink, M. Biochemistry of plant secondary metabolism. Ann. Plant Rev. 40 (2010).
  • 30. Firn, R. D. & Jones, C. G. Natural products–a simple model to explain chemical diversity. Nat. Prod. Rep. 20(4), 382–391 (2003). 10.1039/b208815k
  • 31. Weng, J.-K., Philippe, R. N. & Noel, J. P. The rise of chemodiversity in plants. Science 336(6089), 1667–1670 (2012). 10.1126/science.1217411
  • 32. Ober, D. Seeing double: Gene duplication and diversification in plant secondary metabolism. Trends Plant Sci. 10(9), 444–449 (2005). 10.1016/j.tplants.2005.07.007
  • 33. Cormen, T. H. et al. Introduction to Algorithms 4th edn. (MIT Press, 2022).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
