Feature Learning and Network Structure from Noisy Node Activity Data

Junyao Kuang; Caterina Scoglio; Kristin Michel

doi:10.1103/PhysRevE.106.064301

Author manuscript; available in PMC: 2023 Jan 23.

Published in final edited form as: Phys Rev E. 2022 Dec;106(6-1):064301. doi: 10.1103/PhysRevE.106.064301


PMCID: PMC9869472 NIHMSID: NIHMS1855463 PMID: 36671154

Abstract

In the study of network structures, much attention has been devoted to developing approaches to reconstruct networks and predict missing links when edge-related information is given. However, such approaches are not applicable when we are only given noisy node activity data with missing values. This work presents an unsupervised learning framework to learn node vectors and construct networks from such node activity data. First, we design a scheme to generate random node sequences from node context sets, which are generated from node activity data. Then, a three-layer neural network is trained on the node sequences to obtain node vectors, which allow us to construct networks and capture nodes with synergistic roles. Furthermore, we present an entropy-based approach to select the most meaningful neighbors for each node in the resulting network. Finally, the effectiveness of the method is validated through both synthetic and real data.

I. INTRODUCTION

A network is a system-level view of pairwise interactions between nodes, genes, or elements in a complex system [1–14]. The first step in analyzing a networked system is to construct the network from data obtained with different technologies. In most cases, network structures can be determined through direct measurements, meaning that pairwise relationships between nodes can be observed directly. For instance, the edges in friendship networks can be probed in various ways, including using questionnaires, checking Facebook or Twitter friendship, and investigating face-to-face interactions [15–19]. As another example, edges in web graphs can be directly determined by checking if hyperlinks exist between web pages. However, there are cases where the relationships between nodes cannot be observed directly [20]. Instead, we may only have node activity data that reflect the properties of nodes from various aspects. In these cases, we need to estimate the underlying network structure from nodal data. Such problems exist in many areas, including the construction of financial, biological, and climate networks [21–31]. In these areas, measurements of pairwise relationships are not always feasible [6, 20, 32]. Instead, we can conduct various experiments to measure node activities under different conditions [33].

This work develops a model to learn node representations from noisy and heterogeneous data and proposes an entropy-based method to extract network structures. Specifically, we investigate the problems of feature learning and network construction for gene co-expression data. Different high-throughput technologies, including microarray and RNA-sequencing, allow the expression of thousands of genes to be measured simultaneously. Usually, the data can be organized into a matrix that consists of rows representing N genes (nodes) and columns representing M experimental conditions. To construct a network from such expression data, we need to consider three problems. (1) The expression data, measured through different experimental technologies, are distributed in various ranges. For example, the raw expression values obtained from different versions of RNA-sequencing in different labs are dispersed from zero to tens of thousands and do not follow any specific distribution. (2) Missing values are frequently present in the dataset. Some experiments may only test a subset of genes for specific purposes, or some experimental data for some genes (nodes) are not available. (3) The levels of noise are not constant. For instance, environmental factors, such as humidity, temperature, and light intensity, could potentially influence the accuracy of the devices and the measured node activity data. A method for network construction should therefore be robust to missing values and noisy data.

There are diverse approaches to constructing networks from nodal data. A large body of work uses the correlation coefficient to measure the degree to which a pair of nodes is related, and edges are selected by thresholding the correlation coefficients [34–36]. However, the correlation methods have three drawbacks: 1) the expression data are required to follow a (quasi-)normal distribution, 2) the correlation coefficients are significantly affected by outliers, and 3) the number of measured conditions and missing values substantially affect the results [34, 35]. Mutual Information (MI) and its variants are also used to construct gene co-expression networks. The MI models do not require the data to follow the normal distribution. Still, the MI models are even more complex, since we are expected to find the joint probability distribution for every pair of genes [37]. We need to solve the problems mentioned above before applying either of the two methods. To solve problems (1) and (3), some researchers have proposed using rescaling and normalization methodologies to obtain quasi-normal distributed data from the raw node activity data [38]. According to [39], the number of experimental conditions significantly influences the correlation coefficients under the null hypothesis that two nodes are not correlated. Theoretical analysis shows that correlations based on ten conditions tend to be higher than those computed with 50 conditions. Missing values lead to node pairs with different numbers of paired elements, meaning that the node pairs with fewer paired elements are more likely to have high correlation coefficients. Therefore, some works use imputation or interpolation to solve problem (2) [40, 41]. These complex data processing procedures conflict with the principle of parsimony when we further study the resulting network structure [42].

Edge selection is another issue we need to consider when constructing networks from node activity data. Both correlation and MI methods return coefficients between −1 and 1. Many researchers construct unweighted networks by applying a threshold to select edges of the network corresponding to node pairs with the highest coefficients. However, choosing a threshold is always tricky since a high threshold could generate singleton nodes, while a low threshold generates networks with many weakly connected node pairs [38, 43]. Though the problem can be solved by fixing the minimum number of neighbors of each node, the choice of the threshold influences the node degree distribution, meaning that nodes’ roles in the resulting network are related to the choice of thresholds. As an alternative, we propose an entropy-based network construction method, which has better performance in maintaining nodes’ roles (e.g. hubs and leaf nodes) and avoiding isolating nodes.

This paper proposes a neural network-based method to extract node representations, and presents an entropy-based approach to construct networks from noisy node activity data. Inspired by the application of neural networks in natural language processing (NLP) [44–49], we propose generating node sequences from node activity data to simulate sentences in documents. The neural network model can embed node sequences into vectors of identical dimensions, which allow us to study node features and construct networks. The main contributions of the paper are as follows: First, we design a simple and direct data processing scheme to generate random node sequences from M conditions. In our approach, the raw data are not required to follow any specific distribution. Thus, re-scaling and normalization are obsolete. In addition, the M conditions are processed separately, meaning that negative impacts from missing data and outliers can be minimized. Second, the node sequences are trained with a three-layer neural network model, which builds on the hypothesis that nodes with similar properties tend to have similar neighbors [49]. As a result, similar nodes have similar values in the trained node vectors. Third, we propose an entropy-based method to extract the corresponding network where selected edges can recover node roles [50, 51]. Finally, we demonstrate the validity of the proposed approach experimentally using synthetic and real data.

II. APPROACH

In this section, we define the context set, node sequence generation, and the entropy-based method for network construction.

In human language, words in similar contexts tend to have similar meanings [44]. That is, words with similar meanings usually appear in similar neighborhoods. We can use NLP models to learn node representations if we have node sequences in which nodes with similar measurements are in similar contexts. The measurements of nodes in different conditions represent different properties, similar to words in various topics that may have different meanings. Building on these observations, we design a scalable node sequence generation strategy to process the M conditions separately.

A. Generate context sets from node activity data

Suppose the N nodes are measured in M conditions. Given a node υ_i (i ≤ N), we assume its value in the ωth condition is υ_i(ω). We define the context set of node υ_i in the ωth condition as:

C_{ω} (v_{i}) = \{v_{j} : | v_{j} (ω) - v_{i} (ω) | \leq δ_{i}^{ω}\} .

(1)

where the tolerance $δ_{i}^{ω}$ can be a parameter such that $δ_{i}^{ω} = β_{ω} v_{i} (ω)$ . By employing the parameter β_ω, we can tune the size of the context set per the error levels of different technologies. In this work, we skip the generation of context set C_ω(υ_i) when a missing value is present in the ωth condition for node υ_i, and we do not predict the missing values from other conditions.
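The definition in Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `context_sets`, the use of `None`/NaN to mark missing values, and the example values are assumptions. Read literally, Eq. 1 places υ_i in its own context set, and the sketch follows that reading.

```python
import math

def context_sets(values, beta):
    """Context sets C_w(v_i) for one condition, following Eq. 1.

    values: measurements of all nodes in this condition
            (None or NaN marks a missing value; this encoding is an assumption).
    beta:   relative tolerance, so that delta_i = beta * values[i].
    """
    def missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    contexts = {}
    for i, vi in enumerate(values):
        if missing(vi):
            continue  # no context set is generated for a missing value
        delta = beta * vi
        contexts[i] = {
            j for j, vj in enumerate(values)
            if not missing(vj) and abs(vj - vi) <= delta
        }
    return contexts

# Hypothetical condition with five nodes, one of them unmeasured.
example = context_sets([1000, 990, 950, None, 1200], beta=0.045)
```

Because the tolerance scales with υ_i(ω), membership is not symmetric: in this example node 0 belongs to the context set of node 1, while node 2 belongs to the context set of node 1 but not to that of node 0.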

Formally, the context set of node υ_i is composed of nodes with measurements falling in the range $[v_{i} (ω) - δ_{i}^{ω}, v_{i} (ω) + δ_{i}^{ω}]$ . Therefore, the number of elements of the intersection set C_ω(υ_i) ∩ C_ω(υ_k) is related to the measurements of the two nodes υ_i and υ_k. For example, assume the measurements of the three nodes υ_x, υ_y, and υ_z in the ωth condition are respectively υ_x(ω) = 1000, υ_y(ω) = 990, and υ_z(ω) = 950. It is clear that υ_y(ω) is closer to υ_x(ω) than υ_z(ω), i.e., |υ_x(ω) − υ_y(ω)| < |υ_x(ω) − υ_z(ω)|. Therefore, we have the following inequality:

| C_{ω} (v_{x}) \cap C_{ω} (v_{y}) | \geq | C_{ω} (v_{x}) \cap C_{ω} (v_{z}) |,

(2)

where | · | denotes the cardinality of the intersection set. The context set C_ω(υ_y) recapitulates more elements of C_ω(υ_x) than the context set C_ω(υ_z). For any node υ_j ∈ C_ω(υ_x), we have the probability

P (v_{j} \in C_{ω} (v_{y})) \geq P (v_{j} \in C_{ω} (v_{z})) .

(3)

Furthermore, we assume G_ω(υ_i) is a set consisting of nodes whose context set contains node υ_i, such that

G_{ω} (v_{i}) = \{v_{j} : v_{i} \in C_{ω} (v_{j})\} .

(4)

Based on the same example above, we can say that there are more context sets containing simultaneously υ_x and υ_y than containing simultaneously υ_x and υ_z. Therefore, we have

| G_{ω} (v_{x}) \cap G_{ω} (v_{y}) | \geq | G_{ω} (v_{x}) \cap G_{ω} (v_{z}) |,

(5)

meaning that the nodes with closer values are more likely to be present in the same context sets. Similarly, for any node υ_j ∈ G_ω(υ_x), we have the probability

P (v_{j} \in G_{ω} (v_{y})) \geq P (v_{j} \in G_{ω} (v_{z})) .

(6)

In the generation of node sequences, we always sample the subsequent node from the context set of the current node. For example, given a node sequence l, suppose the ith node is υ_x, i.e., l_i = υ_x. Then, we have a node sequence

\{\dots, l_{i - 1} \in G_{ω} (v_{x}), l_{i} = v_{x}, l_{i + 1} \in C_{ω} (v_{x}), \dots\} .

(7)

Based on Eq. 3 and Eq. 6, l_{i+1} tends to be in C_ω(υ_y) with higher probability than in C_ω(υ_z), and l_{i−1} is more likely to be in G_ω(υ_y) than in G_ω(υ_z). That is, the context nodes of υ_x tend to be the context nodes of υ_y rather than υ_z, since υ_y(ω) is closer to υ_x(ω) than υ_z(ω). Therefore, in the generated node sequences, we can say that nodes with closer values tend to appear in similar contexts.
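The overlap argument of Eqs. 2 and 5 can be checked numerically on the running example (υ_x = 1000, υ_y = 990, υ_z = 950). The sketch below is illustrative only: the evenly spaced filler nodes and the value of `beta` are hypothetical and were chosen merely to populate the context sets.

```python
def context(values, i, beta):
    """C_w(v_i) per Eq. 1, with delta_i = beta * values[i]."""
    return {j for j, vj in enumerate(values)
            if abs(vj - values[i]) <= beta * values[i]}

def reverse_context(values, i, beta):
    """G_w(v_i) per Eq. 4: nodes whose context set contains v_i."""
    return {j for j in range(len(values)) if i in context(values, j, beta)}

# x, y, z from the text, followed by hypothetical filler nodes 900, 910, ..., 1100.
values = [1000, 990, 950] + list(range(900, 1101, 10))
x, y, z = 0, 1, 2
beta = 0.055  # assumed tolerance factor

c_xy = len(context(values, x, beta) & context(values, y, beta))
c_xz = len(context(values, x, beta) & context(values, z, beta))
g_xy = len(reverse_context(values, x, beta) & reverse_context(values, y, beta))
g_xz = len(reverse_context(values, x, beta) & reverse_context(values, z, beta))

print(c_xy >= c_xz, g_xy >= g_xz)  # the inequalities of Eq. 2 and Eq. 5
```

For this configuration both intersections involving υ_y are larger than those involving υ_z, in line with the claim that nodes with closer values share more context.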

B. Generate random node sequences

The simplest way to generate node sequences from context sets would be to randomly sample the next node from the context set of the current node, which corresponds to a first-order Markov chain [52]. Assuming the ith node of a node sequence is l_i, the next node l_{i+1} ∈ C_ω(l_i) is chosen with probability

p (l_{i + 1} ∣ l_{i}) = \frac{1}{| C_{ω} (l_{i}) |} .

(8)

Under this assumption, the nodes in the context set C_ω(l_i) have an equal probability of being chosen as the subsequent node.
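A uniform walk of this kind might look as follows. The sketch assumes the context sets are stored in a dictionary keyed by node, and the toy `contexts` mapping is hypothetical; it is not the authors' code.

```python
import random

def uniform_walk(contexts, start, length, rng):
    """First-order walk: the next node is drawn uniformly from C_w(current), as in Eq. 8."""
    walk = [start]
    while len(walk) < length:
        candidates = sorted(contexts[walk[-1]])  # sorted for reproducibility
        if not candidates:
            break  # dead end: the current node has an empty context set
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical context sets for three nodes in one condition.
contexts = {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"b", "c"}}
rng = random.Random(0)
walk = uniform_walk(contexts, "a", 8, rng)
```

Every consecutive pair in `walk` satisfies l_{i+1} ∈ C_ω(l_i) by construction.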

Alternatively, we can generate biased random node sequences. Suppose we have just traversed node l_{i−1}, and now we reside at node l_i. The probability of sampling the next node l_{i+1} is biased by the previous node l_{i−1}. Therefore, we introduce a parameter ρ, and the unnormalized probability of the next node is

p (l_{i + 1} ∣ l_{i}, l_{i - 1}) = \begin{cases} 1 & \text{if } l_{i + 1} \in C_{ω} (l_{i - 1}) \cup \{l_{i - 1}\} \\ ρ & \text{otherwise,} \end{cases}

(9)

where l_{i+1} ∈ C_ω(l_i). The sampling strategy is similar to a second-order Markov chain [52], in which the probability of adding the next node is influenced not only by the current node but also by the previous node. A low value of ρ boosts the rate of sampling an element from C_ω(l_{i−1}). On the contrary, a high value of ρ raises the probability of exploring a node far from l_{i−1}, i.e., sampling a node in C_ω(l_i) but not in C_ω(l_{i−1}). If ρ = 1, Eq. 9 is equivalent to Eq. 8.
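One possible reading of Eq. 9 as code is given below. The helper names `step_weights` and `biased_step` and the toy context sets are hypothetical. The sketch assumes ρ > 0 or at least one candidate close to the previous node, since `random.choices` rejects weights that sum to zero.

```python
import random

def step_weights(contexts, prev, cur, rho):
    """Unnormalized weights of Eq. 9 for every candidate in C_w(cur)."""
    near_prev = contexts[prev] | {prev}
    return {c: (1.0 if c in near_prev else rho) for c in sorted(contexts[cur])}

def biased_step(contexts, prev, cur, rho, rng):
    """Draw the next node with probability proportional to its Eq. 9 weight."""
    weights = step_weights(contexts, prev, cur, rho)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[c] for c in nodes], k=1)[0]

# Hypothetical context sets; the walk came from "a" and now sits at "b".
contexts = {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"b", "c"}}
rng = random.Random(0)
next_node = biased_step(contexts, prev="a", cur="b", rho=0.25, rng=rng)
```

With ρ = 1 all weights equal one and the step reduces to the uniform draw of Eq. 8.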

In the ωth condition, we generate K random node sequences starting from each node. Repeating the process for all the M conditions, we obtain a corpus T containing KNM − KZ node sequences, where Z represents the number of missing values.
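The corpus size follows from simple counting: each observed (node, condition) pair starts K sequences, and each of the Z missing values removes K of them. A small sketch (the function name `corpus_size` is hypothetical) applies this to the synthetic settings used later in Sec. III, i.e., K = 10 and M = 6 with N = 5000 or N = 5500.

```python
def corpus_size(k, n, m, z=0):
    """Number of node sequences: K walks per observed (node, condition) pair."""
    return k * n * m - k * z

print(corpus_size(10, 5000, 6))  # first case study, no missing values
print(corpus_size(10, 5500, 6))  # second case study, no missing values
```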

The goal of generating random node sequences is to feed the corpus T to a three-layer neural network to obtain node vectors [47–49, 53, 54]. Please refer to Appendix A for more information about the neural network model.

C. Construct network from trained node vectors

After training the neural network model, we obtain N vectors for the N nodes. With the node vectors, we can predict relationships between the nodes, visualize the global structures of the nodes, and construct a corresponding network.

A conventional way to select the edges is to apply a global threshold to the cosine similarities to filter out weak links and obtain a backbone of the underlying network. Globally thresholding edges (GTE) is widely used in determining gene co-expression networks [36]. However, the drawback of GTE is that some nodes could be isolated from the network if the threshold is high. Though we can force isolated nodes to connect to some other nodes, the degree distribution of the constructed network is still affected by the selection of the threshold, meaning that the roles of nodes in the network are sensitive to the choice of threshold. To avoid these issues, we propose a Rényi entropy-based method (REM) to extract a network from the trained node vectors [50, 55, 56, 58, 59].

Once we have the node vectors, we can compute the cosine similarity to quantify the connection strength for each pair of nodes. Here, we define S⁰(υ_i) as the initial neighbor set of node υ_i. S⁰(υ_i) is composed of nodes that are positively similar to υ_i, i.e., S⁰(υ_i) = {υ_j : s(υ_i, υ_j) > 0}, where s(υ_i, υ_j) is the cosine similarity between υ_i and υ_j. The network constructed from S⁰(υ_i), ∀i ≤ N is not helpful in real applications because most node pairs are weakly connected.
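The initial neighbor set can be computed directly from the trained vectors. The following sketch uses plain Python lists as vectors; the function names and the toy vectors are hypothetical, and excluding υ_i from its own neighbor set is an assumption that the text does not state explicitly.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def initial_neighbors(vectors, i):
    """S^0(v_i): nodes with positive cosine similarity to node i (self excluded by assumption)."""
    neighbors = {}
    for j, vec in enumerate(vectors):
        if j == i:
            continue
        s = cosine(vectors[i], vec)
        if s > 0:
            neighbors[j] = s
    return neighbors

# Hypothetical 2D node vectors.
vectors = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0], [0.0, 1.0]]
s0 = initial_neighbors(vectors, 0)
```

Here only node 1 has a strictly positive similarity to node 0, so it is the single member of the initial neighbor set.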

Inspired by the application of entropy in ecology, we regard the nodes in the set S⁰(υ_i) as the states of the system υ_i. Then, we associate each state with a probability, which is computed from the similarity values, such that:

{\bar{s}}_{i}^{0} (v_{j}) = \frac{s (v_{i}, v_{j})}{\sum_{v_{z} \in S^{0} (v_{i})} s (v_{i}, v_{z})} .

(10)

In information theory, entropy depicts the diversity and randomness of a system [51]. The Rényi entropy for node υ_i with order α is

H_{α}^{1} (v_{i}) = \frac{1}{1 - α} \ln \sum_{v_{j} \in S^{0} (v_{i})} {({\bar{s}}_{i}^{0} (v_{j}))}^{α},

(11)

where α > 0. Note that the Rényi entropy converges to the Shannon entropy in the case α → 1, i.e., $\lim_{α \to 1} H_{α}^{1} (v_{i}) = - \sum_{v_{j} \in S^{0} (v_{i})} {\bar{s}}_{i}^{0} (v_{j}) \ln {\bar{s}}_{i}^{0} (v_{j})$ . For any α, the entropy $H_{α}^{1} (v_{i})$ varies from zero to ln |S⁰(υ_i)|. In the case of a certain event, i.e., ∃ υ_j ∈ S⁰(υ_i) such that ${\bar{s}}_{i}^{0} (v_{j}) = 1$ , the entropy is $H_{α}^{1} (v_{i}) = 0$ . Conversely, the entropy $H_{α}^{1} (v_{i}) = \ln | S^{0} (v_{i}) |$ when ${\bar{s}}_{i}^{0} (v_{j})$ follows a uniform distribution. The diversity index $D_{α}^{1} (v_{i})$ is

D_{α}^{1} (v_{i}) = exp (H_{α}^{1} (v_{i})) = {(\sum_{v_{j} \in S^{0} (v_{i})} {({\bar{s}}_{i}^{0} (v_{j}))}^{α})}^{1 / (1 - α)},

(12)

which is also known as the Hill number [55]. In practice, ${\bar{s}}_{i}^{0} (v_{j})$ is generally not uniformly distributed, so the entropy falls between the two extremes, i.e., $H_{α}^{1} (v_{i}) \in [0, \ln | S^{0} (v_{i}) |]$ . In ecology, the diversity index quantifies the abundance of species in a community. The diversity index approaches the total number of species when the species are equally abundant and approaches one if there is a dominant species. In Eq. 12, the order α influences the sensitivity of the diversity index. Increasing α strengthens the weights of the most abundant species. That is, higher α allows us to select the more abundant species, while lower α will detect more species. Therefore, we can use α to control the number of neighbors of node υ_i.
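Eqs. 10 to 12 translate into a short computation. The sketch below is illustrative (the function names and the example similarity lists are hypothetical); it treats α = 1 as the Shannon limit and otherwise evaluates the Rényi formula directly.

```python
import math

def renyi_entropy(similarities, alpha):
    """Renyi entropy (Eq. 11) of the normalized similarities (Eq. 10)."""
    total = sum(similarities)
    probs = [s / total for s in similarities]
    if abs(alpha - 1.0) < 1e-12:
        # Shannon limit for alpha -> 1
        return -sum(p * math.log(p) for p in probs if p > 0)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def diversity_index(similarities, alpha):
    """Hill number (Eq. 12): the effective number of neighbors."""
    return math.exp(renyi_entropy(similarities, alpha))

uniform = diversity_index([0.5, 0.5, 0.5, 0.5], alpha=2)   # equal similarities
dominant = diversity_index([1.0, 1e-9, 1e-9], alpha=2)     # one dominant neighbor
```

Equal similarities give a diversity index equal to the number of neighbors, while a single dominant neighbor drives the index toward one, matching the two limiting cases described above.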

We pick the ⌊ $D_{α}^{1} (v_{i})$ ⌉ nodes with the highest similarities from S⁰(υ_i), where the rounded diversity index serves as the effective number of neighbors of node υ_i. Then, the selected neighbors compose a new neighbor set S¹(υ_i). The nodes in S¹(υ_i) are more strongly connected to υ_i than the nodes in S⁰(υ_i). Since S¹(υ_i) ⊆ S⁰(υ_i), the network constructed from S¹(υ_i), ∀i ≤ N is sparser than that constructed from S⁰(υ_i), ∀i ≤ N. We can repeatedly run Eqs. 10, 11, and 12 to obtain a network with the desired edge density. Let $D_{α}^{k} (v_{i})$ be the diversity index of the kth iteration, computed from S^{k−1}(υ_i). Then, we have

S^{k} (v_{i}) = \{v_{j} \in S^{k - 1} (v_{i}) : | \{v_{z} \in S^{k - 1} (v_{i}) : s (v_{i}, v_{j}) < s (v_{i}, v_{z})\} | < \lfloor D_{α}^{k} (v_{i}) \rceil\},

(13)

where k ≥ 1. In each iteration, ⌊ $D_{α}^{k} (v_{i})$ ⌉ nodes with highest similarity values are selected as the neighbors of υ_i. Intuitively, the REM can filter out weak links for υ_i, and the remaining nodes S^k(υ_i) are the most meaningful neighbors of υ_i.
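The iteration of Eq. 13 for a single node can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the names `hill_number` and `rem_neighbors` are hypothetical, α ≠ 1 is assumed, ties are broken by sort order rather than by the strict inequality of Eq. 13, and Python's `round` (which rounds halves to the nearest even integer) stands in for ⌊·⌉.

```python
import math

def hill_number(similarities, alpha):
    """Diversity index of Eq. 12 for alpha != 1."""
    total = sum(similarities)
    return sum((s / total) ** alpha for s in similarities) ** (1.0 / (1.0 - alpha))

def rem_neighbors(neighbors, alpha, iterations):
    """Repeatedly keep the round(D) most similar neighbors of one node (Eq. 13).

    neighbors: dict mapping neighbor -> positive similarity, i.e., S^0(v_i).
    """
    current = dict(neighbors)
    for _ in range(iterations):
        keep = round(hill_number(list(current.values()), alpha))
        ranked = sorted(current.items(), key=lambda kv: kv[1], reverse=True)
        current = dict(ranked[:keep])
    return current

# Hypothetical similarities of one node to four candidates.
s0 = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}
s1 = rem_neighbors(s0, alpha=2, iterations=1)
```

Because the diversity index is never below one, each node keeps at least one neighbor, which is why the procedure does not produce isolated nodes.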

In real networks, leaf nodes are those connected to a small number of others, while hubs have many neighbors. Considering the property of entropy [56, 59], the size of the resulting neighbor set S^k(υ_i) is relatively small if the similarity value distribution of S⁰(υ_i) is right-skewed. On the contrary, the size of S^k(υ_i) is much larger if the similarity value distribution of S⁰(υ_i) is left-skewed [50, 57]. That is, the role of node υ_i in the resulting network is related to the similarity value distribution.

III. RESULTS

The method we have presented falls in the category of unsupervised learning. In this section, we use both synthetic and real data to evaluate the performance of the proposed approach in recovering global and local structures in terms of feature learning and network reconstruction.

A. Feature learning

a. Synthetic data.

In this part, we used two case studies with N₁ = 5000 and N₂ = 5500 nodes to evaluate the performance of the proposed approach in recovering a global structure. The nodes in the two case studies are measured in six conditions (M₁, M₂, ⋯, M₆) and distributed in five communities (G₁, G₂, ⋯, G₅). The first case study has five communities of equal size, i.e., each community has 1,000 nodes. The five communities in the second case study have respectively G₁ = 1000, G₂ = 1500, G₃ = 500, G₄ = 750, and G₅ = 1750 nodes (the sizes of the communities are chosen randomly). Note that the sixth condition is a perturbation. In each condition, nodes in the same community are assigned random values from one of the intervals: A = [1, 100], B = [101, 200], C = [201, 300], D = [301, 400], E = [401, 500], and R = [1, 500]. In this work, we created four datasets for each case study per the tables in Appendix B. In Table VI (Data.1), G₁ and G₂ are adjacent but do not overlap. In Tables VII (Data.2), VIII (Data.3), and IX (Data.4), nodes from G₁ and G₂ are assigned values from two, three, and four shared intervals, respectively, as shown in bold font. The relative distance between G₁ and G₂ is expected to decrease as the number of overlapped intervals increases.

b. Experimental results.

Based on the approach introduced in Section II, we generated context sets with a tolerance of $δ_{i}^{ω} = 0.1 v_{i} (ω)$ (β_ω = 0.1). In the experiments, we generated K = 10 random node sequences of length l = 80 starting from each node in each of the six conditions. Consequently, the corpora T₁ and T₂ consist of 300,000 and 330,000 node sequences, respectively. In the neural network, we set the node vector dimension to d = 128.

To evaluate the training results qualitatively, we mapped the trained node vectors to a 2D plane via principal component analysis (PCA) [60, 61]. In Fig. 1(a) and 2(a), the nodes from the same communities are mapped to the same areas, meaning that the proposed method can recover the global structure of the dataset. Note that nodes in G₁ and G₂ are assigned values from two, three, and four overlapped sub-intervals in Data.2 to Data.4 (see the details in Appendix B). That is, the distance between G₁ and G₂ is expected to decrease from Data.2 to Data.4. In Fig. 1(b) and 2(b), we observed that the relative distance between G₁ and G₂ was smaller than that in Fig. 1(a) and 2(a). Similarly, the relative distance between G₁ and G₂ was even smaller in Fig. 1(c) and 2(c), and the two communities were almost merged in Fig. 1(d) and 2(d). The results for the two case studies (eight datasets in total) demonstrated that the node vectors can reflect the relative distances of the node communities, which are affected by the number of overlapped sub-intervals.

FIG. 1. — The node vectors trained from the first case study are visualized via PCA. The five communities have an equal number of nodes. Panels (a) to (d) are respectively the training results of *Data*.1 to *Data*.4.

FIG. 2. The node vectors trained from the second case study are visualized via PCA. The five communities have 1000, 1500, 500, 750, and 1750 nodes. Panels (a) to (d) are respectively the training results of *Data*.1 to *Data*.4.

To show the results quantitatively, we computed the distance between G₁ and G₂. To this end, we calculated the cosine distance (1 − cosine similarity) between node pairs. The distance between G₁ and G₂ was computed as the sum of the cosine distances over all possible node pairs between the two communities. For example, the cosine distance between G₁ and G₂ is

Dis (G_{1}, G_{2}) = \sum_{v_{i} \in G_{1}, v_{j} \in G_{2}} (1 - s (v_{i}, v_{j})) .

(14)

Then, we calculate the relative distance between G₁ and G₂ as:

RelaDis (G_{1}, G_{2}) = \frac{Dis (G_{1}, G_{2}) \cdot Dis (G_{1}, G_{2})}{Dis (G_{1}, G_{1}) \cdot Dis (G_{2}, G_{2})} .

(15)
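Eqs. 14 and 15 can be evaluated with a few helper functions. The sketch below is a plain-Python illustration with hypothetical names and toy 2D vectors; it assumes each community contains at least two distinct directions, since otherwise the within-community distance in the denominator is zero.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def dis(group_a, group_b):
    """Eq. 14: summed cosine distance over all node pairs between two groups."""
    return sum(1.0 - cosine(u, v) for u in group_a for v in group_b)

def rela_dis(group_a, group_b):
    """Eq. 15: between-group distance normalized by the within-group distances."""
    return dis(group_a, group_b) ** 2 / (dis(group_a, group_a) * dis(group_b, group_b))

# Hypothetical node vectors for two well-separated communities.
g1 = [[1.0, 0.0], [0.9, 0.1]]
g2 = [[0.0, 1.0], [0.1, 0.9]]
print(rela_dis(g1, g2))
```

A value near one means the between-community distances are comparable to the within-community distances, which is how the merging of G₁ and G₂ shows up in this measure.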

Additionally, we perform the simple K-means clustering method [62] to classify the trained node vectors into five communities. The classification results are compared to the ground-truth communities.

The relative distances between G₁ and G₂ and the classification results are shown in Table I. We observe that the relative distance between G₁ and G₂ decreases from Data.1 to Data.4, in accord with the visualizations in Fig. 1 and Fig. 2. Specifically, the distance is close to one for Data.4, which suggests that the two communities have almost merged. The classification results also agree with the visualization. The classification accuracy is above 99% for Data.1, Data.2, and Data.3, and it drops significantly for Data.4 because the two communities almost overlap and nodes from the two communities are misclassified. From a global view, the node vectors can recover the mesoscopic structure of the nodes. In the following experiments, we will only use the first case study to conduct further analysis.

TABLE I.

The relative distance between G₁ and G₂ and classification accuracy

Data.	First case study		Second case study
Data.	Distance	Accuracy(%)	Distance	Accuracy(%)
1	2.18	99.98	2.14	100
2	1.82	99.96	1.86	99.98
3	1.52	99.96	1.49	99.90
4	1.18	79.84	1.16	77.92


To study the influence of missing values, we generate incomplete datasets by randomly removing 10% and 20% of values from each condition. We use the same parameters to train the neural network, and the results are shown in Table II. It can be observed that the relative distances between G₁ and G₂ and the classification accuracies are not significantly affected by the missing values. In Fig. 3, the visualization shows that the global structure of the nodes can still be recovered even when 20% data have been removed randomly. Therefore, the results suggest the proposed method is robust to missing values.

TABLE II.

The influence of missing values on the distance between G₁ and G₂ and classification accuracy

Data.	10% missing values		20% missing values
Data.	Distance	Accuracy(%)	Distance	Accuracy(%)
1	2.19	100	2.16	99.92
2	1.77	99.92	1.76	99.72
3	1.49	99.64	1.47	98.76
4	1.15	79.68	1.14	79.96


FIG. 3. — Visualization of node vectors trained from the first example with 20% missing values. Panels (a) to (d) are respectively the training results of *Data*.1 to *Data*.4.

The training results are robust to the choice of training parameters. In the generation of node sequences, we assigned different values to β_ω to control the tolerance $δ_{i}^{ω}$ and hence the size of the context sets. The node sequences are trained using the neural network model with ρ = 1. Similarly, we computed the relative distances between G₁ and G₂ and the classification accuracy. Table III shows that the relative distances are at the same levels for the same datasets, and the classification accuracies are not significantly affected by β_ω, meaning that the global structure is still maintained. Thus, the choice of β_ω has limited influence on the embedded node vectors.

TABLE III.

The relative distance between G₁ and G₂ and classification accuracy w.r.t. the variation of β_ω

	Distance				Accuracy(%)
Data.	β_ω = 0.05	β_ω = 0.1	β_ω = 0.15	β_ω = 0.2	β_ω = 0.05	β_ω = 0.1	β_ω = 0.15	β_ω = 0.2
1	2.08	2.18	2.22	2.16	100	99.98	99.86	99.86
2	1.78	1.82	1.82	1.76	99.98	99.96	99.86	99.88
3	1.49	1.52	1.48	1.47	99.92	99.96	99.06	99.84
4	1.15	1.18	1.16	1.14	80.72	79.84	80.86	81.34


To compare the proposed approach with the widely used correlation approach [36, 39], we generated four networks from the synthetic data and trained the networks with semi-supervised learning algorithms to obtain node vectors. First, the values of each condition are normalized with the z-score:

{\bar{v}}_{i} (ω) = \frac{v_{i} (ω) - μ_{ω}}{σ_{ω}}

(16)

where μ_ω is the mean of all the values in the ωth condition, σ_ω is the standard deviation, and ${\bar{v}}_{i} (ω)$ is the normalized expression value.

The Pearson correlation coefficient (PCC) of any two nodes is:

r_{(v_{x}, v_{y})} = \frac{\sum_{ω = 1}^{M} ({\bar{v}}_{x} (ω) - {\bar{v}}_{x}) ({\bar{v}}_{y} (ω) - {\bar{v}}_{y})}{\sqrt{\sum_{ω = 1}^{M} {({\bar{v}}_{x} (ω) - {\bar{v}}_{x})}^{2}} \sqrt{\sum_{ω = 1}^{M} {({\bar{v}}_{y} (ω) - {\bar{v}}_{y})}^{2}}}

(17)

where $r_{(v_{x}, v_{y})}$ is the PCC between node υ_x and υ_y, and ${\bar{v}}_{x}$ is the mean of node υ_x across the M conditions. The PCC measures how much the two genes are related [36, 38]. In this experiment, we did not consider the missing value problem, which could substantially influence the correlation coefficients according to the results in [39]. The edges are selected by thresholding correlation coefficients, such that PCC ≥ 0.95 [38, 63]. All four networks have edge densities above 5%, as shown in Table IV.
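The baseline of Eqs. 16 and 17 can be sketched as below. The function names and the toy matrix are hypothetical, and the use of the population standard deviation in the z-score is an assumption, since the text does not say which estimator was used. Missing values are not handled, in line with the experiment described above.

```python
import math
import statistics

def zscore_columns(data):
    """Eq. 16: normalize each condition (column); rows are nodes."""
    columns = list(zip(*data))
    mus = [statistics.fmean(c) for c in columns]
    sigmas = [statistics.pstdev(c) for c in columns]  # population SD (assumption)
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sigmas)] for row in data]

def pearson(x, y):
    """Eq. 17: Pearson correlation of two nodes across the M conditions."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def pcc_edges(data, threshold=0.95):
    """Keep node pairs whose PCC on the normalized data reaches the threshold."""
    z = zscore_columns(data)
    n = len(z)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if pearson(z[i], z[j]) >= threshold]

# Hypothetical expression matrix: 4 nodes measured in 4 conditions.
data = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [10, 1, 7, 3]]
edges = pcc_edges(data)
```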

TABLE IV.

The relative distance between G₁ and G₂ and the classification accuracy of the PCC networks

Data.	Edge density	Distance	Accuracy(%)
1	6.30%	6.13	100
2	5.81%	3.64	81.30
3	5.21%	2.66	80.82
4	5.73%	1.17	64.26


To study the properties of nodes, different methodologies are used to determine node vectors from network structure [44–46, 64]. Here, we used the node2vec method introduced in [44] to obtain node vectors from the constructed networks since the approach has shown outstanding performance in reconstructing networks. Similarly, we computed the relative distances between G₁ and G₂ from the trained node vectors, and the results are shown in Table IV. We observed that the relative distances between the two communities for the first three networks are much higher than in our method (Table I). In the synthetic datasets, G₁ and G₂ are assumed to partially overlap. However, this characteristic is not recovered from trained node vectors per the visualization of Fig. 4. In the PCC method, errors could be introduced in data normalization, network construction, and feature learning, which consequently influence the accuracy of trained node vectors. As a comparison, our proposed approach trains the node vectors directly from the raw data.

FIG. 4. — The visualization of node vectors trained from the Pearson correlation network. Panels (a) to (d) are respectively *Data*.1 to *Data*.4.

More experimental results on the choice of ρ can be found in Appendix C.

c. Real data.

We used two real Anopheles gambiae gene expression datasets [20, 41] to show that the learned node vectors can capture the local structure of the nodes. The first dataset consists of 10,433 Anopheles gambiae genes measured in time series after desiccation stress (five conditions) [65]. The five measurements (conditions) of each gene are almost at the same level, and the distributions of the coefficient of variation (CV) and means of the 10,433 genes are shown in Fig. 5(a) and (c). The second dataset measures the gene expression values after mating [66], consisting of four measurements (also in time series). The distributions of the CV and means are shown in Fig. 5(b) and (d).

In Fig. 5(a) and (b), we observe that the CVs of most genes are at a low level. Therefore, we can set the tolerance $δ_{i}^{ω}$ as the average CV of all the nodes, such that

\overline{CV} = \frac{1}{N} \sum_{i} \frac{σ_{i}}{m_{i}},

(18)

where m_i is the mean value of gene υ_i, σ_i is the standard deviation, and $\frac{σ_{i}}{m_{i}}$ is the CV of gene υ_i. The $\overline{CV}$ values of the two datasets are respectively 0.086 and 0.12. Therefore, we set $β_{ω} = \overline{CV}$ .
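Eq. 18 amounts to averaging one ratio per gene. The sketch below is illustrative: the function name `mean_cv` and the toy matrix are hypothetical, genes are assumed to have a non-zero mean, and the population standard deviation is an assumption because the estimator is not specified.

```python
import statistics

def mean_cv(expression):
    """Eq. 18: average coefficient of variation over genes (rows)."""
    cvs = []
    for row in expression:
        m = statistics.fmean(row)
        cvs.append(statistics.pstdev(row) / m)  # population SD (assumption)
    return statistics.fmean(cvs)

# Hypothetical expression values: a flat gene and a variable gene.
beta = mean_cv([[10, 10, 10], [1, 3]])
```

The resulting value can then be used as β_ω when building the context sets.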

d. Experimental results.

The trained node vectors are visualized via the t-distributed stochastic neighbor embedding (t-SNE) method [67] in Fig. 6. The t-SNE method constructs a probability distribution over pairs of vectors and retains these pairwise probabilities rather than the distances between node pairs. Therefore, the t-SNE approach performs better at preserving local structure. In Fig. 6, we can observe that the genes with similar expression values are mapped closer, even when 20% of values have been removed from each condition.

As a comparison, we construct a PCC network for the first real dataset. The raw expression values are rescaled with log2 [20, 38] and normalized per Eq. 16 (the distribution of the raw data is heterogeneous as shown in Fig. 5(c)). Then, a PCC network is constructed by thresholding the edges with a threshold PCC ≥ 0.95 (the network is not sparse). The resulting network consists of 756,330 edges. Similarly, the node vectors are obtained by training the node2vec model. In Fig. 7, we observed that nodes are distributed randomly in the 2D plane, suggesting that nodes with close values are not mapped to the same area. For example, the expression values of the two genes AGAP004677 and AGAP012093 are respectively [2764, 2869, 3276, 3690, 3671] and [129, 149, 184, 221, 265], and it is apparent that the expression values of the two genes are at different levels. However, the PCC between the two genes is 0.983, suggesting the two nodes are highly related. The reason is that the PCC method does not depend on the scale of expression values but detects the linear dependence of two genes. In contrast, our approach assumes that similar nodes have more shared elements in their context sets.
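The contrast between the two views can be reproduced on the two genes quoted above. This sketch computes the PCC on the raw values only, without the log2 rescaling and z-score normalization applied in the paper, so the result need not match the reported 0.983. It then checks, for β_ω = 0.086 (the average CV reported for the first dataset), whether either gene falls within the other's tolerance in any condition. The variable and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

gene_a = [2764, 2869, 3276, 3690, 3671]  # AGAP004677
gene_b = [129, 149, 184, 221, 265]       # AGAP012093
beta = 0.086                             # average CV of the first dataset

r_raw = pearson(gene_a, gene_b)

# Conditions in which one gene lies in the other's context set (Eq. 1).
shared_conditions = [
    w for w, (a, b) in enumerate(zip(gene_a, gene_b))
    if abs(a - b) <= beta * a or abs(a - b) <= beta * b
]
print(r_raw, shared_conditions)
```

The correlation is high although the two genes never share a context set, which is the behavior the context-set approach is designed to avoid.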

FIG. 7. — The visualization of node vectors trained from the PCC network.

B. Results of network extraction

Thresholding similarity values is the most straightforward and widely used approach in network construction. However, some nodes may be isolated from the network because they have relatively low similarities to all other nodes. In Fig. 8, we applied different thresholds to the cosine similarities computed from the node vectors trained with the synthetic data (Fig. 1(a)) and the real data (Fig. 5(a)). We observed that the percentage of isolated nodes increases rapidly when the thresholds are greater than 0.8 (synthetic data) and 0.95 (real data), respectively. In this paper, we define such a threshold as the critical value. If we use a threshold smaller than the critical value, most nodes are connected to at least one other node. On the contrary, if the threshold is larger than the critical value, we may obtain a network with a large percentage of singleton nodes.
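The thresholding procedure described above can be sketched as follows. The function names and the toy vectors are assumptions for illustration; the all-pairs loop is quadratic in the number of nodes and is meant only for small examples.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def threshold_network(vectors, threshold):
    """Keep edge (i, j) when the cosine similarity is at least `threshold`.

    Returns the edge list and the fraction of isolated nodes.
    """
    n = len(vectors)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if cosine(vectors[i], vectors[j]) >= threshold]
    connected = {node for edge in edges for node in edge}
    return edges, (n - len(connected)) / n

# Hypothetical node vectors.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.6, 0.8]]
for t in (0.5, 0.7, 0.99):
    edges, isolated = threshold_network(toy_vectors, t)
    print(f"threshold={t}: {len(edges)} edges, {isolated:.0%} isolated")
```

Raising the threshold in this toy example first removes edges without isolating any node and then, past a certain value, starts to produce singletons, which mirrors the behavior around the critical value.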

FIG. 8. — Experimental results when different thresholds are applied. (a) shows the percentages of isolated nodes w.r.t. the thresholds. (b) shows the edge densities of networks when different thresholds are used.

We can force isolated nodes to connect with highly similar nodes in real applications. However, the neighbors selected through a single threshold are not affected by the distribution of similarity values. It is not rare that hubs are connected to many other nodes but with relatively low similarity values, while leaf nodes may connect to a small number of nodes with high similarities. That is, the distribution of similarity values is not considered in the selection of edges.

The proposed REM keeps every node connected to at least one other node since the “threshold” of each node is determined from the distribution of its similarity values per Eq. 12. In the experiments, we applied the REM to the two datasets used in Fig. 8, and the results are shown in Fig. 9. We observed that the edge density decreases drastically in the first several iterations and then decreases gradually. The reason is that weak edges are removed from the network in the first several cycles, whereas the remaining edges have relatively high similarities and are removed at a slower rate. In addition, the parameter α allows us to control the removal speed and the edge density. A higher α removes weak links more efficiently, which aligns with our analysis in Sec. IIC. In real applications, we can fix α and iterate Eqs. 10 to 12 until we obtain a network with the desired edge density.
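The following sketch shows one plausible reading of a single entropy-based pruning pass; it is not a transcription of the equations of Sec. IIC, which are not reproduced in this part of the paper. It assumes that the similarities of each node are normalized into a probability distribution, that the Rényi entropy of order α ≠ 1 is converted into an effective number of neighbors through its exponential (a Hill number), that this number is rounded down with a minimum of one, and that an edge survives when at least one endpoint keeps it. All names and the toy graph are hypothetical.

```python
import math

def effective_neighbors(similarities, alpha):
    """exp(H_alpha) of the normalized similarities (requires alpha != 1)."""
    total = sum(similarities)
    probs = [s / total for s in similarities]
    renyi = math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)
    return math.exp(renyi)

def prune_once(neighbors, alpha):
    """One pruning pass over a symmetric map node -> {neighbor: similarity > 0}.

    Every node is assumed to have at least one neighbor on input.
    """
    kept = set()
    for node, sims in neighbors.items():
        d_eff = effective_neighbors(list(sims.values()), alpha)
        k = max(1, int(d_eff + 1e-9))  # keep at least one neighbor per node
        ranked = sorted(sims, key=sims.get, reverse=True)
        for nb in ranked[:k]:
            kept.add(frozenset((node, nb)))
    pruned = {node: {} for node in neighbors}
    for edge in kept:
        a, b = tuple(edge)
        pruned[a][b] = neighbors[a][b]
        pruned[b][a] = neighbors[b][a]
    return pruned

toy = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9},
    "c": {"a": 0.1, "d": 0.9},
    "d": {"c": 0.9},
}
pruned = prune_once(toy, alpha=2.0)
```

In this toy graph the weak edge between a and c is dropped while every node retains a neighbor. Since Hill numbers do not increase with their order, a larger α yields a smaller effective number of neighbors in this sketch, which is consistent with the observation that a higher α removes weak links faster.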

FIG. 9. — Edge density analysis of the REM. (a) Synthetic data. (b) Real data.

Furthermore, we generated four GTE-based networks with different thresholds for the datasets used in Fig. 8. The properties of the networks are shown in Table V. Specifically, the GTE networks are generated with thresholds that are either less than or equal to the critical thresholds. In addition, we generated four REM networks with edge densities similar to those of their GTE counterparts. In Table V, we found that GTE and REM return networks with similar average degrees when the edge densities are similar. However, the GTE networks always have a higher average clustering coefficient, suggesting that nodes in the GTE networks are more likely to cluster together. In Fig. 10, we compared the degree distributions of the eight networks. We observed that the degree distributions of the GTE and REM networks almost overlap when the thresholds are less than the critical values (panels (a) and (c)). When the thresholds are at the critical values (panels (b) and (d)), some nodes in the REM networks still have high degrees, similar to the hubs in many real networks. In addition, we observed that all four REM networks have many low-degree nodes, which account for the lower average clustering coefficients in Table V.

TABLE V.

The properties of networks constructed with the GTE and REM

Property	<CT^a (Syn.^b)		=CT (Syn.)		<CT (Real.^c)		=CT (Real)
Property	GTE	REM	GTE	REM	GTE	REM	GTE	REM
Threshold	0.7	-	0.8	-	0.92	-	0.95	-
Isolated nodes	0	0	287	0	372	0	849	0
Edge density	1.03%	0.972%	0.164%	0.160%	2.88%	2.81%	1.18%	1.18%
Ave. degree	51.5	48.6	8.2	8	300.8	293.2	123.2	123.4
Ave. clustering	0.46	0.41	0.38	0.32	0.66	0.55	0.59	0.37

^a CT denotes the critical threshold.

^b Syn. denotes the synthetic data used in Figs. 8 and 9.

^c Real. denotes the real data used in Figs. 8 and 9.
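The quantities reported in Table V can be computed from an adjacency structure as in the sketch below. The function name, the dictionary layout, and the convention that nodes with fewer than two neighbors contribute a clustering coefficient of zero are assumptions; the paper does not state which convention it uses.

```python
def network_properties(adj):
    """Summary statistics of an undirected simple graph.

    `adj` maps each node to the set of its neighbors.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    clustering = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)  # assumed convention for degree < 2
            continue
        nbr_list = list(nbrs)
        # Count edges among the neighbors of `node`.
        links = sum(1 for i, a in enumerate(nbr_list)
                    for b in nbr_list[i + 1:] if b in adj[a])
        clustering.append(2.0 * links / (k * (k - 1)))
    return {
        "isolated": sum(1 for nbrs in adj.values() if not nbrs),
        "density": 2.0 * m / (n * (n - 1)),
        "avg_degree": 2.0 * m / n,
        "avg_clustering": sum(clustering) / n,
    }

# Toy graph: a triangle, one pendant node, and one isolated node.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
props = network_properties(adj)
```

Under this convention, pendant and isolated nodes pull the average clustering coefficient down, which is one way in which many low-degree nodes lower the network-wide average.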

FIG. 10. — The degree distributions of networks generated from the GTE and REM. (a) and (b) are the degree distributions of the networks constructed from synthetic data. (c) and (d) are the degree distributions of the networks constructed from real data. Panels (a) and (c) show the results of densely connected networks, while (b) and (d) are the results of sparsely connected networks.

Finally, we compared how edge density affects the roles of nodes. In Fig. 11, each point represents a node. The horizontal coordinate represents the node’s degree in the densely connected network, while the vertical coordinate represents the node’s degree in the sparse network. We observed that the node degrees in the GTE networks are strongly affected by the threshold selection. The highest node degree drops from 345 to 73 for the synthetic data and from 704 to 370 for the real data. In the REM networks, the highest node degree drops from 321 to 210 for the synthetic data and from 682 to 493 for the real data. Fig. 11 shows that the REM approach is more likely to remove edges from low-degree nodes. Edges from high-degree nodes are removed proportionally, which means that the nodes’ roles are maintained and not significantly influenced by the edge density. On the other hand, the node degrees in the resulting GTE networks are strongly related to the choice of edge density. In Fig. 11(b), we can see that the points of the GTE networks are over-dispersed in the diagram.

More experimental results on real data are discussed in Appendix D.

IV. CONCLUSION AND FUTURE WORKS

This paper presents a neural network-based approach for learning node vectors from noisy node activity data. The primary advantage of the proposed method is that data are not required to follow any specific distribution since we generate context sets from raw data for each condition. The proposed approach is not constrained by missing values that ubiquitously exist in experimental results. Inspired by the application of neural networks in natural language processing, we generate a corpus of node sequences to simulate sentences in documents. The corpus is trained by a neural network model, which produces node vectors and allows comparing and identifying nodes with synergistic roles. The experimental results show that the proposed approach is robust to the choice of parameters and missing values. In addition, we offer an alternative method to select edges for the underlying network. The REM method is based on the Rényi entropy and selects edges according to the distribution of similarity values. The proposed approach constructs networks without isolating nodes and can recover the roles of nodes.

In this work, we designed two experiments to test the proposed method. With both synthetic and real data, we showed that the proposed method can unveil the global and local structure of the nodal data even when 20% of the values are randomly removed from the datasets. Furthermore, we tested the proposed entropy-based network extraction method. By controlling the parameter α and the number of iterations, we can obtain a network with the desired edge density and without isolated nodes.

The experiments in this paper show promising results in detecting global and local structures from noisy nodal data. We expect the proposed data processing methodology to be used in different areas, including biology and finance, especially where node activity data are measured with different techniques and missing values are present.

ACKNOWLEDGEMENTS

This research is supported by the National Institutes of Health under Grant No. R01AI140760. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the funding agency.

Appendix A: The skip-gram model

The goal of generating random node sequences is to feed the corpus T to a neural network to train node vectors. In this work, we adopt the simple three-layer skip-gram model shown in Fig. A.1. This neural network framework has three layers: an input, a hidden, and an output layer [47–49, 53, 54]. The goal is to find a d-dimensional vector for each of the N nodes.

We assume that nodes with similar values tend to appear in similar contexts. Given a neighborhood H consisting of 2c nodes, we denote by P(υ_x | H) the conditional probability that node υ_x neighbors the 2c nodes in H. Based on Bayes’ theorem, we have

P (v_{x} ∣ H) = \frac{P (H ∣ v_{x}) P (v_{x})}{P (H)},

(A1)

where P(H) and P(υ_x) are the probabilities of H and υ_x, respectively, both of which can be regarded as constants. Then, we have

P (v_{x} ∣ H) \propto P (H ∣ v_{x}) .

(A2)

Now, we take one of the node sequences from the corpus. Let l_i denote the ith node of the sequence, and $H = \{l_{i - c}, …, l_{i - 1}, l_{i + 1}, …, l_{i + c}\}$. That is, we have an outcome H given l_i. Since the goal is to determine f(l_i), we replace υ_x in Eq. A2 with f(l_i), and assume the 2c nodes are independent [47]. We have

P (f (l_{i}) ∣ H) \propto P (H ∣ f (l_{i})) = \prod_{- c \leq j \leq c, j \neq 0} P (l_{i + j} ∣ f (l_{i})),

(A3)

where $P (l_{i + j} ∣ f (l_{i}))$ is the probability that node $l_{i + j}$ occurs given the vector f(l_i). To determine f(l_i), we take the logarithm of Eq. A3 and obtain the optimization problem

E = \min_{f} \left[ - \sum_{- c \leq j \leq c, j \neq 0} \log P (l_{i + j} ∣ f (l_{i})) \right] .

(A4)

In the model, the node vector f(l_i) is projected to an N-dimensional output vector u_i as shown in Fig. A.1. The N dimensions of u_i are associated with the N nodes in the corpus. Then, we use the softmax function [48, 49, 68] to map the entries of u_i into probabilities, which together form a probability distribution. For example, the probability of the rth entry of u_i given f(l_i) is

P (v_{r} ∣ f (l_{i})) = \frac{\exp (u_{i}^{r})}{\sum_{r^{'} = 1}^{N} \exp (u_{i}^{r^{'}})},

(A5)

where $u_{i}^{r}$ is the rth entry of u_i and P(υ_r | f(l_i)) is the probability that node υ_r is in the context of l_i. According to Eq. A5, nodes in H have higher probabilities of being the context of node l_i. Combining Eq. A4 and Eq. A5, we have the loss function [69, 70]

E = \min_{f} \left[ - \sum_{- c \leq j \leq c, j \neq 0} u_{i}^{(i + j)} + 2 c \log \left( \sum_{r^{'} = 1}^{N} \exp (u_{i}^{r^{'}}) \right) \right],

(A6)

where $u_{i}^{(i + j)}$ denotes the entry of u_i associated with node $l_{i + j}$, and the factor 2c arises because each of the 2c context nodes contributes one normalization term. This loss is applied to every node in the sequence. Eq. A6 is optimized by using the stochastic gradient descent approach [44, 48, 49], which backpropagates [68] errors to update the elements of the matrices W₁ and W₂ in Fig. 2.
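The derivation above can be checked numerically with a small sketch. It builds the context H of one position in a node sequence, computes the output vector u_i from two randomly initialized matrices, and evaluates the loss once term by term through the softmax of Eq. A5 and once in the expanded form obtained by substituting Eq. A5 into Eq. A4, where each context node contributes one log-normalization term. The matrix sizes, the random initialization, and all function names are assumptions for illustration, and no training step is performed.

```python
import math
import random

def window(sequence, i, c):
    """Context H of position i: up to c nodes on each side, excluding position i."""
    return [sequence[i + j] for j in range(-c, c + 1)
            if j != 0 and 0 <= i + j < len(sequence)]

def output_scores(W1, W2, node):
    """u_i = f(l_i) W2, where f(l_i) is row `node` of W1 (the one-hot lookup)."""
    f = W1[node]
    return [sum(f[k] * W2[k][r] for k in range(len(f)))
            for r in range(len(W2[0]))]

def loss_softmax(u, context):
    """Negative log-likelihood, with each term evaluated through the softmax."""
    z = sum(math.exp(x) for x in u)
    return -sum(math.log(math.exp(u[r]) / z) for r in context)

def loss_expanded(u, context):
    """Same loss: minus the context scores plus |H| times the log-normalizer."""
    log_z = math.log(sum(math.exp(x) for x in u))
    return -sum(u[r] for r in context) + len(context) * log_z

random.seed(0)
N, d, c = 6, 3, 2  # number of nodes, vector dimension, half window size
W1 = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(N)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(d)]

sequence = [0, 3, 1, 4, 2, 5, 1]  # hypothetical node sequence
i = 3
H = window(sequence, i, c)
u = output_scores(W1, W2, sequence[i])
```

For an interior position the context contains 2c nodes, and the two loss functions agree up to floating-point error, which confirms that the expanded form is only a rewriting of the softmax-based log-likelihood.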

The method we have introduced falls in the category of unsupervised learning, in which we learn node vectors from nodal data. The node vectors can be used to extract networks or detect nodes with similar properties.

FIG. A.1. — The three-layer neural network model. Each input node is associated with an N dimensional one-hot vector [47], which is mapped to the node vector f(υ_i) (the hidden layer) of dimension d by matrix W₁. The hidden layer is mapped to the output vector by matrix W₂. The elements of W₁ and W₂ are initialized with random values, which are expected to be optimized by backpropagation [48].

Appendix B: Synthetic datasets

The four synthetic datasets for the two case studies are generated according to Tables VI, VII, VIII, and IX.

TABLE VI.

The synthetic dataset 1

group	Conditions
group	M ₁	M ₂	M ₃	M ₄	M ₅	M ₆
G ₁	A	B	C	D	E	R
G ₂	B	C	D	E	A	R
G ₃	C	D	E	A	B	R
G ₄	D	E	A	B	C	R
G ₅	E	A	B	C	D	R

TABLE VII.

The synthetic dataset 2

group	Conditions
group	M ₁	M ₂	M ₃	M ₄	M ₅	M ₆
G ₁	A	B	C	D	E	R
G ₂	A	B	D	E	D	R
G ₃	B	C	E	A	C	R
G ₄	C	D	A	B	B	R
G ₅	D	E	B	C	A	R

TABLE VIII.

The synthetic dataset 3

group	Conditions
group	M ₁	M ₂	M ₃	M ₄	M ₅	M ₆
G ₁	A	B	C	D	E	R
G ₂	A	B	C	E	D	R
G ₃	B	C	D	A	C	R
G ₄	C	D	E	B	A	R
G ₅	D	E	B	C	B	R

TABLE IX.

The synthetic dataset 4

group	Conditions
group	M ₁	M ₂	M ₃	M ₄	M ₅	M ₆
G ₁	A	B	C	D	E	R
G ₂	A	B	C	D	D	R
G ₃	B	C	D	E	C	R
G ₄	C	D	E	A	B	R
G ₅	D	E	B	C	A	R

Appendix C: Parameter choice

We study the influence of ρ on the relative distance between G₁ and G₂, and the results are shown in Table X and Table XI. We observe that the distance between the two communities slightly increases when we employ a low value of ρ since a small ρ encourages adding nodes that also exist in the context set of the previous node. As a result, distant nodes become closer, which is reflected in the reduced distance between the two communities. However, the results are not significantly influenced by ρ since the relative distance between the two communities remains at the same level for the same data. Therefore, we recommend using ρ = 1 in most cases.

TABLE X.

The relative distance between G₁ and G₂ w.r.t. ρ

Data.	1/10	1/5	1/3	3	5	10
1	2.23	2.22	2.19	2.18	2.17	2.07
2	1.84	1.83	1.82	1.80	1.78	1.67
3	1.53	1.52	1.52	1.48	1.42	1.32
4	1.22	1.21	1.20	1.17	1.16	1.13

TABLE XI.

The prediction accuracy w.r.t. ρ

Data.	1/10	1/5	1/3	3	5	10
1	98.24	98.24	98.72	98.80	98.90	98.82
2	98.26	97.22	98.68	98.00	97.30	98.12
3	97.52	97.48	97.72	98.22	97.58	97.32
4	78.14	78.58	78.62	79.72	79.32	79.78

Appendix D: Study the REM approach with AUC metrics on real data

In this part, we use two real datasets with both network structure and node activity data to study the proposed approach. It is often hard to quantitatively determine the relationships between the network structure and node activity data because they describe the properties of nodes from different aspects. In the experiments, we learn node vectors from the node activity data, compute similarity and construct networks. The constructed network is compared to the network structures, and we use the AUC to evaluate our REM approach.

a. The cora dataset.

The cora dataset [72] contains a sparse citation network with 2708 nodes and 5278 edges (the edge density is 0.144%), where nodes represent publications and edges represent the citation relationships between the papers. Each node in the network is described by a 0/1-valued word vector, indicating the absence/presence of the corresponding word from a dictionary. The dictionary consists of 1433 unique words presented at least ten times in one of the 2708 publications.

b. The pubmed dataset.

The pubmed dataset [73] contains a sparse citation network with 19717 nodes and 44324 edges (the edge density is 0.0228%), where nodes represent publications and edges represent the citation relationships between the papers. Each node in the network is described by a TF/IDF weighted word vector from a dictionary consisting of 500 words.
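The edge densities quoted for the two citation networks follow from the node and edge counts, assuming undirected simple graphs in which N nodes admit N(N − 1)/2 possible edges. The helper name below is an assumption for illustration.

```python
def edge_density(num_nodes, num_edges):
    """Fraction of the N(N-1)/2 possible undirected edges that are present."""
    return num_edges / (num_nodes * (num_nodes - 1) / 2)

cora_density = edge_density(2708, 5278)
pubmed_density = edge_density(19717, 44324)
print(f"cora: {100 * cora_density:.3f}%  pubmed: {100 * pubmed_density:.4f}%")
```

Both values agree with the percentages given in the dataset descriptions.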

The two networks represent citation relationships between the publications (nodes), while the node activity data are extracted from the content of each publication. We implement the proposed approach on these two real datasets to generate node vectors (128 dimensions). Then, we calculate the similarity for every node pair, and the performance of the approach is evaluated by comparing it to the true citation networks. The AUCs of the two datasets are respectively 0.81 and 0.73. Though the true relationship between network structure and node activity data is unknown, the results reveal that the node activity data are related to the network structure. Therefore, one of the advantages of our approach is that it allows us to compare two different types of data.
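The AUC used here can be read as the probability that a randomly chosen true edge receives a higher similarity than a randomly chosen non-edge. The sketch below computes it by direct pairwise comparison on hypothetical scores; the function name and the toy values are assumptions, and the quadratic loop is practical only for small examples, so a rank-based formula or sampling of node pairs would be needed for networks of the size considered above.

```python
def auc(scores_pos, scores_neg):
    """Probability that a true edge scores above a non-edge (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical similarities of node pairs that are / are not linked
# in the reference network.
similarity_of_edges = [0.9, 0.8, 0.4]
similarity_of_non_edges = [0.7, 0.3, 0.2, 0.1]
score = auc(similarity_of_edges, similarity_of_non_edges)
```

A value of 0.5 corresponds to similarities that carry no information about the reference network, while a value of 1 means that every true edge is ranked above every non-edge.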

Footnotes

Code and data are available at: https://github.com/BigBroKuang/embed-data-to-vector

References

[1] Pastor-Satorras R, Castellano C, Van Mieghem P, and Vespignani A, Rev. Mod. Phys 87, 925 (2015).
[2] Gysi DM, Do Valle Í, Zitnik M, Ameli A, Gan X, Varol O, Ghiassian SD, Patten JJ, Davey RA, Loscalzo J, and Barabási AL, Proc. Natl. Acad. Sci 118, 19 (2021).
[3] De Bacco C, Power EA, Larremore DB, and Moore C, Phys. Rev. E 95, 042317 (2017).
[4] Kuang J and Scoglio C, Phys. Rev. E 104, 024301 (2021).
[5] Young J, Cantwell GT, and Newman MEJ, Journal of Complex Networks 8, cnaa046 (2020).
[6] Newman MEJ, Phys. Rev. E 98, 062321 (2018).
[7] Decelle A, Krzakala F, Moore C, and Zdeborová L, Phys. Rev. E 84, 066106 (2011).
[8] Guimerá R and Sales-Pardo M, Proc. Natl. Acad. Sci. U.S.A 106, 22073 (2009).
[9] Peixoto TP, Phys. Rev. E 97, 012306 (2018).
[10] Peixoto TP, Phys. Rev. X 8, 041011 (2018).
[11] Prasse B and Van Mieghem P, arXiv:1807.08630.
[12] Peixoto TP, arXiv:1705.10225.
[13] Karrer B and Newman ME, Phys. Rev. E 83, 016107 (2011).
[14] Peixoto TP, Phys. Rev. Lett 123, 128301 (2018).
[15] Timme M, Phys. Rev. Lett 98, 224101 (2007).
[16] Shandilya SG and Timme M, New J. Phys 13, 013004 (2011).
[17] Van Mieghem P and Liu Q, Phys. Rev. E 100, 022317 (2019).
[18] Boccaletti S, Bianconi G, Criado R, del Genio CI, Gómez-Gardeñes J, and Romance M, Phys Rep. 544, 1 (2014).
[19] Newman MEJ, Nature Phys 14, 542 (2018).
[20] MacCallum RM, Redmond SN, and Christophides GK, BMC Genomics 14, 12, 620 (2011).
[21] Horvath S and Dong J, PLoS computational biology 4, 8, e1000117 (2008).
[22] Lynall ME, Bassett DS, Kerwin R, McKenna PJ, Kitzbichler M, Muller U, and Bullmore E, Journal of Neuroscience 30, 28, 9477 (2010).
[23] Raimondo S and De Domenico M, Phys. Rev. E 103, 022311 (2021).
[24] Sugihara G, May R, Ye H, Hsieh CH, Deyle E, Fogarty M, and Munch S, Science 338, 6106, 496 (2012).
[25] Bullmore E and Sporns O, Nat. Rev. Neurosci 10, 186 (2009).
[26] Benigni B, Ghavasieh A, Corso A, d’Andrea V, and De Domenico M, Network Neuroscience 5, 3, 831 (2021).
[27] Schiefer J, Niederbuhl A, Pernice V, Lennartz C, Hennig J, LeVan P, and Rotter S, PLoS computational biology 14, 3, e1006056 (2018).
[28] Jeong J, Gore JC, and Peterson BS, Clinical neurophysiology 112, 5, 827 (2001).
[29] Namaki A, Shirazi AH, Raei R, and Jafari GR, Physica A 390, 21–22, 3835 (2011).
[30] Yamasaki K, Gozolchiani A, and Havlin S, Phys. Rev. Lett 100, 228501 (2008).
[31] Donges JF, Zou Y, Marwan N, and Kurths J, EPL 87, 48007 (2009).
[32] Young JG, Kirkley A, and Newman MEJ, Phys. Rev. E 105, 014312 (2022).
[33] Kovács IA, Barabási DL, and Barabási AL, Proc. Natl. Acad. Sci 117, 33570 (2020).
[34] Oldham MC, Horvath S, and Geschwind DH, Proc. Natl. Acad. Sci 103, 17973 (2006).
[35] Zhang B and Horvath S, Statistical applications in genetics and molecular biology 4, 1 (2005).
[36] Song L, Langfelder P, and Horvath S, BMC bioinformatics 13, 1 (2012).
[37] Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, and Califano A, BMC Bioinformatics 7, S7 (2006).
[38] Ramirez R, Chiu YC, Hererra A, Mostavi M, Ramirez J, Chen Y, Huang Y, and Jin YF, Frontiers in physics 8, 203 (2020).
[39] Reverter A and Chan EK, Bioinformatics 24, 21, 2491 (2008).
[40] García S, Luengo J, and Herrera F, Intelligent Systems Reference Library 72 (2015).
[41] Kuang J, Buchon N, Michel K, and Scoglio C, BMC Bioinformatics 23, 170 (2022).
[42] Dresp-Langley B, Ekseth OK, Fesl J, Gohshi S, Kurz M, and Sehring H-W, Appl. Sci 9, 3065 (2019).
[43] Van Dam S, Vosa U, van der Graaf A, Franke L, and de Magalhaes JP, Briefings in bioinformatics 19, 4, 575 (2018).
[44] Grover A and Leskovec J, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), 855 (2016).
[45] Perozzi B, Al-Rfou R, and Skiena S, in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14), 701 (2014).
[46] Tang J, Qu M, Wang M, Zhang M, Yan J, and Mei Q, in Proceedings of the 24th International Conference on World Wide Web (WWW ’15), 1067 (2015).
[47] Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, and Zettlemoyer L, arXiv:1802.05365.
[48] Xin R, arXiv:1411.2738.
[49] Mikolov T, Sutskever I, Chen K, Corrado G, and Dean J, arXiv:1310.4546.
[50] Lee MJ, Lee E, Lee B, Jeong H, Lee DS, and Lee SH, Phys. Rev. Research 3, 043136 (2021).
[51] De Domenico M, Nicosia V, Arenas A, and Latora V, Nat. Commun 6, 6864 (2015).
[52] Shamshad A, Bawadi MA, Wan Hussin WMA, Majid TA, and Sanusi SAM, Energy 30, 5, 693 (2005).
[53] Barkan O and Koenigstein N, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) (2016).
[54] Grbovic M and Cheng H, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’18) (2018).
[55] Hill MO, Ecology 54, 427 (1973).
[56] Zyczkowski K, Syst. Inf. Dyn 10, 297 (2003).
[57] Hill RJ, Journal of Econometrics 130, 1, 25 (2006).
[58] Chao A, Gotelli NJ, Hsieh TC, Sander EL, Ma KH, Colwell RK, and Ellison AM, Ecological Monographs 84, 45 (2014).
[59] Jost L, Oikos 113, 363 (2006).
[60] Wold S, Esbensen K, and Geladi P, Chemometrics and intelligent laboratory systems 2, 1–3, 37 (1987).
[61] Filzmoser P, Hron K, and Reimann C, The Official Journal of the International Environmetrics Society 20, 6, 621 (2009).
[62] Likas A, Vlassis N, and Verbeek JJ, Pattern Recognition 36, 2, 451 (2003).
[63] Baltakys K, Kanniainen J, and Emmert-Streib F, Sci Rep 8, 8198 (2018).
[64] Ball B, Karrer B, and Newman ME, Phys. Rev. E 84, 036103 (2011).
[65] Wang MH, Marinotti O, Vardo-Zalik A, Boparai R, and Yan G, PLoS one 6, e26011 (2011).
[66] Rogers DW, Whitten MM, Thailayil J, Soichot J, Levashina EA, and Catteruccia F, Proc. Natl. Acad. Sci 105, 19390 (2008).
[67] Van Der Maaten L, in Artificial Intelligence and Statistics, 384 (2009).
[68] LeCun Y, Bengio Y, and Hinton G, Nature 521, 436 (2015).
[69] Wang D, AI Magazine 22, 2, 10 (2001).
[70] Wang Q, Ma Y, Zhao K, and Tian Y, Annals of Data Science 9, 187 (2022).
[71] Ruan J, Dean AK, and Zhang W, BMC Syst. Biol 4, 8 (2010).
[72] The cora dataset, https://relational.fit.cvut.cz/dataset/CORA.
[73] The pubmed dataset, https://relational.fit.cvut.cz/dataset/PubMed_Diabetes.

[R1] [1].Pastor-Satorras R, Castellano C, Van Mieghem P, and Vespignani A, Rev. Mod. Phys 87, 925 (2015). [Google Scholar]

[R2] [2].Gysi DM, Do Valle Í, Zitnik M, Ameli A, Gan X, Varol O, Ghiassian SD, Patten JJ, Davey RA, Loscalzo J, and Barabási AL, Proc. Natl. Acad. Sci 118, 19 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] [3].De Bacco C, Power EA, Larremore DB, and Moore C, Phys. Rev. E 95, 042317 (2017). [DOI] [PubMed] [Google Scholar]

[R4] [4].Kuang J and Scoglio C, Phys. Rev. E 104, 024301 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] [5].Young J, Cantwell GT, and Newman MEJ, Journal of Complex Networks 8, cnaa046 (2020). [Google Scholar]

[R6] [6].Newman MEJ, Phys. Rev. E 98, 062321 (2018). [Google Scholar]

[R7] [7].Decelle A, Krzakala F, Moore C, and Zdeborová L, Phys. Rev. E 84, 066106 (2011). [DOI] [PubMed] [Google Scholar]

[R8] [8].Guimerá R and Sales-Pardo M, Proc. Natl. Acad. Sci. U.S.A 106, 22073 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] [9].Peixoto TP, Phys. Rev. E 97, 012306 (2018). [DOI] [PubMed] [Google Scholar]

[R10] [10].Peixoto TP, Phys. Rev. X 8, 041011 (2018). [Google Scholar]

[R11] [11].Prasse B and Van Mieghem P, arXiv:1807.08630

[R12] [12].Peixoto TP, arXiv:1705.10225

[R13] [13].Karrer B and Newman ME, Phys. Rev. E 83, 016107 (2011) [DOI] [PubMed] [Google Scholar]

[R14] [14].Peixoto TP, Phys. Rev. Lett 123, 128301 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] [15].Timme M, Phys. Rev. Lett 98, 224101 (2007). [DOI] [PubMed] [Google Scholar]

[R16] [16].Shandilya SG and Timme M, New J. Phys 13, 013004 (2011). [Google Scholar]

[R17] [17].Van Mieghem P and Liu Q, Phys. Rev. E 100, 022317 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] [18].Boccaletti S, Bianconi G, Criado R, del Genio CI, Gómez-Gardeñes J, and Romance M, Phys Rep. 544, 1 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] [19].Newman MEJ, Nature Phys 14, 542 (2018). [Google Scholar]

[R20] [20].MacCallum RM, Redmond SN, and Christophides GK, BMC Genomics 14, 12, 620 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] [21].Horvath S and Dong J, PLoS computational biology 4, 8, e1000117 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] [22].Lynall ME, Bassett DS, Kerwin R, McKenna PJ, Kitzbichler M, Muller U, and Bullmore E, Journal of Neuroscience 30, 28, 9477 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] [23].Raimondo S, and De Domenico M, Phys. Rev. E 103, 022311 (2021). [DOI] [PubMed] [Google Scholar]

[R24] [24].Sugihara G, May R, Ye H, Hsieh CH, Deyle E, Fogarty M, and Munch S, science 338, 6106, 496 (2012). [DOI] [PubMed] [Google Scholar]

[R25] [25].Bullmore E and Sporns O, Nat. Rev. Neurosci 10, 186 (2009). [DOI] [PubMed] [Google Scholar]

[R26] [26].Benigni B, Ghavasieh A, Corso A, d’Andrea V, and De Domenico M, Network Neuroscience 5, 3, 831 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] [27].Schiefer J, Niederbuhl A, Pernice V, Lennartz C, Hennig J, LeVan P, and Rotter S, 2018. PLoS computational biology 14, 3, e1006056 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] [28].Jeong J, Gore JC, and Peterson BS, Clinical neurophysiology 112, 5, 827 (2001). [DOI] [PubMed] [Google Scholar]

[R29] [29].Namaki A, Shirazi AH, Raei R, R. and Jafari GR, Physica A 390, 21–22, 3835 (2011). [Google Scholar]

[R30] [30].Yamasaki K, Gozolchiani A, and Havlin S, Phys. Rev. Lett 100, 228501 (2008). [DOI] [PubMed] [Google Scholar]

[R31] [31].Donges JF, Zou Y, Marwan N, and Kurths J, EPL 87, 48007, (2009). [Google Scholar]

[R32] [32].Young JG, Kirkley A, and Newman MEJ, Phys. Rev. E 105, 014312 (2022). [DOI] [PubMed] [Google Scholar]

[R33] [33].Kovács IA, Barabási DL, and Barabási AL, Proc. Natl. Acad. Sci 117, 33570 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] [34].Oldham MC, Horvath S, and Geschwind DH, Proc. Natl. Acad. Sci 103, 17973 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] [35].Zhang B, and Horvath S, Statistical applications in genetics and molecular biology, 4, 1,(2005) [DOI] [PubMed] [Google Scholar]

[R36] [36].Song L, Langfelder P, and Horvath S, BMC bioinformatics, 13, 1 (2012) [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] [37].Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, and Califano A, BMC Bioinformatics 7, S7 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] [38].Ramirez R, Chiu YC, Hererra A, Mostavi M, Ramirez J, Chen Y, Huang Y, and Jin YF, Frontiers in physics 8, 203 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] [39].Reverter A and Chan EK, Bioinformatics 24, 21, 2491 (2008). [DOI] [PubMed] [Google Scholar]

[R40] [40].García S, Luengo J, and Herrera F, Intelligent Systems Reference Library, 72. (2015). [Google Scholar]

[R41] [41].Kuang J, Buchon N, Michel K, and Scoglio C, BMC Bioinformatics 23, 170 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] [42].Dresp-Langley B, Ekseth OK, Fesl J, Gohshi S, Kurz M, and Sehring H-W, Appl. Sci, 9, 3065. (2019). [Google Scholar]

[R43] [43].Van Dam S, Vosa U, van der Graaf A, Franke L, and de Magalhaes JP, Briefings in bioinformatics, 19, 4, 575 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] [44].Grover A and Leskovec J, In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘16) 855, (2016). [Google Scholar]

[R45] [45].Perozzi B, Al-Rfou R, and Skiena S, In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ‘14) 701, (2014). [Google Scholar]

[R46] [46].Tang J, Qu M, Wang M, Zhang M, Yan J, and Mei Q, In Proceedings of the 24th International Conference on World Wide Web (WWW ‘15) 1067, (2015). [Google Scholar]

[R47] [47].Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, and Zettlemoyer L, ArXiv:1802.05365

[R48] [48].Xin R, ArXiv:1411.2738

[R49] [49].Mikolov T, Sutskever I, Chen K, Corrado G, and Dean J, ArXiv:1310.4546

[R50] [50].Lee MJ, Lee E, Lee B, Jeong H, Lee DS, and Lee SH, Phys. Rev. Research 3, 043136 (2021) [Google Scholar]

[R51] [51].De Domenico M, Nicosia V, Arenas A, and Latora V, Nat. Commun 6, 6864 (2015) [DOI] [PubMed] [Google Scholar]

[R52] [52].Shamshad A, Bawadi MA, Wan Hussin WMA, Majid TA, and Sanusi SAM, Energy 30, 5, 693 (2005) [Google Scholar]

[R53] [53].Barkan O and Koenigstein N, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), (2016) [Google Scholar]

[R54] [54].Grbovic M and Cheng H, In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘18),(2018) [Google Scholar]

[R55] [55].Hill MO, Ecology 54, 427 (1973). [Google Scholar]

[R56] [56].Zyczkowski K, Syst. Inf. Dyn 10, 297 (2003). [Google Scholar]

[R57] [57].Hill RJ, Journal of Econometrics 130, 1, 25, (2006). [Google Scholar]

[R58] [58].Chao A, Gotelli NJ, Hsieh TC, Sander EL, Ma KH, Colwell RK, and Ellison AM, Ecological Monographs, 84, 45, (2014). [Google Scholar]

[R59] [59].Jost L, Oikos 113, 363 (2006). [Google Scholar]

[R60] [60].Wold S, Esbensen K, and Geladi P, Chemometrics and intelligent laboratory systems 2, 1–3, 37 (1987). [Google Scholar]

[R61] [61].Filzmoser P, Hron K, and Reimann C, The Official Journal of the International Environmetrics Society 20, 6, 621 (2009). [Google Scholar]

[R62] [62].Likas A, Vlassis N, and Verbeekb JJ, Pattern Recognition 36, 2, 451, (2003). [Google Scholar]

[63] Baltakys K, Kanniainen J, and Emmert-Streib F, Sci. Rep. 8, 8198 (2018).

[64] Ball B, Karrer B, and Newman ME, Phys. Rev. E 84, 036103 (2011).

[65] Wang MH, Marinotti O, Vardo-Zalik A, Boparai R, and Yan G, PLoS ONE 6, e26011 (2011).

[66] Rogers DW, Whitten MM, Thailayil J, Soichot J, Levashina EA, and Catteruccia F, Proc. Natl. Acad. Sci. 105, 19390 (2008).

[67] Van Der Maaten L, In Artificial Intelligence and Statistics, 384 (2009).

[68] LeCun Y, Bengio Y, and Hinton G, Nature 521, 436 (2015).

[69] Wang D, AI Magazine 22, 2, 10 (2001).

[70] Wang Q, Ma Y, Zhao K, and Tian Y, Annals of Data Science 9, 187 (2022).

[71] Ruan J, Dean AK, and Zhang W, BMC Syst. Biol. 4, 8 (2010).

[72] The cora dataset, https://relational.fit.cvut.cz/dataset/CORA.

[73] The pubmed dataset, https://relational.fit.cvut.cz/dataset/PubMed_Diabetes.
