Network inference with infection frequency matrix by improved Bayesian method

Xin Jin; Yinghong Ma; Le Song; Ruhan Wei; Han Zhou

doi:10.1016/j.isci.2026.115462

2026 Mar 24;29(4):115462. doi: 10.1016/j.isci.2026.115462

Xin Jin ¹, Yinghong Ma ¹, Le Song ¹,²,⁴,∗, Ruhan Wei ³, Han Zhou ¹

PMCID: PMC13091019 PMID: 42006322

Summary

Most network inference methods based on epidemic spreading models rely on binary-state time series to reconstruct the underlying network structure. However, because binary-state time series only qualitatively describe node states and lack quantitative information on infection histories, accurate network reconstruction typically requires extensive iterative computation and suffers from low efficiency. To overcome this limitation, this work proposes a Bayesian network inference approach that converts binary-state time series into an infection frequency matrix encoding pairwise infection events and uses the resulting likelihood to jointly infer the contact network and transmission-related parameters. This infection frequency representation reduces computational complexity, improves inference accuracy, and enables fast, high-fidelity reconstruction of contact networks, providing a principled basis for optimizing intervention strategies in biological, social, and cyber-physical systems.

Subject areas: health sciences, computational bioinformatics

Highlights

• A network inference method based on an improved Bayesian approach is proposed
• The method uses an infection frequency matrix instead of binary-state time series
• The infection frequency is shown to follow a Poisson distribution
• The inference results yield both the network structure and the propagation parameters

Introduction

Network-centric analyses provide a unifying framework for dissecting how nodes influence one another during contagion processes under heterogeneous biological, behavioral, and socio-economic drivers.¹,² From historic scourges such as smallpox and plague to modern pandemics including AIDS, Ebola, and COVID-19,³,⁴,⁵,⁶ limiting physical contact through targeted distancing measures remains the most effective lever for curbing transmission.⁷,⁸,⁹ However, within these underlying contact networks, the microscopic influence pathways that dictate who infects whom are rarely observable. Instead, they must be reconstructed from the macroscopic footprints left in temporal records of past outbreaks.¹⁰,¹¹,¹² This inverse problem, propagation network inference, lies at the heart of statistical-physics approaches to complex systems and demands principled methods that turn incomplete and noisy observations into explicit maps of the hidden transmission paths.

Network inference aims to quantify the likelihood of connections between nodes and estimate network structure based on observed interaction data. This approach is widely used to uncover latent or unobservable network structures, such as equity networks among financial institutions,¹³ interaction networks among humans or animals,¹⁴ mutualistic networks between plants and pollinators,¹⁵ and gene regulatory networks.¹⁶,¹⁷,¹⁸ Manuel et al. pioneered the conceptual framework of network inference in the study of information and epidemic propagation dynamics in 2012.¹⁹ Building upon this foundation, researchers have developed numerous new algorithms by considering factors such as network sparsity, topic frequency, and community structure.²⁰,²¹,²² Recent studies have further advanced diffusion-based network inference by introducing probabilistic and continuous-time frameworks that enable more accurate estimation of influence probabilities and network topology.²³,²⁴,²⁵ However, these methods model transmission as tree-like processes and depend heavily on identifying propagation sources.²⁶ When source information is unavailable or multiple transmission sources exist, they are limited in their ability to infer network structures. To address these limitations, Shen et al. proposed inferring network structure from binary-state time series by applying compressed sensing.²⁷ Recognizing that compressed sensing may not fully capture all effective network information, subsequent work incorporated the least absolute shrinkage and selection operator (LASSO) to enhance inference accuracy and broadened its applicability to various binary-state time series systems.²⁸,²⁹ To address the potential information loss inherent in these methods, Zhang and Ma et al. developed an expectation-maximization method (EM method) that preserves complete temporal information from epidemic infection dynamics, thereby enabling more comprehensive network inference.³⁰,³¹ This methodology was subsequently extended to higher-order network structures.³² These methods infer edge connectivity by examining temporal relationships between node state transitions, so inference accuracy is positively correlated with time series length. Because they require pairwise computations across all elements of the binary-state time series, computational time increases as the time series grows longer.

Traditional network inference methods rely on binary-state time series to identify potential edges between nodes, which requires element-by-element comparison of the time series for each pair of nodes, and their accuracy correlates strongly and positively with time series length.³³,³⁴,³⁵ Furthermore, the computational cost grows approximately exponentially with the length of the time series, as shown in Figure 1. For a network with 20 nodes, the EM method needs 1,000 time steps to reach an accuracy of 89%. In other words, the EM method must spend considerably more time to bring the inferred adjacency matrix closer to the true adjacency matrix A.

Performance of network inference using the EM method and cumulative time

The image at the top left shows the true adjacency matrix A. The image at the bottom left shows the time cost: with the EM method, the cumulative time grows approximately exponentially with the length of the time series. The four images on the right show the accuracy of the inference results obtained by the EM method at four different time series lengths w. When w = 100, the accuracy of inference is 32.61%; when w = 200, 43.48%; when w = 500, 80.43%; and when w = 1,000, 89.13%. As the panels show, the accuracy of the EM method is positively correlated with the length of the time series.

To overcome these limitations, network inference with the infection frequency matrix (NIIFM) is proposed in this work. This method considers the probability distribution of epidemic transmission frequencies between connected node pairs. Rather than directly using binary-state time series as input, NIIFM transforms them into an infection frequency matrix (IF matrix) that serves as the inference statistic. The detailed procedure for the transformation is presented in Figure 2 and Algorithm 1. The IF matrix quantifies the edge existence probability between node pairs and enables more effective extraction of the network structural information embedded within the binary-state time series. Compared to traditional methods, NIIFM demonstrates superior inference performance under small-sample conditions and substantially lower computational overhead for long time series. Furthermore, the Bayesian method is employed, which provides posterior probability distributions not only for the network adjacency matrix but also for additional parameters, enabling simultaneous inference of network structure and evaluation of epidemic severity and impact. NIIFM thus facilitates more accurate and efficient network structure inference with substantially reduced data requirements.

Schematic illustration of transforming a binary-state time series matrix into an infection frequency matrix

This figure illustrates how to transform a binary-state time series matrix S_w×n into an infection frequency matrix M_n×n. Taking node 2 as an example: (a) The second column records all the states of node 2, where 0 means the node is in the susceptible (S) state and 1 means the node is in the infected (I) state; (b) Find the times at which node 2 transitions from the S to the I state, here w = 2, w = 4, and w = 6; (c) Examine which of the other nodes (3, 4, 5) are in the I state at those times; (d) Record the corresponding frequencies with which node 2 was infected by nodes 3, 4, and 5, which are 3, 1, and 2, respectively, in the second row of the M matrix.

Algorithm 1. Construction of the Infection Frequency (IF) Matrix.

Input: S_w×n

Output: M_n×n

Initialize M_ij = 0 for all i, j, and initialize n empty sets $\{S_{1}^{'}, S_{2}^{'}, \dots, S_{n}^{'}\}$, one per column of S, to record the infection timestamps of each node. Here S_i,t denotes the state of node i at time t.

For i = 1,2,⋯,n do
    Identify the infection timestamps of node i;
    For t = 1,2,⋯,w−1 do
        If S_i,t = 0 and S_i,t+1 = 1 then
            Add t to $S_{i}^{'}$
        End if
    End for
    Count the nodes that are infected at each infection timestamp of node i;
    For all $t \in S_{i}^{'}$ do
        For j = 1,2,⋯,n do
            If S_j,t = 1 then
                M_ij = M_ij + 1
            End if
        End for
    End for
End for
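One way to read Algorithm 1 is sketched below in Python. This is a minimal illustration, not the authors' implementation: the function name `infection_frequency_matrix` and the toy matrix `S_demo` are hypothetical, the state matrix is stored with one row per time step and one column per node, and the potential infectors of node i are assumed to be the nodes that are infected at the time step immediately before node i switches from S to I.

```python
def infection_frequency_matrix(S):
    """Build the infection frequency matrix M from a binary state matrix S.

    S is a list of w rows (time steps), each a list of n states (0 = S, 1 = I).
    M[i][j] counts the time steps t at which node i is susceptible, node j is
    infected, and node i is infected at t + 1 (an assumed reading of Algorithm 1).
    """
    w = len(S)
    n = len(S[0]) if w else 0
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        # Time steps immediately before node i switches from S (0) to I (1).
        infection_times = [t for t in range(w - 1)
                           if S[t][i] == 0 and S[t + 1][i] == 1]
        for t in infection_times:
            for j in range(n):
                if S[t][j] == 1:  # node j was infected when node i got infected
                    M[i][j] += 1
    return M


# Toy example with w = 5 time steps and n = 3 nodes (illustrative values only).
S_demo = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
]
M_demo = infection_frequency_matrix(S_demo)
```

Because node i is susceptible at every recorded timestamp, the diagonal of M stays zero without an explicit check. The whole construction is a single pass over the time series per node, which is where the reduction in cost relative to repeated element-wise comparison comes from.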

This work makes the following contributions: (1) a network inference method based on an improved Bayesian approach is proposed; (2) within the Bayesian method, the IF matrix replaces binary-state time series to improve inference performance; (3) the infection frequencies in the IF matrix are shown to follow a Poisson distribution, which is consistent with reality; and (4) network inference based on the IF matrix is more accurate and efficient than other methods.

Results

Performance of usability and fault tolerance on NIIFM

The primary objective of this study is to determine the consistency between the inferred adjacency matrix $\tilde{A}$ and the actual adjacency matrix A. To evaluate this consistency, we employ several widely used metrics, including the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR),³⁶ and the F1 score.³⁷ All notations and their definitions are listed in Table 1.

Table 1.

Notations with their definitions

Notations	Definitions
n, w	The number of network nodes and the number of simulated time steps, respectively.
S_w×n, S_ij	The state matrix and the state of node i at time j, respectively, where $S_{w \times n} = {(S_{i j})}_{w \times n}$. S_ij = 1 if node i is infected at time j; otherwise S_ij = 0.
M_n×n, M_ij	M_n×n is the infection frequency matrix and M_ij is the number of times node i is infected by node j, where $M_{n \times n} = {(M_{i j})}_{n \times n}$.
A_n×n, A_ij	The adjacency matrix and its elements, where $A_{n \times n} = {(A_{i j})}_{n \times n}$. A_ij = 1 if there is an edge between nodes i and j; otherwise A_ij = 0.
β, c, ρ	The probability of epidemic transmission, the steady-state fraction of infected nodes, and the prior probability of edge existence in the network, respectively; β, c, and ρ take values in the interval [0,1].
k_i, Z_i	The degree of node i and the expected probability that node i is infected, where $Z_{i} = 1 - {(1 - β)}^{k_{i} c}$.
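As a small worked example of the last row of Table 1, the sketch below evaluates Z_i for one set of values. The function name and the chosen numbers (β = 0.1, k_i = 10, c = 0.5) are illustrative assumptions, and the comment reflects one plausible reading of the formula: k_i c is the expected number of infected neighbors, each of which fails to transmit with probability 1 − β.

```python
def infection_probability(beta, k_i, c):
    """Z_i = 1 - (1 - beta)^(k_i * c), as defined in Table 1.

    Assumed reading: k_i * c is the expected number of infected neighbors of
    node i, and (1 - beta)^(k_i * c) is the chance that none of them transmits.
    """
    return 1.0 - (1.0 - beta) ** (k_i * c)


# Illustrative values only: beta = 0.1, degree 10, half of the nodes infected.
z = infection_probability(beta=0.1, k_i=10, c=0.5)
```

With these values Z_i = 1 − 0.9⁵, which is about 0.41.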

Performance of usability on NIIFM

Given the sparse structure of adjacency matrices, direct element-wise comparison between $\tilde{A}$ and A provides misleading performance assessments. For a network with n = 100, ⟨k⟩ = 10, even a null predictor (all zeros) achieves 90% accuracy. Therefore, F1 scores are employed for more reliable performance evaluation. To reduce randomness in the simulation process, all F1 scores are averaged over five independent runs.
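The gap between accuracy and F1 for the all-zero predictor can be checked directly. The sketch below is illustrative: the helper `accuracy_and_f1` is a hypothetical name, and the true network is an arbitrary ring lattice chosen only because it has exactly n⟨k⟩/2 = 500 edges.

```python
def accuracy_and_f1(true_edges, pred_edges, n):
    """Accuracy and F1 over all n*(n-1)/2 unordered node pairs."""
    total_pairs = n * (n - 1) // 2
    tp = len(true_edges & pred_edges)
    fp = len(pred_edges - true_edges)
    fn = len(true_edges - pred_edges)
    tn = total_pairs - tp - fp - fn
    accuracy = (tp + tn) / total_pairs
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


n = 100
# Ring lattice in which every node links to its 5 successors: mean degree 10.
true_edges = {frozenset((i, (i + d) % n)) for i in range(n) for d in range(1, 6)}
# Null predictor: no edges predicted at all.
acc, f1 = accuracy_and_f1(true_edges, set(), n)
```

Here 4,450 of the 4,950 node pairs are correctly labeled as non-edges, so the accuracy is about 0.899 while the F1 score is 0, which is why F1 is the more informative metric for sparse networks.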

In Figure 3, networks comprising 50, 100, and 200 nodes were constructed using three different models: Erdős-Rényi (ER), Watts-Strogatz (WS), and Barabási-Albert (BA). All networks were configured with an average degree of 10. The adjacency matrix of each network is denoted as A. Epidemic transmission was subsequently simulated on these networks to generate binary-state time series. These time series were then transformed into an IF matrix, and Bayesian methods were employed to infer the network structure, resulting in an inferred adjacency matrix $\tilde{A}$. Comparison between the original and inferred matrices enabled calculation of F1 scores. As illustrated in the figures, the proposed method achieves near-perfect inference of homogeneous networks, with inferred and actual networks showing almost complete consistency. For heterogeneous networks, although the inference precision is not as high as for homogeneous networks, the F1 scores still exceed 0.95. To assess scalability to larger networks, we conducted a single experiment on ER and BA networks with n = 500, ⟨k⟩ = 6. The corresponding results are provided in Figure S1. In addition, the SIS model was replaced with the susceptible-infected-recovered-susceptible (SIRS) model to test the usability of NIIFM under a different epidemic propagation model; the results are shown in Figure S2.
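The data-generation step can be sketched as follows. This is a hedged illustration rather than the simulation code used in the study: it assumes a discrete-time SIS process in which a susceptible node with m infected neighbors becomes infected with probability 1 − (1 − β)^m, and the recovery probability `delta`, the initial infected fraction, and all function names are assumptions not given in this section.

```python
import random


def erdos_renyi(n, p, rng):
    """Adjacency sets of a G(n, p) random graph."""
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def simulate_sis(neighbors, beta, delta, w, rng, initial_fraction=0.2):
    """Return a w x n binary state matrix (rows = time steps, columns = nodes)."""
    n = len(neighbors)
    state = [1 if rng.random() < initial_fraction else 0 for _ in range(n)]
    series = [state[:]]
    for _ in range(w - 1):
        new_state = state[:]
        for i in range(n):
            if state[i] == 1:
                # Assumed recovery rule: infected nodes recover with probability delta.
                if rng.random() < delta:
                    new_state[i] = 0
            else:
                m = sum(state[j] for j in neighbors[i])  # infected neighbors
                if rng.random() < 1.0 - (1.0 - beta) ** m:
                    new_state[i] = 1
        state = new_state
        series.append(state[:])
    return series


rng = random.Random(0)
n_nodes, mean_degree = 50, 10
neighbors = erdos_renyi(n_nodes, mean_degree / (n_nodes - 1), rng)
S_sim = simulate_sis(neighbors, beta=0.1, delta=0.3, w=200, rng=rng)
```

All node states are updated synchronously from the previous time step. In a sketch like this the epidemic can die out for unfavorable parameter choices, in which case the tail of the series carries no further information about the network.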

Performance of NIIFM on synthetic networks across network sizes

This figure shows how the F1 score of network inference by NIIFM changes with increasing time series length w on BA, WS, and ER networks with different numbers of nodes n.

(A) n = 50, ⟨k⟩ = 10.

(B) n = 100, ⟨k⟩ = 10.

(C) n = 200, ⟨k⟩ = 10.

Performance of fault tolerance on NIIFM

Real-world binary-state time series often suffer from noise contamination and partial observability, where data from certain nodes may be missing or inaccessible. To validate the fault tolerance of the NIIFM against such data imperfections, a systematic perturbation analysis is conducted. Specifically, controlled noise is introduced by randomly flipping the states of a predetermined fraction of nodes in the time series. The network structures inferred from these binary-state time series are then compared with the ground truth to evaluate the method’s resilience to data corruption.

Two scenarios corresponding to data corruption and data omission during statistical collection are examined. The data corruption scenario is illustrated in Figure 4, where the states of a specified proportion of elements in the time series are randomly flipped to generate error-contaminated sequences for network inference. Performance is then evaluated by calculating the F1 score, AUROC, and AUPR between the inferred adjacency matrix $\tilde{A}$ and the true adjacency matrix A. The results demonstrate that even with 10% data corruption, the inference maintains F1 scores exceeding 0.8 for both homogeneous ER networks and heterogeneous BA networks.

Fault tolerance of NIIFM under data corruption scenarios

This figure illustrates the fault-tolerance evaluation of NIIFM on BA and ER networks under time series recording errors of 2%, 4%, 6%, 8%, and 10%, using the F1 score, AUROC, and AUPR. The networks consist of n = 100 nodes with an average degree of ⟨k⟩ = 10.

(A) BA network.

(B) ER network.

The missing data scenario is presented in Figure 5. Incomplete time series are generated by randomly removing a specified proportion of elements for network inference testing. The results show that with 10% data missing, the F1 score for BA networks can reach 0.9, while ER networks are almost unaffected.

Fault tolerance of NIIFM under data omission scenarios

This figure illustrates the fault-tolerance evaluation of NIIFM on BA and ER networks under time-series data omission of 2%, 4%, 6%, 8%, and 10%, using the F1 score, AUROC, and AUPR. The networks consist of n = 100 nodes with an average degree of ⟨k⟩ = 10.

(A) BA network.

(B) ER network.

It is evident that in both BA and ER networks, the results of the second scenario outperform those of the first scenario. This is likely attributable to the fact that when randomly flipping the state matrix, some states transition from 0 to 1 while others transition from 1 to 0, thereby neutralizing the overall impact. In contrast, data removal converts nodes in state 1 to state 0, resulting in a more significant influence on the outcomes. However, since NIIFM does not rely on binary-state time series inference directly, but rather transforms binary-state time series into an IF matrix for inference, minor alterations in several positions of the binary-state time series have limited impact on the IF matrix. This characteristic is the primary reason NIIFM can still accurately infer network structures even when 10% of the data exhibits anomalies.
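The two perturbations can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, perturbed entries are drawn uniformly at random without replacement, and omitted entries are recorded as state 0, following the description above that data removal converts nodes in state 1 to state 0.

```python
import random


def flip_entries(S, fraction, rng):
    """Data corruption: flip the state of a random fraction of entries."""
    w, n = len(S), len(S[0])
    noisy = [row[:] for row in S]
    cells = rng.sample(range(w * n), int(round(fraction * w * n)))
    for cell in cells:
        t, i = divmod(cell, n)
        noisy[t][i] = 1 - noisy[t][i]
    return noisy


def drop_entries(S, fraction, rng):
    """Data omission: record a random fraction of entries as state 0."""
    w, n = len(S), len(S[0])
    noisy = [row[:] for row in S]
    cells = rng.sample(range(w * n), int(round(fraction * w * n)))
    for cell in cells:
        t, i = divmod(cell, n)
        noisy[t][i] = 0
    return noisy


rng = random.Random(1)
# Toy 10 x 10 state matrix with an alternating pattern (illustrative only).
S0 = [[(t + i) % 2 for i in range(10)] for t in range(10)]
flipped = flip_entries(S0, 0.1, rng)
dropped = drop_entries(S0, 0.1, rng)
```

Flipping always changes the selected entries, whereas omission only changes those that were in state 1, so for the same nominal fraction fewer entries actually differ from the original series under omission.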

Performance of parameter inference

The advantage of employing Bayesian methods lies in their ability to express the posterior probability distribution of parameters. After deriving the posterior distributions of the transmission probability β and edge probability ρ, these parameters can be estimated—a capability not achievable by previous methods.

Figures 6A–6C illustrate the estimation of the transmission probability β using binary-state time series obtained from disease propagation simulations on BA, WS, and ER networks, respectively. The networks have n = 100 nodes and an average degree of ⟨k⟩ = 10, and the transmission probability is set to 0.1. The estimated values range from 0.0732 to 0.1081, with BA networks exhibiting greater fluctuation, likely because of their inherent heterogeneity. Although the estimation accuracy is limited, the results provide a reasonable indication of disease transmission capacity. Figures 6D–6F demonstrate the estimation of the edge connection probability ρ on ER networks with n = 100, β = 0.1, and varying connection probabilities. The results show accurate estimation of edge connection probabilities, confirming that NIIFM can effectively infer network structure and, consequently, precisely determine the connection probability in ER networks.

Performance of NIIFM in parameter inference

This figure presents the performance of NIIFM in estimating the disease transmission probability β and the edge connection probability ρ. The red dashed line indicates the true parameter value, and the blue bars indicate the estimated value for each number of time steps. (A, B, and C) Inference results for the epidemic transmission probability on BA, WS, and ER networks. (D, E, and F) Inference results for the edge connection probability in ER networks with connection probabilities of 0.1, 0.2, and 0.3.

(A) BA network with β = 0.1.

(B) WS network with β = 0.1.

(C) ER network with β = 0.1.

(D) ER network with ρ = 0.1.

(E) ER network with ρ = 0.2.

(F) ER network with ρ = 0.3.

For both the transmission probability and the edge connection probability, the estimated values are relatively accurate. This accuracy is attributed to the Bayesian method, which treats parameters as random variables and transforms the statistical inference problem into determining the posterior distribution of the parameters. Through numerical sampling with Hamiltonian Monte Carlo (HMC), parameter values related to epidemic transmission are effectively estimated. When confronted with actual epidemic outbreaks, NIIFM serves the dual objectives of inferring the contact network and quantifying transmission dynamics and network structural properties.

Performance of centrality inference

Epidemic transmission control represents a practical application scenario for the NIIFM method. While the most straightforward approach would be to sever all transmission routes, this option is prohibitively costly. Therefore, selectively controlling certain nodes offers a more cost-effective solution. This requires comparing the centrality of different nodes to determine which transmission pathways to interrupt.³⁸,³⁹,⁴⁰ The inference quality is therefore evaluated by comparing the consistency of node rankings based on different centrality measures between the original and inferred networks. Specifically, each node is ranked according to centrality values calculated from the adjacency matrices of both the original and inferred networks, and the Kendall correlation coefficient is then computed to assess ranking consistency. The closer the Kendall correlation coefficient is to 1, the more accurately the inferred network reconstructs the centrality characteristics of nodes in the original network.
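The ranking comparison can be sketched for the simplest case, degree centrality. The code below is illustrative only: the helper names and the two toy adjacency matrices are hypothetical, and the tau-b variant of the Kendall coefficient is assumed because degree values often tie; the section does not state which variant was used.

```python
from math import sqrt


def degrees(A):
    """Node degrees; dividing by n - 1 to normalize would not change the ranking."""
    return [sum(row) for row in A]


def kendall_tau_b(x, y):
    """Kendall tau-b between two score lists, with a correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both lists
            elif dx == 0:
                ties_x += 1         # tied only in x
            elif dy == 0:
                ties_y += 1         # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x) *
                 (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Toy example: a 4-node path, and an inferred copy that misses the edge 2-3.
A_true = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
A_inferred = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
tau = kendall_tau_b(degrees(A_true), degrees(A_inferred))
tau_perfect = kendall_tau_b(degrees(A_true), degrees(A_true))
```

The same comparison applies to any other centrality measure by swapping `degrees` for the corresponding scoring function.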

Common centrality measures used in network science include closeness centrality,⁴¹ degree centrality,⁴² and eigenvector centrality.⁴³ In addition, neighboring dynamical information-based centrality (NDIC) is considered; it ranks the influence of nodes in a network based on the status of their neighbors.⁴⁴ NDIC builds on neighborhood centrality and additionally incorporates the state of the neighborhood. Only neighbors up to the second order are considered here, and the formula is as follows:

$C_{i}^{2}(θ) = θ_{i} + a \sum_{j \in Γ_{i}} θ_{j} - b \sum_{j \in Γ_{i}} θ_{j} ρ_{j} + a^{2} \sum_{l \in Γ_{j}} θ_{l} - b^{2} \sum_{l \in Γ_{j}} θ_{l} ρ_{l},$

(Equation 1)

where Γ_i is the set of neighbors of node i and b represents the reference centrality of the neighbor node.

To validate the accuracy of the proposed method, three types of networks with 100 nodes and an average degree of 10 are employed: ER, BA, and WS networks. Additionally, two real networks are used: Polbooks and Lyonschool. The results are shown in Table 2. Because node centrality is closely tied to network structure, Table 2 shows that centrality estimation performs better in homogeneous networks, where network structure inference is more accurate. Heterogeneous networks show slightly inferior performance; however, the Kendall correlation coefficients remain above 0.8 in most cases. With the exception of eigenvector centrality estimation in the Polbooks network, all performance metrics are satisfactory. These findings facilitate the accurate identification of potential threats for controlling epidemic propagation during outbreak periods.

Table 2.

Kendall correlation coefficients between the original and inferred networks

Network/centrality	Closeness	Degree	Eigenvector	NDIC
ER	1	1	1	0.989
WS	1	1	1	0.954
BA	0.906	0.943	0.864	0.869
Polbooks	0.982	0.993	0.776	0.944
Lyonschool	0.941	0.927	0.841	0.911

Comparative analysis with other methods

Epidemic propagation was simulated on both artificial networks (including ER, WS, and BA networks) and real-world networks (such as the Karate club and Dolphins social networks) to generate binary-state time series. Subsequently, we evaluated NIIFM against established network inference methods designed for binary epidemic time series, specifically CTS,²⁷ LASSO,²⁸ and EM.³⁰,³²,⁴⁵ To place NIIFM within a broader methodological context, we further benchmarked it against several representative approaches from related domains. These include four models from the classical Bayesian network inference framework¹⁴: the binomial-random graph model (BR), the binomial-soft configuration model (BS), the Poisson-random graph model (PR), and the Poisson-soft configuration model (PS). In addition, we compared NIIFM with two methods developed for inferring gene regulatory networks from time-series data, namely the and/or tree ensemble network (ATEN)¹⁷ and the neuroevolution-based Boolean network inference method (NNBNI).¹⁸

Comparison with other methods on synthetic networks

This section presents a comparative analysis of seven network inference methods (NIIFM, EM, BR, BS, PR, PS, and NNBNI), evaluating both the inference accuracy, as measured by the F1 score, and the computational time. For CTS, LASSO, and ATEN, where the F1 score is not directly applicable, a comparative assessment using other evaluation criteria is presented in the following section.

Figure 7 presents the performance of the seven network inference methods on WS and BA networks. Across both homogeneous (WS) and heterogeneous (BA) networks, NIIFM consistently achieved highly accurate reconstruction. Notably, NIIFM obtained F1 scores above 0.9 with only w = 2,000 time steps and approached near-perfect accuracy as the sequence length increased to w = 10,000. In contrast, EM matched NIIFM only when the time series was sufficiently long, but its accuracy deteriorated substantially for shorter sequences. This performance differential arises from EM’s stronger dependence on mutual infection values between node pairs for inferring edge relationships, requiring a longer evolutionary process to reliably separate informative from non-informative interactions. The four Bayesian models (BR, BS, PR, and PS) performed reasonably well on WS networks and achieved relatively high F1 scores with shorter sequences. However, their performance dropped markedly on BA networks with higher degree heterogeneity, likely because these models were developed as general approaches and are not specifically optimized for the inference setting considered in this work. NNBNI also exhibited poor performance, likely due to its genetic algorithm (GA) and neural network (NN) components being originally optimized for gene regulatory network inference rather than for binary epidemic time series. Overall, these results indicate that NIIFM can reliably recover network structure across both homogeneous and heterogeneous networks, regardless of whether the available time series are short or long.

Comparison of NIIFM with other methods on WS and BA networks with n = 50

This figure compares the performance of NIIFM with six network inference methods (PS, PR, BR, BS, EM, and NNBNI) on WS and BA networks with n = 50, ⟨k⟩ = 10.

(A) WS network.

(B) BA network.

Figure 8 presents a comparative analysis of the accuracy and computational efficiency of the seven methods on ER networks. As shown in Figure 8A, NIIFM achieves consistently high accuracy across all time-series lengths, surpassing all other methods even when only limited data are available. Its F1 score approaches 1.0 as w increases, highlighting its rapid convergence and high inference fidelity. The EM method shows low accuracy for short sequences but gradually approaches NIIFM as the time series becomes longer. PS and PR achieve relatively high accuracy for short sequences, comparable to NIIFM at small w, but their performance plateaus and does not improve with additional data. BR and BS perform poorly when the time series is short and eventually stabilize around an F1 score of 0.8, reflecting limitations inherent to general-purpose Bayesian models that are not optimized for this inference setting. While NNBNI shows an increasing trend with w, its accuracy trails NIIFM, likely because its parameters are specifically calibrated for gene regulatory networks rather than the SIS dynamics analyzed here. Figure 8B shows the computational time of the seven methods on a logarithmic scale. Although the differences appear compressed visually, the true gaps are substantial. Owing to their relatively simple parameterization, the four Bayesian models require the least time, with PR completing inference in only 11 s and the others ranging from 45 to 69 s. NIIFM requires a moderate runtime of 334 s. In contrast, EM and NNBNI are significantly slower, taking 6,894 and 13,252 s, respectively. EM’s high computational cost arises from direct operations on the full time series, causing runtime to scale with sequence length, whereas NNBNI’s cost is dominated by the computational demands of neural network training.

Comparison of performance and runtime between NIIFM and other methods on ER networks with n = 50

This figure compares NIIFM with six network inference methods (PS, PR, BR, BS, EM, and NNBNI) on ER networks with n = 50, ⟨k⟩ = 10. Both inference performance and computational time are reported to highlight the differences among methods.

(A) F1 score.

(B) Average computational time (log scale).

Figure 9 shows the performance and computational cost of the seven inference methods when the network size increases to n = 100, ⟨k⟩ = 10. As shown in Figures 9A–9C, NIIFM continues to outperform all other approaches across WS, BA, and ER networks, achieving higher F1 scores than competing methods for both short and long time-series lengths. Although the EM method maintains its characteristic pattern, low accuracy for small w followed by gradual improvement as more data become available, its final accuracy is noticeably lower than in the n = 50 experiments. The four Bayesian baselines (PR, PS, BR, and BS) again exhibit their best performance on WS networks, in some cases even surpassing EM. However, their accuracy on BA and ER networks remains limited, reflecting the challenges posed by degree heterogeneity and the general-purpose nature of these models. NNBNI shows a performance trend similar to that observed at n = 50, with marginal improvements as w increases but overall limited accuracy, likely due to parameter settings tuned for gene regulatory network inference rather than epidemic dynamics. Figure 9D illustrates the average computational time for each method. As expected, runtime increases for all methods as network size grows. PR remains the fastest, completing inference in 45 s, followed by BR, PS, and BS, which fall within the range of 81–284 s; NIIFM requires 966 s. In contrast, EM and NNBNI experience dramatic increases in runtime, reaching 47,419 s and 69,725 s, respectively. These results indicate that while all methods become slower on larger networks, PR, PS, BR, BS, and NIIFM scale more favorably, whereas EM and NNBNI are heavily impacted by network size.

Comparison of NIIFM with other methods for inferring BA, WS, and ER networks with n = 100

This figure compares NIIFM with six network inference methods (PS, PR, BR, BS, EM, and NNBNI) on BA, WS, and ER networks with n = 100, ⟨k⟩ = 10. Inference performance is reported for all three network types, and computational time is reported for the ER network.

(A) BA network.

(B) WS network.

(C) ER network.

(D) Average computational time (log scale).

Comparison with other methods on real networks

This section evaluates the performance of nine methods, namely CTS, LASSO, EM, NIIFM, BS, BR, PS, PR, and ATEN, in inferring network structures on four commonly used real networks: Karate,⁴⁶ Dolphins,⁴⁷ Polbooks,⁴⁸ and Football.⁴⁹ Table 3 lists the basic properties of these networks, together with those of the Hypertext2009 and Lyonschool networks used later in this section.

Table 3.

Basic properties of networks

Network	Nodes	Edges	Avg. degree	Clustering coef.
Karate	34	78	4.588	0.571
Dolphins	62	159	5.129	0.259
Polbooks	105	441	8.400	0.488
Football	115	613	10.661	0.403
Hypertext2009	85	192	4.518	0.314
Lyonschool	222	602	5.423	0.282
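The average degree column of Table 3 follows from the node and edge counts through ⟨k⟩ = 2E/N for an undirected network. The short check below uses only the values listed in the table; the dictionary layout and variable names are illustrative.

```python
# (nodes, edges) for each network, copied from Table 3.
networks = {
    "Karate": (34, 78),
    "Dolphins": (62, 159),
    "Polbooks": (105, 441),
    "Football": (115, 613),
    "Hypertext2009": (85, 192),
    "Lyonschool": (222, 602),
}

# Each undirected edge contributes to the degree of two nodes.
avg_degree = {name: round(2 * edges / nodes, 3)
              for name, (nodes, edges) in networks.items()}
```

The resulting values agree with the tabulated average degrees to three decimal places.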

The metrics used for comparison here are AUROC and AUPR. Because the links of each node are identified separately, AUROC and AUPR are calculated for each node, and the mean values over all nodes are used to characterize the reconstruction performance for the whole network.
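The node-wise averaging can be sketched for AUROC as follows. This is an illustration under assumptions: the score matrix `P` stands for whatever per-pair edge score a method produces (for example, a posterior edge probability), AUROC is computed from pairwise comparisons of positive and negative scores with ties counted as one half, and nodes for which AUROC is undefined are skipped. The function names and toy matrices are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined without both neighbors and non-neighbors
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_node_auroc(A, P):
    """Average AUROC over nodes; row i of P scores the candidate links of node i."""
    n = len(A)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        value = auroc([P[i][j] for j in others], [A[i][j] for j in others])
        if value is not None:
            values.append(value)
    return sum(values) / len(values)


# Toy example: a 4-node path and illustrative edge scores.
A_demo = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
P_demo = [
    [0.0, 0.9, 0.2, 0.1],
    [0.8, 0.0, 0.7, 0.3],
    [0.55, 0.6, 0.0, 0.5],
    [0.3, 0.2, 0.9, 0.0],
]
mean_auroc = mean_node_auroc(A_demo, P_demo)
```

In this toy case three nodes are ranked perfectly and one node scores 0.5, giving a mean of 0.875. AUPR can be averaged over nodes in the same way.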

As presented in Table 4, NIIFM exhibits the strongest overall performance among the nine inference approaches across the four benchmark networks when evaluated by AUROC and AUPR. In this comparison, the binary-state epidemic time series have length w = 10,000. Averaged over the four networks, NIIFM achieves the highest AUPR (0.9860), clearly outperforming CTS (0.9555), EM (0.9530), LASSO (0.8498), and the four Bayesian structure learning variants BS, BR, PS, and PR (0.3223–0.3463). NIIFM also attains the largest mean AUROC (0.9985), slightly higher than CTS (0.9885), EM (0.9793), and the ensemble-based approach ATEN (0.9970), indicating near-perfect discrimination between connected and unconnected node pairs.

Table 4.

Performance comparison of NIIFM with other methods

Method (AUROC/AUPR)	Karate	Dolphins	Polbooks	Football	Mean
CTS	0.995/0.993	0.999/0.996	0.976/0.918	0.984/0.915	0.9885/0.9555
LASSO	0.954/0.946	0.981/0.941	0.896/0.801	0.928/0.711	0.93975/0.84975
EM	0.983/0.982	0.998/0.993	0.940/0.864	0.996/0.973	0.97925/0.953
NIIFM	0.999/0.997	0.999/0.980	0.998/0.972	0.998/0.995	0.9985/0.9860
BS	0.952/0.437	0.947/0.310	0.909/0.238	0.948/0.304	0.939/0.32225
BR	0.955/0.467	0.950/0.343	0.910/0.242	0.949/0.333	0.941/0.34625
PS	0.952/0.437	0.949/0.332	0.910/0.238	0.948/0.304	0.93975/0.32775
PR	0.954/0.458	0.950/0.343	0.911/0.248	0.949/0.333	0.941/0.3455
ATEN	0.998/0.972	0.998/0.991	0.995/0.960	0.997/0.956	0.997/0.96975


At the level of individual networks, NIIFM delivers the best or near-best performance in almost all cases: it clearly dominates LASSO, EM, and the Bayesian variants on the Karate, Polbooks, and Footballs networks, and remains competitive with CTS and ATEN on the Dolphins network, where CTS obtains the highest AUPR. These results demonstrate that NIIFM provides the most accurate and stable reconstruction across heterogeneous real-world networks, particularly in terms of AUPR, whereas classical Bayesian structure learning (BS, BR, PS, and PR) suffers from markedly lower precision and recall. As EM was found to be overall stronger than CTS and LASSO,³⁰ the following experiments focus on a detailed comparison between NIIFM and EM on the Hypertext2009⁵⁰ and Lyonschool networks⁵¹^,⁵² in terms of F1 score.

Figure 10 shows that NIIFM achieves higher F1 scores than the EM method on both real networks under the same conditions. The advantage holds regardless of time-series scale, under both sparse and dense time-series conditions. Given that real-world scenarios typically involve limited data availability, this highlights NIIFM’s effectiveness in practical applications.

Performance of the NIIFM and EM methods for inferring networks in real networks

(A) Hypertext2009 network.

(B) Lyonschool network. The performance of NIIFM is better than that of the EM method when w = 4,000.

Comparison of NIIFM with EM employed in robustness

To compare the robustness of NIIFM and EM, we adopted the same two scenarios that were used to assess the fault tolerance of NIIFM: (1) the data corruption scenario and (2) the missing data scenario. For both scenarios, we generated time series of length w = 10,000 on BA and ER networks with n = 100, ⟨k⟩ = 10. We then introduced perturbations by corrupting or removing 2%, 4%, 6%, 8%, and 10% of the time-series entries, respectively. The goal was to examine how such contaminated data affect the network inference performance of NIIFM and EM. The evaluation metric used in all experiments was the F1 score.

As shown in Figure 11, both NIIFM and EM are more strongly affected in the data corruption scenario than in the missing data scenario, reflecting the fact that incorrect entries are substantially more disruptive to inference than missing ones. In Figures 11A and 11B, it can be observed that under the data corruption scenario, NIIFM maintains an F1 score of approximately 0.8 even when 10% of the time-series entries are corrupted. In contrast, the F1 scores of EM drop below 0.5 on both BA and ER networks. This sensitivity arises because EM relies heavily on temporal transitions between successive states, making it vulnerable to erroneous observations. By transforming time-series data into an IF matrix, NIIFM mitigates the effect of corrupted entries, since the dominant characteristics of high-frequency node pairs remain preserved even when individual temporal events are incorrect. Figures 11C and 11D present the results for the missing data scenario. Both methods exhibit improved performance compared with the corruption scenario; however, EM still performs substantially worse than NIIFM. In this case, the lowest F1 score of NIIFM occurs on the BA network, remaining around 0.9, whereas the F1 score of EM remains below 0.5, indicating a pronounced performance gap. These findings indicate that NIIFM is considerably more robust to both random noise and incomplete observations, highlighting its reliability in practical settings where epidemic time-series data are frequently imperfect.

Robustness comparison between NIIFM and EM on binary-state time series data

(A) The data corruption scenario in BA network.

(B) The data corruption scenario in ER network.

(C) The missing data scenario in BA network.

(D) The missing data scenario in ER network.

Discussion

Accurate inference of who infects whom from binary-state epidemic time series remains challenging, especially when only short traces are available. Most existing network reconstruction methods operate by element-wise comparison of binary trajectories between node pairs, so their accuracy depends strongly on time-series length and their computational cost grows rapidly with data size. In this work, we proposed NIIFM, which maps binary infection traces into an IF matrix that summarizes transmission events between node pairs. By treating the IF matrix as low-dimensional inference statistics within an improved Bayesian framework, NIIFM alleviates the dependence on sequence length, preserves information relevant to edge existence, and substantially reduces computational complexity. This framework allows us to efficiently infer network structure and simultaneously obtain posterior distributions of epidemic transmission parameters and node-level quantities, which are not directly accessible to conventional non-Bayesian approaches such as CTS, LASSO, and EM. Extensive simulations using HMC show that NIIFM achieves high reconstruction accuracy, with F1 scores exceeding 0.98 on ER and WS networks and above 0.95 on BA networks, and it remains robust when 10% of the input data are corrupted (F1 scores above 0.8). Estimates of epidemic parameters and node centrality are consistent with ground truth, enabling the identification of high-risk nodes and the interruption of transmission chains. Taken together, NIIFM bridges theoretical network inference and practical public health applications by providing a fast and accurate tool to recover transmission routes and key nodes from sparse epidemic data.

Limitations of the study

Despite these advantages, several limitations of the present study should be noted. First, although NIIFM performs well on heterogeneous BA networks, its accuracy (F1 = 0.84) remains lower than in homogeneous cases; the impact of network heterogeneity on the inference accuracy of NIIFM is analyzed in Figure S3. This discrepancy suggests that strong degree heterogeneity may require additional structural priors, such as degree-corrected models, to better capture the underlying topology and refine edge probability estimation. Second, scalability becomes challenging for very large systems (n > 10³), where the computational burden of constructing and processing the IF matrix grows substantially. Inspired by the pre-pruning strategy introduced in the pairwise-interactions-based Bayesian inference method,²⁴ future work will explore integrating similar pre-pruning mechanisms with deep learning techniques to further improve the scalability and efficiency of NIIFM on large-scale networks.⁵³^,⁵⁴ Third, NIIFM is currently formulated for static networks. In many real-world settings, however, contact patterns evolve over time. Extending the framework to dynamic, time-varying networks and coupling it with real-time data streams (e.g., mobile or social-media-based signals) will be essential to more faithfully represent realistic transmission processes and to support timely, targeted interventions such as prioritized vaccination and contact tracing.

Resource availability

Lead contact

Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Le Song (songle@sdnu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

•
This study analyzes real-world network datasets (Zachary’s karate club, the dolphin social network, the American college football network, and the political books network), accessible at https://websites.umich.edu/~mejn/netdata/.
•
This study analyzes real-world network datasets (the Hypertext2009 network and the Lyonschool network), accessible at https://github.com/HuanWang2022/reconstruct_simplicial_complex.
•
Our source code is available on GitHub (https://github.com/Auzninee/NIIFM-Network-Inference-with-Infection-Frequency-Matrix).
•
Any additional information required to reanalyze the data reported in this study is available from the lead contact upon request.

Acknowledgments

We thank the editors for their efforts and efficient handling of the manuscript, and the anonymous reviewers for their helpful comments and advice. This research was supported in part by the National Natural Science Foundation of China (nos. 72171136 and 72304101), the Guangdong Basic and Applied Basic Research Foundation (no. 2024A1515011559), the Guangdong Provincial Philosophy and Social Science Planning Project (no. GD24YGL22), and the Shandong Provincial Natural Science Foundation (ZR2025MS1152).

Author contributions

Conceptualization, X.J. and Y.M.; methodology, X.J., Y.M., and L.S.; investigation, X.J., Y.M., L.S., and H.Z.; visualization, X.J. and R.W.; formal analysis, X.J. and R.W.; writing – original draft, X.J. and Y.M.; writing – review and editing, X.J. and R.W.; funding acquisition, Y.M. and L.S.; resources, Y.M. and L.S.; supervision, Y.M. and L.S. All co-authors have read and approved the final version of the manuscript.

Declaration of interests

The authors declare no competing interests.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT (based on GPT-5.2) for improving grammar. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

STAR★Methods

Key resources table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

Zachary’s karate club	W. W. Zachary⁴⁶	https://websites.umich.edu/~mejn/netdata/
Dolphin social network	D. Lusseau et al.⁴⁷	https://websites.umich.edu/~mejn/netdata/
American College football network	M. Girvan and M. E. J. Newman⁴⁹	https://websites.umich.edu/~mejn/netdata/
Political books network	V. Krebs	https://websites.umich.edu/~mejn/netdata/
Hypertext2009 network	Isella, L. et al.⁵⁰	https://github.com/HuanWang2022/reconstruct_simplicial_complex
Lyonschool network	Stehlé, J. et al.⁵²	https://github.com/HuanWang2022/reconstruct_simplicial_complex

Software and algorithms

Python version 3.12.8	Python Software Foundation	https://www.python.org
NIIFM	This study	https://github.com/Auzninee/NIIFM-Network-Inference-with-Infection-Frequency-Matrix
EM method	Wang et al.³²	https://github.com/HuanWang2022/reconstruct_simplicial_complex
Bayesian network inference framework (BR, BS, PR, PS)	Young et al.¹⁴	https://github.com/jg-you/noisy-networks-measurements


Experimental model and study participant details

Omitted as our study does not involve biological models.

Method details

Network inference with infection frequency matrix model

The aim of network inference is to calculate the probability of edge formation between any two nodes. Binary-state time series of infection between any two nodes implies the potential existence of an edge between them. Therefore, infection frequency constitutes the quantitative characterization of whether connections exist between any pair of nodes, serving as a statistical measure for inferring network structure. In this section, the process of obtaining the infection frequency matrix from binary-state time series is first introduced, followed by an elaboration on constructing the likelihood function, and finally an explanation of how to infer network structure through the Bayesian method.

Extract infection frequency matrix from binary-state time series

Before presenting the detailed derivation, all necessary notations and their corresponding definitions are listed in Table 1.

Consider a setting with n nodes in an epidemic-affected area observed over w time steps. The epidemic infection status of these nodes can be represented by the matrix S_w×n, where each element indicates the infection state (0 or 1) of a node at a specific time. The goal is to infer the social network matrix A_n×n that represents relationships between these n nodes based on S_w×n.

Based on the SIS model for complex networks,⁵⁵ epidemic spread is assumed to follow a Markovian process. Because of this Markovian nature, if a node i becomes infected at time t+1, it must have been infected by nodes that were in the infectious state at time t, and at least one of those infectious nodes is a neighbor of node i. Hence, to determine whether an edge exists between any pair of nodes i and j, only two key frequencies need to be examined: how often node j is in the infectious state immediately before node i becomes infected, and how often node i is in the infectious state immediately before node j becomes infected. These frequencies serve as statistical indicators for the existence of an edge between the two nodes. After completing the frequency calculation for each pair of nodes, the IF matrix M is obtained, where each element M_ij counts how many times node i becomes infected immediately after node j was infectious (a potential infection of i by j), i.e., $M_{i j} = \sum_{t = 1}^{w - 1} I (S_{i}^{t} = 0) \cdot I (S_{i}^{t + 1} = 1) \cdot I (S_{j}^{t} = 1)$, where I(·) is the indicator function. After obtaining the IF matrix M, the adjacency matrix A and the related propagation parameter θ can be inferred from the data in M.

The process of converting binary-state time series into an IF matrix is demonstrated through the example shown in Figure 2. Algorithm 1 illustrates the detailed steps of the proposed approach.
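The counting rule for M_ij can be sketched in a few lines of Python. This is a minimal illustration of the indicator-sum definition, not the authors' `transform_m.py`: the function name `infection_frequency_matrix` and the toy series are hypothetical, and S is assumed to be a list of w rows with n binary entries each.

```python
def infection_frequency_matrix(S):
    """Count, for each ordered pair (i, j), the steps where i turns 0 -> 1 while j is infectious."""
    w, n = len(S), len(S[0])
    M = [[0] * n for _ in range(n)]
    for t in range(w - 1):
        infectious = [j for j in range(n) if S[t][j] == 1]
        for i in range(n):
            if S[t][i] == 0 and S[t + 1][i] == 1:  # node i becomes infected
                for j in infectious:
                    M[i][j] += 1
    return M


# Toy series: 4 time steps (rows) by 3 nodes (columns).
S = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
M = infection_frequency_matrix(S)
print(M)
```

For this toy series the result is `[[0, 1, 1], [1, 0, 0], [1, 1, 0]]`; the diagonal is always zero because a node cannot be susceptible and infectious at the same step.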

Theoretical analysis of the likelihood function based on the IF matrix

After obtaining the infection frequency M_ij between each pair of nodes, it is necessary to infer the adjacency matrix A, which represents the possible links. The crucial step in inferring A is constructing the likelihood function, as it determines the precision of the description of the relationship between known data and parameters.

With the likelihood function established, the structure of the adjacency matrix can be inferred using traditional statistical methods. Because both A and the other parameters of the epidemic propagation process must be estimated, the parameter set is partitioned into two components: the adjacency matrix A and the other propagation parameters θ. The Bayesian method is commonly employed for such problems; the detailed process and mathematical derivations are presented in the following sections.

Theorem 1. Let G=(V,E) be a network with adjacency matrix A, and consider epidemic propagation on G according to the SIS model with transmission parameter β∈(0,1). For any two nodes i,j∈V, after w discrete time steps of propagation, the infection frequency from node i to node j follows a Poisson distribution with mean μ_ij, i.e., M_ij∼Poisson(μ_ij), where

μ_{i j} = w Z_{i} (Z_{j} + β A_{i j})

(Equation 2)

with $Z_{i} = 1 - {(1 - β)}^{k_{i} c}$, where k_i is the degree of node i and c is the proportion of nodes in the infectious state at steady state.

Proof. For the infection frequency M_ij between nodes i and j, the epidemic transmission process can be modeled as a series of Bernoulli trials.⁵⁶ Each potential transmission event represents a single trial, so under repeated epidemic spread the cumulative count M_ij follows a binomial distribution. If node j were to remain in the infected state throughout the entire period, the number of successful transmissions to node i within w time steps would follow a binomial distribution with parameter β. However, in the SIS model, node j cannot maintain a constant infected state; rather, it is infectious only during specific intervals within the overall time frame. Consequently, the actual transmission probability is lower than β. Let P_w denote the transmission probability from node j at each discrete time step. Within w time steps, the number of successful transmissions from node j to node i then follows a binomial distribution with parameter P_w, which can be expressed as

M_{i j} \sim Binomial (w, P_{w}) .

(Equation 3)

Then the probability of M_ij=k can be formulated by

P (M_{i j} = k) = \binom{w}{k} P_{w}^{k} {(1 - P_{w})}^{w - k},

(Equation 4)

when w is large enough and w P_w tends to a constant λ, it follows from Poisson’s theorem that M_ij follows a Poisson distribution,

\lim_{w \to \infty} \binom{w}{k} P_{w}^{k} {(1 - P_{w})}^{w - k} = \frac{λ^{k}}{k!} e^{- λ} .

(Equation 5)
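A quick numerical check of this limit can make the approximation concrete. The sketch below is illustrative only: the helper names and the value λ = 3 are assumptions chosen for the example, and P_w is set to λ/w so that w P_w stays fixed as w grows.

```python
import math


def binomial_pmf(k, w, p):
    """P(M = k) for M ~ Binomial(w, p), as in Equation 4."""
    return math.comb(w, k) * p**k * (1 - p) ** (w - k)


def poisson_pmf(k, lam):
    """Limiting Poisson probability from Equation 5."""
    return lam**k / math.factorial(k) * math.exp(-lam)


def max_gap(w, lam, k_max=10):
    """Largest pointwise difference between the two pmfs for k = 0..k_max."""
    return max(abs(binomial_pmf(k, w, lam / w) - poisson_pmf(k, lam)) for k in range(k_max + 1))


lam = 3.0  # illustrative value of w * P_w, held fixed while w grows
for w in (10, 100, 10_000):
    print(w, max_gap(w, lam))
```

The printed gap shrinks as w increases, which is the behavior the Poisson approximation relies on for long time series.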

Consequently, for a propagation process of known length w, M_ij can be modeled as a Poisson random variable with parameter μ_ij, which can be formulated explicitly. The proportion of infected nodes at steady state is assumed to be c. This assumption is reasonable given that the epidemic propagation processes examined in this study are sufficiently long-term, and epidemic models with fixed network structures typically converge to equilibrium within 50 time steps. Since simulations are conducted for a minimum of 1,000 time steps, transient behaviors before reaching steady state can be neglected. For any node i with degree k_i, the probability of infection from its neighbors is expressed as:

P (i was infected by its neighbors) = 1 - {(1 - β)}^{k_{i} c} .

(Equation 6)

The expected number of infections of node i over w time steps can be expressed as

E (S_{i} = 1) = w (1 - {(1 - β)}^{k_{i} c}) .

(Equation 7)

After obtaining the expected number of infections for node i, and assuming that the transmission probability is β, the expected number of times that any neighbor j of node i may be infected by node i is βE(S_i=1), the product of the transmission probability and the expected number of infections of node i:

E (number of infections from i to j) = w β (1 - {(1 - β)}^{k_{i} c}) .

(Equation 8)

However, for any node pair (i,j) not directly connected by an edge, it may still occur that node i is in the infected state at some time t while node j becomes infected at time t+1. Such coincidences occur for every node pair (regardless of whether an edge exists), and their expected number over w time steps is given by:

E (S_{j, t + 1} = 1 | S_{i, t} = 1) = w (1 - {(1 - β)}^{k_{i} c}) (1 - {(1 - β)}^{k_{j} c}) .

(Equation 9)

Equation 8 describes an additional increment generated only when nodes i and j are neighbors, while Equation 9 represents the contribution generated regardless of whether nodes i and j are adjacent. Together, these two components determine the mean of the Poisson distribution that M_ij follows. For simplicity, $1 - {(1 - β)}^{k_{i} c}$ is denoted by Z_i. The mean μ_ij of the random variable M_ij is given by:

μ_{i j} = w Z_{i} (Z_{j} + β A_{i j}) .

(Equation 10)

Since M_ij∼Poisson(μ_ij), the distribution of M_ij is given by:

P (M_{i j} ∣ μ_{i j}) = \frac{{μ_{i j}}^{M_{i j}}}{M_{i j}!} e^{- μ_{i j}} .

(Equation 11)

Given the adjacency matrix A and the parameter set {θ}, the probability of the number of times node i infects node j can be obtained. The likelihood function for the entire infection frequency matrix M then follows as the product over all node pairs:

P (M ∣ A, θ) = \prod_{i, j} \frac{{[w Z_{i} (Z_{j} + β A_{i j})]}^{M_{i j}}}{M_{i j}!} e^{- w Z_{i} (Z_{j} + β A_{i j})} .

(Equation 12)
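The Poisson mean of Equation 10 and the likelihood of Equation 12 translate directly into code. The sketch below is a hedged illustration, not the authors' Stan model: the function names are hypothetical, degrees are passed in explicitly, every node is assumed to have positive degree (so that Z_i > 0), and the sum skips the diagonal i = j, which is an assumption made here because self-pairs carry no edge information.

```python
import math


def poisson_mean(A, degrees, w, beta, c):
    """mu_ij = w * Z_i * (Z_j + beta * A_ij), with Z_i = 1 - (1 - beta) ** (k_i * c)."""
    n = len(A)
    Z = [1 - (1 - beta) ** (k * c) for k in degrees]
    return [[w * Z[i] * (Z[j] + beta * A[i][j]) for j in range(n)] for i in range(n)]


def log_likelihood(M, A, degrees, w, beta, c):
    """log P(M | A, theta) from Equation 12, summed over ordered pairs i != j."""
    mu = poisson_mean(A, degrees, w, beta, c)
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # log of mu^M / M! * exp(-mu); lgamma(M + 1) = log(M!)
                total += M[i][j] * math.log(mu[i][j]) - mu[i][j] - math.lgamma(M[i][j] + 1)
    return total


# Toy path graph 0-1-2 with illustrative parameter values.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
degrees = [1, 2, 1]
mu = poisson_mean(A, degrees, w=1000, beta=0.2, c=0.5)
M = [[round(x) for x in row] for row in mu]  # counts placed at their expected values
empty = [[0] * 3 for _ in range(3)]
print(log_likelihood(M, A, degrees, 1000, 0.2, 0.5))
print(log_likelihood(M, empty, degrees, 1000, 0.2, 0.5))
```

With counts placed at their expected values, the true adjacency matrix scores a higher log-likelihood than the empty graph, because connected pairs carry the extra β A_ij contribution to the mean.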

Inferring network structure by Bayesian methods

In Theorem 1, it is established that the infection frequency between any two nodes is related to {θ} and the adjacency matrix. Since this involves conditional probability, the aim is to estimate the edge connection probability between any two nodes using the Bayesian method. Moreover, because the Bayesian method treats parameters as random variables, both the edge connection probability and parameter values can be represented through posterior distributions, thereby accomplishing the dual objectives of network inference and parameter estimation.

Theorem 2. When the infection frequency matrix M and parameter set {θ} are known, the probability distribution of the adjacency matrix A can be obtained using the Bayesian method

P (A ∣ M, θ) = \frac{P (M ∣ A, θ) P (A ∣ θ)}{P (M ∣ θ)} = \frac{\prod_{i, j} \frac{1}{M_{i j}!} e^{- w Z_{i} Z_{j}} {(w Z_{i} Z_{j})}^{M_{i j}} {(1 - ρ)}^{1 - A_{i j}} {[ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}]}^{A_{i j}}}{e^{- c} \prod_{i, j} \frac{1}{M_{i j}!} e^{- w Z_{i} Z_{j}} {(w Z_{i} Z_{j})}^{M_{i j}} [1 - ρ + ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}]} .

(Equation 13)

Proof. Equation 12 provides the probability of observing the IF matrix M, conditional on the adjacency matrix A and parameter θ. However, the objective is to determine the probability of the adjacency matrix A and parameter θ given the observed matrix M. Using Bayes’ theorem,

P (A, θ ∣ M) = \frac{P (M ∣ A, θ) P (A ∣ θ) P (θ)}{P (M)} .

(Equation 14)

The posterior probability on the left side of Equation 14 represents the probability of the adjacency matrix A and the parameter θ given the observed IF matrix M. The denominator on the right side is a normalization constant that does not depend on A or θ, so it can be ignored during optimization. Having established the first term of the numerator above, the second and third terms are analyzed below. P(A|θ) represents the prior probability of the network structure. Assuming that, without any prior knowledge about the network structure, every pair of nodes is connected with equal probability ρ, the prior probability of the entire network is

P (A ∣ θ) = \prod_{i j} {(1 - ρ)}^{1 - A_{i j}} ρ^{A_{i j}} .

(Equation 15)

Then, assuming that w, β, k and ρ all have uniform priors, substituting P(M|A,θ) and P(A|θ) into Equation 14 yields

P (A, θ ∣ M) \propto P (θ) \prod_{i j} {(1 - ρ)}^{1 - A_{i j}} ρ^{A_{i j}} \frac{{μ_{i j}}^{M_{i j}}}{M_{i j}!} e^{- μ_{i j}},

(Equation 16)

By Equation 16, P(θ|M) is obtained by summing over all possible adjacency matrices A,

P (θ ∣ M) = \sum_{A} P (A, θ ∣ M) .

(Equation 17)

To evaluate P(θ|M), the factors in Equation 16 can be divided into two parts, x_ij and y_ij: the part with exponent A_ij is represented by x_ij, and the part with exponent 1-A_ij is represented by y_ij,

P (θ | M) = \sum_{A} P (A, θ | M) = \sum_{A} \prod_{i j} x_{i j}^{A_{i j}} y_{i j}^{1 - A_{i j}} = \prod_{i j} (x_{i j} + y_{i j}) .

(Equation 18)
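The last equality in Equation 18 holds because each A_ij takes the values 0 and 1 independently, so the sum over all configurations factorizes pair by pair. The brute-force check below is a small sketch with hypothetical weights x and y for three node pairs; it enumerates all 2³ configurations and compares the result with the factorized product.

```python
import itertools
import math

# Hypothetical positive weights for three node pairs.
x = [0.3, 1.2, 0.5]  # factor picked when A_ij = 1
y = [0.7, 0.4, 2.0]  # factor picked when A_ij = 0

# Left side: explicit sum over all 2**3 edge configurations.
brute_force = sum(
    math.prod(x[p] if a[p] else y[p] for p in range(len(x)))
    for a in itertools.product([0, 1], repeat=len(x))
)
# Right side: the factorized form of Equation 18.
factorized = math.prod(xp + yp for xp, yp in zip(x, y))
print(brute_force, factorized)
```

This factorization is what reduces a sum over exponentially many adjacency matrices to a product over node pairs.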

Subsequently, Equation 16 can be expressed as

P (A, θ ∣ M) \propto P (θ) \prod_{i j} {(1 - ρ)}^{1 - A_{i j}} ρ^{A_{i j}} {[w Z_{i} (Z_{j} + β A_{i j})]}^{M_{i j}} e^{- w Z_{i} (Z_{j} + β A_{i j})} .

(Equation 19)

In Equation 19, if A_ij=0, the right side of Equation 19 becomes

P (θ) \prod_{i j} (1 - ρ) {(w Z_{i} Z_{j})}^{M_{i j}} e^{- w Z_{i} Z_{j}},

(Equation 20)

when A_ij=1, the right side of Equation 19 becomes

P (θ) \prod_{i j} ρ {[w Z_{i} (Z_{j} + β)]}^{M_{i j}} e^{- w Z_{i} (Z_{j} + β)} .

(Equation 21)

Combining Equations 20 and 21, the result of P(θ|M) can be obtained

P (θ ∣ M) \propto P (θ) \prod_{i j} e^{- w Z_{i} Z_{j}} {(w Z_{i} Z_{j})}^{M_{i j}} [1 - ρ + ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}] .

(Equation 22)

According to Equation 22, the value of parameter θ can be estimated by Hamiltonian Monte Carlo (HMC) simulation. After the estimated value of θ is obtained, the structure of A can be estimated

P (A ∣ M, θ) = \frac{P (M ∣ A, θ) P (A ∣ θ) P (θ)}{P (θ ∣ M) P (M)} .

(Equation 23)

By substituting Equations 12, 15, and 22, P(A|M,θ) is formulated as

P (A ∣ M, θ) = \frac{\prod_{i, j} \frac{1}{M_{i j}!} e^{- w Z_{i} Z_{j}} {(w Z_{i} Z_{j})}^{M_{i j}} {(1 - ρ)}^{1 - A_{i j}} {[ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}]}^{A_{i j}}}{e^{- c} \prod_{i j} \frac{1}{M_{i j}!} e^{- w Z_{i} Z_{j}} {(w Z_{i} Z_{j})}^{M_{i j}} [1 - ρ + ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}]} .

(Equation 24)

The element P_ij of the matrix P is the probability of an edge between nodes i and j; setting A_ij=1 in Equation 24 gives

P_{i j} = P (A_{i j} = 1 ∣ M, θ) = \frac{ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}}{1 - ρ + ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}} .

(Equation 25)

Then, with $P = {(P_{i j})}_{n \times n} = {(P (A_{i j} = 1 | M, θ))}_{n \times n}$, the structure matrix A can be calculated.
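Equation 25 has the form of a logistic function of the log-odds log ρ − log(1 − ρ) + M_ij log(1 + β/Z_j) − w Z_i β, which suggests a numerically safe way to evaluate it. The sketch below is one possible implementation under that rewriting, not the authors' code: the function name `edge_probability` and the parameter values in the example are hypothetical.

```python
import math


def edge_probability(M_ij, Z_i, Z_j, w, beta, rho):
    """Equation 25 written as a logistic function of the log-odds."""
    log_odds = (
        math.log(rho) - math.log(1 - rho)
        + M_ij * math.log1p(beta / Z_j)
        - w * Z_i * beta
    )
    # Stable sigmoid: only non-positive arguments are exponentiated.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


# Illustrative values: Z_i = Z_j = 0.2, w = 1000, beta = 0.2, rho = 0.1.
# The expected count is w*Z_i*Z_j = 40 without an edge and w*Z_i*(Z_j + beta) = 80 with one.
print(edge_probability(80, 0.2, 0.2, 1000, 0.2, 0.1))
print(edge_probability(40, 0.2, 0.2, 1000, 0.2, 0.1))
```

In this example a count near the edge-present mean yields a probability close to 1, and a count near the edge-absent mean yields a probability close to 0.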

Hamiltonian Monte Carlo sampling

To calculate the edge probability between each pair of nodes using Equation 25, the parameter values of θ must be determined. Consequently, the HMC method is employed to sample θ values. The sampling procedure is implemented using the cmdstanpy package, which provides a Python interface for Stan’s probabilistic programming framework.

To facilitate HMC sampling, a logarithmic transformation is applied to both sides of Equation 22, which results in

\log P (θ ∣ M) = \log \frac{P (θ)}{P (M)} + \sum_{i j} [\log e^{- w Z_{i} Z_{j}} + \log {(w Z_{i} Z_{j})}^{M_{i j}} + \log (1 - ρ + ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β})] .

(Equation 26)

As P(θ) is constant, Equation 26 reduces to

\log P (θ ∣ M) = - C + \sum_{i j} (X_{i j} + Y_{i j}),

(Equation 27)

where

X_{i j} = M_{i j} \log (w Z_{i} Z_{j}) - w Z_{i} Z_{j};

(Equation 28)

Y_{i j} = \log (1 - ρ + ρ {(1 + \frac{β}{Z_{j}})}^{M_{i j}} e^{- w Z_{i} β}) .

(Equation 29)

To maintain computational stability and prevent underflow issues, Y_ij is evaluated in a log-sum-exp form by defining

ξ_{i j} = \log (1 - ρ); ν_{i j} = \log ρ + M_{i j} \log (1 + \frac{β}{Z_{j}}) - w Z_{i} β,

(Equation 30)

and then writing

Y_{i j} = \begin{cases} ξ_{i j} + \log (1 + e^{ν_{i j} - ξ_{i j}}) & \text{if } ξ_{i j} > ν_{i j}, \\ ν_{i j} + \log (1 + e^{ξ_{i j} - ν_{i j}}) & \text{otherwise} \end{cases} .

(Equation 31)
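The two branches of Equation 31 are the standard log-sum-exp trick: the larger of ξ_ij and ν_ij is factored out so that only a non-positive number is ever exponentiated. The sketch below compares this form with a direct evaluation of Equation 29; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math


def stable_Y(M_ij, Z_i, Z_j, w, beta, rho):
    """Y_ij of Equation 29, evaluated through the two branches of Equation 31."""
    xi = math.log(1 - rho)
    nu = math.log(rho) + M_ij * math.log1p(beta / Z_j) - w * Z_i * beta
    hi, lo = (xi, nu) if xi > nu else (nu, xi)
    return hi + math.log1p(math.exp(lo - hi))  # exponent is always <= 0


def naive_Y(M_ij, Z_i, Z_j, w, beta, rho):
    """Direct evaluation of Equation 29; the power term can overflow for large M_ij."""
    return math.log(1 - rho + rho * (1 + beta / Z_j) ** M_ij * math.exp(-w * Z_i * beta))


# Moderate counts: both forms agree.
print(stable_Y(80, 0.2, 0.2, 1000, 0.2, 0.1), naive_Y(80, 0.2, 0.2, 1000, 0.2, 0.1))
# Large counts and long series: the stable form still returns a finite value.
print(stable_Y(5000, 0.2, 0.2, 100_000, 0.2, 0.1))
```

The stable form matters inside an HMC sampler because the log-density is evaluated many times at parameter values far from the posterior mode.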

Subsequently, HMC simulation can be conducted based on X_ij and Y_ij. To mitigate the potential risk of converging to local optima, multiple independent simulations are usually conducted.

Quantification and statistical analysis

All computations were performed in Python (3.12.8). We considered both real-world and synthetic networks. For real-world networks, the ground-truth adjacency matrix was used for evaluation. For synthetic networks, we generated ER, WS, and BA networks with n nodes and average degree k using NetworkX. We then simulated an SIS spreading process for w time steps on each network to obtain a binary infection-state time series S_w×n. The time series was transformed into an IF matrix M_n×n using the script transform_m.py, which served as the input for network inference.
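The data-generation step can be sketched as a discrete-time SIS simulation. This is a minimal, dependency-free illustration under stated assumptions, not the authors' simulation code: the function name `simulate_sis`, the recovery probability `delta`, the initial infected fraction, and the hand-built ring network (used here in place of a NetworkX generator) are all hypothetical choices made for the example.

```python
import random


def simulate_sis(neighbors, w, beta, delta, seed=0, initial_fraction=0.2):
    """Return a w x n list of 0/1 states from a discrete-time SIS process."""
    rng = random.Random(seed)
    n = len(neighbors)
    state = [1 if rng.random() < initial_fraction else 0 for _ in range(n)]
    S = [state]
    for _ in range(w - 1):
        nxt = []
        for i in range(n):
            if state[i] == 1:
                # An infectious node recovers with probability delta.
                nxt.append(0 if rng.random() < delta else 1)
            else:
                # Each infectious neighbor transmits independently with probability beta.
                m = sum(state[j] for j in neighbors[i])
                nxt.append(1 if rng.random() < 1 - (1 - beta) ** m else 0)
        state = nxt
        S.append(state)
    return S


# Ring of 10 nodes, each linked to its two nearest neighbors.
n = 10
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
S = simulate_sis(neighbors, w=200, beta=0.3, delta=0.2)
print(len(S), len(S[0]))
```

The resulting list plays the role of S_w×n and can be passed to an IF-matrix routine; in a small network the epidemic may die out, so a practical simulation would need a rule for reseeding infections.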

For network reconstruction, we implemented a Bayesian model in Stan (model.stan) and performed inference via CmdStanPy, a Python interface to Stan. Posterior samples of model parameters were used to compute the posterior probability (or an equivalent decision statistic) of edge existence for each node pair, yielding an inferred adjacency matrix. Finally, we evaluated inference performance by comparing against the ground truth A using F1-score, AUROC, and AUPR.

Published: March 24, 2026

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2026.115462.

Supplemental information

Document S1. Figures S1–S3


References

1.Li B., Saad D. Infection-induced cascading failures-impact and mitigation. Commun. Phys. 2024;7:144. [Google Scholar]
2.Hou Y., Lu Y., Dong Y., Jin L., Shi L. Impact of different social attitudes on epidemic spreading in activity-driven networks. Appl. Math. Comput. 2023;446 [Google Scholar]
3.Hou C., Chen J., Zhou Y., Hua L., Yuan J., He S., Guo Y., Zhang S., Jia Q., Zhao C., et al. The effectiveness of quarantine of Wuhan city against the Corona Virus Disease 2019 (COVID-19): A well-mixed SEIR model analysis. J. Med. Virol. 2020;92:841–848. doi: 10.1002/jmv.25827. [DOI] [PubMed] [Google Scholar]
4.Berger D.W., Herkenhoff K.F., Mongey S. National Bureau of Economic Research; 2020. An SEIR Infectious Disease Model with Testing and Conditional Quarantine. [Google Scholar]
5.Abdalla S.J.M., Govinder K.S., Chirove F. The impact of geographically-targeted vaccinations during the 2018-2020 Kivu Ebola outbreak. Appl. Math. Model. 2025;142 [Google Scholar]
6.Premeaux T.A., Bowler S., Friday C.M., Moser C.B., Hoenigl M., Lederman M.M., Landay A.L., Gianella S., Ndhlovu L.C. Machine learning models based on fluid immunoproteins that predict non-AIDS adverse events in people with HIV. iScience. 2024;27 doi: 10.1016/j.isci.2024.109945. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Bonaccorsi G., Pierri F., Cinelli M., Flori A., Galeazzi A., Porcelli F., Schmidt A.L., Valensise C.M., Scala A., Quattrociocchi W., Pammolli F. Economic and social consequences of human mobility restrictions under COVID-19. Proc. Natl. Acad. Sci. USA. 2020;117:15530–15535. doi: 10.1073/pnas.2007658117. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Dash T.K., Chakraborty C., Mahapatra S., Panda G. Mitigating information interruptions by COVID-19 face masks: a three-stage speech enhancement scheme. IEEE Trans. Comput. Soc. Syst. 2024;11:4790–4799. [Google Scholar]
9.Jones C., Philippon T., Venkateswaran V. Optimal mitigation policies in a pandemic: Social distancing and working from home. Rev. Financ. Stud. 2021;34:5188–5223. [Google Scholar]
10.Li X., Yang J.-X., Wang H.-Y., Tan Y. Controlling the spread of infectious diseases by using random walk method to remove many important links. Commun. Nonlinear Sci. Numer. Simulat. 2024;128 [Google Scholar]
11. Huang H., Chen Y., Yan Z. Impacts of social distancing on the spread of infectious diseases with asymptomatic infection: a mathematical model. Appl. Math. Comput. 2021;398. doi: 10.1016/j.amc.2021.125983.
12. Zhang Z., Wang X., Li H., Chen Y., Qu Z., Mi Y., Hu G. Uncovering hidden nodes and hidden links in complex dynamic networks. Sci. China Phys. Mech. Astron. 2024;67.
13. Demirer M., Diebold F.X., Liu L., Yilmaz K. Estimating global bank network connectedness. J. Appl. Econ. 2018;33:1–15.
14. Young J.-G., Cantwell G.T., Newman M.E.J. Bayesian inference of network structure from unreliable data. Journal of Complex Networks. 2021;8.
15. Young J.-G., Valdovinos F.S., Newman M.E.J. Reconstruction of plant-pollinator networks from observational data. Nat. Commun. 2021;12:3911. doi: 10.1038/s41467-021-24149-x.
16. Brożek A., Ceccarelli A., Jørgensen A.C.S., Hintze M., Shahrezaei V., Barkoulas M. Inference of a three-gene network underpinning epidermal stem cell development in Caenorhabditis elegans. iScience. 2025;28. doi: 10.1016/j.isci.2025.111826.
17. Shi N., Zhu Z., Tang K., Parker D., He S. ATEN: And/Or tree ensemble for inferring accurate Boolean network topology and dynamics. Bioinformatics. 2020;36:578–585. doi: 10.1093/bioinformatics/btz563.
18. Barman S., Kwon Y.-K. A neuro-evolution approach to infer a Boolean network from time-series gene expressions. Bioinformatics. 2020;36:i762–i769. doi: 10.1093/bioinformatics/btaa840.
19. Gomez-Rodriguez M., Leskovec J., Krause A. Inferring networks of diffusion and influence. ACM Trans. Knowl. Discov. Data. 2012;5:1–37.
20. Tan Q., Liu Y., Liu J. Motif-aware diffusion network inference. Int. J. Data Sci. Anal. 2020;9:375–387.
21. Gray C., Mitchell L., Roughan M. Bayesian inference of network structure from information cascades. IEEE Trans. Signal Inf. Process. Netw. 2020;6:371–381.
22. Braunstein A., Ingrosso A., Muntoni A.P. Network reconstruction from infection cascades. J. R. Soc. Interface. 2019;16. doi: 10.1098/rsif.2018.0844.
23. Wang Y., Wang H., Gao C., Fan K., Cheng H., Shen Z., Wang Z., Perc M. Learning influence probabilities in diffusion networks without timestamps. Appl. Math. Comput. 2025;503.
24. Gao C., Wang Y., Wang Z., Li X., Li X. Proceedings of the ACM Web Conference 2023. 2023. Pairwise-interactions-based Bayesian Inference of Network Structure from Information Cascades; pp. 102–110.
25. Huang K., Gao R., Cautis B., Xiao X. Proceedings of the ACM Web Conference 2024. 2024. Scalable Continuous-Time Diffusion Framework for Network Inference and Influence Estimation; pp. 2660–2671.
26. Pinto P.C., Thiran P., Vetterli M. Locating the source of diffusion in large-scale networks. Phys. Rev. Lett. 2012;109. doi: 10.1103/PhysRevLett.109.068702.
27. Shen Z., Wang W.-X., Fan Y., Di Z., Lai Y.-C. Reconstructing propagation networks with natural diversity and identifying hidden sources. Nat. Commun. 2014;5:4323. doi: 10.1038/ncomms5323.
28. Li J., Shen Z., Wang W.-X., Grebogi C., Lai Y.-C. Universal data-based method for reconstructing complex networks with binary-state dynamics. Phys. Rev. E. 2017;95. doi: 10.1103/PhysRevE.95.032303.
29. Shi L., Shen C., Jin L., Shi Q., Wang Z., Boccaletti S. Inferring network structures via signal Lasso. Phys. Rev. Res. 2021;3.
30. Ma C., Chen H.-S., Lai Y.-C., Zhang H.-F. Statistical inference approach to structural reconstruction of complex networks from binary time series. Phys. Rev. E. 2018;97. doi: 10.1103/PhysRevE.97.022301.
31. Zhang H.-F., Xu F., Bao Z.-K., Ma C. Reconstructing of networks with binary-state dynamics via generalized statistical inference. IEEE Trans. Circ. Syst. I. 2019;66:1608–1619.
32. Wang H., Ma C., Chen H.-S., Lai Y.-C., Zhang H.-F. Full reconstruction of simplicial complexes from binary contagion and Ising data. Nat. Commun. 2022;13:3043. doi: 10.1038/s41467-022-30706-9.
33. Peixoto T.P. Network reconstruction and community detection from dynamics. Phys. Rev. Lett. 2019;123. doi: 10.1103/PhysRevLett.123.128301.
34. Beaufort L.-B., Massé P.-Y., Reboulet A., Oudre L. Network reconstruction problem for an epidemic reaction–diffusion system. Journal of Complex Networks. 2022;10.
35. Chen M., Zhang Y., Zhang Z., Du L., Wang S., Zhang J. Inferring network structure with unobservable nodes from time series data. Chaos. 2022;32. doi: 10.1063/5.0076521.
36. Hanley J.A., McNeil B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
37. He H., Garcia E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009;21:1263–1284.
38. Kitsak M., Gallos L.K., Havlin S., Liljeros F., Muchnik L., Stanley H.E., Makse H.A. Identification of influential spreaders in complex networks. Nat. Phys. 2010;6:888–893.
39. Jain L., Katarya R., Sachdeva S. Opinion leader detection using whale optimization algorithm in online social network. Expert Syst. Appl. 2020;142.
40. Zhang X., Wang Z., Liu G., Wang Y. Key node identification in social networks based on topological potential model. Comput. Commun. 2024;213:158–168.
41. Freeman L.C. Centrality in social networks conceptual clarification. Soc. Netw. 1978;1:215–239.
42. Sabidussi G. The centrality index of a graph. Psychometrika. 1966;31:581–603. doi: 10.1007/BF02289527.
43. Newman M. Mathematics of Networks. The New Palgrave Dictionary of Economics. 2008;1:8.
44. Qu J., Tang M., Liu Y., Guan S. Identifying influential spreaders in reversible process. Chaos Solitons Fractals. 2020;140.
45. Liu K., Lü X., Gao F., Zhang J. Expectation-maximizing network reconstruction and most applicable network types based on binary time series data. Phys. Nonlinear Phenom. 2023;454.
46. Zachary W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977;33:452–473.
47. Lusseau D., Schneider K., Boisseau O.J., Haase P., Slooten E., Dawson S.M. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait? Behav. Ecol. Sociobiol. 2003;54:396–405.
48. Newman M.E.J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 2006;74. doi: 10.1103/PhysRevE.74.036104.
49. Girvan M., Newman M.E.J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA. 2002;99:7821–7826. doi: 10.1073/pnas.122653799.
50. Isella L., Stehlé J., Barrat A., Cattuto C., Pinton J.-F., Van den Broeck W. What’s in a crowd? Analysis of face-to-face behavioral networks. J. Theor. Biol. 2011;271:166–180. doi: 10.1016/j.jtbi.2010.11.033.
51. Gemmetto V., Barrat A., Cattuto C. Mitigation of infectious disease at school: targeted class closure vs school closure. BMC Infect. Dis. 2014;14:695. doi: 10.1186/s12879-014-0695-9.
52. Stehlé J., Voirin N., Barrat A., Cattuto C., Isella L., Pinton J.-F., Quaggiotto M., Van den Broeck W., Régis C., Lina B., Vanhems P. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS One. 2011;6. doi: 10.1371/journal.pone.0023176.
53. Murphy C., Laurence E., Allard A. Deep learning of contagion dynamics on complex networks. Nat. Commun. 2021;12:4720. doi: 10.1038/s41467-021-24732-2.
54. Ding X., Kong L.-W., Zhang H.-F., Lai Y.-C. Deep-learning reconstruction of complex dynamical networks from incomplete data. Chaos. 2024;34. doi: 10.1063/5.0201557.
55. Dodds P.S., Watts D.J. Universal behavior in a generalized model of contagion. Phys. Rev. Lett. 2004;92. doi: 10.1103/PhysRevLett.92.218701.
56. Pastor-Satorras R., Vespignani A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 2001;86:3200–3203. doi: 10.1103/PhysRevLett.86.3200.

Associated Data

Supplementary Materials

Document S1. Figures S1–S3

mmc1.pdf (259.2 KB, PDF)

Data Availability Statement

• This study analyzes real-world network datasets (Zachary’s karate club, the dolphin social network, the American college football network, and the political books network), accessible at https://websites.umich.edu/~mejn/netdata/.
• This study also analyzes the Hypertext2009 and Lyonschool network datasets, accessible at https://github.com/HuanWang2022/reconstruct_simplicial_complex.
• Our source code will be made available on GitHub (https://github.com/Auzninee/NIIFM-Network-Inference-with-Infection-Frequency-Matrix) upon acceptance.
• Any additional information required to reanalyze the data reported in this study is available from the lead contact upon request.

