Abstract
Summary
Network medicine leverages the quantification of information flow within sub-cellular networks to elucidate disease etiology and comorbidity, as well as to predict drug efficacy and identify potential therapeutic targets. However, current Network Medicine toolsets often lack computationally efficient data processing pipelines that support diverse scoring functions, network distance metrics, and null models. These limitations hamper their application in large-scale molecular screening, hypothesis testing, and ensemble modeling. To address these challenges, we introduce NetMedPy, a highly efficient and versatile computational package designed for comprehensive Network Medicine analyses.
Availability and implementation
NetMedPy is an open-source Python package under an MIT license. Source code, documentation, and installation instructions can be downloaded from https://github.com/menicgiulia/NetMedPy and https://pypi.org/project/NetMedPy. The package can run on any standard desktop computer or computing cluster.
1 Introduction
Network medicine is a post-genomic discipline that harnesses network science principles to analyze the complex interactions within biological systems, viewing diseases as localized disruptions in networks of genes, proteins, and other molecular entities (Barabási et al. 2011). By integrating comprehensive biological networks, such as the interactome or protein-protein interaction network (PPI), with databases of disease-associated genes (GDA) and ligand-protein interactions, Network Medicine has (i) successfully identified functional pathways linked to specific phenotypes and diseases (Sharma et al. 2015) and (ii) pinpointed potential drug targets, highlighting opportunities for drug repurposing (Cheng et al. 2018, Patten et al. 2022) and effective drug combinations (Cheng et al. 2019). Additionally, this framework has been extended beyond pharmaceuticals to identify food-derived small molecules that impact specific therapeutic areas (do Valle et al. 2021, Nasirian and Menichetti 2023).
The structure of the biological network plays an essential role in the system’s ability to efficiently propagate signals and withstand random failures.
Consequently, most analyses in Network Medicine focus on quantifying the efficiency of communication between different regions of the interactome.
For example, proteins involved in similar therapeutic areas or disease modules are expected to create a cohesive functional subgraph of proteins that communicate and influence each other. In turn, diseases with high pathobiological similarity typically reside in overlapping neighborhoods of the interactome as measured by the separation score (Menche et al. 2015, Supplementary Information I, available as supplementary data at Bioinformatics online).
Similarly, areas of the interactome perturbed by a drug should be close to its protein targets as quantified by the proximity score (Guney et al. 2016, Supplementary Information I, available as supplementary data at Bioinformatics online).
The speed and reliability of signaling are most commonly quantified through shortest-path metrics with expectations set by uniform or degree-preserving null models, highlighting biological properties not solely determined by link density or degree distribution (Supplementary Information II, available as supplementary data at Bioinformatics online). However, biological information does not always travel along geodesic paths, in part because of differences in the flow of information across links. Therefore, a comprehensive assessment of proximity and separation must consider additional metrics of diffusion and communicability. Given the complexity of these combinatorial settings, efficient algorithms are crucial for exhaustive screening of disease atlases and molecular libraries.
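For orientation, the shortest-path forms of these scores are commonly written as below, following Guney et al. (2016) for proximity and Menche et al. (2015) for separation. The notation is an assumption made here for illustration and may differ from that of Supplementary Information I: d(s, t) is the distance between nodes s and t, S is the set of disease genes, T is the set of drug targets, and the mean and standard deviation in the Z-score are estimated from node sets drawn under the chosen null model.

```latex
% Proximity (closest measure) between drug targets T and disease genes S
d_c(S, T) = \frac{1}{|T|} \sum_{t \in T} \min_{s \in S} d(s, t)

% Z-score of the observed proximity against the null distribution
z = \frac{d_c(S, T) - \mu_{d_c}}{\sigma_{d_c}}

% Separation between node sets A and B
s_{AB} = \langle d_{AB} \rangle - \frac{\langle d_{AA} \rangle + \langle d_{BB} \rangle}{2}
```

Here \langle d_{AA} \rangle averages, over the nodes of A, the distance to the closest other node of A, and \langle d_{AB} \rangle averages the closest cross-set distance over all nodes of A and B. Replacing d with a diffusion- or communicability-based distance leaves the form of the scores unchanged.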
Despite the importance of network measures, most Network Medicine packages focus on the curation of the interactome (Helmy et al. 2022, de Carvalho 2023) or the curation of GDAs (de Weerd et al. 2022, Ben Guebila et al. 2023). Existing packages that calculate proximity and separation have not advanced beyond their initial introduction (Wang et al. 2022, Maier et al. 2024). As a result, these tools remain highly inefficient for large-scale screening and rely exclusively on shortest-path metrics and limited sample size for hypothesis testing. Here, we introduce NetMedPy, an intuitive Python package for Network Medicine designed to quantify network localization, calculate proximity and separation between biological entities, and conduct screenings involving a large number of diseases and drugs efficiently. NetMedPy provides users with four default metrics and null models with automated statistical analyses. Optimized for high performance in large-scale studies, NetMedPy enhances the robustness and scalability of Network Medicine research, facilitating the discovery of mechanisms of action and prioritizing hypotheses for experimental validation.
2 NetMedPy
The workflow of NetMedPy, as illustrated in Fig. 1A, involves: (i) loading the interactome, (ii) computing and storing the distance matrix induced by a selected metric, (iii) loading the desired GDAs and drug targets, and (iv) calculating the selected scoring functions (proximity, separation) with the null models of choice. The pipeline output can be further used in downstream analyses. NetMedPy supports weighted and unweighted networks through a Graph object in NetworkX, a widely used library for network analysis. GDAs are entered using a dictionary format, where keys represent disease names and values are lists of associated genes. A similar approach is used for drug targets. The results are then returned in dictionaries, detailing the statistical analysis performed for proximity and separation. For large-scale screening studies, the output is stored in tabular form using Pandas DataFrames.
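The input formats and the distance-matrix step described above can be illustrated with a minimal, dependency-free sketch. It does not use the NetMedPy API: the gene labels, the dictionary names, and the helper functions `bfs_distances` and `all_pairs_distances` are hypothetical, and a plain adjacency mapping stands in for the NetworkX Graph that NetMedPy accepts.

```python
from collections import deque

# Toy unweighted interactome as an adjacency mapping (hypothetical gene labels).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# Gene-disease associations and drug targets: name -> list of genes.
gdas = {"disease_1": ["D", "E"], "disease_2": ["F"]}
drug_targets = {"drug_1": ["A", "B"]}


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def all_pairs_distances(adj):
    """Compute the full shortest-path table once so later scores can look it up."""
    return {node: bfs_distances(adj, node) for node in adj}


distances = all_pairs_distances(adjacency)
print(distances["A"]["E"])  # hop distance between genes A and E
```

Storing the table as a nested mapping keeps every later score a matter of lookups; a weighted network would need Dijkstra's algorithm in place of breadth-first search.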
Figure 1.
Overview and application of NetMedPy. (A) Diagram of the NetMedPy pipeline. Users first load an interaction network, drug targets, and GDAs. NetMedPy calculates the distance matrix induced by the chosen metric for all nodes in the network. Then users set options for subgraph statistics, study type (e.g. proximity, separation), null model, and execution parameters. Visualization and interpretation are performed outside of NetMedPy. (B) Proximity between Vitamin D’s targets and various diseases. A large negative Z-score indicates a statistically significant closeness between Vitamin D and the disease, while Z-scores close to zero are no different from random. (C) AMSPL distribution and proximity Z-scores for Vitamin D to Inflammation and Factor IX Deficiency, comparing Vitamin D’s targets to disease genes (vertical lines) and degree-preserving log-binned null models (density plots). Inflammation shows a significantly smaller AMSPL. (D) Normalized AMSPL using different distance metrics: Shortest Path (blue), Random Walks (green), Biased Random Walks (orange), and Communicability (pink). (E) NetMedPy execution time (red) versus Proximity implementations found in other packages (black; PMC11223884, PMC4740350, PMC9374494) for increasing gene set sizes. Dots represent time measurements, and straight lines indicate quadratic functions fitted to the data. In each experiment, the proximity Z-Score was calculated using one hundred random samples for illustration purposes. All calculations were performed with a 10-core Intel i9-12900H processor and 32 GB of RAM. Created in BioRender. Aldana, A. (2025) https://BioRender.com/kci1sjt
NetMedPy offers a comprehensive suite of metrics, including shortest paths (Menche et al. 2015), random walks (Masuda et al. 2017), biased random walks (Erten et al. 2011), communicability (Estrada and Hatano 2008), and user-defined options (Supplementary Information III, available as supplementary data at Bioinformatics online). This wide range of metrics allows researchers to tailor their analysis to the specific requirements of various biological questions. The ability to define custom metrics further empowers researchers to develop specialized approaches for their unique research needs. By applying ensemble learning techniques, researchers can also combine the strengths of diverse metrics, enhancing the reliability and depth of their conclusions. This integrated approach can help prioritize experimental tests, improving cost-efficiency and reducing the time and effort required for validation.
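As an illustration of a non-geodesic metric, the communicability of Estrada and Hatano (2008) is the matrix exponential of the adjacency matrix, which counts walks of every length between two nodes with a factorial penalty on longer walks. The sketch below approximates it with a truncated Taylor series in pure Python. The function name `communicability` and the truncation at 30 terms are assumptions made for this example, and the conversion of communicability into a distance used by NetMedPy (Supplementary Information III) is not reproduced here.

```python
import math


def communicability(adjacency_matrix, terms=30):
    """Approximate exp(A) = sum_k A^k / k! with a truncated Taylor series."""
    n = len(adjacency_matrix)
    # Running term A^k / k!, starting from the identity matrix (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Multiply the previous term by A and divide by k to obtain A^k / k!.
        term = [
            [sum(term[i][m] * adjacency_matrix[m][j] for m in range(n)) / k
             for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return total


# Three-node path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
G = communicability(A)

# Adjacent nodes (0, 1) communicate more strongly than the end nodes (0, 2).
print(G[0][1] > G[0][2])
```

For networks of realistic size, a numerical library routine for the matrix exponential would be the natural replacement for this series.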
NetMedPy provides primary functions for the analysis of disease modules, proximity, separation, and large-scale screening studies, including:
- Modules: Given an interactome and a set of nodes, A, NetMedPy extracts the largest connected component (LCC) or subgraph formed by set A and calculates the statistical significance of the LCC size (Supplementary Information I.I, available as supplementary data at Bioinformatics online).
- Proximity: The original proximity measure between node sets A and B is asymmetric, meaning that the proximity from A to B generally differs from the proximity from B to A. NetMedPy addresses this property by offering both an asymmetric and a symmetric proximity Z-score (Supplementary Information I.II, available as supplementary data at Bioinformatics online).
- Separation: NetMedPy calculates separation (Supplementary Information I.II) and its statistical significance, expressed by Z-Score and P-value.
- Screening: NetMedPy incorporates a screening function to calculate network measures between sets of diseases and drugs. The function runs in parallel, taking advantage of multi-core processors to reduce computation time.
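The module, proximity, and separation quantities listed above can be sketched on a toy network as follows. This is an illustrative sketch rather than NetMedPy code: all function names are hypothetical, the symmetric proximity shown is one common symmetrization (the cross-set term of Menche et al. 2015) and may differ from the definition in Supplementary Information I.II, and the network is assumed to be connected so that every pairwise distance exists.

```python
from collections import deque

# Toy path graph 1-2-3-4-5-6 (hypothetical; stands in for an interactome).
adjacency = {i: set() for i in range(1, 7)}
for u in range(1, 6):
    adjacency[u].add(u + 1)
    adjacency[u + 1].add(u)


def bfs(adj, source, allowed=None):
    """Hop distances from `source`, optionally restricted to the `allowed` nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist and (allowed is None or nb in allowed):
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


dist = {n: bfs(adjacency, n) for n in adjacency}  # computed once, reused below


def largest_component(adj, nodes):
    """Largest connected component of the subgraph induced by `nodes`."""
    nodes, seen, best = set(nodes), set(), set()
    for n in nodes:
        if n not in seen:
            comp = set(bfs(adj, n, allowed=nodes))
            seen |= comp
            best = max(best, comp, key=len)
    return best


def closest(dist, a, b, exclude_self=False):
    """Sum over nodes in `a` of the distance to the nearest node in `b`."""
    return sum(
        min(dist[x][y] for y in b if not (exclude_self and x == y)) for x in a
    )


def proximity(dist, a, b):
    """Asymmetric score: mean distance from each node in `a` to its closest node in `b`."""
    return closest(dist, a, b) / len(a)


def symmetric_proximity(dist, a, b):
    """Closest distances pooled over both directions (an assumed symmetrization)."""
    return (closest(dist, a, b) + closest(dist, b, a)) / (len(a) + len(b))


def separation(dist, a, b):
    """s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2; needs at least two nodes per set."""
    d_aa = closest(dist, a, a, exclude_self=True) / len(a)
    d_bb = closest(dist, b, b, exclude_self=True) / len(b)
    return symmetric_proximity(dist, a, b) - (d_aa + d_bb) / 2


print(proximity(dist, {1, 2, 3}, {4}), proximity(dist, {4}, {1, 2, 3}))
print(separation(dist, {1, 2}, {5, 6}))
```

The two proximity calls return different values for the same pair of sets, which is the asymmetry that motivates offering a symmetric variant.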
Network Medicine leverages null models that generate random samples as benchmarks. By comparing observed network measures against these null hypotheses, researchers can confidently assert the non-randomness of their findings, thereby substantiating the biological relevance of the observed relationships. NetMedPy enhances the robustness of this statistical analysis by incorporating various null models: Perfect Degree Match, Logarithmic Binning, Strength Binning, Uniform Distribution, and user-provided models (Supplementary Information II, available as supplementary data at Bioinformatics online). Each null model selects random node sets differently, allowing researchers to account for diverse network properties and biases that might influence the analysis.
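A degree-preserving null model with logarithmic binning can be sketched as follows: nodes are grouped into bins of similar degree, and each node of the observed set is replaced by a random node from the same bin. The bin boundaries (powers of two), the function names, the toy degrees, and the use of the population standard deviation in the Z-score are all assumptions made for this example; the binning actually used by NetMedPy is described in Supplementary Information II.

```python
import random
import statistics

# Hypothetical node degrees; in practice these come from the interactome.
degrees = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3, "f": 3, "g": 5, "h": 6}


def degree_bin(degree):
    """Logarithmic bin index: bin k holds degrees in [2**k, 2**(k + 1))."""
    return degree.bit_length() - 1


def build_bins(degrees):
    """Map each bin index to the list of nodes whose degree falls in that bin."""
    bins = {}
    for node, degree in degrees.items():
        bins.setdefault(degree_bin(degree), []).append(node)
    return bins


def sample_matched(node_set, degrees, bins, rng):
    """Draw a random node set with the same size and the same per-bin degree profile."""
    needed = {}
    for node in node_set:
        k = degree_bin(degrees[node])
        needed[k] = needed.get(k, 0) + 1
    sample = []
    for k, count in needed.items():
        sample.extend(rng.sample(bins[k], count))  # without replacement within a bin
    return sample


def z_score(observed, null_values):
    """Standardize an observed score against scores obtained from null samples."""
    mu = statistics.mean(null_values)
    sigma = statistics.pstdev(null_values)  # undefined (zero) if all null values coincide
    return (observed - mu) / sigma


bins = build_bins(degrees)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = sample_matched(["a", "e", "g"], degrees, bins, rng)
print(sample)
```

In a full analysis, the score of interest would be recomputed for many such samples and the observed value standardized with `z_score`; a strongly negative result indicates that the observed sets are closer than degree-matched random sets.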
3 Case study with vitamin D
To showcase NetMedPy, we evaluated the role of Vitamin D in an array of 13 disease phenotypes and endophenotypes, selected based on the strength of experimental evidence supporting Vitamin D as a treatment. The evidence categories are strong support [Inflammation, Asthma, Coronary Artery Disease (CAD), Vitamin D Deficiency, Chronic Obstructive Pulmonary Disease (COPD), Rickets], medium support (Brain Neoplasms, Rett Syndrome), and low support (Prader-Willi Syndrome, Factor VII Deficiency, Beta Thalassemia, Fragile X Syndrome, Factor IX Deficiency). We curated Vitamin D’s drug–target data (Piras et al. 2024) (Supplementary Information IV.II, available as supplementary data at Bioinformatics online) and the GDAs of each therapeutic area with and without experimental evidence of Vitamin D modulation (Supplementary Information IV.III, available as supplementary data at Bioinformatics online). Vitamin D is known to (i) reduce the activity of pro-inflammatory cells, (ii) regulate blood pressure, and (iii) reduce proliferation and boost apoptosis of cancer cells by regulating gene expression via Vitamin D receptors. Leveraging an interactome that integrates the PPIs reported in Luck et al. (2020), Huttlin et al. (2021), and Maron et al. (2021) (Supplementary Information IV.I, available as supplementary data at Bioinformatics online), we calculated the proximity between Vitamin D’s targets and each GDA set (Fig. 1B).
Our findings reveal that the observed average minimum shortest path length (AMSPL) between Vitamin D and inflammation is significantly smaller than expected when considering node sets of the same size and comparable degree (Z-Score = −7.64), confirming that Vitamin D influences inflammatory processes. Conversely, Factor IX Deficiency, a Mendelian disorder, is more distant from Vitamin D’s targets than expected by chance (Z-Score = 1.34), providing a reasonable negative result (Fig. 1B and C). When evaluating the proximity values between Vitamin D and all selected phenotypes, we find that inflammation and related diseases such as asthma show the closest proximity to Vitamin D. This result stands in contrast to diseases with no known association to Vitamin D (e.g. Prader-Willi Syndrome, Factor VII deficiency, Beta Thalassemia, Fragile X Syndrome, Factor IX Deficiency), aligning with existing literature (Fig. 1B). Finally, the AMSPL-equivalents for four different metrics display a robust ranking of the results under different notions of distance (Fig. 1D and Fig. 1, available as supplementary data at Bioinformatics online).
4 NetMedPy performance evaluation and comparison
Quantifying the statistical significance of network measures such as proximity and separation in large networks is computationally intensive, as it necessitates comparing selected node sets with randomly generated ones to obtain Z-scores and empirical P-values (Supplementary Information I–II, available as supplementary data at Bioinformatics online). NetMedPy leverages parallelism and precalculated distances between all pairs of nodes to enhance performance. This optimization allows distances to be computed once and reused multiple times, significantly improving efficiency and facilitating large-scale screening studies. Figure 1E illustrates the execution time of NetMedPy for calculating proximity between random node sets of increasing size. Our findings show that NetMedPy completes this task faster than the standard proximity implementation found in other Network Medicine packages (Patten et al. 2022, Wang et al. 2022, Maier et al. 2024), even accounting for the time required to pre-calculate the distances. Consequently, as the number of disease genes and drug-disease pairs increases, NetMedPy demonstrates a substantial performance improvement.
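The benefit of precalculated distances can be illustrated with a small sketch in which a counter records how many graph traversals are executed. The traversal cost is paid once per node, after which every drug-disease pair is scored through table lookups alone. The ring network, the names `bfs_calls`, `amspl`, `drugs`, and `diseases`, and the sequential loop are hypothetical choices for this example; NetMedPy additionally distributes the screening across cores, which is not shown here.

```python
from collections import deque

N = 8
# Toy ring network 0-1-...-7-0 (hypothetical stand-in for an interactome).
adjacency = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}

bfs_calls = 0  # how many graph traversals were actually executed


def bfs(adj, source):
    """Hop distances from `source`; increments the traversal counter."""
    global bfs_calls
    bfs_calls += 1
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Step 1: pay the all-pairs cost once.
dist = {node: bfs(adjacency, node) for node in adjacency}


def amspl(targets, genes):
    """Average, over targets, of the distance to the closest disease gene (lookup only)."""
    return sum(min(dist[t][g] for g in genes) for t in targets) / len(targets)


drugs = {"drug_1": [0, 1], "drug_2": [4]}
diseases = {"disease_1": [2, 3], "disease_2": [5], "disease_3": [7, 0]}

# Step 2: screen every drug-disease pair using table lookups only.
rows = [
    {"drug": drug, "disease": disease, "amspl": amspl(targets, genes)}
    for drug, targets in drugs.items()
    for disease, genes in diseases.items()
]

print(bfs_calls)  # traversals run: one per node, independent of the number of pairs
```

The same lookups serve every random sample drawn from a null model, which is where recomputing shortest paths for each comparison becomes most costly. The list of row dictionaries can be passed directly to `pandas.DataFrame` to obtain the tabular output described for screening studies.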
5 Discussion
We developed NetMedPy, a user-friendly Python package designed to optimize tools for Network Medicine applications. Tailored for high-performance computing, NetMedPy efficiently handles large-scale data, making it ideal for studies involving drug screening, drug repurposing, and comorbidity identification. The package offers functionalities for extracting the LCC and calculating proximity and separation between node sets, with options for both symmetric and asymmetric measures. Additionally, it supports various null models to validate the statistical significance of network metrics, ensuring robust analytical outcomes.
As in many areas of data science, the quality of predictions in NetMedPy is highly dependent on the input data. Incomplete annotations and erroneous associations can introduce variability in the results. NetMedPy enables efficient robustness analyses under perturbations to the input data (Supplementary Information VI, available as supplementary data at Bioinformatics online), and also facilitates a wide range of studies in case-specific weighted and unweighted networks. These include PPI networks for drug repurposing (Fang et al. 2021), virus–host and drug–target networks (Zhou et al. 2020), as well as recent advances in transformer-assisted network medicine (Spector et al. 2025).
Choosing appropriate null models is essential in biological network analysis to ensure meaningful results. NetMedPy currently supports various null models, including degree-preserving node randomization. These approaches maintain the overall topology of the selected network while reassigning biological entities to different nodes. Therefore, they effectively preserve global network characteristics, such as degree-degree correlations (assortativity or disassortativity), while retaining key structural features of the selected nodes under study, including node degrees and clustering, which are essential for biologically meaningful comparisons. In biological networks, each edge carries specific biochemical significance, reflecting distinct molecular interactions and processes within the organism. In contrast, randomizing edges would disrupt this biological context, breaking meaningful associations and potentially resulting in artificially extreme P-values. Such outcomes may not accurately represent realistic biological scenarios, compromising the interpretability of the results. Nevertheless, link randomization can be valuable for assessing robustness under alternative hypotheses or for introducing explicit topological perturbations (Zhou et al. 2023). Future versions of NetMedPy may incorporate degree-preserving edge randomization to natively support such analyses.
Furthermore, the versatility of NetMedPy extends its value to numerous scientific fields that utilize networks. For example, it can enhance social network analysis by investigating social interactions and information dissemination. In epidemiology, NetMedPy can analyze disease spread and the effectiveness of health interventions within interconnected populations.
In conclusion, NetMedPy is a valuable tool for researchers, enabling them to uncover new insights and address complex problems with efficient network analysis techniques.
Acknowledgements
We thank Ruisheng Wang for the process to curate disease–gene associations and Andrea Piras for the collection of drug–target associations.
Contributor Information
Andres Aldana, Network Science Institute, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA.
Michael Sebek, Network Science Institute, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA.
Gordana Ispirova, Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA.
Rodrigo Dorantes-Gilardi, Network Science Institute, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA.
Joseph Loscalzo, Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA.
Albert-László Barabási, Network Science Institute, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA.
Giulia Menichetti, Network Science Institute, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, 114 Western Avenue, Boston, MA 02134, USA.
Author contributions
Andrés Aldana (Formal analysis [lead], Methodology [equal], Software [lead], Validation [equal], Visualization [equal], Writing—original draft [lead], Writing—review & editing [equal]), Michael Sebek (Data curation [equal], Methodology [equal], Software [supporting], Validation [supporting], Visualization [supporting], Writing—original draft [equal], Writing—review & editing [equal]), Gordana Ispirova (Methodology [supporting], Software [equal], Writing—review & editing [supporting]), Rodrigo Dorantes-Gilardi (Methodology [supporting], Software [supporting]), Albert-László Barabási (Writing—review & editing [supporting]), Joseph Loscalzo (Writing—review & editing [supporting]), and Giulia Menichetti (Conceptualization [lead], Funding acquisition [lead], Project administration [lead], Validation [supporting], Visualization [supporting], Writing—review & editing [supporting])
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest
A.-L.B. and J.L. are scientific cofounders of Scipher Medicine, Inc., which focuses on network medicine approaches to disease biomarker and drug target discovery. All other authors have no competing interests.
Funding
This work was supported by National Institutes of Health/National Heart, Lung, and Blood Institute [K25HL173665] and American Heart Association [24MERIT1185447] to G.M.; the Veteran’s Affairs Medical Center of Boston Contract #36C24122N0769 and the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 810115—DYNASNET to A.-L.B.; National Institutes of Health [U01HG007691, R01HL155107, R01HL155096, R01HL166137], American Heart Association [957729, 24MERIT1185447], and European Union HorizonHealth2021 [101057619] to J.L.
Data availability
NetMedPy is freely available at https://github.com/menicgiulia/NetMedPy. All data used in this paper can be downloaded at https://github.com/menicgiulia/NetMedPy/tree/main/examples/VitaminD/data.
References
- Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011;12:56–68.
- Ben Guebila M, Wang T, Lopes-Ramos CM et al. The Network Zoo: a multilingual package for the inference and analysis of gene regulatory networks. Genome Biol 2023;24:45.
- Cheng F, Desai RJ, Handy DE et al. Network-based approach to prediction and population-based validation of in silico drug repurposing. Nat Commun 2018;9:2691.
- Cheng F, Kovács IA, Barabási A-L. Network-based prediction of drug combinations. Nat Commun 2019;10:1197.
- de Carvalho LM. csppinet: a Python package for context-specific biological network construction and analysis based on omics data. bioRxiv, 10.1101/2023.05.23.541999, 2023, preprint: not peer reviewed.
- de Weerd HA, Åkesson J, Guala D et al. MODalyseR—a novel software for inference of disease module hub regulators identified a putative multiple sclerosis regulator supported by independent eQTL data. Bioinform Adv 2022;2:vbac006.
- do Valle IF, Roweth HG, Malloy MW et al. Network medicine framework shows that proximity of polyphenol targets and disease proteins predicts therapeutic effects of polyphenols. Nat Food 2021;2:143–55.
- Erten S, Bebek G, Ewing RM et al. DA DA: degree-aware algorithms for network-based disease gene prioritization. BioData Min 2011;4:19. 10.1186/1756-0381-4-19
- Estrada E, Hatano N. Communicability in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys 2008;77:036111.
- Fang J, Zhang P, Zhou Y et al. Endophenotype-based in silico network medicine discovery combined with insurance record data mining identifies sildenafil as a candidate drug for Alzheimer’s disease. Nat Aging 2021;1:1175–88. 10.1038/s43587-021-00138-z
- Guney E, Menche J, Vidal M et al. Network-based in silico drug efficacy screening. Nat Commun 2016;7:10331.
- Helmy M, Mee M, Ranjan A et al. OpenPIP: an open-source platform for hosting, visualizing and analyzing protein interaction data. J Mol Biol 2022;434:167603.
- Huttlin EL, Bruckner RJ, Navarrete-Perea J et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 2021;184:3022–40.e28. 10.1016/j.cell.2021.04.011
- Luck K, Kim D-K, Lambourne L et al. A reference map of the human binary protein interactome. Nature 2020;580:402–8. 10.1038/s41586-020-2188-x
- Maier A, Hartung M, Abovsky M et al. Drugst.One—a plug-and-play solution for online systems medicine and network-based drug repurposing. Nucleic Acids Res 2024;52:gkae388–W488. 10.1093/nar/gkae388
- Maron BA, Wang R-S, Shevtsov S et al. Individualized interactomes for network-based precision medicine in hypertrophic cardiomyopathy with implications for other clinical pathophenotypes. Nat Commun 2021;12:873. 10.1038/s41467-021-21146-y
- Masuda N, Porter MA, Lambiotte R. Random walks and diffusion on networks. Phys Rep 2017;716–717:1–58. 10.1016/j.physrep.2017.07.007
- Menche J, Sharma A, Kitsak M et al. Uncovering disease-disease relationships through the incomplete human interactome. Science 2015;347:1257601.
- Nasirian F, Menichetti G. Molecular interaction networks and cardiovascular disease risk: the role of food bioactive small molecules. Arterioscler Thromb Vasc Biol 2023;43:813–23.
- Patten JJ, Keiser PT, Morselli-Gysi D et al. Identification of potent inhibitors of SARS-CoV-2 infection by combined pharmacological evaluation and cellular network prioritization. iScience 2022;25:104925.
- Piras A, Chenghao S, Sebek M et al. Cpiextract: a software package to collect and harmonize small molecule and protein interactions. bioRxiv, 10.1101/2024.07.03.601957, 2024, preprint: not peer reviewed.
- Sharma A, Menche J, Huang CC et al. A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma. Hum Mol Genet 2015;24:3005–20. 10.1093/hmg/ddv001
- Spector J, Aldana A, Sebek M et al. Transformers enhance the predictive power of network medicine. medRxiv, 10.1101/2025.01.27.25321204, 2025, preprint: not peer reviewed.
- Wang Y, Aldahdooh J, Hu Y et al. DrugRepo: a novel approach to repurposing drugs based on chemical and genomic features. Sci Rep 2022;12:21116.
- Zhou Y, Hou Y, Shen J et al. Network-based drug repurposing for novel coronavirus 2019-ncov/sars-cov-2. Cell Discov 2020;6:14. 10.1038/s41421-020-0153-3
- Zhou Y, Liu Y, Gupta S et al. A comprehensive sars-cov-2–human protein–protein interactome reveals covid-19 pathobiology and potential host therapeutic targets. Nat Biotechnol 2023;41:128–39. 10.1038/s41587-022-01474-0

