editorial
2024 Aug 14;4(1):vbae099. doi: 10.1093/bioadv/vbae099

Table 1.

Prominent open-source benchmark datasets for machine learning on biological networks.

| Data type | Database | Task type | Prediction tasks |
| --- | --- | --- | --- |
| General | Long Range Graph Benchmark (Dwivedi et al. 2022b) | Edge-level | Molecular bond |
| | | Graph-level | Peptide function, peptide structure |
| General | Open Biomedical Network Benchmark (Liu and Krishnan 2024) | Node-level | Protein function |
| | | Edge-level | Disease–gene association |
| General | Open Graph Benchmark (Hu et al. 2020b) | Node-level | Protein function |
| | | Edge-level | Protein–protein association, drug–drug interaction, heterogeneous interaction, vessels in mouse brain |
| | | Graph-level | Molecular property, species-specific protein association |
| General | SubGNN Benchmarks (Alsentzer et al. 2020) | Subgraph-level | Proteins associated with biological process, rare neurological disorders phenotype-based diagnosis, and rare metabolic disorders phenotype-based diagnosis |
| General | Temporal Graph Benchmark (Huang et al. 2024) | Node-level | Dynamic node affinity prediction |
| | | Edge-level | Dynamic link prediction |
| Knowledge graph | PrimeKG (Chandak et al. 2023) | Node-level | Identity of protein/gene, disease, drug, biological process, pathway, phenotype, molecular function, cellular component, exposure, and anatomical region |
| | | Edge-level | Protein–protein interaction, disease–drug indication, disease–drug contraindication, disease–drug off-label use, disease–phenotype association, disease–disease association, disease–protein association, disease–exposure association, phenotype–protein association, pathway–gene association, etc. |
| Knowledge graph | Phenotype Knowledge Translator (Callahan et al. 2024) | Node-level | Identity of tissue, cell, DNA, RNA, gene, miRNA, variant, protein, disease, biological process, pathway, phenotype, molecular function, cellular component, and chemical |
| | | Edge-level | Tissue-/cell-specific gene expression, gene–variant association, variant–disease association, chemical–disease association, chemical–pathway association, etc. |
| Molecular design | Protein sEquence undERstanding (Xu et al. 2022) | Edge-level | Protein–protein interaction, contact prediction |
| | | Graph-level | Molecular property (e.g. fold classification, secondary structure prediction) |
| Molecular design | Tasks Assessing Protein Embeddings (Rao et al. 2019) | Edge-level | Protein–protein interaction, contact prediction |
| | | Graph-level | Molecular property (e.g. fold classification, secondary structure prediction) |
| Molecular design | Graph Explainability Library (Agarwal et al. 2023) | Graph-level | Molecular mutagenic property, molecular functional group (e.g. benzene rings, fluoride carbonyl) |
| Neurology | NeuroGraph (Said et al. 2023) | Graph-level | Donor demographics (age and gender), task states (emotion processing, gambling, language, motor, relational processing, social cognition, and working memory), cognitive traits (working memory, fluid intelligence) |
| Therapeutic discovery | AVIDa-hIL6 (Tsuruta et al. 2024) | Edge-level | Antigen–antibody interaction |
| Therapeutic discovery | Therapeutic Data Commons (Huang et al. 2021) | Edge-level | Drug–target interaction, drug–drug interaction, protein–protein interaction, disease–gene association, drug–response prediction, drug–synergy prediction, peptide–MHC binding, antibody–antigen affinity, miRNA–target prediction, catalyst prediction, TCR–epitope binding, and clinical trial outcomes |
| | | Graph-level | Molecular property (e.g. synthesizability, drug-likeness) |

Databases are categorized by data type. Rows are ordered alphabetically by data type and then by database name.