Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

bioRxiv logoLink to bioRxiv
[Preprint]. 2024 Dec 9:2024.12.05.627020. [Version 1] doi: 10.1101/2024.12.05.627020

BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation

Heming Zhang, Shunning Liang, Tim Xu, Wenyu Li, Di Huang, Yuhan Dong, Guangfu Li, J Philip Miller, S Peter Goedegebuure, Marco Sardiello, Jonathan Cooper, William Buchser, Patricia Dickson, Ryan C Fields, Carlos Cruchaga, Yixin Chen, Michael Province, Philip Payne, Fuhai Li
PMCID: PMC11661111  PMID: 39713411

Abstract

Artificial intelligence (AI) is revolutionizing scientific discovery because of its super capability, following the neural scaling laws, to integrate and analyze large-scale datasets to mine knowledge. Foundation models, large language models (LLMs) and large vision models (LVMs), are among the most important foundations paving the way for general AI by pre-training on massive domain-specific datasets. Different from the well annotated, formatted and integrated large textual and image datasets for LLMs and LVMs, biomedical knowledge and datasets are fragmented with data scattered across publications and inconsistent databases that often use diverse nomenclature systems in the field of AI for Precision Health and Medicine (AI4PHM). These discrepancies, spanning different levels of biomedical organization from genes to clinical traits, present major challenges for data integration and alignment. To facilitate foundation AI model development and applications in AI4PHM, herein, we developed BioMedGraphica , an all-in-one platform and unified text-attributed knowledge graph (TAKG), consists of 3,131,788 entities and 56,817,063 relations, which are obtained from 11 distinct entity types and harmonizes 29 relations/edge types using data from 43 biomedical databases. All entities and relations are labeled a unique ID and associated with textual descriptions (textual features). Since covers most of research entities in AI4PHM, BioMedGraphica supports the zero-shot or few-shot knowledge discoveries via new relation prediction on the graph. Via a graphical user interface (GUI), researchers can access the knowledge graph with prior knowledge of target functional annotations, drugs, phenotypes and diseases (drug-protein-disease-phenotype), in the graph AI ready format. It also supports the generation of knowledge-multi-omic signaling graphs to facilitate the development and applications of novel AI models, like LLMs, graph AI, for AI4PHM science discovery, like discovering novel disease pathogenesis, signaling pathways, therapeutic targets, drugs and synergistic cocktails.

Full Text

The Full Text of this preprint is available as a PDF (13.5 MB). The Web version will be available soon.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES