An ontology-based knowledge graph for representing interactions involving RNA molecules

Emanuele Cavalleri; Alberto Cabri; Mauricio Soto-Gomez; Sara Bonfitto; Paolo Perlasca; Jessica Gliozzo; Tiffany J Callahan; Justin Reese; Peter N Robinson; Elena Casiraghi; Giorgio Valentini; Marco Mesiti

doi:10.1038/s41597-024-03673-7

. 2024 Aug 22;11:906. doi: 10.1038/s41597-024-03673-7

An ontology-based knowledge graph for representing interactions involving RNA molecules

Emanuele Cavalleri ¹, Alberto Cabri ¹, Mauricio Soto-Gomez ¹, Sara Bonfitto ¹, Paolo Perlasca ¹, Jessica Gliozzo ¹, Tiffany J Callahan ², Justin Reese ³, Peter N Robinson ^4,⁵, Elena Casiraghi ^1,^3,⁵, Giorgio Valentini ^1,⁵, Marco Mesiti ^1,^3,^✉

PMCID: PMC11341713 PMID: 39174566

Abstract

The “RNA world” represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient’s biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the “RNA world” is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts. To develop RNA-KG, we first identified, pre-processed, and characterized each data source; next, we built a meta-graph that provides an ontological description of the KG by representing all the bio-molecular entities and medical concepts of interest in this domain, as well as the types of interactions connecting them. Finally, we leveraged an instance-based semantically abstracted knowledge model to specify the ontological alignment according to which RNA-KG was generated. RNA-KG can be downloaded in different formats and also queried by a SPARQL endpoint. A thorough topological analysis of the resulting heterogeneous graph provides further insights into the characteristics of the “RNA world”. RNA-KG can be both directly explored and visualized, and/or analyzed by applying computational methods to infer bio-medical knowledge from its heterogeneous nodes and edges. The resource can be easily updated with new experimental data, and specific views of the overall KG can be extracted according to the bio-medical problem to be studied.

Subject terms: Data integration, Data acquisition

Background & Summary

The involvement of RNAs in various physiological processes has been ascertained by several studies^1–3 that have revealed the pervasive transcription of an unexpected variety of RNA molecules^4–7. These molecules can lead to a significant breakthrough in the treatment of cancer, genetic and neurodegenerative disorders, cardiovascular and infectious diseases⁸. The study of RNA is also one of the most promising avenues of research in therapeutics, as evidenced by the recent success of mRNA-based vaccines for the COVID-19 pandemic⁹, for the treatment of melanoma¹⁰, for the development of new drugs that can target both proteins and mRNA, as well as other non-coding RNAs, and for encoding missing or defective proteins, regulating the transcriptome, and mediating DNA or RNA editing¹¹. Thus, RNA technology significantly broadens the set of druggable targets, and is also less expensive than other technologies (e.g., drug synthesis based on recombinant proteins), due to the relatively simple structure of RNA molecules that facilitate their biochemical synthesis and chemical modifications¹². Non-coding RNAs (ncRNAs) comprise a large range of RNA species¹³, and a large set of scientific data is made publicly available by several genomics laboratories representing different kinds of interactions among them and with other bio-entities (e.g., genes, proteins, chemicals, diseases, and phenotypes).

The possibility of integrating the interactions that they make available would be of great relevance for knowledge discovery and also for the development of new RNA-based drugs. However, these sources adopt different data models, formats, and conventions for the representation of the bio-entities, and different semantics can be assigned to the proposed interactions. The extraction and integration of information from even two data sources for conducting knowledge discovery activity would require a lot of effort from researchers. To address these issues, Knowledge graphs (KGs)¹⁴ have emerged as a compelling abstraction for organizing interrelated knowledge in different domains and a way for integrating heterogeneous information extracted from multiple data sources with the aim of highlighting complex interdependencies and uncovering hidden relationships. KGs can be represented both with property graphs (e.g., Neo4j¹⁵) or according to the Resource Description Framework (RDF¹⁶) with different advantages and disadvantages¹⁷. When a KG is generated according to an ontology, it contains a schema part (denoted TBox or terminologies) and a data part (denoted ABox, facts, or assertions) on top of which different kinds of reasoning activities can be conducted using expressive languages (like OWL¹⁸, DL¹⁹, or SPARQL²⁰). KGs have started to play a central role also in the life sciences²¹ for the representation of bio-entities and their interactions and for the application of AI approaches for discovering new knowledge and eventually for explaining it. Different ontologies have been proposed for systematizing the corpus of terms used to describe the function and localization of bio-entities and for offering a formal framework to represent biological knowledge. Specific biological KGs (e.g., PrimeKG²², Human Disease benchmark KG²³, ReproTox-KG²⁴, Monarch Knowledge Graph²⁵, Oregano Knowledge Graph²⁶, and Knowledge Base of Biomedicine²⁷) have been recently constructed for conducting different kinds of analysis and supporting research activities.

In this paper we describe RNA-KG, the first ontology-based KG for representing coding and non-coding RNA molecules and their interactions with other biomolecular data as well as with pathways, abnormal phenotypes and diseases to support the study and the discovery of the biological role of the “RNA-world”. RNA-KG contains RDF triples extracted from more than 60 public data sources and also integrates related bio-medical concepts. RNA-KG can be exploited for the study of RNA molecules and the development of innovative graph algorithms to support knowledge discovery in data science. A big effort has been dedicated to the characterization of the data sources and to the identification of the bio-medical ontological concepts that better represent the information provided by the considered data sources and the interactions involving RNA molecules. This work culminated in the construction of a meta-graph that represents all the possible interactions that can be devised from the considered data sources. The relationships have been grounded according to the Relation Ontology (RO²⁸), which ensures common semantics for the different relationships that can be extracted from the sources. Relying on the generated meta-graph and exploiting the Phenotype Knowledge Translator (PheKnowLator²³) tool, we extracted 673,825 nodes and 12,692,212 high-quality edges according to the metrics provided in each data source. Different analyses have been conducted to characterize the types of nodes and interactions that are represented in RNA-KG, their distribution, and the topological structure. RNA-KG can be exported according to different knowledge models and can be accessed through a SPARQL endpoint.

RNA-KG takes advantage of a preliminary meta-graph²⁹ and the PheKnowLator system²³ for the construction of semantically rich, large-scale biomedical KGs that are Semantic Web compliant and amenable to automatic OWL reasoning, and conforming to contemporary property graph standards. We have used PheKnowLator to download data, transform and/or pre-processing resources into edge lists, construct KGs, and generate a wide range of outputs.

Related Work

For a better understanding of the approach that we have followed in the construction of RNA-KG, we first outline the methods developed for integrating graph-based biomedical heterogeneous data sources and then summarize the main characteristics of the different types of RNA molecules. Finally, we outline the bio-ontologies that can be exploited for the characterization of RNA molecules and the bio-entities with which they are related.

Approaches for the construction of bio-medical knowledge graphs

Data integration is a widely recognized challenge in data management, that prompted the development of numerous approaches to handle relational data³⁰. However, the proliferation of data formats (like CSV, JSON, and XML), alongside the variability in representing similar data types^31,32, has underscored the necessity of leveraging ontologies as global models for both accessing (OBDA - Ontology-Based Data Access) and integrating (OBDI - Ontology-Based Data Integration) data sources³³. In OBDA, queries are expressed according to ontology terms, with mappings between the ontology and data sources’ schema described through declarative rules. Typically, two approaches have been proposed for enabling access and integration across different data sources: materialization and virtualization. Materialization involves aligning local data formats to the ontology concepts and relationships, whereas virtualization executes transformations on the fly during query evaluation, utilizing mapping rules and ontology. In virtualization, only data pertinent to the query from the original sources are accessed. Materialization offers swift and accurate data access to the data that are collected and organized in a centralized repository. However, frequent changes in data sources may compromise data freshness. Conversely, virtualization allows access to fresh data but may introduce delays and inconsistencies when the local source schema changes. Various approaches exist for specifying mapping rules, including R2RML³⁴ (a W3C standard for relational to RDF mapping), RML³⁵ (a R2RML extension for dealing with multiple formats), SPARQL-Generate³⁶, YARRRML³⁷, and ShExML³⁸, catering to data heterogeneity.

In the biological domain, significant efforts are being dedicated to constructing knowledge graphs (KGs) by integrating diverse public sources, using materialization and virtualization approaches. For instance, Zhang et al.³⁹ utilized a Connecting Ontology (CO) to integrate external ontologies describing involved data sources. By leveraging algorithms for fusion and annotation integration, they generated an enriched KG spanning multiple data sources, annotated with an integrated biological ontology combining Gene⁴⁰, Trait⁴¹, Disease⁴², and Plant⁴³ ontologies. PrimeKG²² was developed to represent comprehensive views of diseases, integrating over 20 high-quality resources capturing information such as disease-associated perturbations and molecular pathways. Ontologies (e.g. Disease Gene Network, Mayo Clinical Knowledgebase, Mondo, Bgee, and DrugBank) have been used to annotate the collected data. ReproTox-KG²⁴ combines gene, drug, and preclinical small molecule information with birth defect associations, aiming to predict compound-induced birth abnormalities and whether these compounds are likely to cross the placental barrier. The information is extracted from scientific literature taking into account ontologies like HPO⁴⁴, CDC birth-defect terms⁴⁵, Geneshot⁴⁶ for connecting genes with birth-defect terms, DrugCentral⁴⁷ for connecting drugs with birth-defect terms, and LINCS L1000 data⁴⁸ for drug-gene associations. Other examples include the Oregano KG for drug repositioning²⁶ and a virtualization approach proposed by Sima et al.⁴⁹ for federating three data sources (Bgee, OMA, and UNIProtKB) through ontology-based integration. Specifically, starting from the GenEx semantic model for gene expression, mapping rules were proposed to deal with the different formats of the three sources and faced the issue of joint queries across the sources by leveraging SPARQL endpoints.

All these papers point out the difficulties that arise when trying to integrate different data sources that exploit different data models, formats, and ontologies. Specifically, data redundancies, data duplicates, and lack of common identifier mechanisms must be properly addressed. In the case of RNA data integration, we also have to consider the lack of specific ontologies for the description of all possible non-coding RNA sequences, and the presence of ontologies that are not well-recognized by the community because still in their infancy. All these aspects must be properly addressed in the generation of RNA-KG.

RNA molecules

The wide functional role of the different types of RNA molecules opened the way to novel therapeutics able to revolutionize the treatment and prevention of human diseases⁵⁰. Indeed, RNA molecules play a fundamental role in cell biology, performing a wide range of functions either i) directly by regulating gene expression, exhibiting enzymatic activity, through the modification or regulation of other RNAs or other bio-molecules, or ii) indirectly by being translated into proteins. Figure 1 shows the main classes of RNAs.

Fig. 1 — Schematic representation of the RNA network within a cell.

Coding RNA

Eukaryotic messenger RNA (mRNA) primary transcripts undergo extensive processing to obtain their protein-encoding mature form from pre-mRNA; mRNA is finally translated by ribosomes into sequence of amino acids connected through peptide bonds⁵¹.

Non-coding RNA

Non-coding RNAs (ncRNAs) are transcripts not translated into proteins. Two subgroups named long non-coding RNAs (lncRNAs) and small non-coding RNAs (sncRNAs) can be distinguished relying on a length cut-off of 200 nucleotides¹³.

Long non-coding RNA (lncRNA)

lncRNAs constitute the bulk of transcription products and hold crucial importance in the onset and advancement of diseases⁵². lncRNAs are involved in competitive endogenous RNA (ceRNA) regulation, transcriptional regulation and epigenetic regulation⁵³. They can modulate chromatin function, regulate the assembly and function of membraneless nuclear bodies, alter the stability and translation of cytoplasmic mRNAs, and interfere with signaling pathways⁵⁴. Its transcriptional regulation activity is realized either modifying transcription factor activity, or regulating the association and activity of co-regulators⁵⁵. lncRNAs are also involved in post-transcriptional regulation. Several studies showed that lncRNAs act as competitive endogenous RNAs (ceRNAs) by “sponging” target miRNAs to regulate mRNA expression⁵³. Among lncRNAs, circular RNAs (circRNAs) derive from alternative splicing and can be involved in the regulation of splicing events. Their abnormal expression is detected in numerous human diseases, including cancer and neurodegenerative disorders like Alzheimer’s and Parkinson’s disease⁵⁶. lncRNAs are also well-known epigenetic regulators that guide target enzymes necessary to control chromatin organization. For instance, they are involved in the inactivation of X-chromosome in female mammals (e.g., Xist) and in genomic imprinting (e.g., Kcnq1ot1) by recruiting histone modifying enzymes leading to gene silencing^57,58. Moreover, lncRNAs and circRNAs can interact directly or indirectly with the enzyme families involved in DNA methylation (i.e., DNMT) and demethylation (i.e., TET) to modulate methylation at specific genomic positions, in turn being involved in many tumors⁵⁹ but also in physiological processes (e.g., Kcnq1ot1 interacts with Dnmt1 to further control the silencing of ubiquitous imprinted genes⁵⁸).

Small non-coding RNA (sncRNA)

sncRNAs participate in multiple cellular biological processes, encompassing: translation, RNA interference (RNAi) pathways, splicing and self-cleavage processes, biochemical reactions catalysis, and targeted gene editing.

sncRNAs involved in the translation process

Numerous sncRNAs play various important roles in the translation phase of protein biosynthesis, such as: some types of small rRNA, transfer RNAs (tRNAs), small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), and Small Cajal body-specific RNAs (scaRNAs, snoRNAs localized in the Cajal body). While rRNAs are the structural and enzymatic scaffold of the ribosome, tRNAs have a unique structure comprising an acceptor stem that binds to a specific amino acid and a distinct anticodon sequence of three bases complementary to the mRNA codon triplet. This configuration guarantees the accurate translation of mRNA codons into their corresponding chain of amino acids. snRNAs and snoRNAs mainly direct the chemical modification of other RNAs (e.g. rRNA and tRNA) and regulate the chromatin condensation state and DNA accessibility.

sncRNAs associated with RNA interference pathways

RNA interference is a post-transcriptional regulation mechanism of gene expression and its alteration is associated with many pathologies⁶⁰. sncRNAs participating in RNAi pathways include: microRNAs (miRNAs), short interfering RNAs (siRNAs), short hairpin RNAs (shRNAs), antisense oligonucleotides (ASOs), piwi-interacting RNAs (piRNAs), tRNA-derived fragments (tRFs), and tRNA-derived small RNAs (tsRNAs). Mature miRNAs, siRNAs, shRNAs, and ASOs modulate mRNA expression by inhibiting translation or facilitating the degradation of the target transcript via complementary base pairing. In contrast to siRNAs, a single miRNA has the capacity to concurrently regulate hundreds of protein-coding genes and transcription factors (TFs). The term for miRNAs that are detected in various human biological fluids but originate from exogenous sources is xeno-miRNAs⁶¹. On the other hand, ASOs suppress the expression of nuclear targets with greater efficacy while siRNA have better performance in the inhibition of cytoplasmic target⁶² by recruiting the RNA-induced Silencing Complex (RISC) through siRNA:mRNA duplex, which in turn catalyzes the mRNA cleavage reaction. piRNAs and tRFs similarly leverage RNA interference mechanisms to silence transposons, retrotransposons and repeat sequences, thus preserving genome integrity and cellular homeostasis⁶³.

Aptamers, riboswitches, ribozymes and guide-RNA

RNAs can perform interference activities also exploiting their tertiary structure. Aptamers are short single-stranded DNA or RNA molecules that, due to their specific 3D conformation, can act like chemical antibodies binding a diverse array of targets (e.g., proteins, peptides, carbohydrates, DNA, and RNA)^64–66. Riboswitches are small non-coding RNAs that perform a ligand-dependent conformational change triggering alternative splicing and self-cleavage processes that cause the modulation of gene expression and mRNA degradation, which is of pivotal importance for cell survival and adaptation to different environmental stimuli⁶⁷. Some RNAs (e.g. ribozymes) have enzymatic activity acting as catalysts to accelerate biochemical reactions like mRNA and protein cleavage. Ribozymes can be artificially engineered to target specific sequences and synthetic ribozymes have already been designed against viral RNA. Synthetic guide RNAs (gRNAs) are employed in the CRISPR-Cas9 system, which is utilized for gene editing and gene therapy purposes⁶⁸.

Biomedical ontologies for the semantic characterization of RNA-KG

Several standard biomedical ontologies can be used to set up common semantics in the considered data sources. Table 1 shows those considered during RNA-KG construction (their specifications are made available in the web portals ebi.ac.uk/ols4 and bioportal.bioontology.org). We selected these ontologies because their terms and hierarchical structures are commonly accepted by the scientific community to unequivocally describe biological classes and entities such as diseases, phenotypes, chemicals, biological processes, proteins, and relations between them. In the case of RNA-KG, we have also taken into account the lack of specific ontologies for the description of all possible RNA sequences (especially non-coding ones), and the presence of bio-ontologies that are yet not well-recognized by the community.

Table 1.

Main biomedical ontologies used for RNA-KG construction (* modified to include only human and viral proteins).

Name	Abbr.	Description
Human Phenotype Ontology⁴⁴	HPO	Terms representing medically relevant phenotypes and disease-phenotype annotations.
Gene Ontology⁴⁰	GO	Terms representing attributes of gene products in all organisms. Cellular component, molecular function, and biological process domains are covered.
Monarch Merged Disease Ontology⁹⁸	Mondo	Terms representing human diseases.
Vaccine Ontology⁹⁹	VO	Terms in the domain of vaccine and vaccination.
Chemical Entities of Biological Interest¹⁰⁰	ChEBI	Structured classification of molecular entities of biological interest focusing on “small” chemical compounds.
Uber-anatomy Ontology¹⁰¹	Uberon	Terms representing body parts, organs and tissues in a variety of animal species, with a focus on vertebrates.
Cell Line Ontology¹⁰²	CLO	Terms representing publicly available cell lines.
PRotein Ontology¹⁰³*	PRO	Terms representing protein-related entities (including specific modified forms, orthologous isoforms, and protein complexes).
Sequence Ontology¹⁰⁴	SO	Terms representing features and properties of nucleic acid used in biological sequence annotation.
Pathway Ontology¹⁰⁵	PW	Terms for annotating gene products to pathways.
Relation Ontology²⁸	RO	Terms and properties representing relationships used across a wide variety of biological ontologies.

Open in a new tab

Methods

The creation of a KG is a complex task that requires passing through several phases that can be organized in a workflow such as the one reported in Fig. 2. In the remainder of the section, we provide a detailed description of the different issues and adopted solutions for each phase of the creation of our RNA-based KG.

Fig. 2 — Workflow for the construction of RNA-KG.

RNA sources characterization

In this phase, we have identified and analyzed the characteristics of relevant data sources from which the information for feeding the KG has been extracted. This is a well recognized critical initial step in constructing a KG⁶⁹.

With this aim, an extensive literature review was carried out and ended with the identification of more than 60 public repositories dealing with RNA sequences and annotations developed by well-reputed organizations, published in top journals in the last 10 years, periodically updated, and containing significant amounts of RNA molecules and relevant relationships with other types of molecules and bio-entities. Sources provide data in different formats (e.g., CSV, TSV, gaf, reactome, xlsx, JSON, and HTML) or by issuing queries on content management systems. Once the data were downloaded, Pandas⁷⁰ DataFrames were used to transform the data into a common format (TSV files) and remove syntactic inconsistencies.

The adoption of different types of molecular identifiers represents another issue. Indeed, the identification scheme encountered in the considered data may vary from the source and target of the relation and could be characterized by different accuracy levels. Four levels have been detected: Well-Reputed (denoted WR), when the identifiers are widely accepted by the scientific community (e.g., NCBI Entrez Gene identifiers); Ontology-based (denoted O), when the identifiers are directly represented with ontological terms; Mapping-based (denoted M), when the identifiers can be obtained by exploiting look-up tables; and Proprietary (denoted P), when all the previous techniques cannot be applied. Once the identification scheme adopted in a source has been classified, appropriate look-up tables for their mapping into the reference ontology have been realized by analyzing synonyms in the reference ontology or by examining the ones provided by the sources themselves to facilitate interoperability with other sources dealing with the same entities. For instance, we employed NCBI Gene Entrez identifiers to represent genes in RNA-KG, but many sources provide the correspondent Gene Symbol. In this case, a look-up table has been used to map gene identifiers into the chosen representation (Fig. 3). To guarantee a high level of reliability of the relationships to be included in RNA-KG, only meaningful relationships have been considered, that is those satisfying constraints that take into account p-values or FDR – False Discovery Rate (e.g., p_val < 0.01), experimental validation of results, or scores (denoted with σ) defined as reliable in the considered data source.

Fig. 3 — The relationship between chemicals and miRNA cannot be decoded directly because of the use of different identification schemes. However, by means of a look-up table the relationship can be highlighted.

The main characteristics of the identified repositories are reported in Tables 2, 3, 4, whose entries are organized according to the main type of RNA molecules made available by the source. Sources with miRNA entities can contain hairpin miRNA, xeno-miRNA, and mature miRNA molecules (last ones, in turn, can be classified in -3p and -5p transcripts). Inter RNA sources are those that do not focus on a single RNA type but propose multiple relationships among different types of RNA molecules and bio-entities (e.g., disease in the case of RNADisease or cellular component in the case of RNALocate). Note that no species is present for RNA drugs, vaccines and aptamers because they are synthetic. Regarding the format, the majority of the considered data sources (≥70%) export data in a flat-file format (CSV). Only a small fraction of them (around 20%) provides an API for accessing data stored in a relational database. Only DrugBank and the GO resource offer an RDF data representation coupled with a SPARQL endpoint.

Table 2.

Main data sources (Part I).

Type	Data source	Species	RNAs	Format	API	Threshold	SI	Relation with	TI	Relation
miRNA	miRBase¹⁰⁶	271	87,474	rel/CSV	no	validated	WR	miRNA epi. mod.	WR M	58,168 7
	miRDB¹⁰⁷	5	7,086	CSV	no	σ > 80	WR	mRNA	WR	3,519,884
	miRNet¹⁰⁸	10	7,928	rel/CSV	yes		WR	variant gene snoRNA chemical TF epi. mod. lncRNA pseudogene circRNA disease	WR WR WR M M M WR WR WR M	67,532 3,025,487 9,738 4,935 3,311 1,955 31,345 59,417 804,086 32,004
	miRecords¹⁰⁹	9	384	CSV	no	validated	WR	mRNA	M	1,529
	HMDD¹¹⁰	HS	1,206	CSV	no		WR	disease	M	35,547
	EpimiR¹¹¹	7	617	CSV	no		WR	epi. mod.	M	1,974
	miR2Disease¹¹²	HS	349	CSV	no		WR	disease	O	3,273
	TargetScan¹¹³	5	5,168	CSV	no	validated	WR	gene	WR	2,850,014
	SomamiR¹¹⁴	HS	1,078	CSV	no	validated	WR	mRNA circRNA lncRNA disease	WR WR WR M	2,313,416 428,237 127,025 2,424
	TarBase¹¹⁵	18	2,156	rel/CSV	no		WR	gene	WR	665,843
	miRTarBase¹¹⁶	28	4,630	CSV	no		WR	gene	WR	2,200,449
	SM2miR¹¹⁷	21	1,658	CSV	no		WR	chemical	M	4,989
	TransmiR¹¹⁸	19	785	CSV	no	validated	WR	TF	M	3,730
	PolymiRTS¹¹⁹	HS	11,182	rel/CSV	no	validated	WR	disease variant mRNA	M WR WR	83,516 16,412 16,412
	dbDEMC¹²⁰	HS	3,268	CSV	no	p_val < 0.01	WR	disease	M	160,800
	TAM¹²¹	HS	1,209	CSV	no		WR	mol. function miRNA TF disease anatomy	M WR M M M	2,538 1,218 165 12,516 58
	PuTmiR¹²²	HS	1,296	CSV	no		WR	TF	M	12,097
	miRPathDB¹²³	HS, MM	29,430	CSV	no	FDR < 0.05 validated	WR	mol. function bio. process cell. component pathway	O O O WR	1,066,511 4,782,046 1,136,036 986,400
	miRCancer¹²⁴	HS	57,984	CSV	no		WR	disease	M	9,080
	miRdSNP¹²⁵	HS	249	CSV	no	validated	WR	disease variant mRNA	M WR WR	786 758 180
	miRandola¹²⁶	14	1,002	CSV	no		WR	extracell. form chemical	M M	3,262 25
mRNA vaccine	DrugBank¹²⁷		4	rel/RDF	yes^S		P	disease	M	8

Open in a new tab

For each type of RNA molecule, the table reports the corresponding data sources. Moreover, for each data source, Species and RNAs columns specify the number of species and distinct sequences (HS and MM tags refer to specific species Homo sapiens and Mus musculus); Relation with and Relation columns specify the distinct relationships with bio-entities and their number; Format column refers to the data format (CSV for flatfiles, rel for relational tables, RDF, or HTML for web pages); API column reports the availability of API or SPARQL endpoints (the last one denoted with the superscript s) for data access; Threshold column provides identified quality threshold within the source. SI and TI columns contain the class of the identification schemes (WR – Well-Reputed, O – Ontology-based, M – Mapping- based, and P – Proprietary) adopted respectively by source and target(s) within a specific resource (the source is the RNA molecule specified in the Type column, whereas target(s) are specified in the Relation with column).

Table 3.

Main data sources (Part II).

Type	Data source	Species	RNAs	Format	API	Threshold	SI	Relation with	TI	Relation
s(i/h)RNA	ICBP siRNA¹²⁸	HS, MM	147	HTML	no		P	mRNA	WR	147
s(i/h)RNA	DrugBank¹²⁷		4	rel	yes^S		P	mRNA disease	WR M	4 6
RNA aptamer	Apta-Index¹²⁹		230	rel	no		P	chemical protein	M M	77 153
RNA aptamer	DrugBank¹²⁷		2	rel/RDF	yes^S		P	protein disease	M M	5 6
ASO	eSkip-Finder¹³⁰	4	2,196	rel	no		P	mRNA	WR	11,778
ASO	DrugBank¹²⁷		12	rel/RDF	yes^S		P	protein mRNA disease	M WR M	12 7 14
gRNA	Addgene¹³¹	29	296	HTML	no		P	gene	WR	321
lncRNA	LncBook¹³²	HS	323,950	rel/CSV	no	validated	WR	miRNA small protein disease bio. context	WR WR M M	146,092,274 772,745 34,536 95,243
	LncRNADisease¹³³	4	6,066	CSV	no		WR	disease	M	20,277
	LncExpDB¹³⁴	HS	101,293	rel/CSV	no	p_val < 0.01	WR	mRNA	WR	28,443,865
	dbEssLnc¹³⁵	HS, MM	207	JSON	no		WR	bio. role bio. process	P O	207 28
	lncATLAS¹³⁶	HS	6,768	CSV	no	σ ≥ 28.50	WR	cell. component	M	2,429,368
	NONCODE¹³⁷	39	644,510	rel	no		WR	disease	O	32,226
	Lnc2Cancer¹³⁸	HS	3,402	CSV	no		WR	disease	O	9,254
	LncRNAWiki¹³⁹	HS	106,063	rel/CSV	no		WR	small protein disease bio. context cell. component gene miRNA TF bio. process mol. function chemical pathway	P M M M WR WR M M M M M	9,387 7,634 18,453 4,969 509 210 232 10,806 1,800 789 571
	LncBase¹⁴⁰	4	21,225	rel/CSV	no	σ₁ ≥ 0.7325 σ₂ ≥ 2.497	WR	miRNA anatomy cell cell. component	WR M M M	4,229,539 61,905 68,355 73,069
	TANRIC¹⁴¹	HS	12,727	rel/CSV	no	σ ≥ 0.3	WR	disease	M	36,632
Ribozyme	Ribocentre¹⁴²	1,195	21,084	rel	no		P	bio. process	M	34
Ribozyme	Rfam¹⁴³	16	35	rel	yes		P	bio. process cell. component mol. function	O O O	8 6 22
Riboswitch	TBDB¹⁴⁴	3,621	23,497	CSV	no		P	protein	M	23,535
Riboswitch	RSwitch¹⁴⁵	50	215	rel/CSV	no		P	bact. strain	M	215
tRF & tsRNA	tRFdb¹⁴⁶	7	863	CSV	no		P	tRNA cell	WR M	792 2,292
	tsRFun¹⁴⁷	HS	3,940	CSV	no	FDR < 0.01	P	miRNA tRNA disease	WR WR M	45,165 46,798 4,620
	MINTbase¹⁴⁸	HS	28,824	CSV	no		P	tRNA	WR	125,285

Open in a new tab

Table 4.

Main data sources (Part III).

Type	Data source	Species	RNAs	Format	API	Threshold	SI	Relation with	TI	Relation
Viral RNA	ViroidDB¹⁴⁹	9	9,891	CSV	no		WR	ribozyme	P	17,460
snoRNA	snoDB¹⁵⁰	HS	751	CSV	no		WR	gene mRNA lncRNA miRNA pseudogene rRNA snoRNA snRNA tRNA scaRNA	WR WR WR WR WR P WR WR P WR	763 276 45 17 10 735 670 164 164 34
tRNA	tRNAdb¹⁵¹	681	9,758	rel	no		WR	amino acid	M	8,872
tRNA	GtRNAdb¹⁵²	4,857	426,592	rel	no		WR	epi. mod.	M	1,366
piRNA	piRBase⁸²	44	218,383,944	rel	no		WR	disease variant mRNA lncRNA	M WR WR WR	302 1,640,636 30,338 1,199
	iPiDA-GCN¹⁵³	HS	10,149	CSV	no		WR	disease	O	11,981
	TarpiD¹⁵⁴	9	1,154	rel	no		WR	gene disease	WR M	28,682 11,869
Inter RNA	RNAInter¹⁵⁵	156	455,887	CSV	yes	σ ≥ 0.2886	WR	chemical histone mod. RBP TF protein gene	M M M M M WR	10,890 1,060,685 5,200,067 9,323,690 22,543,829 119,377
	RNALocate¹⁵⁶	104	123,592	CSV	yes		WR	cell. component	M	213,429
	RNADisease¹⁵⁷	117	91,245	CSV	yes	σ ≥ 0.95	WR	disease	O	343,273
	ncRDeathDB¹⁵⁸	12	648	CSV	yes		WR	prog. cell death	M	4,615
	cncRNADB¹⁵⁹	21	2,002	CSV	yes		WR	anatomy	M	2,598
	ViRBase¹⁶⁰	152	41,718	CSV	yes	σ ≥ 0.7	WR	viral RNA viral protein	WR M	719,214 195
	Vesiclepedia¹⁶¹	41	20,490	CSV	no		WR	extracell. form	M	388,154
	DirectRMDB¹⁶²	25	19,702	CSV	no		WR	epi. mod.	M	904,712
	Modomics¹⁶³	32	225	rel/RDF	yes		WR	epi. mod.	M	276
	The GO resource (GO annotations)⁴⁰	12	26,245	rel/RDF	yes^S		P	bio. process mol. function cell. component	O O O	48,096 23,767 42,563

Open in a new tab

For the characterization of the relationships that can be extracted from the different data sources, we applied the Relation Ontology (RO). Moreover, the hierarchical organization of concepts in RO allows the expression of different kinds of relationships at different granularities (e.g., the general property interacts with can be substituted with more specific properties such as molecularly interacts with or genetically interacts with). Finally, in case of a lack of specific properties for describing relationships identified in a data source, we decided to approximate the concept/relationship type with a property already present in RO. This choice has the effect of getting a larger agreement on the meaning of the used terms.

Figure 4 summarizes the available relations involving RNA molecules and bio-entities (i.e., gene, protein, chemical, and disease) that we have identified in the different data sources. miRNA-lncRNA interactions are the most numerous. We can retrieve around 150 million distinct relationships of this type from public RNA-based data sources. In terms of cardinality, they are followed by lncRNA-mRNA interactions (~ 28 million) and miRNA-mRNA/miRNA-gene interactions (~ 6 million each). Around 800 thousand distinct relationships can also be identified for protein-lncRNA interactions. These categories of molecules often interact with each other in specific diseases. RNA aptamer-disease is the less represented one because at the current stage only two approved (or under-investigation) RNA aptamer drugs are present in DrugBank and, in general, RNA drugs are less represented than others because they are synthetic (DrugBank siRNA and mRNA vaccine categories contain only 4 approved or under-investigation drugs, ASO drugs are only 12, and RNA aptamer drugs only 2). In addition to the so far discussed data sources, RNAcentral⁷¹ is a collector coordinated by the European Bioinformatics Institute (EBI⁷²), which imports non-coding RNA sequences from multiple databases and enables integrated text search, sequence similarity search, bulk downloads, and programmatic data access through a reliable API.

Fig. 4 — Number of relationships involving RNA molecules and relevant bio-entities (gene, protein, chemical, and disease) within the considered RNA sources. Colors represent the ranges of relationships in log scale, as reported in the legend.

To guarantee a high level of homogeneity in the KG, a few tuples have been omitted when the mapping to the reference ontology was not possible. For some types of RNA molecules (especially ncRNA sequences), the look-up tables cannot be adopted because of the lack of a reference ontology with which these molecules can be represented. In these cases, the NCBI Entrez Gene identifiers of the gene from which the specific RNA is transcribed have been extended with a suffix that corresponds to the type of non-coding RNA (e.g., in case of small nucleolar RNA molecules the suffix is ?snoRNA). We remark that the lack of a common ontological representation among heterogeneous sources can cause the same molecule to be represented multiple times under different identifiers. At the current stage of development, we decided to admit their presence, but we will consider de-duplication techniques^73,74 that rely on the use of similarity measures on the molecule sequences in future releases.

Starting from the need to understand whether the content of the different data sources overlap, we examined the entities and relationships made available in the considered data sources and identified containment (or overlapping) data sources. The result of our study is reported in Fig. 5 where bubbles represent the relationships made available by the data sources. We can note the presence of two prominent clusters (miRNet and RNAcentral) that properly include or overlap the relationships made available by other data sources. The identification of these containments has been exploited to reduce the issue of semantic compatibility. Furthermore, many miRNA and lncRNA sources contain relations that either overlap or are properly included within other sources. We remark that the Inter RNA sources RNAInter, RNALocate, RNAdisease, ncRDeathDB, cncRNAdb, and ViRBase are nicknamed “Sister Projects” because they are updated and maintained by the same research team. Common semantics in “Sister Projects” result useful for data handling because they share a practically identical structure.

Fig. 5 — Bubbles represent the relationships made available from the considered data sources. Overlapping and inclusions of bubbles show the presence of common relationships among the considered data sources.

Ontological description of the KG

In this phase, we identified the classes of bio-entities that need to be managed and the kinds of relationships that can exist among them (schema layer). Moreover, specific instances and the properties that need to be maintained have been highlighted (data layer). This design activity plays a fundamental role in the hierarchy, structure, and content filling of the knowledge graph, and it is the basis for determining the kind of reasoning that can be supported.

Starting from the knowledge gained from the characterization of RNA sources, we moved toward the construction of the ontological schema underlying the KG. A meta-graph was built to include all the kinds of bio-entities and relationships between them outlined in the previous phase. The meta-graph provides both direct and inverse relationships that are considered to guarantee bi-directional navigation of the generated KG. Once classes of bio-entities and their relationships have been identified, we determined the properties that should be kept for them. At the current stage, only fundamental properties of bio-entities have been collected (identifiers, node types, and source provenance). This choice has the advantage of avoiding the explosion of the KG size. However, in future implementations, we wish to include class properties.

Table 5 reports the main relationships of the considered data sources according to the RO ontology. For each relation, the table reports the RO identifier, the corresponding meaning, and, whenever available, the inverse relation. The relation names that are exploited for unidirectional relationships are marked with the * symbol. The general relationships interacts with available in RO with the meaning “A relationship that holds between two entities in which the processes executed by the two entities are causally connected” have been specified in the most specific relationships molecularly interacts with in our classification to represent the situation in which the two partners are molecular entities that directly physically interact with each other (e.g., via a stable binding interaction or a brief interaction during which one modifies the other). We use this relationship to represent a specific interaction process at the molecular level (e.g., aptamer-protein binding interaction or tRNA molecule charged with a specific amino acid). We remark that some authors^75,76 suggest that miRNA molecules are involved in negative regulation of complementary miRNA molecules by forming base-pairing interactions. However, this kind of relationship is not present in our data sources. Finally, we note that (the part of property, together with its inverse has part, formally belong to the Basic Formal Ontology – BFO⁷⁷ – but are imported in RO).

Table 5.

Main relations among bio-entities involving RNA with the RO identifier (* symmetric relationship).

Relation ID	Name	Inverse Relation ID	Inverse Name
RO:0000056	participates in	RO:0000057	has participant
RO:0000079	function of	RO:0000085	has function
RO:0011013	indirectly positively regulates activity of
RO:0001015	location of	RO:0001025	located in
RO:0011016	indirectly negatively regulates activity of
RO:0002202	develops from	RO:0002203	develops into
RO:0002204	gene product of	RO:0002205	has gene product
RO:0002245	over-expressed in
RO:0002246	under-expressed in
RO:0002260	has biological role
RO:0002263	acts upstream of
RO:0002264	acts upstream of or within
RO:0002291	ubiquitously expressed in	RO:0002293	ubiquitously expresses
RO:0002302	is treated by substance	RO:0002606	is substance that treats
RO:0002314	characteristic of part of
RO:0002325	colocalizes with*
RO:0002326	contributes to
RO:0002327	enables	RO:0002333	enabled by
RO:0002331	involved in
RO:0002387	has potential to develop into
RO:0002428	involved in regulation of
RO:0002430	involved in negative regulation of
RO:0002432	is active in
RO:0002434	interacts with*
RO:0002435	genetically interacts with*
RO:0002436	molecularly interacts with*
RO:0002448	directly regulates activity of
RO:0002449	directly negatively regulates activity of
RO:0002450	directly positively regulates activity of
RO:0002479	has part that occurs in
RO:0002526	overlaps sequence of*
RO:0002528	is upstream of sequence of	RO:0002529	is downstream of sequence of
RO:0002559	causally influenced by	RO:0002566	causally influences
RO:0003002	represses expression of
RO:0003302	causes or contributes to condition
RO:0004033	acts upstream of or within, negative effect
RO:0004035	acts upstream of, negative effect
RO:0010001	generically depends on	RO:0010002	is carrier of
RO:0011002	regulates activity of
RO:0011007	decreases by repression quantity of
BFO:0000050	part of	BFO:0000051	has part
RO:HOM0000000	in similarity relationship with*
RO:HOM0000001	in homology relationship with*

Open in a new tab

The content of Tables 2–5 is the groundwork for the generation of the meta-graph reported in Fig. 6. The graphical representation provides a global overview of the richness of information that is currently provided. To simplify the visualization of the meta-graph, we omitted most of the non-RNA bio-entities that are known to play an important role in studying the biology (and supporting the discovery) of novel RNA drugs. Moreover, we have omitted some of the relationships extracted from the Inter RNA data sources (see Table 4) because of the limitation of their occurrences. The meta-graph in Fig. 6 can be further extended to include other nodes representing other bio-entities (e.g., diseases, epigenetic modifications, small molecules, tissues, biological pathways, and cellular components) and relationships relevant to the analysis. This “enlarged” meta-graph is quite complex and difficult to represent graphically. Figure 7 shows a very abstract representation by clustering in a single RNA node all the kinds of RNA molecules described in Fig. 6. Then, this node is connected with various bio-entities based on insights extracted from RNA sources and literature.

Fig. 7 — The complete conceptual RNA-KG meta-graph. RNA nodes are summarized into a few general types (e.g., ncRNA and mRNA) to simplify the visualization.

Table 6 specifies the bio-ontologies that have been exploited for grounding concepts in RNA sources. RNA sources are categorized according to the main treated molecules of RNA (whose characteristics are reported in Tables 2–4).

Table 6.

Bio-ontologies that can be exploited for the characterization of data sources (Part I).

Type	Data source	GO	Mondo	HPO	VO	ChEBI	Uberon	CLO	PRO	SO	PW
miRNA	miRBase	x								x
	miRDB									x
	miRNet		x	x		x			x	x
	miRecords									x
	EpimiR									x
	HMDD		x	x						x
	miR2Disease		x	x						x
	TargetScan									x
	SomamiR		x	x						x
	TarBase									x
	miRTarBase									x
	SM2miR					x				x
	TransmiR								x	x
	PolymiRTS		x	x						x
	dbDEMC		x	x						x
	TAM	x	x	x			x		x	x
	PuTmiR								x	x
	miRPathDB	x								x	x
	miRCancer		x	x						x
	miRdSNP		x	x						x
	miRandola	x				x				x
mRNA vaccine	DrugBank		x		x	x				x
s(i/h)RNA	ICBP siRNA									x
s(i/h)RNA	DrugBank		x			x				x
RNA aptamer	Apta-Index					x			x	x
RNA aptamer	DrugBank		x			x			x	x
ASO	eSkip-Finder									x
ASO	DrugBank		x			x			x	x
gRNA	Addgene									x
lncRNA	LncBook	x	x	x			x	x	x	x
	LncRNADisease		x	x						x
	LncExpDB									x
	dbEssLnc	x				x				x
	lncATLAS	x								x
	NONCODE		x	x						x
	Lnc2Cancer		x	x						x
	LncRNAWiki	x	x			x			x	x	x
	LncBase	x					x	x		x
	TANRIC		x							x

Open in a new tab

Ontological alignment specification

In this phase, we identified the KG representation and the kind of storage system to adopt. RDF triples have turned out to be suitable because of their common, flexible, and uniform data model. These properties result in an ontologically-grounded KG for conducting different kinds of analysis and reasoning.

Since a standardized formal definition for the concept of a KG is still lacking, we considered the one adopted by Callahan and colleagues²³ in which a KG is a pair < T, A > , where T is the TBox and A the ABox. The TBox represents the taxonomy of a particular domain including classes, properties/relationships, and assertions that are assumed to generally hold within a domain (e.g., a miRNA is a small regulatory ncRNA located in an exosome as depicted in Fig. 8). The ABox describes attributes and roles of class instances (i.e., individuals) and assertions about their membership in classes within the TBox (e.g., hsa-miR-125b-5p is a miRNA whose overexpression has been associated with leukemia). Non-ontological entities (i.e., entities from a data source that are not compliant to a given set of ontologies such as RNA molecules) can be integrated with ontologies using either a TBox (i.e., class-based) or ABox (i.e., instance-based) knowledge model. For the class-based approach, each database entity is represented as subClassOf an existing ontology class, while for the instance-based approach it is represented as instanceOf an existing ontology class.

Fig. 8 — An example of the use of Description Logic (DL) for knowledge modeling. The TBox includes classes (i.e., miRNA, small regulatory ncRNA, and exosome), and the assertions between classes (i.e., “miRNA subClassOf small regulatory ncRNA” and “miRNA is located in exosome”). The ABox includes instances of classes (i.e., *hsa-miR-125b-5p*) represented in the TBox and assertions about those instances (i.e., “*hsa-miR-125b-5p* instanceOf miRNA” and “*hsa-miR-125b-5p* causes leukemia”).

For the construction of the KG we have employed the PheKnowLator ecosystem²³ because it offers both approaches for the representation of bio-entities and their relationships, and also because of its simplicity in the identification of the columns containing the molecules’ identifiers and for the specification of their relationships in terms of the RO ontology. PheKnowLator also provides tools to easily generate the ontology that better describes the content of the KG that, besides the terms and relationships of the meta-graph, also includes other ontological terms for supporting the reasoning.

KGs can be easily exported according to different kinds of models offered by PheKnowLator depending on the analyses to be conducted. Even if RNA-KG is made available in all the supported knowledge models, we think that the instance-based, inverse relation, semantically abstracted (OWL-NETS⁷⁸ without harmonization) configuration is the most suitable to be processed by different kinds of ML algorithms for node and link prediction. This solution ensures that RNA molecules (which lack semantic characterizations in bio-ontologies) and other non-ontological data can be specified as subClassOf specific ontological classes. Moreover, this approach enables the automatic specification of inverse relations among the involved bio-entities. Lastly, OWL-NETS reversibly abstracts ontological biomedical knowledge into a network representation containing only biologically meaningful concepts and relations. Figure 9 shows a small toy-example subgraph extracted from RNA-KG according to the proposed set-up. We can notice the presence of inverse relationships (located in and its inverse location of), and the relation RDF subClassOf connected to entities that do not have a corresponding term in a reference ontology (miRNA molecules are specified as subClassOf the SO term miRNA).

Fig. 9 — Example of a RNA-KG subgraph realized according to the instance-based, inverse relations, semantically abstracted (OWL-NETS without harmonization) parameters.

By studying the characteristics of the data sources, specific mapping rules have been devised through PheKnowLator to extract triples compliant with the adopted ontologies. Mapping rules contain the position of the source and object in the TSV file, the two human-readable labels for subject and object (e.g., mRNA and disease), the type of relationship that holds/exists between them according to RO (e.g., RO_0003302 corresponds to causes or contributes to condition relation), and further detailed options (e.g., thresholds for considering the tuple, row filtering options, transformation options according to the look-up table). These rules will be exploited for the extraction of the triples according to the reference ontology.

Since many ontologies are used in our context, we adopted the PheKnowLator tools to clean ontology files (i.e., remove and normalize errors, eliminate obsolete and/or deprecated entities, remove duplicate classes and class concepts) and merge cleaned ontology files into a single one that describes entirely the structure of RNA-KG and is compliant with our meta-graph.

RNA-KG generation and analysis

In this final phase, the PheKnowLator mapping rules have been issued on the pre-processed data for generating a KG compliant with the meta-graph identified in Phase 2 (ontological description of the KG). In order to evaluate the characteristics of the generated KG, we used the GRAPE library that we recently developed for fast and efficient graph processing and embedding⁷⁹. By importing RNA-KG into the GRAPE environment, we were able to retrieve relevant topological information and topological oddities that can be useful in identifying (eventual) data duplication. Moreover, GRAPE can be exploited to implement different types of graph embedding techniques that cannot be realized by means of other tools because of the size of the generated KG. Finally, a Blazegraph endpoint⁸⁰ was created to make RNA-KG freely available and accessible. Using SPARQL, it is possible to extract portions of the graph and use it for different kinds of analysis (see the examples reported in the Supplementary Listings S1–S3). Moreover, the entire RNA-KG can be downloaded from our lab website http://RNA-KG.anacleto.di.unimi.it.

Data Records

RNA-KG data resource is hosted on Zenodo⁸¹. We have deposited the KG and all relevant intermediate files in this repository. The current version of RNA-KG (version 0.9) has a single connected component containing 673,825 nodes and 12,692,212 edges. The number of nodes and edges has been substantially reduced by considering only the relationships with high reliability. The construction process of the graph is designed to be periodically updated, including data from other public RNA and related biomedical sources. Moreover, thresholds can be tuned for enlarging or reducing the KG size. Table 7 depicts the main macroscopic topological and structural properties of the current RNA-KG.

Table 7.

Bio-ontologies that can be exploited for the characterization of data sources (Part II).

Type	Data source	GO	Mondo	HPO	VO	ChEBI	Uberon	CLO	PRO	SO
Ribozyme	Ribocentre	x								x
Ribozyme	Rfam	x								x
Viral RNA	ViroidDB									x
Riboswitch	TBDB	x							x	x
Riboswitch	RSwitch				x					x
tRF & tsRNA	tRFdb							x		x
	tsRFun		x							x
	MINTbase									x
snoRNA	snoDB									x
tRNA	tRNAdb					x				x
tRNA	GtRNAdb	x								x
piRNA	piRBase		x							x
	iPiDA-GCN		x							x
	TarpiD		x	x						x
Inter RNA	RNAInter					x			x	x
	RNALocate	x								x
	RNADisease		x							x
	ncRDeathDB	x								x
	cncRNADB						x	x		x
	ViRBase								x	x
	Vesiclepedia	x				x			x	x
	DirectRMDB	x								x
	Modomics	x							x	x
	The GO resource	x								x

Open in a new tab

RNA-KG is made available in N-Triples format. The files made available are described in Supplementary Tables S1–S8. These tables provide a breakdown of the number of nodes by node type and the number of edges by edge type. RNA-KG is available via SPARQL endpoint that can be accessed at http://RNA-KG.anacleto.di.unimi.it.

The dataset also includes the N-Triples KG that merges RNA-KG and Human Disease benchmark KG. Moreover, we provide supporting input files for the generation of RNA-KG by using PheKnowLator. Specifically:

edge_source_list.txt. It contains the organization of the resources directory in which data files downloaded from the repositories are maintained. For each file, it reports the kinds of edges that can be extracted by PheKnowLator.
ontology_source_list.txt. It contains the bio-ontologies chosen to describe the meta-graph. Ontologies employed for building the current release of RNA-KG were updated to date April 22, 2024.
resource_info.txt. It contains the meta-graph. RO properties are used to describe the interactions stored in edge_source_list.txt. Moreover, this file is used to specify prefixes for subject and object nodes (e.g., http://purl.obolibrary.org/obo/ for ontological nodes), apply evidence criteria to filter a source, and map node identifiers through look-up tables.

Raw data are also reported on Zenodo and have been collected from the public data sources referenced in Tables 2–4. The data sources’ owners have been contacted to present the initiative and asked the use permission for their data. No one answered that their data cannot be used for academic purposes.

Technical Validation

To evaluate the quality of RNA-KG, several analyses were executed whose results are reported in the following paragraphs.

RNA-KG statistical analysis

Figure 10a shows the distribution of nodes contained in RNA-KG. Nodes can be classified into nodes representing bio-entities and those representing ontological terms. Bio-entities have been further subdivided into RNA nodes (gathering together sncRNA, mRNA, lncRNA, viral RNA, and unclassified RNA nodes), and non-RNA nodes (named other bio-entities) that contain, for instance, gene and variant (SNP)-typed nodes. Furthermore, Fig. 11a presents the distribution of nodes according to the main type of RNA molecules, detailing the different categories of sncRNA, mRNA, viral RNA, and lncRNA. miRNA, mRNA, and lncRNA molecules are among the most represented in RNA-KG because they are well-studied (many RNA sources have been categorized/typed as lncRNA and miRNA, and mRNA are in relationships with many other ncRNAs as already discussed in the characterization of the data sources). piRNAs are the most numerous sncRNA category because they are very short sequences of ~ 20 − 30 nucleotides that originate from large genomic regions called piRNA clusters, which can be transcribed into long single-strand precursors and further processed into mature piRNAs that often act on the same gene or mRNA transcript in specific diseases or aberrant phenotypes^82,83. Additionally, tRF molecules are abundant because they are “fragments” of tRNAs (one tRNA can generate more than one tRF or tsRNA). Riboswitches are also numerous because in RNA-KG we have many riboswitches belonging to human-pathogenic bacteria that can be targets for drugs. Each bacterial riboswitch comes with a different identifier.

Fig. 10 — Pie-chart of: (a) node distribution according to node types; (b) edge distribution according to edge types.

Fig. 11 — Node distribution according to node types.

The unclassified RNA category includes 780 RNA nodes for which a better semantic characterization cannot be assigned because, in the original sources, they are specified as “other RNA”, “miscellaneous RNA”, “unknown RNA”, “ncRNA”, or “RNA molecules to be experimentally confirmed”. Finally, the other category includes sncRNA molecules whose distribution is negligible in RNA-KG (71 sncRNA molecules among ribozymes, enhancer RNAs, vault RNAs, Y RNAs, retained introns, mitochondrial RNAs, small conditional RNAs, and scaRNAs). The total number of mRNA, and in general, RNA, is consistent with experimental studies regarding the number of genes in humans (~ 22-25K protein coding genes and more than 100K total genes⁸⁴).

Ontological terms shown in Fig. 10a are introduced in the generation of the KG for supporting reasoning activities and can be further classified according to the specific bio-ontology from which they are extracted (e.g., ChEBI for chemicals and HPO for phenotypes). Among them, chemical and protein nodes cover around 47.8% of the total amount of nodes in RNA-KG. This is because ChEBI and PRO both contain many terms representing chemical entities and proteins for Homo sapiens. Figure 11b further details the distribution of ontological terms. Since the considered ontologies contain also terms that do not follow the usual pattern for their identification (e.g., terms representing glycans belong to PRO but their identifier starts with the prefix GNO which differs from the usual one adopted for identifying proteins), we have introduced the category species, with the terms representing the species (all species start with the prefix NCBITaxon), and the category other terms generally containing all the others.

Figure 10b shows the distribution of edges in RNA-KG. Edges have been subdivided into three categories: i) edges representing RO properties that have been further classified in those that describe the interactions among RNA molecules and RO properties introduced by the integrations of the bio-ontologies; ii) edges representing the subClassOf relationships; and iii) edges representing other kinds of relationships not included in RO (e.g., has gene template belonging to PRO). Figure 12a details the distribution of the types of edges involving RNA molecules. As reflected by the organization of the meta-graph, interacts with is the most represented edge type because it encompasses edges for which the original source does not provide specific semantics, whereas the presence of many regulates activity of edges is justified by the vast majority of miRNA molecules within RNA-KG that indeed regulate the activity of, for example, genes and pseudogene and mRNA molecules. Moreover, Fig. 12b shows the distribution of the remaining edges included in RNA-KG. We can notice the prevalence of many subClassOf due to the integration of the bio-ontologies, and because each RNA molecule is subClassOf an appropriate class within SO (e.g., SO_0000276 for miRNA molecules).

t-SNE representation

Figure 13 shows the t-SNE representation of an embedding of the nodes/edges in RNA-KG by using the GRAPE implementation of Node2Vec with continuous bag of words (CBOW), a random walk-based second-order embedding algorithm⁸⁵, with walk length equal to 5. Figure 13a shows how the embedding of the node type is able to effectively identify the similarities among the nodes of the same type, thus capturing their function in the network. On the other hand, Fig. 13b depicts the edge embedding for RNA-KG. Also in this case, the embedding is able to capture the similarity between edges with the only exception of the interacts with and regulates activity of relations which seem to overlap several other edge types. This fact is not so surprising considering that the interacts with relation is also used to denote a generic relation between nodes. Moreover, the various subcategories of regulates activity of existing between miRNA and mRNA (e.g., directly regulates activity of and its subtypes directly positively regulates activity of and directly negatively regulates activity of) are not distinguished as the algorithm is homogeneous. In the future, we plan to adopt algorithms that take into account the heterogeneity of RNA-KG.

Fig. 13 — Bidimensional view of RNA-KG embeddings.

Topological analysis

The topological analysis led to the identification of top-5 nodes with the highest degree centrality: microvesicle (GO_1990742) with degree 26.94K (whose type is GO); nucleus (GO_0005634) with degree 20.45K; hcmv-miR-US25-1-5p (human cytomegalovirus hcmv-miR-US25-1-5p mature miRNA) with degree 18.18K and node type miRNA; and, (human) hFOXA1 (PR_P55317) with degree 17.37K and node type protein. We remark that the nodes nucleus and microvescicle represent cellular components used for aggregating different bio-entities existing in the context of a cell and this is the main reason for the high node degree within RNA-KG.

Moreover, the RNA relationships with these kinds of cellular components are enhanced by the semantics of location of edge type together with its respective RO inverse located in. On the other hand, hcmv-miR-US25-1-5p is a human cytomegalovirus (HCMV)-encoded -5p miRNA transcript, whose diagnostic and prognostic value has been proved valid for several human diseases and their clinical implications⁸⁶. Relationships involving this miRNA sequence are mainly borrowed from ViRBase source. Finally, hFOXA1 is a human forkhead TF known to be the main target of insulin signaling, to regulate metabolic homeostasis in response to oxidative stress, and to interact with chromatin. The central role assumed by hFOXA1 in RNA-KG is quite interesting since this TF is implicated in various human malignancies characterized by altered expression of ncRNAs⁸⁷.

Degree distribution

As shown in Table 7, the average degree of the undirected version of RNA-KG is relatively small (25.23). Despite the sparsity of the graph, the diameter of the KG is also relatively small (33). On the other hand, as shown in Fig. 14a, the degree distribution suggests a heavy-tailed distribution. All of these properties are usually associated with scale-free networks, or more generally to heavy-tailed degree distributions, which is a common structure in real-world complex systems. This motivates the computation of the empirical complementary cumulative distribution function (CCDF) for the degree reported in Fig. 14b. This curve approximates the probability distribution that a randomly selected node has a degree greater than or equal to x. A linear trend in this plot is usually associated with a powerlaw distribution, where the CCDF is given by a function proportional to x^1−α. We estimate the power of the distribution using⁸⁸, obtaining a value of α = 1.708. The theoretical powerlaw obtained for the degrees is shown in Fig. 14b together with other common heavy-tailed distributions. Among these alternatives, we found that the truncated powerlaw distribution fits better according to the log-likelihood ratio criterion⁸⁹ with p-values smaller than or equal to 10⁻²⁰. This distribution behaves as a power law’s scaling on a range but is truncated by an exponentially bounded tail according to the distribution x^−αe^−λx. Further exploration should be made to confirm the powerlaw properties of the graph since they are usually associated with a hierarchical modular structure of the network, entailing algorithmic advantages for its analysis. For instance, the closeness centrality distribution in Fig. 14c exhibits a bimodal behavior, which could be explained by the existence of a well-connected core usually present in heavy-tail degree distribution networks.

Fig. 14 — (a) Node degree distribution (semi-log). (b) Complementary cumulative distribution (CCDF) for the node degree. (c) Approximated closeness centrality distribution.

Treewidth

Treewidth is a graph parameter measuring the structural similarity between a graph and a tree. It is based on the construction of a tree decomposition which captures how subset of nodes can be grouped to form a tree structure that maintains the global structure of the former graph. For instance, graphs having treewidth equal to one are trees, cycles have treewidth two, and clique graphs have a treewidth equal to the number of nodes minus one. The computation of the treewidth is in NP, but several approximation strategies can be used⁹⁰. The upper bound (10, 611 in Table 7), computed on the undirected version of the KG, can be considered relatively small because it represents about 1.57% of the KG size. This result is consistent with a tree-like hierarchical structure of RNA-KG that has a small and well-connected core.

Isomorphic node groups

RNA-KG contains 761 isomorphic node groups, that is nodes with exactly the same neighbours, node and edge types. Nodes in such groups are topologically indistinguishable, that is, swapping their identifiers would not change the graph topology. These groups involve a total of 9.15K nodes (1.36%) and 272.30K edges (2.15%), with the largest one involving 372 nodes and 10.86K edges. This particular group has degree 10 and is composed of tRFs, specifically i-tRFs (i.e. tRF molecules originating from the internal region of mature tRNA⁹¹), and contains sequences that stem from tRNA molecules whose predicted tRNA isotypes/anticodons are all Histidine-GTG. Other detected isomorphic group components involve tRNA molecules that are all interacting with amino acids at a molecular level or tRF and tsRNA sequences that originate from tRNAs with molecular interactions tied to specific amino acids. For example, some of these groups include Aspartic acid-GTC tRNA sequences or tsRNA-Leucines. The remaining isomorphic node groups involve mRNA and piRNA molecules. All these isomorphic node groups deserve further investigation to check whether the involved molecules correspond to the same tRNA, mRNA, piRNA, tsRNA, or tRF and thus their pruning improves the information quality of RNA-KG. Indeed, many of these groups derive from different RNA sources and contain molecules presenting proprietary identifiers that might collapse.

Usage Notes

The methodology we employed to construct RNA-KG enabled us to generate a high-quality KG that includes reliable interactions, validated through experimental methods and/or strongly endorsed by data providers, and whose meaning was meticulously verified to ensure a consistent representation of domain knowledge.

RNA-KG can generate heterogeneous biomedical graphs in different formats that can be processed by graph-based computational tools to infer biomedical knowledge, provide insights into biomolecular mechanisms and biological processes underlying diseases, support the discovery of new drugs, especially those based on RNA, and evaluate biomedical hypotheses in silico. Specific views of RNA-KG can also be generated or extracted by querying the entire KG according to the type of prediction task to be conducted. Predefined views of interest are already provided on RNA-KG’s website and queries, like the one reported in the Supplementary material, can be issued on RNA-KG for extracting meaningful hidden patterns from the data.

RNA-KG is specifically designed to deal with computational tasks involving RNAs by e.g., exploiting the information about ncRNA interactions for gene and protein expression regulation, collected from tens of publicly available databases. By leveraging the biomedical concepts represented in the biomedical ontologies embedded in the KG, RNA-KG can be also analyzed to predict associations and causal relationships of the “RNA world” with diseases and abnormal phenotypes. We also observe that the rich information embedded in the RNA-KG can be leveraged for classical biomedical prediction tasks, including e.g., gene-disease prioritization, drug-target prediction, drug repurposing and synthetic lethality interaction detection^92,93.

Most of these biomedical tasks can be modeled as link or node-label prediction problems in heterogeneous graphs. Even if, in principle we could apply methods developed for homogeneous graphs⁹⁴, to leverage the rich information scattered across the different types of modes and edges of the RNA-KG, we suggest applying methods specifically designed for heterogeneous graphs⁹⁵. To this end, several AI graph-based methods have been recently proposed to deal with heterogeneous graphs, also in the context of biomedical KGs⁹⁶. In particular, we foresee that Graph Representation Learning methods, by leveraging the topology of the complex bio-medical heterogeneous graphs to embed them into compact vectorial spaces, could be the most promising choice to properly analyze the complex heterogeneous structure of RNA-KG⁹⁷.

Supplementary information

41597_2024_3673_MOESM1_ESM.pdf^{(167.4KB, pdf)}

An ontology-based knowledge graph for representing interactions involving RNA molecules - Supplementary material

Acknowledgements

This work was primarily funded by National Recovery and Resilience Plan-NextGenerationEU award from the National Center for Gene Therapy and Drugs based on RNA Technology (G43C22001320007) to AC, GV, MM, MSG, SB. This work was also supported by funding from: MUSA - Multilayered Urban Sustainability Action - Project, funded by the PNRR-NextGeneration EU program ([G43C22001370007], Code ECS00000037) to MM and GV; the National Library of Medicine (T15LM009451 and T15LM007079) to TJC; the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract (DE-AC02-05CH11231) to JR; and, the National Human Genome Research Institute (NHGRI) to PNR (5U24HG011449). The authors wish to thank the anonymous reviewers and the editorial board member, prof. Lynn Schriml, for their useful suggestions for improving the quality of our work.

Author contributions

M.M. and G.V. designed the study. Em.C. retrieved, processed, and harmonized datasets. T.J.C., M.M. defined the methodology for the construction of the KG. Em.C. and S.B. worked on the specification of the mapping alignment. G.V., Em.C., and J.G. worked on the description of RNA molecules. Em.C. generated the KG that A.C. and M.S.G. analyzed. S.B. and P.P. set up the SPARQL endpoint and developed the SPARQL queries on the generated KG. G.V., E.C., J.G., J.R., P.N.R. identified the possible applications of the KG in conducting knowledge discovery in life science. M.M., Em.C. and G.V. wrote the initial draft of the paper. All authors reviewed the final version of the manuscript and approved it.

Code availability

The RNA-KG’s project website is at http://RNA-KG.anacleto.di.unimi.it. The code to reproduce results, together with documentation and tutorials, is available in RNA-KG’s GitHub repository at https://github.com/AnacletoLAB/RNA-KG. In addition, the repository contains information and Python scripts to build new versions of RNA-KG as the underlying primary resources get updated and new data become available.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-024-03673-7.

References

1.Bartel, D. P. & Chen, C.-Z. Micromanagers of gene expression: the potentially widespread influence of metazoan micrornas. Nature Reviews Genetics5, 396–400, 10.1038/nrg1328 (2004). 10.1038/nrg1328 [DOI] [PubMed] [Google Scholar]
2.Guttman, M. & Rinn, J. L. Modular regulatory principles of large non-coding rnas. Nature482, 339–346, 10.1038/nature10887 (2012). 10.1038/nature10887 [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Cech, T. R. & Steitz, J. A. The noncoding rna revolution—trashing old rules to forge new ones. Cell157, 77–94, 10.1016/j.cell.2014.03.008 (2014). 10.1016/j.cell.2014.03.008 [DOI] [PubMed] [Google Scholar]
4.Iyer, M. K. et al. The landscape of long noncoding rnas in the human transcriptome. Nature genetics47, 199–208, 10.1038/ng.3192 (2015). 10.1038/ng.3192 [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Lorenzi, L. et al. The rna atlas expands the catalog of human non-coding rnas. Nature biotechnology39, 1453–1465, 10.1038/s41587-021-00936-1 (2021). 10.1038/s41587-021-00936-1 [DOI] [PubMed] [Google Scholar]
6.Keller, A. et al. mirnatissueatlas2: an update to the human mirna tissue atlas. Nucleic acids research50, D211–D221, 10.1093/nar/gkab808 (2022). 10.1093/nar/gkab808 [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Vo, J. N. et al. The landscape of circular rna in cancer. Cell176, 869–881, 10.1016/j.cell.2018.12.021 (2019). 10.1016/j.cell.2018.12.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Damase, T. R. et al. The limitless future of rna therapeutics. Frontiers in Bioengineering and Biotechnology9, 10.3389/fbioe.2021.628137 (2021). [DOI] [PMC free article] [PubMed]
9.Barbier, A. J., Jiang, A. Y., Zhang, P., Wooster, R. & Anderson, D. G. The clinical progress of mrna vaccines and immunotherapies. Nature Biotechnology40, 840–854, 10.1038/s41587-022-01294-2 (2022). 10.1038/s41587-022-01294-2 [DOI] [PubMed] [Google Scholar]
10.Carvalho, T. Personalized anti-cancer vaccine combining mrna and immunotherapy tested in melanoma trial. Nature Medicine29, 2379–2380, 10.1038/d41591-023-00072-0 (2023). 10.1038/d41591-023-00072-0 [DOI] [PubMed] [Google Scholar]
11.Winkle, M., El-Daly, S. M., Fabbri, M. & Calin, G. A. Noncoding rna therapeutics — challenges and potential solutions. Nature Reviews Drug Discovery20, 629–651, 10.1038/s41573-021-00219-z (2021). 10.1038/s41573-021-00219-z [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Paunovska, K., Loughrey, D. & Dahlman, J. E. Drug delivery systems for rna therapeutics. Nature Reviews Genetics23, 265–280, 10.1038/s41576-021-00439-4 (2022). 10.1038/s41576-021-00439-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Hombach, S. & Kretz, M.Non-coding RNAs: Classification, Biology and Functioning, 3-17 (Springer International Publishing, 2016). [DOI] [PubMed]
14.Hogan, A. et al. Knowledge graphs. ACM Computing Surveys54, 1–37, 10.1145/3447772 (2021). 10.1145/3447772 [DOI] [Google Scholar]
15.Neo4j. Neo4j - the world’s leading graph database. Available at http://neo4j.org/ (2012).
16.Beckett, D. & McBride, B. RDF/XML Syntax Specification (Revised) - W3C recommendation. Available at https://www.w3.org/TR/REC-rdf-syntax/ (2004).
17.Alocci, D. et al. Property graph vs rdf triple store: A comparison on glycan substructure search. PLOS ONE10, e0144578, 10.1371/journal.pone.0144578 (2015). 10.1371/journal.pone.0144578 [DOI] [PMC free article] [PubMed] [Google Scholar]
18.OWL Working Group. Web ontology language (owl) - w3c recommendation. Available at https://www.w3.org/OWL/ (2012).
19.Baader, F., Horrocks, I., Lutz, C. & Sattler, U.An Introduction to Description Logic (Cambridge University Press, 2017).
20.Prud’hommeaux, E. & Seaborne, A. SPARQL Query Language for RDF - W3C recommendation. Available at https://www.w3.org/TR/rdf-sparql-query/ (2018).
21.Chen, J. et al. Knowledge graphs for the life sciences: Recent developments, challenges and opportunities. Transactions on Graph Data Knowl.1, 5:1–5:33, 10.4230/TGDK.1.1.5 (2023).
22.Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Scientific Data10, 10.1038/s41597-023-01960-3 (2023). [DOI] [PMC free article] [PubMed]
23.Callahan, T. J. et al. An open source knowledge graph ecosystem for the life sciences. Scientific Data11, 10.1038/s41597-024-03171-w (2024). [DOI] [PMC free article] [PubMed]
24.Evangelista, J. E. et al. Toxicology knowledge graph for structural birth defects. Communications Medicine3, 10.1038/s43856-023-00329-2 (2023). [DOI] [PMC free article] [PubMed]
25.Shefchek, K. A. et al. The monarch initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research48, D704–D715, 10.1093/nar/gkz997 (2019). 10.1093/nar/gkz997 [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Boudin, M., Diallo, G., Drancé, M. & Mougin, F. The oregano knowledge graph for computational drug repurposing. Scientific Data10, 871, 10.1038/s41597-023-02757-0 (2023). 10.1038/s41597-023-02757-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Livingston, K. M., Bada, M., Baumgartner, W. A. & Hunter, L. E. Kabob: ontology-based semantic integration of biomedical databases. BMC Bioinformatics16, 10.1186/s12859-015-0559-3 (2015). [DOI] [PMC free article] [PubMed]
28.Mungall, C. et al. oborel/obo-relations: 2023-08-18 release. Zenodo10.5281/zenodo.8263469 (2023). 10.5281/zenodo.8263469 [DOI]
29.Cavalleri, E. et al. A meta-graph for the construction of an rna-centered knowledge graph. In Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L. J. & Ortuño, F. (eds.) Bioinformatics and Biomedical Engineering, 165–180, 10.1007/978-3-031-34953-9_13 (Springer Nature Switzerland, Cham, 2023).
30.Halevy, A. Information integration. In Encyclopedia of Database Systems, 1490-1496, 10.1007/978-0-387-39940-9_1069 (Springer US, 2009).
31.Mesiti, M. et al. Xml-based approaches for the integration of heterogeneous bio-molecular data. BMC Bioinformatics10, 10.1186/1471-2105-10-s12-s7 (2009). [DOI] [PMC free article] [PubMed]
32.Bonfitto, S., Casiraghi, E. & Mesiti, M. Table understanding approaches for extracting knowledge from heterogeneous tables. WIREs Data Mining and Knowledge Discovery11, 10.1002/widm.1407 (2021).
33.Poggi, A. et al. Linking data to ontologies. In Spaccapietra, S. (ed.) Journal on Data Semantics X, 133–173 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008).
34.Das, S., Sundara, S. & Cyganiak, R. R2rml: Rdb to rdf mapping language - w3c recommendation. Available at https://www.w3.org/TR/r2rml/ (2012).
35.Dimou, A. et al. RML: a generic language for integrated RDF mappings of heterogeneous data. In Bizer, C., Heath, T., Auer, S. & Berners-Lee, T. (eds.) Proceedings of the 7th Workshop on Linked Data on the Web, vol. 1184 of CEUR Workshop Proceedings (2014).
36.Lefrançois, M., Zimmermann, A. & Bakerally, N. A sparql extension for generating rdf from heterogeneous formats. In Blomqvist, E. et al. (eds.) The Semantic Web, 35–50 10.1007/978-3-319-58068-5_3 (Springer International Publishing, Cham, 2017).
37.Heyvaert, P., De Meester, B., Dimou, A. & Verborgh, R.Declarative Rules for Linked Data Generation at Your Fingertips!, 213-217 (Springer International Publishing, 2018).
38.García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J. E. & Cueva Lovelle, J. M. Shexml: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science6, e318, 10.7717/peerj-cs.318 (2020). 10.7717/peerj-cs.318 [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Zhang, S. et al. A graph-based approach for integrating biological heterogeneous data based on connecting ontology. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)10.1109/bibm52615.2021.9669700 (IEEE, 2021).
40.Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genetics25, 25–29, 10.1038/75556 (2000). 10.1038/75556 [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Pan, Q. et al. Trait ontology analysis based on association mapping studies bridges the gap between crop genomics and phenomics. BMC Genomics20, 10.1186/s12864-019-5812-0 (2019). [DOI] [PMC free article] [PubMed]
42.Schriml, L. M. et al. The human disease ontology 2022 update. Nucleic Acids Research50, D1255–D1261, 10.1093/nar/gkab1063 (2021). 10.1093/nar/gkab1063 [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Cooper, L. & Jaiswal, P.The Plant Ontology: A Tool for Plant Genomics, 89-114 (Springer New York, 2016). [DOI] [PubMed]
44.Robinson, P. N. et al. The human phenotype ontology: A tool for annotating and analyzing human hereditary disease. The American Journal of Human Genetics83, 610–615, 10.1016/j.ajhg.2008.09.017 (2008). 10.1016/j.ajhg.2008.09.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
45.CDC - Centers for Disease Control and Prevention. Learn about specific birth defects. Available at https://www.cdc.gov/ncbddd/birthdefects/types.html (2023).
46.Lachmann, A. et al. Geneshot: search engine for ranking genes from arbitrary text queries. Nucleic Acids Research47, W571–W577, 10.1093/nar/gkz393 (2019). 10.1093/nar/gkz393 [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Avram, S. et al. Drugcentral 2021 supports drug discovery and repositioning. Nucleic Acids Research49, D1160–D1169, 10.1093/nar/gkaa997 (2020). 10.1093/nar/gkaa997 [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Evangelista, J. E. et al. SigCom LINCS: data and metadata search engine for a million gene expression signatures. Nucleic Acids Research50, W697–W709, 10.1093/nar/gkac328 (2022). 10.1093/nar/gkac328 [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Sima, A. C. et al. Enabling semantic queries across federated bioinformatics databases. Database2019, baz106, 10.1093/database/baz106 (2019). 10.1093/database/baz106 [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Sparmann, AnkeandVogel,J. örg Rna-based medicine: from molecular mechanisms to therapy. The EMBO Journal42, e114760, 10.15252/embj.2023114760 (2023). 10.15252/embj.2023114760 [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Vorländer, M. K., Pacheco-Fiallos, B. & Plaschka, C. Structural basis of mrna maturation: Time to put it together. Current Opinion in Structural Biology75, 102431, 10.1016/j.sbi.2022.102431 (2022). 10.1016/j.sbi.2022.102431 [DOI] [PubMed] [Google Scholar]
52.Mattick, J. S. et al. Long non-coding rnas: definitions, functions, challenges and recommendations. Nature Reviews Molecular Cell Biology24, 430–447, 10.1038/s41580-022-00566-8 (2023). 10.1038/s41580-022-00566-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Liu, L. et al. LncRNAWiki 2.0: a knowledgebase of human long non-coding RNAs with enhanced curation model and database system. Nucleic Acids Research50, D190–D195, 10.1093/nar/gkab998 (2022). 10.1093/nar/gkab998 [DOI] [PMC free article] [PubMed] [Google Scholar]
54.Statello, L., Guo, C.-J., Chen, L.-L. & Huarte, M. Gene regulation by long non-coding rnas and its biological functions. Nat Rev Mol Cell Biol22, 96–118, 10.1038/s41580-020-00315-9 (2021). 10.1038/s41580-020-00315-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
55.Vance, K. & CP, P. Transcriptional regulatory functions of nuclear long noncoding rnas. Trends Genet.30, 348–55, 10.1016/j.tig.2014.06.001 (2014). 10.1016/j.tig.2014.06.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
56.Nisar, S. et al. Insights into the role of circrnas: Biogenesis, characterization, functional, and clinical impact in human malignancies. Frontiers in Cell and Developmental Biology9, 10.3389/fcell.2021.617281 (2021). [DOI] [PMC free article] [PubMed]
57.Loda, A. & Heard, E. Xist rna in action: Past, present, and future. PLoS genetics15, e1008333, 10.1371/journal.pgen.1008333 (2019). 10.1371/journal.pgen.1008333 [DOI] [PMC free article] [PubMed] [Google Scholar]
58.Kanduri, C. Kcnq1ot1: a chromatin regulatory rna. Seminars in Cell & Developmental Biology22, 343–350, 10.1016/j.semcdb.2011.02.020 (2011). 10.1016/j.semcdb.2011.02.020 [DOI] [PubMed] [Google Scholar]
59.Yang, Z. et al. Insights into the role of long non-coding rnas in dna methylation mediated transcriptional regulation. Frontiers in molecular biosciences9, 1067406, 10.3389/fmolb.2022.1067406 (2022). 10.3389/fmolb.2022.1067406 [DOI] [PMC free article] [PubMed] [Google Scholar]
60.Hannon, G. J. Rna interference. Nature418, 244–251, 10.1038/418244a (2002). 10.1038/418244a [DOI] [PubMed] [Google Scholar]
61.Stephen, B. J. et al. Xeno-mirna in maternal-infant immune crosstalk: An aid to disease alleviation. Frontiers in Immunology11, 10.3389/fimmu.2020.00404 (2020). [DOI] [PMC free article] [PubMed]
62.Lee, J. & JT, M. Antisense-mediated transcript knockdown triggers premature transcription termination. Mol Cell.77, 1044–1054, 10.1016/j.molcel.2019.12.011 (2020). 10.1016/j.molcel.2019.12.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
63.Yu, A.-M., Choi, Y. H. & Tu, M.-J. Rna drugs and rna targets for small molecules: Principles, progress, and challenges. Pharmacological Reviews72, 862–898, 10.1124/pr.120.019554 (2020). 10.1124/pr.120.019554 [DOI] [PMC free article] [PubMed] [Google Scholar]
64.Dunn, M. R., Jimenez, R. M. & Chaput, J. C. Analysis of aptamer discovery and technology. Nature Reviews Chemistry1, 0076, 10.1038/s41570-017-0076 (2017). 10.1038/s41570-017-0076 [DOI] [Google Scholar]
65.Byun, J. Recent progress and opportunities for nucleic acid aptamers. Life11, 193, 10.3390/life11030193 (2021). 10.3390/life11030193 [DOI] [PMC free article] [PubMed] [Google Scholar]
66.Ştefan, G., Hosu, O., De Wael, K., Lobo-Castañón, M. J. & Cristea, C. Aptamers in biomedicine: Selection strategies and recent advances. Electrochimica Acta376, 137994, 10.1016/j.electacta.2021.137994 (2021). 10.1016/j.electacta.2021.137994 [DOI] [Google Scholar]
67.Machtel, P., Bakowska-Żywicka, K. & Żywicki, M. Emerging applications of riboswitches - from antibacterial targets to molecular tools. Journal of Applied Genetics57, 531–541, 10.1007/s13353-016-0341-x (2016). 10.1007/s13353-016-0341-x [DOI] [PMC free article] [PubMed] [Google Scholar]
68.Linlin, S., Brianna Marie, L. & Yuan-Xiang, T. The crispr/cas9 system for gene editing and its potential application in pain research. Translational Perioperative and Pain Medicine3, 10.31480/2330-4871/040 (2016). [PMC free article] [PubMed]
69.Wang, X. et al. Knowledge graph quality control: A survey. Fundamental Research10.1016/j.fmre.2021.08.018 (2021).
70.The pandas development team. pandas-dev/pandas: Pandas. Zenodo10.5281/zenodo.3509134 (2020). 10.5281/zenodo.3509134 [DOI]
71.Sweeney, B. A. et al. Rnacentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Research49, D212–D220, 10.1093/nar/gkaa921 (2020). 10.1093/nar/gkaa921 [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Cantelli, G. et al. The european bioinformatics institute (embl-ebi) in 2021. Nucleic Acids Research50, D11–D19, 10.1093/nar/gkab1127 (2021). 10.1093/nar/gkab1127 [DOI] [PMC free article] [PubMed] [Google Scholar]
73.Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology215, 403–410, 10.1016/s0022-2836(05)80360-2 (1990). 10.1016/s0022-2836(05)80360-2 [DOI] [PubMed] [Google Scholar]
74.Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences85, 2444–2448, 10.1073/pnas.85.8.2444 (1988). 10.1073/pnas.85.8.2444 [DOI] [PMC free article] [PubMed] [Google Scholar]
75.Guo, L., Sun, B., Wu, Q., Yang, S. & Chen, F. mirna-mirna interaction implicates for potential mutual regulatory pattern. Gene511, 187–194, 10.1016/j.gene.2012.09.066 (2012). 10.1016/j.gene.2012.09.066 [DOI] [PubMed] [Google Scholar]
76.Lai, E. C., Wiel, C. & Rubin, G. M. Complementary mirna pairs suggest a regulatory role for mirna:mirna duplexes. RNA10, 171–175, 10.1261/rna.5191904 (2004). 10.1261/rna.5191904 [DOI] [PMC free article] [PubMed] [Google Scholar]
77.Spear, A. D., Ceusters, W. & Smith, B. Functions in basic formal ontology. Applied Ontology11, 103–128, 10.3233/ao-160164 (2016). 10.3233/ao-160164 [DOI] [Google Scholar]
78.Callahan, T. J. et al. Owl-nets: Transforming owl representations for improved network inference. In Biocomputing 2018, 10.1142/9789813235533_0013 (WORLD SCIENTIFIC, 2017). [PMC free article] [PubMed]
79.Cappelletti, L. et al. Grape for fast and scalable graph processing and random-walk-based embedding. Nature Computational Science3, 552–568, 10.1038/s43588-023-00465-8 (2023). 10.1038/s43588-023-00465-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
80.Blazegraph™. Blazegraph™ DB. Available at https://blazegraph.com/.
81.Cavalleri, E. et al. Rna-kg: 2024-05-21 release. Zenodo10.5281/zenodo.11236947 (2024). 10.5281/zenodo.11236947 [DOI]
82.Wang, J. et al. pirbase: integrating pirna annotation in all aspects. Nucleic Acids Research50, D265–D272, 10.1093/nar/gkab1012 (2021). 10.1093/nar/gkab1012 [DOI] [PMC free article] [PubMed] [Google Scholar]
83.Rosenkranz, D., Zischler, H. & Gebert, D. pirnaclusterdb 2.0: update and expansion of the pirna cluster database. Nucleic Acids Research50, D259–D264, 10.1093/nar/gkab622 (2021). 10.1093/nar/gkab622 [DOI] [PMC free article] [PubMed] [Google Scholar]
84.Salzberg, S. L. Open questions: How many genes do we have?BMC Biology16, 10.1186/s12915-018-0564-x (2018). [DOI] [PMC free article] [PubMed]
85.Grover, A. & Leskovec, J. Node2vec: Scalable feature learning for networks. In Proc. of the 22nd ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, KDD ’16, 855-864, 10.1145/2939672.2939754 (ACM, New York, NY, USA, 2016). [DOI] [PMC free article] [PubMed]
86.Fernández-Moreno, R., Torre-Cisneros, J. & Cantisán, S. Human cytomegalovirus (hcmv)-encoded micrornas: potential biomarkers and clinical applications. RNA Biology18, 2194–2202, 10.1080/15476286.2021.1930757 (2021). 10.1080/15476286.2021.1930757 [DOI] [PMC free article] [PubMed] [Google Scholar]
87.Peng, Q. et al. Foxa1 suppresses the growth, migration, and invasion of nasopharyngeal carcinoma cells through repressing mir-100-5p and mir-125b-5p. Journal of Cancer11, 2485–2495, 10.7150/jca.40709 (2020). 10.7150/jca.40709 [DOI] [PMC free article] [PubMed] [Google Scholar]
88.Alstott, J., Bullmore, E. & Plenz, D. powerlaw: A python package for analysis of heavy-tailed distributions. PLoS ONE9, e85777, 10.1371/journal.pone.0085777 (2014). 10.1371/journal.pone.0085777 [DOI] [PMC free article] [PubMed] [Google Scholar]
89.Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in empirical data. SIAM Review51, 661–703, 10.1137/070710111 (2009). 10.1137/070710111 [DOI] [Google Scholar]
90.Bodlaender, H. L. & Koster, A. M. Treewidth computations i. upper bounds. Information and Computation208, 259–275, 10.1016/j.ic.2009.03.008 (2010). 10.1016/j.ic.2009.03.008 [DOI] [Google Scholar]
91.Zhang, Y., Qian, H., He, J. & Gao, W. Mechanisms of trna-derived fragments and trna halves in cancer treatment resistance. Biomarker Research810.1186/s40364-020-00233-0 (2020). [DOI] [PMC free article] [PubMed]
92.Valentini, G., Paccanaro, A., Caniza, H., Romero, A. E. & Re, M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artificial Intelligence in Medicine61, 63–78, 10.1016/j.artmed.2014.03.003 (2014). 10.1016/j.artmed.2014.03.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
93.Cappelletti, L. et al. Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning. Bioinformatics Advances4, vbae036, 10.1093/bioadv/vbae036 (2024). 10.1093/bioadv/vbae036 [DOI] [PMC free article] [PubMed] [Google Scholar]
94.Hamilton, W. L., Ying, R. & Leskovec, J. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull.40, 52–74 (2017).
95.Yang, C., Xiao, Y., Zhang, Y., Sun, Y. & Han, J. Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Transactions on Knowledge and Data Engineering34, 4854–4873, 10.1109/tkde.2020.3045924 (2022). 10.1109/tkde.2020.3045924 [DOI] [PMC free article] [PubMed] [Google Scholar]
96.Johnson, R., Li, M. M., Noori, A., Queen, O. & Zitnik, M. Graph artificial intelligence in medicine. Annu. Rev. Biomed. Data Sci.10.1146/annurev-biodatasci-110723-024625 (2024). [DOI] [PMC free article] [PubMed]
97.Li, M. M., Huang, K. & Zitnik, M. Graph representation learning in biomedicine and healthcare. Nature Biomedical Engineering6, 1353–1369, 10.1038/s41551-022-00942-x (2022). 10.1038/s41551-022-00942-x [DOI] [PMC free article] [PubMed] [Google Scholar]
98.Vasilevsky, N. A. et al. Mondo: Unifying diseases for the world, by the world. Preprint at 10.1101/2022.04.13.22273750 (2022).
99.He, Y. et al. Vo: Vaccine ontology. Nature Precedings10.1038/npre.2009.3553.1 (2009).
100.Degtyarenko, K. et al. Chebi: a database and ontology for chemical entities of biological interest. Nucleic Acids Research36, D344–D350, 10.1093/nar/gkm791 (2007). 10.1093/nar/gkm791 [DOI] [PMC free article] [PubMed] [Google Scholar]
101.Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E. & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biology13, R5, 10.1186/gb-2012-13-1-r5 (2012). 10.1186/gb-2012-13-1-r5 [DOI] [PMC free article] [PubMed] [Google Scholar]
102.Sarntivijai, S. et al. Clo: The cell line ontology. Journal of Biomedical Semantics5, 37, 10.1186/2041-1480-5-37 (2014). 10.1186/2041-1480-5-37 [DOI] [PMC free article] [PubMed] [Google Scholar]
103.Natale, D. A. et al. The protein ontology: a structured representation of protein forms and complexes. Nucleic Acids Research39, D539–D545, 10.1093/nar/gkq907 (2010). 10.1093/nar/gkq907 [DOI] [PMC free article] [PubMed] [Google Scholar]
104.Eilbeck, K. et al. The sequence ontology: a tool for the unification of genome annotations. Genome Biology6, 10.1186/gb-2005-6-5-r44 (2005). [DOI] [PMC free article] [PubMed]
105.Petri, V. et al. The pathway ontology - updates and applications. Journal of Biomedical Semantics5, 7, 10.1186/2041-1480-5-7 (2014). 10.1186/2041-1480-5-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
106.Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. mirbase: from microrna sequences to function. Nucleic Acids Research47, D155–D162, 10.1093/nar/gky1141 (2018). 10.1093/nar/gky1141 [DOI] [PMC free article] [PubMed] [Google Scholar]
107.Chen, Y. & Wang, X. mirdb: an online database for prediction of functional microrna targets. Nucleic Acids Research48, D127–D131, 10.1093/nar/gkz757 (2019). 10.1093/nar/gkz757 [DOI] [PMC free article] [PubMed] [Google Scholar]
108.Fan, Y., Habib, M. & Xia, J. Xeno-mirnet: a comprehensive database and analytics platform to explore xeno-mirnas and their potential targets. PeerJ6, e5650, 10.7717/peerj.5650 (2018). 10.7717/peerj.5650 [DOI] [PMC free article] [PubMed] [Google Scholar]
109.Xiao, F. et al. mirecords: an integrated resource for microrna-target interactions. Nucleic Acids Research37, D105–D110, 10.1093/nar/gkn851 (2009). 10.1093/nar/gkn851 [DOI] [PMC free article] [PubMed] [Google Scholar]
110.Huang, Z. et al. Hmdd v3.0: a database for experimentally supported human microrna-disease associations. Nucleic Acids Research47, D1013–D1017, 10.1093/nar/gky1010 (2018). 10.1093/nar/gky1010 [DOI] [PMC free article] [PubMed] [Google Scholar]
111.Dai, E. et al. Epimir: a database of curated mutual regulation between mirnas and epigenetic modifications. Database2014, 10.1093/database/bau023 (2014). [DOI] [PMC free article] [PubMed]
112.Jiang, Q. et al. mir2disease: a manually curated database for microrna deregulation in human disease. Nucleic Acids Research37, D98–D104, 10.1093/nar/gkn714 (2009). 10.1093/nar/gkn714 [DOI] [PMC free article] [PubMed] [Google Scholar]
113.McGeary, S. E. et al. The biochemical basis of microrna targeting efficacy. Science36610.1126/science.aav1741 (2019). [DOI] [PMC free article] [PubMed]
114.Bhattacharya, A. & Cui, Y. Somamir 2.0: a database of cancer somatic mutations altering microrna-cerna interactions. Nucleic Acids Research44, D1005–D1010, 10.1093/nar/gkv1220 (2015). 10.1093/nar/gkv1220 [DOI] [PMC free article] [PubMed] [Google Scholar]
115.Karagkouni, D. et al. Diana-tarbase v8: a decade-long collection of experimentally supported mirna-gene interactions. Nucleic Acids Research46, D239–D245, 10.1093/nar/gkx1141 (2017). 10.1093/nar/gkx1141 [DOI] [PMC free article] [PubMed] [Google Scholar]
116.Huang, H.-Y. et al. mirtarbase update 2022: an informative resource for experimentally validated mirna-target interactions. Nucleic Acids Research50, D222–D230, 10.1093/nar/gkab1079 (2021). 10.1093/nar/gkab1079 [DOI] [PMC free article] [PubMed] [Google Scholar]
117.Liu, X. et al. Sm2mir: a database of the experimentally validated small molecules’ effects on microrna expression. Bioinformatics29, 409–411, 10.1093/bioinformatics/bts698 (2012). 10.1093/bioinformatics/bts698 [DOI] [PubMed] [Google Scholar]
118.Tong, Z., Cui, Q., Wang, J. & Zhou, Y. Transmir v2.0: an updated transcription factor-microrna regulation database. Nucleic Acids Research47, D253–D258, 10.1093/nar/gky1023 (2018). 10.1093/nar/gky1023 [DOI] [PMC free article] [PubMed] [Google Scholar]
119.Bhattacharya, A., Ziebarth, J. D. & Cui, Y. Polymirts database 3.0: linking polymorphisms in micrornas and their target sites with human diseases and biological pathways. Nucleic Acids Research42, D86–D91, 10.1093/nar/gkt1028 (2013). 10.1093/nar/gkt1028 [DOI] [PMC free article] [PubMed] [Google Scholar]
120.Xu, F. et al. dbdemc 3.0: Functional exploration of differentially expressed mirnas in cancers of human and model organisms. Genomics, Proteomics & Bioinformatics20, 446–454, 10.1016/j.gpb.2022.04.006 (2022). 10.1016/j.gpb.2022.04.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
121.Lu, M., Shi, B., Wang, J., Cao, Q. & Cui, Q. Tam: A method for enrichment and depletion analysis of a microrna category in a list of micrornas. BMC Bioinformatics11, 10.1186/1471-2105-11-419 (2010). [DOI] [PMC free article] [PubMed]
122.Bandyopadhyay, S. & Bhattacharyya, M. Putmir: A database for extracting neighboring transcription factors of human micrornas. BMC Bioinformatics11, 10.1186/1471-2105-11-190 (2010). [DOI] [PMC free article] [PubMed]
123.Kehl, T. et al. mirpathdb 2.0: a novel release of the mirna pathway dictionary database. Nucleic Acids Research48, D142–D147, 10.1093/nar/gkz1022 (2019). 10.1093/nar/gkz1022 [DOI] [PMC free article] [PubMed] [Google Scholar]
124.Xie, B., Ding, Q., Han, H. & Wu, D. mircancer: a microrna-cancer association database constructed by text mining on literature. Bioinformatics29, 638–644, 10.1093/bioinformatics/btt014 (2013). 10.1093/bioinformatics/btt014 [DOI] [PubMed] [Google Scholar]
125.Bruno, A. E. et al. mirdsnp: a database of disease-associated snps and microrna target sites on 3’utrs of human genes. BMC Genomics13, 10.1186/1471-2164-13-44 (2012). [DOI] [PMC free article] [PubMed]
126.Russo, F. et al. mirandola 2017: a curated knowledge base of non-invasive biomarkers. Nucleic Acids Research46, D354–D359, 10.1093/nar/gkx854 (2017). 10.1093/nar/gkx854 [DOI] [PMC free article] [PubMed] [Google Scholar]
127.Wishart, D. S. et al. Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Research46, D1074–D1082, 10.1093/nar/gkx1037 (2017). 10.1093/nar/gkx1037 [DOI] [PMC free article] [PubMed] [Google Scholar]
128.Lindstrom, M. The MIT/ICBP siRNA Database. Available at https://web.mit.edu/sirna/links.html (2009).
129.Aptagen, LLC. Apta-Index™ (Aptamer Database). Available at https://www.aptagen.com/apta-index/ (2023).
130.Chiba, S. et al. eskip-finder: a machine learning-based web application and database to identify the optimal sequences of antisense oligonucleotides for exon skipping. Nucleic Acids Research49, W193–W198, 10.1093/nar/gkab442 (2021). 10.1093/nar/gkab442 [DOI] [PMC free article] [PubMed] [Google Scholar]
131.Kamens, J. The addgene repository: an international nonprofit plasmid and data resource. Nucleic Acids Research43, D1152–D1157, 10.1093/nar/gku893 (2014). 10.1093/nar/gku893 [DOI] [PMC free article] [PubMed] [Google Scholar]
132.Li, Z. et al. Lncbook 2.0: integrating human long non-coding rnas with multi-omics annotations. Nucleic Acids Research51, D186–D191, 10.1093/nar/gkac999 (2022). 10.1093/nar/gkac999 [DOI] [PMC free article] [PubMed] [Google Scholar]
133.Chen, G. et al. Lncrnadisease: a database for long-non-coding rna-associated diseases. Nucleic Acids Research41, D983–D986, 10.1093/nar/gks1099 (2012). 10.1093/nar/gks1099 [DOI] [PMC free article] [PubMed] [Google Scholar]
134.Li, Z. et al. Lncexpdb: an expression database of human long non-coding rnas. Nucleic Acids Research49, D962–D968, 10.1093/nar/gkaa850 (2020). 10.1093/nar/gkaa850 [DOI] [PMC free article] [PubMed] [Google Scholar]
135.Zhang, Y.-Y., Zhang, W.-Y., Xin, X.-H. & Du, P.-F. dbesslnc: A manually curated database of human and mouse essential lncrna genes. Computational and Structural Biotechnology Journal20, 2657–2663, 10.1016/j.csbj.2022.05.043 (2022). 10.1016/j.csbj.2022.05.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
136.Mas-Ponte, D. et al. Lncatlas database for subcellular localization of long noncoding rnas. RNA23, 1080–1087, 10.1261/rna.060814.117 (2017). 10.1261/rna.060814.117 [DOI] [PMC free article] [PubMed] [Google Scholar]
137.Zhao, L. et al. Noncodev6: an updated database dedicated to long non-coding rna annotation in both animals and plants. Nucleic Acids Research49, D165–D171, 10.1093/nar/gkaa1046 (2020). 10.1093/nar/gkaa1046 [DOI] [PMC free article] [PubMed] [Google Scholar]
138.Gao, Y. et al. Lnc2cancer 3.0: an updated resource for experimentally supported lncrna/circrna cancer associations and web tools based on rna-seq and scrna-seq data. Nucleic Acids Research49, D1251–D1258, 10.1093/nar/gkaa1006 (2020). 10.1093/nar/gkaa1006 [DOI] [PMC free article] [PubMed] [Google Scholar]
139.Liu, L. et al. Lncrnawiki 2.0: a knowledgebase of human long non-coding rnas with enhanced curation model and database system. Nucleic Acids Research50, D190–D195, 10.1093/nar/gkab998 (2021). 10.1093/nar/gkab998 [DOI] [PMC free article] [PubMed] [Google Scholar]
140.Karagkouni, D. et al. Diana-lncbase v3: indexing experimentally supported mirna targets on non-coding transcripts. Nucleic Acids Research10.1093/nar/gkz1036 (2019). [DOI] [PMC free article] [PubMed]
141.Li, J. et al. Tanric: An interactive open platform to explore the function of lncrnas in cancer. Cancer Research75, 3728–3737, 10.1158/0008-5472.can-15-0273 (2015). 10.1158/0008-5472.can-15-0273 [DOI] [PMC free article] [PubMed] [Google Scholar]
142.Deng, J. et al. Ribocentre: a database of ribozymes. Nucleic Acids Research51, D262–D268, 10.1093/nar/gkac840 (2022). 10.1093/nar/gkac840 [DOI] [PMC free article] [PubMed] [Google Scholar]
143.Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microrna families. Nucleic Acids Research49, D192–D200, 10.1093/nar/gkaa1047 (2020). 10.1093/nar/gkaa1047 [DOI] [PMC free article] [PubMed] [Google Scholar]
144.Marchand, J. A., Pierson Smela, M. D., Jordan, T. H. H., Narasimhan, K. & Church, G. M. Tbdb: a database of structurally annotated t-box riboswitch:trna pairs. Nucleic Acids Research49, D229–D235, 10.1093/nar/gkaa721 (2020). 10.1093/nar/gkaa721 [DOI] [PMC free article] [PubMed] [Google Scholar]
145.Penchovsky, R., Pavlova, N. & Kaloudas, D. Rswitch: A novel bioinformatics database on riboswitches as antibacterial drug targets. IEEE/ACM Transactions on Computational Biology and Bioinformatics18, 804–808, 10.1109/tcbb.2020.2983922 (2021). 10.1109/tcbb.2020.2983922 [DOI] [PubMed] [Google Scholar]
146.Kumar, P., Mudunuri, S. B., Anaya, J. & Dutta, A. trfdb: a database for transfer rna fragments. Nucleic Acids Research43, D141–D145, 10.1093/nar/gku1138 (2014). 10.1093/nar/gku1138 [DOI] [PMC free article] [PubMed] [Google Scholar]
147.Wang, J.-H. et al. tsrfun: a comprehensive platform for decoding human tsrna expression, functions and prognostic value by high-throughput small rna-seq and clip-seq data. Nucleic Acids Research50, D421–D431, 10.1093/nar/gkab1023 (2021). 10.1093/nar/gkab1023 [DOI] [PMC free article] [PubMed] [Google Scholar]
148.Pliatsika, V., Loher, P., Telonis, A. G. & Rigoutsos, I. Mintbase: a framework for the interactive exploration of mitochondrial and nuclear trna fragments. Bioinformatics32, 2481–2489, 10.1093/bioinformatics/btw194 (2016). 10.1093/bioinformatics/btw194 [DOI] [PMC free article] [PubMed] [Google Scholar]
149.Lee, B. D., Neri, U., Oh, C. J., Simmonds, P. & Koonin, E. V. Viroiddb: a database of viroids and viroid-like circular rnas. Nucleic Acids Research50, D432–D438, 10.1093/nar/gkab974 (2021). 10.1093/nar/gkab974 [DOI] [PMC free article] [PubMed] [Google Scholar]
150.Bouchard-Bourelle, P. et al. snodb: an interactive database of human snorna sequences, abundance and interactions. Nucleic Acids Research48, D220–D225, 10.1093/nar/gkz884 (2019). 10.1093/nar/gkz884 [DOI] [PMC free article] [PubMed] [Google Scholar]
151.Jühling, F. et al. trnadb 2009: compilation of trna sequences and trna genes. Nucleic Acids Research37, D159–D162, 10.1093/nar/gkn772 (2009). 10.1093/nar/gkn772 [DOI] [PMC free article] [PubMed] [Google Scholar]
152.Chan, P. P. & Lowe, T. M. Gtrnadb 2.0: an expanded database of transfer rna genes identified in complete and draft genomes. Nucleic Acids Research44, D184–D189, 10.1093/nar/gkv1309 (2015). 10.1093/nar/gkv1309 [DOI] [PMC free article] [PubMed] [Google Scholar]
153.Hou, J., Wei, H. & Liu, B. ipida-gcn: Identification of pirna-disease associations based on graph convolutional network. PLOS Computational Biology18, e1010671, 10.1371/journal.pcbi.1010671 (2022). 10.1371/journal.pcbi.1010671 [DOI] [PMC free article] [PubMed] [Google Scholar]
154.Gupta, P., Das, G., Chattopadhyay, T., Ghosh, Z. & Mallick, B. Tarpid, a database of putative and validated targets of pirnas. Mol. Omics19, 706–713, 10.1039/D3MO00098B (2023). 10.1039/D3MO00098B [DOI] [PubMed] [Google Scholar]
155.Kang, J. et al. Rnainter v4.0: Rna interactome repository with redefined confidence scoring system and improved accessibility. Nucleic Acids Research50, D326–D332, 10.1093/nar/gkab997 (2021). 10.1093/nar/gkab997 [DOI] [PMC free article] [PubMed] [Google Scholar]
156.Cui, T. et al. Rnalocate v2.0: an updated resource for rna subcellular localization with increased coverage and annotation. Nucleic Acids Research50, D333–D339, 10.1093/nar/gkab825 (2021). 10.1093/nar/gkab825 [DOI] [PMC free article] [PubMed] [Google Scholar]
157.Chen, J. et al. Rnadisease v4.0: an updated resource of rna-associated diseases, providing rna-disease analysis, enrichment and prediction. Nucleic Acids Research51, D1397–D1404, 10.1093/nar/gkac814 (2022). 10.1093/nar/gkac814 [DOI] [PMC free article] [PubMed] [Google Scholar]
158.Wu, D. et al. ncrdeathdb: A comprehensive bioinformatics resource for deciphering network organization of the ncrna-mediated cell death system. Autophagy11, 1917–1926, 10.1080/15548627.2015.1089375 (2015). 10.1080/15548627.2015.1089375 [DOI] [PMC free article] [PubMed] [Google Scholar]
159.Huang, Y. et al. cncrnadb: a manually curated resource of experimentally supported rnas with both protein-coding and noncoding function. Nucleic Acids Research49, D65–D70, 10.1093/nar/gkaa791 (2020). 10.1093/nar/gkaa791 [DOI] [PMC free article] [PubMed] [Google Scholar]
160.Cheng, J. et al. Virbase v3.0: a virus and host ncrna-associated interaction repository with increased coverage and annotation. Nucleic Acids Research50, D928–D933, 10.1093/nar/gkab1029 (2021). 10.1093/nar/gkab1029 [DOI] [PMC free article] [PubMed] [Google Scholar]
161.Pathan, M. et al. Vesiclepedia 2019: a compendium of rna, proteins, lipids and metabolites in extracellular vesicles. Nucleic Acids Research47, D516–D519, 10.1093/nar/gky1029 (2018). 10.1093/nar/gky1029 [DOI] [PMC free article] [PubMed] [Google Scholar]
162.Zhang, Y. et al. Directrmdb: a database of post-transcriptional rna modifications unveiled from direct rna sequencing technology. Nucleic Acids Research51, D106–D116, 10.1093/nar/gkac1061 (2022). 10.1093/nar/gkac1061 [DOI] [PMC free article] [PubMed] [Google Scholar]
163.Boccaletto, P. et al. Modomics: a database of rna modification pathways. 2021 update. Nucleic Acids Research50, D231–D235, 10.1093/nar/gkab1083 (2021). 10.1093/nar/gkab1083 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Mungall, C. et al. oborel/obo-relations: 2023-08-18 release. Zenodo10.5281/zenodo.8263469 (2023). 10.5281/zenodo.8263469 [DOI]
The pandas development team. pandas-dev/pandas: Pandas. Zenodo10.5281/zenodo.3509134 (2020). 10.5281/zenodo.3509134 [DOI]
Cavalleri, E. et al. Rna-kg: 2024-05-21 release. Zenodo10.5281/zenodo.11236947 (2024). 10.5281/zenodo.11236947 [DOI]

Supplementary Materials

41597_2024_3673_MOESM1_ESM.pdf^{(167.4KB, pdf)}

An ontology-based knowledge graph for representing interactions involving RNA molecules - Supplementary material

Data Availability Statement

[CR1] 1.Bartel, D. P. & Chen, C.-Z. Micromanagers of gene expression: the potentially widespread influence of metazoan micrornas. Nature Reviews Genetics5, 396–400, 10.1038/nrg1328 (2004). 10.1038/nrg1328 [DOI] [PubMed] [Google Scholar]

[CR2] 2.Guttman, M. & Rinn, J. L. Modular regulatory principles of large non-coding rnas. Nature482, 339–346, 10.1038/nature10887 (2012). 10.1038/nature10887 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Cech, T. R. & Steitz, J. A. The noncoding rna revolution—trashing old rules to forge new ones. Cell157, 77–94, 10.1016/j.cell.2014.03.008 (2014). 10.1016/j.cell.2014.03.008 [DOI] [PubMed] [Google Scholar]

[CR4] 4.Iyer, M. K. et al. The landscape of long noncoding rnas in the human transcriptome. Nature genetics47, 199–208, 10.1038/ng.3192 (2015). 10.1038/ng.3192 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.Lorenzi, L. et al. The rna atlas expands the catalog of human non-coding rnas. Nature biotechnology39, 1453–1465, 10.1038/s41587-021-00936-1 (2021). 10.1038/s41587-021-00936-1 [DOI] [PubMed] [Google Scholar]

[CR6] 6.Keller, A. et al. mirnatissueatlas2: an update to the human mirna tissue atlas. Nucleic acids research50, D211–D221, 10.1093/nar/gkab808 (2022). 10.1093/nar/gkab808 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Vo, J. N. et al. The landscape of circular rna in cancer. Cell176, 869–881, 10.1016/j.cell.2018.12.021 (2019). 10.1016/j.cell.2018.12.021 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Damase, T. R. et al. The limitless future of rna therapeutics. Frontiers in Bioengineering and Biotechnology9, 10.3389/fbioe.2021.628137 (2021). [DOI] [PMC free article] [PubMed]

[CR9] 9.Barbier, A. J., Jiang, A. Y., Zhang, P., Wooster, R. & Anderson, D. G. The clinical progress of mrna vaccines and immunotherapies. Nature Biotechnology40, 840–854, 10.1038/s41587-022-01294-2 (2022). 10.1038/s41587-022-01294-2 [DOI] [PubMed] [Google Scholar]

[CR10] 10.Carvalho, T. Personalized anti-cancer vaccine combining mrna and immunotherapy tested in melanoma trial. Nature Medicine29, 2379–2380, 10.1038/d41591-023-00072-0 (2023). 10.1038/d41591-023-00072-0 [DOI] [PubMed] [Google Scholar]

[CR11] 11.Winkle, M., El-Daly, S. M., Fabbri, M. & Calin, G. A. Noncoding rna therapeutics — challenges and potential solutions. Nature Reviews Drug Discovery20, 629–651, 10.1038/s41573-021-00219-z (2021). 10.1038/s41573-021-00219-z [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Paunovska, K., Loughrey, D. & Dahlman, J. E. Drug delivery systems for rna therapeutics. Nature Reviews Genetics23, 265–280, 10.1038/s41576-021-00439-4 (2022). 10.1038/s41576-021-00439-4 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Hombach, S. & Kretz, M.Non-coding RNAs: Classification, Biology and Functioning, 3-17 (Springer International Publishing, 2016). [DOI] [PubMed]

[CR14] 14.Hogan, A. et al. Knowledge graphs. ACM Computing Surveys54, 1–37, 10.1145/3447772 (2021). 10.1145/3447772 [DOI] [Google Scholar]

[CR15] 15.Neo4j. Neo4j - the world’s leading graph database. Available at http://neo4j.org/ (2012).

[CR16] 16.Beckett, D. & McBride, B. RDF/XML Syntax Specification (Revised) - W3C recommendation. Available at https://www.w3.org/TR/REC-rdf-syntax/ (2004).

[CR17] 17.Alocci, D. et al. Property graph vs rdf triple store: A comparison on glycan substructure search. PLOS ONE10, e0144578, 10.1371/journal.pone.0144578 (2015). 10.1371/journal.pone.0144578 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.OWL Working Group. Web ontology language (owl) - w3c recommendation. Available at https://www.w3.org/OWL/ (2012).

[CR19] 19.Baader, F., Horrocks, I., Lutz, C. & Sattler, U.An Introduction to Description Logic (Cambridge University Press, 2017).

[CR20] 20.Prud’hommeaux, E. & Seaborne, A. SPARQL Query Language for RDF - W3C recommendation. Available at https://www.w3.org/TR/rdf-sparql-query/ (2018).

[CR21] 21.Chen, J. et al. Knowledge graphs for the life sciences: Recent developments, challenges and opportunities. Transactions on Graph Data Knowl.1, 5:1–5:33, 10.4230/TGDK.1.1.5 (2023).

[CR22] 22.Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Scientific Data10, 10.1038/s41597-023-01960-3 (2023). [DOI] [PMC free article] [PubMed]

[CR23] 23.Callahan, T. J. et al. An open source knowledge graph ecosystem for the life sciences. Scientific Data11, 10.1038/s41597-024-03171-w (2024). [DOI] [PMC free article] [PubMed]

[CR24] 24.Evangelista, J. E. et al. Toxicology knowledge graph for structural birth defects. Communications Medicine3, 10.1038/s43856-023-00329-2 (2023). [DOI] [PMC free article] [PubMed]

[CR25] 25.Shefchek, K. A. et al. The monarch initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research48, D704–D715, 10.1093/nar/gkz997 (2019). 10.1093/nar/gkz997 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Boudin, M., Diallo, G., Drancé, M. & Mougin, F. The oregano knowledge graph for computational drug repurposing. Scientific Data10, 871, 10.1038/s41597-023-02757-0 (2023). 10.1038/s41597-023-02757-0 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Livingston, K. M., Bada, M., Baumgartner, W. A. & Hunter, L. E. Kabob: ontology-based semantic integration of biomedical databases. BMC Bioinformatics16, 10.1186/s12859-015-0559-3 (2015). [DOI] [PMC free article] [PubMed]

[CR28] 28.Mungall, C. et al. oborel/obo-relations: 2023-08-18 release. Zenodo10.5281/zenodo.8263469 (2023). 10.5281/zenodo.8263469 [DOI]

[CR29] 29.Cavalleri, E. et al. A meta-graph for the construction of an rna-centered knowledge graph. In Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L. J. & Ortuño, F. (eds.) Bioinformatics and Biomedical Engineering, 165–180, 10.1007/978-3-031-34953-9_13 (Springer Nature Switzerland, Cham, 2023).

[CR30] 30.Halevy, A. Information integration. In Encyclopedia of Database Systems, 1490-1496, 10.1007/978-0-387-39940-9_1069 (Springer US, 2009).

[CR31] 31.Mesiti, M. et al. Xml-based approaches for the integration of heterogeneous bio-molecular data. BMC Bioinformatics10, 10.1186/1471-2105-10-s12-s7 (2009). [DOI] [PMC free article] [PubMed]

[CR32] 32.Bonfitto, S., Casiraghi, E. & Mesiti, M. Table understanding approaches for extracting knowledge from heterogeneous tables. WIREs Data Mining and Knowledge Discovery11, 10.1002/widm.1407 (2021).

[CR33] 33.Poggi, A. et al. Linking data to ontologies. In Spaccapietra, S. (ed.) Journal on Data Semantics X, 133–173 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008).

[CR34] 34.Das, S., Sundara, S. & Cyganiak, R. R2rml: Rdb to rdf mapping language - w3c recommendation. Available at https://www.w3.org/TR/r2rml/ (2012).

[CR35] 35.Dimou, A. et al. RML: a generic language for integrated RDF mappings of heterogeneous data. In Bizer, C., Heath, T., Auer, S. & Berners-Lee, T. (eds.) Proceedings of the 7th Workshop on Linked Data on the Web, vol. 1184 of CEUR Workshop Proceedings (2014).

[CR36] 36.Lefrançois, M., Zimmermann, A. & Bakerally, N. A sparql extension for generating rdf from heterogeneous formats. In Blomqvist, E. et al. (eds.) The Semantic Web, 35–50 10.1007/978-3-319-58068-5_3 (Springer International Publishing, Cham, 2017).

[CR37] 37.Heyvaert, P., De Meester, B., Dimou, A. & Verborgh, R.Declarative Rules for Linked Data Generation at Your Fingertips!, 213-217 (Springer International Publishing, 2018).

[CR38] 38.García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J. E. & Cueva Lovelle, J. M. Shexml: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science6, e318, 10.7717/peerj-cs.318 (2020). 10.7717/peerj-cs.318 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.Zhang, S. et al. A graph-based approach for integrating biological heterogeneous data based on connecting ontology. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)10.1109/bibm52615.2021.9669700 (IEEE, 2021).

[CR40] 40.Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genetics25, 25–29, 10.1038/75556 (2000). 10.1038/75556 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Pan, Q. et al. Trait ontology analysis based on association mapping studies bridges the gap between crop genomics and phenomics. BMC Genomics20, 10.1186/s12864-019-5812-0 (2019). [DOI] [PMC free article] [PubMed]

[CR42] 42.Schriml, L. M. et al. The human disease ontology 2022 update. Nucleic Acids Research50, D1255–D1261, 10.1093/nar/gkab1063 (2021). 10.1093/nar/gkab1063 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Cooper, L. & Jaiswal, P.The Plant Ontology: A Tool for Plant Genomics, 89-114 (Springer New York, 2016). [DOI] [PubMed]

[CR44] 44.Robinson, P. N. et al. The human phenotype ontology: A tool for annotating and analyzing human hereditary disease. The American Journal of Human Genetics83, 610–615, 10.1016/j.ajhg.2008.09.017 (2008). 10.1016/j.ajhg.2008.09.017 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] 45.CDC - Centers for Disease Control and Prevention. Learn about specific birth defects. Available at https://www.cdc.gov/ncbddd/birthdefects/types.html (2023).

[CR46] 46.Lachmann, A. et al. Geneshot: search engine for ranking genes from arbitrary text queries. Nucleic Acids Research47, W571–W577, 10.1093/nar/gkz393 (2019). 10.1093/nar/gkz393 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR47] 47.Avram, S. et al. Drugcentral 2021 supports drug discovery and repositioning. Nucleic Acids Research49, D1160–D1169, 10.1093/nar/gkaa997 (2020). 10.1093/nar/gkaa997 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR48] 48.Evangelista, J. E. et al. SigCom LINCS: data and metadata search engine for a million gene expression signatures. Nucleic Acids Research50, W697–W709, 10.1093/nar/gkac328 (2022). 10.1093/nar/gkac328 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Sima, A. C. et al. Enabling semantic queries across federated bioinformatics databases. Database2019, baz106, 10.1093/database/baz106 (2019). 10.1093/database/baz106 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR50] 50.Sparmann, AnkeandVogel,J. örg Rna-based medicine: from molecular mechanisms to therapy. The EMBO Journal42, e114760, 10.15252/embj.2023114760 (2023). 10.15252/embj.2023114760 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR51] 51.Vorländer, M. K., Pacheco-Fiallos, B. & Plaschka, C. Structural basis of mrna maturation: Time to put it together. Current Opinion in Structural Biology75, 102431, 10.1016/j.sbi.2022.102431 (2022). 10.1016/j.sbi.2022.102431 [DOI] [PubMed] [Google Scholar]

[CR52] 52.Mattick, J. S. et al. Long non-coding rnas: definitions, functions, challenges and recommendations. Nature Reviews Molecular Cell Biology24, 430–447, 10.1038/s41580-022-00566-8 (2023). 10.1038/s41580-022-00566-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR53] 53.Liu, L. et al. LncRNAWiki 2.0: a knowledgebase of human long non-coding RNAs with enhanced curation model and database system. Nucleic Acids Research50, D190–D195, 10.1093/nar/gkab998 (2022). 10.1093/nar/gkab998 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] 54.Statello, L., Guo, C.-J., Chen, L.-L. & Huarte, M. Gene regulation by long non-coding rnas and its biological functions. Nat Rev Mol Cell Biol22, 96–118, 10.1038/s41580-020-00315-9 (2021). 10.1038/s41580-020-00315-9 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] 55.Vance, K. & CP, P. Transcriptional regulatory functions of nuclear long noncoding rnas. Trends Genet.30, 348–55, 10.1016/j.tig.2014.06.001 (2014). 10.1016/j.tig.2014.06.001 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Nisar, S. et al. Insights into the role of circrnas: Biogenesis, characterization, functional, and clinical impact in human malignancies. Frontiers in Cell and Developmental Biology9, 10.3389/fcell.2021.617281 (2021). [DOI] [PMC free article] [PubMed]

[CR57] 57.Loda, A. & Heard, E. Xist rna in action: Past, present, and future. PLoS genetics15, e1008333, 10.1371/journal.pgen.1008333 (2019). 10.1371/journal.pgen.1008333 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR58] 58.Kanduri, C. Kcnq1ot1: a chromatin regulatory rna. Seminars in Cell & Developmental Biology22, 343–350, 10.1016/j.semcdb.2011.02.020 (2011). 10.1016/j.semcdb.2011.02.020 [DOI] [PubMed] [Google Scholar]

[CR59] 59.Yang, Z. et al. Insights into the role of long non-coding rnas in dna methylation mediated transcriptional regulation. Frontiers in molecular biosciences9, 1067406, 10.3389/fmolb.2022.1067406 (2022). 10.3389/fmolb.2022.1067406 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR60] 60.Hannon, G. J. Rna interference. Nature418, 244–251, 10.1038/418244a (2002). 10.1038/418244a [DOI] [PubMed] [Google Scholar]

[CR61] 61.Stephen, B. J. et al. Xeno-mirna in maternal-infant immune crosstalk: An aid to disease alleviation. Frontiers in Immunology11, 10.3389/fimmu.2020.00404 (2020). [DOI] [PMC free article] [PubMed]

[CR62] 62.Lee, J. & JT, M. Antisense-mediated transcript knockdown triggers premature transcription termination. Mol Cell.77, 1044–1054, 10.1016/j.molcel.2019.12.011 (2020). 10.1016/j.molcel.2019.12.011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR63] 63.Yu, A.-M., Choi, Y. H. & Tu, M.-J. Rna drugs and rna targets for small molecules: Principles, progress, and challenges. Pharmacological Reviews72, 862–898, 10.1124/pr.120.019554 (2020). 10.1124/pr.120.019554 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR64] 64.Dunn, M. R., Jimenez, R. M. & Chaput, J. C. Analysis of aptamer discovery and technology. Nature Reviews Chemistry1, 0076, 10.1038/s41570-017-0076 (2017). 10.1038/s41570-017-0076 [DOI] [Google Scholar]

[CR65] 65.Byun, J. Recent progress and opportunities for nucleic acid aptamers. Life11, 193, 10.3390/life11030193 (2021). 10.3390/life11030193 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR66] 66.Ştefan, G., Hosu, O., De Wael, K., Lobo-Castañón, M. J. & Cristea, C. Aptamers in biomedicine: Selection strategies and recent advances. Electrochimica Acta376, 137994, 10.1016/j.electacta.2021.137994 (2021). 10.1016/j.electacta.2021.137994 [DOI] [Google Scholar]

[CR67] 67.Machtel, P., Bakowska-Żywicka, K. & Żywicki, M. Emerging applications of riboswitches - from antibacterial targets to molecular tools. Journal of Applied Genetics57, 531–541, 10.1007/s13353-016-0341-x (2016). 10.1007/s13353-016-0341-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR68] 68.Linlin, S., Brianna Marie, L. & Yuan-Xiang, T. The crispr/cas9 system for gene editing and its potential application in pain research. Translational Perioperative and Pain Medicine3, 10.31480/2330-4871/040 (2016). [PMC free article] [PubMed]

[CR69] 69.Wang, X. et al. Knowledge graph quality control: A survey. Fundamental Research10.1016/j.fmre.2021.08.018 (2021).

[CR70] 70.The pandas development team. pandas-dev/pandas: Pandas. Zenodo10.5281/zenodo.3509134 (2020). 10.5281/zenodo.3509134 [DOI]

[CR71] 71.Sweeney, B. A. et al. Rnacentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Research49, D212–D220, 10.1093/nar/gkaa921 (2020). 10.1093/nar/gkaa921 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR72] 72.Cantelli, G. et al. The european bioinformatics institute (embl-ebi) in 2021. Nucleic Acids Research50, D11–D19, 10.1093/nar/gkab1127 (2021). 10.1093/nar/gkab1127 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR73] 73.Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology215, 403–410, 10.1016/s0022-2836(05)80360-2 (1990). 10.1016/s0022-2836(05)80360-2 [DOI] [PubMed] [Google Scholar]

[CR74] 74.Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences85, 2444–2448, 10.1073/pnas.85.8.2444 (1988). 10.1073/pnas.85.8.2444 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR75] 75.Guo, L., Sun, B., Wu, Q., Yang, S. & Chen, F. mirna-mirna interaction implicates for potential mutual regulatory pattern. Gene511, 187–194, 10.1016/j.gene.2012.09.066 (2012). 10.1016/j.gene.2012.09.066 [DOI] [PubMed] [Google Scholar]

[CR76] 76.Lai, E. C., Wiel, C. & Rubin, G. M. Complementary mirna pairs suggest a regulatory role for mirna:mirna duplexes. RNA10, 171–175, 10.1261/rna.5191904 (2004). 10.1261/rna.5191904 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR77] 77.Spear, A. D., Ceusters, W. & Smith, B. Functions in basic formal ontology. Applied Ontology11, 103–128, 10.3233/ao-160164 (2016). 10.3233/ao-160164 [DOI] [Google Scholar]

[CR78] 78.Callahan, T. J. et al. Owl-nets: Transforming owl representations for improved network inference. In Biocomputing 2018, 10.1142/9789813235533_0013 (WORLD SCIENTIFIC, 2017). [PMC free article] [PubMed]

[CR79] 79.Cappelletti, L. et al. Grape for fast and scalable graph processing and random-walk-based embedding. Nature Computational Science3, 552–568, 10.1038/s43588-023-00465-8 (2023). 10.1038/s43588-023-00465-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR80] 80.Blazegraph™. Blazegraph™ DB. Available at https://blazegraph.com/.

[CR81] 81.Cavalleri, E. et al. Rna-kg: 2024-05-21 release. Zenodo10.5281/zenodo.11236947 (2024). 10.5281/zenodo.11236947 [DOI]

[CR82] 82.Wang, J. et al. pirbase: integrating pirna annotation in all aspects. Nucleic Acids Research50, D265–D272, 10.1093/nar/gkab1012 (2021). 10.1093/nar/gkab1012 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR83] 83.Rosenkranz, D., Zischler, H. & Gebert, D. pirnaclusterdb 2.0: update and expansion of the pirna cluster database. Nucleic Acids Research50, D259–D264, 10.1093/nar/gkab622 (2021). 10.1093/nar/gkab622 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR84] 84.Salzberg, S. L. Open questions: How many genes do we have?BMC Biology16, 10.1186/s12915-018-0564-x (2018). [DOI] [PMC free article] [PubMed]

[CR85] 85.Grover, A. & Leskovec, J. Node2vec: Scalable feature learning for networks. In Proc. of the 22nd ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, KDD ’16, 855-864, 10.1145/2939672.2939754 (ACM, New York, NY, USA, 2016). [DOI] [PMC free article] [PubMed]

[CR86] 86.Fernández-Moreno, R., Torre-Cisneros, J. & Cantisán, S. Human cytomegalovirus (hcmv)-encoded micrornas: potential biomarkers and clinical applications. RNA Biology18, 2194–2202, 10.1080/15476286.2021.1930757 (2021). 10.1080/15476286.2021.1930757 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR87] 87.Peng, Q. et al. Foxa1 suppresses the growth, migration, and invasion of nasopharyngeal carcinoma cells through repressing mir-100-5p and mir-125b-5p. Journal of Cancer11, 2485–2495, 10.7150/jca.40709 (2020). 10.7150/jca.40709 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR88] 88.Alstott, J., Bullmore, E. & Plenz, D. powerlaw: A python package for analysis of heavy-tailed distributions. PLoS ONE9, e85777, 10.1371/journal.pone.0085777 (2014). 10.1371/journal.pone.0085777 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR89] 89.Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in empirical data. SIAM Review51, 661–703, 10.1137/070710111 (2009). 10.1137/070710111 [DOI] [Google Scholar]

[CR90] 90.Bodlaender, H. L. & Koster, A. M. Treewidth computations i. upper bounds. Information and Computation208, 259–275, 10.1016/j.ic.2009.03.008 (2010). 10.1016/j.ic.2009.03.008 [DOI] [Google Scholar]

[CR91] 91.Zhang, Y., Qian, H., He, J. & Gao, W. Mechanisms of trna-derived fragments and trna halves in cancer treatment resistance. Biomarker Research810.1186/s40364-020-00233-0 (2020). [DOI] [PMC free article] [PubMed]

[CR92] 92.Valentini, G., Paccanaro, A., Caniza, H., Romero, A. E. & Re, M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artificial Intelligence in Medicine61, 63–78, 10.1016/j.artmed.2014.03.003 (2014). 10.1016/j.artmed.2014.03.003 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR93] 93.Cappelletti, L. et al. Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning. Bioinformatics Advances4, vbae036, 10.1093/bioadv/vbae036 (2024). 10.1093/bioadv/vbae036 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR94] 94.Hamilton, W. L., Ying, R. & Leskovec, J. Representation learning on graphs: Methods and applications. IEEE Data Eng. Bull.40, 52–74 (2017).

[CR95] 95.Yang, C., Xiao, Y., Zhang, Y., Sun, Y. & Han, J. Heterogeneous network representation learning: A unified framework with survey and benchmark. IEEE Transactions on Knowledge and Data Engineering34, 4854–4873, 10.1109/tkde.2020.3045924 (2022). 10.1109/tkde.2020.3045924 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR96] 96.Johnson, R., Li, M. M., Noori, A., Queen, O. & Zitnik, M. Graph artificial intelligence in medicine. Annu. Rev. Biomed. Data Sci.10.1146/annurev-biodatasci-110723-024625 (2024). [DOI] [PMC free article] [PubMed]

[CR97] 97.Li, M. M., Huang, K. & Zitnik, M. Graph representation learning in biomedicine and healthcare. Nature Biomedical Engineering6, 1353–1369, 10.1038/s41551-022-00942-x (2022). 10.1038/s41551-022-00942-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR98] 98.Vasilevsky, N. A. et al. Mondo: Unifying diseases for the world, by the world. Preprint at 10.1101/2022.04.13.22273750 (2022).

[CR99] 99.He, Y. et al. Vo: Vaccine ontology. Nature Precedings10.1038/npre.2009.3553.1 (2009).

[CR100] 100.Degtyarenko, K. et al. Chebi: a database and ontology for chemical entities of biological interest. Nucleic Acids Research36, D344–D350, 10.1093/nar/gkm791 (2007). 10.1093/nar/gkm791 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR101] 101.Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E. & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biology13, R5, 10.1186/gb-2012-13-1-r5 (2012). 10.1186/gb-2012-13-1-r5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR102] 102.Sarntivijai, S. et al. Clo: The cell line ontology. Journal of Biomedical Semantics5, 37, 10.1186/2041-1480-5-37 (2014). 10.1186/2041-1480-5-37 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR103] 103.Natale, D. A. et al. The protein ontology: a structured representation of protein forms and complexes. Nucleic Acids Research39, D539–D545, 10.1093/nar/gkq907 (2010). 10.1093/nar/gkq907 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR104] 104.Eilbeck, K. et al. The sequence ontology: a tool for the unification of genome annotations. Genome Biology6, 10.1186/gb-2005-6-5-r44 (2005). [DOI] [PMC free article] [PubMed]

[CR105] 105.Petri, V. et al. The pathway ontology - updates and applications. Journal of Biomedical Semantics5, 7, 10.1186/2041-1480-5-7 (2014). 10.1186/2041-1480-5-7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR106] 106.Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. mirbase: from microrna sequences to function. Nucleic Acids Research47, D155–D162, 10.1093/nar/gky1141 (2018). 10.1093/nar/gky1141 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR107] 107.Chen, Y. & Wang, X. mirdb: an online database for prediction of functional microrna targets. Nucleic Acids Research48, D127–D131, 10.1093/nar/gkz757 (2019). 10.1093/nar/gkz757 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR108] 108.Fan, Y., Habib, M. & Xia, J. Xeno-mirnet: a comprehensive database and analytics platform to explore xeno-mirnas and their potential targets. PeerJ6, e5650, 10.7717/peerj.5650 (2018). 10.7717/peerj.5650 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR109] 109.Xiao, F. et al. mirecords: an integrated resource for microrna-target interactions. Nucleic Acids Research37, D105–D110, 10.1093/nar/gkn851 (2009). 10.1093/nar/gkn851 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR110] 110.Huang, Z. et al. Hmdd v3.0: a database for experimentally supported human microrna-disease associations. Nucleic Acids Research47, D1013–D1017, 10.1093/nar/gky1010 (2018). 10.1093/nar/gky1010 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR111] 111.Dai, E. et al. Epimir: a database of curated mutual regulation between mirnas and epigenetic modifications. Database2014, 10.1093/database/bau023 (2014). [DOI] [PMC free article] [PubMed]

[CR112] 112.Jiang, Q. et al. mir2disease: a manually curated database for microrna deregulation in human disease. Nucleic Acids Research37, D98–D104, 10.1093/nar/gkn714 (2009). 10.1093/nar/gkn714 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR113] 113.McGeary, S. E. et al. The biochemical basis of microrna targeting efficacy. Science36610.1126/science.aav1741 (2019). [DOI] [PMC free article] [PubMed]

[CR114] 114.Bhattacharya, A. & Cui, Y. Somamir 2.0: a database of cancer somatic mutations altering microrna-cerna interactions. Nucleic Acids Research44, D1005–D1010, 10.1093/nar/gkv1220 (2015). 10.1093/nar/gkv1220 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR115] 115.Karagkouni, D. et al. Diana-tarbase v8: a decade-long collection of experimentally supported mirna-gene interactions. Nucleic Acids Research46, D239–D245, 10.1093/nar/gkx1141 (2017). 10.1093/nar/gkx1141 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR116] 116.Huang, H.-Y. et al. mirtarbase update 2022: an informative resource for experimentally validated mirna-target interactions. Nucleic Acids Research50, D222–D230, 10.1093/nar/gkab1079 (2021). 10.1093/nar/gkab1079 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR117] 117.Liu, X. et al. Sm2mir: a database of the experimentally validated small molecules’ effects on microrna expression. Bioinformatics29, 409–411, 10.1093/bioinformatics/bts698 (2012). 10.1093/bioinformatics/bts698 [DOI] [PubMed] [Google Scholar]

[CR118] 118.Tong, Z., Cui, Q., Wang, J. & Zhou, Y. Transmir v2.0: an updated transcription factor-microrna regulation database. Nucleic Acids Research47, D253–D258, 10.1093/nar/gky1023 (2018). 10.1093/nar/gky1023 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR119] 119.Bhattacharya, A., Ziebarth, J. D. & Cui, Y. Polymirts database 3.0: linking polymorphisms in micrornas and their target sites with human diseases and biological pathways. Nucleic Acids Research42, D86–D91, 10.1093/nar/gkt1028 (2013). 10.1093/nar/gkt1028 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR120] 120.Xu, F. et al. dbdemc 3.0: Functional exploration of differentially expressed mirnas in cancers of human and model organisms. Genomics, Proteomics & Bioinformatics20, 446–454, 10.1016/j.gpb.2022.04.006 (2022). 10.1016/j.gpb.2022.04.006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR121] 121.Lu, M., Shi, B., Wang, J., Cao, Q. & Cui, Q. Tam: A method for enrichment and depletion analysis of a microrna category in a list of micrornas. BMC Bioinformatics11, 10.1186/1471-2105-11-419 (2010). [DOI] [PMC free article] [PubMed]

[CR122] 122.Bandyopadhyay, S. & Bhattacharyya, M. Putmir: A database for extracting neighboring transcription factors of human micrornas. BMC Bioinformatics11, 10.1186/1471-2105-11-190 (2010). [DOI] [PMC free article] [PubMed]

[CR123] 123.Kehl, T. et al. mirpathdb 2.0: a novel release of the mirna pathway dictionary database. Nucleic Acids Research48, D142–D147, 10.1093/nar/gkz1022 (2019). 10.1093/nar/gkz1022 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR124] 124.Xie, B., Ding, Q., Han, H. & Wu, D. mircancer: a microrna-cancer association database constructed by text mining on literature. Bioinformatics29, 638–644, 10.1093/bioinformatics/btt014 (2013). 10.1093/bioinformatics/btt014 [DOI] [PubMed] [Google Scholar]

[CR125] 125.Bruno, A. E. et al. mirdsnp: a database of disease-associated snps and microrna target sites on 3’utrs of human genes. BMC Genomics13, 10.1186/1471-2164-13-44 (2012). [DOI] [PMC free article] [PubMed]

[CR126] 126.Russo, F. et al. mirandola 2017: a curated knowledge base of non-invasive biomarkers. Nucleic Acids Research46, D354–D359, 10.1093/nar/gkx854 (2017). 10.1093/nar/gkx854 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR127] 127.Wishart, D. S. et al. Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Research46, D1074–D1082, 10.1093/nar/gkx1037 (2017). 10.1093/nar/gkx1037 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR128] 128.Lindstrom, M. The MIT/ICBP siRNA Database. Available at https://web.mit.edu/sirna/links.html (2009).

[CR129] 129.Aptagen, LLC. Apta-Index™ (Aptamer Database). Available at https://www.aptagen.com/apta-index/ (2023).

[CR130] 130.Chiba, S. et al. eskip-finder: a machine learning-based web application and database to identify the optimal sequences of antisense oligonucleotides for exon skipping. Nucleic Acids Research49, W193–W198, 10.1093/nar/gkab442 (2021). 10.1093/nar/gkab442 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR131] 131.Kamens, J. The addgene repository: an international nonprofit plasmid and data resource. Nucleic Acids Research43, D1152–D1157, 10.1093/nar/gku893 (2014). 10.1093/nar/gku893 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR132] 132.Li, Z. et al. Lncbook 2.0: integrating human long non-coding rnas with multi-omics annotations. Nucleic Acids Research51, D186–D191, 10.1093/nar/gkac999 (2022). 10.1093/nar/gkac999 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR133] 133.Chen, G. et al. Lncrnadisease: a database for long-non-coding rna-associated diseases. Nucleic Acids Research41, D983–D986, 10.1093/nar/gks1099 (2012). 10.1093/nar/gks1099 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR134] 134.Li, Z. et al. Lncexpdb: an expression database of human long non-coding rnas. Nucleic Acids Research49, D962–D968, 10.1093/nar/gkaa850 (2020). 10.1093/nar/gkaa850 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR135] 135.Zhang, Y.-Y., Zhang, W.-Y., Xin, X.-H. & Du, P.-F. dbesslnc: A manually curated database of human and mouse essential lncrna genes. Computational and Structural Biotechnology Journal20, 2657–2663, 10.1016/j.csbj.2022.05.043 (2022). 10.1016/j.csbj.2022.05.043 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR136] 136.Mas-Ponte, D. et al. Lncatlas database for subcellular localization of long noncoding rnas. RNA23, 1080–1087, 10.1261/rna.060814.117 (2017). 10.1261/rna.060814.117 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR137] 137.Zhao, L. et al. Noncodev6: an updated database dedicated to long non-coding rna annotation in both animals and plants. Nucleic Acids Research49, D165–D171, 10.1093/nar/gkaa1046 (2020). 10.1093/nar/gkaa1046 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR138] 138.Gao, Y. et al. Lnc2cancer 3.0: an updated resource for experimentally supported lncrna/circrna cancer associations and web tools based on rna-seq and scrna-seq data. Nucleic Acids Research49, D1251–D1258, 10.1093/nar/gkaa1006 (2020). 10.1093/nar/gkaa1006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR139] 139.Liu, L. et al. Lncrnawiki 2.0: a knowledgebase of human long non-coding rnas with enhanced curation model and database system. Nucleic Acids Research50, D190–D195, 10.1093/nar/gkab998 (2021). 10.1093/nar/gkab998 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR140] 140.Karagkouni, D. et al. Diana-lncbase v3: indexing experimentally supported mirna targets on non-coding transcripts. Nucleic Acids Research10.1093/nar/gkz1036 (2019). [DOI] [PMC free article] [PubMed]

[CR141] 141.Li, J. et al. Tanric: An interactive open platform to explore the function of lncrnas in cancer. Cancer Research75, 3728–3737, 10.1158/0008-5472.can-15-0273 (2015). 10.1158/0008-5472.can-15-0273 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR142] 142.Deng, J. et al. Ribocentre: a database of ribozymes. Nucleic Acids Research51, D262–D268, 10.1093/nar/gkac840 (2022). 10.1093/nar/gkac840 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR143] 143.Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microrna families. Nucleic Acids Research49, D192–D200, 10.1093/nar/gkaa1047 (2020). 10.1093/nar/gkaa1047 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR144] 144.Marchand, J. A., Pierson Smela, M. D., Jordan, T. H. H., Narasimhan, K. & Church, G. M. Tbdb: a database of structurally annotated t-box riboswitch:trna pairs. Nucleic Acids Research49, D229–D235, 10.1093/nar/gkaa721 (2020). 10.1093/nar/gkaa721 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR145] 145.Penchovsky, R., Pavlova, N. & Kaloudas, D. Rswitch: A novel bioinformatics database on riboswitches as antibacterial drug targets. IEEE/ACM Transactions on Computational Biology and Bioinformatics18, 804–808, 10.1109/tcbb.2020.2983922 (2021). 10.1109/tcbb.2020.2983922 [DOI] [PubMed] [Google Scholar]

[CR146] 146.Kumar, P., Mudunuri, S. B., Anaya, J. & Dutta, A. trfdb: a database for transfer rna fragments. Nucleic Acids Research43, D141–D145, 10.1093/nar/gku1138 (2014). 10.1093/nar/gku1138 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR147] 147.Wang, J.-H. et al. tsrfun: a comprehensive platform for decoding human tsrna expression, functions and prognostic value by high-throughput small rna-seq and clip-seq data. Nucleic Acids Research50, D421–D431, 10.1093/nar/gkab1023 (2021). 10.1093/nar/gkab1023 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR148] 148.Pliatsika, V., Loher, P., Telonis, A. G. & Rigoutsos, I. Mintbase: a framework for the interactive exploration of mitochondrial and nuclear trna fragments. Bioinformatics32, 2481–2489, 10.1093/bioinformatics/btw194 (2016). 10.1093/bioinformatics/btw194 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR149] 149.Lee, B. D., Neri, U., Oh, C. J., Simmonds, P. & Koonin, E. V. Viroiddb: a database of viroids and viroid-like circular rnas. Nucleic Acids Research50, D432–D438, 10.1093/nar/gkab974 (2021). 10.1093/nar/gkab974 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR150] 150.Bouchard-Bourelle, P. et al. snodb: an interactive database of human snorna sequences, abundance and interactions. Nucleic Acids Research48, D220–D225, 10.1093/nar/gkz884 (2019). 10.1093/nar/gkz884 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR151] 151.Jühling, F. et al. trnadb 2009: compilation of trna sequences and trna genes. Nucleic Acids Research37, D159–D162, 10.1093/nar/gkn772 (2009). 10.1093/nar/gkn772 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR152] 152.Chan, P. P. & Lowe, T. M. Gtrnadb 2.0: an expanded database of transfer rna genes identified in complete and draft genomes. Nucleic Acids Research44, D184–D189, 10.1093/nar/gkv1309 (2015). 10.1093/nar/gkv1309 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR153] 153.Hou, J., Wei, H. & Liu, B. ipida-gcn: Identification of pirna-disease associations based on graph convolutional network. PLOS Computational Biology18, e1010671, 10.1371/journal.pcbi.1010671 (2022). 10.1371/journal.pcbi.1010671 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR154] 154.Gupta, P., Das, G., Chattopadhyay, T., Ghosh, Z. & Mallick, B. Tarpid, a database of putative and validated targets of pirnas. Mol. Omics19, 706–713, 10.1039/D3MO00098B (2023). 10.1039/D3MO00098B [DOI] [PubMed] [Google Scholar]

[CR155] 155.Kang, J. et al. Rnainter v4.0: Rna interactome repository with redefined confidence scoring system and improved accessibility. Nucleic Acids Research50, D326–D332, 10.1093/nar/gkab997 (2021). 10.1093/nar/gkab997 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR156] 156.Cui, T. et al. Rnalocate v2.0: an updated resource for rna subcellular localization with increased coverage and annotation. Nucleic Acids Research50, D333–D339, 10.1093/nar/gkab825 (2021). 10.1093/nar/gkab825 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR157] 157.Chen, J. et al. Rnadisease v4.0: an updated resource of rna-associated diseases, providing rna-disease analysis, enrichment and prediction. Nucleic Acids Research51, D1397–D1404, 10.1093/nar/gkac814 (2022). 10.1093/nar/gkac814 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR158] 158.Wu, D. et al. ncrdeathdb: A comprehensive bioinformatics resource for deciphering network organization of the ncrna-mediated cell death system. Autophagy11, 1917–1926, 10.1080/15548627.2015.1089375 (2015). 10.1080/15548627.2015.1089375 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR159] 159.Huang, Y. et al. cncrnadb: a manually curated resource of experimentally supported rnas with both protein-coding and noncoding function. Nucleic Acids Research49, D65–D70, 10.1093/nar/gkaa791 (2020). 10.1093/nar/gkaa791 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR160] 160.Cheng, J. et al. Virbase v3.0: a virus and host ncrna-associated interaction repository with increased coverage and annotation. Nucleic Acids Research50, D928–D933, 10.1093/nar/gkab1029 (2021). 10.1093/nar/gkab1029 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR161] 161.Pathan, M. et al. Vesiclepedia 2019: a compendium of rna, proteins, lipids and metabolites in extracellular vesicles. Nucleic Acids Research47, D516–D519, 10.1093/nar/gky1029 (2018). 10.1093/nar/gky1029 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR162] 162.Zhang, Y. et al. Directrmdb: a database of post-transcriptional rna modifications unveiled from direct rna sequencing technology. Nucleic Acids Research51, D106–D116, 10.1093/nar/gkac1061 (2022). 10.1093/nar/gkac1061 [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR163] 163.Boccaletto, P. et al. Modomics: a database of rna modification pathways. 2021 update. Nucleic Acids Research50, D231–D235, 10.1093/nar/gkab1083 (2021). 10.1093/nar/gkab1083 [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

An ontology-based knowledge graph for representing interactions involving RNA molecules

Emanuele Cavalleri

Alberto Cabri

Mauricio Soto-Gomez

Sara Bonfitto

Paolo Perlasca

Jessica Gliozzo

Tiffany J Callahan

Justin Reese

Peter N Robinson

Elena Casiraghi

Giorgio Valentini

Marco Mesiti

Abstract

Background & Summary

Related Work

Approaches for the construction of bio-medical knowledge graphs

RNA molecules

Fig. 1.

Coding RNA

Non-coding RNA

Long non-coding RNA (lncRNA)

Small non-coding RNA (sncRNA)

sncRNAs involved in the translation process

sncRNAs associated with RNA interference pathways

Aptamers, riboswitches, ribozymes and guide-RNA

Biomedical ontologies for the semantic characterization of RNA-KG

Table 1.

Methods

Fig. 2.

RNA sources characterization

Fig. 3.

Table 2.

Table 3.

Table 4.

Fig. 4.

Fig. 5.

Ontological description of the KG

Table 5.

Fig. 6.

Fig. 7.

Table 6.

Ontological alignment specification

Fig. 8.

Fig. 9.

RNA-KG generation and analysis

Data Records

Table 7.

Technical Validation

RNA-KG statistical analysis

Fig. 10.

Fig. 11.

Fig. 12.

t-SNE representation

Fig. 13.

Topological analysis

Degree distribution

Fig. 14.

Treewidth

Isomorphic node groups

Usage Notes

Supplementary information

Acknowledgements

Author contributions

Code availability

Competing interests

Footnotes

Supplementary information

References

Associated Data

Data Citations

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases