Overview of the Knowledge Management Center for Illuminating the Druggable Genome

Tudor Oprea; Cristian Bologa; Jayme Holmes; Stephen Mathias; Vincent T Metzger; Anna Waller; Jeremy J Yang; Andrew R Leach; Lars Juhl Jensen; Keith J Kelleher; Timothy K Sheils; Ewy Mathé; Sorin Avram; Jeremy S Edwards

doi:10.1016/j.drudis.2024.103882

Author manuscript; available in PMC: 2025 Mar 1.

Published in final edited form as: Drug Discov Today. 2024 Jan 11;29(3):103882. doi: 10.1016/j.drudis.2024.103882

PMCID: PMC10939799 NIHMSID: NIHMS1959990 PMID: 38218214

Abstract

The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project aims to aggregate, update, and articulate protein-centric data and knowledge for the entire human proteome, with emphasis on the understudied proteins from the three IDG protein families. KMC collates and analyzes data from over 70 resources to compile the Target Central Resource Database (TCRD), which underpins the web-based informatics platform (Pharos). These data include experimental, computational, and text-mined information on protein structures, compound interactions, and disease and phenotype associations. Based on this knowledge, proteins are classified into different Target Development Levels (TDLs) for identification of understudied targets. Additional work by the KMC focuses on enriching target knowledge and producing DrugCentral and other data visualization tools for expanding investigation of understudied targets.

Keywords: Druggable Genome, Pharos, knowledge management, database

Teaser:

The Knowledge Management Center (KMC) for the IDG project aggregates data on human proteins, emphasizing understudied ones, to compile the Target Central Resource Database and produce tools, such as DrugCentral, for investigating these targets.

Introduction

The KMC for the National Institutes of Health (NIH) Common Fund IDG initiative connects raw data to actionable biomedical knowledge (https://pharos.nih.gov). The Center aggregates and processes a wide variety of protein-centric data, with a particular focus on understudied proteins from the three IDG protein families (kinases, ion channels, and G-protein-coupled receptors; GPCRs). By collating and analyzing data from over 70 resources, the KMC serves not only as a repository, but also as a dynamic interface where experimental, computational, and text-mined information is brought together. This aggregation of data facilitates a comprehensive understanding of protein structures, compound interactions, and associations with diseases and phenotypes.¹

Underpinning the KMC’s operations is the TCRD, the database that forms the foundation for Pharos (Figure 1), the KMC’s web-based informatics platform.¹ TCRD not only archives data, but also organizes them in a manner that enhances their utility and accessibility for researchers and professionals in the biomedical field. Protein targets are systematically classified into different TDLs,² which identifies understudied targets and supports strategic research development. This structured approach provides a rich source of insights and, by leveraging the interconnected data sets, points toward innovative drug development and therapeutic strategies.

In addition to its core functionalities, the KMC has worked to expand the utility of the knowledge it manages. For example, it developed DrugCentral,³ which offers high-quality annotations of drugs, including those used in veterinary practice, and provides data regarding their mechanisms of action. Complementing this, the KMC has also developed data visualization tools, such as the Target Importance and Novelty Explorer (TIN-X)⁴ and the Target Illumination GWAS Analytics (TIGA) tools,⁵ designed to explore target importance and novelty and to illuminate targets through genome-wide association studies (GWAS) analytics, respectively. These tools provide a lens through which researchers can explore the potential significance and novelty of a target in association with a disease or phenotype.

Discussion

Target Development Level

The TDL classification system² within the KMC of the IDG is a knowledge-based classification system that streamlines the exploration and understanding of the vast, complex terrain of the human proteome. The significance of TDL stems from its ability to categorize proteins into discernible groups based on their development stage from a drug discovery perspective, effectively summarizing the levels of existing knowledge about them.² The TDL system is structured into four primary categories: Tclin, Tchem, Tbio, and Tdark. Tclin are mechanism of action (MoA) biomolecular targets (specifically, proteins) through which approved drugs exert their therapeutic effects. Tchem proteins, although not known as MoA targets, can bind small molecules with significant potency, making them potential candidates for drug development. Tbio proteins might not be known to bind small molecules or act as MoA targets, but are proteins with well-documented biological roles. Lastly, Tdark proteins, most pertinent to the mission of IDG, are those that remain largely unexplored, failing to meet the criteria of the other three categories and thereby representing a frontier of potential new discoveries. By systematically categorizing proteins into these distinct TDL categories, the KMC guides researchers through the human proteome, thereby directing the exploration of the ‘dark genome’.
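The ordering of the four categories described above can be read as a simple decision cascade. The following Python snippet is a minimal sketch of that reading only: the `TargetEvidence` class, its boolean fields, and the placeholder symbols are hypothetical names introduced here for illustration, and the actual IDG criteria (e.g., target family-specific potency cut-offs) are more detailed than three yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    """Hypothetical, simplified evidence flags for one protein (illustration only)."""
    symbol: str
    is_moa_target_of_approved_drug: bool = False
    binds_small_molecule_potently: bool = False
    has_documented_biology: bool = False


def assign_tdl(target: TargetEvidence) -> str:
    """Return the first TDL category whose simplified criterion is met."""
    if target.is_moa_target_of_approved_drug:
        return "Tclin"   # MoA target of at least one approved drug
    if target.binds_small_molecule_potently:
        return "Tchem"   # potent small-molecule binding, but no MoA annotation
    if target.has_documented_biology:
        return "Tbio"    # well-documented biological role only
    return "Tdark"       # meets none of the criteria above


if __name__ == "__main__":
    # Placeholder symbols, not real classifications.
    examples = [
        TargetEvidence("TARGET_A", is_moa_target_of_approved_drug=True),
        TargetEvidence("TARGET_B", binds_small_molecule_potently=True),
        TargetEvidence("TARGET_C", has_documented_biology=True),
        TargetEvidence("TARGET_D"),
    ]
    for t in examples:
        print(t.symbol, assign_tdl(t))
```

Because the checks run from Tclin down to Tdark, a protein that satisfies several criteria is assigned the most advanced level, which matches the way each category above is defined by exclusion of the preceding ones.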

The four-category TDL stratification has been used to annotate and cross-reference schizophrenia-associated genes and clinically repurposed drugs.⁶ Using this workflow, the authors identified several Tdark genes (e.g., ZSCAN31, ZSCAN23, TSNARE1, and BORCS7) that have strong genetic association scores with schizophrenia; since March 2022 (when the paper was published), all these targets have been recategorized as Tbio, which suggests increased interest in their biology. Using a similar cross-referencing and annotation approach, but with a focus on oncology, Jiang et al. identified 784 genes (out of 6083 potentially druggable genes) necessary for cancer cell growth⁷; this resource, based, in part, on TDLs and Pharos, was made available in 2022 as ‘The Cancer Druggable Gene Atlas’ (TCDA; http://fcgportal.org/TCDA/). By integrating genetic evidence, druggability, and tissue specificity, Ryaboshapkina and Hammar noted that at least 284 tissue-specific genes are classified as Tdark and could present opportunities for target discovery.⁸ Thus, combining TDL categories with other techniques offers a promising approach to uncovering novel therapeutic targets in various diseases.

DrugCentral

DrugCentral, a publicly accessible digital repository (https://drugcentral.org) of drug-related information established in 2016,⁹ has consistently gathered data from three prominent regulatory bodies: the Food and Drug Administration (FDA) in the USA (www.fda.gov/home), the European Medicines Agency (EMA) in Europe (www.ema.europa.eu/en), and the Pharmaceuticals and Medical Devices Agency (PMDA) in Japan (www.pmda.go.jp/english/index.html). This resource furnishes accurate data suitable for both preclinical research and clinical practice. It interconnects chemical structures, molecular physicochemical attributes, and patent statuses with bioactivity information and molecular targets. Through curation of drug labels and scientific literature, DrugCentral encompasses authorized therapeutic applications, off-label uses, and contraindications. Whenever feasible, mechanistic targets and bioactive properties are annotated. In addition to pharmacodynamic data, DrugCentral delivers a range of standardized pharmacokinetic descriptors. Postmarket drug incidents stored in the database are identified by applying statistical signal detection analyses to raw pharmacovigilance data. Notably, drug products sanctioned in the USA are detailed with pharmaceutical formulations, concentrations, and administration routes. Collectively, this repository functions as a comprehensive, searchable online drug compendium.

Since its initial introduction and inclusion in the 2017 Nucleic Acids Research (NAR) database edition, DrugCentral has undergone two major updates, in 2018¹⁰ and 2021,¹¹ broadening its utility to encompass research domains such as drug repositioning,¹² exploration of sex-based adverse drug events, and identification of anti-coronavirus disease 2019 (COVID-19) compounds. Consequently, DrugCentral has evolved into an indispensable resource for the scientific community, closely linked with well-established references, such as UniProt,¹³ ChEBI,¹⁴ Guide to Pharmacology,¹⁵ UniChem,¹⁶ Probes & Drugs portal (P&D),¹⁷ PhenCards,¹⁸ and COVID19db.¹⁹ Furthermore, it is an integral component of the KMC Datasets and Tools²⁰ within the NIH Common Fund’s IDG consortium (https://commonfund.nih.gov/idg).

The current 2023 DrugCentral update delineates the additional data since its last published version in 2021, alongside the incorporation of fresh attributes.¹ Foremost, the primary content has been augmented by the inclusion of 285 drugs newly approved up until March 31, 2022. These approvals are categorized as follows: 101 by the FDA, 48 by EMA, and 47 by PMDA (Table 1). DrugCentral 2023 introduces novel features, including: (i) information regarding veterinary drugs; (ii) documentation of off-label drug applications not documented in conventional medical references²¹; and (iii) documentation of adverse drug events pertaining to pediatric and geriatric medicine.

Table 1.

Brief description of DrugCentral (includes drugs approved up to August 31, 2023)

Data	Counts	Description
Drugs	4995	Approvals: 2570 (FDA), 633 (EMA), 515 (PMDA); Use: 4841 human and 396 veterinary; Type: 4107 small molecules,^a 404 biologics,^b 489 other; Repurposing categories^c: 1090 OFP, 413 OFM, 413 ONP; 30 835 pharmacological classifications; 82 230 external identifiers
Bioactivity values	19 375	2310 drugs and 2937 protein targets^d; 2010 MoA activity values (7.92±1.589)^e
Protein targets	3192	731 MoA targets; Top drug classes: enzymes (1211), kinases (497), GPCRs (463), ion channels (352)
Pharmacology data	51 449	2635 indications, 1492 contraindications, 861 off-label uses; 9142 pharmacokinetic values for 1876 drugs
Pharmacovigilance data	227 408^f	Sex-based: 48 870 female, 24 693 male; Age-based: 63 633 geriatric, 2 pediatric, 90 210 neutral
Drug products	152 476	152 476 for human and 1636 for veterinary use; 124 132 drug labels
References	1195	560 journal articles, 624 drug labels

^a Organic drugs with molecular weight between 50 and 1250 AMU.

^b Peptides, monoclonal antibodies, antibody–drug conjugates, proteins, and oligonucleotides.

^c OFP, on-market drugs with expired patent and exclusivity coverage; OFM, off-market drugs; ONP, on-market drugs covered by active patent and exclusivity.³⁴

^d UniProt accession IDs.¹³

^e Mean and standard deviation.

^f Log-likelihood ratios (LLRs) at least five times higher than the calculated drug-specific threshold values (LLRT).³
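Two of the table footnotes state rules that are simple enough to write down directly: the repurposing categories (OFP, OFM, ONP) and the filter applied to pharmacovigilance signals (LLR at least five times the drug-specific threshold LLRT). The Python sketch below encodes only those stated rules; the function names, argument names, and the default `factor` parameter are assumptions made for illustration and do not reflect how DrugCentral computes market status, patent coverage, or the LLR values themselves.

```python
def repurposing_category(on_market: bool, patent_or_exclusivity_active: bool) -> str:
    """Map market and patent status to the category codes defined in the footnote."""
    if not on_market:
        return "OFM"  # off-market drug
    if patent_or_exclusivity_active:
        return "ONP"  # on-market, covered by active patent and exclusivity
    return "OFP"      # on-market, patent and exclusivity coverage expired


def passes_signal_filter(llr: float, llrt: float, factor: float = 5.0) -> bool:
    """Keep a drug-event signal only if LLR is at least `factor` times LLRT."""
    return llr >= factor * llrt


if __name__ == "__main__":
    print(repurposing_category(on_market=True, patent_or_exclusivity_active=False))
    print(passes_signal_filter(llr=60.0, llrt=10.0))
    print(passes_signal_filter(llr=30.0, llrt=10.0))
```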

The 2023 version of DrugCentral adds 285 new drugs relative to the 2021 publication.³ Of these, 131 were approved only for human use and 154 only for veterinary use (Figure 2a). In addition, 242 drugs already stored in DrugCentral are now associated with both human and veterinary approvals (Figure 2b).
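The figure of 242 drugs carrying both human and veterinary approvals can be cross-checked against the use counts in Table 1 by inclusion-exclusion. The short Python check below is a sketch under one assumption that the source does not state explicitly: that every one of the 4995 drugs is counted under human use, veterinary use, or both.

```python
# Counts copied from Table 1 and the paragraph above.
TOTAL_DRUGS = 4995
HUMAN_USE = 4841
VETERINARY_USE = 396
NEW_DRUGS = 285
NEW_HUMAN_ONLY = 131
NEW_VETERINARY_ONLY = 154


def overlap(total: int, a: int, b: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return a + b - total


both = overlap(TOTAL_DRUGS, HUMAN_USE, VETERINARY_USE)
human_only = HUMAN_USE - both
veterinary_only = VETERINARY_USE - both

print("human and veterinary:", both)        # 242
print("human only:", human_only)            # 4599
print("veterinary only:", veterinary_only)  # 154
print("new drugs add up:", NEW_HUMAN_ONLY + NEW_VETERINARY_ONLY == NEW_DRUGS)  # True
```

Under that assumption, the veterinary-only count derived from the table (154) equals the number of newly added veterinary-only drugs, and the overlap matches the 242 drugs reported in the text.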

Figure 2. **(a)** Overlap between drugs approved for human and veterinary use. **(b)** Number of drugs in DrugCentral from 2017 to 2023.

Azaperone provides an example of the utility of DrugCentral. It is a butyrophenone invented by Paul Janssen as part of a series of neuroleptic drugs (US Patent 2979508). Wikipedia states that azaperone is ‘uncommonly used in humans as an antipsychotic drug’ (https://en.wikipedia.org/wiki/Azaperone). However, exhaustive literature searches, including Dutch pharmaceutical references from the 1960s and 1970s, suggest that this drug did not receive regulatory approval for human use (I.M. van Geijlswijk, personal communication, 2023). Azaperone (Stresnil®) is used to reduce fear and aggression in recently mixed groups of pigs, based on a 1968 report showing its sedative effect in swine,²² and is the only sedative currently approved for pigs. Its DrugCentral record indicates that this is a veterinary drug only (https://drugcentral.org/drugcard/5590).

In the future, DrugCentral aims to continue providing highly curated data regarding drug approvals, clinical trial data, and scholarly publications.

Key resources generated from the Knowledge Management Center

Pharos

Pharos, a pivotal component of the IDG project, is a user-oriented web interface that facilitates access to, and visualization of, the data housed within the TCRD.¹ Accessible at https://pharos.nih.gov/, it bridges the extensive, multifaceted data aggregated in the TCRD (http://juniper.health.unm.edu/tcrd/) and the researchers who seek to use these data to explore the human proteome, disease biology, and drug discovery. Recently, Pharos and TCRD prioritized the development and refinement of visualization and analysis tools designed to reveal patterns and insights in the compiled data. These tools allow users to perform enrichment calculations on specific subsets of targets, diseases, or ligands, and to generate interactive heat maps and UpSet charts that aid the visualization and comprehension of various types of annotation. Pharos thus supports disease biology and drug discovery investigations through data exploration, enrichment calculations, and data visualization.

Target Importance and Novelty Explorer

TIN-X is a public web application, developed under the IDG consortium, for identifying, visualizing, and exploring associations between targets and diseases (https://newdrugtargets.org/).⁴ It uses the PubMed named entity recognition (NER) data provided by the Center for Protein Research (CPR) and integrates them with the TCRD¹ to identify understudied proteins. Figure 3 shows the data sources and flow of information within TIN-X, which has been continuously accessible at https://newdrugtargets.org since its initial 2017 release; the source code for its application programming interface (API) and user interface (UI) is available at https://github.com/unmtransinfo/tinx-api and https://github.com/unmtransinfo/tinx-ui, respectively. The recently released version 3.0 involved troubleshooting, bug fixes, software dependency upgrades, and the incorporation of user-suggested enhancements, resulting in a platform with an expanded data set, contemporary architecture, a REST API, an open-source repository, a cloud-based database, and novel UI features. Thus, TIN-X continues to serve both as a tool for elucidating understudied drug targets and as a general resource for scientists.

Figure 3. — Data sources and the flow of information within the Target Importance and Novelty Explorer (TIN-X) informatics workflow. Users access the TIN-X user interface (UI) via a web browser. Unlike earlier versions of TIN-X, the UI and the REST application programming interface (API) are separate components. User activity on the TIN-X public web application results in API requests, which, in turn, access the TIN-X database. Users can query the data directly via the REST API, which is supported by Swagger documentation. The TIN-X UI, API, and database are all hosted in the cloud using Amazon Web Services. The TIN-X Database relies on the Target Central Resource Database (TCRD) for target data and the JensenLab DISEASES resource for text-mined PubMed content. Blue arrows depict the flow of data from these two major sources to the TIN-X database. Abbreviations: GPCR, G-protein-coupled receptor; ION, ion channels; OGPCR, orphan G-protein-coupled receptor; NR, nuclear receptor.

Target Illumination GWAS Analytics

Target Illumination GWAS Analytics (TIGA) is a data analysis tool tailored to elucidate connections between protein targets and specific traits or diseases using GWAS data (https://unmtid-shinyapps.net/shiny/tiga/).⁵ TIGA processes GWAS data, identifying and illuminating associations between genetic variants and traits or diseases. It operates by analyzing single nucleotide polymorphisms (SNPs) and their related genomic data to associate potential protein targets with disease phenotypes, providing researchers with a data-driven base to explore and validate potential therapeutic targets. By deciphering and visually representing these associations, TIGA enables researchers to explore the potential genetic basis of various diseases.

DISEASES

The DISEASES database is a weekly updated resource providing more than 4 million associations between human diseases and genes.²³ Although most of these come from automatic text mining of the biomedical literature, the resource also integrates manually curated associations and GWAS-derived associations from TIGA.⁵ DISEASES can be queried through a web interface at https://diseases.jensenlab.org/, from where it is also available for bulk download. It also provides disease–gene associations for gene-centric databases, such as Pharos¹ and GeneCards/MalaCards,²⁴ and is used for disease enrichment analysis by Enrichr²⁵ and the STRING database.²⁶ Furthermore, DISEASES facilitates retrieval of protein interaction networks for any diseases of interest from the STRING database. This can be done either through the web interface of the latest version of STRING or by using Cytoscape stringApp, which also integrates TDL information.²⁷

Patent data mining

The KMC also sought to acquire bioactivity data against targets classified as Tdark or Tbio according to the criteria established by the IDG Consortium.² Because such data were unavailable in the ChEMBL database,²⁸ this work pivoted toward the patent literature, concentrating on all life science-relevant patents from 2012 to 2018 in the SureChEMBL database,²⁹ totaling over 3.7 million. Methods were developed to flag patents containing tables with bioactivity keywords, such as ‘IC50’, and to find Tdark/Tbio targets cited in contexts suggestive of valuable assays. Patents were consequently classified into six groups that were used as starting points for prioritization, leading to the manual curation of 191 patents, which revealed bioactivity data against 155 Tdark or Tbio targets. This work, detailed in a published study,³⁰ also produced a larger data set of ‘positive’ and ‘negative’ patents, which was used to develop and train two naïve Bayes models aimed at refining the patent prioritization process. Application of this method to patents from 2019 to 2021 uncovered an additional 92 patents with bioactivity data against 144 Tdark/Tbio targets. Notably, 22 targets had bioactivity values within the target family-specific cut-off limits required for Tchem classification. The outcomes have not only shed light on understudied targets, but also demonstrated the relevance of the patent literature as a data source for such targets; the identified information, such as compounds and activity values, is now integrated into the ChEMBL database.
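The two ideas in this workflow, keyword flagging of patent tables and naïve Bayes prioritization trained on ‘positive’ and ‘negative’ patents, can be illustrated with a toy example. The Python sketch below is not the published pipeline: the keyword list beyond ‘IC50’, the tokenizer, the four training snippets, and the single multinomial naïve Bayes classifier with add-one smoothing are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

# 'IC50' comes from the text; the other keywords are illustrative assumptions.
BIOACTIVITY_KEYWORDS = re.compile(r"\b(IC50|EC50|Ki)\b", re.IGNORECASE)


def has_bioactivity_keyword(table_text: str) -> bool:
    """Flag table text that mentions a bioactivity keyword."""
    return bool(BIOACTIVITY_KEYWORDS.search(table_text))


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_scores(self, doc):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][word]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical training snippets standing in for curated patents.
TRAIN_DOCS = [
    "table of IC50 values for inhibitors in a binding assay",
    "compounds tested in enzymatic assay IC50 reported",
    "method of manufacturing a tablet coating",
    "device for packaging pharmaceutical tablets",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)

if __name__ == "__main__":
    print(has_bioactivity_keyword("Compound | IC50 (nM)"))
    print(model.predict("IC50 values from kinase inhibition assay"))
    print(model.predict("tablet packaging device"))
```

In a real setting the classifier score would be used to rank patents for manual curation rather than to make a final decision, which is consistent with the prioritization role described above.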

Case study

Identifying targets for drug discovery is a complex task, especially when linking potential targets to specific medical conditions. This is particularly true in precision medicine, where exploring the less understood parts of the genome is crucial. An example of this process is the use of Pharos to pinpoint new targets in the lesser known areas of the genome, focusing on a disease such as dilated cardiomyopathy (DCM). A case study reveals the significance of leucine-rich repeat-containing protein 10 (LRRC10) in relation to DCM.³¹ LRRC10 is predicted by TIN-X as the most ‘important’ Tdark protein (excluding kinases and transcription factors) (Figure 4). According to the TIN-X analysis, this protein is mentioned in ten PubMed abstracts concerning DCM. Further investigations within Pharos reveal that genetic studies link LRRC10 (with a notable mean rank score) to ‘Atrial Fibrillation,’ a known adverse factor in DCM prognosis.³² Additionally, a specific study identified through the TIN-X analysis describes whole-exome sequencing of a young patient with DCM and their parents, which uncovered a distinct LRRC10 variant (I195T) and hints at a role for LRRC10 in cardiac L-type calcium channels, potentially influencing DCM development.³³ This variant appears likely to impact channel function, contributing to the pathophysiology of the disease. Thus, targeting this specific LRRC10 variant could be a promising therapeutic strategy for DCM treatment in the future.

Figure 4. — Proteins that are associated with the disease of interest are analyzed on the Importance–Novelty plot generated from the Target Importance and Novelty Explorer (TIN-X) app within Pharos. Proteins that have more evidence of association with the disease are plotted in the upper part of the plot. Protein–disease associations of immediate interest are usually placed on the upper right area of the TIN-X plot and represent potential targets with high novelty. We have restricted our analysis to Tdark proteins (black markers).
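To make the two axes of the Importance–Novelty plot more concrete, the Python sketch below scores targets from a handful of made-up abstracts. The scoring scheme is an assumption, since this article does not give the formulas: each abstract contributes a fractional count split among the targets (and, for importance, the diseases) it mentions, novelty is taken as the inverse of a target's total fractional mentions, and importance as its fractional co-mentions with the disease. The gene and disease labels are placeholders, and the TIN-X publication⁴ should be consulted for the definitions actually used.

```python
from collections import defaultdict

# Hypothetical abstracts: (targets mentioned, diseases mentioned).
ABSTRACTS = [
    ({"GENE_A"}, {"DCM"}),
    ({"GENE_A", "GENE_B"}, {"DCM"}),
    ({"GENE_B"}, {"DCM", "AF"}),
    ({"GENE_B"}, {"AF"}),
    ({"GENE_B"}, set()),
]


def importance_and_novelty(abstracts, disease):
    """Return {target: (importance for `disease`, novelty)} using fractional counts."""
    mentions = defaultdict(float)    # fractional mentions of each target
    importance = defaultdict(float)  # fractional co-mentions with the disease
    for targets, diseases in abstracts:
        for target in targets:
            mentions[target] += 1.0 / len(targets)
            if disease in diseases:
                importance[target] += 1.0 / (len(targets) * len(diseases))
    # Fewer fractional mentions overall means a more novel target.
    return {t: (importance[t], 1.0 / mentions[t]) for t in mentions}


scores = importance_and_novelty(ABSTRACTS, "DCM")

if __name__ == "__main__":
    for target, (imp, nov) in sorted(scores.items()):
        print(f"{target}: importance={imp:.2f}, novelty={nov:.2f}")
```

In this toy data set, GENE_A is mentioned less often overall but more consistently alongside DCM, so it would sit further toward the upper right of the plot than GENE_B.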

Concluding remarks

The KMC for the IDG initiative has a pivotal role in bridging the gap between protein-centric data and actionable biomedical knowledge, focusing particularly on the elucidation of understudied proteins within three major IDG protein families (kinases, ion channels, and GPCRs). Notably, the systematic approach to categorizing proteins via the TDL system illuminates both well-studied and underexplored proteins, thus encouraging exploration of the ‘dark genome’. Moreover, the continual evolution and enhancement of resources such as DrugCentral highlight the value of KMC to the biomedical community. Ultimately, the KMC both ‘illuminates’ potential proteins for innovative drug development and therapeutic strategies and establishes itself as a resource that catalyzes advances in understanding the human proteome and disease biology and facilitates drug discovery and development.

Acknowledgments

We would like to acknowledge our various team members throughout the years who have contributed to the success of KMC (U24CA224370). Furthermore, we are grateful for the fruitful discussions and engagements with all the IDG program awardees and IDG NIH staff, which have enriched our understanding of these data sets and allowed for more constructive data analysis products. This work was supported, in part, by the National Center for Advancing Translational Sciences, NIH (ZICTR000410-03).

Footnotes

Declaration of Generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the author(s) used OpenAI to edit already written text. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Kelleher KJ, Sheils TK, Mathias SL, Yang JJ, Metzger VT, Siramshetty VB, et al. Pharos 2023: an integrated resource for the understudied human proteome. Nucleic Acids Res. 2023; 51: D1405–D1416.
2. Oprea TI, Bologa CG, Brunak S, Campbell A, Gan GN, Gaulton A, et al. Unexplored therapeutic opportunities in the human genome. Nat Rev Drug Discov. 2018; 17: 317–332.
3. Avram S, Wilson TB, Curpan R, Halip L, Borota A, Bora A, et al. DrugCentral 2023 extends human clinical data and integrates veterinary drugs. Nucleic Acids Res. 2023; 51: D1276–D1287.
4. Cannon DC, Yang JJ, Mathias SL, Ursu O, Mani S, Waller A, et al. TIN-X: target importance and novelty explorer. Bioinformatics. 2017; 33: 2601–2603.
5. Yang JJ, Grissa D, Lambert CG, Bologa CG, Mathias SL, Waller A, et al. TIGA: target illumination GWAS analytics. Bioinformatics. 2021; 37: 3865–3873.
6. Lago SG, Bahn S. The druggable schizophrenia genome: from repurposing opportunities to unexplored drug targets. NPJ Genom Med. 2022; 7: 25.
7. Jiang J, Yuan J, Hu Z, Zhang Y, Zhang T, Xu M, et al. Systematic illumination of druggable genes in cancer genomes. Cell Rep. 2022; 38: 110400.
8. Ryaboshapkina M, Hammar M. Tissue-specific genes as an underutilized resource in drug discovery. Sci Rep. 2019; 9: 7233.
9. Ursu O, Holmes J, Knockel J, Bologa CG, Yang JJ, Mathias SL, et al. DrugCentral: online drug compendium. Nucleic Acids Res. 2017; 45: D932–D939.
10. Ursu O, Holmes J, Bologa CG, Yang JJ, Mathias SL, Stathias V, et al. DrugCentral 2018: an update. Nucleic Acids Res. 2019; 47: D963–D970.
11. Avram S, Bologa CG, Holmes J, Bocci G, Wilson TB, Nguyen DT, et al. DrugCentral 2021 supports drug discovery and repositioning. Nucleic Acids Res. 2021; 49: D1160–D1169.
12. Halip L, Avram S, Curpan R, Borota A, Bora A, Bologa C, et al. Exploring DrugCentral: from molecular structures to clinical effects. J Comput Aided Mol Des. 2023; 37: 681–694.
13. UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023; 51: D523–D531.
14. Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016; 44: D1214–D1219.
15. Alexander SPH, Kelly E, Mathie A, Peters JA, Veale EL, Armstrong JF, et al. The Concise Guide To Pharmacology 2019/20: Introduction and Other Protein Targets. Br J Pharmacol. 2019; 176 (Suppl 1): S1–S20.
16. Chambers J, Davies M, Gaulton A, Hersey A, Velankar S, Petryszak R, et al. UniChem: a unified chemical structure cross-referencing and identifier tracking system. J Cheminform. 2013; 5: 3.
17. Škuta C, Southan C, Bartůněk P. Will the chemical probes please stand up? RSC Med Chem. 2021; 12: 1428–1441.
18. Havrilla JM, Liu C, Dong X, Weng C, Wang K. PhenCards: a data resource linking human phenotype information to biomedical knowledge. Genome Med. 2021; 13: 91.
19. Zhang W, Zhang Y, Min Z, Mo J, Ju Z, Guan W, et al. COVID19db: a comprehensive database platform to discover potential drugs and targets of COVID-19 at whole transcriptomic scale. Nucleic Acids Res. 2022; 50: D747–D757.
20. Kropiwnicki E, Binder JL, Yang JJ, Holmes J, Lachmann A, Clarke DJB, et al. Getting started with the IDG KMC datasets and tools. Curr Protoc. 2022; 2: e355.
21. Avram S, Halip L, Curpan R, Borota A, Bora A, Oprea TI. Annotating off-label drug usage from unconventional sources. medRxiv. Published online September 9, 2022. doi: 10.1101/2022.09.08.22279709.
22. Marsboom R, Symoens J. Ervaringen met azaperone (R1929*) als sedativum bij het varken. Tijdschr Diergeneeskd. 1968; 93: 3–15.
23. Grissa D, Junge A, Oprea TI, Jensen LJ. Diseases 2.0: a weekly updated database of disease–gene associations from text mining and data integration. Database. 2022; 2022: baac019.
24. Safran M, Rosen N, Twik M, BarShir R, Iny ST, Dahary D, et al. The GeneCards Suite. In: Abugessaisa I, Kasukawa T (eds) Practical Guide to Life Science Databases. Singapore: Springer, 2022: 27–56.
25. Evangelista JE, Xie Z, Marino GB, Nguyen N, Clarke DJB, Ma’ayan A. Enrichr-KG: bridging enrichment analysis across multiple libraries. Nucleic Acids Res. 2023; 51: W168–W179.
26. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023; 51: D638–D646.
27. Doncheva NT, Morris JH, Holze H, Kirsch R, Nastou KC, Cuesta-Astroz Y, et al. Cytoscape stringApp 2.0: analysis and visualization of heterogeneous biological networks. J Proteome Res. 2023; 22: 637–646.
28. Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Félix E, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019; 47: D930–D940.
29. Papadatos G, Davies M, Dedman N, Chambers J, Gaulton A, Siddle J, et al. SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res. 2016; 44: D1220–D1228.
30. Magariños MP, Gaulton A, Félix E, Kiziloren T, Arcila R, Oprea TI, et al. Illuminating the druggable genome through patent bioactivity data. PeerJ. 2023; 11: e15153.
31. Oprea TI. Exploring the dark genome: implications for precision medicine. Mamm Genome. 2019; 30: 192–200.
32. Nuzzi V, Cannatà A, Manca P, Castrichini M, Barbati G, Aleksova A, et al. Atrial fibrillation in dilated cardiomyopathy: outcome prediction from an observational registry. Int J Cardiol. 2021; 323: 140–147.
33. Woon MT, Long PA, Reilly L, Evans JM, Keefe AM, Lea MR, et al. Pediatric dilated cardiomyopathy-associated LRRC10 (leucine-rich repeat-containing 10) variant reveals LRRC10 as an auxiliary subunit of cardiac l-type Ca2+ channels. J Am Heart Assoc. 2018; 7: e006428.
34. Avram S, Curpan R, Halip L, Bora A, Oprea TI. Off-patent drug repositioning. J Chem Inf Model. 2020; 60: 5746–5753.

