Abstract
Compound databases (DBs) are essential tools for drug discovery. The number of DBs in the public domain is increasing, so it is important to analyze them. This article presents the main characteristics of 64 DBs, identified through a literature search. To analyze these characteristics, the DBs were divided into two categories: open access and commercial DBs. The open access category includes generalist DBs (containing compounds of diverse origins), DBs with specific applicability, DBs exclusive to natural products and DBs containing compounds with specific pharmacological action. The literature review showed that there are challenges to making these repositories available, such as standardizing information curation practices and securing funding to maintain and sustain them.
Keywords: chemical library, database, drug design, virtual screening
Plain language summary
Executive summary.
Background
Compound databases (DBs) are crucial tools in the development of multidisciplinary research fields, such as drug discovery. The number of DBs of public domain compounds is increasing. However, building and curating these libraries is a critical process, as the data must be diverse and reliable to enable safe trials. It is therefore of utmost importance to assemble and analyze compound DBs with regard to their structure and the legitimacy of the structures they contain. This article presents an overview of compound DBs in order to assess and analyze their constitutional characteristics, curation strategies and contributions to the screening of new drugs, as well as to other research areas.
Methodology
As a methodological strategy, a comprehensive bibliographic search was carried out to extract relevant DBs of compounds.
Results & discussion
In total, 34 articles containing diverse information about 64 DBs were manually selected. Most of these DBs were developed in the period 2006–2010, mainly in China, and were built for the most part by universities and research institutes. These DBs serve various purposes, from repositories of curated 3D structures to DBs applied to the research and development of new drugs for specific diseases, such as collections of potentially bioactive compounds with antiviral action. Most of the DBs are open access, which extends their reach to thousands of users. The information they contain was manually extracted from the literature, has largely undergone curation and covers physicochemical properties, spectral data, chemical space configuration, in silico prediction studies, in vitro assays and bioactivity. Several DBs were also found to incorporate chemoinformatics tools to query, mine and analyze the contents of the library.
Conclusion
In this article an overview of the DBs of compounds was presented in order to assess and analyze their constitutional characteristics, the curation strategies and the contributions to the screening of new drugs, as well as other research areas. This literature review has also shown that there are still challenges in making these repositories available, such as standardizing procedures for curating information and materials, and attracting funding and researchers to maintain and sustain them.
1. Background
Since the dawn of society, people have used information derived from the data generated by their tasks, from the simplest to the most complex. Data can be generated in any field, in different ways and from different sources. However, it is only by organizing, storing and analyzing it that valid information can be produced [1]. When a large amount of data is generated, the first step in organizing it is to structure it in systems called databases (DB) [2].
DBs have played a crucial role as centers for the acquisition, organization and distribution of knowledge aimed at solving questions related to the most important human and environmental problems. For example, the taxonomy and ecology repositories Species 2000, the Catalogue of Life and the Global Biodiversity Information Facility (GBIF) provide cataloging services and address questions of species distribution and occurrence in unique ecological niches [3].
DBs have also been crucial in the development of multidisciplinary research fields such as drug discovery, medicinal chemistry, chemosystematics, ethnopharmacology and ‘omics’ approaches. For example, many genomic studies, such as human genome mapping, have used GenBank and the DNA Data Bank of Japan (DDBJ). Many pharmacological, computational and proteomic studies have employed the Protein Data Bank (PDB), the Human Proteome Map and PeptideAtlas. In addition, several metabolomic studies have relied on the Human Metabolome Database (HMDB), Golm Metabolome Database, Global Natural Product Social Molecular Networking (GNPS), METLIN, Biological Magnetic Resonance Bank (BMRB) and MassBank. Also, the chemical and biological properties contained in compound libraries, including PubChem, ChemSpider, ZINC, PK/DB, BindingDB, ChemBank, ChEMBL and DrugBank, are widely used in drug discovery projects [3].
The importance of DBs in new drug discovery projects is continuously increasing, beyond their role as compound repositories. In fact, compound DBs and chemical datasets are a centerpiece in pharmaceutical companies as well as in academic and government research centers [4]. Compounds present in DBs have already led to the development of drugs in clinical use to treat different diseases. Several research groups have recently used computational methodologies to screen large compound DBs before designing and carrying out experimental screening [5].
In this respect, the number of compound DBs in the public domain, including those for compounds of natural origin, is increasing. This is in line with the growing and synergistic combination of natural product research and chemoinformatics [6]. Some results of this promising conjunction have already been observed. For example, a theoretical study by Nuñez et al. evaluated 3444 Latin American natural products using chemoinformatics tools [7].
It is therefore extremely important to analyze the compound DBs by means of a literature review. This strategy organizes the data and highlights the functionalities of these DBs. As a result, researchers have an overview of the DBs currently available. They can then look for those that will effectively contribute to their drug design projects. In addition to analyzing, this literature review also categorizes the DBs.
This article describes the objectives, characteristics and functionalities of 64 compound DBs. To describe these characteristics, the DBs were categorized into two subsections: open access DBs and commercial DBs. The open access DBs include the generalist DBs (both compounds and natural products), the DBs with a specific applicability, the DBs exclusively for natural products and those that contain compounds with a specific pharmacological action.
This characterization showed that compound repositories should provide standardized and useful information about the compounds they contain, with a view to diversified multidisciplinary research and product and drug development. In addition, the major challenge for these repositories is to standardize procedures for curating information and compounds, and to provide financial resources and researchers for their maintenance and sustainability.
2. Methodology
A comprehensive literature search was conducted to identify relevant articles about compound DBs. The search covered indexed libraries, such as Science Direct and PubMed, and official compound DB websites. It was performed manually, in Portuguese and English, with combinations of keywords (such as DB, compounds, natural products and virtual screening) chosen according to Medical Subject Headings (MeSH) terms and joined with Boolean operators (AND, OR). Publications were included if they were full articles describing the history and purpose of these DBs, as well as the number of compounds and the processes and resources used to maintain, update and curate the information. In total, 34 articles containing diverse information on 62 DBs were manually selected. Descriptive statistical analysis was then used to organize, summarize, describe and compare the important characteristics observed among these DBs [8].
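The keyword combination described above can be sketched as follows. This is only an illustration and not the search string actually used in the review: the grouping of terms into concepts and the helper name `build_boolean_query` are assumptions, and the terms are taken from the keywords named in the text.

```python
def build_boolean_query(concept_groups):
    """Join terms within a group with OR, then join the groups with AND."""
    clauses = []
    for group in concept_groups:
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{term}"' if " " in term else term for term in group]
        clause = " OR ".join(terms)
        # Parenthesize only when a group holds several alternatives.
        if len(terms) > 1:
            clause = f"({clause})"
        clauses.append(clause)
    return " AND ".join(clauses)


# Hypothetical grouping of the keywords named in the text (assumption).
concept_groups = [
    ["database", "chemical library"],
    ["compounds", "natural products"],
    ["virtual screening"],
]

query = build_boolean_query(concept_groups)
print(query)
```

Synonyms are joined with OR to widen recall within one concept, while AND between concepts narrows the result set to articles that address all of them.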
3. Results & discussion
3.1. Compounds database evaluation
The bibliographical survey yielded relevant and reliable information about the DBs, such as their names and electronic addresses, the institutions that provide them and the number of compounds they contain. This information was organized and is described in Supplementary Table S1.
All the selected DBs were also analyzed according to the following parameters: access (free or not); DB purpose; number of data entries or compounds; availability of access to other DBs and publications; and additional information (need for registration, tutorials and contacts). The use of statistical, chemometric and other tools was also verified.
According to Supplementary Table S1, most of the DBs are open access, which extends their reach to thousands of users from academia, industry and other entities and consequently boosts, for example, the discovery of potential drugs. These DBs now also bring together medicinal chemistry knowledge and applications of artificial intelligence, offering users not only quality content but also the ability to interact with the data and integrate them more easily.
The growth in the number of DBs over the past 30 years is clearly shown in Figure 1. Only DBs dating from 1998 onward could be found. Of the 49 DBs with a reported development date, 67.35% (n = 33) were released between 2006 and 2020. DBs such as ChEMBL and DrugBank were developed in this period and are important repositories that are continuously fed with information from updated references. The latest version of DrugBank (DrugBank 5.0) includes data on hundreds of clinical trials of investigational drugs, as well as repurposing trials, which contributes greatly to optimizing the development of new drugs and of drugs discontinued from clinical use [9].
Figure 1.

Relative frequency of compound databases developed in the period 1995 to 2023.
DB: Database.
Against this backdrop, the analysis and understanding of large amounts of data, as in the mapping of human DNA and the development of new drugs for clinical use, have increased significantly over the past three decades and have become relevant to decision-making and to the development of products such as medicines. For example, a current trend in drug discovery is the repositioning of compounds already established in clinical practice, whose accuracy and safety are known, in order to save time and financial resources in the development of new drugs [10].
Not only the creation, but also the continuous development of compound DBs is essential, both in response to significant improvements in web standards and to changes in drug research and development. In this sense, the updating of not only the quantity but also the quality and consistency of the data should be carried out by curation teams composed of qualified technical staff, at increasingly regular intervals and with shorter response times. These strategies guarantee the credibility of the information provided to users.
There are several examples of DBs that support policies aimed at continuous updating and curation of information. DrugBank 5.0 (2018) represents the most significant update to this DB in the last 10 years, with existing data increasing by 100% or more compared with the previous update. Many other important improvements have been made to the content, interface and performance of the DrugBank website, increasing its usability, usefulness and potential applications in many areas of pharmaceutical science research.
Another example is the latest update of the Human Metabolome Database (HMDB), HMDB 5.0, which brings a number of important improvements and updates to the DB, making HMDB more useful and attractive to users. These updates have broadened applications not only in human metabolism, but also in exposomics, lipidomics, nutrition, biochemistry and the clinic [9].
3.2. Origin of databases
Regarding continent of origin, most of the researched DBs come from Asia (n = 18, 29%), Europe (n = 16, 26%) and North America (n = 11, 17.6%). The countries responsible for developing the largest numbers of these DBs are the USA (n = 8), China (n = 8), Germany (n = 7) and the UK (n = 7) (Figure 2).
Figure 2.

Frequency distribution of database origin. (A) Relative frequency distribution of DB origin by continent. (B) Absolute frequency of DBs developed by country of origin.
DB: Database.
Historically, the global pharmaceutical landscape has been dominated by large multinational companies (Big Pharma), mostly of European and North American origin. However, emerging pharmaceutical markets (pharma-emerging countries) are growing, especially China, which is seen as the main driver of growth in this segment [11]. This is in line with the finding that most DBs developed in the last 30 years originate from Asia, Europe and North America: because these DBs are crucial tools for optimizing the research and development of new drugs, countries on these continents have both the interest and the need to build and maintain them. However, as shown in Figure 3, it is in the academic sector, at universities (n = 38, 65.52%) and research institutes (n = 14, 24.14%), that most DBs are developed and maintained, and these are mostly made available publicly and free of charge. The remaining DBs (10.34%) are proprietary; most are maintained by industry and their data are kept confidential.
Figure 3.

Relative frequency distribution of databases by developer and maintainer. ‘Others’ includes private companies, such as pharmaceutical enterprises.
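The developer percentages reported for Figure 3 can be cross-checked with a short calculation. The sketch below assumes that all three percentages share one denominator; the total and the count for the ‘others’ group are inferred from the reported figures and are not stated in the text.

```python
# Counts and percentages as reported in the text for Figure 3.
reported = {
    "universities": (38, 65.52),
    "research institutes": (14, 24.14),
}


def relative_frequency(n, total):
    """Relative frequency in percent, rounded to two decimals."""
    return round(100 * n / total, 2)


# Infer the total from one reported (count, percentage) pair.
# Assumption: every percentage was computed over the same total.
count, percent = reported["universities"]
total = round(count / (percent / 100))

# Whatever is not academic falls into the 'others' group (inferred).
others = total - sum(n for n, _ in reported.values())

frequencies = {
    name: relative_frequency(n, total) for name, (n, _) in reported.items()
}
frequencies["others"] = relative_frequency(others, total)

print(total, others, frequencies)
```

Under this assumption the recomputed values agree with the reported 65.52%, 24.14% and 10.34%, which suggests that the developer analysis covers the subset of DBs for which a developer could be identified.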
4. Goals & overall contents
Initially, DBs were repositories of compounds accompanied by their physicochemical properties. A wider range of objectives is now being added to these DBs, such as incorporating natural products and information about their origins, spectral data and data on the structures of cell receptors, relating specific biological activities to the compounds, and providing chemoinformatics tools that allow in silico predictions. Figure 4 shows some examples of open access DBs and their main objectives. The DBs were also checked for chemoinformatics tools: around 38% of them offer software on their platform for the calculation of physicochemical properties, the characterization of chemical space, principal component analysis (PCA) and molecular docking (Table 1).
Figure 4.

Diagram showing the relationship between open access databases and their main objectives. The DBs are categorized into generalists (DBs of compounds and natural products), DBs with specific applications (DBs of 3D curated compounds, as well as other applications as a repository for in silico studies), DBs exclusively for natural products, and DBs of compounds with specific pharmacological activity.
DB: Database.
Table 1.
Chemoinformatics tools in databases: absolute and relative frequency.
| Chemoinformatics (a) tools | Absolute frequency | Relative frequency | Tools in DBs |
|---|---|---|---|
| Yes | 24 | 38.71% | Software for calculation of physicochemical properties, for the characterization of chemical space, principal component analysis and molecular docking |
| No | 38 | 61.29% | |
| Total | 62 | 100% | |
(a) Chemoinformatics is an independent discipline that has a broad range of applications in chemistry. It has several formal definitions, for example, ‘All concepts and methods that are designed to interface theoretical and experimental efforts involving small molecules’ or ‘Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information’.
DB: Database.
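The frequencies in Table 1 follow from a simple tally. The sketch below is a minimal illustration: the list `has_tools` is a stand-in built from the reported counts, since the per-DB annotations in Supplementary Table S1 are not reproduced here.

```python
from collections import Counter

# Stand-in for the per-DB annotation "offers chemoinformatics tools?".
# Assumption: rebuilt from the counts in Table 1 (24 yes, 38 no).
has_tools = ["Yes"] * 24 + ["No"] * 38

counts = Counter(has_tools)
total = sum(counts.values())

# Map each label to (absolute frequency, relative frequency in percent).
table = {
    label: (n, round(100 * n / total, 2)) for label, n in counts.items()
}

for label, (n, pct) in table.items():
    print(f"{label}: {n} ({pct:.2f}%)")
print(f"Total: {total}")
```

The same tally applies to any categorical attribute recorded per DB, such as access type or continent of origin.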
5. Description of the database compounds features
This section gives a detailed description of the DBs presented in Supplementary Table S1. These DBs have been categorized into two subsections: open access DBs and commercial DBs. The open access DBs comprise the generalist DBs (both compounds and natural products), DBs with specific applicability, those exclusive to natural products, and those that comprise compounds with specific pharmacological actions. It is important to emphasize that the DBs were classified on the basis of their common objectives. Furthermore, the objectives and main uses of the mentioned DBs (grouped by category) are listed in Tables 2, 3 and 4 to provide a horizontal comparison of the covered libraries.
Table 2.
Objectives and main uses of generalists and specific application databases.
| Group | DB | Citations (a) | Example of potential uses | Objectives |
|---|---|---|---|---|
| Generalists | ChEMBL | 16,400 | Acquisition of data for generating predictive models using machine learning | Collection of molecular structures and other information such as experimental and calculated properties, vendors (ZINC), available formulation (DrugBank), etc |
| | ZINC | 2,200,000 | Input for virtual screening and other predictions performed by CADD related methods | |
| | PubChem | 64,000 | Acquisition of data for generating predictive models using machine learning | Drug discovery and development |
| | Drugbank | 31,400 | Input for virtual screening and other predictions performed by CADD-related methods focusing on drug repurposing method | |
| | NPEdia | 121 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products | |
| | ChEBI | 17,500 | Input for virtual screening and other predictions performed by CADD-related methods focusing on ‘small’ chemical compounds | |
| | BraCoLi | 60 | Input for virtual screening and other predictions performed by CADD-related methods | |
| Specific applications | SWEETLEAD | 2,340 | Input for virtual screening and other predictions performed by CADD-related methods focusing on drug repurposing | In silico studies |
| | UNPD | 14,000 | Input for computational studies focused on the discovery or application of compound with sweet taste (e.g., sweeteners) | |
| | SuperNatural 3.0 | 13 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products | Collection of mechanism of action of these natural and their target proteins |
| | Princeton | NA | Input for virtual screening related on scaffolds suitable for synthesis of new chemical entities | Drug design |
| | 3DMET | 271 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products | Repository of 3D structures of natural metabolites |
| | KNApSAcK-3D | 80 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products | Collection of structures for molecular docking studies |
| | DockCov | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on promising COVID-19 compounds | |
| | Spec Natural Products | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on synthesized bioactive compounds | Repository of synthesized bioactive substances for virtual screening |
| | The Herbal Ingredient Targets – HIT 2.0 | NA | Acquisition of data related to natural products and their molecular targets for generating predictive models using machine learning | Collections of natural products originating from Chinese plant species |
| | The Herbal Ingredients in vivo Metabolism – HIM | NA | Acquisition of data related to natural products for generating predictive models of metabolites and metabolism sites using machine learning | |
| | BindingDB | 5,650 | Acquisition of data related to compounds/targets affinities for validating docking studies and/or generating predictive models using machine learning | Drug discovery and development |
| | Drug Repurposing Hub | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on drug repurposing | Curated collection of US FDA-approved drugs |
| | NPASS | 12 | Acquisition of data related to natural products for generating predictive models of metabolites and metabolism sites using machine learning | Biological activity of natural products |
| | HMDB | 31,900 | Input for virtual screening and other predictions performed by CADD-related methods focusing on human metabolites | Information about human metabolites |
| | RIKEN MSn spectral DB | 334 | Acquisition of data related to natural products and their MS for generating predictive models and/or novel algorithms | Compendium of MS natural product |
| | GNPS | 391 | Containing only MS spectra of natural product organizing and sharing raw, processed or identified data | Compendium of MS natural product |
| | SuperScent | 175 | Input for computational studies focused on the discovery or application of scent/fragrance related products development | Compendium of volatile compounds |
| | SuperSweet | 1,680 | Input for computational studies focused on the discovery or application of compound with sweet taste (e.g., sweeteners) | Collection of carbohydrates, artificial sweeteners and other sweeteners such as proteins and peptides |
| | Pherobase | 2,990 | Input for virtual screening and other predictions performed by CADD-related methods focusing on pheromones from insect origin | Collection of pheromones and semiochemicals |
(a) Searching for citations using Google Scholar on 16/11/2023 and 21/02/2024.
CADD: Computer-aided drug design; DB: Database; MS: Mass spectra; NA: Not applicable.
Table 3.
Objectives and the main uses of the natural products exclusive databases.
| Group | DB | Citations (a) | Example of potential uses | Objectives |
|---|---|---|---|---|
| Natural products exclusive | InterBioScreen | 753 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products isolated from plants, microorganisms and marine species | Collection of natural products isolated from plants, microorganisms and marine species |
| | PSC-db | 57 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products (secondary plant compounds) | Drug discovery |
| | COCONUT | 82,900 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from 53 natural product DBs | Drug discovery |
| | ChemDB | 839 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from 53 natural product DBs | |
| | Natural Products Atlas | 438 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products whose origin is found in microorganisms | DB of NPs whose origin is found in microorganisms |
| | TCM Database@Taiwan | 1,050 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Traditional Chinese Medicine | DB of Traditional Chinese Medicine (TCM) compounds |
| | PHCD | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Persian medicinal plants | DB of Persian medicinal plant compounds |
| | VIETHERB | 71 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Vietnamese traditional medicine | DB of Vietnamese traditional medicine |
| | TIPdb-3D | 38 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Taiwan medicinal plants | DB of Taiwan medicinal plant compounds |
| | NuBBEDB | 224 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Brazilian flora | Compendium of available biogeochemical information on Brazilian biodiversity |
| | BIOFACQUIM | 212 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Mexican plants | DB of Mexican medicinal plant compounds |
| | UNIIQUIM | 4 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Mexican plants published by the Natural Products Department of the Institute of Chemistry, UNAM | DB of medicinal plant compounds of the Natural Products Department of the Institute of Chemistry, UNAM |
| | UPMA | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Panamá | DB of medicinal plant compounds from Panamá |
| | PeruNPDB | 6 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Peru | DB of medicinal plant compounds from Peru |
| | LANaPD | 55 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Latin American biodiversity | A unified DB of NPs representing Latin American biodiversity |
| | AfroDB | 275 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from African plants | DB with a focus on natural products used in traditional medicines on the African continent |
| | p-ANAPL | 65 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from African biota | Consortium of collections of natural products isolated from the African biota |
| | EANPDB | 31 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Eastern African plants | DB of medicinal plant compounds from Eastern African plants |
| | SANCDB | 106 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from South African plants | DB of medicinal plant compounds from South African plants |
| | NANPDB | 120 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from North African plants | DB of medicinal plant compounds from North African plants |
| | ETM-DB | 39 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Ethiopian plants | DB of medicinal plant compounds from Ethiopian plants |
| | IMPPAT 2.0 | 34 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Indian medicinal plants | DB of medicinal plant compounds from Indian medicinal plants |
| | LOTUS | 93 | Establish the exchange between ‘structure–organism’ pairs and relationships between different molecular structures and the living organisms from which they were identified | DB of referenced structure–organism pairs |
| | NPBS | 8 | Contains information on the relationships between natural products and biological sources reported in publications | DB of referenced natural products-biological parts pairs |
| | MacrolactoneDB | 20 | Input for virtual screening and other predictions performed by CADD-related methods focusing on macrolides | DB of macrolide compounds |
(a) Searching for citations using Google Scholar on 16/11/2023 and 21/02/2024.
CADD: Computer-aided drug design; DB: Database; NA: Not applicable.
Table 4.
Objectives and the main uses of the specific biological activities and private databases.
| Group | DB | Citations (a) | Example of potential uses | Objectives |
|---|---|---|---|---|
| Specific biological activities | AfroCancer DB | 62 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from African plants | Virtual screening of potential chemotherapeutics |
| | NPACT | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural compounds exhibiting anticancer activities | DB of natural compounds exhibiting anticancer activities |
| | BIOPEP-UWM Bioactive Peptides DB | 548 | Input for virtual screening and other predictions performed by CADD-related methods focusing on bioactive peptides | DB of peptides, especially those derived from food, which are components of diets that prevent the development of chronic diseases |
| | ANTIAGE-DB | 6 | Input for virtual screening and other predictions performed by CADD-related methods focusing on anti-ageing and anti-melanogenic agents | DB of anti-ageing and anti-melanogenic agents |
| | InflamNat | 20 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural compounds exhibiting anticancer activities | DB of natural compounds exhibiting anticancer activities |
| | AntibioticDB | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural compounds exhibiting antibacterial activities | DB for antibacterial research |
| | StreptomeDB | 67 | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products isolated from Streptomyces | DB of natural products isolated from Streptomyces |
| | avMpNp DB | 2 | Input for virtual screening and other predictions performed by CADD-related methods focusing on bioactive compounds from Brazilian biodiversity with antiviral activity | DB of bioactive compounds from Brazilian biodiversity with antiviral activity |
| Private | The Dictionary of Natural Products® | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from literature reports on the isolation and identification of compounds from diverse biota | Drug discovery |
| | ChemTCM | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Chinese traditional medicine | DB of natural products obtained from the medicinal plants used in TCM |
| | Natural Products Library | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on more than 800 cured natural products only remained as an in-house collection | DB of the pharmaceutical company AstraZeneca |
| | DB Ayurveda | NA | Input for virtual screening and other predictions performed by CADD-related methods focusing on natural products from Indian traditional medicine | DB of Indian traditional medicine |
| | INDOFINE Chemical Company | 4,200 | Input for catalog of isolated compounds of natural origin or synthesized by the company with structures and annotations | Drug discovery |
| | ACDISC_NP | NA | Input for bioactive natural products from AnalytiCon Discovery NP | Drug discovery |
(a) Searching for citations using Google Scholar on 16/11/2023 and 21/02/2024.
CADD: Computer-aided drug design; DB: Database; NA: Not applicable.
6. Description of the database compounds features: open access databases
6.1. Generalists
Generalist DBs are those that contain comprehensive information on small compounds, macromolecules, chemical structures, chemical and physical properties, biological activities, articles available in the literature, safety and toxicity data on compounds and even the validation of computational tools. Descriptions of the generalist DBs included in this review are given below.
ChEMBL is a DB that was introduced 10 years ago as an open access resource and plays an important role in drug discovery and in the validation of computational tools. Access to the data in ChEMBL is provided through the web user interface. A large proportion of the bioactivity data in ChEMBL is currently manually extracted. Version 32 of ChEMBL contains information extracted from over 86,364 publications and patents. New sources of bioactivity assay data, as well as tools for calculating physicochemical properties of pharmaceutical interest and for statistical calculations, are also available in ChEMBL [12].
PubChem is a public, open DB established by the National Institutes of Health (NIH). This DB contains mainly small compounds, but also nucleotides, carbohydrates, lipids, peptides and chemically modified macromolecules. Information is available on the chemical structures, chemical and physical properties, biological activities, patents, articles available in the literature, safety and toxicity data of the compounds [13].
First described in 2006, DrugBank has evolved over the past 17 years in response to marked improvements in web standards and the changes inherent in drug research and development. DrugBank (version 5.1.10) is a publicly available DB of pharmacological agents that has a collection of 8865 compounds, including 1806 approved drugs and 7059 investigational or off-market drugs. It provides convenient access to comprehensive molecular information about current drugs and their mechanisms, as well as their interactions with targets, which facilitates efficient drug discovery and development. It has been extensively used in repurposing studies, based on in silico or in vitro investigations, to find therapeutic candidates for various diseases, including COVID-19 [14].
NPEdia is a library developed by the RIKEN Natural Compounds Bank. This library consists of a collection of secondary metabolites from microorganisms and plant species. NPEdia contains derivatives and analogs of natural compounds as well as synthetic compounds. To date, NPEdia has distributed a total of more than 250,000 compounds [15].
ZINC is a free DB of compounds available for virtual screening, containing more than 230 million compounds in 3D format. Its interface accepts queries represented in simplified molecular input line entry specification (SMILES), SMILES arbitrary target specification (SMARTS), International Chemical Identifier (InChI), InChIKey and even fingerprint binaries. ZINC also contains more than 750 million compounds for analog searching, including natural products. In addition to serving as a repository of compounds with descriptions of physicochemical properties of pharmaceutical interest, ZINC has been converted into a search resource, ZINC15, through the adoption of cheminformatics tools. This expanded version also supports searches on genes and their target classes, chemical patterns and clinical trials [16]. The new version of the library, ZINC-22, uses data organization methods that allow rapid searching of molecules and their physical properties, including conformations, atomic partial charges, CLogP values and solvation energies, all crucial for molecular docking. In addition, the diversity of the chemical space of ZINC-22 was evaluated using the Bemis-Murcko scaffold concept [17].
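As an illustration of how the query notations listed above differ in form, the following minimal sketch guesses the notation of a query string from its layout alone. It is not ZINC's own parser: the function name `guess_query_type` is hypothetical, and the only facts relied on are that standard InChI strings begin with `InChI=` and that an InChIKey consists of 27 characters arranged as 14 letters, 10 letters and 1 letter separated by hyphens. Any other string is simply assumed to be SMILES or SMARTS, with no chemical validation.

```python
import re

# InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def guess_query_type(query: str) -> str:
    """Guess which line notation a query string uses (layout heuristic only)."""
    text = query.strip()
    if text.startswith("InChI="):
        return "InChI"
    if INCHIKEY_PATTERN.match(text):
        return "InChIKey"
    # Everything else is treated as SMILES/SMARTS; the string is not validated.
    return "SMILES or SMARTS"


if __name__ == "__main__":
    # Three representations of aspirin, used here only as example input.
    examples = (
        "CC(=O)Oc1ccccc1C(=O)O",
        "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
        "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    )
    for example in examples:
        print(guess_query_type(example), "<-", example)
```

A real search interface would hand the string to a cheminformatics toolkit for parsing; the heuristic above only shows why the three notations can be told apart before any chemistry is involved.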
ChEBI is a freely available DB that contains information about chemical entities of biological interest. The chemical entities originate from synthetic or natural products. It currently includes over 46,000 entries, assigned with multiple annotations, including chemical structure, DB cross-references, synonyms and literature citations. ChEBI includes an ontological classification, whereby relationships between entities or molecular classes of entities are specified [18].
BraCoLi (Brazilian Compound Library) is a new manually curated virtual compound library developed by Brazilian research groups to support computer-aided drug design work. The first version of BraCoLi has 1176 compounds. The updated version reported in 2022 contains 1913 compounds from several Brazilian universities. This DB has open access and, like ChEMBL, contains biological and chemical information on synthetic, semi-synthetic and natural compounds. The compound structures are stored in Mol2 and SDF formats, and the DB also provides the molecular formula, molar mass, melting range, origin of the compound, additional remarks and bibliographic references [19].
6.2. Specific applications
Application-specific DBs are those with a primary focus on providing property data. In this review, we considered DBs with applications ranging from the storage of 3D structures of compounds to the provision of information for in silico and in vitro studies.
Structures of Well-curated Extracts, Existing Therapies and Legally regulated Entities for Accelerated Discovery (SWEETLEAD) is a DB for in silico studies containing curated chemical structures of approved drugs, bioactive compounds isolated from traditional medicinal plants, and regulated chemicals. The motivation for the development of SWEETLEAD stemmed from the observation of conflicting information between publicly available DBs of compounds and the lack of a DB containing highly curated chemical structures for globally approved drugs. The construction relied on gathering information from various accessible DBs (PubChem, ChemSpider, DrugBank, ChEBI and others) in order to identify the correct structure for each compound [20].
Universal Natural Products Database (UNPD) was developed to compile all known natural products into one collection for in silico drug screening. This DB consists of approximately 200,000 natural product structures with the description of their biological activities and virtual screening results. PCA as well as chemical space characterization revealed that there was overlap between natural products and US FDA-approved drugs, which indicated a large number of lead compounds. All natural products were docked to 332 target proteins of FDA-approved drugs, and the results indicated that these products can interact with multiple cellular target proteins. The latest version is no longer accessible via the link provided in the original publication, but a copy of the structures contained therein is still maintained on the ISDB website (a DB for in silico predictions and MS for natural products) [21]. Subsets of diverse chemical structures obtained from UNPD are also freely available [22].
SuperNatural II is a freely available DB that contains over 300,000 natural products, with their 2D structures, computed physicochemical properties and toxicity predictions. SuperNatural II also provides information on the pathways associated with the synthesis and degradation of natural products, as well as the mechanism of action of these natural products in relation to structurally similar drugs and their target proteins [23]. SuperNatural 3.0, the updated version, contains 449,058 natural products. Structural information is also provided. It also provides data on mechanism of action, toxicity, drug-likeness properties and chemical space prediction for various disease indications (antiviral, antibacterial, antimalarial and anticancer) and for specific targets such as the central nervous system. The updated version of the DB also provides a valuable set of natural compounds that are expected to include potential high-sweetness compounds. The potential flavour profile of the natural compounds has been predicted using the published VirtualTaste models. The SuperNatural 3.0 DB is available free of charge at http://bioinf-applied.charite.de/supernatural_3 without login or registration [24].
Princeton is a free access DB that contains several collections. The collection called Express offers over 1.3 million compounds representing 50 chemical classes. There is also the BioMolecular collection, a unique collection of high quality scaffolds. Currently, this collection covers several chemical and pharmacological spaces. The DB also provides computational services that represent a modern approach to drug design, with an emphasis on chemotherapeutics, including ligand-based virtual screening, 2D and 3D QSAR modeling, and receptor-based virtual screening. Calculations of the different molecular properties of the compounds are also available. More recently, the DB has started to offer libraries of macrocyclic compounds employed in lead generation, both those of natural origin and semi-synthetic derivatives of natural products [25].
The 3DMET DB is a repository of 3D structures of natural metabolites, with emphasis on compound curation. Starting in 2009, the curation of new compounds from published works began. In this process, problems with the digitization of 3D structures were pointed out, mainly concerning stereochemistry. The compound curation strategy developed in 3DMET became very useful for other DBs, providing a support system for minimizing errors in chemical notations and structures [26].
KNApSAcK-3D is a DB that contains information about secondary metabolites, relating them to their organism(s) of expression. KNApSAcK-3D contains 3D structures of all the bioactive compounds it holds. The 3D structure of each compound was optimized using the Merck Molecular Force Field (MMFF94), and a multi-objective genetic algorithm was used to search extensively for possible conformations and locate the global minimum. The resulting set of structures can also be used in molecular docking studies to identify new potential binders of target proteins [27].
Spec Natural Products is a repository of synthesized bioactive compounds for virtual screening. There are about 750 compounds that have been characterized and their drug-likeness properties determined. The DB relies on chemoinformatics tools, which assist in specific target selections in new chemical entity discovery and lead optimization projects [23].
DockCov aims to accelerate the search process for potential drugs by providing extensive data from molecular docking studies. Through this DB, researchers can query the binding free energy scores obtained for compounds against the seven major SARS-CoV-2 proteins involved in the infection mechanisms of the virus. These compounds are present in the FDA Approved Drugs DB (2285 compounds) and the Taiwan National Health Insurance (NHI) DB (1748 compounds). The docking results are saved in PDBQT format (Protein Data Bank, Partial Charge (Q) and Atom Type (T)) and can be viewed directly on the website via NGLView [28].
The Herbal Ingredient Targets (HIT) and The Herbal Ingredients in vivo Metabolism (HIM) DB are two linked collections of natural products originating from mainly Chinese plant species. Both are no longer accessible online, but the structures of the bioactive compounds in these DBs are available in ZINC. There was also metadata on the molecular targets of these compounds, toxicity, a wide range of pharmacologically relevant molecular descriptors, and their therapeutic effects. Unfortunately, this metadata is not available in ZINC and has probably been lost [29,30].
BindingDB is a public, web-accessible DB that supports medicinal chemistry and drug discovery through the development of structure–activity relationships, as well as computational chemistry and molecular modeling, such as docking. To this end, the DB contextualizes binding affinity values derived from the interactions of proteins (considered potential drug targets) with ligands that are normally low molecular weight compounds. Currently BindingDB has about 2,613,813 binding free energy data points for 8942 proteins and over 1,123,939 ligands. BindingDB also provides information on basic molecular recognition and genomics studies. The data are extracted from the literature or selected from other DBs, such as PubChem and ChEMBL [31].
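Binding affinity constants and binding free energies are two views of the same measurement, linked by the standard thermodynamic relation ΔG° = RT ln K for a dissociation-type constant K expressed in mol/L. The sketch below applies that relation; it is not a description of how BindingDB stores or converts its data. The function name `binding_free_energy`, the default temperature of 298.15 K and the 1 M standard state are assumptions made for illustration.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal/(mol*K)


def binding_free_energy(k_molar: float, temperature_k: float = 298.15) -> float:
    """Return the standard binding free energy in kcal/mol.

    k_molar is a dissociation-type constant (Kd or Ki) in mol/L,
    referenced to an assumed 1 M standard state.
    """
    if k_molar <= 0:
        raise ValueError("the constant must be a positive concentration in mol/L")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(k_molar)


if __name__ == "__main__":
    # Tighter binding (smaller constant) gives a more negative free energy.
    for constant in (1e-6, 1e-9):
        print(f"K = {constant:.0e} M -> {binding_free_energy(constant):.2f} kcal/mol")
```

Because the relation is logarithmic, each tenfold improvement in affinity changes the free energy by the same fixed amount at a given temperature, which is why affinities are often compared on a log scale.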
Drug Repurposing Hub is a curated collection of FDA-approved drugs that establishes preclinical or clinical data as important information resources. The library also collaborates with the screening of new drugs through the repositioning strategy. While the collection undoubtedly reveals new uses for already developed drugs, its true goal is achieved by discovering new insights related to mechanisms of action. Compounds are initially researched by entry from Phase I or higher clinical development, or by regulatory approval. Data regarding compound indications were curated by reviewing the drug labels available on the NIH DailyMed website. Terms referring to the 644 disease indications were manually curated to minimize redundancy and were also manually mapped to the diseases listed in the DB. Information on drug patents, on the other hand, was obtained from the FDA Orange Book (May 2016 edition) [32].
NPASS (Natural Product Activity and Species Source Database) provides the experimentally determined biological activity of natural products as reported in the literature, as well as information regarding the species from which these natural products were obtained. This information has been extracted from the comprehensive literature search by manual reviews, with emphasis on experimentally determined biological activities, for example, inhibitory concentration (IC50), dissociation constant (Ki), effective concentration (EC50) and the taxonomy of the source species of these bioactive compounds, along with place and date of collection [33].
The NPASS update (2023) includes approximately 95,000 records on the composition/concentration of 1490 NPs in 390 species. There is also a description of extended data on the biological activity of 43,200 NPs against 7700 targets, as well as on 31,600 source species of 94,400 NPs, including around 440 new microorganisms. Information was also added on 66,600 NPs without experimental pharmacological activity annotations, but with activity profiles estimated using the chemical similarity tool Chemical Checker. ADMET (absorption, distribution, metabolism, excretion and toxicity) properties have been calculated for all NPs. The updated version of this DB can be accessed free of charge at http://bidd.group/NPASS [34].
HMDB is a freely available electronic DB developed in 2007 that contains detailed information about human metabolites. It is intended for applications in the following areas: metabolomics, biomarker discovery, chemistry, clinical and general education. The DB is designed to contain or connect several types of data: chemical data, clinical data, molecular biology and biochemistry data. The latest release contains 220,945 metabolite entries, including water-soluble metabolites and lipids. In addition, 8610 protein sequences (enzymes and transporters) link to the metabolite entries. HMDB is connected to other DBs (KEGG, PubChem, MetaCyc, ChEBI, PDB, UniProt and GenBank). HMDB supports extensive searches for text, sequence, chemical structure and spectral data by MS and NMR. The latest update, HMDB 5.0, brings a number of important improvements and updates to the DB, making HMDB more useful and more attractive to users. These updates have broadened applications not only to human metabolism, but also in exposomics, lipidomics, nutritional science, biochemistry and the clinic [9,35].
The RIKEN MSn spectral DB for phytochemicals (ReSpect) is a collection of in-house and literature mass spectra (MS) of natural products. The website is still maintained and usable, but the last dataset was added in 2013. The Global Natural Products Social Molecular Networking (GNPS) is a DB containing only MS spectra of natural products, used for organizing and sharing raw, processed or identified data. Access to the spectra is by download and refers only to the structures of the natural products in this DB [36,37].
The SuperScent DB was established to provide users with detailed information on volatile compounds that are considered to be essences. This DB comprises the 2D/3D structures of approximately 2100 volatile compounds, as well as physicochemical properties, commercial availability and references. All information contained in this DB is taken from the literature. It should be noted that the volatile compounds have been classified according to their origin, functionality and odorant groups. SuperScent offers the user several search options, for example name, PubChem ID number, species of origin, functional groups or molecular weight [38].
In this direction, SuperSweet is a comprehensive collection of carbohydrates, artificial sweeteners and other sweeteners such as proteins and peptides. It contains structural information and properties of these compounds, such as number of calories, therapeutic indications and the sweetness index. Currently, the DB consists of more than 8000 compounds. A user-friendly graphical interface allows for similarity search and visualization of sweeteners docked at the receptor [39].
Pherobase is an open-access DB that integrates several DBs to provide comprehensive information on pheromones and semiochemicals. Currently, Pherobase contains 7500 pheromones and 6500 semiochemicals whose occurrence list includes several types of animals. For most of the semiochemicals in the DB, miscellaneous information is given, such as MS, Kovats retention index, NMR, synthesis routes, molecular formula, and 2D and 3D chemical structures. Information on the application of semiochemicals in pest management is also included in this DB and can be accessed by approach, region, country and host [40].
6.3. Natural products exclusive database
Natural product DBs aim to provide a wealth of information on compounds from a wide variety of natural products, such as plants, marine organisms and microorganisms. This information is quite comprehensive and includes data on the chemical structure of the compounds and their structural elucidation, as well as their biological activities.
InterBioScreen contains a collection of natural products that was started in 1984 and currently contains 68,000 compounds. The collection comprises 30–35% strictly natural compounds isolated from plants, microorganisms and marine species. In addition, about 40% of them are derivatives of natural compounds such as flavonoids, and the remaining 25–30% are mimetics (analogs) of strictly natural compounds, for example azocoumarins. The structures and stereochemistry of the compounds have been confirmed by various physicochemical methods [41].
PSC-db is a database of plant secondary compounds (PSC) that is available on the web. It includes a simple search tool for finding compounds according to their chemical classification and their 3D structural similarity, measured by the Tanimoto coefficient. Search filters can be applied, such as the PSC-db ID of the compound, the name, the molecular formula, the molecular weight, or specific descriptors such as the number of carbon atoms. For a subset of compounds, PSC-db allows the calculation of physicochemical, pharmacokinetic and drug-likeness properties. Interestingly, these types of correlations make it possible to validate estimates of compound permeability and oral absorption, and to compare the ADMET profile of several major chemical compound DBs. In this way, PSC-db can help identify compounds that have never been tested before, which speeds up the identification of bioactive compounds. In rational drug design processes where PSC are used as bioactive compounds or as pharmaceutical excipients, a new DB offers the opportunity to reduce resources and time [42].
COlleCtion of Open Natural ProdUcTs (COCONUT) is considered the most complete and up-to-date continuously curated DB of natural products. Studies have shown that the fragments of natural products present in COCONUT have high diversity and structural complexity, making it a suitable source for drug discovery. COCONUT was developed by manually extracting information from 53 natural product DBs; the compounds were carefully extracted, curated, processed and annotated to produce a unified and curated collection of unique natural products. COCONUT includes 406,076 structures without defined stereochemistry and a total of 730,441 structures with available stereochemistry [43].
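The Tanimoto coefficient used for similarity searching can be illustrated with a short sketch. It assumes that each compound is represented by a fingerprint stored as the set of its "on" bit positions; the source does not state how PSC-db encodes its 3D similarity descriptors, so the representation, the function name `tanimoto` and the example bit sets are all hypothetical. The coefficient itself is the standard ratio of shared features to total distinct features.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not bits_a and not bits_b:
        # Two empty fingerprints: similarity is undefined; 0.0 is the convention chosen here.
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def rank_by_similarity(query: set, library: dict) -> list:
    """Return (compound_id, score) pairs sorted from most to least similar."""
    scores = [(compound_id, tanimoto(query, bits)) for compound_id, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints; the identifiers are illustrative, not PSC-db IDs.
    library = {
        "cmpd_A": {1, 2, 3, 4},
        "cmpd_B": {3, 4, 5, 6},
        "cmpd_C": {7, 8},
    }
    for compound_id, score in rank_by_similarity({1, 2, 3, 4}, library):
        print(compound_id, round(score, 3))
```

A search tool of this kind would typically keep only hits above a chosen similarity threshold; the threshold is a user decision and is not specified in the source.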
The Natural Products Atlas (NPAtlas) is maintained at Simon Fraser University in Canada and is curated by a consortium of data curators. It is designed to cover NPs whose origin is found in microorganisms (bacteria, fungi, lichens and cyanobacteria) published in peer-reviewed literature. The resource is actively updated, allows a bulk download of all data and metadata, and as of September 2019, is completely open [44,45]. The new version of the DB includes a new page structure and an expansion to include 8128 new compounds, bringing the total to 32,552. Full taxonomic descriptions for all taxa and chemical ontology terms from NP Classifier and ClassyFire were added in addition to these structural and content changes. Manual curation has been undertaken to revise entries with incomplete configuration assignments and data has been integrated from external resources, including CyanoMetDB. The DB can be accessed via the new interactive website at www.npatlas.org [46].
ChemDB (Chemical Database of Pakistan) is an Asian DB that exclusively contains natural products with relevant potential for new drug discovery. The data is publicly available via the web for download and can be searched using a variety of methods. The chemical data include experimentally determined or predicted physicochemical properties, in addition to 3D structure, melting temperature and solubility [47].
TCM Database@Taiwan is a DB created in response to the growing demand for modern strategies in the search for potential drugs, with a focus on Traditional Chinese Medicine (TCM) compounds. This DB contains 3D structural information on TCM constituents, which can be used in docking simulations. TCM Database@Taiwan contains information collected from Chinese medical texts and scientific publications and is currently the largest non-commercial TCM compound DB worldwide. In order to gather the information contained in this DB, extensive literature searches were performed to obtain data regarding the chemical composition of each TCM plant species. Currently, all compounds in the DB have been geometrically optimized in the MM2 force field and are available for download in Mol2 file format [48].
PHCD (Persian Herbal Constituents Database) contains useful information about medicinal plants and their chemical constituents. PHCD contains two main data sets: information about the plant species and information about the bioactive compounds. The general information, such as scientific name, common name, Persian name, family, medicinal use(s) and images for each plant species, was collected from books and internet DBs. Information on the bioactive compounds found in these species was manually extracted from peer-reviewed scientific publications [49].
VIETHERB was developed to organize the valuable information regarding Vietnamese traditional medicine. Vietnam has a highly diverse traditional medicine practice, in which various combinations of medicinal plants have been widely used for many types of diseases. However, current manuscript records and text-based DBs are poor, and efforts have therefore been made to organize this information. VIETHERB provides users with information on medicinal plants, including metabolites, target diseases, morphology and geographic location for each species. The data consists of 2881 species, 10,887 metabolites, 458 geographic locations and 8046 therapeutic effects. Information on Vietnamese plant species can be easily accessed or queried using their scientific names. As an open resource, the DB supports traditional medicine studies, virtual drug screening projects and the conservation of endangered plants [50].
Another Asian DB worth mentioning is TIPdb-3D, which was developed in Taiwan. Taiwan has endemic plants that contain several biologically active phytochemicals, and information about these compounds was cataloged in this DB. TIPdb-3D is not only a repository of information, but also supports the discovery of new pharmacologically active compounds. Among the 8853 bioactive compounds, more than 1500 phytochemicals in TIPdb-3D have not been included in other DBs in association with medicinal plants. The compounds have already been evaluated against Lipinski's Rule of 5, indicating that TIPdb-3D contains promising drug candidates. The most distinctive feature of TIPdb-3D is its curation of the bioactivity of the compounds and of their 3D chemical structures. Bioactivity-related curation covers antiplatelet and antituberculosis activity, and anti-inflammatory action is currently undergoing the same process. The DB is under active development to collect more structures and their bioactivities from the published literature [51].
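Lipinski's Rule of 5, mentioned above as a drug-likeness screen, can be expressed as four threshold checks: molecular weight no greater than 500 Da, calculated logP no greater than 5, no more than 5 hydrogen bond donors and no more than 10 hydrogen bond acceptors. The sketch below assumes the descriptors have already been computed by some other tool; the class and function names, the tolerance of one violation and the example descriptor values are illustrative assumptions, not the procedure used by the TIPdb-3D authors.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (supplied by the caller)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen bond donors
    h_bond_acceptors: int   # number of hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of 5 criteria are violated."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a compound with at most max_violations violations (assumed default: 1)."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Approximate, illustrative descriptor values; not taken from any DB in this review.
    small_molecule = Descriptors(mol_weight=180.16, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
    large_molecule = Descriptors(mol_weight=800.0, logp=6.0, h_bond_donors=6, h_bond_acceptors=12)
    for label, compound in (("small", small_molecule), ("large", large_molecule)):
        print(label, lipinski_violations(compound), passes_rule_of_five(compound))
```

Setting `max_violations=0` gives the strictest reading of the rule; which tolerance a given DB applied is not stated in the source.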
The Nuclei of Bioassays, Ecophysiology and Biosynthesis of Natural Products Database (NuBBEDB) was created in 2013 as a library consisting of 640 compounds [52]. NuBBEDB was designed to provide molecular descriptors and chemical structures of natural products studied in the NuBBE molecular modeling and medicinal chemistry laboratories. Initially, this DB proved to be a valuable resource for new drug design and dereplication studies, which encouraged its expansion. Since 2015, continuous efforts have been made to expand its content, covering the most diverse sources of natural products and establishing a comprehensive compendium of available biogeochemical information on Brazilian biodiversity. The NuBBEDB content is online, free to access and provides validated multidisciplinary information, chemical descriptors, species sources, geographic locations, spectroscopic data and pharmacological properties, giving an overview of the current profile of compounds present in the Brazilian territory. In the last 4 years, the number of NuBBEDB compounds has increased by 200%, including more than 2000 natural products and derivatives. This information was extracted from more than 1500 articles [3].
BIOFACQUIM is one of the first DBs of natural products isolated and characterized in Mexico. It is being built, curated and maintained by an academic group of the School of Chemistry at UNAM (Universidad Nacional Autónoma de México). This DB is available for free through a web interface. Compounds from BIOFACQUIM are also available in ZINC15. This DB was built on the basis of bibliographical research, with the objective of supporting the information on the bioactive compounds it contains. Its current version includes 553 compounds, added during the last 10 years of its development [53]. The characterization of BIOFACQUIM was performed using chemoinformatics tools. In addition, structural diversity analysis for understanding the coverage of chemical space is also a hallmark of this DB. This analysis indicated that there are compounds in BIOFACQUIM with chemical structures very similar to drugs approved for clinical use [54].
For more than 5 years, the Informatics Unit of the Institute of Chemistry (UNIIQUIM) at UNAM has been developing and curating an open DB of natural products from Mexico, mainly isolated and published by researchers from the Natural Products Department of the Institute of Chemistry, UNAM. The DB is designed to collect information on the vast biodiversity of Mexico. The compounds in UNIIQUIM are natural products isolated in Mexico from plants, fungi, marine organisms and insects. The total number of compounds is not entirely clear on the website, which is only available in Spanish (https://uniiquim.iquimica.unam.mx) [55].
Turning to DBs from other Latin American countries, the Center for Pharmacognostic Research on the Flora of Panama, Faculty of Pharmacy, University of Panama (CIFLORPAN) has been building the Natural Products DB of the University of Panama, Republic of Panama: UPMA. This dataset was first released in 2017 and currently contains 454 compounds. UPMA has compounds that have been biologically tested in over 25 in vitro and in vivo bioassays. Examples of therapeutic indications of the bioactive compounds are anti-HIV, antioxidant and anticancer. Analyses of the content, compound diversity and structure–activity relationships of UPMA bioactives are already planned [56].
Peru is a megadiverse country. It has endemic species of plants, terrestrial and marine animals and microorganisms. This is the reason for the systematization of information on natural products from this rich biodiversity in a DB. The Peruvian Natural Products Database (PeruNPDB) contains 280 natural products from Peru's biodiversity. The link https://perunpdb.com.pe/ provides free access to the PeruNPDB. The DB is intended for a variety of tasks, such as virtual screening projects against various disease targets. The DB has undergone curation processes and chemoinformatic characterization of molecular diversity and coverage in chemical space [57].
It is worth adding that the Latin American countries mentioned above are rich in unique biodiversity, and phytotherapy has a strong tradition of use in this region. In view of this, efforts have been mobilized to develop a Latin American Natural Products DB (LANaPD): a unified DB of NPs representing Latin American biodiversity [58]. Currently, Brazil, Mexico and Panama have published their DBs in LANaPD, namely NuBBEDB, BIOFACQUIM and UPMA, respectively. The chemical structures of these DBs are already available in the public domain, and comprehensive analyses of their content, diversity and characterization of various physicochemical properties have been released [7].
There are also DBs with a focus on plant NPs used in traditional medicines on the African continent. Among these, the best known and most generalist is AfroDB, although it is only accessible through ZINC. The AfroCancer, AfroMalaria and Afrotryp data sets integrate bioactive compounds from traditionally employed medicinal plants, selected for their potential targets in the treatment of cancer, malaria and trypanosomiasis. AfroDB is a relatively small library (approximately 1000 compounds) of compounds represented by their 3D structures, and is recognized for containing natural products from the entire African continent [59].
The pan-African natural products library (p-ANAPL) is a consortium of collections of natural products isolated from the African biota and belonging to researchers and/or groups of researchers working in African institutions. The p-ANAPL project was created in April 2009 by groups of natural product researchers from Botswana, Cameroon, Ethiopia, Kenya, Sudan and Tanzania in order to develop a collection of natural products and create an improved and efficient platform for biological screening, promoting the discovery of useful compounds from them. The p-ANAPL library is associated with the Centre for Scientific Research, Indigenous Knowledge and Innovation (Cesriki) at the University of Botswana. The compounds in the p-ANAPL collection comprise around 30 different classes of bioactives that reflect substantial chemical diversity, even in a relatively small collection. This collection shows the potential of African natural products in terms of molecular diversity [60].
The East Africa Natural Products Database (EANPDB) contains the structural and bioactivity information of 1870 unique molecules isolated from approximately 300 species from the East African region, representing the largest collection of NPs from this geographical region. This DB covers literature data from 1962 to 2019. The physicochemical properties and toxicity profiles of each compound have been included in the collection. The EANPDB has been combined with the North African Natural Products Database (NANPDB) to form the African Natural Products Database (ANPDB). The ANPDB contains 6515 compounds isolated from 1042 source organisms, mainly plants, with contributions from microorganisms, animals (for example corals) and marine sources. This DB is freely available at http://african-compounds.org [61].
The South African Natural Compounds Database (SANCDB), available at https://sancdb.rubi.ru.ac.za/, is the only fully referenced DB of natural compounds from South African biodiversity. It is freely available and, since its launch in 2015, its content has been used to train machine learning models and in drug discovery studies. The updated version of the SANCDB contains 1012 compounds. SANCDB also provides direct links to commercially available analogs of its natural compounds from two major chemical DBs, Mcule and MolPort. This feature is not available in other NP DBs. In addition, to make the information more accessible to users, both the DB and website interface have been updated and the compounds can be downloaded in several different chemical formats [62].
The North African Natural Products Database (NANPDB) contains information on approximately 4500 NPs, covering literature data from 1962 to 2016. The data cover compounds isolated mainly from plants, with contributions from some endophytic, animal (e.g., coral), fungal and bacterial sources. The compounds were identified in 617 species belonging to 146 families. Physicochemical properties commonly used to predict drug metabolism and pharmacokinetics, as well as toxicity information, were included for each compound in the dataset. This is the largest collection of annotated natural products produced by organisms native to North Africa. Although the DB includes known compounds, the pharmacological potential of most of the compounds has not yet been investigated. This DB also helps to elucidate synthetic routes for secondary metabolites. The latest version of the NANPDB is available at http://african-compounds.org/nanpdb/ [63].
Ethiopia is rich in medicinal plants, which have long been used, especially in rural areas. It is estimated that about 80% of the Ethiopian population uses traditional medicine. There are about 6500 to 7500 plant species in Ethiopia, about 12% of which are endemic. In view of this, ETM-DB was developed: a free online relational DB whose content was gathered from online research articles, theses, books and public DBs containing information on Ethiopian herbal medicines and phytochemicals. These resources were thoroughly reviewed using manual curation together with Python/Java code. ETM-DB is a comprehensive DB containing information on 1054 Ethiopian medicinal plants, 1465 traditional therapeutic uses and 4285 compounds, and it relates chemical composition to human target genes/proteins. The physicochemical and ADMET properties of the compounds were obtained with the help of various cheminformatics tools. The drug-like properties of the compounds were also evaluated using the FAF-Drugs4 web server. Of the 4285 compounds, 4080 passed the FAF-Drugs4 input stage; of these, 876 showed acceptable drug-like properties. The ETM-DB website interface allows users to search for compounds using various options provided in the search menu. The ETM-DB is expected to accelerate the discovery and development of drugs from Ethiopian natural products. The current version of the ETM-DB is openly accessible at http://biosoft.kaist.ac.kr/etm [64].
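The ETM-DB screening figures quoted above can be read as a simple attrition funnel. The short sketch below only restates the three counts given in the text as percentages; the variable and function names are illustrative and are not taken from ETM-DB or FAF-Drugs4.

```python
# Counts reported for ETM-DB in the text above.
total_compounds = 4285      # compounds deposited in ETM-DB
accepted_as_input = 4080    # compounds that passed the FAF-Drugs4 input stage
drug_like = 876             # compounds with acceptable drug-like properties


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


input_rate = percent(accepted_as_input, total_compounds)
drug_like_of_input = percent(drug_like, accepted_as_input)
drug_like_overall = percent(drug_like, total_compounds)

print(f"Accepted as input:      {input_rate}% of all compounds")
print(f"Drug-like (of input):   {drug_like_of_input}%")
print(f"Drug-like (of library): {drug_like_overall}%")
```

Under these counts, roughly one in five ETM-DB compounds reaches the drug-like subset, which gives a sense of how strongly such a filter narrows a natural product library before virtual screening.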
Indian Medicinal Plants, Phytochemistry And Therapeutics 2.0 (IMPPAT 2.0) is a manually curated DB created through the digitization of information from over 100 books on traditional Indian medicine, over 7000 published research articles and other existing resources. IMPPAT 2.0 is the largest digital DB of Indian medicinal plant phytochemicals to date and represents a significant improvement and expansion of IMPPAT 1.0. IMPPAT 2.0 captures two types of associations: Indian medicinal plant – plant part – phytochemical, and Indian medicinal plant – plant part – therapeutic use. The current version 2.0 of the IMPPAT DB covers 4,010 Indian medicinal plants, 17,967 phytochemicals and 1095 therapeutic uses. The phytochemicals and therapeutic uses of Indian medicinal plants are now provided at the plant part level. For the 17,967 phytochemicals in this DB, the developers provide 2D and 3D chemical structures and used cheminformatics tools to calculate physicochemical properties, drug-likeness based on different scoring schemes and predicted ADMET properties. In addition to browsing phytochemicals by physicochemical properties, drug-like properties and chemical similarity, the site now allows browsing based on molecular structures. IMPPAT 2.0 is available at https://cb.imsc.res.in/imppat/ [65].
One of the major challenges in NP research is to document ‘structure–organism’ pairs, in other words, to establish the relationships between molecular structures and the living organisms from which they were identified. Bioinformatics and chemoinformatics tools have contributed to this task, but consolidating and sharing this information through an open platform has strong transformative potential for natural products research. This is the aim of the LOTUS platform, which has already completed the first stages of harmonizing, curating, validating and disseminating more than 750,000 referenced structure–organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Sharing data on the Wikidata framework expands data access and interoperability, and it opens up new possibilities for community curation and evolving publishing models [66].
Natural Products and Biological Sources (NPBS) contains information on the relationships between natural products and biological sources reported in publications. The relational data links a particular species to all natural products derived from it and, conversely, a particular natural product to all biological sources. There are corresponding references for each piece of relational data. In the NPBS, natural products are represented by their molecular structures and biological sources are represented by the names of the species of organisms. Other available information such as the parts derived from the organisms, names of the natural products and molecular properties are also included. The volume of the DB is constantly growing as publications on natural product-derived research increase, and it is intended that more and more publications will be included at the data collection stage [67].
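The two-way relationship described above (species to natural products, and natural product to species, each backed by a reference) can be sketched as a pair of lookup tables built from the same records. This is a minimal illustration, not the NPBS or LOTUS schema: the record fields, the `build_indexes` helper and the placeholder species, compound and reference labels are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One referenced structure-organism pair (hypothetical layout)."""
    species: str
    compound: str
    reference: str


# Placeholder records; none of these labels come from a real DB.
records = [
    SourceRecord("Species A", "compound-1", "ref-1"),
    SourceRecord("Species A", "compound-2", "ref-2"),
    SourceRecord("Species B", "compound-1", "ref-3"),
]


def build_indexes(rows):
    """Index the same records in both directions."""
    by_species = defaultdict(set)
    by_compound = defaultdict(set)
    for row in rows:
        by_species[row.species].add(row.compound)
        by_compound[row.compound].add(row.species)
    return by_species, by_compound


by_species, by_compound = build_indexes(records)

print(sorted(by_species["Species A"]))    # compounds reported from Species A
print(sorted(by_compound["compound-1"]))  # species reported to produce compound-1
```

Keeping the reference on every record, rather than on the compound or the species, is what lets each individual pair be traced back to the publication that reported it.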
Macrocyclic lactones, with at least 12 atoms in the central ring, include several natural products, such as macrolides, with potent and useful bioactivities, for example as antibiotics. MacrolactoneDB integrates almost 14,000 existing macrolactones and their bioactivity information from different public DBs, such as NANPDB, StreptomeDB, UNPD, NuBBE, ZINC15, TIPdb, AfroDB, BindingDB, AfroMalariaDB, BIOFACQUIM, PubChem and ChEMBL13. Macrolactones are a broad and diverse structural class with varying levels of complexity, and a DB covering this class needs to be useful for different research projects. For this reason, a web application was also developed with filters on chemical properties, such as ring size, number of sugars and molecular weight, among others, allowing users to extract a highly specific subset of interest. In addition, a chemoinformatic analysis of MacrolactoneDB was carried out to understand its chemical diversity [68].
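Property filters of the kind described for the MacrolactoneDB web application can be illustrated with a small in-memory sketch. The record layout, the `filter_macrolactones` function, its parameter names and the placeholder entries are assumptions made for this example; only the default lower bound of 12 ring atoms comes from the definition given above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Macrolactone:
    """Hypothetical record holding precomputed properties."""
    name: str
    ring_size: int      # atoms in the central lactone ring
    sugar_count: int    # number of attached sugar units
    mol_weight: float   # g/mol


def filter_macrolactones(
    entries: Iterable[Macrolactone],
    min_ring_size: int = 12,
    max_ring_size: Optional[int] = None,
    min_sugars: int = 0,
    max_mol_weight: Optional[float] = None,
) -> List[Macrolactone]:
    """Keep entries that satisfy every supplied property bound."""
    selected = []
    for entry in entries:
        if entry.ring_size < min_ring_size:
            continue
        if max_ring_size is not None and entry.ring_size > max_ring_size:
            continue
        if entry.sugar_count < min_sugars:
            continue
        if max_mol_weight is not None and entry.mol_weight > max_mol_weight:
            continue
        selected.append(entry)
    return selected


# Placeholder entries with invented property values.
library = [
    Macrolactone("placeholder-A", 14, 2, 734.0),
    Macrolactone("placeholder-B", 16, 3, 916.0),
    Macrolactone("placeholder-C", 10, 0, 240.0),
    Macrolactone("placeholder-D", 12, 0, 310.0),
]

all_macrolactones = filter_macrolactones(library)
glycosylated_small = filter_macrolactones(
    library, max_ring_size=14, min_sugars=1, max_mol_weight=800.0
)
print([m.name for m in all_macrolactones])
print([m.name for m in glycosylated_small])
```

Leaving every bound optional except the ring-size minimum mirrors the idea of narrowing a broad structural class step by step to a specific subset.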
6.4. Database-specific biological activities
It is important to highlight the DBs referring to specific biological activities. In this context, the number of DBs that aim to list natural products targeting a particular biological activity is still small, but some research groups have been dedicated to the development of these valuable tools. African medicinal plants represent a good starting point for the discovery of anticancer drugs. The AfroCancer DB is an important tool for the virtual screening of potential chemotherapeutics.
AfroCancer DB contains around 400 bioactive compounds, which have already been studied for their drug-likeness properties. These compounds originate from plants that have already demonstrated anticancer activity by in vitro and/or in vivo assays. Furthermore, to verify the binding potential and the interactions of these bioactive compounds with targets, molecular docking simulations were conducted with 14 known targets for antitumor drugs. Binding affinity calculations were performed in comparison with known anticancer agents comprising ∼1,500 compounds originating from African medicinal plants. The results reveal that these plants represent a good starting point for anticancer drug discovery [69].
The NPACT (Naturally occurring Plant based Anticancerous Compound-Activity-Target Database) brings together information on natural compounds exhibiting anticancer activity (in vitro and in vivo) in order to pinpoint potential targets for a specific disease. Currently, NPACT contains entries for 1,574 compounds. Each manually curated record provides the compound structure, published data on in vitro and in vivo assays with inhibitory values (IC50/ED50/EC50), physicochemical properties, cancer types, cell lines and protein targets. The DB also offers online tools for finding new drug candidates through structure similarity and structure-based searches (using the JC search tool) [70].
NPs are also a promising source of anti-inflammatory compounds. To optimize virtual target-based screening, 665 natural products were extracted from the literature and compiled into a dataset named InflamNat. The physicochemical properties of these compounds were analyzed, and the distribution of scaffolds was presented. The targets in this DB were extracted from PubChem. The compounds from InflamNat were compared with the anti-tumor compounds from the NPACT DB: the InflamNat bioactives have a structural distribution comparable to that of the NPACT compounds, but a higher proportion of them satisfies Lipinski's rule. Flavonoids and triterpenoids are the most abundant groups [71].
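The comparison above rests on the proportion of compounds that satisfy Lipinski's rule. As a minimal sketch, the snippet below counts violations of the four commonly cited rule-of-five criteria (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) on precomputed descriptors. How many violations the InflamNat authors tolerated is not stated here, so `max_violations` is left as a parameter; the descriptor values and names are placeholders, not data from InflamNat or NPACT.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (placeholder values)."""
    name: str
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def fraction_passing(entries, max_violations: int = 0) -> float:
    """Fraction of entries with at most `max_violations` violations."""
    if not entries:
        return 0.0
    passing = [e for e in entries if lipinski_violations(e) <= max_violations]
    return len(passing) / len(entries)


dataset = [
    Descriptors("placeholder-1", 286.2, 2.0, 4, 6),
    Descriptors("placeholder-2", 456.7, 6.5, 2, 3),
    Descriptors("placeholder-3", 780.9, 1.2, 8, 14),
]

strict = fraction_passing(dataset)                     # no violation allowed
lenient = fraction_passing(dataset, max_violations=1)  # one violation tolerated
print(f"strict: {strict:.2f}, lenient: {lenient:.2f}")
```

Running the same function over two datasets gives the kind of side-by-side proportion used to compare compound collections.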
The BIOPEP-UWM Bioactive Peptides DB has been available on the Internet since 2003. It is a useful tool for research on bioactive peptides, especially those derived from food, which are components of diets that prevent the development of chronic diseases. The DB is continuously updated and modified: new peptides are added, and new information about existing peptides, such as chemical codes and references to other DBs, is introduced. New features include notations for peptides containing D-enantiomers of amino acids, batch processing, conversion of amino acid sequences into SMILES code, new quantitative parameters to characterize the presence of bioactive fragments in protein sequences and the identification of proteinases that release specific peptides. Links to BIOPEP-UWM have recently become available on sites such as MetaComBio, LabWorm and OmicX. The SpirPep and FeptideDB DBs also include information on peptides [72].
Natural products have a multivariate biochemical profile with diverse pharmacological properties. They have been widely used as anti-ageing and anti-melanogenic agents due to their effective contribution to the elimination of reactive oxygen species (ROS) caused by oxidative stress. Their anti-ageing activity is mainly related to their ability to inhibit enzymes such as human neutrophil elastase, hyaluronidase and tyrosinase. The information available in the literature (covering the period from 1965 to 2020) has listed data on the inhibitory activity of natural products in relation to these enzymes. This information is available in ANTIAGE-DB. It allows the prediction of the anti-ageing potential of target compounds. The server works on two axes. First, it compares compounds by similarity. Compounds whose inhibitory potential has been established in the literature are compared. Second, for a selected molecule, a reverse virtual screening against the three enzymes can be performed. The server is open access. A detailed report of the prediction results is sent to the user by e-mail. ANTIAGE-DB can be accessed online via the following link: https://bio-hpc.ucam.edu/anti-age-db [73].
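The first axis described above, comparing a query compound with inhibitors already reported in the literature, can be illustrated with a generic similarity ranking. The text does not state which fingerprint or similarity measure ANTIAGE-DB uses, so the Tanimoto coefficient on sets of feature identifiers is an assumption made for this sketch, and the feature sets and reference names are placeholders.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared features divided by all features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder fingerprints: each integer stands for one structural feature.
reference_inhibitors = {
    "reference-1": frozenset({1, 2, 3, 5}),
    "reference-2": frozenset({2, 3, 4}),
    "reference-3": frozenset({7, 8, 9}),
}


def rank_by_similarity(query: frozenset, references: dict) -> list:
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = frozenset({1, 2, 3})
ranking = rank_by_similarity(query, reference_inhibitors)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A high score against a known inhibitor only suggests that the query is worth testing; the second axis, reverse virtual screening against the three enzymes, addresses the binding question directly.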
AntibioticDB was the first open-access DB for antibacterial research to include discontinued agents, drugs in preclinical development and those in clinical trials. It is available at AntibioticDB.com. The data in this DB were obtained from publicly available sources. AntibioticDB includes over 1000 compounds in one of the following categories: preclinical development, Phase I–III clinical trials, Phase IV clinical development, awaiting approval or recently approved. Discontinued compounds serve as a reference or as a starting point for future research and redevelopment [74].
The vast majority of antibiotics available today are natural products (NPs) isolated from Streptomyces, a genus of soil-dwelling bacteria. There is still a huge reservoir of Streptomyces NPs that remains untapped from a pharmaceutical point of view. Launched in 2012, StreptomeDB (www.pharmbioinf.uni-freiburg.de/streptomedb) is the first and only public online DB allowing interactive phylogenetic exploration of Streptomyces and their isolated NPs. Currently, about 2,500 NPs are annotated by manual curation. To enhance interoperability, StreptomeDB entries have been linked to several spectral, (bio)chemical and chemical DBs, as well as a genome-based server. In addition, pharmacokinetic and toxicity profiles have been added to the DB. The publication describing the DB also highlights recent real-world use cases that illustrate its applicability in the life sciences [75].
Antiviral Medicinal Plants and Natural Products Database (avMpNp DB) is a DB developed in Brazil that contains bioactive compounds from biodiversity with antiviral activity. The strategy for building the avMpNp DB consisted first of a bibliographic search in academic DBs and in compound DBs, in order to systematize information about the bioactive compounds included in the DB. It is worth noting that most of the bioactive compounds stored in avMpNp DB have already been isolated by the Laboratory of Pharmacognosy and Homeopathy of the School of Pharmacy, UFMG, and some of them are potential candidates for antiviral drugs, both for COVID-19 and for other viral infections. This assessment is based mainly on in silico and in vitro studies already described in the scientific literature, whose results are being systematized throughout this project [76].
6.5. Description of the database compound features: private databases
A private DB is either kept for in-house use or commercialized through the sale of data, access or licenses. In general, such DBs are quite expensive even for academic use (from US$ 6,600 per year for the Dictionary of Natural Products® to over US$ 40,000 for SciFinder).
The Dictionary of Natural Products (CRC Press, v. 27.1) (DNP) is a compilation of all known compounds that are derived from natural sources and can be regarded as one of the most comprehensive natural product libraries available to date [77]. The latest version of DNP (v. 27.1, at the time of writing) provides data on nearly 300,000 natural products and contains information on the physicochemical and biological properties of the compounds; along with their systematic and common names, bibliographic references, structures and origin (including family, genus and species). Due to its rich content, the DNP can be considered as a body of knowledge for natural products and as a guide in investigations for drug discovery based on natural products.
ChemTCM is a DB of natural products obtained from the medicinal plants used in TCM, developed at King's College London, with support from China-UK Innovation. This dataset comprises metadata regarding TCM natural products, but also encompasses their activities on common therapeutic targets for natural products employed in Western medicine [78].
The Natural Products Library (NPL) was first described in an article by the pharmaceutical company AstraZeneca. However, the data in the publication, covering more than 800 curated natural products, remained only as an in-house collection [79,80]. DB Ayurveda contains a collection of information regarding natural products extracted from medicinal plants of traditional Indian medicine, first described in a publication by Lagunin et al. [77]. The link given in that publication still works but redirects to a website that provides software for researching natural products and chemicals in general. The DB may still be available on this site, but with subscription-only access [46].
INDOFINE Chemical Company, founded in 1981, provides high quality rare organic compounds and natural products for scientific advancement. Over the years, the development lab has synthesized more than 5000 new compounds for medicinal chemistry projects. Product catalogs are available for download in Adobe Acrobat (PDF) format. 20,736 structures in SDF are also available [7].
ACDISC_NP (AnalytiCon Discovery Natural Products) provides the optimization of natural products for a wide range of biological effects for companies in the pharmaceutical, cosmetic, food and agricultural industries. As a research and development company, it covers the entire spectrum of the conception of a potential drug, from the establishment of libraries that enable the identification of active compounds, to the development of synthesis, including for commercial scale production. This collection contains both plant and microorganism isolates, enabling faster discovery of natural products and thus also contributing to the optimization of targets. SAR studies as well as ‘Hit-to-Lead' programs can be conducted in the AnalytiCon laboratory [6].
The literature review gathered 62 articles published in the last 15 years (2007–2023), which pointed to 64 compound DBs made available for various purposes: from repositories of curated 3D structures, such as 3DMET, to DBs applicable to the research and development of new drugs for specific diseases, such as avMpNp DB, which collects potential bioactive compounds with antiviral action. Most of the information contained in the DBs surveyed was manually extracted from the literature, has largely undergone curation and comprises diverse data on physicochemical properties, spectral data, chemical space configuration, in silico prediction studies, in vitro assays and bioactivity. Most of this organized information supports the optimization of searches for potential leads. Supplementary Table S2 shows a comparison of the information contained in the DBs studied in this article, such as identification of substances, description of physicochemical properties, biological activities, spectral data and access to bibliographic references.
However, a DB usually does not include all the information related to a natural compound or product. For example, a drawback of SuperNatural II, UNPD and ZINC, the three largest DBs in terms of the number of NPs, is that the taxonomic or geographical origin of the organism that produced each compound is not identified [46]. In contrast, resources such as NuBBEDB include validated information on the geographical locations of plant species, providing a current profile of the compounds present in the Brazilian territory [3].
Regarding the origin of the DBs, most of the 49 resources identified in this work were developed at universities and public research institutes and are maintained by these institutions. An example is UNIQUIM, which was built and is maintained by the Chemistry Institute of UNAM. However, some of the non-commercial DBs are no longer accessible or maintained. For example, SuperScent and SuperSweet have not been updated since 2010, and AfroDB is accessible only through the ZINC DB. The HIT and HIM DBs are no longer accessible online; the structures of their bioactive compounds are available in the ZINC DB, but their metadata are not and have probably been lost [29,30]. Another example is UNPD, the largest non-commercial, open access natural products DB (at the time it was developed, it contained 197,201 natural products from plants, animals and microorganisms), whose host website is no longer accessible.
Updating and maintaining compound DBs is of critical importance for free and open access and for sustained and timely use of the information. This is a challenging step, particularly for public DBs, because of the difficulty of securing sustained funding. One solution is to deposit the data in permanently linked repositories that assign a DOI, such as Figshare or Zenodo. Another is to obtain funding to sustain the website. An excellent example is ZINC22, hosted by a research group at the University of California in San Francisco. Other examples of public DBs with sustained financial support are PubChem, ChEMBL and DrugBank, all available online and free of charge [56].
Another feature observed in the DBs surveyed is that they usually support simple searches. In COCONUT, for example, compounds can be searched by name, InChI, SMILES, molecular formula and structural formula. However, the quality of compound structures and notations requires additional attention and curation. There are no shared standards across DBs for representing stereochemistry, aromaticity or isotopes, so the same compound can appear in several different versions. To make the data safe to use, diverse curation strategies have been adopted throughout DB development: some DBs adopt compounds already curated by other DBs, such as those provided by 3DMET, PubChem and DrugBank, while others apply their own protocols and strategies for manual curation, as occurs in BraCoLi and NPACT [43].
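One way to surface the multiple versions of a compound mentioned above is to group entries whose standard InChIKeys share the same first block. A standard InChIKey has three hyphen-separated blocks, and the first 14 characters hash the molecular skeleton (connectivity), so stereoisomers and isotopologues of one skeleton share that block. The sketch below assumes each source DB can export an InChIKey per entry; the `db-1` to `db-3` labels are placeholders, and the keys for alanine and aspirin are quoted for illustration and should be re-checked against a primary source before reuse.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a standard InChIKey."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return parts[0]


def group_by_connectivity(entries):
    """Group (source_db, inchikey) pairs that share a molecular skeleton."""
    groups = defaultdict(list)
    for source_db, inchikey in entries:
        groups[connectivity_block(inchikey)].append((source_db, inchikey))
    return dict(groups)


# Illustrative entries: three alanine records (L-, D- and no stereochemistry)
# and one aspirin record, attributed to placeholder source DBs.
entries = [
    ("db-1", "QNAYBMKLOCPYGJ-REOHSLBPSA-N"),
    ("db-2", "QNAYBMKLOCPYGJ-UWTATZPHSA-N"),
    ("db-3", "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"),
    ("db-1", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
]

groups = group_by_connectivity(entries)
for block, members in groups.items():
    flag = "review" if len(members) > 1 else "unique"
    print(block, len(members), flag)
```

Grouping on the first block deliberately merges genuine stereoisomers with records that merely omit stereochemistry, so the output is a list of candidates for manual curation rather than a list of confirmed duplicates.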
Chemoinformatics has increasingly contributed to drug discovery at different levels, for example, in the development of the DB itself, as well as in the curation of its compounds. Another important contribution of this science refers to the screening of compounds by filtering in libraries, especially those that include in silico studies, for subsequent execution of experimental trials. It also helps in the generation and/or refinement of hypotheses of the mechanism of action of bioactive compounds at the molecular level. In view of this, many DBs have included chemoinformatics tools for the analysis of the diversity of their content and coverage of the compounds in chemical space, in addition to the optimization of the virtual screening, for the achievement of potential leads. Examples of DBs surveyed in this review that employ these tools are Specs Natural Products and BIOFACQUIM.
Structural diversity analysis for understanding the chemical space coverage of BIOFACQUIM indicated that there are compounds in this DB with chemical structures very similar to drugs approved for clinical use. Therefore, very similar to other natural product DBs, BIOFACQUIM can be employed, through virtual screening, in lead identification [53], proving the importance of DBs in virtual drug screening.
Among the 64 DBs analyzed, 39.1% (n = 25) are DBs exclusively of natural products. These DBs originate from countries with extensive biodiversity, such as Brazil, Mexico and Panama. There is a need to develop a unified universal repository in order to avoid unnecessary duplication of online resources and to facilitate research related to bioactive compounds. In addition, the great challenge for these repositories is the standardization of procedures for the curation of information and compounds, as well as the provision of financial resources and researchers for their maintenance and support.
7. Conclusion
The analysis of the DBs cited in this review shows that these tools are very diverse and that some of them are no longer accessible or are being discontinued, either because the websites that hosted them are unavailable or because the data are no longer updated and no new compounds are added. This represents a serious loss in the generation of knowledge.
On the other hand, several of the DBs evaluated are freely accessible through websites that do not require login or registration. These DBs have broadened their applicability by adding tools and information that are essential for the virtual screening of drugs, such as those used for in silico predictions, and others that allow the characterization and curation of the DB. It is also pertinent to highlight efforts to advance DBs of bioactive compounds with emphasis on a specific pharmacological action, such as AfroCancer, AfroMalaria, NPACT, InflamNat and avMpNp DB. These DBs provide disease-directed information, enhancing virtual target-based screening and reducing the time and cost of this process.
With regard to DBs exclusive to natural products, the development of a unified universal repository is necessary to avoid unnecessary duplication of online resources and to facilitate the search for compounds. The creation of a single public domain DB representing the biodiversity of several countries is challenging, as it requires standardized procedures for curating information and compounds. As a proof of concept, eight Latin American countries have joined forces to assemble a unified and curated Latin American Natural Product Database – LANaP DB. The DB is open-access and contains nearly 13,000 natural products. The chemical structures and information are gathered from nine individual DBs. As with other large-scale projects, LANaP DB requires sustained funding to keep updating, maintaining and curating the information [6]. A viable suggestion is that such activities be supported by funding shared between member countries, perhaps through a formal consortium, to make this project sustainable and ongoing.
Compound DBs are intended to make available standardized and useful information about the compounds they contain, in support of diverse multidisciplinary research and the development of products and drugs. The great challenge for these repositories remains the standardization of procedures for the curation of information and compounds, as well as the provision of financial resources and researchers for their maintenance and sustainability.
8. Future perspective
Based on a literature review of existing DBs commonly used in drug discovery, we highlight the relevance of these libraries throughout the process. Natural product DBs, for example, are becoming promising accelerators in the development of drugs from bioactive compounds for various therapeutic areas such as cancer, viral infections and neglected diseases. There is a growing trend towards specialized libraries, for example, repositories of information on compounds with specific pharmacological activities. In the future, these virtual libraries of compounds with potential biological activity could also enrich the medically relevant chemical space. In addition, the incorporation of chemoinformatics tools, the automation of curation processes and the allocation of financial and human resources for the maintenance of DBs will contribute significantly to drug discovery and development projects.
Funding Statement
This research was funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq – 303757/2019-1), CAPES foundation, grant numbers 311875/2022-0 and 88887.699683/2022-00 and Pró-Reitoria de Pós-Graduação da UFMG (PRPG) for intramural funding.
Supplemental material
Supplemental data for this article can be accessed at https://doi.org/10.1080/17568919.2024.2342203
Author contributions
All the authors participated in writing and revision of the manuscript; DQ de Azevedo, BM Campioni and FA Pedroz Lima participated actively in the systematic revision process and preparing of figures and tables; JL Medina-Franco, RO Castilho and VG Maltarollo carried out the experimental design and supervision of the revision process. For more information, please see the Author Disclosure Form.
Financial disclosure
This research was funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq – 303757/2019-1), CAPES foundation, grant numbers 311875/2022-0 and 88887.699683/2022-00 and Pró-Reitoria de Pós-Graduação da UFMG (PRPG) for intramural funding. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Competing interests disclosure
The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
Writing disclosure
No writing assistance was utilized in the production of this manuscript.
References
Papers of special note have been highlighted as: •• of considerable interest
- 1.Silva TS. Desenvolvimento de banco de dados de pacientes submetidos ao transplante de células-tronco hematopoiéticas [Development of a database of patients undergoing haematopoietic stem cell transplantation] [master thesis]. Porto Alegre (RS): Universidade Federal do Rio Grande do Sul; 2018. [Google Scholar]
- 2.Heuser CA. Projeto de Banco de Dados [Database Design]. Porto Alegre: Bookman; 2009. [Google Scholar]
- 3.Pilon AC, Valli MD, AC Pinto MEF, et al. Castro-Gamboa, Andricopulo AD, Bolzani VS. NuBBE DB: an updated database to uncover chemical and biological information from Brazilian biodiversity. Sci Rep. 2017;7(1): 1–12. doi: 10.1038/s41598-017-07451-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Miller MA. Chemical database techniques in drug discovery. Nat Rev Drug Discov. 2002;1(3):220–227. doi: 10.1038/nrd745 [DOI] [PubMed] [Google Scholar]
- 5.Medina-Franco JL, Flores-Padilla EA, Cháves-Hernanndes AL. Discovery and development of lead compounds from natural sources using computational approaches. Evidence-Based Validation of Herbal Medicine, Elsevier; 2022; p. 539–560. [Google Scholar]
- 6.Medina-Franco JL. Towards a unified Latin American natural products database: LANaPD. Future Science AO. 2020;6(8):FSO468. doi: 10.2144/fsoa-2020-0068 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Núñez MJ, Díaz-Eufracio BI, Medina-Franco JL, et al. Latin American databases of natural products: biodiversity and drug discovery against SARS-CoV-2. RSC Advan. 2021;11(26):16051–16064. doi: 10.1039/d1ra01507a [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Reis EA, Reis IA. Análise descritiva de dados. Relatório Técnico do Departamento de Estatística da UFMG [Descriptive data analysis. UFMG Statistics Department Technical Report]. 2001. Located at: https://www.est.ufmg.br [Google Scholar]
- 9.Wishart DS, Feunang YD, Guo AC, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D1082. doi: 10.1093/nar/gkx1037 [DOI] [PMC free article] [PubMed] [Google Scholar]; •• Because of its broad scope, comprehensive referencing and detailed data descriptions, DrugBank (since 2006) is enabling major advancements across the data-driven medicine industry.
- 10.Kiriiri GK, Njogu PM, Mwangi AN. Exploring different approaches to improve the success of drug discovery and development projects: a review. Fut J Pharmaceut Sci. 2020;6(1):1–12. doi: 10.1186/s43094-020-00047-9 [DOI] [Google Scholar]
- 11.Akkari ACS, Munhoz IP, Tomioka J, et al. Inovação tecnológica na indústria farmacêutica: diferenças entre a Europa, os EUA e os países farmaemergentes [Technological innovation in the pharmaceutical industry: differences between Europe, the US and pharma-emerging countries]. Gest Prod. 2016;23:365–380. doi: 10.1590/0104-530X2150-15 [DOI] [Google Scholar]
- 12.Mendez D, Gaulton A, Bento AP, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019;47(D1):D930–D940. doi: 10.1093/nar/gky1075 [DOI] [PMC free article] [PubMed] [Google Scholar]; •• Since its first release in 2009, ChEMBL has provided access to large amounts of high-quality, curated data on bioactive molecules from the medicinal chemistry literature.
- 13.Kim S. Exploring chemical information in PubChem. Curr Protoc. 2021;1(8):e217. doi: 10.1002/cpz1.217 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Feng H, Jiang J, Wei GW. Machine-learning repurposing of DrugBank compounds for opioid use disorder. Comp Biol Med. 2023;160:106921. doi: 10.1016/j.compbiomed.2023.106921 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Tomiki T, Saito T, Ueki M, et al. RIKEN natural products encyclopedia (RIKEN NPEdia), a chemical database of RIKEN natural products depository (RIKEN NPDepo). J Comput Aid Chem. 2006;7:157–162. [Google Scholar]
- 16.Sterling T, Irwin JJ. ZINC 15 – ligand discovery for everyone. J Chem Inf Model. 2015;55(11):2324–2337. doi: 10.1021/acs.jcim.5b00559 [DOI] [PMC free article] [PubMed] [Google Scholar]; •• The ZINC library is one of the most widely used databases for virtual screening. It was long considered the largest chemical database in the world.
- 17.Tingle BL, Tang KG, Castanon M, et al. ZINC-22 – a free multi-billion-scale database of tangible compounds for ligand discovery. J Chem Inf Model. 2023;63(4):1166–1176. doi: 10.1021/acs.jcim.2c01253 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Hastings J, De Matos P, Dekker A, et al. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2013;41:D456–D463. doi: 10.1093/nar/gks1146 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Veríssimo GC, dos Santos Júnior VS, de Almeida IADR, et al. The Brazilian compound library (BraCoLi) database: a repository of chemical and biological information for drug design. Mol Diver. 2022;26(6):3387–3397. doi: 10.1007/s11030-022-10386-9 [DOI] [PubMed] [Google Scholar]
- 20.Novick PA, Ortiz OF, Poelman J, et al. SWEETLEAD: an in silico database of approved drugs, regulated chemicals, and herbal isolates for computer-aided drug discovery. PLOS ONE. 2013;8(11):e79568. doi: 10.1371/journal.pone.0079568 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Gu J, Gui Y, Chen L, et al. Use of natural products as chemical library for drug discovery and network pharmacology. PLOS ONE. 2013;8(4):e62839. doi: 10.1371/journal.pone.0062839 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Chávez Hernandez AL, Medina-Franco JL. Natural products subsets: Generation and characterization. Artif Intel Life Sci. 2023;3:100066. doi: 10.1016/j.ailsci.2023.100066 [DOI] [Google Scholar]
- 23.Banerjee P, Erehman J, Gohlke BO, et al. Super Natural II – a database of natural products. Nucleic Acids Res. 2015;43(D1):D935–D939. doi: 10.1093/nar/gku886 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Gallo K, Kemmler E, Goede A, et al. SuperNatural 3.0 – a database of natural products and natural product-based derivatives. Nucleic Acids Res. 2023;51(D1):D654–D659. doi: 10.1093/nar/gkac1008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Chuprina A, Lukin O, Demoiseaux R, et al. Drug-and lead-likeness, target class, and molecular diversity analysis of 7.9 million commercially available organic compounds provided by 29 suppliers. J Chem Inf Model. 2010;50(4):470–479. doi: 10.1021/ci900464s [DOI] [PubMed] [Google Scholar]
- 26.Maeda MH, Yonezawa T, Komaba T. Chemical curation to improve data accuracy: recent development of the 3DMET database. Biophys Physicobiol. 2018;15:87–93. doi: 10.2142/biophysico.15.0_87 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Nakamura K, Shimura N, Otabe Y, et al. KNApSAcK-3D: a three-dimensional structure database of plant metabolites. Plant Cell Physiol. 2013;54(2):e4. doi: 10.1093/pcp/pcs186 [DOI] [PubMed] [Google Scholar]
- 28.Chen TF, Chang YC, Hsiao Y, et al. DockCoV2: a drug database against SARS-CoV-2. Nucleic Acids Res. 2021;49(D1):D1152–D1159. doi: 10.1093/nar/gkaa861 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Kang H, Tang K, Liu Q, et al. HIM-herbal ingredients in-vivo metabolism database. J Cheminform. 2013;5(1):1–6. doi: 10.1186/1758-2946-5-28 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Ye H, Ye L, Kang H, et al. HIT: linking herbal active ingredients to targets. Nucleic Acids Res. 2011;39:D1055–D1059. doi: 10.1093/nar/gkq1165 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Gilson MK, Liu T, Baitaluk M, et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 2016;44(D1):D1045–D1053. doi: 10.1093/nar/gkv1072 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Corsello SM, Bittker JA, Liu Z, et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat Med. 2017;23(4):405–408. doi: 10.1038/nm.4306 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Zeng X, Zhang P, He W, et al. NPASS: natural product activity and species source database for natural product research, discovery and tool development. Nucleic Acids Res. 2018;46(D1):D1217–D1222. doi: 10.1093/nar/gkx1026 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Gallo K, Kemmler E, Goede A, et al. SuperNatural 3.0 – a database of natural products and natural product-based derivatives. Nucleic Acids Res. 2023;51(D1):D654–D659. doi: 10.1093/nar/gkac1008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Sawada Y, Nakabayashi R, Yamada Y, et al. RIKEN tandem mass spectral database (ReSpect) for phytochemicals: a plant-specific MS/MS-based data resource and database. Phytochemistry. 2012;82:38–44. doi: 10.1016/j.phytochem.2012.07.007 [DOI] [PubMed] [Google Scholar]
- 36.Wang M, Carver JJ, Phelan VV, et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol. 2016;34(8):828–837. doi: 10.1038/nbt.3597 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Horai H, Arita M, Kanaya S, et al. MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom. 2010;45:703–714. doi: 10.1002/jms.1777 [DOI] [PubMed] [Google Scholar]
- 38.Dunkel M, Schmidt U, Struck S, et al. SuperScent – a database of flavors and scents. Nucleic Acids Res. 2009;37(1):D291–D294. doi: 10.1093/nar/gkn695 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Ahmed J, Preissner S, Dunkel M, et al. SuperSweet – a resource on natural and artificial sweetening agents. Nucleic Acids Res. 2010;39(1):D377–D382. doi: 10.1093/nar/gkq917 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.El-Sayed AM. The Pherobase: database of insect pheromones and semiochemicals [Internet]. New Zealand: 2024. [cited 2024 Mar 24]. Available from: https://pherobase.com/ [Google Scholar]
- 41.Patil VR, Dhote AM, Patil R, et al. Identification of structural scaffold from InterBioScreen (IBS) database to inhibit 3CLpro, PLpro, and RdRp of SARS-CoV-2 using molecular docking and dynamic simulation studies. J Biomol Struc Dyn. 2023;41(22):13168–13179. doi: 10.1080/07391102.2023.2175377 [DOI] [PubMed] [Google Scholar]
- 42.Valdés-Jiménez A, Peña-Varas C, Borrego-Muñoz P, et al. PSC-db: a structured and searchable 3D-database for plant secondary compounds. Molecules. 2021;26(4):1124. doi: 10.3390/molecules26041124 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Sorokina M, Merseburger P, Rajan K, et al. COCONUT online: collection of open natural products database. J Cheminform. 2021;13(1):1–13. doi: 10.1186/s13321-020-00478-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Almansour NM, Allemailem KS, Abd EL, et al. In silico mining of Natural Products Atlas (NPAtlas) database for identifying effective Bcl-2 inhibitors: molecular docking, molecular dynamics, and pharmacokinetics characteristics. Molecules. 2023;28(2):783. doi: 10.3390/molecules28020783 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.van Santen JA, Poynton EF, Iskakova D, et al. The Natural Products Atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res. 2022;50(D1):D1317–D1323. doi: 10.1093/nar/gkab941 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Sorokina M, Steinbeck C. Review on natural products databases: where to find data in 2020. J Cheminform. 2020;12(1):20. doi: 10.1186/s13321-020-00424-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Chen JH, Linstead E, Swamidass SJ, et al. ChemDB update-full-text search and virtual chemical space. Bioinformatics. 2007;23(17):2348–2351. doi: 10.1093/bioinformatics/btm341 [DOI] [PubMed] [Google Scholar]
- 48.Chen CY. TCM Database@Taiwan: the world's largest traditional Chinese medicine database for drug screening in silico. PLOS ONE. 2011;6(1):e15939. doi: 10.1371/journal.pone.0015939 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Karimi-Jafari MH, Firouzi R, Ashouri M, et al. A database of chemical compositions of Persian medicinal herb. ChemRxiv. 2022. doi: 10.26434/chemrxiv-2022-8rrwp [DOI] [Google Scholar]
- 50.Nguyen-Vo TH, Le T, Pham D, et al. VIETHERB: a database for Vietnamese herbal species. J Chem Inf Model. 2018;59(1):1–9. doi: 10.1021/acs.jcim.8b00399 [DOI] [PubMed] [Google Scholar]
- 51.Tung CW, Lin YC, Chang HS, et al. TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants. Database. 2014;2014:bau055. doi: 10.1093/database/bau055 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Valli M, Dos Santos RN, Figueira LD, et al. Development of a natural products database from the biodiversity of Brazil. J Nat Prod. 2013;76(3):439–444. doi: 10.1021/np3006875 [DOI] [PubMed] [Google Scholar]
- 53.Pilón-Jiménez BA, Saldívar-González FI, Díaz-Eufracio BI, et al. BIOFACQUIM: a Mexican compound database of natural products. Biomolecules. 2019;9(1):31. doi: 10.3390/biom9010031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Olmedo DA, Durant-Archibold AA, López-Pérez JL, et al. Design and diversity analysis of chemical libraries in drug discovery. Comb Chem High Throughput Screen. 2023;27(4):502–515. doi: 10.2174/1386207326666230705150110 [DOI] [PubMed] [Google Scholar]
- 55.Medina-Franco JL. Towards a unified Latin American natural products database: LANaPD. Future Sci OA. 2020;6(8):FSO468. doi: 10.2144/fsoa-2020-0068 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Núñez MJ, Díaz-Eufracio BI, Medina-Franco JL, et al. Latin American databases of natural products: biodiversity and drug discovery against SARS-CoV-2. RSC Advan. 2021;11(26):16051–16064. doi: 10.1039/d1ra01507a [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Barazorda-Ccahuana HL, Ranilla LG, Candia-Puma MA, et al. PeruNPDB: the Peruvian natural products database for in silico drug screening. Sci Rep. 2023;13(1):7577. doi: 10.1038/s41598-023-34729-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Garcia AG, Franco JLM, Valli M, et al. Latin American Natural Products Database (LANaPD): towards a comprehensive chemical library [Internet]. Posters. 2023; [cited 2024 Mar 24]. Available from: https://repositorio.usp.br/directbitstream/52e4fe21-d852-4789-b892-72f41235e4df/3148199.pdf [Google Scholar]
- 59.Ntie-Kang F, Zofou D, Babiaka SB, et al. AfroDb: a select highly potent and diverse natural product library from African medicinal plants. PLOS ONE. 2013;8(10):e78085. doi: 10.1371/journal.pone.0078085 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Ntie-Kang F, Amoa Onguéné P, Fotso GW, et al. Virtualizing the p-ANAPL library: a step towards drug discovery from African medicinal plants. PLOS ONE. 2014;9(3):e90655. doi: 10.1371/journal.pone.0090655 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Simoben CV, Qaseem A, Moumbock AFA, et al. Pharmacoinformatic investigation of medicinal plants from East Africa. Mol Inform. 2020;39(11):e2000163. doi: 10.1002/minf.202000163 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Diallo BN, Glenister M, Musyoka TM, et al. SANCDB: an update on South African natural compounds and their readily available analogs. J Cheminform. 2021;13(1):37. doi: 10.1186/s13321-021-00514-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Ntie-Kang F, Telukunta KK, Döring K, et al. NANPDB: a resource for natural products from Northern African sources. J Nat Prod. 2017;80(7):2067–2076. doi: 10.1021/acs.jnatprod.7b00283 [DOI] [PubMed] [Google Scholar]
- 64.Bultum LE, Woyessa AM, Lee D. ETM-DB: integrated Ethiopian traditional herbal medicine and phytochemicals database. BMC Complement Altern Med. 2019;19(1):212. doi: 10.1186/s12906-019-2634-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Vivek-Ananth RP, Mohanraj K, Sahoo AK, et al. IMPPAT 2.0: an enhanced and expanded phytochemical Atlas of Indian Medicinal Plants. ACS Omega. 2023;8(9):8827–8845. doi: 10.1021/acsomega.3c00156 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Rutz A, Sorokina M, Galgonek J, et al. The LOTUS initiative for open knowledge management in natural products research. Elife. 2022;11:e70780. doi: 10.7554/eLife.70780 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Xu T, Chen W, Zhou J, et al. NPBS database: A chemical data resource with relational data between natural products and biological sources. Database. 2020;2020:baaa102. doi: 10.1093/database/baaa102 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Zin PPK, Williams GJ, Ekins S. Cheminformatics analysis and modeling with macrolactone DB. Sci Rep. 2020;10(1):6284. doi: 10.1038/s41598-020-63192-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Ntie-Kang F, Nwodo JN, Ibezim A, et al. Molecular modeling of potential anticancer agents from African medicinal plants. J Chem Inf Model. 2014;54:2433–2450. doi: 10.1021/ci5003697 [DOI] [PubMed] [Google Scholar]
- 70.Mangal M, Sagar S, Singh H, et al. NPACT: naturally occurring plant-based anti-cancer compound-activity-target database. Nucleic Acids Res. 2013;41(D1):D1124–D1129. doi: 10.1093/nar/gks1047 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Zhang R, Lin J, Zou Y, et al. Chemical space and biological target network of anti-inflammatory natural products. J Chem Inf Model. 2018;59(1):66–73. doi: 10.1021/acs.jcim.8b00560 [DOI] [PubMed] [Google Scholar]
- 72.Minkiewicz P, Iwaniak A, Darewicz M. BIOPEP-UWM Database of bioactive peptides: current opportunities. Int J Mol Sci. 2019;20(23):5978. doi: 10.3390/ijms20235978 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Papaemmanouil CD, Peña-García J, Banegas-Luna AJ, et al. ANTIAGE-DB: a database and server for the prediction of antiaging compounds targeting elastase, hyaluronidase, and tyrosinase. Antioxidants. 2022;11(11):2268. doi: 10.3390/antiox11112268 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Farrell LJ, Lo R, Wanford JJ, et al. Revitalizing the drug pipeline: AntibioticDB, an open access database to aid antibacterial research and development. J Antimicrob Chemother. 2018;73(9):2284–2297. doi: 10.1093/jac/dky208 [DOI] [PubMed] [Google Scholar]
- 75.Moumbock AFA, Gao M, Qaseem A, et al. StreptomeDB 3.0: an updated compendium of streptomycetes natural products. Nucleic Acids Res. 2021;49(D1):D600–D604. doi: 10.1093/nar/gkaa868 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Miranda-Salas J, Peña-Varas C, Martínez IV, et al. Trends and challenges in chemoinformatics research in Latin America. Artif Intel Life Sci. 2023;3:100077. doi: 10.1016/j.ailsci.2023.100077 [DOI] [Google Scholar]
- 77.Chassagne F, Cabanac G, Hubert G, et al. The landscape of natural product diversity and their pharmacological relevance from a focus on the Dictionary of Natural Products®. Phytochem Rev. 2019;18:601–622. doi: 10.1186/1749-8546-1-3 [DOI] [Google Scholar]
- 78.Ehrman TM, Barlow DJ, Hylands PJ. In silico search for multi-target anti-inflammatories in Chinese herbs and formulas. Bioorg Med Chem. 2010;18:2204–2218. doi: 10.1016/j.bmc.2010.01.070 [DOI] [PubMed] [Google Scholar]
- 79.Quinn RJ, Carroll AR, Pham NB, et al. Developing a drug-like natural product library. J Nat Prod. 2008;71:464–468. doi: 10.1021/np070526y [DOI] [PubMed] [Google Scholar]
- 80.Chen Y, de Bruyn Kops C, Kirchmair J. Data resources for the computer-guided discovery of bioactive natural products. J Chem Inf Model. 2017;57(9):2099–2111. doi: 10.1021/acs.jcim.7b00341 [DOI] [PubMed] [Google Scholar]
