iScience. 2023 Aug 25;26(9):107729. doi: 10.1016/j.isci.2023.107729

Modern drug discovery using ethnobotany: A large-scale cross-cultural analysis of traditional medicine reveals common therapeutic uses

Daniel Domingo-Fernández 1,2,3, Yojana Gadiya 1,2, Sarah Mubeen 1, Thomas Joseph Bollerman 1, Matthew D Healy 1, Shaurya Chanana 1, Rotem Gura Sadovsky 1, David Healey 1, Viswa Colluru 1
PMCID: PMC10494464  PMID: 37701812

Summary

For millennia, numerous cultures and civilizations have relied on traditional remedies derived from plants to treat a wide range of conditions and ailments. Here, we systematically analyzed ethnobotanical patterns across taxonomically related plants, demonstrating that congeneric medicinal plants are more likely to be used for treating similar indications. Next, we reconstructed the phytochemical space covered by medicinal plants to reveal that (i) taxonomically related medicinal plants cover a similar phytochemical space, and (ii) chemical similarity correlates with similar therapeutic usage. Lastly, we present several case scenarios illustrating how mining this information can be used for drug discovery applications, including: (i) investigating taxonomic hotspots around particular indications, (ii) exploring shared patterns of congeneric plants located in different geographic areas, but which have been used to treat the same indications, and (iii) showing the concordance between ethnobotanical patterns among non-taxonomically related plants and the presence of shared bioactive phytochemicals.

Subject areas: Health sciences, Therapeutics, Drug delivery system

Highlights

  • Congeneric medicinal plants tend to be used for treating similar indications

  • Chemical similarity correlates with medicinal usage

  • Ethnobotanical patterns can generate high-confidence hypotheses in drug discovery


Introduction

The use of medicinal plants to treat myriad indications can be traced back thousands of years across a wide variety of geographical regions, cultures, and civilizations.1,2,3 Ethnobotany, as proposed by de Albuquerque and Hanazaki (2009), is a field of study that investigates the complex relationships between people and plants to explore the ways in which different societies utilize plants for various applications, including medicinal purposes.4 This field incorporates a number of disciplines, such as phylogeny, taxonomy, and pharmacology, all of which aid in the identification of bioactive natural products.5 By studying these traditional practices, scientists have been able to extract the active ingredients for applications in modern medicines.4,6 Similarly, the related interdisciplinary fields of ethnopharmacology,7 the study of traditionally used bioactive natural products, and chemical ecology, the investigation of how humans interact with their chemical environment (Johns, 1990), each provide valuable insights and resources for efforts in drug discovery.8 However, in recent decades, the pharmaceutical industry has shifted away from utilizing historical knowledge or natural resources as the primary basis for drug development.9,10 In parallel, there has been a sharp decline in our ability to translate laboratory research into successful clinical programs.

A major bottleneck in harvesting ethnobotany priors (i.e., a plant used to treat a certain indication by a specific culture) is balancing scale and fidelity, as both are critical to identifying high-confidence hypotheses. For any one prior, confidence is largely determined by the quality of the primary data testing the hypothesis. However, ethnobotanical practices are codified only through experience, and the evidence for them is derived indirectly from the preservation of the knowledge and the persistent use of the dosing practice over time. Therefore, any systematic investigation of ethnobotany priors requires building a global repository of the history of ethnobotanical practices in order to generate proxy measures of high confidence.

One motivation for conducting systematic studies of ethnobotany priors is our observation that similar plants from distinct geographical regions have been used to treat similar diseases. For example, Tinospora cordifolia, a plant native to India, has been used to treat several conditions, including liver diseases and jaundice.11 Similarly, Tinospora bakis, a plant belonging to the same genus which grows in Nigeria and other countries in Eastern and Western Africa, is also used to treat liver diseases and jaundice.12 Similarly, Glycyrrhiza uralensis (native to Asia) and Glycyrrhiza lepidota (native to North America) are both used to treat cough and sore throat.13 In both of these examples and dozens of others, taxonomically and phylogenetically related plants have been used to treat similar conditions, despite growing in disparate regions of the world.14 Thus, analyzing the ethnobotanical use of medicinal plants to find conserved patterns across cultures can build confidence in the efficacy of these medicinal plants. As a result, therapeutic hypotheses derived from convergent use of medicinal plants may be prioritized for drug discovery.

Several studies have identified relationships between taxonomically related species and medicinal use across different regions.15 For instance, Saslis-Lagoudakis et al. (2012) found conserved patterns of medicinal use for 1,500 species traditionally used to treat 13 indication areas across three floras (i.e., Nepal, New Zealand, and South Africa).16 To do so, the authors built a phylogeny using RuBisCo sequences and identified genera located in the three explored floras that included significantly more medicinal plants than the rest of the phylogenetic tree. Similarly, Reinaldo et al. (2020) proposed a model (i.e., Utilitarian Equivalence Model) aimed at improving understanding of the shared applications of diverse medicinal flora to test the hypothesis that taxonomically related plants have a higher likelihood of being utilized for similar medicinal purposes.17 The model itself assumes that (i) a high medicinal usage overlap of taxonomically closely related species by disparate socioecological systems is indicative of similar as opposed to identical use, (ii) this similarity is influenced by two main factors: the cultural practices of the people using the plants, and the environmental conditions in which they grow, and (iii) these similarities may exist because the plants have evolved in similar ways and thus possess similar characteristics. Specifically, the authors investigated 64 plants from semi-arid and humid regions in the Northeast of Brazil associated with 27 therapeutic indication areas and found that plants of the same genus are more likely to be used for the same indication area. Other studies have explored the same hypothesis. For instance, Johnson-Fulton et al. (2018) analyzed 16 species of the Cochlospermaceae family used in tropical regions around the globe (i.e., South America, Africa, India, and North Australia) and found shared patterns of use among them.18 Furthermore, seminal work by Moerman (1991, 1996) specifically examined the medicinal flora of North America to show that medicinal plants are not randomly distributed, but rather, they are over-represented in certain families.19,20 Later, an additional study by the same author validated these findings on a larger scale by investigating five specific geographical regions across the globe.21 Since then, among the vast number of scientific articles on ethnopharmacology published in recent years,22 a significant body of literature has focused on comparing the use of plants from specific cultures.23,24 However, further research is needed to consolidate these findings by conducting a comprehensive study that systematically investigates all cultures and floras available in the ethnobotanical literature. Such a study would provide a holistic understanding of the relationships between taxonomically related species and their medicinal uses.

Common medicinal properties of taxonomically related plants can be attributed to shared bioactive structures produced by conserved metabolic routes.25,26,27 By analyzing the presence of eight secondary metabolite classes in 435 plant families, Zhang et al. (2021) found correlations between the abundance of some of these classes and the phylogenetic trees of these families.28 Furthermore, their findings suggest the convergent evolution of some biosynthesis pathways through the tree, as previously described in seminal work by Wink (2003).29 In a similar study, Alrashedy and Molina (2016) investigated the use of 126 psychoactive genera and found that some phytochemical and pharmacological traits (e.g., hallucinogenic and sedative) of these genera are phylogenetically clustered.30 Thus, combining ethnobotanical knowledge with other information, such as chemical composition, geographical distributions, or environmental conditions, can provide insights into shared ethnobotanical patterns and potentially reveal common chemical structures or metabolic routes that explain the medicinal properties of these plants.

In this work, we systematically assemble ethnobotanical knowledge and conduct a large-scale, cross-cultural analysis of traditional medicine to evaluate the empiricism and non-randomness of the traditional use of plants for medicinal purposes. The presented study sheds light on the ethnobotanical patterns of thousands of medicinal plants from all regions across the globe, encompassing thousands of therapeutic indications. Our findings reveal that taxonomically related medicinal plants tend to be used for similar therapeutic purposes across cultures. Subsequently, we study the influence that geographical location may have on ethnobotanical patterns. Here, we observe that geographically close congeneric species exhibit slightly higher correlations in their therapeutic use compared to geographically distant species. However, we also find dozens of examples of congeneric medicinal plants that have been used for the same therapeutic indications despite being located in widely separated regions of the world. Additionally, we investigate the phytochemical composition of medicinal plants, finding that taxonomically related species are more likely to have similar chemical compositions. We further find that these taxonomically related species, which are more likely to have similar chemical compositions, are in turn also more likely to be used for similar therapeutic purposes, providing support for the hypothesis that their traditional use is non-random. Finally, through multiple case scenarios, we demonstrate how the integration of ethnobotanical, taxonomical, and phytochemical information can aid in identifying compounds with therapeutic potential for applications in drug discovery.

Results

Medicinal plant usage correlates across taxonomic levels

In this section, we conduct a large-scale, systematic analysis to test whether there is a higher degree of correlation between the use of taxonomically close plants to treat similar indications than taxonomically distant ones. To test this hypothesis, we investigated plant-disease pairs (i.e., 5,636 medicinal plants against 23 indication areas) from scientific literature and ethnobotanical databases to establish correlations between the use of medicinal plants and taxonomic classification. In Figure 1, we compare the correlations in medicinal usage between pairs of plants that belong to the same family, same genus, or are unrelated at those levels (see example in Figure S1).
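The pairwise comparison described above can be illustrated with a minimal sketch. It assumes each plant is represented by a vector of plant-disease association counts, one entry per indication area, and that similarity between two plants is the Pearson correlation of their vectors, as stated elsewhere in this article. The vectors below are hypothetical and use five indication areas instead of 23 for brevity; the function name is illustrative and not taken from the authors' code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)

# Hypothetical usage vectors (counts per indication area), not real data.
plant_a = [12, 0, 3, 0, 1]  # species from genus X
plant_b = [9, 1, 4, 0, 0]   # another species from genus X
plant_c = [0, 7, 0, 5, 0]   # species from an unrelated family

same_genus_score = pearson(plant_a, plant_b)
unrelated_score = pearson(plant_a, plant_c)
print(same_genus_score > unrelated_score)
```

In this toy example the congeneric pair scores higher than the unrelated pair, which is the pattern the analysis tests for across all plant pairs.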

Figure 1. Medicinal usage similarity between taxonomically related and unrelated plants

(A and B) Each raincloud plot illustrates the distribution of medicinal plants used to treat a disease from one of the 23 disease categories (see Table S2).71 Taxonomic levels are ordered from left to right based on taxonomic similarity, where plants belonging to the same genus are the most similar, and “random plants” contain 100,000 randomly selected medicinal plant pairs from a different family. We show the same analysis performed by extracting the medicinal usage vectors from either (A) scientific literature or (B) curated ethnobotanical databases. Legend: ∗∗∗∗ = q-value < 1.00e−04; ns = non-significant.

Overall, congeneric plant pairs (i.e., plants of the same genus) exhibited a higher correlation for treating similar indication areas than plant pairs from the same family or random pairs in both datasets (Figure 1). Thus, our findings confirmed that as the taxonomic classification space of medicinal plants broadens, the specificity for treating indication areas decreases. However, we observed these differences markedly diminish when comparing medicinal plants belonging to the same family and random plant pairs in the literature dataset, and disappear in the ethnobotanical databases.

When comparing the use of congeneric plants for similar medicinal purposes based on whether these plant pairs come from the scientific literature or ethnobotanical databases (see method details subsection “extracting ethnobotanical information from scientific literature and databases”), we observed slightly larger correlation values for databases than we did for literature, although both distributions exhibited the same shape. This can be explained by the fact that ethnobotanical databases have fewer diseases due to their focus on only five specific regions. As a result, there is a higher likelihood for a plant pair to treat the same indication(s), as discussed in the next subsection.

Lastly, to identify global patterns across different taxonomic levels, we inspected the indication areas associated with each genus (Figure 2). Figure 2 revealed that infectious diseases, inflammatory, and digestive-related indications were the most predominant indication areas reported in literature. By zooming into specific taxonomic levels, we can see trends in the use of plants of the same taxonomic group for the treatment of similar indication areas. For instance, in the case of the Phaseoleae tribe (see zoomed-in heatmap in Figure 2), we observe how the genera in this tribe are particularly associated with nervous system indications as well as infectious diseases. Overall, the figure illustrates how the ethnobotanical use of plants correlates with different taxonomic levels.

Figure 2. Landscape of the therapeutic uses of medicinal plants

Schematic phylogenetic tree of the genera containing medicinal plants and their associated indication areas. The heatmap is colored by Relative Citation Count (RCC), which corresponds to the total number of literature citations per indication in each genus (i.e., sum of the disease vectors for all species in a genus) normalized by the total number of citations in the given genus. We zoom in on a heatmap that corresponds to genera present in the Phaseoleae tribe.
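The Relative Citation Count defined in this caption can be sketched as follows. This is a minimal illustration assuming each species is a list of citation counts per indication area; the function name and the example numbers are hypothetical.

```python
def relative_citation_counts(species_vectors):
    """Sum per-species citation vectors for one genus, then divide each
    indication total by the total number of citations in the genus."""
    if not species_vectors:
        raise ValueError("need at least one species vector")
    genus_counts = [sum(column) for column in zip(*species_vectors)]
    total = sum(genus_counts)
    if total == 0:
        return [0.0] * len(genus_counts)  # genus without any citations
    return [count / total for count in genus_counts]

# Hypothetical genus with two species and three indication areas.
genus = [
    [10, 0, 2],
    [5, 1, 2],
]
rcc = relative_citation_counts(genus)
print(rcc)
```

Because every genus is normalized by its own citation total, the values in a heatmap row sum to one and genera with very different amounts of literature become comparable.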

Investigating the influence of geography in the common medicinal use of congeneric plants

We evaluated whether taxonomically related plants across geographic locations are used to treat similar conditions to account for the potentially confounding effect of information exchange between geographically close communities. In other words, we speculate that taxonomically close plants across distant floras have common medicinal uses, which local communities have independently discovered. To test this hypothesis, we used all plant pairs of the same genus from both datasets (i.e., scientific literature and ethnobotanical databases) and grouped them based on whether they grow in at least one common country (see method details subsection “plant geographic and climatic information”). Next, we compared the distribution of the disease similarity scores (i.e., Pearson correlation) across the different groups.

For medicinal plants found in scientific literature, we observed that geographically distant plants exhibit a lower disease similarity compared to plants that are geographically proximal (Figure S2). However, these differences are reduced in the database dataset to the point of being only marginally significant (Figure S3). Here, it is important to note that the database dataset under consideration is composed of specific ethnobotanical information and floras curated from five distinct countries, resulting in a larger number of plant pairs in the geographically close group. The fact that the signal was less pronounced in this dataset prompted us to further classify the geographically close group into three different subgroups of approximately the same number of plants, namely: (i) low overlap, (ii) medium overlap, and (iii) high overlap, and compare the distribution of their disease similarity scores (Figure 3; Figure S4). We found that the degree of geographic overlap between the plants of a congeneric pair has little to no bearing on whether the pair is used to treat the same indication area (Figure 3), indicating that congeneric medicinal plants may be used to treat similar indications irrespective of their geographic location.

Figure 3. Medicinal usage similarity between plants of the same genus located in common and distinct geographic areas

The set of medicinal plants used corresponds to the ones reported by ethnobotanical databases. We grouped plants based on the relative number of overlapping countries they are located in using the Szymkiewicz-Simpson coefficient (SSC). While plant pairs of the same genus that do not come from any common country are grouped in the “No overlap” group, plant pairs with at least one common country were aggregated into three groups based on their geographical similarity: (i) “Low geographical similarity” for 0 < SSC < 0.4, (ii) “Medium geographical similarity” for 0.4 < SSC < 0.8, and (iii) “High geographical similarity” for 0.8 < SSC. Each raincloud plot illustrates the distribution of medicinal plants used to treat a disease from one of the 23 disease categories (see Table S2). Legend: ∗∗ = 1.00e−03 < q-value < 1.00e−02; ns = non-significant.
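The grouping in this caption can be sketched in a few lines. The Szymkiewicz-Simpson coefficient of two sets is the size of their intersection divided by the size of the smaller set. The caption uses strict inequalities, so the handling of the exact boundary values 0.4 and 0.8 below is an assumption, and the country codes are hypothetical.

```python
def szymkiewicz_simpson(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # assumption: an empty set has no overlap with anything
    return len(a & b) / min(len(a), len(b))

def geographic_group(ssc):
    """Map an SSC value to the groups named in the caption.
    Assumption: a value of exactly 0.4 or 0.8 falls into the higher group."""
    if ssc == 0:
        return "No overlap"
    if ssc < 0.4:
        return "Low geographical similarity"
    if ssc < 0.8:
        return "Medium geographical similarity"
    return "High geographical similarity"

# Hypothetical country sets for two congeneric species.
species_1 = {"IN", "NP", "CN"}
species_2 = {"CN", "JP"}
ssc = szymkiewicz_simpson(species_1, species_2)
print(ssc, geographic_group(ssc))
```

Dividing by the smaller set means that a species whose range is fully contained in that of its congener scores 1.0, regardless of how widespread the other species is.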

Overall, our analysis suggests that congeneric plants located in the same regions are slightly more likely to be used to treat similar indications compared to congeneric plants situated in different areas. This finding is not surprising given that diseases themselves are often endemic to a certain region (e.g., malaria and Ebola). Furthermore, closely related species tend to share geographic and ecological distribution,19,20,21,31,32,33 as shown in Figure 3 where we observe a larger sample size for geographically close species. However, it is also important to note that these differences in disease similarity between geographically close and distant plants are much less acute than the differences observed comparing taxonomically close and distinct plants (Figure 3). In conclusion, while there is a general trend showing plants of the same genus located in similar regions are more likely to treat similar indications, we also found that a large number of congeneric plant pairs located in non-overlapping geographical areas are also used to treat the same indications. We further explore these pairs in one of the following case scenarios as they represent potential signals of independent ethnobotanical convergence.34

Taxonomically related medicinal plants cover a similar chemical space

One possible explanation for the previously observed trend associating taxonomically related medicinal plants to common ethnobotanical uses is that related species share conserved metabolic pathways.25,26,27 This suggests that medicinal properties of plants for a given indication are due to the presence of specific metabolites produced by conserved metabolic pathways, and since taxonomically related plants conserve these pathways, they have been traditionally associated with that particular therapeutic indication. Thus, we first investigated whether taxonomically related plants share a larger chemical space than non-taxonomically related plants.

To do so, we calculated chemical similarity by grouping plant pairs of different classes (i.e., plants of the same genus, the same family, and taxonomically distant plants) for the set of medicinal plants previously analyzed. The results of this analysis corroborated the hypothesis that taxonomically related plants tend to share a chemical space (Figure 4). More specifically, regardless of whether the comparison was based on phytochemicals (Figure 4A) or their scaffolds (Figure 4B), we observed a significantly larger number of congeneric medicinal plants with higher chemical overlap compared to plants of the same family or non-taxonomically related medicinal plants.

Figure 4. Chemical composition similarity between taxonomically and non-taxonomically related medicinal plants

(A) With chemicals and (B) with Murcko scaffolds for chemicals. As outlined in section in-depth exploration of ethnobotanical patterns for drug discovery, chemical similarity is defined by the Szymkiewicz-Simpson Coefficient (SSC) between the set of chemicals or Murcko scaffolds contained in two plants. The group Random corresponds to 100,000 pairs of plants not belonging to the same family (i.e., non-taxonomically related plants). Legend: ∗∗∗∗ = q-value < 1.00e−04.

Additionally, we took a subset of plants with known chemical space but which had not yet been reported to treat any disease (i.e., non-medicinal plants) and also analyzed their chemical space and chemical scaffold similarity at both taxonomic levels (Figure S5). Here, similar to medicinal plants, we observed higher chemical similarity for congeneric plants compared with the other two groups. However, when comparing the chemical similarity between congeneric medicinal and non-medicinal plants, we found a significantly higher correlation for medicinal plants (Figure S6). This suggests that medicinal plants of the same genus tend to cover a more similar chemical space than non-medicinal plants.

Lastly, we evaluated whether congeneric medicinal plants with high chemical similarity tend to demonstrate a higher disease similarity than congeneric plants with low chemical similarity. We distinguished these groups (i.e., congeneric plants with high and low chemical similarity) by selecting plant pairs with the lowest chemical similarity (SSC = 0) and selecting an equal number with high chemical similarity. Next, we compared the distributions of the correlation values for the ethnobotanical use of the two groups of medicinal plants (Figure 5), similar to the previous analyses. This analysis confirmed that chemically similar plants of the same genus have been traditionally used to treat similar indications. Overall, this analysis illustrates the importance of combining phytochemical and ethnobotanical information as an avenue to identify potential medicinal natural products in the future.
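One way to build the two groups described above is sketched here. The text states only that pairs with SSC = 0 form the low-similarity group and that an equal number of high-similarity pairs were selected; taking the top-ranked pairs by SSC is an assumption made for illustration, and the plant names and scores are hypothetical.

```python
def split_by_chemical_similarity(pair_scores):
    """pair_scores maps a (plant_a, plant_b) tuple to its chemical SSC.
    Returns (low, high): all pairs with SSC == 0, and an equal number of
    the highest-scoring remaining pairs (assumed selection rule)."""
    low = [pair for pair, ssc in pair_scores.items() if ssc == 0]
    ranked = sorted(
        (item for item in pair_scores.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    high = [pair for pair, _ in ranked[: len(low)]]
    return low, high

# Hypothetical congeneric pairs with their chemical similarity.
scores = {
    ("sp1", "sp2"): 0.0,
    ("sp1", "sp3"): 0.9,
    ("sp2", "sp3"): 0.0,
    ("sp3", "sp4"): 0.5,
    ("sp4", "sp5"): 0.7,
    ("sp5", "sp6"): 0.2,
}
low_group, high_group = split_by_chemical_similarity(scores)
print(low_group, high_group)
```

Matching the group sizes keeps the two distributions of disease similarity comparable when they are tested against each other.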

Figure 5. Medicinal usage similarity between medicinal plants with low and high chemical similarity

Each raincloud plot illustrates the distribution of Pearson correlation coefficients for the two groups of medicinal plants (i.e., congeneric plants with high and low chemical similarity) used to treat a disease from one of the 23 disease categories (see Table S2). The disease vectors are populated based on the count of plant-disease associations in the scientific literature dataset. Legend: ∗∗ = 1.00e−03 < q-value < 1.00e−02.

In-depth exploration of ethnobotanical patterns for drug discovery

While previous sections have shown global patterns related to medicinal plants by leveraging taxonomic, chemical, geographical, and ethnobotanical knowledge, in this section, we present a series of case scenarios demonstrating how this information can be used to identify medicinal plants and bioactive phytochemicals for drug discovery campaigns.

Case scenario 1: Exploring taxonomic hotspots around a particular indication

Previously, we showed that taxonomically related medicinal plants tend to be used to treat similar indications. In this subsection, we delve into two therapeutic indications and explore their taxonomic hotspots (i.e., taxonomic groups where multiple species are associated with the disease) to identify phytochemicals that could be responsible for their bioactivity. Firstly, we focused on medicinal plants used to treat asthma and identified several hotspots (Figure 6A). The most prominent one is the Salvia genus, in which three species exhibit a strong association with asthma (i.e., reported in more than 15 publications): Salvia officinalis, Salvia rosmarinus, and Salvia miltiorrhiza. Recent studies have reported bioactivity for extracts and phytochemicals present in these species, such as Salvianolic acids,35 rosmarinic acid,36 borneol,37 and carnosol38 (Figure S7A). Similarly, we also investigated the hotspots of medicinal plants associated with breast cancer (Figure S7B). For this indication, we observed hotspots in the Umbelliferae family, particularly in the Angelica genus. Mira and Shimizu (2016) studied the latter genus for its antitumoral properties based on the phytochemical composition of the plants.39 In concordance with their results, we were able to identify compounds with antitumor activity in Angelica gigas and Angelica sinensis, including alpha-phellandrene,40 angelicin,41 and alpha-pinene42 (Figure S7C). Overall, these examples demonstrate how focusing on taxonomic hotspots can reveal potential bioactive phytochemicals and assist in guiding natural product-based drug development.

Figure 6. Illustrative examples of exploring ethnobotanical patterns for drug discovery

(A) Subset of plants traditionally used to treat asthma visualized in a subset of the taxonomic tree. Due to the large number of species associated with asthma (591), only species associated with the disease in more than 15 publications are highlighted in green in the outer ring of the tree with their respective binomial names displayed.

(B) Geographical location of the plants presented in the second case scenario. Color tones represent the main locations of each plant pair: Taraxacum (red), Sambucus (yellow), Liquidambar (green), Coptis (violet), Boswellia (blue). Note that each circle marker does not represent the entire geographical area where the plant is currently present but an approximate location of its native region according to Royal Botanic Gardens, Kew (https://www.kew.org/).

(C) Hierarchical clustering of the plants of the genus based on their reported indication areas. Species names highlighted in bold correspond to the plants known to contain atropine.

(D) Evaluating chemical diversity between non-medicinal and medicinal plants. The plotted distribution corresponds to the number of unique Murcko scaffolds obtained from 10,000 randomly generated sets of phytochemicals drawn from non-medicinal plants. The orange vertical bar represents the number of unique scaffolds obtained from the same number of phytochemicals selected from the top 100 medicinal plants with the most reported therapeutic uses. It is worth noting that all of the numbers of unique scaffolds plotted in the distribution are derived from sets of phytochemicals of equal size.

Case scenario 2: Analyzing shared cross-cultural patterns of geographically distant congeneric plants

Prior to our current era of globalization, the exchange of ideas was far more limited, and cultures separated by geographical regions would rely upon their respective local floras to treat various ailments, some of which were later discovered to hold medicinal properties. Notably, similar plants have been used to treat similar conditions in different parts of the world, one of the most compelling pieces of evidence to suggest that some medicinal plants are indeed efficacious. Here, we explore the pairs of congeneric geographically distant plants that exhibited high correlation in our previous analysis (Figure 3; “No overlap” group) to evaluate whether delving into these patterns can reveal high-confidence plant-disease associations.

Among the pairs with the strongest correlations, we found the dandelion species Taraxacum officinale, which is native to Western Europe, and Taraxacum mongolicum, which can be found from Siberia to temperate East Asia. Both species have reported uses for treating acute lung injury,43,44 breast cancer,45,46 and viral hepatitis.47,48 Among the many potentially bioactive compounds present in both Taraxacum officinale and Taraxacum mongolicum are taraxacosides, sesquiterpene lactones, phenylpropanoids, and triterpenoid saponins.49

We also found the elderberry species Sambucus canadensis, native to North and South America, and Sambucus williamsii, native to Asia and widely cultivated in Europe. The Eurasian species has thousands of years of traditional medicinal use and has been subject to considerable modern research, whereas the American species has received far less research attention, although it appears to be of a similar chemical composition.50 Uses and therapeutic indications for the Eurasian species include respiratory diseases, antimicrobial, cancer, immune enhancement, arthritis, and constipation,51 while uses for the American species in Brazilian and Cuban folk medicine include respiratory diseases, anti-inflammatory, sinusitis, influenza, and general immune enhancement.50 Similar to the previous example, both Sambucus species from the Western and Eastern Hemispheres contain many of the same or highly similar flavonoids, sambubiosides, anthocyanins, rutinosides, and quercetins.51 Additionally, we also identified several other congeneric species, such as Liquidambar styraciflua (North and Central America) and Liquidambar orientalis (Greece and Turkey), Coptis teeta (India) and Coptis japonica (Japan), and Boswellia serrata (India) and Boswellia sacra (South of the Arabian peninsula) (Figure 6B), all of which shared common therapeutic uses and can potentially be investigated to discover new bioactive phytochemicals.

Case scenario 3: Uncovering common ethnobotanical patterns across genera through phytochemical profiling

In this case scenario, we focus on well-known bioactive phytochemicals to illustrate how non-taxonomically related species can share common ethnobotanical patterns around certain therapeutic areas if they contain common bioactive compounds used to treat conditions in those areas.

Firstly, we investigated species containing sennosides, a group of phytochemicals commonly used as laxatives both for constipation and for emptying the digestive system before surgery.52 According to two natural product databases used in our work,53,54 sennosides are present in the following genera: Cassia, Fallopia, Reynoutria, Rheum, Senna, and Terminalia. In NCBI Taxonomy, these genera comprise a total of 431 species, of which 22 are known medicinal plants, as reported in the datasets used in this work. Among these species, only eight are known to contain sennosides, and all of these have been reported to hold therapeutic properties for digestive disorders from a total of 103 articles. Conversely, from the remaining 14 species that are known medicinal plants, only nine have been associated with digestive disorders by a total of 46 articles. Similarly, we examined species containing vinca alkaloids, a large group of alkaloids known for their anti-mitotic and anti-microtubule activity, which are used to treat various forms of cancer.55 Two genera known to contain these important phytochemicals are Catharanthus and Nelumbo. In both genera, only two species contain vinca alkaloids, namely, Catharanthus roseus and Nelumbo nucifera, which are also the only species for which anticancer activity has been reported.
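The sennoside comparison above reduces to simple proportions, worked through here using only the counts stated in the text. The per-species article averages are a derived summary added for illustration and are not reported by the authors.

```python
# Counts stated in the text for the 22 medicinal species in the six genera.
sennoside_species = 8      # medicinal species known to contain sennosides
sennoside_digestive = 8    # of those, associated with digestive disorders
sennoside_articles = 103   # articles supporting those associations

other_species = 14         # remaining medicinal species
other_digestive = 9        # of those, associated with digestive disorders
other_articles = 46        # articles supporting those associations

# Fraction of species in each group linked to digestive disorders.
frac_sennoside = sennoside_digestive / sennoside_species
frac_other = other_digestive / other_species

# Derived summary (assumption: a plain average over associated species).
articles_per_sennoside_species = sennoside_articles / sennoside_digestive
articles_per_other_species = other_articles / other_digestive

print(f"{frac_sennoside:.0%} vs {frac_other:.0%} of species")
print(f"{articles_per_sennoside_species:.1f} vs {articles_per_other_species:.1f} articles per species")
```

All eight sennoside-containing species are linked to digestive disorders, compared with nine of the remaining 14, and the literature support per species is also larger for the sennoside-containing group.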

Lastly, we focused on a known bioactive natural product, atropine, which has several medical uses, including neurological, cardiovascular, and eye conditions, and is also present in several genera, such as Atropa, Brugmansia, Cornus, Datura, Duboisia, and Hyoscyamus. Since this phytochemical is used across multiple indications, in contrast to the two preceding examples, we clustered all medicinal plants within these genera based on their therapeutic usage (Figure 6C). The clustering exercise grouped together species containing atropine, as these plant species have traditionally been used for the aforementioned therapeutic areas related to atropine (e.g., nervous system disorders). Furthermore, it also revealed that some of these plants have traditionally been used for treating poisoning, which is another application of atropine.

Case scenario 4: Evaluating the chemical diversity of medicinal plants

Natural products are one of the primary sources for discovering new drugs56 and have been demonstrated to contain a diverse range of biologically active molecules.57 Here, we explore whether the known chemical space of medicinal plants is more chemically diverse than the space of non-medicinal plants. To do so, we first selected a representative set of medicinal plants by choosing the top 100 plants with the most reported therapeutic uses. Next, we compared the number of unique Murcko scaffolds58 found in this medicinal plant sample to the number of unique scaffolds in 10,000 randomly generated sets of phytochemicals from non-medicinal plants. Here, we ensured that each random set derived from non-medicinal plants contained the same number of phytochemicals as the medicinal plant sample and that these phytochemicals did not have distinct chemical properties (see Figure S8). Our results indicate that this sample of medicinal plants indeed presented a significantly greater chemical diversity compared to non-medicinal plants (Figure 6D), which suggests that medicinal plants should be prioritized in the creation of natural product libraries when the goal is to maximize chemical space coverage.
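The resampling comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code: the function names, the toy compound-to-scaffold mapping, and the one-sided empirical p-value are all hypothetical, and in practice the scaffold of each compound would be a Murcko scaffold computed with a cheminformatics toolkit.

```python
import random

def count_unique_scaffolds(compounds, scaffold_of):
    """Number of distinct scaffolds among the given compounds."""
    return len({scaffold_of[c] for c in compounds})

def empirical_p_value(medicinal, non_medicinal, scaffold_of, n_sets=10_000, seed=0):
    """Compare the medicinal sample against random, size-matched non-medicinal sets.

    Returns the observed scaffold count and the fraction of random sets with
    at least as many unique scaffolds (with a +1 correction, an assumption).
    """
    rng = random.Random(seed)
    observed = count_unique_scaffolds(medicinal, scaffold_of)
    pool = sorted(non_medicinal)  # sorted for reproducible sampling
    hits = 0
    for _ in range(n_sets):
        sample = rng.sample(pool, len(medicinal))  # same size as medicinal sample
        if count_unique_scaffolds(sample, scaffold_of) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sets + 1)

# Toy data (made up): four medicinal compounds with four distinct scaffolds,
# eight non-medicinal compounds sharing only two scaffolds.
scaffold_of = {f"m{i}": f"scaffold_m{i}" for i in range(4)}
scaffold_of.update({f"n{i}": f"scaffold_n{i % 2}" for i in range(8)})
medicinal = [f"m{i}" for i in range(4)]
non_medicinal = [f"n{i}" for i in range(8)]

observed, p = empirical_p_value(medicinal, non_medicinal, scaffold_of, n_sets=1000)
print(observed, p)
```

Matching the size of each random set to the medicinal sample, as the text describes, keeps the scaffold counts comparable, since larger sets would trivially contain more scaffolds.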

Discussion

Despite the long-standing use of medicinal plants by a myriad of cultures over thousands of years, the potential of numerous indigenous plants as sources for new pharmaceuticals has yet to be fully explored. In our work, we assembled a comprehensive dataset of ethnobotany priors and explored the chemical space of known medicinal plants, to confirm that taxonomically related medicinal plants are more likely to be used for similar therapeutic purposes. We also examined the effect of geographical location on this relationship and found that congeneric species that are located in the same region exhibit slightly stronger correlations for the treatment of similar indications compared to congeneric species located in disparate regions. In addition, by analyzing the phytochemical composition of medicinal plants, we showed that (i) taxonomically related plants tend to have similar chemical compositions, and (ii) there is a correlation between chemical similarity and therapeutic usage. Finally, in a series of case scenarios, we demonstrated how combining ethnobotanical, taxonomical, and phytochemical information can be exploited for drug discovery applications. We conclude that a systems-level organization of ethnobotanical and phytochemical knowledge can be used to generate high-confidence hypotheses attuned to modern target or pathway-centric drug discovery at scale. Such hypotheses, rooted in orthogonal priors derived from millennia of human experience, may drive a much-needed improvement in the translatability of preclinical discoveries to approved medicines.

There are a number of potential avenues for future research. Firstly, the comprehensive dataset accompanying our analysis allows prospective analyses to investigate other ethnobotany trends in the future. Secondly, plant environmental properties (e.g., pH of the soil, altitude, climate, and atmospheric humidity) can also be leveraged to conduct clustering analyses to supplement the ones presented here. Thirdly, dietary patterns from wild animals or herbivorous insects59 could also potentially identify novel medicinal plants,60 as illustrated by Bautista-Sopelana et al. (2022) who identified two bioactive plants by studying the diet of a particular bird that preferentially eats these plants during the mating season.61 Fourthly, we have not yet investigated additional aspects, such as the part of the plant used to prepare an extract, how the plant extract is produced, or in what quantities it should be consumed. However, we can also leverage these factors to further our understanding of the profile of phytochemicals in an extract. Finally, we envisage tangential applications to the ones presented here, such as finding alternative sources for bioactive phytochemicals.

Limitations of the study

Although extracting ethnobotanical and phytochemical information using natural language processing (NLP) approaches made it feasible for us to assemble this knowledge without having to perform manual curation, NLP approaches have their limitations. Firstly, despite the high accuracy of the Named Entity Recognition (NER) model used for species and diseases, with F1-scores close to 0.9, linking the recognized species and diseases to the corresponding NCBITaxon and MeSH identifiers and accurately predicting whether there is a relation between them is a challenging task. Therefore, solely relying on NLP to systematically extract and evaluate plant-disease relations inherently implies adding some false relations. To account for this limitation, we chose to employ an additional manually curated dataset consisting of ethnobotany databases in order to validate our findings. We further employed human-in-the-loop curation at different sampling thresholds depending on the amenability of the dataset to NLP-based relationship extraction. Secondly, while we normalized plants and diseases to large and granular ontologies, such as NCBITaxon and MeSH, these resources do not yet cover all plant species and diseases; thus, there are likely species or indications that our NLP approach overlooked. Thirdly, we used available geographical data at the country level as a proxy for geographical proximity, which may not be the optimal resolution for plants exclusively growing in a specific region of a large country. However, this remains the most granular geographical resolution available in a dataset with enough coverage to allow us to analyze the thousands of medicinal plants employed in our work. Similarly, although the natural product databases used for phytochemical landscaping are two high-quality resources, we still do not know the complete chemical composition of a given plant.62 As a result, there are inherent gaps, or even possible inaccuracies, for some plants.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Deposited data

BERN2 (v.1.1)63 | NLP tool by DMIS lab | https://github.com/dmis-lab/BERN2
VietHerb64 | Ho Chi Minh International University | https://vietherb.com.vn/
Ewé65 | University of Reading | https://doi.org/10.1093/database/baz144
IMPPAT 2.066 | The Institute of Mathematical Sciences (IMSc), Chennai | https://cb.imsc.res.in/imppat/
ETM-DB67 | University of Reading | http://biosoft.kaist.ac.kr/etm/home.php/
Korean Traditional Knowledge Portal (KTKP)68 | Korean Intellectual Property Office | https://www.koreantk.com/ktkp2014/
NCBI Taxonomy (NCBITaxon) | NIH | https://www.ncbi.nlm.nih.gov/taxonomy
Mondo Disease Ontology (MONDO)69 (v.2022-10-11) | Monarch Initiative | https://mondo.monarchinitiative.org/
MeSH 2022 | NIH | https://meshb.nlm.nih.gov/
COCONUT53 | University Friedrich-Schiller of Jena | https://coconut.naturalproducts.net/
LOTUS54 | University of Geneva | https://lotus.naturalproducts.net/
Trefle | Trefle | https://github.com/treflehq/dump

Software and algorithms

Python 3.9.0 | Python Software Foundation | https://www.python.org
NumPy 1.22.0 | NumPy project | https://numpy.org/
Pandas 1.5.2 | Pandas project | https://pandas.pydata.org/
Seaborn 0.11.0 | Seaborn project | https://seaborn.pydata.org/
Dask | Dask | https://www.dask.org/
RDKit70 | Landrum70 | https://www.rdkit.org/
PubChemPy | PubChemPy project | https://github.com/mcs07/PubChemPy
ETE 3.0 | ETE Toolkit | http://etetoolkit.org/
Raincloud71 | Allen et al.71 | https://github.com/RainCloudPlots/RainCloudPlots
T0pp72 | Sanh et al.72 | https://huggingface.co/bigscience/T0pp

Resource availability

Lead contact

Further information and requests for data and scripts should be directed to and will be fulfilled by the lead contact, Daniel Domingo-Fernández (daniel.domingo-fernandez@envedabio.com).

Materials availability

The following ontologies: MeSH, NCBI Taxonomy (NCBITaxon), and Mondo Disease Ontology (MONDO) were downloaded from OBO Foundry (https://obofoundry.org/) on 10-23-2022 and were used to map BERN2 original identifiers from MeSH to MONDO as well as to build the plant taxonomy. NCBI taxonomy was used to generate the phylogenetic tree using ETE v3.0. The raw BERN2 dataset was processed using Dask. RDKit (v2022.09.1) and PubChemPy were employed to process chemical structures, load SDF files, map chemical structures to PubChem identifiers, and perform the scaffold analysis on the phytochemicals.

Method details

Extracting ethnobotanical information from scientific literature and databases

To systematically identify plant-disease associations reported in scientific literature, we employed BERN2,63 a large-scale dataset containing normalized biomedical entities for over 33 million PubMed abstracts. Firstly, we identified all sentences in PubMed where both a plant (NCBITaxon identifier) and a disease (MeSH identifier) occurred. Of the most highly occurring plants, we filtered out 129 plant species with names corresponding to fruits (e.g., Carica papaya) or common food ingredients (e.g., Piper longum) (see Table S1). Secondly, we mapped MeSH identifiers to MONDO identifiers using MONDO’s cross-references. Thirdly, to identify high-confidence plant-disease associations and remove false positives, we employed T0pp,72 a large language model similar to ChatGPT, as a relation extraction model to evaluate whether the occurrence of a plant and disease in a sentence was indeed attributed to the use of the plant to treat the disease (see next section for more details). Finally, after applying the relation extraction model, we extracted 72,981 high-confidence plant-disease associations containing 4,826 unique plants and 1,811 diseases (see Figures S9A and S9B).

As an additional dataset, we included several manually curated ethnobotanical databases, each with a focus on a specific geographical region: VietHerb64 (Vietnam), Ewé65 (Brazil), IMPPAT66 (India), ETM-DB67 (Ethiopia), and KTKP68 (Republic of Korea). After downloading the databases, we normalized plants and diseases to NCBITaxon and MONDO identifiers, respectively. The combination of all databases comprised 2,393 unique plants and 867 unique diseases involved in 24,085 plant-disease associations (see Figures S9A and S9B). Overall, the average number of diseases a given plant treats was 6.5 (see Figure S9C).

Using the T0pp model for relation extraction

We used a pre-trained large language model called T0pp72 as a relation extraction model to identify sentences where the co-occurrence of a plant and a disease did not imply that the plant is used to treat the disease (e.g., there is no relation between them, or the relation is positive, i.e., the plant causes the disease). Using the mentions of the plant and disease and the sentence, the following prompts were designed:

  • 1.

    "{sentence}. In the previous sentence, which plants are used to treat {disease_mention}?"

  • 2.

    "{sentence}. In the previous sentence, which diseases are associated with {plant_mention}?"

  • 3.

    "{sentence}. In the previous sentence, is {plant_mention} used to treat {disease_mention}?"

For prompts #1 and #2, we expect the model to answer with the plant and disease mentioned, respectively, if the plant is used to treat the disease. For prompt #3, however, the model always answers with “Yes” (if it identifies a ‘treat’ relation) or “No” otherwise. If at least two of the three prompts indicated a ‘treat’ relation between the plant and the disease, we considered the relation to be ‘treat’. Otherwise, the proposed plant-disease association was filtered out. The rationale behind using three prompts with a 2/3 cutoff was an internal evaluation of 100 plant-disease associations (50 positive examples and 50 negative ones), which yielded a precision over 90% with a recall of 100%. The 3/3 cutoff, on the other hand, reduced the recall to 80% and did not improve the precision.
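The voting scheme over the three prompts can be sketched as follows. The prompt templates are quoted from the text; everything else is an assumption made for illustration, including the function names, the substring-based check of the model's answers, and the example sentence. The model call itself is left out, so the answers are passed in as plain strings.

```python
# Prompt templates as given in the text.
PROMPTS = (
    "{sentence}. In the previous sentence, which plants are used to treat {disease_mention}?",
    "{sentence}. In the previous sentence, which diseases are associated with {plant_mention}?",
    "{sentence}. In the previous sentence, is {plant_mention} used to treat {disease_mention}?",
)

def build_prompts(sentence, plant_mention, disease_mention):
    """Fill the three templates for one sentence (hypothetical helper)."""
    sentence = sentence.rstrip(".")  # the templates add their own period
    return [
        p.format(sentence=sentence, plant_mention=plant_mention,
                 disease_mention=disease_mention)
        for p in PROMPTS
    ]

def count_votes(plant_mention, disease_mention, answers):
    """Count how many of the three answers support a 'treat' relation.

    Assumption: an answer supports the relation if it names the expected
    entity (prompts 1 and 2) or starts with "yes" (prompt 3).
    """
    a1, a2, a3 = (a.strip().lower() for a in answers)
    return sum([
        plant_mention.lower() in a1,
        disease_mention.lower() in a2,
        a3.startswith("yes"),
    ])

def is_treat_relation(plant_mention, disease_mention, answers, min_votes=2):
    """Apply the 2-out-of-3 cutoff (or 3-out-of-3 with min_votes=3)."""
    return count_votes(plant_mention, disease_mention, answers) >= min_votes

# Made-up example: two of the three answers agree on a 'treat' relation.
prompts = build_prompts("Sambucus williamsii has been used to treat arthritis.",
                        "Sambucus williamsii", "arthritis")
answers = ("Sambucus williamsii", "arthritis", "No")
print(is_treat_relation("Sambucus williamsii", "arthritis", answers))
```

Raising `min_votes` to 3 corresponds to the stricter 3/3 cutoff, which in the evaluation reported above lowered recall without improving precision.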

Mapping the phytochemical space to plants

To annotate all the known phytochemicals in plants, we leveraged two natural product databases: COCONUT53 and LOTUS.54 Firstly, we normalized the chemicals in both SDF database dumps (version January 2022 and February 2022, respectively) to PubChem identifiers using each compound’s SMILES or InChIKeys. Secondly, for those normalized chemicals, we matched their taxonomic information (i.e., list of scientific names for the species that contain that chemical) to NCBITaxon identifiers. We used fuzzy matching between the species name to species names or synonyms in NCBITaxon, applying a strict cutoff of a distance ratio >0.95, where the distance ratio is equal to the Levenshtein distance between the two strings divided by the length of the alignment. After the normalization, 71,179 unique chemicals were present in 18,094 plants, of which 4,028 were medicinal (i.e., plants associated with at least one disease) (see Figures S9E and S9F). On average, medicinal plants contained 38 chemicals (see Figure S9D).
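The name-matching step can be illustrated with a short sketch. Note that this example swaps in the similarity ratio from Python's standard-library `difflib.SequenceMatcher` as a stand-in; it is not the Levenshtein-based ratio described above and may score some pairs differently. The function names and the `TAXON:` placeholder identifiers are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_species(name, taxon_names, cutoff=0.95):
    """Return (identifier, score) of the best match above the cutoff, else None.

    taxon_names maps scientific names or synonyms to identifiers.
    """
    best = None
    for candidate, identifier in taxon_names.items():
        score = similarity(name, candidate)
        if score > cutoff and (best is None or score > best[1]):
            best = (identifier, score)
    return best

# Placeholder identifiers, not real NCBITaxon IDs.
taxon_names = {
    "Sambucus williamsii": "TAXON:1",
    "Sambucus canadensis": "TAXON:2",
}

print(match_species("Sambucus wiliamsii", taxon_names))  # misspelling still matches
print(match_species("Sambucus nigra", taxon_names))      # no candidate above the cutoff
```

A strict cutoff of this kind tolerates small spelling variants while rejecting different species of the same genus, which share a long common prefix.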

Evaluating the similarity of ethnobotanical use on taxonomically related plants

To measure the similarity of ethnobotanical use of medicinal plants, we first generated binary matrices of plants (6,048) and diseases (2,205), where a value of 1 denotes the use of the plant to treat a particular disease, and 0 denotes otherwise. Using this representation, we can employ similarity metrics, such as Jaccard and Cosine similarity, to quantify the ethnobotanical similarity between any given pair of plants. However, due to the sparsity of the matrix and the high granularity of MONDO (e.g., disease subtypes can be on the 7th level of the hierarchy), we grouped the diseases into one or more of the 23 main indication areas corresponding to higher-order concepts of MONDO (see Table S2), similar to Saslis-Lagoudakis et al. (2012).14 Focusing on the main indication areas made the similarity calculations more transparent and also better reflected traditional ethnobotanical use, since plants are typically reported in the literature to treat a set of physical signs and symptoms (e.g., viral infections and jaundice) as opposed to specific indications and disease subtypes (e.g., otitis and nonalcoholic steatohepatitis). The reduction of diseases to indication areas led to a lower sparsity in the plant-disease vectors, and we removed any medicinal plant without an association with at least one of the 23 main indication areas.
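The effect of grouping diseases into indication areas can be seen in a small sketch. The disease-to-area mapping below is made up for illustration and is not the mapping in Table S2; the function names are likewise hypothetical. Sets are used in place of binary vectors, which gives the same Jaccard value.

```python
def to_indication_areas(diseases, area_of):
    """Map a set of diseases to the union of their indication areas."""
    areas = set()
    for disease in diseases:
        areas.update(area_of.get(disease, ()))
    return areas

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Illustrative mapping; a disease may belong to more than one area.
area_of = {
    "influenza": {"respiratory", "infectious"},
    "sinusitis": {"respiratory"},
    "constipation": {"digestive"},
    "arthritis": {"musculoskeletal"},
}

plant_a = {"influenza", "arthritis"}
plant_b = {"sinusitis", "constipation"}

disease_level = jaccard(plant_a, plant_b)
area_level = jaccard(to_indication_areas(plant_a, area_of),
                     to_indication_areas(plant_b, area_of))
print(disease_level, area_level)  # 0.0 at disease level, 0.25 at area level
```

In this toy case the two plants share no specific disease, yet both are used for respiratory conditions, which only becomes visible after the grouping.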

Finally, we grouped all medicinal plant pairs from the same genus (13,790 pairs) and family (273,695 pairs) to determine whether a correlation exists between the use of any given pair of medicinal plants of the same genus or family for a particular indication area. Non-binary vector representations of plant-disease associations were constructed, with values of 0 denoting no mentions of the use of a particular plant for the treatment of a disease in either the scientific literature or ethnobotanical databases, and integer values above 0 indicating the number of literature citations and/or independent validations of the use of a plant to treat a disease. Then, for each of the two groups (i.e., plants from the same genus and plants from the same family), correlations between each plant pair were calculated using Pearson’s correlation coefficient (see Figure S1 for an illustrative example and next paragraph for details). Additionally, we generated the set of taxonomically distant plant pairs by sampling 100,000 random pairs of plants that belonged to different families and subsequently calculating their correlations. Lastly, we compared distributions of the obtained correlations across groups using the Mann-Whitney-Wilcoxon test and applying Bonferroni correction.

Pearson’s correlation values vary between −1 and 1, with a value of −1 meaning a total negative linear correlation, 0 being no correlation, and +1 a perfect positive linear correlation (e.g., when both vectors are identical). Since the majority of the medicinal plants have only been used to treat one or a few indication areas, most vectors are sparse, and we expect most comparisons to be between pairs of plants with sparse vectors. Given two sparse vectors of length 23 (the number of indication areas), the most likely correlation is a negative one, because the two vectors will typically not have any matching indication area (i.e., given that both plants have only been used for a few of the 23 indication areas, we do not expect them to overlap). This explains why the mode of the distribution in Figure 2 is slightly negative (approximately −0.15), and not zero. Lastly, we would like to note that the similarity metric chosen does not influence the results, as cosine similarity yields the same results (Figure S10).
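The argument about sparse vectors can be checked numerically. The sketch below simplifies the setting by using binary vectors rather than citation counts, and the helper names are hypothetical. For two binary vectors of length n that each mark k areas with no overlap, Pearson's coefficient works out to −k/(n − k), so with n = 23 it is −1/22 for one area each and −0.15 for three areas each.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def indicator(positions, n=23):
    """Binary vector of length n with ones at the given positions."""
    return [1 if i in positions else 0 for i in range(n)]

one_area = pearson(indicator({0}), indicator({1}))
three_areas = pearson(indicator({0, 1, 2}), indicator({3, 4, 5}))
identical = pearson(indicator({0, 1, 2}), indicator({0, 1, 2}))
print(round(one_area, 4), round(three_areas, 4), round(identical, 4))
```

Non-overlapping sparse vectors are therefore slightly anti-correlated rather than uncorrelated, because each vector's few non-zero entries coincide with below-average entries in the other.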

Plant geographic and climatic information

To analyze the geographical distribution of medicinal plants, we leveraged the Trefle database. The database was open-source (with the latest data dump released in 2020) and had data collected from both plant-specific resources and users. For each plant recorded, the database contains information on the plant’s geographic location (i.e., country) and its growth characteristics, such as atmospheric humidity, soil pH, plant vegetation, and height. We chose this database because it offered the highest coverage in terms of geographical locations (i.e., over 140 countries) (see Figure S11) and number of plants (see Figure S12).

Evaluating the similarity of the chemical space on taxonomically related plants

Analogous to the previous section for diseases, we also aimed at evaluating whether taxonomically related plants have a more similar chemical composition than taxonomically distant plants. To assess this, we calculated the chemical similarity for every pair of plants in the same genus and the same family using the Szymkiewicz-Simpson Coefficient (SSC) (Equation 1). Although other similarity metrics such as Cosine similarity and Sørensen–Dice could have been used, the selected one is more suitable for comparing sets that could broadly differ in size.

$$S(X,Y)=\frac{|X \cap Y|}{\min(|X|,|Y|)}$$ (Equation 1)

Equation 1. The Szymkiewicz-Simpson coefficient calculates the overlap of two sets as a measure of their similarity. The similarity S, where 0 ≤ S ≤ 1, is defined as the size of the intersection of two sets (X and Y) divided by the size of the smaller of the two sets.
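Equation 1 translates directly into a few lines of code. The sketch below is illustrative only: the function names and compound labels are placeholders, and returning 0 for an empty set is an assumption, since the coefficient is undefined in that case. The Jaccard index is included to show why the overlap coefficient suits sets of very different sizes.

```python
def overlap_coefficient(x, y):
    """Szymkiewicz-Simpson coefficient: |X ∩ Y| / min(|X|, |Y|)."""
    x, y = set(x), set(y)
    if not x or not y:
        return 0.0  # assumption: treat the undefined empty-set case as 0
    return len(x & y) / min(len(x), len(y))

def jaccard(x, y):
    """Jaccard index, shown for comparison only."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Placeholder compound labels: a poorly characterized plant whose two known
# chemicals are both found in a well-characterized relative.
small_plant = {"c1", "c2"}
large_plant = {"c1", "c2", "c3", "c4", "c5"}

print(overlap_coefficient(small_plant, large_plant))  # 1.0
print(jaccard(small_plant, large_plant))              # 0.4
```

Because the denominator is the size of the smaller set, a plant with few annotated chemicals is not penalized for the chemicals that have simply not been reported for it.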

We calculated the chemical similarity between taxonomically related plants using two distinct approaches: firstly, we compared plant-chemical binary vectors (where 1 indicates that a chemical is present in the plant and 0 indicates otherwise), and secondly, we compared plants based on their chemical scaffolds, using the Murcko scaffolds58 of the original phytochemicals. Furthermore, we ran this analysis for all plants in the two employed databases as well as for a subset exclusively containing medicinal plants. Lastly, the distributions of the obtained chemical similarities were compared across groups using the Mann-Whitney-Wilcoxon test and applying Bonferroni correction.

Acknowledgments

We would like to thank the authors of the databases and resources used in our work for making their datasets available to the scientific community, and Daniel Ence for his useful suggestions for generating the phylogenetic tree.

Author contributions

V.C. conceived the original idea. D.D.F. designed the study with assistance from Y.G., T.J.B., R.G.S., and D.H. D.D.F., Y.G., and T.J.B. prepared the data. D.D.F., Y.G., and S.M. analyzed the data. D.D.F., Y.G., and S.M. interpreted the results. M.D.H. analyzed the second and third case scenario. S.C. generated the phylogenetic tree visualization. D.D.F., Y.G., and S.M. wrote the paper. All authors reviewed the manuscript.

All authors have read and approved the final manuscript.

Declaration of interests

All authors were employees of Enveda Biosciences Inc. during the course of this work and have real or potential ownership interest in the company.

Published: August 25, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2023.107729.

Supplemental information

Document S1. Figures S1–S12 and Tables S1 and S2

Data and code availability

  • Code is publicly available at https://github.com/enveda/ethnobotany together with the raw and processed datasets.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

  • Additional Supplemental Items are available in the GitHub repository.

References

  • 1.Miller T.L. University of Texas Press; 2019. Plant Kin: A Multispecies Ethnography in Indigenous Brazil. [Google Scholar]
  • 2.Heinrich M., Robles M., West J.E., Ortiz De Montellano B.R., Rodriguez E. Ethnopharmacology of Mexican asteraceae (compositae) Annu. Rev. Pharmacol. Toxicol. 1998;38:539–565. doi: 10.1146/annurev.pharmtox.38.1.539. [DOI] [PubMed] [Google Scholar]
  • 3.Lewis W.H. Pharmaceutical discoveries based on ethnomedicinal plants: 1985 to 2000 and beyond. Econ. Bot. 2003;57:126–134. doi: 10.1663/0013-0001(2003)057[0126:PDBOEP]2.0.CO;2. [DOI] [Google Scholar]
  • 4.de Albuquerque U.P., Hanazaki N. Five problems in current ethnobotanical research—and some suggestions for strengthening them. Hum. Ecol. 2009;37:653–661. doi: 10.1007/s10745-009-9259-9. [DOI] [Google Scholar]
  • 5.Fabricant D.S., Farnsworth N.R. The value of plants used in traditional medicine for drug discovery. Environ. Health Perspect. 2001;109:69–75. doi: 10.1289/ehp.01109s169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Kupchan S.M. Drugs from Natural Products - Plant Sources. Advances in Chemistry. 2009;108:1–13. doi: 10.1021/ba-1971-0108.ch001. [DOI] [Google Scholar]
  • 7.Heinrich M. Ethnopharmacology: a short history of a multidisciplinary field of research. Ethnopharmacology. 2015;1:10. doi: 10.1002/9781118930717. [DOI] [Google Scholar]
  • 8.Johns T. University of Arizona Press; 1990. With Bitter Herbs They Shall Eat it: Chemical Ecology and the Origins of Human Diet and Medicine. [Google Scholar]
  • 9.Katz L., Baltz R.H. Natural product discovery: past, present, and future. J. Ind. Microbiol. Biotechnol. 2016;43:155–176. doi: 10.1007/s10295-015-1723-5. [DOI] [PubMed] [Google Scholar]
  • 10.Beutler J.A. Natural products as a foundation for drug discovery. Curr. Protoc. Pharmacol. 2009;46:9.11.1–9.11.21. doi: 10.1002/0471141755.ph0911s46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Khare C.P. Springer Science and Business Media; 2008. Indian Medicinal Plants: An Illustrated Dictionary. [Google Scholar]
  • 12.West African Health Organisation (WAHO) 2013. Pharmacopoeia. [Google Scholar]
  • 13.Quattrocchi U. Volume 5. CRC press; 2012. (CRC World Dictionary of Medicinal and Poisonous Plants: Common Names, Scientific Names, Eponyms, Synonyms, and Etymology). [Google Scholar]
  • 14.Souza E.D.N.F., Hawkins J.A. Ewé: A Web-Based Ethnobotanical Database for Storing and Analysing Data. Database. 2020;2020 doi: 10.1093/database/baz144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Teixidor-Toneu I., Jordan F.M., Hawkins J.A. Comparative phylogenetic methods and the cultural evolution of medicinal plant use. Nat. Plants. 2018;4:754–761. doi: 10.1038/s41477-018-0226-6. [DOI] [PubMed] [Google Scholar]
  • 16.Saslis-Lagoudakis C.H., Savolainen V., Williamson E.M., Forest F., Wagstaff S.J., Baral S.R., Watson M.F., Pendry C.A., Hawkins J.A. Phylogenies reveal predictive power of traditional medicine in bioprospecting. Proc. Natl. Acad. Sci. USA. 2012;109:15835–15840. doi: 10.1073/pnas.1202242109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Reinaldo R., Albuquerque U., Medeiros P. Taxonomic affiliation influences the selection of medicinal plants among people from semi-arid and humid regions—a proposition for the evaluation of utilitarian equivalence in Northeast Brazil. PeerJ. 2020;8 doi: 10.7717/peerj.9664. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Johnson-Fulton S., Watson L. Comparing medicinal uses of Cochlospermaceae throughout its geographic range with insights from molecular phylogenetics. Diversity. 2018;10:123. doi: 10.3390/d10040123. [DOI] [Google Scholar]
  • 19.Moerman D.E. The medicinal flora of native North America: an analysis. J. Ethnopharmacol. 1991;31:1–42. doi: 10.1016/0378-8741(91)90141-Y. [DOI] [PubMed] [Google Scholar]
  • 20.Moerman D.E. An analysis of the food plants and drug plants of native North America. J. Ethnopharmacol. 1996;52:1–22. doi: 10.1016/0378-8741(96)01393-1. [DOI] [PubMed] [Google Scholar]
  • 21.Moerman D.E., Pemberton R.W., Kiefer D., Berlin B. A comparative analysis of five medicinal floras. J. Ethnobiol. 1999;19:49–70. [Google Scholar]
  • 22.Yeung A.W.K., Heinrich M., Kijjoa A., Tzvetkov N.T., Atanasov A.G. The ethnopharmacological literature: An analysis of the scientific landscape. J. Ethnopharmacol. 2020;250 doi: 10.1016/j.jep.2019.112414. [DOI] [PubMed] [Google Scholar]
  • 23.Jan H.A., Mir T.A., Bussmann R.W., Jan M., Hanif U., Wali S. Cross-cultural ethnomedicinal study of the wild species of the genus Berberis used by the ethnic communities living along both sides of the Indo-Pak border in Kashmir. Ethnobot. Res. Appl. 2023;26:1–14. [Google Scholar]
  • 24.Turner N. Ancient Pathways, Ancestral Knowledge: Ethnobotany and Ecological Wisdom of Indigenous Peoples of Northwestern North America. BC Studies. 2015;188:111. [Google Scholar]
  • 25.Mano A., Tuller T., Béjà O., Pinter R.Y. Comparative classification of species and the study of pathway evolution based on the alignment of metabolic pathways. BMC Bioinf. 2010;11:1–10. doi: 10.1186/1471-2105-11-S1-S38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Hoffmann T., Krug D., Bozkurt N., Duddela S., Jansen R., Garcia R., Gerth K., Steinmetz H., Müller R. Correlating chemical diversity with taxonomic distance for discovery of natural products in myxobacteria. Nat. Commun. 2018;9:803–810. doi: 10.1038/s41467-018-03184-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Zhou X., Liu Z. Unlocking plant metabolic diversity: a (pan)-genomic view. Plant Commun. 2022;3 doi: 10.1016/j.xplc.2022.100300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Zhang Y., Deng T., Sun L., Landis J.B., Moore M.J., Wang H., Wang Y., Hao X., Chen J., Li S., et al. Phylogenetic patterns suggest frequent multiple origins of secondary metabolites across the seed-plant ‘tree of life. Natl. Sci. Rev. 2021;8 doi: 10.1093/nsr/nwaa105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Wink M. Evolution of secondary metabolites from an ecological and molecular phylogenetic perspective. Phytochemistry. 2003;64:3–19. doi: 10.1016/S0031-9422(03)00300-5. [DOI] [PubMed] [Google Scholar]
  • 30.Alrashedy N.A., Molina J. The ethnobotany of psychoactive plant use: a phylogenetic perspective. PeerJ. 2016;4 doi: 10.7717/peerj.2546. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Amiguet V.T., Arnason J.T., Maquin P., Cal V., Sánchez-Vindas P., Alvarez L.P. A regression analysis of Q’eqchi’Maya medicinal plants from southern Belize. Econ. Bot. 2006;60:24–38. doi: 10.1663/0013-0001(2006)60[24:ARAOQM]2.0.CO;2. [DOI] [Google Scholar]
  • 32.Burns J.H., Strauss S.Y. More closely related species are more ecologically similar in an experimental test. Proc. Natl. Acad. Sci. USA. 2011;108:5302–5307. doi: 10.1073/pnas.1013003108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Chen K., Burgess K.S., He F., Yang X.Y., Gao L.M., Li D.Z. Seed traits and phylogeny explain plants' geographic distribution. Biogeosciences. 2022;19:4801–4810. doi: 10.5194/bg-19-4801-2022. [DOI] [Google Scholar]
  • 34.Hawkins J.A., Teixidor Toneu I. Defining ‘ethnobotanical convergence. Trends Plant Sci. 2017;22:639–640. doi: 10.1016/j.tplants.2017.06.002. [DOI] [PubMed] [Google Scholar]
  • 35.Heo J.Y., Im D.S. Anti-allergic effects of salvianolic acid A and tanshinone IIA from Salvia miltiorrhiza determined using in vivo and in vitro experiments. Int. Immunopharm. 2019;67:69–77. doi: 10.1016/j.intimp.2018.12.010. [DOI] [PubMed] [Google Scholar]
  • 36.Bezerra J.J.L., Pinheiro A.A.V., de Oliveira Barreto E. Medicinal Plants Used in the Treatment of Asthma in Different Regions of Brazil: A Comprehensive Review of Ethnomedicinal Evidence, Preclinical Pharmacology and Clinical Trials. Phytomedicine Plus. 2022 doi: 10.1016/j.phyplu.2022.100376. [DOI] [Google Scholar]
  • 37.Wang J.Y., Dong X., Yu Z., Ge L., Lu L., Ding L., Gan W. Borneol inhibits CD4+ T cells proliferation by down-regulating miR-26a and miR-142-3p to attenuate asthma. Int. Immunopharm. 2021;90 doi: 10.1016/j.intimp.2020.107223. [DOI] [PubMed] [Google Scholar]
  • 38.Lee J.E., Im D.S. Suppressive effect of carnosol on ovalbumin-induced allergic asthma. Biomol. Ther. 2021;29:58–63. doi: 10.4062/biomolther.2020.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Mira A., Shimizu K. Medicinal Plants-Recent Advances in Research and Development. Springer; 2016. An update on antitumor activity of Angelica species; pp. 363–371. [Google Scholar]
  • 40.Thangaleela S., Sivamaruthi B.S., Kesika P., Tiyajamorn T., Bharathi M., Chaiyasut C. A Narrative Review on the Bioactivity and Health Benefits of Alpha-Phellandrene. Sci. Pharm. 2022;90:57. doi: 10.3390/scipharm90040057.
  • 41.Wang Y., Chen Y., Chen X., Liang Y., Yang D., Dong J., Yang N., Liang Z. Angelicin inhibits the malignant behaviours of human cervical cancer potentially via inhibiting autophagy. Exp. Ther. Med. 2019;18:3365–3374. doi: 10.3892/etm.2019.7985.
  • 42.Jo H., Cha B., Kim H., Brito S., Kwak B.M., Kim S.T., Bin B.H., Lee M.G. α-pinene enhances the anticancer activity of natural killer cells via ERK/AKT pathway. Int. J. Mol. Sci. 2021;22:656. doi: 10.3390/ijms22020656.
  • 43.Liu L., Xiong H., Ping J., Ju Y., Zhang X. Taraxacum officinale protects against lipopolysaccharide-induced acute lung injury in mice. J. Ethnopharmacol. 2010;130:392–397. doi: 10.1016/j.jep.2010.05.029.
  • 44.Ma C., Zhu L., Wang J., He H., Chang X., Gao J., Shumin W., Yan T. Anti-inflammatory effects of water extract of Taraxacum mongolicum hand.-Mazz on lipopolysaccharide-induced inflammation in acute lung injury by suppressing PI3K/Akt/mTOR signaling pathway. J. Ethnopharmacol. 2015;168:349–355. doi: 10.1016/j.jep.2015.03.068.
  • 45.Nassan M.A., Soliman M.M., Ismail S.A., El-Shazly S. Effect of Taraxacum officinale extract on PI3K/Akt pathway in DMBA-induced breast cancer in albino rats. Biosci. Rep. 2018;38 doi: 10.1042/BSR20180334.
  • 46.Qu J., Ke F., Liu Z., Yang X., Li X., Xu H., Li Q., Bi K. Uncovering the mechanisms of dandelion against triple-negative breast cancer using a combined network pharmacology, molecular pharmacology and metabolomics approach. Phytomedicine. 2022;99 doi: 10.1016/j.phymed.2022.153986.
  • 47.Jia Y.Y., Guan R.F., Wu Y.H., Yu X.P., Lin W.Y., Zhang Y.Y., Liu T., Zhao J., Shi S.Y., Zhao Y. Taraxacum mongolicum extract exhibits a protective effect on hepatocytes and an antiviral effect against hepatitis B virus in animal and human cells. Mol. Med. Rep. 2014;9:1381–1387. doi: 10.3892/mmr.2014.1925.
  • 48.Rehman S., Ijaz B., Fatima N., Muhammad S.A., Riazuddin S. Therapeutic potential of Taraxacum officinale against HCV NS5B polymerase: In-vitro and In silico study. Biomed. Pharmacother. 2016;83:881–891. doi: 10.1016/j.biopha.2016.08.002.
  • 49.Yarnell E., Abascal K. Dandelion (Taraxacum officinale and T. mongolicum). Integr. Med. 2009;8:35–38.
  • 50.Thole J.M., Kraft T.F.B., Sueiro L.A., Kang Y.H., Gills J.J., Cuendet M., Pezzuto J.M., Seigler D.S., Lila M.A. A comparative evaluation of the anticancer properties of European and American elderberry fruits. J. Med. Food. 2006;9:498–504. doi: 10.1089/jmf.2006.9.498.
  • 51.Xiao H.H., Zhang Y., Cooper R., Yao X.S., Wong M.S. Phytochemicals and potential health effects of Sambucus williamsii Hance (Jiegumu). Chin. Med. 2016;11:36. doi: 10.1186/s13020-016-0106-9.
  • 52.Portalatin M., Winstead N. Medical management of constipation. Clin. Colon Rectal Surg. 2012;25:012–019. doi: 10.1055/s-0032-1301754.
  • 53.Sorokina M., Merseburger P., Rajan K., Yirik M.A., Steinbeck C. COCONUT online: collection of open natural products database. J. Cheminf. 2021;13:2–13. doi: 10.1186/s13321-020-00478-9.
  • 54.Rutz A., Sorokina M., Galgonek J., Mietchen D., Willighagen E., Gaudry A., Graham J.G., Stephan R., Page R., Vondrášek J., et al. The LOTUS initiative for open knowledge management in natural products research. Elife. 2022;11 doi: 10.7554/eLife.70780.
  • 55.Moudi M., Go R., Yien C.Y.S., Nazre M. Vinca alkaloids. Int. J. Prev. Med. 2013;4:1231–1235.
  • 56.Newman D.J., Cragg G.M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 2020;83:770–803. doi: 10.1021/acs.jnatprod.9b01285.
  • 57.López-Vallejo F., Giulianotti M.A., Houghten R.A., Medina-Franco J.L. Expanding the medicinally relevant chemical space with compound libraries. Drug Discov. Today. 2012;17:718–726. doi: 10.1016/j.drudis.2012.04.001.
  • 58.Bemis G.W., Murcko M.A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996;39:2887–2893. doi: 10.1021/jm9602928.
  • 59.Helson J.E., Capson T.L., Johns T., Aiello A., Windsor D.M. Ecological and evolutionary bioprospecting: using aposematic insects as guides to rainforest plants active against disease. Front. Ecol. Environ. 2009;7:130–134. doi: 10.1890/070189.
  • 60.Shurkin J. Animals that self-medicate. Proc. Natl. Acad. Sci. USA. 2014;111:17339–17341. doi: 10.1073/pnas.1419966111.
  • 61.Bautista-Sopelana L.M., Bolívar P., Gómez-Muñoz M.T., Martínez-Díaz R.A., Andrés M.F., Alonso J.C., Bravo C., González-Coloma C. Bioactivity of plants eaten by wild birds against laboratory models of parasites and pathogens. Front. Ecol. Evol. 2022;1118 doi: 10.3389/fevo.2022.1027201.
  • 62.Afendi F.M., Okada T., Yamazaki M., Hirai-Morita A., Nakamura Y., Nakamura K., Ikeda S., Takahashi H., Altaf-Ul-Amin M., Darusman L.K., et al. KNApSAcK family databases: integrated metabolite–plant species databases for multifaceted plant research. Plant Cell Physiol. 2012;53:e1. doi: 10.1093/pcp/pcr165.
  • 63.Sung M., Jeong M., Choi Y., Kim D., Lee J., Kang J. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics. 2022;38:4837–4839. doi: 10.1093/bioinformatics/btac598.
  • 64.Nguyen-Vo T.H., Le T., Pham D., Nguyen T., Le P., Nguyen A., Nguyen T., Nguyen T.N., Nguyen V., Do H., et al. VIETHERB: a database for Vietnamese herbal species. J. Chem. Inf. Model. 2019;59:1–9. doi: 10.1021/acs.jcim.8b00399.
  • 65.Souza E.N.F., Williamson E.M., Hawkins J.A. Which plants used in ethnomedicine are characterized? Phylogenetic patterns in traditional use related to research effort. Front. Plant Sci. 2018;9:834. doi: 10.3389/fpls.2018.00834.
  • 66.Vivek-Ananth R.P., Mohanraj K., Sahoo A.K., Samal A. IMPPAT 2.0: An Enhanced and Expanded Phytochemical Atlas of Indian Medicinal Plants. Preprint at bioRxiv. 2022. doi: 10.1101/2022.06.17.496609.
  • 67.Bultum L.E., Woyessa A.M., Lee D. ETM-DB: integrated Ethiopian traditional herbal medicine and phytochemicals database. BMC Complement. Altern. Med. 2019;19:212–311. doi: 10.1186/s12906-019-2634-1.
  • 68.Shin J.S., Lee Y.S., Lee M.S. Protection and utilization of traditional knowledge resources through Korean traditional knowledge portal (KTKP). J. Korea Cont. Assoc. 2010;10:422–426. doi: 10.5392/JKCA.2010.10.5.422.
  • 69.Vasilevsky N., Essaid S., Matentzoglu N., Harris N.L., Haendel M., Robinson P., Mungall C. Mondo Disease Ontology: Harmonizing Disease Concepts across the World. CEUR-WS; 2020. p. 2807.
  • 70.Landrum G. 2016. RDKit: Open-Source Cheminformatics. http://www.rdkit.org/ version Q3 2022.
  • 71.Allen M., Poggiali D., Whitaker K., Marshall T.R., van Langen J., Kievit R.A. Raincloud plots: a multi-platform tool for robust data visualization. Wellcome Open Res. 2021;4:63. doi: 10.12688/wellcomeopenres.15191.2.
  • 72.Sanh V., Webson A., Raffel C., Bach S.H., Sutawika L., Alyafeai Z., Chaffin A., Stiegler A., Le Scao T., Raja A., et al. Multitask prompted training enables zero-shot task generalization. International Conference on Learning Representations. 2022.

Associated Data

Supplementary Materials

Document S1. Figures S1–S12 and Tables S1 and S2
mmc1.pdf (2.3MB, pdf)

Data Availability Statement

  • Code is publicly available at https://github.com/enveda/ethnobotany together with the raw and processed datasets.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

  • Additional Supplemental Items are available in the GitHub repository.


Articles from iScience are provided here courtesy of Elsevier
