PLOS One. 2023 May 15;18(5):e0285716. doi: 10.1371/journal.pone.0285716

Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space

Hiroaki Yabuuchi 1,¤,*, Kazuhito Hayashi 1, Akihiko Shigemoto 2, Makiko Fujiwara 1, Yuhei Nomura 2, Mayumi Nakashima 2, Takeshi Ogusu 1, Megumi Mori 1, Shin-ichi Tokumoto 2, Kazuyuki Miyai 1
Editor: Guadalupe Virginia Nevárez-Moorillón 3
PMCID: PMC10184910  PMID: 37186641

Abstract

A plant extract is a mixture of diverse phytochemicals and is considered an important resource for drug discovery. However, large-scale exploration of bioactive extracts has so far been hindered by various obstacles. In this research, we introduce and evaluate a new computational screening strategy that classifies bioactive compounds and plants in a semantic space generated by a word embedding algorithm. The classifier showed good performance in binary (presence/absence of bioactivity) classification for both compounds and plant genera. Furthermore, the strategy led to the discovery of antimicrobial activity of essential oils from Lindera triloba and Cinnamomum sieboldii against Staphylococcus aureus. The results of this study indicate that machine-learning classification in semantic space can be a highly efficient approach for exploring bioactive plant extracts.

Introduction

Plant extracts have been used to treat various diseases for thousands of years. In Eastern medicine, plant extracts have formed the basis of traditional medicine systems. In Western medicine, by contrast, the isolation of bioactive low-molecular-weight compounds such as morphine (from opium), quinine (from the cinchona tree) and atropine (from Atropa belladonna) led to the idea of chemical compounds as drugs [1]. Identification of the active ingredients accelerated pharmacological research, resulting in the discovery of target proteins and the elucidation of molecular mechanisms of action.

The accumulation of knowledge on active compounds has been accompanied by the development of information-rich approaches for efficient drug discovery. Quantitative structure-activity relationship (QSAR) modeling and machine learning have been introduced into drug development [2]. As the number of pharmacological reports increased, data resources for bioactive compounds such as MeSH, PubChem [3] and ChEBI [4] also became available.

Exploring novel medicinal plants is a major task in natural product research. To predict the biological activity of plant extracts, a mathematical model called quantitative composition-activity relationships (QCAR) was proposed [5, 6]. QCAR relates the amounts of the various chemical constituents of a plant extract to its bioactivity. However, its application to medicinal plant screening is limited because of (1) the lack of large-scale open data relating the composition of plant extracts to their bioactivity, (2) the difficulty of comprehensive compositional analysis covering the diverse secondary metabolites in a plant sample, and (3) the need for composition data for every plant extract to be predicted.

To circumvent these limitations, we show that a new computational screening strategy, word embedding-based virtual screening (WEBVS), has the potential to identify bioactive plant extracts. An overview of WEBVS is shown in Fig 1. Word embedding is known to encode semantic and syntactic similarity, in that the embeddings of similar words lie near one another in vector space [7]. The WEBVS method uses word embedding and a large amount of biomedical literature data to encode all known compounds and plants into a semantic space. The compounds are labeled by the presence/absence (active/inactive) of biological annotation data, and a classification model is trained on these labels and vectors. Finally, the labels of plants are predicted by the model in the semantic space. In this research, WEBVS was applied to the screening of antimicrobial plant extracts, and was evaluated by statistical methods and by an antimicrobial assay against Staphylococcus aureus, a major human pathogen that causes a wide range of clinical infections [8].

Fig 1. Overview of word embedding-based virtual screening (WEBVS).


Materials and methods

Data

Biomedical literature data with automatic annotation of chemical compounds and species were retrieved from the PubTator FTP site in September 2020 [9]. Biological annotation data for chemical compounds were retrieved from MeSH [3] and ChEBI [4] in September 2021. Plant taxonomic data were retrieved from NCBI Taxonomy [3] in September 2021. A list of antibacterial plants was retrieved from a systematic review conducted by Chassagne et al. [10] to evaluate the prediction performance of WEBVS.

Reagents

Acetone for gas chromatography was purchased from KISHIDA CHEMICAL Co., Ltd, Japan. Dimethyl sulfoxide (DMSO) and thymol (special grade) were purchased from FUJIFILM Wako Pure Chemical Corporation, Japan. A series of n-alkane standards (C9 to C40) was purchased from GL Sciences Inc., Tokyo, Japan. Mueller-Hinton II broth was purchased from Becton, Dickinson and Company, USA. Staphylococcus aureus (NBRC 12732) for antibacterial activity tests was obtained from the National Institute of Technology and Evaluation, Biological Resource Center (NBRC), Japan.

Preprocessing text data

From MeSH, we selected natural compounds whose “pharmacological action” annotation was “anti-bacterial agents”, “antifungal agents”, “fungicides, industrial”, “antitubercular agents”, “antibiotics, antitubercular” or “anti-infective agents”. From ChEBI, we selected natural compounds whose “has_role” relation pointed to “antibacterial agent”, “antibacterial drug”, “antifungal agent”, “antifungal agrochemical”, “antifungal drug”, “antiinfective agent”, “antimicrobial agent”, “antiseptic drug”, “antitubercular agent” or “fungicide”. These compounds (128 compounds, S1 Table) were regarded as “active compounds” in this research. The other MeSH compounds were assumed to be “inactive compounds”.

The biomedical literature data consisted of 132962 PubTator articles whose titles contained both a bioactivity-related keyword (“activity”, “action”, “effect”, “property”, “efficacy” or “assessment”) and the name of either an active compound or a plant. Plant species, subspecies and variants were grouped at the genus level. Low-frequency words (those appearing in fewer than 0.1% of the selected articles) and stop-words were removed from the abstracts of the articles.
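The frequency and stop-word filtering described above can be sketched as follows. This is only a minimal illustration in Python, not the R scripts used in the study; the function name `filter_vocabulary`, the whitespace tokenizer, the toy abstracts and the one-word stop list are all assumptions made for the example.

```python
from collections import Counter

def filter_vocabulary(abstracts, stop_words, min_doc_fraction=0.001):
    """Drop stop-words and words found in fewer than min_doc_fraction of abstracts."""
    tokenized = [text.lower().split() for text in abstracts]
    # Document frequency: number of abstracts containing each word at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    min_docs = min_doc_fraction * len(tokenized)
    keep = {w for w, n in doc_freq.items() if n >= min_docs and w not in stop_words}
    return [[w for w in tokens if w in keep] for tokens in tokenized]

# Toy corpus (hypothetical). A 50% threshold is used so the effect is visible
# on three abstracts; the study used 0.1% of 132962 articles.
toy = [
    "thymol shows antibacterial activity",
    "the extract shows antibacterial activity",
    "linalool shows the same activity",
]
filtered = filter_vocabulary(toy, stop_words={"the"}, min_doc_fraction=0.5)
```

With the default `min_doc_fraction=0.001`, a word would have to appear in at least about 133 of the 132962 selected articles to be retained.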

Word embedding

The 12356663 words appearing in the abstracts were input to word2vec embedding with the continuous bag-of-words (CBOW) model [11] to encode 16381 unique words as numerical vectors. The “word2vec” R package (version 0.3.4) was used to implement the embedding. The number of dimensions was set to 100, the window size to 5, and the number of negative samples to 5.
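To make the role of the window size concrete, the sketch below enumerates the (context, target) pairs that a CBOW-style model learns from: each word is predicted from the words surrounding it within the window. It is not the word2vec implementation itself. The function name `cbow_pairs` and the toy sentence are hypothetical, and a fixed window is used for simplicity.

```python
def cbow_pairs(tokens, window=5):
    """Yield (context_words, target_word) pairs for a CBOW-style model."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after the target
        context = left + right
        if context:
            yield context, target

# Toy sentence (hypothetical), already stripped of stop-words.
sentence = ["linalool", "inhibited", "growth", "staphylococcus", "aureus"]
pairs = list(cbow_pairs(sentence, window=2))
```

Words that repeatedly share contexts, such as a compound name and a plant genus that both occur near "antibacterial", end up with similar vectors, which is the property WEBVS relies on.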

Machine learning of antimicrobial activity of chemical compounds

The embedded vectors of 128 active and 6443 inactive compounds were input to machine learning algorithms to classify the presence/absence of antimicrobial activity. Because the labels of the inactive compounds were uncertain, we randomly selected the same number of inactive compounds as active compounds. This selection was repeated ten times to avoid bias and increase robustness. A support vector machine (SVM) with the radial basis function kernel [12], random forest [13] and a deep neural network [14] were tested by five-fold cross-validation with hyper-parameter optimization. The algorithm with the best accuracy was chosen as the classifier. The labels of all embedded compounds were predicted by the classifier and sorted by the output probability of the presence of antimicrobial activity (hereinafter referred to as “antimicrobial probability”).
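The repeated balanced sampling and the five-fold split described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the study's R code: the identifiers are placeholders, the helper names `balanced_subsets` and `five_fold_indices` are hypothetical, and the round-robin fold assignment is one simple choice among many.

```python
import random

def balanced_subsets(active_ids, inactive_ids, n_repeats=10, seed=0):
    """Return n_repeats training sets, each pairing all actives with an
    equally sized random sample of the presumed-inactive compounds."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_repeats):
        sampled_inactive = rng.sample(inactive_ids, len(active_ids))
        labeled = [(cid, 1) for cid in active_ids] + [(cid, 0) for cid in sampled_inactive]
        rng.shuffle(labeled)
        subsets.append(labeled)
    return subsets

def five_fold_indices(n_items, n_folds=5):
    """Assign item positions to folds round-robin; each fold serves once as the test set."""
    return [list(range(k, n_items, n_folds)) for k in range(n_folds)]

# Placeholder identifiers standing in for the 128 active and 6443 inactive compounds.
actives = [f"active_{i}" for i in range(128)]
inactives = [f"inactive_{i}" for i in range(6443)]
subsets = balanced_subsets(actives, inactives)
folds = five_fold_indices(len(subsets[0]))
```

Each of the ten subsets holds 256 labeled compounds, so every cross-validation fold tests on roughly 51 compounds and trains on the remaining ones.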

Virtual screening of antimicrobial plants

To predict the labels of the plants, the 2534 plant genera encoded by the word embedding were input to the classifier. The antimicrobial probability (a genus was classified as active if the value was above 0.5) was checked against the list of antibacterial plants and plotted as an enrichment curve. The “chemmodlab” R package (version 2.0.0) was used to plot the curve with simultaneous plus-adjusted sup-t confidence bands [15]. Furthermore, two plants classified as active were selected for essential oil extraction, gas chromatography/mass spectrometry (GC/MS) analysis and antimicrobial assay.

Extraction of essential oils

Fresh plant samples of Lindera triloba (syn. Parabenzoin trilobum) were collected from Koya town (Wakayama, Japan) in September 2021, and were separated into leaves and branches. Fresh plant samples of Cinnamomum sieboldii (syn. Cinnamomum okinawense) were collected from Tanabe city (Wakayama, Japan) in September 2021, and were separated into leaves, branches and stem barks. After shade-drying for several weeks, the materials were subjected to hydro-distillation for 3 h with distilled water using a Clevenger-type apparatus. The obtained essential oils were stored at 4°C until further analysis.

Gas chromatography-mass spectrometry (GC/MS) analysis

Chemical characterization was performed with a gas chromatograph coupled to a mass spectrometer, model QP2010 (Shimadzu, Kyoto, Japan). Essential oils were dissolved in acetone (2 μL/mL). This solution (1 μL) was injected in split mode (1:50 ratio) onto a DB-5MS column (30 m × 0.25 mm i.d. × 0.25 μm film thickness, Agilent, USA). The injection temperature was set at 270°C. The oven temperature was held at 60°C for 1 min after injection, increased at 10°C/min to 180°C (held for 1 min), increased at 20°C/min to 280°C (held for 3 min), and finally increased at 20°C/min to 325°C, where the column was held for 20 min. Mass spectra were obtained in the range of 20 to 550 m/z. Essential oil components were identified based on a library search (National Institute of Standards and Technology, NIST 14), the calculation of retention indices relative to a homologous series of n-alkanes, and a comparison of their mass spectra and retention indices with literature data [16, 17].
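For readers unfamiliar with retention indices, the sketch below shows the linear retention index of van den Dool and Kratz, which is commonly used for temperature-programmed runs. The paper does not state which formula was applied, so this choice is an assumption, and the retention times in the example are hypothetical rather than measured values. A peak eluting at time t between the n-alkanes with n and m carbons receives the index 100 × [n + (m − n)(t − t_n)/(t_m − t_n)].

```python
from bisect import bisect_right

def linear_retention_index(rt, alkane_rts):
    """Linear (temperature-programmed) retention index.

    alkane_rts maps carbon number -> retention time (min) of each n-alkane standard.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    # Index of the last alkane eluting at or before the analyte.
    pos = bisect_right(times, rt) - 1
    if pos < 0 or pos >= len(carbons) - 1:
        raise ValueError("retention time is not bracketed by two alkane standards")
    n, n_next = carbons[pos], carbons[pos + 1]
    t_n, t_next = times[pos], times[pos + 1]
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))

# Hypothetical retention times (min) for C10 to C12; not data from the study.
hypothetical_alkanes = {10: 6.0, 11: 7.5, 12: 9.0}
ri = linear_retention_index(8.25, hypothetical_alkanes)
```

In this toy case the peak elutes halfway between C11 and C12, so its index falls halfway between 1100 and 1200.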

Antimicrobial assay

The broth microdilution assay was performed according to the standard method of the Japanese Society of Chemotherapy [18] with slight modification. A stock solution of each essential oil (dissolved to a concentration of 40 mg/mL in DMSO) was diluted to 4 mg/mL with Mueller-Hinton II broth medium, followed by serial dilution with the medium to lower concentrations (2, 1, 0.5, 0.25, 0.125, 0.0625, 0.0313, 0.0156 and 0.0078 mg/mL). Thymol, a known antimicrobial agent, was dissolved and diluted in the same way to confirm microbial susceptibility (positive control). All oils were tested in triplicate. Staphylococcus aureus NBRC 12732 was inoculated onto normal agar plates and cultured for 24 h at 35±1°C. The bacterial suspensions were diluted with saline to a turbidity equivalent to 0.5 McFarland (ca. 10⁸ colony forming units per mL (CFU/mL)), and were further diluted 10-fold (ca. 10⁷ CFU/mL). Essential oil-containing medium (0.1 mL) and inoculum (5 μL) were added to sterile micro-titre plates. Medium containing 10% (v/v) DMSO was used to determine whether the solvent exhibited any antimicrobial effect (negative control). The micro-titre plates were incubated for 18 to 24 h at 35±1°C. Based on the opacity and color change in each well, the minimum concentration capable of inhibiting growth was determined.
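The arithmetic behind the dilution scheme and the inoculum can be checked with a short calculation. The sketch below is a worked example only; the helper name `twofold_series` is hypothetical, and the final cell density is derived from the stated volumes and approximate CFU counts rather than reported in the paper.

```python
def twofold_series(start, n_steps):
    """Concentrations obtained by halving `start` n_steps times (start excluded)."""
    return [start / 2 ** k for k in range(1, n_steps + 1)]

stock_mg_per_ml = 40.0   # essential oil dissolved in DMSO
top_mg_per_ml = 4.0      # after 10-fold dilution in broth
# The 10-fold dilution of a DMSO stock leaves 10% (v/v) DMSO at the top
# concentration, consistent with the 10% DMSO negative control.
dmso_fraction = top_mg_per_ml / stock_mg_per_ml

# Full test range: 4 mg/mL followed by nine two-fold dilutions.
series = [top_mg_per_ml] + twofold_series(top_mg_per_ml, 9)

# Inoculum: 5 uL (5e-3 mL) of ca. 1e7 CFU/mL added to 0.1 mL of medium.
cfu_per_well = 1e7 * 5e-3
final_cfu_per_ml = cfu_per_well / (0.1 + 5e-3)
```

The series ends at 0.0078125 mg/mL, which matches the listed 0.0078 mg/mL after rounding, and each well receives on the order of 5 × 10⁴ CFU, or roughly 5 × 10⁵ CFU/mL.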

Results

Machine learning of antimicrobial activity of chemical compounds

The classification models for antimicrobial compounds were successfully constructed in the semantic space. All machine learning algorithms achieved good accuracies, ranging from 84.3 to 85.4% in the five-fold cross-validation (S2 Table). SVM was adopted for the evaluations in the following sections because it showed the best average accuracy.

The constructed model classified 726 MeSH compounds as active even though they were assumed to be inactive in the learning process. The top 10 MeSH compounds ranked by antimicrobial probability are shown in Table 1. Among these compounds, perillyl alcohol [19], daphnoretin [20], xanthohumol [21], rhodomyrtone [22], galbanic acid [23] and alpha-hederin [24] have previously been reported to show antimicrobial activity. These compounds are potentially active, although they are not annotated as active compounds in the databases.

Table 1. The top 10 ranked compounds with higher antimicrobial probability.

| Compound | Probability | Biological annotation |
| --- | --- | --- |
| perillyl alcohol | 0.975 | antineoplastic agents, enzyme inhibitors |
| hydrazones | 0.944 | |
| daphnoretin | 0.938 | antiviral agent, antineoplastic agent |
| xanthohumol | 0.938 | apoptosis inducer, antineoplastic agent, antiviral agent, diacylglycerol O-acyltransferase inhibitor, anti-HIV-1 agent |
| rhodomyrtone | 0.918 | |
| calomel | 0.917 | |
| dehydroabietinol | 0.915 | |
| galbanic acid | 0.901 | |
| naphthoquinones | 0.895 | |
| alpha-hederin | 0.890 | anti-inflammatory agent |

Virtual screening of antimicrobial plants

Out of 2534 plant genera, 561 were predicted as active by the classifier (S3 Table). Among them, 164 overlapped with the antimicrobial plants listed in the review [10]. On the other hand, 265 genera in the review were predicted as inactive (sensitivity = 38.2%). The results are also shown as an enrichment curve (Fig 2). The closer the curve is to the ideal curve, the higher the predictive performance of the model. In the top 1% ranked plant genera (25 genera), the WEBVS model correctly predicted 9 active genera, whereas 4.2 (1% of 429) active genera would be expected under random sampling (Table 2).
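The figures quoted above can be reproduced with a few lines of arithmetic. The sketch below recomputes the sensitivity and the number of hits expected at random from the counts in the text. The precision and the enrichment factor are derived here for illustration and are not values reported in the paper.

```python
n_genera = 2534
predicted_active = 561
true_positives = 164      # predicted active and listed in the review
false_negatives = 265     # listed in the review but predicted inactive

known_active = true_positives + false_negatives        # genera listed in the review
sensitivity = true_positives / known_active            # fraction of listed genera recovered
precision = true_positives / predicted_active          # derived, not reported in the paper

top_n, hits_in_top = 25, 9
expected_random = top_n * known_active / n_genera      # hits expected if 25 genera were drawn at random
enrichment_factor = hits_in_top / expected_random      # derived, not reported in the paper
```

Under these counts the top-ranked 1% contains roughly twice as many reviewed antimicrobial genera as a random draw of the same size would.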

Fig 2. Enrichment curve obtained by WEBVS.


The simultaneous 95 percent plus-adjusted sup-t confidence bands are colored in gray.

Table 2. The top 1% ranked plants with higher antimicrobial probability.

| Genus | Probability |
| --- | --- |
| Casearia | 0.965 |
| Lithospermum | 0.956 |
| Syngonanthus | 0.956 |
| Forsythia | 0.938 |
| Daphne | 0.930 |
| Biancaea | 0.924 |
| Ruta | 0.918 |
| Chelidonium | 0.916 |
| Sophora | 0.914 |
| Peganum | 0.913 |
| Spatholobus | 0.911 |
| Lindera | 0.910 |
| Ecballium | 0.907 |
| Carapa | 0.901 |
| Humulus | 0.899 |
| Garcinia | 0.898 |
| Alisma | 0.897 |
| Copaifera | 0.895 |
| Zanthoxylum | 0.893 |
| Boesenbergia | 0.891 |
| Kaempferia | 0.890 |
| Gardenia | 0.890 |
| Buddleja | 0.888 |
| Croton | 0.887 |
| Pentanema | 0.886 |
| Polygonum | 0.886 |

Gray background indicates antimicrobial plants reviewed by Chassagne et al. [10]

Plant selection and extraction of essential oil

Lindera is a genus predicted as active (antimicrobial probability = 0.910), although it is not listed in the systematic review [10]. In fact, various pharmacological and biological properties of Lindera plants have been the focus of many studies [25]. In this study, Lindera triloba, a species endemic to Japan, was selected for the antimicrobial assay. The essential oils from the branches and leaves of Lindera triloba were obtained by hydrodistillation with yields (v/w % on a dry weight basis) of 0.36% and 0.46%, respectively (S4 Table).

Cinnamomum is one of the genera with the most species investigated for antibacterial activity [10], and was also predicted as active (antimicrobial probability = 0.614) in this study. Cinnamomum sieboldii, a species that grows wild in Japan, was also selected for the antimicrobial assay. The essential oils from the branches, leaves and stem bark of Cinnamomum sieboldii were obtained by hydrodistillation with yields of 0.80%, 0.64% and 0.58%, respectively (S4 Table).

Chemical composition of selected essential oils

The chemical profiles of the investigated essential oils, determined by GC/MS analysis, are presented in Table 3 and S4 Table. The main constituents of Lindera triloba branch oil were α-cadinol (9.4%), epi-α-muurolol (9.3%) and camphor (9.1%), whereas those of the leaf oil were δ-cadinene (14.7%), α-cadinol (11.3%) and epi-α-muurolol (10.8%).

Table 3. Major components of essential oils from Lindera triloba and Cinnamomum sieboldii.

| Species | Part | Major compounds identified (%)* |
| --- | --- | --- |
| Lindera triloba | leaf | δ-cadinene (14.7), α-cadinol (11.3), epi-α-muurolol (10.8), α-muurolene (6.1), alloaromadendrene (6.0), β-bisabolene (6.0) |
| Lindera triloba | branch | α-cadinol (9.4), epi-α-muurolol (9.3), camphor (9.1), limonene (8.3), bornyl acetate (7.5), δ-cadinene (7.1) |
| Cinnamomum sieboldii | leaf | linalool (24.8), cinnamaldehyde (19.1), geranial (12.1) |
| Cinnamomum sieboldii | branch | linalool (51.2), cinnamaldehyde (21.0), 1,8-cineole (11.8) |
| Cinnamomum sieboldii | stem bark | linalool (41.4), cinnamaldehyde (19.0), 1,8-cineole (10.3) |

*Values in parentheses are the percentage of the total peak area obtained from the total ion current chromatogram.

The main constituents of Cinnamomum sieboldii leaf oil were linalool (24.8%), cinnamaldehyde (19.1%) and geranial (12.1%), whereas the oils from the other parts were dominated by linalool (branch: 51.2%, stem bark: 41.4%) followed by cinnamaldehyde (branch: 21.0%, stem bark: 19.0%).

Antimicrobial assay

The minimum inhibitory concentration (MIC) values against S. aureus were 1 mg/mL for Lindera triloba branch oil and 4 mg/mL for the leaf oil. The MIC values of Cinnamomum sieboldii oils from leaf, branch and stem bark were all 1 mg/mL (Table 4). These oils are considered active with reference to Gibbons [26], who defined essential oils as having significant activity if the MIC is equal to or less than 5 μL/mL. The MIC of thymol (positive control) was 0.25 mg/mL, which was comparable to the literature value (0.03% v/v [27]). No inhibition of bacterial growth was observed in the negative control.
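Because the MIC values here are expressed in mg/mL while the threshold and the literature value are expressed by volume, comparing them requires a density. The sketch below shows the conversion under explicit assumptions: the thymol density of 0.96 g/mL and the essential oil density of 0.9 g/mL are illustrative values that are not stated or measured in the paper, and both function names are hypothetical.

```python
def percent_vv_to_mg_per_ml(percent_vv, density_g_per_ml):
    """Convert a % (v/v) concentration to mg/mL for a liquid of known density."""
    ul_per_ml = percent_vv * 10.0            # 1% (v/v) corresponds to 10 uL per mL
    return ul_per_ml * density_g_per_ml      # 1 g/mL equals 1 mg/uL

def mg_per_ml_to_ul_per_ml(mg_per_ml, density_g_per_ml):
    """Convert mg/mL to uL/mL for a liquid of known density."""
    return mg_per_ml / density_g_per_ml

ASSUMED_THYMOL_DENSITY = 0.96   # g/mL; assumption, not stated in the paper
ASSUMED_OIL_DENSITY = 0.9       # g/mL; hypothetical, the oils' densities were not reported

literature_mic_mg_per_ml = percent_vv_to_mg_per_ml(0.03, ASSUMED_THYMOL_DENSITY)
leaf_oil_mic_ul_per_ml = mg_per_ml_to_ul_per_ml(4.0, ASSUMED_OIL_DENSITY)
```

Under these assumed densities, 0.03% (v/v) thymol corresponds to about 0.29 mg/mL, within one two-fold dilution step of the measured 0.25 mg/mL, and the highest MIC observed (4 mg/mL) corresponds to about 4.4 μL/mL, below the 5 μL/mL threshold. A lower oil density would move the latter value closer to the threshold.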

Table 4. Antimicrobial activity of essential oils from selected plants against Staphylococcus aureus.

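| Species | Part | MIC (mg/mL) a |
| --- | --- | --- |
| Lindera triloba | leaf | 4 |
| Lindera triloba | branch | 1 |
| Cinnamomum sieboldii | leaf | 1 |
| Cinnamomum sieboldii | branch | 1 |
| Cinnamomum sieboldii | stem bark | 1 |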

a MIC: Minimum inhibitory concentration

Discussion

Drug discovery and development is a long and costly process that takes years, with an average cost of $1–2 billion or more for each newly approved drug [28]. Various technologies for miniaturization, lab automation and robotics have enabled pharmaceutical companies to run bioassays on massive numbers of chemical compounds by means of high-throughput screening (HTS) [29]. However, the application of HTS to the identification of biologically active natural products remains relatively uncommon because it requires expensive equipment and faces a variety of experimental obstacles such as sample unavailability (restricted season or location), degradation, precipitation and non-specific/off-target effects [30]. Therefore, a computational approach is of great help in understanding the bioactivity of plant extracts composed of complex mixtures of phytochemicals. In this research, the WEBVS method successfully classified antimicrobial plant extracts by capturing local context similarity between bioactive compounds and plant extracts.

The most important advantage of WEBVS is that it requires no manual data curation, which is a costly and time-consuming process. Although recent studies [31, 32] showed good performance of QCAR-based models in predicting the antimicrobial activity of essential oils, such models are limited by the difficulty of collecting new training data. WEBVS consists of simple, automated processes that use regularly updated public literature data, so the classification model is easy to construct and update. Furthermore, WEBVS is suitable for large-scale exploration because it is applicable to all plants that appear in the literature data.

WEBVS also fits the idea of drug repositioning [33], which identifies new therapeutic uses for already-available drugs, including approved, shelved and withdrawn drugs. To our knowledge, this is the first report on the antimicrobial activity of Lindera triloba and Cinnamomum sieboldii. Lindera triloba is a deciduous shrub distributed on the Pacific side of the islands of Honshu, Shikoku and Kyushu in Japan [34], and was reported to show insect anti-feeding activity [35]. In this research, GC/MS analysis of the essential oils revealed the presence of various sesquiterpene alcohols including α-cadinol and epi-α-muurolol (τ-cadinol). These alcohols were determined to be active by Su et al. [36], and are considered to contribute to the antimicrobial activity of Lindera triloba. Cinnamomum sieboldii is an evergreen tree that used to be cultivated as a substitute for cassia (Cinnamomum cassia), and was used in traditional Japanese medicine in the 19th century. Watanabe and Goto reported that the quantity of its essential oil compares favorably with that of cassia [37]. However, Cinnamomum sieboldii was removed from the Japanese Pharmacopoeia (7th edition) in 1962 because the increasing import of low-cost cassia rendered it unnecessary as a substitute [38]. Both linalool and cinnamaldehyde, detected as main constituents of the essential oil in this study, were reported to show antimicrobial activity against S. aureus [27]. Further research, including clinical studies, is needed to reconsider the medicinal use of Cinnamomum sieboldii.

Literature-based discovery, a text mining technique used to discover new knowledge implicitly present in scientific literature, has become widespread as the scientific literature grows at an exponential rate [39]. However, it has not been systematically explored in the context of natural products [40]. Our WEBVS strategy can also be considered an automated form of literature-based discovery that attempts to build a knowledge bridge from chemistry to the natural product area. The development of other literature-based models, such as co-occurrence models and semantic models, may also support drug discovery and drug repositioning for natural products.

Finally, WEBVS has potential limitations. The first is that WEBVS cannot make predictions for plants that have never been reported in the literature. Approximately 13500 plant genera have been identified worldwide [41], but just 19% of them (2534 genera) were targeted in this study because of the lack of literature data. Combining WEBVS with phylogenetic analysis may be a promising approach because the secondary metabolites of plants are often similar among members of a clade [42]. The second limitation concerns quantitative prediction. Numerical values in the text data did not influence the embedding, which means that WEBVS is not suitable for quantitative prediction. However, it is generally difficult to combine quantitative activity data from multiple studies because the methods and experimental conditions differ among them. The development of a relation extraction technique could help to integrate and predict quantitative activity data from the full text, tables and figures of articles. The third limitation concerns chemical and bioactive variation due to environmental conditions. Various factors including temperature, carbon dioxide, lighting, ozone, soil water, soil salinity and soil fertility are known to affect plants’ physiological and biochemical responses [43]. These factors may cause prediction errors in WEBVS.

In conclusion, WEBVS is an efficient approach for exploring antimicrobial plant extracts. The application of WEBVS to other biological activities will be evaluated in future research.

Supporting information

S1 Table. Active compounds used for the machine learning of antimicrobial activity.

(XLS)

S2 Table. Accuracy result of various machine learning algorithms.

(XLS)

S3 Table. Antimicrobial probability of plant genera.

(XLS)

S4 Table. Chemical composition of essential oils from Lindera triloba and Cinnamomum sieboldii.

(XLS)

Acknowledgments

We sincerely thank Mr. & Mrs. Shimoyama (Monpetokuwa) and Mr. Nishida (Forestry cooperative of temple estate in Koya-san) for providing the plant samples used in this study. We appreciate the assistance of Kazuaki Sakaguchi, Sayo Sugimoto and Yuki Kishimoto for selection of the plant samples.

Data Availability

R scripts and preprocessed literature data are available at https://github.com/yabuuchi-hiroaki/webvs. All other relevant data are within the paper and its Supporting Information files.

Funding Statement

This research was supported by the Kayamori Foundation of Informational Science Advancement (K32 ken XXV 577). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Newman DJ, Cragg GM, Snader KM. The influence of natural products upon drug discovery. Nat Prod Rep. 2000;17(3):215–234. doi: 10.1039/a902202c
2. Zhu H. Big data and artificial intelligence modeling for drug discovery. Annu Rev Pharmacol Toxicol. 2020;60:573–589. doi: 10.1146/annurev-pharmtox-010919-023324
3. Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res. 2023;51(D1):D29–D38. doi: 10.1093/nar/gkac1032
4. Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44(D1):D1214–D1219. doi: 10.1093/nar/gkv1031
5. Cheng Y, Wang Y, Wang X. A causal relationship discovery-based approach to identifying active components of herbal medicine. Comput Biol Chem. 2006;30(2):148–154. doi: 10.1016/j.compbiolchem.2005.11.003
6. Wang Y, Wang X, Cheng Y. A computational approach to botanical drug design by modeling quantitative composition-activity relationship. Chem Biol Drug Des. 2006;68(3):166–172. doi: 10.1111/j.1747-0285.2006.00431.x
7. Zhang Y, Rahman MM, Braylan A, Dang B, Chang HL, Kim H, et al. Neural Information Retrieval: A Literature Review. arXiv:1611.06792v3 [Preprint]. 2016. [cited 2017 Mar 3]. Available from: https://arxiv.org/abs/1611.06792v3
8. Tong SYC, Davis JS, Eichenberger E, Holland TL, Fowler VG Jr. Staphylococcus aureus infections: epidemiology, pathophysiology, clinical manifestations, and management. Clin Microbiol Rev. 2015;28(3):603–661. doi: 10.1128/CMR.00134-14
9. Wei CH, Allot A, Leaman R, Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res. 2019;47(W1):W587–W593. doi: 10.1093/nar/gkz389
10. Chassagne F, Samarakoon T, Porras G, Lyles JT, Dettweiler M, Marquez L, et al. A systematic review of plants with antibacterial activities: A taxonomic and phylogenetic perspective. Front Pharmacol. 2021;11:586548. doi: 10.3389/fphar.2020.586548
11. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781v3 [Preprint]. 2013. [cited 2013 Jan 16]. Available from: doi: 10.48550/arXiv.1301.3781
12. Vapnik VN. The Nature of Statistical Learning Theory. New York, USA: Springer; 1995.
13. Ho TK. Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition. 1995. pp. 278–282.
14. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–1554. doi: 10.1162/neco.2006.18.7.1527
15. Ash JR, Hughes-Oliver JM. Confidence bands and hypothesis tests for hit enrichment curves. J Cheminform. 2022;14(1):50. doi: 10.1186/s13321-022-00629-0
16. Babushok VI, Linstrom PJ, Zenkevich IG. Retention indices for frequently reported compounds of plant essential oils. J Phys Chem Ref Data. 2011;40:043101. doi: 10.1063/1.3653552
17. Adams RP. Identification of essential oil components by gas chromatography/mass spectrometry. 3rd ed. Carol Stream, IL, USA: Allured Publishing Corp.; 1995.
18. Committee on antimicrobial susceptibility testing, Japanese Society of Chemotherapy. [title in Japanese] Chemotherapy. 1990;38:102–105.
19. Figueiredo RDA, Ortega AC, Maldonado LAG, de Castro RD, Avila-Campos MJ, Rossa C, et al. Perillyl alcohol has antibacterial effects and reduces ROS production in macrophages. J Appl Oral Sci. 2020;28:e20190519. doi: 10.1590/1678-7757-2019-0519
20. Cottiglia F, Garau GLD, Floris C, Casu M, Pompei R, Bonsignore L. Antimicrobial evaluation of coumarins and flavonoids from the stems of Daphne gnidium L. Phytomedicine. 2001;8(4):302–305. doi: 10.1078/0944-7113-00036
21. Cermak P, Olsovska J, Mikyska A, Dusek M, Kadleckova Z, Vanicek J, et al. Strong antimicrobial activity of xanthohumol and other derivatives from hops (Humulus lupulus L.) on gut anaerobic bacteria. APMIS. 2017;125(11):1033–1038. doi: 10.1111/apm.12747
22. Bach QN, Hongthong S, Quach LT, Pham LV, Pham TV, Kuhakarn C, et al. Antimicrobial activity of rhodomyrtone isolated from Rhodomyrtus tomentosa (Aiton) Hassk. Nat Prod Res. 2020;34(17):2518–2523. doi: 10.1080/14786419.2018.1540479
23. Bazzaz BSF, Memariani Z, Khashiarmanesh Z, Iranshahi M, Naderinasab M. Effect of galbanic acid, a sesquiterpene coumarin from Ferula szowitsiana, as an inhibitor of efflux mechanism in resistant clinical isolates of Staphylococcus aureus. Braz J Microbiol. 2010;41(3):574–580. doi: 10.1590/S1517-83822010000300006
24. Favel A, Steinmetz MD, Regli P, Vidal-Ollivier E, Elias R, Balansard G. In vitro antifungal activity of triterpenoid saponins. Planta Med. 1994;60(1):50–53. doi: 10.1055/s-2006-959407
25. Cao Y, Xuan B, Peng B, Li C, Chai X, Tu P. The genus Lindera: a source of structurally diverse molecules having pharmacological significance. Phytochem Rev. 2016;15:869–906. doi: 10.1007/s11101-015-9432-2
26. Gibbons S. Anti-staphylococcal plant natural products. Nat Prod Rep. 2004;21(2):263–277. doi: 10.1039/b212695h
27. Reichling J, Suschke U, Schneele J, Geiss HK. Antibacterial activity and irritation potential of selected essential oil components - Structure-activity relationship. Nat Prod Commun. 2006;1(11):1003–1012. doi: 10.1177/1934578X0600101116
28. Hinkson IV, Madej B, Stahlberg EA. Accelerating therapeutics for opportunities in medicine: A paradigm shift in drug discovery. Front Pharmacol. 2020;11:770. doi: 10.3389/fphar.2020.00770
29. Mayr LM, Fuerst P. The future of high-throughput screening. J Biomol Screen. 2008;13(6):443–448. doi: 10.1177/1087057108319644
30. Henrich CJ, Beutler JA. Matching the power of high throughput screening to the chemical diversity of natural products. Nat Prod Rep. 2013;30(10):1284–1298. doi: 10.1039/c3np70052f
31. Daynac M, Cortes-Cabrera A, Prieto JM. Application of Artificial Intelligence to the Prediction of the Antimicrobial Activity of Essential Oils. Evid Based Complement Alternat Med. 2015;2015:561024. doi: 10.1155/2015/561024
32. El-Attar NE, Awad WA. Computational tool for optimizing the essential oils utilization in inhibiting the bacterial growth. Adv Appl Bioinform Chem. 2017;10:65–78. doi: 10.2147/AABC.S138944
33. Parvathaneni V, Kulkarni NS, Muth A, Gupta V. Drug repurposing: a promising tool to accelerate the drug discovery process. Drug Discov Today. 2019;24(10):2076–2085. doi: 10.1016/j.drudis.2019.06.014
34. Horikawa Y. Atlas of the Japanese flora, an introduction to plant sociology of east Asia. Tokyo, Japan: Gakken; 1972. p. 577 [in Japanese].
35. Wada K, Matsui K, Enomoto Y, Ogiso O, Munakata K. Insect feeding inhibitors in plants part I. Isolation of three new sesquiterpenoids in Parabenzoin trilobum Nakai. Agric Biol Chem. 1970;34(6):941–945. doi: 10.1080/00021369.1970.10859708
36. Su YC, Hsu KP, Wang EIC, Ho CL. Composition, in vitro cytotoxic, and antimicrobial activities of the flower essential oil of Diospyros discolor from Taiwan. Nat Prod Commun. 2015;10(7):1311–1314. doi: 10.1177/1934578X1501000744
37. Watanabe T, Goto M. Study on Japanese Cinnamon (II). Japan J Pharmacog. 1953;6(1):35–37.
38. Nitta A. Studies on commercial cinnamon and allied barks. X. On nikkei, Cinnamomum sieboldii MEISN., syn. C. loureirii auct. Japon non NEES. Chem Pharm Bull. 1987;35(4):1464–1478. doi: 10.1248/cpb.35.1464
39. Henry S, McInnes BT. Literature Based Discovery: Models, methods, and trends. J Biomed Inform. 2017;74:20–32. doi: 10.1016/j.jbi.2017.08.011
40. Lardos A, Aghaebrahimian A, Koroleva A, Sidorova J, Wolfram E, Anisimova M, et al. Computational Literature-based Discovery for Natural Products Research: Current State and Future Prospects. Front Bioinform. 2022;2:827207. doi: 10.3389/fbinf.2022.827207
41. Christenhusz MJM, Byng JW. The number of known plants species in the world and its annual increase. Phytotaxa. 2016;261(3):201–217. doi: 10.11646/phytotaxa.261.3.1
42. Wink M. Evolution of secondary metabolites from an ecological and molecular phylogenetic perspective. Phytochemistry. 2003;64(1):3–19. doi: 10.1016/s0031-9422(03)00300-5
43. Pant P, Pandey S, Dall’Acqua S. The Influence of Environmental Conditions on Secondary Metabolites in Medicinal Plants: A Literature Review. Chem Biodivers. 2021;18(11):e2100345. doi: 10.1002/cbdv.202100345

Decision Letter 0

Guadalupe Virginia Nevárez-Moorillón

19 Apr 2023

PONE-D-23-09446

Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space

PLOS ONE

Dear Dr. Yabuuchi,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 03 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Guadalupe Virginia Nevárez-Moorillón, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Please remove your figures from within your manuscript file, leaving only the individual TIFF/EPS image files, uploaded separately. These will be automatically included in the reviewers’ PDF.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Please, consider the minor revisions suggested by the reviewers.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Comment for the manuscript:

“Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space”

The authors have applied semantic space and a word embedding algorithm to develop machine learning for the classification of antimicrobial compounds. This is a new type of method for developing machine-learning classification models.

In my opinion -

It is a well-written manuscript.

The methods are written explaining every step.

The results and figures are well explained.

The discussion and conclusion explain very well the outcomes of this manuscript.

Reviewer #2: The manuscript entitled ‘Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space’ described the evaluation of a new computational screening strategy for classifying bioactive compounds and plants in semantic space generated by word embedding algorithm.

The manuscript is well written. The study is relevant and important, especially for researchers in countries where traditional knowledge about the medicinal uses of available plants has become limited or unavailable. In such situations, applications like this can help explore the flora for drug discovery. It is also important for areas with rich traditional knowledge but where indigenous people are unwilling to give out information on important medicinal plants for research.

Unfortunately, the application does not address the issues of chemical and bioactive variation in plants due to factors such as geographical location of plant material, season of harvesting etc. that may affect predictions by the application.

Lines 236-238: A stronger reason HTS are uncommon is the cost involved in setting one up (acquisition, installation and maintenance). Revise.

Lines 248-249: Why should plants that were previously investigated be subject to WEBVS in its large-scale exploration? Is it necessary?

Line 264: What else is there to consider about the medicinal potential of C. sieboldii when the authors had indicated that the plant is in short supply, hence its removal from the Japanese Pharmacopoeia? Revise.

Overall, it is a good manuscript worthy of consideration for publication after minor corrections.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Subhash Chandra

Reviewer #2: Yes: Gustav Komlaga

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 May 15;18(5):e0285716. doi: 10.1371/journal.pone.0285716.r002

Author response to Decision Letter 0


27 Apr 2023

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

RESPONSE: Thank you for your advice. We have checked the style requirements again, and uploaded figure files corrected by PACE.

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

RESPONSE: We have uploaded the code to GitHub, and added a description “R scripts and preprocessed literature data are available at https://github.com/yabuuchi-hiroaki/webvs” to “Data Availability” field.

3. Please remove your figures from within your manuscript file, leaving only the individual TIFF/EPS image files, uploaded separately. These will be automatically included in the reviewers’ PDF.

RESPONSE: We have removed the figures. We are sorry for forgetting to remove them.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

RESPONSE: Thank you for your advice. We have checked the reference list again.

Reviewers' comments:

5. Review Comments to the Author

Reviewer #1: Comment for the manuscript:

“Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space” The authors have applied semantic space and a word embedding algorithm to develop machine learning for the classification of antimicrobial compounds. This is a new type of method for developing machine-learning classification models.

In my opinion - It is a well-written manuscript. The methods are written explaining every step. The results and figures are well explained. The discussion and conclusion explain very well the outcomes of this manuscript.

RESPONSE: Thank you for dedicating your time to review our manuscript and for the positive opinion.

Reviewer #2: The manuscript entitled ‘Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space’ described the evaluation of a new computational screening strategy for classifying bioactive compounds and plants in semantic space generated by word embedding algorithm.

The manuscript is well written. The study is relevant and important, especially for researchers in countries where traditional knowledge about the medicinal uses of available plants has become limited or unavailable. In such situations, applications like this can help explore the flora for drug discovery. It is also important for areas with rich traditional knowledge but where indigenous people are unwilling to give out information on important medicinal plants for research.

Unfortunately, the application does not address the issues of chemical and bioactive variation in plants due to factors such as geographical location of plant material, season of harvesting etc. that may affect predictions by the application.

RESPONSE: Thank you for dedicating your time to reviewing our manuscript and for your constructive comments. You have raised an important point in the last paragraph. We have addressed it on p.18, lines 284-287.

Lines 236-238: A stronger reason HTS are uncommon is the cost involved in setting one up (acquisition, installation and maintenance). Revise.

RESPONSE: Thank you for your advice. We agree with you and have incorporated this suggestion (p.15, line 238).

Lines 248-249: Why should plants that were previously investigated be subject to WEBVS in its large-scale exploration? Is it necessary?

RESPONSE: We wrote "appeared in previous studies" to mean that the plants were mentioned in some form in the literature data. However, as you pointed out, this phrase gives the impression that the activity of the plants had already been investigated in previous studies. We have amended the manuscript to avoid confusing readers (p.16, line 250).

Line 264: What else is there to consider about the medicinal potential of C. sieboldii when the authors had indicated that the plant is in short supply, hence its removal from the Japanese Pharmacopoeia? Revise.

Overall, it is a good manuscript worthy of consideration for publication after minor corrections.

RESPONSE: Thank you for the thought-provoking question. We have rewritten the history and background of C. sieboldii with some references (from p.16 line 259 to p.17 line 263), and specified the need for further research, including clinical studies (p.17 lines 265-266).

Attachment

Submitted filename: RebuttalLetter.docx

Decision Letter 1

Guadalupe Virginia Nevárez-Moorillón

2 May 2023

Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space

PONE-D-23-09446R1

Dear Dr. Yabuuchi,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Guadalupe Virginia Nevárez-Moorillón, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

I reviewed the corrections in the final document, as described in the cover letter by the corresponding author. Thank you for the edits and the manuscript can be accepted without further changes.

Reviewers' comments:

Acceptance letter

Guadalupe Virginia Nevárez-Moorillón

3 May 2023

PONE-D-23-09446R1

Virtual screening of antimicrobial plant extracts by machine-learning classification of chemical compounds in semantic space

Dear Dr. Yabuuchi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Guadalupe Virginia Nevárez-Moorillón

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Active compounds used for the machine learning of antimicrobial activity.

    (XLS)

    S2 Table. Accuracy result of various machine learning algorithms.

    (XLS)

    S3 Table. Antimicrobial probability of plant genera.

    (XLS)

    S4 Table. Chemical composition of essential oils from Lindera triloba and Cinnamomum sieboldii.

    (XLS)

    Attachment

    Submitted filename: RebuttalLetter.docx

    Data Availability Statement

    R scripts and preprocessed literature data are available at https://github.com/yabuuchi-hiroaki/webvs. All other relevant data are within the paper and its Supporting Information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
