BMC Medical Informatics and Decision Making. 2025 Oct 14;25:378. doi: 10.1186/s12911-025-03041-4

Ensemble techniques for predictive modeling of leishmanial activity via molecular fingerprints

Saif Nalband 1, Pallavi Kiratkar 2, Maulik Gupta 1, Mansi Gambhir 3, Surabhi Sonam 4, Femi Robert 5, A Amalin Prince 6
PMCID: PMC12522471  PMID: 41088199

Abstract

Background

Leishmaniasis, a neglected tropical disease caused by Leishmania protozoan parasites and transmitted by sandflies, poses a significant global health challenge, especially in resource-limited environments. The life cycle of the parasite includes crucial amastigote and promastigote stages, each contributing importantly to the infection process. The current therapies for leishmaniasis face limitations due to considerable side effects and the rise of drug-resistant strains, underscoring the pressing need for new, effective, and safe treatment options. Recent advancements in leishmaniasis vaccine development include live attenuated vaccines, recombinant vaccines, and the use of synthetic biology. These approaches aim to induce robust immune responses while ensuring safety. Controlled human infection studies are also being explored to accelerate vaccine development. However, a licensed vaccine remains elusive.

Method

This study introduces a novel method for drug discovery targeting leishmaniasis, employing machine learning and cheminformatics to forecast the efficacy of compounds against Leishmania promastigotes. A detailed dataset consisting of 65,057 molecules sourced from the PubChem database is utilized, with the Alamar Blue-based assay applied to assess drug susceptibility. The data encoding relies on molecular fingerprints derived from Simplified Molecular Input Line Entry System (SMILES) notations. We employed three distinct fingerprint algorithms, Avalon, MACCS Key, and Pharmacophore, for the development of machine learning models. Various algorithms, including random forest, multilayer perceptron, gradient boosting, and decision tree, are utilized to create models that effectively classify molecules as either active or inactive based on their structural and chemical characteristics, which could significantly impact the drug discovery process for leishmaniasis.

Results

We additionally introduced a model based on ensembles, achieving a peak accuracy of 83.65% and an area under the curve of 0.8367. This study offers significant promise in enhancing drug discovery efforts focused on tackling the global issue of leishmaniasis.

Conclusion

Furthermore, the proposed approach has the potential to serve as a framework for addressing other overlooked tropical diseases, offering a promising alternative to conventional drug discovery methods and their associated difficulties.

Keywords: Leishmania, Machine learning, Molecular fingerprints, Ensemble learning

Introduction

Leishmaniasis stands as a major global health challenge, occupying the ninth position among the world’s most burdensome diseases. Its endemic presence spans 98 countries and three territories across five continents, illustrating its far-reaching impact. Government reports underscore the magnitude of this issue, documenting annual occurrences exceeding 58,000 cases of visceral leishmaniasis and 220,000 cases of cutaneous leishmaniasis [1].

Epidemiological forecasts present a disconcerting outlook, projecting annual global occurrences ranging from 700,000 to 1.2 million cases of cutaneous leishmaniasis and 200,000 to 400,000 cases of visceral leishmaniasis. The distribution of this disease burden demonstrates marked geographical clustering, with six nations (Bangladesh, India, Brazil, Sudan, South Sudan, and one other country) collectively representing over 90% of reported visceral leishmaniasis cases globally.

The gravity of leishmaniasis is further accentuated by its variable but substantial mortality rates across affected regions. Reported data indicate that visceral leishmaniasis mortality rates were 7.2% in Brazil (2006), 1.5% in India (2004–2008), 6.2% in Nepal (2004–2008), and 2.4% in Bangladesh (2004–2008). These alarming statistics underscore the critical necessity for enhanced prevention strategies, improved diagnostic methodologies, and more effective treatment regimens to address this widespread and potentially lethal disease [1]. Regrettably, leishmaniasis disproportionately impacts impoverished nations, with more than 90% of mucocutaneous cases concentrated in countries like Brazil, Ethiopia, Peru, and Bolivia. Despite this, both effective and safe treatments for the disease are scarce. Profit-driven entities, such as pharmaceutical companies, tend to focus on diseases prevalent in high-income regions to recover their drug discovery and development expenses. This market-oriented approach has resulted in a significant imbalance, neglecting diseases critically important in developing countries. Between 1975 and 2004, out of 1556 new molecular entities approved, only 21 (a mere 1.3%) were developed for tuberculosis and other neglected tropical diseases, underscoring this disparity [2].

Medications like antimonial compounds, particularly meglumine antimonate, have been used for over 70 years but are linked to serious side effects. Their widespread use has also led to the rapid emergence of antimonial-resistant strains. Alternative treatments such as amphotericin B (as deoxycholate), pentamidine, and liposome-based formulations are recommended but are known for their high toxicity and cost. Miltefosine, the only orally active treatment (approved in 2014 and initially developed for cancer), has shown effectiveness against infections caused by L. panamensis, L. braziliensis, and L. guyanensis.

Traditional methods for drug discovery and development for leishmaniasis face several limitations. Current drugs often have severe side effects, such as miltefosine being teratogenic and amphotericin B being nephrotoxic, which limits their use. The emergence of drug-resistant strains further complicates treatment efficacy, particularly with antimonials and miltefosine [3, 4]. Additionally, effective treatments like liposomal amphotericin B are expensive and inaccessible to many patients, and long treatment regimens can lead to poor compliance. The lack of new chemical entities (NCEs) means that most drugs are repurposed rather than newly developed, limiting innovation in antileishmanial treatments. Furthermore, inadequate preclinical models contribute to high failure rates in clinical stages, underscoring the need for more effective and innovative approaches to drug development [5, 6].

However, data on the efficacy of miltefosine against Old World leishmaniasis are limited. The lack of effective treatments for leishmaniasis has resulted in a significant disease burden and increasing resistance. Efforts to introduce new drugs are struggling, leading to a widening gap in available treatments. Machine learning algorithms could help predict whether candidate compounds are likely to be effective.

The integration of machine learning techniques with cheminformatics plays a vital role in categorizing pharmaceutical compounds as either efficacious or ineffective against leishmaniasis, a task of significant importance for multiple reasons:

  • Leishmaniasis, a neglected tropical disease, presents a significant global health challenge, disproportionately affecting economically disadvantaged and vulnerable populations. This parasitic infection is particularly prevalent in regions with inadequate healthcare infrastructure. Our research leverages machine learning technologies to optimize the therapeutic research and development process, potentially transforming the landscape of drug discovery for leishmaniasis in a cost-effective manner. This innovative approach aims to accelerate the identification of novel treatments, addressing the urgent need for accessible and effective interventions against this debilitating disease.

  • The implementation of this machine learning approach significantly accelerates the drug discovery and development timeline for leishmaniasis [7]. Traditional methods have been hindered by the intricate life cycle of the leishmania parasite and the diversity of pathogenic species, resulting in a time-consuming and resource-intensive process. However, by harnessing machine learning algorithms to analyze extensive datasets and elucidate molecular patterns and relationships, we achieve a marked increase in efficiency. This expedited approach is crucial for effective disease management and the alleviation of human suffering associated with leishmaniasis. The enhanced speed of drug candidate identification and optimization has the potential to dramatically impact public health outcomes in affected regions.

Our investigation employs three distinct molecular fingerprinting techniques to train an array of machine learning models. The primary objective is to develop a robust classification system capable of differentiating between active and inactive compounds against Leishmania promastigotes. By leveraging diverse chemical descriptors, we aim to capture a comprehensive representation of molecular structures and properties. This multifaceted approach is designed to enhance the accuracy and reliability of our predictive models. The overarching goal is to establish a high-performance classifier that can effectively categorize novel compounds based on their chemical composition and characteristics, thereby accelerating the identification of potential anti-leishmanial agents.

  • Train a machine learning model using 65,057 molecules.

  • Employ different types of fingerprints for each molecule.

  • Evaluate the effectiveness of different machine learning models separately for each fingerprint type.

In this study, we propose a machine learning and cheminformatics approach to accelerate anti-leishmanial drug discovery, addressing the urgent need for improved treatments. Leveraging a robust dataset of 65,057 PubChem compounds experimentally validated via Alamar Blue assays, we encoded molecular features using three fingerprinting techniques, namely Avalon, MACCS Key, and Pharmacophore, all derived from SMILES notations. This combination was chosen intentionally because it provides a diverse set of fingerprint-based molecular descriptors. Each fingerprint captures unique aspects of molecular information. Avalon encodes general hashed substructures, MACCS keys reflect specific, interpretable substructural patterns, and 3D pharmacophore fingerprints include spatial arrangements of functional groups critical for bioactivity. By integrating these fingerprints, we ensure that our models do not rely solely on a limited or biased molecular view. This diversification reduces the risk of fragment misrepresentation and offers a more comprehensive chemical space for the machine learning models to learn from.

Different machine learning algorithms, namely random forest (RF), multilayer perceptron (MLP), decision tree (DT), and extreme gradient boosting (XGB), were trained on the dataset and evaluated on these representations to classify molecules as either active or inactive against Leishmania promastigotes. Furthermore, we propose using ensemble learning to classify compound activity against Leishmania promastigotes. This work uniquely targets leishmaniasis, a neglected tropical disease with limited therapeutic options, offering a cost-effective alternative to traditional drug discovery. By systematically comparing fingerprinting methods, we identify optimal molecular representations for this task while establishing a transferable framework applicable to other neglected diseases. The integration of cheminformatics, large-scale experimental data, and ensemble-based machine learning not only improves predictive performance but also streamlines the identification of novel drug candidates. Our approach demonstrates how computational strategies can overcome resource constraints in neglected disease research, with the potential for broader adoption in tropical disease drug development pipelines.

Related work

Machine learning (ML) algorithms have found extensive application in various domains, including drug design and discovery, where they enable multitarget prediction and rational molecule optimization. Perturbation-theory machine learning (PTML) exemplifies this advancement by integrating physicochemical perturbations and experimental conditions into predictive models, allowing simultaneous evaluation of activity, toxicity, and pharmacokinetic properties [8, 9]. For instance, PTML models have been applied to design anti-pancreatic cancer agents through multicellular target QSAR frameworks, optimizing compounds for efficacy against heterogeneous tumor microenvironments [10]. Furthermore, a PTML model integrating molecular descriptors and experimental conditions has been used to predict and design anti-lung-cancer agents. It enables virtual screening by correlating physicochemical perturbations with biological activity, optimizing compounds for efficacy across diverse in vitro and in vivo assays [11]. Similarly, anti-leishmanial 2-acylpyrrole derivatives were computationally designed using PTML models that correlated structural features with Leishmania protease inhibition, later validated experimentally [12].

Recent advances include ensemble PTML strategies, such as fragment-based topological approaches for generating Hsp90 inhibitors with improved binding affinity, and hybrid models predicting multi-strain antibacterial agents against Staphylococcus aureus by analyzing metabolic networks and ChEMBL bioactivity data [13]. PTML’s interpretability also supports de novo drug design, as seen in virtual campaigns for dual NET/SERT inhibitors targeting mood disorders and anti-flaviviridae agents optimized for broad-spectrum activity [14]. Furthermore, Kleandrova et al. designed a dual inhibitor targeting norepinephrine and serotonin transporters (NET/SERT) for mood disorders. Achieving 80% accuracy, the model identifies active compounds and employs fragment-based topological design to create novel drug-like molecules. These molecules show potential as dual-target inhibitors, offering promising candidates for experimental validation in mood disorder therapies [15]. These models highlight ML’s transformative role in accelerating drug discovery by bridging computational predictions with experimental validation.

Table 1 summarizes the related work, highlighting its similarities to and major differences from the current study.

Table 1.

Summary of related work with similarities and differences by different authors

| Reference | Similarities | Differences |
|---|---|---|
| Harigua-Souiai et al. | Uses ML + molecular fingerprints for anti-leishmanial activity prediction | Likely uses 1 fingerprint type (unspecified); our study uses 3. Lower accuracy (0.72) |
| Olier et al. | Focuses on QSAR/bioactivity prediction using molecular representations | Uses meta-learning (multi-task) vs our single-task approach. Molecular representations may differ |
| Tu et al. | SVM + ECFP4 fingerprints for bioactivity prediction | Targets FLAP inhibitors (not leishmanial). Uses only ECFP4 (not compared to other fingerprints) |
| Tabares-Soto et al. | Compares ML/DL algorithms for biological classification | Cancer classification (gene expression data, not fingerprints) |
| Wang et al. | QSAR models for binding affinity (PPAR) using fingerprints | Quantitative (affinity) vs our binary classification |
| Liu et al. | Ensemble learning + fingerprints for blood-brain barrier permeability | Different target (BBB permeability vs. leishmanial activity) |
| Ding et al. | Multi-fingerprint integration (hERG cardiotoxicity) | Focus on toxicity (not efficacy). Similar fingerprint diversity but different application |
| Sun et al. | Stacking ensemble (IDO1Stack) for inhibitor prediction | Targets IDO1 inhibitors. Specific stacking method vs our ensemble approach |
| Cuvitoglu et al. | ML for anti-cancer drug pairs | Uses network biology (not fingerprints) |
| Krivozubov, Pal, Yuan, Li | ML in phylogenetics/protein classification | Unrelated to small-molecule bioactivity prediction |

Emna Harigua-Souiai et al. [7] employed molecular fingerprints (FPs) from a dataset of 65,057 molecules identified as either active or inactive against Leishmania major promastigotes to build a classifier capable of predicting the anti-leishmanial potential of new molecules. Olier et al. [16] investigated the learning of quantitative structure-activity relationships (QSARs) through meta-learning, focusing on the bioactivity prediction of chemical compounds using molecular representations. Tu et al. [17] created a support vector machine model using the ECFP_4 fingerprint to predict FLAP inhibitors’ activity with high accuracy and Matthews correlation coefficient. Tabares-Soto et al. [18] compared various machine learning and deep learning algorithms to classify cancer types based on gene expression data, achieving high accuracy rates with cross-validation.

Wang et al. [19] developed QSAR models for PPARγ binding affinity using machine learning algorithms and molecular fingerprints, underscoring the importance of defined applicability domains for accurate predictions. Liu et al. [20] employed ensemble-learning models and molecular fingerprints to predict the blood-brain barrier permeability of chemicals, achieving high performance metrics. Chong et al. [21] conducted a comparative study on feature selection and classification algorithms for activity class prediction, highlighting the significance of selecting suitable feature subsets for model training. Hua et al. [22] focused on the in silico prediction of chemical-induced hematotoxicity using machine learning and deep learning methods, leveraging a large dataset of hematotoxic chemicals and approved drugs.

Ding et al. [23] integrated diverse multidimensional molecular fingerprints to develop a predictive model for hERG-mediated cardiotoxicity in chemical compounds. The results underscored the efficacy of incorporating multiple molecular fingerprint types in enhancing the accuracy and robustness of toxicity prediction models. Sun et al. [24] proposed a stacking ensemble model called IDO1Stack for predicting IDO1 inhibitors, providing a reliable tool for rapid screening and discovery. Manaithiya et al. [25] employed a machine learning-guided bioactivity prediction model to identify metabolic pathways targeted by active phytochemicals in Zea mays for treating diabetes and inflammation. Shi et al. [26] explored in silico prediction and insights into the structural basis of drug-induced nephrotoxicity, emphasizing the importance of reliable data in understanding adverse drug reactions. These studies collectively demonstrate the diverse applications of machine learning in predicting various biological activities based on molecular fingerprints.

Cuvitoglu et al. [27] developed a classification model to identify effective anti-cancer drug pairs using a network biology approach, deriving six network biology features from drug-perturbed transcriptome profiles and relevant biological network analyses. The model, trained on publicly available drug synergy databases using three machine learning methods, can distinguish between synergistic and non-synergistic drug combinations, with network degree activity being a crucial feature in predicting drug synergy. Krivozubov et al. [28] proposed a model to predict the quality of phylogeny reconstruction based on features from sequence alignments using the Fitch Margoliash (FM) method and a random forest predictor. This model, trained on alignments of orthologous series (OS), achieved over 80% precision in predicting phylogeny quality.

Pal et al. [29] developed a machine learning-based phylogenetic tree generation model using agglomerative clustering (PTGAC), which compared protein sequences considering the chemical properties of amino acids, proving to be more efficient in both quality and time compared to traditional methods like UPGMA. Yuan et al. [30] introduced a novel method for oxidoreductase classification utilizing dipeptide scores and recursive feature elimination on PSSM matrices, achieving high accuracy in predicting enzyme subclasses and demonstrating general applicability to other protein classification problems. Finally, Li et al. [31] developed Pippin, a machine learning-based method using random forests to accurately identify presynaptic and postsynaptic neurotoxins, outperforming several other algorithms and providing a valuable tool for neurotoxin research through an online web server.

Dataset

The proposed study utilizes a comprehensive dataset comprising 65,057 molecules, obtained from the PubChem database (AID 1063) [32]. These compounds were subjected to an Alamar Blue-based assay to evaluate their potential as anti-leishmanial agents. The assay yielded binary outcomes, categorizing each molecule as either “antileishmanial” (active) or “leishmanial” (inactive), reflecting their impact on Leishmania parasite growth and viability.

To complement this primary dataset, we incorporated a curated list of FDA-approved drugs, sourced from a GitHub repository. This additional data serves to enhance the scope and relevance of the analysis, providing context within the current pharmacological landscape.

This dataset forms the foundation of the proposed machine learning approach, offering a robust basis for predicting compound activity against Leishmania promastigotes and potentially accelerating the drug discovery process for leishmaniasis treatment. The Alamar Blue assay is primarily used to assess the cytotoxicity of compounds in vitro (Fields and Lancaster, 1993; Ahmed et al., 1994). The assay relies on the metabolic activity of viable cells. It utilizes resazurin, a non-fluorescent blue compound, as a fluorometric redox indicator. In the presence of diaphorase and NADH or NADPH, resazurin is reduced inside the cell, forming the fluorescent compound resorufin. Resorufin emits a bright red fluorescence with an emission range of 580–610 nm and an excitation range of 530–570 nm. The intensity of this fluorescence is used to determine cell viability. Additionally, absorbance at 570 nm, with 600 nm as a reference, can also be used to measure the assay results.

Methodology

The workflow diagram, depicted in Fig. 1, illustrates the structure of the proposed methodology. At the outset, the raw data is pre-processed to extract relevant features and remove noise. Subsequently, the preprocessed data is fed into the machine learning pipeline, where various algorithms are employed for training and evaluation. Hyper-parameter tuning is employed to tweak the models. The trained models are then subjected to rigorous testing using unseen data to assess their generalization capabilities. Finally, the results are analyzed and interpreted to draw meaningful insights and conclusions.

Fig. 1.

Block diagram of proposed work

Feature extraction

Simplified Molecular Input Line Entry System (SMILES) notation is used to represent the chemical structures of molecules. These SMILES notations were then transformed into three distinct types of fingerprints: Pharmacophore fingerprint, MACCS Key fingerprint and Avalon fingerprint.

SMILES

SMILES is a notation system used to represent a molecule’s structure in a format that computers can process. Its fundamental syntax rules include the following:

  • Simple Chains:

      • Simple chain structures are represented by combining atomic symbols and bond symbols.

      • Hydrogen atoms are generally not written explicitly. If the bonds written in a SMILES string do not satisfy an atom’s valence, the remaining connections are assumed to be filled by hydrogen atoms.

  • Atoms and Bonds:

      • SMILES uses atomic symbols to represent atoms and bond symbols to represent bonds.

      • Aromatic atoms are denoted by lowercase letters, while non-aromatic atoms are represented by uppercase letters (for example, benzene is written c1ccccc1).

  • Charged Atoms:

      • Charged atoms are enclosed in square brackets, with the charge written inside the brackets after the atomic symbol (for example, [NH4+] or [O-]).
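To make the notation concrete, the following is a minimal sketch of a SMILES tokenizer written with the Python standard library only. It is an illustration, not part of the study's pipeline: the token pattern covers only a simplified subset of the SMILES grammar, and the names `TOKEN_RE`, `tokenize`, and `classify` are hypothetical helpers introduced here.

```python
import re

# Simplified SMILES token pattern (an illustrative subset, not the full grammar):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# bond symbols, branch parentheses, and ring-closure digits.
TOKEN_RE = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:/\\]|[()]|%\d{2}|\d|\."
)

AROMATIC = set("bcnops")
ALIPHATIC = {"B", "C", "N", "O", "P", "S", "F", "I", "Cl", "Br"}


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unsupported characters."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def classify(token):
    """Label a token with the kind of syntax element it represents."""
    if token.startswith("["):
        return "bracket atom"      # charges, isotopes, explicit H, e.g. [NH4+]
    if token in AROMATIC:
        return "aromatic atom"     # lowercase letters
    if token in ALIPHATIC:
        return "aliphatic atom"    # uppercase organic-subset symbols
    if token in "-=#:/\\":
        return "bond"
    if token in "()":
        return "branch"
    return "ring closure or separator"


if __name__ == "__main__":
    for smiles in ("CC(=O)[O-]", "c1ccccc1"):  # acetate anion, benzene
        print(smiles, [(t, classify(t)) for t in tokenize(smiles)])
```

In practice a cheminformatics toolkit would parse and validate SMILES strings; this sketch only shows how atoms, bonds, aromaticity, and charges appear in the text representation.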

Molecular Fingerprints

  • Avalon fingerprint: The Avalon fingerprint is a technique used in computational chemistry to represent the structural characteristics of a molecule as a binary code. It encodes information about the presence or absence of specific molecular fragments within the molecule’s structure, such as rings (aromatic and non-aromatic), functional groups, and other chemical motifs.

  • MACCS fingerprints: The Molecular ACCess System (MACCS) fingerprints are structural key-based representations employed to quantify the similarity between molecules within a two-dimensional structural framework.

  • 3D pharmacophore fingerprint: The 3D pharmacophore fingerprint is a computational technique that encodes comprehensive details about functional groups’ 3-dimensional spatial arrangement and chemical characteristics within a ligand molecule. It captures information regarding various chemical features, such as aromatic rings, hydrophobic regions, hydrogen bond donors, hydrogen bond acceptors, and ionizable groups carrying positive or negative charges.
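Because each fingerprint is ultimately a binary vector, two molecules can be compared by how many set bits they share. The sketch below uses the Tanimoto coefficient, a common similarity measure for binary fingerprints that the text does not name explicitly, so its use here is an assumption. The bit strings are made-up toy values, not real Avalon, MACCS, or pharmacophore output, and `to_on_bits` and `tanimoto` are hypothetical helper names.

```python
def to_on_bits(bitstring):
    """Convert a '0'/'1' string into the set of indices whose bit is set."""
    return {i for i, ch in enumerate(bitstring) if ch == "1"}


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared on-bits divided by on-bits present in either."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Toy 5-bit fingerprints (hypothetical values for illustration only).
    fp_a = to_on_bits("10110")
    fp_b = to_on_bits("10011")
    print(tanimoto(fp_a, fp_b))  # 2 shared bits out of 4 distinct on-bits
```

The same bit vectors serve as the feature vectors that the classifiers described next are trained on.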

Machine learning algorithms

The study employed a comprehensive machine learning strategy, encompassing the implementation and evaluation of four distinct algorithms: Multilayer Perceptron (MLP), Decision Tree (DT), Extreme Gradient Boosting (XGB), and Random Forest (RF).

Random forest (RF)

RF is a highly effective ensemble learning technique for predictive modeling. It boosts accuracy and robustness by integrating multiple decision trees. Each tree is trained separately on a randomly chosen subset of the dataset, which helps reduce overfitting and enhances generalization. The algorithm introduces randomness by both sampling data and selecting features for each tree, thereby reducing the risk of correlation between the trees.

The prediction of an RF model is represented as:

$$\hat{y} = \frac{1}{N} \sum_{i=1}^{N} f_i(x)$$

Where:

  • $\hat{y}$ is the predicted output,

  • $N$ is the number of decision trees in the forest,

  • $f_i(x)$ is the prediction of the $i$-th decision tree.

Random Forest is a versatile and robust machine learning algorithm that necessitates minimal parameter tuning. It has been successfully applied across diverse fields, including finance, healthcare, and natural language processing.
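The two sources of randomness described above (sampling the data and selecting features) and the averaging of tree outputs can be sketched in a few lines. This is a deliberately simplified illustration using the standard library: each "tree" is a depth-1 stump on a single randomly chosen binary feature, whereas a real random forest grows deeper trees and samples a feature subset at every split. The function names and the toy data are assumptions made for this example.

```python
import random


def fit_stump(X, y, feature):
    """Depth-1 tree on one binary feature: store the mean label on each side."""
    overall = sum(y) / len(y)
    side = {}
    for bit in (0, 1):
        labels = [yi for x, yi in zip(X, y) if x[feature] == bit]
        # Fall back to the overall mean if this side received no samples.
        side[bit] = sum(labels) / len(labels) if labels else overall
    return feature, side


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one random feature."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        feature = rng.randrange(len(X[0]))          # random feature selection
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, x):
    """Average the individual tree outputs to get the ensemble prediction."""
    return sum(side[x[feature]] for feature, side in forest) / len(forest)


if __name__ == "__main__":
    # Toy binary fingerprints: bits 0 and 1 track the label, bit 2 is constant.
    X = [[1, 1, 0]] * 4 + [[0, 0, 0]] * 4
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [1, 1, 0]), predict_forest(forest, [0, 0, 0]))
```

Thresholding the averaged output (for example at 0.5) turns it into an active or inactive call.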

Extreme gradient boosting (XGB)

XGB is an ensemble learning technique that combines many weak learners into a single strong model. It improves predictive accuracy iteratively: at each iteration, a new model is trained to predict the residual errors of the current ensemble, as measured by a loss function (usually mean squared error or cross-entropy). The predictions from all the models are summed together to obtain the final prediction.

The prediction of an XGB model can be represented as:

$$\hat{y} = \sum_{t=1}^{T} \alpha_t h_t(x)$$

Where:

  • $\hat{y}$ is the predicted output,

  • $T$ is the number of weak learners,

  • $\alpha_t$ is the weight assigned to the $t$-th weak learner,

  • $h_t(x)$ is the prediction of the $t$-th weak learner.
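The residual-fitting loop described in this section can be illustrated with plain gradient boosting under squared-error loss. This sketch is not XGBoost itself: it omits regularization, second-order gradient information, and deeper trees. The weak learners are stumps on binary features, a single shared learning rate plays the role of the learner weight, and all names are hypothetical.

```python
def fit_stump(X, residuals):
    """Choose the binary feature whose split best fits the current residuals."""
    best = None
    for f in range(len(X[0])):
        zeros = [r for x, r in zip(X, residuals) if x[f] == 0]
        ones = [r for x, r in zip(X, residuals) if x[f] == 1]
        v0 = sum(zeros) / len(zeros) if zeros else 0.0
        v1 = sum(ones) / len(ones) if ones else 0.0
        sse = sum((r - (v1 if x[f] else v0)) ** 2 for x, r in zip(X, residuals))
        if best is None or sse < best[0]:
            best = (sse, f, v0, v1)
    return best[1:]  # (feature, value when bit is 0, value when bit is 1)


def fit_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the residuals left by the previous rounds."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    learners = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        f, v0, v1 = fit_stump(X, residuals)
        learners.append((f, v0, v1))
        preds = [p + learning_rate * (v1 if x[f] else v0) for x, p in zip(X, preds)]
    return base, learning_rate, learners


def predict_boosting(model, x):
    """Sum the weighted outputs of all weak learners on top of the base value."""
    base, learning_rate, learners = model
    return base + sum(learning_rate * (v1 if x[f] else v0) for f, v0, v1 in learners)


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 0, 1, 1]  # toy target equal to the first bit
    model = fit_boosting(X, y)
    print([round(predict_boosting(model, x), 4) for x in X])
```

With a learning rate of 0.5 on this toy data, the remaining error halves every round, which shows how successive learners progressively correct their predecessors.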

Decision tree (DT)

DT is a tree-based algorithm that makes predictions through a sequence of simple rules. It recursively splits the dataset into smaller subsets based on the values of the input features. The ultimate aim of this procedure is to create leaf nodes that are homogeneous with respect to the target variable. Each internal node represents a decision based on a particular feature, and each leaf node corresponds to a predicted outcome.
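A minimal sketch of this recursive splitting on binary features follows. It assumes Gini impurity as the split criterion, which the text does not specify, and the nested-dictionary tree representation and function names are choices made for this illustration only.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means perfectly homogeneous)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Majority class, with ties resolved in favour of class 1."""
    return int(2 * sum(labels) >= len(labels))


def build_tree(X, y, depth=0, max_depth=3):
    """Recursively split on the binary feature giving the purest children."""
    if len(set(y)) <= 1 or depth == max_depth:
        return {"leaf": majority(y)}
    best = None
    for f in range(len(X[0])):
        zeros = [yi for x, yi in zip(X, y) if x[f] == 0]
        ones = [yi for x, yi in zip(X, y) if x[f] == 1]
        if not zeros or not ones:
            continue  # this feature does not separate the samples
        score = (len(zeros) * gini(zeros) + len(ones) * gini(ones)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return {"leaf": majority(y)}
    f = best[1]
    zero_idx = [i for i, x in enumerate(X) if x[f] == 0]
    one_idx = [i for i, x in enumerate(X) if x[f] == 1]
    return {
        "feature": f,
        "zero": build_tree([X[i] for i in zero_idx], [y[i] for i in zero_idx],
                           depth + 1, max_depth),
        "one": build_tree([X[i] for i in one_idx], [y[i] for i in one_idx],
                          depth + 1, max_depth),
    }


def predict_tree(tree, x):
    """Follow the rules from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["one"] if x[tree["feature"]] else tree["zero"]
    return tree["leaf"]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 0, 0, 1]  # toy rule: active only when both bits are set
    tree = build_tree(X, y)
    print(tree, [predict_tree(tree, x) for x in X])
```

Printing the tree shows the sequence of rules directly, which is why single decision trees are often regarded as easy to interpret.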

Multilayer perceptron (MLP)

The MLP is a neural network model comprising several layers of interconnected nodes, known as neurons. This architecture allows the MLP to serve as a universal function approximator, capable of learning intricate relationships between inputs and outputs.

The output of a neuron $j$ in the $l$-th layer of an MLP can be calculated as:

$$a_j^{(l)} = \sigma\left( \sum_{i=1}^{n^{(l-1)}} w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)} \right)$$

Where:

  • $a_j^{(l)}$ is the activation of neuron $j$ in the $l$-th layer,

  • $\sigma$ is the activation function,

  • $w_{ij}^{(l)}$ is the weight associated with the connection between neuron $i$ in layer $l-1$ and neuron $j$ in layer $l$,

  • $a_i^{(l-1)}$ is the activation of neuron $i$ in layer $l-1$,

  • $b_j^{(l)}$ is the bias of neuron $j$ in layer $l$,

  • $n^{(l-1)}$ is the number of neurons in layer $l-1$.
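The per-neuron computation described here (a weighted sum of the previous layer's activations plus a bias, passed through an activation function) can be written as a short forward pass. The sketch assumes a sigmoid activation, which the text does not specify, and the weights below are arbitrary illustrative numbers rather than trained values.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(a_prev, weights, biases, activation=sigmoid):
    """Compute the activations of one layer.

    weights[j][i] connects neuron i of the previous layer to neuron j of
    this layer; biases[j] is the bias of neuron j.
    """
    return [
        activation(sum(w * a for w, a in zip(weights[j], a_prev)) + biases[j])
        for j in range(len(weights))
    ]


def mlp_forward(x, layers):
    """Apply each (weights, biases) pair in turn, feeding outputs forward."""
    a = x
    for weights, biases in layers:
        a = layer_forward(a, weights, biases)
    return a


if __name__ == "__main__":
    x = [1.0, 0.0]  # e.g. two fingerprint bits
    hidden = ([[0.0, 0.0], [2.0, -1.0]], [0.0, -2.0])
    output = ([[1.0, 1.0]], [-1.0])
    print(mlp_forward(x, [hidden, output]))
```

Training adjusts the weights and biases so that the final activation approximates the probability that a compound is active; only the forward computation is shown here.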

Ensemble learning

In this technique, several diverse machine learning models are combined to make a prediction. This approach aims to reduce the generalization error: the prediction error of the ensemble decreases provided that the individual models are distinct and make largely independent errors. One common approach in ensemble learning is soft voting, where the final class label is obtained from the weighted average of the probabilities predicted by multiple base models.

Let $M$ be the total number of base models in the ensemble, and let $P_i(x)$ represent the probability distribution over class labels predicted by the $i$-th base model for input $x$, where $i = 1, \dots, M$. Then, the soft voting ensemble prediction $\hat{P}(x)$ for input $x$ is computed as follows:

$$\hat{P}(x) = \sum_{i=1}^{M} w_i P_i(x)$$

Where $w_i$ is the weight assigned to the $i$-th base model. These weights are typically determined from each model's performance on validation sets or through cross-validation. For binary classification tasks, the predicted probability distribution $P_i(x)$ for the $i$-th base model can be represented as a vector $(p_i, 1 - p_i)$, where $p_i$ is the probability of the positive class and $1 - p_i$ is the probability of the negative class. Then, the soft voting ensemble prediction for the positive class can be calculated as

$$\hat{p}(x) = \sum_{i=1}^{M} w_i p_i$$

Similarly, for multi-class classification tasks with $C$ classes, the predicted probability distribution $P_i(x)$ for the $i$-th base model can be represented as a vector $(p_{i1}, \dots, p_{iC})$, and the soft voting ensemble prediction for class $c$ can be calculated as:

$$\hat{p}_c(x) = \sum_{i=1}^{M} w_i p_{ic}$$

Ensemble learning, particularly soft voting, often leads to improved generalization performance by leveraging the diverse strengths of individual models in the ensemble.
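As a small worked illustration of soft voting, the sketch below averages per-model class probabilities for a single input. It assumes the weights are normalized to sum to one (so that the result is a weighted average) and defaults to equal weights; the probability values and the function name `soft_vote` are made up for the example.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model class-probability vectors for one input.

    probabilities[i][c] is the probability that model i assigns to class c.
    """
    n_models = len(probabilities)
    if weights is None:
        weights = [1.0] * n_models          # equal weights by default
    total = sum(weights)                     # normalize so weights sum to 1
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def predict_label(probabilities, weights=None):
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(probabilities, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)


if __name__ == "__main__":
    # Hypothetical (inactive, active) probabilities from two base models.
    model_outputs = [[0.2, 0.8], [0.6, 0.4]]
    print(soft_vote(model_outputs))                  # equal weights
    print(soft_vote(model_outputs, weights=[3, 1]))  # trust the first model more
    print(predict_label(model_outputs))
```

Unlike hard (majority) voting, this keeps each model's confidence, so one confident model can outweigh several uncertain ones.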

Hyperparameter tuning

To optimize the performance of the models, we utilized grid search, a widely used technique for hyperparameter tuning. Grid search systematically explores a predefined set of hyperparameters to identify the combination that yields the best performance according to a specified evaluation metric.
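The exhaustive search can be sketched as a loop over the Cartesian product of candidate values. The hyperparameter names, grid values, and scoring function below are hypothetical placeholders, since the text does not list the grids that were searched; in practice `score_fn` would train a model with the given settings and return a cross-validated metric.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in the grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Hypothetical grid for a tree ensemble (values chosen for illustration).
    grid = {"max_depth": [3, 5, 7], "n_estimators": [50, 100, 200]}

    def toy_score(params):
        # Stand-in for a cross-validated score; peaks at depth 5, 100 trees.
        return -((params["max_depth"] - 5) ** 2) - abs(params["n_estimators"] - 100) / 100

    print(grid_search(grid, toy_score))
```

The cost grows multiplicatively with the number of values per hyperparameter (nine evaluations here), which is the main practical limitation of grid search.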

Class imbalance management

Fig. 2 shows that the dataset is highly imbalanced, with one class over-represented. To address this issue, we employed the Synthetic Minority Over-sampling Technique (SMOTE) when building the machine learning models. SMOTE generates synthetic samples for the minority class to balance the class distribution, thereby improving the models’ ability to generalize to unseen data.
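The core idea of SMOTE, creating a new minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified standard-library illustration, not the implementation used in the study; the function name, the choice of `k`, and the toy points are assumptions.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0, 0], [1, 0], [0, 1], [1, 1]]  # toy minority-class samples
    for point in smote(minority, n_new=3):
        print(point)
```

Note that interpolating binary fingerprints yields fractional values rather than strict 0/1 bits, and oversampling is normally applied to the training split only so that evaluation data stay untouched.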

Fig. 2.

Distribution of active and inactive records in the dataset

Evaluation metrics

We assessed the models’ performance using several metrics, including the confusion matrix (CM), accuracy (ACC), precision (PR), recall (RC), F1-score (F1S), and the area under the receiver operating characteristic curve (AUC-ROC). These metrics offer a thorough understanding of the models’ classification capabilities, encompassing both predictive accuracy and their effectiveness in distinguishing between classes.
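As a rough illustration of how these metrics relate to one another (their definitions follow), the sketch below derives them from true and predicted labels using only the standard library. Labels are assumed to be coded 1 for active and 0 for inactive, the AUC is computed with the pairwise-ranking formulation, and the example labels and scores are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and confusion entries as percentages."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "TP%": 100 * tp / n,
        "TN%": 100 * tn / n,
        "FP%": 100 * fp / n,
        "FN%": 100 * fn / n,
    }


def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

On an imbalanced dataset, accuracy alone can look high even when few actives are found, which is why precision, recall, F1, and AUC-ROC are reported alongside it.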

  • a.
    Confusion Matrix: A matrix representation of actual outcomes against predicted outcomes. It consists of four components.

      • True Positive (TP): A positive instance correctly predicted as positive.

      • True Negative (TN): A negative instance correctly predicted as negative.

      • False Positive (FP): A negative instance incorrectly predicted as positive.

      • False Negative (FN): A positive instance incorrectly predicted as negative.

  • b.
    Accuracy: The proportion of correct predictions.
    $$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
  • c.
    Precision: The ratio of true positives to all predicted positives.
    $$\text{PR} = \frac{TP}{TP + FP}$$
  • d.
    Recall (Sensitivity): The ratio of true positives to all actual positives.
    $$\text{RC} = \frac{TP}{TP + FN}$$
  • e.
    F1-Score: The harmonic mean of precision and recall.
    $$\text{F1S} = \frac{2 \times \text{PR} \times \text{RC}}{\text{PR} + \text{RC}}$$
  • f.
    AUC-ROC: To assess and compare the performance of the models, we plotted a Receiver Operating Characteristic (ROC) curve for each one. ROC curves depict the balance between the true positive rate (TPR, sensitivity) and the false positive rate (FPR, 1-specificity) across a range of threshold values, providing essential insights into the models’ ability to distinguish between classes.
    $$\text{AUC} = \int_0^1 \text{TPR} \; d(\text{FPR})$$
    AUC-ROC ranges from 0 to 1, with higher values indicating better performance.
  • g.
    TP (%): The percentage of all instances in the test set that were correctly predicted as Active (True Positives). This is calculated as (Number of True Positives / Total Number of Test Instances) * 100.
  • h.
    TN (%): The percentage of all instances in the test set that were correctly predicted as Inactive (True Negatives). This is calculated as (Number of True Negatives / Total Number of Test Instances) * 100.
  • i.
    FP (%): The percentage of all instances in the test set that were incorrectly predicted as Active when they were actually Inactive (False Positives). This is calculated as (Number of False Positives / Total Number of Test Instances) * 100.
  • j.
    FN (%): The percentage of all instances in the test set that were incorrectly predicted as Inactive when they were actually Active (False Negatives). This is calculated as (Number of False Negatives / Total Number of Test Instances) * 100.
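The metrics listed above can be computed directly from labels, predictions, and scores. The sketch below is a minimal pure-Python illustration; the function names, the 0.5 decision threshold, and the toy data are hypothetical, and the AUC-ROC is computed through its rank interpretation (the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one), which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN with 1 = active (positive) and 0 = inactive."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["FN" if actual == 1 else "TN"] += 1
    return counts


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """ACC, PR, RC, F1S plus each confusion cell as a percentage of all instances."""
    total = c["TP"] + c["TN"] + c["FP"] + c["FN"]
    if total == 0:
        raise ValueError("no instances to evaluate")
    pr = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    rc = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    f1 = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    metrics = {"ACC": (c["TP"] + c["TN"]) / total, "PR": pr, "RC": rc, "F1S": f1}
    for name in ("TP", "TN", "FP", "FN"):
        metrics[f"{name} (%)"] = 100.0 * c[name] / total
    return metrics


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (active, inactive) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test set: three actives, five inactives, with predicted P(active).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
toy_counts = confusion_counts(toy_true, toy_pred)
toy_metrics = classification_metrics(toy_counts)
toy_auc = auc_roc(toy_true, toy_scores)
```

Note that ACC, PR, RC, and F1S depend on the chosen threshold, whereas AUC-ROC depends only on how the scores rank the compounds. The pairwise loop is quadratic in the number of instances and is meant for illustration; a sorting-based routine would be preferable on a full test set.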

Work flow

The proposed framework is shown in Fig. 1. The study first represented each chemical structure in machine-readable form as a SMILES string. These SMILES were then converted into three different types of molecular fingerprints, namely Avalon, MACCS Key, and Pharmacophore. The fingerprints function as feature vectors (denoted as w in the equations) for the machine learning models. Each fingerprint encodes different aspects of the molecule’s structure and chemical properties, aiming to capture the characteristics that might influence its activity against Leishmania. These fingerprints served as the foundational input for subsequent machine learning models, transforming complex chemical information into quantifiable numerical representations suitable for computational analysis.

Four classification algorithms were applied to predict anti-leishmanial activity. DT constructed hierarchical splits based on fingerprint features; RF combined multiple DTs through ensemble voting to improve generalization; XGB sequentially optimized weak learners to correct prediction errors; and MLP modeled complex nonlinear relationships using neural networks. Each algorithm processed the fingerprint inputs to distinguish active from inactive compounds, leveraging different mathematical approaches to learn patterns in the chemical data.

The study enhanced predictive performance through a soft voting ensemble that strategically combined predictions from all individual models. This approach balanced the strengths of diverse algorithms while mitigating their individual weaknesses. Comprehensive hyperparameter tuning via grid search optimized each model’s configuration, and the Synthetic Minority Oversampling Technique (SMOTE) addressed dataset imbalance by generating synthetic active compounds. Together, these strategies produced a robust, high-performance classification system for identifying potential anti-leishmanial drug candidates.

Results

The following results were obtained by training machine learning models on three different feature extraction techniques from the molecules.

Avalon

Table 2 and Fig. 3 summarize the ACC, PR, F1S and AUC-ROC of the machine learning models trained on Avalon features. The ensemble model, formed by combining the base models, achieves the highest PR (0.9008) together with strong ACC (0.9356) and F1S (0.8773). The MLP model achieves the highest ACC (0.9434), F1S (0.8989) and AUC-ROC (0.9661). Among the remaining models, XGB shows comparatively high precision (0.7734) despite its low F1S (0.4667). These results place the MLP and ensemble models well ahead of the other algorithms on Avalon features.

Table 2. Classification performance metrics for Avalon features

Model ACC PR F1S AUC-ROC TP (%) TN (%) FP (%) FN (%)
DT 0.9208 0.8385 0.8559 0.9081 23.45% 68.27% 4.51% 3.38%
MLP 0.9434 0.8663 0.8989 0.9661 25.02% 68.87% 3.86% 1.77%
RF 0.7803 0.6535 0.4899 0.7893 10.50% 67.06% 5.56% 16.29%
XGB 0.7943 0.7734 0.4667 0.8079 8.99% 70.34% 2.63% 17.89%
Ensemble 0.9356 0.9008 0.8773 0.9658 23.40% 71.70% 2.58% 3.96%

Fig. 3. AUC-ROC plot for Avalon features

Table 2 and Fig. 3 show that, with Avalon features, the MLP model achieves the highest AUC-ROC (0.9661) and accuracy (0.9434), coupled with strong precision (0.8663) and a moderate FP rate (3.86%), indicating robust discriminative power and reliability. The ensemble model follows closely, with a near-identical AUC-ROC (0.9658) and high accuracy (0.9356), while maintaining the lowest FP rate (2.58%), making it well suited to minimizing false positives. In contrast, RF and XGB underperform, with lower AUC-ROC (0.7893 and 0.8079, respectively) and high FN rates (16.29% and 17.89%). RF also has the highest FP rate (5.56%, 728 instances), whereas XGB’s lower FP rate (2.63%, 343 instances) and higher precision (0.7734) indicate better FP control than RF. DT shows intermediate performance (AUC-ROC = 0.9081, accuracy = 0.9208) but a higher FP rate (4.51%) than MLP and the ensemble. Overall, MLP is the top performer for Avalon features, the ensemble offers a compelling alternative for FP-sensitive applications, and RF and XGB require caution because of their weaker class separation and higher misclassification risk.

MLP and the ensemble show high TP% (25.02% and 23.40%, respectively) and high TN% (68.87% and 71.70%, respectively), indicating that they correctly identify a large portion of both active and inactive compounds. These models also exhibit low FP% (3.86% for MLP, 2.58% for the ensemble) and low FN% (1.77% for MLP, 3.96% for the ensemble). A low FP% is particularly desirable in drug discovery because it minimizes the resources spent pursuing inactive compounds incorrectly flagged as active, while a low FN% is crucial to avoid missing potentially active compounds. The RF and XGB models show significantly lower TP% (10.50% and 8.99%) and higher FN% (16.29% and 17.89%), indicating that they miss a large proportion of the actual active compounds. RF also has a higher FP% (5.56%) than MLP and the ensemble, whereas XGB’s FP% (2.63%) is comparable to the ensemble’s. The high FN% of both models is consistent with their lower overall performance metrics (ACC, AUC-ROC).
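Because TP% and FP% share the same denominator (the total number of test instances), precision can be recovered from them as TP% / (TP% + FP%). The short check below applies this to the values copied from Table 2; the variable and function names are illustrative only.

```python
# (TP %, FP %, reported PR) for each model, copied from Table 2 (Avalon features).
table2 = {
    "DT": (23.45, 4.51, 0.8385),
    "MLP": (25.02, 3.86, 0.8663),
    "RF": (10.50, 5.56, 0.6535),
    "XGB": (8.99, 2.63, 0.7734),
    "Ensemble": (23.40, 2.58, 0.9008),
}


def precision_from_percentages(tp_pct: float, fp_pct: float) -> float:
    """Precision = TP / (TP + FP); the shared denominator of the percentages cancels."""
    return tp_pct / (tp_pct + fp_pct)


recomputed = {
    model: precision_from_percentages(tp, fp) for model, (tp, fp, _) in table2.items()
}
max_abs_diff = max(abs(recomputed[model] - table2[model][2]) for model in table2)
```

For every model the recomputed precision agrees with the reported PR column to within 0.001, which is the level of agreement expected when the percentages are rounded to two decimals.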

MACCS

Table 3 and Fig. 4 present the ACC, PR, F1S and AUC-ROC of the models trained on MACCS features. The ensemble model achieves the highest AUC-ROC (0.9270), while the DT model achieves the highest ACC (0.9084), PR (0.8036) and F1S (0.8370). The MLP model performs comparably to the ensemble (ACC 0.8811, PR 0.7536, F1S 0.7899). In contrast, the RF and XGB models show markedly lower values on every metric. Overall, the ensemble, DT and MLP models are the prominent performers on MACCS features.

Table 3. Classification performance metrics for MACCS features

Model ACC PR F1S AUC-ROC TP (%) TN (%) FP (%) FN (%)
DT 0.9084 0.8036 0.8370 0.9143 23.39% 66.95% 5.71% 3.39%
MLP 0.8811 0.7536 0.7899 0.9148 22.23% 65.43% 7.27% 4.56%
RF 0.6897 0.4380 0.4824 0.6993 14.37% 54.20% 18.43% 12.40%
XGB 0.7067 0.4629 0.5058 0.7326 14.92% 55.33% 17.30% 11.85%
Ensemble 0.8829 0.7673 0.7886 0.9270 21.97% 66.81% 6.66% 5.12%

Fig. 4. AUC-ROC plot for MACCS features

Table 3 and Fig. 4 show that, with MACCS features, the ensemble model achieves the highest AUC-ROC (0.9270) and competitive accuracy (0.8829), while maintaining a moderate FP rate (6.66%), suggesting strong overall discriminative ability. The DT and MLP models show comparable AUC-ROC values (0.9143 and 0.9148, respectively), with DT achieving better accuracy (0.9084) and a lower FP rate (5.71%) than MLP (FP = 7.27%), making DT preferable for balancing precision and sensitivity. In contrast, RF and XGB exhibit poor performance, with the lowest AUC-ROC (0.6993 and 0.7326), high FP rates (18.43% and 17.30%), and weak precision (0.4380 and 0.4629), indicating significant misclassification and unreliable positive predictions. The ROC curves further confirm these trends, with the ensemble closest to the ideal top-left region and RF and XGB closest to the diagonal. For MACCS-based applications, the ensemble model offers the best class separation, DT offers a simpler alternative with fewer false positives, and RF and XGB should be avoided because of their high error rates.

Performance metrics, including TP%, TN%, FP%, and FN%, are generally less favorable with MACCS keys than with Avalon features. FP% is higher for every model than with Avalon, and FN% is higher for DT, MLP, and the ensemble, suggesting less accurate discrimination between active and inactive compounds with MACCS keys. The ensemble model shows a high TN% (66.81%), close to the highest value (66.95% for DT), and a moderate TP% (21.97%), along with a much lower FP% (6.66%) than RF and XGB. DT shows the highest TP% (23.39%) and the lowest FP% (5.71%) of all models for MACCS, making it notable for minimizing false positives. RF and XGB again show poor performance, with low TP% (14.37%, 14.92%), the lowest TN% (54.20%, 55.33%), and particularly high FP% (18.43%, 17.30%) and FN% (12.40%, 11.85%). This suggests they misclassify a significant number of compounds.

Pharmacophore

Table 4 and Fig. 5 present the ACC, PR, F1S and AUC-ROC of the models trained on pharmacophore features. The ensemble model emerges as the top performer across all four metrics (ACC 0.9373, PR 0.9214, F1S 0.8778, AUC-ROC 0.9518). The MLP model follows closely, particularly in accuracy (0.9347) and precision (0.8945). The DT model also performs competitively (ACC 0.9172, F1S 0.8533), whereas XGB shows moderate performance (ACC 0.7947, F1S 0.5832) and RF the lowest metrics (ACC 0.6738, F1S 0.4558). These results identify the ensemble and MLP models as the prominent performers on pharmacophore features.

Table 4. Classification performance metrics for pharmacophore features

Model ACC PR F1S AUC-ROC TP (%) TN (%) FP (%) FN (%)
DT 0.9172 0.8148 0.8533 0.9117 23.71% 66.65% 5.39% 2.76%
MLP 0.9347 0.8945 0.8761 0.9509 22.69% 69.14% 2.67% 3.73%
RF 0.6738 0.4133 0.4558 0.6744 13.34% 52.48% 18.94% 12.93%
XGB 0.7947 0.6420 0.5832 0.8026 14.15% 64.12% 7.89% 12.34%
Ensemble 0.9373 0.9214 0.8778 0.9518 22.17% 70.00% 1.89% 4.28%

Fig. 5. AUC-ROC plot for pharmacophore features

Table 4 and Fig. 5 show that, with pharmacophore features, the ensemble model delivers the strongest overall performance, achieving the highest AUC-ROC (0.9518), accuracy (0.9373), and precision (0.9214) while maintaining the lowest FP rate (1.89%), making it the most reliable for distinguishing active from inactive compounds. In contrast, RF performs poorly, with the lowest AUC-ROC (0.6744), a high FP rate (18.94%), and low precision (0.4133), indicating weak class separation and excessive false alarms. MLP shows competitive metrics (AUC-ROC = 0.9509, accuracy = 0.9347) but has a higher FP rate (2.67%) and a lower FN rate (3.73% vs. 4.28%) than the ensemble, suggesting a marginally more aggressive classification tendency. XGB falls in the mid-range, with moderate AUC-ROC (0.8026) but notable false negatives (FN = 12.34%), highlighting a trade-off between sensitivity and specificity. The ROC curves reinforce these findings, with the ensemble and MLP closest to the ideal top-left region and RF closest to the diagonal. For pharmacophore-based drug discovery, the ensemble model is the best choice, balancing high true positives with minimal false positives, whereas RF should be avoided because of its high misclassification rate.

The ensemble model achieves the highest TN% (70.00%) and a good TP% (22.17%), alongside the lowest FP% (1.89%) and a relatively low FN% (4.28%). The very low FP% for the ensemble with pharmacophore features is a strong indicator of its reliability in predicting positives, minimizing the pursuit of inactive compounds. The MLP model also performs well, with high TN% (69.14%) and a high TP% (22.69%); its FP% (2.67%) is slightly higher than the ensemble’s, while its FN% (3.73%) is slightly lower. Similar to the other fingerprint types, RF shows a very low TP% (13.34%), a high FN% (12.93%), and the highest FP% (18.94%), indicating poor performance. XGB shows moderate performance, with a decent TN% (64.12%) but a relatively low TP% (14.15%) and a high FN% (12.34%).

Furthermore, we also computed the top ten features of pharmacophores using XGB, and they are as follows:

  • Bit 64: Hydrophobe PosIonizable Inline graphic

  • Bit 74: LumpedHydrophobe PosIonizable Inline graphic

  • Bit 215: Acceptor Acceptor NegIonizable Inline graphic

  • Bit 664: Acceptor LumpedHydrophobe PosIonizable Inline graphic

  • Bit 75: NegIonizable NegIonizable Inline graphic

  • Bit 67: LumpedHydrophobe LumpedHydrophobe Inline graphic

  • Bit 675: Acceptor LumpedHydrophobe PosIonizable Inline graphic

  • Bit 201: Acceptor Acceptor LumpedHydrophobe Inline graphic

  • Bit 16: Acceptor NegIonizable Inline graphic

  • Bit 1658: Hydrophobe Hydrophobe LumpedHydrophobe Inline graphic

The top pharmacophoric features identified in the study highlight critical molecular interactions required for anti-leishmanial activity. These features combine hydrophobic regions, charged groups (PosIonizable/NegIonizable), and hydrogen bond acceptors in specific spatial arrangements. For example, features like Hydrophobe + PosIonizable (Bit 64) and Acceptor + LumpedHydrophobe + PosIonizable (Bit 664) reflect the need for compounds to bind mixed-polarity pockets in Leishmania enzymes, balancing hydrophobic interactions with electrostatic complementarity. Such patterns align with known targets like leishmanolysin (gp63), where hydrophobic residues (e.g., ILE255) and charged residues (e.g., HIS264) dominate active sites. Distance bins (e.g., 0 – 2 Å, 2 – 5 Å) further refine how these groups spatially align to optimize binding.

Multi-feature pharmacophores, such as Acceptor + Acceptor + NegIonizable (Bit 215), suggest interactions with metalloenzyme active sites, often involving zinc coordination. These patterns are critical for inhibiting zinc-dependent proteases, a common drug target in Leishmania. Conversely, less intuitive features like NegIonizable NegIonizable (Bit 75) may indicate niche interactions with positively charged residues or assay-specific artifacts. Hydrophobic clusters (e.g., LumpedHydrophobe + LumpedHydrophobe, Bit 67) likely enhance bioavailability by improving membrane permeability or stabilizing binding to nonpolar enzyme pockets.

Overall, the model prioritizes features that balance hydrophobicity, charge complementarity, and hydrogen bonding, which are key traits for targeting Leishmania enzymes. This aligns with known inhibitors (e.g., flavonoids, chalcones) that use aromatic systems for π-π stacking, hydroxyl groups for H-bonding, and hydrophobic side chains for binding. By encoding these 3D interaction patterns, the pharmacophore model efficiently guides virtual screening and rational drug design against leishmaniasis.

Discussion

Table 5 summarizes the performance metrics of the ensemble model for each feature extraction technique (i.e., Avalon fingerprints, MACCS keys, and pharmacophore fingerprints).

Table 5. Classification performance using the ensemble model

Feature Extraction ACC PR F1S AUC TP (%) TN (%) FP (%) FN (%)
Avalon 0.9356 0.9008 0.8773 0.9658 22.67% 69.40% 2.50% 3.83%
MACCS 0.8829 0.7673 0.7886 0.9270 21.97% 66.81% 6.66% 5.12%
Pharmacophore 0.9373 0.9214 0.8778 0.9518 22.17% 70.00% 1.89% 4.28%

The study compared three fingerprint representations (Avalon, MACCS, and pharmacophore) for predicting anti-leishmanial activity with the ensemble model. As Table 5 shows, Avalon features achieve the highest AUC-ROC (0.9658), while pharmacophore features give marginally higher ACC (0.9373 vs. 0.9356), PR (0.9214 vs. 0.9008) and F1S (0.8778 vs. 0.8773). MACCS features yield lower scores on all metrics but still show reasonable performance. These findings identify the Avalon and pharmacophore fingerprints as the most promising representations for anti-leishmanial activity prediction.

From Tables 2, 3 and 4, it can be observed that the MLP model performed better than the simpler DT model with the 3D pharmacophore and Avalon fingerprints, although DT led with MACCS keys and XGB trailed DT with all three fingerprints. The Avalon features (Table 2) show that the MLP model outperformed the ensemble model in accuracy (94.34% vs. 93.56%) and AUC-ROC (0.9661 vs. 0.9658), albeit by small margins of 0.78 percentage points and 0.0003, respectively. However, the ensemble model achieved the lowest false positive rate (FP%) at 2.58%, compared to MLP’s 3.86%, a difference of 1.28 percentage points. The widest gap in AUC-ROC was between MLP (0.9661) and RF (0.7893), with a notable difference of 0.1768.

For the MACCS features (Table 3), the DT model led in accuracy at 90.84%, surpassing the ensemble and MLP models by over 2.5 percentage points. However, the ensemble model had the highest AUC-ROC (0.9270), outperforming MLP and DT by approximately 0.012. False positives varied widely, with DT at 5.71% and RF at 18.43%, a stark difference of 12.72 percentage points. The largest AUC-ROC disparity was between the ensemble (0.9270) and RF (0.6993), a gap of 0.2277.

For the pharmacophore features (Table 4), the ensemble model slightly outperformed MLP in accuracy (93.73% vs. 93.47%) and AUC-ROC (0.9518 vs. 0.9509), with differences of 0.26 percentage points and 0.0009, respectively. The ensemble model also had the lowest FP% at 1.89%, compared to MLP’s 2.67%, a 0.78-percentage-point advantage. The most significant AUC-ROC gap was between the ensemble (0.9518) and RF (0.6744), with a difference of 0.2774.

This suggests that nonlinear models are better suited to learning the complex, multidimensional relationships between structural features and bioactivity, especially when subtle fragment combinations or 3D spatial features determine activity. In the case of MACCS keys, performance was generally lower across all models. This may reflect the limited structural coverage of the 166 MACCS bits, which could miss out on more nuanced substructures important in Leishmania inhibition. Yet, MACCS fingerprints still contributed valuable fragment-level interpretability, especially regarding functional groups known to interact with biological targets.

The ensemble model demonstrated varying performance across different feature sets, with pharmacophore features yielding the highest accuracy (93.73%), slightly outperforming Avalon (93.56%) by 0.17 percentage points and significantly surpassing MACCS (88.29%) by 5.44 percentage points. In terms of AUC-ROC, Avalon features achieved the best performance (0.9658), exceeding pharmacophore (0.9518) by 0.014 and MACCS (0.9270) by 0.0388. False positives were lowest with pharmacophore (1.89%), followed by Avalon (2.50%) and MACCS (6.66%), resulting in a 4.77-percentage-point gap between the best and worst FP%. Meanwhile, true positives (TP%) were highest with Avalon (22.67%), marginally ahead of pharmacophore (22.17%) and MACCS (21.97%), with a narrow difference of just 0.70 percentage points between the top and bottom performers. This highlights the trade-offs between feature sets, with Avalon excelling in AUC-ROC and TP%, pharmacophore leading in accuracy and FP% reduction, and MACCS lagging behind in most metrics.

The ensemble model benefits by integrating predictions across all these fingerprint spaces and learning patterns that might not be evident in any single representation. For example, some models may better capture hydrophobic interactions, while others highlight hydrogen bonding or ring systems, all of which are important in drug-likeness and antiparasitic activity. Therefore, the ensemble’s superior performance is not only expected from a technical perspective but also meaningful from a biological one: it reflects a more holistic capture of chemically diverse features contributing to Leishmania inhibition, supporting the potential utility of ensemble learning in virtual screening pipelines.

The ensemble model performs strongly because it combines the strengths of multiple individual models, reducing overfitting and balancing the bias-variance trade-off. Aggregating diverse predictions from DT, MLP, RF, and XGB can correct individual errors and capture a broader range of patterns in the data. This leads to improved stability, robustness, and generalization, as reflected in its performance metrics, which match or exceed those of the best standalone model in most cases (Tables 2, 3 and 4).

Avalon achieves the highest AUC-ROC in Leishmania drug classification, ahead of MACCS and pharmacophore, which may be because it offers a more detailed and extensive representation of molecular fingerprints, capturing structural and chemical features relevant to biological activity more effectively. Possible reasons for this performance include:

  • Detailed Molecular Representation: Avalon fingerprints provide a more comprehensive and nuanced depiction of molecular structures compared to MACCS and pharmacophores, allowing for better feature extraction by machine learning models.

  • High Information Density: The information-rich nature of Avalon fingerprints enables them to encode complex molecular features and interactions, which enhances the machine learning algorithms’ ability to distinguish between active and inactive compounds.

  • Machine Learning Optimization: Designed with machine learning applications in mind, Avalon fingerprints facilitate more effective learning from data, helping models form accurate decision boundaries and improving overall performance.

Comparisons with previous techniques

Table 6 provides a concise comparison of the performance metrics of the proposed work and those of Harigua-Souiai et al. [7]. The proposed approach yields substantial enhancements across all key metrics, including accuracy, F1-score, and recall. Specifically, our model achieves an accuracy of 0.94, a significant improvement over the base paper’s 0.72. Similarly, the F1-score and recall reach 0.88 and 0.84, respectively, compared to 0.39 and 0.79 reported in the base paper. These enhancements underscore the effectiveness of the proposed methodology in this domain, promising improved performance and reliability in anti-leishmanial activity prediction.

Table 6. Metrics comparison

Model Accuracy F1-Score Recall AUC-ROC
Emna Harigua-Souiai et al. [7] 0.72 0.39 0.79 0.81
Proposed Work 0.94 0.88 0.84 0.96

The proposed ensemble model (using Avalon features, as indicated by its matching 0.96 AUC-ROC) shows marked improvements across all evaluated metrics compared to the previous work of Harigua-Souiai et al. [7]. It achieves superior accuracy (0.94 vs. 0.72, +22 percentage points), a substantially higher F1-score (0.88 vs. 0.39, +0.49), and marginally better recall (0.84 vs. 0.79, +5 percentage points). Most notably, the proposed model demonstrates a 0.15 advantage in AUC-ROC (0.96 vs. 0.81), highlighting its enhanced discriminative power and overall performance. These results collectively underscore the substantial advancements made by the current approach.

The study provides a framework for predicting the efficacy of compounds against Leishmania promastigotes using machine learning algorithms (DT, RF, XGB, MLP, and ensemble learning) alongside molecular fingerprinting techniques (Avalon, MACCS Key, and Pharmacophore). Further investigation will be carried out to compute the most important features or groups in each fingerprint model using explainable AI (xAI) with SHAP, and to examine the molecular behavior that influences anti-leishmanial activity. Investigating advanced deep learning techniques, such as graph representation learning, could be a subsequent step. Graph representation learning may capture more intricate relationships and features within molecular structures compared to the molecular fingerprints used in this study [33]. This approach might offer a more detailed understanding of the structural and chemical characteristics that affect the activity of molecules against Leishmania.

Conclusion

In this study, we assessed the efficacy of various machine learning models trained on features extracted through Avalon fingerprints, MACCS keys, and pharmacophore fingerprints for Leishmanial Activity Prediction. Our findings indicate that ensemble modeling performs at or near the top for every fingerprint type, with the ensemble model leveraging Avalon fingerprints exhibiting the highest AUC-ROC (0.9658) and ACC, PR and F1S within a small margin of the best values obtained. Avalon fingerprints thus proved effective in capturing molecular structure and enabling accurate predictions. Hence, we recommend employing an ensemble model utilizing Avalon fingerprints for Leishmanial Activity Prediction tasks, underscoring the significance of feature extraction techniques and ensemble methods in enhancing predictive capabilities. Future research could explore additional feature extraction techniques and ensemble strategies to further refine predictive performance in similar contexts.

Author contribution

All authors contributed equally. Pallavi conducted in-depth research. Maulik coding. Saif Nalband concept and idea. Surabhi Sonam provided biological aspects of study. Mansi, Femi and Amalin Prince: Proof read and editing.

Funding

Not applicable.

Data availability

Data is provided within the manuscript or supplementary information files: https://pubchem.ncbi.nlm.nih.gov/bioassay/1063.

Code availability

None.

Materials availability

None.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Clinical

Non-clinical study.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Alvar J, Vélez ID, Bern C, Herrero M, Desjeux P, Cano J, Jannin J. Margriet den Boer, and WHO leishmaniasis control team. Leishmaniasis worldwide and global estimates of its incidence. PLoS One. 2012;7(5):e35671. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Kato H. Epidemiology of leishmaniasis: Risk factors for its pathology and infection. Parasitol Int. 2025;105(102999). [DOI] [PubMed]
  • 3.Isabel Olias-Molero A, de la Fuente C, Cuquerella M, Torrado JJ, Alunda JM. Antileishmanial drug discovery and development: Time to reset the model? Microorganisms. 2021;9(12):2500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Pengwei H, Huang Y-A, Mei J, Leung H, Chen Z-H, Kuang Z-M, You Z-H, Lun H. Learning from low-rank multimodal representations for predicting disease-drug associations. Bmc Med Inform Decis. 2021;21:1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wang Y-B, You Z-H, Yang S, Hai-Cheng Y, Chen Z-H, Zheng K. A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network. Bmc Med Inform Decis. 2020;20:1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Corman HN, McNamara CW, Bakowski MA. Drug discovery for cutaneous leishmaniasis: A review of developments in the past 15 years. Microorganisms. 2023;11(12):2845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Harigua-Souiai E, Oualha R. Oussama Souiai, Ines Abdeljaoued-Tej, and Ikram Guizani. Applied machine learning toward drug discovery enhancement: Leishmaniases as a case study. Bioinform Biol Insights. 2022;16(11779322221090349). [DOI] [PMC free article] [PubMed]
  • 8.Santiago C, Ortega-Tenezaca B, Barbolla I, Fundora-Ortiz B, Arrasate S, Auxiliadora Dea-Ayuela M, González-Díaz H, Sotomayor N, Lete E. Prediction of antileishmanial compounds: General model, preparation, and evaluation of 2-acylpyrrole derivatives. J Appl Psychol Chemical Information And Modeling. 2022;62(16):3928–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Kleandrova VV, Speck-Planche A. PTML modeling for pancreatic cancer research: In silico design of simultaneous multi-protein and multi-cell inhibitors. Biomedicines. 2022;10(2):491.
  • 10.Speck-Planche A. Combining ensemble learning with a fragment-based topological approach to generate new molecular diversity in drug discovery: In silico design of Hsp90 inhibitors. ACS Omega. 2018;3(11):14704–16.
  • 11.Kleandrova VV, Natália Ds Cordeiro M, Speck-Planche A. Perturbation theory machine learning model for phenotypic early antineoplastic drug discovery: Design of virtual anti-lung-cancer agents. Appl Sci. 2024;14(20):9344.
  • 12.Speck-Planche A. Multicellular target QSAR model for simultaneous prediction and design of anti-pancreatic cancer agents. ACS Omega. 2019;4(2):3122–32.
  • 13.Kleandrova VV, Natália Ds Cordeiro M, Speck-Planche A. In silico approach for antibacterial discovery: PTML modeling of virtual multi-strain inhibitors against Staphylococcus aureus. Pharmaceuticals. 2025;18(2):196.
  • 14.Diéguez-Santana K, Casañola-Martin GM, Torres R, Rasulev B, Green JR, González-Díaz H. Machine learning study of metabolic networks vs ChEMBL data of antibacterial compounds. Mol Pharm. 2022;19(7):2151–63.
  • 15.Kleandrova VV, Natália Ds Cordeiro M, Speck-Planche A. Perturbation-theory machine learning for mood disorders: Virtual design of dual inhibitors of NET and SERT proteins. BMC Chem. 2025;19(1):2.
  • 16.Olier I, Sadawi N, Bickerton GR, Vanschoren J, Grosan C, Soldatova L, King RD. Meta-QSAR: A large-scale application of meta-learning to drug design and discovery. Mach Learn. 2018;107:285–311.
  • 17.Guiping T, Qin Z, Huo D, Zhang S, Yan A. Fingerprint-based computational models of 5-lipo-oxygenase activating protein inhibitors: Activity prediction and structure clustering. Chem Biol Drug Des. 2020;96(3):931–47.
  • 18.Tabares-Soto R, Orozco-Arias S, Romero-Cano V, Segovia Bucheli V, Luis Rodríguez-Sotelo J, Felipe Jiménez-Varón C. A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data. PeerJ Comput Sci. 2020;6:e270.
  • 19.Wang Z, Chen J, Hong H. Developing QSAR models with defined applicability domains on PPARγ binding affinity using large data sets and machine learning algorithms. Environ Sci Technol. 2021;55(10):6857–66.
  • 20.Liu L, Zhang L, Feng H, Shimeng L, Liu M, Zhao J, Liu H. Prediction of the blood–brain barrier (BBB) permeability of chemicals based on machine-learning and ensemble methods. Chem Res Toxicol. 2021;34(6):1456–67.
  • 21.Chong J, Tjurin P, Niemelä M, Jämsä T, Farrahi V. Machine-learning models for activity class prediction: A comparative study of feature selection and classification algorithms. Gait Posture. 2021;89:45–53.
  • 22.Hua Y, Shi Y, Cui X, Xiao L. In silico prediction of chemical-induced hematotoxicity with machine learning and deep learning methods. Mol Divers. 2021;25(3):1585–96.
  • 23.Ding W, Nan Y, Juanshu W, Han C, Xin X, Siyuan L, Liu H, Zhang L. Combining multi-dimensional molecular fingerprints to predict the hERG cardiotoxicity of compounds. Comput Biol Med. 2022;144:105390.
  • 24.Sun H, Yang Q, Xinxin Y, Huang M, Ding M, Weihua L, Tang Y, Liu G. Prediction of IDO1 inhibitors by a fingerprint-based stacking ensemble model named IDO1Stack. ChemMedChem. 2023;18(17):e202300151.
  • 25.Manaithiya A, Bhowmik R, Elhenawy AA, Sharma S, Dinesh S, Parkkila S, Aspatwar A. Molecular insights into Zea mays active phytochemicals for diabetes and inflammation treatment: A web app-based machine learning and network pharmacology approach. bioRxiv. 2024;2024–25.
  • 26.Shi Y, Hua Y, Wang B, Zhang R, Xiao L. In silico prediction and insights into the structural basis of drug-induced nephrotoxicity. Front Pharmacol. 2022;12:793332.
  • 27.Cuvitoglu A, Zhou JX, Huang S, Isik Z. Predicting drug synergy for precision medicine using network biology and machine learning. J Bioinform Comput Biol. 2019;17(2):1950012.
  • 28.Krivozubov M, Goebels F, Spirin S. Estimation of relative effectiveness of phylogenetic programs by machine learning. J Bioinform Comput Biol. 2014;12(2):1441004.
  • 29.Pal J, Saha S, Maji B, Kumar Bhattacharya D. PTGAC model: A machine learning approach for constructing phylogenetic tree to compare protein sequences. J Bioinform Comput Biol. 2023;21(1):2250028.
  • 30.Yuan F, Liu G, Yang X, Wang S, Wang X. Prediction of oxidoreductase subfamily classes based on RFE-SND-CC-PSSM and machine learning methods. J Bioinform Comput Biol. 2019;17(4):1950029.
  • 31.Pengyu L, Zhang H, Zhao X, Jia C, Fuyi L, Song J. Pippin: A random forest-based method for identifying presynaptic and postsynaptic neurotoxins. J Bioinform Comput Biol. 2020;18(2):2050008.
  • 32.Pittsburgh Molecular Library Screening Center. National Center for Biotechnology Information. PubChem Bioassay Record for AID 1063. 2025. https://pubchem.ncbi.nlm.nih.gov/bioassay/1063
  • 33.Yang Y, Guodong L, Dongxu L, Zhang J, Pengwei H, Lun H. Integrating fuzzy clustering and graph convolution network to accurately identify clusters from attributed graph. IEEE Transactions on Network Science and Engineering. 2024.

Associated Data

Data Availability Statement

Data is provided within the manuscript or supplementary information files: https://pubchem.ncbi.nlm.nih.gov/bioassay/1063.


Articles from BMC Medical Informatics and Decision Making are provided here courtesy of BMC