ACS Chem Neurosci. 2024 May 9;15(11):2144–2159. doi: 10.1021/acschemneuro.3c00840

Identifying Substructures That Facilitate Compounds to Penetrate the Blood–Brain Barrier via Passive Transport Using Machine Learning Explainer Models

Lucca Caiaffa Santos Rosa 1, Caio Oliveira Argolo 1, Cayque Monteiro Castro Nascimento 1, Andre Silva Pimentel 1,*
PMCID: PMC11157485  PMID: 38723285

Abstract


The local interpretable model-agnostic explanation (LIME) method was used to interpret two machine learning models of compounds penetrating the blood–brain barrier. The classification models, Random Forest, Extra Trees, and Deep Residual Network, were trained and validated using the blood–brain barrier penetration dataset, which shows the penetrability of compounds in the blood–brain barrier. LIME was able to create explanations for such penetrability, highlighting the most important substructures of molecules that affect drug penetration in the barrier. The simple and intuitive outputs demonstrate the applicability of this explainable model to interpreting the permeability of compounds across the blood–brain barrier in terms of molecular features. LIME explanations were filtered with a weight equal to or greater than 0.1 to obtain only the most relevant explanations. The results showed several structures that are important for blood–brain barrier penetration. In general, it was found that some compounds with nitrogenous substructures are more likely to permeate the blood–brain barrier. The application of these structural explanations may help the pharmaceutical industry and research groups working on drug synthesis to synthesize active molecules more rationally.

Keywords: explainable artificial intelligence, structural alerts, LIME, XAI, XML

Introduction

Currently, one of the main challenges in medicine involving central nervous system (CNS) therapy is addressing the shortage of potential drugs that can cross the blood–brain barrier (BBB).1–7 The BBB is a highly selective structure that protects the CNS from neurotoxic substances, bacteria, parasites, and other solutes. It is related to diseases such as meningitis,8 multiple sclerosis,9 and Alzheimer’s.10 In addition to the requirement of good activity, adequate metabolic properties, and low toxicity, the predictability of the blood–brain barrier permeability of an active compound is an important aspect of drug development using in silico techniques.11,12 An example of this is the search for candidates that have molecular and pharmacokinetic properties similar to lomustine, an effective anticancer agent in the treatment of brain tumors in children13 and in other studies.14–22

The development of large datasets in drug discovery has further enhanced computational techniques and increased their use in assisting high-throughput screening with molecular filters, which are fundamental in the search for new candidates. These filters help reduce in vivo and in vitro assays by focusing laboratory testing on better-suited molecules, saving time and money.23–26 Recently, machine learning (ML) methods have generated quantitative relationships between drug penetration and drug properties such as molecular weight, polar surface area, and partition coefficients.27 Furthermore, a classification model was developed to improve the predictability of BBB penetration in which the most common penetrating compounds and their properties were analyzed using descriptors.28 According to these authors, there are privileged species whose substructures penetrate the BBB more easily.28 Additionally, other studies indicate that the acidity and basicity of the compounds must be taken into account for BBB penetration, as well as aromaticity and the number of hydrogen bonding donors and acceptors.29–31

ML models are mostly considered “black boxes” because we do not understand which factors led to a particular prediction.32–36 The interpretability of these ML models is a key factor in increasing their reliability.32,35–39 Explainable artificial intelligence (XAI) is a method through which human beings can interpret the predictions made by an ML model, contrasting with the “black box” concept. Fortunately, explanation methods can be used so that the model prediction can be contrasted with human knowledge. Local interpretable model-agnostic explanations (LIME) is a popular example of an explanation method used to interpret ML models. As the name implies, LIME explains a single prediction by approximating the local (mostly linear) behavior of a complex classifier with a highly nonlinear decision boundary.32,34,40

The contribution of the LIME method lies in its ability to provide interpretable explanations for complex machine learning models. It helps to bridge the gap between the complex inner workings of these models and human understanding by offering locally faithful explanations. This means that for a given prediction, LIME can generate an explanation that is understandable and aligned with human intuition, even if the underlying model is highly complex or nonlinear. This added value is crucial in fields where model transparency and interpretability are vital, such as toxicology, healthcare, finance, and law, where decisions impact the lives of individuals and require justification and trust. The method is effective when working with a small set of molecular descriptors and molecular representations such as SMILES.41 However, it tends to fail when working with thousands of descriptors or molecular graphs,42 and it can convey little insight into human understanding if used inappropriately.41 LIME has recently been used with great ability and efficiency to find and interpret substructures from toxicology datasets.43

In general, descriptors based on physicochemical properties, such as polar surface area and the number of hydrogen bonding acceptors and donors, are used in ML models to analyze the BBB penetration of drugs.44 Additionally, methods with graph convolutional networks are used to improve the understanding of the penetration mechanism, which occurs mainly by passive diffusion.45 The BBB penetration ability of organic structures has also been investigated with good accuracy using artificial neural networks (ANN)46 and graph neural networks.47 It is also known that random forest models have better predictive capacity to describe BBB penetration.48 A recent advance of XAI in minimizing black-box effects is the use of locally interpretable models to interpret the binding of chemical structures to two protein targets of pharmacological relevance.49 In addition, XAI methods based on counterfactuals have already been applied to interpret BBB penetration.41

XAI is at the frontier of knowledge in the analysis of the structure–activity relationship of molecules, and interpretable models are being increasingly developed.50–52 In this work, the most common substructures present in compounds that penetrate the BBB were identified using LIME. It is important to mention that this study does not underestimate the complexity of the brain, especially in terms of its pharmacokinetic properties. It is also worth mentioning that the most common substructures found here should not be considered as rules.33 However, substructures that enable compounds to penetrate the BBB should ease BBB penetration and allow faster CNS action, an important factor for the discovery of new candidates for the pharmaceutical industry.53–55 Considering that few works in this area focus on the structure of chemical compounds, this work seeks to verify the consistency of the models by contrasting the LIME predictions, to check whether they are suitable techniques for generating the substructures important for BBB penetration, and to compare them with results found in the literature.

The research question addressed by the LIME method is how to provide interpretable explanations for complex machine learning models, especially in situations in which the models themselves are not inherently interpretable. LIME builds on established prior art in the field of interpretable machine learning and aims to answer this question by proposing a framework that generates explanations that are both locally faithful to the predictions of the model and understandable to humans. By doing so, LIME contributes to making machine learning models more transparent, trustworthy, and accountable in various domains. The purpose of this study is to identify substructures that facilitate the penetration of compounds through the BBB using ML models such as the deep residual network (DRN), random forest (RF), and extra trees (ET) classifier models. The quality of these models is assessed using metrics such as precision, recall, F1, and accuracy scores. The ML models are interpreted using the LIME explanation method, which identifies the most significant substructures that explain BBB penetration. The interpretation of results from the LIME method can indeed be challenging, particularly in cases such as blood–brain barrier penetration and structural alert interpretation for toxicology, where domain-specific knowledge and heuristics are crucial. The reliance of LIME on local model approximations and perturbations to explain complex model predictions may lead to ad hoc or even spurious interpretations if not carefully validated. Statistically, it is challenging to determine the causal relationship between identified substructures and blood–brain barrier penetration solely on the basis of LIME results. LIME provides a local explanation of model behavior for a specific instance but does not directly imply causality. To assess the validity of the LIME results, it is essential to complement them with domain knowledge and experimental validation. Statistical tests, such as hypothesis testing or correlation analysis, could be used to evaluate the significance of substructures identified by LIME, but these tests alone may not establish causality without additional evidence and validation. The novelty of our work lies in providing an explained, comprehensive list of substructures important for BBB penetration that goes beyond the existing knowledge obtained from explainable methods so far.

Methodology

The Python packages DeepChem (version 2.7.1),56 RDKit (version 2023.9.1),56,57 and LIME (version 0.2.0.1)34,40,58 were installed in a Google Colaboratory platform. Other Python auxiliary packages such as mols2grid (version 2.0.0),59 Matplotlib (version 3.7.1),60 Scikit-learn (version 1.2.2),61 pandas (version 1.5.3),62 Numpy (version 1.23.5),63 and IPython64 were also installed.
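A minimal environment setup consistent with the versions above could look like the sketch below (the PyPI package names, in particular lime and rdkit, are assumptions about how the Colab runtime was provisioned, not the authors' actual commands):

```python
# Hypothetical Colab cell matching the versions listed above; exact pins may vary.
%pip install deepchem==2.7.1 rdkit==2023.9.1 lime==0.2.0.1 mols2grid==2.0.0 \
    matplotlib==3.7.1 scikit-learn==1.2.2 pandas==1.5.3 numpy==1.23.5
```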

The BBBP dataset48 was uploaded, which is part of the benchmark MoleculeNet dataset65 built for training and validating machine learning models of molecular properties. This BBBP dataset incorporates binary labels (0 for nonpenetrating and 1 for penetrating molecules) on permeability properties for more than 2000 drugs, hormones, and neurotransmitters. We used DeepChem56 to implement various known data splitters, featurizers, transformers, machine learning classifier models, and the infrastructure needed to create a classification model of compounds with blood–brain barrier penetration. The BBBP dataset48 was featurized using 1024 extended-connectivity fingerprints (ECFPs),66 which are a class of topological fingerprints for molecular characterization. Then, this featurized dataset was randomly split (80/20 split) to train and validate the models using the K-fold cross-validation method (K = 5) for the DRN, RF, and ET classifier models. It is important to mention that the model is randomly generated for each run; therefore, the results may vary between runs. Because it was decided not to save the model, it was possible to generate different substructures from each run and thus to check the consistency of the results.
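As a rough illustration of this step, the following sketch (not the authors' actual code) shows how the BBBP set could be featurized with 1024-bit circular fingerprints and split 80/20 with DeepChem; the loader arguments and variable names are assumptions.

```python
import deepchem as dc

# Featurize BBBP with 1024-bit extended-connectivity (circular) fingerprints.
featurizer = dc.feat.CircularFingerprint(size=1024)
tasks, datasets, transformers = dc.molnet.load_bbbp(featurizer=featurizer, splitter=None)
dataset = datasets[0]

# Random 80/20 split into training and validation sets.
splitter = dc.splits.RandomSplitter()
train_dataset, valid_dataset = splitter.train_test_split(dataset, frac_train=0.8)

# 5-fold subsets of the training data for cross-validated model selection.
folds = splitter.k_fold_split(train_dataset, k=5)
```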

First, we used the DRN model (known as the MultitaskClassifier model from the DeepChem library).56 The MultitaskClassifier model is a fully connected deep residual network for multitask classification composed of preactivation residual blocks. Hyperparameterization for this first model was performed to find the best parameters, including layer size [256, 256], [256, 256, 256], and [256, 256, 256, 256], dropout [0.0, 0.1], learning rate [0.005, 0.0035, 0.0025, 0.0010], number of epochs [5, 10, 20, 30, 40, 50], momentum [0.0, 0.9], decay [0.0, 0.1], and batch size [32, 64, 128]. This was done using the GridHyperparamOpt optimizer,67 the hyperparameter search method,61 and the ROC-AUC metric.68 Later, we also used the RF and ET classifier models from the Scikit-learn library.61 The hyperparameterization for these latter models was done by optimizing the number of estimators (50 to 500) and the maximum depth (1 to 20) of the trees using the randomized search CV method61 and the ROC-AUC metric.68 Then, the best models were used in the next steps with the training and validation datasets using the area under the ROC curve (ROC-AUC metric). The results were analyzed with the confusion matrix, accuracy, precision, F1, and recall scores.
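A minimal sketch of how such a randomized search could look for the tree models, assuming the featurized train_dataset from the previous sketch; the search ranges and the n_iter value are illustrative, not the authors' exact settings.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Search space: 50-500 estimators and tree depth between 1 and 20, scored by ROC-AUC.
param_distributions = {
    "n_estimators": range(50, 501),
    "max_depth": range(1, 21),
}

best = {}
for name, estimator in [("RF", RandomForestClassifier()), ("ET", ExtraTreesClassifier())]:
    search = RandomizedSearchCV(
        estimator, param_distributions, n_iter=20, scoring="roc_auc", cv=5
    )
    search.fit(train_dataset.X, train_dataset.y.ravel())
    best[name] = search.best_estimator_

best_rf, best_et = best["RF"], best["ET"]
```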

After creating the model using the training dataset and analyzing it with the validation dataset using the statistical analysis described below, the LIME package34,40,58 was used to explain the model. Once the model is generated, LIME uses the LimeTabularExplainer function in classification mode to explain the training data. The function takes in categorical names and feature names, which are represented as dictionaries and lists of indexes, respectively. This creates an explainer object for LIME, which uses 1024 circular fingerprints to represent the training dataset and feature names. The model predicted and explained the validation dataset using 100 features. The explainer object then returns a dictionary mapping the fingerprint index to the list of SMILES strings that activated that fingerprint. Since our features do not have natural names, they are simply numbered in the feature_names object. The class names in the BBBP dataset represent blood–brain barrier penetration assays and are binary labels indicating penetration or nonpenetration: “0” represents “no penetration” and “1” represents “penetration.”
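A sketch of how such an explainer could be constructed, assuming the train_dataset variable from the earlier featurization sketch; the exact argument values are assumptions rather than the authors' code.

```python
from lime.lime_tabular import LimeTabularExplainer

# Feature "names" are just the 1024 fingerprint bit indices; class names follow the BBBP labels.
feature_names = [str(i) for i in range(1024)]
class_names = ["no penetration", "penetration"]

explainer = LimeTabularExplainer(
    train_dataset.X,                          # 1024-bit circular fingerprints (training set)
    mode="classification",
    feature_names=feature_names,
    class_names=class_names,
    categorical_features=list(range(1024)),   # every bit is a binary (categorical) feature
)
```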

The evaluation is performed for each molecule to explain why the model predicts whether a molecule will penetrate the blood–brain barrier or not. The model takes a 2D numpy array (samples, features) as input and returns predictions for each sample. The output is stored in a model function object. By applying a threshold of 0.8, we obtain a list of penetrating molecules in the validation dataset that are correctly predicted to penetrate. The task is then to loop through this list of molecules in the validation dataset, evaluate the model, and store the results in an object. This list, along with the object, is input into the explain_instance method to determine why a molecule was predicted to penetrate. The explainer identifies the features to which the prediction is most sensitive by analyzing the elements in the fingerprint that correspond to one or more fragments. The output is stored in the explainer object, which contains information about which fragments contributed to the prediction.
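The loop could look roughly like the sketch below, assuming the explainer, best_rf, and valid_dataset objects from the previous sketches; the wrapper model_fn and the 0.8 threshold follow the description above.

```python
import numpy as np

def model_fn(samples):
    # Returns class probabilities for a 2D numpy array of fingerprint samples.
    # (For the DeepChem DRN this would instead wrap model.predict on a NumpyDataset.)
    return best_rf.predict_proba(samples)

# Keep validation molecules that truly penetrate and are predicted to penetrate with p > 0.8.
probs = model_fn(valid_dataset.X)[:, 1]
penetrating_idx = np.where((probs > 0.8) & (valid_dataset.y.ravel() == 1))[0]

explanations = []
for i in penetrating_idx:
    exp = explainer.explain_instance(valid_dataset.X[i], model_fn, num_features=100)
    explanations.append(exp)
```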

The LIME explanation module provides visualization and mapping functions for the explainers. The as_map() method generates a map of explanations with labels, represented as a list of tuples containing the feature identification number and its weight. This method is used to obtain the information in a format more suitable for processing. The keys in this map are the labels, and the value for each key is a list of tuples containing the fingerprint index and its respective weight. This is then converted into a dictionary mapping indices to weights. The my_fragments object stores the fragments present in each molecule of interest, while the fragment_weights object (fragment_weights = dict(exp.as_map()[1])) contains information about which fragments contributed to the prediction.

The weights assigned by LIME represent the contribution of a fragment in a molecule to the prediction, ranging between −1 and +1. However, the sum of these weights for each molecule does not necessarily equal 1. These weights still indicate the contribution of a fragment to the prediction. Each molecule has multiple fragments, and each fragment can contribute positively or negatively to the prediction of penetration. It is possible for a molecule to have some fragments with positive contributions and others with negative or no contribution. Therefore, the model classifies a molecule as “penetrating” or “nonpenetrating” by separately summing all of the positive and negative contributions. If the sum of the positive weights is greater than the sum of the negative weights, the molecule is classified as “penetrating.” Conversely, if the sum of the negative weights is greater than the sum of the positive weights, the molecule is classified as “nonpenetrating.”
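In code, this weight bookkeeping reduces to a few lines; the sketch below only illustrates the summation rule described above, using the exp object from the earlier explanation loop.

```python
# Fingerprint-index -> weight map for the "penetration" label of one explained molecule.
fragment_weights = dict(exp.as_map()[1])

# Sum positive and negative LIME weights separately and compare their magnitudes.
positive = sum(w for w in fragment_weights.values() if w > 0)
negative = sum(-w for w in fragment_weights.values() if w < 0)
label = "penetrating" if positive > negative else "nonpenetrating"
```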

At the end, the classification data of each active molecule for BBB penetration are exported, together with the number of occurrences of a given substructure in the predictions of BBB penetration and its respective total contribution weight. For each of these molecules, dictionaries are obtained with the counting frequency of each substructure and the total sum of weights associated with each substructure. Then, these dictionaries are converted into the desired data frame. Finally, RDKit resources are used to highlight the substructures, and their visualization in the associated compounds is rendered by the mols2grid59 and Pandas62 libraries.
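A possible aggregation step, assuming the explanations list built earlier and the weight cutoff of 0.1 mentioned below; the column names and the use of Counter/defaultdict are illustrative choices, not the authors' exact code.

```python
from collections import Counter, defaultdict

import pandas as pd

# Count how often each fingerprint bit (substructure) appears with a relevant weight
# and accumulate its total LIME weight across all explained molecules.
counts = Counter()
total_weights = defaultdict(float)
for exp in explanations:
    for fp_index, weight in exp.as_map()[1]:
        if weight >= 0.1:
            counts[fp_index] += 1
            total_weights[fp_index] += weight

summary = pd.DataFrame(
    {
        "fingerprint": list(counts.keys()),
        "occurrences": [counts[i] for i in counts],
        "total_weight": [total_weights[i] for i in counts],
    }
).sort_values("total_weight", ascending=False)
```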

LIME is versatile and applicable to any ML model. It is characterized as model-independent, easy to comprehend, and didactic.34,40,58,69 LIME aims to elucidate a representative set of model predictions and ensure local reliability by replicating model behavior with fidelity on a smaller scale. The key feature of LIME is its ability to construct a linear model in the proximity of a test instance. This involves initially generating artificial samples by perturbing such instances, with weights determined by a kernel value. A local surrogate model is then trained to reproduce the predictions, and its coefficients indicate the importance of the interpreted characteristics of the model. When the LIME explainer module is utilized, it is noteworthy that testing various kernel width values did not yield unstable results. Moreover, the substructures produced in the explanation remained largely consistent across different kernel values.43 Consequently, the default value recommended by the LIME developers was adopted. To enhance the interpretability of the results, efforts were made to eliminate substructures unimportant for BBB penetration. For conciseness, certain substructures with small or negative weights were excluded by filtering out all fragments with a negative weight. In essence, what dictates the behavior of a drug is the substructure that facilitates BBB penetration, justifying the use of this cutoff.

Results

Data Curation

The SMILES representations in the benchmark dataset have been rectified manually, addressing an issue where 11 SMILES entries in the BBB dataset depicted uncharged tetravalent nitrogen atoms. It is essential to note that tetravalent nitrogen atoms should invariably bear a charge. The presence of these errors rendered the chemical structures unreadable by RDKit. Despite the widespread use of this dataset in numerous studies, there is a noticeable absence of discussions regarding the handling of these invalid SMILES representations. To ensure the integrity of chemical structures, the SMILES representations were meticulously corrected and the requisite positive charges were introduced where necessary in the benchmark datasets.
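One way to flag such problematic entries, assuming the raw SMILES strings are held in a list called smiles_list (a hypothetical name), is to check which ones RDKit refuses to parse:

```python
from rdkit import Chem

# SMILES that fail sanitization (e.g., uncharged tetravalent nitrogens) come back as None.
invalid = [smi for smi in smiles_list if Chem.MolFromSmiles(smi) is None]
print(f"{len(invalid)} SMILES entries need manual correction")
```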

Depending on the structure, the carboxylic acid substructure in some molecules is represented in one of three different forms: the protonated acid, the anionic carboxylate, and the anionic salt form. For the sake of simplicity, the form used in this study was the neutral one, as it should be for most structures at physiological pH.

In the BBB dataset, 81 structures were identified as duplicates and 7 as triplicates, which is undesirable for a benchmark. To address this issue, the authors systematically removed these duplicate/triplicate compounds to prevent any potential issues. Remarkably, the BBB dataset includes 14 distinct compounds (13 pairs and 1 trio of entries) in which identical molecules are inconsistently labeled, for example, as BBB penetrant and BBB nonpenetrant at the same time. That is why these inconsistencies were removed. It is important to mention that some of these inconsistencies arise because the molecules are treated as P-gp transported in one entry and as passively transported in the other. Other inconsistencies may come from disagreements between results reported by different laboratories.
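A sketch of this clean-up using canonical SMILES is shown below; the column names df["smiles"] and df["p_np"] follow the usual BBBP CSV layout but should be treated as assumptions.

```python
import pandas as pd
from rdkit import Chem

# Canonicalize SMILES so that duplicate entries of the same molecule can be matched.
df = pd.read_csv("BBBP.csv")
df["canonical"] = df["smiles"].apply(
    lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s)) if Chem.MolFromSmiles(s) else None
)
df = df.dropna(subset=["canonical"])

# Drop molecules whose duplicate entries carry conflicting penetration labels,
# then keep a single entry per remaining molecule.
label_counts = df.groupby("canonical")["p_np"].nunique()
inconsistent = label_counts[label_counts > 1].index
df = df[~df["canonical"].isin(inconsistent)].drop_duplicates("canonical")
```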

Hyperparameter Optimization

Initially, the architecture of the DRN classifier method was explored with layer sizes of [64], [128], [256], [512], and [1024]. The best result was achieved with a layer size of [256]. Then, the architecture was optimized with layer sizes of [256, 256], [256, 256, 256], and [256, 256, 256, 256]. Additionally, the other parameters mentioned in the Methodology Section were optimized using the procedure described there. The hyperparameter optimization of the DRN, RF, and ET classifier models was investigated to find the best parameters. The optimized hyperparameters are presented in Tables 1 and 2. The optimum parameters for the DRN, RF, and ET models varied with each run. Using these optimum parameters, the ROC-AUC metrics for each classification model were mostly found to be higher than 0.90 for the training and validation datasets, as shown in Table 3. Therefore, there is no indication of substantial overfitting. Although the training scores for the DRN, RF, and ET models were mostly higher than 0.98, the validation scores of the RF model are slightly higher than those of the DRN and ET models, as presented in Table 3.

Table 1. Optimized Parameters in the Hyperparameterization Process for the Deep Residual Network (DRN) Classifier Model in Three Different Runs (#1, #2, and #3) for the Blood–Brain Barrier Permeation of Compounds.

run layer sizea dropout learning rate epochs momentum decay batches
#1 4 0.1 0.0025 50 0.9 0.1 128
#2 4 0.0 0.005 30 0.0 0.0 128
#3 4 0.1 0.001 10 0.0 0.0 128
a Layer size: 2 for [256, 256], 3 for [256, 256, 256], and 4 for [256, 256, 256, 256].

Table 2. Optimized Parameters in the Hyperparameterization Process for the Random Forest (RF) and Extra Trees (ET) Classifier Models in Three Different Runs (#1, #2, and #3) for the Blood–Brain Barrier Permeation of Compounds.

model run depth estimators
RF #1 19 89
#2 19 387
#3 18 327
ET #1 19 362
#2 18 429
#3 18 92

Table 3. Metrics (Mean ROC-AUC) of the Training and Validation Datasets for the BBB Penetration Model for Three Different Runs (#1, #2, and #3) Using the Deep Residual Network (DRN), Random Forest (RF), and Extra Trees (ET) Classifier Models for the Blood–Brain Barrier Permeation of Compounds.

model run training validation
DRN #1 0.999 0.894
#2 0.976 0.790
#3 0.989 0.882
RF #1 0.999 0.931
#2 0.999 0.943
#3 0.999 0.936
ET #1 0.999 0.913
#2 0.999 0.925
#3 0.999 0.915

The parameters of the LIME explainer were thoroughly investigated. The number of features parameter, which defines the maximum number of features within a single explanation, was scrutinized. Despite the classifier model containing 1024 features, less than a dozen proved to be significant for each sample. Modifying this parameter from 20 to 500 had no discernible impact on the results or execution time. However, a principal component analysis for all samples showed that 542 and 232 fingerprints out of the total 1024 fingerprints used in the featurization were significant, representing 95% of the cumulative explained variance in the training and validation datasets, respectively, as presented in Figure 1. It was found that the results from 542 and 1024 fingerprints were similar. Initially, a conservative setting of 100 features was chosen for safety. Another parameter studied was the number of explanations with the highest probability of prediction for each substructure. This parameter ranged from zero (none) to 10. Increasing this value resulted in more refined results due to increased complexity in the explanations. However, it was observed that higher values reduced the number of substructures with weights greater than 0.1. Consequently, a decision was made to set the number of explanations to 1, maximizing the quantity of substructures. Notably, high values for the number of explanations correlated with prolonged evaluation times, exceeding 1–2 h. This underscores the importance of avoiding excessively high values to maintain reasonable processing times.
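The variance check could be reproduced along these lines, assuming the train_dataset fingerprint matrix from the earlier sketch; this is an illustration, not the authors' script.

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA on the 1024-bit fingerprint matrix and find how many components
# are needed to reach 95% cumulative explained variance.
pca = PCA().fit(train_dataset.X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_components_95 = int(np.argmax(cumvar >= 0.95)) + 1
print(f"{n_components_95} components explain 95% of the variance")
```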

Figure 1. Principal component analysis of the BBB penetration dataset representing the 542 and 232 fingerprints (vertical dashed line in red) that are significant within 95% cumulative explained variance (horizontal dashed line in red) for all samples in the training and validation datasets, respectively. The number of fingerprints used in the featurization is 1024, representing the total dimensionality of the training dataset.

Statistical Analysis

The precision score is the ratio TP/(TP + FP), and the recall score is the ratio TP/(TP + FN), where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. A false positive (FP) occurs when the model incorrectly predicts a positive data point, and a true positive (TP) occurs when the model accurately predicts a positive data point. On the other hand, a true negative (TN) occurs when the model accurately predicts a negative data point, and a false negative (FN) occurs when the model incorrectly predicts a negative data point. The F1 score is the harmonic mean of the precision and recall. It is important to mention that the precision, recall, and F1 scores reach their best values at 1 and worst values at 0. While the precision score is intuitively the ability of the classifier not to label a negative sample as positive, the recall score is intuitively the ability of the classifier to find all of the positive samples. Accuracy is a metric used to evaluate the performance of classification models; it is the number of correct predictions as a fraction of the number of observations in the dataset. The precision, recall, F1, and accuracy scores for the classifier models used in this study on the validation dataset are presented in Table 4. A substructure can be falsely identified as irrelevant when the local explanation is not accurate with respect to the underlying behavior of the model. This can occur for various reasons, such as inadequate sampling of data or features, model complexity, or the presence of data noise. Researchers who want to check the relevance and importance of the substructures identified through LIME in the context of blood–brain barrier permeation can use the following validation strategies to gain more confidence. First, compare the identified substructures with the expertise of professionals in the field of blood–brain barrier permeation to ensure their biological plausibility and relevance. Second, conduct experiments using existing data to validate the effects of the identified substructures on blood–brain barrier permeation. Third, apply statistical tests to assess the significance of the identified substructures in predicting blood–brain barrier permeation. Fourth, perform sensitivity analysis to determine whether the identified substructures remain relevant under different data perturbations. Finally, evaluate the impact of the identified substructures on model predictions by comparing the performance of the model with and without them.
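These scores follow directly from scikit-learn's metrics module; the sketch below assumes the best_rf model and valid_dataset from the earlier sketches and simply illustrates how the Table 4 quantities could be computed.

```python
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

y_true = valid_dataset.y.ravel()
y_pred = best_rf.predict(valid_dataset.X)

metrics = {
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "F1": f1_score(y_true, y_pred),
    "MCC": matthews_corrcoef(y_true, y_pred),
    "accuracy": accuracy_score(y_true, y_pred),
}
```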

Table 4. Precision, Recall, F1, and Accuracy Scores, and the Matthews Correlation Coefficient (MCC) for the Deep Residual Network (DRN), Random Forest (RF), and Extra Trees (ET) Classifier Models in Three Different Runs (#1, #2, and #3) for the Validation Dataset of Blood–Brain Barrier Permeation of Compounds.

 
models run precision recall F1 MCC accuracy
DRN #1 0.868 0.868 0.868 0.645 0.868
#2 0.896 0.896 0.896 0.689 0.896
#3 0.885 0.885 0.885 0.660 0.885
RF #1 0.908 0.956 0.893 0.696 0.893
#2 0.900 0.954 0.883 0.653 0.883
#3 0.899 0.963 0.891 0.688 0.891
ET #1 0.895 0.945 0.878 0.672 0.878
#2 0.906 0.954 0.888 0.670 0.888
#3 0.909 0.964 0.898 0.699 0.898

As can be observed, all classifier models perform excellent classification for BBB penetrating compounds and poorer classification for BBB nonpenetrating compounds. It is also important to note that the RF and ET models are more accurate and precise than the DRN model. The confusion matrix is used to evaluate the accuracy of classification, as presented in Figure 2. By definition, a confusion matrix C is such that Cij is equal to the number of observations known to be in group i and predicted to be in group j. Thus, the count of TN is C00, FN is C10, TP is C11, and FP is C01, as presented in Figure 2 for the DRN, RF, and ET models in the validation dataset of BBBP. The TP values are large, and the TN values are small, meaning that the classification model accurately classifies the penetrating compounds in the BBB but not the nonpenetrating compounds. It is important to mention that the FP and FN values are also low, meaning that the classification model does not produce many false classifications. Of course, there are many more penetrating compounds in the BBBP validation dataset, so the classification model should only be used to classify the penetrating compounds in the BBB, and the classification of the nonpenetrating compounds is not sufficiently accurate for this purpose. It is noteworthy that a different classification model must be built using the nonpenetrating compounds if one would like to classify the nonpenetrating compounds in the BBB. In this situation, the C00 element in the confusion matrix would be much larger than the C11 element.
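Under the same assumptions as above (y_true and y_pred from the validation set), a confusion matrix like the one in Figure 2 could be produced with scikit-learn:

```python
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# C[0,0] = TN, C[1,0] = FN, C[1,1] = TP, C[0,1] = FP, following the convention above.
C = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(C).plot()
```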

Figure 2. Confusion matrix for the DRN (A), RF (B), and ET (C) classification models using the validation dataset of the BBB permeation for run #1.

It is not our intention to compare whether any model or metric is better than another because of the pitfalls of comparing classification models and performance metrics solely based on statistics. The standard deviation is highlighted as a measure of variability rather than a statistical test. It quantifies the amount of variation or dispersion in a set of values. While it provides useful information about the spread of data, it is not a statistical test itself. Methods such as the Student’s t test and analysis of variance (ANOVA) are parametric statistical tests. These tests have assumptions about the distribution of data (e.g., normality and independence) that may not always hold true in real-world scenarios. To address the limitations of parametric tests, it is recommended to use nonparametric alternatives. The Student’s t test can be replaced by the Wilcoxon rank-sum test, which is a nonparametric analogue suitable for comparing two methods. For comparing more than two methods, the ANOVA is suggested but with caution about its limitations. Instead, it is advocated to use Friedman’s test, a nonparametric alternative to ANOVA, which is particularly useful when dealing with data that is not independent or normally distributed. Therefore, it is important to consider the distribution and nature of the data when comparing classification models and performance metrics. It is suggested to move away from relying solely on standard deviation and parametric tests, opting for nonparametric alternatives like the Wilcoxon rank-sum test and Friedman’s test for more robust comparisons in various scenarios. As the results of the models and metrics used in our work were quite satisfactory, it is also believed that it is not necessary to perform any statistical comparisons to determine which model or metric is better in comparison to each other.
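For illustration only (the authors argue such comparisons were unnecessary here), the nonparametric tests mentioned above are available in SciPy, which is assumed to be installed; the example feeds them the per-run validation accuracies from Table 4.

```python
from scipy.stats import friedmanchisquare, ranksums

# Per-run validation accuracies from Table 4.
drn_acc = [0.868, 0.896, 0.885]
rf_acc = [0.893, 0.883, 0.891]
et_acc = [0.878, 0.888, 0.898]

# Wilcoxon rank-sum test for two methods, Friedman test for all three.
stat_rs, p_rs = ranksums(rf_acc, et_acc)
stat_fr, p_fr = friedmanchisquare(drn_acc, rf_acc, et_acc)
```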

The field of interpretability in machine learning models can progress methodologically in several key steps. First, there is a need to advance model-agnostic approaches in order to offer interpretable explanations to a more diverse group of models and datasets. Second, the integration of domain-specific knowledge into interpretability methods is essential, particularly in intricate domains such as toxicology, to enhance the relevance and accuracy of explanations. Third, developing methods to quantify uncertainty linked with interpretable explanations is crucial for providing users with a nuanced comprehension of model predictions. Fourth, enhancing the scalability and efficiency of interpretability methods is imperative for handling large datasets and complex models promptly. Finally, establishing standardized validation and benchmarking procedures is critical for evaluating the performance of interpretability methods and comparing their efficacy across various domains and tasks. By tackling these methodological challenges, the field can progress toward more dependable, interpretable, and credible machine learning models for predicting blood–brain barrier permeation.

Model Explanation

The LIME explanations for the BBB penetration are presented below. The results were filtered with weights of >0.1 to remove less important substructures. Then, LIME selected the nine to ten most influential substructures for BBB penetration. In general, LIME mostly explains the penetrability of compounds by structures containing nitrogen. Basicity is considered one of the main factors for CNS penetrability, especially for nitrogen-containing groups because they can be protonated at physiological pH (such as piperidine and piperazine groups). For BBB penetration, the C=N double bond of aromatics is also considered. LIME detected the positive weight of this feature in the contribution to BBB penetration. Aromatic rings tend to facilitate penetration into the BBB because the hydrophobicity is enhanced.

Figure 3 shows the most important substructures found by LIME for BBB penetration using the DRN classification model for runs #1, #2, and #3. From run #1, six compounds have nitrogen-containing groups (mostly cyclic and noncyclic amine groups), but only three of them are highlighted in the nitrogen group. In the other three compounds, LIME highlighted the neighborhood of the nitrogen group. However, the highlighted substructures in compounds 1 and 6 are oxygen- and sulfur-containing groups, respectively, although nonhighlighted fluorine-containing groups are found in these two molecules. Compounds 3 and 5 have nonhighlighted chlorine-containing groups, with the first one having a highlighted phosphorus-containing group. In run #2, seven compounds have nitrogen-containing groups, and five of them have highlighted nitrogen-containing groups. Although compound 6 is not highlighted in the nitrogen-containing group, compound 7 is highlighted in the nitrogen-containing group neighborhood. Compounds 3 and 8 have been highlighted as oxygen- and halogen-containing groups, respectively. In run #3, eight compounds have nitrogen-containing groups, but three of them are not highlighted in the nitrogen group. Compounds 1 and 3 have fluorine-containing groups, but the highlights are in the oxygen-containing groups. Compounds 0 and 2 are repetitions with highlights in different nitrogen-containing groups. Compounds 5 and 6 are highlighted in the hydrocarbon part of these molecules, but the nitrogen-containing groups are not highlighted in these molecules. Compound 9 is also highlighted in the hydrocarbon part of the molecule with two nonhighlighted sulfur-containing groups.

Figure 3. Most important substructures found by LIME for BBB penetration using the DRN classification model for runs #1, #2, and #3.

Figure 4 presents the most important substructures found by LIME for BBB penetration using the RF classification model for runs #1, #2, and #3. In run #1, five compounds are highlighted with nitrogen-containing groups. Four of them (1, 2, 6, and 7) are repetitions with very similar highlighted nitrogen-containing substructures. Compounds 0, 4, and 8 are also repetitions but with different highlighted oxygen-containing groups. Compound 5 has highlighted substructures with oxygen-containing groups. In run #2, six nitrogen-containing substructures are highlighted in different compounds. Compounds 3 and 4 are repetitions with quite similar highlights, and compound 5 is a very similar compound with a highlighted nitrogen-containing group. Compounds 2, 7, and 8 are also repetitions with quite similar highlights. Compounds 1 and 6 are also repetitions with different highlights in the cyclic substructure with oxygen- and fluorine-containing groups. Compound 0 is a repetition of compound 5 in run #1 but with similar highlights (in the same region). In run #3, seven nitrogen-containing compounds are found by LIME, but two of them are highlighted in the nitrogen-containing neighborhood. The other five compounds have highlighted nitrogen-containing groups. Compounds 0 and 3 are oxygen- and fluorine-containing compounds with different highlights. Compounds 4 and 8 contain nonhighlighted nitrogen-containing substructures; however, the highlight is in the neighborhood. Compounds 5 and 7 are repetitions with very similar highlighted nitrogen-containing substructures.

Figure 4. Most important substructures found by LIME for BBB penetration using the RF classification model for runs #1, #2, and #3.

Figure 5 shows the most important substructures found by LIME for BBB penetration by using the ET classification model. In run #1, six nitrogen-containing substructures are highlighted in different compounds. Compounds 4, 5, and 7 are repetitions with very similar highlighted substructures. Compounds 3 and 6 have oxygen- and halogen-containing groups with different non-nitrogen-containing highlighted substructures. In run #2, six compounds have highlighted nitrogen-containing substructures; however, compounds 1, 2, and 7 are repetitions with slightly different highlights. Compounds 0 and 4 are very different oxygen- and halogen-containing compounds with very different highlighted substructures. In run #3, seven nitrogen-containing compounds are highlighted with different substructures; however, five of them are nitrogen-containing substructures. Compounds 4 and 7 have non-nitrogen-containing highlighted substructures, but the latter one has a nitrogen-containing group in the neighborhood.

Figure 5. Most important substructures found by LIME for BBB penetration using the ET classification model for runs #1, #2, and #3.

Discussion

The permeability of chemical compounds through the BBB is a complex interplay of several factors, and understanding the significance of the functional groups is crucial. Although lipophilicity plays a pivotal role in determining BBB penetration, different functional groups interact with the BBB components, influencing the BBB permeation of a compound. Moreover, variations in lipophilicity caused by different functional groups impact the transport of chemical compounds by influencing their solubility in the lipid-rich membranes of the barrier. Generally, hydrophobic compounds tend to have better BBB permeation. Nevertheless, the optimal range of lipophilicity for efficient BBB permeation depends on the specific characteristics of the functional groups of the compounds. Therefore, overly hydrophilic compounds may struggle to penetrate the BBB, while excessively lipophilic compounds may face challenges related to clearance and transport.

Indeed, the permeability of chemical compounds through the BBB is intricately influenced by specific nitrogen-containing functional groups. Therefore, several key functional groups have been identified for their potential to enhance BBB permeability. For example, amino groups (−NH2, −NRH, and −NR1R2) in a compound can participate in interactions, affecting its ability for BBB penetration. Chloroquine (compound 4 in DRN model run #3, Figure 3) is an aminoquinoline used for the prevention and therapy of malaria. It attenuates neuroinflammation. LIME found that the quinoline substructure does not cause BBB penetration; however, the two amine groups in the side chain are highlighted as potential substructures that enhance BBB penetration. LIME also found the amfetaminil drug (compound 4 in DRN model run #2, Figure 3), a stimulant drug derived from amphetamine that has an amine group. Another example is octriptyline (compound 8 in DRN model run #1, Figure 3), a tricyclic antidepressant, and thioperamide (compound 0 in DRN run #2, Figure 3), used to prevent seizures or reduce their severity. Finally, tiazesim (compound 8 in ET model run #3, Figure 5) is a heterocyclic antidepressant related to tricyclic antidepressants. In this molecule, LIME highlighted the tertiary amine group in the side chain linked to the heterocyclic tertiary amine substructure.

The carbamate group (>N–C(=O)–O–) has unique properties that may also influence BBB permeability. Its involvement in hydrogen-bonding interactions with BBB components can impact the overall transport process. Oxyfenamate is a sedative and anxiolytic drug of the carbamate class found by LIME (compound 7 in DRN model run #1, Figure 3). Procymate (compound 5 in DRN model run #3, Figure 3) is a carbamate derivative that is a sedative and anxiolytic drug. Also, difebarbamate is a tranquilizer of the barbiturate and carbamate families found by LIME (compound 7 in DRN model run #2, Figure 3), but LIME only highlighted the carbamate substructure, not the barbiturate substructure. On the other hand, LIME highlighted part of the barbiturate ring of heptabarbital (compound 4 in RF model run #3, Figure 4), which is a sedative and hypnotic drug of the barbiturate family.

Nitrogen-containing heterocycles (such as imidazole) are known for their potential to enhance permeability. The combination of the aromatic nature of these structures with their ability to participate in various interactions contributes to their favorable influence on BBB permeation. It is important to emphasize that many compounds presented in Figures 3–5 are amines, amides, and nitrogen-containing heterocycles. In the following, some noteworthy examples of nitrogen-containing heterocycles exhibiting distinctive properties that affect the permeability of chemical compounds through the BBB are shown.

Pyrrole or pyrrolidine rings feature a nitrogen atom in a five-membered ring that can engage in various interactions with BBB components, potentially influencing transport and impacting BBB permeability. Piperidine and piperazine (and pyrazine) consist of six-membered rings with one nitrogen atom and two nitrogen atoms at opposite positions in the ring, respectively. Their unique structural characteristics may contribute to their potential influence on BBB permeation. LIME found the piperidine ring in pipradrol (compound 4 in DRN model run #1, Figure 3), a psychoactive agent and a central nervous system stimulant that has proven useful in the field of psychiatry. Another example of a piperidine derivative is iloperidone (compounds 2, 7, and 8 in RF model run #2, Figure 4), which is an atypical antipsychotic for the treatment of schizophrenia. LIME highlighted the nitrogen atom of the piperidine ring. Amiperone (molecule 0 in ET model run #1, Figure 5) is a psychotropic and neuroleptic drug. LIME highlighted the piperidine ring in the amiperone molecule. Finally, the piperidine ring was also highlighted in spiroxatrine (molecule 6 in ET model run #2, Figure 5), which is used to treat psychotic disorders.

Piperazine derivatives were found twice by LIME: the anxiolytic drug enpiprazole (compound 1 in ET run #1, Figure 5) and the antihistamine drug chlorcyclizine (compound 3 in ET run #3, Figure 5). Finally, loxapine (compound 5 in RF model run #2, Figure 4) is used primarily in the treatment of schizophrenia. It is a member of the dibenzoxazepine class and structurally very similar to clozapine (compound 3 in ET model run #2, Figure 5). The piperazine substructure in both compounds is highlighted by LIME. On the other hand, LIME has highlighted a different substructure (the O=C–N–C=O ring substructure) than the piperazine ring in buspirone (molecule 6 in RF model run #3, Figure 4), which is a psychoactive drug used for the management of general anxiety disorders. The piperazine ring was also highlighted in several drugs to treat brain disorders: trazodone (compounds 1, 2, and 7 in ET model run #2, Figure 5), cloxypendyl (compound 1 in ET model run #3, Figure 5), and blonanserin (compound 2 in ET model run #3, Figure 5).

The aromatic nature of pyrazines and their potential interactions with the BBB components can also play a role in the permeability of compounds featuring these structures. Pyrimidines may also have implications for permeability, although the specific impact depends on the overall structure of the chemical compound. Phenothiazine derivatives are related to the thiazine-class of heterocyclic compounds that show antipsychotic activity for the treatment of anxiety disorders, depressive symptoms secondary to anxiety, and agitation. This class of compounds was found twice by LIME: trifluoperazine and imiclopazine were found in the output in the random tree models (triplicate in ET run #1, Figure 5, and triplicate in RF run #3, Figure 4) and the neural network model, respectively. Thioproperazine is another phenothiazine derivative used in the treatment of all types of acute and chronic schizophrenia. It was also found by LIME (compound 1 in DRN model run #2, Figure 3). Another good example is (E)-thiothixene (compounds 1, 2, 6, and 7 in RF model run #1, Figure 4), which is a typical antipsychotic of the thioxanthene class. It is also related to thioproperazine and pipotiazine, which are members of the phenothiazine class. LIME highlighted the same piperazine substructure in this class of molecules.

Another class of compounds is found in clothiapine. It is a typical antipsychotic of the dibenzothiazepine chemical class found by LIME (compound 2 in ET model run #1, Figure 5). Also, fasoracetam (molecule 2 in DRN model run #2, Figure 3) belongs to a class of drugs that share a pyrrolidone nucleus, which was not found by LIME. However, the other part of the molecule (the piperidine ring) is highlighted by LIME. Another important example is menitrazepam (compound 7 in DRN model run #3, Figure 3), a hypnotic agent used to treat insomnia that is a benzodiazepine derivative. LIME highlighted the nitrogen-containing ring of this molecule. Understanding the diverse effects of nitrogen-containing heterocycles provides valuable insights into the intricate landscape of BBB permeation.

Similarly, the hydroxyl group (−OH) is a crucial functional group because it can enhance permeability through hydrogen-bonding interactions with the BBB components. Carbonyl groups (C=O) found in ketones and aldehydes can also influence permeability; the carbonyl oxygen can engage in interactions with the barrier components, affecting the overall transport process. For instance, pipamperone (duplicate compounds 0 and 2 in DRN model run #3, Figure 3) is a typical antipsychotic drug of the butyrophenone family used in the treatment of schizophrenia. LIME found that three substructures are important to its BBB penetration: the amide group, the piperidine ring, and the butyl ketone substructure. The latter is part of the butyrophenone substructure, although the phenyl group was not highlighted. Esters (−COOR) and carboxylic acids (−COOH) can contribute to permeability, engaging in hydrogen bonding and other interactions and impacting the compound’s ability to cross the barrier. Here, it is important to mention a specific, controversial example. Valproate pivoxil (compound 3 in DRN model run #2, Figure 3) is an anticonvulsant used in the treatment of epilepsy. It is likely a prodrug of valproic acid. LIME highlighted the −C(O)OCH2O(O)C– substructure as the explanation for its BBB penetration. However, it is possible that the carboxylic group of valproic acid may be the reason for BBB penetration. Additionally, the size and charge of the functional groups play a pivotal role: smaller and neutrally charged groups generally have higher permeability. Other oxygen-containing functional groups also play a crucial role in influencing BBB permeability. Therefore, several key functional groups with oxygen have been identified for their potential to enhance permeability, as follows.

Compounds containing ether linkages (−O−) may exhibit favorable permeability characteristics. The oxygen in the ether group can participate in interactions that aid in the compound’s transposition across the barrier. For example, pinoxepin is an antipsychotic of the tricyclic group with a dibenzoxepin ring system that was found by LIME (compounds 3 and 4 in RF model run #2, Figure 4). Another example is the antiviral desciclovir (compound 6 in DRN model run #3 in Figure 3). It is a nucleoside analogue to purine and prodrug of the neurotoxic acyclovir with activity against viruses. LIME found that the purine side chain with oxygen-containing substructures (−O– and −OH) is highlighted, so the purine substructure may not be the explanation for BBB penetration. Understanding the nuances of these oxygen-containing functional groups is essential for designing compounds with optimized BBB permeability. As specific functional oxygen-containing groups can negatively impact the BBB permeation due to the reduction of the compound lipophilicity, it is essential to consider these intricate details in conjunction with the compound lipophilicity when evaluating its potential to permeate BBB.

With respect to the reduction of compound lipophilicity, it is important to mention that other functional groups also contribute to lipophilicity variations. For instance, halogen-containing compounds can enhance lipophilicity and, consequently, BBB penetration. Sulfur-containing groups such as thiol (−SH) can also influence lipophilicity, potentially affecting the compound’s ability to penetrate the BBB. The sulfur atom in thiol groups can form interactions with the BBB components, facilitating BBB penetration.

Compounds containing sulfoxide (−SO) and sulfone (−SO2) groups may exhibit enhanced permeability. Sulfonethylmethane (compound 9 in DRN model run #3, Figure 3) is a sedative-hypnotic and anesthetic drug with GABAergic actions. LIME highlighted the hydrocarbon substructure in the middle of the molecule instead of the sulfone group. Certain compounds with thiolsulfinate groups, characterized by a sulfur atom bonded to both oxygen and sulfur, can positively influence BBB permeability. By exploring the interactions of sulfur-containing functional groups, it is possible to gain valuable insights into designing compounds with improved capabilities for BBB penetration. Therefore, understanding the nuances of these groups is crucial to designing compounds with optimized characteristics for BBB penetration.

Halogen atoms (particularly bromine, chlorine, and fluorine atoms) can have a significant impact on BBB permeability. The presence of halogen atoms introduces unique characteristics that influence interactions with the barrier components. Fluorine atoms can enhance BBB permeability because fluorine is a small, electronegative atom that can influence the lipophilicity and electronic properties of the compound. Therefore, it potentially promotes favorable interactions for the BBB permeation. Similar to fluorine, the presence of chlorine atoms can alter the physicochemical properties of a compound, affecting its ability to penetrate the BBB. The larger size of chlorine compared to fluorine introduces specific steric and electronic effects that contribute to the overall permeation characteristics of the compound. Understanding the nuances of halogen-containing compounds is vital for medicinal chemists and researchers aiming to design drugs with improved blood–brain barrier penetration.

It is important to mention that corticosteroids penetrate the BBB, as presented in Figures 3–5. Our results found many corticosteroids such as diflorasone diacetate, triamcinolone benetonide, betamethasone dipropionate, fluocinolone acetonide, halcinonide, fluticasone propionate, paramethasone acetate, rofleponide, mometasone, icometasone enbutate, and cortisuzol. Of all of these, cortisuzol is the only corticosteroid that does not have a halogen atom in its structure. LIME highlighted an oxygen-containing substructure to explain the BBB permeation of cortisuzol. The reader should note that, except for halcinonide, no corticosteroid had a halogen atom highlighted by LIME, indicating that halogens may only modulate the lipophilicity of the corticosteroids. Most corticosteroids had oxygen-containing substructures highlighted by LIME. In the pursuit of unraveling the mysteries surrounding corticosteroid penetration through the BBB, given their complex structures with halogen atoms and many cyclic oxygen-containing substructures, our latest research has illuminated crucial insights that beckon a re-evaluation of existing paradigms. Our study delves into the intricate relationship between corticosteroids and the BBB, shedding light on the pivotal role played by halogen atoms, specifically fluorine and chlorine atoms, as well as oxygen-containing groups. These molecular constituents emerge as potent modulators of corticosteroid lipophilicity, acting as gatekeepers that influence BBB permeability. Our findings underscore the significance of considering not only the structural composition and lipophilicity of corticosteroids but also the nuanced interplay of halogen atoms and oxygen-containing groups in dictating their ability to cross the BBB. This revelation holds profound implications for drug design and development.

Also, some opioids were found in our study: levallorphan (compound 7 in ET model run #3, Figure 5), (−)-pentazocine (compounds 2 and 8 in RF model run #3, Figure 4), and cogazocine (compound 3 in RF model run #1, Figure 4). It is important to note that the nitrogen-containing substructure is explained by LIME as the cause for BBB penetration in two molecules, (−)-pentazocine and cogazocine. The two substructures highlighted in levallorphan are in the vicinity of the nitrogen-containing substructure that was explained as the cause of BBB penetration for the other two molecules.

Our results suggest that nitrogen atoms may have a special effect in facilitating the permeation of organic molecules, particularly when these nitrogen atoms might be protonated under physiological conditions. The studies conducted by Singh and colleagues28 and White et al.30,31,41 reveal certain substructures that are commonly found in compounds that permeate the BBB. These results support our study, as the presence of nitrogen-containing compounds and aromatic rings is found to be more prevalent in BBB-permeating compounds compared to nonpermeating compounds. Our study provides a comprehensive list of substructures that are crucial for BBB penetration, surpassing the existing knowledge obtained through explainable methods found in the literature to date.28,30,31,41 Although other functional groups such as oxygen-, sulfur-, and halogen-containing compounds play a role in regulating and balancing the lipophilicity of a compound, nitrogen-containing groups appear to be key in influencing compound permeability. Specifically, nitrogen-containing substructures such as carbamate, barbiturate, piperidine, piperazine, pyrazine, phenothiazine, dibenzothiazepine, benzodiazepine, butyrophenone, and dibenzoxepin are deemed more relevant to BBB permeation. However, both the lipophilicity and the charge of compounds need to be taken into account when determining the BBB permeation of a chemical compound. Corticosteroids serve as examples of the importance of lipophilicity. Additionally, opioid compounds appear to possess a nitrogen-containing substructure that is vital for BBB permeation.

Re-Evaluation of the Methodology

K-fold cross-validation serves as a valuable technique for assessing the generalization performance of machine learning models on unseen data. It facilitates the optimization of model hyperparameters and aids in model selection for a given dataset. However, the use of the same cross-validation procedure and dataset for both tuning and model selection can lead to biased evaluations of model performance. Thus, the nested cross-validation approach is preferred to overcome this bias. This approach includes the hyperparameter optimization procedure within the model selection procedure, providing a less biased evaluation of the tuned models. The hyperparameters of most machine learning algorithms can be adjusted to fit a specific dataset. However, there are few guidelines for how to configure these hyperparameters. Instead, an optimization procedure is used to find the best hyperparameters for the dataset. This is where k-fold cross-validation comes in to select both the best hyperparameters for each model and the best model configuration.
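
As an illustration of the nested procedure described above, the following minimal sketch uses scikit-learn, which was employed in this work; the random placeholder data, the parameter grid, and the 5-fold inner/10-fold outer configuration are assumptions for illustration rather than the exact settings of our pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X = np.random.rand(200, 64)            # placeholder features (e.g., fingerprint bits)
y = np.random.randint(0, 2, 200)       # placeholder binary labels

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # hyperparameter tuning
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # performance estimation

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=inner_cv, scoring="roc_auc")

# The outer loop scores models whose hyperparameters were tuned only on the inner folds,
# giving a less biased performance estimate than reusing one CV loop for both tasks.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(nested_scores.mean(), nested_scores.std())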

K-fold cross-validation is effective for estimating model performance but has limitations. When used multiple times with the same algorithm, it can lead to overfitting: each evaluation of a model with different hyperparameters provides information about the dataset, and this knowledge can be exploited to find the configuration that scores best on it, with noisy datasets tending to score worse. Although k-fold cross-validation attempts to reduce this effect, it cannot eliminate it completely, and it is common for the hyperparameter optimization process to perform some form of hill-climbing or overfitting to the dataset. Nested cross-validation addresses this problem of overfitting to the training dataset. Exposing the hyperparameter search to only the subset of the dataset provided by the outer cross-validation procedure reduces the risk of overfitting and provides a less biased estimate of model performance on the dataset. In our case, we chose to resample the imbalanced BBBP dataset using the standard Scikit-Learn library. The resampling method combined with 5-fold cross-validation was intended to combat overfitting rather than cause it. Overfitting occurs when a model learns the noise in the training data rather than capturing the underlying pattern, resulting in poor performance on new data. The resampling technique aids in gauging model performance on unseen data by iteratively training the model on varied subsets of the dataset and assessing its performance on the remaining data. This iterative process offers a robust estimation of performance and guards against overfitting by promoting effective generalization to new data. Therefore, the resampling technique was instrumental in addressing potential overfitting issues.
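
The following minimal sketch illustrates the resampling idea with scikit-learn's resample utility; the toy data frame and the column name p_np are illustrative assumptions, not the exact code used in this study.

import pandas as pd
from sklearn.utils import resample

# Toy imbalanced frame standing in for the BBBP dataset (column names are illustrative).
df = pd.DataFrame({"smiles": ["C"] * 10, "p_np": [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]})

majority = df[df["p_np"] == 1]
minority = df[df["p_np"] == 0]

# Upsample the minority class with replacement to the majority size, then shuffle;
# the balanced frame is what feeds the 5-fold cross-validation downstream.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up]).sample(frac=1.0, random_state=0)
print(balanced["p_np"].value_counts())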

The optimized parameters from the hyperparameterization process for the DRN, RF, and ET classifier models in three different runs for the blood–brain barrier permeation of compounds, using the resampling method and 5-fold cross-validation, are given in Tables S1 and S2. Table S3 shows the mean ROC-AUC of the training and validation datasets for the BBB penetration model for the three runs. The precision, recall, F1, and accuracy scores and the Matthews correlation coefficient (MCC) for the DRN, RF, and ET classifier models on the validation dataset are presented in Table S4. Figure S1 presents the confusion matrices for the DRN, RF, and ET classification models, and Figures S2–S4 show the most important substructures found by LIME for BBB penetration for these models, all obtained with the resampling method and 5-fold cross-validation.

If the model performs significantly better on the training dataset than on the validation dataset, this may indicate overfitting. Keeping this in mind, the results show that the ROC-AUC scores on the training dataset are not drastically greater than those on the validation dataset (compare Tables 3 and S3) before and after resampling, which indicates that the resampling method avoided overfitting. It was also found that the precision, recall, F1, and accuracy scores of the model on the training dataset are not significantly greater than those on the validation dataset (refer to Tables 4 and S4). Additionally, it is important to note that the accuracy ranged from 0.868 to 0.898, whereas the Matthews correlation coefficient (MCC) on the validation dataset ranged from 0.645 to 0.699 due to the imbalanced nature of the BBBP dataset. MCC provides a measure of the quality of binary classifications based on the whole confusion matrix. It is considered a balanced measure and can be used even when the classes differ substantially in size, as is the case with the BBBP dataset. MCC ranges from −1 to +1, where +1 indicates a perfect prediction, 0 represents a random prediction, and −1 signifies total disagreement between prediction and observation. While the precision, recall, F1, and accuracy metrics offer valuable insights into various aspects of model performance, MCC is often preferred because it considers all elements of the confusion matrix and is especially useful with imbalanced datasets. It is worth mentioning that, although the model was built using weights to address the dataset imbalance, the MCC still only ranged from 0.645 to 0.699. Consequently, we opted to employ the resampling technique, which improved the MCC on the validation dataset to a range of 0.850 to 0.947.
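
For reference, the metrics discussed above can be computed directly with scikit-learn; the label vectors below are illustrative values only.

from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]   # illustrative ground-truth labels
y_pred = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1]   # illustrative model predictions

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
# MCC is computed from the full confusion matrix, so it remains informative on imbalanced data.
print("MCC:      ", matthews_corrcoef(y_true, y_pred))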

Another important point with respect to our methodology is that, prior to parameter optimization, we divided our dataset into an 80:20 split and used the training dataset, consisting of 80% of the data, during the parameter optimization process. This approach helps ensure that the constructed model does not overfit. In addition, we performed 5-fold cross-validation and nested cross-validation for comparison, and we resampled the imbalanced BBBP dataset, as previously explained. This led to even better results, particularly after resampling the BBBP dataset, but it does not mean that the prior methodology produced overfitting.
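
A minimal sketch of such an 80:20 split with scikit-learn is shown below; the placeholder arrays and the use of stratification are assumptions for illustration rather than the exact settings of our workflow.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 2048)          # placeholder 2048-bit fingerprint features
y = np.random.randint(0, 2, 100)       # placeholder binary labels

# Stratification keeps the class ratio similar in both splits (an illustrative choice);
# hyperparameter optimization and cross-validation then use only the 80% training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)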

The ROC-AUC scores in Tables 3 and S3 showed that the random forest and extra trees models do not overfit significantly, with scores around 0.99 for training and 0.91–0.94 for validation. The deep residual network (DRN) model had ROC-AUC scores ranging from 0.976 to 0.999 for training and from 0.790 to 0.894 for validation, which is an indication of overfitting. However, our results on the imbalanced BBBP dataset were only reasonable for drug penetration (label 1) and poor for nondrug penetration (label 0), with an MCC of only 0.65–0.70 (before balancing the dataset), as presented in Tables 4 and S4, although the precision, recall, F1, and accuracy for penetration were reasonable. After resampling the BBBP dataset, the model performed well for both labels, with MCC ranging from 0.90 to 0.95 using 5-fold cross-validation and 5 × 10 nested cross-validation (5-fold for inner and 10-fold for outer cross-validation) (see Tables 4 and S4). It is important to note that the ROC-AUC scores for training and validation of the model built with 5 × 10 nested cross-validation are around 0.999 and 0.988, respectively, with similar results for 5-fold cross-validation (see Tables 3 and S3). Once again, we have shown that 5 × 10 nested cross-validation and 5-fold cross-validation with random forest and extra trees give similar results (see Tables S4 and S5) using the resampling method and do not overfit, and that the MCC is considerably improved compared to the previous methodology. Therefore, we decided to use 5-fold cross-validation with the resampling method for both random forest and extra trees.

Figures S2–S4 present some of the known compounds with the most important substructures found by LIME for BBB penetration using the DRN, RF, and ET classification models, the resampling method, and 5-fold cross-validation: methylprednisolone succinate (a corticosteroid hormone (compound 0 in Figure S2 #1)); pirenzepine (a pyridobenzodiazepine (compound 3 in Figure S2 #1)); acetylmethadol (a diarylmethane and narcotic analgesic (compound 9 in Figure S2 #1)); fluorometholone acetate (a glucocorticoid (compound 0 in Figure S2 #2)); mazipredone and hydrocortisone aceponate (corticosteroids (compounds 3 and 4, respectively, in Figure S2 #2)); thiopropazate (a phenothiazine derivative (compound 6 in Figure S2 #2)); fluocinolone (a glucocorticoid (compound 0 in Figure S2 #3)); cortivazol (a steroid (compound 2 in Figure S2 #3)); chlorcyclizine (a phenylpiperazine (compound 4 in Figure S2 #3)); tybamate (a carbamate ester (compound 7 in Figure S2 #3)); mosapramine (an antipsychotic (compound 0 in Figure S3 #1)); sipatrigine (a member of pyrimidines and piperazines (compound 5 in Figure S3 #1)); triamcinolone diacetate (a corticosteroid (compounds 0, 3, and 7 in Figure S3 #2 and compound 2 in Figure S2 #1)); bromodiphenhydramine (an antihistamine (compound 4 in Figure S3 #2)); dixyrazine (a member of phenothiazines (compound 6 in Figure S3 #2)); fluocortolone and clobetasone butyrate (glucocorticoids (compounds 0, 1, and 2 in Figure S3 #3)); loprazolam (an imidazobenzodiazepine (compound 5 in Figure S3 #3)); oxymorphone (an opioid analgesic (compound 8 in Figure S3 #3)); flupentixol (an antipsychotic drug of the thioxanthene group (compounds 2, 4, and 6 in Figure S4 #1)); febarbamate (a member of barbiturates (compound 5 in Figure S4 #1)); amelometasone, icometasone enbutate, and flumethasone (steroid hormones (compounds 0, 1, and 2, respectively, in Figure S4 #2)); nalmefene (a 6-methylene analogue of naltrexone, which is an opioid receptor antagonist (compound 3 in Figure S4 #2 and compound 3 in Figure S3 #3)); and clobetasol propionate and betamethasone benzoate (steroid compounds (compounds 0 and 1(8), respectively)).

Limitations and Benefits

These structural explanations cannot be seen as rules, and the peculiar pharmacological properties of each molecule in BBB permeability and the biology of the system must not be underestimated. Our focus is on creating structural interpretability from LIME and assessing whether the most important substructures for BBB permeability make sense with a human scientific understanding. To exemplify that these analyses are not well-defined rules but rather structural interpretations, some molecules, such as the steroid-like molecules in Figures 3–5, have polar groups, resulting in a higher polar surface area compared to the other molecules. Nevertheless, the steroid-like molecules are identified as permeable even though most of them do not have nitrogen-containing groups. Indeed, it is important to admit that LIME may also produce incorrect interpretations. For example, the maleate group highlighted in teniloxazine maleate (compound 6 in run #2, Figure 3) is not the reason for BBB permeation. Another limitation is related to prodrugs. Triclofos (compound 3 in DRN model run #1, Figure 3) is a sedative drug rarely used for treating insomnia; it is a prodrug that is metabolized in the liver into the active drug trichloroethanol. Figure 3 shows the phosphate group highlighted in triclofos, but it is not the active group. The same is true for petrichloral (compound 5 in DRN model run #1, Figure 3), which is a sedative and hypnotic chloral hydrate prodrug. However, we believe that interpreting substructures important to BBB permeability with a certain level of confidence can serve as a basis for analyzing counterfactuals and graph neural networks, which are being increasingly explored for their ability to suggest structural changes in molecules to improve their permeability.

Likewise, Lipinski’s rule of five70,71 states that a drug must have certain well-defined properties. In recent years, however, drug discovery has adhered less and less to these rules.72 A good drug for acting on the CNS must permeate the BBB, and studies show that permeation is easier with a smaller number of hydrogen bond donors and acceptors, for example. It is difficult to create a single rule to predict whether a molecule could be a good drug. Consequently, the interpretation of substructures important to BBB permeability has been gaining prominence in the study of drug properties such as toxicity and BBB permeability.
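
Such rule-of-five properties can be inspected with RDKit, which is already part of this workflow; the following sketch uses caffeine purely as an illustrative molecule, not a compound taken from the BBBP dataset.

from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski

mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C)C(=O)N2C")  # caffeine, for illustration only

print("molecular weight:", Descriptors.MolWt(mol))
print("cLogP:           ", Crippen.MolLogP(mol))
print("H-bond donors:   ", Lipinski.NumHDonors(mol))
print("H-bond acceptors:", Lipinski.NumHAcceptors(mol))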

The skepticism around the effectiveness of LIME in helping researchers in the new synthesis of active molecules stems from several factors. First, designing new molecules with desired biological activity is a highly complex and multidimensional problem. While LIME can provide explanations for individual model predictions, it may not capture the full complexity of molecular interactions and structural requirements for activity. Second, the explanations of LIME are limited to local, instance-specific insights and may not provide a holistic view of the underlying structure–activity relationships crucial for molecule design. Third, interpreting results from LIME requires domain expertise in both machine learning and chemistry, making it challenging for researchers without a strong background in both fields to effectively utilize the method. Fourth, any insights provided by LIME would still need to be validated experimentally, which can be time-consuming and costly. While LIME can offer some insights into the predictions of machine learning models in drug discovery, its utility in guiding the synthesis of new active molecules may be limited due to the lack of experimental findings about structural alerts, and researchers should consider its potential drawbacks and limitations when incorporating it into their workflow.

Disregarding LIME while employing state-of-the-art techniques in drug research and discovery may inadvertently limit the understanding and interpretation of predictions generated by complex machine learning models. LIME presents several advantages over alternative explainable artificial intelligence methods. It offers localized, instance-specific explanations, which are often more intuitive and easier to grasp than explanations at the global model level. Because it is model-agnostic, it can be applied to any machine learning model, regardless of complexity or underlying architecture. LIME generates explanations that are typically simpler, quicker, and more transparent than those of other methods, enhancing accessibility for nonexperts and fostering trust in model predictions. By identifying the significant features influencing individual predictions, it enables researchers to understand which features drive the model’s decisions. Additionally, it provides insights into complex models, such as deep neural networks, where traditional interpretability methods may face challenges. By integrating LIME, researchers can improve their ability to interpret and trust machine learning model predictions in drug research and discovery, leading to more informed decision-making and potentially accelerating the discovery of new drugs.
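
To make this concrete, the sketch below shows one way to obtain a local LIME explanation for a single compound represented by fingerprint bits and classified by a random forest; the toy data, feature names, class names, and the 0.1 weight cutoff used for filtering are illustrative assumptions rather than the exact configuration of this study.

import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Toy binary fingerprint matrix standing in for Morgan bits (values are placeholders).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 128)).astype(float)
y = rng.integers(0, 2, size=200)
rf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    training_data=X,
    feature_names=[f"bit_{i}" for i in range(X.shape[1])],
    class_names=["non-penetrating", "penetrating"],
    discretize_continuous=False,
    mode="classification")

# Explain one instance locally and keep only the most relevant contributions
# (absolute weight >= 0.1); the surviving bits can then be mapped back to
# substructures with RDKit for visualization.
exp = explainer.explain_instance(X[0], rf.predict_proba, num_features=10)
relevant = [(feature, weight) for feature, weight in exp.as_list() if abs(weight) >= 0.1]
print(relevant)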

Concluding Remarks

LIME was used to generate explanations regarding the permeability of organic compounds in the BBB. These explanations involve the substructures that contribute the most to the permeability of a molecule, or the lack thereof, and serve as a useful guide for subsequent laboratory tests. LIME has a relatively fast execution time, even with the large number of compounds in the dataset. It generates simple and intuitive explanations by highlighting molecules and indicating important substructures that facilitate BBB permeation. Understanding the important substructures that aid BBB permeation is vital for the pharmaceutical industry and for research groups involved in drug synthesis.

Acknowledgments

The authors are thankful to CNPq (Grants 573560/2008-0, 465259/2014-6, and 302554/2017-3), CAPES (Finance Code 001), and FAPESP (Grant 2014/50983-3). The authors also acknowledge the FAPERJ NanoHealth Research Network (E-26/010.000983/2019), the FAPERJ Support Program for Thematic Projects in the State of Rio de Janeiro (210.104/2020), and the National Institute of Science and Technology Complex Fluids (INCT-FCx) for funding. A.S.P. was recently awarded the Scientist of Our State fellowship by FAPERJ (201.186/2022). He also acknowledges the research productivity fellowships granted by CNPq (310166/2020-9 and 305839/2023-3).

Data Availability Statement

Code, input data, and processing scripts for this paper are available at GitHub (https://github.com/andresilvapimentel/bbbp-explainer). The BBBP dataset cleaned by the authors is also provided on GitHub, together with the other data. The data analysis scripts of this paper are also available in the interactive notebook Google Colab.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acschemneuro.3c00840.

  • Optimized parameters from the hyperparameterization process; mean ROC-AUC of the training and validation datasets for the BBB penetration model of compounds using the resampling method with 5-fold cross-validation; precision, recall, F1, and accuracy scores and the Matthews correlation coefficient (MCC) for the validation dataset using the resampling method with 5-fold cross-validation and with nested cross-validation; confusion matrices for the classification models using the resampling method and 5-fold cross-validation; and the most important substructures found by LIME for BBB penetration using the resampling method, 5-fold cross-validation, and the classification models (PDF)

Author Contributions

L.C.S.R.: conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing—review and editing, and visualization. C.O.A.: conceptualization, methodology, and investigation. C.M.C.N.: conceptualization and writing—original draft. A.S.P.: resources, writing—review and editing, visualization, supervision, project administration, and funding acquisition.

Open access funded by CAPES.

The authors declare no competing financial interest.

Supplementary Material

cn3c00840_si_001.pdf (633KB, pdf)

References

  1. Ballabh P.; Braun A.; Nedergaard M. The Blood–Brain Barrier: An Overview. Neurobiol. Dis. 2004, 16 (1), 1–13. 10.1016/j.nbd.2003.12.016. [DOI] [PubMed] [Google Scholar]
  2. Di L.; Rong H.; Feng B. Demystifying Brain Penetration in Central Nervous System Drug Discovery. J. Med. Chem. 2013, 56 (1), 2–12. 10.1021/jm301297f. [DOI] [PubMed] [Google Scholar]
  3. Wu D.; Chen Q.; Chen X.; Han F.; Chen Z.; Wang Y. The Blood–Brain Barrier: Structure, Regulation, and Drug Delivery. Signal Transduction Targeted Ther. 2023, 8 (1), 217 10.1038/s41392-023-01481-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Gupta M.; Lee H. J.; Barden C. J.; Weaver D. F. The Blood–Brain Barrier (BBB) Score. J. Med. Chem. 2019, 62 (21), 9824–9836. 10.1021/acs.jmedchem.9b01220. [DOI] [PubMed] [Google Scholar]
  5. Li H.; Yap C. W.; Ung C. Y.; Xue Y.; Cao Z. W.; Chen Y. Z. Effect of Selection of Molecular Descriptors on the Prediction of Blood–Brain Barrier Penetrating and Nonpenetrating Agents by Statistical Learning Methods. J. Chem. Inf. Model. 2005, 45 (5), 1376–1384. 10.1021/ci050135u. [DOI] [PubMed] [Google Scholar]
  6. Liu L.; Zhang L.; Feng H.; Li S.; Liu M.; Zhao J.; Liu H. Prediction of the Blood–Brain Barrier (BBB) Permeability of Chemicals Based on Machine-Learning and Ensemble Methods. Chem. Res. Toxicol. 2021, 34 (6), 1456–1467. 10.1021/acs.chemrestox.0c00343. [DOI] [PubMed] [Google Scholar]
  7. Zhao Y. H.; Abraham M. H.; Ibrahim A.; Fish P. V.; Cole S.; Lewis M. L.; de Groot M. J.; Reynolds D. P. Predicting Penetration Across the Blood-Brain Barrier from Simple Descriptors and Fragmentation Schemes. J. Chem. Inf. Model. 2007, 47 (1), 170–175. 10.1021/ci600312d. [DOI] [PubMed] [Google Scholar]
  8. Beam T. R.; Allen J. C. Blood, Brain, and Cerebrospinal Fluid Concentrations of Several Antibiotics in Rabbits with Intact and Inflamed Meninges. Antimicrob. Agents Chemother. 1977, 12 (6), 710–716. 10.1128/AAC.12.6.710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Schreibelt G.; Musters R. J. P.; Reijerkerk A.; de Groot L. R.; van der Pol S. M. A.; Hendrikx E. M. L.; Döpp E. D.; Dijkstra C. D.; Drukarch B.; de Vries H. E. Lipoic Acid Affects Cellular Migration into the Central Nervous System and Stabilizes Blood-Brain Barrier Integrity. J. Immunol. 2006, 177 (4), 2630–2637. 10.4049/jimmunol.177.4.2630. [DOI] [PubMed] [Google Scholar]
  10. Zipser B. D.; Johanson C. E.; Gonzalez L.; Berzin T. M.; Tavares R.; Hulette C. M.; Vitek M. P.; Hovanesian V.; Stopa E. G. Microvascular Injury and Blood–Brain Barrier Leakage in Alzheimer’s Disease. Neurobiol. Aging 2007, 28 (7), 977–986. 10.1016/j.neurobiolaging.2006.05.016. [DOI] [PubMed] [Google Scholar]
  11. Hong Y.; Zhou Y.; Wang J.; Liu H. Lead Compound Optimization Strategy (4)--Improving Blood-Brain Barrier Permeability through Structural Modification. Acta Pharm. Sin. 2014, 49 (6), 789–799. [PubMed] [Google Scholar]
  12. Goodwin J. T.; Clark D. E. In Silico Predictions of Blood-Brain Barrier Penetration: Considerations to “Keep in Mind”. J. Pharm. Exp. Ther. 2005, 315 (2), 477–483. 10.1124/jpet.104.075705. [DOI] [PubMed] [Google Scholar]
  13. Bartzatt R. Lomustine Analogous Drug Structures for Intervention of Brain and Spinal Cord Tumors: The Benefit of In Silico Substructure Search and Analysis. Chemother. Res. Pract. 2013, 2013, 360624 10.1155/2013/360624. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Kar S.; Roy K.; Leszczynski J. In Silico Tools and Software to Predict ADMET of New Drug Candidates. In In Silico Methods for Predicting Drug Toxicity; Methods in Molecular Biology, 2022; Vol. 2425, pp 85–115. [DOI] [PubMed] [Google Scholar]
  15. Mullins J. G. L. Drug Repurposing in Silico Screening Platforms. Biochem. Soc. Trans. 2022, 50 (2), 747–758. 10.1042/BST20200967. [DOI] [PubMed] [Google Scholar]
  16. Zhang X.; Wu F.; Yang N.; Zhan X.; Liao J.; Mai S.; Huang Z. In Silico Methods for Identification of Potential Therapeutic Targets. Interdiscip. Sci.: Comput. Life Sci. 2022, 14 (2), 285–310. 10.1007/s12539-021-00491-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Ridgway H.; Moore G. J.; Mavromoustakos T.; Tsiodras S.; Ligielli I.; Kelaidonis K.; Chasapis C. T.; Gadanec L. K.; Zulli A.; Apostolopoulos V.; Petty R.; Karakasiliotis I.; Gorgoulis V. G.; Matsoukas J. M. Discovery of a New Generation of Angiotensin Receptor Blocking Drugs: Receptor Mechanisms and in Silico Binding to Enzymes Relevant to SARS-CoV-2. Comput. Struct. Biotechnol. J. 2022, 20, 2091–2111. 10.1016/j.csbj.2022.04.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Mehta J.; Utkarsh K.; Fuloria S.; Singh T.; Sekar M.; Salaria D.; Rolta R.; Begum M. Y.; Gan S. H.; Rani N. N. I. M.; Chidambaram K.; Subramaniyan V.; Sathasivam K.; Lum P. T.; Uthirapathy S.; Fadare O. A.; Awofisayo O.; Fuloria N. K. Antibacterial Potential of Bacopa Monnieri(L.) Wettst. and Its Bioactive Molecules against Uropathogens—An In Silico Study to Identify Potential Lead Molecule(s) for the Development of New Drugs to Treat Urinary Tract Infections. Molecules 2022, 27 (15), 4971 10.3390/molecules27154971. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Saeed M. E. M.; Yücer R.; Dawood M.; Hegazy M.-E. F.; Drif A.; Ooko E.; Kadioglu O.; Seo E.-J.; Kamounah F. S.; Titinchi S. J.; Bachmeier B.; Efferth T. In Silico and In Vitro Screening of 50 Curcumin Compounds as EGFR and NF-KB Inhibitors. Int. J. Mol. Sci. 2022, 23 (7), 3966 10.3390/ijms23073966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Abd El Hafez M. S. M.; AbdEl-Wahab M. G.; Seadawy M. G.; El-Hosseny M. F.; Beskales O.; Saber Ali Abdel-Hamid A.; el Demellawy M. A.; Ghareeb D. A. Characterization, in-Silico, and in-Vitro Study of a New Steroid Derivative from Ophiocoma Dentata as a Potential Treatment for COVID-19. Sci. Rep. 2022, 12 (1), 5846 10.1038/s41598-022-09809-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Jiménez-Luna J.; Grisoni F.; Schneider G. Drug Discovery with Explainable Artificial Intelligence. Nat. Mach. Intell. 2020, 2 (10), 573–584. 10.1038/s42256-020-00236-4. [DOI] [Google Scholar]
  22. Clement T.; Kemmerzell N.; Abdelaal M.; Amberg M. XAIR: A Systematic Metareview of Explainable AI (XAI) Aligned to the Software Development Process. Mach. Learn. Knowl. Extr. 2023, 5 (1), 78–108. 10.3390/make5010006. [DOI] [Google Scholar]
  23. Pasrija P.; Jha P.; Upadhyaya P.; Khan M. S.; Chopra M. Machine Learning and Artificial Intelligence: A Paradigm Shift in Big Data-Driven Drug Design and Discovery. Curr. Top. Med. Chem. 2022, 22 (20), 1692–1727. 10.2174/1568026622666220701091339. [DOI] [PubMed] [Google Scholar]
  24. Jing Y.; Bian Y.; Hu Z.; Wang L.; Xie X.-Q. S. Deep Learning for Drug Design: An Artificial Intelligence Paradigm for Drug Discovery in the Big Data Era. AAPS J. 2018, 20 (3), 58–68. 10.1208/s12248-018-0210-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Qian T.; Zhu S.; Hoshida Y. Use of Big Data in Drug Development for Precision Medicine: An Update. Expert Rev. Precis. Med. Drug Dev. 2019, 4 (3), 189–200. 10.1080/23808993.2019.1617632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Brown N.; Cambruzzi J.; Cox P. J.; Davies M.; Dunbar J.; Plumbley D.; Sellwood M. A.; Sim A.; Williams-Jones B. I.; Zwierzyna M.; Sheppard D. W. Chapter five - Big Data in Drug Discovery. Prog. Med. Chem. 2018, 57, 277–356. 10.1016/bs.pmch.2017.12.003. [DOI] [PubMed] [Google Scholar]
  27. Tong X.; Wang D.; Ding X.; Tan X.; Ren Q.; Chen G.; Rong Y.; Xu T.; Huang J.; Jiang H.; Zheng M.; Li X. Blood–Brain Barrier Penetration Prediction Enhanced by Uncertainty Estimation. J. Cheminform. 2022, 14 (1), 44–69. 10.1186/s13321-022-00619-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Singh M.; Divakaran R.; Konda L. S. K.; Kristam R. A Classification Model for Blood Brain Barrier Penetration. J. Mol. Graphics Modell. 2020, 96, 107516 10.1016/j.jmgm.2019.107516. [DOI] [PubMed] [Google Scholar]
  29. Sakiyama H.; Fukuda M.; Okuno T. Prediction of Blood-Brain Barrier Penetration (BBBP) Based on Molecular Descriptors of the Free-Form and In-Blood-Form Datasets. Molecules 2021, 26 (24), 7428 10.3390/molecules26247428. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Wellawatte G. P.; Gandhi H. A.; Seshadri A.; White A. D. A Perspective on Explanations of Molecular Prediction Models. J. Chem. Theory Comput. 2023, 19 (8), 2149–2160. 10.1021/acs.jctc.2c01235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Gandhi H. A.; White A. D. Explaining Molecular Properties with Natural Language. ChemRxiv 2022. 10.26434/chemrxiv-2022-v5p6m-v3. [DOI]
  32. Rodríguez-Pérez R.; Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J. Med. Chem. 2020, 63 (16), 8761–8777. 10.1021/acs.jmedchem.9b01101. [DOI] [PubMed] [Google Scholar]
  33. Alves V. M.; Muratov E. N.; Capuzzi S. J.; Politi R.; Low Y.; Braga R. C.; Zakharov Av.; Sedykh A.; Mokshyna E.; Farag S.; Andrade C. H.; Kuz’min V. E.; Fourches D.; Tropsha A. Alarms about Structural Alerts. Green Chem. 2016, 18 (16), 4348–4360. 10.1039/C6GC01492E. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Kumarakulasinghe N. B.; Blomberg T.; Liu J.; Leao A. S.; Papapetrou P. In Evaluating Local Interpretable Model-Agnostic Explanations on Clinical Machine Learning Classification Models, IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS); IEEE: Rochester, MN, USA, 2020; pp 7–12.
  35. Varnek A.; Baskin I. Machine Learning Methods for Property Prediction in Chemoinformatics: Quo Vadis?. J. Chem. Inf. Model. 2012, 52 (6), 1413–1437. 10.1021/ci200409x. [DOI] [PubMed] [Google Scholar]
  36. Hua Y.; Cui X.; Liu B.; Shi Y.; Guo H.; Zhang R.; Li X. SApredictor: An Expert System for Screening Chemicals Against Structural Alerts. Front. Chem. 2022, 10, 916614 10.3389/fchem.2022.916614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Rodríguez-Pérez R.; Miyao T.; Jasial S.; Vogt M.; Bajorath J. Prediction of Compound Profiling Matrices Using Machine Learning. ACS Omega 2018, 3 (4), 4713–4723. 10.1021/acsomega.8b00462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Lavecchia A. Machine-Learning Approaches in Drug Discovery: Methods and Applications. Drug. Discovery Today 2015, 20 (3), 318–331. 10.1016/j.drudis.2014.10.012. [DOI] [PubMed] [Google Scholar]
  39. Brenk R.; Schipani A.; James D.; Krasowski A.; Gilbert I. H.; Frearson J.; Wyatt P. G. Lessons Learnt from Assembling Screening Libraries for Drug Discovery for Neglected Diseases. ChemMedChem 2008, 3 (3), 435–444. 10.1002/cmdc.200700139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Zafar M. R.; Khan N. Deterministic Local Interpretable Model-Agnostic Explanations for Stable Explainability. Mach. Learn. Knowl. Extr. 2021, 3 (3), 525–541. 10.3390/make3030027. [DOI] [Google Scholar]
  41. Wellawatte G. P.; Seshadri A.; White A. D. Model Agnostic Generation of Counterfactual Explanations for Molecules. Chem. Sci. 2022, 13 (13), 3697–3705. 10.1039/D1SC05259D. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Weininger D. SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28 (1), 31–36. 10.1021/ci00057a005. [DOI] [Google Scholar]
  43. Nascimento C. M. C.; Moura P. G.; Pimentel A. S. Generating Structural Alerts from Toxicology Datasets Using the Local Interpretable Model-Agnostic Explanations Method. Digital Discovery 2023, 2 (5), 1311–1325. 10.1039/D2DD00136E. [DOI] [Google Scholar]
  44. Tong X.; Wang D.; Ding X.; Tan X.; Ren Q.; Chen G.; Rong Y.; Xu T.; Huang J.; Jiang H.; Zheng M.; Li X. Blood–Brain Barrier Penetration Prediction Enhanced by Uncertainty Estimation. J. Cheminform. 2022, 14 (1), 44 10.1186/s13321-022-00619-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Ding Y.; Jiang X.; Kim Y. Relational Graph Convolutional Networks for Predicting Blood–Brain Barrier Penetration of Drug Molecules. Bioinformatics 2022, 38 (10), 2826–2831. 10.1093/bioinformatics/btac211. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Dyabina A. S.; Radchenko E. V.; Palyulin V. A.; Zefirov N. S. Prediction of Blood-Brain Barrier Permeability of Organic Compounds. Dokl. Biochem. Biophys. 2016, 470 (1), 371–374. 10.1134/S1607672916050173. [DOI] [PubMed] [Google Scholar]
  47. Mastropietro A.; Pasculli G.; Bajorath J. Protocol to Explain Graph Neural Network Predictions Using an Edge-Centric Shapley Value-Based Approach. STAR Protoc. 2022, 3 (4), 101887 10.1016/j.xpro.2022.101887. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Martins I. F.; Teixeira A. L.; Pinheiro L.; Falcao A. O. A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling. J. Chem. Inf. Model. 2012, 52 (6), 1686–1697. 10.1021/ci300124c. [DOI] [PubMed] [Google Scholar]
  49. Harren T.; Matter H.; Hessler G.; Rarey M.; Grebner C. Interpretation of Structure–Activity Relationships in Real-World Drug Design Data Sets Using Explainable Artificial Intelligence. J. Chem. Inf. Model. 2022, 62 (3), 447–462. 10.1021/acs.jcim.1c01263. [DOI] [PubMed] [Google Scholar]
  50. Rodríguez-Pérez R.; Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J. Med. Chem. 2021, 64 (24), 17744–17752. 10.1021/acs.jmedchem.1c01789. [DOI] [PubMed] [Google Scholar]
  51. Miller T. Explanation in Artificial Intelligence: Insights from the Social Sciences. Artif. Intell. 2019, 267, 1–38. 10.1016/j.artint.2018.07.007. [DOI] [Google Scholar]
  52. Humer C.; Heberle H.; Montanari F.; Wolf T.; Huber F.; Henderson R.; Heinrich J.; Streit M. ChemInformatics Model Explorer (CIME): Exploratory Analysis of Chemical Model Explanations. J. Cheminform. 2022, 14 (1), 21 10.1186/s13321-022-00600-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Yu T.-H.; Su B.-H.; Battalora L. C.; Liu S.; Tseng Y. J. Ensemble Modeling with Machine Learning and Deep Learning to Provide Interpretable Generalized Rules for Classifying CNS Drugs with High Prediction Power. Briefings Bioinform. 2022, 23 (1), bbab377 10.1093/bib/bbab377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Jeffrey P.; Summerfield S. Assessment of the Blood–Brain Barrier in CNS Drug Discovery. Neurobiol. Dis. 2010, 37 (1), 33–37. 10.1016/j.nbd.2009.07.033. [DOI] [PubMed] [Google Scholar]
  55. Reichel A. Addressing Central Nervous System (CNS) Penetration in Drug Discovery: Basics and Implications of the Evolving New Concept. Chem. Biodivers. 2009, 6 (11), 2030–2049. 10.1002/cbdv.200900103. [DOI] [PubMed] [Google Scholar]
  56. Ramsundar B.; Eastman P.; Walters P.; Pande V.. Deep Learning for the Life Sciences, 1st ed.; O’Reilly Media, 2019. [Google Scholar]
  57. Landrum G. RDKit. https://zenodo.org/record/6961488#.YxtZBaDMLcc (accessed December 21, 2023).
  58. Ribeiro M. T.; Singh S.; Guestrin C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. 2016, arXiv:1602.04938. arXiv.org e-Print archive. https://arxiv.org/abs/1602.04938.
  59. GitHub. mols2grid is an Interactive Molecule Viewer for 2D Structures, Based on RDKit. https://zenodo.org/badge/latestdoi/348814588 (accessed December 21, 2023).
  60. Hunter J. D. MATPLOTLIB: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9 (3), 90–95. 10.1109/MCSE.2007.55. [DOI] [Google Scholar]
  61. Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; Vanderplas J.; Passos A.; Cournapeau D.; Brucher M.; Perrot M.; Duchesnay E. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  62. Reback J.; McKinney W.; Jbrockmendel; Van den Bossche J.; Augspurger T.; Cloud P.; Gfyoung; Sinhrks; Klein A.; Hawkins S.; Roeschke M.; Tratner J.; She C.; Ayd W.; Petersen T.; Garcia M.; Schendel J.; Hayden A.; Vytautas J.. et al. Pandas-Dev/Pandas: Pandas 1.4.4, Zenodo.
  63. Harris C. R.; Millman K. J.; van der Walt S. J.; Gommers R.; Virtanen P.; Cournapeau D.; Wieser E.; Taylor J.; Berg S.; Smith N. J.; Kern R.; Picus M.; Hoyer S.; van Kerkwijk M. H.; Brett M.; Haldane A.; del Río J. F.; Wiebe M.; Peterson P.; Gérard-Marchant P.; Sheppard K.; Reddy T.; Weckesser W.; Abbasi H.; Gohlke C.; Oliphant T. E. Array Programming with NumPy. Nature 2020, 585 (7825), 357–362. 10.1038/s41586-020-2649-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Perez F.; Granger B. E. IPython: A System for Interactive Scientific Computing. Comput. Sci. Eng. 2007, 9 (3), 21–29. 10.1109/MCSE.2007.53. [DOI] [Google Scholar]
  65. Wu Z.; Ramsundar B.; Feinberg E. N.; Gomes J.; Geniesse C.; Pappu A. S.; Leswing K.; Pande V. MoleculeNet: A Benchmark for Molecular Machine Learning. Chem. Sci. 2018, 9 (2), 513–530. 10.1039/C7SC02664A. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t. [DOI] [PubMed] [Google Scholar]
  67. Bergstra J.; Yamins D.; Cox D. In Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures; Proceedings of the 30th International Conference on Machine Learning; Dasgupta S.; McAllester D., Eds.; PMLR, 2013; pp 115–123.
  68. Bradley A. P. The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognit. 1997, 30 (7), 1145–1159. 10.1016/S0031-3203(96)00142-2. [DOI] [Google Scholar]
  69. Ribeiro M. T.; Singh S.; Guestrin C.. Local Interpretable Model-Agnostic Explanations (LIME): An Introduction. https://www.oreilly.com/content/introduction-to-local-interpretable-model-agnostic-explanations-lime/. (accessed June 22, 2023).
  70. Lipinski C. A. Lead- and Drug-like Compounds: The Rule-of-Five Revolution. Drug Discovery Today: Technol. 2004, 1 (4), 337–341. 10.1016/j.ddtec.2004.11.007. [DOI] [PubMed] [Google Scholar]
  71. Lipinski C. A.; Lombardo F.; Dominy B. W.; Feeney P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug. Delivery Rev. 2001, 46 (1–3), 3–26. 10.1016/S0169-409X(00)00129-0. [DOI] [PubMed] [Google Scholar]
  72. Hartung I. V.; Huck B. R.; Crespo A. Rules Were Made to Be Broken. Nat. Rev. Chem. 2023, 7 (1), 3–4. 10.1038/s41570-022-00451-0. [DOI] [PubMed] [Google Scholar]
