Abstract

The prediction of cytochrome P450 inhibition by a computational (quantitative) structure–activity relationship ((Q)SAR) approach using chemical structure information and machine learning would be useful for toxicity research as a simple and rapid in silico tool. However, few in silico models have addressed species differences between rats and humans in P450 inhibition. This study aimed to establish in silico models that classify chemical substances as inhibitors or non-inhibitors of various rat and human P450s using only molecular descriptors. Based on in-house results from our in vitro experiments, 326 substances were used for model construction and internal validation, and a separate set of 60 substances was used as an external validation data set. We focused on seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4). Most of the models built with XGBoost showed an area under the receiver operating characteristic curve (ROC-AUC) of 0.8 or more in internal validation. After an applicability domain was set for the models and their generalization performance was confirmed by external validation, most of the models showed an ROC-AUC of 0.7 or more. Interestingly, for CYP1A1 and CYP1A2, we found that the human P450 inhibition models could predict rat P450 inhibitory activity and vice versa. These models are the first attempt to predict inhibitory activity against a wide variety of P450s in both rats and humans from chemical structure information. Our experimental results and in silico models could provide supporting information on species similarities and differences in chemical-induced toxicity.
1. Introduction
Cytochrome P450 (P450) enzymes are major drug-metabolizing enzymes that are highly expressed in the liver. They play essential roles in determining the toxicity of chemical substances by mediating both their detoxification and their metabolic activation. P450-mediated formation of reactive metabolites, which can cause cellular stress, mitochondrial damage, and cytotoxicity, is associated with systemic toxicity, including hepatotoxicity.1
Most pharmacokinetic drug–drug interactions result from the inhibition of P450 enzymes. Because P450s comprise many molecular species with low substrate specificity, multiple substrates can compete for the same metabolic reactions. Inhibition of drug metabolism may increase the blood or tissue concentrations of the administered drug or of co-administered drugs, which may significantly affect their efficacy and safety.2 Therefore, drug–drug interaction guidelines call for in vitro enzyme inhibition tests for the major human P450s (CYP1A2, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, and CYP3A).3−5
In addition, the development of liver injury is associated with chemicals that are substrates or inhibitors of P450s.6 Recently, it was reported that the inhibition of P450s was related to specific end points in repeated-dose toxicity studies in rats.7 It has also been reported that P450 inhibitors are associated with the onset of drug-induced liver injury in humans.8 Therefore, it is crucial to evaluate the P450 inhibitory activity of chemical substances to understand the development of toxicity.9
In toxicity research, efficient prediction methods that use existing data and artificial intelligence are attracting attention as a way to reduce the cost and labor of experiments. The (quantitative) structure–activity relationship ((Q)SAR) approach, which predicts the physical properties or toxicity of chemical substances from their structural information, can be a very efficient method for evaluating the toxicity of new chemical substances.10 In the (Q)SAR approach, the toxicity of a new substance is predicted from a relationship established between toxicity and chemical structural information, represented by a wide variety of molecular descriptors. Machine learning methods have recently been used to build computational (Q)SAR models.11 Many in silico models that predict P450 inhibition have also been reported, with relatively high performance.12−16 However, most of these models target the major human P450s, and no comprehensive prediction models covering both rat and human P450s have been reported. In other words, species differences between rat and human P450s have not been sufficiently examined in prediction models.
In this study, we assessed the inhibitory activity of the test substances against various rat and human P450s using luminogenic substrates to measure P450 activity. We then developed in silico binary classification models of P450 inhibitors and non-inhibitors by machine learning using the in vitro data for rat and human P450s. We focused on seven rat P450s, namely CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2, and 11 human P450s, namely CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4. We then set applicability domains (ADs) so that our models would give reliable predictions. In addition to internal validation, external validation was performed to confirm the generalizability and robustness of the models. We also examined whether the human P450 inhibition models could predict P450 inhibition in rats, and vice versa. Finally, variable importance was examined for model interpretation.
2. Materials and Methods
2.1. Data Preparation
2.1.1. P450 Inhibition Data Set
In this study, we prepared P450 inhibition data sets from our experimental data. The data for model construction and internal validation comprised 326 substances tested against both rat and human P450s.17 These substances were selected from the Hazard Evaluation Support System Integrated Platform (HESS) database (https://www.nite.go.jp/en/chem/qsar/hess-e.html; accessed August 2023). To assess the generalization ability of the models, an external validation data set was constructed from another source: substances registered under the Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH) Regulation, which were extracted at random (ECHA, https://echa.europa.eu/information-on-chemicals/registered-substances, accessed August 2023). Substances duplicated in the internal validation data were excluded from the external validation data set, leaving 60 substances for external validation. These commercially available substances met the conditions for the P450 inhibition assays and included no inorganic substances or polymers. All chemical structures were represented in the Simplified Molecular Input Line Entry System (SMILES) format. Salts were converted into their corresponding bases or acids. Detailed substance information, such as the name, CAS Registry Number, and SMILES, is presented in Table S1. The substances used in this study are also contained in the AI-SHIPS Chemical Toxicity Database (AI-SHIPS ToxDB) (https://riss.aist.go.jp/en/research-outcomes/707/, accessed April 2024), which includes toxicological findings from repeated-dose subacute toxicity studies in male rats. The database also lists the substance names and SMILES; the SMILES were curated after rechecking the sources of the toxicity study data and considering chemical properties.
The chemical space of the AI-SHIPS ToxDB substances was visualized in two dimensions using two representative physicochemical properties, molecular weight and LogP, and the distributions of the internal and external validation substances were checked.
2.1.2. P450 Inhibition Assays
The inhibitory activity of the test substances at 0.1, 1, and 10 μM against seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4) was determined using the P450-Glo CYP1A1 Assay, P450-Glo CYP1A2 Induction/Inhibition Assay, P450-Glo CYP1B1 Assay, P450-Glo CYP2A6 Assay, P450-Glo CYP2B1 Assay, P450-Glo CYP2B6 Assay, P450-Glo CYP2C6 Assay, P450-Glo CYP2C8 Assay, P450-Glo CYP2C9 Assay, P450-Glo CYP2C19 Assay, P450-Glo CYP2D1 Assay, P450-Glo CYP2D6 Assay, P450-Glo CYP2E1 Assay, P450-Glo CYP3A2 Assay, and P450-Glo CYP3A4 Assay with Luciferin-IPA (Promega, Madison, WI, USA), with Supersomes (Corning, Corning, NY, USA) as the enzyme sources, according to the manufacturer's protocols as reported previously,7,8 with minor modifications. In this study, we focused on a single outcome, the presence or absence of inhibitory activity, determined using luminescence-based P450-Glo assays, which are often used for substance screening. Typical P450 inhibition assay conditions are summarized in Table S2.
2.1.3. Definition of P450 Inhibition
The maximum value among the measured values (0–100%) at the three concentrations was used as the inhibitory activity value for each substance. When the maximum inhibitory activity was 15% or higher, the substance was considered an inhibitor, and when it was less than 15%, the substance was considered a non-inhibitor. In the classification model, P450 inhibitors were labeled as “positive” and non-inhibitors as “negative.”
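The labeling rule above can be sketched in a few lines. This is a minimal illustration, assuming the measured percent-inhibition values for each substance are stored as one row with one column per concentration (0.1, 1, and 10 μM). The array layout and the function name `label_inhibitors` are hypothetical.

```python
import numpy as np

INHIBITION_THRESHOLD = 15.0  # percent inhibition; threshold stated in Section 2.1.3


def label_inhibitors(inhibition_pct):
    """Return 1 (positive, inhibitor) or 0 (negative, non-inhibitor) per substance.

    inhibition_pct: array-like of shape (n_substances, 3) holding percent
    inhibition measured at 0.1, 1, and 10 uM (hypothetical layout).
    """
    values = np.asarray(inhibition_pct, dtype=float)
    max_inhibition = values.max(axis=1)  # maximum over the three concentrations
    return (max_inhibition >= INHIBITION_THRESHOLD).astype(int)
```

For example, `label_inhibitors([[2, 5, 20], [1, 3, 14.9], [15, 0, 0]])` yields labels 1, 0, and 1. A value of exactly 15% counts as an inhibitor, as the definition requires.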
2.2. Molecular Descriptors
Molecular descriptors were used to represent chemical structure information in the machine learning models. Mordred descriptors were calculated from SMILES using mordred (version 2019.01).18 A total of 1826 molecular descriptors were calculated, comprising 1613 two-dimensional and 213 three-dimensional descriptors. Detailed information on the descriptors can be found in the original Mordred publication.18
2.3. Model Construction
In a preliminary comparison of machine learning algorithms, including LightGBM, Random Forest (RF), and XGBoost, XGBoost showed the best performance and was therefore adopted. A 5 × 2 nested cross-validation (CV) was employed as internal validation to examine the robustness of the models. A nested CV provides a better estimate of generalization performance and should be preferred, especially with relatively small sample sizes.19 In the 5 × 2 nested CV, the data set was split randomly in a stratified manner so that each fold preserved the ratio of positives to negatives, to avoid bias arising from the split. In external validation, the best-performing model of the 5 × 2 nested CV for each P450 was used for prediction. Several metrics were used to evaluate model performance. Sensitivity (SE), specificity (SP), and balanced accuracy (BA) were calculated using the following formulas:

$$\mathrm{SE} = \frac{TP}{TP + FN}$$

$$\mathrm{SP} = \frac{TN}{TN + FP}$$

$$\mathrm{BA} = \frac{\mathrm{SE} + \mathrm{SP}}{2}$$
TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. BA was used because the data sets were imbalanced. We calculated the area under the receiver operating characteristic curve (ROC-AUC) and used it as the primary evaluation index for the classification models; the closer the ROC-AUC is to 1, the better the model. The cutoff for classifying a result as positive or negative was defined using the Youden index, that is, the point on each model's ROC curve at which the sum of SE and SP is maximized.20 We also calculated the area under the precision–recall curve (PRC-AUC), which evaluates how well a model identifies the minority positive class in imbalanced data; again, the closer the value is to 1, the better the model.
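The metrics and the Youden cutoff can be computed with scikit-learn. The sketch below is illustrative only. Maximizing SE + SP is the same as maximizing TPR − FPR along the ROC curve. The function names are hypothetical, and predictions at or above the cutoff are assumed to count as positive, matching the convention of `sklearn.metrics.roc_curve`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve


def youden_cutoff(y_true, y_score):
    """Cutoff on the ROC curve that maximizes SE + SP - 1 (the Youden index)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[np.argmax(tpr - fpr)]  # TPR = SE, 1 - FPR = SP


def classification_metrics(y_true, y_score, cutoff):
    """SE, SP, BA at the given cutoff, plus the threshold-free ROC-AUC."""
    y_pred = (np.asarray(y_score) >= cutoff).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    se = tp / (tp + fn)  # sensitivity
    sp = tn / (tn + fp)  # specificity
    return {"SE": se, "SP": sp, "BA": (se + sp) / 2,
            "ROC-AUC": roc_auc_score(y_true, y_score)}
```

With `y_true = [0, 0, 0, 1, 1]` and scores `[0.1, 0.5, 0.3, 0.4, 0.9]`, the Youden cutoff is 0.4. At that cutoff, SE = 1, SP = 2/3, and BA = 5/6.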
2.4. Definition of AD
The OECD principles for the validation of (Q)SAR models require a defined AD.21 A model generates reliable predictions only within certain limits of chemical structure, physicochemical properties, and mechanism of action. In this study, we used data density to define the region in which the model can show sufficient performance. For each substance in the internal validation data set, the average Euclidean distance to its five nearest substances was calculated using the k-nearest neighbor method (k = 5).22,23 Euclidean distances were calculated using the molecular descriptors used for model construction. The AD score was calculated as the logarithmic average of the distances to the five nearest substances. The AD scores of the internal validation substances were sorted in ascending order, and the 100th percentile score (that is, the maximum) was adopted as the AD score threshold. For each query substance in external validation, the AD score was calculated in the same way from its five nearest substances in the internal validation data. A query substance with a score higher than the threshold was considered outside the AD; otherwise, it was considered within the AD. AD coverage was defined as the percentage of validation substances that fell within the AD.
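The data-density AD can be sketched with scikit-learn's `NearestNeighbors`. This is a minimal illustration under two stated assumptions. First, "logarithmic average" is read here as the logarithm of the mean distance to the k nearest neighbors. Second, the descriptor matrix is used as-is, since any scaling is not specified above. The small epsilon that guards against log(0) is also an addition not in the text. For training substances, the substance itself is excluded from its own neighbor list.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

EPS = 1e-12  # assumption: guards against log(0) for exact duplicates


def ad_scores_train(X_train, k=5):
    """AD score of each training substance from its k nearest *other* substances."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric="euclidean").fit(X_train)
    dist, _ = nn.kneighbors(X_train)
    # Column 0 is the substance itself (distance 0), so it is dropped.
    return np.log(dist[:, 1:].mean(axis=1) + EPS)


def ad_scores_query(X_train, X_query, k=5):
    """AD score of each query substance from its k nearest training substances."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(X_train)
    dist, _ = nn.kneighbors(X_query)
    return np.log(dist.mean(axis=1) + EPS)


def inside_ad(X_train, X_query, k=5, percentile=100):
    """True where the query score does not exceed the training-score threshold."""
    threshold = np.percentile(ad_scores_train(X_train, k), percentile)
    return ad_scores_query(X_train, X_query, k) <= threshold


def ad_coverage(X_train, X_query, k=5, percentile=100):
    """Percentage of query substances that fall within the AD."""
    return 100.0 * inside_ad(X_train, X_query, k, percentile).mean()
```

With `percentile=100` the threshold is the largest training score, so a query is outside the AD only if it is sparser than every training substance.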
2.5. Prediction Certainty
In addition to the AD score, prediction certainty was evaluated using the prediction probability (0–1) calculated by the prediction model for each substance. For this evaluation, the cutoff of each model was set to 0.5. That is, a substance was considered positive if its predicted probability was greater than 0.5 and negative if it was less than 0.5. A substance with a predicted probability near 0.5 was treated as an inconclusive prediction. The inconclusive range was set to 0.5 ± 0, 0.05, 0.1, 0.15, or 0.2. In external validation, we evaluated prediction performance after excluding substances with inconclusive predictions.
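The exclusion of inconclusive predictions can be sketched as follows. This is an illustrative helper (hypothetical name). It assumes a prediction is inconclusive when |p − 0.5| is strictly smaller than the margin, so a margin of 0 keeps every substance. When the kept substances contain only one class, ROC-AUC is undefined and the sketch returns NaN, which corresponds to the NA entries in Table 3.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_excluding_inconclusive(y_true, prob, margin):
    """ROC-AUC and coverage after dropping predictions with |prob - 0.5| < margin."""
    y_true = np.asarray(y_true)
    prob = np.asarray(prob, dtype=float)
    keep = np.abs(prob - 0.5) >= margin  # margin = 0 keeps every substance
    coverage = keep.mean()
    if len(np.unique(y_true[keep])) < 2:  # AUC undefined without both classes
        return float("nan"), coverage
    return roc_auc_score(y_true[keep], prob[keep]), coverage


# Margins examined in this study.
MARGINS = (0.0, 0.05, 0.1, 0.15, 0.2)
```

On a toy set with labels `[0, 0, 1, 1]` and probabilities `[0.1, 0.6, 0.45, 0.9]`, the ROC-AUC is 0.75 at full coverage. At a margin of 0.15, the two ambiguous predictions are removed, giving an ROC-AUC of 1.0 at 50% coverage.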
2.6. Variable Importance
Variable importance quantifies the contribution of each explanatory variable to the predictions. The XGBoost package offers five types of variable importance ("weight," "gain," "cover," "total_gain," and "total_cover"). This study used “Gain,” which quantitatively indicates the contribution to the predicted results. “Gain” derives from the XGBoost learning process: it measures the improvement in the training objective (the reduction in loss) obtained from the splits that use a given feature. In XGBoost's "gain" importance type, this improvement is averaged over all such splits. Descriptors with high variable importance contribute more to improving the predictions (https://xgboost.readthedocs.io/en/latest/index.html, accessed August 2023). Furthermore, we focused on the “AtomTypeEState” descriptors of Mordred and ranked only these descriptors by importance. The “AtomTypeEState” descriptors are the electrotopological state (EState) indices for atoms in a molecule.24 The EState value of each atom reflects the steric and electronic effects of the surrounding atoms (http://mordred-descriptor.github.io/documentation/v0.2.1/api/mordred.EState.html#mordred.EState.AtomTypeEState.es_types, accessed August 2023).
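Averaging "gain" importance over the outer-loop models and ranking only a chosen descriptor family can be sketched in plain Python. This sketch assumes each model's importance is a `{feature: gain}` dictionary, such as the one returned by XGBoost's `Booster.get_score(importance_type="gain")`. That method omits features never used in a split, and treating them as 0 when averaging is an assumption here. The function names and the descriptor subset in the example are hypothetical.

```python
from collections import defaultdict


def mean_gain_importance(per_model_scores):
    """Average gain over models; a feature absent from a model's dict counts as 0."""
    n_models = len(per_model_scores)
    totals = defaultdict(float)
    for scores in per_model_scores:
        for feature, gain in scores.items():
            totals[feature] += gain
    return {feature: total / n_models for feature, total in totals.items()}


def rank_subset(mean_importance, subset):
    """Rank only the descriptors in `subset` (e.g., AtomTypeEState names)."""
    selected = [(f, v) for f, v in mean_importance.items() if f in subset]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Toy example with two models (values are illustrative, not from this study).
models = [{"SaasC": 4.0, "SsOH": 1.0, "MW": 10.0}, {"SaasC": 2.0, "NaasC": 3.0}]
ranking = rank_subset(mean_gain_importance(models), {"SaasC", "SsOH", "NaasC"})
```

In the toy example, the non-EState descriptor "MW" is averaged but excluded from the ranking. The result is SaasC (3.0), NaasC (1.5), and SsOH (0.5).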
2.7. Software
Predictive models were built in Python (v3.7.4) on Windows with the following libraries: XGBoost (0.20.2) for the classification models, and scikit-learn (0.22.1), NumPy (1.18.1), and Pandas (0.25.1) for data processing. The models are available at http://www.phar.nagoya-cu.ac.jp/hp/dse/en/outline.html.
3. Results
3.1. Data Sets for Model Construction
We investigated the inhibitory activity of the 326 test substances at 0.1, 1, and 10 μM against seven rat P450s and 11 human P450s. For all rat and human P450s, the maximum inhibitory activity across the three concentrations was used for each substance. The 15% threshold for judging substances as positive or negative was chosen by considering model performance and the variation in the experimental values (Table S2). Among the 326 internal validation substances, positives were fewer than negatives for every rat P450. The highest positive rate was observed for CYP1A1, with 126 positive and 200 negative substances, and the lowest for CYP2E1, with 26 positive and 300 negative substances (Table 1). Positives were likewise fewer than negatives for every human P450. The highest positive rate was observed for CYP1A1, with 137 positive and 189 negative substances, and the lowest for CYP2E1, with 45 positive and 281 negative substances (Table 2).
Table 1. Prediction Performance of the Rat P450 Models.
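| P450 | internal validation: number of substances (positives/negatives) | internal validation: SE | internal validation: SP | internal validation: BA | internal validation: PRC-AUC | internal validation: ROC-AUC | external validation: number of substances (positives/negatives) | external validation: SE | external validation: SP | external validation: BA | external validation: PRC-AUC | external validation: ROC-AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|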
| CYP1A1 | 126/200 | 0.79 | 0.79 | 0.79 | 0.80 | 0.86 | 24/36 | 0.63 | 0.89 | 0.76 | 0.71 | 0.84 |
| CYP1A2 | 68/258 | 0.66 | 0.78 | 0.72 | 0.63 | 0.82 | 10/50 | 0.70 | 0.72 | 0.71 | 0.38 | 0.80 |
| CYP2B1 | 108/218 | 0.72 | 0.76 | 0.74 | 0.69 | 0.84 | 22/38 | 0.73 | 0.63 | 0.68 | 0.58 | 0.73 |
| CYP2C6 | 95/231 | 0.57 | 0.89 | 0.73 | 0.69 | 0.83 | 19/41 | 0.63 | 0.93 | 0.78 | 0.77 | 0.93 |
| CYP2D1 | 38/288 | 0.63 | 0.94 | 0.78 | 0.55 | 0.87 | 7/53 | 0.29 | 0.96 | 0.62 | 0.37 | 0.81 |
| CYP2E1 | 26/300 | 0.38 | 0.78 | 0.58 | 0.11 | 0.55 | 4/56 | 0.25 | 0.89 | 0.57 | 0.10 | 0.69 |
| CYP3A2 | 81/245 | 0.72 | 0.78 | 0.75 | 0.58 | 0.81 | 16/44 | 0.69 | 0.80 | 0.74 | 0.55 | 0.78 |
Table 2. Prediction Performance of the Human P450 Models.
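| P450 | internal validation: number of substances (positives/negatives) | internal validation: SE | internal validation: SP | internal validation: BA | internal validation: PRC-AUC | internal validation: ROC-AUC | external validation: number of substances (positives/negatives) | external validation: SE | external validation: SP | external validation: BA | external validation: PRC-AUC | external validation: ROC-AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|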
| CYP1A1 | 137/189 | 0.77 | 0.81 | 0.79 | 0.81 | 0.85 | 20/40 | 0.70 | 0.83 | 0.76 | 0.66 | 0.82 |
| CYP1A2 | 69/257 | 0.58 | 0.84 | 0.71 | 0.56 | 0.81 | 15/45 | 0.27 | 0.87 | 0.57 | 0.47 | 0.75 |
| CYP1B1 | 111/215 | 0.70 | 0.85 | 0.77 | 0.73 | 0.85 | 16/44 | 0.69 | 0.91 | 0.80 | 0.76 | 0.90 |
| CYP2A6 | 55/271 | 0.69 | 0.86 | 0.77 | 0.51 | 0.85 | 6/54 | 0.50 | 0.93 | 0.71 | 0.39 | 0.72 |
| CYP2B6 | 104/222 | 0.64 | 0.82 | 0.73 | 0.70 | 0.83 | 17/43 | 0.53 | 0.81 | 0.67 | 0.50 | 0.75 |
| CYP2C8 | 78/248 | 0.54 | 0.88 | 0.71 | 0.67 | 0.84 | 7/53 | 0.43 | 0.94 | 0.69 | 0.55 | 0.78 |
| CYP2C9 | 110/216 | 0.74 | 0.78 | 0.76 | 0.69 | 0.81 | 27/33 | 0.70 | 0.73 | 0.72 | 0.77 | 0.78 |
| CYP2C19 | 109/217 | 0.67 | 0.84 | 0.75 | 0.75 | 0.83 | 25/35 | 0.40 | 0.86 | 0.63 | 0.73 | 0.76 |
| CYP2D6 | 60/266 | 0.57 | 0.92 | 0.74 | 0.62 | 0.83 | 11/49 | 0.09 | 0.92 | 0.50 | 0.29 | 0.75 |
| CYP2E1 | 45/281 | 0.42 | 0.79 | 0.60 | 0.38 | 0.67 | 4/56 | 0.25 | 0.80 | 0.53 | 0.08 | 0.58 |
| CYP3A4 | 126/200 | 0.75 | 0.75 | 0.75 | 0.77 | 0.83 | 22/38 | 0.64 | 0.79 | 0.71 | 0.74 | 0.82 |
Similarly, among the 60 external validation substances, positives were fewer than negatives for every rat P450. The highest positive rate was observed for CYP1A1, with 24 positive and 36 negative substances, and the lowest for CYP2E1, with four positive and 56 negative substances (Table 1). Positives were also fewer than negatives for every human P450. The highest positive rate was observed for CYP2C9, with 27 positive and 33 negative substances, and the lowest for CYP2E1, with four positive and 56 negative substances (Table 2).
We checked the chemical space distribution of the internal and external validation substances based on two representative physicochemical properties, molecular weight and LogP (Figure 1). Most of the 326 internal validation substances selected from the HESS database had a molecular weight of 500 or less and a LogP of 5 or less. The 60 external validation substances selected from ECHA showed a similar distribution. The chemical space of the HESS database substances in AI-SHIPS ToxDB was also visualized (Figure S1).
Figure 1.
Chemical space distribution of the substances for internal and external validation. Molecular weight (MW, x-axis) and LogP (y-axis) were used to define the chemical space; both were calculated with Mordred.
3.2. Prediction Performance of Internal Validation and External Validation
In the nested CV, the 326 internal validation substances were divided into five subsets. Four subsets were used as training data for model building, and the remaining subset was used as test data for evaluation. The training data (four subsets) were further divided into two subsets, and the hyperparameters were tuned by 2-fold CV. The optimal parameters were then used to predict the test subset. The metrics averaged over the five outer-loop test subsets were used for internal validation of the classification model. The XGBoost hyperparameters max_depth, n_estimators, subsample, min_child_weight, and colsample_bytree were tuned by Bayesian optimization.25 A final model was constructed from the 326 substances using the best parameters and used to predict the 60 external validation substances. The best parameter values in the 5 × 2 CV are summarized in Table S3.
The prediction performances of the models, including SE, SP, BA, PRC-AUC, and ROC-AUC, are presented in Tables 1 and 2. In internal validation, all rat P450 models except the CYP2E1 model had an ROC-AUC greater than 0.8. Across the rat models, SE ranged from 0.38 to 0.79, SP from 0.76 to 0.94, BA from 0.58 to 0.79, and PRC-AUC from 0.11 to 0.80. Similarly, all human P450 models except the CYP2E1 model had an ROC-AUC greater than 0.8. Across the human models, SE ranged from 0.42 to 0.77, SP from 0.75 to 0.92, BA from 0.60 to 0.79, and PRC-AUC from 0.38 to 0.81. For external validation, we evaluated ADs to obtain reliable predictions, using the k-nearest neighbor method (k = 5) with Euclidean distances. Considering both ROC-AUC and AD coverage, k = 5 and an AD score threshold at the 100th percentile were judged appropriate (data not shown). In external validation, all 60 substances were within the ADs of the models. For the rat P450 models, the ROC-AUC inside the ADs ranged from 0.69 to 0.93, SE from 0.25 to 0.73, SP from 0.63 to 0.96, BA from 0.57 to 0.78, and PRC-AUC from 0.10 to 0.77 (Table 1). For the human P450 models, the ROC-AUC ranged from 0.58 to 0.90, SE from 0.09 to 0.70, SP from 0.73 to 0.94, BA from 0.50 to 0.80, and PRC-AUC from 0.08 to 0.77 (Table 2). Except for the CYP2E1 models, the ROC-AUC was greater than 0.7.
For CYP1A1 and CYP1A2, we investigated whether the human P450 models could predict P450 inhibitory activity in rats and vice versa. Rat and human CYP1A1 and CYP1A2 are orthologous P450s, and their models performed well in both internal and external validation. We first compared the predictions of the human CYP1A1 model with the experimental rat CYP1A1 results for the 60 external validation substances: SE was 0.71, SP was 0.89, and BA was 0.80. Conversely, comparing the predictions of the rat CYP1A1 model with the experimental human CYP1A1 results gave an SE of 0.65, an SP of 0.85, and a BA of 0.75. We analyzed CYP1A2 in the same way. Comparing the predictions of the human CYP1A2 model with the experimental rat CYP1A2 results gave an SE of 0.40, an SP of 0.88, and a BA of 0.64. Comparing the predictions of the rat CYP1A2 model with the experimental human CYP1A2 results gave an SE of 0.53, an SP of 0.71, and a BA of 0.62. These results suggest that the inhibitory activity of CYP1A1 and CYP1A2 in humans can be predicted from the rat models and vice versa. In addition, the cross-species use of the rat and human models performed better for CYP1A1 than for CYP1A2.
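The 5 × 2 nested CV procedure can be sketched with scikit-learn. To keep the sketch runnable without extra dependencies, it makes two substitutions, both assumptions rather than the study's setup. `GradientBoostingClassifier` stands in for XGBoost. A small grid search stands in for Bayesian optimization. The grid, the synthetic data, and the function name are also illustrative. Because `xgboost.XGBClassifier` follows the scikit-learn estimator interface, it could replace the stand-in estimator. Its XGBoost-specific hyperparameters (min_child_weight, colsample_bytree) could then be added to the grid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def nested_cv_auc(X, y, param_grid, seed=0):
    """5 x 2 nested CV: 5 stratified outer folds, 2-fold stratified inner tuning."""
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    inner = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    fold_aucs, fold_params = [], []
    for train_idx, test_idx in outer.split(X, y):
        search = GridSearchCV(
            GradientBoostingClassifier(random_state=seed),  # stand-in for XGBoost
            param_grid, scoring="roc_auc", cv=inner)
        # Tuning sees only the outer-training part; the test fold stays unseen.
        search.fit(X[train_idx], y[train_idx])
        prob = search.predict_proba(X[test_idx])[:, 1]
        fold_aucs.append(roc_auc_score(y[test_idx], prob))
        fold_params.append(search.best_params_)
    # Internal-validation estimate: mean over the five outer test folds.
    return float(np.mean(fold_aucs)), fold_params


if __name__ == "__main__":
    # Synthetic stand-in for a 326-substance descriptor matrix (not real data).
    X, y = make_classification(n_samples=326, n_features=20, weights=[0.7],
                               random_state=0)
    mean_auc, params = nested_cv_auc(
        X, y, {"max_depth": [2, 3], "n_estimators": [50, 100]})
    print(f"mean outer-fold ROC-AUC: {mean_auc:.2f}")
```

Keeping hyperparameter tuning strictly inside each outer training split is what makes the outer-fold average an estimate of generalization performance rather than of tuning performance.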
3.3. Prediction Certainty
In external validation, prediction certainty was evaluated using the prediction probability (0–1) calculated by the prediction model for each substance. The inconclusive range was set to 0.5 ± 0, 0.05, 0.1, 0.15, or 0.2, and the ROC-AUC and coverage were calculated after excluding substances with inconclusive results (Table 3). Although coverage decreased, predictive performance improved for some models. For rat CYP2B1, rat CYP2E1, and human CYP2E1, the ROC-AUC increased by 0.1 or more when substances with inconclusive predictions were excluded. The highest values were 0.83, 0.80, and 0.78, at coverages of 47%, 70%, and 57%, respectively. For the rat CYP2E1 model, the predicted probabilities of all substances lay in a narrow band around 0.5 (approximately 0.5 ± 0.1). Therefore, no substances remained when the 0.5 ± 0.15 or 0.5 ± 0.2 ranges were excluded, and ROC-AUC and coverage could not be calculated. In contrast, for rat CYP2C6, human CYP1A1, human CYP1B1, human CYP2A6, and human CYP2C8, the ROC-AUC did not increase even when substances with inconclusive predictions were excluded.
Table 3. Prediction Probability and Coverage of the Rat and Human P450 Models. Column headings give the distance from a predicted probability of 0.5 that defined the excluded inconclusive range.
| P450 | ROC-AUC (0) | coverage (0) | ROC-AUC (±0.05) | coverage (±0.05) | ROC-AUC (±0.1) | coverage (±0.1) | ROC-AUC (±0.15) | coverage (±0.15) | ROC-AUC (±0.2) | coverage (±0.2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Rat | ||||||||||
| CYP1A1 | 0.84 | 100% | 0.84 | 97% | 0.85 | 93% | 0.84 | 78% | 0.84 | 73% |
| CYP1A2 | 0.80 | 100% | 0.82 | 92% | 0.85 | 78% | 0.88 | 62% | 0.86 | 42% |
| CYP2B1 | 0.73 | 100% | 0.74 | 88% | 0.74 | 73% | 0.80 | 53% | 0.83 | 47% |
| CYP2C6 | 0.93 | 100% | 0.93 | 97% | 0.93 | 95% | 0.93 | 88% | 0.93 | 87% |
| CYP2D1 | 0.81 | 100% | 0.79 | 97% | 0.80 | 95% | 0.82 | 93% | 0.84 | 90% |
| CYP2E1 | 0.69 | 100% | 0.76 | 80% | 0.80 | 70% | NA | NA | NA | NA |
| CYP3A2 | 0.78 | 100% | 0.78 | 88% | 0.79 | 75% | 0.80 | 67% | 0.80 | 62% |
| Human | ||||||||||
| CYP1A1 | 0.82 | 100% | 0.82 | 95% | 0.82 | 93% | 0.82 | 92% | 0.82 | 90% |
| CYP1A2 | 0.75 | 100% | 0.77 | 98% | 0.76 | 97% | 0.76 | 90% | 0.76 | 82% |
| CYP1B1 | 0.90 | 100% | 0.90 | 98% | 0.90 | 98% | 0.90 | 95% | 0.90 | 95% |
| CYP2A6 | 0.72 | 100% | 0.69 | 93% | 0.64 | 87% | 0.65 | 80% | 0.67 | 73% |
| CYP2B6 | 0.75 | 100% | 0.75 | 100% | 0.75 | 98% | 0.77 | 90% | 0.78 | 85% |
| CYP2C8 | 0.78 | 100% | 0.78 | 100% | 0.76 | 97% | 0.75 | 90% | 0.75 | 88% |
| CYP2C9 | 0.78 | 100% | 0.79 | 90% | 0.80 | 78% | 0.81 | 73% | 0.82 | 67% |
| CYP2C19 | 0.76 | 100% | 0.78 | 97% | 0.78 | 97% | 0.77 | 93% | 0.76 | 92% |
| CYP2D6 | 0.75 | 100% | 0.75 | 100% | 0.78 | 97% | 0.76 | 95% | 0.76 | 83% |
| CYP2E1 | 0.58 | 100% | 0.63 | 88% | 0.65 | 73% | 0.67 | 72% | 0.78 | 57% |
| CYP3A4 | 0.82 | 100% | 0.84 | 95% | 0.84 | 92% | 0.85 | 88% | 0.88 | 77% |
NA, not available.
3.4. Variable Importance
The variable importance of the prediction models in internal validation was examined. Approximately 600 molecular descriptors were used in the models. A list of variable importances for each model is presented in Table S4. Variable importance was calculated with the XGBoost “gain” importance type, which quantitatively indicates the contribution to the predicted results. We averaged the variable importance of the five outer-loop models of the 5 × 2 nested CV and focused on the “AtomTypeEState” molecular descriptors. Only these descriptors were ranked by their importance for predicting the objective variable (Table S4). We then compared variable importance between the models for the orthologous P450s CYP1A1, CYP1A2, and CYP2E1 (Table 4). For the rat and human CYP1A1 models, “SaasC” and “SdssC” ranked high in both. Similarly, for the rat and human CYP1A2 models, “SaaaC,” “SaasC,” and “NaasC” ranked high in both. In “SaasC,” “S” denotes the sum of the E-state values of all “aasC” atoms, and in “NaasC,” “N” denotes the count of “aasC” atoms. “aasC” represents aCa— (“a” denotes an aromatic bond), “dssC” represents =C<, and “aaaC” represents aaCa (“a” denotes an aromatic bond).24 These results suggest that these descriptors contributed to the predictions. In contrast to the CYP1A1 and CYP1A2 models, no high-contribution “AtomTypeEState” descriptors were shared between the rat and human CYP2E1 models.
Table 4. Top 3 Important Variables Among “AtomTypeEState”.
| rank | rat CYP1A1 | human CYP1A1 | rat CYP1A2 | human CYP1A2 | rat CYP2E1 | human CYP2E1 |
|---|---|---|---|---|---|---|
| 1 | SaasC | NaasC | SaasC | SaaaC | SdssC | NaasC |
| 2 | SsOH | SaasC | NaasC | SaasC | SsNH2 | SssNH |
| 3 | SdssC | SdssC | SaaaC | NaasC | SdsCH | SssCH2 |
4. Discussion
We constructed XGBoost models using only molecular descriptors to classify inhibitors and non-inhibitors of seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4). The models showed satisfactory predictive performance for most of the P450s investigated in both internal and external validation. In internal validation, most of the models showed an ROC-AUC of approximately 0.8, whereas rat CYP2E1 had an ROC-AUC of less than 0.6. In external validation, most of the models showed an ROC-AUC of 0.7 or more, whereas human CYP2E1 did not reach 0.6 (Tables 1 and 2). CYP2E1 also had the lowest BA and PRC-AUC, which evaluate predictive performance on imbalanced data, in both the rat and human models. This low performance might be attributed to the scarcity of positive substances. The rat and human CYP2E1 data sets contained only 26 and 45 positive substances, respectively, among the 326 substances tested. We therefore attempted to correct the positive–negative imbalance for rat and human CYP2E1 by oversampling the minority class, but performance did not improve (data not shown). In contrast, rat CYP2D1, which had the fewest positives after rat CYP2E1 (38 positives and 288 negatives), showed an ROC-AUC of 0.87 in internal and 0.81 in external validation. Human CYP2A6, which had the fewest positives after human CYP2E1 (55 positives and 271 negatives), showed an ROC-AUC of 0.85 in internal and 0.72 in external validation. For rat CYP2D1, BA and PRC-AUC were 0.78 and 0.55 in internal validation and 0.62 and 0.37 in external validation, respectively. For human CYP2A6, they were 0.77 and 0.51 in internal validation and 0.71 and 0.39 in external validation, respectively. These scores were not extremely low compared with those of the other P450 models.
These results suggest that the poor predictive performance of the rat and human CYP2E1 models was due not only to the small number of positive substances but also to the difficulty of learning the positive substances. In the HESS-derived training data, the rat and human CYP2E1 inhibitors were thought to be structurally very complex. In the rat and human CYP2E1 models, many substances received ambiguous predictions, with predicted probabilities around 0.5 (Table 3).
The predictive performance of our models was confirmed by external validation using a data source different from the one used for model building, which suggests that the models can generalize to new substances. To apply our models, the AD score of a new substance must fall within the AD of the training data. In addition, the P450 inhibitory activity of high-molecular-weight and inorganic substances cannot be predicted, because such substances were excluded from the experiments. The predicted probability of a new substance should also be checked, and predictions near 0.5 should be interpreted with caution. In this study, when substances with a predicted probability within 0.5 ± 0.2 were excluded and only those with 0.7 or more or less than 0.3 were used, almost all the models showed an ROC-AUC of 0.75 or more in external validation (Table 3). The rat CYP2E1 model, which originally performed poorly, reached an ROC-AUC of 0.80 when substances within 0.5 ± 0.1 were excluded and only those with 0.6 or more or less than 0.4 were used. Its coverage was 70%, indicating that many chemical substances could still be predicted. Similarly, the human CYP2E1 model, which also originally performed poorly, reached an ROC-AUC of 0.78 when substances within 0.5 ± 0.2 were excluded and only those with 0.7 or more or less than 0.3 were used, although its coverage was 57%. These results suggest that the rat and human CYP2E1 models can also be used to predict a subset of new substances if the predicted probabilities are checked.
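The oversampling attempt for CYP2E1 is not described in detail. The sketch below shows only one common variant, simple random oversampling with replacement, as an illustrative assumption rather than the method actually used. To avoid optimistic estimates, such resampling would normally be applied only to the training folds, never to the test fold. The function name is hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    minority_idx = np.flatnonzero(y == minority)
    n_extra = counts.max() - counts.min()
    extra = rng.choice(minority_idx, size=n_extra, replace=True)  # sample with replacement
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]
```

Because random oversampling only repeats existing positives, it adds no new structural information. This is consistent with the observation above that imbalance correction alone did not rescue the CYP2E1 models.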
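The two checks recommended above can be combined into a single decision rule for a new substance: first the AD, then the predicted probability. The helper below is a hypothetical illustration. The default margin of 0.2 mirrors the 0.5 ± 0.2 range discussed above, and the AD score and threshold are assumed to be computed as in Section 2.4.

```python
def screen_prediction(ad_score, ad_threshold, prob, margin=0.2):
    """Return a hedged call for one new substance (hypothetical helper)."""
    if ad_score > ad_threshold:
        return "outside AD"  # the model should not be applied
    if abs(prob - 0.5) < margin:
        return "inconclusive"  # too close to 0.5 to trust
    return "inhibitor" if prob >= 0.5 else "non-inhibitor"
```

For example, a substance inside the AD with a predicted probability of 0.9 is called an inhibitor. The same substance with a probability of 0.6 is reported as inconclusive under the default margin.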
Several studies have previously reported machine learning models for predicting human P450 inhibition. Because the data sets and algorithms differ, a direct comparison between our models and the previous models is inappropriate; therefore, we only present the ROC-AUC or BA values of the models in internal validation. For human CYP2B6 and CYP2C8, our models performed comparably to or better than previously reported models. In internal validation, the human CYP2B6 model of Li et al.14 had an ROC-AUC of 0.75 and a BA of 0.65, whereas our model had an ROC-AUC of 0.83 and a BA of 0.73. The human CYP2C8 model of Zhang et al.12 had an ROC-AUC of 0.85 and a BA of 0.80, whereas our model had an ROC-AUC of 0.84 and a BA of 0.71 in internal validation. Li et al.14 and Zhang et al.12 used XGBoost and support vector machine (SVM) algorithms, trained their models on fewer than 500 substances, and used fingerprints as features. In their data sets, the inhibitor/non-inhibitor ratio was approximately 1:2 for human CYP2B6 and approximately 4:3 for human CYP2C8, and the number of positive substances for human CYP2C8 was about twice that in ours. In contrast, for human CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4, our models performed worse than previously reported models. The human CYP1A2 model of Novotarskyi et al.15 had a BA of 0.83, whereas our model had a BA of 0.71 in internal validation. The human CYP2C9 model of Goldwaser et al.13 had a BA of 0.83, whereas our model had a BA of 0.76 in internal validation. Novotarskyi et al.15 and Goldwaser et al.13 used neural network, SVM, and random forest (RF) algorithms, trained their models on thousands of substances, and used molecular descriptors. Recently, Ai et al.16 developed DEEPCYPs, a multitask deep learning model that predicts inhibitory activity against human CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4.
The training data of DEEPCYPs consisted of approximately 10,000 substances, and the deep learning model used a combination of molecular graphs and fingerprints. The average ROC-AUC and BA values of DEEPCYPs for the five P450s were 0.91 and 0.82, respectively, whereas those of our models for these five human P450s were 0.82 and 0.74, respectively, in internal validation.
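For reference, balanced accuracy (BA) is conventionally defined as the mean of sensitivity and specificity; we assume that this standard definition underlies the BA values compared above. Unlike plain accuracy, BA is not inflated by class imbalance, which matters when the compared data sets have different inhibitor/non-inhibitor ratios:

```latex
\mathrm{BA} = \frac{1}{2}\left(\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} + \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\right)
```

where TP, FN, TN, and FP are the numbers of true positives (inhibitors predicted as inhibitors), false negatives, true negatives, and false positives at the chosen classification threshold. A model that labels every substance as a non-inhibitor therefore obtains a BA of 0.5, regardless of how few inhibitors the data set contains.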
Furthermore, we predicted the 60 substances in our external validation data set using the DEEPCYPs online platform.16 The average ROC-AUC and BA values for the five P450s were 0.75 and 0.63, respectively, for DEEPCYPs and 0.77 and 0.63, respectively, for our models in external validation. The average PRC-AUC values for the five P450s were 0.67 for DEEPCYPs and 0.60 for our models. The evaluation metrics for individual P450s are summarized in Table S5. In external validation, DEEPCYPs and our models showed comparable performance. In addition, whereas DEEPCYPs showed large differences in predictive performance between internal and external validation, these differences were small for our models, suggesting that our models can make robust predictions for new substances.
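The averaged metrics above can be computed from per-isoform predictions along the following lines. This is a sketch under assumptions rather than the evaluation code used in this study: BA is computed at a fixed threshold of 0.5 (the threshold actually used may differ, e.g., one selected with the Youden index), PRC-AUC is approximated by average precision, and the toy labels and probabilities are illustrative, not data from this study.

```python
def roc_auc(y_true, y_prob):
    """ROC-AUC as the probability that a random inhibitor outscores a random non-inhibitor."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Average precision (one estimate of PRC-AUC): mean precision at each inhibitor's rank."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        return float("nan")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def balanced_accuracy(y_true, y_prob, threshold=0.5):
    """Mean of sensitivity and specificity at a fixed probability threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def macro_average(results):
    """results maps isoform -> (labels, probabilities); returns the mean of each metric."""
    metrics = {"ROC-AUC": roc_auc, "PRC-AUC": average_precision, "BA": balanced_accuracy}
    return {
        name: sum(metric(y, p) for y, p in results.values()) / len(results)
        for name, metric in metrics.items()
    }


# Toy predictions for two isoforms (illustrative only).
toy = {
    "CYP1A2": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "CYP2C9": ([1, 0], [0.9, 0.1]),
}
print({k: round(v, 3) for k, v in macro_average(toy).items()})
# {'ROC-AUC': 0.875, 'PRC-AUC': 0.917, 'BA': 0.875}
```

Averaging per-isoform values (a macro average) gives each P450 equal weight, which is how "the average over the five P450s" is interpreted here; pooling all predictions before computing a single metric would instead weight isoforms by their number of substances.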
We examined the variables that contributed to the predictions using variable importance. Because the inhibitory activity of rat and human P450s was tested comprehensively, we focused on orthologous P450s. Given the wide variety of molecular descriptors used in the models, we analyzed the Mordred “AtomTypeEState” descriptors, which reflect the steric and electronic effects of surrounding atoms. For CYP1A1 and CYP1A2, the human and rat models shared highly important “AtomTypeEState” descriptors, whereas for CYP2E1 the important “AtomTypeEState” descriptors differed between human and rat (Table 4). Therefore, although rat and human CYP2E1 are orthologous enzymes, the “AtomTypeEState” descriptors important for predicting their inhibitory activity may differ between the two species.
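One simple way to quantify whether a human model and a rat model rely on the same descriptors is to compare their top-k most important descriptors. The sketch below is hypothetical: the importance dictionaries stand in for values that could be obtained from a fitted XGBoost model (e.g., via `model.get_booster().get_score(importance_type="gain")`, which returns a dict keyed by feature name; the importance type used in this study is not assumed), and the descriptor names and values are illustrative, not the results in Table 4.

```python
def top_k(importances, k):
    """Set of the k descriptor names with the highest importance (ties broken alphabetically)."""
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name for name, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_species(human_imp, rat_imp, k=5):
    """Shared top-k descriptors and their Jaccard overlap between two species' models."""
    h, r = top_k(human_imp, k), top_k(rat_imp, k)
    return sorted(h & r), jaccard(h, r)


# Hypothetical importance values for a few Mordred AtomTypeEState descriptors.
human = {"SaaCH": 0.12, "SdO": 0.08, "SsCH3": 0.05, "SssNH": 0.01}
rat = {"SaaCH": 0.10, "SsOH": 0.09, "SdO": 0.02, "SsCH3": 0.0}
shared, overlap = compare_species(human, rat, k=2)
print(shared, round(overlap, 3))  # ['SaaCH'] 0.333
```

Under this view, an ortholog pair such as CYP1A1 would be expected to show a high overlap, whereas a pair such as CYP2E1, whose important descriptors differ between species, would show a low one; the choice of k strongly affects the value and should be reported with it.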
For both CYP1A1 and CYP1A2, the rat P450 inhibition models were able to predict human P450 inhibitory activity, and vice versa. Although rat and human CYP1A1 and CYP1A2 (CYP1A subfamily) are orthologs, species differences in substrate specificity between humans and rats have been reported.26,27 Among the 326 substances used to construct our models, the number of substances whose classification as an inhibitor or non-inhibitor (based on the experimentally obtained inhibitory activity values) differed between rats and humans was 39 (12%) for CYP1A1 and 33 (10%) for CYP1A2. The predictive performance of the cross-species use of our CYP1A models may indicate that species differences have little effect on the prediction of inhibitory activity. Nevertheless, some substances, albeit few, showed species differences; to take such differences into account, predicting P450 inhibition in both humans and rats would be useful. For CYP1A1 and CYP1A2, Table S6 lists the substances showing species differences together with their predicted and experimental results. In internal validation, the CYP1A1 models were better than the CYP1A2 models at correctly predicting, in both the human and rat models, the substances showing species differences, but no difference was observed in external validation, probably because of the small number of such substances. The predictive performance of both the human and rat CYP1A1 models was higher than that of the corresponding CYP1A2 models (Tables 1 and 2). In other words, accurately predicting differences in CYP inhibitory activity between humans and rats requires models with high predictive performance for both species.
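The species-discordance counts and the check of whether both species-specific models reproduce the labels of a discordant substance can be expressed as follows. This is a hedged sketch: the substance IDs, labels, and probabilities are toy values, the 0.5 classification threshold is an assumption, and the function names are hypothetical. Only the counts 39 and 33 out of 326 come from the text above.

```python
def discordant_substances(human_labels, rat_labels):
    """IDs classified as inhibitor in one species and non-inhibitor in the other."""
    return sorted(s for s in human_labels
                  if s in rat_labels and human_labels[s] != rat_labels[s])


def correct_in_both(ids, human_labels, rat_labels, human_prob, rat_prob, threshold=0.5):
    """Number of IDs whose human and rat labels are both reproduced by the matching model."""
    count = 0
    for s in ids:
        human_ok = (human_prob[s] >= threshold) == (human_labels[s] == 1)
        rat_ok = (rat_prob[s] >= threshold) == (rat_labels[s] == 1)
        if human_ok and rat_ok:
            count += 1
    return count


# Toy labels (1 = inhibitor) and predicted probabilities; illustrative only.
human_labels = {"a": 1, "b": 0, "c": 1, "d": 0}
rat_labels = {"a": 1, "b": 1, "c": 0, "d": 0}
ids = discordant_substances(human_labels, rat_labels)
print(ids)  # ['b', 'c']
print(correct_in_both(ids, human_labels, rat_labels,
                      {"b": 0.2, "c": 0.8}, {"b": 0.9, "c": 0.6}))  # 1

# Share of discordant substances among the 326 used for model construction.
for isoform, n in (("CYP1A1", 39), ("CYP1A2", 33)):
    print(isoform, f"{100 * n / 326:.0f}%")  # CYP1A1 12%, then CYP1A2 10%
```

Requiring both models to be correct is stricter than evaluating each model alone, which is why this count depends on the predictive performance of both the human and the rat model.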
On the other hand, CYP1A1 and CYP1A2 differed in predictive performance when the rat and human models were applied across species. Although the amino acid sequences of CYP1A1 and CYP1A2 are similar between rats and humans, differences in their substrate specificity and organ distribution have been confirmed, and inhibitor selectivity has also been discussed.28,29 Our results also showed that CYP1A1 had better predictive performance than CYP1A2 when the rat and human models were used across species. This suggests that the difference in responsiveness to inhibitors between humans and rats might be larger for CYP1A2 than for CYP1A1.
In conclusion, we constructed high-performance in silico models using XGBoost to classify inhibitors and non-inhibitors of various P450s in both rats and humans. Because predictions require only chemical structure information, the models provide a simple method that does not require synthesizing a new substance. Furthermore, their generalization performance was confirmed through external validation, and predictions for new substances can be used on the basis of the AD assessment and the predicted probability. Although limited to specific P450s, our findings show that a human P450 inhibition model can predict rat P450 inhibitory activity, and vice versa. An advantage of our in silico models is that they can predict differences in the inhibitory activity of chemicals among various P450s as well as differences in P450 inhibition between humans and rats. Finally, our in silico models may provide supporting information for assessing in vivo toxic effects.
Acknowledgments
The authors thank AI-SHIPS project members for many helpful discussions. We would like to thank Editage (www.editage.jp) for English language editing.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.chemrestox.4c00168.
Substance information of the data set, details of the cytochrome P450 inhibition assays, optimized parameters used in the models, detailed variable importance values, prediction performance of external validation of our model and DEEPCYPs, experimental and predicted values for the CYP1A1 and CYP1A2 models, and chemical space of our data set (XLSX)
Author Contributions
K.A., M.N., K.Y., and M.T. wrote the manuscript. K.A., K.Y., and M.T. designed the research. T.S. and K.Y. performed the in vitro research. K.A., M.N., R.T., K.S., and M.T. constructed computational models and analyzed the data.
This study was supported in part by the artificial intelligence-based substance hazard integrated prediction system (AI-SHIPS) project of the Ministry of Economy, Trade, and Industry of Japan and by a grant from the Japan Society for the Promotion of Science KAKENHI program (Grant Number JP23K06133).
The authors declare no competing financial interest.
References
- Guengerich F. P. A history of the roles of cytochrome P450 enzymes in the toxicity of drugs. Toxicological Research 2021, 37 (1), 1–23. 10.1007/s43188-020-00056-z.
- Chalasani N.; Björnsson E. Risk factors for idiosyncratic drug-induced liver injury. Gastroenterology 2010, 138 (7), 2246–2259. 10.1053/j.gastro.2010.04.001.
- FDA. In vitro Drug Interaction Studies—Cytochrome P450 Enzyme- and Transporter-Mediated Drug Interactions: Final guidance. 2020. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/clinical-drug-interaction-studies-cytochrome-p450-enzyme-and-transporter-mediated-drug-interactions (last accessed August 2023).
- EMA. Guideline on the Investigation of Drug Interactions. 2013. https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-investigation-drug-interactions-revision-1_en.pdf (last accessed August 2023).
- MHLW (PMDA). Guideline on drug interaction for drug development and appropriate provision of information. 2018. https://www.pmda.go.jp/files/000228122.pdf (last accessed August 2023).
- Yu K.; Geng X.; Chen M.; Zhang J.; Wang B.; Ilic K.; Tong W. High daily dose and being a substrate of cytochrome P450 enzymes are two important predictors of drug-induced liver injury. Drug Metabolism and Disposition 2014, 42 (4), 744–750. 10.1124/dmd.113.056267.
- Watanabe M.; Sasaki T.; Takeshita J.; Kushida M.; Shimizu Y.; Oki H.; Kitsunai Y.; Nakayama H.; Saruhashi H.; Ogura R.; Shizu R.; Hosaka T.; Yoshinari K. Application of cytochrome P450 reactivity on the characterization of chemical compounds and its association with repeated-dose toxicity. Toxicol. Appl. Pharmacol. 2020, 388, 114854. 10.1016/j.taap.2019.114854.
- Shimizu Y.; Sasaki T.; Yonekawa E.; Yamazaki H.; Ogura R.; Watanabe M.; Hosaka T.; Shizu R.; Takeshita J.; Yoshinari K. Association of CYP1A1 and CYP1B1 inhibition in in vitro assays with drug-induced liver injury. J. Toxicol. Sci. 2021, 46 (4), 167–176. 10.2131/jts.46.167.
- Yoda T.; Tochitani T.; Usui T.; Kouchi M.; Inada H.; Hosaka T.; Kanno Y.; Miyawaki I.; Yoshinari K. Involvement of the CYP1A1 inhibition-mediated activation of aryl hydrocarbon receptor in drug-induced hepatotoxicity. J. Toxicol. Sci. 2022, 47 (9), 359–373. 10.2131/jts.47.359.
- Bloomingdale P.; Housand C.; Apgar J. F.; Millard B. L.; Mager D. E.; Burke J. M.; Shah D. K. Quantitative systems toxicology. Current Opinion in Toxicology 2017, 4, 79–87. 10.1016/j.cotox.2017.07.003.
- Torres Silva F.; Trossini G. H. G. The survey of the use of QSAR methods to determine intestinal absorption and oral bioavailability during drug design. Med. Chem. 2014, 10 (5), 441–448. 10.2174/1573406410666140415122115.
- Zhang X.; Zhao P.; Wang Z.; Xu X.; Liu G.; Tang Y.; Li W. In silico prediction of CYP2C8 inhibition with machine-learning methods. Chem. Res. Toxicol. 2021, 34 (8), 1850–1859. 10.1021/acs.chemrestox.1c00078.
- Goldwaser E.; Laurent C.; Lagarde N.; Fabrega S.; Nay L.; Villoutreix B. O.; Jelsch C.; Nicot A. B.; Loriot M.-A.; Miteva M. A. Machine learning-driven identification of drugs inhibiting cytochrome P450 2C9. PLoS Computational Biology 2022, 18 (1), e1009820. 10.1371/journal.pcbi.1009820.
- Li L.; Lu Z.; Liu G.; Tang Y.; Li W. Machine Learning Models to Predict Cytochrome P450 2B6 Inhibitors and Substrates. Chem. Res. Toxicol. 2023, 36 (8), 1332–1344. 10.1021/acs.chemrestox.3c00065.
- Novotarskyi S.; Sushko I.; Körner R.; Pandey A. K.; Tetko I. V. A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. J. Chem. Inf. Model. 2011, 51 (6), 1271–1280. 10.1021/ci200091h.
- Ai D.; Cai H.; Wei J.; Zhao D.; Chen Y.; Wang L. DEEPCYPs: A deep learning platform for enhanced cytochrome P450 activity prediction. Front. Pharmacol. 2023, 14, 1099093. 10.3389/fphar.2023.1099093.
- Kodama S.; Yoshii N.; Ota A.; Takeshita J.; Yoshinari K.; Ono A. Association between in vitro nuclear receptor-activating profiles of chemical compounds and their in vivo hepatotoxicity in rats. J. Toxicol. Sci. 2021, 46 (12), 569–587. 10.2131/jts.46.569.
- Moriwaki H.; Tian Y.-S.; Kawashita N.; Takagi T. Mordred: a molecular descriptor calculator. J. Cheminf. 2018, 10 (1), 1–14. 10.1186/s13321-018-0258-y.
- Cawley G. C.; Talbot N. L. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 2010, 11, 2079–2107.
- Youden W. J. Index for rating diagnostic tests. Cancer 1950, 3 (1), 32–35.
- OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models. In: OECD Series on Testing and Assessment, 2014, No. 69.
- Sahigara F.; Ballabio D.; Todeschini R.; Consonni V. Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions. J. Cheminf. 2013, 5, 1–9. 10.1186/1758-2946-5-27.
- Mathea M.; Klingspohn W.; Baumann K. Chemoinformatic classification methods and their applicability domain. Molecular Informatics 2016, 35 (5), 160–180. 10.1002/minf.201501019.
- Hall L. H.; Kier L. B. Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J. Chem. Inf. Comput. Sci. 1995, 35 (6), 1039–1045. 10.1021/ci00028a014.
- Snoek J.; Larochelle H.; Adams R. P. Practical Bayesian Optimization of Machine Learning Algorithms. Adv. Neural Inf. Process. Syst. 2012, 25.
- Yamazoe Y.; Yoshinari K. Prediction of regioselectivity and preferred order of CYP1A1-mediated metabolism: Solving the interaction of human and rat CYP1A1 forms with ligands on the template system. Drug Metab. Pharmacokinet. 2020, 35 (1), 165–185. 10.1016/j.dmpk.2019.10.008.
- Yamazoe Y.; Yoshinari K. Prediction of regioselectivity and preferred order of metabolisms on CYP1A2-mediated reactions part 3: Difference in substrate specificity of human and rodent CYP1A2 and the refinement of predicting system. Drug Metab. Pharmacokinet. 2019, 34 (4), 217–232. 10.1016/j.dmpk.2019.02.001.
- Liu J.; Sridhar J.; Foroozesh M. Cytochrome P450 family 1 inhibitors and structure-activity relationships. Molecules 2013, 18 (12), 14470–14495. 10.3390/molecules181214470.
- Lewis D. F.; Lake B. G.; George S. G.; Dickins M.; Eddershaw P. J.; Tarbit M. H.; Beresford A. P.; Goldfarb P. S.; Guengerich F. P. Molecular modelling of CYP1 family enzymes CYP1A1, CYP1A2, CYP1A6 and CYP1B1 based on sequence homology with CYP102. Toxicology 1999, 139 (1–2), 53–79. 10.1016/S0300-483X(99)00098-0.