Abstract
Molecular design and evaluation for drug development and chemical safety assessment have been advanced by quantitative structure–activity relationship (QSAR) approaches using artificial intelligence techniques, such as deep learning (DL). We previously reported the high performance of prediction models for molecular initiation events (MIEs) leading to adverse toxicological outcomes using a DL-based QSAR method called DeepSnap-DL. As a novel QSAR analytical system, this method extracts feature values from images generated from three-dimensional (3D) chemical structures. However, the time consumption of this system leaves room for improvement. Therefore, in this study, we constructed an improved DeepSnap-DL system that combines the generation of images from 3D chemical structures, DL using these images as input data, and the statistical calculation of prediction performance. By optimizing the parameters of DeepSnap, such as the angle used to depict the image of a 3D chemical structure, the data split, and the DL hyperparameters, the prediction models for agonists or antagonists of three MIEs achieved high prediction performance. The improved DeepSnap-DL system is expected to be a powerful tool for computer-aided molecular design as a novel QSAR system.
Keywords: cheminformatics, computer aided molecular design, deep learning, molecular remodeling, QSAR
1. Introduction
Quantitative structure–activity relationship (QSAR) models can reduce the time and cost of molecular screening through mathematical regression or classification models that predict the properties and activities of chemical compounds from their chemical structures and statistically significant corresponding physicochemical/toxicological properties, in combination with other methods such as homology modeling, molecular docking, and molecular dynamics (MD) simulation [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. Structure-based molecular design mainly includes receptor-based methods that use three-dimensional (3D) chemical structures to obtain ligand interactions [1,35,36]. However, traditional QSAR models may frequently miss suitable candidate molecules because of poor predictive accuracy and versatility, caused by feature selection that requires skill and knowledge and by conformational limitations (the coincidence effect) [1,37,38,39]. Therefore, a high-throughput, high-performance QSAR system is desired for the development of novel medicines, chemicals, and nanomaterials, given their impact on human health. A key factor in solving this QSAR issue is the extraction of information-rich numerical molecular descriptors associated with physicochemical/toxicological properties. However, 3D-QSAR has a high computational cost, and its performance is sensitive to changes in ligand geometry, such as conformation and orientation [1,40]. To resolve these drawbacks, 4D-QSAR, also called MD-QSAR, addresses the ligand geometry problem through effective ligand constraints, using 4D chemical descriptors that cover multiple structural conformations, orientations, and protonation states calculated from short MD simulation runs to approximate Boltzmann sampling [1,41,42].
Although 4D-QSAR can reduce the bias arising from the selection of conformation, orientation, and protonation state, it requires further adaptation of the ligand topology within the target protein-binding pocket [43]. Thus, 5D-QSAR was proposed to explicitly represent different induced-fit models in addition to 4D-QSAR [1,41,43,44,45]. Furthermore, 6D-QSAR was introduced, which additionally incorporates different solvation models into 5D-QSAR [1,41,46].
A DL-based QSAR system, called DeepSnap-DL, was reported to capture molecular features from images taken of 3D chemical structures [47]. In the DeepSnap-DL system, the parameters for depicting a ball-and-stick model of a chemical structure influenced the prediction performance for the toxicological activity of molecular initiation event (MIE) molecules, which interact with proteins and/or DNA in an adverse outcome pathway induced by chemical compounds in the human body; these models were built with the Tox21 10k library, which includes about 10,000 (10k) chemicals, e.g., approved drugs and environmental chemicals [48,49,50]. The prediction models built with the DeepSnap-DL system achieved higher performance than conventional machine learning (ML) techniques, such as random forest, XGBoost, LightGBM, and CatBoost [51,52]. Additionally, prediction models of MIE molecule agonist or antagonist activity were constructed with the DeepSnap-DL system using the Tox21 10k library, suggesting that this system is an essential tool for novel QSAR analysis because it automatically extracts features carrying extensive structural information from 3D chemical structures [53,54]. To increase the throughput of the DeepSnap-DL system, it was automated by combining its processes, namely the generation of images from 3D chemical structures based on the simplified molecular input line entry system (SMILES) format, DL using these images as input data, and the calculation of prediction-performance indexes, using TensorFlow and Keras [54]. In this modified DeepSnap-DL system, the mean values of the receiver operating characteristic area under the curve (ROC_AUC) of the prediction models for 59 MIE targets were 0.818 ± 0.056, 0.803 ± 0.063, and 0.792 ± 0.076 in the validation, test, and foldout datasets, respectively [54].
Furthermore, for two of the MIE targets, peroxisome proliferator-activated receptor γ (PPARγ) agonist (PPARg_ago, AID:743140) and aromatase antagonist (Arom_ant, AID:743139), prediction performance was improved by optimizing parameters of the modified DeepSnap-DL system, such as the angle used to depict images of 3D chemicals, the data-split ratio among the training (train), validation (valid), and test datasets, the image background color, and the DL hyperparameters learning rate (LR) and batch size (BS) [54].
In this study, we used the modified DeepSnap-DL system with Python and the basic DeepSnap-DL system with DIGITS to construct prediction models for three MIEs, namely the glucocorticoid receptor (PubChem assay AID:720725_GR_ant), transforming growth factor (TGF)-beta/Smad (PubChem assay AID:1347032_TGF_beta_ant), and the thyrotropin-releasing hormone receptor (PubChem assay AID:1347030_TRHR_ago), by optimizing the parameters of the DeepSnap-DL system. Consistent with previous reports on other MIE molecules, the agonist or antagonist prediction models for these three MIE molecules constructed using the modified DeepSnap-DL with Python suggest that this system could be an essential tool for a novel QSAR system in computer-aided molecular design.
2. Results and Discussion
2.1. Angles and Data Split in DeepSnap-DL with DIGITS and Python Systems
To analyze the influence of the angle on snapshot generation in DeepSnap_Python and DeepSnap_DIGITS (256 × 256-pixel PNG files), we used the following sets of angles (Table 1): for 720725_GR_ant, 31 angles from (65°, 65°, 65°) to (350°, 350°, 350°) in Python and 23 angles from (70°, 70°, 70°) to (345°, 345°, 345°) in DIGITS; for 1347030_TRHR_ago, 15 angles from (95°, 95°, 95°) to (325°, 325°, 325°) in Python and 17 angles from (95°, 95°, 95°) to (355°, 355°, 355°) in DIGITS; and for 1347032_TGF_beta_ant, 16 angles from (75°, 75°, 75°) to (350°, 350°, 350°) in both Python and DIGITS. Additionally, to examine the influence of the split among the train, valid, and test datasets, we prepared the following data-split ratios for DeepSnap_Python and DeepSnap_DIGITS (Table 2): seven types (train:valid:test = 1:1:1, 2:2:1, 3:3:1, 4:4:1, 5:5:1, 5:3:2, and 7:1:2) for 720725_GR_ant; three types (1:1:1, 3:1:2, and 5:3:4) for 1347030_TRHR_ago; and eight types (1:1:1, 2:2:1, 3:1:1, 3:2:1, 5:3:2, 5:5:1, 6:1:2, and 7:1:2, with 5:1:1 in place of 5:5:1 in DeepSnap_DIGITS) for 1347032_TGF_beta_ant.
Table 1.
| PubChem Assay AID | DeepSnap_Python: No. | DeepSnap_Python: Minimum (°) | DeepSnap_Python: Maximum (°) | DeepSnap_DIGITS: No. | DeepSnap_DIGITS: Minimum (°) | DeepSnap_DIGITS: Maximum (°) |
|---|---|---|---|---|---|---|
| 720725_GR_ant | 31 | 65 | 350 | 23 | 70 | 345 |
| 1347030_TRHR_ago | 15 | 95 | 325 | 17 | 95 | 355 |
| 1347032_TGF_beta | 16 | 75 | 350 | 16 | 75 | 350 |
Table 2.
| PubChem Assay AID | DeepSnap_Python: No. | DeepSnap_Python: Types (train:valid:test) | DeepSnap_DIGITS: No. | DeepSnap_DIGITS: Types (train:valid:test) |
|---|---|---|---|---|
| 720725_GR_ant | 7 | 1:1:1, 2:2:1, 3:3:1, 4:4:1, 5:5:1, 5:3:2, 7:1:2 | 7 | 1:1:1, 2:2:1, 3:3:1, 4:4:1, 5:5:1, 5:3:2, 7:1:2 |
| 1347030_TRHR_ago | 3 | 1:1:1, 3:1:2, 5:3:4 | 3 | 1:1:1, 3:1:2, 5:3:4 |
| 1347032_TGF_beta | 8 | 1:1:1, 2:2:1, 3:1:1, 3:2:1, 5:3:2, 5:5:1, 6:1:2, 7:1:2 | 8 | 1:1:1, 2:2:1, 3:1:1, 3:2:1, 5:1:1, 5:3:2, 6:1:2, 7:1:2 |
As a result, DeepSnap_Python and DeepSnap_DIGITS achieved the following prediction performance for the three MIE targets (Table 3). The mean values on the valid dataset (DeepSnap_Python vs. DeepSnap_DIGITS) were as follows. For ROC_AUC: 0.832 ± 0.048 vs. 0.856 ± 0.029 in 720725_GR_ant, 0.875 ± 0.031 vs. 0.886 ± 0.028 in 1347030_TRHR_ago, and 0.879 ± 0.015 vs. 0.907 ± 0.020 in 1347032_TGF_beta_ant. For balanced accuracy (BAC): 0.762 ± 0.044 vs. 0.791 ± 0.023, 0.811 ± 0.032 vs. 0.829 ± 0.023, and 0.805 ± 0.015 vs. 0.849 ± 0.030, respectively. For the Matthews correlation coefficient (MCC): 0.248 ± 0.065 vs. 0.282 ± 0.030, 0.141 ± 0.017 vs. 0.155 ± 0.022, and 0.309 ± 0.025 vs. 0.384 ± 0.044, respectively. For accuracy (Acc): 0.790 ± 0.058 vs. 0.812 ± 0.044, 0.781 ± 0.030 vs. 0.769 ± 0.060, and 0.770 ± 0.029 vs. 0.833 ± 0.033, respectively.
The highest ROC_AUC values on the valid dataset, with the corresponding angle and train:valid:test ratio, were 0.926 (185°, 7:1:2) for Python and 0.910 (95°, 5:5:1) for DIGITS in 720725_GR_ant; 0.915 (176°, 3:1:2) for Python and 0.918 (185°, 5:3:4) for DIGITS in 1347030_TRHR_ago; and 0.911 (185°, 7:1:2) for Python and 0.932 (75°, 5:3:2) for DIGITS in 1347032_TGF_beta_ant (Figure 1 and Figure 2; Table 3). The highest BAC values were 0.864 (185°, 7:1:2) for Python and 0.837 (95°, 3:3:1) for DIGITS in 720725_GR_ant; 0.868 (176°, 3:1:2) for Python and 0.876 (355°, 5:3:4) for DIGITS in 1347030_TRHR_ago; and 0.844 (185°, 7:1:2) for Python and 0.930 (176°, 5:3:2) for DIGITS in 1347032_TGF_beta_ant (Figure 3 and Figure 4; Table 3). The highest MCC values were 0.451 (176°, 7:1:2) for Python and 0.354 (75°, 4:4:1) for DIGITS in 720725_GR_ant; 0.194 (176°, 3:1:2) for Python and 0.208 (355°, 1:1:1) for DIGITS in 1347030_TRHR_ago; and 0.373 (176°, 7:1:2) for Python and 0.478 (165°, 2:2:1) for DIGITS in 1347032_TGF_beta_ant (Figures S1 and S2; Table 3).
Furthermore, the highest Acc values on the valid dataset, with the corresponding angle and train:valid:test ratio, were 0.917 (176°, 7:1:2) for Python and 0.939 (155°, 4:4:1) for DIGITS in 720725_GR_ant; 0.856 (176°, 3:1:2) for Python and 0.902 (125°, 1:1:1) for DIGITS in 1347030_TRHR_ago; and 0.834 (176°, 7:1:2) for Python and 0.896 (165°, 2:2:1) for DIGITS in 1347032_TGF_beta_ant (Figures S3 and S4; Table 3).
Additionally, DeepSnap_Python achieved the following loss, PR_AUC, and F values for the three MIE targets. The mean loss values on the train and valid datasets were 0.413 ± 0.153 (loss_train) and 0.383 ± 0.115 (loss_valid) in 720725_GR_ant, 0.247 ± 0.088 and 0.189 ± 0.070 in 1347030_TRHR_ago, and 0.280 ± 0.120 and 0.316 ± 0.061 in 1347032_TGF_beta_ant (Figures S5–S8; Table 3). The mean precision–recall area under the curve (PR_AUC) values on the valid dataset were 0.335 ± 0.117 in 720725_GR_ant, 0.103 ± 0.041 in 1347030_TRHR_ago, and 0.315 ± 0.056 in 1347032_TGF_beta_ant (Figures S9 and S10; Table 3). The mean F values on the valid dataset were 0.853 ± 0.039 in 720725_GR_ant, 0.868 ± 0.020 in 1347030_TRHR_ago, and 0.833 ± 0.020 in 1347032_TGF_beta_ant (Figures S11 and S12; Table 3). The lowest loss values on the train and valid datasets, with the corresponding angle and train:valid:test ratio, were 0.038 (176°, 2:2:1) and 0.110 (185°, 7:1:2) in 720725_GR_ant; 0.047 (176°, 1:1:1) and 0.194 (176°, 3:1:2) in 1347030_TRHR_ago; and 0.044 (176°, 1:1:1) and 0.197 (350°, 3:2:1) in 1347032_TGF_beta_ant (Figures S5–S8; Table 3).
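The threshold-based metrics reported above can be computed from a model's outputs as in the following minimal pure-Python sketch. It assumes binary labels (1 = active, 0 = inactive) and a 0.5 decision threshold on the predicted scores; both the threshold and the function names are illustrative assumptions. The averaging convention behind the F values and the exact PR_AUC implementation used in DeepSnap-DL are not specified here, so those two metrics are left out.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, scores, threshold=0.5):
    """Acc, BAC and MCC after thresholding predicted scores (threshold is an assumption)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": (tp + tn) / len(y_pred),
        "BAC": (sensitivity + specificity) / 2,
        # MCC is set to 0 when any marginal count is zero (undefined otherwise)
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

def roc_auc(y_true, scores):
    """ROC_AUC as the probability that a random active outscores a random inactive (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    s = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]
    print(classification_metrics(y, s))  # Acc 0.8, BAC 0.6875, MCC 0.375
    print(roc_auc(y, s))                 # 0.9375
```

With class imbalance like that of 720725_GR_ant (283 actives among 7537 chemicals, Table 6), a model that labels every compound inactive would reach Acc ≈ 0.962 under these definitions but only BAC = 0.5 and MCC = 0, which is why BAC and MCC are informative alongside Acc. The pairwise ROC_AUC is quadratic in the class sizes, which is still acceptable at this dataset scale.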
Table 3.
| Metric | Statistic | 720725_GR_ant: Python | 720725_GR_ant: DIGITS | 1347030_TRHR_ago: Python | 1347030_TRHR_ago: DIGITS | 1347032_TGF_beta_ant: Python | 1347032_TGF_beta_ant: DIGITS |
|---|---|---|---|---|---|---|---|
| ROC_AUC | average | 0.832 ± 0.048 | 0.856 ± 0.029 | 0.875 ± 0.031 | 0.886 ± 0.028 | 0.879 ± 0.015 | 0.907 ± 0.020 |
| | max_ROC_AUC | 0.926 | 0.910 | 0.915 | 0.918 | 0.911 | 0.932 |
| | max_angle (°) | 185 | 95 | 176 | 185 | 185 | 75 |
| | max_split | 7:1:2 | 5:5:1 | 3:1:2 | 5:3:4 | 7:1:2 | 5:3:2 |
| BAC | average | 0.762 ± 0.044 | 0.791 ± 0.023 | 0.811 ± 0.032 | 0.829 ± 0.023 | 0.805 ± 0.015 | 0.849 ± 0.030 |
| | max_BAC | 0.864 | 0.837 | 0.868 | 0.876 | 0.844 | 0.930 |
| | max_angle (°) | 185 | 95 | 176 | 355 | 185 | 176 |
| | max_split | 7:1:2 | 3:3:1 | 3:1:2 | 5:3:4 | 7:1:2 | 5:3:2 |
| MCC | average | 0.248 ± 0.065 | 0.282 ± 0.030 | 0.141 ± 0.017 | 0.155 ± 0.022 | 0.309 ± 0.025 | 0.384 ± 0.044 |
| | max_MCC | 0.451 | 0.354 | 0.194 | 0.208 | 0.373 | 0.478 |
| | max_angle (°) | 176 | 75 | 176 | 355 | 176 | 165 |
| | max_split | 7:1:2 | 4:4:1 | 3:1:2 | 1:1:1 | 7:1:2 | 2:2:1 |
| Acc | average | 0.790 ± 0.058 | 0.812 ± 0.044 | 0.781 ± 0.030 | 0.769 ± 0.060 | 0.770 ± 0.029 | 0.833 ± 0.033 |
| | max_Acc | 0.917 | 0.939 | 0.856 | 0.902 | 0.834 | 0.896 |
| | max_angle (°) | 176 | 155 | 176 | 125 | 176 | 165 |
| | max_split | 7:1:2 | 4:4:1 | 3:1:2 | 1:1:1 | 7:1:2 | 2:2:1 |
| loss_val | average | 0.383 ± 0.115 | 0.108 ± 0.014 | 0.189 ± 0.070 | 0.032 ± 0.007 | 0.316 ± 0.061 | 0.113 ± 0.011 |
| | min_loss_val | 0.110 | 0.065 | 0.194 | 0.024 | 0.197 | 0.087 |
| | min_angle (°) | 185 | 195 | 176 | 325 | 350 | 230 |
| | min_split | 7:1:2 | 7:1:2 | 3:1:2 | 3:1:2 | 3:2:1 | 7:1:2 |
| loss_train | average | 0.413 ± 0.153 | – | 0.247 ± 0.088 | – | 0.280 ± 0.120 | – |
| | min_loss_train | 0.038 | – | 0.047 | – | 0.044 | – |
| | min_angle (°) | 176 | – | 176 | – | 176 | – |
| | min_split | 2:2:1 | – | 1:1:1 | – | 1:1:1 | – |
| PR_AUC | average | 0.335 ± 0.117 | – | 0.103 ± 0.041 | – | 0.315 ± 0.056 | – |
| | max_PR_AUC | 0.660 | – | 0.194 | – | 0.453 | – |
| | max_angle (°) | 176 | – | 176 | – | 176 | – |
| | max_split | 7:1:2 | – | 3:1:2 | – | 3:1:1 | – |
| F | average | 0.853 ± 0.039 | – | 0.868 ± 0.020 | – | 0.833 ± 0.020 | – |
| | max_F | 0.935 | – | 0.914 | – | 0.876 | – |
| | max_angle (°) | 176 | – | 176 | – | 176 | – |
| | max_split | 7:1:2 | – | 3:1:2 | – | 7:1:2 | – |
The highest PR_AUC values on the valid dataset, with the corresponding angle and train:valid:test ratio, were 0.660 (176°, 7:1:2) in 720725_GR_ant, 0.194 (176°, 3:1:2) in 1347030_TRHR_ago, and 0.453 (176°, 3:1:1) in 1347032_TGF_beta_ant (Figures S9 and S10; Table 3). In addition, the highest F values on the valid dataset were 0.935 (176°, 7:1:2) in 720725_GR_ant, 0.914 (176°, 3:1:2) in 1347030_TRHR_ago, and 0.876 (176°, 7:1:2) in 1347032_TGF_beta_ant (Figures S11 and S12; Table 3). In this study, we observed two performance peaks of the prediction models, at angles of 176° and 355° in DeepSnap, consistent with previous results [53,54].
These findings suggest that image augmentation works effectively. It has been reported that DL can classify images even when only a small number of original images is available, by increasing the number of images through artificial operations such as translation, rotation, enlargement/reduction, and inversion [55,56]. In addition, when conformations are generated with force-field algorithms other than MMFF, the resulting 3D structures are known to differ significantly depending on the algorithm; therefore, further performance improvement could be expected using other force-field calculation algorithms. Further, a previous examination of the depiction conditions for ball-and-stick models in DeepSnap showed that performance can be improved by adjusting the bond thickness and atom colors [50].
However, because augmented images resemble the original images, the risk of overfitting, i.e., a decrease in performance on the test dataset caused by the prediction model fitting too closely to the training dataset, cannot be ruled out [57,58,59,60,61,62,63,64,65,66,67,68]. Thus, data augmentation enables effective learning from a small amount of data; however, when complex models are required, or when high performance must be obtained while avoiding overfitting, it is important to use high-quality data with few biased features and a sufficiently large data size. There are two main types of data augmentation, offline and online (also called on-the-fly) augmentation, depending on when the augmentation is performed [69,70]. In offline augmentation, a transformed (e.g., rotated) copy of each image is created in advance, so the dataset doubles in size and its storage requirement increases accordingly. In online augmentation, transformations are applied to each mini-batch (the subsets into which the dataset is split during training); the size of the stored dataset does not increase, and different random images are generated for the same mini-batch in each epoch when DL is run over multiple epochs. Additionally, DeepSnap-DL with Python has a built-in early stopping function that can limit the effects of the number of epochs as well as overfitting. Therefore, the performance of this system could be further improved by combining these functions with parameter optimization.
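The difference in timing between offline and online augmentation can be sketched with NumPy as follows. This is an illustrative sketch only: it applies in-plane 90° rotations to already-rendered images, whereas DeepSnap obtains its views by rotating the 3D model before capture, and the function names are hypothetical.

```python
import numpy as np

def offline_augment(images: np.ndarray) -> np.ndarray:
    """Offline: materialize a rotated copy of every image, doubling the stored dataset.

    images has shape (N, H, W, C) with H == W."""
    rotated = np.rot90(images, k=1, axes=(1, 2))  # rotate every image by 90 degrees
    return np.concatenate([images, rotated], axis=0)

def online_batches(images: np.ndarray, batch_size: int, rng: np.random.Generator):
    """Online: yield shuffled mini-batches whose images are randomly rotated on the fly.

    Nothing extra is stored; each new generator (one per epoch) draws fresh
    random rotations, so successive epochs see different variants."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        batch = images[order[start:start + batch_size]]
        turns = rng.integers(0, 4, size=len(batch))  # 0, 90, 180 or 270 degrees
        yield np.stack([np.rot90(img, k=int(k), axes=(0, 1)) for img, k in zip(batch, turns)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = rng.random((8, 4, 4, 3))        # toy stand-in for 256 x 256 RGB snapshots
    print(offline_augment(imgs).shape)     # (16, 4, 4, 3): storage doubled
    for epoch in range(2):
        shapes = [b.shape for b in online_batches(imgs, 4, rng)]
        print(epoch, shapes)               # two batches of (4, 4, 4, 3) per epoch
```

The offline variant trades storage for simplicity, while the online variant keeps the stored dataset fixed and spends a little computation per mini-batch.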
2.2. LR and BS in DeepSnap-DL
To investigate the effect of the hyperparameters of the DeepSnap-DL with Python system on prediction performance for the three MIE targets, we optimized 39 LRs ranging from 0.004 to 0.0000001 in 720725_GR_ant, 24 LRs from 0.007 to 0.000001 in 1347030_TRHR_ago, and 38 LRs from 0.002 to 0.000001 in 1347032_TGF_beta_ant using the valid dataset (Table S2). The mean values on the valid dataset for 720725_GR_ant, 1347030_TRHR_ago, and 1347032_TGF_beta_ant, respectively, were 0.884 ± 0.053, 0.897 ± 0.016, and 0.909 ± 0.011 for ROC_AUC; 0.817 ± 0.053, 0.844 ± 0.012, and 0.839 ± 0.010 for BAC; 0.354 ± 0.090, 0.171 ± 0.015, and 0.361 ± 0.016 for MCC; and 0.859 ± 0.060, 0.811 ± 0.025, and 0.807 ± 0.028 for Acc (Table 4). The highest ROC_AUC values on the valid dataset, with the corresponding LRs, were 0.930 at 0.00009 in 720725_GR_ant, 0.911 at 0.000002 in 1347030_TRHR_ago, and 0.922 at 0.000021 in 1347032_TGF_beta_ant (Figure 5, Table 4). Additionally, the highest BAC values were 0.865 at 0.0007 in 720725_GR_ant, 0.865 at 0.000001 in 1347030_TRHR_ago, and 0.853 at 0.000029 in 1347032_TGF_beta_ant (Figure 5, Table 4). The highest MCC values were 0.466 at 0.00007 in 720725_GR_ant, 0.191 at 0.0048 in 1347030_TRHR_ago, and 0.387 at 0.000029 in 1347032_TGF_beta_ant (Figure 5, Table 4). Furthermore, the highest Acc values were 0.928 at 0.00007 in 720725_GR_ant, 0.848 at 0.000005 in 1347030_TRHR_ago, and 0.855 at 0.00002 in 1347032_TGF_beta_ant (Figure 5, Table 4).
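The average, max_*/min_*, and max_LR/min_LR entries in Tables 3–5 summarize one metric over a sweep of settings. A minimal sketch of that bookkeeping is shown below. It assumes that the ± values are sample standard deviations (which estimator the tables use is not stated), the function name is hypothetical, and the sweep numbers in the example are invented for illustration rather than taken from this study.

```python
from statistics import mean, stdev

def summarize_sweep(results, metric, higher_is_better=True):
    """Summarize one metric over a hyperparameter sweep.

    results: list of (setting, metrics) pairs, one per LR (or BS, angle, ...),
    where metrics maps a metric name to its value on the evaluation dataset.
    Returns (mean, sample SD, best value, setting that produced the best value).
    """
    values = [metrics[metric] for _, metrics in results]
    pick = max if higher_is_better else min
    best_setting, best_metrics = pick(results, key=lambda item: item[1][metric])
    return mean(values), stdev(values), best_metrics[metric], best_setting

if __name__ == "__main__":
    # Hypothetical sweep over three learning rates (illustrative numbers only).
    sweep = [
        (1e-3, {"ROC_AUC": 0.80, "loss_val": 0.40}),
        (1e-4, {"ROC_AUC": 0.90, "loss_val": 0.20}),
        (1e-5, {"ROC_AUC": 0.85, "loss_val": 0.30}),
    ]
    print(summarize_sweep(sweep, "ROC_AUC"))                           # max_ROC_AUC and max_LR
    print(summarize_sweep(sweep, "loss_val", higher_is_better=False))  # min_loss and min_LR
```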
Table 4.
| Metric | Statistic | 720725_GR_ant (train:valid:test = 7:1:2) | 1347030_TRHR_ago (train:valid:test = 3:1:2) | 1347032_TGF_beta_ant (train:valid:test = 7:1:2) |
|---|---|---|---|---|
| ROC_AUC | average | 0.884 ± 0.053 | 0.897 ± 0.016 | 0.909 ± 0.011 |
| | max_ROC_AUC | 0.930 | 0.911 | 0.922 |
| | max_LR | 0.00009 | 0.000002 | 0.000021 |
| BAC | average | 0.817 ± 0.053 | 0.844 ± 0.012 | 0.839 ± 0.010 |
| | max_BAC | 0.865 | 0.865 | 0.853 |
| | max_LR | 0.0007 | 0.000001 | 0.000029 |
| MCC | average | 0.354 ± 0.090 | 0.171 ± 0.015 | 0.361 ± 0.016 |
| | max_MCC | 0.466 | 0.191 | 0.387 |
| | max_LR | 0.00007 | 0.0048 | 0.000029 |
| Acc | average | 0.859 ± 0.060 | 0.811 ± 0.025 | 0.807 ± 0.028 |
| | max_Acc | 0.928 | 0.848 | 0.855 |
| | max_LR | 0.00007 | 0.000005 | 0.00002 |
| loss_train | average | 0.215 ± 0.231 | 0.098 ± 0.062 | 0.125 ± 0.110 |
| | min_loss | 0.022 | 0.020 | 0.038 |
| | min_LR | 0.00003 | 0.00002 | 0.00003 |
| loss_val | average | 0.263 ± 0.186 | 0.122 ± 0.058 | 0.236 ± 0.062 |
| | min_loss | 0.124 | 0.066 | 0.170 |
| | min_LR | 0.00003 | 0.0008 | 0.000021 |
| PR_AUC | average | 0.502 ± 0.177 | 0.155 ± 0.045 | 0.410 ± 0.064 |
| | max_PR_AUC | 0.789 | 0.213 | 0.472 |
| | max_LR | 0.00007 | 0.0042 | 0.00003 |
| F | average | 0.898 ± 0.039 | 0.886 ± 0.015 | 0.858 ± 0.019 |
| | max_F | 0.942 | 0.909 | 0.890 |
| | max_LR | 0.00007 | 0.000005 | 0.00002 |
DeepSnap_Python achieved the following loss, PR_AUC, and F values for the three MIE targets. The mean loss values on the train and valid datasets were 0.215 ± 0.231 (loss_train) and 0.263 ± 0.186 (loss_valid) in 720725_GR_ant, 0.098 ± 0.062 and 0.122 ± 0.058 in 1347030_TRHR_ago, and 0.125 ± 0.110 and 0.236 ± 0.062 in 1347032_TGF_beta_ant (Figure 5, Table 4). Additionally, the mean PR_AUC values on the valid dataset were 0.502 ± 0.177 in 720725_GR_ant, 0.155 ± 0.045 in 1347030_TRHR_ago, and 0.410 ± 0.064 in 1347032_TGF_beta_ant (Figure 5, Table 4). The mean F values on the valid dataset were 0.898 ± 0.039 in 720725_GR_ant, 0.886 ± 0.015 in 1347030_TRHR_ago, and 0.858 ± 0.019 in 1347032_TGF_beta_ant (Figure 5, Table 4).
Furthermore, the lowest loss values on the train and valid datasets, with the corresponding LRs, were 0.022 at 0.00003 and 0.124 at 0.00003 in 720725_GR_ant, 0.020 at 0.00002 and 0.066 at 0.0008 in 1347030_TRHR_ago, and 0.038 at 0.00003 and 0.170 at 0.000021 in 1347032_TGF_beta_ant (Figure 5, Table 4). The highest PR_AUC values on the valid dataset were 0.789 at 0.00007 in 720725_GR_ant, 0.213 at 0.0042 in 1347030_TRHR_ago, and 0.472 at 0.00003 in 1347032_TGF_beta_ant (Figure 5, Table 4). Additionally, the highest F values on the valid dataset were 0.942 at 0.00007 in 720725_GR_ant, 0.909 at 0.000005 in 1347030_TRHR_ago, and 0.890 at 0.00002 in 1347032_TGF_beta_ant (Figure 5, Table 4).
Finally, to investigate the effect of BS in the improved DeepSnap-DL with Python system on prediction performance, we optimized 84 BSs ranging from 2 to 300 in 720725_GR_ant, 13 BSs from 2 to 26 in 1347030_TRHR_ago, and 37 BSs from 2 to 80 in 1347032_TGF_beta_ant using the valid dataset (Table S3). The mean values on the test dataset for 720725_GR_ant, 1347030_TRHR_ago, and 1347032_TGF_beta_ant, respectively, were 0.983 ± 0.032, 0.929 ± 0.003, and 0.918 ± 0.005 for ROC_AUC; 0.866 ± 0.033, 0.877 ± 0.004, and 0.848 ± 0.007 for BAC; 0.444 ± 0.056, 0.194 ± 0.004, and 0.368 ± 0.011 for MCC; and 0.908 ± 0.021, 0.855 ± 0.005, and 0.810 ± 0.011 for Acc (Table 5).
Table 5.
| Metric | Statistic | 720725_GR_ant (train:valid:test = 7:1:2) | 1347030_TRHR_ago (train:valid:test = 3:1:2) | 1347032_TGF_beta_ant (train:valid:test = 7:1:2) |
|---|---|---|---|---|
| ROC_AUC | average | 0.983 ± 0.032 | 0.929 ± 0.003 | 0.918 ± 0.005 |
| | max_ROC_AUC | 0.983 | 0.934 | 0.925 |
| | max_BS | 125 | 14 | 28 |
| BAC | average | 0.866 ± 0.033 | 0.877 ± 0.004 | 0.848 ± 0.007 |
| | max_BAC | 0.930 | 0.881 | 0.862 |
| | max_BS | 125 | 22 | 44 |
| MCC | average | 0.444 ± 0.056 | 0.194 ± 0.004 | 0.368 ± 0.011 |
| | max_MCC | 0.604 | 0.200 | 0.390 |
| | max_BS | 200 | 14 | 28 |
| Acc | average | 0.908 ± 0.021 | 0.855 ± 0.005 | 0.810 ± 0.011 |
| | max_Acc | 0.954 | 0.863 | 0.835 |
| | max_BS | 200 | 14 | 20 |
| loss_train | average | 0.045 ± 0.033 | 0.322 ± 0.013 | 0.097 ± 0.047 |
| | min_loss | 0.019 | 0.301 | 0.037 |
| | min_BS | 48 | 14 | 20 |
| loss_test | average | 0.119 ± 0.025 | 0.314 ± 0.022 | 0.203 ± 0.023 |
| | min_loss | 0.073 | 0.255 | 0.172 |
| | min_BS | 120 | 2 | 34 |
| PR_AUC | average | 0.654 ± 0.087 | 0.136 ± 0.011 | 0.431 ± 0.032 |
| | max_PR_AUC | 0.800 | 0.154 | 0.476 |
| | max_BS | 290 | 14 | 28 |
| F | average | 0.930 ± 0.014 | 0.914 ± 0.003 | 0.860 ± 0.008 |
| | max_F | 0.961 | 0.919 | 0.877 |
| | max_BS | 200 | 14 | 20 |
The highest ROC_AUC values on the test dataset, with the corresponding BSs, were 0.983 at 125 in 720725_GR_ant, 0.934 at 14 in 1347030_TRHR_ago, and 0.925 at 28 in 1347032_TGF_beta_ant (Figure S13, Table 5). Additionally, the highest BAC values on the test dataset were 0.930 at 125 in 720725_GR_ant, 0.881 at 22 in 1347030_TRHR_ago, and 0.862 at 44 in 1347032_TGF_beta_ant (Figure S13, Table 5). The highest MCC values were 0.604 at 200 in 720725_GR_ant, 0.200 at 14 in 1347030_TRHR_ago, and 0.390 at 28 in 1347032_TGF_beta_ant (Figure S13, Table 5). Furthermore, the highest Acc values were 0.954 at 200 in 720725_GR_ant, 0.863 at 14 in 1347030_TRHR_ago, and 0.835 at 20 in 1347032_TGF_beta_ant (Figure S13, Table 5).
Additionally, DeepSnap_Python achieved the following loss, PR_AUC, and F values for the three MIE targets. The mean loss values on the train and test datasets were 0.045 ± 0.033 (loss_train) and 0.119 ± 0.025 (loss_test) in 720725_GR_ant, 0.322 ± 0.013 and 0.314 ± 0.022 in 1347030_TRHR_ago, and 0.097 ± 0.047 and 0.203 ± 0.023 in 1347032_TGF_beta_ant (Figure S13, Table 5). The mean PR_AUC values on the test dataset were 0.654 ± 0.087 in 720725_GR_ant, 0.136 ± 0.011 in 1347030_TRHR_ago, and 0.431 ± 0.032 in 1347032_TGF_beta_ant (Figure S13, Table 5). The mean F values on the test dataset were 0.930 ± 0.014 in 720725_GR_ant, 0.914 ± 0.003 in 1347030_TRHR_ago, and 0.860 ± 0.008 in 1347032_TGF_beta_ant (Figure S13, Table 5). Furthermore, the lowest loss values on the train and test datasets, with the corresponding BSs, were 0.019 at 48 and 0.073 at 120 in 720725_GR_ant, 0.301 at 14 and 0.255 at 2 in 1347030_TRHR_ago, and 0.037 at 20 and 0.172 at 34 in 1347032_TGF_beta_ant (Figure S13, Table 5). The highest PR_AUC values on the test dataset were 0.800 at 290 in 720725_GR_ant, 0.154 at 14 in 1347030_TRHR_ago, and 0.476 at 28 in 1347032_TGF_beta_ant (Figure S13, Table 5). Additionally, the highest F values on the test dataset were 0.961 at 200 in 720725_GR_ant, 0.919 at 14 in 1347030_TRHR_ago, and 0.877 at 20 in 1347032_TGF_beta_ant (Figure S13, Table 5).
LR decay, i.e., lowering the LR once learning has progressed to some extent, is a method often used to improve the generalization performance of DL and is known to improve accuracy sharply [71]. However, its behavior changes significantly depending on the dataset, network type, and optimization method. Therefore, a function that automatically lowers the LR once learning has converged to some extent is required. Thus, an early stopping function was added to the improved DeepSnap-DL method; it extracts the model with the highest performance in a series of learning processes by discontinuing learning before the overfitting phase begins, thereby shortening the learning time.
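The interplay between LR decay and early stopping can be illustrated with the plain-Python simulation below, which replays a given sequence of per-epoch validation losses instead of training a network. The patience values, decay factor, and function name are hypothetical choices for illustration, not the settings used in DeepSnap-DL.

```python
def simulate_training(val_losses, lr0=1e-4, decay_factor=0.5,
                      decay_patience=2, stop_patience=4):
    """Replay per-epoch validation losses with LR decay on plateaus and early stopping.

    - The LR used in each epoch is recorded in lr_history.
    - After every `decay_patience` consecutive epochs without improvement, the LR
      is multiplied by `decay_factor`.
    - After `stop_patience` consecutive epochs without improvement, training stops.
    Returns (best_epoch, best_loss, lr_history); in real training, the weights
    from best_epoch would be the ones kept.
    """
    lr = lr0
    best_loss, best_epoch = float("inf"), -1
    epochs_since_best = 0
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        lr_history.append(lr)
        if loss < best_loss:
            best_loss, best_epoch, epochs_since_best = loss, epoch, 0
        else:
            epochs_since_best += 1
            if epochs_since_best % decay_patience == 0:
                lr *= decay_factor  # plateau: lower the LR for the next epoch
            if epochs_since_best >= stop_patience:
                break  # stop before the model drifts further into overfitting
    return best_epoch, best_loss, lr_history

if __name__ == "__main__":
    losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.39, 0.30]
    print(simulate_training(losses))
    # -> (2, 0.35, [0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 5e-05, 5e-05])
```

In this example the late improvement at the final epoch is never reached, which shows why the patience values must be tuned together with the LR. In Keras, comparable behavior is available through the built-in `tf.keras.callbacks.ReduceLROnPlateau` and `tf.keras.callbacks.EarlyStopping` (with `restore_best_weights=True`) callbacks; whether DeepSnap-DL uses these particular callbacks is not stated here.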
It was previously reported that BS is proportional to LR and inversely proportional to (1 − m), where m is the momentum coefficient [72]. Learning is thought to converge to sharp minima as BS increases. Conversely, when BS is small, the larger gradient variance can benefit DL performance, acting as a form of regularization. However, an optimal BS was shown to exist for a given LR, suggesting that it is essential to choose an appropriate BS for the LR used rather than simply reducing the BS. Thus, considering learning efficiency, it is appropriate to set the BS sufficiently large and adjust the LR accordingly.
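The proportionality between BS and LR is commonly applied as a linear scaling heuristic: when the BS is multiplied by a factor k, the LR is also multiplied by k. The helper below is a minimal sketch of that heuristic with a hypothetical function name and illustrative numbers; it is not necessarily how the LR/BS combinations in this study were chosen.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling heuristic: keep LR / BS constant when the batch size changes."""
    if base_lr <= 0 or base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("learning rate and batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size

if __name__ == "__main__":
    # Illustrative only: an LR of 0.00007 tuned at BS = 32, moved to larger batches.
    for bs in (32, 64, 128):
        print(bs, scale_learning_rate(7e-5, 32, bs))
```

Such a rule holds only approximately and within a limited range of batch sizes, which is consistent with the observation that an optimal BS exists for a given LR.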
These findings are expected to contribute to drug development through the prediction and identification of new ligands for nuclear receptors.
3. Materials and Methods
3.1. Data
The datasets for three MIE targets, namely antagonists of the glucocorticoid receptor (PubChem assay AID:720725_GR_ant) and TGF-beta/Smad (PubChem assay AID:1347032_TGF_beta_ant) and agonists of the thyrotropin-releasing hormone receptor (PubChem assay AID:1347030_TRHR_ago), were downloaded as previously reported [50,51,52,53,54] (Table S1a–c). They comprise the chemical structures in SMILES format and the corresponding agonist or antagonist scores, defined as Pubchem_activity_scores, from the Tox21 10K library in the PubChem database, which houses quantitative high-throughput assays to identify small-molecule agonists and antagonists of MIEs. After removing duplicate chemicals, which can arise from possible stereoisomers or salts, and inorganic compounds, we defined active and inactive compounds by their activity scores. The agonist and antagonist scores range from 0% to 100% and were obtained by normalizing each titration point relative to the positive control compound and dimethyl sulfoxide (DMSO)-only wells according to the following equation: % activity = [(Vcompound − Vdmso)/(Vpos − Vdmso)] × 100, where Vcompound, Vdmso, and Vpos denote the compound-well value, the median value of the DMSO-only wells, and the median value of the positive control wells in the reporter gene assay, respectively. Active and inactive compounds were defined by activity scores of 40–100 (≥ 40) and 0–39 (< 40), respectively (Table S1a–c). The mean number of chemicals was 7601 ± 63, and the highest and lowest numbers of chemicals were 7662 in 1347030_TRHR_ago and 7537 in 720725_GR_ant, respectively (Table S1a–c and Table 6).
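The normalization and the active/inactive cutoff described above can be sketched as follows. This is a minimal illustration with made-up well readings; it normalizes a single reading, and the step that turns the normalized titration points into a single activity score is not reproduced here. The function and constant names are hypothetical.

```python
from statistics import median

ACTIVE_THRESHOLD = 40  # activity score >= 40: active; < 40: inactive

def percent_activity(v_compound: float, dmso_wells: list[float], pos_wells: list[float]) -> float:
    """% activity = (Vcompound - Vdmso) / (Vpos - Vdmso) x 100.

    Vdmso and Vpos are the medians of the DMSO-only and positive-control wells."""
    v_dmso = median(dmso_wells)
    v_pos = median(pos_wells)
    if v_pos == v_dmso:
        raise ValueError("positive-control and DMSO medians must differ")
    return (v_compound - v_dmso) / (v_pos - v_dmso) * 100.0

def is_active(activity_score: float) -> bool:
    """Binary label used for model building: active if the score is >= 40."""
    return activity_score >= ACTIVE_THRESHOLD

if __name__ == "__main__":
    dmso = [10.0, 12.0, 11.0]    # hypothetical DMSO-only well readings (median 11)
    pos = [111.0, 109.0, 110.0]  # hypothetical positive-control readings (median 110)
    score = percent_activity(60.5, dmso, pos)
    print(score, is_active(score))  # 50.0 True
```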
The mean number and percentage of active chemicals across the three MIEs were 248 ± 167 and 3.27 ± 2.20%, respectively; the highest number and percentage of active chemicals were 395 and 5.19% for 1347032_TGF_beta, and the lowest were 67 and 0.87% for 1347030_TRHR_ago (Table 6). The data were divided into train, valid, and test datasets. The train and valid datasets were used for training and fine-tuning the prediction models, and the final evaluation of the constructed models was performed using the held-out test dataset.
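Because active compounds make up only 0.87–5.19% of each dataset, the way the train/valid/test split is drawn matters. The sketch below is a hypothetical stratified split that preserves the active/inactive ratio in each subset; the 60/20/20 fractions, the seed, and the use of stratification itself are assumptions for illustration, not the procedure stated in this study.

```python
import random
from collections import defaultdict


def stratified_split(ids, labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split IDs into train/valid/test subsets, preserving each class's proportion.

    The fractions and seed are illustrative assumptions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # a fixed seed keeps the held-out test set reproducible
    by_class = defaultdict(list)
    for cid, y in zip(ids, labels):
        by_class[y].append(cid)
    splits = {"train": [], "valid": [], "test": []}
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        splits["train"].extend(members[:n_train])
        splits["valid"].extend(members[n_train:n_train + n_valid])
        splits["test"].extend(members[n_train + n_valid:])
    return splits


ids = [f"cpd{i:03d}" for i in range(100)]
labels = [1] * 10 + [0] * 90  # 10% actives, purely illustrative
parts = stratified_split(ids, labels)
print({name: len(members) for name, members in parts.items()})
```

Splitting each class separately guarantees that even a very rare active class is represented in every subset.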
Table 6.
PubChem Assay AID | All (No.) | Active Compounds (No.) | Active Compounds (%) | Inactive Compounds (No.) | Inactive Compounds (%) |
---|---|---|---|---|---|
720725_GR_ant | 7537 | 283 | 3.75 | 7254 | 96.25 |
1347030_TRHR_ago | 7662 | 67 | 0.87 | 7595 | 99.13 |
1347032_TGF_beta | 7604 | 395 | 5.19 | 7209 | 94.81 |
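The summary statistics quoted in Section 3.1 can be recomputed from Table 6. This sketch assumes that the reported ± values are sample standard deviations (n − 1 denominator), which is how Python's statistics.stdev is defined; under that assumption it reproduces 7601 ± 63 compounds, 248 ± 167 actives, and 3.27 ± 2.20% actives.

```python
from statistics import mean, stdev

# (all compounds, active compounds) per assay, copied from Table 6
TABLE6 = {
    "720725_GR_ant": (7537, 283),
    "1347030_TRHR_ago": (7662, 67),
    "1347032_TGF_beta": (7604, 395),
}

totals = [n_all for n_all, _ in TABLE6.values()]
actives = [n_act for _, n_act in TABLE6.values()]
pct_active = [100.0 * n_act / n_all for n_all, n_act in TABLE6.values()]

for aid, (n_all, n_act) in TABLE6.items():
    print(f"{aid}: {n_act}/{n_all} active ({100.0 * n_act / n_all:.2f}%)")
print(f"compounds: {mean(totals):.0f} ± {stdev(totals):.0f}")
print(f"actives:   {mean(actives):.0f} ± {stdev(actives):.0f}")
print(f"% active:  {mean(pct_active):.2f} ± {stdev(pct_active):.2f}")
```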
3.2. DeepSnap
We imported the SMILES strings into the Molecular Operating Environment (MOE) 2018 scientific applications (MOLSIS Inc., Tokyo, Japan) to generate a 3D chemical database with rotatable torsions, which was saved as a structure data file (SDF). The external program CORINA classic (Molecular Network GmbH, Nürnberg, Germany, https://www.mn-am.com/products/corina, accessed on 25 January 2022) was then used to generate a suitable 3D conformation for each chemical structure. The 3D chemical structures in the SDF files were depicted as 3D ball-and-stick models, with different colors for different atoms, using Jmol, an open-source Java viewer for 3D molecular modeling of chemical structures [73,74,75]. The 3D models were automatically captured as snapshots at user-defined angle increments about the x-, y-, and z-axes, saved as 256 × 256-pixel PNG files (RGB), and split into train, valid, and test datasets, as previously reported [50,51,52,53,54]. All PNG image files produced by DeepSnap were resized to 256 × 256 pixels as input data using the NVIDIA Deep Learning GPU Training System (DIGITS) version 4.0.0 (NVIDIA, Santa Clara, CA, USA) on a four-GPU system (Tesla-V100-PCIE, 31.7 GB), as previously reported [50,51,52]. We used the open-source DL framework Caffe with a pre-trained GoogLeNet, a deep convolutional neural network (CNN) architecture inspired by LeNet, on the CentOS Linux distribution 7.3.1611. In the DeepSnap-DL-DIGITS method, the prediction models were trained on the train dataset for 30 epochs, and the model from the epoch with the lowest loss on the valid dataset was selected for prediction on the test dataset.
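The checkpoint selection described above, keeping the epoch with the lowest validation loss out of 30, reduces to an argmin over the per-epoch validation losses. The sketch below is a hypothetical helper, not part of DIGITS; its optional patience argument approximates the early stopping used in the DeepSnap-DL-Python system, and any patience value is an assumption.

```python
def select_epoch(val_losses, patience=None):
    """Return (best_epoch, epochs_run), both 1-indexed.

    With patience=None every epoch is examined (as with the fixed 30 epochs in
    DIGITS); otherwise training stops once `patience` consecutive epochs fail
    to improve on the best validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses, start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if patience is not None and epochs_without_improvement >= patience:
                break
    return best_epoch, epochs_run


losses = [0.52, 0.41, 0.45, 0.46, 0.47, 0.30]  # hypothetical validation losses
print(select_epoch(losses))              # examine every epoch
print(select_epoch(losses, patience=3))  # stop after 3 epochs without improvement
```

The example shows the trade-off: with patience 3, training halts at epoch 5 and keeps epoch 2, missing the later improvement at epoch 6.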
The improved DeepSnap-DL-Python system used a new 3D conformational import application, called SMILES_TO_SDF, to produce SDF files from the SMILES strings. We used PyMOL, an open-source, Python-based molecular visualization system (Schrödinger, Inc., New York, NY, USA), to render high-quality 3D ball-and-stick models of the chemical structures, with different colors for different atoms. Because a 3D chemical structure yields different images depending on the viewing direction, DeepSnap automatically captured snapshots at user-defined angle increments about the x-, y-, and z-axes, as in the DeepSnap-DL-DIGITS method. The snapshots, saved as 256 × 256-pixel PNG files (RGB), were divided into train, valid, and test datasets, and the external test dataset was kept fixed. All 2D PNG images produced by the DeepSnap-DL-Python system were used to train and fine-tune the prediction models with the GoogLeNet CNN implemented in TensorFlow and Keras on CentOS Linux 7.3.1611. Background colors in the images were set using PyMOL color values [76], and the Merck Molecular Force Field (MMFF), a set of parameters for bond lengths, angles, torsions, electrostatic properties, and van der Waals interactions, was used as the force field.
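The angle-increment capture can be sketched as a grid of rotations about the x-, y-, and z-axes. This is a minimal illustration under two assumptions: that snapshots cover every combination of multiples of the increment on all three axes, and that rotations are composed as Rz·Ry·Rx. DeepSnap's actual enumeration and the PyMOL rendering step are not reproduced here, and all helper names are hypothetical.

```python
import math
from itertools import product


def matmul3(a, b):
    """Multiply two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(ax_deg, ay_deg, az_deg):
    """Rotation about x, then y, then z (angles in degrees): R = Rz @ Ry @ Rx."""
    ax, ay, az = (math.radians(a) for a in (ax_deg, ay_deg, az_deg))
    rx = [[1, 0, 0], [0, math.cos(ax), -math.sin(ax)], [0, math.sin(ax), math.cos(ax)]]
    ry = [[math.cos(ay), 0, math.sin(ay)], [0, 1, 0], [-math.sin(ay), 0, math.cos(ay)]]
    rz = [[math.cos(az), -math.sin(az), 0], [math.sin(az), math.cos(az), 0], [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def view_angles(increment_deg):
    """All (x, y, z) angle triples that are multiples of the increment below 360 degrees."""
    steps = range(0, 360, increment_deg)
    return list(product(steps, steps, steps))


def rotate(coords, r):
    """Apply rotation matrix r to a list of (x, y, z) atom coordinates."""
    return [tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3)) for p in coords]


angles = view_angles(90)
print(len(angles), "views per molecule")  # 4 steps per axis -> 4**3 = 64
print(rotate([(1.0, 0.0, 0.0)], rotation_matrix(0, 0, 90)))  # x-axis maps to ~y-axis
```

Under this grid assumption the number of images per molecule grows with the cube of 360/increment, which is one reason the angle is a tunable cost parameter.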
Using the structural information derived from the SMILES format, a 3D chemical structure with rotatable torsions was generated for each compound using the MOE application software and optimized to a single low-energy conformation using CORINA classic. These 3D chemical structures were saved in SDF format as a database file. Molecular images were then generated by the DeepSnap method as snapshots of the 3D structures in the SDF file at different angles about the x-, y-, and z-axes, and the prediction models for the three MIE targets were constructed using these images as input data for DIGITS-based DL. We also used a modified DeepSnap-DL system implemented in Python with TensorFlow and Keras, called DeepSnap-DL-Python. In this system, a new 3D application, SMILES_TO_SDF, generated high-quality 3D molecular models from the SMILES strings and saved them as a chemical database in SDF format; DeepSnap then produced 2D PNG images from the SDF file, and the prediction models were constructed by DL with TensorFlow and Keras using these images as input data.
3.3. Evaluation of Prediction Models
For the DeepSnap-DL-DIGITS method, we analyzed the prediction probabilities from the model with the lowest validation loss among the 30 examined epochs. Because this method calculates a probability for each image of a molecule, captured from different angles about the x-, y-, and z-axes, we used the median of these predicted values as the representative value for each molecule, computed with the statistical analysis software JMP® Pro 14 (SAS Institute Inc., Cary, NC, USA), as previously reported [50,51,52]. Classification performance was evaluated based on a confusion matrix defined by the cutoff value (θ) from Youden’s index (YI) as follows [77,78,79]:
where k is the number of diagnostic categories and w_j ∈ (0,1).
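The per-molecule aggregation and cutoff selection described above can be sketched together. The study performed the median step in JMP Pro 14; the Python below is a substitute under stated assumptions: it uses the standard binary Youden index J = sensitivity + specificity − 1 (not the weighted multi-category form with w_j), treats each observed median as a candidate cutoff, calls a molecule active when its score is ≥ θ, and requires both classes to be present. All names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def molecule_scores(image_predictions):
    """Collapse per-image probabilities to one median probability per molecule.

    image_predictions: iterable of (molecule_id, probability) pairs, one per snapshot angle.
    """
    per_molecule = defaultdict(list)
    for mol_id, prob in image_predictions:
        per_molecule[mol_id].append(prob)
    return {mol_id: median(probs) for mol_id, probs in per_molecule.items()}


def youden_cutoff(labels, scores):
    """Return (theta, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for active, 0 for inactive; a score >= theta is predicted active.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_theta, best_j = None, -1.0
    for theta in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= theta)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < theta)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_theta, best_j = theta, j
    return best_theta, best_j


preds = [("m1", 0.9), ("m1", 0.7), ("m1", 0.2),
         ("m2", 0.1), ("m2", 0.3), ("m2", 0.2),
         ("m3", 0.8), ("m3", 0.6), ("m3", 0.7),
         ("m4", 0.4), ("m4", 0.5), ("m4", 0.1)]
truth = {"m1": 1, "m2": 0, "m3": 1, "m4": 0}
scores = molecule_scores(preds)
ids = sorted(scores)
print(scores)
print(youden_cutoff([truth[m] for m in ids], [scores[m] for m in ids]))
```

Taking the median rather than the mean keeps a single badly predicted viewing angle (such as the 0.2 snapshot of m1) from dragging down a molecule's score.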
In contrast, the DeepSnap-DL-Python system automatically outputs the prediction probabilities from the model with the lowest validation loss (loss_valid) among the 30 examined epochs, where an epoch is one complete pass through the training dataset and the number of epochs is controlled by early stopping. Additionally, the performance of each model was automatically calculated in terms of the following metrics: ROC_AUC, precision–recall AUC (PR_AUC), balanced accuracy (BAC), F-measure (F), Matthews correlation coefficient (MCC), accuracy (Acc), and loss. These performance metrics are defined as follows, where TP, FN, TN, and FP denote true positives, false negatives, true negatives, and false positives, respectively.
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
BAC = (Sensitivity + Specificity)/2
Acc = Accuracy = (TP + TN)/(TP + FP + TN + FN)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F-measure (F) = 2 × Recall × Precision/(Recall + Precision)
MCC = (TP × TN − FP × FN)/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]
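The confusion-matrix definitions in Section 3.3 translate directly into code. The sketch assumes the counts have already been fixed by a cutoff θ; returning 0.0 when a denominator is zero is a convention adopted here, not one stated in the text.

```python
import math


def _ratio(num, den):
    """num / den, or 0.0 when den is zero (a convention assumed here)."""
    return num / den if den else 0.0


def classification_metrics(tp, fn, tn, fp):
    """Confusion-matrix metrics as defined in Section 3.3, plus MCC."""
    sensitivity = _ratio(tp, tp + fn)  # identical to recall
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    recall = sensitivity
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "BAC": (sensitivity + specificity) / 2,
        "Acc": _ratio(tp + tn, tp + fp + tn + fn),
        "Precision": precision,
        "Recall": recall,
        "F": _ratio(2 * recall * precision, recall + precision),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
    }


print(classification_metrics(tp=8, fn=2, tn=85, fp=5))
# Predicting every 1347030_TRHR_ago compound as inactive (counts from Table 6):
print(classification_metrics(tp=0, fn=67, tn=7595, fp=0))
```

The second call shows why balanced metrics matter for these imbalanced datasets: an all-inactive predictor reaches Acc ≈ 0.991 on the TRHR agonist data, while its sensitivity and MCC are 0 and its BAC is 0.5.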
To determine the optimal cutoff point for defining TP, FN, TN, and FP, we adopted Youden’s index (YI), which selects the cutoff that maximizes sensitivity − (1 − specificity), that is, sensitivity + specificity − 1. YI ranges from 0 to 1, where 1 represents maximum effectiveness and 0 represents minimum effectiveness. Additionally, the area under the curve (AUC) of the receiver operating characteristic (ROC) is given by
w_t = ½ (prec_{t+1} − prec_{t−1})
Here, ROC_AUC denotes AUC(f), j iterates over the true points, Np is the number of true points, T is the number of thresholds, and prec_t is the precision at threshold t. For the boundary cases, let prec_0 = prec_1 and prec_T = 0 [80]. The PR curve is the plot of Recall (x) vs. Precision (y), and PR_AUC was calculated as in previous studies [53,54]. Each analysis was repeated three times (N = 3) to reduce bias, and the values are reported as averages.
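The two area metrics can be sketched with standard estimators, which may differ in detail from the weighted formulation of [80]: ROC_AUC as the probability that a randomly chosen active compound scores higher than a randomly chosen inactive one (ties count one half), and PR_AUC as the step-wise sum of precision at the rank of each active compound, divided by Np. For tie-free scores these correspond to scikit-learn's roc_auc_score and average_precision_score, which could serve as a cross-check; the function names below are hypothetical.

```python
def roc_auc(labels, scores):
    """Fraction of (active, inactive) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pr_auc(labels, scores):
    """Step-wise PR area: mean precision at the rank of each active compound."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            area += tp / rank  # precision at this rank
    return area / n_pos


y = [1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.3, 0.1]
print(roc_auc(y, s), pr_auc(y, s))
```

The pairwise form of ROC_AUC is O(Np × Nn), which is adequate for a few thousand compounds; a rank-based formula would scale better for larger libraries.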
4. Conclusions
In this study, we constructed prediction models for antagonists of the glucocorticoid receptor and TGF-beta/Smad and for agonists of the thyrotropin-releasing hormone receptor using the Tox21 10K library, with both the classic DeepSnap-DL system based on DIGITS and the improved DeepSnap-DL system based on TensorFlow and Keras. By optimizing the DeepSnap parameters, the improved DeepSnap-DL system achieved higher throughput at lower computational cost. These results suggest that the improved DeepSnap-DL system could be a powerful advanced QSAR tool in toxicology, biochemistry, and cheminformatics.
Acknowledgments
The computer analysis was supported by Shunichi Sasaki and Kota Kurosaki.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23042141/s1.
Author Contributions
Y.U. initiated and supervised the study, designed the experiments, collected information about the chemical compounds, and edited the manuscript. Y.M. performed computer analysis and drafted the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding
This study was funded in part by grants from the Ministry of Economy, Trade and Industry, AI-SHIPS (AI-based Substances Hazardous Integrated Prediction System), Japan, project (20180314ZaiSei8).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All SMILES data for the compounds and the technical details are available from the authors.
Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Mao J., Akhtar J., Zhang X., Sun L., Guan S., Li X., Chen G., Liu J., Jeon H.N., Kim M.S., et al. Comprehensive strategies of machine-learning-based quantitative structure-activity relationship models. iScience. 2021;24:103052. doi: 10.1016/j.isci.2021.103052.
- 2. Ahmadi S., Moradi Z., Kumar A., Almasirad A. SMILES-based QSAR and molecular docking study of xanthone derivatives as alpha-glucosidase inhibitors. J. Recept. Signal. Transduct. Res. 2021;12:1–12. doi: 10.1080/10799893.2021.1957932.
- 3. Amin S.A., Ghosh K., Gayen S., Jha T. Chemical-informatics approach to COVID-19 drug discovery: Monte Carlo based QSAR, virtual screening and molecular docking study of some in-house molecules as papain-like protease (PLpro) inhibitors. J. Biomol. Struct. Dyn. 2021;39:4764–4773. doi: 10.1080/07391102.2020.1780946.
- 4. Ashraf S., Ranaghan K.E., Woods C.J., Mulholland A.J., Ul-Haq Z. Exploration of the structural requirements of Aurora Kinase B inhibitors by a combined QSAR, modelling and molecular simulation approach. Sci. Rep. 2021;11:18707. doi: 10.1038/s41598-021-97368-3.
- 5. Aziz M.A., Shehab W.S., Al-Karmalawy A.A., El-Farargy A.F., Abdellattif M.H. Design, Synthesis, Biological Evaluation, 2D-QSAR Modeling, and Molecular Docking Studies of Novel 1H-3-Indolyl Derivatives as Significant Antioxidants. Int. J. Mol. Sci. 2021;22:10396. doi: 10.3390/ijms221910396.
- 6. de Souza A.S., de Souza R.F., Guzzo C.R. Quantitative structure-activity relationships, molecular docking and molecular dynamics simulations reveal drug repurposing candidates as potent SARS-CoV-2 main protease inhibitors. J. Biomol. Struct. Dyn. 2021;9:1–18. doi: 10.1080/07391102.2021.1958700.
- 7. Bahmani A., Tanzadehpanah H., Hosseinpour Moghadam N., Saidijam M. Introducing a pyrazolopyrimidine as a multi-tyrosine kinase inhibitor, using multi-QSAR and docking methods. Mol. Divers. 2021;25:949–965. doi: 10.1007/s11030-020-10080-8.
- 8. Elekofehinti O.O., Iwaloye O., Molehin O.R., Famusiwa C.D. Identification of lead compounds from large natural product library targeting 3C-like protease of SARS-CoV-2 using E-pharmacophore modelling, QSAR and molecular dynamics simulation. In Silico Pharmacol. 2021;9:49. doi: 10.1007/s40203-021-00109-7.
- 9. Gentile D., Floresta G., Patamia V., Chiaramonte R., Mauro G.L., Rescifina A., Vecchio M. An Integrated Pharmacophore/Docking/3D-QSAR Approach to Screening a Large Library of Products in Search of Future Botulinum Neurotoxin A Inhibitors. Int. J. Mol. Sci. 2020;21:9470. doi: 10.3390/ijms21249470.
- 10. He Q., Han C., Li G., Guo H., Wang Y., Hu Y., Lin Z., Wang Y. In silico design novel (5-imidazol-2-yl-4-phenylpyrimidin-2-yl)[2-(2-pyridylamino)ethyl]amine derivatives as inhibitors for glycogen synthase kinase 3 based on 3D-QSAR, molecular docking and molecular dynamics simulation. Comput. Biol. Chem. 2020;88:107328. doi: 10.1016/j.compbiolchem.2020.107328.
- 11. Huang M., Duan W.G., Lin G.S., Li B.Y. Synthesis, Antifungal Activity, 3D-QSAR, and Molecular Docking Study of Novel Menthol-Derived 1,2,4-Triazole-thioether Compounds. Molecules. 2021;26:6948. doi: 10.3390/molecules26226948.
- 12. Izadpanah E., Riahi S., Abbasi-Radmoghaddam Z., Gharaghani S., Mohammadi-Khanaposhtanai M. A simple and robust model to predict the inhibitory activity of alpha-glucosidase inhibitors through combined QSAR modeling and molecular docking techniques. Mol. Divers. 2021;25:1811–1825. doi: 10.1007/s11030-020-10164-5.
- 13. Kasmi R., Hadaji E., Chedadi O., El Aissouq A., Bouachrine M., Ouammou A. 2D-QSAR and docking study of a series of coumarin derivatives as inhibitors of CDK (anticancer activity) with an application of the molecular docking method. Heliyon. 2020;6:e04514. doi: 10.1016/j.heliyon.2020.e04514.
- 14. Mellado M., González C., Mella J., Aguilar L.F., Viña D., Uriarte E., Cuellar M., Matos M.J. Combined 3D-QSAR and docking analysis for the design and synthesis of chalcones as potent and selective monoamine oxidase B inhibitors. Bioorganic Chem. 2021;108:104689. doi: 10.1016/j.bioorg.2021.104689.
- 15. Menke J., Maskri S., Koch O. Computational Ion Channel Research: From the Application of Artificial Intelligence to Molecular Dynamics Simulations. Cell Physiol. Biochem. 2021;55:14–45. doi: 10.33594/000000336.
- 16. Metelytsia L.O., Trush M.M., Kovalishyn V.V., Hodyna D.M., Kachaeva M.V., Brovarets V.S., Pilyo S.G., Sukhoveev V.V., Tsyhankov S.A., Blagodatnyi V.M., et al. 1,3-Oxazole derivatives of cytisine as potential inhibitors of glutathione reductase of Candida spp.: QSAR modeling, docking analysis and experimental study of new anti-Candida agents. Comput. Biol. Chem. 2021;90:107407. doi: 10.1016/j.compbiolchem.2020.107407.
- 17. Oyewole R.O., Oyebamiji A.K., Semire B. Theoretical calculations of molecular descriptors for anticancer activities of 1,2,3-triazole-pyrimidine derivatives against gastric cancer cell line (MGC-803): DFT, QSAR and docking approaches. Heliyon. 2020;6:e03926. doi: 10.1016/j.heliyon.2020.e03926.
- 18. Poustforoosh A., Faramarz S., Nematollahi M.H., Hashemipour H., Tüzün B., Pardakhty A., Mehrabani M. 3D-QSAR, molecular docking, molecular dynamics, and ADME/T analysis of marketed and newly designed flavonoids as inhibitors of Bcl-2 family proteins for targeting U-87 glioblastoma. J. Cell Biochem. 2021, in press. doi: 10.1002/jcb.30178.
- 19. Rahman M.M., Saha T., Islam K.J., Suman R.H., Biswas S., Rahat E.U., Hossen M.R., Islam R., Hossain M.N., Mamun A.A., et al. Virtual screening, molecular dynamics and structure-activity relationship studies to identify potent approved drugs for Covid-19 treatment. J. Biomol. Struct. Dyn. 2021;39:6231–6241. doi: 10.1080/07391102.2020.1794974.
- 20. Righetti G., Casale M., Liessi N., Tasso B., Salis A., Tonelli M., Millo E., Pedemonte N., Fossa P., Cichero E. Molecular Docking and QSAR Studies as Computational Tools Exploring the Rescue Ability of F508del CFTR Correctors. Int. J. Mol. Sci. 2020;21:8084. doi: 10.3390/ijms21218084.
- 21. Rosa G.P., Palmeira A., Resende D.I.S.P., Almeida I.F., Kane-Pagès A., Barreto M.C., Sousa E., Pinto M.M.M. Xanthones for melanogenesis inhibition: Molecular docking and QSAR studies to understand their anti-tyrosinase activity. Bioorganic Med. Chem. 2021;29:115873. doi: 10.1016/j.bmc.2020.115873.
- 22. Rosell-Hidalgo A., Young L., Moore A.L., Ghafourian T. QSAR and molecular docking for the search of AOX inhibitors: A rational drug discovery approach. J. Comput. Aided Mol. Des. 2021;35:245–260. doi: 10.1007/s10822-020-00360-8.
- 23. Shah B.M., Modi P., Trivedi P. Pharmacophore-based virtual screening, 3D-QSAR, molecular docking approach for identification of potential dipeptidyl peptidase IV inhibitors. J. Biomol. Struct. Dyn. 2021;39:2021–2043. doi: 10.1080/07391102.2020.1750485.
- 24. Shamsi E., Rahati A., Dehghanian E. A modified binary particle swarm optimization with a machine learning algorithm and molecular docking for QSAR modelling of cholinesterase inhibitors. SAR QSAR Environ. Res. 2021;32:745–767. doi: 10.1080/1062936X.2021.1971761.
- 25. Shulga D.A., Kudryavtsev K.V. Selection of Promising Novel Fragment Sized S. aureus SrtA Noncovalent Inhibitors Based on QSAR and Docking Modeling Studies. Molecules. 2021;26:7677. doi: 10.3390/molecules26247677.
- 26. Taha I., Keshk E.M., Khalil A.M., Fekri A. Synthesis, characterization, antibacterial evaluation, 2D-QSAR modeling and molecular docking studies for benzocaine derivatives. Mol. Divers. 2021;25:435–459. doi: 10.1007/s11030-020-10138-7.
- 27. Sun C., Feng L., Sun X., Yu R., Kang C. Design and screening of FAK, CDK 4/6 dual inhibitors by pharmacophore model, molecular docking, and molecular dynamics simulation. J. Biomol. Struct. Dyn. 2021;39:5358–5367. doi: 10.1080/07391102.2020.1786458.
- 28. Tong J.B., Luo D., Feng Y., Bian S., Zhang X., Wang T.H. Structural modification of 4, 5-dihydro-[1, 2, 4] triazolo [4, 3-f] pteridine derivatives as BRD4 inhibitors using 2D/3D-QSAR and molecular docking analysis. Mol. Divers. 2021;25:1855–1872. doi: 10.1007/s11030-020-10172-5.
- 29. Veligeti R., Madhu R.B., Anireddy J., Pasupuleti V.R., Avula V.K.R., Ethiraj K.S., Uppalanchi S., Kasturi S., Perumal Y., Anantaraju H.S., et al. Synthesis of novel cytotoxic tetracyclic acridone derivatives and study of their molecular docking, ADMET, QSAR, bioactivity and protein binding properties. Sci. Rep. 2020;10:20720. doi: 10.1038/s41598-020-77590-1.
- 30. Wang F., Qiu Y., Zhou B. In silico exploration of hydroxylated polychlorinated biphenyls as estrogen receptor beta ligands by 3D-QSAR, molecular docking and molecular dynamics simulations. J. Biomol. Struct. Dyn. 2021;1:1–12. doi: 10.1080/07391102.2021.1890220.
- 31. Wang X., Duan W., Lin G., Li B., Chen M., Lei F. Synthesis, 3D-QSAR and Molecular Docking Study of Nopol-Based 1,2,4-Triazole-Thioether Compounds as Potential Antifungal Agents. Front. Chem. 2021;9:757584. doi: 10.3389/fchem.2021.757584.
- 32. Yalcin-Ozkat G. Molecular Modeling Strategies of Cancer Multidrug Resistance. Drug Resist. Updat. 2021;24:100789. doi: 10.1016/j.drup.2021.100789.
- 33. Zięba A., Laitinen T., Patel J.Z., Poso A., Kaczor A.A. Docking-Based 3D-QSAR Studies for 1,3,4-oxadiazol-2-one Derivatives as FAAH Inhibitors. Int. J. Mol. Sci. 2021;22:6108. doi: 10.3390/ijms22116108.
- 34. Zhang C., Li Q., Ren Y., Liu F. Molecular modeling studies of benzothiophene-containing derivatives as promising selective estrogen receptor downregulators: A combination of 3D-QSAR, molecular docking and molecular dynamics simulations. J. Biomol. Struct. Dyn. 2021;39:2702–2723. doi: 10.1080/07391102.2020.1751717.
- 35. Fang C., Xiao Z. Receptor-based 3D-QSAR in Drug Design: Methods and Applications in Kinase Studies. Curr. Top. Med. Chem. 2016;16:1463–1477. doi: 10.2174/1568026615666150915120943.
- 36. Deb P.K., Chandrasekaran B., Mailavaram R., Tekade R.K., Jaber A.M.Y. Molecular modeling approaches for the discovery of adenosine A(2B) receptor antagonists: Current status and future perspectives. Drug Discov. Today. 2019;24:1854–1864. doi: 10.1016/j.drudis.2019.05.011.
- 37. Valizade Hasanloei M.A., Sheikhpour R., Sarram M.A., Sheikhpour E., Sharifi H. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities. J. Comput. Aided Mol. Des. 2018;32:375–384. doi: 10.1007/s10822-017-0094-6.
- 38. Antelo-Collado A., Carrasco-Velar R., García-Pedrajas N., Cerruela-García G. Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction. J. Chem. Inf. Model. 2021;61:76–94. doi: 10.1021/acs.jcim.0c00908.
- 39. Shin H.K. Topological Distance-Based Electron Interaction Tensor to Apply a Convolutional Neural Network on Drug-like Compounds. ACS Omega. 2021;6:35757–35768. doi: 10.1021/acsomega.1c05693.
- 40. Bak A. Two Decades of 4D-QSAR: A Dying Art or Staging a Comeback? Int. J. Mol. Sci. 2021;22:5212. doi: 10.3390/ijms22105212.
- 41. Damale M.G., Harke S.N., Kalam Khan F.A., Shinde D.B., Sangshetti J.N. Recent advances in multidimensional QSAR (4D–6D): A critical review. Mini Rev. Med. Chem. 2014;14:35–55. doi: 10.2174/13895575113136660104.
- 42. Fourches D., Ash J. 4D-quantitative structure-activity relationship modeling: Making a comeback. Expert Opin. Drug Discov. 2019;14:1227–1235. doi: 10.1080/17460441.2019.1664467.
- 43. Vedani A., Dobler M. 5D-QSAR: The key for simulating induced fit? J. Med. Chem. 2002;45:2139–2149. doi: 10.1021/jm011005p.
- 44. Ducki S., Mackenzie G., Lawrence N.J., Snyder J.P. Quantitative structure-activity relationship (5D-QSAR) study of combretastatin-like analogues as inhibitors of tubulin assembly. J. Med. Chem. 2005;48:457–465. doi: 10.1021/jm049444m.
- 45. Oberdorf C., Schmidt T.J., Wünsch B. 5D-QSAR for spirocyclic sigma1 receptor ligands by Quasar receptor surface modeling. Eur. J. Med. Chem. 2010;45:3116–3124. doi: 10.1016/j.ejmech.2010.03.048.
- 46. Vedani A., Dobler M., Lill M.A. Combining protein modeling and 6D-QSAR. Simulating the binding of structurally diverse ligands to the estrogen receptor. J. Med. Chem. 2005;48:3700–3703. doi: 10.1021/jm050185q.
- 47. Uesawa Y. Quantitative structure-activity relationship analysis using deep learning based on a novel molecular image input technique. Bioorg. Med. Chem. Lett. 2018;28:3400–3403. doi: 10.1016/j.bmcl.2018.08.032.
- 48. Attene-Ramos M.S., Miller N., Huang R., Michael S., Itkin M., Kavlock R.J., Austin C.P., Shinn P., Simeonov A., Tice R.R., et al. The Tox21 robotic platform for the assessment of environmental chemicals--from vision to reality. Drug Discov. Today. 2013;18:716–723. doi: 10.1016/j.drudis.2013.05.015.
- 49. Menegola E., Veltman C.H.J., Battistoni M., Di Renzo F., Moretto A., Metruccio F., Beronius A., Zilliacus J., Kyriakopoulou K., Spyropoulou A., et al. An adverse outcome pathway on the disruption of retinoic acid metabolism leading to developmental craniofacial defects. Toxicology. 2021;458:152843. doi: 10.1016/j.tox.2021.152843.
- 50. Matsuzaka Y., Uesawa Y. Optimization of a Deep-Learning Method Based on the Classification of Images Generated by Parameterized Deep Snap a Novel Molecular-Image-Input Technique for Quantitative Structure-Activity Relationship (QSAR) Analysis. Front. Bioeng. Biotechnol. 2019;7:65. doi: 10.3389/fbioe.2019.00065.
- 51. Matsuzaka Y., Uesawa Y. Prediction model with high-performance constitutive androstane receptor (car) using deepsnap-deep learning approach from the Tox21 10K compound library. Int. J. Mol. Sci. 2019;20:4855. doi: 10.3390/ijms20194855.
- 52. Matsuzaka Y., Uesawa Y. DeepSnap-deep learning approach predicts progesterone receptor antagonist activity with high performance. Front. Bioeng. Biotechnol. 2020;7:485. doi: 10.3389/fbioe.2019.00485.
- 53. Matsuzaka Y., Uesawa Y. molecular image-based prediction models of nuclear receptor agonists and antagonists using the deepsnap-deep learning approach with the Tox21 10K library. Molecules. 2020;25:2764. doi: 10.3390/molecules25122764.
- 54. Matsuzaka Y., Totoki S., Handa K., Shiota T., Kurosaki K., Uesawa Y. Prediction Models for Agonists and Antagonists of Molecular Initiation Events for Toxicity Pathways Using an Improved Deep-Learning-Based Quantitative Structure-Activity Relationship System. Int. J. Mol. Sci. 2021;22:10821. doi: 10.3390/ijms221910821.
- 55. Heo T.Y., Kim K.M., Min H.K., Gu S.M., Kim J.H., Yun J., Min J.K. Development of a Deep-Learning-Based Artificial Intelligence Tool for Differential Diagnosis between Dry and Neovascular Age-Related Macular Degeneration. Diagnostics. 2020;10:261. doi: 10.3390/diagnostics10050261.
- 56. Umer M., Ashraf I., Ullah S., Mehmood A., Choi G.S. COVINet: A convolutional neural network approach for predicting COVID-19 from chest X-ray images. J. Ambient. Intell. Humaniz. Comput. 2021;28:535–547. doi: 10.1007/s12652-021-02917-3.
- 57. He C., Liu J., Zhu Y., Du W. Data Augmentation for Deep Neural Networks Model in EEG Classification Task: A Review. Front. Hum. Neurosci. 2021;15:765525. doi: 10.3389/fnhum.2021.765525.
- 58. Nanni L., Paci M., Brahnam S., Lumini A. Comparison of Different Image Data Augmentation Approaches. J. Imaging. 2021;7:254. doi: 10.3390/jimaging7120254.
- 59. Tsai K.J., Chang C.C., Lo L.C., Chiang J.Y., Chang C.S., Huang Y.J. Automatic segmentation of paravertebral muscles in abdominal CT scan by U-Net: The application of data augmentation technique to increase the Jaccard ratio of deep learning. Medicine. 2021;100:e27649. doi: 10.1097/MD.0000000000027649.
- 60. Chun J., Park J.C., Olberg S., Zhang Y., Nguyen D., Wang J., Kim J.S., Jiang S. Intentional deep overfit learning (IDOL): A novel deep learning strategy for adaptive radiation therapy. Med. Phys. 2021, in press. doi: 10.1002/mp.15352.
- 61. Lin B., Cheng M., Wang S., Li F., Zhou Q. Automatic detection of anteriorly displaced temporomandibular joint discs on magnetic resonance images using a deep learning algorithm. Dentomaxillofac. Radiol. 2021;29:20210341. doi: 10.1259/dmfr.20210341.
- 62. Wang S.H., Zhu Z., Zhang Y.D. PSCNN: PatchShuffle Convolutional Neural Network for COVID-19 Explainable Diagnosis. Front. Public Health. 2021;9:768278. doi: 10.3389/fpubh.2021.768278. [Retracted]
- 63. Tsai J.Y., Hung I.Y., Guo Y.L., Jan Y.K., Lin C.Y., Shih T.T., Chen B.B., Lung C.W. Lumbar Disc Herniation Automatic Detection in Magnetic Resonance Imaging Based on Deep Learning. Front. Bioeng. Biotechnol. 2021;9:708137. doi: 10.3389/fbioe.2021.708137.
- 64. Hidayatullah P., Wang X., Yamasaki T., Mengko T.L.E.R., Munir R., Barlian A., Sukmawati E., Supraptono S. DeepSperm: A robust and real-time bull sperm-cell detection in densely populated semen videos. Comput. Methods Programs Biomed. 2021;209:106302. doi: 10.1016/j.cmpb.2021.106302.
- 65. Whang A.J., Chen Y.Y., Tseng W.C., Tsai C.H., Chao Y.P., Yen C.H., Liu C.H., Zhang X. Pupil Size Prediction Techniques Based on Convolution Neural Network. Sensors. 2021;21:4965. doi: 10.3390/s21154965.
- 66. Cui J., Zhang X., Xiong F., Chen C.L. Pathological Myopia Image Recognition Strategy Based on Data Augmentation and Model Fusion. J. Healthc. Eng. 2021;2021:5549779. doi: 10.1155/2021/5549779.
- 67. Mai Z., Hu G., Chen D., Shen F., Shen H.T. MetaMixUp: Learning Adaptive Interpolation Policy of MixUp With Metalearning. IEEE Trans. Neural. Netw. Learn. Syst. 2021, in press. doi: 10.1109/TNNLS.2020.3049011.
- 68. Yi L., Mak M.W. Improving Speech Emotion Recognition With Adversarial Data Augmentation Network. IEEE Trans. Neural. Netw. Learn Syst. 2022;33:172–184. doi: 10.1109/TNNLS.2020.3027600.
- 69. Tang Z., Gao Y., Karlinsky L., Sattigeri P., Feris R., Metaxas D. OnlineAugment: Online Data Augmentation with Less Domain Knowledge. arXiv. 2020. Available online: https://arxiv.org/abs/2007.09271 (accessed on 17 July 2020).
- 70. Lam T.K., Ohta M., Schamoni S., Riezler S. On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR. arXiv. 2021. Available online: https://arxiv.org/abs/2104.01393 (accessed on 3 April 2021).
- 71. Vasudevan S. Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks. Entropy. 2020;22:560. doi: 10.3390/e22050560.
- 72. Smith S.L., Kindermans P.-J., Ying C., Le Q.V. Don’t Decay the Learning Rate, Increase the Batch Size. arXiv. 2018. Available online: https://arxiv.org/abs/1711.00489 (accessed on 1 November 2017).
- 73. Hanson R.M. Jmol SMILES and Jmol SMARTS: Specifications and applications. J. Cheminform. 2016;8:50. doi: 10.1186/s13321-016-0160-4.
- 74. Scalfani V.F., Williams A.J., Tkachenko V., Karapetyan K., Pshenichnov A., Hanson R.M., Liddie J.M., Bara J.E. Programmatic conversion of crystal structures into 3D printable files using Jmol. J. Cheminform. 2016;8:66. doi: 10.1186/s13321-016-0181-z.
- 75. Hanson R.M., Lu X.J. DSSR-enhanced visualization of nucleic acid structures in Jmol. Nucleic Acids Res. 2017;45:W528–W533. doi: 10.1093/nar/gkx365.
- 76. PyMOLWiki. Available online: https://pymolwiki.org/index.php/Color_Values (accessed on 22 January 2011).
- 77. Xu T., Wang J., Fang Y. A model-free estimation for the covariate-adjusted Youden index and its associated cut-point. arXiv. 2014. doi: 10.1002/sim.6290. Available online: https://arxiv.org/abs/1402.1835 (accessed on 8 February 2014).
- 78. Yuan M., Li P., Wu C. Semiparametric Inference of the Youden Index and the Optimal Cutoff Point under Density Ratio Models. arXiv. 2020. Available online: https://arxiv.org/abs/2005.04362 (accessed on 9 May 2020).
- 79. Syring N. Robust posterior inference for Youden’s index cutoff. arXiv. 2021. doi: 10.1080/03610926.2021.1969409. Available online: https://arxiv.org/abs/2108.04898 (accessed on 10 August 2021).
- 80. Artificial Intelligence Research. Computing Deviation of Area under the Precision-Recall Curve (washington.edu). Available online: http://aiweb.cs.washington.edu/ai/mln/auc.html (accessed on 10 August 2021).