Pharmaceutics. 2023 Feb 16;15(2):675. doi: 10.3390/pharmaceutics15020675

DoubleSG-DTA: Deep Learning for Drug Discovery: Case Study on the Non-Small Cell Lung Cancer with EGFR T790M Mutation

Yongtao Qian 1, Wanxing Ni 1, Xingxing Xianyu 1, Liang Tao 1,*, Qin Wang 1,*
Editors: Adam Pacławski1, Jakub Szlęk1
PMCID: PMC9965659  PMID: 36839996

Abstract

Drug-targeted therapies are promising approaches to treating tumors, and research on receptor–ligand interactions for discovering high-affinity targeted drugs has been accelerating drug development. This study presents a mechanism-driven, deep learning-based computational model, termed DoubleSG-DTA, that learns from drug sequences, protein sequences, and drug graphs to predict drug–target affinities (DTAs). We deployed lightweight graph isomorphism networks to aggregate drug graph representations and discriminate between molecular structures, and stacked multilayer squeeze-and-excitation networks to selectively enhance the spatial features of drug and protein sequences. In addition, cross-multi-head attentions were constructed to model non-covalent molecular docking behavior. Cross-validation experiments on several datasets showed that DoubleSG-DTA consistently outperformed the previously reported methods it was compared with. To showcase its value, we applied DoubleSG-DTA to identify promising hit compounds from natural products for non-small cell lung cancer harboring the EGFR T790M mutation; the results were consistent with reported laboratory studies. We further investigated the interpretability of this graph-based “black box” model and highlighted the active structures that contributed the most. DoubleSG-DTA thus provides a powerful and interpretable framework for identifying potential chemicals that modulate the systemic response to disease.

Keywords: drug–target affinity, graph isomorphism network, squeeze-and-excitation network, cross-multi-head attention, drug discovery, non-small cell lung cancer

1. Introduction

Clinically acquired resistance is a formidable obstacle for small-molecule kinase inhibitors in cancer treatment [1]. Moreover, finding small-molecule ligands with high affinity and favorable properties for target proteins in a broad chemical space has been a primary challenge in drug research and development (R&D) [2]. To date, kinase drugs approved by the U.S. Food and Drug Administration (FDA) that overcome clinical resistance driven by protein kinase “gatekeeper” mutations are so rare that they can fairly be called a “desert oasis”. Lung cancer is the leading cause of cancer-related deaths worldwide, and non-small cell lung cancer (NSCLC) is its most common type. The secondary epidermal growth factor receptor (EGFR) mutation at threonine 790 (T790M) leads to acquired resistance that severely worsens patient prognosis. Strategies or drugs that overcome this resistance are therefore urgently needed to prolong the survival of patients with NSCLC.

Wet-lab experiments and high-throughput screening are laborious and time-consuming, which makes them unsuitable for screening candidate drugs from a broad range of compounds in early-stage R&D. With advances in machine learning theory and the abundance of available pharmacological data, machine learning has become a powerful tool for precision medicine and artificial intelligence-driven drug design (AIDD). Many scientific achievements have demonstrated the potential of these approaches. For instance, knowledge graphs (KGs) enable the detection of drivers of tumor resistance and adverse drug reactions in a wider multi-omics space [3,4], and reinforcement learning (RL) has proven particularly effective in the de novo design and multi-objective optimization of drug molecules [5,6,7]. Deep learning, a data-driven branch of machine learning, has strong generalization and representation-extraction capabilities, which help it reveal implicit relationships between drugs, diseases, and genes that are not easily detected. In silico methods that explore potential drug–target associations have been developed to narrow the search toward more promising drug candidates.

Some studies treat DTA prediction as a binary classification task, using binary labels (1/0) to indicate whether a drug binds a target [8,9,10], while others treat it as a regression task and use continuous values to represent DTAs [11,12,13].

The random forest (RF) algorithm [14] broke with earlier methods that relied on multi-parameter scoring functions to infer DTA and proved convincing for extrapolating drug–target relationships in larger chemical spaces. KronRLS [15] and SimBoost [12] are regression-based machine-learning approaches that use similarities between drugs and between proteins to determine DTA. Many deep-learning models have since been proposed. DeepDTA [8] and AttentionDTA [16] used convolutional neural networks (CNNs) to capture hidden relationships in atom and amino acid sequences. DeepCDA incorporated a long short-term memory (LSTM) network to alleviate the vanishing-gradient problem [17]. MATT-DTI used relation-aware self-attention with position embeddings to reinforce relative positional associations among atoms [13]. Transformer-based models have come to the fore in many natural language processing (NLP) tasks; DMIL-PPDTA used a transformer encoder to enrich the word embeddings of drug and protein sequences and learn hidden associations from the raw data [18]. DeepAtom [19] extracted binding-relevant node-level interaction information from voxelized protein–compound complex structures. However, such structure-based models rely on known 3D drug–target complexes, and extracting features from large numbers of complexes with 3D convolutional networks is computationally expensive. GraphDTA [11] and MGraphDTA [20] represented compounds as topological graphs and evaluated several Graph Neural Network (GNN) variants, including the Graph Convolutional Network (GCN) [21], Graph Isomorphism Network (GIN) [22], and Graph Attention Network (GAT) [23], in place of CNN-based drug encoders, achieving excellent performance. DGraphDTA further encoded both drugs and proteins as graphs and inferred DTA with GNNs [24].
These graph-based methods avoid the scarcity of complex structures and the high computational cost of 3D models, compensate for the shortcomings of SMILES (Simplified Molecular Input Line Entry System) [25] as a drug representation, and describe compounds more naturally as molecular graphs.

Although these methods produce excellent benchmark results, they are difficult to generalize to real-world problems. First, the molecular similarity principle [26] states that structurally similar molecules usually show similar biological activities and physicochemical properties, whereas dissimilar molecules tend to differ significantly; the model must therefore discriminate between molecular structures across a wide chemical space. Second, these methods model the complicated mapping between compounds and proteins by simple concatenation, which departs from the non-covalent nature of receptor–ligand interactions. Third, and more importantly, they offer limited interpretability because of the “black-box” nature of graph neural networks. Because the false positives generated by binary classification directly impair model robustness, we treat DTA prediction as a regression problem. To address these problems, we propose DoubleSG-DTA, a three-channel framework based on GINs and multiple attention mechanisms, which outperforms other state-of-the-art (SOTA) regression-based methods on several benchmark datasets. We further visualize the gradients of atomic contributions in the graph representations and compare them with molecular docking poses to extend the interpretability of the graph-based model.

This paper presents the main contributions as follows:

  • DoubleSG-DTA combines graph isomorphism networks and squeeze-and-excitation networks to extract multimodal drug representations in parallel, enhancing the model's ability to discriminate between compound structures and selectively suppressing redundant information that would otherwise disturb model decisions;

  • A cross-multi-head attention design that models the non-covalent molecular docking behavior of drug substructures and subsequences, respectively, with target proteins;

  • Application of DoubleSG-DTA to screen natural products for promising hit compounds against NSCLC harboring the EGFR T790M mutation, with results consistent with reported laboratory studies.

2. Double Sequence and Graph to Predict Drug–Target Affinity (DoubleSG-DTA)

This work develops the DoubleSG-DTA model, which has three-channel multimodal representations, four-channel interaction, and a one-channel output for DTA prediction, and is built from multilayer GINs and multiple attention blocks, as shown in Figure 1. First, drug graphs and SMILES serve as inputs to the drug representation learning models, in which multilayer GINs [22] and squeeze-and-excitation networks (SENets) [27] are jointly used as feature extractors. Second, the protein representation learning model relies on stacked SENets to capture the dominant features of highly redundant protein sequences. Third, to encode the mutual drug–target interaction information, we designed cross-multi-head attention to model the non-covalent molecular docking behavior of drug substructures and subsequences, respectively, with target proteins. Finally, we decouple the attention coefficients, use them to reweight the drug and protein representations, and feed the result into Multilayer Perceptrons (MLPs) to predict DTA. This section presents the building blocks of the framework in this order.

Figure 1.


Architecture of the presented DoubleSG-DTA model.

2.1. Word Embedding and Graph Encoding

First, we encode drug and protein sequences with high-dimensional word embeddings. To this end, we built label/integer dictionaries for drug SMILES and protein FASTA sequences, consisting of 64 and 22 key–value pairs, respectively. For example, according to the SMILES dictionary {‘C’:22, ‘N’:34, ‘O’:33, ‘(’:4, ‘)’:3} and the protein dictionary {‘A’:1, ‘N’:14, ‘C’:2, ‘Q’:15, ‘I’:8, ‘V’:22, ‘W’:21}, the SMILES of propylene glycol “CC(O)CO” and the EGFR T790M [28] protein subsequence “NWCVQIA” are encoded as [22, 22, 4, 33, 3, 22, 33] and [14, 21, 2, 22, 15, 8, 1], respectively. Embedding layers then map each integer vector into word embeddings $D_e \in \mathbb{R}^{l_d \times l_e}$ and $P_e \in \mathbb{R}^{l_p \times l_e}$, where $l_d$ and $l_p$ denote the lengths of the SMILES and protein FASTA sequences and $l_e$ is the embedding dimension.
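The dictionary lookup and embedding step can be sketched in plain Python. This is a minimal sketch, assuming character-level tokenization, only the dictionary entries quoted above (not the full 64- and 22-entry vocabularies), and label 0 reserved for padding; `encode`, `embed`, and the random table standing in for a learned embedding layer are hypothetical, not the authors' code.

```python
import random

# Only the dictionary entries quoted in the text; the full vocabularies have
# 64 (SMILES) and 22 (protein) entries. Label 0 is assumed to be padding.
SMILES_DICT = {"C": 22, "N": 34, "O": 33, "(": 4, ")": 3}
PROTEIN_DICT = {"A": 1, "N": 14, "C": 2, "Q": 15, "I": 8, "V": 22, "W": 21}


def encode(sequence, vocab, max_len=None, pad=0):
    """Map each character to its integer label, then pad or truncate to max_len."""
    ids = [vocab[ch] for ch in sequence]          # KeyError for unknown symbols
    if max_len is not None:
        ids = ids[:max_len] + [pad] * max(0, max_len - len(ids))
    return ids


def embed(ids, table):
    """Embedding lookup: row i of `table` is the vector for integer label i."""
    return [table[i] for i in ids]


drug_ids = encode("CC(O)CO", SMILES_DICT)         # [22, 22, 4, 33, 3, 22, 33]
prot_ids = encode("NWCVQIA", PROTEIN_DICT)        # [14, 21, 2, 22, 15, 8, 1]

random.seed(0)
l_e = 4                                           # toy size; Table 2 uses 128
smiles_table = [[random.gauss(0.0, 1.0) for _ in range(l_e)] for _ in range(65)]
D_e = embed(drug_ids, smiles_table)               # l_d x l_e = 7 x 4
```

In a trained model the table rows are learned parameters; fixed-length padding via `max_len` is what lets sequences of different lengths be batched.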

We convert each SMILES string into its molecular graph $G = (V, E)$ and extract atom features with RDKit [29], where $V$ and $E$ are the sets of atoms and edges (bonds), respectively. Each atom node is represented by a feature vector built from 10 molecular descriptors: atom symbol, atomic number, hybridization, number of adjacent atoms, chirality, formal charge, aromaticity, number of bonded hydrogens, and explicit and implicit valence.
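The graph construction can be illustrated on the propylene glycol example. RDKit performs the SMILES parsing in the paper; to keep this sketch runnable without RDKit, the heavy-atom graph is written out by hand, and only three of the ten descriptors are computed. Hydrogen counts are inferred from typical valences, an assumption that holds only for neutral atoms joined by single bonds; all names here are hypothetical.

```python
# Hand-built heavy-atom graph for propylene glycol, SMILES "CC(O)CO".
ATOMS = ["C", "C", "O", "C", "O"]            # V: atom symbols, index = node id
BONDS = [(0, 1), (1, 2), (1, 3), (3, 4)]     # E: single bonds (undirected)

ATOMIC_NUM = {"C": 6, "O": 8}
TYPICAL_VALENCE = {"C": 4, "O": 2}           # assumption: neutral atoms, single bonds


def to_edge_index(bonds):
    """Store every undirected bond in both directions, as message passing expects."""
    src, dst = [], []
    for a, b in bonds:
        src += [a, b]
        dst += [b, a]
    return [src, dst]


def atom_features(atoms, bonds):
    """Three of the ten descriptors: atomic number, degree, number of hydrogens."""
    degree = [0] * len(atoms)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    feats = []
    for i, sym in enumerate(atoms):
        num_h = TYPICAL_VALENCE[sym] - degree[i]   # implicit H fill the remaining valence
        feats.append([ATOMIC_NUM[sym], degree[i], num_h])
    return feats


edge_index = to_edge_index(BONDS)
x = atom_features(ATOMS, BONDS)   # one row per atom, e.g. the CH3 carbon is [6, 1, 3]
```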

2.2. Drug and Protein Sequence Representation Learning Model

CNNs construct sequence features by fusing local correlations within the receptive field of the convolutional kernel, which is both their strength and their limitation. In computer vision, the squeeze-and-excitation (SE) block introduced channel attention into existing architectures: it adaptively rescales channel-wise features by explicitly modeling non-mutually-exclusive relationships between channels, and SENets achieved superior image classification performance with only a slight increase in computational cost [27]. We therefore stacked multilayer SENets to selectively enhance informative features and suppress noise that would otherwise disturb model decisions. Given the feature matrix $U \in \mathbb{R}^{H \times W \times C}$ output by a convolution layer, where $U = [u_1, u_2, \ldots, u_C]$, we route it to the SE block.

The SE module consists of squeeze, excitation, and reweighting operators. The squeeze operator collapses the spatial dimensions of $U$ by global average pooling, producing channel-wise statistics $z \in \mathbb{R}^{C}$:

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j). \tag{1}$$

The excitation operator uses two learnable fully connected layers with a gating mechanism to learn non-linear interactions between channels and filter out non-dominant features:

$$s = F_{ex}(z, W) = \sigma\left(W_2\,\delta(W_1 z)\right), \tag{2}$$

where $\delta$ is the Rectified Linear Unit (ReLU) activation function, $\sigma$ is the sigmoid function, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are learnable weight matrices. The reduction ratio was set to $r = 16$ to balance performance and complexity [27].

The reweighted representation $x_c$ is obtained by channel-wise multiplication of the feature map $u_c$ by its channel attention weight $s_c$:

$$x_c = F_{sc}(u_c, s_c) = s_c \cdot u_c, \tag{3}$$

where $X = [x_1, x_2, \ldots, x_C]$ and $x_c \in \mathbb{R}^{H \times W}$.

The word embeddings $D_e$ and $P_e$ are fed directly into the convolutional layers, passed through the SE blocks, and then reduced by global max pooling (gmp). The drug and protein sequence representations can therefore be expressed as:

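$$D_{SENet} = \mathrm{gmp}\left(\mathrm{SE}\left(\mathrm{CNN}(D_e)\right)\right), \quad P_{SENet} = \mathrm{gmp}\left(\mathrm{SE}\left(\mathrm{CNN}(P_e)\right)\right). \tag{4}$$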
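Equations (1) to (4) can be traced in a small NumPy sketch for a 1-D sequence, where the spatial size $H \times W$ reduces to the sequence length. It is a sketch under stated assumptions, not the authors' implementation: one convolution plus SE layer instead of the three in Table 2, a hypothetical kernel size of 3, ReLU after the convolution, and random rather than trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d(x, w, b):
    """Valid 1-D convolution. x: (L, E), w: (k, E, C), b: (C,) -> (L - k + 1, C)."""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    return np.stack([np.einsum("ke,kec->c", x[t:t + k], w) + b for t in range(steps)])


def se_block(u, w1, w2):
    """Squeeze-and-excitation on a (L, C) feature map, following Eqs. (1)-(3)."""
    z = u.mean(axis=0)                 # Eq. (1): global average pooling -> (C,)
    s = sigmoid(w2 @ relu(w1 @ z))     # Eq. (2): W1 is (C/r, C), W2 is (C, C/r)
    return u * s                       # Eq. (3): rescale every channel by s_c


def sequence_encoder(embedded, k=3, channels=32, r=16):
    """gmp(SE(CNN(x))) as in Eq. (4), with one conv + SE layer and random weights."""
    e = embedded.shape[1]
    w = rng.normal(scale=0.1, size=(k, e, channels))
    b = np.zeros(channels)
    w1 = rng.normal(scale=0.1, size=(channels // r, channels))
    w2 = rng.normal(scale=0.1, size=(channels, channels // r))
    u = relu(conv1d(embedded, w, b))   # assumption: ReLU after the convolution
    x = se_block(u, w1, w2)
    return x.max(axis=0)               # global max pooling over sequence positions


D_e = rng.normal(size=(20, 128))       # hypothetical: 20 tokens, embedding size 128
d_senet = sequence_encoder(D_e)        # one value per channel: shape (32,)
```

With all SE weights at zero, every gate equals sigmoid(0) = 0.5, which is a convenient sanity check that the block only rescales channels and never mixes them.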

2.3. Drug Graph Representation Learning Model

Drug molecules are non-Euclidean chemical structures composed of entities (atoms) and relations (bonds) that carry rich semantic information and complex spatial structure. Capturing this information is essential for accurately discriminating between drug molecules and precisely predicting how strongly different compounds bind to proteins. However, conventional GNNs have limited power to distinguish between graph structures [22].

We also take into account that drugs with similar substructures may interact pharmacologically with target proteins that share the same or similar binding pockets. Graph isomorphism networks [22] follow a message-passing scheme with an injective aggregation function: each atom recursively updates its representation by aggregating the features of its neighbors, so that after k iterations it reflects its k-hop neighborhood. With enough iterations and an injective readout, GIN is as discriminative as the Weisfeiler–Lehman graph isomorphism test, the upper bound for message-passing GNNs [22], which makes it well suited to reading out drug graph representations and distinguishing drug molecules.

GIN updates atom feature vectors with MLPs, which keeps the aggregation injective across $K$ iterations. The graph-level representation is obtained by summing the feature vectors of all atoms in the drug. Formally, the $k$-th GIN layer updates the feature vector $D_v^{(k)}$ of atom $v$, and the drug graph representation $D_{GIN}$ is:

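$$D_v^{(k)} = \mathrm{MLP}^{(k)}\left(\left(1 + \varepsilon^{(k)}\right) \cdot D_v^{(k-1)} + \sum_{i \in \mathcal{N}(v)} D_i^{(k-1)}\right), \quad D_{GIN} = \mathrm{CONCAT}\left(\mathrm{READOUT}\left(\left\{D_v^{(k)} \mid v \in G\right\}\right) \mid k = 0, 1, \ldots, K\right), \tag{5}$$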

where $\mathcal{N}(v)$ is the set of atoms adjacent to atom $v$, and READOUT is a graph-level pooling function. We made $\varepsilon$ a learnable parameter.

Building deep GINs successfully depends heavily on the ReLU activation function and batch normalization; batch normalization helps alleviate the vanishing-gradient and over-smoothing problems:

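$$\mathrm{GIN}^{(l+1)}(G) = \mathrm{BNLayer}\left(\mathrm{GIN}^{(l)}(G)\right), \quad D_{GIN} = \mathrm{Dropout}\left(\delta\left(\mathrm{GIN}^{(n)}(G), W\right)\right), \tag{6}$$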

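where BNLayer denotes node-level batch normalization, $\delta$ is the ReLU activation function, and $\mathrm{GIN}^{(n)}(G)$ is the output of the last ($n$-th) GIN layer.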
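Equations (5) and (6) can be sketched in NumPy on the propylene glycol graph. This is a minimal sketch under assumptions not stated in the text: $\varepsilon$ is fixed at 0 instead of learned, each GIN MLP has two layers of hypothetical width 32, batch normalization omits its learnable scale and shift, ReLU follows every normalization, READOUT is a sum, and the three toy atom features stand in for the ten descriptors.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def init_params(in_dim, hidden=32, n_layers=4):
    """Random weights for a two-layer MLP per GIN layer (hypothetical sizes)."""
    params, d = [], in_dim
    for _ in range(n_layers):
        params.append((rng.normal(scale=0.1, size=(d, hidden)),
                       rng.normal(scale=0.1, size=(hidden, hidden))))
        d = hidden
    return params


def gin_layer(h, adj, w1, w2, eps=0.0):
    """Eq. (5): MLP((1 + eps) * h_v + sum of neighbour features)."""
    agg = (1.0 + eps) * h + adj @ h        # adj[v, i] = 1 if atoms v and i are bonded
    return relu(agg @ w1) @ w2


def node_batch_norm(h, eps=1e-5):
    """Normalise each feature over all atoms (learnable scale/shift omitted)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)


def gin_encoder(x, adj, params, drop_p=0.2, training=False):
    """GIN -> BN -> ReLU per layer; sum READOUT of every layer, then CONCAT."""
    h = x
    readouts = [h.sum(axis=0)]             # k = 0 term of the CONCAT in Eq. (5)
    for w1, w2 in params:
        h = relu(node_batch_norm(gin_layer(h, adj, w1, w2)))
        readouts.append(h.sum(axis=0))     # READOUT = sum over atoms
    g = np.concatenate(readouts)
    if training:                           # inverted dropout, active only in training
        keep = rng.random(g.shape) >= drop_p
        g = g * keep / (1.0 - drop_p)
    return g


# Propylene glycol ("CC(O)CO"): 5 heavy atoms, 3 toy features each.
x = np.array([[6, 1, 3], [6, 3, 1], [8, 1, 1], [6, 2, 2], [8, 1, 1]], dtype=float)
adj = np.zeros((5, 5))
for a, b in [(0, 1), (1, 2), (1, 3), (3, 4)]:
    adj[a, b] = adj[b, a] = 1.0

params = init_params(in_dim=3)
d_gin = gin_encoder(x, adj, params)        # length 3 + 4 * 32 = 131
```

Because both the sum aggregation and the sum readout ignore atom order, renumbering the atoms leaves `d_gin` unchanged, which is the property that lets GIN treat isomorphic molecular graphs identically.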

2.4. Drug Molecule and Target Protein Interaction Model

Drug–target binding is a recognition process similar to the “lock and key” model. Inspired by previous attention-based methods [13,17,30], we constructed two cross-multi-head attention modules to model the non-covalent molecular docking behavior between compounds and proteins, instead of simply concatenating drug and protein representations, which tends to introduce noisy information. Concretely, the multiple heads examine the associations among drug substructures, drug subsequences, and protein residues from several independent perspectives. The cross-multi-head attention blocks take as inputs the drug and protein sequence feature matrices $D_{SENet} \in \mathbb{R}^{l_d \times l_c}$ and $P_{SENet} \in \mathbb{R}^{l_p \times l_c}$ from the SENets and the drug graph-level representation $D_{GIN} \in \mathbb{R}^{l_d \times l_g}$ from the GIN.

We first apply learnable linear transformation layers so that each head can fully learn from the high-dimensional features, and then combine $D_{SENet}$ and $D_{GIN}$ with $P_{SENet}$ through the cross-multi-head attention mechanism:

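$$Q_s = \delta\left(D_{SENet}W_{senet} + b_{senet}\right), \quad Q_g = \delta\left(D_{GIN}W_{gin} + b_{gin}\right), \quad K = \delta\left(P_{SENet}W_{senet} + b_{senet}\right), \quad V = \delta\left(P_{SENet}W_{senet} + b_{senet}\right), \tag{7}$$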

where $W_{senet} \in \mathbb{R}^{l_c \times l_a}$ and $W_{gin} \in \mathbb{R}^{l_g \times l_a}$ are learnable weights and $b_{senet}$ and $b_{gin}$ are the corresponding bias terms. $Q$, $K$, and $V$ denote the query, key, and value matrices. A single scaled dot-product attention unit maps the queries and key–value pairs to an output by weighting the values with query–key similarity scores. Multi-head attention jointly attends to different representation subspaces at different positions by concatenating $h$ individual attention units [31].

The first cross-multi-head attention output $A_{DP1}$ is computed as follows:

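$$\mathrm{Attention}(Q_s, K, V) = \mathrm{Softmax}\left(\frac{Q_s K^{T}}{\sqrt{l_c / h}}\right) V, \tag{8}$$
$$\mathrm{head}_i = \mathrm{Attention}\left(Q_s W_i^{Q}, K W_i^{K}, V W_i^{V}\right), \quad A_{DP1} = \mathrm{Concat}\left[\mathrm{head}_1, \ldots, \mathrm{head}_h\right] W^{O}, \tag{9}$$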

where $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are the parameter matrices of the learned linear projections. The second cross-multi-head attention output $A_{DP2}$ is computed analogously:

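$$\mathrm{Attention}(Q_g, K, V) = \mathrm{Softmax}\left(\frac{Q_g K^{T}}{\sqrt{l_g / h}}\right) V, \tag{10}$$
$$\mathrm{head}_j = \mathrm{Attention}\left(Q_g W_j^{Q}, K W_j^{K}, V W_j^{V}\right), \quad A_{DP2} = \mathrm{Concat}\left[\mathrm{head}_1, \ldots, \mathrm{head}_h\right] W^{O}, \tag{11}$$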
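Equations (7) to (11) can be sketched in NumPy: drug-side queries attend over protein-side keys and values. This is a sketch under assumptions, not the authors' code. All sizes are hypothetical, the projected width $l_a$ is split evenly across heads, and the scores are scaled by the square root of the per-head width $l_a / h$, the standard choice after projection; Eqs. (8) and (10) write the scale in terms of $l_c$ and $l_g$ instead.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)


def cross_multi_head(q, k, v, n_heads, wq, wk, wv, wo):
    """Drug-side queries attend over protein-side keys/values (Eqs. 8-11).

    q: (l_d, l_a); k, v: (l_p, l_a); wq, wk, wv, wo: (l_a, l_a).
    Returns the output (l_d, l_a) and attention maps (n_heads, l_d, l_p).
    """
    d_head = q.shape[1] // n_heads

    def split(x):                                       # (n, l_a) -> (heads, n, d_head)
        return x.reshape(x.shape[0], n_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q @ wq), split(k @ wk), split(v @ wv)
    attn = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head))   # (heads, l_d, l_p)
    heads = attn @ vh                                              # (heads, l_d, d_head)
    concat = heads.transpose(1, 0, 2).reshape(q.shape[0], -1)      # Concat[head_1..head_h]
    return concat @ wo, attn


# Hypothetical sizes: 12 drug positions, 30 protein positions, 8 heads (Table 2).
l_d, l_p, l_c, l_g, l_a, n_heads = 12, 30, 96, 64, 64, 8
D_senet = rng.normal(size=(l_d, l_c))
D_gin = rng.normal(size=(l_d, l_g))
P_senet = rng.normal(size=(l_p, l_c))

W_senet, b_senet = rng.normal(scale=0.1, size=(l_c, l_a)), np.zeros(l_a)
W_gin, b_gin = rng.normal(scale=0.1, size=(l_g, l_a)), np.zeros(l_a)
Q_s = relu(D_senet @ W_senet + b_senet)          # Eq. (7)
Q_g = relu(D_gin @ W_gin + b_gin)
K = V = relu(P_senet @ W_senet + b_senet)        # Eq. (7) reuses W_senet, b_senet for K and V


def new_projections():
    """Fresh W^Q, W^K, W^V, W^O for one attention module."""
    return [rng.normal(scale=0.1, size=(l_a, l_a)) for _ in range(4)]


A_dp1, attn1 = cross_multi_head(Q_s, K, V, n_heads, *new_projections())   # Eqs. (8)-(9)
A_dp2, attn2 = cross_multi_head(Q_g, K, V, n_heads, *new_projections())   # Eqs. (10)-(11)
```

Each attention map row is a distribution over protein positions, so a drug position can spread its attention over several residues rather than being paired with a single concatenated protein vector.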

Next, we concatenate the two outputs into $A_{DP}$ and decouple it into a drug attention weight $\alpha_d$ and a protein attention weight $\alpha_p$ by row-wise and column-wise summation, respectively. These weights update the drug representation $\alpha_D$ and the protein representation $\alpha_P$:

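$$A_{DP} = \mathrm{Concat}\left[A_{DP1}, A_{DP2}\right], \tag{12}$$
$$\alpha_D = \mathrm{Concat}\left[\alpha_d \odot D_{SENet}, \alpha_d \odot D_{GIN}\right], \quad \alpha_P = \alpha_p \odot P_{SENet}, \tag{13}$$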

where $\odot$ denotes the element-wise product. The drug–target interaction information $I_{dp}$ can be interpreted as capturing the salient semantic correlations between target proteins and compound features:

$$I_{dp} = \mathrm{gap}\left(\mathrm{Concat}\left[\mathrm{gmp}(\alpha_D), \mathrm{gmp}(\alpha_P)\right]\right), \tag{14}$$

where gap and gmp are the global average pooling and global max pooling operations, respectively.
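The decoupling and reweighting of Eqs. (12) and (13) can be sketched as follows. The text does not fix the shape of the summed tensor, so this sketch assumes a single drug-by-protein attention map of shape $(l_d, l_p)$, whose row sums give one weight per drug position and whose column sums give one weight per protein position. The map and features are random, the names are hypothetical, and the final gap step of Eq. (14) is omitted because its pooling axis is not specified.

```python
import numpy as np

rng = np.random.default_rng(0)


def decouple(attn):
    """Row sums give one weight per drug position, column sums one per protein position."""
    return attn.sum(axis=1), attn.sum(axis=0)


def reweight_and_pool(attn, d_senet, d_gin, p_senet):
    """Eq. (13) reweighting followed by global max pooling over positions."""
    alpha_d, alpha_p = decouple(attn)
    alpha_D = np.concatenate([alpha_d[:, None] * d_senet,
                              alpha_d[:, None] * d_gin], axis=1)   # (l_d, l_c + l_g)
    alpha_P = alpha_p[:, None] * p_senet                           # (l_p, l_c)
    return np.concatenate([alpha_D.max(axis=0), alpha_P.max(axis=0)])


# Hypothetical attention map: one drug-by-protein matrix, rows softmax-normalised.
l_d, l_p, l_c, l_g = 12, 30, 96, 64
logits = rng.normal(size=(l_d, l_p))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

pooled = reweight_and_pool(attn,
                           rng.normal(size=(l_d, l_c)),
                           rng.normal(size=(l_d, l_g)),
                           rng.normal(size=(l_p, l_c)))   # length 96 + 64 + 96 = 256
```

With a row-normalized map like this one, every row sum equals 1, so the drug-side weights are uniform; they carry information only if the summed tensor is not row-normalized, a detail the text leaves open.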

2.5. Drug and Target Protein Binding Affinity Prediction

Finally, the interaction information $I_{dp}$ is fed directly into MLPs that map it to the drug–target affinity score. The MLP consists of four layers, each followed by a ReLU and a dropout layer to reduce over-fitting:

$$\mathrm{DTA} = \mathrm{MLP}\left(I_{dp}\right). \tag{15}$$
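A forward pass of the prediction head can be sketched with the hidden sizes from Table 2 ([1024, 1024, 512]) plus a one-unit output, which gives the four layers described above. Two choices here are assumptions rather than details from the paper: the text says every layer is followed by ReLU and dropout, but this sketch leaves the final output layer linear, a common choice for regression; and the weights use random He-style initialization. The 256-dimensional input is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(in_dim, hidden=(1024, 1024, 512)):
    """Four linear layers: three hidden sizes from Table 2 plus a 1-unit output."""
    sizes = [in_dim, *hidden, 1]
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x, layers, drop_p=0.2, training=False):
    """ReLU + dropout after each hidden layer; the output layer is left linear."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
            if training:                   # inverted dropout, active only in training
                keep = rng.random(x.shape) >= drop_p
                x = x * keep / (1.0 - drop_p)
    return x[..., 0]                       # one affinity score per drug-target pair


layers = init_mlp(in_dim=256)
i_dp = rng.normal(size=(4, 256))           # hypothetical batch of 4 interaction vectors
dta = mlp_forward(i_dp, layers)            # shape (4,)
```

At inference time (`training=False`) dropout is disabled, so repeated calls on the same input return identical scores.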

3. Materials and Methods

3.1. Benchmark Datasets

We evaluated DoubleSG-DTA on three benchmark datasets: Davis [32], KIBA [33], and BindingDB [34]. Their statistics and split strategy are listed in Table 1.

$$\mathrm{p}K_d = -\log_{10}\left(\frac{K_d}{1 \times 10^{9}}\right) \tag{16}$$

Table 1.

The detailed statistics of Davis, KIBA, and BindingDB datasets.

| Dataset | No. Proteins | No. Drugs | No. Interactions | Train | Validation | Test |
|---|---|---|---|---|---|---|
| Davis | 442 | 68 | 30,056 | 20,037 | 5009 | 5010 |
| KIBA | 229 | 2111 | 118,254 | 78,836 | 19,709 | 19,709 |
| BindingDB | 1620 | 18,044 | 56,525 | 37,684 | 9421 | 9420 |

The Kd values in the Davis dataset are highly skewed and discrete, so we converted them into log space according to Equation (16) [8]. The KIBA dataset comprises KIBA scores for about 118 K protein–compound interactions; these scores integrate different bioactivity measures, such as Ki, Kd, and IC50. BindingDB is a publicly accessible database of binding affinities between small-molecule drugs and target proteins.
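Equation (16) is a one-line conversion. The factor $10^{9}$ implies that Kd is given in nanomolar, so dividing by it yields molar units before the negative logarithm; that unit reading is an assumption drawn from the equation. The function name and example values are hypothetical.

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Eq. (16): pKd = -log10(Kd / 1e9), with Kd given in nanomolar (nM)."""
    return -math.log10(kd_nm / 1e9)


davis_kd_nm = [10000.0, 100.0, 1.0]          # hypothetical example values
davis_pkd = [kd_nm_to_pkd(kd) for kd in davis_kd_nm]
```

A Kd of 10,000 nM (10 µM) maps to pKd = 5 and 1 nM maps to pKd = 9, so five orders of magnitude in Kd become a compact, linear target range for the MSE loss, with larger values meaning tighter binding.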

3.2. Evaluation Metrics

To ensure consistency and a fair comparison with previous studies [8,11,13], we used the concordance index (CI, ↑), mean squared error (MSE, ↓), and modified squared correlation coefficient ($r_m^2$ index, ↑) as performance metrics.

MSE: MSE measures the difference between the ground truths and the predicted values, and minimizing it is the main training objective.

CI: CI measures the probability that, for two randomly chosen drug–target pairs, the predicted affinities are ranked in the same order as the ground truths. A CI of 0.5 corresponds to random ranking and 1.0 to perfect ranking; values below 0.7 indicate unconvincing predictions, values from 0.7 to 0.9 moderate accuracy, and values above 0.9 reliable predictions.

$r_m^2$: The $r_m^2$ metric is widely used to evaluate the external predictive performance of regression models; an acceptable model has an $r_m^2$ value greater than 0.5.

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{DTA}_i - \mathrm{Label}_i\right)^2 \tag{17}$$

where $\mathrm{DTA}_i$ and $\mathrm{Label}_i$ denote the predicted value and the ground truth of the $i$-th drug–target pair, and $N$ is the number of pairs.

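$$\mathrm{CI} = \frac{1}{Z}\sum_{\delta_i > \delta_j} \zeta\left(\mathrm{DTA}_i - \mathrm{DTA}_j\right) \tag{18}$$

where $\mathrm{DTA}_i$ and $\mathrm{DTA}_j$ are the predicted values for the pairs with the larger affinity $\delta_i$ and the smaller affinity $\delta_j$, respectively; $\zeta(x)$ is the step function [15], with $\zeta(x) = 1$ if $x > 0$, $0.5$ if $x = 0$, and $0$ if $x < 0$; and $Z$ is a normalization constant equal to the number of pairs with $\delta_i > \delta_j$.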
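Equations (17) and (18) translate directly into Python. This is a straightforward sketch with hypothetical function names; the CI loop visits all $O(N^2)$ ordered pairs, which is fine for illustration, while faster implementations exist for large test sets.

```python
def mse(pred, label):
    """Eq. (17): mean of squared differences between predictions and ground truths."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(label)


def zeta(x):
    """Step function of Eq. (18): 1 if x > 0, 0.5 if x == 0, 0 if x < 0."""
    return 1.0 if x > 0 else (0.5 if x == 0 else 0.0)


def concordance_index(pred, label):
    """Eq. (18): average zeta over all pairs whose true affinities differ."""
    total, z = 0.0, 0
    for i in range(len(label)):
        for j in range(len(label)):
            if label[i] > label[j]:               # pair (i, j) is comparable
                z += 1
                total += zeta(pred[i] - pred[j])
    return total / z
```

For ground truths [5.0, 6.0, 7.0] and predictions [5.1, 6.5, 6.4], two of the three comparable pairs are ordered correctly, so CI = 2/3 even though every prediction is within 0.6 of its target; MSE and CI therefore capture different aspects of quality.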

$$r_m^2 = r^2 \times \left(1 - \sqrt{r^2 - r_0^2}\right). \tag{19}$$

Here, $r^2$ and $r_0^2$ are the squared correlation coefficients between the observed and predicted values with and without an intercept, respectively.
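Equation (19) can be sketched in plain Python using the common definition in which $r_0^2$ is the coefficient of determination of a least-squares fit through the origin. The absolute value inside the square root guards against tiny negative differences from rounding. Reference implementations differ slightly in how they compute $r_0^2$, so values from this sketch may not match published tables exactly; the function names are hypothetical.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def r_squared(obs, pred):
    """Squared Pearson correlation (regression with intercept)."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def r0_squared(obs, pred):
    """Coefficient of determination for the fit obs ~ k * pred through the origin."""
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mo = _mean(obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rm2(obs, pred):
    """Eq. (19): r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|))."""
    r2, r02 = r_squared(obs, pred), r0_squared(obs, pred)
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))
```

A constant offset illustrates why $r_m^2$ is stricter than $r^2$: predictions [6, 7, 8, 9] for ground truths [5, 6, 7, 8] are perfectly correlated ($r^2 = 1$), but the fit through the origin is imperfect, so $r_m^2$ falls below 1.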

In addition, we used the Pearson correlation coefficient to measure the linear correlation between the ground truths and the predicted values:

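$$\mathrm{Pearson}(\mathrm{DTA}, \mathrm{Label}) = \frac{\mathrm{Cov}(\mathrm{DTA}, \mathrm{Label})}{\sigma_{\mathrm{DTA}}\,\sigma_{\mathrm{Label}}}, \tag{20}$$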

where Cov denotes the covariance and $\sigma$ the standard deviation.
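Equation (20) in plain Python, as a short sketch with a hypothetical function name. The $1/N$ normalization appears in the covariance and in both standard deviations, so it cancels; using $1/(N-1)$ throughout gives the same result.

```python
import math


def pearson(pred, label):
    """Eq. (20): Cov(DTA, Label) / (sigma_DTA * sigma_Label)."""
    n = len(label)
    mp, ml = sum(pred) / n, sum(label) / n
    cov = sum((p - mp) * (y - ml) for p, y in zip(pred, label)) / n
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    sd_l = math.sqrt(sum((y - ml) ** 2 for y in label) / n)
    return cov / (sd_p * sd_l)
```

On Python 3.10 and later, `statistics.correlation(x, y)` from the standard library computes the same coefficient.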

3.3. Hyperparameter Settings

Experiments were conducted on an NVIDIA RTX A5000 GPU. We used five-fold cross-validation to evaluate both the previously reported methods and DoubleSG-DTA; Table 2 lists the hyperparameter settings.

Table 2.

The hyperparameters of DoubleSG-DTA.

| Hyperparameter | Davis Dataset | KIBA Dataset | BindingDB Dataset |
|---|---|---|---|
| Embedding size | 128 | 128 | 128 |
| SENet layers | 3 | 3 | 3 |
| GIN layers | [3, 4, 5, 6, 7] | [3, 4, 5, 6, 7] | [3, 4, 5, 6, 7] |
| Number of filters in SENets | [16, 32, 48] | [32, 64, 96] | [32, 64, 96] |
| Hidden sizes in MLPs | [1024, 1024, 512] | [1024, 1024, 512] | [1024, 1024, 512] |
| Number of attention heads | 8 | 8 | 8 |
| Epochs | 600 | 600 | 600 |
| Learning rate | 0.0001 | 0.0001 | 0.0001 |
| Batch size | 512 | 1024 | 1024 |
| Dropout rate | 0.2 | 0.2 | 0.2 |
| Optimizer | Adam | Adam | Adam |
| Activation function | ReLU | ReLU | ReLU |
| Loss function | MSE loss | MSE loss | MSE loss |

3.4. Baselines

We compared DoubleSG-DTA with previous methods on the three benchmark datasets using MSE (↓), CI (↑), and $r_m^2$ (↑). The deep-learning baselines were DeepDTA [8], GraphDTA [11], MATT-DTI [13], AttentionDTA [16], DeepCDA [17], and DMIL-PPDTA [18]. We also benchmarked against proteochemometrics methods [35], including the support vector machine (SVM), feedforward neural network (FNN), SimBoost [12], random forest (RF) [14], and KronRLS [15].

4. Results and Discussion

4.1. Comparison against Baselines in Regression Tasks

Tables 3, 4, and 5 summarize the quantitative results of DoubleSG-DTA and previously reported models on the benchmark datasets. DoubleSG-DTA achieved the best CI, MSE, and $r_m^2$ on all three datasets, outperforming the other regression-based methods.

Table 3.

Comparison of previous studies and the DoubleSG-DTA on the Davis dataset.

| Dataset | Method | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | rm² (std) ↑ |
|---|---|---|---|---|---|---|---|
| Davis | Random Forest [14] | PSC | ECFP | — | 0.854 (0.002) | 0.359 | 0.549 (0.005) |
| | SVM [20] | PSC | ECFP | — | 0.857 (0.001) | 0.383 | 0.513 (0.003) |
| | FNN [20] | PSC | ECFP | — | 0.893 (0.003) | 0.244 | 0.685 (0.015) |
| | KronRLS [15] | Smith-Waterman | PubChem Sim | — | 0.871 (0.001) | 0.379 | 0.407 (0.005) |
| | SimBoost [12] | Smith-Waterman | PubChem Sim | — | 0.872 (0.001) | 0.282 | 0.644 (0.006) |
| | DeepDTA [8] | CNN | CNN | Concatenation&FCN | 0.878 (0.004) | 0.261 | 0.630 (0.017) |
| | DeepCDA [17] | CNN&LSTM ¹ | CNN&LSTM | Two-sided Attention&FCN | 0.891 (0.003) | 0.248 | 0.649 (0.009) |
| | MATT-DTI [13] | CNN | CNN&Relation-aware Self-Attention | Multi-head Attention&FCN | 0.891 (0.002) | 0.227 | 0.683 (0.017) |
| | AttentionDTA [16] | CNN | CNN | Multi-head Attention&FCN | 0.887 (0.005) | 0.245 | 0.657 (0.024) |
| | DMIL-PPDTA [18] | Transformer | Transformer | Multi-head Attention&FCN | 0.880 (0.007) | 0.223 | 0.642 (0.017) |
| | GraphDTA [11] | CNN | GIN | Concatenation&FCN | 0.893 (—) | 0.229 | — |
| | GraphDTA [11] | CNN | GAT | Concatenation&FCN | 0.892 (—) | 0.232 | — |
| | GraphDTA [11] | CNN | GCN | Concatenation&FCN | 0.890 (—) | 0.254 | — |
| | GraphDTA [11] | CNN | GAT&GCN | Concatenation&FCN | 0.881 (—) | 0.245 | — |
| | DoubleSG-DTA | CNN | GIN+CNN ² | Concatenation&FCN | 0.886 (0.003) | 0.250 | 0.688 (0.031) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention&FCN | 0.902 (0.008) | 0.219 | 0.725 (0.008) |

1 & stands for concatenating learning. 2 + stands for parallel learning. Bold text indicates the best result.

Table 4.

Comparison of previous studies and the DoubleSG-DTA on the KIBA dataset.

| Dataset | Method | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | rm² (std) ↑ |
|---|---|---|---|---|---|---|---|
| KIBA | Random Forest [14] | PSC | ECFP | — | 0.837 (0.000) | 0.245 | 0.581 (0.000) |
| | SVM [20] | PSC | ECFP | — | 0.799 (0.001) | 0.308 | 0.513 (0.004) |
| | FNN [20] | PSC | ECFP | — | 0.818 (0.005) | 0.216 | 0.659 (0.015) |
| | KronRLS [15] | Smith-Waterman | PubChem Sim | — | 0.782 (0.001) | 0.411 | 0.342 (0.001) |
| | SimBoost [12] | Smith-Waterman | PubChem Sim | — | 0.836 (0.001) | 0.222 | 0.629 (0.007) |
| | DeepDTA [8] | CNN | CNN | Concatenation&FCN | 0.863 (0.002) | 0.194 | 0.673 (0.009) |
| | DeepCDA [17] | CNN&LSTM | CNN&LSTM | Two-sided Attention&FCN | 0.889 (0.002) | 0.176 | 0.682 (0.008) |
| | MATT-DTI [13] | CNN | CNN&Relation-aware Self-Attention | Multi-head Attention&FCN | 0.889 (0.001) | 0.150 | 0.756 (0.011) |
| | AttentionDTA [16] | CNN | CNN | Multi-head Attention&FCN | 0.882 (0.004) | 0.162 | 0.735 (0.003) |
| | DMIL-PPDTA [18] | Transformer | Transformer | Multi-head Attention&FCN | 0.881 (0.003) | 0.147 | 0.784 (0.006) |
| | GraphDTA [11] | CNN | GIN | Concatenation&FCN | 0.882 (—) | 0.147 | — |
| | GraphDTA [11] | CNN | GAT | Concatenation&FCN | 0.866 (—) | 0.179 | — |
| | GraphDTA [11] | CNN | GCN | Concatenation&FCN | 0.889 (—) | 0.139 | — |
| | GraphDTA [11] | CNN | GAT&GCN | Concatenation&FCN | 0.891 (—) | 0.139 | — |
| | DoubleSG-DTA | CNN | GIN+CNN | Concatenation&FCN | 0.856 (0.002) | 0.164 | 0.721 (0.009) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention&FCN | 0.896 (0.010) | 0.138 | 0.787 (0.005) |

Bold text indicates the best result.

Table 5.

Comparison of previous studies and the DoubleSG-DTA on the BindingDB dataset.

| Dataset | Method | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | rm² (std) ↑ |
|---|---|---|---|---|---|---|---|
| BindingDB | KronRLS [15] | Smith-Waterman | PubChem Sim | — | 0.815 (0.003) | 0.939 | — |
| | DeepDTA [8] | CNN | CNN | Concatenation&FCN | 0.826 (0.001) | 0.703 | 0.669 (0.004) |
| | DeepCDA [17] | CNN&LSTM | CNN&LSTM | Two-sided Attention&FCN | 0.822 (0.001) | 0.844 | 0.631 (0.002) |
| | AttentionDTA [16] | CNN | CNN | Multi-head Attention&FCN | 0.852 (0.003) | 0.603 | 0.687 (0.013) |
| | GraphDTA [11] | CNN | GIN | Concatenation&FCN | 0.857 (—) | 0.557 | 0.703 (—) |
| | GraphDTA [11] | CNN | GAT | Concatenation&FCN | 0.817 (—) | 0.929 | 0.555 (—) |
| | GraphDTA [11] | CNN | GCN | Concatenation&FCN | 0.850 (—) | 0.638 | 0.647 (—) |
| | GraphDTA [11] | CNN | GAT&GCN | Concatenation&FCN | 0.855 (—) | 0.593 | 0.682 (—) |
| | DoubleSG-DTA | CNN | GIN+CNN | Concatenation&FCN | 0.853 (0.001) | 0.624 | 0.642 (0.008) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention&FCN | 0.862 (0.002) | 0.533 | 0.726 (0.009) |

Bold text indicates the best result.

On the Davis dataset, DoubleSG-DTA achieved an MSE of 0.219, which is 0.004 lower than that of DMIL-PPDTA [18], the best sequence-based model by MSE, and a CI of 0.902 and an $r_m^2$ of 0.725, which are 0.009 and 0.040 higher, respectively, than those of FNN [20], the best sequence-based model on these metrics. Compared with the best graph-based model, GraphDTA (GIN) [11], the CI increased by 0.009 and the MSE decreased by 4.37% (relative).

On the KIBA dataset, DoubleSG-DTA achieved an MSE of 0.138 and an $r_m^2$ of 0.787, a 6.12% relative reduction and a 0.003 increase, respectively, compared with DMIL-PPDTA [18], the best sequence-based model; its CI of 0.896 was 0.007 higher than that of MATT-DTI [13] (and of DeepCDA [17], both 0.889). Compared with the best graph-based model, GraphDTA (GAT&GCN) [11], the CI increased by 0.005 and the MSE decreased by 0.001.

On the BindingDB dataset, DoubleSG-DTA achieved an MSE of 0.533, 11.61% lower than that of AttentionDTA [16], the best sequence-based model, and a CI of 0.862 and an $r_m^2$ of 0.726, which are 0.010 and 0.039 higher, respectively, than those of AttentionDTA. Compared with the best graph-based model, GraphDTA (GIN) [11], the CI and $r_m^2$ increased by 0.005 and 0.023, respectively, and the MSE decreased by 4.31%.
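The relative MSE reductions quoted in the three comparisons above can be reproduced from the table values; this short check only recomputes percentages from published numbers and does not re-run any model. The helper name is hypothetical.

```python
def relative_reduction(baseline, ours):
    """Relative MSE reduction in percent: (baseline - ours) / baseline * 100."""
    return (baseline - ours) / baseline * 100.0


# (best baseline MSE, DoubleSG-DTA MSE), values copied from Tables 3-5.
comparisons = {
    "Davis vs GraphDTA (GIN)": (0.229, 0.219),
    "KIBA vs DMIL-PPDTA": (0.147, 0.138),
    "BindingDB vs AttentionDTA": (0.603, 0.533),
    "BindingDB vs GraphDTA (GIN)": (0.557, 0.533),
}
for name, (base, ours) in comparisons.items():
    print(f"{name}: {relative_reduction(base, ours):.2f}%")
# Davis vs GraphDTA (GIN): 4.37%
# KIBA vs DMIL-PPDTA: 6.12%
# BindingDB vs AttentionDTA: 11.61%
# BindingDB vs GraphDTA (GIN): 4.31%
```

Relative changes are reported for MSE because its scale differs widely across datasets (about 0.14 on KIBA versus 0.53 on BindingDB), whereas CI and $r_m^2$ differences are quoted as absolute values.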

Figure 2 shows that the predicted values and ground truths follow closely overlapping distributions on the Davis, KIBA, and BindingDB datasets. Because DoubleSG-DTA is optimized for MSE, we also used the Pearson correlation as an independent assessment; the model achieved Pearson correlations of 0.852 (Davis), 0.894 (KIBA), and 0.867 (BindingDB).

Figure 2.


Correlation distribution between ground truths and predictive values on benchmark datasets, (a) scatter and (b) kernel density estimate plots.

These results suggest that combining graph isomorphism networks with lightweight squeeze-and-excitation networks, together with cross-multi-head attention, underlies the strong performance of DoubleSG-DTA.

4.2. Ablation Study 1: The Effect of Graph Isomorphism Network Layers on Model Performance

Extracting drug representations relies heavily on the graph computation of the GIN, so we conducted an ablation experiment on how GIN depth affects prediction performance. As Figure 3 shows, DoubleSG-DTA performs best with $L \in \{4, 5\}$ GIN layers; as the number of layers increases further, the CI and $r_m^2$ metrics tend to decrease, and the MSE, the main training objective, increases sharply. GIN updates each node by combining its own features, scaled by $1 + \varepsilon$, with the sum of its neighbors' features and passing the result through an MLP; repeated over layers, this captures information from near and far neighbors and lets GIN discriminate between graph structures. However, stacking too many layers causes the feature vectors of nodes within the same cluster to converge, which can lead to node-wise over-smoothing and impair model performance [36]. An appropriate GIN depth therefore helps obtain informative drug graph representations, whereas stacking many GIN layers may cause over-smoothing and vanishing gradients.
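Over-smoothing can be illustrated with a deliberately simplified model. This sketch is not GIN: it drops the MLPs and replaces sum aggregation with plain neighborhood averaging (closer to mean-aggregation GNNs), on a hypothetical six-atom chain with random features. It shows only the mechanism named above, that repeated propagation pulls node features toward one another.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, n_layers = 6, 64
adj = np.eye(n_nodes)                      # self-loops keep each node's own feature
for i in range(n_nodes - 1):               # hypothetical 6-atom chain
    adj[i, i + 1] = adj[i + 1, i] = 1.0
propagate = adj / adj.sum(axis=1, keepdims=True)   # row-normalised: neighbourhood mean

h = rng.normal(size=(n_nodes, 4))
spread = []                                # mean per-feature range across nodes
for _ in range(n_layers):
    spread.append(float((h.max(axis=0) - h.min(axis=0)).mean()))
    h = propagate @ h
```

Each update replaces a node's features with a convex combination of its neighborhood, so the range across nodes can never grow, and on a connected graph it shrinks toward zero as layers accumulate; at that point the nodes are no longer distinguishable.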

Figure 3.


Impact of the layers of the graph isomorphism network on the performance of DoubleSG-DTA.

4.3. Ablation Study 2: The Effect of the SE Block on Model Performance

Instead of the CNNs used as feature extractors in previous studies [8,13,16,17], this work builds multilayer squeeze-and-excitation networks to construct textual features of drug and amino acid sequences, and we compared this design with a CNN-based variant. As shown in Table 6, embedding the multilayer SE modules with channel attention into DoubleSG-DTA increased the number of model parameters and the model complexity, but the per-epoch training time rose only slightly, by about 0.9%, 2.1%, and 3.5% on the Davis, KIBA, and BindingDB datasets, respectively. These controlled experiments demonstrate that the model with SENet blocks (DoubleSG-DTA + SENet) achieves consistent improvements over the model without them (DoubleSG-DTA + CNN) at a small additional computational cost. Overall, SENets lowered the prediction error (MSE) on all three datasets, which we attribute to inter-channel attention.
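The small overhead can be checked with simple arithmetic. The sketch counts the weights of the two SE matrices ($W_1$ and $W_2$, with $r = 16$) for the filter settings in Table 2, assuming one SE block per SENet layer and one sequence branch, and recomputes the per-epoch time increase from Table 6. The layer-to-block mapping is an assumption; the helper names are hypothetical.

```python
def se_weight_count(channels, r=16, bias=False):
    """Weights in one SE block: W1 is (C/r x C) and W2 is (C x C/r)."""
    hidden = channels // r
    count = 2 * channels * hidden
    if bias:
        count += hidden + channels
    return count


def overhead_percent(t_cnn, t_senet):
    """Relative increase in per-epoch training time, in percent."""
    return (t_senet - t_cnn) / t_cnn * 100.0


# Filters per SENet layer from Table 2 (assumes one SE block per layer).
davis_filters, kiba_filters = [16, 32, 48], [32, 64, 96]
davis_se = sum(se_weight_count(c) for c in davis_filters)   # 448 weights
kiba_se = sum(se_weight_count(c) for c in kiba_filters)     # 1792 weights

# Per-epoch times (s) from Table 6: (CNN variant, SENet variant).
epoch_times = {"Davis": (4.102, 4.139), "KIBA": (19.619, 20.023),
               "BindingDB": (13.787, 14.276)}
overheads = {k: round(overhead_percent(*v), 1) for k, v in epoch_times.items()}
```

A few thousand extra weights per branch is negligible next to convolution filters and the 1024-unit MLP layers, which is consistent with the roughly 1% to 4% per-epoch slowdown.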

Table 6.

Investigating the contributions of SENet on Davis, KIBA, and BindingDB datasets.

| Dataset | Method | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | rm² (std) ↑ | Time (s) ¹ (std) |
|---|---|---|---|---|---|---|---|---|
| Davis | DoubleSG-DTA | CNN | GIN+CNN | Cross-Multi-head Attention&FCN | 0.897 (0.008) | 0.229 | 0.713 (0.077) | 4.102 (0.061) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention&FCN | 0.902 (0.008) | 0.219 | 0.725 (0.008) | 4.139 (0.066) |
| KIBA | DoubleSG-DTA | CNN | GIN+CNN | Cross-Multi-head Attention&FCN | 0.887 (0.014) | 0.147 | 0.760 (0.048) | 19.619 (0.357) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention&FCN | 0.896 (0.010) | 0.138 | 0.787 (0.005) | 20.023 (0.109) |
| BindingDB | DoubleSG-DTA | CNN | GIN+CNN | Cross-Multi-head Attention&FCN | 0.854 (0.001) | 0.614 | 0.646 (0.009) | 13.787 (0.203) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention&FCN | 0.862 (0.002) | 0.533 | 0.726 (0.009) | 14.276 (0.165) |

1Time (s) denotes the time that our proposed DoubleSG-DTA model took to train an epoch.
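The "slight additional computational burden" can be quantified directly from Table 6. The short script below computes the relative change in MSE and in per-epoch training time when the CNN extractors are replaced with SENet blocks. Only the table values are used; the dictionary layout is an illustrative choice.

```python
# (MSE, seconds per epoch) copied from Table 6 for the CNN and SENet variants.
table6 = {
    "Davis": {"cnn": (0.229, 4.102), "senet": (0.219, 4.139)},
    "KIBA": {"cnn": (0.147, 19.619), "senet": (0.138, 20.023)},
    "BindingDB": {"cnn": (0.614, 13.787), "senet": (0.533, 14.276)},
}


def relative_change(old, new):
    """Percentage change from old to new (negative means a decrease)."""
    return 100.0 * (new - old) / old


for dataset, rows in table6.items():
    (mse_cnn, t_cnn), (mse_se, t_se) = rows["cnn"], rows["senet"]
    print(f"{dataset}: MSE {relative_change(mse_cnn, mse_se):+.2f}%, "
          f"time/epoch {relative_change(t_cnn, t_se):+.2f}%")
```

This gives MSE changes of −4.37%, −6.12% and −13.19% against time increases of +0.90%, +2.06% and +3.55% on Davis, KIBA and BindingDB, respectively.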

4.4. Ablation Study 3: Interaction Learning with Cross-Multi-Head Attention Mechanism

Finally, this study investigated the impact of the cross-multi-head attention mechanism, which is intended to model the non-covalent docking behavior between drug molecules and target proteins. It was compared with simply concatenating the two representations. As shown in Table 7, replacing concatenation with cross-multi-head attention reduced the MSE by 9.50%, 10.39%, and 3.79% on the Davis, KIBA, and BindingDB datasets, respectively, and increased rm2 by 0.012, 0.014, and 0.024. The complete DoubleSG-DTA model with cross-multi-head attention also achieved higher CI and Pearson correlation on all three datasets.
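To make the contrast with concatenation concrete, the sketch below shows standard scaled dot-product multi-head attention (Vaswani et al. [31]) used in a cross setting. One modality supplies the queries and the other supplies the keys and values, so every drug token can weight every protein residue. The projection matrices are random placeholders. The bidirectional wiring, the shared weights and the mean pooling in the usage lines are assumptions about how such a block could feed the FCN, not a transcription of DoubleSG-DTA.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_multihead_attention(query_seq, context_seq, w_q, w_k, w_v, w_o, num_heads):
    """query_seq: (n_q, d); context_seq: (n_k, d); each w_*: (d, d)."""
    n_q, d = query_seq.shape
    n_k = context_seq.shape[0]
    if d % num_heads:
        raise ValueError("model dimension must be divisible by num_heads")
    d_head = d // num_heads
    # Project, then split into heads: (heads, tokens, d_head).
    q = (query_seq @ w_q).reshape(n_q, num_heads, d_head).transpose(1, 0, 2)
    k = (context_seq @ w_k).reshape(n_k, num_heads, d_head).transpose(1, 0, 2)
    v = (context_seq @ w_v).reshape(n_k, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n_q, n_k)
    attn = softmax(scores, axis=-1)                       # each query's weights sum to 1
    heads = attn @ v                                      # (heads, n_q, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n_q, d)     # concatenate heads
    return merged @ w_o, attn


rng = np.random.default_rng(0)
d_model, n_heads = 16, 4
drug = rng.normal(size=(30, d_model))      # hypothetical drug token features
protein = rng.normal(size=(120, d_model))  # hypothetical residue features
w = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)]

# Both directions share weights here for brevity; separate weights are more typical.
drug_ctx, drug_attn = cross_multihead_attention(drug, protein, *w, num_heads=n_heads)
prot_ctx, _ = cross_multihead_attention(protein, drug, *w, num_heads=n_heads)
pair = np.concatenate([drug_ctx.mean(axis=0), prot_ctx.mean(axis=0)])  # FCN input
print(drug_ctx.shape, drug_attn.shape, pair.shape)
```

Unlike concatenation, which places the two vectors side by side and leaves any interaction to the FCN, the attention matrix explicitly scores every drug–residue pair before pooling.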

Table 7.

Investigating the contributions of the cross-multi-head attention mechanism on the Davis, KIBA, and BindingDB datasets.

| Dataset | Method | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | rm2 (std) ↑ | Pearson ↑ |
|---|---|---|---|---|---|---|---|---|
| Davis | DoubleSG-DTA | SENet | GIN+SENet | Concatenation & FCN | 0.892 (0.007) | 0.242 | 0.713 (0.026) | 0.845 |
| Davis | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.902 (0.008) | 0.219 | 0.725 (0.008) | 0.852 |
| KIBA | DoubleSG-DTA | SENet | GIN+SENet | Concatenation & FCN | 0.878 (0.018) | 0.154 | 0.773 (0.063) | 0.880 |
| KIBA | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.896 (0.010) | 0.138 | 0.787 (0.005) | 0.894 |
| BindingDB | DoubleSG-DTA | SENet | GIN+SENet | Concatenation & FCN | 0.859 (0.002) | 0.554 | 0.702 (0.009) | 0.862 |
| BindingDB | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.862 (0.002) | 0.533 | 0.726 (0.009) | 0.867 |
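The percentages quoted for this ablation follow directly from Table 7. The script below recomputes them as the relative MSE reduction and the absolute rm2 gain of the cross-attention variant over concatenation. It uses only the table values.

```python
# (MSE, rm2) copied from Table 7 for the concatenation and cross-attention variants.
table7 = {
    "Davis": {"concat": (0.242, 0.713), "cross": (0.219, 0.725)},
    "KIBA": {"concat": (0.154, 0.773), "cross": (0.138, 0.787)},
    "BindingDB": {"concat": (0.554, 0.702), "cross": (0.533, 0.726)},
}


def summarize(table):
    """Return {dataset: (relative MSE reduction in %, absolute rm2 gain)}."""
    out = {}
    for name, rows in table.items():
        (mse_c, rm2_c), (mse_x, rm2_x) = rows["concat"], rows["cross"]
        out[name] = (round(100 * (mse_c - mse_x) / mse_c, 2), round(rm2_x - rm2_c, 3))
    return out


for name, (mse_drop, rm2_gain) in summarize(table7).items():
    print(f"{name}: MSE down {mse_drop:.2f}%, rm2 up {rm2_gain:.3f}")
```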

5. Case Study on the NSCLC with EGFRT790M Mutation

According to cancer statistics for 2021 [37], lung cancer mortality increased to around 46% of total cancer mortality, and NSCLC accounted for approximately 85% of lung malignancies. Patients with NSCLC frequently harbor epidermal growth factor receptor (EGFR) mutations [38], which pose great challenges for treatment. In recent years, small-molecule EGFR tyrosine kinase inhibitors (EGFR-TKIs) have achieved remarkable success in targeted therapy and brought new hope to NSCLC patients. The first-generation EGFR-TKIs (gefitinib and erlotinib) and the second-generation EGFR-TKI (afatinib) significantly improved the prognosis of patients with advanced NSCLC compared with platinum-based chemotherapy. Unfortunately, most patients eventually acquire the EGFRT790M mutation, resulting in severe drug resistance [39]. Moreover, although the third-generation EGFR-TKI osimertinib selectively targets NSCLC harboring the EGFRT790M mutation, patients still develop secondary resistance [40].

Natural products remain a valuable source of templates with structural complexity and numerous pharmacophores in drug R&D, and they have been especially effective in cancer. For instance, paclitaxel [41] and vincristine [42] are widely used in the clinical treatment of tumors. In this section, we aimed to screen natural products for targeted inhibitors of NSCLC with the EGFRT790M mutation that combine high affinity with favorable drug-like properties. We hope our results provide clues that help medical scientists develop highly selective natural drugs.

For this purpose, we obtained the FASTA sequence of the mutant protein EGFRT790M (PDB ID: 2JIT [28]) from the Protein Data Bank [43]. We also collected 2645 natural compounds from Selleck Chemicals (https://www.selleck.cn/, accessed on 4 January 2023) that meet criteria for good human oral bioavailability (OB > 40%) and drug-likeness (DL > 0.18) [44,45]. Table 8 lists the top 10 natural products predicted by DoubleSG-DTA to have the highest affinity for the EGFRT790M mutant protein.
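The screening step can be pictured as follows: filter a compound library by the OB and DL thresholds, score the survivors against the target sequence, and keep the top 10. The sketch below illustrates that pipeline, assuming the thresholds are applied as simple filters. `predict_affinity` is a hypothetical stand-in for a trained DoubleSG-DTA model. The compound records and target string are toy placeholders, not data from the Selleck library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Compound:
    name: str
    smiles: str
    ob: float  # human oral bioavailability, %
    dl: float  # drug-likeness score


def screen(library: List[Compound], target_seq: str,
           predict_affinity: Callable[[str, str], float],
           top_k: int = 10, ob_min: float = 40.0,
           dl_min: float = 0.18) -> List[Tuple[str, float]]:
    """Keep compounds with OB > ob_min and DL > dl_min, then rank by predicted affinity."""
    candidates = [c for c in library if c.ob > ob_min and c.dl > dl_min]
    scored = [(c.name, predict_affinity(c.smiles, target_seq)) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest predicted affinity first
    return scored[:top_k]


def toy_predictor(smiles: str, target_seq: str) -> float:
    """Hypothetical placeholder; a real run would call the trained model here."""
    return float(len(smiles))


library = [
    Compound("toy_A", "CCO", ob=55.0, dl=0.30),
    Compound("toy_B", "c1ccccc1O", ob=45.0, dl=0.25),
    Compound("toy_C", "CCCCCCCCCC", ob=20.0, dl=0.40),  # fails the OB filter
]
print(screen(library, "HYPOTHETICAL_TARGET_SEQUENCE", toy_predictor, top_k=2))
```

Filtering before scoring keeps the comparatively expensive model calls limited to compounds that already pass the pharmacokinetic criteria.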

We then carried out a comprehensive literature survey on the top 10 natural products. According to [46], gossypol significantly increased the sensitivity of H1975 cells carrying EGFRL858R/T790M to EGFR-TKIs, and it also inhibited cell proliferation and induced apoptosis. Gö6976, a derivative of staurosporine, has been experimentally shown (at 500 nM) to exhibit significant binding affinity for EGFRT790M mutants while showing markedly lower affinity for wild-type EGFR [47]. Shikonin has been reported to exert selective cytotoxic effects on gefitinib-resistant NSCLC cell lines carrying the EGFRT790M mutation while remaining relatively safe for normal lung cells [48]. Gossypol acetic acid significantly sensitizes lung cancer cells carrying the EGFRL858R/T790M mutation to gefitinib and helps overcome EGFR-TKI resistance [49,50]. These reports suggest that such natural products may offer promising strategies to combat resistance in NSCLC harboring the EGFRT790M mutation.

Table 8.

Docking information of the top 10 natural products with the highest affinity.

| Natural product | MF | MW (g/mol) | H-bonds | Binding energy (kJ/mol) |
|---|---|---|---|---|
| Gossypol [46] | C30H30O8 | 518.60 | 4 | −12.636 |
| Gossypol acetic acid [50] | C32H34O10 | 578.60 | 3 | −14.644 |
| Staurosporine [47] | C28H26N4O3 | 466.50 | 3 | −18.744 |
| Emodin | C15H10O5 | 270.24 | 4 | −13.933 |
| Physcion | C16H12O5 | 284.26 | 3 | −16.862 |
| Aurantio-obtusin | C17H14O7 | 330.29 | 4 | −17.531 |
| Shikonin [48] | C16H16O5 | 288.29 | 3 | −13.180 |
| Rhein | C15H8O6 | 284.22 | 6 | −16.192 |
| Obtusifolin | C16H12O5 | 284.26 | 3 | −15.104 |
| Chrysophanol | C15H10O4 | 254.24 | 5 | −16.025 |

6. Molecular Docking and Biological Interpretation

To further validate these predicted interactions, computational docking was performed with AutoDock [51]. As shown in Figure 4, we used the Lamarckian genetic algorithm in AutoDock, a widely used and reliable search method, to perform an adaptive global–local search for the lowest-energy ligand–receptor docked conformation. We then predicted the binding free energy with an empirical binding free energy force field [52]. The ligand–receptor binding energy includes contributions from electrostatic interactions, hydrogen bonding, van der Waals forces, hydrophobic interactions and so forth. The lower (more negative) the binding energy, the more stable the complex. A docking conformation was considered acceptable if its binding energy was below −5.0208 kJ/mol (−1.2 kcal/mol). Through such interactions, drug ligands bind stably to their target proteins. These interactions underlie biological activities of the drug molecules such as anti-inflammatory and anti-tumor effects, and they modulate the physiological and pharmacological functions of the protein. As shown in Figure 4 and Table 8, the docking results indicate that all top 10 natural compounds can dock stably to the EGFRT790M protein, each forming multiple hydrogen bonds.
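Table 8 reports energies in kJ/mol, whereas AutoDock reports estimated binding energies in kcal/mol. The snippet below converts the Table 8 values using the thermochemical calorie (1 kcal = 4.184 kJ) and applies the −5.0208 kJ/mol acceptance cut-off stated in the text. The conversion shows that this cut-off corresponds to exactly −1.2 kcal/mol. The energies come from Table 8; the code itself is only an arithmetic aid.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie

# Binding energies (kJ/mol) copied from Table 8.
binding_kj = {
    "Gossypol": -12.636, "Gossypol acetic acid": -14.644, "Staurosporine": -18.744,
    "Emodin": -13.933, "Physcion": -16.862, "Aurantio-obtusin": -17.531,
    "Shikonin": -13.180, "Rhein": -16.192, "Obtusifolin": -15.104,
    "Chrysophanol": -16.025,
}
THRESHOLD_KJ = -5.0208  # acceptance cut-off used in the text


def kj_to_kcal(energy_kj):
    return energy_kj / KJ_PER_KCAL


# List compounds from most to least favourable binding energy.
for name, e in sorted(binding_kj.items(), key=lambda kv: kv[1]):
    status = "accepted" if e < THRESHOLD_KJ else "rejected"
    print(f"{name:22s} {e:8.3f} kJ/mol {kj_to_kcal(e):7.3f} kcal/mol {status}")
```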

Figure 4. The blue box shows heatmaps of atomic contributions; the red box shows the molecular docking poses of the top 10 natural drugs with the EGFRT790M mutant protein.

Graph neural networks have often been criticized for their poor interpretability and are commonly regarded as "black boxes". In this work, we draw inspiration from Grad-AAM [20] and Grad-CAM [53], which use gradient-weighted activation mapping. We visualize the regions of the graph structure that contribute most to the prediction as heatmaps, which improves the interpretability of deep learning models that process graph data.

Because the last GIN layer of DoubleSG-DTA carries the richest high-level semantic information, its drug graph representations are visualized as heatmaps. These heatmaps show the atoms and functional groups that contribute most to the predicted DTA. We denote the feature map of the last graph convolution layer as F. To obtain the probability map P over the atomic nodes v of a given drug molecule, we compute the gradient of the predicted affinity DTA (for the molecule binding to the target protein) with respect to the c-th channel of F at atomic node v. Averaging this gradient over all atoms V gives the channel weight W_c:

$$W_c = \frac{1}{|V|} \sum_{v \in V} \frac{\partial\, \mathrm{DTA}}{\partial F_v^{c}} \tag{21}$$

Next, the channels of the feature map F were combined using the weights W_c, and the result was passed through the ReLU activation function:

$$P = \delta\left(\sum_{c} W_c F^{c}\right) \tag{22}$$

where δ(·) denotes the ReLU function.

Finally, the resulting atom scores were scaled to the range [0, 1] using min–max normalization. This yields a probability map P of the weighted distribution over the drug molecule's atoms, which was then rendered as a heatmap.
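A minimal NumPy sketch of Eqs. (21)–(22) and the min–max step follows. It assumes that the last-layer node features F and the gradients ∂DTA/∂F have already been obtained, in practice by automatic differentiation in the training framework (not shown here). The function name and the toy arrays are illustrative only.

```python
import numpy as np


def atom_heatmap(feature_map, grads):
    """Gradient-weighted atom scores following Eqs. (21)-(22), then min-max scaling.

    feature_map: (num_atoms, channels) output F of the last GIN layer.
    grads: (num_atoms, channels) dDTA/dF, assumed to come from autograd.
    """
    weights = grads.mean(axis=0)                     # Eq. (21): W_c, mean over atoms
    scores = np.maximum(feature_map @ weights, 0.0)  # Eq. (22): ReLU(sum_c W_c F^c)
    lo, hi = scores.min(), scores.max()
    if hi - lo == 0:                                 # flat map: nothing to highlight
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)                 # min-max scale to [0, 1]


# Toy 3-atom, 2-channel example (hypothetical values).
F = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
g = np.array([[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]])
print(atom_heatmap(F, g))
```

Here the channel weights are [1, −1], so atom 0 (score 1) is highlighted, while atoms 1 and 2 (scores −1 and 0 before ReLU) receive 0.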

As shown in Figure 4, the active structures in the heatmaps overlap with the molecular docking sites by more than 77.14%, with the overlap rate calculated using Equation (23). Figure 4 suggests that representing drug molecules as graphs and learning their topological patterns with a GIN of appropriate depth can accurately identify the active structures of drug molecules.

$$\text{overlap rate} = \frac{1}{N} \sum_{i=1}^{N} \frac{P_{\mathrm{drug}}^{(i)}}{P_{\mathrm{protein}}^{(i)}}, \tag{23}$$

where N denotes the number of drugs. For each drug, P_protein is the number of molecular docking sites, and P_drug is the number of top-contributing atoms and functional groups that coincide with those docking sites.
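Equation (23) is a per-drug ratio averaged over all drugs. A small sketch is shown below; the per-drug counts in the usage line are hypothetical and are not the counts behind the reported 77.14%.

```python
from typing import Iterable, Tuple


def overlap_rate(per_drug_counts: Iterable[Tuple[int, int]]) -> float:
    """Eq. (23): mean over drugs of P_drug / P_protein.

    Each item is (p_drug, p_protein): the number of top-contributing atoms/groups that
    coincide with docking sites, and the number of docking sites for that drug.
    """
    ratios = [p_drug / p_protein for p_drug, p_protein in per_drug_counts]
    if not ratios:
        raise ValueError("need at least one drug")
    return sum(ratios) / len(ratios)


print(overlap_rate([(3, 4), (2, 2)]))  # hypothetical counts for two drugs
```

Because each drug's ratio is averaged with equal weight, a drug with few docking sites influences the result as much as one with many.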

7. Conclusions

This investigation presented an interpretable deep learning-based computational model that predicts the affinity of drug–target pairs to aid drug discovery. The experimental results indicated that the simple yet powerful graph isomorphism networks, coupled with the lightweight squeeze-and-excitation networks and supported by cross-multi-head attention, enabled DoubleSG-DTA to outperform all previously reported works it was compared with. Extensive experiments revealed the following. (i) The most appropriate number of graph isomorphism network layers for extracting drug graph representations and discriminating between molecular structures is 4 or 5. (ii) The SE block, with its soft attention mechanism, selectively emphasized informative features by expanding the receptive field, substantially boosting the model's decision-making. (iii) Explicitly modeling the interaction between compounds and proteins further improves the prediction of drug–target binding affinity. The well-established DoubleSG-DTA was then applied to screen natural products for promising high-affinity compounds against Non-Small Cell Lung Cancer with the EGFRT790M mutation, providing clues for medical scientists. In addition, drug graph representations were visualized as heatmaps. In these heatmaps, the most strongly contributing active structures overlapped with more than 77.14% of the molecular docking sites, which may provide biological interpretation and entry points for later molecular optimization. Overall, DoubleSG-DTA may be an effective in silico drug discovery tool for medical challenges and urgent public health emergencies.

Acknowledgments

We gratefully acknowledge the editors and reviewers for reviewing the paper.

Abbreviations

The following abbreviations are used in this manuscript:

DoubleSG-DTA Double Sequence and Graph to Predict drug–target Affinity
DTA drug–target affinities
EGFR epidermal growth factor receptor
EGFR-TKIs EGFR tyrosine kinase inhibitors
NSCLC Non-Small Cell Lung Cancer
T790M substitution of threonine 790 with methionine
R&D Research and Development
SMILES Simplified Molecular Input Line Entry System
GIN Graph Isomorphism Network
SENet Squeeze-and-Excitation Network
MLP Multilayer Perceptrons
ReLU Rectified Linear Unit activation function
gap global average pooling
gmp global max pooling
RF Random Forest
SVM Support Vector Machine
FNN Feedforward Neural Network
CI Concordance index
MSE Mean Square Error
rm2 Regression toward the mean
MF Molecular Formula
MW Molecular Weight (g/mol)
H-Bonds Hydrogen Bonds

Author Contributions

All the authors have contributed in various degrees to ensure the quality of this work. Y.Q., conceptualization, methodology, investigation, visualization, writing-original draft; W.N., methodology, visualization, formal analysis; X.X., writing-review and editing; Y.Q., writing-review and editing; L.T., conceptualization, supervision, funding acquisition; Q.W., conceptualization, validation, project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code is available at https://github.com/YongtaoQian/DoubleSG-DTA (accessed on 4 January 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

This work was supported by the National Natural Science Foundation of China (grant No. 81473234, Guangzhou, China), the Guangdong Basic and Applied Basic Research Foundations (grant No. 2019A1515012215, Guangzhou, China), and the Joint Fund of the National Natural Science Foundation of China (grant No. U1303221, Guangzhou, China).

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Zhou Y., Xiang S., Yang F., Lu X. Targeting Gatekeeper Mutations for Kinase Drug Discovery. J. Med. Chem. 2022;65:15540–15558. doi: 10.1021/acs.jmedchem.2c01361.
2. Chan H.S., Shan H., Dahoun T., Vogel H., Yuan S. Advancing drug discovery via artificial intelligence. Trends Pharmacol. Sci. 2019;40:592–604. doi: 10.1016/j.tips.2019.06.004.
3. Gogleva A., Polychronopoulos D., Pfeifer M., Poroshin V., Ughetto M., Martin M.J., Thorpe H., Bornot A., Smith P.D., Sidders B., et al. Knowledge graph-based recommendation framework identifies drivers of resistance in EGFR mutant non-small cell lung cancer. Nat. Commun. 2022;13:1667. doi: 10.1038/s41467-022-29292-7.
4. Wang M., Ma X., Si J., Tang H., Wang H., Li T., Ouyang W., Gong L., Tang Y., He X., et al. Adverse drug reaction discovery using a tumor-biomarker knowledge graph. Front. Genet. 2021;11:625659. doi: 10.3389/fgene.2020.625659.
5. Popova M., Isayev O., Tropsha A. Deep reinforcement learning for de novo drug design. Sci. Adv. 2018;4:eaap7885. doi: 10.1126/sciadv.aap7885.
6. Li Y., Pei J., Lai L. Structure-based de novo drug design using 3D deep generative models. Chem. Sci. 2021;12:13664–13675. doi: 10.1039/D1SC04444C.
7. Chen Z., Min M.R., Parthasarathy S., Ning X. A deep generative model for molecule optimization via one fragment modification. Nat. Mach. Intell. 2021;3:1040–1049. doi: 10.1038/s42256-021-00410-2.
8. Öztürk H., Özgür A., Ozkirimli E. DeepDTA: Deep drug–target binding affinity prediction. Bioinformatics. 2018;34:i821–i829. doi: 10.1093/bioinformatics/bty593.
9. Gao K.Y., Fokoue A., Luo H., Iyengar A., Dey S., Zhang P. Interpretable Drug Target Prediction Using Deep Neural Representation; Proceedings of the IJCAI; Stockholm, Sweden. 13–19 July 2018; pp. 3371–3377.
10. Wang L., You Z.H., Chen X., Xia S.X., Liu F., Yan X., Zhou Y., Song K.J. A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network. J. Comput. Biol. 2018;25:361–373. doi: 10.1089/cmb.2017.0135.
11. Nguyen T., Le H., Quinn T.P., Nguyen T., Le T.D., Venkatesh S. GraphDTA: Predicting drug–target binding affinity with graph neural networks. Bioinformatics. 2021;37:1140–1147. doi: 10.1093/bioinformatics/btaa921.
12. He T., Heidemeyer M., Ban F., Cherkasov A., Ester M. SimBoost: A read-across approach for predicting drug–target binding affinities using gradient boosting machines. J. Cheminform. 2017;9:24. doi: 10.1186/s13321-017-0209-z.
13. Zeng Y., Chen X., Luo Y., Li X., Peng D. Deep drug–target binding affinity prediction with multiple attention blocks. Briefings Bioinform. 2021;22:bbab117. doi: 10.1093/bib/bbab117.
14. Li H., Leung K.S., Wong M.H., Ballester P.J. Low-quality structural and interaction data improves binding affinity prediction via random forest. Molecules. 2015;20:10947–10962. doi: 10.3390/molecules200610947.
15. Pahikkala T., Airola A., Pietilä S., Shakyawar S., Szwajda A., Tang J., Aittokallio T. Toward more realistic drug–target interaction predictions. Briefings Bioinform. 2015;16:325–337. doi: 10.1093/bib/bbu010.
16. Zhao Q., Duan G., Yang M., Cheng Z., Li Y., Wang J. AttentionDTA: Drug–target binding affinity prediction by sequence-based deep learning with attention mechanism. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022. Online ahead of print.
17. Abbasi K., Razzaghi P., Poso A., Amanlou M., Ghasemi J.B., Masoudi-Nejad A. DeepCDA: Deep cross-domain compound–protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics. 2020;36:4633–4642. doi: 10.1093/bioinformatics/btaa544.
18. Wang C., Chen Y., Zhao L., Wang J., Wen N. Modeling DTA by Combining Multiple-Instance Learning with a Private-Public Mechanism. Int. J. Mol. Sci. 2022;23:11136. doi: 10.3390/ijms231911136.
19. Rezaei M.A., Li Y., Wu D., Li X., Li C. Deep learning in drug design: Protein-ligand binding affinity prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020;19:407–417. doi: 10.1109/TCBB.2020.3046945.
20. Yang Z., Zhong W., Zhao L., Chen C.Y.C. MGraphDTA: Deep multiscale graph neural network for explainable drug–target binding affinity prediction. Chem. Sci. 2022;13:816–833. doi: 10.1039/D1SC05180F.
21. Kipf T.N., Welling M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
22. Xu K., Hu W., Leskovec J., Jegelka S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
23. Velickovic P., Cucurull G., Casanova A., Romero A., Lio P., Bengio Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
24. Jiang M., Li Z., Zhang S., Wang S., Wang X., Yuan Q., Wei Z. Drug–target affinity prediction using graph neural network and contact maps. RSC Adv. 2020;10:20701–20712. doi: 10.1039/D0RA02297G.
25. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988;28:31–36. doi: 10.1021/ci00057a005.
26. Hendrickson J.B. Concepts and applications of molecular similarity. Science. 1991;252:1189–1190. doi: 10.1126/science.252.5009.1189.a.
27. Hu J., Shen L., Sun G. Squeeze-and-excitation networks; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA. 18–23 June 2018; pp. 7132–7141.
28. Yun C.H., Mengwasser K.E., Toms A.V., Woo M.S., Greulich H., Wong K.K., Meyerson M., Eck M.J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. Proc. Natl. Acad. Sci. USA. 2008;105:2070–2075. doi: 10.1073/pnas.0709662105.
29. Landrum G. RDKit: Open-Source Cheminformatics. 2006. Available online: http://rdkit.org/ (accessed on 4 January 2023).
30. Zhao Q., Zhao H., Zheng K., Wang J. HyperAttentionDTI: Improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism. Bioinformatics. 2022;38:655–662. doi: 10.1093/bioinformatics/btab715.
31. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017;30:5998–6008.
32. Davis M.I., Hunt J.P., Herrgard S., Ciceri P., Wodicka L.M., Pallares G., Hocker M., Treiber D.K., Zarrinkar P.P. Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol. 2011;29:1046–1051. doi: 10.1038/nbt.1990.
33. Tang J., Szwajda A., Shakyawar S., Xu T., Hintsanen P., Wennerberg K., Aittokallio T. Making sense of large-scale kinase inhibitor bioactivity data sets: A comparative and integrative analysis. J. Chem. Inf. Model. 2014;54:735–743. doi: 10.1021/ci400709d.
34. Liu T., Lin Y., Wen X., Jorissen R.N., Gilson M.K. BindingDB: A web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 2007;35:D198–D201. doi: 10.1093/nar/gkl999.
35. Bongers B.J., IJzerman A.P., Van Westen G.J. Proteochemometrics–recent developments in bioactivity and selectivity modeling. Drug Discov. Today Technol. 2019;32:89–98. doi: 10.1016/j.ddtec.2020.08.003.
36. Zhao L., Akoglu L. Pairnorm: Tackling oversmoothing in gnns. arXiv 2019, arXiv:1909.12223.
37. Siegel R.L., Miller K.D., Fuchs H.E., Jemal A. Cancer statistics, 2021. CA Cancer J. Clin. 2021;71:7–33. doi: 10.3322/caac.21654.
38. Remon J., Hendriks L.E., Cardona A.F., Besse B. EGFR exon 20 insertions in advanced non-small cell lung cancer: A new history begins. Cancer Treat. Rev. 2020;90:102105. doi: 10.1016/j.ctrv.2020.102105.
39. Leonetti A., Sharma S., Minari R., Perego P., Giovannetti E., Tiseo M. Resistance mechanisms to osimertinib in EGFR-mutated non-small cell lung cancer. Br. J. Cancer. 2019;121:725–737. doi: 10.1038/s41416-019-0573-8.
40. Soria J.C., Ohe Y., Vansteenkiste J., Reungwetwattana T., Chewaskulyong B., Lee K.H., Dechaphunkul A., Imamura F., Nogami N., Kurata T., et al. Osimertinib in untreated EGFR-mutated advanced non–small-cell lung cancer. N. Engl. J. Med. 2018;378:113–125. doi: 10.1056/NEJMoa1713137.
41. Scribano C.M., Wan J., Esbona K., Tucker J.B., Lasek A., Zhou A.S., Zasadil L.M., Molini R., Fitzgerald J., Lager A.M., et al. Chromosomal instability sensitizes patient breast tumors to multipolar divisions induced by paclitaxel. Sci. Transl. Med. 2021;13:eabd4811. doi: 10.1126/scitranslmed.abd4811.
42. Said R., Tsimberidou A.M. Pharmacokinetic evaluation of vincristine for the treatment of lymphoid malignancies. Expert Opin. Drug Metab. Toxicol. 2014;10:483–494. doi: 10.1517/17425255.2014.885016.
43. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The protein data bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
44. Liu H., Wang J., Zhou W., Wang Y., Yang L. Systems approaches and polypharmacology for drug discovery from herbal medicines: An example using licorice. J. Ethnopharmacol. 2013;146:773–793. doi: 10.1016/j.jep.2013.02.004.
45. Xu X., Zhang W., Huang C., Li Y., Yu H., Wang Y., Duan J., Ling Y. A novel chemometric method for the prediction of human oral bioavailability. Int. J. Mol. Sci. 2012;13:6964–6982. doi: 10.3390/ijms13066964.
46. Xu J., Zhu G.Y., Cao D., Pan H., Li Y.W. Gossypol overcomes EGFR-TKIs resistance in non-small cell lung cancer cells by targeting YAP/TAZ and EGFRL858R/T790M. Biomed. Pharmacother. 2019;115:108860. doi: 10.1016/j.biopha.2019.108860.
47. Lee H.J., Schaefer G., Heffron T.P., Shao L., Ye X., Sideris S., Malek S., Chan E., Merchant M., La H., et al. Noncovalent Wild-type–Sparing Inhibitors of EGFR T790M. Cancer Discov. 2013;3:168–181. doi: 10.1158/2159-8290.CD-12-0357.
48. Li X., Fan X.X., Jiang Z.B., Loo W.T., Yao X.J., Leung E.L.H., Chow L.W., Liu L. Shikonin inhibits gefitinib-resistant non-small cell lung cancer by inhibiting TrxR and activating the EGFR proteasomal degradation pathway. Pharmacol. Res. 2017;115:45–55. doi: 10.1016/j.phrs.2016.11.011.
49. Renner O., Mayer M., Leischner C., Burkard M., Berger A., Lauer U.M., Venturelli S., Bischoff S.C. Systematic Review of Gossypol/AT-101 in Cancer Clinical Trials. Pharmaceuticals. 2022;15:144. doi: 10.3390/ph15020144.
50. Zhao R., Zhou S., Xia B., Zhang C.y., Hai P., Zhe H., Wang Y.y. AT-101 enhances gefitinib sensitivity in non-small cell lung cancer with EGFR T790M mutations. BMC Cancer. 2016;16:491. doi: 10.1186/s12885-016-2519-3.
51. Forli S., Huey R., Pique M.E., Sanner M.F., Goodsell D.S., Olson A.J. Computational protein–ligand docking and virtual drug screening with the AutoDock suite. Nat. Protoc. 2016;11:905–919. doi: 10.1038/nprot.2016.051.
52. Laederach A., Reilly P.J. Specific empirical free energy function for automated docking of carbohydrates to proteins. J. Comput. Chem. 2003;24:1748–1757. doi: 10.1002/jcc.10288.
53. Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization; Proceedings of the IEEE International Conference on Computer Vision; Venice, Italy. 22–29 October 2017; pp. 618–626.


Articles from Pharmaceutics are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
