A systematic pipeline of protein structure selection for computer‐aided drug discovery: A case study on T790M/L858R mutant EGFR structures

Agneesh Pratim Das; Prajwal Nandekar; Puniti Mathur; Subhash M Agarwal

doi:10.1002/pro.4740

2023 Sep 1;32(9):e4740. doi: 10.1002/pro.4740

PMCID: PMC10443354 PMID: 37515373

Abstract

Virtual screening (VS) is a routine method for evaluating chemical libraries for lead identification, and selecting appropriate protein structures for VS is an essential prerequisite for identifying true actives during docking. However, when several crystal structures of the same protein are available, it is difficult to select one or a few structures rationally for screening. Therefore, a computational prioritization protocol has been developed for shortlisting crystal structures that identify true active molecules with better efficiency. As identification of small‐molecule inhibitors is an important clinical requirement for the T790M/L858R (TMLR) EGFR mutant, it has been selected as a case study. The approach involves cross‐docking of 21 co‐crystal ligands with all the structures of the same protein to select structures that dock non‐native ligands with lower RMSD. The cross‐docking performance was then correlated with ligand similarity and binding‐site conformational similarity, and structures were shortlisted by integrating these three criteria. Thereafter, binding pose metadynamics was employed to identify structures having stable co‐crystal ligands in their respective binding pockets. Finally, different enrichment metrics such as BEDROC, RIE, AUAC, and EF_1% were evaluated, leading to the identification of five TMLR structures (5HCX, 5CAN, 5CAP, 5CAS, and 5CAO). These structures docked a number of non‐native ligands with low RMSD, contain structurally dissimilar ligands, have conformationally dissimilar binding sites, harbor stable co‐crystal ligands, and also identify true actives early. The present approach can be implemented for shortlisting structures of any other therapeutically important kinase.

Keywords: binding pose metadynamics, cross docking, drug discovery, EGFR, kinase, structure selection, virtual screening

1. INTRODUCTION

Molecular docking and virtual screening (VS) have become indispensable tools in computer‐aided drug discovery (CADD) (Macalino et al., 2015). Among the different avenues of CADD, structure‐based drug discovery (SBDD) techniques depend on the 3D structure of the protein target, and identifying suitable 3D structures is one of the most important prerequisites of any in silico docking‐based study (Anderson, 2003; Lionta et al., 2014). During VS, the molecules in a library are ranked by how well they bind to the receptor site, which leads to the identification of ligands that are more likely to have pharmacological action against the protein target. Selecting an optimal structure for a VS exercise therefore increases the probability of identifying true positives that are likely to act as strong binders when subjected to experimental validation. When several crystal structures of a protein have been resolved and deposited in the Protein Data Bank (PDB), it becomes difficult to select a few of them for VS (Sharma et al., 2016; Yadav, Nandekar, et al., 2014). In such cases, structures are generally selected using qualitative assessments, such as the type and activity of the bound ligands, missing residue counts, and structural quality (such as resolution), and no systematic evaluation is generally undertaken. Problems may then arise because the selected structures are not vetted for their ability to identify true actives, and the results may therefore not be optimal. The current protocol has been developed to identify X‐ray crystal structures for VS that may uncover tight binders with higher chances of developing into promising lead molecules.

The epidermal growth factor receptor (EGFR) is one of the most commonly studied proteins of the kinase family and a major therapeutic target (Agarwal et al., 2022; Singh et al., 2015; Yadav, Singh, et al., 2014; Zhang et al., 2007). This protein tyrosine kinase (PTK) belongs to the ErbB family of receptor proteins and governs several cellular processes such as proliferation, differentiation, migration, angiogenesis, and apoptosis (Burgess et al., 2003; Olayioye et al., 2000). Overexpression of EGFR and its downstream pathways is associated with a wide range of cancers, namely non‐small cell lung cancer (NSCLC) and breast, colorectal, head and neck, ovarian, and bladder cancers (Mendelsohn & Baselga, 2003). Several first‐generation drugs have therefore been developed that target EGFR carrying the L858R activating mutation, which renders the protein oncogenic. However, in 60% of NSCLC cases, the drugs become ineffective after 6–12 months of treatment owing to an acquired threonine‐to‐methionine mutation at position 790 (T790M) (Agarwal et al., 2017; Chan et al., 2016; Engelhardt et al., 2019; Heald et al., 2015). The double mutant form of the EGFR protein (T790M/L858R, i.e., TMLR) is therefore a clinically important cancer target for which new small‐molecule inhibitors need to be identified (Fatima & Agarwal, 2018; Fatima et al., 2019; Hanan et al., 2014; Saini et al., 2020; Yun et al., 2008; Zhong et al., 2021).

Owing to its significance as a therapeutic target, the double‐mutated EGFR (TMLR) has several crystal structures available in the PDB, and an efficient strategy is thus needed for selecting appropriate receptor structures for molecular docking‐based VS. In the current study, a prioritization protocol is therefore developed to identify optimal TMLR structures using an in silico approach. The ligand‐bound TMLR structures in the PDB were first screened based on their resolution and on the availability of biological activity data for their co‐crystallized inhibitor. Thereafter, cross‐docking was performed, wherein each co‐crystallized ligand was docked to each TMLR crystal structure, and the structures that accommodated a higher number of non‐native ligands with lower RMSD were selected. Along with cross‐docking, ligand similarity was computed to assess the number of dissimilar ligands binding with lower RMSD. Binding‐site similarity was also analyzed, and structures with conformational variation in the binding site were identified. By integrating cross‐docking performance with ligand similarity and binding‐site conformation, a few structures were shortlisted for further evaluation. In VS, the co‐crystal ligand is used to identify the binding site and also acts as a control for selecting molecules with more favorable docking scores for experimental validation. The stability of the co‐crystal ligand's pose in the binding pocket was therefore evaluated by subjecting the selected protein–ligand complexes to binding pose metadynamics (BPMD) (Clark et al., 2016). The structures found to be stable through BPMD were then evaluated for their ability to identify true positives during docking‐based VS. This workflow allowed the shortlisting of five TMLR crystal structures that have a higher potential to identify true positive lead molecules.

2. RESULTS AND DISCUSSION

The double‐mutated EGFR protein harboring the TMLR mutations is an important clinical target for NSCLC treatment as it causes resistance to first‐ and second‐generation drugs (Agarwal et al., 2017; Bryan et al., 2016; Lu et al., 2018; Sogabe et al., 2013). Identification of new inhibitors for this mutated protein is therefore a therapeutic requirement, for which VS is an important in silico screening approach. Owing to its clinical significance, the PDB houses 29 ligand‐bound TMLR X‐ray crystal structures, which makes it difficult to prioritize one or a few structures for optimal performance in identifying strong binders. Among these 29 structures, the K_i value of the co‐crystallized inhibitor against the mutant protein was reported for 23 structures, and it varied from 1.2 to 262 nM. Furthermore, for 21 of the protein–ligand complexes, the resolution was less than 3 Å. As a result, 21 structures with reported biological activity for the bound ligand and a resolution <3 Å were shortlisted for cross‐docking (Table 1 and Figure 1).

TABLE 1.

Detailed characteristics of the 21 crystal structures with resolution <3 Å selected from PDB.

Sl no.	PDB ID	Res (Å)	Ligand ID	TMLR K_i (nM)	Reference
1	3W2R	2.05	W2R	19	(Sogabe et al., 2013)
2	4RJ4	2.78	3QW	16	(Hanan et al., 2014)
3	4RJ6	2.7	3R0	76	(Hanan et al., 2014)
4	4RJ7	2.55	3R1	22	(Hanan et al., 2014)
5	4RJ8	2.5	3QS	17	(Hanan et al., 2014)
6	5C8M	2.9	4YW	64	(Heald et al., 2015)
7	5C8N	2.4	4YX	28	(Heald et al., 2015)
8	5CAL	2.7	4Z8	22	(Heald et al., 2015)
9	5CAN	2.8	4ZB	124	(Heald et al., 2015)
10	5CAO	2.6	4ZG	38	(Heald et al., 2015)
11	5CAP	2.4	4ZH	52	(Heald et al., 2015)
12	5CAQ	2.5	4ZJ	2.7	(Heald et al., 2015)
13	5CAS	2.1	4ZQ	1.4	(Heald et al., 2015)
14	5CAU	2.25	4ZR	1.6	(Heald et al., 2015)
15	5EDQ	2.8	5N3	2.1	(Hanan et al., 2016)
16	5EDR	2.6	5N4	34.3	(Hanan et al., 2016)
17	5EM5	2.65	5Q2	262	(Bryan et al., 2016)
18	5EM6	2.78	5Q3	4	(Bryan et al., 2016)
19	5EM7	2.81	5Q4	19	(Bryan et al., 2016)
20	5HCX	2.6	60B	4.1	(Chan et al., 2016)
21	5HCY	2.46	60D	1.2	(Chan et al., 2016)

2.1. Prioritization of structures based on cross‐docking and ligand similarity

Ligands with diverse scaffolds can often induce changes in the active site of the protein during binding (Ikeguchi et al., 2005). The differing chemical characteristics of these ligands can in turn induce different conformational changes in the same protein, which may be reflected in the crystallized structure. Therefore, cross‐docking was performed, in which each of the 21 co‐crystal ligands extracted from its corresponding PDB structure was docked to each of the 21 mutated proteins, and the RMSD values of the docked ligands were calculated (Table S1 and Figure 2). Because the presence of closely related ligands makes cross‐docking equivalent to self‐docking, the similarities between the 21 co‐crystal ligands were calculated using the Tanimoto coefficient (Table S2 and Figure 3) so that dissimilar non‐native ligands that bind with lower RMSD could be identified.

FIGURE 2.

Results of the cross‐docking exercise performed on the 21 protein–ligand complexes. Ligands from the respective PDB IDs are on the X‐axis and protein PDB IDs are on the Y‐axis. The color gradient shows the ligand RMSD values.

FIGURE 3.

Heatmap representing ligand similarity across the 21 co‐crystal ligands. PDB IDs of the co‐crystal ligands are on the X‐ and Y‐axes. The color gradient shows the Tanimoto scores.

The cross‐docking results indicated that 13 crystal structures (4RJ4, 4RJ8, 5C8M, 5C8N, 5CAL, 5CAN, 5CAO, 5CAP, 5CAQ, 5CAS, 5CAU, 5HCX, and 5HCY) each docked at least 10 co‐crystal ligands with an RMSD of less than 2.5 Å (Figure 2). Among these, 5HCY showed the best performance, docking 15 co‐crystal ligands with an RMSD of less than 2.5 Å. Similarly, 5CAS, 5CAP, 5C8M, and 5CAO each docked 14 co‐crystal ligands below this threshold. For 5CAU, 5HCX, 4RJ8, and 5CAQ, the number of ligands docked with RMSD <2.5 Å was 13, while 5C8N, 5CAN, and 4RJ4 accommodated 12 ligands each and 5CAL accommodated 11. The other eight structures docked only 2–6 co‐crystal ligands with RMSD <2.5 Å and were therefore removed.
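The counting step described above can be sketched as follows. This is a minimal illustration, assuming the cross‐docking results are stored as a matrix with one row per receptor and one column per ligand (the orientation of Figure 2). The receptor names and RMSD values are toy placeholders rather than data from Table S1, and `min_ligands` is a parameter of the sketch.

```python
import numpy as np

RMSD_CUTOFF = 2.5  # Å, the threshold quoted in the text


def count_well_docked(rmsd, cutoff=RMSD_CUTOFF):
    """For each receptor (row), count the ligands (columns) docked below the cutoff.

    Failed dockings can be stored as NaN; NaN < cutoff is False, so they are not counted.
    Every column is counted here, including the receptor's own (native) ligand.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    return (rmsd < cutoff).sum(axis=1)


def retain_receptors(names, rmsd, min_ligands=10, cutoff=RMSD_CUTOFF):
    """Return (name, count) for receptors that dock at least `min_ligands` ligands."""
    counts = count_well_docked(rmsd, cutoff)
    return [(name, int(c)) for name, c in zip(names, counts) if c >= min_ligands]


# Toy 3 x 4 matrix (hypothetical values, in Å).
names = ["REC_A", "REC_B", "REC_C"]
rmsd = [
    [0.4, 1.9, 2.4, 6.0],
    [0.6, 3.1, 7.2, float("nan")],
    [1.0, 2.0, 2.2, 2.49],
]
kept = retain_receptors(names, rmsd, min_ligands=3)
print(kept)
```

With the full 21 × 21 matrix, the same call with `min_ligands=10` would reproduce the kind of filter applied in this section.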

Among these 13, seven structures (5HCY, 5CAP, 5C8M, 5CAO, 5HCX, 5CAN, and 5CAL) did not have any cross‐docked ligand with a Tanimoto score >0.8, which indicates the absence of ligands similar to their respective co‐crystal ligands. For the remaining six structures (5CAS, 4RJ8, 5CAU, 5C8N, 5CAQ, and 4RJ4), the number of cross‐docked co‐crystal ligands with a Tanimoto score >0.8 varied from 3 to 6. These structures were not removed at this step because two structures whose ligands share structural similarity may still have dissimilar binding‐site conformations. Therefore, all 13 TMLR structures that docked several non‐native ligands with lower RMSDs were taken up for binding‐site conformational similarity analysis.

2.2. Screening of structures based on binding‐site residue conformational similarity, cross‐docking efficiency, and ligand similarity

Cross‐docking and ligand similarity information alone is not sufficient to warrant the selection or removal of structures before the binding‐site conformation has been evaluated. For instance, crystal structures with good cross‐docking performance and dissimilar ligands should not be selected immediately because they may harbor similar binding sites and thus have a higher probability of giving similar results. Conversely, structures with good cross‐docking performance and similar co‐crystal ligands should not be removed, as they may exhibit variations in binding‐site conformation. Therefore, to further screen the 13 TMLR structures identified through cross‐docking and ligand similarity, their binding‐site residue conformations were compared (Table S3).

The comparison of the RMSD between the binding‐site residues of the 13 proteins (Figure 4) revealed four blocks of similarity. The first block consists of 5C8M and 5CAQ, whose binding sites were found to be similar (RMSD = 1.05 Å). To determine the better performing structure of the two, their cross‐docking and ligand similarity results were checked, and it was observed that 5C8M had docked a greater number of non‐native ligands with RMSD less than 2.5 Å, none of which were similar (Tanimoto >0.8) to its own co‐crystal ligand (Figure 2). Therefore, 5C8M was chosen from this pair. The second block consisted of 5C8N, 5CAS, and 5CAU (Figure 4). As the binding sites of 5CAS and 5CAU were similar (RMSD = 0.91 Å), their cross‐docking performance was checked, and 5CAS was selected as it had docked more ligands with lower RMSD and had fewer similar non‐native co‐crystal ligands than 5CAU. However, since the binding‐site residue conformation of 5C8N differed from that of both 5CAS and 5CAU (RMSD ~2.9 Å in both cases), it was also selected. In the third block, similarity was observed between the binding‐site residue conformations of 5CAL and 5CAN (RMSD = 0.91 Å). Since neither structure had similar ligands in its cross‐docked set, their cross‐docking results were compared, and 5CAN was selected as it had demonstrated better performance in the cross‐docking exercise (Figure 2). In the fourth block, 5CAO and 5HCX were compared; however, their binding‐site residues were found to be conformationally dissimilar (RMSD = 3.8 Å). Therefore, both 5CAO and 5HCX were shortlisted for further analysis, as they differed in binding‐site conformation and did not have any similar non‐native co‐crystal ligands. In addition, 4RJ4, 4RJ8, 5CAP, and 5HCY were selected for further analysis as their binding‐site residue conformations were considerably different from those of all the other structures (RMSDs >7 Å). Among these four structures, only the co‐crystal ligands of 4RJ4 and 4RJ8 were similar (Tanimoto coefficient = 0.84); however, both were selected as they showed a conformational difference in their binding‐site residues (RMSD = 7.1 Å). Overall, 10 protein structures (5C8M, 5C8N, 5CAS, 5CAN, 5CAO, 5HCX, 4RJ4, 4RJ8, 5CAP, and 5HCY) were shortlisted for further analysis.
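The pairwise decisions above can be approximated by a simple greedy rule: visit the structures in order of cross‐docking performance and keep one unless its binding site is similar to that of a structure already kept. The sketch below is a simplification and not the authors' procedure. It uses only the ligand counts and pairwise RMSDs quoted in the text, treats every pair whose RMSD is not quoted as dissimilar, ignores the ligand‐similarity tie‐break, and assumes a hypothetical similarity cut‐off of 1.5 Å (the text gives no explicit cut‐off).

```python
# Number of ligands docked with RMSD < 2.5 Å, as quoted in Section 2.1.
counts = {
    "5HCY": 15, "5CAS": 14, "5CAP": 14, "5C8M": 14, "5CAO": 14,
    "5CAU": 13, "5HCX": 13, "4RJ8": 13, "5CAQ": 13,
    "5C8N": 12, "5CAN": 12, "4RJ4": 12, "5CAL": 11,
}

# Binding-site RMSDs (Å) quoted in the text; all other pairs are assumed dissimilar.
pair_rmsd = {
    frozenset(("5C8M", "5CAQ")): 1.05,
    frozenset(("5CAS", "5CAU")): 0.91,
    frozenset(("5C8N", "5CAS")): 2.9,
    frozenset(("5C8N", "5CAU")): 2.9,
    frozenset(("5CAL", "5CAN")): 0.91,
    frozenset(("5CAO", "5HCX")): 3.8,
    frozenset(("4RJ4", "4RJ8")): 7.1,
}


def select_representatives(counts, pair_rmsd, similar_below=1.5):
    """Greedy pick: best cross-docking count first, skip near-duplicates of kept sites."""
    kept = []
    for name in sorted(counts, key=lambda n: (-counts[n], n)):
        is_redundant = any(
            pair_rmsd.get(frozenset((name, other)), float("inf")) < similar_below
            for other in kept
        )
        if not is_redundant:
            kept.append(name)
    return kept


shortlist = select_representatives(counts, pair_rmsd)
print(sorted(shortlist))
```

Under these assumptions the rule drops 5CAQ, 5CAU, and 5CAL, which leaves the same 10 structures listed above.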

FIGURE 4.

Heatmap representation of the binding‐site residue conformation similarity (RMSD) of the 13 TMLR crystal structures. The color gradient shows the RMSD values. The actual values are provided in Supporting Information Table S3.

2.3. Binding pose metadynamics

As a common practice in CADD exercises, co‐crystal ligands are self‐docked to their native proteins to derive a comparative measure of their binding affinity and interactions. This information is then used to evaluate and prioritize the molecules screened during VS. However, literature and experimental evidence have shown that in several X‐ray crystal structures, the co‐crystal ligand does not have a stable pose in the binding site of the protein (Reynolds, 2014). It is therefore essential to identify proteins in which the bound pose of the co‐crystal ligand is stable. As BPMD is capable of distinguishing between stable and unstable ligand binding poses in crystal structures (Clark et al., 2016; Fusani et al., 2020), the binding poses of the co‐crystal ligands in the active sites of the 10 proteins identified in the previous step were assessed using BPMD's PoseScore and PersScore. The PoseScore, which is an average of the 10 consecutive RMSD calculations with respect to the original ligand heavy‐atom coordinates, was used to evaluate pose stability. Since an RMSD value ≤2 Å is conventionally considered an indicator of stability in molecular dynamics simulations, a PoseScore ≤2 was considered stable. Furthermore, the PersScore was checked to ascertain whether BPMD had any effect on the hydrogen bond (H‐bond) interactions of the protein–ligand complexes. Higher PersScore values (i.e., closer to 1) generally indicate continued H‐bond interaction between the protein and ligand during the last 2 ns of the simulation, whereas lower values (i.e., closer to 0) denote either the absence or loss of H‐bond contacts during the course of the metadynamics simulation.

In the current study, the PoseScore of the 10 proteins ranged from 0.741 to 4.69. The lowest PoseScore was observed for 5HCX (0.741), which exhibited the most stable co‐crystal ligand pose (Figure 5). Apart from 5HCX, five other proteins (5CAO, 5CAN, 5CAS, 5CAP, and 5C8M) also exhibited PoseScores <2, that is, 0.806, 0.882, 1.014, 1.296, and 1.844, respectively (Figure 5). For three of these six proteins (5HCX, 5CAO, and 5CAS), the PersScore values were 0.838, 0.828, and 0.83, respectively, which indicates that hydrogen bonds were retained for 80% or more of the averaged simulation period. For the remaining three proteins (5CAN, 5CAP, and 5C8M), the H‐bond contacts were maintained for 50%–60% of the simulation time. On the other hand, PoseScore values >2 were observed for four structures, namely 5C8N (2.767), 5HCY (3.403), 4RJ8 (4.429), and 4RJ4 (4.69) (Figure 5). The high RMSDs observed for these ligands are a clear indication of the co‐crystal ligands' instability in their respective binding pockets, because of which they were displaced from their initial positions when subjected to BPMD's bias. 4RJ8 and 4RJ4, which displayed PoseScores of 4.429 and 4.69, exhibited zero PersScore values, indicating a total absence or loss of H‐bond contacts. The remaining two proteins, 5C8N and 5HCY, also exhibited poor PersScore values (0.173 and 0.107, respectively). Therefore, these four proteins, which showed high PoseScore and low PersScore, were removed from consideration at this stage. Moreover, a comparison between the PoseScore and PersScore distributions of the 10 proteins (Figure 6) revealed that whenever a protein–ligand complex had a good PoseScore (i.e., less than 2), it also had a good PersScore (i.e., H‐bond interactions retained for more than 50% of the time). The six crystal structures 5HCX, 5CAO, 5CAN, 5CAS, 5CAP, and 5C8M, having PoseScore <2 and PersScore >0.5, were thus taken forward for enrichment factor (EF) calculations (Figures 5 and 6). Furthermore, the similarity of the co‐crystal ligands bound to these six proteins was also considered, and all six co‐crystal ligands were found to be dissimilar (Figure 3 and Table S2).
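The stability filter described above amounts to two threshold checks. The sketch below applies them to the PoseScore and PersScore values quoted in this section and in Table 2; the dictionary layout and function name are illustrative choices, and the cut‐offs (PoseScore ≤2, PersScore >0.5) are the ones stated in the text.

```python
# (PoseScore, PersScore) per structure, as quoted in Section 2.3 and Table 2.
bpmd_scores = {
    "5HCX": (0.741, 0.838),
    "5CAO": (0.806, 0.828),
    "5CAN": (0.882, 0.538),
    "5CAS": (1.014, 0.830),
    "5CAP": (1.296, 0.598),
    "5C8M": (1.844, 0.539),
    "5C8N": (2.767, 0.173),
    "5HCY": (3.403, 0.107),
    "4RJ8": (4.429, 0.0),
    "4RJ4": (4.69, 0.0),
}


def stable_complexes(scores, max_pose=2.0, min_pers=0.5):
    """Return structures with a stable pose and persistent H-bonds, best PoseScore first."""
    passing = [
        name
        for name, (pose, pers) in scores.items()
        if pose <= max_pose and pers > min_pers
    ]
    return sorted(passing, key=lambda name: scores[name][0])


stable = stable_complexes(bpmd_scores)
print(stable)
```

Because both criteria agree for all 10 complexes, applying either threshold alone would give the same six structures here; the combined check matters only when the two scores disagree.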

FIGURE 5.

Binding pose metadynamics results of the 10 EGFR mutant proteins along with their PoseScore and PersScore.

FIGURE 6.

Comparison between PoseScore and PersScore of the 10 protein–ligand complexes for which BPMD was performed.

2.4. Enrichment metric evaluation

A common difficulty with VS is the “early recognition” problem: the position of the active molecules in the rank‐ordered list of screened ligands is unknown, that is, they may lie in the early part, in the last part, or be distributed throughout the set (Truchon & Bayly, 2007). Since experimental validation is carried out after VS for only the top few compounds, it is critical to identify an optimal crystal structure that is capable of ranking the true actives early in the dataset. If active molecules are not present early in the set, they may be missed during experimental validation. Therefore, the six TMLR crystal structures shortlisted after cross‐docking, ligand similarity, binding‐site conformational similarity comparison, and BPMD were validated by docking a library of molecules containing both actives and decoys and evaluating the results using various enrichment metrics, namely ROC, area under the accumulation curve (AUAC), robust initial enhancement (RIE), Boltzmann‐enhanced discrimination of receiver operating characteristic (BEDROC), and 1% enrichment factor (EF_1%) (Table 2).
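As a worked illustration of the simplest of these metrics, the sketch below computes an enrichment factor from a ranked list of active/decoy labels, using the usual definition (hit rate in the top fraction divided by the hit rate in the whole library). The library composition (4,200 compounds, 84 actives) is a hypothetical example chosen so that the top 1% contains 42 compounds; it is not a figure reported in this section.

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """EF at a given fraction: hit rate in the top slice / hit rate in the whole list.

    `ranked_is_active` is a list of booleans ordered from best to worst docking score.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    n_top = max(1, round(n_total * fraction))
    hits = sum(ranked_is_active[:n_top])
    return (hits / n_top) / (n_actives / n_total)


# Hypothetical library: 4,200 compounds of which 84 are actives.
n_total, n_actives = 4200, 84

# Ranking A: 36 of the top 42 compounds are actives, the other actives come right after.
ranking_a = [True] * 36 + [False] * 6 + [True] * 48 + [False] * (n_total - 90)
# Ranking B: ideal ranking, every active ahead of every decoy.
ranking_b = [True] * n_actives + [False] * (n_total - n_actives)

ef_a = enrichment_factor(ranking_a)
ef_b = enrichment_factor(ranking_b)
print(round(ef_a, 2), round(ef_b, 2))
```

When the library contains more actives than the top slice can hold, the ceiling of the metric is the library size divided by the number of actives, which is why an EF_1% has to be read against its maximum achievable value.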

TABLE 2.

Results of the enrichment calculations performed for the six proteins identified through BPMD.

Protein	Res (Å)	K_i (nM)	PoseScore	PersScore	BEDROC	ROC	RIE	AUAC	EF_1%
5HCX	2.6	4.1	0.741	0.838	0.878	0.90	11.97	0.89	46
5CAN	2.8	124	0.882	0.538	0.873	0.93	12.25	0.92	45
5CAP	2.4	52	1.296	0.598	0.838	0.92	12.01	0.91	44
5CAO	2.6	38	0.8	0.828	0.823	0.92	11.91	0.91	44
5CAS	2.1	1.4	1.014	0.830	0.832	0.93	12.09	0.93	43
5C8M	2.9	64	1.84	0.539	0.746	0.92	11.7	0.92	38

Abbreviations: AUAC, area under the accumulation curve; BEDROC, Boltzmann‐enhanced discrimination of receiver operating characteristic; BPMD, binding pose metadynamics; EF_1%, 1% enrichment factor; RIE, robust initial enhancement; ROC, receiver operating characteristic curve.

The results of the enrichment analysis showed that the six structures performed well and were able to identify true active molecules efficiently, as is evident from their ROC and AUAC values, which ranged from 0.90 to 0.93 and 0.89 to 0.93, respectively (Table 2). Since ROC and AUAC are both unweighted metrics, they cannot be interpreted with respect to early recognition (Truchon & Bayly, 2007). Therefore, RIE, a well‐known metric of “early recognition,” was estimated for these six proteins; its values ranged from 11.7 to 12.25 (Table 2). One of the main advantages of RIE is that it can distinguish cases in which the active molecules are ranked in the earlier part of the list from cases in which they are ranked in the later part or near the cut‐off (Sheridan et al., 2001). However, since RIE has no fixed upper or lower limits, the magnitude of the results is difficult to interpret. Therefore, BEDROC was computed, which indicates whether actives are present early in the list and has the added advantage of being bounded (0–1) (Truchon & Bayly, 2007). For five of the six structures evaluated, namely 5HCX, 5CAN, 5CAP, 5CAO, and 5CAS, the BEDROC and EF_1% values ranged from 0.823 to 0.878 and from 43 to 46, respectively. For the sixth protein, 5C8M, the BEDROC and EF_1% values were less than 0.8 and 40, respectively, and it was hence removed from consideration. Since the BEDROC values of 5HCX, 5CAN, 5CAP, 5CAO, and 5CAS were >0.8, their potential for ranking true actives early in the dataset is high (Table 2). Additionally, the EF_1% values of these five proteins, 43 to 46, are quite good considering that the maximum achievable EF_1% in our case is ~50, indicating that for each of these five proteins at least 35 active molecules were present in the top 42 (i.e., 1%) ranked compounds. Therefore, five TMLR crystal structures have been shortlisted that are not only capable of identifying true actives with high efficiency but also perform well with respect to the “early recognition” problem.
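The relationship between RIE and BEDROC discussed above can be made concrete with a short sketch. RIE is computed here as the average exponential weight of the actives' ranks divided by its expected value under a random ranking, and BEDROC as RIE rescaled between its worst‐case and best‐case values, following Truchon & Bayly (2007). The weighting parameter `alpha` is not stated in this section, so the value of 20 used below is an assumption, and the example ranks are invented.

```python
import math


def rie(active_ranks, n_total, alpha=20.0):
    """Robust initial enhancement for the 1-based ranks of the actives."""
    active_ranks = list(active_ranks)
    if not active_ranks or len(active_ranks) >= n_total:
        raise ValueError("need at least one active and at least one decoy")
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / len(active_ranks)
    # Expected value of the same average if the actives were ranked uniformly at random.
    expected = sum(math.exp(-alpha * k / n_total) for k in range(1, n_total + 1)) / n_total
    return observed / expected


def bedroc(active_ranks, n_total, alpha=20.0):
    """BEDROC: RIE rescaled to [0, 1] using the worst and best possible rankings."""
    active_ranks = list(active_ranks)
    n = len(active_ranks)
    best = rie(range(1, n + 1), n_total, alpha)
    worst = rie(range(n_total - n + 1, n_total + 1), n_total, alpha)
    return (rie(active_ranks, n_total, alpha) - worst) / (best - worst)


# Invented example: 10 actives in a library of 1,000 ranked compounds.
n_total = 1000
early = [1, 2, 3, 5, 8, 13, 40, 200, 500, 900]
late = [r + 400 if r + 400 <= n_total else r for r in early]

print(rie(early, n_total), bedroc(early, n_total))
print(rie(late, n_total), bedroc(late, n_total))
```

The rescaling is what makes BEDROC values comparable across libraries with different active‐to‐decoy ratios, whereas the raw RIE ceiling changes with that ratio.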

2.5. Conserved interaction analysis

Hydrogen bonds are not only essential for the stability of a ligand but also determine its orientation and conformation in the binding pocket. Similarly, hydrophobic interactions reflect the lipophilic nature of a ligand, which in turn increases its binding affinity toward the protein (Agarwal et al., 2017). Therefore, the presence of hydrogen bonds and hydrophobic interactions between the five shortlisted structures and their respective ligands was checked using LigPlot (Laskowski & Swindells, 2011). The analysis revealed conserved hydrogen bonds with Gln791, Met793, and Thr854. Herein, Met793 acts as a hydrogen bond donor, while Gln791 acts as a hydrogen bond acceptor. Since both Gln791 and Met793 are hinge region residues of the EGFR backbone, hydrogen bonds with these residues are important for ligand stability. Moreover, the hydroxyl group of Thr854 is known to form important interactions with ligand molecules that contribute to the stability of the protein–ligand complexes. In addition to hydrogen bonds, hydrophobic interactions with Leu718, Ala743, Lys745, Cys775, Met790, Leu792, Leu844, and Asp855 were conserved in all five protein–ligand complexes. Since these residues are usually buried in the core of the protein structure, contacts with them might contribute to the increased binding strength of the ligands. Our analysis thereby indicates that the identified complexes show conserved interactions even though they contain different ligands and have dissimilar binding‐site residue orientations.

2.6. Control calculations

To check whether any important structures had been filtered out during the systematic selection protocol, enrichment calculations were performed as a control on the structures that were removed during cross‐docking. 4RJ6, 4RJ7, 5EDQ, 5EDR, 5EM5, 5EM6, 5EM7, and 3W2R were directly subjected to the docking‐based enrichment calculations using the same method as discussed above. The BEDROC values of these structures ranged from 0.074 to 0.402, with an average of 0.28 compared with 0.85 for the five selected structures. Similarly, the EF_1% of these structures ranged from 4.7 to 24, with an average of ~15 compared with ~44 for the selected structures. This further demonstrates that the prioritization and selection approach developed in the current study is capable of identifying 3D protein crystal structures that have a higher probability of identifying truly active molecules.

2.7. Computational resources

The current study was performed on a Linux system with 16 GB RAM, an Intel i9 processor with a clock speed of ~3 GHz, and a 16 GB GPU (Quadro RTX 5000). BPMD for one protein–ligand complex was completed in ~6 h on this system. Although Schrodinger software was used to implement the proposed workflow in the present study, the workflow may also be implemented using open‐source tools such as AutoDock/AutoDock Vina (Eberhardt et al., 2021; Morris et al., 2009) and OpenBPMD (Lukauskis et al., 2022).

3. CONCLUSION

The double mutated EGFR protein (T790M/L858R) is an important therapeutic target in lung cancer and therefore several crystal structures have been resolved for this protein. As a result, selecting one or more of these structure(s) for VS or ensemble docking becomes difficult. Therefore, in the current study, a computational workflow has been developed wherein multiple approaches including the resolution of crystal structures, the binding affinity of the co‐crystal ligands, cross‐docking, ligand similarity, binding‐site residue conformational similarity comparison, BPMD, and enrichment calculations have been integrated. This allowed the shortlisting of five TMLR crystal structures that have docked a number of dissimilar non‐native ligands with low RMSD, differ in their binding site, harbor stable co‐crystal ligands and also have a higher potential of identifying true actives early in the screening process. Moreover, the prioritizing approach implemented in the current study can also be used for shortlisting structures of other kinases or therapeutically relevant proteins for structure‐based drug discovery.

4. MATERIALS AND METHODS

The in silico pipeline developed in this study for the identification of TMLR crystal structures that can be used for VS is described below as well as given in the flowchart (Figure 1).

4.1. Collection of double‐mutated EGFR (TMLR) crystal structures

Although several crystal structures of the EGFR protein and its mutants are present in the PDB (Berman et al., 2000), we considered only those structures that harbor the TMLR double mutation. As the inhibitor‐bound form of a protein structure is known to represent the bioactive conformation, the TMLR crystal structures identified by querying the PDB were checked for the presence of co‐crystal ligands and for their reported biological activity in the form of the inhibitor constant (K_i) (Cheng & Prusoff, 1973). Structures with either no co‐crystal ligand or no reported K_i value were not considered. The structures were then checked for their resolution, as it gives an indication of the quality of the crystal structure. By convention, crystal structures resolved at 2 Å or better are considered to be of higher quality (Holton & Frankel, 2010). However, since the majority of the PDB IDs identified had a resolution of nearly 3 Å, the structures with a resolution value below 3 Å were retained for the analysis. Finally, 21 crystal structures out of the 29 present in the PDB were selected for further analysis.
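The initial selection criteria reduce to a simple filter, sketched below. The record layout and function name are illustrative. The first two entries use values from Table 1, while the entries named `HYP1` and `HYP2` are hypothetical placeholders that only show how a structure fails each criterion; they do not correspond to real PDB entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    pdb_id: str
    resolution: float          # Å
    ligand_id: Optional[str]   # None if no co-crystal ligand
    ki_nm: Optional[float]     # None if no K_i was reported


def passes_initial_filter(entry: Entry, max_resolution: float = 3.0) -> bool:
    """Keep ligand-bound structures with a reported K_i and a resolution value below 3 Å."""
    return (
        entry.ligand_id is not None
        and entry.ki_nm is not None
        and entry.resolution < max_resolution
    )


entries = [
    Entry("3W2R", 2.05, "W2R", 19.0),   # from Table 1
    Entry("5C8M", 2.90, "4YW", 64.0),   # from Table 1
    Entry("HYP1", 3.20, "LIG", 10.0),   # hypothetical: resolution value too high
    Entry("HYP2", 2.40, "LIG", None),   # hypothetical: no reported K_i
]

selected = [e.pdb_id for e in entries if passes_initial_filter(e)]
print(selected)
```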

4.2. Preparation of proteins

The tyrosine kinase domain of EGFR is approximately 300 amino acids long, spanning approximately residues 695 to 1022, with minor variations at both ends. To deal with the variations at the N‐ and C‐termini of the different crystal structures, all the structures were truncated at the terminal ends so that they uniformly spanned residues 696 to 990. All the protein structures were prepared using the Protein Preparation Wizard of Schrodinger Suite 2021. Apart from the addition of hydrogen atoms and missing side chains, the Prime module was used to fill the internal gaps of structures containing missing residues. Residues were mostly unresolved in two regions of the EGFR kinase domain, that is, between residues 748 and 749 and between residues 862 and 875, and none of the missing regions lie in the binding site of the protein. All water molecules and other het groups present in the protein structures were deleted to maintain homogeneity in the binding pocket. The hydrogen bond network was optimized using PROPKA at pH 7.4. The protein structures were energy minimized using the OPLS4 force field with a convergence threshold of 0.3 Å RMSD.
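The truncation step can be illustrated with plain text processing of PDB records. This is a minimal sketch and not the Schrodinger workflow used in the study: it assumes standard fixed‐column ATOM records, in which the residue sequence number occupies columns 23–26, and the helper `format_atom` and the toy coordinates exist only to build example input.

```python
def format_atom(serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal fixed-column PDB ATOM record (illustrative helper)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"
        f"    {x:8.3f}{y:8.3f}{z:8.3f}"
    )


def truncate_to_range(pdb_lines, first=696, last=990):
    """Keep only ATOM records whose residue number lies within [first, last]."""
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            resseq = int(line[22:26])  # columns 23-26 of the PDB format
            if first <= resseq <= last:
                kept.append(line)
    return kept


# Toy input: one atom each for residues 695, 696, 990 and 991.
toy_lines = [
    format_atom(1, "CA", "GLY", "A", 695, 1.0, 2.0, 3.0),
    format_atom(2, "CA", "SER", "A", 696, 2.0, 3.0, 4.0),
    format_atom(3, "CA", "LEU", "A", 990, 3.0, 4.0, 5.0),
    format_atom(4, "CA", "ALA", "A", 991, 4.0, 5.0, 6.0),
]

truncated = truncate_to_range(toy_lines)
print([int(line[22:26]) for line in truncated])
```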

4.3. Cross‐docking

The various TMLR crystal structures considered in this study belong to the same protein but, owing to variations in crystallization conditions, exhibit diverse conformations. Cross‐docking is an efficient method for evaluating the performance of a protein across a wide set of ligands, wherein the co‐crystal ligands isolated from different PDB structures of the same protein are docked to each of the proteins (Shamsara, 2016). This method was therefore used to evaluate how the crystal structures of the same protein bind to different ligands with diverse scaffolds. All the protein–ligand complexes were first aligned, and the co‐crystal ligands of all the receptors were removed. The ligands were used in their crystallographic 3D conformation and were prepared by assigning bond orders and adding hydrogens. The ability of a ligand to explore the active site and fit into it depends on its initial conformation; if the starting geometry of the ligand is far from its actual binding pose, the docking algorithm's ability to find the optimal pose may be hindered. It is therefore important to start from the crystallographic orientation of the co‐crystallized inhibitor. The grid for molecular docking was generated using the centroid of the co‐crystal ligand. Thereafter, cross‐docking was performed in extra‐precision (XP) mode using the Glide module, where each of the co‐crystal ligands was docked to all the receptor structures and the RMSD was calculated. Ten poses were subjected to post‐docking minimization, and the best pose (lowest Glide Score) was considered. The resulting matrix of ligand RMSD values (21 × 21) was then transformed into a heatmap.
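The two operations at the end of this procedure, picking the best‐scoring pose and measuring its deviation from the crystallographic pose, can be sketched as follows. The sketch assumes that the receptors are already aligned, that the atoms of both poses are listed in the same order, and that no symmetry correction is needed; these are simplifications of what a docking package does internally, and the coordinates and scores below are invented.

```python
import numpy as np


def in_place_rmsd(reference_xyz, pose_xyz):
    """Heavy-atom RMSD between two poses in the same frame (no superposition)."""
    ref = np.asarray(reference_xyz, dtype=float)
    pose = np.asarray(pose_xyz, dtype=float)
    if ref.shape != pose.shape or ref.ndim != 2 or ref.shape[1] != 3:
        raise ValueError("both poses must be (n_atoms, 3) arrays with matching atoms")
    return float(np.sqrt(((ref - pose) ** 2).sum(axis=1).mean()))


def best_pose(poses):
    """Return the pose with the lowest (most favorable) docking score."""
    return min(poses, key=lambda pose: pose["score"])


# Invented example: a 3-atom ligand and two candidate poses.
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
poses = [
    {"score": -7.2, "xyz": crystal + np.array([3.0, 0.0, 0.0])},
    {"score": -9.8, "xyz": crystal + np.array([1.0, 0.0, 0.0])},
]

top = best_pose(poses)
top_rmsd = in_place_rmsd(crystal, top["xyz"])
print(top["score"], top_rmsd)
```

Repeating this for every receptor and ligand pair fills one cell of the RMSD matrix per docking run.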

4.4. Ligand similarity

The co‐crystal ligands of the 21 structures were compared for similarity using the Tanimoto coefficient. The SMILES of the co‐crystal ligands were generated and used as input to the ChemFPS module of the ChemDes web server (Dong et al., 2015) with MACCS fingerprints. The Tanimoto score ranges from 0 to 1; the higher the score, the greater the similarity between two structures, and molecules with a score above 0.8 are considered similar. Each co‐crystal ligand was used as a query and compared with the other ligands to generate a matrix of Tanimoto scores, which was then converted to a heatmap.
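The Tanimoto coefficient itself is simple to state: for two binary fingerprints it is the number of bits set in both divided by the number of bits set in either. The sketch below shows this on fingerprints represented as sets of on‐bit indices. It does not reproduce the ChemDes workflow; the function names and the toy bit sets are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """All-against-all Tanimoto matrix for a list of fingerprints."""
    return [[tanimoto(p, q) for q in fingerprints] for p in fingerprints]


# Toy fingerprints (made-up bit indices, not real MACCS keys).
ligands = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
for row in similarity_matrix(ligands):
    print(row)
```

With the 0.8 threshold quoted above, none of these toy fingerprints would count as similar to another.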

4.5. Binding‐site residue similarity

To calculate the conformational similarity of the binding‐site residues, the residues within the binding pocket of all the aligned proteins were compared and a matrix of RMSD values was generated. Residues within 5 Å of the ligand were defined as binding‐site residues and used to determine the conformational similarity.
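The 5 Å selection can be sketched as follows. The example treats a residue as part of the binding site when any of its atoms lies within the cutoff of any ligand atom; this "any atom" criterion, the function name, and the toy coordinates are assumptions for illustration and may differ from the selection used in the study.

```python
import math


def binding_site_residues(residue_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted residue identifiers having at least one atom within `cutoff` Å of the ligand.

    residue_atoms: dict mapping a residue identifier to a list of (x, y, z) tuples.
    ligand_atoms:  list of (x, y, z) tuples for the ligand.
    """
    selected = []
    for residue_id, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            selected.append(residue_id)
    return sorted(selected)


# Toy coordinates (made up); the residue numbers are only labels.
residues = {
    745: [(3.0, 3.0, 0.0)],
    790: [(0.0, 0.0, 1.0)],
    858: [(10.0, 0.0, 0.0)],
}
ligand = [(0.0, 0.0, 0.0)]
print(binding_site_residues(residues, ligand))
```

Once the same residue set is selected in every aligned structure, a pairwise RMSD over the matched atoms gives the similarity matrix described above.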

The heatmaps representing the results of the cross‐docking, ligand similarity, and binding‐site residue similarity analyses were generated in R using the “pheatmap” package.

4.6. Binding pose metadynamics

Because the co‐crystallized ligand and its associated active‐site conformation are important for docking‐based VS, the conformational stability of the bound ligand in such X‐ray crystal structures needs to be verified. In the current study, the BPMD module of Schrödinger Suite 2021 was therefore used to identify the crystal structures of protein–ligand complexes that have a stable co‐crystal ligand pose in the binding pocket. BPMD is an enhanced sampling approach that can sample the structural dynamics of the protein–ligand complex in a shorter timeframe than molecular dynamics (Bernardi et al., 2015). With this methodology, the ligand binding pose can be assessed by reconstructing the free energy landscape of the protein–ligand interaction while the ligand is compelled to move in and around its binding pose; increased mobility is considered an indication of binding‐mode instability (Clark et al., 2016). The BPMD panel performs 10 independent 10 ns metadynamics simulations on each protein–ligand complex structure, using the RMSD as the collective variable. BPMD provides two scores that reflect the ligand's consistency during the metadynamics simulations, namely PoseScore (the average of the RMSD values obtained with respect to the starting structure) and PersScore (a measure of hydrogen bond persistence in the last 2 ns of the simulation, averaged across all 10 repeat runs).

4.7. Enrichment calculations

The TMLR crystal structures shortlisted through the previous steps were then evaluated for their ability to identify strong binders from a dataset of actives and decoys. PubMed was searched to identify literature describing small‐molecule inhibitors that act against TMLR and have biological activity reported as K_i. An active set comprising 84 molecules with K_i values below 100 nM was then created, and their SMILES were used as input to the Directory of Useful Decoys, Enhanced (DUD‐E) (Mysinger et al., 2012). Fifty decoys were generated for each active molecule, giving a total of 4200 decoys for the 84 actives. Duplicate molecules in the decoy dataset were then removed, leaving 4095 unique decoys. Finally, a dataset of 4179 molecules containing both actives and decoys was assembled and used for docking‐based VS in XP mode against the selected TMLR crystal structures. The docking results for each protein were then analyzed using the Docking‐based Enrichment Calculator panel, and several enrichment metrics were computed. The statistical measures computed are as follows:

4.7.1. Area under the receiver operating characteristic curve

The receiver operating characteristic (ROC) metric is generally used to graphically represent the performance of the screen by providing a measure of the area that is being covered by the outcome of the screening process in comparison to random picking (Triballeau et al., 2005). The formula for ROC is given as follows,

ROC = \frac{1}{(nN)} \sum_{k = 2}^{N} F_{a} (k) [F_{i} (k) - F_{i} (k - 1)]

where $F_{i}(k)$ is the total number of inactives at rank position $k$, “n” is the number of actives among “N” compounds, and $F_{a}(k)$ is the total number of actives at rank position $k$. The value of ROC ranges from 0 to 1, where 0.5 represents random behavior and 1 represents optimal screen performance.

4.7.2. Area under the accumulation curve

AUAC is another commonly used measure of screening performance; the AUAC score can be interpreted as the probability that an active will be ranked before a randomly chosen compound from a uniformly distributed dataset (Empereur‐Mot et al., 2015). The AUAC may be defined as,

AUAC = \frac{1}{2 nN} \sum_{k = 0}^{N - 1} [F_{a} (k) + F_{a} (k + 1)]

where the total number of actives and the total number of compounds examined are denoted by “n” and “N,” respectively, and $F_{a}(k)$ is the total number of actives at rank position “k.” The value of AUAC ranges from 0 to 1, with 1 denoting the best possible result for the structure used for screening the active‐decoy set.
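The AUAC formula is a trapezoidal sum over the cumulative count of actives, which the following sketch applies directly to a ranked list of active/decoy flags (best docking score first). The function name and the toy rankings are hypothetical; $F_{a}(k)$ is taken as the number of actives among the first k ranked compounds, with $F_{a}(0) = 0$.

```python
def auac(is_active):
    """Area under the accumulation curve for a ranked list of booleans (best score first)."""
    n_total = len(is_active)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # cumulative[k] = number of actives among the first k ranked compounds.
    cumulative = [0]
    for flag in is_active:
        cumulative.append(cumulative[-1] + (1 if flag else 0))

    # Trapezoid rule: sum of F_a(k) + F_a(k + 1) for k = 0 .. N - 1.
    trapezoids = sum(cumulative[k] + cumulative[k + 1] for k in range(n_total))
    return trapezoids / (2 * n_actives * n_total)


print(auac([True, True, False, False]))   # actives ranked first
print(auac([False, False, True, True]))   # actives ranked last
```

In this four‐compound toy case the two extreme rankings give 0.75 and 0.25; the attainable extremes move toward 1 and 0 as the fraction of actives in the dataset shrinks.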

4.7.3. Robust initial enhancement

The RIE metric weights each ranked active with an exponentially decaying term, so that higher‐ranking actives receive greater weight (Sheridan et al., 2001). Consequently, structures with higher positive RIE values are better at identifying actives early. It is given as,

RIE = \frac{\sum_{i = 1}^{n} e^{- α x_{i}}}{\frac{n}{N} (\frac{1 - e^{- α}}{e^{α / N} - 1})}

where “n” is the number of actives among “N” compounds, $x_{i}$ is the relative rank of the ith active in the sorted list, and “ $α$ ” is the early recognition parameter.

4.7.4. Boltzmann‐enhanced discrimination receiver operator characteristic

BEDROC calculates the likelihood that an active would be ranked higher than a compound chosen at random from an exponential probability distribution. The value of BEDROC ranges from 0 to 1, with 1 indicating the best possible screening result (Truchon & Bayly, 2007). It is calculated using the formula mentioned below

BEDROC = \frac{\sum_{i = 1}^{n} e^{- α x_{i}}}{\frac{n}{N} (\frac{1 - e^{- α}}{e^{α / N} - 1})} \times \frac{R_{a} e^{α R_{a}} (e^{α} - 1)}{(e^{α} - e^{α R_{a}}) (e^{α R_{a}} - 1)} + \frac{1}{1 - e^{α (1 - R_{a})}}

where $x_{i}$ is the relative rank of the ith active in the sorted list, $R_{a}$ is the ratio of the total number of actives “n” to the total number of compounds “N” that were screened, and “ $α$ ” is the early recognition parameter.
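The two early‐recognition metrics can be sketched together, since BEDROC is a rescaled RIE. The code below takes the relative rank of an active as its 1‐based rank divided by N and uses α = 20 as a default; both choices, like the function names, are assumptions for this example, and the α value used in the study is not stated in this section.

```python
import math


def rie(active_ranks, n_total, alpha=20.0):
    """Robust initial enhancement from the 1-based ranks of the actives."""
    n_actives = len(active_ranks)
    if n_actives == 0 or n_total < n_actives:
        raise ValueError("need at least one active and n_total >= number of actives")
    # Exponentially weighted sum over the relative ranks x_i = rank / N.
    observed = sum(math.exp(-alpha * rank / n_total) for rank in active_ranks)
    # Expected value of the same sum if the actives were uniformly distributed.
    expected = (n_actives / n_total) * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    return observed / expected


def bedroc(active_ranks, n_total, alpha=20.0):
    """BEDROC obtained by rescaling RIE onto the interval from 0 to 1."""
    ra = len(active_ranks) / n_total  # ratio of actives, R_a
    scale = (ra * math.exp(alpha * ra) * (math.exp(alpha) - 1)) / (
        (math.exp(alpha) - math.exp(alpha * ra)) * (math.exp(alpha * ra) - 1)
    )
    offset = 1 / (1 - math.exp(alpha * (1 - ra)))
    return rie(active_ranks, n_total, alpha) * scale + offset


# Toy ranking: 3 actives among 100 compounds (made-up ranks).
print(bedroc([1, 2, 3], 100))     # all actives at the very top
print(bedroc([5, 40, 90], 100))   # one early active, two late ones
```

The scale and offset terms map the smallest and largest attainable RIE values onto 0 and 1, which is why BEDROC is easier to compare across datasets with different proportions of actives.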

4.7.5. 1% enrichment factor

EF_1% is a measure of the relative occurrence of actives within the top 1% of the ranked list of molecules in comparison to their random distribution (Halgren et al., 2004). The formula for EF_1% is given as,

EF_{1\%} = \frac{a / n}{A / N}

where “a” is the number of actives found in the first 1% (“n” compounds) of a rank‐ordered list containing “A” actives among a total of “N” actives and decoys examined. Note that “n” here denotes the number of compounds in the top 1%, not the number of actives as in the preceding formulas.
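A minimal sketch of the enrichment factor on a ranked list of active/decoy flags is given below. How the top 1% is converted to a whole number of compounds is not specified in this section, so rounding to the nearest integer (with a minimum of one compound) is an assumption of this example, as are the function name and the toy ranking.

```python
def enrichment_factor(is_active, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked list (best score first)."""
    n_total = len(is_active)
    total_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the selected top fraction (rounding is an assumption).
    n_top = max(1, round(fraction * n_total))
    found = sum(1 for flag in is_active[:n_top] if flag)

    # (a / n) / (A / N): hit rate in the top fraction relative to the overall hit rate.
    return (found / n_top) / (total_actives / n_total)


# Toy ranking: 200 compounds, 10 actives, 2 of them in the top 1% (made-up data).
ranked = [True] * 2 + [False] * 190 + [True] * 8
print(enrichment_factor(ranked))
```

An enrichment factor of 1 corresponds to random selection, and the largest possible value is N/A when every compound in the top fraction is an active.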

Supporting information

Table S1. RMSD values obtained by cross‐docking the 21 co‐crystal ligands against each of the 21 proteins.

Table S2. Tanimoto scores between the 21 co‐crystal ligands.

Table S3. RMSD values calculated by comparing the conformational similarity of the binding‐site residues in the 13 crystal structures.

Figure S1. 2D structures of the 21 co‐crystal ligands.


ACKNOWLEDGMENTS

Subhash M. Agarwal would like to thank the Director, NICPR for institutional support.

Das AP, Nandekar P, Mathur P, Agarwal SM. A systematic pipeline of protein structure selection for computer‐aided drug discovery: A case study on T790M/L858R mutant EGFR structures. Protein Science. 2023;32(9):e4740. 10.1002/pro.4740

Review Editor: Nir Ben‐Tal

REFERENCES

Agarwal SM, Nandekar P, Saini R. Computational identification of natural product inhibitors against EGFR double mutant (T790M/L858R) by integrating ADMET, machine learning, molecular docking and a dynamics approach. RSC Adv. 2022;12:16779–16789.
Agarwal SM, Pal D, Gupta M, Saini R. Insight into discovery of next generation reversible TMLR inhibitors targeting EGFR activating and drug resistant T790M mutants. Curr Cancer Drug Targets. 2017;17:617–636.
Anderson AC. The process of structure‐based drug design. Chem Biol. 2003;10:787–797.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242.
Bernardi RC, Melo MCR, Schulten K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim Biophys Acta. 2015;1850:872–877.
Bryan MC, Burdick DJ, Chan BK, Chen Y, Clausen S, Dotson J, et al. Pyridones as highly selective, noncovalent inhibitors of T790M double mutants of EGFR. ACS Med Chem Lett. 2016;7:100–104.
Burgess AW, Cho HS, Eigenbrot C, Ferguson KM, Garrett TPJ, Leahy DJ, et al. An open‐and‐shut case? Recent insights into the activation of EGF/ErbB receptors. Mol Cell. 2003;12:541–552.
Chan BK, Hanan EJ, Bowman KK, Bryan MC, Burdick D, Chan E, et al. Discovery of a noncovalent, mutant‐selective epidermal growth factor receptor inhibitor. J Med Chem. 2016;59:9080–9093.
Cheng Y, Prusoff WH. Relationship between the inhibition constant (K1) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem Pharmacol. 1973;22:3099–3108.
Clark AJ, Tiwary P, Borrelli K, Feng S, Miller EB, Abel R, et al. Prediction of protein‐ligand binding poses via a combination of induced fit docking and metadynamics simulations. J Chem Theory Comput. 2016;12:2990–2998.
Dong J, Cao D‐S, Miao H‐Y, Liu S, Deng B‐C, Yun Y‐H, et al. ChemDes: an integrated web‐based platform for molecular descriptor and fingerprint computation. J Cheminform. 2015;7:60.
Eberhardt J, Santos‐Martins D, Tillack AF, Forli S. AutoDock Vina 1.2.0: new docking methods, expanded force field, and python bindings. J Chem Inf Model. 2021;61:3891–3898.
Empereur‐Mot C, Guillemain H, Latouche A, Zagury J‐F, Viallon V, Montes M. Predictiveness curves in virtual screening. J Cheminform. 2015;7:52.
Engelhardt H, Böse D, Petronczki M, Scharn D, Bader G, Baum A, et al. Start selective and rigidify: the discovery path toward a next generation of EGFR tyrosine kinase inhibitors. J Med Chem. 2019;62:10272–10293.
Fatima S, Agarwal SM. Unraveling structural requirements of amino‐pyrimidine T790M/L858R double mutant EGFR inhibitors: 2D and 3D QSAR study. J Recept Signal Transduct Res. 2018;38:299–306.
Fatima S, Pal D, Agarwal SM. QSAR of clinically important EGFR mutant L858R/T790M pyridinylimidazole inhibitors. Chem Biol Drug Des. 2019;94:1306–1315.
Fusani L, Palmer DS, Somers DO, Wall ID. Exploring ligand stability in protein crystal structures using binding pose metadynamics. J Chem Inf Model. 2020;60:1528–1539.
Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, et al. Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem. 2004;47:1750–1759.
Hanan EJ, Baumgardner M, Bryan MC, Chen Y, Eigenbrot C, Fan P, et al. 4‐Aminoindazolyl‐dihydrofuro[3,4‐d]pyrimidines as non‐covalent inhibitors of mutant epidermal growth factor receptor tyrosine kinase. Bioorg Med Chem Lett. 2016;26:534–539.
Hanan EJ, Eigenbrot C, Bryan MC, Burdick DJ, Chan BK, Chen Y, et al. Discovery of selective and noncovalent diaminopyrimidine‐based inhibitors of epidermal growth factor receptor containing the T790M resistance mutation. J Med Chem. 2014;57:10176–10191.
Heald R, Bowman KK, Bryan MC, Burdick D, Chan B, Chan E, et al. Noncovalent mutant selective epidermal growth factor receptor inhibitors: a lead optimization case study. J Med Chem. 2015;58:8877–8895.
Holton JM, Frankel KA. The minimum crystal size needed for a complete diffraction data set. Acta Crystallogr D Biol Crystallogr. 2010;66:393–408.
Ikeguchi M, Ueno J, Sato M, Kidera A. Protein structural change upon ligand binding: linear response theory. Phys Rev Lett. 2005;94:78102.
Laskowski RA, Swindells MB. LigPlot+: multiple ligand‐protein interaction diagrams for drug discovery. J Chem Inf Model. 2011;51:2778–2786.
Lionta E, Spyrou G, Vassilatis D, Cournia Z. Structure‐based virtual screening for drug discovery: principles, applications and recent advances. Curr Top Med Chem. 2014;14:1923–1938.
Lu X, Yu L, Zhang Z, Ren X, Smaill JB, Ding K. Targeting EGFR(L858R/T790M) and EGFR(L858R/T790M/C797S) resistance mutations in NSCLC: current developments in medicinal chemistry. Med Res Rev. 2018;38:1550–1581.
Lukauskis D, Samways ML, Aureli S, Cossins BP, Taylor RD, Gervasio FL. Open binding pose metadynamics: an effective approach for the ranking of protein‐ligand binding poses. J Chem Inf Model. 2022;62:6209–6216.
Macalino SJY, Gosu V, Hong S, Choi S. Role of computer‐aided drug design in modern drug discovery. Arch Pharm Res. 2015;38:1686–1701.
Mendelsohn J, Baselga J. Status of epidermal growth factor receptor antagonists in the biology and treatment of cancer. J Clin Oncol. 2003;21:2787–2799.
Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem. 2009;30:2785–2791.
Mysinger MM, Carchia M, Irwin JJ, Shoichet BK. Directory of useful decoys, enhanced (DUD‐E): better ligands and decoys for better benchmarking. J Med Chem. 2012;55:6582–6594.
Olayioye MA, Neve RM, Lane HA, Hynes NE. The ErbB signaling network: receptor heterodimerization in development and cancer. EMBO J. 2000;19:3159–3167.
Reynolds CH. Protein–ligand cocrystal structures: we can do better. ACS Med Chem Lett. 2014;5:727–729.
Saini R, Fatima S, Agarwal SM. TMLRpred: a machine learning classification model to distinguish reversible EGFR double mutant inhibitors. Chem Biol Drug Des. 2020;96:921–930.
Shamsara J. CrossDocker: a tool for performing cross‐docking using Autodock Vina. Springerplus. 2016;5:344.
Sharma VK, Nandekar PP, Sangamwar A, Pérez‐Sánchez H, Agarwal SM. Structure guided design and binding analysis of EGFR inhibiting analogues of erlotinib and AEE788 using ensemble docking, molecular dynamics and MM‐GBSA. RSC Adv. 2016;6:65725–65735.
Sheridan RP, Singh SB, Fluder EM, Kearsley SK. Protocols for bridging the peptide to nonpeptide gap in topological similarity searches. J Chem Inf Comput Sci. 2001;41:1395–1406.
Singh H, Singh S, Singla D, Agarwal SM, Raghava GPS. QSAR based model for discriminating EGFR inhibitors and non‐inhibitors using Random forest. Biol Direct. 2015;10:10.
Sogabe S, Kawakita Y, Igaki S, Iwata H, Miki H, Cary DR, et al. Structure‐based approach for the discovery of pyrrolo[3,2‐d]pyrimidine‐based EGFR T790M/L858R mutant inhibitors. ACS Med Chem Lett. 2013;4:201–205.
Triballeau N, Acher F, Brabet I, Pin J‐P, Bertrand H‐O. Virtual screening workflow development guided by the “receiver operating characteristic” curve approach. Application to high‐throughput docking on metabotropic glutamate receptor subtype 4. J Med Chem. 2005;48:2534–2547.
Truchon J‐F, Bayly CI. Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J Chem Inf Model. 2007;47:488–508.
Yadav IS, Nandekar PP, Srivastavaa S, Sangamwar A, Chaudhury A, Agarwal SM. Ensemble docking and molecular dynamics identify knoevenagel curcumin derivatives with potent anti‐EGFR activity. Gene. 2014;539:82–90.
Yadav IS, Singh H, Khan MI, Chaudhury A, Raghava GPS, Agarwal SM. EGFRIndb: epidermal growth factor receptor inhibitor database. Anticancer Agents Med Chem. 2014;14:928–935.
Yun C‐H, Mengwasser KE, Toms AV, Woo MS, Greulich H, Wong K‐K, et al. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. Proc Natl Acad Sci U S A. 2008;105:2070–2075.
Zhang X, Pickin KA, Bose R, Jura N, Cole PA, Kuriyan J. Inhibition of the EGF receptor by binding of MIG6 to an activating kinase domain interface. Nature. 2007;450:741–744.
Zhong L, Li Y, Xiong L, Wang W, Wu M, Yuan T, et al. Small molecules in targeted cancer therapy: advances, challenges, and future perspectives. Signal Transduct Target Ther. 2021;6:201.
