PLOS One
. 2025 Apr 29;20(4):e0319499. doi: 10.1371/journal.pone.0319499

Optimizing chemotherapeutic targets in non-small cell lung cancer with transfer learning for precision medicine

Varun Malik 1, Ruchi Mittal 1, Deepali Gupta 1, Sapna Juneja 2, Khalid Mohiuddin 3, Swati Kumari 4,*
Editor: Ruo Wang
PMCID: PMC12040121  PMID: 40299923

Abstract

Non-small cell lung cancer (NSCLC) accounts for the majority of lung cancer cases, making it one of the most fatal diseases worldwide. Despite advancements in treatment, accurately predicting the survival outcomes of NSCLC patients remains a significant challenge. The difficulty of developing effective drug therapies, which are frequently hampered by severe side effects, drug resistance, and limited effectiveness across diverse patient populations, highlights the complexity of NSCLC. Machine learning (ML) and deep learning (DL) models are beginning to reshape the field of NSCLC drug discovery. These methodologies enable the identification of drug targets and the development of personalized treatment strategies that may improve survival outcomes for NSCLC patients. In this paper, we present a drug discovery model for identifying therapeutic targets using advanced feature extraction and transfer learning. A hybrid UNet transformer is used to extract features from drug and protein sequences, capturing deep features that address the problem of false alarms. For dimensionality reduction, the modified Rime optimization (MRO) algorithm selects the best features from among many candidates. In addition, we design a deep transfer learning (DTransL) model to boost the accuracy of drug discovery for therapeutic targets in NSCLC patients. The proposed model is validated on the Davis, KIBA, and Binding-DB benchmark datasets. Results show that the MRO+DTransL model outperforms existing state-of-the-art models. On the Davis dataset, it achieved an accuracy of 98.398%, outperforming the LSTM model by 9.742%. It reached 98.264% and 97.344% on the KIBA and Binding-DB datasets, respectively, improvements of 8.608% and 8.957% over baseline models.

1. Introduction

Lung cancer is projected to account for 2.2 million new cases and 1.79 million deaths in 2024, making it one of the most common cancers and the leading cause of cancer deaths globally [1]. The two main types of lung cancer are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). Approximately 85% of all lung cancer cases are NSCLC [2]. At present, cisplatin, carboplatin, and oxaliplatin are the three platinum-based medications generally accepted worldwide as first-line therapies for NSCLC. Despite their efficacy, these medicines come at a hefty price: major adverse effects and, even worse, the emergence of drug resistance [3]. Despite advancements in both diagnostic and therapeutic techniques, lung cancer remains the most prevalent cancer worldwide, making the establishment of effective treatments highly challenging. Clinical treatment options for lung cancer include immunotherapy, chemotherapy, and radiotherapy [4]. However, chemotherapy is limited by drug toxicity to healthy tissues, inadequate delivery to target tissues, short treatment cycles, and the need for high drug concentrations. Improving drug delivery to tumor tissues while reducing toxicity during delivery is a key strategy for enhancing treatment efficacy. Although numerous small molecule inhibitors with anti-cancer activity, such as KDM4 inhibitors, glycolysis inhibitors, and kinase inhibitors, have been identified, these therapies are still in the research phase and are not yet applied in clinical settings [5]. Traditional therapeutic molecules, including synthetic drugs, natural compounds, and RNA/DNA inhibitors, often lack the capacity to specifically target tumor cells [6]. An efficient method of drug delivery is critical for the successful treatment of lung cancer. A significant challenge in cancer treatment is the development of mechanisms that decrease the effectiveness of chemotherapeutic medications.
When cancer cells overexpress membrane efflux transporters, such as P-glycoprotein, MRP-1, and BCRP, the result is multidrug resistance (MDR) [7]. The negative effects on patients caused by ABC transporters are frequently linked to the development of resistance to many drugs. Over the years, researchers have developed several P-gp modulators in an effort to reverse MDR [8]. However, all previous generations of P-gp inhibitors have failed to show promise because of their low selectivity and high toxicity. The mainstays of lung cancer treatment, such as surgery, radiation, and chemotherapy, still suffer from adverse effects on normal cells and difficulty preventing the cancer from spreading [9]. Considerations such as the patient's general health, the cancer's stage, and its histological type determine the best course of treatment for lung cancer. These treatments aim to kill cancer cells, but they kill cancer cells and normal cells alike, which can worsen the patient's condition. Novel therapeutic approaches, such as gene therapy, immunotherapy, and targeted therapy, have been created to circumvent these limitations and enhance treatment results [10]. The main objectives of these treatments are to improve patient prognosis and combat cancer spread. On the other hand, drug resistance is still a major problem in treating lung cancer. Cancer cells have developed numerous ways to resist the effects of targeted medicines and chemotherapy despite significant breakthroughs in therapy. The significance of exosomes in drug resistance has been extensively studied. One of these processes involves the transfer of many components to neighboring cells, such as genetic material, lipids, survival signal molecules, nucleic acids, and drug sensitivity proteins [11,12].

By streamlining operations and cutting expenses, artificial intelligence (AI) has a great opportunity to revolutionize cancer care [13,14]. Several areas of cancer treatment have begun to rely heavily on AI, such as imaging, early identification, targeted therapy, and medication repurposing. In spite of all this progress, artificial intelligence still has a long way to go before it can fully synthesize novel anticancer compounds. Important tasks in artificial intelligence are performed by ML and DL, which employ methods such as unsupervised and supervised learning [15]. Supervised learning is used to diagnose diseases and evaluate the effectiveness of drugs, whereas unsupervised learning helps with patient stratification and illness identification [16]. The remarkable ability of deep learning to process enormous amounts of information has caused a revolution in areas like melanoma detection [17]. The generative model is an outstanding example of an unsupervised model; it learns from the training dataset and then generates new data that is very similar to it. The goal of these models is to find patterns and structures in the data so that they can create new samples that closely resemble the originals. Several approaches have been studied for potential application in drug development, such as variational auto-encoders, normalizing flows, generative adversarial networks, and diffusion models. Thanks to transformer architectures in large language models, additional progress in this field is now feasible [18]. Successful application of transformer- and NLP-based models to the drug development challenge has made it possible to capture complex data patterns. Using prognostic elements added to generative models, we may evaluate the anti-tumor activity of the generated compounds [19], which further helps in finding the best candidates [20].

For further enhancement, we present a drug discovery model to identify therapeutic targets for NSCLC using feature extraction and transfer learning techniques. The key contributions of this model are as follows:

  1. Hybrid UNet transformer: Used for feature extraction from drug and protein sequences, this method effectively captures deep features, addressing the challenge of false alarms.

  2. Modified rime optimization (MRO) algorithm: Employed for dimensionality reduction, the MRO algorithm selects the most optimal features from a range of possibilities.

  3. Deep transfer learning (DTransL) model: Enhances the accuracy of drug discovery in detecting therapeutic targets for NSCLC patients.

  4. Validation with benchmark datasets: Our model’s effectiveness is confirmed through validation against benchmark datasets, including Davis, KIBA, and Binding-DB.

The remaining sections of the article are arranged as follows: Section 2 describes the literature review on drug discovery models for cancer treatment, and Section 3 presents the proposed drug discovery model with its feature extraction, dimensionality reduction, and drug discovery stages. Then, Section 4 elaborates on the results comparison, and Section 5 concludes the work.

2. Related works

This section reviews the literature on drug discovery models for cancer treatment using ML/DL methods. Table 1 provides a synopsis of the research gaps highlighted by the current state of the art. One approach to predicting gene mutations in NSCLC patients is to combine a CNN with dense neural networks [21]; a search strategy uses a combination of DL models to categorize individual gene variants, and the model's accuracy for forecasting gene mutations was 94%, which is considered high. A dual network analysis technique [22] combines the strengths of biological mechanism analysis, Bayesian causality networks, and Spearman correlation network analysis to find novel treatment targets in NSCLC. Using DNA methylation and RNA/miRNA sequencing data from HCC patients in the TCGA database, a DL-based survival-sensitive strategy was created [23]. The algorithm was tested with five external datasets from different omics types and successfully separated patients into two optimal groups based on their survival differences. One notable use of AI-based drug discovery networks is the identification of potential repurposed medication therapy for Alzheimer's disease [24]. A drug-target pair (DTP) network takes into account various drug and target characteristics, along with the relationships among DTP nodes; in this framework, drug-target pairs serve as nodes, and the connections among DTP nodes act like the edges in an AD disease network. Cancer target discovery and medication repurposing were both assisted by an AI method [25]. An anticancer medication targeting STK33 was identified using an AI-driven screening approach; this drug induces cell cycle arrest at the S phase and kills cancer cells.

Table 1. Summary of research gaps from existing drug discovery models for cancer treatment.

Ref. | Methodology | ML/DL techniques | Target drugs | Findings | Research gaps
[21] | Gene mutation prediction in lung cancer | SENet154 and ResNet | EGFR, RAS, and ALK | AUC score 94% | The model failed to achieve sufficient test performance due to overfitting.
[22] | Therapeutic target discovery for NSCLC | CNN and RNN | PSMD2, PSMB1 and PSMB5 | MSE 0.456 | Only small and imbalanced datasets are provided.
[23] | Predictive biomarker discovery in cancer | Graph convolutional neural network | Tumor protein 53 (TP53) | Accuracy 97.8% | Handcrafted features are not enough to optimize the targets.
[24] | AI-based drug discovery network (AI-DrugNet) | Drug-target pair (DTP) network | Reactome, KEGG and NCI-Nature | Accuracy 87.958% | Limited by the number of training samples, which limits the target detection rate.
[25] | Drug repurposing and target detection for cancer | LSTM | Antiviral agent Z29077885 | MAE 13.562 | In vitro tests can be time-consuming and costly.
[26] | Detect promising targets for drug therapy | CNN | GPX4, TFR1, SLC7A11, Nrf2 | Accuracy 85.63% | Affected by computational resources and low hit rates.
[27] | Discovery of an OTUD3 inhibitor for NSCLC | RNN and SVM | OTUD3 | Precision 89.635% | Drug discovery is very complex due to the fixed threshold condition.
[28] | Prediction of anticancer drug sensitivity | Visual neural network (VNN) | GDSC and CTRP | AUC 0.923 | Poor performance, e.g., overfitting and producing close analogs.
[29] | Identifying the optimal drug-gene pairing | DL regression model (DRPO) | CCLE and GDSC1 | RMSE 0.39 and NDCG 0.98 | Leads to significant data dimensionality issues.
[32] | Drug discovery for tumor antigenic pathways | GEM-GNN model | VEGFR2 inhibitors | RMSF 0.025 | Difficulty in extracting meaningful insights.

The convergence of many cellular signaling pathways causes ferroptosis, a type of programmed cell death (PCD) first hypothesized in 2012 and, like many PCD modalities, regulated by genes [26]. Using a virtual screen against a diverse library of 100,000 molecules, a drug development model based on an OTUD3 inhibitor found lead compounds for treatment [27]. CADD can be used in almost every step of drug discovery, from identifying and validating targets to discovering and optimizing leads and conducting preclinical studies. By combining the chemical properties of anticancer medications with gene expression, gene mutation, and gene copy number variation in cancer cells, an interpretable DL model (DrugGene) [28] can anticipate drug sensitivity. The chemical and structural properties of medications are captured by an artificial neural network (ANN). By integrating the results of the VNN and ANN into a fully connected layer, DrugGene delivers final drug response predictions. The model improves its accuracy, clarifies how anticancer medications interact with cell lines using a variety of features, and can interpret and apply its projected outcomes. One deep learning model that effectively incorporates genomic and chemical characteristics for IC50 prediction is the DL model DRPO [29]. The drug and cell line are mapped into a latent pharmacogenomics space using matrix factorization; predicted drug responses are then cell-line-specific. Candidate activities were predicted using a GEM model, which includes a graph neural network (GNN) as its core extrapolative process. The structural modeling approach used flexible docking to investigate inhibitor mechanisms and screening data with high affinity.

2.1 Problem description

DoubleSG-DTA is a DL-based model used to predict drug-target affinities by integrating drug sequences, protein sequences, and drug graphs [30]. A graph isomorphism network analyzes molecular assemblies, and a squeeze-and-excitation network augments spatial characteristics. Extensive testing has shown that DoubleSG-DTA outperforms other models and has successfully identified potential high-affinity compounds for NSCLC with the EGFR T790M mutation [31], providing valuable insights for drug discovery. In NSCLC drug discovery, numerous significant challenges persist, particularly within the framework of existing state-of-the-art models [21-29,32]. A major issue is the intrinsic heterogeneity of NSCLC, which complicates the identification of universal therapeutic targets. Overfitting hinders a model's ability to extrapolate to new data, resulting in poor performance. False positives in feature extraction are another major concern, especially with complicated models such as UNet transformers: inaccuracies can flag non-therapeutic targets, resulting in poor treatment. Dimensionality reduction, needed to manage huge datasets, can remove crucial features, reducing predictive accuracy. The intricacy of these models makes it hard for physicians to comprehend and trust their results, delaying adoption in clinical practice. This lack of transparency is especially concerning when models influence high-stakes decisions such as selecting treatment targets for NSCLC patients. Model inconsistency lowers confidence and hinders integration into healthcare workflows. High computing requirements further limit application, especially in resource-constrained contexts. The rapid evolution of NSCLC drug resistance complicates matters: adding new resistance mechanisms to existing models requires continual development, which drains economic and intellectual resources. Improving the dependability of ML/DL models for NSCLC drug development therefore requires creativity and multidisciplinary teamwork.

3. Materials and methods

3.1 Data collection

Public benchmark datasets have been used to verify the proposed model's performance. Davis dataset: over 80% of the mammalian catalytic protein kinome is represented by 72 kinase inhibitors and 442 kinases; it contains 25,772 DTI pairs covering 68 drugs and 379 proteins. KIBA dataset: the KIBA dataset integrates IC50, K(i), and K(d) bioactivity types to create a drug-target bioactivity matrix; it features 117,657 DTI pairs from 2,068 drugs and 229 proteins, used to predict binding affinity from target sequences and compound SMILES strings. BindingDB dataset: this public database focuses on interactions between proteins and small molecules; it includes 41,296 entries for small compounds and 8,810 protein targets with 2,519,702 interaction data points. In addition, it includes 11,442 protein structures with sequence identities of up to 85% and 5,988 protein-ligand crystal structures with binding affinities for proteins with 100% sequence identity. The quantitative description of the datasets is shown in Fig 1.
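For orientation, the reported counts above can be organized programmatically; the dataclass and its field names are illustrative, not from the paper's code.

```python
# Organizing the reported DTI benchmark sizes; structure is our assumption.
from dataclasses import dataclass

@dataclass
class DTIDataset:
    name: str
    n_drugs: int
    n_proteins: int
    n_pairs: int

    @property
    def density(self) -> float:
        """Fraction of the drug x protein grid covered by measured pairs."""
        return self.n_pairs / (self.n_drugs * self.n_proteins)

BENCHMARKS = [
    DTIDataset("Davis", 68, 379, 25_772),
    DTIDataset("KIBA", 2_068, 229, 117_657),
]

for ds in BENCHMARKS:
    print(f"{ds.name}: {ds.n_pairs} pairs, grid density {ds.density:.3f}")
```

One check worth noting: 68 x 379 = 25,772, so the reported Davis counts describe a fully dense drug-protein grid, whereas KIBA covers roughly a quarter of its grid.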

Fig 1. Statistical description of datasets.


Fig 2 shows the architecture of the proposed drug discovery model for identifying therapeutic targets in NSCLC patients using feature extraction and transfer learning. The process begins with the selection of benchmark datasets, including Davis, KIBA, and Binding-DB, which provide drug and protein sequences. These sequences undergo preprocessing to ensure the data is of high quality and suitable for further analysis. The hybrid UNet transformer extracts spatial and residual information from the preprocessed sequences. MRO then reduces the dimensionality of the extracted features, choosing the most important ones to simplify the data and improve model efficiency. After NSCLC-specific therapeutic targets are found, the deep transfer learning (DTransL) method discovers candidate medicines. Compounds predicted against the EGFR T790M mutation, which affects the overall treatment response in NSCLC, are used to validate the model's performance.

Fig 2. Conceptual structure of proposed drug discovery model for identifying therapeutic targets using transfer learning.


3.2 Feature extraction from drug and protein sequences

Feature extraction is used here to locate the significant patterns needed to understand the biological interactions between drugs and proteins. A drug's interaction with a protein target may be influenced by structural properties, functional motifs, binding sites, and other biochemical properties. Effective feature extraction is necessary for building predictive models that can accurately determine the binding affinity, efficacy, and safety of potential drug candidates. The hybrid UNet transformer (HUNet) is designed to extract significant features from complex inputs such as drug and protein sequences [33,34]. From the sequence data, the HUNet model learns spatial features, such as sequence motifs or patterns, that may correspond to drug active sites or functional domains in proteins. HUNet is made up of two sublayers: the feed-forward layer and the self-attention layer. The self-attention layer allows the network to weigh the importance of different parts of the input sequence and incorporate this information into its feature representation. A position embedding provides each token with the positional information crucial for a variety of vision tasks. The final representation is created by combining information from the various tokens via HUNet.

$T = se(P) + pe, \qquad Q_{sa} = F_t\big(rt(\mathrm{ln}(spa(rt(\mathrm{ln}(T)))))\big)$ (1)
$l = le(P) + pe, \qquad Q_{la} = F_l\big(rt(\mathrm{ln}(lpa(rt(\mathrm{ln}(l)))))\big)$ (2)

where T and l are the input voxel patches embedded with the minor and major thresholds, respectively, and $Q_{sa}$ and $Q_{la}$ denote the outputs of the spatial and residual features, respectively. The embedding and positional-encoding mechanisms learn the small and large voxel patches in conjunction with the layer regularization mechanism.

$Z = \mathrm{concatenate}(Q_{sa}, Q_{la}), \qquad W = r\big(F(\mathrm{ln}(cpa(rt(\mathrm{ln}(Z)))))\big)$ (3)

where Z denotes the concatenated optimal solution set. HUNet training monitors the trajectories of initial efficiency by associating the output probabilities with the prior knowledge of the encoder-decoder and self-improvement. The cross-entropy loss relates the distribution $x_{dh}$ over integers H and type label D to the probability of the computed output $y_{dh}$. The cross-entropy loss $l_{ce}$ is written as follows.

$l_{ce} = -\frac{1}{B}\sum_{d}\sum_{h} x_{dh}\,\log(y_{dh})$ (4)

The soft dice loss $l_{dice}$ uses the dice score via SoftMax and is described as follows.

$l_{dice} = 1 - \frac{1}{B}\sum_{d}\frac{2\sum_{h} y_{dh}\,x_{dh}}{\sum_{h}\left(y_{dh}+x_{dh}\right)}$ (5)

where $y_{dh}$ is the probability of the forecast type, $x_{dh}$ is the probability of the real type at index d, and B signifies the number of batches. A self-distillation mechanism is selected as the loss to match the probability distribution $x_{cls}$ of the small-patch solution and the probability $y_{cls}$ of the large-patch solution. The distillation loss $l_{TT}$ between the probability output $x_{cls}$ from the minor optimal solution and the probability output $y_{cls}$ from the major optimal solution is written as follows.

$l_{TT} = -\frac{1}{B}\sum_{g} x_{cls}^{g}\,\log\!\left(y_{cls}^{g}\right)$ (6)

where g indexes the components of the projection layer's softmax output following the optimal solution. The cumulative loss (l) is computed as follows.

$l = l_{ce} + \lambda_{dice}\,l_{dice} + \lambda_{TT}\,l_{TT}$ (7)

where $\lambda_{dice}$ and $\lambda_{TT}$ are the weighting factors for the dice and self-distillation losses; $\lambda_{dice}$ is analytically found to be 0.5 at the maximum threshold and 0 at the minimum threshold. The steps involved in feature extraction using HUNet are summarized in Algorithm 1.
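The combined objective of Eq. (7) can be sketched in a few lines of numpy. The array shapes, the epsilon for numerical stability, and the default weights below are our assumptions for illustration, not the paper's implementation.

```python
# Numpy sketch of the cumulative HUNet loss of Eq. (7).
import numpy as np

EPS = 1e-12  # numerical-stability constant (our assumption)

def cross_entropy(y, x):
    """Eq. (4): y = predicted probabilities (B x H), x = one-hot targets."""
    return -np.sum(x * np.log(y + EPS)) / y.shape[0]

def soft_dice(y, x):
    """Eq. (5): one minus the mean soft dice score over the batch."""
    inter = np.sum(y * x, axis=1)
    denom = np.sum(y + x, axis=1)
    return 1.0 - np.mean(2.0 * inter / (denom + EPS))

def distill(y_cls, x_cls):
    """Eq. (6): cross-entropy between small-patch and large-patch outputs."""
    return -np.sum(x_cls * np.log(y_cls + EPS)) / y_cls.shape[0]

def total_loss(y, x, y_cls, x_cls, lam_dice=0.5, lam_tt=1.0):
    """Eq. (7): weighted sum of the three terms."""
    return (cross_entropy(y, x)
            + lam_dice * soft_dice(y, x)
            + lam_tt * distill(y_cls, x_cls))
```

With perfect predictions all three terms vanish, so the total loss is approximately zero, which is a quick sanity check on the reconstruction.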

Algorithm 1. Feature extraction by HUNet.

Input: Set of drug sequences, protein sequences, threshold set
Output: Extracted spatial and residual features
1. Begin
2. Initialize the population
3. Define $Q_{sa}$ and $Q_{la}$ for each solution
4. Compute the cross-entropy loss $l_{ce} = -\frac{1}{B}\sum_{d}\sum_{h} x_{dh}\log(y_{dh})$
5. Compute the soft dice loss for each known feature $l_{dice} = 1 - \frac{1}{B}\sum_{d}\frac{2\sum_{h} y_{dh}x_{dh}}{\sum_{h}(y_{dh}+x_{dh})}$
6. Update the threshold mechanism to compute the cumulative loss $l = l_{ce} + \lambda_{dice}l_{dice} + \lambda_{TT}l_{TT}$
7. Find the best output level
8. Stop

3.3 Dimensionality reduction using MRO algorithm

In drug discovery, especially when dealing with complex biological data such as drug and protein sequences, the number of features can become extremely large. These features might include various chemical properties of drugs, amino acid sequences of proteins, structural motifs, binding affinities, and more. As the number of features increases, so does model complexity, which can lead to overfitting. Overfitting occurs when a model performs well on training data but poorly on unseen data because it has learned noise rather than the underlying pattern. High-dimensional data also causes computational inefficiency: processing and analyzing a large number of features requires more computational power and time, which can be prohibitive in large-scale studies. To address these challenges, dimensionality reduction is used, which aims to reduce the number of features while preserving the most critical information needed by the model. The modified rime optimization (MRO) algorithm performs dimensionality reduction by selecting the most relevant features from the full set. RIME is inspired by the natural process in which moisture vapor in the atmosphere freezes below zero degrees, driven by factors such as humidity, wind speed, and temperature [35,36]. The MRO algorithm begins with the set of possible features. It then evaluates different subsets, iteratively discarding irrelevant features and retaining the most significant ones. The process continues until the optimal feature set is detected, where further reduction would lose critical information. We define the initial rime population as follows.

$R = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1g} \\ p_{21} & p_{22} & \cdots & p_{2g} \\ \vdots & \vdots & \ddots & \vdots \\ p_{h1} & p_{h2} & \cdots & p_{hg} \end{bmatrix}$ (8)

where R is the rime population matrix, with $p_{hg}$ representing the position of the h-th rime particle in dimension g. Soft rime grows irregularly, covering a wide area in windy environments, but grows slowly in a consistent direction. A soft-rime search strategy is used to rapidly explore the algorithmic search space and escape local optima through irregular rime growth.

$R_{hg}^{New} = R_{Best,g} + r_1\cdot\cos\!\left(\frac{s\,\pi}{10\,S}\right)\cdot\left(1-\frac{\mathrm{Round}(5s/S)}{5}\right)\cdot\big(h\,(un_{hg}-ln_{hg})+ln_{hg}\big), \quad r_2 < e$ (9)

where $R_{hg}^{New}$ is the state of the h-th rime particle in dimension g updated by the soft-rime search strategy, $R_{Best,g}$ is the g-th dimension of the best individual in the population, $r_1$ and $r_2$ are random numbers between 0 and 1, h is the adhesion degree of the soft rime, $un_{hg}$ and $ln_{hg}$ respectively denote the upper and lower limits of the search space, s is the current iteration number, S is the maximum number of iterations, and the coupling coefficient e affects the attachment probability.

$e = \sqrt{s/S}$ (10)

Strong winds cause rapid growth of hard rime in a coordinated direction. Rime growing in one direction is easy to cross, i.e., rime puncture. During the solid growth stage, hard rime increases and absorption becomes more likely. To increase the algorithm's convergence, enable particle exchange, and enhance its capacity to escape local optima, a hard-rime puncture technique is utilized.

$R_{hg}^{New} = R_{Best,g}, \quad r_3 < F^{norm}(r)$ (11)

where $R_{hg}^{New}$ is the state of the rime particle updated by the hard-rime puncture mechanism, $r_3$ is a random number between 0 and 1, and $F^{norm}(r)$ represents the normalized fitness value. Because individual states change during the update, agents may end up worse off than before; a greedy selection step therefore keeps the better solution.

$f(R_h) = f(R_h^{New}),\; R_h = R_h^{New}, \quad \text{if } f(R_h^{New}) < f(R_h)$ (12)
$f(R_{Best}) = f(R_h^{New}),\; R_{Best} = R_h^{New}, \quad \text{if } f(R_h^{New}) < f(R_{Best})$ (13)

where $f(R_h)$ characterizes the fitness level of the agent before updating, $f(R_h^{New})$ is the updated fitness level, $f(R_{Best})$ is the global optimal fitness, $R_h$ signifies the position before updating, and $R_{Best}$ denotes the global optimal position. The population values are further perturbed by a tent-style chaotic mapping.

$p_{b+1} = \begin{cases} p_b/\alpha, & p_b \in [0,\alpha) \\ (1-p_b)/(1-\alpha), & p_b \in [\alpha,1) \end{cases}$ (14)

where α is set to 1.1, $p_b$ is the value of the b-th mapping, and $p_{b+1}$ denotes the value of the (b+1)-th mapping. To maintain a healthy equilibrium between exploration and exploitation, the iterative method progressively zeroes in on the most promising region.
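The piecewise map of Eq. (14) is a tent-style chaotic map. A minimal sketch follows; note that the paper reports α = 1.1 while tent maps classically use α in (0, 1), so the demo value 0.7 below is purely our illustrative assumption.

```python
# Tent chaotic map of Eq. (14), used to diversify the population values.
# alpha = 0.7 is an assumption for the demo, not the paper's setting.
def tent_map(p: float, alpha: float = 0.7) -> float:
    return p / alpha if p < alpha else (1.0 - p) / (1.0 - alpha)

def chaotic_sequence(p0: float, n: int, alpha: float = 0.7) -> list:
    """Iterate the map n-1 times starting from p0."""
    seq = [p0]
    for _ in range(n - 1):
        seq.append(tent_map(seq[-1], alpha))
    return seq

seq = chaotic_sequence(0.31, 5)
```

Every iterate stays in [0, 1], which is what makes the map convenient for seeding normalized candidate positions.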

$R_{hg}^{New} = R_{Best,g} + P_a\,R_{hg}$ (15)

where $R_{hg}^{New}$ is the rime particle position after the soft-rime exploration update, $R_{hg}$ characterizes the particle position before the update, $R_{Best,g}$ signifies the g-th dimension of the best individual in the population, and $P_a$ denotes the average of the positions of the first M agents.

$P_a = \mathrm{Mean}(R_M)$ (16)

where $R_M$ denotes the positions of the first M agents, and M is a random integer from 2 to B, where B represents the rime population size. Ln and Un denote the lower and upper limits, and i and $i_1$ respectively denote the heights of the object and the image in the lens-imaging opposition model.

$\dfrac{(Ln+Un)/2 - p}{p_1 - (Ln+Un)/2} = \dfrac{i}{i_1}$ (17)

where p and $p_1$ represent the original and opposite positions of the subject; assuming $K = i/i_1$,

$p_1 = \dfrac{Ln+Un}{2} + \dfrac{Ln+Un}{2K} - \dfrac{p}{K}$ (18)

When K = 1, this reduces to the general opposition rule built on the average of the minimum and maximum bounds; K is adaptively computed as follows.

$K = \left(1 + \left(\frac{s}{S}\right)^{0.5}\right)^{10}$ (19)

When the present position exceeds the limits, the position is recomputed from the average as follows.

$R_{hg}^{New} = \big(\mathrm{Mean}(R) + un_{hg}\big)/2$ (20)
$R_{hg}^{New} = \big(\mathrm{Mean}(R) + ln_{hg}\big)/2$ (21)

where Mean(R) is the mean position of the search agents. This enhances the overall performance of the model, making it efficient and reliable for discovering therapeutic targets in NSCLC. Algorithm 2 describes the working steps of feature optimization using MRO.

Algorithm 2. Feature optimization using MRO.

Input: Set of features, maximum iterations, and stop criteria
Output: Optimal feature selection
1. Initialize the population
2. Define the attachment coefficient as the iterations increase, $e = \sqrt{s/S}$
3. While s < S + 1
4.   If (Rand < e)
5.     Compute maximum efficiency with the K-metric $K = (1 + (s/S)^{0.5})^{10}$
6.     Update the displacement of particles $R_{hg}^{New} = R_{Best,g},\ r_3 < F^{norm}(r)$
7.     Fix the minimum and maximum threshold mechanism $p_{b+1} = \begin{cases} p_b/\alpha, & p_b \in [0,\alpha) \\ (1-p_b)/(1-\alpha), & p_b \in [\alpha,1) \end{cases}$
8.     Find the best convergence speed in rime
9.   End if
10. End while
11. Update the final level
12. End

3.4 Drug discovery for therapeutic targets in NSCLC patients

Drug discovery is used to recognize new medications that can effectively target specific diseases or conditions. For NSCLC, one of the most common and deadly forms of lung cancer, drug discovery focuses on identifying compounds that target cancer cells while minimizing harm to healthy cells. Given the complexity of cancer biology, especially in NSCLC, where patients often exhibit diverse genetic mutations and varying responses to treatment, the discovery of effective therapeutic targets is both challenging and critical. In this context, deep transfer learning (DTransL) enhances the efficiency of identifying potential therapeutic targets. By using transfer learning, the DTransL model speeds up the drug discovery process [37]. A feature set consists of an attribute space P and a marginal likelihood distribution x(p). A task S = {Q, F} consists of an output space Q and an objective prediction mechanism F, which is not directly observable but can be learned from the training data. The information is held in pairs (p, q), with p belonging to P and q to Q. We define the pre-trained model with source domain $C_T$ and source task $S_T$, and a fine-tuned model with target domain $C_S$ and target task $S_S$. The objective mechanisms of $C_T$, $S_T$, $C_S$, and $S_S$ are defined as follows.

$C_T = \{P_T,\, x(\xi_T)\},\ \xi_T \in P_T; \qquad S_T = \{Q_T, F_T\}, \quad F_T: P_T \to Q_T,\ q_T = F_T(\xi_T) \in Q_T$ (22)
$C_S = \{P_S,\, x(\xi_S)\},\ \xi_S \in P_S; \qquad S_S = \{Q_S, F_S\}, \quad F_S: P_S \to Q_S,\ q_S = F_S(\xi_S) \in Q_S$ (23)

where $P_T$, $Q_T$, $P_S$, and $Q_S$ represent the feature space of the target, the output space of the target, the feature space of the source, and the output space of the source domain, respectively. Likewise, $p_t$ and $p_s$ signify the input feature vectors of the source and target fields, respectively. The target model, with a less complex design, is computed by fine-tuning the source model with $S_S = S_T$.

$P = [p_1^S; \ldots; p_{k_S}^S], \quad p_K = [x_T^S,\, x_{C_k}^S,\, x_{H_1}^S, \ldots, x_{H_H}^S,\, x_D^S,\, f,\, A,\, a_{TD},\, a_{DC_K},\, L_{TD},\, L_{DC_K},\, B]$ (24)

where b denotes the instance index and N is the number of samples. The source model includes convolutional, pooling, fully connected, and output layers, and is structured as follows.

$X^S(P;\theta^{src}) = \big[F_{Fc6}^{11}(\theta_{Fc6}^{src}) \circ F_{Fc5}^{10}(\theta_{Fc5}^{src}) \circ F_{Fc4}^{9}(\theta_{Fc4}^{src}) \circ F_{Fc3}^{8}(\theta_{Fc3}^{src}) \circ F_{Fc2}^{7}(\theta_{Fc2}^{src}) \circ F_{Fc1}^{6}(\theta_{Fc1}^{src}) \circ F_{Flat}^{5} \circ F_{x2}^{4} \circ F_{d3}^{3}(\theta_{d2}^{src}) \circ F_{x1}^{2} \circ F_{d1}^{1}(\theta_{d1}^{src})\big](P)$ (25)

where $(F_2(\theta_2)\circ F_1(\theta_1))(p) = F_2(F_1(p;\theta_1);\theta_2)$, and $\theta_{lay}$ represents the weights of layer lay. The next step is the convolution operation, using a kernel applied to the input data. Given $D_L$ kernels, we describe each L-th kernel as a matrix $Z_s^{(L)d}$ with $X_L$ rows and $Y_L$ columns. The feature map $W_L^d$ is obtained as follows.

$W_L^d = c_d^L + Z_s^{(L)d} * M_s^{(L-1)d}$ (26)

An element-wise quadratic activation is applied to introduce nonlinearity after the convolution operation. The d-th channel of $M_L$ is computed through the activation mechanism.

$M_L = \varphi\big(D_s^L(W_L)\big)$ (27)

Here, the parameters of the Conv2D-s layer are denoted as follows.

$\theta_{D_s}^{src} = \{l_s^L,\, Z_s^L\}$ (28)

The max-pooling operation reduces the spatial dimension of the feature map while retaining its most vital characteristics. Let $M_L = F_{x_s}^L(M_{L-1})$ be the output of the t-th max-pooling layer.

$M^{(L)d} = \max_{1 \le U \le X_L}\ \max_{1 \le V \le Y_L}\ \bar{m}^{(L-1)d}_{\,t(h-1)+U,\ t(g-1)+V}$ (29)

where $X_L$ and $Y_L$ specify the kernel size and t is the stride. Following the flatten layer, the fully connected layer computes the activation mechanism after a linear transformation of the flattened output of the previous layer.
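Eq. (29) is standard max pooling; below is a small worked example, assuming a 2x2 kernel with stride 2 (our choice for illustration, not a setting stated in the paper).

```python
# Worked example of the max pooling in Eq. (29): each output cell is the
# maximum over one non-overlapping 2x2 window of the input feature map.
import numpy as np

def max_pool2d(m: np.ndarray, k: int = 2, t: int = 2) -> np.ndarray:
    rows, cols = m.shape
    out = np.empty((rows // t, cols // t))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = m[i * t:i * t + k, j * t:j * t + k].max()
    return out

m = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [3, 4, 6, 5]], dtype=float)
pooled = max_pool2d(m)  # [[6., 4.], [7., 9.]]
```

The 4x4 map shrinks to 2x2 while each window's strongest response survives, which is the "retain the most vital characteristics" behavior described above.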

$m_L = \varphi\big(F_{c_s}^L(\psi_L)\big)$ (30)
$w_L = n_s^L + Z_s^L\, m_{L-1}$ (31)

where $Z_s^L \in \mathbb{R}^{c(L-1)\times c(L)}$ is the weight and $n_s^L \in \mathbb{R}^{c(L)}$ is the bias of the s-th fully connected layer. The new fine-tuned target model can be expressed as follows.

$F^S(P;\theta^{F_s}) = \big[\underbrace{F_{out}^{13} \circ F_{Fc7}^{12}(\theta_{Fc7}^{F_s})}_{\text{new layer}} \circ \underbrace{F_{Fc6}^{11}(\theta_{Fc6}^{src}) \circ \cdots \circ F_{Fc_s}^{L_s}(\theta_{Fc_s}^{src})}_{\text{trainable layers}} \circ \underbrace{F_{Freeze}(\tilde{\theta}_{freeze}^{src})}_{\text{frozen layers}}\big](P)$ (32)

where $L_s$ and $s$ are the indices of the last trainable FC layer, $F_{Freeze}$ represents the non-trainable layers, $\tilde{\theta}^{src}_{freeze}$ are the non-trainable parameters of the source model (frozen up to a maximum threshold), and $\theta^{Fs} \triangleq \{\theta^{Fs}_{Fc7}, \theta^{Fs}_{Fc6}, \ldots, \theta^{src}_{Fcs}\}$ are the trainable parameters. The steps involved in the drug discovery to identify the therapeutic targets in NSCLC patients using DTransL are given in Algorithm 3.
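A minimal sketch of the selective update behind Eq. (32): the frozen source parameters stay untouched while only the new or unfrozen layers move along their gradients. The layer names, shapes, and gradient values below are illustrative assumptions, not the trained DTransL network.

```python
import numpy as np

def sgd_step(params, grads, lr, frozen):
    """One fine-tuning step: parameters listed in `frozen` stay fixed;
    only the trainable (new or unfrozen) layers are updated."""
    return {name: value if name in frozen else value - lr * grads[name]
            for name, value in params.items()}

# Source-model parameters reused by the target model (placeholder values).
params = {"conv_d1": np.ones(3), "fc6": np.ones(3), "fc7_new": np.zeros(3)}
grads = {name: np.full(3, 0.5) for name in params}
frozen = {"conv_d1"}  # theta_freeze: non-trainable source layers

params = sgd_step(params, grads, lr=0.1, frozen=frozen)
```

After the step, `conv_d1` is unchanged while `fc6` and `fc7_new` have moved by `lr * grad`, mirroring the split between frozen and trainable parameter sets in Eq. (32).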

Algorithm 3. Drug discovery for NSCLC patients using DTransL.

Input: Set of best optimal features, threshold set, and maximum iterations
Output: New drug model
1. Begin;
2. Initialize the step sizes of the population
3. While loss has not converged do
4. Define the feature matrix for the objective function:
$P = [p_1^S; \ldots; p_{k_S}^S], \; p_k \triangleq [x_T^S, x_{C_k}^S, x_{H_1}^S, \ldots, x_{H_H}^S, x_D^S, f, A, a_{TD}, a_{DC_k}, L_{TD}, L_{DC_k}, B]$
5. Fix the threshold set using $W^{(L)d} = c_d^{(L)} + Z_s^{(L)d} * M_s^{(L-1)d}$
6. Set the parameters of the Conv2D-s layer: $\theta_{Ds}^{src} = \{l_s^{(L)}, Z_s^{(L)}\}$
7. Compute the output of the $t$-th max-pooling layer:
$M^{(L)d} = \max_{1 \le U \le X_L} \max_{1 \le V \le Y_L} \bar{m}^{(L-1)}_{t(h-1)+U,\, t(g-1)+V,\, d}$
8. Apply the activation function $\varphi_{Fcs}^{(L)}$ on the preceding layer's output: $m^{(L)} = \varphi_{Fcs}^{(L)}(w^{(L)})$
9. End while
10. Update all layers
11. End
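As a concrete illustration of the layer composition in Eqs. (25)–(31), the sketch below builds a tiny model from composable layer functions. The 1-D setting, layer sizes, and weights are illustrative assumptions, not the actual source network.

```python
import numpy as np

def conv1d(kernel, bias):
    """Convolution layer F_d(theta_d) as in Eq. (26): theta_d = (kernel, bias)."""
    def layer(x):
        k = len(kernel)
        return bias + np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])
    return layer

def relu(x):
    """Element-wise activation, the role of phi in Eq. (27)."""
    return np.maximum(x, 0.0)

def max_pool(size):
    """1-D max pooling as in Eq. (29), with stride equal to the window size."""
    def layer(x):
        n = len(x) // size
        return x[:n * size].reshape(n, size).max(axis=1)
    return layer

def dense(weight, bias):
    """Fully connected layer as in Eqs. (30)-(31): m = phi(n + Z m_prev)."""
    def layer(x):
        return relu(bias + weight @ x)
    return layer

def compose(*layers):
    """F_n o ... o F_1 as in Eq. (25): the right-most layer is applied first."""
    def model(x):
        for layer in reversed(layers):
            x = layer(x)
        return x
    return model

# Tiny model: conv -> activation -> pool -> dense (flattening is implicit in 1-D).
model = compose(
    dense(np.ones((1, 3)), 0.0),        # FC head
    max_pool(2),                        # max-pooling layer
    relu,                               # element-wise activation
    conv1d(np.array([1.0, 1.0]), 0.0),  # convolution with a length-2 kernel
)
out = model(np.arange(8.0))
```

Feeding the sequence `[0, 1, ..., 7]` through this pipeline applies the convolution, activation, pooling, and dense layers right-to-left, exactly the composition order of Eq. (25).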

4. Results and discussion

This section presents the results and comparative analysis of drug discovery models for identifying therapeutic targets in NSCLC patients. The performance is validated on benchmark datasets, including Davis, KIBA, and Binding-DB. The results of the proposed MRO+DTransL model are compared with existing state-of-the-art models such as random forest (RF) [38], support vector machine (SVM), feedforward neural network (FNN) [39], KronRLS [40], SimBoost [41], DeepDTA [42], DeepCDA [43], MATT-DTI [44], AttentionDTA [45], DMIL-PPDTA [46], GraphDTA [47] and DoubleSG-DTA [31]. The performance of the proposed and existing models is validated using metrics such as the concordance index, mean square error, regression towards the mean, and Pearson correlation.
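The four evaluation metrics can be computed with a short script. The implementations below follow standard definitions from the drug-target affinity literature; in particular, the "regression towards the mean" ($r_m^2$) formulation shown is one common variant and is our assumption, not a definition taken from this paper.

```python
import numpy as np

def concordance_index(y_true, y_pred):
    """Fraction of correctly ordered pairs among pairs with distinct true affinities."""
    pairs, concordant = 0, 0.0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:
                pairs += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    return concordant / pairs

def mse(y_true, y_pred):
    """Mean square error."""
    y, p = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y - p) ** 2))

def pearson(y_true, y_pred):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rm_squared(y_true, y_pred):
    """r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|)), with r0^2 fitted through the origin."""
    y, p = np.asarray(y_true, float), np.asarray(y_pred, float)
    r2 = pearson(y, p) ** 2
    k = np.sum(y * p) / np.sum(p * p)
    r0_2 = 1.0 - np.sum((y - k * p) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2 * (1.0 - np.sqrt(np.abs(r2 - r0_2)))
```

For a perfectly ordered prediction the concordance index is 1.0; the other metrics behave as expected on small hand-checked inputs.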

4.1 Error measure comparison

Table 2 depicts the quantitative results of the proposed MRO+DTransL model and previously studied models on the Davis dataset. The performance of the proposed MRO+DTransL model on the Davis dataset demonstrates significant improvements across all error measures compared with other state-of-the-art methods. In terms of the concordance index, the MRO+DTransL model achieved a value of 0.955, representing a 5.88% increase over the DoubleSG-DTA model, which had the highest concordance index among existing methods. Regarding mean square error (MSE), the MRO+DTransL model recorded a value of 0.198, which is 9.59% lower than the MSE of the DoubleSG-DTA model, indicating a reduction in prediction error. For regression towards the mean, the MRO+DTransL model scored 0.898, a 23.86% improvement over the 0.725 score of the DoubleSG-DTA model. This highlights the model's enhanced capacity to make predictions closer to the mean, reducing bias in the predictions. Finally, in terms of Pearson correlation, the MRO+DTransL model achieved a value of 0.966, marking a 13.39% increase over the 0.852 Pearson correlation of the DoubleSG-DTA model. This indicates a stronger linear relationship between the predicted and actual values, demonstrating the MRO+DTransL model's superior predictive accuracy. The MRO+DTransL model outperforms previous methods, demonstrating its effectiveness in drug-target interaction prediction.

Table 2. Error measure comparison for Davis dataset.

Methods Concordance index Mean square error Regression towards mean Pearson correlation
RF 0.854 0.359 0.549 0.512
SVM 0.857 0.383 0.513 0.547
FNN 0.893 0.244 0.685 0.589
KronRLS 0.871 0.379 0.407 0.610
SimBoost 0.872 0.282 0.644 0.632
DeepDTA 0.878 0.261 0.631 0.655
DeepCDA 0.891 0.248 0.649 0.685
MATT-DTI 0.891 0.227 0.683 0.706
AttentionDTA 0.887 0.245 0.657 0.725
DMIL-PPDTA 0.880 0.223 0.642 0.785
GraphDTA 0.881 0.250 0.688 0.815
DoubleSG-DTA 0.902 0.219 0.725 0.852
MRO+DTransL 0.955 0.198 0.898 0.966
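The percentage gains quoted in the text can be reproduced directly from Table 2. A small helper makes the convention explicit: relative gain for higher-is-better metrics, relative drop for lower-is-better ones (shown here for the DoubleSG-DTA versus MRO+DTransL rows).

```python
def pct_gain(baseline, proposed):
    """Relative improvement (%) for higher-is-better metrics."""
    return (proposed - baseline) / baseline * 100.0

def pct_drop(baseline, proposed):
    """Relative reduction (%) for lower-is-better metrics such as MSE."""
    return (baseline - proposed) / baseline * 100.0

# Table 2 (Davis): DoubleSG-DTA vs. MRO+DTransL
ci_gain = pct_gain(0.902, 0.955)   # concordance index -> ~5.88%
mse_drop = pct_drop(0.219, 0.198)  # mean square error -> ~9.59%
```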

Table 3 depicts the quantitative results of the proposed MRO+DTransL model and previously studied models on the KIBA dataset. The MRO+DTransL model achieved a concordance index of 0.936, representing a 4.46% increase over the DoubleSG-DTA model, which previously held the highest concordance index at 0.896. In terms of mean square error, the MRO+DTransL model recorded a value of 0.112, which is 18.84% lower than the MSE of the DoubleSG-DTA model (0.138). This indicates a significant reduction in prediction error and demonstrates the model's enhanced accuracy. For regression towards the mean, the MRO+DTransL model achieved a value of 0.902, a 14.61% improvement over the DoubleSG-DTA model's 0.787. This improvement underscores the model's superior capacity to make predictions closer to the mean, reducing bias in the results. Lastly, the Pearson correlation for the MRO+DTransL model reached 0.925, marking a 3.47% increase over the 0.894 correlation achieved by the DoubleSG-DTA model. This highlights the stronger linear relationship between the predicted and actual values, further solidifying the MRO+DTransL model's effectiveness in drug-target interaction prediction. The MRO+DTransL model demonstrates consistent and substantial improvements across all error metrics, establishing its superiority over existing methods.

Table 3. Error measure comparison for KIBA dataset.

Methods Concordance index Mean square error Regression towards mean Pearson correlation
RF 0.837 0.245 0.581 0.522
SVM 0.799 0.308 0.513 0.562
FNN 0.818 0.216 0.659 0.585
KronRLS 0.782 0.411 0.342 0.599
SimBoost 0.836 0.222 0.629 0.612
DeepDTA 0.863 0.194 0.673 0.625
DeepCDA 0.889 0.176 0.682 0.644
MATT-DTI 0.889 0.151 0.756 0.689
AttentionDTA 0.882 0.162 0.735 0.712
DMIL-PPDTA 0.881 0.147 0.784 0.735
GraphDTA 0.891 0.139 0.721 0.798
DoubleSG-DTA 0.896 0.138 0.787 0.894
MRO+DTransL 0.936 0.112 0.902 0.925

Table 4 depicts the quantitative results of the proposed MRO+DTransL model and previously studied models on the Binding-DB dataset. The MRO+DTransL model achieved a concordance index of 0.948, representing a 9.98% increase over the DoubleSG-DTA model, which previously held the highest concordance index at 0.862. This improvement indicates a marked enhancement in the consistency of predictions. In terms of MSE, the MRO+DTransL model recorded a value of 0.365, a 31.51% reduction compared with the DoubleSG-DTA model's MSE of 0.533. This reduction in prediction error underscores the improved accuracy and precision of the MRO+DTransL model in identifying drug-target interactions. For regression towards the mean, the MRO+DTransL model achieved a value of 0.899, demonstrating a 23.82% improvement over the DoubleSG-DTA model's 0.726. This highlights the model's superior ability to generate predictions closer to the actual mean, reducing the likelihood of biased outcomes. The Pearson correlation for the MRO+DTransL model reached 0.963, marking an 11.09% increase compared with the 0.867 correlation achieved by the DoubleSG-DTA model. This significant improvement in Pearson correlation suggests a stronger linear relationship between the predicted and actual binding affinities, indicating that the MRO+DTransL model provides more accurate predictions. The MRO+DTransL model exhibits consistent improvements across all evaluated parameters on all three datasets, clearly establishing its dominance over previous methodologies.

Table 4. Error measure comparison for Binding-DB dataset.

Methods Concordance index Mean square error Regression towards mean Pearson correlation
KronRLS 0.815 0.939 0.526 0.612
DeepDTA 0.826 0.703 0.669 0.635
DeepCDA 0.822 0.844 0.631 0.698
AttentionDTA 0.852 0.603 0.703 0.745
GraphDTA 0.855 0.593 0.682 0.789
DoubleSG-DTA 0.862 0.533 0.726 0.867
MRO+DTransL 0.948 0.365 0.899 0.963

Fig 3 presents the performance metrics (accuracy, precision, recall, and F-measure) across different epochs (100–1000). Accuracy shows a general upward trend, peaking at epoch 400 (98.607%) before stabilizing with slight fluctuations around 98.4%. Precision remains relatively stable throughout, with a slight increase toward the later epochs. Recall, however, experiences a gradual decline from the start, reflecting a slight decrease in model performance as training progresses. F-measure shows a similar pattern, initially increasing before stabilizing and gradually decreasing in the later epochs. These trends indicate that while the model shows strong overall performance, the slight decreases in recall and F-measure suggest potential areas for improvement in the later stages of training.

Fig 3. Comparison of drug discovery models on Davis dataset.


4.2 Ablation study-1 results analysis with varying epochs of training data

As shown in Fig 3, the comparison of drug discovery models on the Davis dataset across different epochs reveals improvements in key performance metrics, including accuracy, precision, recall, and F-measure. Starting with accuracy, the model improves from 97.989% at 200 epochs to 98.958% at 1000 epochs, a 0.99% improvement, indicating that the model becomes more accurate as it continues to learn over more epochs. Precision also shows a positive trend, rising from 96.523% at 200 epochs to 96.959% at 1000 epochs; this 0.45% increase suggests that the model's capacity to correctly identify true positives improves with longer training. Recall follows a similar trajectory, starting at 95.325% at 200 epochs and reaching 95.989% at 1000 epochs; the improvement indicates a greater ability to recognize all relevant cases throughout training. The F-measure, which combines precision and recall, improves by 0.57%, from 95.920% at 200 epochs to 96.472% at 1000 epochs, growing as the model becomes better at identifying and predicting drug-target interactions. Continued training up to 1000 epochs improves all key performance measures, strengthening the model's ability to accurately and consistently predict interactions between drugs and targets on the Davis dataset.

Fig 4 presents the comparison of drug discovery models on the KIBA dataset. Accuracy shows a steady increase throughout the epochs, reaching a peak of 98.838% at epoch 1000, reflecting consistent improvement in model performance. Precision follows a similar upward trend, gradually improving from 95.982% at epoch 100 to 97.678% at epoch 1000, demonstrating an overall enhancement in the model’s ability to correctly identify positive instances. Recall shows a steady increase, peaking at 97.285% at epoch 1000, indicating the model’s growing sensitivity to positive instances. F-measure also improves consistently, with a slight increase from epoch 100 to epoch 1000, confirming that the model maintains a balanced performance in both precision and recall. The trends suggest that the model performs optimally throughout the training process, with continuous improvement in all metrics.

Fig 4. Comparison of drug discovery models on KIBA dataset.


As seen in Fig 4, the drug discovery model on the KIBA dataset improves in key performance indicators over time. Accuracy improves by 1.13%, from 97.856% at 200 epochs to 98.958% at 1000 epochs; as the number of training epochs rises, the model predicts drug-target interactions more accurately. Precision improves by 0.51%, from 96.415% at 200 epochs to 96.908% at 1000 epochs, showing that more epochs improve the model's true-positive detection accuracy. Recall improves by 0.41%, from 96.001% at 200 epochs to 96.398% at 1000 epochs; as training proceeds, the model's ability to recognize all relevant cases improves. The F-measure, which combines precision and recall, improves by 0.46%, from 96.208% at 200 epochs to 96.652% at 1000 epochs. The model's overall performance increases, suggesting a balanced and reliable ability to forecast interactions between drugs and targets. As training advances to 1000 epochs, all key performance parameters improve incrementally but consistently, strengthening the model's forecasting accuracy on the KIBA dataset.

Fig 5 shows the comparison of drug discovery models on the Binding-DB dataset. Accuracy shows a consistent increase throughout the epochs, reaching a peak of 98.4% at epoch 1000, indicating a steady improvement in model performance. Precision also shows gradual improvement, rising from 95.5% at epoch 100 to 97.4% at epoch 1000, reflecting an enhanced ability to correctly identify positive instances. Recall displays a steady increase from 94.8% at epoch 100 to 96.5% at epoch 1000, showing that the model becomes more sensitive to positive instances as training progresses. F-measure follows a similar trend, improving from 95.1% at epoch 100 to 96.6% at epoch 1000, confirming that the model maintains a balanced performance across both precision and recall. Overall, these trends suggest that the model demonstrates continuous improvement in all performance metrics over time, highlighting its growing efficacy in drug discovery tasks.

Fig 5. Comparison of drug discovery models on Binding-DB dataset.


The drug discovery model on the Binding-DB dataset improves in key performance measures over time, as seen in Fig 5. Accuracy improves by 0.83%, from 96.985% at 200 epochs to 97.788% at 1000 epochs, suggesting that more epochs improve the model's drug-target interaction prediction. Precision increases by 0.79%, from 96.345% at 200 epochs to 97.102% at 1000 epochs; this improvement suggests that the model becomes better at correctly identifying true positive interactions with fewer false positives as training continues. Recall also improves, moving from 95.233% at 200 epochs to 95.968% at 1000 epochs, a 0.77% increase, indicating that the model's capacity to capture all relevant positive interactions grows with more training. The F-measure increases from 95.786% at 200 epochs to 96.532% at 1000 epochs, a 0.78% improvement. This growth reflects the overall enhancement in the model's performance and its capacity to effectively predict drug-target interactions on the Binding-DB dataset.

4.3 Ablation study-2 misdiscovery rate with training data

Figs 6–8 show the misdiscovery rate analysis, indicated by the training and validation losses across the Davis, KIBA, and Binding-DB datasets; a consistent trend of decreasing losses as training progresses reflects improved model performance over time. On the Davis dataset, the training loss decreases from 0.0568 at 5,000 iterations to 0.0258 at 25,000 iterations, a significant reduction of 54.6%. Similarly, the validation loss drops from 0.0256 to 0.0134 over the same range, a 47.7% decrease. The Davis drug discovery model is learning well as training and validation errors decrease, indicating better generalization and accuracy. The KIBA dataset's training loss drops by 45.4%, from 0.0568 at 5,000 iterations to 0.0310 at 75,000 iterations; the validation loss also drops by 32%, from 0.0278 to 0.0189. As the validation and training losses decrease across iterations, the model's ability to minimize errors improves predictive accuracy and reduces misdiscovery on the KIBA dataset. The Binding-DB dataset's training loss falls by 61.1%, from 0.0653 at 5,000 iterations to 0.0254 at 35,000 iterations, and the validation loss drops by 52.6%, from 0.0325 to 0.0154. These declines imply that the model is improving its predictions, reducing errors, and improving drug target identification. As training progresses, validation and training losses decrease across all three datasets, demonstrating the model's misdiscovery reduction. The substantial loss reductions show that the model is improving at recognizing actual drug-target interactions, eliminating errors, and generalizing. Complex datasets such as Davis, KIBA, and Binding-DB require dependable drug discovery, so this improved performance is crucial.

Fig 6. Scatter plot for misdiscovery rate analysis on Davis dataset.


Fig 8. Scatter plot for misdiscovery rate analysis on Binding-DB dataset.


Fig 6 shows the relationship between training and validation loss across varying training data. The figure plots the misdiscovery rate against both training and validation loss, with data points representing different levels of training data. As the training data increases, the training loss consistently decreases, reflecting the model’s improved performance with more data and its ability to fit the training set better. The validation loss, however, shows a slight fluctuation, initially decreasing with more training data but eventually stabilizing or slightly increasing, suggesting that the model may start to overfit the training data as it becomes more complex.

Fig 7 shows the relationship between training and validation loss as the training data size varies. In this figure, we analyze how the misdiscovery rate changes with increasing training data. Initially, as the training data size increases, both training loss and validation loss show a decreasing trend, reflecting the model’s improved ability to fit the training data and generalize better to unseen data. After reaching a threshold at approximately 50,000 training samples, both the training and validation losses continue to decrease at a much slower rate, suggesting that the model has reached a point of diminishing returns where additional data no longer significantly reduces the losses.

Fig 7. Scatter plot for misdiscovery rate analysis on KIBA dataset.


Fig 8 shows the relationship between training and validation loss as the training data size varies. In this figure, we analyze the misdiscovery rate as training data increases. Initially, both training and validation losses decrease together, indicating that the model is improving its ability to fit the training data and generalize to new, unseen data. However, at approximately 27,500 training samples, the validation loss surpasses the training loss, peaking at this point. This suggests that the model may be overfitting the training data, as it performs well on the training set but struggles to generalize effectively to the validation set. After this peak, both the training and validation losses gradually return to their original positions, with the validation loss slowly decreasing and stabilizing, indicating a recovery from overfitting and a more balanced model performance.

4.4 Ablation study-3 comparative analysis with previous models

Table 5 presents the comparison of previous models with the proposed MRO+DTransL model across the Davis, KIBA, and BindingDB datasets, revealing enhancements in accuracy, precision, recall, and F-measure. For the Davis dataset, the MRO+DTransL model achieves an accuracy of 98.398%, a notable 9.75% improvement over the LSTM model's accuracy of 89.656%. Precision also increases, with the proposed model reaching 96.739%, a 5.69% improvement over LSTM's 91.245%. Similarly, the recall and F-measure for MRO+DTransL are 95.644% and 96.189%, respectively, surpassing LSTM's values of 88.125% and 89.658% by 8.55% and 7.3%. On the KIBA dataset, MRO+DTransL achieves an accuracy of 98.264%, an 8.99% increase over the LSTM model's accuracy of 90.152%. Precision improves by 7.64%, with MRO+DTransL at 96.719% compared with LSTM's 89.858%. The recall of the proposed model is 96.217%, a 22.35% increase over LSTM's recall of 78.613%. The F-measure of MRO+DTransL also sees a significant boost, increasing by 15.05% to 96.467% from LSTM's 83.860%. For the BindingDB dataset, the MRO+DTransL model achieves an accuracy of 97.344%, an 8.93% improvement over LSTM's accuracy of 89.356%. Precision improves by 14.23%, with MRO+DTransL reaching 96.756% compared with LSTM's 84.696%. The recall of the proposed model is 95.617%, a 22.65% increase over LSTM's 77.989%, and the F-measure is 96.183%, a 14.73% improvement over LSTM's 81.204%. The MRO+DTransL model consistently outperforms previous models, particularly in recall and F-measure, highlighting its superior capacity to manage complex datasets and its potential contribution to precision medicine.

Table 5. Comparison of previous and proposed models on the datasets.

Models Accuracy (%) Precision (%) Recall (%) F-measure (%)
Davis dataset
RF 80.152 78.325 74.158 76.185
DT 83.347 82.198 78.545 80.330
LR 84.125 84.215 82.454 83.325
SVM 86.457 87.458 84.565 85.987
LSTM 89.656 91.245 88.125 89.658
MRO+DTransL 98.398 96.739 95.644 96.189
KIBA dataset
RF 80.125 75.856 71.289 73.502
DT 85.345 80.188 72.458 76.127
LR 86.654 84.125 76.345 80.046
SVM 89.525 87.466 77.859 82.383
LSTM 90.152 89.858 78.613 83.860
MRO+DTransL 98.264 96.719 96.217 96.467
BindingDB dataset
RF 75.465 71.645 73.356 72.490
DT 80.968 73.289 74.598 73.938
LR 83.198 77.568 75.628 76.586
SVM 84.525 82.154 76.325 79.132
LSTM 89.356 84.696 77.989 81.204
MRO+DTransL 97.344 96.756 95.617 96.183

4.5 Case study on lung cancer with EGFRT790M mutation

Lung cancer, particularly NSCLC, remains one of the leading causes of cancer-related deaths, with a significant portion of cases complicated by the EGFRT790M mutation. This mutation often results in resistance to first- and second-generation EGFR tyrosine kinase inhibitors (EGFR-TKIs), which are commonly used in NSCLC treatment. Although third-generation EGFR-TKIs offer improved targeting, resistance still develops over time, necessitating the exploration of new therapeutic strategies. In response to these challenges, our study introduces a drug discovery model for NSCLC using ML/DL techniques. MRO+DTransL integrates a hybrid UNet transformer for feature extraction and the MRO algorithm for dimensionality reduction. This approach improves the accuracy and efficiency of NSCLC treatment target identification. MRO+DTransL outperformed current models on the Davis, KIBA, and Binding-DB benchmark datasets. The model achieved 98.398% accuracy on the Davis dataset, exceeding baseline methods such as LSTM, which reached 89.656%. The KIBA and Binding-DB datasets showed similar gains, with accuracies of 98.264% and 97.344%, respectively. MRO+DTransL's consistent outperformance of state-of-the-art methods suggests it could transform NSCLC drug discovery. This method can target the EGFRT790M mutation and reduce false negatives and positives in order to develop tailored medicines that could increase NSCLC survival while decreasing treatment-associated adverse reactions.

5. Conclusion

The proposed drug discovery method enhances feature extraction and utilizes transfer learning to identify therapeutic targets for NSCLC. By employing a hybrid UNet transformer, the model extracts deep drug and protein sequence characteristics, reducing false positives. Dimensionality reduction is performed using the MRO algorithm, which selects optimal features from multiple candidates. Drug discovery for NSCLC therapeutic targets is further improved with DTransL. Our model was validated on the Davis, KIBA, and Binding-DB datasets. The Davis dataset consists of 25,772 drug-target interaction (DTI) pairs, representing 68 drugs and 379 proteins. The KIBA dataset includes 117,657 DTI pairs across 2,068 drugs and 229 proteins, combining IC50, Ki, and Kd values. The BindingDB dataset comprises 2,519,702 interactions between 41,296 compounds and 8,810 protein targets. On the Davis dataset, MRO+DTransL achieved an accuracy of 98.398%, a significant improvement over existing models. Precision and recall also improved notably, with MRO+DTransL outperforming existing models; its F-measure of 96.189% reflects robust and balanced performance across all metrics. On the KIBA dataset, MRO+DTransL reached an accuracy of 98.264%, an improvement over existing models, and precision, recall, and F-measure also showed substantial gains, further emphasizing the model's effectiveness. For the BindingDB dataset, MRO+DTransL achieved an accuracy of 97.344%, with significant improvements in precision, recall, and F-measure compared with existing models. These findings demonstrate that the proposed MRO+DTransL model is highly effective for drug discovery, particularly in the context of NSCLC with the EGFRT790M mutation, making it a promising tool for identifying therapeutic targets.

Data Availability

All dataset files are available from https://github.com/dingyan20/Davis-Dataset-for-DTA-Prediction https://github.com/warastra/ligand_target_prediction/blob/main/BindingDB_ligand_target_IC50_cleaned.csv https://github.com/Zhaoyang-Chu/HGRL-DTA/tree/main/source/data

Funding Statement

The authors extend their appreciation to the Deanship of Scientific Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number: GRP/4/45. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Shi Y, Jin Z, Deng J, Zeng W, Zhou L. A novel high-dimensional kernel joint non-negative matrix factorization with multimodal information for lung cancer study. IEEE J Biomed Health Informat. 2023;28(2). [DOI] [PubMed] [Google Scholar]
  • 2.Wu Q, Wang J, Sun Z, Xiao L, Ying W, Shi J. Immunotherapy efficacy prediction for non-small cell lung cancer using multi-view adaptive weighted graph convolutional networks. IEEE J Biomed Health Informat. 2023;27(11). [DOI] [PubMed] [Google Scholar]
  • 3.Nakamura M, Ishikawa H, Ohnishi K, Mori Y, Baba K, Nakazawa K, et al. Effects of lymphopenia on survival in proton therapy with chemotherapy for non-small cell lung cancer. J Radiat Res. 2023;64(2):438–47. doi: 10.1093/jrr/rrac084 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Chen Y, Zhang Z, Xiong R, Luan M, Qian Z, Zhang Q, et al. A multi-component paclitaxel -loaded β-elemene nanoemulsion by transferrin modification enhances anti-non-small-cell lung cancer treatment. Int J Pharm. 2024;663:124570. doi: 10.1016/j.ijpharm.2024.124570 [DOI] [PubMed] [Google Scholar]
  • 5.Zhong Y, Luo B, Hong M, Hu S, Zou D, Yang Y, et al. Oxymatrine induces apoptosis in non-small cell lung cancer cells by downregulating TRIM46. Toxicon. 2024;244:107773. doi: 10.1016/j.toxicon.2024.107773 [DOI] [PubMed] [Google Scholar]
  • 6.Zhao H, Wu G, Luo Y, Xie Y, Han Y, Zhang D, et al. WNT5B promotes the malignant phenotype of non-small cell lung cancer via the FZD3-DVL3-RAC1-PCP-JNK pathway. Cell Signal. 2024;122:111330. doi: 10.1016/j.cellsig.2024.111330 [DOI] [PubMed] [Google Scholar]
  • 7.Wang J, Zhu X, Jiang H, Ji M, Wu Y, Chen J. Cancer cell-derived exosome based dual-targeted drug delivery system for non-small cell lung cancer therapy. Colloids Surf B. 2024;244:114141. doi: 10.1016/j.colsurfb.2024.114141 [DOI] [PubMed] [Google Scholar]
  • 8.Bian W, Chen Y, Ni Y, Lv B, Gong B, Zhu K, et al. Efficacy of GluN2B-containing NMDA receptor antagonist for antitumor and antidepressant therapy in non-small cell lung cancer. Eur J Pharmacol. 2024;980:176860. doi: 10.1016/j.ejphar.2024.176860 [DOI] [PubMed] [Google Scholar]
  • 9.Zhang K, Wang K, Zhang X, Qian Z, Zhang W, Zheng X, et al. Discovery of small molecules simultaneously targeting NAD(P)H:quinone oxidoreductase 1 and nicotinamide phosphoribosyltransferase: treatment of drug-resistant non-small-cell lung cancer. J Med Chem. 2022;65(11):7746–69. doi: 10.1021/acs.jmedchem.2c00077 [DOI] [PubMed] [Google Scholar]
  • 10.Yukuyama MN, de Souza A, Henostroza MAB, de Araujo GLB, Löbenberg R, Faria R de O, et al. Unveiling microtubule dynamics in lung cancer: recent findings and prospects for drug delivery and treatment. J Drug Deliv Sci Technol. 2023;89:105017. doi: 10.1016/j.jddst.2023.105017 [DOI] [Google Scholar]
  • 11.Patra SK, Sahoo RK, Biswal S, Panda SS, Biswal BK. Enigmatic exosomal connection in lung cancer drug resistance. Mol Ther Nucleic Acids. 2024;35(2):102177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Wang X, Ren X, Lin X, Li Q, Zhang Y, Deng J, et al. Recent progress of ferroptosis in cancers and drug discovery. Asian J Pharm Sci. 2024;19(4):100939. doi: 10.1016/j.ajps.2024.100939 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Hill J, Jones RM, Crich D. Discovery of a hydroxylamine-based brain-penetrant EGFR inhibitor for metastatic non-small-cell lung cancer. J Med Chem. 2023;66(22):15477–92. doi: 10.1021/acs.jmedchem.3c01669 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Wu T, Yu B, Xu Y, Du Z, Zhang Z, Wang Y, et al. Discovery of selective and potent macrocyclic CDK9 inhibitors for the treatment of osimertinib-resistant non-small-cell lung cancer. J Med Chem. 2023;66(22):15340–61. doi: 10.1021/acs.jmedchem.3c01400 [DOI] [PubMed] [Google Scholar]
  • 15.Zhao C, Zhang R, Yang H, Gao Y, Zou Y, Zhang X. Antibody-drug conjugates for non-small cell lung cancer: advantages and challenges in clinical translation. Biochem Pharmacol. 2024;226:116378. doi: 10.1016/j.bcp.2024.116378 [DOI] [PubMed] [Google Scholar]
  • 16.Karampuri A, Kundur S, Perugu S. Exploratory drug discovery in breast cancer patients: a multimodal deep learning approach to identify novel drug candidates targeting RTK signaling. Comput Biol Med. 2024;174:108433. doi: 10.1016/j.compbiomed.2024.108433 [DOI] [PubMed] [Google Scholar]
  • 17.Kaur R, Suresh PK. Chemoresistance mechanisms in non-small cell lung cancer-opportunities for drug repurposing. Appl Biochem Biotechnol. 2024;196(7):4382–438. doi: 10.1007/s12010-023-04595-7 [DOI] [PubMed] [Google Scholar]
  • 18.Thirunavukkarasu MK, Ramesh P, Karuppasamy R, Veerappapillai S. Transcriptome profiling and metabolic pathway analysis towards reliable biomarker discovery in early-stage lung cancer. J Appl Genet. 2025;66(1):115–26. [DOI] [PubMed] [Google Scholar]
  • 19.Srinivasarao DA, Shah S, Famta P, Vambhurkar G, Jain N, Pindiprolu SKS, et al. Unravelling the role of tumor microenvironment responsive nanobiomaterials in spatiotemporal controlled drug delivery for lung cancer therapy. Drug Deliv Transl Res. 2024;1–29. doi: DOINeeded [DOI] [PubMed] [Google Scholar]
  • 20.Das AP, Agarwal SM. Recent advances in the area of plant-based anti-cancer drug discovery using computational approaches. Mol Divers. 2024;28(2):901–25. doi: 10.1007/s11030-022-10590-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Tripathi S, Moyer EJ, Augustin AI, Zavalny A, Dheer S, Sukumaran R, et al. RadGenNets: deep learning-based radiogenomics model for gene mutation prediction in lung cancer. Inform Med Unlocked. 2022;33:101062. doi: 10.1016/j.imu.2022.101062 [DOI] [Google Scholar]
  • 22.Bai Y, Zhou L, Zhang C, Guo M, Xia L, Tang Z, et al. Dual network analysis of transcriptome data for discovery of new therapeutic targets in non-small cell lung cancer. Oncogene. 2023;42(49):3605–18. doi: 10.1038/s41388-023-02866-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Mathema VB, Sen P, Lamichhane S, Orešič M, Khoomrung S. Deep learning facilitates multi-data type analysis and predictive biomarker discovery in cancer precision medicine. Comput Struct Biotechnol J. 2023;21:1372–82. doi: 10.1016/j.csbj.2023.01.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Pan X, Yun J, Coban Akdemir ZH, Jiang X, Wu E, Huang JH, et al. AI-drugnet: a network-based deep learning model for drug repurposing and combination therapy in neurological disorders. Comput Struct Biotechnol J. 2023;21:1533–42. doi: 10.1016/j.csbj.2023.02.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Tran NL, Kim H, Shin C-H, Ko E, Oh SJ. Artificial intelligence-driven new drug discovery targeting serine/threonine kinase 33 for cancer treatment. Cancer Cell Int. 2023;23(1):321. doi: 10.1186/s12935-023-03176-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Xing N, Du Q, Guo S, Xiang G, Zhang Y, Meng X, et al. Ferroptosis in lung cancer: a novel pathway regulating cell death and a promising target for drug therapy. Cell Death Discov. 2023;9(1):110. doi: 10.1038/s41420-023-01407-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Zhang Y, Du T, Liu N, Wang J, Zhang L, Cui C-P, et al. Discovery of an OTUD3 inhibitor for the treatment of non-small cell lung cancer. Cell Death Dis. 2023;14(6):378. doi: 10.1038/s41419-023-05900-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Pang W, Chen M, Qin Y. Prediction of anticancer drug sensitivity using an interpretable model guided by deep learning. BMC Bioinform. 2024;25(1):182. doi: 10.1186/s12859-024-05669-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Shahzad M, Kadani AZUA, Tahir MA, Malick RAS, Jiang R. DRPO: a deep learning technique for drug response prediction in oncology cell lines. Alex Eng J. 2024;105:88–97. doi: 10.1016/j.aej.2024.06.052 [DOI] [Google Scholar]
  • 30.Qian Y, Ni W, Xianyu X, Tao L, Wang Q. DoubleSG-DTA: deep learning for drug discovery: case study on the non-small cell lung cancer with EGFRT790M mutation. Pharmaceutics. 2023;15(2):675. doi: 10.3390/pharmaceutics15020675 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Suda K, Onozato R, Yatabe Y, Mitsudomi T. EGFR T790M mutation: a double role in lung cancer cell survival? J Thorac Oncol. 2009;4(1):1–4. doi: 10.1097/JTO.0b013e3181913c9f [DOI] [PubMed] [Google Scholar]
  • 32.Xu M, Xiao X, Chen Y, Zhou X, Parisi L, Ma R. 3D physiologically-informed deep learning for drug discovery of a novel vascular endothelial growth factor receptor-2 (VEGFR2). Heliyon. 2024;10(16):e35769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Zhang M, Yu Y, Jin S, Gu L, Ling T, Tao X. VM-UNET-V2: rethinking Vision Mamba UNet for medical image segmentation. In: International symposium on bioinformatics research and applications. Singapore: Springer Nature Singapore; 2024. pp. 335–346. [Google Scholar]
  • 34.Liao W, Zhu Y, Wang X, Pan C, Wang Y, Ma L. LightM-UNet: Mamba assists in lightweight UNet for medical image segmentation. arXiv preprint arXiv:2403.05246. 2024. [Google Scholar]
  • 35.Zhong R, Yu J, Zhang C, Munetomo M. SRIME: a strengthened RIME with Latin hypercube sampling and embedded distance-based selection for engineering optimization problems. Neural Comput Applic. 2024;36(12):6721–40. doi: 10.1007/s00521-024-09424-4 [DOI] [Google Scholar]
  • 36.Abdel-Salam M, Hu G, Çelik E, Gharehchopogh FS, El-Hasnony IM. Chaotic RIME optimization algorithm with adaptive mutualism for feature selection problems. Comput Biol Med. 2024;179:108803. doi: 10.1016/j.compbiomed.2024.108803 [DOI] [PubMed] [Google Scholar]
  • 37.Ma Y, Chen S, Ermon S, Lobell DB. Transfer learning in environmental remote sensing. Remote Sens Environ. 2024;301:113924. doi: 10.1016/j.rse.2023.113924 [DOI] [Google Scholar]
  • 38.Li H, Leung KS, Wong MH, Ballester PJ. Low-quality structural and interaction data improves binding affinity prediction via random forest. Molecules. 2015;20(6):10947–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Yang Z, Zhong W, Zhao L, Yu-Chian Chen C. MGraphDTA: deep multiscale graph neural network for explainable drug-target binding affinity prediction. Chem Sci. 2022;13(3):816–33. doi: 10.1039/d1sc05180f [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Pahikkala T, Airola A, Pietilä S, Shakyawar S, Szwajda A, Tang J, et al. Toward more realistic drug-target interaction predictions. Brief Bioinform. 2015;16(2):325–37. doi: 10.1093/bib/bbu010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.He T, Heidemeyer M, Ban F, Cherkasov A, Ester M. SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines. J Cheminform. 2017;9(1):24. doi: 10.1186/s13321-017-0209-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug-target binding affinity prediction. Bioinformatics. 2018;34(17):i821–9. doi: 10.1093/bioinformatics/bty593 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Abbasi K, Razzaghi P, Poso A, Amanlou M, Ghasemi JB, Masoudi-Nejad A. DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics. 2020;36(17):4633–42. doi: 10.1093/bioinformatics/btaa544 [DOI] [PubMed] [Google Scholar]
  • 44.Zeng Y, Chen X, Luo Y, Li X, Peng D. Deep drug-target binding affinity prediction with multiple attention blocks. Brief Bioinform. 2021;22(5):bbab117. doi: 10.1093/bib/bbab117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Zhao Q, Duan G, Yang M, Cheng Z, Li Y, Wang J. AttentionDTA: drug-target binding affinity prediction by sequence-based deep learning with attention mechanism. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(2):852–63. doi: 10.1109/TCBB.2022.3170365 [DOI] [PubMed] [Google Scholar]
  • 46.Wang C, Chen Y, Zhao L, Wang J, Wen N. Modeling DTA by combining multiple-instance learning with a private-public mechanism. Int J Mol Sci. 2022;23(19):11136. doi: 10.3390/ijms231911136 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics. 2021;37(8):1140–7. doi: 10.1093/bioinformatics/btaa921 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Ruo Wang

1 Oct 2024

PONE-D-24-38371

Optimizing chemotherapeutic targets in non-small cell lung cancer with transfer learning for precision medicine

PLOS ONE

Dear Dr. Kumari,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 15 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Ruo Wang

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure: 

The authors extend their appreciation to the Deanship of Scientific Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number: GRP/4/45.

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

The authors extend their appreciation to the Deanship of Scientific Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number: GRP/4/45.

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

The authors extend their appreciation to the Deanship of Scientific Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number: GRP/4/45.

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. We note that your Data Availability Statement is currently as follows: All relevant data are within the manuscript and its Supporting Information files.

Please confirm at this time whether or not your submission contains all raw data required to replicate the results of your study. Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set to consist of the data required to replicate all study findings reported in the article, as well as related metadata and methods (https://journals.plos.org/plosone/s/data-availability#loc-minimal-data-set-definition).

For example, authors should submit the following data:

- The values behind the means, standard deviations and other measures reported;

- The values used to build graphs;

- The points extracted from images for analysis.

Authors do not need to submit their entire data set if only a portion of the data was used in the reported study.

If your submission does not contain these data, please either upload them as Supporting Information files or deposit them to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories.

If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. If data are owned by a third party, please indicate how others may request data access.

6. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

Reviewer #3: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:  The manuscript presents a novel approach for identifying chemotherapeutic targets in non-small cell lung cancer (NSCLC) using transfer learning models. The proposed method integrates a hybrid UNet transformer for feature extraction and a modified Rime optimization (MRO) algorithm for dimensionality reduction, followed by a deep transfer learning (DTransL) model to improve the accuracy of drug discovery. While the study addresses an important topic in the intersection of machine learning and oncology, several critical issues regarding methodology clarity, data presentation, and interpretation of results need to be addressed to improve the manuscript's rigor and impact.

Major Comments:

1. Introduction and Background:

The introduction provides a good overview of the challenges in NSCLC treatment and the potential for machine learning models in drug discovery. However, the transition to the specific approach used in this study is abrupt. It would be beneficial to provide more context on the limitations of existing models and why the proposed hybrid UNet transformer and DTransL model are expected to address these gaps.

The introduction should better articulate the novelty of the proposed approach compared to existing methods. Currently, it is not clear how the combination of feature extraction and transfer learning represents a significant advancement over prior models.

2. Methods:

The methodology section is quite detailed, but it lacks clarity in several key areas:

The description of the hybrid UNet transformer model for feature extraction is overly technical without sufficient context for readers who may not be familiar with this architecture. A brief explanation of how the hybrid UNet differs from standard UNet models and its relevance to feature extraction in drug discovery would be helpful.

The use of the Modified Rime Optimization (MRO) algorithm for dimensionality reduction needs more justification. The rationale behind selecting this algorithm over other common methods (e.g., Principal Component Analysis, t-SNE) is not well explained. A comparative discussion of its advantages and limitations is necessary.

The explanation of the deep transfer learning (DTransL) model is also highly technical and lacks clarity. More intuitive descriptions or visual aids (e.g., flowcharts) showing how transfer learning is applied in this context could greatly enhance understanding.

3. Data Presentation and Analysis:

The manuscript presents results across three benchmark datasets (Davis, KIBA, and Binding-DB), demonstrating improvements in predictive accuracy using the proposed MRO+DTransL model. However, the presentation of the results could be more systematic:

Tables summarizing the performance of different models should include standard deviations or confidence intervals to provide a sense of variability and statistical significance.

The manuscript reports various performance metrics (e.g., Concordance Index, Mean Square Error, Regression towards Mean, Pearson Correlation), but the relevance of these metrics to the clinical applicability of the model is not well discussed. The authors should provide more context on why these particular metrics were chosen and how they relate to real-world drug discovery and treatment scenarios.

The ablation studies (e.g., varying epochs) are a good addition to demonstrate the robustness of the model. However, more detailed explanations are needed regarding the implications of these results for practical applications, such as computational costs and model training time.

4. Interpretation of Results:

While the manuscript claims significant improvements over existing models, it does not provide a critical discussion of potential limitations. For instance, how might overfitting or data imbalance impact the reported results? Are there any challenges in replicating these results in a clinical setting?

The authors should also address the generalizability of their findings. The datasets used for validation (Davis, KIBA, and Binding-DB) are standard in the field, but it is unclear whether the model would perform equally well on new, unseen datasets, particularly those that reflect diverse patient populations or drug-target interactions.

5. Discussion and Conclusion:

The discussion section should provide a more balanced view of the study’s strengths and limitations. While the authors emphasize the superiority of their approach, there is little mention of potential drawbacks, such as the computational complexity of the proposed models or the need for large-scale, high-quality data.

The manuscript should propose specific future research directions to address current limitations. For example, are there plans to test the model on other types of cancer, or to explore alternative feature extraction techniques? Such a discussion would help situate the work within the broader field and suggest avenues for further development.

6. Figures and Visual Aids:

The manuscript contains several figures, such as the conceptual structure of the proposed drug discovery model. However, some of these figures are not clearly referenced in the text. Each figure should be explicitly described and referenced to enhance comprehension.

The addition of comparative visualizations (e.g., ROC curves, precision-recall curves) could provide more insights into the model's performance relative to existing methods.

Minor Comments:

Abstract: The abstract should be more concise and directly state the novel contributions of the study. Currently, it contains too much general information and lacks a clear summary of key findings.

Language and Clarity: The manuscript contains numerous grammatical errors and awkward phrasings that affect readability. A thorough proofreading is necessary to improve clarity and coherence.

References: Ensure all references are up-to-date and relevant. Some recent studies on transfer learning and drug discovery in oncology could provide a more comprehensive background.

Reviewer #2:  The authors designed and tested a deep learning algorithm, called MRO+DTransL, that combines hybrid UNet feature extraction with modified Rime optimization (MRO) for dimensionality reduction. The new network was tested on three different datasets, comparing its performance against state-of-the-art networks on datasets for lung cancer with the EGFR T790M mutation. Although the accuracy of the presented approach is very high on benchmark datasets, there are some points of the presented work the authors should address in more detail.

Authors should fix typing errors:

forNSCLC (last section of introduction)

extrapolativeprocess (2 section)

mutationprediction (Table 1)

andALK (Table 1)

EGFRT790M (2 section)

theoptimal (3.1 section)

removesless (3.1 section)

variedpart (3.1 section)

populaceearlier (3.1 section)

Medicatioinnovation (3.3 section)

ofCT (3.3 section)

,QT (3.3 section)

characteristictrajectories (3.3 section)

markfields (3.3 section)

efficiencymechanism (3.3 section)

Maxpooling (3.3 section)

charcteristicmap (3.3 section)

recordvitalcharacteristics (3.3 section)

maxpooling (3.3 section)

)are (3.3 section)

resultsLastly (4.1 section)

The relation of this work to the networks mentioned in the second paragraph of Section 2 ('Related works') is not clear to me. The authors do not use these related works in their conclusion. I do not see the importance of mentioning these works.

Authors should correct typing errors in Table 1.

In Table 1 the authors mention MAE, NDCG, CNN and RNN, as well as many more abbreviations. I would suggest writing out the meaning of the abbreviations in the caption or directly in the table, so that the reader can better follow the logic of the table.

The ratio of training, validation and testing should be mentioned in the caption of Figure 1.

The caption of Figure 2 should help to understand the Figure itself. More info is needed.

The statement ‘a huge optimal solution is printed as follow’ in Section 3.1: the word ‘printed’ is confusing.

Algorithm 1: do steps 1 and 8 need to be mentioned?

‘M Agent Place’ (after equation 16) is not clear to me.

‘ln’ and ‘un’ in the paragraph before equation 17 do not correspond to formula 17.

Algorithm 2: steps 9 to 12 are not clear to me.

‘which is one of the most common and deadly forms of lung cancer’: this statement was made several times before and is redundant.

Also, the entire paragraph after this statement repeats previous statements.

Algorithm 3, step 3 (‘… converged do’): what does ‘do’ mean?

Also, here steps 9 to 11 are not clear to me.

Section 3 shows a lot of information about the suggested deep learning approach that is never mentioned in the results section. It is not clear to me why such a detailed explanation is needed. I would suggest reducing the formulas to the most important ones and shifting the others to the supporting information section. In general, Section 3 is very long and should be shortened.

In the results section the authors present very similar outcomes for each investigated dataset. I would suggest combining the findings in a single statement without repeating the same findings several times. Also, Tables 2, 3 and 4 can be moved to the supporting information section.

Figures 3, 4 and 5 look quite the same. I would suggest compacting the outcome into a single figure that better shows the differences between the datasets. The legend makes no sense if the axis labels show the same labels.

The findings are presented three times in nearly the same way. I would suggest summarizing them in a single statement.

How do the authors avoid overfitting? A very high accuracy was already reached with 200 epochs. Why do the authors suggest running 1000 epochs for such a small increase in accuracy, given the high computational cost?

The authors did not mention the computational environment or the time needed to obtain the presented outcomes.

In Figures 6, 7 and 8, the training time per iteration is confusing.

In Figure 7, why does the loss not oscillate after a certain point?

Table 5 and Figures 9–11 show the same thing. What is the point of presenting the same findings twice? Figures 9–11 are redundant.

In the conclusion as well as the abstract, it should be clear what is new in this work. Was the algorithm developed by the authors? What are the structural differences from the state-of-the-art algorithms? Is the feature reduction different from the presented state-of-the-art algorithms? The main differences and new findings should be made clearer.

Reviewer #3:  The paper introduces an approach to finding new therapeutic agents for non-small cell lung cancer (NSCLC) by means of a machine learning (neural) model augmented with an optimisation technique for dimensionality reduction. While the goal of the paper relates to a very relevant problem, the presented description of the solution is not very convincing, leading to severe doubts about the actual contribution of the work, as elaborated in detail in the following paragraphs.

Perhaps most importantly, the code and any data pertinent to the presented work are missing from the submission, and there is no link to an online resource (e.g., GitHub or paperswithcode). This makes it impossible to cross-reference the authors' claims with the actual implementation, let alone reproduce the whole approach.

The other general problem is that the authors claim to improve the identification of chemotherapeutic targets that could improve outcomes in NSCLC patients, but they do not show any results demonstrating success in this specific task: all the experiments are done on general drug binding and/or target datasets, which is a rather long shot from actual treatment response. Also, the datasets contain all sorts of drugs, while a more focused approach motivated, for instance, by the specific biology of NSCLC would make more sense. Last but not least, the authors mention kinase inhibitors in the text, which is inconsistent with what the title of the paper says and does not contribute to the credibility of the approach.

The approach itself is rather poorly motivated - why a U-net architecture? This is designed for image processing, and the authors do not show any specific input preprocessing that would let them treat drug and protein target representations as image data (which they seem to use, since when describing that part of the solution, they seem to be working with voxel data). The Rime optimisation part then seems totally disconnected - why is it needed at all? (neural models typically deal with feature extraction well enough on their own; that's one of their key advantages, after all) And how does it process the drug and target features, exactly? None of this is clear at all from the sort of boilerplate, disconnected description in the paper. The exact use of transfer learning is not clear either - what is the original domain where the neural model is trained, and what is the new one? And these are only the selected major questions related to the technical soundness of the approach, unfortunately...

In the validation part, we are shown some impressive numbers, but without context, and without any relationship whatsoever to the goal of the paper (i.e., improving existing NSCLC treatment regimes). Furthermore, comparison with approaches that also target this particular diagnosis is missing. This makes it very hard to see exactly how the data support the hypothesis of the authors.

Finally, in terms of presentation and scientific rigor, the paper does not score too high, either. The level of English is rather low, with grammar issues already in the abstract, lots of typos (especially merged words) and some sentences and even whole paragraphs barely understandable (e.g., the sentences after eq. (8) in the Rime part). Lots of references that are supposed to support claims made in the paper are about totally irrelevant research (e.g., ref [37] in the transfer learning part that is about remote sensing).

This assessment unfortunately leads to the only possible conclusion - this work is by far not ready for publication.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #1: No

Reviewer #2: Yes:  David Dannhauser

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Apr 29;20(4):e0319499. doi: 10.1371/journal.pone.0319499.r003

Author response to Decision Letter 1


21 Oct 2024

Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:

The manuscript presents a novel approach for identifying chemotherapeutic targets in non-small cell lung cancer (NSCLC) using transfer learning models. The proposed method integrates a hybrid UNet transformer for feature extraction and a modified Rime optimization (MRO) algorithm for dimensionality reduction, followed by a deep transfer learning (DTransL) model to improve the accuracy of drug discovery. While the study addresses an important topic in the intersection of machine learning and oncology, several critical issues regarding methodology clarity, data presentation, and interpretation of results need to be addressed to improve the manuscript's rigor and impact.

Major Comments:

1. Introduction and Background:

The introduction provides a good overview of the challenges in NSCLC treatment and the potential for machine learning models in drug discovery. However, the transition to the specific approach used in this study is abrupt. It would be beneficial to provide more context on the limitations of existing models and why the proposed hybrid UNet transformer and DTransL model are expected to address these gaps. The introduction should better articulate the novelty of the proposed approach compared to existing methods. Currently, it is not clear how the combination of feature extraction and transfer learning represents a significant advancement over prior models.

Response: To address this, I will revise the introduction to include more context on the limitations of existing drug discovery models. Many traditional models, such as LSTM and other deep learning approaches, struggle with overfitting, especially when handling complex, high-dimensional data like drug-protein interactions. Additionally, current feature extraction techniques often lead to false positives, which can reduce the reliability of drug-target predictions. The proposed hybrid UNet transformer model addresses these issues by employing advanced feature extraction that captures deeper, more relevant patterns in drug and protein sequences. This significantly reduces the rate of false positives, improving the overall accuracy of drug discovery. The use of the DTransL model in combination with the Modified Rime Optimization (MRO) algorithm also provides a novel method for dimensionality reduction and transfer learning, which enhances generalization across diverse datasets while minimizing overfitting. In comparison to prior models, this combination represents a significant advancement in both accuracy and robustness, as demonstrated by the superior performance on benchmark datasets like Davis, KIBA, and Binding-DB. I will ensure that the introduction clearly articulates these advancements and positions the proposed methodology as a solution to the limitations observed in earlier models.

2. Methods:

The methodology section is quite detailed, but it lacks clarity in several key areas:

The description of the hybrid UNet transformer model for feature extraction is overly technical without sufficient context for readers who may not be familiar with this architecture. A brief explanation of how the hybrid UNet differs from standard UNet models and its relevance to feature extraction in drug discovery would be helpful.

Response: As per the suggestions, we have included a brief explanation of feature extraction and of the difference between the standard UNet and the hybrid UNet, highlighted in red text.

The use of the Modified Rime Optimization (MRO) algorithm for dimensionality reduction needs more justification. The rationale behind selecting this algorithm over other common methods (e.g., Principal Component Analysis, t-SNE) is not well explained. A comparative discussion of its advantages and limitations is necessary.

Response: As per the suggestions, the correction has been made in the revised manuscript and highlighted in red text.

The explanation of the deep transfer learning (DTransL) model is also highly technical and lacks clarity. More intuitive descriptions or visual aids (e.g., flowcharts) showing how transfer learning is applied in this context could greatly enhance understanding.

Response: We understand the importance of making technical concepts more accessible and agree that visual aids such as flowcharts can improve understanding. However, given the complexity of the DTransL model and the need for precise explanations, we have opted to include a detailed pseudo-code representation rather than flowcharts. This choice allows us to clearly outline the step-by-step operations and logic used in the model, offering a more structured and transparent view of its working.

3. Data Presentation and Analysis:

The manuscript presents results across three benchmark datasets (Davis, KIBA, and Binding-DB), demonstrating improvements in predictive accuracy using the proposed MRO+DTransL model. However, the presentation of the results could be more systematic:

Tables summarizing the performance of different models should include standard deviations or confidence intervals to provide a sense of variability and statistical significance.

Response: As per the suggestions, we have added confidence intervals to describe the statistical significance between the proposed and existing works.

The manuscript reports various performance metrics (e.g., Concordance Index, Mean Square Error, Regression towards Mean, Pearson Correlation), but the relevance of these metrics to the clinical applicability of the model is not well discussed. The authors should provide more context on why these particular metrics were chosen and how they relate to real-world drug discovery and treatment scenarios.

Response: In response to the feedback, we have expanded the discussion on the relevance of the chosen performance metrics—Concordance Index (CI), Mean Square Error (MSE), Regression towards Mean (RTM), and Pearson Correlation (PC)—to enhance their clinical applicability. The CI is particularly important as it evaluates the ability of the model to accurately rank drug-target interactions, which is critical for prioritizing therapeutic candidates in clinical settings. MSE provides insight into the model's predictive accuracy regarding binding affinities, ensuring that the predictions align closely with actual biological interactions, a key factor in effective drug development. RTM assesses the model's robustness by examining its generalizability to unseen data, mitigating the risk of overfitting and ensuring that the model can reliably predict outcomes in diverse patient populations. Lastly, the PC quantifies the strength of the linear relationship between predicted and observed outcomes, which is essential for validating the model's effectiveness in real-world applications. This additional context underscores the significance of these metrics in ensuring that our drug discovery model is not only statistically sound but also clinically relevant.

The ablation studies (e.g., varying epochs) are a good addition to demonstrate the robustness of the model. However, more detailed explanations are needed regarding the implications of these results for practical applications, such as computational costs and model training time.

Response: In response to the feedback regarding the ablation studies, we have revised the manuscript to provide a more detailed interpretation of the results, particularly concerning their implications for practical applications. We highlight how varying epochs impacts model performance, specifically in terms of training time and computational costs.

4. Interpretation of Results:

While the manuscript claims significant improvements over existing models, it does not provide a critical discussion of potential limitations. For instance, how might overfitting or data imbalance impact the reported results? Are there any challenges in replicating these results in a clinical setting?

Response: In response, we have added a section that addresses several key issues, including the risks of overfitting and data imbalance, as well as the challenges associated with replicating our results in clinical settings. Overfitting is a particular concern when working with complex models such as our MRO+DTransL. We acknowledge that while our model demonstrates high accuracy on benchmark datasets, there is a risk that it may perform less effectively on unseen data. To mitigate this risk, we employed techniques such as cross-validation and hyperparameter tuning, but we recognize the importance of continuous monitoring for overfitting, particularly in clinical applications where patient populations can differ significantly from training datasets. Additionally, we discuss the potential impacts of data imbalance, which can skew model performance and limit generalizability. The datasets used, while comprehensive, may not fully represent the diverse patient demographics encountered in clinical practice. We emphasize the need for future studies to incorporate more diverse datasets to enhance model robustness and ensure its applicability across different patient populations. We also address the challenges of replicating our results in clinical settings, including the need for real-time data integration and the complexities of translating computational models into practical workflows in healthcare environments.

The authors should also address the generalizability of their findings. The datasets used for validation (Davis, KIBA, and Binding-DB) are standard in the field, but it is unclear whether the model would perform equally well on new, unseen datasets, particularly those that reflect diverse patient populations or drug-target interactions.

Response: We recognize that while the Davis, KIBA, and Binding-DB datasets are widely accepted benchmarks in the field of drug discovery, their ability to accurately represent the performance of our model on new, unseen datasets—especially those reflecting diverse patient populations and drug-target interactions—is a crucial consideration. In our revised manuscript, we have added a discussion addressing this concern. We emphasize that the performance of the MRO+DTransL model on these benchmark datasets demonstrates its potential; however, we acknowledge that the true test of its generalizability lies in its application to real-world data. To enhance the robustness and applicability of our model, we recommend future validation on additional datasets that encompass a broader range of patient demographics and drug-target interactions. This would help in assessing how well our model adapts to variations in biological contexts and treatment scenarios. Furthermore, we discuss potential strategies to improve generalizability, such as incorporating techniques like transfer learning and domain adaptation, which can help our model learn from related tasks and datasets, thus improving its performance on novel data. By addressing these aspects, we aim to provide a clearer understanding of the generalizability of our findings and encourage further research in diverse clinical settings.

5. Discussion and Conclusion:

The discussion section should provide a more balanced view of the study’s strengths and limitations. While the authors emphasize the superiority of their approach, there is little mention of potential drawbacks, such as the computational complexity of the proposed models or the need for large-scale, high-quality data.

Response: In response, we have revised the discussion to include a thorough examination of potential drawbacks associated with our proposed models. We recognize that while the MRO+DTransL model demonstrates significant improvements in drug discovery for NSCLC patients, it also introduces certain complexities. Specifically, the computational demands of the hybrid UNet transformer and the deep transfer learning components may pose challenges in terms of resource requirements, especially for large-scale applications.

The manuscript should propose specific future research directions to address current limitations. For example, are there plans to test the model on other types of cancer, or to explore alternative feature extraction techniques? Such a discussion would help situate the work within the broader field and suggest avenues for further development.

Response: In response, we have added a dedicated section outlining potential avenues for further exploration that address the current limitations of our work. We plan to extend our research to investigate the applicability of the MRO+DTransL model on other types of cancer beyond non-small cell lung cancer (NSCLC). This expansion would not only validate the robustness of our approach across different cancer types but also contribute to the broader understanding of drug discovery in oncology. Additionally, we intend to explore alternative feature extraction techniques that may complement or enhance our existing hybrid UNet transformer model. By integrating methods such as graph-based feature extraction or unsupervised learning techniques, we could potentially uncover new insights into drug-target interactions and improve model performance. Furthermore, we aim to investigate the impact of incorporating real-world clinical data, such as patient demographics and treatment histories, to enhance the model's generalizability and predictive power. This would help ensure that our findings are relevant to diverse patient populations and reflect the complexities of actual treatment scenarios. By proposing these future research directions, we aim to situate our work within the broader context of cancer research and drug discovery, highlighting the potential for continued development and refinement of our methodology. Thank you for prompting us to clarify these important aspects of our study.

6. Figures and Visual Aids:

The manuscript contains several figures, such as the conceptual structure of the proposed drug discovery model. However, some of these figures are not clearly referenced in the text. Each figure should be explicitly described and referenced to enhance comprehension.

Response: In response, we have thoroughly reviewed the document to ensure that all figures are clearly referenced and described within the text.

The addition of comparative visualizations (e.g., ROC curves, precision-recall curves) could provide more insights into the model's performance relative to existing methods.

Response: In response, we have added a misdiscovery rate plot to the manuscript. This plot provides valuable insights into the accuracy and reliability of our model during training, allowing for a more nuanced understanding of its performance. The misdiscovery rate plot illustrates the relationship between the rate of false discoveries and the model's predictive capability, offering a clearer picture of how our proposed method compares to existing models. By incorporating this visualization, we aim to demonstrate not only the strengths of our approach but also its effectiveness in minimizing false positives, which is crucial for real-world applications in drug discovery.

Minor Comments:

Abstract: The abstract should be more concise and directly state the novel contributions of the study. Currently, it contains too much general information and lacks a clear summary of key findings.

Response: As per the suggestions, we have revised the Abstract and highlighted the changes in red text.

Language and Clarity: The manuscript contains numerous grammatical errors and awkward phrasings that affect readability. A thorough proofreading is necessary to improve clarity and coherence.

Response: In response, we have conducted a thorough proofreading to correct grammatical errors and improve the overall readability. We have refined awkward phrasings, enhanced sentence structure, and ensured that the technical content is communicated clearly and coherently.

References: Ensure all references are up-to-date and relevant. Some recent studies on transfer learning and drug discovery in oncology could provide a more comprehensive background.

Response: As per the suggestions, we have updated the references to ensure they are current and relevant, including recent studies on transfer learning and drug discovery in oncology.

Attachment

Submitted filename: Response to the comments (4).docx

pone.0319499.s002.docx (31.7KB, docx)

Decision Letter 1

Ruo Wang

29 Oct 2024

PONE-D-24-38371R1: Optimizing chemotherapeutic targets in non-small cell lung cancer with transfer learning for precision medicine (PLOS ONE)

Dear Dr. Kumari,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 13 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Ruo Wang

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors did not review and edit the conclusion of their work. Most of the figures/tables are unchanged. In the current version, the figures do not highlight the findings of the authors. I suggest editing the repeated findings for better readability. A reduction of Section 3 (Supporting Information) should be considered by the authors.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean? ). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #1: No

Reviewer #2: Yes: David Dannhauser

**********


PLoS One. 2025 Apr 29;20(4):e0319499. doi: 10.1371/journal.pone.0319499.r005

Author response to Decision Letter 2


12 Nov 2024

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Response: We trust that these revisions adequately meet the reviewers' expectations, and we are grateful for their valuable feedback, which has significantly contributed to enhancing the manuscript's quality.

________________________________________

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Partly

Response: In response to Reviewer #2’s suggestions, we have revised the manuscript’s conclusion to more effectively emphasize how the data substantiate our findings. The updated conclusion has been clearly marked in red text in the revised document for easy reference. This revision integrates further clarification on how the experimental data robustly support our conclusions, aligning with the rigorous standards of technical soundness and validity required.

________________________________________

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

Response: Thanks

________________________________________

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

Response: We have adhered to the PLOS Data policy by providing comprehensive data access without restrictions, except where ethical considerations, such as privacy or third-party data agreements, apply. All relevant data, including those supporting statistical measures like means, medians, and variances, is accessible either within the manuscript, as supplementary information, or through deposition in a publicly accessible repository as specified in the Data Availability Statement.

________________________________________

5. Is the manuscript presented in an intelligible fashion and written in Standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: No

Response: In response to the reviewer’s feedback, we have carefully revised the language of the manuscript with the assistance of a native English speaker to improve clarity, grammar, and readability.

________________________________________

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors did not review and edit the conclusion of their work. Most of the figures/tables are unchanged. In the current version, the figures do not highlight the findings of the authors. I suggest editing the repeated findings for better readability. A reduction of Section 3 (Supporting Information) should be considered by the authors.

Response: In response to the reviewer's comments, we have thoroughly revised the conclusion of the manuscript to better reflect the findings of our study and improve the clarity of the message. The updated conclusion now highlights the key results and their implications more effectively. Additionally, we have reviewed the figures and tables in the manuscript. While the content in these sections has not changed significantly, we acknowledge the reviewer's concern about the presentation of the findings. We have revised the figures and tables to ensure that they are more aligned with the main findings, providing clearer visual support for the conclusions.

________________________________________

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: David Dannhauser

Response: No

Attachment

Submitted filename: Response to the comments.docx

pone.0319499.s003.docx (15.4KB, docx)

Decision Letter 2

Ruo Wang

10 Dec 2024

PONE-D-24-38371R2: Optimizing chemotherapeutic targets in non-small cell lung cancer with transfer learning for precision medicine (PLOS ONE)

Dear Dr. Kumari,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 24 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Ruo Wang

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors modified the conclusion of the manuscript and updated the colors of the last three figures.

Figures 3-5 are still unchanged. The reader does not get any information out of these figures. I strongly suggest modifying figures 3-5.

Table 5 and figures 9-11 show the same data. The authors should combine or reduce the repeated information.

The conclusion section repeats the values of Table 5 against an average value, which is not indicated in the figures or table.

The caption of all figures should be revised. The reader should be able to understand a figure without reading the whole manuscript.

The conclusion does not comment on any reason for the higher performance or the differences between the datasets.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean? ). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #2: Yes: David Dannhauser

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Apr 29;20(4):e0319499. doi: 10.1371/journal.pone.0319499.r007

Author response to Decision Letter 3


26 Dec 2024

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Response: We sincerely thank the reviewer for their valuable feedback and for recognizing the efforts we have made to address all the comments raised in the previous round of review. We appreciate your positive remarks and are grateful for your thorough evaluation of our manuscript.

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

Response: We appreciate the reviewer’s constructive feedback and acknowledge that certain aspects of the manuscript may require clarification or further elaboration to ensure technical soundness and alignment between the data and conclusions. We have thoroughly reviewed the identified areas and made necessary revisions to strengthen the methodological rigor, including elaboration on controls, replication, and sample sizes. Additionally, we have revised the discussion section to ensure that the conclusions are appropriately supported by the data presented. We hope these improvements address the concerns raised and enhance the overall quality of the manuscript.

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

Response: We have ensured that the statistical analysis in the manuscript has been conducted appropriately and rigorously, following standard practices and methodologies relevant to the study. Detailed descriptions of the statistical methods, including tests performed, sample sizes, and significance levels, are provided in the manuscript. If there are specific aspects of the statistical analysis that require further clarification, we would be happy to address them in additional revisions.

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

Response: We confirm that all data underlying the findings in our manuscript have been made fully available, adhering to the PLOS Data Policy. The raw data, including data points behind means, medians, and variance measures, have been provided in the supporting information or deposited in a publicly accessible repository. The details, including repository links and access instructions, are outlined in the Data Availability Statement within the manuscript.

5. Is the manuscript presented in an intelligible fashion and written in Standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Response: We thank the reviewer for their positive feedback regarding the clarity and language of the manuscript. We have made every effort to ensure that the manuscript is written in clear, correct, and unambiguous Standard English.

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors modified the conclusion of the manuscript and updated the colors of the last three figures.

Comment 1: Figures 3-5 are still unchanged. The reader does not get any information out of these figures. I strongly suggest modifying figures 3-5.

Response: We have revisited Figures 3-5 and made significant changes to improve their clarity and informativeness. The updated versions of these figures have been modified to better convey the key insights, and the changes have been highlighted in the manuscript using red-colored text for easy identification.

Comment 2: Table 5 and figures 9-11 show the same data. The authors should combine or reduce the repeated information.

Response: We have carefully reviewed Table 5 and Figures 9-11, and we acknowledge the redundancy in presenting the same data across these sections. To streamline the presentation and avoid repetition, we have removed the redundant information.

Comment 3: The conclusion section repeats the values of Table 5 against an average value, which is not indicated in the figures or table.

Response: We have revised the conclusion section to ensure that it accurately reflects the results presented in the figures and tables without referencing any averages that are not explicitly shown. The comparison to average values has been removed, and the conclusion now focuses on highlighting the performance improvements of the MRO+DTransL model based on the results from the Davis, KIBA, and Binding-DB datasets.

Comment 4: The caption of all figures should be revised. The reader should be able to understand a figure without reading the whole manuscript.

Response: We have revised the captions of all figures to ensure that they are self-explanatory and provide a clear understanding of the content without requiring the reader to refer to the entire manuscript. The revised captions are now more detailed and include explanations of trends and key insights, as suggested.

Comment 5: The conclusion does not comment on any reason for the higher performance or the differences between the datasets.

Response: We have revised the conclusion to provide insights into the reasons for the higher performance of the MRO+DTransL model and the observed differences between the datasets. The performance variations across the datasets can be attributed to differences in the characteristics of the datasets, such as the number of drug-target interaction (DTI) pairs, the diversity of drugs and proteins, and the nature of bioactivity measurements. For instance, the Davis dataset, with a smaller size compared to KIBA and BindingDB, showed slightly higher performance due to more precise drug-target pairings. The KIBA and BindingDB datasets, being larger and more diverse, presented a broader range of interactions, which enhanced the model's ability to generalize and improve precision and recall. These factors contribute to the performance differences observed in the MRO+DTransL model across the datasets. We have now included these explanations in the revised conclusion to provide a clearer understanding of the model's efficacy and the reasons behind its superior performance.

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: David Dannhauser

Response: Thanks for your review.

Attachment

Submitted filename: Response_to_the_comments_auresp_3.docx

pone.0319499.s004.docx (15.2KB, docx)

Decision Letter 3

Ruo Wang

16 Jan 2025

PONE-D-24-38371R3Optimizing chemotherapeutic targets in non-small cell lung cancer with transfer learning for precision medicinePLOS ONE

Dear Dr. Kumari,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 02 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Ruo Wang

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors modified the figures of the manuscript. The authors' findings are now clear to the reader. I would only suggest removing the percentage values from the graphs in figures 3-5, because some of the values overlap, which makes them difficult to read.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean? ). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #2: Yes: David Dannhauser

**********

PLoS One. 2025 Apr 29;20(4):e0319499. doi: 10.1371/journal.pone.0319499.r009

Author response to Decision Letter 4


31 Jan 2025

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors modified the figures of the manuscript. The authors' findings are now clear to the reader. I would only suggest removing the percentage values from the graphs in figures 3-5, because some of the values overlap, which makes them difficult to read.

Response: As per the suggestion, we have removed the percentage values from figures 3-5 in the revised manuscript; the changes are highlighted in red text.

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: David Dannhauser

Attachment

Submitted filename: Response to Reviewers.docx

pone.0319499.s005.docx (12.2KB, docx)

Decision Letter 4

Ruo Wang

4 Feb 2025

Optimizing chemotherapeutic targets in non-small cell lung cancer with transfer learning for precision medicine

PONE-D-24-38371R4

Dear Dr. Kumari,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ruo Wang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Ruo Wang

PONE-D-24-38371R4

PLOS ONE

Dear Dr. Kumari,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ruo Wang

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to the comments (4).docx

    pone.0319499.s002.docx (31.7KB, docx)
    Attachment

    Submitted filename: Response to the comments.docx

    pone.0319499.s003.docx (15.4KB, docx)
    Attachment

    Submitted filename: Response_to_the_comments_auresp_3.docx

    pone.0319499.s004.docx (15.2KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0319499.s005.docx (12.2KB, docx)

    Data Availability Statement

    All dataset files are available from:
    https://github.com/dingyan20/Davis-Dataset-for-DTA-Prediction
    https://github.com/warastra/ligand_target_prediction/blob/main/BindingDB_ligand_target_IC50_cleaned.csv
    https://github.com/Zhaoyang-Chu/HGRL-DTA/tree/main/source/data


    Articles from PLOS One are provided here courtesy of PLOS
