Abstract
Background
NDUFS1 is the largest subunit of OXPHOS complex I (MC-I) and mutations in this gene are associated with MC-I deficiency. This study aims to develop a graph neural network and attention mechanism-based radiopharmaceutical-protein (RP-protein) interaction prediction model for identifying an imaging candidate of mitochondrial function through targeting its core subunit NDUFS1.
Results
The estimated cell viability values for trastuzumab, 177Lu-DOTA-trastuzumab, and 225Ac-DOTA-trastuzumab were 290.1, 89.01, and 8.262 nM, respectively. The deep learning (DL) model was pretrained with normal compound-protein pairs. Afterwards, the model was fine-tuned with the dataset of RP-protein pairs and evaluated with five-fold cross-validation. The prediction model trained with normal compound-protein pairs effectively predicted the binding affinity. The fine-tuned model incorporating radioactive properties outperformed the same model trained only on normal compounds. The model estimated the important substructure of a compound related to its binding to the target protein. NDUFS1-targeting compounds were identified, and BDBM210829 showed the best binding affinity, binding rank, and LogP for binding to NDUFS1.
Conclusions
This study proposed a DL-based radiolabelled compound-protein interaction prediction model to identify a radiopharmaceutical (RP) that binds to the mitochondrial core subunit NDUFS1. The proposed model shows good performance for predicting RP-protein interaction. BDBM210829 was identified as a top candidate for radiolabeling and targeting the mitochondrial core subunit NDUFS1. This model can be used as an effective virtual screening tool for RP discovery.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13550-025-01300-z.
Keywords: Binding affinity, Radiopharmaceutical discovery, Compound protein interaction, Graph neural network, Mitochondria, NDUFS1
Background
Breast cancer is the second most frequently diagnosed malignancy worldwide and the fourth leading cause of cancer-related death [1]. Mitochondria play a significant role in breast cancer progression and metastasis. Mitochondrial oxidative phosphorylation (OXPHOS) was significantly upregulated in human breast tumours compared to that in normal tissues [2]. It was reported that cancerous cells exhibit mitochondrial dysfunction [3]; therefore, imaging of mitochondria would play a critical role in the diagnosis and staging of disease. OXPHOS involves five key complexes (CI–CV); among them, complex I (MC-I) is the largest (1 MDa) and is composed of 45 subunits, seven of which are encoded by mitochondrial DNA (mtDNA) (ND1–ND6 and ND4L), while the remaining subunits are encoded by nuclear DNA (nDNA). Among the nDNA-encoded subunits, seven core subunits (NDUFV1, NDUFV2, NDUFS1, NDUFS2, NDUFS3, NDUFS7, and NDUFS8) in the peripheral arm are responsible for NADH oxidation and electron transfer [4]. The largest subunit, NDUFS1 (75 kDa), contains three iron–sulfur clusters (N1b, N4, and N5) and plays an important role in electron transfer [5], and mutations in this gene lead to MC-I deficiency [6–8]. Mitochondrial imaging by targeting NDUFS1 would therefore provide valuable information on mitochondrial function.
Rotenone is a well-known MC-I high-affinity inhibitor that is believed to act near the site of ubiquinone reduction in the distal portion of the enzyme complex [9]. MC-I inhibitor-derivative imaging agents of mitochondria have been developed and studied. The uptake of 18F-BMS, which is a structural analogue of the insecticide pyridaben, a known MC-I inhibitor, was correlated with MC-I activity [10]. Later, BCPP PET probes were developed after observations that 18F-BMS exhibits relatively high nonspecific binding; here, the binding affinity of the produced RP was compared to that of rotenone, and it was assumed, based on experiments, that they share the same binding site [11].
Drug screening is a crucial part of the drug discovery process [12, 13], wherein several millions of chemical compounds are evaluated for their binding capacity to a target protein that was already identified and validated for specific diseases. In the drug screening process, the binding affinity between the target and compounds is estimated by experiments such as high-throughput screening [14–16]. Although several approaches have been studied for efficient drug screening (including the use of DNA-encoded libraries) [17, 18], experimental assays to assess the binding affinity of pairs are expensive [19, 20]. Computational approaches are widely proposed to overcome this limitation.
Computational approaches to predict compound-protein interactions (CPI) that are based on machine learning help determine the statistical relationship between the binding affinity and properties of compounds and proteins. The methodology varies according to the properties to be utilised. Conventional machine learning (ML)-based methods mainly use the chemical fingerprint as input features [21–24]. In contrast, DL-based methods mainly utilise the structural information of molecules and proteins and their chemical or physical properties. Recently, many studies have focused on the graph representation of molecules wherein the chemical properties of atoms within compounds and the chemical bonds between them are utilised, and various CPI models were adopted [25–35].
RPs are pharmaceutical compounds containing radioisotopes (RIs) in their structure [36, 37]. Compared to normal compounds without an RI, the effect of radiation is expected to be quite significant for binding affinity to the target protein because the imparted energy is high enough to break chemical bonds. Therefore, interactions between the RP and protein must be estimated for practical applications. However, experimental estimation is restricted to selected small peptides or monoclonal antibodies because synthesising millions of different radiolabelled compounds is impractical and extremely costly. Moreover, conventional CPI prediction models are limited in accuracy because most radioactive decay (including α decay and β decay) changes the chemical properties of the compound. These limitations highlight the necessity for computational approaches that predict the binding affinity between an RP and its target protein as a supplementary virtual screening tool. However, ML- or DL-based methods for predicting these interactions have not previously been addressed. This study aimed to develop an RP-protein interaction prediction model that integrates a graph neural network and an attention mechanism while incorporating radioactive properties, molecular structure, and target protein embeddings, to identify potential imaging candidates of mitochondrial function through targeting its core subunit NDUFS1.
Methods
Cell viability assay
A cell viability assay was conducted to investigate the effect of RPs with different radioisotopes. NCI-N87 cells overexpressing the HER2 protein were seeded at a density of 10⁴ cells/well and incubated for 48 h after treatment. The cells were exposed to three types of drugs: trastuzumab, 177Lu-DOTA-trastuzumab, and 225Ac-DOTA-trastuzumab. The concentrations of trastuzumab and 177Lu-DOTA-trastuzumab were 222.5, 445, 890, and 1,780 nM, and the corresponding activities were 12.5, 25, 50, and 100 µCi. The concentrations of 225Ac-DOTA-trastuzumab were 10.15, 20.31, 40.63, 81.25, 162.5, and 325 nM, and the corresponding activities were 0.0625, 0.125, 0.250, 0.500, 1.000, and 2.000 µCi. To estimate the effect of RPs, the difference between cell viability and unity was used. GraphPad Prism v10.1.0 software was used to estimate cell viability. Non-linear fitting was conducted manually when GraphPad Prism did not produce a satisfactory fit.
Dataset and data processing
In this study, binding affinity measurements of compound-protein pairs were acquired from BindingDB [38]. The acquired data included 2,799,762 measurements of 1,199,203 compounds and 9,185 targets. Among these, we selected the compound-protein pairs with measurements of the inhibitory constant (Ki) or the dissociation constant (Kd), which are two types of binding affinity. Pairs with missing information regarding amino acid chains or the PubChem CID [39] were excluded. For Ki, 529,643 pairs (composed of 3,496 proteins and 214,028 drugs) were used as the normal compound-protein binding affinity dataset. For Kd, 95,760 pairs (composed of 1,928 proteins and 21,918 drugs) were used. Both datasets were used independently for training; that is, the prediction models for Ki and Kd were obtained separately. The binding affinity profiles of the RPs were curated, and 73 pairs were gathered. The RP dataset includes 73 RPs and 8 target proteins: PSMA, SSTR2, NTS1, NTS2, B1R, prolyl endopeptidase, amyloid β, and HER2. A detailed description of the RP-binding affinity dataset is presented in Table S1.
The PubChem CID described in the raw data file was used to acquire a SMILES representation of the compounds using the PubChemPy library. The obtained SMILES representation was used as the input for the function provided by the PyTorch Geometric library [40], which returns the molecular structure graph of the given SMILES based on the RDKit library [41]. The molecular structure graph was homogeneous and consisted of atoms and chemical bond edges. Node embedding is the chemical property of each atom, including the atomic number, chirality, and formal charge. Amino acid sequences were used to represent proteins. A single amino acid was selected as the unit for protein sequence data. Furthermore, the binding affinity was log-transformed to avoid underestimation of drugs with high affinity for the target protein.
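The log transformation mentioned above can be illustrated with a minimal Python sketch. The function names are illustrative, and the use of nanomolar units is an assumption made only for this example.

```python
import math

def to_log_affinity(affinity_nm: float) -> float:
    """Natural log of a binding affinity (assumed here to be given in nM)."""
    if affinity_nm <= 0:
        raise ValueError("affinity must be positive")
    return math.log(affinity_nm)

def from_log_affinity(log_affinity: float) -> float:
    """Inverse transform: recover the affinity from its natural log."""
    return math.exp(log_affinity)

# Two hypothetical compounds three orders of magnitude apart in affinity.
strong, weak = 1.0, 1000.0
gap_linear = weak - strong                                     # dominated by the weak binder
gap_log = to_log_affinity(weak) - to_log_affinity(strong)      # equals ln(1000)
```

On the linear scale, an error of a few nM is negligible for the weak binder but large for the strong one. On the log scale, errors are measured relative to the affinity itself, so high-affinity compounds are not underweighted during training.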
Network structure
A DL-based model was developed to predict the Ki value of RP-protein interactions. The prediction model consists of four modules: drug, protein, attention, and regression. The drug module extracts atom-wise embeddings from a molecular graph. This module is composed of three graph convolution network (GCN) layers. Each GCN layer returns a transformed atom embedding, which is the neighbourhood aggregation of the transformed embeddings based on the convolution matrix over the input node embeddings. The protein module derives amino acid residue embeddings from the initial embedding using a long short-term memory (LSTM) layer. The attention module applies attention weights to the atom embeddings and amino acid residue embeddings based on their interaction. We referred to the bi-attention module used in the compound-protein interaction model proposed by Li et al. [27]. The detailed mechanism of this module is described in the Supplementary file. The attention-weighted atom embeddings are pooled into a compound embedding by applying graph max pooling, whereas the attention-weighted amino acid residue embeddings are transformed into a protein embedding by applying a fully connected (FC) layer. Finally, the concatenation of the two embeddings is used as the input for the regression module, which comprises three FC layers. A schematic of the network structure is shown in Fig. 1.
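To make the drug module more concrete, the following is a minimal pure-Python sketch of a graph convolution step followed by graph max pooling. It is not the authors' implementation: the symmetric normalisation with self-loops, the ReLU activation, the toy weights, and the names `gcn_layer` and `graph_max_pool` are all assumptions chosen for illustration.

```python
import math

def gcn_layer(node_feats, edges, weight):
    """One graph-convolution step, assuming H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(node_feats)
    # Neighbour sets with self-loops (the A + I term).
    neighbours = [{i} for i in range(n)]
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = [len(nb) for nb in neighbours]
    out_dim = len(weight[0])
    # Linear transform of every atom embedding (H W).
    transformed = [
        [sum(h[k] * weight[k][j] for k in range(len(h))) for j in range(out_dim)]
        for h in node_feats
    ]
    # Normalised neighbourhood aggregation followed by ReLU.
    out = []
    for i in range(n):
        row = [0.0] * out_dim
        for j in neighbours[i]:
            coeff = 1.0 / math.sqrt(degree[i] * degree[j])
            for d in range(out_dim):
                row[d] += coeff * transformed[j][d]
        out.append([max(0.0, v) for v in row])
    return out

def graph_max_pool(node_feats):
    """Pool atom-wise embeddings into one compound embedding (feature-wise max)."""
    return [max(column) for column in zip(*node_feats)]

# Toy molecule: three atoms in a chain, two features per atom.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = [(0, 1), (1, 2)]
toy_weight = [[0.5, -0.2], [0.1, 0.3]]

h = atoms
for _ in range(3):                      # three stacked GCN layers, as in the drug module
    h = gcn_layer(h, bonds, toy_weight)
compound_embedding = graph_max_pool(h)
```

In the described model, the attention weights are applied to the atom embeddings before pooling; that step is omitted here to keep the sketch short.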
Fig. 1.
Network architecture of the radiopharmaceutical-protein interaction prediction model
Radioactive properties of the RPs
The only crucial difference between a normal compound and an RP is the presence of an RI. The radiation emitted from the radioactive decay of the RI has high energy, and chemical bonds can be broken by the imparted energy. The RI was added to the compound’s SMILES to consider the radiation effect on the RP-ligand interaction. The RI’s radioactive properties, including the decay constant and the mass, charge, and energy of the radiation, were extracted using the RDKit.Chem library. The decay constant is the probability per unit time that a nuclide decays and therefore indicates the rate of radioactive decay. The physical properties of the radiation (including its mass and charge), in turn, determine its type of interaction with matter. Radioactive properties were used as additional node embeddings of the molecular graph in the RP-protein interaction prediction model. Therefore, the node embedding dimension of the RP molecular graph is 13.
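As a small worked example of the decay constant, the sketch below derives it from the half-life via λ = ln 2 / T½. The half-life values are approximate figures included only for illustration and should be checked against a nuclear data table; the dictionary and function names are hypothetical.

```python
import math

SECONDS_PER_UNIT = {"min": 60.0, "h": 3600.0, "d": 86400.0}

def decay_constant(half_life: float, unit: str) -> float:
    """Decay constant in 1/s from a half-life: lambda = ln(2) / T_half."""
    return math.log(2.0) / (half_life * SECONDS_PER_UNIT[unit])

# Approximate half-lives (illustrative values, not taken from the study).
APPROX_HALF_LIVES = {
    "68Ga": (67.7, "min"),
    "18F": (109.8, "min"),
    "177Lu": (6.65, "d"),
    "225Ac": (9.92, "d"),
}

decay_constants = {
    isotope: decay_constant(value, unit)
    for isotope, (value, unit) in APPROX_HALF_LIVES.items()
}
```

Short-lived diagnostic isotopes such as 68Ga have a much larger decay constant than longer-lived therapeutic isotopes such as 177Lu, which is why this quantity can serve as a distinguishing node feature.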
Training and Evaluation
Training of the RP-protein interaction model was conducted in two steps: pre-training the model with normal compound data from BindingDB and fine-tuning the model with the RP data. In the pre-training phase, the model was trained using the Adam optimiser. The number of epochs was set to 200. The initial learning rate was 10⁻⁴ and decreased exponentially with a ratio of 0.99 after epoch 100. The model was trained to predict ln(Ki), which is the natural logarithm of Ki. The Huber loss was used as the loss function. It is defined as follows for the predicted value ŷ, ground-truth y, and hyperparameter δ (which was set to 1.0):

$$
L_\delta(y, \hat{y}) =
\begin{cases}
\frac{1}{2}\,(y - \hat{y})^2, & |y - \hat{y}| \le \delta \\
\delta\left(|y - \hat{y}| - \frac{1}{2}\,\delta\right), & \text{otherwise}
\end{cases}
$$
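The Huber loss and the learning-rate schedule described above can be sketched in a few lines of Python. The per-epoch application of the 0.99 ratio and the function names are assumptions; the standard textbook form of the Huber loss is used.

```python
def huber_loss(y_true: float, y_pred: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for small residuals, linear for large ones."""
    residual = abs(y_true - y_pred)
    if residual <= delta:
        return 0.5 * residual * residual
    return delta * (residual - 0.5 * delta)

def learning_rate(epoch: int, base_lr: float = 1e-4,
                  gamma: float = 0.99, decay_start: int = 100) -> float:
    """Constant rate up to `decay_start`, then multiplied by `gamma` each epoch (assumed)."""
    if epoch < decay_start:
        return base_lr
    return base_lr * gamma ** (epoch - decay_start)

# Residuals on the ln(Ki) scale: a small error and a large (outlier-like) error.
small_error_loss = huber_loss(2.0, 2.5)   # quadratic branch
large_error_loss = huber_loss(2.0, 5.0)   # linear branch
final_lr = learning_rate(199)             # rate at the last of 200 epochs
```

Because the loss grows only linearly beyond δ, a few badly predicted pairs influence the gradient less than they would under a squared-error loss.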
To evaluate the pre-trained model, the data were randomly split into training and test sets at a ratio of 9:1. The accuracy of the predicted Ki was assessed using six evaluation metrics: Pearson’s correlation coefficient (PCC), R-squared (R2), mean absolute error (MAE), area under the curve (AUC), concordance index (CI), and mean squared error (MSE).
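For reference, the regression metrics named above can be computed as in the following pure-Python sketch. The function names and the toy values are illustrative only; AUC and CI are omitted for brevity.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson(y_true, y_pred):
    """Pearson's correlation coefficient."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Toy ln(Ki) values (hypothetical, for illustration only).
truth = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
scores = {
    "MAE": mae(truth, pred),
    "MSE": mse(truth, pred),
    "R2": r_squared(truth, pred),
    "PCC": pearson(truth, pred),
}
```

Note that R2 becomes negative whenever the predictions fit worse than simply predicting the mean of the ground truth, whereas PCC can remain positive in the same situation because it ignores offset and scale.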
In the fine-tuning step, the training conditions (including the optimiser and learning rate) were the same as those in the pre-training. The number of epochs was 200 and the learning rate decreased exponentially after epoch 100. To identify an effective fine-tuning strategy, we defined five cases that differ in which modules are fine-tuned and compared their performance by five-fold cross-validation, which was used to mitigate the limited amount of data. A detailed description of the fine-tuning cases is presented in Table 1.
Table 1.
Fine tuning strategies for radiopharmaceutical protein interaction models
| Index | Layers supposed to be fine-tuned |
|---|---|
| Case 1 | Every trainable layer in regression module. |
| Case 2 | Every trainable layer in attention module and regression module. |
| Case 3 | Last GCN layer and batch normalisation layer in drug module, every trainable layer in attention module and regression module. |
| Case 4 | Every trainable layer in drug module, attention module, and regression module. |
| Case 5 | Entire model |
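The five-fold cross-validation used to compare these cases can be sketched as follows. This is a minimal illustration assuming a simple shuffled split of the 73 RP-protein pairs; the function name, the seed, and the splitting scheme are assumptions rather than the authors' exact procedure.

```python
import random

def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 73 RP-protein pairs, as in the curated dataset.
splits = list(k_fold_indices(73, k=5))
fold_sizes = [len(test) for _, test in splits]
```

Each fine-tuning case would be trained on the four training folds and scored on the held-out fold, and the metrics would then be averaged over the five folds.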
Patient selection
This study was conducted in 30 patients diagnosed with breast cancer. Patient age ranged from 31 to 68 years. The patients were selected based on specific inclusion criteria. All patients were drug-naive and had no history of cancer treatment, such as chemotherapy, radiotherapy, or surgical intervention. Breast cancer cases included 14 luminal subtypes, 4 HER2-positive subtypes, 3 luminal and HER2-positive cases, and 9 triple-negative breast cancer cases (two with recurrence). This selection was intended to ensure population homogeneity and reduce variability arising from previous treatment or different cancer subtypes.
Gene expression analysis
Differentially expressed genes (DEGs) among the 30 breast cancer patients were identified using PathfindR, an R-based analysis tool. The first DEG analysis compared 28 breast cancer patients without recurrence to two patients with triple-negative breast cancer who experienced recurrence. A second comparison was conducted between seven patients with triple-negative breast cancer without recurrence and the same two patients with recurrence. DEGs were selected based on a significance threshold of P-value < 0.05 and a log2 fold change > 1. Additionally, the percentage difference was calculated to identify changes in the NDUFS1 gene.
NDUFS1 binding RP screening
Drug screening to identify a compound that is suitable for radiolabelling and binds to the MC-I core subunit NDUFS1 was conducted using the RDKit library to analyse the structures of the compounds. Structures considered suitable for radiolabelling included nitrobenzene, halogen elements (Cl, Br, I, F), NO2, SO2, and pyridine. The MolFromSmiles() function in RDKit was used to convert chemical structures in the SMILES format into molecular structures. Subsequently, the HasSubstructMatch() function was applied to determine whether specific substructures were present in each compound. The analysis first focused on identifying compounds containing the nitrobenzene structure. Compounds containing halogen elements, NO2, SO2, and pyridine substructures were then screened in the same way.
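The substructure screening described above can be sketched with RDKit as follows. The SMARTS patterns, the dictionary layout, and the function name `labelling_motifs` are assumptions made for illustration; the patterns actually used in the study are not given. The example requires the RDKit package to be installed.

```python
from rdkit import Chem

# Hypothetical SMARTS patterns for the motifs named in the text.
MOTIF_SMARTS = {
    "nitrobenzene": "c1ccccc1[N+](=O)[O-]",
    "halogen": "[F,Cl,Br,I]",
    "nitro": "[N+](=O)[O-]",
    "sulfonyl": "S(=O)=O",
    "pyridine": "n1ccccc1",
}
MOTIF_PATTERNS = {name: Chem.MolFromSmarts(s) for name, s in MOTIF_SMARTS.items()}

def labelling_motifs(smiles: str):
    """Return the names of the motifs found in a compound, or None if the SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:          # MolFromSmiles returns None when parsing fails
        return None
    return [name for name, pattern in MOTIF_PATTERNS.items()
            if mol.HasSubstructMatch(pattern)]

# Simple test molecules (not compounds from the study).
examples = {
    "nitrobenzene": "O=[N+]([O-])c1ccccc1",
    "2-chloropyridine": "Clc1ccccn1",
    "dimethyl sulfone": "CS(C)(=O)=O",
    "ethanol": "CCO",
}
hits = {name: labelling_motifs(smi) for name, smi in examples.items()}
```

A compound would be kept as a radiolabelling candidate if the returned list is non-empty.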
Screened compounds were filtered to select the best candidate for binding to NDUFS1. The binding affinity and binding rank of the compounds to NDUFS1 were compared to those of rotenone and BCPP-BF. Compound LogP values were obtained from PubChem (XLogP3-AA). All compounds with an XLogP higher than 3 were excluded. Compounds with lower binding affinity to NDUFS1 than rotenone were also excluded. The top candidate was then radiolabelled and re-screened to confirm its binding affinity.
Computational validation
Computational validation was conducted using AutoDock 4 to assess the binding affinity (Ki) and visualise the binding sites of compounds binding to the NDUFS1 protein. The tested compounds included BDBM210829, BDBM214042, and rotenone. The 2D structure files (.sdf) of the ligands were obtained from BindingDB and subsequently converted to “.mol2” format using Open Babel. BIOVIA Discovery Studio Visualizer was used to visualise the binding sites of each compound within NDUFS1. AutoDock Ki values were compared with the Ki values predicted by the DL model.
Results
Cell viability assay
The NCI-N87 cell viability assay was conducted for trastuzumab, 177Lu-DOTA-trastuzumab, and 225Ac-DOTA-trastuzumab. The estimated effects of the drugs on cell viability were 290.1, 89.01, and 8.262 nM, respectively. Although the targeting drug was identical across all three drugs, their cytotoxicity varied. 225Ac-DOTA-trastuzumab showed the highest efficacy among the drugs. It was found that radiation affects the binding affinity of the drug-protein pair [].
Model evaluation
On the test datasets randomly selected from the normal compound database (Ki: n = 48,986; Kd: n = 8,353), the R2 values between the predicted and ground-truth values of the pre-trained prediction model were 0.755 for Ki and 0.746 for Kd (Fig. 2; Table 2). The corresponding PCC values were 0.875 and 0.866. The results for the other evaluation metrics are presented in Table 2. These metrics demonstrated that the prediction model effectively predicted the binding affinity of normal compounds to the target protein within the test dataset. However, the pre-trained model predicted the binding affinity of RPs poorly, despite its high performance in the prediction of normal compounds (Table 3). Table 3 also shows that fine-tuning improved the prediction of the binding affinity of the RP pairs in every case. Among them, fine-tuning every trainable parameter in the model (Case 5) yielded the highest accuracy for R2 and PCC.
Fig. 2.
Scatter plot of the predicted natural log of binding affinities by the prediction model and their ground-truth of the normal test dataset. A: Ki; B: Kd
Table 3.
Five-fold cross validation results for the proposed methods with the radiopharmaceutical dataset
| Fine-tuning strategy | R2 | MAE | PCC |
|---|---|---|---|
| Case 1 | 0.119 | 1.26 | 0.512 |
| Case 2 | 0.154 | 1.23 | 0.506 |
| Case 3 | 0.0772 | 1.32 | 0.561 |
| Case 4 | 0.204 | 1.14 | 0.512 |
| Case 5 | 0.374 | 0.996 | 0.662 |
| Pre-trained model | -12.1 | 3.64 | 0.437 |
Table 2.
Evaluation of the pre-training results of the prediction model with the normal test dataset (n = 48,986)
| Binding affinity | R2 | MAE | PCC | AUC | CI | MSE |
|---|---|---|---|---|---|---|
| Ki | 0.755 | 1.27 | 0.875 | 0.937 | 0.855 | 3.529 |
| Kd | 0.746 | 0.985 | 0.866 | 0.933 | 0.859 | 4.452 |
Case study
We evaluated the binding affinity of specific pairs in addition to evaluating the prediction model by cross-validation. We focused on the PSMA protein, which was mainly addressed in studies on RP development. The inhibitory constants of PSMA-617 and radiolabelled PSMA-617 on the PSMA protein were estimated using AutoDock Vina [42], a widely used docking simulation program, and the proposed DL model. Moreover, SSTR2 in the training and test datasets was also considered. Dotatate and radiolabelled dotatate were considered target compounds of the SSTR2 protein. The experimental results for radiolabelled compound pairs were compared [43–45]. The fine-tuned RP-protein interaction prediction model showed similar Ki to the experiments, while the value predicted by the normal CPI model or AutoDock Vina was less accurate (Table 4). Moreover, the values predicted by AutoDock Vina fluctuated.
Table 4.
Comparison of Ki (nM) values obtained from experiments, AutoDock Vina, the compound-protein interaction (CPI) model, and the radiopharmaceutical-protein interaction (RPI) model for specific pairs
| Target protein | Compound | Experiment | AutoDock Vina | AI prediction model: CPI | AI prediction model: RPI |
|---|---|---|---|---|---|
| PSMA | PSMA-617 | 2.34 ± 2.94 | 6.79 × 10⁻³ | 0.1862 | - |
| PSMA | 68Ga-PSMA-617 | 6.40 ± 1.02 | 32.11 | 0.1367 | 6.292 |
| PSMA | 177Lu-PSMA-617 | 6.9 ± 1.32 | 1.59 × 10³ | 0.2623 | 5.747 |
| SSTR2 | Dotatate | x | 128.75 | 0.1717 | - |
| SSTR2 | 68Ga-dotatate | 0.7 ± 0.2 | 1.77 × 10⁸ | 0.1617 | 5.325 |
| SSTR2 | 177Lu-dotatate | 3.84 ± 0.30 | 531.48 | 0.2837 | 5.737 |
Model interpretation
The trained attention module in the prediction model provides an understanding of the individual effects of elements in the input data. From this perspective, analysing the attention weights is expected to reveal the effect of the RI. The attention weights of the pairs evaluated in the case studies were compared. The attention weights in the normal CPI prediction model were acquired for the normal compounds: PSMA-617 with PSMA and dotatate with SSTR2. To estimate the atoms most closely related to binding, we calculated the atom-wise importance for a given pair by integrating the attention weights over the amino acid residues. Docking simulation results showed that the oxygen atom in the carboxyl group (R-COOH) and the hydrogen atom in the amino group (R-NH2) were involved in binding of the drug to the protein. Results from our model showed a tendency for the carboxyl and amino groups to receive significant attention. These results align with previous findings [46, 47]. The substructures with high attention weights were almost identical between the normal and radiolabelled compounds. Moreover, 177Lu showed a high attention weight, whereas 68Ga did not. This result is reasonable since 177Lu is a widely used therapeutic RI that emits β radiation, whereas 68Ga is a positron emitter that is used as a diagnostic RI.
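The atom-wise importance described above can be sketched as a sum of attention weights over the residue axis. The matrix layout (rows for atoms, columns for residues), the final normalisation, and the function name are assumptions for illustration.

```python
def atom_importance(attention):
    """Sum attention[i][j] over residues j for each atom i, then normalise to 1."""
    totals = [sum(row) for row in attention]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

# Toy attention matrix: 3 atoms (rows) x 4 residues (columns), hypothetical values.
toy_attention = [
    [0.01, 0.02, 0.01, 0.01],   # e.g. a backbone carbon
    [0.10, 0.20, 0.15, 0.05],   # e.g. a carboxyl oxygen
    [0.05, 0.10, 0.20, 0.10],   # e.g. an amino nitrogen
]
importance = atom_importance(toy_attention)
most_important_atom = max(range(len(importance)), key=importance.__getitem__)
```

Atoms with the largest normalised scores are the ones the model associates most strongly with binding for that compound-protein pair.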
Gene expression analysis
DEG analysis identified NDUFS1 as a gene of interest between the compared groups and showed that NDUFS1 expression was higher in patients with recurrence than in patients without recurrence. In the comparison between patients without recurrence and triple-negative patients with recurrence, NDUFS1 showed a fold change of 1.398 with a percentage difference of 39.80% (P = 0.649). In the comparison between triple-negative patients without recurrence and triple-negative patients with recurrence, NDUFS1 showed a fold change of 1.651 and a percentage difference of 65.10% (P = 0.405).
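The relation between the reported fold changes and percentage differences, together with the DEG selection rule stated in the Methods, can be written out in a short Python sketch. The formula for the percentage difference is inferred from the reported numbers, and treating the log2 fold change threshold as applying to up-regulation only is an assumption; the function names are illustrative.

```python
import math

def percent_difference(fold_change: float) -> float:
    """Percentage difference relative to the reference group: (FC - 1) * 100."""
    return (fold_change - 1.0) * 100.0

def log2_fold_change(fold_change: float) -> float:
    """Fold change expressed on the log2 scale."""
    return math.log2(fold_change)

def is_deg(p_value: float, fold_change: float,
           p_cutoff: float = 0.05, log2fc_cutoff: float = 1.0) -> bool:
    """Selection rule from the Methods: P-value < 0.05 and log2 fold change > 1."""
    return p_value < p_cutoff and log2_fold_change(fold_change) > log2fc_cutoff

# Fold changes reported for NDUFS1 in the two comparisons.
pct_all_vs_recurrence = percent_difference(1.398)
pct_tnbc_vs_recurrence = percent_difference(1.651)
```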
NDUFS1 binding RP screening
After training the model on a large drug and protein dataset, NDUFS1 protein was set as a target binding site. Besides rotenone and BCPP-BF, the model identified 37 compounds that bound to NDUFS1 with different binding affinities and rankings. The binding affinity of the compounds ranged from 4.5 to 3240.8 nM, while the binding ranking ranged from 2580 to 194,788. Figure 3 shows the 37 identified compounds with corresponding binding affinities and rankings.
Fig. 3.
Binding affinity with corresponding ranking of NDUFS1-binding compounds identified by the proposed model
The top NDUFS1-binding compounds were selected based on binding affinity, binding rank, and LogP value. After exclusion according to the criteria, eight compounds were selected as the best NDUFS1-binding candidates. Figure 4 shows the chemical structure of each compound. Among the screened compounds, BDBM210829 was selected as the top compound that binds to NDUFS1. BDBM210829 exhibited the highest binding affinity and best ranking for the NDUFS1 protein. The binding affinities of rotenone, 18F-BCPP-BF, and BDBM210829 were 873.20, 2082.24 and 4.47 nM, respectively, with corresponding ranking values of 188,430, 121,508 and 2580. The LogP values were 2.9, 1.4 and 1.47, respectively. Table 5 presents each compound’s binding affinity, ranking, and LogP. However, after labelling with fluorine-18 (18F), the predicted Ki, binding rank, and LogP of BDBM210829 changed to 598.08 nM, 90,835 and 1.47, respectively, indicating a lower predicted binding affinity and a worse rank.
Fig. 4.

Chemical structure of top 8 selected NDUFS1-binding compounds identified by the proposed model. (A) BDBM85009, (B) BDBM210829, (C) CYN154806, (D) BDBM214012, (E) BDBM214042, (F) CID135423658, (G) CHEMBL4290888, and (H) CHEMBL3605412
Table 5.
Binding affinity (Ki), binding rank, and XLogP of top 10 screened compounds that bind to NDUFS1
| Drug | Ki (nM) | Binding rank | XLogP |
|---|---|---|---|
| BDBM85009 | 223.53 | 44,343 | 1.1 |
| BDBM210829 | 4.47 | 2580 | 1.4 |
| CYN154806 | 294.88 | 42,553 | 2.1 |
| BDBM214012 | 21.52 | 99,701 | 2.9 |
| BDBM214042 | 11.14 | 125,285 | 2.9 |
| CID135423658 | 485.95 | 92,311 | 1.9 |
| CHEMBL4290888 | 375.44 | 97,145 | 1.7 |
| CHEMBL3605412 | 35.98 | 12,962 | 1.9 |
| Rotenone | 873.20 | 188,430 | 2.9 |
| BCPP_BF | 2082.24 | 121,508 | 3.3 |
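The filtering criteria can be reproduced on the values in Table 5 with a short Python sketch. The tuple layout and the function name `screen` are illustrative; the thresholds follow the stated criteria (XLogP not higher than 3, and a lower Ki than rotenone).

```python
# (name, Ki in nM, binding rank, XLogP), copied from Table 5.
TABLE_5 = [
    ("BDBM85009", 223.53, 44343, 1.1),
    ("BDBM210829", 4.47, 2580, 1.4),
    ("CYN154806", 294.88, 42553, 2.1),
    ("BDBM214012", 21.52, 99701, 2.9),
    ("BDBM214042", 11.14, 125285, 2.9),
    ("CID135423658", 485.95, 92311, 1.9),
    ("CHEMBL4290888", 375.44, 97145, 1.7),
    ("CHEMBL3605412", 35.98, 12962, 1.9),
    ("Rotenone", 873.20, 188430, 2.9),
    ("BCPP_BF", 2082.24, 121508, 3.3),
]

def screen(compounds, reference="Rotenone", max_xlogp=3.0):
    """Keep compounds with XLogP <= max_xlogp and a lower Ki than the reference."""
    reference_ki = next(ki for name, ki, _, _ in compounds if name == reference)
    kept = [c for c in compounds if c[3] <= max_xlogp and c[1] < reference_ki]
    return sorted(kept, key=lambda c: c[1])   # strongest predicted binder first

candidates = screen(TABLE_5)
top_candidate = candidates[0][0]
```

Applied to the table, the filter leaves the eight selected compounds, with BDBM210829 ranked first by predicted Ki.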
Computational validation
Computational validation of the proposed model’s binding affinity results was conducted using AutoDock 4 for three compounds binding to the NDUFS1 protein. AutoDock predicted Ki values of 4.12 nM, 11.72 nM, and 1030 nM for BDBM210829, BDBM214042, and rotenone, respectively. Our model showed strong agreement with these results, with Ki values of 4.47 nM, 11.14 nM, and 873.20 nM for BDBM210829, BDBM214042, and rotenone, respectively. The binding sites of each compound with NDUFS1 are illustrated in Fig. 5. BDBM210829 exhibits strong electrostatic interactions with negatively charged residues such as Asp551, Asp698, and Glu369 of NDUFS1. A hydrogen bond is formed with Arg702, while a repulsive interaction is noted with Lys592 due to a positive–positive charge conflict.
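The agreement between the two methods can be quantified directly from the reported values, as in the following Python sketch. Using the relative difference with respect to the AutoDock value is one possible choice of measure and is an assumption here, as are the variable names.

```python
# (AutoDock 4 Ki, DL model Ki), both in nM, as reported above.
KI_PAIRS = {
    "BDBM210829": (4.12, 4.47),
    "BDBM214042": (11.72, 11.14),
    "Rotenone": (1030.0, 873.20),
}

def relative_difference(reference: float, value: float) -> float:
    """Absolute difference expressed as a fraction of the reference value."""
    return abs(value - reference) / reference

agreement = {
    name: relative_difference(autodock_ki, model_ki)
    for name, (autodock_ki, model_ki) in KI_PAIRS.items()
}
largest_gap = max(agreement, key=agreement.get)
```

With this measure, the two methods differ by less than about 16% for all three compounds, with the largest relative gap for rotenone.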
Fig. 5.
Docking simulation results using AutoDock 4 of compounds binding to NDUFS1 protein. Dotted yellow line indicates the docking site of the pair. (A) BDBM210829; (B) BDBM214042; (C) Rotenone; (D) BDBM210829-NDUFS1 interactions
Discussion
Radioimmunotherapy utilises the tumour-targeting property of antibodies to simultaneously improve the efficacy and reduce the side effects of radiation. In this study, we conducted a cell viability assay for non-radiolabelled and radiolabelled trastuzumab, a monoclonal antibody. It was found that radiation had a large effect on cell viability. Moreover, the comparison of 177Lu and 225Ac, which are beta and alpha emitters, respectively, indicated that radiation with high linear energy transfer has greater cell-killing ability.
In our future work, we plan to develop a virtual screening model for radiolabelled antibodies. Recently, structure-based DL virtual screening of antibodies was proposed by Constantin et al. [48]. However, the extension of the model to the radiolabelled antibody is expected to require different approaches to reflect the radioactive properties of the radiation because the atom-wise properties are not used in the model.
Although the prediction model effectively utilised radioactive properties that affect compound-protein interactions, there were limitations to this study. The most critical issue was the lack of data. Among the many available radioisotopes, approximately 16 are predominantly used in the form of radiopharmaceuticals (RPs) for internal therapy and diagnosis. However, in this study we gathered and utilised 73 RP pairs radiolabelled with only seven widely used radioisotopes (68Ga, 111In, 177Lu, 123I, 124I, 64Cu, and 212Pb) for training and validation (Table S1). Therefore, to enhance the model’s generalisability and applicability, a broader range of clinically used radioisotopes should be included. To mitigate overfitting, we employed cross-validation, regularisation, and dropout layers in our DL model. Additionally, since many studies estimating the binding affinity of RPs have addressed PSMA as the target protein, the target of the pairs within the RP dataset was predominantly PSMA. Moreover, various types of radiation should be used to determine the effect of the radiation mass and charge, which are closely related to the interaction of radiation with matter. However, the dataset only included positrons, electrons, and photons. Therefore, information on heavy charged particles is not included. Furthermore, the model is not capable of considering the daughter nuclides of the RI. The chemical properties of the RI and compound vary accordingly because most radioactive decay types involve changes in the number of protons within the nuclide. Therefore, the model must be improved to better reflect this behaviour. While the model performs well on test cases, further experimental validation is required.
The proposed DL model can predict important structures within a chemical compound in terms of protein binding using the trained attention module. Both physics-based docking simulations and the DL model suggest that the carboxyl and amino groups are significant regions in protein docking. The main difference between the two approaches is that the RP-protein interaction prediction model considers the effects of radiation. Docking-site analysis using AutoDock Vina for 68Ga-PSMA-617 and 177Lu-PSMA-617 showed similar results. In docking, the RI cannot be considered unless it is closely related to binding because the results represent the most likely binding subregion to the protein. On the other hand, the DL model can discriminate whether the RI is related to the binding or not. However, there is a limitation in estimating the binding site. Because the molecular graph only considers the covalent bonds between the atoms, the chelation of the RI, which is based on ionic bonds, is not considered. Therefore, the carboxyl and amino groups within the DOTA chelator were expected by the DL model to be involved in the binding. However, since they are already involved in chelation, they are not mainly related to the protein-compound binding.
Identifying an RP that binds to the core subunit of mitochondria, NDUFS1, would be helpful in imaging mitochondrial function and providing valuable insights into disease progression. In this study, we screened potential RPs targeting NDUFS1. Among the compounds evaluated, BDBM210829 exhibited the highest binding affinity and ranking for NDUFS1. Docking simulations using AutoDock 4 confirmed the strong binding of BDBM210829 to NDUFS1, with a predicted Ki value that closely matched the deep learning model’s output. BDBM210829 outperformed rotenone and BCPP-BF in terms of binding affinity and ranking to the NDUFS1 protein, which was expected since these compounds do not target the NDUFS1 protein, which is located in the N-module. Instead, rotenone and BCPP-BF bind to the Q reduction site located in the Q-module of MC-I [49]. Furthermore, BDBM210829 contains a halogenated pyridine ring (iodine substituent), which is a common motif used in radiopharmaceutical chemistry due to its compatibility with halogen exchange reactions or nucleophilic substitution for 18F-labelling. Radiolabelling of BDBM210829 with 18F affected the binding affinity and ranking values. These findings were in good agreement with a previous study [50]. In future work, we plan to evaluate BDBM210829 via in vivo experiments to further validate its potential for imaging mitochondrial function.
BDBM210829 was identified by the DL model as the top candidate for binding to the NDUFS1 protein. To validate this finding computationally, physics-based docking simulations were performed with AutoDock 4. The analysis included BDBM210829; BDBM214042, the second-ranked candidate; and the well-known MC-I inhibitor rotenone. The DL model and AutoDock produced similar Ki values for all tested compounds, which supports the accuracy of the proposed model.
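For readers comparing the two Ki estimates, the sketch below shows the standard thermodynamic relation Ki = exp(ΔG / RT) that converts a docking free energy of binding into an inhibition constant, together with a simple log-scale comparison. The source does not state how the AutoDock Ki values were obtained, so the use of this relation, the temperature of 298.15 K, and the helper names are assumptions made for illustration only.

```python
import math

GAS_CONSTANT_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15         # assumed temperature, not stated in the source

def ki_from_binding_energy(delta_g_kcal_per_mol, temperature_k=TEMPERATURE_K):
    """Inhibition constant (mol/L) from a binding free energy via Ki = exp(dG / RT)."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))

def pki(ki_molar):
    """Negative base-10 logarithm of Ki, the scale on which affinities are usually compared."""
    return -math.log10(ki_molar)

def log_unit_gap(ki_a, ki_b):
    """Absolute difference between two Ki values in log10 units."""
    return abs(pki(ki_a) - pki(ki_b))

# Hypothetical example: a docking energy of -10 kcal/mol and a made-up model output.
docking_ki = ki_from_binding_energy(-10.0)
model_ki = 3.0e-8  # placeholder value, not a result from the paper
print(f"docking Ki = {docking_ki:.2e} M, gap = {log_unit_gap(docking_ki, model_ki):.2f} log units")
```

Comparing on the pKi scale rather than on raw concentrations avoids overstating differences between values that span several orders of magnitude.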
Conclusions
This study proposed a DL-based radiolabelled compound-protein interaction prediction model to identify an RP that binds to the mitochondrial core subunit NDUFS1, aiming to facilitate imaging of mitochondrial function. The model utilises a graph neural network and an attention mechanism, incorporating the radioactive properties of radioisotopes (RIs) as additional input atom embeddings. Designed to predict the inhibition constant (Ki) for a given compound-protein pair, it was optimised using data from BindingDB and fine-tuned with a curated dataset of RP-protein pairs. The fine-tuned model effectively predicted the binding affinity of RPs to their target proteins, and the atom-wise importance of the input could be assessed through attention-based model interpretation. The compound BDBM210829 is suitable for 18F radiolabelling and exhibited high binding affinity to the NDUFS1 protein with both the DL model and AutoDock. This work is expected to contribute to advancements in RP discovery for functional imaging.
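One way to picture radioactive properties entering a molecular graph as additional atom-level input is sketched below. This is an assumption-laden toy, not the feature scheme of the proposed model: the element vocabulary, the choice of a radioactivity flag plus log half-life as the extra features, and the function name are all hypothetical, and the half-lives are approximate values.

```python
import math

# Toy element vocabulary (hypothetical; a real model would cover many more elements).
ELEMENTS = ["C", "N", "O", "F", "Ga", "Lu"]

# Approximate physical half-lives in hours, used here only as an example property.
HALF_LIFE_HOURS = {"18F": 1.83, "68Ga": 1.13, "177Lu": 159.5}

def atom_features(element, isotope=None):
    """One-hot element encoding followed by two radioisotope features.

    The extra features are [is_radioactive, log10(half-life in hours)];
    stable atoms receive zeros so that ordinary compounds keep the same layout.
    """
    one_hot = [1.0 if element == e else 0.0 for e in ELEMENTS]
    if isotope is None:
        radio = [0.0, 0.0]
    else:
        radio = [1.0, math.log10(HALF_LIFE_HOURS[isotope])]
    return one_hot + radio

stable_f = atom_features("F")
labelled_f = atom_features("F", "18F")
print(stable_f)
print(labelled_f)
```

Keeping the element encoding identical for stable and radioactive atoms is one way a model pretrained on ordinary compounds could be fine-tuned on RP-protein pairs without changing the input layout.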
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
None.
Abbreviations
- MC-I
Mitochondrial complex I
- Ki
Inhibition constant
- CID
Compound ID
- RP
Radiopharmaceutical
- SMILES
Simplified molecular input line entry system
- GCN
Graph convolution network
- LSTM
Long short-term memory
- FC
Fully connected
- RI
Radioisotope
- R2
Coefficient of determination
- MAE
Mean absolute error
- DL
Deep learning
Author contributions
All authors contributed to the study conception and design. Conceptualization, Muath Almaslamani, Jingyu Yang, Chi Soo Kang, Choong Mo Kang and Sang-Keun Woo; methodology, Muath Almaslamani, Jingyu Yang, Chi Soo Kang, Choong Mo Kang and Sang-Keun Woo; software, Muath Almaslamani, Jingyu Yang and Sang-Keun Woo; validation, Jingyu Yang; formal analysis, Muath Almaslamani; investigation, Muath Almaslamani; resources, Sang-Keun Woo; data curation, Jung Mi Park and Sang-Keun Woo; writing (original draft preparation), Muath Almaslamani; writing (review and editing), Muath Almaslamani and Sang-Keun Woo; visualization, Muath Almaslamani and Jingyu Yang; supervision, Sang-Keun Woo; project administration, Sang-Keun Woo. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Korea Institute of Radiological and Medical Sciences (KIRAMS) (50554-2025, 50461-2025).
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author upon reasonable request.
Declarations
Ethics approval and consent to participate
The study was approved by the Institutional Review Board of KIRAMS (IRB No.: 2022-09-006-002). All methods were performed in accordance with the relevant guidelines and regulations.
Consent for publication
Informed consent was obtained from all participants involved in the study.
Competing interests
The authors have declared that no competing interest exists.
Clinical trial number
Not applicable.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Chhikara BS, Parang K. Global cancer statistics 2022: the trends projection analysis. Chem Biol Lett. 2023;10(1):451.
- 2. Whitaker-Menezes D, Martinez-Outschoorn UE, Flomenberg N, Birbe RC, Witkiewicz AK, Howell A, et al. Hyperactivation of oxidative mitochondrial metabolism in epithelial cancer cells in situ: visualizing the therapeutic effects of metformin in tumor tissue. Cell Cycle. 2011;10(23):4047–64. doi:10.4161/cc.10.23.18151.
- 3. Mondal SK, Haas D, Han J, Whiteside TL. Small EV in plasma of triple negative breast cancer patients induce intrinsic apoptosis in activated T cells. Commun Biol. 2023;6(1):815. doi:10.1038/s42003-023-05169-3.
- 4. Mimaki M, Wang X, McKenzie M, Thorburn DR, Ryan MT. Understanding mitochondrial complex I assembly in health and disease. Biochim Biophys Acta. 2012;1817(6):851–62. doi:10.1016/j.bbabio.2011.08.010.
- 5. Read AD, Bentley RE, Archer SL, Dunham-Snary KJ. Mitochondrial iron–sulfur clusters: structure, function, and an emerging role in vascular biology. Redox Biol. 2021;47:102164. doi:10.1016/j.redox.2021.102164.
- 6. Martín MA, Blázquez A, Gutierrez-Solana LG, Fernández-Moreira D, Briones P, Andreu AL, et al. Leigh syndrome associated with mitochondrial complex I deficiency due to a novel mutation in the NDUFS1 gene. Arch Neurol. 2005;62(4):659–61. doi:10.1001/archneur.62.4.659.
- 7. Björkman K, Sofou K, Darin N, Holme E, Kollberg G, Asin-Cayuela J, et al. Broad phenotypic variability in patients with complex I deficiency due to mutations in NDUFS1 and NDUFV1. Mitochondrion. 2015;21:33–40. doi:10.1016/j.mito.2015.01.003.
- 8. Bénit P, Chretien D, Kadhom N, de Lonlay-Debeney P, Cormier-Daire V, Cabral A, et al. Large-scale deletion and point mutations of the nuclear NDUFV1 and NDUFS1 genes in mitochondrial complex I deficiency. Am J Hum Genet. 2001;68(6):1344–52. doi:10.1086/320603.
- 9. Talpade DJ, Greene JG, Higgins DS Jr, Greenamyre JT. In vivo labeling of mitochondrial complex I (NADH:ubiquinone oxidoreductase) in rat brain using [3H]dihydrorotenone. J Neurochem. 2000;75(6):2611–21. doi:10.1046/j.1471-4159.2000.0752611.x.
- 10. Yalamanchili P, Wexler E, Hayes M, Yu M, Bozek J, Kagan M, et al. Mechanism of uptake and retention of F-18 BMS-747158-02 in cardiomyocytes: a novel PET myocardial imaging agent. J Nucl Cardiol. 2007;14(6):782–8. doi:10.1016/j.nuclcard.2007.07.009.
- 11. Harada N, Nishiyama S, Kanazawa M, Tsukada H. Development of novel PET probes, [18F]BCPP-EF, [18F]BCPP-BF, and [11C]BCPP-EM for mitochondrial complex 1 imaging in the living brain. J Label Comp Radiopharm. 2013;56(11):553–61. doi:10.1002/jlcr.3056.
- 12. Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov. 2004;3(8):711–5. doi:10.1038/nrd1470.
- 13. Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, Schacht AL. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov. 2010;9(3):203–14. doi:10.1038/nrd3078.
- 14. Bleicher KH, Böhm H-J, Müller K, Alanine AI. Hit and lead generation: beyond high-throughput screening. Nat Rev Drug Discov. 2003;2(5):369–78. doi:10.1038/nrd1086.
- 15. Mayr LM, Bojanic D. Novel trends in high-throughput screening. Curr Opin Pharmacol. 2009;9(5):580–8. doi:10.1016/j.coph.2009.08.004.
- 16. Schmidtke P, Barril X. Understanding and predicting druggability. A high-throughput method for detection of drug binding sites. J Med Chem. 2010;53(15):5858–67. doi:10.1021/jm100574m.
- 17. Clark MA, Acharya RA, Arico-Muendel CC, Belyanskaya SL, Benjamin DR, Carlson NR, et al. Design, synthesis and selection of DNA-encoded small-molecule libraries. Nat Chem Biol. 2009;5(9):647–54. doi:10.1038/nchembio.211.
- 18. Franzini RM, Neri D, Scheuermann J. DNA-encoded chemical libraries: advancing beyond conventional small-molecule libraries. Acc Chem Res. 2014;47(4):1247–55. doi:10.1021/ar400284t.
- 19. Chen X, Yan CC, Zhang X, Zhang X, Dai F, Yin J, Zhang Y. Drug–target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2016;17(4):696–712. doi:10.1093/bib/bbv066.
- 20. Lim S, Lu Y, Cho CY, Sung I, Kim J, Kim Y, et al. A review on compound-protein interaction prediction methods: data, format, representation and model. Comput Struct Biotechnol J. 2021;19:1541–56. doi:10.1016/j.csbj.2021.03.004.
- 21. Yu H, Chen J, Xu X, Li Y, Zhao H, Fang Y, et al. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data. PLoS ONE. 2012;7(5):e37608. doi:10.1371/journal.pone.0037608.
- 22. Shi H, Liu S, Chen J, Li X, Ma Q, Yu B. Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure. Genomics. 2019;111(6):1839–52. doi:10.1016/j.ygeno.2018.12.007.
- 23. Mahmud SMH, Chen W, Meng H, Jahan H, Liu Y, Hasan SMM. Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting. Anal Biochem. 2020;589:113507. doi:10.1016/j.ab.2019.113507.
- 24. Zeng X, Zhu S, Hou Y, Zhang P, Li L, Li J, et al. Network-based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest. Bioinformatics. 2020;36(9):2805–12. doi:10.1093/bioinformatics/btaa010.
- 25. Li S, Wan F, Shu H, Jiang T, Zhao D, Zeng J. MONN: a multi-objective neural network for predicting compound-protein interactions and affinities. Cell Syst. 2020;10(4):308–22.e11. doi:10.1016/j.cels.2020.03.002.
- 26. Chen L, Tan X, Wang D, Zhong F, Liu X, Yang T, et al. TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics. 2020;36(16):4406–14. doi:10.1093/bioinformatics/btaa524.
- 27. Li M, Lu Z, Wu Y, Li Y. BACPI: a bi-directional attention neural network for compound–protein interaction and binding affinity prediction. Bioinformatics. 2022;38(7):1995–2002. doi:10.1093/bioinformatics/btac035.
- 28. Wei L, Long W, Wei L. MDL-CPI: multi-view deep learning model for compound-protein interaction prediction. Methods. 2022;204:418–27. doi:10.1016/j.ymeth.2022.01.008.
- 29. Nguyen N-Q, Jang G, Kim H, Kang J. Perceiver CPI: a nested cross-attention network for compound–protein interaction prediction. Bioinformatics. 2023;39(1):btac731. doi:10.1093/bioinformatics/btac731.
- 30. Kyro GW, Smaldone AM, Shee Y, Xu C, Batista VS. T-ALPHA: a hierarchical transformer-based deep neural network for protein–ligand binding affinity prediction with uncertainty-aware self-learning for protein-specific alignment. J Chem Inf Model. 2025;65(5):2395–415.
- 31. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics. 2018;34(17):i821–9.
- 32. Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics. 2021;37(8):1140–7.
- 33. Li Y, Rezaei MA, Li C, Li X. DeepAtom: a framework for protein-ligand binding affinity prediction. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2019. pp. 303–310.
- 34. Kyro GW, Brent RI, Batista VS. HAC-Net: a hybrid attention-based convolutional neural network for highly accurate protein–ligand binding affinity prediction. J Chem Inf Model. 2023;63(7):1947–60.
- 35. Wang Y, Wu S, Duan Y, Huang Y. A point cloud-based deep learning strategy for protein–ligand binding affinity prediction. Brief Bioinform. 2022;23(1):bbab474.
- 36. Payolla FB, Massabni AC, Orvig C. Radiopharmaceuticals for diagnosis in nuclear medicine: a short review. Eclética Quim. 2019;44:11–9.
- 37. Sgouros G, Bodei L, McDevitt MR, Nedrow JR. Radiopharmaceutical therapy in cancer: clinical advances and challenges. Nat Rev Drug Discov. 2020;19(9):589–608. doi:10.1038/s41573-020-0073-9.
- 38. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 2016;44(D1):D1045–53. doi:10.1093/nar/gkv1072.
- 39. Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019;47(D1):D1102–9. doi:10.1093/nar/gky1033.
- 40. Fey M, Lenssen JE. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428. 2019.
- 41. Landrum G. RDKit: a software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum. 2013;8:31.
- 42. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455–61. doi:10.1002/jcc.21334.
- 43. Benešová M, Schäfer M, Bauder-Wüst U, Afshar-Oromieh A, Kratochwil C, Mier W, et al. Preclinical evaluation of a tailor-made DOTA-conjugated PSMA inhibitor with optimized linker moiety for imaging and endoradiotherapy of prostate cancer. J Nucl Med. 2015;56(6):914–20. doi:10.2967/jnumed.114.147413.
- 44. Rousseau E, Lau J, Zhang Z, Uribe CF, Kuo H-T, Zhang C, et al. Effects of adding an albumin binder chain on [177Lu]Lu-DOTATATE. Nucl Med Biol. 2018;66:10–7. doi:10.1016/j.nucmedbio.2018.08.001.
- 45. Liu Z, Pourghiasian M, Pan J, Lin KS, Benard F, Perrin D. Preclinical evaluation of a novel 18F-labelled somatostatin receptor-binding peptide. J Nucl Med. 2014;55(supplement 1):1089.
- 46. Kopka K, Benešová M, Bařinka C, Haberkorn U, Babich J. Glu-ureido–based inhibitors of prostate-specific membrane antigen: lessons learned during the development of a novel class of low-molecular-weight theranostic radiotracers. J Nucl Med. 2017;58(Supplement 2):S17–26.
- 47. Gervasoni S, Öztürk I, Guccione C, Bosin A, Ruggerone P, Malloci G. Interaction of radiopharmaceuticals with somatostatin receptor 2 revealed by molecular dynamics simulations. J Chem Inf Model. 2023;63(15):4924–33.
- 48. Schneider C, Buchanan A, Taddese B, Deane CM. DLAB: deep learning methods for structure-based virtual screening of antibodies. Bioinformatics. 2022;38(2):377–83. doi:10.1093/bioinformatics/btab660.
- 49. Galemou Yoga E, Schiller J, Zickermann V. Ubiquinone binding and reduction by complex I – open questions and mechanistic implications. Front Chem. 2021;9:672851. doi:10.3389/fchem.2021.672851.
- 50. Ganguly T, Bauer N, Davis RA, Foster CC, Harris RE, Hausner SH, Roncali E, Tang SY, Sutcliffe JL. Preclinical evaluation of 68Ga- and 177Lu-labeled integrin αvβ6-targeting radiotheranostic peptides. J Nucl Med. 2023;64(4):639–44.