A complete graph-based approach with multi-task learning for predicting synergistic drug combinations

Xiaowen Wang; Hongming Zhu; Danyi Chen; Yongsheng Yu; Qi Liu; Qin Liu

doi:10.1093/bioinformatics/btad351

. 2023 Jun 1;39(6):btad351. doi: 10.1093/bioinformatics/btad351

A complete graph-based approach with multi-task learning for predicting synergistic drug combinations

Xiaowen Wang ¹, Hongming Zhu ², Danyi Chen ³, Yongsheng Yu ⁴, Qi Liu ^5,^✉, Qin Liu ^6,^✉

Editor: Zhiyong Lu

PMCID: PMC10256571 PMID: 37261842

Abstract

Motivation

Drug combination therapy shows significant advantages over monotherapy in cancer treatment. Since the combinational space is difficult to be traversed experimentally, identifying novel synergistic drug combinations based on computational methods has become a powerful tool for pre-screening. Among them, methods based on deep learning have far outperformed other methods. However, most deep learning-based methods are unstable and will give inconsistent predictions even by simply changing the input order of drugs. In addition, the insufficient experimental data of drug combination screening limits the generalization ability of existing models. These problems prevent the deep learning-based models from being in service.

Results

In this article, we propose CGMS to address the above problems. CGMS models a drug combination and a cell line as a heterogeneous complete graph, and generates the whole-graph embedding to characterize their interaction by leveraging the heterogeneous graph attention network. Based on the whole-graph embedding, CGMS can make a stable, order-independent prediction. To enhance the generalization ability of CGMS, we apply the multi-task learning technique to train the model on drug synergy prediction task and drug sensitivity prediction task simultaneously. We compare CGMS’s generalization ability with six state-of-the-art methods on a public dataset, and CGMS significantly outperforms other methods in the leave-drug combination-out scenario, as well as in the leave-cell line-out and leave-drug-out scenarios. We further present the benefit of eliminating the order dependency and the discrimination power of whole-graph embeddings, interpret the rationality of the attention mechanism, and verify the contribution of multi-task learning.

Availability and implementation

The code of CGMS is available via https://github.com/TOJSSE-iData/CGMS.

1 Introduction

Drug combination therapy is considered to be the cornerstone of cancer treatment (Mokhtari et al. 2017; Fudio et al. 2022). At present, more than 350 drug combinations have been clinically used for cancer treatment (Fudio et al. 2022). Compared with monotherapy, the co-involvement of two or more drugs is expected to inhibit multiple oncogenic pathways simultaneously (Gilad et al. 2021), resulting in higher efficacy, more delayed drug resistance and lower toxicity (Mokhtari et al. 2017). However, studies looking for synergistic drug combinations suffer from substantial failure rates (Fudio et al. 2022). Many synergistic drug combinations are proposed based on clinical experience and in vivo experiments, which are time-consuming, labor-intensive, and may expose patients to harmful treatments (Wang et al. 2022b). Although high-throughput screening (HTS) technology can screen drug combinations in vitro, the technology is expensive to build (Macarron et al. 2011) and cannot test the full combinational space. In order to discover anti-cancer synergistic drug combinations more efficiently and at low cost, researchers started to apply computational methods to predict synergistic drug combinations for pre-screening.

Early computational models were usually only applicable to specific targets, pathways, diseases, or cell lines (Preuer et al. 2018). In the past decade, researchers have started to utilize machine learning-based methods (Li et al. 2015; Ding et al. 2019; Sidorov et al. 2019) to predict synergistic drug combinations with omics data, medicinal chemical structures, or biological networks. With the increasing accumulation of experimental data (O’Neil et al. 2016) from techniques such as HTS, deep learning (DL) began to be involved for drug combination prediction. DeepSynergy (Preuer et al. 2018) is the earliest research that applies DL to predict synergistic drug combinations, which predicts the Loewe Synergy Score (Loewe 1953) of drug combinations with the multi-layer perceptron (MLP) and achieves far better performance than classical ML methods. AuDNNSynergy (Zhang et al. 2021) improves the prediction precision on the basis of DeepSynergy by introducing auto-encoder for feature dimensionality reduction. Afterwards, the maturity of graph neural networks (GNNs) further inspired researchers. PRODeepSyn (Wang et al. 2022b) utilizes the graph convolutional network (GCN) model to construct more predictive cell line features to predict synergistic drug combinations, while DeepDDS (Wang et al. 2022a) utilizes GNNs to construct more informative drug features for predictions. Jiang et al. (2020) utilizes the GCN model to predict synergism between drug nodes based on biomedical networks. GraphSynergy (Yang et al. 2021) proposed a GCN model with attention mechanism to predict synergistic drug combinations. TranSynergy (Liu and Xie 2021) predicts drug combinations through the Transformer model (Vaswani et al. 2017) which also has a powerful attention mechanism. There are also DL-based studies that predict synergistic drug combinations with tensor factorization (DTF) (Sun et al. 2020) or deep cascade forest (EC-DFR) (Lin et al. 2022).

Although the rise of DL techniques has injected new impetus into predicting anti-cancer synergistic drug combinations, these DL-based models still have a long way from being in service. For example, most of the DL-based models are unstable and will give inconsistent predictions even if just changing the input order of drugs from (Drug A, Drug B, Cell line C) to (Drug B, Drug A, Cell line C). Despite the fact that many studies try to force models to give order-independent predictions through data augmentation (Preuer et al. 2018; Zhang et al. 2021; Wang et al. 2022b), models are still unable to truly reveal the association between drug combinations and cell lines (Atarashi et al. 2017). We have noticed that GraphSynergy has implemented the order-independent deep feature generation, however, it re-introduces the issue about order when making the final prediction.

Another factor hindering the application of DL-based models is that the generalization ability of existing models is still unsatisfactory. DL is a data-driven technology. Although many synergy datasets such as O’Neil dataset (O’Neil et al. 2016) and NCI-ALMANAC (Holbeck et al. 2017) have been published, the amount of published drug synergy datasets is still far less than the possible combination of drugs and cell lines. Furthermore, the number of synergistic drug combinations is much smaller than the number of non-synergistic ones, which seriously affects the precision and generalization ability of DL models. Existing studies have adopted techniques such as multi-model ensemble (Ding et al. 2019) and transfer learning (Kim et al. 2021) to obtain more accurate prediction results. However, multi-model ensemble is costly, and transfer learning still relies on large-scale pre-training data.

In this article, we propose a Complete Graph-based approach with Multi-task learning to identify anti-cancer Synergistic drug combinations named CGMS. A complete graph is a graph such that each pair of different nodes in the graph is connected with one and only one edge. CGMS regards a drug combination and a cell line as a heterogeneous complete graph, where two drug nodes and a cell line node are interconnected, to learn the relation between them. Then CGMS generates order-independent whole-graph embeddings for predicting synergy scores by leveraging heterogeneous graph attention network (HAN) (Wang et al. 2019). Meanwhile, the attention mechanism in HAN could encourage the model to focus more on the important parts in different samples adaptively, making the model generate more discriminative whole-graph embeddings. To improve the generalization ability of the model, we adopt the multi-task learning technique to train CGMS. We select the prediction of drug synergy scores and drug sensitivity scores as two learning tasks, since the calculation of synergy score is related to the sensitivity of the cell line to each drug in the combination. By learning more common knowledge from multiple tasks, the model can avoid overfitting a certain task, thereby improving the generalization ability (Zhang and Yang 2021). We evaluate CGMS on a widely used public dataset in three different scenarios, including the leave-drug combination-out scenario, the leave-cell line-out scenario, and the leave-drug-out scenario. CGMS outperforms the state-of-the-art methods in all scenarios. We also present the benefit of eliminating the impact of order dependency on the synergy predictions and the discrimination power of whole-graph embeddings. We analyze the interpretability of CGMS’s attention mechanism as well. Besides, we verify the improvement brought by applying multi-task learning. Overall, CGMS provides a possible direction for anti-cancer synergistic drug combination prediction models to move towards application.

2 Materials and methods

2.1 Datasets

We used the large-scale combination screening data published by O’Neil et al. (2016) to evaluate CGMS. The experimental data covers the in vitro testing results of 583 different drug combinations against 39 human cancer cell lines. Referring to Preuer et al. (2018), we constructed the synergy dataset which consists of 22 737 samples and their Loewe scores. We also constructed the sensitivity dataset based on DrugComb (Zagidullin et al. 2019) to train CGMS with multi-task learning. We select the relative inhibition (RI) value as the sensitivity score of a single drug, since the RI value is more robust than other methods in characterizing drug sensitivity (Douglass et al. 2022) and can be used to compare drug response in different concentration ranges (Zagidullin et al. 2019). We finally constructed a sensitivity dataset of 11 766 samples with 346 drugs and 175 cell lines for CGMS. Please see Supplementary Material S1 for more details.

2.2 Drug features

The molecular fingerprints and descriptors were used to construct drug feature vectors to retain the information on the drug structures and physicochemical properties of the drugs. We collected the simplified molecular input line entry system (SMILES) expressions of 3118 drugs from DrugComb and utilized the RDKit toolkit (Landrum et al. 2013) to extract the fingerprints and molecular descriptors of each drug. We calculated the Morgan fingerprint with the radius of 3 and converted the fingerprint into a 1024-dimensional binary vector. We also calculated 189 non-zero variance molecular descriptors for the collected drugs. Then the drug fingerprint vectors and molecular descriptors were concatenated and z-score normalized to form the original drug feature matrix. Finally, we used auto-encoder to encode drug features to 256 dimension (see Supplementary Material S2).

2.3 Cell line features

The cell line features were mainly constructed based on the RNA-Seq transcript per million (TPM) gene expression data provided by Cancer Cell Line Encyclopedia (CCLE) (Barretina et al. 2012), in which the TPM value has been processed by logarithm. Here we obtained expression data from CCLE for 19 177 genes of 174 cell lines. Since the gene expression data of cell line OCUB-M was unable to be obtained from CCLE, we supplemented the TPM data of OCUB-M from Cell Model Passports (van der Meer et al. 2018). After that, we screened out genes with the largest variance of gene expression level among 175 cell lines and kept 5001 genes which may carry more information. The gene expression data was also preprocessed with z-score normalization and encoded as 256-dimensional vectors (see Supplementary Material S2).

2.4 CGMS

2.4.1 The heterogeneous complete graph

CGMS is a complete graph-based approach for anti-cancer synergistic drug combination prediction trained with multi-task learning. In this article, a heterogeneous complete graph is built to describe the interaction between drug combinations and cell lines. As presented in Fig. 1, the constructed graph contains nodes for both cell line and drug types. Each node with a self-loop is connected by binary-directed edges. There are four types of possible meta-paths as illustrated in Fig. 1C. The paths drug–cell line and cell line–drug are used to simulate the therapeutic effect of a single drug on the cell line, and the binary-directed edge reflects that the efficacy depends on both the drug and the cell line. The path drug–drug reflects that drugs may interact due to the chemical structure and other factors. The existence of self-loops can help to extract drug and cell line features without forgetting the information of the nodes themselves (Jin et al. 2021). Afterwards, CGMS can generate the order-independent whole-graph embedding to represent the drug combination and the cell line for later prediction.

The heterogeneous complete graph for drug combinations and cell lines. (A) Two node types in the complete graph. (B) The heterogeneous complete graph with three nodes and six edges. (C) Four types of meta-paths in the complete graph.

2.4.2 Generation of the whole-graph embedding

As shown in Fig. 2A, for the constructed complete graph, we use the heterogeneous graph attention network (HAN) (Wang et al. 2019) with three layers to generate the whole-graph embedding, which is the cell line node embedding output by the third layer of HAN. As shown in the figure, a HAN layer follows a hierarchical attention structure: from the node-level attention to the semantic-level attention. The self-attention mechanism is used to adaptively control the flow of information between nodes to reflect the importance of each node to the final synergy score, which makes the model more expressive.

Workflow of CGMS. (A) The whole-graph embedding is generated with HAN and fed into two MLPs to predict the Loewe synergy score and the RI value, respectively. (B) The generation of the cell line embedding in layer l of HAN. The features of each node will first be projected to the same feature space and then aggregated through node-level attention depending on the meta-path. Finally, the cell line embedding is generated through semantic-level attention.

Figure 2B shows the generation process of the cell line embedding in one layer of HAN. First of all, the features of each node will be projected to the same feature space with type-specified projection matrix W. In layer l, the representation h of node u with type $ϕ$ is transformed as

h_{u *}^{(l)} = W_{ϕ} \cdot h_{u}^{(l - 1)}, u \in ϕ

(1)

After projection, the heterogeneous graph could be regarded as a homogeneous graph. Afterwards, HAN performs node-level aggregation on nodes accessible through the same meta-path for each node. Given a meta-path $Φ$ , the set of neighbors reachable by node u via $Φ$ is denoted as $N_{(Φ, u)}$ . The attention of node u to node v in $N_{(Φ, u)}$ is where $a_{Φ}^{T}$ is the attention vector related to the meta-path, σ is the leaky rectified linear unit activate function, and $[\cdot | | \cdot]$ represents vector concatenation. In CGMS, since there is always one and only one cell line node in the complete graph, the attention of each node to the cell line node is always 1 when calculating the node-level attention. Through node-level attention, we can get the feature representation of node u under meta-path $Φ$ as

e_{u v}^{Φ} = exp (σ (a_{Φ}^{T} \cdot [h_{u *} | | h_{v *}]))

(2)

α_{u v}^{Φ} = \frac{e_{u v}^{Φ}}{\sum_{k \in N_{(Φ, u)}} e_{u k}^{Φ}}

(3)

z_{u}^{Φ} = σ (\sum_{v \in N_{(Φ, u)}} α_{u v}^{Φ} \cdot h_{v *})

(4)

Similarly, the semantic-level attention is calculated to aggregate the representations of the same node under different meta-paths. The attention weights, or the importance, of each meta-path ${Φ_{0}, Φ_{1}, \dots, Φ_{m}}$ related to node u is calculated as where $q$ is the semantic-level attention vector. Finally, the representation output of node u in the lth layer is

w_{u}^{Φ_{i}} = exp (q^{T} \cdot \tanh (W \cdot z_{u}^{Φ_{i}} + b))

(5)

β_{Φ_{i}} = \frac{w_{u}^{Φ_{i}}}{\sum_{j = 1}^{m} w_{u}^{Φ_{j}}}

(6)

h_{u}^{(l)} = \sum_{i = 1}^{m} β_{Φ_{i}} \cdot z_{u}^{Φ_{i}}

(7)

In addition, we also adopt the multi-head attention mechanism to obtain more stable feature representations. For a given number of heads N, the feature representation of node u under meta-path $Φ_{i}$ will be calculated N times under different parameters, and the resulting vectors will be sequentially concatenated as $z_{u}^{Φ_{i}}$ to calculate the semantic-level attention.

As can be seen from the above formulas, the representation output of each node in the complete graph is the weighted sum of all nodes’ projected features, where the weights are calculated with the self-attention mechanism. Therefore, the whole-graph embedding is independent of the input order of drugs.

2.4.3 Prediction for multiple tasks

To enhance the generalization ability of the model, we apply multi-task learning (Zhang and Yang 2021) to train the model. The chosen training tasks include the synergy score prediction for drug combinations and the sensitivity score prediction for a single drug, both of which use the mean square error (MSE) as the loss function. The synergy score is the difference between the observed efficacy and the expected efficacy of drug combinations, and the calculation of expected efficacy requires drug response of each drug (Loewe 1953). The RI value used to characterize drug sensitivity is calculated based on the area under single drug’s dose–response curve (Zagidullin et al. 2019). Therefore, we think that the two prediction tasks are related, from which CGMS could learn more general knowledge of the drug effect on cell lines. As shown in Fig. 2A, the whole-graph embedding of the complete graph is regarded as the general knowledge and accepted by two separate MLPs for different tasks. Each MLP has one hidden layer whose number of neural units is the half of the whole-graph embedding’s dimension, followed by a rectified linear unit activation layer and a dropout layer. It is worth pointing out that for the sensitivity score prediction task, we set the feature of the same drug as the node feature for two drug nodes as input, so that the obtained whole-graph embedding can represent the interaction between one single drug and one cell line.

Considering the inconsistent volume of drug sensitivity data and synergy data, we propose a modified joint training strategy to train the CGMS model. At each training step, we will randomly sample a batch of data for only one task, and accumulate the gradients in continuous steps. Only after the model is trained on the synergy prediction task, will we propagate back gradients to update the model parameters. Due to the smaller volume of the sensitivity dataset, we allow repeated sampling of the drug sensitivity data and control the non-replacement sampling of the synergy dataset. One epoch ends after all training samples from the synergy dataset have been trained on once.

3 Results

3.1 Experimental setup

We compared CGMS with six state-of-the-art methods, including one ensemble method named EC-DFR; two pure MLP-based methods, named DeepSynergy and AuDNNSynergy; two GCN-based methods, named PRODeepSyn and DeepDDS; and one transformer-based method named TranSynergy. We evaluated the generalization ability of CGMS and other methods in three different scenarios using 5-fold cross-validation. In the leave-drug combination-out scenario, the synergy dataset was evenly divided into five folds where drug combinations contained in one fold never occurred in the other folds. The sensitivity dataset was evenly divided to make all drugs occur in each fold, which was consistent with the synergy dataset. All the methods were trained on four folds and tested on the leave-out fold iteratively. Similarly, in the leave-cell line-out scenario or the leave-drug-out scenario, both the two datasets were divided into five folds to guarantee that the cell lines or drugs in the test fold were isolated from the other folds, respectively. The details of dataset division are described in Supplementary Material S1. With the three scenarios, we could evaluate the generalization ability of methods when facing novel drug combinations, novel cell lines, or novel drugs.

We treat the identification of anti-cancer synergistic drug combinations as a regression task, considering that treating it as a classification problem could oversimplify the reality (Preuer et al. 2018). We evaluated all methods mainly by calculating the MSE based on the predicted Loewe synergy scores and the ground truth. We also report the root MSE (RMSE) and the Pearson correlation coefficient (PCC). We re-implemented the methods that did not provide related metrics in paper following their original settings as possible as we could (Supplementary Material S3). We used nested cross-validation to select hyperparameters for CGMS. The learning rate was chosen in the set of {1e-3, 1e-4, 1e-5}, and the hidden layer size of the graph attention network was chosen in the set of {128, 256, 512}. We fixed the number of attention heads to 8 and the batch size to 512. The dropout rate was set to 0.5. The optimizer used was the AdamW optimizer. The maximum number of epochs per training was 500. We also adopted the Early Stopping technique to further avoid overfitting. See Supplementary Material S4 for more details.

3.2 Method comparison

The comparison was carried out in three different scenarios. We first evaluated CGMS and the selected baseline methods in the leave-drug combination-out scenario. The results are shown in Table 1. CGMS achieves the lowest MSE of 208.38 and the highest PCC of 0.78 among all compared methods. It reduces the MSE by 9.2% compared to the second best, GCN-based method PRODeepSyn. The performance of CGMS and PRODeepSyn indicates that graph-based methods are more likely to achieve better performance than pure MLP-based methods. Compared to TranSynergy that also has a powerful attention mechanism, CGMS achieves a 9.8% lower MSE and 0.03 higher PCC on a larger dataset, which we attribute to CGMS’s elimination of the impact of input order and the introduction of multi-task learning. The results of DeepDDS and EC-DFR are less competitive, which may be owning to the inappropriate scenarios (Supplementary Material S3). We further evaluated CGMS and five other methods in the more challenging scenarios. In the leave-cell line-out or leave-drug-out scenario, methods are required to predict the synergy scores for novel cell lines or novel drugs, which can better test the generalization ability of methods. Although all methods perform obviously worse than in the leave-drug combination-out scenario, CGMS still achieves the lowest MSE in both scenarios (Supplementary Tables S1 and S2). In summary, the results show that CGMS has significantly better generalization ability and outperforms other advanced methods when predicting the synergy scores of drug combinations.

Table 1.

Results of method comparison on the synergy score prediction task.

Method	MSE	RMSE	Confidence interval 95%	PCC
CGMS	208.38 ± 36.16	14.38 ± 1.23	[163.48, 253.28]	0.78 ± 0.02
PRODeepSyn	229.49 ± 42.81	15.09 ± 1.37	[176.34, 282.64]	0.75 ± 0.02₅
TranSynergy	231 ± 21			0.75 ± 0.02
AudnnSynergy	241.12 ± 43.52	15.46 ± 1.44	[187.09, 295.15]	0.74 ± 0.03
DeepSynergy	255.49	15.91 ± 1.56		0.73 ± 0.04
DeepDDS	271.62 ± 30.15	16.45 ± 0.93	[234.18, 309.05]	0.38 ± 0.06
EC-DFR	314.26 ± 57.09	17.66 ± 1.57	[243.38, 385.13]	0.62 ± 0.02₅

Open in a new tab

Note: Values of MSE, RMSE, and PCC are mean values ± 1 standard deviation. The best results are shown in bold.

3.3 Predictions with different input orders of drug pairs

CGMS is designed to learn the order-independent embedding of a drug combination and a cell line. To illustrate the impact of the order dependency on the synergy prediction results of CGMS and other GCN-based methods, we evaluated CGMS, GraphSynergy (Yang et al. 2021), and DeepDDS with their variants in the leave-drug combination-out scenario. Specifically, for CGMS, we implemented CGMS-ord that replaced the whole-graph embedding with the concatenation of node embeddings to make predictions. Due to the concatenation, CGMS-ord became order dependent. For GraphSynergy and DeepDDS, we utilized the self-attention mechanism to implement the order-independent variants named GraphSynergy-indep and DeepDDS-indep, respectively. We also introduced the MLP structure into GraphSynergy for a more fair comparison. The implementation details are described in Supplementary Material S6. Each order-dependent model was trained twice, one with data augmentation and one without, identified by the suffixes wa and woa, respectively. When testing models, each sample would be predicted both in the order (Drug A, Drug B, Cell line) and in the order (Drug B, Drug A, Cell line). The average of the two predictions will be taken as the actual prediction. As results, the MSEs of CGMS, GraphSynergy-indep, and DeepDDS-indep are all lower than their order-dependent variants (Supplementary Table S3). The results also verify the effectiveness of data augmentation (see Supplementary Fig. S1), but even with data augmentation, there are still differences between predictions of the same sample with different input orders. We draw the kernel density estimation (KDE) plot on absolute errors between ground truth and predictions given by CGMS and its variants in Fig. 3. It can be seen that the errors of CGMS are more concentrated in the interval 0–10 than its variants. The same phenomenon can also be observed in GraphSynergy and DeepDDS (Supplementary Figs S2 and S3). The experiment findings demonstrate that the order-independent methods are capable of modeling the relations between drug combinations and cell lines more reasonably and more precisely.

KDE plot on absolute errors between the predictions and the ground truth.

3.4 Distribution of the whole-graph embeddings

CGMS generates the whole-graph embedding to reflect the interaction between a drug combination and a cell line. To explore how well the whole-graph embeddings can represent the samples, we visualized the whole-graph embeddings of four cell lines with t-SNE (Van der Maaten and Hinton 2008). The selected cell lines are those who occur most times in the strong-synergistic samples whose synergy scores are higher than 30 (Preuer et al. 2018). Figure 4 presents the distribution of t-SNE results of samples in test fold 1. The results in other test folds are plotted in Supplementary Figure S4. As shown in the figures, samples with high positive synergy scores (blue dots) and samples with negative scores (orange crosses) are separated clearly, which indicates that CGMS obtains a discriminative whole-graph embedding through HAN and can distinguish synergistic drug combinations for a certain cell line.

T-SNE results of four cell lines’ whole-graph embeddings distribution generated by CGMS. Dots in blue are samples whose synergy scores are higher than 30, squares with light color are samples whose synergy scores are in range [0, 30], and the crosses in orange are samples whose synergy scores are lower than 0.

3.5 Analysis of graph attention in CGMS

The CGMS model introduces the graph attention mechanism to focus on the significant information for determining the synergism of drugs combinations. To analyze the rationality of the attention weights that CGMS has assigned for drugs and cell lines, we masked different parts of features according to the attention to observe the impact on the prediction. Specifically, we first extracted the attention values given by different heads in the last HAN layer. Then we masked the lowest, medium, and highest attention to zero to generate defective graph embeddings for making predictions, respectively. We calculated the evaluation metrics and visualized the relations between the predicted results and the ground truth with the least squares method. Figure 5 presents that the MSE and RMSE increase as we mask higher attentions. Meanwhile, the PCC and R-value between the predictions and the ground truth decrease. This phenomenon indicates that CGMS has learnt to pay more attention to more important parts for predicting the synergism of drug combinations on a certain cell line.

Relations between the predicted results and the ground truth. The values in x-axis are predicted scores, and the values in y-axis are the ground truth. The blue straight lines are fitted via the least squares method with P-value <1e−5. (A) The original results of CGMS without mask. (B) The results of masking the lowest attention. (C) The results of masking the medium attention. (D) The results of masking the highest attention.

3.6 Ablation study for multi-task learning

CGMS adopts a multi-task learning technology to train the model for better generalization. To explore the effect of multi-task learning on the generalization ability of the model, we conducted ablation experiments. As mentioned before, we choose the synergy prediction task and the sensitivity prediction task for training CGMS. In the ablation study, we constructed two variants of CGMS, named CGMSyn and CGMSen, and compared them with CGMS on the two tasks, respectively. Each variant has the identical main structure with CGMS, but keeps only one MLP for prediction and is trained only on a single task.

Table 2 summarizes the results of the ablation study. The prediction error of CGMS is reduced compared to both variants trained only on a single task. On the synergy prediction task, CGMS achieves a 4.8% lower MSE than CGMSyn with a higher PCC. Besides, CGMS achieves a 11.7% lower MSE than CGMSen on the sensitivity prediction task, and increases the PCC by 0.05. It can be seen from the results that multi-task learning enhances the generalization ability of the CGMS model on both tasks.

Table 2.

Experimental results of the ablation study.

	Synergy prediction		Sensitivity prediction
Model	MSE	PCC	MSE	PCC
CGMS	208.38 ± 36.16	0.78 ± 0.02	349.94 ± 73.44	0.62 ± 0.04
CGMSyn	218.90 ± 33.83	0.75 ± 0.01
CGMSen			396.41 ± 85.18	0.57 ± 0.05

Open in a new tab

Values of MSE and PCC are mean values ± 1 standard deviation. The best results are shown in bold.

4 Conclusion

We propose a complete graph-based approach for predicting anti-cancer synergistic drug combinations named CGMS. CGMS innovatively models the relationship between drug combinations and cell lines as a heterogeneous complete graph, and generates the whole-graph embedding with HAN for making stable, order-independent predictions. We adopt a multi-task learning technique to train CGMS to improve its generalization ability. We compare CGMS with six state-of-the-art methods on a public dataset in terms of predicting synergy scores for novel drug combinations, novel drugs, or novel cell lines. The results show that CGMS has significantly lower prediction errors than other compared methods in all scenarios. Further experiment results also present the benefit of eliminating the order dependency and the discrimination power of the whole-graph embeddings, interpret the rationality of attention mechanism and verify the contribution of multi-task learning to better generalization ability.

We also recognize that CGMS still has potential for further improvement. In this work, we focus more on the construction of deep model, while constructing the drugs’ and cell lines’ features following common practice. There are researches who have explored the way to integrate more informative features for synergistic drug combination prediction (Yang et al. 2021; Wang et al. 2022a,b), which inspire us to design input features of CGMS better.

Based on CGMS’s superior performance, its predicted scores can provide a reference for drug pre-screening. The drug combinations with higher predicted synergy scores may be more worthy of being tested. In conclusion, CGMS outperforms the state-of-the-art methods in predicting the synergy scores of drug combinations, which we anticipate will further advance the practical application of anti-cancer synergistic drug combination prediction models.

Supplementary Material

btad351_Supplementary_Data

Click here for additional data file.^{(1.5MB, docx)}

Contributor Information

Xiaowen Wang, School of Software Engineering, Tongji University, Shanghai 201804, China.

Hongming Zhu, School of Software Engineering, Tongji University, Shanghai 201804, China.

Danyi Chen, School of Software Engineering, Tongji University, Shanghai 201804, China.

Yongsheng Yu, Drug Discovery and Development Department, Tasly Biopharmaceuticals Co., Ltd, Shanghai 201203, China.

Qi Liu, Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.

Qin Liu, School of Software Engineering, Tongji University, Shanghai 201804, China.

Supplementary data

Supplementary data is available at Bioinformatics online.

Conflict of interest: None declared.

Funding

This work was supported by the National Key Research and Development Program of China [grant numbers 2021YFF1201200, 2021YFF1200900].

Data availability

The data and source code underlying this article are available in https://github.com/TOJSSE-iData/CGMS.

References

Atarashi K, Oyama S, Kurihara M. et al. A deep neural network for pairwise classification: enabling feature conjunctions and ensuring symmetry. In: Advances in Knowledge Discovery and Data Mining. Cham: Springer International Publishing, 2017, 83–95. [Google Scholar]
Barretina J, Caponigro G, Stransky N. et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483:603–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ding P, Yin R, Luo J. et al. Ensemble prediction of synergistic drug combinations incorporating biological, chemical, pharmacological, and network knowledge. IEEE J Biomed Health Inform 2019;23:1336–45. [DOI] [PubMed] [Google Scholar]
Douglass EF, Allaway RJ, Szalai B. et al. A community challenge for a pancancer drug mechanism of action inference from perturbational profile data. Cell Rep Med 2022;3:100492. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fudio S, Sellers A, Pérez Ramos L. et al. Anti-cancer drug combinations approved by US FDA from 2011 to 2021: main design features of clinical trials and role of pharmacokinetics. Cancer Chemother Pharmacol 2022;90:285–99. [DOI] [PubMed] [Google Scholar]
Gilad Y, Gellerman G, Lonard DM. et al. Drug combination in cancer treatment-from cocktails to conjugated combinations. Cancers 2021;13:669. [DOI] [PMC free article] [PubMed] [Google Scholar]
Holbeck SL, Camalier R, Crowell JA. et al. The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity. Cancer Res 2017;77:3564–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiang P, Huang S, Fu Z. et al. Deep graph embedding for prioritizing synergistic anticancer drug combinations. Comput Struct Biotechnol J 2020;18:427–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jin W, Derr T, Wang Y. et al. Node similarity preserving graph convolutional networks. In: WSDM ’21, Virtual Event, Israel, pp. 148–56, New York: Association for Computing Machinery, 2021.
Kim Y, Zheng S, Tang J. et al. Anticancer drug synergy prediction in understudied tissues using transfer learning. J Am Med Inform Assoc 2021;28:42–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
Landrum G. RDKit: Open-source cheminformatics 2013. https://www.rdkit.org (5 March 2022, date last accessed).
Li P, Huang C, Fu Y. et al. Large-scale exploration and analysis of drug combinations. Bioinformatics 2015;31:2007–16. [DOI] [PubMed] [Google Scholar]
Lin W, Wu L, Zhang Y. et al. An enhanced cascade-based deep forest model for drug combination prediction. Brief Bioinformatics 2022;23:bbab562. [DOI] [PubMed] [Google Scholar]
Liu Q, Xie L.. Transynergy: mechanism-driven interpretable deep neural network for the synergistic prediction and pathway deconvolution of drug combinations. PLoS Comput Biol 2021;17:e1008653. [DOI] [PMC free article] [PubMed] [Google Scholar]
Loewe S. The problem of synergism and antagonism of combined drugs. Arzneimittelforschung 1953;3:285–90. [PubMed] [Google Scholar]
Macarron R, Banks MN, Bojanic D. et al. Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov 2011;10:188–95. [DOI] [PubMed] [Google Scholar]
Mokhtari RB, Homayouni TS, Baluch N. et al. Combination therapy in combating cancer. Oncotarget 2017;8:38022–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
O’Neil J, Benita Y, Feldman I. et al. An unbiased oncology compound screen to identify novel combination strategies. Mol Cancer Ther 2016;15:1155–62. [DOI] [PubMed] [Google Scholar]
Preuer K, Lewis RPI, Hochreiter S. et al. DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics 2018;34:1538–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sidorov P, Naulaerts S, Ariey-Bonnet J. et al. Predicting synergism of cancer drug combinations using NCI-ALMANAC data. Front Chem 2019;7:509. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sun Z, Huang S, Jiang P. et al. DTF: deep tensor factorization for predicting anticancer drug synergy. Bioinformatics 2020;36:4483–9. [DOI] [PubMed] [Google Scholar]
Van der Maaten L, Hinton G.. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579–605. [Google Scholar]
van der Meer D, Barthorpe S, Yang W. et al. Cell model passports—a hub for clinical, genetic and functional datasets of preclinical cancer models. Nucleic Acids Res 2018;47:D923–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vaswani A, Shazeer N, Parmar N. et al. Attention is all you need. Adv Neural Inform Process Syst 2017;30:5998–6008. [Google Scholar]
Wang J, Liu X, Shen S. et al. DeepDDS: deep graph neural network with attention mechanism to predict synergistic drug combinations. Brief Bioinformatics 2022a;23:bbab390. [DOI] [PubMed] [Google Scholar]
Wang X, Ji H, Shi C. et al. Heterogeneous graph attention network. In: The World Wide Web Conference, WWW ’19, pp. 2022–32, New York: Association for Computing Machinery, 2019.
Wang X, Zhu H, Jiang Y. et al. PRODeepSyn: predicting anticancer synergistic drug combinations by embedding cell lines with protein–protein interaction network. Brief Bioinformatics 2022b;23:bbab587. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yang J, Xu Z, Wu WKK. et al. GraphSynergy: a network-inspired deep learning model for anticancer drug combination prediction. J Am Med Inform Assoc 2021;28:2336–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zagidullin B, Aldahdooh J, Zheng S. et al. DrugComb: an integrative cancer drug combination data portal. Nucleic Acids Res 2019;47:W43–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang T, Zhang L, Payne PRO. et al. Synergistic Drug Combination Prediction by Integrating Multiomics Data in Deep Learning Models. New York: Springer US, 2021, 223–38. [DOI] [PubMed] [Google Scholar]
Zhang Y, Yang Q.. A survey on multi-task learning. IEEE Trans Knowledge Data Eng 2021;34:5586–609. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

btad351_Supplementary_Data

Click here for additional data file.^{(1.5MB, docx)}

Data Availability Statement

The data and source code underlying this article are available in https://github.com/TOJSSE-iData/CGMS.

[btad351-B1] Atarashi K, Oyama S, Kurihara M. et al. A deep neural network for pairwise classification: enabling feature conjunctions and ensuring symmetry. In: Advances in Knowledge Discovery and Data Mining. Cham: Springer International Publishing, 2017, 83–95. [Google Scholar]

[btad351-B2] Barretina J, Caponigro G, Stransky N. et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483:603–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B3] Ding P, Yin R, Luo J. et al. Ensemble prediction of synergistic drug combinations incorporating biological, chemical, pharmacological, and network knowledge. IEEE J Biomed Health Inform 2019;23:1336–45. [DOI] [PubMed] [Google Scholar]

[btad351-B4] Douglass EF, Allaway RJ, Szalai B. et al. A community challenge for a pancancer drug mechanism of action inference from perturbational profile data. Cell Rep Med 2022;3:100492. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B5] Fudio S, Sellers A, Pérez Ramos L. et al. Anti-cancer drug combinations approved by US FDA from 2011 to 2021: main design features of clinical trials and role of pharmacokinetics. Cancer Chemother Pharmacol 2022;90:285–99. [DOI] [PubMed] [Google Scholar]

[btad351-B6] Gilad Y, Gellerman G, Lonard DM. et al. Drug combination in cancer treatment-from cocktails to conjugated combinations. Cancers 2021;13:669. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B7] Holbeck SL, Camalier R, Crowell JA. et al. The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity. Cancer Res 2017;77:3564–76. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B8] Jiang P, Huang S, Fu Z. et al. Deep graph embedding for prioritizing synergistic anticancer drug combinations. Comput Struct Biotechnol J 2020;18:427–38. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B9] Jin W, Derr T, Wang Y. et al. Node similarity preserving graph convolutional networks. In: WSDM ’21, Virtual Event, Israel, pp. 148–56, New York: Association for Computing Machinery, 2021.

[btad351-B10] Kim Y, Zheng S, Tang J. et al. Anticancer drug synergy prediction in understudied tissues using transfer learning. J Am Med Inform Assoc 2021;28:42–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B11] Landrum G. RDKit: Open-source cheminformatics 2013. https://www.rdkit.org (5 March 2022, date last accessed).

[btad351-B12] Li P, Huang C, Fu Y. et al. Large-scale exploration and analysis of drug combinations. Bioinformatics 2015;31:2007–16. [DOI] [PubMed] [Google Scholar]

[btad351-B13] Lin W, Wu L, Zhang Y. et al. An enhanced cascade-based deep forest model for drug combination prediction. Brief Bioinformatics 2022;23:bbab562. [DOI] [PubMed] [Google Scholar]

[btad351-B14] Liu Q, Xie L.. Transynergy: mechanism-driven interpretable deep neural network for the synergistic prediction and pathway deconvolution of drug combinations. PLoS Comput Biol 2021;17:e1008653. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B15] Loewe S. The problem of synergism and antagonism of combined drugs. Arzneimittelforschung 1953;3:285–90. [PubMed] [Google Scholar]

[btad351-B16] Macarron R, Banks MN, Bojanic D. et al. Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov 2011;10:188–95. [DOI] [PubMed] [Google Scholar]

[btad351-B17] Mokhtari RB, Homayouni TS, Baluch N. et al. Combination therapy in combating cancer. Oncotarget 2017;8:38022–43. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B18] O’Neil J, Benita Y, Feldman I. et al. An unbiased oncology compound screen to identify novel combination strategies. Mol Cancer Ther 2016;15:1155–62. [DOI] [PubMed] [Google Scholar]

[btad351-B19] Preuer K, Lewis RPI, Hochreiter S. et al. DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics 2018;34:1538–46. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B20] Sidorov P, Naulaerts S, Ariey-Bonnet J. et al. Predicting synergism of cancer drug combinations using NCI-ALMANAC data. Front Chem 2019;7:509. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B21] Sun Z, Huang S, Jiang P. et al. DTF: deep tensor factorization for predicting anticancer drug synergy. Bioinformatics 2020;36:4483–9. [DOI] [PubMed] [Google Scholar]

[btad351-B22] Van der Maaten L, Hinton G.. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579–605. [Google Scholar]

[btad351-B23] van der Meer D, Barthorpe S, Yang W. et al. Cell model passports—a hub for clinical, genetic and functional datasets of preclinical cancer models. Nucleic Acids Res 2018;47:D923–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B24] Vaswani A, Shazeer N, Parmar N. et al. Attention is all you need. Adv Neural Inform Process Syst 2017;30:5998–6008. [Google Scholar]

[btad351-B25] Wang J, Liu X, Shen S. et al. DeepDDS: deep graph neural network with attention mechanism to predict synergistic drug combinations. Brief Bioinformatics 2022a;23:bbab390. [DOI] [PubMed] [Google Scholar]

[btad351-B26] Wang X, Ji H, Shi C. et al. Heterogeneous graph attention network. In: The World Wide Web Conference, WWW ’19, pp. 2022–32, New York: Association for Computing Machinery, 2019.

[btad351-B27] Wang X, Zhu H, Jiang Y. et al. PRODeepSyn: predicting anticancer synergistic drug combinations by embedding cell lines with protein–protein interaction network. Brief Bioinformatics 2022b;23:bbab587. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B28] Yang J, Xu Z, Wu WKK. et al. GraphSynergy: a network-inspired deep learning model for anticancer drug combination prediction. J Am Med Inform Assoc 2021;28:2336–45. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B29] Zagidullin B, Aldahdooh J, Zheng S. et al. DrugComb: an integrative cancer drug combination data portal. Nucleic Acids Res 2019;47:W43–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btad351-B30] Zhang T, Zhang L, Payne PRO. et al. Synergistic Drug Combination Prediction by Integrating Multiomics Data in Deep Learning Models. New York: Springer US, 2021, 223–38. [DOI] [PubMed] [Google Scholar]

[btad351-B31] Zhang Y, Yang Q.. A survey on multi-task learning. IEEE Trans Knowledge Data Eng 2021;34:5586–609. [Google Scholar]

PERMALINK

A complete graph-based approach with multi-task learning for predicting synergistic drug combinations

Xiaowen Wang

Hongming Zhu

Danyi Chen

Yongsheng Yu

Qi Liu

Qin Liu

Roles

Abstract

Motivation

Results

Availability and implementation

1 Introduction

2 Materials and methods

2.1 Datasets

2.2 Drug features

2.3 Cell line features

2.4 CGMS

2.4.1 The heterogeneous complete graph

Figure 1.

2.4.2 Generation of the whole-graph embedding

Figure 2.

2.4.3 Prediction for multiple tasks

3 Results

3.1 Experimental setup

3.2 Method comparison

Table 1.

3.3 Predictions with different input orders of drug pairs

Figure 3.

3.4 Distribution of the whole-graph embeddings

Figure 4.

3.5 Analysis of graph attention in CGMS

Figure 5.

3.6 Ablation study for multi-task learning

Table 2.

4 Conclusion

Supplementary Material

Contributor Information

Supplementary data

Funding

Data availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases