PANCDR: precise medicine prediction using an adversarial network for cancer drug response

Juyeon Kim; Sung-Hye Park; Hyunju Lee

doi:10.1093/bib/bbae088

. 2024 Mar 14;25(2):bbae088. doi: 10.1093/bib/bbae088

PMCID: PMC10940842 PMID: 38487849

Abstract

Pharmacogenomics aims to provide personalized therapy to patients based on their genetic variability. However, accurate prediction of cancer drug response (CDR) is challenging due to genetic heterogeneity. Since clinical data are limited, most studies predicting drug response use preclinical data to train models. However, such models might not be generalizable to external clinical data due to differences between the preclinical and clinical datasets. In this study, a Precision Medicine Prediction using an Adversarial Network for Cancer Drug Response (PANCDR) model is proposed. PANCDR consists of two sub-models, an adversarial model and a CDR prediction model. The adversarial model reduces the gap between the preclinical and clinical datasets, while the CDR prediction model extracts features and predicts responses. PANCDR was trained using both preclinical data and unlabeled clinical data. Subsequently, it was tested on external clinical data, including The Cancer Genome Atlas and brain tumor patients. PANCDR outperformed other machine learning models in predicting external test data. Our results demonstrate the robustness of PANCDR and its potential in precision medicine by recommending patient-specific drug candidates. The PANCDR codes and data are available at https://github.com/DMCB-GIST/PANCDR.

Keywords: deep learning, cancer drug response, adversarial learning, domain adaptation

INTRODUCTION

The goal of pharmacogenomics is to provide personalized therapy based on genetic information for each patient [1]. Personalized therapy requires accurate prediction of cancer drug response (CDR). However, effective anticancer therapy prediction remains challenging due to genetic heterogeneity [2]. To address this challenge, public large-scale preclinical datasets, including Genomics of Drug Sensitivity in Cancer (GDSC) [3], Cancer Cell Line Encyclopedia (CCLE) [4] and Cancer Therapeutics Response Portal [5], have been created and machine learning approaches were utilized to predict drug response [6–8]. Since clinical datasets, such as The Cancer Genome Atlas (TCGA) [9], are limited, many studies have used preclinical datasets in model training.

CDR prediction models can be categorized as single-drug models and multi-drug models based on the number of drugs used in training. Single-drug models are trained on, and predict responses for, one specific drug. Geeleher et al. [6] trained a logistic ridge regression model using GDSC data and applied it to TCGA data. Ding et al. [8] selected features of cell line data using an autoencoder and then trained elastic net regression and support vector machines to predict drug responses. MOLI [10] and Super.FELT [11] are deep learning models that integrate multi-omics data to predict drug responses. Both are trained using GDSC and validated on external data, such as patient-derived xenografts [12] and TCGA. Velodrome [13] is a semi-supervised method for making generalizable predictions using both labeled and unlabeled data from different datasets. However, single-drug models cannot readily predict the response to new drugs that were not included in the training dataset.

Conversely, multi-drug models are trained to predict responses for multiple drugs. Multi-drug models can predict the responses of new drugs that were not included in the training data. CDRscan [7] is an ensemble model with five convolutional neural network (CNN) models. CDRscan uses mutations from COSMIC cell line project [14] and drugs from GDSC as input. DeepDR [15] is a deep learning model that pre-trains the encoder of mutation and expression using TCGA data, which is then trained with CCLE. However, unlike other multi-drug models, DeepDR cannot predict responses for drugs not included in the training set due to fixed output dimensions. DeepCDR [16] applies a hybrid graph convolutional network (GCN) that incorporates genomics, transcriptomics and epigenomics as input. DeepCDR consists of uniform GCN and omics-specific subnetworks. The multi-omics data from CCLE are used for training, while the multi-omics data from TCGA are used for external validation. GraphCDR [17] employs a graph neural network and contrastive learning to predict CDR. Genomic, epigenomic and transcriptomic data of GDSC were utilized as input.

In machine learning, domain adaptation is used to align distributions when the training and test data distributions differ. One effective approach to domain adaptation is the use of adversarial networks. Adversarial-based domain adaptation methods employ discriminators to classify domains, while encoders extract features from the input to deceive the discriminator [18–20]. In CDR prediction, numerous studies have used cell line data for model training [6–8, 11, 16, 17]. However, gene expression distributions differ between cell lines and patients. In addition, the immune system, the tumor microenvironment and vasculature are lacking in cell lines [21]. Furthermore, the difference in growth rate between tumors and cultured cells affects gene distribution [22]. To address such disparities between preclinical and clinical data distributions, some studies have conducted model training using both preclinical and clinical data [13, 15, 23–25]. Among these studies, some have employed adversarial domain adaptation techniques [23–25]. AITL [23] and TUGDA [24] are multi-task learning models that employ adversarial networks to address discrepancies between preclinical and clinical data. Both models use gradient reversal to train the discriminator. The datasets used in both models were GDSC, CCLE and TCGA. Additional clinical trial datasets were used for AITL. CODE-AE [26] is an autoencoder that is capable of extracting hidden biological signals based on context-specific patterns and confounding factors. CODE-AE uses Wasserstein generative adversarial networks [25] to make cell-line and tissue samples similar. The limitation of these models lies in their single-drug nature, which makes it difficult to predict responses to new drugs. Moreover, the gradient reversal method can lead to a vanishing gradient, as the discriminator may converge too quickly during the early stages of training [20].

In this study, we propose precision medicine prediction using an adversarial network for cancer drug response (PANCDR). We aim to achieve accurate prediction of CDR, even with external clinical data such as TCGA, by training the PANCDR with preclinical data such as GDSC. PANCDR consists of two steps, discriminator training and CDR prediction model training. In the first step, the discriminator takes gene expression to distinguish unlabeled clinical data from preclinical data. The weights of the CDR prediction model are fixed during the discriminator training step. Next, the CDR prediction model is trained to predict CDR and fool the discriminator while the weights of the discriminator are fixed. The key distinctions between the existing CDR prediction models that utilize adversarial domain adaptation techniques and our approach lie in two aspects: firstly, our model is a multi-drug model, and secondly, we adopted a two-step process instead of the gradient reversal method, training the discriminator and CDR prediction model separately. Compared with the gradient reversal method, dividing the learning process into two steps enabled the model to obtain a stronger gradient [20]. After model training with both preclinical and unlabeled clinical data, the performance of PANCDR is evaluated with the external test, using clinical data with labels. Our results demonstrate that PANCDR outperforms other machine learning methods in the external test.

MATERIALS AND METHODS

Materials

We used the pan-cancer, pan-drug data of the GDSC dataset [3] and the TCGA dataset [9]. In the binary drug response prediction task, we collected GDSC gene expression data with binary responses (sensitive or resistant). The GDSC dataset provides genomic profiles of cell lines and drug screening data. For GDSC, we downloaded raw gene expression data from ArrayExpress (E-MTAB-3610) and binary responses from the supplementary data of Iorio et al. [3], where LOBICO [27] was employed to binarize the $\mathrm{IC}_{50}$ values. We excluded cell lines without responses and drugs with no PubChem ID or simplified molecular-input line-entry system (SMILES) string. We obtained a total of 112,575 instances across 950 cell lines and 151 drugs. The ratio of sensitive to resistant instances was around 1:7.5. The TCGA dataset provides genomic profiles of patients and clinical annotation. For TCGA, we downloaded gene expression data from http://gdac.broadinstitute.org/ and clinical annotation from the supplementary data of Ding et al. [28]. We categorized instances of ‘Complete Response’ and ‘Partial Response’ as sensitive, while ‘Clinical Progressive Disease’ and ‘Stable Disease’ were categorized as resistant. In the TCGA response data, some patients have received multiple drug treatments. Similar to the preprocessing approach employed in MOLI [10], we selected patient–drug response cases where only a single drug was administered at a time during a specific period. The TCGA gene expression data with clinical annotation were used for the external test, consisting of 666 instances with annotation across 569 patients and 69 drugs; in these labeled data, the ratio of sensitive to resistant instances was around 1:1. The TCGA gene expression data without annotation were used to train the adversarial model, consisting of 9,424 primary solid tumor samples.
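
The response grouping described above can be sketched as a small lookup. This is an illustrative sketch only: the record layout (patient ID, drug name, response string), the function name and the example rows are hypothetical, and only the four category names and their grouping come from the text.

```python
# Grouping of TCGA clinical response categories into binary labels,
# as described in the text (1 = sensitive, 0 = resistant).
RESPONSE_TO_LABEL = {
    "Complete Response": 1,
    "Partial Response": 1,
    "Clinical Progressive Disease": 0,
    "Stable Disease": 0,
}


def binarize_responses(records):
    """Return (patient_id, drug, label) triples, skipping unknown categories.

    `records` is a hypothetical list of (patient_id, drug, response) tuples.
    """
    labeled = []
    for patient_id, drug, response in records:
        label = RESPONSE_TO_LABEL.get(response)
        if label is not None:
            labeled.append((patient_id, drug, label))
    return labeled


# Made-up example rows; the third has no usable annotation and is dropped.
example_records = [
    ("P1", "cisplatin", "Complete Response"),
    ("P2", "paclitaxel", "Stable Disease"),
    ("P3", "gemcitabine", "Not Evaluated"),
]
labeled_example = binarize_responses(example_records)
```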

For gene expression data, TCGA data were converted to TPM and transformed into $\log_2(\mathrm{TPM}+1)$. Both TCGA and GDSC data were normalized by z-score for each sample. To reduce the dimension, we selected 702 genes from the COSMIC Cancer Gene Census (https://cancer.sanger.ac.uk/census) [29]. For drug data, we converted the SMILES form to a graph using RDKit (https://www.rdkit.org) and obtained a feature matrix and an adjacency matrix using DeepChem [30]. Drugs without SMILES were excluded.
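
The per-sample normalization can be illustrated with a minimal standard-library sketch. Two points are assumptions rather than details from the text: the log transform is taken to be log2(TPM + 1), and the z-score uses the population standard deviation. The function names are hypothetical.

```python
import math
import statistics


def log_transform(tpm_values):
    """Log-transform TPM values; the log2(TPM + 1) form is an assumption."""
    return [math.log2(v + 1.0) for v in tpm_values]


def zscore_per_sample(values):
    """Z-score one sample across its genes (population std is an assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# One made-up sample with three genes.
sample_tpm = [0.0, 1.0, 3.0]
sample_log = log_transform(sample_tpm)
sample_z = zscore_per_sample(sample_log)
```

Because the normalization is applied within each sample, it needs no statistics from other samples, so cell-line and patient profiles can be processed independently.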

In the regression drug response prediction task, we collected GDSC gene expression data with continuous $\mathrm{IC}_{50}$ values for training the regression model. We used the natural log of $\mathrm{IC}_{50}$ as response data. A total of 282,218 instances across 921 cell lines and 357 drugs were used. TCGA data were utilized in the same manner as in the classification task.

Furthermore, to show the applicability of our approach, we incorporated an additional dataset consisting of brain tumor samples obtained from Seoul National University Hospital. The dataset comprised a diverse range of brain tumor types, including five patients with medulloblastoma (MB), two patients with embryonal tumor with multilayered rosettes (ETMR), glioblastoma multiforme (GBM), germinoma, meningioangiomatosis, pediatric low-grade astrocytoma, papillary intralymphatic angioendothelioma (PILA), infant-hemispheric glioma, anaplastic astrocytoma (AA), low-grade glioma (LGG) and anaplastic pilocytic astrocytoma (APA). The gene expression data of brain tumor patients were preprocessed in the same way as those of TCGA. For each patient, $\mathrm{IC}_{50}$ values for the 357 drugs from GDSC were predicted to find potential candidate drugs. The study on brain tumors was approved by the Institutional Review Board of Seoul National University Hospital (IRB No:1905–108-1035).

Methods

PANCDR consists of two sub-models, the CDR prediction model and the adversarial model (Figure 1). The CDR prediction model extracts features from the input data and predicts the CDR. The adversarial model recognizes whether a feature comes from GDSC or TCGA. PANCDR is trained by iterating two steps. The first step is training the discriminator. The discriminator takes the gene expression features of GDSC and TCGA as input and is trained to distinguish whether the data source is GDSC or TCGA. During this step, the weights of the CDR prediction model are fixed. We optimized the discriminator using binary cross-entropy loss. The second step is training the CDR prediction model and fooling the discriminator. In this step, the weights of the discriminator are fixed. The latent vectors of gene expression and drug graph are concatenated as the input of the CNN. To optimize the model in the second step, we designed the loss function $\mathcal{L}$ as shown below:

$$\mathcal{L} = \mathcal{L}_{CDR} + \lambda \mathcal{L}_{adv} \tag{1}$$

where $\mathcal{L}_{CDR}$ and $\mathcal{L}_{adv}$ represent the loss function of the CDR prediction model and adversarial model, respectively, and $\lambda$ is a regularization coefficient. The Adam optimizer was used to optimize both the CDR prediction model and the adversarial network. The details of data representation are illustrated in Supplementary Figure S1.

Figure 1. The architecture of PANCDR. PANCDR comprises two sub-models, the CDR prediction model and the adversarial model. The CDR prediction model consists of an encoder, a UGCN and a CNN. The adversarial model uses a discriminator to distinguish the dataset and the encoder as a generator. In step 1, the encoder weight is fixed, and the discriminator is trained. In step 2, the CDR prediction model is trained while the discriminator weight is fixed. The concatenation of the two latent vectors is marked by a symbol in the figure.

CDR prediction model

The CDR prediction model consists of three parts: a Gaussian encoder for gene expression, a uniform graph convolutional network (UGCN) for drugs and a CNN for prediction. The Gaussian encoder is derived from a variational autoencoder. The latent vector of gene expression is calculated by reparameterization to enable backpropagation [31], as shown below:

$$z_x = \mu_x + \sigma_x \odot \epsilon, \quad (\mu_x, \sigma_x) = E(x), \quad \epsilon \sim \mathcal{N}(0, I) \tag{2}$$

where $x$ and $E$ represent the gene expression and the Gaussian encoder, while $\epsilon$ is sampled from a standard normal distribution. UGCN is capable of processing input matrices with diverse dimensions by augmenting the original graph with a complementary graph. The complementary graph is added to bring inputs of various sizes to the same size [16]. Similar to DeepCDR, the maximum number of atoms of the preprocessed drug features was set to 100. We used UGCN, denoted as $g$, to extract the drug feature from $(X_d, A_d)$, where $X_d$ and $A_d$ are the drug feature matrix and adjacency matrix after preprocessing. The UGCN can be represented as $z_d = g(X_d, A_d)$. We concatenated $z_x$ and $z_d$ as $z$, which was the input of the convolutional network $h$. We used 1D convolutional layers for CDR prediction, similar to CDRscan [7]. The CDR prediction model takes binary cross entropy as a loss function $\mathcal{L}_{CDR}$:

$$\mathcal{L}_{CDR} = -\left[ y \log h(z) + (1 - y) \log\left(1 - h(z)\right) \right] \tag{3}$$

where $y$ represents the drug response annotation. For the regression model, we used the mean squared error as loss function $\mathcal{L}_{CDR}$:

$$\mathcal{L}_{CDR} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - h(z_i) \right)^2 \tag{4}$$

where $N$ is the number of data instances.
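
As a rough illustration of the reparameterization step and the two task losses, the following standard-library sketch works on plain Python lists. It assumes the encoder outputs a mean and a standard deviation per latent dimension, and it averages the binary cross-entropy over a batch; the function names are hypothetical and not taken from the PANCDR code.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def binary_cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean squared error, as used for the regression variant."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
z_noisy = reparameterize([0.5, -1.0], [1.0, 1.0], rng)   # stochastic latent vector
z_exact = reparameterize([0.5, -1.0], [0.0, 0.0], rng)   # zero variance returns the mean
```

Because the randomness enters only through `eps`, the sampled latent vector stays a differentiable function of the mean and standard deviation, which is what allows gradients to reach the encoder in an automatic differentiation framework.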

Adversarial model

In our experiment, we used a Gaussian encoder instead of a generator. The discriminator determines whether the data are from the cell line or patient data. We employed a strategy to deceive the discriminator, resulting in the encoder generating similar latent vectors for cell lines and patients. To achieve this, we first trained the discriminator $D$ with the loss function, as shown below:

$$\mathcal{L}_{D} = -\mathbb{E}_{x_t}\left[ \log D(z_{x_t}) \right] - \mathbb{E}_{x_s}\left[ \log\left(1 - D(z_{x_s})\right) \right] \tag{5}$$

where $x_t$ and $x_s$ represent gene expression of the unlabeled TCGA data and GDSC data, respectively, while $z_x$ denotes the latent vector of $x$. The unlabeled TCGA data used during model training did not participate in drug response prediction. Conversely, the labeled TCGA data were not used for training the model, but for the external test data. After training the discriminator, we trained the feature extractor to fool the discriminator by minimizing the loss function, as shown below:

$$\mathcal{L}_{adv} = -\mathbb{E}_{x_s}\left[ \log D(z_{x_s}) \right] - \mathbb{E}_{x_t}\left[ \log\left(1 - D(z_{x_t})\right) \right] \tag{6}$$
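
The alternating two-step scheme can be sketched with a deliberately tiny toy model in pure Python. Everything here is a stand-in rather than the PANCDR implementation: the "encoder" is a scalar affine map, the "discriminator" is a logistic unit, the CDR loss is replaced by a squared error on the latent value, the gradients are written by hand, and the fooling loss is assumed to be binary cross-entropy with flipped domain labels.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def bce(p, t):
    """Binary cross-entropy for one probability p and one 0/1 target t."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


def encode(enc, x):
    w, b = enc                      # toy encoder: z = w * x + b
    return w * x + b


def discriminate(disc, z):
    a, c = disc                     # toy discriminator: D(z) = sigmoid(a * z + c)
    return sigmoid(a * z + c)


def domain_bce(enc, disc, xs, targets):
    """Mean BCE of the discriminator output against the given domain targets."""
    losses = [bce(discriminate(disc, encode(enc, x)), t) for x, t in zip(xs, targets)]
    return sum(losses) / len(losses)


def task_loss(enc, xs_src, ys_src):
    """Stand-in for the CDR loss: squared error of z against a source label."""
    return sum((encode(enc, x) - y) ** 2 for x, y in zip(xs_src, ys_src)) / len(xs_src)


def step1(enc, disc, xs, domains, lr):
    """Step 1: encoder frozen, one gradient step on the discriminator."""
    a, c = disc
    grad_a = grad_c = 0.0
    for x, t in zip(xs, domains):
        z = encode(enc, x)
        err = (discriminate(disc, z) - t) / len(xs)    # d(BCE)/d(logit)
        grad_a += err * z
        grad_c += err
    return (a - lr * grad_a, c - lr * grad_c)


def step2(enc, disc, xs_src, ys_src, xs, flipped, lr, lam):
    """Step 2: discriminator frozen, one step on task loss + lam * fooling loss."""
    w, b = enc
    a, _ = disc
    grad_w = grad_b = 0.0
    for x, y in zip(xs_src, ys_src):                   # task term
        diff = 2.0 * (encode(enc, x) - y) / len(xs_src)
        grad_w += diff * x
        grad_b += diff
    for x, t in zip(xs, flipped):                      # adversarial term
        err = lam * (discriminate(disc, encode(enc, x)) - t) * a / len(xs)
        grad_w += err * x
        grad_b += err
    return (w - lr * grad_w, b - lr * grad_b)


# Made-up data: labeled "cell-line" inputs (domain 0), unlabeled "patient" inputs (domain 1).
xs_src, ys_src = [-1.0, -0.5, 0.0], [-1.0, -0.5, 0.0]
xs_tgt = [1.0, 1.5, 2.0]
xs = xs_src + xs_tgt
domains = [0, 0, 0, 1, 1, 1]
flipped = [1 - d for d in domains]
enc, disc, lr, lam = (1.0, 0.0), (0.0, 0.0), 0.1, 1.0

d_before = domain_bce(enc, disc, xs, domains)
disc = step1(enc, disc, xs, domains, lr)
d_after = domain_bce(enc, disc, xs, domains)

total_before = task_loss(enc, xs_src, ys_src) + lam * domain_bce(enc, disc, xs, flipped)
enc = step2(enc, disc, xs_src, ys_src, xs, flipped, lr, lam)
total_after = task_loss(enc, xs_src, ys_src) + lam * domain_bce(enc, disc, xs, flipped)
```

Each step updates one set of parameters while the other is held fixed, in contrast to gradient reversal, where both are updated in a single backward pass. In a real implementation the hand-written gradients would be replaced by an automatic differentiation framework and the two steps would be repeated over mini-batches.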

Model evaluation

Evaluation of the classification model using GDSC and TCGA

In our experiment, cross-validation (CV) was used to evaluate model performance and a random hyperparameter search was performed to find the optimal set of hyperparameters (Supplementary Figure S2). In the case of preclinical data, GDSC, the model was evaluated by nested CV (Supplementary Figure S2A), which consisted of 10 folds in the outer loop and 5 folds in the inner loop. To find the optimal hyperparameters of the outer folds, 20 iterations of random hyperparameter search were conducted in the CV of the inner loop. The optimal hyperparameters were determined based on the highest average area under the ROC curve (AUC) of inner validation folds. In the inner training folds, 5% were selected randomly for early stopping. After finishing the inner CV, whole inner folds were used to train a model with the optimal hyperparameters, except for 5% of whole inner folds used for early stopping. The performance was calculated using the mean performance scores of outer test folds.
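
The nested CV layout (10 outer folds, 5 inner folds) can be sketched as an index generator. This is a minimal sketch that assumes instances are split by index; whether the original splits were made by instance, cell line or drug is not stated here, and the function names are hypothetical.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle indices and deal them into k folds of near-equal size."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) index lists
    drawn only from outer_train, so the outer test fold is never seen
    during hyperparameter selection.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        inner_folds = k_fold_indices(train_idx, inner_k, rng)
        inner_splits = []
        for v, val_idx in enumerate(inner_folds):
            inner_train = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
            inner_splits.append((inner_train, val_idx))
        yield train_idx, test_idx, inner_splits


splits = list(nested_cv_splits(100))
```

The 5% early-stopping holdout described above would be carved out of each training portion before fitting; it is left out here to keep the sketch short.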

In the case of external test data, TCGA, 10-fold CV was used to tune the hyperparameters. To find the optimal hyperparameters, 20 iterations of random hyperparameter search were conducted in the k-fold CV. The optimal hyperparameters were determined based on the highest AUC average of the validation folds (Supplementary Figure S2B). After the 10-fold CV, the model was refitted with optimal hyperparameters using the entire training dataset, which was divided randomly into 95% for training and 5% for early stopping. The refitted model was subsequently evaluated using labeled TCGA data. To validate the robustness of the model, 100 iterations of the previously mentioned refitting process were conducted with random weight initialization. The average performance scores were then calculated. The training and validation data were fixed during 100 times of refitting. AUC, accuracy, precision, recall and F1 were performance scores.
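
For reference, the five performance scores can be computed as in the sketch below. The 0.5 decision threshold and the pairwise (rank-based) AUC formulation are assumptions made for illustration; the function names are hypothetical.

```python
def classification_scores(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 at a fixed threshold (assumed 0.5)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.1]
scores = classification_scores(labels, probs)
auc = roc_auc(labels, probs)
```

When scores are averaged over repeated refits, the mean F1 is the average of per-run F1 values and generally differs from the F1 computed from the mean precision and mean recall.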

The comparison models used in this study were random forest (RF), logistic regression (LR) and the recently developed AD-AE [32], CODE-AE-ADV [26] and DeepCDR [16] models. AD-AE is a model that utilizes adversarial networks to separate confounding signals from gene expression data, resulting in generalized embeddings. Although it was not initially designed for predicting CDR, it demonstrated the second-best performance in single-drug response prediction [26]. CODE-AE-ADV is an extension of the CODE-AE model that incorporates an adversarial network. We added UGCN to both AD-AE and CODE-AE-ADV, which were single-drug models, to convert them into multi-drug models. For AD-AE and CODE-AE-ADV, the autoencoder and discriminator were pre-trained. In the subsequent CDR prediction step, the UGCN, pre-trained encoder and drug response prediction classifier were trained, while the decoder and the discriminator were excluded from the training process. The machine learning models (RF and LR) used grid search CV with k = 10 to find optimal hyperparameters. Since RF and LR can only take 2D features as input, drug features had to be extracted using a molecular representation method. We selected LayeredFP as the molecular representation method because its response prediction performance was consistently good on various datasets [33]. The hyperparameters searched for RF were the maximum depth, the minimum number of samples per leaf and the number of estimators. In LR, the inverse of regularization strength and the penalty were tuned by grid search. The AD-AE, CODE-AE-ADV and DeepCDR models were evaluated in the same way as PANCDR. We converted the Keras model of DeepCDR into PyTorch. The hyperparameters selected for tuning in AD-AE, CODE-AE-ADV and DeepCDR by random search were the latent vector dimensions of gene expression and drug graphs, learning rate and batch size.
In PANCDR, a hyperparameter search was performed by tuning the latent vector dimensions of gene expression and drug graph, the learning rates of the CDR prediction model and adversarial model, lambda and batch size. The range of hyperparameters of the comparing models and PANCDR can be found in Supplementary Table S1. The optimal hyperparameters we used are shown in Supplementary Table S2.
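
A random hyperparameter search over the quantities listed above could look like the following sketch. The candidate values are placeholders invented for illustration (the actual ranges are in Supplementary Table S1), and the function names and validation scores are hypothetical.

```python
import random

# Placeholder search space; the values are illustrative, not the paper's ranges.
SEARCH_SPACE = {
    "gexpr_latent_dim": [64, 128, 256],
    "drug_latent_dim": [64, 128, 256],
    "lr_cdr": [1e-4, 1e-3],
    "lr_adv": [1e-4, 1e-3],
    "lam": [0.01, 0.1, 1.0],
    "batch_size": [128, 256, 512],
}


def sample_configs(space, n_iter, seed):
    """Draw n_iter configurations uniformly at random from a discrete space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]


def pick_best(configs, mean_val_auc):
    """Return the configuration with the highest mean validation AUC."""
    best_index = max(range(len(configs)), key=lambda i: mean_val_auc[i])
    return configs[best_index]


configs = sample_configs(SEARCH_SPACE, n_iter=20, seed=0)
# Made-up mean validation AUCs for the first three configurations.
best = pick_best(configs[:3], [0.61, 0.70, 0.66])
```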

Robustness and biological analysis of PANCDR using TCGA data

We further trained PANCDR using various ratios of unlabeled TCGA data to demonstrate robustness even with limited amounts of unlabeled clinical data for training. Additionally, we prioritized the contribution of genes in response prediction for each patient–drug pair using absolute SHapley Additive exPlanations (SHAP) [34] values. We selected the top five contributing genes according to SHAP to compare with known target genes of drugs. The known target genes for each drug were obtained from DGIdb [35]. Target genes were considered only if they were included in the 702 genes used as input to PANCDR. On average, there were 14.83 target genes per drug, with a minimum of 1 and a maximum of 43. Among the 69 drugs and 666 drug–patient pairs in TCGA, we selected true positive pairs containing drugs with target genes from DGIdb, resulting in 30 drugs and 261 pairs.
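
The comparison between top-ranked genes and known targets can be sketched as below, assuming the SHAP values for one patient–drug pair are already available as a gene-to-value mapping. The gene values and the target set in the example are made up, and the function names are hypothetical.

```python
def top_contributing_genes(shap_values, k=5):
    """Rank genes by absolute SHAP value and return the top k gene names."""
    ranked = sorted(shap_values, key=lambda gene: abs(shap_values[gene]), reverse=True)
    return ranked[:k]


def targets_in_top(shap_values, target_genes, k=5):
    """Return the known target genes that appear among the top k genes."""
    return [gene for gene in top_contributing_genes(shap_values, k) if gene in target_genes]


# Made-up SHAP values for one patient-drug pair and a made-up target set.
example_shap = {
    "TP53": -0.9, "MYC": 0.4, "EGFR": 0.05, "AKT1": 0.3,
    "BAX": -0.2, "MET": 0.1, "KRAS": 0.01,
}
example_targets = {"TP53", "MYC", "KRAS"}
top_five = top_contributing_genes(example_shap)
hits = targets_in_top(example_shap, example_targets)
```

Using the absolute value means a gene counts as highly contributing whether it pushes the prediction toward sensitivity or toward resistance.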

Evaluation of the regression model on patient data

We evaluated the regression model of PANCDR based on two sets of patient data: TCGA and brain tumor data. We employed 10-fold CV with random search to determine optimal hyperparameters. After obtaining the optimal hyperparameters, PANCDR was trained using a randomly split 95% of the GDSC data and tested on the remaining 5%. The process was repeated 10 times, and the model yielding the highest Pearson correlation coefficient on the GDSC test split was selected as the best model. For TCGA data, we compared the predicted $\mathrm{IC}_{50}$ values depending on binary labels indicating resistance and sensitivity. In the brain tumor data from Seoul National University Hospital, the predicted values were $z$-normalized using predicted values from TCGA for each drug. We selected drugs with predicted $\ln(\mathrm{IC}_{50})$ values less than -2 as well as with a $z$-score of the predicted value less than -2. Subsequently, we selected the top five unique drugs according to the $z$-score and conducted a literature search.

PANCDR was implemented using PyTorch 1.10 with CUDA version 9.1 on an NVIDIA GeForce RTX 3090 GPU.

RESULTS

Comparing the classification performance in GDSC and TCGA

We evaluated the models using nested CV for GDSC data and refitted them 100 times for TCGA data. PANCDR was compared with machine learning methods, existing adversarial network models (AD-AE, CODE-AE-ADV) and DeepCDR. Table 1 compares the performance of the baseline models and PANCDR on GDSC and TCGA. The GDSC prediction performance was obtained by averaging the scores of the test folds of the outer loop in nested CV. The TCGA prediction performance was obtained by averaging the scores of the 100 random weight initializations with optimal hyperparameters. The CV results are shown in Supplementary Tables S3–S10. On GDSC, DeepCDR outperformed PANCDR on every metric except recall, with AUC, ACC, precision and F1 values of 0.8361, 0.7761, 0.3095 and 0.4328, respectively. However, PANCDR achieved the highest values on TCGA, with AUC, ACC, precision and F1 scores of 0.7106, 0.6686, 0.6491 and 0.6704, respectively. The results show that PANCDR was not overfitted to GDSC preclinical data and can be generalized to TCGA clinical data.

Table 1.

The performance of comparing methods and PANCDR on the GDSC and TCGA datasets.

	Model	AUC (std)	ACC (std)	Precision (std)	Recall (std)	F1 (std)
GDSC	DeepCDR	0.8361 (0.0048)	0.7761 (0.0250)	0.3095 (0.0234)	0.7263 (0.0360)	0.4328 (0.0171)
	PANCDR	0.7970 (0.0051)	0.7192 (0.0336)	0.2571 (0.0202)	0.7273 (0.0377)	0.3788 (0.0170)
TCGA	RF (LayeredFP)	0.5410 (0.0108)	0.5534 (0.0092)	0.5859 (0.0331)	0.3045 (0.1193)	0.3863 (0.0818)
	LR (LayeredFP)	0.4996 (0.0065)	0.5303 (0.0097)	0.5734 (0.0403)	0.2509 (0.2689)	0.2848 (0.1524)
	AD-AE	0.4918 (0.0423)	0.5239 (0.0240)	0.5186 (0.1071)	0.5462 (0.3319)	0.4640 (0.2062)
	CODE-AE-ADV	0.5350 (0.0424)	0.5549 (0.0308)	0.5696 (0.0577)	0.4812 (0.1996)	0.4930 (0.0874)
	DeepCDR	0.5273 (0.0510)	0.5552 (0.0396)	0.5679 (0.1007)	0.5112 (0.2111)	0.5027 (0.1246)
	PANCDR	0.7106 (0.0246)	0.6686 (0.0183)	0.6491 (0.0305)	0.7050 (0.1005)	0.6704 (0.0409)

std: standard deviation

We plotted the TCGA prediction performance of AD-AE, CODE-AE-ADV, DeepCDR and PANCDR across different hyperparameter sets and random weight initializations. In Figure 2A, a violin plot shows the AUC scores of the external test in baseline deep learning models and PANCDR for the 20 different combinations of hyperparameters used in the random search. Except for AD-AE, the AUC scores differed greatly depending on the hyperparameters. For CODE-AE-ADV and DeepCDR, the values of external test AUC were up to 0.6641 and 0.6590, respectively. However, the average test AUC values based on the optimal hyperparameters, determined using the validation data, were 0.5350 and 0.5273, respectively. PANCDR had a maximum external test AUC of 0.7385 and an optimal hyperparameter-based AUC of 0.7106. For these different combinations of hyperparameters, PANCDR showed significantly higher performance than AD-AE, CODE-AE-ADV and DeepCDR ($p$-values = [, , ]). Figure 2B shows a box plot comparing the TCGA AUC of baseline deep learning models and PANCDR, obtained by refitting the models 100 times with random weight initialization using optimal hyperparameters. PANCDR showed significantly higher TCGA AUC than AD-AE, CODE-AE-ADV and DeepCDR ($p$-values ). The standard deviation of PANCDR was 0.0246, which is much lower than that of other deep learning models.

Figure 2. Prediction performance of AD-AE, CODE-AE-ADV, DeepCDR and PANCDR on TCGA using various hyperparameter sets and data splits. (A) The violin plot shows the performance of models when they were trained using 20 different hyperparameter sets generated in 20 random searches. (B) The box plot shows the performance of models when the models were refitted 100 times with random weight initialization, using optimal hyperparameters.

Table 2 reports the performance for seen and unseen drugs in external TCGA data. Seen drugs are those that were included in the GDSC data and used during training. Conversely, unseen drugs are those that were not present in the GDSC data and were not used during model training. Of the 69 drugs in TCGA, 24 were seen, while the remaining 45 were unseen. Of the patient–drug pairs, 505 involved seen drugs and 161 involved unseen drugs. For both DeepCDR and PANCDR, AUC, ACC and precision were lower for unseen drugs than for seen drugs. Nevertheless, the PANCDR performance in unseen drugs remained relatively high. Figure 3A shows the performance of DeepCDR and PANCDR in unseen drugs of TCGA. After excluding drugs with all resistant or all sensitive responses among the 45 unseen drugs, 16 drugs remained. PANCDR produced equal or higher AUC values than DeepCDR, except for pemetrexed and ifosfamide.

Table 2.

The seen and unseen drugs performances of PANCDR and DeepCDR on the TCGA datasets.

Model		AUC (std)	ACC (std)	Precision (std)	Recall (std)	F1 (std)
DeepCDR	Seen	0.5337 (0.0643)	0.5699 (0.0542)	0.5745 (0.0844)	0.5195 (0.1672)	0.5217 (0.0877)
	Unseen	0.5060 (0.0568)	0.5530 (0.0343)	0.5657 (0.0498)	0.7122 (0.2740)	0.5897 (0.1499)
PANCDR	Seen	0.7285 (0.0298)	0.6828 (0.0205)	0.6609 (0.0363)	0.6996 (0.1048)	0.6733 (0.0435)
	Unseen	0.6615 (0.0265)	0.6447 (0.0209)	0.6432 (0.0373)	0.7203 (0.1148)	0.6720 (0.0444)

std: standard deviation

Figure 3. The performances of PANCDR. (A) AUC scores for the unseen drugs, compared between PANCDR and DeepCDR. The table on the right shows drug names and resistant/sensitive ratios (R/S) for each point. (B) Box plot of the distributions of predicted values for resistant and sensitive data ($p$-value = ).

In addition, we used UMAP [36] to visualize whether the encoder alleviated the difference between GDSC and TCGA (Supplementary Figure S3). The gene expressions of GDSC and TCGA were separated before passing through the encoder. Conversely, the latent vectors of the two datasets were fused after passing through the encoder.

We further conducted an ablation study to explore different PANCDR architectures, including PANCDR without discriminator during the CDR prediction step, similar to CODE-AE-ADV, and PANCDR with a simplified encoder without reparameterization instead of a Gaussian encoder (Supplementary Table S11). PANCDR without discriminator showed the lowest AUC score of the external test, 0.4851. PANCDR with a simplified encoder yielded a slightly lower external test AUC score of 0.6931 compared with PANCDR, with a higher standard deviation of 0.350.

Robustness and biological analysis of PANCDR using TCGA data

In this section, we reduced the amount of unlabeled TCGA data used for training to check the robustness of PANCDR when little clinical data is available. PANCDR was trained with 4,712 instances, half of the 9,424 total instances, and also with 942 instances, one-tenth of the instances. The AUC of PANCDR decreased only slightly, from 0.7106 to 0.7062 (Table 3), and remained higher than that of the baseline models. This result shows that even a small amount of unlabeled clinical data can increase prediction performance in the PANCDR model.

Table 3.

The performance of PANCDR with various ratios of unlabeled TCGA.

Ratio	AUC	ACC	Precision	Recall	F1
1	0.7106	0.6686	0.6491	0.7050	0.6704
0.5	0.7066	0.6654	0.6474	0.6955	0.6647
0.1	0.7062	0.6658	0.6489	0.6915	0.6638

Note: Ratio represents that of unlabeled TCGA data used for training.

Table 4 demonstrates that highly contributing genes in the response prediction of drugs were also known target genes of drugs. Among the 30 drugs and 261 drug–patient pairs, known target genes of 17 drugs and 148 pairs were included in the top five contributing genes. Notably, some drugs had their target genes ranked in the top priority. TP53 was ranked first in patients treated with paclitaxel, capecitabine, gemcitabine, carboplatin, cisplatin, cetuximab, fluorouracil and ifosfamide. MYC was also ranked first in cisplatin-treated patients. The detailed results are shown in Supplementary Table S12.

Table 4.

Target genes included in the top five genes.

Drug (total TG count)	TGs in the top five
Capecitabine (8)	MET, TP53
Carboplatin (23)	MET, TP53
Cetuximab (23)	AKT1, TP53
Cisplatin (43)	AKT1, BAX, MYC, TP53
Cyclophosphamide (17)	TP53
Dasatinib (41)	TP53
Docetaxel (21)	ERBB3, TP53
Doxorubicin (26)	TP53
Erlotinib (27)	ERBB3
Etoposide (13)	TP53
Fluorouracil (22)	TP53
Gemcitabine (25)	TP53
Ifosfamide (2)	TP53
Paclitaxel (28)	AKT1, BCL2, MET, PDGFRA, TP53
Pazopanib (19)	MET
Sorafenib (31)	CYP2C8, MET
Tamoxifen (12)	TP53

TG: target gene

Evaluation of the regression model on patient data

Next, we evaluated the regression performance of PANCDR. Similar to the classification model, we performed the nested CV with the random search for GDSC and the 10-fold CV with the random search for TCGA. The CV results are shown in Supplementary Tables S13–S15. The Pearson correlation of PANCDR in GDSC was 0.8864. As TCGA data do not have continuous Inline graphic values, we compared the predicted values depending on the binary label, resistant and sensitive. Figure 3B shows that the predicted value of sensitive data was significantly lower than that of resistant data (-value = ).

In addition, we predicted candidate drugs for another external test dataset, the gene expression profiles of patients with brain tumors at Seoul National University Hospital. ln(IC50) values of 357 drugs from GDSC were predicted for each patient using the model trained on GDSC.

To select sensitive patient–drug pairs, we set the threshold for predicted ln(IC50) values to -2, the same value used by Chang et al. [7]. However, the predicted value alone is insufficient to determine sensitivity to a drug because the response range varies among drugs. Thus, the predicted values were z-normalized by drug to determine the relative sensitivity of drugs. We also set the threshold for the z-scores of the predicted values to -2, according to the GDSC database. A pair was selected as sensitive when both its predicted value and its z-score were less than or equal to -2. Next, we sorted the drugs by their minimum z-score for each patient and selected the top five drugs as candidates.
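
The selection rule above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the per-drug reference mean and standard deviation used for normalization are taken as given inputs (the text does not say which population they are computed over), ranking is done per patient by most negative normalized score (one reading of the sorting step), and all names and numbers are hypothetical.

```python
def select_candidates(pred, ref_stats, value_cut=-2.0, z_cut=-2.0, top_n=5):
    """Return, per patient, up to top_n drugs passing both thresholds.

    pred: dict patient -> dict drug -> predicted response value.
    ref_stats: dict drug -> (mean, sd) used for per-drug normalization
        (assumed input; its source is not specified in the text).
    """
    candidates = {}
    for patient, by_drug in pred.items():
        passing = []
        for drug, value in by_drug.items():
            mu, sd = ref_stats[drug]
            z = (value - mu) / sd
            # A pair counts as sensitive only if BOTH conditions hold.
            if value <= value_cut and z <= z_cut:
                passing.append((z, drug))
        passing.sort()  # most negative normalized score first
        candidates[patient] = [drug for _, drug in passing[:top_n]]
    return candidates


# Toy numbers invented for illustration only.
ref_stats = {"drugA": (1.0, 1.5), "drugB": (-1.5, 0.5), "drugC": (2.0, 1.0)}
pred = {
    "p1": {"drugA": -3.5, "drugB": -2.2, "drugC": -1.0},
    "p2": {"drugA": -2.0, "drugB": -1.5, "drugC": -2.5},
}
selected = select_candidates(pred, ref_stats)
print(selected)
```

In the toy data, drugB for p1 passes the raw threshold but not the normalized one, and drugC for p1 passes the normalized threshold but not the raw one, so neither is selected. This is the situation the double threshold is meant to handle.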

For the patients with brain tumors at Seoul National University Hospital, the top five drugs were methotrexate, panobinostat, GSK1070916, trichostatin A and dacinostat (Table 5). All five drugs have previously been associated with brain cancer [37, 42, 48, 49, 51].

First, PANCDR predicted methotrexate as a candidate drug in eight patients: three with MB and one each with ETMR, GBM, germinoma, LGG and APA. Previous studies of methotrexate exist for MB, ETMR, GBM and germinoma. Methotrexate was associated with improved survival in MB patients [37, 38]. ETMR is an extremely rare brain tumor with no standard treatment. Pediatric brain tumor treatment protocols include PNET-HR, German HIT, the COG study and the Head Start regimen, and methotrexate is among the drugs commonly used in these regimens. Increasing the doses of cyclophosphamide and methotrexate in the Head Start regimen could serve as adjuvant chemotherapy after surgery for ETMRs [39]. In GBM cells, methotrexate decreased cell viability [40]. Furthermore, high-dose methotrexate induced complete remission in a 27-year-old germinoma patient [41].

Next, panobinostat was predicted as a candidate drug in 15 of the 16 patients, all except the patient with PILA. Panobinostat showed high sensitivity in MB cell lines and significantly improved survival in an MB mouse model [42, 43]. An ETMR cell line was also sensitive to panobinostat, which was therefore suggested as a potential treatment [44]. Cotreatment with panobinostat was reported to have a synergistic effect in GBM cell lines [45–47].

GSK1070916 was predicted as a candidate drug in nine patients. The combination of GSK1070916 with JQ1 was reported to have a synergistic effect in a GBM cell line [48]. Trichostatin A was also predicted as a candidate drug in nine patients. MB cell lines were highly sensitive to trichostatin A [49], which also partly promoted apoptosis in GBM cells and suppressed tumor growth in a mouse GBM model [50].

Dacinostat, also known as LAQ824, was predicted as a candidate drug in all patients. In MB cell lines, dacinostat induced apoptosis and cell cycle arrest at the G2/M stage. Moreover, dacinostat inhibited tumor growth in an MB mouse model. These findings support the use of dacinostat as a potential treatment for MB patients [51]. Dacinostat also induced apoptosis more strongly when combined with 2-DG in GBM cell lines [52].

Table 5.

Recommended drugs for brain tumor patients.

Drug	Patients	Related papers
Methotrexate	3 MBs, ETMR, GBM, germinoma, LGG, APA	MB [37, 38], ETMR [39], GBM [40], Germinoma [41]
Panobinostat	15 out of 16 patients, excluding PILA	MB [42, 43], ETMR [44], GBM [45–47]
GSK1070916	3 MBs, ETMR, GBM, germinoma, APA, infant-hemispheric glioma, AA	GBM [48]
Trichostatin A	4 MBs, 2 ETMRs, GBM, germinoma, APA	MB [49], GBM [50]
Dacinostat	All 16 patients	MB [51], GBM [52]

We also examined candidate drug predictions for TCGA patients by cancer type; the results are provided in Supplementary Table S16. Taken together, these results show that PANCDR can recommend drugs for real patients using their gene expression data.

Performance comparison with a model trained on patient data

We compared PANCDR with a CDR prediction model trained on the labeled TCGA dataset. Because this model was trained without unlabeled TCGA data or GDSC data, the adversarial model was not used. Prediction performance with only labeled TCGA data was evaluated through 5-fold nested cross-validation (Supplementary Figure S4A). The average AUC for the outer test folds was 0.7474 with a standard deviation of 0.1260 (Supplementary Tables S17 and S18). The lowest AUC was 0.5244 and the highest was 0.8123, indicating substantial variation in performance depending on the data split. For comparison, PANCDR was evaluated as follows. Within each outer fold, validation data were selected randomly from the outer training fold, with their size matching that of the inner validation fold (Supplementary Figure S4B). The best model among the 100 sets of trained weights was then identified based on validation AUC and applied to the outer test fold to calculate the AUC, and this process was repeated across all outer folds. The resulting average AUC for the outer test folds was 0.7401 with a standard deviation of 0.0531. The lowest and highest AUCs were 0.6923 and 0.8306, respectively, demonstrating relatively stable performance. Note that the weights used in this experiment came from the preceding experiment, in which PANCDR was trained 100 times from random initializations. PANCDR was therefore not trained on labeled TCGA data in this experiment, yet it performed similarly to the model trained on the labeled TCGA data.
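
The per-fold summary used in this comparison (mean, standard deviation, minimum and maximum of the outer-fold AUCs) can be sketched as below. The rank-based AUC is the standard pairwise definition. Using the sample standard deviation is an assumption, since the text does not state which estimator was used, and the fold data are invented for illustration.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """Return (mean, sample SD, min, max) of AUC over outer test folds."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return mean(aucs), stdev(aucs), min(aucs), max(aucs)


# Three toy outer folds (labels, predicted scores); not data from the study.
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
    ([1, 0, 0, 1], [0.6, 0.6, 0.2, 0.3]),
]
avg, sd, lowest, highest = summarize_folds(folds)
print(round(avg, 4), round(sd, 4), lowest, highest)
```

Two models with similar mean AUC can still differ in their spread across folds, which is the contrast drawn above between standard deviations of 0.1260 and 0.0531.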

DISCUSSION

In this study, we proposed PANCDR, an adversarial network-based method for predicting CDR in precision medicine. Although PANCDR underperformed other models in the internal tests, it outperformed them in the external tests. The added adversarial loss appears to prevent overfitting to cell lines. The existing models AD-AE and CODE-AE-ADV, which also use adversarial networks for CDR prediction, showed lower performance. In the ablation study, PANCDR without the discriminator during the CDR prediction step also showed lower performance. These results suggest that training the adversarial network together with the CDR prediction model improves generalization and contributes to prediction performance on external test data. The ablation study also indicates that the Gaussian encoder improved model performance and stability. When PANCDR was trained 100 times with optimal hyperparameters, the standard deviation of its AUC was lower, and its AUC consistently higher, than those of the other deep learning models on TCGA. PANCDR also showed similar performance when the amount of unlabeled TCGA data used for training was reduced. These results suggest that PANCDR is robust and applicable to other clinical datasets with fewer samples. Moreover, PANCDR performed similarly to the model trained on labeled TCGA data, which served only as the external test set for PANCDR. This implies that PANCDR has strong generalization capabilities.

In the biological analysis, we found that more than half of all drug–patient pairs had known target genes among the top five contributing genes, although the known target genes of a drug account for only about 2% of all genes on average. These results indicate that PANCDR captures important features of each drug and can identify genes related to it. With the regression model, PANCDR was shown to provide candidate drugs for patients with cancer.

We should consider multi-omics data as input in future work. Recent studies have shown that model performance using multi-omics is superior to that when gene expression is used alone [10, 16]. The use of multi-omics data such as mutation, methylation and CNA can further improve PANCDR performance.

Although PANCDR outperformed other models in predicting CDR on clinical data, it still has some limitations. First, the alignment of latent vectors between cell lines and patients by adversarial learning is a purely methodological approach to integrating data from different domains. The process could therefore introduce false-positive or false-negative drug responses. Second, drug combinations have been shown to be clinically effective [53], but because PANCDR was trained on single-agent treatment data, it struggles to predict the synergistic effects of such combinations. Lastly, our model cannot predict toxicity or potential side effects in the human body. Further research is therefore necessary to address these aspects for real-world clinical applications.

Key Points

PANCDR leverages both the CDR prediction model and the adversarial model to achieve domain adaptation, improving its generalizability to external clinical datasets.
In tests on external clinical data, PANCDR achieved the highest performance among the machine learning models compared.
Based on the analysis of the target genes and brain tumor patients, the predicted drug responses and extracted gene expression features generated by PANCDR contain biologically meaningful information.

Supplementary Material

Supplementary_materials_bbae088

supplementary_materials_bbae088.pdf (2.5 MB, PDF)

ACKNOWLEDGEMENTS

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2019-0-00567, Development of Intelligent SW Systems for Uncovering Genetic Variation and Developing Personalized Medicine for Cancer Patients with Unknown Molecular Genetic Mechanisms, No. 2019-0-01842, Artificial Intelligence Graduate School Program [GIST]). The brain tumor biospecimens and data used in this study were provided by the Human Biobank of Seoul National University Hospital, a member of Korea Biobank Network (KBN4_A03), and Seoul National University Hospital Cancer Tissue Bank. All samples derived from the Biobanks of SNUH were obtained with informed consent under institutional review board-approved protocols.

Author Biographies

Juyeon Kim is a master's degree student at Gwangju Institute of Science and Technology (GIST), developing computational methods for drug response prediction.

Sung-Hye Park, MD/PhD, is a tenured professor in the Department of Pathology at Seoul National University College of Medicine and an expert in neuropathology, neuro-oncology, and molecular pathology.

Hyunju Lee, PhD, is a tenured professor at GIST, where she leads the Data Mining and Computational Biology Laboratory, which develops artificial intelligence approaches for biological problems.

Contributor Information

Juyeon Kim, School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 61005, Gwangju, South Korea.

Sung-Hye Park, Department of Pathology, Seoul National University Hospital, Seoul National University College of Medicine, 03080, Seoul, South Korea; Neuroscience Research Institute, Seoul National University College of Medicine, 03080, Seoul, South Korea.

Hyunju Lee, School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 61005, Gwangju, South Korea; Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, 61005, Gwangju, South Korea.

AUTHOR CONTRIBUTIONS

H.L. initiated and supervised the project. H.L., J.K. and S.P. collected data and analyzed the results. H.L. and J.K. developed the algorithm and wrote the manuscript. Y.K. performed the experiments.

DATA AND CODE AVAILABILITY

The PANCDR codes and data are available at https://github.com/DMCB-GIST/PANCDR. The data on GitHub are publicly available data and have been preprocessed.

References

1. Evans WE, Relling MV. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 1999;286(5439):487–91.
2. Lee J-K, Liu Z, Sa JK, et al. Pharmacogenomic landscape of patient-derived tumor cells informs precision oncology therapy. Nat Genet 2018;50(10):1399–411.
3. Iorio F, Knijnenburg TA, Vis DJ, et al. A landscape of pharmacogenomic interactions in cancer. Cell 2016;166(3):740–54.
4. Barretina J, Caponigro G, Stransky N, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483(7391):603–7.
5. Seashore-Ludlow B, Rees MG, Cheah JH, et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov 2015;5(11):1210–23.
6. Geeleher P, Zhang Z, Wang F, et al. Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies. Genome Res 2017;27(10):1743–51.
7. Chang Y, Park H, Yang H-J, et al. Cancer drug response profile scan (CDRscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Sci Rep 2018;8(1):1–11.
8. Ding MQ, Chen L, Cooper GF, et al. Precision oncology beyond targeted therapy: combining omics data with machine learning matches the majority of cancer cells to effective therapeutics. Mol Cancer Res 2018;16(2):269–78.
9. Weinstein JN, Collisson EA, Mills GB, et al. The cancer genome atlas pan-cancer analysis project. Nat Genet 2013;45(10):1113–20.
10. Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 2019;35(14):i501–9.
11. Park S, Soh J, Lee H. Super.FELT: supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data. BMC Bioinformatics 2021;22(1):1–23.
12. Gao H, Korn JM, Ferretti S, et al. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat Med 2015;21(11):1318–25.
13. Sharifi-Noghabi H, Harjandi PA, Zolotareva O, et al. Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction. Nat Mach Intell 2021;3(11):962–72.
14. Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res 2017;45(D1):D777–83.
15. Chiu Y-C, Chen H-IH, Zhang T, et al. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med Genomics 2019;12(1):143–55.
16. Liu Q, Zhiqiang H, Jiang R, Zhou M. DeepCDR: a hybrid graph convolutional network for predicting cancer drug response. Bioinformatics 2020;36(Supplement_2):i911–8.
17. Liu X, Song C, Huang F, et al. GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform 2022;23(1):bbab457.
18. Tzeng E, Hoffman J, Zhang N, et al. Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474. 2014.
19. Ganin Y, Ustinova E, Ajakan H, et al. Domain-adversarial training of neural networks. J Mach Learn Res 2016;17(1):2096–30.
20. Tzeng E, Hoffman J, Saenko K, Darrell T. Adversarial discriminative domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017, 7167–76.
21. Mourragui S, Loog M, Van De Wiel MA, et al. PRECISE: a domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors. Bioinformatics 2019;35(14):i510–9.
22. Gillet J-P, Varma S, Gottesman MM. The clinical relevance of cancer cell lines. J Natl Cancer Inst 2013;105(7):452–8.
23. Sharifi-Noghabi H, Peng S, Zolotareva O, et al. AITL: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics 2020;36(Supplement_1):i380–8.
24. da Silva RP, Suphavilai C, Nagarajan N. TUGDA: task uncertainty guided domain adaptation for robust generalization of cancer drug response prediction from in vitro to in vivo settings. Bioinformatics 2021;37(Supplement_1):i76–83.
25. Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In: International Conference on Machine Learning. PMLR, 2017, 214–23.
26. He D, Liu Q, You W, Xie L. A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening. Nat Mach Intell 2022;4(10):879–92.
27. Knijnenburg TA, Klau GW, Iorio F, et al. Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci Rep 2016;6:36812.
28. Ding Z, Songpeng Z, Jin G. Evaluating the molecule-based prediction of clinical drug responses in cancer. Bioinformatics 2016;32(19):2891–5.
29. Tate JG, Bamford S, Jubb HC, et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res 2019;47(D1):D941–7.
30. Ramsundar B, Eastman P, Walters P, et al. Deep Learning for the Life Sciences. Sebastopol, CA, USA: O’Reilly Media, 2019. https://www.amazon.com/Deep-Learning-Life-Sciences-Microscopy/dp/1492039837.
31. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. 2013.
32. Dincer AB, Janizek JD, Lee S-I. Adversarial deconfounding autoencoder for learning robust gene expression embeddings. Bioinformatics 2020;36:i573–82.
33. Baptista D, Correia J, Pereira B, Rocha M. Evaluating molecular representations in machine learning models for drug response prediction and interpretability. J Integr Bioinform 2022;19(3).
34. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Guyon I, Von Luxburg U, Bengio S et al. (eds). Advances in Neural Information Processing Systems. Vol. 30. Long Beach, CA, USA: Curran Associates, Inc., 2017.
35. Freshour SL, Kiwala S, Cotto KC, et al. Integration of the drug–gene interaction database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res 2021;49(D1):D1144–51.
36. McInnes L, Healy J, Saul N, Großberger L. UMAP: uniform manifold approximation and projection. J Open Source Softw 2018;3(29):861.
37. Chi SN, Gardner SL, Levy AS, et al. Feasibility and response to induction chemotherapy intensified with high-dose methotrexate for young children with newly diagnosed high-risk disseminated medulloblastoma. J Clin Oncol 2004;22(24):4881–7.
38. Pompe RS, von Bueren AO, Mynarek M, et al. Intraventricular methotrexate as part of primary therapy for children with infant and/or metastatic medulloblastoma: feasibility, acute toxicity and evidence for efficacy. Eur J Cancer 2015;51(17):2634–42.
39. Khan S, Solano-Paez P, Suwal T, et al. Clinical phenotypes and prognostic features of embryonal tumours with multi-layered rosettes: a rare brain tumor registry study. Lancet Child Adolesc Health 2021;5(11):800–13.
40. Lopes DV, de Fraga A, Dias LF, et al. Influence of NSAIDs and methotrexate on CD73 expression and glioma cell growth. Purinergic Signal 2021;17(2):273–84.
41. Fietz T, Thiel E, Baldus C, et al. Successful treatment of extracranially metastasized pineal gland germinoma with high-dose methotrexate. Ann Oncol 2002;13(10):1681–5.
42. Milde T, Lodrini M, Savelyeva L, et al. HD-MB03 is a novel group 3 medulloblastoma model demonstrating sensitivity to histone deacetylase inhibitor treatment. J Neurooncol 2012;110(3):335–48.
43. Phi JH, Choi SA, Kwak PA, et al. Panobinostat, a histone deacetylase inhibitor, suppresses leptomeningeal seeding in a medulloblastoma animal model. Oncotarget 2017;8:56747–57.
44. Schmidt C, Schubert NA, Brabetz S, et al. Preclinical drug screen reveals topotecan, actinomycin D, and volasertib as potential new therapeutic candidates for ETMR brain tumor patients. Neuro Oncol 2017;19(12):1607–17.
45. Meng W, Wang B, Mao W, et al. Enhanced efficacy of histone deacetylase inhibitor combined with bromodomain inhibitor in glioblastoma. J Exp Clin Cancer Res 2018;37(1):1–11.
46. Meng W, Wang B, Mao W, et al. Enhanced efficacy of histone deacetylase inhibitor panobinostat combined with dual PI3K/mTOR inhibitor BEZ235 against glioblastoma. Nagoya J Med Sci 2019;81(1):93–102.
47. De La Rosa J, Urdiciain A, Zazpe I, et al. The synergistic effect of DZ-Nep, panobinostat and temozolomide reduces clonogenicity and induces apoptosis in glioblastoma cells. Int J Oncol 2020;56(1):283–300.
48. Stathias V, Jermakowicz AM, Maloof ME, et al. Drug and disease signature integration identifies synergistic combinations in glioblastoma. Nat Commun 2018;9:5315.
49. Furchert SE, Lanvers-Kaminsky C, Jürgens H, et al. Inhibitors of histone deacetylases as potential therapeutic tools for high-risk embryonal tumors of the nervous system of childhood. Int J Cancer 2007;120(8):1787–94.
50. Hoering E, Podlech O, Silkenstedt B, et al. The histone deacetylase inhibitor trichostatin A promotes apoptosis and antitumor immunity in glioblastoma cells. Anticancer Res 2013;33(4):1351–60.
51. Zhang S, Gong Z, Oladimeji PO, et al. A high-throughput screening identifies histone deacetylase inhibitors as therapeutic agents against medulloblastoma. Exp Hematol Oncol 2019;8(1):1–10.
52. Egler V, Korur S, Failly M, et al. Histone deacetylase inhibition and blockade of the glycolytic pathway synergistically induce glioblastoma cell death. Clin Cancer Res 2008;14(10):3132–40.
53. Al-Lazikani B, Banerji U, Workman P. Combinatorial drug therapy for cancer in the post-genomic era. Nat Biotechnol 2012;30(7):679–92.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary_materials_bbae088

supplementary_materials_bbae088.pdf^{(2.5MB, pdf)}

Data Availability Statement

The PANCDR codes and data are available at https://github.com/DMCB-GIST/PANCDR. The data on GitHub are publicly available data and have been preprocessed.

[ref1] 1. Evans WE, Relling MV. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 1999;286(5439):487–91. [DOI] [PubMed] [Google Scholar]

[ref2] 2. Lee J-K, Liu Z, Sa JK, et al.. Pharmacogenomic landscape of patient-derived tumor cells informs precision oncology therapy. Nat Genet 2018;50(10):1399–411. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref3] 3. Iorio F, Knijnenburg TA, Vis DJ, et al.. A landscape of pharmacogenomic interactions in cancer. Cell 2016;166(3):740–54. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] 4. Barretina J, Caponigro G, Stransky N, et al.. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483(7391):603–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref5] 5. Seashore-Ludlow B, Rees MG, Cheah JH, et al.. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov 2015;5(11):1210–23. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref6] 6. Geeleher P, Zhang Z, Wang F, et al.. Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies. Genome Res 2017;27(10):1743–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] 7. Chang Y, Park H, Yang H-J, et al.. Cancer drug response profile scan (cdrscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Sci Rep 2018;8(1):1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] 8. Ding MQ, Chen L, Cooper GF, et al.. Precision oncology beyond targeted therapy: combining omics data with machine learning matches the majority of cancer cells to effective therapeutics. Mol Cancer Res 2018;16(2):269–78. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] 9. Weinstein JN, Collisson EA, Mills GB, et al.. The cancer genome atlas pan-cancer analysis project. Nat Genet 2013;45(10):1113–20. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] 10. Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. Moli: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 2019;35(14):i501–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] 11. Park S, Soh J, Lee H. Super. Felt: supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data. BMC Bioinformatics 2021;22(1):1–23. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] 12. Gao H, Korn JM, Ferretti S, et al.. High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. Nat Med 2015;21(11):1318–25. [DOI] [PubMed] [Google Scholar]

[ref13] 13. Sharifi-Noghabi H, Harjandi PA, Zolotareva O, et al.. Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction. Nat Mach Intell 2021;3(11):962–72. [Google Scholar]

[ref14] 14. Forbes SA, Beare D, Boutselakis H, et al.. Cosmic: somatic cancer genetics at high-resolution. Nucleic Acids Res 2017;45(D1):D777–83. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] 15. Chiu Y-C, Chen H-IH, Zhang T, et al.. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med Genomics 2019;12(1):143–55. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] 16. Liu Q, Zhiqiang H, Jiang R, Zhou M. Deepcdr: a hybrid graph convolutional network for predicting cancer drug response. Bioinformatics 2020;36(Supplement_2):i911–8. [DOI] [PubMed] [Google Scholar]

[ref17] 17. Liu X, Song C, Huang F, et al.. Graphcdr: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform 2022;23(1):bbab457. [DOI] [PubMed] [Google Scholar]

[ref18] 18. Tzeng E, Hoffman J, Zhang N, et al.. Deep domain confusion: maximizing for domain invariance arXiv preprint arXiv:1412.3474. 2014.

[ref19] 19. Ganin Y, Ustinova E, Ajakan H, et al.. Domain-adversarial training of neural networks. J Mach Learn Res 2016;17(1):2096–30. [Google Scholar]

[ref20] 20. Tzeng Eric, Hoffman Judy, Saenko Kate, and Darrell Trevor. Adversarial discriminative domain adaptation. InProceedings of the IEEE conference on computer vision and pattern recognition. pp 7167–76, 2017, Honolulu, HI, USA.

[ref21] 21. Mourragui S, Loog M, Van De Wiel MA, et al.. Precise: a domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors. Bioinformatics 2019;35(14):i510–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref22] 22. Gillet J-P, Varma S, Gottesman MM. The clinical relevance of cancer cell lines. J Natl Cancer Inst 2013;105(7):452–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] 23. Sharifi-Noghabi H, Peng S, Zolotareva O, et al.. Aitl: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics 2020;36(Supplement_1):i380–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref24] 24. da Silva RP, Suphavilai C, Nagarajan N. Tugda: task uncertainty guided domain adaptation for robust generalization of cancer drug response prediction from in vitro to in vivo settings. Bioinformatics 2021;37(Supplement_1):i76–83. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref25] 25. Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In:International conference on machine learning. PMLR, 2017, 214–23. [Google Scholar]

[ref26] 26. He D, Liu Q, You W, Xie L. A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening. Nat Mach Intell 2022;4(10):879–92. [Google Scholar]

[ref27] 27. Knijnenburg TA, Klau GW, Iorio F, et al.. Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci Rep 2016;6:36812. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref28] 28. Ding Z, Songpeng Z, Jin G. Evaluating the molecule-based prediction of clinical drug responses in cancer. Bioinformatics 2016;32(19):2891–5. [DOI] [PubMed] [Google Scholar]

[ref29] 29. Tate JG, Bamford S, Jubb HC, et al.. Cosmic: the catalogue of somatic mutations in cancer. Nucleic Acids Res 2019;47(D1):D941–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref30] 30. Ramsundar B, Eastman P, Walters P, et al.. Deep Learning for the Life Sciences. Sebastopol, CA, USA: O’Reilly Media, 2019. https://www.amazon.com/Deep-Learning-Life-Sciences-Microscopy/dp/1492039837. [Google Scholar]

[ref31] 31. Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. 2013.

[ref32] 32. Dincer AB, Janizek JD, Lee S-I. Adversarial deconfounding autoencoder for learning robust gene expression embeddings. Bioinformatics 2020;36:i573–82. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref33] 33. Baptista D, Correia J, Pereira B, Rocha M. Evaluating molecular representations in machine learning models for drug response prediction and interpretability. J Integr Bioinform 2022;19(3). [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref34] 34. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Guyon I, Von Luxburg U, Bengio S et al. (eds). Advances in Neural Information Processing Systems. Vol. 30. Long Beach, CA, USA: Curran Associates, Inc., 2017. [Google Scholar]

[ref35] 35. Freshour SL, Kiwala S, Cotto KC, et al.. Integration of the drug–gene interaction database (dgidb 4.0) with open crowdsource efforts. Nucleic Acids Res 2021;49(D1):D1144–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref36] 36. McInnes L, Healy J, Saul N, GroÃYberger L. Umap: uniform manifold approximation and projection. J Open Source Softw 2018;3(29):861. [Google Scholar]

[ref37] 37. Chi SN, Gardner SL, Levy AS, et al.. Feasibility and response to induction chemotherapy intensified with high-dose methotrexate for young children with newly diagnosed high-risk disseminated medulloblastoma. J Clin Oncol 2004;22(24):4881–7. [DOI] [PubMed] [Google Scholar]

[ref38] 38. Pompe RS, von Bueren AO, Mynarek M, et al. Intraventricular methotrexate as part of primary therapy for children with infant and/or metastatic medulloblastoma: feasibility, acute toxicity and evidence for efficacy. Eur J Cancer 2015;51(17):2634–42.

[ref39] 39. Khan S, Solano-Paez P, Suwal T, et al. Clinical phenotypes and prognostic features of embryonal tumours with multi-layered rosettes: a rare brain tumor registry study. The Lancet Child & Adolescent Health 2021;5(11):800–13.

[ref40] 40. Lopes DV, de Fraga A, Dias LF, et al. Influence of nsaids and methotrexate on cd73 expression and glioma cell growth. Purinergic Signalling 2021;17(2):273–84.

[ref41] 41. Fietz T, Thiel E, Baldus C, et al. Successful treatment of extracranially metastasized pineal gland germinoma with high-dose methotrexate. Ann Oncol 2002;13(10):1681–5.

[ref42] 42. Milde T, Lodrini M, Savelyeva L, et al. Hd-mb03 is a novel group 3 medulloblastoma model demonstrating sensitivity to histone deacetylase inhibitor treatment. J Neurooncol 2012;110(3):335–48.

[ref43] 43. Phi JH, Choi SA, Kwak PA, et al. Panobinostat, a histone deacetylase inhibitor, suppresses leptomeningeal seeding in a medulloblastoma animal model. Oncotarget 2017;8:56747–57.

[ref44] 44. Schmidt C, Schubert NA, Brabetz S, et al. Preclinical drug screen reveals topotecan, actinomycin d, and volasertib as potential new therapeutic candidates for etmr brain tumor patients. Neuro Oncol 2017;19(12):1607–17.

[ref45] 45. Meng W, Wang B, Mao W, et al. Enhanced efficacy of histone deacetylase inhibitor combined with bromodomain inhibitor in glioblastoma. J Exp Clin Cancer Res 2018;37(1):1–11.

[ref46] 46. Meng W, Wang B, Mao W, et al. Enhanced efficacy of histone deacetylase inhibitor panobinostat combined with dual pi3k/mtor inhibitor bez235 against glioblastoma. Nagoya J Med Sci 2019;81(1):93–102.

[ref47] 47. De La Rosa J, Urdiciain A, Zazpe I, et al. The synergistic effect of dz-nep, panobinostat and temozolomide reduces clonogenicity and induces apoptosis in glioblastoma cells. Int J Oncol 2020;56(1):283–300.

[ref48] 48. Stathias V, Jermakowicz AM, Maloof ME, et al. Drug and disease signature integration identifies synergistic combinations in glioblastoma. Nat Commun 2018;9:5315.

[ref49] 49. Furchert SE, Lanvers-Kaminsky C, Jürgens H, et al. Inhibitors of histone deacetylases as potential therapeutic tools for high-risk embryonal tumors of the nervous system of childhood. Int J Cancer 2007;120(8):1787–94.

[ref50] 50. Hoering E, Podlech O, Silkenstedt B, et al. The histone deacetylase inhibitor trichostatin a promotes apoptosis and antitumor immunity in glioblastoma cells. Anticancer Res 2013;33(4):1351–60.

[ref51] 51. Zhang S, Gong Z, Oladimeji PO, et al. A high-throughput screening identifies histone deacetylase inhibitors as therapeutic agents against medulloblastoma. Exp Hematol Oncol 2019;8(1):1–10.

[ref52] 52. Egler V, Korur S, Failly M, et al. Histone deacetylase inhibition and blockade of the glycolytic pathway synergistically induce glioblastoma cell death. Clin Cancer Res 2008;14(10):3132–40.

[ref53] 53. Al-Lazikani B, Banerji U, Workman P. Combinatorial drug therapy for cancer in the post-genomic era. Nat Biotechnol 2012;30(7):679–92.
