Abstract
Combination therapy is a promising strategy for confronting the complexity of cancer. However, experimental exploration of the vast space of potential drug combinations is costly and unfeasible. Therefore, computational methods for predicting drug synergy are much needed for narrowing down this space, especially when examining new cellular contexts. Here, we thus introduce CCSynergy, a flexible, context aware and integrative deep-learning framework that we have established to unleash the potential of the Chemical Checker extended drug bioactivity profiles for the purpose of drug synergy prediction. We have shown that CCSynergy enables predictions of superior accuracy, remarkable robustness and improved context generalizability as compared to the state-of-the-art methods in the field. Having established the potential of CCSynergy for generating experimentally validated predictions, we next exhaustively explored the untested drug combination space. This resulted in a compendium of potentially synergistic drug combinations on hundreds of cancer cell lines, which can guide future experimental screens.
Keywords: drug synergy, deep learning, Chemical Checker, cancer cell lines, untested drug combination space
Introduction
Aberrant behaviour of cancer cells is caused by malfunctioning of multiple signalling pathways that promote proliferation and inhibit apoptosis [1]. The pervasive redundancy, inherent multifunctionality and the combinatorial control of these biological processes have challenged the traditional ‘one gene, one drug’ paradigm pioneered by Ehrlich [2, 3]. This is evidenced by the increasing rate of drug failure and the recurrent emergence of drug resistance in targeted cancer therapy [2, 4]. To overcome these challenges, combination therapy is a promising strategy as drug synergy ensures greater efficacy in lower drug dosages, which results in avoiding toxicity and minimizing the chance of drug resistance [5].
High-throughput screening methods have enabled testing and quantifying drug synergy [6]. However, synergistic drug pairs are rare and exhaustive exploration of the vast space of potential drug combinations is not experimentally feasible. Thus, computational predictive models of drug synergy, which enable prioritization of the candidate drug combinations, are much needed for narrowing down this vast search space. Computational models as diverse as kinetic [7, 8], network [9, 10] and logic models [11, 12] have thus been employed to gain quantitative insights into the mystery of drug synergy. Moreover, the availability of large-scale drug synergy data, such as the Merck [13] dataset has encouraged the emergence of a wide variety of machine learning-based methods ranging from logistic regression [14] to extremely randomized trees [15] and XGBoost [16]. Ultimately, the state-of-the-art deep-learning approaches such as DeepSynergy [17] and recently TranSynergy [18] and MatchMaker [19] have entered the race and outperformed the others.
Various metrics of drug similarity have been proposed to represent drug pairs in drug synergy prediction models. The focus has primarily been on chemical features of drugs [17, 20–22] and structural or network-level similarity of their targets within cells [9, 23–25]. Further quantities based on phenotypic effects of drugs such as therapeutic and side effect similarities or cell-line-based sensitivity profiles have also been considered [26–28]. Moreover, the Connectivity Map [29] has sparked development of new similarity metrics based on drug-induced gene expression profiles [30–34]. Furthermore, different combinations of these similarity measures have also been examined [35–40]. Ultimately, the Chemical Checker (CC) database arose, which provides a unified framework to systematically extend the concept of drug similarity to all levels of biology, from chemistry, targets, networks, to cellular and clinical effects of drugs [41]. Nevertheless, its enormous potential for predicting drug synergy has not yet been unleashed. This motivated us to develop a method integrating these various levels of bioactivity profiles (Figure 1A).
Figure 1.
CCSynergy framework. (A) The CC [41] based signatures of type II are used as drug features in CCSynergy, which cover five main characteristics of small molecules: (i) chemistry, (ii) targets, (iii) networks, (iv) cells and (v) clinics. Each of them is further divided into five sub-categories totalling 25 distinct levels for drug representation (see Supplementary Methods S1). (B) Five different methods for representing cancer cell lines are used: (i) downstream gene expression profiles, (ii) transcription factor activity and (iii) signalling pathway activity profiles inferred using CARNIVAL [42], (iv) DepMap-based gene essentiality and (v) signalling pathway dependency profiles (see Supplementary Methods S2). The logo of method (iii) is obtained from [42], which is licenced under the Creative Commons Attribution 4.0 International License. (C) CCSynergy DNN architecture. A given triplet is represented as a vector of length 356, which is formed by concatenating the corresponding vectors of drug pair and cell line features. Since drug synergy is order agnostic, each triplet is represented twice to account for both directions (AB and BA). The DNN contains three hidden layers comprising 2000, 1000 and 500 neurons, respectively, which propagate information from the input layer to the output unit (see Supplementary Methods S3). In total, 125 distinct DNNs are trained, each corresponding to one of the 25 CC spaces and one of the five cell line representation methods.
Moreover, it is well-established that drug synergy is highly context specific [43], which necessitates precise representation of the cellular features in drug synergy prediction models. This is relevant, especially in precision medicine, where computational methods are required to enable accurate predictions in specific cellular contexts. Genome-wide expression profiles of cancer cell lines have been extensively employed for this purpose [17,18,21,24,44–47]. However, the expression level of downstream genes is not necessarily a strong indicator of the functional status of the cell and may not directly connect with the drug response phenotype. To address this, computational methods to infer the causal upstream processes, namely transcription factor or signalling pathway activities, which drive the downstream expression changes, have been introduced (e.g. CARNIVAL [42]). However, their potential for representation of the cell in drug synergy prediction has not yet been unlocked. Moreover, genome-wide CRISPR-based essentiality profile of cancer cell lines (DepMap) [48–51], which can more directly establish causal links with cell survival, constitutes another promising alternative. Therefore, we attempted to effectively embrace the context specificity of drug synergy by incorporating these alternative cell representation methods in our predictive framework (Figure 1B).
In this work, we thus introduce CCSynergy, a CC harnessing deep neural network (DNN) that enables context-aware anti-cancer drug synergy prediction (Figure 1C). Using our rigorous cross validation (CV) schemes (Figure 2), we ensure that CCSynergy offers drug synergy predictions of superior accuracy, remarkable robustness and improved context generalizability. Moreover, leveraging a recently published large-scale resource of drug synergy [43], we extensively examine the potential of CCSynergy to generate experimentally validated predictions. Finally, we provide a compendium of potentially synergistic drug combinations that calls for follow-up experimental investigation.
Figure 2.
CV schemes. A drug synergy dataset is shown as a matrix, each element of which represents the synergy score of a given triplet of drug pair + cell line. The dataset is divided row wise into 5-folds of equal size, which are needed in our 5-fold CV schemes. (A) CV1: Five training cycles are needed, in each of which 1-fold is considered as the testing set (yellow) and the remaining four are used as the training set (grey). This ensures that the sets of drug pairs in the testing and training sets do not overlap (held-out drug combinations). (B) CV2: The matrix is not only divided row wise, but also column wise. Columns are grouped according to the tissue of origin of the corresponding cell lines. In total, 5 × L learning cycles are needed, where L is the number of distinct tissues in our dataset. In this example, the elements corresponding to the rows in the first fold and the set of columns belonging to the lung tissue are considered as the testing set (yellow). The samples in the remaining 4-folds (excluding those whose cell lines originated from the lung tissue) are considered as the training set (grey). This CV scheme further ensures that the cell lines in the training set originate from tissues that do not overlap with that of the testing set (held-out tissues).
Materials and methods
CCSynergy overview
The primary aim of CCSynergy is to unlock the potential of CC bioactivity profiles [41] for predicting anti-cancer drug synergy. CC catalogues integrated bioactivity data on almost 800 000 small molecules. It encompasses five levels of increasing complexity from A: the chemical properties of the compounds, B: their targets and C: network-level properties, to D: their cellular and E: clinical effects. Furthermore, each level is divided into five sub-levels resulting in 25 distinct signatures (Figure 1A), each of which is represented in a vector format of the same length (128). The vectors have been generated via a two-step procedure: applying a dimensionality reduction technique (type I signatures) followed by running a network embedding approach on the resulting similarity networks (type II signatures; see Supplementary Methods S1 for further details).
CCSynergy also strives to enable context-aware predictions using five distinct methods for representing cellular contexts (Figure 1B): I: the downstream gene expression profiles, II: the inferred transcription factor activity profiles, III: the inferred signalling pathway activity profiles, IV: CRISPR-based gene essentiality profiles and V: DepMap-based signalling pathway dependency profiles (see Supplementary Methods S2). It is important to note that in CCSynergy, the cell lines are represented as vectors of the same length (100) after reducing the dimension of the original profiles either using auto-encoder-based techniques (CCSynergy I and IV) or by selecting the top 100 most informative signalling pathways (CCSynergy III and V) or transcription factors (CCSynergy II) (see Supplementary Tables S1–S6).
Finally, we represent each sample as a vector of length 356 by concatenating its drug pair and cell line vectors (Figure 1C). CCSynergy is a feedforward DNN and its architecture includes three hidden layers, which propagate information from the input vectors to the output unit, where the synergy score is predicted. For a given training set, 125 separate DNNs are trained, each corresponding to one of the 25 CC spaces and one of the five cell line representation methods. Note that we considered different hyper-parameter settings and found the optimal one by considering all 125 DNNs in a 5-fold CV scheme (see Supplementary Methods S3 and Table S7). We analysed the prediction results both separately for each individual CC space and integratively through decision-level aggregation of the 25 CC spaces. In regression tasks, we used simple averaging for integrating the predictions, while in classification settings, we applied three different integration methods, namely Majority Voting (MV), Spectral-Meta Learner (SML) [52] and Restricted Boltzmann Machine (RBM) [53].
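The input construction described above can be illustrated with a minimal sketch. It is not the CCSynergy implementation: the function names are hypothetical, the placeholder vectors stand in for real CC signatures and cell line profiles, and the parameter count assumes fully connected layers with bias terms and a single output unit. The hidden layer sizes are those given in the Figure 1C caption.

```python
# Dimensions stated in the text: 128 per CC drug signature, 100 per cell line.
DRUG_DIM = 128
CELL_DIM = 100
HIDDEN_LAYERS = (2000, 1000, 500)  # hidden layer sizes from the Figure 1C caption


def build_inputs(drug_a, drug_b, cell):
    """Return both orderings (AB and BA) of one drug pair + cell line triplet."""
    assert len(drug_a) == DRUG_DIM and len(drug_b) == DRUG_DIM
    assert len(cell) == CELL_DIM
    ab = list(drug_a) + list(drug_b) + list(cell)
    ba = list(drug_b) + list(drug_a) + list(cell)
    return ab, ba


def count_parameters(input_dim, hidden=HIDDEN_LAYERS, output_dim=1):
    """Weights plus biases of a dense feedforward network (biases are an assumption)."""
    sizes = (input_dim, *hidden, output_dim)
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Hypothetical placeholder vectors; real ones would come from CC and the cell line profiles.
drug_a = [0.1] * DRUG_DIM
drug_b = [0.2] * DRUG_DIM
cell = [0.3] * CELL_DIM

ab, ba = build_inputs(drug_a, drug_b, cell)
n_params = count_parameters(len(ab))
print(len(ab), n_params)
```

Both orderings share the same synergy label, so presenting AB and BA together is one simple way to make a standard feedforward network approximately order agnostic.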
We evaluated the performance of CCSynergy on two separate datasets namely the Merck dataset [13] (Supplementary Methods S4 and Table S8) and the one recently published by the Sanger Institute [43] (Supplementary Methods S5 and Table S9), using two different 5-fold CV schemes (Figure 2). In both CV types, we ensured that any given drug pair in the testing set does not appear in the training set (held-out drug combinations). Furthermore, in the CV2 scheme, we guaranteed that the cell lines in the training set originate from tissues that do not overlap with that of the testing set (held-out tissues) (See Supplementary Methods S6 for further details). Performance evaluation was done using different metrics both within regression and classification settings (Supplementary Methods S7).
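The two CV schemes can be sketched as follows. This is an illustration under stated assumptions, not the procedure from Supplementary Methods S6: the sample dictionaries, field names and the round-robin fold assignment are hypothetical, and the toy drugs, cell lines and tissues are invented for the example.

```python
import random
from itertools import combinations


def assign_folds(drug_pairs, n_folds=5, seed=0):
    """Map each unique (unordered) drug pair to one of n_folds folds."""
    unique_pairs = sorted({tuple(sorted(p)) for p in drug_pairs})
    random.Random(seed).shuffle(unique_pairs)
    return {pair: i % n_folds for i, pair in enumerate(unique_pairs)}


def cv1_split(samples, fold_of, test_fold):
    """CV1: test on one fold of drug pairs, train on the remaining folds."""
    train, test = [], []
    for s in samples:
        key = tuple(sorted((s["drug_a"], s["drug_b"])))
        (test if fold_of[key] == test_fold else train).append(s)
    return train, test


def cv2_split(samples, fold_of, test_fold, test_tissue):
    """CV2: additionally hold out one tissue, so its cell lines never enter training."""
    train, test = [], []
    for s in samples:
        key = tuple(sorted((s["drug_a"], s["drug_b"])))
        in_fold = fold_of[key] == test_fold
        in_tissue = s["tissue"] == test_tissue
        if in_fold and in_tissue:
            test.append(s)
        elif not in_fold and not in_tissue:
            train.append(s)
        # samples sharing only the fold or only the tissue are left out of this cycle
    return train, test


# Toy dataset: 6 hypothetical drugs (15 pairs) screened on 4 hypothetical cell lines.
drugs = [f"D{i}" for i in range(6)]
cells = {"C1": "lung", "C2": "lung", "C3": "breast", "C4": "colon"}
samples = [
    {"drug_a": a, "drug_b": b, "cell": c, "tissue": t}
    for a, b in combinations(drugs, 2)
    for c, t in cells.items()
]
fold_of = assign_folds((s["drug_a"], s["drug_b"]) for s in samples)
train1, test1 = cv1_split(samples, fold_of, test_fold=0)
train2, test2 = cv2_split(samples, fold_of, test_fold=0, test_tissue="lung")
```

Iterating `test_fold` over the five folds gives the CV1 cycles; iterating additionally over the tissues gives the CV2 cycles.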
We compared CCSynergy with state-of-the-art methods, such as DeepSynergy [17], TranSynergy [18] and MatchMaker [19] on the Merck dataset. Similar to CCSynergy, DeepSynergy and MatchMaker both employ feedforward neural networks. However, MatchMaker is based on an intermediate fusion strategy for combining drug and cell line features that necessitates three neural subnetworks in its architecture, whereas CCSynergy and DeepSynergy both use early fusion strategy that requires a standard feedforward DNN. On the other hand, TranSynergy uses a transformer boosted deep-learning model, which contains an input dimension-reduction component that is a single-layer neural network. It is important to note that dimension reduction in CCSynergy is achieved mainly via prior knowledge about the underlying signalling pathways or transcription factors (for cell line features), and via the network embedding technique that has been operated on the CC drug similarity networks [54]. In contrast, in DeepSynergy and MatchMaker, a separate dimension-reduction technique is not employed, which requires these methods to use larger neural networks. Furthermore, while CCSynergy aims to systematically integrate various bioactivity characteristics of drugs and employs diverse cell line representations, the other methods are mainly focused on a single property of drugs and mostly use one cell line representation strategy (see Supplementary Methods S8 for further details).
Results
CCSynergy outperforms state-of-the-art methods on the Merck dataset
We first aimed to evaluate the performance of CCSynergy as compared to its competitors namely DeepSynergy, TranSynergy and MatchMaker. Therefore, we applied the CV1 scheme on the Merck dataset and trained the 125 distinct DNNs within the CCSynergy framework. In Figure 3A, we show Pearson correlation coefficient (PCC) between the predicted and real values (averaged among the 5-folds) for the five CCSynergy methods across the 25 CC spaces. Several important patterns are relevant: (i) CCSynergy I is clearly outperformed by the other four methods, which implies that gene expression profiles on their own are not strong enough for representing the cell. (ii) In CCSynergy II–V methods, all 25 CC signatures are highly informative (PCC > 0.7), and their relative ranking remains almost the same. For example, E3 always yields the highest PCC, whereas C2 always stays the lowest. (iii) CCSynergy II is outcompeted by the other three, but still remains quite close to them. It is remarkable that TF activity on its own could get so close to the signalling pathway-based profiles. (iv) Expression profiles when combined with causal reasoning (CCSynergy III) can yield the same PCC as CRISPR-based essentiality profiling (CCSynergy IV and V). (v) Integrating the 25 CC spaces by simple averaging always leads to a higher PCC. Figure 3B reveals that all CCSynergy methods (except the first one), when integrating the 25 CC spaces, surpass all three competitors: DeepSynergy, TranSynergy and MatchMaker. CCSynergy II (0.78) is very close to DeepSynergy (0.77) and MatchMaker (0.79), but CCSynergy III, IV and V yield significantly higher PCC (above 0.81), whereas TranSynergy (0.69) and CCSynergy I (0.62) clearly lag behind. It is of note that CCSynergy methods cannot significantly outcompete DeepSynergy or MatchMaker when using each of the 25 CC spaces separately (Supplementary Figure S1) highlighting the fact that it is the integration of the 25 CC spaces, which empowers CCSynergy.
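Decision-level integration by simple averaging, followed by the PCC evaluation, can be sketched in a few lines. The synergy values and the three CC space labels used below are hypothetical placeholders, not results from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def average_predictions(per_space_predictions):
    """Mean prediction per sample across the per-CC-space models."""
    n_models = len(per_space_predictions)
    return [sum(column) / n_models for column in zip(*per_space_predictions)]


# Hypothetical observed synergy scores and predictions from three CC spaces.
observed = [10.0, -5.0, 3.0, 20.0]
per_space = {
    "A1": [8.0, -2.0, 5.0, 15.0],
    "B4": [12.0, -8.0, 0.0, 18.0],
    "E3": [9.0, -4.0, 4.0, 22.0],
}
integrated = average_predictions(list(per_space.values()))
pcc = pearson(observed, integrated)
```

In the full framework the same averaging would run over all 25 CC spaces for a given cell line representation.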
Figure 3.
CCSynergy outperforms state-of-the-art drug synergy prediction methods on the Merck dataset. Five versions of CCSynergy method, which differ in their choice of cell line representation methods (Figure 1B and Supplementary Methods S2), are colour coded according to the legend (the uppermost box) and are compared against the existing methods such as DeepSynergy [17] (black), TranSynergy [18] (grey) and MatchMaker (dark khaki) [19]. Vertical axes in all panels indicate the PCC between the real and predicted drug synergy values. In panels A–G, the PCC scores were calculated within the CV1 scheme, whereas in panels H and I, they were measured within the CV2 scheme. In panel A, the PCC scores are shown across the 25 CC spaces and also when integrated using simple averaging. In contrast, in panels B–I only the integrated PCC scores are shown. Panel A only compares the five CCSynergy versions, while in other panels, DeepSynergy, TranSynergy and MatchMaker are also included. For clarity purposes, in panels C and E, only CCSynergy III is shown, while the other four versions are illustrated in Supplementary Figures S2 and S4, respectively. The circles indicate average PCC across the 5-folds, whereas the error bars in panels B, G and H show the corresponding SDs. In panel C, the PCC scores are calculated for each cell line separately (horizontal axis), and box plots of panel D show their distribution (among the cell lines). Similarly, in panel E, the PCC scores are shown per drug (horizontal axis), and box plots of panel F show their distribution (among the drugs). Moreover, in panel I, the average PCC per tissue type (within CV2 scheme) is illustrated. It is important to mention that in panel G, the circles indicate the average PCC, when the entire dataset is used (N = 14 280), while squares show the average PCC, when a reduced subset of the data is used (N = 6880). 
Note that all the analyses in this figure are based on the Merck drug synergy dataset (See Supplementary Methods S4).
We then calculated the PCC scores per cell line to check the consistency of CCSynergy performance across different cellular contexts. Figure 3C indicates that in 16 out of 28 cell lines (57.1%), CCSynergy III outcompetes all three competitors, and in 27 cell lines (96.4%) it outcompetes at least two of them, which is similarly the case for CCSynergy IV and V but not for I and II (Supplementary Figure S2). The distribution of PCC scores across cell lines further confirms the superiority of the CCSynergy III, IV and V when integrating the 25 CC spaces (Figure 3D), but not when using each of them separately (Supplementary Figure S3). We also checked the consistency of CCSynergy performance across different drugs. Figure 3E shows that in 19 out of 36 drugs (53%), the per-drug PCC score of CCSynergy III is above that of all three competitors, and in 29 drugs (81%) it outcompetes at least two of them, which is similarly the case for CCSynergy IV and V but not for I and II (Supplementary Figure S4). The distribution of PCC scores across drugs further confirms the superiority of the CCSynergy III, IV and V when integrating the 25 CC spaces (Figure 3F), but not when using each of them separately (Supplementary Figure S5).
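The per-cell-line and per-drug comparisons above reduce to counting the groups in which one method beats a given number of competitors. A minimal sketch follows; the PCC values and the method labels X, Y and Z are hypothetical, and only the final percentage check uses counts reported in the text.

```python
def count_wins(method_scores, competitor_scores, min_beaten):
    """Count groups (cell lines or drugs) where the method beats at least
    `min_beaten` of the competing methods."""
    wins = 0
    for group, score in method_scores.items():
        beaten = sum(score > scores[group] for scores in competitor_scores.values())
        if beaten >= min_beaten:
            wins += 1
    return wins


# Hypothetical per-cell-line PCC values for one method and three competitors.
method = {"CL1": 0.80, "CL2": 0.70, "CL3": 0.60, "CL4": 0.75}
competitors = {
    "X": {"CL1": 0.75, "CL2": 0.72, "CL3": 0.65, "CL4": 0.70},
    "Y": {"CL1": 0.78, "CL2": 0.68, "CL3": 0.66, "CL4": 0.74},
    "Z": {"CL1": 0.70, "CL2": 0.69, "CL3": 0.50, "CL4": 0.76},
}
beats_all = count_wins(method, competitors, min_beaten=3)
beats_two = count_wins(method, competitors, min_beaten=2)

# Percentages quoted in the text, recomputed from the reported counts.
reported = {
    "cell lines, all three": round(100 * 16 / 28, 1),
    "cell lines, at least two": round(100 * 27 / 28, 1),
    "drugs, all three": round(100 * 19 / 36),
    "drugs, at least two": round(100 * 29 / 36),
}
```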
Next, we aimed to compare the robustness of these methods to data loss. Thus, we removed six drugs and eight cell lines from the original dataset, resulting in a sample of size 6880, which is 48.2% of the original one. We applied the CV1 scheme on the reduced data and calculated the PCC scores for each method. We noted a pronounced reduction of the average PCC scores for DeepSynergy (ΔPCC = –0.061), TranSynergy (ΔPCC = –0.054), MatchMaker (ΔPCC = –0.043) and CCSynergy I (ΔPCC = –0.049), while the reductions for CCSynergy II (ΔPCC = –0.027), III (ΔPCC = –0.018), IV (ΔPCC = –0.019) and V (ΔPCC = –0.022) were significantly less noticeable (Figure 3G). This is especially important and attests to the remarkable robustness and hence more reliable predictions that the integrated CCSynergy framework provides.
Finally, we employed the CV2 scheme to examine the generalizability of these methods on new cellular contexts. Figure 3H shows the resulting PCC scores for the five CCSynergy methods as compared to the competing ones. The following patterns are germane: (i) the average PCC scores in CV2 substantially decreased in all methods as compared to the CV1 highlighting the difficulty of drug synergy prediction for novel cellular contexts. (ii) The PCC score for the CCSynergy II dropped down to the same level as that of the CCSynergy I (0.48) implying that the context generalizability of the TF-based cell representation is as low as the simple gene expression based one. (iii) CCSynergy V (0.57) distinguished itself from the CCSynergy III (0.54) and IV (0.54), which were indistinguishable in the CV1 scheme and (iv) CCSynergy V is the only method that significantly outperforms all three competitors: DeepSynergy (0.54), MatchMaker (0.52) and TranSynergy (0.43). Furthermore, calculating the PCC scores in the CV2 scheme per tissue type confirms the superiority of CCSynergy V in all five tissues (Figure 3I). Again, we observed that CCSynergy V has gained its superior performance by integrating the 25 CC spaces as it gets outperformed by DeepSynergy when using each CC signature separately (Supplementary Figure S6), which further highlights the importance of the integrative nature of the CCSynergy framework. Additionally, we observed that integrating some combinations of the five CCSynergy variants can further increase the PCC score in the CV2 scheme, but not noticeably in the CV1 scheme (Supplementary Figure S7).
CCSynergy performs well on the Sanger drug synergy dataset
We then examined the performance of CCSynergy on a new dataset in which the drug synergy is measured differently from the Merck dataset. Thus, we examined the large-scale drug combination screen recently performed at the Sanger Institute [43], which has reported drug synergy in a binary format, enabling us to evaluate CCSynergy in a classification setting. We limited this analysis to CCSynergy III and V, which were the top-performing ones, respectively, in the CV1 and CV2 schemes on the Merck dataset. Under the CV1 scheme, the corresponding 2 × 25 DNNs were trained, which output the synergy probability for each testing triplet. This enabled us to calculate the area under the ROC curve (AUC) for these two methods across the 25 CC spaces. Figure 4A shows that (i) all CC signatures are almost equally informative (AUC ranging between 0.79 and 0.83 in CCSynergy III and between 0.80 and 0.84 in CCSynergy V), (ii) CCSynergy V yields slightly higher AUC than CCSynergy III across the majority of the CC spaces and (iii) integrating the 25 CC spaces (by simple averaging) produces the highest AUC (0.84 in CCSynergy III and 0.86 in CCSynergy V).
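For reference, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen synergistic triplet receives a higher predicted probability than a randomly chosen non-synergistic one. The sketch below uses a small hypothetical example; the quadratic pairwise loop is chosen for clarity and would be replaced by a sort-based routine on large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels (1 = synergistic) and predicted synergy probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 positive-negative pairs are ordered correctly
```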
Figure 4.
CCSynergy performs well on the Sanger drug synergy dataset. Panels (A)–(E) show the results obtained under the CV1 scheme, while panels (F)–(J) show their CV2-based equivalent. (A and F) The average AUC values across the 25 CC spaces plus the integrated one (using simple averaging) are shown as red (CCSynergy III) or blue (CCSynergy V) circles. The curves in panels (B), (C), (G) and (H) indicate F1-score (black), precision (blue) and recall (red) as a function of the synergy probability threshold when using CCSynergy III (CV1: panel (B) and CV2: panel (G)) or CCSynergy V (CV1: panel (C) and CV2: panel (H)). Note that in these four panels, the vertical orange and cyan lines show, respectively, the selected threshold and the threshold maximizing the F1-score. Moreover, the horizontal grey lines show, respectively, the maximum and half of the maximum F1-score. In panels (D) (CV1) and (I) (CV2), circles and squares indicate, respectively, the average precision and recall obtained using CCSynergy III (red) or V (blue) after integrating the 25 CC spaces based on the three integration methods mentioned in the horizontal axis, namely: MV, SML [52] and RBM [53]. Similarly, in panels (E) (CV1) and (J) (CV2), the vertical axes indicate the precision fold increase obtained when using CCSynergy III (red) or V (blue) under operation of the three different integration methods. The error bars in all of these panels indicate the SD across the 5-folds. Note that all the analyses in this figure are based on the Sanger drug synergy dataset (See Supplementary Methods S5).
Next, we aimed to binarize the outputted synergy probabilities by determining an optimal probability threshold. To this end, we measured F1-score, precision and recall as a function of the threshold for both methods (Figure 4B and C). Whereas the recall evidently decreases monotonically with increasing threshold, we observed that precision increases up to a threshold of around 0.6, but then starts to fluctuate. As a common practice in the field, the threshold is chosen so as to maximize the F1-score, which is the harmonic mean of precision and recall. However, precision is notably much more important than recall for the ultimate goal that drug synergy prediction pursues. Therefore, instead of maximizing F1-score, we selected the threshold so as to maximize precision subject to the constraint that the F1-score at the selected threshold is at least half of Max (F1), the half-maximum level marked in Figure 4B and C. We thus ended up with a threshold of 0.55 for both methods.
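The constrained threshold selection can be sketched as below. This is a sketch, not the exact selection procedure: the required fraction of the maximum F1-score is exposed as the hypothetical parameter `min_f1_fraction`, whose default of 0.5 is an assumption that mirrors the half-maximum line drawn in Figure 4, and the tie-breaking rule, labels and probabilities are likewise illustrative.

```python
def precision_recall_f1(labels, probs, threshold):
    """Precision, recall and F1-score when calling p >= threshold synergistic."""
    calls = [p >= threshold for p in probs]
    tp = sum(1 for y, c in zip(labels, calls) if c and y == 1)
    fp = sum(1 for y, c in zip(labels, calls) if c and y == 0)
    fn = sum(1 for y, c in zip(labels, calls) if not c and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def select_threshold(labels, probs, candidates, min_f1_fraction=0.5):
    """Maximize precision among thresholds whose F1 >= min_f1_fraction * max F1."""
    stats = {t: precision_recall_f1(labels, probs, t) for t in candidates}
    max_f1 = max(f1 for _, _, f1 in stats.values())
    allowed = [t for t in candidates if stats[t][2] >= min_f1_fraction * max_f1]
    # ties in precision are broken in favour of the higher F1-score (an assumption)
    return max(allowed, key=lambda t: (stats[t][0], stats[t][2]))


# Hypothetical labels (1 = synergistic) and predicted synergy probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.65, 0.5, 0.45, 0.3, 0.2, 0.15, 0.1]
candidates = [0.2, 0.4, 0.6, 0.75, 0.95]

precision_first = select_threshold(labels, probs, candidates)
f1_optimal = select_threshold(labels, probs, candidates, min_f1_fraction=1.0)
```

In this toy example the F1-optimal threshold is 0.4, whereas the precision-first rule moves the cut-off up to 0.75, trading recall for precision.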
Afterwards, we integrated the binary output of the 25 CC spaces using three different approaches: MV, SML [52] and RBM [53]. We noted that MV-based integration leads to substantially higher precision and lower recall than SML and RBM methods (Figure 4D). Moreover, Figure 4E indicates >7-fold increase of precision (relative to a random classifier) in both CCSynergy methods when using MV and >5-fold increase when using SML or RBM. We detected similar patterns when measuring these metrics per tissue type (Supplementary Figure S8). Importantly, Supplementary Figure S9 shows that whereas MV-based integration of the 25 CC spaces always substantially exceeds the single CC based ones in terms of precision, the SML or RBM-based integration always provides superior recall. Furthermore, we observed that all three integration methods always lead to higher convergence between CCSynergy III and V (measured as the Jaccard similarity index; Supplementary Figure S10) than the single CC based ones. Moreover, intersection of the set of synergistic triplets predicted by CCSynergy III and V culminates in evidently lower recall than either method alone and interestingly higher precision both in the single-CC methods and in the integrated ones (Supplementary Figure S9).
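Of the three integration approaches, MV is simple enough to sketch directly, together with the precision fold increase. The sketch assumes a strict majority rule and takes the precision of a random classifier to be the fraction of synergistic triplets in the evaluated set; both are assumptions for illustration, and the calls below are hypothetical.

```python
def majority_vote(binary_calls):
    """binary_calls: one list of 0/1 calls per model; returns one call per sample.
    A sample is called synergistic only if more than half of the models agree."""
    n_models = len(binary_calls)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*binary_calls)]


def precision_fold_increase(labels, calls):
    """Precision divided by the positive rate, i.e. the expected precision
    of a classifier that calls triplets synergistic at random."""
    called = [y for y, c in zip(labels, calls) if c == 1]
    if not called:
        return 0.0
    precision = sum(called) / len(called)
    base_rate = sum(labels) / len(labels)
    return precision / base_rate


# Hypothetical ground truth and binary calls from three single-space models.
labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
calls_per_model = [
    [1, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
]
mv_calls = majority_vote(calls_per_model)
fold = precision_fold_increase(labels, mv_calls)
```

Here each single model produces at least one false positive, while the majority vote removes them, which is the behaviour behind the higher precision and lower recall of MV.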
Finally, we performed similar analyses within the CV2 scheme, and first noted that compared to the CV1, the AUC values across all CC spaces have expectedly decreased in both methods, but predictive power to some extent is still preserved (Figure 4F). The average AUC values across the CC spaces vary between 0.60 and 0.68 in CCSynergy V, and similarly between 0.59 and 0.66 in CCSynergy III. We also observed that in both methods, the chosen threshold of 0.55 fulfils the expectations, albeit with a negligible deviation (Figure 4G and H). After discretizing the results, we observed that in all integrative approaches, both precision and recall have decreased in CV2 as compared to the CV1 (Figure 4I). Nevertheless, we can still detect significant enrichment of precision in all integrative scenarios (Figure 4J). SML and RBM-based integration yields higher than 2-fold precision increase in both methods, and the MV-based one produces even higher enrichment (average 3.1 in CCSynergy III and 4.5 in V). Moreover, the detailed patterns described in CV1 regarding the superiority of the integrative approaches over the single CC based ones (Supplementary Figures S8–S10) are also similarly detected here (Supplementary Figures S11–S13). Thus, CCSynergy remains helpful, even when applied for predicting drug synergy in previously unseen cellular contexts, and it performs well on an alternative drug synergy dataset, where it is evaluated in a classification setting.
CCSynergy is of potential to generate experimentally validated predictions
Our next goal was to evaluate a higher-level generalizability of CCSynergy within a cross-dataset learning scheme, which is challenging, especially if drug synergy is measured differently across datasets. This is indeed the case for the Merck and the Sanger datasets, which we used as the training and the testing sets, respectively. This can be regarded as a large-scale experimental validation of the CCSynergy predictions. We ensured that no triplet is shared between the two datasets by considering only the cell lines that were not seen in the Merck data. We distinguished between three scenarios in the Sanger data (Figure 5A) and categorized a given drug combination by checking whether: (i) both drugs are seen, (ii) only one of the drugs is seen and (iii) neither drug is seen in the Merck dataset. We then trained 2 × 25 DNNs, corresponding to two methods (CCSynergy III and V) and 25 CC spaces, using the entire Merck data as the training set in a classification setting.
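The partition of the test data into the three scenarios can be sketched as follows. The function names, drug names and cell line names are hypothetical; only the rules (drop cell lines seen in training, then count how many drugs of the pair were seen) follow the description above.

```python
def classify_scenario(drug_a, drug_b, training_drugs):
    """Scenario I: both drugs seen in training; II: exactly one; III: neither."""
    n_seen = int(drug_a in training_drugs) + int(drug_b in training_drugs)
    return {2: "I", 1: "II", 0: "III"}[n_seen]


def cross_dataset_test_set(test_triplets, training_drugs, training_cells):
    """Keep only triplets on cell lines unseen in training, grouped by scenario."""
    grouped = {"I": [], "II": [], "III": []}
    for drug_a, drug_b, cell in test_triplets:
        if cell in training_cells:
            continue  # guarantees that no triplet is shared between the datasets
        scenario = classify_scenario(drug_a, drug_b, training_drugs)
        grouped[scenario].append((drug_a, drug_b, cell))
    return grouped


# Hypothetical training vocabulary and test triplets.
training_drugs = {"drugA", "drugB", "drugC"}
training_cells = {"cell1", "cell2"}
test_triplets = [
    ("drugA", "drugB", "cell9"),  # both drugs seen: scenario I
    ("drugA", "drugX", "cell9"),  # one drug seen: scenario II
    ("drugX", "drugY", "cell8"),  # neither drug seen: scenario III
    ("drugA", "drugB", "cell1"),  # cell line seen in training: discarded
]
grouped = cross_dataset_test_set(test_triplets, training_drugs, training_cells)
```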
Figure 5.
The potential of CCSynergy for generating experimentally validated predictions. In this analysis, we used the Merck dataset as the training and the Sanger dataset as the testing set. Based on their overlap with the Merck data, we have considered three scenarios in the Sanger data and analysed them separately, which are colour coded and described in panel (A). The upper (B, D and F) and lower (C, E and G) panels were obtained using CCSynergy V and III, respectively. The vertical axes indicate the AUC (panels B and C), precision (panels D and E) and precision fold increase (panels F and G). Note that in this analysis, the 25 CC spaces were integrated using simple averaging in panels (B) and (C), while in the other panels three integrative approaches (horizontal axes in panels D–G), namely: MV, SML [52] and RBM [53] were considered.
After integrating the synergy probabilities of the 25 CC spaces by simple averaging, we calculated the AUC values separately for the above three scenarios. Figure 5B and C show that, in line with our expectations, the AUC values in scenario I were reasonably high in both methods (0.70 in CCSynergy V and 0.72 in III). In contrast, in scenario III, they were quite close to the baseline of 0.50 for both methods (~0.55), implying that the model is not much better than a random classifier in cases where neither drug is seen in the training set. However, for scenario II, we still detected some predictive power as the AUC values were considerably higher than 0.5 in both methods (0.63).
We then binarized the results and integrated the 25 CC spaces using the three integration approaches (MV, SML and RBM). We detected consistent patterns of precision (Figure 5D and E) and its enrichment (Figure 5F and G) across the three scenarios, regardless of the integration approaches and the CCSynergy methods used. In both scenarios I and II, we observed enrichment of precision in all cases, and expectedly the enrichment in the first scenario was always higher than 2-fold and stayed consistently above that of the second one. In the MV-based integration, we generally observed higher enrichment (e.g. >3-fold in both scenarios for CCSynergy V) as compared to the SML and RBM, which corroborated our previous observations. However, we again did not see a noticeable departure from the baseline (1-fold) and hence no enrichment of precision for the third scenario (Figure 5F and G). Thus, we conclude that CCSynergy retains a considerable predictive power on unseen cellular contexts in a cross-dataset learning scheme and hence enhances the potential for generating experimentally validated predictions, provided that at least one of the drugs is seen in the training set.
CCSynergy generates a compendium of potentially synergistic drug combinations
We showed that CCSynergy has enhanced potential for generating experimentally validated predictions and so it can facilitate exploration of the untested drug combination space. This motivated us to explore this space exhaustively. However, the lack of precision enrichment in the third scenario (Figure 5) cautioned us that CCSynergy could be helpful only in a restricted subspace of drug combinations in which at least one of the drugs is seen in the training set. We thus adjusted our exploration strategy accordingly by focusing on a subspace encompassing the pairing of every single drug (62 drugs: anchor drugs) used in our training data (Sanger dataset) with another pool of drugs that we obtained from the GDSC database [55] (264 drugs: library drugs). By considering 543 well-characterized cell lines, we ended up with a subspace including 7 786 146 unique triplets that were not tested in the Sanger screen (Figure 6A). We then applied CCSynergy III and V across the 25 CC signatures using the entire Sanger data as the training set in order to predict drug synergy for every triplet in this subspace. After binarizing the outputs using the probability threshold of 0.55, two binary matrices with 7 786 146 rows and 25 columns were generated. Furthermore, integration of the 25 single-CC results using MV, SML and RBM methods generated three additional binary columns that we added to the final matrices (Supplementary Tables S10 and S11). We distinguished between three types of triplets in this space (Figure 6B). Note that although the first two types here are equivalent to the scenarios I and II in the previous analysis, the type III here is not, but rather is an easier to predict version of the type II.
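Enumerating such an untested subspace amounts to walking the anchor × library × cell line product and skipping what has already been screened. The sketch below is an assumption about one way to do this, with hypothetical identifiers and a hypothetical key format for the tested set; it does not restate how the reported triplet count was derived. For orientation, the full product of 62 × 264 × 543 contains 8 887 824 triplets, which is an upper bound for the reported 7 786 146 untested ones.

```python
from itertools import product


def untested_triplets(anchor_drugs, library_drugs, cell_lines, tested):
    """Yield unique (anchor, library, cell line) triplets that are absent from `tested`.

    `tested` holds (frozenset({drug_a, drug_b}), cell_line) keys, so that the
    order of the two drugs is ignored.
    """
    emitted = set()
    for anchor, library, cell in product(anchor_drugs, library_drugs, cell_lines):
        if anchor == library:
            continue  # a drug is not paired with itself
        key = (frozenset((anchor, library)), cell)
        if key in tested or key in emitted:
            continue
        emitted.add(key)
        yield anchor, library, cell


# Hypothetical toy inputs.
anchors = ["A1", "A2"]
library = ["L1", "L2", "L3"]
cells = ["C1", "C2"]
tested = {(frozenset(("A1", "L1")), "C1")}

candidates = list(untested_triplets(anchors, library, cells, tested))
full_space_size = 62 * 264 * 543  # size of the complete anchor x library x cell line product
```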
Figure 6.
CCSynergy generates a compendium of potentially synergistic drug combinations. (A) A subspace of the untested drug combination space was considered for exploration, which was constructed by pairing every single drug that was used in the Sanger dataset (62 anchor drugs) with another pool of drugs that were obtained from the GDSC database [55] (264 library drugs) in 543 well-characterized cancer cell lines. This resulted in a subspace including 7 786 146 unique triplets that were not tested in the Sanger drug combination screen. After training CCSynergy III and V based on the 25 CC signature levels using the entire Sanger dataset as the training set, two binary matrices with 7 786 146 rows and 25 columns were generated. Furthermore, we added three additional columns to these matrices based on the results obtained by MV, SML and RBM integration methods. (B) We divided the triplets in this subspace into three types based on their overlap with the Sanger dataset. (C) The bars indicate the number of synergistic triplets across the 25 CC spaces (horizontal axis) identified by CCSynergy III (red) and V (blue). (D) Each circle indicates the number of synergistic triplets (in logarithmic scale) identified as synergistic in at least n CC spaces (horizontal axis) based on CCSynergy III (red) and V (blue). (E) The Jaccard similarity between the set of synergistic triplets identified by CCSynergy III and the one identified by CCSynergy V is shown as a function of n (minimum number of CC spaces on which a given triplet is required to be synergistic). (F) The sets of synergistic triplets after integrating the 25 CC spaces based on MV, SML and RBM using CCSynergy V are identified and the Venn diagram shows the overlap between them. The density plots show the distribution of the number of synergistic triplets identified (using CCSynergy V) (G) per cell line and (H) per drug pair, separately for each of the three integrative approaches. Panels (I), (J) and (K) are the equivalent panels based on CCSynergy III.
We enumerated the synergistic triplets for both methods across the 25 CC spaces; their number varied between 76 468 (0.98%) and 201 443 (2.59%) in CCSynergy III and between 53 486 (0.68%) and 242 682 (3.11%) in CCSynergy V (Figure 6C). We noted that in both methods a considerable fraction of the triplets is predicted as synergistic in at least one CC space [1 214 794 (15.60%) in CCSynergy III and 1 104 751 (14.18%) in V], but this number declines exponentially as the minimum number (n) of required CC spaces increases (Figure 6D). For example, in CCSynergy V it goes down to 153 091 (1.96%) when n = 5 and to 843 (0.01%) when n = 25. The agreement between the two methods (CCSynergy III and V), measured as the Jaccard similarity of their synergistic triplet sets, is unsurprisingly lower in the unseen-cell line scenarios (I and II) than in the seen-cell line type (III) (Supplementary Figure S14). Moreover, the methods diverge further as the minimum number (n) of required CC spaces increases (Figure 6E).
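The two quantities used in this paragraph, the number of triplets supported by at least n CC spaces and the Jaccard similarity between the two methods at a given n, can be sketched as below. This is an illustrative sketch only: the function names and the small toy matrices are hypothetical, and the rows of both matrices are assumed to refer to the same triplets in the same order.

```python
import numpy as np


def count_at_least(calls, n):
    """Number of triplets called synergistic in at least n CC spaces."""
    support = np.asarray(calls).sum(axis=1)
    return int((support >= n).sum())


def jaccard_at_least(calls_a, calls_b, n):
    """Jaccard similarity of two methods' synergistic triplet sets at minimum support n."""
    set_a = np.asarray(calls_a).sum(axis=1) >= n
    set_b = np.asarray(calls_b).sum(axis=1) >= n
    union = np.logical_or(set_a, set_b).sum()
    if union == 0:
        return float("nan")  # both sets are empty, so the similarity is undefined
    return float(np.logical_and(set_a, set_b).sum() / union)


# Toy binary matrices: 4 triplets x 3 CC spaces for two hypothetical methods.
calls_iii = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]])
calls_v = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]])

counts_v = [count_at_least(calls_v, n) for n in (1, 2, 3)]
jaccards = [jaccard_at_least(calls_iii, calls_v, n) for n in (1, 2, 3)]
```

The toy values are chosen only to exercise the functions and do not reproduce the trends reported in Figure 6D and E.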
Next, we observed that the MV-based integration of the 25 CC spaces is quite stringent, as it identifies only 24 355 synergy cases in CCSynergy V (0.3%), which form a subset of those predicted using SML [523 022 (6.7%)] and also overlap strongly (98.5%) with those of the RBM [342 840 (4.4%)] (Figure 6F). We then checked how synergy is distributed across different cell lines and observed a power-law distribution both when using the integration approaches (Figure 6G) and when considering the CC spaces alone (Supplementary Figure S15). The implication is that there are a few cellular contexts that are generally more prone to synergy than the others. For example, in the MV-based integration, <10% of the cell lines (50 out of 543) account for the majority [12 484 (51.3%)] of the predicted synergies. We observed a similar power-law distribution of the number of cell lines providing synergy per drug pair, regardless of the integration methods (Figure 6H) or the single-CC spaces (Supplementary Figure S16) used. This reflects the existence of a few drug pairs that are generally synergistic independent of the cellular context. For example, the MV-based integration method predicts synergy for Gemcitabine and AZD7762 in 421 out of 543 cell lines (77.5%). Similarly, we found 36 drug combinations (out of 14 483) for which synergy is predicted in >100 cell lines. Synergy was found in at least one cell line for only 2153 drug combinations (15.3%), so for the majority of them (84.7%), synergy was never detected. Analysis of the CCSynergy III results also revealed very similar patterns (Figure 6I–K and Supplementary Figures S17 and S18).
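The concentration of predicted synergies in a few cell lines (e.g. 50 of 543 cell lines accounting for 51.3% of the MV-based predictions) can be quantified with a simple top-k share. The sketch below is one hypothetical way to compute it; the function name, the cell line labels and the toy data are assumptions made for illustration.

```python
import numpy as np


def top_k_share(cell_line_ids, synergy_calls, k):
    """Fraction of all predicted synergies contributed by the k most synergy-prone cell lines."""
    ids = np.asarray(cell_line_ids)
    calls = np.asarray(synergy_calls).astype(bool)
    # Count predicted synergies per cell line, using only the positive triplets.
    _, counts = np.unique(ids[calls], return_counts=True)
    if counts.size == 0:
        return float("nan")  # no predicted synergy at all
    top = np.sort(counts)[::-1][:k]
    return float(top.sum() / counts.sum())


# Toy data: one entry per triplet, with its cell line and its integrated binary call.
cell_lines = ["A", "A", "A", "B", "B", "C", "C", "D"]
calls = [1, 1, 1, 1, 0, 1, 0, 0]

share_top1 = top_k_share(cell_lines, calls, k=1)  # cell line A holds 3 of 5 synergies
share_top2 = top_k_share(cell_lines, calls, k=2)
```

The same function applies to drug pairs by passing a drug-pair identifier per triplet instead of the cell line.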
To validate (at least partially) our large-scale predictions, we checked the DrugComb database [56, 57], where the majority of the existing drug combination studies have been amalgamated. We identified 17 472 distinct triplets shared with DrugComb, which include all three triplet types (Figure 7A and B). The sample size is sufficiently large for a statistical analysis, even though it covers only 0.22% of the triplets in our database. We considered samples with a Loewe score above 9.2 (i.e. the top 10% among the overlapping set) as our reference of true positives. We observed considerable precision enrichment for all three triplet types under the RBM (Figure 7B) and SML (Supplementary Figure S19A and B) integration methods. In the MV-based integration scheme, we also observed precision enrichment for triplets of type I, but for types II and III, the sample of MV-based predicted synergies was not of sufficient size (Supplementary Figure S19C). These observations hold when either CCSynergy III or V is used, and their intersection leads to an even higher precision enrichment. For example, in the second scenario, the intersection results in 3.10-fold precision enrichment, while the methods alone yield enrichment of 1.35- and 1.87-fold, respectively (Figure 7B). Figure 7C lists the 29 triplets that both methods predict as synergistic under RBM integration. Nine of the 29 are among the top 10% (true positives), which is considerably more than the 2.9 expected by chance. Importantly, these true positive cases belong to skin and lung tissues, which were not used in our training set, and so the observed enrichment is not simply an artefact of the choice of tissues in our training set. Furthermore, the enrichment remains noticeable if we define synergy more moderately, for example based on the top 25% or 50% of triplets. Moreover, we observed considerable depletion of antagonistic triplets: we identified only one triplet belonging to the bottom 10%, while 2.9 would be expected by chance. Thus, in line with our previous observations, the overlap between our database and DrugComb provides additional statistical evidence attesting to the enhanced potential of CCSynergy to generate experimentally validated predictions.
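The precision fold enrichment used in this analysis is the ratio of observed true positives to the number expected by chance, where the chance expectation is the reference fraction (10% here) of the predicted positives. A minimal sketch of that arithmetic, using the counts quoted above for the 29 intersection triplets, is given below; the function name is hypothetical.

```python
def precision_fold_enrichment(observed_tp, n_predicted, base_rate=0.10):
    """Ratio of observed true positives to the number expected by chance.

    n_predicted is the number of predicted positives (TP + FP) and base_rate
    is the fraction of the reference set labelled as positive (top 10% here).
    """
    expected_tp = base_rate * n_predicted
    if expected_tp == 0:
        raise ValueError("no predicted positives, enrichment is undefined")
    return observed_tp / expected_tp


# Intersection of CCSynergy III and V in scenario II: 29 predicted triplets.
enrichment_top = precision_fold_enrichment(observed_tp=9, n_predicted=29)     # 9 / 2.9
# The same ratio applied to the bottom 10% (antagonistic end): 1 observed vs 2.9 expected.
depletion_bottom = precision_fold_enrichment(observed_tp=1, n_predicted=29)   # 1 / 2.9
```

Values above 1 indicate enrichment relative to chance and values below 1 indicate depletion.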
Figure 7.
Partial validation of the CCSynergy database using its overlap with the DrugComb database [56, 57]. We identified partial overlap between the triplets considered in the CCSynergy database and those in the DrugComb database. We ranked triplets in this subset in terms of their Loewe synergy score, and considered the top 10% as synergistic (i.e. those with Loewe score >9.2). (A) We partitioned the triplets the same as in Figure 6B. (B) Number of triplets in the overlapping set (N), number of synergistic triplets predicted by CCSynergy (TP + FP), number of triplets that are considered synergistic in both databases (i.e. truly synergistic cases: observed TP), the TP that is expected by chance [10% of (TP + FP)], and the precision fold increase, which is the ratio of observed to expected TP, are shown for each of the three scenarios separately and in total (the column names). These measurements have been calculated for CCSynergy V, CCSynergy III and their intersection (the row names on the right-hand side). Note that the results in this table were obtained using RBM-based integration of the 25 CC spaces. For the MV or SML-based versions, please see Supplementary Figure S19. (C) We have zoomed into the 29 triplets in scenario II identified as synergistic by both CCSynergy III and V (i.e. their intersection, which is highlighted by a blue rectangle in panel B). We have listed the drug names, cell lines, tissues, study name and Loewe synergy values for each of the 29 triplets, and they are classified and colour coded according to their relative ranking in terms of synergy values in the DrugComb database. For each class, the expected and observed number of true positive cases (triplets identified as synergistic in both DrugComb and CCSynergy databases) along with their corresponding fold changes are specified.
Discussion
We have introduced CCSynergy, a deep-learning framework that we have established to unleash the potential of the CC extended bioactivity profiles [41] for drug synergy prediction. We have shown that the 25 CC spaces provide highly potent representations of drug features and that, by integrating them, CCSynergy surpasses the state-of-the-art deep-learning methods in the field. Moreover, we analysed how to effectively embrace the context specificity of drug synergy in our predictive models. First, we have demonstrated that downstream gene expression profiles on their own are not sufficiently informative, but can be substantially upgraded under a causal reasoning framework inferring upstream signalling pathway activities (CCSynergy III). Second, our analysis revealed that representing cell lines based on genome-wide CRISPR/Cas9 screens ensures consistent superiority of the model in terms of context generalizability (CCSynergy V).
Moreover, the fact that CCSynergy performs well on an alternative dataset (Sanger dataset [43]), where drug synergy was measured differently from the dataset used in the hyper-parameter optimization step (Merck data [13]), confirms its potential for wider applicability. More importantly, we have also demonstrated that, compared to its competitors, CCSynergy is remarkably more robust to data loss, which ensures higher reliability and generalizability. Furthermore, we observed considerable precision enrichment when applying CCSynergy in a cross-data learning scheme operated on cell lines that were not seen before. This observation is all the more remarkable if we consider that drug synergy is notorious for its poor reproducibility across different experimental studies. For example, despite the efforts made in the DrugComb database [56, 57] for standardization and harmonization, the distributions of Loewe scores in the Merck [13] and NCI-ALMANAC [58] datasets are still quite different and the scores are poorly correlated (PCC = 0.25) (Supplementary Figure S20). This might be due to the quality of the NCI-ALMANAC dataset, which, unlike the Merck and Sanger datasets, has not benefited from replicate experiments when measuring the synergy scores. However, thanks to the availability of two high-quality datasets (Merck and Sanger), we have managed to conduct a successful cross-data learning analysis and achieved noteworthy precision enrichments. This attests that CCSynergy has, to some extent, the potential to generate experimentally validated predictions and hence can guide future experimental screens by narrowing down the space of untested drug combinations to a more promising subspace enriched with true positive cases. This motivated us to cautiously explore this space, which ultimately culminated in a new drug synergy database of unparalleled scale that can be of great assistance for designing follow-up experimental screens.
Nevertheless, as quantified rigorously under the CV2 scheme, our results indicate that there is still ample room for improvement, as current methods are all suboptimal for predicting drug synergy in unseen cellular contexts. Importantly, substantially higher context generalizability is necessary for computational methods to ultimately exert significant clinical impact, especially toward fulfilling the ambitious goals of precision medicine, where the specificity of the cellular contexts plays a decisive role. Thus, the field needs to invest further in exploring innovative strategies for representing the cell. However, we doubt that deep-learning methods trained directly on drug combination data would be sufficient for this purpose. We strongly believe that insights from single-drug response screens would be the key complement that might also pave the way for a detailed mechanistic understanding of drug synergy.
Further directions for methodological improvements exist, both in terms of model architecture and especially with regard to the integration of the 25 CC spaces. For example, instead of CC signatures of type II, the similarity networks inferred from the CC signatures of type I could be employed, which would require using Convolutional Neural Networks instead of DNNs. Moreover, further strategies for fusion of drug and cell line features are available. Here, we have taken an early fusion strategy, but alternatively, an intermediate fusion approach can be considered, which offers architectural flexibility that might impact the context generalizability of the model, and hence warrants further investigation. Additionally, considering the 25 CC signatures simultaneously in the deep-learning framework instead of using them separately is also a valid option. That being said, in the current study, we have already applied three different decision-level approaches for integrating the CC spaces, which endow CCSynergy with further flexibility. For example, on one hand, the MV approach leads to a higher precision but lower recall, which can be helpful especially for exploring very large spaces. On the other hand, in scenarios where recall is more important, for example when prioritizing a drug combination list of small or moderate size, applying the SML or RBM methods will be more appropriate.
In summary, we have unlocked the potential of CC bioactivity profiles by establishing CCSynergy, which is a flexibly integrative framework for context-aware prediction of drug synergy. We anticipate that CCSynergy will ignite further methodological developments in the field, and will help speed up exploration of the untested drug combination space.
Key points
CCSynergy is a deep-learning framework that predicts anti-cancer drug synergy by integrating the Chemical Checker extended drug bioactivity profiles, which span all levels of drug representation from chemistry, targets, networks, to cellular and clinical effects of drugs.
CCSynergy strives to effectively embrace the context specificity of drug synergy by incorporating alternative cell representation methods, which include both data-driven and knowledge-based approaches.
Potential drug combinations predicted by CCSynergy are enriched with experimentally validated synergies and hence CCSynergy can facilitate exploration of the untested drug combination space even in previously unseen cellular contexts.
CCSynergy generates a database of drug combination predictions, which is of unprecedented scale and can guide future experimental screens.
Supplementary Material
Acknowledgements
The authors would like to thank Julio Saez-Rodriguez and Rosa Hernansaiz-Ballesteros for sharing their CARNIVAL data. Furthermore, S-R.H. appreciates inspiring discussions with Mathew Garnett regarding the drug combination resource that has recently been published by his team. S-R.H. is also thankful to Mahya Mehrmohamadi for helpful discussions and Narjes Rohani for technical assistance. Computational resources for this project were provided by Texas Advanced Computing Center.
Author Biographies
Sayed-Rzgar Hosseini is an assistant professor at the Center for Computational Systems Medicine at School of Biomedical Informatics at UTHealth. His research interests include bioinformatics, cancer systems biology and systems pharmacology.
Xiaobo Zhou is a professor and director of the Center for Computational Systems Medicine at School of Biomedical Informatics at UTHealth. His research interests are bioinformatics, systems biology, imaging informatics and clinical informatics.
Contributor Information
Sayed-Rzgar Hosseini, School of Biomedical Informatics, University of Texas Health Science Center (UTHealth), Houston, TX, USA.
Xiaobo Zhou, School of Biomedical Informatics, University of Texas Health Science Center (UTHealth), Houston, TX, USA.
Funding
National Institutes of Health (grant numbers: NIH R01GM123037, U01AR069395 and R01CA241930) and National Science Foundation (grant number: NSF 2217515).
Data availability
CCSynergy codes and data are available at: https://github.com/RzgarHosseini/CCSynergy.
References
- 1. Yaffe MB. Why geneticists stole cancer research even though cancer is primarily a signaling disease. Sci Signal 2019;12:eaaw3483.
- 2. Hopkins AL. Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 2008;4:682–90.
- 3. Fitzgerald JB, Schoeberl B, Nielsen UB, et al. Systems biology and combination therapy in the quest for clinical efficacy. Nat Chem Biol 2006;2:458–66.
- 4. Holohan C, Van Schaeybroeck S, Longley DB, et al. Cancer drug resistance: an evolving paradigm. Nat Rev Cancer 2013;13:714–26.
- 5. Al-Lazikani B, Banerji U, Workman P. Combinatorial drug therapy for cancer in the post-genomic era. Nat Biotechnol 2012;30:679–92.
- 6. He L, Kulesskiy E, Saarela J, et al. Methods for high-throughput drug combination screening and synergy scoring. Methods Mol Biol Clifton NJ 2018;1711:351–98.
- 7. Nelander S, Wang W, Nilsson B, et al. Models from experiments: combinatorial drug perturbations of cancer cells. Mol Syst Biol 2008;4:216.
- 8. Yuan B, Shen C, Luna A, et al. CellBox: interpretable machine learning for perturbation biology with application to the design of cancer combination therapy. Cell Syst 2021;12:128–140.e4.
- 9. Huang L, Li F, Sheng J, et al. DrugComboRanker: drug combination discovery based on target network analysis. Bioinforma Oxf Engl 2014;30:i228–36.
- 10. Cheng F, Kovács IA, Barabási A-L. Network-based prediction of drug combinations. Nat Commun 2019;10:1197.
- 11. Flobak Å, Baudot A, Remy E, et al. Discovery of drug synergies in gastric cancer cells predicted by logical modeling. PLoS Comput Biol 2015;11:e1004426.
- 12. Eduati F, Jaaks P, Wappler J, et al. Patient-specific logic models of signaling pathways from screenings on cancer biopsies to prioritize personalized combination therapies. Mol Syst Biol 2020;16:e8664.
- 13. O’Neil J, Benita Y, Feldman I, et al. An unbiased oncology compound screen to identify novel combination strategies. Mol Cancer Ther 2016;15:1155–62.
- 14. Li J, Huo Y, Wu X, et al. Essentiality and transcriptome-enriched pathway scores predict drug-combination synergy. Biology 2020;9:E278.
- 15. Jeon M, Kim S, Park S, et al. In silico drug combination discovery for personalized cancer therapy. BMC Syst Biol 2018;12:16.
- 16. Sidorov P, Naulaerts S, Ariey-Bonnet J, et al. Predicting synergism of cancer drug combinations using NCI-ALMANAC data. Front Chem 2019;7:509.
- 17. Preuer K, Lewis RPI, Hochreiter S, et al. DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinforma Oxf Engl 2018;34:1538–46.
- 18. Liu Q, Xie L. TranSynergy: mechanism-driven interpretable deep neural network for the synergistic prediction and pathway deconvolution of drug combinations. PLoS Comput Biol 2021;17:e1008653.
- 19. Kuru HI, Tastan O, Cicek AE. MatchMaker: a deep learning framework for drug synergy prediction. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2334–44.
- 20. Zhang T, Zhang L, Payne PRO, et al. Synergistic drug combination prediction by integrating multiomics data in deep learning models. Methods Mol Biol Clifton NJ 2021;2194:223–38.
- 21. Kim Y, Zheng S, Tang J, et al. Anticancer drug synergy prediction in understudied tissues using transfer learning. J Am Med Inform Assoc JAMIA 2021;28:42–51.
- 22. Kuenzi BM, Park J, Fong SH, et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 2020;38:672–684.e6.
- 23. Tang J, Karhinen L, Xu T, et al. Target inhibition networks: predicting selective combinations of druggable targets to block cancer survival pathways. PLoS Comput Biol 2013;9:e1003226.
- 24. Yang M, Jaaks P, Dry J, et al. Stratification and prediction of drug synergy based on target functional similarity. NPJ Syst Biol Appl 2020;6:16.
- 25. Chen G, Tsoi A, Xu H, et al. Predict effective drug combination by deep belief network and ontology fingerprints. J Biomed Inform 2018;85:149–54.
- 26. Ji X, Tong W, Liu Z, et al. Five-feature model for developing the classifier for synergistic vs. antagonistic drug combinations built by XGBoost. Front Genet 2019;10:600.
- 27. Li H, Li T, Quang D, et al. Network propagation predicts drug synergy in cancers. Cancer Res 2018;78:5446–57.
- 28. Narayan RS, Molenaar P, Teng J, et al. A cancer drug atlas enables synergistic targeting of independent drug vulnerabilities. Nat Commun 2020;11:2935.
- 29. Subramanian A, Narayan R, Corsello SM, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 2017;171:1437–1452.e17.
- 30. Lee J-H, Kim DG, Bae TJ, et al. CDA: combinatorial drug discovery using transcriptional response modules. PLoS One 2012;7:e42573.
- 31. Zhao J, Zhang X-S, Zhang S. Predicting cooperative drug effects through the quantitative cellular profiling of response to individual drugs. CPT Pharmacometrics Syst Pharmacol 2014;3:e102.
- 32. Stathias V, Jermakowicz AM, Maloof ME, et al. Drug and disease signature integration identifies synergistic combinations in glioblastoma. Nat Commun 2018;9:5315.
- 33. Regan-Fendt KE, Xu J, DiVincenzo M, et al. Synergy from gene expression and network mining (SynGeNet) method predicts synergistic drug combinations for diverse melanoma genomic subtypes. NPJ Syst Biol Appl 2019;5:6.
- 34. Huang L, Brunell D, Stephan C, et al. Driver network as a biomarker: systematic integration and network modeling of multi-omics data to derive driver signaling pathways for drug combination prediction. Bioinforma Oxf Engl 2019;35:3709–17.
- 35. Zhao X-M, Iskar M, Zeller G, et al. Prediction of drug combinations by integrating molecular and pharmacological data. PLoS Comput Biol 2011;7:e1002323.
- 36. Sun Y, Sheng Z, Ma C, et al. Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer. Nat Commun 2015;6:8481.
- 37. Li P, Huang C, Fu Y, et al. Large-scale exploration and analysis of drug combinations. Bioinforma Oxf Engl 2015;31:2007–16.
- 38. Li X, Xu Y, Cui H, et al. Prediction of synergistic anti-cancer drug combinations based on drug target network and drug induced gene expression profiles. Artif Intell Med 2017;83:35–43.
- 39. Ding P, Yin R, Luo J, et al. Ensemble prediction of synergistic drug combinations incorporating biological, chemical, pharmacological, and network knowledge. IEEE J Biomed Health Inform 2019;23:1336–45.
- 40. Guo W-F, Zhang S-W, Feng Y-H, et al. Network controllability-based algorithm to target personalized driver genes for discovering combinatorial drugs of individual patients. Nucleic Acids Res 2021;49:e37.
- 41. Duran-Frigola M, Pauls E, Guitart-Pla O, et al. Extending the small-molecule similarity principle to all levels of biology with the chemical checker. Nat Biotechnol 2020;38:1087–96.
- 42. Liu A, Trairatphisan P, Gjerga E, et al. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL. NPJ Syst Biol Appl 2019;5:40.
- 43. Jaaks P, Coker EA, Vis DJ, et al. Effective drug combinations in breast, colon and pancreatic cancer cells. Nature 2022;603:166–73.
- 44. Celebi R, Bear Don’t Walk O, Movva R, et al. In-silico prediction of synergistic anti-cancer drug combinations using multi-omics data. Sci Rep 2019;9:8949.
- 45. Yan X, Yang Y, Chen Z, et al. H-RACS: a handy tool to rank anti-cancer synergistic drugs. Aging 2020;12:21504–17.
- 46. Julkunen H, Cichonska A, Gautam P, et al. Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun 2020;11:6136.
- 47. Xia F, Shukla M, Brettin T, et al. Predicting tumor cell line response to drug pairs with deep learning. BMC Bioinformatics 2018;19:486.
- 48. Behan FM, Iorio F, Picco G, et al. Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens. Nature 2019;568:511–6.
- 49. Meyers RM, Bryan JG, McFarland JM, et al. Computational correction of copy-number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet 2017;49:1779–84.
- 50. Pacini C, Dempster JM, Boyle I, et al. Integrated cross-study datasets of genetic dependencies in cancer. Nat Commun 2021;12:1661.
- 51. Tsherniak A, Vazquez F, Montgomery PG, et al. Defining a cancer dependency map. Cell 2017;170:564–576.e16.
- 52. Parisi F, Strino F, Nadler B, et al. Ranking and combining multiple predictors without labeled data. Proc Natl Acad Sci 2014;111:1253–8.
- 53. Shaham U, Cheng X, Dror O, et al. A deep learning approach to unsupervised ensemble learning. Proc 33rd Int Conf Int Conf Mach Learn 2016;48:30–9.
- 54. Grover A, Leskovec J. node2vec: scalable feature learning for networks. KDD Proc Int Conf Knowl Discov Data Min 2016;2016:855–64.
- 55. Yang W, Soares J, Greninger P, et al. Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res 2013;41:D955–61.
- 56. Zagidullin B, Aldahdooh J, Zheng S, et al. DrugComb: an integrative cancer drug combination data portal. Nucleic Acids Res 2019;47:W43–51.
- 57. Zheng S, Aldahdooh J, Shadbahr T, et al. DrugComb update: a more comprehensive drug sensitivity data repository and analysis portal. Nucleic Acids Res 2021;49:W174–84.
- 58. Holbeck SL, Camalier R, Crowell JA, et al. The National Cancer Institute ALMANAC: a comprehensive screening resource for the detection of anticancer drug pairs with enhanced therapeutic activity. Cancer Res 2017;77:3564–76.