Network-principled deep generative models for designing drug combinations as graph sets

Mostafa Karimi; Arman Hasanzadeh; Yang Shen

doi:10.1093/bioinformatics/btaa317

. 2020 Jul 13;36(Suppl 1):i445–i454. doi: 10.1093/bioinformatics/btaa317

Network-principled deep generative models for designing drug combinations as graph sets

Mostafa Karimi ^b1,^b2,¹, Arman Hasanzadeh ^b1,¹, Yang Shen ^b1,^b2,^✉

PMCID: PMC7355302 PMID: 32657357

Abstract

Motivation

Combination therapy has shown to improve therapeutic efficacy while reducing side effects. Importantly, it has become an indispensable strategy to overcome resistance in antibiotics, antimicrobials and anticancer drugs. Facing enormous chemical space and unclear design principles for small-molecule combinations, computational drug-combination design has not seen generative models to meet its potential to accelerate resistance-overcoming drug combination discovery.

Results

We have developed the first deep generative model for drug combination design, by jointly embedding graph-structured domain knowledge and iteratively training a reinforcement learning-based chemical graph-set designer. First, we have developed hierarchical variational graph auto-encoders trained end-to-end to jointly embed gene–gene, gene–disease and disease–disease networks. Novel attentional pooling is introduced here for learning disease representations from associated genes’ representations. Second, targeting diseases in learned representations, we have recast the drug-combination design problem as graph-set generation and developed a deep learning-based model with novel rewards. Specifically, besides chemical validity rewards, we have introduced novel generative adversarial award, being generalized sliced Wasserstein, for chemically diverse molecules with distributions similar to known drugs. We have also designed a network principle-based reward for disease-specific drug combinations. Numerical results indicate that, compared to state-of-the-art graph embedding methods, hierarchical variational graph auto-encoder learns more informative and generalizable disease representations. Results also show that the deep generative models generate drug combinations following the principle across diseases. Case studies on four diseases show that network-principled drug combinations tend to have low toxicity. The generated drug combinations collectively cover the disease module similar to FDA-approved drug combinations and could potentially suggest novel systems pharmacology strategies. Our method allows for examining and following network-based principle or hypothesis to efficiently generate disease-specific drug combinations in a vast chemical combinatorial space.

Availability and implementation

https://github.com/Shen-Lab/Drug-Combo-Generator.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Drug resistance is a fundamental barrier to developing robust antimicrobial and anticancer therapies (Housman et al., 2014; Taubes, 2008). Its first sign was observed in 1940s soon after the discovery of penicillin (Abraham and Chain, 1940), the first modern antibiotic. Since then, drug resistance has surfaced and progressed in infectious diseases such as HIV (Clavel and Hance, 2004), tuberculosis (TB) (Dooley et al., 1992) and hepatitis (Ghany and Liang, 2007) as well as cancers (Holohan et al., 2013). Mechanistically, it can emerge through drug efflux (Chang and Roth, 2001), activation of alternative pathways (Lovly and Shaw, 2014) and protein mutations (Balbas et al., 2013; Toy et al., 2013) while decreasing the efficacy of drugs.

Combination therapy is a resistance-overcoming strategy that has found success in combating HIV (Shafer and Vuitton, 1999), TB (Ramón-García et al., 2011), cancers (Bozic et al., 2013; Sharma and Allison, 2015) and so on. Considering that most diseases and their resistances are multifactorial (Kaplan and Junien, 2000; Keith et al., 2005), multiple drugs targeting multiple components simultaneously could confer less resistance than individual drugs targeting components separately. Examples include targeting both mitogen-activated protein kinase kinase (MEK) and B-Raf proto-oncogene serine/threonine-protein kinase (BRAF) in patients with BRAF V600-mutant melanoma rather than targeting MEK or BRAF alone (Flaherty et al., 2012; Madani Tonekaboni et al., 2018). The effect of drug combination is usually categorized as synergistic, additive or antagonistic depending on whether it is greater than, equal to or less than the sum of individual drug effects (Chou, 2006). Synergistic combinations are effective at delaying the beginning of the resistance; however, antagonistic combinations are effective at suppressing expansion of resistance (Saputra et al., 2018; Singh and Yeh, 2017), representing offensive and defensive strategies to overcome drug resistance. In particular, offensive strategies cause huge early causalities but defensive ones anticipate and develop protection against future threats (Saputra et al., 2018).

Discovering a drug combination to overcome resistance is, however, extremely challenging, even more so than discovering a drug which is already a costly (∼billions of USD) (DiMasi et al., 2016) and lengthy (∼12 years) (Van Norman, 2016) process with low success rates (3.4% phase-1 oncology compounds make it to approval and market) (Wong et al., 2019). An apparent challenge, a combinatorial one, is in the scale of chemical space, which is estimated to be 10⁶⁰ for single compounds (Bohacek et al., 1996) and can ‘explode’ to $10^{60 K}$ for K-compound combinations. Even if the space is restricted to around 10³ U.S. Food and Drug Administration (FDA)-approved human drugs, there are 10⁵–10⁶ pairwise combinations. Another challenge, a conceptual one, is in the complexity of systems biology. On top of on-target efficacy and off-target side effects or even toxicity that need to be considered for individual drugs, network-based design principles are much needed for drug combinations that effectively target multiple proteins in a disease module and have low toxicity or even resistance profiles (Billur Engin et al., 2014; Martínez-Jiménez and Marti-Renom, 2016).

Current computational models in drug discovery, especially those for predicting pharmacokinetic and pharmacodynamic properties of individual drugs/compounds, can be categorized into discriminative and generative models. Discriminative models predict the distribution of a property for a given molecule, whereas generative models would learn the joint distribution on the property and molecules. For instance, discriminative models have been developed for predicting single compounds’ toxicities, based on support vector machines (Darnag et al., 2010), random forest (Svetnik et al., 2003) and deep learning (Mayr et al., 2016). Whereas discriminative models are useful for evaluating given compounds or even searching compound libraries, generative models can effectively design compounds of desired properties in chemical space. Recent advance in inverse molecular design has seen deep generative models such as SMILES representation-based reinforcement learning (RL) (Popova et al., 2018) or recurrent neural networks as well as graph representation-based generative adversarial networks (GANs), RL (You et al., 2018) and generative tensorial RL (Zhavoronkov et al., 2019).

Unlike single-drug design, current computational efforts for drug combinations are exclusively focused on discriminative models and lack generative models. The main focus for drug combination is to use discriminate models to identify synergistic or antagonistic drugs for a given specific disease. Examples include the Chou–Talalay method (Chou, 2010), integer linear programming (Pang et al., 2014) and deep learning (Preuer et al., 2018). However, it is daunting if not infeasible to enumerate all cases in the enormous chemical combinatorial space and evaluate their combination effects using a discriminative model. Not to mention that such methods often lack explainability.

Directly addressing aforementioned combinatorial and conceptual challenges and filling the void of generative models for drug combinations, in this study, we develop network-based representation learning for diseases and deep generative models for accelerated and principled drug combination design (the general case of K drugs). Recently, by analyzing the network-based relationships between disease proteins and drug targets in the human protein–protein interactome, Cheng et al. proposed an elegant principle for FDA-approved drug combinations that targets of two drugs both hit the disease module but cover different neighborhoods. Our methods allow for examining and following the proposed network-based principle (Cheng et al., 2019) to efficiently generate disease-specific drug combinations in a vast chemical combinatorial space. They will also help meet a critical need of computational tools in a battle against quickly evolving bacterial, viral and tumor populations with accumulating resistance.

To tackle the problem, we have developed a network principle-based deep generative model for faster, broader and deeper exploration of drug combination space by following the principle underling FDA-approved drug combinations. First, we have developed hierarchical variational graph auto-encoders (HVGAE) for jointly embedding disease–disease network and gene–gene network. Through end-to-end training, we embed genes in a way that they can represent the human interactome. Then, we utilize their embeddings with novel attentional pooling to create features for each disease so that we can embed diseases more accurately. Second, we have also developed a reinforcement-learning-based graph-set generator for drug combination design by utilizing both gene/disease embedding and network principles. Besides those for chemical validity and properties, our rewards also include (i) a novel adversarial reward, generalized sliced Wasserstein distance (GSWD), that fosters generated molecules to be diverse yet similar in distribution to known compounds (ZINC database and FDA-approved drugs); and (ii) a network principle-based reward for drug combinations that are feasible for online calculations.

The overall schematics are shown in Figure 1, and they are detailed in Section 3.

Fig. 1. — Overall schematics of the proposed approach for generating disease-specific drug combinations

2 Data

2.1 Human interactome and its features

We used the human interactome data (a gene–gene network) from Menche et al. (2015) that feature 13 460 proteins interconnected by 141 296 interactions.

We introduced edge features for the human interactome based on the biological nature of edges (interactions). The interactome was compiled by combining experimental support from various sources/databases including (i) regulatory interactions from TRANSFAC (Matys et al., 2003); (ii) binary interactions from high-throughput (including Rolland et al. 2014) and literature-curated datasets [including IntAct (Aranda et al., 2010) and MINT (Ceol et al., 2010)] as well as literature-curated interactions from low-throughput experiments [IntAct, MINT, BioGRID (Stark et al., 2011), and HPRD (Keshava Prasad et al., 2009)]; (iii) metabolic enzyme-coupled interactions from Lee et al. (2008); (iv) protein complexes from CORUM (Ruepp et al., 2010); (v) kinase-substrate pairs from PhosphositePlus (Hornbeck et al., 2012); and (vi) signaling interactions. In summary, an edge could correspond to one or multiple physical interaction types. So we used a six-hot encoding for edge features, based on whether an edge corresponds to regulatory, binary, metabolic, complex, kinase and signaling interactions.

We also introduced features for nodes (genes) in the human interactome based on (i) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Kanehisa et al., 2002) (336 features) queried through Biopython (Cock et al., 2009); (ii) Gene Ontology (GO) terms (Ashburner et al., 2000) including biological process (30 769 features), molecular function (12 183 features) and cellular component (4451 features), mapped using the NCBI Gene2Go dataset; (iii) disease–gene associations from the database OMIM (Mendelian Inheritance in Man) (Hamosh et al., 2005) and the results from genome-wide association studies (Mottaz et al., 2008; Ramos et al., 2014) (299 features). The last 299 features correspond to 299 diseases represented by the Medical Subject Headings (MeSH) vocabulary (Rogers, 1963).

After removing those genes without KEGG pathway information, the human interactome used in this study has 13 119 genes and 352 464 physical interactions.

2.2 Disease–disease network

We used a disease–disease network from Menche et al. (2015) with 299 nodes (diseases), created based on human interactome data (as detailed earlier), gene expression data (Su et al., 2004), disease–gene associations (Hamosh et al., 2005; Mottaz et al., 2008; Ramos et al., 2014), GO (Ashburner et al., 2000), symptom similarity (Zhou et al., 2014) and comorbidity (Hidalgo et al., 2009). The original disease–disease network is a complete graph with real-valued edges. The edge value between two diseases shows how much they are topologically separated from each other. A positive/negative edge weight indicates that that two disease modules are topologically separated/overlapped. Therefore, we used zero-weight as the threshold and pruned positive-valued edges, which results in a disease–disease network of 299 nodes and 5986 edges (without weights).

2.3 Disease–gene associations

We used disease–gene associations from the database OMIM (Hamosh et al., 2005). These associations bridge aforementioned gene–gene and disease–disease networks into a hierarchical graph of genes and diseases, based on which gene and disease representations will be learned.

2.4 Disease classification

For the purpose of assessment, we used the Comparative Toxicogenomics Database (CTD) (Davis et al., 2019) to classify diseases into eight classes based on their Disease Ontology (DO) terms (Schriml et al., 2012) where diseases are represented in the MeSH vocabulary (Rogers, 1963). In the CTD database only 201 of the 299 diseases have a corresponding DO term. Therefore, for the 98 diseases with missing DO terms we considered the majority of their parents’ DO terms, if applicable, as their DO terms. With this approach, we assigned DO terms to 66 such diseases and classified 267 of the 299 diseases. The 32 diseases with DO terms still missing are usually at the top layers of the MeSH tree.

2.5 FDA-approved drugs and drug combinations

To assess our deep generative model for drug combination design (to be detailed in Section 3.2), we consider a comprehensive list of US FDA-approved combination drugs (1940–2018.9) (Das et al., 2019). The dataset contains 419 drug combinations consisting of 328 unique drugs, including 341 (81%), 67 (16%) and 11 (3%) of double, triple and quadruple drug combinations.

We also utilized the curated drug–disease association from CTD database (Davis et al., 2019).

3 Materials and Methods

We have developed a network-based drug combination generator which can be utilized in overcoming drug resistance. Representing drugs through their molecular graphs, we recast the problem of drug combination generation into network-principled graph-set generation by incorporating prior knowledge such as human interactome (gene–gene), disease–gene, disease–disease, gene pathway and gene–GO relationships. Furthermore, we formulate the graph-set generation problem as learning an RL agent that iteratively adds substructures and edges to each molecular graph in a chemistry- and system-aware environment. To that end, the RL model is trained to maximize a desired property Q (e.g. therapeutic efficacy for drug combinations) while following the valency (chemical validity) rules and being similar in distribution to the prior set of graphs.

As shown in Figure 1, the proposed approach consists of (i) embedding prior knowledge (different network relationships) through HVGAE and (ii) generating drug combinations as graph sets through an RL algorithm, which will be detailed next.

Notations: As both gene–gene and disease–disease networks can be represented as graphs, notations are differentiated by superscripts ‘g’ and ‘d’ to indicate gene–gene and disease–disease networks, respectively. Drugs (compounds) are also represented as graphs and notations with ‘k’ in the superscript indicate the kth drug (graph) in the drug combination (graph set).

3.1 HVGAE for representation learning

Suppose that a gene–gene network is represented as a graph $G^{(g)} = (A^{(g)}, {F^{(g, m)}}_{m = 1}^{M})$ , where $A^{(g)} = [A^{(g, 1)}, \dots, A^{(g, n_{e})}] \in {0, 1}^{n_{g} \times n_{g} \times n_{e}}$ is the adjacency tensor of the gene–gene network with $n_{g}$ nodes and n_e edge types (k-hot encoding of six types of aforementioned physical interactions such as regulatory, binary, metabolic, complex, kinase and signaling interactions). We also define ${\tilde{A}}^{(g)} \in {0, 1}^{n_{g} \times n_{g}}$ to be elementwise $OR$ of ${A^{(g, 1)}, \dots, A^{(g, n_{e})}}$ . Furthermore, $F^{(g, m)}$ denotes the mth set of node features for gene–gene network where M (5 in the study) represents different types of node features such as pathways, three GO terms and gene–disease relationship. We also suppose that the disease–disease network is represented as graph $G^{(d)} = (A^{(d)}, F^{(d)})$ , where $A^{(d)} \in {0, 1}^{n_{d} \times n_{d}}$ is the adjacency matrix of the disease–disease network with $n_{d}$ nodes; and $F^{(d)}$ represents the set of node features for the disease–disease network.

We have developed a hierarchical embedding with two levels. In the first level, we embed the gene–gene network to get the features related to each disease and then we incorporate the disease features within the disease–disease network to embed their relationship. We infer the embedding for each gene and disease jointly through end-to-end training. The proposed HVGAE perform probabilistic auto-encoding to capture uncertainty of representations which is in the same spirit as the variational graph auto-encoder models introduced in (Hajiramezanali et al., 2019; Hasanzadeh et al., 2019; Kipf and Welling, 2016).

3.1.1 First level: gene–gene embedding

The inference model for variational embedding of the gene–gene network is formulated as follows. We first use M graph neural networks (GNNs) to transform individual nodes’ features in M types and then concatenate the M sets of results ${\hat{F}}^{(g, m)}$ ( $m = 1, \dots, M$ ) into ${\hat{F}}^{(g)}$ :

\begin{array}{l} {\hat{F}}^{(g, m)} = AGG ({{GNN}_{j} (A^{(g, j)}, F^{(g, m)})}, j = 1, \dots, n_{e}) \\ {\hat{F}}^{(g, m)} \in R^{n_{g} \times L_{g}}, m = 1, \dots, M \\ {\hat{F}}^{(g)} = CONCAT ({{\hat{F}}^{(g, m)}}_{m = 1}^{M}) \in R^{n_{g} \times M L_{g}}, \end{array}

(1)

where $AGG$ is an aggregation function combining output features of ${GNN}_{j}$ s for each node. We used a two layer fully connected neural network with ReLU activation functions followed by a single linear layer in our implementation. We then approximate the posterior distribution of stochastic latent variables $Z^{(g)}$ [containing $z_{i}^{(g)} \in ℝ^{L_{g}}$ for $i = 1, \dots, n_{g}$ where $L_{g}$ (32 in this study) is the latent space dimensionality for the ith gene], with a multivariate Gaussian distribution $q (\cdot)$ given the gene–gene network’s aggregated node features ${\hat{F}}^{(g)}$ and adjacency tensor $A^{(g)}$ :

\begin{matrix} q (Z^{(g)} | {\hat{F}}^{(g)}, A^{(g)}) = \prod_{i = 1}^{n_{g}} q (z_{i}^{(g)} | {\hat{F}}^{(g)}, A^{(g)}), where \\ q (z_{i}^{(g)} | {\hat{F}}^{(g)}, A^{(g)}) = N (μ_{i}^{(g)}, diag (σ_{i}^{2, (g)})), \\ μ^{(g)} = AGG ({{GNN}_{μ, g, j} (A^{(g, j)}, {\hat{F}}^{(g)})}, j = 1, \dots, n_{e}), \\ log (σ^{(g)}) = AGG ({{GNN}_{σ, g, j} (A^{(g, j)}, {\hat{F}}^{(g)})}, j = 1, \dots, n_{e}), \\ μ^{(g)} \in ℝ^{n_{g} \times L_{g}}, log (σ^{(g)}) \in ℝ^{n_{g} \times L_{g}}, \end{matrix}

(2)

where $Z^{(g)} \in ℝ^{n_{g} \times L_{g}}$ ; $μ^{(g)}$ is the matrix of mean vectors $μ_{i}^{(g)}$ ; and $σ^{(g)}$ is the matrix of standard deviation vectors $σ_{i}^{(g)}$ ( $i = 1, \dots, n_{g}$ ).

The generative model for the gene–gene network is formulated as:

\begin{matrix} p ({\tilde{A}}^{(g)} | Z^{(g)}) = \prod_{i = 1}^{n} \prod_{j = 1}^{n} p ({\tilde{A}}_{i j}^{(g)} | z_{i}^{(g)}, z_{j}^{(g)}), where \\ p ({\tilde{A}}_{i j}^{(g)} | z_{i}^{(g)}, z_{j}^{(g)}) = σ (z_{i}^{(g)} z_{j}^{(g) T}), \end{matrix}

(3)

and $σ (\cdot)$ is the logistic sigmoid function. The loss for gene–gene variational embedding is represented as a variational lower bound (ELBO):

\begin{matrix} L^{(g)} = E_{q (Z^{(g)} | {\hat{F}}^{(g)}, A^{(g)})} [log p ({\tilde{A}}^{(g)} | Z^{(g)})] \\ - KL (q (Z^{(g)} | {\hat{F}}^{(g)}, A^{(g)}) | | p (Z^{(g)})), \end{matrix}

(4)

where $KL (q (\cdot) | | p (\cdot))$ is the Kullback–Leibler divergence between $q (\cdot)$ and $p (\cdot)$ . We take the Gaussian prior for $p (Z^{(g)})$ and make use of the reparameterization trick (Kipf and Welling, 2016) for training.

3.1.2 Second level: disease–disease embedding

The inference model for variational embedding of the disease–disease network is similar to that of the gene–gene network except that the disease–disease network’s aggregated node features, ${\hat{F}}^{(d)}$ , are derived through parameterized attentional pooling of ${\hat{Z}}_{r}^{(g)}$ , latent variables of genes associated with the rth disease (a subset of $Z^{(g)}$ ):

\begin{array}{l} e_{r} = v tanh ({\hat{Z}}_{r}^{(g)} W + b), r = 1, \dots, n_{d} \\ α_{r} = softmax (e_{r}), r = 1, \dots, n_{d} \\ {\hat{F}}_{r}^{(d)} = \sum_{i} α_{r, i} {\hat{Z}}_{r, i}^{(g)}, r = 1, \dots, n_{d} \\ {\hat{F}}^{(d)} = CONCAT ({{\hat{F}}_{r}^{(d)}}_{r = 1}^{n_{d}}) \in R^{n_{d} \times L_{d}}, \end{array}

(5)

where $α_{m}$ capture the importance of genes related to the rth disease for calculating its latent representations and $L_{d}$ is the latent space dimensionality of a disease.

Once ${\hat{F}}^{(d)}$ , the disease–disease network’s aggregated node features for all diseases, are derived; we again define $q (Z^{(d)} | {\hat{F}}^{(d)}, A^{(d)})$ for the posterior distribution of stochastic latent variables $Z^{(d)}$ similarly to what we did in Equation (2) except that AGG functions are removed since disease–disease network has one binary adjacency matrix; give the generative decoder $p (A^{(d)} | Z^{(d)})$ for embedding the disease–disease network similarly to what we did in Equation (3); and calculate the variational lowerbound (ELBO) loss $L^{(d)}$ for the disease–disease network similarly to what we did in Equation (4). Details can be found in Supplementary Section S1.1.

Both levels of our proposed HVGAE, i.e. gene–gene and disease–disease variational graph representation learning, are jointly trained in an end-to-end fashion using the following overall loss:

L^{HVGAE} = L^{(d)} + L^{(g)} .

(6)

3.2 RL-based graph-set generator for drug combinations

In this section, we introduce the RL-based drug combination generator. We will detail (i) the state space of graph sets (K compounds) and the action space of graph-set growth; (ii) multi-objective rewards including chemical validity and our generalized sliced Wasserstein reward for individual drugs as well as our newly designed network principle-based reward for drug combinations; and (iii) policy network that learns to take actions in the rewarding environment.

3.2.1 State and action space

We represent a graph set (drug combination) with K graphs as $G = {G^{(k)}}_{k = 1}^{K}$ . Each graph $G^{(k)} = (A^{(k)}, E^{(k)}, F^{(k)})$ where $A^{(k)} \in {0, 1}^{n_{k} \times n_{k}}$ is the adjacency matrix, $F^{(k)} \in ℝ^{n_{k} \times ϕ}$ the node feature matrix, $E^{(k)} \in {0, 1}^{ϵ \times n_{k} \times n_{k}}$ the edge-conditioned adjacency tensor and n_k the number of vertices for the kth graph; and $ϕ$ is the number of features per nodes and ϵ the number of edge types.

The state space $G$ is the set of all K graphs with different numbers and types of nodes or edges. Specifically, the state of the environment s_t at iteration t is defined as the intermediate graph set $G_{t} = {G_{t}^{(k)}}_{k = 1}^{K}$ generated so far which is fully observable by the RL agent.

The action space is the set of edges that can be added to the graph set. An action a_t at iteration t is analogous to link prediction in each graph in the set. More specifically, a link can either connect a new subgraph (a single node/atom or a subgraph/drug-substructure) to a node in $G_{t}^{(k)}$ or connect existing nodes within graph $G_{t}^{(k)}$ . The actions can be interpreted as connecting the current graph with a member of scaffold subgraphs set C. Mathematically, for $G_{t}^{(k)}$ , graph k at step t, the action $a_{t}^{(k)}$ is the quadruple of $a_{t}^{(k)} = concat (a_{first, t}^{(k)}, a_{second, t}^{(k)}, a_{edge, t}^{(k)}, a_{stop, t}^{(k)})$ .

3.2.2 Multi-objective reward

We have defined a multi-objective reward R_t to satisfy certain requirements in drug combination therapy. First, a chemical validity reward maintains that individual compounds are chemically valid. Second, a novel adversarial reward, generalized sliced Wasserstein GAN (GS-WGAN), enforces that generated compounds are synthesizable and ‘drug-like’ by following the distribution of synthesizable compounds in the ZINC database (Irwin and Shoichet, 2005) or FDA-approved drugs. Third, a network principle-based award would encourage individual drugs to target the desired disease module but not to overlap in their target sets. Toxicity due to drug–drug interactions (DDIs) can also be included as a reward. It is intentionally left out in this study so that toxicity can be evaluated for drug combinations designed to follow the network principle.

When training the RL agent, we use different reward combinations in different stages. We first only use the weighted combination of chemical validity and GS-WGAN awards learning over drug combinations for all diseases; then we remove the penalized logP (Pen-logP) portion of chemical validity and add adversarial loss again while learning over drug combinations for all diseases; and finally use the combination of the three rewards as in the second stage but focusing on a target disease and possibly on restricted actions/scaffolds (in a spirit similar to transfer learning). The three types of rewards are detailed as follows.

Chemical validity reward for individual drugs. A small positive reward is assigned if the action does not violate valency rules. Otherwise a small negative reward is assigned. This is an intermediate reward added at each step. Another reward is on penalized logP (lipophilicity where P is the octanol–water partition coefficient) or Pen-logP values. The design and the parameters of this reward are adopted from You et al. (2018) without optimization.

Adversarial reward using GSWD. To ensure that the generated molecules resemble a given set of molecules (such as those in ZINC or FDA-approved), we deploy GANs. GANs are very successful at modeling high-dimensional distributions from given samples. However, they are known to suffer from training instability and cannot generate diverse samples (a phenomenon known as mode collapse).

Wasserstein GANs (WGAN) have shown to improve stability and mode collapse by replacing the Jenson–Shannon divergence in original GAN formulation with the Wasserstein distance (WD) (Arjovsky et al., 2017). More specifically, the objective function in WGAN with gradient penalty (Gulrajani et al., 2017) is defined as follows:

\begin{matrix} \min_{θ} \max_{ϕ} V_{W} (π_{θ}, D_{ϕ}) + λ R (D_{ϕ}), \\ with V_{W} (π_{θ}, D_{ϕ}) = E_{x \sim p_{r}} [log D_{ϕ} (x)] - E_{y \sim π_{θ}} [log D_{ϕ} (y)], \end{matrix}

(7)

where p_r is the data distribution, λ is a hyper-parameter, R is the Lipschitz continuity regularization term, $D_{ϕ}$ is the critic with parameters $ϕ$ , and $π_{θ}$ is the policy (generator) with parameters θ.

Despite theoretical advantages of WGANs, solving Equation (7) is computationally expensive and intractable for high-dimensional data. To overcome this problem, we propose and formulate a novel GS-WGAN which deploys GSWD (Kolouri et al., 2019). GSWD, first, factorizes high-dimensional probabilities into multiple marginal 1D distributions with generalized Radon transform (GRT). Then, by taking advantage of closed form solution of WD in 1D, the distance between two distributions is approximated by the sum of WDs of marginal 1D distributions. More specifically, let $R$ represent GRT operator. The GRT of a probability distribution $P (\cdot)$ which is defined as follows:

R P (t, ψ) = \int_{ℝ^{d}} P (x) δ (t - f (x, ψ)) d x,

(8)

where $δ (\cdot)$ is the 1D Dirac delta function, $t \in ℝ$ is a scalar, ψ is a unit vector in the unit hyper-sphere in a d-dimensional space ( $S^{d - 1}$ ), and f is a projection function whose parameters will be learned in training. Injectivity of the GRT (Beylkin, 1984) is the requirement for the GSWD to be a valid distance. We use linear project $f (x, ψ)$ here and can easily extend to two nonlinear cases that maintain the GRT-injectivity (circular nonlinear projections or homogeneous polynomials with an odd degree).

GSWD between two d-dimensional distributions $P_{X}$ and $P_{Y}$ is therefore defined as:

GSWD (P_{X}, P_{Y}) = \int_{S^{d - 1}} WD (R P_{X} (\cdot, ψ), R P_{Y} (\cdot, ψ)) d ψ .

(9)

The integral in the above equation can be approximated with a Riemann sum. Knowing the definition of GSWD, we define the objective function of GS-WGAN as follows:

\min_{θ} \max_{ϕ} V_{GSW} (π_{θ}, D_{ϕ}) + λ R (D_{ϕ}),

(10)

\begin{matrix} s . t . V_{GSW} (π_{θ}, D_{ϕ}) = \int_{ψ \in S^{d - 1}} E_{x \sim p_{r}} [log D_{ϕ} (x)] \\ - E_{y \sim π_{θ}} [log D_{ϕ} (y)] d ψ, \end{matrix}

(11)

where the parameters and notations are the same as defined in Equation (7).

We note that x and y in Equation (10) are random variables in $ℝ^{d}$ , which is not a reasonable assumption for graphs. To that end, we use an embedding function g that maps each graph to a vector in $ℝ^{d}$ . We use graph convolutional (GC) layers followed by fully connected layers to implement g. We deploy the same type of neural network architecture for $D_{ϕ}$ . We use $R_{advers} = - V_{GSW} (π_{θ}, D_{ϕ})$ as the adversarial reward used together with other rewards and optimize the total rewards with a policy gradient method (Section 3.2.3).

Network principle-based reward for drug combinations. Proteins or genes associated with a disease tend to form a localized neighborhood disease module rather than scattering randomly in the interactome (Cheng et al., 2019). A network-based score has been introduced (Menche et al., 2015) to efficiently capture the network proximity of a drug (X) and disease (Y) based on the shortest-path length d(x, y) between a drug target (x) and a disease protein (y):

\begin{matrix} Z = \frac{d (X, Y) - \bar{d}}{σ_{d}} \\ d (X, Y) = \frac{1}{| | Y | |} \sum_{y \in Y} \min_{x \in X} d (x, y), \end{matrix}

(12)

where $d (\cdot, \cdot)$ is the shortest path distance; $\bar{d}$ and σ_d are the mean and standard deviation of the reference distribution which is corresponding to the expected network topological distance between two randomly selected groups of proteins matched to size and degree (connectivity) distribution as the original disease proteins and drug targets in the human interactome. Z-score being negative (Z < 0) implies network proximity of disease module and drug targets which is desirable. From the drug combination perspective, it has been shown that the drug–drug relationship of complementary exposure has the least side effect and the most clinical efficacy (Cheng et al., 2019). Complementary exposure for two drugs (X₁ and X₂) means that the X₁ drug targets (x₁) and X₂ drug targets (x₂) are not in the same neighborhood and have the least overlapping. Therefore, Cheng et al. (2019) have proposed a network-separation score which is formulated as follows:

s_{X_{1}, X_{2}} = d (X_{1}, X_{2}) - \frac{d (X_{1}, X_{1}) + d (X_{2}, X_{2})}{2},

(13)

where $d (X_{1}, X_{2})$ is the mean shortest path distance between drugs X₁ and X₂; $d (X_{1}, X_{1})$ and $d (X_{2}, X_{2})$ are the mean shortest path distance within drug targets X₁ and X₂, respectively. The separation score being positive (s > 0) implies that two networks are separated from each other, which is desirable. We have extended and combined these scores for general drug combination therapy where we have a set of k drugs ${X_{1}, \dots, X_{k}}$ and disease Y:

R_{network} = λ_{1} \sum_{i = 1}^{k} \sum_{j > i} s (X_{i}, X_{j}) - λ_{2} \sum_{i = 1}^{k} Z (X_{i}, Y) .

(14)

However, the exact online calculation of the reward $R_{network}$ is infeasible while training across all the diseases and the whole human interactome with more than 13K nodes and 352K edges. Therefore, we have developed a relaxed version of the reward which is feasible for online calculation and correlates with the actual reward. Specifically, we consider the normalized exclusive or (XOR) of intersections of disease modules with drug targets:

{\hat{R}}_{network} = \frac{Y \cap (X_{1} \oplus \dots \oplus X_{k})}{| Y |} = \frac{(X_{1} \cap Y) \oplus \dots \oplus (X_{k} \cap Y)}{| Y |} .

(15)

The relaxed network principle-based reward is penalizing a drug combination if the overlap between drug targets in the disease module is high; therefore, it will prevent the adverse DDIs. We scaled the network score by a constant (equals 10) such that the score would be in the same range as Pen-logP and can use the same weight in the total reward as Pen-logP did in You et al. (2018).

For a generated compound, we predict its protein targets by DeepAffinity (Karimi et al., 2019), judging by whether the predicted IC₅₀ is below 1 μM.

3.2.3 Policy network

Having explained the graph generation environment (various rewards), we outline the architecture of our proposed policy network. Our method takes the intermediate graph set $G_{t}$ and the collection of scaffold subgraphs C as inputs, and outputs the action a_t, which predicts a new link for each of the graphs in $G_{t}$ (You et al., 2018).

Since the input to our policy network is a set of K compounds or graphs ${G_{t}^{(k)} \cup C}_{k = 1}^{K}$ , we first deploy some layers of graph neural network to process each of the graphs. More specifically,

X^{(k)} = GN N^{(k)} (G_{t}^{(k)} \cup C), for k = 1, \dots, K,

(16)

where $GN N^{(k)}$ is a multilayer-graph neural network. The link prediction-based action at iteration t is a concatenation of four components for each of the K graphs: selection of two nodes, prediction of edge type, and prediction of termination. Each component is sampled according to a predicted distribution (You et al., 2018). Details are included in the Supplementary Section S1.2. We note that the first node is always chosen from $G_{t}$ while the next node is chosen from ${G_{t}^{(k)} \cup C}_{k = 1}^{K}$ . We also note that infeasible actions (i.e. actions that do not pass valency check) proposed by the policy network are rejected and the state remains unchanged. We adopt proximal policy optimization (PPO) (Schulman et al., 2017), one of the state-of-the-art policy gradient methods, to train the model.

4 Results

To assess the performance of our proposed model, we have designed a series of experiments. In Section 4.1, we first compare HVGAE to state-of-the-art graph embedding methods in disease–disease network representation learning and further include several variants of HVGAE for ablation studies. We then assess the performance of the proposed RL method in two aspects. In a landscape assessment in Section 4.2, we examine designed pairwise compound combinations for 299 diseases in quantitative scores of following a network-based principle (Cheng et al., 2019). In Section 4.3, we focus on four case studies involving multiple diseases of various systems pharmacology strategies. Our method is capable of generating higher-order combinations of K drugs. As FDA-approved drug combinations are often pairs, here we design compound pairs from the scaffolds of FDA-approved drug pairs. We further delve into designed compound pairs to understand the benefit of following network principles in lowering toxicity from DDIs. We also do so to understand their systems pharmacology strategies in comparison to the FDA-approved drug combinations.

4.1 HVGAE representation compares favorably to baselines

4.1.1 Experiment setup

To assess the performance of our proposed embedding method HVGAE, we compare its performance in (disease–disease) network reconstruction with Node2Vec (Grover and Leskovec, 2016), DeepWalk (Perozzi et al., 2014) and VGAE (Kipf and Welling, 2016), as well as some variants of our own model for ablation study. Node2Vec and DeepWalk are random walk-based models that do not capture node attributes; hence, we only used the disease–disease graph structure. For VGAE, we used identity matrix as node attributes as suggested by the authors.

For our HVGAE described in Section 3.1, we also considered two variants for ablation study: HVGAE-disjoint does not jointly embed gene–gene and disease–disease networks and does not use attentional pooling for disease embedding, whereas HVGAE-noAtt just does not use attentional pooling. Specifically, in HVGAE-disjoint, we, first, learned an embedding for gene–gene network, then used the sum of mean of the node representations of genes affected by a disease as its node attributes. In HVGAE-noAtt, we jointly learned the representations while using sum of mean of the node representations of genes as node attributes for disease–disease network.

In node2vec and DeepWalk, the walk length was set to 80, the number of walks starting at each node was set to 10, and the nodes were embedded to a 16D space. The window size was 10 for node2vec while it is set to 10 in DeepWalk. All models were trained using Adam optimizer. In VGAE, a 32D GC layer followed by two 16D layers was used for mean and variance inference. The learning rate was set to 0.01.

For HVGAE and its variants (for ablation study), we embed gene networks in 32D space using a single GC layer with 32 filters for each of the five types of input followed by a 64D GC layer and two 32D GC layer to infer mean and variance of the representation. We used a single 32D fully connected layer for attention layer. For disease–disease network embedding, we deployed a single 32D GC layer followed by two 16D layer for mean and variance inference resulting in 16D embedding for disease–disease network. Learning rates were set to 0.001. The models were trained for 1000 epochs choosing the best representation based on their reconstruction performance at each epoch.

4.1.2 Numerical analysis and ablation study for network embedding

Table 1 summarizes the reconstruction performance of the aforementioned methods. Compared to all baselines, our HVGAE showed the best performance in all metrics considered. Node2Vec and DeepWalk showed the worst performance as they only use the graph structure. The performance of VGAE was very close to DeepWalk. This is due to the fact that no attributes have been provided to VGAE despite having the capability of capturing attributes.

Table 1.

Graph reconstruction performances (unit: %) in the disease–disease network using our proposed HVGAE and baselines

Method	AUC-ROC	AP	F1-Macro	F1-Micro
Node2Vec	79.01	72.82	35.73	51.10
DeepWalk	79.32	73.77	40.28	53.30
VGAE	88.12	85.71	60.19	64.98
HVGAE-disjoint	91.45	90.72	73.45	74.77
HVGAE-noAtt	92.83	92.34	73.81	75.14
HVGAE	96.11	95.89	79.77	80.45

Open in a new tab

Note: F1 scores are based on 50% threshold. Method names in bold are our HVGAE and its variants. Numbers in bold correspond to the best performances.

Compared to VGAE, HVGAE-disjoint without joint embedding or attentional pooling still saw better performance, which suggests that the attribute generated by the gene–gene network contains meaningful features about the disease–disease network. The slight performance gain from HVGAE-disjoint to HVGAE-noAtt shows that joint learning of both networks hierarchically helps to render more informative features for the disease–disease network. Finally, HVGAE had another performance boost compared to HVGAE-noAtt and outperformed all competing methods, which shows the benefit of attentional pooling. Specifically, the attention layer of HVGAE allows the model to produce features that are specifically informative for the disease–disease network representation learning.

4.2 Our model generates drug combinations following network principles across diseases

4.2.1 Experiment setup

We have trained the proposed reinforcement model in three stages using different rewards, disease sets and action spaces to increasingly focus on a target disease while exploiting all diseases whose representations already jointly embed gene–gene, disease–disease and gene–disease networks. In the first stage, we train the model to only generate drug-like small molecules which follow the chemistry valency reward, lipophilicity reward (Pen-logP where P is the octanol–water partition coefficient) (You et al., 2018), and our novel adversarial reward for individual compounds. In this study, we trained the model for 3 days (4800 iterations) to learn to follow the valency conditions and promote high logP for generated compounds.

In the second stage, we start from the trained model at the end of the first stage (‘warm-start’ or ‘pre-training’). And we continue to train the model to generate good drug combinations across all diseases. We do so by adding the network principle-based reward for compound combinations and sequentially generating drug combinations for each disease one by one. Then, we calculate the network-based score for the generated drug combinations at the last epoch across disease ontologies and compare them with the FDA-approved melanoma drug combinations’ network-based score. In this study, we trained the model for 1500 iterations to generate drug combinations across all 299 diseases. In each iteration, we generated eight drug combinations for a given disease. We adopted PPO (Schulman et al., 2017) with a learning rate of 0.001 to train the proposed RL for both stages.

The last stage is disease-specific and is detailed in Section 4.3.

4.2.2 Numerical analysis

Across disease ontologies we quantify the performance of the proposed RL (stage 2 model first) using quantitative scores of compound combinations following a network-based principle (Cheng et al., 2019). We consider the generated combinations in the last epoch (the last 299 iterations) and calculate the network score ${\hat{R}}_{network}$ based on disease ontologies. We asses our model based on two versions of disease classification, original disease ontology and its extension, explained in Section 2.4. Table 2 summarizes the network-based scores for our model. Specifically, suppose that the set of targets for drugs 1 and 2 are represented by A and B whereas the disease module is the universal set Ω, we report the portion exclusively covered by drug 1 ( $η_{A - B}$ ), exclusively covered by drug 2 ( $η_{B - A}$ ), overlapped by both ( $η_{A \cap B}$ ) and collectively by both ( $η_{A \cup B}$ ). As a reference, we calculated the corresponding network scores for three FDA-approved drug combinations for melanoma.

Table 2.

Network-based score for the generated drug combinations based on disease ontology classifications

	Disease Ontology				Disease Ontology extended
	$η_{A - B}$	$η_{B - A}$	$η_{A \cap B}$	$η_{A \cup B}$	$η_{A - B}$	$η_{B - A}$	$η_{A \cap B}$	$η_{A \cup B}$
Infectious disease	0.25	0.10	0.06	0.41	0.20	0.07	0.05	0.33
Disease of anatomical entity	0.27	0.12	0.10	0.49	0.26	0.11	0.09	0.48
Disease of cellular proliferation	0.25	0.09	0.07	0.42	0.25	0.10	0.08	0.44
Disease of mental health	0.22	0.11	0.10	0.43	0.22	0.11	0.10	0.43
Disease of metabolism	0.22	0.13	0.10	0.46	0.23	0.14	0.11	0.48
Genetic disease	0.23	0.15	0.11	0.4	0.23	0.15	0.11	0.49
Syndrome	0.22	0.11	0.11	0.44	0.22	0.11	0.11	0.44

Open in a new tab

Based on the results shown in Table 2, we note that across all disease classes, the designed compound combinations learned in an environment, where the network principle(Cheng et al., 2019) was rewarded, did achieve the desired performances. Specifically, their overlaps in disease modules were low as $η_{A \cap B}$ fractions were around 0.1; whereas their joint coverage in disease modules was high as $η_{A \cup B}$ fractions were in the range of 0.4–0.5 for all diseases.

Compared to a few FDA-approved drugs for melanoma in Table 3, we notice that the designed compound combinations had similar exclusive coverage ( $η_{A - B}$ and $η_{B - A}$ ) as the drug combinations. However, the overlapping and overall coverage ( $η_{A \cap B}$ and $η_{A \cup B}$ ) were both much higher in FDA-approved drug combinations than the designed. Improvements could be made by training the RL agent longer, as these scores had already been improving during the limited training process under computational restrictions. More improvement can be made by adjusting the network-based reward as well.

Table 3.

Network-based scores for FDA-approved melanoma drug combinations

	$η_{A - B}$	$η_{B - A}$	$η_{A \cap B}$	$η_{A \cup B}$
Dabrafenib + Trametinib	0.05	0.21	0.55	0.81
Encorafenib + Binimetinib	0.21	0.05	0.53	0.86
Vemurafenib + Cobimetinib	0.05	0.27	0.36	0.68

Open in a new tab

4.3 Case studies for specific diseases

4.3.1 Experiment setup

In the third and last stage of RL model training, we start from the stage 2 model and generate drug combinations for a fixed target disease and can choose scaffold libraries specific to the disease. In parallel, we trained the model for 500 iterations (roughly 1 day) to generate 4000 drug combinations specifically for each of four diseases featuring various drug-combination strategies: melanoma, lung cancer, ovarian cancer and breast cancer. In all cases, we started with the Murcko scaffolds of specific FDA-approved drug combinations to be detailed next.

Melanoma: Different targets in the same pathway. Resistance to BRAF kinase inhibitors is associated with reactivation of the mitogen-activated protein kinase (MAPK) pathway. There is, thus, a phase 1 and 2 trial of combined treatment with Dabrafenib, a selective BRAF inhibitor, and Trametinib, a selective MAPK kinase (MEK) inhibitor. As melanoma is not one of the 299 diseases, we chose broader neoplasm as an alternative. To compensate the loss of focus on target disease, we design compound pairs from Murcko scaffolds of Dabrafenib + Trametinib.

Lung and ovarian cancers: targeting parallel pathways. MAPK and phosphoinositide 3-kinase (PI3K) signaling pathways are parallels important for treating many cancers, including lung and ovarian cancers (Bedard et al., 2015; Day and Siu, 2016). Clinical data suggest that dual blockade of these parallel pathways has synergistic effects. Buparlisib (BKM120) and Trametinib (GSK1120212; Mekinist) are used as a drug combination therapy for the purpose. Specifically, Buparlisib is a potent and highly specific PI3K inhibitor, whereas Trametinib is a highly selective, allosteric inhibitor of MEK1/MEK2 activation and kinase activity (Bedard et al., 2015).

Breast cancer: reverse resistance. Endocrine therapies, including Fulvestrant, are the main treatment for hormone receptor-positive breast cancers (80% of breast cancers) (Turner et al., 2015). However, they could confer resistance to patients during or after the treatment. A Phase 3 study is using Fulvestrant and Palbociclib as a combination therapy to reverse the resistance. Fulvestrant and Palbociclib are targeting different genes in different pathways. Specifically, Fulvestrant targets estrogen receptor α in estrogen signaling pathway and Palbociclib targets cyclin-dependent kinases 4 and 6 (CDK4 and CDK6) in cell cycle pathway (Turner et al., 2015).

4.3.2 Baseline methods for drug pair combination

Since our proposed method is the first to generate drug combinations for specific diseases, we consider the following baseline methods to compare with: (i) random selection of 1000 pairs from 8724 small-molecule drugs in DrugBank (Wishart et al., 2018); (ii) 628 FDA-approved drug combinations curated by Cheng et al. (2019) for hypertension and cancers (our case studies are on four types of cancers); (iii) random selection of 1000 pairs of FDA-approved drugs for the given disease, based on drug–disease dataset ‘SCMFDD-L’ (Zhang et al., 2018).

4.3.3 Designed pairs follow network principles and improve toxicity

We first compare the compound combinations designed by our model and those from the baselines using the network score that reflects the network-based principle. Figure 2(a–d) shows that our designed combinations in all four cases, with higher network scores in distribution, respected the network principle more than the baselines (including the FDA-approved pairs not necessarily specific for the target disease). The observation is statistically significant with P-values ranging from 6E-74 to 7E-7 (one-sided Kolmogorov–Smirnov test; see more details in the Supplementary Tables S2 and S3). Such a result is thanks to the network-principled reward we introduced.

Fig. 2. — Comparison of network score and toxicity of RL-generated pairs of compounds (our proposed method) with three baselines, i.e. random pairs of DrugBank compounds, FDA-approved drug pairs and random pairs of FDA-approved drugs for four case-study diseases. Panels (a-h) are combinations of row measure (network score or toxicity) and column diseases (melanoma, lung cancer, ovarian cancer, and breast cancer)

We also examine whether drug combinations designed to follow the network principle could reduce toxicity from DDIs. DDIs are crucial when using drug combinations since they may trigger unexpected pharmacological effects, including adverse drug events. We used a deep-learning model DeepDDI (Ryu et al., 2018) with a mean accuracy of 92.4% to predict for each combination the probabilities of 86 types of DDIs (we manually split them into 16 positives and 70 negatives; see details in the Supplementary Section S1.3). To summarize over the DDIs, we considered both maximum and mean probabilities of positive or negative ones. And we compared those distributions between our designed pairs and baselines in each disease.

Figure 2(e–h), using the mean probability among negative DDIs, shows that our compound pairs designed for all four diseases were predicted to have less chances of toxicity compared to the baselines. One-sided Kolmogorov–Smirnov tests attested to the statistical significance of the observation as P-values ranged between 2E-166 and 2E-53. More analyses can be found in the Supplementary Section S3.

Taken together, Figure 2 suggested that following the network principle in designing drug combinations would help reduce toxicity due to DDIs.

4.3.4 Designed pairs reproduce approved polypharmacology strategies

We next examine the DeepAffinity-predicted target genes of our designed pairs and compare them to the polypharmacology strategies outlined in Section 4.3.1 for each disease. Since improved network scores have been shown to correlate with lower toxicity, we used the scores to filter the 4000 combinations designed for each disease. Specifically, we retained combinations with network scores above 0.5 and $η_{A \cap B}$ below 0.1. These designs are shared along with the codes.

For melanoma, out of 69 combination designs retained, 26% were predicted to jointly cover BRAF and MEK genes in a complementary way. In other words, one molecule only targets BRAF and the other only targets MEK, according to our DeepAffinity (Karimi et al., 2019)-predicted IC₅₀, echoing the systems pharmacology strategy of the drug combination of Dabrafenib and Trametinib. There were also other designs which demand further examination and potentially contain novel strategies. All retained designs were predicted to target the MAPK pathway to which BRAF and MEK belong.

For lung and ovarian cancers, the same filtering criteria retained 204 (896) compound combinations designed for ovarian (lung) cancer. As disease modules can be limited, MEK1/2 does not exist in the used modules for ovarian or lung cancer so a gene-level analysis cannot be performed as the melanoma case. Instead, we performed the pathway-level analysis and found that 50.9% (45.2%) of combination designs for ovarian (lung) cancer were predicted to jointly and complementarily cover the MAPK and PI3K signaling pathways, which echoes the combination of Buparlisib and Trametinib. Moreover, 99.5% (100%) of these retained designs were predicted to jointly target both pathways for ovarian (lung) cancer.

For breast cancer, 77 designed compound-combinations passed the filters. As CDK4/6 does not belong to the breast-cancer module due to the limitation of disease modules used, we again only performed a pathway-level analysis. Nine percent of the combinations were predicted to jointly and complementarily cover estrogen receptor-signaling and cell cycle pathways as Fulvestrant and Palbociclib do. Also, 74% of the retained combinations jointly cover these pathways. These two portions suggest that many designed combinations were predicted to simultaneously target both pathways (with possible overlapping genes). If we consider PI3K signaling rather than cell cycle pathway for CDK4/6, 15.5% of retained drug combinations were predicted to jointly and complementarily cover estrogen and PI3K signaling pathways and all of them did jointly.

4.3.5 Ablation study for RL-based drug-combination generation

Besides HVGAE for network and disease embedding, two of our novel contributions in RL-based drug set generations were network-principled reward and adversarial reward through GS-WGAN. To assess the effects of these contributions to our model, we performed ablation study for Stage 3 using the case of melanoma. We ablated the originally proposed model in two ways: removing the network-principled reward or replacing the GS-WGAN adversarial reward with the previously used GAN reward based on Jenson–Shannon divergence. Results in Figure 3 suggested that both rewards led to faster initial growth and higher saturation values in network-based scores.

Fig. 3. — Ablation study for RL: best network scores achieved by three variants of the proposed method over training iterations

5 Conclusion

In response to the need of accelerated and principled drug-combination design, we have recast the problem as graph-set generation in a chemically and net-biologically valid environment and developed the first deep generative model with novel adversarial award and drug-combination award in RL for the purpose. We have also designed hierarchical variation graph auto-encoders (HGVAE) to jointly embed domain knowledge such as gene–gene, disease–disease, gene–disease networks and learn disease representations to be conditioned on in the generative model for disease-specific drug combination. Our results indicate that HGVAE learns integrative gene and disease representations that are much more generalizable and informative than state-of-the-art graph unsupervised-learning methods. The results also indicate that the RL model learns to generate drug combinations following a network-based principle thanks to our adversarial and drug-combination rewards. Case studies involving four diseases indicate that drug combinations designed to follow network principles tend to have low toxicity from DDIs. These designs also encode systems pharmacology strategies echoing FDA-approved drug combinations as well as other potentially promising strategies. As the first generative model for disease-specific drug-combination design, our study allows for assessing and following network-based mechanistic hypotheses in efficiently searching the chemical combinatorial space and effectively designing drug combinations.

Supplementary Material

btaa317_Supplementary_Data

Click here for additional data file.^{(844.7KB, zip)}

Acknowledgement

Part of the computing time is provided by the Texas A&M High Performance Research Computing.

Funding

This study was in part supported by the National Institute of General Medical Sciences of the National Institutes of Health [R35GM124952 to Y.S.].

Conflict of Interest: none declared.

References

Abraham E.P., Chain E. (1940) An enzyme from bacteria able to destroy penicillin. Nature, 146, 837–837. [PubMed] [Google Scholar]
Aranda B. et al. (2010) The intact molecular interaction database in 2010. Nucleic Acids Res., 38, D525–D531. [DOI] [PMC free article] [PubMed] [Google Scholar]
Arjovsky M. et al. (2017) Wasserstein GAN. In: Proceedings of the 34th International Conference on Machine Learning (PMLR), 70, 214–223. [Google Scholar]
Ashburner M. et al. (2000) Gene ontology: tool for the unification of biology. Nat. Genet., 25, 25–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
Balbas M.D. et al. (2013) Overcoming mutation-based resistance to antiandrogens with rational drug design. Elife, 2, e00499. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bedard P.L. et al. (2015) A phase Ib dose-escalation study of the oral pan-PI3K inhibitor buparlisib (BKM120) in combination with the oral MEK1/2 inhibitor trametinib (GSK1120212) in patients with selected advanced solid tumors. Clin. Cancer Res., 21, 730–738. [DOI] [PubMed] [Google Scholar]
Beylkin G. (1984) The inversion problem and applications of the generalized Radon transform. Commun. Pure Appl. Math., 37, 579–599. [Google Scholar]
Bohacek R.S. et al. (1996) The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev., 16, 3–50. [DOI] [PubMed] [Google Scholar]
Bozic I. et al. (2013) Evolutionary dynamics of cancer in response to targeted combination therapy. Elife, 2, e00747. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ceol A. et al. (2010) Mint, the molecular interaction database: 2009 update. Nucleic Acids Res., 38, D532–D539. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chang G., Roth C.B. (2001) Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters. Science, 293, 1793–1800. [DOI] [PubMed] [Google Scholar]
Cheng F. et al. (2019) Network-based prediction of drug combinations. Nat. Commun., 10, 1197. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chou T.-C. (2006) Theoretical basis, experimental design, and computerized simulation of synergism and antagonism in drug combination studies. Pharmacol. Rev., 58, 621–681. [DOI] [PubMed] [Google Scholar]
Chou T.-C. (2010) Drug combination studies and their synergy quantification using the Chou-Talalay method. Cancer Res., 70, 440–446. [DOI] [PubMed] [Google Scholar]
Clavel F., Hance A.J. (2004) HIV drug resistance. N. Engl. J. Med., 350, 1023–1035. [DOI] [PubMed] [Google Scholar]
Cock P.J. et al. (2009) Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics, 25, 1422–1423. [DOI] [PMC free article] [PubMed] [Google Scholar]
Darnag R. et al. (2010) Support vector machines: development of QSAR models for predicting anti-HIV-1 activity of TIBO derivatives. Eur. J. Med. Chem., 45, 1590–1597. [DOI] [PubMed] [Google Scholar]
Das P. et al. (2019) A survey of the structures of US FDA approved combination drugs. J. Med. Chem., 62, 4265–4311. [DOI] [PubMed] [Google Scholar]
Davis A.P. et al. (2019) The comparative toxicogenomics database: update 2019. Nucleic Acids Res., 47, D948–D954. [DOI] [PMC free article] [PubMed] [Google Scholar]
Day D., Siu L.L. (2016) Approaches to modernize the combination drug development paradigm. Genome Med., 8, 115. [DOI] [PMC free article] [PubMed] [Google Scholar]
DiMasi J.A. et al. (2016) Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ., 47, 20–33. [DOI] [PubMed] [Google Scholar]
Dooley S.W. et al. (1992) Multidrug-resistant tuberculosis. Ann. Int. Med., 117, 257–259. [DOI] [PubMed] [Google Scholar]
Engin H. et al. (2014) Network-based strategies can help mono-and poly-pharmacology drug discovery: a systems biology view. Curr. Pharm. Des., 20, 1201–1207. [DOI] [PubMed] [Google Scholar]
Flaherty K.T. et al. (2012) Combined BRAF and MEK inhibition in melanoma with BRAF v600 mutations. N. Engl. J. Med., 367, 1694–1703. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ghany M., Liang T.J. (2007) Drug targets and molecular mechanisms of drug resistance in chronic hepatitis B. Gastroenterology, 132, 1574–1585. [DOI] [PubMed] [Google Scholar]
Grover A., Leskovec J. (2016) node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gulrajani I. et al. (2017) Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5767–5777. [Google Scholar]
Hajiramezanali E. et al. (2019) Variational graph recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 10700–10710. [Google Scholar]
Hamosh A. et al. (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33, D514–D517. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hasanzadeh A. et al. (2019) Semi-implicit graph variational auto-encoders. In: Advances in Neural Information Processing Systems, pp. 10711–10722. [Google Scholar]
Hidalgo C.A. et al. (2009) A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol., 5, e1000353. [DOI] [PMC free article] [PubMed] [Google Scholar]
Holohan C. et al. (2013) Cancer drug resistance: an evolving paradigm. Nat. Rev. Cancer, 13, 714–726. [DOI] [PubMed] [Google Scholar]
Hornbeck P.V. et al. (2012) Phosphositeplus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res., 40, D261–D270. [DOI] [PMC free article] [PubMed] [Google Scholar]
Housman G. et al. (2014) Drug resistance in cancer: an overview. Cancers, 6, 1769–1792. [DOI] [PMC free article] [PubMed] [Google Scholar]
Irwin J.J., Shoichet B.K. (2005) Zinc- a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model., 45, 177–182. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kanehisa M. et al. (2002) The KEGG database In: Novartis Foundation Symposium, Wiley Online Library, pp. 91–100. [PubMed] [Google Scholar]
Kaplan J.-C., Junien C. (2000) Genomics and medicine: an anticipation. C. R. Acad. Sci., 323, 1167–1174. [DOI] [PubMed] [Google Scholar]
Karimi M. et al. (2019) DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics, 35, 3329–3338. [DOI] [PMC free article] [PubMed] [Google Scholar]
Keith C.T. et al. (2005) Multicomponent therapeutics for networked systems. Nat. Rev. Drug Discov., 4, 71–78. [DOI] [PubMed] [Google Scholar]
Keshava Prasad T. et al. (2009) Human protein reference database-2009 update. Nucleic Acids Res., 37, D767–D772. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kipf T.N., Welling M. (2016) Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning.
Kolouri S. et al. (2019) Generalized sliced Wasserstein distances. Adv. Neural Inf. Process. Sys., 32, 261–272. [Google Scholar]
Lee D.-S. et al. (2008) The implications of human metabolic network topology for disease comorbidity. Proc. Natl. Acad. Sci. USA, 105, 9880–9885. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lovly C.M., Shaw A.T. (2014) Molecular pathways: resistance to kinase inhibitors and implications for therapeutic strategies. Clin. Cancer Res., 20, 2249–2256. [DOI] [PMC free article] [PubMed] [Google Scholar]
Madani Tonekaboni S.A. et al. (2018) Predictive approaches for drug combination discovery in cancer. Brief. Bioinform., 19, 263–276. [DOI] [PMC free article] [PubMed] [Google Scholar]
Martínez-Jiménez F., Marti-Renom M.A. (2016) Should network biology be used for drug discovery? Expert Opin. Drug Discov., 11, 1135–1137. [DOI] [PubMed] [Google Scholar]
Matys V. et al. (2003) Transfac^®: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374–378. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mayr A. et al. (2016) DeepTox: toxicity prediction using deep learning. Front. Environ. Sci., 3, 80. [Google Scholar]
Menche J. et al. (2015) Uncovering disease-disease relationships through the incomplete interactome. Science, 347, 1257601. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mottaz A. et al. (2008) Mapping proteins to disease terminologies: from UniProt to MeSH. BMC Bioinformatics, 9, S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pang K. et al. (2014) Combinatorial therapy discovery using mixed integer linear programming. Bioinformatics, 30, 1456–1463. [DOI] [PMC free article] [PubMed] [Google Scholar]
Perozzi B. et al. (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. [Google Scholar]
Popova M. et al. (2018) Deep reinforcement learning for de novo drug design. Sci. Adv., 4, eaap7885. [DOI] [PMC free article] [PubMed] [Google Scholar]
Preuer K. et al. (2018) DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics, 34, 1538–1546. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ramón-García S. et al. (2011) Synergistic drug combinations for tuberculosis therapy identified by a novel high-throughput screen. Antimicrob. Agents Chemother., 55, 3861–3869. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ramos E.M. et al. (2014) Phenotype–genotype integrator (PheGenI): synthesizing genome-wide association study (GWAS) data with existing genomic resources. Eur. J. Hum. Genet., 22, 144–147. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rogers F.B. (1963) Medical subject headings. Bull. Med. Libr. Assoc., 51, 114–116. [PMC free article] [PubMed] [Google Scholar]
Rolland T. et al. (2014) A proteome-scale map of the human interactome network. Cell, 159, 1212–1226. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ruepp A. et al. (2010) CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res., 38, D497–D501. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ryu J.Y. et al. (2018) Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl. Acad. Sci. USA, 115, E4304–E4311. [DOI] [PMC free article] [PubMed] [Google Scholar]
Saputra E.C. et al. (2018) Combination therapy and the evolution of resistance: the theoretical merits of synergism and antagonism in cancer. Cancer Res., 78, 2419–2431. [DOI] [PubMed] [Google Scholar]
Schriml L.M. et al. (2012) Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schulman J. et al. (2017) Proximal policy optimization algorithms. arXiv Preprint arXiv : 1707.06347.
Shafer R., Vuitton D. (1999) Highly active antiretroviral therapy (HAART) for the treatment of infection with human immunodeficiency virus type 1. Biomed. Pharmacother., 53, 73–86. [DOI] [PubMed] [Google Scholar]
Sharma P., Allison J.P. (2015) Immune checkpoint targeting in cancer therapy: toward combination strategies with curative potential. Cell, 161, 205–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
Singh N., Yeh P.J. (2017) Suppressive drug combinations and their potential to combat antibiotic resistance. J. Antibiot., 70, 1033–1042. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stark C. et al. (2011) The biogrid interaction database: 2011 update. Nucleic Acids Res., 39, D698–D704. [DOI] [PMC free article] [PubMed] [Google Scholar]
Su A.I. et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA, 101, 6062–6067. [DOI] [PMC free article] [PubMed] [Google Scholar]
Svetnik V. et al. (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci., 43, 1947–1958. [DOI] [PubMed] [Google Scholar]
Taubes G. (2008). The bacteria fight back. Science, 321, 356–361. [DOI] [PubMed] [Google Scholar]
Toy W. et al. (2013) ESR1 ligand-binding domain mutations in hormone-resistant breast cancer. Nat. Genet., 45, 1439–1445. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turner N.C. et al. (2015) Palbociclib in hormone-receptor–positive advanced breast cancer. N. Engl. J. Med., 373, 209–219. [DOI] [PubMed] [Google Scholar]
Van Norman G.A. (2016) Drugs, devices, and the FDA: Part 1: an overview of approval processes for drugs. JACC Basic Transl. Sci., 1, 170–179. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wishart D.S. et al. (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res., 46, D1074–D1082. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wong C.H. et al. (2019) Estimation of clinical trial success rates and related parameters. Biostatistics, 20, 273–286. [DOI] [PMC free article] [PubMed] [Google Scholar]
You J. et al. (2018) Graph convolutional policy network for goal-directed molecular graph generation. In: Advances in Neural Information Processing Systems, pp. 6410–6421. [Google Scholar]
Zhang W. et al. (2018) Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinformatics, 19, 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhavoronkov A. et al. (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol., 37, 1038–1040. [DOI] [PubMed] [Google Scholar]
Zhou X. et al. (2014) Human symptoms–disease network. Nat. Commun., 5, 1–10. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

btaa317_Supplementary_Data

Click here for additional data file.^{(844.7KB, zip)}

[btaa317-B1] Abraham E.P., Chain E. (1940) An enzyme from bacteria able to destroy penicillin. Nature, 146, 837–837. [PubMed] [Google Scholar]

[btaa317-B2] Aranda B. et al. (2010) The intact molecular interaction database in 2010. Nucleic Acids Res., 38, D525–D531. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B3] Arjovsky M. et al. (2017) Wasserstein GAN. In: Proceedings of the 34th International Conference on Machine Learning (PMLR), 70, 214–223. [Google Scholar]

[btaa317-B4] Ashburner M. et al. (2000) Gene ontology: tool for the unification of biology. Nat. Genet., 25, 25–29. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B5] Balbas M.D. et al. (2013) Overcoming mutation-based resistance to antiandrogens with rational drug design. Elife, 2, e00499. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B6] Bedard P.L. et al. (2015) A phase Ib dose-escalation study of the oral pan-PI3K inhibitor buparlisib (BKM120) in combination with the oral MEK1/2 inhibitor trametinib (GSK1120212) in patients with selected advanced solid tumors. Clin. Cancer Res., 21, 730–738. [DOI] [PubMed] [Google Scholar]

[btaa317-B7] Beylkin G. (1984) The inversion problem and applications of the generalized Radon transform. Commun. Pure Appl. Math., 37, 579–599. [Google Scholar]

[btaa317-B8] Bohacek R.S. et al. (1996) The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev., 16, 3–50. [DOI] [PubMed] [Google Scholar]

[btaa317-B9] Bozic I. et al. (2013) Evolutionary dynamics of cancer in response to targeted combination therapy. Elife, 2, e00747. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B10] Ceol A. et al. (2010) Mint, the molecular interaction database: 2009 update. Nucleic Acids Res., 38, D532–D539. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B11] Chang G., Roth C.B. (2001) Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters. Science, 293, 1793–1800. [DOI] [PubMed] [Google Scholar]

[btaa317-B12] Cheng F. et al. (2019) Network-based prediction of drug combinations. Nat. Commun., 10, 1197. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B13] Chou T.-C. (2006) Theoretical basis, experimental design, and computerized simulation of synergism and antagonism in drug combination studies. Pharmacol. Rev., 58, 621–681. [DOI] [PubMed] [Google Scholar]

[btaa317-B14] Chou T.-C. (2010) Drug combination studies and their synergy quantification using the Chou-Talalay method. Cancer Res., 70, 440–446. [DOI] [PubMed] [Google Scholar]

[btaa317-B15] Clavel F., Hance A.J. (2004) HIV drug resistance. N. Engl. J. Med., 350, 1023–1035. [DOI] [PubMed] [Google Scholar]

[btaa317-B16] Cock P.J. et al. (2009) Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics, 25, 1422–1423. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B17] Darnag R. et al. (2010) Support vector machines: development of QSAR models for predicting anti-HIV-1 activity of TIBO derivatives. Eur. J. Med. Chem., 45, 1590–1597. [DOI] [PubMed] [Google Scholar]

[btaa317-B18] Das P. et al. (2019) A survey of the structures of US FDA approved combination drugs. J. Med. Chem., 62, 4265–4311. [DOI] [PubMed] [Google Scholar]

[btaa317-B19] Davis A.P. et al. (2019) The comparative toxicogenomics database: update 2019. Nucleic Acids Res., 47, D948–D954. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B20] Day D., Siu L.L. (2016) Approaches to modernize the combination drug development paradigm. Genome Med., 8, 115. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B21] DiMasi J.A. et al. (2016) Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ., 47, 20–33. [DOI] [PubMed] [Google Scholar]

[btaa317-B22] Dooley S.W. et al. (1992) Multidrug-resistant tuberculosis. Ann. Int. Med., 117, 257–259. [DOI] [PubMed] [Google Scholar]

[btaa317-B23] Engin H. et al. (2014) Network-based strategies can help mono-and poly-pharmacology drug discovery: a systems biology view. Curr. Pharm. Des., 20, 1201–1207. [DOI] [PubMed] [Google Scholar]

[btaa317-B24] Flaherty K.T. et al. (2012) Combined BRAF and MEK inhibition in melanoma with BRAF v600 mutations. N. Engl. J. Med., 367, 1694–1703. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B25] Ghany M., Liang T.J. (2007) Drug targets and molecular mechanisms of drug resistance in chronic hepatitis B. Gastroenterology, 132, 1574–1585. [DOI] [PubMed] [Google Scholar]

[btaa317-B26] Grover A., Leskovec J. (2016) node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B27] Gulrajani I. et al. (2017) Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5767–5777. [Google Scholar]

[btaa317-B28] Hajiramezanali E. et al. (2019) Variational graph recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 10700–10710. [Google Scholar]

[btaa317-B29] Hamosh A. et al. (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33, D514–D517. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B30] Hasanzadeh A. et al. (2019) Semi-implicit graph variational auto-encoders. In: Advances in Neural Information Processing Systems, pp. 10711–10722. [Google Scholar]

[btaa317-B31] Hidalgo C.A. et al. (2009) A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol., 5, e1000353. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B32] Holohan C. et al. (2013) Cancer drug resistance: an evolving paradigm. Nat. Rev. Cancer, 13, 714–726. [DOI] [PubMed] [Google Scholar]

[btaa317-B33] Hornbeck P.V. et al. (2012) Phosphositeplus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res., 40, D261–D270. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B34] Housman G. et al. (2014) Drug resistance in cancer: an overview. Cancers, 6, 1769–1792. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B35] Irwin J.J., Shoichet B.K. (2005) Zinc- a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model., 45, 177–182. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B36] Kanehisa M. et al. (2002) The KEGG database In: Novartis Foundation Symposium, Wiley Online Library, pp. 91–100. [PubMed] [Google Scholar]

[btaa317-B37] Kaplan J.-C., Junien C. (2000) Genomics and medicine: an anticipation. C. R. Acad. Sci., 323, 1167–1174. [DOI] [PubMed] [Google Scholar]

[btaa317-B38] Karimi M. et al. (2019) DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics, 35, 3329–3338. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B39] Keith C.T. et al. (2005) Multicomponent therapeutics for networked systems. Nat. Rev. Drug Discov., 4, 71–78. [DOI] [PubMed] [Google Scholar]

[btaa317-B40] Keshava Prasad T. et al. (2009) Human protein reference database-2009 update. Nucleic Acids Res., 37, D767–D772. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B41] Kipf T.N., Welling M. (2016) Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning.

[btaa317-B42] Kolouri S. et al. (2019) Generalized sliced Wasserstein distances. Adv. Neural Inf. Process. Sys., 32, 261–272. [Google Scholar]

[btaa317-B43] Lee D.-S. et al. (2008) The implications of human metabolic network topology for disease comorbidity. Proc. Natl. Acad. Sci. USA, 105, 9880–9885. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B44] Lovly C.M., Shaw A.T. (2014) Molecular pathways: resistance to kinase inhibitors and implications for therapeutic strategies. Clin. Cancer Res., 20, 2249–2256. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B45] Madani Tonekaboni S.A. et al. (2018) Predictive approaches for drug combination discovery in cancer. Brief. Bioinform., 19, 263–276. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B46] Martínez-Jiménez F., Marti-Renom M.A. (2016) Should network biology be used for drug discovery? Expert Opin. Drug Discov., 11, 1135–1137. [DOI] [PubMed] [Google Scholar]

[btaa317-B47] Matys V. et al. (2003) Transfac^®: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374–378. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B48] Mayr A. et al. (2016) DeepTox: toxicity prediction using deep learning. Front. Environ. Sci., 3, 80. [Google Scholar]

[btaa317-B49] Menche J. et al. (2015) Uncovering disease-disease relationships through the incomplete interactome. Science, 347, 1257601. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B50] Mottaz A. et al. (2008) Mapping proteins to disease terminologies: from UniProt to MeSH. BMC Bioinformatics, 9, S3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B51] Pang K. et al. (2014) Combinatorial therapy discovery using mixed integer linear programming. Bioinformatics, 30, 1456–1463. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B52] Perozzi B. et al. (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. [Google Scholar]

[btaa317-B53] Popova M. et al. (2018) Deep reinforcement learning for de novo drug design. Sci. Adv., 4, eaap7885. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B54] Preuer K. et al. (2018) DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics, 34, 1538–1546. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B55] Ramón-García S. et al. (2011) Synergistic drug combinations for tuberculosis therapy identified by a novel high-throughput screen. Antimicrob. Agents Chemother., 55, 3861–3869. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B56] Ramos E.M. et al. (2014) Phenotype–genotype integrator (PheGenI): synthesizing genome-wide association study (GWAS) data with existing genomic resources. Eur. J. Hum. Genet., 22, 144–147. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B57] Rogers F.B. (1963) Medical subject headings. Bull. Med. Libr. Assoc., 51, 114–116. [PMC free article] [PubMed] [Google Scholar]

[btaa317-B58] Rolland T. et al. (2014) A proteome-scale map of the human interactome network. Cell, 159, 1212–1226. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B59] Ruepp A. et al. (2010) CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res., 38, D497–D501. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B60] Ryu J.Y. et al. (2018) Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl. Acad. Sci. USA, 115, E4304–E4311. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B61] Saputra E.C. et al. (2018) Combination therapy and the evolution of resistance: the theoretical merits of synergism and antagonism in cancer. Cancer Res., 78, 2419–2431. [DOI] [PubMed] [Google Scholar]

[btaa317-B62] Schriml L.M. et al. (2012) Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B63] Schulman J. et al. (2017) Proximal policy optimization algorithms. arXiv Preprint arXiv : 1707.06347.

[btaa317-B64] Shafer R., Vuitton D. (1999) Highly active antiretroviral therapy (HAART) for the treatment of infection with human immunodeficiency virus type 1. Biomed. Pharmacother., 53, 73–86. [DOI] [PubMed] [Google Scholar]

[btaa317-B65] Sharma P., Allison J.P. (2015) Immune checkpoint targeting in cancer therapy: toward combination strategies with curative potential. Cell, 161, 205–214. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B66] Singh N., Yeh P.J. (2017) Suppressive drug combinations and their potential to combat antibiotic resistance. J. Antibiot., 70, 1033–1042. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B67] Stark C. et al. (2011) The biogrid interaction database: 2011 update. Nucleic Acids Res., 39, D698–D704. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B68] Su A.I. et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA, 101, 6062–6067. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B69] Svetnik V. et al. (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci., 43, 1947–1958. [DOI] [PubMed] [Google Scholar]

[btaa317-B70] Taubes G. (2008). The bacteria fight back. Science, 321, 356–361. [DOI] [PubMed] [Google Scholar]

[btaa317-B71] Toy W. et al. (2013) ESR1 ligand-binding domain mutations in hormone-resistant breast cancer. Nat. Genet., 45, 1439–1445. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B72] Turner N.C. et al. (2015) Palbociclib in hormone-receptor–positive advanced breast cancer. N. Engl. J. Med., 373, 209–219. [DOI] [PubMed] [Google Scholar]

[btaa317-B73] Van Norman G.A. (2016) Drugs, devices, and the FDA: Part 1: an overview of approval processes for drugs. JACC Basic Transl. Sci., 1, 170–179. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B74] Wishart D.S. et al. (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res., 46, D1074–D1082. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B75] Wong C.H. et al. (2019) Estimation of clinical trial success rates and related parameters. Biostatistics, 20, 273–286. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B76] You J. et al. (2018) Graph convolutional policy network for goal-directed molecular graph generation. In: Advances in Neural Information Processing Systems, pp. 6410–6421. [Google Scholar]

[btaa317-B77] Zhang W. et al. (2018) Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinformatics, 19, 1–12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btaa317-B78] Zhavoronkov A. et al. (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol., 37, 1038–1040. [DOI] [PubMed] [Google Scholar]

[btaa317-B79] Zhou X. et al. (2014) Human symptoms–disease network. Nat. Commun., 5, 1–10. [DOI] [PubMed] [Google Scholar]

PERMALINK

Network-principled deep generative models for designing drug combinations as graph sets

Mostafa Karimi

Arman Hasanzadeh

Yang Shen

Abstract

Motivation

Results

Availability and implementation

Supplementary information

1 Introduction

Fig. 1.

2 Data

2.1 Human interactome and its features

2.2 Disease–disease network

2.3 Disease–gene associations

2.4 Disease classification

2.5 FDA-approved drugs and drug combinations

3 Materials and Methods

3.1 HVGAE for representation learning

3.1.1 First level: gene–gene embedding

3.1.2 Second level: disease–disease embedding

3.2 RL-based graph-set generator for drug combinations

3.2.1 State and action space

3.2.2 Multi-objective reward

3.2.3 Policy network

4 Results

4.1 HVGAE representation compares favorably to baselines

4.1.1 Experiment setup

4.1.2 Numerical analysis and ablation study for network embedding

Table 1.

4.2 Our model generates drug combinations following network principles across diseases

4.2.1 Experiment setup

4.2.2 Numerical analysis

Table 2.

Table 3.

4.3 Case studies for specific diseases

4.3.1 Experiment setup

4.3.2 Baseline methods for drug pair combination

4.3.3 Designed pairs follow network principles and improve toxicity

Fig. 2.

4.3.4 Designed pairs reproduce approved polypharmacology strategies

4.3.5 Ablation study for RL-based drug-combination generation

Fig. 3.

5 Conclusion

Supplementary Material

Acknowledgement

Funding

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases