Abstract
The accumulation of various types of drug informatics data and computational approaches for drug repositioning can accelerate pharmaceutical research and development. However, the integration of multi-dimensional drug data for precision repositioning remains a pressing challenge. Here, we propose a systematic framework named PIMD to predict drug therapeutic properties by integrating multi-dimensional data for drug repositioning. In PIMD, drug similarity networks (DSNs) based on chemical, pharmacological, and clinical data are fused into an integrated DSN (iDSN) composed of many clusters. Beyond simple fusion, PIMD offers a systematic way to annotate clusters. Unexpected drugs within clusters and drug pairs with a high iDSN similarity score are then identified to predict novel therapeutic uses. PIMD provides new insights into the universality, individuality, and complementarity of different drug properties by evaluating the contribution of each type of property data. To test the performance of PIMD, we used chemical, pharmacological, and clinical properties to generate an iDSN. Analysis of the contribution of each drug property indicates that this iDSN is driven by all data types and performs better than the other DSNs. Within the top 20 recommended drug pairs, 7 drugs have been reported to be repurposed. The source code for PIMD is available at https://github.com/Sepstar/PIMD/.
Keywords: Drug repositioning, Drug similarity network, Multiple characterization fusion, Network pharmacology, Drug discovery
Introduction
Despite the ever-increasing funding of pharmaceutical research and development (R&D), the number of new drugs approved has not increased significantly [1]. Traditional de novo drug development remains costly, risky, and time-consuming [2], [3]. Drug repositioning, wherein an existing drug receives a new application, provides a new opportunity for pharmaceutical R&D [4]. With the accumulation of drug informatics datasets, computational algorithms can be used for systematic identification of potential new indications for on-market drugs, thereby reducing the financial and time investment, as well as the risk, involved in pharmaceutical R&D [5], [6], [7], [8].
Recent studies have shown that computational approaches based on drug similarity have the potential to reveal novel indications for on-market drugs. These approaches fall into four main groups. 1) The first group includes transcriptional response-based approaches. For instance, Iorio et al. [9] used drug-specific response profiles based on the Connectivity Map (CMap) database to find drugs with similar modes of action, whereas Xie et al. [10] used drug perturbation profiles from the Library of Integrated Network-based Cellular Signatures (LINCS) project to predict additional therapeutic properties of drugs. 2) The second group includes chemical structure-based approaches. For example, Keiser et al. [11] predicted potential drug targets based on combined drug–target structure. 3) The third group includes side effect- or other phenotype-based approaches. For example, Campillos et al. [12], [13] constructed a side effect-driven drug similarity network (DSN) based on the assumption that drugs with similar side effects may share targets, which led to the identification of novel drug targets. 4) The last group includes target property-based approaches. For example, Yildirim et al. [14] used known drug–target associations to assess ongoing trends and shifts in drug discovery and to quantify interrelationships between drug targets and disease-causing gene products. Although these studies share the assumption that similar drugs tend to share therapeutic properties, each of them relies on a single drug feature to assess drug similarity, which raises doubts about the reliability of these approaches. For example, Yildirim et al. [14] pointed out that most drugs with the same targets have different chemical structures, and Keiser et al. [15] demonstrated that a small change in drug structure can alter binding affinity dramatically.
In addition, the transcriptional response to drug perturbation may differ across cell lines and drug dosages, thus introducing noise into drug repositioning strategies based on transcriptome data. Nevertheless, our previous studies showed positive correlations between the repositioning potential estimated from transcriptome data and that estimated from side effect profiles or chemical structures [10].
With the development of network pharmacology and systems biology, integrating multi-attribute data of drugs seems to be a feasible means of identifying new opportunities for drug repositioning [16]. One of the most common methods to integrate such data is to concatenate several measurements from various properties, such as side effects and chemical fragments, of each drug [17], [18]. However, the already low signal-to-noise ratio in each data type could be diluted by concatenation [19]. To avoid this problem, many researchers have made some preliminary attempts to use DSNs based on different drug properties for data combination. For example, Napolitano et al. [20] constructed three DSNs, based on drug structures, distances between drug targets in protein–protein interaction (PPI) networks, and expression patterns of drug perturbations, separately. They then integrated these attribute datasets by averaging three drug similarity measurements to predict new therapeutic properties of drugs. Meanwhile, Wang et al. [21] proposed a new algorithm, called PreDR, which predicts as yet unidentified drug–disease associations by taking the maximums of three drug similarity matrices derived from chemical structure, target protein sequence, and side effect profile similarities. Zhang et al. [22] proposed the Similarity-based LArge-margin learning of Multiple Sources (SLAMS) algorithm of drug similarity based on multiple sources of drug and disease property data. SLAMS outputs therapeutic scores for each drug–disease pair that correspond to multi-level drug properties and disease properties, and then averages the scores to predict the novel disease applications for drugs. Liu et al. [23] proposed the two-pass random walks with restart on a heterogeneous network (TP-NRWRH) to predict new indications for approved drugs. In the model, DSNs are integrated using the probability disjunction formula. 
Many drug–target prediction studies likewise integrate data through a linear combination of multiple attributes [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42]. Although these studies take multiple drug sources into account, integration strategies based on simple averaging or maximization cannot make full use of network topology or non-linear information. Furthermore, most of these methods cannot evaluate the contribution and relative importance of each type of property data. As for network-based analysis, to our knowledge, no integrative method offers a systematic way to annotate and evaluate drug clusters. In recent years, non-linear multi-dimensional data fusion algorithms and tools, such as Similarity Network Fusion (SNF) and Integrated Clustering of Multidimensional biomedical data (ICM), have been widely applied in disease subtype identification [19], [43].
In this study, we propose a systematic and extensible paradigm of drug repositioning, namely prediction of drug therapeutic property by integrating multi-dimensional data (PIMD), and report the construction of an integrated DSN (iDSN) based on drug structure, side effect profile, and target protein sequence data. First, we collected three types of drug information, namely side effects, chemical structures, and molecular targets, representing clinical, chemical, and pharmacological properties, respectively, and constructed a DSN for each of these properties. Second, we used a non-linear fusion algorithm, SNF, to combine the three DSNs iteratively into the iDSN [19]. Next, we evaluated the contribution of each data dimension to the iDSN and the correlations among them.
Our study examined the types of data underlying the iDSN and how the iDSN performs relative to single-property networks. We used spectral clustering to divide the iDSN into clusters, and then conducted a systematic and comprehensive drug cluster analysis through five types of statistical analyses, including drug-based enrichment analysis, target-based enrichment analysis, drug property analysis, chemogenomic enrichment analysis (CGEA), and chemical ontology enrichment analysis. We hypothesize that if similar drugs have similar therapeutic properties, then the drugs that appear unexpectedly in a cluster based on their anatomical therapeutic chemical (ATC) label would represent repositioning candidates.
Method
Data source
We collected information on 7132 drugs with their corresponding target protein information from the DrugBank database, a bioinformatics resource with detailed drug data and complete drug–target interaction data. We used version 5.0 of the DrugBank database to construct the DSN based on drug targets (DSN-T) [5]. Protein sequence data were extracted from the UniProt database, which provides high-quality, freely accessible protein sequence data [44].
We obtained 139,756 relationships between 1430 drugs and 5868 side effects from the Side Effect Resource (SIDER) database, which contains information about on-market drugs and their recorded adverse drug reactions or side effects, extracted from public documents and package inserts. We used version 4.1 of SIDER to construct the DSN based on drug side effects (DSN-S) [8].
PubChem Compound is a database containing more than 92 million unique compound structures. We extracted the chemical structures of drugs from PubChem Compound to construct the DSN based on drug chemical structure (DSN-C) [7]. We used the PubChem Compound Identifier (CID) as the sole drug identifier to match drugs across the databases, and we used the drugs shared by all databases to construct the three single-property DSNs.
Drug similarity measurements
Drug similarity quantifies the degree of shared features between paired drugs. We restricted similarity scores to be between 0 (lowest) and 1 (highest). We defined the drug similarity measurements of the three properties examined separately.
Drug similarity based on drug side effects
Side effects represent the clinical properties of a drug. We obtained side effect data from SIDER and used only side effect records with frequency information, because such information is derived empirically and is thus deemed more credible. Because of the risk of observational and statistical bias, we filtered out side effect terms that occurred only once or with a frequency < 0.1%. Finally, each drug $d_i$ was characterized by a 2072-dimensional binary vector $\mathbf{s}_i$, known as its side effect profile. The side effect-based similarity between two drugs $d_i$ and $d_j$ was computed as the Tanimoto coefficient of their side effect profiles:
$$S_{\mathrm{S}}(d_i, d_j) = \frac{N_{ij}}{N_i + N_j - N_{ij}} \qquad (1)$$
where $N_i$ and $N_j$ are the numbers of side effect terms for drugs $d_i$ and $d_j$, respectively, and $N_{ij}$ is the number of side effects shared by the two drugs.
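As a minimal sketch of Eq. (1), the Python snippet below computes the Tanimoto coefficient from binary side effect profiles, both for a single drug pair and for a whole drug-by-side-effect matrix. It assumes the profiles have already been filtered and stored as 0/1 NumPy arrays; the function names are hypothetical and are not taken from the PIMD code.

```python
import numpy as np


def tanimoto_binary(profile_i: np.ndarray, profile_j: np.ndarray) -> float:
    """Tanimoto coefficient of two binary side effect profiles."""
    a = profile_i.astype(bool)
    b = profile_j.astype(bool)
    n_ij = np.count_nonzero(a & b)  # side effects shared by both drugs
    n_i, n_j = np.count_nonzero(a), np.count_nonzero(b)
    union = n_i + n_j - n_ij
    return n_ij / union if union else 0.0  # two empty profiles -> 0 by convention


def tanimoto_matrix(profiles: np.ndarray) -> np.ndarray:
    """All pairwise Tanimoto similarities for a (drugs x side effects) 0/1 matrix."""
    x = profiles.astype(float)
    shared = x @ x.T  # shared[i, j] = number of side effects shared by drugs i and j
    counts = np.diag(shared)  # counts[i] = number of side effects of drug i
    union = counts[:, None] + counts[None, :] - shared
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, shared / union, 0.0)


profiles = np.array([[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 0]])
print(tanimoto_binary(profiles[0], profiles[1]))  # 2 shared / 3 in union
```

The matrix version is the vectorized equivalent of the pairwise function: the dot product of two binary rows counts shared side effects, and the diagonal holds each drug's total.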
Drug similarity based on drug chemical structure
Drug chemical structure represents the chemical properties of a drug. We obtained chemical structure information from PubChem Compound and computed atom-pair descriptors of drugs using the R package “ChemmineR” [45]. Atom-pair descriptors, which quantify the chemical structure of small molecule compounds, encode all atom pairs in a drug. We computed the chemical structure-based similarity between drugs $d_i$ and $d_j$ as the Tanimoto coefficient of their atom-pair descriptors:
$$S_{\mathrm{C}}(d_i, d_j) = \frac{A_{ij}}{A_i + A_j - A_{ij}} \qquad (2)$$
where $A_i$ and $A_j$ represent the numbers of atom pairs for drugs $d_i$ and $d_j$, respectively, and $A_{ij}$ represents the number of atom pairs shared by the two drugs.
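The same Tanimoto form applies to atom-pair descriptors. The sketch below treats each drug's atom pairs as a multiset of descriptor strings, so a repeated atom pair counts once per occurrence. This multiset treatment and the example descriptor strings are illustrative assumptions; ChemmineR's own internal encoding of atom pairs may differ.

```python
from collections import Counter
from typing import Iterable


def atom_pair_tanimoto(pairs_i: Iterable[str], pairs_j: Iterable[str]) -> float:
    """Tanimoto coefficient of two atom-pair multisets."""
    a, b = Counter(pairs_i), Counter(pairs_j)
    shared = sum((a & b).values())  # multiset intersection keeps the minimum count
    n_i, n_j = sum(a.values()), sum(b.values())
    union = n_i + n_j - shared
    return shared / union if union else 0.0


# Hypothetical atom-pair strings of the form "atom1-atom2-topological distance"
drug_a = ["C-C-1", "C-C-1", "C-O-2"]
drug_b = ["C-C-1", "C-O-2", "N-C-3"]
print(atom_pair_tanimoto(drug_a, drug_b))  # 2 shared / (3 + 3 - 2) = 0.5
```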
Drug similarity based on drug targets
Drug targets represent the pharmacological properties of a drug. We obtained drug–target interactions and target protein sequences from the DrugBank and UniProt databases, respectively. The target-based similarity between drugs $d_i$ and $d_j$ was then computed with a normalized Smith–Waterman score as follows:
$$S_{\mathrm{T}}(d_i, d_j) = \frac{1}{|P_i|\,|P_j|} \sum_{k=1}^{|P_i|} \sum_{l=1}^{|P_j|} \frac{SW(p_{i,k}, p_{j,l})}{\sqrt{SW(p_{i,k}, p_{i,k})\, SW(p_{j,l}, p_{j,l})}} \qquad (3)$$
where $P_i$ represents the target protein set of drug $d_i$, $p_{i,k}$ is the $k$th target of drug $d_i$, and $|P_i|$ is the size of the target protein set of drug $d_i$. $SW(\cdot, \cdot)$ is the Smith–Waterman sequence alignment score between target proteins of drugs $d_i$ and $d_j$ [46].
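The sketch below illustrates a target-based similarity of this kind. To stay self-contained, it uses a toy Smith–Waterman scorer with fixed match/mismatch scores and a linear gap penalty; a real analysis would use a substitution matrix such as BLOSUM62 with affine gaps. Each cross-alignment score is normalized by the geometric mean of the two self-alignment scores, so identical sequences score 1, and the drug-level score averages over all target pairs. All function names and scoring values are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import Sequence


def smith_waterman(s: str, t: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        curr = [0] * (len(t) + 1)
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # local: floor at 0
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_sw(p: str, q: str) -> float:
    """Cross score divided by the geometric mean of the self-alignment scores."""
    self_p, self_q = smith_waterman(p, p), smith_waterman(q, q)
    if self_p == 0 or self_q == 0:
        return 0.0
    return smith_waterman(p, q) / sqrt(self_p * self_q)


def target_similarity(targets_i: Sequence[str], targets_j: Sequence[str]) -> float:
    """Average normalized score over all target pairs of two drugs."""
    if not targets_i or not targets_j:
        return 0.0
    total = sum(normalized_sw(p, q) for p, q in product(targets_i, targets_j))
    return total / (len(targets_i) * len(targets_j))
```

For example, `target_similarity(["ACGT", "WWWW"], ["ACGT"])` averages a perfect match (1.0) with a pair sharing no residues (0.0).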
SNF method
We used the R package “SNFtool” to perform SNF, a popular computational method for data integration in the field of disease subtype identification [19]. SNF can handle noise in different data types and makes full use of common and complementary information across data types by integrating data in a non-linear way. In this study, we introduced SNF into the integration of multi-dimensional drug informatics data for the first time, in the following three steps. 1) A DSN is built for each data type. 2) The DSNs are integrated with SNF: each DSN is updated iteratively with information from the other networks, making the networks more similar to each other with each iteration. SNF has three main parameters: the hyperparameter (η), the number of neighbors (K), and the number of iterations (T). The integration is robust to these parameters, as described previously [19], and we set η, K, and T to the values recommended by Wang and colleagues [19]. 3) The final iDSN is obtained when the SNF process converges.
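To make the iterative update in step 2 concrete, the NumPy sketch below follows the SNF update rule described by Wang et al. [19]: each network is converted into a full kernel P (off-diagonal entries row-normalized to sum to 1/2, self-similarity fixed at 1/2) and a sparse K-nearest-neighbor kernel S, and each P is repeatedly diffused through its own S using the average of the other networks. It is a simplified illustration, not SNFtool itself: it takes the DSNs directly as similarity matrices and therefore skips SNFtool's step of turning distances into affinities (where η enters), and the default K and T values are placeholders.

```python
import numpy as np


def full_kernel(w: np.ndarray) -> np.ndarray:
    """Row-normalize off-diagonal entries to sum to 1/2 and set the diagonal to 1/2."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    row = w.sum(axis=1, keepdims=True)
    p = np.divide(w, 2.0 * row, out=np.zeros_like(w), where=row > 0)
    np.fill_diagonal(p, 0.5)
    return p


def knn_kernel(w: np.ndarray, k: int) -> np.ndarray:
    """Keep each row's k largest similarities and row-normalize them."""
    w = np.array(w, dtype=float)
    s = np.zeros_like(w)
    nearest = np.argsort(-w, axis=1)[:, :k]
    rows = np.arange(w.shape[0])[:, None]
    s[rows, nearest] = w[rows, nearest]
    row = s.sum(axis=1, keepdims=True)
    return np.divide(s, row, out=np.zeros_like(s), where=row > 0)


def snf(networks: list, k: int = 20, t: int = 20) -> np.ndarray:
    """Fuse m >= 2 symmetric drug similarity matrices into one matrix."""
    m = len(networks)
    if m < 2:
        raise ValueError("SNF needs at least two networks")
    p = [full_kernel(w) for w in networks]
    s = [knn_kernel(w, k) for w in networks]
    for _ in range(t):
        # every update uses the previous iteration's P matrices
        updated = [s[v] @ (sum(p[u] for u in range(m) if u != v) / (m - 1)) @ s[v].T
                   for v in range(m)]
        p = [(q + q.T) / 2 for q in (full_kernel(x) for x in updated)]
    fused = full_kernel(sum(p) / m)
    return (fused + fused.T) / 2
```

With three single-property matrices of the same drug order (hypothetically `dsn_c`, `dsn_t`, `dsn_s`), `snf([dsn_c, dsn_t, dsn_s])` returns a symmetric fused matrix that plays the role of the iDSN.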
Cluster validity index
We used two cluster validity indexes to determine the number of clusters.
Dunn index
The Dunn index is the ratio of the smallest distance between observations not in the same cluster to the largest intra-cluster distance. The Dunn index has a value between zero and infinity and should be maximized. The Dunn index is calculated as follows:
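$$D = \frac{\min\limits_{C_k, C_l \in C,\; k \neq l} \delta(C_k, C_l)}{\max\limits_{C_m \in C} \Delta(C_m)} \qquad (4)$$
where $C$ is the collection of all clusters, $\Delta(C_m)$ is the largest intra-cluster distance in Cluster $C_m$, and $\delta(C_k, C_l)$ is the distance between the nearest pair of samples in Cluster $C_k$ and Cluster $C_l$.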
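A direct translation of Eq. (4) is sketched below. It assumes a precomputed pairwise distance matrix (for a similarity network, one hypothetical choice is 1 minus similarity) and one cluster label per drug; the function name is illustrative.

```python
import numpy as np


def dunn_index(dist: np.ndarray, labels) -> float:
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    max_diameter = 0.0
    for c in clusters:
        members = labels == c
        if members.sum() > 1:
            max_diameter = max(max_diameter, dist[np.ix_(members, members)].max())
    min_separation = np.inf
    for idx, a in enumerate(clusters):
        for b in clusters[idx + 1:]:
            # distance between the nearest pair of samples in clusters a and b
            sep = dist[np.ix_(labels == a, labels == b)].min()
            min_separation = min(min_separation, sep)
    return min_separation / max_diameter if max_diameter > 0 else np.inf


x = np.array([0.0, 1.0, 10.0, 11.0])
d = np.abs(x[:, None] - x[None, :])
print(dunn_index(d, [0, 0, 1, 1]))  # separation 9 / diameter 1
```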
Silhouette index
The silhouette value is a measure of how well each object lies within its cluster. The silhouette value ranges from −1 to 1, and should be maximized. It is calculated as follows:
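$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (5)$$
where $a(i)$ is the average distance between sample $i$ and all other samples in the same cluster, and $b(i)$ is the lowest average distance from sample $i$ to all points in any other cluster. The silhouette index of a clustering is the mean of $s(i)$ over all samples.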
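The per-sample silhouette values of Eq. (5) can be sketched as follows, again assuming a precomputed distance matrix and at least two clusters. Setting singleton clusters to 0 is a common convention that this hypothetical helper adopts.

```python
import numpy as np


def silhouette_values(dist: np.ndarray, labels) -> np.ndarray:
    """Per-sample silhouette s(i) from a distance matrix; needs at least two clusters."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False  # exclude the sample itself from a(i)
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in clusters if c != own)
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s
```

The silhouette index of a partition is then `silhouette_values(dist, labels).mean()`.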
Evaluation measurements
Several evaluation measurements were used in the study, as introduced below.
Normalized mutual information
In probability theory and information theory, the mutual information (MI) of two random variables measures their mutual dependence. In this study, $X$ is the ATC label vector of all 593 drugs in the iDSN, and $Y$ is the predicted label vector obtained by clustering these drugs. MI was calculated as follows:
$$I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \qquad (6)$$
where $p(x,y)$ is the joint probability function of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability distribution functions of $X$ and $Y$, respectively. Normalized MI (NMI) is calculated as follows:
$$\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}} \qquad (7)$$
where $H(X)$ and $H(Y)$ are the entropies of $X$ and $Y$, respectively:
$$H(X) = -\sum_{x} p(x) \log p(x) \qquad (8)$$
where $p(x)$ is the marginal probability distribution function of $X$.
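The sketch below computes MI, entropy, and NMI from two label vectors, such as first-level ATC labels and cluster labels. It assumes one ATC label per drug and uses the geometric-mean normalization $\sqrt{H(X)H(Y)}$; other NMI variants normalize by the arithmetic mean or the maximum of the two entropies, so values can differ slightly between tools.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def nmi(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant labeling carries no information
    return mutual_information(x, y) / math.sqrt(hx * hy)


atc = ["N", "N", "C", "C", "L"]
clusters = [1, 1, 2, 2, 2]
print(nmi(atc, clusters))
```

NMI is invariant to renaming labels, so cluster IDs need not match ATC letters.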
ATC overlap rate
In drug set $D$, the number of drug pairs is denoted as Sum. For each drug pair $(d_i, d_j)$ in the set, we calculated the intersection of the first-level ATC codes of drugs $d_i$ and $d_j$. If the intersection was not empty, $o_{ij} = 1$; otherwise, $o_{ij} = 0$. The ATC overlap rate (AOR) in drug set $D$ is computed as follows:
$$\mathrm{AOR}(D) = \frac{1}{\mathrm{Sum}} \sum_{d_i, d_j \in D,\; i < j} o_{ij} \qquad (9)$$
Superclass overlap rate
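In drug set $D$, the number of drug pairs is denoted as Sum. For each drug pair $(d_i, d_j)$ in the set, $u_{ij} = 1$ if the superclass label of drug $d_i$ is the same as that of drug $d_j$; otherwise, $u_{ij} = 0$. The superclass overlap rate (SOR) in drug set $D$ is computed as follows:
$$\mathrm{SOR}(D) = \frac{1}{\mathrm{Sum}} \sum_{d_i, d_j \in D,\; i < j} u_{ij} \qquad (10)$$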
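Both overlap rates reduce to counting qualifying pairs, as in the sketch below. It assumes ATC codes are stored as full code strings whose first character is the first-level group; the drug names and codes in the usage line are illustrative only.

```python
from itertools import combinations
from typing import Dict, Set


def atc_overlap_rate(atc_codes: Dict[str, Set[str]]) -> float:
    """Share of drug pairs whose first-level ATC codes intersect (AOR)."""
    first_level = {d: {code[0] for code in codes} for d, codes in atc_codes.items()}
    pairs = list(combinations(first_level, 2))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if first_level[a] & first_level[b])
    return hits / len(pairs)


def superclass_overlap_rate(superclass: Dict[str, str]) -> float:
    """Share of drug pairs with identical superclass labels (SOR)."""
    pairs = list(combinations(superclass, 2))
    if not pairs:
        return 0.0
    return sum(superclass[a] == superclass[b] for a, b in pairs) / len(pairs)


codes = {"d1": {"M01AE04"}, "d2": {"M01AB05", "A07EC01"}, "d3": {"A07EC01"}, "d4": {"N02BE01"}}
print(atc_overlap_rate(codes))  # 2 of 6 pairs share a first-level code
```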
Connectivity
Connectivity indicates the degree of connectedness of the clusters. Let $N$ be the number of observations, $C$ the collection of all clusters, and $L$ the number of nearest neighbors considered. Define $nn_{i(j)}$ as the $j$th nearest neighbor of observation $i$, and let $x_{i,nn_{i(j)}}$ be zero if $i$ and $nn_{i(j)}$ are in the same cluster and $1/j$ otherwise. The connectivity is defined as:
$$\mathrm{Conn}(C) = \sum_{i=1}^{N} \sum_{j=1}^{L} x_{i,nn_{i(j)}} \qquad (11)$$
The connectivity has a value between zero and infinity and should be minimized.
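Eq. (11) can be sketched as follows, assuming a distance matrix as input. Ties between equidistant neighbors are broken by index order, which is an implementation choice of this hypothetical helper rather than part of the definition.

```python
import numpy as np


def connectivity(dist: np.ndarray, labels, n_neighbors: int = 10) -> float:
    """Connectivity of a partition; 0 is best, larger is worse."""
    labels = np.asarray(labels)
    d = np.array(dist, dtype=float)
    np.fill_diagonal(d, np.inf)  # an observation is not its own neighbor
    nearest = np.argsort(d, axis=1, kind="stable")[:, :n_neighbors]
    conn = 0.0
    for i, neighbors in enumerate(nearest):
        for rank, nb in enumerate(neighbors, start=1):
            if labels[nb] != labels[i]:
                conn += 1.0 / rank  # penalty 1/j for the j-th neighbor in another cluster
    return conn
```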
Rogers–Tanimoto index
The Rogers–Tanimoto similarity relies on a 2 × 2 contingency table consisting of four cells: $a$, $b$, $c$, and $d$. $a$ is the number of observation pairs in which the two observations belong to the same cluster according to both partitions $P_1$ and $P_2$. $b$ is the number of observation pairs in which the two observations belong to the same cluster according to partition $P_1$ but not $P_2$. $c$ is the number of observation pairs in which the two observations belong to the same cluster according to partition $P_2$ but not $P_1$. $d$ is the number of observation pairs in which the two observations do not belong to the same cluster according to either partition. The Rogers–Tanimoto similarity is defined as:
$$RT = \frac{a + d}{a + d + 2(b + c)} \qquad (12)$$
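The pair counts $a$, $b$, $c$, and $d$ can be tallied directly over all observation pairs, as in this sketch; the two inputs are label vectors in the same drug order (for example, first-level ATC labels and cluster labels).

```python
from itertools import combinations
from typing import Hashable, Sequence


def rogers_tanimoto(part1: Sequence[Hashable], part2: Sequence[Hashable]) -> float:
    """Rogers-Tanimoto agreement between two partitions of the same observations."""
    a = b = c = d = 0
    for i, j in combinations(range(len(part1)), 2):
        same1 = part1[i] == part1[j]
        same2 = part2[i] == part2[j]
        if same1 and same2:
            a += 1  # together in both partitions
        elif same1:
            b += 1  # together only in partition 1
        elif same2:
            c += 1  # together only in partition 2
        else:
            d += 1  # apart in both partitions
    return (a + d) / (a + d + 2 * (b + c))
```

Because only co-membership is compared, the score is unaffected by how clusters are numbered.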
Other integrative methods for comparison
We compared the network fusion performance of PIMD with three previous integrative methods: 1) the maximum method [21], 2) the weighted average method [20], [22], [24], [25], [26], [27], [28], [29], [30], [31], [32], and 3) the probability disjunction method [23]. For the maximum method, we took the element-wise maximum of the drug similarity matrices. For the weighted average method, we averaged the drug similarity matrices while traversing the weights, with the weight of each DSN ranging from 0 to 1 in steps of 0.1. For the probability disjunction method, the formula is:
$$S(d_i, d_j) = 1 - \prod_{m \in \{\mathrm{S},\,\mathrm{C},\,\mathrm{T}\}} \bigl(1 - S_m(d_i, d_j)\bigr) \qquad (13)$$
where $S(d_i, d_j)$ is the integrative similarity measurement between drugs $d_i$ and $d_j$, and $S_m(d_i, d_j)$ is their similarity in the single-property DSN $m$ (DSN-S, DSN-C, or DSN-T).
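The three baseline integrations can be sketched in a few lines of NumPy. Whether the traversed weights were rescaled to sum to 1 is not stated above; the sketch assumes they are, and `weight_grid` is a hypothetical helper that enumerates the 0 to 1 grid in steps of 0.1 while skipping the all-zero vector.

```python
from itertools import product

import numpy as np


def integrate_max(nets):
    """Maximum method: element-wise maximum of the similarity matrices."""
    return np.maximum.reduce([np.asarray(s, dtype=float) for s in nets])


def integrate_weighted(nets, weights):
    """Weighted average method; weights are rescaled to sum to 1 (assumption)."""
    w = np.asarray(weights, dtype=float)
    return sum(wi * np.asarray(s, dtype=float) for wi, s in zip(w, nets)) / w.sum()


def integrate_disjunction(nets):
    """Probability disjunction: 1 - prod_m (1 - S_m)."""
    return 1.0 - np.prod([1.0 - np.asarray(s, dtype=float) for s in nets], axis=0)


def weight_grid(n_nets, step=0.1):
    """All weight vectors on a 0..1 grid with the given step, excluding all zeros."""
    levels = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    return [w for w in product(levels, repeat=n_nets) if sum(w) > 0]
```

Unlike SNF, each of these combines drug pairs independently, so none of them uses information from neighboring drugs in the network.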
Data type contribution
For each edge in the iDSN, we used the similarity scores from each single-property network to determine which data type was the primary contributor. First, we ranked the three similarity scores of the edge as $S_i \geq S_j \geq S_k$, where $i$, $j$, and $k$ refer to the three types of data. If $S_i$ was at least 10% higher than $S_j$, the edge was attributed to data type $i$. If $S_i$ was < 10% higher than $S_j$ but $S_j$ was at least 10% higher than $S_k$, the edge was attributed to both data types $i$ and $j$. If $S_i$ was < 10% higher than $S_j$ and $S_j$ was < 10% higher than $S_k$, the edge was attributed to all three data types.
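The attribution rule above can be written as a small function. The sketch reads "10% higher" as a relative margin ($S_i \geq 1.1\,S_j$), which is an assumption; the data-type names in the test are illustrative.

```python
from typing import Dict, Tuple


def attribute_edge(scores: Dict[str, float], margin: float = 0.10) -> Tuple[str, ...]:
    """Return the data type(s) credited with an iDSN edge."""
    if len(scores) != 3:
        raise ValueError("expected one score per data type (three in total)")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (t_i, s_i), (t_j, s_j), (t_k, s_k) = ranked
    if s_i >= (1 + margin) * s_j:  # top score clearly dominates
        return (t_i,)
    if s_j >= (1 + margin) * s_k:  # top two are close, third clearly lower
        return (t_i, t_j)
    return (t_i, t_j, t_k)  # all three scores are close
```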
Statistical analysis
We performed five types of enrichment analyses to annotate the drug clusters for drug precision repositioning. For drug-based and target-based enrichment analyses, we calculated enrichment score (ES) as follows:
$$\mathrm{ES} = \frac{k / n}{M / N} \qquad (14)$$
where $k$ is the number of drugs with a particular label (e.g., ATC code) in the cluster of interest, $M$ is the number of drugs with the label in the overall dataset, $n$ is the total number of drugs in the cluster of interest, and $N$ is the total number of drugs in the overall dataset. We then used the hypergeometric distribution to calculate the P value as follows:
$$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}} \qquad (15)$$
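The enrichment score and its hypergeometric P value can be sketched with exact integer binomials from the standard library. Summing the upper tail from $k$ directly is mathematically equal to one minus the lower tail and avoids cancellation when P is very small; the function names are hypothetical.

```python
from math import comb


def enrichment_score(k: int, n: int, m: int, big_n: int) -> float:
    """ES = (k / n) / (M / N): label frequency in the cluster relative to the whole dataset."""
    return (k / n) / (m / big_n)


def hypergeom_pvalue(k: int, n: int, m: int, big_n: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N drugs, M labeled, n drawn)."""
    upper = min(n, m)
    tail = sum(comb(m, i) * comb(big_n - m, n - i) for i in range(k, upper + 1))
    return tail / comb(big_n, n)


# Hypothetical cluster: 8 of its 20 drugs carry a label held by 100 of 593 drugs
print(enrichment_score(8, 20, 100, 593), hypergeom_pvalue(8, 20, 100, 593))
```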
Drug-based enrichment analysis
For each cluster, we performed drug class and absorption, distribution, metabolism, excretion, and toxicity (ADMET) property enrichment analyses based on the DrugBank database. We computed ESs and P values based on drug ATC code, superclass label, and ADMET properties.
Target-based enrichment analysis
For target proteins of each cluster, we performed target class, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, and Gene Ontology (GO) enrichment analyses based on the International Union of Basic and Clinical Pharmacology/British Pharmacological Society (IUPHAR/BPS), KEGG, and GO databases, respectively [47], [48], [49]. KEGG and GO analyses were conducted with the ClusterProfiler tool in R software [50]. KEGG and GO enrichment results were selected by a P value threshold of 0.05.
Drug property analysis
Physicochemical features were extracted from a drug property list in RepurposeDB [51]. Chemical descriptors were computed with the Pybel, JOELib2, and ChemmineR chemoinformatics libraries [45], [52], [53]. We used t-tests to evaluate whether the mean value of each property in each cluster deviated significantly from that of all drugs in DrugBank. In each cluster, for each property $p$ in property set $P$, the deviation from the mean value of all drugs in DrugBank is calculated as follows:
(16) |
where $\bar{x}_p$ is the mean value of property $p$ for all drugs in DrugBank, and $\bar{x}_p^{c}$ is the mean value of property $p$ for the drugs in the cluster.
Chemogenomic enrichment analysis
We used the CGEA online tool (http://server.dudleylab.org/index) to analyze the drugs in each cluster. CGEA maps a drug list to various annotation resources, including drug-induced transcriptional modules, enzymes, and fragments. The potentially relevant features identified in this way provide a rich chemogenomic context for the drugs of interest.
Chemical ontology enrichment analysis
BiNChE is a web tool for chemical enrichment analysis based on the Chemical Entities of Biological Interest (ChEBI) Ontology [54]. We performed enrichment analysis by mapping the drugs in each cluster to both ‘structure’ and ‘role’ subsets of ChEBI Ontology.
Results
An overview of PIMD framework
PIMD is a systematic framework that integrates multiple drug properties to predict new therapeutic properties of known drugs, calculate the contributions of the various data types, and annotate drug groupings from various aspects. In this study, we hypothesize that if drug dA and drug dB are similar with respect to a particular parameter, then the therapeutic property t of drug dA may also be shared by drug dB. The functions of a drug can be characterized by multiple types of drug informatics data. Therefore, in PIMD, we investigated three representative drug properties: 1) chemical properties, based on chemical structure data derived from the PubChem Compound database [7]; 2) pharmacological properties, based on protein target sequence data derived from the DrugBank and UniProt databases [5], [44]; and 3) clinical and phenotypic properties, based on side effect data derived from the SIDER database [8].
PIMD consists of three parts (Figure 1). In the first part, we constructed three DSNs based on side effects, chemical structure, and drug targets, denoted as DSN-S, DSN-C, and DSN-T, respectively. Then, we applied the SNF method to fuse the DSN-S, DSN-C, and DSN-T into the iDSN in a manner that enables the relative contribution of each data type to be calculated. In the second part, we assessed drug repositioning using two approaches: finding drug pairs with a high similarity score, and identifying unexpected drugs in each cluster. In the iDSN, drug pairs are ranked according to their similarity scores, and the highest-ranked drug pairs are the most likely candidates for repositioning, wherein one drug in a pair may be repurposed for the therapeutic properties of the other. We used spectral clustering to identify clusters within the iDSN [55], with the expectation that drugs within the same cluster share therapeutic properties, as reflected by their first-level ATC codes. Within each cluster, drugs with unexpected ATC labels are flagged as candidates for repositioning. In the third part, aimed at precision drug repositioning, we carried out the following series of enrichment analyses to label and annotate each cluster: 1) drug-based enrichment analysis; 2) target-based enrichment analysis; 3) drug property analysis; 4) CGEA; and 5) chemical ontology enrichment analysis. These analyses can guide drug repositioning studies from various perspectives. They may also help to elucidate differences in the modes of action of different drug clusters and reveal potential associations between drugs within a cluster.
Global analysis of iDSN
We used PubChem CIDs as the sole drug identifiers and extracted chemical structures, target sequences, and side effect data for 593 drugs from the PubChem Compound, DrugBank, UniProt, and SIDER databases. After constructing the DSN-C, DSN-T, and DSN-S and combining these three networks into the iDSN with the SNF method, we applied spectral clustering to divide the iDSN into 32 subnetworks (labeled Cluster 1–32) based on two validity indexes (Figure 2, Figure S1) [55], [56].
In the iDSN, each node represents a drug, and the thickness of each edge reflects the similarity of the connected drugs. Node color represents the cluster to which the drug belongs. Note that high-similarity connections are found predominantly within clusters. Edge color indicates the data type that is the main contributor to the similarity between the drugs, following the color scheme of the pie chart. The iDSN as a whole is supported by all analyzed types of data; in particular, drug pair similarities in the iDSN are supported by two or more types of data, and among the three single drug properties, drug target-based data make the highest relative contribution.
We divided the iDSN edges into two categories: within-cluster edges and between-cluster edges. Data contribution analysis of within-cluster edges (Figure S2) shows that side effect-based data, chemical structure-based data, and drug target-based data account for 11.0%, 22.2%, and 43.9%, respectively. Although drug target-based and side effect-based data contribute more than chemical structure-based data across all edges in the iDSN (Figure 2), side effect-based data contribute the least to within-cluster edges, and drug target-based data contribute the most (Figure S2). These results indicate that drug target-based data play a more important role in drug clustering. Data contribution details for each cluster are reported in Table S1. For instance, chemical structure-based data account for 94.4% of the contribution in Cluster 6. All Cluster 6 drugs are peptide drugs, whose chemical structures differ markedly from those of other drugs. Cluster 6 drugs also show significant deviations in physicochemical features (Figure S3A and B), consistent with the nature of these peptide drugs.
Comparing the iDSN with each single-property network, we found that the iDSN has a clearer cluster structure than any of the three single-property DSNs (Figure 3A). To verify this, we calculated the Dunn index, a metric for evaluating the quality of clustering results. The Dunn index for the iDSN is higher than those for the single-property DSNs (Figure 3B).
Furthermore, compared with the single-property DSNs, the iDSN shows a larger overlap between drug clusters and drug ATC labels. NMI, which reflects the consistency between the original ATC labels and the cluster labels, was higher for the iDSN than for the DSN-C, DSN-T, or DSN-S (Figure 3C). In addition, we compared the AOR of each single-property DSN and the iDSN. We first ranked the similarity scores of all drug pairs in the iDSN or single-property DSN and then grouped the ranked pairs into overlapping bins of 3000 consecutive pairs, such that bin 1 contains the 3000 most similar drug–drug pairs, bin 2 contains the pairs ranked 101st to 3100th, and so on (the first 320 bins are shown in Figure 3D). The AORs calculated for each bin are clearly higher for the iDSN than for the single-property DSNs (Figure 3D). These results suggest that PIMD makes full use of common and complementary information about drug properties and that the iDSN performs better than any single-property DSN.
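The binned AOR curve can be sketched as a sliding window over ranked drug pairs. The sketch assumes a window of 3000 pairs advanced 100 pairs at a time, as implied by the bin boundaries above, and one set of first-level ATC codes per drug; the function name is hypothetical.

```python
import numpy as np


def binned_aor(similarity: np.ndarray, first_level_atc: list,
               window: int = 3000, step: int = 100) -> np.ndarray:
    """AOR of overlapping bins of ranked pairs (bin b covers ranks b*step+1 .. b*step+window)."""
    n = similarity.shape[0]
    iu, ju = np.triu_indices(n, k=1)  # each unordered drug pair once
    order = np.argsort(-similarity[iu, ju], kind="stable")  # most similar pairs first
    hits = np.array(
        [bool(first_level_atc[iu[o]] & first_level_atc[ju[o]]) for o in order], dtype=float
    )
    starts = range(0, len(hits) - window + 1, step)
    return np.array([hits[s:s + window].mean() for s in starts])
```

Running the same function on each single-property DSN and on the iDSN gives directly comparable curves, because the windows are defined purely by rank.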
The network fusion of PIMD goes beyond a simple integration representing a maximum or an average of drug similarity measurements. It can capture potential links between drugs. On the one hand, if the similarity score in a single-property DSN is high, but the similarity scores in other single-property DSNs are low, PIMD does not dilute the original information. On the other hand, if the similarity scores for a drug pair in each of the three single-property DSNs are unremarkable, PIMD can still capture potential similarities. For example, the first-level ATC codes for fenoprofen and sulfasalazine are M and A, respectively. Fenoprofen (CID: 000003342) is used for symptomatic relief of rheumatoid arthritis, osteoarthritis, and mild to moderate pain, whereas sulfasalazine (CID: 005359476) is used to treat inflammatory bowel disease. The similarity scores for fenoprofen and sulfasalazine based on the three individual properties are relatively low (rank of 16,951 in DSN-C, 13,466 in DSN-T, and 146,145 in DSN-S), while the iDSN similarity score ranked much higher (rank of 2729) and both drugs were placed in Cluster 8. Although crossover in an application would not have been predicted by any of the single-property DSNs, it has been reported that sulfasalazine can be used to treat rheumatoid arthritis [57], [58].
PIMD shows better performance than previous integrated methods
To further evaluate the network fusion performance of PIMD, we compared our results with those of three previous integrative methods: 1) the maximum method [21], 2) the weighted average method [20], [22], [24], [25], [26], [27], [28], [29], [30], [31], [32], and 3) the probability disjunction method [23]. We calculated two internal indices (connectivity and Dunn index) and two external indices (NMI and Rogers–Tanimoto index). Internal indices measure the goodness of a clustering structure without external information [59]. External indices measure the agreement between two partitions, where the first partition is the a priori known clustering structure and the second results from the clustering procedure [60]. Connectivity indicates the degree of connectedness of the clusters, and the Rogers–Tanimoto index reflects the similarity between the original ATC labels and the cluster labels. Of the four indices, only connectivity should be minimized. PIMD performed best among the compared methods (Figure 3E). We attribute the superior network fusion performance of PIMD to its network-based approach, which is non-linear and uses the topology information of the network.
Drug repositioning from two aspects
There are two approaches for drug repositioning: the first is finding drug pairs with a high similarity score, and the second is finding unexpected drugs in each cluster. Drug pairs with high similarity in the iDSN can provide clues for drug repositioning. The AORs of the top 10 drug pairs (Table 1) and the top 100 drug pairs reached 80% and 94%, respectively (the top 1000 drug pairs are listed in Table S2). Drug pairs with high similarity scores but different ATC codes may have potential for repositioning. For example, triptorelin (CID: 025074470) and nafarelin acetate (CID: 025077649) are both gonadotropin-releasing hormone receptor agonists despite having different first-level ATC codes. Notably, this drug pair has a high iDSN similarity score and ranks second in the drug association list. Within the top 20 drug pairs, 7 drugs have been repurposed successfully according to RepurposeDB [51] and the Repurposed Drug Database (http://drugrepurposingportal.com/), which catalog drugs repurposed to date. We further checked whether the repositioning recorded in these databases is relevant to our predictions and found that 6 of these 7 drugs have been repositioned for the same purpose as we predicted.
Table 1.
Note: Similarity scores between drug pairs among the 593 drugs in the iDSN were calculated using SNF and the 10 drug pairs with highest similarity scores were selected. iDSN, integrated drug similarity network; SNF, Similarity Network Fusion; ATC, anatomical therapeutic chemical; A, alimentary tract and metabolism; C, cardiovascular system; D, dermatological; H, systemic hormonal preparations, excluding sex hormones and insulins; J, anti-infectives for systemic use; L, antineoplastic and immunomodulating agents; M, musculoskeletal system; N, nervous system; R, respiratory system; S, sensory organs; V, various.
Unexpected drugs within an iDSN drug cluster can also signify the potential for repositioning based on distinct therapeutic properties. That is, a drug with an unusual ATC code within a certain cluster may be repositioned for alternative therapeutic properties. The unexpected drugs in each cluster are indicated in Table S3, and some cases are discussed in the section of drug repositioning case using PIMD.
We then compared the two drug repositioning approaches. Among the top 100 drug pairs ranked by similarity score, 6 pairs have entirely different first-level ATC codes, and 3 of these can also be discovered using the second approach. Among the top 1000 drug pairs, 234 pairs have entirely different first-level ATC codes, and 75 of these can also be discovered using the second approach. These data indicate some overlap between the predictions of the two approaches. Unexpected drugs within clusters support repositioning across different ATC codes, whereas the first approach is more inclined to yield repositioning among top-ranking drug pairs that share ATC codes.
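The comparison can be expressed as a small set computation. This section does not spell out when a cross-ATC pair counts as "discovered using the second approach"; the sketch assumes a pair qualifies when either drug is flagged as unexpected in its cluster, and `approach_overlap` is a hypothetical helper. The last two lines only restate the reported counts as fractions.

```python
def approach_overlap(pairs, atc, unexpected):
    """Count cross-ATC pairs among top-ranked pairs and how many the cluster approach also finds.

    pairs: (drug_a, drug_b) tuples from the similarity ranking
    atc: drug -> set of first-level ATC letters
    unexpected: set of drugs flagged as unexpected within their clusters
    Assumption: a pair is 'also found' if either of its drugs is flagged.
    """
    cross = [(a, b) for a, b in pairs if not (atc[a] & atc[b])]
    found = [(a, b) for a, b in cross if a in unexpected or b in unexpected]
    return len(cross), len(found)

# Fractions implied by the counts reported above.
print(f"top 100:  {3 / 6:.0%} of cross-ATC pairs also found")    # 50%
print(f"top 1000: {75 / 234:.0%} of cross-ATC pairs also found")  # 32%
```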
To explore global repositioning associations among ATC codes, we analyzed the top 5% of drug pairs in the iDSN. After dividing the drugs into 14 groups according to their first-level ATC codes, we averaged the similarity scores between each pair of drug groups and used these averages as indices of the repositioning potential between ATC codes. As shown in Figure 4 (where edge thickness represents the repositioning potential between two ATC codes), repositioning potential is particularly high between dermatological drugs and respiratory system drugs. Among the 78 dermatological (D) and respiratory system (R) drugs, 23 have been successfully repurposed between D and R (Table S4).
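The ATC-level aggregation can be sketched as follows. This is an illustrative reading of the procedure, not PIMD's code: it assumes that drugs with several first-level ATC codes contribute to every group they belong to and that the potential between two groups is the plain mean of the retained pair scores. `atc_repositioning_potential` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

def atc_repositioning_potential(sim, drugs, atc, top_fraction=0.05):
    """Mean similarity between first-level ATC groups over the top-ranked drug pairs.

    Returns {(atc_x, atc_y): mean score} with each key sorted alphabetically.
    Drugs with several ATC codes contribute to every group they belong to.
    """
    ranked = sorted(((sim[i][j], i, j) for i, j in combinations(range(len(drugs)), 2)),
                    reverse=True)
    kept = ranked[:int(len(ranked) * top_fraction)]
    totals, counts = defaultdict(float), defaultdict(int)
    for score, i, j in kept:
        for x in atc[drugs[i]]:
            for y in atc[drugs[j]]:
                key = tuple(sorted((x, y)))
                totals[key] += score
                counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

With the 593 iDSN drugs there are 175,528 unordered pairs, so `top_fraction=0.05` keeps the 8,776 highest-scoring pairs; the resulting (D, R) entry would correspond to one edge weight in a Figure 4-style network.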
Drug cluster annotation in the iDSN
To label and annotate clusters for precision drug repositioning, we analyzed the iDSN clusters with five statistical methods from drug and target perspectives, providing multiple views with which researchers in different fields can verify and select drug clusters. These analyses reveal potential links between drugs in the same cluster and differences in the modes of action of drugs in different clusters.
ATC, superclass, and ADMET property enrichment analyses of drugs
We set out to explore to what extent drugs within a cluster share a common ATC code. For each cluster, we computed an ATC code enrichment score and an accompanying P value. We found that 30 of the 32 clusters were significantly enriched for at least one ATC code (P < 0.05; Figure 5A). Furthermore, 25 of the 32 clusters were significantly enriched for at least one superclass label (P < 0.05; Figure 5B). The superclass label of a drug is extracted from the DrugBank database and mainly reflects the chemical attributes of drugs. Combining the two enrichment results may lead to new discoveries. For example, Cluster 22 was enriched for the lignan/norlignan superclass as well as for the 'antineoplastic and immunomodulating agents' ATC code. Recent studies have shown that lignans/norlignans play an important role in anti-cancer therapies [61]. In addition, we extracted 18 ADMET property terms from the DrugBank database and examined whether drugs within the same cluster tend to share ADMET properties (Figure S3C). For example, of the 6 clusters enriched for the 'nervous system' ATC code, four (Cluster 11, Cluster 26, Cluster 28, and Cluster 30) are also enriched for blood-brain barrier (+). This observation is consistent with the knowledge that nervous system drugs usually need to penetrate the blood-brain barrier.
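This section does not state how the enrichment score and P value are computed; that belongs to the Methods. As one common choice, and purely as an assumption, the sketch below uses a one-sided hypergeometric test with fold enrichment as the score. The function name `hypergeom_enrichment` is hypothetical, and the same computation would apply to superclass, ADMET, and target-class labels.

```python
from math import comb

def hypergeom_enrichment(k, n, K, N):
    """Fold enrichment and upper-tail hypergeometric P value for one label in one cluster.

    k: cluster drugs carrying the label    n: cluster size
    K: background drugs carrying the label N: background size (e.g., 593 iDSN drugs)
    """
    # P(X >= k): probability of drawing at least k labeled drugs in n draws without replacement.
    p = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)
    fold = (k / n) / (K / N)
    return fold, p
```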
KEGG and GO enrichment analyses of targets
To explore whether drugs in the same cluster tend to target similar proteins and whether particular drug classes are associated with particular target classes, we performed KEGG pathway and GO enrichment analyses of the targets of each drug cluster [48], [49], [50]. The most enriched pathways, biological processes, cellular components, and molecular functions (Figure 5C and D, Figure S3D and E; result matrices are listed in Table S5) differ substantially among clusters, highlighting differences in the modes of action of drugs. Nonetheless, some drug clusters share pathways and biological processes, and key biological themes can be identified for the major clusters. For example, Cluster 27, which is enriched for antineoplastic and immunomodulating agents, is most enriched for the KEGG Rap1 signaling pathway, Ras signaling pathway, and central carbon metabolism in cancer. The most enriched GO biological process terms for Cluster 27 include protein autophosphorylation, positive regulation of MAPK cascade, and phosphatidylinositol-mediated signaling. These pathways and biological processes are closely related to the mechanisms of antineoplastic drugs.
Another example is Cluster 16, whose drugs are enriched for the 'respiratory system' ATC code. Chemical ontology enrichment analysis shows that drugs in this cluster are enriched for methylxanthines. The methylxanthines in Cluster 16, such as caffeine and theophylline, are used to treat respiratory diseases. However, KEGG pathway enrichment analysis shows that the drug targets in this cluster are also enriched for some cancer- and immunology-related pathways. These data suggest an association between methylxanthines and cancer, which is supported by recent studies [62], [63].
Furthermore, we conducted target class enrichment analysis for each cluster using class information from the IUPHAR/BPS database [47], which indicates the type of each drug target, such as G-protein-coupled receptor (GPCR), catalytic receptor, or enzyme. We found that 24 of the 32 iDSN clusters were significantly enriched for at least one target class (P < 0.05; Figure S3F). For example, Cluster 32 is enriched for the voltage-gated ion channel (VGIC) target class as well as for the 'cardiovascular system' ATC code. This is consistent with the fact that most Cluster 32 drugs are calcium channel blockers used as antihypertensives, whose targets mainly include calcium voltage-gated channel alpha1 (CACNA1) subunits.
Physicochemical feature and chemical descriptor analyses of drug properties
Given that the pharmacological profiles of small molecules may also affect drug repositioning, we analyzed the characteristic physicochemical features and chemical descriptors of each cluster. In total, we extracted 14 physicochemical features from the RepurposeDB drug property list [51] and 62 chemical (i.e., atomic, compositional, and geometric) descriptors. These properties were computed using the Pybel, JOELib2, and ChemmineR chemoinformatics tools [45], [52], [53]. For each property, we calculated the mean value across all drugs in DrugBank and across the drugs in each cluster, and found that the cluster means deviated from the DrugBank-wide means (Table S6). Statistically significant deviations (P < 0.05; see Method) are shown in Figure S3A and G. To account for bias arising from the incompleteness of the drug set, we also compared the mean value of the drugs in each cluster with that of all 593 drugs in the iDSN (Figure S3B and H).
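The test behind "statistically significant deviation" is described in the Methods rather than here. As a swapped-in illustration only, the sketch below uses a two-sided permutation test that compares a cluster's mean property value with the means of random, same-size draws from a background set (all DrugBank drugs, or the 593 iDSN drugs). `cluster_mean_deviation` is a hypothetical helper, not part of PIMD.

```python
import random
from statistics import mean

def cluster_mean_deviation(cluster_values, background_values, n_perm=10000, seed=0):
    """Deviation of a cluster's mean property value from the background mean,
    with a two-sided permutation P value from random same-size background draws."""
    rng = random.Random(seed)
    bg_mean = mean(background_values)
    observed = mean(cluster_values) - bg_mean
    size = len(cluster_values)
    extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(background_values, size)  # sample without replacement
        if abs(mean(draw) - bg_mean) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)  # +1 avoids reporting P = 0
```

Running it once with DrugBank as the background and once with the 593 iDSN drugs mirrors the two comparisons described above.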
Chemogenomic enrichment analysis
CGEA, similar to gene set enrichment analysis, compares sets of drugs against a range of biological and chemical annotations [64] (http://server.dudleylab.org/index). By identifying chemogenomic characteristics shared by a set of drugs, such as enzymes, transporters, and structural fragments, we can obtain rich chemogenomic context for the biological state of the drug set. We used CGEA to analyze the drugs in each cluster for biological and chemical annotations; the results are listed in Table S7. An example of CGEA is described in detail in the section 'Case study: drug repositioning using 4 clusters of iDSN'.
Enrichment analysis of chemical ontology
Chemical ontology enrichment analysis is based on the ChEBI Ontology, a dictionary of chemical compounds with biological roles [65]. Mapping drugs to the ChEBI database can improve our understanding of the biochemical nature of drugs. We performed chemical ontology enrichment analysis of the drugs in each cluster using the BiNChE tool [54]. The resulting ChEBI-based enriched structure terms and role terms for each cluster are provided in Table S8. An example of chemical ontology enrichment analysis is described in detail in the section 'Case study: drug repositioning using 4 clusters of iDSN'.
These analyses support the validity of the cluster partition. Furthermore, the 5 types of analyses described above give a comprehensive view of the various properties of the drugs in each cluster, which helps to guide drug repositioning. For example, Cluster 15 and Cluster 27 are both enriched for antineoplastic and immunomodulating agents, but their enriched KEGG pathways and GO terms (Figure 5C and D, Figure S3D and E) differ. Chemical ontology enrichment analysis shows that Cluster 15 drugs have the role term antimetabolite, whereas Cluster 27 drugs have the role term protein kinase inhibitor. Most Cluster 15 drugs are purine or pyrimidine analogs, whereas most Cluster 27 drugs are tyrosine kinase inhibitors. These analyses highlight differences in the modes of action of the drugs. In Cluster 15, ribavirin (ATC code for 'anti-infectives for systemic use'), a guanosine analog used as an antiviral, shows therapeutic potential for cancers [66], and its mode of action is similar to that of other Cluster 15 drugs.
Additionally, these analyses allow the same drug cluster to be examined from several perspectives in combination, which may lead to new discoveries. For example, Cluster 5 drugs are enriched for the 'nervous system' ATC code. Chemical ontology enrichment analysis shows that Cluster 5 drugs have the role term serotonergic drug, a class of nervous system-related drugs. KEGG pathway and GO enrichment analyses also reveal terms related to the nervous system. However, the cellular component (CC) and molecular function (MF) results of the GO enrichment analysis also include terms related to the cardiovascular system (Figure S3D and E). In the ADMET property enrichment analysis, Cluster 5 drugs are enriched for human ether-a-go-go-related gene (hERG) inhibition (predictor I, strong inhibitor) and hERG inhibition (predictor II, inhibitor); hERG inhibition is associated with QT prolongation. Moreover, CGEA results indicate that Cluster 5 drugs are enriched for the side effect 'electrocardiogram QT prolonged'. Together, these analyses suggest that some serotonergic drugs (especially those in Cluster 5) may cause QT prolongation. Indeed, recent studies have reported an association between serotonergic drugs and QT prolongation [67], [68].
Case study: drug repositioning using 4 clusters of iDSN
To illustrate the drug repositioning performance of PIMD, we examine four clusters in this section.
The Cluster 28 drugs (Figure 2) are enriched for the 'nervous system' ATC code, the 'benzenoids' superclass label, and the 'G-protein coupled receptors' target class. KEGG pathway and GO enrichment analysis results also include terms related to the nervous system (Figure 6A and B, Figure S4A and B; Table S9). CGEA shows that Cluster 28 drugs are enriched for the molecular fragment CCCN(C)C and the drug-induced transcriptional module PC3-3 (Figure 6C). This module contains expression profiles for 38 genes in response to 25 drugs, 6 of which are in Cluster 28 [69]. The most prominent drug mode of action among Cluster 28 drugs is 'Antihistamines for systemic use', which can affect the central nervous system [70]. Chemical ontology enrichment analysis shows that Cluster 28 drugs are enriched for neurotransmitters, neurotransmitter derivatives, and central nervous system drugs (Figure 6D–F; drug property analysis results are shown in Figure 6F). We found that cyclobenzaprine, a drug used to treat skeletal muscle spasms and fibromyalgia, is an unexpected drug in Cluster 28. Like the antidepressant amitriptyline, which is also in this cluster, cyclobenzaprine antagonizes the 5-hydroxytryptamine receptor 2A and has a tricyclic structure. According to ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies conducted around the world, cyclobenzaprine is being studied for the treatment of post-traumatic stress disorder. Therefore, a novel therapeutic property of cyclobenzaprine might be N (nervous system). These results indicate that Cluster 28 drugs are related to the nervous system and are internally consistent, and that unexpected drugs in Cluster 28 have the potential to be repositioned to the 'nervous system' ATC code.
Similarly, pentoxifylline, an unexpected drug in Cluster 16 (drugs in this cluster are enriched for the 'respiratory system' ATC code), is an interesting case. Pentoxifylline carries an ATC code for 'cardiovascular system' and is used to treat intermittent claudication [71]. Notably, it has a high similarity score with theophylline (ATC code for 'respiratory system'), another Cluster 16 drug, which is used to treat respiratory diseases such as asthma. Both are members of the xanthine family, so they have a relatively high similarity score based on chemical structure (rank 2898 in DSN-C). They also have a relatively high similarity score based on drug targets (rank 7277 in DSN-T) because they share 5 targets. Pentoxifylline is known to increase red blood cell deformability and decrease blood viscosity [72]; nevertheless, its similarity to theophylline has encouraged researchers to explore its potential to treat asthma [73].
Clusters simultaneously enriched for two therapeutic properties can also provide a unique perspective for drug repurposing. Cluster 3 drugs (Figure 2) are enriched for both the 'cardiovascular system' and 'genito-urinary system and sex hormones' ATC codes (Figure S5A–G; Table S9). In this cluster, iloperidone, a drug used for schizophrenia, may be repositioned for applications related to the 'cardiovascular system' and the 'genito-urinary system and sex hormones'. Indeed, a previous study showed that repeated administration of iloperidone moderated hypotension [74]. Interestingly, sildenafil, a drug that was successfully repositioned from 'cardiovascular system' treatment to 'genito-urinary system and sex hormones' treatment, is also present in Cluster 3, suggesting that drugs in this cluster may have potential for both therapeutic properties. Likewise, prazosin and terazosin are an interesting pair. Prazosin (ATC code for 'cardiovascular system') is used to treat hypertension and also urinary hesitancy associated with prostatic hyperplasia [75]. Terazosin (ATC code for 'genito-urinary system and sex hormones') is used to treat the symptoms of an enlarged prostate and can also moderate hypertension [76], [77]. Hence, the primary indication of each drug can be regarded as a secondary indication of the other. In terms of target proteins, both drugs antagonize the alpha-1A, -1B, and -1D adrenergic receptors. In terms of chemical structure, both contain the molecular fragment O=CN1CCNCC1. Their common adverse reactions include dizziness, headache, drowsiness, lack of energy, and weakness.
Another interesting case is cyproterone acetate (ATC code for 'genito-urinary system and sex hormones') in Cluster 19. Cyproterone acetate can treat not only androgen-dependent conditions such as excessive hair growth and acne but also prostate cancer [78]. Therefore, a novel therapeutic property of cyproterone acetate could be L (antineoplastic and immunomodulating agents). More interestingly, cyproterone acetate was originally developed as a progestin [79] but was first marketed as an antiandrogen [80]. Cluster 19 drugs are enriched for both the 'genito-urinary system and sex hormones' and 'antineoplastic and immunomodulating agents' ATC codes. In this cluster, cyproterone acetate has high similarity scores with bicalutamide (ATC code for 'antineoplastic and immunomodulating agents') and medroxyprogesterone acetate (ATC codes for 'antineoplastic and immunomodulating agents' and 'genito-urinary system and sex hormones'). For cyproterone acetate and medroxyprogesterone acetate, the similarity score based on chemical structure is relatively high (rank 101 in DSN-C), but the score based on drug targets is relatively low (rank 35,974 in DSN-T). Conversely, cyproterone acetate and bicalutamide have a relatively high similarity score based on drug targets (rank 178 in DSN-T) but a relatively low score based on chemical structure (rank 81,257 in DSN-C). This is because cyproterone acetate is structurally similar to medroxyprogesterone acetate (a progestin) but, like bicalutamide (an antiandrogen), exerts antiandrogenic effects on the androgen receptor. This explains the distinctive position of cyproterone acetate and illustrates how PIMD exploits the universality, individuality, and complementarity of different drug properties.
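To put the quoted ranks in context, the sketch below converts each rank into a top percentage. It assumes that the ranks in DSN-C and DSN-T run over all unordered pairs of the 593 iDSN drugs with self-pairs excluded, i.e., C(593, 2) = 175,528 pairs per network; if a DSN was ranked over a different drug set, the denominator would change. `rank_to_top_percent` is a hypothetical helper.

```python
from math import comb

N_DRUGS = 593               # drugs in the iDSN (from the text)
N_PAIRS = comb(N_DRUGS, 2)  # unordered pairs, assuming ranks exclude self-pairs

def rank_to_top_percent(rank, n_pairs=N_PAIRS):
    """Express a pair's rank (1 = most similar) as a percentage of all pairs."""
    return 100 * rank / n_pairs

# Ranks quoted in the text for the pentoxifylline and cyproterone acetate cases.
ranks = {
    "pentoxifylline-theophylline, DSN-C": 2898,
    "pentoxifylline-theophylline, DSN-T": 7277,
    "cyproterone-medroxyprogesterone, DSN-C": 101,
    "cyproterone-medroxyprogesterone, DSN-T": 35974,
    "cyproterone-bicalutamide, DSN-T": 178,
    "cyproterone-bicalutamide, DSN-C": 81257,
}
for label, r in ranks.items():
    print(f"{label}: top {rank_to_top_percent(r):.2f}%")
```

Under this assumption, the pentoxifylline-theophylline pair lies in roughly the top 1.7% (DSN-C) and 4.1% (DSN-T) of pairs, and the high-scoring cyproterone acetate pairs lie in about the top 0.06% and 0.10%, whereas ranks 35,974 and 81,257 sit near 20% and 46%.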
Discussion
Here, we proposed PIMD, an integrative, systematic, and extensible framework for discovering novel therapeutic properties of drugs from heterogeneous data sources. PIMD constructs the iDSN by integrating chemical structure, target protein sequence, and side effect data, and spectral clustering of the iDSN identified 32 drug clusters. PIMD supports drug repositioning in two ways: through drug pairs with high iDSN similarity scores and through unexpected drugs in each cluster. Finally, via a series of enrichment analyses, PIMD annotates and evaluates all clusters from chemical, pharmacological, and genomic perspectives, and thereby screens suitable clusters for precision drug repositioning.
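SNF [19] is used to compute the iDSN similarity scores. Below is a simplified NumPy sketch of its cross-diffusion idea, assuming the three DSNs are already dense, non-negative, symmetric n x n similarity matrices over the same drugs in the same order. It omits the scaled exponential kernel that SNF normally applies to raw distances, and the neighbor count `k` and iteration count are illustrative defaults, not the values used in PIMD; the reference SNF implementation should be preferred for real analyses.

```python
import numpy as np

def _full_kernel(w):
    """SNF 'P' normalization: each row's off-diagonal mass is 1/2, diagonal fixed at 1/2."""
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)
    p = w / (2.0 * w.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return (p + p.T) / 2.0  # symmetrize

def _local_kernel(p, k):
    """SNF 'S' kernel: keep each row's k largest entries (self included), row-normalized."""
    s = np.zeros_like(p)
    for i, row in enumerate(p):
        nearest = np.argsort(row)[::-1][:k]
        s[i, nearest] = row[nearest]
    return s / s.sum(axis=1, keepdims=True)

def snf(similarities, k=20, iterations=20):
    """Fuse m (>= 2) n x n drug similarity matrices into one network by cross-diffusion."""
    ps = [_full_kernel(w) for w in similarities]
    ss = [_local_kernel(p, k) for p in ps]
    m = len(ps)
    for _ in range(iterations):
        # Diffuse each view's kernel through the average of the other views (synchronous update).
        ps = [_full_kernel(ss[v] @ (sum(ps[u] for u in range(m) if u != v) / (m - 1)) @ ss[v].T)
              for v in range(m)]
    return sum(ps) / m
```

Spectral clustering of the fused matrix, for example with scikit-learn's `SpectralClustering(affinity="precomputed")`, would then yield clusters analogous to the iDSN clusters.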
By integrating multi-dimensional drug informatics data, PIMD can sensitively capture latent similarities between drugs. The iDSN outperforms single-property DSNs on multiple evaluation measures, including AOR, NMI, and cluster compactness. In this study, we primarily used ATC labels as the gold standard to annotate drugs and to evaluate the performance of PIMD and the other methods compared. Interestingly, if the superclass label is used instead of the ATC code to calculate the evaluation measures, the SOR and NMI values of the chemical structure-based DSN are higher than those of the other DSNs, including the iDSN (Figure S6A and B). This suggests that chemical properties align closely with superclass labels, whereas adding clinical and pharmacological properties may weaken this agreement. Indeed, ATC labels and superclass labels are two different drug classification systems. ATC labels comprehensively reflect the therapeutic properties of drugs and are therefore commonly used to classify and label drugs, whereas superclass labels primarily represent the chemical properties of drugs. This result clarifies the difference between the ATC and superclass labels of drugs.
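NMI is one of the evaluation measures above. A minimal standard-library sketch follows, assuming each drug is reduced to a single label (one first-level ATC letter or one superclass) and using the square-root normalization I(A;B)/sqrt(H(A)H(B)); the study's exact normalization and its handling of drugs with multiple ATC codes are not stated in this section.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized mutual information I(A;B) / sqrt(H(A) H(B)) between two labelings.

    Returns 0.0 when either labeling has a single class (entropy 0).
    """
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * log(n * c / (ca[a] * cb[b])) for (a, b), c in joint.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0
```

Passing cluster assignments with ATC labels, and then with superclass labels, reproduces the kind of comparison described above; the cluster ids themselves are arbitrary, since NMI is invariant to relabeling.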
PIMD offers new insights into the universality, individuality, and complementarity of three drug properties: chemical, pharmacological, and clinical. Our calculations of the relative contribution of each data type indicate that the iDSN is driven by all data types, so PIMD makes full use of the information in each property. Comparing NMI among single-property, dual-property, and three-property integrated DSNs (Figure S7A–D) shows that networks based on a single property overlap only marginally and are instead complementary to one another. Furthermore, PIMD reports the contribution of each drug property to each cluster, improving our understanding of cluster characteristics (Table S1).
Rather than simple fusion, PIMD provides a systematic way to evaluate the drug cluster and drug repositioning. Pharmacochemistry and pharmacogenomics researchers can use PIMD to screen drug clusters based on their own requirements (Figure 5A–D, Figure S3A–F; Tables S7 and S8).
As a highly extensible framework, PIMD can fuse various properties beyond the three properties examined here. For example, combining drug expression profiles would allow us to elucidate similarities between drugs at the transcriptional level. Drugs have multi-dimensional properties, and drug effects on disease processes are an interdisciplinary issue. The complementarity among multiple properties allows us to assess drug similarities more accurately and thus to provide a more comprehensive and clearer direction for drug repositioning.
There are two reasons why we chose these three data sources as an example. First, they are representative of the variety of drug informatics data: side effect, chemical structure, and molecular target data represent the clinical, chemical, and pharmacological properties of drugs, respectively, and together summarize key drug characteristics. Second, data of these three types are abundant in public databases. Although fewer data are available for other types, we also compared the iDSN with DSNs based on three other data types: 1) drug 3D structure, 2) drug expression profiles, and 3) PPI network (the similarity measures are described in File S1). The iDSN performed best among these DSNs (Figure S8).
The current study is limited to the drugs in the intersection of the drug sets for which all properties are available: if any property of a drug is missing, the drug is excluded from the iDSN. As drug informatics data accumulate, we expect the iDSN to grow in scale and the performance of PIMD to improve accordingly.
In summary, PIMD provides a new perspective for drug repositioning through multi-property fusion and an accompanying analysis package. It facilitates a deeper understanding of how drug properties integrate, and its extensibility and modularity allow users to explore drugs from a wider range of fields.
Code availability
Source code of PIMD is available at https://github.com/Sepstar/PIMD/.
CRediT author statement
Song He: Conceptualization, Methodology, Investigation, Software, Visualization, Writing - original draft, Writing - review & editing. Yuqi Wen: Conceptualization, Methodology, Investigation, Software, Visualization, Writing - original draft, Writing - review & editing. Xiaoxi Yang: Investigation, Visualization. Zhen Liu: Investigation, Visualization. Xinyu Song: Investigation, Visualization. Xin Huang: Investigation, Visualization. Xiaochen Bo: Conceptualization, Methodology, Supervision, Software, Writing - review & editing. All authors read and approved the final manuscript.
Competing interests
The authors have declared no competing interests.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. U1435222) and the Program of International Sci-Tech Cooperation, China (Grant No. 2014DFB30020).
Handled by Xin Gao
Footnotes
Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.gpb.2018.10.012.
References
- 1.Paul S.M., Mytelka D.S., Dunwiddie C.T., Persinger C.C., Munos B.H., Lindborg S.R. How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nat Rev Drug Discov. 2010;9:203–214. doi: 10.1038/nrd3078. [DOI] [PubMed] [Google Scholar]
- 2.Adams C.P., Brantner V.V. Estimating the cost of new drug development: is it really $802 million dollars? Health Aff. 2006;25:420–428. doi: 10.1377/hlthaff.25.2.420. [DOI] [PubMed] [Google Scholar]
- 3.DiMasi J.A., Hansen R.W., Grabowski H.G. The price of innovation: new estimates of drug development costs. J Health Econ. 2003;22:151–185. doi: 10.1016/S0167-6296(02)00126-1. [DOI] [PubMed] [Google Scholar]
- 4.Boguski M.S., Mandl K.D., Sukhatme V.P. Repurposing with a difference. Science. 2009;324:1394–1395. doi: 10.1126/science.1169920. [DOI] [PubMed] [Google Scholar]
- 5.Law V., Knox C., Djoumbou Y., Jewison T., Guo A.C., Liu Y. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014;42:D1091–D1097. doi: 10.1093/nar/gkt1068. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Günther S., Kuhn M., Dunkel M., Campillos M., Senger C., Petsalaki E. SuperTarget and Matador: resources for exploring drug–target relationships. Nucleic Acids Res. 2008;36:D919–D922. doi: 10.1093/nar/gkm862. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Kim S., Thiessen P.A., Bolton E.E., Chen J., Fu G., Gindulyte A. PubChem substance and compound databases. Nucleic Acids Res. 2016;44:D1202–D1213. doi: 10.1093/nar/gkv951. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Kuhn M., Letunic I., Jensen L.J., Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016;44:D1075–D1079. doi: 10.1093/nar/gkv1075. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Iorio F., Bosotti R., Scacheri E., Belcastro V., Mithbaokar P., Ferriero R. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci U S A. 2010;107:14621–14626. doi: 10.1073/pnas.1000138107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Xie L., He S., Wen Y., Bo X., Zhang Z. Discovery of novel therapeutic properties of drugs from transcriptional responses based on multi-label classification. Sci Rep. 2017;7:7136. doi: 10.1038/s41598-017-07705-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Keiser M.J., Setola V., Irwin J.J., Laggner C., Abbas A.I., Hufeisen S.J. Predicting new molecular targets for known drugs. Nature. 2009;462:175–181. doi: 10.1038/nature08506. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Campillos M., Kuhn M., Gavin A.C., Jensen L.J., Bork P. Drug target identification using side-effect similarity. Science. 2008;321:263–266. doi: 10.1126/science.1158140. [DOI] [PubMed] [Google Scholar]
- 13.Yang L., Agarwal P. Systematic drug repositioning based on clinical side-effects. PLoS One. 2011;6:e28025. doi: 10.1371/journal.pone.0028025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Yıldırım M.A., Goh K.I., Cusick M.E., Barabási A.L., Vidal M. Drug–target network. Nat Biotechnol. 2007;25:1119–1126. doi: 10.1038/nbt1338. [DOI] [PubMed] [Google Scholar]
- 15.Keiser M.J., Roth B.L., Armbruster B.N., Ernsberger P., Irwin J.J., Shoichet B.K. Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 2007;25:197–206. doi: 10.1038/nbt1284. [DOI] [PubMed] [Google Scholar]
- 16.Hopkins A.L. Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol. 2008;4:682–690. doi: 10.1038/nchembio.118. [DOI] [PubMed] [Google Scholar]
- 17.Coelho E.D., Arrais J.P., Oliveira J.L. Computational discovery of putative leads for drug repositioning through drug–target interaction prediction. PLoS Comput Biol. 2016;12:e1005219. doi: 10.1371/journal.pcbi.1005219. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Wang Z., Clark N.R., Ma’ayan A. Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics. 2016;32:2338–2345. doi: 10.1093/bioinformatics/btw168. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Wang B., Mezlini A.M., Demir F., Fiume M., Tu Z., Brudno M. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11:333–337. doi: 10.1038/nmeth.2810. [DOI] [PubMed] [Google Scholar]
- 20.Napolitano F., Zhao Y., Moreira V.M., Tagliaferri R., Kere J., D’Amato M. Drug repositioning: a machine-learning approach through data integration. J Cheminform. 2013;5:30. doi: 10.1186/1758-2946-5-30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Wang Y., Chen S., Deng N., Wang Y. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLoS One. 2013;8:e78518. doi: 10.1371/journal.pone.0078518. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Zhang P., Agarwal P., Obradovic Z. Computational drug repositioning by ranking and integrating multiple data sources. Mach Learn Knowl Discov Databases. 2013:579–594. [Google Scholar]
- 23.Liu H., Song Y., Guan J., Luo L., Zhuang Z. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinformatics. 2016;17:539. doi: 10.1186/s12859-016-1336-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wen M., Zhang Z., Niu S., Sha H., Yang R., Yun Y. Deep-learning-based drug–target interaction prediction. J Proteome Res. 2017;16:1401–1409. doi: 10.1021/acs.jproteome.6b00618. [DOI] [PubMed] [Google Scholar]
- 25.Nascimento A.C., Prudêncio R.B., Costa I.G. A multiple kernel learning algorithm for drug–target interaction prediction. BMC Bioinformatics. 2016;17:46. doi: 10.1186/s12859-016-0890-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Shi J.Y., Yiu S.M., Li Y., Leung H.C., Chin F.Y. Predicting drug–target interaction for new drugs using enhanced similarity measures and super-target clustering. Methods. 2015;83:98–104. doi: 10.1016/j.ymeth.2015.04.036. [DOI] [PubMed] [Google Scholar]
- 27.Naveed H., Hameed U.S., Harrus D., Bourguet W., Arold S.T., Gao X. An integrated structure- and system-based framework to identify new targets of metabolites and known drugs. Bioinformatics. 2015;31:3922–3929. doi: 10.1093/bioinformatics/btv477. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Zhang P., Wang F., Hu J. Towards drug repositioning: a unified computational framework for integrating multiple aspects of drug similarity and disease similarity. AMIA Annu Symp Proc. 2014;2014:1258–1267. [PMC free article] [PubMed] [Google Scholar]
- 29.Yamanishi Y., Kotera M., Moriya Y., Sawada R., Kanehisa M., Goto S. DINIES: drug–target interaction network inference engine based on supervised analysis. Nucleic Acids Res. 2014;42:W39–W45. doi: 10.1093/nar/gku337.
- 30.Fakhraei S., Huang B., Raschid L., Getoor L. Network-based drug–target interaction prediction with probabilistic soft logic. IEEE/ACM Trans Comput Biol Bioinform. 2014;11:775–787. doi: 10.1109/TCBB.2014.2325031.
- 31.Zheng X., Ding H., Mamitsuka H., Zhu S. The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013. Collaborative matrix factorization with multiple similarities for predicting drug–target interactions; pp. 1025–1033.
- 32.Mei J.P., Kwoh C.K., Yang P., Li X.L., Zheng J. Drug–target interaction prediction by learning from local information and neighbors. Bioinformatics. 2013;29:238–245. doi: 10.1093/bioinformatics/bts670.
- 33.Luo Y., Zhao X., Zhou J., Yang J., Zhang Y., Kuang W. A network integration approach for drug–target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8:573. doi: 10.1038/s41467-017-00680-8.
- 34.Wang W., Yang S., Zhang X., Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014;30:2923–2930. doi: 10.1093/bioinformatics/btu403.
- 35.Xia Z., Wu L.Y., Zhou X., Wong S.T. Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol. 2010;4:S6. doi: 10.1186/1752-0509-4-S2-S6.
- 36.Bleakley K., Yamanishi Y. Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics. 2009;25:2397–2403. doi: 10.1093/bioinformatics/btp433.
- 37.van Laarhoven T., Marchiori E. Predicting drug–target interactions for new drug compounds using a weighted nearest neighbor profile. PLoS One. 2013;8:e66952. doi: 10.1371/journal.pone.0066952.
- 38.van Laarhoven T., Nabuurs S.B., Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011;27:3036–3043. doi: 10.1093/bioinformatics/btr500.
- 39.Wang Y., Zeng J. Predicting drug–target interactions using restricted Boltzmann machines. Bioinformatics. 2013;29:i126–i134. doi: 10.1093/bioinformatics/btt234.
- 40.Yuan Q., Gao J., Wu D., Zhang S., Mamitsuka H., Zhu S. DrugE-Rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics. 2016;32:i18–i27. doi: 10.1093/bioinformatics/btw244.
- 41.Wan F., Hong L., Xiao A., Jiang T., Zeng J. NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug–target interactions. Bioinformatics. 2019;35:104–111. doi: 10.1093/bioinformatics/bty543.
- 42.Yamanishi Y., Kotera M., Kanehisa M., Goto S. Drug–target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics. 2010;26:i246–i254. doi: 10.1093/bioinformatics/btq176.
- 43.He S., He H., Xu W., Huang X., Jiang S., Li F. ICM: a web server for integrated clustering of multi-dimensional biomedical data. Nucleic Acids Res. 2016;44:W154–W159. doi: 10.1093/nar/gkw378.
- 44.UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–D212. doi: 10.1093/nar/gku989.
- 45.Cao Y., Charisi A., Cheng L.C., Jiang T., Girke T. ChemmineR: a compound mining framework for R. Bioinformatics. 2008;24:1733–1734. doi: 10.1093/bioinformatics/btn307.
- 46.Smith T.F., Waterman M.S. Identification of common molecular subsequences. J Mol Biol. 1981;147:195–197. doi: 10.1016/0022-2836(81)90087-5.
- 47.Pawson A.J., Sharman J.L., Benson H.E., Faccenda E., Alexander S.P., Buneman O.P. The IUPHAR/BPS guide to pharmacology: an expert-driven knowledgebase of drug targets and their ligands. Nucleic Acids Res. 2014;42:D1098–D1106. doi: 10.1093/nar/gkt1143.
- 48.Kanehisa M., Furumichi M., Tanabe M., Sato Y., Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–D361. doi: 10.1093/nar/gkw1092.
- 49.Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
- 50.Yu G., Wang L.G., Han Y., He Q.Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–287. doi: 10.1089/omi.2011.0118.
- 51.Shameer K., Glicksberg B.S., Hodos R., Johnson K.W., Badgeley M.A., Readhead B. Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Brief Bioinform. 2018;19:656–678. doi: 10.1093/bib/bbw136.
- 52.O'Boyle N.M., Morley C., Hutchison G.R. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Cent J. 2008;2:5. doi: 10.1186/1752-153X-2-5.
- 53.Wegner J. Department of Computer Science, University of Tübingen; Germany: 2005. JOELib/JOELib2.
- 54.Moreno P., Beisken S., Harsha B., Muthukrishnan V., Tudose I., Dekker A. BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology. BMC Bioinformatics. 2015;16:56. doi: 10.1186/s12859-015-0486-3.
- 55.Luxburg U.V. A tutorial on spectral clustering. Stat Comput. 2007;17:395–416.
- 56.Brock G., Pihur V., Datta S., Datta S. clValid, an R package for cluster validation. J Stat Softw. 2008;25:1–22.
- 57.O'Dell J.R., Haire C.E., Erikson N., Drymalski W., Palmer W., Eckhoff P.J. Treatment of rheumatoid arthritis with methotrexate alone, sulfasalazine and hydroxychloroquine, or a combination of all three medications. N Engl J Med. 1996;334:1287–1291. doi: 10.1056/NEJM199605163342002.
- 58.Hannonen P., Möttönen T., Hakola M., Oka M. Sulfasalazine in early rheumatoid arthritis. A 48-week double-blind, prospective, placebo-controlled study. Arthritis Rheumatol. 1993;36:1501–1509. doi: 10.1002/art.1780361104.
- 59.Thalamuthu A., Mukhopadhyay I., Zheng X., Tseng G.C. Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics. 2006;22:2405–2412. doi: 10.1093/bioinformatics/btl406.
- 60.Dudoit S., Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol. 2002;3:research0036. doi: 10.1186/gb-2002-3-7-research0036.
- 61.Wrobel A., Eklund P., Bobrowska-Hägerstrand M., Hägerstrand H. Lignans and norlignans inhibit multidrug resistance protein 1 (MRP1/ABCC1)-mediated transport. Anticancer Res. 2010;30:4423–4428.
- 62.Ohta A., Sitkovsky M. Methylxanthines, inflammation, and cancer: fundamental mechanisms. Handb Exp Pharmacol. 2011:469–481. doi: 10.1007/978-3-642-13443-2_19.
- 63.Uccella S., Mariani A., Wang A., Vierkant R., Cliby W.A., Robien K. Intake of coffee, caffeine and other methylxanthines and risk of Type I vs Type II endometrial cancer. Br J Cancer. 2013;109:1908–1913. doi: 10.1038/bjc.2013.540.
- 64.Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- 65.Hastings J., de Matos P., Dekker A., Ennis M., Harsha B., Kale N. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2012;41:D456–D463. doi: 10.1093/nar/gks1146.
- 66.Borden K.L., Culjkovic-Kraljacic B. Ribavirin as an anti-cancer therapy: acute myeloid leukemia and beyond? Leuk Lymphoma. 2010;51:1805–1815. doi: 10.3109/10428194.2010.496506.
- 67.Ballesteros A., Saiz H., Rosero Á., Portilla A., Montes L., Elorza R. SSRIs and QT interval prolongation management. A review. Eur Psychiatry. 2017;41:S750.
- 68.Ojero-Senard A., Benevent J., Bondon-Guitton E., Durrieu G., Chebane L., Araujo M. A comparative study of QT prolongation with serotonin reuptake inhibitors. Psychopharmacology. 2017;234:3075–3081. doi: 10.1007/s00213-017-4685-7.
- 69.Iskar M., Zeller G., Blattmann P., Campillos M., Kuhn M., Kaminska K.H. Characterization of drug-induced transcriptional modules: towards drug repositioning and functional understanding. Mol Syst Biol. 2013;9:662. doi: 10.1038/msb.2013.20.
- 70.Lieberman P. Histamine, antihistamines, and the central nervous system. Allergy Asthma Proc. 2009;30:482–486. doi: 10.2500/aap.2009.30.3264.
- 71.Salhiyyah K., Forster R., Senanayake E., Abdel-Hadi M., Booth A., Michaels J.A. Pentoxifylline for intermittent claudication. Cochrane Database Syst Rev. 2015;9:CD005262. doi: 10.1002/14651858.CD005262.pub3.
- 72.Ward A., Clissold S.P. Pentoxifylline. Drugs. 1987;34:50–97. doi: 10.2165/00003495-198734010-00003.
- 73.Entzian P., Bitter-Suermann S., Burdon D., Ernst M., Schlaak M., Zabel P. Differences in the anti-inflammatory effects of theophylline and pentoxifylline: important for the development of asthma therapy? Allergy. 1998;53:749–754. doi: 10.1111/j.1398-9995.1998.tb03970.x.
- 74.Sainati S.M., Hubbard J.W., Chi E., Grasing K., Brecher M.B. Safety, tolerability, and effect of food on the pharmacokinetics of iloperidone (HP 873), a potential atypical antipsychotic. J Clin Pharmacol. 1995;35:713–720. doi: 10.1002/j.1552-4604.1995.tb04112.x.
- 75.Tsujii T. Comparison of prazosin, terazosin and tamsulosin in the treatment of symptomatic benign prostatic hyperplasia: a short-term open, randomized multicenter study. Int J Urol. 2000;7:199–205. doi: 10.1046/j.1442-2042.2000.00175.x.
- 76.Mersey J.H. Long-term experience with terazosin for treatment of mild to moderate hypertension. Am J Med. 1986;80:68–72. doi: 10.1016/0002-9343(86)90855-7.
- 77.Itskovitz H.D. Alpha 1-blockade for the treatment of hypertension: a megastudy of terazosin in 2214 clinical practice settings. Clin Ther. 1994;16:490–504.
- 78.Torri V., Floriani I. Cyproterone acetate in the therapy of prostate carcinoma. Arch Ital Urol Androl. 2005;77:157–163.
- 79.Pucci E., Petraglia F. Treatment of androgen excess in females: yesterday, today and tomorrow. Gynecol Endocrinol. 1997;11:411–433. doi: 10.3109/09513599709152569.
- 80.Wakelin S.H., Maibach H.I., Archer C.B. CRC Press; Boca Raton: 2002. Systemic drug treatment in dermatology: a handbook.