Abstract
Identifying drug–target interactions is a key step in drug repositioning, drug discovery and drug design. Since it is expensive to determine the interactions experimentally, computational methods are needed for predicting interactions. In this work, the authors first propose a single‐view penalised graph (SPGraph) clustering approach to integrate drug structure and protein sequence data in a structural view. The SPGraph model clusters drugs and targets simultaneously such that the known drug–target interactions are preserved as well as possible in the clustering results. They then apply the SPGraph to a chemical view with drug response data and gene expression data in NCI‐60 cell lines. They further generalise the SPGraph to a multi‐view penalised graph (MPGraph) version, which can integrate the structural view and chemical view of the data. In the authors' experiments, they compare their approach with support vector machine and bipartite graph learning baselines, and the results show that the SPGraph improves the prediction accuracy only slightly, whereas the MPGraph achieves improvements of up to around 10% in prediction accuracy. They finally give some new predicted targets for 22 Food and Drug Administration approved drugs for drug repositioning, some of which are supported by other references.
Inspec keywords: graphs, drug delivery systems, drugs, proteins, molecular biophysics, molecular configurations, optimisation, eigenvalues and eigenfunctions, Laplace equations, cancer, cellular biophysics, gene therapy, medical computing
Other keywords: MPGraph, multiview penalised graph clustering, drug‐target interactions, drug repositioning, drug discovery, drug design, computational methods, single‐view penalized graph clustering approach, drug structure, protein sequence data, SPGraph model, optimisation problem, spectral clustering, eigenvalue decomposition, Laplacian model, gene expression data, NCI‐60 cell lines
1 Introduction
It is crucial in drug discovery to identify whether a drug interacts with protein targets. New drug–protein relationships can help to reposition drugs, understand the mechanisms of drugs and design new drugs (NDs). Owing to high‐throughput technologies, the number of ways to measure the properties and the behaviour of biological systems is increasing rapidly. Since a bipartite graph composed of US Food and Drug Administration (FDA)‐approved drugs and proteins linked by drug–target binary associations was constructed in [1], a number of studies [2–9] have used different types of biological data, individually or together, such as protein sequence, protein function, protein–protein networks, gene expression, drug structure, drug side effects, drug response and metabolic networks. Kutalik et al. [3] proposed a modular approach that integrates gene expression and drug response data in NCI‐60 cell lines. The method discovers drug–gene associations by identifying co‐modules of drugs and genes, thus breaking down the massive data sets into smaller building blocks that exhibit similar patterns across certain genes and drugs in some of the cell lines. Drugs and genes in one co‐module are assumed to be associated with each other with high probability. Yamanishi et al. [4] and Bleakley and Yamanishi [5] proposed bipartite graph learning (BGL) methods that integrate protein sequence and drug structure data. The main idea is to first construct a bipartite graph of drugs and targets, and then map them to a common space. Drugs and targets that are close in this space are assumed to be associated with each other. Campillos et al. [6] proposed using drug side effects to measure the similarity among drugs, which paves a new way to understand drugs. They tested 20 unexpected drug–target interactions (DTIs) and validated 13 of them by in vitro binding assays. Mizutani et al. [7] integrated drug side effects and protein function to predict drug–target interactions.
It is challenging to integrate multiple data sets for identifying new drug–target associations. The docking of drugs and proteins based on their structure can be considered as a structural view, and the gene expression and drug response can be considered as a chemical view. We first consider how to predict DTIs in the structural view by integrating drug structure and protein sequence, and then in a combined structural–chemical view by integrating drug structure, protein sequence, drug response and gene expression. In the structural view, we can construct a drug–target graph to represent the relationships among drugs and targets. In the graph, there are three types of edges. The edges between drugs represent drug similarities, the edges between targets represent target protein similarities, and the edges between drugs and targets represent whether they are known to interact. Our aim is to cluster this graph and generate drug–target co‐modules, so that drugs and targets in the same co‐module can be predicted to interact. The easiest way is to cluster the drug subgraph and the target subgraph individually, and then connect a drug cluster and a target cluster into a co‐module if there are interactions between the two clusters. However, with this straightforward method, the clustering of drugs or targets does not make use of the information in the known DTIs. We propose a novel method that projects the drugs and targets to a common space such that the known interactions are preserved as well as possible. This can be done by optimising an objective function in which an extra penalty term is added to the two terms for spectral clustering of drugs and targets. The prediction ability could be further strengthened by integrating the structural view and the chemical view.
Note that the penalised objective function is shown to have the same form as that of spectral clustering, and thus we can borrow the idea of multi‐view co‐regularised spectral clustering [10] to generalise the single‐view penalised graph (SPGraph) to a multi‐view version, which is called multi‐view penalised graph (MPGraph).
The remainder of this paper is structured as follows: in Section 2, we first describe the materials used in this paper, and then propose the SPGraph and MPGraph models. We also present the algorithms for solving the model and a strategy to select parameters. In Section 3, we first evaluate the two approaches by comparing them with the methods of support vector machine (SVM) and BGL by cross‐validation, and report the prediction accuracy for these methods. Then, we apply the MPGraph to the whole data, and report the predicted DTIs.
2 Methods and materials
2.1 Materials
2.1.1 NCI‐60‐drug response data
The NCI‐60 cell line screen was developed by the National Cancer Institute (NCI) and serves to screen a large number of substances for cytotoxic activity. The panel consists of 60 cell lines derived from distinct cancer types. The 60 cell lines were assayed for their sensitivity to a variety of drugs as a part of the Developmental Therapeutics Program (DTP) at the NCI. Briefly, each cell line was exposed to each drug for 48 h, and growth inhibition was assessed by the sulforhodamine B assay for cellular protein. The concentration of compound required for 50% growth inhibition was scored as the GI50. We obtained the DTP human tumor cell line screening data from the DTP web site.
2.1.2 NCI‐60 gene expression data
We obtained the gene expression data (mRNA: Affy‐U133B, GCRMA‐normalised) for the NCI‐60 cell lines generated by Shankavaram et al. [11] from the NCI web site [12].
2.1.3 Drug structure data and protein sequence data
We obtained drug structures from the KEGG database [13], and then computed the structure similarities between drugs using SIMCOMP [14], which finds the common substructures between two compounds and outputs a global similarity score based on a graph alignment algorithm.
We also downloaded protein sequence data from the KEGG database, and computed the protein sequence similarities using the Smith–Waterman method [15] in MATLAB.
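To make the sequence similarity step concrete, the following is a minimal sketch of a Smith–Waterman local alignment score in Python. It is not the MATLAB routine used in this work: the scoring scheme (match 2, mismatch −1, linear gap −1) and the toy sequences are assumptions chosen for illustration, whereas protein alignments normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not the paper's.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            cell = max(0, diag, up, left)   # local alignment: never below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def similarity_matrix(seqs):
    """Pairwise local-alignment scores for a list of sequences."""
    n = len(seqs)
    return [[smith_waterman_score(seqs[i], seqs[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy sequences, for illustration only.
toy_sequences = ["MKTAYIAK", "MKTAHIAK", "GGGSGGGS"]
S = similarity_matrix(toy_sequences)
```

The resulting matrix is symmetric, and unrelated sequences score 0 because local alignment scores are clipped at zero.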
2.1.4 DTI data
The DTI data was obtained from the DrugBank database [16]. The data set includes the known DTIs.
2.1.5 Data preprocessing and data selection
We constructed a drug ID mapping between DrugBank IDs and KEGG IDs, and chose the 326 drugs that appear in both the drug response data and the drug structure data. We then obtained 608 proteins that appear in both the gene expression data and the protein sequence data. In total, there are 114 known interactions among these drugs and protein targets in the DrugBank database. The 326 drugs and 608 proteins with their 114 known interactions constitute our data set.
Note that the data set consists of two parts. The first includes the drug structural similarities and protein sequence similarities. To interact with a protein target, a drug should have structural docking sites on the protein structure. We thus call this part of the data set the structural view. The other part includes the drug similarities among the 326 drugs based on their response in NCI‐60 cell lines, and the protein similarities among the 608 proteins based on the expression of their encoding genes in NCI‐60 cell lines. We constructed drug similarities from the drug response data by using a Gaussian kernel with σ being the median of all pairwise distances among drugs. Similarly, we used a Gaussian kernel to construct gene similarities from the gene expression data. Two drugs which show similar profiles in NCI‐60 cell lines may interact with the same target, and two genes which show similar expression in the 60 cell lines may interact with the same drug. We call this part of the data the chemical view, since the data sets reflect the chemical properties of drugs and proteins. We will use either view and also both views to predict the DTIs.
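The Gaussian kernel with median bandwidth described above can be sketched as follows. This is an illustrative sketch only: the Euclidean distance, the kernel form exp(−d²/(2σ²)) and the random toy profiles are assumptions, since the text does not spell out these details.

```python
import numpy as np


def gaussian_similarity(profiles):
    """Gaussian-kernel similarity with sigma set to the median pairwise distance.

    profiles: (n_items, n_features) array, e.g. drugs x cell lines.
    Returns the similarity matrix and the bandwidth sigma.
    """
    profiles = np.asarray(profiles, dtype=float)
    # Pairwise Euclidean distances between rows (assumed distance measure).
    diff = profiles[:, None, :] - profiles[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Median over distinct pairs only (strict upper triangle).
    upper = np.triu_indices(len(profiles), k=1)
    sigma = np.median(dist[upper])
    # Assumed kernel form: exp(-d^2 / (2 sigma^2)).
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2)), sigma


rng = np.random.default_rng(0)
toy_response = rng.normal(size=(5, 60))   # 5 hypothetical drugs x 60 cell lines
W, sigma = gaussian_similarity(toy_response)
```

Setting σ to the median distance keeps the similarities on a comparable scale regardless of the units of the underlying measurements.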
2.2 Methods
2.2.1 Single‐view penalised graph (SPGraph) method
In this section, we propose the SPGraph method to predict DTIs. We explain the approach with the structural view; it can also be applied to the chemical view. Suppose we have structures for m drugs and sequences for n proteins. The drug structure similarity matrix and the protein sequence similarity matrix are known to be $W_s^d \in \mathbb{R}^{m \times m}$ and $W_s^t \in \mathbb{R}^{n \times n}$, respectively (the subscript 's' is short for structural). We suppose protein $j_h$ is known to be a target of drug $i_h$, $h = 1, \ldots, l$, and denote the set of known interacting drug–target pairs as $M = \{(i_1, j_1), \ldots, (i_l, j_l)\}$.
One can construct a drug graph and a protein graph by using $W_s^d$ and $W_s^t$ as the adjacency matrices, respectively. The graph Laplacians are $L_s^d = (D_s^d)^{-1/2} W_s^d (D_s^d)^{-1/2}$ and $L_s^t = (D_s^t)^{-1/2} W_s^t (D_s^t)^{-1/2}$, where $D_s^d$ and $D_s^t$ are diagonal matrices with diagonal elements being the row sums of $W_s^d$ and $W_s^t$, respectively. One can cluster drugs or proteins into K clusters using spectral clustering by solving the following relaxed optimisation problems
$$\max_{X_s \in \mathbb{R}^{m \times K}} \operatorname{tr}\left(X_s^{T} L_s^d X_s\right) \quad \text{s.t.} \quad X_s^{T} X_s = I \tag{1}$$
and
$$\max_{Y_s \in \mathbb{R}^{n \times K}} \operatorname{tr}\left(Y_s^{T} L_s^t Y_s\right) \quad \text{s.t.} \quad Y_s^{T} Y_s = I \tag{2}$$
respectively. The optimal
$X_s$ (or $Y_s$) is given by the eigenvectors of $L_s^d$ (or $L_s^t$) corresponding to the largest K eigenvalues. This can be considered as a dimension reduction step, in which the drugs or proteins are projected to a K‐dimensional space. k‐means is finally applied to cluster the m rows of $X_s$ (or the n rows of $Y_s$). However, clustering drugs or targets individually has limited use for drug–target prediction, since the projection does not consider the known interactions between drugs and targets.
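The single-graph spectral embedding step can be sketched as below. The sketch assumes the symmetrically normalised similarity matrix D^{-1/2} W D^{-1/2} (the convention for which the eigenvectors of the largest eigenvalues are the relevant ones) and uses a hypothetical toy matrix with two obvious groups; the final k-means step is left to any standard implementation.

```python
import numpy as np


def normalised_similarity(W):
    """Return D^{-1/2} W D^{-1/2}, with D the diagonal matrix of row sums."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def top_eigenvectors(L, K):
    """Eigenvectors (as columns) of symmetric L for its K largest eigenvalues."""
    _, vecs = np.linalg.eigh(L)      # eigh returns eigenvalues in ascending order
    return vecs[:, -K:][:, ::-1]     # keep the last K, largest first


# Hypothetical toy similarity matrix with two groups: {0, 1, 2} and {3, 4}.
W = np.full((5, 5), 0.05)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 1.0)

X = top_eigenvectors(normalised_similarity(W), K=2)
# Row normalisation, as done before k-means in standard spectral clustering.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
```

In this toy case the rows of `X_unit` coincide within each group and are orthogonal across groups, so any reasonable k-means run separates them.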
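To make full use of the known DTIs, we can construct a combined weighted drug–target graph in which the nodes are the drugs and targets, the edges between drugs represent drug similarities, the edges between targets represent target protein similarities, and the edges between drugs and targets represent whether they are known to interact. We want to cluster this graph to generate drug–target co‐modules, so that the drugs and proteins in the same co‐module are likely to interact with each other. Note that there are three types of edges in the graph (drug–drug, protein–protein and drug–protein), and the weights of different types of edges are on different scales and thus not comparable. We therefore project drugs and proteins to a common space in which the known interacting drugs and targets are as close as possible. When we cluster the projected drugs and targets, the known drug–target pairs are then likely to fall in the same co‐modules. Since protein $j_h$ is known to be a target of drug $i_h$, $h = 1, \ldots, l$, we want $x_{i_h}$ and $y_{j_h}$ to be as close as possible for all $h = 1, \ldots, l$. A penalty term which punishes the separation between the known interacting drugs and targets after projection can be added to the two individual objectives in (1) and (2). Thus, the following optimisation problem is proposed
$$\max_{X_s, Y_s} \ \operatorname{tr}\left(X_s^{T} L_s^d X_s\right) + \operatorname{tr}\left(Y_s^{T} L_s^t Y_s\right) - \lambda_s \sum_{h=1}^{l} \left\| x_{i_h} - y_{j_h} \right\|_2^2 \quad \text{s.t.} \quad X_s^{T} X_s = I, \ Y_s^{T} Y_s = I \tag{3}$$
where $L_s^d$ and $L_s^t$ are the drug and protein graph Laplacians of (1) and (2), $X_s \in \mathbb{R}^{m \times K}$ and $Y_s \in \mathbb{R}^{n \times K}$ are the projections of the drugs and proteins, $x_{i_h}$ is the transpose of the $i_h$‐th row of $X_s$, $y_{j_h}$ is the transpose of the $j_h$‐th row of $Y_s$, and $\lambda_s$ is a parameter controlling the penalty term. The third term is the sum of squared distances between drugs and targets in the projected space over all known interacting pairs. Clearly, we want this value to be as small as possible so that the known drug–target pairs fall in the same co‐modules after clustering. To obtain a simpler model, we change the original constraints $X_s^{T} X_s = I$ and $Y_s^{T} Y_s = I$ in (3) to $X_s^{T} X_s + Y_s^{T} Y_s = I$. Thus, by introducing $Z_s = \begin{bmatrix} X_s \\ Y_s \end{bmatrix} \in \mathbb{R}^{(m+n) \times K}$, for which the relaxed constraint reads $Z_s^{T} Z_s = I$, we obtain the following problem as the SPGraph model
$$\max_{Z_s} \ \operatorname{tr}\left(X_s^{T} L_s^d X_s\right) + \operatorname{tr}\left(Y_s^{T} L_s^t Y_s\right) - \lambda_s \sum_{h=1}^{l} \left\| x_{i_h} - y_{j_h} \right\|_2^2 \quad \text{s.t.} \quad Z_s^{T} Z_s = I \tag{4}$$
Note that
$$\operatorname{tr}\left(X_s^{T} L_s^d X_s\right) + \operatorname{tr}\left(Y_s^{T} L_s^t Y_s\right) = \operatorname{tr}\left(Z_s^{T} \tilde{L}_s Z_s\right)$$
where
$$\tilde{L}_s = \begin{bmatrix} L_s^d & 0 \\ 0 & L_s^t \end{bmatrix}$$
We introduce index matrices $A \in \{0,1\}^{l \times m}$ and $B \in \{0,1\}^{l \times n}$, where $A(h, i_h) = 1$ and $B(h, j_h) = 1$ and all other elements of A and B are zero. Then, we obtain $\sum_{h=1}^{l} \| x_{i_h} - y_{j_h} \|_2^2 = \| A X_s - B Y_s \|_F^2 = \operatorname{tr}\left(Z_s^{T} [A, -B]^{T} [A, -B] Z_s\right)$. The objective function of (4) can be further simplified as
$$\operatorname{tr}\left(Z_s^{T} \left(\tilde{L}_s - \lambda_s [A, -B]^{T} [A, -B]\right) Z_s\right)$$
By denoting
$$L_s = \begin{bmatrix} L_s^d - \lambda_s A^{T} A & \lambda_s A^{T} B \\ \lambda_s B^{T} A & L_s^t - \lambda_s B^{T} B \end{bmatrix} \tag{5}$$
we can simplify the optimisation problem (4) to
$$\max_{Z_s} \ \operatorname{tr}\left(Z_s^{T} L_s Z_s\right) \quad \text{s.t.} \quad Z_s^{T} Z_s = I \tag{6}$$
$L_s$ can be viewed as the combined Laplacian of drugs and targets in the structural view. The optimal $Z_s$ is given by the top K eigenvectors of $L_s$. The standard spectral clustering procedure can then be used. We first normalise each row of $Z_s$ to unit length and then apply k‐means to these m + n rows to obtain co‐modules of drugs and targets. The drugs and targets which are in the same cluster are more likely to interact with each other.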
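The role of the index matrices A and B can be checked numerically. The sketch below builds them for a few hypothetical known pairs (0-based indices) and verifies that the sum of squared distances over known pairs equals a Frobenius norm and a trace in the stacked embedding; the stacked matrix `C = [A, -B]` and the random embeddings are illustrative constructions, not quantities taken from the paper.

```python
import numpy as np


def index_matrices(pairs, m, n):
    """Build A (l x m) and B (l x n) with A[h, i_h] = 1 and B[h, j_h] = 1."""
    A = np.zeros((len(pairs), m))
    B = np.zeros((len(pairs), n))
    for h, (i, j) in enumerate(pairs):
        A[h, i] = 1.0
        B[h, j] = 1.0
    return A, B


m, n, K = 4, 3, 2
pairs = [(0, 1), (2, 0), (3, 1)]     # hypothetical known drug-target pairs
rng = np.random.default_rng(1)
X = rng.normal(size=(m, K))          # stand-in drug embedding
Y = rng.normal(size=(n, K))          # stand-in protein embedding
A, B = index_matrices(pairs, m, n)

# Three equivalent ways to write the penalty term.
penalty_sum = sum(np.sum((X[i] - Y[j]) ** 2) for i, j in pairs)
penalty_fro = np.linalg.norm(A @ X - B @ Y, "fro") ** 2
Z = np.vstack([X, Y])
C = np.hstack([A, -B])
penalty_tr = np.trace(Z.T @ C.T @ C @ Z)
```

Because the penalty is a quadratic form in the stacked embedding, it can be absorbed into a single (m + n) × (m + n) matrix, which is what makes the eigenvector solution possible.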
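To apply the SPGraph approach in the chemical view, we can construct another combined drug–target graph using the respective Gaussian kernels $W_c^d \in \mathbb{R}^{m \times m}$ and $W_c^t \in \mathbb{R}^{n \times n}$ as adjacency matrices (the subscript 'c' is short for chemical). Thus, the same procedure as above can be used to cluster the drugs and proteins together by solving the following problem
$$\max_{Z_c} \ \operatorname{tr}\left(Z_c^{T} L_c Z_c\right) \quad \text{s.t.} \quad Z_c^{T} Z_c = I \tag{7}$$
where
$$L_c = \begin{bmatrix} L_c^d - \lambda_c A^{T} A & \lambda_c A^{T} B \\ \lambda_c B^{T} A & L_c^t - \lambda_c B^{T} B \end{bmatrix} \tag{8}$$
Here $L_c^d$ and $L_c^t$ are the graph Laplacians built from $W_c^d$ and $W_c^t$ in the same way as in the structural view, A and B are the index matrices of the known DTIs, and $\lambda_c$ controls the penalty term. Similarly, $L_c$ can be viewed as the combined Laplacian of drugs and targets in the chemical view.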
‘SPGraph: single‐view penalised graph clustering’
Inputs : drug structure similarity matrix $W_s^d$, protein sequence similarity matrix $W_s^t$, known DTIs M, number of clusters K and parameter $\lambda_s$.
Outputs : $Z_s$ and K clusters of drugs and proteins:
Construct the matrix $L_s$.
Compute $Z_s$ = the top K eigenvectors of $L_s$ corresponding to its largest eigenvalues.
Normalise the rows of $Z_s$ to be unit vectors.
Apply k‐means to the rows of the normalised $Z_s$.
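The first three steps of the SPGraph procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the drug and protein Laplacians are taken to be the symmetrically normalised similarity matrices, the toy similarity matrices and known pairs are random or hypothetical, and the final k-means step is omitted (any standard k-means routine could be applied to the rows of `Z_unit`).

```python
import numpy as np


def normalised_similarity(W):
    """D^{-1/2} W D^{-1/2}, with D the diagonal matrix of row sums (assumed form)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def combined_laplacian(W_drug, W_prot, pairs, lam):
    """Combined (m+n) x (m+n) matrix with the known-pair penalty folded in."""
    m, n = W_drug.shape[0], W_prot.shape[0]
    A = np.zeros((len(pairs), m))
    B = np.zeros((len(pairs), n))
    for h, (i, j) in enumerate(pairs):
        A[h, i] = 1.0
        B[h, j] = 1.0
    L_d = normalised_similarity(W_drug)
    L_t = normalised_similarity(W_prot)
    top = np.hstack([L_d - lam * A.T @ A, lam * A.T @ B])
    bottom = np.hstack([lam * B.T @ A, L_t - lam * B.T @ B])
    return np.vstack([top, bottom])


def spgraph_embedding(W_drug, W_prot, pairs, lam, K):
    """Return the raw embedding Z and its row-normalised version."""
    L = combined_laplacian(W_drug, W_prot, pairs, lam)
    _, vecs = np.linalg.eigh(L)              # ascending eigenvalues
    Z = vecs[:, -K:]                         # eigenvectors of the K largest
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z, Z / np.where(norms > 0, norms, 1.0)


def random_similarity(size, rng):
    """Hypothetical symmetric similarity matrix with unit diagonal."""
    R = rng.uniform(0.1, 1.0, size=(size, size))
    W = (R + R.T) / 2
    np.fill_diagonal(W, 1.0)
    return W


rng = np.random.default_rng(0)
m, n, K, lam = 6, 5, 3, 1.0
W_drug, W_prot = random_similarity(m, rng), random_similarity(n, rng)
pairs = [(0, 0), (1, 0), (4, 3)]             # hypothetical known DTIs (0-based)
Z, Z_unit = spgraph_embedding(W_drug, W_prot, pairs, lam, K)
X, Y = Z[:m], Z[m:]                          # drug rows, then protein rows
```

Increasing `lam` strengthens the pull between known interacting pairs in the embedding, at the cost of fidelity to the within-type similarity structure.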
2.2.2 Multi‐view penalised graph method
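To predict drug–target interactions more accurately, one can integrate the structural and chemical views of drugs and proteins. Kumar et al. [10] proposed co‐regularised spectral clustering for multi‐view clustering, with each view having samples in one category. However, in our case, we have two categories of samples (drugs and proteins) in each view. Fortunately, the formulation of the SPGraph model allows us to generalise it to multiple views as follows
$$\max_{Z_s, Z_c} \ \lambda_1 \operatorname{tr}\left(Z_s^{T} L_s Z_s\right) + \lambda_2 \operatorname{tr}\left(Z_c^{T} L_c Z_c\right) - \tfrac{1}{2} \left\| Z_s Z_s^{T} - Z_c Z_c^{T} \right\|_F^2 \quad \text{s.t.} \quad Z_s^{T} Z_s = I, \ Z_c^{T} Z_c = I \tag{9}$$
where the single‐view Laplacians $L_s$ and $L_c$ are obtained from (5) and (8) in the structural view and chemical view, respectively. The first two terms aim to cluster drugs and proteins in the individual views, and the last term measures the disagreement between the two views. The parameters $\lambda_1$ and $\lambda_2$ trade off the spectral clustering objectives in both views and the spectral embedding disagreement term. Since $\| Z_s Z_s^{T} - Z_c Z_c^{T} \|_F^2 = 2K - 2 \operatorname{tr}\left(Z_s Z_s^{T} Z_c Z_c^{T}\right)$ under the constraints, the objective function can be simplified, up to an additive constant, as follows
$$\max_{Z_s, Z_c} \ \lambda_1 \operatorname{tr}\left(Z_s^{T} L_s Z_s\right) + \lambda_2 \operatorname{tr}\left(Z_c^{T} L_c Z_c\right) + \operatorname{tr}\left(Z_s Z_s^{T} Z_c Z_c^{T}\right) \quad \text{s.t.} \quad Z_s^{T} Z_s = I, \ Z_c^{T} Z_c = I \tag{10}$$
The optimisation can be solved by alternating optimisation with respect to $Z_s$ and $Z_c$. For a given $Z_s$, we obtain the optimisation problem
$$\max_{Z_c} \ \operatorname{tr}\left(Z_c^{T} \left(\lambda_2 L_c + Z_s Z_s^{T}\right) Z_c\right) \quad \text{s.t.} \quad Z_c^{T} Z_c = I \tag{11}$$
This is a spectral clustering objective on the chemical view with graph Laplacian $\lambda_2 L_c + Z_s Z_s^{T}$, and the solution $Z_c$ is given by the top K eigenvectors of this modified Laplacian. Once $Z_c$ is obtained, we can update $Z_s$ in the same way. The iteration stops when the difference in the value of the objective function between consecutive iterations falls below a threshold $\epsilon$. Note that we can use either $Z_s$ or $Z_c$ in the final k‐means step of the spectral clustering algorithm. In our experiments, we used $Z_s$, since structural information is believed to be more important for DTI prediction.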
‘MPGraph: multi‐view penalised graph clustering’
Inputs : drug structure similarity matrix $W_s^d$, protein sequence similarity matrix $W_s^t$, drug response similarity matrix $W_c^d$, gene expression similarity matrix $W_c^t$, known DTIs M, number of clusters K and parameters $\lambda_1$, $\lambda_2$, $\lambda_s$ and $\lambda_c$.
Outputs : $Z_s$, $Z_c$ and K clusters of drugs and proteins:
Compute $L_s$ and $L_c$.
Set the initial $Z_s$.
Solve model (10) for $Z_s$ and $Z_c$ iteratively.
Normalise the rows of $Z_s$ (or $Z_c$) to be unit vectors.
Apply k‐means to the rows of the normalised $Z_s$ (or $Z_c$).
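The alternating scheme can be sketched as below. The exact placement of the weights is an assumption made for this illustration: the objective is taken to be λ1·tr(Z_sᵀL_sZ_s) + λ2·tr(Z_cᵀL_cZ_c) + tr(Z_sZ_sᵀZ_cZ_cᵀ), maximised under orthonormality constraints, so that each half-step reduces to an eigenvector problem on a modified matrix. The function names, the initialisation from the structural view and the random symmetric stand-ins for the combined Laplacians are all hypothetical.

```python
import numpy as np


def top_k(M, K):
    """Eigenvectors of the K largest eigenvalues of a symmetric matrix."""
    _, vecs = np.linalg.eigh((M + M.T) / 2)   # symmetrise against round-off
    return vecs[:, -K:]


def objective(L_s, L_c, Z_s, Z_c, lam1, lam2):
    """Assumed co-regularised objective (larger is better)."""
    agreement = np.trace(Z_s @ Z_s.T @ Z_c @ Z_c.T)
    return (lam1 * np.trace(Z_s.T @ L_s @ Z_s)
            + lam2 * np.trace(Z_c.T @ L_c @ Z_c)
            + agreement)


def mpgraph(L_s, L_c, K, lam1=1.0, lam2=1.0, tol=1e-8, max_iter=100):
    """Alternate exact updates of Z_c and Z_s until the objective stalls."""
    Z_s = top_k(L_s, K)                       # assumed initialisation
    history = []
    for _ in range(max_iter):
        Z_c = top_k(lam2 * L_c + Z_s @ Z_s.T, K)   # update chemical view
        Z_s = top_k(lam1 * L_s + Z_c @ Z_c.T, K)   # update structural view
        history.append(objective(L_s, L_c, Z_s, Z_c, lam1, lam2))
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return Z_s, Z_c, history


def random_symmetric(size, rng):
    """Hypothetical stand-in for a combined Laplacian."""
    R = rng.normal(size=(size, size))
    return (R + R.T) / 2


rng = np.random.default_rng(0)
L_s, L_c = random_symmetric(8, rng), random_symmetric(8, rng)
Z_s, Z_c, history = mpgraph(L_s, L_c, K=3)
```

Each half-step maximises the assumed objective exactly over one block of variables, so the recorded objective values cannot decrease from one sweep to the next.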
2.3 Parameter selection
Note that there are five parameters in this multi‐view model: $\lambda_1$, $\lambda_2$, K, $\lambda_c$ and $\lambda_s$. As in many multi‐view approaches (e.g. [10]), one can choose the weights manually, given prior information about which view is more important. To achieve more objective results, cross‐validation can also be used to select the parameters. However, it would be expensive to select all five parameters in this way. Thus, as a compromise, we chose a partially automatic selection strategy, which allows us to fix some parameters and select the others by cross‐validation. For example, we can fix $\lambda_1$ and $\lambda_2$, and select K, $\lambda_s$ and $\lambda_c$ by cross‐validation on the training data. In the cross‐validation, we choose the parameters corresponding to the highest average area under the curve (AUC) value.
3 Results and discussion
3.1 Evaluation of the MPGraph approach
To evaluate our method, we used a smaller subset of our data. Among the original 326 drugs, some have no known targets, and likewise some proteins have no known interacting drugs. We removed these drugs and proteins and obtained 65 drugs and 80 targets with the 114 interactions.
On this smaller subset of data, we evaluated our methods in three experimental settings: ND, NT and NDNT. ND aims to predict targets for new drugs (NDs), NT aims to predict interacting drugs for new targets (NTs), and NDNT aims to predict whether an ND and an NT interact. For each setting, we applied our SPGraph method to drug structure and protein sequence in the structural view (SPGraph‐S), and to NCI‐60 drug response and NCI‐60 gene expression in the chemical view (SPGraph‐C). We also applied our MPGraph method to integrate both views (MPGraph).
For ND, we randomly split all the drugs into five folds, one of which serves as the test drugs while the remaining folds serve as training drugs. Each fold is used as the test set once. Using the known interactions between the training drugs and all proteins, we project all drugs and proteins to a K‐dimensional space by SPGraph‐S, SPGraph‐C or MPGraph, where the closest r proteins around a test drug are assumed to be its targets. By changing the threshold r, we obtain the receiver operating characteristic (ROC) curve and thus compute the AUC. The setting of NT is similar to that of ND, except that the proteins are split into training and test sets. In the case of NDNT, we split both drugs and proteins into five folds each. Each time we take one fold of drugs and one fold of proteins as test samples, and the remaining folds as training samples. By using the known interactions between the training drugs and training targets, we project all drugs and proteins into a common K‐dimensional space. AUCs are computed in the same way as in the ND case. For each setting, we randomly split the samples 50 times and report the average AUCs and standard errors.
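The ranking-based evaluation for a single test drug can be sketched as follows. Sweeping the threshold r over the distance-ranked proteins traces the ROC curve, and the area under it equals the probability that a true target is ranked closer than a non-target, which is what the helper below computes. The coordinates and target sets are hypothetical, and how per-drug results are pooled into one AUC is not specified here, so this sketch covers one drug only.

```python
import numpy as np


def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def drug_auc(drug_point, protein_points, target_indices):
    """AUC for one test drug: closer proteins get higher scores."""
    dist = np.linalg.norm(protein_points - drug_point, axis=1)
    labels = [1 if j in target_indices else 0
              for j in range(len(protein_points))]
    return auc_from_scores(-dist, labels)


# Hypothetical 2-D embedding: one test drug and four proteins.
drug_point = np.array([0.0, 0.0])
protein_points = np.array([[0.1, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 0.2]])

auc_good = drug_auc(drug_point, protein_points, {0, 3})   # targets are nearest
auc_bad = drug_auc(drug_point, protein_points, {1, 2})    # targets are farthest
auc_mixed = drug_auc(drug_point, protein_points, {0, 2})
```

An AUC near 0.5 therefore means the embedding ranks true targets no better than chance, which is the reference point for reading the reported values.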
Note that for both the SPGraph and MPGraph approaches, we set $\lambda_1 = \lambda_2 = 1$, and chose K, $\lambda_c$ and $\lambda_s$ by 4‐fold cross‐validation from the sets {20, 30, 40, 50}, {0.01, 1, 100} and {0.01, 1, 100}, respectively.
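The grid search over K, λc and λs can be sketched as below. The candidate sets are those listed above; the scoring function is a made-up placeholder standing in for the average cross-validated AUC, which in practice would require running the full model on each training split.

```python
from itertools import product

K_grid = [20, 30, 40, 50]
lambda_c_grid = [0.01, 1, 100]
lambda_s_grid = [0.01, 1, 100]


def select_parameters(cv_auc):
    """Return the (K, lambda_c, lambda_s) triple with the highest score."""
    candidates = product(K_grid, lambda_c_grid, lambda_s_grid)
    return max(candidates, key=lambda params: cv_auc(*params))


def placeholder_cv_auc(K, lambda_c, lambda_s):
    # Stand-in only: an arbitrary function, NOT a real cross-validated AUC.
    return -abs(K - 40) - abs(lambda_c - 1) - abs(lambda_s - 1)


best = select_parameters(placeholder_cv_auc)
n_candidates = len(K_grid) * len(lambda_c_grid) * len(lambda_s_grid)
```

With λ1 and λ2 fixed, the search covers 36 combinations per training split, which is what keeps the partially automatic strategy affordable.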
We compared our methods with SVMs and BGL [4] in these three settings. For SVM, we classify drug–protein pairs into two classes according to whether the pair interacts or not. The Kronecker product $K = K_d \otimes K_t$ of the drug kernel $K_d$ and the protein kernel $K_t$ is used in the SVM as the kernel among drug–protein pairs. The regularisation parameter is selected by 4‐fold cross‐validation on the training data. We also applied a multiple kernel SVM (MK‐SVM) by using the additive kernel formed from the structural kernel and the chemical kernel, generated from the structural view and chemical view, respectively.
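The pairwise kernel can be illustrated with NumPy's `np.kron`. The kernel values below are hypothetical, the row-major pair ordering in `pair_index` is simply the one `np.kron` produces, and taking an unweighted sum for the multiple kernel case is an assumption.

```python
import numpy as np

# Hypothetical structural kernels for 2 drugs and 3 proteins.
K_d = np.array([[1.0, 0.2],
                [0.2, 1.0]])
K_t = np.array([[1.0, 0.5, 0.1],
                [0.5, 1.0, 0.3],
                [0.1, 0.3, 1.0]])

# Kernel among all drug-protein pairs: 6 x 6.
K_pair = np.kron(K_d, K_t)


def pair_index(i, j, n_proteins):
    """Row/column of pair (drug i, protein j) in the Kronecker kernel."""
    return i * n_proteins + j


n = K_t.shape[0]
# Similarity of pair (drug 0, protein 1) to pair (drug 1, protein 2).
entry = K_pair[pair_index(0, 1, n), pair_index(1, 2, n)]

# Hypothetical chemical-view kernels, combined additively with the above.
K_d_chem = np.array([[1.0, 0.6],
                     [0.6, 1.0]])
K_t_chem = np.eye(3)
K_multi = K_pair + np.kron(K_d_chem, K_t_chem)
```

Each entry of the pair kernel is the product of a drug similarity and a protein similarity, so two pairs are similar only when both their drugs and their proteins are.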
We report the results in Table 1. The approaches applied to the structural view obtained higher AUCs than those applied to the chemical view. In the structural view, all three approaches performed poorly in the NT setting. The SPGraph obtained results similar to the other methods in the ND setting and the best results in the NDNT setting. In the chemical view, all three approaches obtained almost random results in all three settings. These results show that neither view alone gives good predictions. For the multi‐view methods, the multiple kernel SVM performed even worse than the single‐view SVM on the structural view. However, our MPGraph increased the AUC by 10% in the ND setting, 4% in the NT setting and 7% in the NDNT setting. This shows that although the SPGraph offers limited improvement on single‐view data, its multi‐view version MPGraph improves the accuracy substantially.
Table 1.
Average AUCs and standard errors by the SPGraph, the MPGraph and other comparison partners
AUCs | Structural: SVM | Structural: BGL | Structural: SPGraph | Chemical: SVM | Chemical: BGL | Chemical: SPGraph | Multi‐view: MK‐SVM | Multi‐view: MPGraph |
---|---|---|---|---|---|---|---|---|
ND | 0.578 ± 0.003 | 0.547 ± 0.003 | 0.581 ± 0.006 | 0.443 ± 0.001 | 0.508 ± 0.003 | 0.482 ± 0.006 | 0.479 ± 0.003 | 0.678 ± 0.004 |
NT | 0.508 ± 0.003 | 0.435 ± 0.003 | 0.460 ± 0.005 | 0.468 ± 0.001 | 0.495 ± 0.003 | 0.473 ± 0.006 | 0.455 ± 0.003 | 0.541 ± 0.005 |
NDNT | 0.499 ± 0.009 | 0.480 ± 0.009 | 0.548 ± 0.016 | 0.460 ± 0.009 | 0.480 ± 0.011 | 0.484 ± 0.019 | 0.474 ± 0.008 | 0.612 ± 0.015 |
3.2 Co‐modules of drugs and targets by the MPGraph
We applied our MPGraph method to the integrated data with the structural view and chemical view of the original data, including m = 326 drugs and n = 608 proteins. We first fixed the parameters $\lambda_s = \lambda_c = \lambda_1 = 1$ and $\lambda_2 = 100$, based on experience from the above experiments, and K = 40. With the 114 known interactions, we applied the MPGraph algorithm to project drugs and proteins to a K‐dimensional Euclidean space and then clustered them into co‐modules.
Note that the clustering results of k‐means depend on the choice of initial centres, and thus different runs of k‐means might give different results. We therefore ran k‐means 100 times with random initialisations and obtained 100 clustering results. We constructed a weighted bipartite graph of drugs and targets from these results as follows. For each clustering result, we constructed an unweighted bipartite graph, in which the nodes are the m drugs and n targets, and there is an edge between a drug and a protein if they belong to the same co‐module. Thus, we have 100 unweighted bipartite graphs, whose adjacency matrices are denoted by $G_1, \ldots, G_{100}$. By taking the average of all the adjacency matrices, we obtain the weighted bipartite graph with adjacency matrix $\bar{G} = \frac{1}{100} \sum_{r=1}^{100} G_r$. We further remove the edges with weights < 0.6 in $\bar{G}$, and the new graph is denoted as $\hat{G}$. Thus, the drug–target pairs linked in $\hat{G}$ belong to one co‐module with sufficient confidence, and can be considered as DTIs. Note that some known drug–target interactions are recovered among the edges in $\hat{G}$, and some are not. We call the former recalled interactions and the latter unrecalled interactions. Thus, we can define training precision and recall as follows
$$\text{precision} = \frac{\#\{\text{recalled interactions}\}}{\#\{\text{edges in } \hat{G}\}}, \qquad \text{recall} = \frac{\#\{\text{recalled interactions}\}}{\#\{\text{known interactions}\}}$$
In Fig. 1, we report the number of edges in the thresholded graph $\hat{G}$ and the number of recalled interactions for K = 10, 20, …, 400. We can see that the majority of the known interactions are recalled. We also report the precision–recall curve in Fig. 1. A small K generates many edges in $\hat{G}$, and thus precision is low and recall is high, whereas a large K gives high precision and low recall. Note that the recall is higher than 0.5 for all values of K, which means our approach reproduces most of the known interactions.
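The consensus step over repeated k-means runs, together with the training precision and recall, can be sketched as follows. The three toy clustering results, the known pairs and the variable names are hypothetical; the paper uses 100 runs and the 0.6 weight threshold shown here.

```python
import numpy as np


def co_membership(drug_labels, protein_labels):
    """Unweighted bipartite adjacency: 1 if drug and protein share a cluster."""
    d = np.asarray(drug_labels)[:, None]
    p = np.asarray(protein_labels)[None, :]
    return (d == p).astype(float)


# Hypothetical cluster labels from three k-means runs (3 drugs, 2 proteins).
runs = [
    ([0, 0, 1], [0, 1]),
    ([0, 1, 1], [0, 1]),
    ([0, 0, 1], [0, 1]),
]

# Average the per-run adjacency matrices, then drop edges with weight < 0.6.
G_bar = np.mean([co_membership(d, p) for d, p in runs], axis=0)
G_hat = G_bar >= 0.6

known = {(0, 0), (2, 0)}                      # hypothetical known DTIs
predicted = {(int(i), int(j)) for i, j in zip(*np.nonzero(G_hat))}
recalled = predicted & known

precision = len(recalled) / len(predicted)    # recalled / edges kept
recall = len(recalled) / len(known)           # recalled / known interactions
new_predictions = predicted - known
```

Edges that survive the threshold but are not among the known interactions are the candidate new DTIs.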
Fig. 1.
Prediction results
Left: the number of interactions in the thresholded graph $\hat{G}$ and the number of recalled interactions for different values of K; right: the precision–recall curve
Removing the recalled interactions, we obtain new predicted DTIs from the thresholded graph $\hat{G}$. We report the results for K = 40. In Fig. 2, we plot the nodes and edges of the bipartite graph $\hat{G}$, where the dotted edges represent the recalled edges and the solid edges represent the new predicted edges. We also plot the unrecalled known interactions as dashed edges in the figure. We can see that some solid edges occur when the drug and the target at their ends are linked by a short path (e.g. the solid edge in the right corner of the figure).
Fig. 2.
DTIs (solid and dotted edges) in the thresholded graph $\hat{G}$ when K = 40 and the unrecalled known interactions (dashed edges). The circles and triangles represent drugs and targets, respectively.
Finally, we report in Table 2 the new predicted DTIs for FDA‐approved drugs with weight 1 in the thresholded graph $\hat{G}$. The predicted drug–target pairs with weights larger than 0.6 and less than 1 are reported in the supplementary material (S.xls). We observed that some of the predicted DTIs in Table 2 are supported by references, and some predicted interactions give hints for understanding drug mechanisms or side effects.
Table 2.
Predicted targets for 22 FDA‐approved drugs
KEGG IDs | Drug names | Gene names |
---|---|---|
D00221 | acetylcysteine | HADHSC, SLC1A4 |
D00203 | amphotericin B | MMP8 |
D03150 | bortezomib | GRM3 |
D02089 | buclizine | SCNN1A, ORM1, UROD |
D00266 | chlorambucil | CACNB1, CYP1B1 |
D00214 | dactinomycin | JARID1D |
D00292 | dexamethasone | CCL23, HLA‐DQB1, DDC |
D01464 | flumethasone pivalate | POLA2, CTNNB1 |
D01825 | fluocinolone acetonide | POLA2, CTNNB1 |
D01367 | fluorometholone | POLA2, CTNNB1 |
D00070 | folic acid | AARS, NDUFS1, HMOX1, CFD |
D00341 | hydroxyurea | NSDHL, STAT1 |
D05096 | mycophenolic acid | PDE2A |
D00183 | nalidixic acid | CACNB1, CYP1B1 |
D01631 | pentagastrin | AARS, NDUFS1, HMOX1, CFD, GRM3 |
D00560 | pimozide | SCNN1A |
D05529 | podofilox | CPB1 |
D00473 | prednisone | ALDH3B2 |
D05932 | streptozocin | GABRA5 |
D00888 | terconazole | ORM1, UROD |
D00153 | testolactone | ALDH3B2 |
D00385 | triamcinolone | POLA2, CTNNB1 |
Acetylcysteine has been shown to have antiviral effects in patients with human immunodeficiency virus because of inhibition of viral stimulation by reactive oxygen intermediates. Hydroxyacyl‐CoA dehydrogenase (HADHSC) is a member of the 3‐hydroxyacyl‐CoA dehydrogenase gene family. One function of the encoded protein in the mitochondrial matrix is to catalyse the oxidation of straight‐chain 3‐hydroxyacyl‐CoAs as part of the beta‐oxidation pathway. The possible interaction between HADHSC and acetylcysteine is mentioned in [17, 18]. Besides, solute carrier family 1, member 4 (SLC1A4) might be a transporter of acetylcysteine, since another gene in the solute carrier family, SLCO1B1, is known to be one of its transporters. Amphotericin B is used to treat fungal infections. The protein encoded by matrix metallopeptidase 8 (MMP8) is involved in the breakdown of extracellular matrix in normal physiological processes. Rohini et al. [19] found that MMP8 is related to fungal keratitis. Buclizine is an FDA‐approved drug used for the prevention and treatment of nausea, vomiting and dizziness. Vomiting is a key element of the body's defence system against accidentally ingested toxins and indigestible food matter. The gene sodium channel, non‐voltage‐gated 1 alpha subunit (SCNN1A) is responsible for controlling fluid and electrolyte transport across epithelia in many organs. The close relationship between sodium channels and vomiting has been partially supported in [20] by experiments on fish. Dexamethasone is an anti‐inflammatory drug. It functions by interfering with the mediators of the inflammatory response, suppressing humoral immune responses and reducing edema or scar tissue. The gene chemokine ligand 23 (CCL23) is one of several cytokine genes, and cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. Another candidate target, HLA‐DQB1, is also known to function in mediating immune responses.
Mycophenolic acid is an antineoplastic drug. It is an inhibitor of inosine monophosphate dehydrogenase. The gene phosphodiesterase 2A (PDE2A, cGMP‐stimulated) belongs to a family of related phosphohydrolases that selectively catalyse the hydrolysis of 3′ cyclic phosphate bonds in adenosine and/or guanine cyclic monophosphate (cAMP and/or cGMP). Thus, the gene PDE2A might have some link with mycophenolic acid. Testolactone is an antineoplastic agent for the treatment of breast cancer. Its established target is cytochrome P450 19A1, which catalyses the formation of aromatic C18 estrogens from C19 androgens [21]. We found that aldehyde dehydrogenase 3 family, member B2 (ALDH3B2) is involved in the pathway ‘Drug metabolism – cytochrome P450’ [13].
4 Conclusion
In this work, we proposed two semi‐supervised learning approaches SPGraph and MPGraph to predict DTIs by integrating multiple data sources in a single view and multiple views.
The SPGraph approach was first developed to predict DTIs in a structural view by integrating drug structure and protein sequence data. It clusters both drugs and targets such that known interacting drug–target pairs belong to the same clusters as far as possible. This is done by projecting drugs and targets to a low‐dimensional space by Laplacian eigenmaps with an extra penalty term which represents the closeness of the known interacting drugs and proteins. The model is shown to have the form of spectral clustering with a combined Laplacian of the drug space and protein space, and the optimisation problem can be easily solved by eigenvalue decomposition. We also applied the SPGraph method to the chemical view with drug response data and gene expression data. The form of the optimisation objective function in the SPGraph model motivates us to generalise it to a multi‐view version, MPGraph. The MPGraph borrows the idea of multi‐view co‐regularised spectral clustering [10], and thus can integrate the structural view and chemical view for drug–target prediction. Our experimental results showed that the MPGraph increases the AUC by 4–10% across the three evaluation settings. We also reported some new predicted DTIs for further exploration. Although we only give a two‐view formulation of the MPGraph, it can be easily extended to more than two views.
Note that drug side effects have also been used in previous work [6, 7] to predict DTIs, since similar side effects of drugs might imply that they have similar targets. The biological functions of the off‐targets often result in the side effects of the drugs. Thus, one can also apply the SPGraph model in a so‐called functional view to predict DTIs by integrating drug side effects data and protein function data. The MPGraph approach can then be applied for drug–target prediction by integrating three views: the structural view, chemical view and functional view. However, increasing the number of views increases the number of parameters in the model, which requires a more efficient strategy to choose parameters or to avoid choosing them. This is a general open problem in multi‐view learning research, and we will investigate it further in the future.
5 Acknowledgment
The work was supported in part by the NSFC projects 11101328 and 11071218 and the Open Project Program of the State Key Lab of CAD & CG (Grant No. A1313) in Zhejiang University.
6 References
- 1. Cusick M.E. Barabasi A.L. Vidal M. Yildirim M.A., and Goh K.I.: ‘Drug‐target network’, Nat Biotechnol., 2008, 25, pp. 1119C1126 [DOI] [PubMed] [Google Scholar]
- 2. Cheng F. Liu C., and Jiang J. et al.: ‘Prediction of drug‐target interactions and drug repositioning via network‐based inference’, PLoS Comput. Biol., 2012, 8, (5), pp. e1002503 (doi: 10.1371/journal.pcbi.1002503)
- 3. Kutalik Z. Beckmann J.S., and Bergmann S.: ‘A modular approach for integrative analysis of large‐scale gene‐expression and drug‐response data’, Nat. Biotechnol., 2008, 26, pp. 531–539 (doi: 10.1038/nbt1397)
- 4. Yamanishi Y. Araki M. Gutteridge A. Honda W., and Kanehisa M.: ‘Prediction of drug–target interaction networks from the integration of chemical and genomic spaces’, Bioinformatics, 2008, 24, pp. i232–i240 (doi: 10.1093/bioinformatics/btn162)
- 5. Bleakley K., and Yamanishi Y.: ‘Supervised prediction of drug–target interactions using bipartite local models’, Bioinformatics, 2009, 25, (18), pp. 2397–2403 (doi: 10.1093/bioinformatics/btp433)
- 6. Campillos M. Kuhn M. Gavin A.C. Jensen L.J., and Bork P.: ‘Drug target identification using side‐effect similarity’, Science, 2008, 321, pp. 263–266 (doi: 10.1126/science.1158140)
- 7. Mizutani S. Pauwels E. Stoven V. Goto S., and Yamanishi Y.: ‘Relating drug–protein interaction network with drug side effects’, Bioinformatics, 2012, 28, (ECCB), pp. i522–i528 (doi: 10.1093/bioinformatics/bts383)
- 8. Folger O. Jerby L. Frezza C. Gottlieb E. Ruppin E., and Shlomi T.: ‘Predicting selective drug targets in cancer through metabolic networks’, Mol. Syst. Biol., 2011, 7, p. 501 (doi: 10.1038/msb.2011.63)
- 9. Li L. Zhou X. Ching W.‐K., and Wang P.: ‘Predicting enzyme targets for cancer drugs by profiling human metabolic reactions in nci‐60 cell lines’, BMC Bioinf., 2011, 11, p. 501
- 10. Kumar A. Rai P., and Daumé H. III: ‘Co‐regularized multi‐view spectral clustering’. NIPS, 2011.
- 11. Shankavaram U.T. Reinhold W.C., and Nishizuka S. et al.: ‘Transcript and protein expression profiles of the nci‐60 cancer cell panel: an integromic microarray study’, Mol. Cancer Ther., 2007, 6, pp. 1535–7163 (doi: 10.1158/1535-7163.MCT-06-0650)
- 12. Reinhold W.C. Sunshine M., and Liu H. et al.: ‘Cellminer: a web‐based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the nci‐60 cell line set’, Cancer Res., 2012, 72, pp. 3499 (doi: 10.1158/0008-5472.CAN-12-1370)
- 13. Kanehisa M. Goto S., and Hattori M. et al.: ‘From genomics to chemical genomics: new developments in kegg’, Nucleic Acids Res., 2006, 35, pp. 354–357 (doi: 10.1093/nar/gkj102)
- 14. Hattori M. Okuno Y. Goto S., and Kanehisa M. et al.: ‘Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways’, J. Am. Chem. Soc., 2003, 125, pp. 11853–11865 (doi: 10.1021/ja036030u)
- 15. Smith T.F., and Waterman M.S.: ‘Identification of common molecular subsequences’, J. Mol. Biol., 1981, 147, pp. 195–197 (doi: 10.1016/0022-2836(81)90087-5)
- 16. Knox C. Law V. Jewison T., and Liu P. et al.: ‘Drugbank 3.0: a comprehensive resource for ‘omics’ research on drugs’, Nucleic Acids Res., 2011, 39, pp. D1035–1041 (doi: 10.1093/nar/gkq1126)
- 17. Seiva F.R. Amauchi J.F., and Rocha K.K. et al.: ‘Alcoholism and alcohol abstinence: N‐acetylcysteine to improve energy expenditure, myocardial oxidative stress, and energy metabolism in alcoholic heart disease’, Alcohol., 2009, 43, (8), pp. 649–656 (doi: 10.1016/j.alcohol.2009.09.028)
- 18. Diniz Y.S. Rocha K.K., and Souza G.A. et al.: ‘Effects of n‐acetylcysteine on sucrose‐rich diet‐induced hyperglycaemia, dyslipidemia and oxidative stress in rats’, Eur. J. Pharmacol., 2006, 543, (1–3), pp. 151–157 (doi: 10.1016/j.ejphar.2006.05.039)
- 19. Rohini G. Murugeswari P. Prajna N.V. Lalitha P., and Muthukkaruppan V.: ‘Matrix metalloproteinases (mmp‐8, mmp‐9) and the tissue inhibitors of metalloproteinases (timp‐1, timp‐2) in patients with fungal keratitis’, Cornea, 2007, 26, (2), pp. 207–211 (doi: 10.1097/01.ico.0000248384.16896.7d)
- 20. Andrews P.L.R. Sims D.W., and Young J.Z.: ‘Induction of emesis by the sodium channel activator veratrine in the lesser spotted dogfish, scyliorhinus canicula (chondrichthyes:elasmobranchii)’, J. Mar. Biol. Assoc. (UK), 1998, 78, pp. 1269–1279 (doi: 10.1017/S0025315400044489)
- 21. Dunkel L.: ‘Use of aromatase inhibitors to increase final height’, Mol. Cell Endocrinol., 2006, 254–255, pp. 207–216 (doi: 10.1016/j.mce.2006.04.031)