Predicting The Effects of Chemical-Protein Interactions On Proteins Using Tensor Factorisation

Sameh K Mohamed; Aayah Nounu

2020 May 30;2020:430–439.

PMCID: PMC7233103 PMID: 32477664

Abstract

Understanding the different effects of chemical substances on human proteins is fundamental for designing new drugs. It is also important for elucidating the different mechanisms of action of drugs that can cause side-effects. In this context, computational methods for predicting chemical-protein interactions can provide valuable insights into the relation between therapeutic chemical substances and proteins. Their predictions can therefore help in multiple tasks such as drug repurposing and identifying new drug side-effects. Despite their useful predictions, these methods are unable to predict the different implications of chemical-protein interactions, such as changes in protein expression or abundance. Therefore, in this work, we study the modelling of the effects of chemical-protein interactions on protein activity using computational approaches. We propose using 3D tensors to model chemicals, their target proteins and the effects associated with their interactions. We then use multi-part embedding tensor factorisation to predict the different effects of chemicals on human proteins. We assess the predictive accuracy of our proposed method using a benchmark dataset that we built. We then show by computational experimental evaluation that our approach outperforms other tensor factorisation methods in the task of predicting effects of chemicals on human proteins.

Introduction

Understanding the different effects of chemical substances on human proteins is fundamental for designing new drugs. It is also important for elucidating the different mechanisms-of-action of current drugs that can cause unwanted side-effects ⁽¹⁾. This encouraged researchers to investigate the different chemical–protein interactions and their effects on the protein activity in living systems. Chemicals can have different types of effects on their target proteins such as change of expression, abundance, secretion, etc. These different effects then play various roles in the mechanism-of-action of chemicals in living systems. Therefore, understanding the chemical–protein interactions with their respective effects is crucial to elucidating the mechanism-of-action of therapeutic chemical substances.

The process of investigating chemical–protein interactions and their effects commonly involves ‘omics approaches such as mass spectrometry, which is used for the generation of proteomic data. This method is mainly used to identify off-target or non-canonical targets of chemicals (drugs) that may be unknown ^{(2, 3)}. Other lab approaches include phenotypic screening, such as the work carried out by Iljin et al. ⁽⁴⁾, in which 4910 drug-like small molecule compounds were tested against prostate cancer cell lines to identify which ones affected cell proliferation. Despite their insightful findings, such approaches are laborious, time-consuming and resource-intensive.

This encouraged the development of different computational approaches to inform and assist laboratory experimentation of chemical–protein interactions ^(5–7). These approaches enabled predicting the most plausible chemical–protein interactions with high accuracy and efficiency ^{(6, 7)}. However, all the current computational approaches are only focused on predicting the existence of chemical–protein interactions, and they do not provide any information on the effects of these interactions.

In this study, we extend the design of traditional chemical–protein interaction computational approaches to allow them to encode the different types of effects caused by these interactions on target proteins. We model the information on chemicals, their target proteins and associated interaction effects as a 3D tensor, to which we can apply tensor factorisation methods to infer new unknown chemical–protein interaction effects.

Tensor factorisation methods have been widely adopted in different biological tasks including drug–target interaction prediction ^{(7, 8)}, drug side-effect prediction ^{(9, 10)}, protein biological functions prediction ⁽¹¹⁾, etc. Tensor factorisation approaches were then used to generate vector representations of biological entities to provide predictions of their unknown associations to other entities.

In the context of our study, we use tensor factorisation methods to generate embeddings of chemicals, their targeted proteins and corresponding chemical effects. We then use these embeddings to predict new possible chemical–protein interactions and their associated chemical effects. We use a multi-part tensor factorisation approach that models tensor objects’ embeddings using multiple tensors and we show that this approach has better accuracy than other tensor factorisation approaches. To the best of our knowledge, this work is the first computational method that considers the context of chemical effects in predicting chemical–protein interactions. Therefore, we only compare our method to other tensor factorisation approaches.

We build a benchmarking dataset that consists of known chemical–protein interactions with their associated chemical effects extracted from the Comparative Toxicogenomics Database (CTD) ^{(12, 13)}. We then show by computational experimental evaluation on this benchmark that our proposed multi-part tensor factorisation approach outperforms all other approaches in the task of predicting chemical–protein interactions and their associated chemical effects.

The rest of this study is outlined as follows: the background section presents a brief background on the studied problem and proposed approach. The materials section discusses the benchmarking dataset used in this study. The methods section discusses the design details of our approach. The results section presents the experimental setup of our evaluation benchmark and the outcome results. The discussion section presents the findings of this study and discusses the challenges and limitations of the proposed approach.

Background

In this section, we discuss the preliminary concepts and notations that we use throughout this study. We first discuss the different systematic approaches used for drug repurposing. We then discuss the tensor factorisation model and other notations that we use during the training and evaluation of our approach.

Drug repurposing approaches. Drug repurposing is the use of well-known drugs to treat diseases other than those they were originally designed for ⁽¹⁴⁾. This is a more cost-effective and time-efficient process than developing new drugs, as it bypasses the need for the drug to go through Phase I of the clinical trials since it already has a known safety profile ⁽¹⁵⁾. For this reason, more systematic approaches have been developed, including both computational and experimental approaches.

The first example of a computational approach is signature matching, which compares the transcriptomic profiles, chemical structures or adverse drug effects of two different drugs. Increased similarity in these signatures indicates similar targets ^(16–18). Other examples include the use of genome-wide association studies (GWAS), whereby a genetic locus associated with one disease may also be associated with other diseases. This shared association may indicate the potential to use the same drug to treat these diseases ⁽¹⁹⁾. Another approach involves retrospective analysis of electronic health records, which has been useful in suggesting candidates for drug repurposing. This approach has been used for repurposing drugs like Sildenafil Citrate ⁽²⁰⁾. Furthermore, it was also recently used for repurposing Aspirin, which was originally used for cardiovascular diseases but has now been recommended by the US Preventive Services Task Force for the chemoprevention of colorectal cancer ⁽²¹⁾.

In this work however, the scope of our study is focused on computational approaches that utilize information about the drug–protein interactions and their chemical effects on the protein activity.

Tensor factorisation. Scalars are singular numerical values, vectors are one dimensional numerical arrays, matrices are two dimensional numerical arrays, and tensors are numerical arrays with three or more dimensions. In our study, we focus on tensors with only three dimensions. In this case, a tensor cell represents the interaction between three components from the different tensor dimensions. This interaction commonly denotes the likelihood of a joint association between the three components which are represented by the cell. In practice, each dimension of a tensor data model represents components from a specific type such as chemicals or proteins in our case. For example, let us assume that we have a 3D tensor of chemicals, effects and protein targets. The value of the cell corresponding to the three components (1-nitropyrene, decreases_expression, LRRC17) will be 1 if the chemical “1-nitropyrene” has the effect “decreases_expression” on the protein target “LRRC17” and 0 otherwise.
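The tensor described above can be illustrated with a minimal Python sketch. The vocabularies below are hypothetical toy values chosen for illustration; only the triple (1-nitropyrene, decreases_expression, LRRC17) is taken from the text, and a dense nested list is used purely for readability (a real dataset of this size would need a sparse representation).

```python
# Hypothetical toy vocabularies; only the first known triple comes from the text.
chemicals = ["1-nitropyrene", "aspirin"]
effects = ["decreases_expression", "increases_expression"]
proteins = ["LRRC17", "CCN5"]

known_triples = [("1-nitropyrene", "decreases_expression", "LRRC17")]


def build_tensor(chemicals, effects, proteins, triples):
    """Return a dense |C| x |E| x |P| nested list holding 1 for known triples, 0 elsewhere."""
    c_idx = {c: i for i, c in enumerate(chemicals)}
    e_idx = {e: i for i, e in enumerate(effects)}
    p_idx = {p: i for i, p in enumerate(proteins)}
    tensor = [[[0 for _ in proteins] for _ in effects] for _ in chemicals]
    for c, e, p in triples:
        tensor[c_idx[c]][e_idx[e]][p_idx[p]] = 1
    return tensor


tensor = build_tensor(chemicals, effects, proteins, known_triples)

# Cell for (1-nitropyrene, decreases_expression, LRRC17)
print(tensor[0][0][0])
```

In this sketch a zero cell only means that the combination is not among the known ones; deciding which of those cells are plausible is the task of the factorisation model.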

The objective of tensor decomposition (i.e. tensor factorisation) is to complete all cell values in an incomplete tensor using a set of initial known cell values. This is achieved by learning numerical representations of the different tensor objects. The representations (i.e. embeddings) are then used to provide scores for any given tensor combination.

Let M be a three dimensional tensor, where the three dimensions represent objects of different sets X, Y , Z. Any element (i, j, k) in the tensor represents the interaction between the components i ∈ X, j ∈ Y , and k ∈ Z. We denote the weight of this interaction using η^M(i, j, k). In this study, we use a tensor M with elements of the three sets: chemicals (C), effects (E), and target proteins (P). The objective of tensor decomposition then is to complete the tensor values such that the weight of any interaction of a true known chemical effect on a protein target is larger than all other known false combinations. This can be defined as follows:

\forall (c, e, p), \; \forall (c, e, p)' : \quad η^{M} (c, e, p) > η^{M} ((c, e, p)')

(1)

where c ∈ C, e ∈ E, p ∈ P, (c, e, p) is any known true combination of a chemical, an effect and a target protein such that the chemical c has the effect e on the protein p, and (c, e, p)′ denotes any other, false combination. This objective is achieved using a multi-phase procedure where the model iteratively learns the missing scores by processing each of the initially known tensor combinations. In this work, we use the learning procedure of knowledge graph embedding models ⁽²²⁾. First, each object represented in the tensor is associated with initial random embeddings; these embeddings are then updated during the learning process such that the interactions of embeddings (i.e. the scoring function) yield high values for true combinations and lower values otherwise. Different models have been developed to perform tensor decomposition; they vary in their modelling of the object embeddings, embedding interaction functions, and training objectives.

Ranking training objectives. Tensor factorisation models traditionally use learning-to-rank loss functions as their training objectives. This allows them to perform tensor completion by ranking tensor combinations according to their factuality. They use different approaches for modelling ranking objectives, such as pairwise and pointwise loss objectives. For example, the DistMult model uses a pairwise hinge loss function ⁽²³⁾. Its objective is to separate the scores of positive and negative combinations by a margin, so that the scores of positive combinations are pushed above the scores of negative combinations, as required by Eq. 1. On the other hand, other tensor factorisation models such as the ComplEx model ⁽²⁴⁾ use a pointwise logistic objective, which aims to minimise the difference between combination scores and their assigned targets.
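A margin-based hinge ranking loss of the kind described above can be sketched as follows. This is an illustrative sketch rather than the implementation used in the experiments: the function name, the default margin of 1.0 and the one-to-one pairing of each positive score with a single negative score are assumptions.

```python
def hinge_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Sum of max(0, margin + s_neg - s_pos) over paired positive/negative scores.

    The margin value and the one-to-one pairing are assumptions of this sketch.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("each positive score needs exactly one paired negative score")
    return sum(max(0.0, margin + n - p) for p, n in zip(pos_scores, neg_scores))


# First pair: the positive already beats the negative by more than the margin, so it adds 0.
# Second pair: the negative outscores the positive, so it adds 1.0 + 1.0 - 0.5 = 1.5.
example_loss = hinge_ranking_loss([2.0, 0.5], [0.0, 1.0])
print(example_loss)  # 1.5
```

Pairs that already satisfy the margin contribute nothing, so the gradient concentrates on combinations that are still ranked incorrectly.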

Materials

In this study, we use a drug target interaction dataset extracted from the Comparative Toxicogenomics Database (CTD) ^{(12, 13)}. The CTD database contains data on chemicals, pathways, diseases, exposures, genes and phenotypes. It also contains different types of associations between these entities. In our study, we only consider the chemical gene associations where we filter out the interactions according to the related species to keep only the interactions assigned to humans.

We build a new benchmarking dataset, CTD38E, which contains associations between chemicals and their human protein targets from the CTD data. It also includes the different effect types related to these associations between the chemicals and proteins. The dataset includes a set of 38 different chemical effects which are filtered according to their coverage where we only keep effects that have 500 instances or more. We further divide the dataset into training and testing splits with 90% and 10% ratios respectively for the training and evaluation pipeline.
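The filtering and splitting steps described above can be sketched in a few lines of Python. This is a hedged illustration, not the script used to build CTD38E: the function name, the random shuffling, the seed and the toy records are assumptions, and the text does not state how the 90%/10% split was drawn.

```python
import random
from collections import Counter


def filter_and_split(triples, min_instances=500, test_ratio=0.1, seed=0):
    """Keep effects with at least `min_instances` triples, then split into train/test.

    A seeded random shuffle is an assumption of this sketch.
    """
    effect_counts = Counter(effect for _, effect, _ in triples)
    kept = [t for t in triples if effect_counts[t[1]] >= min_instances]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_test = int(round(len(kept) * test_ratio))
    return kept[n_test:], kept[:n_test]


# Hypothetical toy data: ten triples for one effect and two for another.
toy_triples = [("chem_%d" % i, "increases_expression", "prot_%d" % i) for i in range(10)]
toy_triples += [("chem_%d" % i, "affects_binding", "prot_%d" % i) for i in range(2)]

train, test = filter_and_split(toy_triples, min_instances=5, test_ratio=0.1)
print(len(train), len(test))  # 9 1
```

With the threshold of 500 stated in the text, the same call would drop every effect that has fewer than 500 instances before the split is made.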

The different chemical effects describe an increase, decrease or uncategorised effect on the different protein attributes and activities such as methylation, oxidation, etc. For example, a chemical effect on a protein can increase the protein expression, decrease its abundance, have a general effect on its binding, etc.

The generated CTD38E dataset has variable coverage of chemical effects, where the effects associated with changes in protein expression have the highest coverage. Also, 32 out of the 38 represented chemical effects have approximately 15k or fewer instances in the dataset. The remaining effects have coverage that varies from ≈ 20k to ≈ 186k instances.

Methods

In this work, we use the TriVec tensor factorisation approach which provides an efficient means for modelling 3D tensors using multi-part embeddings. It also uses a multi-component embedding interaction function and multi-class training objective. In the following, we discuss the design of the TriVec tensor factorisation method and its training and evaluation pipelines. We further discuss its different unique properties such as its scoring function, training objective and embedding representation.

The training and prediction pipelines. The tensor factorisation process operates by learning vector representations for the different tensor objects during the training phase. These representations are then used to predict the probability of unknown tensor combinations. Figure 1 presents an illustration of the training and prediction pipelines of our approach, the TriVec model (also referred to as the TriModel).

Figure 1. Summary of the training and evaluation pipeline of the TriVec tensor factorisation approach. The abbreviation CPE denotes chemical protein effect.

In the training phase, our approach starts by consuming the set of known tensor combinations and an initial set of embeddings of the tensor objects, as shown in Figure 1. The method then iteratively processes the known tensor combinations to update the initial embeddings. The updates to the embedding vectors are executed through a batch-based gradient descent optimisation procedure ⁽²⁵⁾. The objective of this optimisation procedure is to maximise the scores of the given true combinations and to minimise the scores of other random combinations, as specified in Eq. 1. This objective depends on the method's scoring function (i.e. its embedding interaction function), which provides a score for each tensor combination using the embeddings of its objects.

The consumed known combinations, in this context, represent the training split of the CTD38E dataset, from which the method tries to learn an efficient representation for each chemical, effect and protein in the dataset. After a given number of training iterations, the method stores the current values of the tensor objects’ embeddings. In the prediction phase, the method is able to provide a score for a given (chemical, effect, protein) combination using the learnt embedding vectors. This is achieved by processing the embeddings corresponding to the combination’s objects through the same scoring function used in the training phase. This procedure is shown in the prediction part of Figure 1.

Multiple vector embeddings. The ComplEx model ⁽²⁴⁾ introduced the use of multiple vectors to represent single tensor objects. This allowed it to encode both ordered and unordered tensor combinations. In our work, we use the TriModel embeddings ⁽²⁶⁾, a similar approach that utilizes three embedding vectors for each tensor object. Like the ComplEx model, this enables efficient encoding of ordered combinations, with higher accuracy due to the extended representation.

Modelling embedding interactions. The embedding interaction function (i.e. the scoring function) of the TriModel uses a combination of symmetric and asymmetric products to encode embedding interactions. This allows the method to efficiently model both symmetric and asymmetric tensor combinations. The scoring function of the TriModel is defined as follows:

η_{(c, e, p)}^{M} = \sum_{i = 1}^{k} \left( v_{c,i}^{1} \, v_{e,i}^{1} \, v_{p,i}^{3} + v_{c,i}^{2} \, v_{e,i}^{2} \, v_{p,i}^{2} + v_{c,i}^{3} \, v_{e,i}^{3} \, v_{p,i}^{1} \right),

(2)

where $v_{c}^{1}, v_{c}^{2}$ and $v_{c}^{3}$ are the three vector representations of the object c (and likewise for e and p), and k is the size of each embedding vector. The sum in Eq. 2 runs over the k entries of the vectors, so each product is taken element-wise and then summed. The score can therefore be viewed as a collection of three products $(v_{c}^{1} . v_{e}^{1} . v_{p}^{3}), (v_{c}^{2} . v_{e}^{2} . v_{p}^{2})$ and $(v_{c}^{3} . v_{e}^{3} . v_{p}^{1})$. The first and third products are asymmetric, since they pair different embedding parts of the chemical and the protein, while the second product is symmetric.
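The scoring function in Eq. 2 can be written directly as a short Python sketch. The function name and the toy embedding values are hypothetical and chosen only so the arithmetic is easy to follow; learnt embeddings would be real-valued vectors of size k.

```python
def trivec_score(chem, effect, prot):
    """Score of a (chemical, effect, protein) combination following Eq. 2.

    Each argument is a tuple of three equal-length vectors (v1, v2, v3).
    """
    c1, c2, c3 = chem
    e1, e2, e3 = effect
    p1, p2, p3 = prot
    return sum(
        c1[i] * e1[i] * p3[i] + c2[i] * e2[i] * p2[i] + c3[i] * e3[i] * p1[i]
        for i in range(len(c1))
    )


# Hypothetical toy embeddings with k = 2.
chem = ([1, 2], [0, 1], [1, 0])
effect = ([1, 1], [1, 1], [2, 2])
prot = ([3, 0], [1, 3], [0, 1])

forward = trivec_score(chem, effect, prot)   # 2 + 3 + 6
reverse = trivec_score(prot, effect, chem)   # 3 + 3 + 4
print(forward, reverse)  # 11 10
```

Swapping the chemical and protein embeddings changes the score here because the first and third products pair different embedding parts, which is what lets the model represent ordered combinations.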

Training objective. The training objective of the TriVec model is to maximise the scores of the true combinations and to minimise the scores of the other combinations, as indicated in Eq. 1. This is achieved through a multi-class loss objective that tries to maximise the score of each true combination compared to all possible corruptions of its sides. For example, a true combination like (27-hydroxycholesterol, increases the expression of, CCN5) is compared to all combinations of the form (· · · , increases the expression of, CCN5) and (27-hydroxycholesterol, increases the expression of, · · · ), where the dots represent all the possible values of the corresponding category. This procedure is executed using the negative-log softmax loss that is defined as follows:

\begin{array}{l} J_{TriModel-MC} = \sum_{c, e, p} \Big[ - 2 \, η_{(c, e, p)}^{M} + \log \Big( \sum_{c'} \exp (η_{(c', e, p)}^{M}) \Big) + \log \Big( \sum_{p'} \exp (η_{(c, e, p')}^{M}) \Big) \\ + \frac{λ}{3} \sum_{i = 1}^{k} \sum_{d = 1}^{3} \big( | v_{c,i}^{d} |^{3} + | v_{e,i}^{d} |^{3} + | v_{p,i}^{d} |^{3} \big) \Big], \end{array}

(3)

where c′ and p′ range over all possible chemicals and proteins respectively, λ is a configurable weight parameter and the term $\frac{λ}{3} \sum_{i = 1}^{k} \sum_{d = 1}^{3} (| v_{c,i}^{d} |^{3} + | v_{e,i}^{d} |^{3} + | v_{p,i}^{d} |^{3})$ is a regularisation term based on the nuclear 3-norm ⁽²⁷⁾ that is used for model generalisation purposes. This loss allows tensor factorisation methods to provide high accuracy predictions ⁽²⁷⁾; however, it has limited scalability ⁽²⁶⁾, since the function processes the full vocabulary of tensor objects for each training instance. Therefore, it has a quadratic space and time complexity, unlike the traditional ranking objectives that have linear time and space complexity ^{(23, 24, 28)}.
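The multi-class objective for a single true combination can be sketched as below. This is a hedged reading of Eq. 3, not the authors' code: it assumes the softmax is taken over exponentiated scores (as the name negative-log softmax implies), that the regulariser sums cubed absolute embedding entries (as the nuclear 3-norm suggests), and that the candidate scores for both corrupted sides have already been computed and include the true score. All function names are hypothetical.

```python
import math


def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s))) over a list of scores."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def multiclass_loss(true_score, chem_side_scores, prot_side_scores):
    """Negative-log softmax over both corrupted sides of one true combination.

    Assumption: each candidate list contains the scores of all chemicals
    (respectively all proteins) substituted into the combination, true one included.
    """
    return (-2.0 * true_score
            + log_sum_exp(chem_side_scores)
            + log_sum_exp(prot_side_scores))


def n3_penalty(vectors, lam):
    """(lam / 3) * sum of cubed absolute entries, an assumed form of the regulariser."""
    return (lam / 3.0) * sum(abs(x) ** 3 for v in vectors for x in v)


# With all scores equal to zero the loss reduces to log(2) + log(4) = log(8).
loss = multiclass_loss(0.0, [0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
penalty = n3_penalty([[1.0, -2.0], [0.0, 1.0]], lam=0.3)  # 0.1 * (1 + 8 + 0 + 1)
print(loss, penalty)
```

Every call touches one score per chemical and one per protein, which is where the scalability cost of this objective comes from.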

We also assess the performance of the TriVec method using a pointwise logistic loss function which enables highly scalable predictions compared to the previous multi-class loss approach. It is, however, known to have inferior accuracy when compared to the multi-class softmax loss. The pointwise logistic loss function is defined as follows:

J_{TriModel-Pt} = \sum_{c, e, p} \log \left( 1 + \exp \left[ - l_{(c, e, p)} \cdot η_{(c, e, p)}^{M} \right] \right),

(4)

where l_(c,e,p) denotes the true label of the combination (c, e, p), which is equal to 1 if the combination is true and −1 otherwise.
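The pointwise logistic loss in Eq. 4 can be sketched as follows. The sketch assumes that labels are encoded as +1 for true combinations and −1 for false ones, which is the encoding under which this form of the loss penalises high scores on false combinations; the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_logistic_loss(scores, labels):
    """Sum of log(1 + exp(-label * score)); labels are assumed to be +1 or -1."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return sum(softplus(-label * score) for score, label in zip(scores, labels))


# A score of 3.0 costs little when the label is +1 and much more when the label is -1.
loss_true = pointwise_logistic_loss([3.0], [1])
loss_false = pointwise_logistic_loss([3.0], [-1])
print(loss_true < loss_false)  # True
```

Each term depends on a single combination, so the cost per training instance is governed by the number of sampled negatives rather than by the size of the vocabulary.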

Results

In this section, we discuss the design and configuration of our experiments. We first present the setup of our model training strategy: the models' training configurations and the details of the hyperparameter grid search. We then discuss our evaluation protocol, evaluation metrics and benchmarking data configuration. Finally, we compare the evaluation results of the TriVec model to other approaches in terms of both accuracy and efficiency.

Experimental setup. In our experiments, we use the CTD38E benchmarking dataset which we have generated. The dataset is divided into two splits: training and testing. We divide the training split into two random splits for training and validation (90% for training and 10% for validation). The testing split is only used for the evaluation of the investigated models.

We compare the TriVec model to other tensor factorisation methods such as the DistMult and ComplEx models. We also compare it to the TransE model ⁽²⁸⁾ which is a distance-based graph embedding model which can be utilised to perform tensor completion. We train all models through a grid search procedure to find the best hyperparameters for each model. The search space of the hyperparameters is defined as follows: the learning rate lr ∈ {0.1, 0.3, 0.5}, embeddings size k ∈ {50, 75, 100, 150, 210} and batch size b ∈ {1000, 3000, 5000, 8000}. The rest of the grid search hyper parameters are defined as follows: in the ranking loss approach, we use the negative sampling ratio n ∈ {2, 5, 10}, and in the multi-class approach we use the regularisation weight λ ∈ {0.1, 0.3, 0.35, 0.01, 0.03, 0.035} and dropout d ∈ {0.0, 0.1, 0.2, 0.01, 0.02}. In the ranking loss approach, the number of training epochs is fixed to 1000. On the other hand, the number of epochs in the multi-class loss is 250.
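The size of the search space described above can be enumerated with a short Python sketch. The value lists are copied from the text; the variable names are hypothetical, and treating the search as an exhaustive Cartesian product of all listed values is an assumption, since the text does not state whether every combination was evaluated.

```python
from itertools import product

# Values listed in the text.
learning_rates = [0.1, 0.3, 0.5]
embedding_sizes = [50, 75, 100, 150, 210]
batch_sizes = [1000, 3000, 5000, 8000]
negative_ratios = [2, 5, 10]                          # ranking loss only
reg_weights = [0.1, 0.3, 0.35, 0.01, 0.03, 0.035]     # multi-class loss only
dropouts = [0.0, 0.1, 0.2, 0.01, 0.02]                # multi-class loss only

# Assumption: exhaustive product of all listed values.
ranking_grid = [
    {"lr": lr, "k": k, "batch": b, "neg_ratio": n, "epochs": 1000}
    for lr, k, b, n in product(learning_rates, embedding_sizes, batch_sizes, negative_ratios)
]
multiclass_grid = [
    {"lr": lr, "k": k, "batch": b, "lambda": lam, "dropout": d, "epochs": 250}
    for lr, k, b, lam, d in product(
        learning_rates, embedding_sizes, batch_sizes, reg_weights, dropouts
    )
]

print(len(ranking_grid), len(multiclass_grid))  # 180 1800
```

Under that assumption the multi-class setting has ten times as many configurations as the ranking setting, which compounds its higher per-iteration cost.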

Evaluation protocol. We use the testing split of the CTD38E dataset to assess the predictive accuracy and required training runtime of each of the investigated methods. To assess the predictive accuracy, we use different types of metrics such as mean reciprocal rank and Hits@10, area under the ROC and precision recall curves. In the following, we give a short description of each of these metrics.

Mean reciprocal rank (MRR). A ranking metric that assesses how highly the correct item is ranked among the candidates: it is the average, over all testing combinations, of the reciprocal of the rank assigned to the correct item. In our study, we use the MRR metric to assess whether the model is able to find the right chemical and protein of a combination (in the testing split) in the set of all possible chemicals and proteins. This procedure resembles the link prediction procedure for knowledge graph completion ⁽²⁸⁾.
Hits@10. A ranking metric that has the same corruption mechanism and negative to positive ratio as the MRR metric. However, Hits@10 focuses on the top 10 ranked items: it is equal to one if the correct item is found among the top 10 ranked items and zero otherwise. In our experiments, the reported Hits@10 values are the average of the Hits@10 values of all testing combinations.
The area under the ROC and precision recall curves (AUC-ROC and AUC-PR). We use both these metrics with different positive to negative ratios to evaluate the models’ sensitivity and specificity. We use three ratios, 1:1, 1:10 and 1:50, where the negative samples are generated randomly for each of the investigated chemical effects. The randomly generated negatives are filtered such that they do not intersect with any of the known combinations of the investigated effect. The AUC-ROC and AUC-PR metrics are then computed per chemical effect and averaged over the number of effects available in the testing split.
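The two ranking metrics above can be computed from per-combination ranks, as in the following sketch. The function names are hypothetical, and the tie-handling rule (candidates that score exactly the same as the true item do not push it down) is an assumption of this sketch rather than something stated in the text.

```python
def rank_of_true(true_score, corrupted_scores):
    """1-based rank of the true item among its corruptions.

    Assumption: ties are resolved in favour of the true item.
    """
    return 1 + sum(1 for s in corrupted_scores if s > true_score)


def mrr_and_hits(ranks, k=10):
    """Return (mean reciprocal rank, Hits@k) for a list of 1-based ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


# Two corruptions outscore the true item, so it is ranked third.
print(rank_of_true(0.7, [0.9, 0.1, 0.8]))  # 3

# Hypothetical ranks for four testing combinations.
mrr, hits_at_10 = mrr_and_hits([1, 2, 20, 4])
print(round(mrr, 2), hits_at_10)  # 0.45 0.75
```

Because the reciprocal shrinks quickly, MRR is dominated by how often the true item is ranked at or near the top, whereas Hits@10 treats all ranks from 1 to 10 equally.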

Implementation details. We use the TensorFlow framework (GPU) on Python 3.5 to perform our experiments. All experiments were executed on a Linux machine with an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 32 GB RAM, and an nVidia Titan Xp GPU. We have published the dataset, the training logs and a set of model predictions in a figshare repository at: https://figshare.com/articles/CTD-experiment/9383918. The source code of our experiments is also published at: https://github.com/samehkamaleldin/ecpi.

Comparison with other approaches. Table 1 presents a comparison between the TriVec method and other approaches in terms of both predictive accuracy metrics and training runtime. The results show that the TriVec model achieves significantly better scores compared to other approaches in terms of the 1-versus-all negative to positive metrics (MRR and Hits@10). The results also show that the TriVec model with the multi-class loss approach achieves the best results in terms of all the predictive accuracy metrics. For example, the TriVec-MC approach achieves a 0.28 MRR score, which is approximately 200% better than the scores of its pointwise loss version and of the ComplEx, DistMult and TransE models. The Hits@10 score of the TriVec-MC method is also approximately 100% better than all the other approaches.

Table 1:

A comparison between the TriVec model and other models in terms of the mean reciprocal rank, Hits@10, area under the ROC and precision-recall curves, the training runtime for each training iteration and the total training runtime.

The multi-class version of the TriVec model also achieves the best results in terms of the area under the ROC and precision recall curves. However, the difference between its scores and the scores of other models is small (ranging from 1% to 2%) in all the negative sampling configurations. The results also show that the enhancements achieved by the TriVec model positively correlate with the negative to positive ratio. This shows that the TriVec model is able to provide better results than other approaches in harder evaluation settings (N50), which is supported by its results on the 1-versus-all evaluation metrics (MRR and Hits@10).

Despite the predictive accuracy enhancements achieved by the multi-class loss version of the TriVec model, it requires significantly higher training time compared to all other approaches, as shown in Table 1. In this context, the results show that the TransE and DistMult models require the least training time compared to all other methods.

Discussion

In this section, we discuss in detail the findings of our experiments and the evaluation scores of the highest performing methods in terms of the AUC-PR for each of the investigated chemical effects. We also discuss the different properties and features that the family of tensor factorisation methods enables. We then discuss the challenges and limitations associated with using tensor factorisation techniques for predicting the effects of chemicals on human proteins. Finally, we discuss the future activities that we intend to perform to extend the scope and objective of this study.

Scalability. Tensor factorisation methods are representation learning techniques that operate by learning efficient vector representations of tensor objects. They then use representations to assess the factuality of tensor combinations. The embedding learning procedure is known to operate with linear time and space complexity ^{(24, 26)}. This allows tensor factorisation methods to provide scalable predictions compared to other traditional approaches that require more complex feature processing routines.

Furthermore, the predictive procedure of the tensor factorisation methods is a constant time complexity routine, O(1). This gives them a significant scalability advantage over other approaches that require feature processing in their predictive procedure after training.

Significance to clinical research. Despite the high accuracy of computational approaches in multiple biological inference tasks, they are not intended to replace clinical experimentation. Rather, they aim to assist researchers in biological studies in prioritizing their experimentation configurations. For example, our study aims to assist biologists who are experimenting on different chemical substances to assess their effects on human proteins. Our proposed computational approach provides predictions that enable ranking possible configurations (combinations) of chemicals, proteins and their associated chemical effects according to their likelihood of being present. Biologists can use this ranking to prioritize the order of their experiments and focus on the highest ranked combinations.

Limitations. Despite the high predictive accuracy of the tensor factorisation approaches, they are not easily interpreted. These methods operate as black boxes where it is hard to determine which set of features have affected their predictions. This also affects the trust in their predictions, especially in the biomedical domain, as it is critical to understand the rationale behind predictions to avoid misinformed judgements.

Tensor factorisation procedures build representations of tensor objects using their existing known combinations. Therefore, they provide low quality representations of under-represented objects ⁽²²⁾. In the context of biological information, the coverage of biological entities has a high variance due to the unbalanced focus of clinical and research studies on biological entities such as proteins and drugs. This lowers the quality of the representations that tensor factorisation methods learn for under-represented objects. In addition, tensor factorisation methods are unable to provide useful representations of newly introduced objects, e.g. new chemicals and proteins, as they require prior information to operate.

Future works. In future works, we intend to incorporate information about the body tissues associated with each chemical–protein interaction. This will enable more accurate and specialised effect predictions, since the interactions between chemicals and proteins are strongly affected by the associated tissue context.

We also aim to experiment with representation learning methods that utilize protein and chemical structures rather than their prior information. This direction will enable more accurate and efficient predictions for new and under studied chemicals and proteins.

Conclusions

In this work, we have studied the problem of identifying the effects of the interactions between chemicals and human proteins. We have shown the importance of computational methods in assisting clinical research in this particular task. We have then proposed using tensor factorisation methods to predict the effects of chemicals on human proteins where we modelled the chemicals, their effects and the proteins as a tensor. We then used tensor factorisation to learn efficient representations of the tensor objects to be able to predict new combinations.

We have adopted the TriVec method as our main approach, and we have built a benchmarking dataset (CTD38E) based on the Comparative Toxicogenomics Database to train and evaluate our investigated approaches. We have then shown by computational experimental evaluation that the TriVec method outperforms other known tensor factorisation methods in the studied task in terms of different evaluation metrics such as the MRR, Hits@10 and the area under the ROC and precision recall curves.

Finally, we have discussed different properties and limitations of the different tensor factorisation approaches, and we have presented the set of intended future activities to extend the scope and objective of this study.

Funding

The work presented in this paper was supported by the CLARIFY project funded by the European Commission under the grant number 875160, and by the Insight Centre for Data Analytics at the National University of Ireland Galway, Ireland (supported by the Science Foundation Ireland grant 12/RC/2289_P2).

References

1. Macdonald Marnie L., Lamerdin Jane E, Owens Stephen F., Keon Brigitte H., Bilter Graham K., Shang Zhidi, Huang Zhengping, Yu Helen, Dias Jennifer M., Minami Tomoe, Michnick Stephen W., Westwick John K. Identifying off-target effects and hidden phenotypes of drugs in human cells. Nature Chemical Biology. 2006;2:329–337. doi: 10.1038/nchembio790.
2. Brehmer Dirk, Greff Zoltán, Godl Klaus, Blencke Stephanie, Kurtenbach Alexander, Weber Martina, Müller Stephan, Klebl Bert, Cotten Matthew, Kéri Gy., Wissing Josef, Daub Henrik. Cellular targets of gefitinib. Cancer research. 2005;65(2):379–82.
3. Kuenzi Brent M, Remsing Rix Lily L., Stewart Paul Alexander, Fang Bin, Kinose Fumi, Bryant Annamarie T, Boyle Theresa A, Koomen John Matthew, Haura Eric B., Rix Uwe. Polypharmacology-based ceritinib repurposing using integrated functional proteomics. Nature chemical biology. 2017;13(12):1222–1231. doi: 10.1038/nchembio.2489.
4. Iljin Kristiina, Ketola Kirsi, Vainio Paula, Halonen Pasi K, Kohonen Pekka, Fey Vidal, Grafström Roland C, Perälä Merja, Kallioniemi Olli. High-throughput cell-based screening of 4910 known drugs and drug-like small molecules identifies disulfiram as an inhibitor of prostate cancer cell growth. Clinical cancer research : an official journal of the American Association for Cancer Research. 2009;15(19):6070–8. doi: 10.1158/1078-0432.CCR-09-1035.
5. Bleakley Kevin, Yamanishi Yoshihiro. Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics. 2009. doi: 10.1093/bioinformatics/btp433.
6. Olayan Rawan S., Ashoor Haitham, Bajic Vladimir B. DDR: efficient computational method to predict drug–target interactions using graph mining and machine learning approaches. Bioinformatics. 2018. doi: 10.1093/bioinformatics/bty417.
7. Mohamed Sameh K, Nováček Vít, Nounu Aayah. Discovering protein drug targets using knowledge graph embeddings. Bioinformatics. 2019. doi: 10.1093/bioinformatics/btz600.
8. Mohamed Sameh K., Nováček Vít, Vandenbussche Pierre-Yves. Knowledge base completion using distinct subgraph paths. SAC. ACM; 2018. pp. 1992–1999.
9. Zitnik Marinka, Agrawal Monica, Leskovec Jure. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics. 2018. doi: 10.1093/bioinformatics/bty294.
10. Muñoz Emir, Nováček Vít, Vandenbussche Pierre-Yves. Using drug similarities for discovery of possible adverse reactions. AMIA 2016, American Medical Informatics Association Annual Symposium; Chicago, IL, USA; November 12-16, 2016. AMIA; 2016.
11. Zitnik Marinka, Leskovec Jure. Predicting multicellular function through multi-layer tissue networks. Bioinformatics. 2017. doi: 10.1093/bioinformatics/btx252.
12. Mattingly Carolyn J., Rosenstein Michael C., Colby Glenn T., Forrest John N., Boyer James. The comparative toxicogenomics database (CTD): a resource for comparative toxicological studies. Journal of experimental zoology. Part A, Comparative experimental biology. 2006;305(9):689–92. doi: 10.1002/jez.a.307.
13. Davis Allan Peter, Grondin Cynthia J, Johnson Robin J, Sciaky Daniela, McMorran Roy, Wiegers Jolene, Wiegers Thomas C, Mattingly Carolyn J. The Comparative Toxicogenomics Database: update 2019. Nucleic Acids Research. 2018;47(D1):D948–D954. doi: 10.1093/nar/gky868.
14. Ashburn Ted T, Thor Karl B. Drug repositioning: identifying and developing new uses for existing drugs. Nature reviews Drug discovery. 2004;3(8):673. doi: 10.1038/nrd1468.
15. Chong Curtis Robert, Sullivan David J. New uses for old drugs. Nature. 2007;448:645–646. doi: 10.1038/448645a.
16. Oprea Tudor I., Tropsha Alexander, Faulon Jean-Loup, Rintoul Mark D. Systems chemical biology. Nature chemical biology. 2007;3(8):447–50. doi: 10.1038/nchembio0807-447.
17. Campillos Mónica, Kuhn Michael, Gavin Anne-Claude, Jensen Lars Juhl, Bork Peer. Drug target identification using side-effect similarity. Science. 2008;321(5886):263–6. doi: 10.1126/science.1158140.
18. Di Iorio Francesco, Rittman Timothy, Ge Hong, Menden Michael P., Sáez-Rodríguez Julio. Transcriptional data: a new gateway to drug repositioning? Drug discovery today. 2013. doi: 10.1016/j.drudis.2012.07.014.
19. Sanseau Philippe, Agarwal Pankaj, Barnes M. R., Pastinen Tomi, Richards Jeremy B, Cardon Lon R., Mooser Vincent E. Use of genome-wide association studies for drug repositioning. Nature Biotechnology. 2012;30:317–320. doi: 10.1038/nbt.2151.
20. Jin Guangxu, Wong Stephen T. C. Toward better drug repositioning: prioritizing and integrating existing methods into efficient pipelines. Drug discovery today. 2014;19(5):637–44. doi: 10.1016/j.drudis.2013.11.005.
21. Calonge Ned, Petitti Diana B., Dewitt Thomas G, Dietrich Allen J., Gregory Kimberly D., Harris Richard J., Isham George T., Lefevre Michael L, Leipzig Roseanne M., Loveland-Cherry Carol J., Marion Lucy Nelle, Melnyk Bernadette Mazurek, Moyer Virginia A., Ockene Judith K., Sawaya George F., Yawn Barbara P. Screening for colorectal cancer: U.S. Preventive Services Task Force recommendation statement. Annals of internal medicine. 2008;149(9):627–37. doi: 10.7326/0003-4819-149-9-200811040-00243.
22. Nickel Maximilian, Murphy Kevin, Tresp Volker, Gabrilovich Evgeniy. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE. 2016;104(1):11–33.
23. Yang Bishan, Yih Wen-tau, He Xiaodong, Gao Jianfeng, Deng Li. Embedding entities and relations for learning and inference in knowledge bases. ICLR. 2015.
24. Trouillon Théo, Welbl Johannes, Riedel Sebastian, Gaussier Éric, Bouchard Guillaume. Complex embeddings for simple link prediction. ICML, volume 48 of JMLR Workshop and Conference Proceedings; pp. 2071–2080. JMLR.org. 2016.
25. Tran Phuong Thi, et al. On the convergence proof of AMSGrad and a new version. IEEE Access. 2019;7:61706–61716.
26. Mohamed Sameh K., Nováček Vít. Link prediction using multi part embeddings. ESWC, volume 11503 of Lecture Notes in Computer Science. Springer; 2019. pp. 240–254.
27. Lacroix Timothée, Usunier Nicolas, Obozinski Guillaume. Canonical tensor decomposition for knowledge base completion. ICML, volume 80 of JMLR Workshop and Conference Proceedings; pp. 2869–2878. JMLR.org. 2018.
28. Bordes Antoine, Usunier Nicolas, García-Durán Alberto, Weston Jason, Yakhnenko Oksana. Translating embeddings for modeling multi-relational data. NIPS. 2013:2787–2795.
