Reconstructing phylogenetic tree using a protein–protein interaction technique

Shamita Malik; Dolly Sharma; Sunil Kumar Khatri

doi:10.1049/iet-nbt.2016.0177

2017 Sep 18;11(8):1005–1016. doi: 10.1049/iet-nbt.2016.0177


PMCID: PMC8676049 PMID: 29155401

Abstract

In this study, a novel substitution method for finding potential protein–protein interactions (PPIs) is discussed. The newly designed method for analysing PPIs also aids in the comparison of evolutionary distances. The method is applied to several datasets, and a quantitative assessment for determining PPIs is introduced. PPIs are biologically relevant and support a better conceptual framework for phylogenetic profiling. The framework makes it possible to relate the topological properties of the system to the evolutionary behaviour of the datasets. First, this study found that the most conserved protein motifs exist at the roots of the system, whereas newer motifs with mutations tend to dwell on the branches. In‐depth functional analysis revealed that the most conserved motifs have high specificity for enriched structural processes and pathway engagements, which may help identify their formative roles in cells. In conclusion, this study demonstrates several aspects that are important for future studies aiming to enhance phylogenetic profiling systems. Such strategies can also be used to develop new biological insights that lead to a better understanding of disease mechanisms.

Inspec keywords: proteins, molecular biophysics

Other keywords: phylogenetic tree, protein–protein interaction technique, biological networks, topological properties, functional analysis, phylogenetic profiling systems, disease mechanisms

1 Introduction

The complexity of phylogenetic models greatly encourages the discussion and testing of hypotheses. However, complex evolutionary scenarios are poorly described by such models. To model such scenarios, phylogenetic networks are used as tools to explore and interpret data. These networks can be computed from a wide range of data, including multiple sequence alignments (MSA), clusters, splits, rooted and unrooted trees, and distance matrices. Phylogenetic analysis aims at uncovering the evolutionary relationships between different species or taxa in order to understand the evolution of life on earth. The analysis is computed from molecular sequences, which help in sequence analysis and in understanding the evolutionary history of gene families, the transmission of infectious diseases, and disease evolution. In almost all cases, there is no way to confirm whether a given phylogenetic tree is the true tree along which the sequences actually evolved. Several computational techniques for predicting protein‐protein interactions (PPIs) are proving extremely useful and are being used by biologists. Proteins control every biological process in the cell. Although numerous proteins perform their functions independently, most proteins interact with others for proper biological functioning. Proteins are the active workers that drive several biological processes in cells, including cell development, proliferation, morphological changes, motility, intercellular communication and apoptosis [1]. However, cells react to many stimuli, and protein expression is therefore a very dynamic process. In addition, these interaction networks are an essential framework for systems biology, as they may reveal the general organisation of functional cell systems when both spatial and temporal aspects of interaction are considered [2].
Protein–protein interaction models provide insights that improve understanding of the functional organisation of the proteome. Several PPI models have elucidated valuable information regarding collective dynamics [3], module identification [4] and signalling pathway modelling [5], and guide clinical research such as biomarker finding, disease discovery [6] and tumour stratification. Rao et al. [7] have described that PPIs can regulate diverse biological processes by:

Altering the kinetic properties of enzymes, which can be mediated by either altered binding of substrates or altered catalysis.
Acting as a general mechanism by coordinating substrate channelling.
Creating a new interaction site for small molecules.
Inactivating a protein.
Changing the specificity of a protein for its substrate through association with different binding partners.
Serving a regulatory role in either an upstream or a downstream process.

Typically, a combination of techniques is needed to validate and confirm protein interactions. Unknown proteins may be understood better by analysing their association with at least one known protein. PPI investigations may likewise reveal unanticipated information that deepens our understanding of protein biology. Studies have demonstrated that proteins with a larger number of interactions (hubs) can include families of enzymes and several intrinsically disordered proteins [8]. PPIs nevertheless involve several heterogeneous processes, and the extent of their diversity is substantial. For a more accurate understanding of their significance in the cell, one needs to identify diverse protein interactions and select the results from the best models [9]. Extensive PPI networks have been modelled from intensive analysis of datasets. The magnitude of PPI information has imposed an enormous task on systems biologists and data analysts. The computational analysis of PPI networks is increasingly becoming an indispensable tool for understanding the underlying mechanisms of protein biology. PPI is one of the most important subjects for advancing our current understanding of biological science.

1.1 Classification of PPI detection methods

Identifying and characterising the full reactome of cellular proteins and the interactions between them is of critical importance for a detailed understanding of the properties of a living cell. There are different physical techniques to detect and analyse proteins that bind other proteins. Protein affinity chromatography is one of the oldest techniques used to study cellular protein interactions. It was first used 40 years ago to study the interaction of phage proteins with different forms of E. coli RNA polymerase [10]. Another technique is affinity blotting, in which proteins are fractionated by polyacrylamide gel electrophoresis (PAGE) and then transferred onto a nitrocellulose membrane [11]. In immunoprecipitation, cell lysates are prepared and incubated with an antibody against the protein in question. Once antibody–protein binding is attained, the bound proteins are eluted and analysed by gel electrophoresis [12]. Cross‐linking is used in two ways to determine PPIs. First, it is used to deduce the architecture of proteins or assemblies that are readily isolated. Second, it is used to detect proteins that interact with a given test protein ligand. PPI detection methods are broadly classified into three types, namely in vitro, in vivo and in silico techniques. In vitro techniques are performed in a controlled environment outside a living organism; they include tandem affinity purification, affinity chromatography, coimmunoprecipitation, protein arrays, protein fragment complementation, phage display, X‐ray crystallography and NMR spectroscopy. In vivo techniques, such as the yeast two‐hybrid (Y2H) method, are performed inside a living organism. In silico methods are performed on a computer or by means of computer simulation. However, the data produced through these approaches may not be reliable because of the non‐availability of possible PPIs [13].

1.2 PPI network and its computational analysis

A PPI network can be described as a heterogeneous set of proteins joined by interactions as edges. The computational analysis of PPI networks begins with the representation of the network structure. Each protein is represented as a node in a graph, and the proteins that interact with it are represented as adjacent nodes connected by an edge. An analysis of the network can yield a variety of results. However, the computational analysis of PPI networks faces significant obstacles, as listed below [7, 14]:

Protein interactions are not stable.
A protein may have several different roles to perform.
Proteins with distinct functions periodically interact with each other.
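
The graph representation described in this subsection can be illustrated with a minimal sketch. It assumes an undirected, unweighted network stored as an adjacency map; the protein names `P1` to `P4` and the helper names `build_ppi_graph` and `degree` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_ppi_graph(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)  # b is adjacent to a
        graph[b].add(a)  # and a is adjacent to b (undirected edge)
    return dict(graph)


def degree(graph):
    """Number of interaction partners for every protein (node degree)."""
    return {protein: len(partners) for protein, partners in graph.items()}


# Hypothetical interaction list, for illustration only.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P4")]
ppi = build_ppi_graph(edges)
degrees = degree(ppi)
hub = max(degrees, key=degrees.get)  # protein with the most partners
```

In this toy network the node with the largest degree plays the role of a hub; a real analysis would of course work with experimentally supported interactions.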

1.3 Phylogenetic and PPI detection

Another important technique for detecting interactions between proteins is the phylogenetic tree. The idea is that functionally linked proteins tend to coexist during the evolution of an organism. A phylogenetic profile describes the occurrence of a particular protein in a set of genomes: two proteins with the same phylogenetic profile are inferred to be functionally linked. The phylogenetic tree describes the evolution of the protein. The essential idea of the mirror tree method is that the co‐evolution of interacting proteins can be inferred from the degree of similarity between the distance matrices of the corresponding phylogenetic trees. A set of organisms common to the two proteins is selected from their multiple sequence alignments (MSA), and the corresponding distance matrix is built for each protein. BLAST scores can also be used to fill the matrices. The linear correlation between these distance matrices is then calculated. High correlation scores indicate similarity between the phylogenetic trees, and the proteins are therefore assumed to interact [15]. The mirror tree method detects co‐evolution between proteins and infers the likelihood of physical interaction between them [14, 16, 17].
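
As an illustration of the phylogenetic profile idea, the following sketch encodes each protein as a presence/absence vector over a fixed list of genomes and reports protein pairs with identical profiles. The genome labels, protein labels and the function name `phylogenetic_profile` are hypothetical, and exact profile equality is only the simplest possible comparison.

```python
from itertools import combinations


def phylogenetic_profile(genomes_with_protein, all_genomes):
    """Return a 0/1 tuple: 1 where the protein occurs in the genome."""
    return tuple(1 if g in genomes_with_protein else 0 for g in all_genomes)


# Hypothetical occurrence data, for illustration only.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "A": {"G1", "G3"},
    "B": {"G1", "G3"},
    "C": {"G2"},
}

profiles = {
    name: phylogenetic_profile(found_in, genomes)
    for name, found_in in presence.items()
}

# Proteins sharing the same profile are candidates for a functional link.
linked = [
    (p, q)
    for p, q in combinations(sorted(profiles), 2)
    if profiles[p] == profiles[q]
]
```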

Computational techniques have attracted considerable attention among biologists because of their capacity to predict PPIs. In this study, a new computational method named PPI‐I is introduced for PPI prediction and for describing significant motifs; it stores both predicted and identified PPIs. The test results showed that every set of conserved motifs has high specificity for enriched biological factors and pathway engagements, which could relate to their developmental roles in eukaryotic cells. It remains very challenging to develop computational methods that efficiently calculate PPIs, which in turn help to build phylogenetic networks that act as a basis for disease finding and drug discovery.

2 Methods

In this paper, the datasets are taken from the National Center for Biotechnology Information (NCBI) [18], where the data are freely available to researchers. The selected datasets are the HIV, H1N1 and Ebola viruses, which are used to verify the performance of the newly designed algorithm PPI‐I. The datasets are used as training sets for the final reconstruction of phylogenetic trees (Fig. 1).

Fig. 1 — PPI‐I for reconstruction of phylogenetic tree

Once the datasets have been obtained, the sequences are arranged by sequence alignment. Sequence alignment is essential in any analysis of evolutionary relationships, and even for inferring tertiary structure information from protein amino acid sequences. The alignment can be done either manually or with existing software. In this study, ClustalW [19] is used to align the HIV, H1N1 and Ebola virus sequences. Evolutionary relationships imply that a certain proportion of the amino acid residues in a protein sequence is conserved, so the most direct way to assess the relationship between two sequences is to count the number of identical and similar amino acids. This is done by sequence alignment. The number of identical or similar amino acid residues is then compared with the total number of amino acids in the protein, and the resulting number is known as the percentage of sequence identity or sequence similarity.
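
The percentage of sequence identity can be sketched as follows for two sequences that are already aligned. This is an illustration under stated assumptions rather than the procedure used in the paper: columns containing a gap are excluded from the count, the gap symbol is `-`, and the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 4 identical residues in 5 gap-free columns.
identity = percent_identity("MKT-AY", "MKSLAY")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so the denominator should be stated whenever identity values are compared.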

Once the sequences are aligned, the motifs need to be found. Motifs are recurring patterns in a DNA sequence that are assumed to have biological functions. The planted motif search algorithm PMS5 is used to identify the list of motifs [20, 21]. The basic idea behind PMS5 is to obtain the list of motifs that occur frequently in the dataset. The ReTF algorithm [22] is then applied to this list to obtain the most conserved motifs. The conditional probability for each motif is calculated in the following manner:

Input: A motif m over the alphabet Ω = {A, C, G, T}, a threshold ‘t’, an integer ‘a’, and the stationary and transition probabilities of a first‐order Markov region R

Output: Pr(m hits region R at least ‘a’ times)

For each motif x in L {

calculate the P‐value of x;

}

Sort L based on the P‐value;

Output the motif M with the highest P‐value (in case of multiple motifs with the same P‐value, one motif is randomly chosen from them);

The motifs are calculated from the database with the parameters (l, d), where ‘l’ is the motif length and ‘d’ is the number of mutations, for lengths from 6 to 20. Let M be the final dataset of motifs

M = (a [list of motifs])

The p‐value for each motif is calculated, and the top motifs with the highest p‐values are used for further correlation analysis.
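
To make the (l, d) parameters concrete, the sketch below solves the planted motif problem by brute force: it reports every string of length l that occurs in each input sequence with at most d mismatches. This is not PMS5 or ReTF; it enumerates all 4^l candidates and is only practical for very small l, whereas the lengths of 6 to 20 used here require the exact algorithms cited above. The function names and the toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """All length-l strings found in every sequence with <= d mismatches."""
    found = []
    for letters in product(alphabet, repeat=l):  # 4**l candidates
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            found.append(candidate)
    return found


# Hypothetical toy input: only "ACG" occurs exactly in all three sequences.
toy = ["ACGTAC", "TTACGA", "GGACGT"]
exact = planted_motifs(toy, l=3, d=0)
```

Allowing d = 1 on the same input returns a larger candidate list, which is why a subsequent ranking step is needed to select the most conserved motifs.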

2.1 Phylogenetic matrix calculation and its correlation analysis

The original mirror tree method was proposed by Pazos and Valencia [17] to predict PPIs. The interactions are evaluated through the corresponding proteins. The premise is that two proteins have a higher chance of sharing a correlated evolutionary history if they interact with each other. As the evolutionary history of a protein can be represented as a phylogenetic tree, it makes sense to compare the protein trees and look for any correlation between their evolutionary histories [23]. Rather than comparing two trees directly, which is a highly non‐trivial task in terms of both algorithmic implementation and evolutionary interpretation, the mirror tree method uses as a surrogate the distance matrices that store the genetic distances among the protein and its orthologues in a group of genomes. For two protein motifs A and B, the mirror tree method compares their distance matrices D_A and D_B, where ‘Ave’ and ‘var’ represent the average and the variance of the upper (or lower) half elements of a distance matrix [14]

\rho (A, B) = \frac{\sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} (D_{A} (i, j) - \mathrm{Ave} (D_{A})) (D_{B} (i, j) - \mathrm{Ave} (D_{B}))}{\sqrt{\mathrm{var} (D_{A}) \, \mathrm{var} (D_{B})}}

Because the distance matrices are symmetric, only the elements within the upper or lower triangle of the matrices are needed to compute the correlation, which is measured as the Pearson correlation coefficient ‘ρ’.
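
The correlation between two distance matrices can be sketched as follows. The sketch assumes both matrices are symmetric, cover the same species in the same order, and are given as nested lists; for the ratio to equal the Pearson coefficient, the ‘var’ terms are taken here as sums of squared deviations over the upper triangle, which is an assumption about the normalisation. The function names and the example matrices are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Elements D(i, j) with i < j, in row order."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n - 1) for j in range(i + 1, n)]


def mirror_tree_correlation(d_a, d_b):
    """Pearson correlation between the upper triangles of two matrices."""
    xs, ys = upper_triangle(d_a), upper_triangle(d_b)
    if len(xs) != len(ys) or not xs:
        raise ValueError("matrices must be non-trivial and of equal size")
    ave_a = sum(xs) / len(xs)
    ave_b = sum(ys) / len(ys)
    cov = sum((x - ave_a) * (y - ave_b) for x, y in zip(xs, ys))
    ss_a = sum((x - ave_a) ** 2 for x in xs)  # squared deviations of D_A
    ss_b = sum((y - ave_b) ** 2 for y in ys)  # squared deviations of D_B
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical matrices: d_b is d_a scaled by 2, d_c reverses the ordering.
d_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
d_c = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
rho_same = mirror_tree_correlation(d_a, d_b)
rho_opposite = mirror_tree_correlation(d_a, d_c)
```

A uniformly rescaled matrix gives a coefficient of 1, which reflects the fact that the method compares the shape of the two trees rather than their absolute evolutionary rates.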

In this study, an algorithm is proposed that deals with intra‐matrix correlations. The idea is to determine interaction specificity among paralogous proteins [24]. The rows and columns of the distance matrices are reshuffled so as to find the maximal similarity, measured as an inter‐matrix correlation. As the species tree reflects the relationships among the host genomes, the matrix elements are weighted accordingly. A threshold is chosen, and all motifs above the threshold are included in the new matrix, while the others are ignored (Fig. 2).

Fig. 2 — Distance matrix after identifying most conserved motifs

The phylogenetic matrix calculation and its correlation analysis are shown in Appendix 1.

2.2 Support vector machine (SVM)

The classifier used in this research is an SVM, an effective statistical learning technique proposed by Vapnik [25]. It is widely used for bioinformatics problems. SVM has several advantages in the present context [26]:

SVM provides a principled means of estimating performance via the generalisation error.
SVM is fast to train, which is required for high‐throughput screening of large datasets.
SVM is readily adapted to new data, allowing the model to be updated continuously in parallel with the growth of evolutionary data.

In the present study, an SVM is developed to assess the most conserved motifs of interacting proteins. The decision patterns created by the algorithm are then used to generate a decision when a new set of motifs is introduced. This new set of motifs is based on the primary structure of the motifs.

2.3 Database of interacting protein and preparation of training and testing sets

The database matrix comprises 8784, 7882 and 7677 entries representing the most conserved motifs after applying PPI‐I. The accepted conserved motifs contain fields with accession codes linking to other public protein databases and protein name identification. The identified motifs were sampled at random, and the data were divided into training and testing sets in a 1 : 1 ratio, meaning that one set was used as examples for training and the other for testing the prediction system. Testing examples were not presented to the system during SVM learning. The database is robust in that it represents a summary of protein interaction data gathered from different experiments.
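
A random 1 : 1 division into training and testing sets can be sketched as below. The fixed seed, the function name `split_half` and the placeholder motif identifiers are assumptions made for reproducibility of the example; the paper does not state how its randomisation was seeded.

```python
import random


def split_half(items, seed=0):
    """Shuffle a copy of items and split it into two equal-sized halves."""
    rng = random.Random(seed)  # local generator, leaves global state alone
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical motif identifiers, for illustration only.
motifs = [f"motif_{i}" for i in range(10)]
train, test = split_half(motifs, seed=42)
```

Keeping the two halves disjoint is what ensures that testing examples are never seen during learning.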

3 Results and conclusion

Parsing the list of motifs, testing the records and structure, randomisation control and feature list creation were implemented in Java. The PPI‐I output is processed to remove unnecessary motifs and stored in a new database for fast reconstruction of the phylogenetic network using meaningful PPIs.

3.1 Filtering less correlated motifs

The proposed algorithm filters out the less correlated pairs, leaving a dataset of highly correlated pairs. A threshold for the motifs is chosen [22]: if motif_val ≤ α, the motif pair is kept; otherwise it is discarded. Here, α = 1 is used to assess the technique. The value is further varied from 0.01 to 1 in order to assess the effect of the filter strength on the trade‐off between the proportion of positive examples lost from the motif list and the proportion of negative examples filtered out.
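
The threshold filter and the sweep over α can be sketched as follows. The pair labels and `motif_val` scores are invented for illustration, and `filter_pairs` is a hypothetical helper; only the rule "keep the pair when motif_val ≤ α" is taken from the text.

```python
def filter_pairs(scored_pairs, alpha):
    """Keep the (pair, motif_val) entries whose motif_val is <= alpha."""
    return [(pair, val) for pair, val in scored_pairs if val <= alpha]


# Hypothetical scores, for illustration only.
scored = [
    (("m1", "m2"), 0.005),
    (("m1", "m3"), 0.2),
    (("m2", "m3"), 0.9),
    (("m3", "m4"), 1.5),
]

# Sweep the threshold over part of the range described in the text.
kept_counts = {
    alpha: len(filter_pairs(scored, alpha))
    for alpha in (0.01, 0.1, 0.5, 1.0)
}
```

Plotting the number of retained pairs against α, separately for positive and negative examples, gives the trade‐off curve described above.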

In Table 1, column 1 gives the number of motifs after applying PMS5 and column 2 gives the number of motifs after applying the PPI‐I algorithm.

Table 1.

List of most conserved motifs

Dataset	Motifs after PMS5	Motifs after PPI‐I
HIV	25,552	8784
H1N1	23,333	7882
Swine Flu	18,630	7677
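
As a small worked example, the share of PMS5 motifs retained by PPI‐I can be computed directly from the counts in Table 1. The dictionary layout and variable names are hypothetical; the counts and row labels are those of the table.

```python
# Counts from Table 1: (motifs after PMS5, motifs after PPI-I).
table1 = {
    "HIV": (25552, 8784),
    "H1N1": (23333, 7882),
    "Swine Flu": (18630, 7677),
}

# Percentage of PMS5 motifs that survive the PPI-I filtering step.
retained = {
    name: round(100.0 * after / before, 1)
    for name, (before, after) in table1.items()
}
```

By this calculation roughly a third to two fifths of the PMS5 motifs are retained in each dataset.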


3.2 Findings

From the list of motifs produced by PPI‐I, those assigned the highest scores were taken for the final evaluation of the algorithm. These represent protein pairs that have features in common with known co‐complexed proteins. Initially, 1500 sets are considered. These motif sets provide experimental evidence, since evidence of co‐occurrence based on neighbourhood conservation and genomes is not independent of the phylogenetic profile used in the method.

Fig. 3 shows the overall time required to calculate the motif list on a number of different processors.

The faded line shows the proportion of the most correlated examples lost when the value of α is shifted to the negative range. The vertical and horizontal bars show the standard deviation of the mean when the same motifs are found repeatedly.

3.3 Training and testing model findings

The training and testing data files were produced using a prescribed k‐let correlation frequency. The correlation frequency (k ∈ {1, 2, 3, 4, 5}) and the sampling size were supplied as input parameters to the data preparation software. Every motif presented to the SVM is drawn by random sampling. A separate SVM was trained for each k‐let correlation frequency. The training was repeated to check for biases in the results, and the results were then averaged to obtain the final output. The performance of each SVM is assessed using the inductive precision on previously unseen test examples as the performance metric.

‘Inductive precision’ is defined here as the percentage of correct protein interaction predictions on the test set, which consists of approximately equal numbers of true and false interaction cases (Table 2). Every row in the table corresponds to a constant k‐let frequency, which is used to produce the negative training and testing cases. The second column gives the number of examples used in each case. The values have been averaged over N = 10 trials.
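
The metric and its averaging over repeated trials can be sketched as below. The sketch assumes binary labels (1 for an interacting pair, 0 otherwise) and reports the sample standard deviation as the spread; whether the ± values in Table 2 are standard deviations or standard errors is not stated, so that choice is an assumption. The function names and the example labels and scores are hypothetical.

```python
import statistics


def inductive_precision(predicted, actual):
    """Percentage of test examples whose predicted label is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def summarise(trial_scores):
    """Mean and sample standard deviation over repeated trials."""
    return statistics.mean(trial_scores), statistics.stdev(trial_scores)


# Hypothetical labels for one trial: 3 of 4 predictions are correct.
single_trial = inductive_precision([1, 0, 1, 1], [1, 0, 0, 1])

# Hypothetical scores from three repeated trials.
mean_score, spread = summarise([80.0, 78.0, 82.0])
```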

Table 2.

SVM training dataset with inductive precision

k‐let correlation frequency	(Training, Testing) examples	Inductive precision (%)
1	(2137, 2137)	79.14 ± 0.82
2	(2132, 2110)	79.13 ± 0.79
3	(2166, 2185)	80.09 ± 0.88
4	(2188, 2172)	81.14 ± 0.85
5	(2160, 2160)	80.14 ± 0.88


3.4 Sequence rearrangements

The motifs selected by the PPI‐I algorithm are then used to rearrange the sequences. The program PhyML has been used for the reconstruction of the phylogenetic trees. The sequences with the most correlated motifs and without mutations are kept at the top (root node), assuming the data are not limited to mutation; once mutations are encountered, the sequences are assigned to further nodes. Using this rearranged dataset, PPI‐I guarantees a minimum number of less correlated and highly correlated motif sets, permitting an assessment of the quality of the method rather than of the quality of the data to which it is applied. This framework leads to the analysis of PPI pairs; around 22% of the data consists of highly correlated PPI pairs.

3.5 Tree reconstruction

The results are shown in Appendix 2.

3.6 Comparison with mirror tree

The outcome is compared with the mirror tree method in terms of the area under the ROC curve (AUC) after rearranging the datasets for tree reconstruction. PPI‐I takes into account only highly correlated motifs/pairs. The standard approach is to assess the similarity of every pair of protein trees in the light of the complete network of similarities between protein trees, using a linear correlation coefficient between the distance matrices extracted from the protein trees as an indicator of the similarity between trees. The results of the mirror tree method are reported in [27]. The best AUC for the mirror tree was obtained with ρ > 0.55, while the best AUC for PPI‐I was obtained with ρ > 0.66. For the mirror tree, with manually curated EcoCyc complexes, a level of confirmation close to 40% was obtained for the top 500 pair predictions, whereas PPI‐I shows a level of confirmation of around 42% for the HIV, H1N1 and Ebola datasets.

4 Future work

The PPI network is very complex and dynamic in nature. Exploring PPI networks is essential for a comprehensive understanding of biological evolution and is one of the most important topics in science [28]. The network can play a role in studying protein cooperation, dependencies and functional prediction. PPI prediction under gene duplication is currently one of the most interesting areas of research. Gene duplication corresponds to the addition of a node whose connections are identical to those of the original node. After gene duplication, a protein that can bind strongly is in a better position to explore the changes that allow it to mutate or to dimerise with other existing homologous proteins, using the inherited binding mode. In smaller families, particularly for genes encoding protein complex components, paralogs are less likely to evolve new interaction partners through the removal, mutation or addition of nodes. Although these are biological problems, the huge amount of computation involved makes them well suited for computer scientists to devise new algorithms. The objective is to design more realistic models and novel techniques or algorithms to solve such problems.

6.1 Appendix 1

Estimates of evolutionary divergence of the HIV, H1N1 and Ebola viruses (see Fig. 4).

The number of base substitutions per site between sequences is shown. Analyses were conducted using the maximum composite likelihood model [24]. The analysis involved 25 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + non‐coding. All positions containing gaps and missing data were eliminated. There were a total of 1303 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [29].

6.2 Appendix 2

See Figs. 5, 6, 7.

Fig. 5 — Reconstruction of 25 taxa of H1N1 virus phylogenetic tree in PHYLIP after implementing PPI‐I algorithm

Fig. 6 — Reconstruction of 25 taxa of HIV‐1 virus phylogenetic tree in PHYLIP after implementing PPI‐I algorithm

Fig. 7 — Reconstruction of 25 taxa of Ebola virus phylogenetic tree in PHYLIP after implementing PPI‐I algorithm

5 References

1. Ge H., Walhout A.J.M., Vidal M.: ‘Integrating ‘omic’ information: a bridge between genomics and systems biology’, Trends Genet., 2003, 19, (10), pp. 551–560
2. Stelzl U., Worm U., Lalowski M., et al.: ‘A human protein‐protein interaction network: a resource for annotating the proteome’, Cell, 2005, 122, (6), pp. 957–968
3. Bork P., Jensen L.J., von Mering C., et al.: ‘Protein interaction networks from yeast to human’, Curr. Opin. Struct. Biol., 2004, 14, (3), pp. 292–299
4. Chen B., Fan W., Liu J., et al.: ‘Identifying protein complexes and functional modules‐from static PPI networks to dynamic PPI networks’, Brief Bioinf., 2014, 15, (2), pp. 177–194
5. Gitter A., Klein‐Seetharaman J., Gupta A., et al.: ‘Discovering pathways by orienting edges in protein interaction networks’, Nucleic Acids Res., 2010, 39, (4), p. e22
6. Lee E., Chuang H.Y., Kim J.W., et al.: ‘Inferring pathway activity toward precise disease classification’, PLoS Comput. Biol., 2008, 4, (11), 10.1371/journal.pcbi.1000217
7. Rao V.S., Srinivas K., Sujini G.N., et al.: ‘Protein‐protein interaction detection: methods and analysis’, Int. J. Proteomics, 2014, 2014, pp. 1–12
8. Sarmady M., Dampier W., Tozeren A.: ‘HIV protein sequence hotspots for crosstalk with host hub proteins’, PLoS One, 2011, 6, (8), 10.1371/journal.pone.0023293
9. Zhang A.: ‘Protein interaction networks: computational analysis’, 2009
10. Ratner D.: ‘The interaction of bacterial and phage proteins with immobilized Escherichia coli RNA polymerase’, J. Mol. Biol., 1974, 88, (2), pp. 373–378
11. Olmsted J.B.: ‘Affinity purification of antibodies from diazotized paper blots of heterogeneous protein samples’, J. Biol. Chem., 1981, 256, (23), pp. 11955–11957
12. Kuo M.‐H., David Allis C.: ‘In vivo cross‐linking and immunoprecipitation for studying dynamic protein: DNA associations in a chromatin environment’, Methods, 1999, 19, (3), pp. 425–433
13. Ito T., Chiba T., Ozawa R., et al.: ‘A comprehensive two‐hybrid analysis to explore the yeast protein interactome’, Proc. Natl. Acad. Sci. USA, 2001, 98, (8), pp. 4569–4574
14. Sato T., Yamanishi Y., Kanehisa M., et al.: ‘Improvement of the mirror tree method by extracting evolutionary information’, in ‘Sequence and genome analysis: method and applications’ (Concept Press, 2011), pp. 129–139
15. Pitre S., Alamgir M., Green J.R., et al.: ‘Computational methods for predicting protein‐protein interactions’, Adv. Biochem. Eng./Biotechnol., 2008, 110, pp. 247–267
16. Craig R., Liao L.: ‘Phylogenetic tree information aids supervised learning for predicting protein‐protein interaction based on distance matrices’, BMC Bioinf., 2007, 8, (1), p. 6
17. Pazos F., Valencia A.: ‘Similarity of phylogenetic trees as indicator of protein‐protein interaction’, Protein Eng., 2001, 14, (9), pp. 609–614
18. https://www.ncbi.nlm.nih.gov/genome
19. Larkin M., Blackshields G., Brown N., et al.: ‘Clustalw and clustalX version 2’, Bioinformatics, 2007, 23, (21), pp. 2947–2948
20. Rajasekaran S., Balla S., Huang C.‐H.: ‘Exact algorithms for planted motif problems’, J. Comput. Biol., 2005, 12, (8), pp. 1117–1128
21. Sharma D., Rajasekaran S.: ‘A simple algorithm for (l, d) motif search’. 2009 IEEE Symp. Computational Intelligence Bioinformatics and Computational Biology, 2009, pp. 1–3
22. Malik S., Sharma D.: ‘Reconstructing phylogenetic network with ReTF algorithm (rearranging transcriptional factor)’. 2013 IEEE 13th Int. Conf. Bioinformatics and Bioengineering (BIBE), 2013
23. Phizicky E.M., Fields S.: ‘Protein‐protein interactions: methods for detection and analysis’, Microbiol. Rev., 1995, 59, (1), pp. 94–123
24. Tamura K., Nei M., Kumar S.: ‘Prospects for inferring very large phylogenies by using the neighbor‐joining method’, Proc. Natl. Acad. Sci. USA, 2004, 101, (30), pp. 11030–11035
25. Vapnik V.N.: ‘The nature of statistical learning theory’, 1995, vol. 8
26. Hecht‐Nielsen R.: ‘Replicator neural networks for universal optimal source coding’, Science, 1995, 269, (5232), pp. 1860–1863
27. Juan D., Pazos F., Valencia A.: ‘High‐confidence prediction of global interactomes based on genome‐wide coevolutionary networks’, Proc. Natl. Acad. Sci. USA, 2008, 105, (3), pp. 934–939
28. Zhang Y., Gao P., Yuan J.S.: ‘Plant protein‐protein interaction network and interactome’, Curr. Genomics, 2010, 11, (1), pp. 40–46
29. Tamura K., Peterson D., Peterson N., et al.: ‘MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods’, Mol. Biol. Evol., 2011, 28, (10), pp. 2731–2739
