Abstract
Protein complexes are involved in many biological processes. Examining the coupling between subunits of a complex helps to clarify the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes by comparative modeling; furthermore, the server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure can be visualized with Java-based 3D graphics viewers, and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, the similarity between the structural and evolutionary profiles differs depending on whether the packing contribution of the other subunits is included, and this difference suggests which form, complex or monomeric, is preferred under biological conditions for that subunit. We believe that the (PS)2 server will be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 server is freely available at http://ps2v3.life.nctu.edu.tw/.
INTRODUCTION
Proteins are important molecules involved in almost all biological processes. The analysis of protein three-dimensional (3D) structures plays an important role in understanding the molecular basis of their functions. Comparative modeling is one of the most important computational approaches for predicting structures from amino acid sequences. Our previous comparative modeling web tools, (PS)2 (1,2), have been widely used in proteomic research and have been cited in various journals such as The American Journal of Human Genetics (3), Human Mutation (4,5), BBA Reviews on Cancer (6) and Journal of Clinical Microbiology (7). Because many fundamental cellular processes are mediated by protein–protein interactions, the prediction of protein complex structures is increasingly important in proteomic studies. Indeed, many comparative modeling tools have been published to build complex structures (8–15). However, predicted complex structures raise a relevant question: is an individual subunit sufficient to perform the biological function, or is the whole complex necessary under biological conditions? Recently, Chang et al. addressed this issue by comparing packing density and sequence conservation profiles, which require only structure coordinates and sequence homologs (16).
Previous studies have shown that packing density is correlated with sequence conservation (17–20). Furthermore, Shih et al. showed that the reciprocal of packing density is proportional to sequence conservation at the residue level (20), and this linear relationship can be explained by a mechanistic model based on statistical physics (21). Using various methods to calculate packing density and sequence conservation, Yeh et al. found that the highest correlation between these two properties occurs when packing density is estimated by the weighted contact number (WCN) (22) and sequence conservation is estimated by ConSurf (23,24). To obtain a positive correlation between the packing density and ConSurf profiles, the reciprocal WCN (rWCN) is used in these studies (16,18,20,25). For a given 3D structure of a protein complex, there are two ways to compute the rWCN profile of each protein subunit: one ignores the packing contribution of the other subunits of the complex (denoted rWCN I) and the other includes the packing contribution of the other subunits (denoted rWCN II). Chang et al. showed that rWCN I agrees better with sequence conservation for the set of enzymes whose active sites are located within individual subunits; in contrast, rWCN II agrees better with sequence conservation for the set of enzymes whose active sites comprise catalytic residues from multiple subunits (16). These findings imply that an individual subunit might be sufficient to perform the biological function when its rWCN I profile agrees better with the sequence conservation profile; conversely, the complex form is preferred under biological conditions when the rWCN II profile agrees better with the sequence conservation profile. Consequently, Chang et al. suggested that comparing structural (packing density) and sequence conservation profiles is a novel way of looking at the coupling between subunits of a complex (16).
Here, we report the updated (PS)2 server, which automatically predicts protein complex structures from query sequences and shows the coupling between subunits of the predicted complex structures based on the findings of Chang et al. The updated (PS)2 accepts protein sequences in FASTA format and builds homologous 3D structures using the (PS)2 methodology (1,2), which was previously developed by one of the authors and is based on an effective consensus strategy in both template selection and target-template alignment. The updated (PS)2 then provides the packing density profiles calculated as rWCN I and rWCN II (16) for the predicted structure, together with the sequence conservation profile calculated by ConSurf (23,24). The correlation coefficients between these structure-derived and sequence-derived profiles give a clue about the degree of coupling among the subunits of the predicted complex structure. Finally, the predicted tertiary structure and the structure-sequence profile comparisons can be visualized with Java-based 3D graphics viewers. This web server is freely available at http://ps2v3.life.nctu.edu.tw.
MATERIALS AND METHODS
The workflow of (PS)2 is shown schematically in Figure 1. One of the new features of the updated (PS)2 server is 3D structure prediction for protein complexes. Users enter two query protein sequences, and the server models the 3D complex structure through the (PS)2 homology modeling strategies (1,2). The complex template dataset, consisting of 56547 3D structures, contains all available protein complexes in the Protein Data Bank. From the predicted 3D structure and the query sequences, the packing density and the sequence conservation of each residue are calculated by WCN (22) and ConSurf (24), respectively.
Figure 1.
The schematic workflow of (PS)2 version 3.0.
For a monomeric structure, the WCN of the ith residue is $\mathrm{WCN}_i = \sum_{j \neq i} 1/r_{ij}^{2}$, the summation of the inverse squared Cα–Cα distances $r_{ij}$ to the other residues (22). In order to compare with the conservation score calculated by ConSurf (24), the reciprocal value of WCN (rWCN) is used. For a dimeric structure, rWCN can be calculated in two ways (16): rWCN I only considers residues in the same subunit (chain), while rWCN II considers residues in the whole complex. In the result page, profiles of rWCN I, rWCN II and ConSurf are z-score normalized by $z_x = (x - \bar{x})/\sigma_x$, where $\bar{x}$ and $\sigma_x$ are the mean and the standard deviation of x, respectively.
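The two rWCN variants and the z-score normalization described above can be illustrated with a minimal sketch. It is not the server's code: the function names, the toy coordinates and the use of the population standard deviation are assumptions made here for illustration, and the input is taken to be one Cα coordinate per residue with a chain label.

```python
from statistics import mean, pstdev


def wcn(coords, i, members):
    """Sum of 1 / r_ij^2 over the residues j in `members`, excluding j == i."""
    xi, yi, zi = coords[i]
    total = 0.0
    for j in members:
        if j == i:
            continue
        xj, yj, zj = coords[j]
        total += 1.0 / ((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
    return total


def rwcn_profiles(coords, chain_ids):
    """Return (rWCN I, rWCN II) as two lists with one value per residue.

    rWCN I counts only residues in the same chain; rWCN II counts all residues.
    A chain with a single residue would give WCN = 0 and is not handled here.
    """
    everyone = range(len(coords))
    rwcn_1, rwcn_2 = [], []
    for i in everyone:
        same_chain = [j for j in everyone if chain_ids[j] == chain_ids[i]]
        rwcn_1.append(1.0 / wcn(coords, i, same_chain))
        rwcn_2.append(1.0 / wcn(coords, i, everyone))
    return rwcn_1, rwcn_2


def zscore(values):
    """z-score normalization; the population standard deviation is an assumption."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Toy dimer (hypothetical coordinates in angstroms): three residues in chain A, two in chain B.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (3.8, 5.0, 0.0), (7.6, 5.0, 0.0)]
chains = ["A", "A", "A", "B", "B"]
rwcn_1, rwcn_2 = rwcn_profiles(coords, chains)
z_1, z_2 = zscore(rwcn_1), zscore(rwcn_2)
```

Because rWCN II adds the contacts from the partner chain, its WCN is never smaller than that of rWCN I, so every rWCN II value is at most the corresponding rWCN I value before normalization.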
The new (PS)2 server is built on the WCN, ConSurf and original (PS)2 systems; MODELLER has been upgraded to version 9v8 (26), and Perl scripts integrate the prediction pipeline. The web pages are written in HTML and PHP. The web server runs on a Linux system with 24 cores of 2.40 GHz Intel Xeon processors. OpenAstexViewer (http://openastexviewer.net) is used for visualization of the predicted models.
WEB SERVER
Input format
The (PS)2 web server accepts two types of input (Figure 2A): users can paste protein sequences or upload a sequence file in FASTA format. Furthermore, there are three ways to select templates: (i) the web server automatically selects templates based on the E-values from sequence alignments, (ii) users assign a PDB structure and choose which chains are used as the template, or (iii) users upload their own 3D structure as the template. Note that, for the ‘prediction for complex’, a user-specified template must be a complex structure and the input chain IDs, ‘Chain1’ and ‘Chain2’, must be different. For a protein of about 400 amino acids, the template-based structure modeling and the packing density calculations with and without the other subunits take about 10 min, whereas the sequence conservation calculation takes about 30 min. For a longer sequence, the run time may exceed one hour because the sequence conservation calculation is time-consuming. Therefore, users are encouraged to enter their e-mail addresses so that a notification can be sent when the submitted job is finished.
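Before submitting a job, the input requirements described above can be checked locally. The following is a minimal sketch, not part of the (PS)2 server: `parse_fasta` and `check_complex_submission` are hypothetical helper names, and the rule that a complex job carries exactly two sequences is an assumption drawn from the description of the complex prediction mode.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_complex_submission(records, chain1=None, chain2=None):
    """Raise ValueError if the input does not fit a two-chain complex job.

    chain1 / chain2 are only checked when a specific template is given.
    """
    if len(records) != 2:
        raise ValueError("complex prediction expects exactly two query sequences")
    for header, sequence in records:
        if not sequence:
            raise ValueError("empty sequence for record %r" % header)
    if chain1 is not None and chain1 == chain2:
        raise ValueError("Chain1 and Chain2 must be different")
```

A call such as `check_complex_submission(parse_fasta(text), "A", "B")` returns silently for valid input and raises `ValueError` otherwise.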
Figure 2.
The features of the (PS)2 web server: (A) Input: users can enter or upload protein sequences in FASTA format. (B) Output: users can view the predicted structure in 3D graphics viewer (top-right panel) as well as get the structural and sequence properties for each residue (top-left panel) and can compare the sequence conservation and packing density profiles chain-by-chain (bottom panel).
Output format
After the prediction, the updated (PS)2 server returns two types of results (Figure 2B): (i) if the 3D structure is successfully built, the predicted complex structure (Display Structure) is shown in the right region of the top panel; (ii) the sequence conservation and structural packing density of each residue, namely rWCN I, rWCN II and ConSurf, are shown in the left region of the top panel. The server integrates these two types of results so that users can easily view and analyze the 3D structures and manipulate their orientations in space. When the check box in the (Label) column of the top-left panel is clicked, the selected residues are labeled in the 3D graphics viewer of the top-right panel. The different structure display modes (e.g. Cartoon, Lines, Spheres and Surface) can be shown together or individually in the 3D graphics viewer for easy analysis. The (PS)2 server also allows users to download the predicted structure coordinates in PDB format.
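A downloaded model in PDB format can be reduced to per-chain Cα coordinates for further analysis. The sketch below is an illustration under stated assumptions, not the server's code: the function names are hypothetical, only the fixed-column ATOM records of the standard PDB format are read, only the first model is kept, and alternate locations other than blank or 'A' are skipped.

```python
def read_ca_atoms(pdb_text):
    """Return a list of (chain_id, residue_number, (x, y, z)) for C-alpha atoms."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of a multi-model file
        if not line.startswith("ATOM"):
            continue  # skips HETATM records, e.g. calcium ions also named CA
        if line[12:16].strip() != "CA":
            continue  # atom name occupies columns 13-16
        if line[16] not in (" ", "A"):
            continue  # alternate location indicator in column 17
        chain_id = line[21]                 # column 22
        residue_number = int(line[22:26])   # columns 23-26
        xyz = (float(line[30:38]),          # x: columns 31-38
               float(line[38:46]),          # y: columns 39-46
               float(line[46:54]))          # z: columns 47-54
        atoms.append((chain_id, residue_number, xyz))
    return atoms


def split_by_chain(atoms):
    """Group C-alpha coordinates by chain ID, preserving residue order."""
    chains = {}
    for chain_id, _residue_number, xyz in atoms:
        chains.setdefault(chain_id, []).append(xyz)
    return chains
```

The resulting coordinate lists are the only structural input needed to compute a contact-number profile for each chain.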
EXAMPLE ANALYSIS
The alanine racemase (gene name: ALR_BACPS, UniProt ID: Q9S5V6) from Bacillus psychrosaccharolyticus catalyzes the racemization of alanine between the L and D forms. Based on the phenylhydrazine method of Wada and Snell (27), the study of Inagaki et al. indicated that alanine racemases are dimeric complexes (28). To build the ALR_BACPS complex model, the (PS)2 server automatically selects the alanine racemase from Geobacillus stearothermophilus (PDB ID: 1BD0, chains A and B) (29) as the template, because its sequence is the most similar to the query sequence (about 58%) in the whole PDB. The complex structure predicted by the (PS)2 server is shown in the top-right panel of Figure 2B, while the packing density (both rWCN I and rWCN II) and the sequence conservation (ConSurf) of each residue are shown in the top-left panel. To compare the structural and sequence profiles of each subunit, the profiles of rWCN I, rWCN II and ConSurf are shown chain-by-chain in the bottom panel. For each chain, the Pearson's correlation coefficients for rWCN I-ConSurf and rWCN II-ConSurf are shown at the bottom of the profile comparison figure. Taking the profiles of chain A as an example, rWCN II clearly agrees better with ConSurf (0.769) than rWCN I does (0.498), and the same holds for chain B (0.759 > 0.486). Based on the findings of Chang et al. (16) and these profile comparisons, we infer that alanine racemases are dimers under biological conditions. This inference is consistent with the experimental results of Inagaki et al. (28).
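The profile comparison used in this example can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and choosing the form by a plain comparison of the two coefficients, with no threshold or significance test, is a simplifying assumption rather than a rule stated by the server.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def preferred_form(r_rwcn1_consurf, r_rwcn2_consurf):
    """Return 'complex' if rWCN II correlates better with ConSurf, else 'monomer'."""
    return "complex" if r_rwcn2_consurf > r_rwcn1_consurf else "monomer"


# Coefficients reported for the ALR_BACPS model in the text.
chain_a = preferred_form(0.498, 0.769)
chain_b = preferred_form(0.486, 0.759)
```

With per-residue profiles at hand, `preferred_form(pearson(rwcn_1, consurf), pearson(rwcn_2, consurf))` applies the same comparison to one chain, where `rwcn_1`, `rwcn_2` and `consurf` are lists of equal length.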
Figure 3 shows the complex structure of ALR_BACPS predicted by the (PS)2 server. In the model, the left active site consists of Lys41 from chain A and Tyr266 from chain B, and the right active site is likewise formed by residues from both chains. Since each active site of alanine racemase consists of residues from different chains, the dimeric form is necessary to perform the function. This is consistent with the inference, based on the method proposed by Chang et al. (16), that alanine racemase is a dimer. In summary, this example shows that the (PS)2 server not only correctly predicts the protein complex structure but also has the potential to predict the biological functional unit.
Figure 3.
The complex structure of alanine racemase (ALR_BACPS) predicted by the (PS)2 server. The structure model of the ALR_BACPS complex is built using its sequence homolog (PDB ID: 1BD0, chains A and B) as the template. The active-site residues Lys41 and Tyr266 are shown in spheres mode.
CONCLUSION
Here we present an updated (PS)2 web server for predicting complex structures and analyzing them by integrating the ConSurf, WCN and (PS)2 systems. One of the unique features of (PS)2 is the integration of packing density and sequence conservation analysis into complex structure prediction: the server predicts the structures of protein complexes and also indicates whether the biological functional unit is likely to be the monomeric or the complex form. The example demonstrates that the updated (PS)2 server is effective for complex structure prediction and provides a new way to look at the coupling between subunits. We believe that the (PS)2 server will be useful to general biologists.
Acknowledgments
We are grateful for both the hardware and software support provided by the Center for Bioinformatics Research, National Chiao Tung University, Taiwan, the Center for Lipid Biosciences, Kaohsiung Medical University Hospital, Taiwan and the Center for Lipid and Glycomedicine Research, Kaohsiung Medical University, Taiwan.
FUNDING
Academic Summit Program of Ministry of Science and Technology [MOST-103-2321-B-009-002]; 'Center for Bioinformatics Research of Aiming for the Top University Program' of the National Chiao Tung University and Ministry of Education, Taiwan; NSYSU-KMU joint research project, [NSYSUKMU 104-P027]. Funding for open access charge: 'Center for Bioinformatics Research of Aiming for the Top University Program' of the National Chiao Tung University and Ministry of Education, Taiwan.
Conflict of interest statement. None declared.
REFERENCES
- 1. Chen C.C., Hwang J.K., Yang J.M. (PS)2: protein structure prediction server. Nucleic Acids Res. 2006;34:W152–W157. doi: 10.1093/nar/gkl187.
- 2. Chen C.C., Hwang J.K., Yang J.M. (PS)2-v2: template-based protein structure prediction server. BMC Bioinformatics. 2009;10:366. doi: 10.1186/1471-2105-10-366.
- 3. Weber S., Thiele H., Mir S., Toliat M.R., Sozeri B., Reutter H., Draaken M., Ludwig M., Altmuller J., Frommolt P., et al. Muscarinic acetylcholine receptor M3 mutation causes urinary bladder disease and a prune-belly-like syndrome. Am. J. Hum. Genet. 2011;89:668–674. doi: 10.1016/j.ajhg.2011.10.007.
- 4. Soltys D.T., Rocha C.R., Lerner L.K., de Souza T.A., Munford V., Cabral F., Nardo T., Stefanini M., Sarasin A., Cabral-Neto J.B., et al. Novel XPG (ERCC5) mutations affect DNA repair and cell survival after ultraviolet but not oxidative stress. Hum. Mutat. 2013;34:481–489. doi: 10.1002/humu.22259.
- 5. Kuo Y.C., Lin Y.H., Chen H.I., Wang Y.Y., Chiou Y.W., Lin H.H., Pan H.A., Wu C.M., Su S.M., Hsu C.C., et al. SEPT12 mutations cause male infertility with defective sperm annulus. Hum. Mutat. 2012;33:710–719. doi: 10.1002/humu.22028.
- 6. Friedman R., Boye K., Flatmark K. Molecular modelling and simulations in cancer research. Biochim. Biophys. Acta. 2013;1836:1–14. doi: 10.1016/j.bbcan.2013.02.001.
- 7. Huang S.W., Hsu Y.W., Smith D.J., Kiang D., Tsai H.P., Lin K.H., Wang S.M., Liu C.C., Su I.J., Wang J.R. Reemergence of enterovirus 71 in 2008 in Taiwan: dynamics of genetic and antigenic evolution from 1998 to 2008. J. Clin. Microbiol. 2009;47:3653–3662. doi: 10.1128/JCM.00630-09.
- 8. Mosca R., Ceol A., Aloy P. Interactome3D: adding structural details to protein networks. Nat. Methods. 2013;10:47–53. doi: 10.1038/nmeth.2289.
- 9. Mukherjee S., Zhang Y. Protein-protein complex structure predictions by multimeric threading and template recombination. Structure. 2011;19:955–966. doi: 10.1016/j.str.2011.04.006.
- 10. Guerler A., Govindarajoo B., Zhang Y. Mapping monomeric threading to protein-protein structure prediction. J. Chem. Inf. Model. 2013;53:717–725. doi: 10.1021/ci300579r.
- 11. Szilagyi A., Zhang Y. Template-based structure modeling of protein-protein interactions. Curr. Opin. Struct. Biol. 2014;24:10–23. doi: 10.1016/j.sbi.2013.11.005.
- 12. Lu L., Lu H., Skolnick J. MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins. 2002;49:350–364. doi: 10.1002/prot.10222.
- 13. Baspinar A., Cukuroglu E., Nussinov R., Keskin O., Gursoy A. PRISM: a web server and repository for prediction of protein-protein interactions and modeling their 3D complexes. Nucleic Acids Res. 2014;42:W285–W289. doi: 10.1093/nar/gku397.
- 14. Kundrotas P.J., Lensink M.F., Alexov E. Homology-based modeling of 3D structures of protein-protein complexes using alignments of modified sequence profiles. Int. J. Biol. Macromol. 2008;43:198–208. doi: 10.1016/j.ijbiomac.2008.05.004.
- 15. Fukuhara N., Kawabata T. HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures. Nucleic Acids Res. 2008;36:W185–W189. doi: 10.1093/nar/gkn218.
- 16. Chang C.M., Huang Y.W., Shih C.H., Hwang J.K. On the relationship between the sequence conservation and the packing density profiles of the protein complexes. Proteins. 2013;81:1192–1199. doi: 10.1002/prot.24268.
- 17. Franzosa E.A., Xia Y. Structural determinants of protein evolution are context-sensitive at the residue level. Mol. Biol. Evol. 2009;26:2387–2395. doi: 10.1093/molbev/msp146.
- 18. Yeh S.W., Liu J.W., Yu S.H., Shih C.H., Hwang J.K., Echave J. Site-specific structural constraints on protein sequence evolutionary divergence: local packing density versus solvent exposure. Mol. Biol. Evol. 2014;31:135–139. doi: 10.1093/molbev/mst178.
- 19. Liao H., Yeh W., Chiang D., Jernigan R.L., Lustig B. Protein sequence entropy is closely related to packing density and hydrophobicity. Protein Eng. Des. Sel. 2005;18:59–64. doi: 10.1093/protein/gzi009.
- 20. Shih C.H., Chang C.M., Lin Y.S., Lo W.C., Hwang J.K. Evolutionary information hidden in a single protein structure. Proteins. 2012;80:1647–1657. doi: 10.1002/prot.24058.
- 21. Huang T.T., del Valle Marcos M.L., Hwang J.K., Echave J. A mechanistic stress model of protein evolution accounts for site-specific evolutionary rates and their relationship with packing density and flexibility. BMC Evol. Biol. 2014;14:78. doi: 10.1186/1471-2148-14-78.
- 22. Lin C.P., Huang S.W., Lai Y.L., Yen S.C., Shih C.H., Lu C.H., Huang C.C., Hwang J.K. Deriving protein dynamical properties from weighted protein contact number. Proteins. 2008;72:929–935. doi: 10.1002/prot.21983.
- 23. Goldenberg O., Erez E., Nimrod G., Ben-Tal N. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures. Nucleic Acids Res. 2009;37:D323–D327. doi: 10.1093/nar/gkn822.
- 24. Ashkenazy H., Erez E., Martz E., Pupko T., Ben-Tal N. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 2010;38:W529–W533. doi: 10.1093/nar/gkq399.
- 25. Yeh S.W., Huang T.T., Liu J.W., Yu S.H., Shih C.H., Hwang J.K., Echave J. Local packing density is the main structural determinant of the rate of protein sequence evolution at site level. Biomed. Res. Int. 2014;2014:572409. doi: 10.1155/2014/572409.
- 26. Webb B., Sali A. Comparative protein structure modeling using MODELLER. Curr. Protoc. Bioinformatics. 2014;47:5.6.1–5.6.32. doi: 10.1002/0471250953.bi0506s47.
- 27. Wada H., Snell E.E. The enzymatic oxidation of pyridoxine and pyridoxamine phosphates. J. Biol. Chem. 1961;236:2089–2095.
- 28. Inagaki K., Tanizawa K., Badet B., Walsh C.T., Tanaka H., Soda K. Thermostable alanine racemase from Bacillus stearothermophilus: molecular cloning of the gene, enzyme purification, and characterization. Biochemistry. 1986;25:3268–3274. doi: 10.1021/bi00359a028.
- 29. Stamper G.F., Morollo A.A., Ringe D. Reaction of alanine racemase with 1-aminoethylphosphonic acid forms a stable external aldimine. Biochemistry. 1998;37:10438–10445. doi: 10.1021/bi980692s.