PredMP: a web server for de novo prediction and visualization of membrane proteins

Sheng Wang; Shiyang Fei; Zongan Wang; Yu Li; Jinbo Xu; Feng Zhao; Xin Gao

doi:10.1093/bioinformatics/bty684

Bioinformatics. 2018 Aug 4;35(4):691–693.


Editor: Alfonso Valencia

PMCID: PMC6378930 PMID: 30084960

Abstract

Motivation

PredMP is, to our knowledge, the first web service for de novo prediction of membrane protein (MP) 3D structure that also embeds the predicted MP into the lipid bilayer for visualization. Our approach is based on a high-throughput Deep Transfer Learning (DTL) method that first predicts MP contacts by learning from non-MPs and then builds a 3D model of the MP using the predicted contacts as distance restraints. This algorithm is derived from our previous Deep Learning (DL) method, originally developed for soluble protein contact prediction, which was officially ranked No. 1 in CASP12. The DTL framework overcomes the challenge that only a limited number of solved MP structures are available for training a deep learning model. The PredMP server has three modules: (i) the DTL framework followed by the contact-assisted folding protocol, implemented in RaptorX-Contact, which serves as the key module for 3D model generation; (ii) the 1D annotation module, implemented in RaptorX-Property, which predicts secondary structure and disordered regions; and (iii) the visualization module, which displays the predicted MPs embedded in the lipid bilayer as guided by the predicted transmembrane topology.

Results

On 510 non-redundant MPs, our server predicts correct folds for ∼290 MPs, substantially outperforming existing methods. On the blind, live benchmark CAMEO from September 2016 to January 2018, PredMP successfully modeled all 10 MPs in the hard category.

Availability and implementation

PredMP is freely accessible on the web at http://www.predmp.com.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Membrane proteins (MPs) are encoded by ∼30% of genes and are the targets of ∼50% of therapeutic drugs. Compared with non-membrane proteins (non-MPs), MP structures are difficult to determine experimentally, largely because it is hard to find conditions that preserve the correct conformation of a protein once it is isolated from its native membrane environment. It is therefore important to develop computational methods that predict MP structures from sequence information.

Although homology modeling (or template-based modeling) works well for many non-MPs (such as soluble proteins), it has difficulty with MPs, partly because few MP structures have been solved. In particular, the Protein Data Bank (PDB) currently contains only about 510 non-redundant MPs, which makes homology modeling infeasible for a large portion of MPs. De novo prediction (or ab initio folding) is therefore needed.

So far, the most successful de novo prediction methods fall into two classes: fragment assembly, e.g. Rosetta (Kim et al., 2004), and contact-assisted ab initio folding, e.g. CoinFold (Wang et al., 2016b). Fragment assembly works mainly for small proteins, whereas most multi-pass transmembrane proteins are relatively large. Contact-assisted folding depends heavily on accurate contact prediction, which, for proteins without many sequence homologs, neither pure co-evolution methods such as Gremlin (Kamisetty et al., 2013) nor methods that exploit co-evolution features with shallow neural networks, such as metaPSICOV (Jones et al., 2015), can achieve (Wang et al., 2017b).

Here we present PredMP, a web server that first predicts the MP structure without using any structural templates and then visualizes the predicted MP model embedded in the lipid bilayer. The key part of PredMP is the 3D modeling module, implemented in RaptorX-Contact. Its underlying algorithm originates from a Deep Learning (DL) method developed mainly for soluble protein contact prediction, which obtained the highest F1 score in the contact prediction category of CASP12 (Wang et al., 2018). To overcome the scarcity of training data for MP contact prediction, we transfer the knowledge learned from non-MPs to MP contact prediction and therefore call the method Deep Transfer Learning (DTL) (Wang et al., 2017a). Using the predicted contacts as distance restraints, the 3D model of the MP is then built with the Crystallography & NMR System (CNS) suite (Brunger et al., 1998).

Guided by the transmembrane topology predicted by DeepCNF (Wang et al., 2015, 2016c), the 3D model of the query MP is first embedded into the lipid bilayer using a depth- and residue-dependent membrane burial potential (Wang et al., 2016d) and then displayed by a WebGL-based protein viewer.

2 Workflow and implementation

The basic workflow of PredMP is shown in Supplementary Figure S1. The server has three modules: (i) the 1D annotation module, which predicts secondary structure and disordered regions with the RaptorX-Property server (Wang et al., 2016a); (ii) the 3D modeling module, which generates five de novo 3D models of the query MP with the RaptorX-Contact server (Wang et al., 2016b, 2017a, b); and (iii) the visualization module, which displays the predicted MPs embedded in the lipid bilayer. The major steps are described below.

2.1 Multiple sequence alignment construction

When the user submits the amino acid sequence of an MP, the server first builds a multiple sequence alignment (MSA) of sequence homologs from the protein family to which the input MP belongs.
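How deep the alignment is matters downstream: Section 4 relates 3D modeling accuracy to the number of effective sequence homologs (Supplementary Section S5). The text does not say how that number is computed, so the sketch below uses one widely used convention, assumed here rather than taken from PredMP: each aligned sequence is weighted by 1/n, where n counts the sequences (itself included) that share at least 80% identity with it, and the effective number is the sum of the weights. The helper names `identity` and `effective_sequences`, the 0.8 threshold and the gap handling are illustrative choices.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns.

    Columns where both sequences have a gap ('-') are skipped; a gap facing a
    residue counts as a mismatch (an assumption; conventions differ).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def effective_sequences(msa: List[str], threshold: float = 0.8) -> float:
    """Hypothetical helper: Neff = sum over sequences k of 1 / n_k, where n_k
    counts k itself plus every other sequence with identity >= threshold to k."""
    neff = 0.0
    for i, seq_i in enumerate(msa):
        n_similar = 1 + sum(
            1
            for j, seq_j in enumerate(msa)
            if j != i and identity(seq_i, seq_j) >= threshold
        )
        neff += 1.0 / n_similar
    return neff


# Toy alignment: three near-identical sequences collapse to about one
# effective sequence, while the unrelated fourth sequence counts fully.
EXAMPLE_MSA = [
    "ACDEFGHIKL",
    "ACDEFGHIKL",
    "ACDEFGHIKV",
    "MNPQRSTVWY",
]
print(effective_sequences(EXAMPLE_MSA))
```

Redundant homologs collapse toward a single effective sequence, so a family with many near-identical members can still be shallow in this sense.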

2.2 1D annotation module for local structural property prediction

RaptorX-Property (Wang et al., 2016a) uses the MSA to predict two local structural properties of the MP: secondary structure elements and disordered regions.

2.3 3D modeling module for de novo generation of MP models

This module consists of two parts: (i) contact map prediction and (ii) 3D model construction. For contact map prediction, the MSA is used to predict the residue–residue contact map of the MP with a Deep Transfer Learning (DTL) model that learns from non-MPs (Wang et al., 2017a). For 3D model construction, the predicted secondary structure and predicted contacts are fed to the CNS suite (Brunger et al., 1998). In brief, the predicted secondary structure is converted into distance, angle and hydrogen-bond restraints, and the top predicted contacts are converted into distance restraints. CNS then builds 3D structure models, and the top five are selected according to the CNS energy function (Wang et al., 2016b). The entire approach is implemented in RaptorX-Contact (Wang et al., 2017b).
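The contact-to-restraint step can be sketched as follows. This is a minimal illustration under stated assumptions, not RaptorX-Contact's code: predicted contacts are taken as (i, j, probability) triples, pairs closer than 6 residues in sequence are dropped, the `n_keep` most probable pairs are kept, and each is written as a CNS distance restraint of the form `assign (selection) (selection) d d-minus d-plus`. Restraining Cβ atoms (Cα for glycine) and the 8 Å upper bound follow the usual CASP contact definition (Cβ–Cβ distance below 8 Å); the target distance, lower bound, `n_keep` and all function names are hypothetical values chosen for the example.

```python
from typing import List, Tuple

# (residue i, residue j, predicted contact probability); residues are 1-based.
Contact = Tuple[int, int, float]


def select_top_contacts(
    contacts: List[Contact], n_keep: int, min_separation: int = 6
) -> List[Contact]:
    """Return the n_keep most probable contacts with |i - j| >= min_separation."""
    eligible = [c for c in contacts if abs(c[0] - c[1]) >= min_separation]
    eligible.sort(key=lambda c: c[2], reverse=True)
    return eligible[:n_keep]


def restraint_atom(residue: str) -> str:
    """CB for most residues, CA for glycine (which lacks a CB); assumed convention."""
    return "CA" if residue == "G" else "CB"


def to_cns_restraints(
    sequence: str,
    contacts: List[Contact],
    target: float = 6.0,  # hypothetical target distance (angstroms)
    lower: float = 3.5,   # hypothetical lower bound (angstroms)
    upper: float = 8.0,   # CASP contact cutoff, used here as the upper bound
) -> List[str]:
    """Format contacts as CNS 'assign' lines: distance, minus-offset, plus-offset."""
    lines = []
    for i, j, _prob in contacts:
        atom_i = restraint_atom(sequence[i - 1])
        atom_j = restraint_atom(sequence[j - 1])
        lines.append(
            f"assign (resid {i} and name {atom_i}) "
            f"(resid {j} and name {atom_j}) "
            f"{target:.2f} {target - lower:.2f} {upper - target:.2f}"
        )
    return lines


sequence = "MKTAYIAKQRGLSW"
predicted = [(1, 10, 0.90), (2, 5, 0.95), (3, 11, 0.70), (4, 14, 0.40)]
for line in to_cns_restraints(sequence, select_top_contacts(predicted, n_keep=2)):
    print(line)
```

How many contacts to keep (for example, some multiple of the sequence length) and how tight to make the bounds are tuning choices; the text only states that the top predicted contacts are used.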

2.4 Visualization module for the display of the embedded MPs

The final step displays the 3D model of the input MP embedded in the lipid bilayer. It consists of two procedures: (i) transmembrane topology prediction and (ii) MP embedding. For transmembrane topology prediction, we trained the machine learning model DeepCNF (Wang et al., 2015, 2016c) to predict, for each residue, one of nine transmembrane labels (Supplementary Section S3). For MP embedding, we use an approach similar to the Positioning of Proteins in Membranes (PPM) method (Lomize et al., 2006), which calculates the rotational and translational position of the 3D membrane protein model within the membrane. The membrane potential is derived from statistics on a curated training set of non-homologous transmembrane proteins (Wang et al., 2016d).
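The embedding step amounts to choosing a rotation and a translation of the model that minimize a membrane energy. The text does not give the exact PPM-like procedure or the burial potential of Wang et al. (2016d), so the sketch below substitutes a deliberately simple stand-in: the membrane normal is the z axis, the hydrophobic core is a hard slab of assumed half-thickness 15 Å, per-residue transfer energies come from a hypothetical table (`TRANSFER`), and a coarse grid search scans two rotation angles and an integer shift along z. It also ignores the predicted topology that guides embedding in PredMP; all names are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]

# Hypothetical transfer energies (arbitrary units): negative values favour the
# hydrocarbon core. Illustrative only; not the potential of Wang et al. (2016d).
TRANSFER: Dict[str, float] = {
    "L": -1.0, "I": -1.0, "F": -1.0, "V": -0.8, "A": -0.3,
    "G": 0.0, "S": 0.3, "K": 1.5, "R": 1.5, "D": 1.5, "E": 1.5,
}


def rotate(p: Vec, theta: float, phi: float) -> Vec:
    """Rotate point p about the x axis by theta, then about the y axis by phi (radians)."""
    x, y, z = p
    y, z = (y * math.cos(theta) - z * math.sin(theta),
            y * math.sin(theta) + z * math.cos(theta))
    x, z = (x * math.cos(phi) + z * math.sin(phi),
            -x * math.sin(phi) + z * math.cos(phi))
    return (x, y, z)


def membrane_energy(seq: str, coords: Sequence[Vec], half_thickness: float = 15.0) -> float:
    """Sum transfer energies of residues whose z lies inside |z| <= half_thickness."""
    return sum(
        TRANSFER.get(aa, 0.0)
        for aa, (_, _, z) in zip(seq, coords)
        if abs(z) <= half_thickness
    )


def embed(seq: str, coords: Sequence[Vec], angle_step: int = 15,
          max_shift: int = 20, half_thickness: float = 15.0) -> Tuple[float, int, int, int]:
    """Coarse grid search over two rotation angles (degrees) and an integer shift
    (angstroms) along z; returns the lowest-energy pose as (energy, theta, phi, shift)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    centred = [(x - centroid[0], y - centroid[1], z - centroid[2]) for x, y, z in coords]
    best = (math.inf, 0, 0, 0)
    for theta in range(0, 180, angle_step):
        for phi in range(0, 180, angle_step):
            rotated = [rotate(p, math.radians(theta), math.radians(phi)) for p in centred]
            for shift in range(-max_shift, max_shift + 1):
                moved = [(x, y, z + shift) for x, y, z in rotated]
                energy = membrane_energy(seq, moved, half_thickness)
                if energy < best[0]:
                    best = (energy, theta, phi, shift)
    return best


# Toy chain lying along the x axis: charged ends, hydrophobic middle.
toy_seq = "KK" + "L" * 6 + "KK"
toy_coords = [((i - 4.5) * 5.0, 0.0, 0.0) for i in range(len(toy_seq))]
print(embed(toy_seq, toy_coords))
```

For the toy chain, the lowest-energy poses turn it roughly along the membrane normal, so the six leucines sit inside the slab and the lysines outside.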

3 Performances

The underlying DL method (Wang et al., 2018) was blindly tested in CASP12 in 2016 and officially ranked first in the protein contact prediction category (Schaarschmidt et al., 2018). On the blind, live benchmark CAMEO (Haas et al., 2013), from September 2016 to January 2018, RaptorX-Contact, the key module of PredMP, successfully modeled all 10 MPs in the hard category (Supplementary Section S4).

Here we briefly describe the results on all 510 non-redundant MPs with solved structures in the PDB (Supplementary Table S1). Following the CASP convention (Kryshtafovych et al., 2018), a predictor may submit up to five models for each target, and a target is considered correctly predicted if the best of its five models has a TM-score greater than 0.5 to the native structure (Xu and Zhang, 2010). As shown in Table 1, PredMP substantially outperforms the other methods both in the TM-score of its 3D models and in the top-L/5 accuracy of its predicted contacts (L is the protein length). For each target in the 510-protein dataset, we provide a URL to display and download the 3D models generated by PredMP (e.g. http://predmp.com/#/detail/5c6oA).
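The success criterion can be made concrete. For a model already superimposed on the native structure with a known residue correspondence, TM-score = (1/L) Σ_i 1/(1 + (d_i/d0)²), where L is the native length, d_i is the distance between the i-th pair of equivalent residues and d0 = 1.24 (L − 15)^(1/3) − 1.8 Å. The real TM-score program also searches for the superposition that maximizes this sum; the sketch below skips that search, and its function names and 0.5 Å floor on d0 are assumptions. The last function applies the best-of-five, TM-score > 0.5 rule described above.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 (angstroms).

    Floored at 0.5 here so that short chains get a positive scale (assumption).
    """
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_superposed(model: Sequence[Vec], native: Sequence[Vec]) -> float:
    """TM-score for a fixed superposition, normalised by the native length.

    Assumes model[i] corresponds to native[i] and both are already in the same
    frame; the real TM-score program also optimises the superposition.
    """
    length = len(native)
    d0 = tm_d0(length)
    total = sum(
        1.0 / (1.0 + (math.dist(m, n) / d0) ** 2) for m, n in zip(model, native)
    )
    return total / length


def correctly_predicted(model_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Criterion from the text: the best of up to five models exceeds 0.5."""
    return max(model_scores[:5]) > threshold


native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
# Every residue displaced by exactly d0 contributes 1 / (1 + 1) = 0.5.
shifted = [(x + tm_d0(100), y, z) for x, y, z in native]
print(tm_score_superposed(native, native), tm_score_superposed(shifted, native))
print(correctly_predicted([0.31, 0.47, 0.52, 0.40, 0.28]))
```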

Table 1.

Accuracy of the 3D models (first three columns: TM-score, and the number of targets whose best model has a TM-score above 0.5 and above 0.6, respectively) and accuracy of long- and medium-range contact prediction (last two columns: top L/5, where L is the protein length) on all 510 non-redundant membrane proteins

Method	TM-score	#TM > 0.5	#TM > 0.6	Long (top L/5)	Medium (top L/5)
Gremlin	0.384	122	56	0.40	0.23
metaPSICOV	0.413	147	77	0.49	0.34
PredMP	0.547	298	223	0.69	0.48


Note: A contact is short-, medium- or long-range when the sequence separation of its two residues falls in [6, 11], [12, 23] or ≥24 residues, respectively.
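The contact columns of Table 1 report top-L/5 accuracy in each range. A minimal sketch of that metric, assuming predictions arrive as (i, j, probability) triples and the native contacts are given as a set of residue pairs: keep the predictions whose sequence separation falls in the chosen range from the note above, rank them by probability, take the top L/5 and report the fraction that are native contacts. Rounding L/5 down and dividing by the number of predictions actually evaluated are assumptions (evaluation protocols differ on both), and the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Prediction = Tuple[int, int, float]  # (i, j, probability), 1-based residues

# Sequence-separation ranges from the note under Table 1 (None = no upper bound).
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def top_l5_precision(
    predictions: Iterable[Prediction],
    native_contacts: Set[Tuple[int, int]],
    length: int,
    range_name: str,
) -> float:
    """Fraction of the top L/5 predictions in a range that are native contacts.

    native_contacts holds pairs as (min(i, j), max(i, j)). L/5 is rounded down
    (at least 1); if fewer predictions exist, only those are evaluated.
    """
    low, high = RANGES[range_name]
    in_range = [
        p for p in predictions
        if abs(p[0] - p[1]) >= low and (high is None or abs(p[0] - p[1]) <= high)
    ]
    in_range.sort(key=lambda p: p[2], reverse=True)
    top = in_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in native_contacts)
    return hits / len(top)


predicted = [(1, 30, 0.90), (2, 40, 0.80), (3, 50, 0.70), (1, 15, 0.95)]
native = {(1, 30), (3, 50), (1, 15)}
for name in ("long", "medium", "short"):
    print(name, top_l5_precision(predicted, native, length=60, range_name=name))
```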

4 Conclusions and discussions

In this work, we introduced PredMP for de novo prediction of membrane proteins (MPs). The server not only models MP 3D structures accurately but also embeds the predicted MP into the lipid bilayer. PredMP was evaluated on the blind, live benchmark CAMEO (Haas et al., 2013) from September 2016 to January 2018 and successfully modeled all 10 hard-category MPs. We also established a reliable correlation between 3D modeling accuracy and the number of effective sequence homologs (Supplementary Section S5) and estimated that our server could predict correct folds for ∼1500 of the 2215 human multi-pass MPs, including a few hundred new folds (Wang et al., 2017a). The server is free and open to all users, with no login required. The only required input is the putative membrane protein sequence, and a target of about 500 residues takes about 2 hours to run. Supplementary Section S6 details the input/output format of the PredMP server.

The predicted models and the native structures of the 510 non-redundant MP dataset are freely available at http://www.predmp.com/#/download, so users can evaluate the quality of the results generated by PredMP. We hope that this 510-protein dataset will serve as an MP benchmark for the protein structure prediction community.

Supplementary Material

Supplementary data file (PDF, 2.7 MB).

Acknowledgements

The authors thank Prof. Tobin Sosnick for helpful discussion.

Funding

This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Awards No. FCC/1/1976-04, URF/1/2602-01, and URF/1/3007-01 to X.G. This work was also supported by National Institutes of Health (NIH) [R01GM089753 to J.X.] and National Science Foundation (NSF) [DBI-1564955 to J.X.].

Conflict of Interest: none declared.

References

Brunger A.T. et al. (1998) Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr., 54, 905–921.
Haas J. et al. (2013) The Protein Model Portal—a comprehensive resource for protein structure and model information. Database, doi: 10.1093/database/bat031.
Jones D.T. et al. (2015) MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics, 31, 999–1006.
Kamisetty H. et al. (2013) Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. U. S. A., 110, 15674–15679.
Kim D.E. et al. (2004) Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res., 32, W526–W531.
Kryshtafovych A. et al. (2018) Assessment of model accuracy estimations in CASP12. Proteins, 86, 345–360.
Lomize A.L. et al. (2006) Positioning of proteins in membranes: a computational approach. Protein Sci., 15, 1318–1333.
Schaarschmidt J. et al. (2018) Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins, 86, 51–66.
Wang S. et al. (2015) DeepCNF-D: predicting protein order/disorder regions by weighted deep convolutional neural fields. Int. J. Mol. Sci., 16, 17315–17330.
Wang S. et al. (2016a) RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res., 44, W430–W435.
Wang S. et al. (2016b) CoinFold: a web server for protein contact prediction and contact-assisted protein folding. Nucleic Acids Res., 44, W361–W366.
Wang S. et al. (2016c) Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep., 6, 18962.
Wang Z. et al. (2016d) Including H-bonding in depth-dependent membrane burial potentials for improving folding simulations. Biophys. J., 110, 58a.
Wang S. et al. (2017a) Folding membrane proteins by deep transfer learning. Cell Syst., 5, 202–211.e3.
Wang S. et al. (2017b) Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol., 13, e1005324.
Wang S. et al. (2018) Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins, 86, 66–77.
Xu J., Zhang Y. (2010) How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics, 26, 889–895.
