Abstract
The complex hydrophobic and hydrophilic milieus of membrane-associated proteins pose experimental and theoretical challenges to their understanding. Here we produce a non-redundant database to compute knowledge-based asymmetric cross-membrane potentials from the per-residue distributions of Cβ, Cγ and functional group atoms. We predict transmembrane and peripherally associated regions from genomic sequence and position peptides and protein structures relative to the bilayer (available at http://www.degradolab.org/ez). The pseudo-energy topological landscapes underscore positional stability and functional mechanisms, demonstrated here for antimicrobial peptides, transmembrane proteins, and viral fusion proteins. Moreover, experimental effects of point mutations on the relative ratio changes of dual-topology proteins are quantitatively reproduced. The functional group potential and the membrane-exposed residues display the largest energetic changes, enabling the detection of native-like structures among decoys. Hence, focusing on the uniqueness of membrane-associated proteins and peptides, we quantitatively parameterize their cross-membrane propensity, thus facilitating structural refinement, characterization, prediction and design.
Introduction
While membrane proteins account for over a quarter of the proteome and most drug targets, they constitute only 2% of deposited structures (Fagerberg et al., 2010; White, 2009). Computational structural analysis (Arinaminpathy et al., 2009; Frishman, 2010; Pellegrini-Calace and Thornton, 2009), prediction (Barth et al., 2009; Elofsson and von Heijne, 2007; Fleishman and Ben-Tal, 2006; Hurwitz et al., 2006; Kernytsky and Rost, 2003; Michino et al., 2009) and design (Ghirlanda, 2009; Samish et al., 2010; Senes, 2011) provide a way to bridge this gap. The five distinct environments of membrane proteins (a hydrocarbon core surrounded by asymmetric polar head groups and aqueous milieus) present experimental and theoretical challenges in their biophysical dissection (Bill et al., 2011; Bowie, 2005; Dowhan and Bogdanov, 2009; Langosch and Arkin, 2009; Moore et al., 2008; White, 2009). The need for functional plasticity within a hydrophobic setting requires weak and intricate helix-helix interaction motifs. The corresponding conformational changes facilitate functions such as transport, channeling and signal transduction. Other membrane-associated proteins disrupt the bilayer: SNARE proteins mediate eukaryotic cell vesicular fusion (Langosch et al., 2007), while viral fusion proteins merge the viral envelope and the target cell membrane (Harrison, 2008).
While transmembrane (TM) domains vary in length and composition according to their organelle association (Sharpe et al., 2010), the basic properties of TM helices follow simple rules. Hydrophobicity is pivotal in TM helix insertion via the translocon (White and von Heijne, 2008) as demonstrated by per-residue scales derived from experimental measurements of transfer energy to the membrane (Bowie, 2005; Elofsson and von Heijne, 2007; White, 2009). Still, charged residues can be located in the hydrophobic core of the membrane with their side chains snorkeling to the membrane surface (Chamberlain et al., 2004). Positively charged residues prefer the cytoplasm (the “positive-inside” rule (Nilsson et al., 2005; von Heijne, 1984)) significantly affecting topology. The clustering of Trp and Tyr at membrane interfaces (Yau et al., 1998) contributes to protein positioning in the membrane, and is biased to the outer leaflet for reasons that are not well understood (Nakashima and Nishikawa, 1992; Nilsson et al., 2005). Likewise, Gly exhibits preference for the outer leaflet, partially due to asymmetric helix-to-coil unwinding (Jin and Takada, 2008). Helix-helix interfaces are generally lined by small residues, which is key to the limited number of helix-assembly geometries (Walters and DeGrado, 2006). Thus, while some structural biases of membrane proteins are known, the full biophysical parameterization of these discrete phenomena and related energetics is, unfortunately, still unclear.
Surface helices, lying nearly parallel to the membrane, serve as structural supports or play functional roles, defining specific microenvironments, gating channels, or interacting with other proteins (Orgel, 2006). They may also be independent peptides such as antimicrobial peptides (Wimley, 2010; Zasloff, 2002), which are part of the innate immune system. Often, these amphiphilic segments have adjacent TM regions, making sequence context and hydrophobic moment key parameters for their prediction (Phoenix et al., 2002). Yet, there is currently no publicly available tool for quickly docking such peptides onto the membrane.
To overcome the lack of biophysical understanding, statistical approaches have been applied to position proteins in the membrane (Senes et al., 2007; Ulmschneider et al., 2005, 2006). The knowledge-based Ez potential (Senes et al., 2007) uses an inverse Boltzmann potential (statistical potential or log-odds score (Saven, 2003)) to convert observed depth (z-coordinate) propensities into amino acid pseudo-energies. However, this symmetric algorithm was limited by the small size and limited diversity of the database on which it relied. It was unable to discriminate between the inner and outer leaflets of the membrane, let alone resolve higher-resolution phenomena such as rotamer preference. Moreover, the search and sampling procedure was unable to address large structures. Ulmschneider et al. evaluated the membrane insertion of six TM proteins and nine antimicrobial peptides using a gaussian set derived as an effective potential of mean force (Ulmschneider et al., 2005, 2006). They found that while hydrophobic matching is pivotal, it is the polar, charged, and aromatic residues that determine the orientation. Lomize et al. produced a dataset (OPM (Lomize et al., 2011)) for calculating an all-atom transfer energy using a planar hydrophobic slab and a solvation model.
Importantly, current methods lack the ability to robustly extend the conceptual framework from full TM proteins to the effect of numerous mutations as required for structure prediction and design. Further, a topological energy landscape is required to unravel characteristics such as stability and multiple positional minima, as found in dual topology proteins as well as in fusion and lytic peptides. Hence, while the study of membrane proteins has yielded significant insights, general rules for the intricate relationships between sequence, structure, and environment remain to be discovered and used for new analysis, prediction and design applications.
Here we utilize the growing number of structurally diverse, high-resolution helical membrane protein structures to assemble a non-redundant, representative dataset from which we compute an asymmetric Ez potential for Cβ and Cγ atoms, as well as functional group centers. In addition to orienting full-sized TM proteins accurately in the membrane, we analyze pseudo-energy landscapes to examine biological systems like antimicrobial peptides (Wimley, 2010) and membrane fusion proteins. Likewise, we investigate higher-resolution phenomena such as rotameric preferences, topological stabilities, and the effects of point mutations on positional stability. We demonstrate that the new potential can predict and characterize membrane-associated proteins and peptides from genomic sequences and accurately position them relative to the membrane. It is also sufficiently sensitive to quantitatively assess the energetic effect of point mutations on the relative population ratio of dual-topology TM proteins as well as accurately score and rank competing models. These features facilitate the prediction, refinement and design of TM protein structures.
Results
From Biophysical Insight to a Knowledge-Based Potential and Back
We trained our model on a representative experimental dataset of 76 proteins, which is more than three times larger than that of the previous symmetric version (Senes et al., 2007) and, more importantly, more diverse (Figure 1A, Table S1a).
The cross-membrane distributions of residues are treated using reverse Boltzmann statistics, resulting in pseudo-energy profiles (Figure 1B, Figure S1a). The position-dependent potentials are modeled as gaussians or sigmoids, with added features at the edges when appropriate (Table S1b). For polar and charged side chains, the steepness of the transition into the hydrocarbon core depends on side chain length. For example, Asp and Asn have a steeper transition compared to Glu and Gln, respectively. The latter residues have an extra methylene group in their side chains, allowing them to snorkel more than the former ones. Overlaying the symmetric and asymmetric fits shows that many residues display significant asymmetries (Figure S1b). Lys and Arg show the largest asymmetry, following the positive-inside rule (Nilsson et al., 2005; von Heijne, 1984). The negatively charged residues Asp and Glu show a mild preference for the inner leaflet. This may facilitate salt bridges to Arg and Lys that follow the positive-inside rule. Pro, Tyr, and Trp exhibit mild asymmetric preferences for the outer leaflet. Gly and His show clear clusters just outside of the outer leaflet of the membrane. For His, this is mainly due to metal ligation, with the majority of counts in this cluster originating from respiratory and photosynthetic proteins. Interestingly, Cys favors the extracellular region where disulfide bonds are more easily formed compared to the reducing environment of the cytoplasm. Forty occurrences of Cys from sixteen proteins form an outer cluster facilitating disulfide bonds (28 residues) and metal ligation (12 residues), the latter mainly due to iron sulfur clusters. Notably, Cys is the main outlier of the otherwise excellent agreement between existing experimental scales and our computed transfer energy from the aqueous medium to the membrane center (Figure S1c).
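As a rough illustration of how a position-dependent profile of this kind can be evaluated, the sketch below defines a sigmoid and a Gaussian term and selects separate parameter sets for the inner (Z < 0) and outer (Z > 0) leaflets. The function names (`sigmoid`, `gaussian`, `asymmetric_ez`), the exact functional forms, and all parameter values are assumptions chosen for illustration; they are not the fitted forms or values of Table S1b.

```python
import math

def sigmoid(z, e0, z_mid, n):
    """Sigmoidal depth profile: equals e0 at the bilayer center, decays to 0 far away."""
    return e0 / (1.0 + (abs(z) / z_mid) ** n)

def gaussian(z, e0, z_center, sigma):
    """Gaussian depth profile: reaches its extremum e0 at |z| = z_center."""
    return e0 * math.exp(-((abs(z) - z_center) ** 2) / (2.0 * sigma ** 2))

def asymmetric_ez(z, inner, outer):
    """Evaluate a profile using separate (form, params) pairs for Z < 0 and Z >= 0."""
    form, params = inner if z < 0 else outer
    return form(z, *params)

# Hypothetical parameters for a Lys-like residue. The core penalty is shared,
# but the transition sits closer to the center on the inner (cytoplasmic) side,
# so the same depth costs less there (a crude positive-inside bias).
LYS_INNER = (sigmoid, (3.0, 12.0, 6.0))
LYS_OUTER = (sigmoid, (3.0, 16.0, 6.0))

if __name__ == "__main__":
    for depth in (-20.0, -14.0, 0.0, 14.0, 20.0):
        print(depth, round(asymmetric_ez(depth, LYS_INNER, LYS_OUTER), 3))
```

Sharing the center value between the two leaflets keeps the profile continuous at Z = 0, which is one simple way to join two independently parameterized halves.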
Next, we focused on side chain orientation by deriving potentials using the location of Cγ or functional atoms (Tables S1c,d). Figure 1C demonstrates that functional group location drives the cross-membrane variation for some residues: polar and charged side chains display the sharpest transitions into the hydrophobic region, with features that are more clearly defined. An important possible application is the energetic scoring of rotamers.
Positioning and Orienting Proteins in the Membrane
With the sensitivity derived from the rich database, the algorithm should capture TM and surface-active helices from sequence information alone. Ez-Profile in TM mode threads a sliding 26-residue window onto an ideal helix to assess whether the sequence is TM and, if so, in what topology. The potential finds 484 of the 539 (90%) unique TM helices in the structural database. Interestingly, 80% of the predicted TM helices are assigned the correct topology (Table S2a). This is comparable to the six leading machine learning methods (Table S2b). The comparison utilizes a standardized benchmark database (Kernytsky and Rost, 2003), circumventing the inherent bias of comparing to the database that was used to parameterize our potential.
For structures, Ez-3D uses a quick grid search (Figure 2), accurately positioning proteins to within the published error of OPM (Lomize et al., 2011), regardless of which potential (Cβ, Cγ or functional group) is applied (Table S2c and Figure S2). A similar result was obtained with a database of 185 membrane proteins positioned in the membrane using coarse-grained molecular dynamics simulations (Sansom et al., 2008), with the average difference in tilt being 9.5 ± 12°.
As a more stringent test, we applied Ez-3D to newly released structures that became available after the derivation of our potentials (Table S2b). The results are within the experimental error of OPM. For example, for the cbb3 cytochrome oxidase (PDB ID 3mk7) and the FucP fucose transporter (PDB ID 3o7q), the tilt and shift relative to OPM are 3.9° / 2.7 Å and 0.9° / 1.0 Å, respectively.
To ensure that Ez-3D compares well with experimental data, two structures crystallized in a membrane (rather than a detergent) were analyzed. Compared to the experimental structure, the tilt and shift that Ez-3D assigns to bacteriorhodopsin (PDB ID 2brd) and aquaporin (PDB ID 3m9i) are 170° / 0 Å and 169° / −1 Å, respectively. As the input file is not z-aligned, the center of mass was used as a reference state with respect to shift. The z-axis of the aquaporin file is defined in a flipped orientation for the experimental structure relative to our definition; that is, Ez assigns the correct topology. For bacteriorhodopsin, Ez-3D gives a flipped structure, albeit with an energy landscape showing only a small energetic difference between the two orientations. This is possibly due to buried charged residues such as the retinal-binding Arg.
Finally, to ameliorate the few cases in which TM helices are misidentified, we provided an option to manually identify one or more TM segments in place of the automatic assignment. For five structures annotated incorrectly, the average tilt decreased from 23.7° to 16.8° and the average shift decreased from 17.4 Å to 3.5 Å (Table S2d).
Full Genome Residue Biases
To exemplify how our general method can recapitulate inside/outside sequence biases previously found by artificial intelligence techniques specifically trained for the task (Nilsson et al., 2005) we scanned the genomes of E. coli, P. falciparum, and S. pombe (Figure 3, Table S3a). Following the positive-inside rule, statistically significant biases are observed for Lys and Arg in all species, including P. falciparum, where the bias was not previously detected (Nilsson et al., 2005). More intriguing are species-specific biases on the outer leaflet, e.g., S. pombe preferentially places the polar residues Asp and Asn in the outer leaflet, while P. falciparum places the small residues Ala and Gly there. Thus, species-specific biases may be biologically important and partly due to different structure and composition of the translocon machinery.
More challengingly, we applied the TM-protein-derived parameterization to the medically important surface-anchored proteins. Three viruses were tested, and most known surface-anchored proteins were found (Table S3b). For example, Ez-Profile successfully predicts the positively charged residues near the N-terminus of HIV Nef protein (Gerlach et al., 2010) as being involved in membrane binding. Likewise, it picks out the C-terminal region of the HIV Env protein, which is known as LLP-1 and modulates fusion kinetics (Wyss et al., 2005). In the parainfluenza virus 5 matrix protein, the surface helix adjacent to and including the FPIV viral budding (Schmitt et al., 2005) motif is predicted as surface anchored.
Topology Characterization: Pseudo-Energy Landscapes of TM, Antimicrobial, Lytic and Fusion Peptides
Ez-3D provides pseudo-energy landscapes illuminating the energetic effect of changing tilt and orientation (Figure S4). For instance, TM peptides can easily be distinguished by eye from surface-active helices and sequences that prefer to orient at an oblique angle (Figure 4). Likewise, the positive-inside rule manifests as an energy well biased toward the positive-inside conformation (Figure 4A). An amphiphilic peptide such as mastoparan exhibits a strong preference for the interfacial region of the inner leaflet (Figure 4B). In contrast, a TM peptide stabilized mainly by aromatic residues displays a more disperse energy well compared to a well-stabilized TM peptide (Figure 4C). Finally, a lytic peptide such as melittin forms pores in the cell membrane and displays a dynamic equilibrium between a surface-bound conformation and a fully inserted TM one (Figure 4D). Notably, melittin places its positively charged residues at the C-terminus, allowing for a TM configuration.
A more stringent test is the generation of mechanistic hypotheses for proteins that function dynamically. Figure 5 compares averaged pseudo-energy landscapes from antimicrobial, lytic, and fusion peptides (listed in Table S5). Antimicrobial peptides are amphiphilic sequences capable of penetrating and killing prokaryotic cells and play a role in the innate immune system. Such peptides depend on electrostatic interactions and thus are less effective against neutral eukaryotic membranes. Lytic peptides are similar but can also lyse eukaryotic cells. The latter have a weaker preference for the inner leaflet and a smaller barrier toward crossing the membrane core, possibly due to increased hydrophobicity and a reduced dependence on electrostatic interactions. Indeed, antimicrobial peptides are more charge dependent and are less accessible to non-microbial membranes. Both groups have a preference for forming inner leaflet helices parallel to the bilayer. This supports the “interfacial activity” model (Wimley, 2010), suggesting that the peptides partition to the head group region and eventually translocate to the inner leaflet, causing membrane disruption.
Similarities can also be observed among three classes of peptides which facilitate fusion. These display broad minima that encompass a range of orientations, from a modestly oblique TM segment to a surface-bound helix to being completely buried in the hydrocarbon layer. The extensive plasticity is unique to this class of proteins as compared to most multi or single-span helices and comes from the need to stabilize and adapt to multiple lipidic intermediates along the fusion pathway (Donald et al., 2011). Interestingly, viral fusion protein TMs exhibit nearly identical landscapes to SNARE TMs, while viral fusion peptides (which start the viral life-cycle buried in the protein core) tend to be less stable than the permanent TM segments.
Quantitative Recapitulation of Experimental Mutation Effects on Dual Topology Protein Topology
High resolution landscapes can demonstrate when a structure is topologically unstable, allowing investigation of the effects of single mutations. The EmrE transporter is a naturally-occurring dual-topology transporter found both with the N terminus facing the cytoplasm (Nin) and in the opposite orientation (Nout). The population ratio between topologies changes following even single point mutations (Seppala et al., 2010). Strikingly, these experimentally-determined effects are fully recapitulated by our potentials (Figure 6). The potential not only captures the trend of the relative topological change but also the precise division between the two topologies as measured by the relative Ez energy of each conformation (see Supplemental Methods).
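To make the link between a relative energy and a population ratio concrete, the following minimal sketch converts the energy difference between two orientations into a two-state Boltzmann fraction. This is an illustrative assumption, not the procedure of the Supplemental Methods: the function name `fraction_n_in`, the two-state model, and the use of RT at roughly room temperature to scale pseudo-energies are all hypothetical choices.

```python
import math

RT = 0.593  # kcal/mol at about 298 K (assumed scale for the pseudo-energies)

def fraction_n_in(e_n_in, e_n_out, rt=RT):
    """Two-state Boltzmann fraction of the N-in orientation.

    e_n_in and e_n_out are the pseudo-energies of the two orientations;
    the lower-energy orientation receives the larger population.
    """
    delta = e_n_in - e_n_out
    return 1.0 / (1.0 + math.exp(delta / rt))

if __name__ == "__main__":
    # Hypothetical energies: equal energies give an even split, and a mutation
    # that stabilizes N-in by 0.5 units shifts the balance toward N-in.
    print(fraction_n_in(-10.0, -10.0))
    print(fraction_n_in(-10.5, -10.0))
```

Under this kind of model, a point mutation changes the ratio only through the difference between the two orientation energies, which is why small per-residue asymmetries can matter for a protein that is otherwise nearly balanced.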
Decoy Scoring: Knowledge-based vs. Energy-based Approach
Next, we assessed the capability of the functional group potential to score side chain conformations, e.g. to select the native-like conformation from different rotameric states, a task pivotal to mutational analysis, structure prediction, refinement, and design. Using the high-resolution bacteriorhodopsin as a test case (Figure 7A) and other examples (Figure S7a), decoy structures were tested. These were produced by randomly rotating the side chain around the χ1 angle and tested by “near-nativeness,” defined as the percent of residues with χ1 angles within 30° of the native structure. The Ez-3D functional score is well correlated with the near-nativeness of the decoy structures. The R² of the linear trend line is inversely correlated with the resolution and positively correlated with the protein size (Figure S7). To ensure that this correlation was not due solely to violations of stereochemistry, we used MESHI (Kalisman et al., 2005) to remove steric violations using an energy function trained on water-soluble proteins (Amir et al., 2008; Summa and Levitt, 2007). The correlation between near-nativeness and Ez score was, indeed, maintained (Figure S7). Moreover, subjecting these decoys to minimization using the ROSETTA-membrane (Rohl et al., 2004) scoring function maintained this correlation (Figure S7).
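The near-nativeness measure defined above can be computed in a few lines. The sketch below is a minimal version that assumes χ1 angles are supplied in degrees as two equal-length lists; the function names and the example angles are hypothetical. The only subtlety is the periodicity of dihedral angles, so that 175° and −175° count as 10° apart.

```python
def chi1_deviation(a, b):
    """Smallest absolute difference between two dihedral angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def near_nativeness(decoy_chi1, native_chi1, cutoff=30.0):
    """Percent of residues whose chi1 lies within `cutoff` degrees of the native value."""
    pairs = list(zip(decoy_chi1, native_chi1))
    if not pairs:
        raise ValueError("no residues to compare")
    within = sum(1 for d, n in pairs if chi1_deviation(d, n) <= cutoff)
    return 100.0 * within / len(pairs)

if __name__ == "__main__":
    native = [-60.0, 180.0, 60.0, -170.0]   # made-up native chi1 angles
    decoy = [-65.0, -175.0, 180.0, 175.0]   # made-up decoy chi1 angles
    print(near_nativeness(decoy, native))   # three of four within 30 degrees
```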
Since the membrane milieu dictates conformational preferences, we hypothesized that Ez will be most powerful for the protein region that interacts with the complex headgroup region of the membrane. To test this hypothesis, we looked at polar and charged residues (Asp, Asn, Glu, Gln, Lys, and Arg) that are on the surface of the protein but within the membrane. Such residues are known to snorkel their side chains from the hydrophobic region toward the aqueous solvent. Snorkeling involves a changing microenvironment, making rotamer optimization via a local energy function challenging. The per-residue contribution of the snorkeling residues correlates with the average error of their χ1 angle (Figure 7B). Note that different residue types carry different penalties for membrane insertion; Asn and Gln have a lower penalty compared to charged residues. Further, the positive-inside rule means that insertion of Lys and Arg into the inner leaflet is energetically advantageous compared to the outer leaflet. Combined with structure resolution, large differences are apparent in per-residue contributions among the various structures. Strikingly, once these are accounted for, the slope of the correlation with χ1 error is the same for all three proteins. Consequently, it is not surprising that while ROSETTA-membrane optimizes the decoys overall (Figure S7), it is not successful in optimizing snorkeling residues (Figure 7C): in 89% of the decoys tested, the average χ1 deviation either decreases by less than 10° or actually increases. Indeed, our knowledge-based pseudo-energetic potential derives the transfer energy mainly from membrane-exposed residues (Figure 7D), resulting in an ability to distinguish between a rotamer that is selected by a regular minimization routine and a snorkeling rotamer, as exemplified in Figure 7E. Hence, the membrane-protein-specific functional group potential is superior to local energy minimization for rotamer optimization.
Discussion
The updated database presented here has enabled rigorous positioning and analysis of membrane protein helical structures using an exclusively knowledge-based potential. The potential focused on helical structures, which are the largest and most important membrane-associated protein group. Recently a similar approach was applied to beta-barrel proteins (Hsieh et al., 2011). The conceptual core method has been published by the DeGrado lab (Senes et al., 2007) and by others (Ulmschneider et al., 2005, 2006). We have extended those works to encompass the asymmetry of residue distributions, and have improved server reliability, accuracy and speed. More importantly, the new method can carry out a detailed analysis of the specific interactions between individual residues or helices and the membrane environment. These advantages are demonstrated for (a) the quantitatively precise detection of shifts in the relative topology of dual-topology proteins, (b) the inference of mechanistic hypotheses from topological energy landscapes of short peptides, and (c) the detection of native-like rotamers within a decoy set. Taken together, the accuracy of our method along with the detailed topological energy landscape and the speed of the output can enable thorough analysis, prediction and design of membrane proteins.
Other methods for positioning proteins within a membrane include atomic-level, long-range molecular dynamics. Due to computational constraints, coarse-grained molecular dynamics have been used with overall satisfactory results (Sansom et al., 2008). However, molecular dynamics cannot be used to regularly produce the full-range energy landscapes required for thorough analysis or to systematically scan numerous mutations as required, e.g., for protein design. Notably, the pseudo-energy landscape represents the topological aspect of backbone or side-chain energetics for a given rigid structure, rather than a full search over all side-chain conformations. As shown for several decoy sets, focusing on the difficult task of scoring alternative models and understanding the energetics of protein-membrane interaction provides a structure prediction and design capability when combined with an existing model-generating tool.
The increased resolution of our results comes from the data’s coverage and representativeness (Samish, 2009), drawing on the exponential growth of the high resolution structural data (White, 2009) and the shift to more difficult-to-crystallize, wild-type, and mesophilic structures. While some applications require non-redundant datasets, others benefit from including all possible data. For membrane proteins, few and often extremophile structures, e.g. photosynthetic bacterial reaction centers, have been structurally elucidated resulting in highly redundant structural data. Including such data will bias the overall potential to the unique properties of specific families of structures. Such biases are beyond the recently characterized organelle specific biases (Sharpe et al., 2010). For this study, we required that subunits with two or more TM helices share less than 30% sequence identity with other such subunits in the database and gave preference to high-resolution, wild-type, ground-state mesophile structures.
The biogenesis of TM proteins requires insertion into the membrane via the translocon, which is thought to operate as a hydrophobicity sensor (Hessa et al., 2007). The determination of whether a sequence will become TM is captured by the Ez potential in great detail. For example, it enables the recapitulation of genomic sequence biases previously found by far more advanced artificial intelligence methods. As a follow-up study, it would be interesting to assess sequence-structure relationships by examining sequence biases not found by the Ez potential. Structurally, pseudo-energetic landscapes are rich in information that provides insight on conformational stability and can be used to generate mechanistic hypotheses or, alternatively, stabilize membrane proteins.
Current state of the art modeling relies on local energetics parameterized from soluble proteins. In contrast, this study emphasizes the unique cross-membrane distribution of each residue’s functional group specifically capturing the difference between soluble and membrane proteins. Consequently, the new method can improve existing tools for selecting preferred residues based on location and refining side chain orientations and rotamers.
The potential provides extraordinary power, especially considering its simple, biophysically based structure. Despite using a fixed window and no machine learning or hierarchical characterization, Ez-Profile detects membrane-associated segments from primary sequence with results comparable to the most sophisticated methods and, further, can correctly determine their topologies. Indeed, since our scale agrees with experimental scales of position-specific amino acid contributions to the free energy of membrane proteins (Figure S1c), such scales can also predict membrane protein topology on par with the most sophisticated prediction methods (Bernsel et al., 2008). Thus, Ez helps bridge the gap through its ability to quickly scan sequences of TM helices and examine rotamer tendencies. No less important is the computational speedup compared to the previous, symmetric version. For instance, a medium-sized structure (1pw4, 434 residues) is oriented in 30 seconds (2.8 GHz Intel Core i7, 8 GB RAM). The speedup increases rapidly with protein size, e.g. a 3,550-residue protein (PDB ID 1v54) runs in 5 minutes instead of days, making it now possible to run the program on an online server.
Our results show that the solvent exposed residues have the greatest dependency on membrane depth. The average transfer energy of buried residues is flatter than previously thought (Ulmschneider et al., 2005), since the packing and interior electrostatics of membrane and soluble protein interiors are similar (Joh et al., 2009). Consequently, TM protein interiors are well optimized via routine minimization. In contrast, the average transfer energy of exposed residues is highly dependent on the changing charge densities of the microenvironment. As such, utilization of this characterization, as done in the knowledge-based potential presented here, paves the way for a new membrane-protein-specific rotamer minimization routine.
Future directions will be possible following additional increase in the size and resolution of the database. These include the derivation of different scales for different subgroups of interest: different protein types with unique characteristics (e.g. transporters or photosynthetic systems), and different membrane types (e.g. specific organelles or different membrane thicknesses), as well as different organism types, such as eukaryotic vs. prokaryotic. Future applications include a membrane protein specific rotamer library, which will account for the relative distribution of lipid-facing rotamers in a cross-membrane specific manner, thus accounting for phenomena such as snorkeling. Finally, integrating this work into a protein design algorithm that produces alternative models will enable the implementation of a quick protein design focused on making proteins or peptides with specific topological stabilities or tendencies.
Hence, the newly derived biophysical parameterizations and the resulting toolbox offer benefits for the analysis, prediction and design of membrane-associated proteins. The community wide assessment of membrane protein structural prediction has shown that the field lags far behind the success for soluble structures and is highly dependent on expert input (Michino et al., 2009). Despite few computational design success stories (Yin et al., 2007), membrane protein computational design has yet to become widespread. Finally, beyond shedding light on membrane protein structure-function relationships, our approach may be used routinely for applications ranging from de novo design to assessing the energetics of side chain locations. It will also prove useful for mutagenesis efforts, as in stabilizing membrane proteins for biochemistry, industry, or crystallization. Current membrane protein design efforts in the lab are already employing these capabilities.
Experimental Procedures
Database
The dataset includes all TM alpha-helical structures available as of June 2010 with resolution better than 3.5 Å and a maximum sequence identity of 30% for subunits with two or more TM segments. Preference was given to structural quality (resolution and Rfree), wild-type, mesophile, ground-state and non-engineered structures (Figure 1A and Table S1a). Proteins in the OPM (Lomize et al., 2011) database have an average TM helix tilt of θ=0° and thus were used ‘as is’ in the Ez parameterization. Proteins not found in this database were aligned to a related structure. Ez-3D tilts are defined as the difference relative to OPM positioning. Separate databases were computed for buried and exposed residues by rolling a 1.9 Å-radius lipid methylene probe on the TM protein complex and defining residues in contact with the probe as ‘exposed’ (see Supplementary Methods).
Data Fitting
One subunit was retained from homooligomeric structures except 2j7a, for which two chains in different configurations were retained. Residues were divided into 2 Å bins along the membrane normal, with Z=0 at the bilayer center and positive Z toward the exoplasm. Residue position was defined by using Cβ (Cα for glycine) and as in Table S1e for the advanced potentials. The propensity (Pz) and effective energy (Ez) were calculated as in the symmetric version (Senes et al., 2007) (Supplementary Methods and Table S1f) but separately for positive and negative Z. ΔEz was fit to a sigmoid, a Gaussian, or a combination of the two, depending on what best described the data (Table S1b and Supplementary Methods).
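The binning and inverse-Boltzmann step can be sketched as follows. This is a generic log-odds calculation under stated assumptions, not the exact propensity definition of Senes et al. (2007) or Table S1f: the reference state (the overall frequency of the residue), the pseudocount, the value of RT, and the function names are all hypothetical. Signed bin indices keep positive and negative Z separate, as the asymmetric treatment requires.

```python
import math
from collections import Counter

RT = 0.593        # kcal/mol at about 298 K (assumed)
BIN_WIDTH = 2.0   # angstrom, as stated in the text

def bin_index(z):
    """Map a depth z to a signed 2 A bin; negative Z gets negative indices."""
    return math.floor(z / BIN_WIDTH)

def pseudo_energies(observations, residue, pseudocount=1.0):
    """Return {bin: energy} for one residue type.

    observations is a list of (residue_name, z) pairs. The energy is
    -RT * ln(frequency in bin / overall frequency), so depths where the
    residue is enriched come out negative.
    """
    total_per_bin = Counter(bin_index(z) for _, z in observations)
    res_per_bin = Counter(bin_index(z) for r, z in observations if r == residue)
    n_res = sum(res_per_bin.values())
    if n_res == 0:
        raise ValueError("residue type not present in the observations")
    background = n_res / len(observations)
    energies = {}
    for b, n_total in total_per_bin.items():
        # The pseudocount (an assumption) avoids log(0) in sparse bins.
        freq = (res_per_bin[b] + pseudocount * background) / (n_total + pseudocount)
        energies[b] = -RT * math.log(freq / background)
    return energies

if __name__ == "__main__":
    # Made-up counts: Leu enriched near the center, Lys enriched outside.
    data = ([("LEU", 1.0)] * 8 + [("LYS", 1.0)] * 2
            + [("LEU", 21.0)] * 2 + [("LYS", 21.0)] * 8)
    print(pseudo_energies(data, "LEU"))
```

The resulting per-bin values are what a smooth sigmoid or Gaussian would subsequently be fit to.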
Ez-Profile
Ez-Profile follows an approach previously applied to a test case (Bissonnette et al., 2009). In TM helix mode, each 26-residue sliding window was threaded on an ideal helix, for which Cβ coordinates (pseudo-Cβ for glycine) had been precomputed. Three flanking residues on each side were included to help determine topology. Pseudo-energy was calculated with the center of mass positioned at the bilayer center. Topology was determined by a majority vote of all windows below the threshold (-4.9) in that local minimum. Consecutive TM segments were forced to alternate topology. Flipped helices were chosen to minimize the number of flips and avoid flipping segments for which the initial vote overwhelmingly favored a particular orientation (Supplementary Methods).
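The window scan and majority vote described above can be sketched as follows. This is an illustrative outline only: `toy_score` is a hypothetical stand-in for threading a window onto an ideal helix and summing the depth-dependent potential, and the grouping of consecutive below-threshold windows is an assumed reading of "local minimum". Only the window length (26) and the threshold (−4.9) come from the text.

```python
from collections import Counter

WINDOW = 26       # residues per sliding window, from the text
THRESHOLD = -4.9  # pseudo-energy cutoff, from the text

def toy_score(sub, orientation):
    """Hypothetical scorer: rewards leucines, and rewards lysines on the
    cytoplasmic end of the window (a crude positive-inside bias)."""
    hydrophobic = sub.count("L")
    lys = sub[:5].count("K") if orientation == "Nin" else sub[-5:].count("K")
    return -0.3 * hydrophobic - 0.2 * lys

def scan(sequence, score_fn, window=WINDOW):
    """Score each window in both orientations and keep the better one."""
    results = []
    for start in range(len(sequence) - window + 1):
        sub = sequence[start:start + window]
        e_in, e_out = score_fn(sub, "Nin"), score_fn(sub, "Nout")
        orient = "Nin" if e_in <= e_out else "Nout"
        results.append((start, min(e_in, e_out), orient))
    return results

def call_segments(results, threshold=THRESHOLD):
    """Group consecutive below-threshold windows and assign each group the
    orientation chosen by a majority of its windows.
    Returns (first_window_start, last_window_start, orientation) tuples."""
    segments, run = [], []
    sentinel = (None, float("inf"), None)  # flushes a run that reaches the end
    for start, energy, orient in results + [sentinel]:
        if energy < threshold:
            run.append((start, orient))
        elif run:
            votes = Counter(o for _, o in run)
            segments.append((run[0][0], run[-1][0], votes.most_common(1)[0][0]))
            run = []
    return segments

# Toy sequence: lysines, then a 26-residue leucine stretch, then aspartates.
segments = call_segments(scan("KKKKK" + "L" * 26 + "DDDDD", toy_score))
```

The alternation of topology between consecutive TM segments and the flip-minimization step are not included in this sketch.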
In surface helix mode, Ez-Profile examined regions less than 17 residues from chain termini and TM segments. A 7-residue window was threaded onto an ideal helix placed parallel to the plane of the membrane at a distance of ±15 Å. The helix was rotated about its axis in 10° steps, and the score for the same window, rotated by 180° around the helical axis, was subtracted so that
$$S_{final}(i) = S(i, z, \theta) - S(i, z, \theta + 180^{\circ})$$
where i is a sequence window, and z and θ are the depth and rotation of the helix, respectively. Thus, non-amphiphilic sequences tend to have Sfinal ≈ 0. Windows for which the minimum of Sfinal(i) ≥ −1.25 were discarded. For greater specificity, a second, slower step used Ez-3D to orient each remaining window; those with a preferred depth of −22.0 ≤ Z ≤ −11.5 and Ez < 2.0 were empirically determined to be those most likely to be surface helices.
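The rotation-difference score for surface helices can be illustrated with a short sketch. Here the score of a 7-residue window at rotation θ has the score of the same window rotated by 180° subtracted from it, and the minimum over 10° steps is compared with the −1.25 cutoff. The potential `toy_energy`, the Cβ radius of 3.3 Å, and the use of a single leaflet are assumptions for illustration and are not taken from the paper.

```python
import math

STEP_DEG = 10     # rotation step, from the text
DEPTH = 15.0      # Å from the bilayer centre, from the text
CUTOFF = -1.25    # windows whose minimum is >= CUTOFF are discarded (text)
RADIUS = 3.3      # Å, assumed distance of Cβ from the helix axis
TURN_DEG = 100.0  # rotation per residue in an ideal alpha-helix

HYDROPHOBIC = set("AILMFVW")

def toy_energy(aa, z):
    """Hypothetical depth potential: hydrophobic residues are favoured
    closer to the bilayer centre, all other residues further away."""
    sign = 1.0 if aa in HYDROPHOBIC else -1.0
    return sign * 0.3 * (abs(z) - DEPTH)

def score(window, theta_deg, depth=DEPTH, energy=toy_energy):
    """Sum of residue energies for a helix lying parallel to the membrane,
    rotated by theta_deg about its own axis."""
    total = 0.0
    for k, aa in enumerate(window):
        angle = math.radians(theta_deg + TURN_DEG * k)
        total += energy(aa, depth + RADIUS * math.cos(angle))
    return total

def s_final_min(window, depth=DEPTH, energy=toy_energy):
    """Minimum over rotations of score(theta) - score(theta + 180)."""
    return min(score(window, t, depth, energy) - score(window, t + 180, depth, energy)
               for t in range(0, 360, STEP_DEG))

def is_candidate(window):
    """True if the window passes the first, fast amphiphilicity filter."""
    return s_final_min(window) < CUTOFF
```

With this toy potential, a window whose hydrophobic residues cluster on one face (for example `"LKKLLKK"`) gives a strongly negative minimum, whereas a uniform window such as `"LLLLLLL"` stays close to zero and is discarded.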
Amino Acid Biases
Each chain in the proteome was scanned using Ez-Profile. Those for which TM segments were found and for which the topology could be assigned with a confidence score >0.95 (see Supplementary Methods) were analyzed for inside-outside amino acid biases, following von Heijne (Nilsson et al., 2005) (Supplementary Methods).
Ez-3D
Ez-Profile identified the TM segments, and HELANAL (Bansal et al., 2000) was used to find the helical axis of each one. These were averaged in order to estimate the protein axis, which was initially aligned to the Z-axis. The center of mass of the detected TM segments was positioned at Z=0 (Figure 2A). Translation along the Z-axis was carried out in 0.5Å increments and rotation was carried out in 5° and 2.5° steps around the Z and Y axes, respectively (Figure 2B). Large step sizes, restricted search space, and a precalculated lookup table of Ez values were used to reduce runtime without sacrificing accuracy (Figure S1e). Notably, while membrane proteins are found in membranes of diverse content and thickness, within the framework of our knowledge-based potential, scaling the potential to different membrane thicknesses did not affect our result significantly. Pseudo-energy landscapes were drawn with Gnuplot 4.2.5 (http://gnuplot.sourceforge.net/).
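The rigid-body search described above amounts to an exhaustive grid scan over translation and rotation. The sketch below shows one way such a scan might look, with a hypothetical Gaussian-well potential (`toy_energy`) standing in for the precalculated Ez lookup table and an assumed order of operations (spin about Z, tilt about Y, then shift along Z). The step sizes follow the text; everything else is illustrative.

```python
import math
from itertools import product

def frange(start, stop, step):
    """Inclusive list of evenly spaced floats."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]

def place(xyz, spin_deg, tilt_deg, shift):
    """Rotate about Z (spin), then about Y (tilt), then translate along Z."""
    x, y, z = xyz
    a, b = math.radians(spin_deg), math.radians(tilt_deg)
    x, y = x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a)
    x, z = x * math.cos(b) + z * math.sin(b), -x * math.sin(b) + z * math.cos(b)
    return x, y, z + shift

def toy_energy(aa, z):
    """Hypothetical potential: hydrophobic residues favour the bilayer centre."""
    well = math.exp(-(z / 10.0) ** 2)
    return -well if aa in "AILMFVW" else well

def grid_search(residues, energy, shifts, tilts, spins):
    """Score every (shift, tilt, spin) pose and return the lowest-energy
    one as (energy, shift, tilt, spin)."""
    best = None
    for shift, tilt, spin in product(shifts, tilts, spins):
        e = sum(energy(aa, place(xyz, spin, tilt, shift)[2]) for aa, xyz in residues)
        if best is None or e < best[0]:
            best = (e, shift, tilt, spin)
    return best

# Toy input: 21 leucines stacked along Z, displaced 4 Å from the bilayer centre.
helix = [("L", (0.0, 0.0, 4.0 - 15.0 + 1.5 * k)) for k in range(21)]

# Translation-only scan in 0.5 Å steps recovers the 4 Å displacement.
best_shift = grid_search(helix, toy_energy, frange(-10, 10, 0.5), [0.0], [0.0])

# A coarse scan over shift, tilt (2.5° steps) and spin for illustration.
best_pose = grid_search(helix, toy_energy, frange(-6, 6, 0.5),
                        frange(0, 10, 2.5), frange(0, 90, 45))
```

Replacing the per-residue call to `energy` with a table lookup indexed by residue type and depth bin is the kind of precalculation that the text mentions as a way to reduce runtime.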
Overall, solvent-exposed residues contribute the most to the pseudo-energy, and these contributions are centered on the membrane region (Figure 7D). However, applying our potential to solvent-exposed residues alone did not significantly improve the results, giving a 5.5 ± 15 Å change in shift and a 4.7 ± 8.7 Å change in tilt, both averaged over all proteins relative to the OPM database (Figure S7b). The increased spread of results possibly stems from considering far fewer residues and from considering residues beyond the depth that affects the potential the most. Thus, the correlation between near-nativeness and Ez score is most pronounced for the membrane-exposed residues, which contribute the most to the correlation coefficient.
Decoys
Decoys were generated using a random rotation of the χ1 dihedral angle from the native structure, allowing for a high variation of rotameric states among the decoy set. Notably, these decoys included large clashes and thus were not usable in standard minimization schemes. To compensate, the decoys were minimized using a designated protocol (Supplementary Methods) within MESHI (Kalisman et al., 2005). The “near-nativeness” of each decoy is defined as the number of residues for which the side chain χ1 angle is within 30° of the native structure, and was not significantly affected by the application of MESHI, ROSETTA (Figure 7A) or ROSETTA-membrane (Figures 7 and S7).
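The near-nativeness measure defined above is a simple count, sketched below. The main subtlety is that dihedral angles are periodic, so 180° and −175° differ by 5° rather than 355°. The dictionary layout (residue identifier mapped to χ1 in degrees) and the inclusive treatment of the 30° boundary are assumptions for illustration.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def near_nativeness(native_chi1, decoy_chi1, tolerance=30.0):
    """Number of residues whose decoy chi1 lies within `tolerance` degrees
    of the native chi1. Both arguments map residue id -> chi1 in degrees;
    residues missing from the decoy are not counted."""
    return sum(
        1
        for res, chi in native_chi1.items()
        if res in decoy_chi1 and angle_diff(chi, decoy_chi1[res]) <= tolerance
    )

# Toy values (not from the paper): residues 1 and 2 match, residue 3 does not.
native = {1: -60.0, 2: 180.0, 3: 60.0}
decoy = {1: -65.0, 2: -175.0, 3: 180.0}
count = near_nativeness(native, decoy)
```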
To look specifically at snorkeling, we isolated highly polar or charged residues (Asp, Asn, Glu, Gln, Lys, and Arg) that were designated by VOLBL (see below) as lying on the protein surface. We excluded residues for which the Cβ atom had ∣Z∣ > 15 Å, since those are already in the aqueous solvent and do not snorkel. We also excluded residues that are involved in ion coordination, since their rotamers are determined by the geometry of the coordination. This left 5 residues in 3mk7, 14 residues in 1su4, and 32 in 2b6o that are considered to be snorkeling. The average deviation of χ1 from the native structure for these residues was calculated before and after minimization with ROSETTA-membrane.
EmrE models
The Cα EmrE structure (PDB ID 3b5d) was used as a template for the dual-topology mutation analysis. Geometric constraints were used for backbone reconstruction (Gront et al., 2007). Missing residues at each terminus and all side chains were added and minimized using Rosetta (Rohl et al., 2004), with mutations as in Seppala et al. (2010). To determine the relative ratios of Nin and Nout, we assumed a Boltzmann distribution for the probability of finding the model in a given topology. The two configurations were delimited by −5 Å ≤ Z ≤ 5 Å with θ ≤ 10° and θ ≥ 170° for Nin and Nout, respectively. Within each of these regions, the Boltzmann probability (back-calculated from the reported Ez energy) was computed for each grid point. We then took the Boltzmann probability ratio P(Nin)/P(Nout) (Supplementary Methods) to eliminate the constant and further normalized so that the ratio for E14D was equal to that reported by Seppala et al. (2010).
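The ratio calculation can be sketched as below. Boltzmann weights are summed over the grid points falling inside each topology region, and the ratio of the two sums removes the unknown normalization constant (the partition function). The grid layout (a dictionary keyed by depth and tilt), the value of kT, and the assumption that the energies are in kcal/mol are illustrative choices; the region limits follow the text.

```python
import math

KT = 0.593  # kcal/mol near 298 K (assumed units and temperature)

def region_weight(grid, in_region, kT=KT):
    """Sum of Boltzmann factors exp(-E/kT) over grid points in a region.
    grid maps (z, theta_deg) -> pseudo-energy."""
    return sum(math.exp(-e / kT) for (z, theta), e in grid.items()
               if in_region(z, theta))

def is_n_in(z, theta):
    return -5.0 <= z <= 5.0 and theta <= 10.0

def is_n_out(z, theta):
    return -5.0 <= z <= 5.0 and theta >= 170.0

def topology_ratio(grid, kT=KT):
    """P(Nin)/P(Nout); the partition function cancels in the ratio."""
    return region_weight(grid, is_n_in, kT) / region_weight(grid, is_n_out, kT)

def normalize(ratio, reference_computed, reference_reported):
    """Rescale a computed ratio so that the reference mutant (E14D in the
    text) reproduces its experimentally reported ratio."""
    return ratio * reference_reported / reference_computed

# Toy grid (not from the paper): one point per region plus one outside both.
toy_grid = {(0.0, 0.0): -1.0, (0.0, 180.0): 0.0, (20.0, 0.0): -50.0}
ratio = topology_ratio(toy_grid)
```

Because a single scale factor is fixed by the reference mutant, only the relative changes between mutants carry information in this scheme.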
Supplementary Material
Highlights.
A statistical potential accurately positions proteins and peptides in the membrane.
Topological energy landscapes present structure-function-stability relationships.
Side-chain functional-atom parameters allow modeling of rotameric preferences.
Experimental effects of point mutations on topology are quantitatively reproduced.
Acknowledgements
We thank Cinque S. Soto, Nathan H. Joh, Alessandro Senes and Andrei L. Lomize for fruitful discussions. We thank Christopher M. MacDermaid for assistance regarding Figure 1. We acknowledge financial support from NIH (GM54616 to WFD; HL085303 to JGS), DOD (NDSEG fellowship to BTH), the Human Frontiers Science Program (IS), and NSF (DMR 0520020 to JGS and MRSEC and URSCC to WFD).
WFD, JGS, CAS and IS designed research; IS and CAS performed research; BTH and IS fit the data; JED performed the original Ez-Profile study; CK contributed MESHI and adapted it with IS for this study; all authors analyzed the data; CAS and IS wrote the paper with contributions from the other authors.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Amir ED, Kalisman N, Keasar C. Differentiable, multi-dimensional, knowledge-based energy terms for torsion angle probabilities and propensities. Proteins. 2008;72:62–73. doi: 10.1002/prot.21896.
- Arinaminpathy Y, Khurana E, Engelman DM, Gerstein MB. Computational analysis of membrane proteins: the largest class of drug targets. Drug Discov Today. 2009;14:1130–1135. doi: 10.1016/j.drudis.2009.08.006.
- Bansal M, Kumar S, Velavan R. HELANAL: a program to characterize helix geometry in proteins. J Biomol Struct Dyn. 2000;17:811–819. doi: 10.1080/07391102.2000.10506570.
- Barth P, Wallner B, Baker D. Prediction of membrane protein structures with complex topologies using limited constraints. Proc Natl Acad Sci U S A. 2009;106:1409–1414. doi: 10.1073/pnas.0808323106.
- Bernsel A, Viklund H, Falk J, Lindahl E, von Heijne G, Elofsson A. Prediction of membrane-protein topology from first principles. Proc Natl Acad Sci U S A. 2008;105:7177–7181. doi: 10.1073/pnas.0711151105.
- Bill RM, Henderson PJ, Iwata S, Kunji ER, Michel H, Neutze R, Newstead S, Poolman B, Tate CG, Vogel H. Overcoming barriers to membrane protein structure determination. Nat Biotechnol. 2011;29:335–340. doi: 10.1038/nbt.1833.
- Bissonnette ML, Donald JE, DeGrado WF, Jardetzky TS, Lamb RA. Functional analysis of the transmembrane domain in paramyxovirus F protein-mediated membrane fusion. J Mol Biol. 2009;386:14–36. doi: 10.1016/j.jmb.2008.12.029.
- Bowie JU. Solving the membrane protein folding problem. Nature. 2005;438:581–589. doi: 10.1038/nature04395.
- Chamberlain AK, Lee Y, Kim S, Bowie JU. Snorkeling preferences foster an amino acid composition bias in transmembrane helices. J Mol Biol. 2004;339:471–479. doi: 10.1016/j.jmb.2004.03.072.
- Donald JE, Zhang Y, Fiorin G, Carnevale V, Slochower DR, Gai F, Klein ML, Degrado WF. Transmembrane orientation and possible role of the fusogenic peptide from parainfluenza virus 5 (PIV5) in promoting fusion. Proc Natl Acad Sci U S A. 2011 doi: 10.1073/pnas.1019668108.
- Dowhan W, Bogdanov M. Lipid-dependent membrane protein topogenesis. Annu Rev Biochem. 2009;78:515–540. doi: 10.1146/annurev.biochem.77.060806.091251.
- Elofsson A, von Heijne G. Membrane protein structure: prediction versus reality. Annu Rev Biochem. 2007;76:125–140. doi: 10.1146/annurev.biochem.76.052705.163539.
- Fagerberg L, Jonasson K, von Heijne G, Uhlen M, Berglund L. Prediction of the human membrane proteome. Proteomics. 2010;10:1141–1149. doi: 10.1002/pmic.200900258.
- Fleishman SJ, Ben-Tal N. Progress in structure prediction of alpha-helical membrane proteins. Curr Opin Struct Biol. 2006;16:496–504. doi: 10.1016/j.sbi.2006.06.003.
- Frishman D. Structural bioinformatics of membrane proteins. 1st edn Springer; New York: 2010.
- Gerlach H, Laumann V, Martens S, Becker CF, Goody RS, Geyer M. HIV-1 Nef membrane association depends on charge, curvature, composition and sequence. Nat Chem Biol. 2010;6:46–53. doi: 10.1038/nchembio.268.
- Ghirlanda G. Design of membrane proteins: toward functional systems. Curr Opin Chem Biol. 2009;13:643–651. doi: 10.1016/j.cbpa.2009.09.017.
- Gront D, Kmiecik S, Kolinski A. Backbone building from quadrilaterals: a fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates. J Comput Chem. 2007;28:1593–1597. doi: 10.1002/jcc.20624.
- Harrison SC. Viral membrane fusion. Nat Struct Mol Biol. 2008;15:690–698. doi: 10.1038/nsmb.1456.
- Hessa T, Meindl-Beinker NM, Bernsel A, Kim H, Sato Y, Lerch-Bader M, Nilsson I, White SH, von Heijne G. Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature. 2007;450:1026–1030. doi: 10.1038/nature06387.
- Hsieh D, Davis A, Nanda V. A knowledge-based potential highlights unique features of membrane alpha-helical and beta-barrel protein insertion and folding. Protein Sci. 2011 doi: 10.1002/pro.758.
- Hurwitz N, Pellegrini-Calace M, Jones DT. Towards genome-scale structure prediction for transmembrane proteins. Philos Trans R Soc Lond B Biol Sci. 2006;361:465–475. doi: 10.1098/rstb.2005.1804.
- Jin W, Takada S. Asymmetry in membrane protein sequence and structure: glycine outside rule. J Mol Biol. 2008;377:74–82. doi: 10.1016/j.jmb.2008.01.013.
- Joh NH, Oberai A, Yang D, Whitelegge JP, Bowie JU. Similar energetic contributions of packing in the core of membrane and water-soluble proteins. J Am Chem Soc. 2009;131:10846–10847. doi: 10.1021/ja904711k.
- Kalisman N, Levi A, Maximova T, Reshef D, Zafriri-Lynn S, Gleyzer Y, Keasar C. MESHI: a new library of Java classes for molecular modeling. Bioinformatics. 2005;21:3931–3932. doi: 10.1093/bioinformatics/bti630.
- Kernytsky A, Rost B. Static benchmarking of membrane helix predictions. Nucleic Acids Res. 2003;31:3642–3644. doi: 10.1093/nar/gkg532.
- Langosch D, Arkin IT. Interaction and conformational dynamics of membrane-spanning protein helices. Protein Sci. 2009;18:1343–1358. doi: 10.1002/pro.154.
- Langosch D, Hofmann M, Ungermann C. The role of transmembrane domains in membrane fusion. Cell Mol Life Sci. 2007;64:850–864. doi: 10.1007/s00018-007-6439-x.
- Lomize MA, Pogozheva ID, Joo H, Mosberg HI, Lomize AL. OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res. 2011 doi: 10.1093/nar/gkr703.
- Michino M, Abola E, Brooks CL, 3rd, Dixon JS, Moult J, Stevens RC. Community-wide assessment of GPCR structure modelling and ligand docking: GPCR Dock 2008. Nat Rev Drug Discov. 2009;8:455–463. doi: 10.1038/nrd2877.
- Moore DT, Berger BW, DeGrado WF. Protein-protein interactions in the membrane: Sequence, structural, and biological motifs. Structure. 2008;16:991–1001. doi: 10.1016/j.str.2008.05.007.
- Nakashima H, Nishikawa K. The amino acid composition is different between the cytoplasmic and extracellular sides in membrane proteins. FEBS Lett. 1992;303:141–146. doi: 10.1016/0014-5793(92)80506-c.
- Nilsson J, Persson B, von Heijne G. Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins. 2005;60:606–616. doi: 10.1002/prot.20583.
- Orgel JP. Surface-active helices in transmembrane proteins. Curr Protein Pept Sci. 2006;7:553–560. doi: 10.2174/138920306779025666.
- Pellegrini-Calace M, Thornton JM. Methods to classify and predict the structure of membrane proteins. In: Gu J, Bourne PE, editors. Structural Bioinformatics. Wiley; 2009. pp. 883–908.
- Phoenix DA, Harris F, Daman OA, Wallace J. The prediction of amphiphilic alpha-helices. Curr Protein Pept Sci. 2002;3:201–221. doi: 10.2174/1389203024605368.
- Rohl CA, Strauss CE, Misura KM, Baker D. Protein structure prediction using Rosetta. Methods Enzymol. 2004;383:66–93. doi: 10.1016/S0076-6879(04)83004-0.
- Samish I. Search and sampling in structural bioinformatics. In: Gu J, Bourne P, editors. Structural Bioinformatics. 2nd edition Wiley; 2009. pp. 207–235.
- Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem. 2010 doi: 10.1146/annurev-physchem-032210-103509.
- Sansom MS, Scott KA, Bond PJ. Coarse-grained simulation: a high-throughput computational approach to membrane proteins. Biochem Soc Trans. 2008;36:27–32. doi: 10.1042/BST0360027.
- Saven JG. Connecting statistical and optimized potentials in protein folding via a generalized foldability criterion. J Chem Phys. 2003;118:6133–6136.
- Schmitt AP, Leser GP, Morita E, Sundquist WI, Lamb RA. Evidence for a new viral late-domain core sequence, FPIV, necessary for budding of a paramyxovirus. J Virol. 2005;79:2988–2997. doi: 10.1128/JVI.79.5.2988-2997.2005.
- Senes A. Computational design of membrane proteins. Curr Opin Struct Biol. 2011;21:460–466. doi: 10.1016/j.sbi.2011.06.004.
- Senes A, Chadi DC, Law PB, Walters RF, Nanda V, Degrado WF. E(z), a depth-dependent potential for assessing the energies of insertion of amino acid side-chains into membranes: derivation and applications to determining the orientation of transmembrane and interfacial helices. J Mol Biol. 2007;366:436–448. doi: 10.1016/j.jmb.2006.09.020.
- Seppala S, Slusky JS, Lloris-Garcera P, Rapp M, von Heijne G. Control of membrane protein topology by a single C-terminal residue. Science. 2010;328:1698–1700. doi: 10.1126/science.1188950.
- Sharpe HJ, Stevens TJ, Munro S. A comprehensive comparison of transmembrane domains reveals organelle-specific properties. Cell. 2010;142:158–169. doi: 10.1016/j.cell.2010.05.037.
- Summa CM, Levitt M. Near-native structure refinement using in vacuo energy minimization. Proc Natl Acad Sci U S A. 2007;104:3177–3182. doi: 10.1073/pnas.0611593104.
- Ulmschneider MB, Sansom MS, Di Nola A. Properties of integral membrane protein structures: derivation of an implicit membrane potential. Proteins. 2005;59:252–265. doi: 10.1002/prot.20334.
- Ulmschneider MB, Sansom MS, Di Nola A. Evaluating tilt angles of membrane-associated helices: comparison of computational and NMR techniques. Biophys J. 2006;90:1650–1660. doi: 10.1529/biophysj.105.065367.
- von Heijne G. Analysis of the distribution of charged residues in the N-terminal region of signal sequences: implications for protein export in prokaryotic and eukaryotic cells. EMBO J. 1984;3:2315–2318. doi: 10.1002/j.1460-2075.1984.tb02132.x.
- Walters RF, DeGrado WF. Helix-packing motifs in membrane proteins. Proc Natl Acad Sci U S A. 2006;103:13658–13663. doi: 10.1073/pnas.0605878103.
- White SH. Biophysical dissection of membrane proteins. Nature. 2009;459:344–346. doi: 10.1038/nature08142.
- White SH, von Heijne G. How translocons select transmembrane helices. Annu Rev Biophys. 2008;37:23–42. doi: 10.1146/annurev.biophys.37.032807.125904.
- Wimley WC. Describing the mechanism of antimicrobial Peptide action with the interfacial activity model. ACS Chem Biol. 2010;5:905–917. doi: 10.1021/cb1001558.
- Wyss S, Dimitrov AS, Baribaud F, Edwards TG, Blumenthal R, Hoxie JA. Regulation of human immunodeficiency virus type 1 envelope glycoprotein fusion by a membrane-interactive domain in the gp41 cytoplasmic tail. J Virol. 2005;79:12231–12241. doi: 10.1128/JVI.79.19.12231-12241.2005.
- Yau WM, Wimley WC, Gawrisch K, White SH. The preference of tryptophan for membrane interfaces. Biochemistry. 1998;37:14713–14718. doi: 10.1021/bi980809c.
- Yin H, Slusky JS, Berger BW, Walters RS, Vilaire G, Litvinov RI, Lear JD, Caputo GA, Bennett JS, DeGrado WF. Computational design of peptides that target transmembrane helices. Science. 2007;315:1817–1822. doi: 10.1126/science.1136782.
- Zasloff M. Antimicrobial peptides of multicellular organisms. Nature. 2002;415:389–395. doi: 10.1038/415389a.