Proceedings of the National Academy of Sciences of the United States of America. 2025 Oct 7;122(41):e2503134122. doi: 10.1073/pnas.2503134122

Design principles of the common Gly-X6-Gly membrane protein building block

Kiana Golden a, Catalina Avarvarei a, Charlie T Anderson a, Matthew Holcomb a, Weiyi Tang a, Xiaoping Dai a, Minghao Zhang a, Colleen A Mailie a, Brittany B Sanchez b, Jason S Chen b, Stefano Forli a, Marco Mravic a,1
PMCID: PMC12541321  PMID: 41055983

Significance

Current biophysical models poorly describe the behaviors and energetics of proteins within lipid bilayers, limiting our ability to understand disease-related misfolding or to modulate mechanisms for therapeutic intervention. The packing of membrane protein α-helices determines the stabilities and architectures that brace function and evolution. We demonstrated a protein design approach for decoding sequence–structure relationships underlying transmembrane α-helix packing. By characterizing optimized synthetic versions of a common natural structural motif, we uncover molecular features clarifying how Small-X6-Small consensus sequences encode antiparallel helix folds and stability. These findings offer chemical rules and computational strategies that enhance our ability to interpret and manipulate membrane protein folding and function, with implications for therapeutic targeting, protein evolutionary trajectories, and biotechnological applications such as engineering nanopores and lipid-embedded minibinders.

Keywords: protein design, membrane protein, bioinformatics, lipid bilayers, protein folding

Abstract

Protein behavior in lipids is poorly understood and inadequately represented in current computational models. Design and prediction of bilayer-embedded molecular structures may be improved by characterizing the most frequent, favored structural features of membrane proteins to glean both context-specific and general principles. We used protein design to proactively interrogate the sequence–structure relationship and stabilizing atomic details of two highly prevalent antiparallel transmembrane (TM) motifs with Small-X6-Small consensus sequences. A fragment-based data-mining and sequence statistical inference method, including cross-evolutionary structure-aligned covariance, enabled engineering of de novo TM protein assemblies by successfully encoding Gly-X6-Gly and Ala-X6-Ala building blocks. The X-ray structure of a highly stable glycine-based design hosts Cα-H∙∙∙O=C H-bonding alongside extensive backbone-directed van der Waals packing, idealizing features of this motif in Nature. Data-driven design navigates sequence space to directly probe how vital membrane protein structural elements are encoded and stabilized, facilitating efficacious construction of lipid-embedded architectures of increasing complexity.


A few unique helix–helix packing geometries are sufficient to describe about half of membrane protein internal architectures (1–3). This basis set of transmembrane (TM) domain building blocks constitutes folded cores, drives oligomerization interfaces, and anchors dynamic regions and functional sites across organisms, proteomes, and functional families. Although these recurring packing units have been previously identified, most of their sequence–structure relationships remain ambiguous and are underused for clarifying principles of folding in lipid and for membrane protein structure prediction and design. Studies of one such TM building block revolutionized the membrane field: tight parallel-oriented TM spans mediated by Small-X3-Small sequences (Small = Gly, Ser, Ala), including G-X3-G (4–6). Successful protein design reflects mastery of that motif’s sequence–structure encoding (7–11). By contrast, packing motifs between antiparallel TM helices are exceedingly common yet poorly defined, with limited experimental evidence clarifying the sidechain interaction patterns or consensus sequences that reliably encode these important structures.

One lipid-embedded sequence pattern, Small-X6-Small, is increasingly associated with a packing unit of tight left-handed antiparallel helices. This ubiquitous geometry comprises ~11% of all 2-helix interaction modes in membrane assemblies and is more common than parallel GASright motifs (Small-X3-Small-type) (1, 12–14). Small-X6-Small patterns such as G-X6-G are conserved in many important protein classes, including transporters (15), channels (16), lipid enzymes (17), and tetraspanins (18) (Fig. 1A), often stabilizing local TM helix bundles within folds that undergo global interdomain conformational changes. When investigated experimentally, G-X6-G motifs within the CD36 receptor superfamily (19) and homodimeric multidrug transporters (12, 20) were functionally essential, mediating necessary TM assembly. Likewise, Small-X6-Small sequences in TM peptides can drive their self-assembly in model membranes (13, 21, 22). Small residues repeated every 7 residues (i.e., every other helix turn; “a” in the “abcdefg” heptad) line a continuous helix face and mutually interdigitate to mediate extended antiparallel packing surfaces like leucine zippers, but with much closer mainchain distances (23). At glycines (e.g., G-X6-G), backbone atoms can come near enough for potential interhelix Cα-H∙∙∙O=C hydrogen bonds (H-bonds), with distinct antiparallel configurations understudied compared to those mediating parallel G-X3-G motifs (4, 5). In Nature, however, Small-X6-Small sequences do not always use H-bonds, and in multispanning proteins they appear in diverse contexts, packing both parallel and antiparallel interfaces (24). Thus, it remains uncertain how specific TM geometries (i.e., parallel/antiparallel, oligomer size) are encoded and how polar versus apolar interactions determine energetics across the scope of Small-X6-Small sequences.
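The heptad spacing described above can be made concrete with a short sequence scan. The sketch below is a hypothetical helper, not part of the authors' pipeline: it flags every Small-X6-Small window (Small = G, A, S, as defined above) and reports the longest run of a single residue at i, i + 7, i + 14, and so on. The example sequence is invented for illustration and is not one of the designs.

```python
import re

# Small = Gly, Ala, Ser, following the Small-X6-Small definition in the text.
SMALL = "GAS"


def find_small_x6_small(seq: str, small: str = SMALL) -> list[tuple[int, str]]:
    """Return (0-based start, 8-residue window) for each Small-X6-Small match.

    A zero-width lookahead reports overlapping windows, so a G-X6-G-X6-G
    heptad repeat yields two G-X6-G hits.
    """
    pattern = re.compile(rf"(?=([{small}].{{6}}[{small}]))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


def longest_heptad_run(seq: str, residue: str = "G") -> int:
    """Longest chain of `residue` spaced exactly 7 apart (one heptad 'a' slot each)."""
    seq = seq.upper()
    best = 0
    for start in range(len(seq)):
        count, i = 0, start
        while i < len(seq) and seq[i] == residue:
            count += 1
            i += 7
        best = max(best, count)
    return best


# Hypothetical TM-like sequence with glycines at positions 2, 9, and 16.
seq = "LLGLLIALLGLLVALLGLL"
print(find_small_x6_small(seq))       # all Small-X6-Small windows (G or A at a)
print(find_small_x6_small(seq, "G"))  # G-X6-G windows only
print(longest_heptad_run(seq))
```

Passing `small="G"` restricts the scan to G-X6-G, while the default also catches the A-X6-A windows that interleave with the glycine heptads here.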

Fig. 1.


De novo design of minimal idealized G-X6-G TM proteins. (A) Motif of close antiparallel TM helices in membrane proteins using G-X6-G repeats (white cartoon); glycines, purple sticks or red Cα spheres; yellow dash, Cα-H hydrogen bonds. PDB codes Left to Right: 6LEO, 7C83, 4ZP0, 8SRN. (B) A design model is broken into fragments centered around each interface amino acid (AA), at a and “d-e” packing layers (red and purple Cα spheres, respectively). Natural membrane protein structures (SI Appendix, Table S1) are data-mined for instances of similar local helix–helix geometries, collecting corresponding primary sequences to compile a structure-based MSA. Packing layers represented in a graph-based network (Right) are sampled to statistically infer new de novo TM protein sequences from the MSA data. (C) Models of de novo antiparallel G-X6-G TM homodimer complexes overlaid with ESMfold predicted structures (pLDDT > 0.8). N and C termini are labeled blue and red, respectively.

Protein design has preliminarily explored the sequence–structure linkage for Small-X6-Small patterns (25, 26). Studies of three synthetic A-X6-A TM peptides find that this motif can encode monomers (21), antiparallel dimers (13), or mixed-topology trimers (8), indicating that residues outside the Small a position heavily influence the fold. S-X6-S TM peptides with (SxxLxxx)3 repeats assembled primarily into parallel dimers in model membranes (21), mirroring the behavior of the similar TM sequence pattern of the mouse erythropoietin receptor (mEpoR) (27). From limited evidence, G-X6-G repeats may have a greater tendency toward antiparallel orientation (13). G-X6-G TM peptides mimicking sequences of the conserved fourth TM spans (TM4) that mediate subunit dimerization have been used to inhibit small multidrug resistance (SMR) efflux pumps (28). Their activities across SMR orthologs (29) suggest that the “d” and “e” positions at intermediate helix turns between glycines tune specificity. We previously designed de novo TM proteins that selectively recognize and inhibit mEpoR by accommodating mixed Small-X6-Small identities (Ala/Ser/Gly at a) and explicitly optimizing d and e residues (14). Notably, two recent studies attempting to design parallel TM domain assemblies inadvertently used Small-X6-Small patterns; both structures instead adopted corresponding antiparallel TM interfaces (8, 30). Overall, general principles connecting Small-X6-Small sequences and antiparallel TM structures are not clear from these limited examples. Filling the major knowledge gaps, i.e., the motif’s structural and energetic basis and the impact of non-a sequence context on stability and conformational specificity, should broadly advance the understanding of membrane protein folding and boost capabilities for engineering TM architectures.

Here, we developed a data-driven method to design model TM proteins that encode this antiparallel motif, enabling experiments that clarify determinants of folding and stability. The engineered TM sequences hosting G-X6-G and A-X6-A patterns all fold selectively to the intended homodimeric antiparallel assemblies. Characterization of the synthetic G-X6-G TM protein family revealed a range of folding free energies and characteristic interhelical Cα-H∙∙∙O=C H-bonds. Van der Waals (vdW) sidechain packing at ancillary interacting positions was the dominant molecular feature tuning specificity and stability among the synthetic TM protein variants studied. The algorithm and biophysical principles presented here establish frameworks to design TM interfaces, clarify membrane protein principles through the lens of their fundamental building blocks, and construct lipid-embedded synthetic protein assemblies part by part.

Results

Computational Design of Minimal Small-X6-Small Proteins.

We sought to design simple single-span TM domains as idealized representatives of natural Small-X6-Small TM building blocks, to test and clarify the consensus elements encoding their folding. Force field-driven rotamer trials such as RosettaMembrane do not recapitulate natural preferences at TM domain interfaces (8, 31). Machine learning models trained predominantly on water-soluble proteins likely undervalue lipid-embedded sequence–structure relationships. Instead, we hypothesized that this encoding could be achieved with a very simple framework of statistical inference, inspired by refs. 32–34 for globular proteins, using the relatively limited set of available membrane protein structures. In this fragment-based analysis, we deconstruct the target protein architecture into smaller subsections of local tertiary geometry, i.e., pairs of mutually packing α-helices (Fig. 1B). To determine amino acid identity preferences and interresidue relationships within each residue’s local environment, we search all known unique membrane proteins for geometrically similar instances of the intended helix–helix arrangement (1 Å backbone RMSD; SI Appendix, Table S1) and collect all primary sequences observed to support the conformation. This yields structure-based multiple sequence alignments (MSAs) of interacting groups surrounding each key interface position (Fig. 1B). We posited that a simple two-term scoring function comparing observed per-residue MSA statistics to expected frequencies would suffice to discern amino acid and interaction group choices that can stabilize the intended fold: amino acid enrichment (1-body) and covariance between interhelical residues (2-body). To search for sequences that globally optimize this statistical score, we used a graph-based representation of the residues constituting the interface (Fig. 1B), thus generating full de novo TM domains with native-like molecular features and patterns likely to suit the structure.
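The text does not give the exact form of the two terms, so the following is only a minimal sketch under stated assumptions: the 1-body term is taken as a pseudocount-smoothed log-odds of each amino acid in a fragment MSA column against a background distribution, and the 2-body term as the pointwise mutual information between two paired interhelical columns. A greedy coordinate ascent stands in for the graph-based sequence search. The function names, pseudocounts, and toy columns are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"


def one_body(column, background, pseudo=1.0):
    """1-body term (assumed form): log-odds of each amino acid in one column vs background."""
    counts = Counter(column)
    total = len(column) + pseudo * len(AA)
    return {a: math.log((counts[a] + pseudo) / total / background[a]) for a in AA}


def two_body(col_i, col_j, pseudo=0.1):
    """2-body term (assumed form): pointwise mutual information of paired columns.

    Row k of col_i and col_j must come from the same structurally aligned
    fragment, so (col_i[k], col_j[k]) is one observed interhelical pair.
    Marginals use pseudo * 20 so they are consistent with the smoothed joint.
    """
    k = len(AA)
    norm = len(col_i) + pseudo * k * k
    pairs, ci, cj = Counter(zip(col_i, col_j)), Counter(col_i), Counter(col_j)
    return {
        (a, b): math.log(
            ((pairs[(a, b)] + pseudo) / norm)
            / (((ci[a] + pseudo * k) / norm) * ((cj[b] + pseudo * k) / norm))
        )
        for a, b in product(AA, AA)
    }


def score(seq, h, J):
    """Sum of 1-body terms at every position plus 2-body terms over graph edges (keys of J)."""
    return sum(h[i][aa] for i, aa in enumerate(seq)) + sum(
        J[(i, j)][(seq[i], seq[j])] for (i, j) in J
    )


def greedy_optimize(h, J, start, sweeps=10):
    """Hypothetical stand-in search: set each position to its best residue until no change."""
    seq = list(start)
    for _ in range(sweeps):
        changed = False
        for i in range(len(seq)):
            best = max(AA, key=lambda a: score(seq[:i] + [a] + seq[i + 1:], h, J))
            if best != seq[i]:
                seq[i], changed = best, True
        if not changed:
            break
    return "".join(seq)


# Toy two-position example: hypothetical columns that covary perfectly (G<->L, A<->I).
bg = {a: 1 / len(AA) for a in AA}
col_i, col_j = list("GGAA"), list("LLII")
h = [one_body(col_i, bg), one_body(col_j, bg)]
J = {(0, 1): two_body(col_i, col_j)}
print(greedy_optimize(h, J, "AA"))
```

In this toy case the 1-body terms alone cannot distinguish G from A or L from I; only the 2-body term steers the search toward one of the two co-observed pairs, which is the kind of coupling the covariance term is meant to add.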

Our generative design model aims to similarly leverage the predictive power of covariance proven in emerging AI/ML models (35–37) by mining similar nano-environments from unrelated membrane proteins across diverse organisms, protein families, and topologies. The approach compensates for the lack of relevant protein family MSA data crucial to those models by constructing analogous structurally aligned “cross-evolutionary” MSA data and identifying convergent molecular patterns for de novo design. Thus, we hypothesize that the quantity of existing membrane-embedded structures, combined with the proposed data structure (sequence–structure tertiary fragment pairs), is sufficient to capture and proactively engineer TM-specific relationships, presumably with interaction strengths comparable to or optimized beyond those in Nature. Newly designed minimalist TM proteins serve as simple, accessible experimental model systems. Each sequence is a data-driven hypothesis of features suitable for that protein topology. We used this design-test-learn platform to evaluate the computational approach and probe consensus relationships between sequence, structure, and energetics for TM building blocks such as Small-X6-Small.

We benchmarked this algorithm on its ability to encode sequences compatible with our recently designed membrane pentamer channel, which hosts a common parallel-oriented building block (38). We previously showed that 1-body per-residue amino acid enrichment from this cross-evolutionary structure-based MSA analysis predicts highly stable protein variants. Our generative model, now including a 2-body residue covariance term, achieves superior recovery of sequences experimentally proven to fold stably, notably improving correlated pairwise amino acid choices at residue pairs with larger Cα–Cα distances (SI Appendix, Fig. S1).

Next, we deployed this algorithm to design single-pass TM spans hosting Small-X6-Small repeats, testing whether the resulting sequences can stabilize homodimeric folding into the tightly packed antiparallel motif (3). First, the range of left-handed backbone arrangements relevant to this motif in Nature was defined by sampling a set of idealized coiled-coil molecular models varying in helix–helix geometry (39) and scoring each by its frequency of occurrence in known membrane proteins (40) (Fig. 1B and SI Appendix, Fig. S2 and Table S1). We selected four common but mutually distinct representatives to design: three close backbones (8.0 or 8.4 Å interhelical distance) preferring G-X6-G patterns and a wider backbone (8.8 Å) preferring A-X6-A patterns (Fig. 1C and SI Appendix, Fig. S2B).

For protein design, a set of sequences statistically optimized for each interhelix geometry was produced (logos in SI Appendix, Fig. S3). The top-ranked sequences were then modeled and repacked in all-atom detail, and models with unavoidable clashes were discarded. Designed sequences that were stable by RMSD (<2 Å) and RMSF in all-atom lipid bilayer molecular dynamics (MD) simulations were selected for experimental characterization. We additionally generated a structure-based, manually designed sequence, G-X6-G Design-1, using the same starting backbone as the statistically optimized G-X6-G Design-2. This compares the efficacy of our computational approach versus rational design in deriving G-X6-G sequences that fold, the key differences being the d and e sidechains (SI Appendix, Fig. S2B). Five unique parent de novo TM proteins resulted, varying in hydrophobic length, composition, and number of repeats (Fig. 2A and SI Appendix, Table S2).
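The MD filter can be sketched with a standard Kabsch superposition. The code below assumes each frame is an (N, 3) array of backbone coordinates in Å, matched atom-for-atom to the design model; it reports per-frame RMSD to the model, checks the maximum against the 2 Å cutoff mentioned above, and computes per-atom RMSF after alignment. Because no RMSF threshold is stated, the sketch returns RMSF values for inspection rather than applying a cutoff. `md_stability` and its arguments are hypothetical names, not the authors' analysis code.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch superposition: return `mobile` rotated and translated onto `target`."""
    P, Q = np.asarray(mobile, float), np.asarray(target, float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)  # 3x3 cross-covariance of centered coordinates
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (P - p0) @ R.T + q0


def rmsd(P, Q):
    """Root-mean-square deviation between two matched (N, 3) coordinate sets."""
    return float(np.sqrt(((np.asarray(P) - np.asarray(Q)) ** 2).sum(axis=1).mean()))


def md_stability(frames, model, rmsd_cutoff=2.0):
    """Return (passes RMSD cutoff in every frame, per-frame RMSDs, per-atom RMSF)."""
    aligned = np.array([superpose(f, model) for f in frames])
    rmsds = np.array([rmsd(a, model) for a in aligned])
    # RMSF: fluctuation of each aligned atom about its mean position across frames.
    rmsf = np.sqrt(((aligned - aligned.mean(axis=0)) ** 2).sum(axis=2).mean(axis=0))
    return bool(rmsds.max() < rmsd_cutoff), rmsds, rmsf


# Synthetic check: noisy copies of a random "model" should pass the 2 Å filter.
rng = np.random.default_rng(0)
model = rng.normal(size=(30, 3)) * 5.0
frames = [model + rng.normal(scale=0.3, size=model.shape) for _ in range(5)]
ok, per_frame, per_atom = md_stability(frames, model)
print(ok, per_frame.round(2))
```

Superposing every frame onto the fixed design model, rather than onto the trajectory average, keeps the RMSD criterion tied to the intended structure.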

Fig. 2.


Folding of synthetic G-X6-G TM domains. (A) G-X6-G sequences. Expected TM region underlined, numbered, and aligned by heptad position. Glycines, red, bolded. Interface positions, bold. TM domain length listed. Right, cartoon of SUMO (purple) and TM domain (red) fusion construct in detergent. (B) Size-exclusion chromatogram (SEC) of normalized 280 nm absorbance for 2 mg/mL SUMO-Design-1a (representative of n = 3) on a superdex200i column with reducing 1 mM DDM mobile phase. Red spheres, assigned oligomer. The orange box is the expected dimer elution range. (C) SEC of SUMO-Design-2 (representative of n = 3) in DDM. (D) SEC of SUMO-Design-3a (representative of n = 3) in DDM. (E) Left, Tris-Glycine SDS-PAGE of 1 mg/mL SUMO-Design-2 in DDM (monomer, 17 kDa; dimer, 34 kDa) at increasing loading, or of mutant G16F (representative of n = 3). Middle, Design-1a SDS-PAGE (representative of n = 3). Right, Design-2 by MES-SDS after heating to 95 °C for 30 min in 4% w/v lithium dodecyl sulfate (LDS) or 0.2% SDS (n = 2). (F) SEC of SUMO-Design-4 preincubated in 30 mM C14B or 40 mM DDM (load) and run in 3 mM C14B or 1 mM DDM mobile phase.

Additional variants were derived from the guiding design models to facilitate cysteine-directed folding experiments and to test contributions from d and e interface residues. G-X6-G Design-1 was produced as the original cysteine-containing design (variant 1b) and as a methionine point mutant (variant 1a, C11M), expected to provide analogous sidechain packing at this d residue. Three variants of G-X6-G Design-3 were designed to assess contributions of potential interhelical H-bonds. The original Design-3 (i.e., variant 3a) hosts Ser19 at d, expected to form an interhelical H-bond to the backbone carbonyl of Leu5 (SI Appendix, Fig. S4A). Design-3b differs from 3a by the conservative Ser19Cys mutation, retaining that intended intermolecular H-bond, and by an Ala13Ser mutation at e that adds another intermolecular H-bond. The more drastically mutated variant 3c further challenged the structure’s tolerance, replacing the Ser/Cys19 H-bonding of 3a/b at d with apolar packing by Met19 and adding back a putative interhelical H-bond through the preceding d mutation Met12Ser (SI Appendix, Fig. S4B). Alternatively, these Ser/Cys sidechains could prefer intrahelical rather than interhelical H-bonds.

The final stable design models for the G-X6-G and A-X6-A sequences were compared with the range of antiparallel TM geometries occurring naturally and with those predicted ab initio by orthogonal methods. A previous clustering analysis of all ways 2 helices interact in membrane proteins identified the tight left-handed antiparallel motif we designed as the 3rd most common recurring packing geometry (~141 unique helix–helix pairs) (1). The 5 synthetic TM proteins’ expected backbone conformations adopt native-like variations of the local fold family and should thus appropriately represent its sequence–structural relationship (SI Appendix, Table S3). Predicted structures by AlphaFold-3 (AF3) (37) were confident but likely incorrect for 4/5 TM sequences, adopting low contact density parallel-oriented helices (SI Appendix, Fig. S5). AF3 did accurately predict G-X6-G Design-4 (2 Å RMSD), which has the longest TM span. ESMfold (35) predicted high-confidence models with <0.7 Å backbone RMSD agreement for each design, suggesting the intended fold is the lowest energy structure for each sequence and this large language model (LLM) recognizes membrane-spanning Small-X6-Small motifs (Fig. 1C and SI Appendix, Fig. S4b).

Stable and Specific Folding of De Novo Small-X6-Small TM Proteins.

The ability of the designed TM domains to drive stable, specific folding to the intended homodimer assembly was assessed by size-exclusion chromatography (SEC), using SUMO fusion proteins of the G-X6-G variants and characterizing each design’s oligomeric distribution in multiple detergents: n-dodecyl-β-D-maltoside (DDM), myristyl sulfobetaine (C14B), and N,N-dimethyldodecylamine N-oxide (LDAO). A set of SUMO-fused TM domains known to dimerize (13, 41) or trimerize established reference SEC migration and folding behavior, eluting at the expected detergent–oligomer complex sizes in different micelles and achieving adequate resolution (≥0.5 mL separation) to assign states for the designed TM oligomers (SI Appendix, Fig. S6 B and C and Table S2). Design-1a and -1b migrate as a two-species mixture in both DDM and LDAO, assigned as dimer and monomer peaks based on size, with an apparent ~60 to 75% fraction folded dimer, reflecting modest stability and equivalent behavior of Cys11 and Met11 at d (Fig. 2B and SI Appendix, Fig. S6A). By contrast, Design-2 robustly folded exclusively to a dimer in DDM (>98%, no detectable monomer; Fig. 2C). Design-2, -3a, and -3b were predominantly dimeric (>80%) in all detergents tested (Fig. 2 C and D and SI Appendix, Fig. S6 D–F). The Design-3c variant exhibited much lower folding specificity, eluting as a mixture of dimer and a larger, likely trimeric oligomer, indicating that G-X6-G does not strictly encode TM dimerization (SI Appendix, Fig. S6F). Thus, the three parent G-X6-G TM designs encode specific homo-oligomeric folding to the dimeric state, exhibit tolerance and disruption as the intended interface is varied, and have distinct monomer–dimer equilibria.
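An apparent fraction folded dimer, such as the ~60 to 75% reported above for Design-1, can be estimated from a chromatogram by integrating A280 over the dimer and monomer elution windows. The sketch assumes baseline-resolved peaks (consistent with the ≥0.5 mL separation noted above) and equal A280 per subunit in both states, so the area fraction equals the fraction of subunits in the dimer. The window boundaries and the synthetic Gaussian trace are hypothetical, not data from this study.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y over x (kept local to avoid NumPy-version differences)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(((y[1:] + y[:-1]) * np.diff(x)).sum() / 2.0)


def peak_area(volume_ml, a280, window):
    """Integrated A280 between the (low, high) elution volumes of `window`, in mL."""
    volume_ml, a280 = np.asarray(volume_ml, float), np.asarray(a280, float)
    lo, hi = window
    mask = (volume_ml >= lo) & (volume_ml <= hi)
    return trapezoid(a280[mask], volume_ml[mask])


def fraction_dimer(volume_ml, a280, dimer_window, monomer_window):
    """Fraction of subunits in the dimer peak, assuming equal A280 per subunit."""
    dimer = peak_area(volume_ml, a280, dimer_window)
    monomer = peak_area(volume_ml, a280, monomer_window)
    return dimer / (dimer + monomer)


# Synthetic two-peak trace with a 3:1 dimer:monomer area ratio (hypothetical positions).
v = np.linspace(8.0, 18.0, 2001)
trace = 3.0 * np.exp(-((v - 12.8) / 0.15) ** 2 / 2) + 1.0 * np.exp(-((v - 14.0) / 0.15) ** 2 / 2)
print(round(fraction_dimer(v, trace, (12.0, 13.4), (13.4, 15.0)), 2))
```

For overlapping peaks, fitting a sum of peak shapes would be more defensible than fixed windows; the windowed integral is the simplest version of the estimate.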

Next, relative stabilities were assessed. After exposure to harsh conditions expected to denature most natural membrane proteins, Design-2 remained predominantly dimeric by SEC, indicating high stability: incubation with excess short-chain detergent (200 mM octyl-glucoside; SI Appendix, Fig. S6D) or heating at 85 to 90 °C in LDAO or DDM (SI Appendix, Figs. S6D and S7A). Design-3a exhibited similar amounts of folded dimer after 15 min at 85 °C in LDAO (SI Appendix, Fig. S6E). The Design-2 TM protein shows greater resistance to denaturation, running as a sodium dodecyl sulfate (SDS)-resistant dimer by polyacrylamide gel electrophoresis (PAGE), whereas Design-1 and Design-3 variants run as monomers (Fig. 2E and SI Appendix, Fig. S7B). Mutation of the central motif glycine (G16F) ablates this dimerization, indicating that the G-X6-G sequence is responsible for the assembled structure. Incubation at high concentrations of lithium dodecyl sulfate (LDS) partially reduced dimerization in gel migration, showing that folding depends on the detergent. Heating in SDS had minimal impact on the dimeric folded fraction of Design-2 by PAGE (Fig. 2E), indicating either temperature independence or rapid dimer refolding. Thus, Design-2 is exceptionally resistant to denaturation, among the most stable membrane proteins characterized to date.

Design-4, which has the longest hydrophobic span (4 G-X6-G repeats), showed strong self-assembly like the other designs, but its oligomeric state was detergent-dependent (i.e., it differed in folding specificity). When incubated in excess DDM and run in DDM mobile phase, Design-4 folds to a dimer as intended (12.8 mL; Fig. 2F). When purified in C14B and run by SEC in DDM mobile phase, Design-4 eluted as two peaks: a likely dimer and a slightly larger species. Running C14B-purified protein in C14B mobile phase yielded a monodisperse peak of this slightly larger species (11.8 mL). Glutaraldehyde cross-linking confirms that a trimer dominates in C14B while a dimer is preferred in DDM (SI Appendix, Fig. S8). To discern the molecular basis, we predicted the trimer’s structure. ESMfold’s top model contains only a tight antiparallel dimer and a fully dissociated third TM span (i.e., a nontrimer), whereas AF3 predictions are low confidence; neither clarifies the alternative stable trimer conformation (SI Appendix, Fig. S2A). While Design-4 dimerizes in DDM, its off-target folding in C14B highlights that G-X6-G motifs can stabilize specific trimers as well as dimers, depending on the sequence and model membrane context. We interpret this detergent-dependent behavior as the interplay between interaction energetics and solvation, i.e., matching of TM hydrophobic thickness.

We next addressed whether Designs 1–3 encode the antiparallel orientation, using thiol–disulfide equilibrium exchange (42). N- and C-terminal cysteine-tagged TM peptides of each design were mixed in a 1:1 ratio (Fig. 3 and SI Appendix, Table S2), reconstituted in aqueous glutathione redox buffer with a 100-fold molar excess of C14B (>1 micelle per peptide) for overnight reversible oxidation, and then read out by LC-MS. We expected the formation of disulfide-bonded homo- or heterodimers to reflect the underlying preference of noncovalent TM interaction geometries, parallel versus antiparallel (Fig. 3A). For Design-1a, -2, and -3b, the dominant disulfide-bonded dimer species is consistently the antiparallel heterodimer (Fig. 3 B and C and SI Appendix, Figs. S9–S11), although the monomer fraction varied between designs. The two possible covalent parallel homodimers were either undetectable or present at much lower abundance (at least fivefold less). These de novo G-X6-G TM domains all demonstrate a strong nonrandom preference for the antiparallel geometry, matching the topology intended by design.
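The logic of this readout can be expressed with simple pairing statistics. If nCys and cCys peptides mixed 1:1 paired with no orientation preference, disulfide dimers would form in an NN:NC:CC ratio of 1:2:1, giving K = [NC]^2/([NN][CC]) = 4. The hypothetical helper below compares observed peak areas with that random-pairing baseline. It assumes equal UV response per dimer species and ignores monomers and glutathione mixed disulfides; it is an illustration, not the authors' quantification.

```python
import math


def orientation_preference(area_nn: float, area_cc: float, area_nc: float) -> tuple[float, float]:
    """Antiparallel (N-C) fraction of disulfide dimers and its enrichment over random pairing.

    Random 1:1 pairing gives NN:NC:CC = 1:2:1, i.e. K = [NC]^2 / ([NN][CC]) = 4,
    so the returned preference factor K_obs / 4 equals 1.0 for no preference.
    """
    total = area_nn + area_cc + area_nc
    frac_nc = area_nc / total
    if area_nn == 0 or area_cc == 0:
        return frac_nc, math.inf  # a homodimer was undetectable
    k_obs = area_nc ** 2 / (area_nn * area_cc)
    return frac_nc, k_obs / 4.0


print(orientation_preference(1, 1, 2))  # random pairing -> (0.5, 1.0)
print(orientation_preference(1, 1, 5))  # each homodimer fivefold below the heterodimer
```

With each homodimer fivefold below the heterodimer, as in the lower bound quoted above, the sketch gives an N-C fraction of about 0.71 and a preference factor of 6.25 over random pairing.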

Fig. 3.


Thiol-disulfide exchange of antiparallel G-X6-G TM peptides. (A) N-terminal and C-terminal cysteine (nCys, cyan; cCys, green) TM peptides reconstituted at 50 µM in aqueous 1.5 mM glutathione buffer (3:1 ratio [GSH]:[GSSG]) with 10 mM C14B can oxidize into three disulfide species: parallel cCys or nCys homodimers or antiparallel N-C heterodimer. (B) UV max plot RP-HPLC of Design-1a (representative of n = 3), Left. Right, MS of the major UV peak, antiparallel N-C heterodimer (expected: 7544.8 Da, Observed: 7544.2 Da). (C) Chromatogram of Design-2 reaction (Left) and MS of major peak, antiparallel N-C heterodimer (expected: 7729.2 Da, Observed: 7729.1 Da; representative of n = 3). (D) Chromatogram of Design-3a reaction (Left) and MS of major peak, antiparallel N-C heterodimer (expected: 7372.8 Da, Observed: 7372.8 Da; representative of n = 3).

A-X6-A Design Forms Antiparallel Dimers.

Unable to form Cα-H∙∙∙O=C H-bonds due to stereochemistry and distance, A-X6-A motifs are expected to adopt slightly wider variations of this TM building block geometry, analogous to water-soluble “Ala-coils” (43, 44). Accordingly, our de novo A-X6-A Design predominantly uses apolar sidechains with tight, geometrically compatible vdW interactions; S12 at d forms intrahelical H-bonds, packing its Cβ methylene. Motif alanine methyl “knobs” project deeply into the vacant polar backbone “hole” region between interhelical d and e residues (Fig. 4 A and B). The ESMfold prediction adopts the same sidechain packing and interhelix distance (~8.8 Å) as the design model (1.5 Å RMSD). Packing small residues in TM spans may also gain entropy upon folding by excluding lipid from the more exposed polar mainchain, leading to favorable solvophobic effects reciprocal to the hydrophobic effect in water (26, 45, 46), although the energetic significance and magnitude of this effect for different small residues (G, A, S) are unclear.

Fig. 4.


Folding of synthetic A-X6-A TM protein. (A) Design model of A-X6-A TM domain (white) overlaid with ESMfold prediction (purple cartoon, 1.5 Å Cα RMSD). Motif alanines, orange sticks. (B) A-X6-A TM sequence. Motif Ala, red underlined, labeled by heptad. (C) Thiol-disulfide equilibrium exchange LC–MS UV maxplot chromatogram of A-X6-A in 10 mM LDAO, with major peaks annotated as in Fig. 3, including the major N-C heterodimer peak. Representative of n = 3 trials. (D) SEC of 2 mg/mL SUMO-TM A-X6-A (representative of n = 3) on superdex200i 10/300 in different detergents (Left, 6 mM LDAO; Middle, 3 mM C14B; Right, 1 mM DDM), annotated with the expected dimer elution range for each detergent (orange).

The SUMO-fused A-X6-A TM domain drives specific folding to a monodisperse dimer, which is not SDS-resistant in gel migration and is thus weaker than G-X6-G Design-2 and similar to Design-3 (Fig. 4D and SI Appendix, Fig. S7B). The minimal A-X6-A TM peptide, as well as the G-X6-G TM peptides, reconstituted in mild detergent form SDS-resistant oligomers (>6 kDa) during SDS or lauryl sarkosyl PAGE (47) (SI Appendix, Fig. S4). The TM domains thus readily self-assemble and are robust to SDS; however, the stoichiometry of their oligomeric states could not be assigned because of anomalous migration (48). Thiol–disulfide exchange of N- and C-terminal A-X6-A Design TM peptides reconstituted in LDAO or C14B reveals the antiparallel heterodimer as the predominant disulfide-bonded dimer species, indicative of a strong antiparallel preference (Fig. 4C and SI Appendix, Fig. S13). Thus, our generative model implicitly optimized apolar steric packing to encode the wider backbone variant of the TM geometry that accommodates A-X6-A, which lacks glycine’s close approach and H-bonds. Likewise, the d/e sidechains of our design achieve structural specificity for antiparallel dimerization, differentiating it from past synthetic A-X6-A sequences that adopt multiple or off-target folds (8, 21).

Structural Basis of Design 2’s High Stability.

To discern the basis of Design-2’s stability, we solved a 3.27 Å X-ray structure in octyl-glucoside (Fig. 5 and SI Appendix, Table S4). All three dimers in the asymmetric unit match the design model with sub-angstrom accuracy (0.6 ± 0.1 Å RMSD; SI Appendix, Fig. S14). As expected, sequential G-X6-G motifs mutually line an extended antiparallel interface of left-handed supercoiling α-helices, in contrast to the shorter X-crossed interfaces of G-X3-G motifs (5, 41) (Fig. 5A). Tighter-than-expected interhelical distances (7.5 ± 0.2 Å) yield close interglycine distances and favorable backbone–backbone vdW surfaces. Glycine Cα-H2 groups act as noncanonical “knobs” packing into the backbone “hole” composed of interhelical residues d, e, and the subsequent a glycine (i + 3, i + 4, and i, respectively). Design-2’s apolar d-e sidechains pack with apparent mainchain-directed “knobs-into-holes” interactions, achieved similarly by Leu or the beta-branched Ile or Val, burying the expanded polar backbone area flanking each motif glycine (“g-a” and “b-a” “holes” with i − 1 and i + 1 neighbors, respectively) (Fig. 5A). In the unfolded TM monomer, glycines repeating on the same helix face expose large polar mainchain surfaces (Fig. 5B), likely straining nearby apolar lipid tails, which should be released to the lipid bulk upon glycine burial during folding, regaining favorable entropy. Design-2’s heightened stability appears correlated with its exclusive use of large, sterically compatible apolar d-e amino acids. Extensive vdW surfaces from intimately packed backbone grooves should provide substantial enthalpy and also benefit from the entropy of sealing ample polar backbone area from lipid. Designs-1 and -3 use more small or polar sidechains (Ala, Ser, Cys, and Thr), some intended for interhelix H-bonding, but ultimately showed weaker apparent folding.
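Interhelical distances such as the 7.5 ± 0.2 Å above are commonly measured between fitted helix axes. The sketch below assumes straight helices: it approximates axis points as centroids of four consecutive Cα atoms, fits a line through one helix's axis points by SVD, and averages the perpendicular distance of the partner helix's axis points to that line. Supercoiled helices such as those in Design-2 would need the same calculation applied locally (for example, per heptad), and this is not necessarily how the reported distances were computed. The ideal-helix generator is a hypothetical test fixture using textbook α-helix parameters (2.3 Å Cα radius, 1.5 Å rise, 100° per residue).

```python
import numpy as np


def axis_points(ca, window=4):
    """Approximate helix-axis points as centroids of consecutive Cα windows."""
    ca = np.asarray(ca, float)
    return np.array([ca[i:i + window].mean(axis=0) for i in range(len(ca) - window + 1)])


def fit_line(points):
    """Least-squares 3D line through points: (centroid, unit direction) via SVD."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center)
    return center, vt[0]


def interhelical_distance(ca_a, ca_b):
    """Mean perpendicular distance from helix B axis points to the fitted axis of helix A."""
    center, u = fit_line(axis_points(ca_a))
    rel = axis_points(ca_b) - center
    perp = rel - np.outer(rel @ u, u)  # remove the component along helix A's axis
    return float(np.linalg.norm(perp, axis=1).mean())


def ideal_helix(n, x0=0.0, z0=0.0, direction=1):
    """Straight right-handed α-helix Cα trace; direction=-1 runs it the opposite way."""
    i = np.arange(n)
    theta = np.deg2rad(100.0) * i * direction
    return np.column_stack(
        [x0 + 2.3 * np.cos(theta), 2.3 * np.sin(theta), z0 + 1.5 * i * direction]
    )


# Antiparallel pair of 18-residue ideal helices whose axes are 7.5 Å apart.
a = ideal_helix(18)
b = ideal_helix(18, x0=7.5, z0=1.5 * 17, direction=-1)
print(round(interhelical_distance(a, b), 2))
```

Measuring to a fitted line rather than taking the closest approach of two infinite lines avoids an ill-conditioned calculation when the helices are nearly parallel or antiparallel, as in this motif.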

Fig. 5.


G-X6-G Design-2 X-ray structure. (A) Interface packing of the Design-2 antiparallel complex (termini denoted) at glycine residues (red spheres). Left, overall X-ray structure. Right, “knobs-into-holes” packing layers: G23 packs into the backbone hole formed by d-e residues I11-L12, whose sidechains direct backbone-focused packing into the g-a and a-b holes surrounding the opposing glycine. (B) Interhelical Cα H-bonding (green dots) of glycine protons to the carbonyls of symmetry-related glycines, overlaying the three unique TM dimers (backbone line trace) within the asymmetric unit. Inset, zoom, distances labeled in Å. (C) Extensive polar backbone surface exposed to lipid in the monomeric state. (D) Overlay of the Design-2 X-ray structure (white) and the intended design model (lime), 0.6 Å Cα RMSD.

Design-2 forms clear Cα-H∙∙∙O=C H-bonds repeating between each motif glycine and the carbonyl of the opposing, symmetry-related glycine across the interface, six possible per dimer (Fig. 5C). Cα-H∙∙∙O distances are particularly close, 2.6 to 2.9 Å in the core, with a mean distance of 3.0 Å across the three unique dimers, including longer distances at the termini. These closer H-bonds likely contribute more energy than the common 3.2 to 3.5 Å Cα H-bonds in TM regions and are similar to those in the prototypical G-X3-G motif of glycophorin A (GpA) (PDB: 5EH4, 3.0 ± 0.3 Å; SI Appendix, Fig. S15A) (49). Models of Designs 1–4 from Rosetta (50), CHARMM36 (51), or ESMfold (35) consistently form serial Cα-H∙∙∙O=C H-bonds, but at longer distances (3.1 to 3.5 Å), indicating that these methods undervalue the interaction and are unsuitable for precisely estimating design-specific differences. If each repeating Cα-H H-bond contributes only modestly, up to the −0.9 kcal/mol estimated for a G-X3-G motif (52), the cumulative effect should provide substantial folding free energy, consistent with the stability of Design-2.
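The cumulative estimate can be made explicit with a back-of-the-envelope bound. Assuming strictly additive, equal contributions from each of the six possible Cα-H∙∙∙O=C bonds per dimer at the −0.9 kcal/mol upper estimate (an assumption, since terminal bonds are longer and likely weaker), and RT ≈ 0.592 kcal/mol at 298 K:

```latex
\begin{aligned}
\left|\Delta G^{\mathrm{total}}_{\mathrm{C\alpha H}}\right|
  &\le n\,\left|\Delta G_{1}\right|
   = 6 \times 0.9\ \mathrm{kcal\,mol^{-1}}
   = 5.4\ \mathrm{kcal\,mol^{-1}} \\
\frac{K_d^{\text{without}}}{K_d^{\text{with}}}
  &\le \exp\!\left(\frac{5.4\ \mathrm{kcal\,mol^{-1}}}{RT}\right)
   = \exp\!\left(\frac{5.4}{0.592}\right)
   \approx 9 \times 10^{3}
\end{aligned}
```

These are upper-bound numbers under the stated additivity assumption, not measured folding free energies; they only illustrate how modest per-bond contributions can compound across a serial interface.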

Cα H-Bonding in Natural and Synthetic Glycine TM Building Blocks.

We next analyzed the characteristics of the glycine Cα-H∙∙∙O=C H-bonds to discern their energetic determinants, particularly the pseudochirality of the two Cα protons. In Design-2’s three crystallographic dimers (17 H-bonds), one Cα-H adopts a more favorable geometry (2.6 to 3.0 Å Cα-H∙∙∙O distance; Cα–H∙∙∙O angle, ~110°; H∙∙∙O=C angle, ~130°), while the other proton is farther away but within plausible H-bonding distance at less ideal angles (3.0 to 3.4 Å Cα-H∙∙∙O; Cα–H∙∙∙O angle, 90 to 100°; H∙∙∙O=C angle, 90 to 100°). We call the Cα-H whose stereochemistry is analogous to a sidechain (Cα-Cβ) the “R” proton (unique to Gly) and its pseudochiral counterpart the “S” proton (Fig. 6A). The R proton is the favored H-bond donor in 16 of 17 instances (Fig. 6B). The small difference between the R and S Cα-H∙∙∙O distances (DRvS = 0.3 ± 0.1 Å) suggests a geometric balancing that aligns both protons for interactions with the carbonyl orbitals, influencing these atoms’ positions. This antiparallel H-bonding contrasts starkly with the more imbalanced pseudochirality in GpA’s parallel G-X3-G motif, wherein R protons are the exclusive H-bond donors (S protons, 3.8 ± 0.4 Å; mean DRvS = 0.8 ± 0.2 Å, n = 8; SI Appendix, Fig. S15A).
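
The three quantities reported above (H∙∙∙O distance, Cα–H∙∙∙O angle, and H∙∙∙O=C angle) follow from four atom positions. A minimal sketch, assuming hydrogen coordinates are available (e.g., added to the crystal structure) and assuming DRvS is the S-proton distance minus the R-proton distance; the helper names are hypothetical.

```python
import numpy as np


def angle_deg(a, b, c) -> float:
    """Angle a–b–c in degrees, with b as the vertex."""
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_ang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))


def ch_o_geometry(ca, h, o, c):
    """Return (H···O distance in Å, Cα–H···O angle, H···O=C angle in degrees)."""
    d_ho = float(np.linalg.norm(np.asarray(h, dtype=float) - np.asarray(o, dtype=float)))
    return d_ho, angle_deg(ca, h, o), angle_deg(h, o, c)


def distance_gap(d_r: float, d_s: float) -> float:
    """Assumed definition of DRvS: S-proton minus R-proton H···O distance (Å)."""
    return d_s - d_r
```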

Fig. 6.

Cα H-bonding characteristics in de novo Design-2 and natural membrane protein TM domains. (A) Glycine proton pseudochirality with interhelical Cα H-bonds labeled R or S (green and blue, respectively) by analogy to “L” amino acid sidechain stereochemistry. (B) R and S glycine proton distances to interhelix carbonyls in Design-2 structure, examples from across three unique dimers. (C) QM-optimized G16-G16 core of Design-2. Atoms, sticks; 0.5 a.u. isosurface of reduced density gradient (s) colored by sign(λ2)ρ (a.u.) (color bar), for interaction strength. Partially covalent H-bonds, yellow dots. (D) Cα H-bonds in natural membrane proteins by the local TM domain helix–helix geometry. R, yellow; S, blue. Left, G-X6-G motif donors. Middle, all glycine Cα-H donors. Right, all nonglycine Cα donors. Red dotted regions denote key differences.

Next, to investigate the H-bonding character of the R and S protons, we performed quantum mechanics (QM) calculations on the central Cα–Cα region. In the QM-minimized structure with harmonic backbone restraints, the noncovalent interaction (NCI) index (53) reveals an extensive attractive interface between the glycines (Fig. 6C). The interhelical contributions of the Cα S protons are largely dispersive vdW forces; their electrostatic contribution is minimal and unlikely to impact protein geometry. Both R protons show more localized, strongly attractive interactions with their interhelical carbonyl partners. Analyzed within the Quantum Theory of Atoms in Molecules (QTAIM) framework (55), these contacts have bond critical points with properties indicative of H-bonds with partial covalent character and weakly shared electron density (54). Based on a published correlation between bond critical point electron density and H-bond strength (56), each is estimated to provide 2.4 to 2.9 kcal/mol of stabilization in vacuo (SI Appendix, Fig. S15 B–D). Thus, the balanced Cα-H2 arrangement is a consequence of tight mainchain packing, in which vdW surfaces simultaneously position the R protons in near-optimal Cα-H∙∙∙O geometries without compromise.
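
The NCI picture in Fig. 6C rests on two pointwise descriptors: the reduced density gradient s = |∇ρ| / (2(3π²)^(1/3) ρ^(4/3)) and sign(λ2)ρ, where λ2 is the middle eigenvalue of the electron-density Hessian. In the usual reading, low-s regions with clearly negative sign(λ2)ρ indicate attractive, H-bond-like contacts, values near zero indicate weak dispersive vdW contacts, and positive values indicate steric repulsion. The sketch evaluates both at a single point and assumes ρ, ∇ρ, and the Hessian (atomic units) have already been obtained from a wavefunction analysis; it does not reproduce the authors' Multiwfn workflow.

```python
import numpy as np

# Prefactor of the reduced density gradient: 1 / (2 (3 pi^2)^(1/3)) ≈ 0.1616.
RDG_PREFACTOR = 1.0 / (2.0 * (3.0 * np.pi ** 2) ** (1.0 / 3.0))


def reduced_density_gradient(rho: float, grad_rho) -> float:
    """s = C |∇ρ| / ρ^(4/3) at one point (atomic units)."""
    if rho <= 0:
        raise ValueError("electron density must be positive")
    return float(RDG_PREFACTOR * np.linalg.norm(grad_rho) / rho ** (4.0 / 3.0))


def signed_density(rho: float, hessian) -> float:
    """sign(λ2)·ρ, with λ2 the middle eigenvalue of the density Hessian."""
    eigvals = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))  # ascending order
    return float(np.sign(eigvals[1]) * rho)
```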

We examined whether Design-2’s Cα-H∙∙∙O=C features approximate those in natural G-X6-G motifs and TM helix geometries (Dataset S1). Parallel-right (Small-X3-Small), Antiparallel-left (Small-X6-Small), and Antiparallel-right (consensus underdetermined) geometries recurrently host Cα H-bonds (Fig. 6D). Parallel left-handed packing is common (1) but seldom positions Cα H-bonds. TM G-X6-G sequences predominantly host Cα H-bonds in antiparallel left-handed geometries (−135 to 165° crossing angle), with 70% (66/94) having balanced, R proton-led H-bonds matching the pseudochirality of Design-2 (Fig. 6D). A subset of G-X6-G Cα H-bonds occurs between nearly parallel TM helices (crossing angle, ~0 ± 15°). However, 95% (20/21) of these come from a single protein family in highly kinked conformations (five ATP synthases in our database with divergent TM spans, <40% global identity), distinct from the ideal coiled helices of Antiparallel-left structures.
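
Classifying helix pairs this way requires an interhelical distance and a crossing angle for each pair. A minimal sketch, assuming each TM helix is approximated by a straight axis fitted through its Cα atoms by SVD (a simplification for supercoiled or kinked helices); the crossing angle is taken as the dihedral between the two axis directions about their closest-approach vector, so ~0° is parallel and ~±180° antiparallel. Which sign is called left- or right-handed depends on the convention adopted and is not assumed here; function names are hypothetical.

```python
import numpy as np


def fit_axis(ca_coords):
    """Straight helix axis (centroid, unit direction N→C) fitted by SVD."""
    X = np.asarray(ca_coords, dtype=float)
    centroid = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - centroid)
    u = Vt[0]
    if np.dot(u, X[-1] - X[0]) < 0:  # orient from first to last residue
        u = -u
    return centroid, u


def axis_geometry(p1, u1, p2, u2):
    """Interaxial distance (Å) and signed crossing angle (degrees) of two axes."""
    p1, u1, p2, u2 = (np.asarray(v, dtype=float) for v in (p1, u1, p2, u2))
    w0 = p1 - p2
    a, b, c = u1 @ u1, u1 @ u2, u2 @ u2
    d, e = u1 @ w0, u2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("axes are parallel; closest approach is not unique")
    c1 = p1 + ((b * e - c * d) / denom) * u1  # closest point on axis 1
    c2 = p2 + ((a * e - b * d) / denom) * u2  # closest point on axis 2
    b1, b2, b3 = -u1, c2 - c1, u2
    dist = float(np.linalg.norm(b2))
    if dist < 1e-9:
        raise ValueError("axes intersect; crossing dihedral is undefined")
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    omega = np.degrees(np.arctan2(dist * (b1 @ n2), n1 @ n2))
    return dist, float(omega)
```

The closest approach is computed for infinite lines; for real helix pairs one would also check that it falls within both helical segments.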

Cα H-bonds are ubiquitous across the membrane proteome, with or without glycine residues. Parallel-right geometries (−25 to 50° crossing angles) favor imbalanced R Cα H-bonds at glycines, while closer interhelix packing (6.5 to 7.5 Å) can occur at non-Gly amino acids (S protons). These tight interhelix S proton H-bonds are typically found three residues before a glycine (X-x-x-G; sometimes within a Small-X3-Small motif, e.g., Small-x-x-G), four residues after a packing glycine (G-x-x-x-X), or when packed against an interhelical Small-X3-Small motif containing one or two glycines. Antiparallel-right topologies (135 to 165° crossing angle) reach closer interhelical distances when hosting glycine R Cα H-bonds (6.5 to 7.7 Å) than nonglycine S H-bonded interfaces do. Thus, different interhelical geometries in membrane proteins are predisposed toward distinct Cα H-bond characteristics, and natural Antiparallel-left TM building blocks prefer the same balanced, R-directed geometries seen in our de novo designs.

Discussion

Here, we benchmark a data-driven design approach and refine the sequence–structure relationship for one of the most common ways membrane proteins pack. Beyond corroborating that G-X6-G and A-X6-A are consensus patterns able to drive antiparallel TM geometries, our work illuminates determinants for encoding this architecture. One defining feature is d-e sidechains packing toward the mainchain with high steric specificity within the tight local interhelix geometry. Because diverse sequence compositions and sidechain sterics are compatible at the intervening helical turns among designs, evaluating this 3D packing quality in its structural context may be more predictive than simplified sequence-based rules, unlike soluble and TM coiled coils, which follow strict, clear patterns (e.g., beta-branching) (38, 57, 58). Notably, AI models show discrepancies for this helix–helix geometry, which is seldom found in water-soluble folds (1). For evaluating apolar TM packing in membrane protein design or modeling, we propose quantifying the vdW packing character of sidechains deeply engaging backbone holes (<3.2 Å contacts with mainchain atoms) as a promising structural quality metric to supplement or supplant current ranking and filtering methods (8, 31). Support for this notion comes from sidechain-to-backbone directed packing being a common feature of exceptionally stable TM protein complexes, including recent designs (7, 38, 59). In water, most forms of apolar burial are advantageous, even peripheral aliphatic contacts. By contrast, in the membrane, not every apolar protein–protein contact arrangement is more favorable than the chemically similar lipid–protein interactions it competes with. Thus, the vdW sterics apparently optimized in Design-2 may be a general mechanism by which proteins achieve favorable apolar packing in membranes without the hydrophobic effect.
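
One way such a backbone-hole packing metric could be operationalized is to count sidechain atoms on one helix that lie within 3.2 Å of any mainchain atom on its partner. The sketch below is a hypothetical implementation, not the authors' metric: the atom-name convention, the inclusion of hydrogens, the one-directional count, and the lack of normalization are all assumptions.

```python
import numpy as np

# Backbone atom names (PDB convention, assumed); all other atoms count as sidechain.
BACKBONE_NAMES = {"N", "H", "CA", "HA", "HA2", "HA3", "C", "O"}


def deep_backbone_contacts(chain_a, chain_b, cutoff: float = 3.2) -> int:
    """Count sidechain atoms of chain_a within `cutoff` Å of any backbone
    atom of chain_b. Each chain is a list of (atom_name, (x, y, z))."""
    side = np.array([xyz for name, xyz in chain_a if name not in BACKBONE_NAMES], dtype=float)
    back = np.array([xyz for name, xyz in chain_b if name in BACKBONE_NAMES], dtype=float)
    if side.size == 0 or back.size == 0:
        return 0
    # Pairwise distance matrix via broadcasting: shape (n_side, n_back).
    dists = np.linalg.norm(side[:, None, :] - back[None, :, :], axis=-1)
    return int((dists.min(axis=1) < cutoff).sum())
```

For a symmetric dimer one would likely sum the count in both directions (A onto B and B onto A).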

While A-X6-A motifs can achieve folds and stabilities similar to G-X6-G designs, glycine H-bonding adds another stabilizing force alongside vdW packing, as exemplified by Design-2, the most stable design. Early work surmised that TM Cα H-bonds may contribute minimally and that close Cα-H∙∙∙O=C distances may simply be geometric consequences of more important interactions (60, 61). Design-2’s extensive, close-distance H-bonding strongly suggests the contrary: Cα H-bonds bolster this TM structure. If the other G-X6-G designs could potentially achieve similar Cα H-bonding networks, why are they less stable? We hypothesize that the differences arise from weaker internal packing (less vdW surface from smaller or polar sidechains), from failure to realize Cα-H∙∙∙O=C geometries of similar strength, or from an interrelated combination, i.e., not accommodating and balancing polar and packing features in their best forms, as Design-2 does. Likewise, we question whether interfaces of just two apolar helices lacking polar interactions, e.g., A-X6-A, can reach such high stability (62). However, the practical energy requirements for natural motifs are likely far lower than the levels reached by optimized designs, especially when helices are restrained by loops or other domains. While natural TM spans typically make do with short motif subsegments (one to two heptads, as in EmrE), often in kinked helices, design that epitomizes those substructures in extended architectures yields highly stable and specific folding alongside idealized molecular features. Thus, our software effectively builds molecular archetypes of TM building blocks that are well suited for further deciphering this and other membrane-specific motifs.

Advancing transmembrane-focused chemical principles and expanding the molecular design toolkit, alone or in combination with water-soluble engineering, will allow encoding of more sophisticated functions that traverse different solvation environments, such as sensing, cross-membrane transport, and tunable signaling (8, 30, 63–65). The effectiveness of the very simple two-term generative design approach, which analyzes structure-aligned cross-evolutionary MSAs, establishes that this depth and format of sequence–structure data is a viable starting point for training more generalized AI models for improved prediction and design of membrane-spanning architectures.

Materials and Methods

Protein Design.

Idealized homodimeric helices were modeled from parametric coiled-coil equations (39), systematically varying the interhelix radius, α-helix frequency, supercoiling frequency (and interface or pitch angle), and z-offset. Using MASTER (40), nine-residue fragments of each helix from each model were queried for close geometric matches against a nonredundant database of membrane protein structures (as of June 2020), prepared and curated to include TM regions only, as previously described (38) (SI Appendix, Table S1). Four distinct, highly common starting backbones were selected for de novo sequence design. The data-mining, statistical inference, and generative protein design algorithms are described in detail in the Extended Methods of the SI Appendix. MD simulations were prepared in CHARMM-GUI (66) and run using GROMACS (67).
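
The parametric backbones can be sketched with Crick's coiled-coil equations in the general form used by Grigoryan and DeGrado (39). This is a minimal sketch written from the textbook equations, not the authors' code, and it yields Cα positions only. Assumed here: a left-handed supercoil is encoded as ω0 < 0 with the pitch angle α of the same sign so the chain advances along +z; typical α-helix values (radius 2.26 Å, 102.8° per residue, ~1.51 Å rise) serve as defaults; the helper pitch_angle_for_rise is hypothetical; and the antiparallel partner is generated by a C2 rotation about the y axis. For a symmetric dimer, r0 is half the interhelical distance.

```python
import numpy as np


def crick_helix(n_res, r0, w0, alpha, r1=2.26, w1=np.deg2rad(102.8),
                phi0=0.0, phi1=0.0, dz=0.0):
    """Cα coordinates (n_res, 3) of one helix wound around the z axis.

    r0: supercoil radius (Å); w0: supercoil frequency (rad/residue);
    alpha: pitch angle (rad, same sign as w0); r1, w1: α-helix radius (Å)
    and frequency (rad/residue); phi0, phi1: phases (rad); dz: z-offset (Å).
    """
    t = np.arange(n_res, dtype=float)
    a0 = w0 * t + phi0  # supercoil phase of each residue
    a1 = w1 * t + phi1  # α-helical phase of each residue
    x = (r0 * np.cos(a0) + r1 * np.cos(a0) * np.cos(a1)
         - r1 * np.cos(alpha) * np.sin(a0) * np.sin(a1))
    y = (r0 * np.sin(a0) + r1 * np.sin(a0) * np.cos(a1)
         + r1 * np.cos(alpha) * np.cos(a0) * np.sin(a1))
    z = r0 * w0 / np.tan(alpha) * t - r1 * np.sin(alpha) * np.sin(a1) + dz
    return np.column_stack([x, y, z])


def pitch_angle_for_rise(r0, w0, rise=1.51):
    """Pitch angle (rad) giving ~`rise` Å of supercoil path per residue."""
    return np.arcsin(r0 * w0 / rise)


def antiparallel_partner(coords, dz=0.0):
    """C2 partner via 180° rotation about y, (x, y, z) -> (-x, y, -z),
    which reverses chain direction; dz shifts the partner along z."""
    partner = np.asarray(coords, dtype=float) * np.array([-1.0, 1.0, -1.0])
    partner[:, 2] += dz
    return partner


# Example: a weakly left-handed supercoil with ~7.5 Å between helix axes.
r0, w0 = 3.75, -0.06
h1 = crick_helix(28, r0, w0, pitch_angle_for_rise(r0, w0))
h2 = antiparallel_partner(h1, dz=1.5)
```

Scanning r0, w1, w0 (with α), and dz over a grid would mirror the parameters listed above; full backbones would still need to be built from these Cα traces before any fragment search.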

Structures were predicted with the ESMFold server (35), using the TM sequence repeated twice and bridged by a 20-residue glycine linker, or with the AlphaFold 3 server (37).
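
Because ESMFold predicts a single chain, the homodimer is presented as one sequence carrying two TM copies joined by 20 glycines. A small helper for building that query string follows; the function name and the example sequence are hypothetical placeholders (not one of the designs), and submission to the server is not shown.

```python
def linked_dimer_sequence(tm_seq: str, linker_len: int = 20) -> str:
    """Single-chain input: TM sequence, poly-Gly linker, TM sequence again."""
    seq = "".join(tm_seq.split()).upper()  # drop whitespace and newlines
    allowed = set("ACDEFGHIKLMNPQRSTVWY")
    if not seq or not set(seq) <= allowed:
        raise ValueError("expected a non-empty sequence of standard amino acids")
    return seq + "G" * linker_len + seq


# Hypothetical placeholder TM sequence (12 residues).
query = linked_dimer_sequence("LLIGVALGLLAV")
print(len(query))  # 44
```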

Expression and Purification of Designed TM Proteins.

SUMO-fusion constructs were cloned into pET-28a(+) vectors and expressed in C43(DE3) Escherichia coli (Lucigen) grown in Terrific Broth (TB) with 50 µg/mL kanamycin and induced with 0.5 mM IPTG overnight at 18, 30, or 37 °C, depending on the optimal temperature for each protein as determined at small scale. Cells were resuspended, lysed by tip sonication, and extracted with 20 mM C14-Betaine (C14B, Sigma) detergent prior to high-speed centrifugation. Clarified lysate was purified by nickel affinity chromatography (Ni-Indigo, Cube) and eluted into buffer containing 8 mM C14B.

SEC.

Purified proteins were concentrated to 2 to 3 mg/mL and injected onto a Superdex 200 Increase 10/300 column (Cytiva) at room temperature with a mobile phase of 25 mM NaPi pH 8, 150 mM NaCl, 1 mM DTT, 0.5 mM EDTA, and detergent (DDM, Anatrace; C14B; or LDAO, Cayman).

Gel Electrophoresis.

SDS-PAGE of nickel-purified or SEC-purified proteins was performed with either the Bio-Rad Tris/Glycine/SDS Any kD method or the Invitrogen NuPAGE 4 to 12% MES-SDS method, which give different oligomer banding or smearing behaviors. NuPAGE LDS sample buffer was added to a final 1x concentration, and ~5 µg of SUMO-TM protein was loaded without boiling unless specified. Low-SDS loading buffer was prepared as 0.2% SDS supplemented into native loading buffer (final: 10% glycerol, 50 mM Tris pH 6.8, and 0.05% bromophenol blue). TM peptides were first reconstituted to 1 mg/mL in 50 mM OG (Chem-Impex) or SDS before mixing with loading buffer.

SARK-PAGE.

Samples were mixed 1:1 with 0.2 or 2% w/v N-lauroylsarcosine (Sark, Sigma) supplemented into native loading buffer and run on Bio-Rad 12% polyacrylamide gels in Sark-Tris-Glycine running buffer (0.1% w/v sarkosyl, 25 mM Tris pH 8.3, and 192.5 mM glycine) as described previously (47).

Glutaraldehyde Cross-Linking.

Purified protein, diluted to 0.1 mg/mL and preincubated with varied detergent concentrations, was cross-linked with 5 mM glutaraldehyde (Sigma) for 1 to 60 min and quenched with hydrazine monohydrate (Sigma; 20 mM final).

Preparation of TM Peptides.

Cysteine-labeled peptides were prepared by expression as recombinant proteins fused at the N terminus to the SUMO domain, as described above, followed by addition of thrombin protease (Sigma) to the sample prior to overnight dialysis in Tris buffer with 8 mM beta-mercaptoethanol. Minimal TM peptides for crystallography were cleaved from SUMO fusion proteins by overnight digestion with porcine trypsin (1:100 mass ratio, Sigma). Cleaved TM peptides were purified by HPLC on a Vydac C4 column with a linear gradient of solvent A (99.9% water, 0.1% TFA) and solvent B (60% isopropanol, 29% acetonitrile, 9.9% trifluoroethanol, 1% water, and 0.1% TFA). Identity and 95% purity were confirmed by LC–MS (Waters).

Thiol-Disulfide Equilibrium Exchange.

As previously described (42), for TM peptide glutathione redox equilibrium exchange, 50 µM each of the N-terminal cysteine and C-terminal cysteine peptides was reconstituted in a 100-fold molar excess of detergent with 1.5 mM [GSH]:[GSSG] (3:1 ratio) in degassed redox buffer (25 mM Tris, 200 mM KCl, and 0.5 mM EDTA, pH 8.2) and then incubated overnight to reach equilibrium. Reaction products were analyzed by LC–MS on a UPLC C4 column using a linear gradient of solvent A and solvent B (60% isopropanol, 35% acetonitrile, 4.95% water, and 0.05% TFA).
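
Equilibrium product ratios from such an exchange experiment can be converted into an apparent equilibrium constant and free energy. The sketch assumes a deliberately simplified scheme, A-SH + B-SH + GSSG ⇌ A-S-S-B + 2 GSH, and ignores mixed peptide–glutathione disulfides and homodisulfides, which a complete treatment such as that of (42) must account for. Function names and the concentrations in the usage note are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol·K)


def exchange_constant(ssab, sh_a, sh_b, gsh, gssg):
    """Apparent K for A-SH + B-SH + GSSG <=> A-S-S-B + 2 GSH (dimensionless).

    All arguments are equilibrium concentrations in M.
    """
    if min(ssab, sh_a, sh_b, gsh, gssg) <= 0:
        raise ValueError("all equilibrium concentrations must be positive")
    return ssab * gsh ** 2 / (sh_a * sh_b * gssg)


def delta_g(k, temp_k=298.15):
    """Apparent free energy (kcal/mol) from K via dG = -RT ln K."""
    return -R_KCAL * temp_k * math.log(k)
```

For example, hypothetical equilibrium values of 10 µM disulfide-linked dimer, 40 µM of each free-thiol peptide, 1.125 mM GSH, and 0.375 mM GSSG give K ≈ 21.1 and a negative apparent ΔG.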

X-Ray Crystallography.

The Design-2 TM peptide was crystallized by hanging-drop vapor diffusion in 0.1 M sodium chloride, 0.1 M lithium sulfate, 0.1 M sodium citrate pH 3.5, and 30% PEG400, flash frozen without additional cryoprotectant, and diffracted at APS beamline 23-ID-B. The structure was solved by molecular replacement in Phenix (68) using the design model.

QM Calculations.

The coordinates of the two glycines (G16 of chains A and B in PDB 8SRN), together with backbone atoms from adjacent residues forming an N-terminal acetyl cap and a C-terminal methyl cap, were extracted as a tertiary fragment and subjected to constrained minimization at the B3LYP/6-31G* level of theory with GD3 empirical dispersion using Gaussian09 (69), with a harmonic potential of 0.0005 Hartree/Bohr (70) applied toward the initial conformation. The resulting wave functions were analyzed using Multiwfn (70).

Structural Informatics Analysis of the Membrane Protein Structural Database.

Cα-H∙∙∙O=C hydrogen bonds were identified using the geometric criteria described by Senes et al. (4), searching for unique instances (accounting for symmetry) in nonredundant membrane-spanning domains (SI Appendix, Table S1); these are listed in Dataset S1. Each Cα proton was assigned as R or S.
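
Assigning each glycine Cα proton as R or S amounts to deciding which Hα occupies the position a Cβ would take in an L-amino acid. One way to sketch this, assumed here rather than taken from the authors' scripts, is to build an ideal Cβ from backbone N, Cα, and C with a widely used set of coefficients, then compare the handedness (sign of a triple product) of each Hα with that virtual Cβ. Function names are hypothetical, and the Senes et al. distance and angle cutoffs are not reproduced here.

```python
import numpy as np


def virtual_cb(n, ca, c):
    """Ideal L-amino-acid Cβ position built from backbone N, Cα and C (Å)."""
    b = ca - n
    cv = c - ca
    a = np.cross(b, cv)
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * cv + ca


def signed_volume(n, ca, c, x):
    """Triple product (N−Cα)·[(C−Cα)×(X−Cα)]; its sign encodes handedness."""
    return float(np.dot(n - ca, np.cross(c - ca, x - ca)))


def assign_gly_protons(n, ca, c, h1, h2):
    """Label glycine Hα atoms: 'R' sits where an L-residue's Cβ would be."""
    n, ca, c, h1, h2 = (np.asarray(v, dtype=float) for v in (n, ca, c, h1, h2))
    ref = np.sign(signed_volume(n, ca, c, virtual_cb(n, ca, c)))
    s1 = np.sign(signed_volume(n, ca, c, h1))
    s2 = np.sign(signed_volume(n, ca, c, h2))
    if s1 == s2 or 0.0 in (s1, s2):
        raise ValueError("Hα atoms do not lie on opposite sides of the N-Cα-C plane")
    return ("R", "S") if s1 == ref else ("S", "R")
```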

Supplementary Material

Appendix 01 (PDF)

Dataset S01 (XLSX)

Acknowledgments

We thank Robyn Stanfield and Ian Wilson for their support of our crystallography work. We thank Huong Kratochvil for careful reading of the manuscript and insightful comments. M.H. and S.F. are supported by R01GM069832. C.A.M. was supported by the Diekman Family Graduate Fellowship and the Advancing Science in America Fellowship. C.A. was supported by the University of California San Diego McNair Scholars Program. Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515. The Stanford Synchrotron Radiation Lightsource Structural Molecular Biology Program is supported by the Department of Energy Office of Biological and Environmental Research and by the NIH, National Institute of General Medical Sciences (P30GM133894).

Author contributions

K.G., M.H., and M.M. designed research; K.G., C.A., C.T.A., M.H., W.T., X.D., M.Z., C.A.M., and M.M. performed research; K.G., B.B.S., J.S.C., and M.M. contributed new reagents/analytic tools; K.G., C.A., C.T.A., M.H., X.D., S.F., and M.M. analyzed data; and K.G. and M.M. wrote the paper.

Competing interests

A provisional patent naming K.G. and M.M. and describing the protein design algorithm has been filed by Scripps Research.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

X-ray crystallography data have been deposited in the Protein Data Bank (PDB) under code 8SRN (71). Biochemistry raw data and uncropped gels are available on Zenodo at DOI: 10.5281/zenodo.15476293 (72). Software and usage instructions for the protein design algorithm are available at https://github.com/goldenki55/tmDimer (73).

References

  • 1. Zhang S. Q., et al., The membrane- and soluble-protein helix-helix interactome: Similar geometry via different interactions. Structure 23, 527–541 (2015).
  • 2. Feng X., Barth P., A topological and conformational stability alphabet for multipass membrane proteins. Nat. Chem. Biol. 12, 167–173 (2016).
  • 3. Walters R. F., DeGrado W. F., Helix-packing motifs in membrane proteins. Proc. Natl. Acad. Sci. U.S.A. 103, 13658–13663 (2006).
  • 4. Senes A., Ubarretxena-Belandia I., Engelman D. M., The Cα–H···O hydrogen bond: A determinant of stability and specificity in transmembrane helix interactions. Proc. Natl. Acad. Sci. U.S.A. 98, 9056–9061 (2001).
  • 5. Mueller B. K., Subramaniam S., Senes A., A frequent, GxxxG-mediated, transmembrane association motif is optimized for the formation of interhelical Cα–H hydrogen bonds. Proc. Natl. Acad. Sci. U.S.A. 111, E888–E895 (2014).
  • 6. Doura A. K., Kobus F. J., Dubrovsky L., Hibbard E., Fleming K. G., Sequence context modulates the stability of a GxxxG-mediated transmembrane helix-helix dimer. J. Mol. Biol. 341, 991–998 (2004).
  • 7. Anderson S. M., Mueller B. K., Lange E. J., Senes A., Combination of Cα–H hydrogen bonds and van der Waals packing modulates the stability of GxxxG-mediated dimers in membranes. J. Am. Chem. Soc. 139, 15774–15783 (2017).
  • 8. Elazar A., et al., De novo-designed transmembrane domains tune engineered receptor functions. eLife 11, e75660 (2022).
  • 9. Nash A., Notman R., Dixon A. M., De novo design of transmembrane helix-helix interactions and measurement of stability in a biological membrane. Biochim. Biophys. Acta 1848, 1248–1257 (2015).
  • 10. Yin H., et al., Computational design of peptides that target transmembrane helices. Science 315, 1817–1822 (2007).
  • 11. Mravic M., et al., De novo designed transmembrane peptides activating the α5β1 integrin. Protein Eng. Des. Sel. 31, 181–190 (2018).
  • 12. Elbaz Y., Salomon T., Schuldiner S., Identification of a glycine motif required for packing in EmrE, a multidrug transporter from Escherichia coli. J. Biol. Chem. 283, 12276–12283 (2008).
  • 13. Zhang Y., Kulp D. W., Lear J. D., DeGrado W. F., Experimental and computational evaluation of forces directing the association of transmembrane helices. J. Am. Chem. Soc. 131, 11341–11343 (2009).
  • 14. Mravic M., et al., De novo-designed transmembrane proteins bind and regulate a cytokine receptor. Nat. Chem. Biol. 20, 751–760 (2024). 10.1038/s41589-024-01562-z.
  • 15. Liao J., et al., Structural insight into the ion-exchange mechanism of the sodium/calcium exchanger. Science 335, 686–690 (2012).
  • 16. Liu Y., Engelman D. M., Gerstein M., Genomic analysis of membrane protein families: Abundance and conserved motifs. Genome Biol. 3, research0054.0051 (2002).
  • 17. Han Y., et al., Crystal structure of steroid reductase SRD5A reveals conserved steroid reduction mechanism. Nat. Commun. 12, 449 (2021).
  • 18. Kovalenko O. V., Metcalf D. G., DeGrado W. F., Hemler M. E., Structural organization and interactions of transmembrane domains in tetraspanin proteins. BMC Struct. Biol. 5, 11 (2005).
  • 19. Gaidukov L., Nager A. R., Xu S., Penman M., Krieger M., Glycine dimerization motif in the N-terminal transmembrane domain of the high density lipoprotein receptor SR-BI required for normal receptor oligomerization and lipid transport. J. Biol. Chem. 286, 18452–18464 (2011).
  • 20. Poulsen B. E., Cunningham F., Lee K. K., Deber C. M., Modulation of substrate efflux in bacterial small multidrug resistance proteins by mutations at the dimer interface. J. Bacteriol. 193, 5929–5935 (2011).
  • 21. North B., et al., Characterization of a membrane protein folding motif, the Ser zipper, using designed peptides. J. Mol. Biol. 359, 930–939 (2006).
  • 22. Johnson R. M., Heslop C. L., Deber C. M., Hydrophobic helical hairpins: Design and packing interactions in membrane environments. Biochemistry 43, 14361–14369 (2004).
  • 23. Crick F. H., The packing of α-helices: Simple coiled-coils. Acta Crystallogr. 6, 689–697 (1953).
  • 24. Adamian L., Liang J., Interhelical hydrogen bonds and spatial motifs in membrane proteins: Polar clamps and serine zippers. Proteins 47, 209–218 (2002).
  • 25. Curnow P., et al., Small-residue packing motifs modulate the structure and function of a minimal de novo membrane protein. Sci. Rep. 10, 15203 (2020).
  • 26. Kratochvil H. T., Newberry R. W., Mensa B., Mravic M., DeGrado W. F., Spiers Memorial Lecture: Analysis and de novo design of membrane-interactive peptides. Faraday Discuss. 232, 9–48 (2021).
  • 27. Seubert N., et al., Active and inactive orientations of the transmembrane and cytosolic domains of the erythropoietin receptor dimer. Mol. Cell 12, 1239–1250 (2003).
  • 28. Ovchinnikov V., Stone T. A., Deber C. M., Karplus M., Structure of the EmrE multidrug transporter and its use for inhibitor peptide design. Proc. Natl. Acad. Sci. U.S.A. 115, E7932–E7941 (2018).
  • 29. Mitchell C. J., Nguyen K. M., Deber C. M., Toward a broad-spectrum peptide-based inhibitor of small multidrug resistance efflux pumps. Peptide Sci. 116, e24327 (2023).
  • 30. Scott A. J., et al., Constructing ion channels from water-soluble α-helical barrels. Nat. Chem. 13, 643–650 (2021).
  • 31. Kroncke B. M., et al., Documentation of an imperative to improve methods for predicting membrane protein stability. Biochemistry 55, 5002–5009 (2016).
  • 32. Zhou J., Panaitiu A. E., Grigoryan G., A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc. Natl. Acad. Sci. U.S.A. 117, 1059–1068 (2020).
  • 33. Zheng F., Zhang J., Grigoryan G., Tertiary structural propensities reveal fundamental sequence/structure relationships. Structure 23, 961–971 (2015).
  • 34. Li A. J., et al., Neural network-derived Potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs. Protein Sci. 32, e4554 (2023).
  • 35. Lin Z., et al., Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
  • 36. Marks D. S., et al., Protein 3D structure computed from evolutionary sequence variation. PLoS One 6, e28766 (2011).
  • 37. Abramson J., et al., Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
  • 38. Mravic M., et al., Packing of apolar side chains enables accurate design of highly stable membrane proteins. Science 363, 1418–1423 (2019).
  • 39. Grigoryan G., DeGrado W. F., Probing designability via a generalized model of helical bundle geometry. J. Mol. Biol. 405, 1079–1100 (2011).
  • 40. Zhou J., Grigoryan G., Rapid search for tertiary fragments reveals protein sequence-structure relationships. Protein Sci. 24, 508–524 (2015).
  • 41. MacKenzie K. R., Prestegard J. H., Engelman D. M., A transmembrane helix dimer: Structure and implications. Science 276, 131–133 (1997).
  • 42. Cristian L., Lear J. D., DeGrado W. F., Determination of membrane protein stability via thermodynamic coupling of folding to thiol-disulfide interchange. Protein Sci. 12, 1732–1740 (2003).
  • 43. Gernert K. M., Surles M. C., Labean T. H., Richardson J. S., Richardson D. C., The Alacoil: A very tight, antiparallel coiled-coil of helices. Protein Sci. 4, 2252–2260 (1995).
  • 44. Liu J., Lu M., An alanine-zipper structure determined by long range intermolecular interactions. J. Biol. Chem. 277, 48708–48713 (2002).
  • 45. Weinstein J. Y., Elazar A., Fleishman S. J., A lipophilicity-based energy function for membrane-protein modelling and design. PLoS Comput. Biol. 15, e1007318 (2019).
  • 46. Corin K., Bowie J. U., How physical forces drive the process of helical membrane protein folding. EMBO Rep. 23, e53025 (2022).
  • 47. Huang L., et al., An efficient method for detecting membrane protein oligomerization and complex using 05SAR-PAGE. Electrophoresis 45, 1450–1454 (2024).
  • 48. Rath A., Glibowicka M., Nadeau V. G., Chen G., Deber C. M., Detergent binding explains anomalous SDS-PAGE migration of membrane proteins. Proc. Natl. Acad. Sci. U.S.A. 106, 1760–1765 (2009).
  • 49. Trenker R., Call M. E., Call M. J., Crystal structure of the Glycophorin A transmembrane dimer in lipidic cubic phase. J. Am. Chem. Soc. 137, 15676–15679 (2015).
  • 50. Alford R. F., Fleming P. J., Fleming K. G., Gray J. J., Protein structure prediction and design in a biologically realistic implicit membrane. Biophys. J. 118, 2042–2055 (2020).
  • 51. Huang J., MacKerell A. D. Jr., CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. J. Comput. Chem. 34, 2135–2145 (2013).
  • 52. Arbely E., Arkin I. T., Experimental measurement of the strength of a Cα–H···O bond in a lipid bilayer. J. Am. Chem. Soc. 126, 5362–5363 (2004).
  • 53. Contreras-Garcia J., et al., NCIPLOT: A program for plotting non-covalent interaction regions. J. Chem. Theory Comput. 7, 625–632 (2011).
  • 54. Koch U., Popelier P. L. A., Characterization of C-H-O hydrogen bonds on the basis of the charge density. J. Phys. Chem. 99, 9747–9754 (1995).
  • 55. Bader R. F. W., A quantum theory of molecular structure and its applications. Chem. Rev. 91, 893–928 (1991).
  • 56. Parthasarathi R., Subramanian V., Sathyamurthy N., Hydrogen bonding without borders: An atoms-in-molecules perspective. J. Phys. Chem. A 110, 3349–3351 (2006).
  • 57. Harbury P., Zhang T., Kim P., Alber T., A switch between two-, three-, and four-stranded coiled coils in GCN4 leucine zipper mutants. Science 262, 1401–1407 (1993).
  • 58. Woolfson D. N., Understanding a protein fold: The physics, chemistry, and biology of α-helical coiled coils. J. Biol. Chem. 299, 104579 (2023).
  • 59. Zhou C., et al., Accurate de novo design of a voltage-gated anion channel. bioRxiv [Preprint] (2024). 10.1101/2024.12.25.630309 (Accessed 1 August 2025).
  • 60. Chamberlain A. K., Bowie J. U., Evaluation of C-H···O hydrogen bonds in native and misfolded proteins. J. Mol. Biol. 322, 497–503 (2002).
  • 61. Yohannan S., et al., A Cα–H···O hydrogen bond in a membrane protein is not stabilizing. J. Am. Chem. Soc. 126, 2284–2285 (2004).
  • 62. Loiseau G. J., Senes A., Packing of apolar amino acids is not a strong stabilizing force in transmembrane helix dimerization. bioRxiv [Preprint] (2025). 10.1101/2025.04.26.649789 (Accessed 1 August 2025).
  • 63. An L., et al., Binding and sensing diverse small molecules using shape-complementary pseudocycles. Science 385, 276–282 (2024).
  • 64. Jefferson R. E., et al., Computational design of dynamic receptor-peptide signaling complexes applied to chemotaxis. Nat. Commun. 14, 2875 (2023).
  • 65. Zhu J., et al., De novo design of transmembrane fluorescence-activating proteins. Nature 640, 249–257 (2025).
  • 66. Lee J., et al., CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field. J. Chem. Theory Comput. 12, 405–413 (2016).
  • 67. Abraham M. J., et al., GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25 (2015).
  • 68. Adams P. D., et al., PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221 (2010).
  • 69. Frisch M., et al., Gaussian 09, Revision D.01 (Gaussian, Inc., Wallingford, CT, 2009).
  • 70. Lu T., Chen F., Multiwfn: A multifunctional wavefunction analyzer. J. Comput. Chem. 33, 580–592 (2012).
  • 71. Golden J., Dai X., Mravic M., pdb_00008srn. Protein Data Bank. 10.2210/pdb8SRN/pdb. Deposited 5 May 2023.
  • 72. Mravic M., et al., Source data: Design principles of the common Gly-X6-Gly membrane protein building block. Zenodo. 10.5281/zenodo.15476293. Deposited 20 May 2025.
  • 73. Golden K., et al., tmDimer. GitHub. https://github.com/goldenki55/tmDimer. Deposited 22 August 2025.
