Design of complicated all-α protein structures

Koya Sakuma; Naohiro Kobayashi; Toshihiko Sugiki; Toshio Nagashima; Toshimichi Fujiwara; Kano Suzuki; Naoya Kobayashi; Takeshi Murata; Takahiro Kosugi; Rie Tatsumi-Koga; Nobuyasu Koga

doi:10.1038/s41594-023-01147-9

. 2024 Jan 4;31(2):275–282. doi: 10.1038/s41594-023-01147-9

Design of complicated all-α protein structures

Koya Sakuma ^1,^#, Naohiro Kobayashi ^2,^3,^#, Toshihiko Sugiki ³, Toshio Nagashima ², Toshimichi Fujiwara ³, Kano Suzuki ⁴, Naoya Kobayashi ⁵, Takeshi Murata ^4,^6,⁷, Takahiro Kosugi ^1,^5,⁸, Rie Tatsumi-Koga ⁵, Nobuyasu Koga ^1,^5,^8,^9,^✉

PMCID: PMC11377298 PMID: 38177681

Abstract

A wide range of de novo protein structure designs have been achieved, but the complexity of naturally occurring protein structures is still far beyond these designs. Here, to expand the diversity and complexity of de novo designed protein structures, we sought to develop a method for designing ‘difficult-to-describe’ α-helical protein structures composed of irregularly aligned α-helices like globins. Backbone structure libraries consisting of a myriad of α-helical structures with five or six helices were generated by combining 18 helix–loop–helix motifs and canonical α-helices, and five distinct topologies were selected for de novo design. The designs were found to be monomeric with high thermal stability in solution and fold into the target topologies with atomic accuracy. This study demonstrated that complicated α-helical proteins are created using typical building blocks. The method we developed will enable us to explore the universe of protein structures for designing novel functional proteins.

Subject terms: Biophysics, Proteins, Solution-state NMR, Computational biology and bioinformatics

Here authors developed a computational method to design complicated all-α structures using typical helix–loop–helix motifs and canonical α-helices, and demonstrated the ability to create complicated all-α proteins.

Main

Many naturally occurring protein structures are complicated, lacking distinguishable symmetry and regularity. Prominent examples of such complicated proteins are globin-fold structures with eight irregularly packed α-helices; Kendrew referred to the tertiary arrangement of the secondary structures as being difficult to describe in simple terms¹ (Fig. 1a). In most parts of globin fold structures, two helices adjacent in the sequence are connected crosswise rather than hairpin-like, and the helix–helix packings deviate from the canonical patterns^2,3; this fold does not include internal structural repeats such as α-solenoids^4,5. These asymmetric, irregular and nonrepetitive secondary structure arrangements make it difficult to simply describe globin structures, and many naturally occurring proteins as well.

a,b, Structures of myoglobin (a) and representative de novo designed all-α proteins (b) (the N- and C-terminals are colored in blue and red, respectively, and the characters represent PDB IDs). The α-helices in the globin structure are irregularly aligned, whereas those of the de novo designs are almost parallelly aligned. c, The order parameter capturing the complexities of α-helical proteins, HO. HO is defined by the average of inner products between helix orientation vectors, u_i, for all pairs of N α-helices⁵⁵. Higher values indicate more ordered, and lower values more complicated. d, HO distributions for naturally occurring and de novo designed proteins with three to eight α-helices. Whereas naturally occurring all-α proteins show broad distributions irrespective of the number of constituent α-helices, previous de novo designed all-α proteins indicated by yellow-ocher bars show relatively higher values in the distributions (for details of the previous designs, see Extended Data Fig. 1). Notably, globin structures indicated by blue bars have quite low values. The all-α proteins created in this study, indicated by red bars, have lower values than the previous designs.

Source data

A wide range of all-α protein structures have been designed, but the designs have been limited to simple and ordered structures consisting of α-helices in almost parallel alignment, such as coiled-coil, bundle and barrel structures (Fig. 1b–d and Extended Data Fig. 1)^5–27. Jacobs et al. attempted to design α-helical proteins with more variety¹⁵, but their designs were still bundle-like (the two designs with five α-helices in Fig. 1b). However, the distribution of complexity for naturally occurring all-α protein structures is biased to the complicated ones (Fig. 1d). The observed distribution bias is probably due to the fact that all-α proteins with complicated spatial arrangements of α-helices can provide diverse and heterogenous molecular surfaces, enabling specific interactions with binding partners. Moreover, such complicated all-α proteins should make it possible to incorporate a functional site enclosed on nearly all sides by multiple structural elements in three dimensions, like globins. Therefore, the ability to create protein structures with irregularly packed α-helices would contribute to the design of various functional proteins.

Extended Data Fig. 1 — a, Structures and their PDB IDs of the twenty-two de novo designed proteins collected from PDB. b, The helix order (HO) values of the designs were plotted in the HO histograms for naturally occurring all-α proteins (these histograms are identical to those in Fig. 1).

Source data

In this article, we sought to develop a computational method to design complicated all-α structures by employing helix–loop–helix (HLH) motifs typically observed in naturally occurring proteins. The developed method enabled us to generate a wide range of α-helical protein structure topologies from bundle-like to complicated by combining the typical HLH motifs and canonical α-helices. Finally, we demonstrated the ability to create complicated all-α proteins by de novo design of five distinct topologies.

Strategy for all-α topology building

Although it has been suggested that the overall tertiary arrangements of helices of naturally occurring α-helical proteins can be approximated by a quasi-spherical polyhedral model²⁸, the major obstacle in designing complicated all-α topologies with irregularly aligned α-helices is attributed to the difficulty in determining a priori feasible topologies with their backbone blueprints involving lengths of secondary structures and loops. This is different from the design of αβ-proteins: the topologies are selected in advance by β-strand arrangements (that is, the order and orientations of β-strands in a β-sheet), and the backbone blueprints were derived from a set of rules relating local backbone structures of a few successive secondary structure elements to the preferred tertiary motifs²⁹. Therefore, we attempted to explore all-α topologies, not by preparing them a priori but by generating backbone structure topologies through the combinatorial enumeration of tertiary building blocks (Fig. 2). Moreover, the tertiary building blocks were selected from those typically observed in nature, so that the generated backbone structures are likely to be feasible. Therefore, the question is whether complicated all-α topologies can be generated from typical building blocks.

Fig. 2 — Top: building blocks for generating backbone structures. Canonical α-helices and three types of HLH tertiary motifs typically observed in nature, hairpin (h), v-shaped (v) and corner (c), are used. Helices range from 5 to 35 residues, and each motif type comprises six patterns (Fig. 3a). The motif types were classified on the basis of the bending angle between the constituent helices in HLH motifs. Middle: secondary-structure element ordering to build α-helical proteins with five helices. According to the ordering, globular backbone structures without steric clashes are exhaustively explored by combining the building blocks, with the constraint of total residue length. Bottom: examples for generated α-helical backbone structure topologies. Poorly packed structures (lower) are discarded, whereas globularly folded structures (upper) are collected.

A typical set of HLH motifs as building blocks

We first attempted to collect a set of HLH tertiary motifs that are typically observed in nature as building blocks. The HLH units consisting of two α-helices and the connecting loop of one to five residues in length were extracted from naturally occurring proteins, then clustered into 18 subgroups based on the five-dimensional feature vectors representing the HLH tertiary geometries³⁰ (Extended Data Figs. 2 and 3 and Methods). The representative 18 HLH motifs corresponding to each cluster density peak exhibited a broad range of bending angles between two helices, such as left- or right-handed helix–turn–helix, helix–corner–helix and kinked helices (Fig. 3a and Extended Data Figs. 3 and 4); the amino acid preference for each motif is shown in Extended Data Fig. 5: Gly at the residues with positive phi backbone torsion angle, helix capping residues immediately before helices such as Asp, Asn, Thr or Ser (refs. ^31,32), and hydrophobic periodicity of helix residues specific to each motif are observed. The 18 HLH motifs are classified into three classes according to the magnitude of the bending angle: hairpin (h), v-shaped (v) and corner (c). The 18 representative HLH motifs were used as building blocks (Fig. 2, top) for generating α-helical backbone structure topologies.

Extended Data Fig. 2 — **a-d**, For representing tertiary geometries of HLH units, the following angles, θ_N, θ_C, ϕ_NC, ϕ_NL, and ϕ_CL, were identified using the V_N, V_C, V_L, V_HN, and V_HC vectors (these vectors are calculated using Cα atoms). a, The definitions of θ_N and θ_C. V_N and V_C respectively represent the helix vectors for the N- and C- terminal helices in a HLH geometry, which are calculated using the equations proposed by Krissinel et al.⁵³. V_L is the loop vector from the last Cα atom (blue) in the N-terminal helix to the first Cα atom (red) in the C-terminal helix. θ_Ν was identified as the angle between the V_N and V_L vectors; θ_C, was identified as the angle between the V_C and V_L vectors. b, The definitions of ϕ_NC· ϕ_NC was identified as the dihedral angle between the plane defined with the V_N and V_L vectors and that with the V_C and V_L vectors. c, The definition of ϕ_NL· V_HN is the helix spiral vector at the end of the N-terminal helix, which was identified as the vector pointed to the last Cα atom (blue) in the N-terminal helix from the Cα atom immediately before the last Cα atom (white). ϕ_NL was identified as the dihedral angle between the plane defined with the V_HN and V_L vectors and that with the V_HN and V_N vectors. d, The definition of ϕ_CL· V_HC is the helix spiral vector at the beginning of the C-terminal helix, which was identified as the vector from the first Cα atom (red) in the C-terminal helix to the Cα atom immediately after the first Cα atom (yellow). ϕ_CL was identified as the dihedral angle between the plane defined with the V_C and V_HC vectors and that with the V_L and V_HC vectors.

Extended Data Fig. 3 — HLH structures collected from naturally occurring protein structures were clustered for each loop length from one to five based on the pairwise Euclidean distance between the five-dimensional vectors of the features shown in Extended Data Fig. 2, using the density clustering algorithm³⁰. For each loop length, decision graphs to determine density peaks of clusters are shown, in which rho represents the local density of a point in the five-dimensional feature vector space and delta represents the minimum distance between a point to any other point with higher density; for the point with highest density, delta is calculated as the maximum distance to any other points.

Source data

Fig. 3 — a, Identified 18 HLH tertiary motifs typically observed in nature. The motifs are classified based on the bending angle between the two helices in the motifs: hairpin (h), v-shaped (v) and corner (c), which are presented in order of the magnitude of the bending angle, with the ABEGO backbone torsion pattern for the connecting loop. The residues with the backbone torsion angle, A, B, E and G, in the ABEGO torsion representation (‘A’ corresponds to the right-handed α-helix region in the Ramachandran map, ‘B’ to the β-strand region, ‘E’ to the extended region with a positive phi angle, and ‘G’ to a left-handed α-helix) are shown in red, blue, yellow and green, respectively. b, HO distributions for generated backbone structures with six helices. The black, gray and white bars respectively represent the distributions for the ensemble generated using only hairpin motifs (h), hairpin and v-shaped (h + v) motifs, and all three motifs (h + v + c). Incorporation of v-shaped and corner loops yields lower HO structures. c, Examples for the generated backbone structures. The used motif type strings and the HO values are indicated for each structure. The N- and C-terminals are colored in blue and red, respectively.

Source data

Extended Data Fig. 4 — The loop numbers with their ABEGO torsion patterns correspond to the 18 representative HLH motifs shown in Extended Data Fig. 3.

Source data

Extended Data Fig. 5 — (Left) The 18 representative HLH motifs are shown as in Extended Data Fig. 3. (Middle) ABEGO torsion patterns of the loop. This result suggests that the relative arrangements of adjacent helices strongly limit the torsion patterns of the connecting loop. (Right) Amino acid sequence preferences of each HLH motif. The first residue of the loop is indicated by an arrow.

Source data

Generation of all-α topologies by combinatorial enumeration

Next, we investigated whether complicated topologies are produced using these typical tertiary motifs. Helical backbone structures composed of five and six helices were built with 90 and 110 residues in the total length, respectively, by combining the set of 18 HLH motifs and canonical α-helices ranging from 5 to 35 residues. The backbone structures were generated by enumerating all the combinations and selecting compact and steric-clash-free structures (Methods): 1,159,937,910 five-helix and 20,878,882,380 six-helix structures were enumerated, and 1,899,355 and 380,869 structures were then selected for each. The resulting topologies exhibited a broad spectrum ranging from helical bundle-like to complicated globular structures, demonstrating that complicated α-helical topologies are created from the typical tertiary motifs and canonical α-helices (Fig. 3b, white bar; Fig. 3c and Extended Data Fig. 6); the helix lengths were also widely distributed in the generated structures (Extended Data Fig. 7). Moreover, we found that the complexities of the generated topologies increase, as tertiary motifs with larger bending angles are included (black, gray and white bars in Fig. 3b). These results highlight the importance of corner-type motifs³³ in building complicated α-helical topologies.

Extended Data Fig. 6 — The structures are sorted by their HO values from top left to bottom right. The top left structure shows the smallest HO value and has irregularly packed α-helices, whereas the bottom right one shows the highest HO value and has parallelly aligned α-helices. The five designed topologies are shown together by enclosed squares.

Source data

Extended Data Fig. 7 — Left: The distribution of mean helix lengths for the generated backbone structures. For each backbone structure, the mean helix length was calculated by averaging lengths of the helices in the structure. Middle: The distribution of standard deviation of helix lengths for the generated backbone structures. For each backbone structure, the standard deviation of lengths of the helices in the structure was calculated. Right: The two-dimensional distribution of the mean and standard deviation of helix lengths. The width of the distribution of the standard deviation was in the order of the four-helix, five-helix, and six-helix structures. This is because the four-helix structures were not subject to the Rg constraint, and the five- and six- helix structures were the ones with Rg < 14 Å. Since the same threshold value for the Rg constraint was used, the distribution width for the five-helix structures is slightly wider than that of the six-helix structures. The helix lengths of the designs chosen for experimental characterization correspond to the vicinity of the peaks of the distributions.

Source data

Design of complicated α-helical topologies

From the generated myriad backbone structure topologies, we selected five for de novo design, H5_fold-0, H6_fold-C, H6_fold-Z, H6_fold-U and H7_fold-K (the Arabic numeral after ‘H’ indicates the number of helices) (Fig. 4 and Supplementary Fig. 1), in the following way. We first selected three topologies exhibiting extremely low helix order (HO) values (for the definition, see Fig. 1c and Methods): H5_fold-0, H6_fold-C and H6_fold-Z (Fig. 1d). Next, to test whether all identified HLH motifs could be used for de novo design, we selected H6_fold-U and H7_fold-K, which include all of the HLH motifs not used in the first three and still exhibit lower HO values (Fig. 1d). For all target folds except H5_fold-0, the lengths of the terminal helices were manually elongated to ensure sufficient packing interactions. None of these backbone structures is similar to any known protein structures; H5_fold-0, H6_fold-C, H6_fold-Z and H6-fold-U show a TM-score <0.6, using TM-align³⁴ against the ECOD database³⁵, and H6_fold-K shows a score of 0.610, with a structure of e2bnlA1 (Extended Data Fig. 8). The details of the selected topologies are described in Supplementary Text. For each backbone structure, amino acid sequences were designed through iterations of fixed-backbone sequence optimization and fixed-sequence structure optimization using Rosetta design calculations^36,37. Designs with low energy, tight core packing³⁸ and high compatibility between local sequences and structures²⁹ were selected, and their energy landscapes were explored by 10,000 independent Rosetta ab initio structure prediction simulations starting from an extended conformation³⁹. Ninety-one percent (75 of 82 designs) for H5_fold-0, 45% (18 of 40 designs) for H6_fold-C, 68% (27 of 40 designs) for H6_fold-Z, 67% (60 of 90 designs) for H6_fold-U, and 40% (36 of 90 designs) for H7_fold-K, showed funnel-shaped energy landscapes. Among the designs having funnel-shaped energy landscapes, we selected approximately ten designs for each topology (for the details, see Methods).

Fig. 4 — The design target backbone structures. H1–7 represents the first to seventh helices. The letter string next to a loop indicates the ABEGO torsion pattern and the character within a bracket indicates the motif type. The loop residues are colored in the ABEGO torsion representation, same as Fig. 3a. The HO value and radius of gyration (Rg) are indicated for each structure.

Extended Data Fig. 8 — The designed structures (left) and the most similar ones (right) with pdb ids and TM-score values.

Experimental characterization of designed proteins

We obtained synthetic genes encoding ten designs for H5_fold-0, seven for H6_fold-C, seven for H6_fold-Z, eight for H6_fold-U and eight for H6_fold-K. Some designs (H6_fold-Z, 2; H6_fold-U, 1; H7_fold-K, 2) have weak sequence similarity to known proteins with blast E-value <0.005, but the structures are unknown (Supplementary Table 1). The proteins were expressed in Escherichia coli and purified using a Ni²⁺-NTA affinity column. The purified proteins were then characterized by circular dichroism (CD) spectroscopy and size-exclusion chromatography combined with multi-angle light scattering (SEC–MALS). For all design target topologies, 34 of 40 designed proteins were found to be well expressed and highly soluble, and showed CD spectra typical of α-helical proteins; 27 out of the 34 designs were found to be monomeric by SEC–MALS (Supplementary Tables 2–6). Furthermore, the monomeric designs were characterized by ¹H-¹⁵N heteronuclear single quantum coherence (HSQC) nuclear magnetic resonance (NMR) spectroscopy, and 23 designs showed well-dispersed sharp peaks (Supplementary Tables 2–6 and Supplementary Fig. 2). The experimental results for all the designs are summarized in Extended Data Table 1. For each topology, we selected one monomeric design with well-dispersed sharp NMR peaks for NMR structure determination (Fig. 5 and Supplementary Fig. 3). All the designs were found to be highly stable from thermal denaturation up to 170 °C by CD (Fig. 5b,c). The NMR structures were solved at high quality using MagRO-NMRViewJ^40,41 (Table 1, Supplementary Text, Supplementary Figs. 4 and 5 and Supplementary Table 7), and the solved structures were consistent with the design models (Fig. 6 and Supplementary Table 8). For H5_fold-0, one of the designs was solved by X-ray crystallography and was nearly identical to the design model except for the domain swapping in the crystallized condition (Fig. 6, Table 2 and Supplementary Fig. 5). Despite the inclusion of noncanonical helix–helix packing arrangements in each design, the sidechains from distant α-helices were found to be coherently packed to constitute a single hydrophobic core similar to the design model. Notably, the bulky hydrophobic sidechains from the loops and neighboring α-helices also contributed largely to the core: they spiked the core and pinned the loops to the target conformations (Extended Data Fig. 9; for the importance of hydrophobic residues in the HLH motifs on energy landscapes of the designs, see Supplementary Fig. 6b,e). Interestingly, the N- and C-terminal helices of H6-FoldU_Nomur was found to be fluctuated despite the helix formation (Supplementary Figs. 7–9). Furthermore, in the thermal denaturation, the helical content of H6-FoldU_Nomur was gradually decreased before the transition (the second from the bottom in Fig. 5c), and in the chemical denaturation, the m-value, which represents the cooperativity, was lower than those of the other designs (Fig. 5d; note that m-values also depend on protein size, with larger proteins having larger m-values⁴²; therefore, the H5_fold-0_Elsa and Chantal, which are smaller in size than the other designs, show lower m-values). These results would be attributed to the low hydrophobicity for the core-forming residues of the C-terminus: almost all of the residues are Ala (Supplementary Fig. 8). We also compared the loop geometries of all HLH motifs at the ABEGO level in the design models and experimental structures (Supplementary Fig. 10 and Supplementary Table 9) (for the importance of helix capping residues in the HLH motifs on energy landscapes of the designs, see also Supplementary Fig. 6c,f). Except for the loop immediately before the C-terminal helix of H6-FoldU_Nomur, all loop geometries of the experimental structures agreed with those of the design models. These results indicate that the difficult-to-describe α-helical proteins are designable with typical building blocks.

Extended Data Table 1.

Summary of experimental results for designed proteins

Open in a new tab

The second column shows the number of designs experimentally tested for the fold in the leftmost column. The subsequent columns give the number of designs that satisfy each experimental characterization, which was performed sequentially from the left to the right. The successful designs are defined as those that satisfy all criteria and are expected to fold into correct fold. The details of the results are shown in Supplementary Tables 2-6.

^†Expression and solubility were assessed by SDS-PAGE and mass spectrometry.

^‡SEC-MALS was used to determine oligomerization state. The number of designs in which the main peak of the absorbance at 280 nm corresponds to the monomeric state was counted.

^*1H-¹⁵N HSQC spectra were collected.

Fig. 5 — a, Energy landscapes from Rosetta ab initio structure prediction simulations. The y axis represents Rosetta all-atom energy and the x axis represents the Cα RMSD from the design model. Black points represent the lowest energy structures obtained in independent Monte Carlo structure prediction trajectories starting from an extended chain for each sequence; red points represent the lowest energy structures obtained in trajectories starting from the design model. b, Far-ultraviolet CD spectra at 30 °C, the temperatures close to the melting temperature T_m, and 170 °C. The CD spectra were recorded under the pressure of 10 bar. c, Thermal denaturation measured at 222 nm under the pressure of 10 bar. For each design, the data were fitted to a two-state model (black solid line) to obtain the T_m. d, Chemical denaturation with GuHCl (square brackets denote concentration) measured at 222 nm and 25 °C. For each design, the data were fitted to a two-state model (black solid line) to obtain the free energy of unfolding ΔG and its dependency on the denaturant, m-value. e, Two-dimensional ¹H-¹⁵N HSQC spectra at 25 °C and 600 MHz.

Table 1.

NMR constraints and structure statistics of the five designed structures

Design ID	H5_fold-0_Chantal	H6_fold-C_Rei	H6_fold-Z_Gogy	H6_fold-U_Nomur	H7_fold-K _Mussoc
PDB ID	7BQM	7BQN	7BQQ	7BQS	7BQR
BMRB entry	36,335	36,336	36,337	36,339	36,338
NMR distance and dihedral constraints
Distance constraints
Total NOE	2,098 (100.0%)	3,018 (100.0%)	2,771 (100.0%)	2,515 (100.0%)	2,934 (100.0%)
Intra-residue	425 (20.3%)	596 (19.7%)	551 (19.9%)	436 (17.3%)	484 (16.5%)
Inter-residue
Sequential (\|i − j\| = 1)	555 (26.5%)	727 (24.1%)	620 (22.4%)	578 (23.0%)	661 (22.5%)
Medium range (1 < \|i − j\| < 5)	617 (29.4%)	983 (32.6%)	867 (31.3%)	801 (31.8%)	906 (30.9%)
Long range (\|i − j\| ≥ 5)	501 (23.9%)	712 (23.6%)	733 (26.5%)	700 (27.8%)	883 (30.1%)
Total dihedral angle restraints	132	206	232	195	220
ϕ	66	103	116	97	110
ψ	66	103	116	98	110
Structure statistics
Violations (mean and s.d.)†
Distance constraints (Å)	0.024 ± 0.068 (0.10 ± 0.28)	0.000 ± 0.000 (0.00 ± 0.00)	0.192 ± 0.059 (1.20 ± 0.70)	0.318 ± 0.071 (1.50 ± 0.59)	0.219 ± 0.020 (1.20 ± 0.43)
Dihedral angle constraints (°)	0.000 ± 0.000 (0.00 ± 0.00)	1.006 ± 4.273 (0.05 ± 0.21)	21.473 ± 0.262 (0.90 ± 0.26)	2.056 ± 5.853 (0.10 ± 0.28)	4.336 ± 7.772 (0.20 ± 0.35)
Max. distance constraint violation (Å)	0.243	0.186	0.413	0.416	0.278
Max. dihedral angle violation (°)	18.782	20.114	29.059	21.092	23.562
Deviations from idealized geometry‡
Bond lengths (Å)	0	0	0	0	0
Bond angles (°)	0	0	0	0	0
Impropers (°)	0	0	0	0	0
Average pairwise RMSD* (Å)
Heavy	1.34 ± 0.11	1.26 ± 0.12	1.19 ± 0.08	1.15 ± 0.10	1.14 ± 0.10
Backbone	0.39 ± 0.09	0.47 ± 0.10	0.39 ± 0.06	0.35 ± 0.06	0.27 ± 0.05

Open in a new tab

†Mean and s.d. values are derived from 20 models of Amber refined structures. The averaged number of violations for dihedral angle constraints (>20°) and distance constraints (>0.2 Å) across 20 models, and the s.d. are indicated in parentheses.

‡No geometrical outliers are found in all models.

*Averaged RMSD and deviation of backbone and heavy atoms for all pair of models in ensemble (20 × 20), are calculated by MolMol⁵³, fitted on the residues in ordered region (H5_fold-0_Chantal: 4–85; H6-fold-C_Rei: 2–15, 24–39, 69–91, 97–112; H6_fold-Z_Gogy: 4–118; H6_fold-U_Nomur: 5–19, 21–36, 38–96, 99–107; H7_fold-K_Mussoc: 8–25, 27–48, 52–121) identified by Filt_Robot⁵⁴.

Fig. 6 — Design structures (left) and NMR structures (a crystal structure for H5_fold-0_Elsa) (right); the N- and C-terminals are colored in blue and red, respectively. The Cα RMSD between them is indicated (for H5_fold-0_Elsa, of which crystal structure is domain-swapped dimer, the Cα RMSD was calculated using MICAN⁵⁶).

Table 2.

X-ray crystallography data collection and refinement statistics

	H5_fold-0_Elsa
Data collection
Space group	P2₁
Cell dimensions
a, b, c (Å)	45.98, 33.66, 58.38
α, β, γ (°)	90.00, 93.11, 90.00
Resolution (Å)	45.9–2.33 (2.47–2.33)*
R_merge	0.090 (0.483)
I/σI	13.63 (2.54)
Completeness (%)	98.7 (92.3)
Redundancy	6.4 (4.5)
Refinement
Resolution (Å)	45.9–2.33
No. reflections	49,835
R_work/R_free	0.2066/0.2469
No. atoms
Protein	1,422
Ligand/ion	6
Water	16
B-factors
Protein	57.59
Ligand/ion	84.02
Water	45.80
RMSD
Bond lengths (Å)	0.003
Bond angles (°)	0.517

Open in a new tab

*Statistics for the highest resolution shell are shown in parentheses. PDB ID: 7DNS.

Extended Data Fig. 9 — Hydrophobic core residues are shown in stick. Bulky hydrophobic side chains from loops and the neighboring α-helices, which spiked the core and pinned the loops to the target conformations, are shown in yellow.

Discussion

De novo designs of α-helical proteins have focused on structures consisting of parallelly aligned α-helices (Fig. 1), many of which are based on helical structure models such as the helical wheel⁴³ and Crick’s parameterization⁴⁴. We sought to develop a computational method for designing difficult-to-describe α-helical protein structures. We first identified the 18 HLH motifs typically observed in naturally occurring proteins. We then demonstrated that a wide range of globular all-α backbone structure topologies from bundle-like to complicated are generated by combining the 18 typical HLH motifs and canonical α-helices. The key to building complicated α-helical topologies is to include HLH motifs with larger bending angles such as corner-type motifs. The approach of this developed method is regarded as the reverse of blueprint-based design: design target topologies are searched by the combinations of HLH motifs in this approach, whereas design target topologies are predetermined and then local backbone structures favoring the topologies are selected in blueprint-based design.

We succeeded in designing complicated α-helical protein structures with five distinct topologies, three of which, H5_fold-0, H6_fold-C and H6_fold-Z, exhibited structural complexities comparable to the globin fold. The design success rate was as high as that of previous de novo designs, and the design exhibited high solubility and thermal stability, similarly to previous designs^29,45–49. Moreover, the loop geometries of almost all HLH motifs were formed as designed, which must have enabled the designed proteins to fold into the target topologies. These de novo design results indicate that the compact and steric-clash free backbone structures generated by using the typical HLH motifs are probably designable. In this regard, however, one of the questions is whether all or how much of the generated backbone structures can have tight core packing of sidechains. We have demonstrated that the selected five backbone structures are packable through de novo design, but the packability for the other backbone structures has not been clarified, which should be addressed in next works.

The computationally generated myriad of complicated all-α structures should provide diverse and heterogeneous molecular surfaces for engineering functions such as binding, enzymatic activity and self-assembly into symmetric oligomers. The myriad of generated structures, which are presumably highly soluble and stable, coupled with the recently developed massive gene synthesis^50,51 and parallel high-throughput screening^17,18,26,52, should make it possible to create proteins with optimal structures for specific functions^17,26.

Methods

Definition of HO

HO is the order parameter that captures the complexities of α-helical proteins. HO is defined by the average of inner products between helix orientation vectors, u_i, for all pairs of N α-helices⁵⁵:

HO = \frac{2}{N (N - 1)} \sum_{\begin{matrix} i = 1 \\ i < j \end{matrix}}^{N} {(u_{i} \cdot u_{j})}^{2} .

Higher values indicate more ordered and lower values, more complicated.

Analysis of all-α protein structures for de novo designed and naturally occurring proteins

Twenty-two de novo designed all-α protein structures were collected from Protein Data Bank (PDB). To this end, de novo designs were searched by the keyword ‘de novo’ or ‘de-novo’ in PDB as of November 2020, and then all-α structures containing no β-strands were extracted on the basis of the secondary structure assignments by the DSSP algorithm⁵⁷ (for the PDB structures including multiple chains or NMR models, the first chain or model was used). The following four classes of de novo designed proteins were excluded from the dataset: (1) designed proteins created on the basis of backbone structures of naturally occurring proteins, and those with sequence similarity higher than 0.90 (as an exception, the three-helix bundle structure designs (PDB code: 6DS9 and 2A3D) were both included because of their structural dissimilarity); (2) assemblies composed of one or two α-helices (for example, 3R3K and 1U7J); (3) repetitive structures such as α-solenoids (for example, 1MJ0 and 5K7V); (4) membrane proteins.

For naturally occurring all-α protein structures, 7,352 representative structures found in the mainly-α class in the CATH database⁵⁸ with sequence identity less than 40% were used.

For calculating the HO values of the collected structures, secondary structure elements and loops were assigned by DSSP⁵⁷ (α-helices are defined for the residue regions composed of at least five successive residues assigned as ‘H’ by the DSSP calculation). Note that the secondary structure assignments by DSSP are not always consistent with those originally defined by the authors. For example, the number of α-helices in the structures (PDB codes: 4TQL and 1P68) respectively designed with three and four α-helices were defined as four and five due to partially distorted α-helices.

Clustering of HLH units using the five features representing a HLH geometry

A total of 13,667 HLH structures were extracted from 7,280 X-ray structures (secondary structures were assigned by DSSP⁵⁷), obtained from the PISCES server⁵⁹, with resolution ≤2.5 Å, R-factor ≤0.3, sequence length more than or equal to 40, and ≤25% sequence identity. We then classified the HLH structures by their loop residue lengths and extracted 13,510 HLH structures in total with loop of one to five residues in length. The extracted HLH structures were clustered for each loop length from one to five using the density clustering algorithm³⁰ (Extended Data Fig. 3), with the five features representing a HLH geometry (Extended Data Fig. 2).

Building backbone structures

α-Helical backbone structures were built using Rosetta by exhaustive sampling for the conformations with steric-clash free (Rosetta vdw score <4.0 using the weight value, 0.1) and smaller radius of gyration (<14 Å) (the threshold value corresponds to the peak of the distribution of the radius of gyration for naturally occurring proteins; Supplementary Fig. 11) by combining canonical α-helices ranging from 5 to 35 residues (backbone torsion angles, phi, psi and omega, were set to −60.0, −45.0 and 180.0, respectively) and the identified 18 HLH motifs (Main and Fig. 3a), with length constraints of 90 and 110 residues for the five- and six-helix proteins, respectively. For generating five-helix structures, 64,440,995 steric-clash free four-helix structures with 70 residues were first generated, and then an α-helix with 18 types of connecting loops was appended to the C-terminal of the generated four-helix structures so that the total length becomes 90 residues. For generating six-helix structures, an α-helix with 18 types of connecting loops was appended to the N-terminal of the generated five-helix structures so that the total length becomes 110 residues. From these structures, the globular five- and six-helix structures were collected on the basis of the radius of gyration.

Selection of designs for experimental characterization based on the shapes of energy landscapes

We set three criteria for the selection by the shapes of energy landscapes. First, the overall shape of the landscape should be funnel-like with an apparent and sharp ‘neck’ reaching low-root mean square deviation (RMSD) and low-energy region, which is the hallmark of the foldability specifically into the target conformation. This is the most important criterion on the selection of energy landscape plots: for ill-designed sequences, all conformations remain in the high-RMSD and high-energy regions and do not have such a ‘neck’. Second, the funnel should not have subminima that indicate that the protein has alternative folded states. This is a criterion to exclude the possibility of misfolding and avoid a rugged energy landscape. Third, the ensemble of lowest-RMSD and lowest-energy conformations at the bottom of the funnels should not be away from, and ideally should overlap with, the conformational ensemble in the simulations starting from the target structure. This criterion is not mandatory, but consistency between fragment assembly simulations that offer global sampling and near-native relax simulations helps us to rank the designs with the similar quality in terms of the first and second criteria.

Expression and purification of designed proteins

The genes encoding the designed sequences were synthesized and inserted into pET21b vectors. The whole plasmid constructs were purchased from FASMAC or Eurofins Genomics. The target proteins were overexpressed by IPTG induction in E. coli BL21 Star (DE3) cells cultured in MJ9 minimal media including ¹⁵N ammonium sulfate as the sole nitrogen source and ¹²C glucose as the sole carbon source⁶⁰. The expressed uniformly (U-)¹⁵N-labeled proteins with a 6xHis tag at the C-terminus were purified by Ni²⁺-affinity columns. The purified proteins were then dialyzed against phosphate-buffered saline (PBS) buffer, 137 mM NaCl, 2.7 mM KCl, 10 mM Na₂HPO₄ and 1.8 mM KH₂PO₄, at pH 7.4; this buffer was used for all the experiments except NMR structure determination. The expression level, solubility and purity of each designed protein were evaluated by sodium dodecyl sulfate–polyacrylamide gel electrophoresis. To further confirm them, the samples were analyzed by mass spectroscopy (Bruker Daltonics REFLEX III and Thermo Scientific Orbitrap Elite).

Experiments to identify designed proteins exhibiting folding ability

The following three experiments were conducted to evaluate the folding ability of designed sequences: CD spectroscopy, size exclusion chromatography with multi-angle light scattering (SEC–MALS) and ¹H-¹⁵N HSQC NMR spectroscopy. Supplementary Tables 2–6 present the results of the evaluations for each designed sequence for each fold.

CD spectroscopy under 1-bar pressure

Far-UV CD spectra was measured to study whether the designs show the characteristic spectra of α-helical proteins, by scanning from 260 to 200 nm at 20 °C for ∼15 μM protein samples in PBS buffer on a JASCO J-1500 CD spectrometer. The measurements were performed four times and then averaged.

SEC–MALS

Oligomeric states for the designs in solution were studied by SEC–MALS with miniDAWN TREOS static light scattering detector (Wyatt Technology Corp.) combined with a high-performance liquid chromatography system (1260 Infinity LC, Agilent Technologies) with a Shodex KW-802.5 column (Showa Denko K.K.) for H5_fold-0_Chantal and H6_fold-C_Rei or a Superdex 75 increase 10/300 GL column (GE Healthcare) for H5_fold-0_Elsa, H6_fold-Z_Gogy and H7_fold-K_Mussoc. After the equilibration of the column with PBS buffer, 100 µl of the samples after purification by Ni²⁺-affinity columns were injected. The absorbance at 280 nm was measured by the high-performance liquid chromatography system to give the protein concentrations and intensity of light scattering at 659 nm was measured at angles of 43.6°, 90.0° and 136.4°. These data were analyzed by the ASTRA software (version 6.1.2, Wyatt Technology) using a change in the refractive index with concentration, a dn/dc value, 0.185 ml g⁻¹, to estimate the molecular weight of dominant peaks.

¹H-¹⁵N HSQC NMR spectroscopy

Whether the designs fold into well-packed structures or not was evaluated by ¹H-¹⁵N HSQC 2D-NMR spectroscopy. The purified protein samples were concentrated to 0.2–1.0 mM, and mixed with their 10% volume of D₂O. The experiments were performed at 25 °C on a JEOL JNM-ECA 600 MHz spectrometer, and data were analyzed by JEOL Delta (version 5.3.1).

High-pressure CD spectroscopy for melting temperature (T_m) estimation

For the designs that were evaluated to have the folding ability in the above experiments (one design for each target topology was selected), thermal denaturation was studied by using high-pressure CD spectroscopy. JASCO J-1500 CD spectrometer was equipped with additional pressure instruments so that temperature of the solution samples can be scanned from 30 °C to 170 °C under 10 bar. Temperature was increased 1 °C per minute for ∼15 μM protein samples. Fixed wavelength measurements at 222 nm were performed at every 1 °C, and wavelength scanning measurements (260 to 200 nm) were performed at 30, 40, 60, 80, 90, 100, 110, 120, 130, 140, 150, 160 and 170 °C. Thermal denaturation was measured once. T_m was estimated by nonlinear fitting to thermal denaturation CD curve at 222 nm. The nonlinear least-squares analysis was performed by nls function in R language, given a two-state unfolding and linear extrapolation model. After this fitting, we obtained T_m at which the estimated populations of folded and unfolded states become equal.

CD spectroscopy for chemical denaturation

Chemical denaturation with GuHCl was monitored at 222 nm for 2–3 μM protein samples in PBS buffer (pH 7.4) at 25 °C in a 1-cm path length cuvette. The GuHCl concentration was automatically controlled by a JASCO ATS-530 titrator. Chemical denaturation was measured once. The chemical denaturation curves were fit by nonlinear least-squares analysis using a two-state unfolding and linear extrapolation model⁶¹. The free energy change, ΔG, for the unfolding transition and its dependency on the denaturant, m-value, were obtained from the fitting.

Sample preparation for NMR structure determination

The most promising design for each target topology was overexpressed by IPTG induction in E. coli BL21 Star (DE3) cells cultured in MJ9 minimal media containing ¹⁵N ammonium sulfate as the sole nitrogen source and ¹³C glucose as the sole carbon source⁶⁰. The expressed U-¹⁵N,U-¹³C-enriched proteins were purified by Ni²⁺-affinity columns, and dialyzed against PBS buffer. The protein samples were further purified by gel filtration chromatography on an ÄKTA Pure 25 FPLC (GE Healthcare) using a Superdex75 or Superdex75 increase 10/300 GL column (GE Healthcare), which also replaced the PBS buffer at pH 7.4 with the customized buffer for NMR spectroscopy. The following 95% H₂O/5% D₂O buffer conditions for each sample were used: 100 mM NaCl, 5.6 mM Na₂HPO₄, 1.1 mM KH₂PO₄, at pH 7.4 for H5_fold-0_Chantal; 50 mM NaCl, 5.5 mM Na₂HPO₄, 4.5 mM KH₂PO₄, at pH 6.9 for H6_fold-C_Rei; 50 mM NaCl, 3.2 mM Na₂HPO₄, 4.5 mM KH₂PO₄, at pH 6.5 for H6_fold-Z_Gogy; 155 mM NaCl, 3.0 mM Na₂HPO₄, 1.1 mM KH₂PO₄, 10 μM ethylenediaminetetraacetic acid, 0.02% NaN₃, cOmplete protease inhibitor cocktail (Roche), at pH 7.4 for H6_fold-U_Nomur; and 155 mM NaCl, 3.0 mM Na₂HPO₄, 1.1 mM KH₂PO₄, at pH 7.4 for H7_fold-K_Mussoc.

Solution structure determination by NMR

NMR measurements

NMR measurements were performed on Bruker AVANCE III NMR spectrometers equipped with QCI cryo-Probes at 303 K. The spectrometers with 600, 700 and 800 MHz magnets were used for the signal assignments and nuclear Overhauser effect (NOE)-related measurements, while 700, 900 and 950 MHz ones, for residual dipolar coupling (RDC) experiments. For the signal assignments, 2D ¹H-¹⁵N HSQC (echo/anti-echo), ¹H-¹³C Constant-Time HSQC for aliphatic and aromatic signals, 3D HNCO, HN(CO)CACB and 3D HNCACB for backbone signal assignments, while BEST pulse sequence was applied to the triple resonance measurements for H6_fold-C_Rei. For structure determination, 3D ¹⁵N-edited NOESY and 3D ¹³C-edited NOESY for aliphatic and aromatic signals (mixing time 100 ms) were performed. For H6_fold-U_Nomur, additional 3D HN(CA)CO, HN(CO)CA, HNCA, HBHA(CO)NH, HBHANH, H(CCCO)NH, CC(CO)NH, 3D ¹³C-HSQC (¹³C-t1) NOESY ¹³C-HSQC, 3D ¹³C-HSQC (¹³C-t1) NOESY ¹⁵N-HSQC and 4D ¹³C-HSQC NOESY ¹³C-HSQC were measured. Except for 3D-edited NOESY, all the other spectra were performed using non-uniform sampling (NUS) for H6_fold-U_Nomur and H7_fold-K_Mussoc. For NUS, sampling ratio was set at 25% for 3D and 6% for 4D with a fixed random seed. The NUS spectra were reconstructed by iteratively re-weighted least squares for 3D while iterative soft thresholding for 4D spectra with virtual-echo technique using qMDD tool⁶².

For the RDC experiments, 2D in-phase and anti-phase (IPAP) ¹H-¹⁵N HSQC using water-gate pulses for water suppression were measured with or without 6–10 mg ml⁻¹ of Pf1 phage (ASLA Biotech). For confirming the positions of ¹H-¹⁵N signals in the 2D IPAP ¹H-¹⁵N HSQC, 3D HNCO at the identical buffer condition containing Pf1 phage were measured. The α- and β-states of ¹⁵N signals split by ¹H-¹⁵N ¹J-coupling were separately identified for the protein in the isotropic and weakly aligned states, to obtain 1-bond RDC $\binom{1}{} D_{\binom{1}{} H /^{15} N}$ values. For the sample H6_fold-U_Nomur, 3D J-HNCO (without ¹H decoupling for ¹⁵N evolution) was measured at 25% NUS, which were used for confirming α- and β-states of ¹⁵N signal positions overlapped in 2D IPAP spectra. 3D J-HN(CO)CA spectrum was also measured for H6_fold-U_Nomur to obtain $\binom{1}{} D_{\binom{1}{} H α /^{13} C α}$ for appending an additional number of alignment data at the identical magnetic field and alignment tensor.

NMR signal assignments

All NMR signals were identified in a fully automated manner using MagRO-NMRViewJ (upgraded version of Kujira⁴⁰), in which noise peaks were filtered by deep-learning methods using Filt_Robot⁴¹. FLYA module was used for fully automated signal assignments and structure calculation⁶³ to obtain roughly assigned chemical shifts (Acs), and then trustworthy ones were selected into the MagRO Acs table. After confirmation and correction of the Acs by visual inspection using MagRO, TALOS+⁶⁴ calculations were performed to predict phi/psi dihedral angles, which were then converted to angle constraints for the CYANA format.

Structure calculation

Several CYANA⁶⁵ calculations were performed using the Acs table, NOE peak table and dihedral angle constraints. The Acs table was exported by the MagRO CYANA module, and then the aliased chemical shifts were automatically calculated depending on the spectrum width of responsible NOESY spectra. For dihedral angle constraints, phi and psi, with deviation were derived from TALOS+ prediction using chemical shifts of ¹⁵N, ¹³C′, ¹³Cα and ¹³Cβ, with high prediction score noted by ‘Good’. The minimal angle deviation was set at 20°. After several iterations of CYANA calculations, dihedral angle constraints derived from TALOS+⁶⁴ revealing large violation for nearly all models in structure ensemble were eliminated.

After the averaged target function of the ensemble reached to less than 2.0 Å², refinement calculations by Amber12 were carried out for 20 models with lowest target functions. The coordinates of final.pdb calculated by CYANA, distance constraints (final.upl), dihedral angle constraints derived from TALOS+ prediction were converted into Amber format and topology file using Sander Tools. Firstly, 500 steps of minimization (250 steps of steepest decent, 250 steps of conjugate gradient) were carried out without electrostatic potential and NMR constraints. Second, molecular dynamics simulations with the ff99SB force field using implicit water system (0.1 M of ionic strength, 18.0 Å of cutoff) were performed, in which the temperature was gradually increased from 0.0 K to 300.0 K by 1,500 steps, followed by the simulation with 28,500 steps at 300.0 K (1.0 fs time step, total 30 ps). Finally, 2,000 steps for minimization (1,000 steps for steepest decent and 1,000 steps for conjugate gradient) with constraints of distance and dihedral angle were applied at the same condition used in the molecular dynamics simulations.

NMR structure validation

The RMSD values were calculated for the 20 structures overlaid to the mean coordinates for the ordered regions, automatically identified by Filt_Robot using multi-dimensional nonlinear scaling⁵⁴.

The RDC back-calculation was performed by PALES⁶⁶ using experimentally determined values of RDC. The averaged correlation between the simulated and experimental values was obtained using the signals except the residues on overlapped regions in ¹H-¹⁵N HSQC and the ones in low-order parameters less than 0.8 predicted by TALOS+. For the validation of H6_fold-U_Nomur, a lot of signals were overlapped in 2D IPAP-HSQC spectra. To overcome this problem, $\binom{1}{} J_{HN -^{15} N}$ split 3D HNCO (without ¹H-decoupling scheme in ¹⁵N evolution period) spectra in isotropic and anisotropic states were measured by NUS (25% data point reduction) to obtain signal positions of α- and β-states of ¹⁵N spins at resolution of 0.3 Hz. $\binom{1}{} J_{H α /^{13} C α}$ split 3D HN(CO)CA spectra at the same conditions were also measured to obtain $\binom{1}{} D_{\binom{1}{} H α /^{13} C α}$ at resolution of 0.2 Hz. Initially the RDC reproducibility of H6_fold-U_Nomur were examined using separately $\binom{1}{} D_{HN -^{15} N}$ and $\binom{1}{} D_{H α -^{13} C α}$ tables by PALES for all models to confirm that the averaged correlation coefficients are greater than 0.9, and then final correlation coefficients were calculated with two merged tables.

Solution structural dynamics of H6_fold-U_Nomur measured by NMR

¹⁵N R₁, R₂ and ¹⁵N-{¹H} NOE experiments

The ¹⁵N R₁, R₂ and ¹⁵N-{¹H} NOE measurements were performed for a uniformly ¹⁵N-labeled H6_fold-U_Nomur protein sample with a concentration of 0.78 mM, which is the same condition as the solution used for the structure determination. These were conducted at 303 K on Bruker 700 MHz Avance-III NMR spectrometer equipped with cryogenic probe, using the 4-mm-diameter NMR Shigemi-tube. The ¹⁵N R₁ and R₂ were obtained by measuring 2D ¹H-¹⁵N HSQC with the inversion-recovery technique and with the temperature-compensated CPMG method, respectively⁶⁷. Steady-state ¹⁵N-{¹H} NOE was obtained by measuring 2D ¹H-¹⁵N HSQC spectra with and without saturation pulse in each of the retardation time acquired by the interleaved method. The 2D ¹H-¹⁵N peaks were automatically identified and assigned using the MagRO software⁴⁰. Some assignments were corrected with visual inspection. The ¹⁵N-{¹H} NOE values were estimated as the peak intensity ratio I/I₀ derived from the 2D HSQC spectra with (I) and without (I₀) saturation pulse. The I/I₀ data were fitted by using an exponential equation, I/I₀ = exp(−R × t) with delay time t (s) to obtain the ¹⁵N relaxation rate constant R (s⁻¹).

2D ¹H-¹⁵N CLEANEX-PM FHSQC experiments

The uniformly ¹⁵N-labeled protein sample of H6_fold-U_Nomur was lyophilized, and then 2D ¹H-¹⁵N HSQC data were collected immediately after dissolving the lyophilized sample in 100% D₂O. However, protons of the amide groups of most residues were promptly replaced by deuterium within 10 min after the dissolution, probably due to the high pH of the sample solvent (pH 7.4). This prevented us to obtain practical H–D exchange rates. Therefore, the exchange rates between the water and amide protons were obtained using the 2D ¹H-¹⁵N CLEANEX-PM FHSQC^68,69 scheme. In this method, the exchange ratio depends only on k_open in the protein folding/unfolding. The amide group would be in the EX1 limit due to the relatively high pH of 7.4, namely k_close « k, where k_close is the global and/or local folding rate of a protein and k is the exchange rate of amide group in the unfolded state, the observable solvent exchange rate k_ex would be obtained as the global and/or local unfolding rate of a protein, k_open. The 2D ¹H-¹⁵N FHSQC data without applying spin-lock pulse was also measured under the same condition to obtain the reference, I₀. For 2D ¹H-¹⁵N CLEANEX-PM FHSQC spectra with different spin-lock time t_m and the reference spectrum, the observed peaks were automatically identified and assigned by MagRO⁴⁰ with manual correction to obtain a normalized list of signal intensities for each residue. The following equation was used to obtain k_obs for each residue:

\frac{I}{I_{0}} = \frac{k_{ex}}{k_{ex} + R_{1 A} - R_{1 B}} \times \{\exp (- R_{1 B} \times t_{m}) - \exp [- (R_{1 A} + k_{ex}) \times t_{m}]\},

where R_1B is the apparent longitudinal relaxation rate of water molecules, and R_1A is a mixture of the apparent longitudinal and transverse relaxation rates on the rotational frame for the residue of interest. The values of R_1A and k_ex for each residue with error values were obtained by curve-fitting by this equation, with the assumption, R_1B = 0.6 (s⁻¹).

X-ray structure determination of H5_fold-0_Elsa

Sample preparation for X-ray structure determination

The gene encoding the designed sequence of H5_fold-0_Elsa in pET21b vector was digested at the NdeI and XhoI restriction sites and cloned into pET15b-TEV vector with cleavable sites by TEV protease instead of thrombin (original) between the designed sequence and the N-terminal 6xHis tag. Designed protein was expressed in E. coli BL21 Star (DE3) cells, and purified by a Ni²⁺-affinity column. The N-terminal His tag was then cleaved by TEV protease, and removed through a Ni²⁺-affinity column. The protein samples without a His tag were purified by an anion-exchange chromatography (HiTrapQ HP 1-ml column, GE Healthcare) followed by gel filtration chromatography (Superdex 75 10/300 GL column) on an ÄKTA Pure 25 FPLC. Mass spectroscopy was performed to confirm that a His tag was successfully cleaved.

To assess the effect of the tag cleavage on the oligomeric state and stability, we performed SEC–MALS and thermal denaturation CD experiments under high pressure for the original and tag-cleaved samples of H5_fold-0_Elsa. The solvent was exchanged to PBS at pH 7.4 before these experiments. The results showed that the tag-cleaved protein was also monomeric and had nearly identical denaturation temperature (the second row in Fig. 5c, 106 °C) as the original sample with the C-terminal His tag (Supplementary Fig. 12, 105 °C), which indicates that the removal of tag and slight differences in flanking amino-acid sequences do not largely change the stability and oligomeric state of the designed protein in solution.

Crystallization and X-ray structure determination

The protein samples of H5_fold-0_Elsa at the concentration of 12 mg ml⁻¹ (1.07 mM) was crystallized in the solution of 0.4 M MgCl₂, 0.1 M Tris–HCl (pH 7.5) and 30% PEG 3350, using the sitting-drop vapor diffusion method at 296 K. The obtained crystals were soaked in the solution of 0.4 M MgCl₂, 0.1 M Tris–HCl (pH 7.5), 30% PEG 3350 and 10% glycerol, mounted on cryo-loops (Hampton Research), flash-cooled and stored in liquid nitrogen.

X-ray diffraction data of the crystal were collected with BL-1A beamline (λ = 1.1000 Å) at Photon Factory, and processed to 2.3 Å by XDS⁷⁰. After phase determination by molecular replacement using the design model by Molrep⁷¹ in the CCP4 suite, the molecular model was constructed and refined using Coot⁷² and Phenix Refine⁷³. Translation/Libration/Screw refinement was performed in late stages of refinement. The refined structures were validated with RAMPAGE⁷⁴. Ramachandran plot statistics showed that 98.8% and 0.00% of residues were in favored and outlier regions, respectively. The crystallographic data collection is summarized in Table 2.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41594-023-01147-9.

Supplementary information

Supplementary Information^{(11.5MB, pdf)}

Supplementary Text, Figs. 1–12 and Tables 1–9.

Reporting Summary^{(67KB, pdf)}

Peer Review File^{(1.4MB, pdf)}

Supplementary Data 1^{(203.2KB, zip)}

Computational design models of the designed all-α proteins.

Supplementary Data 2^{(290.4KB, txt)}

Source data for Supplementary Fig. 11.

Source data

Source Data Fig. 1^{(149.2KB, txt)}

Statistical source data.

Source Data Fig. 3^{(47.3MB, txt)}

Statistical source data.

Source Data Extended Data Fig. 1^{(5.8MB, zip)}

Statistical source data.

Source Data Extended Data Fig. 3^{(391.8KB, xlsx)}

Statistical source data.

Source Data Extended Data Figs. 4 and 5^{(956.3KB, txt)}

Statistical source data.

Source Data Extended Data Fig. 6^{(1.5MB, zip)}

Statistical source data.

Source Data Extended Data Fig. 7^{(344.1MB, zip)}

Statistical source data.

Acknowledgements

We thank RIKEN Yokohama NMR Facility for NMR measurements, the Functional Genomics Facility, NIBB Core Research Facilities, especially Y. Makino, for mass spectrometry analysis, and the Instrument Center, Okazaki, Japan, especially M. Nakano, for HSQC spectra measurements. We also thank M. Yamamoto for experimental assistance, S. Minami for advice on structure similarity comparison and useful discussions, and Y. Ishii and S. Akiyama for valuable comments on the paper. The computations were performed using the Research Center for Computational Science (RCCS), Okazaki, Japan (Project: 21-IMS-C174, 20-IMS-C157, 19-IMS-C175, 18-IMS-C155, 17-IMS-C147, 16-IMS-C129 and 15-IMS-C180). Three-dimensional structure determination was supported by Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS) from AMED under grant numbers JP19am0101072 and JP20am0101083. The synchrotron radiation experiments were performed at Photon Factory (proposals 2016G-048). We also thank the beamline staff at BL1A of Photon Factory (Tsukuba, Japan) for help during data collection. This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grants-in-Aid for Scientific Research 15H05592 to N. Koga, 18H05420 to T.K. and N. Koga, 18H05425 to T.M. and 18K06152 to Naohiro K., the Japan Science and Technology Agency (JST) Precursory Research for Embryonic Science and Technology (PRESTO, grant number JPMJPR13AD to N. Koga) and the JST-Mirai Program (JPMJMI17A2 to Naohiro K.). K. Sakuma was also supported by JSPS KAKENHI Grant-in-Aid for JSPS Research Fellow 15J02427.

Extended data

Author contributions

K. Sakuma and N. Koga designed the research. K. Sakuma analyzed natural proteins and performed computational design work. K. Sakuma wrote the program code. K. Sakuma, T.K. and R.T.-K. expressed, purified, characterized the designed proteins by biochemical assay, and prepared protein samples for NMR: H5_fold-0, H6_fold-C and H6_fold-Z, by K. Sakuma, and H6_fold-U and H7_fold-K, by T.K. and R.T.-K. For NMR structure determination, Naohiro K., T.S. and T.N. collected data: H5_fold-0_Chantal, H5_fold-C_Rei and H6_fold-Z_Gogy, by T.S., and H6_fold-U_Nomur and H7_fold-K_Mussoc, by Naohiro K. and T.N. Naohiro K. performed structural analysis. For NMR structural dynamics of H6_fold-U_Nomur, Naohiro K. and T.N. collected and analyzed data. For crystal structure determination, Naoya K. prepared protein samples and K. Suzuki, with advice from T.M., performed crystallization and structural analysis. K. Sakuma, Naohiro K., T.F., K. Suzuki, T.M., T.K., R.T.-K. and N. Koga wrote the paper.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks John Anderson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Sara Osman, in collaboration with the Nature Structural & Molecular Biology team.

Data availability

The solution NMR structures of the five designs have been deposited in the PDB under the accession numbers 7BQM (H5_fold-0_Chantal), 7BQN(H6_fold-C_Rei), 7BQQ (H6_fold-Z_Gogy), 7BQS (H6_fold-U_Nomur) and 7BQR (H7_fold-K_Mussoc). The NMR data have been deposited in the Biological Magnetic Resonance Data Bank under the accession numbers 36335(H5_fold-0_Chantal), 36336 (H6_fold-C_Rei), 36337 (H6_fold-Z_Gogy), 36339 (H6_fold-U_Nomur) and 36338 (H7_fold-K_Mussoc). The crystal structure of H5_fold-0_Elsa has been deposited in the PDB under the accession number 7DNS. The computational design models are presented as Supplementary Data 1. The generated compact and steric-clash-free five-helix (1,899,355) and six-helix (380,869) structures are available at https://github.com/kogalab21/all-alpha_design. The plasmids encoding the designed sequences are available through Addgene under the accession numbers 201825 (H5_fold-0_Elsa), 201826(H5_fold-0_Chantal), 201827 (H6_fold-C_Rei), 201828 (H6_fold-Z_Gogy), 201829(H6_fold-U_Nomur) and 201830 (H7_fold-K_Mussoc). Source data are provided with this paper.

Code availability

The code for building and analyzing helical structures has been implemented into Rosetta at https://github.com/RosettaCommons/main/tree/koga/all-alpha_design. The demo for building helical structures is available at https://github.com/kogalab21/all-alpha_design.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Koya Sakuma, Naohiro Kobayashi.

Extended data

is available for this paper at 10.1038/s41594-023-01147-9.

Supplementary information

The online version contains supplementary material available at 10.1038/s41594-023-01147-9.

References

1.Kendrew, J. C. et al. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature181, 662–666 (1958). [DOI] [PubMed] [Google Scholar]
2.Crick, F. H. C. The packing of α-helices: simple coiled-coils. Acta Crystallogr.6, 689–697 (1953). [Google Scholar]
3.Chothia, C., Levitt, M. & Richardson, D. Structure of proteins: packing of alpha-helices and pleated sheets. Proc. Natl Acad. Sci. USA74, 4130–4134 (1977). [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Kobe, B. & Kajava, A. V. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci.25, 509–515 (2000). [DOI] [PubMed] [Google Scholar]
5.Doyle, L. et al. Rational design of alpha-helical tandem repeat proteins with closed architectures. Nature528, 585–588 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Walsh, S. T., Cheng, H., Bryson, J. W., Roder, H. & DeGrado, W. F. Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl Acad. Sci. USA96, 5486–5491 (1999). [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Dai, Q. H. et al. Structure of a de novo designed protein model of radical enzymes. J. Am. Chem. Soc.124, 10952–10953 (2002). [DOI] [PubMed] [Google Scholar]
8.Wei, Y., Kim, S., Fela, D., Baum, J. & Hecht, M. H. Solution structure of a de novo protein from a designed combinatorial library. Proc. Natl Acad. Sci. USA100, 13270–13273 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Go, A., Kim, S., Baum, J. & Hecht, M. H. Structure and dynamics of de novo proteins from a designed superfamily of 4-helix bundles. Protein Sci.17, 821–832 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Calhoun, J. R. et al. Solution NMR structure of a designed metalloprotein and complementary molecular dynamics refinement. Structure16, 210–215 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Huang, P. S. et al. High thermodynamic stability of parametrically designed helical bundles. Science346, 481–485 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Murphy, G. S. et al. Computational de novo design of a four-helix bundle protein–DND_4HB. Protein Sci.24, 434–445 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Brunette, T. J. et al. Exploring the repeat protein universe through computational protein design. Nature528, 580–584 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science352, 680–687 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Jacobs, T. M. et al. Design of structurally distinct proteins using strategies inspired by evolution. Science352, 687–690 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature538, 329–335 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature550, 74–79 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science357, 168–175 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Polizzi, N. F. et al. De novo design of a hyperstable non-natural protein–ligand complex with sub-A accuracy. Nat. Chem.9, 1157–1164 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Studer, S. et al. Evolution of a highly active and enantiospecific metalloenzyme from short peptides. Science362, 1285–1288 (2018). [DOI] [PubMed] [Google Scholar]
21.Koebke, K. J. et al. Clarifying the copper coordination environment in a de novo designed red copper protein. Inorg. Chem.57, 12291–12302 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
22.ElGamacy, M., Coles, M. & Lupas, A. Asymmetric protein design from conserved supersecondary structures. J. Struct. Biol.204, 380–387 (2018). [DOI] [PubMed] [Google Scholar]
23.Chen, Z. et al. Programmable design of orthogonal protein heterodimers. Nature565, 106–111 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Silva, D. A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature565, 186–191 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Xu, C. et al. Computational design of transmembrane pores. Nature585, 129–134 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science370, 426–431 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science368, eaay5051 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Murzin, A. G. & Finkelstein, A. V. General architecture of the α-helical globule. J. Mol. Biol.204, 749–769 (1988). [DOI] [PubMed] [Google Scholar]
29.Koga, N. et al. Principles for designing ideal protein structures. Nature491, 222–227 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Rodriguez, A. & Laio, A. Machine learning. Clustering by fast search and find of density peaks. Science344, 1492–1496 (2014). [DOI] [PubMed] [Google Scholar]
31.Richardson, J. S. & Richardson, D. C. Amino acid preferences for specific locations at the ends of α helices. Science240, 1648–1652 (1988). [DOI] [PubMed] [Google Scholar]
32.Doig, A. J. & Baldwin, R. L. N- and C-capping preferences for all 20 amino acids in α-helical peptides. Protein Sci.4, 1325–1336 (1995). [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Efimov, A. V. A novel super-secondary structure of proteins and the relation between the structure and the amino acid sequence. FEBS Lett.166, 33–38 (1984). [DOI] [PubMed] [Google Scholar]
34.Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res.33, 2302–2309 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput. Biol.10, e1003926 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science302, 1364–1368 (2003). [DOI] [PubMed] [Google Scholar]
37.Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol.487, 545–574 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Sheffler, W. & Baker, D. RosettaHoles2: a volumetric packing measure for protein structure refinement and validation. Protein Sci.19, 1991–1995 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Rohl, C. A., Strauss, C. E., Misura, K. M. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol.383, 66–93 (2004). [DOI] [PubMed] [Google Scholar]
40.Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR39, 31–52 (2007). [DOI] [PubMed] [Google Scholar]
41.Kobayashi, N. et al. Noise peak filtering in multi-dimensional NMR spectra using convolutional neural networks. Bioinformatics34, 4300–4301 (2018). [DOI] [PubMed] [Google Scholar]
42.Myers, J. K., Pace, C. N. & Scholtz, J. M. Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci.4, 2138–2148 (1995). [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Schiffer, M. & Edmundson, A. B. Use of helical wheels to represent the structures of proteins and to identify segments with helical potential. Biophys. J.7, 121–135 (1967). [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Crick, F. H. C. The Fourier transform of a coiled-coil. Acta Crystallogr.6, 685–689 (1953). [Google Scholar]
45.Lin, Y. R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA112, E5478–E5485 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Huang, P. S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol.12, 29–34 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Marcos, E. et al. Principles for designing proteins with cavities formed by curved beta sheets. Science355, 201–206 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Marcos, E. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat. Struct. Mol. Biol.25, 1028–1034 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature561, 485–491 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Klein, J. C. et al. Multiplex pairwise assembly of array-derived DNA oligonucleotides. Nucleic Acids Res.44, e43 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D. & Kosuri, S. Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science359, 343–347 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Basanta, B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl Acad. Sci. USA117, 22135–22145 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Koradi, R., Billeter, M. & Wuthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph14, 51–55 (1996). [DOI] [PubMed] [Google Scholar]
54.Kobayashi, N. A robust method for quantitative identification of ordered cores in an ensemble of biomolecular structures by non-linear multi-dimensional scaling using inter-atomic distance variance matrix. J. Biomol. NMR58, 61–67 (2014). [DOI] [PubMed] [Google Scholar]
55.Krissinel, E. & Henrick, K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D60, 2256–2268 (2004). [DOI] [PubMed] [Google Scholar]
56.Minami, S., Sawada, K. & Chikenji, G. MICAN: a protein structure alignment algorithm that can handle multiple-chains, inverse alignments, C α only models, alternative alignments, and non-sequential alignments. BMC Bioinf.14, 24 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
57.Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers22, 2577–2637 (1983). [DOI] [PubMed] [Google Scholar]
58.Orengo, C. A. et al. CATH–a hierarchic classification of protein domain structures. Structure5, 1093–1108 (1997). [DOI] [PubMed] [Google Scholar]
59.Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics19, 1589–1591 (2003). [DOI] [PubMed]
60.Jansson, M. et al. High-level production of uniformly N-15- and C-13-enriched fusion proteins in Escherichia coli. J. Biomol. NMR7, 131–141 (1996). [DOI] [PubMed] [Google Scholar]
61.Santoro, M. M. & Bolen, D. W. Unfolding free energy changes determined by the linear extrapolation method. 1. Unfolding of phenylmethanesulfonyl α-chymotrypsin using different denaturants. Biochemistry27, 8063–8068 (1988). [DOI] [PubMed] [Google Scholar]
62.Hassanieh, H., Mayzel, M., Shi, L., Katabi, D. & Orekhov, V. Y. Fast multi-dimensional NMR acquisition and processing using the sparse FFT. J. Biomol. NMR63, 9–19 (2015). [DOI] [PubMed] [Google Scholar]
63.Schmidt, E. & Guntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc.134, 12817–12829 (2012). [DOI] [PubMed] [Google Scholar]
64.Shen, Y., Delaglio, F., Cornilescu, G. & Bax, A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR44, 213–223 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
65.Guntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR62, 453–471 (2015). [DOI] [PubMed] [Google Scholar]
66.Zweckstetter, M. & Bax, A. Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. JACS122, 3791–3792 (2000). [Google Scholar]
67.Farrow, N. A. et al. Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15 N NMR relaxation. Biochemistry33, 5984–6003 (1994). [DOI] [PubMed] [Google Scholar]
68.Hwang, T. L., Mori, S., Shaka, A. J. & vanZijl, P. C. M. Application of phase-modulated CLEAN chemical EXchange spectroscopy (CLEANEX-PM) to detect water–protein proton exchange and intermolecular NOEs. JACS119, 6203–6204 (1997). [Google Scholar]
69.Hwang, T. L., van Zijl, P. C. & Mori, S. Accurate quantitation of water-amide proton exchange rates using the phase-modulated CLEAN chemical EXchange (CLEANEX-PM) approach with a Fast-HSQC (FHSQC) detection scheme. J. Biomol. NMR11, 221–226 (1998). [DOI] [PubMed] [Google Scholar]
70.Kabsch, W. Xds. Acta Crystallogr. D66, 125–132 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
71.Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr. D66, 22–25 (2010). [DOI] [PubMed] [Google Scholar]
72.Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D60, 2126–2132 (2004). [DOI] [PubMed] [Google Scholar]
73.Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D66, 213–221 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
74.Lovell, S. C. et al. Structure validation by Cα geometry: ϕ,ψ and Cβ deviation. Proteins50, 437–450 (2003). [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(11.5MB, pdf)}

Supplementary Text, Figs. 1–12 and Tables 1–9.

Reporting Summary^{(67KB, pdf)}

Peer Review File^{(1.4MB, pdf)}

Supplementary Data 1^{(203.2KB, zip)}

Computational design models of the designed all-α proteins.

Supplementary Data 2^{(290.4KB, txt)}

Source data for Supplementary Fig. 11.

Source Data Fig. 1^{(149.2KB, txt)}

Statistical source data.

Source Data Fig. 3^{(47.3MB, txt)}

Statistical source data.

Source Data Extended Data Fig. 1^{(5.8MB, zip)}

Statistical source data.

Source Data Extended Data Fig. 3^{(391.8KB, xlsx)}

Statistical source data.

Source Data Extended Data Figs. 4 and 5^{(956.3KB, txt)}

Statistical source data.

Source Data Extended Data Fig. 6^{(1.5MB, zip)}

Statistical source data.

Source Data Extended Data Fig. 7^{(344.1MB, zip)}

Statistical source data.

Data Availability Statement

[CR1] 1.Kendrew, J. C. et al. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature181, 662–666 (1958). [DOI] [PubMed] [Google Scholar]

[CR2] 2.Crick, F. H. C. The packing of α-helices: simple coiled-coils. Acta Crystallogr.6, 689–697 (1953). [Google Scholar]

[CR3] 3.Chothia, C., Levitt, M. & Richardson, D. Structure of proteins: packing of alpha-helices and pleated sheets. Proc. Natl Acad. Sci. USA74, 4130–4134 (1977). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Kobe, B. & Kajava, A. V. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci.25, 509–515 (2000). [DOI] [PubMed] [Google Scholar]

[CR5] 5.Doyle, L. et al. Rational design of alpha-helical tandem repeat proteins with closed architectures. Nature528, 585–588 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR6] 6.Walsh, S. T., Cheng, H., Bryson, J. W., Roder, H. & DeGrado, W. F. Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl Acad. Sci. USA96, 5486–5491 (1999). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Dai, Q. H. et al. Structure of a de novo designed protein model of radical enzymes. J. Am. Chem. Soc.124, 10952–10953 (2002). [DOI] [PubMed] [Google Scholar]

[CR8] 8.Wei, Y., Kim, S., Fela, D., Baum, J. & Hecht, M. H. Solution structure of a de novo protein from a designed combinatorial library. Proc. Natl Acad. Sci. USA100, 13270–13273 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR9] 9.Go, A., Kim, S., Baum, J. & Hecht, M. H. Structure and dynamics of de novo proteins from a designed superfamily of 4-helix bundles. Protein Sci.17, 821–832 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Calhoun, J. R. et al. Solution NMR structure of a designed metalloprotein and complementary molecular dynamics refinement. Structure16, 210–215 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Huang, P. S. et al. High thermodynamic stability of parametrically designed helical bundles. Science346, 481–485 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Murphy, G. S. et al. Computational de novo design of a four-helix bundle protein–DND_4HB. Protein Sci.24, 434–445 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Brunette, T. J. et al. Exploring the repeat protein universe through computational protein design. Nature528, 580–584 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science352, 680–687 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Jacobs, T. M. et al. Design of structurally distinct proteins using strategies inspired by evolution. Science352, 687–690 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature538, 329–335 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature550, 74–79 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science357, 168–175 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Polizzi, N. F. et al. De novo design of a hyperstable non-natural protein–ligand complex with sub-A accuracy. Nat. Chem.9, 1157–1164 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Studer, S. et al. Evolution of a highly active and enantiospecific metalloenzyme from short peptides. Science362, 1285–1288 (2018). [DOI] [PubMed] [Google Scholar]

[CR21] 21.Koebke, K. J. et al. Clarifying the copper coordination environment in a de novo designed red copper protein. Inorg. Chem.57, 12291–12302 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.ElGamacy, M., Coles, M. & Lupas, A. Asymmetric protein design from conserved supersecondary structures. J. Struct. Biol.204, 380–387 (2018). [DOI] [PubMed] [Google Scholar]

[CR23] 23.Chen, Z. et al. Programmable design of orthogonal protein heterodimers. Nature565, 106–111 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Silva, D. A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature565, 186–191 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Xu, C. et al. Computational design of transmembrane pores. Nature585, 129–134 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science370, 426–431 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science368, eaay5051 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Murzin, A. G. & Finkelstein, A. V. General architecture of the α-helical globule. J. Mol. Biol.204, 749–769 (1988). [DOI] [PubMed] [Google Scholar]

[CR29] 29.Koga, N. et al. Principles for designing ideal protein structures. Nature491, 222–227 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Rodriguez, A. & Laio, A. Machine learning. Clustering by fast search and find of density peaks. Science344, 1492–1496 (2014). [DOI] [PubMed] [Google Scholar]

[CR31] 31.Richardson, J. S. & Richardson, D. C. Amino acid preferences for specific locations at the ends of α helices. Science240, 1648–1652 (1988). [DOI] [PubMed] [Google Scholar]

[CR32] 32.Doig, A. J. & Baldwin, R. L. N- and C-capping preferences for all 20 amino acids in α-helical peptides. Protein Sci.4, 1325–1336 (1995). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Efimov, A. V. A novel super-secondary structure of proteins and the relation between the structure and the amino acid sequence. FEBS Lett.166, 33–38 (1984). [DOI] [PubMed] [Google Scholar]

[CR34] 34.Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res.33, 2302–2309 (2005). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput. Biol.10, e1003926 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] 36.Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science302, 1364–1368 (2003). [DOI] [PubMed] [Google Scholar]

[CR37] 37.Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol.487, 545–574 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Sheffler, W. & Baker, D. RosettaHoles2: a volumetric packing measure for protein structure refinement and validation. Protein Sci.19, 1991–1995 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR39] 39.Rohl, C. A., Strauss, C. E., Misura, K. M. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol.383, 66–93 (2004). [DOI] [PubMed] [Google Scholar]

[CR40] 40.Kobayashi, N. et al. KUJIRA, a package of integrated modules for systematic and interactive analysis of NMR data directed to high-throughput NMR structure studies. J. Biomol. NMR39, 31–52 (2007). [DOI] [PubMed] [Google Scholar]

[CR41] 41.Kobayashi, N. et al. Noise peak filtering in multi-dimensional NMR spectra using convolutional neural networks. Bioinformatics34, 4300–4301 (2018). [DOI] [PubMed] [Google Scholar]

[CR42] 42.Myers, J. K., Pace, C. N. & Scholtz, J. M. Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci.4, 2138–2148 (1995). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Schiffer, M. & Edmundson, A. B. Use of helical wheels to represent the structures of proteins and to identify segments with helical potential. Biophys. J.7, 121–135 (1967). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Crick, F. H. C. The Fourier transform of a coiled-coil. Acta Crystallogr.6, 685–689 (1953). [Google Scholar]

[CR45] 45.Lin, Y. R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA112, E5478–E5485 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR46] 46.Huang, P. S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol.12, 29–34 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR47] 47.Marcos, E. et al. Principles for designing proteins with cavities formed by curved beta sheets. Science355, 201–206 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR48] 48.Marcos, E. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat. Struct. Mol. Biol.25, 1028–1034 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature561, 485–491 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR50] 50.Klein, J. C. et al. Multiplex pairwise assembly of array-derived DNA oligonucleotides. Nucleic Acids Res.44, e43 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR51] 51.Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D. & Kosuri, S. Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science359, 343–347 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR52] 52.Basanta, B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl Acad. Sci. USA117, 22135–22145 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR53] 53.Koradi, R., Billeter, M. & Wuthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph14, 51–55 (1996). [DOI] [PubMed] [Google Scholar]

[CR54] 54.Kobayashi, N. A robust method for quantitative identification of ordered cores in an ensemble of biomolecular structures by non-linear multi-dimensional scaling using inter-atomic distance variance matrix. J. Biomol. NMR58, 61–67 (2014). [DOI] [PubMed] [Google Scholar]

[CR55] 55.Krissinel, E. & Henrick, K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D60, 2256–2268 (2004). [DOI] [PubMed] [Google Scholar]

[CR56] 56.Minami, S., Sawada, K. & Chikenji, G. MICAN: a protein structure alignment algorithm that can handle multiple-chains, inverse alignments, C α only models, alternative alignments, and non-sequential alignments. BMC Bioinf.14, 24 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR57] 57.Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers22, 2577–2637 (1983). [DOI] [PubMed] [Google Scholar]

[CR58] 58.Orengo, C. A. et al. CATH–a hierarchic classification of protein domain structures. Structure5, 1093–1108 (1997). [DOI] [PubMed] [Google Scholar]

[CR59] 59.Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics19, 1589–1591 (2003). [DOI] [PubMed]

[CR60] 60.Jansson, M. et al. High-level production of uniformly N-15- and C-13-enriched fusion proteins in Escherichia coli. J. Biomol. NMR7, 131–141 (1996). [DOI] [PubMed] [Google Scholar]

[CR61] 61.Santoro, M. M. & Bolen, D. W. Unfolding free energy changes determined by the linear extrapolation method. 1. Unfolding of phenylmethanesulfonyl α-chymotrypsin using different denaturants. Biochemistry27, 8063–8068 (1988). [DOI] [PubMed] [Google Scholar]

[CR62] 62.Hassanieh, H., Mayzel, M., Shi, L., Katabi, D. & Orekhov, V. Y. Fast multi-dimensional NMR acquisition and processing using the sparse FFT. J. Biomol. NMR63, 9–19 (2015). [DOI] [PubMed] [Google Scholar]

[CR63] 63.Schmidt, E. & Guntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc.134, 12817–12829 (2012). [DOI] [PubMed] [Google Scholar]

[CR64] 64.Shen, Y., Delaglio, F., Cornilescu, G. & Bax, A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR44, 213–223 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Guntert, P. & Buchner, L. Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR62, 453–471 (2015). [DOI] [PubMed] [Google Scholar]

[CR66] 66.Zweckstetter, M. & Bax, A. Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. JACS122, 3791–3792 (2000). [Google Scholar]

[CR67] 67.Farrow, N. A. et al. Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15 N NMR relaxation. Biochemistry33, 5984–6003 (1994). [DOI] [PubMed] [Google Scholar]

[CR68] 68.Hwang, T. L., Mori, S., Shaka, A. J. & vanZijl, P. C. M. Application of phase-modulated CLEAN chemical EXchange spectroscopy (CLEANEX-PM) to detect water–protein proton exchange and intermolecular NOEs. JACS119, 6203–6204 (1997). [Google Scholar]

[CR69] 69.Hwang, T. L., van Zijl, P. C. & Mori, S. Accurate quantitation of water-amide proton exchange rates using the phase-modulated CLEAN chemical EXchange (CLEANEX-PM) approach with a Fast-HSQC (FHSQC) detection scheme. J. Biomol. NMR11, 221–226 (1998). [DOI] [PubMed] [Google Scholar]

[CR70] 70.Kabsch, W. Xds. Acta Crystallogr. D66, 125–132 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR71] 71.Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr. D66, 22–25 (2010). [DOI] [PubMed] [Google Scholar]

[CR72] 72.Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D60, 2126–2132 (2004). [DOI] [PubMed] [Google Scholar]

[CR73] 73.Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D66, 213–221 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR74] 74.Lovell, S. C. et al. Structure validation by Cα geometry: ϕ,ψ and Cβ deviation. Proteins50, 437–450 (2003). [DOI] [PubMed] [Google Scholar]

PERMALINK

Design of complicated all-α protein structures

Koya Sakuma

Naohiro Kobayashi

Toshihiko Sugiki

Toshio Nagashima

Toshimichi Fujiwara

Kano Suzuki

Naoya Kobayashi

Takeshi Murata

Takahiro Kosugi

Rie Tatsumi-Koga

Nobuyasu Koga

Abstract

Main

Fig. 1. Comparison of the structural complexities of naturally occurring and de novo designed proteins.

Extended Data Fig. 1. Twenty-two de novo designed proteins collected from Protein Data Bank (PDB).

Strategy for all-α topology building

Fig. 2. Strategy for building α-helical backbone structure topologies.

A typical set of HLH motifs as building blocks

Extended Data Fig. 2. The five features representing the HLH tertiary geometry.

Extended Data Fig. 3. Statistical analysis of HLH motifs of naturally occurring proteins.

Fig. 3. 18 HLH tertiary motifs and generated α-helical backbone structures.

Extended Data Fig. 4. Mapping of 18 representative HLH motifs on the ϕNC-ϕNL plane.

Extended Data Fig. 5. ABEGO-based loop geometries and amino acid sequence preferences of the cluster that each HLH motif belongs to.

Generation of all-α topologies by combinatorial enumeration

Extended Data Fig. 6. Examples of compact structures obtained from the enumeration of six-helix structures.

Extended Data Fig. 7. Distribution of the helix lengths for the generated backbone structures with four-helix and 70 residues, five-helix and 90 residues, and six-helix and 110 residues.

Design of complicated α-helical topologies

Fig. 4. Backbone structures for the five design target topologies.

Extended Data Fig. 8. Comparison of designed structures and the most similar naturally occurring proteins.

Experimental characterization of designed proteins

Extended Data Table 1.

Fig. 5. Characterization of designed proteins.

Table 1.

Fig. 6. Comparison of computational models with experimentally determined structures.

Table 2.

Extended Data Fig. 9. Comparison of computational models (left) with experimentally determined structures (right).

Discussion

Methods

Definition of HO

Analysis of all-α protein structures for de novo designed and naturally occurring proteins

Clustering of HLH units using the five features representing a HLH geometry

Building backbone structures

Selection of designs for experimental characterization based on the shapes of energy landscapes

Expression and purification of designed proteins

Experiments to identify designed proteins exhibiting folding ability

CD spectroscopy under 1-bar pressure

SEC–MALS

1H-15N HSQC NMR spectroscopy

High-pressure CD spectroscopy for melting temperature (Tm) estimation

CD spectroscopy for chemical denaturation

Sample preparation for NMR structure determination

Solution structure determination by NMR

NMR measurements

NMR signal assignments

Structure calculation

NMR structure validation

Solution structural dynamics of H6_fold-U_Nomur measured by NMR

15N R1, R2 and 15N-{1H} NOE experiments

2D 1H-15N CLEANEX-PM FHSQC experiments

X-ray structure determination of H5_fold-0_Elsa

Sample preparation for X-ray structure determination

Crystallization and X-ray structure determination

Reporting summary

Online content

Supplementary information

Source data

Acknowledgements

Extended data

Author contributions

Peer review

Peer review information

Data availability

Code availability

Competing interests

Footnotes

Extended data

Supplementary information

References

Extended Data Fig. 4. Mapping of 18 representative HLH motifs on the ϕ_NC-ϕ_NL plane.

¹H-¹⁵N HSQC NMR spectroscopy

High-pressure CD spectroscopy for melting temperature (T_m) estimation

¹⁵N R₁, R₂ and ¹⁵N-{¹H} NOE experiments

2D ¹H-¹⁵N CLEANEX-PM FHSQC experiments