eLife. 2015 Dec 9;4:e11012. doi: 10.7554/eLife.11012

Precise assembly of complex beta sheet topologies from de novo designed building blocks

Indigo Chris King 1,*, James Gleixner 1, Lindsey Doyle 2, Alexandre Kuzin 3, John F Hunt 3, Rong Xiao 4, Gaetano T Montelione 4, Barry L Stoddard 2, Frank DiMaio 1, David Baker 1
Editor: Nir Ben-Tal 5
PMCID: PMC4737653  PMID: 26650357

Abstract

Design of complex alpha-beta protein topologies poses a challenge because of the large number of alternative packing arrangements. A similar challenge presumably limited the emergence of large and complex protein topologies in evolution. Here, we demonstrate that protein topologies with six- and seven-stranded beta sheets can be designed by inserting one de novo designed beta sheet-containing protein into another such that the two beta sheets merge to form a single extended sheet, followed by amino acid sequence optimization at the newly formed strand-strand, strand-helix, and helix-helix interfaces. Crystal structures of two such designs closely match the computational design models. Searches for similar structures in the SCOP protein domain database yield only weak matches with different beta sheet connectivities. A similar beta sheet fusion mechanism may have contributed to the emergence of complex beta sheets during natural protein evolution.

DOI: http://dx.doi.org/10.7554/eLife.11012.001

Research Organism: E. coli

eLife digest

A protein is made up of a sequence of amino acids and must fold into a specific three-dimensional structure if it is to work correctly. The structure is formed by segments of the protein adopting specific shapes, the two most common shapes being alpha helices and beta strands. Beta strands commonly interact with each other to form regions called beta sheets.

Researchers trying to design proteins with new abilities have managed to create proteins that contain up to five beta strands and four alpha helices. Larger and more complex proteins are more challenging to make because there are many different ways that a protein can fold. It is also difficult to understand how complex structures such as large beta sheets emerged naturally, over the course of evolution.

King et al. have now used computer modeling to explore how a large, complex beta sheet might form. In the model, one small, newly designed protein was inserted into another so that their beta sheets merged to form a single extended sheet. The model then stabilized this structure by changing the amino acids found at the points where the two proteins met.

King et al. were then able to synthesize these new proteins in bacteria and use a technique called X-ray crystallography to determine the structure of two of them. The structures closely matched the computer models; one protein contained a six-stranded beta sheet, and the other had a seven-stranded beta sheet. The folds of the two designed proteins were then compared with those found in a database that classifies proteins on the basis of their structure. The beta sheets in the designed proteins did not match the protein structures in the database, which suggests that the designed proteins contained new types of folds.

In the future, the technique used by King et al. could be used to design other large and complex beta sheet structures. Furthermore, the results suggest that such large structures could have evolved naturally through the combination of smaller, less complex proteins.

DOI: http://dx.doi.org/10.7554/eLife.11012.002

Introduction

Modular domains constitute the primary structural and functional units of natural proteins. Multi-domain proteins likely evolved through simple linear concatenation of successive domains onto the polypeptide chain or through the insertion of one or more continuous sequences into the middle of another, now discontinuous domain (Aroul-Selvam et al., 2004; Berrondo et al., 2008; Lupas et al., 2001; Pandya et al., 2013). By analogy, new proteins have been engineered from existing domains by simple linear concatenation or insertion of one domain into another (Ay et al., 1998; Collinet et al., 2000; Cutler et al., 2009; Doi and Yanagawa, 1999; Edwards et al., 2008; Guntas and Ostermeier, 2004; Ostermeier, 2005). How individual domains evolved, in contrast, is much less clear. Both experimental and computational analyses have suggested that new folds can evolve by insertion of one fold into another (Lupas et al., 2001; Grishin, 2001; Söding and Lupas, 2003; Krishna and Grishin, 2005; Friedberg and Godzik, 2005; Ben-Tal and Kolodny, 2014), but to our knowledge, there is no evidence that complex beta sheet topologies can be formed in this manner. On the protein design front, there has been progress in de novo design of idealized helical bundles (Park et al., 2015) and alpha beta protein structures with up to 5 strands (Koga et al., 2012), and although new folds have been generated by tandem fusion of natural protein domains followed by introduction of additional stabilizing mutations (Hocker et al., 2004; Shanmugaratnam et al., 2012), assembly of large and complex beta sheets poses a challenge for de novo protein design.

One possible route to the large and complex beta sheet topologies found in many native protein domains is recombination of two smaller beta sheet domains. Here, we explore the viability of such a mechanism by inserting one de novo designed alpha beta protein into another such that the two beta sheets are combined into one. The backbone geometry at the junctions between the original domains is regularized, and the sequence at the newly formed interface is optimized to stabilize the single integrated domain structure. Crystal structures of two such proteins demonstrate that complex beta sheet structures can be designed with considerable accuracy using this approach and provide a proof-of-concept for the hypothesis that complex beta topologies in natural proteins may have evolved from simpler beta sheet structures in a similar manner.

Results

A first extended sheet protein was created by inserting a designed ferredoxin domain into a beta turn of the designed TOP7 protein to create a half-barrel structure, with the two sheets fused into a single seven strand sheet flanked by four helices (Figure 1A). The CD spectra show both alpha and beta structures (Figure 2—figure supplement 1). Two crystal structures (NESG target OR327) were solved by molecular replacement and refined to 2.49 Å (PDB entry 4KYZ) and 2.96 Å (PDB entry 4KY3) resolutions. Further analysis refers only to the higher resolution structure (4KYZ). The structure shows excellent agreement with the design model (Figure 2A), particularly in low B-factor regions, with C-alpha RMSD ranging from 1.76 to 1.85 Å among the four protomers in the crystal. The relative orientation of the strands packed against the helices is close to that in the design model, and core sidechains at the designed interfaces are in very similar conformations in the design model and crystal (Figure 2B,C).

Figure 1. Generation of protein domains with single extended beta sheets by inserting one beta sheet-containing protein into another.

(A) Insertion of a ferredoxin domain (purple) into TOP7 (red). (B) Insertion of one ferredoxin domain into another. In both cases, two beta strands from each partner (red and purple) are concatenated to form the central strand pair of the fusion protein (pink).

DOI: http://dx.doi.org/10.7554/eLife.11012.003

Figure 2. Comparison of the crystal structure of the ferredoxin-TOP7 fusion to the design model.

(A) Backbone superposition of the crystal structure of ferredoxin-TOP7 (4KYZ, chain A) with the design model. The backbones of the two proteins are nearly identical. (B, C) The core sidechain packing in the ferredoxin-TOP7 fusion is very similar in the crystal structure and design model in both the insert (B) and host (C) domains. The crystal structure is colored by B-factor and the design model is in gray.

DOI: http://dx.doi.org/10.7554/eLife.11012.004

Figure 2—figure supplement 1. The circular dichroism spectrum of ferredoxin-TOP7 has the shape expected for an alpha/beta protein.

A second extended sheet protein was created by inserting one designed ferredoxin domain into another to create a half-barrel structure with four alpha helices and six beta strands (Figure 1B). A beta turn segment between two beta strands of the host ferredoxin was removed and the resulting cut-points in the host beta strands were linked to two beta strand cut-points in the insert, fusing the two strand pairs into a single, longer pair at the center of a six-stranded beta sheet. CD spectra show that the protein contains both alpha and beta structures (Figure 3—figure supplement 1). Crystals were obtained that diffracted to 3.3 Å resolution. Molecular replacement using the computational design models (DiMaio et al., 2013) yielded a solution for which the refinement statistics are shown in Supplementary file 1 (PDB entry 5CW9). Attempts to improve these statistics by rebuilding portions of the model were unsuccessful, possibly because of a register shift or dynamic fluctuations in the structure (perhaps corresponding to slightly 'molten-globule'-like behavior) that are difficult to model computationally. However, unbiased low-resolution omit maps suggest that the overall topology is correct (Figure 3—figure supplement 2). In the model that displays the best refinement statistics, the protein backbone was similar to the design model, with a C-alpha RMSD value of 2 Å (Figure 3A,B). The fused beta sheet aligns with the design model, while the inter-domain helices shift slightly to accommodate the inter-domain interface. The sidechain packing between the newly juxtaposed beta strands succeeded in anchoring the secondary structure elements in their intended orientations, but the low resolution of the crystal structure prevents evaluation of the atomic-level accuracy of the design (Figure 3—figure supplement 2).

Figure 3. Comparison of the crystal structure of the ferredoxin-ferredoxin fusion to the design model.

The crystal structure (5CW9) aligns well with the design model over both the helices (A) and the fused beta sheet (B).

DOI: http://dx.doi.org/10.7554/eLife.11012.006

Figure 3—figure supplement 1. Circular dichroism spectra of ferredoxin-ferredoxin at 25°C.

Figure 3—figure supplement 2. Ferredoxin-Ferredoxin 2Fo-Fc omit map superimposed with crystal structure shows core packing of host (A) and insert (B) domains.

To compare the folds of these designed proteins to those in the SCOP v.1.75 domain database (Murzin et al., 1995), the TM-align structure-structure comparison method was used to search a 70% sequence non-redundant set of SCOP domains (Ben-Tal and Kolodny, 2014) for structure alignments containing a minimum 75% overlap with the designed proteins. The most similar SCOP domains had weak TM-align scores (0.54 and 0.51), and the sheets in these matched structures have different connectivities than those of the designs, suggesting that the two designed proteins have novel folds (Figure 4). While there are no domains with globally similar folds, both designed proteins are similar to a number of SCOP domains over the ferredoxin-like substructure(s) (maps of the proteins to the domain network of Nepomnyachiy et al. (Ben-Tal and Kolodny, 2014) are shown in Figure 4—figure supplement 1). The mutations introduced at the redesign stage of the domain insertion design protocol are compatible with the parent fold structures with minimal perturbation of the protein backbone (Figure 4—figure supplement 2), suggesting that the designed folds would have the potential to evolve from insertion followed by neutral mutational drift of the parent structures.
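The hit selection described above (keep alignments covering at least 75% of the designed protein, then rank by TM-score) can be sketched as follows. This is a minimal illustration, not the script used in the study: the `Hit` record, its field names, the domain identifiers, and all numbers in the example are hypothetical placeholders, and coverage is assumed to mean aligned length divided by query length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One structure alignment against a SCOP domain (hypothetical record)."""
    domain_id: str       # placeholder identifier, not a real SCOP id
    tm_score: float      # TM-score reported for the alignment
    aligned_length: int  # number of aligned residues


def rank_hits(hits, query_length, min_coverage=0.75):
    """Keep hits covering >= min_coverage of the query, best TM-score first."""
    kept = [h for h in hits if h.aligned_length / query_length >= min_coverage]
    return sorted(kept, key=lambda h: h.tm_score, reverse=True)


# Illustrative values only; these are not data from the paper.
example_hits = [
    Hit("dom_a", 0.50, 160),
    Hit("dom_b", 0.62, 90),   # high score, but covers only 45% of the query
    Hit("dom_c", 0.45, 170),
]
ranked = rank_hits(example_hits, query_length=200)
```

Applying the coverage filter before ranking matters here: a short, locally good alignment (such as `dom_b` in the toy data) would otherwise outrank matches that span most of the designed fold.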

Figure 4. Top two SCOP domain structural homologues for the Fd-Top7 (A) and Fd-Fd (B) designed domains found in TM-align searches.

Ribbon diagrams are shown on the left and strand connectivity on the right. The beta strand connectivity is quite different in the designs than in these closest structural matches.

DOI: http://dx.doi.org/10.7554/eLife.11012.009

Figure 4—figure supplement 1. Parent domain PDB structures (2KL8, 1QYS) and daughter designed folds (5CW9, 4KYZ) (pink) mapped into the α+β region of the SCOP domains network of Nepomnyachiy et al. (A) and zoomed region (B) highlighting parent, designed, and first neighbor folds.

Figure 4—figure supplement 2. Neutral drift mutant models, relative changes to predicted free energy of folding in REU (Rosetta Energy Units), and multiple sequence alignment of parent and designed sequences, showing mutations in ferredoxin-TOP7 (A) and ferredoxin-ferredoxin (B).

Discussion

We have shown that single designed protein domains can be combined into larger domains with complex beta sheet topologies. This mechanism provides a straightforward route to designing large and complex beta sheet structures capable of scaffolding the pockets and cavities essential for future design of protein functions. Our success in designing larger beta sheet domains by recombining smaller independently folded beta sheet proteins suggests a similar mechanism could have played a role in the evolution of naturally occurring complex beta sheet proteins.

Materials and methods

Our design strategy began with selection of three previously characterized de novo designed protein domains to serve as building blocks for recombination through domain insertion: ferredoxin, Rossmann 2x2, and TOP7 (Koga et al., 2012). These three domains were chosen because they were the only Rosetta de novo designed protein domains with both alpha and beta secondary structures for which high-resolution experimental structures had been obtained at the time of this work. Each chimeric domain consists of a parent host domain and a parent insert domain. In the insert domain, three residues from the N-terminus were paired with three residues from the C-terminus to create nine residue pairs. Each residue pair was then aligned against all pairs of residues in the host domain to search for possible insertion points. Insertion points were accepted for residue pair alignment distances of 1 Å RMSD or less, replacing host domain segments of less than 5 residues. For every insertion point, a structure was generated by removing the residues between the insertion residues of the host domain and adding linkers between the aligned host and insert domain residues (Figure 1). Host and insert were connected by addition of 1–3 residues at the domain junctions using Rosetta Remodel (Huang et al., 2011), and 12 models in which this junction formed a continuous beta strand were identified. The sequences of these chimeras were optimized using Rosetta Design calculations around the junction regions and the new interface between the former domains. During the design simulation, all amino acid positions within 5 Å of the inter-domain junction interface were redesigned to minimize the predicted free energy of folding with the Rosetta all-atom energy function and a flexible backbone protein design protocol described previously (Huang et al., 2011).
Final designs were selected based on Rosetta energy, packing metrics, and similarity of the junction backbone geometry to local backbone geometry in the PDB. Twelve final domain insertion designs were chosen for expression in Escherichia coli as 6xHis-tag fusions and purified on a Ni-NTA column. Purified proteins were evaluated for the presence of alpha/beta secondary structures via circular dichroism spectroscopy (CD), and three with levels of secondary structure content consistent with the design model were subjected to crystallographic analysis. One design based on Rossmann 2x2 expressed as soluble protein, but no crystal structure could be obtained. Crystal structures were obtained for two designed proteins: a ferredoxin-TOP7 chimera and a ferredoxin-ferredoxin chimera. The design and characterization of these two proteins are described in the Results.
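The insertion-point search described above can be illustrated with a simplified sketch. It is not the Rosetta-based protocol used in this work. As a loudly stated assumption, each residue is reduced to its C-alpha coordinate; under that reduction, the RMSD of two residue pairs after optimal superposition equals half the difference between their C-alpha to C-alpha distances (each endpoint is displaced by that amount once centroids and axes coincide). The function names, the zero-based indexing, and the choice to allow host segments of zero to four removed residues are all hypothetical.

```python
from itertools import product
from math import dist


def ca_pair_rmsd(pair_a, pair_b):
    """RMSD of two 2-point sets after optimal superposition.

    With centroids and axes aligned, each endpoint deviates by
    |d_a - d_b| / 2, so the RMSD over the two points is the same value.
    """
    return abs(dist(*pair_a) - dist(*pair_b)) / 2.0


def insert_terminal_pairs(insert_ca, n_term=3):
    """Pair each of the first n_term residues with each of the last n_term."""
    n = len(insert_ca)
    return list(product(range(n_term), range(n - n_term, n)))


def find_insertion_points(host_ca, insert_ca, max_rmsd=1.0, max_removed=4):
    """Return (host_i, host_j, insert_i, insert_j, rmsd) for accepted sites.

    A site is accepted when the pair RMSD is <= max_rmsd and the host
    segment between host_i and host_j has fewer than 5 residues.
    """
    hits = []
    for i, j in insert_terminal_pairs(insert_ca):
        insert_pair = (insert_ca[i], insert_ca[j])
        for a in range(len(host_ca)):
            for b in range(a + 1, len(host_ca)):
                if b - a - 1 > max_removed:
                    break  # longer segments would remove 5 or more residues
                rmsd = ca_pair_rmsd(insert_pair, (host_ca[a], host_ca[b]))
                if rmsd <= max_rmsd:
                    hits.append((a, b, i, j, rmsd))
    return hits
```

A full implementation would superimpose backbone atoms rather than C-alpha points, which also constrains the relative orientation of the two chains; the distance-only criterion here is only a necessary condition for a good junction.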

Crystal structures were used to search for structural homologs in the SCOP database. First, crystal structures (ferredoxin-TOP7: 4KYZ chain A, ferredoxin-ferredoxin: 5CW9 chain A) were used as search queries using TM-align (Zhang, 2005). Hits were saved only if the alignment covered 75% or more of the query structure. Results were sorted by TM-score to identify the most similar structures in the SCOP database. Secondary structure topology cartoons were created with the Pro-origami server (Stivala et al., 2011). To map designed protein crystal structures into the protein domains network, the structures were aligned to all domain structures in the protein domains network using the PDBeFold server (Krissinel and Henrick, 2004). PDBeFold structural alignment hits were filtered for RMSD ≤2.5 Å and aligned sequence length of ≥75 residues. In contrast to the methods of Nepomnyachiy et al., sequence similarity thresholds were ignored. Including sequence similarity thresholds eliminates matching hits in the domains network. This is not surprising because the proteins were designed de novo and did not evolve from natural proteins. Filtered alignment hits were mapped into the protein domains network using Cytoscape (Shannon, 2003). To evaluate neutral drift models of the parent folds, the crystal structures of de novo ferredoxin and TOP7 proteins (2KL8 and 1QYS) were obtained and corresponding mutations from the final design proteins were modeled using a flexible backbone protein design algorithm described previously (Huang et al., 2011). Final Rosetta energies were calculated and subtracted from the Rosetta energies of the original parent protein structures to obtain predictions of the change in free energy of folding.
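As a rough sketch of the PDBeFold filtering step above, the snippet below keeps hits with RMSD ≤2.5 Å and at least 75 aligned residues, then writes them as edges in Cytoscape's simple interaction format (SIF). The dictionary keys, the query label, the `structural_match` relation name, and the example values are assumptions made for illustration; the source does not state how hits were exported to Cytoscape.

```python
def filter_pdbefold_hits(hits, max_rmsd=2.5, min_aligned=75):
    """Keep hits with RMSD <= max_rmsd (in angstroms) and >= min_aligned residues."""
    return [
        h for h in hits
        if h["rmsd"] <= max_rmsd and h["n_aligned"] >= min_aligned
    ]


def to_sif_lines(query_id, hits, relation="structural_match"):
    """Format one SIF edge per hit: source, relation, target (tab separated)."""
    return [f"{query_id}\t{relation}\t{h['domain']}" for h in hits]


# Illustrative values only; these are not data from the paper.
example_hits = [
    {"domain": "d1", "rmsd": 2.1, "n_aligned": 80},
    {"domain": "d2", "rmsd": 2.6, "n_aligned": 90},  # fails the RMSD cutoff
    {"domain": "d3", "rmsd": 1.9, "n_aligned": 60},  # too few aligned residues
]
kept = filter_pdbefold_hits(example_hits)
edges = to_sif_lines("design_A", kept)
```

No sequence-identity criterion appears in the filter, matching the statement above that sequence similarity thresholds were ignored for the de novo designs.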

The ferredoxin – TOP7 protein (NESG ID OR327) was expressed and purified following standard protocols developed by the NESG for production of selenomethionine-labeled protein samples (Xiao et al., 2010). Briefly, E. coli BL21 (DE3) pMGK cells, a rare-codon enhanced strain, were transformed with the DNA sequence-verified OR327-21.1 plasmid. A single isolate was cultured in MJ9 minimal media supplemented with selenomethionine, lysine, phenylalanine, threonine, isoleucine, leucine, and valine for the production of selenomethionine-labeled OR327. Initial growth was carried out at 37°C until the OD600 of the culture reached ∼0.8 units. The incubation temperature was then decreased to 17°C, and protein expression was induced by the addition of isopropyl-β-D-thiogalactopyranoside (IPTG) at a final concentration of 1 mM. Following overnight incubation at 17°C, the cells were harvested by centrifugation and resuspended in Lysis Buffer [50 mM Tris, pH 7.5, 500 mM NaCl, 1 mM tris(2-carboxyethyl)phosphine, 40 mM imidazole]. After sonication, the supernatant was collected by centrifugation for 40 min at 30,000 g. The supernatant was loaded first onto a Ni affinity column (HisTrap HP; GE Healthcare, Marlborough, MA) and the eluate was then loaded onto a gel filtration column (Superdex 75 26/60; GE Healthcare). Yields were 60–90 mg/L. The purified 6xHis-OR327 construct in buffer containing 10 mM Tris·HCl, 100 mM NaCl, 5 mM DTT, pH 7.5, was then concentrated to ∼10.6 mg/mL. The sample was flash-frozen in 50-μL aliquots using liquid nitrogen and stored at −80°C before crystallization trials. The sample purity (>98%), molecular weight, and oligomerization state were verified by SDS/PAGE, MALDI-TOF mass spectrometry, and analytic gel filtration followed by static light scattering, respectively.
For static light scattering, selenomethionine-labeled ferredoxin – TOP7 protein (30 μL at 10 mM Tris·HCl, pH 7.5, 100 mM NaCl, 5 mM DTT) was injected onto an analytical gel filtration column (Shodex KW-802.5; Shodex, New York, NY) with the effluent monitored by refractive index (Optilab rEX; Wyatt Technology, Santa Barbara, CA) and 90° static light-scattering (miniDAWN TREOS; Wyatt Technology) detectors.

Accession codes

Structures have been deposited in the Protein Data Bank as entries 5CW9, 4KYZ, and 4KY3.

Acknowledgements

We thank Rie Koga and Nobuyasu Koga for data analyses and technical assistance. We thank Lei Mao for technical assistance. This work was supported by the Defense Threat Reduction Agency and by a grant from the National Institute of General Medical Sciences Protein Structure Initiative U54-GM094597 (to GTM, JH).

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

  • Defense Threat Reduction Agency to Indigo Chris King, James Gleixner, David Baker.

  • National Institute of General Medical Sciences to John F Hunt, Gaetano T Montelione.

Additional information

Competing interests

The authors declare that no competing interests exist.

Author contributions

ICK, wrote code for simulations, performed simulations, and wrote the paper; commented on the manuscript, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

JG, performed simulations, performed experiments, and analyzed data; commented on the manuscript, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

LD, participated in crystallization of the Fd-Fd construct and subsequent data collection and data processing; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

AK, solved the Fd-Top7 crystal structure; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

JFH, solved the Fd-Top7 crystal structure; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

RX, expressed and purified protein samples; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

GTM, expressed and purified protein samples; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

BLS, participated in crystallization of the Fd-Fd construct and subsequent data collection and data processing; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

FDiM, conducted molecular replacement analyses and subsequent refinement for the Fd-Fd crystal structure using an ensemble of molecular search models produced by RosettaDesign; commented on the manuscript, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

DB, wrote code for simulations, performed simulations, and wrote the paper; commented on the manuscript, Conception and design, Analysis and interpretation of data, Drafting or revising the article.

Additional files

Supplementary file 1. Crystallographic data.

DOI: http://dx.doi.org/10.7554/eLife.11012.012

Major datasets

The following datasets were generated:

DiMaio F, King IC, Gleixner J, Doyle L, Stoddard B, Baker D. 2015. Crystal structure of De novo designed ferredoxin-ferredoxin domain insertion protein. http://www.rcsb.org/pdb/explore/explore.do?structureId=5CW9. Publicly available at the RCSB Protein Data Bank (Accession no: 5CW9).

Kuzin A, Su M, Seetharaman J, Maglaqui M, Xiao R, Lee D, Gleixner J, Baker D, Everett JK, Acton TB, Kornhaber G, Montelione GT, Hunt JF, Tong L, Northeast Structural Genomics Consortium. 2013. Three-dimensional Structure of the orthorhombic crystal of computationally designed insertion domain, Northeast Structural Genomics Consortium (NESG) Target OR327. http://www.rcsb.org/pdb/explore/explore.do?structureId=4KYZ. Publicly available at the RCSB Protein Data Bank (Accession no: 4KYZ).

Kuzin A, Su M, Seetharaman J, Maglaqui M, Xiao R, Lee D, Gleixner J, Baker D, Everett JK, Acton TB, Montelione GT, Tong L, Hunt JF, Northeast Structural Genomics Consortium. 2013. Three-dimensional Structure of the orthorhombic crystal of computationally designed insertion domain, Northeast Structural Genomics Consortium (NESG) Target OR327. http://www.rcsb.org/pdb/explore/explore.do?structureId=4KY3. Publicly available at the RCSB Protein Data Bank (Accession no: 4KY3).

References

  1. Aroul-Selvam R, Hubbard T, Sasidharan R. Domain insertions in protein structures. Journal of Molecular Biology. 2004;338:633–641. doi: 10.1016/j.jmb.2004.03.039.
  2. Ay J, Gotz F, Borriss R, Heinemann U. Structure and function of the Bacillus hybrid enzyme GluXyn-1: native-like jellyroll fold preserved after insertion of autonomous globular domain. Proceedings of the National Academy of Sciences of the United States of America. 1998;95:6613–6618. doi: 10.1073/pnas.95.12.6613.
  3. Ben-Tal N, Kolodny R. Representation of the protein universe using classifications, maps, and networks. Israel Journal of Chemistry. 2014;54:1286–1292. doi: 10.1002/ijch.201400001.
  4. Berrondo M, Ostermeier M, Gray JJ. Structure prediction of domain insertion proteins from structures of individual domains. Structure. 2008;16:513–527. doi: 10.1016/j.str.2008.01.012.
  5. Collinet B, Herve M, Pecorari F, Minard P, Eder O, Desmadril M. Functionally accepted insertions of proteins within protein domains. Journal of Biological Chemistry. 2000;275:17428–17433. doi: 10.1074/jbc.M000666200.
  6. Cutler TA, Mills BM, Lubin DJ, Chong LT, Loh SN. Effect of interdomain linker length on an antagonistic folding–unfolding equilibrium between two protein domains. Journal of Molecular Biology. 2009;386:854–868. doi: 10.1016/j.jmb.2008.10.090.
  7. DiMaio F, Echols N, Headd JJ, Terwilliger TC, Adams PD, Baker D. Improved low-resolution crystallographic refinement with Phenix and Rosetta. Nature Methods. 2013;10:1102–1104. doi: 10.1038/nmeth.2648.
  8. Doi N, Yanagawa H. Design of generic biosensors based on green fluorescent proteins with allosteric sites by directed evolution. FEBS Letters. 1999;453:305–307. doi: 10.1016/S0014-5793(99)00732-2.
  9. Edwards WR, Busse K, Allemann RK, Jones DD. Linking the functions of unrelated proteins using a novel directed evolution domain insertion method. Nucleic Acids Research. 2008;36:e11012. doi: 10.1093/nar/gkn363.
  10. Friedberg I, Godzik A. Connecting the protein structure universe by using sparse recurring fragments. Structure. 2005;13:1213–1224. doi: 10.1016/j.str.2005.05.009.
  11. Grishin NV. Fold change in evolution of protein structures. Journal of Structural Biology. 2001;134:167–185. doi: 10.1006/jsbi.2001.4335.
  12. Guntas G, Ostermeier M. Creation of an allosteric enzyme by domain insertion. Journal of Molecular Biology. 2004;336:263–273. doi: 10.1016/j.jmb.2003.12.016.
  13. Hocker B, Claren J, Sterner R. Mimicking enzyme evolution by generating new (beta-alpha)(8)-barrels from (beta-alpha)(4)-half-barrels. Proceedings of the National Academy of Sciences of the United States of America. 2004;101:16448–16453. doi: 10.1073/pnas.0405832101.
  14. Huang P-S, Ban Y-EA, Richter F, Andre I, Vernon R, Schief WR, Baker D, Uversky VN. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS One. 2011;6:e24109. doi: 10.1371/journal.pone.0024109.
  15. Koga N, Tatsumi-Koga R, Liu G, Xiao R, Acton TB, Montelione GT, Baker D. Principles for designing ideal protein structures. Nature. 2012;491:222–227. doi: 10.1038/nature11600.
  16. Krishna SS, Grishin NV. Structural drift: a possible path to protein fold change. Bioinformatics. 2005;21:1308–1310. doi: 10.1093/bioinformatics/bti227.
  17. Krissinel E, Henrick K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallographica Section D Biological Crystallography. 2004;60:2256–2268. doi: 10.1107/S0907444904026460.
  18. Lupas AN, Ponting CP, Russell RB. On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? Journal of Structural Biology. 2001;134:191–203. doi: 10.1006/jsbi.2001.4393.
  19. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology. 1995;247:536–540. doi: 10.1016/S0022-2836(05)80134-2.
  20. Ostermeier M. Engineering allosteric protein switches by domain insertion. Protein Engineering Design and Selection. 2005;18:359–364. doi: 10.1093/protein/gzi048.
  21. Pandya C, Brown S, Pieper U, Sali A, Dunaway-Mariano D, Babbitt PC, Xia Y, Allen KN. Consequences of domain insertion on sequence-structure divergence in a superfold. Proceedings of the National Academy of Sciences of the United States of America. 2013;110:e11012. doi: 10.1073/pnas.1305519110.
  22. Park K, Shen BW, Parmeggiani F, Huang P-S, Stoddard BL, Baker D. Control of repeat-protein curvature by computational protein design. Nature Structural & Molecular Biology. 2015;22:167–174. doi: 10.1038/nsmb.2938.
  23. Shanmugaratnam S, Eisenbeis S, Hocker B. A highly stable protein chimera built from fragments of different folds. Protein Engineering Design and Selection. 2012;25:699–703. doi: 10.1093/protein/gzs074.
  24. Shannon P. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
  25. Stivala A, Wybrow M, Wirth A, Whisstock JC, Stuckey PJ. Automatic generation of protein structure cartoons with Pro-origami. Bioinformatics. 2011;27:3315–3316. doi: 10.1093/bioinformatics/btr575.
  26. Söding J, Lupas AN. More than the sum of their parts: on the evolution of proteins from peptides. BioEssays. 2003;25:837–846. doi: 10.1002/bies.10321.
  27. Xiao R, Anderson S, Aramini J, Belote R, Buchwald WA, Ciccosanti C, Conover K, Everett JK, Hamilton K, Huang YJ, Janjua H, Jiang M, Kornhaber GJ, Lee DY, Locke JY, Ma L-C, Maglaqui M, Mao L, Mitra S, Patel D, Rossi P, Sahdev S, Sharma S, Shastry R, Swapna GVT, Tong SN, Wang D, Wang H, Zhao L, Montelione GT, Acton TB. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium. Journal of Structural Biology. 2010;172:21–33. doi: 10.1016/j.jsb.2010.07.011.
  28. Zhang Y. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research. 2005;33:2302–2309. doi: 10.1093/nar/gki524.
eLife. 2015 Dec 9;4:e11012. doi: 10.7554/eLife.11012.019

Decision letter

Editor: Nir Ben-Tal1

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for submitting your work entitled "Precise Assembly of Complex Beta Sheet Topologies from de novo Designed Building Blocks" for peer review at eLife. Your submission has been evaluated by John Kuriyan (Senior editor) and three reviewers, one of whom, Nir Ben-Tal, served as guest Reviewing editor.

The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision that summarizes the discussion.

Summary:

The manuscript describes the design of large alpha/beta domains by fusion of two smaller domains. Crystal structures of two of the designed proteins showed agreement with the design. The successful design supports the suggestion that proteins may evolve by cut-and-paste, where complex domains emerge by the assembly of shorter fragments.

Based on the current draft of the manuscript it is difficult to decide whether the work is novel enough to justify publication in eLife. This letter reflects a long discussion among the reviewers in an effort to realize possible novelties. The authors should submit a revised draft only if they are certain that they can address all the issues that were raised.

Essential revisions:

Novelty here could be related to the successful design, as well as the implications to protein evolution. Regarding design, the authors claim in the Introduction that "while protein domains with larger and more complex beta sheets occur frequently in nature, such topologies have not been successfully created by de novo protein design". However, this statement is wrong. For example, Birte Höcker has made a nine-stranded alpha-beta barrel by fusing half of a TIM barrel with most of a response regulator domain, then used design to revert this barrel to the dominant eight-stranded form with five mutations, which could have been sampled by neutral drift (PNAS 2008, JACS 2012). This statement should be corrected accordingly (maybe there are more examples?).

However, there are three other possible novelties here:

1) The fusion of two full, independent domains into a single new domain. Previous efforts have fused fragments of domains. Here it is essential to know more about the new domains. Are the new domains folding cooperatively? Where are they in the list of twelve top ranked constructs (and how was that ranking done)? Near the top, or is there little correlation between ranking and success? What are the biophysical properties of the ten constructs that did not yield structures? A multiple alignment of the twelve experimentally tried constructs compared to the parent proteins would be really useful.

2) Fusion as an insertion of one domain into another. To the best of our knowledge, other cases are all consecutive, N-to-C fusions (but please check).

3) Generation of a fold that is different from that of the parent proteins. To the best of our knowledge, previous examples, like Riechmann and Winter's cold-shock protein fusions, Tawfik's tachylectin propellers, Blaber's beta-trefoils, ended up at or near the parent fold (in fact, most were intended to). The folds in this paper are clearly different from the starting folds. How different are they from other folds in the database? Do they occur in any of the 60-odd SCOP superfamilies that have a ferredoxin-like core (often with elaborate decorations)? The authors should discuss the folds they have generated. Perhaps map the parent and designed folds into the domains network of Nepomnyachiy et al. PNAS 111, 11691-11696, 2014.

Of these 3 novelties, only the third justifies publication in eLife, provided that the new fold differs significantly from the original ones.

4) Another source of novelty may come from the implications of the successful design to evolution. The manuscript tries to sell this view, but much more effort is required to substantiate it. The question that emerges is whether evolution can follow the path taken by the design approach. Two issues are outstanding here: first, only 2 of 12 designs have been successful. Second, interface redesign was required and it is unclear whether the normal evolutionary drift would suffice for that. So the question is whether evolution uses the possibility of fusion and the authors need to address this issue. Specifically, they should:

A) Identify examples where a new fold has arisen from the seamless merger of two simpler folds.

B) Add a lucid explanation of the mutations introduced at the redesign stage (with a multiple alignment figure). Nature does not do redesign, so the question is how many changes are needed to make the daughter protein viable and are these changes neutral for the viability of the parent proteins (i.e. could these mutations have been sampled by neutral drift)?

C) Cite related publications by Godzik, Grishin, Soding, Lupas, Ben-Tal, Kolodny, and others.

In addition, the revised draft should address the following issues:

5) The crystallographic information concerning the accuracy of the solutions seems inadequate to judge whether the structures were in fact correctly determined. The revised manuscript should provide additional information concerning the structure determinations and refinements to let the informed reader judge that the solutions are robust.

6) Discussion: "This mechanism provides a straightforward route to designing large and complex beta sheet structures capable of hosting functions such as catalysis difficult to achieve in previously designed small protein domains which lack cavities for potential active sites, etc." The connection between the data shown here and design for function is not obvious. The authors need to elaborate on this. In any case, this speculation should be toned down significantly.

7) The manuscript clarifies the design approach, but details about the methodology used are missing. For example, what were the considerations for selecting the ferredoxin and top7 as starting points? There should be a section that explains the computational methodology. Among other things this section should address the following questions and concerns:

A) The description says that also Rossman 2x2 was used in addition to ferredoxin and top7. What happened to designs based on Rossman 2x2?

B) "Positions within two residues of each of the termini of each domain were aligned with all pairs of residues separated by fewer than 5 residues in each domain". The authors should elaborate on this. Maybe add a figure?

C) "The sequences of these chimeras were optimized using Rosetta Design calculations around the junction regions". Optimized for what?

eLife. 2015 Dec 9;4:e11012. doi: 10.7554/eLife.11012.020

Author response


Essential revisions:

Novelty here could be related to the successful design, as well as the implications to protein evolution. Regarding design, the authors claim in the Introduction that "while protein domains with larger and more complex beta sheets occur frequently in nature, such topologies have not been successfully created by de novo protein design". However, this statement is wrong. For example, Birte Höcker has made a nine-stranded alpha-beta barrel by fusing half of a TIM barrel with most of a response regulator domain, then used design to revert this barrel to the dominant eight-stranded form with five mutations, which could have been sampled by neutral drift (PNAS 2008, JACS 2012). This statement should be corrected accordingly (maybe there are more examples?).

The claim has been modified in the text, with related citations of Höcker et al.:

“On the protein design front, there has been progress in de novo design of idealized helical bundles (Park et al., 2015) and alpha beta protein structures with up to 5 strands (Koga et al., 2012), and though new folds have been generated by tandem fusion of natural protein domains followed by introduction of additional stabilizing mutations (Hocker, Claren and Sterner, 2004; Shanmugaratnam, Eisenbeis and Hocker, 2012), assembly of large and complex beta sheets poses a challenge for de novo protein design.”

However, there are three other possible novelties here:

1) The fusion of two full, independent domains into a single new domain. Previous efforts have fused fragments of domains. Here it is essential to know more about the new domains. Are the new domains folding cooperatively? Where are they in the list of twelve top ranked constructs (and how was that ranking done)? Near the top, or is there little correlation between ranking and success? What are the biophysical properties of the ten constructs that did not yield structures? A multiple alignment of the twelve experimentally tried constructs compared to the parent proteins would be really useful.

2) Fusion as an insertion of one domain into another. To the best of our knowledge, other cases are all consecutive, N-to-C fusions (but please check).

3) Generation of a fold that is different from that of the parent proteins. To the best of our knowledge, previous examples, like Riechmann and Winter's cold-shock protein fusions, Tawfik's tachylectin propellers, Blaber's beta-trefoils, ended up at or near the parent fold (in fact, most were intended to). The folds in this paper are clearly different from the starting folds. How different are they from other folds in the database? Do they occur in any of the 60-odd SCOP superfamilies that have a ferredoxin-like core (often with elaborate decorations)? The authors should discuss the folds they have generated. Perhaps map the parent and designed folds into the domains network of Nepomnyachiy et al. PNAS 111, 11691-11696, 2014.

Of these 3 novelties, only the third justifies publication in eLife, provided that the new fold differs significantly from the original ones.

As the reviewers have suggested, we have focused on novelty (3) above in the manuscript and elaborated on the uniqueness of the designed fold in the new third section of the Results text and in two additional figures (Figure 4 and Figure 4—figure supplement 1). We have included an analysis of the similarity of the designed protein to those in SCOP in Figure 4 (TM-align hits across the entire new fold) and mapped the designed protein domains and their parent folds into the protein domains network in Figure 4—figure supplement 1:

“To compare the folds of these designed proteins to those in the SCOP v.1.75 domain database (Murzin et al., 1995), the TM-align structure-structure comparison method was used to search a 70% sequence non-redundant set of SCOP domains for structure alignments containing a minimum 75% overlap with the designed proteins. […] While there are no domains with globally similar folds, both designed proteins are similar to a number of SCOP domains over the ferredoxin-like substructure(s) as is made evident by mapping the proteins to the domains network of Nepomnyachiy et al. (Ben-Tal and Kolodny, 2014) (Figure 4—figure supplement 1).”
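
The overlap filter described in the quoted passage can be illustrated with a minimal sketch. It assumes that each alignment hit records how many residues of the designed protein it covers, and it takes "overlap" to mean that count divided by the length of the design. The `Hit` record, the `filter_hits` function, and this definition of overlap are illustrative assumptions; they are not taken from the manuscript or from the TM-align program.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    domain_id: str       # hypothetical identifier of a SCOP domain
    aligned_length: int  # residues of the design covered by the alignment
    tm_score: float      # TM-score, assumed normalized by the design length


def filter_hits(hits, design_length, min_overlap=0.75):
    """Keep hits covering at least min_overlap of the design, best TM-score first."""
    kept = [h for h in hits if h.aligned_length / design_length >= min_overlap]
    return sorted(kept, key=lambda h: h.tm_score, reverse=True)


# Toy data: a 100-residue design and three made-up hits.
hits = [Hit("d1", 80, 0.50), Hit("d2", 60, 0.60), Hit("d3", 90, 0.40)]
selected = filter_hits(hits, design_length=100)
print([h.domain_id for h in selected])
```

With this toy input, the hit covering only 60 of 100 residues is discarded even though it has the highest TM-score, which is the point of requiring a minimum overlap: partial matches to the ferredoxin-like substructure are not counted as matches to the whole fold.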

4) Another source of novelty may come from the implications of the successful design to evolution. The manuscript tries to sell this view, but much more effort is required to substantiate it. The question that emerges is whether evolution can follow the path taken by the design approach. Two issues are outstanding here: first, only 2 of 12 designs have been successful. Second, interface redesign was required and it is unclear whether the normal evolutionary drift would suffice for that. So the question is whether evolution uses the possibility of fusion and the authors need to address this issue. Specifically, they should:

A) Identify examples where a new fold has arisen from the seamless merger of two simpler folds.

Previous studies identifying the evolution of new folds from the combination of multiple simple folds have been cited in the Introduction:

“By analogy, new proteins have been engineered from existing domains by simple linear concatenation or insertion of one domain into another (Ay et al., 1998; Collinet et al., 2000; Cutler et al., 2009; Doi and Yanagawa, 1999; Edwards et al., 2008; Guntas and Ostermeier, 2004; Ostermeier, 2005). How individual domains evolved, in contrast, is much less clear. Both experimental and computational analyses have suggested that new folds can evolve by insertion of one fold into another (Lupas, Ponting and Russell, 2001; Grishin, 2001; Soding and Lupas, 2003; Krishna and Grishin, 2005; Friedberg and Godzik, 2005; Ben-Tal and Kolodny, 2014), but to our knowledge there is no evidence that complex beta sheet topologies can be formed in this manner.”

B) Add a lucid explanation of the mutations introduced at the redesign stage (with a multiple alignment figure). Nature does not do redesign, so the question is how many changes are needed to make the daughter protein viable and are these changes neutral for the viability of the parent proteins (i.e. could these mutations have been sampled by neutral drift)?

We have added a third section to the Results and an additional figure (Figure 4—figure supplement 2) showing a multiple sequence alignment, computational models of the putative neutral drift parent fold mutants, and calculations predicting the energetic effects of these neutral drift mutations on the parent folds:

“The mutations introduced at the redesign stage of the domain insertion design protocol are compatible with the parent fold structures with minimal perturbation of the protein backbone (Figure 4—figure supplement 2) suggesting the designed folds would have the potential to evolve from insertion followed by neutral mutational drift of the parent structures.”

We have elaborated on the explanation of the mutations introduced at the design stage in the Methods, and the locations of the redesign mutations are shown in Figure 4—figure supplement 2:

“During the design simulation, all amino acid positions within 5 Å of the inter-domain junction interface were redesigned to minimize the predicted free energy of folding with a flexible backbone protein design protocol described previously (Friedberg and Godzik, 2005).”

C) Cite related publications by Godzik, Grishin, Soding, Lupas, Ben-Tal, Kolodny, and others.

The above publications have been cited in the Introduction.

In addition, the revised draft should address the following issues:

5) The crystallographic information concerning the accuracy of the solutions seems inadequate to judge whether the structures were in fact correctly determined. The revised manuscript should provide additional information concerning the structure determinations and refinements to let the informed reader judge that the solutions are robust.

Key structure quality assessment statistics for the crystal structures are presented in the table of Supplementary file 1. In addition, we attach wwPDB Structure Validation Reports to these responses to reviewers. These reports, generated by the wwPDB, follow recommendations of the wwPDB X-ray Crystallography Structure Validation Task Force. The evolving standard of the field is to provide these reports to reviewers to address exactly the concern raised by Reviewer 1.

We have added clarifications in the Results section to address the crystal structure accuracy, and have made it clearer that in the ferredoxin-ferredoxin structure, the fine details of the structure are difficult to model, but the overall topology is not in doubt. In contrast, for the ferredoxin-top7 crystal, structure refinement was well-behaved.

“Attempts to improve these statistics by rebuilding portions of the model proved unsuccessful, possibly due to a register shift or dynamic fluctuations in the structure (perhaps corresponding to slightly 'molten-globule'-like behavior) that are difficult to computationally model. However, unbiased low-resolution omit maps suggest that the overall topology is correct (Figure 2—figure supplement 2).”

6) Discussion: "This mechanism provides a straightforward route to designing large and complex beta sheet structures capable of hosting functions such as catalysis difficult to achieve in previously designed small protein domains which lack cavities for potential active sites, etc." The connection between the data shown here and design for function is not obvious. The authors need to elaborate on this. In any case, this speculation should be toned down significantly.

The speculation has been changed to:

“This mechanism provides a straightforward route to designing large and complex beta sheet structures capable of scaffolding the pockets and cavities essential for future design of protein functions.”

7) The manuscript clarifies the design approach, but details about the methodology used are missing. For example, what were the considerations for selecting the ferredoxin and top7 as starting points? There should be a section that explains the computational methodology. Among other things this section should address the following questions and concerns:

The following sentence has been added to the Methods:

“These three domains were chosen because they were the only Rosetta de novo designed protein domains with both alpha and beta secondary structure for which high resolution experimental structures had been obtained.”

A) The description says that also Rossman 2x2 was used in addition to ferredoxin and top7. What happened to designs based on Rossman 2x2?

The following sentence has been added to the Methods:

“One design based on Rossman 2x2 expressed as soluble protein, but no crystal structure could be obtained at the time of this work.”

B) "Positions within two residues of each of the termini of each domain were aligned with all pairs of residues separated by fewer than 5 residues in each domain". The authors should elaborate on this. Maybe add a figure?

The insertion point and design protocols have been clarified and elaborated in the Methods:

“Each chimeric domain consists of a parent host domain and a parent insert domain. […] During the design simulation, all amino acid positions within 5 Å of the inter-domain junction interface were redesigned to minimize the predicted free energy of folding with the Rosetta all-atom energy function and a flexible backbone protein design protocol described previously (Friedberg and Godzik, 2005).”

C) "The sequences of these chimeras were optimized using Rosetta Design calculations around the junction regions". Optimized for what?

The following sentence has been added to the Methods:

“The sequences of these chimeras were optimized using Rosetta Design calculations around the junction regions and the new interface between the former domains. During the design simulation, all amino acid positions within 5 Å of the inter-domain junction interface were redesigned to minimize the predicted free energy of folding with the Rosetta all-atom energy function and a flexible backbone protein design protocol described previously (Friedberg and Godzik, 2005).”
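
The residue selection described in the quoted Methods text can be sketched as follows. This is only an illustration under stated assumptions: it interprets "within 5 Å of the inter-domain junction interface" as any residue having at least one atom within 5 Å of any atom in the other parent domain, and it uses plain coordinate dictionaries and a hypothetical `interface_residues` function. It does not use Rosetta, and the selection actually used in the design protocol may differ.

```python
import math


def interface_residues(host, insert, cutoff=5.0):
    """Return residue numbers with any atom within `cutoff` Å of the other domain.

    host, insert: dicts mapping residue number -> list of (x, y, z) atom
    coordinates in Å (an assumed, simplified input format).
    """
    selected = set()
    for res_a, atoms_a in host.items():
        for res_b, atoms_b in insert.items():
            # A residue pair is in contact if any atom pair is within the cutoff.
            if any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b):
                selected.add(res_a)
                selected.add(res_b)
    return sorted(selected)


# Toy coordinates: one contacting residue pair and two distant residues.
host = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)]}
insert = {101: [(3.0, 0.0, 0.0)], 102: [(50.0, 0.0, 0.0)]}
print(interface_residues(host, insert))
```

In this toy case only residues 1 and 101 are selected for redesign; residues far from the junction keep their parent sequence, consistent with the statement that sequence optimization was restricted to the junction regions and the new interface between the former domains.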

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Supplementary file 1. Crystallographic data.

    DOI: http://dx.doi.org/10.7554/eLife.11012.012

    elife-11012-supp1.xlsx (47KB, xlsx)

    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
