Protein Science: A Publication of the Protein Society. 2024 Mar 19;33(4):e4936. doi: 10.1002/pro.4936

The structural landscape of the immunoglobulin fold by large‐scale de novo design

Jorge Roel‐Touris, Lourdes Carcelén, Enrique Marcos
PMCID: PMC10949314  PMID: 38501461

Abstract

De novo designing immunoglobulin‐like frameworks that allow for functional loop diversification shows great potential for crafting antibody‐like scaffolds with fully customizable structures and functions. In this work, we combined de novo parametric design with deep‐learning methods for protein structure prediction and design to explore the structural landscape of 7‐stranded immunoglobulin domains. After screening the folding of nearly 4 million designs, we assembled a structurally diverse library of ~50,000 immunoglobulin domains with high‐confidence AlphaFold2 predictions and structures diverging from naturally occurring ones. The designed dataset enabled us to identify structural requirements for the correct folding of immunoglobulin domains and to shed light on β‐sheet–β‐sheet rotational preferences and how these are linked to functional properties. Our approach eliminates the need for preset loop conformations and opens the route to large‐scale de novo design of immunoglobulin‐like frameworks.

Keywords: AlphaFold2, beta‐sheets, de novo protein design, deep learning, immunoglobulin, protein structure, Rosetta

1. INTRODUCTION

Immunoglobulin‐like (Ig) domains are the structural frameworks within the variable region of antibodies, providing anchorage for hypervariable loops engaged in molecular recognition. Despite the widespread use of monoclonal antibodies as protein therapeutics, imaging agents, or affinity reagents for biological research, there has been considerable interest in developing antibody‐like scaffolds (Jost & Plückthun, 2014; Kintzing et al., 2016; Schumacher et al., 2018; Sha et al., 2017) with improved biophysical properties and more programmable structures. De novo designing protein frameworks amenable to functional loop diversification, without relying on naturally existing proteins, holds promise for crafting antibody‐like scaffolds with more tunable structures (and functions), alongside the excellent biophysical properties associated with de novo designed proteins (Marcos & Silva, 2018; Pan & Kortemme, 2021).

Ig domains in nature are β‐sandwiches containing between 7 and 9 β‐strands arranged in two antiparallel β‐sheets packing face‐to‐face (Bork et al., 1994; Halaby et al., 1999). Ideally, their overall geometry can be described by six geometrical transformations describing the relative orientation of two flat β‐sheets: three rotations (one twist and two tilts) and three translations along three orthogonal axes passing through the center of the β‐sandwich. One notable structural feature of the Ig fold is the arrangement of β‐sheets by means of β‐strand connecting loops, which play a crucial role in determining the overall shape of the domain. In previous studies (Chidyausiku et al., 2022; Marcos et al., 2018), we found that the conformation of β‐arch loops (Hennetin et al., 2006), which are crossover connections between β‐strands from opposing β‐sheets, is key to the formation of β‐sandwiches and to establishing the relative orientation of both β‐sheets. Specifically, certain combinations of β‐arch loop conformations, when compatible with β‐strand length and sidechain orientations, strongly support the formation of the central cross‐β motif within the core of Ig domains (Chidyausiku et al., 2022). Based on these principles, we succeeded in designing 7‐stranded Ig domains de novo by a fragment‐based computational approach using specific combinations of β‐arch conformations (Chidyausiku et al., 2022). However, the diversity of the designs was heavily constrained by the combinations of β‐arch loop conformations considered, whose number grows combinatorially with the number of loop conformations per loop site. The relative twist rotation between β‐sheets is another critical structural feature of Ig domains, as it modulates the spatial arrangement of the top and bottom faces of the β‐sandwich structure, which constitute hotspots for functional binding loops, as seen in complementarity‐determining regions (CDRs) of antibodies.
Understanding these structural features of the Ig fold is essential for designing Ig‐like proteins with tailored functionalities.
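As a rough illustration of why preset loop combinations limit diversity, the sketch below counts joint loop assignments under the simplifying assumption that each of the three β‐arch sites (L2, L4, L5) independently adopts one of k preset conformations, giving k³ combinations. It also bounds the number of distinct ABEGO strings for a single 3‐ to 6‐residue loop over the four trans bins (A, B, G, E). The independence assumption and the helper names are hypothetical and are not part of the original method.

```python
def n_joint_assignments(k: int, n_sites: int = 3) -> int:
    """Joint loop assignments when each of n_sites independently picks one of k conformations."""
    return k ** n_sites


def n_abego_strings(min_len: int = 3, max_len: int = 6, alphabet_size: int = 4) -> int:
    """Upper bound on distinct ABEGO strings for one loop site.

    alphabet_size=4 assumes only the trans bins A, B, G and E (cis 'O' excluded).
    """
    return sum(alphabet_size ** length for length in range(min_len, max_len + 1))


if __name__ == "__main__":
    for k in (2, 5, 10):
        print(k, n_joint_assignments(k))
    per_site = n_abego_strings()
    print(per_site, n_joint_assignments(per_site))
```

With the default arguments the per‐site bound is 5440 strings, so three independent sites already allow roughly 1.6 × 10¹¹ combinations. Most of these are not physically realizable, so the number is only an upper bound on the space a preset enumeration would have to cover.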

Towards fully controllable de novo design of Ig domains, we have developed a parametric approach to systematically explore the structural landscape of Ig domains through diversity in their β‐strand connections, thus eliminating the need for preset loop conformations. By combining physics‐based protein design with deep learning (DL) methods for structure prediction and design (Ovchinnikov & Huang, 2021), we generated an extensive library of novel Ig domains with highly confident, accurate and convergent AlphaFold2 (Jumper et al., 2021) (AF2) predictions. Our de novo immunoglobulins belong to the same fold as naturally occurring Ig domains found in antibodies or nanobodies, while exploring a more extensive structural space. We delved into the diversity of β‐arch conformations in our designs, uncovering loop configurations controlling the Ig twist rotation, and hence the spatial organization of potential functionalization sites. Furthermore, we investigated the capabilities of different DL‐based methods for sequence design and protein backbone generation for rescuing and building de novo Ig scaffolds. In summary, our study offers a comprehensive exploration of the structural landscape of the Ig fold in the realm of de novo protein design.

2. RESULTS

2.1. Parametric de novo design of immunoglobulin domains from ideal β‐sheets

We set out to de novo design 7‐stranded Ig domains through systematic exploration of the β‐sandwich conformational space by a parametric approach (Figure 1). We precomputed a library of ideal β‐sheets, formed by 3 or 4 antiparallel β‐strands of varying length (6–8 residues per β‐strand) and having optimal backbone hydrogen bond pairing without register shift (Figure S1). Pairs of 3‐ and 4‐stranded β‐sheets were then combined through four geometrical transformations describing the overall geometry of the immunoglobulin β‐sandwich: the twist rotation around an axis connecting the centers of the two β‐sheets, the translation along the same axis, and translations along the two other orthogonal axes (the other two orthogonal rotations were ignored to reduce the number of parameters, as they are also more constrained by core packing than twist). We finely sampled twist rotations between −60° and +60° and translations from 10 to 12 Å. The generated backbones were then closed according to the Ig topology (Figure 1a) by fragment‐based design of β‐hairpin (connecting two paired β‐strands) and β‐arch (connecting two unpaired β‐strands and crossing over the β‐sandwich) loops, resulting in fully connected Ig backbones. Fundamental rules have been described for designing β‐hairpin (Koga et al., 2012) and β‐arch (Marcos et al., 2018) loops based on the orientation of their two neighboring sidechains (Figure 1b(2), inset). We designed canonical β‐hairpins with 2 and 5 residues for L and R chiralities, respectively (Lin et al., 2015). For β‐arches, instead, we explored loop lengths between 3 and 6 residues for each of the four possible sidechain directionality patterns, without any restriction on their conformation. We first bridged β‐strands E5 and E6 (loop L5 in Figure 1a) and then connected β‐strands E2 to E3 and E4 to E5 simultaneously (loops L2 and L4 in Figure 1a, respectively) to ensure compatibility, since L2 and L4 are adjacent in the Ig structure.
For all closed backbones, we designed five amino acid sequences with flexible‐backbone sequence design calculations with Rosetta (Leman et al., 2020), and probed the capability of the designed sequences to recapitulate their structures by AF2 protein structure prediction (without multiple sequence alignment or template information) (Figure 1a).
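The four‐parameter sampling described above can be sketched as a rigid‐body transform applied to the 3‐stranded sheet. This minimal sketch assumes the 4‐stranded sheet sits at the origin, that the inter‐sheet axis is z, and that the 10–12 Å range refers to the translation along that axis; the step sizes and the lateral shifts (dx, dy) are hypothetical values, since the text does not state them, and the helper names are illustrative.

```python
import math
from itertools import product


def place_sheet(coords, twist_deg, dz, dx=0.0, dy=0.0):
    """Rotate (x, y, z) coordinates about the z axis by twist_deg, then translate.

    Assumes z is the inter-sheet axis through the origin, so a positive angle is an
    anticlockwise rotation when viewed from +z.
    """
    t = math.radians(twist_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in coords]


def inclusive_range(start, stop, step):
    """Evenly spaced values from start to stop, both ends included."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def parameter_grid(twist_step=5.0, dz_step=0.5, shifts=(-1.0, 0.0, 1.0)):
    """All (twist, dz, dx, dy) combinations; step sizes and shifts are hypothetical."""
    twists = inclusive_range(-60.0, 60.0, twist_step)
    distances = inclusive_range(10.0, 12.0, dz_step)
    return list(product(twists, distances, shifts, shifts))


if __name__ == "__main__":
    grid = parameter_grid()
    print(len(grid), grid[0])
    print(place_sheet([(1.0, 0.0, 0.0)], twist_deg=90.0, dz=10.0))
```

With these hypothetical steps, `parameter_grid()` yields 25 × 5 × 3 × 3 = 1125 placements; each placement would still need loop closure and sequence design before it could be evaluated.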

FIGURE 1.

Design and characterization of immunoglobulin‐like domains. (a) The immunoglobulin structure is composed of two β‐sheets packing face‐to‐face. The seven β‐strands (E1‐7) are connected through three β‐hairpins (L1, L3, and L6) and three β‐arches (L2, L4, and L5). Clockwise relative twists of the 3‐stranded β‐sheet are considered negative rotations, while anticlockwise twists are considered positive rotations. (b) Workflow for the de novo design of Ig domains from ideal β‐sheets: (1) Pairs of 3‐ and 4‐stranded β‐sheets are combined through a series of geometrical transformations. (2) The design of canonical β‐hairpins is followed by the design of structurally unconstrained β‐arches (L5 followed by L2 and L4 simultaneously). Bottom inset shows the different chiralities considered for both types of ββ loops. (3) For each closed scaffold, five different sequences were designed and subsequently validated by AlphaFold2 protein structure predictions. (c) In silico validation of the nearly 2.5 million de novo designed immunoglobulins, showing the average TM score of the predictions against the β‐sheet–β‐sheet twist. Darker areas represent more populated regions. (d) 3D representation of the immunoglobulin designs as a function of their averaged pLDDT, pTM, and TM scores. Golden points represent highly confident (high pLDDT and pTM scores) and accurate (high TM scores) predictions. (e) Twist rotation distribution for 8942 high‐quality designs.

Following this approach, we designed ~2.5 million de novo Ig scaffolds, ensuring that all different combinations of variables (geometrical parameters, β‐sheet length combinations, and loop lengths) were uniformly sampled (Figure S2). After protein structure prediction, we examined the distributions of the sampled twist rotations as a function of their similarity to the AF2 predicted models. While the whole range of rotations is sampled, more density is found around negative twists (Figure 1c; the darker, the denser). For the in silico validations with AF2, we classify a design as high‐quality if (1) average RMSD ≤1.25 Å (across the top‐3 AF2 models), (2) standard deviation of RMSD ≤0.25 Å (considering all 5 AF2 predictions), and (3) average composite score (Roney & Ovchinnikov, 2022) (TM‐score [Zhang & Skolnick, 2005] × pTM × pLDDT) ≥0.5 (across the top‐3 AF2 models). By applying these selection criteria, we ensure that the selected Igs closely recapitulate the structure of the design, while having highly confident and convergent AF2 predictions. Convergence across predictions has been related to success for de novo designed proteins (Peñas‐Utrilla & Marcos, 2022). From the pool of designs, we identified close to 9000 Ig domains with excellent predictions, whose composite scores range from 0.5 to 0.74 (Figure 1d, yellow dots), thus representing highly confident and accurate predictions. In terms of twist rotations, the distribution of these designs is also skewed to negative values, with a median of −11.3° (Figure 1e), while sampling the whole range of β‐sheet–β‐sheet translations (median of 10.9 Å) (Figure S3). In fact, out of the 8942 scaffolds, only 1259 (14%) have positive rotations, with a median of +4.0° and a maximum rotation of +28.7°.
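The three high‐quality criteria can be expressed as a small filter. This is a minimal sketch, not the authors' code: it assumes each design comes with its five AF2 models already ordered by AF2 rank, that `plddt` is the model's mean pLDDT on AF2's 0–100 scale (rescaled to 0–1 for the composite score), and that the RMSD spread is a population standard deviation; the dictionary keys are hypothetical.

```python
from statistics import mean, pstdev


def composite(model):
    """TM-score x pTM x pLDDT; pLDDT is rescaled from AF2's 0-100 range to 0-1 (assumption)."""
    return model["tm"] * model["ptm"] * (model["plddt"] / 100.0)


def is_high_quality(models, rmsd_max=1.25, rmsd_sd_max=0.25, composite_min=0.5):
    """Apply the three filters to the five AF2 models of one design.

    models: list of dicts ordered by AF2 rank (best first), each with 'rmsd'
    (Angstrom, to the design model), 'tm', 'ptm' and 'plddt' (mean pLDDT).
    """
    top3 = models[:3]
    avg_rmsd = mean([m["rmsd"] for m in top3])          # criterion 1: top-3 models
    rmsd_sd = pstdev([m["rmsd"] for m in models])      # criterion 2: all five models
    avg_composite = mean([composite(m) for m in top3])  # criterion 3: top-3 models
    return avg_rmsd <= rmsd_max and rmsd_sd <= rmsd_sd_max and avg_composite >= composite_min
```

For example, five identical models with RMSD 1.0 Å, TM‐score 0.9, pTM 0.85 and pLDDT 90 give a composite of about 0.69 and pass, whereas raising the RMSD of the two lowest‐ranked models to 2.0 Å fails the convergence criterion even though the top‐3 average is unchanged.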

2.2. Improving de novo immunoglobulins with deep learning

We reasoned that the medium‐to‐low quality structure prediction of many designs could arise from (1) faulty amino acid sequences installed in good backbones, or (2) deficient β‐strand–β‐strand connecting loops. We assessed whether two advanced DL‐based design methods could address these two problems and hence rescue the designs: fixed backbone sequence design with ProteinMPNN (Dauparas et al., 2022), and joint construction of sequence and structure of ββ loops by inpainting as implemented in RFDesign (Wang et al., 2022). For sequence redesign with ProteinMPNN, we selected 2891 designs with high confidence and accurate predictions (composite score ≥0.5) but low structural convergence across the five AF2 models (Figure 2a), suggesting that the designed backbone is reachable but not strongly encoded by the sequence. For inpainting, we selected 67,302 designs with moderate‐to‐severe structural deviations in the prediction of the loops connecting β‐strands, regardless of their confidence metrics or structural convergence (Figure 2b). For these designs, the loops showing such predicted deviations (from 1 to 6, depending on the design) were entirely rebuilt by inpainting using RFDesign.
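The selection logic for the two rescue routes can be sketched as a triage function. This is a hedged illustration only: the per‐loop RMSD input, the 2.0 Å loop‐deviation cut‐off, and the precedence of ProteinMPNN over inpainting are assumptions, since the text does not define "moderate‐to‐severe" loop deviations numerically or say whether the two selections overlap.

```python
def triage(avg_rmsd, rmsd_sd, avg_composite, loop_rmsds, loop_cutoff=2.0):
    """Assign a design to a hypothetical rescue route.

    loop_rmsds maps loop names ('L1'..'L6') to the per-loop RMSD between the design
    and its AF2 prediction; loop_cutoff (Angstrom) is a hypothetical threshold.
    """
    if avg_rmsd <= 1.25 and rmsd_sd <= 0.25 and avg_composite >= 0.5:
        return "high_quality", []
    if avg_composite >= 0.5 and rmsd_sd > 0.25:
        # Accurate and confident but non-convergent: redesign the sequence only.
        return "proteinmpnn", []
    deviating = sorted(name for name, r in loop_rmsds.items() if r > loop_cutoff)
    if deviating:
        # Rebuild only the loops whose prediction disagrees with the design.
        return "inpainting", deviating
    return "discard", []
```

Returning the list of deviating loops mirrors the text's point that between one and six loops were rebuilt per design, rather than the whole backbone.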

FIGURE 2.

Rescue of inaccurate immunoglobulin scaffolds by deep learning. (a) Scaffolds with accurate yet non‐convergent protein structure predictions were selected for sequence redesign with ProteinMPNN and subsequent validations by AF2. (b) The two different scenarios considered for structure redesign. (1) Scaffolds with convergent or (2) non‐convergent protein structure predictions with severe inaccuracies on the loops were redesigned by inpainting with RFDesign. The resulting models were further validated by AF2 predictions.

The selected scaffolds for sequence redesign span twist rotations between −53.5° and +46.2° and follow a distribution centered at −21° (Figure 3a). For each of them, we generated 50 sequences with ProteinMPNN (and energy minimized them with FastRelax [Leman et al., 2020]) and predicted their structure with AF2. We found that for 2492 unique scaffolds (and sequences) the non‐convergence issue was resolved while retaining high‐confidence prediction metrics (composite score up to 0.77). This pool of successful redesigns represents 86% of the total selected designs. After recalculating their β‐sheet–β‐sheet twist rotations, we found that 75% (1863) were within the same rotational range (Figure 3b), while 629 (25%) significantly changed rotations (Figure 3c) relative to their original designs to accommodate the newly designed sequence. Similarly, the selected scaffolds for loop structural repair span rotations between −60° and +60° and are centered around 0° (Figure 3d). For each scaffold, we used inpainting to rebuild those loops (backbone and sequence) showing structural disagreement between the design and the AF2 predictions, and subsequently predicted the structure of the inpainted designs with AF2. We found that 51% of them (34,335) had composite scores greater than 0.5 (up to 0.77) with averaged RMSDs below 1.25 Å and convergence across the predicted models. As before, while the β‐sheet–β‐sheet twist rotations of the vast majority (26,438; 77%) remained within comparable ranges (Figure 3e), the remaining 23% showed shifted rotations (Figure 3f). Both DL design methods showed a high success rate in rescuing designs and a preference towards less extreme β‐sheet–β‐sheet twists.

FIGURE 3.

Results for the deep learning‐based rescuing protocols. (a) Selected designs for protein sequence redesign with ProteinMPNN. (b, c) Successful sequence redesigns keeping (b) or changing (c) their β‐sheet–β‐sheet twist with respect to the original designs (a). β‐sheet–β‐sheet twist distributions of original designs (unfilled histograms) and successfully redesigned ones (yellow histograms). (d) Selected designs for loop repair with inpainting. (e, f) Successful inpainted designs keeping (e) or changing (f) their β‐sheet–β‐sheet twist with respect to the original designs (d). Histograms are displayed as in b and c. All histograms show the number of counts per bin with respect to the β‐sheet–β‐sheet relative rotations.

2.3. Structural comparison between de novo designed and naturally occurring immunoglobulin domains

We pooled all the high‐quality designs (as defined above based on AF2 prediction) coming from the parametric protocol and DL‐based rescue, and obtained a total of 45,769 de novo Ig domains. To compare our scaffolds to existing (natural) Ig‐like structures, we compiled and manually curated three subgroups of naturally occurring Ig domains: 1849 antibodies (i.e., Fv, Fab, and scFv), 508 nanobodies, and 258 non‐antibody Ig‐like domains with diverse functions (Bork et al., 1994; Halaby et al., 1999) (herein called “natural Igs”). Antibodies and nanobodies were obtained from the SAbDab database (Dunbar et al., 2014) (<70% sequence identity) and natural Igs from SCOP (Andreeva et al., 2020) (annotated as “Ig‐like beta‐sandwich” fold). To focus the comparisons on the immunoglobulin domain, we only kept the variable fragments (Fvs) from the Fab regions of all antibody structures. We first examined the twist rotations of the four subgroups (Figures 4a and S4). The pool of de novo designs spans a broad range of rotations (between −58.5° and +45.7°) with a distribution centered near neutral values (median −10°). In contrast, twist distributions for the three subgroups of naturally occurring Igs were shifted to more negative rotations (medians ranging between −36° and −29°), with antibodies and nanobodies having narrower distributions. Next, we independently clustered each of the four subgroups at a TM‐score threshold of 0.8 for the de novo immunoglobulins and 0.9 for the three subgroups of naturally occurring Igs, resulting in 541, 107, 91, and 149 clusters for designs, Fvs, nanobodies, and natural Igs, respectively. For all cluster representatives, we calculated all pairwise maximum TM scores (that is, without normalization by sequence length) and computed a structural similarity matrix. A clear segregation between our designs and naturally occurring Ig domains can be observed (Figure 4b).
Compared with Fvs and nanobodies, our designs had TM scores of ~0.5, which suggests that all share the same fold while sampling divergent regions of the Ig structural space. Besides differential rotational preferences, antibodies and nanobodies have topological differences (e.g., two extra strands inserted between the two equivalent strands E4 and E5 of our designs), which likely contribute to their structural differences from our designs and lower the TM‐score. In contrast, a fraction of natural Igs showed more similarity to our designs. The subgroup of natural Igs exhibited greater structural diversity compared with Fvs and nanobodies. Indeed, Fvs and nanobodies show relatively high TM‐scores across both groups, underpinning the highly conserved structure of their frameworks responsible for anchoring hypervariable loops (also reflected in their narrower twist distributions). In contrast, the designs sampled a much larger structural diversity while remaining within the same fold (Figure 4b,c), through our exhaustive exploration of conformational parameters, including β‐sandwich geometries, ββ connections, and β‐sheet sizes.
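The text does not state which clustering algorithm was used. As an assumption, a simple greedy (leader) clustering over a precomputed table of pairwise TM‐scores illustrates how a threshold of 0.8 or 0.9 turns a similarity matrix into clusters and representatives; the `lookup` helper and the toy scores are hypothetical.

```python
def greedy_cluster(ids, sim, threshold):
    """Greedy leader clustering: each item joins the first representative it matches.

    ids: items in processing order (this order decides who becomes a representative).
    sim: callable returning a symmetric similarity (e.g. a TM-score) between two items.
    """
    clusters = {}
    for item in ids:
        for rep in clusters:
            if sim(item, rep) >= threshold:
                clusters[rep].append(item)
                break
        else:
            clusters[item] = [item]  # no representative matched: start a new cluster
    return clusters


def lookup(table):
    """Wrap a {frozenset({a, b}): score} table as a symmetric similarity function."""
    return lambda a, b: 1.0 if a == b else table[frozenset((a, b))]
```

A stricter threshold splits clusters apart: with a toy score of 0.85 between two structures, they merge at 0.8 but stay separate at 0.9, which is one reason the cluster counts for designs (0.8) and natural Igs (0.9) are not directly comparable.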

FIGURE 4.

Structural comparison of de novo designed Ig scaffolds to natural Ig domains. (a) Twist rotation distributions of the selected 45,769 high‐quality de novo immunoglobulins and the other three subgroups of naturally occurring Ig domains. (b) Structural similarity matrix sorted according to the nature of the scaffold: Designs/Fvs/nanobodies/natural Igs. Bottom inset zooms into the Fvs–nanobodies–natural Igs comparison. (c) Network‐like representation of the 541 design clusters derived from the designs in a. Cartoons show several models from the most populated clusters as well as from the center of the network.

2.4. Analysis of most frequent β‐arches in de novo immunoglobulins

We next examined the loop conformations emerging in the pool of high‐quality designs. The two face‐to‐face β‐sheets in the immunoglobulin fold are bridged by three β‐arch loops, which connect β‐strands E2–E3, E4–E5, and E5–E6 (L2, L4, and L5 in Figure 5a, respectively). In contrast to our previous study (Chidyausiku et al., 2022), here we generated de novo Ig domains through diversity in their β‐arch loops, without constraints on their conformation (or ABEGO type) and with lengths of up to six amino acids. ABEGO backbone torsion bins provide a convenient way to classify the backbone geometry of protein residues based on the Ramachandran plot region of their ϕ/ψ dihedrals (Lin et al., 2015). For each of the high‐quality designs, we computed blueprint files and extracted the ABEGO types of the three β‐arch loops as defined by DSSP (Kabsch & Sander, 1983). In the set of nearly 50,000 selected designs, we find a sizable diversity of β‐arch conformations, with 1532, 2132, and 1371 unique ABEGO types for loops L2, L4, and L5, respectively. This diversity is even larger when considering paired β‐arches (16,382, 19,910, and 20,184 for L2:L4, L2:L5, and L4:L5) or all three of them (37,094 for L2:L4:L5). However, in each of the three β‐arches, certain ABEGO types are found more frequently than others (Figure 5b). For example, BBGB is the predominant configuration found in L4, and it is also among the most frequently found ABEGO types in L2. Indeed, β‐arches L2 and L4 are structurally adjacent in the Ig fold and therefore restrict the configuration of one another. This coupling is reflected in the increased number of frequently found paired L2:L4 conformations (Figure 5c), which in turn explains the relatively lower diversity in terms of the number of unique L2:L4 ABEGO pairs as compared with those in L2:L5 and L4:L5.
In fact, the predominant L2:L4 paired conformation (BABB:BBGB) is five and two times more frequent than BABB:BBAAB and BBGB:AAB, which are the predominant ABEGOs for L2:L5 and L4:L5, respectively (Figure S5). These observations suggest a stronger cooperation between β‐arches L2 and L4, relative to the other two pairs, in stabilizing the structure of the Ig fold.
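For readers unfamiliar with ABEGO strings, the sketch below bins backbone torsions into letters and then counts paired L2:L4 configurations. The bin edges follow the commonly used level‐1 ABEGO scheme (cis ω gives O; for φ ≥ 0, G when −100° ≤ ψ < 100° and E otherwise; for φ < 0, A when −125° ≤ ψ < 50° and B otherwise), but treat the exact cut‐offs as an assumption to be checked against Rosetta's ABEGO implementation. The record layout (a dict of loop name to ABEGO string per design) is hypothetical.

```python
from collections import Counter


def abego(phi, psi, omega=180.0):
    """Assign one ABEGO letter from backbone torsions in degrees (assumed level-1 bins)."""
    if abs(omega) < 90.0:  # cis peptide bond
        return "O"
    if phi >= 0.0:  # positive-phi (left-handed) region
        return "G" if -100.0 <= psi < 100.0 else "E"
    return "A" if -125.0 <= psi < 50.0 else "B"  # helical vs extended region


def abego_string(torsions):
    """torsions: list of (phi, psi, omega) tuples for the residues of one loop."""
    return "".join(abego(phi, psi, omega) for phi, psi, omega in torsions)


def paired_counts(designs, first="L2", second="L4"):
    """Count 'first:second' ABEGO pairs; each design maps loop names to ABEGO strings."""
    return Counter(f"{d[first]}:{d[second]}" for d in designs)
```

Calling `paired_counts(designs).most_common(5)` on such records would list the most frequent L2:L4 pairs, the kind of ranking summarized in Figure 5c.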

FIGURE 5.

Predominant β‐arch configurations found in de novo designed immunoglobulins. (a) The three different β‐arches of an immunoglobulin domain. (b) Most frequently found ABEGO types in our designs for each of the β‐arches. (c) Most frequent combinations of ABEGO pairs for the L2:L4 duo. (d) Distribution of L2:L4 β‐arch pairs as a function of the immunoglobulin twist. Gray boxes are the pairs as in c. (e) Two most frequently found L2:L4 pairs with their sequence logos. Bottom cartoons show interatomic contacts.

Besides loop frequencies, we also analyzed whether these non‐local ββ connections control the overall geometry of the Ig fold. At first glance, we noticed that particular loop configurations (alone or in combination) are not shared between positive and negative rotations. For each β‐arch, multiple loop configurations were exclusively found in negative twist rotations, while others predominantly favored positive rotations (Figure S6); fewer distinct loop configurations were found exclusively in positive rotations. This effect is more evident for pairs of β‐arches. While the most frequently found loop:loop conformations shape neutral rotations (between −10° and 10°), other less represented pairs are specific to positive (>10°) or negative (<−10°) twist rotations (Figures 5d and S7). Interestingly, these observations also apply to the L2:L4:L5 trio, where, for example, ABAA:BBGGGBB:AABB and BAAB:BBGB:BAA only shape scaffolds with positive twists of around +15° (Figure S8). The identified β‐arches in high‐quality Igs also sample an immense sequence space: we find more than 20,000 unique amino acid sequences for each single loop and more than 30,000 for each loop:loop and loop:loop:loop combination. However, as previously reported, certain sequences preferentially encode specific loop conformations (Marcos et al., 2018). As an illustration, for the paired L2:L4 combination, we find 35,885 unique sequences for 16,382 unique ABEGOs, which roughly indicates that there are, on average, two different sequences per loop configuration. However, for the predominant BABB:BBGB configuration, there is one prevalent paired sequence (SKEP:KPGE), which is found in 10% of the cases (Figure 5e). Consistent with our previous observations, we find a higher number of unique sequences in L2:L5 and L4:L5 pairs (41,485 and 44,329), which again suggests a lower cooperativity effect between these β‐arch pairs in comparison to that of L2:L4.
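The twist‐specificity and sequence‐diversity analyses above reduce to simple grouping operations. This minimal sketch uses the ±10° boundaries given in the text to bin twists, reports ABEGO keys that occur in only one twist class, and computes the ratio of unique sequences to unique ABEGO keys (for example, 35,885 / 16,382 ≈ 2.2 for the L2:L4 pairs reported above). The input record format and the toy keys in the test are hypothetical.

```python
from collections import defaultdict


def twist_class(twist, cutoff=10.0):
    """Bin a twist (degrees) using the +/-10 degree boundaries described in the text."""
    if twist > cutoff:
        return "positive"
    if twist < -cutoff:
        return "negative"
    return "neutral"


def exclusive_configurations(records):
    """records: iterable of (abego_key, twist). Return class -> keys seen only in that class."""
    classes_by_key = defaultdict(set)
    for key, twist in records:
        classes_by_key[key].add(twist_class(twist))
    exclusive = defaultdict(set)
    for key, classes in classes_by_key.items():
        if len(classes) == 1:
            exclusive[next(iter(classes))].add(key)
    return dict(exclusive)


def sequences_per_configuration(records):
    """records: iterable of (abego_key, sequence_key). Unique sequences / unique ABEGO keys."""
    records = list(records)
    n_abego = len({key for key, _ in records})
    n_seq = len({seq for _, seq in records})
    return n_seq / n_abego
```

Note that a configuration seen in only a handful of designs can look "exclusive" by chance, so in practice one would also require a minimum count per key before interpreting it.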

2.5. Relative β‐sheet–β‐sheet rotations control the localization of functionalization spots

The six ββ loops of the Ig fold are spatially clustered into two groups: three loops (L1, L3, and L5) on the top side and the other three (L2, L4, and L6) on the bottom side (Figure 6a). These two faces represent putative sites where designed ββ loops could collectively build functional paratopes binding targets of interest, hence mimicking the action of paratopes observed in antibodies and nanobodies. We have previously shown that β‐hairpin regions in monomeric (Chidyausiku et al., 2022) and single‐chain dimeric (Roel‐Touris et al., 2023) Igs are sweet spots for functional ligand‐binding loops, and that their relative orientation could be finely tuned through structural diversification of Ig scaffolds. Here, we analyze the diversity of the ~50,000 high‐quality Ig designs in terms of their loops' relative orientations in relation to the Ig twist rotation, which largely controls the overall scaffold structure. To this end, we computed the pairwise center‐of‐mass Euclidean distances within each of the two subsets of loops for each of the selected designs. As a general trend, we observe that a positive twist tends to enlarge the distance between loops on the same side, while negative rotations place them in closer proximity (Figures 6b and S9). The loop–loop distance was found to range between 9 and 27 Å when twisting from negative to positive angles (Figure 6c), suggesting that negative rotations would enable more interaction (and hence cooperativity) between loops, as often observed in complementarity‐determining regions in antibodies. Overall, the immunoglobulin twist controls the global organization of anchoring sites for functional loops, which is key for designing target‐tailored scaffolds.
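The loop–loop measurement can be sketched as follows. This assumes each loop is represented by a list of atom coordinates (which atoms are included, e.g. backbone only, is not specified in the text) and that the center of mass uses optional per‐atom masses, defaulting to equal weights, which gives the centroid; the function names are illustrative.

```python
import math
from itertools import combinations


def center_of_mass(coords, masses=None):
    """Mass-weighted mean of (x, y, z) coordinates; equal weights give the centroid."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    return tuple(
        sum(m * xyz[axis] for m, xyz in zip(masses, coords)) / total for axis in range(3)
    )


def pairwise_loop_distances(loops):
    """loops: dict of loop name -> list of atom coordinates. Returns {(a, b): distance}."""
    coms = {name: center_of_mass(xyz) for name, xyz in loops.items()}
    return {(a, b): math.dist(coms[a], coms[b]) for a, b in combinations(sorted(coms), 2)}
```

Applied separately to the top‐face loops (L1, L3, L5) and the bottom‐face loops (L2, L4, L6) of each design, this yields three distances per face that can then be plotted against the design's twist.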

FIGURE 6.

Analysis of pairwise relative distances between putative functionalization spots. (a) The immunoglobulin fold has three functionalization spots on the top face (L1, L3, and L5 in blue) and three on the bottom face (L2, L4, and L6 in yellow). (b) Distribution of distances between the centers of mass of L1:L3 and L2:L6 (y axis) as a function of the β‐sheet–β‐sheet twist (x axis). (c) Examples illustrating how twisting from positive (top) to negative (bottom) rotations impacts the relative Euclidean distance between functionalization spots.

2.6. Immunoglobulins with antinatural positive twists are harder to design

Out of the 45,769 high‐quality immunoglobulins, 47% of the designs (21,508) are shaped by rather neutral twists ranging from −10° to +10°, 45% (20,694) have negative twists from −58.5° to −10.0°, and the rest (3567) have twists higher than +10°, the latter representing only 8% of the total population. We attempted to elucidate the origin of this imbalance by computing several physics‐based metrics, including the Rosetta (per residue) energy score, buried and accessible surface areas, and the shape complementarity between opposing β‐sheets. In general, we notice that immunoglobulins with negative twists tend to be larger (in number of residues) than those with positive ones, which translates into differences in their buried surface areas (BSAs) (Figure 7a); BSA is closely related to the stability of the native conformation in monomeric formats (Rocklin et al., 2017). Other design scores related to packing efficiency and protein folding stability, such as the Rosetta energy (Figure 7b), also tended to be more favorable for Ig designs with negative twists (Figure S10). These results suggest that Ig scaffolds with non‐positive twists tend to be better predicted and scored.

FIGURE 7.

Analysis of immunoglobulin designs with positive twists. (a) Size (left) and buried surface area (right) distributions of the 45,769 high‐quality Ig designs with respect to their β‐sheet–β‐sheet twist. (b) Comparison of Rosetta per residue scores (the more negative, the better). (c) Design of Ig domains from high‐quality templates by fold conditioning (left) and motif scaffolding (right) with RFdiffusion. For the motif scaffolding protocol, only the six ββ loops were redesigned for each of the 100 templates, while designs from the fold conditioning protocol are newly designed scaffolds. (d) 50,274 Ig domains generated by the fold conditioning protocol with respect to their twist. (e) 54,897 Ig domains generated by the motif scaffolding protocol with respect to their twist. (f) The 10,690 high‐quality scaffolds passing our filtering criteria from the fold conditioning protocol. Top‐right inset shows the tiny fraction (290 designs) exhibiting positive twists. (g) The 408 high‐quality scaffolds with positive twists passing our filtering criteria from the motif scaffolding protocol.

Given the difficulty of designing Ig domains with positive twists using different protocols, we next set out to design Ig scaffolds with positive twists using RoseTTAFold Diffusion (Watson et al., 2023) (RFdiffusion), which has shown outstanding results for de novo protein backbone generation and higher performance than other methods in several design tasks. To this end, we designed 200,000 Ig scaffolds by two different protocols included within RFdiffusion: fold conditioning and motif scaffolding (100,000 designs each). These are two template‐based protocols that bias the diffusion model and guide the generation of backbone structures using precomputed structural information. On the one hand, fold conditioning extracts secondary structure information and contacts from a given protein structure template to bias the generation of structural diversity around the template fold (Figure 7c, left). On the other hand, motif scaffolding builds protein segments around a given structural motif (Figure 7c, right). For both methodologies, we used the same template set of 100 high‐quality de novo 7‐stranded immunoglobulin scaffolds with exclusively positive twist rotations ranging from 0° to +40° (Figure 7c, central scheme). While in fold conditioning the entire structure was built anew, in motif scaffolding we rebuilt both β‐hairpin and β‐arch loop connections using β‐sheet residues as the motif. After backbone generation by RFdiffusion, amino acid sequences were designed with ProteinMPNN and validated by protein structure predictions with ESMFold (Lin et al., 2023) and AF2 (see Section 4 for details).

From the pool of 200,000 diffused models, we could unequivocally identify ~50% of them (50,274 and 54,897 for fold conditioning and motif scaffolding, respectively; Figure 7d,e) as 7‐stranded immunoglobulin domains: two well‐defined β‐sheets (i.e., formed by continuous and regular β‐strands) with three and four antiparallel β‐strands, proper strand connectivity, and optimal backbone hydrogen bond pairing, as in our de novo designed scaffolds. The designs were found to be structurally diverse, exploring structures that diverge from both the parametric designs and naturally occurring Ig domains (Figure S11). The remaining designs recapitulate either less regular Igs or misfolded entities (Figure S12). For the fold conditioning protocol, 10,980 unique designs showed excellent prediction metrics (as described above), with twists ranging from −81° to +16° and a distribution centered at −19.4° (Figure 7f). However, only 290 of these high‐quality designs exhibit positive twists (2.5%, Figure 7f, inset). For the motif scaffolding protocol, instead, a much smaller fraction of the diffused scaffolds had high‐quality AF2 predictions (408 unique Igs, which represent <1% of the generated scaffolds) (Figure 7g), with twist rotations ranging from +0.02° to +41° and a median of +16.1°. Overall, even with the use of high‐quality template data, we again observe a clear bias towards the design of Ig scaffolds with negative twists, as the proportion of generated (and validated) scaffolds with positive twists by RFdiffusion is minor. In line with this observation, designs rescued by ProteinMPNN or inpainting tended to shift positive twist rotations to more neutral ones for a fraction of the designs. Taken together, these results suggest that Ig domains with positive twist rotations are generally more difficult to design and predict.
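Part of the identification criteria above can be automated. As a minimal sketch, the check below counts runs of DSSP `E` (extended strand) residues and requires that seven strands split 4 + 3 between two sheet labels. It assumes per‐strand sheet labels are already available (for example from DSSP's sheet column), uses a hypothetical minimum strand length of 3 residues, and does not verify pairing, register, or connectivity, so it is only a first‐pass filter and not the authors' procedure.

```python
import re
from collections import Counter


def strand_segments(dssp, min_len=3):
    """(start, end) spans of runs of 'E' (DSSP extended strand) at least min_len long."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"E+", dssp)
        if m.end() - m.start() >= min_len
    ]


def is_candidate_7_stranded_ig(dssp, strand_sheet_labels, min_len=3):
    """First-pass check: seven strands split into one 3-stranded and one 4-stranded sheet.

    strand_sheet_labels gives one sheet label per strand in sequence order;
    pairing, register and loop connectivity are NOT checked here.
    """
    strands = strand_segments(dssp, min_len)
    if len(strands) != 7 or len(strand_sheet_labels) != 7:
        return False
    return sorted(Counter(strand_sheet_labels).values()) == [3, 4]
```

Short `E` runs (for example a 2‐residue bridge) are ignored by the length cut‐off, so the result depends on `min_len`; a stricter check would also compare the strand order against the Ig topology.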

3. DISCUSSION

De novo design of all‐β proteins lags behind that of all‐α or mixed αβ structures (Marcos & Silva, 2018; Pan & Kortemme, 2021). Specifically, de novo design of β‐sandwiches, such as jelly roll and Ig‐like structures, has historically been considered very difficult due to their tendency to oligomerize through exposed strand edges and their slow folding kinetics, which can trap folding intermediates. The first successful designs of these scaffolds (Chidyausiku et al., 2022; Marcos et al., 2018) required precise design rules to guide the (fragment‐based) generation of their backbones towards structures that can be strongly encoded by an amino acid sequence (i.e., designable), which in turn minimized costly assessments of the folding energy landscape (Bradley et al., 2005). However, the structural diversity of designs generated by fragment assembly was partly limited by the requirement of enumerating loop conformations that favor folding. Recent methods for de novo protein design (Harteveld et al., 2022; Harteveld et al., 2023) have been tested on some of these all‐β folds, including jelly roll and Ig‐like, but the experimental results showed that their design remains challenging. Here, we have developed a parametric approach that eliminates the need for preset design rules to de novo design 7‐stranded immunoglobulin domains. In combination with DL‐based protein design and structure prediction, we have assembled and analyzed about 50,000 highly accurate, confident and diverse 7‐stranded Ig‐like scaffolds. This approach can be easily adjusted to design β‐sandwich domains with varying numbers of β‐strands and different β‐strand connectivities.

Besides structure validation, AF2 predictions enabled the diagnosis of specific design issues. Using AF2 as a diagnostic tool, we demonstrated how DL‐based design algorithms can turn low‐quality designs into high‐quality ones with accurate, confident and convergent AF2 predictions. ProteinMPNN significantly enhanced the structural convergence of AF2 predictions across the five models, whereas inpainting effectively corrected structural deviations in the loops. Both methods proved highly efficient in rescuing designs, with success rates of 86% for ProteinMPNN and 51% for inpainting. Notably, inpainting represents a remarkable advance in addressing the long‐standing problem of loop closure in protein design. Yet Ig domains with positive rotations proved to be the most challenging geometries for all of these approaches, including the parametric one. To address this issue further, we applied RFdiffusion to generate positively rotated backbones, using a set of high‐quality templates as a bias for fold conditioning and motif scaffolding. Although these approaches effectively produced backbones that support sequences validated by AF2, they still showed a bias towards neutral or negative twist rotations. In addition to possible limitations of the design methods, the observed rotational preferences may be related to more effective packing arrangements (Figure 4) and to the conformational loop preferences found in Ig domains with neutral or negative rotations (Figure S13). The difficulty of designing Ig domains with positive twist rotations also agrees with the observation that naturally occurring Ig domains (antibodies, nanobodies and natural Ig subgroups) have predominantly negative rotations (medians between −36° and −29°). Interestingly, the designs' twist distribution is shifted towards more neutral values (median −10°; Figure 4a), which must also contribute to the new structural space explored by the designs.

The DL revolution in protein structure prediction and design minimizes the need for accurate design rules. While DL‐based structure prediction allows the folding of designed sequences to be assessed accurately and economically, DL‐based methods for backbone generation and sequence design offer efficient design solutions with minimal designer input (Ovchinnikov & Huang, 2021). In light of this paradigm shift, we have shown in this article that Ig domains can be designed at large scale (that is, in terms of the number and structural diversity of novel Ig domains with high‐quality AF2 predictions) by combining parametric and DL methods, without preset loop conformations (Chidyausiku et al., 2022) favoring the folding of the Ig core structure. We performed large‐scale AF2 predictions for ~4 million Ig designs (~2.5 million from parametric design; ~145,000 from ProteinMPNN; 310,000 from inpainting; and 1 million from RFdiffusion). The comprehensive dataset of high‐quality Ig domains assembled here enabled us to identify distinct conformational preferences of β‐arch loops and to understand their degree of interdependence in the Ig fold. Our analyses also revealed that the twist rotations of the de novo designed Igs are preferentially negative, as in naturally occurring immunoglobulins. These findings are significant because twist rotations shape the overall structure of Ig domains, and hence the spatial arrangement of anchoring sites for functional loops. These scaffolds hold promise for loop scaffolding, given our previous studies showing that single‐domain (Chidyausiku et al., 2022) and two‐domain (Roel‐Touris et al., 2023) de novo Ig designs can harbor long calcium‐binding loops.
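
The ~4 million figure is the sum of the individual design routes. As a small arithmetic sketch, assuming that the RFdiffusion contribution corresponds to 200,000 backbones with five ProteinMPNN sequences each (Section 4.4) and using the exact parametric count from Section 4.1:

```python
# Approximate composition of the ~4 million designs predicted with AF2.
parametric = 2_451_516        # closed parametric designs (Section 4.1)
proteinmpnn = 145_000         # approximate count quoted in the text
inpainting = 310_000
rfdiffusion_backbones = 200_000
sequences_per_backbone = 5    # ProteinMPNN sequences per RFdiffusion backbone (Section 4.4)

rfdiffusion = rfdiffusion_backbones * sequences_per_backbone
total = parametric + proteinmpnn + inpainting + rfdiffusion

print(f"RFdiffusion sequences: {rfdiffusion:,}")
print(f"Total designs: {total:,} (~{total / 1e6:.1f} million)")
```
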
It will be exciting in future work to investigate the potential of this high‐quality scaffold library to anchor flexible, functional loops for protein‐binding applications, either through high‐throughput screening of loop libraries (Könning & Kolmar, 2018; Wellner et al., 2021) or through computational design of grafted or entirely new loops. For example, we note that de novo Ig scaffolds from this work can still be confidently predicted to fold after grafting single CDRs from antibodies, although the grafts generally make the prediction more difficult (Figure S14). Overall, rather than establishing design principles in advance, it is now feasible to derive principles after design by gathering extensive datasets through DL structure prediction and design. This ultimately opens a different perspective for advancing our understanding of how proteins can be designed from scratch.

4. MATERIALS AND METHODS

4.1. Parametric design of immunoglobulin domains

The parametric design protocol was implemented using PyRosetta (Chaudhury et al., 2010). We built a library of ideal 3‐stranded and 4‐stranded β‐sheets with varying numbers of residues per strand in two different chiralities, as defined by the direction of the Cβ atoms of the terminal residues. We focused on ideal β‐sheets to prioritize sampling of the geometrical parameters and to keep the resulting combinatorial problem manageable. For the first chirality, we generated a single 8‐residue poly‐valine β‐strand using ϕ/ψ values of −140°/+130°, which correspond to the extended region of the Ramachandran plot. We then made a copy of this strand, rotated it by 180° and translated it laterally by 5 Å. This process was repeated until 3‐stranded and 4‐stranded antiparallel β‐sheets were obtained. For the second chirality, we instead generated a 9‐residue poly‐valine β‐strand with the same ϕ/ψ values and trimmed the first valine, effectively forming an 8‐residue β‐strand whose terminal Cβ points in the opposite direction. We then repeated the rotation and translation steps described above. With all four β‐sheets in hand, we relaxed the backbones with the FastRelax mover implemented in RosettaScripts (Fleishman et al., 2011), using harmonic AtomPair constraints between the NH and CO groups of adjacent β‐strands. From these relaxed β‐sheets, we then built the remaining ones, containing 7 and 6 residues per β‐strand, by removing 1 or 2 residues per β‐strand, respectively.
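
The copy, rotate and translate procedure for assembling ideal antiparallel β‐sheets can be illustrated on a toy Cα trace. This is not the PyRosetta protocol used here: the sketch assumes an idealized strand running along x with a simple alternating ±z pleat, a 3.3 Å rise per residue, and a 180° rotation about the sheet normal through the strand centroid, all of which are illustrative choices. Trimming the first residue of a 9‐residue strand mimics how the second chirality was obtained.

```python
import numpy as np

RISE_PER_RESIDUE = 3.3   # approximate Cα rise along an extended strand (Å), assumption
PLEAT = 0.9              # toy ± offset of Cα atoms normal to the sheet (Å), assumption
LATERAL_SPACING = 5.0    # lateral translation between strands (Å), as in the text


def toy_strand(n_residues: int) -> np.ndarray:
    """Toy Cα trace of an extended strand running along +x, pleated along z."""
    x = np.arange(n_residues) * RISE_PER_RESIDUE
    z = np.where(np.arange(n_residues) % 2 == 0, PLEAT, -PLEAT)
    return np.column_stack([x, np.zeros(n_residues), z])


def rotate_about_normal(coords: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate coordinates about the sheet normal (z axis) through their centroid."""
    t = np.radians(degrees)
    rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                    [np.sin(t),  np.cos(t), 0.0],
                    [0.0,        0.0,       1.0]])
    centroid = coords.mean(axis=0)
    return (coords - centroid) @ rot.T + centroid


def build_antiparallel_sheet(strand: np.ndarray, n_strands: int,
                             spacing: float = LATERAL_SPACING) -> list:
    """Copy the last strand, flip it by 180° and shift it laterally, n_strands times."""
    strands = [strand]
    for _ in range(n_strands - 1):
        flipped = rotate_about_normal(strands[-1], 180.0)
        strands.append(flipped + np.array([0.0, spacing, 0.0]))
    return strands


chirality_a = toy_strand(8)        # first residue pleated towards +z
chirality_b = toy_strand(9)[1:]    # first residue trimmed: now starts towards -z
sheet3 = build_antiparallel_sheet(chirality_a, 3)
sheet4 = build_antiparallel_sheet(chirality_a, 4)
```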

We combined all pairs of 3‐stranded and 4‐stranded β‐sheets of the same or different chiralities, restricted to a maximum difference of one residue per strand between them (i.e., a 3‐stranded β‐sheet with 6 residues per strand was combined with the 4‐stranded β‐sheet of 7 residues, but not with that of 8 residues). This combinatorial enumeration yielded a total of 28 β‐sheet pairs, for each of which we generated 20,000 open backbones by sampling relative β‐sheet–β‐sheet twist rotations from −60° to 60°, translations between 10 Å and 12 Å, and further translations along two orthogonal axes from −5 Å to 5 Å. We then closed the loops of these structures with the Blueprint Builder mover (Koga et al., 2012) in two steps: first designing the three β‐hairpins (with canonical AA/BG/EA/GG or BAAGB ABEGO conformations for hairpins with L and R chiralities (Lin et al., 2015), respectively), and then the three β‐arches (with loops of 3 to 6 residues and unrestricted conformations). For each closed scaffold, we designed five different sequences with the FastDesign mover (Bhardwaj et al., 2016) implemented in RosettaScripts. The backbone relaxation performed during sequence design accommodates structural adjustments in degrees of freedom not explicitly considered during parametric sampling, including small changes in β‐sheet curvature or rotations about the two axes orthogonal to the twist rotation. With this method, we obtained a total of 2,451,516 de novo Ig domains: 1,106,651 with positive twists and 1,344,865 with negative twists. The pool of closed designs represents 88% of all sampled backbones.
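
The count of 28 β‐sheet pairs and the 88% closure rate can be reproduced by enumerating strand lengths (6, 7 and 8 residues) and the two chiralities under the one‐residue length constraint. The rigid‐body sampler below draws uniformly from the stated ranges; uniform random sampling, the chirality labels and the parameter names are our assumptions, since the sampling scheme is not specified. The closure fraction further assumes that each closed scaffold received exactly five FastDesign sequences.

```python
import itertools
import random

STRAND_LENGTHS = (6, 7, 8)          # residues per strand
CHIRALITIES = ("chi1", "chi2")      # hypothetical labels for the two chiralities
BACKBONES_PER_PAIR = 20_000
SEQUENCES_PER_SCAFFOLD = 5          # FastDesign sequences per closed scaffold
CLOSED_DESIGNS = 2_451_516          # reported total of closed designs


def enumerate_sheet_pairs():
    """All (3-stranded, 4-stranded) sheet pairs whose strand lengths differ by <= 1."""
    pairs = []
    for len3, len4 in itertools.product(STRAND_LENGTHS, repeat=2):
        if abs(len3 - len4) > 1:
            continue
        for chi3, chi4 in itertools.product(CHIRALITIES, repeat=2):
            pairs.append(((3, len3, chi3), (4, len4, chi4)))
    return pairs


def sample_rigid_body(rng: random.Random) -> dict:
    """Draw one relative sheet placement uniformly from the stated ranges (assumption)."""
    return {
        "twist_deg": rng.uniform(-60.0, 60.0),
        "distance_A": rng.uniform(10.0, 12.0),
        "shift_u_A": rng.uniform(-5.0, 5.0),   # first orthogonal axis
        "shift_v_A": rng.uniform(-5.0, 5.0),   # second orthogonal axis
    }


pairs = enumerate_sheet_pairs()
rng = random.Random(0)
samples = [sample_rigid_body(rng) for _ in range(5)]

sampled_backbones = len(pairs) * BACKBONES_PER_PAIR
closed_fraction = CLOSED_DESIGNS / (sampled_backbones * SEQUENCES_PER_SCAFFOLD)
print(len(pairs), sampled_backbones, f"{closed_fraction:.1%}")
```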

We probed the ability of the designed sequences to recapitulate their design models with AlphaFold2. For this, we used a local installation (LocalColabFold: github.com/YoshitakaMo/localcolabfold) of the optimized AlphaFold2 implementation ColabFold (Mirdita et al., 2022). For all protein structure predictions with LocalColabFold, the ‐use_turbo option was enabled for optimal speed, the number of recycles was kept at 3 (default), predictions were made without multiple sequence alignments (‐‐msa_method single_sequence), and models were relaxed using the Amber force field. We calculated structural similarity metrics (TM‐score and RMSDs) for all predicted models against their reference design. Designs with an averaged composite score (TM‐score × pTM × pLDDT) ≥0.5 and an averaged RMSD ≤1.25 Å across the top 3 AF2 predictions, and with an RMSD standard deviation ≤0.25 Å across all five predictions, were classified as highly confident and accurate de novo immunoglobulins and selected for further characterization.
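
The selection criteria above can be expressed as a simple filter over the five AF2 models of each design. This is a sketch under stated assumptions: pLDDT is rescaled to 0 to 1 before entering the composite score, "top 3" refers to the three best‐ranked models, both averages are taken over those three models, and the standard deviation is the population standard deviation over all five. The `Prediction` container is hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Prediction:
    """Metrics of one AF2 model of a design (hypothetical container)."""
    rank: int         # 1 = best-ranked model
    tm_score: float   # TM-score against the design model
    ptm: float        # predicted TM-score, 0 to 1
    plddt: float      # mean pLDDT, 0 to 100
    rmsd: float       # RMSD to the design model, in Å


def composite(p: Prediction) -> float:
    """TM-score x pTM x pLDDT, with pLDDT rescaled to 0-1 (assumption)."""
    return p.tm_score * p.ptm * (p.plddt / 100.0)


def passes_af2_filter(preds, min_composite=0.5, max_mean_rmsd=1.25, max_rmsd_sd=0.25):
    """Apply the composite, mean-RMSD and RMSD-spread criteria to five AF2 models."""
    if len(preds) != 5:
        raise ValueError("expected the five AF2 models of one design")
    ranked = sorted(preds, key=lambda p: p.rank)
    top3 = ranked[:3]
    mean_composite = statistics.fmean([composite(p) for p in top3])
    mean_rmsd = statistics.fmean([p.rmsd for p in top3])
    rmsd_sd = statistics.pstdev([p.rmsd for p in ranked])  # population SD over all five
    return (mean_composite >= min_composite
            and mean_rmsd <= max_mean_rmsd
            and rmsd_sd <= max_rmsd_sd)
```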

4.2. Analysis of β‐sheet–β‐sheet twist rotations

We recomputed twist rotations for all closed β‐sandwiches, including the original designs and the sequence and structure redesigns. First, for each β‐sheet we defined a vector connecting the centers of mass of the Cα atoms of the terminal residues at either end of its β‐strands (E1, E2, and E5 for the 3‐stranded β‐sheet; E3, E4, E6, and E7 for the 4‐stranded β‐sheet). We then projected these two vectors onto the plane perpendicular to the vector connecting the centers of mass of both β‐sheets. Finally, we calculated the angle between the projected vectors and used it as an estimate of the twist rotation.
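
A minimal NumPy sketch of this twist estimate follows. It assumes that, for each sheet, the Cα atoms of the strand termini have already been grouped by the sheet end on which they lie, and it reports a signed angle that is positive when the sheet 2 vector is rotated counter‐clockwise relative to the sheet 1 vector about the sheet 1 to sheet 2 axis (right‐hand rule). This sign convention is our assumption and may differ from the one used for the figures.

```python
import numpy as np


def centroid(coords):
    """Center of mass of equally weighted Cα coordinates, shape (n, 3)."""
    return np.asarray(coords, dtype=float).mean(axis=0)


def sheet_vector(end_a_ca, end_b_ca):
    """Vector from the Cα centroid at one end of a sheet's strands to the other end."""
    return centroid(end_b_ca) - centroid(end_a_ca)


def signed_twist(v_sheet1, v_sheet2, inter_sheet_axis):
    """Signed angle (degrees) between two vectors projected onto the plane
    perpendicular to inter_sheet_axis (right-hand rule, our sign convention)."""
    n = np.asarray(inter_sheet_axis, dtype=float)
    n = n / np.linalg.norm(n)
    v1 = np.asarray(v_sheet1, dtype=float)
    v2 = np.asarray(v_sheet2, dtype=float)
    p1 = v1 - np.dot(v1, n) * n   # projection onto the plane
    p2 = v2 - np.dot(v2, n) * n
    sine = np.dot(np.cross(p1, p2), n)
    cosine = np.dot(p1, p2)
    return float(np.degrees(np.arctan2(sine, cosine)))


def twist_from_coords(sheet1_ends, sheet2_ends, sheet1_ca, sheet2_ca):
    """sheetX_ends is a pair (end_a_ca, end_b_ca); sheetX_ca holds all Cα atoms of that sheet."""
    v1 = sheet_vector(*sheet1_ends)
    v2 = sheet_vector(*sheet2_ends)
    axis = centroid(sheet2_ca) - centroid(sheet1_ca)
    return signed_twist(v1, v2, axis)
```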

4.3. Sequence and structure redesign using deep learning

For sequence redesign, we used ProteinMPNN and generated 50 new sequences for each scaffold with default parameters (‐sampling_temp 0.1, ‐seed 37, ‐batch_size 1). For structural validation, each of the 50 sequences was threaded onto its original scaffold using the SimpleThreading mover and relaxed with the FastRelax mover, both implemented in RosettaScripts (Fleishman et al., 2011).
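
For reference, a ProteinMPNN run with these settings can be assembled as a command line. The sketch assumes the `protein_mpnn_run.py` script and the flag names of the public ProteinMPNN repository, which should be checked against the installed version; the input and output paths are hypothetical. It only builds the argument list and does not execute anything.

```python
def proteinmpnn_command(pdb_path, out_dir, num_seqs=50, temperature=0.1,
                        seed=37, batch_size=1, script="protein_mpnn_run.py"):
    """Build (but do not run) a ProteinMPNN command line with the settings above."""
    return [
        "python", script,
        "--pdb_path", str(pdb_path),
        "--out_folder", str(out_dir),
        "--num_seq_per_target", str(num_seqs),
        "--sampling_temp", str(temperature),
        "--seed", str(seed),
        "--batch_size", str(batch_size),
    ]


cmd = proteinmpnn_command("design_0001.pdb", "mpnn_out")
print(" ".join(cmd))
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)` from the directory containing the script.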

For structure redesign, we first aligned each selected scaffold to its corresponding first AlphaFold2 prediction. Then, for each of the six loops (3 β‐hairpins and 3 β‐arches), we computed the RMSD, and loops with RMSD values >1.25 Å were selected for inpainting with RFDesign, generating new loops of the same length or with one residue added or removed. Finally, the generated scaffolds were sorted by their inpaint_lddt, and at most five models were further relaxed.
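
The loop‐selection step can be sketched as follows, assuming that the design and its AF2 model are already superposed (as described above), that Cα coordinates are given as NumPy arrays, and that loops are defined by inclusive 0‐based residue indices. Loop RMSDs are therefore computed without re‐fitting, and the `inpaint_lddt` value of each generated model is taken as given; these data structures are hypothetical.

```python
import numpy as np

RMSD_CUTOFF = 1.25  # Å


def loop_rmsd(design_ca, predicted_ca, loop):
    """RMSD over one loop of two pre-superposed Cα arrays; loop = (start, end), inclusive."""
    start, end = loop
    diff = design_ca[start:end + 1] - predicted_ca[start:end + 1]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def loops_to_inpaint(design_ca, predicted_ca, loops, cutoff=RMSD_CUTOFF):
    """Loops whose RMSD exceeds the cutoff and are therefore rebuilt."""
    return [loop for loop in loops if loop_rmsd(design_ca, predicted_ca, loop) > cutoff]


def candidate_lengths(loop):
    """Same length, one residue fewer, or one residue more (at least one residue)."""
    n = loop[1] - loop[0] + 1
    return [length for length in (n - 1, n, n + 1) if length >= 1]


def top_models(models, k=5):
    """Keep at most k inpainted models with the highest inpaint_lddt."""
    return sorted(models, key=lambda m: m["inpaint_lddt"], reverse=True)[:k]
```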

For both methods, we performed protein structure predictions with AlphaFold2 in the absence of multiple sequence alignment and using default parameters.

4.4. Design of immunoglobulins with positive rotations using RFdiffusion

We randomly selected 100 highly accurate, confident and convergent de novo immunoglobulin scaffolds with positive twists ranging from +11° to +37°, designed by our parametric approach, and used them as a template set. For the fold conditioning protocol, we first used the helper script “make_secstruc_adj.py” provided with RFdiffusion to obtain secondary structure and block adjacency information from the template set. We then generated 100,000 backbones by conditioning on the monomer fold, with default noise parameters and allowing an insertion of 0–3 residues in each loop of the scaffold. For the motif scaffolding protocol, we generated 100,000 backbones with positive twist rotations by fixing all residues of each β‐strand except the first and last ones. In this way, we used the core of the Ig framework as the motif, allowing the design of all ββ loops with lengths from 3 to 10 residues, which also accommodates β‐strand extensions if needed during diffusion.
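
The motif definition for the scaffolding protocol can be written as an RFdiffusion contig string, in which fixed motif segments are given as chain‐prefixed residue ranges and diffused segments as length ranges, separated by `/`. The helper below is a hypothetical sketch of that construction; the strand boundaries in the example are invented, and the residues trimmed at the N‐ and C‐termini are simply dropped because the text does not specify how the termini were handled.

```python
def motif_contig(strands, chain="A", loop_min=3, loop_max=10):
    """Contig string keeping each strand minus its first and last residue as motif.

    strands: list of (start, end) residue numbers (inclusive) in the template chain.
    Diffused loops of loop_min to loop_max residues are placed between motif segments.
    """
    segments = []
    for i, (start, end) in enumerate(strands):
        if end - start < 2:
            raise ValueError(f"strand {start}-{end} is too short to trim both ends")
        if i > 0:
            segments.append(f"{loop_min}-{loop_max}")     # diffused loop between strands
        segments.append(f"{chain}{start + 1}-{end - 1}")  # fixed motif segment
    return "/".join(segments)


# Hypothetical strand boundaries of a 7-stranded template (not taken from the paper).
template_strands = [(1, 8), (12, 19), (24, 31), (36, 43), (48, 55), (60, 67), (72, 79)]
print(motif_contig(template_strands))
```

In RFdiffusion, such a string would typically be supplied through the `contigmap.contigs=[...]` override together with the template structure as `inference.input_pdb`; these option names should be verified against the RFdiffusion documentation for the version in use.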

For both protocols, five sequences per backbone were generated with ProteinMPNN using default parameters and subsequently threaded onto their original scaffold using the SimpleThreading and FastRelax movers, both implemented in RosettaScripts. To verify that these designs had the same 7‐stranded Ig topology as the parametric designs, we computed inter‐strand distances between E1, E2, and E5 (for the 3‐stranded β‐sheet) and between E3, E4, E6, and E7 (for the 4‐stranded β‐sheet). We then calculated their twist rotation values as described in Section 4.2.
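
One way to implement the topology check is to compute the minimum Cα–Cα distance for each pair of strands and to require that the strands of each sheet form a connected pairing network while no strand pairs across sheets. The 5.5 Å cutoff and this pairing criterion are our assumptions; the distance thresholds actually used are not reported.

```python
import itertools

import numpy as np

SHEET_A = ("E1", "E2", "E5")          # 3-stranded sheet
SHEET_B = ("E3", "E4", "E6", "E7")    # 4-stranded sheet
PAIRING_CUTOFF = 5.5                  # Å, hypothetical threshold for paired strands


def min_distance(a, b):
    """Minimum Cα-Cα distance between two strands given as (n, 3) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.min(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)))


def paired(strands, s1, s2, cutoff=PAIRING_CUTOFF):
    """True if strands s1 and s2 come within the pairing cutoff."""
    return min_distance(strands[s1], strands[s2]) <= cutoff


def sheet_connected(strands, sheet, cutoff=PAIRING_CUTOFF):
    """True if every strand of the sheet is reachable through paired neighbours."""
    seen = {sheet[0]}
    frontier = [sheet[0]]
    while frontier:
        current = frontier.pop()
        for other in sheet:
            if other not in seen and paired(strands, current, other, cutoff):
                seen.add(other)
                frontier.append(other)
    return len(seen) == len(sheet)


def has_expected_topology(strands, cutoff=PAIRING_CUTOFF):
    """Both sheets internally connected and no strand paired across sheets."""
    if not (sheet_connected(strands, SHEET_A, cutoff)
            and sheet_connected(strands, SHEET_B, cutoff)):
        return False
    return not any(paired(strands, a, b, cutoff)
                   for a, b in itertools.product(SHEET_A, SHEET_B))
```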

For in silico validation, we first performed protein structure predictions with ESMFold using default parameters (model = esm.pretrained.esmfold_v1()), and designs with pLDDT scores >90 were then predicted with AlphaFold2 (using default parameters and without multiple sequence alignments or template information). Finally, we calculated structural similarity metrics (TM‐score and RMSDs) for all predicted models against their reference (the original scaffold generated by RFdiffusion). Designs with an averaged composite score (TM‐score × pTM × pLDDT) ≥0.5 and an averaged RMSD ≤1.25 Å across the top 3 AF2 predictions, and with an RMSD standard deviation ≤0.25 Å across all five predictions, were classified as highly confident de novo immunoglobulins.
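
The ESMFold pre‐filter can be sketched by averaging per‐residue pLDDT over the Cα atoms of a predicted PDB file. This assumes that pLDDT is stored in the B‐factor column (PDB columns 61 to 66); because some ESMFold versions may write it on a 0 to 1 scale, a `scale` argument is provided. The function names are hypothetical.

```python
def mean_ca_plddt(pdb_text: str) -> float:
    """Average the B-factor column (assumed to hold pLDDT) over Cα atoms."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name in 13-16, B-factor in 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no Cα atoms found")
    return sum(values) / len(values)


def passes_esmfold_gate(pdb_text: str, threshold: float = 90.0, scale: float = 1.0) -> bool:
    """True if mean pLDDT (times scale, e.g. 100 for 0-1 B-factors) exceeds threshold."""
    return mean_ca_plddt(pdb_text) * scale > threshold
```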

AUTHOR CONTRIBUTIONS

Enrique Marcos Benteo: Conceptualization; investigation; funding acquisition; writing – original draft; writing – review and editing; supervision; project administration; resources; visualization; formal analysis; methodology; validation. Jorge Roel‐Touris: Conceptualization; investigation; writing – original draft; methodology; validation; visualization; software; supervision; data curation; formal analysis; funding acquisition. Lourdes Carcelén: Writing – review and editing; methodology; software; investigation; validation; visualization; formal analysis.

CONFLICT OF INTEREST STATEMENT

The research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

CODE AVAILABILITY STATEMENT

The Rosetta macromolecular modeling suite (http://www.rosettacommons.org) and the PyRosetta software (https://www.pyrosetta.org) are freely available to academic and non‐commercial users. Computational protocols for designing protein structures are available at https://github.com/JorgeRoel/betasandwich.

Supporting information

Data S1. Supporting Information.

PRO-33-e4936-s001.docx (6.3MB, docx)

ACKNOWLEDGMENTS

We acknowledge computing resources provided by the Galicia Supercomputing Center (CESGA), and the Red Española de Supercomputación (grants BCV‐2021‐1‐0014 and BCV‐2021‐3‐0010). This research was supported by grants from the Spanish Ministry of Science and Innovation (RYC2018‐025295‐I, EUR2020‐112164 and PID2020‐120098GA‐I00). J.R.T. was supported by an EMBO postdoctoral fellowship (under grant agreement ALTF 145‐2021). L.C. was supported by a predoctoral fellowship from the Spanish Ministry of Science and Innovation (PRE2021‐098555). We also acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).

Roel‐Touris J, Carcelén L, Marcos E. The structural landscape of the immunoglobulin fold by large‐scale de novo design. Protein Science. 2024;33(4):e4936. 10.1002/pro.4936

Review Editor: Aitziber L. Cortajarena

DATA AVAILABILITY STATEMENT

Design models of the high‐quality immunoglobulin domains are available online (https://zenodo.org/record/8380285). Other data are available from the corresponding author upon request.

REFERENCES

  1. Andreeva A, Kulesha E, Gough J, Murzin AG. The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res. 2020;48:D376–D382.
  2. Bhardwaj G, Mulligan VK, Bahl CD, Gilmore JM, Harvey PJ, Cheneval O, et al. Accurate de novo design of hyperstable constrained peptides. Nature. 2016;538:329–335.
  3. Bork P, Holm L, Sander C. The immunoglobulin fold. J Mol Biol. 1994;242:309–320.
  4. Bradley P, Misura KMS, Baker D. Toward high‐resolution de novo structure prediction for small proteins. Science. 2005;309:1868–1871.
  5. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script‐based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26:689–691.
  6. Chidyausiku TM, Mendes SR, Klima JC, Nadal M, Eckhard U, Roel‐Touris J, et al. De novo design of immunoglobulin‐like domains. Nat Commun. 2022;13:5661.
  7. Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte RJ, Milles LF, et al. Robust deep learning‐based protein sequence design using ProteinMPNN. Science. 2022;378:49–56.
  8. Dunbar J, Krawczyk K, Leem J, Baker T, Fuchs A, Georges G, et al. SAbDab: the structural antibody database. Nucleic Acids Res. 2014;42:D1140–D1146.
  9. Fleishman SJ, Leaver‐Fay A, Corn JE, Strauch E‐M, Khare SD, Koga N, et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS One. 2011;6:e20161.
  10. Halaby DM, Poupon A, Mornon J‐P. The immunoglobulin fold family: sequence analysis and 3D structure comparisons. Protein Eng Des Sel. 1999;12:563–571.
  11. Harteveld Z, Bonet J, Rosset S, Yang C, Sesterhenn F, Correia BE. A generic framework for hierarchical de novo protein design. Proc Natl Acad Sci U S A. 2022;119:e2206111119.
  12. Harteveld Z, Van Hall‐Beauvais A, Morozova I, Southern J, Goverde C, Georgeon S, et al. Exploring “dark matter” protein folds using deep learning. Bioinformatics. 2023.
  13. Hennetin J, Jullian B, Steven AC, Kajava AV. Standard conformations of β‐arches in β‐solenoid proteins. J Mol Biol. 2006;358:1094–1105.
  14. Jost C, Plückthun A. Engineered proteins with desired specificity: DARPins, other alternative scaffolds and bispecific IgGs. Curr Opin Struct Biol. 2014;27:102–112.
  15. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589.
  16. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features. Biopolymers. 1983;22:2577–2637.
  17. Kintzing JR, Filsinger Interrante MV, Cochran JR. Emerging strategies for developing next‐generation protein therapeutics for cancer treatment. Trends Pharmacol Sci. 2016;37:993–1008.
  18. Koga N, Tatsumi‐Koga R, Liu G, Xiao R, Acton TB, Montelione GT, et al. Principles for designing ideal protein structures. Nature. 2012;491:222–227.
  19. Könning D, Kolmar H. Beyond antibody engineering: directed evolution of alternative binding scaffolds and enzymes using yeast surface display. Microb Cell Fact. 2018;17:32.
  20. Leman JK, Weitzner BD, Lewis SM, Adolf‐Bryfogle J, Alam N, Alford RF, et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat Methods. 2020;17:665–680.
  21. Lin Y‐R, Koga N, Tatsumi‐Koga R, Liu G, Clouser AF, Montelione GT, et al. Control over overall shape and size in de novo designed proteins. Proc Natl Acad Sci U S A. 2015;112:E5478–E5485.
  22. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary‐scale prediction of atomic‐level protein structure with a language model. Science. 2023;379:1123–1130.
  23. Marcos E, Chidyausiku TM, McShan AC, Evangelidis T, Nerli S, Carter L, et al. De novo design of a non‐local β‐sheet protein with high stability and accuracy. Nat Struct Mol Biol. 2018;25:1028–1034.
  24. Marcos E, Silva D. Essentials of de novo protein design: methods and applications. WIREs Comput Mol Sci. 2018;8:E1374.
  25. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19:679–682.
  26. Ovchinnikov S, Huang P‐S. Structure‐based protein design with deep learning. Curr Opin Chem Biol. 2021;65:136–144.
  27. Pan X, Kortemme T. Recent advances in de novo protein design: principles, methods, and applications. J Biol Chem. 2021;296:100558.
  28. Peñas‐Utrilla D, Marcos E. Identifying well‐folded de novo proteins in the new era of accurate structure prediction. Front Mol Biosci. 2022;9:991380.
  29. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 2017;357:168–175.
  30. Roel‐Touris J, Nadal M, Marcos E. Single‐chain dimers from de novo immunoglobulins as robust scaffolds for multiple binding loops. Nat Commun. 2023;14:5939.
  31. Roney JP, Ovchinnikov S. State‐of‐the‐art estimation of protein model accuracy using AlphaFold. Phys Rev Lett. 2022;129:238101.
  32. Schumacher D, Helma J, Schneider AFL, Leonhardt H, Hackenberger CPR. Nanobodies: chemical functionalization strategies and intracellular applications. Angew Chem Int Ed. 2018;57:2314–2333.
  33. Sha F, Salzman G, Gupta A, Koide S. Monobodies and other synthetic binding proteins for expanding protein science. Protein Sci. 2017;26:910–924.
  34. Wang J, Lisanza S, Juergens D, Tischer D, Watson JL, Castro KM, et al. Scaffolding protein functional sites using deep learning. Science. 2022;377:387–394.
  35. Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023;620:1089–1100.
  36. Wellner A, McMahon C, Gilman MSA, Clements JR, Clark S, Nguyen KM, et al. Rapid generation of potent antibodies by autonomous hypermutation in yeast. Nat Chem Biol. 2021;17:1057–1064.
  37. Zhang Y, Skolnick J. TM‐align: a protein structure alignment algorithm based on the TM‐score. Nucleic Acids Res. 2005;33:2302–2309.
