Author manuscript; available in PMC: 2024 Dec 15.
Published in final edited form as: ACS Synth Biol. 2023 Nov 27;12(12):3608–3622. doi: 10.1021/acssynbio.3c00409

Hyperstable synthetic mini-proteins as effective ligand scaffolds

Paul L Blanchard 1, Brandon J Knick 1, Sarah A Whelan 1, Benjamin J Hackel 1,*
PMCID: PMC10822706  NIHMSID: NIHMS1960700  PMID: 38010428

Abstract

Small, single-domain protein scaffolds are compelling sources of molecular binding ligands with the potential for efficient physiological transport, modularity, and manufacturing. Yet, mini-proteins require a balance between biophysical robustness and diversity to enable new function. We tested the developability and evolvability of millions of variants of 43 designed libraries of synthetic 40-amino acid βαββ proteins with diversified sheet, loop, or helix paratopes. We discovered a scaffold library that yielded hundreds of binders to seven targets while exhibiting high stability and soluble expression. Binder discovery yielded 6–122 nM affinities without affinity maturation and Tms averaging ≥78 °C. Broader βαββ libraries exhibited varied developability and evolvability. Sheet paratopes were the most consistently developable, and framework 1 was the most evolvable. Paratope evolvability was dependent on target, though several libraries were evolvable across many targets while exhibiting high stability and soluble expression. Select βαββ proteins are strong starting points for engineering performant binders.

Keywords: protein engineering, ligand scaffolds, developability, evolvability, mini-protein


Introduction

Molecular binding ligands empower biology and medicine. Small, single-domain scaffolds offer several advantages that stem from their size.1,2 They exhibit high tumor penetration3,4 and high modularity, which enables linkage to other proteins for multifunctional fusions and bispecifics5–7. Their simple structure and lack of need for glycosylation allow for production in bacteria1,2 or by chemical synthesis8, which simplifies manufacturing. Small size also allows for controlled clearance kinetics – small proteins are effective for early time molecular imaging because fast clearance yields high contrast1, yet clearance times can be extended via addition of poly(ethylene glycol)9, Fc, or an albumin-binding peptide as needed.

An effective approach to engineer small protein ligands utilizes scaffolds, which are proteins consisting of a conserved stable backbone structure alongside a paratope that is diversified to discover new function. The scaffold concept aims to decouple the functional binding site from the stabilizing core10. These scaffolds are typically based upon a highly stable starting protein because such a protein can tolerate more destabilizing, but functional, mutations on the path to discovery of new function11,12. Many small protein scaffolds are in use, including affibodies13,14, fibronectins15, anticalins16, cystine knots17, Gp218, and many others19. Several of these small proteins have reached clinical trials, and a few have been FDA approved7,20,21. However, they exhibit a range of performance across metrics including evolvability and developability. Developability is the set of properties a protein must possess to be translated from the laboratory to genuine use, such as stability, solubility, and cellular expression.22–24 Developability liabilities add substantial cost and time to drug discovery, reduce efficacy, and may prevent utility altogether22,23,25. We define evolvability as the efficiency of discovery of new function from a naïve library; other groups use the term innovability for this property10.

Yet, as protein size is reduced to yield the aforementioned benefits of small proteins, separation of framework and paratope is increasingly challenging. This heightens the need for paratope efficiency to provide sufficient surface area to achieve specific, high-affinity interaction26 while limiting destabilizing diversification12,27–30. Effective identification of tolerant paratope sites, and amino acid options, can aid evolutionary efficiency31–39. More fundamentally, the molecular factors that impact the efficacy of a protein topology and paratope selection to serve as a scaffold are not well understood. Thus, we aimed to systematically evaluate various scaffold library strategies to assess their impact on developability and evolvability. Technologically, our goal was to expand the repertoire of available ligand scaffolds in pursuit of a more robust solution.

Motivated by the pursuit of small size and high stability for ligand scaffolds, we were intrigued by a family of synthetic hyperstable mini-proteins designed by the Baker group40. Several of these computationally designed βαββ proteins (comprising three β sheets and one α helix, Figure 1) were more protease resistant than the most stable natural monomeric protein in the Protein Databank41. Yet the molecules were designed to stably fold into the βαββ topology, not necessarily to tolerate diversification or enable selective binding.

Figure 1. Scaffold candidates have diverse frameworks and paratopes.


Four highly stable frameworks were diversified at three different paratope secondary structures. Within the sheets, the number and location of diversified sites was varied. (A) Structure of frameworks and library schemes (diversified sites shown in red spheres) with framework stabilities. (B) Sequence of frameworks and libraries. • indicates maintenance of parental sequence. A red X indicates amino acid diversity. (C) Structural similarity between frameworks.

We evaluated the potential of synthetic mini-proteins as scaffolds by systematically varying the framework sequence across several stable variants; the paratope location across β sheets, α helices, and loops; the paratope size; and the paratope amino acids. Sorting of several combinatorial libraries yielded stable binders without affinity maturation to a range of targets, including the cancer vasculature marker42 and T-cell suppressor43,44 B7-H3, B-cell malignancy and lung cancer biomarker CD2245,46, and prostate cancer biomarker prostate-specific membrane antigen (PSMA)47. Overall, discovery of binders to seven targets and measurement of five metrics of developability for millions of variants in 43 libraries revealed elements that dictate performance. Select βαββ proteins are a compelling starting point for the development of binders due to their small size, high stability, and moderate affinity.

Results

Select synthetic βαββ mini-proteins function as ligand scaffolds

We selected four of the most stable βαββ framework sequences40 as lead molecules to build ligand scaffold libraries (Figure 1A,B). The frameworks exhibit only 30% sequence identity across all four frameworks and 50–55% sequence identity between any two frameworks yet maintain structural similarity (average root-mean squared difference of 0.8 Å, Figure 1C) thereby providing a diverse set of chemical contexts within this topology. Various secondary structures have functioned as engineered ligand paratopes in other scaffolds20,48, albeit with differential performance49. Moreover, the relative mutational tolerance and binding capacity of different regions of these synthetic scaffolds are not known. Thus, we diversified the β-sheet surface, α-helical surface, or a pair of adjacent solvent-exposed loops (Figure 1A,B) and assessed the evolvability and developability of the resultant collection of variants. The α-helix and loops provided nine surface sites, which is a reasonable paratope size to engineer novel binding functionality26,50. The full β-sheet surface provides 11 sites. For a more even comparison across structures and to avoid destabilizing over-diversification, we also created libraries with partial diversification of the β-sheet surface focused on either two of the three strands or a reduced ‘length’ of the sheet (Figure 1A,B).
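The sequence identity figures quoted above (identity shared across all frameworks versus identity between any two) can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned and of equal length, which is a simplification. The function names and the 10-residue toy sequences are hypothetical and are not the actual framework sequences.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences carry the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def shared_identity(sequences):
    """Fraction of positions that are identical across every sequence in the set."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = list(zip(*sequences))
    return sum(len(set(col)) == 1 for col in columns) / len(columns)


# Toy 10-residue sequences (hypothetical, for illustration only).
toy = {
    "A": "ACDEFGHIKL",
    "B": "ACDEYGHIRL",
    "C": "ACNEFGHVKL",
}

pairwise = {
    (x, y): pairwise_identity(toy[x], toy[y]) for x, y in combinations(toy, 2)
}
across_all = shared_identity(list(toy.values()))
```

As in the framework comparison, identity across the whole set can be no higher than the lowest pairwise identity, because a position counts only when every sequence agrees.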

We assembled diversified genes (Table S1) and transformed them into a yeast display system51,52, as genetic fusions to the C-terminus of Aga2p with flanking epitope tags, via homologous recombination with a pCT vector. We achieved an average of 3.8×10⁶ transformants (range: 1.1×10⁵ – 1.7×10⁷) per library for a total of 1.7×10⁸ variants. 93% of transformants had DNA sequences that matched library intention, and amino acid frequencies matched intentions with an average deviation of 0.8% (Table S2).

To assess evolvability of each framework and paratope (secondary structure, size, location, and sequence), we pooled all the libraries and sorted the collective library for binders to seven targets with potential clinical utility: PSMA, CD22, natural killer cell p46-related protein (NKp46; also known as NCR1), insulin receptor isoform A (IRA), tumor endothelial marker 8 (Tem8; also known as ATR1), immunoglobulin G (IgG), and B7 homolog 3 (B7-H3; also known as CD276). These proteins have varied structure (Figure S1) and thereby test the scaffolds’ abilities to bind to a wide variety of protein structures. Populations of variants exhibiting specific binding to nM concentrations of target were enriched for all targets following magnetic-activated cell sorting (MACS) and fluorescence-activated cell sorting (FACS) without the need for affinity maturation (Figure S2). Binding populations showed minimal binding to multiple non-target proteins at 100 nM (Figure S2). Deep sequencing of all binding populations revealed the diversity and identity (framework, paratope type, size, location, and sequence) of binders. 30,012 unique binders were discovered against 7 targets with the most effective libraries yielding an average of 215–772 binders per target.

To further assess scaffold functionality, fifteen randomly selected variants to B7-H3, CD22, IgG, and PSMA from the enriched binding pool were tested for affinity, and the twelve best binders were produced and characterized (Figure 2, Table S3). The best of the tested variants had high affinity (range: 6 to 33 nM) while several others had more moderate affinity (range: 103 to 122 nM) (Figure 2A, Figure S3A). Two of the tested CD22 variants had minimal binding at 100 nM and were not tested further. The tested variants also possessed high specificity with minimal binding to several off-target proteins (Figure 2B, Figure S3B). Off-target binding was assessed with three replicates against the following off targets: B7-H3 binders against PSMA and CD22, PSMA binders against CD22 and B7-H3, CD22 binders against B7-H3 and PSMA, and IgG binders against B7-H3. To further assess the specificity and functional binding of these proteins, recombinantly produced B7-H3 binders were tested against mammalian cell expressed B7-H3 and B7-H3 KO cells. B7-H3 binding variants exhibit strong B7-H3 binding at 50 nM and essentially no binding to B7-H3 KO cells at 1000 nM (Figure S4). Produced proteins had an average Tm of ≥ 78 °C (range: 54 °C to ≥ 90 °C) (Figure 2C,D, Figure S5) without selecting for stability and an average expression of 1.5 mg/L (range 0.02 to 5.0) (Figure 2D) with no codon or protocol optimization; the lone exceptions were IgG #1 and IgG #2, which were codon optimized and exhibited the highest yields of tested variants.

Figure 2. Characterization of binding variants.


(A) Yeast displaying the indicated βαββ variant were incubated with titrated concentrations of B7-H3 or PSMA; binding was detected via flow cytometry and normalized to the range between the fitted maximum and minimum values. Different symbols indicate different replicate experiments. Additional curves in Figure S3A. (B) Yeast displaying the indicated βαββ variant were incubated with 100 nM non-target proteins or 50 nM target proteins, and binding was assessed via flow cytometry. The figure shown is one representative trial; additional trials and the remaining binders are shown in Figure S3B. (C) Purified βαββ, variant B7-H3 #1, was evaluated via circular dichroism as a wavelength scan from 190 – 260 nm at 25 °C (before or after 95 °C incubation) or 95 °C as well as a temperature scan from 25 to 95 °C with monitoring at 218 nm. Additional spectra are shown in Figure S5. (D) The table presents affinities, average maximum off-target binding (fluorescence of off target binding at 100 nM relative to fluorescence of on target binding at 50 nM), Tm measured via CD, structural deviation from wild-type (RMSD via AlphaFold2), AlphaFold2 confidences (Ave. pLDDT), and bacterial expression levels. N.D. indicates that a Tm was unable to be calculated. B7-H3 #4 had too low a yield to measure CD. PSMA #1 and IgG #3 had uncharacteristic CD curves and thus Tms were unable to be calculated. (E) AlphaFold2 predictions of structures are overlaid on top of each parent structure (black). 22 proteins emerged from FW1, 22 from FW2, 3 from FW3, and 13 from FW4. (F) Histogram of structural deviation between predicted protein structure versus the parent excluding two proteins with an average AlphaFold2 pLDDT < 70.

To assess whether diversified variants fold comparably to parents, the structures of the 12 characterized variants as well as 48 other binding variants, selected to span a range of stabilities and targets, were predicted via AlphaFold53. The majority of variants have predicted structures very similar to the parental frameworks (Figure 2E), with an average RMSD of 0.9 Å (Figure 2F). A caveat is that these structures are computationally predicted rather than experimentally measured. To account for this, circular dichroism was used to determine secondary structure of proteins (Figure 2C, Figure S5, Table S3), and all showed elements of both helical and sheet structure, with many showing similar secondary structures to parents. Inclusion of AS and GS linkers as well as an H6 tag may account for some of the differences from parental FWs.

Scaffold elements impact evolvability

Evolvability was measured and compared for each combination of framework, paratope structure, and paratope size and location by comparing enrichment of the library in the binding population relative to the unsorted collective library. βαββ scaffold evolvability varied over both framework and paratope (Figure 3A,B). FW1 exhibited the most evolvability across different paratopes while FW4 yielded abundant binders in several paratope options (Figure 3A), including a comparable enrichment of binders with the helix paratope as FW1 (Figure S6A). FW2 yielded binders primarily from the sheet paratope, while FW3 was generally less effective except for diversification of eight sites in three sheets (β ABC8; see Figure S6B). Evolvability scores calculated using populations after magnetic bead sorting (MACS) showed little difference relative to populations after more stringent FACS sorting (MACS vs 100 nM FACS, R²=0.70; MACS vs 10 nM FACS, R²=0.82).
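The enrichment comparison described above can be sketched in Python as follows. This is a minimal illustration, assuming per-library sequence counts are available for the binder pool and for the unsorted pool. The exact scoring procedure is not detailed in this section, and the function name, library labels, and counts below are hypothetical.

```python
def evolvability_scores(binder_counts, unsorted_counts):
    """Enrichment of each library in the binder pool relative to the unsorted pool.

    A score above one means the library makes up a larger share of the binder
    pool than of the unsorted pool.
    """
    binder_total = sum(binder_counts.values())
    unsorted_total = sum(unsorted_counts.values())
    if binder_total == 0 or unsorted_total == 0:
        raise ValueError("both pools must contain at least one count")
    scores = {}
    for library, n_unsorted in unsorted_counts.items():
        if n_unsorted == 0:
            continue  # enrichment is undefined for a library absent from the unsorted pool
        unsorted_freq = n_unsorted / unsorted_total
        binder_freq = binder_counts.get(library, 0) / binder_total
        scores[library] = binder_freq / unsorted_freq
    return scores


# Hypothetical counts, for illustration only.
unsorted_pool = {"FW1_sheet": 1000, "FW3_loop": 1000, "FW4_helix": 2000}
binder_pool = {"FW1_sheet": 600, "FW3_loop": 50, "FW4_helix": 350}

scores = evolvability_scores(binder_pool, unsorted_pool)
```

With these toy counts, the first library is over-represented among binders (score above one) while the other two are under-represented.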

Figure 3. βαββ scaffolds exhibit varied evolvability.


The evolvability scores – enrichment of that library in the binder pool relative to the unsorted pool – are shown for (A) each framework across all paratopes and (B) each paratope type across all paratope sizes and locations in each framework. Significance bar colors correspond to framework. (C) Evolvability scores for each paratope structure, computed as the median across frameworks and particular sets of paratope sites, are plotted for each target.

Overall, framework has a much larger impact on evolvability than paratope choice. Diversification of three adjacent sheets provides the most effective evolutionary performance (Figure 3B); for FW4, however, the helix paratope is most effective. Diversification of three sheets was more effective than diversification of two sheets for all frameworks (Figure S6A). ABC8 and ABC11 are the most performant three-sheet libraries, which may indicate a benefit of diversification of sites at the C-terminus of strand B and the N-terminus of strand C (Figure 1A and Figure S6B). AC7 is the least effective two-sheet library, which potentially indicates that sheet B is important to diversify for evolvability (Figure S6B). The helical paratope is moderately effective for the two most evolvable frameworks, with low effectiveness for the two least evolvable (Figure S6A). The loop paratope is the least effective (Figure S6A) yet still functional, as 2,274 binders were identified via loop diversification, albeit at a much lower frequency (and/or binding strength) than from other paratopes. One reason that paratope choice seems to play less of a role than framework in determining evolvability is that different targets prefer different paratopes for binding (Figure 3C).

Select synthetic βαββ mini-proteins are highly developable

Along with primary function, such as binding, developability is also requisite for protein utility. We evaluated multiple aspects of developability via two assays: proteolytic and thermal stability measured via a yeast display protease assay24,40; and solubility and expression measured via a bacterial split-GFP assay24. Thermal stability, which correlates with proteolytic and chemical stability,54 is a validated predictor of clinical developability55. Soluble expression aids utility and correlates with clinical advancement55. Stability evaluation was generalized by running the protease assay with two different proteases (proteinase K and thermolysin, chosen for their relatively broad profile of substrate sequences), each at two different temperatures (37 °C and 55 °C for proteinase K, 55 °C and 75 °C for thermolysin). Only moderate correlation was observed between protease treatments (R²=0.54), perhaps in part from moderate sequence dependence, indicating the importance of testing resistance to multiple proteases. Protease resistance between varied temperatures showed high correlation (proteinase K, R²=0.90; thermolysin, R²=0.87).

Unlike evolvability, developability varies more strongly with paratope type than framework (Figure 4A,B). FW1 variants exhibit the most stability to proteinase K (Figure 4A). FW3 exhibits slightly lower soluble expression than the other parental frameworks. From a paratope perspective, β-sheet diversity yields high protease stability to both proteinase K and thermolysin (Figure 4B). Helix paratopes generally exhibit comparably high thermolysin stability, but only FW1 and FW2 exhibit proteinase K stability. Conversely, loop paratopes exhibit low resistance to proteolysis. In addition to their superior protease resistance, β-sheet paratope variants also exhibit the highest soluble expression.

Figure 4. Scaffold developability varies by framework and paratope.


Developability scores – computed as the enrichment of variants in the top 5% of protease resistant cells or top 10% of GFP expressing cells, normalized to the median across libraries – are shown for proteinase K, thermolysin, and split GFP. Statistical significance is shown under graphs with matching colors and line thickness; significance is designated by solid or dotted lines. Comparison of developability scores by (A) framework, (B) paratope, and (C) sheet diversification.

While β-sheet paratope variants are generally the most developable, diversification of different sites on the sheets results in varied developability. The 3-strand paratope variants exhibit protease stability and soluble expression as good or better than various 2-strand paratopes (Figure 4C). Conversely to evolvability (Figure 3A), an intermediate number of diversified sites (9 vs. 8 or 11) excluding the C-terminus of strand B and the N-terminus of strand C is optimal for protease stability and soluble expression. Diversifying sheet B without diversifying both sheets A and C also is destabilizing. Three-sheet, 11-site diversification is very detrimental to bacterial expression.

These developability analyses were based on library prevalence in the most stable populations (denoted in text as developability score, detailed in Methods). Alternate analysis of the distribution of all stabilities across the library (denoted as developability enrichment score, detailed in Methods) generally aligned in indicating the most performant libraries (Figure S7). In several cases, however, these two stability scores do not correlate. While loop libraries have very few highly stable variants (Figure 4B), they have average stabilities comparable to those of helix or sheet libraries (Figure S7). While multiple FW2 and FW3 β3 libraries have many highly stable variants, they have much lower average stabilities than other libraries (Figures 4C, S7).
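The developability score, described in the Figure 4 caption as enrichment in the top gate normalized to the median across libraries, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: per-library counts in the gated (most stable or highest expressing) population and in the unsorted population are taken as given, and the function name, labels, and counts are hypothetical. The full procedure is detailed in the Methods and may differ.

```python
from statistics import median


def developability_scores(gated_counts, unsorted_counts):
    """Per-library enrichment in the gated population, normalized to the median library."""
    gated_total = sum(gated_counts.values())
    unsorted_total = sum(unsorted_counts.values())
    if gated_total == 0 or unsorted_total == 0:
        raise ValueError("both populations must contain at least one count")
    enrichment = {
        library: (gated_counts.get(library, 0) / gated_total) / (n / unsorted_total)
        for library, n in unsorted_counts.items()
        if n > 0
    }
    midpoint = median(enrichment.values())
    if midpoint == 0:
        raise ValueError("median enrichment is zero; cannot normalize")
    # A score above one means the library is more developable than the median library.
    return {library: value / midpoint for library, value in enrichment.items()}


# Hypothetical counts, for illustration only.
unsorted_population = {"sheet": 1000, "helix": 1000, "loop": 1000}
top_gate = {"sheet": 300, "helix": 150, "loop": 50}

dev_scores = developability_scores(top_gate, unsorted_population)
```

Because this score counts only variants reaching the top gate, it reflects the tail of the stability distribution rather than its average, which is why it can disagree with a distribution-wide score.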

While stability varies across paratopes and frameworks for the libraries, binding variants generally exhibit higher stability than non-binders (Figure 5), especially against proteinase K. βαββ binders are generally more resistant to proteinase K than to thermolysin, and FW1 and FW4 are generally more resistant to proteinase K than FW2 or FW3. FW1 binders on average exhibit lower (albeit equal to other FWs) resistance to thermolysin than the naïve library does (Figure 5B). In clonal evolution, stability and function trade off12; thus, high parental stability is beneficial because it enables structural tolerance upon mutation for new or improved function. Yet, in the current context of library discovery, the trade-off is not strongly observed. Rather, variants that maintain structural integrity (as assessed via protease resistance) are more functional. Further studies are needed to assess if this observation results from reduced entropic cost of binding56,57 for rigid structures or if an alternative mechanism is driving the observation. In either case, the results indicate that parental hyperstability is insufficient for evolvability, but with appropriate diversification, novel function can be achieved with (and perhaps because of) maintained stability.

Figure 5. Binder populations have a consistently higher average stability than the unsorted libraries.


The average developability enrichment scores (enrichment upon protease resistance sorting) for the populations evolved to bind target are plotted versus the average developability enrichment scores of the naïve library. FW3 proline (pink, small shapes) have very few binders that were measured for stability. (A) Proteinase K. (B) Thermolysin. Dotted line indicates equal stabilities between the binders and the naïve library.

Highly performant βαββ scaffolds

Developability and evolvability were examined simultaneously to identify libraries that exhibited high scores in both metrics (Figure 6A). The libraries with three β-strands diversified from framework 1 exhibit the best performance. FW1 β ABC8 emerged as the most promising scaffold. Other libraries are also effective albeit with moderate shortcomings revealed by analysis of the components of evolvability and developability (Figure 6B,C). FW1 β ABC11 is evolvable and stable but exhibits moderate recombinant expression. FW1 β ABC9 is highly developable but yielded few binders for Tem8 or B7-H3. Diversification of two β-strands in FW1 is reasonably effective albeit with poor thermolysin stability. The α-helix library of FW1 is stable and yields binders to several targets but exhibits moderate expression and is less effective against several other targets. A library of 11 sites in three β-strands in FW4 is also stable and moderately evolvable albeit with poor expression. Overall, nine libraries had both developability and evolvability scores above one. Many other highly developable libraries had low evolvability scores, whereas no libraries had low developability but high evolvability.

Figure 6. Nine libraries are mutually developable and evolvable, with three especially performant libraries.


(A) The median evolvability score is plotted versus mean developability score. Developability scores above one indicate that library is more developable than the median library, while evolvability scores above one indicate that the library is more enriched in the binder pool than the naïve library. Paratope is indicated by shape, while framework is indicated by color. (B) Evolvability scores for each target for the nine most developable/evolvable libraries. (C) Developability scores for each assay for the nine most developable/evolvable libraries.

While some libraries were generally more evolvable than others, few libraries were enriched in the binding pool relative to the naïve library for all campaigns. The evolvability of the three paratope structures depends upon the target of choice, potentially due to shape differences in the target epitopes. Against three targets (B7-H3, IgG, and Tem8), helical libraries show strong evolvability and loops show the lowest evolvability (Figure 3C). Against three other targets (NKp46, IRA, PSMA), sheets display strong evolvability, and helix libraries show lower evolvability than loop libraries. Against CD22, all three paratope types have similar evolvability scores. While the mechanism behind these preferences is not fully elucidated, the results indicate that binder discovery against new targets with these βαββ proteins would benefit from a mixture of paratope structures. This trend aligns with a previous study that showed that scaffolds with different paratope surfaces have performed better at binding different types of targets due to their shape58.

Paratope sequence analysis

To elucidate how paratope sequence impacts performance, amino acid enrichment of sheet libraries in the binding and developable populations relative to the unsorted library were calculated for each framework at each diversified site. Multiple sites exhibit substantial selective amino acid enrichment in binding variants (Figure 7A). For example, sites 1, 3, 31, and 38 are enriched in select aromatic amino acids; sites 3 and 31 also enrich medium hydrophobic amino acids. Conversely, sites 5 and 29 – and to a lesser extent site 27 and 34 – enrich negatively charged amino acids. For sheet libraries, amino acid enrichment trends are generally consistent between FW1, FW2, and FW4, but diverge for FW3, which was notably the least evolvable framework.
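The sitewise enrichment calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes equal-length paratope sequences for the binder pool and the unsorted library. The use of a log2 ratio and of a pseudocount (to avoid division by zero for unobserved residues) are assumptions of this sketch and are not stated in the text. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(sequences, pseudocount=1.0):
    """Per-site amino acid frequencies, smoothed with a pseudocount (an assumption here)."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    frequencies = []
    for position in range(length):
        counts = Counter(seq[position] for seq in sequences)
        frequencies.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return frequencies


def sitewise_enrichment(selected_seqs, unsorted_seqs, pseudocount=1.0):
    """log2(frequency in selected pool / frequency in unsorted library) per site and residue."""
    selected = site_frequencies(selected_seqs, pseudocount)
    unsorted_ = site_frequencies(unsorted_seqs, pseudocount)
    return [
        {aa: math.log2(sel[aa] / uns[aa]) for aa in AMINO_ACIDS}
        for sel, uns in zip(selected, unsorted_)
    ]


# Toy two-site paratopes (hypothetical, for illustration only).
unsorted_library = ["AD", "WE", "LK", "GS"]
binder_pool = ["WD", "WE", "WD", "FE"]

enrichment = sitewise_enrichment(binder_pool, unsorted_library)
```

In this toy example, tryptophan at the first site has a positive log2 enrichment and glycine a negative one. The same function applies to the developable populations by passing the protease-resistant or GFP-expressing pool as the selected set.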

Figure 7. Sitewise amino acid enrichments in binding and developable variants from the sheets libraries.


(A) Enrichment values compare the frequency of each amino acid in the binder pool to the frequency in the unsorted library for each of the four frameworks. (B) Enrichment values compare the frequency of each amino acid in the most developable pool to the frequency in the unsorted library. Proteinase K (at 37 and 55 °C), thermolysin (at 55 and 75 °C), and split GFP are shown. Values consist of the average for sheets across all four frameworks, so WT is not indicated.

As for developability, negative charges are generally unfavorable for protease stability but are favorable for GFP expression (Figure 7B). Cysteine is enriched in the most stable variants, although not in β-strand C. Nonpolar I, L, and V as well as aromatic residues are generally favored. Proline at select sites hinders protease resistance in sheet paratopes but aids soluble expression of variants at sites 38 and 34 (Figure 7B).

Amino acid enrichment data for the most promising scaffolds could be combined to create even more promising second-generation libraries. For FW1 β ABC8, binding variants tend to prefer negatively charged amino acids at select sites, and non-polar or aromatic residues at remaining sites (Figure S8A). For protease resistance, negatively charged amino acids are unfavorable, and nonpolar residues are generally favorable. While these enrichments (and enrichment for split-GFP) do not fully align, they can be combined to develop an improved library. Several amino acids are either enriched or depleted for all three and can be increased or decreased in a combined library. Otherwise, the select sites that have high negative charge enrichment in the binder pool can be preferentially designed for negative charge, while negatively charged residues at other sites can be reduced in future libraries or variants to increase stability.

Amino acid enrichment values were also calculated for 1-α (Figure S8B) due to the target preference for different paratopes. 1-α is enriched in non-polar/aromatic/cysteine for thermolysin resistance, and cysteine/polar/positively charged for proteinase K resistance, whereas binder enrichment generally favors positively charged or non-polar for helices. Combining data from these three quantities could be used to improve these scaffolds in future work.

Discussion

In this study, we assessed four computationally designed40 hyperstable βαββ mini-proteins (40 amino acids) as scaffolds for ligand engineering by diversifying various paratopes. Across 43 designed libraries of systematically varied framework sequence, paratope structure, location, and size, the βαββ proteins exhibit a range of performance, with multiple effective scaffolds yielding discovery of numerous binders against seven different targets. Many of these scaffolds were also developable, with sheet libraries generally being the most developable. FW1 three-strand diversified libraries emerged as the most promising scaffolds due to their high developability and evolvability. Affinities of 6 – 122 nM were achieved in direct selection from a modest library size without affinity maturation, with only a single round of FACS, and with tested variants chosen randomly from the FACS pools. Directed evolution, increased FACS stringency, and/or selecting variants based on both frequency and enrichment would likely result in even higher affinity variants.

Leading scaffolds and many of the tested binders have high developability. The initial frameworks and most of the evolved variants (100% of the 12 characterized variants, 96% of all binders) lack disulfide bonds yet exhibit high stability. Many of the engineered proteins maintain the high stability of the parent despite mutating 7–11 sites of the 40-amino acid protein, with 6 of 9 variants having a Tm ≥ 80 °C and not exhibiting a denatured plateau via CD as the temperature reached 95 °C. The proteins produced in this study have only moderate yield in unoptimized shake flask culture (average of 1.5 mg/L produced); however, many were not codon optimized for bacteria. Most notably, B7-H3 #1 contains an AGG codon for R while B7-H3 #4 has an AGG and a CGA codon for R. Codon selection, fermentation optimization, or even solid-phase synthesis could be pursued to enhance productivity.

In general, loops were the least effective paratope tested across frameworks for both developability and evolvability. The loops in question, however, are very short (Figure 1A) and lack the length and flexibility present in a traditional loop paratope20,59. Extending the loops to resemble the lengths seen in other scaffolds may make the βαββ loop paratope more viable.

Framework 1 was the most evolvable framework and was the source of the three most evolvable and developable libraries. This is consistent with previous data showing that highly stable proteins are a strong starting point for engineering new function11, as framework 1, called EHEE_rd2_0005 by Baker40, was the most stable parent sequence tested, and to their knowledge, was the most stable minimal protein lacking disulfide bonds discovered at the point of their publication. However, all 43 designed protein libraries were based upon highly stable proteins (Tms ≥85 °C), and many were lacking in either evolvability or developability, thereby challenging the prediction of whether new βαββ FWs similarly designed to be hyperstable would also be good scaffolds. Thus, it may be effective to design not only for parental stability, but also variant stability. Rather than computationally predicting the stability of a single protein and basing a library upon it, it may be beneficial to computationally predict a family of sequence-similar proteins and select a scaffold with a stable family. Second-generation libraries based upon this scaffold and these libraries have the potential to have even higher developability and evolvability38,39,60–62. This can be best achieved by using amino acid enrichment data to improve the most performant libraries, such as 1-β ABC8. Second-generation studies could also further advance understanding of the rationale behind the success and failure of different library strategies.

Conclusion

In this study, we evaluated a new route to molecular binding ligands: synthetic mini-proteins, computationally designed by others for clonal stability, repurposed as engineerable scaffolds. While most combinations of framework, paratope structure, and paratope sites did not yield evolvable and developable distributions of variants, select libraries were highly performant. With modest library sizes (4 million variants per library) and no affinity maturation (i.e. direct discovery without further evolution), molecules exhibited affinities of 6–122 nM, high specificity, and thermal stabilities of Tm ≥78 °C. The results highlight the challenge in engineering the framework and binding paratope, uniquely structurally integrated in mini-proteins, while elucidating efficient evolutionary strategies to enable the potential benefits in drug delivery, multi-functional modularity, and binding mechanisms. Opportunities for further optimization include directed evolution of lead variants, bacterial expression optimization, and library sequence constraint.

Methods

Cell Maintenance

Saccharomyces cerevisiae EBY100 yeast were grown in YPD (10 g/L yeast extract, 10 g/L peptone, 20 g/L dextrose) media prior to electroporation transformation at 30 °C. Upon transformation with pCT plasmid-containing libraries, yeast were grown in SD media (16.8 g/L sodium citrate dihydrate, 3.9 g/L citric acid, 20 g/L dextrose, 6.7 g/L yeast nitrogen base, and 5 g/L casamino acids) with chloramphenicol antibiotic at 30 °C. Protein surface display was induced by switching media from SD + chloramphenicol to SG + chloramphenicol (10.2 g/L sodium phosphate dibasic heptahydrate, 8.6 g/L sodium phosphate monobasic monohydrate, 19 g/L galactose, 1 g/L dextrose, 6.7 g/L yeast nitrogen base, 5 g/L casamino acids) at 20 °C.

HEK293 cells expressing IRA-mCherry vector, provided by Doug Yee (University of Minnesota), were grown at 37 °C in DMEM with 10% fetal bovine serum and 1% penicillin-streptomycin. Cells were washed with PBS and detached with 2 mL of 0.05% trypsin/EDTA for 3 minutes. 5 mL of DMEM media prewarmed to 37 °C was added to inactivate the trypsin, and cells were spun down at 500 g for 3 minutes and either diluted for further growth or lysed, essentially as described63. For lysis, cells were washed three times with 5 mL of ice-cold PBS. 200 μL of cell lysis buffer + protease inhibitor cocktail was added, and cells were incubated for 20 minutes at 4 °C under rotation. Cell debris was removed via centrifugation at 15,000 g for 20 minutes.

HEK293 cells with B7-H3 knocked out were maintained identically to the IRA-mCherry-expressing HEK293 cells. MS1 cells expressing B7-H364 were maintained similarly to HEK293 cells, except that cells were treated with trypsin/EDTA for 5 minutes.

Library Design and Construction

Genetic libraries were constructed based upon four synthetic βαββ proteins40. The four most stable proteins were chosen: EHEE_rd2_0005, EHEE_rd2_0008, EHEE_rd2_0875, and EHEE_rd3_0124, herein named framework (FW) 1, 2, 3, and 4. Genetic libraries were constructed for each framework, with potential paratopes created by diversifying the sequence encoding the surfaces of either the α-helix, the loops connecting the β-sheets and α-helix, or the β-sheets. In general, sites were diversified using NNK degenerate codons, which can encode all 20 amino acids but reduce the probability of stop codons (relative to NNN). At several sites, mutation stability data40 indicate that P is very destabilizing. At any site where the consensus stability score measured for proline was more than 17% below the wild-type, P was excluded from the diversity (Table S1 clarifies design). Parallel libraries were also made with full NNK diversity for comparison.

For the loop libraries, the eight sites encoding the loops connecting the first β-strand to the α-helix, and the second β-strand to the third, were diversified. Site 10 (the ninth diversified site) on the β-strand directly adjacent to the loop was also diversified in proteins FW1 and FW2 to allow for wild-type E or the neutral mutant Q in case charge repulsion interfered with target binding. In FW3, this site was diversified to allow for wild-type R and Q. In the second library, nine sites on the solvent-exposed surface of the α-helix were diversified. Stability data showed that many amino acids at site 21 result in large decreases in stability compared to WT, so an HDB codon was used, which excluded P, A, T, V, D, E, and G. For the β-sheet libraries, seven to eleven sites on the solvent-exposed face of the β-sheet were diversified. To assess the impact of paratope size and location, eight, nine, or eleven sites were diversified across all three β-strands. To assess the impact of the number of diversified strands, libraries were also made with seven or eight sites diversified on pairs of β-strands. Libraries are summarized in Figure 1 and detailed in Table S1.
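The codon choices described above can be checked by enumeration. The following is a minimal sketch, not the authors' library-design code: it expands a degenerate codon with the standard IUPAC nucleotide codes and translates each concrete codon with the standard genetic code. The function names `expand` and `summarize` are illustrative.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC degenerate nucleotide codes needed for NNN, NNK, and HDB.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "H": "ACT", "D": "AGT", "B": "CGT"}


def expand(degenerate_codon):
    """Return every concrete codon encoded by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def summarize(degenerate_codon):
    """Count codons and stop codons, and list the encoded amino acids."""
    translated = [CODON_TABLE[c] for c in expand(degenerate_codon)]
    return {
        "codons": len(translated),
        "stops": translated.count("*"),
        "amino_acids": "".join(sorted(set(translated) - {"*"})),
    }


for name in ("NNN", "NNK", "HDB"):
    print(name, summarize(name))
```

Under these definitions, NNK keeps all 20 amino acids while leaving a single stop codon (TAG) among 32 codons, compared with three stops among 64 for NNN, and HDB omits exactly P, A, T, V, D, E, and G.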

Gene and pCT80 vector Preparation

The 43 genetic libraries were created via polymerase chain reaction (PCR) using oligonucleotides encoding the library, purchased from IDT. A 44th library, FW3 β AB7, had design errors that were caught late in the study, so it is not included in the analysis. An initial PCR assembled the oligos into genes using the following conditions: 1X Q5 reaction buffer, 200 μM dNTPs, 0.5 μM primers, 0.005 μM of each oligonucleotide, and 0.02 U/μL Q5 Hot Start High-Fidelity DNA Polymerase (NEB) in 50 μL of deionized water (dH2O) run at 98 °C for 30 s, (98 °C for 10 s, 60 °C for 20 s, 72 °C for 20 s) for 40 cycles, and 72 °C for 120 s. Full-length gene sequences were isolated using agarose gel electrophoresis and purified via gel extraction. Amplification of genes was performed with Phusion polymerase under the following conditions: 10 ng of total DNA template, 0.02 U/μL Phusion polymerase, 1X Phusion buffer, 0.5 μM primers, 200 μM dNTPs in 400 μL of dH2O, split into 8 separate reactions and run at 98 °C for 30 s, (98 °C for 10 s, 64 °C for 20 s, 72 °C for 20 s) for 30 cycles, and 72 °C for 120 s. DNA was purified via ethanol precipitation prior to electroporation.

The 40 amino acid proline/alanine/serine linker65 was appended to the native linker in the yeast plasmid display vector pCT51 with the MYC epitope peptide tag replaced with a V5 tag, and the sequence was confirmed via Sanger sequencing. The plasmid was miniprepped, digested with NheI-HF (NEB) and BamHI-HF (NEB) restriction enzymes at 37 °C for four hours, and purified via ethanol precipitation.

Yeast Transformation

Competent EBY100 cells were prepared prior to electroporation as described33. Cells were grown overnight to an optical density at 600 nm of 1.4 and washed with cold dH2O and buffer E (1.0 M sorbitol, 1 mM CaCl2). Cells were incubated at 30 °C for 30 minutes with lithium acetate solution (0.1 M lithium acetate, 10 mM Tris, 1 mM ethylenediaminetetraacetic acid, pH 7.5 and 10 mM dithiothreitol). Cells were subsequently washed with cold buffer E and resuspended in buffer E alongside gene libraries (200 pmol of insert) and vector (6 μg of digested vector). Gene libraries were combined into four groups based on their parental framework before electroporation. Cells were electroporated at 1.2 kV and 25 μF and grown in YPD medium for 1 hour at 30 °C before being switched to SD media for overnight growth and induction at 30 °C. Protein surface expression was induced by switching media from SD + chloramphenicol to SG + chloramphenicol at 20 °C.

Epitope Labeling

Cells were prepared for flow cytometry analysis by labeling the HA and V5 tags. 1 million yeast cells were stained with 50 ng chicken anti-HA antibody (Abcam cat: ab9111), 50 ng mouse anti-V5 antibody (Bio-Rad cat: MCA1360) in 50 μL PBSA and 50 ng goat anti-chicken AlexaFluor488 (Invitrogen cat: A11039) and 50 ng goat anti-mouse AlexaFluor647 (Invitrogen cat: A21325) for 20 minutes at 4 °C and washed with PBSA.

High Throughput Developability Assays

Protease Assays

5 million yeast cells per 100 μL PBSA were treated with protease (thermolysin or proteinase K) for ten minutes at 37, 55 or 75 °C and washed with cold PBSA before epitope labeling (HA and V5 antibodies). Cells were treated at 10⁻⁵ U/μL proteinase K (NEB cat: P8107S) at 37 °C and 55 °C and at 2 μg/mL thermolysin (Promega cat: V4001) at 55 °C and at 10 μg/mL thermolysin at 75 °C.

Cells were labeled for epitope tags as previously described. During FACS, cells were gated by drawing a gate around the HA+, V5+ population and, within that population, drawing diagonal gates (V5:HA ratio) consisting of the top 5%, next 10%, next 15%, and bottom 70% of cells. Earlier experiments were instead gated by sorting a population of cells that was not treated with protease and drawing a gate around cells that did not lose their pretreatment expression, a gate around HA+, V5- cells, and two approximately equal cell gates in between. To combine these, earlier trials with percentages in the top gate close to one of the later thresholds were used. Cells were collected using FACS.

Split GFP Preparation

Constructed genes from the yeast library pre-electroporation were amplified and digested using NheI and BamHI. The same amplification protocol from above was used, and DNA was restriction enzyme digested using 1 μg DNA, 1X CutSmart Buffer, 1 μL NheI-HF and 1 μL BamHI-HF in 50 μL at 37 °C for 2 hours. DNA was run on a 2% agarose gel and gel extracted using a Zymoclean gel DNA recovery kit into 10 μL elution buffer using manufacturer’s protocol. pET-GFP11 vector24 was digested with NheI and BamHI (2 μg DNA, 5 μL CutSmart buffer, 1 μL NheI, 1 μL BamHI, 1 μL CIP in 50 μL at 37 °C for 2 hours). Digested vector was run on a 0.75% agarose gel and gel extracted into 10 μL dH2O using a Zymoclean gel DNA recovery kit using manufacturer’s protocol.

All ligations were prepared on ice. 0.1 pmol digested vector, 0.5 pmol digested insert, 10 μL ligase buffer, 10 μL ATP, and 5 μL of 2,000 U/μL T4 ligase were combined in 100 μL following NEB manufacturer protocol and run at 22 °C for 15 min and 60 °C for 10 min. DNA was cleaned using a Qiagen MinElute kit and eluted into 10 μL elution buffer following manufacturer’s protocol. Six ligations were performed. Four each included a DNA library derived from one of the four selected βαββ parent frameworks, one included DNA from all four parents added in equal amounts, and one included DNA from all four parents added in equal amounts plus four percent from the high and low stability controls.

βαββ ligations were electroporated into electrocompetent NEB5α (NEB cat: C2987H) cells using manufacturer’s protocols. Cells were grown at 37 °C for one hour and grown in LB + kanamycin overnight. Dilutions were plated to calculate diversities. βαββ-GFP11 plasmids (with kanamycin resistance) were produced in these bacteria cells and extracted via miniprep.

SHuffle cells containing pBAD GFP1–10 (ref. 24) were made electrocompetent by adding 100 μL of an overnight starter culture into 100 mL of SOB (20 g/L tryptone, 5 g/L yeast extract, 0.584 g/L NaCl, 0.186 g/L KCl, 1.204 g/L MgSO4, pH 7.5) + ampicillin and grown at 30 °C until the OD600 reached 0.5. Culture was incubated on ice for 15 minutes, centrifuged, and washed twice with 10% glycerol in water. Cells were centrifuged, supernatant was poured off, and cells were resuspended in the residual glycerol. Competent cells were flash frozen with liquid nitrogen and stored at −80 °C.

βαββ ligations were electroporated into electrocompetent SHuffle cells using the previously extracted βαββ-GFP11 plasmid. 25 μL of thawed electrocompetent SHuffle cells were mixed with 20 ng of plasmid and shocked at 2 kV, 200 Ω, and 25 μF in a cold 1 mm electroporation cuvette. Cells were resuspended in SOB and outgrown at 30 °C for one hour and grown in LB + ampicillin + kanamycin (LAK) overnight. We achieved an average of 4.9×10⁴ transformants (range: 5.1×10² – 2.2×10⁵) per library for a total of 2.1×10⁶ variants.

Cells were thawed and grown overnight in LAK. Cultures were reset to OD600 = 0.1 and grown at 30 °C for 1.5 hours. Production of the target protein was induced with 0.5 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) for 2 hours at 37 °C; cells were then resuspended in fresh LAK and grown for an hour at 37 °C, after which production of GFP 1–10 was induced with arabinose for 2 hours at 37 °C. Cells were washed with PBSA, and samples were analyzed via flow cytometry. A gate was drawn over GFP+ cells, and within that population, gates around the top 10%, next 15%, next 25%, and bottom 50% of cells were collected.

Binder Discovery

MACS

For all targets besides IRA, streptavidin beads were labeled with biotinylated IgG (Rockland Immunochemicals cat:005–0602) (33 pmol which yields 5 million IgG per bead) or biotinylated target (6.7 pmol which yields 1 million target protein per bead). Target consisted of CD22 (R&D Systems cat:10191SL050), PSMA (Sino Biological cat:103614–782), Tem8 (R&D Systems cat: 3886AR050), B7-H3 (Sino Biological cat:11188-H08H-B), or NKp46 (Acro Biosystems cat:NC1-H52H4). For IRA, anti-mCherry beads (ChromoTek cat:AB_2861253) were used and were labeled with mCherry as the negative control, or lysate from IRA-mCherry-producing HEK cells. MACS was performed on samples with an initial depletion against bare streptavidin beads, a depletion against IgG, and an enrichment sort against target protein. Yeast displaying ligands were incubated against each for 2 hours at 4 °C with constant rotation. In depletion steps, unbound cells were passed on to the next bead population and bound cell:bead conjugates were washed for comparison with the enrichment sort; in the enrichment step, beads were washed with PBSA and retained yeast were grown in SD media. First sorts involved one PBSA wash, second sorts involved two washes, and third/fourth sorts involved three washes. Cycles of growth, induction, and sorting were repeated until enrichment of specific target binders was observed. Diversities were measured by dilution and colony development on YPD plates.

Binding Assessment

Binding of enriched target binders was evaluated via flow cytometry by incubating ligand-displaying yeast with target (or IgG negative control) at 100 nM and 0 nM. 1 million yeast cells were incubated with biotinylated target at 100 nM and washed with PBSA. Cells were then incubated with 50 ng mouse anti-V5 antibody (Bio-Rad cat: MCA1360) in 50 μL PBSA for 10 minutes and 50 ng streptavidin-AlexaFluor488 (Invitrogen cat:S32354) and 50 ng goat anti-mouse AlexaFluor647 (Invitrogen cat: A21325) for 15 minutes at 4 °C and washed with PBSA. Binding was assessed by comparing AlexaFluor488 signal between 0 and 100 nM populations.

FACS

Yeast expressing binding proteins were sorted via FACS. Sorts were performed at varying concentrations and were labeled via the same method for binding assessment. For 100 nM sorts, all cells that had signal above background were collected, while for more stringent sorts (such as 10 nM), a subset of the highest affinity variants were selected. Deep sequencing of the binding populations was performed via iSeq.

Deep Sequencing

DNA Extraction

DNA from sorted yeast populations (for protease stability assays and binding assays) was extracted using zymolyase. All collected cells from yeast protease assays or 4×10⁷ yeast from binding populations were suspended in 200 μL Zymolyase solution 1 (50 mM phosphate buffer, pH 7.5, 1 M sorbitol), with 10 mM beta-mercaptoethanol and 0.24 U/μL zymolyase at 37 °C for 4 hours. One freeze-thaw cycle was performed, 200 μL MX2 and 400 μL MX3 were added, and the sample was centrifuged. The supernatant was cleaned using silica column purification and eluted into 32 μL elution buffer or dH2O. 15 μL of the resulting DNA was mixed with 2 μL ExoI exonuclease, 1 μL lambda exonuclease, and 2 μL lambda exonuclease buffer, incubated at 30 °C for 90 minutes to degrade interfering genomic DNA, and inactivated at 80 °C for 20 minutes. Post exonuclease treatment, DNA was cleaned using silica column purification and eluted again into 32 μL elution buffer or dH2O. DNA from split-GFP populations was extracted via miniprep. 15×10⁷ cells were used, and DNA was eluted into 32 μL dH2O or elution buffer.

Illumina Sequencing Prep

Two rounds of PCR were performed to amplify template and to add genetic barcodes and Illumina adaptors. The first PCR was composed of 1X Q5 buffer, 200 μM dNTPs, 0.5 μM forward and reverse primers, 15 μL of the DNA for yeast DNA or 5 μL of the DNA for bacterial DNA, and 0.02 U/μL Q5 High-Fidelity DNA Polymerase in 50 μL, and was run at 98 °C for 30 s, (98 °C for 10 s, 62 °C for 20 s, 72 °C for 20 s) for 16 cycles, and 72 °C for 120 s. Primers were removed by addition of 4 U of Exonuclease I (final concentration 0.077 U/μL) for 30 minutes at 37 °C, followed by 20 minutes at 80 °C for enzyme inactivation. The second PCR added barcodes unique to each gate, protease treatment, and trial. The second PCR was composed of 1X Q5 buffer, 200 μM dNTPs, 0.5 μM forward and reverse primers, 1.5 μL of first PCR, and 0.02 U/μL Q5 High-Fidelity DNA Polymerase in 50 μL, and was run at 98 °C for 30 s, (98 °C for 10 s, 67 °C for 20 s, 72 °C for 20 s) for 16 cycles, and 72 °C for 120 s. DNA was purified by gel extraction, and concentration was measured via a Nanodrop. DNA from multiple PCRs was combined for Illumina sequencing.

Illumina Sequencing/Analysis

DNA was sequenced via iSeq and MiSeq runs by the University of Minnesota Genomics Center. Sequence analysis was performed with a combination of in-house and community code including USearch66 to merge, align, and filter the sequences.

Testing Individual Protein Variants

Plasmid Digestion and Ligation

pET production vector with a C-terminal His6 tag was digested with NheI and BamHI. Scaffold DNA was extracted from yeast, purified with exonucleases, amplified via PCR, digested with NheI and BamHI, and ligated to vector.

Transformation

NEB T7 Express competent E. coli (NEB cat: C2566H) or NEB SHuffle® T7 Express Competent E. coli (NEB cat: C3028J) were transformed essentially via manufacturer’s protocol. A tube of cells was thawed on ice and split into three tubes. Each third was combined with 2–5 μL of purified ligation reaction mixture and mixed via flicking. The mixture was placed on ice for 30 minutes, heat shocked at 42 °C for 10 seconds (T7) or 30 seconds (SHuffle), and returned to ice for 5 more minutes. 950 μL of room temperature SOC was added, and cells were outgrown at 37 °C for 60 minutes at 250 rpm. 50 μL of cell mixture was added to an LB (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl, and 16 g/L agar) + kanamycin plate for overnight growth. Colonies were grown overnight in 5 mL of LB (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl) + kanamycin and sequenced via Sanger sequencing.

Protein Production and Purification

A starter culture was used to seed a 100 mL culture of LB + kanamycin at an OD600 of 0.1 which was grown to OD600 = 0.5 – 1. IPTG was added to yield 0.5 mM IPTG, and cells were grown for 2 hours at 37 °C (T7) or 4 hours at 30 °C (SHuffle). Cells were pelleted and resuspended in 1 mL of lysis buffer with protease inhibitor (9.38 g/L sodium phosphate dibasic heptahydrate, 2.07 g/L sodium phosphate monobasic monohydrate, 29.2 g/L NaCl, 5% glycerol, 3.1 g/L CHAPS, and 1.7 g/L imidazole). Cells were frozen/thawed four times and centrifuged at 12,000 g for 10 minutes at 4 °C. The supernatant was filtered with a 0.22 μm filter.

100 to 150 μL of HisPur cobalt resin (Thermo Scientific cat: 89964) was mixed with cell lysate and placed in a fritted column on a vacuum manifold. Protein/resin was washed five times with 30 mM imidazole in 1x PBS, and once with 50 mM imidazole in PBS. Protein was eluted with 100–350 μL of 300 mM imidazole in PBS, flash frozen with liquid nitrogen, and stored at −80 °C until use.

Expression Measurement

Proteins (purified ligands or lysozyme calibrants) were separated via SDS-PAGE to measure concentration. Gels were loaded with a mixture of 12 μL protein sample, 4 μL LDS Buffer (4x), and 0.5 μL β-mercaptoethanol that had been preboiled at 99 °C for 5 minutes. Samples were run at 185 V for 25 minutes. The gel was stained with SimplyBlue SafeStain, and concentration was measured as integrated band intensity with ImageJ and calibrated to lysozyme controls.

Buffer exchange

Buffer exchange to remove imidazole was performed using Slide-A-Lyzer MINI Dialysis Devices, 3.5K MWCO with 10 mM sodium phosphate buffer with 4–10 exchanges.

Stability Measurement

Purified protein in 10 mM sodium phosphate buffer was diluted to ~0.2 mg/mL in sodium phosphate buffer, and thermal stability and secondary structure were measured via circular dichroism (CD) using a JASCO J-815 CD spectropolarimeter (Biophysical Technology Center, University of Minnesota). Thermal stability was measured by monitoring ellipticity at 218 nm as temperature increased from 25 °C to 95 °C at 3 °C per minute. Melting temperatures were determined using a two-state unfolding equation and minimizing the sum of squared errors comparing experimental data to model-predicted data67. Spectra were obtained by scanning from 260–190 nm at 25 °C before and after thermal treatment. Secondary structure was predicted from the initial spectra at 25 °C using Beta Structure Selection (BeStSel)68.
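The melting-temperature fit described above can be illustrated with a small sketch. This is not the authors' analysis code or the cited equation: it assumes a simple two-state model with flat folded and unfolded baselines and no heat-capacity change, and it minimizes the sum of squared errors by a grid search over Tm and the van 't Hoff enthalpy. All function names, grids, and the synthetic data are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fraction_unfolded(temp_c, tm_c, dh):
    """Two-state fraction unfolded; dh is the unfolding enthalpy at Tm in J/mol."""
    t, tm = temp_c + 273.15, tm_c + 273.15
    dg = dh * (1.0 - t / tm)  # unfolding free energy, assuming no heat-capacity change
    return 1.0 / (1.0 + math.exp(dg / (R * t)))


def fit_baselines(fu, signal):
    """Closed-form least squares for signal = folded*(1 - fu) + unfolded*fu."""
    saa = sum((1 - f) ** 2 for f in fu)
    sbb = sum(f * f for f in fu)
    sab = sum((1 - f) * f for f in fu)
    say = sum((1 - f) * y for f, y in zip(fu, signal))
    sby = sum(f * y for f, y in zip(fu, signal))
    det = saa * sbb - sab * sab
    if det < 1e-12:  # baselines not identifiable for this (Tm, dH) pair
        return None
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


def fit_tm(temps_c, signal, tm_grid, dh_grid):
    """Return the grid point (and baselines) with the smallest sum of squared errors."""
    best = None
    for tm in tm_grid:
        for dh in dh_grid:
            fu = [fraction_unfolded(t, tm, dh) for t in temps_c]
            baselines = fit_baselines(fu, signal)
            if baselines is None:
                continue
            folded, unfolded = baselines
            sse = sum((folded * (1 - f) + unfolded * f - y) ** 2
                      for f, y in zip(fu, signal))
            if best is None or sse < best["sse"]:
                best = {"sse": sse, "tm_c": tm, "dh": dh,
                        "folded": folded, "unfolded": unfolded}
    return best


# Synthetic, noise-free melt from 25 to 95 °C (illustrative values only).
temps = [25.0 + i for i in range(71)]
synthetic = [-20.0 * (1 - f) - 5.0 * f
             for f in (fraction_unfolded(t, 80.0, 300e3) for t in temps)]
tm_grid = [40.0 + 0.5 * i for i in range(141)]    # 40 to 110 °C
dh_grid = [100e3 + 25e3 * i for i in range(21)]   # 100 to 600 kJ/mol
result = fit_tm(temps, synthetic, tm_grid, dh_grid)
print(result["tm_c"], result["dh"])
```

When a melt never reaches a denatured plateau by 95 °C, the unfolded baseline is poorly constrained and a fit of this kind gives at best a rough estimate of Tm.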

Affinity Titration

Clonal yeast displaying a scaffold ligand were incubated with His6-tagged target at varying concentrations with rotation at 4 °C for sufficient time to reach 95% of target bound on the lowest concentration sample assuming a kon of 2.5×10⁵. Yeast were labeled with mouse anti-V5 antibody (Bio-Rad cat: MCA1360) for 10 minutes followed by goat anti-mouse AlexaFluor647 (Invitrogen cat: A21325) and FITC Anti-6X His tag antibody (Abcam cat: ab1206) for 15 minutes. Binding and display were measured via flow cytometry (BD Accuri C6). Affinity curves were constructed based upon the median FITC value of cells that were V5+ (full length protein). KD was calculated by minimizing the sum of squared errors assuming a 1:1 binding curve and is reported as the average of at least three trials. Specificity measurements were conducted using the same method described but by testing binding to non-target proteins at 100 nM compared to binding of target at 50 nM.
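The two calculations in this paragraph, the incubation time and the 1:1 fit, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the kon is taken to be in M⁻¹ s⁻¹ (the text gives no units), the soluble target is assumed to be in excess so that binding is pseudo-first-order, and the fit uses a grid search over KD with a closed-form background and amplitude. Function names and the synthetic data are hypothetical.

```python
import math


def time_to_fraction(kon, conc, kd, fraction=0.95):
    """Seconds to reach `fraction` of equilibrium occupancy for 1:1 binding.

    Assumes target in excess: k_obs = kon*[L] + koff, with koff = kon*KD.
    """
    k_obs = kon * (conc + kd)
    return -math.log(1.0 - fraction) / k_obs


def fit_kd(concs, signal, kd_grid):
    """Fit signal = background + amplitude * c / (c + KD) by least squares."""
    n = len(concs)
    mean_y = sum(signal) / n
    best = None
    for kd in kd_grid:
        theta = [c / (c + kd) for c in concs]  # fractional occupancy at each conc
        mean_t = sum(theta) / n
        stt = sum((t - mean_t) ** 2 for t in theta)
        amplitude = sum((t - mean_t) * (y - mean_y)
                        for t, y in zip(theta, signal)) / stt
        background = mean_y - amplitude * mean_t
        sse = sum((background + amplitude * t - y) ** 2
                  for t, y in zip(theta, signal))
        if best is None or sse < best["sse"]:
            best = {"sse": sse, "kd": kd,
                    "background": background, "amplitude": amplitude}
    return best


# Synthetic titration in molar units (illustrative values only).
concs = [1e-10, 3e-10, 1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 3e-7]
true_kd = 1e-8
synthetic = [200.0 + 5000.0 * c / (c + true_kd) for c in concs]
kd_grid = [10 ** (e / 50) for e in range(-500, -299)]  # 0.1 nM to 1 μM, log-spaced
fit = fit_kd(concs, synthetic, kd_grid)

# Incubation time for the lowest-concentration sample, in hours.
t95 = time_to_fraction(kon=2.5e5, conc=min(concs), kd=fit["kd"])
print(fit["kd"], t95 / 3600)
```

Because the required time depends on KD, which is unknown before the titration, a conservative estimate would be needed in practice; the sketch reuses the fitted value only to keep the example short.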

Cell Expressed B7-H3 Binding

HEK293 cells with B7-H3 KO and MS1 cells overexpressing B7-H3 were incubated with His6-tagged βαββ variants at either 50 nM (B7H3+ cells) or 1000 nM (KO cells) with rotation at 4 °C for sufficient time to reach 95% of target bound on the lowest concentration sample assuming a kon of 2.5×10⁵. Cells were labeled with FITC Anti-6X His tag antibody (Abcam cat: ab1206) for 15 minutes. Binding was measured via flow cytometry (BD Accuri C6).

Score Calculation

Library Identification

Populations of scaffold variants from binding and developability selections were evaluated to determine the relative frequencies of different scaffold libraries. Sequences were matched to whichever library they emerged from; in the case of the sheet libraries, several sequences had the potential to emerge from multiple libraries, so the frequency of that variant was scaled to the probability of its emergence from the library of interest:

$$P_A = \frac{\prod_j p_{i,j,A}}{\sum_k \prod_j p_{i,j,k}} \tag{1}$$

where p_{i,j,k} is the probability of amino acid i appearing at site j in library k. Where the sequence and library both have WT at site j, p_{i,j} = 1; where the library has NNK diversification at site j, p_{i,j} = p_i given NNK; and where the library is WT at site j but the sequence is not, p = p_mut, which was arbitrarily set at 1×10⁻⁴ to account for random errors during PCR. Sequences are then assigned to libraries based on their probability relative to the other probabilities.

$$\mathrm{Seq}_{\mathrm{LibA}} = \mathrm{Seq} \cdot \frac{p_{\mathrm{LibA}}}{\sum_i p_{\mathrm{Lib}_i}} \tag{2}$$
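The library-assignment rule of equations 1 and 2 can be sketched as below. This is a toy illustration, not the authors' pipeline: the four-residue parent, the two libraries, and the function names are hypothetical, and every diversified site is treated as full NNK (the proline-excluded and HDB sites used in the study are ignored here).

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Amino acid probabilities under an NNK codon (N = any base, K = G or T).
NNK_CODONS = [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]
P_NNK = {}
for codon in NNK_CODONS:
    aa = CODON_TABLE[codon]
    P_NNK[aa] = P_NNK.get(aa, 0.0) + 1.0 / len(NNK_CODONS)

P_MUT = 1e-4  # probability of a non-WT residue at an undiversified site


def sequence_probability(seq, parent, diversified_sites):
    """Product over sites of the per-site probabilities defined for equation 1."""
    p = 1.0
    for j, (aa, wt) in enumerate(zip(seq, parent)):
        if j in diversified_sites:
            p *= P_NNK.get(aa, 0.0)   # NNK-diversified site
        elif aa != wt:
            p *= P_MUT                # library is WT here but the sequence is not
        # library and sequence both WT: factor of 1
    return p


def assign_reads(seq, reads, libraries):
    """Apportion the reads of one sequence across candidate libraries (equation 2)."""
    probs = {name: sequence_probability(seq, parent, sites)
             for name, (parent, sites) in libraries.items()}
    total = sum(probs.values())
    if total == 0.0:
        return {name: 0.0 for name in probs}
    return {name: reads * p / total for name, p in probs.items()}


# Hypothetical parent "EKRV"; library A diversifies sites 0 and 2, library B sites 0, 1, 2.
libraries = {"A": ("EKRV", {0, 2}), "B": ("EKRV", {0, 1, 2})}
print(assign_reads("LKYV", 33, libraries))
```

In this toy case the sequence keeps wild-type K at site 1, which library A guarantees but library B produces only 1 time in 32, so most of the reads are credited to library A.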

Developability Scores

Developability scores are calculated as the proportion of a library that appears in the top gate relative to the proportion of that library that appears in the unsorted population. To avoid highly frequent variants dominating the score, the square root of reads was used for this metric. Samples were then normalized by the median of calculated fractions from all libraries.

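$$\mathrm{DevelopabilityScore}_{\mathrm{Gate1},\,\mathrm{LibA}} = \frac{\dfrac{\sum_{i \in \mathrm{LibA}} \sqrt{\mathrm{reads}_{i,\,\mathrm{Gate1}}}}{\sum_{j \in \mathrm{all}} \sqrt{\mathrm{reads}_{j,\,\mathrm{Gate1}}}}}{\dfrac{\sum_{i \in \mathrm{LibA}} \sqrt{\mathrm{reads}_{i,\,\mathrm{Unsorted}}}}{\sum_{j \in \mathrm{all}} \sqrt{\mathrm{reads}_{j,\,\mathrm{Unsorted}}}}} \tag{3}$$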
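The developability score can be sketched in a few lines. This is a minimal illustration on made-up read counts, not the authors' analysis code; the function names are hypothetical, and dividing each library's ratio by the median ratio across libraries is one reading of the normalization step described above.

```python
from statistics import median


def library_fraction(reads_by_seq, library_of, library):
    """Fraction of square-root-transformed reads that belong to `library`."""
    total = sum(r ** 0.5 for r in reads_by_seq.values())
    in_library = sum(r ** 0.5 for s, r in reads_by_seq.items()
                     if library_of[s] == library)
    return in_library / total


def developability_scores(gate_reads, unsorted_reads, library_of):
    """Top-gate fraction over unsorted fraction, per library (raw and normalized)."""
    libraries = sorted(set(library_of.values()))
    raw = {lib: library_fraction(gate_reads, library_of, lib)
           / library_fraction(unsorted_reads, library_of, lib)
           for lib in libraries}
    med = median(raw.values())  # assumed normalization: divide by the median ratio
    return raw, {lib: value / med for lib, value in raw.items()}


# Toy read counts: sequences s1 and s2 come from library A, s3 from library B.
library_of = {"s1": "A", "s2": "A", "s3": "B"}
gate1 = {"s1": 100, "s2": 4, "s3": 16}
unsorted = {"s1": 25, "s2": 25, "s3": 100}
raw, normalized = developability_scores(gate1, unsorted, library_of)
print(raw, normalized)
```

Taking the square root before summing means that a clone with 100 reads counts 5 times as much as a clone with 4 reads rather than 25 times as much, which limits the influence of a few abundant variants on a library's score.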

Developability Enrichment Scores

Developability enrichment scores are calculated as the proportion of reads that appear in each gate multiplied by the percentage of that gate, relative to the average. That value for each gate is multiplied by the average score for each gate, and the sum is the developability enrichment score. To avoid highly frequent variants dominating the score, the square root of reads was used for this metric. Samples were then normalized by the median of calculated fractions from all libraries.

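$$\mathrm{DevelopabilityEnrichmentScore}_{\mathrm{LibA}} = \sum_{m} \left(\text{average score in gate } m\right) \cdot \frac{\dfrac{\sum_{i \in \mathrm{LibA}} \sqrt{\mathrm{reads}_{i,\,\mathrm{Gate}\,m}}}{\sum_{j \in \mathrm{all}} \sqrt{\mathrm{reads}_{j,\,\mathrm{Gate}\,m}}} \cdot \left(\%\ \text{cells in gate } m\right)}{\sum_{k} \dfrac{\sum_{i \in \mathrm{LibA}} \sqrt{\mathrm{reads}_{i,\,\mathrm{Gate}\,k}}}{\sum_{j \in \mathrm{all}} \sqrt{\mathrm{reads}_{j,\,\mathrm{Gate}\,k}}} \cdot \left(\%\ \text{cells in gate } k\right)} \tag{4}$$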

Evolvability Scores

Evolvability scores are calculated as the proportion of a library that appears in the binder population, relative to the proportion of that library that appears in the naïve population. To avoid highly frequent variants dominating the score, the fourth root of reads was used for this metric.

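$$\mathrm{EvolvabilityScore}_{\mathrm{LibA}} = \frac{\dfrac{\sum_{i \in \mathrm{LibA}} \sqrt[4]{\mathrm{reads}_{i,\,\mathrm{BinderPool}}}}{\sum_{j \in \mathrm{all}} \sqrt[4]{\mathrm{reads}_{j,\,\mathrm{BinderPool}}}}}{\dfrac{\sum_{i \in \mathrm{LibA}} \sqrt[4]{\mathrm{reads}_{i,\,\mathrm{NaiveLib}}}}{\sum_{j \in \mathrm{all}} \sqrt[4]{\mathrm{reads}_{j,\,\mathrm{NaiveLib}}}}} \tag{5}$$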
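The effect of the fourth-root transform in the evolvability score can be seen on a toy example. This sketch uses made-up read counts and hypothetical function names; it is not the authors' code. The `root` parameter is set to 4 for the score as described and to 1 (raw reads) for comparison.

```python
def library_fraction(reads_by_seq, library_of, library, root):
    """Fraction of root-transformed reads that belong to `library`."""
    total = sum(r ** (1.0 / root) for r in reads_by_seq.values())
    in_library = sum(r ** (1.0 / root) for s, r in reads_by_seq.items()
                     if library_of[s] == library)
    return in_library / total


def evolvability_score(binder_reads, naive_reads, library_of, library, root=4):
    """Binder-pool fraction of a library relative to its naive-library fraction."""
    return (library_fraction(binder_reads, library_of, library, root)
            / library_fraction(naive_reads, library_of, library, root))


# Toy read counts: sequences s1 and s2 come from library A, s3 from library B.
library_of = {"s1": "A", "s2": "A", "s3": "B"}
binders = {"s1": 81, "s2": 1, "s3": 16}   # s1 is a dominant clone
naive = {"s1": 16, "s2": 16, "s3": 256}

fourth_root = {lib: evolvability_score(binders, naive, library_of, lib, root=4)
               for lib in ("A", "B")}
raw_reads = {lib: evolvability_score(binders, naive, library_of, lib, root=1)
             for lib in ("A", "B")}
print(fourth_root, raw_reads)
```

With raw reads, the single abundant clone s1 drives library A's score far above library B's. With the fourth root, the score reflects the number of distinct enriched sequences more than the read depth of any one of them.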

Amino Acid Enrichment

Sitewise amino acid enrichment and depletion was analyzed via ScaffoldSeq69.

Quantification and Statistical Analysis

For all comparisons, one-way ANOVA with a post-hoc Tukey HSD test was performed. P values of less than 0.05 were considered significant. Experiments were performed in triplicate at a minimum.

Supplementary Material

Supplement

Acknowledgments

This project was funded by an NSF Graduate Fellowship award 2237827, NIH T32GM008347, NIH R01 GM146372, and NIH R01 CA251600. We appreciate assistance by the University of Minnesota Genomics Center and University of Minnesota Flow Cytometry Core.

Abbreviations

PSMA: Prostate-specific membrane antigen

FW: Framework

IRA: Insulin receptor isoform A

Tem8: Tumor endothelial marker 8

NKp46: Natural killer cell p46 related protein

MACS: Magnetic-activated cell sorting

FACS: Fluorescence activated cell sorting

PK: Proteinase K

TH: Thermolysin

RMSD: Root Mean Square Deviation

GFP: Green fluorescent protein

PBSA: 1x Phosphate Buffered Saline, 0.11% bovine serum albumin

CD: Circular dichroism

Footnotes

Supporting Information

DNA sequences of the 43 libraries, comparison of amino acid frequencies between experimental result and intention, additional description of lead variants, structures of protein targets, specificity of binding populations, affinity and specificity of individual variants, cell-expressed target binding, stability and structural information of individual variants, effect of sheet diversification on evolvability, comparison of protease score calculation methods, amino acid enrichment values of the leading libraries

Conflict of Interests

P.L.B. and B.J.H. are inventors on a patent application related to the proteins in this study.

References

  • 1.Stern LA, Case BA & Hackel BJ Alternative Non-Antibody Protein Scaffolds for Molecular Imaging of Cancer. Curr. Opin. Chem. Eng 2, 425–432 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Vazquez-Lombardi R, Phan TG, Zimmermann C, Lowe D, Jermutus L & Christ D Challenges and opportunities for non-antibody scaffold drugs. Drug Discov. Today 20, 1271–1283 (2015). [DOI] [PubMed] [Google Scholar]
  • 3.Zahnd C, Kawe M, Stumpp MT, De Pasquale C, Tamaskovic R, Nagy-Davidescu G, Dreier B, Schibli R, Binz HK, Waibel R & Plückthun A Efficient tumor targeting with high-affinity designed ankyrin repeat proteins: Effects of affinity and molecular size. Cancer Res. 70, 1595–1605 (2010). [DOI] [PubMed] [Google Scholar]
  • 4.Thurber GM, Schmidt MM & Wittrup KD Factors determining antibody distribution in tumors. Trends Pharmacol. Sci 29, 57–61 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Stumpp MT, Dawson KM & Binz HK Beyond Antibodies: The DARPin® Drug Platform. BioDrugs 34, 423–433 Preprint at 10.1007/s40259-020-00429-8 (2020) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Malm M, Frejd FY, Ståhl S & Löfblom J Targeting HER3 using mono- and bispecific antibodies or alternative scaffolds. mAbs 8, 1195–1209 Preprint at 10.1080/19420862.2016.1212147 (2016) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Frejd FY & Kim K-T Affibody molecules as engineered protein drugs. Exp. Mol. Med 49, 306–313 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Engfeldt T, Renberg B, Brumer H, Nygren PÅ & Eriksson Karlström A Chemical Synthesis of Triple-Labelled Three-Helix Bundle Binding Proteins for Specific Fluorescent Detection of Unlabelled Protein. ChemBioChem 6, 1043–1050 (2005). [DOI] [PubMed] [Google Scholar]
  • 9.Caliceti P & Veronese FM Pharmacokinetic and biodistribution properties of poly(ethylene glycol)-protein conjugates. Adv. Drug Deliv. Rev 55, 1261–1277 (2003). [DOI] [PubMed] [Google Scholar]
  • 10.Tóth-Petróczy Á & Tawfik DS The robustness and innovability of protein folds. Curr. Opin. Struct. Biol 26, 131–138 Preprint at 10.1016/j.sbi.2014.06.007 (2014) [DOI] [PubMed] [Google Scholar]
  • 11.Kirschner M, Gerhart J, Otey CR & Arnold FH Protein stability promotes evolvability. Proc. Natl. Acad. Sci. U. S. A 103, 5689–5874 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Tokuriki N, Stricher F, Serrano L & Tawfik DS How Protein Stability and New Functions Trade Off. PLoS Comput. Biol 4, (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Nord K, Gunneriusson E, Ringdahl J, Ståhl S, Uhlén M & Nygren PÅ Binding proteins selected from combinatorial libraries of an α-helical bacterial receptor domain. Nat. Biotechnol 15, 772–777 (1997). [DOI] [PubMed] [Google Scholar]
  • 14.Löfblom J, Feldwisch J, Tolmachev V, Carlsson J, Ståhl S & Frejd FY Affibody molecules: Engineered proteins for therapeutic, diagnostic and biotechnological applications. FEBS Lett. 584, 2670–2680 (2010). [DOI] [PubMed] [Google Scholar]
  • 15.Koide A, Bailey CW, Huang X & Koide S The fibronectin type III domain as a scaffold for novel binding proteins. J. Mol. Biol 284, 1141–1151 (1998). [DOI] [PubMed] [Google Scholar]
  • 16.Skerra A ‘Anticalins’: A new class of engineered ligand-binding proteins with antibody-like properties. Rev. Mol. Biotechnol 74, 257–275 (2001). [DOI] [PubMed] [Google Scholar]
  • 17.Ackerman SE, Currier NV, Bergen JM & Cochran JR Cystine-knot peptides: Emerging tools for cancer imaging and therapy. Expert Rev. Proteomics 11, 561–572 Preprint at 10.1586/14789450.2014.932251 (2014) [DOI] [PubMed] [Google Scholar]
  • 18.Kruziki MA, Bhatnagar S, Woldring DR, Duong VT & Hackel BJA 45-Amino-Acid Scaffold Mined from the PDB for High-Affinity Ligand Engineering. Chem. Biol 22, 946–956 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Simeon R & Chen Z In vitro-engineered non-antibody protein therapeutics. Protein Cell 9, 3–14 Preprint at 10.1007/s13238-017-0386-6 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Škrlec K, Štrukelj B & Berlec A Non-immunoglobulin scaffolds: a focus on their targets. Trends Biotechnol. 33, 408–418 (2015). [DOI] [PubMed] [Google Scholar]
  • 21.Wurch T, Pierré A & Depil S Novel protein scaffolds as emerging therapeutic proteins: from discovery to clinical proof-of-concept. Trends Biotechnol. 30, 575–582 (2012). [DOI] [PubMed] [Google Scholar]
  • 22. Jain T, Sun T, Durand S, Hall A, Houston NR, Nett JH, Sharkey B, Bobrowicz B, Caffry I, Yu Y, Cao Y, Lynaugh H, Brown M, Baruah H, Gray LT, Krauland EM, Xu Y, Vásquez M & Wittrup KD Biophysical properties of the clinical-stage antibody landscape. Proc. Natl. Acad. Sci. U. S. A 114, 944–949 (2017).
  • 23. Jarasch A, Koll H, Regula JT, Bader M, Papadimitriou A & Kettenberger H Developability assessment during the selection of novel therapeutic antibodies. J. Pharm. Sci 104, 1885–1898 (2015).
  • 24. Golinski AW, Mischler KM, Laxminarayan S, Neurock NL, Fossing M, Pichman H, Martiniani S & Hackel BJ High-throughput developability assays enable library-scale identification of producible protein scaffold variants. Proc. Natl. Acad. Sci. U. S. A 118, e2026658118 (2021).
  • 25. Mieczkowski C, Zhang X, Lee D, Nguyen K, Lv W, Wang Y, Zhang Y, Way J & Gries JM Blueprint for antibody biologics developability. mAbs 15, 2185924 (2023). doi: 10.1080/19420862.2023.2185924
  • 26. Chen J, Sawyer N & Regan L Protein-protein interactions: General trends in the relationship between binding affinity and interfacial buried surface area. Protein Sci. 22, 510–515 (2013).
  • 27. Tokuriki N & Tawfik DS Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol 19, 596–604 (2009). doi: 10.1016/j.sbi.2009.08.003
  • 28. Dellus-Gur E, Toth-Petroczy A, Elias M & Tawfik DS What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs. J. Mol. Biol 425, 2609–2621 (2013).
  • 29. Nagatani RA, Gonzalez A, Shoichet BK, Brinen LS & Babbitt PC Stability for function trade-offs in the enolase superfamily ‘catalytic module’. Biochemistry 46, 6688–6695 (2007).
  • 30. Mukaiyama A, Haruki M, Ota M, Koga Y, Takano K & Kanaya S A hyperthermophilic protein acquires function at the cost of stability. Biochemistry 45, 12673–12679 (2006).
  • 31. Hackel BJ & Wittrup KD The full amino acid repertoire is superior to serine/tyrosine for selection of high affinity immunoglobulin G binders from the fibronectin scaffold. Protein Eng. Des. Sel 23, 211–219 (2010).
  • 32. Woldring DR, Holec PV, Stern LA, Du Y & Hackel BJ A Gradient of Sitewise Diversity Promotes Evolutionary Fitness for Binder Discovery in a Three-Helix Bundle Protein Scaffold. Biochemistry 56, 1656–1671 (2017).
  • 33. Woldring DR, Holec PV, Zhou H & Hackel BJ High-throughput ligand discovery reveals a sitewise gradient of diversity in broadly evolved hydrophilic fibronectin domains. PLoS ONE 10, e0138956 (2015).
  • 34. Kruziki MA, Sarma V & Hackel BJ Constrained Combinatorial Libraries of Gp2 Proteins Enhance Discovery of PD-L1 Binders. ACS Comb. Sci 20, 423–435 (2018).
  • 35. Koide S & Sidhu SS The importance of being tyrosine: Lessons in molecular recognition from minimalist synthetic binding proteins. ACS Chem. Biol 4, 325–334 (2009). doi: 10.1021/cb800314v
  • 36. Schilling J, Schöppe J & Plückthun A From DARPins to LoopDARPins: Novel LoopDARPin design allows the selection of low picomolar binders in a single round of ribosome display. J. Mol. Biol 426, 691–721 (2014).
  • 37. Seeger MA, Zbinden R, Flütsch A, Gutte PGM, Engeler S, Roschitzki-Voser H & Grütter MG Design, construction, and characterization of a second-generation DARPin library with reduced hydrophobicity. Protein Sci. 22, 1239–1257 (2013).
  • 38. Fellouse FA, Esaki K, Birtalan S, Raptis D, Cancasci VJ, Koide A, Jhurani P, Vasser M, Wiesmann C, Kossiakoff AA, Koide S & Sidhu SS High-throughput Generation of Synthetic Antibodies from Highly Functional Minimalist Phage-displayed Libraries. J. Mol. Biol 373, 924–940 (2007).
  • 39. Hackel BJ, Ackerman ME, Howland SW & Wittrup KD Stability and CDR Composition Biases Enrich Binder Functionality Landscapes. J. Mol. Biol 401, 84–96 (2010).
  • 40. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, Carter L, Ravichandran R, Mulligan VK, Chevalier A, Arrowsmith CH & Baker D Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
  • 41. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN & Bourne PE The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
  • 42. Castellanos JR, Purvis IJ, Labak CM, Guda MR, Tsung AJ, Velpula KK & Asuthkar S B7-H3 role in the immune landscape of cancer. Am. J. Clin. Exp. Immunol 6, 66–75 (2017).
  • 43. Yang S, Wei W & Zhao Q B7-H3, a checkpoint molecule, as a target for cancer immunotherapy. Int. J. Biol. Sci 16, 1767–1773 (2020). doi: 10.7150/ijbs.41105
  • 44. Kontos F, Michelakos T, Kurokawa T, Sadagopan A, Schwab JH, Ferrone CR & Ferrone S B7-H3: An attractive target for antibody-based immunotherapy. Clin. Cancer Res 27, 1227–1235 (2021). doi: 10.1158/1078-0432.CCR-20-2584
  • 45. Leonard JP & Goldenberg DM Preclinical and clinical evaluation of epratuzumab (anti-CD22 IgG) in B-cell malignancies. Oncogene 26, 3704–3713 (2007). doi: 10.1038/sj.onc.1210370
  • 46. Tuscano JM, Kato J, Pearson D, Xiong C, Newell L, Ma Y, Gandara DR & O’Donnell RT CD22 antigen is broadly expressed on lung cancer cells and is a target for antibody-based therapy. Cancer Res. 72, 5556–5565 (2012).
  • 47. Haberkorn U, Eder M, Kopka K, Babich JW & Eisenhut M New strategies in prostate cancer: Prostate-specific membrane antigen (PSMA) ligands for diagnosis and therapy. Clin. Cancer Res 22, 9–15 (2016).
  • 48. Gebauer M & Skerra A Engineered protein scaffolds as next-generation therapeutics. Annu. Rev. Pharmacol. Toxicol 60, 391–415 (2020). doi: 10.1146/annurev-pharmtox-010818-021118
  • 49. Koide A, Wojcik J, Gilbreth RN, Hoey RJ & Koide S Teaching an Old Scaffold New Tricks: Monobodies Constructed Using Alternative Surfaces of the FN3 Scaffold. J. Mol. Biol 415, 393–405 (2012).
  • 50. Kintzing JR & Cochran JR Engineered knottin peptides as diagnostics, therapeutics, and drug delivery vehicles. Curr. Opin. Chem. Biol 34, 143–150 (2016). doi: 10.1016/j.cbpa.2016.08.022
  • 51. Boder ET & Wittrup KD Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol 15, 553–557 (1997).
  • 52. Hackel BJ Ligand engineering using yeast surface display. Methods Mol. Biol 1163 (2014).
  • 53. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P & Hassabis D Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). doi: 10.1038/s41586-021-03819-2
  • 54. Daniel RM, Cowan DA, Morgan HW & Curran MP A correlation between protein thermostability and resistance to proteolysis. Biochem. J 207, 641–644 (1982).
  • 55. Jain T, Sun T, Durand S, Hall A, Houston NR, Nett JH, Sharkey B, Bobrowicz B, Caffry I, Yu Y, Cao Y, Lynaugh H, Brown M, Baruah H, Gray LT, Krauland EM, Xu Y, Vásquez M & Wittrup KD Biophysical properties of the clinical-stage antibody landscape. Proc. Natl. Acad. Sci. U. S. A 114, 944–949 (2017).
  • 56. Chang CEA, Chen W & Gilson MK Ligand configurational entropy and protein binding. Proc. Natl. Acad. Sci. U. S. A 104, 1534–1539 (2007).
  • 57. Searle MS & Williams DH The Cost of Conformational Order: Entropy Changes in Molecular Associations. J. Am. Chem. Soc 114, 10690–10697 (1992).
  • 58. Gilbreth RN & Koide S Structural insights for engineering binding proteins based on non-antibody scaffolds. Curr. Opin. Struct. Biol 22, 413–420 (2012).
  • 59. Sha F, Salzman G, Gupta A & Koide S Monobodies and other synthetic binding proteins for expanding protein science. Protein Sci. 26, 910–924 (2017). doi: 10.1002/pro.3148
  • 60. Woldring DR, Holec PV, Stern LA, Du Y & Hackel BJ A Gradient of Sitewise Diversity Promotes Evolutionary Fitness for Binder Discovery in a Three-Helix Bundle Protein Scaffold. Biochemistry 56, 1656–1671 (2017).
  • 61. Woldring DR, Holec PV, Zhou H & Hackel BJ High-throughput ligand discovery reveals a sitewise gradient of diversity in broadly evolved hydrophilic fibronectin domains. PLoS ONE 10, e0138956 (2015).
  • 62. Diem MD, Hyun L, Yi F, Hippensteel R, Kuhar E, Lowenstein C, Swift EJ, O’Neil KT & Jacobs SA Selection of high-affinity Centyrin FN3 domains from a simple library diversified at a combination of strand and loop positions. Protein Eng. Des. Sel 27, 419–429 (2014).
  • 63. Tillotson BJ, Cho YK & Shusta EV Cells and cell lysates: A direct approach for engineering antibodies against membrane proteins using yeast surface display. Methods 60, 27–37 (2013).
  • 64. Lutz AM, Bachawal SV, Drescher CW, Pysz MA, Willmann JK & Gambhir SS Ultrasound molecular imaging in a human CD276 expression-modulated murine ovarian cancer model. Clin. Cancer Res 20, 1313–1322 (2014).
  • 65. Stern LA, Schrack IA, Johnson SM, Deshpande A, Bennett NR, Harasymiw LA, Gardner MK & Hackel BJ Geometry and expression enhance enrichment of functional yeast-displayed ligands via cell panning. Biotechnol. Bioeng 113, 2328–2341 (2016).
  • 66. Edgar RC Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
  • 67. Hackel BJ, Kapila A & Wittrup KD Picomolar Affinity Fibronectin Domains Engineered Utilizing Loop Length Diversity, Recursive Mutagenesis, and Loop Shuffling. J. Mol. Biol 381, 1238–1252 (2008).
  • 68. Micsonai A, Moussong É, Wien F, Boros E, Vadászi H, Murvai N, Lee YH, Molnár T, Réfrégiers M, Goto Y, Tantos Á & Kardos J BeStSel: Webserver for secondary structure and fold prediction for protein CD spectroscopy. Nucleic Acids Res. 50, W90–W98 (2022).
  • 69. Woldring DR, Holec PV & Hackel BJ ScaffoldSeq: Software for characterization of directed evolution populations. Proteins Struct. Funct. Bioinforma 84, 869–874 (2016).
