Significance
Generation of phenotypic diversity is a key challenge in biotechnology and the improvement of microbes, plants, and animals cultivated by humans. Traditional diversity generation relies heavily on mutagenesis and homologous recombination which, while effective, can be limited in their phenotypic impact and rate of achieving measurable improvements. Function Generator™, an original diversity approach, creates whole-genome fusion gene libraries, representing most or all biological functions encoded by an organism and therefore capable of altering any of its phenotypes. It can be applied equally to well-studied and uncharacterized organisms, requiring only an annotated genome sequence and a genetic transformation method. This powerful technology can affect a trait’s heritability, an enabling feature and advantage for rapid evolution of useful organisms.
Keywords: gene fusion, molecular evolution, microbial strain improvement, heritability, phenotypic diversity
Abstract
Translational fusion of two separate genes into a compound sequence encoding a fusion protein is a key evolutionary mechanism which underpins the emergence of new protein activities, families, and architectures. In biotechnology, gene fusion is a valuable molecular evolution tool for tagging proteins of interest and for combining or altering protein activities. To broadly demonstrate and harness the gain-of-function capabilities of fusion genes in a whole-genome approach, we constructed a “Function Generator™” fusion gene library containing pairwise combinations of 5,019 protein-coding sequences in the Saccharomyces cerevisiae genome. The open reading frames (ORFs) were PCR amplified from the S288C yeast genome and cloned into a centromeric expression vector, with each ORF represented in the 5′ and the 3′ positions of the resulting gene fusions. To illustrate the ability of fusion genes in the library to confer complex phenotypes, a population of yeast library transformants was screened for resistance to four toxic heavy metal ions (Cd+2, Co+2, Cu+2, and Ni+2). Active fusion genes were cloned, validated, and sequenced, revealing a multitude of biological functions represented in these genes, including proteins involved in transcription, translation, metal ion binding and transport, and cell cycle control, as well as unknown functions. The gain-of-function principle of gene fusions was confirmed by comparing the activity of selected fusion genes to their constituent single ORFs expressed either individually or in nonfused pairs. Function Generator™ represents a powerful way to approach phenotypic diversity in the laboratory and to bypass a key evolutionary bottleneck for accelerated strain development.
Generation of phenotypic diversity is critical for the evolution of biological organisms and their adaptation to human-designed production processes. Fusion proteins, encoded by fusion genes, contribute to diversity in nature by allowing new protein families, architectures, and their associated phenotypes to arise, albeit in a stochastic manner over evolutionary timeframes (1–4). Fusion of proto-oncogenes in mammalian cancers, which leads to unchecked cell proliferation, provided the first documented natural examples of gene fusions and their ability to drive profound phenotypic changes (5–8), and these events remain among the most prominent examples of phenotypic transformation brought about by fusion genes. Fusion genes in cancer typically arise from chromosomal rearrangements that combine different coding regions to produce chimeric proteins (5–7, 9–12). The resulting gain-of-function can manifest as constitutive kinase activity, enforced dimerization, altered subcellular localization, or resistance to normal regulatory signals. These changes disrupt cellular homeostasis by driving persistent mitogenic signaling, evading apoptosis, and bypassing cell cycle checkpoints, thereby enabling uncontrolled proliferation and tumor progression (13–15).
Gene fusions have been observed in organisms across the tree of life (4, 16, 17) and have been associated with a multitude of traits and novel functions, including bifunctional enzymes and regulatory proteins (18–23), evolution of new enzymatic and protein activities (24, 25), expansion of protein localization (26, 27), and generation of novel multidomain protein architectures (17, 25, 28). In biotechnology, gene and protein fusions have been exploited for a variety of specific purposes, including protein and enzyme design and engineering (29–31), protein tagging for facilitated recombinant protein expression and purification (32, 33), visualization of protein localization in cells and organisms (34, 35), and enzyme fusions in biochemical pathways for improved metabolic flux (36–38). Last, gene and protein fusions have been exploited in a large number of therapeutic proteins (39, 40) including antibody fusions (41–43), Fc-fusion proteins (44, 45), and chimeric antigen receptor fusion proteins (46, 47).
Realizing the powerful evolutionary principle inherent in gene fusions, we set out to demonstrate that a whole-genome fusion gene library could be a tremendous resource for creating and accessing new proteins and traits for strain improvement. We imagined that a Function Generator™ library, containing all possible pairwise combinations of all or substantially all open reading frames (ORFs) encoded in a genome, could lead to many novel and unexpected sequence combinations that might facilitate and accelerate the development of traits of interest. We also imagined that such a method, once successfully demonstrated, could be useful for enhancing any trait, in any organism.
The genetic basis of robustness- and productivity-related qualities of microbes and other useful organisms tends to be complex and multigenic (48–50), making these traits difficult to engineer. According to breeding theory (51–54), phenotypic gain or response to selection associated with one or more breeding or selection cycles depends on three critical parameters: 1) phenotypic variability of the starting population, 2) heritability of the trait (the proportion of phenotypic variation attributable to genetic factors), and 3) selectivity or selection differential, the difference in trait values between the selected individuals and the starting population.
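These three parameters correspond to the terms of the breeder's equation from quantitative genetics. The text does not write the equation out, so the symbols below are standard textbook notation rather than the authors' own: $R$ is the response to selection, $h^{2}$ the narrow-sense heritability (additive genetic variance $\sigma_A^{2}$ over phenotypic variance $\sigma_P^{2}$), $S$ the selection differential, and $i$ the standardized selection intensity.

```latex
R = h^{2} S, \qquad h^{2} = \frac{\sigma_A^{2}}{\sigma_P^{2}}, \qquad S = i\,\sigma_P \;\Longrightarrow\; R = i\,h^{2}\,\sigma_P
```

In this form, the expected gain per selection cycle grows with the phenotypic spread of the starting population ($\sigma_P$), with heritability ($h^{2}$), and with the stringency of selection ($i$).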
An ideal strain improvement technology will impact both phenotypic variability and heritability (the genetic manner in which a trait is encoded). Heritability changes are uncommon in strain improvement projects relying principally on mutagenesis and homologous recombination for generating genetic diversity. However, by creating novel genes that dramatically modify a trait of interest, Function Generator™ has the potential to broaden the genetic and phenotypic variability of a population while also reprogramming the genetic basis of a phenotype. This impactful dual feature in an easily accessible genetic approach may elevate Function Generator™ to a method of choice for genetic improvement of microbes and higher organisms.
Because of the well-developed and flexible molecular tools and genomic resources available in the budding yeast Saccharomyces cerevisiae, as well as its high industrial relevance as a production organism, we chose S. cerevisiae for our first proof of concept of the power of whole-genome fusion gene libraries (55–57). We describe a series of experiments aimed at constructing a yeast Function Generator™ library, consisting of pairwise fusions of full-length protein-coding sequences in a yeast centromeric plasmid under control of a galactose-inducible promoter, and at using this library to screen for fusion genes conferring resistance to toxic heavy metal ions.
Several mechanisms for generating phenotypic diversity, listed in Table 1, can come into play simultaneously when applying a whole-genome fusion gene approach to an organism. When we began this work, it was unclear whether any of these would dominate, or if indeed a particular mechanism that comes into play in one organism would be equally prominent in another organism. However, the potential for interplay between multiple ways of impacting an organism’s qualities strengthened the notion that Function Generator™ may be a powerful genetic approach for achieving improvements in complex traits.
Table 1.
A list of potential mechanisms by which genes in a fusion gene library can confer phenotypic changes to a cell
| Number | Mechanism |
|---|---|
| 1 | Dual overexpression of two proteins |
| 2 | Abnormal localization of a protein |
| 3 | Changes in stability of one or both fusion partners within the fusion protein |
| 4 | Changes in protein multimerization |
| 5 | Changes in posttranslational modifications |
| 6 | Activity modification and deregulation |
| 7 | Combinations of the above |
Not listed are mechanisms of phenotypic changes caused by overexpression of individual genes.
Our choice of using full-length protein-coding sequences, rather than sequences coding for protein domains, as the input sequences for our fusion gene library is based on the rationale that for the purposes of strain improvement, intact native proteins, rather than protein domains, represent the minimal functional units in a cell, having evolved to interact with multiple other proteins and cellular components (58, 59). This project serves as a demonstration of the use of Function Generator™ for developing microbial stress and product tolerance traits of high importance in fermentation and bioproduction (60–63) and for creating new microbial strains for potential use in toxic metal bioremediation (64–66).
Results
Function Generator™ Library Construction and Screening.
We constructed an S. cerevisiae Function Generator™ library (Fig. 1 and SI Appendix, Fig. S1) from a total of 5,019 dereplicated yeast ORFs in the size range of 100 to 2,600 bp, corresponding to 86.5% of the 5,799 verified and uncharacterized protein coding sequences in the yeast genome annotation used. Two sets of ORFs were PCR amplified in a multiplex manner, one destined for the 5′ position in the fusion genes and the other (containing a stop codon) destined for the 3′ position (SI Appendix, Fig. S1).
Fig. 1.

Schematic illustration of Function Generator™ technology. A microbial genome (Saccharomyces cerevisiae genome shown as an example) is the source of input sequences in the form of full-length ORFs that are PCR amplified from genomic DNA and inserted into an expression vector in translational fusions to create a library of all possible pairwise combinations, with each ORF represented in both the 5′ and 3′ positions in the fusions. The library is transformed into the source organism or organism of interest to create a population of transformants, each harboring a single fusion gene. The population represents cells with altered phenotypes and qualities, representing all traits that are encoded by the genome.
The PCR primers used to amplify the ORFs included sequences homologous to the cloning/expression vector to assist homology-dependent assembly, and also encoded a 60 bp linker sequence coding for a 20 amino acid Ala, Gly, and Ser-rich peptide spacer, designed to provide separation between the two fusion partners in the fusion protein while minimizing the chances of protease cleavage (67–70). To create the Function Generator™ library, the two ORF populations were combined into the p416-GAL1 yeast centromeric vector (71) in a single, seamless cloning step, with all fusion genes placed under control of the galactose-inducible yeast GAL1 promoter. The library was cloned in Escherichia coli, and plasmid DNA was isolated from the resulting population of E. coli library transformants.
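The layout of a single fusion gene described above (5′ ORF without a stop codon, 60 bp linker, 3′ ORF with a stop codon) can be sketched in silico as follows. This is a minimal illustration, not the authors' cloning design: the linker peptide, the codon choices, and the toy ORF sequences are all hypothetical, since the text gives neither the actual linker sequence nor the primer sequences.

```python
# Hypothetical 20-residue Ala/Gly/Ser-rich spacer; the real linker sequence
# is not given in the text.
LINKER_PEPTIDE = "GSAGSAAGSGGSAGSAAGSG"

# One arbitrary codon per residue, chosen only for illustration.
CODONS = {"G": "GGT", "S": "TCT", "A": "GCT"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def linker_dna(peptide=LINKER_PEPTIDE):
    """Back-translate the spacer peptide into a DNA linker (3 bp per residue)."""
    return "".join(CODONS[aa] for aa in peptide)


def fuse_orfs(orf5, orf3, linker=None):
    """Join two ORFs in frame: 5' ORF (stop removed) + linker + 3' ORF (stop kept)."""
    linker = linker_dna() if linker is None else linker
    for name, seq in (("5' ORF", orf5), ("3' ORF", orf3), ("linker", linker)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of 3")
    # The 5' partner must read through into the linker, so drop its stop codon.
    if orf5[-3:].upper() in STOP_CODONS:
        orf5 = orf5[:-3]
    # The 3' partner terminates the fusion protein.
    if orf3[-3:].upper() not in STOP_CODONS:
        raise ValueError("3' ORF must end with a stop codon")
    return orf5 + linker + orf3


# Toy ORFs, not real yeast genes.
orf_a = "ATGGCTAAATAA"
orf_b = "ATGTCTGGTTGA"

fusion_ab = fuse_orfs(orf_a, orf_b)  # orf_a in the 5' position
fusion_ba = fuse_orfs(orf_b, orf_a)  # orf_a in the 3' position
```

Because each ORF is represented in both positions, the two orientations of a given pair are distinct library members.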
The maximal possible complexity of the Function Generator™ library is 5,019 × 5,019 = 2.52 × 10⁷ sequence combinations. From next-generation sequencing we predicted that approximately 4,450 ORFs (88.7% of targeted ORFs) were actually represented in the library, resulting in an estimated final library complexity of 1.98 × 10⁷ sequence combinations.
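The arithmetic behind these complexity figures can be reproduced in a few lines. The ORF counts are taken from the text; the variable and function names are only illustrative.

```python
targeted_orfs = 5019      # ORFs targeted for amplification (from the text)
represented_orfs = 4450   # approximate ORFs detected by sequencing (from the text)
annotated_orfs = 5799     # verified and uncharacterized ORFs in the annotation (from the text)


def pairwise_complexity(n_orfs):
    """Ordered pairs of ORFs: every ORF may occupy the 5' or the 3' position."""
    return n_orfs * n_orfs


max_complexity = pairwise_complexity(targeted_orfs)
est_complexity = pairwise_complexity(represented_orfs)
pct_of_targeted = 100 * represented_orfs / targeted_orfs
pct_of_annotated = 100 * represented_orfs / annotated_orfs

print(f"maximal complexity:   {max_complexity:.2e}")
print(f"estimated complexity: {est_complexity:.2e}")
print(f"represented: {pct_of_targeted:.1f}% of targeted, "
      f"{pct_of_annotated:.1f}% of annotated ORFs")
```

Counting ordered pairs, including each ORF paired with itself, gives n² combinations, which matches the 5,019 × 5,019 figure stated above.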
The library was introduced into the BY4742 laboratory yeast strain (72), and the resulting population of stable yeast transformants was plated on selective media containing growth-inhibitory concentrations of four toxic metal salts: CdCl2, CoCl2, CuCl2, and NiCl2. Plasmid DNA was isolated from yeast colonies growing on the selective plates, transformed into E. coli and sequenced. The metal ion resistance activity of these plasmid clones was validated in a qualitative resistance assay (examples shown in Fig. 2). A total of 46 unique fusion genes were recovered during this process that conferred resistance to one or more metal ions (Table 2). All of these genes consisted of full-length 5′ and 3′ ORFs.
Fig. 2.

Sample results from the qualitative heavy metal resistance assay for 15 fusion genes and the negative control plasmid p416-GAL1. Results are shown for each gene for each of the four metal ions as well as nonselective plates with no added metals. The table identifies the position of each fusion gene transformant within each matrix. Differences in gene activity between this representation and Table 2 are due to multiple data points being used to generate the averaged activities represented in Table 2.
Table 2.
Validated fusion genes isolated from the Function Generator™ library
For each fusion gene, the two constituent ORFs are listed, color coded by cellular function, with their S. cerevisiae gene ID, description, and length. Fusion gene activities, averaged from multiple data points obtained with the qualitative metal resistance assay and scored on a scale from 0 to 4, are shown for each metal ion. The metal ion selection is listed from which the fusion gene was first isolated. GOFF genes passing deconstruction are indicated in the right-most column. The four fusion genes marked by asterisks in their names showed a point mutation in one or both of the fusion gene ORFs which were corrected to the wild type sequences before further characterization.
A large number of cellular functions were represented in the individual ORFs, summarized in Table 2 and color coded by general cellular function. While we expected to recover ORFs encoding the yeast metallothioneins (encoded by yeast ORFs YHR055C and YOR031W), the presence of many other genes was unexpected, for example, genes coding for diverse transcription factors and transcriptional repressors (YBR066C, YCL058W, YER028C, YFR034C, YGL035C, YGL096W, YHR058C, and YPL235W), cellular and mitochondrial ribosomal proteins (YCR046C, YDR462W, YFR031C-A, YGL135W, YGR215W, YNL185C, and YOR167C), and proteins involved in cell division and cell cycle control (YDL047W, YDL127W, YDL134C, YGR230W, and YPL018W). These results indicated a rich potential for seemingly unrelated genes capable of conferring metal resistance when fused with specific fusion partners. Ten ORFs occurred in multiple fusions (Table 3) potentially indicating more prominent roles in metal ion resistance. Several of these repeated ORFs seem to show a position preference in the fusion gene (Table 3).
Table 3.
Genes occurring more than once within the 46 validated fusion genes, with total number of occurrences indicated, and the number of occurrences in the 5′ and 3′ positions, respectively
Gene Deconstruction for Proof of Gain-of-Function.
To formally demonstrate that the resistance activity encoded by each fusion gene was contained in the fusion rather than its individual ORFs, we performed “gene deconstruction” experiments, in which each ORF found in a fusion gene was cloned individually into the same expression vector and tested in a quantitative metal ion resistance assay, with the fusion gene constructs listed in Table 2 used as controls. The quantitative assay scores the amount of yeast growth on solid selective media containing metal ions by integrating the combined colony surface area of each transformant on each of the four selective media (Materials and Methods), which we found to be a more accurate way of reflecting resistance than colony count.
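The difference between the two readouts mentioned here, colony count and combined colony surface area, can be illustrated with a small sketch. The per-colony areas are made-up numbers in arbitrary pixel units; the actual image analysis is described in Materials and Methods and is not reproduced here.

```python
def resistance_scores(colony_areas):
    """Return (colony count, combined colony surface area) for one plate."""
    return len(colony_areas), sum(colony_areas)


# Made-up plates: many pinprick colonies versus fewer well-grown colonies.
many_small = [20] * 50
few_large = [400] * 10

count_small, area_small = resistance_scores(many_small)
count_large, area_large = resistance_scores(few_large)

print(count_small, area_small)
print(count_large, area_large)
```

In this toy case the plate with more colonies has the smaller combined area, so the two metrics rank the plates in opposite order.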
The results of the gene deconstructions are graphically represented in Fig. 3, with representative examples of the selections shown in Fig. 4 and the complete dataset in SI Appendix. Of the 46 fusion genes with validated resistance activity (Table 2), 22 showed significantly higher fusion gene activity compared to either of the individual ORFs and were considered to have passed the gene deconstruction test (rightmost column in Table 2).
Fig. 3.
Deconstruction of 46 validated fusion genes. Heat map depiction of metal ion resistance (surface area of colonies growing on selective plates) for each fusion gene and each of the four metal ions. Resistance activities conferred by the individual ORFs contained in each fusion gene are shown in separate columns. Fusion genes “passing deconstruction” are those in which the resistance activity significantly and consistently exceeds the activity of either of its constituent ORFs for at least one metal ion. Because growth rates and colony-forming ability on selective media differed dramatically, the maximum colony surface area measurement is indicated for each strain. Fusion gene M490-F09 was deemed to have passed deconstruction based on a combination of colony number and colony surface area metrics from multiple experiments.
Fig. 4.

Representative gene deconstruction results from the quantitative heavy metal resistance assay for five fusion genes, with one or more examples shown for each metal ion. Colonies formed by strains expressing each fusion gene, as well as its two constituent ORFs, are shown on selective and nonselective media, in comparison with a strain transformed with the negative control plasmid p416-GAL1. Fusion gene M490-A12 did not pass gene deconstruction but was included here as an example of Cd+2 resistance. The enlargement of the plate image at the Right of the M490-A12 example shows the large number of small colonies formed by the M490-A12 3′ ORF single transformant, plated without metal ion selection.
The majority of these fusion genes, referred to hereafter as Gain-Of-Function Fusions (GOFFs), confer Cu+2 resistance with a smaller number conferring resistance to Cd+2, Co+2, and Ni+2. The data summarized in Fig. 3 clearly show how fusion of pairs of ORFs results in phenotypes greatly surpassing those possible by expressing individual ORFs.
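The pass criterion given in the Fig. 3 legend (fusion activity exceeding that of either constituent ORF, significantly and consistently, for at least one metal ion) can be sketched as a simple rule. The text does not specify the statistical test used, so the `min_ratio` margin and the requirement that every fusion replicate clear it are hypothetical stand-ins, and all measurements below are made up.

```python
METALS = ("Cd", "Co", "Cu", "Ni")


def metals_passing(fusion, orf5, orf3, min_ratio=2.0):
    """List the metals for which the fusion beats both single ORFs.

    Each argument maps a metal to a list of replicate colony-area values.
    A metal passes when the weakest fusion replicate exceeds min_ratio times
    the strongest replicate of either single ORF (an assumed threshold).
    """
    passing = []
    for metal in METALS:
        fusion_reps = fusion.get(metal, [])
        if not fusion_reps:
            continue  # no data for this metal
        singles = orf5.get(metal, []) + orf3.get(metal, [])
        best_single = max(singles, default=0)
        if min(fusion_reps) > min_ratio * best_single:
            passing.append(metal)
    return passing


def passes_deconstruction(fusion, orf5, orf3, min_ratio=2.0):
    """A fusion gene passes if at least one metal passes."""
    return bool(metals_passing(fusion, orf5, orf3, min_ratio))


# Made-up replicate measurements for one hypothetical fusion gene.
fusion = {"Cu": [120, 150, 135], "Cd": [5, 0, 3]}
orf5 = {"Cu": [10, 12, 8], "Cd": [4, 6, 2]}
orf3 = {"Cu": [30, 25, 28], "Cd": [0, 1, 0]}

print(metals_passing(fusion, orf5, orf3))
```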
Comparison of Fusion Gene Activity to Dual Overexpression of Pairs of ORFs.
As postulated in Table 1, one possible mechanism of fusion gene gain-of-function is through simultaneous overexpression of two proteins. To address whether this mechanism explained any of the 22 GOFF activities, we transformed BY4742 with pairs of individual ORFs placed on separate plasmids and compared the phenotypes of the dual transformants to those expressing the fusion gene together with a second, empty expression plasmid. Examples of dual overexpression phenotypes are shown in Fig. 5 with the complete dataset in SI Appendix. To allow for the possibility that individual ORF expression on one of the centromeric plasmids (with the URA3 selectable marker) might be more or less efficient than individual ORF expression on the other centromeric plasmid (containing the HIS3 selectable marker), the individual ORF combinations were tested reciprocally, with each ORF present first on one plasmid and then on the other. The two reciprocal plasmid combinations are represented in the data points labeled as “ORFs #1” and “ORFs #2,” respectively.
Fig. 5.
Dual overexpression of ORFs from 22 GOFF genes. Heat map depiction of metal ion resistance (surface area of colonies growing on selective plates) for each fusion gene and each of the four metal ions. Resistance activities conferred by the reciprocally coexpressed constituent ORFs are shown in separate columns (“ORFs #1” and “ORFs #2”). Dual overexpression is the likely gain-of-function mechanism when the resistance observed in either or both individual ORF reciprocal cotransformants is equal to or greater than that observed in the fusion gene.
Dual overexpression largely or completely reproduced the activity of three GOFF genes (M490-F09, M491-E04, and M492-D07), while for the remaining 19 fusion genes the combination of individual ORFs produced only a portion of the phenotype observed with the fusion gene, as summarized in Fig. 5. However, dual overexpression was able to approximate the resistance phenotype for several additional GOFFs in cases of weaker metal ion resistance (for example, M491-F10, M492-D07, M495-F02). In the cases where dual overexpression fails to account for the gain-of-function activity of the fusion gene, we have to conclude that one or more of the other mechanisms listed in Table 1 are responsible for the observed activity. However, it is also possible that the phenotype conferred by the fusion gene requires precise stoichiometric expression of the two proteins, which may not be achieved when they are expressed separately.
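The decision rule from the Fig. 5 legend can be written down directly: dual overexpression is taken as the likely mechanism when either reciprocal cotransformant matches or exceeds the fusion gene. The function names and the colony-area values below are hypothetical and serve only to illustrate the comparison.

```python
def dual_overexpression_explains(fusion_area, orfs_1_area, orfs_2_area):
    """True if either reciprocal ORF cotransformant is >= the fusion gene."""
    return max(orfs_1_area, orfs_2_area) >= fusion_area


def fraction_reproduced(fusion_area, orfs_1_area, orfs_2_area):
    """Share of the fusion phenotype reached by the better cotransformant."""
    if fusion_area <= 0:
        return None  # no fusion phenotype to compare against
    return max(orfs_1_area, orfs_2_area) / fusion_area


# Made-up colony surface areas for two hypothetical fusion genes.
print(dual_overexpression_explains(100, 95, 110))
print(dual_overexpression_explains(100, 20, 35))
print(fraction_reproduced(100, 20, 35))
```

Taking the better of the two reciprocal plasmid combinations mirrors the "either or both" wording of the legend.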
Additive Fusion Gene Activities.
The variety of genes and cellular functions observed in the 46 validated fusion genes listed in Table 2 suggests that specific combinations of fusion genes could show additive activity when coexpressed in the same cell. For example, fusion gene M490-F09, containing a metallothionein gene fused to a gene encoding a protein of unknown function, would be hypothesized to synergize with a fusion gene encoding proteins with very dissimilar functions, such as M491-E09 which encodes a protein involved in ER-to-Golgi transport fused to a cyclin. To test this hypothesis, we tested all pairwise combinations of seven GOFFs (M490-F09, M491-C12, M491-E09, M492-D07, M492-E01, M493-C02, M495-F02) for heavy metal resistance phenotypes, with each combination again tested reciprocally by placing the two genes on URA3 and HIS3 centromeric plasmids, respectively. As controls, the individual fusion genes were tested on URA3 and HIS3 plasmids in reciprocal cotransformations with a second expression vector lacking a gene insert.
As shown in Fig. 6, certain fusion gene combinations, including the one highlighted above, showed enhanced metal resistance when coexpressed, confirming our hypothesis and illustrating the potential of systematically building a desirable phenotype by stacking these novel genes inside a single transformant. In some cases, the additive activities resulted in higher surface areas of resistant colonies growing on selective plates, whereas other cases showed an increase in colony numbers. Testing the entire set of 22 × 22 = 484 reciprocal combinations of GOFFs was beyond the scope of this project but may be addressed in the future in a dedicated combinatorial screen.
Fig. 6.
Stacking of fusion genes. Metal ion resistance activities are shown for yeast strains transformed with pairs of fusion genes on URA3 and HIS3 centromeric plasmids, respectively, with colony number and colony surface area shown in separate data bars. In some cases, gene stacking increases colony number, in other cases colony surface area, or both. The resistance phenotypes of strains transformed with individual fusion genes are indicated with X and O symbols. All fusion gene combinations were tested reciprocally (with the different genes present either on the URA3 plasmid or the HIS3 plasmid) to control for expression differences between the two plasmids. These results include examples of fusion gene pairs with additive activities (for example, M490-F09 + M491-E09), gene pairs with nonadditive activities, and gene pairs with negative epistasis (for example, M490-F09 + M493-C02 in Cd+2 resistance).
Cell Cycle Effects of Fusion Gene Expression.
The presence of genes involved in cell cycle control, including a cyclin and a mitotic phosphatase, in the validated fusion genes (Table 2) raised the possibility that perturbations of the cell cycle may be an underlying mechanism by which the fusion genes confer metal ion resistance. Delaying progression through or the onset of certain parts of the cell cycle may allow the cell to clear the metal ions in advance of or during cell cycle phases where the cell is particularly sensitive to metal ions. To test this hypothesis, cells expressing the 22 GOFFs were harvested in the logarithmic stage of growth, fixed, and analyzed microscopically to determine the percentage of cells without buds (G1 phase), with small buds (S phase), intermediate buds (G2 phase), and large buds (M phase). As shown in Fig. 7, M491-E04 and M491-E09, the two fusion genes containing a C-terminal YDL127W gene encoding a cyclin involved in polarized growth during the cell cycle, showed a notable reduction of cells in G1, with a corresponding increase in cells in M phase, suggesting an M phase delay. M495-F02, a fusion between YBR066C, encoding a transcriptional repressor, and YGL035C, encoding a transcription factor involved in glucose repression (neither gene involved a priori in cell cycle control), showed an increase in G1 cells at the expense of cells in S, G2, and M phases. None of the five fusion genes encoding the YDL047W mitotic cycle phosphatase, nor any of the other fusion genes tested, showed meaningful shifts in their cell cycle distributions.
Fig. 7.
Effects of GOFF genes on cell cycle progression. Cells expressing the 22 GOFFs and the empty p416-GAL1 control vector were harvested in the logarithmic stage of growth, fixed and analyzed microscopically to determine the percentage of cells without buds (G1 phase), with small buds (S phase), intermediate buds (G2 phase), and large buds (M phase).
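The bud-morphology scoring described in this legend amounts to mapping each scored cell to a phase and tabulating percentages. The sketch below assumes hypothetical per-cell labels ("none", "small", "intermediate", "large") and uses made-up counts; it is not the authors' analysis code.

```python
from collections import Counter

# Bud category -> cell cycle phase, as assigned in the text.
PHASE_BY_BUD = {"none": "G1", "small": "S", "intermediate": "G2", "large": "M"}
PHASES = ("G1", "S", "G2", "M")


def phase_percentages(bud_calls):
    """Convert per-cell bud calls into the percentage of cells in each phase."""
    if not bud_calls:
        raise ValueError("no cells scored")
    counts = Counter(PHASE_BY_BUD[bud] for bud in bud_calls)
    total = len(bud_calls)
    return {phase: 100.0 * counts[phase] / total for phase in PHASES}


def phase_shift(strain, control):
    """Percentage-point change per phase relative to the control strain."""
    return {phase: strain[phase] - control[phase] for phase in PHASES}


# Made-up counts: a control and a strain with fewer G1 and more M-phase cells.
control = phase_percentages(
    ["none"] * 50 + ["small"] * 20 + ["intermediate"] * 15 + ["large"] * 15
)
strain = phase_percentages(
    ["none"] * 30 + ["small"] * 20 + ["intermediate"] * 15 + ["large"] * 35
)
print(phase_shift(strain, control))
```

A drop in G1 paired with a rise in M, as in this toy strain, is the pattern the text interprets as a possible M phase delay.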
Transcriptome Sequencing of Fusion Gene Transformants.
To identify transcriptomic changes caused by expression of the 22 GOFFs, RNA was isolated from transformed strains expressing these genes and sequenced with an Illumina RNAseq workflow. Fold-changes in mRNA abundance and statistical significances of these changes were calculated for 5,202 transcribed yeast genes. Transcriptional heat maps of significant differentially expressed genes found among the 22 transformed strains are shown in Fig. 8, with samples clustered based on the relatedness of their transcriptional response. A more detailed representation of the data in Fig. 8, with all genes labeled, is shown in SI Appendix, Fig. S4.
Fig. 8.
Transcriptomic profiles of yeast strains transformed with 22 GOFF genes. Heatmap showing the mean log2 fold change of expression for all significant differentially expressed genes (P-value ≤ 0.05 and absolute log2 fold-change ≥1) among all samples, clustered by relatedness of their transcription profiles. 43 out of 47 genes with log2 fold changes ≥3 were encoded on the fusion plasmid expressed in the transformant used to generate that particular RNAseq sample. Only genes with corresponding data for all samples are included.
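The significance filter stated in this legend (P-value ≤ 0.05 and absolute log2 fold-change ≥ 1) can be expressed as a short predicate. The gene labels and values below are placeholders, not data from the study, and the sketch takes the P-values as given without assuming any particular multiple-testing correction.

```python
def is_significant(log2_fc, p_value, p_max=0.05, min_abs_log2_fc=1.0):
    """Apply the legend's thresholds: P <= 0.05 and |log2 fold change| >= 1."""
    return p_value <= p_max and abs(log2_fc) >= min_abs_log2_fc


# Placeholder rows: (gene label, log2 fold change, P-value).
rows = [
    ("gene_A", 3.2, 0.001),   # strongly induced
    ("gene_B", -1.4, 0.030),  # repressed
    ("gene_C", 0.6, 0.001),   # significant P-value but small change
    ("gene_D", 2.5, 0.200),   # large change but not significant
]

significant = [gene for gene, fc, p in rows if is_significant(fc, p)]
print(significant)

# A log2 fold change of 1 is a 2-fold change; 3 is an 8-fold change.
print(2 ** 1, 2 ** 3)
```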
The results, representing a preliminary exploration of the transcriptional repercussions of GOFF gene expression, show a diversity of unique responses, with the strains shown on the left side of the figure representing the most dramatic departures from the control strain. Related responses are observed where these would be expected based on fusion genes’ constituent ORFs, for example, M491-C09 and M491-C12, both of which encode a phosphatase involved in the mitotic cycle. Perhaps not surprisingly, the above-mentioned fusion gene M495-F02, containing two ORFs encoding members of the transcription machinery, elicited a high number of significant transcriptional changes compared to the other strains (SI Appendix, Fig. S3).
Transcriptional repression or activation of a small number of genes was observed in multiple strains, as shown in SI Appendix, Table S1. The most frequently occurring of these are the PHO3 gene encoding an acid phosphatase that is known to affect resistance to specific metal ions, the FTR1 gene encoding a high-affinity iron permease, the translation elongation factor eIF-5A and the heme-regulated DAN1 gene encoding a cell wall mannoprotein controlled in an oxygen-dependent manner (each of these repressed in 6 different strains). Another notable differentially expressed gene is YBR072W, encoding chaperone Hsp26 (induced in five strains), which is known to contribute to stress tolerance by directly binding to and stabilizing partially unfolded proteins, thereby enabling their refolding by ATP-dependent chaperones (73). These examples, as well as others listed in SI Appendix, Table S1, may point toward conserved mechanisms of metal ion resistance.
The individual genes represented in the 22 GOFFs are only highly expressed when encoded in the fusion plasmid (SI Appendix, Figs. S4 and S5). In addition, gene YGR060W, encoding a C-4 methyl sterol oxidase and present as a 3′ ORF in the GOFF M492-E01, was found mildly underexpressed in 11 samples and mildly overexpressed in a 12th sample, again suggesting a potential conserved mechanism of metal ion resistance.
Discussion
Mutation and homologous recombination are the two main systematic drivers of genetic and phenotypic diversity in nature (74–79). These are complemented by various stochastic mechanisms, including gene and genome duplication and divergence, and local or global chromosomal and genome rearrangements, driven by chromosomal breaks and nonhomologous break repair (80–82). Biotechnology has successfully adopted all of these sources of genetic variability and turned them into a wide array of methods and approaches for altering and engineering biology with the aim of accelerating the evolution and improvement of useful proteins and organisms (83–101). However, it is our ability to imagine and produce specific DNA sequences with exact start and end points and join these in a precise manner into predesigned constructs that represents the principal breakthrough achieved in the recombinant DNA revolution, and the main catalyst of expedited evolution in the laboratory (102–106).
The technology presented here, Function Generator™, relies on the large-scale amplification of >76% of the protein-coding sequences encoded in a microbial genome and their random, pairwise joining in a consistent, sequence-specific manner to generate large collections of novel fusion genes. In its wholesale creation of fusion genes encoding tens of millions of unique sequence combinations, new proteins and novel protein architectures, the technology was intended to accelerate strain improvement by circumventing a critical evolutionary bottleneck. An important underlying assumption is that fusion proteins are the primary translation products from the fusion genes, which is highly likely because of the relative rarity of natural yeast bicistronic transcripts (107, 108).
The current study is a test case of the technology for advancing robustness traits in baker’s yeast. We produced a ~2.0 × 10⁷ complexity combinatorial fusion gene library from >5,000 full-length S. cerevisiae protein-coding sequences, about 4,450 of which were represented in the library. The library was screened for fusion genes conferring resistance to four toxic metal salts. We isolated several dozen fusion genes conferring resistance to one or multiple metal ions, containing unexpected sequence combinations unlikely to have been predicted computationally or from studies of native yeast genes. The novelty of genes and mechanisms represented in our results is supported by the finding that only four of the 72 unique ORFs represented among the validated fusion genes were found with genic mutations in a separate study of metal ion resistance (66), and only three of the 52 significant differentially expressed genes with multiple occurrences identified in our study (SI Appendix, Table S1) were found with genic mutations in the cell lines of the same study.
For 22 of the initial collection of 46 validated fusion genes, both constituent ORFs are necessary to confer the resistance phenotype and achieve gain-of-function relative to the individual ORFs. These 22 GOFF genes fall into two categories: 19 genes in which the constituent ORFs need to be fused, and three genes in which dual overexpression is largely sufficient to confer the resistance phenotype, demonstrating multiple mechanisms at work within the library. Because dual overexpression was one of the anticipated mechanisms when we created the fusion gene library (Table 1), all 22 GOFF genes were considered to demonstrate bona fide gain-of-function and were included in subsequent analysis. The possible mechanisms are currently hypothetical (Table 1), and the origin of the observed gains-of-function is not explained by the constituent ORFs present in each gene; these questions will be explored in greater detail in future studies.
We also demonstrated additive effects when coexpressing pairs of fusion genes, opening the possibility of creating novel genetic pathways with this technology. Finally, we presented data showing shifts in cell cycle progression for strains expressing several fusion genes, suggesting a potential cell cycle component of the resistance phenotype.
The studies presented here did not test for protein expression levels, and these are likely to differ depending on the expressed sequences. Although not proof, the deconstruction results provide evidence that the fusion genes are translated into fusion proteins in yeast. Furthermore, we have shown that resistance phenotypes are also observed when the fusion genes are expressed under control of a constitutive promoter (57). Some of the differences in phenotypes observed between the fusion genes and dual overexpression of their constituent ORFs may be attributable to differences in protein expression efficiency of the individual native proteins compared to the fusion protein, or to a requirement for perfect 1:1 stoichiometry between the two expressed proteins as would be found in a fusion protein. Measurements of protein expression levels or fine-tuning of fusion protein expression to maximize a phenotype will be the subject of future studies.
Our case study shows that Function Generator™ is a powerful empirical technology for phenotypic enhancement of microbes. By incorporating a majority of protein-coding sequences into a Function Generator™ library, the technology deliberately avoids biases based on preexisting knowledge of specific sequences and functions relevant to a phenotype of interest. Incorporating an entire microbial genome’s worth of biological function into the novel fusion genes allows sampling of the biological utility of thousands of genes that might not ordinarily be selected for strain improvement projects. It is an effective way of putting to work known and unknown genes discovered in whole-genome sequences. Most sequence combinations in a Function Generator™ library will be between structurally and functionally unrelated genes, distinguishing this approach from other whole-genome methods. Because the technology is empirical in nature, it does not require knowledge of genes and pathways relevant to a trait of interest; it can be applied equally well to established and to little-studied production organisms. It is likely that Function Generator™ will work as well in higher organisms such as plants, insects, birds, and mammals as it does in microbes, as demonstrated here.
In the >10 y that we have worked with Function Generator™, we have successfully used the technology to improve multiple traits in E. coli (stress and product tolerance, protein expression, amino acid production), S. cerevisiae (in addition to the present study, alcohol, heat, and low-pH tolerance; see refs. 55–57), and Pseudomonas fluorescens (protein expression). We have isolated active fusion genes in all of our trait screens and have found Function Generator™ to be generally useful for the production of traits that are otherwise difficult to engineer. Based on this experience, we consider Function Generator™ to be more flexible and more powerful than current technologies, and to meet the requirements of “open-endedness and novelty in biological evolution” as well as “more creative approaches to biological design” advocated by others (109).
Function Generator™ has the following specific advantages: 1) high probability of improving a trait of interest; 2) novelty of the discovered fusion genes which can circumvent existing intellectual property or achieve commercial freedom to operate; 3) facile transferability of the novel genes and their encoded traits to other strains and organisms, including strains generated by other approaches (for example, mutagenesis or recombination); 4) high potential for discovery of proteins involved in or able to affect a trait of interest, allowing for rapid expansion of the phenotypic knowledge base; 5) depending on the trait of interest, possibilities of mixing genes from multiple genomes, related or unrelated; 6) possibilities of leveraging additive genetic effects; and 7) high speed and relatively low cost of the discovery process. Finally, each fusion gene isolated from a Function Generator™ library represents a new genetic building block that is evolvable in its own right, for example by mutagenesis of the constituent ORFs, or by their substitution with related genes encoded by the same or other organisms. Additional tuning is possible by varying the linker sequence connecting the two proteins. Function Generator™ enables biotechnologists to harness and take advantage of the full power of genomics and the many genes that have been discovered but so far have remained unexplored and unused (“To rust unburnish’d, not to shine in use,” ref. 110) in sequence databases.
Finally, the number of multiprotein architectures discovered in nature has been relatively limited (111–113), implying a slow rate at which new combinations of protein domains are generated, tested, and retained in eukaryotic and prokaryotic genomes. We have not attempted to calculate the number of new multidomain architectures represented in the Function Generator™ libraries we have created to date and may do so in a follow-up study. However, it is likely that whole-genome fusion gene libraries represent a rich source of novel multiprotein architectures for use in trait screening, protein engineering, de novo protein design, training of artificial intelligence algorithms, or other useful purposes in biotechnology.
Materials and Methods
Detailed methods for this study are given in the SI Appendix.
Strains, Media, and Yeast Cultures.
The laboratory yeast strain BY4742 (MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0, ref. 72) was obtained from the American Type Culture Collection and used for all experiments in this study. All yeast cultures were grown at 30 °C. Standard rich and synthetic defined complete (SDC) single- and double-dropout minimal yeast media were used for all experiments. Chloride salts of four toxic metals (CdCl2, CoCl2, CuCl2, and NiCl2) were added to solid and liquid media for determining metal ion resistance. The term “nonselective medium” refers to medium without added metal salts.
Plasmids and Cloning Methods.
Derivatives of the yeast centromeric expression vectors p416-GAL1 and p413-GAL1 (71) were used throughout the study for constructing the Function Generator™ library, expressing fusion genes and single ORFs, or as empty vector controls. All plasmids constructed for this study have been deposited in GenBank.
Variations of published homology-based assembly methods (97, 114–117) were used for all plasmid constructions. The Benchling suite of sequence analysis and alignment tools was used for design and analysis of all plasmids used in the project.
Function Generator™ Library Construction.
Of the 5,799 verified and uncharacterized protein-coding genes in the S. cerevisiae reference strain S288C, 5,065 fall within the size range of 100 to 2,600 bp chosen for the Function Generator™ library; 20 genes <100 bp and 714 genes >2,600 bp were excluded. After eliminating 46 repetitive genes (>96% identical at the protein level), a dereplicated set of 5,019 unique protein-coding sequences remained. Two different pairs of PCR primers were designed based on the 5′ and 3′ ends of each ORF, one for cloning the ORF into the 5′ position of the fusion gene (5′ ORFs) and the other for placing it in the 3′ position (3′ ORFs). Each primer contained additional conserved sequences for multiplex amplification of ORF pools and cloning into the p416-GAL1 (71) derivative used for cloning the library (SI Appendix, Fig. S1).
The coding sequences were PCR amplified from S288C genomic DNA in a three-step, multiplex amplification process (55, 56), dividing the target ORFs into 192 pools of 26 to 27 ORFs each. Illumina next-generation sequencing of selected ORF pools allowed us to estimate 4,450 ORFs represented in the final library (88.7% of the targeted ORFs and 76.7% of the S288C protein-coding genes). ORF pools were combined and assembled into the vector in 25 sublibraries that were cloned in E. coli.
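The ORF counts, pool sizes, and representation percentages given above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and was not part of the study's workflow; the constant and function names are hypothetical, and the even split of ORFs across pools is an assumption (the text states only that each pool contained 26 to 27 ORFs).

```python
# Counts quoted in the text for the S. cerevisiae S288C ORF set.
TOTAL_GENES = 5_799   # verified and uncharacterized protein-coding genes
TOO_SHORT = 20        # genes < 100 bp
TOO_LONG = 714        # genes > 2,600 bp
REPETITIVE = 46       # removed as > 96% identical at the protein level
N_POOLS = 192         # multiplex PCR pools
REPRESENTED = 4_450   # ORFs estimated to be present in the final library


def orf_set_summary():
    """Recompute the derived numbers from the counts above."""
    in_size_range = TOTAL_GENES - TOO_SHORT - TOO_LONG
    targeted = in_size_range - REPETITIVE
    # Assumption: ORFs are split as evenly as possible across the pools,
    # so some pools hold one more ORF than the others.
    small_pool, n_large_pools = divmod(targeted, N_POOLS)
    return {
        "in_size_range": in_size_range,
        "targeted": targeted,
        "pool_sizes": (small_pool, small_pool + 1),
        "n_large_pools": n_large_pools,
        "pct_of_targeted": round(100 * REPRESENTED / targeted, 1),
        "pct_of_genome": round(100 * REPRESENTED / TOTAL_GENES, 1),
    }


if __name__ == "__main__":
    for key, value in orf_set_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions, the sketch reproduces the 5,065 and 5,019 ORF counts, pool sizes of 26 and 27 ORFs, and the 88.7% and 76.7% representation figures quoted in the text.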
Function Generator™ Library Transformation and Heavy Metal Screening.
Plasmid DNA for each sublibrary was transformed into the BY4742 yeast strain using lithium acetate transformation methods (118, 119). A pooled population of library transformants was plated on solid media supplemented with heavy metal salts. Plasmid DNA was isolated from resistant colonies and rescued by cloning in E. coli (120). Plasmid clones were sequenced, dereplicated, and retested. A total of 479 fusion gene clones obtained from the library (113, 189, 94, and 83 clones isolated from CdCl2, CoCl2, CuCl2, and NiCl2, respectively) resulted in 46 unique fusion gene clones.
Heavy Metal Resistance Assays.
A qualitative metal resistance assay was used for validation of fusion gene activity (Fig. 2). Fusion gene clones were transformed into yeast strain BY4742, and transformed cells were spotted onto solid media supplemented with varying concentrations of the heavy metal salts. Relative cell growth in each spot was scored from 0 to 4, and the averaged scores are presented in Table 2.
A quantitative heavy metal resistance assay was used for detailed scoring of fusion gene activity. Yeast strain BY4742 was transformed with either a single yeast centromeric plasmid or cotransformed with two yeast centromeric plasmids. Transformed cells were plated onto solid media containing one of the four metal salts or metal-free medium as controls. Colonies forming in the wells were counted and the integrated colony surface area calculated. Due to different metal ion concentrations used in different experiments, the quantitative metal ion resistance differed in a few cases from that observed in the qualitative assay, and/or between different iterations of the quantitative assay represented in Figs. 3 and 5.
Fusion Gene Deconstruction to Provide Evidence of Gain-of-Function.
To demonstrate that both ORFs present in a fusion gene were required for fusion gene activity, all 46 fusion genes were “deconstructed” and their individual ORFs cloned into the p416-GAL1 centromeric yeast expression vector to allow determination of their individual ability to confer metal ion resistance. The quantitative heavy metal resistance assay was used to determine metal ion resistance of both ORFs compared to the fusion gene used as a control.
Dual Overexpression of ORFs Contained in Fusion Genes.
To determine whether coexpression of the two ORFs found in a GOFF gene could approximate the activity of the fusion gene, pairs of individual ORFs cloned onto centromeric expression plasmids were transformed into yeast strain BY4742 and compared to transformants with the corresponding fusion gene. To control for possible differences in gene expression from the two vectors, the individual ORFs were tested in a reciprocal fashion: For example, for a fusion gene containing ORFs A and B, a transformant with ORF A cloned into p416-GAL1 and ORF B cloned into p413-GAL1 was compared with a second transformant containing ORF A cloned into p413-GAL1 and ORF B cloned into p416-GAL1. A third transformant of the ORF A—ORF B fusion gene cloned into p416-GAL1 and p413-GAL1 without an insert served as a control. All transformants were characterized using the quantitative heavy metal resistance assay.
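The reciprocal layout described above can be written out explicitly. The Python sketch below is only an illustration of the experimental design, not a tool used in the study; the `Transformant` type, the `reciprocal_design` function, the labels, and the placeholder ORF names are all hypothetical.

```python
from typing import List, NamedTuple


class Transformant(NamedTuple):
    label: str
    p416_insert: str  # insert carried on the p416-GAL1 derivative
    p413_insert: str  # insert carried on the p413-GAL1 derivative


def reciprocal_design(orf_a: str, orf_b: str) -> List[Transformant]:
    """List the three transformants compared for one fusion gene."""
    fusion = f"{orf_a}-{orf_b} fusion"
    return [
        # Each ORF is tested on both vectors to control for
        # vector-dependent differences in expression.
        Transformant("dual overexpression, orientation 1", orf_a, orf_b),
        Transformant("dual overexpression, orientation 2", orf_b, orf_a),
        # Fusion gene on p416-GAL1 plus p413-GAL1 without an insert.
        Transformant("fusion gene control", fusion, "no insert"),
    ]


if __name__ == "__main__":
    for t in reciprocal_design("ORF_A", "ORF_B"):
        print(f"{t.label}: p416={t.p416_insert}, p413={t.p413_insert}")
```

All three transformants carry two plasmids, so their growth on metal-containing media can be compared under the same double-dropout selection.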
Fusion Gene Stacking.
To explore possible additive effects of the resistance activities already observed, seven fusion genes were cotransformed into BY4742 in all combinations and the transformants characterized using the quantitative heavy metal resistance assay.
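As a rough guide to the size of such a stacking experiment, the sketch below enumerates combinations of seven fusion genes. It assumes that "all combinations" means all unordered pairs, which is an interpretation rather than something this section states; the gene names `FG1` to `FG7` and the function name are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical placeholder names; the seven fusion genes are not
# identified in this section.
FUSION_GENES = [f"FG{i}" for i in range(1, 8)]


def pairwise_stacks(genes):
    """Return every unordered pair of distinct fusion genes."""
    return list(combinations(genes, 2))


if __name__ == "__main__":
    stacks = pairwise_stacks(FUSION_GENES)
    print(f"{len(stacks)} pairwise cotransformations")
    for first, second in stacks:
        print(first, "+", second)
```

Under the pairwise assumption, seven fusion genes give 21 distinct cotransformations; testing each pair on both vector arrangements would double that number.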
Fusion Gene Effects on Cell Cycle Progression.
The 25 GOFF genes were transformed into BY4742, and transformant cultures were grown, harvested during mid-logarithmic growth phase, and fixed. For each transformant, the cell morphologies of at least 1,000 cells were counted and sorted into one of four categories: unbudded cells (G1 phase), cells with a small bud <1/3 the diameter of the mother cell (S phase), cells with a large bud 1/3 to 2/3 the diameter of the mother cell (G2 phase), and cells with a large bud >2/3 the diameter of the mother cell (M phase).
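The four morphological categories can be expressed as a simple classification rule. The Python sketch below is a hedged illustration, not the scoring procedure used in the study: it assumes each cell is summarized by a bud-to-mother diameter ratio (0 for unbudded cells) and that the 1/3 and 2/3 boundaries belong to the G2 category; the function names are hypothetical.

```python
from collections import Counter

PHASES = ("G1", "S", "G2", "M")


def classify_cell(bud_to_mother_ratio: float) -> str:
    """Assign a cell cycle phase from the bud/mother diameter ratio."""
    if bud_to_mother_ratio <= 0:
        return "G1"  # unbudded cell
    if bud_to_mother_ratio < 1 / 3:
        return "S"   # small bud
    if bud_to_mother_ratio <= 2 / 3:
        return "G2"  # bud 1/3 to 2/3 of the mother cell diameter
    return "M"       # bud > 2/3 of the mother cell diameter


def phase_fractions(ratios):
    """Return the fraction of scored cells in each phase."""
    counts = Counter(classify_cell(r) for r in ratios)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells were scored")
    return {phase: counts[phase] / total for phase in PHASES}


if __name__ == "__main__":
    # Made-up ratios for illustration only, not measured data.
    print(phase_fractions([0.0, 0.0, 0.2, 0.5, 0.9]))
```

Comparing the resulting phase fractions between a fusion gene transformant and the empty vector control is one way to express a shift in cell cycle progression.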
RNAseq Analysis.
Triplicate BY4742 transformants of each fusion gene plus the empty vector control were grown in SDC-Gal-Ura to mid-logarithmic phase. Cells were harvested, pelleted, frozen at −80 °C, and shipped to GeneWiz/Azenta for RNA extraction and Illumina mRNA sequencing (RNAseq).
Sequence reads were trimmed to remove possible adapter sequences and nucleotides with poor quality (121). Trimmed and deduplicated reads were then mapped to the S. cerevisiae S288C reference genome (122). Differential gene expression was analyzed by comparing triplicate treatment samples to the triplicate control samples (123–126).
Supplementary Material
Appendix 01 (PDF)
Dataset S01 (XLSX)
Dataset S02 (XLSX)
Acknowledgments
We acknowledge Animesh Ray, Biranchi Patra, and Dominick Mendola for their contributions in early stages of this work and for discussions of the applicability of Function Generator™ to specific organisms and traits. We thank Dominick Mendola, Julian Schroeder, and Diane Retallack for critical reading of our manuscript. Construction and initial testing of the Function Generator™ library described here was supported by a Small Business Technology Transfer grant from the National Science Foundation (NSF Award No 1321480). We also thank the many investors in Primordial Genetics, Inc. and Primrose Bio, Inc. for making development of this groundbreaking new technology possible.
Author contributions
S.B., A.G., and H.Z. designed research; M.M.M., S.B., and H.Z. performed research; M.M.M., S.B., C.R.S., and H.Z. contributed new reagents/analytic tools; M.M.M., S.B., C.R.S., B.G.M., and H.Z. analyzed data; and M.M.M., S.B., A.G., C.R.S., B.G.M., and H.Z. wrote the paper.
Competing interests
All authors are owners of stock in Primrose Bio, Inc. exceeding $5000 in stock value. Authors Mitchell and Zieler hold in excess of 5% equity in the company.
Footnotes
This article is a PNAS Direct Submission.
Data, Materials, and Software Availability
All study data are included in the article and/or supporting information, with the exception of the complete transcriptomics dataset which was deposited in the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (127).
References
- 1. Gilbert W., Why genes in pieces? Nature 271, 501 (1978).
- 2. Arber W., Genetic variation: Molecular mechanisms and impact on microbial evolution. FEMS Microbiol. Rev. 24, 1–7 (2000).
- 3. Long M., A new function evolved from gene fusion. Genome Res. 10, 1655–1657 (2000).
- 4. Long M., VanKuren N. W., Chen S., Vibranovski M. D., New gene evolution: Little did we know. Annu. Rev. Genet. 47, 307–333 (2013).
- 5. Sawyers C. L., The bcr-abl gene in chronic myelogenous leukaemia. Cancer Surv. 15, 37–51 (1992).
- 6. Aman P., Fusion oncogenes in tumor development. Semin. Cancer Biol. 15, 236–243 (2005).
- 7. Mitelman F., Johansson B., Mertens F., The impact of translocations and gene fusions on cancer causation. Nat. Rev. Cancer 7, 233–245 (2007).
- 8. Rabbitts T. H., Commonality but diversity in cancer gene fusions. Cell 137, 391–395 (2009).
- 9. Mitelman F., Johansson B., Mertens F., Fusion genes and rearranged genes as a linear function of chromosome aberrations in cancer. Nat. Genet. 36, 331–334 (2004).
- 10. Inaki K., Liu E. T., Structural mutations in cancer: Mechanistic and functional insights. Trends Genet. 28, 550–559 (2012).
- 11. Latysheva N. S., et al., Molecular principles of gene fusion mediated rewiring of protein interaction networks in cancer. Mol. Cell 63, 579–592 (2016).
- 12. Liu S. V., Nagasaka M., Atz J., Solca F., Müllauer L., Oncogenic gene fusions in cancer: From biology to therapy. Signal Transduct. Target. Ther. 10, 111 (2025).
- 13. Saglio G., Cilloni D., Abl: The prototype of oncogenic fusion proteins. Cell. Mol. Life Sci. 61, 2897–2911 (2004).
- 14. Davies K. D., Doebele R. C., Molecular pathways: ROS1 fusion proteins in cancer. Clin. Cancer Res. 19, 4040–4045 (2013).
- 15. Tuna M., Amos C. I., Mills G. B., Molecular mechanisms and pathobiology of oncogenic fusion transcripts in epithelial tumors. Oncotarget 10, 2095–2111 (2019).
- 16. Björklund Å. K., Ekman D., Light S., Frey-Skött J., Elofsson A., Domain rearrangements in protein evolution. J. Mol. Biol. 353, 911–923 (2005).
- 17. Kummerfeld S. K., Teichmann S. A., Relative rates of gene fusion and fission in multi-domain proteins. Trends Genet. 21, 25–30 (2005).
- 18. Kern C. B., Lusty C. J., Davidson J. N., Evidence that mammalian glutamine-dependent carbamyl phosphate synthetase arose through gene fusion. J. Mol. Evol. 35, 217–222 (1992).
- 19. Saha A., Madhubala R., Dihydrofolate reductase-thymidylate synthase as a potential anti-parasitic drug target (a natural bifunctional fusion protein). FEBS Lett. 547, 109–114 (2003).
- 20. Ashby M. K., Houmard J., Cyanobacterial two-component proteins: Structure, diversity, distribution, and evolution. Microbiol. Mol. Biol. Rev. 70, 472–509 (2006).
- 21. Zhang W., Fisher J. F., Mobashery S., The bifunctional enzymes of antibiotic resistance. Curr. Opin. Microbiol. 12, 505–511 (2009).
- 22. Capra E. J., Laub M. T., Evolution of two-component signal transduction systems (modular fusions in regulatory proteins). Annu. Rev. Microbiol. 66, 325–347 (2012).
- 23. Barik S., On the role, ecology, phylogeny, and structure of dual-family immunophilins. Cell Stress Chap. 22, 833–845 (2017).
- 24. Vogel C., Bashton M., Kerrison N. D., Chothia C., Teichmann S. A., Structure, function and evolution of multidomain proteins. Curr. Opin. Struct. Biol. 14, 208–216 (2004).
- 25. Bashton M., Chothia C., The generation of new protein functions by the combination of domains. Structure 15, 85–99 (2007).
- 26. Pradet-Balade B., et al., An endogenous hybrid mRNA encodes TWE-PRIL, a functional cell surface TWEAK-APRIL fusion protein. EMBO J. 21, 5711–5720 (2002).
- 27. de la Fouchardière A., et al., Fusion partners of NTRK3 affect subcellular localization of the fusion kinase and cytomorphology of melanocytes. Mod. Pathol. 34, 735–747 (2021).
- 28. Bashton M., Chothia C., The geometry of domain combination in proteins. J. Mol. Biol. 315, 927–939 (2002).
- 29. Höcker B., Design of proteins from smaller fragments-Learning from evolution. Curr. Opin. Struct. Biol. 27, 56–62 (2014).
- 30. Yu K., Liu C., Kim B. G., Lee D. Y., Synthetic fusion protein design and applications. Biotechnol. Adv. 33, 155–164 (2015).
- 31. Yang H., Liu L., Xu F., The promises and challenges of fusion constructs in protein biochemistry and enzymology. Appl. Microbiol. Biotechnol. 100, 8273–8281 (2016).
- 32. Smith D. B., Johnson K. S., Single-step purification of polypeptides expressed in Escherichia coli as fusions with glutathione S-transferase. Gene 67, 31–40 (1988).
- 33. di Guan C., Li P., Riggs P. D., Inouye H., Vectors that facilitate the expression and purification of foreign peptides in Escherichia coli by fusion to maltose-binding protein. Gene 67, 21–30 (1988).
- 34. Chalfie M., Tu Y., Euskirchen G., Ward W. W., Prasher D. C., Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).
- 35. Niedenthal R. K., Riles L., Johnston M., Hegemann J. H., Green fluorescent protein as a marker for gene expression and subcellular localization in budding yeast. Yeast 12, 773–786 (1996).
- 36. Bülow L., Characterization of an artificial bifunctional enzyme, beta-galactosidase / galactokinase, prepared by gene fusion. Eur. J. Biochem. 163, 443–448 (1987).
- 37. Dueber J. E., et al., Synthetic protein scaffolds provide modular control over metabolic flux. Nat. Biotechnol. 27, 753–759 (2009).
- 38. Cheah L. C., Sainsbury F., Vickers C. E., Translational fusion of terpene synthases for metabolic engineering: Lessons learned and practical considerations. Methods Enzymol. 699, 121–161 (2024).
- 39. Capon D. J., et al., Recombinant soluble CD4 fused to the Fc region of immunoglobulin G is a potent inhibitor of HIV-1 infection. Nature 337, 525–531 (1989).
- 40. Marsh M. C., Owen S. C., Therapeutic fusion proteins. AAPS J. 26, 3 (2023).
- 41. Reisfeld R. A., Gillies S. D., Recombinant antibody fusion proteins for cancer immunotherapy. Curr. Top. Microbiol. Immunol. 213, 27–53 (1996).
- 42. Kontermann R. E., Antibody-cytokine fusion proteins. Arch. Biochem. Biophys. 526, 194–205 (2013).
- 43. Silver A. B., Leonard E. K., Gould J. R., Spangler J. B., Engineered antibody fusion proteins for targeted disease therapy. Trends Pharmacol. Sci. 42, 1064–1081 (2021).
- 44. Czajkowsky D. M., Hu J., Shao Z., Pleass R. J., Fc-fusion proteins: New developments and future perspectives. EMBO Mol. Med. 4, 1015–1028 (2012).
- 45. Rath T., et al., Fc-fusion proteins and FcRn: Structural insights for longer-lasting and more effective therapeutics. Crit. Rev. Biotechnol. 35, 235–254 (2015).
- 46. Gross G., Waks T., Eshhar Z., Expression of immunoglobulin-T-cell receptor chimeric molecules as functional receptors with antibody-type specificity. Proc. Natl. Acad. Sci. U.S.A. 86, 10024–10028 (1989).
- 47. June C. H., Sadelain M., Chimeric antigen receptor therapy. N. Engl. J. Med. 379, 64–73 (2018).
- 48. Adrio J. L., Demain A. L., Genetic improvement of processes yielding microbial products. FEMS Microbiol. Rev. 30, 187–214 (2006).
- 49. Steensels J., et al., Improving industrial yeast strains: Exploiting natural and artificial diversity. FEMS Microbiol. Rev. 38, 947–995 (2014).
- 50. Olsson L., Rugbjerg P., Pianale L. T., Trivellin C., Robustness: Linking strain design to viable bioprocesses. Trends Biotechnol. 40, 918–931 (2022).
- 51. Falconer D. S., Mackay T. F. C., “Selection: I. The response and its prediction” in Introduction to Quantitative Genetics, 4th Ed, Falconer D. S., Mackay T. F. C., Eds. (Longman Scientific & Technical / John Wiley & Sons, 1996), pp. 187–208.
- 52. Rutkoski J. E., A practical guide to genetic gain. Adv. Agron. 157, 217–249 (2019).
- 53. Bernardo R., Breeding for Quantitative Traits in Plants (Stemma Press, ed. 3, 2020).
- 54. Beavis W. K., Lamkey K., Mahama A. A., Suza W., “Selection response” in Quantitative Genetics in Plant Breeding, Suza W. P., Lamkey K. R., Eds. (Iowa State University Digital Press, 2023), pp. 123–135.
- 55. Zieler H., Compositions and methods for creating altered and improved cells and organisms. US patent 9,200,291 (2014).
- 56. Zieler H., Methods and compositions for creating altered and improved cells and organisms. US patent 10,077,441 (2016).
- 57. Zieler H., German S., Ray A., Patra B. N., Compositions for improving cells and organisms. US patent 11,198,708 (2016).
- 58. Tan T., Frenkel D., Gupta V., Deem M. W., Length, protein-protein interactions, and complexity. Phys. A Stat. Mech. Appl. 350, 52–62 (2005).
- 59. Yang X., et al., Widespread expansion of protein interaction capabilities by alternative splicing. Cell 164, 805–817 (2016).
- 60. Gibson B. R., Lawrence S. J., Leclaire J. P. R., Powell C. D., Smart K. A., Yeast responses to stresses associated with industrial brewery handling. FEMS Microbiol. Rev. 31, 535–569 (2007).
- 61. Dunlop M. J., Engineering microbes for tolerance to next-generation biofuels. Biotechnol. Biofuels 4, 32 (2011).
- 62. Sandberg T. E., Salazar M. J., Weng L. L., Palsson B. O., Feist A. M., The emergence of adaptive laboratory evolution as an efficient tool for biological discovery and industrial biotechnology. Metab. Eng. 56, 1–16 (2019).
- 63. Sun J., et al., Advance of tolerance engineering on microbes for industrial applications. Bioresour. Bioprocess. 7, 30 (2020).
- 64. Kuroda K., Ueda M., Bioadsorption of cadmium ion by cell surface-engineered yeasts displaying metallothionein and hexa-His. Appl. Microbiol. Biotechnol. 63, 182–186 (2003).
- 65. Wei Q., Zhang H., Guo D., Ma S., Cell surface display of four types of Solanum nigrum metallothionein on Saccharomyces cerevisiae for biosorption of cadmium. J. Microbiol. Biotechnol. 26, 846–853 (2016).
- 66. Bazzicalupo A. L., Kahn P. C., Ao E., Campbell J., Otto S. P., Evolution of cross-tolerance to metals in yeast. Proc. Natl. Acad. Sci. U.S.A. 122, e2505337122 (2025).
- 67. Arai R., Ueda H., Kitayama A., Kamiya N., Nagamune T., Design of the linkers which effectively separate domains of a bifunctional fusion protein. Protein Eng. 14, 529–532 (2001).
- 68. Eldridge B., et al., An in vitro selection strategy for conferring protease resistance to ligand binding peptides. Protein Eng. Des. Sel. 22, 691–698 (2009).
- 69. Wang R., Xue Y., Wu X., Song X., Peng J., Enhancement of engineered trifunctional enzyme by optimizing linker peptides for degradation of agricultural by-products. Enzyme Microb. Technol. 47, 194–199 (2010).
- 70. Chen X., Zaro J. L., Shen W. C., Fusion protein linkers: Property, design and functionality. Adv. Drug Deliv. Rev. 65, 1357–1369 (2013).
- 71. Funk M., et al., Vector systems for heterologous expression of proteins in Saccharomyces cerevisiae. Methods Enzymol. 350, 248–257 (2002).
- 72. Brachmann C. B., et al., Designer deletion strains derived from Saccharomyces cerevisiae S288C: A useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14, 115–132 (1998).
- 73. Cashikar A. G., Duennwald M. L., Lindquist S. L., A chaperone pathway in protein disaggregation: HSP26 alters the nature of protein aggregates to facilitate reactivation by Hsp104. J. Biol. Chem. 280, 23869–23875 (2005).
- 74. Muller H. J., The relation of recombination to mutational advance. Mutat. Res. 106, 2–9 (1964).
- 75. Felsenstein J., The evolutionary advantage of recombination. Genetics 78, 737–756 (1974).
- 76. Kondrashov A. S., Deleterious mutations and the evolution of sexual reproduction. Nature 336, 435–440 (1988).
- 77. Barton N. H., Charlesworth B., Why sex and recombination? Science 281, 1986–1990 (1998).
- 78. Barton N. H., Mutation and the evolution of recombination. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 1281–1294 (2010).
- 79. Charlesworth D., Barton N. H., Charlesworth B., The sources of adaptive variation. Proc. Biol. Sci. 284, 20162864 (2017).
- 80. Hurles M., How homologous recombination generates a mutable genome. Hum. Genomics 2, 179–186 (2005).
- 81. Lynch M., Bobay L. M., Catania F., Gout J. F., Rho M., The repatterning of eukaryotic genomes by random genetic drift. Annu. Rev. Genomics Hum. Genet. 12, 347–366 (2011).
- 82. Arenas M., et al., Mutation and recombination in pathogen evolution: Relevance, methods and controversies. Infect. Genet. Evol. 63, 295–306 (2018).
- 83. Jackson D. A., Symons R. H., Berg P., Biochemical method for inserting new genetic information into DNA of Simian Virus 40: Circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 69, 2904–2909 (1972).
- 84. Cohen S. N., Chang A. C., Boyer H. W., Helling R. B., Construction of biologically functional bacterial plasmids in vitro. Proc. Natl. Acad. Sci. U.S.A. 70, 3240–3244 (1973).
- 85. Lobban P. E., Kaiser A. D., Enzymatic end-to end joining of DNA molecules. J. Mol. Biol. 78, 453–471 (1973).
- 86. Kunkel T. A., Rapid and efficient site-specific mutagenesis without phenotypic selection. Proc. Natl. Acad. Sci. U.S.A. 82, 488–492 (1985).
- 87. Mullis K., et al., Specific enzymatic amplification of DNA in vitro: The polymerase chain reaction. Cold Spring Harb. Symp. Quant. Biol. 51 (Pt 1), 263–273 (1986).
- 88. Ma H., Kunes S., Schatz P. J., Botstein D., Plasmid construction by homologous recombination in yeast. Gene 58, 201–216 (1987).
- 89. Saiki R. K., et al., Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487–491 (1988).
- 90. Leung D. W., Chen E., Goeddel D. V., A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. Technique 1, 11–15 (1989).
- 91. Cadwell R. C., Joyce G. F., Randomization of genes by PCR mutagenesis. PCR Methods Appl. 2, 28–33 (1992).
- 92. Stemmer W. P., Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391 (1994).
- 93. Stemmer W. P., DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. Proc. Natl. Acad. Sci. U.S.A. 91, 10747–10751 (1994).
- 94. Arnold F. H., Moore J. C., Optimizing industrial enzymes by directed evolution. Adv. Biochem. Eng. Biotechnol. 58, 1–14 (1997).
- 95. Glasner M. E., Gerlt J. A., Babbitt P. C., Mechanisms of protein evolution and their application to protein engineering. Adv. Enzymol. Relat. Areas Mol. Biol. 75, 193–239 (2007).
- 96. Arnold F. H., How proteins adapt: Lessons from directed evolution. Cold Spring Harb. Symp. Quant. Biol. 74, 41–46 (2009).
- 97. Gibson D. G., et al., Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
- 98. Labrou N. E., Random mutagenesis methods for in vitro directed enzyme evolution. Curr. Protein Pept. Sci. 11, 91–100 (2010).
- 99. Trudeau D. L., Smith M. A., Arnold F. H., Innovation by homologous recombination. Curr. Opin. Chem. Biol. 17, 902–909 (2013).
- 100. Nordwald E. M., Garst A., Gill R. T., Kaar J. L., Accelerated protein engineering for chemical biotechnology via homologous recombination. Curr. Opin. Biotechnol. 24, 1017–1022 (2013).
- 101. Bomfiglio I. F., Mendes I. S. M., Bonatto D., A review of DNA restriction-free overlapping sequence cloning techniques for synthetic biology. Biotechnol. J. 20, e70084 (2025).
- 102. Sambrook J., Fritsch E. F., Maniatis T., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Plainview, New York, ed. 2, 1989).
- 103. Berg P., Mertz J. E., Personal reflections on the origins and emergence of recombinant DNA technology. Genetics 184, 9–17 (2010).
- 104. Cohen S. N., DNA cloning: A personal view after 40 years. Proc. Natl. Acad. Sci. U.S.A. 110, 15521–15529 (2013).
- 105. Simon A. J., d’Oelsnitz S., Ellington A. D., Synthetic evolution. Nat. Biotechnol. 37, 730–743 (2019).
- 106. Gali V. K., Tee K. L., Wong T. S., Crafting genetic diversity: Unlocking the potential of protein evolution. SynBio 2, 142–173 (2024).
- 107. Wu B., Cox M. P., Characterization of bicistronic transcription in budding yeast. mSystems 6, e01002-20 (2021).
- 108. Sun Y., Bock R., Li Z., A hidden intrinsic ability of bicistronic expression based on a novel translation reinitiation mechanism in yeast. Nucleic Acids Res. 53, gkaf220 (2025).
- 109. Stock M., Gorochowski T. E., Open-endedness in synthetic biology: A route to continual innovation for biological design. Sci. Adv. 10, eadi3621 (2024).
- 110. Tennyson A., “Ulysses” in Poems (Edward Moxon, London, 1842), vol. 2, pp. 88–94.
- 111. Levitt M., Nature of the protein universe. Proc. Natl. Acad. Sci. U.S.A. 106, 11079–11084 (2009).
- 112. Kolodny R., Pereyaslavets L., Samson A. O., Levitt M., On the universe of protein folds. Annu. Rev. Biophys. 42, 559–582 (2013).
- 113. Nepomnyachiy S., Ben-Tal N., Kolodny R., Global view of the protein universe. Proc. Natl. Acad. Sci. U.S.A. 111, 11691–11696 (2014).
- 114. Gibson D. G., Smith H. O., Hutchison C. A. 3rd, Venter J. C., Merryman C., Chemical synthesis of the mouse mitochondrial genome. Nat. Methods 7, 901–903 (2010).
- 115. Li M. Z., Elledge S. J., Harnessing homologous recombination in vitro to generate recombinant DNA via SLIC. Nat. Methods 4, 251–256 (2007).
- 116. Li C., et al., Fastcloning: A highly simplified, purification-free, sequence- and ligation-independent PCR cloning method. BMC Biotechnol. 11, 92 (2011).
- 117. Li M. Z., Elledge S. J., SLIC: A method for sequence- and ligation-independent cloning. Methods Mol. Biol. 852, 51–59 (2012).
- 118. Gietz R. D., Woods R. A., Yeast transformation by the LiAc/SS carrier DNA/PEG method. Methods Mol. Biol. 313, 107–120 (2006).
- 119. Gietz R. D., Schiestl R. H., High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 31–34 (2007).
- 120. Ward A. C., Single-step purification of shuttle vectors from yeast for high frequency back-transformation into E. coli. Nucleic Acids Res. 18, 5319 (1990).
- 121. Chen S., Zhou Y., Chen Y., Gu J., Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
- 122. Dobin A., et al., STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
- 123. Liao Y., Smyth G. K., Shi W., Featurecounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
- 124. Love M. I., Huber W., Anders S., Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
- 125. Waskom M. L., Seaborn: Statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
- 126. Virtanen P., et al., SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
- 127. Schvarcz C. R., McNay M. M., Baffert S., Zieler H., Data from “Whole-genome combinatorial gene fusions generate novel genes for advanced microbial trait development.” National Center for Biotechnology Information Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE325936. Deposited 25 March 2026.
Associated Data
Supplementary Materials
Appendix 01 (PDF)
Dataset S01 (XLSX)
Dataset S02 (XLSX)
Data Availability Statement
All study data are included in the article and/or supporting information, with the exception of the complete transcriptomics dataset, which was deposited in the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (127).





