Engineering Biology. 2022 Sep 16;6(4):82–90. doi: 10.1049/enb2.12025

Prediction of strain engineerings that amplify recombinant protein secretion through the machine learning approach MaLPHAS

Evgenia A Markova 1, Rachel E Shaw 1, Christopher R Reynolds 1
PMCID: PMC9995161  PMID: 36968340

Abstract

This article presents a discussion of the process of precision fermentation (PF), describing the history of the space, the expected 70% growth over the next 5 years, various applications of precision fermented products, and the markets available to be disrupted by the technology. A range of prokaryotic and eukaryotic host organisms used for PF are described, with the advantages, disadvantages and applications of each. The process of setting up PF and strain engineering is described, as well as various ways that computational analysis and design techniques can be employed to assist PF engineering. The article then describes the design and implementation of a machine learning method, machine learning predictions having amplified secretion (MaLPHAS), to predict strain engineerings that optimise the secretion of a recombinant protein. This approach showed an in silico cross‐validated R² accuracy on the training data of up to 46.6% and, in an in vitro test on a Komagataella phaffii strain, identified one gene engineering out of five predicted, which was shown to double the secretion of a heterologous protein and outperform three of the best‐known edits from the literature for improving secretion in K. phaffii.

1. PART I. PRECISION FERMENTATION FOR THE PRODUCTION OF RECOMBINANT PROTEINS IN INDUSTRY—A REVIEW

1.1. What is precision fermentation?

Precision fermentation (PF) is a process of using ‘cellular factories’ to produce organic molecules. A cell host of choice—the ‘factory’—is instructed through genetic engineering to manufacture a molecule of choice. This allows the production of large quantities of molecules that are found at low concentration in their native organisms. In this way, PF can be used to lower the economic, labour, and environmental costs of biomolecule production. Additionally, PF can be applied to the production of molecules not found in nature.

The advent of PF was enabled by the advances in gene sequencing and chemical synthesis of the 1960s and 1970s. PF was first used 45 years ago to produce synthetic human somatostatin in genetically modified Escherichia coli [1]. Shortly afterwards, E. coli was engineered to synthesise the first synthetic human insulin [2]. Since then, PF has been applied to the production of a wide range of molecules, including food products and supplements, materials, industrial enzymes, and therapeutics. Traditionally, the use of PF has been limited to high‐value products due to the high cost of producing biomolecules. Now, costs are decreasing at a rapid pace: a 2019 report generated by the RethinkX think tank suggests that by 2030, the cost of proteins produced by fermentation will be a fifth of that of animal‐derived proteins [3]. As PF products emerge as economically feasible alternatives, they drive innovation in the food, materials, and cosmetics industries, among others. The PF market is projected to increase at a Compound Annual Growth Rate of 11.5% over the next 5 years, growing by 70% [4].
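
As a quick check on the market projection, a compound annual growth rate can be converted to cumulative growth by compounding it over the forecast period. The sketch below is illustrative only; the function name is an assumption and does not come from the cited report. Compounding 11.5% over 5 years gives roughly 72%, in line with the approximately 70% growth quoted.

```python
def cumulative_growth(cagr: float, years: int) -> float:
    """Return total fractional growth after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years - 1.0


# 11.5% per year over 5 years, as quoted for the PF market.
growth = cumulative_growth(0.115, 5)
print(f"Total growth over 5 years: {growth:.1%}")
```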

1.2. Applications of precision fermentation

In the food industry, synthetic proteins and fats are produced by PF as sustainable alternatives to animal products. The first product replaced by PF was chymosin, a key component of rennet, which is used in cheese production. Chymosin was historically isolated from calf stomachs, but in 1990, synthetic chymosin became the first FDA‐approved bioengineered food product [5]. By 2006, 80% of the chymosin used in US cheese production was synthetic [6, 7]. Other dairy components, such as the milk protein casein, are now being produced through PF by companies such as Perfect Day, Motif FoodWorks, New Culture, and Better Dairy. The global dairy alternatives market is growing rapidly. It is currently estimated at $11.9 bn and is expected to reach $41 bn by 2026 [8]. Other new product types are now entering the market. In 2021, the first animal‐free egg equivalent was launched by The Every Company, previously Clara Foods. PF also finds applications in the animal‐free meat industry. Impossible Foods use yeast to produce heme, which is added to their plant‐based burgers to help replicate the taste of meat. Applications of PF in the food industry are not limited to proteins and also include the bioengineering of lipid production pathways for the production of vegan animal fats. These aim to offer the taste of animal fat without its ethical and environmental drawbacks. A new wave of companies, including Melt&Marble, Nourish Ingredients, and Cubiq Foods, are bringing these fats to market. Beyond proteins and fats, PF is also applied to the production of flavouring agents (e.g. orange flavour, raspberry aromas, and thaumatin) and pigments (e.g. canthaxanthin, astaxanthin, and beta‐carotene) [9, 10, 11, 12].

In the materials industry, PF is used to produce spider silk‐like biopolymers. Spider silk is mechanically superior to silkworm silk, and it is characterised by high tensile strength, elasticity, and durability [13]. However, its production in large amounts is obstructed by the challenges of farming spiders, which are cannibalistic. This challenge has been addressed by companies such as Bolt Threads, AMSilk, and Spiber, which use a fermentation process to create synthetic fibres similar to spider silk. Synthetic spider silk is anticipated to play a growing role in the $21 bn silk market, which is estimated to rise to $28.7 bn by 2026 [14].

In cosmetics and personal care, Geltor is producing vegan human collagen with reportedly superior qualities to marine collagen. In the production of vitamins such as B2 and B12, PF has replaced chemical synthesis as a more economically feasible process [15, 16].

While PF has been used for the production of therapeutics for a long time, this sector is currently witnessing significant growth as protein therapeutics have become the best‐selling drugs of the past 5 years [17]. More than 170 are produced worldwide, including monoclonal antibodies, hormones (such as insulin) and growth factors (such as erythropoietin) [18]. This rapidly expanding market is currently at $170 bn and projected to reach $233 bn by 2026 [19].

1.3. Expression systems for protein production

A range of prokaryotic and eukaryotic organisms can be chosen for protein production by PF. These include bacterial, fungal, insect, and mammalian cells. Different expression systems produce different protein yields (amount of product per substance consumed) and productivities (amount of product per unit of time). Furthermore, they have different capacities for performing post‐translational modification, which can be essential for correct protein folding and protein function. Therefore, although processes are commonly optimised for product amount, it is essential that the relevant features of the produced protein are also retained, especially for high‐risk applications such as therapeutics. Production using more complex expression hosts, such as insect and mammalian cells, is more costly, requires more time and effort, and necessitates more sophisticated production set‐ups. Therefore, prokaryotes and simple eukaryotes such as yeasts are often the organisms of choice for industrial fermentation of non‐therapeutic products. The extensive prior study of these organisms has also generated a greater understanding of the metabolic processes associated with biomolecule production.

Among prokaryotes, E. coli is the most popular protein expression platform. It is cheap to maintain and can produce high amounts of protein, doubling once every 20 min [20, 21]. However, E. coli is not the ideal host for the expression of proteins containing post‐translational modifications (PTMs) [22]. Native E. coli does not accommodate disulphide bonding in its cytoplasm, only in its periplasm, which is a problematic expression locus [23]. This poses challenges for the expression of eukaryotic proteins as nearly a third have disulphide bonds, which are essential to their structure [24]. Native E. coli also rarely performs glycosylation, which can be indispensable for proper protein structure and function [25, 26]. Even in strains modified to allow for this, glycosylation patterns differ from those in mammals [27]. In the absence of appropriate folding and modification machinery, overexpressed proteins can aggregate in intracellular inclusion bodies [28, 29]. Hence, prokaryotes are the organisms of choice for proteins that lack PTMs or whose PTMs are not needed for the desired protein function [30].

Yeasts have been used in fermentation for millennia. Currently, yeast is widely used in the production of eukaryotic proteins as a single‐celled eukaryote with low maintenance costs and high productivity [31]. Furthermore, yeasts have been intensively characterised. Saccharomyces cerevisiae is the best studied eukaryote and the first eukaryote to have its genome fully sequenced [32]. This in‐depth understanding allows for a protein production process with high potential for precision engineering. One of the main benefits of yeast expression systems is that they support a wide range of PTMs, including disulphide bonding, phosphorylation, N‐acetylation, and glycosylation [33, 34, 35, 36]. While yeast generally performs glycosylation to a similar extent as mammalian cells, it can also hypermannosylate proteins [37]. This can alter the half‐life of the protein and, in the case of therapeutics, cause an immune response in patients [38]. Nonetheless, yeast has successfully been applied to the production of a range of therapeutics, such as the Hepatitis B and human papilloma virus (HPV) vaccines, insulin, and interferon‐alpha‐2a. Komagataella phaffii, formerly known as Pichia pastoris, is widely used for the production of industrial enzymes, such as glucose oxidase, amylase, and cellulase [39, 40, 41]. While S. cerevisiae is the conventional yeast expression platform, K. phaffii has emerged as a useful alternative as it can grow to higher densities, resulting in increased protein yield [42]. Furthermore, it has a low endogenous secretory output, which can reduce the need for purification of secreted recombinant proteins [43]. Other fungi used in protein production include the filamentous Aspergillus niger, Trichoderma reesei and Neurospora crassa, which are naturally able to secrete large amounts of protein [44].

Insect cells, such as Spodoptera frugiperda Sf9 cells and Drosophila melanogaster S2 cells, can be used to produce protein therapeutics. The transient baculovirus expression system has been instrumental in the uptake of insect cell expression systems as it has enabled high recombinant protein expression [45, 46, 47]. A major challenge in the production of mammalian proteins using insect cells is that they can produce different glycosylation patterns to mammalian cells [48, 49]. This has largely limited their therapeutic application [50, 51]. Currently, only around 1% of therapeutics are produced in insect cells, including Cervarix (a vaccine against HPV) and Provenge (a therapeutic against prostate cancer) [51].

The industrial use of mammalian cells, such as Chinese hamster ovary and human embryonic kidney cells, is limited to the production of therapeutic molecules. The majority of protein therapeutics (84%) are expressed in mammalian cell lines [52]. Mammalian cells are ideal therapeutic factories for two reasons: their capacity to perform human or near‐human PTMs (including complex glycosylation patterns) and their track record of regulatory approval [53, 54]. The non‐therapeutic use of mammalian cells, however, is limited by their high cost and low productivity when compared to other expression systems.

1.4. Expression set‐up

The first step of PF is inserting the gene encoding for the protein of interest into an appropriate strain of the host organism. The optimisation of the expression set‐up consists of the following stages:

  1. Choosing stable or transient expression. Stable expression methods involve the incorporation of the gene of interest into the genome of the expression host. This enables a consistent, indefinite, and reproducible biomolecule production process. Inserting multiple copies of the gene of interest has the potential to increase protein yield [55]. While the evolution of genome editing technology has made stable cell line generation economically feasible, this process still requires up to a month. Alternatively, transient expression can be used, in which DNA plasmids encoding the sequence of interest are introduced into the host. Transient expression is faster and cheaper to set up than stable cell‐line generation. However, the transformation efficiency can vary between batches, resulting in process variability and posing problems for optimisation.

  2. Codon optimisation. In the host organism, some codons are more commonly used and more efficiently translated than others [56]. To increase the translation efficiency of the recombinant gene, the recombinant gene sequence can be altered according to the organism's codon bias without changing the amino acid sequence of the encoded protein.

  3. Designing an expression cassette with a suitable promoter and terminator.

  4. Deciding whether to secrete the protein. Fusing a secretory signal peptide to the gene of interest can reduce the need for downstream purification, lowering cost and improving recovery. Furthermore, intracellular product accumulation, especially in the case where misfolded proteins form aggresomes, can be toxic to the cell [57]. However, whether the protein can be efficiently secreted depends on its size and the capabilities of the host organism.
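
The codon optimisation stage in the list above can be illustrated with a minimal sketch: each codon is replaced by a preferred synonymous codon, so the encoded amino acid sequence is unchanged. The preferred-codon table and the input sequence below are hypothetical placeholders, not measured codon usage for K. phaffii or any other host. Practical tools also weigh factors that this sketch ignores, such as mRNA structure.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def optimise_codons(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon, if one is given."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        # Codons without a listed preference are left unchanged.
        out.append(preferred.get(amino_acid, codon))
    return "".join(out)


# Hypothetical preferences for an imaginary host (illustration only).
PREFERRED = {"L": "TTG", "R": "AGA", "G": "GGT"}

original = "ATGCTAAGGGGCTAA"
optimised = optimise_codons(original, PREFERRED)
print(optimised)
# The protein sequence must be identical before and after optimisation.
print(translate(original) == translate(optimised))
```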

1.5. Strain engineering

PF aims to increase protein production using a combination of synthetic biology and genetic engineering. This is based around the Design‐Build‐Test‐Learn (DBTL) paradigm [58, 59], which involves the following stages (Figure 1):

  1. Design and build. This stage can consist of the design and generation of large libraries where a range of diverse genetic modifications are introduced into the selected strain. Libraries can include expression strains generated from cassette modification, protease knock‐out and genome‐wide coding region knockout/knockdown libraries, among others. Library design can incorporate prior knowledge about relevant cellular pathways.

  2. Test (production screening). The protein production ability of the generated strains is screened using high‐throughput methods, and mutations that improve production are identified.

  3. Learn (iterative adaptation). Combining identified mutations in iterative engineering cycles can help generate an optimal strain.

FIGURE 1.

Precision fermentation for recombinant protein production using the Design‐Build‐Test‐Learn (DBTL) cycle

1.6. Computational optimisation approaches

Computational approaches have long been used to assist genetic engineering and strain design, but in the synthetic biology paradigm, they have become a vital part of the design and learn stages of the DBTL cycle [60].

Computational approaches can be applied to the following stages of production:

  1. Deep learning can be used to predict gene expression from the nucleotide sequences of the gene itself and adjacent regions [61, 62, 63]. Deep learning is a family of machine learning methods based on artificial neural networks that use multiple network layers (hence the adjective ‘deep’) to compute different representations of data from each previous layer [64]. Deep learning tends to outperform traditional machine learning methods but takes longer and requires more data points to build a comparable model [65, 66, 67].

  2. For well‐studied organisms such as E. coli or S. cerevisiae, genome‐scale metabolic simulation can predict production performance [68, 69]. For less established organisms, the metabolic pathways need to be elucidated de novo to understand the metabolic flux involved in protein production [70].

  3. The metabolic flux can be optimised for the desired output through the alteration of genes involved in the metabolic network [71]. This is one of the most complex stages in strain design, involving the use of databases, modelling and genetic engineering.

  4. Machine learning can be used to recommend strain engineering strategies with the potential to increase biomolecule production [72].

Importantly, strain optimisation is specific to the protein of interest. For example, when the secretion of three different proteins—horseradish peroxidase, Candida antarctica lipase B and secretory leucocyte peptidase inhibitor—was tested in mutant strains of K. phaffii, it was shown to be differently affected by the bgs7 and bgs13 supersecretor mutations [73]. This illustrates the importance of de novo strain design for the efficient production of novel proteins.

Supervised machine learning methods for protein‐specific product maximisation have fallen into two groups:

  1. Building predictive models from large omics datasets [74], such as identifying bottleneck proteins from mass spectrometry proteomics [75] or metabolomics [76]. These methods require large amounts of omics data generated from a strain expressing the recombinant protein of choice.

  2. Supervised machine learning methods to predict optimal strain construction in a combinatorial design and prediction strategy, such as Principal Component Analysis to identify optimal expression levels from 27 strain constructions [77, 78] or training on over 500 strains with different promoter combinations to predict the optimal combination [79]. These methods require multiple constructed strain designs.

1.7. Downstream processes

After a strain has been optimised to produce a desired level of recombinant product, the DBTL cycle is exited. Final optimisation stages include:

  1. Recovery and purification. The recombinant product is separated from unwanted by‐products. For the recovery of non‐secreted proteins, this step includes cell lysis.

  2. Production scale‐up and optimisation. The PF process is translated from bench‐scale to full manufacturing scale. This includes the optimisation of the feedstock and fermentation conditions [80].

  3. Manufacturing the fermentation product into an end product. The PF product is incorporated into a final product with the desired specifications, and any necessary additives are added to the formulation.

2. PART II. AN APPLICATION OF MACHINE LEARNING METHODS TO THE OPTIMISATION OF RECOMBINANT PROTEIN PRODUCTION

2.1. Aims and findings

Eden Bio Ltd aimed to design a machine learning method to guide strain engineering of recombinant‐protein‐producing strains by suggesting gene edits to amplify secretion. The method was intended to make suggestions without requiring a priori data from strain engineering constructions, to suggest combinatorial edits specific to the recombinant protein being expressed, and to be strengthened by each round of results from a DBTL cycle.

Eden Bio Ltd designed and built the proprietary machine learning approach MaLPHAS (machine learning predictions having amplified secretion) to predict strain engineerings that optimise the secretion of a recombinant protein. This approach showed an in silico cross‐validated R² accuracy on the training data of up to 46.6% and, in an in vitro test on a K. phaffii strain, identified one gene engineering out of five predicted, which was shown to double the secretion of a heterologous protein and outperform three of the best‐known edits from the literature for improving secretion in K. phaffii.

2.2. Methods

The value of MaLPHAS in suggesting strain engineerings was assessed through two methods: a cross‐validated in silico test on the training data and a test case where the secretion of a recombinant protein was increased using an ML‐suggested engineering.

  1. A training dataset of 751 engineering data points was extracted from 85 separate literature references, each reporting information on gene engineerings applied to alter the secretion of a range of recombinant proteins. The frequency with which different organism species appeared in the dataset is reported in Table 1.

  2. Each data point contained information on the gene engineerings, the organism used and the protein whose secretion was to be optimised.

  3. Feature vectors were generated using Eden Bio's proprietary encoding methods, which encoded the organism, the protein itself, the type of engineering and the gene(s) engineered.

  4. MaLPHAS was designed to employ an ensemble of machine learning methods that can be used as either a stacked ensemble or majority voting ensemble to generate a predictive model. Ensemble methods apply multiple machine learning algorithms to the same training data and produce a model that blends the algorithms together for improved levels of prediction [81, 82].

  5. Ten‐fold cross‐validation was used to estimate the in silico performance of the method.

  6. A recombinant strain of K. phaffii CBS7435 was purchased from the American Type Culture Collection and engineered to express a 26 kDa recombinant protein fused at the N‐terminus to an α‐mating factor secretion signal [83] and at the C‐terminus to a HiBiT tag [84]. The recombinant protein is an amphiphilic protein present in mammals, whose identity is proprietary, and the HiBiT tag is an 11 amino acid tag developed by Promega that luminesces in proportion to protein concentration in the presence of the Promega NanoBiT® enzyme [84]. Data about the fused recombinant protein are reported in Table 2 and its predicted structure is shown in Figure 2. In the control strain, this protein was secreted into the supernatant at 206 mg/L.

  7. Three gene knockouts from the literature that are well known to increase secretion in K. phaffii were generated, as well as five gene knockouts that were predicted by MaLPHAS to lead to increased secretion. The gene knockouts from the literature were pep4 and prb1 (both highly effective at preventing recombinant protein degradation [87]) and the supersecretor bgs7 [73].

  8. Twelve colonies of each strain were picked and grown in Buffered Methanol‐Complex medium according to the expression protocol described by Weidner et al. [88].

  9. After 48 h, the medium was separated, and the concentration of recombinant protein in the supernatant was measured using the Nano‐Glo HiBiT Lytic Detection Reagent [84].
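
The encoding and ensemble steps in the list above can be sketched in a few lines. MaLPHAS's encodings and base learners are proprietary and are not described in this article, so everything below is a hypothetical stand-in: the record fields, the toy values, the one‐hot encoding and the two simple base learners are assumptions chosen only to show how a voting‐style ensemble averages the predictions of several models trained on the same data.

```python
from statistics import mean

# Hypothetical toy records: (organism, edit type, gene, fold change in secretion).
# All values are invented for illustration; they are not from the training dataset.
RECORDS = [
    ("yeastA", "knockout", "geneA", 1.8),
    ("yeastA", "knockout", "geneB", 0.9),
    ("yeastA", "overexpression", "geneC", 1.4),
    ("yeastB", "knockout", "geneA", 1.6),
    ("yeastB", "overexpression", "geneC", 1.2),
    ("yeastB", "knockout", "geneB", 1.0),
]


def one_hot(records):
    """Build an encoder mapping the three categorical fields to a 0/1 vector."""
    vocab = sorted({(i, r[i]) for r in records for i in range(3)})
    index = {key: pos for pos, key in enumerate(vocab)}

    def encode(record):
        vec = [0] * len(index)
        for i in range(3):
            if (i, record[i]) in index:  # unseen categories stay all-zero
                vec[index[(i, record[i])]] = 1
        return vec

    return encode


class MeanByGene:
    """Base learner 1: mean fold change seen in training for the same gene."""

    def fit(self, records):
        self.default = mean(r[3] for r in records)
        by_gene = {}
        for r in records:
            by_gene.setdefault(r[2], []).append(r[3])
        self.means = {gene: mean(vals) for gene, vals in by_gene.items()}
        return self

    def predict(self, record):
        return self.means.get(record[2], self.default)


class NearestNeighbour:
    """Base learner 2: target of the closest training record (Hamming distance)."""

    def fit(self, records):
        self.encode = one_hot(records)
        self.train = [(self.encode(r), r[3]) for r in records]
        return self

    def predict(self, record):
        x = self.encode(record)
        return min(self.train, key=lambda t: sum(a != b for a, b in zip(t[0], x)))[1]


class AveragingEnsemble:
    """Voting-style ensemble for regression: average the base predictions."""

    def __init__(self, learners):
        self.learners = learners

    def fit(self, records):
        for learner in self.learners:
            learner.fit(records)
        return self

    def predict(self, record):
        return mean(learner.predict(record) for learner in self.learners)


model = AveragingEnsemble([MeanByGene(), NearestNeighbour()]).fit(RECORDS)
candidate = ("yeastB", "knockout", "geneA", None)  # fold change unknown
prediction = model.predict(candidate)
print(round(prediction, 2))
```

A stacked ensemble differs in that a further model is trained on the base learners' predictions instead of averaging them directly.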

TABLE 1.

Composition of the machine learning training dataset by organism species

Organism species          Frequency
Saccharomyces cerevisiae  443
Komagataella phaffii      277
Kluyveromyces lactis      13
Other                     18

TABLE 2.

Physicochemical parameters of the fused recombinant protein used in this experiment (including secretion signal and HiBiT tag), calculated with the ProtParam Expasy tool [85]

Physicochemical parameter                     Value
Exact molecular weight (Da)                   27,817.74
Theoretical pI                                9.06
Atomic composition                            C₁₂₄₉H₁₉₆₆N₃₂₈O₃₇₆S₇
Extinction coefficient (at 280 nm, in water)  34,505 M⁻¹ cm⁻¹
Estimated half‐life                           30 h (mammalian reticulocytes, in vitro); >20 h (yeast, in vivo); >10 h (Escherichia coli, in vivo)
Instability index                             43.05
Aliphatic index                               74.23
Grand average of hydropathicity               −0.661
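
Two of the parameters in Table 2 have simple closed forms, sketched below for orientation. The grand average of hydropathicity (GRAVY) is the mean Kyte‐Doolittle hydropathy value over all residues, and the aliphatic index is a weighted sum of the mole percentages of alanine, valine, isoleucine and leucine. Because the sequence of the protein used here is proprietary, the input sequence is a hypothetical placeholder and the printed values bear no relation to those in Table 2.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole %."""
    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Hypothetical short peptide, used only to exercise the two functions.
toy_sequence = "MKTAVILGDE"
print(round(gravy(toy_sequence), 3), round(aliphatic_index(toy_sequence), 2))
```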

FIGURE 2.

An image of the predicted 3D ribbon structure of the fused recombinant protein, generated with the SWISS‐MODEL protein structure modelling server [86]

2.3. Results

The results of the in silico cross‐validation of MaLPHAS on the training dataset are shown in Table 3. This performance is comparable to that of other state‐of‐the‐art machine learning methods in synthetic biology [72].

TABLE 3.

The R² results of in silico ten‐fold cross‐validation on our training dataset when predicting n‐fold secretion increase from organism type, protein being optimised and the list of gene engineerings applied to the organism

Model     Mean   Median
Stacking  0.345  0.428
Voting    0.461  0.466
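
The mean and median in Table 3 summarise one R² score per held‐out fold. The sketch below shows how such a summary can be computed with ten‐fold cross‐validation. It is not the MaLPHAS pipeline: a one‐variable least‐squares line stands in for the ensemble model, the folds are interleaved instead of shuffled, and the data are synthetic values generated only for illustration.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for any regressor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=10):
    """Hold out each of k interleaved folds once; return the per-fold R^2."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return scores


# Synthetic data: a linear trend plus a deterministic wobble (illustrative only).
xs = list(range(30))
ys = [2.0 * x + ((7 * x) % 11 - 5) for x in xs]

scores = k_fold_r2(xs, ys, k=10)
print(f"mean R^2 = {mean(scores):.3f}, median R^2 = {median(scores):.3f}")
```

Reporting both statistics is useful because a single poorly predicted fold can pull the mean well below the median.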

The protein production results of the engineered strains are plotted in Figure 3. The MaLPHAS‐predicted Component Of Oligomeric Golgi Complex 6 (cog6) knockout strain showed a significant increase in secretion compared to the control (Figure 3, p = 0.0013, two‐sample t‐test). This knockout nearly doubled the secreted yield of recombinant protein and outperformed the tested literature‐reported engineerings, including the bgs7 supersecretor strain [73]. The cog6 knockout strain also exhibited lower variance than any other engineering tested, including the control.
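
For readers unfamiliar with the comparison used here, the sketch below computes a two‐sample t statistic for two groups of 12 colonies. The article does not state whether variances were pooled, so the pooled (Student) form is an assumption, and the signal values are invented for illustration and are not the measured data. Converting the statistic to a p‐value requires the t distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical normalised signals for 12 colonies per strain (invented values).
control = [0.85, 1.10, 0.95, 1.20, 1.00, 0.90, 1.05, 1.15, 0.80, 1.00, 1.10, 0.90]
knockout = [1.85, 1.95, 1.90, 2.00, 1.80, 1.90, 1.95, 1.85, 1.90, 2.00, 1.80, 1.90]

t_stat, dof = pooled_t_statistic(knockout, control)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```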

FIGURE 3.

Box and whisker plots showing HiBiT signals of the recombinant protein secreted into the supernatant by engineered strains, normalised to the mean of the parental strain. A signal of 1 corresponds to a protein concentration of 206 mg/L. Measurements were made on 12 colonies of each strain, from left to right: control (the parental strain), three knockouts from the literature known to increase secretion (Δpep4, Δprb1, and Δbgs7) and five knockouts suggested by MaLPHAS (Δcog6, Δypt7, Δgas1, Δglr1, and Δvam6)
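
The normalisation described in the caption can be written out explicitly. In this sketch, raw luminescence readings are divided by the mean reading of the parental strain, and a normalised signal is converted to an approximate concentration using the 206 mg/L reference. The raw readings are hypothetical, and the conversion assumes that the signal is linear in concentration over the measured range.

```python
from statistics import mean

CONTROL_MG_PER_L = 206.0  # concentration corresponding to a normalised signal of 1


def normalise(raw_signals, control_signals):
    """Divide raw readings by the mean of the parental-strain readings."""
    reference = mean(control_signals)
    return [signal / reference for signal in raw_signals]


def to_mg_per_l(normalised_signal):
    """Convert a normalised HiBiT signal to an approximate concentration."""
    return normalised_signal * CONTROL_MG_PER_L


# Hypothetical raw luminescence readings in arbitrary units (invented values).
control_raw = [980.0, 1020.0, 1000.0]
edited_raw = [1900.0, 2100.0]

edited_norm = normalise(edited_raw, control_raw)
print([round(to_mg_per_l(signal), 1) for signal in edited_norm])
```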

3. CONCLUSION AND OUTSTANDING CHALLENGES

While PF has advanced at a rapid pace, some open challenges remain. First, PF products are not yet an economically viable alternative for low‐value applications, such as packaging. Increasing PF yields and globally shifting attitudes towards plastic can increase the role of PF in this sector. Second, optimised fermentation processes have not yet been fully automated. A complete fermentation system requiring little user input can facilitate the establishment of new production facilities. Third, PF production has not yet been decentralised. The local production of vegan animal proteins and sustainable materials can decrease the associated transport emissions and have a large impact on sustainability.

An obstacle to the wide‐scale uptake of Machine Learning (ML) in synthetic biology is the training and understanding needed to use the software. Although automated ML pipelines are available as part of packages, these are limited in scope and do not confer knowledge of the underlying model, which is necessary to choose the correct parameters [89]. Using the Python programming language [90] is currently the most widely available method of integrating ML with bioinformatics, as it has become the preferred language for both scientific computing and machine learning [91]. Python has well‐developed libraries for both biology (BioPython [92]) and machine learning (Scikit‐learn [93], TensorFlow [94], PyTorch [95], and Keras [96]). More data and increasing numbers of packages and libraries tailored to specific biological [problems] will open up further opportunities [89]. Future developments in this field that would help biologists with no coding knowledge to use ML tools include additional training courses on the use of software, the development of simplified libraries and code that automatically configure themselves [97], and the deployment of biology‐specific resources that allow the construction of ML pipelines through a no‐code graphical interface, such as Microsoft's Azure platform [98].

In this article we demonstrated the value of machine learning techniques for strain engineering. As the PF field matures, we can expect the increasing use of machine learning and other computational techniques for the systematic and rational design of strains. We expect the machine learning methods to improve as techniques are refined and more data becomes available to strengthen the training models.

AUTHOR CONTRIBUTION

All authors contributed equally to this paper.

CONFLICT OF INTEREST

Christopher Reynolds is CEO, founder and majority shareholder of Eden Bio Ltd, as well as a shareholder in Better Dairy. Rachel Shaw and Evgenia Markova are employees of Eden Bio Ltd.

ACKNOWLEDGEMENT

Work on this paper was funded by Eden Bio Ltd.

Markova, E.A., Shaw, R.E., Reynolds, C.R.: Prediction of strain engineerings that amplify recombinant protein secretion through the machine learning approach machine learning predictions having amplified secretion. Eng. Biol. 6(4), 82–90 (2022). 10.1049/enb2.12025

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from Eden Bio Ltd. Restrictions apply to the availability of these data, which were used under licence for this study. Data are available from the authors with the permission of Eden Bio Ltd.

REFERENCES

  1. Itakura, K., et al.: Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science 198(4321), 1056–1063 (1977). 10.1126/science.412251
  2. Gunby, P.: Bacteria directed to produce insulin in test application of genetic code. JAMA 240(16), 1697 (1978). 10.1001/jama.1978.03290160015001
  3. Tubb, C., Seba, T.: Rethinking food and agriculture 2020‐2030: the second domestication of plants and animals, the disruption of the cow, and the collapse of industrial livestock farming. Ind. Biotechnol. 17(2), 57–72 (2021). 10.1089/ind.2021.29240.ctu
  4. TechSci Research. Precision Fermentation Market (2021)
  5. Direct food substances affirmed as generally recognized as safe; chymosin enzyme preparation derived from Escherichia coli K‐12; final rule. Fed. Regist. 55, 10932–10936 (1990)
  6. Johnson, M.E., Lucey, J.A.: Major technological advances and trends in cheese. J. Dairy Sci. 89(4), 1174–1178 (2006). 10.3168/jds.s0022-0302(06)72186-5
  7. Flamm, E.L.: How FDA approved chymosin: a case history. Nat. Biotechnol. 9(4), 349–351 (1991). 10.1038/nbt0491-349
  8. Global Dairy Alternatives Market (2021). https://www.marketdataforecast.com/market-reports/dairy-alternatives-market
  9. Chen, H., et al.: High production of valencene in Saccharomyces cerevisiae through metabolic engineering. Microb. Cell Factories 18(1), 195 (2019). 10.1186/s12934-019-1246-2
  10. Joseph, J.A., et al.: Bioproduction of the recombinant sweet protein thaumatin: current state of the art and perspectives. Front. Microbiol. 10, 695 (2019). 10.3389/fmicb.2019.00695
  11. Lee, D., et al.: Heterologous production of raspberry ketone in the wine yeast Saccharomyces cerevisiae via pathway engineering and synthetic enzyme fusion. Microb. Cell Factories 15(1), 49 (2016). 10.1186/s12934-016-0446-2
  12. Sen, T., Barrow, C.J., Deshmukh, S.K.: Microbial pigments in the food industry—challenges and the way forward. Front. Nutr. 6, 7 (2019). 10.3389/fnut.2019.00007
  13. Hakimi, O., et al.: Spider and mulberry silkworm silks as compatible biomaterials. Compos. B Eng. 38(3), 324–337 (2007). 10.1016/j.compositesb.2006.06.012
  14. Silk Market (2021). https://www.marketdataforecast.com/market-reports/silk-market
  15. Averianova, L.A., et al.: Production of vitamin B2 (riboflavin) by microorganisms: an overview. Front. Bioeng. Biotechnol. 8, 570828 (2020). 10.3389/fbioe.2020.570828
  16. Fang, H., Kang, J., Zhang, D.: Microbial production of vitamin B12: a review and future perspectives. Microb. Cell Factories 16(1), 15 (2017). 10.1186/s12934-017-0631-y
  17. Lu, R.‐M., et al.: Development of therapeutic antibodies for the treatment of diseases. J. Biomed. Sci. 27, 1 (2020). 10.1186/s12929-019-0592-z
  18. Pham, P.V.: Chapter 19 ‐ medical biotechnology: techniques and applications. In: Barh, D., Azevedo, V. (eds.) Omics Technologies and Bio‐Engineering, pp. 449–469. Academic Press, Cambridge (2018). 10.1016/B978-0-12-804659-3.00019-1
  19. Global protein therapeutics market size (2021 to 2026). 175 (2021)
  20. Sezonov, G., Joseleau‐Petit, D., D’Ari, R.: Escherichia coli physiology in Luria‐Bertani broth. J. Bacteriol. 189(23), 8746–8749 (2007). 10.1128/jb.01368-07
  21. Huang, C.‐J., Lin, H., Yang, X.: Industrial production of recombinant therapeutics in Escherichia coli and its recent advancements. J. Ind. Microbiol. Biotechnol. 39(3), 383–399 (2012). 10.1007/s10295-011-1082-9
  22. Sahdev, S., Khattar, S.K., Saini, K.S.: Production of active eukaryotic proteins through bacterial expression systems: a review of the existing biotechnology strategies. Mol. Cell. Biochem. 307(1‐2), 249–264 (2007). 10.1007/s11010-007-9603-6
  23. Manta, B., Boyd, D., Berkmen, M.: Disulfide bond formation in the periplasm of Escherichia coli. EcoSal Plus 8(2), ecosalplus.esp‐0012‐2018 (2019). 10.1128/ecosalplus.esp-0012-2018
  24. Bosnjak, I., et al.: Occurrence of protein disulfide bonds in different domains of life: a comparison of proteins from the Protein Data Bank. Protein Eng. Des. Sel. 27(3), 65–72 (2014). 10.1093/protein/gzt063
  25. Benz, I., Schmidt, M.A.: Glycosylation with heptose residues mediated by the aah gene product is essential for adherence of the AIDA‐I adhesin. Mol. Microbiol. 40(6), 1403–1413 (2001). 10.1046/j.1365-2958.2001.02487.x
  26. Lindenthal, C., Elsinghorst, E.A.: Identification of a glycoprotein produced by enterotoxigenic Escherichia coli. Infect. Immun. 67(8), 4084–4091 (1999). 10.1128/iai.67.8.4084-4091.1999
  • 27. Fisher, A.C. , et al.: Production of secretory and extracellular N‐linked glycoproteins in Escherichia coli . Appl. Environ. Microbiol. 77(3), 871–881 (2011). 10.1128/aem.01901-10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Williams, D.C. , et al.: Cytoplasmic inclusion bodies in Escherichia coli producing biosynthetic human insulin proteins. Science 215(4533), 687–689 (1982). 10.1126/science.7036343 [DOI] [PubMed] [Google Scholar]
  • 29. Chrunyk, B.A. , et al.: Inclusion body formation and protein stability in sequence variants of interleukin‐1β. J. Biol. Chem. 268(24), 18053–18061 (1993). 10.1016/s0021-9258(17)46810-4 [DOI] [PubMed] [Google Scholar]
  • 30. Kamionka, M. : Engineering of therapeutic proteins production in Escherichia coli . Curr. Pharmaceut. Biotechnol. 12(2), 268–274 (2011). 10.2174/138920111794295693 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Vieira Gomes, A. , et al.: Comparison of yeasts as hosts for recombinant protein production. Microorganisms 6(2), 38 (2018). 10.3390/microorganisms6020038 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Matheson, K. , Parsons, L. , Gammie, A. : Whole‐genome sequence and variant analysis of W303, a widely‐used strain of Saccharomyces cerevisiae . G3 Genes, Genomes, Genetics 7, 2219–2226 (2017). 10.1534/g3.117.040022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. LaMantia, M. , Lennarz, W.J. : The essential function of yeast protein disulfide isomerase does not reside in its isomerase activity. Cell 74(5), 899–908 (1993). 10.1016/0092-8674(93)90469-7 [DOI] [PubMed] [Google Scholar]
  • 34. Ptacek, J. , et al.: Global analysis of protein phosphorylation in yeast. Nature 438(7068), 679–684 (2005). 10.1038/nature04187 [DOI] [PubMed] [Google Scholar]
  • 35. Tanner, W. , Lehle, L. : Protein glycosylation in yeast. Biochim. Biophys. Acta Rev. Biomembr. 906(1), 81–99 (1987). 10.1016/0304-4157(87)90006-2 [DOI] [PubMed] [Google Scholar]
  • 36. Weinert, B.T. , et al.: Acetylation dynamics and stoichiometry in Saccharomyces cerevisiae . Mol. Syst. Biol. 10, 716 (2014). 10.15252/msb.156513 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Bill, R.M. , Revers, L. , Wilson, I. : Protein Glycosylation. Springer US, New York; (2011) [Google Scholar]
  • 38. Kulagina, N. , et al.: Yeasts as biopharmaceutical production platforms. Front. Fungal Biol. 2, 733492 (2021). 10.3389/ffunb.2021.733492 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Meng, Y. , et al.: Production and characterization of recombinant glucose oxidase from Aspergillus niger expressed in Pichia pastoris . Lett. Appl. Microbiol. 58(4), 393–400 (2014). 10.1111/lam.12202 [DOI] [PubMed] [Google Scholar]
  • 40. Li, Y. , et al.: Constitutive expression of a novel isoamylase from Bacillus lentus in Pichia pastoris for starch processing. Process Biochem. 48(9), 1303–1310 (2013). 10.1016/j.procbio.2013.07.001 [DOI] [Google Scholar]
  • 41. de Amorim Araújo, J. , et al.: Coexpression of cellulases in Pichia pastoris as a self‐processing protein fusion. Amb. Express 5(1), 84 (2015). 10.1186/s13568-015-0170-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Damasceno, L.M. , Huang, C.‐J. , Batt, C.A. : Protein secretion in Pichia pastoris and advances in protein production. Appl. Microbiol. Biotechnol. 93(1), 31–39 (2012). 10.1007/s00253-011-3654-z [DOI] [PubMed] [Google Scholar]
  • 43. Macauley‐Patrick, S. , et al.: Heterologous protein production using the Pichia pastoris expression system. Yeast 22(4), 249–270 (2005). 10.1002/yea.1208 [DOI] [PubMed] [Google Scholar]
  • 44. Nevalainen, K.M.H. , Te’o, V.S.J. , Bergquist, P.L. : Heterologous protein expression in filamentous fungi. Trends Biotechnol. 23(9), 468–474 (2005). 10.1016/j.tibtech.2005.06.002 [DOI] [PubMed] [Google Scholar]
  • 45. Smith, G.E. , Summers, M.D. , Fraser, M.J. : Production of human beta interferon in insect cells infected with a baculovirus expression vector. Mol. Cell Biol. 3(12), 2156–2165 (1983). 10.1128/mcb.3.12.2156-2165.1983 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Kitts, P.A. , Possee, R.D. : A method for producing recombinant baculovirus expression vectors at high frequency. Biotechniques 14, 810–817 (1993) [PubMed] [Google Scholar]
  • 47. van Oers, M.M. , Pijlman, G.P. , Vlak, J.M. : Thirty years of baculovirus–insect cell protein expression: from dark horse to mainstream technology. J. Gen. Virol. 96(1), 6–23 (2015). 10.1099/vir.0.067108-0 [DOI] [PubMed] [Google Scholar]
  • 48. Hollister, J. , et al.: Engineering the protein N‐glycosylation pathway in insect cells for production of biantennary, complex N‐glycans. Biochemistry 41(50), 15093–15104 (2002). 10.1021/bi026455d [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Jarvis, D.L. : Developing baculovirus‐insect cell expression systems for humanized recombinant glycoprotein production. Virology 310, 1–7 (2003). 10.1016/s0042-6822(03)00120-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Jenkins, N. , Parekh, R.B. , James, D.C. : Getting the glycosylation right: implications for the biotechnology industry. Nat. Biotechnol. 14(8), 975–981 (1996). 10.1038/nbt0896-975 [DOI] [PubMed] [Google Scholar]
  • 51. Yee, C.M. , et al.: The coming age of insect cells for manufacturing and development of protein therapeutics. Ind. Eng. Chem. Res. 57(31), 10061–10070 (2018). 10.1021/acs.iecr.8b00985 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Walsh, G. : Biopharmaceutical benchmarks 2018. Nat. Biotechnol. 36(12), 1136–1145 (2018). 10.1038/nbt.4305 [DOI] [PubMed] [Google Scholar]
  • 53. Carillo, S. , et al.: Glycosylation analysis of therapeutic glycoproteins produced in CHO cells. In: Meleady, P. (ed.) Heterologous Protein Production in CHO Cells, vol. 1603, pp. 227–241. Springer, New York: (2017) [DOI] [PubMed] [Google Scholar]
  • 54. Croset, A. , et al.: Differences in the glycosylation of recombinant proteins expressed in HEK and CHO cells. J. Biotechnol. 161(3), 336–348 (2012). 10.1016/j.jbiotec.2012.06.038 [DOI] [PubMed] [Google Scholar]
  • 55. Wang, J.‐J. , Rojanatavorn, K. , Shih, J.C.H. : Increased production of Bacillus Keratinase by chromosomal integration of multiple copies of the kerA gene. Biotechnol. Bioeng. 87(4), 459–464 (2004). 10.1002/bit.20145 [DOI] [PubMed] [Google Scholar]
  • 56. Gustafsson, C. , Govindarajan, S. , Minshull, J. : Codon bias and heterologous protein expression. Trends Biotechnol. 22(7), 346–353 (2004). 10.1016/j.tibtech.2004.04.006 [DOI] [PubMed] [Google Scholar]
  • 57. Palomares, L.A. , Estrada‐Moncada, S. , Ramírez, O.T. : Production of recombinant proteins. In: Balbás, P. , Lorence, A. (eds.) Recombinant Gene Expression: Reviews and Protocols, pp. 15–51. Humana Press, Totowa; (2004). 10.1385/1-59259-774-2:015 [DOI] [Google Scholar]
  • 58. Nielsen, J. , Keasling, J.D. : Engineering cellular metabolism. Cell 164(6), 1185–1197 (2016). 10.1016/j.cell.2016.02.004 [DOI] [PubMed] [Google Scholar]
  • 59. Hillson, N. , et al.: Building a global alliance of biofoundries. Nat. Commun. 10(1), 2040 (2019). 10.1038/s41467-019-10079-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Freemont, P.S. : Synthetic biology industry: data‐driven design is creating new opportunities in biotechnology. Emerg. Top. Life Sci. 3(5), 651–657 (2019). 10.1042/etls20190040 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61. Zrimec, J. , et al.: Deep learning suggests that gene expression is encoded in all parts of a co‐evolving interacting gene regulatory structure. Nat. Commun. 11(1), 6141 (2020). 10.1038/s41467-020-19921-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Avsec, Ž. , et al.: Effective gene expression prediction from sequence by integrating long‐range interactions. Nat. Methods 18(10), 1196–1203 (2021). 10.1038/s41592-021-01252-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Ji, Y. , et al.: DNABERT: pre‐trained bidirectional encoder representations from transformers model for DNA‐language in genome. Bioinformatics 37(15), 2112–2120 (2021). 10.1093/bioinformatics/btab083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. LeCun, Y. , Bengio, Y. , Hinton, G. : Deep learning. Nature 521(7553), 436–444 (2015). 10.1038/nature14539 [DOI] [PubMed] [Google Scholar]
  • 65. Wang, P. , Fan, E. , Wang, P. : Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recogn. Lett. 141, 61–67 (2021). 10.1016/j.patrec.2020.07.042 [DOI] [Google Scholar]
  • 66. Korotcov, A. , et al.: Comparison of deep learning with multiple machine learning methods and metrics using diverse drug discovery data sets. Mol. Pharm. 14(12), 4462–4475 (2017). 10.1021/acs.molpharmaceut.7b00578 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Paterakis, N.G. , et al.: Deep learning versus traditional machine learning methods for aggregated energy demand prediction. In: 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT‐Europe), pp. 1–6 (2017). 10.1109/ISGTEurope.2017.8260289 [DOI] [Google Scholar]
  • 68. Monk, J.M. , et al.: iML1515, a knowledgebase that computes Escherichia coli traits. Nat. Biotechnol. 35(10), 904–908 (2017). 10.1038/nbt.3956 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Lu, H. , et al.: A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nat. Commun. 10(1), 3586 (2019). 10.1038/s41467-019-11581-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Lee, S.Y. , Kim, H.U. : Systems strategies for developing industrial microbial strains. Nat. Biotechnol. 33(10), 1061–1072 (2015). 10.1038/nbt.3365 [DOI] [PubMed] [Google Scholar]
  • 71. Munro, L.J. , Kell, D.B. : Intelligent host engineering for metabolic flux optimisation in biotechnology. Biochem. J. 478(20), 3685–3721 (2021). 10.1042/bcj20210535 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Radivojević, T. , et al.: ART: a machine learning Automated Recommendation Tool for synthetic biology. Nat. Commun. 11(1), 4879 (2020). 10.1038/s41467-020-18008-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Larsen, S. , et al.: Mutant strains of Pichia pastoris with enhanced secretion of recombinant proteins. Biotechnol. Lett. 35(11), 1925–1935 (2013). 10.1007/s10529-013-1290-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Presnell, K.V. , Alper, H.S. : Systems metabolic engineering meets machine learning: a new era for data‐driven metabolic engineering. Biotechnol. J. 14(9), 1800416 (2019). 10.1002/biot.201800416 [DOI] [PubMed] [Google Scholar]
  • 75. Redding‐Johanson, A.M. , et al.: Targeted proteomics for metabolic pathway optimization: application to terpene production. Metab. Eng. 13(2), 194–203 (2011). 10.1016/j.ymben.2010.12.005 [DOI] [PubMed] [Google Scholar]
  • 76. Ohtake, T. , et al.: Metabolomics‐driven approach to solving a CoA imbalance for improved 1‐butanol production in Escherichia coli . Metab. Eng. 41, 135–143 (2017). 10.1016/j.ymben.2017.04.003 [DOI] [PubMed] [Google Scholar]
  • 77. Alonso‐Gutierrez, J. , et al.: Principal component analysis of proteomics (PCAP) as a tool to direct metabolic engineering. Metab. Eng. 28, 123–133 (2015). 10.1016/j.ymben.2014.11.011 [DOI] [PubMed] [Google Scholar]
  • 78. Alonso‐Gutierrez, J. , et al.: Metabolic engineering of Escherichia coli for limonene and perillyl alcohol production. Metab. Eng. 19, 33–41 (2013). 10.1016/j.ymben.2013.05.004 [DOI] [PubMed] [Google Scholar]
  • 79. Zhang, J. , et al.: Combining mechanistic and machine learning models for predictive engineering and optimization of tryptophan metabolism. Nat. Commun. 11(1), 4880 (2020). 10.1038/s41467-020-17910-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Crater, J.S. , Lievense, J.C. : Scale‐up of industrial microbial processes. FEMS Microbiol. Lett. 365(13), fny138 (2018). 10.1093/femsle/fny138 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81. Pavlyshenko, B. : Using stacking approaches for machine learning models. In: 2018 IEEE Second International Conference on Data Stream Mining Processing (DSMP), pp. 255–258 (2018). 10.1109/DSMP.2018.8478522 [DOI] [Google Scholar]
  • 82. Zhou, Z.‐H. : Ensemble Methods: Foundations and Algorithms. CRC Press; (2012) [Google Scholar]
  • 83. Lin‐Cereghino, G.P. , et al.: The effect of α‐mating factor secretion signal mutations on recombinant protein expression in Pichia pastoris . Gene 519(2), 311–317 (2013). 10.1016/j.gene.2013.01.062 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. Boursier, M.E. , et al.: The luminescent HiBiT peptide enables selective quantitation of G protein–coupled receptor ligand engagement and internalization in living cells. J. Biol. Chem. 295(15), 5124–5135 (2020). 10.1074/jbc.ra119.011952 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85. Gasteiger, E. , et al.: Protein identification and analysis tools on the ExPASy server. In: Walker, J.M. (ed.) The Proteomics Protocols Handbook, pp. 571–607. Humana Press, Totowa; (2005). 10.1385/1-59259-890-0:571 [DOI] [Google Scholar]
  • 86. Waterhouse, A. , et al.: SWISS‐MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46(W1), W296–W303 (2018). 10.1093/nar/gky427 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87. Ahmad, M. , et al.: Protein expression in Pichia pastoris: recent achievements and perspectives for heterologous protein production. Appl. Microbiol. Biotechnol. 98(12), 5301–5317 (2014). 10.1007/s00253-014-5732-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88. Weidner, M. , Taupp, M. , Hallam, S.J. : Expression of recombinant proteins in the methylotrophic yeast Pichia pastoris . JoVE(36), 1862 (2010). 10.3791/1862 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Greener, J.G. , et al.: A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23(1), 40–55 (2022). 10.1038/s41580-021-00407-0 [DOI] [PubMed] [Google Scholar]
  • 90. Sanner, M.F. : Python: a programming language for software integration and development. J. Mol. Graph. Model. 17, 57–61 (1999) [PubMed] [Google Scholar]
  • 91. Raschka, S. , Patterson, J. , Nolet, C. : Machine learning in Python: main developments and technology trends in data science, machine learning, and artificial intelligence. Information 11(4), 193 (2020). 10.3390/info11040193 [DOI] [Google Scholar]
  • 92. Cock, P.J.A. , et al.: Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11), 1422–1423 (2009). 10.1093/bioinformatics/btp163 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Pedregosa, F. , et al.: Scikit‐learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011) [Google Scholar]
  • 94. Abadi, M. et al.: TensorFlow: large‐scale machine learning on heterogeneous distributed systems. (2016)Preprint. 10.48550/arXiv.1603.04467 [DOI]
  • 95. Paszke, A. , et al.: PyTorch: an imperative style, high‐performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019) [Google Scholar]
  • 96. Chollet, F.K. : Deep learning for humans. (2022). https://github.com/keras‐team/keras
  • 97. Isensee, F. , et al.: nnU‐Net: a self‐configuring method for deep learning‐based biomedical image segmentation. Nat. Methods 18(2), 203–211 (2021). 10.1038/s41592-020-01008-z [DOI] [PubMed] [Google Scholar]
  • 98. Joshi, A.V. : Azure machine learning. In: Joshi, A.V. (ed.) Machine Learning and Artificial Intelligence, pp. 207–220. Springer International Publishing, New York; (2020). 10.1007/978-3-030-26622-6_22 [DOI] [Google Scholar]

Data Availability Statement

The data that support the findings of this study are available from Eden Bio Ltd. Restrictions apply to the availability of these data, which were used under licence for this study. Data are available from the authors with the permission of Eden Bio Ltd.


Articles from Engineering Biology are provided here courtesy of Wiley
