Biosafety and Health. 2025 Sep 13;7(5):306–311. doi: 10.1016/j.bsheal.2025.09.004

Beyond evolution: De novo designed protein toolkit rewriting the rules of synthetic biology

Guohao Zhang, Chuanyang Liu, Wenying Li, Jiajie Lu, Ang Li, Lingyun Zhu
PMCID: PMC12624534  PMID: 41262455

Highlights

  • AI-driven de novo protein design enables atom-level precision in synthetic biology.

  • Novel protein structures demand robust biosafety and bioethics assessment.

  • Future prospects span from protein modules to fully-synthetic cellular systems.

Keywords: De novo protein design, Synthetic biology, Biosafety

Abstract

Artificial intelligence (AI)-driven de novo protein design is revolutionizing synthetic biology by facilitating the first-principles rational engineering of protein-based functional modules unbound by known structural templates and evolutionary constraints, enabling a diverse range of applications. Expressing these novel, structurally unprecedented proteins within cellular systems, however, inherently adds complexity and functional unpredictability. Robust biosafety and bioethics evaluations are therefore required to address potential risks such as immune reactions, cellular pathway disruptions, and environmental persistence. We systematically analyse the computational frameworks underpinning this revolution and highlight the capability of de novo proteins to act as a modular toolkit for synthetic biology. Looking forward, we envision integrating closed-loop validation with multi-omics profiling for comprehensive risk assessment, along with a hierarchical design framework for advancing synthetic biology: from the creation of tailored de novo functional protein modules, through structure-guided rational design of genetic circuits, to the development of fully synthetic cellular systems, thereby establishing a scalable path from protein design to system-level implementation.

1. AI-driven de novo design: A new era for synthetic biology

The 2024 Nobel Prize in Chemistry recognized pioneering advances in artificial intelligence (AI)-driven protein structure prediction and de novo protein design, marking a transformative shift in biotechnology, from decoding the natural protein catalogue to creating custom biological blueprints. This breakthrough frees biological engineering from the constraints of evolutionary history. Natural proteins, optimized for biological fitness rather than biotechnological versatility, are often confined to local optima in the protein functional fitness landscape. Traditional methods such as directed evolution and phage display modify pre-existing proteins, inherently limiting the scope for novel functions [1,2]. In contrast, AI-driven de novo protein design leverages computational frameworks to rationally design biomolecules tailored to specific structural or functional requirements, generating diverse candidate designs without relying on natural starting points [3].

AI-driven de novo protein design follows an inverse biomolecular design paradigm that progresses from function to structure and, ultimately, to sequence. This methodology is analogous to the multi-level architecture of synthetic biology, which organizes modules, circuits, and systems [[4], [5], [6]]. In this framework, modules are the basic functional units such as genes, proteins, or regulatory elements that perform specific biological tasks. These modules can be assembled into circuits to perform more complex functions, much like components in electronic circuits. At a higher level, multiple circuits are integrated into systems that form complete biological entities, such as synthetic cells or engineered organisms with desired functions. This layered framework supports the predictable and controllable construction of synthetic biological systems [7,8].

By delivering atom-level precision and programmability, AI-driven de novo protein design provides synthetic biology with a new generation of high-performance modules precisely engineered to fulfill specific functional requirements in diverse synthetic biology applications. This integrative paradigm shifts synthetic biology from the empirical assembly of functional parts into an atom-level, autonomous design process for task-specific modules [9,10]. As a result, synthetic genetic circuits and biological systems can be constructed with greater controllability, predictability, and efficiency – a transformation driven by advances in data science, machine learning, and biophysical modeling (Fig. 1).

Fig. 1.

AI-driven de novo protein design converges with synthetic biology to outline a hierarchical framework for advancing the future of synthetic biology. A) Driven by the accumulation of extensive experimental datasets and advances in AI algorithms, AI-driven de novo protein design can now generate synthesizable protein sequences with near-atomic precision and high predictive accuracy directly from target functions or structures. Following experimental validation, these designs can serve as robust, standardized modules for synthetic biology. Feeding wet-lab results from de novo designs back into the training datasets establishes a “design-test-learn” cycle that iteratively improves subsequent designs and raises overall success rates. B) Looking ahead, we envision a hierarchical framework for advancing synthetic biology: beginning with AI-driven, first-principles design of tailored modules, proceeding to the rational engineering of synthetic circuits enabled by genome editing and regulatory design, and ultimately to the construction of fully synthetic cellular systems (Syn-cells). Syn-cells can be integrated into an AI virtual cell (AIVC) platform to simulate intracellular dynamics and predict biosafety endpoints (toxicity, off-target interactions, system-level perturbations), providing a safer and more reliable prioritization of designs before any in-cell or in vivo testing. Abbreviations: AI, artificial intelligence; LLMs, large language models; GenAI, generative artificial intelligence. This figure was created using Adobe Illustrator, and some icons were generated with BioRender “https://BioRender.com/dvs4myp” (2025).

Over the past two decades, advances in deoxyribonucleic acid (DNA) sequencing and in the experimental determination of protein structures and properties have provided the essential data (sequences, structures, and functional annotations) for training AI-driven protein structure prediction and de novo design models [11]. Concurrently, progress in algorithms, particularly deep learning [[12], [13], [14], [15]], has enabled the analysis of large-scale protein sequence, structure, and function (SSF) datasets. These developments support the construction of models that accurately map protein SSF landscapes. Coupled with enhanced computational power, advances in AI have led to the development of a broad suite of high-performance protein design tools (Table 1). Notably, the combination of de novo design tools such as RFdiffusion [16] and ProteinMPNN [17] has demonstrated exceptional capabilities in engineering novel proteins for diverse biological applications [[18], [19], [20], [21], [22], [23], [24]]. By utilizing multimodal datasets that integrate SSF data, these tools generate high-dimensional representations, such as embeddings of protein space or protein fitness landscapes. Encoded within large-scale neural networks, these representations facilitate precise exploration and systematic navigation of the dominant and hidden regions of the vast protein space. These AI-driven design platforms have already been applied to a wide range of biological problems, from enzyme engineering to therapeutic protein development. Critically, experimental measurements of biophysical stability and functional performance are continuously fed back into the training datasets, facilitating the iterative refinement of the models (Fig. 1A). This closed-loop workflow not only improves the quality and success rate of each design cycle but also lays the foundation for constructing ever more stable and predictable synthetic genetic circuits, featuring precisely engineered proteins with modular architectures, programmable outputs, and allosteric control [1,11].
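The closed-loop workflow described above can be pictured as a short program. The following is only a schematic sketch under loud assumptions: `design_test_learn`, `toy_generate`, `toy_predict`, and `toy_assay` are hypothetical names introduced for illustration, a "design" is reduced to a single number, and the "assay" is a made-up function with a hidden optimum. None of this reproduces RFdiffusion, ProteinMPNN, or any real wet-lab pipeline; it shows only how measured results can flow back into the next design round.

```python
import random

def design_test_learn(generate, predict, assay, rounds=3, n_candidates=20, n_tested=5, seed=0):
    """Run a schematic design-test-learn loop; return all (design, measured value) pairs."""
    rng = random.Random(seed)
    measured = []
    for _ in range(rounds):
        # Design: propose candidates, informed by everything measured so far.
        candidates = [generate(rng, measured) for _ in range(n_candidates)]
        # In silico filter: rank by predicted performance and keep the top few.
        ranked = sorted(candidates, key=lambda d: predict(d, measured), reverse=True)
        # Test and learn: "assay" the shortlist and add the results to the dataset.
        for design in ranked[:n_tested]:
            measured.append((design, assay(design)))
    return measured

# Toy stand-ins (assumptions): a design is one number and the hidden optimum is 0.7.
def toy_assay(x):
    return -(x - 0.7) ** 2

def toy_generate(rng, measured):
    if not measured:
        return rng.random()
    best = max(measured, key=lambda m: m[1])[0]
    return best + rng.gauss(0.0, 0.1)  # explore around the best design so far

def toy_predict(x, measured):
    if not measured:
        return 0.0
    # Nearest-neighbour surrogate: reuse the value of the closest measured design.
    return min(measured, key=lambda m: abs(m[0] - x))[1]

results = design_test_learn(toy_generate, toy_predict, toy_assay)
best_design, best_value = max(results, key=lambda m: m[1])
print(f"{len(results)} designs tested; best design {best_design:.3f} scored {best_value:.4f}")
```

In a real setting the three callables would wrap a generative model, a structure or function predictor, and an experimental readout, and the "learn" step would usually retrain or fine-tune the model rather than simply reuse stored measurements.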

Table 1.

Overview of high-performance protein design models.

Model | Core task | Application scenarios | Description | Reference
--- | --- | --- | --- | ---
AlphaFold2 | Predicting protein structure from a given amino acid sequence | Predicting single-chain protein structures | A deep learning method that predicts single-chain protein structures with atomic precision from amino-acid sequences. | [21]
RoseTTAFold All-Atom | Predicting protein structure from a given amino acid sequence | Protein–protein and protein–ligand complex modeling | A three-track deep model that rapidly predicts all-atom protein structures and assemblies by jointly reasoning over sequence, distance maps, and coordinates. | [22]
AlphaFold3 | Predicting protein structure from a given amino acid sequence | Protein–protein and protein–ligand complex modeling | An advanced AlphaFold-class model extending accurate structural prediction to multimeric complexes and interaction interfaces. | [23]
RFdiffusion | Generating a protein backbone for a given function | De novo backbone/topology design; binder design; symmetric oligomer and active-site scaffolding | A diffusion-based generative model that produces de novo protein backbones conditioned on motifs, symmetry, or binding constraints. | [16]
RFdiffusion2 | Generating a protein backbone for a given function | Atom-level enzyme active-site scaffolding; precise ligand/cofactor placement prior to experimental testing | An enhanced, atom-aware diffusion model offering finer control for active-site and ligand scaffolding. | [24]
ProteinMPNN | Sequence design conditioned on backbone/structure | Designing sequences to stabilize de novo backbones | A graph-neural-network sequence-design model that generates amino-acid sequences optimized for a given three-dimensional (3D) backbone. | [17]
LigandMPNN | Sequence design conditioned on backbone/structure | Enzyme active-site design for specific substrates/cofactors; biosensor and small-molecule binder design | A context-aware message passing neural network (MPNN) that models protein–ligand atomic interactions to design sequences and side chains compatible with small-molecule binding. | [51]
ESM3 | Sequence–structure–function co-generation | Zero/few-shot functional prediction; sequence generation conditioned on function; landscape mapping and candidate prioritization | A large-scale protein language model producing sequence/structure embeddings for property prediction, sequence generation, and functional inference. | [26]

2. AI-driven rational design overcomes evolutionary limits

The integration of AI-driven protein engineering with experimental validation has markedly accelerated the development of tailored protein-based functional modules, from enzyme engineering [[25], [26]] and biosensors [27] to therapeutic proteins [20] and allosteric protein nanoscale assemblies [28], and this progress can be framed in three complementary aspects.

2.1. Structure and function prediction

Sequence-based structure predictors such as AlphaFold2 [21] can accurately predict tertiary structures directly from primary sequence and provide quantitative metrics [for example, the predicted local distance difference test score (pLDDT) and the Cα root-mean-square deviation (RMSD) between the predicted structure and the design model] that are routinely used to filter candidate designs prior to wet-lab verification. Complementary sequence-to-function tools (e.g., DPFunc [29]) identify key regions within the predicted structure and infer their likely functions from sequence alone. These AI tools supply principled, quantitative guidance for de novo protein design and enable in silico screening of binding pockets and overall fold plausibility, thereby reducing the amount of wet-lab validation required and increasing experimental success rates and throughput.
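As a rough illustration of how pLDDT and Cα RMSD can act as a pre-screening filter, the sketch below computes an RMSD and applies two cut-offs. The function names (`ca_rmsd`, `passes_filter`), the thresholds (mean pLDDT of at least 80, RMSD of at most 2 Å), and the toy coordinates are assumptions chosen for illustration, not values from the studies cited here. The RMSD routine also assumes the two structures have already been superimposed; a real pipeline would first perform an optimal superposition.

```python
import math

def ca_rmsd(design_ca, predicted_ca):
    """Cα RMSD (Å) between two equal-length coordinate lists that are already superimposed."""
    if len(design_ca) != len(predicted_ca) or not design_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(design_ca, predicted_ca)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(design_ca))

def passes_filter(mean_plddt, rmsd, min_plddt=80.0, max_rmsd=2.0):
    """Keep a design only if the predictor is confident and agrees with the design model."""
    return mean_plddt >= min_plddt and rmsd <= max_rmsd

# Toy Cα traces (assumption): three residues, the middle one displaced by 1.5 Å.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 0.0)]

rmsd = ca_rmsd(design, predicted)
keep = passes_filter(mean_plddt=91.0, rmsd=rmsd)
print(f"Cα RMSD = {rmsd:.2f} Å, keep design: {keep}")
```

Cut-offs of this kind trade false negatives against wasted wet-lab effort, so in practice they are tuned per project rather than fixed.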

2.2. Protein optimization

Building on prediction and pre-screening, AI methods are increasingly applied to improve existing protein properties such as solubility, thermostability, and expression yield. For example, K.H. Sumida et al. [30] used ProteinMPNN to redesign myoglobin and, after AlphaFold2-based virtual screening, observed that 5 of 20 designs retained significant heme-binding activity at 95 °C and closely matched the design model (Cα RMSD = 0.66 Å). J. Wang et al. [31] developed the iMARS framework to dissect spatial structure–function relationships in artificial multienzyme complexes. Their designed fusion enzymes increased resveratrol titers by 45.1-fold across multiple host strains and substantially enhanced multienzyme catalytic efficiency in applications such as polyethylene terephthalate (PET) plastic depolymerization and vanillin biosynthesis.

2.3. De novo protein design

AI-driven pipelines have produced novel structures and functions that have been experimentally validated. A. Lauko et al. [25] designed a serine hydrolase featuring a novel topology, which exhibited a catalytic efficiency (kcat/Km) of up to 2.2 × 10^5 M^−1 s^−1, with crystal structures that aligned closely with the design models (Cα RMSDs < 1 Å). Notably, 15 % (20/132) of the designed variants exhibited detectable catalytic activity, laying the foundation for the development of high-efficiency biocatalysts. Similarly, J.Z. Zhang et al. [27] developed genetically encoded biosensors (Ras-LOCKR-S/PL) that detect endogenous Ras activity and mark surrounding environments at subcellular resolution, surpassing the limitations of natural scaffolds. In another significant study, S.V. Torres et al. [20] used RFdiffusion [16] to engineer potent, stable binders that neutralize elapid venom toxins. Of 44 initial short-chain α-neurotoxin-binding designs, one bound with Kd = 842 nmol/L (nM). After partial diffusion optimization, 11 of 78 variants (~14 %) showed improved affinity, with the top candidate (SHRT) reaching 0.9 nM, and its crystal structure matched the design (RMSD = 1.04 Å). For long-chain α-neurotoxin, the best binder (LNG) achieved Kd = 1.9 nM (complex RMSD = 0.42 Å). Against cytotoxin, the highest-affinity design (CYTX) bound at Kd = 271 nM (complex RMSD = 1.32 Å). Animal experiments demonstrated that SHRT and LNG fully protect mice from lethal doses of toxins, overcoming limitations of traditional antibody therapies, such as high costs and severe adverse reactions, and opening new avenues for therapeutic development. Additionally, A. Pillai et al. [28] showcased allosterically switchable protein assemblies with potential in switchable nanomaterials, drug delivery carriers, and gene switches; one designed tetramer aligned closely with its computational model (monomer RMSD = 1.82 Å).
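The figures quoted above can be put on a common footing with a few lines of arithmetic. The sketch below recomputes the hit rates and the affinity gain from the reported numbers, and then estimates how much toxin each binder would capture. That last step is an assumption added here for illustration: it uses a simple 1:1 equilibrium model with the binder in large excess, fraction bound = [binder] / ([binder] + Kd), and an arbitrary binder concentration of 100 nM. The helper names are hypothetical.

```python
def hit_rate(hits, total):
    """Fraction of tested designs that showed the desired activity."""
    return hits / total

def fold_improvement(kd_before_nm, kd_after_nm):
    """How many times tighter the optimized binder is (lower Kd means tighter binding)."""
    return kd_before_nm / kd_after_nm

def fraction_bound(binder_nm, kd_nm):
    """Equilibrium fraction of toxin bound, assuming 1:1 binding and binder in large excess."""
    return binder_nm / (binder_nm + kd_nm)

hydrolase_rate = hit_rate(20, 132)       # active serine hydrolase designs [25]
optimized_rate = hit_rate(11, 78)        # variants improved by partial diffusion [20]
shrt_gain = fold_improvement(842.0, 0.9) # initial hit versus SHRT [20]

print(f"hydrolase hit rate: {hydrolase_rate:.1%}")
print(f"improved variants:  {optimized_rate:.1%}")
print(f"SHRT affinity gain: {shrt_gain:.0f}-fold")

# Assumed binder concentration of 100 nM, for illustration only.
for name, kd in [("initial hit", 842.0), ("SHRT", 0.9), ("LNG", 1.9), ("CYTX", 271.0)]:
    print(f"{name}: Kd = {kd} nM, fraction bound = {fraction_bound(100.0, kd):.3f}")
```

Under these assumptions the nanomolar binders capture nearly all of the toxin while the initial 842 nM hit captures only a small share, which gives an intuition for why the affinity maturation step matters.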

These rationally designed proteins offer precise control over structures and functions, forming an ideal modular toolkit for synthetic biology. Within the stratified framework of synthetic biology, de novo designed proteins enhance the modular level by providing robust and standardized components. For example, de novo designed enzymes can be optimized for improved catalytic efficiency and enhanced stability, thereby improving the effectiveness of biosynthetic pathways. These proteins can serve as foundational modules for assembling synthetic genetic circuits, which integrate multiple modules to perform coordinated functions. Furthermore, de novo proteins can be engineered to be orthogonal, interacting exclusively with their designated targets, ensuring the independent and predictable behavior of individual modules within complex environments. These de novo designed protein-based functional modules support the scalability of synthetic biology, from genetic circuits to fully synthetic systems, enabling robust solutions in biomanufacturing, cellular computation, and biological therapeutic engineering.

3. Discussion and perspectives

3.1. Reshaping synthetic biology with AI-driven rational design

AI-driven de novo protein design is opening new horizons in synthetic biology by complementing established modular strategies with atom-level precision. Traditional engineering paradigms that assemble well-characterized parts in a “Lego-like” modular fashion have proven effective for rapid prototyping and for leveraging decades of accumulated biological knowledge. However, context-dependent interactions can introduce variability when discrete parts are assembled in novel systems. AI-driven methods address this challenge by expanding the design toolkit in two main ways. First, they enable the on-demand creation of protein modules with tailored features (such as binding affinity, stability, or allosteric control) to achieve specific system-level behaviors. Second, they combine high-resolution structural predictions with network-scale modeling: platforms like AlphaFold3 [23] and RoseTTAFold All-Atom [22] produce atomic-precision models of multi-protein modules, enabling more informed exploration of SSF relationships. For example, A. A. Nava et al. [32] applied AlphaFold2 to the modular engineering of polyketide synthase (PKS), generating conformationally stable PKS structures. This improved inter-domain interface prediction and module compatibility assessment, offering novel structural guidance for PKS engineering. Similarly, T. F. Huddy et al. [33] combined ProteinMPNN [17] for sequence design with AlphaFold2 for validation to create twistless helix repeat (THR) protein blocks. These standardized THR modules assemble into higher-order nanomaterials, demonstrating scalable, AI-enabled modular design of complex protein architectures.

However, challenges remain. Current AI-based predictors usually output only static models and cannot fully capture the conformational dynamics essential for in vivo function, limiting reliable modelling of induced fit, allostery, and intrinsically disordered regions without complementary molecular dynamics or experimental data [34,35]. Moreover, most AI-driven de novo protein design workflows lack rigorous wet-lab validation or apply inconsistent experimental benchmarks, and few studies include comprehensive in-cell or in vivo functional testing. This limited validation is compounded by concerns about model generalizability across species and expression contexts, raising uncertainty about design robustness and potential unintended immunogenicity. Furthermore, current pipelines rarely model complex post-translational modifications (PTMs) such as glycosylation or phosphorylation, despite their critical influence on protein stability, localization, and function [36].

Addressing these gaps with integrated computational-experimental pipelines, standardized validation, and explicit modeling of biological complexity will be key to translating AI-driven protein design into precise, reliable modules for real-world synthetic biology.

3.2. Biosafety and bioethics

The unprecedented structural novelty and functional scope of AI-designed de novo proteins necessitate robust and comprehensive biosafety and bioethics assessment frameworks. Compared with conventionally engineered biomolecules, AI-designed proteins may persist longer in cells or environments due to enhanced thermostability or resistance to natural degradation [20,25,37], potentially leading to bioaccumulation or chronic off-target effects. Such structural uniqueness can also create neo-epitopes, triggering unintended immune activation or autoimmune reactions, and may disrupt native metabolic and signaling networks. Moreover, the limited capacity of current AI models to capture intracellular dynamics, such as molecular crowding, proteostasis mechanisms, or PTMs, further increases biosafety uncertainties.

Deep generative models for proteins and nucleic acids can propose sequences and folds with little or no homology to natural counterparts, thereby bypassing existing sequence-based biosafety checks used by nucleic acid synthesis providers [38,39]. The “novel-but-functional” design space thus constitutes a critical blind spot in current screening infrastructures. Recent international debates highlight how AI-enabled biomolecular design raises concerns similar to those surrounding gain-of-function research, even when not explicitly aimed at pathogen enhancement, because unanticipated toxic or pathogenic functions may emerge.

Current governance provides only partial coverage, leaving gaps for AI-enabled protein design. National Institutes of Health (NIH)-style rules apply when synthesized coding sequences are expressed in living systems [triggering Institutional Biosafety Committee (IBC) review and biosafety levels] [40]. The World Health Organization (WHO) instruments supply guidance on dual-use life-science risks but lack operational rules specific to AI-generated proteins [41], and the Cartagena Protocol governs risks from living modified organisms (LMOs) that might arise during production or release of novel constructs, yet does not cover the structural and functional novelty posed by AI-designed proteins [42]. The European Union AI Act (EU AI Act) establishes rules for AI systems (including risk-based obligations for high-risk applications) but does not by itself regulate biological hazard assessments; national biosecurity laws (for example, China’s Biosecurity Law) and the recent United States National Security Memorandum (US NSM) on AI set strategic expectations but do not comprehensively operationalize function-first biosafety for AI design pipelines [[43], [44], [45]]. Furthermore, the convergence of generative AI, synthetic biology, and automated experimentation (including cloud laboratories, robotics-driven “self-driving” labs, and AI science agents) shortens the design-build-test cycle and scales the throughput of potentially hazardous constructs [46]. This shift from slow, human-driven design to continuous, automated pipelines complicates conventional oversight models developed for episodic experiments.
We therefore recommend a pragmatic and integrated governance approach combining: (1) Ethics-by-design: embedding ethical review and dual-use assessment at project inception rather than as an afterthought; (2) Function-first screening: moving from sequence-only gating to an integrated, capacity-aware approach that incorporates AI-based predictors of functional activity (e.g., toxicity, binding, enzymatic activity) into synthesis-ordering pipelines and oversight workflows; (3) Dynamic oversight triggers: basing review on platform capabilities (for example, model scale, degree of automation, and end-to-end integration across the design-build-test cycle) rather than solely on final product properties; (4) Hybrid governance and closing policy gaps: integrating existing biosafety regimes (NIH, WHO, Cartagena) with AI-specific controls (mandatory provenance, model-capability disclosure, platform-triggered review) and implementing targeted policy updates, namely requiring function-first screening in EU AI Act guidance, adding AI-capability-triggered biosafety checks to US NSM guidance, and explicitly including AI-designed proteins in China’s Biosecurity Law.
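To make the contrast between sequence-only gating and function-first screening concrete, the sketch below shows one way a synthesis order could be routed. It is a hypothetical illustration only: the `SynthesisOrder` fields, the predictor names ("toxicity", "binding"), the score scale of 0 to 1, and the thresholds are all assumptions, and no existing provider's screening system or regulatory rule is being described.

```python
from dataclasses import dataclass, field

@dataclass
class SynthesisOrder:
    order_id: str
    homology_hit: bool  # outcome of a conventional sequence-similarity check
    predicted_function: dict = field(default_factory=dict)  # predictor name -> score in [0, 1]

def screen_order(order, thresholds):
    """Return ("release" or "review", reasons), combining homology and predicted function."""
    reasons = []
    if order.homology_hit:
        reasons.append("sequence homology to a listed sequence")
    for name, limit in thresholds.items():
        score = order.predicted_function.get(name)
        if score is None:
            # A missing prediction is treated as a reason for review, not as a pass.
            reasons.append(f"missing predictor output: {name}")
        elif score >= limit:
            reasons.append(f"predicted {name} score {score:.2f} >= {limit:.2f}")
    return ("review" if reasons else "release", reasons)

# Assumed thresholds, for illustration only.
thresholds = {"toxicity": 0.5, "binding": 0.8}

novel = SynthesisOrder("order-001", homology_hit=False,
                       predicted_function={"toxicity": 0.91, "binding": 0.30})
benign = SynthesisOrder("order-002", homology_hit=False,
                        predicted_function={"toxicity": 0.05, "binding": 0.10})

novel_decision, novel_reasons = screen_order(novel, thresholds)
benign_decision, benign_reasons = screen_order(benign, thresholds)
print(novel.order_id, novel_decision, novel_reasons)
print(benign.order_id, benign_decision, benign_reasons)
```

The first order has no homology hit, so a sequence-only gate would release it; the function-based check is what sends it to human review. Treating a missing predictor output as a trigger is a conservative design choice made here, not something the recommendations above prescribe.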

Technical safety-by-design strategies provide additional mitigation layers. “Closed-loop validation” cycles integrate machine learning with high-throughput assays to refine designs iteratively, predicting cytotoxicity and detecting off-target effects before deployment. Multi-omics profiling (transcriptomics, proteomics, metabolomics) enables system-level assessment of cellular responses. Large language models (LLMs), trained on vast biological datasets, offer qualitative insights into how de novo proteins might alter signaling or metabolic pathways. Paired with ordinary differential equation (ODE)-based simulations, they allow researchers to model protein accumulation, degradation, and interactions with native systems, and thereby to identify potential risks and suitable safety controls. Such controls include “minimal chassis” designs, engineered with streamlined genetic and metabolic profiles to reduce unintended interactions [1], and synthetic regulatory circuits with degradation tags, kill-switches, or feedback silencing that ensure controlled protein activity and clearance.
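A minimal example of the ODE-based reasoning mentioned above is the one-variable model dP/dt = k_syn − (k_deg + μ)·P, where P is the intracellular protein level, k_syn the synthesis rate, k_deg the degradation rate, and μ the dilution rate from cell growth. The sketch below integrates it with a forward-Euler step and compares a hyperstable protein with the same protein carrying a degradation tag. The model form, the parameter values (half-lives of 20 h and 0.25 h, a 1 h doubling time, arbitrary concentration units), and the function names are assumptions for illustration and are not taken from the cited work.

```python
import math

def simulate_protein(k_syn, k_deg, mu, t_end, dt=0.01, p0=0.0):
    """Forward-Euler integration of dP/dt = k_syn - (k_deg + mu) * P; returns P(t_end)."""
    steps = int(round(t_end / dt))
    p = p0
    for _ in range(steps):
        p += dt * (k_syn - (k_deg + mu) * p)
    return p

def steady_state(k_syn, k_deg, mu):
    """Analytical steady state of the same model: synthesis balances total removal."""
    return k_syn / (k_deg + mu)

# Assumed parameters (per hour, arbitrary concentration units).
k_syn = 10.0
mu = math.log(2) / 1.0           # dilution by growth, 1 h doubling time
k_deg_stable = math.log(2) / 20  # hyperstable design, 20 h half-life
k_deg_tagged = math.log(2) / 0.25  # same design with a degradation tag, 15 min half-life

stable_level = simulate_protein(k_syn, k_deg_stable, mu, t_end=24.0)
tagged_level = simulate_protein(k_syn, k_deg_tagged, mu, t_end=24.0)

print(f"stable protein after 24 h: {stable_level:.2f} (steady state {steady_state(k_syn, k_deg_stable, mu):.2f})")
print(f"tagged protein after 24 h: {tagged_level:.2f} (steady state {steady_state(k_syn, k_deg_tagged, mu):.2f})")
```

In this toy model the tag lowers the steady-state level several-fold and also shortens the time the protein lingers once synthesis stops, which is the kind of clearance behavior that degradation tags and kill-switches aim for. A realistic assessment would add interactions with native pathways and would be fitted to measured rates.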

Finally, ethical and regulatory oversight must evolve alongside technological capabilities. Transparent governance processes, harmonized international safety standards, active public engagement, and capacity-building in dual-use awareness are essential for responsibly harnessing AI-enhanced de novo protein design. Without adaptive frameworks, innovation risks outpacing the ability of biosafety and bioethics regimes to anticipate and prevent harm.

3.3. Prospect of the future of synthetic biology

Looking ahead, a hierarchical framework for synthetic biology, spanning tailored de novo protein modules, rational circuit design, and fully synthetic cellular systems, promises to unify the field’s diverse tools (Fig. 1B). Beginning with atomic-precision design tools and enhanced by generative adversarial networks and reinforcement learning, this approach designs stable, metabolically compatible protein-based modules. Graph neural networks standardize biocomponent repositories, enabling seamless integration into synthetic circuits. These modules can be further refined through circuit optimization techniques such as clustered regularly interspaced short palindromic repeats (CRISPR)-based metabolic rewiring, culminating in a fully synthetic cellular system. A standout feature is the advent of the “AI virtual cell” (AIVC): computational models that simulate entire fully synthetic cellular systems using multi-omics data. Platforms like “whole-cell” [47] model bacterial cells with high fidelity, allowing in silico testing of genetic modifications. Validated across molecular and cellular scales, AIVC reduces reliance on trial and error, as demonstrated by metabolic flux predictions in Escherichia coli [48]. This framework bridges design and implementation, propelling synthetic biology toward transformative breakthroughs in sustainable biomanufacturing and personalized biological medicine.

4. Summary

AI-driven de novo protein engineering is enhancing synthetic biology by enabling atom-level design of customizable proteins beyond natural templates, with applications across medicine, environmental remediation, and industrial biomanufacturing. To realize these benefits while managing risks, future efforts should integrate generative models with mechanistic biophysics (folding, dynamics, post-translational processing), improve cellular-context fidelity to enhance in vivo predictability, and implement upstream, function-first safety pipelines that screen for toxicity, immunogenicity, and off-target activity prior to synthesis. Emerging trends, such as AI-aided directed evolution to focus and accelerate experimental optimization and the routine use of cell-free or in vitro validation platforms to test designs in safer, lower-containment contexts, can both raise success rates and reduce biosafety risks [[49], [50], [51]]. When coupled with AI-driven validation pipelines, standardized repositories, and integrative modeling platforms, these advances shift synthetic biology from traditional empirical, nature-derived engineering toward a more systematic, predictive, and safer design paradigm, one in which living systems can be programmed and optimized with unprecedented precision.

Acknowledgements

The work was supported by the Research Program (22-TDRCJH-02-015).

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Author contributions

Guohao Zhang: Writing – review & editing, Writing – original draft. Chuanyang Liu: Writing – review & editing, Writing – original draft. Wenying Li: Writing – review & editing. Jiajie Lu: Writing – review & editing. Ang Li: Writing – review & editing. Lingyun Zhu: Writing – review & editing.

References

  • 1.Kortemme T. De novo protein design—From new structures to programmable functions. Cell. 2024;187:526–544. doi: 10.1016/j.cell.2023.12.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Hsu C., Fannjiang C., Listgarten J. Generative models for protein structures and sequences. Nat. Biotechnol. 2024;42:196–199. doi: 10.1038/s41587-023-02115-w. [DOI] [PubMed] [Google Scholar]
  • 3.Huang P.S., Boyken S.E., Baker D. The coming of age of de novo protein design. Nature. 2016;537:320–327. doi: 10.1038/nature19946. [DOI] [PubMed] [Google Scholar]
  • 4.Khalil A.S., Collins J.J. Synthetic biology: Applications come of age. Nat. Rev. Genet. 2010;11:367–379. doi: 10.1038/nrg2775. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Arai R. Hierarchical design of artificial proteins and complexes toward synthetic structural biology. Biophys. Rev. 2018;10:391–410. doi: 10.1007/s12551-017-0376-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Plante M. Epistemology of synthetic biology: A new theoretical framework based on its potential objects and objectives. Front. Bioeng. Biotechnol. 2023;11 doi: 10.3389/fbioe.2023.1266298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Li H., Lei Y., Zeng J. Revolutionizing biomolecular structure determination with artificial intelligence. Natl. Sci. Rev. 2024;11 doi: 10.1093/nsr/nwae339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Snoj J., Zhou W., Ljubetič A., Jerala R. Advances in designed bionanomolecular assemblies for biotechnological and biomedical applications. Curr. Opin. Biotechnol. 2025;92 doi: 10.1016/j.copbio.2024.103256. [DOI] [PubMed] [Google Scholar]
  • 9.Chen Y.Y., Galloway K.E., Smolke C.D. Synthetic biology: Advancing biological frontiers by building synthetic systems. Genome Biol. 2012;13:240. doi: 10.1186/gb-2012-13-2-240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Endy D. Foundations for engineering biology. Nature. 2005;438:449–453. doi: 10.1038/nature04342. [DOI] [PubMed] [Google Scholar]
  • 11.Notin P., Rollins N., Gal Y., Sander C., Marks D. Machine learning for functional protein design. Nat. Biotechnol. 2024;42:216–228. doi: 10.1038/s41587-024-02127-0. [DOI] [PubMed] [Google Scholar]
  • 12.Gao W., Mahajan S.P., Sulam J., Gray J.J. Deep learning in protein structural modeling and design. Patterns. 2020;1 doi: 10.1016/j.patter.2020.100142. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wu Z., Johnston K.E., Arnold F.H., Yang K.K. Protein sequence design with deep generative models. Curr. Opin. Chem. Biol. 2021;65:18–27. doi: 10.1016/j.cbpa.2021.04.004. [DOI] [PubMed] [Google Scholar]
  • 14.AlQuraishi M. Machine learning in protein structure prediction. Curr. Opin. Chem. Biol. 2021;65:1–8. doi: 10.1016/j.cbpa.2021.04.005. [DOI] [PubMed] [Google Scholar]
  • 15.Strokach A., Kim P.M. Deep generative modeling for protein design. Curr. Opin. Struct. Biol. 2022;72:226–236. doi: 10.1016/j.sbi.2021.11.008. [DOI] [PubMed] [Google Scholar]
  • 16.Watson J.L., Juergens D., Bennett N.R., Trippe B.L., Yim J., Eisenach H.E., Ahern W., Borst A.J., Ragotte R.J., Milles L.F., et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023;620:1089–1100. doi: 10.1038/s41586-023-06415-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Dauparas J., Anishchenko I., Bennett N., Bai H., Ragotte R.J., Milles L.F., Wicky B.I.M., Courbet A., De Haas R.J., Bethel N., et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science. 2022;378:49–56. doi: 10.1126/science.add2187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Zhang J.Z., Li X., Batingana A.R., Liu C., Jiang H., Shannon K., Huang B.J., Wu K., Baker D. De novo design of Ras isoform selective binders. bioRxiv [Preprint] 2025 doi: 10.1101/2024.08.29.610300. 2024.08.29.610300. [DOI] [Google Scholar]
  • 19.Glögl M., Krishnakumar A., Ragotte R.J., Goreshnik I., Coventry B., Bera A.K., Kang A., Joyce E., Ahn G., Huang B., et al. Target-conditioned diffusion generates potent TNFR superfamily antagonists and agonists. Science. 2024;386:1154–1161. doi: 10.1126/science.adp1779. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Vázquez Torres S., Benard Valle M., Mackessy S.P., Menzies S.K., Casewell N.R., Ahmadi S., Burlet N.J., Muratspahić E., Sappington I., Overath M.D., et al. De novo designed proteins neutralize lethal snake venom toxins. Nature. 2025;639:225–231. doi: 10.1038/s41586-024-08393-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Krishna R., Wang J., Ahern W., Sturmfels P., Venkatesh P., Kalvet I., Lee G.R., Morey-Burrows F.S., Anishchenko I., Humphreys I.R., et al. Generalized biomolecular modeling and design with RoseTTAFold all-atom. Science. 2024;384 doi: 10.1126/science.adl2528. [DOI] [PubMed] [Google Scholar]
  • 23.Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A.J., Bambrick J., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi: 10.1038/s41586-024-07487-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Ahern W., Yim J., Tischer D., Salike S., Woodbury S.M., Kim D., Kalvet I., Kipnis Y., Coventry B., Altae-Tran H.R., et al. Atom level enzyme active site scaffolding using RFdiffusion2. bioRxiv [Preprint] 2025 doi: 10.1101/2025.04.09.648075. [DOI] [Google Scholar]
  • 25.Lauko A., Pellock S.J., Sumida K.H., Anishchenko I., Juergens D., Ahern W., Jeung J., Shida A.F., Hunt A., Kalvet I., et al. Computational design of serine hydrolases. Science. 2025;388 doi: 10.1126/science.adu2454. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Hayes T., Rao R., Akin H., Sofroniew N.J., Oktay D., Lin Z., Verkuil R., Tran V.Q., Deaton J., Wiggert M., et al. Simulating 500 million years of evolution with a language model. Science. 2025;387:850–858. doi: 10.1126/science.ads0018. [DOI] [PubMed] [Google Scholar]
  • 27.Zhang J.Z., Nguyen W.H., Greenwood N., Rose J.C., Ong S.E., Maly D.J., Baker D. Computationally designed sensors detect endogenous Ras activity and signaling effectors at subcellular resolution. Nat. Biotechnol. 2024;42:1888–1898. doi: 10.1038/s41587-023-02107-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Pillai A., Idris A., Philomin A., Weidle C., Skotheim R., Leung P.J.Y., Broerman A., Demakis C., Borst A.J., Praetorius F., et al. De novo design of allosterically switchable protein assemblies. Nature. 2024;632:911–920. doi: 10.1038/s41586-024-07813-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Wang W., Shuai Y., Zeng M., Fan W., Li M. DPFunc: Accurately predicting protein function via deep learning with domain-guided structure information. Nat. Commun. 2025;16:70. doi: 10.1038/s41467-024-54816-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Sumida K.H., Núñez-Franco R., Kalvet I., Pellock S.J., Wicky B.I.M., Milles L.F., Dauparas J., Wang J., Kipnis Y., Jameson N., et al. Improving protein expression, stability, and function with ProteinMPNN. J. Am. Chem. Soc. 2024;146:2054–2061. doi: 10.1021/jacs.3c10941. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Wang J., Ouyang X., Meng S., Zhao B., Liu L., Li C., Li H., Zheng H., Liu Y., Shi T., et al. Rational multienzyme architecture design with iMARS. Cell. 2025;188:1349–1362.e17. doi: 10.1016/j.cell.2024.12.029. [DOI] [PubMed] [Google Scholar]
  • 32.Nava A.A., Roberts J., Haushalter R.W., Wang Z., Keasling J.D. Module-based polyketide synthase engineering for de novo polyketide biosynthesis. ACS Synth. Biol. 2023;12:3148–3155. doi: 10.1021/acssynbio.3c00282. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Huddy T.F., Hsia Y., Kibler R.D., Xu J., Bethel N., Nagarajan D., Redler R., Leung P.J.Y., Weidle C., Courbet A., et al. Blueprinting extendable nanomaterials with standardized protein blocks. Nature. 2024;627:898–904. doi: 10.1038/s41586-024-07188-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Agarwal V., McShan A.C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024;20:950–959. doi: 10.1038/s41589-024-01638-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Monteiro Da Silva G., Cui J.Y., Dalgarno D.C., Lisi G.P., Rubenstein B.M. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat. Commun. 2024;15:2464. doi: 10.1038/s41467-024-46715-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Meng L., Chan W.S., Huang L., Liu L., Chen X., Zhang W., Wang F., Cheng K., Sun H., Wong K-C. Mini-review: Recent advances in post-translational modification site prediction based on deep learning. Comput. Struct. Biotechnol. J. 2022;20:3522–3532. doi: 10.1016/j.csbj.2022.06.045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Zhang T., Liu C., Li W., Kuang J., Qiu X., Min L., Zhu L. Targeted protein degradation in mammalian cells: A promising avenue toward future. Comput. Struct. Biotechnol. J. 2022;20:5477–5489. doi: 10.1016/j.csbj.2022.09.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Wang M., Zhang Z., Singh Bedi A., Guerra S., Lin-Gibson S., Cong L., Chakraborty S., Qu Y., Ma J., Xing E., et al. A call for built-in biosecurity safeguards for generative AI tools. 2025:845–847. doi: 10.20944/preprints202503.1761.v1. [DOI] [PubMed]
  • 39.Baker D., Church G. Protein design meets biosecurity. Science. 2024;383:349. doi: 10.1126/science.ado1671. [DOI] [PubMed] [Google Scholar]
  • 40.National Institutes of Health. National Institutes of Health (NIH) Office of Science Policy (OSP): Proposed changes to the NIH Guidelines for Research Involving Recombinant or Synthetic Nucleic Acid Molecules (NIH Guidelines). https://www.federalregister.gov/documents/2023/08/10/2023-17178/national-institutes-of-health-nih-office-of-science-policy-osp-proposed-changes-to-the-nih, 2023 (accessed 8 September 2025).
  • 41.Chief Scientist and Science Division (SCI), Global guidance framework for the responsible use of the life sciences: Mitigating biorisks and governing dual-use research. https://www.who.int/publications/i/item/9789240056107, 2022 (accessed 8 September 2025).
  • 42.Secretariat of the Convention on Biological Diversity. Cartagena Protocol on Biosafety to the Convention on Biological Diversity: text and annexes. https://www.cbd.int/biosafety/, 2000 (accessed 8 September 2025).
  • 43.Centre for Long-Term Resilience. Biological tools and the EU AI Act. https://www.longtermresilience.org/reports/biological-tools-and-the-eu-ai-act/, 2025 (accessed 8 September 2025).
  • 44.Cao C. China’s evolving biosafety/biosecurity legislations. J. Law Biosci. 2021;8 doi: 10.1093/jlb/lsab020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.The White House. Memorandum on advancing the United States’ leadership in artificial intelligence; Harnessing artificial intelligence to fulfill national security objectives, and fostering the safety, security, and trustworthiness of artificial intelligence. https://bidenwhitehouse.archives.gov/briefing-room/presidential-actions/2024/10/24/memorandum-on-advancing-the-united-states-leadership-in-artificial-intelligence-harnessing-artificial-intelligence-to-fulfill-national-security-objectives-and-fostering-the-safety-security/, 2024 (accessed 8 September 2025).
  • 46.Boiko D.A., MacKnight R., Kline B., Gomes G. Autonomous chemical research with large language models. Nature. 2023;624:570–578. doi: 10.1038/s41586-023-06792-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Karr J.R., Sanghvi J.C., Macklin D.N., Gutschow M.V., Jacobs J.M., Bolival B., Assad-Garcia N., Glass J.I., Covert M.W. A whole-cell computational model predicts phenotype from genotype. Cell. 2012;150:389–401. doi: 10.1016/j.cell.2012.05.044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Macklin D.N., Ahn-Horst T.A., Choi H., Ruggero N.A., Carrera J., Mason J.C., Sun G., Agmon E., DeFelice M.M., Maayan I., et al. Simultaneous cross-evaluation of heterogeneous E. coli datasets via mechanistic simulation. Science. 2020;369 doi: 10.1126/science.aav3751. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Jiang K., Yan Z., Di Bernardo M., Sgrizzi S.R., Villiger L., Kayabolen A., Kim B.J., Carscadden J.K., Hiraizumi M., Nishimasu H., et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science. 2025;387 doi: 10.1126/science.adr6006. [DOI] [PubMed] [Google Scholar]
  • 50.Garenne D., Haines M.C., Romantseva E.F., Freemont P., Strychalski E.A., Noireaux V. Cell-free gene expression. Nat. Rev. Methods Primers. 2021;1:49. doi: 10.1038/s43586-021-00046-x. [DOI] [Google Scholar]
  • 51.Dauparas J., Lee G.R., Pecoraro R., An L., Anishchenko I., Glasscock C., Baker D. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods. 2025;22:717–723. doi: 10.1038/s41592-025-02626-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Biosafety and Health are provided here courtesy of Elsevier
