Deep learning–driven protein binder design for crop improvement

Muhammad Salman Iqbal; Revocatus Bahitwa; Abdul Ali Azam; Hui Xu; Hai Wang

2025 Dec 13;7(1):100018. doi: 10.1016/j.abiote.2025.100018

PMCID: PMC12973395 PMID: 41940160

Abstract

Deep learning (DL) now enables the end-to-end design of protein binders—proteins that bind to specific targets—to precisely modulate protein–protein interactions (PPIs). Models and tools such as BindCraft, AlphaFold, RoseTTAFold, RFdiffusion, and ProteinMPNN predict the structures of these binders and their targets, generate binder sequences, and refine their binding interfaces with increasing accuracy. Most progress so far has been in the therapeutics field, where de novo protein binders are now common targets for design and testing. In plant biology, however, applications remain early and scattered, and clear, crop-focused guidance that links design choices to practical testing is limited. This review explores the potential of protein binders by outlining an end-to-end pipeline for DL-driven protein binder design in plants. The pipeline covers target selection in key pathways for disease resistance, stress tolerance, and plant development; protein binder generation using current DL tools; and tiered in silico screening of the best binders using interface quality and stability metrics. We connect these steps to laboratory practice, including initial binding assays, production in suitable systems, and early in planta evaluation before stable transformation. Current constraints are also discussed, including uncertainty in affinity prediction and the limited structural information available for many plant proteins, as well as practical approaches to mitigate risk during target selection. Together, this plant-focused synthesis illustrates how DL-driven protein binder design can be applied to crop engineering and highlights the work still required to move from early demonstrations of success cases to robust agricultural use.

Keywords: AI in agriculture, Crop breeding, Deep learning, Precision agriculture, Protein binder design, Synthetic biology

1. Introduction

Protein binders are protein molecules that bind to target proteins; they are therefore powerful regulators of protein activity with wide-ranging applications in biotechnology, diagnostics, and therapeutics [1]. In agriculture, they can enhance disease resistance, improve tolerance to abiotic stresses, and regulate key processes such as flowering and nutrient uptake, ultimately contributing to increased crop production, improved climate resilience, and greater sustainability [2–4]. In parallel, machine learning (ML)-assisted genome editing is beginning to improve the selection of target genes for genome editing and of editing systems for a given outcome in crops, demonstrating that artificial intelligence (AI)-aided approaches can support trait improvement at multiple molecular levels [2]. However, traditional approaches to designing protein binders are slow and inefficient. These approaches address the protein inverse folding problem, that is, predicting sequences that bind a target structure, by using physics-based energy functions (e.g., in Rosetta's Dock-and-Optimize or Hotspot-Centric approaches) and template-based grafting. They often rely on iterative mutagenesis, library screening, and structural validation, and are typically associated with low success rates (<0.1 %) [5–7]. In addition, methods based on template structures or homology modeling frequently fail to target proteins with novel folds or those acting in crop-specific pathways, limiting their usefulness for advancing agricultural innovation [8].

Recent advances in deep learning (DL)-driven protein binder design have transformed the de novo development of high-affinity protein binders with unprecedented speed and precision [9–11]. For example, Bennett et al. showed that combining traditional de novo binder design outcomes from the molecular modeling software Rosetta with DL tools (AlphaFold2 [AF2]/RoseTTAFold for fold and interface filtering, and ProteinMPNN using a message-passing neural network [MPNN] for sequence design) increased success rates by approximately 10-fold, and in two practical cases yielded 8–30-fold higher success rates for the identification of a protein binder than physics-based methods alone [12]. Furthermore, a generative diffusion model (RoseTTAFold diffusion, or RFdiffusion) achieved success rates on certain targets roughly 100 times greater than previous approaches and produced experimentally validated protein binders with atomic-level agreement to design models [13]. Recent meta-analyses indicate that the screening burden has now been reduced to fewer than ∼100 design cycles per target in some cases, compared to the thousands previously required with traditional methods [14]. DL tools such as AF2 enable the accurate prediction of 3D protein structures, whereas generative models such as RFdiffusion and DiffPepBuilder can design diverse protein and peptide backbones tailored to specific targets. Additionally, platforms such as AI-Bind have simplified the prediction of protein–protein interactions (PPIs) [15–17]. Despite these advances, most applications remain focused on human therapeutics such as antibodies, nanobodies, and synthetic scaffolds used to treat cancer, autoimmune diseases, and other disorders [7,12,18]. By contrast, plant-related applications are still in their infancy [3], with only a few studies beginning to explore their potential in crop species (Table 1) [2,19,20].
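
To make the effect of these success rates on screening effort concrete, the short calculation below estimates how many designs must be tested to obtain at least one binder with a chosen probability. It is a minimal sketch that assumes every design succeeds independently with the same per-design success rate, which is a simplification; the function name and the 95 % confidence level are illustrative choices, and the two example rates (0.1 % and a 10-fold higher 1 %) are taken from the figures quoted above.

```python
import math


def designs_needed(success_rate: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one hit among n designs) >= confidence.

    Assumes independent designs with identical success probability, so
    P(no hit in n designs) = (1 - success_rate) ** n.
    """
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - success_rate))


# Traditional physics-based design at a 0.1 % success rate
print(designs_needed(0.001))
# The same calculation after a 10-fold improvement in success rate
print(designs_needed(0.01))
```

Under these assumptions, a 10-fold gain in success rate lowers the number of designs to test from roughly 3,000 to roughly 300, which is why DL-based filtering changes what is experimentally practical.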

Table 1.

Recent contributions to protein binder design.

Year	Authors (model)	Work	Area	Results	Reference
2018	Hashemifar et al. (DPPI)	DL-based sequence PPI prediction (DPPI)	Sequence-based PPI (crops, interspecies)	94.6 % accuracy, better scaling than peers	[63]
2021	Baek et al. (RoseTTAFold)	Three-track model for protein structure (RoseTTAFold)	Protein structure and PPI prediction	Top 2 in CASP14, limited by hardware	[39]
2023	Watson et al. (RFdiffusion application)	Fungi–rice PPI prediction using RFdiffusion	Crop–pathogen PPI (fungal disease)	78 % recall, short protein binders working best	[20]
2023	Bennett et al. (DL-augmented protocol)	Protein binder validation with AF2/RF2	General protein binder design (crop targets)	10 × improved success over energy-based methods	[28]
2024	Pacesa et al. (BindCraft)	AF2-based protein binder generation pipeline (BindCraft)	Efficient protein binder generation for protection	Up to 100 % success, top EGFR protein binder	[4]
2024	Wang et al. (DiffPepBuilder)	Peptide binder design via SE(3) diffusion (DiffPepBuilder)	Peptide binders for plant proteins	Cyclic peptides are best, with strong recall	[32]
2024	Chatterjee et al. (moPPIt)	Motif-based peptide design via pLMs (moPPIt)	Targeting IDP/effectors in crops	Effective on IDPs, structure-free	[35]
2024	Zambaldi et al. (AlphaProteo)	High-affinity protein binder generation (AlphaProteo)	De novo protein binder design (plant defense)	Sub-nM affinity, high thermal stability	[21]
2024	Sayeed et al. (Prot42)	Sequence-only protein binder design via pLMs (Prot42)	Design without 3D data (plants/fungi)	Handles 8k aa, no structure needed	[31]
2024	Liu et al. (RFdiffusion eval)	Functional evaluation of RFdiffusion protein binders	Diagnostic protein binder assessment	Low detection success in assays	[64]
2024	Baidu Team (HelixDesign-Binder)	Automated protein binder design (HelixDesign-Binder)	Scalable design for crop targets	Consistent outputs, needs HPC	[36]
2025	Ünal et al. (Adaptyv EGFR Competition)	EGFR binder competition (Adaptyv)	Benchmarking DL protein binder design methods	601 tested, best K_d = 1.2 nM	[23]
2025	Strom & Luck et al. (PPI-ID)	Interface-focused PPI tool (PPI-ID)	PPI interface detection (plant–pathogens)	Boosts AlphaFold-M accuracy on PPIs	[65]

CASP14, the 14th Critical Assessment of protein Structure Prediction, a biennial blind community challenge (2020) in which groups predict protein structures from sequences and are scored against unreleased experimental structures; IDPs, intrinsically disordered proteins, which lack a fixed structure (here, fungal pathogen effectors that nevertheless manipulate host targets through flexible, transient binding); pLMs, protein language models; HPC, high-performance computing.

Designing precise synthetic protein binders using DL offers promising strategies to enhance crop performance. These protein binders can inhibit key fungal effectors from plant pathogens [20], modulate regulators of flowering and senescence to optimize reproductive timing and seed production [3], or refine nutrient uptake by transporters to improve fertilizer use efficiency while mitigating effects on the environment [3]. Furthermore, the design of DL-based protein binders enables the rational engineering of new crop traits by controlling plant-specific signaling pathways [21], optimizing root architecture for improved water and nutrient uptake, and adjusting circadian rhythms to enhance resonance with local photoperiods [3].

This review aims to explore the DL-based principles of protein binder design, recent technological advancements, and the potential of protein binders to drive crop improvement. We highlight opportunities for targeting crop-specific proteins and pathways, discuss existing challenges, and identify strategies to overcome these challenges. As ML and synthetic biology converge, DL-driven protein binder design is poised to support a new era of precision agriculture, enabling the development of more productive, climate-resilient, and sustainable crop systems [22].

2. Fundamentals of protein binder design

2.1. Principles of molecular recognition in protein binder design

Recent advances in DL have transformed protein binder design by enabling models to learn the complex principles of molecular recognition that underlie PPIs [23]. These interactions are primarily governed by geometric and chemical complementarity at the binding interfaces, driven by forces such as van der Waals interactions, hydrophobic effects, and electrostatic interactions [8,24,25]. Within these interfaces, specific regions known as hotspots contribute disproportionately to binding energy and are crucial for achieving high-affinity interactions [24,26]. Advanced DL models such as RFdiffusion and AlphaProteo are trained on large-scale structural and sequence datasets, allowing them to implicitly learn these subtle patterns. This capability enables the de novo design of protein binders by exploring large sequence and structure spaces and optimizing binding interfaces without requiring explicit prior knowledge of target sites [27,28].

2.2. Types and structures of engineered protein binders

Engineered protein binders encompass a diverse array of proteins beyond conventional antibodies, including nanobodies, designed ankyrin repeat proteins (DARPins), monobodies, affibodies, anticalins, miniproteins, and peptides [29,30]. DL models such as RFdiffusion and AlphaProteo facilitate the de novo design of protein binders with a wide range of backbone topologies, expanding the structural diversity of synthetic protein binders [4,5,31]. For example, heterochiral D-proteins, which are synthetic miniproteins composed entirely of D-amino acids that form highly specific D↔L binding interfaces with natural L-targets such as TrkA (NGF-binding site) and human IL-6 (IL-6R-binding site), have been constructed as stable miniproteins comprising 50–65 residues [25]. DL-based approaches are also well-suited for designing both linear and cyclic peptides, typically 8–30 residues in length, using tools such as DiffPepBuilder. Cyclic peptides, in particular, offer advantages over linear peptides, including lower steric hindrance, lower production costs, and improved cell membrane permeability [32,33]. In plants, many endogenous signaling peptides fall into a similar size range and regulate growth, nutrient balance, and stress responses, highlighting the importance of small peptides as natural regulatory molecules in crop systems [34].

3. A new era of protein binder design

Protein binder design aims to precisely modulate protein function and offers powerful strategies for enhancing crop resistance to diseases. A typical pipeline involves predicting protein binder structures, optimizing binding specificity between a target protein and its designed binder, and understanding the biological roles of the target proteins [35,36]. It is inherently iterative, combining computational modeling, experimental refinement, and rigorous validation to produce highly effective protein binders for use in therapeutics and agricultural biotechnology [5,24,25,37].

3.1. Modern pipelines for DL-driven protein binder design

The integration of DL into protein science has improved the performance of existing protein binders and enabled the fully de novo creation of entirely novel protein binders [38]. DL enhances nearly every step of the design procedure, from target selection to complex structure prediction and PPI modeling, dramatically increasing both speed and efficiency.

3.1.1. Target selection and target structure

A protein target is chosen based on its relevance to growth or resistance pathways, after which a specific biological objective should be defined, for example, modulating a specific PPI [4,26,37]. The target protein is retrieved or verified from curated sources such as UniProt, Protein Data Bank (PDB) records, or from the literature. If the 3D structure of the protein is not available, it should be predicted from the sequence using state-of-the-art DL structure predictors such as AF2 or RoseTTAFold2 (RF2) (Fig. 1A) [12,39]. This step provides the structural context needed for site selection and downstream design, replacing slower physics-only methods and reducing early attrition [4,5,28].

Fig. 1 — Deep learning–driven protein binder design for crop improvement. A Target identification and structure prediction: Key plant proteins involved in growth or resistance pathways are selected, and their sequences are obtained from the literature or databases. When structural information is unavailable, DL tools such as AlphaFold are used for structure prediction. B Protein binder structure and sequence design: Binding sites within the target protein are identified, and the 3D structures of the intended protein binder are generated and refined using generative models such as RoseTTAFold or RFdiffusion. The structures of the resulting designed protein binders are converted into amino acid sequences using sequence design tools such as ProteinMPNN. C Computational and experimental validation: Candidate protein binders are filtered computationally and experimentally validated using screening platforms such as yeast surface display (YSD) assays and quantitative biophysical assays such as surface plasmon resonance (SPR). Validated protein binders are reverse-translated into DNA sequences for vector construction. D Genetic engineering and plant transformation: Recombinant vectors are introduced into *Agrobacterium tumefaciens* for transfer into target plant tissues. Transgenic plants are regenerated through tissue culture and tested for the accumulation of the synthetic protein binder, before being evaluated for improved growth and resistance traits (Created with BioRender. Wang, H (2025) https://BioRender.com/yxtdon9).

3.1.2. Binding site, sequence, and structure of the protein binder

The binding site on the target protein should be defined by using structure-aware tools to identify PPI-prone patches and hotspots. AF2 or AlphaFold (AF)-Multimer can reveal compatible interfaces, whereas surface-analysis approaches, such as the geometry-based pocket detector CASTp and the geometric DL method Molecular Surface Interaction Fingerprinting (MaSIF)-site, identify interaction-ready surface fingerprints, including those on challenging or less structured regions [24,35,40]. The protein binder backbone and the protein binder–target complex are then generated by employing generative structure models to propose compatible scaffolds for the protein binder that can dock to the selected site. RFdiffusion- and RoseTTAFold-based protocols iteratively sample protein binder conformations and interface poses (the specific relative orientations and positions of the binder and target proteins at their contact surface) directly in 3D structure space, enabling de novo design by denoising random noise as well as the recovery of known topologies (Fig. 1B) [20,35,41,42]. Approaches such as BindCraft use backpropagation through AF2 to hallucinate protein binders while allowing flexibility in both partners: they start from random (or fuzzy) sequences and optimize them by backpropagating through AF2-Multimer so that the model predicts a high-confidence fold and a tight target-binding interface; in other words, they make AF2 “imagine” a new binder that would bind the target [4]. Backbones generated by diffusion are, at this stage, shapes without an assigned amino acid sequence. Therefore, ProteinMPNN is used to assign sequences that stabilize both the fold and the interface [12,23]. For peptide binders, DiffPepBuilder can co-design their structure and sequence while incorporating stabilizing motifs such as disulfide bonds [32]. Emerging systems such as AlphaProteo provide ready-to-test designs with improved affinities and higher positive hit rates when empirically tested [21].

3.1.3. From in silico and laboratory validation to reverse translation

The designs for different protein binders are ranked to prioritize candidates for laboratory testing, using interface and complex quality metrics such as the local distance difference test (LDDT), predicted LDDT (pLDDT), predicted aligned error (pAE), interface energies, and surface complementarity between the two partners. Experimental assessment involves screening selected protein binders for binding and production properties using display-based assays or similar platforms. One example is the yeast surface display (YSD) assay, which enables directed evolution of protein binders by expressing candidate proteins on the yeast cell surface for high-throughput screening and selection. Surface plasmon resonance (SPR) is a label-free, real-time biosensor in which one partner is immobilized on a gold chip and the other flows over it; binding changes the local refractive index, shifting the plasmon signal to yield affinity and kinetics. SPR can be used to measure k_on (binding rate constant), k_off (unbinding rate constant), the dissociation constant K_d, and thermodynamic parameters (ΔG, ΔH, ΔS) for shortlisted designs, with results fed back to the steps in Fig. 1B as needed. A validated protein or peptide sequence is then reverse-translated into a DNA sequence codon-optimized for the intended crop species, and the resulting sequence is cloned into suitable vectors supporting expression in and transformation of the intended host system (Fig. 1C).
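
The relationship between the kinetic constants and the affinity measured by SPR can be illustrated with a short calculation. The sketch below assumes simple 1:1 binding, for which K_d = k_off / k_on, and converts K_d into a standard binding free energy using ΔG = RT ln(K_d / 1 M). The function names, the temperature of 298.15 K, and the example rate constants are illustrative assumptions and are not values from any study cited here.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_d in molar for 1:1 binding; k_on in 1/(M*s), k_off in 1/s."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol, relative to a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical binder: fast association, slow dissociation
kd = dissociation_constant(k_on=1e5, k_off=1e-4)
dg = binding_free_energy(kd)
print(f"K_d = {kd:.1e} M, dG = {dg:.2f} kcal/mol")
```

With these example inputs the binder has a nanomolar K_d, corresponding to a binding free energy of about -12.3 kcal/mol. Each 10-fold improvement in K_d changes ΔG by roughly -1.4 kcal/mol at room temperature, which gives a sense of the accuracy a scoring function would need to rank designs by affinity.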

3.1.4. Plant delivery and trait realization

The above expression cassette subcloned into a binary vector is introduced into Agrobacterium tumefaciens for plant transformation using the most appropriate transformation protocol, which may involve tissue culture to regenerate transgenic plants. The accumulation of the designed protein binder should be confirmed; pathogen resistance or growth outcomes should then be assessed and quantified in seedlings and mature plants. Functional protein binders that enhance disease resistance or growth traits by modulating the intended PPI network will be identified and retained (Fig. 1D).

4. In silico evaluation and in vitro validation

The final stage of protein binder design involves comprehensive validation, which combines in silico filtering with in vitro experimental testing to identify the most promising protein binders.

4.1. Computational filtering

Computational filtering serves as a critical preliminary step to prioritize candidates for laboratory validation. DL models, particularly AF2 and RoseTTAFold, are widely used to evaluate whether designed protein binders are likely to adopt stable structures and effectively engage their targets (Fig. 1C) [12]. AF2-based filtering has significantly improved design success rates by assessing the structural plausibility of predicted complexes [4,12]. Key metrics used for evaluation include pLDDT, interface predicted template modeling score (i_pTM), and i_pAE (interface pAE), which are effective for binary classification of protein binders as either plausible or poor candidates (Table 2). However, these scores do not always strongly correlate with actual binding affinities. More advanced models, such as the multimodal language models Evolutionary Scale Modeling version 3 (ESM3) and ESM Cambrian (ESM-C), have shown improved predictive power by correlating log-probability scores with binding strength, particularly across different binding categories [23]. In parallel, the physics-based scoring methods employed by Rosetta, including assessment of shape complementarity and hydrogen bond saturation, are used alongside DL-derived metrics to refine candidate selection and improve discriminative accuracy [4]. This integrated approach enables the efficient narrowing down of high-quality designs before proceeding to costly and time-consuming experimental validation.
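
As an illustration of how an interface metric is derived from a structure prediction, the sketch below computes an interface pAE from a full pAE matrix. It assumes one common convention, in which i_pAE is the mean pAE over all residue pairs that span the two chains (both directions), and it assumes that the target residues come first in the matrix and the binder residues follow. The function name and the toy matrix are hypothetical, and individual pipelines may define the metric differently.

```python
from typing import Sequence


def interface_pae(pae: Sequence[Sequence[float]], target_len: int) -> float:
    """Mean pAE over inter-chain residue pairs.

    pae is a square matrix (residues x residues) in angstroms; the first
    target_len rows/columns are assumed to belong to the target chain and
    the remainder to the binder chain.
    """
    n = len(pae)
    if any(len(row) != n for row in pae):
        raise ValueError("pAE matrix must be square")
    if not 0 < target_len < n:
        raise ValueError("target_len must split the matrix into two chains")
    values = [
        pae[i][j]
        for i in range(n)
        for j in range(n)
        # keep a pair only if exactly one residue lies in the target chain
        if (i < target_len) != (j < target_len)
    ]
    return sum(values) / len(values)


# Toy complex: two target residues followed by one binder residue
toy_pae = [
    [0.0, 1.0, 8.0],
    [1.0, 0.0, 6.0],
    [7.0, 5.0, 0.0],
]
print(interface_pae(toy_pae, target_len=2))  # mean of 8, 6, 7 and 5
```

Because the intra-chain blocks are excluded, a binder that is confidently folded on its own but poorly placed relative to the target still receives a high (poor) i_pAE.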

Table 2.

Evaluation metrics for the validation of protein binder design.

Metric name	Description	Application area	Best value	Reference
Predicted ΔG (FoldX)	Estimated binding free energy between the protein binder and the target	Protein binder design (FoldX)	Lower is better	[36]
Interface hydrophobicity	Apolar–apolar contact ratio at the binding interface	Protein binder interface quality	Higher is better	[36]
TM-score	Fold similarity of structures	Accuracy of structures	>0.5 good, ∼1.0 ideal	[6]
RMSD	Mean atom-wise deviation from target	Accuracy of structures	<2 Å ideal	[6]
GDT	% residues aligned within cutoff	Structure match quality	Higher is better	[66]
LDDT	Local atomic accuracy	Structure detail confidence	>70 good	[33]
pTM	Complex formation confidence	Overall protein binder complex	>0.7 good	[35]
i_pTM	Confidence in interface formation	Protein binder–target interaction	>0.7 good	[35]
K_d	Binding strength of protein binder–target	Interaction strength	Lower is better (nM–pM)	[67]
L-RMSD	Deviation of ligand/peptide structure	Ligand structure fit	<2 Å ideal	[32]
ddG (ΔΔG)	Change in binding/stability upon mutation	Protein binder stability	Lower is better	[32]
pLDDT	Per-residue confidence	Structure confidence	>70 good	[66]
pAE	Expected positional error between residues in the protein complex	PPI interface quality	Lower is better	[13,66]
i_pAE	Interface predicted aligned error	PPI interface fit	Lower is better	[66]
Combined loss	Multi-model design loss score	Protein binder model selection	Lower is better	[33]
Contact Molecular Surface	Weighted molecular contact area	Interface quality	Higher is better	[5]

TM-score, template modeling score; RMSD, root mean square deviation; GDT, global distance test; LDDT, local distance difference test; pTM, predicted template modeling score; i_pTM, interface predicted template modeling score; K_d, dissociation constant (a measure of binding affinity); L-RMSD, ligand RMSD; pLDDT, predicted local distance difference test; pAE, predicted aligned error; i_pAE, interface pAE.
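
The thresholds in Table 2 can be combined into a simple binary filter, as sketched below. The pLDDT (>70) and i_pTM (>0.7) cutoffs follow Table 2; the i_pAE cutoff of 10 Å is an illustrative assumption, because Table 2 only states that lower values are better. The class name, function name, and example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DesignScores:
    name: str
    plddt: float  # per-residue confidence averaged over the binder, 0-100
    i_ptm: float  # interface predicted TM-score, 0-1
    i_pae: float  # interface predicted aligned error, in angstroms


def shortlist(
    designs: Iterable[DesignScores],
    min_plddt: float = 70.0,   # Table 2: >70 good
    min_i_ptm: float = 0.7,    # Table 2: >0.7 good
    max_i_pae: float = 10.0,   # assumed cutoff, not taken from Table 2
) -> List[DesignScores]:
    """Keep designs passing all three cutoffs, best i_pTM first."""
    kept = [
        d for d in designs
        if d.plddt > min_plddt and d.i_ptm > min_i_ptm and d.i_pae < max_i_pae
    ]
    return sorted(kept, key=lambda d: d.i_ptm, reverse=True)


candidates = [
    DesignScores("design_a", plddt=88.0, i_ptm=0.82, i_pae=5.1),
    DesignScores("design_b", plddt=91.0, i_ptm=0.55, i_pae=12.3),
    DesignScores("design_c", plddt=65.0, i_ptm=0.78, i_pae=6.0),
    DesignScores("design_d", plddt=79.0, i_ptm=0.74, i_pae=8.4),
]
print([d.name for d in shortlist(candidates)])
```

A filter of this kind classifies designs as plausible or poor; it does not rank them by expected affinity, since these confidence scores do not always correlate strongly with measured binding.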

4.2. Experimental validation

Despite major advances in computational prediction and filtering, experimental validation remains essential to confirm the structural integrity, binding specificity, and functional performance of designed protein binders toward their intended protein partner.

4.2.1. Binding affinity and kinetics

In addition to SPR, bio-layer interferometry (BLI) is another gold-standard technique for quantifying binding affinities (K_d values) and kinetic parameters (k_on and k_off). BLI is label-free and real-time: a ligand-coated fiber tip is dipped into analyte solutions, and binding thickens the optical layer, causing a wavelength/interference shift from which kinetics and affinity are extracted (Fig. 1C) [4,21]. Molecular dynamics (MD) simulations, including those performed with the software GROningen MAchine for Chemical Simulations (GROMACS) and the toolkit Open Molecular Mechanics (OpenMM), complement these measurements computationally by evaluating the stability of the designed protein binder in complex with its target, which is important for ensuring efficient activity and achieving the desired biological outcome [1]. Additionally, homogeneous time-resolved fluorescence (HTRF) provides a sensitive and high-throughput method for detecting binding signals, particularly in screening contexts [21]. HTRF is a mix-and-read time-resolved Förster resonance energy transfer (TR-FRET) assay: when two partners labeled with donor and acceptor fluorophores bind and come into close proximity, energy transfer produces a time-gated fluorescence ratio proportional to binding, without wash steps.

4.2.2. Structural validation

High-resolution structural analysis is essential for confirming the accuracy of computationally designed protein binders. Techniques such as X-ray crystallography and cryo-electron microscopy (cryo-EM) can be used to determine the 3D structures of protein binder–target complexes, enabling validation of predicted binding modes and overall structural integrity [21]. Circular dichroism (CD) spectroscopy is also commonly used to assess proper folding and the secondary structure content of the designed binders [4].

4.2.3. Functional assays

Beyond binding affinity, the true measure of a protein binder lies in its functional efficacy. Although not yet widely applied in plants, designed protein binders have demonstrated therapeutic and biotechnological utility across diverse applications, including inhibiting IgE-allergen interactions, modulating clustered regularly interspaced short palindromic repeats associated nuclease 9 (CRISPR-Cas9) activity, neutralizing bacterial enterotoxins, and redirecting adeno-associated virus (AAV) capsids for targeted gene delivery. For antiviral applications, virus neutralization assays, such as those used for SARS-CoV-2, assess the ability of protein binders to block infection by targeting viral proteins [21].

4.2.4. Specificity and accumulation

Specificity is typically assessed through all-by-all binding assays and competition experiments using known competitive protein binders. To confirm the binding mode and the contribution of key interactions, site-directed mutagenesis changing specific amino acids (aa) within the binding interface is performed, followed by functional analysis [21]. Peptide production and solubility are also critical for the practical utility of designed protein binders, and these are commonly evaluated using techniques such as SDS-PAGE and protein quantification assays [21]. Additionally, YSD assays offer a high-throughput platform to assess both the accumulation and the binding of a candidate protein binder during the early stages of screening (Fig. 1C).

The integration of DL with advanced computational and experimental techniques marks a major step toward realizing a “one design, one binder” paradigm in protein engineering [4]. Although challenges remain, particularly in accurately predicting binding affinity in silico and targeting structurally complex or elusive binding sites, ongoing advancements are steadily improving these capabilities. This progress holds considerable promise for diverse biological applications, including the precise control of protein functions, which is especially valuable for crop improvement and agricultural innovation.

5. Working example for designing a protein binder to a fungal effector protein

To design a small protein binder (50–120 aa) targeting a secreted fungal effector protein implicated in cereal crop disease, two inputs are required: (i) the structure of the receptor or the effector target, if these are different, obtained either from an experimental PDB entry or an AF2-predicted model of the effector (monomer), and (ii) optional coarse binding-site information derived from the literature or conservation analysis, if available.

5.1. Steps 1–5

Step 1
RFdiffusion (backbone/interface first): The RFdiffusion task “binder design” should be run to generate backbones for candidate protein binders positioned on the effector surface, using repository defaults for the number of trajectories and samples. This step will generate 100–1000 backbones for downstream triage [13].
Step 2
Sequence design: For each backbone, a peptide sequence should be designed using ProteinMPNN (with default temperature and backbone noise settings, enabling interchain coupling for heteromers). The output will be 1–5 sequences per backbone. ProteinMPNN is state-of-the-art for robust sequence design on fixed backbones [43].
Step 3
Structure/complex prediction: Each sequence designed in the previous step should be predicted in complex with the target using AlphaFold-Multimer (5 models per run). Designs should be retained when they show (i) high and consistent i_pTM values across recycles and models, (ii) a well-packed interface, and (iii) low pAE in the interface region.
Step 4
Sequence-level filters (specificity and developability): Designs that contain low-complexity regions, repetitive sequences, extreme pI values, or predicted aggregation motifs should be removed. A diverse set of sequences and interface poses should be retained.
Step 5
Minimal in vitro plan: The DNA sequences encoding the top ∼20 designs should be synthesized and cloned into an expression vector. The resulting candidate proteins should be produced and purified before being used for screening in pull-down assays or BLI/SPR. Candidates that bind to their intended target should be subjected to another round of design to iterate the design cycle.
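
The sequence-level filters in Step 4 can be sketched as follows. This is a simplified illustration, not a validated developability screen: low complexity is approximated by Shannon entropy in a sliding window, repetitiveness by counting recurring 4-mers, and extreme pI by a crude net-charge fraction that counts only Lys/Arg and Asp/Glu. All function names and thresholds are assumptions chosen for the example, apart from the 50–120 aa length range stated above; aggregation-motif prediction and diversity selection are not covered.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the residue composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def has_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.0) -> bool:
    """True if any window of the sequence falls below min_entropy bits."""
    if len(seq) <= window:
        return shannon_entropy(seq) < min_entropy
    return any(
        shannon_entropy(seq[i:i + window]) < min_entropy
        for i in range(len(seq) - window + 1)
    )


def has_repeats(seq: str, k: int = 4, max_occurrences: int = 2) -> bool:
    """True if any k-mer occurs more than max_occurrences times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return any(c > max_occurrences for c in counts.values())


def charge_fraction(seq: str) -> float:
    """(Lys + Arg - Asp - Glu) / length; a rough stand-in for extreme pI."""
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return (positive - negative) / len(seq)


def failed_filters(seq: str, min_len: int = 50, max_len: int = 120,
                   max_abs_charge: float = 0.2) -> List[str]:
    """Return the names of the filters a sequence fails (empty list = pass)."""
    reasons = []
    if not min_len <= len(seq) <= max_len:
        reasons.append("length")
    if has_low_complexity(seq):
        reasons.append("low_complexity")
    if has_repeats(seq):
        reasons.append("repeats")
    if seq and abs(charge_fraction(seq)) > max_abs_charge:
        reasons.append("charge")
    return reasons


# Arbitrary 70-residue example sequence, used only to exercise the filters
example = ("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSG"
           "AEKAVQVKVKALPDAQFEVV")
print(failed_filters(example))
print(failed_filters("A" * 60))
```

Returning the list of failed filters, instead of a single pass/fail flag, makes it easier to see which criterion removes most designs and to adjust thresholds before committing to DNA synthesis.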

6. End-to-end pipelines and tools

The advent of DL has driven the development of integrated, end-to-end pipelines that streamline the entire procedure for protein binder design, from target selection to candidate generation and evaluation [44]. These pipelines unify key computational steps, including target structure input, sequence and structure generation, and in silico validation, into a cohesive and automated system [4]. By replacing traditionally time-consuming and low-throughput experimental steps with high-efficiency computational screening, these pipelines enable rapid assessment of binding affinity, stability, and structural compatibility, significantly accelerating protein binder discovery.

6.1. BindCraft: backpropagation through AF2

One of the most innovative platforms in modern protein binder design is BindCraft, which integrates the predictive power of AF2 directly into the design pipeline [4]. Beginning with the 3D structure of a target protein, BindCraft uses backpropagation through AF2 to hallucinate novel protein binder sequences and interfaces tailored to the target [7]. This iterative process optimizes candidate sequences based on AF2-derived confidence metrics, predicted molecular interactions, and specific design objectives [45]. After the initial generation of candidates, ProteinMPNN is employed to optimize both core and surface residues while maintaining the designed binding interface [4]. Final candidates are evaluated using a combination of AF2 monomeric confidence scores and energy-based metrics calculated by Rosetta to assess structural integrity and interaction quality [45]. Designed for broad accessibility, BindCraft reduces the need for deep computational expertise and supports flexible target backbone handling, enabling it to model binding-induced conformational changes, which is a key advantage when working with dynamic protein systems.

6.2. Modular backbone and sequence design approach with RFdiffusion, ProteinMPNN, and AlphaProteo

A widely used strategy in DL-driven protein binder design decouples backbone generation from sequence optimization. In this modular approach, RFdiffusion first generates a protein backbone by starting from random noise and iteratively refining it into a structure that fits the target interface (Fig. 1B) [20]. The backbone is then passed to ProteinMPNN, which predicts amino acid sequences likely to fold into the given structure while preserving target interaction potential. Although more accurate than earlier models, this method often requires generating thousands of designs to identify high-quality protein binders. A similar two-step framework is implemented in AlphaProteo, which combines models that learn sequence–structure relationships with downstream evaluation using AlphaFold, selecting candidates that perform well in in silico assessments [21]. To facilitate the choice between RFdiffusion and AlphaProteo, Table 3 provides a side-by-side comparison of their end-to-end binder pipelines: how each generates initial backbones, couples backbone generation to sequence design, and ranks candidates; their main strengths, limitations, and typical use cases; and where the two approaches are complementary or where one may be preferable depending on the target and design goals.

Table 3.

Practical comparison of RFdiffusion and AlphaProteo features for protein binder design.

Dimension	RFdiffusion (backbone diffusion)	AlphaProteo (end-to-end generative)
Design paradigm	Generates protein binder backbones to a user-defined epitope; then sequence design (e.g., ProteinMPNN) and AF2/ColabFold re-ranking	Produces protein binders end-to-end from learned sequence–structure priors; AF-based scoring used for selection
Binding-site control	High (user sets hotspot residues; precise epitope/geometry steering)	Moderate (interface emerges from the model; less direct epitope steering)
Key parameters/knobs	Protein binder length sweep ∼60–120 aa; hundreds of trajectories; set 'hotspot_res'; ProteinMPNN temp ∼0.1–0.3 (5–10 seq/backbone); AF2 multi-seed + extra recycles; select by i_pTM/interface pLDDT/pAE	Fewer exposed knobs; rely on dataset/target choice, then AF-based complex confidence/predicted affinity for down-selection
Computational efficiency	Moderate; cost scales with trajectories and AF2 re-ranking; easy to batch/parallelize	Moderate to high overall; screening phase is efficient once models are set up
Accuracy/hit rate	High when hotspots are well-chosen and AF2 filtering is strict; strong prospective gains vs. physics-only baselines	High stringency reported (including sub-nM cases) with good stability; strong experimental breadth
Data requirements	Target structure/model plus interface hints ideal (AF-Multimer for complexes to define hotspots)	Target structure/model helpful but not always required for initial proposals; AF-based rescoring recommended
Applicable target types	Defined epitopes, edges/concavities; improved handling of β-edge interfaces with newer recipes; can struggle on very flat/polar patches without extra cues	Broad classes; strong on compact interfaces; less hand-tunable for very specific patch geometries
Strengths	Fine interface control; transparent triage (hotspot → MPNN → AF2); large community know-how and tutorials	End-to-end candidates with strong affinity/stability; fewer manual steps to high-stringency hits
Limitations/caveats	Needs good epitope definition; success tied to AF2 rescoring; more manual tuning	Fewer user-visible parameters; interface shape less controllable; public head-to-head benchmarks still expanding
Code/resources	RFdiffusion + ProteinMPNN + ColabFold/AlphaFold-Multimer; documented flags (hotspots, length sweeps, seeds/recycles)	AlphaProteo paper + materials; AF-based selection downstream; implementation details and user guidance are provided with the article
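
The parameter ranges listed for RFdiffusion in Table 3 (binder length sweep, ProteinMPNN temperature, sequences per backbone, hotspot residues) can be organized into an explicit job plan before any model is run. The following is a minimal sketch of such a plan in Python. The dictionary keys, the hotspot labels, and the step sizes are illustrative assumptions and do not correspond to the command-line flags of RFdiffusion or ProteinMPNN.

```python
from itertools import product

def plan_sweep(hotspots, lengths=range(60, 121, 20),
               mpnn_temps=(0.1, 0.2, 0.3), seqs_per_backbone=8):
    """Enumerate design jobs over binder length and sequence-sampling temperature.

    All field names are hypothetical bookkeeping labels, not tool flags.
    """
    jobs = []
    for length, temp in product(lengths, mpnn_temps):
        jobs.append({
            "hotspot_res": list(hotspots),     # epitope residues chosen by the user
            "binder_length": length,           # within the ~60-120 aa sweep of Table 3
            "mpnn_temperature": temp,          # within the ~0.1-0.3 range of Table 3
            "seqs_per_backbone": seqs_per_backbone,  # within 5-10 per backbone
        })
    return jobs

# Hypothetical hotspot labels (chain A, residues 45, 48, and 52).
jobs = plan_sweep(["A45", "A48", "A52"])
total_sequences = sum(job["seqs_per_backbone"] for job in jobs)
```

Writing the sweep down in this form makes the cost of a campaign visible early, because the number of downstream AF2 re-ranking runs scales with the number of jobs multiplied by the number of trajectories and sequences per backbone.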


6.3. Contrasting design philosophies

These platforms represent distinct yet complementary philosophies in protein binder design. BindCraft leverages AF2 for structure prediction and as an integral component of the design procedure, guiding protein binder generation and interface refinement through backpropagation with the aim of achieving a high-confidence, “one design, one binder” outcome [4]. By contrast, RFdiffusion prioritizes structural complementarity by generating diverse backbone conformations that are subsequently paired with ProteinMPNN for sequence optimization. AlphaProteo adopts a more generalized generative approach, learning sequence–structure relationships and refining outputs through AlphaFold-based scoring. Despite their differences, all three frameworks incorporate specialized loss functions (custom objective terms added to training or optimization that specifically reward good binding designs, not just generic folding), curated datasets tailored to binding interactions, and screening mechanisms to prioritize candidates with favorable folding and binding profiles.

6.4. Accessibility and flexibility

User accessibility varies across platforms, but many, such as BindCraft, prioritize ease of use by offering streamlined inputs and automated pipelines, making advanced protein binder design accessible to non-specialists [4]. At the same time, modular frameworks such as RFdiffusion–ProteinMPNN offer design flexibility, as individual components can be customized or integrated with other tools to meet specific research needs. Despite substantial advances in computational prediction, experimental validation remains essential to test binding affinity, specificity, and biological relevance. Techniques such as SPR, flow cytometry, and functional assays are routinely employed to validate in silico designs. Notably, some platforms, including BindCraft, combine empirical validation with computational scoring, strengthening the connection between predicted performance and real-world functionality.

However, significant challenges persist, particularly in translating computational output into experimentally viable protein binders. Although DL has improved structural prediction and interface design, reliably predicting binding affinities, functional activity, and target specificity remains an open research challenge. Additional considerations include ensuring that the protein binders are soluble, stable, and expressible in biological systems, factors that current models do not always address comprehensively. Moreover, selecting appropriate filtering thresholds and confidence metrics often requires case-specific tuning, depending on the protein target and experimental context. Nevertheless, the field is progressing rapidly toward more robust, integrated, and user-friendly design ecosystems. The growing availability of open-source code, web-based platforms, and Google Colab–compatible tools is democratizing access, enabling wider adoption across agricultural and other biotechnological applications.

7. Practical workflows and case studies

7.1. Case study 1: de novo design via DL backpropagation (BindCraft targeting CLDN1)

The selection phase of this de novo design campaign focused on automatically generating highly functional protein binders. The specific target chosen was Claudin 1 (CLDN1), using its soluble extracellular domain (sCLDN1). The desired functional prediction for the successful protein binder was inhibition of cytotoxicity caused by Clostridium perfringens enterotoxin (CpE) [4].

The computational design phase employed the BindCraft pipeline, which automates all steps through the ColabDesign implementation of AF2. The design trajectory, known as a hallucination, begins with a randomly initialized protein binder sequence and structural input from the target. The central computational step involves repeatedly adjusting the amino acid sequence and asking AF2 to predict how the sequence would fold and bind the target. The pipeline keeps making small changes until the predicted binder adopts a well-folded shape and fits tightly against the target surface. This process explores a continuous sequence landscape across four optimization stages.
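
The idea of optimizing a sequence in a continuous landscape can be illustrated with a toy example. The sketch below is not BindCraft and does not call AF2: the scoring function is a hypothetical stand-in (a random per-position residue preference table) chosen only so that the gradient can be written by hand. What it does show is the general pattern of the hallucination loop, in which a relaxed sequence representation (per-position logits) is updated by gradient steps and then converted to a discrete sequence.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def surrogate_score(probs, prefs):
    """Toy objective standing in for an AF2-derived loss (assumption)."""
    return sum(p * w for prow, wrow in zip(probs, prefs) for p, w in zip(prow, wrow))

def hallucinate(length=12, steps=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_aa = len(AMINO_ACIDS)
    # Hypothetical preference table; a real pipeline would query the predictor.
    prefs = [[rng.gauss(0, 1) for _ in range(n_aa)] for _ in range(length)]
    # Randomly initialized, continuous sequence representation.
    logits = [[rng.gauss(0, 0.01) for _ in range(n_aa)] for _ in range(length)]
    history = []
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        history.append(surrogate_score(probs, prefs))
        for i in range(length):
            expected = sum(p * w for p, w in zip(probs[i], prefs[i]))
            for a in range(n_aa):
                # Gradient of the expected score with respect to logit (i, a).
                logits[i][a] += lr * probs[i][a] * (prefs[i][a] - expected)
    history.append(surrogate_score([softmax(row) for row in logits], prefs))
    # Discretize: take the highest-scoring residue at each position.
    seq = "".join(AMINO_ACIDS[max(range(n_aa), key=row.__getitem__)] for row in logits)
    return seq, history

sequence, score_history = hallucinate()
```

In an actual hallucination run the gradient comes from backpropagation through the structure predictor rather than from a closed-form expression, and the objective combines several confidence and interface terms.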

Following candidate generation, designs undergo rigorous in silico filtering. Metrics include AF2 confidence scores, such as pLDDT and i_pAE, specifically addressing protein binder stability and accuracy of the predicted binding mode [12]. Successful candidates must also satisfy physics-based criteria, such as high Rosetta interface shape complementarity. These filters help mitigate type I failures (misfolding) and type II failures (incorrect binding pose) [4,28].

For the functional prediction and confirmation phase, the highest-ranking designs are produced in E. coli and evaluated experimentally. When designs were tested using sCLDN1, Binder12 demonstrated strong validation with nanomolar affinity. Functional prediction was confirmed when protein binders, including Binder9 and Binder12, inhibited CpE-based cytotoxicity in Sf9 cells expressing CLDN1 in a concentration-dependent manner. The molecular mechanism was verified by showing that preincubation of Binder12 with recombinant purified CLDN1 blocked CpE binding, indicating successful competition for the intended extracellular binding site.

7.2. Case study 2: accurate de novo design using heterochiral (mirror-image) principles

This case study focused on developing therapeutic protein binders composed of d-amino acids—enantiomers of l-amino acids—for natural protein targets assembled from l-amino acids, a strategy that provides increased stability against proteases. Such protein binders are called D-protein binders, and their targets are L-proteins. The selected targets for this approach included Tropomyosin receptor kinase A (TrkA), a receptor for nerve growth factor, and Interleukin-6 (IL-6) [25].

Sun et al. recently demonstrated a mirror-image computational–experimental pipeline for accurate de novo design of heterochiral binders. In their study, all design and screening steps were first carried out in mirror-image space, and the best L-binders were then converted into D-binders for testing against the natural L-targets [25]. Structures of D-TrkA and D-IL-6 were prepared by mirror-imaging the corresponding L-protein structures available from the PDB across the y–z plane, that is, by changing the sign of the x coordinates. Computational design then generated L-protein binders intended to interact with these D-protein targets. The design used the Rotamer Interaction Field (RIF) docking protocol, which docks isolated l-amino-acid side-chain rotamers against the D-protein surface to identify favorable hydrogen-bonding or hydrophobic interaction motifs [5,25].
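
The mirror-imaging step described above amounts to a reflection across the y–z plane. A minimal sketch follows, using four illustrative points rather than real PDB coordinates (an assumption made for brevity). It checks the two properties that make the approach work: interatomic distances are unchanged, whereas handedness, measured here as the sign of a signed volume, is inverted.

```python
import math

def mirror_across_yz(coords):
    """Reflect coordinates across the y-z plane by negating x."""
    return [(-x, y, z) for x, y, z in coords]

def signed_volume(a, b, c, d):
    """Scalar triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[i] * cross[i] for i in range(3))

# Four illustrative points forming a chiral arrangement (not real atoms).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirrored = mirror_across_yz(atoms)

volume_original = signed_volume(*atoms)
volume_mirrored = signed_volume(*mirrored)
distance_original = math.dist(atoms[1], atoms[3])
distance_mirrored = math.dist(mirrored[1], mirrored[3])
```

Because all distances are preserved, any interface geometry designed against the mirrored target is preserved when the whole complex is reflected back, which is the symmetry argument underlying the D-protein binder strategy.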

A key step for high-throughput evaluation involves screening the designed L-protein binders using a YSD assay against the refolded D-protein targets. Successful L-protein binders are enriched across several selection rounds at varying target concentrations. This mirror-image evaluation is advantageous because the YSD assay provides a highly parallel and efficient platform for testing L-protein binders.

For the functional prediction and confirmation phase, each selected L-protein binder is chemically synthesized into its mirror-image D-protein form. Based on symmetry principles, this D-protein binder is predicted to bind the natural L-protein target. For the target protein IL-6, the D-protein binder D-25367-evoL-IL-6 was tested in a cell-based assay, where it effectively inhibited IL-6 signaling, which was monitored by measuring secreted alkaline phosphatase (SEAP) production in transfected HEK-293T cells. Structural prediction accuracy was assessed by X-ray crystallography of the resulting heterochiral complexes (e.g., D-19437-L-Pep-1), followed by comparison of the backbone positions of the alpha-carbon atom (Cα) of each amino acid with those of the model. This analysis gave a Cα–root mean square deviation (RMSD) value of 0.6 Å, indicating that the structures were nearly identical to the design models [25].

8. Common failure patterns and their solutions

8.1. Type I failures (misfolding)

This type of failure occurs when the designed amino acid sequence does not fold into the intended monomeric structure. Solutions include using state-of-the-art structure prediction tools such as AF2 or RF2 to evaluate the fold of the monomer; if the predicted structure deviates substantially from the designed backbone (for example, a high Cα RMSD), the design is rejected [28]. Design protocols may also incorporate metrics such as NetSolP-predicted solubility and usability (NetSolP is a bioinformatics tool that predicts how soluble a protein will be when expressed in E. coli, using only its amino-acid sequence) and low aggregation propensity scores computed with Aggrescan3D to improve predicted solubility and stability, thereby reducing the likelihood of type I failures [23].
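
The Cα RMSD check mentioned above requires that the predicted and designed backbones first be superimposed. A minimal sketch using the Kabsch algorithm with NumPy is given below. The coordinates are illustrative, and the 2.0 Å rejection cutoff is an assumption for demonstration only, since the appropriate value depends on the protocol and target.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """C-alpha RMSD between two equal-length coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                    # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) so reflections are not allowed.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def passes_fold_check(rmsd, cutoff=2.0):
    """Hypothetical cutoff in angstroms; tune per campaign."""
    return rmsd <= cutoff

# Illustrative design model and a rotated plus translated copy of it.
design = [[0, 0, 0], [1.5, 0, 0], [1.5, 1.5, 0], [0, 1.5, 1.0], [2.0, 0.5, 2.5]]
predicted = [[-y + 5.0, x - 3.0, z + 2.0] for x, y, z in design]

rmsd = kabsch_rmsd(design, predicted)
```

Restricting the fit to proper rotations matters in this context, because a mirror-image structure would otherwise superimpose perfectly and be scored as a correct fold.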

8.2. Type II failures (incorrect binding mode)

In this case, the designed protein folds correctly but fails to form the intended binding interface or binds in an unexpected orientation. These errors were a major cause of the low success rates of early de novo design methods. The primary solution is to use DL-derived metrics specific to protein complexes. The pAE for interchain residue pairs (pAE_interaction) computed by AF2 or RF2 is highly predictive of success, and success rates are significantly higher for designs with pAE_interaction <10 [12,28].
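
A minimal sketch of how pAE_interaction can be computed from a predicted aligned error matrix is shown below. It assumes that the binder residues come first in the residue ordering and that the matrix is supplied as a nested list; both are conventions chosen for this example, and the numbers are illustrative. The cutoff of 10 is the value quoted above.

```python
def pae_interaction(pae, binder_len):
    """Mean PAE over interchain residue pairs (both off-diagonal blocks)."""
    n = len(pae)
    binder_to_target = [pae[i][j] for i in range(binder_len) for j in range(binder_len, n)]
    target_to_binder = [pae[i][j] for i in range(binder_len, n) for j in range(binder_len)]
    mean_bt = sum(binder_to_target) / len(binder_to_target)
    mean_tb = sum(target_to_binder) / len(target_to_binder)
    return (mean_bt + mean_tb) / 2

def passes_binding_mode_filter(pae, binder_len, cutoff=10.0):
    return pae_interaction(pae, binder_len) < cutoff

# Illustrative 4-residue complex: residues 0-1 are the binder, 2-3 the target.
example_pae = [
    [0.0, 1.0, 8.0, 10.0],
    [1.0, 0.0, 6.0, 8.0],
    [9.0, 7.0, 0.0, 1.0],
    [11.0, 9.0, 1.0, 0.0],
]
score = pae_interaction(example_pae, binder_len=2)
```

Only the interchain blocks enter the average, so a binder that is confidently folded but placed with low confidence relative to the target is still penalized.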

8.3. Physical/chemical failures

Even when the overall structure is predicted correctly, the interface chemistry may be suboptimal, such as the presence of buried unsatisfied polar atoms (i.e., polar groups that are buried in the protein–protein interface and do not form hydrogen bonds or salt bridges) or poor geometric fit (low shape complementarity). These issues can be addressed by incorporating penalties for buried unsatisfied polar atoms during computational refinement, for example, using Rosetta FastDesign protocols [5,24].

8.4. Low diversity/redundancy

Generative models may produce redundant or insufficiently diverse designs. This issue can be mitigated by calculating a pairwise template modeling (TM) score, defined as the average TM-score of each generated peptide against all others, across the set of generated peptides; lower mean pairwise TM-scores indicate greater structural diversity [32].
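
Given a matrix of TM-scores between all pairs of generated designs, the diversity measure described above reduces to an average over off-diagonal entries. The sketch below assumes that the pairwise TM-scores have already been computed with a structural alignment tool; the values shown are illustrative.

```python
def mean_pairwise_tm(tm):
    """Return per-design mean TM-score against all others, and the overall mean."""
    n = len(tm)
    per_design = [
        sum(tm[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return per_design, sum(per_design) / n

# Illustrative symmetric TM-score matrix for three designs.
tm_matrix = [
    [1.0, 0.3, 0.5],
    [0.3, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
per_design, overall = mean_pairwise_tm(tm_matrix)
```

Designs with a high per-design mean are close to many others in the set and are natural candidates for removal when a more diverse panel is wanted.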

9. Potential of DL-driven protein binders in crop improvement

Protein binders are well established as effective therapeutic modalities (Table 4), and recent advances now show strong potential for designed binders to modulate plant protein–protein interactions in a targeted, plant-specific manner to achieve desirable traits and improve crop performance. Several promising use cases are outlined below.

Table 4.

Examples of real-life applications of DL-driven protein binders.

Protein binder	Target(s)	Applications	Reference
HelixDesign-Binder	IL-7Rα, FGFR2, VirB8, PDGFR, TrkA, InsR	Therapeutic, diagnostic, crop use; higher protein binder affinity; HPC-enabled design	[36]
SHRT	Short-chain α-neurotoxins, using a previously designed consensus toxin (ScNtx) as a representative template	Blocks neurotoxin binding; neutralizes in vitro; enables safer antivenoms	[68]
LNG	Long-chain α-neurotoxins, using α-cobratoxin as a representative target	Blocks nAChR binding; neutralizes toxin; enables safer antivenoms	[68]
BindCraft-designed inhibitor	TEM-171 β-lactamase	Antimicrobial resistance therapy; restoration of β-lactam antibiotic efficacy; DL + evolutionary optimization	[1]
CYTX	Cytotoxins (Naja pallida cytotoxin)	Blocks membrane disruption; neutralizes venom; prevents snakebite damage	[68]
CYTX_B10	Cytotoxins (Naja pallida cytotoxin)	CYTX design; crystallography matched; supports RFdiffusion protein binders	[68]
HA-PD1	PD-L1	Cancer immunotherapy	[69]
GDM_SC2BRD_11,27,104,50,129	SARS-CoV-2 spike S1	Neutralization of SARS-CoV-2 variants; validated via cryo-EM	[21]
BindCraft-designed protein binder	Birch allergen	Lower IgE binding in patient-derived samples	[4]
GDM_VEGFA_54,71	VEGF-A	Inhibition of VEGF signaling in human cells	[21]
AlphaProteo-designed protein binder	SARS-CoV-2 (specifically SARS-CoV-2-RBD)	SARS-CoV-2 neutralization in Vero monkey cells	[21]
BindCraft-designed protein binder	Cas9 (a multidomain nuclease)	Modulating Cas9 gene editing activity	[4]
BindCraft-designed protein binder	Foodborne bacterial enterotoxin	Lower cytotoxicity	[4]
BindCraft-designed protein binder (cell surface receptor-specific)	Cell surface receptors	Redirecting AAV capsids for targeted gene delivery	[4]


9.1. Precision disease resistance and enhanced stress acclimation

Once data and model limitations are addressed, DL-designed protein binders could hold substantial potential for improving plant protection. ML frameworks that guide genome editing in crops should be able to help identify important genes and design more precise edits, suggesting that DL-designed protein binders could be integrated with these tools in future breeding strategies [2]. Antimicrobial resistance represents an urgent biological challenge for which next-generation protein binders, supported by DL, could play a transformative role in resistance development [1]. Mechanistically, protein binders can be directed toward pathogen effector proteins that suppress host immunity. Examples include neutralizing apoplastic effector proteins from fungal pathogens before they reach host targets or blocking RxLR- or Crinkler (CRN)-type effector proteins from entering plant cells [46]. In host-directed strategies, protein binders could be used to stabilize or activate stress signaling hubs such as kinases from the SNF1-related protein kinase 2 (SnRK2) family (for example, OPEN STOMATA 1 [OST1], also named SnRK2.6) by blocking protein phosphatase 2C (PP2C) inhibition, thereby enhancing abscisic acid (ABA)-responsive drought protection. Conversely, they can inhibit mitogen-activated protein kinase (MAPK) nodes such as MPK3 and MPK6 to modulate excessive stress responses that compromise growth [30,47]. These interventions provide a molecular basis for durable, broad-spectrum disease resistance and for fine-tuned abiotic stress acclimation without the growth penalties associated with constitutive activation (Fig. 2).

Fig. 2 — Potential applications of synthetic protein binders designed for specific targets. Validated DL-based synthetic protein binders can selectively recognize and modulate the function of proteins from plants or their pathogens involved in key biological pathways. By targeting fungal effectors, regulatory proteins, and transporters, these designed molecules could enable precision disease resistance, climate-adaptive stress tolerance, developmental control, efficient nutrient use, and advanced breeding strategies. Collectively, these applications position DL-designed protein binders as powerful tools for engineering resilient, high-producing, and climate-smart crops (Created with BioRender. Wang, H (2025) https://BioRender.com/2mkf1ps).

9.2. Modulating developmental stages: flowering, senescence, and haploid induction

DL-designed protein binders could offer precise control over key aspects of crop development. For example, in flowering control, protein binders that sequester FLOWERING LOCUS T (FT) away from its basic leucine zipper (bZIP) partner FD at the shoot apex could delay the floral transition, whereas protein binders that stabilize the FT–FD complex or inhibit the FT antagonist TERMINAL FLOWER 1 (TFL1) might promote earlier flowering and greater synchrony across environments [48]. For senescence regulation, protein binders able to target the NAC transcription factor ORESARA 1 (ORE1, also reported as NAC2) might be able to suppress age-related transcriptional programs, prolonging photosynthesis and grain filling (Fig. 2). Alternatively, protein binders that destabilize ETHYLENE-INSENSITIVE 3 (EIN3) may help slow down the progression of ethylene-driven senescence [49].

For rapid line development, protein binders that block the activity or trafficking of the pollen phospholipase MATRILINEAL (MTL, also known as Not like dad [NLD] and Phospholipase A1 [PLA1]) may help increase haploid induction rates by perturbing the contribution of the paternal genome during double fertilization, thereby accelerating the production of homozygous lines (Fig. 2) [50,51]. Several families of small secreted peptides, such as CLAVATA3/EMBRYO SURROUNDING REGION (ESR)-RELATED (CLE)-type, C-TERMINALLY ENCODED PEPTIDE (CEP)-type, and EPIDERMAL PATTERNING FACTOR-LIKE (EPFL)-type signals, naturally regulate meristem activity, root architecture, and grain traits. These ligand–receptor modules could be promising targets for tuning via designed protein binders or synthetic peptide analogs [34].

9.3. Engineering root systems, nutrient use, and circadian rhythms

Designing protein binders that can act on auxin transport would provide a direct method for redesigning root systems. Protein binders that block the intracellular entry pocket or stabilize the inward-facing conformation of PIN-FORMED (PIN) transporters (for example, PIN1 and PIN2) could reduce polar auxin flux and promote deeper, thicker roots. Conversely, protein binders that enhance the residency of the ATP-binding cassette (ABC) transporter ABCB19 (also named PGP19) at the plasma membrane could increase basipetal auxin transport and promote exploratory rooting [52,53].

Nutrient uptake could be tuned by biasing transporter conformations. Protein binders that favor the high-affinity state of the nitrate transceptor NITRATE TRANSPORTER 1.1 (NRT1.1, also reported as NRT1/PEPTIDE TRANSPORTER FAMILY 6.3 [NPF6.3]) could enhance nitrogen acquisition in soils with low nitrate content, and those that stabilize the plasma-membrane localization of PHT1-type phosphate transporters may help improve phosphorus use efficiency and reduce fertilizer demand [54].

For circadian regulation, protein binders that stabilize the F-box photoreceptor ZEITLUPE (ZTL) toward degradation of evening components could shorten the circadian period, whereas protein binders that disrupt binding to DNA by the Evening Complex composed of EARLY FLOWERING 3 (ELF3)–ELF4–LUX ARRHYTHMO (LUX) may lengthen it. Both routes could allow adjustment of the internal circadian phase to the local photoperiods, improving photosynthetic performance under variable climates [55].

9.4. Expanding breeding horizons and beyond

DL-designed protein binders could also create new opportunities in plant breeding. In reproductive systems based on self-incompatibility, protein binders that sequester S-RNases that are presented on the style surface in Solanaceae or block receptor–ligand signaling between S-receptor kinase (SRK) and S-locus Cysteine-Rich protein (SCR, also named S-locus Protein 11 [SP11]) in Brassicaceae might transiently relax pollen–pistil barriers, enabling crosses between otherwise incompatible species and expanding genetic diversity [56].

For photosynthesis, protein binders that stabilize Rubisco activase (RCA) hexamers under heat stress or protect the D1 (PsbA) protein of photosystem II (PSII) in the reaction center from photodamage might maintain electron transport and carboxylation capacity during episodes of high temperature or intense light [57].

Together, these thought experiments illustrate how DL-designed protein binders may act through specific proteins and interactions to deliver measurable improvements in plant development, nutrient use, and resilience. Realizing this potential will require continued crop-specific data generation, refinement of DL pipelines, and integration of protein binder design with breeding workflows, positioning synthetic protein binders as practical tools for climate-smart agriculture.

10. Challenges

Designing protein binders in the context of plant biology, particularly for applications such as targeted crop breeding or mitigating fungal disease, presents unique challenges related to data availability, model adjustments, and in vivo delivery. Predicting interspecies PPIs, including those between crop plants and their fungal pathogens, is substantially more difficult than in human or mammalian systems. Several of the key reasons for this are discussed below.

10.1. Data acquisition challenges

The primary challenge in predicting PPIs within plants and in plant–pathogen systems is the scarcity of plant-specific data. Most major PPI databases, including STRINGdb, MINT, and IntAct, predominantly compile interactions from human or model animal systems and their associated pathogens [20]. For plants, the majority of available interspecies PPI data originates from the model organism Arabidopsis thaliana and its pathogens. In addition, structural data are extremely limited for plant proteins; proteins from the Viridiplantae account for less than 4% of all experimental structures in the PDB [20]. This deficiency in structural data for organisms underrepresented in public databases (such as PDB and UniProtKB) restricts the deployment of advanced protein design techniques.

10.2. Challenges in the adaptation of models and tools

Models used for predicting and designing protein binders, which are often developed and benchmarked primarily on human proteins, face generalization challenges when applied to underrepresented plant systems. Conventional sequence-based homology methods have been used, but they are less powerful than modern approaches that incorporate structural homology for predicting interspecies PPIs [[58], [59], [60]]. Furthermore, there is limited information regarding best practices for applying generative models, such as RFdiffusion, specifically for predicting interspecies PPIs in plant–microbe interactions. When using these advanced DL tools, achieving designs that meet recommended quality metrics (for example, pLDDT, pAE, and pTM) can be extremely difficult. Even when computational predictions succeed, domain knowledge remains crucial for interpreting outputs and filtering out improbable host proteins, such as those with unsuitable cellular localization.

10.3. In vivo delivery challenges

Protein binders intended to modulate complex plant–pathogen interactions often need to target plant host proteins or fungal effectors within plant cells. Delivering macromolecules that target such intracellular proteins is challenging because they do not readily cross cellular membranes [37]. Although general strategies for intracellular delivery, including the use of cell-penetrating peptides or nanoparticles, have been explored in the biomedical field, miniproteins and peptides still raise concerns regarding efficient delivery [4,23]. Techniques that use functionalized protein binders for specific protein manipulation, such as inducing degradation in planta through selective autophagy motifs, further highlight the need for effective delivery approaches and application methods tailored to plant cells.

11. Plant-specific considerations

11.1. Construction and curation of plant-specific datasets

Progress depends on data that accurately represent plant biology. Beyond solving more plant protein structures, curated PPI datasets are needed for major crop–pathogen systems such as Magnaporthe–rice (Oryza sativa) and Fusarium–wheat (Triticum aestivum)/maize (Zea mays). Each entry should record the experimental assay (yeast two-hybrid, bimolecular fluorescence complementation assay, co-immunoprecipitation, SPR/BLI) and, when available, quantitative binding strength. To avoid bias, datasets must include well-chosen non-interacting pairs and should be split into training and testing sets by protein families so that close homologs do not leak across split sets. It is also essential to annotate features that are particularly important in plants, including intrinsically disordered regions (IDRs), signal or transit peptides, transmembrane domains, and glycosylation sites, ensuring that models are trained and evaluated in the appropriate biological context (apoplast versus intracellular). Recent work shows that plants produce many small secreted peptides and non-conventional peptides encoded by small open reading frames, and these should be included as potential targets or interaction partners for designed protein binders [34].
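
The family-aware split recommended above can be sketched as follows. Protein identifiers and family labels are hypothetical, and the strict rule used here (a pair is kept only if both partners fall on the same side of the split, and is otherwise set aside) is one possible design choice rather than an established standard.

```python
import random

def split_by_family(pairs, family_of, test_fraction=0.3, seed=0):
    """Split (protein_a, protein_b, label) records so no family spans train and test."""
    families = sorted({family_of[p] for pair in pairs for p in pair[:2]})
    rng = random.Random(seed)
    rng.shuffle(families)
    n_test = max(1, round(test_fraction * len(families)))
    test_families = set(families[:n_test])
    train, test, dropped = [], [], []
    for pair in pairs:
        in_test = [family_of[p] in test_families for p in pair[:2]]
        if all(in_test):
            test.append(pair)
        elif not any(in_test):
            train.append(pair)
        else:
            dropped.append(pair)  # mixed pairs would leak family information
    return train, test, dropped

# Hypothetical records: label 1 = interacting, 0 = curated non-interacting.
family_of = {"hostA1": "famA", "hostA2": "famA", "hostB1": "famB",
             "effX1": "famX", "effX2": "famX", "effY1": "famY"}
pairs = [("hostA1", "effX1", 1), ("hostA2", "effX2", 1), ("hostB1", "effY1", 1),
         ("hostA1", "effY1", 0), ("hostB1", "effX1", 0), ("hostA2", "effX1", 0)]

train, test, dropped = split_by_family(pairs, family_of)
train_families = {family_of[p] for pair in train for p in pair[:2]}
test_families = {family_of[p] for pair in test for p in pair[:2]}
```

In practice, family labels would come from a sequence or structure clustering step, and the number of dropped pairs is worth monitoring because a strict split can discard a large share of a small dataset.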

11.2. Cross-species transfer learning and DL pipelines

DL models trained on broad protein datasets may be adapted to plant biology by fine-tuning them on crop–pathogen interactions and by using orthology or synteny to find related genes encoding a protein of interest in plant species. Because many plant targets exhibit substantial flexibility [61], pipelines for protein binder design should explicitly account for disorder in three ways: allowing flexible-backbone sampling (i.e., letting the protein backbone move during modeling to capture disorder); down-weighting highly uncertain regions early in the pipeline (i.e., giving low-confidence regions less influence in early scoring so that they do not mislead design); and refining models at later stages. Recent benchmarking in crops highlights these concerns: a maize-wide evaluation of AlphaFold 2/3, ESMFold, and Boltz reported lower confidence and more stereochemical issues for evolutionarily young proteins (those that emerged relatively recently on the evolutionary timescale and therefore lack deep homologs across distant species) or plant-specific proteins, emphasizing the need for adapted training approaches, transfer learning, and plant-focused evaluation pipelines [62]. For secreted protein binders, it is important to model N-glycans and disulfide bonds. Recent diffusion-based pipelines recovered a high fraction of known fungus–cereal PPIs (greater than approximately 78% recall in reported benchmarks) and generated diverse protein binder candidates against these fungal effectors and their plant targets. Combining these options with AF-style ranking, ProteinMPNN sequence design, and host–proteome off-target scans can improve hit rates while maintaining selectivity.
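
One simple way to down-weight uncertain regions is to scale each residue's contribution to an early-stage score by its predicted confidence. The sketch below uses a linear ramp on per-residue pLDDT. The ramp limits of 50 and 90 and the per-residue scores are assumptions chosen for illustration, not values taken from the cited studies.

```python
def confidence_weights(plddt, low=50.0, high=90.0):
    """Map per-residue pLDDT to weights in [0, 1] using a linear ramp (assumed limits)."""
    return [min(1.0, max(0.0, (value - low) / (high - low))) for value in plddt]

def weighted_mean(scores, weights):
    """Confidence-weighted average of per-residue scores."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all residues were fully down-weighted")
    return sum(s * w for s, w in zip(scores, weights)) / total

# Illustrative target: one disordered, one intermediate, one well-ordered residue.
plddt = [40.0, 70.0, 95.0]
per_residue_score = [1.0, 2.0, 4.0]

weights = confidence_weights(plddt)
early_stage_score = weighted_mean(per_residue_score, weights)
```

The disordered residue contributes nothing at this stage, which keeps a likely unreliable region from dominating early triage while leaving it available for later refinement.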

11.3. In vivo delivery and deployment

Delivery strategies should be considered at the earliest stages of design. Stable genetic transformation is suitable for durable traits, whereas transient expression methods such as Agrobacterium tumefaciens–mediated infiltration or viral replicons may be more appropriate for rapid screening. Correct targeting signals must be included: secretion signal peptides for apoplastic targets; nuclear localization signals (NLSs), nuclear export signals (NESs), or organelle-targeting peptides for intracellular targets; and endoplasmic reticulum (ER) retention tags when lumenal residence is required. Tissue-, stage-, or stress-responsive promoters can help minimize fitness costs by restricting binder expression to where and when it is needed. Examples include root-specific promoters for root architecture traits, guard-cell or vascular promoters for pathogen entry/transport targets, developmental promoters active only during flowering or grain filling, and stress-inducible promoters (e.g., pathogen-, drought-, or salinity-responsive) that turn on binders only under challenge, reducing constitutive metabolic burden and pleiotropic effects. Degrons or inducible systems could add a second layer of control. Degron tags could enforce rapid, conditional protein turnover (constitutive short-half-life degrons, hormone-responsive degrons, or temperature/light-tunable degrons) so that binders accumulate only during desired windows. Inducible promoters or split-protein switches enable user- or environment-triggered activation (e.g., chemical inducers such as dexamethasone or estradiol, or optogenetic/light-inducible systems), allowing dose- and time-controlled binder activity during screening or field deployment. A practical deployment path involves testing protein binders in transient assays within a plant system, then generating stable transgenic lines and characterizing their phenotypes, and finally conducting multi-environment trials. Throughout this process, codon usage, glyco-compatibility, protease resistance, and thermostability should be optimized for the intended tissue, plant species, and growth conditions.
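The targeting rules in this section can be written down as a small lookup, which helps keep construct design explicit during planning. The Python sketch below is only an illustration: the compartment names, the `TARGETING_RULES` table, and the `construct_layout` helper are hypothetical, and the N- or C-terminal placement of each element is a common convention assumed here (NLS and NES placement in particular can vary), not a prescription from this review.

```python
from typing import List

# Hypothetical lookup: compartment -> (signal class, terminus) pairs.
# Termini are assumptions based on common practice; verify per construct.
TARGETING_RULES = {
    "apoplast": [("secretion_signal_peptide", "N")],
    "er_lumen": [("secretion_signal_peptide", "N"),
                 ("er_retention_tag", "C")],
    "nucleus": [("nls", "N")],
    "cytosol_excluded_from_nucleus": [("nes", "C")],
    "chloroplast": [("transit_peptide", "N")],
}


def construct_layout(compartment: str) -> List[str]:
    """Return the ordered parts of a binder fusion for one compartment.

    The result lists N-terminal signals, then the binder itself, then
    C-terminal signals. Unknown compartments raise ValueError so that a
    missing targeting decision is caught early rather than defaulted.
    """
    try:
        rules = TARGETING_RULES[compartment]
    except KeyError:
        raise ValueError(
            f"no targeting rule for compartment: {compartment!r}"
        ) from None
    n_terminal = [name for name, end in rules if end == "N"]
    c_terminal = [name for name, end in rules if end == "C"]
    return n_terminal + ["binder"] + c_terminal
```

For example, `construct_layout("er_lumen")` yields a secretion signal peptide, the binder, and a C-terminal retention tag, reflecting the point that lumenal residence needs both entry into the secretory pathway and retention.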

12. Conclusions

DL-based protein binder design represents a promising new frontier in plant biotechnology. By enabling precise modulation of protein–protein interactions, these tools could offer innovative strategies for improving disease resistance, enhancing stress tolerance, and regulating key developmental stages such as flowering and senescence. However, realizing their full potential will require addressing several important challenges, including the scarcity of plant-specific data and the need for more accurate prediction metrics tailored to plant systems. Future progress will depend on expanding curated plant datasets, refining DL models for plant-focused applications, and conducting robust experimental validation to confirm functional relevance. Together, these advancements could support a shift toward sustainable and targeted crop improvement strategies, equipping agriculture to better withstand global challenges and environmental pressures. When integrated with ML-guided genome editing, which improves target selection and editing accuracy in crops, DL-designed protein binders could become part of a broader AI-based toolkit for precise molecular breeding.

CRediT authorship contribution statement

Muhammad Salman Iqbal: Writing – review & editing, Writing – original draft, Visualization, Methodology, Conceptualization. Revocatus Bahitwa: Writing – review & editing, Formal analysis. Abdul Ali Azam: Writing – review & editing. Hui Xu: Writing – review & editing. Hai Wang: Writing – review & editing, Supervision, Formal analysis.

Declaration of competing interests

The authors declare that they have no competing interests.

Acknowledgements

We thank all members of the Wang laboratory for helpful discussions and support. We regret and apologize that some relevant studies, including some recent DL protein binder pipelines, could not be covered here owing to space constraints. This work was supported by the Biological Breeding-National Science and Technology Major Project (2023ZD04076), a Special Guidance Fund for Central Universities (2025AC030), and the 2115 Talent Development Program of China Agricultural University.

Data availability

No data was used for the research described in the article.

References

1. Potluri K. De Novo Structure-Based Design of TEM-171 β-Lactamase Protein Inhibitors Using Integrated Deep Learning and Multi-Scale Simulations to Combat Bacterial Resistance. bioRxiv. 2025:2025.06.23.661177.
2. Chen L., Liu G., Zhang T. Integrating machine learning and genome editing for crop improvement. aBIOTECH. 2024;5(2):262–277. doi: 10.1007/s42994-023-00133-5.
3. Peng S., Rajjou L. Advancing plant biology through deep learning-powered natural language processing. Plant Cell Rep. 2024;43(8):208. doi: 10.1007/s00299-024-03294-9.
4. Pacesa M., Nickel L., Schellhaas C., Schmidt J., Pyatova E., Kissling L., et al. BindCraft: one-shot design of functional protein binders. bioRxiv. 2024.
5. Cao L., Coventry B., Goreshnik I., Huang B., Sheffler W., Park J.S., et al. Design of protein-binding proteins from the target structure alone. Nature. 2022;605(7910):551–560. doi: 10.1038/s41586-022-04654-9.
6. Liu H., Chen Q. Computational protein design with data-driven approaches: recent developments and perspectives. Wiley Interdiscip Rev Comput Mol Sci. 2023;13(3).
7. Li Y., Duan Z., Li Z., Xue W. Data and AI-driven synthetic binding protein discovery. Trends Pharmacol Sci. 2025.
8. Marchand A., Van Hall-Beauvais A.K., Correia B.E. Computational design of novel protein–protein interactions–An overview on methodological approaches and applications. Curr Opin Struct Biol. 2022;74.
9. Khakzad H., Igashov I., Schneuing A., Goverde C., Bronstein M., Correia B. A new age in protein design empowered by deep learning. Cell Syst. 2023;14(11):925–939. doi: 10.1016/j.cels.2023.10.006.
10. Kortemme T. De novo protein design—From new structures to programmable functions. Cell. 2024;187(3):526–544. doi: 10.1016/j.cell.2023.12.028.
11. Yao J., Wang X. Artificial intelligence in de novo protein design. Medicine in Novel Technology and Devices. 2025.
12. Bennett N.R., Coventry B., Goreshnik I., Huang B., Allen A., Vafeados D., et al. Improving de novo protein binder design with deep learning. Nat Commun. 2023;14(1):2625. doi: 10.1038/s41467-023-38328-5.
13. Watson J.L., Juergens D., Bennett N.R., Trippe B.L., Yim J., Eisenach H.E., et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023;620(7976):1089–1100. doi: 10.1038/s41586-023-06415-8.
14. Fox D.R., Taveneau C., Clement J., Grinter R., Knott G.J. Code to complex: AI-driven de novo binder design. Structure. 2025.
15. Kuhlman B., Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol. 2019;20(11):681–697. doi: 10.1038/s41580-019-0163-x.
16. Chatterjee A., Walters R., Shafi Z., Ahmed O.S., Sebek M., Gysi D., et al. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nat Commun. 2023;14(1):1989. doi: 10.1038/s41467-023-37572-z.
17. Hu B., Tan C., Wu L., Zheng J., Xia J., Gao Z., et al. Advances of deep learning in protein science: a comprehensive survey. arXiv preprint arXiv:2403.05314. 2024.
18. Notin P., Rollins N., Gal Y., Sander C., Marks D. Machine learning for functional protein design. Nat Biotechnol. 2024;42(2):216–228. doi: 10.1038/s41587-024-02127-0.
19. Lee M. Recent advances in deep learning for protein-protein interaction analysis: a comprehensive review. Molecules. 2023;28(13):5169. doi: 10.3390/molecules28135169.
20. Haley O.C., Harding S., Sen T.Z., Woodhouse M.R., Kim H.-S., Andorf C. Application of RFdiffusion to predict interspecies protein-protein interactions between fungal pathogens and cereal crops. bioRxiv. 2024:2024.09.17.613523.
21. Zambaldi V., La D., Chu A.E., Patani H., Danson A.E., Kwan T.O., et al. De novo design of high-affinity protein binders with AlphaProteo. arXiv preprint arXiv:2409.08022. 2024.
22. Wu K., Jiang H., Hicks D.R., Liu C., Muratspahić E., Ramelot T.A., et al. Design of intrinsically disordered region binding proteins. Science. 2025;389(6757):eadr8063.
23. Cotet T.-S., Krawczuk I., Pacesa M., Nickel L., Correia B.E., Haas N., et al. Crowdsourced protein design: lessons from the adaptyv EGFR binder competition. bioRxiv. 2025:2025.04.17.648362.
24. Gainza P., Wehrle S., Van Hall-Beauvais A., Marchand A., Scheck A., Harteveld Z., et al. De novo design of protein interactions with learned surface fingerprints. Nature. 2023;617(7959):176–184. doi: 10.1038/s41586-023-05993-x.
25. Sun K., Li S., Zheng B., Zhu Y., Wang T., Liang M., et al. Accurate de novo design of heterochiral protein–protein interactions. Cell Res. 2024;34(12):846–858. doi: 10.1038/s41422-024-01014-2.
26. van Wier S.P., Beekman A.M. Peptide design to control protein–protein interactions. Chem Soc Rev. 2025.
27. Xue L.C., Dobbs D., Bonvin A.M., Honavar V. Computational prediction of protein interfaces: a review of data driven methods. FEBS Lett. 2015;589(23):3516–3526. doi: 10.1016/j.febslet.2015.10.003.
28. Bennett N.R. Deep learning tools for protein binder design. University of Washington; 2023.
29. Harmansa S., Affolter M. Protein binders and their applications in developmental biology. Development. 2018;145(2):dev148874.
30. Schnider S.T., Vigano M.A., Affolter M., Aguilar G. Functionalized protein binders in developmental biology. Annu Rev Cell Dev Biol. 2024;40.
31. Sayeed M.A., Tekin E., Nadeem M., ElNaker N.A., Singh A., Vassilieva N., et al. Prot42: a novel family of protein language models for target-aware protein binder generation. arXiv preprint arXiv:2504.04453. 2025.
32. Wang F., Wang Y., Feng L., Zhang C., Lai L. Target-specific de novo peptide binder design with DiffPepBuilder. J Chem Inf Model. 2024;64(24):9135–9149. doi: 10.1021/acs.jcim.4c00975.
33. Li Q., Vlachos E., Bryant P. Design of linear and cyclic peptide binders of different lengths from protein sequence information. bioRxiv. 2024.
34. Feng Y.-Z., Zhu Q.-F., Xue J., Chen P., Yu Y. Shining in the dark: the big world of small peptides in plants. aBIOTECH. 2023;4(3):238–256. doi: 10.1007/s42994-023-00100-0.
35. Chen T., Zhang Y., Chatterjee P. moPPIt: de Novo Generation of Motif-Specific Binders with Protein Language Models. bioRxiv. 2024.
36. Gao J., Li J., Hu J., Zhang S., Zhu K., Huang Y., et al. HelixDesign-Binder: a scalable production-grade platform for binder design built on HelixFold3. arXiv preprint arXiv:2505.21873. 2025.
37. Bonadio A., Shifman J.M. Computational design and experimental optimization of protein binders with prospects for biomedical applications. Protein Eng Des Sel. 2021;34:gzab020. doi: 10.1093/protein/gzab020.
38. Defresne M., Barbe S., Schiex T. Protein design with deep learning. Int J Mol Sci. 2021;22(21).
39. Baek M., DiMaio F., Anishchenko I., Dauparas J., Ovchinnikov S., Lee G.R., et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–876. doi: 10.1126/science.abj8754.
40. Liu C., Wu K., Choi H., Han H.L., Zhang X., Watson J.L., et al. Diffusing protein binders to intrinsically disordered proteins. Nature. 2025:1–9.
41. Gainza P., Bunker R.D., Townson S.A., Castle J.C. Machine learning to predict de novo protein–protein interactions. Trends Biotechnol. 2025;43(12):3056–3070. doi: 10.1016/j.tibtech.2025.04.013.
42. Baker D., Sappington I., Toul M., Lee D., Robinson S., Goreshnik I., et al. Improved protein binder design using beta-pairing targeted RFdiffusion. 2025.
43. Dauparas J., Anishchenko I., Bennett N., Bai H., Ragotte R.J., Milles L.F., et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science. 2022;378(6615):49–56. doi: 10.1126/science.add2187.
44. Ferruz N., Heinzinger M., Akdel M., Goncearenco A., Naef L., Dallago C. From sequence to function through structure: deep learning for protein design. Comput Struct Biotechnol J. 2023;21:238–250. doi: 10.1016/j.csbj.2022.11.014.
45. Kyro G.W., Qiu T., Batista V.S. A model-centric review of deep learning for protein design. arXiv preprint arXiv:2502.19173. 2025.
46. Del Corpo D., Coculo D., Greco M., De Lorenzo G., Lionetti V. Pull the fuzes: processing protein precursors to generate apoplastic danger signals for triggering plant immunity. Plant Commun. 2024;5(8):100931. doi: 10.1016/j.xplc.2024.100931.
47. Baumann K. ABA's greatest hits. Nat Rev Mol Cell Biol. 2010;11(1):2. doi: 10.1038/nrm2826.
48. Zhu Y., Klasfeld S., Jeong C.W., Jin R., Goto K., Yamaguchi N., et al. TERMINAL FLOWER 1-FD complex target genes and competition with FLOWERING LOCUS T. Nat Commun. 2020;11(1):5118. doi: 10.1038/s41467-020-18782-1.
49. Matallana-Ramirez L.P., Rauf M., Farage-Barhom S., Dortay H., Xue G.-P., Dröge-Laser W., et al. NAC transcription factor ORE1 and senescence-induced BIFUNCTIONAL NUCLEASE1 (BFN1) constitute a regulatory cascade in Arabidopsis. Mol Plant. 2013;6(5):1438–1452. doi: 10.1093/mp/sst012.
50. Kelliher T., Starr D., Richbourg L., Chintamanani S., Delzer B., Nuccio M.L., et al. MATRILINEAL, a sperm-specific phospholipase, triggers maize haploid induction. Nature. 2017;542(7639):105–109. doi: 10.1038/nature20827.
51. Liu C., Li X., Meng D., Zhong Y., Chen C., Dong X., et al. A 4-bp insertion at ZmPLA1 encoding a putative phospholipase A generates haploid induction in maize. Mol Plant. 2017;10(3):520–522. doi: 10.1016/j.molp.2017.01.011.
52. Janacek D.P., Kolb M., Schulz L., Mergner J., Kuster B., Glanc M., et al. Transport properties of canonical PIN-FORMED proteins from Arabidopsis and the role of the loop domain in auxin transport. Dev Cell. 2024;59(24):3259–3271.e4. doi: 10.1016/j.devcel.2024.09.020.
53. Rodriguez L., Fiedler L., Zou M., Giannini C., Monzer A., Vladimirtsev D., et al. ABP1/ABL3-TMK1 cell-surface auxin signaling targets PIN2-mediated auxin fluxes for root gravitropism. Cell. 2025;188(22):6138–6150. doi: 10.1016/j.cell.2025.08.026.
54. Léran S., Muños S., Brachet C., Tillard P., Gojon A., Lacombe B. Arabidopsis NRT1.1 is a bidirectional transporter involved in root-to-shoot nitrate translocation. Mol Plant. 2013;6(6):1984–1987. doi: 10.1093/mp/sst068.
55. Wang L., Fujiwara S., Somers D.E. PRR5 regulates phosphorylation, nuclear import and subnuclear localization of TOC1 in the Arabidopsis circadian clock. EMBO J. 2010;29(11):1903–1915. doi: 10.1038/emboj.2010.76.
56. Lan Z., Zhong S., Qu L.-J. Insights into pollen–stigma recognition: self-incompatibility mechanisms serve as interspecies barriers in Brassicaceae? aBIOTECH. 2023;4(2):176–179. doi: 10.1007/s42994-023-00105-9.
57. Su J., Jiao Q., Jia T., Hu X. The photosystem-II repair cycle: updates and open questions. Planta. 2024;259(1):20.
58. Casadio R., Martelli P.L., Savojardo C. Machine learning solutions for predicting protein–protein interactions. Wiley Interdiscip Rev Comput Mol Sci. 2022;12(6).
59. Rao V.S., Srinivas K., Sujini G., Kumar G.S. Protein-protein interaction detection: methods and analysis. Int J Proteomics. 2014;2014(1).
60. Soleymani F., Paquet E., Viktor H., Michalowski W., Spinello D. Protein–protein interaction prediction with deep learning: a comprehensive review. Comput Struct Biotechnol J. 2022;20:5316–5341. doi: 10.1016/j.csbj.2022.08.070.
61. Pazos F., Pietrosemoli N., García-Martín J.A., Solano R. Protein intrinsic disorder in plants. Front Plant Sci. 2013;4:363.
62. Haley O.C., Tibbs-Cortes L., Hayford R.K., Harding S., Woodhouse M., Cannon E., et al. Why do some predicted protein structures fold poorly? Benchmarking AlphaFold, ESMFold, and Boltz in maize. bioRxiv. 2025:2025.07.05.663230.
63. Hashemifar S., Neyshabur B., Khan A.A., Xu J. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics. 2018;34(17):i802–i810. doi: 10.1093/bioinformatics/bty573.
64. Jiang B., Li X., Guo A., Wei M., Wu J. RFdiffusion exhibits low success rate in de novo design of functional protein binders for biochemical detection. bioRxiv. 2025:2025.02.07.636769.
65. Goodwin H.V., Atkinson N.S. PPI-ID: streamlining protein-protein interaction prediction through domain and SLiM mapping. PLoS Comput Biol. 2025;21(10).
66. Albanese K.I., Barbe S., Tagami S., Woolfson D.N., Schiex T. Computational protein design. Nat Rev Methods Primers. 2025;5(1):13.
67. Dhakal A., McKay C., Tanner J.J., Cheng J. Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Briefings Bioinf. 2022;23(1):bbab476.
68. Vázquez Torres S., Benard Valle M., Mackessy S.P., Menzies S.K., Casewell N.R., Ahmadi S., et al. De novo designed proteins neutralize lethal snake venom toxins. Nature. 2025;639(8053):225–231. doi: 10.1038/s41586-024-08393-x.
69. Goudy O.J., Nallathambi A., Kinjo T., Randolph N., Kuhlman B. In silico evolution of protein binders with deep learning models for structure prediction and sequence design. bioRxiv. 2023.

[bib1] 1.Potluri K. De Novo Structure-Based Design of TEM-171 β-Lactamase Protein Inhibitors Using Integrated Deep Learning and Multi-Scale Simulations to Combat Bacterial Resistance. bioRxiv. 2025 2025.06. 23.661177. [Google Scholar]

[bib2] 2.Chen L., Liu G., Zhang T. Integrating machine learning and genome editing for crop improvement. aBIOTECH. 2024;5(2):262–277. doi: 10.1007/s42994-023-00133-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Peng S., Rajjou L. Advancing plant biology through deep learning-powered natural language processing. Plant Cell Rep. 2024;43(8):208. doi: 10.1007/s00299-024-03294-9. [DOI] [PubMed] [Google Scholar]

[bib4] 4.Pacesa M., Nickel L., Schellhaas C., Schmidt J., Pyatova E., Kissling L., et al. bioRxiv.; 2024. Bindcraft: one-shot design of functional protein binders. [Google Scholar]

[bib5] 5.Cao L., Coventry B., Goreshnik I., Huang B., Sheffler W., Park J.S., et al. Design of protein-binding proteins from the target structure alone. Nature. 2022;605(7910):551–560. doi: 10.1038/s41586-022-04654-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] 6.Liu H., Chen Q. Computational protein design with data-driven approaches: recent developments and perspectives. Wiley Interdiscip Rev Comput Mol Sci. 2023;13(3) [Google Scholar]

[bib7] 7.Li Y., Duan Z., Li Z., Xue W. Data and AI-driven synthetic binding protein discovery. Trends Pharmacol Sci. 2025 [Google Scholar]

[bib8] 8.Marchand A., Van Hall-Beauvais A.K., Correia B.E. Computational design of novel protein–protein interactions–An overview on methodological approaches and applications. Curr Opin Struct Biol. 2022;74 [Google Scholar]

[bib9] 9.Khakzad H., Igashov I., Schneuing A., Goverde C., Bronstein M., Correia B. A new age in protein design empowered by deep learning. Cell Syst. 2023;14(11):925–939. doi: 10.1016/j.cels.2023.10.006. [DOI] [PubMed] [Google Scholar]

[bib10] 10.Kortemme T. De novo protein design—From new structures to programmable functions. Cell. 2024;187(3):526–544. doi: 10.1016/j.cell.2023.12.028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] 11.Yao J., Wang X. Artificial intelligence in de novo protein design. Medicine in Novel Technology and Devices. 2025 [Google Scholar]

[bib12] 12.Bennett N.R., Coventry B., Goreshnik I., Huang B., Allen A., Vafeados D., et al. Improving de novo protein binder design with deep learning. Nat Commun. 2023;14(1):2625. doi: 10.1038/s41467-023-38328-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] 13.Watson J.L., Juergens D., Bennett N.R., Trippe B.L., Yim J., Eisenach H.E., et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023;620(7976):1089–1100. doi: 10.1038/s41586-023-06415-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] 14.Fox D.R., Taveneau C., Clement J., Grinter R., Knott G.J. Code to complex: ai-driven de novo binder design. Structure. 2025 [Google Scholar]

[bib15] 15.Kuhlman B., Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol. 2019;20(11):681–697. doi: 10.1038/s41580-019-0163-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] 16.Chatterjee A., Walters R., Shafi Z., Ahmed O.S., Sebek M., Gysi D., et al. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nat Commun. 2023;14(1):1989. doi: 10.1038/s41467-023-37572-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] 17.Hu B., Tan C., Wu L., Zheng J., Xia J., Gao Z., et al. 2024. Advances of deep learning in protein science: a comprehensive survey. arXiv preprint arXiv:240305314. [Google Scholar]

[bib18] 18.Notin P., Rollins N., Gal Y., Sander C., Marks D. Machine learning for functional protein design. Nat Biotechnol. 2024;42(2):216–228. doi: 10.1038/s41587-024-02127-0. [DOI] [PubMed] [Google Scholar]

[bib19] 19.Lee M. Recent advances in deep learning for protein-protein interaction analysis: a comprehensive review. Molecules. 2023;28(13):5169. doi: 10.3390/molecules28135169. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] 20.Haley O.C., Harding S., Sen T.Z., Woodhouse M.R., Kim H.-S., Andorf C. Application of RFdiffusion to predict interspecies protein-protein interactions between fungal pathogens and cereal crops. bioRxiv. 2024 2024.09. 17.613523. [Google Scholar]

[bib21] 21.Zambaldi V., La D., Chu A.E., Patani H., Danson A.E., Kwan T.O., et al. 2024. De novo design of high-affinity protein binders with AlphaProteo. arXiv preprint arXiv:240908022. [Google Scholar]

[bib22] 22.Wu K., Jiang H., Hicks D.R., Liu C., Muratspahić E., Ramelot T.A., et al. Design of intrinsically disordered region binding proteins. Science. 2025;389(6757) eadr8063. [Google Scholar]

[bib23] 23.Cotet T.-S., Krawczuk I., Pacesa M., Nickel L., Correia B.E., Haas N., et al. Crowdsourced protein design: lessons from the adaptyv EGFR binder competition. bioRxiv. 2025 2025.04. 17.648362. [Google Scholar]

[bib24] 24.Gainza P., Wehrle S., Van Hall-Beauvais A., Marchand A., Scheck A., Harteveld Z., et al. De novo design of protein interactions with learned surface fingerprints. Nature. 2023;617(7959):176–184. doi: 10.1038/s41586-023-05993-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] 25.Sun K., Li S., Zheng B., Zhu Y., Wang T., Liang M., et al. Accurate de novo design of heterochiral protein–protein interactions. Cell Res. 2024;34(12):846–858. doi: 10.1038/s41422-024-01014-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] 26.van Wier S.P., Beekman A.M. Peptide design to control protein–protein interactions. Chem Soc Rev. 2025 [Google Scholar]

[bib27] 27.Xue L.C., Dobbs D., Bonvin A.M., Honavar V. Computational prediction of protein interfaces: a review of data driven methods. FEBS Lett. 2015;589(23):3516–3526. doi: 10.1016/j.febslet.2015.10.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] 28.Bennett N.R. University of Washington; 2023. Deep learning tools for protein binder design. [Google Scholar]

[bib29] 29.Harmansa S., Affolter M. Protein binders and their applications in developmental biology. Development. 2018;145(2) dev148874. [Google Scholar]

[bib30] 30.Schnider S.T., Vigano M.A., Affolter M., Aguilar G. Functionalized protein binders in developmental biology. Annu Rev Cell Dev Biol. 2024;40 [Google Scholar]

[bib31] 31.Sayeed M.A., Tekin E., Nadeem M., ElNaker N.A., Singh A., Vassilieva N., et al. Prot42: a novel family of protein language models for target-aware protein binder generation. arXiv preprint arXiv:250404453. 2025 [Google Scholar]

[bib32] 32.Wang F., Wang Y., Feng L., Zhang C., Lai L. Target-specific de novo peptide binder design with DiffPepBuilder. J Chem Inf Model. 2024;64(24):9135–9149. doi: 10.1021/acs.jcim.4c00975. [DOI] [PubMed] [Google Scholar]

[bib33] 33.Li Q., Vlachos E., Bryant P. Design of linear and cyclic peptide binders of different lengths from protein sequence information. bioRxiv. 2024 [Google Scholar]

[bib34] 34.Feng Y.-Z., Zhu Q.-F., Xue J., Chen P., Yu Y. Shining in the dark: the big world of small peptides in plants. aBIOTECH. 2023;4(3):238–256. doi: 10.1007/s42994-023-00100-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] 35.Chen T., Zhang Y., Chatterjee P. moPPIt: de Novo Generation of Motif-Specific Binders with Protein Language Models. bioRxiv. 2024 [Google Scholar]

[bib36] 36.Gao J., Li J., Hu J., Zhang S., Zhu K., Huang Y., et al. 2025. HelixDesign-Binder: a scalable production-grade platform for binder design built on HelixFold3. arXiv preprint arXiv:250521873. [Google Scholar]

[bib37] 37.Bonadio A., Shifman J.M. Computational design and experimental optimization of protein binders with prospects for biomedical applications. Protein Eng Des Sel. 2021;34:gzab020. doi: 10.1093/protein/gzab020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] 38.Defresne M., Barbe S., Schiex T. Protein design with deep learning. Int J Mol Sci. 2021;22(21) [Google Scholar]

[bib39] 39.Baek M., DiMaio F., Anishchenko I., Dauparas J., Ovchinnikov S., Lee G.R., et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–876. doi: 10.1126/science.abj8754. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] 40.Liu C., Wu K., Choi H., Han H.L., Zhang X., Watson J.L., et al. Diffusing protein binders to intrinsically disordered proteins. Nature. 2025:1–9. [Google Scholar]

[bib41] 41.Gainza P., Bunker R.D., Townson S.A., Castle J.C. Machine learning to predict de novo protein–protein interactions. Trends Biotechnol. 2025;43(12):3056–3070. doi: 10.1016/j.tibtech.2025.04.013. [DOI] [PubMed] [Google Scholar]

[bib42] 42.Baker D., Sappington I., Toul M., Lee D., Robinson S., Goreshnik I., et al. 2025. Improved protein binder design using beta-pairing targeted RFdiffusion. [Google Scholar]

[bib43] 43.Dauparas J., Anishchenko I., Bennett N., Bai H., Ragotte R.J., Milles L.F., et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science. 2022;378(6615):49–56. doi: 10.1126/science.add2187. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] 44.Ferruz N., Heinzinger M., Akdel M., Goncearenco A., Naef L., Dallago C. From sequence to function through structure: deep learning for protein design. Comput Struct Biotechnol J. 2023;21:238–250. doi: 10.1016/j.csbj.2022.11.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] 45.Kyro G.W., Qiu T., Batista V.S. A model-centric review of deep learning for protein design. arXiv preprint arXiv:250219173. 2025 [Google Scholar]

[bib46] 46.Del Corpo D., Coculo D., Greco M., De Lorenzo G., Lionetti V. Pull the fuzes: processing protein precursors to generate apoplastic danger signals for triggering plant immunity. Plant Commun. 2024;5(8):100931. doi: 10.1016/j.xplc.2024.100931. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] 47.Baumann K. ABA's greatest hits. Nat Rev Mol Cell Biol. 2010;11(1):2. doi: 10.1038/nrm2826. [DOI] [PubMed] [Google Scholar]

[bib48] 48.Zhu Y., Klasfeld S., Jeong C.W., Jin R., Goto K., Yamaguchi N., et al. TERMINAL FLOWER 1-FD complex target genes and competition with FLOWERING LOCUS T. Nat Commun. 2020;11(1):5118. doi: 10.1038/s41467-020-18782-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] 49.Matallana-Ramirez L.P., Rauf M., Farage-Barhom S., Dortay H., Xue G.-P., Dröge-Laser W., et al. NAC transcription factor ORE1 and senescence-induced BIFUNCTIONAL NUCLEASE1 (BFN1) constitute a regulatory cascade in Arabidopsis. Mol Plant. 2013;6(5):1438–1452. doi: 10.1093/mp/sst012. [DOI] [PubMed] [Google Scholar]

[bib50] 50.Kelliher T., Starr D., Richbourg L., Chintamanani S., Delzer B., Nuccio M.L., et al. MATRILINEAL, a sperm-specific phospholipase, triggers maize haploid induction. Nature. 2017;542(7639):105–109. doi: 10.1038/nature20827. [DOI] [PubMed] [Google Scholar]

[bib51] 51.Liu C., Li X., Meng D., Zhong Y., Chen C., Dong X., et al. A 4-bp insertion at ZmPLA1 encoding a putative phospholipase A generates haploid induction in maize. Mol Plant. 2017;10(3):520–522. doi: 10.1016/j.molp.2017.01.011. [DOI] [PubMed] [Google Scholar]

[bib52] 52.Janacek D.P., Kolb M., Schulz L., Mergner J., Kuster B., Glanc M., et al. Transport properties of canonical PIN-FORMED proteins from Arabidopsis and the role of the loop domain in auxin transport. Dev Cell. 2024;59(24):3259–3271. e4. doi: 10.1016/j.devcel.2024.09.020. [DOI] [PubMed] [Google Scholar]

[bib53] 53.Rodriguez L., Fiedler L., Zou M., Giannini C., Monzer A., Vladimirtsev D., et al. ABP1/ABL3-TMK1 cell-surface auxin signaling targets PIN2-mediated auxin fluxes for root gravitropism. Cell. 2025;188(22):6138–6150. doi: 10.1016/j.cell.2025.08.026. [DOI] [PubMed] [Google Scholar]

[bib54] 54.Léran S., Muños S., Brachet C., Tillard P., Gojon A., Lacombe B. Arabidopsis NRT1. 1 is a bidirectional transporter involved in root-to-shoot nitrate translocation. Mol Plant. 2013;6(6):1984–1987. doi: 10.1093/mp/sst068. [DOI] [PubMed] [Google Scholar]

[bib55] 55.Wang L., Fujiwara S., Somers D.E. PRR5 regulates phosphorylation, nuclear import and subnuclear localization of TOC1 in the Arabidopsis circadian clock. EMBO J. 2010;29(11):1903–1915. doi: 10.1038/emboj.2010.76. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib56] 56.Lan Z., Zhong S., Qu L.-J. Insights into pollen–stigma recognition: self-incompatibility mechanisms serve as interspecies barriers in Brassicaceae? aBIOTECH. 2023;4(2):176–179. doi: 10.1007/s42994-023-00105-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] 57.Su J., Jiao Q., Jia T., Hu X. The photosystem-II repair cycle: updates and open questions. Planta. 2024;259(1):20. [Google Scholar]

[bib58] 58.Casadio R., Martelli P.L., Savojardo C. Machine learning solutions for predicting protein–protein interactions. Wiley Interdiscip Rev Comput Mol Sci. 2022;12(6) [Google Scholar]

[bib59] 59.Rao V.S., Srinivas K., Sujini G., Kumar G.S. Protein-protein interaction detection: methods and analysis. Int J Proteomics. 2014;2014(1) [Google Scholar]

[bib60] 60.Soleymani F., Paquet E., Viktor H., Michalowski W., Spinello D. Protein–protein interaction prediction with deep learning: a comprehensive review. Comput Struct Biotechnol J. 2022;20:5316–5341. doi: 10.1016/j.csbj.2022.08.070. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] 61.Pazos F., Pietrosemoli N., García-Martín J.A., Solano R. Protein intrinsic disorder in plants. Front Plant Sci. 2013:4363. [Google Scholar]

[bib62] 62.Haley O.C., Tibbs-Cortes L., Hayford R.K., Harding S., Woodhouse M., Cannon E., et al. Why do some predicted protein structures fold poorly? Benchmarking AlphaFold, ESMFold, and Boltz in maize. bioRxiv. 2025 2025.07.05.663230. [Google Scholar]

[bib63] 63.Hashemifar S., Neyshabur B., Khan A.A., Xu J. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics. 2018;34(17):i802–i810. doi: 10.1093/bioinformatics/bty573. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] 64.Jiang B., Li X., Guo A., Wei M., Wu J. RFdiffusion exhibits low success rate in de novo design of functional protein binders for biochemical detection. bioRxiv. 2025 2025.02. 07.636769. [Google Scholar]

[bib65] 65.Goodwin H.V., Atkinson N.S. PPI-ID: streamlining protein-protein interaction prediction through domain and SLiM mapping. PLoS Comput Biol. 2025;21(10) [Google Scholar]

[bib66] 66.Albanese K.I., Barbe S., Tagami S., Woolfson D.N., Schiex T. Computational protein design. Nat Rev Methods Primers. 2025;5(1):13. [Google Scholar]

67. Dhakal A., McKay C., Tanner J.J., Cheng J. Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Briefings Bioinf. 2022;23(1):bbab476.

68. Vázquez Torres S., Benard Valle M., Mackessy S.P., Menzies S.K., Casewell N.R., Ahmadi S., et al. De novo designed proteins neutralize lethal snake venom toxins. Nature. 2025;639(8053):225–231. doi: 10.1038/s41586-024-08393-x.

69. Goudy O.J., Nallathambi A., Kinjo T., Randolph N., Kuhlman B. In silico evolution of protein binders with deep learning models for structure prediction and sequence design. bioRxiv. 2023.
