Abstract
Designing ligand-binding proteins plays an important role in drug discovery. However, AI-based design of such proteins is challenging due to complex ligand-protein interactions, the flexibility of ligand molecules and amino acid side chains, and sequence-structure dependencies. We introduce PocketGen, a deep generative model that produces both the residue sequence and the atom structure of the protein regions where interactions with ligand molecules occur. PocketGen ensures sequence-structure consistency by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The bilevel graph transformer captures interactions at multiple granularities across atom, residue, and ligand levels. To enhance sequence refinement, PocketGen integrates a structural adapter with the protein language model, ensuring consistency between structure-based and sequence-based predictions. Results show that PocketGen can generate high-fidelity protein pockets with superior binding affinity and structural validity. It is ten times faster than physics-based methods, achieves a 95% success rate (defined as the percentage of generated pockets with higher binding affinity than reference pockets), and attains an amino acid recovery rate exceeding 64%.
Introduction
Modulating protein functions primarily involves the interaction between proteins and small molecule ligands1–4. These interactions play a crucial role in biological processes such as enzymatic catalysis, signal transduction, and cellular regulatory mechanisms. Binding small molecules to specific protein sites can induce conformational changes, modulate activity, or inhibit functions. This mechanism is valuable for studying protein functions and designing small molecule-binding proteins with customized properties. Examples include designing enzymes to catalyze reactions without natural catalysts5–8 and developing biosensors to detect environmental compounds. These biosensors transduce signals useful for environmental monitoring, clinical diagnostics, pathogen detection, drug delivery systems, and food industry applications9–12. Often, designs involve modifying existing ligand-binding protein pockets to enable more precise interactions with specific ligands13–15. However, the complexity of ligand-protein interactions, the flexibility of ligands and side chains, and sequence-structure relationships pose challenges for the computational generation of high-validity ligand-binding protein pockets3,15,16.
Methods for pocket design primarily focus on physics modeling or template matching10,11,13,17,18. For example, PocketOptimizer18–20 uses a pipeline that predicts mutations in protein pockets to enhance binding affinity based on physics-inspired energy functions and search algorithms. Starting with a bound protein-ligand complex, PocketOptimizer explores side chain structures and residue types, evaluating these mutants using energy functions and ranking them through integer linear programming techniques. Another approach employs template matching and enumeration methods11,13,14,17,21. For instance, Polizzi et al.13 use a two-step strategy for pocket design. First, they identify and assemble disconnected protein motifs (van der Mer (vdM) structural units) around the target molecule to build protein-ligand hydrogen bonds. They then graft these residues onto the protein scaffold and select the best protein-ligand pairs using scoring functions. This template-matching strategy enabled de novo design of proteins binding the drug apixaban22. However, methods based on physics modeling and template matching can be time-consuming, often requiring several hours for a single protein pocket design. Additionally, the focus on specific fold types (e.g., four-helix bundle13 or NTF214) by these methods limits their broad use.
Recent advances in protein pocket design have been driven by deep learning-based approaches3,8,16,23–25. For example, RFDiffusion26 employs denoising diffusion probabilistic models27 along with RoseTTAFold28 for de novo protein structure generation. While capable of designing pockets for specific ligands, RFDiffusion’s auxiliary guiding potentials lack precision in modeling protein-ligand interactions. RFdiffusion All-Atom (RFdiffusionAA)16 advances this approach by enabling direct generation of binding proteins around small molecules through iterative denoising, thanks to architectural modifications that concurrently consider protein structures and ligand molecules. However, both RFDiffusion and RFdiffusionAA derive residue sequences through post-processing with ProteinMPNN29 or LigandMPNN30, potentially leading to inconsistencies between sequence and structure modalities.
In contrast, our previous work, FAIR24, simultaneously designs the complete atom pocket structure and sequence using a two-stage refinement approach. FAIR employs a coarse-to-fine method, initially updating the backbone protein structure before refining the complete atom structure, including side chains. This iterative process continues until convergence is achieved. However, the gap between the two refinement steps can result in instability and limited performance, highlighting the need for an end-to-end generative approach for pocket design. Related research has focused on the co-design of sequence and structure in complementarity determining regions (CDRs) of antibodies31–35. While these methods are effective for antibodies, they face challenges when adapting to pocket designs conditioned on target ligand molecules.
Hybrid approaches that combine deep learning models with traditional methods are also being explored3,8. For instance, Yeh et al.8 developed a novel Luciferase by integrating protein hallucination36, the trRosetta structure prediction neural network37, hydrogen bonding networks, and RifDock38, generating a variety of idealized protein structures with diverse pocket shapes for subsequent filtering. Despite its success, this method’s applicability is limited to specific protein scaffolds and substrates, lacking a generalized solution. Similarly, Lee et al.3 combine deep learning with physics-based methods to create proteins with diverse and designable pocket geometries. Their approach involves backbone generation via trRosetta hallucination, sequence design through ProteinMPNN29 and LigandMPNN30, and filtering with AlphaFold39. Current pocket generation models face several challenges, including achieving sequence-structure consistency and accurately modeling complex protein-ligand interactions.
Here, we introduce PocketGen, a deep generative method for efficient generation of protein pockets. PocketGen employs a co-design scheme (Figure 1a), where the model concurrently predicts the sequence and structure of the protein pocket based on the ligand molecule and the protein scaffold (excluding the pocket). PocketGen consists of two modules: the bilevel graph transformer (Figure 1b) and the sequence refinement module (Figure 1c). PocketGen models the protein-ligand complex as a geometric graph of blocks to handle variable atom counts across different residues and ligands. Initially, the pocket residues are assigned the maximum possible number of atoms (14 atoms) and are later mapped back to specific residue types post-generation.
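To make the block representation concrete, the sketch below (a hypothetical helper with our own function names and data layout, not PocketGen's actual code) pads each residue's atom coordinates to a fixed-size block of 14 atoms with a mask marking the real atoms; 14 is the heavy-atom count of tryptophan, the largest standard residue.

```python
import numpy as np

MAX_ATOMS = 14  # heavy-atom count of tryptophan, the largest standard residue

def residue_to_block(atom_coords):
    """Pad one residue's (n, 3) atom coordinates to a fixed (14, 3) block.

    Returns (coords, mask), where mask[i] is True for real atoms and False
    for padding slots; padded coordinates are zero placeholders.
    """
    atoms = np.asarray(atom_coords, dtype=float)
    n = atoms.shape[0]
    coords = np.zeros((MAX_ATOMS, 3))
    mask = np.zeros(MAX_ATOMS, dtype=bool)
    coords[:n] = atoms
    mask[:n] = True
    return coords, mask
```

After generation, a block's occupied atom slots can be mapped back to a specific residue type, as described above.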
Figure 1. Overview of PocketGen generative model for the design of full-atom ligand-binding protein pockets.
a, Conditioned on the binding ligand molecule and the rest of the protein excluding the pocket region (i.e., the scaffold), PocketGen aims to generate the full atom pocket structure (backbone and sidechain atoms) and the residue type sequence with iterative equivariant refinement. The ligand structure is also adjusted during the protein pocket refinement. b, The bilevel graph transformer is leveraged in PocketGen for all-atom structural encoding and update. The bilevel attention captures both the residue/ligand-level and atom-level interactions. Both the protein pocket structure and the ligand molecule structure are updated in the refinement. c, The sequence refinement module adds lightweight structural adapter layers into pLMs for sequence prediction. Only the adapter's parameters are fine-tuned during training, while the other layers are frozen. In the adapter, cross-attention between sequence and structure features is performed to achieve sequence-structure consistency.
The graph transformer module uses a bilevel attention mechanism to capture interactions at multiple granularities (atom-level and residue/ligand-level) and aspects (intra-protein and protein-ligand). To account for the influence of the redesigned pocket on the ligand, the ligand structure is updated during refinement to capture potential changes in binding pose. To ensure consistency across protein sequence and structure domains and incorporate evolutionary information encoded in protein language models (pLMs)40,41, PocketGen integrates a structural adapter into protein sequence updates. This adapter facilitates cross-attention between sequence and structure features, achieving sequence-structure consistency. Only the adapter is fine-tuned during training, while the remaining layers of the protein language models remain unchanged. Results show that PocketGen significantly outperforms current methods in protein pocket generation across two popular benchmarks. PocketGen achieves an average amino acid recovery rate of 63.40% and a Vina score of −9.655 for top-1 generated protein pockets on the CrossDocked dataset. Comprehensive analyses indicate that PocketGen can produce diverse and high-affinity protein pockets for functional molecules, showcasing its efficacy and potential in designing small-molecule binders and enzymes.
Results
Benchmarking generated protein pockets across multiple axes of molecular evaluation
We benchmark PocketGen on two datasets. The CrossDocked dataset42 comprises protein-molecule pairs generated through cross-docking, divided into train, validation, and test sets based on a sequence identity threshold of 30%. The Binding MOAD dataset43 includes experimentally determined protein-ligand complexes, split into train, validation, and test sets according to the proteins’ enzyme commission numbers44. Given intermolecular distance scales relevant to protein-ligand interactions45, our default experimental setting considers all residues with atoms within 3.5 Å of any binding ligand atoms as part of the protein pocket, averaging about eight residues per pocket. We also investigate PocketGen’s capability to design larger pockets with a radius of 5.5 Å that include more residues (Figure 3c). Methodological details on data processing are provided in Methods.
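The pocket definition above (all residues with any atom within a distance cutoff of any ligand atom) can be sketched in a few lines; this is a minimal illustration with our own function names and data layout, not the paper's data-processing code.

```python
import numpy as np

def pocket_residue_indices(residue_atom_coords, ligand_coords, cutoff=3.5):
    """Return indices of residues with any atom within `cutoff` Å of any ligand atom.

    residue_atom_coords: list of (n_i, 3) arrays, one per residue.
    ligand_coords: (m, 3) array of ligand heavy-atom coordinates.
    """
    ligand = np.asarray(ligand_coords, dtype=float)
    pocket = []
    for i, atoms in enumerate(residue_atom_coords):
        atoms = np.asarray(atoms, dtype=float)
        # all pairwise distances between this residue's atoms and ligand atoms
        d = np.linalg.norm(atoms[:, None, :] - ligand[None, :, :], axis=-1)
        if d.min() <= cutoff:
            pocket.append(i)
    return pocket
```

Raising the cutoff (e.g., to 5.5 Å, as in Figure 3c) selects a larger pocket with more residues.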
Figure 3. Exploring the capabilities of PocketGen.
a, The average runtime of different methods for generating 100 protein pockets for a ligand molecule on the two benchmarks. b, The trade-off between quality (measured by Vina score) and diversity (1 − average pairwise sequence similarity) of PocketGen. The trade-off can be balanced by tuning the sampling temperature hyperparameter. c, The influence of the designed pocket size on the metrics. d, Performance with respect to model scales of pLMs using the ESM series on the CrossDocked dataset. The green dots represent PocketGen models with different ESMs. The bubble size is proportional to the number of trainable parameters. e, PocketGen tends to generate pockets with higher affinity for larger ligand molecules (Pearson correlation; bands indicate the 95% confidence interval). f, The top molecular functional groups leading to high affinity.
We use three groups of metrics to evaluate the quality of protein pockets generated by PocketGen. First, we measure the affinity between the generated pocket and the target ligand molecule using the AutoDock Vina score46, MM-GBSA47, and the min-in-place GlideSP score48. Second, we assess the structural validity of the generated pockets using scRMSD, scTM, and pLDDT. We derive the amino acid sequence for the protein pocket structure using ProteinMPNN29 and predict the pocket structure using ESMFold49 or AlphaFold239. The scRMSD is obtained by aligning the generated and predicted structures and computing the self-consistency root mean squared deviation between their backbone atoms. Following an established strategy50,51, eight sequences are predicted for each generated protein structure, and the sequence with the minimum scRMSD is used for reporting the metric values. Similarly, scTM, the self-consistency template modeling score, is obtained by calculating the TM-score52 between the ESMFold/AlphaFold2-predicted structure and the generated structure, with scores ranging from 0 to 1, where higher values indicate greater designability. We also report the ΔscTM score to evaluate whether the generated pocket improves or degrades the scTM score of the initial protein. The predicted local distance difference test (pLDDT) score39 reflects the confidence in structural predictions on a scale from 0 to 100, with higher scores indicating greater confidence. We report the average pLDDT score from ESMFold/AlphaFold2 across pocket residues. A generated protein pocket structure is defined as designable if the overall structure scRMSD is less than 2 Å and the pocket scRMSD is less than 1 Å26,53,54. Table S1 shows the percentage of designable generated pockets. Supplementary Figure S1 illustrates how these metrics are calculated. Finally, we report the amino acid recovery (AAR), the percentage of correctly predicted pocket residue types, to evaluate the accuracy of the designed sequence.
A higher AAR signifies better modeling of sequence-structure dependencies.
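A minimal sketch of the self-consistency checks described above, assuming backbone coordinates have already been extracted as N×3 arrays (the Kabsch superposition and designability thresholds follow the text; function names are illustrative, not from the paper's pipeline):

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    P = P - P.mean(axis=0)  # center both structures
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=-1))))

def is_designable(protein_scrmsd, pocket_scrmsd):
    """Designability criterion: overall scRMSD < 2 Å and pocket scRMSD < 1 Å."""
    return protein_scrmsd < 2.0 and pocket_scrmsd < 1.0
```

In the actual protocol, `P` and `Q` would be the backbone atoms of the generated structure and the ESMFold/AlphaFold2 prediction for the best of eight redesigned sequences.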
We compare PocketGen to six methods: the deep learning methods RFDiffusion26, RFDiffusionAA (RFAA)16, FAIR24, and dyMEAN25, the template-matching method DEPACT17, and the physics-based method PocketOpt18 (refer to Methods for detailed descriptions of baselines). In Figure 2 and Table S1, PocketGen and the other methods are tasked with generating 100 sequences and structures for each protein-ligand complex in the test sets of the CrossDocked and Binding MOAD datasets. PocketOpt is excluded from this comparison because it focuses on mutating existing pockets for optimization and is too time-consuming for generating many protein pockets. In Table S1, we show the mean and standard deviation of results over three runs with different random seeds. In Figure 2, we bootstrap the generation results and draw the distributions to show how sensitive the results are to the composition of the dataset55. Table S1 and Figure 2 show that PocketGen surpasses all the baselines, including the state-of-the-art methods RFDiffusion and RFDiffusionAA (RFAA), in designability (by 3% and 2% on CrossDocked, respectively) and Vina scores (by 0.199 and 0.123 on CrossDocked, respectively), reflecting its effectiveness in generating structurally valid pockets with high binding affinities. This superior performance can be attributed to PocketGen's capability to capture interactions at multiple granularities (atom-level and residue/ligand-level) and aspects (intra-protein and protein-ligand).
Figure 2. Benchmarking PocketGen on CrossDocked and Binding MOAD datasets.
Shown are a, amino acid recovery rates (AAR); b, Vina score performance; c, designability scores using the ESMFold structure prediction method; and d, designability scores using the AF2 structure prediction method. Uncertainty is quantified via bootstrapping; a two-sided Kolmogorov-Smirnov test is used to compare PocketGen to the best-performing existing model (RFAA). P-value annotation legend: *.
Additionally, PocketGen improves on the best-performing alternative method, RFDiffusionAA, by 13.95% in AAR on average, benefiting from the incorporation of a protein language model that encodes evolutionary sequence information. In contrast, RFDiffusion and RFDiffusionAA require post-processing to derive amino acid types, which may incur sequence-structure inconsistency and inferior AAR performance. In protein engineering, practice often involves mutating several key residues to optimize properties while keeping most residues unchanged to preserve protein folding stability56,57. The high AAR achieved by generated protein pockets aligns with this practice.
In Table 1, the top-1, 3, 5, and 10 protein pockets (ranked by Vina score) generated by PocketGen demonstrate the lowest Vina scores, achieving Vina scores 0.476 lower on average than RFDiffusionAA. Beyond Vina, two additional affinity metrics, MM-GBSA and the GlideSP score, further validate that PocketGen generates higher-affinity pockets (4.287 lower MM-GBSA and 0.376 lower GlideSP score). Moreover, PocketGen also achieves competitive pLDDT, scRMSD, and ΔscTM scores, showcasing its ability to produce high-affinity pockets that maintain structural validity and sequence-structure consistency. With a 97% success rate in creating pockets with higher affinity than the reference cases on CrossDocked (the strongest baseline, RFDiffusionAA, achieves 93%), PocketGen demonstrates effectiveness and broad applicability across various ligand molecules.
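The success rate used here reduces to a simple count over the sampled designs; a hypothetical sketch (lower Vina score means higher predicted affinity):

```python
def success_rate(generated_vina, reference_vina):
    """Fraction of generated pockets with higher predicted affinity
    (i.e., a lower Vina score) than the reference pocket."""
    return sum(v < reference_vina for v in generated_vina) / len(generated_vina)
```

In the benchmark, `generated_vina` would hold the scores of the 100 pockets sampled per complex and `reference_vina` the score of the native pocket.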
Table 1.
The top 1/3/5/10 generated designable protein pocket (ranked by Vina score) on the CrossDocked dataset.
| | PocketOpt | DEPACT | dyMEAN | FAIR | RFDiffusion | RFAA | PocketGen |
|---|---|---|---|---|---|---|---|
| Top-1 generated protein pocket | |||||||
| Vina score (↓) | −9.216±0.154 | −8.527±0.061 | −8.540±0.107 | −8.792±0.122 | −9.037±0.080 | −9.216±0.091 | −9.655±0.094 |
| MM-GBSA (↓) | −58.754±1.220 | −47.130±1.372 | −48.248±0.816 | −51.923±0.588 | −54.817±1.091 | −59.255±1.260 | −63.542±0.717 |
| GlideSP (↓) | −8.612±0.127 | −7.495±0.053 | −7.472±0.088 | −7.584±0.094 | −8.485±0.069 | −8.540±0.065 | −8.916±0.047 |
| Success Rate (↑) | 0.923±0.034 | 0.750±0.016 | 0.762±0.029 | 0.796±0.035 | 0.891±0.020 | 0.930±0.027 | 0.974±0.012 |
| pLDDT (AF2) (↑) | - | 82.164±0.241 | 83.053±0.397 | 83.285±0.240 | 84.432±0.152 | 86.571±0.178 | 86.830±0.145 |
| scRMSD (AF2) (↓) | - | 0.714±0.025 | 0.708±0.022 | 0.693±0.018 | 0.675±0.015 | 0.654±0.012 | 0.645±0.009 |
| ΔscTM (AF2) (↑) | - | −0.008±0.003 | −0.005±0.002 | −0.011±0.005 | 0.022±0.006 | 0.020±0.003 | 0.028±0.002 |
| ΔscTM (AF2+co) (↑) | - | −0.012±0.003 | −0.025±0.004 | −0.032±0.007 | - | - | 0.008±0.002 |
| Top-3 generated protein pockets | |||||||
| Vina score (↓) | −8.878±0.112 | −8.131±0.064 | −8.196±0.090 | −8.321±0.045 | −8.876±0.107 | −8.980±0.057 | −9.353±0.063 |
| MM-GBSA (↓) | −53.372±1.164 | −43.790±1.029 | −44.151±0.534 | −46.050±0.809 | −52.423±0.847 | −53.593±0.722 | −60.770±0.589 |
| GlideSP (↓) | −8.360±0.094 | −7.377±0.039 | −7.325±0.078 | −7.348±0.052 | −8.219±0.049 | −8.233±0.060 | −8.670±0.056 |
| pLDDT (AF2) (↑) | - | 82.049±0.456 | 82.918±0.237 | 83.025±0.334 | 84.260±0.210 | 86.289±0.214 | 86.280±0.135 |
| scRMSD (AF2) (↓) | - | 0.713±0.017 | 0.722±0.011 | 0.692±0.016 | 0.685±0.007 | 0.659±0.014 | 0.660±0.012 |
| ΔscTM (AF2) (↑) | - | −0.011±0.004 | −0.006±0.002 | −0.008±0.003 | 0.021±0.003 | 0.022±0.002 | 0.026±0.003 |
| ΔscTM (AF2+co) (↑) | - | −0.016±0.005 | −0.026±0.004 | −0.034±0.003 | - | - | 0.005±0.001 |
| Top-5 generated protein pockets | |||||||
| Vina score (↓) | −8.702±0.090 | −7.786±0.052 | −7.974±0.049 | −7.943±0.035 | −8.510±0.073 | −8.689±0.044 | −9.239±0.076 |
| MM-GBSA (↓) | −52.080±1.071 | −35.250±0.823 | −37.924±0.340 | −37.816±0.402 | −46.847±0.700 | −51.651±0.809 | −58.083±0.561 |
| GlideSP (↓) | −8.173±0.089 | −7.126±0.035 | −7.294±0.042 | −7.289±0.041 | −8.022±0.030 | −8.093±0.048 | −8.417±0.040 |
| pLDDT (AF2) (↑) | - | 82.445±0.307 | 82.763±0.102 | 83.748±0.271 | 84.505±0.288 | 85.617±0.105 | 85.969±0.080 |
| scRMSD (AF2) (↓) | - | 0.716±0.014 | 0.726±0.011 | 0.698±0.015 | 0.680±0.009 | 0.657±0.006 | 0.655±0.004 |
| ΔscTM (AF2) (↑) | - | −0.009±0.003 | −0.007±0.002 | −0.012±0.004 | 0.019±0.003 | 0.020±0.001 | 0.025±0.001 |
| ΔscTM (AF2+co) (↑) | - | −0.017±0.002 | −0.025±0.006 | −0.035±0.005 | - | - | 0.006±0.002 |
| Top-10 generated protein pockets | |||||||
| Vina score (↓) | −8.556±0.104 | −7.681±0.040 | −7.690±0.054 | −7.785±0.028 | −8.352±0.061 | −8.524±0.038 | −9.065±0.057 |
| MM-GBSA (↓) | −49.257±0.821 | −32.534±0.680 | −33.118±0.269 | −33.670±0.440 | −45.726±0.830 | −47.325±0.540 | −54.800±0.406 |
| GlideSP (↓) | −7.935±0.082 | −6.954±0.042 | −7.022±0.034 | −7.131±0.025 | −7.806±0.022 | −7.840±0.026 | −8.196±0.027 |
| pLDDT (AF2) (↑) | - | 81.520±0.317 | 82.467±0.255 | 83.271±0.228 | 84.080±0.190 | 85.442±0.145 | 85.945±0.139 |
| scRMSD (AF2) (↓) | - | 0.712±0.013 | 0.733±0.014 | 0.706±0.013 | 0.688±0.009 | 0.680±0.010 | 0.659±0.007 |
| ΔscTM (AF2) (↑) | - | −0.014±0.002 | −0.006±0.001 | −0.010±0.003 | 0.016±0.002 | 0.019±0.001 | 0.023±0.002 |
| ΔscTM (AF2+co) (↑) | - | −0.018±0.004 | −0.030±0.002 | −0.033±0.002 | - | - | 0.004±0.002 |
The success rate measures the percentage of proteins for which the model generates pockets with higher affinity than the reference ones in the datasets. Besides the Vina score, we additionally use MM-GBSA and the min-in-place GlideSP score to evaluate binding affinity. We report the average pLDDT of the predicted pocket, the scRMSD of the pocket backbone coordinates, and the change in scTM scores (ΔscTM) of the whole protein. AF2 means the scores are calculated with AlphaFold2 as the folding tool (ESMFold results in Table S2). co indicates codesign, where codesign methods directly use the designed sequence for consistency calculation. The pLDDT, scRMSD, and ΔscTM for PocketOpt are not reported, as PocketOpt keeps protein backbone structures fixed. In the table, metrics are grouped into affinity-related metrics, pocket-structure-related metrics, and whole-protein structure metrics. We report the means and standard deviations over three independent runs with random seeds.
To investigate substructure validity and consistency with reference datasets, we conduct a qualitative substructure analysis (Table S4 and Figure S2). This analysis considers three covalent bonds in the residue backbone (C-N, C=O, and C-C), three dihedral angles in the backbone (φ, ψ, and ω)58, and four dihedral angles in the side chains (χ1–χ4)59. Following previous research60,61, we collect the bond length and angle distributions in the generated pockets and the test dataset and compute the Kullback-Leibler (KL) divergence to quantify the distance between distributions. Lower KL divergence scores for PocketGen indicate its effectiveness in replicating the geometric features present in the data.
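This KL-divergence comparison can be sketched as follows, estimating both empirical distributions with shared-bin histograms and a small pseudocount for empty bins (the binning choices here are our own assumption, not the paper's exact protocol):

```python
import numpy as np

def histogram_kl(samples_p, samples_q, bins=50, eps=1e-9):
    """Estimate KL(P || Q) between two 1-D empirical distributions
    (e.g., C-N bond lengths in generated pockets vs. the test set)."""
    lo = min(np.min(samples_p), np.min(samples_q))
    hi = max(np.max(samples_p), np.max(samples_q))
    # shared bin edges so the two histograms are directly comparable
    p, edges = np.histogram(samples_p, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_q, bins=edges)
    p = p.astype(float) + eps  # pseudocount avoids log(0) and division by zero
    q = q.astype(float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Identical distributions give a KL divergence of zero; the larger the geometric mismatch between generated and reference pockets, the larger the score.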
Probing generative capabilities of PocketGen
Next, we explore PocketGen’s generative capabilities. In addition to designing high-quality protein pockets, generative models need to be efficient and maximize the yield of biochemical experiments, meaning that they can rapidly produce high-fidelity pocket candidates and that only a small number of designs are needed to find a hit. Figure 3a offers a comparative analysis of average runtime across different methods. Traditional approaches, such as physics-based modeling (PocketOpt) and template matching (DEPACT), require over 1,000 seconds to generate 100 pockets. Similarly, the runtimes of the advanced protein backbone generation models RFDiffusion and RFDiffusionAA are substantial due to their diffusion architectures, requiring 1633.5 and 2210.1 seconds on average to design 100 pockets. In contrast, methods that iteratively refine protein designs, including PocketGen, show a notable decrease in generation time: PocketGen takes 44.2 seconds to generate 100 pockets.
While recent methods for pocket generation prioritize maximizing binding affinity with target molecules, this strategy may not always align with practical needs, where pocket diversity also plays a critical role. Examining a batch of designed pockets rather than a single design helps improve the success rate of pocket design. Thus, we examine the relationship between binding affinity and diversity of generated protein pockets in Figure 3b. Diversity is quantified as 1 − (average pairwise pocket residue sequence similarity) and can be adjusted by altering the sampling temperature (with higher temperature resulting in greater diversity). Figure 3b compares PocketGen with the most competitive baseline, RFDiffusionAA16 + LigandMPNN30, the latest version of ProteinMPNN29. Additional results for the RFDiffusion method are in Figure S4. We find that binding affinity and diversity present a trade-off. PocketGen can produce protein pockets with higher affinity than RFDiffusionAA at equivalent levels of diversity.
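A minimal sketch of the diversity metric and temperature-controlled sampling described above, assuming equal-length pocket sequences and per-residue logits (illustrative only, not PocketGen's implementation):

```python
import numpy as np

def sequence_identity(seq_a, seq_b):
    """Fraction of matching residues between two equal-length pocket sequences."""
    assert len(seq_a) == len(seq_b)
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def diversity(sequences):
    """1 - average pairwise sequence similarity over a batch of designed pockets."""
    n = len(sequences)
    sims = [sequence_identity(sequences[i], sequences[j])
            for i in range(n) for j in range(i + 1, n)]
    return 1.0 - sum(sims) / len(sims)

def sample_residue(logits, temperature=1.0, rng=None):
    """Sample a residue type index from logits; higher temperature flattens
    the distribution, trading affinity for diversity."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())  # numerically stable softmax
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

At a very low temperature, sampling collapses to the argmax residue (low diversity); raising the temperature spreads probability over more residue types.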
Figure 3c examines the impact of the redesigned pocket size on PocketGen’s performance. The redesign process targets all residues with atoms within 3.5 Å, 4.5 Å, and 5.5 Å of any binding ligand atoms. The average AAR, RMSD, and Vina scores show a slight decline for larger protein pockets. This trend is attributed to the increased complexity and reduced contextual information associated with expanding the redesigned protein pocket area. Notably, larger pockets tend to enable the exploration of structures with potentially higher affinity, as indicated by the lowest Vina scores reaching −20 kcal/mol for designs with a 5.5 Å radius. This phenomenon could be due to the enhanced structural complementarity achievable in larger pocket designs. In Extended Data Figure 1a,b, two case studies also show that PocketGen has the potential to generate full protein binders for the ligand molecule. The generated protein binders demonstrate high scTM scores (0.900 and 0.976, respectively).
Incorporating protein language models (pLMs) distinguishes PocketGen from prior pocket generation models. Beyond employing ESM-2 650M49 in the default configuration, various versions of ESM with parameter counts ranging from 8M to 15B were also evaluated. As illustrated in Figure 3d, PocketGen’s performance improves with the scaling of pLMs: going from ESM-2 35M to ESM-2 15B, the amino acid recovery rate of PocketGen increases from 54.58% to 66.61%. A logarithmic scaling law is observed, aligning with trends noted in large language models62. PocketGen achieves efficient training with large pLMs by fine-tuning only the adapter layers while keeping the majority of pLM layers fixed. PocketGen also has significantly fewer trainable parameters than RFDiffusionAA16 (7.9M versus 82.9M trainable parameters).
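The logarithmic scaling trend can be checked with a simple least-squares fit of performance against the logarithm of model size; the sketch below is illustrative, using only the two endpoints quoted above (54.58% at 35M parameters and 66.61% at 15B).

```python
import numpy as np

def fit_log_scaling(param_counts, metric_values):
    """Least-squares fit of metric ≈ a + b * log10(parameter count)."""
    x = np.log10(np.asarray(param_counts, dtype=float))
    y = np.asarray(metric_values, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns slope first, then intercept
    return a, b

# Fit through the two reported endpoints (AAR in %, ESM-2 parameter counts):
a, b = fit_log_scaling([35e6, 15e9], [54.58, 66.61])
```

With more (model size, metric) pairs from Figure 3d, the goodness of this fit would indicate how closely the trend follows a logarithmic law.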
The performance of PocketGen in generating binding pockets is influenced by the characteristics of the ligand molecule. Figure 3e illustrates the relationship between the average Vina score of generated pockets and the number of ligand atoms, indicating that PocketGen tends to create pockets with higher affinity for larger ligand molecules. This trend may be due to the increased surface area for interaction, the presence of additional functional groups, and enhanced flexibility in the conformation of larger molecules63,64. Additionally, functional groups in ligand molecules that contribute to high binding affinity were identified using IFG65. Figure 3f displays the top 10 molecular functional groups, which include hydrogen bond donors and acceptors (e.g., carbonyl groups), aromatic rings, sulfhydryl groups, and halogens. These groups can form favorable interactions with protein pockets, thereby increasing binding affinity.
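The size-affinity trend in Figure 3e is a Pearson correlation between ligand atom counts and Vina scores; a short sketch with illustrative (not real) data:

```python
import numpy as np

def affinity_size_correlation(ligand_atom_counts, vina_scores):
    """Pearson correlation between ligand size and Vina score; a negative
    value means larger ligands tend to receive lower (better) Vina scores."""
    return float(np.corrcoef(ligand_atom_counts, vina_scores)[0, 1])
```

Applied to the benchmark results, each pair would be one ligand's heavy-atom count and the average Vina score of its generated pockets.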
As PocketGen also updates ligand structures during pocket generation, we leverage PoseBusters66 to evaluate the structural validity of the updated ligands. Extended Data Figure 1e shows the detailed validity checks, where PocketGen achieves over 95% on all tests. This is unsurprising because PocketGen only slightly updates the ligand structure during pocket generation and thereby preserves ligand structural validity. In Extended Data Figure 1c, we also explore the relationship between affinity and the RMSD to the crystal structure on PDBBind. Generally, lower RMSD corresponds to higher affinity. Extended Data Figure 1d shows that PocketGen improves most of the protein-ligand complexes in PDBBind by redesigning the pockets.
We performed ablation studies (Table S5) and hyperparameter analysis (Figure S3) to evaluate the contribution of each module in PocketGen and the impact of loss function hyperparameters on model performance. Among others, we replaced the bilevel graph transformer in PocketGen with other popular encoders in structural biology, such as EGNN67, GVP68, and GMN69 for comparison. The results show that the bilevel graph transformer and the integration of pLM into PocketGen significantly improve performance. Additionally, PocketGen is robust to hyperparameter variations, consistently producing competitive results.
Generating protein pockets for therapeutic small molecule ligands
Next, we illustrate the capacity of PocketGen to redesign the pockets of antibodies, enzymes, and biosensors for target ligand molecules, drawing upon previous research3,10,16. Specifically, we consider the following molecules. Cortisol (HCY)70 is a primary stress hormone that elevates glucose levels in the bloodstream and serves as a biomarker for stress and other conditions. We redesign the pocket of a cortisol-specific antibody (PDB ID 8cby) to potentially aid in developing immunoassays. Apixaban (APX)71 is an oral anticoagulant approved by the FDA in 2012 for patients with non-valvular atrial fibrillation to reduce the risk of stroke and blood clots72. Apixaban targets Factor Xa (fXa) (PDB ID 2p16), a crucial enzyme in blood coagulation that transforms prothrombin into thrombin for clot formation. Redesigning the pocket of fXa could, therefore, have therapeutic significance. Fentanyl (7V7)73 has become a widely abused drug contributing to the opioid crisis. Computational design of fentanyl-binding proteins (biosensors) can facilitate detection and neutralization of the toxin10. For example, Baker et al.10 developed a biosensor (PDB ID 5tzo) for detecting fentanyl in plants. In Figure 4, PLIP74 describes the interactions between the designed protein pocket and ligands, comparing these predicted interactions to the original patterns.
Figure 4. Using PocketGen to design protein pockets for binding with important ligands.
a, b, c, Illustrations of protein-ligand interaction analysis for three target molecules (HCY, APX, and 7V7, respectively). ‘PocketGen’ refers to the protein pocket designed by PocketGen, and ‘Original’ denotes the original protein-ligand structure. ‘HP’ indicates hydrophobic interactions, ‘HB’ signifies hydrogen bonds, and ‘π’ denotes π-stacking/π-cation interactions. In the residue sequences, residues shown in red are designed residues that differ from the original pocket. d, e, f, The pocket binding affinity distributions of PocketGen and baseline methods for the three target molecules (HCY, APX, and 7V7, respectively). The Vina score of the original pocket is marked with a vertical dotted line. For each method, we sample 100 pockets for each target ligand. The ratios of pockets generated by PocketGen with higher affinity than the corresponding reference pocket are 11%, 40%, and 45%, respectively. g, h, Protein-ligand interaction analysis for proteins unseen in the training dataset (PiB21 and luxsit8). The target molecules are Rucaparib and DTZ, respectively. i, j, The pocket binding affinity distributions of PocketGen and baselines for Rucaparib and DTZ.
To generate pockets for the aforementioned small molecules, we pretrained PocketGen on the Binding MOAD dataset and carefully excluded the protein-ligand complexes used in the case studies. The pockets produced by PocketGen replicate most non-bonded interactions from experimentally measured protein-ligand complexes (e.g., achieving a 13/15 match for HCY) and introduce additional physically plausible interaction patterns not present in the original complexes (the generated pockets for the HCY, APX, and 7V7 molecules form 2, 3, and 4 extra interactions, respectively). Specifically, for HCY, PocketGen maintains essential interaction patterns such as hydrophobic interactions (TRP47, PHE50, TYR59, and TYR104) and hydrogen bonds (TYR59), and introduces two new hydrogen bond-mediated interactions with the pocket. For the protein pockets designed to bind the APX and 7V7 ligands, while retaining key interaction patterns including hydrophobic interactions, hydrogen bonds, and π-stacking, PocketGen establishes additional interactions (e.g., a π-cation interaction with LYS192 for APX and hydrogen bonds with ASN35 for 7V7), thus enhancing binding affinity with the target ligands. In conclusion, PocketGen can establish non-covalent interactions derived from protein-ligand structure data.
With the ability to establish favorable protein-ligand interactions, PocketGen generates high-affinity pockets for these drug ligands. In Figure 4d,e,f, we show the affinity distributions of the pockets generated by PocketGen and the baseline methods. The ratios of generated pockets with higher affinity than the reference pocket are 11%, 40%, and 45%, respectively, for PocketGen. In contrast, the best runner-up method, RFDiffusionAA, achieves only 0%, 10%, and 18%, respectively.
Protein stability is crucial in protein design, ensuring that the designed protein can fold into and maintain its 3D structure75. Stability is quantified by the difference in the Gibbs free energy of folding between the redesigned protein and the wild-type (original) protein, $\Delta\Delta G = \Delta G_{\text{wild-type}} - \Delta G_{\text{redesigned}}$. A positive $\Delta\Delta G$ indicates increased protein stability, while a negative value suggests decreased stability. We used DDMut76 to predict the change in stability for the pockets generated in Figure 4, obtaining $\Delta\Delta G$ values of 0.09 (HCY), 0.92 (APX), 0.13 (7V7), 0.27 (Rucaparib), and 0.02 (DTZ), respectively. These results indicate that PocketGen can generate protein structures that are likely sufficiently stable to bind ligand molecules.
To demonstrate the generalization capability of PocketGen, we tested it on proteins not seen in the training set, including PiB21 and LuxSit8, with the binding ligand molecules Rucaparib and DTZ, respectively. Figures 4g and 4h show the interaction analysis, while Figures 4i and 4j display the distributions of Vina scores. PocketGen consistently outperforms baseline methods in generating higher-affinity pockets. Generating pockets with higher affinity than the reference pocket for DTZ is more challenging, as the original pocket was optimized using site-saturation mutagenesis8. In Extended Data Figure 1f, we report case studies with a pair of activity-cliff ligand molecules (C19 and C52)77 to explore PocketGen’s adaptability. The interactions generated by PocketGen vary across molecular fragments: hydrogen bonds and hydrophobic interactions are generated for one fragment, and halogen bonds for a different fragment, suggesting that PocketGen has learned protein-ligand interaction rules that enable it to adaptively design high-affinity binding pockets.
Exploring interactions between atoms in protein and ligand molecules produced by PocketGen
We analyze attention maps to understand what PocketGen has learned, using a generated pocket for the ligand APX as an example. Figure 5a presents a 2D interaction plot drawn with the Schrödinger Maestro software. To assess PocketGen’s recognition of meaningful protein-ligand interactions, we plot heatmaps of the attention weights produced by the final layer of PocketGen’s neural architecture. In Figure 5b, two attention heads are illustrated, with rows corresponding to protein residues and columns to ligand atoms. The attention heatmaps are sparse, reflecting PocketGen’s use of sparse attention (see Methods for more details). The attention heads display diverse patterns, focusing on different aspects. For instance, the first head appears to emphasize hydrogen bonds, assigning high weights between residues THR146 and ASP220 and ligand atom 7. The second attention head seems to capture π-stacking and π-cation interactions, specifically between residue TYR99 and ligand atoms 15, 21, 23, 25, 29, and 33, and between residue LYS192 and ligand atoms 1, 14, 17, 19, and 20. These analyses indicate that PocketGen, despite its data-driven nature, has acquired biochemical knowledge to identify intermolecular interactions.
Figure 5. Attention maps in PocketGen capture interactions between atoms in protein and ligand molecules.
a, The 2D interaction plot of the pocket designed by PocketGen for APX. b, The heatmap of attention matrices between residues and ligand atoms from the last layer of PocketGen. We show two selected attention heads, with notable attention patterns marked with red rectangles. Each head emphasizes different interactions. For example, PocketGen recognizes the hydrogen bond interactions and assigns strong attention weights between residues ① THR146 and ② ASP220 and ligand atom 7 in the first head. The π-stacking and π-cation interactions of ③ TYR99 and ④ LYS192 are well captured in the second head. The values are normalized by the maximum value $a_{\max}$ and the minimum value $a_{\min}$ in each heatmap (i.e., $(a - a_{\min})/(a_{\max} - a_{\min})$).
Discussion
Protein-ligand binding is critical in enzyme catalysis, immune recognition, cellular signal transduction, gene expression control, and other biological processes. Recent developments include deep generative models designed to study protein-ligand binding: Lingo3DMol78, ResGen79, and PocketFlow80 generate de novo drug-like ligand molecules for fixed protein targets, and NeuralPLexer4 creates the structure of protein-ligand complexes given the protein sequence and ligand molecular graph. However, these models do not facilitate the de novo generation of protein pockets, the interfaces that bind the ligand molecule, which is critical in enzyme and biosensor engineering. We developed PocketGen, a deep generative method capable of generating both the residue sequence and the full-atom structure of the protein pocket region that binds a target ligand molecule. PocketGen includes two main modules: a bilevel graph transformer for structural encoding and updates, and a sequence refinement module that uses protein language models (pLMs) for sequence prediction. For structure prediction, the bilevel graph transformer directly updates the all-atom coordinates instead of separately predicting the backbone frame orientation and side-chain torsion angles. To achieve sequence-structure consistency and effectively leverage the evolutionary knowledge in pLMs, a structural adapter is integrated into the protein language model for sequence updates. This adapter employs cross-attention between sequence and structure features to promote information flow and ensure sequence-structure consistency. Extensive experiments across benchmarks and case studies involving therapeutic ligand molecules illustrate PocketGen’s ability to generate high-fidelity pocket structures with high binding affinity and favorable interactions with target ligands.
Analysis of PocketGen’s performance across various settings reveals its proficiency in balancing diversity and affinity and in generalizing across different pocket sizes. Additionally, PocketGen offers computational efficiency, significantly reducing runtime compared with traditional physics-based methods and making it feasible to sample large numbers of pocket candidates. PocketGen surpasses existing methods in efficiently generating high-affinity protein pockets for target ligand molecules, identifying important interactions between protein and ligand atoms, and attaining consistency between the sequence and structure domains.
PocketGen opens several fruitful directions for future work. PocketGen could be expanded to design larger areas of the protein beyond the pocket region. While PocketGen has been evaluated on larger pocket designs, modifications will be required to enhance scalability and robustness when generating larger protein areas. Another promising direction involves incorporating additional biochemical priors, such as subpockets81 and interaction templates17, to improve generalizability and success rates. For instance, despite overall dissimilarity, two protein pockets might still bind the same fragment if they share similar subpockets82. Moreover, wet-lab experiments could provide empirical validation of PocketGen’s effectiveness. Approaches such as PocketGen have the potential to advance machine learning and bioengineering and to help with the design of small-molecule binders and enzymes.
Methods
Overview of PocketGen
Unlike previous methods focusing on protein sequence or structure generation, we aim to co-design both residue types (sequences) and 3D structures of the protein pocket that can fit and bind with target ligand molecules. Inspired by previous works on structure-based drug design79,81 and protein generation34,35, we formulate pocket generation in PocketGen as a conditional generation problem that generates the sequences and structures of the pocket conditioned on the protein scaffold (the parts of the protein other than the pocket region) and the binding ligand. To be specific, let $\mathcal{A} = (a_1, \ldots, a_N)$ denote the whole protein sequence of residues, where $N$ is the length of the sequence. The 3D structure of the protein can be described as a point cloud of protein atoms; let $\mathbf{X}_i \in \mathbb{R}^{n_i \times 3}$ denote the 3D coordinates of the atoms of the $i$-th residue, where $n_i$ is the number of atoms in the residue, determined by the residue type. The first four atoms of any residue correspond to its backbone atoms (N, Cα, C, O), and the rest are the side-chain atoms. The ligand molecule can also be represented as a 3D point cloud $\mathcal{M} = \{(v_j, \mathbf{z}_j)\}_{j=1}^{m}$, where $v_j$ denotes the atom feature and $\mathbf{z}_j \in \mathbb{R}^3$ denotes the 3D coordinates of ligand atom $j$. Our work defines the protein pocket as the set of residues in the protein closest to the binding ligand molecule. The pocket can thus be represented as an amino acid subsequence of the protein, $\mathcal{B} = (a_{e_1}, \ldots, a_{e_k})$, where $e = \{e_1, \ldots, e_k\}$ is the index set of the pocket residues in the whole protein. The index set can be formally given as $e = \{\, i \mid \min_{p,\,j} \lVert \mathbf{X}_i[p] - \mathbf{z}_j \rVert_2 \le \delta \,\}$, where $\lVert \cdot \rVert_2$ is the L2 distance norm and $\delta$ is the distance threshold. According to the distance range of pocket-ligand interactions45, we set $\delta = 3.5$ Å in the default setting. With the above-defined notations, PocketGen aims to learn a conditional generative model formally defined as:
$p(\mathcal{B} \mid \mathcal{A} \setminus \mathcal{B},\, \mathcal{M})$ (1)
where $\mathcal{A} \setminus \mathcal{B}$ denotes the parts of the protein other than the pocket region and $\mathcal{M}$ is the binding ligand. We also adjust the structure of the ligand molecule in PocketGen to encourage protein-ligand interactions and reduce steric clashes.
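As an illustrative sketch (not the released PocketGen code), the pocket definition above — residues with at least one atom within the distance threshold of any ligand atom — can be written as follows; the function name and the 3.5 Å default are assumptions for illustration:

```python
import numpy as np

def select_pocket_residues(residue_atoms, ligand_coords, dist_threshold=3.5):
    """Return indices of residues with at least one atom within
    `dist_threshold` angstroms of any ligand atom.

    residue_atoms: list of (n_i, 3) arrays, one per residue
    ligand_coords: (m, 3) array of ligand atom coordinates
    """
    pocket = []
    for idx, atoms in enumerate(residue_atoms):
        # pairwise distances between this residue's atoms and all ligand atoms
        d = np.linalg.norm(atoms[:, None, :] - ligand_coords[None, :, :], axis=-1)
        if d.min() <= dist_threshold:
            pocket.append(idx)
    return pocket
```

For example, a residue whose closest atom lies 2 Å from the ligand is selected, while one 8 Å away is not.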
To effectively generate the structure and sequence of the protein pocket, PocketGen introduces an equivariant bilevel graph transformer and a sequence refinement module built on pretrained protein language models with adapters; these are discussed in the following paragraphs. The illustrative workflow is depicted in Fig. 1.
Equivariant bilevel graph transformer
It is critical to model the complex interactions in protein pocket-ligand complexes for pocket generation. However, the multi-granularity (e.g., atom-level and residue-level) and multi-aspect (intra-protein and protein-ligand) nature of these interactions brings many challenges. Inspired by recent works on hierarchical graph transformers81 and generalist equivariant transformers83, we propose a novel equivariant bilevel graph transformer to model the multi-granularity and multi-aspect interactions. Each residue or ligand is represented as a block (i.e., a set of atoms) for conciseness of representation and ease of computation. The protein-ligand complex can then be abstracted as a geometric graph of sets $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ denotes the blocks and $\mathcal{E}$ includes all the edges between blocks. We add self-loops to the edges to capture interactions within a block (e.g., the interactions between ligand atoms). Our model adaptively assigns different numbers of channels to $\mathbf{H}_i$ and $\mathbf{X}_i$ to accommodate different numbers of atoms in residues and ligands. For example, for a block $i$ with $n_i$ atoms, $\mathbf{H}_i \in \mathbb{R}^{n_i \times d}$ contains the atom features ($d$ is the feature dimension size) and $\mathbf{X}_i \in \mathbb{R}^{n_i \times 3}$ the atom coordinates. Specifically, the $p$-th rows of $\mathbf{H}_i$ and $\mathbf{X}_i$ correspond to the $p$-th atom’s trainable feature and coordinates, respectively. The trainable feature is initialized with the concatenation of the atom type embedding, the residue/ligand embedding, and the atom positional embedding. To build $\mathcal{E}$, we connect the $k$-nearest neighboring residues according to pairwise distances. To reflect interactions between the protein pocket and the ligand, we add edges between all pocket residues and the ligand block. Below we describe the two modules of PocketGen’s equivariant bilevel graph transformer: the bilevel attention module and the equivariant feed-forward network.
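The graph construction described above can be sketched as follows; the function and the toy centroids are illustrative assumptions, with each residue reduced to a centroid and the ligand abstracted as one extra block:

```python
import numpy as np

def build_block_graph(residue_centroids, pocket_idx, k=8):
    """Connect each residue to its k nearest residues (by centroid distance)
    and connect every pocket residue to the single ligand block, whose index
    is len(residue_centroids). Self-loops capture intra-block interactions."""
    n = len(residue_centroids)
    c = np.asarray(residue_centroids)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self from the kNN search
    edges = {(i, i) for i in range(n + 1)}  # self-loops, ligand block included
    for i in range(n):
        for j in np.argsort(d[i])[: min(k, n - 1)]:
            edges.add((i, int(j)))
    for i in pocket_idx:  # pocket residues <-> ligand block
        edges.add((i, n))
        edges.add((n, i))
    return edges
```

With `k=8` (the default neighbor count reported in the Implementation section), each residue block ends up with at most eight residue neighbors plus, for pocket residues, an edge to the ligand block.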
Bilevel attention module.
Our model captures both atom-level and residue/ligand-level interactions with the bilevel attention module. Firstly, given two blocks $i$ and $j$ connected by an edge $(i, j) \in \mathcal{E}$, we obtain the query, key, and value matrices with the following transformations:

$\mathbf{Q}_i = \mathbf{H}_i \mathbf{W}_Q, \quad \mathbf{K}_j = \mathbf{H}_j \mathbf{W}_K, \quad \mathbf{V}_j = \mathbf{H}_j \mathbf{W}_V$ (2)

where $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are trainable parameters. To calculate the atom-level attention between the $i$-th and $j$-th blocks, we denote $\mathbf{x}_{ij}$ and $d_{ij}$ as the relative coordinates and distances between atom pairs in blocks $i$ and $j$, namely $\mathbf{x}_{ij}[p,q] = \mathbf{X}_i[p] - \mathbf{X}_j[q]$ and $d_{ij}[p,q] = \lVert \mathbf{x}_{ij}[p,q] \rVert_2$.
Then we have:
$\mathbf{R}_{ij} = \dfrac{\mathbf{Q}_i \mathbf{K}_j^{\top}}{\sqrt{d}} + \sigma_D(\mathrm{RBF}(d_{ij}))$ (3)

$\boldsymbol{\alpha}_{ij} = \mathrm{Softmax}(\mathbf{R}_{ij})$ (4)

where $\sigma_D$ is a multi-layer perceptron (MLP) that adds a distance bias to the attention calculation, and $\mathrm{RBF}$ embeds the distance with radial basis functions. $\boldsymbol{\alpha}_{ij}$ is the atom-level attention matrix obtained by applying a row-wise Softmax on $\mathbf{R}_{ij}$. To encourage sparsity in the attention matrix, we keep the top-$k$ elements of each row of $\boldsymbol{\alpha}_{ij}$ and set the others to zero.
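A minimal sketch of the atom-level attention with distance bias and top-$k$ sparsification follows; a fixed RBF projection stands in for the learned MLP, and all shapes and values are illustrative, not the trained model:

```python
import numpy as np

def atom_level_attention(Qi, Kj, dij, num_rbf=16, top_k=3):
    """Scaled dot-product scores plus a distance bias (a fixed RBF
    projection standing in for the learned MLP), row-wise softmax,
    then top-k sparsification of each row."""
    d = Qi.shape[1]
    centers = np.linspace(0.0, 10.0, num_rbf)
    rbf = np.exp(-((dij[..., None] - centers) ** 2))  # RBF embedding of distances
    bias = rbf.mean(axis=-1)                          # stand-in for the learned MLP
    R = Qi @ Kj.T / np.sqrt(d) + bias                 # scores with distance bias
    A = np.exp(R - R.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)              # row-wise softmax
    if top_k < A.shape[1]:                            # keep top-k entries per row
        thresh = np.sort(A, axis=1)[:, -top_k][:, None]
        A = np.where(A >= thresh, A, 0.0)
    return R, A
```

After sparsification, each row of the attention matrix retains only its `top_k` largest entries, mirroring the sparse attention patterns visible in the Figure 5 heatmaps.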
The residue/ligand-level attention from the $j$-th block to the $i$-th block is calculated as:

$r_{ij} = \mathbf{1}^{\top} \mathbf{R}_{ij} \mathbf{1}$ (5)

$\beta_{ij} = \dfrac{\exp(r_{ij})}{\sum_{j' \in \mathcal{N}(i)} \exp(r_{ij'})}$ (6)

where $\mathbf{1}$ refers to the column vector with all elements set to one and $\mathcal{N}(i)$ denotes the neighboring blocks of block $i$. $r_{ij}$ sums up all values in $\mathbf{R}_{ij}$ to represent the overall correlation between blocks $i$ and $j$, and $\beta_{ij}$ denotes the resulting attention across blocks at the block level.
We can update the representations and coordinates using the above atom-level and residue/ligand-level attentions. PocketGen only updates the coordinates of the residues in the pocket and of the ligand molecule; the other protein residues are kept fixed. Specifically, for the $p$-th atom in block $i$:

$\mathbf{m}_{ij}[p] = \sum_{q} \boldsymbol{\alpha}_{ij}[p,q]\, \phi_h\!\left(\left[\mathbf{Q}_i[p] \,\Vert\, \mathbf{V}_j[q]\right]\right)$ (7)

$\tilde{\mathbf{H}}_i[p] = \sum_{j \in \mathcal{N}(i)} \beta_{ij}\, \mathbf{m}_{ij}[p]$ (8)

$\tilde{\mathbf{X}}_i[p] = \mathbf{X}_i[p] + \sum_{j \in \mathcal{N}(i)} \beta_{ij} \sum_{q} \boldsymbol{\alpha}_{ij}[p,q]\, \phi_x\!\left(\left[\mathbf{Q}_i[p] \,\Vert\, \mathbf{V}_j[q]\right]\right) \odot \mathbf{x}_{ij}[p,q]$ (9)

where $\phi_h$ and $\phi_x$ are MLPs with concatenated representations as input (concatenation along the second dimension, with $\mathbf{Q}_i[p]$ repeated along the rows of $\mathbf{V}_j$), and $\odot$ computes the element-wise multiplication. $\tilde{\mathbf{H}}_i$ and $\tilde{\mathbf{X}}_i$ denote the updated representation and coordinate matrices, and we can verify that their dimension sizes remain the same regardless of the neighboring block size $n_j$. Furthermore, as the attention coefficients $\boldsymbol{\alpha}_{ij}$ and $\beta_{ij}$ are invariant under E(3) transformations, the coordinate update adheres to E(3)-equivariance. Additionally, the permutation of atoms within each block does not affect this update process.
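The E(3)-equivariance of this kind of coordinate update can be checked with a small sketch: relative vectors are attention-weighted and scaled by a scalar gate standing in for the learned MLP (names and shapes are illustrative assumptions):

```python
import numpy as np

def equivariant_coord_update(Xi, Xj, alpha, beta_ij, gate=1.0):
    """Sketch of an attention-based coordinate update: atom p of block i
    moves along its relative vectors to the atoms q of a neighboring block j,
    weighted by atom-level attention alpha[p, q] and block-level attention
    beta_ij. `gate` is a scalar standing in for the learned MLP output."""
    rel = Xi[:, None, :] - Xj[None, :, :]         # relative vectors x_ij[p, q]
    delta = (alpha[..., None] * rel).sum(axis=1)  # attention-weighted displacement
    return Xi + beta_ij * gate * delta
```

Because the update is built from relative vectors scaled by invariant weights, it commutes with any rotation plus translation of the inputs, which is the E(3)-equivariance property claimed in the text.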
Equivariant feed-forward network.
We adapted the feed-forward network (FFN) module of the transformer model84 to update $\mathbf{H}_i$ and $\mathbf{X}_i$. Specifically, the representations and coordinates of atoms are updated with respect to the block’s feature/geometric centroids (means). The centroids are denoted as:

$\mathbf{h}_i = \dfrac{1}{n_i} \sum_{p} \mathbf{H}_i[p], \quad \mathbf{x}_i = \dfrac{1}{n_i} \sum_{p} \mathbf{X}_i[p]$ (10)

Then we obtain the relative coordinate $\Delta\mathbf{X}_i[p]$ and the relative distance representation $\mathbf{r}_i[p]$ based on the L2 norm of $\Delta\mathbf{X}_i[p]$:

$\Delta\mathbf{X}_i[p] = \mathbf{X}_i[p] - \mathbf{x}_i, \quad \mathbf{r}_i[p] = \mathrm{RBF}\!\left(\lVert \Delta\mathbf{X}_i[p] \rVert_2\right)$ (11)

The representations and coordinates of atoms are updated with MLPs $\psi_h$ and $\psi_x$. The centroids are integrated to inform the update of the block context:

$\tilde{\mathbf{H}}_i[p] = \mathbf{H}_i[p] + \psi_h\!\left(\left[\mathbf{H}_i[p] \,\Vert\, \mathbf{h}_i \,\Vert\, \mathbf{r}_i[p]\right]\right)$ (12)

$\tilde{\mathbf{X}}_i[p] = \mathbf{X}_i[p] + \psi_x\!\left(\left[\mathbf{H}_i[p] \,\Vert\, \mathbf{h}_i \,\Vert\, \mathbf{r}_i[p]\right]\right) \cdot \Delta\mathbf{X}_i[p]$ (13)

To stabilize and accelerate training, layer normalization85 is applied at each layer of the equivariant bilevel graph transformer to normalize $\mathbf{H}_i$. The equivariant feed-forward network satisfies E(3)-equivariance; because each module is E(3)-equivariant, the whole bilevel graph transformer has the desirable property of E(3)-equivariance (Theorem 1 in the Supplementary Information gives details). In PocketGen, we use an E(3)-equivariant model for its simplicity, similar to previous works86,87; this is sufficient to achieve strong performance. We note that an SE(3)-equivariant architecture would be better suited to learning chirality-related properties of proteins, which we leave for future exploration.
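A sketch of this centroid-based equivariant FFN follows, with random weights standing in for the trained MLPs (all shapes are illustrative assumptions): features are updated from centroid-relative distances, which are E(3)-invariant, while coordinates move only along centroid-relative directions, so rotating and translating the input rotates and translates the output.

```python
import numpy as np

def equivariant_ffn(Hi, Xi, hidden=8, seed=0):
    """Sketch of an equivariant FFN: atoms are updated relative to the
    block's feature/geometric centroids; coordinates move only along
    centroid-relative directions, keeping the map E(3)-equivariant."""
    rng = np.random.default_rng(seed)
    d = Hi.shape[1]
    W1 = rng.normal(size=(2 * d + 1, hidden))
    W2 = rng.normal(size=(hidden, d))
    W3 = rng.normal(size=(hidden, 1))
    h_c = Hi.mean(axis=0)                       # feature centroid
    x_c = Xi.mean(axis=0)                       # geometric centroid
    dx = Xi - x_c                               # centroid-relative coordinates
    r = np.linalg.norm(dx, axis=1, keepdims=True)  # invariant distances
    inp = np.concatenate([Hi, np.tile(h_c, (len(Hi), 1)), r], axis=1)
    hid = np.tanh(inp @ W1)
    H_new = Hi + hid @ W2                       # feature update (invariant)
    X_new = Xi + (hid @ W3) * dx                # scalar gate times direction
    return H_new, X_new
```

Rotating and translating `Xi` leaves `H_new` unchanged and transforms `X_new` by the same rotation and translation, which is exactly the equivariance claim above.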
Sequence refinement with pretrained protein language models and adapters
Protein language models (pLMs), such as the ESM family of models40,41, have learned extensive evolutionary knowledge from the vast array of natural protein sequences and demonstrate a strong ability to design protein sequences. In PocketGen, we leverage pLMs to help refine the designed protein pocket sequences. To infuse the pLMs with structural information, we implant lightweight structural adapters, inspired by previous works88,89. Different from LM-Design89, which focuses on protein sequence design given a fixed backbone structure, PocketGen co-designs both the amino acid sequence and the full-atom structure of the protein pocket. In our default setting, only one structural adapter is placed after the last layer of the pLM. Only the adapter layers are fine-tuned during training; the other layers of the pLM are frozen to save computation costs. The structural adapter has the following two parts.
Structure-sequence cross attention.
The structural representation $\mathbf{h}_i$ of the $i$-th residue is obtained by mean pooling of $\mathbf{H}_i$ from the bilevel graph transformer. In the input to the pLM, the pocket residue types to be designed are masked, and we denote the $i$-th residue representation from the pLM as $\mathbf{s}_i$. In the structural adapter, we perform cross-attention between the structural representations $\mathbf{H} = [\mathbf{h}_1; \ldots; \mathbf{h}_N]$ and the sequence representations $\mathbf{S} = [\mathbf{s}_1; \ldots; \mathbf{s}_N]$. The query, key, and value matrices are obtained as follows:

$\mathbf{Q} = \mathbf{S}\mathbf{W}_Q, \quad \mathbf{K} = \mathbf{H}\mathbf{W}_K, \quad \mathbf{V} = \mathbf{H}\mathbf{W}_V$ (14)

where $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are trainable weight matrices. Rotary positional encoding90 is applied to the representations; we omit it in the equations for simplicity. The output of the cross-attention is obtained as:

$\mathbf{O} = \mathrm{Softmax}\!\left(\dfrac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d}}\right)\mathbf{V}$ (15)
Bottleneck feed-forward network.
A bottleneck feed-forward network (FFN) is appended after the cross-attention to impose non-linearity and abstract representations, inspired by previous works such as Houlsby et al.88. The intermediate dimension of the bottleneck FFN is set to be half of the default representation dimension. Finally, the predicted pocket residue type is obtained using an MLP on the output residue representation.
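The adapter’s cross-attention can be sketched as below; identity matrices stand in for the trainable projections, and rotary encoding is omitted as in the text (everything here is an illustrative assumption, not the trained adapter):

```python
import numpy as np

def adapter_cross_attention(S, H, d_k=None):
    """Sketch of cross-attention in the structural adapter: pLM sequence
    representations S (queries) attend to structural representations H
    (keys/values). Identity projections stand in for trainable W_Q, W_K, W_V,
    so S and H are assumed to share the same feature dimension here."""
    d_k = d_k or H.shape[1]
    Q, K, V = S, H, H                              # W_Q = W_K = W_V = I
    scores = Q @ K.T / np.sqrt(d_k)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)           # row-wise softmax
    return A @ V                                   # structure-informed features
```

Each output row is a convex combination of the structural representations, which is how structural information flows into the sequence branch before the residue-type prediction head.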
Training protocol
Inspired by AlphaFold239, we use a recycling strategy for model training. Recycling facilitates the training of deeper networks without incurring extra memory costs by executing multiple forward passes and computing gradients solely for the final pass. The training loss of PocketGen is the weighted sum of the following three losses:
$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{seq}} + \lambda_2 \mathcal{L}_{\mathrm{coord}} + \lambda_3 \mathcal{L}_{\mathrm{struct}}$ (16)

$\mathcal{L}_{\mathrm{seq}} = \dfrac{1}{T} \sum_{t=1}^{T} \mathrm{CE}\!\left(\hat{\mathcal{B}}^{(t)}, \mathcal{B}^{*}\right)$ (17)

$\mathcal{L}_{\mathrm{coord}} = \dfrac{1}{T} \sum_{t=1}^{T} \left[ \mathrm{Huber}\!\left(\hat{\mathbf{X}}^{(t)}, \mathbf{X}^{*}\right) + \mathrm{Huber}\!\left(\hat{\mathbf{Z}}^{(t)}, \mathbf{Z}^{*}\right) \right]$ (18)

$\mathcal{L}_{\mathrm{struct}} = \dfrac{1}{T} \sum_{t=1}^{T} \left[ \sum_{b \in \mathcal{E}_{\mathrm{bond}}} \lVert \hat{b}^{(t)} - b^{*} \rVert^{2} + \sum_{a \in \mathcal{E}_{\mathrm{angle}}} \lVert \hat{a}^{(t)} - a^{*} \rVert^{2} \right]$ (19)

where $T$ is the total number of refinement rounds; $\mathcal{B}^{*}$, $\mathbf{X}^{*}$, and $\mathbf{Z}^{*}$ are the ground-truth residue types, residue coordinates, and ligand coordinates, and $b^{*}$ and $a^{*}$ are the ground-truth bond lengths and bond/dihedral angles; $\hat{\mathcal{B}}^{(t)}$, $\hat{\mathbf{X}}^{(t)}$, and $\hat{\mathbf{Z}}^{(t)}$ are the corresponding predictions at the $t$-th round by PocketGen. The sequence loss $\mathcal{L}_{\mathrm{seq}}$ is the cross-entropy loss for pocket residue type prediction; the coordinate loss $\mathcal{L}_{\mathrm{coord}}$ uses the Huber loss91 for training stability; the structure loss $\mathcal{L}_{\mathrm{struct}}$ supervises bond lengths and bond/dihedral angles for realistic local geometry. $\mathcal{E}_{\mathrm{bond}}$ and $\mathcal{E}_{\mathrm{angle}}$ denote all the bonds and angles in the protein pocket (including side chains). $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters balancing the three losses, chosen by a grid search over {0.5, 1.0, 2.0, 3.0} based on validation performance. In the default setting, we set $\lambda_2$ to 1.0 and $\lambda_3$ to 2.0.
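The loss computation can be sketched as follows for scalar toy targets; the per-round dictionaries, the default weights, and the squared-error structure term are illustrative assumptions rather than the training code:

```python
import math

def huber(pred, target, delta=1.0):
    """Huber loss on a scalar difference (quadratic near 0, linear beyond delta)."""
    e = abs(pred - target)
    return 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)

def pocketgen_loss(rounds, lam_coord=1.0, lam_struct=2.0):
    """Average the per-round sequence cross-entropy, Huber coordinate loss,
    and bond/angle structure loss over the refinement rounds, then take a
    weighted sum. `rounds` is a list of dicts with keys 'p_true' (predicted
    probability of the true residue type) and 'coord'/'struct' (lists of
    (prediction, target) pairs)."""
    T = len(rounds)
    l_seq = sum(-math.log(r["p_true"]) for r in rounds) / T
    l_coord = sum(sum(huber(p, t) for p, t in r["coord"]) for r in rounds) / T
    l_struct = sum(sum((p - t) ** 2 for p, t in r["struct"]) for r in rounds) / T
    return l_seq + lam_coord * l_coord + lam_struct * l_struct
```

A perfect prediction (probability 1 for the true type, exact coordinates and geometry) yields a loss of zero, and each term grows with its respective error.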
Generation protocol
In the generation procedure, PocketGen initializes the sequence with uniform distributions over the 20 amino acid types and the coordinates based on linear interpolations and extrapolations. Specifically, we initialize the residue Cα coordinates with linear interpolations and extrapolations based on the nearest residues with known structures in the protein. Denote the sequence of residues as $(a_1, \ldots, a_N)$, where $N$ is the length of the sequence, and let $\mathbf{x}_i$ denote the Cα coordinate of the $i$-th residue. We take the following strategies to determine the coordinate of an $i$-th residue without known structure: (1) We use linear interpolation if there are residues with known coordinates on both sides of the $i$-th residue. Specifically, assuming $p$ and $q$ ($p < i < q$) are the indexes of the nearest residues with known coordinates on each side of the $i$-th residue, we have $\mathbf{x}_i = \mathbf{x}_p + \frac{i - p}{q - p}(\mathbf{x}_q - \mathbf{x}_p)$. (2) We conduct linear extrapolation if the $i$-th residue is at an end of the chain, i.e., there are no residues with known structures on one side of the $i$-th residue. Specifically, let $p$ and $q$ denote the indexes of the nearest and the second-nearest residues with known coordinates; the position of the $i$-th residue can be initialized as $\mathbf{x}_i = \mathbf{x}_p + \frac{i - p}{p - q}(\mathbf{x}_p - \mathbf{x}_q)$. Inspired by previous works33,34, we initialize the other backbone atom coordinates according to their ideal local coordinates relative to the Cα coordinates. We initialize the side-chain atom coordinates with the coordinate of their corresponding Cα, added with Gaussian noise. We initialize the ligand molecular structure with the reference ligand structure from the dataset. The ligand structure is updated during pocket generation, and the updated ligand is used for the Vina score calculation.
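The interpolation/extrapolation initialization of the Cα coordinates can be sketched as follows (`known` and the chain length `n` are illustrative inputs; at least two residues with known coordinates are assumed):

```python
def init_ca_coords(known, n):
    """Initialize Calpha coordinates for a chain of n residues.

    known: dict mapping residue index -> (x, y, z) for residues with known
    structure. Unknown residues flanked by known ones get linear
    interpolation; unknown residues at a chain end get linear extrapolation
    from the nearest and second-nearest known residues."""
    idx = sorted(known)
    coords = [None] * n
    for i in range(n):
        if i in known:
            coords[i] = tuple(known[i])
            continue
        left = [p for p in idx if p < i]
        right = [q for q in idx if q > i]
        if left and right:  # interpolation between flanking residues p and q
            p, q = left[-1], right[0]
            w = (i - p) / (q - p)
            coords[i] = tuple(a + w * (b - a) for a, b in zip(known[p], known[q]))
        else:               # extrapolation: p nearest, q second nearest
            near = right[:2] if right else left[-2:][::-1]
            p, q = near[0], near[1]
            coords[i] = tuple(xp + (i - p) * (xp - xq) / (p - q)
                              for xp, xq in zip(known[p], known[q]))
    return coords
```

Both branches implement the formulas in the text: the midpoint of two known neighbors falls halfway between them, and end-of-chain residues continue the line through the two nearest known residues.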
Since the pocket residue types, and hence the numbers of side-chain atoms, are unknown at the beginning of generation, each pocket residue is assigned 14 atoms, the maximum number of heavy atoms in a residue. After $T$ rounds of refinement by PocketGen, the pocket residue types are predicted, and the full-atom coordinates are determined by mapping the coordinates to the predicted residue types (taking the first $n_i$ coordinates according to the residue type). In PocketGen, we directly predict the absolute atom coordinates, which reduces model complexity and flexibly captures atom interactions. We also note that PocketGen aligns with the recent trend of directly predicting full-atom coordinates. For example, the recent AlphaFold392 directly predicts the full-atom coordinates, replacing the AlphaFold2 structure module that operated on amino-acid-specific frames and side-chain torsion angles, and achieves better performance on protein structure prediction. For generation efficiency, we set the number of refinement rounds $T$ to 3.
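The 14-slot padding and type-dependent truncation can be sketched as follows; the heavy-atom counts listed are standard values for a few residue types, with the full 20-residue table omitted:

```python
# Heavy-atom counts (backbone N, CA, C, O plus side chain) for a few residue
# types; e.g. glycine has 4 heavy atoms and tryptophan the maximal 14.
HEAVY_ATOMS = {"GLY": 4, "ALA": 5, "SER": 6, "LYS": 9, "TRP": 14}

def finalize_residue(coords14, residue_type):
    """Each residue is generated with 14 atom slots; once its type is
    predicted, keep only the first n_i coordinates for that type."""
    n = HEAVY_ATOMS[residue_type]
    return coords14[:n]
```

A predicted glycine keeps only its four backbone slots, while a predicted tryptophan uses all fourteen.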
Experimental setting
Datasets.
We consider two widely used datasets for benchmark evaluation. The CrossDocked dataset42 contains 22.5 million protein-molecule pairs generated through cross-docking. Following previous works24,60,93, we filter out data points with a binding pose RMSD greater than 1 Å, leading to a refined subset of around 180k data points. For data splitting, we use MMseqs294 to cluster the data at 30% sequence identity, and randomly draw 100k protein-ligand structure pairs for training and 100 pairs from the remaining clusters for validation and testing, respectively. The Binding MOAD dataset43 contains around 41k experimentally determined protein-ligand complexes. Following previous work95, we keep pockets with valid and moderately ‘drug-like’ ligands with a QED score ≥ 0.3. We further filter the dataset to discard molecules containing uncommon atom types as well as binding pockets with non-standard amino acids. We then randomly sample and split the filtered dataset based on the Enzyme Commission number (EC number)44 to ensure that different sets do not contain proteins from the same EC number main class. Finally, we have 40k protein-ligand pairs for training, 100 pairs for validation, and 100 pairs for testing. For all the benchmark tasks in this paper, PocketGen and all the baseline methods are trained with the same data split for a fair comparison. In the real-world pocket generation and optimization case studies, the protein structures were downloaded from the PDB96.
Implementation.
Our PocketGen model is trained with the Adam97 optimizer for 5k iterations, with a learning rate of 0.0001 and a batch size of 64. We report the results corresponding to the checkpoint with the best validation loss. Training from scratch takes around 48 hours on one Tesla A100 GPU. In PocketGen, the number of attention heads is set to 4; the hidden dimension $d$ is set to 128; $k$ is set to 8 to connect the $k$-nearest neighboring residues when building $\mathcal{E}$; and the number of attention entries kept per row (top-$k$ sparsification) is set to 3 to encourage sparsity in the attention matrix. For all the benchmark tasks of pocket generation and optimization, PocketGen and all the baseline methods are trained with the same data split for a fair comparison. We follow the implementation code provided by the authors to obtain the results of the baseline methods. Algorithms 1 and 2 in the supplementary show the pseudo-code of PocketGen’s training and generation processes.
Baseline methods.
PocketGen is compared with state-of-the-art representative baseline methods. PocketOptimizer18 is a physics-based method that optimizes energies such as packing and binding-related energies for ligand-binding protein design. Following the suggestion of the paper, we fixed the backbone structures. DEPACT17 is a template-matching method that follows a two-step strategy98 for pocket design: it first searches the database for protein-ligand complexes with similar ligand fragments, and then grafts the associated residues into the protein scaffold with PACMatch17 to output the complete protein structure. Both the backbone and the side-chain structures are changed in DEPACT. RFDiffusion26, RFDiffusionAA16, FAIR24, and dyMEAN25 are deep-learning-based models for protein generation. RFDiffusion does not explicitly model protein-ligand interactions and is not directly applicable to small molecule-binding protein generation. Following the suggestions in RFDiffusion26 and RFDiffusionAA16, we use a heuristic attractive-repulsive potential to encourage the formation of pockets with shape complementarity to a target molecule. The residue sequence for the protein generated by RFDiffusion is derived with ProteinMPNN, and the side-chain conformations are decided with Rosetta99 side-chain packing. RFDiffusionAA is the latest version of RFDiffusion, which can directly generate protein structures surrounding small molecules by combining a residue-based representation of amino acids with an atomic representation of small molecules. For RFDiffusion and RFDiffusionAA, we let them inpaint the pocket area to obtain a setting consistent with the other methods for comparison. We also note that RFDiffusion and RFDiffusionAA do not provide training/finetuning scripts, so we use the provided pretrained checkpoints for all related experiments in our paper. FAIR24 was specially designed for full-atom protein pocket design via iterative refinement.
dyMEAN25 was originally proposed for full-atom antibody design, and we adapted it to our pocket design task with proper modifications. Detailed information on the baselines is included in the Supplementary Notes. The settings of the key hyperparameters are summarized in Table S6. All the baselines are run on the same Tesla A100 GPU for a fair comparison with PocketGen.
Extended Data
Extended Data Fig. 1: More case studies and evaluations of PocketGen.
a, The originally designed protein binder for Rucaparib21 (left panel) and the protein binder generated by PocketGen (right panel). b, The originally designed protein binder for DTZ8 (left panel) and the protein binder generated by PocketGen (right panel). Note that in a and b, PocketGen generates the whole protein instead of only the pocket region. The generated protein binders have high scTM scores (0.900 and 0.976). c, The predicted affinity (log K) by GIGN100 of the pockets generated by PocketGen with respect to RMSD. We randomly selected two protein-ligand complexes from PDBBind (PDB ids 2c3i and 3jya). d, The Vina score/binding affinity (log K) of the pockets generated by PocketGen and the original pockets from PDBBind. The black regions/dots indicate generated pockets with higher affinities than the original pockets, while the red regions/dots indicate lower affinities. f, The interactions generated by PocketGen for a pair of activity-cliff ligand molecules, i.e., C19 and C5277. As marked with red rectangles, PocketGen adaptively generates different interactions for different molecular fragments (hydrogen bonds plus hydrophobic interactions, and halogen bonds, respectively). ‘HP’ indicates hydrophobic interactions, ‘HB’ signifies hydrogen bonds, ‘π’ denotes the π-stacking/π-cation interactions, and ‘Halo’ indicates the halogen bonds. e, Detailed validity check with PoseBusters on CrossDocked and Binding MOAD.
Supplementary Material
Acknowledgements
This research was partially supported by grants from the National Key Research and Development Program of China (No. 2021YFF0901003) and the University Synergy Innovation Program of Anhui Province (GXXT-2021-002). We thank Dr. Yaoxi Chen and Dr. Haiyan Liu from the University of Science and Technology of China for their constructive discussions on implementing and evaluating baseline methods, which greatly helped this research. M.Z. gratefully acknowledges the support of NIH R01-HD108794, NSF CAREER 2339524, US DoD FA8702-15-D-0001, awards from Harvard Data Science Initiative, Amazon Faculty Research, Google Research Scholar Program, AstraZeneca Research, Roche Alliance with Distinguished Scientists, Sanofi iDEA-iTECH Award, Pfizer Research, Chan Zuckerberg Initiative, John and Virginia Kaneb Fellowship award at Harvard Medical School, Biswas Computational Biology Initiative in partnership with the Milken Institute, and Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University.
Footnotes
Code availability
The source code of this study is freely available at GitHub (https://github.com/zaixizhang/PocketGen) to allow for replication of the results of this study.
Competing interests
The authors declare no competing interests.
Data availability
This study’s training and test data are available at Zenodo (https://zenodo.org/records/10125312). The project website for PocketGen is at https://zitniklab.hms.harvard.edu/projects/PocketGen.
References
- 1.Tinberg C. E. et al. Computational design of ligand-binding proteins with high affinity and selectivity. Nature 501, 212–216(2013) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Kroll A., Ranjan S., Engqvist M. K. & Lercher M. J. A general model to predict small molecule substrates of enzymes based on machine and deep learning. Nature Communications 14, 2787 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Lee G. R. et al. Small-molecule binding and sensing with a designed protein family. bioRxiv 2023–11 (2023). [Google Scholar]
- 4.Qiao Z., Nie W., Vahdat A., Miller III T. F. & Anandkumar A. State-specific protein-ligand complex structure prediction with a multiscale deep generative model. Nature Machine Intelligence 1–14 (2024). [Google Scholar]
- 5.Jiang L. et al. De novo computational design of retro-aldol enzymes. science 319, 1387–1391 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Röthlisberger D. et al. Kemp elimination catalysts by computational enzyme design. Nature 453, 190–195 (2008). [DOI] [PubMed] [Google Scholar]
- 7.Dou J. et al. De novo design of a fluorescence-activating -barrel. Nature 561, 485–491 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Yeh A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Beltrán J. et al. Rapid biosensor development using plant hormone receptors as reprogrammable scaffolds. Nature Biotechnology 40, 1855–1861 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Bick M. J. et al. Computational design of environmental sensors for the potent opioid fentanyl. Elife 6, e28909 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Glasgow A. A. et al. Computational design of a modular protein sense-response system. Science 366, 1024–1028 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Herud-Sikimić O. et al. A biosensor for the direct visualization of auxin. Nature 592, 768–772 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Polizzi N. F. & DeGrado W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Basanta B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proceedings of the National Academy of Sciences 117, 22135–22145 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Dou J. et al. Sampling and energy evaluation challenges in ligand binding protein design. Protein Science 26, 2426–2437 (2017) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Krishna R. et al. Generalized biomolecular modeling and design with rosettafold all-atom. Science eadl2528 (2024). [DOI] [PubMed] [Google Scholar]
- 17. Chen Y., Chen Q. & Liu H. DEPACT and PACMatch: a workflow of designing de novo protein pockets to bind small molecules. Journal of Chemical Information and Modeling 62, 971–985 (2022).
- 18. Noske J., Kynast J. P., Lemm D., Schmidt S. & Höcker B. PocketOptimizer 2.0: a modular framework for computer-aided ligand-binding design. Protein Science 32, e4516 (2023).
- 19. Malisi C. et al. Binding pocket optimization by computational protein design. PLoS ONE 7, e52505 (2012).
- 20. Stiel A. C., Nellen M. & Höcker B. PocketOptimizer and the design of ligand binding sites. Computational Design of Ligand Binding Proteins 63–75 (2016).
- 21. Lu L. et al. De novo design of drug-binding proteins with predictable binding energy and specificity. Science 384, 106–112 (2024).
- 22. Byon W., Garonzik S., Boyd R. A. & Frost C. E. Apixaban: a clinical pharmacokinetic and pharmacodynamic review. Clinical Pharmacokinetics 58, 1265–1279 (2019).
- 23. Stark H., Jing B., Barzilay R. & Jaakkola T. Harmonic prior self-conditioned flow matching for multi-ligand docking and binding site design. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop (2023).
- 24. Zhang Z., Lu Z., Hao Z., Zitnik M. & Liu Q. Full-atom protein pocket design via iterative refinement. In Thirty-seventh Conference on Neural Information Processing Systems (2023).
- 25. Kong X., Huang W. & Liu Y. End-to-end full-atom antibody design. ICML (2023).
- 26. Watson J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
- 27. Ho J., Jain A. & Abbeel P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851 (2020).
- 28. Baek M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
- 29. Dauparas J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
- 30. Dauparas J. et al. Atomic context-conditioned protein sequence design using LigandMPNN. bioRxiv (2023).
- 31. Jin W., Wohlwend J., Barzilay R. & Jaakkola T. Iterative refinement graph neural network for antibody sequence-structure co-design. ICLR (2022).
- 32. Jin W., Barzilay R. & Jaakkola T. Antibody-antigen docking and design via hierarchical structure refinement. In ICML, 10217–10227 (PMLR, 2022).
- 33. Luo S. et al. Antigen-specific antibody design and optimization with diffusion-based generative models. NeurIPS (2022).
- 34. Kong X., Huang W. & Liu Y. Conditional antibody design as 3D equivariant graph translation. ICLR (2023).
- 35. Shi C., Wang C., Lu J., Zhong B. & Tang J. Protein sequence and structure co-design with equivariant translation. ICLR (2023).
- 36. Anishchenko I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).
- 37. Yang J. et al. Improved protein structure prediction using predicted interresidue orientations. Proceedings of the National Academy of Sciences 117, 1496–1503 (2020).
- 38. Cao L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).
- 39. Jumper J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 40. Rives A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS (2019). URL https://www.biorxiv.org/content/10.1101/622803v4.
- 41. Lin Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv (2022).
- 42. Francoeur P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. Journal of Chemical Information and Modeling 60, 4200–4215 (2020).
- 43. Hu L., Benson M. L., Smith R. D., Lerner M. G. & Carlson H. A. Binding MOAD (Mother of All Databases). Proteins: Structure, Function, and Bioinformatics 60, 333–340 (2005).
- 44. Bairoch A. The enzyme database in 2000. Nucleic Acids Research 28, 304–305 (2000).
- 45. Marcou G. & Rognan D. Optimizing fragment and scaffold docking by use of molecular interaction fingerprints. Journal of Chemical Information and Modeling 47, 195–207 (2007).
- 46. Trott O. & Olson A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry 31, 455–461 (2010).
- 47. Yang M. et al. Uni-GBSA: an open-source and web-based automatic workflow to perform MM/GB(PB)SA calculations for virtual screening. Briefings in Bioinformatics 24, bbad218 (2023).
- 48. Friesner R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. Journal of Medicinal Chemistry 47, 1739–1749 (2004).
- 49. Lin Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
- 50. Trippe B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. In The Eleventh International Conference on Learning Representations (2023). URL https://openreview.net/forum?id=6TxBxqNME1Y.
- 51. Lin Y. & AlQuraishi M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. ICML (2023).
- 52. Zhang Y. & Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics 57, 702–710 (2004).
- 53. Yim J. et al. Improved motif-scaffolding with SE(3) flow matching. arXiv preprint arXiv:2401.04082 (2024).
- 54. Yim J. et al. SE(3) diffusion model with application to protein backbone generation. In International Conference on Machine Learning, 40001–40039 (PMLR, 2023).
- 55. Tibshirani R. J. & Efron B. An introduction to the bootstrap. Monographs on Statistics and Applied Probability 57, 1–436 (1993).
- 56. Yoo Y. J., Feng Y., Kim Y.-H. & Yagonia C. F. J. Fundamentals of Enzyme Engineering (2017).
- 57. Traut T. W. Protein engineering: principles and practice. American Scientist 85, 571–573 (1997).
- 58. Spencer R. K. et al. Stereochemistry of polypeptoid chain configurations. Biopolymers 110, e23266 (2019).
- 59. http://www.mlb.co.jp/linux/science/garlic/doc/commands/dihedrals.html
- 60. Peng X. et al. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. ICML (2022).
- 61. Zhang Z., Liu Q., Lee C.-K., Hsieh C.-Y. & Chen E. An equivariant generative framework for molecular graph-structure co-design. Chemical Science 14, 8380–8392 (2023).
- 62. Kaplan J. et al. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020).
- 63. Alberts B. Molecular Biology of the Cell (Garland Science, 2017).
- 64. Shoichet B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).
- 65. Ertl P. An algorithm to identify functional groups in organic molecules. Journal of Cheminformatics 9, 1–7 (2017).
- 66. Buttenschoen M., Morris G. M. & Deane C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chemical Science 15, 3130–3139 (2024).
- 67. Satorras V. G., Hoogeboom E., Fuchs F. B., Posner I. & Welling M. E(n) equivariant normalizing flows. NeurIPS (2021).
- 68. Jing B., Eismann S., Suriana P., Townshend R. J. & Dror R. Learning from protein structure with geometric vector perceptrons. ICLR (2021).
- 69. Huang W. et al. Equivariant graph mechanics networks with constraints. arXiv preprint arXiv:2203.06442 (2022).
- 70. Eronen V. et al. Structural insight to elucidate the binding specificity of the anti-cortisol Fab fragment with glucocorticoids. Journal of Structural Biology 215, 107966 (2023).
- 71. Pinto D. J. et al. Discovery of 1-(4-methoxyphenyl)-7-oxo-6-(4-(2-oxopiperidin-1-yl)phenyl)-4,5,6,7-tetrahydro-1H-pyrazolo[3,4-c]pyridine-3-carboxamide (apixaban, BMS-562247), a highly potent, selective, efficacious, and orally bioavailable inhibitor of blood coagulation factor Xa. Journal of Medicinal Chemistry 50, 5339–5356 (2007).
- 72. Hernandez I., Zhang Y. & Saba S. Comparison of the effectiveness and safety of apixaban, dabigatran, rivaroxaban, and warfarin in newly diagnosed atrial fibrillation. The American Journal of Cardiology 120, 1813–1819 (2017).
- 73. Stanley T. H. The fentanyl story. The Journal of Pain 15, 1215–1226 (2014).
- 74. Salentin S., Schreiber S., Haupt V. J., Adasme M. F. & Schroeder M. PLIP: fully automated protein-ligand interaction profiler. Nucleic Acids Research 43, W443–W447 (2015).
- 75. Yang J., Li F.-Z. & Arnold F. H. Opportunities and challenges for machine learning-assisted enzyme engineering. ACS Central Science (2024).
- 76. Zhou Y., Pan Q., Pires D. E., Rodrigues C. H. & Ascher D. B. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Research gkad472 (2023).
- 77. Hu E. et al. Discovery of aryl aminoquinazoline pyridones as potent, selective, and orally efficacious inhibitors of receptor tyrosine kinase c-Kit. Journal of Medicinal Chemistry 51, 3065–3068 (2008).
- 78. Wang L. et al. Lingo3DMol: generation of a pocket-based 3D molecule using a language model. Nature Machine Intelligence (2024).
- 79. Zhang O. et al. ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling. Nature Machine Intelligence 1–11 (2023).
- 80. Jiang Y. et al. PocketFlow is a data-and-knowledge-driven structure-based molecular generative model. Nature Machine Intelligence 1–12 (2024).
- 81. Zhang Z. & Liu Q. Learning subpocket prototypes for generalizable structure-based drug design. ICML (2023).
- 82. Kalliokoski T., Olsson T. S. & Vulpetti A. Subpocket analysis method for fragment-based drug discovery. Journal of Chemical Information and Modeling 53, 131–141 (2013).
- 83. Kong X., Huang W. & Liu Y. Generalist equivariant transformer towards 3D molecular interaction learning. arXiv preprint arXiv:2306.01474 (2023).
- 84. Vaswani A. et al. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- 85. Ba J. L., Kiros J. R. & Hinton G. E. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
- 86. Igashov I. et al. Equivariant 3D-conditional diffusion model for molecular linker design. Nature Machine Intelligence 1–11 (2024).
- 87. Batzner S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications 13, 2453 (2022).
- 88. Houlsby N. et al. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, 2790–2799 (PMLR, 2019).
- 89. Zheng Z. et al. Structure-informed language models are protein designers. bioRxiv (2023).
- 90. Su J. et al. RoFormer: enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864 (2021).
- 91. Huber P. J. Robust estimation of a location parameter. Breakthroughs in Statistics: Methodology and Distribution 492–518 (1992).
- 92. Abramson J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature (2024).
- 93. Luo S., Guan J., Ma J. & Peng J. A 3D generative model for structure-based drug design. NeurIPS 34, 6229–6239 (2021).
- 94. Steinegger M. & Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology 35, 1026–1028 (2017).
- 95. Schneuing A. et al. Structure-based drug design with equivariant diffusion models. arXiv preprint arXiv:2210.13695 (2022).
- 96. Sussman J. L. et al. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Crystallographica Section D: Biological Crystallography 54, 1078–1084 (1998).
- 97. Kingma D. P. & Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
- 98. Zanghellini A. et al. New algorithms and an in silico benchmark for computational enzyme design. Protein Science 15, 2785–2794 (2006).
- 99. Alford R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. Journal of Chemical Theory and Computation 13, 3031–3048 (2017).
- 100. Yang Z., Zhong W., Lv Q., Dong T. & Yu-Chian Chen C. Geometric interaction graph neural network for predicting protein-ligand binding affinities from 3D structures (GIGN). The Journal of Physical Chemistry Letters 14, 2020–2033 (2023).
- 101. Maier J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. Journal of Chemical Theory and Computation 11, 3696–3713 (2015).
- 102. Shapovalov M. V. & Dunbrack R. L. Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19, 844–858 (2011).
Data Availability Statement
This study’s training and test data are available at Zenodo (https://zenodo.org/records/10125312). The project website for PocketGen is at https://zitniklab.hms.harvard.edu/projects/PocketGen.






