Abstract
Large language models (LLMs) have demonstrated their limitations in addressing the design of active proteins that rely on intricate intramolecular interactions, particularly in the engineering of biocatalysts. Conducting real-world studies from targeted laboratory assays has become the de facto standard for artificial intelligence (AI) research in complex biological tasks. In this study, we present a standardized strategy using function-targeted models to decode the subtle effect of sequence variations on the function. Unlike affinity-oriented protein–protein interaction studies using LLMs, our model targets the specific functional interpretation, thereby guiding enzyme evolution. We established the VERnet model using deep mutation scanning data that underwent self-distillation, achieving an optimal accuracy of 93.5% for interpreting CYP2C9 variants. Through directed evolution at conserved positions enhanced by generative AI, we identified multiple CYP2C9 variants exhibiting a broad range of functional alterations. Additionally, a fine-tuned model optimized by AlphaFold3 significantly improved the prediction of variants involving the substitution of two amino acids. Molecular dynamics simulations revealed the structural and dynamic features of the catalytic alterations in evolved variants. The in vitro validation of metabolic activity strongly corroborated the in silico predictions, highlighting the substantial potential of AI models in predicting functional evolution.
Keywords: machine learning, protein structure, enzyme evolution, CYP2C9, catalytic activity, variant effect prediction


Introduction
A frontier in biological research is unraveling the intricate relationships between protein sequences and functions, which enables anticipating the outcomes of genetic variations and designing tailored proteins. Evidence suggests that artificial intelligence (AI) approaches have gained significant prominence in the clinical interpretation of variants of uncertain significance (VUS). Significant advancements in protein structure prediction, propelled by deep learning models such as AlphaFold2 and RoseTTAFold, have enhanced AI variant predictors by incorporating crucial information about protein tertiary structures. With the surge of generative AI methodologies, protein structure-based variant models could further broaden the application of clinical genetic information. Notably, diffusion-based architectures like AlphaFold3 and RoseTTAFold All-Atom, which emerged in 2024, provide high-resolution protein structures that may offer greater value for structure-based variant predictions.
With the expanded structural coverage afforded by increasingly accurate protein tertiary structure modeling, recent hybrid frameworks have demonstrated the value of incorporating structural constraints, such as residue–residue distance maps or AlphaFold-derived structural features, into variant effect prediction. These approaches integrate both evolutionary information and structural context to improve the precision and robustness of prediction outcomes. Despite the significant advances of large language models (LLMs) in decoding the effects of variants on protein folding and basic functions, inherent limitations of LLMs undermine their ability to interpret variants involved in more complex and challenging functions. It has become clear that a good binder does not always perform as well as intended. When more subtle functional changes are addressed, AI research should be guided by targeted real-world experimental data to improve reliability. Moreover, de novo design has been recognized for a tendency to hallucinate protein structures that cannot exist in nature, highlighting the necessity of prioritizing functional evolution based on natural structures.
In addition, evolutionary theory provides a principled framework for guiding enzyme design by quantifying sequence constraints imposed by function. The maximum entropy (MaxEnt) principle has recently been revisited as a powerful statistical approach to model protein sequence ensembles under minimal assumptions while preserving experimentally or evolutionarily derived constraints. Notably, MaxEnt-based statistical energy functions inferred from multiple sequence alignments (MSAs) have been shown to correlate with experimentally measured enzyme properties, including catalytic efficiency and stability. By quantifying the underlying fitness landscape, the MaxEnt perspective connects the laboratory evolution with natural evolution, enabling key advances in understanding and exploring the emergence of novel enzyme functions. Rather than relying on explicit structural modeling or supervised functional labels, the MaxEnt framework captures higher-order residue couplings that reflect collective constraints shaped by natural selection, thereby offering an orthogonal perspective to deep-learning-based variant effect predictors. Importantly, this strategy provides an evolutionarily grounded criterion for prioritizing designed variants, distinguishing sequences that are both functionally promising and evolutionarily accessible and thus offering valuable guidance for computational enzyme engineering and AI-driven protein design.
Enzymes leverage intricate intramolecular interactions to achieve efficient catalysis, requiring thorough characterization within their intrinsic sequence–structure–function landscape. The catalytic functions of enzymes are not only associated with their binding affinity but also tied to substrate specificity. Structure-based variant prediction models can provide more accurate variant interpretation and yield cutting-edge insights into conformational alterations associated with specific functions. Previous studies indicated that conformational changes caused by mutations can influence protein function via a variety of mechanisms, resulting in distinct responses to the same structural alteration across different protein functions. For instance, cytochrome P450 family 2 subfamily C member 9 (CYP2C9), a critical metabolic enzyme, undergoes functional mutations that can alter the dosing efficacy of numerous drugs. However, mutations in an enzyme can exhibit varying levels of metabolic activity toward different molecules, highlighting the complexity of enzyme functions (especially catalytic activity), which depends on more than just their ligand-binding affinity. Similarly, drug discovery is also a highly complicated endeavor that extends far beyond predicting the physical binding ability with ligands. It emphasizes the importance of developing specialized models for diverse protein functions, which promise to reflect finer functional changes, thereby possessing the ability to perform directional evolution on a function.
VUS in CYP2C9 poses a challenge in pharmacogenomics, especially concerning the individual drug response to widely prescribed anticoagulants like warfarin, influenced by CYP2C9 polymorphisms. Despite the Clinical Pharmacogenetics Implementation Consortium (CPIC) offering accurate allele functional annotations through clinical evidence and drug metabolic activity assays, comprehensive characterization of CYP2C9 variants remains challenging due to economic and temporal constraints associated with experimental techniques. State-of-the-art variant prediction approaches often rely on weak labels and avoid human-curated classification, assigning variants as benign or pathogenic based on observed frequencies in humans or other primate species. Consequently, these generic approaches address only the identification of pathogenic variants, leaving many more specific functions awaiting characterization.
In response to these challenges, we proposed a general-purpose strategy for enzyme evolution to discover novel variants with anticipated functionality. A function-targeted model called the Variant Effect Recognition Network (VERnet) was developed, which can be trained for diverse proteins and functions. This work is a real-world study that learns from targeted laboratory assays and undergoes self-distillation, offering more reliable and task-specific predictive insights. VERnet provided precise variant prediction by integrating accurate protein tertiary structures, deep learning, and amino acid networks (AANs). This work showcased that VERnet excelled in interpreting CYP2C9 variants and demonstrated its capacity for directed evolution. By applying an individualized function-based model for each protein, VERnet surpassed state-of-the-art variant effect predictors using LLMs in predicting CYP2C9 variant enzyme activity. Additionally, a fine-tuned model optimized by AlphaFold3 significantly improved the prediction accuracy and successfully handled variants involving the substitution of two amino acids. Subsequently, generative AI was employed to empower the virtual evolution based on VERnet and identified several novel CYP2C9 variants with extreme functional alterations in two directions. These findings were strongly confirmed through in vitro assessments. As a case study, we performed molecular dynamics (MD) simulations for CYP2C9 WT, N218A, and T229E using DIC as the substrate. The MD study enabled us to understand and explain the structural effects of these evolved CYP2C9 variants on their catalytic activity at the atomic level. To assess the evolutionary plausibility of the CYP2C9 variants generated by VERnet, we integrated a MaxEnt statistical framework into our analysis. 
By jointly analyzing VERnet functional scores, MaxEnt-derived statistical energies, and experimental measurements, we demonstrate that VERnet-guided virtual evolution operates within an evolutionarily accessible sequence space. Our results also underscore the complementary role of evolutionary modeling in refining sequence prioritization and validating AI-driven enzyme engineering.
Results
VERnet: Predicting Variant Effects from Protein Structural Information
VERnet utilized features extracted from AlphaFold2-predicted protein structures to predict the functional consequences of missense single-nucleotide variants (SNVs). Its core strategies involved constructing amino acid networks (AANs) and applying deep learning techniques. By leveraging the high accuracy of protein structures predicted by AlphaFold2, VERnet incorporated innovative methods to amplify and capture pertinent information.
The realization of VERnet unfolded in two stages, as illustrated in Figure 1. This study focused on a protein fragment of CYP2C9 comprising 377 amino acids, which is known to encompass its active sites. During the training preparation stage, AlphaFold2 was employed to predict protein tertiary structures for missense SNVs in CYP2C9. Subsequently, these structures were utilized to generate AANs, acting as inputs for subsequent deep learning networks (see Methods). Our prior research demonstrated the effectiveness of constructing AANs from AlphaFold2-predicted structures in enriching structural information for characterizing amino acid changes. It also showed that the features in AANs contributed to VERnet’s performance to varying degrees. We explored optimal ways to utilize AANs as inputs for deep learning and discovered that adjusting the order of AAN features in input matrices could significantly influence the model’s ability to learn practical information (see Figure S1A,B). Therefore, we modified the representation of structural information based on the contribution of each interaction in AANs, aiming to enhance the performance of the model compared to our prior work (see Methods).
Figure 1. Overview of constructing VERnet and using it for virtual evolution. Model training is carried out using structural information on CYP2C9 variants and their activity class derived from an enzyme deep mutation library. During data preparation, the matrices representing AANs are derived based on the tertiary structures, which are predicted using AlphaFold2. The 2D-CNN model, VERnet, then learns the correlation between the enzyme activity and protein variant’s tertiary structure. In the virtual evolution phase, VERnet predicts variants not included in the library for several key mutant sites, which are pinpointed by generative AI. Subsequent validation of these newly identified variants is done through in vitro metabolic activity assessments.
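As an illustration of what an amino acid network is, the following minimal sketch links residues whose representative atoms lie within a distance cutoff. It is not the VERnet construction, which is described in Methods and encodes several interaction types; the function name `contact_network`, the 8 Å cutoff, and the toy coordinates are all assumptions made for this example.

```python
import math

def contact_network(coords, cutoff=8.0):
    """Return a symmetric 0/1 adjacency matrix linking residues whose
    representative atoms lie within `cutoff` angstroms of each other.
    The cutoff value is an assumption, not a value from the paper."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj

# Toy coordinates (angstroms) for four residues; not taken from CYP2C9.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
adjacency = contact_network(toy_coords)

# Node degree: how many residues each residue is in contact with.
degree = [sum(row) for row in adjacency]
```

A matrix of this kind, computed once per interaction type, could be stacked into a multichannel input for a 2D-CNN; the actual channel definitions should be taken from Methods.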
Transitioning to the training model stage, the 3D input matrices derived from AANs were employed to train 2D convolutional neural network (2D-CNN) models. These classification models learned the intricate relationship between the tertiary structures of protein variants and their corresponding enzyme activity levels. A threshold on the activity score, measured through massively parallel assessments, was set to assign the positive and negative labels for CYP2C9 variants within the training data set. Further filtering was carried out by integrating the abundance scores (Figure S1C and Methods). Figure S1C also reveals a notable imbalance between the available positive and negative samples. Despite removing some ambiguous variants using abundance scores, the training data set still contained noise due to the inherent variation in enzyme activity within complex contexts. To mitigate the impact of such misleading samples, self-distillation was conducted by pretraining models to filter out challenging samples prone to extreme misclassification (see Methods). To optimize VERnet and efficiently leverage the majority class in data sets, an undersampling schema based on EasyEnsemble was employed. Model training was stopped upon detecting signs of overfitting on the validation data set (see Figure S1D,E and Methods).
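The undersampling idea can be sketched as follows: each base classifier sees all minority-class samples plus an equally sized random draw from the majority class, and the base predictions are then averaged. This is only a simplified sketch of the sampling and voting steps under assumed names (`balanced_subsets`, `ensemble_vote`); the class sizes are toy values, the choice of three subsets is illustrative, and the full EasyEnsemble procedure and the paper's actual settings are in Methods.

```python
import random

def balanced_subsets(labels, n_subsets, seed=0):
    """Yield index lists that pair every minority-class sample with an
    equally sized random draw from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_subsets):
        yield sorted(minority + rng.sample(majority, len(minority)))

def ensemble_vote(probabilities):
    """Average the positive-class probabilities of the base classifiers."""
    return sum(probabilities) / len(probabilities)

# Toy imbalanced labels: 5 samples of one class, 20 of the other.
toy_labels = [1] * 5 + [0] * 20
subsets = list(balanced_subsets(toy_labels, n_subsets=3))
```

Because every subset is balanced, each base classifier trains without class bias, while the union of subsets still exposes the ensemble to much of the majority class.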
To enhance the interpretability of our model, we applied the integrated gradient (IG) method to representative variants in the testing data set. IG was computed using a 200-step interpolation path and further smoothed with a NoiseTunnel. As depicted in Figure S5, the most substantial model contributions arose from channels 1, 2, and 3, which encode interatomic contact and overlap features. In contrast, channels 4 and 5, corresponding to hydrogen-bonding interactions, exhibited localized attribution signals around the mutant sites. This analysis provides interpretable insights into the structural features guiding VERnet predictions, which may inform rational enzyme design.
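For readers unfamiliar with integrated gradients, the attribution for each input feature is the difference between the input and a baseline, multiplied by the gradient averaged along the straight path between them. The sketch below approximates that integral with a 200-step midpoint sum on a toy quadratic function standing in for the trained network. The function names, the zero baseline, and the toy weights are assumptions, and the noise-based smoothing step is omitted.

```python
def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum along
    the straight path from `baseline` to `x`."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g / steps
    return [(xi - b) * t for xi, b, t in zip(x, baseline, total)]

# Toy model F(x) = sum(w_i * x_i**2), standing in for a trained network.
weights = [1.0, -2.0, 0.5]

def toy_model(x):
    return sum(w * xi ** 2 for w, xi in zip(weights, x))

def toy_gradient(x):
    return [2.0 * w * xi for w, xi in zip(weights, x)]

x = [1.0, 2.0, -3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_gradient, x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline.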
VERnet Represents the Highest Agreement with Massively Parallel Assessments
Validation was conducted using an independent testing data set comprising the remaining missense single-nucleotide variants (SNVs) with known activity scores, none of which were part of the training set. The testing data set encompassed 276 variants, with 138 classified as positive samples exhibiting high activity and the remaining 138 as negative samples with low activity. VERnet classified activity levels in the testing data set with an accuracy of 93.5%. Furthermore, it achieved the same accuracy of 93.5% for both high-activity and low-activity variant recognition (Figure 2A). These results indicate that VERnet learned features associated with enzyme activity rather than a selection bias toward a particular sample type.
Figure 2. Performance evaluation of VERnet. (A) Evaluation of testing accuracy for VERnet and three base classifiers in the positive, negative, and integrated testing data sets. (B) Assessment of AUC values for ROC and PR curves depicting predicted enzyme activity ranks (where a higher rank indicates the greater possibility of increased activity) for 276 samples in the testing data set. The performance of the final model VERnet is denoted by a red solid curve, while dashed curves represent the performance of the base classifiers. (C) Comparative analysis of VERnet’s performance against 10 other computational variant effect predictors in the positive (orange) and negative (blue) testing data sets. (D) Examination of AUC values of the ROC and PR curves illustrating predicted enzyme activity ranks (where a higher rank indicates the greater possibility of increased activity) for the 276 CYP2C9 variants in the testing data set. The performance of VERnet is depicted by a solid curve, while dashed curves represent the performance of the 10 other computational variant effect predictors.
The areas under the curve (AUC) for the precision-recall (PR) and receiver operating characteristic (ROC) curves were calculated to evaluate VERnet’s performance comprehensively. In the testing cohort, VERnet exhibited AUC values of 0.971 and 0.966 for ROC and PR, respectively (Figure 2B), underscoring its robust performance. Additionally, integrating the base classifiers through EasyEnsemble improved the accuracy beyond that of the individual base classifiers, as indicated in Figure 2A,B. We further evaluated the proposed method using an additional random train–test split of the collected variant data set. On this independently repartitioned data set, the model maintained high predictive performance across all evaluation metrics (Figure S9), suggesting that the observed performance is not dependent on a specific data split.
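The two ranking metrics can be made concrete with a small sketch. Here the ROC AUC is computed as the probability that a randomly chosen positive outranks a randomly chosen negative, and the PR AUC is estimated by average precision, which is one common estimator and not necessarily the one used in this study. The function names and toy scores are assumptions, and tied scores are not specially handled in the PR estimate.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values observed at each correctly ranked positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy labels (1 = high activity) and predicted scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc_roc = roc_auc(toy_labels, toy_scores)
auc_pr = average_precision(toy_labels, toy_scores)
```

Reporting both values is informative because the ROC AUC is insensitive to class balance, whereas the PR AUC reflects how clean the top of the ranking is.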
To assess VERnet against other computational variant effect predictors, comparisons were made using the same testing data set (see Figure 2C). Notably, most of these methods were initially designed to identify pathogenic variants. To facilitate a fair comparison, a unified classification standard was applied, designating damaging or pathogenic as negative and tolerated or benign as positive, since a decrease in enzyme activity often implies disrupted molecular function. ROC curves were employed for an intuitive comparative analysis (see Figure 2D). The best-performing method was defined as the one with the highest accuracy in the recognition of both high-activity and low-activity variants. Results revealed that VERnet exhibited the most robust agreement with massively parallel assessments compared with other methods in predicting CYP2C9 variant enzyme activity. Notably, VERnet significantly outperformed AlphaMissense and ESM-1b, the state-of-the-art LLMs for variant effect interpretation.
We further evaluated the sensitivity of VERnet to the magnitude of the functional alterations. Across the testing data set, the Pearson correlation between VERnet predictions and experimentally measured activity scores was 0.83. To assess the model’s stability across varying degrees of functional impact, variants in the testing data set were categorized into three groups (low, moderate, and high) based on the 33rd and 66th percentile thresholds of their activity scores. The corresponding prediction accuracies were 91.3%, 95.6%, and 93.5%, respectively. These results demonstrate that VERnet maintains consistently high predictive performance across the entire functional spectrum, underscoring its robustness in decoding variants with diverse biochemical consequences.
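The stratified analysis can be sketched as two small routines: a Pearson correlation between predictions and measured activity, and a per-group accuracy after cutting the activity ranking at the 33% and 66% marks. The function names, the rank-based cut rule, and the toy values are assumptions for illustration; the study's exact percentile definition may differ.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def tertile_accuracy(activity, correct):
    """Rank variants by activity, cut the ranking at the 33% and 66%
    marks, and return the fraction of correct predictions per group."""
    order = sorted(range(len(activity)), key=lambda i: activity[i])
    cut1, cut2 = round(0.33 * len(order)), round(0.66 * len(order))
    groups = {"low": order[:cut1], "moderate": order[cut1:cut2], "high": order[cut2:]}
    return {name: sum(correct[i] for i in idx) / len(idx) for name, idx in groups.items()}

# Toy activity scores and per-variant correctness flags (1 = correct).
toy_activity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_correct = [1, 1, 0, 1, 1, 1, 0, 1, 1]
by_group = tertile_accuracy(toy_activity, toy_correct)
```

Comparable accuracy in all three groups is what indicates that performance is not driven by variants with extreme activity alone.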
Generative AI-Enhanced Virtual Evolution Unveils CYP2C9 Enzyme Activity through VERnet
By scrutinizing the CYP2C9 enzyme activity library with 6142 missense variants established through massively parallel assessments, we observed that mutation cold spots were rarely characterized. Therefore, a generative AI algorithm, a variational autoencoder (VAE), was employed to generate an activity landscape of amino acid substitutions in CYP2C9 (see Methods). Through the predicted activity map, six mutant sites exhibiting a higher incidence of mutations altering activity were pinpointed (see Figure S3). Using VERnet, we examined these sites through saturation mutation prediction, uncovering novel variants with a high likelihood of activity alteration. From the VERnet predictions, we identified six variants most likely to exhibit increased activity and another six with anticipated decreased activity (see Figure 3A). Notably, all of these variants represent novel discoveries, with their enzyme activities yet uncharacterized by previous experimental studies. Earlier studies on CYP2C9 structures revealed that these selected sites are located around a predominantly hydrophobic pocket that interacts with S-warfarin, but they do not engage in direct molecular interactions. Therefore, our results notably contributed to identifying new sites influencing ligand binding not solely through direct side-chain contacts, underscoring the distinct advantage of our virtual evolution over alternative methods.
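Saturation mutation prediction at a site amounts to enumerating the 19 alternative residues and keeping those not already characterized in the library, so six sites yield at most 6 × 19 = 114 candidates before filtering. The sketch below shows that enumeration; the function names and the contents of the `library` set are hypothetical, and only two sites are used for brevity.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def saturation_variants(wild_type, position):
    """List all 19 single substitutions at one position, e.g. 'N218A'."""
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]

def unannotated(sites, characterized):
    """Keep only candidate variants absent from the characterized library."""
    return [
        variant
        for wild_type, position in sites
        for variant in saturation_variants(wild_type, position)
        if variant not in characterized
    ]

sites = [("N", 218), ("T", 229)]
library = {"N218D", "T229S"}  # hypothetical already-assayed variants
candidates = unannotated(sites, library)
```

Each remaining candidate would then be scored by the trained classifier, and the highest- and lowest-ranked variants selected for experimental validation.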
Figure 3. Virtual evolution of CYP2C9 based on VERnet. (A) VERnet was employed to predict the activities of 31 unannotated variants across 6 sites. Six variants with the highest likelihood of increased activity were identified, while another six variants exhibiting the most probable decrease in activity were subsequently selected. (B) Comparison of local structures proximal to the mutation site of the N218A variant (blue) and WT (green) of CYP2C9 revealed notable differences. Specifically, the substitution from Asn-218 to Ala-218 induced conformational changes, causing the side chain of Asn-217 to approach S-warfarin. (C) Comparison of local structures near the mutation site of the T229E variant (gold) and WT (green) of CYP2C9 unveiled distinct features. Notably, in the T229E protein, the side chain of the ligand-binding amino acid Ala-103 was extruded from the hydrophobic pocket. (D) Distance between the SOM of DIC and the oxyferryl center in the WT (green) and the N218A (red) along the MD simulations. (E) Distance between the SOM of DIC and the oxyferryl center in the WT (green) and the T229E (blue) along the MD simulations.
MD Simulations Identify Structural and Dynamic Features Underlying Catalytic Alterations in Evolved Variants
Subsequently, we highlighted variant N218A, associated with increased activity, and variant T229E, linked to decreased activity, as examples illustrating our novel findings from virtual evolution. In previously reported crystal structures, Asn-218 and Thr-229 were not considered to be part of the binding sites to S-warfarin in CYP2C9, leading to a lack of studies on these sites. Virtual evolution played a pivotal role in discovering the effect of these mutant sites. As depicted in Figure 3B, the subtle structural changes caused by the amino acid substitution from Asn-218 to Ala-218 induced conformational adjustments in the hydrophobic pocket, reducing the distance between Asn-217 and S-warfarin, which potentially enhanced the ligand-binding capacity. Regarding T229E, Figure 3C illustrates that the side chain of the ligand-binding residue Ala-103 protruded from the hydrophobic pocket, leading to conformational instability in the ligand-binding region of CYP2C9-T229E.
To further investigate the conformational dynamics and catalytic determinants of the evolved variants, MD simulations were performed for the CYP2C9 WT, N218A, and T229E proteins using diclofenac (DIC) as the substrate. The distance between the site of metabolism (SOM) of DIC (C9 atom) and the oxyferryl center was measured to evaluate the likelihood of a catalytic reaction in each system. As shown in Figure 3D, the WT system exhibited a certain degree of fluctuation in the C9-oxyferryl distance during the 500 ns production simulation, with an average value of 9.7 Å. In contrast, the activity-enhanced N218A variant maintained a more stable C9-oxyferryl distance with a shorter average value of 8.3 Å, suggesting improved substrate positioning and catalytic accessibility. Conversely, the activity-reduced T229E variant displayed even greater fluctuations in the C9-oxyferryl distance (Figure 3E), along with an increased average value of 10.7 Å, indicating impaired substrate orientation and diminished catalytic efficiency.
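The distance analysis reduces to computing, for every trajectory frame, the Euclidean distance between two atoms and then summarizing the series by its mean and spread. The following sketch does this on toy per-frame coordinates; the function names and coordinate values are assumptions, and in practice the coordinates would be read from the simulation trajectory with a dedicated analysis package.

```python
import math
import statistics

def distance_series(atom_a, atom_b):
    """Per-frame Euclidean distance (angstroms) between two atoms, given
    one (x, y, z) tuple per frame for each atom."""
    return [math.dist(a, b) for a, b in zip(atom_a, atom_b)]

def summarize(distances):
    """Return the mean and population standard deviation of a series."""
    return statistics.mean(distances), statistics.pstdev(distances)

# Toy trajectory of three frames; values are illustrative only.
som_c9 = [(0.0, 0.0, 9.0), (0.0, 0.0, 10.0), (0.0, 0.0, 11.0)]
oxyferryl = [(0.0, 0.0, 0.0)] * 3
mean_distance, fluctuation = summarize(distance_series(som_c9, oxyferryl))
```

A shorter mean with a smaller standard deviation corresponds to the more stable, catalytically favorable substrate positioning described for N218A.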
Evaluation of In Vitro Metabolic Activity Strongly Affirms Predictions of VERnet and the AlphaFold3 Refined Model
To assess the predictive accuracy of VERnet, the drug metabolic activities of the identified CYP2C9 variants (12 in total) were examined using two probe drugs, tolbutamide (TOL) and diclofenac (DIC), following a previously established method. A baculovirus-based insect cell expression system was established to coexpress CYP2C9 variants and cytochrome P450 reductase (CYPOR) in insect cell microsomes, which were subsequently employed for enzyme kinetic parameter characterization. Immunoblot results indicated successful simultaneous expression of all variants with the CYPOR enzyme, except for variant R357I, which displayed only a faint signal in Western blot analysis (see Figure 4A and Figure S4). TOL and DIC were serially diluted and incubated with equal amounts of the purified mutant proteins. The concentrations of the resulting metabolites were quantified, and Michaelis–Menten curves were plotted to determine the kinetic parameters, Km and Vmax. Detailed kinetic parameters for each variant can be found in Table S3.
Figure 4. Biochemical assays confirm predicted CYP2C9 variants. (A) Western blot analysis was conducted on wild-type (WT) CYP2C9, reported negative control CYP2C9*3 (2C9*3), and 12 CYP2C9 variants expressed in insect cell microsomes. Cytochrome P450 reductase (CYPOR) was taken as the loading control. Michaelis–Menten curves were generated for the hydroxylation of two CYP2C9 probe substrates, tolbutamide (B, C) and diclofenac (D, E), by the recombinant CYP2C9 variants. Each data point represents the mean ± standard deviation of at least three independent experiments. (F) A comparison of measured metabolic activity was made between the 12 selected variants and controls (WT and reported decreased variant 2C9*3).
Using TOL as the probe drug, variants N218A, T229A, A369F, E288W, E288Q, and E288K demonstrated significantly increased metabolic activity compared with WT CYP2C9 (see Figure 4B). Notably, variant N218A exhibited an ultrastrong metabolic activity over seven times higher than WT CYP2C9. Conversely, variants T229E, E354A, E354Q, N218G, R357V, and R357I displayed significantly decreased metabolic activity for TOL compared with the WT enzyme (see Figure 4C). Similar trends were observed when DIC, another typical probe drug for the CYP2C9 enzyme, was included in the experiment (see Figure 4D,E). Figure 4F shows the relative enzymatic activity of the evolved variants toward the two substrates, TOL and DIC, calculated as the clearance rate (Vmax/Km) of each variant normalized to that of the WT (set to 100%). Figure 4F illustrates that, excluding the variant R357I with faint expression, nine of 11 variants exhibited consistent results between VERnet prediction and in vitro metabolic activity evaluation.
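The relative activity calculation can be illustrated with a short sketch that estimates Vmax and Km from substrate-velocity pairs and then normalizes the clearance Vmax/Km to the WT. For simplicity it uses the Hanes–Woolf linearization, S/v = S/Vmax + Km/Vmax, rather than the nonlinear regression more commonly applied to such data; the source does not specify the fitting procedure. All function names and kinetic values are made up for the example.

```python
def michaelis_menten(s, vmax, km):
    """Reaction velocity at substrate concentration s."""
    return vmax * s / (km + s)

def fit_hanes_woolf(substrate, velocity):
    """Estimate (Vmax, Km) from the line S/v = S/Vmax + Km/Vmax using
    ordinary least squares."""
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(substrate)
    mx, my = sum(substrate) / n, sum(y) / n
    slope = sum((s - mx) * (t - my) for s, t in zip(substrate, y)) / sum(
        (s - mx) ** 2 for s in substrate
    )
    intercept = my - slope * mx
    vmax = 1.0 / slope
    return vmax, intercept * vmax

def relative_clearance(vmax, km, wt_vmax, wt_km):
    """Clearance (Vmax/Km) as a percentage of the wild-type clearance."""
    return 100.0 * (vmax / km) / (wt_vmax / wt_km)

# Noise-free toy data: WT (Vmax=10, Km=100) and a variant (Vmax=20, Km=50).
conc = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
wt_fit = fit_hanes_woolf(conc, [michaelis_menten(s, 10.0, 100.0) for s in conc])
var_fit = fit_hanes_woolf(conc, [michaelis_menten(s, 20.0, 50.0) for s in conc])
percent_of_wt = relative_clearance(*var_fit, *wt_fit)
```

Because clearance combines both parameters, a variant can gain activity through a higher Vmax, a lower Km, or both, as in this toy case where the variant reaches four times the WT clearance.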
To address the misprediction of some variants, especially those with the substitution of two amino acids, we employed AlphaFold3-predicted structures to refine the VERnet model. Specifically, AlphaFold3 was employed to generate high-accuracy tertiary structures for selected variants in the training data set, and these refined structures were subsequently used to fine-tune the VERnet model. In our current study, the fine-tuned model achieved more accurate predictions for some challenging samples, including double mutations (refer to Table S4 and Figure S6). In addition to correcting the predictions of several variants, the refined model improved its quantitative agreement with experimental measurements. Specifically, the Pearson correlation with the relative enzymatic activity toward TOL increased from 0.59 to 0.73, and that for DIC improved from 0.68 to 0.80. This successful attempt demonstrates the potential for even more precise virtual evolution with the incorporation of AlphaFold3 in the future.
Functional Interpretation and Evolutionary Consistency of VERnet-Identified CYP2C9 Variants
To elucidate the mechanistic basis and evolutionary plausibility of the CYP2C9 variants generated by VERnet-guided virtual evolution, we performed a systematic comparison of the kinetic parameters, VERnet functional scores, and MaxEnt statistical energies (summarized in Figure 5A). The relationships among model scores and experimental measurements were further visualized by using a pairwise correlation matrix (Figure 5B).
Figure 5. Comprehensive correlation analysis between computational predictions and experimental kinetic parameters across identified CYP2C9 variants. (A) Summary of E_MaxEnt(S), VERnet scores, and experimentally determined kinetic parameters for WT and 12 variants, including relative clearance to WT, Vmax, and Km, toward two probe substrates TOL and DIC. (B) Pairwise correlation matrix across predictive and experimental metrics. The color and size of the circles represent the Pearson correlation coefficient, where red indicates a positive correlation and blue indicates a negative correlation. The abbreviation "NA" (not applicable) denotes pairs that were excluded from the correlation analysis as they were considered functionally unrelated.
We first analyzed the correlations between VERnet predictions and the kinetic parameters across all experimentally characterized alleles. For the substrates TOL and DIC, respectively, the model predictions exhibited Pearson correlation coefficients of r = 0.73 and r = 0.81 with Vmax, r = −0.74 and r = −0.04 with Km, and r = 0.73 and r = 0.80 with the clearance rate. These results indicate that the VERnet model captures enzyme activity through bidirectional optimization of both Vmax and Km, with a stronger contribution from Vmax in the case of DIC, thereby enabling the identification of variants that increase or decrease clearance rates rather than merely altering substrate-binding affinity.
To evaluate evolutionary plausibility, we next applied a MaxEnt framework to calculate the statistical energy, E_MaxEnt(S), for the 12 novel variants. In this model, lower energy values indicate a higher probability of sequence occurrence under inferred evolutionary constraints, which has been previously shown to correlate with enhanced enzyme activity or stability. E_MaxEnt(S) exhibited a moderate and consistent negative correlation with experimentally measured enzymatic activities toward both TOL and DIC (r = −0.5). Although E_MaxEnt(S) did not strictly rank activity changes of individual variants relative to the WT, variants with lower statistical energy were generally enriched among those displaying higher catalytic activity.
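To make the statistical energy concrete, the sketch below evaluates a pairwise (Potts-type) MaxEnt energy, E(S) = −Σ h_i(s_i) − Σ J_ij(s_i, s_j), for which sequence probability is proportional to exp(−E(S)). The pairwise form, the function name, and every parameter value are assumptions for illustration; the study's own model and its inference from multiple sequence alignments are not reproduced here.

```python
import math

def statistical_energy(seq, fields, couplings):
    """Pairwise MaxEnt energy: minus the sum of site fields h_i(s_i) and
    pair couplings J_ij(s_i, s_j). Missing entries are treated as zero."""
    energy = -sum(fields[i].get(aa, 0.0) for i, aa in enumerate(seq))
    for (i, j), table in couplings.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy

# Toy two-site model; the numbers are invented, not inferred from data.
fields = [{"A": 1.0, "G": 0.2}, {"L": 0.5, "V": 0.4}]
couplings = {(0, 1): {("A", "L"): 0.3}}

e_al = statistical_energy("AL", fields, couplings)
e_gv = statistical_energy("GV", fields, couplings)

# Relative probability of the two sequences under P(S) ~ exp(-E(S)).
probability_ratio = math.exp(e_gv - e_al)
```

In this toy model the sequence "AL" has the lower energy and is therefore the more probable of the two, which mirrors how lower energies are read as greater evolutionary plausibility.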
Notably, a strong negative correlation was observed between E_MaxEnt(S) and the functional scores predicted by VERnet (r = −0.79), indicating a high degree of agreement between the two independent modeling approaches. This strong concordance suggests that VERnet may implicitly capture evolutionary constraints that are explicitly modeled by the MaxEnt framework, thereby increasing the confidence in the biological relevance of the features learned by the function-targeted model. Collectively, these results indicate that VERnet enables the identification of novel variants through fine-grained functional prediction, while the MaxEnt statistical energy provides complementary evolutionary constraints that define the accessible sequence space.
Conclusions
It has long been recognized that structure-based artificial intelligence (AI) can inspire the discovery of novel proteins with specific properties for applications such as drug development, enzyme engineering, and antibody design. Current AI endeavors in synthetic biology predominantly focus on binder-design approaches for protein–protein interaction analysis. However, the exploration of protein function, such as the evolution of enzyme activity, involves more than just ligand-binding investigations. Enzyme activity also relies on catalytic ability after substrate binding, which can be an even more prominent determinant. Our study proposed a general-purpose framework combined with specific models tailored for particular functionality, presenting a promising avenue for protein design. Our prior work applied VERnet to BRCA1 variants for pathogenicity prediction, achieving an accuracy of 85% and outperforming state-of-the-art computational methods. In this study, the precise variant effect predictor VERnet effectively uncovered novel CYP2C9 variants exhibiting significant functional alterations in two distinct directions. Our findings were robustly validated against clinical substrates of CYP2C9, which affirmed the capacity of VERnet to offer a reliable interpretation of CYP2C9 variants. The accurate advance prediction of CYP2C9 variant effects provides a foundation for pharmacogenomic interpretation, which holds the potential for supporting future dose-adjustment simulations to optimize clinical drug dosing guidelines. In our current study, VERnet was proposed as a general strategy that can be widely applied in the interpretation and evolution of other functional proteins. Unlike most structure–function studies that focus on protein–protein interactions, our approach enabled the prediction of more complex functions by developing specialized models based on their particular functional charts.
Discussion
Since spontaneous mutations are not randomly distributed in the genome, mutation bias is commonly observed. Therefore, there are mutation hot or cold spots along a protein sequence when performing massively parallel variant characterization, which may result in overlooking research on some important mutation sites. Employing generative AI, we predicted a complete activity landscape of amino acid substitutions in CYP2C9. Although generative AI could not precisely predict the actual impact of each mutation, its generated activity landscape can more intuitively indicate important mutation sites or regions, which efficiently empowered the virtual evolution with VERnet. Our work demonstrated the undeniable potential of generative AI for empowering protein design based on structural models.
Protein structure-based variant prediction approaches typically follow two broad strategies. The first class of methods provides proteome-wide variant effect prediction and does not require training a separate prediction model for every protein. In this strategy, models avoid relying on human-curated classification and are trained with weak labels, thus mitigating the impact of biases introduced by human annotation. However, recent evidence suggests that LLMs can become less reliable as they scale up. LLM-based approaches often lack real-world experimental data, which limits their ability to address complex biological problems. Therefore, such approaches can address only the task of identifying pathogenic variants, overlooking many specific functions beyond disease-related phenotypes. By adopting an individualized model for each protein, the second class of methods enables a more targeted understanding of variant effects and allows directed evolution for specific functions. Although they cannot generalize across genes, such approaches provide deeper insights into genotype–phenotype relationships and achieve specialized adaptability to diverse biological contexts.
In recent studies, AlphaFold3 brought considerable improvements in the prediction of protein structure, building upon the significant advancements already made by AlphaFold2. The atomic-level structure reconstruction algorithm provides more precise protein structures that are more applicable to mutation research. In this work, we employed AlphaFold3-predicted structures to refine VERnet, obtaining more accurate predictions. Notably, the fine-tuned model enabled the prediction of variants with the replacement of two amino acids, a task that failed with AlphaFold2-based models. Although AlphaFold3's outputs have not yet been permitted for further deep learning, our results indicate the potential of substantially optimizing our model by coupling it with finer structural models like AlphaFold3.
The complex nature of the protein tertiary structure has led to diverse forms of representation in deep learning, ranging from molecular graphs to 3D projections based on the protein’s original 3D shape. Accurate representation of structural information is crucial for precise predictions, particularly in the context of variant prediction, where single amino acid substitutions may minimally impact the overall coiling and folding of the protein structure but can significantly affect its function. Constructing amino acid networks (AANs) for protein structures has been proven valuable by detailing multiple biochemical interactions within proteins, thus providing more comprehensive information to capture the influences caused by single amino acid substitutions. In this work, we investigated optimal ways to utilize AANs as inputs for VERnet models. The model exhibited a peak performance when prioritizing more valuable features, while the reverse order resulted in the least favorable outcome. Overall, this adjustment underscored the sensitivity of the prediction performance to structural representation during training.
Our in vitro results revealed that mutations could induce varying degrees of alteration in metabolic activity toward different drugs, further illustrating the diverse pathways through which structure affects function. VERnet was trained on CYP2C9 enzyme activity toward an acid hexynyl amide activity-based probe (TAHA-ABP) and accurately reflected the direction of changes in metabolic activity toward TOL and DIC; however, it exhibited low sensitivity to the magnitude of these alterations. Nevertheless, sensitivity analysis demonstrated that VERnet maintains a robust performance across a broad dynamic range of enzymatic activities. In general, VERnet revealed the feasibility of training a model for each protein based on a wide array of its functions, striking a balance between the precision and generalizability of variant effect prediction models.
Importantly, a deeper understanding of enzyme action is essential for rational enzyme engineering. We further contextualized the VERnet-identified variants within an evolutionary framework by integrating a MaxEnt statistical modeling approach, which has uncovered a link to enzyme physicochemical properties. Although the MaxEnt-derived statistical energy is not intended to directly predict substrate-specific enzymatic activity, its strong concordance with VERnet scores across the evolved variants suggests that VERnet predictions are constrained within an evolutionarily plausible sequence space. Moreover, the moderate association between MaxEnt-derived statistical energy and experimental measurements highlights the value of MaxEnt as an interpretable metric for prioritizing sequence space and refining mutation libraries, thereby reducing the experimental burden through a more rational design.
Significantly, our results have contributed to uncovering new mutations that indirectly influence ligand binding, moving beyond solely exploring the sites directly binding ligands via their side chains. As shown in our findings, a CYP2C9 mutation N218A was highlighted, exhibiting an ultrastrong TOL metabolic activity over seven times higher than the WT enzyme, a level of activity unprecedented in previously characterized variants (refer to Table S5 for a summary of TOL metabolic activities across named CYP2C9). Due to the possible mutation bias from natural selection, massively parallel characterization has also never identified CYP2C9 mutations with such significant alterations in activity. These results demonstrated that VERnet enabled the generation of variants with stronger functionality, surpassing what natural evolution can approach. Moreover, Asn-218 has never been considered a ligand-binding site by previous structural studies, underscoring the unique advantage of virtual evolution based on prediction models over existing ligand-based de novo design methods.
Methods
Data Preparation for Training and Testing Models
The enzyme activity of 6142 CYP2C9 missense variants was determined in a massively parallel manner by pooled yeast-based activity assays, and 4421 of these variants also had steady-state cellular abundance measurements. After removing variants with conflicting activity and abundance results, we selected 3055 missense variants with relatively reliable annotation to create labels. Specifically, we assigned positive labels to the 1027 variants with both high activity scores and high abundance scores, while negative labels were assigned to the 2028 variants with both low activity scores and low abundance scores. To reduce computational requirements, we focused on the mutations that occurred around the essential hydrophobic pocket of the CYP2C9 protein. In addition, we employed self-distillation by training preliminary models to filter out ambiguous variants. We pretrained several CNN models using different subsets of the data set to classify the remaining variants. Samples that were incorrectly classified with extreme scores, which indicated that they were difficult or mislabeled samples, were removed from the data set. Finally, we built a labeled data set on this protein fragment with 377 amino acids, with 2023 positive samples and 690 negative samples. To evaluate the potential of VERnet for inferring variant effects, 138 positive samples and 138 negative samples were randomly selected and used for the testing data set. Detailed descriptions of the data sets are provided in Table S1.
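The labeling rule described above (positive when both scores are high, negative when both are low, excluded otherwise) can be sketched as follows. The cutoffs `HIGH` and `LOW` and the toy variant names are hypothetical; the thresholds actually used are not stated in this section.

```python
# Hypothetical cutoffs for illustration; the study's thresholds are not given here.
HIGH = 0.75
LOW = 0.25

def label_variant(activity, abundance, high=HIGH, low=LOW):
    """Return 1 (positive), 0 (negative), or None (excluded from the labeled set)."""
    if activity is None or abundance is None:
        return None  # both measurements are needed to assign a label
    if activity >= high and abundance >= high:
        return 1
    if activity <= low and abundance <= low:
        return 0
    return None  # conflicting or intermediate scores

# Toy (activity score, abundance score) pairs, not real measurements.
toy_variants = {
    "toy_high": (0.95, 0.90),
    "toy_low": (0.05, 0.10),
    "toy_conflict": (0.90, 0.15),
    "toy_missing": (0.50, None),
}
labels = {name: label_variant(*scores) for name, scores in toy_variants.items()}
labeled = {name: y for name, y in labels.items() if y is not None}
```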
To prepare protein structural information on variants as input data for VERnet, we carried out the following data preparation steps. Initially, the amino acid sequences of CYP2C9 variants were generated from the HGVS representations of the variants for tertiary structure prediction with AlphaFold2. Subsequently, the files storing the AANs were generated using the method described in the next section. The AANs were truncated into subnetworks relevant to the fragment of interest, which were further transformed into 3D matrices using the optimized method described in a later section. In these input matrices, the row and column coordinates represented the amino acids in a sequence, and the element values represented the interaction strengths between amino acids at the corresponding positions. The input data had the shape of a 377 × 377 grid with 7 channels. All preprocessing procedures were performed using Python.
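One channel of such an input matrix can be sketched as below: a weighted edge list is truncated to a fragment and written into a symmetric residue-by-residue grid. The function name, the edge-tuple layout, and the choice to sum multiple edges between the same residue pair are assumptions made for illustration, not the published preprocessing code.

```python
def aan_to_matrix(edges, start, length):
    """Build a symmetric length x length matrix for residues start..start+length-1.

    edges: iterable of (residue_i, residue_j, weight) with 1-based residue numbers.
    Edges touching residues outside the fragment are dropped (truncation).
    """
    matrix = [[0.0] * length for _ in range(length)]
    for res_i, res_j, weight in edges:
        i, j = res_i - start, res_j - start
        if 0 <= i < length and 0 <= j < length:
            matrix[i][j] += weight
            if i != j:
                matrix[j][i] += weight  # undirected network: mirror the entry
    return matrix

# Toy edge list; residue 99 lies outside the 3-residue fragment 101..103.
toy_edges = [(101, 103, 2.5), (102, 103, 1.0), (99, 101, 4.0)]
toy_matrix = aan_to_matrix(toy_edges, start=101, length=3)
```

For the fragment described above, `length` would be 377, and one such matrix would be built per interaction channel.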
Construction of AANs
AANs were constructed for the protein variants in the current data sets to train the deep-learning model VERnet or to evaluate its performance. Because AANs have been used successfully to infer protein functions in a previous study, we processed the atomic 3D coordinates of each AlphaFold2-predicted protein using the Probe program, version 2011.10 (http://kinemage.biochem.duke.edu). In brief, Probe identified contacts between amino acids in a protein using a small rolling probe to evaluate their atomic packing. The program created a small virtual probe sphere (usually 0.25 Å radius) that rolled around the van der Waals surface of each atom. If this probe touched or overlapped with another noncovalently bonded atom, an interaction or overlap was detected and represented by drawing contact dots or spikes. The strength of the interactions could also be measured quantitatively from the contact dots or spikes. Overlaps such as hydrogen bonds were quantified by the volume of overlap, and other nonoverlapping contacts were quantified by a weighted sum of the contact scores per dot. The combined score for the generic residue interaction was obtained by summing the weighted scores of these three interactions. The scores of the interactions were proportional to their strength. Next, the Probe program summarized and output the scoring data for all atoms of an entire structure into a file. The final step to generate networks was completed by the Python package RINerator, version 2014.10 (https://rinalyzer.de/rinerator.php). It integrated the information for every atom or residue from the previous file to construct an undirected weighted network with multiple edges, in which the nodes and edges represented the amino acid residues and the noncovalent interactions, respectively.
There are four possible types of edges, namely, interatomic contact (cnt), overlap (ovl), hydrogen bond (hbond), and generic residue interaction (combi), while the subtypes of each interaction are combinations between main chains (mc) and side chains (sc). The resulting AANs were stored in files formatted as SIF (nonweighted) and NA (weighted).
The Representation of AANs for Deep Learning
For each interaction type in AANs, the main-chain to main-chain and side-chain to side-chain interactions were aggregated into one channel, while the main-chain to side-chain interactions were stored in a separate channel. Therefore, 7 channels were stored separately: cnt between similar chains, cnt between different chains, hbond between similar chains, hbond between different chains, combi between all kinds of chains, ovl between similar chains, and ovl between different chains. Our previous studies demonstrated that the combi channel, which was the weighted combination of the other features, provided the most significant contribution, followed by the cnt channels, representing interatomic contact. The ovl channels made minimal contributions because they carried the least information, but their inclusion still improved the model performance. VERnet-B used an arbitrary feature order to organize the input matrices. We applied different channel arrangements to train independent models using the same training data set and the same learning parameters and evaluated them on the same testing data set. Finally, in VERnet, we modified the input 3D matrices by reordering the channels according to their contribution to the model performance.
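The channel assignment and reordering described above can be sketched as follows. The subtype string format (`mc_mc`, `mc_sc`, `all_all`) and the channel names are assumptions, and the position of the hbond channels in `CONTRIBUTION_ORDER` is inferred from the text, which ranks only combi, cnt, and ovl explicitly.

```python
def channel_name(interaction, subtype):
    """Map an edge type (cnt, hbond, ovl, combi) and chain subtype to a channel."""
    if interaction == "combi":
        return "combi_all"
    first, second = subtype.split("_")
    kind = "similar" if first == second else "different"
    return f"{interaction}_{kind}"

# Order in which the seven channels are listed in the text.
LISTED_ORDER = ["cnt_similar", "cnt_different", "hbond_similar",
                "hbond_different", "combi_all", "ovl_similar", "ovl_different"]

# Assumed contribution ranking: combi first, then cnt, with ovl last.
CONTRIBUTION_ORDER = ["combi_all", "cnt_similar", "cnt_different",
                      "hbond_similar", "hbond_different",
                      "ovl_similar", "ovl_different"]

def reorder_channels(tensor, old_order, new_order):
    """Reorder the last axis of a rows x cols x channels nested list."""
    index = [old_order.index(name) for name in new_order]
    return [[[cell[k] for k in index] for cell in row] for row in tensor]

# Toy 1 x 1 x 7 tensor whose values mark the original channel positions.
toy_tensor = [[[1, 2, 3, 4, 5, 6, 7]]]
reordered = reorder_channels(toy_tensor, LISTED_ORDER, CONTRIBUTION_ORDER)
```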
Construction of Deep Neural Networks
The architecture of the learning model used to implement the base classifiers of VERnet was ResNet18, including 2D convolutional blocks and fully connected blocks. The 2D convolutional blocks consisted of one maximum-pooling layer and five 2D convolutional layers. The first 2D convolutional layer (with a kernel size of 7 × 7) was applied before max-pooling. Residual blocks were implemented to form the other four 2D convolutional layers, each containing parallel convolutions. At the end of each convolution operation, batch normalization (BN) and Leaky ReLU activation were adopted. Each of the last four 2D convolutional layers stacked two residual blocks, and different channel sizes (64, 128, 256, and 512) were applied to them, with a kernel size of 3 × 3. Downsampling was used in the first 2D convolutional layer and the first residual block of the last three 2D convolutional layers. The maximum-pooling size was set to 3 × 3. Fully connected blocks consisted of the global average pooling layer and the dense layer. Cross-entropy was used as the loss function. A total of 11,194,882 trainable parameters were included in the model. An overview of the deep neural networks is provided in Figure S2. The number of maximum epochs and batch size hyperparameters were set to 500 and 69, respectively, and we used early stopping with a patience interval of 10 epochs to prevent overfitting. The learning rate was automatically chosen by Keras’s optimizer, Adadelta. To evaluate the training effect of our 2D-CNN model, we assigned 10% of the training samples to the validation data set and the remaining 90% to the training data set. We trained 3 models with good performance to ensemble VERnet, all of which were trained using an NVIDIA Corporation GP102 with 11 GB of memory. The Keras 2.5.0 library with TensorFlow 2.5.0 as the backend was used for the implementation of the 2D-CNN model.
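As a consistency check on the architecture described above, the trainable-parameter count can be reproduced with plain arithmetic. The sketch assumes a standard ResNet18 layout: a 7 × 7 stem, four stages of two residual blocks, and a 1 × 1 projection convolution with batch normalization on the shortcut of the first block in the last three stages. It also assumes convolution biases are enabled (the Keras default), a 7-channel input, and a 2-unit dense output. Under these assumptions the total equals the 11,194,882 parameters reported.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Weights (and optional biases) of a kernel x kernel 2D convolution."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)

def bn_params(channels):
    """Trainable batch-norm parameters (gamma and beta); moving stats excluded."""
    return 2 * channels

def resnet18_trainable_params(in_channels=7, n_classes=2, bias=True):
    total = conv_params(7, in_channels, 64, bias) + bn_params(64)  # stem
    c_prev = 64
    for c in (64, 128, 256, 512):
        for block in range(2):
            c_in = c_prev if block == 0 else c
            total += conv_params(3, c_in, c, bias) + bn_params(c)
            total += conv_params(3, c, c, bias) + bn_params(c)
            if block == 0 and c_in != c:
                # Assumed projection shortcut: 1x1 convolution plus batch norm.
                total += conv_params(1, c_in, c, bias) + bn_params(c)
        c_prev = c
    total += c_prev * n_classes + n_classes  # dense layer after global pooling
    return total

n_params = resnet18_trainable_params()
```

Pooling and activation layers carry no trainable parameters, so they do not appear in the count.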
EasyEnsemble
Undersampling is a popular method for dealing with class-imbalance problems. Traditional undersampling methods use only a subset of the majority class and therefore make inefficient use of the available samples. EasyEnsemble was proposed to overcome this deficiency by using a set of majority class examples N and a set of minority class examples P. Specifically, we first randomly sampled a subset N_i from N with the same number of examples as P. Then, T base classifiers were trained using different undersampled data sets, and their outputs were integrated as the final model result. Here, we did not use the sgn function to obtain the ensemble results but instead used the predicted probability, which is more reliable, to determine the classification.
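The sampling and aggregation steps above can be sketched as follows. Averaging the base classifiers' probabilities is an assumption about how the outputs were combined; the text states only that a probability was used in place of the sgn function. The toy classifiers are constant functions used purely for illustration.

```python
import random

def easy_ensemble_subsets(majority, minority, n_models, seed=0):
    """Create n_models balanced training sets by undersampling the majority class."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_models):
        sampled = rng.sample(majority, len(minority))  # N_i with |N_i| = |P|
        subsets.append(sampled + list(minority))
    return subsets

def ensemble_probability(models, x):
    """Average positive-class probabilities across base classifiers (assumed rule)."""
    probs = [model(x) for model in models]
    return sum(probs) / len(probs)

# Toy data: 10 majority and 4 minority sample identifiers, T = 3 base classifiers.
toy_majority = [f"maj_{i}" for i in range(10)]
toy_minority = [f"min_{i}" for i in range(4)]
toy_subsets = easy_ensemble_subsets(toy_majority, toy_minority, n_models=3)

# Stand-ins for trained classifiers, each returning a fixed probability.
toy_models = [lambda x: 0.2, lambda x: 0.6, lambda x: 0.7]
toy_prob = ensemble_probability(toy_models, x=None)
toy_label = int(toy_prob >= 0.5)
```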
Generative AI-Assisted Virtual Evolution
We proposed a generative AI method for generating missing scores in the activity landscape of amino acid substitutions in CYP2C9 variants. The variational autoencoder (VAE) is a generative model consisting of an encoder and a decoder, which is trained to learn a latent representation of the data distribution. Our approach leveraged the capability of VAEs to model the distribution of CYP2C9 enzyme activity and generate imputed values for uncharacterized variants. We constructed an enzyme activity matrix of missense SNVs with known activity covered by parallel assessments, acting as the data set for training VAE. The values for 20 possible amino acids (including WT) at each site constituted a sample, with the WT amino acid at this position serving as its label. Before training, we preprocessed the data set by scaling the values to a range between 0 and 1 using min–max scaling. For each site, the VAE was trained on data from the other sites by learning to minimize the reconstruction error while also regularizing the distribution of the latent space. The trained decoder imputed the missing values using the incomplete data of each sample and its corresponding label.
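The preprocessing and training objective described above can be written compactly. This is a generic conditional-VAE formulation given as a sketch: the squared-error reconstruction term, the weight β, the standard normal prior, and conditioning on the WT label c are assumptions, since the exact loss is not specified in this section. Here x is the 20-value activity vector at one site and x̂ is the decoder output.

```latex
% Min-max scaling of each activity value
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

% Assumed conditional VAE objective (reconstruction error + latent regularization)
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{z \sim q_\phi(z \mid x, c)}
    \left[ \lVert x - \hat{x}_\theta(z, c) \rVert^2 \right]
  + \beta \, D_{\mathrm{KL}}\!\left( q_\phi(z \mid x, c) \,\Vert\, \mathcal{N}(0, I) \right)
```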
MD Simulation
MD simulations were performed using the Amber24 package with GPU acceleration. The protein conformations of CYP2C9 WT, N218A, and T229E were generated using AlphaFold2. To mimic the metabolism-ready state, heme and cysteine parameters for the compound I (CPDI) state of heme were taken from the work of Shahrokh et al. The initial positions of substrate DIC were obtained by docking with the AutoDock 4.2 program. Antechamber assigned a neutral charge to DIC using the AM1-BCC model. Amber ff99SB force field, general Amber force field (GAFF), and OPC water model parameters were used in the study. Complexes were placed in a truncated octahedron box of OPC waters with a 15 Å buffer. To mimic a solvent of 150 mM NaCl, one Cl⁻ ion was added to neutralize the system along with 64 Na⁺ and 64 Cl⁻ ions.
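The relation between salt concentration and ion count can be illustrated with simple arithmetic. The sketch assumes the number of ion pairs is concentration times Avogadro's number times solvent volume. The volume is back-calculated from the 64 pairs quoted above and is not a value reported in the study.

```python
AVOGADRO = 6.02214076e23          # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 cubic angstrom = 1e-30 m^3 = 1e-27 L

def salt_pairs(volume_cubic_angstrom, molar=0.150):
    """Number of NaCl ion pairs for a given solvent volume and molarity."""
    return round(molar * AVOGADRO * volume_cubic_angstrom * LITERS_PER_CUBIC_ANGSTROM)

def volume_for_pairs(n_pairs, molar=0.150):
    """Solvent volume (cubic angstroms) that holds n_pairs ion pairs at this molarity."""
    return n_pairs / (molar * AVOGADRO * LITERS_PER_CUBIC_ANGSTROM)

# 64 ion pairs at 150 mM correspond to roughly 7.1e5 cubic angstroms of solvent.
implied_volume = volume_for_pairs(64)
check_pairs = salt_pairs(implied_volume)
```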
The system first underwent four consecutive minimization steps. In the first minimization step, the protein was held with a restraint force constant of 100 kcal/(mol·Å²), allowing only solvent molecules to relax for 2000 cycles (500 steepest descent followed by 1500 conjugate gradient steps). For the second minimization step, restraints were reduced to 10 kcal/(mol·Å²) and applied only to heavy atoms. The third minimization step maintained a 10 kcal/(mol·Å²) restraint on the protein backbone, whereas the final minimization was conducted without any restraints for 10,000 cycles (3000 steepest descent followed by 7000 conjugate gradient steps). Following energy minimization, the system was gradually heated from 0 to 300 K over 200 ps under constant volume conditions, using a Langevin thermostat with a collision frequency of 2.0 ps⁻¹ and a restraint of 10 kcal/(mol·Å²) applied to the solute. Equilibration was then performed in three stages under an NPT ensemble with a target pressure of 1 atm and a pressure relaxation time of 2 ps. The three stages lasted 20, 20, and 100 ps, respectively, with progressively reduced restraints.
After heating and equilibration, the total energy, temperature, and density of each system stabilized (Figure S7), confirming that the simulations had reached equilibrium. Production phase simulations were carried out under constant temperature (300 K) and pressure (1 atm) by using a Langevin thermostat and a Monte Carlo barostat. During the heating, equilibration, and production phases, bonds involving hydrogen atoms were constrained using the SHAKE algorithm with a time step of 2 fs. A nonbonded cutoff of 8.0 Å was used throughout all of the steps.
Each system was simulated in triplicate to ensure reproducibility, resulting in a total of nine independent trajectories.
Drug Metabolic Activity Analysis
cDNA of the 12 identified variants, a typical defective variant CYP2C9*3, and the WT were obtained by the overlap extension PCR amplification method. The site-directed mutagenesis primers are provided in Table S2. Purified PCR amplicons were digested with EcoRI and XbaI at 37 °C for 2 h and then ligated to the EcoRI/XbaI double-digested pFastBac dual-OR vector to generate the recombinant plasmid pFastBac dual-OR-2C9. A Bac-to-Bac Baculovirus Expression System was used to package constructed pFastBac dual-OR-2C9 vectors into baculoviruses for expressing OR and 2C9 enzymes simultaneously in insect cell microsomes. Then, the expressions of CYP2C9 and OR proteins were quantified with our previously reported immunoblot methods.
Subsequently, the enzyme kinetics analysis was performed with the method we previously described, in which two representative CYP2C9 probe drugs, tolbutamide (TOL) and diclofenac (DIC), were chosen as the substrates because only CYP2C9 mediates their metabolism in human liver microsomes. Notably, DIC is recognized and recommended by the U.S. FDA as a standard probe for assessing CYP2C9 activity. In brief, the reaction mixture consisted of 5–10 pmol of recombinant CYP2C9 insect microsomes, 10–20 pmol of cytochrome b5, 100 mM K3PO4 (pH 7.4, for tolbutamide) or Tris-HCl (pH 7.4, for diclofenac), and a series of gradient solutions of drugs (10–1000 mM for tolbutamide and 1–100 mM for diclofenac). After preincubation at 37 °C for 5 min, an NADPH regeneration system was added to a final reaction volume of 200 μL to start the reaction, and the incubation was allowed to proceed at 37 °C for 60 min (for tolbutamide) or 30 min (for diclofenac). The reaction was terminated by the addition of 200 μL of acetonitrile containing 50 ng/mL diazepam as an internal standard. Reaction products were centrifuged at 12,000 rpm for 5 min, and the organic phase was transferred into autosampler plastic vials for injection and detection on an ACQUITY I-Class UPLC and a Waters XEVO TQD MS (Milford, MA, United States). The incubations were performed in triplicate, and the results were presented as the mean ± standard deviation (SD) from three experiments. The enzyme kinetic parameters (K_m and V_max) were calculated by fitting the Michaelis–Menten model with nonlinear regression. The absolute clearance rate was defined as clearance = V_max/K_m. One-way analysis of variance was used for the enzymatic activity comparison between WT and variants.
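The kinetic fit described above can be illustrated with a small, dependency-free sketch of Michaelis–Menten regression, v = V_max·[S]/(K_m + [S]). The fitting routine is an assumption chosen for simplicity, not the software used in the study. For each trial K_m the best V_max is obtained in closed form, and K_m is then refined on a shrinking logarithmic grid. The synthetic data are noise-free and invented.

```python
import math

def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=1e6, rounds=12):
    """Least-squares fit of v = vmax * s / (km + s); returns (vmax, km)."""
    def best_vmax(km):
        f = [x / (km + x) for x in s]
        # For fixed km the model is linear in vmax, so solve it directly.
        return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)

    def sse(km):
        vmax = best_vmax(km)
        return sum((vi - vmax * x / (km + x)) ** 2 for x, vi in zip(s, v))

    lo, hi = math.log(km_lo), math.log(km_hi)
    for _ in range(rounds):
        grid = [lo + (hi - lo) * i / 20 for i in range(21)]
        best = min(range(21), key=lambda i: sse(math.exp(grid[i])))
        # Zoom into the interval bracketing the best grid point.
        lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, 20)]
    km = math.exp((lo + hi) / 2)
    return best_vmax(km), km

# Synthetic, noise-free rates generated with vmax = 10 and km = 50 (toy values).
substrate = [10, 25, 50, 100, 250, 500, 1000]
rates = [10 * x / (50 + x) for x in substrate]
vmax_fit, km_fit = fit_michaelis_menten(substrate, rates)
clearance = vmax_fit / km_fit  # clearance = V_max / K_m
```

With replicate measurements, the same fit would typically be run per replicate or on pooled data before comparing clearance between WT and variants.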
Maximum Entropy Modeling
To characterize the evolutionary plausibility of VERnet-identified CYP2C9 variants, we applied a MaxEnt statistical framework based on sequence coevolutionary information, as previously proposed by Xie et al. Multiple sequence alignments (MSAs) were constructed by retrieving homologous sequences from the UniRef90 database using Jackhmmer, followed by standard filtering procedures to remove highly gapped positions and redundant sequences. Single-site amino acid frequencies and pairwise co-occurrence frequencies were then computed from the filtered MSAs and used as empirical constraints in the MaxEnt model.
The MaxEnt model was formulated as a generalized Potts Hamiltonian in which model parameters were optimized to reproduce the observed single-body and pairwise statistics from the MSA under the MaxEnt principle. Convergence of the loss function during model parametrization is shown in Figure S8. Using the parametrized MaxEnt model, the statistical energy E_MaxEnt(S) was calculated for each variant protein sequence S. For ease of comparison, the energy values were shifted by a constant such that the wild-type (WT) sequence was assigned a reference energy of zero.
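The energy calculation and WT shift described above can be sketched as follows. The sign convention (E = −Σ h − Σ J, so lower energy is more favorable), the dictionary layout of the parameters, and the toy values are all assumptions for illustration and are not the fitted model.

```python
def potts_energy(seq, fields, couplings):
    """Statistical energy under the assumed convention E = -sum(h) - sum(J)."""
    energy = 0.0
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)           # single-site terms h_i
    for (i, j), table in couplings.items():
        energy -= table.get((seq[i], seq[j]), 0.0)      # pairwise terms J_ij
    return energy

def shifted_energy(seq, wt_seq, fields, couplings):
    """Energy relative to the wild type, so that the WT sequence scores zero."""
    return potts_energy(seq, fields, couplings) - potts_energy(wt_seq, fields, couplings)

# Toy two-site model with made-up parameters.
toy_fields = [{"A": 1.0, "G": 0.2}, {"N": 0.5, "A": 0.1}]
toy_couplings = {(0, 1): {("A", "N"): 0.3}}
toy_wt = "AN"
toy_variant = "AA"
wt_energy = shifted_energy(toy_wt, toy_wt, toy_fields, toy_couplings)
variant_energy = shifted_energy(toy_variant, toy_wt, toy_fields, toy_couplings)
```

In this toy case the variant has a positive shifted energy, meaning it is less favorable than the WT under the assumed convention.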
Supplementary Material
Acknowledgments
The authors would like to express their gratitude to Professor Douglas M. Fowler at the University of Washington for generously sharing the data. The authors would like to acknowledge the valuable suggestions made by Dr. Dong An at Florida University. Furthermore, the authors are deeply grateful to Professor Wenjun Xie for his invaluable assistance in implementing the MaxEnt framework, verifying the accuracy of the results, and critically reviewing the sections of the manuscript related to the maximum entropy method.
Detailed descriptions of the training and testing data sets are provided in Table S1. All processed data sets, trained VERnet models, training codes, and accompanying metadata have been released on GitHub under the Apache License 2.0 at https://github.com/XiaoAILab/VERnet.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.6c00759.
Tables summarizing training and testing data sets, experimental conditions and parameters, raw experimental measurements, and comparisons between predicted and experimental results (ZIP)
Supplementary performance evaluation, predicted mutational landscapes, and details of experimental validation and computational simulation (PDF)
C.L., W.X., and H.Y. contributed equally. F.X. and P.J. conceived and designed the experiments. C.L., W.X., L.Z., Yf.L., and Y.X. performed the preprocessing of AANs, designed the neural networks, and carried out model training and analysis. Z.C., S.W., H.L., and Yy.L. performed the variant collection and screening. H.Y., Y.L., and Z.L. performed the functional validation. C.L. and C.Z. performed the MD and MaxEnt simulations. C.L. and W.X. wrote the manuscript. F.X., X.Y., D.D., and P.J. played advisory roles. All of the authors contributed to the review of the manuscript.
This work was supported by the National Natural Science Foundation of China (Grant 82372314), Innovative Drug Research and Development National Science and Technology Major Project (No. 2025ZD1801600), National Key Research and Development Program of China (2025YFC2423900), CAMS Innovation Fund for Medical Sciences (2021-I2M-1-050), and National High Level Hospital Clinical Research Funding (Grant BJ-2023-233).
The authors declare the following competing financial interest(s): F.X., C.L., et al. have filed a provisional patent application relating to the VERnet algorithms. The authors declare no other competing financial interests.
References
- Frazer J., Notin P., Dias M., Gomez A., Min J. K., Brock K., Gal Y., Marks D. S. Disease variant prediction with deep generative models of evolutionary data. Nature. 2021;599(7883):91–95. doi: 10.1038/s41586-021-04043-8; Pejaver V., Urresti J., Lugo-Martinez J., Pagel K. A., Lin G. N., Nam H. J., Mort M., Cooper D. N., Sebat J., Iakoucheva L. M., et al. Inferring the molecular and phenotypic impact of amino acid variants with MutPred2. Nat. Commun. 2020;11(1):5918. doi: 10.1038/s41467-020-19669-x.
- Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–589. doi: 10.1038/s41586-021-03819-2; Kryshtafovych A., Schwede T., Topf M., Fidelis K., Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XIII. Proteins: Struct., Funct., Bioinf. 2019;87(12):1011–1020. doi: 10.1002/prot.25823.
- Baek M., DiMaio F., Anishchenko I., Dauparas J., Ovchinnikov S., Lee G. R., Wang J., Cong Q., Kinch L. N., Schaeffer R. D., et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–876. doi: 10.1126/science.abj8754.
- Li C., Zhang L., Zhuo Z., Su F., Li H., Xu S., Liu Y., Zhang Z., Xie Y., Yu X., et al. Artificial intelligence-based recognition for variant pathogenicity of BRCA1 using AlphaFold2-predicted structures. Theranostics. 2023;13(1):391–402. doi: 10.7150/thno.79362.
- Cheng J., Novati G., Pan J., Bycroft C., Zemgulyte A., Applebaum T., Pritzel A., Wong L. H., Zielinski M., Sargeant T., et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381(6664):eadg7492. doi: 10.1126/science.adg7492.
- Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A. J., Bambrick J., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. doi: 10.1038/s41586-024-07487-w.
- Krishna R., Wang J., Ahern W., Sturmfels P., Venkatesh P., Kalvet I., Lee G. R., Morey-Burrows F. S., Anishchenko I., Humphreys I. R., et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science. 2024;384(6693):eadl2528. doi: 10.1126/science.adl2528.
- Gao H., Hamp T., Ede J., Schraiber J. G., McRae J., Singer-Berk M., Yang Y., Dietrich A., Fiziev P., Kuderna L., et al. The landscape of tolerated genetic variation in humans and primates. bioRxiv. 2023. doi: 10.1101/2023.05.01.538953.
- Jimenez-Luna J., Grisoni F., Weskamp N., Schneider G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin. Drug Discov. 2021;16(9):949–959. doi: 10.1080/17460441.2021.1909567.
- Ozcelik R., van Tilborg D., Jimenez-Luna J., Grisoni F. Structure-Based Drug Discovery with Deep Learning. Chembiochem. 2023;24(13):e202200776. doi: 10.1002/cbic.202200776.
- Vazquez Torres S., Leung P. J. Y., Venkatesh P., Lutz I. D., Hink F., Huynh H. H., Becker J., Yeh A. H., Juergens D., Bennett N. R., et al. De novo design of high-affinity binders of bioactive helical peptides. Nature. 2024;626(7998):435–442. doi: 10.1038/s41586-023-06953-1.
- Zhou L., Schellaert W., Martinez-Plumed F., Moros-Daval Y., Ferri C., Hernandez-Orallo J. Larger and more instructable language models become less reliable. Nature. 2024;634(8032):61–68. doi: 10.1038/s41586-024-07930-y.
- Reardon S. Five protein-design questions that still challenge AI. Nature. 2024;635(8037):246–248. doi: 10.1038/d41586-024-03595-9.
- Levy R. M., Haldane A., Flynn W. F. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr. Opin. Struct. Biol. 2017;43:55–62. doi: 10.1016/j.sbi.2016.11.004.
- Xie W. J., Asadi M., Warshel A. Enhancing computational enzyme design by a maximum entropy strategy. Proc. Natl. Acad. Sci. U. S. A. 2022;119(7):e2122355119. doi: 10.1073/pnas.2122355119.
- Xie W. J., Warshel A. Natural Evolution Provides Strong Hints about Laboratory Evolution of Designer Enzymes. Proc. Natl. Acad. Sci. U. S. A. 2022;119(31):e2207904119. doi: 10.1073/pnas.2207904119.
- Asadi M., Xie W. J., Warshel A. Exploring the Role of Chemical Reactions in the Selectivity of Tyrosine Kinase Inhibitors. J. Am. Chem. Soc. 2022;144(36):16638–16646. doi: 10.1021/jacs.2c07307; Gelfand N., Orel V., Cui W., Damborsky J., Li C., Prokop Z., Xie W. J., Warshel A. Biochemical and Computational Characterization of Haloalkane Dehalogenase Variants Designed by Generative AI: Accelerating the S(N)2 Step. J. Am. Chem. Soc. 2025;147(3):2747–2755. doi: 10.1021/jacs.4c15551; Xie W. J., Liu D., Wang X., Zhang A., Wei Q., Nandi A., Dong S., Warshel A. Enhancing luciferase activity and stability through generative modeling of natural enzyme sequences. Proc. Natl. Acad. Sci. U. S. A. 2023;120(48):e2312848120. doi: 10.1073/pnas.2312848120; Hu L., Zhang A., Warshel A. Exploring evolutionary trajectories of drug resistance. Proc. Natl. Acad. Sci. U. S. A. 2025;122(45):e2517715122. doi: 10.1073/pnas.2517715122.
- Matreyek K. A., Starita L. M., Stephany J. J., Martin B., Chiasson M. A., Gray V. E., Kircher M., Khechaduri A., Dines J. N., Hause R. J., et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 2018;50(6):874–882. doi: 10.1038/s41588-018-0122-z; Wester M. R., Yano J. K., Schoch G. A., Yang C., Griffin K. J., Stout C. D., Johnson E. F. The structure of human cytochrome P450 2C9 complexed with flurbiprofen at 2.0-A resolution. J. Biol. Chem. 2004;279(34):35630–35637. doi: 10.1074/jbc.M405427200.
- Rettie A. E., Jones J. P. Clinical and toxicological relevance of CYP2C9: drug-drug interactions and pharmacogenetics. Annu. Rev. Pharmacol. Toxicol. 2005;45:477–494. doi: 10.1146/annurev.pharmtox.45.120403.095821; Daly A. K., Rettie A. E., Fowler D. M., Miners J. O. Pharmacogenomics of CYP2C9: Functional and Clinical Considerations. J. Pers. Med. 2018;8(1):1. doi: 10.3390/jpm8010001.
- Sultana J., Cutroneo P., Trifiro G. Clinical and economic burden of adverse drug reactions. J. Pharmacol. Pharmacother. 2013;4(Suppl 1):S73–S77. doi: 10.4103/0976-500X.120957; Lazarou J., Pomeranz B. H., Corey P. N. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA. 1998;279(15):1200–1205. doi: 10.1001/jama.279.15.1200.
- Li X., Li D., Wu J. C., Liu Z. Q., Zhou H. H., Yin J. Y. Precision dosing of warfarin: open questions and strategies. Pharmacogenomics J. 2019;19(3):219–229. doi: 10.1038/s41397-019-0083-3.
- Relling M. V., Klein T. E. CPIC: Clinical Pharmacogenetics Implementation Consortium of the Pharmacogenomics Research Network. Clin. Pharmacol. Ther. 2011;89(3):464–467. doi: 10.1038/clpt.2010.279.
- Sundaram L., Gao H., Padigepati S. R., McRae J. F., Li Y., Kosmicki J. A., Fritzilas N., Hakenberg J., Dutta A., Shon J., et al. Predicting the clinical impact of human mutation with deep neural networks. Nat. Genet. 2018;50(8):1161–1170. doi: 10.1038/s41588-018-0167-z.
- Liu R., Lyu X., Batt S. M., Hsu M. H., Harbut M. B., Vilcheze C., Cheng B., Ajayi K., Yang B., Yang Y., et al. Determinants of the Inhibition of DprE1 and CYP2C9 by Antitubercular Thiophenes. Angew. Chem., Int. Ed. Engl. 2017;56(42):13011–13015. doi: 10.1002/anie.201707324.
- Williams P. A., Cosme J., Ward A., Angove H. C., Matak Vinkovic D., Jhoti H.. Crystal structure of human cytochrome P450 2C9 with bound warfarin. Nature. 2003;424(6947):464–468. doi: 10.1038/nature01862. [DOI] [PubMed] [Google Scholar]
- Amorosi C. J., Chiasson M. A., McDonald M. G., Wong L. H., Sitko K. A., Boyle G., Kowalski J. P., Rettie A. E., Fowler D. M., Dunham M. J.. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. Am. J. Hum. Genet. 2021;108(9):1735–1751. doi: 10.1016/j.ajhg.2021.07.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu X. Y., Wu J., Zhou Z. H.. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst., Man, and Cybern., Part B (Cybernetics) 2009;39(2):539–550. doi: 10.1109/TSMCB.2008.2007853. [DOI] [PubMed] [Google Scholar]
- Zhao F. L., Zhang Q., Wang S. H., Hong Y., Zhou S., Zhou Q., Geng P. W., Luo Q. F., Yang J. F., Chen H.. et al. Identification and drug metabolic characterization of four new CYP2C9 variants CYP2C9*72-*75 in the Chinese Han population. Front Pharmacol. 2022;13:1007268. doi: 10.3389/fphar.2022.1007268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yeh A. H., Norn C., Kipnis Y., Tischer D., Pellock S. J., Evans D., Ma P., Lee G. R., Zhang J. Z., Anishchenko I.. et al. De novo design of luciferases using deep learning. Nature. 2023;614(7949):774–780. doi: 10.1038/s41586-023-05696-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Torres SV, Leung PJY, Venkatesh P, Lutz ID, Hink F, Huynh HH, Becker J, Yeh AH, Juergens D, Bennett NR. et al. De novo design of high-affinity binders of bioactive helical peptides. Nature. 2023;626:435–442. doi: 10.1038/s41586-023-06953-1. [DOI] [PMC free article] [PubMed] [Google Scholar]; Anderluzzi G., Schmidt S. T., Cunliffe R., Woods S., Roberts C. W., Veggi D., Ferlenghi I., O’Hagan D. T., Baudner B. C., Perrie Y.. Rational design of adjuvants for subunit vaccines: The format of cationic adjuvants affects the induction of antigen-specific antibody responses. J. Controlled Release. 2021;330:933–944. doi: 10.1016/j.jconrel.2020.10.066. [DOI] [PubMed] [Google Scholar]
- Monroe J. G., Srikant T., Carbonell-Bejerano P., Becker C., Lensink M., Exposito-Alonso M., Klein M., Hildebrandt J., Neumann M., Kliebenstein D.. et al. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022;602(7895):101–105. doi: 10.1038/s41586-021-04269-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feinberg E. N., Sur D., Wu Z., Husic B. E., Mai H., Li Y., Sun S., Yang J., Ramsundar B., Pande V. S.. PotentialNet for Molecular Property Prediction. ACS Cent Sci. 2018;4(11):1520–1530. doi: 10.1021/acscentsci.8b00507. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tavanaei A., Maida A. S., Kaniymattam A., Loganantharaj R.. Towards Recognition of Protein Function based on its Structure using Deep Convolutional Networks. IEEE Int. Conf. Bioinf. Biomed. 2016:145–149. doi: 10.1109/BIBM.2016.7822509. [DOI] [Google Scholar]
- Song B., Luo X., Luo X., Liu Y., Niu Z., Zeng X.. Learning spatial structures of proteins improves protein-protein interaction prediction. Briefings Bioinf. 2022;23(2):bbab558. doi: 10.1093/bib/bbab558. [DOI] [PubMed] [Google Scholar]
- Xie W. J., Warshel A.. Harnessing generative AI to decode enzyme catalysis and evolution for enhanced engineering. Natl. Sci. Rev. 2023;10(12):nwad331. doi: 10.1093/nsr/nwad331. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Giollo M., Martin A. J., Walsh I., Ferrari C., Tosatto S. C.. NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics. 2014;15(4):S7. doi: 10.1186/1471-2164-15-s4-s7. [DOI] [PMC free article] [PubMed] [Google Scholar]; Yan W., Sun M., Hu G., Zhou J., Zhang W., Chen J., Chen B., Shen B.. Amino acid contact energy networks impact protein structure and evolution. J. Theor. Biol. 2014;355:95–104. doi: 10.1016/j.jtbi.2014.03.032. [DOI] [PubMed] [Google Scholar]; Li Y., Wen Z., Xiao J., Yin H., Yu L., Yang L., Li M.. Predicting disease-associated substitution of a single amino acid by analyzing residue interactions. BMC Bioinf. 2011;12(1):14. doi: 10.1186/1471-2105-12-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Word J. M., Lovell S. C., Richardson J. S., Richardson D. C.. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. Journal of molecular biology. 1999;285(4):1735–1747. doi: 10.1006/jmbi.1998.2401. [DOI] [PubMed] [Google Scholar]
- Doncheva N. T., Klein K., Domingues F. S., Albrecht M.. Analyzing and visualizing residue networks of protein structures. Trends in biochemical sciences. 2011;36(4):179–182. doi: 10.1016/j.tibs.2011.01.002. [DOI] [PubMed] [Google Scholar]
- He, K. ; Zhang, X. ; Ren, S. ; Sun, J. . Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016; pp. 770-778.
- Ioffe, S. ; Szegedy, C. . Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, 2015; pp. 448-456.
- Nair, V. ; Hinton, G. E. . Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), 2010; pp. 807-814.
- Zeiler, M. D. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
- Chollet F.. Keras: Deep learning library for theano and tensorflow. URL: https://keras. io/k. 2015;7(8):T1. [Google Scholar]
- Abadi, M. TensorFlow: learning functions at scale. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, 2016; pp. 1-1.
- Drummond, C. ; Holte, R. C. . C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Workshop on learning from imbalanced datasets II, 2003; pp. 1-8.
- Liu X.-Y., Wu J., Zhou Z.-H.. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst., Man, and Cybern., Part B (Cybernetics) 2008;39(2):539–550. doi: 10.1109/TSMCB.2008.2007853. [DOI] [PubMed] [Google Scholar]
- Cemgil T., Ghaisas S., Dvijotham K., Gowal S., Kohli P.. The autoencoding variational autoencoder. Adv. Neural Inf. Process. Syst. 2020;33:15077–15087. doi: 10.48550/arXiv.2012.03715. [DOI] [Google Scholar]
- Case D. A., Cheatham T. E. 3rd, Darden T., Gohlke H., Luo R., Merz K. M. Jr, Onufriev A., Simmerling C., Wang B., Woods R. J.. The Amber biomolecular simulation programs. J. Comput. Chem. 2005;26(16):1668–1688. doi: 10.1002/jcc.20290. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shahrokh K., Orendt A., Yost G. S., Cheatham T. E. 3rd. Quantum mechanically derived AMBER-compatible heme parameters for various states of the cytochrome P450 catalytic cycle. J. Comput. Chem. 2012;33(2):119–133. doi: 10.1002/jcc.21922. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Morris G. M., Huey R., Lindstrom W., Sanner M. F., Belew R. K., Goodsell D. S., Olson A. J.. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 2009;30(16):2785–2791. doi: 10.1002/jcc.21256. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tian C., Kasavajhala K., Belfon K. A. A., Raguette L., Huang H., Migues A. N., Bickel J., Wang Y., Pincay J., Wu Q.. et al. ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained against Quantum Mechanics Energy Surfaces in Solution. J. Chem. Theory Comput. 2020;16(1):528–552. doi: 10.1021/acs.jctc.9b00591. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Izadi S., Anandakrishnan R., Onufriev A. V.. Building Water Models: A Different Approach. J. Phys. Chem. Lett. 2014;5(21):3863–3871. doi: 10.1021/jz501780a. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dai D. P., Wang Y. H., Wang S. H., Geng P. W., Hu L. M., Hu G. X., Cai J. P.. In vitro functional characterization of 37 CYP2C9 allelic isoforms found in Chinese Han population. Acta Pharmacol Sin. 2013;34(11):1449–1456. doi: 10.1038/aps.2013.123. [DOI] [PMC free article] [PubMed] [Google Scholar]; Dai D. P., Xu R. A., Hu L. M., Wang S. H., Geng P. W., Yang J. F., Yang L. P., Qian J. C., Wang Z. S., Zhu G. H.. et al. CYP2C9 polymorphism analysis in Han Chinese populations: building the largest allele frequency database. Pharmacogenomics J. 2014;14(1):85–92. doi: 10.1038/tpj.2013.2. [DOI] [PubMed] [Google Scholar]
- Dai D. P., Wang S. H., Li C. B., Geng P. W., Cai J., Wang H., Hu G. X., Cai J. P.. Identification and Functional Assessment of a New CYP2C9 Allelic Variant CYP2C9*59. Drug Metab. Dispos. 2015;43(8):1246–1249. doi: 10.1124/dmd.115.063412. [DOI] [PubMed] [Google Scholar]; Chen H., Dai D. P., Zhou S., Liu J., Wang S. H., Wu H. L., Zhou Q., Geng P. W., Chong J., Lu Y.. et al. An identification and functional evaluation of a novel CYP2C9 variant CYP2C9*62. Chem. Biol. Interact. 2020;327:109168. doi: 10.1016/j.cbi.2020.109168. [DOI] [PubMed] [Google Scholar]
- Waring R. H.. Cytochrome P450: genotype to phenotype. Xenobiotica. 2020;50(1):9–18. doi: 10.1080/00498254.2019.1648911. [DOI] [PubMed] [Google Scholar]
- Johnson L. S., Eddy S. R., Portugaly E.. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics. 2010;11:431. doi: 10.1186/1471-2105-11-431. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Detailed descriptions of the training and testing data sets are provided in Table S1. All processed data sets, trained VERnet models, training code, and accompanying metadata have been released on GitHub under the Apache License 2.0 at https://github.com/XiaoAILab/VERnet.



