Abstract
Pathogenic missense mutations are commonly found in protein-coding regions of DNA, and often alter protein function. In the past decade, enormous experimental efforts have been undertaken through the development of functional assays, mutagenesis screenings, and computational prediction algorithms to characterize these variants. Most efforts have focused on identifying degrees of loss-of-function (LoF) effects on the three-dimensional structures of proteins, whereas gain-of-function (GoF) mutations remain poorly understood. Herein, we performed a case study of the PIEZO1 mechanosensitive ion channel protein, whose GoF variants (n = 56) are implicated in hereditary xerocytosis (HX) and whose LoF variants (n = 6) are implicated in lymphatic dysplasia (LD). This study evaluated the ability of AlphaMissense (AM) to decipher both mutation types, and benchmarked its performance against other algorithmic approaches, including Combined Annotation Dependent Depletion (CADD) v1.7, evolutionary model of variant effect (EVE), and Evolutionary Scale Modeling-1b (ESM-1b). We found that all approaches excelled in identifying LoF variants but were often ambiguous in their predictions for GoF PIEZO1 variants. ESM-1b was a notable exception that demonstrated balanced sensitivity to both GoF and LoF variants, as it likely identified certain sequence features not utilized in other approaches. Secondly, our findings suggest that GoF variants of HX do not significantly destabilize the PIEZO1 structure and are not generally identified by conventional signatures of damage from these algorithms. GoF variants are not synonymous with direct changes in free energy upon mutation. Furthermore, we validated computational structure predictions of PIEZO1 against resolved cryo-EM structures, and provided biophysical data for variants located on unresolved residues of this ion channel.
We formulated a weighted ensemble model that performed similarly to AM and outperformed all other traditional approaches evaluated in this study. Overall, this is the first study to directly evaluate the capabilities of pathogenicity prediction algorithms for deciphering GoF and LoF variants, and underscores the limitations present in current approaches.
Keywords: Piezo1, Congenital lymphatic dysplasia, Hereditary xerocytosis, AlphaMissense, Mechanosensitive cation channel, Gain-of-Function mutations
Highlights
• ESM-1b demonstrates balanced sensitivity in deciphering GoF and LoF variants.
• A weighted ensemble model outperforms individual predictors of variant pathogenicity.
• GoF variants do not typically destabilize the PIEZO1 structure severely.
• AM excels in predicting LoF variants, but often misses GoF variants.
• The AF2 structure of PIEZO1 can be used to evaluate variants on unresolved residues.
1. Introduction
PIEZO1 is a mechanosensitive ion channel composed of over 2500 amino acids and 24–36 transmembrane domains forming a homotrimeric propeller structure [1,2]. PIEZO1 is involved in regulating membrane potential and Ca2+ signaling coupled to downstream effectors in animals [3]. Previous studies have identified PIEZO1 as an inherent sensor of membrane tension, a primary physiological activator of the channel [3–6]. PIEZO1 is expressed in numerous cell types with a wide range of functions, including regulating blood pressure, mediating endothelial shear-stress and vascular development, and erythrocyte function [3,7–9]. Further, mutations in PIEZO1 have been implicated in lymphatic dysplasia (LD), hereditary xerocytosis (HX), and other rare disorders [10]. Specifically, loss-of-function (LoF) mutations in the PIEZO1 gene are implicated in autosomal recessive generalized lymphatic dysplasia [11–13]. Compound heterozygous and cosegregating homozygous mutations, including splice-site, nonsense, and missense variants, have been reported in patients [10]. In erythrocytes, PIEZO1 mediated Ca2+ signaling in response to various mechanical stimuli in heterozygous parents but not in compound heterozygotes of the G2029R variant [12]. Clinical phenotypes observed in PIEZO1-associated lymphatic dysplasia appear distinct from other lymphedema syndromes, and it has been previously suggested that the PIEZO1 gene may be involved in modulating the expression of other genes implicated in lymphatic endothelial cell signaling, such as VEGF-C and VEGFR-3 (Ma et al., 2008; [10]). On the other hand, gain-of-function (GoF) mutations in the PIEZO1 gene have been found in HX in numerous studies [14–18]. In GoF mutations, the autosomal dominant missense variants demonstrate altered ion channel kinetics and delayed inactivation [10,19].
However, the mechanistic relation between the persistent lymphatic insufficiency of LoF and the apparent insufficiency of GoF PIEZO1 variants remains uncertain.
Biochemical and biophysical experimentation with PIEZO1 remains difficult due to its large size and complexity. Computing constraints on single protein chains have precluded the application of computational approaches in understanding PIEZO1. Attempts to study PIEZO1 are further complicated by its functional redundancy in various biological systems. Despite these challenges, its distinct LoF and GoF variants make PIEZO1 a candidate for evaluation with pathogenicity prediction algorithms, which may help distinguish variants of these two mutation types. Currently, limited literature exists on the ability of pathogenicity prediction algorithms to distinguish LoF from GoF variants. Herein, we sought to perform a case study of PIEZO1 variants using these computational approaches.
We performed comparative benchmarking of multiple pathogenicity tools in assessing these mutation types, especially tools utilizing machine learning-based algorithms such as AlphaMissense and ESM-1b [20,21]. Previous work has assessed more than 17,000 variants across five genes (DDX3X, MSH2, PTEN, KCNQ4, and BRCA1) and found that AM scores correlate with in vitro functional assay data (Ljungdahl et al., 2024). This correlation has also been observed for variants in the CFTR gene implicated in cystic fibrosis and multiple amyloidogenic genes (APP, PSEN1, and PSEN2) [22,23]. Our previous work has established that pathogenicity prediction tools are capable of assessing variants occurring on intrinsically disordered regions of proteins, allowing for analysis of variants located on unresolved residues of the PIEZO1 structure [23]. Missense mutations show a broad diversity of functional impacts, including degrees of LoF, risk-imposing, benign or protective, and GoF effects. Interestingly, Ljungdahl et al. [24] found that both LoF and GoF variants received high pathogenicity scoring for the KCNQ4 gene. A potential explanation for these approaches assigning high pathogenicity to GoF variants is that such variants impart significant changes to the protein structure, resulting in the introduction of novel biological functions, and are therefore predicted as pathogenic. For instance, GoF mutations (e.g. G12V, G12C, G12D) in the KRAS oncogene result in either strong stabilization or destabilization of free energy of the protein structure, as shown in molecular dynamics simulations and deep mutational scanning experiments [25,26]. Yet, these variants are incompletely understood. It is possible that GoF mutations impose minimal structural effects and instead affect unstructured regions or other dependent biological factors implicated in protein function ([27]; Li et al., 2018).
Accurately deciphering LoF from GoF variants may support numerous applications in precision medicine, and the potential for application of predictive in silico approaches to answering this question remains uncertain.
Herein, we benchmarked multiple pathogenicity prediction algorithms and calculated changes in the free energy of stability for 75 unique missense variants implicated in seven reported clinical phenotypes linked to PIEZO1, including LoF mutations in lymphatic dysplasia and GoF mutations in HX. To our knowledge, this is the first case study to directly evaluate the capabilities of pathogenicity prediction algorithms against these mutation types.
2. Materials and methods
2.1. 3D structure, variants, and phenotypes of PIEZO1
A comprehensive table of missense variants for PIEZO1 and their reported phenotypes was obtained from More et al. [28]. Of the 75 unique variants, only one, F2458L, has been reported to cause both LD and HX. This variant was included twice as two data points in our analyses. We modified this table to sort by phenotype and obtained the functional and stability data for this list (Supplementary Material). Next, we utilized the predicted 3D structure of PIEZO1 (AF-Q92508-F1-v4) derived from AlphaFold2 (AF2) in the AlphaFold Structure Database [29,30]. We reviewed eight Mus musculus derived cryo-EM structures from the Protein Data Bank (PDB: 6B3R; 7WLT; 7WLU; 8IMZ; 6LQI; 5Z10; 6BPZ; 3JAC). Additionally, one recently resolved human PIEZO1 structure was reviewed (PDB: 8YEZ). Visualization was completed in PyMOL (Schrödinger LLC, Portland, OR, USA, Version 3.1.0).
2.2. Pathogenicity and stability scoring for PIEZO1 variants
AlphaMissense scores were obtained from the AlphaFold Structure Database (Heatmap), and established classification cutoffs were used. Variants with scores above 0.56 were classified as pathogenic, scores below 0.34 were classified as benign, and scores between 0.34 and 0.56 were classified as ambiguous [20]. Likewise, CADD scores were obtained from the ProtVar server from the EMBL-EBI (https://www.ebi.ac.uk/ProtVar/). Scores below 15 were classified as likely benign and those above 15 were classified as likely pathogenic [31,32]. For Evolutionary Scale Modeling-1b (ESM-1b), the negative log-likelihood ratio (-LLR) was used for scoring. A -LLR of less than seven was classified as corresponding to a benign variant, a score greater than eight was classified as corresponding to a pathogenic variant, and a score between seven and eight inclusive was considered ambiguous (https://huggingface.co/spaces/ntranoslab/esm_variants) [21]. Lastly, scoring of the Evolutionary Model of Variant Effect (EVE) (https://evemodel.org/) followed the criteria established by Frazer et al. [33], where scores greater than 0.65 were considered pathogenic, those below 0.35 were considered benign, and those between 0.35 and 0.65 were considered ambiguous. Next, with the AF2 structure of PIEZO1, we used a modified version of the open-source, peer-reviewed protocol described in Refs. [23,34] to calculate the change in free energy (ΔΔG) upon the amino acid substitution in each variant. To evaluate the stability change in the missense variants, we considered using the PremPS, DDMUT, DynaMut2, MAESTRO, DDGEmb, and INPS-MD methods for computation [35–40]. However, given that the size of the ion channel exceeded the amino acid limit (AA < 2500) of most of these computational approaches, we elected to use PremPS, which was trained on a balanced dataset of stabilizing and destabilizing missense variants.
Lastly, a preliminary ensemble model was produced with Python version 3.10, using libraries including pandas, matplotlib, sklearn, and seaborn [41–44]. All programming was performed in a Google Colab environment using a T4 GPU.
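The classification cutoffs described in this section can be sketched as a small set of helper functions. This is only an illustrative sketch: the function names are hypothetical and are not taken from the in-house script, and the handling of scores that fall exactly on a cutoff is stated in the text only for ESM-1b, so it is an assumption for the other predictors.

```python
# Cutoffs are those stated in Section 2.2; all function names are hypothetical.

def classify_three_way(score, benign_below, pathogenic_above):
    """Label a score using a lower (benign) and an upper (pathogenic) cutoff."""
    if score < benign_below:
        return "benign"
    if score > pathogenic_above:
        return "pathogenic"
    # Scores equal to a cutoff land here (stated for ESM-1b, assumed for AM/EVE).
    return "ambiguous"


def classify_am(score):
    return classify_three_way(score, 0.34, 0.56)


def classify_esm1b(neg_llr):
    # Input is the negative log-likelihood ratio (-LLR).
    return classify_three_way(neg_llr, 7.0, 8.0)


def classify_eve(score):
    return classify_three_way(score, 0.35, 0.65)


def classify_cadd(score):
    # The text does not say how a score of exactly 15 is handled;
    # returning "unspecified" for that case is an assumption of this sketch.
    if score < 15:
        return "likely benign"
    if score > 15:
        return "likely pathogenic"
    return "unspecified"


# Arbitrary example inputs, not variant scores from this study.
for value in (0.20, 0.45, 0.80):
    print(value, classify_am(value), classify_eve(value))
```

Keeping the cutoffs in one place makes it straightforward to apply the same three-way labeling to every predictor before comparing class counts across phenotypes.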
2.3. Statistical analyses
All statistical analyses and data plotting were performed using GraphPad Prism (version 10.3.1) (GraphPad Software, La Jolla, CA, USA). Unless stated otherwise, all data are reported as the mean ± standard error of the mean (SEM). p-values less than 0.05 were deemed statistically significant. When comparing a single group against a reference value, such as a classification threshold, a one-sample Student's t-test was used for significance testing. When comparing groups with one another, a Mann-Whitney U test was used for evaluating statistical significance. For linear correlational analyses, we calculated Pearson's and Spearman's correlation coefficients and tested whether the slope of the fitted line was significantly non-zero. A GraphPad file containing all statistical analyses and visualizations has been provided in the Supplementary Material.
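For readers who prefer a scripted workflow, the tests named above can be approximated with SciPy. The analyses in this study were run in GraphPad Prism, so the use of SciPy, the helper name `mean_sem`, and the placeholder arrays below are all assumptions made for illustration; none of the numbers are data from this study.

```python
import numpy as np
from scipy import stats

# Placeholder scores, NOT data from this study.
group_a = np.array([0.91, 0.78, 0.66, 0.95, 0.42, 0.88])
group_b = np.array([0.12, 0.55, 0.71, 0.33, 0.49, 0.60, 0.25, 0.81])


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample std (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)


# One-sample t-test: does the group mean differ from a reference threshold?
t_stat, p_one = stats.ttest_1samp(group_a, popmean=0.56)

# Mann-Whitney U test: do the two groups differ in distribution?
u_stat, p_two = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

mean_a, sem_a = mean_sem(group_a)
print(f"group_a: {mean_a:.4f} ± {sem_a:.4f}")
print(f"one-sample t-test vs 0.56: t = {t_stat:.3f}, p = {p_one:.4f}")
print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {p_two:.4f}")
```

P-values from SciPy and GraphPad Prism may differ slightly depending on whether exact or asymptotic methods are selected.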
3. Results
3.1. AlphaFold2 predictions are consistent with cryo-EM structures of wild-type PIEZO1
Before evaluating our selected pathogenicity prediction algorithms, we sought to confirm the accuracy of the AF2 structure (AF-Q92508-F1-v4) of PIEZO1 as a prerequisite to measuring the change in free energies in unresolved regions of the protein in point mutation variants. At the time of writing, there were no structures in the Protein Data Bank with a complete resolution of all PIEZO1 domains. Therefore, we evaluated the AF2 prediction against all partially resolved cryo-EM structures of PIEZO1, described in Section 2.1. When superimposing the AF2 structure with each individual subunit of the homotrimeric structures (n = 24), the mean root mean square deviation of atomic position (r.m.s.d.) was found to be 4.350 ± 0.212 Å, demonstrating similarity of the AF2 prediction with partially resolved structures. We utilized the cryo-EM structure (PDB: 5Z10) as a template to assemble a homotrimeric model of PIEZO1 using the AF2 prediction (Fig. 1A). We also found that the AF2 structure was nearly identical to a human PIEZO1 structure recently resolved by Shan et al. (2024), where the r.m.s.d. was 2.445 ± 0.007 Å (n = 3). The overall predicted local distance difference test (pLDDT) score for the structure was 73.44, and the mean score at the positions of the 75 variants was 79.91 ± 1.68, both indicating high confidence predictions of the 3D structure. The confidence at the variant positions exceeds that of the complete structure because the variants occur on higher confidence residues. However, there were 13 variants (17.57%) that were located on low confidence residues of the structure, displayed in Fig. 1B. Given the unique structure of PIEZO1 relative to other ion channels, we note that the MMseqs2 algorithm used to create the multiple-sequence alignment for the AF2 prediction identified evolutionary similarity across various species, including Rattus norvegicus, Mus musculus, Glycine max, Arabidopsis thaliana, and Zea mays, which strengthens the conservation data utilized for AM predictions.
Therefore, with consideration of the structure and sequence similarity along with the pLDDT scores, we began our computational analyses with the AF2 structure.
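The r.m.s.d. values reported above depend on first superimposing the two structures. The text does not state which superposition routine was used, so the following is only a minimal sketch of one common choice, the Kabsch algorithm, written with NumPy. The function name `kabsch_rmsd` is hypothetical, and the sketch assumes that the two coordinate arrays already contain matched atoms (for example, Cα atoms paired through a sequence alignment).

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """r.m.s.d. between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


# Toy check: a rotated and translated copy should superimpose almost exactly.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])
rotation = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = points @ rotation.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(points, moved))
```

Structure viewers such as PyMOL typically add steps that this sketch omits, such as sequence alignment and iterative outlier rejection, so their reported values can differ from a plain Kabsch fit.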
Fig. 1.
Overall pathogenicity for the phenotypes of PIEZO1 variants. (a) AlphaFold2 model of PIEZO1 homotrimer superimposed from a prior cryo-EM structure (PDB: 5Z). (b) The pLDDT scores for the 75 missense variants from More et al. [28]. Dotted red line indicates cutoff for high confidence prediction. (c) AM scoring for the 7 unique phenotypes of missense variants, where the dotted red and blue line indicates the cutoff for pathogenic and benign classification, respectively. (d) The negative log-likelihood ratio from ESM-1b according to phenotype, where the red and blue line indicates the cutoff for pathogenic and benign classification, respectively. (e) Distribution of EVE scores, with dotted red and blue lines indicating the cutoff for pathogenic and benign classification, respectively. (f) CADD scoring is distributed for the phenotypes with a blue line indicating the benign classification cutoff.
3.2. AlphaMissense identifies LoF, but fails with GoF variants in comparison to ESM-1b
Next, we evaluated pathogenicity scoring from AM, ESM-1b, CADD, and EVE for each of the unique phenotypes arising from the PIEZO1 variants, including bone marrow failure, colorectal adenomatous polyposis (CAP), hydrops fetalis, immunodeficiency, LD, HX, and pseudohyperkalemia. LD is caused by LoF variants and HX by GoF variants. AM scores for all available phenotypes are displayed in Fig. 1C. The mean score of LD variants was 0.7672 ± 0.1017 (n = 6) and that of HX variants was 0.5213 ± 0.0425 (n = 56). AM was able to decipher LoF variants with high scores despite a limited sample size for the LD phenotypes, which is consistent with prior studies evaluating AM [22–24]. Interestingly, the mean score of the GoF variants for HX was within the range of uncertain classification (AM: 0.34–0.56). Seven variants were classified as uncertain (L2277M, L2192I, K2502R, A2003T, R1797C, A2020T, and E2461K). There were 29 variants classified as pathogenic and 20 as benign. These findings suggest that there is no clear trend of pathogenicity for GoF variants in PIEZO1, in contrast to the high pathogenicity consistently predicted by AM for GoF variants in KCNQ4, a gene that encodes a potassium channel protein [24]. While there remains no clear biological trend reported for bone marrow failure and CAP implicated from PIEZO1 variants, both were close to the range of uncertain pathogenicity at 0.5863 ± 0.1337 (n = 6) and 0.2686 ± 0.1101 (n = 6), respectively. No clear trend can be established for hydrops fetalis, immunodeficiency, and pseudohyperkalemia, as only a single variant has been reported in the literature. The AM scores did not differ significantly from the pathogenic benchmark (0.56) for either LD (p = 0.0972) or HX (p = 0.3669). However, AM scores for both LD (p = 0.0085) and HX (p < 0.0001) were significantly above the benign benchmark (0.34).
The non-significance observed for pathogenic LD-causing PIEZO1 variant AM scores may be attributed to the low sample size of variants observed in patients.
The trends of predicted pathogenicity among the phenotypes from ESM-1b are shown in Fig. 1D. Generally, all phenotypes were determined to be pathogenic (-LLR > 8). The mean -LLR score for LD-causing variants was 12.29 ± 0.53 and that for HX was 9.65 ± 0.41 (p = 0.0268). Both LoF (p = 0.0005) and GoF variants (p = 0.0002) yielded scores significantly greater than the pathogenic threshold (-LLR = 8). The remaining phenotypes were not significantly different from this threshold (p > 0.05). These trends appear to align with the expected effects of LoF and GoF variants described in Section 1. ESM-1b differs from AM in that its LLR takes GoF variants into consideration [21]. CAP (p = 0.5884) and bone marrow failure (p = 0.2733) were also considered uncertain by ESM-1b, while hydrops fetalis, immunodeficiency, and pseudohyperkalemia were also classified above the pathogenic threshold. Overall, deviations among the LoF and GoF variants as predicted by ESM-1b in relation to the clinical ground truth were reduced compared to AM, and a difference in likelihood was observed. It appears that while both mutation types are pathogenic, LoF variants yield a greater log-likelihood ratio between the wild-type and mutant residues.
Similar trends were also observed with EVE, where LoF variants did not have significantly greater predicted pathogenicity values compared to GoF variants (p = 0.2990). Scores for CAP, bone marrow failure, LD, and HX phenotypes were within the range of ambiguity (Fig. 1E). CADD exhibited similar results; LoF variant scores were, on average, greater than scores for GoF variants, though not to a statistically significant degree (p = 0.6714). Similarly to our previous reporting with variants on amyloidogenic proteins [23], CADD scores appear to be biased towards pathogenic predictions (Fig. 1F). With the exception of ESM-1b, LoF variants tended to receive higher predicted pathogenicity than GoF variants across the evaluated pathogenicity prediction algorithms, although not to a statistically significant degree.
3.3. GoF PIEZO1 variants are correlated with predicted pathogenicity of EVE and CADD
Correlational analyses were then performed between pathogenicity scores obtained from the four computational approaches and computed changes in free energy upon point mutation using the balanced machine-learning model PremPS (Fig. 2A). There was a nonsignificant, weak negative correlation between the AM predicted pathogenicity scores and the resulting change in stability of PIEZO1 (p = 0.2457). Additionally, there was no clear trend between LoF variant scores and PremPS-derived free energies, or between scores for variants of unknown significance (VUS) and PremPS-derived free energies. This pattern was also observed for ESM-1b variant scores (Fig. 2B). However, the weak negative correlation was statistically significant for EVE (p = 0.0276) (Fig. 2C). Likewise, though the predicted slope was not significantly non-zero for CADD (Fig. 2D), Spearman's ρ indicated a significant negative correlation (p = 0.0073). From the individual ΔΔG values, HX GoF variants caused statistically significant destabilization of the protein structure at −0.5471 ± 0.0932 kcal/mol (p < 0.0001), while the effect of LD LoF variants was not significant at 0.0717 ± 0.4931 kcal/mol (p = 0.8901). The remaining VUS were found not to cause significant destabilization (Supplementary Material).
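The correlational analysis described above can be sketched in Python with SciPy. The study itself used GraphPad Prism, so the library choice, the hypothetical helper `correlate`, and the placeholder values are assumptions for illustration only and do not reproduce the reported statistics.

```python
from scipy import stats


def correlate(scores, ddg):
    """Pearson r, Spearman rho, and fitted slope, each with a two-sided p-value."""
    r, p_r = stats.pearsonr(scores, ddg)
    rho, p_rho = stats.spearmanr(scores, ddg)
    fit = stats.linregress(scores, ddg)  # tests whether the slope is non-zero
    return {
        "pearson_r": float(r), "pearson_p": float(p_r),
        "spearman_rho": float(rho), "spearman_p": float(p_rho),
        "slope": float(fit.slope), "slope_p": float(fit.pvalue),
    }


# Placeholder values, NOT data from this study.
example_scores = [0.12, 0.35, 0.48, 0.61, 0.77, 0.93]
example_ddg = [0.10, -0.20, -0.15, -0.60, -0.40, -0.90]

for name, value in correlate(example_scores, example_ddg).items():
    print(f"{name}: {value:.4f}")
```

Reporting both coefficients is useful because Pearson's r assumes a linear relationship, whereas Spearman's ρ only assumes a monotonic one, which is why the two can disagree on significance for the same predictor.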
Fig. 2.
Computed free energy change upon amino acid substitution. (a) Correlation of AM predicted pathogenicity against change in free energy upon mutation, including the legend for each phenotype. (b) Correlation of ESM-1b predicted pathogenicity against change in free energy upon mutation, including the legend for each phenotype. (c) Correlation of EVE predicted pathogenicity against change in free energy upon mutation, including the legend for each phenotype. (d) Correlation of CADD predicted pathogenicity against change in free energy upon mutation, including the legend for each phenotype.
3.4. Potential ensemble approach to pathogenicity prediction of PIEZO1 variants
As described in our previous reporting with amyloidogenic genes [23], ensemble methods show great promise for accurately predicting the functional effects of missense variants. Indeed, in internal validation studies by Cheng et al. [20], the REVEL ensemble algorithm achieved the second-highest accuracy and precision on the ClinVar database (shown in Fig. 2 of that publication), suggesting potential interest in ensemble techniques for deep learning-based approaches. Therefore, we sought to employ this method to optimize the predictive power for both GoF and LoF variants in the context of PIEZO1. Herein, we developed a preliminary weighted ensemble approach, validated using PIEZO1 variants classified as either pathogenic or benign in the ClinVar dataset (n = 118). The in-house Python script for this model has been provided in the Supplementary Material. Using the scores from each model without consideration of the thresholds for pathogenic or benign classification, we normalized the data and optimized the weights of the individual models to achieve the highest area under the receiver operating characteristic curve (ROC-AUC) using the sequential least squares optimization method. We arbitrarily labeled ensemble scores from 0.4 to 0.6 as ambiguous, and these were deemed incorrect predictions for ROC-AUC analyses. Surprisingly, the weighting that achieved the highest AUC was an equal weighting (0.25 each) of all four algorithm scores per variant. We found that the overall accuracy of the ensemble model was 81.4% with an AUC of 0.955, and that it generally outperformed all individual models except AM, which achieved the highest accuracy of 86.4% and an AUC of 0.959. Metrics for this preliminary ensemble model are displayed in Fig. 3A–C. Overall, these results indicate that AM retains the best performance among the evaluated approaches, including the ensemble model.
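The weighting procedure described above can be sketched as follows. This is not the in-house script from the Supplementary Material: the function names are hypothetical, the data are synthetic, min-max scaling is assumed as the normalization, and the AUC is computed here from ranks (the Mann-Whitney identity) instead of with sklearn to keep the sketch dependency-light. SciPy's SLSQP solver is used as one implementation of sequential least squares optimization.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata


def min_max(scores):
    """Scale each predictor column to the range [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    return (scores - lo) / (hi - lo)


def roc_auc(labels, scores):
    """ROC-AUC from the rank-sum identity; tied scores receive average ranks."""
    labels = np.asarray(labels)
    ranks = rankdata(scores)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_weights(X, y):
    """Maximize AUC over non-negative weights that sum to one, starting from equal weights."""
    n = X.shape[1]
    result = minimize(
        lambda w: -roc_auc(y, X @ w),
        np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x


# Synthetic placeholder data: 20 pathogenic (1) and 20 benign (0) variants, 4 predictors.
rng = np.random.default_rng(0)
y = np.array([1] * 20 + [0] * 20)
X = min_max(rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(40, 4)))

weights = fit_weights(X, y)
ensemble = X @ weights

# Scores from 0.4 to 0.6 are labeled ambiguous and counted as incorrect.
calls = np.where(ensemble > 0.6, "pathogenic",
                 np.where(ensemble < 0.4, "benign", "ambiguous"))
correct = ((calls == "pathogenic") & (y == 1)) | ((calls == "benign") & (y == 0))
print("weights:", np.round(weights, 3))
print("AUC:", round(float(roc_auc(y, ensemble)), 3), "accuracy:", correct.mean())
```

One design consideration is that AUC is rank-based and therefore piecewise constant in the weights, so a gradient-based solver such as SLSQP may remain close to its starting point. This sketch does not attempt to reproduce the weights or metrics reported above.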
Fig. 3.
Performance of preliminary ensemble model for PIEZO1 variants. (a) Confusion matrix of performance of the model, indicating high accuracy with significant false positives for benign variants. (b) Distribution of predicted scores. (c) ROC-AUC analyses results of individual models against the ensemble approach.
4. Discussion
Classifying the pathogenicity of missense variants is a crucial application in precision medicine, and remains a recurrent challenge due to the large diversity of possible biological effects. Over the past decade, the majority of computational approaches have focused on determining degrees of LoF, but limited literature exists on those for GoF variants. Following the breakthrough of AF2, many of these approaches have adopted structural data in consideration of these effects, particularly AM. Numerous studies have demonstrated that AM is correlated with in vitro functional assays of variants, but no studies have evaluated genes involving both GoF and LoF variants. Therefore, we sought to perform a case study of PIEZO1 variants that involved both mutation types. In this study, we examined the performance of several pathogenicity prediction models in the context of the PIEZO1 gene and LoF or GoF variants deriving from missense mutations. We found that AM, a deep learning classifier that combines AlphaFold structural context with evolutionary conservation, is adept at recognizing deleterious LoF variants but is less able to recognize GoF variants [20]. AM assigned higher pathogenicity scores to known LoF mutations from PIEZO1 mutation-associated LD, accurately indicating these variants as functionally damaging and reflecting a training emphasis on identifying variants that disrupt protein structure or conservation in LoF mutations. By contrast, AM was more likely to misclassify known GoF mutations causing HX as benign or of uncertain significance.
These results are consistent with the broader literature on PIEZO1 and variant pathogenicity. How mechanical force activates PIEZO1 remains only partially understood [45]. Diseases caused by PIEZO1 mutations illustrate two opposite mechanisms. GoF variants typically do not grossly destabilize the protein; instead, they alter channel gating kinetics or regulatory domains. For example, HX-associated PIEZO1 mutations have been shown to slow channel inactivation kinetics [12], resulting in prolonged cation influx without abolishment of function. Such changes are pathogenic in vivo but may not trigger recognition of conventional damage signatures that algorithms like AM recognize. In contrast, LoF mutations often disrupt conserved residues or protein folding, changes that are readily detected by conservation- and structure-based models. This mechanistic discrepancy in variant effects may explain the underperformance of AM on GoF predictions.
Interestingly, the protein language model ESM-1b demonstrated a greater balance in sensitivity to both LoF and GoF variants than AM. Unlike AM, ESM-1b is an unsupervised model trained on approximately 250 million protein sequences. It has previously been shown that such language models are capable of predicting mutational impacts in a zero-shot manner by identifying the degree to which a mutation appears deleterious or unusual against learned sequence regularities [46]. ESM-1b's predictions herein flagged not only the LoF variants but also several GoF variants as outliers, albeit only with moderate confidence. This suggests that certain gain-of-function mutations may induce sequence features, for instance at critical motifs such as gating motifs, that the language model recognizes as evolutionarily atypical despite not being overtly destabilizing.
However, ESM-1b was less sensitive than AM for some LoF variants, reflecting a possible trade-off. While previous work has focused primarily on distinguishing pathogenic from benign variants, herein, we directly assessed pathogenicity prediction model performance in identifying distinct pathogenic mechanisms. PIEZO1 was selected for evaluation as LoF and GoF mutations produce variants associated with well-characterized pathophysiology. Our findings underscore a potential limitation in current prediction algorithms in which pathogenicity is predominantly associated with structural disruption-based LoF.
Our study contains several limitations. The limited number of confirmed PIEZO1 LoF mutations restricts statistical robustness and generalizability. Furthermore, despite advancements in cryo-EM, unresolved structural regions may perturb AlphaFold-derived model predictions. Static structure-based methods may also fail to capture minor conformational changes induced by GoF mutations. Furthermore, pathogenicity prediction tools are constrained by a lack of mechanism specification or by heuristic score determination, altogether reducing model interpretability. Future approaches to improving GoF identification in pathogenicity prediction models may include training models on datasets enriched with activating mutations or utilizing multi-task learning frameworks for simultaneously classifying pathogenicity and functional mechanisms. Improved pathogenicity prediction will advance our understanding of variant pathogenicity and facilitate improved diagnostics and therapeutics development.
Data
The source data has been deposited in the Supplementary Material and GitHub repository: https://github.com/Joshua-Pillai/PIEZO1.
Funding information
This material was based upon work supported by a gift from Beckman Laser Institute Inc. to LS. Special thanks to the private donors to our University of California, San Diego (UCSD) Institute for Engineering in Medicine, Biophotonics Technology Center: Dr. Shu Chien from UCSD Bioengineering, Dr. Lizhu Chen from CorDx Inc., Dr. Xinhua Zheng, David & Leslie Lee for their generous donations.
CRediT authorship contribution statement
Joshua Pillai: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization. Adhvaith Sridhar: Conceptualization, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing. Kijung Sung: Conceptualization, Investigation. Linda Shi: Conceptualization, Funding acquisition, Supervision, Writing – review & editing. Chengbiao Wu: Conceptualization, Project administration, Resources, Software, Supervision, Writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We would like to thank Dr. Shang Ma, Ph.D. from the Children's Medical Center Research Institute at the University of Texas Southwestern Medical Center at Dallas for technical assistance with reviewing the presented study.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.bbrep.2026.102480.
Contributor Information
Linda Shi, Email: zshi@ucsd.edu.
Chengbiao Wu, Email: chw049@ucsd.edu.
Data availability
Please see the GitHub repository described in the main text.