PLOS ONE. 2024 Jan 25;19(1):e0297560. doi: 10.1371/journal.pone.0297560

Benchmarking AlphaMissense pathogenicity predictions against cystic fibrosis variants

Eli Fritz McDonald 1,2, Kathryn E Oliver 3,4, Jonathan P Schlebach 5, Jens Meiler 1,2,6,7,*, Lars Plate 1,8,9,*
Editor: Jeffrey L Brodsky 10
PMCID: PMC10810519  PMID: 38271453

Abstract

Variants in the cystic fibrosis transmembrane conductance regulator gene (CFTR) result in cystic fibrosis, a lethal autosomal recessive disorder. Missense variants that alter a single amino acid in the CFTR protein are among the most common cystic fibrosis variants, yet tools for accurately predicting the molecular consequences of missense variants have been limited to date. AlphaMissense (AM) is a new technology that predicts the pathogenicity of missense variants from learned protein structure and evolutionary features. Here, we evaluated the ability of AM to predict the pathogenicity of CFTR missense variants. AM predicted high pathogenicity for CFTR residues overall, resulting in a high false positive rate and fair classification performance on CF variants from the CFTR2.org database. The AM pathogenicity score correlated modestly with pathogenicity metrics from persons with CF, including sweat chloride level, pancreatic insufficiency rate, and Pseudomonas aeruginosa infection rate. Correlation was also modest with CFTR trafficking and folding competency in vitro. By contrast, the AM score correlated well with CFTR channel function in vitro, demonstrating that the dual structure and evolutionary training approach learns important functional information despite lacking such data during training. The differing performance across metrics indicates that AM may help determine whether polymorphisms in CFTR are recessive CF variants, yet it cannot differentiate mechanistic effects or the nature of the pathophysiology. Finally, AM predictions offered limited utility to inform on the pharmacological response of CF variants, i.e., theratype. Development of new approaches to differentiate the biochemical and pharmacological properties of CFTR variants is therefore still needed to refine the targeting of emerging precision CF therapeutics.

Introduction

Cystic fibrosis (CF) is a lethal genetic disease caused by variants in the epithelial anion channel cystic fibrosis transmembrane conductance regulator (CFTR) [1]. CFTR is composed of an N-terminal lasso motif, two nucleotide binding domains (NBDs), two transmembrane domains (TMDs), and an unstructured regulatory domain (RD) [2]. Loss of CFTR protein production or function results in osmotic dysregulation at the epithelium of the skin, pancreatic duct, and lungs, leading to high sweat chloride levels, pancreatic insufficiency, and lung infections, respectively [3]. Standard treatment paradigms for CF involve supplementation of salt, vitamins, and digestive enzymes, together with airway clearance therapies and small-molecule CFTR modulators known as potentiators and correctors. CFTR variants exhibit distinct structural defects and proteostasis states, leading to divergent pharmacological response profiles to modulators, also known as theratypes [4–7].

At present, elexacaftor-tezacaftor-ivacaftor (ETI) is the best available highly effective modulator therapy for CF. This triple combination is clinically approved for ~170 CFTR variants, including the most commonly reported allele, deletion of phenylalanine 508 (F508del) [8–11]. ETI is composed of one gating potentiator (ivacaftor, VX-770) and two protein maturation correctors, tezacaftor (VX-661) and elexacaftor (VX-445). The corrector compounds have been suggested to directly bind unique subdomains of CFTR: VX-661 to TMD1 [12, 13], and VX-445 to the N-terminal lasso and TMD2 [14, 15]. Correctors contribute intermolecular interactions that favor the properly folded, trafficking competent state of CFTR. Due to the distinct binding sites, VX-661 and VX-445 elicit different mechanisms of action and confer variable theratype responses. Thus, profiling CFTR variant theratypes to these and other emerging modulators remains an important priority for CF personalized medicine.

Increasing implementation of next-generation sequencing approaches for CFTR DNA analysis has accelerated the discovery of novel CFTR variants and thus heightened the need for more accurate pathogenicity prediction tools. This is particularly relevant to individuals with CFTR-related metabolic syndrome (CRMS), also known as CF Screen Positive, Inconclusive Diagnosis (CFSPID). Patients are diagnosed with this condition if they have a positive newborn screen for CF and meet either of the following criteria: (1) a normal sweat chloride value (<30 mEq/L) and two identified CFTR variants, at least one of which has unclear phenotypic consequences; or (2) an intermediate sweat chloride value (30–59 mEq/L) and detection of one or zero CF-causing variants [16]. Clinical symptoms worsen for approximately 11–48% of CRMS/CFSPID patients, who eventually convert to a CF diagnosis [17, 18]. Insufficient data exist to predict which CFTR variants (or other factors) increase the risk of progression to CF.
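The two CRMS/CFSPID criteria above amount to a small decision rule. The sketch below encodes them in Python purely for illustration; the function name and its inputs (pre-computed counts of identified, unclear, and CF-causing variants) are hypothetical, variant annotation is assumed to happen upstream, and the function is not a clinical tool.

```python
def meets_crms_cfspid_criteria(
    positive_newborn_screen: bool,
    sweat_chloride_meq_per_l: float,
    n_identified_variants: int,
    n_unclear_variants: int,
    n_cf_causing_variants: int,
) -> bool:
    """Illustrative (hypothetical) encoding of the CRMS/CFSPID criteria in the text."""
    if not positive_newborn_screen:
        return False
    # Criterion (1): normal sweat chloride (<30 mEq/L) and two identified CFTR
    # variants, at least one of which has unclear phenotypic consequences.
    criterion_1 = (
        sweat_chloride_meq_per_l < 30
        and n_identified_variants == 2
        and n_unclear_variants >= 1
    )
    # Criterion (2): intermediate sweat chloride (30-59 mEq/L, treated here as
    # 30 <= value < 60) and one or zero CF-causing variants.
    criterion_2 = (
        30 <= sweat_chloride_meq_per_l < 60
        and n_cf_causing_variants <= 1
    )
    return criterion_1 or criterion_2


# Positive screen, sweat chloride 25 mEq/L, two variants (one of unclear effect).
print(meets_crms_cfspid_criteria(True, 25, 2, 1, 1))  # True
```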

Furthermore, high-throughput methods for characterizing CFTR variant severity are limited. Only 804 of the reported 2,111 variants have been annotated for disease association according to in vitro or clinical data [19]. The majority of these CFTR variants are single amino acid substitutions or missense variants [20]. Recently, AlphaMissense (AM) was published as a technology designed to predict the pathogenicity of missense variants throughout the human proteome [21]. Among well-characterized genetic diseases, AM included CF pathogenicity predictions for every possible CFTR single amino acid substitution. AM provides a significant advance beyond previous attempts to model a limited number of CFTR variants [22, 23]. Here, we evaluated the predictive validity of AM across several metrics of CF data such as pathogenicity in people with CF, in vitro CFTR folding and function, and theratype.

The increasing pace of novel CFTR variant discovery has created a need for pathogenicity prediction, especially among wild-type (WT) heterozygous individuals, e.g. carriers, and CRMS/CFSPID individuals whose variants remain uncharacterized. Our analysis suggests AM predicts the relative pathogenicity of severe CF-causing variants well, while performing modestly for variants of unknown significance (VUS) or variants of varying clinical consequence (VVCC). Overall, AM showed a high false positive rate for predicting CFTR2 patient outcomes [19]. Among VUSs, two variants from CFTR2 and 368 variants from ClinVar [24] were predicted as pathogenic. By contrast, the S912L VUS from CFTR2 was predicted benign despite clinical outcomes indicating ~half the people with this variant display hallmarks of CF disease. AM scores correlated modestly with CF pathogenicity metrics and CFTR trafficking/folding competency in vitro. Correlation improved when compared to CFTR channel functional data. These analyses imply AM has learned important trends in variant function despite not training on such data. Finally, we provide evidence that AM offers little power in predicting CFTR variant theratype, although we note this measure is beyond its intended design. Thus, AM may offer capabilities in predicting the pathogenicity of emerging variants but proved less useful for theratyping variants.

Results and discussion

I. AlphaMissense predictions of CFTR pathogenicity

AM's pathogenicity classes are defined by score thresholds calibrated to 90% accuracy against ClinVar data [21]. For CFTR, AM classified scores from 0.56–1.00 as pathogenic, scores from 0.34–0.56 as ambiguous, and scores from 0.04–0.34 as benign. We mapped the average AM prediction score per residue onto a CFTR structure (PDBID 5UAK) [25] (Fig 1A). TMDs showed a propensity for pathogenicity, in contrast to residue conservation as calculated by ConSurf [26], which suggested the TMDs are comparatively variable across species (Fig 1A & S1A Fig). Since the regulatory domain (RD) is disordered and not resolved in the CFTR structure 5UAK, we also plotted the average AM score for RD residues against an RD map highlighting key features such as transiently formed α-helices and phosphorylation sites [4] (Fig 1B). Despite noted difficulty with disordered regions [21], AM predicted RD residues ~760–775 as a hotspot for pathogenicity. This is consistent with the role of transient helix 752–778 in CFTR gating through interactions with a conserved region of the NBD2 C-terminus [27, 28].
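A minimal sketch of the three-class mapping and the per-residue averaging described above, in Python. The function names, the single-letter variant naming (e.g. "G551D"), and the handling of the shared endpoints 0.34 and 0.56 (assigned here to the higher class) are assumptions for illustration, not details taken from AM.

```python
from collections import defaultdict
from statistics import mean

# CFTR class boundaries as reported in the text. Assigning the shared
# endpoints (0.34, 0.56) to the higher class is an assumption.
AMBIGUOUS_MIN = 0.34
PATHOGENIC_MIN = 0.56


def am_class(score: float) -> str:
    """Map an AlphaMissense score to benign / ambiguous / pathogenic."""
    if score >= PATHOGENIC_MIN:
        return "pathogenic"
    if score >= AMBIGUOUS_MIN:
        return "ambiguous"
    return "benign"


def mean_score_per_residue(scores: dict[str, float]) -> dict[int, float]:
    """Average AM scores over all substitutions at each residue position.

    Keys are assumed to be single-letter names such as "G551D"
    (wild-type residue, position, substituted residue).
    """
    by_residue: dict[int, list[float]] = defaultdict(list)
    for variant, score in scores.items():
        position = int(variant[1:-1])  # drop the two residue letters
        by_residue[position].append(score)
    return {pos: mean(vals) for pos, vals in sorted(by_residue.items())}


# Scores for three of the VUS listed in Table 1 (D924N, V201M, S912L).
print(am_class(0.83), am_class(0.36), am_class(0.12))  # pathogenic ambiguous benign
```

Averaging per residue, as in Fig 1A, collapses the 19 possible substitutions at each position into one value that can be mapped onto a structure.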

Fig 1. AlphaMissense predictions of CFTR pathogenicity compared to CFTR2.org repository.


A. The average AM score per residue mapped onto the CFTR structure (PDBID 5uak) [4]. Variants with a score from 0.56–1.00 were classified by AM as pathogenic, variants with a score from 0.34–0.56 were classified as ambiguous, and variants with a score from 0.04–0.34 were classified as benign. B. The average AM score for the regulatory domain (RD) with an RD map of important features for reference [4]. Transient helices, two negatively charged regions (Neg1/2), and important phosphorylation sites (P) are shown. Notably, the second half of the transient helix 748–778 is predicted to be a hotspot of RD variant pathogenicity. C. Receiver operating characteristic (ROC) curve of AM predictions benchmarked against 169 CFTR2.org database classifications. Our curated data set contained 110 CF-causing, 41 variable clinical consequence (VVCC), and 18 non-CF-causing missense variants. In the pathogenic curve (violet), a pathogenic prediction of a CF-causing variant was considered a true positive. Likewise, in the ambiguous curve (grey), an ambiguous prediction of a VVCC was considered a true positive. Finally, in the benign curve (bluegreen), a benign prediction of a non-CF-causing variant was considered a true positive. D. Heatmap of AM scores for NBD1 residues 485–565, including all regions interfacing with ICL4 and adjacent regions. Pathogenicity colored as in A, with WT residues depicted in black. E. Heatmap of AM scores for ICL4 residues 1048–1084, colored as in D.

We sought to evaluate AM’s ability to correctly predict CF pathogenesis on 169 classified missense variants from CFTR2.org [19]. The CFTR2 database offered a rich patient metric repository including pathogenicity classifications as CF-causing, VVCC, non-CF-causing, or VUS (S1 Table). VVCC are defined by CFTR2 as variants that may cause CF when found heterozygous with CF-causing variants, which results in variable clinical diagnosis of CF, e.g., a person with a VVCC and a CF-causing variant may or may not present with CF [19].

AM showed 95% accuracy (104/110) for predicting pathogenic variants and 78% accuracy (14/18) for predicting benign variants, based on variant determinations in CFTR2 (S1 Table). We calculated receiver operating characteristic (ROC) curves for each class of AM prediction (Fig 1C, see Methods). Briefly, pathogenic, ambiguous, and benign predictions were each taken in turn as the positive class, and the other two classes were treated as the negative class for that comparison. We considered pathogenic to predict CF-causing, ambiguous to predict VVCC, and benign to predict non-CF-causing; VUS were not used. Sweeping through all possible score thresholds, we calculated and plotted the corresponding true positive and false positive rates. Benign predictions showed the highest area under the curve (AUC) (0.91), followed by pathogenic (0.80) and ambiguous (0.66), suggesting that AM has a high false positive rate, particularly for ambiguous predictions (Fig 1C). A high false positive rate could, in principle, stem from a poor AlphaFold2 (AF2) predicted structure of CFTR. However, the AF2-predicted CFTR structure [29] shows a root mean squared deviation of just 2.5 Å from the active state cryo-EM model (PDB ID 6MSM, resolution 3.2 Å [30]) (S1B Fig).
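The one-vs-rest ROC procedure can be sketched as follows, assuming each variant has one score and a binary label (1 for the positive class of the curve, 0 for the other two classes). Which score ranks variants for each curve is an assumption here: for the pathogenic curve the AM score itself is natural, and for the benign curve one would rank by 1 − AM score; the exact choice for the ambiguous curve is given in the paper's Methods and is not reproduced. The function names are hypothetical.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) pairs from a threshold sweep.

    labels: 1 for the curve's positive class, 0 for the other two classes
    (one-vs-rest). A variant is called positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Every distinct observed score serves as one threshold, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: one negative outranks one positive, so AUC = 3/4.
pts = roc_points([0.9, 0.6, 0.7, 0.2], [1, 1, 0, 0])
print(auc(pts))  # 0.75
```

For a benign curve under the assumption above, the call would be `roc_points([1 - s for s in scores], labels)` with non-CF-causing variants labeled 1.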

We noted seven VUS in the CFTR2.org database and their respective AM predictions (Table 1). The locations of these variants in the CFTR structure are shown in S1C Fig. R31L (predicted benign) disrupts the arginine-framed tripeptide motif at R29–R31, which is important for folding evaluation prior to ER export [31], and may affect endocytosis rates [32]. V201M was predicted ambiguous, consistent with our previous report describing this variant as mildly mis-trafficked and selectively sensitive to VX-661 [23]. A439V (benign prediction) and Y1014C (ambiguous prediction) showed trafficking and function slightly below WT [33], suggesting these variants are benign. S912L (predicted benign) lies close to the CFTR glycosylation sites at N894 and N900, so we speculated that this substitution could interfere with glycan processing. Nevertheless, S912L trafficking and function remained sufficient compared to WT in vitro [34].

Table 1. CFTR2.org Variants of Unknown Significance (VUS).

Seven missense variants of unknown significance (VUS) from the CFTR2.org database with their respective AM scores and predicted pathogenicity. Variants D924N and M952T are predicted to be pathogenic.

Variant   AlphaMissense score   Predicted pathogenicity
R31L      0.17                  benign
V201M     0.36                  ambiguous
A349V     0.26                  benign
S912L     0.12                  benign
D924N     0.83                  pathogenic
M952T     0.85                  pathogenic
Y1014C    0.37                  ambiguous

Variants D924N and M952T, both located in transmembrane helix 8, are predicted as pathogenic (S1C Fig). D924N resides in the potentiator binding hotspot [35, 36] and, according to clinical data, may cause pancreatic insufficiency but not lung disease [37]. M952T displays robust functional expression in vitro [38], and two patients with an M952T/F508del genotype exhibit normal chloride transport measured from intestinal mucosa [38], suggesting this variant is likely not pathogenic despite the AM prediction.

For performance comparison, we also plotted ROC curves for AM predictions of the 115 ClinVar variants from the AlphaMissense study and observed 96% average accuracy, as presented previously [21] (S1D Fig). To validate this finding, we additionally downloaded a dataset of 209 variants directly from ClinVar, including 96 that overlap with the AlphaMissense benchmark set. The ROC curve for our expanded ClinVar dataset showed >90% prediction accuracy with the additional variants (S1E Fig). Finally, we plotted a ROC curve for the 113 ClinVar variants not included in the AM benchmark set, which revealed >90% accuracy and indicates that AM performs well on ClinVar data outside of its benchmark set (S1F and S1G Fig). In addition to the classified variants used for performance evaluation, ClinVar contains 1,277 CFTR VUS [24]. Among these, AM predicted 728 as benign, 181 as ambiguous, and 368 as pathogenic (S1H Fig, S2 Table).
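The held-out evaluation above relies on splitting the downloaded ClinVar variants by membership in the AM benchmark set. A minimal sketch with Python sets follows; the helper name and the toy identifiers are hypothetical. The reported counts are internally consistent: 209 − 96 = 113 held-out variants, and 728 + 181 + 368 = 1,277 VUS.

```python
def split_by_benchmark(
    clinvar: set[str], am_benchmark: set[str]
) -> tuple[set[str], set[str]]:
    """Return (variants also in the AM benchmark, held-out variants)."""
    overlapping = clinvar & am_benchmark  # scored in the original AM benchmark
    held_out = clinvar - am_benchmark     # only these test out-of-benchmark performance
    return overlapping, held_out


# Toy identifiers only; real inputs would be the downloaded ClinVar variant names.
overlap, held_out = split_by_benchmark({"G551D", "R117H", "A455E"}, {"G551D", "N1303K"})
print(sorted(overlap), sorted(held_out))  # ['G551D'] ['A455E', 'R117H']
```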

AM performance was also compared to two other pathogenicity prediction tools, Evolutionary Scale Modeling (ESM) [39] and Evolutionary model of Variant Effect (EVE) [40] (S2A and S2B Fig). In the ROC AUCs of benign variants, the AM value (0.91) was higher than those obtained for ESM (0.78) or EVE (0.78). A similar observation was made for pathogenic ROC AUCs, with AM (0.80) slightly above ESM (0.76) or EVE (0.73). ROC AUCs for ambiguous variants were nearly uniform across all methods (AM, 0.66; ESM, 0.65; EVE, 0.64). AM therefore offers a slight advantage for predicting pathogenic or benign variants and less utility regarding ambiguous variants.

Previous analysis of CFTR variants across sampled genetic information indicates that the NBD1-intracellular loop 4 (ICL4) interface is a hotspot of pathogenicity [41]. Thus, we generated a heatmap of AM scores for NBD1 residues 485–565, which encompass the α-helical subdomain, the structurally diverse region (SDR), and the entire NBD1-ICL4 boundary (Fig 1D). For the Q-loop (residues 486–495) and helix 3 (residues 496–512), potential substitutions are largely predicted as pathogenic, except at residues 494 and 511. Of note, AM predicts position 508 as intolerant to substitution. Deletion of the encoded phenylalanine (F508del) is the most frequently reported variant among CF populations worldwide [8, 42–44]. Most variations calculated as benign or ambiguous occur within helix 4/4b of the α-helical subdomain (residues 511–532) or the SDR (residues 533–547) (Fig 1D). Positions across helix 4/4b at which substitutions are predicted as pathogenic include V520 and C524. V520F [7] and C524X [45] are not presently approved for treatment with CFTR correctors and potentiators. Within the SDR, more substitutions are predicted as benign (40%) than pathogenic (35%), as expected based on the lack of structure in this region.

In contrast, variations at the NBD1-ICL4 interface are overwhelmingly scored as severe. Residues 548–565 comprise NBD1 core helix 5, which directly interacts with ICL4 and is the most sensitive to mutation (7% benign, 82% pathogenic) (Fig 1D). This region contains numerous CF-causing variants, some of which are refractory to available CFTR modulators, such as R560T/K/S [4, 41]. Within the ICL4 region (residues 1048–1084), AM scores indicate 14% benign and 69% pathogenic predictions (Fig 1E). The heatmap reveals residues 1069, 1072, 1076, and 1084 as relatively tolerant to substitution. Together, these data suggest that AM pathogenicity scores match previous findings, as well as our general understanding of residue conservation throughout CFTR, while providing specific information about every possible substitution.
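The per-region percentages quoted above (e.g., 7% benign and 82% pathogenic for helix 5) are fractions of all possible substitutions within a residue range. A sketch of that tally follows, assuming scores are stored in a dictionary keyed by (position, substituted residue); the data layout, helper names, and endpoint handling are illustrative assumptions.

```python
from collections import Counter


def am_label(score: float) -> str:
    # Same CFTR thresholds as in the text; endpoint handling is an assumption.
    if score >= 0.56:
        return "pathogenic"
    if score >= 0.34:
        return "ambiguous"
    return "benign"


def class_fractions(
    scores: dict[tuple[int, str], float], start: int, end: int
) -> dict[str, float]:
    """Fraction of substitutions per AM class for residues start..end (inclusive).

    scores maps (position, substituted residue) -> AM score (hypothetical layout).
    """
    labels = [am_label(s) for (pos, _aa), s in scores.items() if start <= pos <= end]
    if not labels:
        raise ValueError(f"no scored substitutions in residues {start}-{end}")
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in ("benign", "ambiguous", "pathogenic")}


# Toy scores; with full data, class_fractions(all_scores, 548, 565) would
# summarize helix 5 and class_fractions(all_scores, 1048, 1084) ICL4.
toy = {(548, "A"): 0.91, (548, "P"): 0.97, (549, "L"): 0.21, (1069, "A"): 0.10}
print(class_fractions(toy, 548, 565))
```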

II. Cystic Fibrosis pathogenicity correlated modestly with AlphaMissense predictions

In addition to classifying variant pathogenicity, the CFTR2.org database annotates clinical outcomes for persons with CF, including sweat chloride levels, pancreatic insufficiency rates, Pseudomonas aeruginosa infection rates, and lung function [19]. We curated the clinical outcomes for all CFTR missense variants with available data (S1 Table, see Methods) and then analyzed the ability of AM to predict patient pathogenicity metrics. Briefly, CFTR2.org data were downloaded from the Variant List History tab and filtered for 176 missense variants (169 classified and 7 VUS). Clinical outcome data were then manually assembled by searching each variant and recording sweat chloride (mEq/L), pancreatic insufficiency rate (%), P. aeruginosa infection rate (%), and lung function (forced expiratory volume in one second (FEV1), % predicted). Of note, CFTR2 data are reported per allele, i.e., per individual variant.
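Filtering a variant list down to missense variants can be sketched with a regular expression over legacy single-letter protein names such as "G551D". This assumes the downloaded names follow that pattern; nonsense variants (written with "X"), in-frame deletions such as F508del, and splice or intronic names are excluded. The pattern and function names are hypothetical, and a real curation pass would still need the manual review described above.

```python
import re

# Legacy single-letter protein names (e.g. "G551D"). "X" (stop) is not in the
# residue set, so nonsense variants such as "G542X" are excluded.
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"
MISSENSE_NAME = re.compile(rf"^([{RESIDUES}])(\d+)([{RESIDUES}])$")


def is_missense(name: str) -> bool:
    """True for a single-residue substitution written like 'G551D'."""
    match = MISSENSE_NAME.match(name.strip())
    # A "substitution" to the same residue would be synonymous, not missense.
    return match is not None and match.group(1) != match.group(3)


def filter_missense(names: list[str]) -> list[str]:
    return [n for n in names if is_missense(n)]


print(filter_missense(["G551D", "F508del", "G542X", "R117H", "3849+10kbC->T"]))
# ['G551D', 'R117H']
```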

First, we plotted AM score versus CF sweat chloride levels for 123 missense variants with sweat chloride values reported (Fig 2A). AM score correlated modestly with sweat chloride levels (Pearson Correlation Coefficient: 0.46, Spearman Correlation Coefficient: 0.48). CF-causing variants, shown in blue, clustered in the top right corner, indicative of high AM scores and elevated sweat chloride levels. By contrast, VVCCs, shown in yellow, clustered in the bottom right corner, reflecting AM scores that are high relative to their observed sweat chloride levels (Fig 2A). When considering CF-causing variants or VVCCs separately, we note a reduced correlation between sweat chloride levels and AM scores (S3A and S3B Fig), suggesting AM captures the trend across all variant types rather than performing better on pathogenic variants.
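For reference, the two coefficients reported throughout this section can be computed as follows. This is a dependency-free Python sketch; the authors' actual tooling is not stated in this section, and the function names are illustrative. Spearman's ρ is computed as the Pearson correlation of the ranks, with tied values assigned their average rank.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation rho (Pearson r of the ranks)."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy AM scores versus sweat chloride (mEq/L).
print(round(spearman([0.2, 0.5, 0.9], [40, 35, 100]), 2))  # 0.5
```

Reporting both coefficients, as the text does, separates linear agreement (Pearson) from agreement in ordering alone (Spearman).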

Fig 2. Benchmarking AlphaMissense against cystic fibrosis patient pathogenicity metrics.


A. AM score plotted against sweat chloride levels in milliequivalents per liter (mEq/L) for 123 missense variants. Healthy sweat chloride levels were <30 mEq/L. CF-causing variants are shown in blue, variants of variable clinical consequence (VVCC) in yellow, and variants of unknown significance (VUS) in red (as annotated in CFTR2). A modest linear correlation (Pearson Coefficient r = 0.46, Spearman Coefficient ρ = 0.48) was observed. B. AM score plotted against pancreatic insufficiency rates in percent for 116 missense variants. Less correlation (Pearson Coefficient r = 0.31, Spearman Coefficient ρ = 0.41) was observed than with sweat chloride; notably, many VVCCs received pathogenic AM scores but show a low pancreatic insufficiency rate. Colors annotated as in A. C. AM score plotted against P. aeruginosa infection rates in percent for 114 missense variants. Colors annotated as in A. Linear correlation (Pearson Coefficient r = 0.38, Spearman Coefficient ρ = 0.43) is shown. Interestingly, S912L, a variant of unknown significance, is predicted to be benign but shows high sweat chloride, pancreatic insufficiency, and P. aeruginosa infection rates compared to variants with similar scores.

Next, we plotted AM score versus pancreatic insufficiency rates for 116 missense variants present on at least one allele of persons with CF with CFTR2 outcomes reported (Fig 2B). AM scores correlated more weakly with pancreatic insufficiency rates (Pearson coefficient: 0.31, Spearman Coefficient: 0.41) than with sweat chloride. Again, AM failed to predict VVCCs, shown in yellow, on this metric (Fig 2B). Considering CF-causing variants and VVCCs separately did not change the correlation for pancreatic insufficiency (S3C and S3D Fig). Finally, we plotted AM score versus P. aeruginosa infection rates for 114 missense variants on at least one allele with CFTR2 outcomes reported (Fig 2C). AM correlated better here than with pancreatic insufficiency rates, but worse than with sweat chloride (Pearson Coefficient: 0.38, Spearman Coefficient: 0.44). AM also performed better on VVCCs for this metric, yet correlation was again reduced when only CF-causing variants or VVCCs were considered separately (S3E and S3F Fig).

Taken together, AM correlated modestly with clinical data and performed poorly on VVCCs and VUSs. For example, VUS S912L was predicted benign with an AM score of 0.12. However, this variant was associated with sweat chloride levels of 60 mEq/L (Fig 2A), which resides exactly at the diagnostic cutoff for CF. S912L displays a pancreatic insufficiency rate of 57% (Fig 2B) and P. aeruginosa infection rate of 50% (Fig 2C)–suggesting this variant may present with more pathologic characteristics than predicted or annotated in CFTR2. Unfortunately, pathogenic-predicted variants such as D924N and M952T have insufficient data available on CFTR2 for comparison. Weak performance by AM could be attributable to high false positive rates and/or compound heterozygous genotypes. The latter factor likely complicates interpretation of clinical data, as people with complex CF alleles may exhibit differing degrees of variant severity on each chromosome (e.g. one CF-causing paired with a VUS/VVCC) compared to patients with the same variant severity on each allele (e.g. two CF-causing).

III. AlphaMissense predicts CFTR function beyond folding and trafficking competency

Extensive CFTR biochemical and functional data were also available for comparison, including recent deep mutational scanning (DMS), theratype screening, and spatial covariance studies [23, 33, 46] (S3 and S4 Tables). In the DMS study, fluorescence-activated cell sorting was used to measure the cell surface immunostaining intensity of an epitope-tagged library of 129 CFTR variants including 100 missense variants [23]. In the theratype study, 655 variants including 585 missense variants were screened for their trafficking efficiency and function [33]. In the spatial covariance study, a CFTR trafficking index and a chloride conductance index were established to characterize variant temperature response [46]. Variable, albeit high, overlap was observed between the CFTR2 dataset and the in vitro data sets discussed below (S2C Fig).

First, we evaluated AM's ability to predict CFTR folding competency, which is well characterized to correlate with cell surface expression and trafficking efficiency [47–50]. We plotted AM prediction scores for 100 missense variants versus DMS cell surface immunostaining intensity (Fig 3A, S4 Table), which showed an inverse relationship with poor correlation (Pearson coefficient: -0.37, Spearman Coefficient: -0.37). Notably, among variants in the top right corner, i.e., high AM score and high surface staining, we observed several gating variants (e.g., G551D/S, R347H, S1251N, and G1244E) (Fig 3A). Mis-gating variants traffic normally but are CF-causing because they disrupt channel opening and closing. This result demonstrates that AM does not infer the nature of the variant defect.

Fig 3. Benchmarking AlphaMissense against CFTR in vitro functional metrics.


A. AM score plotted against deep mutational scanning data for 100 missense variants in HEK293T cells [23]. The y-axis represents the cell surface immunostaining intensity of CFTR and is thus indicative of CFTR trafficking to the cell surface. A slight inverse linear correlation was observed (Pearson Coefficient r = -0.37, Spearman Coefficient ρ = -0.37). Some off-axis variants, such as G551D, are gating variants and thus traffic normally in the HEK293T cell background but still exhibit impaired channel function. CF-causing variants are shown in blue, variants of variable clinical consequence (VVCC) in yellow, and variants of unknown significance (VUS) in red (as annotated in CFTR2). B. AM score plotted against CFTR western blot C:B band ratio in percent WT for 538 missense variants [33]. Several off-axis variants are highlighted. Color annotation as in A. The inverse linear correlation (Pearson Coefficient r = -0.50, Spearman Coefficient ρ = -0.53) improves for this larger dataset. Of note, VUS D924N and S912L were filtered out of this dataset due to an SEM >30, indicative of poor reproducibility across replicates (S4A and S4B Fig). C. AM score plotted against forskolin-induced CFTR current in percent WT for 546 missense variants [33]. Color annotation as in A. The inverse linear correlation (Pearson Coefficient r = -0.70, Spearman Coefficient ρ = -0.69) is stronger for function than for trafficking alone. All seven CFTR2 VUSs are highlighted.

Next, we plotted AM prediction scores for 538 missense variants versus CFTR trafficking efficiency as measured by the ratio of mature, fully glycosylated CFTR (band C) to the immature glycoform (band B) on western blot (C:B band ratio) (Fig 3B, S3 Table) [33]. Experimental data were filtered for plotting clarity (S4 Fig, see Methods): we removed highly variable measurements with a standard error of the mean (SEM) greater than 30. Because most CFTR variants show a C:B ratio below 30% of WT, an SEM above 30 indicates variability as large as the typical signal itself, i.e., poor reproducibility (8% of data points removed, 92% retained). AM scores displayed improved inverse correlation with the larger trafficking efficiency dataset (Pearson coefficient: -0.50, Spearman Coefficient: -0.53). This finding suggests AM can predict CFTR folding competency across diverse types of variants. Several off-axis variants that show poor predictions and poor trafficking (<30% of WT), based on the distribution of all trafficking data, were annotated (S4A and S4B Fig).
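The replicate-variability filter can be sketched as follows, assuming replicate measurements are available per variant (the source dataset may instead report SEM directly). SEM is taken as the sample standard deviation divided by √n; the data layout and function names are hypothetical.

```python
from math import sqrt
from statistics import stdev


def sem(replicates: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(replicates) / sqrt(len(replicates))


def filter_by_sem(
    measurements: dict[str, list[float]], max_sem: float
) -> tuple[dict[str, list[float]], float]:
    """Keep variants whose replicate SEM is <= max_sem.

    Variants with fewer than two replicates are dropped because their SEM is
    undefined. Returns the retained variants and the fraction retained.
    """
    kept = {
        variant: reps
        for variant, reps in measurements.items()
        if len(reps) >= 2 and sem(reps) <= max_sem
    }
    return kept, len(kept) / len(measurements)


# Toy C:B ratios (% WT): the second variant is far too variable to keep.
kept, fraction = filter_by_sem({"A": [10, 12, 11], "B": [0, 120, 60]}, max_sem=30)
print(sorted(kept), fraction)  # ['A'] 0.5
```

The same helper with `max_sem=20` would mirror the threshold applied to the FSK %WT functional data in the next paragraph.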

Finally, we evaluated AM's ability to predict CFTR function as measured by transepithelial current clamp conductance [33]. We plotted AM prediction scores versus forskolin (FSK)-induced basal CFTR channel activity as percent WT (FSK %WT) for 546 missense variants (Fig 3C). Again, highly variable experimental data were filtered out, here using an SEM threshold of 20 because most variants showed activity below 20% of WT (S4 Fig, see Methods), leaving 93% of the experimental data for comparison to AM. AM scores inversely correlated best with CFTR function measured by conductance (Pearson coefficient: -0.70, Spearman Coefficient: -0.69). Several off-axis variants that show poor predictions and poor channel function (<30% of WT), based on the distribution of functional data, were noted (S4C and S4D Fig).

We verified the increased capability to predict CFTR function by correlating AM scores with a spatial covariance study (S5 Fig). This study describes trafficking (measured by western blot band shift assay) and chloride conductance indices and presents data for both metrics at 37 ºC and reduced temperature (27 ºC) [46]. Reduced temperature is a well-established method for partially rescuing F508del biogenesis [51]. We observed a modest correlation (Pearson coefficient: -0.46, Spearman Coefficient: -0.44) with trafficking index at 37 ºC, and a similar correlation at 27 ºC (Pearson coefficient: -0.48, Spearman Coefficient: -0.49) (S5A and S5B Fig). Again, correlation increased when compared to chloride conductance index (Pearson coefficient: -0.58, Spearman Coefficient: -0.54 at 37 ºC vs. Pearson coefficient: -0.50, Spearman Coefficient: -0.53 at 27 ºC) (S5C and S5D Fig). Together these results indicated that AM scores are closely aligned with pathogenicity but cannot differentiate between variants that compromise expression versus function.

IV. AlphaMissense cannot predict CFTR variant theratype

Given the rapid and continuous emergence of novel CFTR variants detected by next-generation sequencing technologies, as well as a robust pipeline of new modulators and other CFTR-directed treatments under development, the need remains for optimized approaches to CF precision therapeutics. CFTR variant theratyping is an established method for quantifying in vitro CFTR sensitivity to pharmacologic agents, results of which are utilized to predict treatment responses for genotype-matched patients [6, 52]. CF treatment involves two corrector compounds, VX-661 and VX-445, that likely bind directly to two unique sites on CFTR [14], show distinct mechanisms, and hence distinct response profiles across variants. Thus, theratyping variant response remains an important task for CF personalized medicine.

We sought to determine whether AM offered any predictive power for CFTR theratyping, although this task is beyond the intended scope of AM. Theratype-distinguishing plots were generated and colored by AM pathogenicity score to assess potential patterns. We separated VX-445-sensitive variants from VX-661-sensitive variants along a diagonal line of best fit by plotting CFTR immunostaining intensity for VX-445 versus VX-661 (Fig 4A). Variants responsive to VX-445 fell above the dotted line, and variants responsive to VX-661 fell below it [23]. Variants were then colored by AM pathogenicity score, but the color distribution across the response spectrum revealed little discernible pattern (Fig 4A). We also plotted basal CFTR immunostaining intensity versus VX-661, VX-445, or their combination, and shaded variants by AM pathogenicity (S6A–S6C Fig). Similarly, AM scores showed little-to-no color pattern and appeared randomly distributed.
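Splitting variants by their side of a best-fit diagonal can be sketched as follows, with VX-445 response on the y-axis and VX-661 response on the x-axis as described above. An ordinary least-squares fit is assumed here; the section does not specify how the line was fitted, and the function names are hypothetical.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def corrector_selectivity(vx661: list[float], vx445: list[float]) -> list[str]:
    """Label each variant by its side of the best-fit line (VX-445 on the y-axis)."""
    slope, intercept = fit_line(vx661, vx445)
    labels = []
    for x, y in zip(vx661, vx445):
        residual = y - (slope * x + intercept)
        if residual > 0:
            labels.append("VX-445")  # above the line: relatively VX-445 responsive
        elif residual < 0:
            labels.append("VX-661")  # below the line: relatively VX-661 responsive
        else:
            labels.append("equivalent")
    return labels


# Toy immunostaining intensities for four variants.
print(corrector_selectivity([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))
# ['VX-445', 'VX-661', 'VX-445', 'VX-661']
```

Coloring each labeled point by its AM score, as in Fig 4A, is then a plotting step; the text reports that no clear pattern emerged.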

Fig 4. CFTR theratype plots colored by AlphaMissense pathogenicity score.


A. CFTR cell surface immunostaining intensity comparing treatment with VX-661 versus VX-445 correctors, with the dotted line representing equivalent response to both correctors [23]. Variants below the best-fit dotted line are selectively responsive to VX-661, while variants above it are selectively responsive to VX-445. The AlphaMissense pathogenicity predictions (color gradient) show no correlation with the corrector response patterns for CFTR variants. Variants with a score from 0.56–1.00 were classified by AM as pathogenic (violet), variants with a score from 0.34–0.56 were classified as ambiguous (grey), and variants with a score from 0.04–0.34 were classified as benign (green). Error bars represent the standard deviation of the cell surface immunostaining intensity. Selectively sensitive variants are annotated. B. Correlation of basal CFTR activity versus CFTR activity when treated with the dual corrector combination (VX-445, VX-661) from [33]. FSK-induced CFTR channel activity is expressed as % WT. Error bars were excluded for clarity, and AM predicted pathogenicity is colored as in A. A line of benign variants emerged, suggesting that AM benign-predicted variants display a similar response to correctors that depends on their basal activity. Pathogenic variants, by contrast, showed little pattern or ability to predict theratype. Off-axis variants are annotated. C. The degree of functional CFTR correction was calculated by subtracting basal FSK (% of WT) levels from VX-445+VX-661 FSK (% of WT) levels, then plotted against AM score and colored by score. CFTR variants across the continuum of AM scores displayed variable functional responses to the corrector combination. AM predicted pathogenicity colored as in A.

Next, we used the CFTR functional data from the theratyping study [33] to plot the VX-445 + VX-661 FSK-mediated response (% of WT) against basal activity and colored each variant by AM pathogenicity score (Fig 4B). Benign-predicted variants fell along a linear diagonal, suggesting that they all respond linearly to CFTR correctors. We speculate that this linear response may reflect the well-documented response of WT CFTR to modulators, implying an inherent stabilizing effect of VX-445 and VX-661. The C:B band ratio response colored by AM score showed a random distribution of score colors (S7A Fig). Pathogenic-predicted variants in both plots were randomly distributed. To determine whether theratype could be predicted from the structural location of a variant within CFTR combined with its AM score, we subdivided the plot in Fig 4B by domain (S7B–S7E Fig). Each domain individually showed a similarly random distribution of score colors. Finally, we calculated the relative degree of CFTR correction by subtracting basal FSK (% of WT) from VX-445 + VX-661 FSK (% of WT) and plotted this difference against AM score (Fig 4C). Again, no obvious pattern was observed. In summary, AM score afforded little predictive power for profiling the pharmacologic responsiveness of CFTR variants. However, AM score could potentially serve as a useful machine learning feature for future theratype prediction methods.
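The degree of correction used for Fig 4C is a simple difference. The sketch below computes it from hypothetical per-variant records of (identifier, AM score, basal FSK % of WT, VX-445 + VX-661 FSK % of WT) and, as an illustrative addition not reported in the text, averages the correction within each AM class. Record layout and function names are assumptions.

```python
from statistics import mean


def degree_of_correction(basal_fsk: float, combo_fsk: float) -> float:
    """Gain in FSK-stimulated activity (% of WT) from VX-445 + VX-661."""
    return combo_fsk - basal_fsk


def am_class(score: float) -> str:
    # Cutoffs from the Fig 4 caption; boundary handling is assumed.
    if score >= 0.56:
        return "pathogenic"
    return "ambiguous" if score >= 0.34 else "benign"


def correction_by_class(records):
    """Mean degree of correction per AM class.

    records: iterable of (variant, am_score, basal_fsk, combo_fsk) tuples.
    """
    groups = {}
    for _variant, score, basal, combo in records:
        delta = degree_of_correction(basal, combo)
        groups.setdefault(am_class(score), []).append(delta)
    return {cls: mean(deltas) for cls, deltas in groups.items()}
```

Plotting `degree_of_correction` against the AM score for every variant reproduces the layout of Fig 4C; the per-class means only condense that scatter.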

Conclusion

AlphaMissense has the exciting potential to aid pathogenicity classification of rare and emerging variants identified during genetic screening. CF provided a valuable case study for evaluating AM performance because of the abundance of available clinical outcome data and in vitro variant classifications. AM predicted the pathogenicity of severe CF-causing variants well, albeit with a high false positive rate, and matched previous studies of CFTR variant pathogenicity at the NBD1/ICL4 interface [41]. However, AM performed only modestly in predicting the pathogenicity of VUSs and VVCCs, and the tool does not appear useful for CFTR theratype prediction. For pathogenic missense variants, AM score correlated modestly with trafficking data and well with channel activity data. Thus, AM predictions offer little information for distinguishing pathogenic mechanisms. AM may provide guidance in determining whether polymorphisms in CFTR are benign, but its performance on variants causing less severe disease indicates that AM predictions must be interpreted with caution. In vitro measurements of variant severity may aid in evaluating prediction quality and will remain necessary for CFTR theratyping.

Methods

Data curation and collection

AlphaMissense (AM) predictions for all single amino acid substitutions in the human proteome were downloaded, decompressed with gunzip, and searched in the vim text editor for the CFTR UniProt accession number P13569. CFTR AM predictions were extracted into a separate file for analysis. ESM score predictions were downloaded from https://huggingface.co/spaces/ntranoslab/esm_variants by searching for accession number P13569. EVE predictions were downloaded from https://evemodel.org/proteins/CFTR_HUMAN#variantsTableContainer by searching for accession number P13569.
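The manual vim search can also be scripted. A minimal sketch, assuming the proteome-wide file is a gzipped, tab-separated table whose first column is the UniProt accession, followed by the protein variant, the AM pathogenicity score, and the AM class, with comment lines starting with `#`. The column names and function name are assumptions, not taken from the text.

```python
import gzip

# Assumed column order of the AlphaMissense amino acid substitution table:
# tab-separated, with comment lines starting with '#'.
COLUMNS = ("uniprot_id", "protein_variant", "am_pathogenicity", "am_class")


def extract_protein(path, uniprot_id="P13569"):
    """Stream a gzipped AlphaMissense TSV and keep rows for one UniProt ID.

    P13569 is the CFTR accession used in the text. Rows are returned as
    dicts keyed by COLUMNS, with the score converted to float.
    """
    rows = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # comment or commented header line
            fields = line.rstrip("\n").split("\t")
            if fields[0] != uniprot_id:
                continue  # other proteins, or an uncommented header row
            record = dict(zip(COLUMNS, fields))
            record["am_pathogenicity"] = float(record["am_pathogenicity"])
            rows.append(record)
    return rows
```

Reading line by line avoids loading the proteome-wide table into memory or an editor; the returned rows can then be written to a separate CFTR file with the standard `csv` module or a pandas DataFrame.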

Cystic fibrosis clinical outcome data were initially downloaded from the Variant List History tab on CFTR2.org. The table of 804 variants was filtered to 176 missense variants by removing indels, splicing variants, premature stop codons, and other non-missense variants. Patient information was manually curated by searching each variant and annotating sweat chloride (mEq/L), pancreatic insufficiency rate (%), P. aeruginosa infection rate (%), and lung function (FEV1%) for ages <10, 10–20, and >20 (S1 Table). Lung function data proved too variable for comparison and were not used, but they are included in S1 Table for reference. CFTR2 defines these variant categories as follows [19]: CF-causing: "A variant in one copy of the CFTR gene that always causes CF, as long as it is paired with another CF-causing variant in the other copy of the CFTR gene." Non-CF-causing: "A variant in one copy of the CFTR gene that does not cause CF, even when it is paired with a CF-causing variant in the other copy of the CFTR gene." Variant of Variable Clinical Consequence (VVCC): "A variant that may cause CF, when paired with a CF-causing variant in the other copy of the CFTR gene." Variant of Unknown Significance (VUS): "A variant for which we do not have enough information to determine whether or not it falls into the other three categories."
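One way to automate the missense filter is to parse each variant's protein-level name. The sketch below assumes each entry carries an HGVS three-letter protein description such as `p.Gly551Asp`; the exact naming fields in the CFTR2 export are not described in the text, so the pattern and function names are illustrative assumptions.

```python
import re

# Exactly one residue change in HGVS three-letter protein notation, e.g. Gly551Asp.
_CHANGE = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_missense(name):
    """Return (ref, position, alt) for a single missense change, else None.

    Accepts names such as 'p.Gly551Asp', 'p.(Gly551Asp)' or a longer string
    containing '(p.Gly551Asp)'. Indels, frameshifts, stop-gains (Ter),
    synonymous changes and multi-change alleles all return None.
    """
    tokens = re.findall(r"p\.\S+", name)
    if len(tokens) != 1:
        return None  # no protein-level description, or several on one allele
    change = tokens[0][2:].strip("()[]")
    match = _CHANGE.fullmatch(change)
    if match is None:
        return None  # e.g. Phe508del, Arg117fs, Gly551=, Gly551Asp;Arg553Ter
    ref, pos, alt = match.groups()
    if alt == "Ter" or alt == ref:
        return None  # nonsense or synonymous
    return ref, int(pos), alt


def missense_only(variant_names):
    """Keep the names that describe exactly one missense substitution."""
    return [n for n in variant_names if parse_missense(n) is not None]
```

Splice-site variants written only at the cDNA level (for example `c.489+1G>T`) carry no `p.` description and are therefore dropped as well.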

In vitro modulator response data were downloaded from [33], and deep mutational scanning data were downloaded from [23]. The 650 variants from [33] were filtered to 585 missense variants. ClinVar data were downloaded by searching for CFTR. ClinVar entries were filtered for missense variants by removing indels, stop codons, double missense variants, and other non-missense changes. Filtering yielded 1768 missense variant pathogenicity classifications (S2 Table). For the performance comparison, the missense variants were further filtered by clinical significance: we removed classifications such as no interpretation, conflicting interpretations, uncertain significance, and drug response. This left 219 variants classified as pathogenic or likely benign for performance evaluation and ROC plotting.
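The clinical-significance filter can be expressed as a keyword exclusion. A minimal sketch, assuming each ClinVar record has been reduced to a (variant, clinical significance) pair; the excluded terms mirror those listed in the text, while substring matching and the binary labeling rule are assumptions of this sketch.

```python
# Clinical-significance categories excluded from the benchmark, as listed in
# the text; exact ClinVar wording varies, so substring matching is assumed.
EXCLUDED_TERMS = (
    "no interpretation",
    "conflicting interpretations",
    "uncertain significance",
    "drug response",
)


def usable_for_benchmark(significance: str) -> bool:
    """True if a ClinVar clinical significance string is kept for ROC analysis."""
    text = significance.strip().lower()
    return bool(text) and not any(term in text for term in EXCLUDED_TERMS)


def is_pathogenic(significance: str) -> bool:
    """Binary ROC label: (likely) pathogenic versus (likely) benign."""
    text = significance.lower()
    return "pathogenic" in text and "benign" not in text


def benchmark_subset(records):
    """records: iterable of (variant, clinical_significance) pairs."""
    return [(v, s) for v, s in records if usable_for_benchmark(s)]
```

Applying the exclusion before labeling matters here, because a string such as "Conflicting interpretations of pathogenicity" would otherwise match the word "pathogenic".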

Filtering experimental data

Experimental data from Bihler et al. [33] were filtered to exclude highly variable measurements, identified by their SEM, owing to lack of reproducibility. We plotted both the distribution of the data themselves, to look for outliers on the y axis of our correlation plots (Fig 3B and 3C), and the distribution of the SEM (S4 Fig). We labeled as outliers variants with a C:B ratio below 30 despite a benign AM prediction (score below 0.3) (S4A Fig, Fig 3B). Variants with a C:B band ratio SEM greater than 30 were excluded from the analysis and, for clarity, not plotted, leaving 538 variants (92% of the experimental data) for analysis (S4B Fig). For the functional data, we labeled as outliers of interest variants with FSK activity below 30% of WT despite a benign AM prediction (score below 0.3) (S4C Fig, Fig 3C). Variants with an FSK % of WT SEM greater than 20 were excluded from the analysis and not plotted, leaving 546 variants (93% of the available experimental data) for analysis.
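The SEM filter and outlier labeling reduce to two threshold checks. A minimal sketch using the cutoffs stated above (SEM above 30 for the C:B ratio or above 20 for FSK activity excluded; outliers have a readout below 30 and an AM score below 0.3); the record layout and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One variant's in vitro readout (hypothetical record layout)."""
    variant: str
    am_score: float
    value: float  # C:B band ratio or FSK activity, % of WT
    sem: float


def filter_by_sem(rows, max_sem):
    """Drop measurements whose SEM exceeds max_sem (poor reproducibility)."""
    return [r for r in rows if r.sem <= max_sem]


def flag_outliers(rows, value_cutoff=30.0, am_cutoff=0.3):
    """Variants AM calls benign (score < am_cutoff) that are nonetheless
    low in vitro (value < value_cutoff)."""
    return [r.variant for r in rows
            if r.value < value_cutoff and r.am_score < am_cutoff]


def percent_retained(kept, total):
    """Share of variants surviving the SEM filter, in percent."""
    return 100.0 * kept / total
```

With `max_sem=30.0` for the C:B data and `max_sem=20.0` for the FSK data, `round(percent_retained(538, 585))` and `round(percent_retained(546, 585))` give 92 and 93, matching the percentages above.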

Analysis

Data were analyzed and plotted in Python 3. Raw Excel files were imported and parsed with the pandas data frame library, and plots were generated with the matplotlib.pyplot and seaborn libraries. Pearson and Spearman correlation coefficients were calculated with the scipy.stats library using the pearsonr() and spearmanr() functions, respectively. Plots were generated for all variants with available data for a given metric. The receiver operating characteristic (ROC) curve for AM pathogenicity predictions was calculated against CFTR2 classifications. CFTR2 classifies variants as CF-causing, variants of variable clinical consequence (VVCC), or non-CF-causing. We equated the AM prediction pathogenic with CF-causing, ambiguous with VVCC, and benign with non-CF-causing. Because ROC analysis applies to binary classification, we computed one curve per category in a one-versus-rest manner: in each ROC curve, a different prediction (pathogenic, ambiguous, or benign) was taken as the positive class and the other two as negatives. The corresponding true positive and false positive rates were then calculated across all possible score cutoffs for pathogenicity. Theratype-discerning plots were generated to distinguish responsive from non-responsive variants graphically and were colored by each variant's AM score.
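The one-versus-rest ROC procedure described above can be sketched in plain Python. Assumptions: a variant is called positive when its AM score is at or above the cutoff; for the benign/non-CF-causing curve the scores are negated so that lower AM scores count as positive calls; the text does not specify how scores were oriented for the ambiguous/VVCC curve, so the sketch leaves that choice to the caller. Function names are hypothetical, and libraries such as scikit-learn offer equivalent routines.

```python
def roc_curve(scores, is_positive):
    """(FPR, TPR) points, sweeping every distinct score as a cutoff.

    A variant is called positive when its score is >= the cutoff.
    """
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative variant")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        called = [s >= cutoff for s in scores]
        tp = sum(1 for c, p in zip(called, is_positive) if c and p)
        fp = sum(1 for c, p in zip(called, is_positive) if c and not p)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def one_vs_rest_auc(am_scores, cftr2_labels, positive_label,
                    higher_is_positive=True):
    """AUC for one CFTR2 class against the other two.

    Set higher_is_positive=False for the non-CF-causing (benign) curve so
    that lower AM scores count as positive calls.
    """
    scores = list(am_scores) if higher_is_positive else [-s for s in am_scores]
    truth = [label == positive_label for label in cftr2_labels]
    return auc(roc_curve(scores, truth))
```

For example, `one_vs_rest_auc(scores, labels, "CF-causing")` gives the pathogenic curve's AUC and `one_vs_rest_auc(scores, labels, "non-CF-causing", higher_is_positive=False)` the benign one.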

Supporting information

S1 Fig. AlphaMissense prediction of CFTR variants of unknown significance (VUS).

A. Residue conservation in CFTR mapped onto the structure (PDB ID 5UAK) [25,26]. The abundance of green, representing low conservation scores, in the TMDs stands in notable contrast to the pathogenic AM score predictions in the TMDs. B. An overlay of active-state CFTR (PDB ID 6MSM) [30] and the AlphaFold prediction for CFTR [29], showing close alignment of all resolved residues (1–409, 435–637, 845–889, 900–1173, 1202–1451). We calculated the root mean square deviation (RMSD) of backbone carbon atoms between these two models in Chimera and found an RMSD of just 2.5 Å. C. Variants of unknown significance (VUS) displayed on the CFTR structure (PDB ID 5UAK) [25], showing that all of these VUSs lie in the transmembrane domains. Benign-predicted mutations are shown in green, ambiguous-predicted mutations in grey, and pathogenic-predicted mutations in purple. The two pathogenic-predicted mutations both occur in transmembrane helix 8 (TH8), shown in orange. D. Receiver operating characteristic curve for AlphaMissense predictions of 115 ClinVar variants presented in the AlphaMissense benchmark. The average performance between pathogenic and benign variants is 95.8%, as previously presented [21]. E. Receiver operating characteristic curve for AlphaMissense predictions of 209 variants downloaded directly from ClinVar [24], including 96 variants overlapping with the AlphaMissense benchmark. Again, the average performance is 95.8%. F. Receiver operating characteristic curve for AlphaMissense predictions of 113 variants downloaded directly from ClinVar [24] that did not overlap with variants from the AlphaMissense benchmark. Although AM was not trained on these ClinVar data, the average performance is 95.8%. G. Overlap between the AlphaMissense ClinVar benchmark set and our extended ClinVar set, showing 115 variants from AM and an additional 113 variants considered in F. AlphaMissense performs very well across all permutations of ClinVar data considered. H. Because of the high number of VUS classifications in ClinVar for CFTR missense mutations, we plotted the AlphaMissense score for all 1277 VUSs in ClinVar: 728 benign, 181 ambiguous, and 368 pathogenic variants as predicted by AM. Data are available in S2 Table.
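The RMSD in panel B is the square root of the mean squared distance between paired atoms. Chimera superimposes the models before reporting it; the sketch below skips that fitting step and assumes the coordinates are already superimposed and paired by residue number (one backbone atom per residue, restricted to the resolved ranges listed above). The dictionary layout and function names are hypothetical.

```python
from math import sqrt

# Residue ranges resolved in the experimental structure (from the panel B legend).
RESOLVED_RANGES = [(1, 409), (435, 637), (845, 889), (900, 1173), (1202, 1451)]


def is_resolved(residue_number: int) -> bool:
    """True if the residue falls inside one of the resolved ranges."""
    return any(lo <= residue_number <= hi for lo, hi in RESOLVED_RANGES)


def rmsd(coords_a, coords_b):
    """RMSD between matched atoms of two models that are ALREADY superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need equal, non-empty lists of matched atoms")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return sqrt(squared / len(coords_a))


def paired_rmsd(model_a, model_b):
    """model_*: dict mapping residue number -> (x, y, z) of one backbone atom."""
    shared = sorted(r for r in model_a.keys() & model_b.keys() if is_resolved(r))
    return rmsd([model_a[r] for r in shared], [model_b[r] for r in shared])
```

Superposition itself (for example with the Kabsch algorithm) would have to precede this step; without it the value reflects both structural differences and arbitrary placement of the two models.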

(TIF)

S2 Fig. Alternative prediction method performance and dataset overlap.

A. Receiver operating characteristic curve for ESM predictions [39] of 169 CFTR missense variants, including 110 CF-causing, 41 variable clinical consequence (VVCC), and 18 non-CF-causing variants. For the pathogenic curve (violet), we considered a pathogenic prediction of a CF-causing variant a true positive. For the ambiguous curve (grey), we considered an ambiguous prediction of a VVCC a true positive. For the benign curve (blue-green), we considered a benign prediction of a non-CF-causing variant a true positive. B. Receiver operating characteristic curve calculated as in A but using EVE missense variant predictions [40] of 169 CFTR missense variants, colored as in A. C. Venn diagrams depicting the overlap of the datasets used throughout the study: our expanded ClinVar dataset, the deep mutational scanning (DMS) dataset [23], our curated CFTR2 dataset, and the missense variants from the Bihler et al. dataset [33].
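Dataset overlaps like those in panel C reduce to set operations once every dataset uses the same variant identifier. A minimal sketch, assuming each dataset has already been normalized to a common key such as a protein change string (for example "G551D"); that normalization across ClinVar, CFTR2, DMS, and Bihler et al. naming is not described in the text and is an assumption here, as are the function names.

```python
def venn2(a, b):
    """Region sizes of a two-set Venn diagram over variant identifiers."""
    a, b = set(a), set(b)
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


def shared_by_all(*datasets):
    """Variant identifiers present in every dataset (empty if none given)."""
    sets = [set(d) for d in datasets]
    return set.intersection(*sets) if sets else set()
```

Using sets also removes duplicate entries within a dataset, so each variant is counted once per region.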

(TIF)

S3 Fig. AlphaMissense prediction correlations with cystic fibrosis patient pathogenicity metrics by diagnosis.

A. AM score plotted against sweat chloride levels in milliequivalents per liter (mEq/L) for 85 missense variants classified as CF-causing. The correlation (Pearson coefficient r = 0.21, Spearman coefficient ρ = 0.33) is weaker than that of the complete data set shown in Fig 2A. B. AM score plotted against sweat chloride levels for 33 missense variants classified as variants of variable clinical consequence (VVCC). The correlation (Pearson coefficient r = -0.12, Spearman coefficient ρ = -0.11) is not statistically significant. C. AM score plotted against pancreatic insufficiency rates (%) for 83 missense variants classified as CF-causing. The correlation (Pearson coefficient r = 0.32, Spearman coefficient ρ = 0.44) was similar to that of the entire dataset in Fig 2B. D. AM score plotted against pancreatic insufficiency rates for 30 missense variants classified as VVCC. The correlation for these data (Pearson coefficient r = -0.21, Spearman coefficient ρ = -0.22) was not statistically significant. E. AM score plotted against P. aeruginosa infection rates for 82 missense variants classified as CF-causing. The correlation (Pearson coefficient r = 0.33, Spearman coefficient ρ = 0.36) is weaker than that of the entire data set presented in Fig 2C. F. AM score plotted against P. aeruginosa infection rates (%) for 28 missense variants classified as VVCC. The correlation was not significant (Pearson coefficient r = 0.13, Spearman coefficient ρ = 0.28).
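The per-class coefficients in this figure come from scipy.stats.pearsonr() and spearmanr(), which also return the p-values needed to judge significance. For illustration only, the sketch below recomputes the coefficients (without p-values) in the standard library: Spearman ρ is the Pearson r of the ranks, with tied values sharing their mean rank. The row layout and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlations_for_class(rows, cftr2_class):
    """(r, rho) between AM score and a clinical metric for one CFTR2 class.

    rows: iterable of (cftr2_class, am_score, metric_value) tuples.
    """
    subset = [(am, m) for cls, am, m in rows if cls == cftr2_class]
    am_scores = [am for am, _ in subset]
    metric = [m for _, m in subset]
    return pearson(am_scores, metric), spearman(am_scores, metric)
```

Restricting the rows to one CFTR2 class before correlating is what distinguishes these panels from the whole-dataset correlations in Fig 2.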

(TIF)

S4 Fig. Distributions of experimental data and error from Bihler et al. study for filtering purposes.

A. Histogram of the distribution of C-B band ratio of all 585 missense variants from the Bihler et al. study [33]. B. Histogram of the distribution of C-B band ratio SEM for all 585 missense variants from the Bihler et al. study. Variants with an SEM greater than 30 were excluded from analysis due to lack of experimental reproducibility and for plotting clarity. C. FSK %WT distribution plotted as a histogram for all 585 missense variants from the Bihler et al. study [33]. D. FSK %WT SEM distribution plotted as a histogram for all 585 missense variants from the Bihler et al. study. Variants with an SEM greater than 20 were excluded from analysis due to lack of experimental reproducibility and for plotting clarity.

(TIF)

S5 Fig. AlphaMissense correlation with CFTR in vitro data from spatial covariance study.

A. Spatial covariance data from a previous study [46] for 62 missense variants plotted against AlphaMissense scores. The y axis represents the trafficking index measured by a western blot trafficking assay in HEK293T cells incubated at 37 °C. A modest inverse linear correlation was observed (Pearson coefficient r = -0.46, Spearman coefficient ρ = -0.44). B. Spatial covariance data for 62 missense variants using the same trafficking index as in A, but at 27 °C, plotted against AlphaMissense scores. Again, an inverse linear correlation was observed (Pearson coefficient r = -0.48, Spearman coefficient ρ = -0.49), slightly stronger than at 37 °C. C. AlphaMissense scores correlated with the spatial covariance data using the chloride conductance index described in [46], which measures channel activity at 37 °C. We observed a stronger correlation (Pearson coefficient r = -0.58, Spearman coefficient ρ = -0.54). D. AlphaMissense scores correlated with the chloride conductance index at 27 °C. We observed a similar inverse correlation (Pearson coefficient r = -0.50, Spearman coefficient ρ = -0.53).

(TIF)

S6 Fig. Deep mutational scanning data for VX-661 and VX-445 response colored by AlphaMissense pathogenicity score.

A. Basal CFTR cell surface immunostaining versus VX-661-treated CFTR cell surface immunostaining intensity [23]. Pathogenic variants score 0.56–1.00 (violet), ambiguous variants 0.34–0.56 (grey), and benign variants 0.04–0.34 (green). Error bars represent standard deviation. The distribution of pathogenicity colors throughout the plot suggests that the AM pathogenicity score failed to predict VX-661 response. B. Basal versus VX-445-treated CFTR cell surface immunostaining intensity, colored as in A. Error bars represent standard deviation. Again, AM score failed to predict VX-445 response. C. Basal versus VX-661 + VX-445-treated CFTR cell surface immunostaining intensity, colored as in A. Error bars represent standard deviation. AM score likewise failed to predict response to the VX-661 and VX-445 combination on a per-variant basis.

(TIF)

S7 Fig. CFTR modulator response plots colored by AlphaMissense pathogenicity score reveal little predictive capability of AM in theratyping.

A. Basal ratio of mature CFTR (C band) to immature CFTR (B band) (C:B ratio, % of WT) versus modulator-enhanced C:B ratio (% of WT) from the Bihler et al. study [33]. Error bars are omitted for clarity. Variants with an AlphaMissense pathogenicity score of 0.56–1.00 were classified by AM as pathogenic (violet), 0.34–0.56 as ambiguous (grey), and 0.04–0.34 as benign (green). The distribution of colors across the plot indicates little predictive capability of AM for trafficking theratype. B. TMD1 variants only from Fig 4B: basal FSK-stimulated CFTR activity (% of WT) versus modulator-enhanced FSK-stimulated CFTR activity (% of WT) from the Bihler et al. study [33]. Error bars are omitted for clarity, and AM-predicted pathogenicity is colored as in A. C. NBD1 variants only from Fig 4B; data from the Bihler et al. study [33]. D. TMD2 variants only from Fig 4B; data from the Bihler et al. study [33]. E. NBD2 variants only from Fig 4B; data from the Bihler et al. study [33].

(TIF)

S1 Table. CFTR2.org variant clinical outcome data.

(XLSX)

S2 Table. ClinVar CFTR variant data.

(XLSX)

S3 Table. In vitro CFTR trafficking and functional data from Bihler et al.

(XLSX)

S4 Table. Deep mutational scanning CFTR data.

(XLSX)

Data Availability

All relevant data are within the manuscript and its Supporting information files.

Funding Statement

This work was supported by R35 GM133552 (NIGMS), R01 HL167046 (NHLBI), R00HL151965 (NIH) and OLIVER22A0-KB (CFF). EFM was supported by a predoctoral fellowship F31 HL162483 (NHLBI) and Chemical-Biology Interface training grant T32 GM065086 (NIGMS).

References

1. Welsh MJ, Smith AE. Molecular mechanisms of CFTR chloride channel dysfunction in cystic fibrosis. Cell. 1993 Jul;73(7):1251–4. doi: 10.1016/0092-8674(93)90353-r
2. Riordan JR, Rommens JM, Kerem BS, Alon N, Rozmahel R, Grzelczak Z, et al. Identification of the Cystic Fibrosis Gene: Cloning and Characterization of Complementary DNA. Science. 1989 Sep 8;245(4922):1066–73. doi: 10.1126/science.2475911
3. Cutting GR. Cystic fibrosis genetics: from molecular understanding to clinical application. Nat Rev Genet. 2015 Jan;16(1):45–56. doi: 10.1038/nrg3849
4. McDonald EF, Meiler J, Plate L. CFTR Folding: From Structure and Proteostasis to Cystic Fibrosis Personalized Medicine. ACS Chem Biol. 2023 Sep 20;acschembio.3c00310. doi: 10.1021/acschembio.3c00310
5. Oliver KE, Han ST, Sorscher EJ, Cutting GR. Transformative therapies for rare CFTR missense alleles. Curr Opin Pharmacol. 2017 Jun;34:76–82. doi: 10.1016/j.coph.2017.09.018
6. Clancy JP, Cotton CU, Donaldson SH, Solomon GM, VanDevanter DR, Boyle MP, et al. CFTR modulator theratyping: Current status, gaps and future directions. J Cyst Fibros. 2019 Jan;18(1):22–34. doi: 10.1016/j.jcf.2018.05.004
7. Molinski SV, Ahmadi S, Hung M, Bear CE. Facilitating Structure-Function Studies of CFTR Modulator Sites with Efficiencies in Mutagenesis and Functional Screening. SLAS Discov. 2015 Dec;20(10):1204–17. doi: 10.1177/1087057115605834
8. CF Foundation Patient Registry. https://www.cff.org/medical-professionals/patient-registry. 2022.
9. Middleton PG, Mall MA, Dřevínek P, Lands LC, McKone EF, Polineni D, et al. Elexacaftor–Tezacaftor–Ivacaftor for Cystic Fibrosis with a Single Phe508del Allele. N Engl J Med. 2019 Nov 7;381(19):1809–19. doi: 10.1056/NEJMoa1908639
10. Oliver KE, Carlon MS, Pedemonte N, Lopes-Pacheco M. The revolution of personalized pharmacotherapies for cystic fibrosis: what does the future hold? Expert Opin Pharmacother. 2023 Sep 22;24(14):1545–65. doi: 10.1080/14656566.2023.2230129
11. Trikafta Prescribing Information. 2023.
12. Baatallah N, Elbahnsi A, Mornon JP, Chevalier B, Pranke I, Servel N, et al. Pharmacological chaperones improve intra-domain stability and inter-domain assembly via distinct binding sites to rescue misfolded CFTR. Cell Mol Life Sci. 2021 Dec;78(23):7813–29. doi: 10.1007/s00018-021-03994-5
13. Fiedorczuk K, Chen J. Mechanism of CFTR correction by type I folding correctors. Cell. 2022;185(1):158–168.e11. doi: 10.1016/j.cell.2021.12.009
14. Fiedorczuk K, Chen J. Molecular structures reveal synergistic rescue of Δ508 CFTR by Trikafta modulators. Science. 2022 Oct 21;378(6617):284–90.
15. Wang C, Yang Z, Loughlin BJ, Xu H, Veit G, Vorobiev S, et al. Mechanism of dual pharmacological correction and potentiation of human CFTR [Internet]. Biophysics; 2022 Oct [cited 2023 Dec 23]. http://biorxiv.org/lookup/doi/10.1101/2022.10.10.510913
16. Kallam EF, Kasi AS, Barr E, Linnemann RW, Guglani L. Diagnostic challenges in CFTR-related metabolic syndrome: Where the guidelines fall short. Paediatr Respir Rev. 2023 Aug;S1526054223000489. doi: 10.1016/j.prrv.2023.08.004
17. Barben J, Castellani C, Munck A, Davies JC, De Winter–de Groot KM, Gartner S, et al. Updated guidance on the management of children with cystic fibrosis transmembrane conductance regulator-related metabolic syndrome/cystic fibrosis screen positive, inconclusive diagnosis (CRMS/CFSPID). J Cyst Fibros. 2021 Sep;20(5):810–9. doi: 10.1016/j.jcf.2020.11.006
18. Southern KW, Barben J, Gartner S, Munck A, Castellani C, Mayell SJ, et al. Inconclusive diagnosis after a positive newborn bloodspot screening result for cystic fibrosis; clarification of the harmonised international definition. J Cyst Fibros. 2019 Nov;18(6):778–80. doi: 10.1016/j.jcf.2019.04.010
19. The Clinical and Functional TRanslation of CFTR (CFTR2). http://cftr2.org.
20. Cystic Fibrosis Mutation Database. http://www.genet.sickkids.on.ca/. 2023.
21. Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023 Sep 19;eadg7492. doi: 10.1126/science.adg7492
22. McDonald EF, Woods H, Smith ST, Kim M, Schoeder CT, Plate L, et al. Structural Comparative Modeling of Multi-Domain F508del CFTR. Biomolecules. 2022 Mar 18;12(3):471. doi: 10.3390/biom12030471
23. McKee AG, McDonald EF, Penn WD, Kuntz CP, Noguera K, Chamness LM, et al. General trends in the effects of VX-661 and VX-445 on the plasma membrane expression of clinical CFTR variants. Cell Chem Biol. 2023 Jun 15;30(6):632–642.e5.
24. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018 Jan 4;46(D1):D1062–7. doi: 10.1093/nar/gkx1153
25. Liu F, Zhang Z, Csanády L, Gadsby DC, Chen J. Molecular Structure of the Human CFTR Ion Channel. Cell. 2017;169(1):85–92. doi: 10.1016/j.cell.2017.02.024
26. Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 2016 Jul;44(W1):W344–50. doi: 10.1093/nar/gkw408
27. Naren AP, Cormet-Boyaka E, Fu J, Villain M, Blalock JE, Quick MW, Kirk KL. CFTR Chloride Channel Regulation by an Interdomain Interaction. Science. 1999;286(5439):544–8. doi: 10.1126/science.286.5439.544
28. Bozoky Z, Krzeminski M, Muhandiram R, Birtley JR, Al-Zahrani A, Thomas PJ, et al. Regulatory R region of the CFTR chloride channel is a dynamic integrator of phospho-dependent intra- and intermolecular interactions. Proc Natl Acad Sci U S A. 2013;110(47). doi: 10.1073/pnas.1315104110
29. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug 26;596(7873):583–9. doi: 10.1038/s41586-021-03819-2
30. Zhang Z, Liu F, Chen J. Molecular structure of the ATP-bound, phosphorylated human CFTR. Proc Natl Acad Sci U S A. 2018;115(50):12757–62. doi: 10.1073/pnas.1815287115
31. Farinha CM, Canato S. From the endoplasmic reticulum to the plasma membrane: mechanisms of CFTR folding and trafficking. Cell Mol Life Sci. 2017;74(1):39–55. doi: 10.1007/s00018-016-2387-7
32. Jurkuvenaite A, Varga K, Nowotarski K, Kirk KL, Sorscher EJ, Li Y, et al. Mutations in the Amino Terminus of the Cystic Fibrosis Transmembrane Conductance Regulator Enhance Endocytosis. J Biol Chem. 2006 Feb;281(6):3329–34. doi: 10.1074/jbc.M508131200
33. Bihler H, Sivachenko A, Millen L, Bhatt P, Patel AT, Chin J, et al. In Vitro Modulator Responsiveness of 655 CFTR Variants Found in People With CF [Internet]. Pharmacology and Toxicology; 2023 Jul [cited 2023 Sep 22]. http://biorxiv.org/lookup/doi/10.1101/2023.07.07.548159
34. Clain J, Lehmann-Che J, Girodon E, Lipecka J, Edelman A, Goossens M, et al. A neutral variant involved in a complex CFTR allele contributes to a severe cystic fibrosis phenotype. Hum Genet. 2005 May;116(6):454–60. doi: 10.1007/s00439-004-1246-z
35. Liu F, Zhang Z, Levit A, Levring J, Touhara KK, Shoichet BK, et al. Structural identification of a hotspot on CFTR for potentiation. Science. 2019;364(6446):1184–8. doi: 10.1126/science.aaw7611
36. Yeh HI, Qiu L, Sohma Y, Conrath K, Zou X, Hwang TC. Identifying the molecular target sites for CFTR potentiators GLPG1837 and VX-770. J Gen Physiol. 2019 Jul 1;151(7):912–28. doi: 10.1085/jgp.201912360
37. Koyano S, Hirano Y, Nagamori T, Tanno S, Murono K, Fujieda K. A Rare Mutation in Cystic Fibrosis Transmembrane Conductance Regulator Gene in a Recurrent Pancreatitis Patient Without Respiratory Symptoms. Pancreas. 2010 Jul;39(5):686–7. doi: 10.1097/MPA.0b013e3181c65c2e
38. Hatton A, Bergougnoux A, Zybert K, Chevalier B, Mesbahi M, Altéri JP, et al. Reclassifying inconclusive diagnosis after newborn screening for cystic fibrosis. Moving forward. J Cyst Fibros. 2022 May;21(3):448–55. doi: 10.1016/j.jcf.2021.12.010
39. Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet. 2023 Sep;55(9):1512–22. doi: 10.1038/s41588-023-01465-0
40. Frazer J, Notin P, Dias M, Gomez A, Min JK, Brock K, et al. Disease variant prediction with deep generative models of evolutionary data. Nature. 2021 Nov 4;599(7883):91–5. doi: 10.1038/s41586-021-04043-8
41. Molinski SV, Shahani VM, Subramanian AS, MacKinnon SS, Woollard G, Laforet M, et al. Comprehensive mapping of cystic fibrosis mutations to CFTR protein identifies mutation clusters and molecular docking predicts corrector binding site. Proteins Struct Funct Bioinforma. 2018;86(8):833–43. doi: 10.1002/prot.25496
42. Campagna G, Amato A, Majo F, Ferrari G, Quattrucci S, Padoan R, et al. Registro italiano Fibrosi Cistica (RIFC). Rapporto 2019–2020. Epidemiol Prev. 2022 Sep;46(4S2):1–38.
43. Zampoli M, Verstraete J, Frauendorf M, Kassanjee R, Workman L, Morrow BM, et al. Cystic fibrosis in South Africa: spectrum of disease and determinants of outcome. ERJ Open Res. 2021 Jul;7(3):00856–2020. doi: 10.1183/23120541.00856-2020
44. Vaidyanathan S, Trumbull AM, Bar L, Rao M, Yu Y, Sellers ZM. CFTR genotype analysis of Asians in international registries highlights disparities in the diagnosis and treatment of Asian patients with cystic fibrosis. Genet Med. 2022 Oct;24(10):2180–6. doi: 10.1016/j.gim.2022.06.009
45. Jones CT, McIntosh L, Keston M, Ferguson A, Brock DJH. Three novel mutations in the cystic fibrosis gene detected by chemical cleavage: analysis of variant splicing and a nonsense mutation. Hum Mol Genet. 1992;1(1):11–7. doi: 10.1093/hmg/1.1.11
46. Anglès F, Wang C, Balch WE. Spatial covariance analysis reveals the residue-by-residue thermodynamic contribution of variation to the CFTR fold. Commun Biol. 2022 Apr 13;5(1):356. doi: 10.1038/s42003-022-03302-2
47. Mendoza JL, Schmidt A, Li Q, Nuvaga E, Barrett T, Bridges RJ, et al. Requirements for efficient correction of ΔF508 CFTR revealed by analyses of evolved sequences. Cell. 2012;148(1–2):164–74.
48. Rabeh WM, Bossard F, Xu H, Okiyoneda T, Bagdany M, Mulvihill CM, et al. Correction of Both NBD1 Energetics and Domain Interface Is Required to Restore ΔF508 CFTR Folding and Function. Cell. 2012 Jan;148(1–2):150–63.
49. Protasevich I, Yang Z, Wang C, Atwell S, Zhao X, Emtage S, et al. Thermal unfolding studies show the disease causing F508del mutation in CFTR thermodynamically destabilizes nucleotide-binding domain 1. Protein Sci. 2010;19(10):1917–31. doi: 10.1002/pro.479
50. He L, Aleksandrov LA, Cui L, Jensen TJ, Nesbitt KL, Riordan JR. Restoration of domain folding and interdomain assembly by second-site suppressors of the ΔF508 mutation in CFTR. FASEB J. 2010;24(8):3103–12.
51. Denning GM, Anderson MP, Amara JF, Marshall J, Smith AE, Welsh MJ. Processing of mutant cystic fibrosis transmembrane conductance regulator is temperature-sensitive. Nature. 1992 Aug;358(6389):761–4. doi: 10.1038/358761a0
52. McDonald EF, Sabusap CMP, Kim M, Plate L. Distinct proteostasis states drive pharmacologic chaperone susceptibility for cystic fibrosis transmembrane conductance regulator misfolding mutants. Miller E, editor. Mol Biol Cell. 2022 Jun 1;33(7):ar62.

Decision Letter 0

Jeffrey L Brodsky

30 Nov 2023

PONE-D-23-36353

Benchmarking AlphaMissense Pathogenicity Predictions Against Cystic Fibrosis Variants

PLOS ONE

Dear Dr. Plate,

Thank you for submitting your manuscript to PLOS ONE. As you will note below, your manuscript was read and commented on by two experts in the field, both of whom--I am delighted to say--felt that your study was significant and would be of interest to the readers of the journal. However, there are some relatively minor concerns that, I agree, should be made to the paper in order to improve clarity. Most of these changes will be textual, but the inclusion of additional computational (e.g. comparisons to other predictive algorithms) and other information in Tables or Figures is needed. Nevertheless, this should not be too onerous...Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Please submit your revised manuscript by Jan 14 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript, and should each of the comments be addressed, I am confident that I will be able to make an Editorial decision on the suitability of the manuscript for publication.

Thank you again for submitting your work to PLOS ONE.

Sincerely,

Jeffrey L Brodsky

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Thank you for stating the following financial disclosure: 

 [This work was supported by R35 GM133552 (NIGMS), R01 HL167046 (NHLBI), R00HL151965 (NIH) and OLIVER22A0-KB (CFF). EFM was supported by a predoctoral fellowship F31 HL162483 (NHLBI) and Chemical-Biology Interface training grant T32 GM065086 (NIGMS).].  

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

[This work was supported by R35 GM133552 (NIGMS), R01 HL167046 (NHLBI), R00HL151965 (NIH) and OLIVER22A0-KB (CFF). EFM was supported by a predoctoral fellowship F31 HL162483 (NHLBI) and Chemical-Biology Interface training grant T32 GM065086 (NIGMS). ]

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

  [This work was supported by R35 GM133552 (NIGMS), R01 HL167046 (NHLBI), R00HL151965 (NIH) and OLIVER22A0-KB (CFF). EFM was supported by a predoctoral fellowship F31 HL162483 (NHLBI) and Chemical-Biology Interface training grant T32 GM065086 (NIGMS).].  

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Please amend the manuscript submission data (via Edit Submission) to include author Jens Meiler.

5. We note that Figure 1A, S1a and S1b in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 1A, S1a and S1b to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. 

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 

7. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

********** 

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this study, McDonald et al used statistical methods and existing in vitro data to systematically evaluate the accuracy of the recently developed AlphaMissense technology in predicting the impact of missense variants within the cystic fibrosis transmembrane conductance regulator (CFTR) channel. The criteria the authors used for the assessment are: 1) Pathogenicity annotation from two databases, 2) Clinical features, 3) Protein trafficking and channel function, and 4) Theratypes based on responses to two CFTR correctors. Their data suggest that while the overall pathogenicity of CFTR variants can be predicted with high accuracy by AlphaMissense (AM), albeit slightly skewed towards a higher rate of false positives, AM is unable to distinguish between different mutation classes, theratypes, and clinical outcomes. Together, the study is thorough and the manuscript is well written, the methods used are sound, and the findings are useful for researchers and clinicians who would like to use AM to predict existing and new CFTR variants. Here are a few comments and suggestions to strengthen the manuscript and make it more accessible for readers outside the CFTR field:

1. It is unclear how much the AM training dataset (from ClinVar) overlaps with the list of variants in the CFTR2 repository and with the variants examined in the DMS and drug response studies. As this overlap would influence the accuracy of AM predictions, it would be helpful if the authors could provide a brief clarification on this.

Along a similar line of thought, how many CFTR ClinVar variants were included in the AM benchmark/ training set (Lines 95 and 98 suggest this might either be 115 or 104)? An ROC curve of all ClinVar variants (including the benchmark set) was performed in Supp Fig S1D, but a standalone ROC curve for only the variants not in the benchmark set might be useful.

2. Based on the results presented here, how much better (or worse) is AM in predicting CFTR variant pathogenicity compared to other methods such as ESM or EVE?

3. AM incorporates structural data from AlphaFold2 in its predictions. How reliable is this predicted structure compared to existing CFTR structures? Perhaps the poor correlation between AM scores and protein folding might be due to the quality of the predicted structure?

4. Other comments:

4.1. How are “variable consequence” variants defined in the CFTR2 repository? I think a brief clarification in the text would be helpful.

4.2. Table 1 lists all 7 variants from the CFTR2 repository that are of unknown consequence. Since the clinical features of individuals carrying these variants are further discussed in part II, it would be helpful to include these data in Supp Table S1 so that all data (such as # of alleles, frequency, sweat chloride levels, etc.) for all CFTR2 variants are available.

4.3. The authors brought up S912L as an example of a predicted benign variant, yet presenting with typical CF phenotypes. Has this variant been examined experimentally? Are individuals with this variant mostly heterozygous? Is it localized in a domain important for function and/or protein biogenesis? Similarly, were any variants from Table 1 examined in the in vitro studies or elsewhere?

4.4. If heterozygous data is readily available for other ClinVar/ CFTR2 variants, it would be useful to include this data in the supplemental tables, especially since the authors cited heterozygosity as a potential cause for the relatively poor performance of AM.

4.5. A couple sentences detailing the significance/ mode of action of VX-445 and VX-661 in part IV would be helpful for readers not in the immediate CFTR circle.

4.6. Are the correlation coefficients calculated in Figure 2 improved when only CF causing variants are taken into account?

4.7. Figure 2A has a grey box for “Other”, but no variants in this category were in this analysis.

4.8. Figure 2 legend, line 317: typo “variants were variable consequence were shown..”.

4.9. Line 194: What does the * refer to?

4.10. Are there titles and legends for the Supplemental Tables?

Reviewer #2: In this manuscript, McDonald et al explore the ability of the novel AlphaMissense technology to predict the pathogenicity of CFTR missense mutations. The study is quite relevant and, in general, well designed and presented. There are some aspects that should be considered before acceptance.

Major aspects

1. Descriptions of what was performed and is being presented as Results are in general very brief, requiring the reader to constantly shift between the main text, the methods, and the figure legends. It would be very beneficial if the main text could include, for each set of results, one or two sentences explaining what was done and what is being presented.

2. When comparing data from CFTR2 and other sources, the authors analyze different numbers of CFTR variants – it is not clear why these specific sets of mutations were chosen (probably due to results being available). It would be good to add this rationale.

3. When analyzing the correlation of AM score with P.aeruginosa infection rate, it would be clearer to provide also separate plots for the different groups of individuals (CF-causing/Variable CC/Unknown) – probably as supplementary.

4. Results should be discussed in light of work published by the B Balch group, especially Anglès et al. (2022) Commun Biol, in which the authors present a spatial covariance analysis of the thermodynamic contribution of each residue to the CFTR fold. Would it be feasible to add a comparative analysis of those results with the AM scores?

Minor aspects

5. When mentioning (l.112) the non-responsiveness of mutations at residue 560 to modulators, R560S can probably be added.

6. It is not clear what is meant by “CFTR fitness” (l.172). Do the authors mean “CFTR function”?

********** 

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Jan 25;19(1):e0297560. doi: 10.1371/journal.pone.0297560.r002

Author response to Decision Letter 0


3 Jan 2024

Reviewer #1

In this study, McDonald et al used statistical methods and existing in vitro data to systematically evaluate the accuracy of the recently developed AlphaMissense technology in predicting the impact of missense variants within the cystic fibrosis transmembrane conductance regulator (CFTR) channel. The criteria the authors used for the assessment are: 1) Pathogenicity annotation from two databases, 2) Clinical features, 3) Protein trafficking and channel function, and 4) Theratypes based on responses to two CFTR correctors. Their data suggest that while the overall pathogenicity of CFTR variants can be predicted with high accuracy by AlphaMissense (AM), albeit slightly skewed towards a higher rate of false positives, AM is unable to distinguish between different mutation classes, theratypes, and clinical outcomes. Together, the study is thorough and the manuscript is well written, the methods used are sound, and the findings are useful for researchers and clinicians who would like to use AM to predict existing and new CFTR variants. Here are a few comments and suggestions to strengthen the manuscript and make it more accessible for readers outside the CFTR field:

We thank reviewer 1 for their overall very positive feedback and recommendation.

Reviewer #1, Major Comments:

R1.1. It is unclear how much the AM training dataset (from ClinVar) overlaps with the list of variants in the CFTR2 repository and with the variants examined in the DMS and drug response studies. As this overlap would influence the accuracy of AM predictions, it would be helpful if the authors could provide a brief clarification on this. Along a similar line of thought, how many CFTR ClinVar variants were included in the AM benchmark/training set (Lines 95 and 98 suggest this might be either 115 or 104)? An ROC curve of all ClinVar variants (including the benchmark set) was plotted in Supp Fig S1D, but a standalone ROC curve for only the variants not in the benchmark set might be useful.

Our response: We thank Reviewer 1 for this helpful suggestion. First, to clarify and quantify the overlap between the datasets, we now include a systematic set of Venn diagrams for the variants shared between all possible pairs of datasets, including the expanded ClinVar dataset, CFTR2, DMS, and the study by Bihler et al., leading to 6 total Venn diagrams. We included these Venn diagrams in Supplemental Figure S2C.

Page 5, Line 229: “Variable, albeit high, overlap was observed between the CFTR2 dataset and the in vitro data sets discussed below (Supplemental Figure S2C).”
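Four datasets give exactly six unordered pairs (4 choose 2 = 6), which matches the six Venn diagrams described above. A minimal Python sketch of the pairwise counting, assuming each dataset has already been reduced to a set of unique variant identifiers; the set contents (`V1` to `V6`) and the helper name `pairwise_overlaps` are hypothetical placeholders, not the authors' code or data:

```python
from itertools import combinations

# Hypothetical, deduplicated variant-ID sets standing in for the four sources.
datasets = {
    "ClinVar": {"V1", "V2", "V3", "V4", "V5"},
    "CFTR2": {"V1", "V2", "V3", "V6"},
    "DMS": {"V1", "V2", "V4"},
    "Bihler": {"V1", "V3", "V6"},
}


def pairwise_overlaps(sets_by_name):
    """Map each unordered pair (a, b) to its three Venn region sizes:
    (only in a, in both, only in b)."""
    regions = {}
    for a, b in combinations(sorted(sets_by_name), 2):
        sa, sb = sets_by_name[a], sets_by_name[b]
        regions[(a, b)] = (len(sa - sb), len(sa & sb), len(sb - sa))
    return regions


overlaps = pairwise_overlaps(datasets)
for (a, b), (only_a, both, only_b) in overlaps.items():
    print(f"{a} vs {b}: {only_a} only / {both} shared / {only_b} only")
```

Using sets means a variant listed twice in a source is counted once, which is the kind of deduplication that matters for overlap counts.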

Second, upon closer inspection of the overlap between the AM ClinVar benchmark set and our expanded ClinVar set, we found an overlap of 96 variants, not 104, because some variants were repeated in the dataset. We now also include a Venn diagram of the overlap between the AlphaMissense ClinVar dataset and our expanded dataset in Supplemental Figure S1G. This revealed 113 additional variants not included in the AM benchmark; we subsequently calculated the ROC curve for these variants, revealing a pathogenic AUC of 1.00 and a benign AUC of 0.92. We now include this in Supplemental Figure S1F and added the following sentences to the manuscript to highlight these data.

Page 4, Line 145: “Finally, we plotted a ROC curve for 113 ClinVar variants not included in the AM benchmark set, which revealed a >90% accuracy and indicates AM performs well on ClinVar data outside of the training set (Supplemental Figure S1F-G).”
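The held-out analysis can be sketched as three steps: deduplicate variants, take the set difference against the AM benchmark, and compute a ROC AUC on the remaining variants. The Python below is a hedged illustration with made-up variant IDs, labels, and scores. It uses the rank (Mann-Whitney) form of the AUC rather than whatever tool the authors used, and it computes a single pathogenic-versus-benign AUC, whereas the response reports separate pathogenic and benign AUCs whose exact construction is not described here. The names `roc_auc`, `expanded_clinvar`, and `am_benchmark` are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical rows: (variant ID, ClinVar label, AM score). The duplicated V1
# row mimics repeated entries that inflate a naive overlap count.
expanded_clinvar = [
    ("V1", "pathogenic", 0.98), ("V1", "pathogenic", 0.98),
    ("V2", "pathogenic", 0.91), ("V3", "benign", 0.12),
    ("V4", "pathogenic", 0.85), ("V5", "benign", 0.30),
    ("V6", "benign", 0.80), ("V7", "pathogenic", 0.77),
]
am_benchmark = {"V1", "V2", "V3"}  # hypothetical AM ClinVar benchmark IDs

# 1. Deduplicate by variant ID (identical repeated rows collapse to one).
unique = {vid: (label, score) for vid, label, score in expanded_clinvar}

# 2. Overlap with the benchmark, and the variants outside it.
shared = am_benchmark & set(unique)
held_out = {vid: unique[vid] for vid in set(unique) - am_benchmark}

# 3. AUC on held-out variants only (pathogenic = positive class).
pos = [score for label, score in held_out.values() if label == "pathogenic"]
neg = [score for label, score in held_out.values() if label == "benign"]
auc = roc_auc(pos, neg)
print(len(expanded_clinvar), len(unique), len(shared), len(held_out), auc)
```

With these made-up values the script prints `8 7 3 4 0.75`: eight raw rows collapse to seven unique variants, three overlap the benchmark, and the four held-out variants give an AUC of 0.75.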

R1.2. Based on the results presented here, how much better (or worse) is AM in predicting CFTR variant pathogenicity compared to other methods such as ESM or EVE?

Our response: We agree with Reviewer 1 that it is important to compare prediction methods. We downloaded the CFTR predictions for both suggested methods and calculated ROC curves and AUCs in the same manner as for AM. Interestingly, this showed similar performance for pathogenicity predictions (ESM AUC = 0.76, EVE AUC = 0.73) but lower benign prediction capability (ESM AUC = 0.78, EVE AUC = 0.78) compared with AM. We added the following sentences to the manuscript to highlight these findings.

Page 4, Line 150-156: “AM performance was also compared to two other pathogenicity prediction tools, Evolutionary Scale Modeling (ESM) [39] and Evolutionary model of Variant Effect (EVE) [40] (Supplemental Figure S2A-B). For benign variants, the AM ROC AUC (0.91) was higher than those obtained for ESM (0.78) or EVE (0.78). A similar observation was made for pathogenic ROC AUCs, with AM (0.80) slightly above ESM (0.76) or EVE (0.73). ROC AUCs for ambiguous variants were nearly uniform across all methods (AM, 0.66; ESM, 0.65; EVE, 0.64). AM therefore offers a slight advantage for predicting pathogenic or benign variants and less utility regarding ambiguous variants.”

R1.3. AM incorporates structural data from AlphaFold2 in its predictions. How reliable is this predicted structure compared to existing CFTR structures? Perhaps the poor correlation between AM scores and protein folding might be due to the quality of the predicted structure?

Our response: We agree with Reviewer 1 that a poor AlphaFold2 prediction could drive poor correlation; however, we note that the AF2 prediction of CFTR is quite good. Specifically, the AF2 prediction available on UniProt has a root mean squared deviation of just 2.5 Å from the active-state cryo-EM model (PDB ID 6MSM) for resolved residues (1-409, 435-637, 845-889, 900-1173, 1202-1451). We added the following sentence to the manuscript to highlight this.

Page 3, Line 123-125: “A high false positive rate may be attributed to a poorly predicted AlphaFold2 (AF2) structure of CFTR. However, the AF2-predicted CFTR structure [29] shows a root mean squared deviation of just 2.5 Å from the active-state cryo-EM model (PDB ID 6MSM, resolution 3.2 Å [30]) (Supplemental Figure S1B).”
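An RMSD such as the 2.5 Å quoted above is normally computed over matched atoms after optimal rigid-body superposition. The sketch below assumes matched Cα coordinates have already been extracted for the resolved residue ranges listed in the response (parsing the AF2 model and PDB 6MSM is not shown) and applies the Kabsch algorithm with NumPy. The function names are hypothetical, and this is not necessarily the tool the authors used.

```python
import numpy as np

# Resolved residue ranges quoted in the response (inclusive).
RESOLVED = [(1, 409), (435, 637), (845, 889), (900, 1173), (1202, 1451)]


def is_resolved(resnum):
    """True if a residue number falls in one of the resolved ranges."""
    return any(lo <= resnum <= hi for lo, hi in RESOLVED)


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate arrays after optimal
    superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering each set on its centroid.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Demo on synthetic points: a rotated, translated copy should give RMSD ~ 0.
rng = np.random.default_rng(0)
model_ca = rng.normal(scale=10.0, size=(50, 3))
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
reference_ca = model_ca @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(model_ca, reference_ca))
```

In a real comparison, only residues for which `is_resolved` is true and which exist in both models would be paired before calling `kabsch_rmsd`.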

Reviewer #1, Minor Comments:

R1.4.1. How are “variable consequence” variants defined in the CFTR2 repository? I think a brief clarification in the text would be helpful.

Our response: We agree that this is an important technical term to define. We added the following sentences to the first section of Results to clarify “variable consequences”, as well as detailed definitions of all CFTR2.org variant categories to the Methods section.

Page 3, Line 109: “VVCCs are defined by CFTR2 as variants that may cause CF when paired with a CF-causing variant, which results in a variable clinical diagnosis of CF; e.g., a person with a VVCC and a CF-causing variant may or may not present with CF [19].”

Page 8, Line 335: “CFTR2 definitions for these variants are as follows [19]: CF-causing: “A variant in one copy of the CFTR gene that always causes CF, as long as it is paired with another CF-causing variant in the other copy of the CFTR gene.” Non CF-causing: “A variant in one copy of the CFTR gene that does not cause CF, even when it is paired with a CF-causing variant in the other copy of the CFTR gene.” Variant of Variable Clinical Consequence (VVCC): “A variant that may cause CF, when paired with a CF-causing variant in the other copy of the CFTR gene.” Variant of Unknown Significance (VUS): “A variant for which we do not have enough information to determine whether or not it falls into the other three categories.””

R1.4.2. Table 1 lists all 7 variants from the CFTR2 repository that are of unknown consequence. Since the clinical features of individuals carrying these variants are further discussed in part II, it would be helpful to include these data in Supp Table S1 so that all data (such as # of alleles, frequency, sweat chloride levels, etc.) for all CFTR2 variants are available.

Our response: We thank Reviewer 1 for alerting us to this point. The variants of unknown consequence were mistakenly omitted from Supplemental Table S1. We have now included these variants, as well as all variants plotted in Figure 2, such that the Figure 2 plot can easily be reproduced by others from Table S1 with all available data on alleles and clinical outcomes.

R1.4.3. The authors brought up S912L as an example of a predicted benign variant, yet presenting with typical CF phenotypes. Has this variant been examined experimentally? Are individuals with this variant mostly heterozygous? Is it localized in a domain important for function and/or protein biogenesis? Similarly, were any variants from Table 1 examined in the in vitro studies or elsewhere?

Our response: We thank Reviewer 1 for this helpful suggestion. Although S912L was not studied experimentally in the DMS study and was filtered out of the drug response study due to lack of reproducibility, a literature search revealed past experimental studies on this variant and others from Table 1. As there are only 7 patients with S912L in the CFTR2 database, we would venture a guess that they are all heterozygous, although this is not documented specifically. We show the location in the CFTR structure of all variants of unknown significance in Supplemental Figure S1C, although we did not draw attention to this in the first submission of the manuscript – we added a sentence below to highlight this. Additionally, we have added the paragraph below to the manuscript to highlight previous studies of all missense variants of unknown consequence from CFTR2.

Page 3, Line 126: “The location of these variants in the CFTR structure is shown (Supplemental Figure S1C).”

Page 3, Line 127: “Benign-predicted R31L disrupts the arginine-framed tripeptide motif at R29-R31, important for folding evaluation prior to ER export [31], and may affect endocytosis rates [32]. V201M was ambiguously predicted, consistent with our previous report describing this variant as mildly mis-trafficked and selectively sensitive to VX-661 [23]. A439V (benign prediction) and Y1014C (ambiguous prediction) showed trafficking and function slightly below WT [33], suggesting these variants are benign. Benign-predicted variant S912L lies close to the CFTR glycosylation sites at N894 and N900; thus, we speculated this mutation could interfere with glycan processing. Nevertheless, S912L trafficking and function remained sufficient compared to WT in vitro [34].

Variants D924N and M952T, both located in transmembrane helix 8, are predicted as pathogenic (Supplemental Figure S1C). D924N resides in the potentiator binding hotspot [35,36] and, according to clinical data, may cause pancreatic insufficiency but not lung disease [37]. M952T displays robust functional expression in vitro [38], and two patients with an M952T/F508del genotype exhibit normal chloride transport measured from intestinal mucosa [38] – suggesting this variant is likely not pathogenic, despite the AM prediction.”

R1.4.4. If heterozygous data is readily available for other ClinVar/ CFTR2 variants, it would be useful to include this data in the supplemental tables, especially since the authors cited heterozygosity as a potential cause for the relatively poor performance of AM.

Our response: Although this would be nice to present, heterozygous data is not available in CFTR2 or ClinVar.

R1.4.5. A couple sentences detailing the significance/ mode of action of VX-445 and VX-661 in part IV would be helpful for readers not in the immediate CFTR circle.

Our response: We agree with Reviewer 1 that adding some details on the CFTR corrector compounds VX-445 and VX-661 will increase the accessibility of our manuscript to a broader readership. We added the following sentences to the introduction and a condensed summary to section IV of Results.

Page 2, Line 53: “At present, elexacaftor-tezacaftor-ivacaftor (ETI) is the best available highly effective modulator therapy for CF. This triple combination is clinically approved for ~170 CFTR variants, including the most commonly reported allele, deletion of phenylalanine 508 (F508del) [8–11]. ETI is composed of one gating potentiator (ivacaftor, VX-770) and two protein maturation correctors, tezacaftor (VX-661) and elexacaftor (VX-445). The corrector compounds have been suggested to directly bind unique subdomains of CFTR: VX-661 to TMD1 [12,13], and VX-445 to the N-terminal lasso and TMD2 [14,15]. Correctors contribute intermolecular interactions that favor the properly folded, trafficking competent state of CFTR. Due to the distinct binding sites, VX-661 and VX-445 elicit different mechanisms of action and confer variable theratype responses. Thus, profiling CFTR variant theratypes to these and other emerging modulators remains an important priority for CF personalized medicine.”

Page 6, Line 274: “CF treatment involves two corrector compounds, VX-661 and VX-445, that likely bind directly to two unique sites on CFTR [14], show distinct mechanisms, and hence distinct response profiles across variants. Thus, theratyping variant response remains an important task for CF personalized medicine.”

R1.4.6. Are the correlation coefficients calculated in Figure 2 improved when only CF causing variants are taken into account?

Our response: We thank Reviewer 1 for this suggestion; however, the correlation coefficients decrease by 0.05–0.25 when considering only the CF-causing variants. The exception is pancreatic insufficiency rates, which show a very slight increase in correlation (~0.01–0.03) that we interpret as inconsequential.

Correlation with AM score | All data | CF-causing variants only
Sweat chloride | r = 0.46, ρ = 0.48 | r = 0.21, ρ = 0.33
Pancreatic insufficiency rates | r = 0.31, ρ = 0.41 | r = 0.32, ρ = 0.44
Pseudomonas infection rates | r = 0.38, ρ = 0.43 | r = 0.33, ρ = 0.36

As also requested by Reviewer 2 comment 3, we included plots of sweat chloride, pancreatic insufficiency rates, and pseudomonas infection rates separated by CF-causing vs. variable consequence variants in Supplemental Figure S3. We conclude that correlation is best when all data is included and highlight this in the manuscript in the following sentences.

Page 4, Line 193: “When considering CF-causing or VVCC separately, we note a reduced correlation between sweat chloride levels and AM scores (Supplemental Figure S3A-B), suggesting AM captures the trend across all variant types rather than performing better on pathogenic variants.”

Page 4, Line 200: “However, considering CF-causing and VVCCs separately failed to change the correlation for pancreatic insufficiency (Supplemental Figure S3C-D).”

Page 4, Line 205: “yet correlation was again reduced when only CF-causing or VVCCs were separately considered (Supplemental Figure S3E-F).”
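Pearson r measures linear association on the raw values, while Spearman ρ is the Pearson correlation of the ranks. The pure-Python sketch below uses made-up AM scores and sweat chloride values (not CFTR2 data) to show one plausible reason a subset can correlate worse than the full set: restricting to CF-causing variants narrows the range of both variables. The function names and rows are hypothetical, and this is not necessarily how the authors computed their coefficients.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical rows: (AM score, mean sweat chloride in mmol/L, CFTR2 class).
rows = [
    (0.99, 102.0, "CF-causing"), (0.95, 98.0, "CF-causing"),
    (0.90, 105.0, "CF-causing"), (0.70, 60.0, "VVCC"),
    (0.45, 55.0, "VVCC"), (0.30, 40.0, "VVCC"),
]


def correlations(subset):
    """Return (Pearson r, Spearman rho) of AM score vs sweat chloride."""
    am = [r[0] for r in subset]
    sweat = [r[1] for r in subset]
    return pearson(am, sweat), spearman(am, sweat)


r_all, rho_all = correlations(rows)
r_cf, rho_cf = correlations([r for r in rows if r[2] == "CF-causing"])
print(f"all: r={r_all:.2f}, rho={rho_all:.2f}; "
      f"CF-causing: r={r_cf:.2f}, rho={rho_cf:.2f}")
```

With these invented values, ρ is about 0.83 across all six rows but -0.5 within the three CF-causing rows, because the CF-causing subset spans only a narrow band of both variables.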

R1.4.7. Figure 2A has a grey box for “Other”, but no variants in this category were in this analysis.

Our response: Indeed, there are no variants classified as “Other” by the CFTR2.org database and Figure 2 only contains CFTR2.org data. Thus, we removed this legend label.

R1.4.8. Figure 2 legend, line 317: typo “variants were variable consequence were shown..”.

Our response: We corrected this typo to “variants of variable consequence were shown…”.

R1.4.9. Line 194: What does the * refer to?

Our response: We apologize for this error. The asterisk was a placeholder indicating that a citation needed to be inserted here; we have added the proper citation.

R1.4.10. Are there titles and legends for the Supplemental Tables?

Our response: We added a list of Supplemental Tables to the main text. Additionally, we added a description tab to each Excel file containing a brief legend describing the data in the table.

Page 18, Line 482:

“List of Supplemental Tables

Supplemental Table S1: CFTR2.org variant clinical outcome data

Supplemental Table S2: ClinVar CFTR variant data

Supplemental Table S3: In vitro CFTR trafficking and functional data from Bihler et al

Supplemental Table S4: Deep Mutational Scanning CFTR data”

Reviewer #2

In this manuscript, McDonald et al explore the ability of the novel AlphaMissense technology to predict the pathogenicity of CFTR missense mutations. The study is quite relevant and, in general, well designed and presented. There are some aspects that should be considered before acceptance.

We thank Reviewer 2 for this positive assessment of our study.

Reviewer #2, Major Comments

R2.1. Descriptions of what was performed and is being presented as Results are in general very brief, requiring the reader to shift constantly between the main text, the methods, and the figure legends. It would be very beneficial if the main text could include, for each set of results, one or two sentences explaining what was done and what is being presented.

Our response: We agree with Reviewer 2 that including a brief description of Methods in the Results section would help with readability. At the same time, we seek to maintain a succinct Results section for accessibility to a broad readership, including those on the clinical side of CF. As a compromise, we have added the following clarifying sentences to the Results describing the ROC curve calculation, the CFTR2 database curation, and the rationale for filtering the in vitro data.

Page 3, Line 116: “Briefly, all pairwise comparisons were considered: pathogenic, ambiguous, or benign were taken in turn to be a true positive, and the alternative two predictions for a specific comparison were taken to be false positives. We considered pathogenic to predict CF-causing, ambiguous to predict VVCC, and benign to predict non-CF causing; VUS were not used. While looping through all possible score thresholds, the corresponding true positive and false positive rates were calculated and plotted.”
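The per-class ROC procedure described above can be sketched as a one-vs-rest threshold sweep. This is a minimal sketch under stated assumptions, not the authors' code: the toy scores and labels are hypothetical, a variant is "called" for a class when its score meets the threshold, and the benign curve uses 1 − AM score so that larger values mean "more benign". The quoted text does not say how a single score threshold maps onto the ambiguous class, so that curve is left out rather than guessed. `sklearn.metrics.roc_curve` and `sklearn.metrics.auc` would give the same area.

```python
def roc_points(scores, is_positive):
    """One-vs-rest ROC: treat each distinct score as a threshold t, call a
    variant positive when score >= t, and record (FPR, TPR) at each t."""
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        called = [s >= t for s in scores]
        tp = sum(1 for c, p in zip(called, is_positive) if c and p)
        fp = sum(1 for c, p in zip(called, is_positive) if c and not p)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data (NOT the study's variants): (AM score, CFTR2 class)
variants = [
    (0.97, "CF-causing"), (0.91, "CF-causing"), (0.83, "CF-causing"),
    (0.66, "VVCC"), (0.52, "CF-causing"), (0.41, "VVCC"),
    (0.28, "non-CF causing"), (0.12, "non-CF causing"),
]
am = [s for s, _ in variants]

# Pathogenic curve: CF-causing is the positive class; higher AM score = more pathogenic.
path = roc_points(am, [c == "CF-causing" for _, c in variants])
# Benign curve: non-CF causing is the positive class; flip the score so higher = more benign.
benign = roc_points([1 - s for s in am], [c == "non-CF causing" for _, c in variants])

print(f"pathogenic AUC = {auc(path):.2f}, benign AUC = {auc(benign):.2f}")
```

The lowest threshold calls every variant positive, so each curve ends at (1, 1); sweeping only the distinct observed scores is enough because TPR and FPR change only at those values.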

Page 4, Line 184: “Briefly, CFTR2.org data were downloaded from the Variant List History tab and filtered for 176 missense variants (169 classified and 7 VUS). Then, clinical outcome data were manually assembled by searching each variant and recording the sweat chloride (mEq/L), pancreatic insufficiency rate (%), P. aeruginosa infection rate (%), and lung function (forced expiratory volume in one second (FEV1), % predicted).”

Page 5, Line 242: “We removed highly variable experimental data with a standard error of the mean (SEM) greater than 30. Because most CFTR variants show a C:B ratio below 30% of WT, an SEM above 30 indicates a lack of reproducibility for those measurements (8% of data points removed, 92% retained).”

Page 5, Line 251: “Again, we filtered out highly variable experimental data, using an SEM cutoff of 20 because most variants fell below 20% of WT (Supplemental Figure 4; see Methods), leaving 93% of the experimental data for comparison to AM.”
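The SEM-based filtering described in these two passages amounts to dropping any measurement whose SEM exceeds a fixed cutoff (30 for the C:B ratio, 20 for FSK %WT) and reporting the fraction retained. A minimal sketch, assuming one record per variant with a mean %WT and its SEM; the `Measurement` type, variant identifiers, and values are hypothetical, not data from Bihler et al.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    variant: str   # hypothetical identifier
    pct_wt: float  # mean readout as % of WT (e.g. C:B ratio or FSK response)
    sem: float     # standard error of the mean, same units


def filter_by_sem(measurements, max_sem):
    """Drop measurements whose SEM exceeds max_sem; return kept rows and retained fraction."""
    kept = [m for m in measurements if m.sem <= max_sem]
    retained = len(kept) / len(measurements) if measurements else 0.0
    return kept, retained


# Hypothetical values, NOT from Bihler et al.
cb_ratio = [
    Measurement("variant_A", 12.0, 4.0),
    Measurement("variant_B", 25.0, 41.0),  # SEM > 30: removed
    Measurement("variant_C", 80.0, 30.0),  # SEM == 30: kept, since only "greater than 30" is removed
    Measurement("variant_D", 5.0, 2.5),
]
kept, retained = filter_by_sem(cb_ratio, max_sem=30)  # C:B ratio cutoff quoted in the text
print([m.variant for m in kept], f"{retained:.0%} retained")
# ['variant_A', 'variant_C', 'variant_D'] 75% retained
```

The same function would apply to the forskolin data with `max_sem=20`; treating an SEM exactly at the cutoff as retained follows the "greater than" wording but is otherwise an assumption.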

R2.2. When comparing data from CFTR2 and other sources, the authors analyze different numbers of CFTR variants. It is not clear why these specific sets of mutations were chosen (probably due to results being available). It would be good to add this rationale.

Our response: We agree with Reviewer 2 that the number of variants compared across plots is somewhat convoluted. For clinical outcome data from CFTR2, this is simply a matter of data availability: some variants reported in few patients have insufficient data for one or more of the metrics considered (e.g., sweat chloride levels, pancreatic insufficiency rates, and pseudomonas infection rates). We added the following clarifications.

Page 4, Line 182: “We curated the clinical outcomes for all CFTR missense variants with available data (Supplemental Table 1, See Methods).”

Page 4, Line 189: “First, we plotted AM score versus CF sweat chloride levels for 123 missense variants with sweat chloride values reported (Figure 2A).”

Page 5, Line 197: “Next, we plotted AM score versus pancreatic insufficiency rates for 116 missense variants present on at least one allele of persons with CF with CFTR2 outcomes reported (Figure 2B).”

Page 4, Line 170: “Finally, we plotted AM score versus P. aeruginosa infection rates for 114 missense variants on at least one allele with CFTR2 outcomes reported (Figure 2C).”

R2.3. When analyzing the correlation of AM score with P. aeruginosa infection rate, it would be clearer to also provide separate plots for the different groups of individuals (CF-causing/Variable CC/Unknown), probably as supplementary material.

Our response: We included correlation plots of all three clinical outcome metrics (sweat chloride levels, pancreatic insufficiency rates, and pseudomonas infection rates) separated into CF-causing and variable consequence variants in Supplemental Figure S3, as also requested by Reviewer 1, comment 4.6. We draw attention to this Supplemental Figure in the context of pseudomonas infection rates in the following sentence.

Page 5, Line 205: “yet correlation was again reduced when only CF-causing or VVCCs were separately considered (Supplemental Figure 3E-F).”

R2.4. Results should be discussed in light of work published by the B Balch group, especially Anglès et al (2022) Comm Biol, in which the authors present a spatial covariance analysis of the thermodynamic contribution of each residue to the CFTR fold. Would it be feasible to add a comparative analysis of those results with the AM scores?

Our response: We agree with Reviewer 2 that the spatial covariance study from the Balch group provides an excellent dataset to include in our benchmark. In this study, Anglès et al (2022) Comm Biol present a trafficking index and a chloride conductance index for 62 CFTR missense variants at 37 °C and 27 °C. We plotted these in vitro metrics against the respective AlphaMissense scores and included the correlations as an additional Supplemental Figure 5. The following passage highlights this useful addition to the manuscript.

Page 5, Line 257-265: “We verified the increased capability to predict CFTR function by correlating AM scores with a spatial covariance study (Supplemental Figure S5). This study describes trafficking (measured by western blot band shift assay) and chloride conductance indices and presents data for both metrics at 37 °C and reduced temperature (27 °C) [46]. Reduced temperature is a well-established method for partially rescuing F508del biogenesis [51]. We observed a modest correlation (Pearson coefficient: -0.46, Spearman Coefficient: -0.44) with trafficking index at 37 °C, and a similar correlation at 27 °C (Pearson coefficient: -0.48, Spearman Coefficient: -0.49) (Supplemental Figure S5A-B). Again, correlation increased when compared to chloride conductance index (Pearson coefficient: -0.58, Spearman Coefficient: -0.54 at 37 °C vs. Pearson coefficient: -0.50, Spearman Coefficient: -0.53 at 27 °C) (Supplemental Figure S5C-D).”

Reviewer #2, Minor Comments:

R2.5. When mentioning (l.112) the non-responsiveness of mutations at residue 560 to modulators, R560S can probably be added.

Our response: We thank Reviewer 2 for pointing this out; we added R560S, and the text now reads “R560T/K/S”.

R2.6. It is not clear what is meant by “CFTR fitness” (l.172). Do the authors mean “CFTR function”?

Our response: We agree with Reviewer 2 that the term “fitness” is generally ambiguous. We changed “fitness” to “function” or “functional” throughout the manuscript for clarity.

Attachment

Submitted filename: CFTR_AM_benchmark_Review_Rebuttal_final_v2.pdf

Decision Letter 1

Jeffrey L Brodsky

9 Jan 2024

Benchmarking AlphaMissense Pathogenicity Predictions Against Cystic Fibrosis Variants

PONE-D-23-36353R1

Dear Dr. Plate,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jeffrey L Brodsky

Academic Editor

PLOS ONE


Acceptance letter

Jeffrey L Brodsky

17 Jan 2024

PONE-D-23-36353R1

PLOS ONE

Dear Dr. Plate,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jeffrey L Brodsky

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. AlphaMissense prediction of CFTR variants of unknown significance (VUS).

    A. Conservation of residues in CFTR mapped on the structure (PDBID 5UAK) [25,26]. The abundance of green, representing low conservation scores, in the TMDs stands in notable contrast to the AM score predictions of pathogenicity in the TMDs. B. An overlay of the active-state CFTR (PDB ID 6MSM) [30] and the AlphaFold prediction for CFTR [29] showing nearly perfect alignment of all resolved residues (1–409, 435–637, 845–889, 900–1173, 1202–1451). We calculated the root mean square deviation (RMSD) of backbone carbon atoms between these two models in Chimera and found an RMSD of just 2.5 Å. C. Variants of unknown significance (VUS) displayed on the CFTR structure (PDBID 5UAK) [25] show that all of these VUS lie in the transmembrane domains. Benign predicted mutations are shown in green, ambiguous predicted mutations in grey, and pathogenic predicted mutations in purple. The two pathogenic predicted mutations both occur in transmembrane helix 8 (TH8), shown in orange. D. Receiver operating characteristic curve for AlphaMissense predictions of 115 ClinVar variants presented in the AlphaMissense benchmark. The average performance between pathogenic and benign variants is 95.8%, as previously presented [21]. E. Receiver operating characteristic curve for AlphaMissense predictions of 209 variants downloaded directly from ClinVar [24], including 96 overlapping variants from the AlphaMissense benchmark. Again, the average performance is 95.8%. F. Receiver operating characteristic curve for AlphaMissense predictions of the 113 variants downloaded directly from ClinVar [24] that did not overlap with variants from the AlphaMissense benchmark. Despite not having been trained on these ClinVar data, AlphaMissense shows an average performance of 95.8%. G. Overlap between the AlphaMissense ClinVar benchmark set and our extended ClinVar set, showing 115 variants from AM and an additional 113 variants considered in F. Performance of AlphaMissense is very good across all permutations of ClinVar data considered.
    H. Because ClinVar [6] contains a high number of VUS classifications for CFTR missense mutations, we plotted the AlphaMissense score for all 1277 VUS in ClinVar. AM predicts 728 of these to be benign, 181 ambiguous, and 368 pathogenic. Data are available in S2 Table.

    (TIF)
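For reference, the RMSD quoted in S1 Fig B is conventionally defined over N matched atoms after optimal superposition of the two models. The standard definition is given below as a sketch; the exact atom selection and superposition settings used in Chimera are not stated in the caption.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^{2}}
```

Here x_i and y_i are the positions (in Å) of the i-th matched backbone atom in the experimental structure (6MSM) and in the AlphaFold model after superposition, so an RMSD of 2.5 Å is the root of the mean squared per-atom displacement.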

    S2 Fig. Alternative prediction method performance and dataset overlap.

    A. Receiver operating characteristic curve for ESM predictions [39] of 169 CFTR missense variants, including 110 CF-causing, 41 variable clinical consequence (VVCC), and 18 non-CF causing variants. For the pathogenic curve (violet), we considered a pathogenic prediction of a CF-causing variant a true positive. For the ambiguous curve (grey), we considered an ambiguous prediction of a VVCC a true positive. For the benign curve (blue-green), we considered a benign prediction of a non-CF causing variant a true positive. B. Receiver operating characteristic curve calculated as in A but using EVE missense variant predictions [40] for the same 169 CFTR missense variants, colored as in A. C. Venn diagrams depicting the overlap of the datasets used throughout the study: our expanded ClinVar dataset, the deep mutational scanning (DMS) dataset [23], our curated CFTR2 dataset, and the missense variants from the Bihler et al. dataset [33].

    (TIF)

    S3 Fig. AlphaMissense prediction correlations with cystic fibrosis patient pathogenicity metrics by diagnosis.

    A. AM score plotted against sweat chloride levels in milliequivalents per liter (mEq/L) for 85 missense variants classified as CF causing. The linear correlation (Pearson Coefficient r = 0.21, Spearman Coefficient ρ = 0.33) is reduced compared to the complete data set correlation shown in Fig 2A. B. AM score plotted against sweat chloride levels for 33 missense variants classified as variants of variable clinical consequence (VVCC). The linear correlation (Pearson Coefficient r = -0.12, Spearman Coefficient ρ = -0.11) is statistically insignificant. C. AM score plotted against pancreatic insufficiency rates in percent for 83 missense variants classified as CF causing. The correlation (Pearson Coefficient r = 0.32, Spearman Coefficient ρ = 0.44) was similar to the entire dataset in Fig 2B. D. AM score plotted against pancreatic insufficiency rates for 30 missense variants classified as VVCC. The correlation for these data (Pearson Coefficient r = -0.21, Spearman Coefficient ρ = -0.22) was statistically insignificant. E. AM score plotted against pseudomonas infection rates for 82 missense variants classified as CF-causing. Linear correlation is reduced compared to the entire data set presented in Fig 2C (Pearson Coefficient r = 0.33, Spearman Coefficient ρ = 0.36). F. AM score plotted against pseudomonas infection rates in percent for 28 missense variants classified as VVCC. Linear correlation was insignificant (Pearson Coefficient r = 0.13, Spearman Coefficient ρ = 0.28).

    (TIF)
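The S3 Fig caption does not state which significance test was used. Assuming the standard t-test for a Pearson coefficient r computed from n variants, the statistic and a worked check for panel B (r = -0.12, n = 33) are:

```latex
\begin{aligned}
t &= r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad \nu = n-2 \\
t_{\mathrm{S3B}} &= -0.12\sqrt{\frac{31}{1-0.0144}} \approx -0.12 \times 5.61 \approx -0.67
\end{aligned}
```

With ν = 31 degrees of freedom, the two-sided 5% critical value is about 2.04, so |t| ≈ 0.67 is consistent with the caption's description of this correlation as statistically insignificant.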

    S4 Fig. Distributions of experimental data and error from Bihler et al. study for filtering purposes.

    A. Histogram of the distribution of C-B band ratio of all 585 missense variants from the Bihler et al. study [33]. B. Histogram of the distribution of C-B band ratio SEM for all 585 missense variants from the Bihler et al. study. Variants with an SEM greater than 30 were excluded from analysis due to lack of experimental reproducibility and for plotting clarity. C. FSK %WT distribution plotted as a histogram for all 585 missense variants from the Bihler et al. study [33]. D. FSK %WT SEM distribution plotted as a histogram for all 585 missense variants from the Bihler et al. study. Variants with an SEM greater than 20 were excluded from analysis due to lack of experimental reproducibility and for plotting clarity.

    (TIF)

    S5 Fig. AlphaMissense correlation with CFTR in vitro data from spatial covariance study.

    A. Spatial covariance data from a previous study [46] for 62 missense variants plotted against AlphaMissense scores. The y axis represents the trafficking index as measured by a western blot trafficking assay with HEK293T cells incubated at 37 °C. A modest inverse linear correlation was observed (Pearson Coefficient r = -0.46, Spearman Coefficient ρ = -0.44). B. Spatial covariance data for 62 missense variants using the trafficking index in A, except at 27 °C, plotted against AlphaMissense scores. Again, an inverse linear correlation was observed (Pearson Coefficient r = -0.48, Spearman Coefficient ρ = -0.49), slightly stronger than at 37 °C. C. AlphaMissense scores correlated with the spatial covariance data using the chloride conductance index described in [46], which measures channel activity, at 37 °C. We observed an increased correlation (Pearson Coefficient r = -0.58, Spearman Coefficient ρ = -0.54). D. AlphaMissense scores correlated with the chloride conductance index at 27 °C. We observed a modest inverse correlation (Pearson Coefficient r = -0.50, Spearman Coefficient ρ = -0.53), slightly weaker than at 37 °C.

    (TIF)

    S6 Fig. Deep mutational scanning data for VX-661 and VX-445 response colored by AlphaMissense pathogenicity score.

    A. Basal CFTR cell surface immunostaining versus VX-661-treated CFTR cell surface immunostaining intensity [23]. Pathogenic variants score 0.56–1.00 (violet), ambiguous variants 0.34–0.56 (grey), and benign variants 0.04–0.34 (green). Error bars represent standard deviation. The distribution of pathogenicity colors throughout the plot suggested that the AM pathogenicity score failed to predict the VX-661 response. B. Basal CFTR cell surface immunostaining versus VX-445-treated CFTR cell surface immunostaining intensity, colored as in A. Error bars represent standard deviation. Again, the AM score failed to predict the VX-445 response. C. Basal CFTR cell surface immunostaining versus VX-661 + VX-445-treated CFTR cell surface immunostaining intensity, colored as in A. Error bars represent standard deviation. Finally, the AM score failed to predict the response to combined VX-661 and VX-445 on a per-variant basis.

    (TIF)
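The three color classes used in S6 and S7 Figs (and the benign/ambiguous/pathogenic counts in S1 Fig H) amount to binning each variant's AM score with two cutoffs. A minimal sketch using the cutoffs quoted in the captions; the function names are illustrative, the example scores are hypothetical, and because the quoted ranges share their endpoints, assigning a score of exactly 0.34 or 0.56 to the higher class is an assumption made here for illustration.

```python
from collections import Counter

# Cutoffs as quoted in the S6/S7 captions (benign 0.04-0.34, ambiguous 0.34-0.56,
# pathogenic 0.56-1.00). Scores exactly at 0.34 or 0.56 go to the higher class:
# this boundary convention is an assumption of the sketch.
AMBIGUOUS_MIN = 0.34
PATHOGENIC_MIN = 0.56


def am_class(score: float) -> str:
    """Map an AlphaMissense score in [0, 1] to the figures' three color classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} outside [0, 1]")
    if score >= PATHOGENIC_MIN:
        return "pathogenic"  # violet in the figures
    if score >= AMBIGUOUS_MIN:
        return "ambiguous"   # grey
    return "benign"          # green


def tally(scores):
    """Count how many scores fall in each class."""
    return Counter(am_class(s) for s in scores)


# Hypothetical scores, NOT from the supplementary tables.
print(tally([0.99, 0.87, 0.48, 0.21, 0.07]))
```

Applied to per-variant scores for the ClinVar VUS, a tally of this kind is how class counts such as those in S1 Fig H would be obtained.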

    S7 Fig. CFTR modulator response plots colored by AlphaMissense pathogenicity score reveal little predictive capability of AM in theratyping.

    A. Basal ratio of mature CFTR (C band) to immature CFTR (B band) (C-B ratio) in percent WT versus modulator-enhanced C-B ratio in percent WT from the Bihler et al. study [33]. Error bars were omitted for clarity. Variants with an AlphaMissense pathogenicity score from 0.56–1.00 were classified by AM as pathogenic (violet), a score from 0.34–0.56 as ambiguous (grey), and a score from 0.04–0.34 as benign (green). The distribution of colors across the plots indicated little predictive capability of AM for the trafficking theratype. B. Basal FSK-stimulated CFTR activity in percent WT versus modulator-enhanced FSK-stimulated CFTR activity in percent WT for TMD1 variants only from Fig 4B (data from the Bihler et al. study [33]). Error bars were omitted for clarity, and AM-predicted pathogenicity is colored as in A. C. As in B, for NBD1 variants only from Fig 4B; data from the Bihler et al. study [33]. D. As in B, for TMD2 variants only from Fig 4B; data from the Bihler et al. study [33]. E. As in B, for NBD2 variants only from Fig 4B; data from the Bihler et al. study [33].

    (TIF)

    S1 Table. CFTR2.org variant clinical outcome data.

    (XLSX)

    S2 Table. ClinVar CFTR variant data.

    (XLSX)

    S3 Table. In vitro CFTR trafficking and functional data from Bihler et al.

    (XLSX)

    S4 Table. Deep mutational scanning CFTR data.

    (XLSX)


    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
