PLOS Genetics. 2020 Jun 22;16(6):e1008862. doi: 10.1371/journal.pgen.1008862

A Bayesian method to estimate variant-induced disease penetrance

Brett M Kroncke 1,2,3,*, Derek K Smith 4, Yi Zuo 4, Andrew M Glazer 1,2, Dan M Roden 1,2,3,5, Jeffrey D Blume 4
Editor: Leslie Biesecker6
PMCID: PMC7347235  PMID: 32569262

Abstract

A major challenge emerging in genomic medicine is how best to assess disease risk from rare or novel variants found in disease-related genes. The expanding volume of data generated by very large phenotyping efforts coupled to DNA sequence data presents an opportunity to reinterpret genetic liability of disease risk. Here we propose a framework to estimate the probability of disease given the presence of a genetic variant conditioned on features of that variant. We refer to this as the penetrance, the fraction of all variant heterozygotes that will present with disease. We demonstrate this methodology using a well-established disease-gene pair, the cardiac sodium channel gene SCN5A and the heart arrhythmia Brugada syndrome. From a review of 756 publications, we developed a pattern mixture algorithm, based on a Bayesian Beta-Binomial model, to generate SCN5A penetrance probabilities for Brugada syndrome conditioned on variant-specific attributes. These probabilities are determined from variant-specific features (e.g. function, structural context, and sequence conservation) and from observations of affected and unaffected heterozygotes. Variant functional perturbation and structural context prove most predictive of Brugada syndrome penetrance.

Author summary

The clinical implications for genetic variants, even definitively pathogenic variants, can vary strikingly across individuals. Lack of evidence to estimate the probability of disease from identified genetic variants, especially rare variants, presents a major barrier to integrating genotype information into clinical care. Here we advance an approach to estimate the penetrance, or positive predictive value of the discovery of a genetic variant, in service of advancing the use of genetic information in personalized medicine.

Introduction

A major barrier to integrating genotype information into clinical care is accurately linking genetic variants to disease risk. As cheap whole genome, exome, and gene panel sequencing becomes more widely used, the genetics community frequently observes novel, ultra-rare variants, that is, ones carried by a single or few (often related) individuals. Indeed, most variants found in large population genome sequencing efforts are novel or ultra rare [1–4]. The number of possible single nucleotide variants in the human genome is in the billions; the number of variants becomes uncountable if insertions and/or deletions (indels) are included. The majority of these discovered variants will never be observed in a sufficient number of heterozygotes to ascertain a causal link with disease. In addition to finding rare variants, large-scale genetic sequencing efforts taking place around the world are identifying greater numbers of individuals, ostensibly unaffected, who carry variants previously thought to be disease-inducing [5, 6]. As a consequence of insufficient heterozygote counts and conflicting annotations, many diagnostic laboratories annotate such variants as “Variants of Uncertain Significance” (VUS), despite more confident past assessments of “Likely Pathogenic” or “Pathogenic” [7–10].

To help assess the impact of genetic variants, the American College of Medical Genetics and Genomics (ACMG) suggests integrating multiple sources of information including population, functional, computational, and segregation data to classify variants [11, 12]. This is consistent with a continuous, Bayesian framework where each additional satisfied classification criterion modifies the probability a variant is causative for disease (pathogenic) or not (benign) [12]. Given the resulting probabilities, a final classification can be made into one of the five categories commonly used to distinguish variants—benign, likely benign, variant of uncertain significance, likely pathogenic, or pathogenic. However, a remaining challenge even after classification is that the clinical implications for definitively pathogenic variants can vary strikingly across individuals, including variable expressivity and incomplete penetrance [13]. We attempt here to address one aspect of this clinical variability by developing a method to estimate variant-induced disease risk.

In this study, we sought to develop a method to estimate the probability of disease given variant-specific information–which we refer to as the penetrance of a variant–and we also provide the uncertainty for that estimate. The pathogenicity of a variant for a specific individual at a given point in time is binary but unknown. This pathogenicity may have a time dependence such as for diseases which present later in life. Penetrance is one metric that captures the degree to which the pathogenicity will manifest as a human phenotype such as a disease or a trait. We provide posterior probability estimates of the penetrance, asymptotic with respect to age, which can be thought of as the positive predictive value of disease given the known variant information. We also provide a 95% credible interval that represents the uncertainty in that estimate. Our method relies on “borrowing strength” or sharing information across variants to produce variant-specific, quantitative penetrance estimates even in the absence of a large number of heterozygotes. These estimates can be especially informative for interpreting rare and novel variants.

We illustrate our approach using the rare cardiac arrhythmia disorder Brugada Syndrome (BrS1 [MIM: 601144]), which is linked to rare loss-of-function variants in the cardiac sodium channel SCN5A [14]. These variants most commonly act by altering peak sodium current, a parameter of sodium channel function that is readily assessed using in vitro methods. By quantitatively integrating multiple features, including in vitro functional experiments, information about the three-dimensional protein structure, and previously published variant-classifiers, such as PolyPhen-2 and PROVEAN, we estimate the BrS1 penetrance attributable to individual SCN5A variants. The resulting priors, imputed from these predictive features, can be readily interpreted as hypothetical observations of unaffected and affected heterozygotes.

Results/Discussion

Variants in SCN5A have been associated with BrS1 since 1998 [15], with some variants affecting almost all known heterozygous individuals, some conferring only modestly increased risk, and others having no influence on arrhythmia presentation [14, 16, 17]. SCN5A variants that do not influence the gene in any way, e.g. many synonymous variants, neither predispose to nor protect against BrS1. These variants therefore have a relatively low penetrance of the arrhythmia, similar to the general population. SCN5A variants that produce no sodium current result in a higher fraction of heterozygotes presenting with BrS1, much higher than in the general population [18]. However, BrS1 presentation, as for nearly all inherited diseases, is not homogeneous even amongst heterozygotes of SCN5A haploinsufficiency alleles. In fact, even highly penetrant variants such as p.Glu1784Lys still leave some heterozygotes unaffected: 100% penetrance is extremely rare [18].

Our hypothesis is that variant-specific features (e.g. variant-induced changes in function and location in structure) contain information equivalent to clinically phenotyping heterozygotes and can therefore be used to inform the prior distribution in a Bayesian framework. This prior distribution is combined directly with clinically phenotyped heterozygotes (the likelihood function) to produce more accurate estimates of disease risk probability (posterior penetrance; Fig 1) via Bayes' theorem. To demonstrate this approach, we developed an expectation maximization (EM) approach, detailed in the Materials and Methods section, and applied it to a previously generated dataset of SCN5A features and BrS1 phenotype counts [18] (supplemented with reports published within the last year) to estimate BrS1 penetrance using SCN5A variant-specific features. This process yielded a total of 1,439 unique variants with at least 1 observed heterozygote; BrS1 was diagnosable in 857 individuals heterozygous for 387 unique variants (S1–S3 Figs). BrS1 penetrance priors informed by the predictive features listed in S1 Table adjust and narrow the uncertainty, as shown in Fig 1.

Fig 1. Penetrance priors are informed by variant-specific features.

Probability density (y-axis) versus penetrance (x-axis) for three selected SCN5A variants where peak current, penetrance density, and in silico classification are known. Numbers of affected and unaffected individuals reported are presented for each variant. Penetrance priors are low for c.3922C>T (p.Leu1308Phe; Benign according to ClinVar), moderate for c.4978A>G (p.Ile1660Val; VUS), and higher for c.2632C>T (p.Arg878Cys; Pathogenic). When variant-specific data are known, the penetrance estimate is adjusted to reflect the penetrance probability consistent with variants with similar features.

Precision and accuracy of BrS1 penetrance priors

To evaluate performance over the distribution of BrS1 prior penetrances (S5 Fig), we plotted the difference between prior mean and posterior mean BrS1 penetrance as a function of the average of the two estimates (Fig 2). The resulting Bland-Altman difference plot indicates scatter evenly distributed between under- and over-predicted BrS1 penetrance as a function of prior mean penetrance. This suggests the predictive priors are reasonably calibrated and have no systematic biases in the range of BrS1 mean penetrance estimated. We additionally compared linear regression models trained on a limited subset of features/covariates with the BrS1 mean posterior, $\frac{\text{BrS1 cases} + \alpha_{\text{prior}}}{\text{total heterozygotes} + \alpha_{\text{prior}} + \beta_{\text{prior}}}$ (where $\alpha_{\text{prior}}$ and $\beta_{\text{prior}}$ are the tuning parameters for the beta-binomial distribution and are set equivalent to the number of affected and unaffected individual heterozygotes in the prior), as the dependent variable; both empirical and EM priors were evaluated as indicated in Table 1. Peak current and penetrance density (a modification of a structure-derived feature we developed previously [19]; see S1 Text) contain orthogonal information, as can be seen by the differences in coefficient of determination, R², for models built using each or both predictors (Table 1). The relatively small improvement in R² when all predictors are included suggests most information contained in the sequence-based predictive features is recapitulated by both peak current and penetrance density.
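The Bland-Altman comparison described here reduces to two quantities per variant: the average of the two estimates (x-axis) and their difference (y-axis). The following is a minimal sketch of that calculation only, not the authors' plotting code; the function names and the three example values are hypothetical.

```python
def bland_altman_points(prior_means, posterior_means):
    """Return (average, difference) pairs, one per variant."""
    if len(prior_means) != len(posterior_means):
        raise ValueError("inputs must have the same length")
    points = []
    for prior, posterior in zip(prior_means, posterior_means):
        average = (prior + posterior) / 2.0   # x-axis of the plot
        difference = prior - posterior        # y-axis of the plot
        points.append((average, difference))
    return points


def mean_difference(points):
    """Mean of the differences; a value near 0 suggests no systematic bias."""
    return sum(diff for _, diff in points) / len(points)


# Hypothetical prior and posterior mean penetrances for three variants.
points = bland_altman_points([0.10, 0.30, 0.50], [0.12, 0.28, 0.50])
bias = mean_difference(points)
```

A mean difference close to zero, with scatter that does not trend along the x-axis, corresponds to the "no systematic bias" reading given in the text.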

Fig 2. Bland-Altman plot between EM prior and EM posterior mean penetrances for all SCN5A variants.

To assess the performance of the EM prior, we used a Bland-Altman plot to compare the mean BrS1 penetrance estimated from the EM prior and from the EM posterior; the y-axis is the difference between the two and the x-axis is the average of the two. For each plotted point, both color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The relatively consistent scatter about y = 0 suggests no systematic biases present in the EM prior mean BrS1 estimates.

Table 1. Weighted R² from EM prior means to Empirical/EM posterior means.

Models trained with displayed subsets of features using the same subset of variants, where covariates listed in S1 Table are known.

Features                               Empirical†               EM†
Peak Current                           0.22 [0.12–0.34; 155]    0.35 [0.24–0.45; 20]
Penetrance Density                     0.35 [0.20–0.49; 113]    0.66 [0.53–0.76; -124]
Peak Current and Penetrance Density    0.43 [0.27–0.57; 88]     0.76 [0.66–0.83; -201]
All Features                           0.44 [0.28–0.59; 90]     0.78 [0.69–0.85; -218]
Sequence-based Features                0.12 [0.06–0.19; 189]    0.20 [0.12–0.28; 74]

† Weighted R² [95% Confidence Interval; Akaike information criterion], weighted by inverse beta-binomial variance capped at the 9th decile as described in the methods section

Inclusion of individuals from gnomAD

Individuals in gnomAD are mostly unaffected, given the rarity of BrS; however, the data available from that resource could be contaminated with individuals presenting with BrS, though likely at or near the rate in the general public. To test the sensitivity of our results to this type of misclassification, we randomly switched individuals from unaffected (gnomAD) to BrS cases for each variant and examined the change in penetrance due to misclassification. We did this with 24 and 240 misclassified cases. With 24 misclassifications, the median rate of penetrance change is 0.4% and the expected number of variants with a penetrance change is 6. The average mean absolute difference in penetrance change is 0.02% (first quartile of 0.0014% and third quartile of 0.02%). With 240 misclassifications, the median rate of penetrance change is 2%, and the expected number of variants with a penetrance change is 28. The average mean absolute difference in penetrance change is 0.2% (first quartile of 0.1% and third quartile of 0.3%). These results suggest minimal influence of small or modest misclassification rates on penetrance estimates.
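The sensitivity check described above can be sketched as follows. This is an illustrative simplification, not the authors' procedure: it assumes gnomAD heterozygotes are already counted among each variant's controls, uses a single prior for all variants, and the counts, the `gnomad` field name, and both function names are hypothetical.

```python
import random


def posterior_mean(cases, controls, alpha_prior, beta_prior):
    """Beta-binomial posterior mean penetrance."""
    return (cases + alpha_prior) / (cases + controls + alpha_prior + beta_prior)


def flip_gnomad_controls(variants, n_flips, seed=0):
    """Reclassify n_flips randomly chosen gnomAD heterozygotes as BrS cases.

    Each variant is a dict with 'cases', 'controls', and 'gnomad' counts;
    gnomAD heterozygotes are assumed to be included in 'controls'.
    """
    rng = random.Random(seed)
    # One pool entry per gnomAD individual, so common variants are hit more often.
    pool = [i for i, v in enumerate(variants) for _ in range(v["gnomad"])]
    flipped = [dict(v) for v in variants]
    for i in rng.sample(pool, n_flips):
        flipped[i]["cases"] += 1
        flipped[i]["controls"] -= 1
        flipped[i]["gnomad"] -= 1
    return flipped


def absolute_changes(variants, flipped, alpha_prior, beta_prior):
    """Absolute change in posterior mean penetrance for each variant."""
    return [
        abs(posterior_mean(f["cases"], f["controls"], alpha_prior, beta_prior)
            - posterior_mean(v["cases"], v["controls"], alpha_prior, beta_prior))
        for v, f in zip(variants, flipped)
    ]


# Hypothetical counts, not taken from the dataset.
variants = [
    {"cases": 0, "controls": 120, "gnomad": 120},
    {"cases": 3, "controls": 15, "gnomad": 10},
    {"cases": 1, "controls": 2, "gnomad": 0},
]
flipped = flip_gnomad_controls(variants, n_flips=5)
changes = absolute_changes(variants, flipped, alpha_prior=0.45, beta_prior=2.73)
```

Repeating the flip over many seeds and summarizing `changes` would give quantities of the kind reported above (median change, quartiles); variants with no gnomAD heterozygotes are never affected.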

Structure and peak current improve prediction of penetrance

The resulting prior BrS1 mean penetrance estimates reflect the known topology of NaV1.5 (protein product of SCN5A; Fig 3), with the sodium channel pore and selectivity filter inducing a greater disease burden as previously observed [18, 20]. Fig 4 examines in greater detail a small region within domain III (D-III), showing the 95% credible interval of BrS1 penetrance both before (prior) and after (posterior) adding heterozygote counts listed on the left. The selectivity filter has the highest average BrS1 prior and posterior, also true for domains I, II, and IV (Fig 3). Towards the intracellular side of the D-III S6 helix, there are fewer variants with high BrS1 penetrance. This trend can also be seen in S6 Fig which shows an increase in variants associated with BrS1 that depends on membrane depth of the variant. These results support our assertion that variant-specific predictive features of variant-induced functional perturbation and structural context contain information equivalent to clinically phenotyping individuals heterozygous for these variants. The interchangeability of this information was additionally demonstrated recently by taking the reverse approach: functionally characterizing variants with different estimates of BrS1 penetrance [21]. In these experiments, Glazer et al. found that variants with a higher estimated BrS1 penetrance had a higher probability of producing a variant-induced loss-of-function protein phenotype (Fig 3A in reference 21).

Fig 3. Prior mean BrS1 penetrance reflects the protein topology of NaV1.5.

The predicted mean BrS1 penetrance from the converged expectation maximization (EM) algorithm. The line across the plot is a predicted mean BrS1 penetrance averaged over 30 neighboring variants. Topology diagram is shown above with transmembrane helices indicated by yellow lines and membrane indicated as a grey rectangle. Note the four largest, distinct peaks correspond to the four structured, transmembrane domains of the channel, with an especially steep peak at the selectivity filter and pore. Though estimated distances in three-dimensional space between residues are used to construct the BrS1 penetrance density, structural data are not explicitly used in the BrS1 penetrance prior and so the recapitulation of the structure is not assured.

Fig 4. Sample of BrS1 penetrance prior 95% credible intervals.

Left: SCN5A variants with more than one heterozygote in our dataset are plotted with prior 95% credible intervals (colored bars) and mean posteriors (black rectangles) with posterior 95% credible intervals (black lines). Right: a model of the SCN5A protein product, NaV1.5, is shown with the regions highlighted in blue, green, gold, and red, corresponding to the colors of the variant prior 95% credible intervals shown to the left, which are analogous to the penetrance probability distributions shown on the y-axes in Fig 1. Variants near the D-III pore selectivity filter have a much higher prior and posterior BrS1 penetrance compared to residues near the D-III/D-IV linker. This is expected since the selectivity filter pore helices contain the most compacted region of the protein and also are responsible for the ion conduction and are therefore most sensitive to substitution. In fact, the highest density of variants with non-zero BrS1 penetrance lie at this depth in the membrane (S6 Fig). Variants listed are c.4057G>A (p.Val1353Met), c.4070C>T (p.Ala1357Val), c.4109A>G (p.Asp1370Gly), c.4132G>A (p.Val1378Met), c.4140C>G (p.Asn1380Lys), c.4137_4139CAA (p.Asn1380del), c.4145G>T (p.Ser1382Ile), c.4171G>A (p.Gly1391Arg), c.4192G>A (p.Val1398Met), c.4213G>C (p.Val1405Leu), c.4213G>A (p.Val1405Met), c.4217G>A (p.Gly1406Glu), c.4216G>A (p.Gly1406Arg), c.4222G>A (p.Gly1408Arg), c.4258G>C (p.Gly1420Arg), c.4259G>T (p.Gly1420Val), c.4282G>T (p.Ala1428Ser), c.4283C>T (p.Ala1428Val), c.4288G>A (p.Asp1430Asn), c.4296G>T (p.Arg1432Ser), c.4297G>T (p.Gly1433Trp), c.4328A>G (p.Asn1443Ser), c.4333T>C (p.Tyr1445His), c.4342A>C (p.Ile1448Leu), c.4346A>G (p.Tyr1449Cys), c.4381A>T (p.Thr1461Ser), c.4414_4417AAC (p.Asn1472del), c.4418T>G (p.Phe1473Cys), c.4427A>G (p.Gln1476Arg), c.4459A>C (p.Met1487Leu), c.4467G>T (p.Glu1489Asp).

A modified Bayesian approach to estimate BrS penetrance

A typical Empirical Bayes approach combines information across all variants to estimate a single prior distribution and then estimates a variant-specific posterior penetrance from that prior. These estimates assume all variant effects have the same prior and therefore shrink towards a global mean across all variants. Here we put forward a method to model the penetrance for each variant using variant-specific predictive features. The resulting penetrance and uncertainty estimates yield a posterior that can be re-used as a variant-specific prior (interpretable as equivalent to hypothetical observations of affected and unaffected heterozygotes) in a classical Bayesian updating scheme. This information is accessible before clinically phenotyping a single heterozygote; example estimates of high BrS1 penetrance [c.4213G>C (p.Val1405Leu), c.4259G>T (p.Gly1420Val), and c.4258G>C (p.Gly1420Arg)] and low BrS1 penetrance [c.4418T>G (p.Phe1473Cys), c.4459A>C (p.Met1487Leu), and c.4467G>T (p.Glu1489Asp)] are seen in Fig 4.

Comparison between penetrance prediction and ACMG variant classification

We put forward a method to estimate the probability that an SCN5A variant will manifest in BrS1 for a given patient (our ‘risk score’), and uncertainty for that score, conditioned on variant attributes. We are not assessing the causality of the variant and its attributes on the manifestation of disease, but rather their association. Hence, our framework diverges from that of the ACMG, quantitated by Tavtigian et al. 2018. For example, in our formulation, a VUS with many affected heterozygotes would have the same probability distribution as a pathogenic variant with many affected heterozygotes [provided the number of observations of cases and controls is the same and the other predictive covariates (variant attributes) are the same]. If there are comparatively few heterozygotes of the VUS, given the same predictive covariates, greater uncertainty would be reflected by a wider distribution of penetrance probability (Fig 1). In addition, our calculation is agnostic to origin, de novo or inherited, and therefore does not consider this evidence (though this information may additionally inform an estimate of penetrance and therefore warrants further investigation). We also do not treat null variants here. For our purposes of building variant-specific, data-driven penetrance priors, null variants have relatively little variance in the predictive covariates and therefore contribute less to our analysis. In future work we will additionally attempt to include these features.

Prospects for applications of this method

Our approach provides a risk score for disease, in this case, for BrS1. However, Brugada syndrome has degrees of electrophysiologic phenotypes and symptoms. We envision being able to predict these degrees of clinical phenotype from variant-specific properties in the future by integrating electronic health records with linked genetic data. However, at present, these granular electrophysiologic and symptom data are not available for a number of unique heterozygotes and unique variants sufficient for statistical analysis. Beyond SCN5A and BrS1, a reasonable next step would involve the 59 genes for which the ACMG recommends clinical diagnostic laboratories report secondary variant discovery. Of these, 36 have greater than or equal to 20 missense “pathogenic”/“likely pathogenic” variants in ClinVar [22], suggesting that many variants are described in the literature and can be curated in a similar manner to SCN5A. It is also important to note that the penetrance estimates derived in our approach are not static and will continue to be refined as additional data become available, i.e. phenotype data from case reports and large biobank projects, additional in vitro functional studies, and improved computational and structural predictors [13, 23–26].

Limitations

Our approach provides a risk score for disease, in this case BrS1, analogous to a diagnostic test (might patient X develop BrS1 given they have variant Y). If we know patient X already has BrS1, we can use their data to inform other individuals’ risk scores, but we cannot use our approach to absolutely determine the role of variant Y in manifesting disease. One application of our approach is that we can examine the ratio P(BrS1|SCN5A Variant X)/P(BrS1|wild-type SCN5A) to see if the data better support that variant X is on the causal pathway to disease. But we caution that this approach is imperfect; it does not allow for variants to interact, for example. Additionally, while clinical evidence affirms a strong relationship between SCN5A variants and BrS1, many genetic and environmental factors influence the ultimate presentation of BrS1 in an individual [13, 27, 28]. Not accounting for additional demographic, genetic, or environmental factors certainly increased the noise in our analysis. To counter this as best as possible, we included the maximum number of carriers for the maximum number of unique variants. Finally, we recognize the likely bias intrinsic to compiling a list of affected and unaffected heterozygotes in the manner outlined in the Materials and methods section; however, the most probable manifestation of these biases would be the loss of an observable relationship between the predictive features and penetrance, not the creation of a spurious relationship.

Conclusions

We advance a method to estimate one aspect of clinical heterogeneity in variant impact, incomplete penetrance. Here we have demonstrated how BrS1 penetrance can be estimated with high accuracy and precision. Using a Bayesian framework to estimate penetrance allows us to quantitatively integrate clinical phenotypic data with variant-specific functional measurements, variant classifiers, and sequence- and structure-based features. This method can be extended to other genes and disorders to enable probabilistic, quantitative interpretation of variants [24, 29].

Materials and methods

These analyses focus on the SCN5A gene, where individual variants are known to influence the clinical presentation of the autosomal dominant arrhythmia Brugada Syndrome (BrS1) [16, 17]. We define cases as individuals with either a spontaneous or drug-induced ECG BrS1 pattern, ST-segment abnormalities, as reported in each publication [18, 30]. Penetrance is defined as the fraction of individuals who carry a variant that also present with a disease. This can be extracted from literature reports when multiple variant heterozygotes have been reported. We do not observe the actual penetrance for any given variant; however, we can estimate BrS1 penetrance for each variant as the average posterior penetrance denoted as the following:

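$$\text{Mean Posterior Penetrance} = \frac{\alpha + \alpha_{\text{prior}}}{\alpha + \beta + \alpha_{\text{prior}} + \beta_{\text{prior}}} \qquad \text{(Eq 1)}$$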

Where α is the number of variant heterozygotes diagnosed with BrS1 (or BrS1 cases) and β is the number of unaffected heterozygotes of the same variant (or controls). As the total number of observed heterozygotes increases, the estimated penetrance converges to the traditional definition. The mean posterior penetrance can be thought of as a shrunken estimate of the observed penetrance [31], especially for variants with small numbers of known heterozygotes.
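As a minimal sketch of Eq 1, the function below computes the mean posterior penetrance from case and control counts and a beta prior. The function name is hypothetical; the prior values 0.45 and 2.73 are the empirical prior reported later in the methods, used here only for illustration.

```python
def mean_posterior_penetrance(cases, controls, alpha_prior, beta_prior):
    """Mean of the beta posterior: (alpha + alpha_prior) / (alpha + beta + alpha_prior + beta_prior)."""
    total = cases + controls + alpha_prior + beta_prior
    if total <= 0:
        raise ValueError("need at least one real or hypothetical observation")
    return (cases + alpha_prior) / total


# One affected heterozygote and no controls: observed penetrance is 1.0,
# but the posterior mean is pulled strongly towards the prior (about 0.347).
single = mean_posterior_penetrance(1, 0, alpha_prior=0.45, beta_prior=2.73)

# With 50 cases and 50 controls the data dominate (about 0.489, near 0.5).
many = mean_posterior_penetrance(50, 50, alpha_prior=0.45, beta_prior=2.73)
```

The contrast between the two calls illustrates the shrinkage described above: the fewer heterozygotes observed, the more the estimate reflects the prior rather than the raw fraction of affected heterozygotes.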

To generate priors from our available data, we use a variation of the expectation maximization (EM) algorithm [32]. Our modified EM algorithm is an iterative technique composed of three steps: 1) calculate the expected penetrance from an empirical Bayes penetrance model, 2) fit a regression model of our estimated penetrance on variant-specific characteristics by maximum likelihood (Eq 2, below) and 3) revise our estimate of the BrS1 penetrance prior using the fit from step 2 then iterate steps 2–3 until convergence criteria are satisfied (S7 Fig).

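$$\text{Penetrance Estimate}_i = \beta_0 + \beta_1 (\text{Peak Current})_i + \beta_2 (\text{Penetrance Density})_i + \sum_n \beta_{i,n} (\text{In Silico Variant Classifiers})_{i,n} + \varepsilon_i \qquad \text{(Eq 2)}$$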

Here peak current is an in vitro measurement of the maximum current through a channel (normalized to wild type), penetrance density is a structure-based metric [19] detailed in the S1 Text, and in silico variant-classifiers is a vector populated with commonly used variant classification servers such as PROVEAN and PolyPhen (see below); all predictors used are continuous, not categorical or binary (S1 Table). The fitted model is then used to generate an updated prior distribution and, by addition of observed cases and controls for each variant, a subsequent posterior expected penetrance. The updated posterior penetrance is then used to build a new fitted model and further refine the posterior expected penetrance. This procedure is iterated until it converges to the maximum likelihood solution (S7 Fig). Using a beta-binomial model to estimate penetrance, the prior parameters ($\alpha_{\text{prior,EM}}$ and $\beta_{\text{prior,EM}}$, both functions of the features listed in S1 Table) are identifiable from a predicted penetrance point estimate and its associated variance. For comparison, we generated predicted penetrance values using a standard empirical Bayes method which generated a single empirical prior for all variants, $\alpha_{\text{prior,empirical}}$ and $\beta_{\text{prior,empirical}}$ equal to 0.45 and 2.73, respectively (called empirical prior throughout the text, S8 Fig). To test our predictions, we compare our EM penetrance priors, $\frac{\alpha_{\text{prior,EM}}}{\alpha_{\text{prior,EM}} + \beta_{\text{prior,EM}}}$, to the posterior mean penetrance derived by adding BrS1 cases and controls for each variant to the empirical prior, $\frac{\text{BrS1 cases} + \alpha_{\text{prior,empirical}}}{\text{Total heterozygotes} + \alpha_{\text{prior,empirical}} + \beta_{\text{prior,empirical}}}$, or the EM prior, $\frac{\text{BrS1 cases} + \alpha_{\text{prior,EM}}}{\text{Total heterozygotes} + \alpha_{\text{prior,EM}} + \beta_{\text{prior,EM}}}$.
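The iteration described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single covariate, unweighted least squares instead of the weighted regression, a fixed number of hypothetical prior observations `nu` (so that the prior is α = μν and β = (1 − μ)ν), an absolute convergence tolerance, and invented counts. The function names `fit_line` and `em_prior` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def em_prior(x, cases, controls, nu=15.0, alpha0=0.45, beta0=2.73,
             tol=1e-4, max_iter=100):
    """Iterate: posterior means -> regression on x -> updated variant-specific priors."""
    n = len(x)
    alpha, beta = [alpha0] * n, [beta0] * n   # start from one shared prior
    mu_prev = None
    for _ in range(max_iter):
        # Posterior mean penetrance for every variant under the current priors.
        post = [(c + a) / (c + u + a + b)
                for c, u, a, b in zip(cases, controls, alpha, beta)]
        # Regress the posterior means on the covariate.
        b0, b1 = fit_line(x, post)
        # Predicted prior means, kept strictly inside (0, 1).
        mu = [min(max(b0 + b1 * xi, 1e-6), 1 - 1e-6) for xi in x]
        # Convert each prior mean into nu hypothetical observations.
        alpha = [m * nu for m in mu]
        beta = [(1 - m) * nu for m in mu]
        if mu_prev is not None and max(abs(m - p) for m, p in zip(mu, mu_prev)) < tol:
            break
        mu_prev = mu
    return mu, alpha, beta


# Hypothetical data: normalized peak current and heterozygote counts.
peak_current = [0.0, 0.2, 0.5, 0.8, 1.0]
cases = [8, 5, 3, 1, 0]
controls = [2, 5, 7, 9, 20]
mu, alpha, beta = em_prior(peak_current, cases, controls)
```

In this toy example the variant with no peak current ends up with a higher prior mean than the variant with wild-type-like current, and each prior carries α + β = ν hypothetical heterozygotes, which is how the text interprets the EM priors.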

Collection of the SCN5A variant dataset

The dataset was curated from 711 papers in a previous publication [18], to which we added an additional 45 papers on SCN5A that had been published since the previous dataset was constructed. Briefly, we searched publications for the number of heterozygotes of each variant mentioned, the number of unaffected and affected individuals with diagnosed BrS1, and variant-induced changes in channel function, if reported; all recorded values of channel function were normalized to wild-type values reported in the same publications. We supplemented this dataset with all SCN5A variants in the gnomAD database of population variation (http://gnomad.broadinstitute.org/; release 2.0) [33]. Due to the rarity of BrS1 (~1 in 10,000) [34], all heterozygotes found in gnomAD were counted as unaffected. An interactive version of the dataset, the SCN5A Variant Browser, is available at https://oates.app.vumc.org/vancart/SCN5A/. We further collected in silico pathogenicity predictions from three commonly used servers: SIFT [35], PolyPhen-2 [36], and PROVEAN [37]. We also include the basic local alignment search tool position-specific scoring matrix (BLAST-PSSM) [38] for SCN5A and the per residue evolutionary rate [39], previously shown to have value for predicting functional perturbation for the cardiac potassium channel gene KCNQ1 [40], and the point accepted mutation score (PAM) [41]. Additionally, we leveraged structures of the SCN5A protein product and derived a penetrance density as previously described (see S1 Text for details) [19]. In-frame indels are treated as missense variants. We include these variants as variations at the residue where the indel starts, and only note whether they are an insertion or deletion. Some of these variants have functional data available and their penetrance densities are calculated from the residue starting the indel. These are simplifications to enable an analysis of as many variants and heterozygote individuals as possible. For these variants, we did not include in silico pathogenicity predictions. We included compound heterozygotes (individuals with more than one SCN5A variant) as separate records when these data are available, though these were very rare. Additionally, our inclusion criteria are not modified by relatedness. We did not include intronic variants in our analysis. The dataset is available in S2 Table.

Initial Empirical Bayes beta-binomial prior penetrance calculation

Using the data from the aforementioned literature curation [18], we estimated the penetrance for each observed variant using a beta-binomial empirical Bayes model. To calculate the empirical BrS1 penetrance prior, we calculated αprior, empirical and βprior, empirical by finding the weighted mean penetrance over all variants in the dataset and estimating the variance. Weighting was done using the following equation:

$$w = 1 - \frac{1}{0.01 + \text{number of heterozygotes}} \qquad \text{(Eq 3)}$$

Eq 3 ensures variants with a greater number of total heterozygotes (and therefore higher confidence in the penetrance estimate) had a greater weight in the preliminary analysis. We then estimated the variance in penetrance as the mean squared error (MSE) between the estimated penetrance mean and the observed penetrance from Eq 1 with αprior and βprior equal to zero. With this estimated mean and MSE-derived variance, the empirical prior was calculated to have αprior and βprior equal to 0.45 and 2.73, respectively. The variant-specific empirical posterior for each variant was then calculated by adding observed heterozygote counts of affected (BrS1 cases) and unaffected to αprior, empirical and βprior, empirical, respectively, and the resulting posterior mean penetrance was used as the dependent variable of the subsequent regression model (Eq 2). The inverse variance of the estimated posterior beta distributions determined in this step, capped at the ninth decile, was used to weight subsequent regression models and Pearson R2 calculations.
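The empirical prior calculation can be sketched as follows. Several details are assumptions rather than statements from the text: Eq 3 is read as w = 1 − 1/(0.01 + number of heterozygotes), the MSE is taken as a weighted mean squared deviation, and the mean and variance are converted to beta parameters by the method of moments. The function names and example counts are hypothetical, so the output is not expected to reproduce 0.45 and 2.73.

```python
def heterozygote_weight(n_heterozygotes):
    """Weight from Eq 3 (as read here): approaches 1 as heterozygote count grows."""
    return 1.0 - 1.0 / (0.01 + n_heterozygotes)


def empirical_prior(cases, totals):
    """Estimate one shared beta prior (alpha, beta) from observed penetrances."""
    observed = [c / t for c, t in zip(cases, totals)]   # Eq 1 with a zero prior
    weights = [heterozygote_weight(t) for t in totals]
    weight_sum = sum(weights)
    mean = sum(w * o for w, o in zip(weights, observed)) / weight_sum
    # Assumption: weighted mean squared error around the weighted mean.
    mse = sum(w * (o - mean) ** 2 for w, o in zip(weights, observed)) / weight_sum
    # Method of moments for a beta distribution: var = mean * (1 - mean) / (nu + 1).
    nu = mean * (1.0 - mean) / mse - 1.0
    return mean * nu, (1.0 - mean) * nu, mean


# Hypothetical BrS1 case counts and total heterozygotes for five variants.
cases = [0, 1, 2, 0, 5]
totals = [10, 4, 20, 50, 10]
alpha_prior, beta_prior, weighted_mean = empirical_prior(cases, totals)
```

Under this reading a variant seen in a single heterozygote receives a weight near 0.01, while a variant seen in 50 heterozygotes receives a weight near 0.98, which matches the stated intent of favoring well-observed variants.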

Expectation maximization Bayesian beta-binomial penetrance predictions

To deal with missing data in a prediction model, we followed the approach outlined in Mercaldo and Blume [42], which avoids multiple imputation but guarantees maximum predictive accuracy across missing data patterns. In short, for every missing data pattern, we estimate a separate prediction model. For example, for p.His558Arg, where penetrance density, in silico predictors, and functional data are all available, the estimate of penetrance is regressed on all other variants where all of these covariates are available (n = 238). For p.Tyr1449Cys, however, only penetrance density and in silico predictors are available, so only those covariates are used in the regression (n = 1,382; much higher since functional data have been collected for relatively few variants). The models were built with a linear regression pattern-mixture algorithm, updating posterior mean penetrances iteratively until the resulting estimated mean penetrance, $\mu = \frac{\alpha_{\text{prior,EM}}}{\alpha_{\text{prior,EM}} + \beta_{\text{prior,EM}}}$, changed by < 0.01% from the previous iteration. This process typically converged within eight iterations. For variant i, the variance was estimated from this converged EM mean penetrance according to Eq 4:

$$\sigma_i^2 = \frac{\mu_i (1 - \mu_i)}{1 + \nu} \qquad \text{(Eq 4)}$$
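Eq 4 has the same form as the variance of a Beta distribution when ν is read as αprior + βprior. The short Python sketch below works through one example under that assumption; the function names and the example mean (0.20) are hypothetical, and ν = 19 is simply one of the values explored in the supporting figures.

```python
def eq4_variance(mu, nu):
    """Variance of the penetrance estimate as given by Eq 4."""
    return mu * (1.0 - mu) / (1.0 + nu)


def beta_parameters(mu, nu):
    """Beta parameters implied by a mean mu and nu = alpha + beta.

    Assumption: nu plays the role of alpha + beta, that is, the number
    of hypothetical phenotyped heterozygotes behind the prior.
    """
    return mu * nu, (1.0 - mu) * nu


def beta_variance(alpha, beta):
    """Textbook variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


mu, nu = 0.20, 19.0                     # hypothetical mean, illustrative nu
alpha, beta = beta_parameters(mu, nu)   # 3.8 and 15.2
variance = eq4_variance(mu, nu)         # 0.16 / 20 = 0.008

# Larger nu means more hypothetical observations and a tighter prior.
tighter = eq4_variance(mu, 99.0)
```

Under this reading, increasing ν shrinks the variance and narrows the credible interval, which is the trade-off explored when tuning ν.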

We then adjusted ν, equivalent to the number of hypothetical observations of clinically phenotyped heterozygotes, to balance overcoverage of variants with low to moderate BrS1 penetrance against poorer coverage of variants with high estimated mean penetrance, resulting in a range of ν from approximately 15 to 20 (see S2 Text for details; S9–S12 Figs). All analyses were performed using the datasets provided in S2 Table and at the Kroncke lab GitHub site: https://github.com/kroncke-lab/Bayes_BrS1_Penetrance.
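The per-pattern regression described in this section (one prediction model for each missing-data pattern) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the helper names, the toy covariates ("density", "function"), and the weights are all hypothetical, and unlike the description above it does not exclude the variant being predicted from its own training set.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_wls(rows, targets, weights):
    """Weighted least squares with an intercept; returns [b0, b1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtwx = [[sum(w * x[j] * x[k] for x, w in zip(design, weights))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * x[j] * t for x, t, w in zip(design, targets, weights))
            for j in range(p)]
    return solve_linear(xtwx, xtwy)


def pattern_submodel_predict(features, targets, weights):
    """Fit one model per missing-data pattern and predict every variant.

    Each variant is a dict mapping covariate name to a value or None.
    A pattern's model is trained on all variants for which every
    covariate in that pattern is observed.
    """
    fitted = {}
    predictions = []
    for f in features:
        pattern = tuple(sorted(k for k, v in f.items() if v is not None))
        if pattern not in fitted:
            train = [i for i, g in enumerate(features)
                     if all(g[k] is not None for k in pattern)]
            fitted[pattern] = fit_wls(
                [[features[i][k] for k in pattern] for i in train],
                [targets[i] for i in train],
                [weights[i] for i in train],
            )
        coef = fitted[pattern]
        predictions.append(
            coef[0] + sum(c * f[k] for c, k in zip(coef[1:], pattern)))
    return predictions


# Hypothetical toy data: functional data are missing for two variants.
features = [
    {"density": 0.1, "function": 1.0},
    {"density": 0.4, "function": 0.2},
    {"density": 0.8, "function": 0.5},
    {"density": 0.3, "function": 0.9},
    {"density": 0.2, "function": None},
    {"density": 0.6, "function": None},
]
y = [0.05 + 0.5 * f["density"] for f in features]  # toy posterior means
w = [4.0, 1.0, 2.0, 3.0, 1.0, 2.0]                  # toy inverse variances
preds = pattern_submodel_predict(features, y, w)
```

Variants with functional data are predicted from a model trained only on the four fully observed variants, while the two variants lacking functional data use a density-only model trained on all six. This mirrors the difference in training-set sizes (n = 238 versus n = 1,382) noted in the text.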

Supporting information

S1 Text. Detailed explanation of penetrance density calculation.

(DOCX)

S2 Text. Detailed explanation of how ‘ν’ from Eq 4 was determined.

(DOCX)

S1 Table. SCN5A variant-specific features used to predict BrS1 penetrance.

(DOCX)

S2 Table. SCN5A dataset.

All data used to estimate BrS1 penetrance including covariates are included in the accompanying dataset.

(CSV)

S1 Fig. Histogram of the frequency of variants (y-axis) with different number of individuals diagnosed with Brugada syndrome (x-axis).

Most variants have only a single heterozygote diagnosed with BrS; however, there are over 10 variants with 10 or more heterozygotes diagnosed with BrS.

(PNG)

S2 Fig. Frequency of variants (y-axis) with different counts in gnomAD (x-axis).

The x-axis is truncated at 350. There are 10 variants with greater than 350 carriers.

(PNG)

S3 Fig. Frequency of variants (y-axis) with different observed BrS penetrances (x-axis).

Most variants have either exactly 0 or exactly 1 observed BrS penetrance, at odds with both the known background rate of BrS in the general public (approximately 1 in 10,000–20,000) and with the extreme rarity of any variant having 100% penetrance.

(PNG)

S4 Fig. Bland-Altman plot between EM posterior mean BrS penetrances and observed BrS penetrance for SCN5A variants with at least 15 heterozygotes.

The relatively narrow spread along the y-axis suggests reasonable agreement between the two estimates of BrS penetrance. With the cutoff of at least 15 heterozygotes, there are relatively few variants with an expected penetrance of greater than 10%.

(PNG)

S5 Fig. Histogram of BrS1 penetrance imputed EM prior means and associated upper and lower bounds to 95% credible interval from pattern mixture models.

Plotted are BrS1 mean penetrances from imputed EM priors (“Predicted”, green) and upper (red) and lower (blue) bounds to associated 95% credible intervals from those imputed EM priors.

(PNG)

S6 Fig. SCN5A pathogenic and benign variants cluster in space.

Rate of variants with high BrS1 penetrance (>20%, blue) or low BrS1 penetrance (<10%, red) in a model of the SCN5A protein product. Each bar represents a histogram of variants associated with each disease within a 5 Å slice within the membrane (divided by the total number of residues within the slice); boxes at each of the four corners represent residues not modeled (only 33 residues were not modeled in the extracellular loops). There is a relative paucity of low BrS1 penetrance variants within the structured transmembrane region and a relative abundance of high BrS1 penetrance variants in the same region. The rate of high BrS1 penetrance variants is higher in the extracellular half of the protein molecule, likely due to tighter packing of residues in the top half of the pore domain as well as proximity to the ion-selective element (selectivity filter). Amino acid substitutions in these regions therefore more often have a disruptive influence.

(PNG)

S7 Fig. Generation of empirical and EM priors.

The modified EM algorithm is an iterative technique composed of two steps: 1) calculate the expected penetrance from an empirical Bayes penetrance model and 2) fit a regression of our estimated penetrance on variant-specific characteristics by maximum likelihood. The fitted model is then used to generate an updated, imputed prior and subsequent posterior expected penetrance. This process is iterated until it converges to the maximum likelihood solution, defined as the point at which the new mean penetrance changes by less than 1% from the previous iteration. The variance is then estimated according to Eq 4 as explained above.

(PNG)

S8 Fig. BrS1 penetrance probability versus penetrance for the empirical prior.

(PNG)

S9 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter in Eq 4 was set to ν = 7. There is overcoverage (greater than 95%) for variants with high and low BrS1 penetrance, indicating an overestimate of the variance.

(PNG)

S10 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter in Eq 4 was set to ν = 14. There is overcoverage for the majority of variants, though some variants are now outside the 95% credible interval.

(PNG)

S11 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter in Eq 4 was set to ν = 19. Overcoverage is reduced, especially for residues with very low or very high BrS1 penetrance, indicating an appropriate estimate of variance.

(PNG)

S12 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter in Eq 4 was set to ν = 99. Variant undercoverage is much more prevalent and distributed evenly across variants with low to high BrS1 penetrance, indicating an underestimate of variance.

(PNG)

Data Availability

All data and processing scripts are available on the Kroncke lab GitHub site: https://github.com/kroncke-lab/Bayes_BrS1_Penetrance. Additionally, the data used are available as a supplement to this manuscript and through our SCN5A variant browser website: https://oates.app.vumc.org/vancart/SCN5A/.

Funding Statement

This research was funded by National Institutes of Health awards R00HL135442 (BMK), P50GM115305 (DMR), K99 HG010904 (AMG), R01HL149826 (DMR), and F32HL137385 (AMG) (https://www.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Tennessen JA, Bigham AW, O'Connor TD, Fu WQ, Kenny EE, Gravel S, et al. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science. 2012;337(6090):64–9. doi:10.1126/science.1219240
  • 2. Dewey FE, Murray MF, Overton JD, Habegger L, Leader JB, Fetterolf SN, et al. Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR Study. Science. 2016;354(6319).
  • 3. Van Hout CV, Tachmazidou I, Backman JD, Hoffman JX, Ye B, Pandey AK, et al. Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank. bioRxiv. 2019.
  • 4. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. bioRxiv. 2019.
  • 5. Chen R, Shi L, Hakenberg J, Naughton B, Sklar P, Zhang J, et al. Analysis of 589,306 genomes identifies individuals resilient to severe Mendelian childhood diseases. Nat Biotechnol. 2016;34(5):531–8. doi:10.1038/nbt.3514
  • 6. Krier J, Barfield R, Green RC, Kraft P. Reclassification of genetic-based risk predictions as GWAS data accumulate. Genome Med. 2016;8.
  • 7. Cooper GM. Parlez-vous VUS? Genome Res. 2015;25(10):1423–6. doi:10.1101/gr.190116.115
  • 8. Hoffman-Andrews L. The known unknown: the challenges of genetic variants of uncertain significance in clinical practice. J Law Biosci. 2017;4(3):648–57. doi:10.1093/jlb/lsx038
  • 9. Ackerman MJ. Genetic purgatory and the cardiac channelopathies: Exposing the variants of uncertain/unknown significance issue. Heart Rhythm. 2015;12(11):2325–31. doi:10.1016/j.hrthm.2015.07.002
  • 10. Manolio TA, Fowler DM, Starita LM, Haendel MA, MacArthur DG, Biesecker LG, et al. Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research. Cell. 2017;169(1):6–12. doi:10.1016/j.cell.2017.03.005
  • 11. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24. doi:10.1038/gim.2015.30
  • 12. Tavtigian SV, Greenblatt MS, Harrison SM, Nussbaum RL, Prabhu SA, Boucher KM, et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet Med. 2018;20(9):1054–60. doi:10.1038/gim.2017.210
  • 13. Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C, Kehrer-Sawatzki H. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum Genet. 2013;132(10):1077–130. doi:10.1007/s00439-013-1331-2
  • 14. Kapplinger JD, Tester DJ, Alders M, Benito B, Berthet M, Brugada J, et al. An international compendium of mutations in the SCN5A-encoded cardiac sodium channel in patients referred for Brugada syndrome genetic testing. Heart Rhythm. 2010;7(1):33–46. doi:10.1016/j.hrthm.2009.09.069
  • 15. Chen Q, Kirsch GE, Zhang D, Brugada R, Brugada J, Brugada P, et al. Genetic basis and molecular mechanism for idiopathic ventricular fibrillation. Nature. 1998;392(6673):293–6. doi:10.1038/32675
  • 16. Hong K, Berruezo-Sanchez A, Poungvarin N, Oliva A, Vatta M, Brugada J, et al. Phenotypic characterization of a large European family with Brugada syndrome displaying a sudden unexpected death syndrome mutation in SCN5A. J Cardiovasc Electrophysiol. 2004;15(1):64–9. doi:10.1046/j.1540-8167.2004.03341.x
  • 17. Potet F, Mabo P, Le Coq G, Probst V, Schott JJ, Airaud F, et al. Novel brugada SCN5A mutation leading to ST segment elevation in the inferior or the right precordial leads. J Cardiovasc Electrophysiol. 2003;14(2):200–3. doi:10.1046/j.1540-8167.2003.02382.x
  • 18. Kroncke BM, Glazer AM, Smith DK, Blume JD, Roden DM. SCN5A (NaV1.5) Variant Functional Perturbation and Clinical Presentation: Variants of a Certain Significance. Circ Genom Precis Med. 2018;11(5):e002095. doi:10.1161/CIRCGEN.118.002095
  • 19. Kroncke BM, Mendenhall J, Smith DK, Sanders CR, Capra JA, George AL, et al. Protein structure aids predicting functional perturbation of missense variants in SCN5A and KCNQ1. Computational and Structural Biotechnology Journal. 2019;17:206–14. doi:10.1016/j.csbj.2019.01.008
  • 20. Kapplinger JD, Giudicessi JR, Ye D, Tester DJ, Callis TE, Valdivia CR, et al. Enhanced Classification of Brugada Syndrome-Associated and Long-QT Syndrome-Associated Genetic Variants in the SCN5A-Encoded Na(v)1.5 Cardiac Sodium Channel. Circ Cardiovasc Genet. 2015;8(4):582–95. doi:10.1161/CIRCGENETICS.114.000831
  • 21. Glazer AM, Wada Y, Li B, et al. High-Throughput Reclassification of SCN5A Variants [published online ahead of print, 2020 Jun 5]. Am J Hum Genet. 2020;S0002-9297(20)30162-2. doi:10.1016/j.ajhg.2020.05.015
  • 22. Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44(D1):D862–8. doi:10.1093/nar/gkv1222
  • 23. Wright CF, West B, Tuke M, Jones SE, Patel K, Laver TW, et al. Assessing the Pathogenicity, Penetrance, and Expressivity of Putative Disease-Causing Variants in a Population Setting. The American Journal of Human Genetics. 2019;104(2):275–86. doi:10.1016/j.ajhg.2018.12.015
  • 24. Katsanis N. The continuum of causality in human genetic disorders. Genome Biol. 2016;17(1):233. doi:10.1186/s13059-016-1107-9
  • 25. Tuke MA, Ruth KS, Wood AR, Beaumont RN, Tyrrell J, Jones SE, et al. Mosaic Turner syndrome shows reduced penetrance in an adult population study. Genetics in Medicine. 2018.
  • 26. Shah N, Hou YC, Yu HC, Sainger R, Caskey CT, Venter JC, et al. Identification of Misclassified ClinVar Variants via Disease Population Prevalence. Am J Hum Genet. 2018;102(4):609–19. doi:10.1016/j.ajhg.2018.02.019
  • 27. Schwartz PJ, Crotti L, George AL Jr. Modifier genes for sudden cardiac death. Eur Heart J. 2018;39(44):3925–31. doi:10.1093/eurheartj/ehy502
  • 28. Hosseini SM, Kim R, Udupa S, Costain G, Jobling R, Liston E, et al. Reappraisal of Reported Genes for Sudden Arrhythmic Death. Circulation. 2018;138(12):1195–205. doi:10.1161/CIRCULATIONAHA.118.035070
  • 29. Oetjens MT, Kelly MA, Sturm AC, Martin CL, Ledbetter DH. Quantifying the polygenic contribution to variable expressivity in eleven rare genetic disorders. Nature communications. 2019;10(1):4897. doi:10.1038/s41467-019-12869-0
  • 30. Mizusawa Y, Wilde AA. Brugada syndrome. Circ Arrhythm Electrophysiol. 2012;5(3):606–16. doi:10.1161/CIRCEP.111.964577
  • 31. Copas JB. Regression, Prediction and Shrinkage. J R Stat Soc B. 1983;45(3):311–54.
  • 32. Dempster A, Laird N, Rubin D. Maximum Likelihood from Incomplete Data via the EM Algorithm. J R Stat Soc B. 1977.
  • 33. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91. doi:10.1038/nature19057
  • 34. Postema PG. About Brugada syndrome and its prevalence. Europace. 2012;14(7):925–8. doi:10.1093/europace/eus042
  • 35. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073–81. doi:10.1038/nprot.2009.86
  • 36. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–9. doi:10.1038/nmeth0410-248
  • 37. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS One. 2012;7(10):e46688. doi:10.1371/journal.pone.0046688
  • 38. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. doi:10.1093/nar/25.17.3389
  • 39. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002;18 Suppl 1:S71–7.
  • 40. Li B, Mendenhall JL, Kroncke BM, Taylor KC, Huang H, Smith DK, et al. Predicting the Functional Impact of KCNQ1 Variants of Unknown Significance. Circ Cardiovasc Genet. 2017;10(5).
  • 41. Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. Atlas of protein sequence and structure. 1978;5(suppl 3):345–51.
  • 42. Fletcher Mercaldo S, Blume JD. Missing data and prediction: the pattern submodel. Biostatistics. 2018.

Decision Letter 0

Hua Tang, Leslie Biesecker

31 Dec 2019

Dear Dr Kroncke,

Thank you very much for submitting your Research Article entitled 'A Bayesian method to estimate variant-induced disease penetrance: moving beyond a dichotomous view of variant pathogenicity' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Leslie Biesecker

Guest Editor

PLOS Genetics

Hua Tang

Section Editor: Natural Variation

PLOS Genetics

The paper has been reviewed by two anonymous reviewers. The opinions of the peer reviewers were widely disparate, but both identified major issues. I add some comments of my own below.

1. You appear to be collapsing the pathogenicity of the variant and the penetrance function of a variant into a single concept. It is not clear to me why this is the correct approach. A VUS can have high (apparent) penetrance and a pathogenic variant could have ~50% penetrance. Your construct would appear to evaluate these the same.

2. You are inaccurately characterizing the Richards et al and Tavtigian et al heuristics. The former (implicitly) and the latter (explicitly) yield pathogenicity assertions in a nearly continuous probabilistic fashion, but have chosen to use a post-hoc reporting system of five tiers. Anyone using Richards et al can readily see that a variant with one strong and three supporting criteria for pathogenicity is more likely to be pathogenic than would be a variant with one strong and two supporting pathogenic (likely pathogenic criterion iii). You also inaccurately describe this as a three tier system by collapsing P and LP into one category and B and LB into one category and then criticizing it for being too coarse – “…with variants not confidently placed in either of these categories classified as VUS.” Emphasis added. The Tavtigian et al heuristic showed how Richards et al can be transformed into a (nearly) continuous output posterior probability – they are the same. This characterization is also problematic – “Here we advance an approach that quantitates degree of pathogenicity, probabilistically…” but Tavtigian did that as well and you seem to be suggesting that your work in this respect is novel. Your figure 5 is especially problematic in this regard. Even worse than the (incorrect, in my view) collapsing of P & LP, you simply label them both as P. You should also be aware of a larger issue, which is that there is a strongly held view in medicine that clinicians do not want posterior probabilities at all – they want yes-no answers (you and I don’t accept this view, but I think we are, for the time being, outnumbered). Thus, this is a complicated issue. I think this thrust of your article is not justified and not supported by a review of prior work. Instead, what I think you are doing is describing a novel formulation of a conditional probability and using a different Bayesian formulation than did Tavtigian. That to me seems like a perfectly valid and useful approach and would simplify and clarify your paper.

3. You state that “By quantitatively integrating multiple features, including in vitro functional experiments, information about the three-dimensional protein structure, and previously published variant classifiers, we estimate the BrS1 penetrance attributable to individual SCN5A variants.” This seems to suggest that you are supplanting Richards/Tavtigian PS3 (in vitro functional experiments) and PP3 (information about the three-dimensional protein structure) with your measures. I am troubled by ‘previously published variant classifiers’ – I have no idea what that means. Later you say this: “Our hypothesis is that features such as variant-induced changes in function, sequence conservation, and location in structure, contain equivalent information to clinically phenotyping carriers and can therefore be used to calculate a penetrance prior.” This seems to suggest three criteria, but different than the first three I quote above. The first again seems to be PS3 and are the second and third PP3?

4. I am also puzzled as to why this is considered a prior probability. To me, it seems like a conditional probability. You seem to have a very different concept of priors and posteriors than does Tavtigian et al – this needs to be much better explained.

5. Your system would not appear to take into account any of the other forms of evidence as does Richards/Tavtigian – why is that? If you do mean to allow those to be taken into account how would that be done? You claim “Our framework captures the same information currently used to adjudicate variants (ACMG guidelines) and reinterprets it quantitatively and probabilistically in terms of risk and uncertainty.” This does not appear to be the case – I do not see in your framework incorporation of the following criteria from Richards et al: PVS1, PS2, PM1, PM2, PM4, PP1, and PP2.

6. You state “We suggest the proposed framework is useful in cases both where an individual presents with BrS1 or where a variant is discovered incidentally in an individual not presenting with an arrhythmia.” Are you suggesting that what you describe as ‘penetrance’ – the output of your algorithm is the same in diagnostic or screening contexts? I find that difficult to accept.

7. What do you define Brugada syndrome to be? While you roundly criticize Richards et al for being categorical, is it the case that Brugada syndrome is a binary attribute – people either have it or they do not? Given the electrophysiologic nature of this disease, this seems implausible. But even putting that aside I don’t see how you can define a probability of the penetrance of a disorder if you don’t precisely define what it is. This point seems to be consistent with that of one of the peer reviewers.

8. Your assumption that all persons in gnomAD are unaffected is almost certainly wrong. Can you estimate the error introduced by, say, 12 people in gnomAD having the disorder? Or 25, if your estimate of prevalence is off by a factor of 2.0?

9. I do not think that individuals heterozygous for a variant associated with a disorder that is inherited in an autosomal dominant pattern of inheritance should be described as carriers. Better to say ‘heterozygotes’ and reserve ‘carriers’ for disorders inherited in an autosomal recessive pattern.

10. You should use HGVS nomenclature for all variants. Please specify the cDNA change with the first citation of a variant (subsequent cites of that variant can use protein only, but should use proper protein nomenclature and three letter codes e.g., p.(Arg878Cys).)

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors built a model to predict Brugada Syndrome (BrS1) penetrance of SCN5A variants and assume that the severity of the variant predicts the probability of having disease symptoms (penetrance). The authors have a good idea to move from categorical classification of variants to a more quantitative measure. However, although the gene function and structure are known to predict variant pathogenicity, it is not clear from this manuscript that they predict penetrance as well. Penetrance is much more complicated than just variant pathogenicity for BrS. They have chosen a disease to study that is somewhat complicated.

Major comments:

1. The authors use data from a collection of 756 publications. The studies recruited participants using different criteria. Given the heterogeneity in the clinical characterization of the clinical phenotyping across studies and the difficulty in making the clinical diagnosis of BrS, I am not confident that this is a reasonable data set upon which to build or test a robust model. Furthermore, because the penetrance for BrS is sex and age dependent in addition to modification with fever and medication, the data upon which to base the models are complicated, and the information on the relevant modifiers is rarely published. The details of the dataset should have been summarized in a table describing the clinical cohort derived from the 756 publications along with a figure of the distribution and frequency of the variants studied.

2. The model design is not clear. To calculate the prior penetrance, one of the features – penetrance density is based on the observed cases and controls. The posterior penetrance is also calculated based on the observed cases and controls. It is unclear if the convergence criterion is meaningful. What does it optimize?

3. How did the authors evaluate the performance of the model? The dataset includes 1000+ variants but only three variants were presented in Figure 1 and 5. The authors used the ClinVar annotation of the three variants to evaluate their estimated penetrance. Pathogenic variants can have penetrance from 0 to 1. The pathogenic variant in the Figure 1 and 5 showed high penetrance, but how do they prove this? Can the authors compare their posterior penetrance with the observed penetrance when the number of carriers is large enough?

4. From Figure 2, the authors seem to be including common variants in their dataset. These common variants have a BrS penetrance of ~3%, which is too high for common benign variants. This implies the prevalence of BrS is high in their cohort which will inflate the penetrance calculation.

5. The prior alpha and beta values depend on the estimated penetrance mean. The authors should not include the benign variants, for which the non-zero penetrance might be due in part to non-genetic reasons or pathogenic variants in other BrS genes.

Reviewer #2: This paper concerns the widely studied medical genetics problem of classifying variants in disease-causing genes with respect to the potential pathogenicity. The authors develop a Bayesian framework to classify variants on a continuous [0,1] range that represents asymptotic (w.r.t. age) penetrance. They apply this framework to hundreds of variants in the gene SCN5A, which is implicated in Brugada syndrome. The manuscript is well written.

I have only two substantial concerns, of which the first is far more important.

Major concerns.

1. I did not see any pointers to available code or to available posteriors in a form that readers can retrieve. To make the method useful to readers who may be interested in applying the method to other diseases, the authors need to make their code available. To make the Brugada syndrome analysis useful to medical geneticists who study Brugada syndrome, the results (i.e., prior and posterior distributions) need to be made available.

2. It is unclear how the method handles missing data such as i) variants for which peak current has not been measured or ii) in-frame indels for which several of the sequence analysis tools give no prediction.

Minor concerns.

3. The reasoning by which the features are equivalent to 19 carriers is unclear. It would help to show explicit calculations that explain why 19 is the right number rather than 18 or 20. Moreover, I do not understand why this number does not depend on the number among the 19 phenotyped carriers who turn out to be affected.

4. How are individuals who carry more than one variant treated in the analysis?

5. How are related individuals carrying the same variant handled in the analysis?

6. How are unaffected young individuals, who may manifest Brugada syndrome at a later not-yet-reached age, handled in the analysis?

7. What is the definition of "match" in the assertion "The penetrance posteriors match classification as presented in ClinVar" in the legend of Figure 1?

8. alpha and beta have not been defined when they are first used at line 156; they are defined later at line 283

9. Line 331 refers to "four commonly used servers" but lists only three methods after the colon.

10. The usage of the definition of distance between residues is not clear, especially for in-frame indels and variants in the promoter and splice sites. As an extreme example, it is unclear why the requirement i not equal to j is included in the first big equation in supplemental methods since there could be two distinct mutations affecting the same amino acid, such as p.P1011L and p.P1011S. Adding some examples of the distance calculation involving different types of variants and unusual situations would help.
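Minor comment 3 turns on how a feature-based prior can be "worth" a fixed number of carriers. The following is a minimal sketch of standard Beta-Binomial bookkeeping, not the authors' code. It assumes a prior mean `mu` and a prior weight `nu` (19, the value questioned in the comment), with the common parameterization alpha = mu * nu and beta = (1 - mu) * nu. That parameterization and the example counts are assumptions for illustration and may differ from the manuscript's Eq 4.

```python
def beta_prior(mu, nu):
    """Beta(alpha, beta) with mean mu and a weight of nu pseudo-carriers (assumed parameterization)."""
    return mu * nu, (1.0 - mu) * nu


def posterior(alpha, beta, affected, unaffected):
    """Conjugate update: add the observed affected and unaffected heterozygote counts."""
    return alpha + affected, beta + unaffected


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical variant: prior mean penetrance 0.20, prior worth 19 carriers,
# followed by 3 affected and 1 unaffected observed heterozygotes.
a0, b0 = beta_prior(0.20, 19)
a1, b1 = posterior(a0, b0, affected=3, unaffected=1)
prior_mean = beta_mean(a0, b0)
post_mean = beta_mean(a1, b1)
```

Under this parameterization the prior weight alpha + beta equals `nu` whatever the value of `mu`; the affected fraction enters only through `mu`. That is one possible reading of why the number of equivalent carriers would not depend on how many of them are affected.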

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Need the actual data set they used to try and replicate their findings.

Reviewer #2: No: See major comment 1.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Alejandro Schaffer

Decision Letter 1

Hua Tang, Leslie Biesecker

15 Apr 2020

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Dr Kroncke,

Thank you very much for submitting your Research Article entitled 'A Bayesian method to estimate variant-induced disease penetrance' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Leslie Biesecker

Guest Editor

PLOS Genetics

Hua Tang

Section Editor: Natural Variation

PLOS Genetics

This is much improved. One thing I noticed is that the summary of the article needs to be rewritten to shift the narrative to penetrance - it still reads like it is pathogenicity.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have dramatically improved the manuscript and addressed most of my previous questions. I still do not agree with including common variants. The common/synonymous variants are very likely to be benign, and it is not useful to estimate penetrance for predicted benign variants.

The model can estimate penetrance for a variant previously observed in cases. Penetrance for a new variant (not previously observed) cannot be estimated because the likelihood cannot be formed. If this is true, this should be stated in the limitations.

Reviewer #2: I mostly limited my assessment of the revision to the question: Did the authors address the 10 comments I made on the initial submission?

The authors have not fully addressed my two major comments. The authors have fully addressed 6 or 7 of my 8 minor comments. While revising the manuscript, the authors introduced some typos and errors in wording.

Major comment 1.

The authors have not addressed my concern about making the code and data available.

I see some files for this project at

https://github.com/kroncke-lab/resources/tree/master/A%20Bayesian%20method%20to%20estimate%20disease%20penetrance%20from%20genetic%20variant%20properties

However, there is no README file or any other file that I recognize as documentation. The commands

git clone https://github.com/kroncke-lab/resources/tree/master/A%20Bayesian%20method%20to%20estimate%20disease%20penetrance%20from%20genetic%20variant%20properties

git clone https://github.com/kroncke-lab/resources/tree/master

both failed.
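A likely reason both commands failed is that `git clone` takes the repository root, not a browser URL containing `/tree/<branch>/...`. The Python sketch below shows that reduction. The helper name `clone_url_from_browse_url` is hypothetical, and the sketch says nothing about whether the repository was publicly readable at the time.

```python
from urllib.parse import urlsplit


def clone_url_from_browse_url(browse_url):
    """Reduce a GitHub web URL (.../owner/repo/tree/branch/path) to the repository root."""
    parts = urlsplit(browse_url)
    # The first two path segments are the owner and the repository name.
    owner, repo = parts.path.strip("/").split("/")[:2]
    return f"{parts.scheme}://{parts.netloc}/{owner}/{repo}.git"


url = clone_url_from_browse_url(
    "https://github.com/kroncke-lab/resources/tree/master"
)
```

The resulting string is what would be passed to `git clone`; a subdirectory of a repository cannot be cloned by URL on its own.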

Major comment 2.

This may be addressed, but I do not understand what a "missing data pattern" is, nor the corresponding new text at lines 405-408. Some examples would help.

Minor comment 3.

I understand the response but did not find where this is explained in the text.

Minor comment 4.

OK

Minor comment 5.

OK, except that the newly added text has a typo.

Minor comment 6.

The response in place is OK, but I was surprised to see that the authors claimed in response to the Editor that "For person X, the variant is either pathogenic or it is not", which misrepresents the challenge of age-dependent penetrance. This simplistic point of view made it into the revised manuscript at lines 84-85, where the authors wrote "The pathogenicity of a variant for a specific individual is binary, but unknown".

The authors cannot have it both ways. Either they acknowledge that age-dependent penetrance is not handled by their method, as suggested by the response to minor comment 6, or they do not.

Minor comment 7.

OK.

Minor comment 8.

OK.

Minor comment 9.

OK.

Minor comment 10.

OK.

Line 19, "to best assess" is a split infinitive

Lines 105 and 125, change "heterozygotic" to "heterozygous"

Line 109, change "which produce" to "that produce"

Line 172, change "used Bland-Altman plot" to "used a Bland-Altman plot"

Line 272, change "59 genes the ACMG recommends" to "59 genes for which the ACMG recommends"

Line 290, change "The result of not accounting for" to "Not accounting for"

Line 373, it is not clear how the indels are scored by the methods such as SIFT; is the score assigned as the worst possible score for any amino acid substitution at the same position?

Line 377, change "For we included" to "We included"

Lines 379-380, change "We do not include intronic variant" to "We did not include intronic variants"

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: No: See my report. I could not download the authors' GitHub repository and it lacks a README.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Alejandro A. Schaffer

Decision Letter 2

Hua Tang, Leslie Biesecker

14 May 2020

Dear Dr Kroncke,

We are pleased to inform you that your manuscript entitled "A Bayesian method to estimate variant-induced disease penetrance" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Leslie Biesecker

Guest Editor

PLOS Genetics

Hua Tang

Section Editor: Natural Variation

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Thank you for your responsive revision. I am pleased to recommend to the Senior Editor that your paper be accepted for publication.

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-01892R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Hua Tang, Leslie Biesecker

15 Jun 2020

PGENETICS-D-19-01892R2

A Bayesian method to estimate variant-induced disease penetrance

Dear Dr Kroncke,

We are pleased to inform you that your manuscript entitled "A Bayesian method to estimate variant-induced disease penetrance" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data


    Supplementary Materials

    S1 Text. Detailed explanation of penetrance density calculation.

    (DOCX)

    S2 Text. Detailed explanation of how ‘ν’ from Eq 4 was determined.

    (DOCX)

    S1 Table. SCN5A variant-specific features used to predict BrS1 penetrance.

    (DOCX)

    S2 Table. SCN5A dataset.

    All data used to estimate BrS1 penetrance including covariates are included in the accompanying dataset.

    (CSV)

    S1 Fig. Histogram of the frequency of variants (y-axis) with different number of individuals diagnosed with Brugada syndrome (x-axis).

    Most variants have only a single heterozygote diagnosed with BrS; however, there are over 10 variants with 10 or more heterozygotes diagnosed with BrS.

    (PNG)

    S2 Fig. Frequency of variants (y-axis) with different counts in gnomAD (x-axis).

    The x-axis is truncated at 350. There are 10 variants with greater than 350 carriers.

    (PNG)

    S3 Fig. Frequency of variants (y-axis) with different observed BrS penetrances (x-axis).

    Most variants have either exactly 0 or exactly 1 observed BrS penetrance, at odds with both the known background rate of BrS in the general public (approximately 1 in 10,000–20,000) and with the extreme rarity of any variant having 100% penetrance.

    (PNG)

    S4 Fig. Bland-Altman plot between EM posterior mean BrS penetrances and observed BrS penetrance for SCN5A variants with at least 15 heterozygotes.

    The relatively narrow spread along the y-axis suggests reasonable agreement between the two estimates of BrS penetrance. With the cutoff of at least 15 heterozygotes, there are relatively few variants with an expected penetrance of greater than 10%.

    (PNG)
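For readers unfamiliar with the Bland-Altman comparison named in S4 Fig, the sketch below computes its usual ingredients: the per-variant mean of the two estimates, their difference, the average difference (bias), and the conventional bias ± 1.96 SD limits of agreement. The input values are hypothetical and are not taken from the SCN5A dataset.

```python
import numpy as np


def bland_altman(x, y):
    """Return per-item means, differences, the bias, and the 95% limits of agreement."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mean = (x + y) / 2.0          # x-axis of the plot
    diff = x - y                  # y-axis of the plot
    bias = diff.mean()
    sd = diff.std(ddof=1)         # sample standard deviation of the differences
    return mean, diff, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical posterior-mean and observed penetrances for three variants.
posterior_mean = [0.10, 0.20, 0.30]
observed = [0.12, 0.18, 0.33]
mean, diff, bias, limits = bland_altman(posterior_mean, observed)
```

A narrow band between the two limits, centered near zero, corresponds to the "relatively narrow spread along the y-axis" described in the caption.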

    S5 Fig. Histogram of BrS1 penetrance imputed EM prior means and associated upper and lower bounds to 95% credible interval from pattern mixture models.

    Plotted are BrS1 mean penetrances from imputed EM priors (“Predicted”, green) and upper (red) and lower (blue) bounds to associated 95% credible intervals from those imputed EM priors.

    (PNG)

    S6 Fig. SCN5A pathogenic and benign variants cluster in space.

    Rate of variants with high BrS1 penetrance (>20%, blue) or low BrS1 penetrance (<10%, red) in a model of the SCN5A protein product. Each bar represents a histogram of variants associated with each disease within a 5 Å slice within the membrane (divided by the total number of residues within the slice); boxes at each of the four corners represent residues not modeled (only 33 residues were not modeled in the extracellular loops). There is a relative paucity of low BrS1 penetrance variants within the structured transmembrane region and a relative abundance of high BrS1 penetrance variants in the same region. The rate of high BrS1 penetrance variants is higher in the extracellular half of the protein molecule, likely because residues are more tightly packed in the top half of the pore domain and lie close to the ion selective element (selectivity filter). Amino acid substitutions in these regions therefore more often have a disruptive influence.

    (PNG)

    S7 Fig. Generation of empirical and EM priors.

    The modified EM algorithm is an iterative technique composed of two steps: 1) calculate the expected penetrance from an empirical Bayes penetrance model and 2) fit a regression of the estimated penetrance on variant-specific characteristics by maximum likelihood. The fitted model is then used to generate an updated, imputed prior and a subsequent posterior expected penetrance. This process is iterated until it converges to the maximum likelihood solution, defined as the new mean penetrance changing by less than 1% from the previous iteration. The variance is then estimated according to Eq 4 as explained above.

    (PNG)
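The two-step iteration in the S7 Fig caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The prior is taken to be a Beta distribution worth `nu` pseudo-carriers. The regression is an ordinary least-squares fit on the logit scale, standing in for whatever maximum likelihood model the authors used. The stopping rule is read as "no variant's prior mean moves by 1% or more". The function name and the toy data are hypothetical.

```python
import numpy as np


def em_prior(features, affected, total, nu=19.0, tol=0.01, max_iter=200):
    """Iterate posterior-mean and regression steps; return (prior_mean, posterior_mean)."""
    X = np.column_stack([np.ones(len(total)), features])  # intercept + features
    # Empirical starting prior: the pooled penetrance, shared by all variants
    # (assumes at least one affected carrier so the pooled value is above zero).
    prior = np.full(len(total), affected.sum() / total.sum(), dtype=float)
    for _ in range(max_iter):
        # Step 1: posterior expected penetrance under a Beta prior worth nu carriers.
        post = (prior * nu + affected) / (nu + total)
        # Step 2: least-squares regression of logit(posterior mean) on the features.
        coef, *_ = np.linalg.lstsq(X, np.log(post / (1.0 - post)), rcond=None)
        new_prior = 1.0 / (1.0 + np.exp(-(X @ coef)))
        # Stop once no variant's prior mean changes by 1% or more (relative).
        done = np.max(np.abs(new_prior - prior) / prior) < tol
        prior = new_prior
        if done:
            break
    return prior, (prior * nu + affected) / (nu + total)


# Toy data: one feature, four variants with ten heterozygotes each.
features = np.array([0.0, 1.0, 2.0, 3.0])
affected = np.array([0, 1, 3, 8])
total = np.array([10, 10, 10, 10])
prior_mean, post_mean = em_prior(features, affected, total)
```

Each posterior mean is a weighted average of the imputed prior mean and the observed fraction affected, so variants with few carriers stay close to the feature-based prior.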

    S8 Fig. BrS1 penetrance probability versus penetrance for the empirical prior.

    (PNG)

    S9 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

    Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter ν in Eq 4 was set to 7. There is overcoverage (greater than 95%) for variants with high and low BrS1 penetrance, indicating an overestimate of the variance.

    (PNG)

    S10 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

    Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter ν in Eq 4 was set to 14. There is overcoverage for the majority of variants, though some variants are now outside the 95% credible interval.

    (PNG)

    S11 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

    Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter ν in Eq 4 was set to 19. Overcoverage is reduced, especially for residues with very low or very high BrS1 penetrance, indicating an appropriate estimate of variance.

    (PNG)

    S12 Fig. Estimated coverage rates for each SCN5A variant versus sampled true penetrance.

    Coverage rate was calculated as defined above. Color and radius indicate the log10 of the total number of heterozygotes present in the dataset. The tuning parameter ν in Eq 4 was set to 99. Variant undercoverage is much more prevalent and distributed evenly across variants with low to high BrS1 penetrance, indicating an underestimate of variance.

    (PNG)
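The coverage definition used for S9 to S12 Fig is given in the main text and is not reproduced here. The sketch below uses a generic version: simulate affected counts at a known true penetrance, form a 95% credible interval from a Beta posterior, and record how often the interval contains the truth. The prior parameterization (mean times ν), the Monte Carlo quantiles, and all numeric inputs are assumptions for illustration.

```python
import numpy as np


def coverage_rate(p_true, n_carriers, prior_mean, nu, reps=500, draws=1000, seed=0):
    """Fraction of simulated datasets whose 95% credible interval contains p_true."""
    rng = np.random.default_rng(seed)
    alpha0, beta0 = prior_mean * nu, (1.0 - prior_mean) * nu
    # Simulated numbers of affected heterozygotes, one per replicate.
    k = rng.binomial(n_carriers, p_true, size=reps)
    # Posterior draws for every replicate: Beta(alpha0 + k, beta0 + n - k).
    samples = rng.beta((alpha0 + k)[:, None],
                       (beta0 + n_carriers - k)[:, None],
                       size=(reps, draws))
    lo, hi = np.quantile(samples, [0.025, 0.975], axis=1)
    return np.mean((lo <= p_true) & (p_true <= hi))


# A prior centered on the truth with moderate weight, and a heavily weighted
# prior centered far from the truth.
well_matched = coverage_rate(p_true=0.2, n_carriers=20, prior_mean=0.2, nu=19)
overconfident = coverage_rate(p_true=0.5, n_carriers=20, prior_mean=0.02, nu=99)
```

In this setup a larger ν narrows the credible interval, so an off-target prior with a large ν tends to undercover, while a small ν tends to overcover.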

    Attachment

    Submitted filename: Reviews_PLOS_Genetics_v7.docx

    Attachment

    Submitted filename: Reviewer critique 2nd-v2.docx

    Data Availability Statement

    All data and processing scripts are available on the Kroncke lab GitHub website: https://github.com/kroncke-lab/Bayes_BrS1_Penetrance. Additionally, the data used are available as a supplement to this manuscript and through our SCN5A variant browser website: https://oates.app.vumc.org/vancart/SCN5A/.


    Articles from PLoS Genetics are provided here courtesy of PLOS
