The effect of ascertainment on penetrance estimates for rare variants: Implications for establishing pathogenicity and for genetic counselling

Andrew D Paterson; Sang-Cheol Seok; Veronica J Vieland

PLoS One. 2023 Sep 21;18(9):e0290336. doi: 10.1371/journal.pone.0290336

Editor: Anshuman Mishra

PMCID: PMC10513297 PMID: 37733810

Abstract

Next-generation sequencing has led to an explosion of genetic findings for many rare diseases. However, most of the variants identified are very rare and were also identified in small pedigrees, which creates challenges in terms of penetrance estimation and translation into genetic counselling in the setting of cascade testing. We use simulations to show that for a rare (dominant) disorder where a variant is identified in a small number of small pedigrees, the penetrance estimate can both have large uncertainty and be drastically inflated, due to underlying ascertainment bias. We have developed PenEst, an app that allows users to investigate the phenomenon across ranges of parameter settings. We also illustrate robust ascertainment corrections via the LOD (logarithm of the odds) score, and recommend a LOD-based approach to assessing pathogenicity of rare variants in the presence of reduced penetrance.

Introduction

Next-generation sequencing has led to an explosion in the number of genetic findings for many rare diseases. For certain types of rare coding variants (e.g. missense or protein-truncating), if the variant is sufficiently rare and bioinformatic predictions of its effect are severe, current algorithms classify it as pathogenic [1]. However, analyses of large-scale sequencing data from cohorts such as ExAC [2], gnomAD [3], and the UK Biobank [4] have shown that many such variants may lack clinically significant impact. For example, ExAC estimated that individuals from population cohorts carried a mean of 53 variants previously thought to be sufficient causes of Mendelian diseases [2]. Moreover, 88% of these variants had a minor allele frequency (MAF) > 1%, implying that they are unlikely to be sufficient causes. Such variants may be causally unrelated to disease, or causally related but with reduced penetrance.

Penetrance plays an important role in understanding disease pathology, in the appropriate classification of pathogenic variants, and perhaps above all in genetic counselling. However, most of the variants reported to date have been very rare and identified in small sets of unrelated individuals (sometimes just one) or in small pedigrees. Penetrance cannot be estimated from a single case, or from a single parent-offspring trio presenting with a de novo mutation in the offspring. But even with multiple cases or families, determining penetrance can be challenging. Here we focus on one such challenge: ascertainment.

Typically a variant of interest is first identified in one individual with a given phenotype. Investigators may then sequence either additional relatives of the individual, or additional individuals or families presenting with the same or closely related phenotypes, with the goal of bolstering the case for pathogenicity. Thus, ascertainment of individuals to be sequenced typically proceeds in stages. The precise ascertainment process used to enroll individuals and/or families is usually at least to some extent unsystematic, and may vary between families. Ascertainment is therefore challenging to model when attempting to estimate the penetrance of a variant.

One situation in which ascertainment can be easily handled is “single” ascertainment, in which the probability that a family is ascertained is proportional to the number of affected individuals in the family [5]. In fact, much of the literature on inferring pathogenicity or estimating penetrance assumes single ascertainment, e.g., [6], where ascertainment is addressed by conditioning on “the proband,” a procedure that is strictly correct only under true single ascertainment. While the typical study does ascertain families through one individual who may be designated as the single “proband”, this does not ensure that the study meets the proportionality requirement of single ascertainment. This requirement would be violated, e.g., if families with four affected members were more than twice as likely to be recruited as families with just two, or if the probability of a second sibling being ascertained depended on the ascertainment status of the first. More generally, if either (i) ascertainment is not truly single, or (ii) it is single but no appropriate ascertainment correction is incorporated into the estimation method, then penetrance estimates will be biased. Here we consider the magnitude of that bias across a range of plausible ascertainment models and varying amounts of available data.

Penetrance estimation also plays a role in the assessment of pathogenicity. For the sake of simplicity, some approaches to the interpretation of rare coding variants assume either full or high penetrance [7]. Extensive criteria have been proposed for claiming a causal relationship between variants and disease, and their authors have urged caution in presuming full penetrance for pathogenic variants [8]. But in practice, penetrance remains an important consideration. For instance, the ACMG/AMP joint consensus recommendations [1] warn against ignoring the possibility of reduced penetrance in establishing segregation of a variant of interest (VOI) with a phenotype, but also state that “lack of segregation…provides strong evidence against pathogenicity” (p. 15). And in practice, many laboratories will rule out candidate VOIs when they are found among unaffected relatives. Particularly in the absence of a rigorous and accurate estimate of the actual penetrance, this complicates the use of segregation information in assessments of pathogenicity. Below we consider some implications for the assessment of pathogenicity in the presence of reduced penetrance, and we propose a new metric for assessing co-segregation between a VOI and disease.

Methods

Preliminaries and notation

We focus here on sibship data. The impact of ascertainment for more complex pedigrees can be approximated by considering large sibship sizes. We assume a very rare variant of interest (VOI), and an autosomal dominant disease D. Let a qualifying individual (QI) be anyone who is both heterozygous (HET) for the VOI and also affected (AFF) with D. Let r be the number of QI sibs within a family, and let t be the number of AFF sibs regardless of VOI genotype. We also assume that, regardless of VOI status, an individual might develop D due to other factors, which might be genetic (involving one or more VOIs at other loci or other variants within the same gene) and/or environmental (e.g., due to infections). Let γ be the combined penetrance across all causes other than the VOI under study. Since we assume the VOI is very rare, γ is effectively the population prevalence of D. Let s be the total number of siblings in a family (regardless of phenotype), and let N be the number of s-sized sibships in a dataset.

Ascertainment model

In order to consider a range of plausible ascertainment scenarios, we employ the general family-based k-model of ascertainment [9]. In its simplest form, this model stipulates that the probability that a family is ascertained is proportional to r^k, where k controls the model. For example, when k = 1, the probability of ascertainment is strictly proportional to r: this is equivalent to classical “single ascertainment”. Similarly, when k = 0, so that every family with r ≥ 1 is ascertained, this model is equivalent to classical “complete” or “truncate” ascertainment. We generalize this model in two ways. First, we assume that ascertainment requires r ≥ 1, that is, every ascertained family contains at least one QI, but we allow that there may be additional preferential ascertainment of families based on t alone, that is, that investigators may preferentially ascertain families with more affected individuals without knowing (or prior to knowing) the VOI status of those additional individuals. Second, we allow that even an individual carrying the VOI may develop disease due to any other independent causes at work in the general population. With these two extensions in mind, our ascertainment model becomes

$$P[\text{sibship is ascertained} \mid r, t] = \begin{cases} c\,(r^{k} + t), & r \ge 1,\\ 0, & \text{otherwise,} \end{cases}$$

where c is a normalizing constant.
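As an illustration of how k shapes this model, the short Python sketch below (our own, with hypothetical function names; it is not taken from PenEst) evaluates the unnormalized weight r^k + t and compares a sibship with two QIs to one with a single QI when γ = 0, so that t = r. The constant c cancels in such ratios and is omitted.

```python
def ascertainment_weight(r, t, k):
    """Unnormalized ascertainment probability r**k + t; zero unless r >= 1."""
    if r < 1:
        return 0.0
    return r**k + t

def relative_odds(k):
    """Ascertainment odds of a sibship with r = t = 2 versus r = t = 1 (gamma = 0)."""
    return ascertainment_weight(2, 2, k) / ascertainment_weight(1, 1, k)

for k in (2, 1, 0, -1):
    print(k, relative_odds(k))   # prints: 2 3.0 / 1 2.0 / 0 1.5 / -1 1.25
```

Under k = 1 the two-QI sibship is exactly twice as likely to be ascertained, as single ascertainment requires; under k = 2 it is three times as likely, which is the kind of violation of proportionality described in the Introduction.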

Estimation methods

Let f be the attributable penetrance, or the penetrance due to the VOI for HET individuals. (Note that when γ > 0, β = P[AFF|HET] = γ+f−γf. However, we focus here on estimation of f itself rather than β.) In what follows, we estimate f in three ways, the first two of which are:

(i) $\tilde{f}$ is obtained by counting the proportion of AFF individuals among all HET individuals in the data set, after dropping one QI individual per family, that is, applying the correction for single ascertainment;
(ii) ${\tilde{f}}^{*}$ is obtained by counting the proportion of AFF individuals among all HET individuals in the data set, that is, without applying any ascertainment correction.
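Both estimates are pooled proportions. A minimal Python sketch, assuming each sibship is stored as a list of (HET, AFF) boolean pairs (a hypothetical data layout) and following the convention stated under Simulation methods that sibships whose only HET member is the QI contribute to neither estimate:

```python
def penetrance_estimates(sibships):
    """Return (f_tilde, f_tilde_star) pooled over sibships, or None if undefined.

    Each sibship is a list of (het, aff) booleans and is assumed to contain
    at least one qualifying individual (QI: HET and AFF).
    """
    corr_aff = corr_het = naive_aff = naive_het = 0
    for sibs in sibships:
        h = sum(het for het, _ in sibs)              # number of HET sibs
        r = sum(het and aff for het, aff in sibs)    # number of QIs
        if h < 2:                                    # only the QI is HET
            continue
        corr_aff += r - 1      # (i) drop one QI: single-ascertainment correction
        corr_het += h - 1
        naive_aff += r         # (ii) no ascertainment correction
        naive_het += h
    if corr_het == 0:
        return None
    return corr_aff / corr_het, naive_aff / naive_het

data = [
    [(True, True), (True, False)],    # two HET sibs, one QI
    [(True, True), (True, True)],     # two HET sibs, two QIs
    [(True, True), (False, False)],   # only the QI is HET: not informative
]
print(penetrance_estimates(data))     # (0.5, 0.75)
```

Because every retained sibship contains at least one QI among its HET members, the uncorrected proportion is pulled upward relative to the corrected one.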

${\tilde{f}}^{*}$ is a naïve estimate, which would be correct if the families were not ascertained based on either phenotype or genotype. It is clearly, however, incorrect under any of our ascertainment models. Our interest in this estimate is to establish how biased it becomes under various ascertainment scenarios. $\tilde{f}$ by contrast, does apply the frequently employed single ascertainment correction, and again, our interest in $\tilde{f}$ is to establish how biased it will be under ascertainment scenarios other than single ascertainment.

The third form of penetrance estimate we consider is based on an “ascertainment assumption free” [10] approach, which involves conditioning on all of the phenotypic data. This is the ascertainment correction implicit in the usual LOD score [11–13], and also the LOD score allowing for linkage disequilibrium or LD-LOD [6, 14, 15], and in principle any program that allows calculation of the LOD score will support this method. The calculation is done here assigning the VOI (which plays the role of the “marker”) and the disease allele the same (rare) frequency (we have used 0.001 in the simulations), assuming complete linkage disequilibrium between the two (D′ = 1), and also assuming 0 recombination between the marker and the disease allele. Free parameters in the model are then the 3 penetrances; in our calculations we also include the admixture parameter α of Smith [16], representing the probability that any given family is of the “linked” type, which adds robustness when phenocopy levels are high.

Maximizing the LD-LOD over the free parameters gives the LD-MOD, which occurs at the maximum likelihood estimate (m.l.e.) $\hat{f}$ of f [10–13], giving our third estimate:

(iii) $\hat{f}$ is obtained by maximizing the LD-LOD over the penetrance vector.

Assessment of pathogenicity

While maximizing the LD-LOD can be used to estimate f, the LD-MOD itself is not a good statistic for representing the strength of evidence for co-segregation between the VOI and disease, because it is not additionally conditioned on ascertainment through the VOI. We note, however, that in nuclear families, once we ascertain so as to require the VOI to be present in the family, there is no remaining LD information in the sibship, since LD information is conveyed entirely by the marker allele frequencies in the parents. Therefore, co-segregation can be assessed using the ordinary (linkage equilibrium) LOD, or LE-LOD. Because maximizing the LE-LOD itself will not return true m.l.e.s of f under the LD model, we instead evaluate the LE-LOD at the maximizing model obtained from the LD-MOD, yielding a statistic we denote LE-LOD(max).
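To show what an LE-LOD computation involves for the sibships considered here, the following is a simplified Python sketch. It is our own illustration, not Kelvin's implementation, and it assumes: one parent heterozygous (V/v) for the VOI and the other v/v, both phenotypically unknown; a biallelic disease locus with allele frequency p in Hardy-Weinberg equilibrium; linkage equilibrium, so each phase of the VOI-heterozygous parent has probability 1/2; and Smith's admixture LOD comparing θ = 0 with θ = 1/2. Evaluating it at the penetrances and α taken from the LD-MOD would give an LE-LOD(max)-style statistic. The function names are hypothetical.

```python
from itertools import product
from math import log10

def family_likelihood(sibs, pen, p, theta):
    """Likelihood of the sibs' phenotypes given their VOI status.

    sibs: list of (carries_voi, affected) booleans
    pen:  (f_dd, f_Dd, f_DD), indexed by the number of D alleles
    Parent 1 is V/v at the VOI, parent 2 is v/v; parental phenotypes are unknown.
    The (1/2)**s probability of the VOI transmissions is omitted: it cancels in the LOD.
    """
    q = 1 - p
    hwe = (q * q, 2 * p * q, p * p)        # P(parent carries 0, 1 or 2 D alleles)
    total = 0.0
    for g1, g2 in product(range(3), repeat=2):
        # Phases of parent 1 as (D on its V chromosome, D on its v chromosome, prob);
        # under linkage equilibrium each phase of a D/d parent has probability 1/2.
        if g1 == 1:
            phases = [(1, 0, 0.5), (0, 1, 0.5)]
        else:
            phases = [(g1 // 2, g1 // 2, 1.0)]
        for d_on_V, d_on_v, w in phases:
            like = 1.0
            for carries, aff in sibs:
                # Disease allele on the parent-1 chromosome carrying the sib's VOI
                # allele, and on the other chromosome (inherited with prob theta).
                linked, other = (d_on_V, d_on_v) if carries else (d_on_v, d_on_V)
                prob = 0.0
                for a1, pa1 in ((linked, 1 - theta), (other, theta)):
                    for a2, pa2 in ((1, g2 / 2), (0, 1 - g2 / 2)):
                        pen_g = pen[a1 + a2]
                        prob += pa1 * pa2 * (pen_g if aff else 1 - pen_g)
                like *= prob
            total += hwe[g1] * hwe[g2] * w * like
    return total

def le_lod(families, pen, p=0.001, alpha=1.0):
    """Admixture LE-LOD at theta = 0 versus theta = 1/2, summed over sibships."""
    lod = 0.0
    for sibs in families:
        l_link = family_likelihood(sibs, pen, p, 0.0)
        l_null = family_likelihood(sibs, pen, p, 0.5)
        lod += log10((alpha * l_link + (1 - alpha) * l_null) / l_null)
    return lod

pen = (0.0, 0.5, 0.5)   # illustrative dominant penetrances, no phenocopies
print(le_lod([[(True, True), (True, True)]], pen))    # positive: both carriers affected
print(le_lod([[(True, True), (False, True)]], pen))   # negative: affected non-carrier
```

Setting α = 0 makes every family contribute exactly zero, which is one way the admixture parameter buffers the score against families whose phenotypes are driven by other causes.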

Thompson et al. [6], following Petersen et al. [15], proposed using a particular form of what they refer to as a Bayes factor (BF) for assessing the strength of evidence for co-segregation of the VOI with disease. We refer to this statistic as the Thompson BF (TBF). The TBF is closely related to the LD-LOD, but it incorporates an additional adjustment for single ascertainment through a QI. As we illustrate below, unlike the LD-LOD, the TBF cannot be maximized to obtain ascertainment-corrected estimates of the penetrances, and [6] did not recommend using it for this purpose. This complicates application of the TBF, however, because it requires specifying a fixed set of penetrances (but see also [15]), which must be obtained or estimated separately; furthermore, it is not clear whether the adjustment for single ascertainment incorporated into the TBF is strictly correct or robust to other ascertainment models.

In what follows we evaluate the behavior of the LE-LOD(max), and compare it with the TBF, using the simulated data. For comparability with the LOD, we report TBF on the log₁₀ scale. We also incorporate the admixture parameter α in order to afford comparability with the LE- or LD-LOD in maintaining some degree of robustness to phenocopies.

Simulation methods

Expected values of $\tilde{f}$, ${\tilde{f}}^{*}$ and $\hat{f}$ were obtained via simulation, by averaging each estimate’s value across 1,000 replicates per generating condition, and standard errors were obtained by averaging the standard deviation of each estimate across those same 1,000 replicates. (While the expected values of $\tilde{f}$ and ${\tilde{f}}^{*}$ are easily calculated analytically, the standard errors are not.) We note that, depending on the generating conditions, many sibships may end up with only the QI being HET. In such sibships, the proportion of AFF out of all HET individuals cannot be scored after dropping the QI, and therefore these sibships do not contribute to $\tilde{f}$ and ${\tilde{f}}^{*}$, effectively reducing the sample size. For simplicity, in computing $\tilde{f}$ and ${\tilde{f}}^{*}$ we assume all parents are phenotypically and genotypically unknown; when computing LODs and the TBF, parents are treated as genotypically known but phenotypically unknown. Including parental information does not substantively affect results. Simulations and calculations were done in MATLAB (2021.9.10.0.1739362 (R2021a), Natick, Massachusetts: The MathWorks Inc.); LE-LOD and TBF calculations were done using Kelvin [17].
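The simulation design for $\tilde{f}$ and ${\tilde{f}}^{*}$ can be sketched in Python as follows. This is an assumption-laden illustration, not the MATLAB code used for the paper: sibships are drawn from one VOI-heterozygous parent (so each sib is HET with probability 1/2), each HET sib is affected with probability β = γ + f − γf and each non-HET sib with probability γ, and a sibship is kept with probability c(r^k + t) by rejection sampling, with c chosen so that this probability never exceeds 1. The function names and default seed are hypothetical.

```python
import random
from statistics import mean, stdev

def simulate_sibship(s, k, f, gamma, rng):
    """Draw one ascertained sibship as a list of (het, aff) booleans."""
    beta = gamma + f - gamma * f               # P[AFF | HET]
    if beta <= 0:
        raise ValueError("no QI can ever occur when f = gamma = 0")
    c = 1.0 / (max(1, s**k) + s)               # ensures c * (r**k + t) <= 1
    while True:                                # rejection sampling
        sibs = []
        for _ in range(s):
            het = rng.random() < 0.5           # VOI inherited from the HET parent
            aff = rng.random() < (beta if het else gamma)
            sibs.append((het, aff))
        r = sum(het and aff for het, aff in sibs)   # qualifying individuals
        t = sum(aff for _, aff in sibs)             # all affected sibs
        if r >= 1 and rng.random() < c * (r**k + t):
            return sibs

def estimates(sibships):
    """Pooled (f_tilde, f_tilde_star); None if no sibship has >= 2 HET sibs."""
    corr_aff = corr_het = naive_aff = naive_het = 0
    for sibs in sibships:
        h = sum(het for het, _ in sibs)
        r = sum(het and aff for het, aff in sibs)
        if h < 2:                              # only the QI is HET
            continue
        corr_aff += r - 1                      # drop one QI per sibship
        corr_het += h - 1
        naive_aff += r
        naive_het += h
    if corr_het == 0:
        return None
    return corr_aff / corr_het, naive_aff / naive_het

def simulate(s, k, f, gamma, n_sibships, n_reps=1000, seed=1):
    """Mean and SD across replicates of f_tilde and of f_tilde_star."""
    rng = random.Random(seed)
    ft, fts = [], []
    for _ in range(n_reps):
        data = [simulate_sibship(s, k, f, gamma, rng) for _ in range(n_sibships)]
        est = estimates(data)
        if est is not None:                    # skip replicates with nothing to score
            ft.append(est[0])
            fts.append(est[1])
    return (mean(ft), stdev(ft)), (mean(fts), stdev(fts))

print(simulate(2, 1, 0.5, 0.0, 10, n_reps=200))
```

Rejection sampling is chosen here because it implements the ascertainment probability literally; for large s or extreme k it becomes slow, and enumerating sibship configurations would be preferable.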

Results

Impact of ascertainment on penetrance estimates

Fig 1 shows results for true single ascertainment (k = 1), for s = 2, as a function of sample size N. Here we assume that the true value of f = 0.5. As can be seen, in this case, the mean of $\tilde{f}$ = 0.5, the generating value, as expected. But using ${\tilde{f}}^{*}$ the estimates are seriously upwardly biased in all data sets, regardless of N. Note that because each sibship contains at least one QI, by stipulation, the minimum value of ${\tilde{f}}^{*}$ is 0.50.

Note too that even the correct estimate $\tilde{f}$ shows considerable sampling variability. For instance, with N = 10, $\tilde{f}$ will be >70% or <30% in approximately 40% of all data sets when f = 50%. This variability remains appreciable even for N = 50.
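The roughly 40% figure can be checked exactly in this simplest setting (s = 2, k = 1, γ = 0). Under the ascertainment model, a sibship with one carrier is ascertained with total weight proportional to (1/2)·f·2 = f, and a two-carrier sibship with total weight (1/4)[4f² + 2·2f(1 − f)] = f. So an ascertained sibship is informative (both sibs HET) with probability 1/2 whatever f is, and in it the non-proband carrier is affected with probability f. The number of informative sibships M is then Binomial(N, 1/2), the number X of affected non-proband carriers is Binomial(M, f), and $\tilde{f}$ = X/M. The Python sketch below (our derivation; the function name is hypothetical) sums the resulting tail probabilities, conditioning on M ≥ 1.

```python
from math import comb

def prob_extreme(n_sibships, f, lo=0.3, hi=0.7):
    """P(f_tilde < lo or f_tilde > hi), given at least one informative sibship,
    for s = 2, k = 1, gamma = 0, using the binomial reduction described above."""
    total = 0.0
    for m in range(1, n_sibships + 1):          # m informative sibships
        p_m = comb(n_sibships, m) * 0.5**n_sibships
        for x in range(m + 1):                   # x affected non-proband carriers
            est = x / m                          # value of f_tilde for this data set
            if est < lo or est > hi:
                total += p_m * comb(m, x) * f**x * (1 - f)**(m - x)
    return total / (1 - 0.5**n_sibships)        # exclude data sets with m = 0

print(round(prob_extreme(10, 0.5), 3))          # 0.394
```

Replacing 10 with 50 in the call shows how slowly this probability falls as N grows.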

For ascertainment models other than single, overall variability remains similar to what is shown in Fig 1, but even $\tilde{f}$ tends to be biased, with mean $\tilde{f}$ = 0.60, 0.50, 0.43 and 0.38 for k = 2, 1, 0 and −1, respectively. In all cases, the uncorrected ${\tilde{f}}^{*}$ will return even more biased estimates, with mean ${\tilde{f}}^{*}$ = 0.89, 0.88, 0.87 and 0.86, for k = 2, 1, 0 and −1, respectively.
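Because the model is simple, the large-N limits of the two estimates can also be computed exactly by enumerating sibship configurations rather than simulating. The Python sketch below (hypothetical function names; it applies the convention from Simulation methods that sibships with a single HET member are dropped, and treats each estimate as a pooled ratio) returns the limits of $\tilde{f}$ and ${\tilde{f}}^{*}$. For s = 2, γ = 0 and f = 0.5 it gives $\tilde{f}$ = 0.6, 0.5, 3/7 ≈ 0.43 and 5/13 ≈ 0.38 for k = 2, 1, 0 and −1, agreeing with the means quoted above. The naive limit depends on exactly which sibships are retained, so the second returned value is illustrative only.

```python
from math import comb

def binom_pmf(n, x, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def expected_estimates(s, k, f, gamma):
    """Large-N limits of (f_tilde, f_tilde_star) for sibships of size s under
    P[ascertained | r, t] proportional to r**k + t (r >= 1), assuming one
    VOI-heterozygous parent and independent phenocopies with probability gamma."""
    beta = gamma + f - gamma * f                 # P[AFF | HET]
    corr_aff = corr_het = naive_aff = naive_het = 0.0
    for h in range(2, s + 1):                    # sibships with h <= 1 HET sibs drop out
        p_h = binom_pmf(s, h, 0.5)               # VOI transmitted with prob 1/2 per sib
        for r in range(1, h + 1):                # at least one QI is required
            p_r = binom_pmf(h, r, beta)
            for u in range(s - h + 1):           # affected non-carriers
                p_u = binom_pmf(s - h, u, gamma)
                w = p_h * p_r * p_u * (r**k + r + u)   # t = r + u
                corr_aff += w * (r - 1)          # drop one QI per sibship
                corr_het += w * (h - 1)
                naive_aff += w * r
                naive_het += w * h
    return corr_aff / corr_het, naive_aff / naive_het

for k in (2, 1, 0, -1):
    print(k, round(expected_estimates(2, k, 0.5, 0.0)[0], 2))
# prints: 2 0.6 / 1 0.5 / 0 0.43 / -1 0.38
```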

Fig 2 shows the impact of the population prevalence γ on average penetrance estimates. Focusing first on single ascertainment (k = 1) and f = 0.5, we can see that the expected value of $\tilde{f}$ is relatively independent of γ until γ becomes quite high. Note that for f = 0.5 and γ = 0.5, the actual probability that a VOI carrier is affected under our generating model is 0.5 + 0.5 − (0.5)(0.5) = 0.75, which is in line with the estimates returned by $\tilde{f}$. ${\tilde{f}}^{*}$ might be said to be even more robust to γ, although this is because in this case ${\tilde{f}}^{*}$ is already close to the top of the scale for γ = 0. Moreover, ${\tilde{f}}^{*}$ appears indifferent not only to γ but also to f itself, with estimates >70% even for f = 0.05, and >80% for f = 0.05 when γ = 0.5. These patterns repeat for different values of k, with visible impact only on the magnitude of the bias for any given (f, γ) combination. Ascertainment effects will be reduced as s increases. Users who are interested in investigating penetrance estimates for other ascertainment models, other combinations of parameter values or other sibship sizes are encouraged to download the PenEst app: https://github.com/MathematicalMedicine/PenetranceEstimator.
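The expression for β used here follows from the model's assumption that the VOI and the other causes act independently in a carrier, so that a carrier remains unaffected only if neither acts. Under that assumption the relation can also be inverted to recover f from the carrier risk β:

$$1-\beta = (1-\gamma)(1-f) \;\Longrightarrow\; \beta = \gamma + f - \gamma f, \qquad f = \frac{\beta-\gamma}{1-\gamma}\quad(\gamma<1).$$

For f = γ = 0.5 this gives β = 0.75, and conversely (0.75 − 0.5)/(1 − 0.5) = 0.5.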

Fig 3 shows results for $\hat{f}$ for the same data used in Figs 1 and 2. As can be seen, $\hat{f}$ behaves very much like $\tilde{f}$ when k = 1 (Fig 3A), but unlike $\tilde{f}$ it remains almost completely robust to ascertainment, and also to γ at least until γ is quite large (Fig 3B). (As with $\tilde{f}$, as γ gets very large, $\hat{f}$ covers both cases due to the VOI and those among variant carriers due to other causes.) Comparing Fig 3A with Fig 1A, $\hat{f}$ shows slightly greater sampling variability than $\tilde{f}$; this is due to the ascertainment correction built into $\hat{f}$. The slight but systematic over- or under-estimation of f seen in Fig 3B is due to the small sample size; as N increases, $\hat{f} \to f$ (results not shown). However, in small samples the upward bias can be appreciable, particularly when f is small; e.g., when f = 0.05 (γ = 0) and N = 20, the expected value of $\hat{f}$ is 0.165.

For comparison purposes, Fig 3C shows the corresponding results based on maximizing the TBF. As noted above, this procedure has never been proposed as a mechanism for estimating f, but the figure illustrates that the small differences in form between the LD-LOD and the TBF fundamentally change the applicability of the ascertainment assumption free approach to estimation. This becomes relevant when deciding how to set parameter values in calculating the TBF for purposes of assessing pathogenicity. We note too that, particularly in the presence of phenocopies, estimates obtained by maximizing the TBF remain highly biased even in very large samples. For example, for s = 2, k = 1, N = 1000 and f_DD = f_Dd = 0.05 or 0.5, when γ = 0.1, maximizing the LD-LOD returns estimates of f_Dd of 0.07 (s.d. 0.05) and 0.50 (0.04), respectively, while maximizing the TBF returns 0.74 (0.26) and 0.66 (0.12), respectively.

Assessment of pathogenicity

Fig 4A shows the distribution of the LE-LOD(max) as a function of γ and k, for f = 0.5, s = 2 and N = 20. Not surprisingly, as γ increases, evidence for co-segregation decreases; also notable is that, while estimates of f are robust to ascertainment, the LE-LOD(max) itself increases as k increases; but since there really is co-segregation, this is not in itself problematic. While values of LE-LOD(max) are small (see also below), they are consistently positive until γ is quite large, indicating evidence in favor of co-segregation. By contrast, results for the TBF evaluated at the generating penetrances, TBF(gen) (Fig 4B), become increasingly negative as γ increases, erroneously indicating evidence against co-segregation even for small values of γ, with strikingly negative values for large γ. For comparison we also show (Fig 4C) results for the TBF when it is evaluated at the same maximizing model used to calculate LE-LOD(max). While this ameliorates the problem somewhat, especially for large γ, the basic pattern of results remains the same.

Fig 5 shows the distribution of LE-LOD(max), as a function of N, when data are generated under the alternative hypothesis of (complete) disequilibrium (Fig 5A) and under the null hypothesis of no linkage and no disequilibrium (Fig 5B), for s = 2, k = 1, γ = 0, and f = 0.5. Notably, under the alternative hypothesis, evidence of co-segregation of the VOI with disease tends to be quite weak until N is at least 30, and even then the chance of obtaining a small LOD score remains high. Under these generating conditions, it appears to require closer to 50 two-child families before there is a reasonable chance of obtaining a substantial LE-LOD(max). Under the null hypothesis, even with N = 50, LE-LOD(max) scores are not consistently negative. However, the maximum and minimum scores all remain small in magnitude, so that the distributions under the alternative and the null become increasingly non-overlapping as N grows. For example, when N = 50 and there is co-segregation, 480/1000 replicates return LE-LOD(max) ≥ 3; however, when there is no co-segregation, 0 out of 1000 replicates do so.

Discussion

In general, our simulations show that under unsystematic ascertainment schemes, or in cases where appropriate ascertainment corrections are not included in the estimation procedure, there is a high risk of over-estimating the penetrance of any given VOI. This finding is consonant with, and may in large part explain, reports for specific variants. For example, multiple coding variants in PRNP had been reported to cause rare dominant monogenic neurodegenerative disease, but variants previously suggested to be causal in this gene were 30-fold more prevalent in ExAC than expected from the estimated prevalence of the disorder [18]. Specifically, for three variants the lifetime risk of developing disease was <10%. Similarly, GWAS array data from the UK Biobank were used to estimate pathogenicity, penetrance, and expressivity of putative disease-causing rare variants (MAF < 1%) that were directly genotyped and of good quality [19]. Focusing on maturity-onset diabetes of the young and developmental disorders, that study found many specific variants for which the penetrance, estimated either in families ascertained for the presence of the VOI or in disease cohorts, was much higher than that obtained from a population-based cohort. For example, previous studies had estimated the penetrance of HNF4A rs137853336 (chr20:43042354C>T, p.Arg114Trp) to be up to 75% by age 40 years from a large maturity-onset diabetes of the young cohort, but data from the UK Biobank estimated its penetrance to be <10% [19]. Similarly, in the same study, none of 6 protein-truncating variants in 5 genes that had previously been related to disease via a haploinsufficiency mechanism were associated with developmental traits, casting doubt on whether such variants in these genes cause developmental delay.

In another study, the median penetrance was estimated to be 14% for 361 variants that were observed in multiple individuals, in genes in which some variants are related to either hypertrophic or dilated cardiomyopathy [20]. For example, MYBPC3:c.1504C>T:p.R502W had an estimated penetrance of ~50% by age 45 years in the clinical setting, whereas penetrance estimates of 6.4% were obtained for this variant from two population-based sequencing cohorts. The extent of coding variation in humans is astounding: gnomAD shows that on average each individual harbours around 11,000 missense variants, about 200 of which are rare (allele frequency <0.1%) [21]. Unique variants are also relatively common: each participant in gnomAD has a mean of 27 (±13) novel coding variants that were not observed in other individuals in gnomAD [21]. These observations have implications for genetic counselling, including the recommendation of invasive screening procedures and the administration of preventative treatment.

By contrast, maximizing the LD-LOD over the penetrances, which yields the LD-MOD, is a valid method for obtaining ascertainment-adjusted maximum likelihood estimates. Variability of these estimates remains a concern, however, even in reasonably large samples (say, N = 50 sibships). While the LD-MOD itself cannot be used as a measure of evidence for or against co-segregation, because it is not properly conditioned on ascertainment through the VOI, the penetrance estimates obtained from the LD-MOD can be used in conjunction with the ordinary (linkage equilibrium) LOD to give a statistic we call the LE-LOD(max). This statistic appears to perform more reliably than the Bayes factor proposed by Thompson et al. [6] in application to sibship data under the conditions we have simulated in this paper. Our results remind us, however, that in the presence of reduced penetrance, co-segregation between a VOI and a disease can be difficult to establish, or rule out, reliably without substantial quantities of data.

Acknowledgments

Special thanks to Jo Valentine-Cooper for creation of the PenEst app.

Data Availability

Data are simulated. Code used in generating and analyzing the data is available at https://github.com/MathematicalMedicine/PenetranceEstimator/.

Funding Statement

Funding came from the National Institutes of Health (NINDS NS085238) in the form of salary support for VJV and SCS. Salary support to VJV and SCS came via a subcontract to Mathematical Medicine from the NIH grant listed above. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

1. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24. doi: 10.1038/gim.2015.30; PubMed Central PMCID: PMC4544753.
2. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91. doi: 10.1038/nature19057; PubMed Central PMCID: PMC5018207.
3. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434–43. Epub 20200527. doi: 10.1038/s41586-020-2308-7; PubMed Central PMCID: PMC7334197.
4. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599(7886):628–34. Epub 20211018. doi: 10.1038/s41586-021-04103-z; PubMed Central PMCID: PMC8596853.
5. Hodge SE, Vieland VJ. The essence of single ascertainment. Genetics. 1996;144(3):1215–23. doi: 10.1093/genetics/144.3.1215; PubMed Central PMCID: PMC1207613.
6. Thompson D, Easton DF, Goldgar DE. A full-likelihood method for the evaluation of causality of sequence variants from family data. Am J Hum Genet. 2003;73(3):652–5. Epub 20030729. doi: 10.1086/378100; PubMed Central PMCID: PMC1180690.
7. Jarvik GP, Browning BL. Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants. Am J Hum Genet. 2016;98(6):1077–81. Epub 20160526. doi: 10.1016/j.ajhg.2016.04.003; PubMed Central PMCID: PMC4908147.
8. MacArthur DG, Manolio TA, Dimmock DP, Rehm HL, Shendure J, Abecasis GR, et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014;508(7497):469–76. doi: 10.1038/nature13127; PubMed Central PMCID: PMC4180223.
9. Ewens WJ, Shute NC. The limits of ascertainment. Ann Hum Genet. 1986;50(4):399–402. doi: 10.1111/j.1469-1809.1986.tb01760.x.
10. Ewens WJ, Shute NC. A resolution of the ascertainment sampling problem. I. Theory. Theor Popul Biol. 1986;30(3):388–412. doi: 10.1016/0040-5809(86)90042-0.
11. Greenberg DA. Inferring mode of inheritance by comparison of lod scores. Am J Med Genet. 1989;34(4):480–6. doi: 10.1002/ajmg.1320340406.
12. Elston RC. Man bites dog? The validity of maximizing lod scores to determine mode of inheritance. Am J Med Genet. 1989;34(4):487–8. doi: 10.1002/ajmg.1320340407.
13. Vieland VJ, Hodge SE. The problem of ascertainment for linkage analysis. Am J Hum Genet. 1996;58(5):1072–84. PubMed Central PMCID: PMC1914614.
14. Slager SL, Huang J, Vieland VJ. Power comparisons between the TDT and two likelihood-based methods. Genet Epidemiol. 2001;20(2):192–209.
15. Petersen GM, Parmigiani G, Thomas D. Missense mutations in disease genes: a Bayesian approach to evaluate causality. Am J Hum Genet. 1998;62(6):1516–24. doi: 10.1086/301871; PubMed Central PMCID: PMC1377150.
16. Smith CA. Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet. 1963;27:175–82. doi: 10.1111/j.1469-1809.1963.tb00210.x.
17. Vieland VJ, Huang Y, Seok SC, Burian J, Catalyurek U, O’Connell J, et al. KELVIN: a software package for rigorous measurement of statistical evidence in human genetics. Hum Hered. 2011;72(4):276–88. Epub 20111223. doi: 10.1159/000330634; PubMed Central PMCID: PMC3267994.
18. Minikel EV, Vallabh SM, Lek M, Estrada K, Samocha KE, Sathirapongsasuti JF, et al. Quantifying prion disease penetrance using large population control cohorts. Sci Transl Med. 2016;8(322):322ra9. doi: 10.1126/scitranslmed.aad5169; PubMed Central PMCID: PMC4774245.
19. Wright CF, West B, Tuke M, Jones SE, Patel K, Laver TW, et al. Assessing the Pathogenicity, Penetrance, and Expressivity of Putative Disease-Causing Variants in a Population Setting. Am J Hum Genet. 2019;104(2):275–86. doi: 10.1016/j.ajhg.2018.12.015; PubMed Central PMCID: PMC6369448.
20. McGurk KA, Zhang X, Theotokis P, Thomson K, Harper A, Buchan RJ, et al. The penetrance of rare variants in cardiomyopathy-associated genes: a cross-sectional approach to estimate penetrance for secondary findings. medRxiv. 2023. doi: 10.1101/2023.03.15.23287112.
21. Gudmundsson S, Singer-Berk M, Watts NA, Phu W, Goodrich JK, Solomonson M, et al. Variant interpretation using population databases: Lessons from gnomAD. Hum Mutat. 2022;43(8):1012–30. Epub 20211216. doi: 10.1002/humu.24309; PubMed Central PMCID: PMC9160216.

PLoS One. doi: 10.1371/journal.pone.0290336.r001

Decision Letter 0

Anshuman Mishra

14 Jun 2023

PONE-D-23-06786

The effect of ascertainment on penetrance estimates for rare variants: implications for establishing pathogenicity and for genetic counselling

PLOS ONE

Dear Dr. Paterson,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Article Report

Article title: ‘The effect of ascertainment on penetrance estimates for rare variants: implications for establishing pathogenicity and for genetic counselling’ by Andrew D Paterson et al. The authors have developed PenEst, an app that allows users to investigate the phenomenon across ranges of parameter settings.

Summary of article: This study examines ascertainment effects on penetrance estimates.

Most of the variants identified are very rare and were identified in small pedigrees, which creates challenges in terms of penetrance estimation and translation into genetic counselling in the setting of cascade testing. They illustrated robust ascertainment corrections via the LOD score, and recommend a LOD-based approach to assessing pathogenicity of rare variants in the presence of reduced penetrance.

Comment: Minor revision

The work is very interesting and these findings have important implications for establishing pathogenicity for variants as well as implications for cascade genetic counselling.

These findings will be of general interest to the human and medical genetics community, since they affect variant interpretation and penetrance estimation for rare variants. The article must be accepted, and the points below may be considered to improve its framework:

The statistical analysis should be presented in more detail so that it is easier to understand.
The results should be compared with recently published work, emphasizing the reasons for any improvement.
The Results and Discussion sections should be organized more clearly, with appropriate content in each.
Application of the app to research in an infectious disease model with healthcare management.

Please submit your revised manuscript by Jul 29 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Anshuman Mishra, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Dear Author,

Please find the Article Report below, and carefully submit the revised version together with a reply to the reviewers.

In the article titled ‘The effect of ascertainment on penetrance estimates for rare variants: implications for establishing pathogenicity and for genetic counselling’, Andrew D Paterson et al. have developed PenEst, an app that allows users to investigate the phenomenon across ranges of parameter settings.

Summary of article: This study examines the effects of ascertainment on penetrance estimates.

Comment: Minor revision

The work is very interesting and these findings have important implications for establishing pathogenicity for variants as well as implications for cascade genetic counselling.

1. The statistical analysis should be presented in more detail so that it is easier to understand.

2. The results should be compared with recently published work, emphasizing the reasons for any improvement.

3. The Results and Discussion sections should be organized more clearly, with appropriate content in each.

4. Application of the app to research in an infectious disease model with healthcare management.

Thanks

Regards

Dr. Anshuman Mishra


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript investigates a relevant and critical problem in the field. The study design and experimental approaches are appropriate and data supports the conclusion. Hence, I endorse publication of this manuscript.

Reviewer #2: In the current study entitled “The effect of ascertainment on penetrance estimates for rare variants: implications for establishing pathogenicity and for genetic counselling”, the authors performed simulations and showed that penetrance estimates for variants in rare diseases can be drastically inflated due to underlying ascertainment bias. They developed a Python-based tool, “PenEst”, for the simulation. Finally, the authors recommended using a LOD-based approach to assess the pathogenicity of rare variants. The article's research is relatively clear. I endorse it for publication in PLOS ONE.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Manju Kashyap

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Sep 21;18(9):e0290336. doi: 10.1371/journal.pone.0290336.r002

Author response to Decision Letter 0

21 Jul 2023

The previous submission was not specifically formatted for PLOS ONE, so we took advantage of the reviewer’s suggestions to better define the analysis and to rearrange the Results and Discussion, revising the structure of the manuscript. This has entailed including the previous supplementary text in the main manuscript, as well as a major reorganization of the manuscript to make the work easier to understand.

The reviewer’s comment about infectious disease and healthcare management is beyond the scope of the current manuscript and has not been explicitly addressed.

Attachment

Submitted filename: PLOS_one_response_to_reviews_ascertainment.docx

Additional data file (16.2 KB, docx).

PLoS One. doi: 10.1371/journal.pone.0290336.r003

Decision Letter 1

Anshuman Mishra

4 Aug 2023

The effect of ascertainment on penetrance estimates for rare variants: implications for establishing pathogenicity and for genetic counselling

PONE-D-23-06786R1

Dear Prof. Andrew D Paterson,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Anshuman Mishra, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Dear Prof. Andrew D Paterson,

Thank you for the revised article and for the corrections that make it better for readers. Hopefully this article will help readers understand a complex genetic phenomenon through PenEst, the tool developed for calculating and displaying the corrected penetrance estimates.

Comment: Accepted

Regards

Anshuman Mishra

PLOS ONE

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0290336.r004

Acceptance letter

Anshuman Mishra

15 Sep 2023

PONE-D-23-06786R1

The effect of ascertainment on penetrance estimates for rare variants: implications for establishing pathogenicity and for genetic counselling

Dear Dr. Paterson:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Anshuman Mishra

Academic Editor

PLOS ONE


Data Availability Statement

Data are simulated. Code used in generating and analyzing the data is available at https://github.com/MathematicalMedicine/PenetranceEstimator/.



Fig 3. Sampling distributions and expected values of $\hat{f}$.