A novel statistical method for interpreting the pathogenicity of rare variants

Jun Wang; Hehe Liu; Renae Elaine Bertrand; Alejandro Sarrion-Perdigones; Yezabel Gonzalez; Koen JT Venken; Rui Chen

doi:10.1038/s41436-020-00948-3

. Author manuscript; available in PMC: 2021 Mar 4.

Published in final edited form as: Genet Med. 2020 Sep 4;23(1):59–68. doi: 10.1038/s41436-020-00948-3

A novel statistical method for interpreting the pathogenicity of rare variants

Jun Wang ^1,², Hehe Liu ^1,², Renae Elaine Bertrand ^1,³, Alejandro Sarrion-Perdigones ³, Yezabel Gonzalez ³, Koen JT Venken ^3,⁴, Rui Chen ^1,²

PMCID: PMC7796914 NIHMSID: NIHMS1630822 PMID: 32884132

Abstract

Purpose:

To achieve the ultimate goal of personalized treatment of patients, accurate molecular diagnosis and precise interpretation of the impact of genetic variants on gene function is essential. With the sequencing cost becoming increasingly affordable, accurate distinguishing benign from pathogenic variants upon sequencing becomes the major bottleneck. Although large normal population sequence databases have become a key resource in filtering benign variants, they are not effective at filtering extremely rare variants.

Methods:

To address this challenge, we developed a novel statistical test by combining sequencing data from a patient cohort with a normal control population database. By comparing the expected and observed allele frequency in the patient cohort, variants that are likely benign can be identified.

Results:

The performance of this new method is evaluated on both simulated and real datasets coupled with experimental validation. As a result, we demonstrate this new test is well-powered to identify benign variants, particularly effective for variants with low frequency in the normal population.

Conclusion:

Overall, as a general test that can be applied to any type of variants in the context of all Mendelian diseases, our work provides a general framework for filtering benign variants with very low population allele frequency.

Keywords: variant interpretation, allele frequency, mendelian diseases, statistical test, clinical genomics

Introduction

The advancement of high throughput sequencing technology significantly facilitates the identification of genetic variations in individuals and populations. However, the determination of the pathogenicity of genetic variants upon sequencing remains a major challenge for precision medicine. Over the last two decades, many in silico variant functional prediction tools have been developed to distinguish the pathogenic from the likely benign genetic variants, but they are far from perfect. For example, the methods based on evolutionary sequence conservation might be prone to both false positives and false negatives as many benign variants also occur in the evolutionarily conserved regions and vice versa. In parallel, many computational prediction methods that applied machine learning algorithms are limited by the training dataset and the type of information incorporated into the model. Furthermore, most of the computational prediction methods focused on missense variants, leaving other types of variants, such as INDEL and noncoding variants, largely unexplored. Recently, with genome sequences of large populations becoming available, a significant proportion of benign variants can be identified based on population allele frequency (AF) and disease prevalence, given variants having high AF in the normal population tend to be benign^1,2. Similarly, by comparing AF between a patient cohort and the normal control population, variants that are enriched in the patient cohort and therefore likely to be pathogenic can be identified³. However, these AF-based methods have limited power when the AF of the variant is low in the normal population.

In this study, we developed a novel statistical method to identify likely benign variants. Briefly, in contrast to previous methods that primarily use the AF in the normal population, our method calculates the expected frequency of an allele in the patient cohort by taking into account the disease prevalence and the AF in the normal population. By comparing the expected and the observed frequency of the variant in the patient cohort, the probability that the variant is pathogenic can be calculated. To test the new statistics, we have applied it to both simulated and real datasets and evaluated its performance based on literature and other variant prediction methods. A subset of the variants with prediction contradictory to previous reports were further examined with experimental functional assays. Based on our results, the model has significantly improved power to annotate rare variants, and it can be applied for all variant types such as SNPs, INDELs and non-coding variants. Overall, we show that our new method is effective and robust as a general framework in evaluating the pathogenicity of variants in the context of Mendelian diseases.

Methods:

Ethics Statement

The studies have approval from the appropriate institutional review board (BCM IRB).

Derivation of the combined test

The combined test is composed of two binomial tests.

Let Q = the disease prevalence due to the variants in gene g in population. q_k= the AF of a single pathogenic variant a_k of the gene g in population. Assume that n patients are randomly sampled from the patients attributable to the variants in gene g.

For a recessive disease gene g, the expected number of pathogenic alleles in n patients would be 2 * n. Among the 2 * n pathogenic alleles, the expected occurrence count of a pathogenic allele a_k should follow a Binomial distribution with N = 2 * n trials and the occurrence frequency (success rate) of $\frac{q_{k}}{\sqrt{Q}}$ .

The first test (test1) is a left-tailed $Binomial . test (X = x, N = 2 n, p = \frac{q_{k}}{\sqrt{Q}})$ :

H0 of the test1: The allele is pathogenic, thus its observed occurrence in patient cohort follows a $Binomial (N = 2 n, p = \frac{q_{k}}{\sqrt{Q}})$ distribution.

H1 of the test1 : The observed occurrence of the allele in patient cohort does not follow $Binomial (N = 2 n, p = \frac{q_{k}}{\sqrt{Q}})$ , significantly lower than $2 n \times \frac{q_{k}}{\sqrt{Q}}$ . Therefore, it is unlikely to be pathogenic (The detailed derivation in Supplementary Methods).

Additionally, if an allele is likely benign, its AF in patient cohort should be similar to its AF in normal population, given it has equal association with patients and normal population.

Thus, the second test (test2) is a right-tailed Binomial.test (X = x, N = 2n, p = q_k),:

H0 of the test2: The allele is benign, thus its observed occurrence in patient cohort follows a Binomial (N = 2n, p = q_k) distribution.

H1 of the test2: The observed occurrence of the allele in patient cohort does not follow Binomial (N = 2n, p = q_k), significantly higher than 2n × q_k. Therefore, it is unlikely to be benign.

The combined test result is based on the results of test1 and test2.

H0 of the combined test: The allele is pathogenic, thus test1 H0 is true (test1 p-value > 0.05) or test2 H0 is rejected (test2 p-value ≤ 0.05).

H1 of the combined test: The test1 H1 is true (test1 p-value ≤ 0.05) and test2 H0 is true (test2 p-value > 0.05), thus the allele is unlikely to be pathogenic (namely likely to be benign).

For a dominant disease gene g, the expected number of pathogenic alleles in n patients (with the rare dominant disease) ≈ n. Among the n pathogenic alleles, the expected occurrence count of a pathogenic allele a_i should follow a Binomial distribution with N = n trials and the occurrence frequency (success rate) of $\frac{q_{i}}{1 - \sqrt{1 - Q}}$ .

Test1 is a left-tailed $Binomial . test (X = x, N = n, p = \frac{q_{k}}{1 - \sqrt{1 - Q}})$ :

H0 of the test1: The allele is likely pathogenic, thus its observed occurrence in patient cohort follows a $Binomial (N = n, p = \frac{q_{k}}{1 - \sqrt{1 - Q}})$ distribution.

H1 of the test1 : The observed occurrence of the allele in patient cohort does not follow $Binomial (N = n, p = \frac{q_{k}}{1 - \sqrt{1 - Q}})$ , significantly lower than $n \times \frac{q_{k}}{1 - \sqrt{1 - Q}}$ . Therefore, it is unlikely to be pathogenic (Supplementary Methods).

Additionally, if an allele is benign, its AF in patient cohort should be similar to its AF in normal population, given it has equal association with patients and normal population.

Test2 is a right-tailed Binomial.test (X = x, N = n, p = 2 × q_k):

H0 of the test2: The allele is likely benign, thus its observed occurrence in patient cohort follows a Binomial (N = n, p = 2 × q_k) distribution.

H1 of the test2: The observed occurrence of the allele in patient cohort does not follow Binomial (N = n, p = 2 × q_k), significantly higher than n × 2 × q_k. Therefore, it is unlikely to be benign.

The combined test result is based on the results of test1 and test2.

H0 of the combined test: The allele is likely pathogenic, thus test1 H0 is true (test1 p-value > 0.05) or test2 H0 is rejected (test2 p-value ≤ 0.05).

H1 of the combined test: The test1 H1 is true (test1 p-value ≤ 0.05) and test2 H0 is true (test2 p-value > 0.05), thus the allele is unlikely to be pathogenic (namely likely to be benign).

The treatment for X-linked genes and for population stratification is in supplementary materials.

Other Methods

See supplementary materials.

Code availability

Our code are available at:

https://github.com/fe4960/Binomial_test

Results:

The simulation analysis of the test power and specificity

To assess the performance of the test, we simulated a variety of scenarios with different settings for the patient cohort size (n), the disease prevalence (Q), and the AF in the normal population (q_k).

The test power increases as the patient cohort size increases

The test power is positively correlated with the patient cohort size (Figure 1A). Larger sample size of patient cohort will allow the test to achieve high power and specificity in determining the pathogenicity of rarer variants. For example, under the autosomal dominant (AD) model, for a disease with a prevalence of 1/1000, when the patient cohort size is increased from 120 to 200, the test power will increase from 0 to about 100% in detecting rare benign variants with a population AF at 1×10⁻⁵ (Figure1A, Supplementary table S1). The similar trend was observed under the AR model (Figure1B, Supplementary table S1).

The power increases as the disease prevalence decreases

The test power is negatively correlated with the disease prevalence (Figure 1C–F, Supplementary table S1). Namely, the test has higher power for rarer diseases. For example, under the AD model, to detect benign variants with a population AF of 5×10⁻⁵, when the associated disease prevalence decreases from 1/200 to 1/1000, the patient cohort size that the test requires to achieve about 100% power will decrease from 200 to 50. The similar trend was observed under the AR model (Supplementary table S1).

The test has a high specificity

We found that the test specificity is robust and remains close to 100% and is largely unaffected by the AF in the normal population and disease prevalence. As long as the observed frequency of the variant in the patient cohort is greater than or similar to the expected frequency based on the AF in the normal population and disease prevalence, the test will typically not consider the variant as a benign variant (Figure 1G–L, Supplementary table S1).

Sampling bias in patient cohort could affect the test power and specificity

When applying the test to detect the variants biasedly enriched in the sampled patient cohort, the test power to detect benign alleles will decrease (Figure 2, Supplementary Table S2). For example, if the observed frequency of a variant is five folds of the true AF due to sampling bias, for an AR disease with a disease prevalence of 1/10000, the test power of detecting variants with a population AF of 5×10⁻⁴ will decrease to 26% with a patient cohort size of 1000. Similarly, when applying the test to identify the variants artificially depleted in the sampled patient cohort due to sampling bias, the test specificity to detect pathogenic alleles will decrease when the patient sample size is small but will rapidly recover as the sample size increases (Figure 2, Supplementary Table S2). For example, if the observed frequency of a variant is at 10% of the true AF, for an AD disease with a disease prevalence of 1/1000, the test specificity of detecting a variant with a population AF of 5×10⁻⁵ will increase from 39% to 92% when the sample size increases from 50 to 250. The similar trend was observed under the AR model.

Figure 2. — The simulation analysis of sampling bias

Sampling bias can affect the test performance. The test power decreases for detecting the variants biasedly enriched in the patient sampling, and is not significantly affected for detecting the variants biasedly depleted in the sampling, for variants with a population AF of 1×10⁻⁴ under the AD model (A), a population AF of 1×10⁻³ under the AR model (B), a population AF of 5×10⁻⁵ under the AD model (C), and a population AF of 5×10⁻⁴ under the AR model (D). The test specificity is not significantly affected for detecting the variants biasedly enriched in the patient sampling, and decreases for detecting the variants biasedly depleted in the sampling, which can be rapidly improved as the sample sizes increase, for variants with a population AF of 1×10⁻⁴ under the AD model (E), a population AF of 1×10⁻³ under the AR model (F), a population AF of 5×10⁻⁵ under the AD model (G), and a population AF of 5×10⁻⁴ under the AR model (H)

Misspecification of disease prevalence could affect test performance

As shown in Supplementary Figure S1, in the case where the disease prevalence is under-estimated, the test power to detect benign alleles and test specificity to detect pathogenic alleles are not significantly affected. On the other hand, when the disease prevalence is over-estimated, the power of the test to detect benign alleles decreases but rapidly improves with increasing of the sample size, while the specificity of the test to detect pathogenic alleles is not significantly affected (Supplementary materials, Supplementary Figure S1, Table S3).

The impact of allele penetrance on test performance

Based on simulation, if a disease is attributed to alleles with various penetrance values, the test power to detect benign alleles will not be significantly affected, whereas, test specificity to detect pathogenic alleles will decrease for alleles with low penetrance using small patient cohort sizes but will rapidly improve as the sample size increases. (Supplementary materials, Supplementary Figure S2, Table S4). Additionally, we performed the simulation for the scenario of a disease attributed to multiple pathogenic alleles with heterogenous penetrance. Under this scenario, the test power to detect benign alleles is not significantly affected, and test specificity to detect pathogenic alleles with low penetrance decreases using small sample sizes but will rapidly improve as the sample size increases (Supplementary materials, Supplementary Figure S3, Table S4). Overall, the test shows excellent power and specificity for alleles with penetrance ≥ 50%.

Estimation of the thresholds of population allele frequency for filtering variants

We performed simulation to determine the thresholds of population AF for filtering benign variants for diseases with a variety of disease prevalence under AR/AD model, respectively (Supplementary Materials). As shown in Supplementary Table S5, our test allows the filtering of alleles that are at least 10 times less frequent in population than applying thresholds based on disease prevalence alone.

Real case studies

To evaluate the performance of our model, we applied our test to variants in three genes, ABCA4, USH2A and LRP5, to represent three types of scenarios.

Screening of reported pathogenic variants in ABCA4 for Stargardt (STGD)

The disease prevalence of STGD is estimated as 1 in 10000 individuals⁴. It has been estimated that about 70% of STGD patients carry variants in ABCA4⁵. Therefore, this represents the scenario of a recessive disease with a relatively homogeneous genetic cause. We screened 945 reported pathogenic variants in ABCA4 genes collected in HGMD. Among them, 11 variants are likely benign, as their population AF in is higher than 0.7% $(\sqrt{1 / 20000})$ , the cutoff based on STGD disease prevalence, therefore were excluded from further analysis. The remaining 934 variants were subjected to our test model. As a result, 26 variants with the AF in the range of 0.46% to 0.03% were identified as likely benign (Binomial test1, Bonferroni correction p-value ≤ 0.05/934 and test2 Bonferroni correction p-value > 0.05/934) (Figure 3A).

Figure 3. — The distribution of population AF of the HGMD variants in the three genes. (A) The Non-Finnish European (NFE) AF of *ABCA4* variants in HGMD, (B) The NFE AF of *USH2A* variants in HGMD, and (C) The East-Asian AF of *LRP5* variants in HGMD. The orange indicates the HGMD variants identified as likely benign by our test. The blue indicates the other HGMD variants.

To examine the test result, we further reviewed the original literatures and examined the gnomAD database (gnomAD) for the 26 variants. Based on published literature, 8 of the 26 variants should be annotated as benign as they are reported as non-pathogenic polymorphisms in one or more papers (e.g. NM_000350.2:c.455G>A^6,7, NM_000350.2:c.2701A>G⁶), behave similarly to wildtype based on functional assays (e.g. NM_000350.2:c.5693G>A⁸), or do not segregated with the disease (e.g. NM_000350.2:c.4685T>C⁷) in the original report. Consistently, homozygous individuals were observed for five of the eight variants in gnomAD. Additionally, 13 of the 26 variants should be annotated as a variant of uncertain significance (VUS) based on data presented in the original papers due to the lack of definite association of the variant to the disease^4,6,9–12. Interestingly, 6 of these 13 variants have homozygous individuals observed in gnomAD (Supplementary table S6, S7). Finally, 5 of the 26 variants were noted as likely pathogenic in the original reports, given the patients carry a 2^nd variant in ABCA4, or supported by functional evidence^7,8,13–19. However, four of the five variants have homozygotes identified in gnomAD with the number ranging from 1 to 68 (Supplementary table S6, S7). Therefore, almost all the 26 variants that predicted to be benign by our test are indeed unlikely to be pathogenic.

Screening of reported pathogenic variants in USH2A for Retinitis pigmentosa (RP)

The disease prevalence of RP was estimated as 1 in 4000 (https://ghr.nlm.nih.gov/condition/retinitis-pigmentosa#statistics), and RP has been linked to more than 60 disease genes. About 10%−15% of RP patients were attributed to variants in USH2A^20,21. Therefore, this represents the scenario where variants in a gene lead to a recessive disease with heterogenous genetic basis. 748 USH2A variants collected in HGMD were evaluated. Among them, 13 variants with AF greater than 0.52% $(\sqrt{11 / 400000})$ in gnomAD were considered benign and excluded from further analysis (supplemental table S8 and S9). For the remaining 735 variants, 18 were identified as likely benign (Binomial test1, Bonferroni correction p-value ≤ 0.05/735 and test2 Bonferroni correction p-value > 0.05/735), whose AF ranges from 0.04% to 0.5% (Figure 3B). Based on data in the original reports and the number of individuals carrying homozygous variants observed in gnomAD, at least 10 of the 18 variants are likely to be benign. Specifically, three variants were suggested to be benign by the original reference (e.g. NM_206933.2:c.15433G>A²², NM_206933.2:c.6587G>C^23,24). All of them have homozygotes in gnomAD, with the number ranging from 4 to 120. Eight variants were annotated as VUS based on the original papers (e.g. NM_206933.2:c.10510C>G¹⁶) and three of them have homozygotes in gnomAD with the number ranging from one to nine (Supplementary table S8 and S9). Finally, for the seven variants that were suggested to be likely pathogenic by the original papers (e.g. NM_206933.2:c.11815G>A²⁵), four of them have homozygotes in gnomAD with the number ranging from 1 to 95 (Supplemental table S8 and S9).

Identification of benign variants in LRP5 gene for Familial Exudative Vitreoretinopathy (FEVR)

FEVR is a rare retinal vascular disorder that is primarily dominantly inherited²⁶. Variants in LRP5 account for about 20% – 25% of FEVR patients. Therefore, this represents the scenario where variants in a gene lead to a dominant disease with heterogenous genetic basis. 150 putative pathogenic variants in LRP5 were analyzed. After excluding seven variants based on the disease prevalence of FEVR (greater than 0.05%, $1 - \sqrt{1 - 1 / 1000}$ ), 16 likely benign variants were identified (Binomial test1, Bonferroni correction p-value ≤ 0.05/143 and test2 Bonferroni correction p-value > 0.05/143). The AF of the 16 variants are in the range of 0.0058% – 0.029% (Figure 3C). Based on the data presented in the original report, nine variants were found in patients with diseases unrelated to eye, including six variants related to bone diseases and three variants related to colorectal cancer, indicating these variants are not pathogenic variants for FEVR. Among the seven variants have been linked to eye diseases, two variants were identified in patients with a 2^nd LRP5 variant in compound heterozygous form, implying each variant alone might be insufficient to lead to disease. For the remaining five variants, one variant was considered as benign, one as VUS, and three as likely pathogenic based on the original literature (Supplementary table S10 and S11). To further evaluate these five variants, functional assays are conducted as described below.

Comparison to in silico variant prediction scores and ClinVar annotation

We have compared our results to three commonly used in silico variant prediction tools, including CADD, REVEL and phastcon_100way. The benign variants identified by our method have lower CADD, REVEL and phastcon_100way scores than the other reported variants in the same gene for all the three genes, supporting our test result (Figure 4A–C and Supplementary table S6, S8, S10, S12, Supplementary Results).

To further assess our test results, we crossed reference to ClinVar annotation. As shown in figure 4D, the ClinVar assessment largely supports our test results. Specifically, among the 74 out of the 91 benign variants identified by our test and with records in ClinVar, the vast majority are classified as benign (32 variants, 43.2%), conflicting interpretations of pathogenicity (24 variants, 32.4%), and uncertain_significance (15 variants, 20.3%) (Figure 4D, and Supplementary table S6, S8, and S10).

Functional assay of predicted benign variants in LRP5

To further evaluate our results, we performed a functional assay on five LRP5 variants that were originally reported as pathogenic but are identified as likely benign by our test. As shown in figure 5, three of the variants, LRP5.R1219H (NM_002335.3:c.3656G>A, two tailed T test, p-value < 0.005), LRP5.R1342Q (NM_002335.3:c.4025G>A, T test, p-value > 0.05), LRP5.H1383P (NM_002335.3:c.4148A>C, T test, p-value < 0.002), have similar or higher WNT signaling activity than the wild type control without or with WNT3A treatment, suggesting that these three variants are indeed likely benign. In contrast, LRP5.A422V (NM_002335.3:c.1265C>T) show similar signaling activity to LRP5.WT without WNT3A treatment (T test, p-value = 0.2733), but its activity is reduced by about 50% with WNT3A treatment (T test, p-value =4.95e-5) (Figure 5), suggesting that LRP5.A422V is likely to be a hypomorphic allele. One variant, LRP5.T1506M (NM_002335.3:c.4517C>T), shows lower signaling activity than LRP5.WT without (T test, p-value =1.01e-6) or with WNT3A treatment (T test, p-value =1.31e-7), suggesting it is likely to be a pathogenic allele (Figure 5, Supplementary Table S13, Supplementary Results).

Figure 5. — The luciferase reporter assay of LRP5 variants. We tested five variants that were identified as likely benign by our test along with a positive control and a negative control. The luciferase reporter assay of the positive control variant, p.Q368X, showed reduced WNT signaling activity, while that of the negative control variant, p.D666N, showed higher or similar WNT signaling activity to the wildtype allele. The variants, p.R1342Q, p.H1383P, and p.R1219H showed higher or similar WNT activity to the wildtype allele, suggesting they are likely benign and consistent with test result. However, p.A422V showed similar WNT signaling to the wildtype allele without WNT3A treatment, but the reduced WNT signaling with WNT3A treatment, suggesting it might be a hypomorphic allele. p.T1506M showed the reduced WNT signaling without or with WNT3A treatment, suggesting it might be a pathogenic allele. The Y-axis shows the WNT signaling activity of the variants normalized by that of wildtype allele without WNT3A treatment. NT: no treatment. WNT3A: with treatment.

Discussion

To identify the variants likely to be benign in the context of Mendelian diseases, we designed a novel statistical test by integrating disease prevalence, AF in the patient cohort, and AF in the normal population into a robust statistical framework. Evaluation of our model with both the simulated and real data followed by experimental validation show that it is well-powered to detect benign variants, especially effective for the variants with low AF in the normal population.

Simulation analysis also suggests the test is more powerful when using a patient cohort with larger sample size, for dominant diseases than recessive diseases, and for the rarer diseases than the relatively common diseases. Furthermore, simulation indicates the test has a high specificity which are less affected by the disease prevalence and AF in the normal population. Moreover, simulations show that the test performance is not significantly affected by the estimation of disease prevalence and allele penetrance, but could be affected by sampling bias. Additionally, in case when the prevalence of a disease approaches zero, several confounding factors likely co-occur - the variance of disease prevalence estimation likely increases, the available sample size will be smaller, and within these constraints, the penetrance of an allele possibly varies by its frequency. Under those situations, the test power to detect extreme rare benign variants could be compromised and should be used with caution.

For real datasets, we applied our test to screen the variants in HGMD using real patient cohort data and AF in gnomAD, and identified the variants that are likely benign for three types of Mendelian diseases. The identified variants have the population AF in the range of 0.0058% to higher than 1%, consistent with the simulation analysis showing the high power of our test for variants with low AF in population. Moreover, our test results are supported by multiple independent evidences: 1) the original literature suggest that a large proportion of the predicted benign variants can indeed be classified as benign or VUS. 2) For recessive diseases, a large proportion of the predicted benign variants are found in homozygous state in multiple individuals in gnomAD, suggesting they are unlikely to cause disease in bi-allelic state. 3) Other in silico variant prediction scores (i.e. CADD, REVEL, and phastcon score) showed that the putatively benign variants are less deleterious than the rest HGMD variants in the same genes. 4) The majority of the putative benign variants are classified as benign, conflicting interpretations of pathogenicity, and uncertain-significance by ClinVar.

Predictions from our model are further validated by a functional assay. The luciferase functional assay was performed on a subset of HGMD variants predicted to be benign in LRP5 (with population AF in the range of 0.0058% to 0.029%). The assay show that three of the five tested variants are likely benign. Another variant, LRP5.A422V, is likely a hypomorphic variant. This is also consistent with the original reference in which LRP5.A422V was only seen in-cis with LRP5.R348W in the patient family²⁷, therefore LRP5.A422V alone might not be severe enough to cause the phenotype. Interestingly, LRP5.R348W is predicted to be likely pathogenic by our test. Additionally, one tested variant, LRP5.T1507M, is likely pathogenic. The reason of the contradictory result for LRP5.T1507M might be due to sampling bias of this allele in our patient cohort. Specifically, under sampling of this variant in the patient cohort could lead to false positive interpretation. Indeed, our test suggested this allele is slightly enriched in the patient cohort compared to its AF in normal population with a p-value of 0.01, though it did not pass the multiple testing correction of p-value (p-value ≤ 0.05/150) to be assigned as a pathogenic allele.

One of the main limitations is that the test depends on the availability of patient sequencing data and the patient cohort size. With the dramatic reduction of sequencing cost, it becomes a common practice to perform panel of exome or even genome sequencing as part of the diagnosis for patients with rare genetic diseases. Hence, we expect that this bottleneck will soon be overcome. Additionally, although our model is for filtering the variants of a single gene, if the disease is caused by multiple genes, one can simply adjust the Q value to the frequency of the disease due to mutations in the gene of interests and apply our test directly (e.g. the aforementioned cases of USH2A and LRP5). Moreover, it is straightforward to aggregate multiple genes from the same disease and run the test directly with the modified Q value accordingly. We envision that the performance of our test can be improved by increasing the patient cohort size and combining the test with other variant prediction scores, e.g. REVEL score, CADD score, which are based on other types of information besides allele frequency. In summary, as a general test that is mainly driven by the size of the dataset and can be applied to any types of variants in the context of all Mendelian diseases, we believe our work provides a general framework for filtering benign variants with very low population AF and its performance will continue to improve as more patient sequences becoming available.

Supplementary Material

Supplementary Tables

NIHMS1630822-supplement-Supplementary_Tables.xlsx^{(357KB, xlsx)}

Supplementary (Appendix, online only material, etc.)

NIHMS1630822-supplement-Supplementary___Appendix__online_only_material__etc__.pdf^{(1.8MB, pdf)}

Acknowledgements

We are grateful for the lab of Dr. David Moore at Baylor College of Medicine for providing the L-cells to us for the functional assays of LRP5 variants. We thank the computing cluster server of Molecular and Human Genetics Department at Baylor College of Medicine for providing the computing resource.

Funding

This work was supported by grants from the National Eye Institute [grant numbers R01EY022356, R01EY018571, EY002520 to R.C.]; Retinal Research Foundation [to RC]; National Institutes of Health shared instrument grant [grant number S10OD023469 to RC]; and the Competitive Renewal Grant of Knights Templar Eye Foundation [to JW].

Footnotes

Declaration of conflict of Interests Statement:

The authors declare that they have no competing interests.

Reference

1.Bamshad MJ, Ng SB, Bigham AW, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011;12(11):745–755. [DOI] [PubMed] [Google Scholar]
2.Whiffin N, Minikel E, Walsh R, et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet Med. 2017;19(10):1151–1158. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT. Basic statistical analysis in genetic case-control studies. Nat Protoc. 2011;6(2):121–133. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Michaelides M, Hunt DM, Moore AT. The genetics of inherited macular dystrophies. J Med Genet. 2003;40(9):641–650. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Zernant J, Xie YA, Ayuso C, et al. Analysis of the ABCA4 genomic locus in Stargardt disease. Hum Mol Genet. 2014;23(25):6797–6806. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Thiadens AA, Phan TM, Zekveld-Vroon RC, et al. Clinical course, genetic etiology, and visual outcome in cone and cone-rod dystrophy. Ophthalmology. 2012;119(4):819–826. [DOI] [PubMed] [Google Scholar]
7.Downes SM, Packham E, Cranston T, Clouston P, Seller A, Nemeth AH. Detection rate of pathogenic mutations in ABCA4 using direct sequencing: clinical and research implications. Arch Ophthalmol. 2012;130(11):1486–1490. [DOI] [PubMed] [Google Scholar]
8.Sun H, Smallwood PM, Nathans J. Biochemical defects in ABCR protein variants associated with human retinopathies. Nat Genet. 2000;26(2):242–246. [DOI] [PubMed] [Google Scholar]
9.Downs K, Zacks DN, Caruso R, et al. Molecular testing for hereditary retinal disease as part of clinical care. Arch Ophthalmol. 2007;125(2):252–258. [DOI] [PubMed] [Google Scholar]
10.Simonelli F, Testa F, de Crecchio G, et al. New ABCR mutations and clinical phenotype in Italian patients with Stargardt disease. Invest Ophthalmol Vis Sci. 2000;41(3):892–897. [PubMed] [Google Scholar]
11.Eisenberger T, Neuhaus C, Khan AO, et al. Increasing the yield in targeted next-generation sequencing by implicating CNV analysis, non-coding exons and the overall variant load: the example of retinal dystrophies. PLoS One. 2013;8(11):e78496. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Rosenberg T, Klie F, Garred P, Schwartz M. N965S is a common ABCA4 variant in Stargardt-related retinopathies in the Danish population. Mol Vis. 2007;13:1962–1969. [PubMed] [Google Scholar]
13.Wiszniewski W, Zaremba CM, Yatsenko AN, et al. ABCA4 mutations causing mislocalization are found frequently in patients with severe retinal dystrophies. Hum Mol Genet. 2005;14(19):2769–2778. [DOI] [PubMed] [Google Scholar]
14.Burke TR, Tsang SH, Zernant J, Smith RT, Allikmets R. Familial discordance in Stargardt disease. Mol Vis. 2012;18:227–233. [PMC free article] [PubMed] [Google Scholar]
15.Testa F, Rossi S, Sodi A, et al. Correlation between photoreceptor layer integrity and visual function in patients with Stargardt disease: implications for gene therapy. Invest Ophthalmol Vis Sci. 2012;53(8):4409–4415. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.van Huet RA, Pierrache LH, Meester-Smoor MA, et al. The efficacy of microarray screening for autosomal recessive retinitis pigmentosa in routine clinical practice. Mol Vis. 2015;21:461–476. [PMC free article] [PubMed] [Google Scholar]
17.Zernant J, Schubert C, Im KM, et al. Analysis of the ABCA4 gene by next-generation sequencing. Invest Ophthalmol Vis Sci. 2011;52(11):8479–8487. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Suarez T, Biswas SB, Biswas EE. Biochemical defects in retina-specific human ATP binding cassette transporter nucleotide binding domain 1 mutants associated with macular degeneration. J Biol Chem. 2002;277(24):21759–21767. [DOI] [PubMed] [Google Scholar]
19.Biswas-Fiss EE, Affet S, Ha M, Biswas SB. Retinoid binding properties of nucleotide binding domain 1 of the Stargardt disease-associated ATP binding cassette (ABC) transporter, ABCA4. J Biol Chem. 2012;287(53):44097–44107. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Huang L, Mao Y, Yang J, Li Y, Li Y, Yang Z. Mutation screening of the USH2A gene in retinitis pigmentosa and USHER patients in a Han Chinese population. Eye (Lond). 2018;32(10):1608–1614. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Sandberg MA, Rosner B, Weigel-DiFranco C, McGee TL, Dryja TP, Berson EL. Disease course in patients with autosomal recessive retinitis pigmentosa due to the USH2A gene. Invest Ophthalmol Vis Sci. 2008;49(12):5532–5539. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Glockle N, Kohl S, Mohr J, et al. Panel-based next generation sequencing as a reliable and efficient technique to detect mutations in unselected patients with retinal dystrophies. Eur J Hum Genet. 2014;22(1):99–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Shearer AE, Eppsteiner RW, Booth KT, et al. Utilizing ethnic-specific differences in minor allele frequency to recategorize reported pathogenic deafness variants. Am J Hum Genet. 2014;95(4):445–453. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Garcia-Garcia G, Aparisi MJ, Jaijo T, et al. Mutational screening of the USH2A gene in Spanish USH patients reveals 23 novel pathogenic mutations. Orphanet J Rare Dis. 2011;6:65. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Tajiguli A, Xu M, Fu Q, et al. Next-generation sequencing-based molecular diagnosis of 12 inherited retinal disease probands of Uyghur ethnicity. Sci Rep. 2016;6:21384. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Li LH, Li N, Zhao JY, et al. Findings of perinatal ocular examination performed on 3573, healthy full-term newborns. Br J Ophthalmol. 2013;97(5):588–591. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Salvo J, Lyubasyuk V, Xu M, et al. Next-generation sequencing and novel variant determination in a cohort of 92 familial exudative vitreoretinopathy patients. Invest Ophthalmol Vis Sci. 2015;56(3):1937–1946. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Tables

NIHMS1630822-supplement-Supplementary_Tables.xlsx^{(357KB, xlsx)}

Supplementary (Appendix, online only material, etc.)

NIHMS1630822-supplement-Supplementary___Appendix__online_only_material__etc__.pdf^{(1.8MB, pdf)}

[R1] 1.Bamshad MJ, Ng SB, Bigham AW, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011;12(11):745–755. [DOI] [PubMed] [Google Scholar]

[R2] 2.Whiffin N, Minikel E, Walsh R, et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet Med. 2017;19(10):1151–1158. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT. Basic statistical analysis in genetic case-control studies. Nat Protoc. 2011;6(2):121–133. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Michaelides M, Hunt DM, Moore AT. The genetics of inherited macular dystrophies. J Med Genet. 2003;40(9):641–650. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Zernant J, Xie YA, Ayuso C, et al. Analysis of the ABCA4 genomic locus in Stargardt disease. Hum Mol Genet. 2014;23(25):6797–6806. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Thiadens AA, Phan TM, Zekveld-Vroon RC, et al. Clinical course, genetic etiology, and visual outcome in cone and cone-rod dystrophy. Ophthalmology. 2012;119(4):819–826. [DOI] [PubMed] [Google Scholar]

[R7] 7.Downes SM, Packham E, Cranston T, Clouston P, Seller A, Nemeth AH. Detection rate of pathogenic mutations in ABCA4 using direct sequencing: clinical and research implications. Arch Ophthalmol. 2012;130(11):1486–1490. [DOI] [PubMed] [Google Scholar]

[R8] 8.Sun H, Smallwood PM, Nathans J. Biochemical defects in ABCR protein variants associated with human retinopathies. Nat Genet. 2000;26(2):242–246. [DOI] [PubMed] [Google Scholar]

[R9] 9.Downs K, Zacks DN, Caruso R, et al. Molecular testing for hereditary retinal disease as part of clinical care. Arch Ophthalmol. 2007;125(2):252–258. [DOI] [PubMed] [Google Scholar]

[R10] 10.Simonelli F, Testa F, de Crecchio G, et al. New ABCR mutations and clinical phenotype in Italian patients with Stargardt disease. Invest Ophthalmol Vis Sci. 2000;41(3):892–897. [PubMed] [Google Scholar]

[R11] 11.Eisenberger T, Neuhaus C, Khan AO, et al. Increasing the yield in targeted next-generation sequencing by implicating CNV analysis, non-coding exons and the overall variant load: the example of retinal dystrophies. PLoS One. 2013;8(11):e78496. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Rosenberg T, Klie F, Garred P, Schwartz M. N965S is a common ABCA4 variant in Stargardt-related retinopathies in the Danish population. Mol Vis. 2007;13:1962–1969. [PubMed] [Google Scholar]

[R13] 13.Wiszniewski W, Zaremba CM, Yatsenko AN, et al. ABCA4 mutations causing mislocalization are found frequently in patients with severe retinal dystrophies. Hum Mol Genet. 2005;14(19):2769–2778. [DOI] [PubMed] [Google Scholar]

[R14] 14.Burke TR, Tsang SH, Zernant J, Smith RT, Allikmets R. Familial discordance in Stargardt disease. Mol Vis. 2012;18:227–233. [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Testa F, Rossi S, Sodi A, et al. Correlation between photoreceptor layer integrity and visual function in patients with Stargardt disease: implications for gene therapy. Invest Ophthalmol Vis Sci. 2012;53(8):4409–4415. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.van Huet RA, Pierrache LH, Meester-Smoor MA, et al. The efficacy of microarray screening for autosomal recessive retinitis pigmentosa in routine clinical practice. Mol Vis. 2015;21:461–476. [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Zernant J, Schubert C, Im KM, et al. Analysis of the ABCA4 gene by next-generation sequencing. Invest Ophthalmol Vis Sci. 2011;52(11):8479–8487. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Suarez T, Biswas SB, Biswas EE. Biochemical defects in retina-specific human ATP binding cassette transporter nucleotide binding domain 1 mutants associated with macular degeneration. J Biol Chem. 2002;277(24):21759–21767. [DOI] [PubMed] [Google Scholar]

[R19] 19.Biswas-Fiss EE, Affet S, Ha M, Biswas SB. Retinoid binding properties of nucleotide binding domain 1 of the Stargardt disease-associated ATP binding cassette (ABC) transporter, ABCA4. J Biol Chem. 2012;287(53):44097–44107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Huang L, Mao Y, Yang J, Li Y, Li Y, Yang Z. Mutation screening of the USH2A gene in retinitis pigmentosa and USHER patients in a Han Chinese population. Eye (Lond). 2018;32(10):1608–1614. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Sandberg MA, Rosner B, Weigel-DiFranco C, McGee TL, Dryja TP, Berson EL. Disease course in patients with autosomal recessive retinitis pigmentosa due to the USH2A gene. Invest Ophthalmol Vis Sci. 2008;49(12):5532–5539. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Glockle N, Kohl S, Mohr J, et al. Panel-based next generation sequencing as a reliable and efficient technique to detect mutations in unselected patients with retinal dystrophies. Eur J Hum Genet. 2014;22(1):99–104. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] 23.Shearer AE, Eppsteiner RW, Booth KT, et al. Utilizing ethnic-specific differences in minor allele frequency to recategorize reported pathogenic deafness variants. Am J Hum Genet. 2014;95(4):445–453. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Garcia-Garcia G, Aparisi MJ, Jaijo T, et al. Mutational screening of the USH2A gene in Spanish USH patients reveals 23 novel pathogenic mutations. Orphanet J Rare Dis. 2011;6:65. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R25] 25.Tajiguli A, Xu M, Fu Q, et al. Next-generation sequencing-based molecular diagnosis of 12 inherited retinal disease probands of Uyghur ethnicity. Sci Rep. 2016;6:21384. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] 26.Li LH, Li N, Zhao JY, et al. Findings of perinatal ocular examination performed on 3573, healthy full-term newborns. Br J Ophthalmol. 2013;97(5):588–591. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R27] 27.Salvo J, Lyubasyuk V, Xu M, et al. Next-generation sequencing and novel variant determination in a cohort of 92 familial exudative vitreoretinopathy patients. Invest Ophthalmol Vis Sci. 2015;56(3):1937–1946. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

A novel statistical method for interpreting the pathogenicity of rare variants

Jun Wang, PhD

Hehe Liu, MMed

Renae Elaine Bertrand, BSc

Alejandro Sarrion-Perdigones, PhD

Yezabel Gonzalez, MSc

Koen JT Venken, PhD

Rui Chen, PhD

Abstract

Purpose:

Methods:

Results:

Conclusion:

Introduction

Methods:

Ethics Statement

Derivation of the combined test

Other Methods

Code availability

Results:

The simulation analysis of the test power and specificity

The test power increases as the patient cohort size increases

Figure1.

The power increases as the disease prevalence decreases

The test has a high specificity

Sampling bias in patient cohort could affect the test power and specificity

Figure 2.

Misspecification of disease prevalence could affect test performance

The impact of allele penetrance on test performance

Estimation of the thresholds of population allele frequency for filtering variants

Real case studies

Screening of reported pathogenic variants in ABCA4 for Stargardt (STGD)

Figure 3.

Screening of reported pathogenic variants in USH2A for Retinitis pigmentosa (RP)

Identification of benign variants in LRP5 gene for Familial Exudative Vitreoretinopathy (FEVR)

Comparison to in silico variant prediction scores and ClinVar annotation

Figure 4.

Functional assay of predicted benign variants in LRP5

Figure 5.

Discussion

Supplementary Material

Acknowledgements

Footnotes

Reference

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases