Extracting and calibrating evidence of variant pathogenicity from population biobank data

Vineel Bhat; Tian Yu; Lara Brown; Vikas Pejaver; Matthew Lebo; Steven Harrison; Christopher A Cassa

doi:10.1016/j.ajhg.2025.06.012

2025 Jul 9;112(8):1805–1817.

PMCID: PMC12401458 PMID: 40639380

Summary

Genomic medicine requires a robust evidence base of variant phenotypic impacts, which remains incomplete even in extensively studied genes with monogenic disease associations. Here, we evaluated the broad potential of using population cohort data to identify evidence that can be used in variant assessment. Across 41 genes related to 18 clinically actionable monogenic phenotypes, we calculated variant-level odds ratios of disease enrichment using data from 469,803 UK Biobank participants. We found significant differences in odds ratio values between ClinVar-labeled pathogenic and benign variants in 11 phenotypes, spanning both common and rare disorders. To facilitate clinical translation, we calibrated the strength of evidence provided by variant-level odds ratios to align with American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) interpretation guidelines (PS4 criterion) and found that odds ratios may reach “moderate,” “strong,” or “very strong” evidence, varying by phenotype and gene. Overall, we found that 2.6% (N = 12,350) of participants harbor a rare variant of uncertain significance (VUS) with at least moderate evidence of pathogenicity—an indication of potentially unrecognized disease risk. Finally, by incorporating computational and functional data alongside population-based odds ratios, we identified variants that met the criteria for clinical reclassification. Notably, using this approach, we identified that 12.4% of rare VUSs in LDLR seen in participants meet diagnostic criteria to be classified as likely pathogenic, demonstrating its potential to scale the reclassification of VUSs.

Keywords: clinical variant classification, ACMG/AMP sequence variant interpretation guidelines, variant of uncertain significance, VUS, disease enrichment for rare coding variants, UK Biobank, PS4 evidence, odds ratio, variant interpretation, statistical genetics

This study leverages biobank data to identify rare coding variants that increase risk across 41 genes linked to 18 clinically actionable conditions. By calibrating this population enrichment within the ACMG/AMP framework, the authors identify thresholds for evidence strength, enabling the reclassification of numerous variants of uncertain significance.

Introduction

Advancing genomic medicine relies on our ability to identify variants with sufficient clinical certainty that they can be labeled “pathogenic” or “benign.”¹ Diagnostic testing has identified many pathogenic variants that increase disease risk, providing valuable insight into the molecular basis of inherited disorders.² However, even in established genes like BRCA1, LDLR, and MSH2, most variants have insufficient evidence to reach a clinical classification and are consequently labeled “variants of uncertain significance” (VUSs).³,⁴ While many of these variants may substantially increase disease risk, current clinical guidelines discourage communicating this prognostic information to patients or providers who have no specific indication for testing.⁵ This translational gap prevents many patients from benefiting from genomic medicine, including optimized surveillance and therapeutic options.⁴

To standardize the information used in variant assessment, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) sequence variant interpretation (SVI) guidelines systematically weigh and combine the available evidence of pathogenicity or benignity for each variant. These guidelines define various forms of evidence (e.g., population, functional, computational, or contextual evidence) and assign strength levels to each type (e.g., “supporting,” “moderate,” “strong,” or “very strong”) based on the clinical certainty provided by each form of evidence. Importantly, no single source of evidence alone is considered sufficient to classify a variant as pathogenic. The SVI guidelines define population evidence as strong evidence of pathogenicity for a variant when it is significantly enriched in affected individuals of a disease over control subjects (ACMG/AMP PS4 criterion).⁶

Given that damaging variants in genes tend to be rare, population evidence of pathogenicity has most often been derived by counting the number of distinct probands from affected individuals or measuring the extent of co-segregation of variants with disease within families.² Some disease-specific ClinGen variant curation expert panels (VCEPs) have defined specialized interpretation criteria for population evidence for specific genes and phenotypes (https://cspec.genome.network/cspec/ui/svi/). For example, different numbers of probands that have a specific disorder can correspond to different strengths of evidence for a phenotype (e.g., 2–5 distinct individuals with familial hypercholesterolemia provide supporting evidence of variant pathogenicity in LDLR). When considering the population enrichment of affected individuals at the variant level, most often, a variant with an odds ratio (OR) ≥ 5.0 and a lower 95% confidence bound ≥ 1 is considered strong evidence of pathogenicity, though these criteria may also differ by disorder or even gene. Recent work has considered how to make use of population data in variant interpretation and the extent to which population cohorts can provide evidence of variant pathogenicity, but this approach has not yet been measured or calibrated at the gene or phenotypic level or across a broad set of phenotypes.⁷,⁸

Dramatic increases in biobank size now provide the statistical power to detect a broader spectrum of variant effect sizes, particularly for disorders and phenotypes that are more common or that are widely measured in population cohorts.⁹ Notably, endophenotypic effect sizes have been shown to discriminate between variants that are known to be pathogenic or benign in monogenic susceptibility genes, generally with larger effects.¹⁰ However, this information has yet to be generalized across a broader set of dichotomous phenotypes or calibrated for use with existing clinical guidelines, which has limited its use in variant assessment.

Here, we draw on population case data at scale from 469,803 UK Biobank participants to identify variants that are enriched in 18 actionable disorders. Specifically, we model the enrichment of disease using ORs for each phenotype based on case data at the variant level. We then align this evidence of pathogenicity within the ACMG/AMP SVI diagnostic framework by systematically calibrating the strength of evidence provided by variant ORs in each gene, enabling its use in clinical translation. Finally, we combine this information with well-calibrated computational and functional evidence to identify the scale of VUSs that could potentially be reclassified. By extracting and aligning population evidence in an automated manner, this framework represents a powerful step forward toward eliminating VUSs.

Methods

Population characteristics and genomic data from the UK Biobank

We computed variant-level ORs based on observed population outcomes using data from 469,803 participants in the UK Biobank with available whole-exome sequencing data.⁹,¹¹ Population-level summary statistics are provided in Table 1. After quality control, we observed 56,144 rare, nonsynonymous variants across 18 disorders and 41 genes, affecting 409,003 participants collectively. Additional details about study participants, variant annotation, and filtering are included in supplemental methods.

Table 1.

Phenotype-level information, including genes considered, number of variants observed, number of variants observed for which an odds ratio could be calculated, cohort disease prevalence, median odds ratio of ClinVar pathogenic and benign variants, and one-sided Mann-Whitney U p values between pathogenic and benign variant odds ratios

Disease	Genes	No. of variants	No. of variants with OR	Disease prevalence (%)	Median OR of P/LP variants	Median OR of B/LB variants	p value
Familial hypercholesterolemia (predicted using adjusted LDL ≥ 190)	LDLR, APOB, PCSK9	3,377	1,330	9.1^a	30.3	1.0	2.2 × 10⁻¹⁹
Breast or ovarian cancer	BRCA1, BRCA2, CHEK2, ATM	3,799	1,351	9.0^b	10.5	1.0	1.3 × 10⁻³⁹
Colorectal cancer	MSH2, MSH6, MLH1, PMS2	2,510	415	2.1	46.9	1.4	3.8 × 10⁻¹⁶
Hypertrophic cardiomyopathy	ACTC1, MYBPC3, MYH7, MYL2, TPM1	2,773	121	0.1	262.1	4.6	1.5 × 10⁻⁴
Dilated cardiomyopathy	MYBPC3, TNNT2, TTN	32,416	1,400	0.3	125.1	1.6	4.5 × 10⁻¹⁵
Arrhythmogenic right ventricular cardiomyopathy	DSC2, DSG2, DSP, PKP2, TMEM43	5,594	284	0.3	202.0	1.2	8.4 × 10⁻⁴
Brugada syndrome	SCN5A	1,196	42	0.2	215.3	0.9	7.9 × 10⁻³
Li-Fraumeni syndrome (predicted using presence of any cancer)	TP53	394	276	24.1^a	8.3	1.1	2.7 × 10⁻⁸
Maturity-onset diabetes of the young (predicted using HbA1c ≥ 6.5%)	HNF1A, HNF1B, HNF4A, GCK	693	205	3.9^a	46.1	5.4	1.8 × 10⁻³
Long QT syndrome (predicted using QTc ≥ 460)	SCN5A, KCNH2, KCNQ1	461	106	5.9^a	48.6	0.8	3.8 × 10⁻⁴
Osler hemorrhagic telangiectasia syndrome	ACVRL1, ENG	751	29	0.0	10766.3	3.5	4.6 × 10⁻²
Juvenile polyposis syndrome (predicted using presence of colon polyp)	BMPR1A, SMAD4	520	131	5.6^a	28.0	1.5	0.2
Pancreatic cancer	STK11	226	8	0.4	–	–	–
Prostate cancer	PTEN	181	19	7.4^b	38.0	1.3	0.3
Hereditary hemochromatosis	HFE	323	8	0.3	–	–	–
Ehlers-Danlos syndrome	COL3A1	1,191	26	0.2	–	4.1	–
Retinitis pigmentosa	RPE65	607	6	0.1	14.8	6.8	0.5
Nephroblastoma	WT1	207	2	0.0	–	–	–

^a Prevalence described uses the phenotypic encodings considered for participants to predict the presence of each disorder (e.g., prevalence of any cancer for Li-Fraumeni syndrome).

^b Prevalence is based on participants of a single sex.

Clinical endpoints and endophenotypes

Primary clinical endpoints were specific to each condition. For familial hypercholesterolemia, maturity-onset diabetes of the young, and long QT syndrome, endophenotype values above a threshold were used as a proxy for disease cases. Adjusted low-density lipoprotein cholesterol (LDL-C) ≥190 mg/dL, HbA1c ≥6.5%, and QTc ≥460 were used for familial hypercholesterolemia, maturity-onset diabetes of the young, and long QT syndrome, respectively. Estimated untreated (adjusted) LDL-C levels were derived using adjustments for lipid-lowering therapies, as applied previously.¹² Additionally, the presence or absence of any cancer was used as a proxy for Li-Fraumeni syndrome, and the presence of a colon polyp was used as a proxy for juvenile polyposis syndrome. Note that these “cases” derived from alternative data may not be considered cases in the traditional sense; however, their use allows for predicting variant impact even in the absence of direct data on disorders. For all other disorders, cases were ascertained using the presence or absence of the specific disorder. Case definitions were developed in the UK Biobank using a combination of self-reported data confirmed by trained healthcare professionals, hospitalization records, and national procedural, cancer, and death registries, when applicable. Age at event was estimated based on the listed date of event, when available, and derived using participant birth dates when not directly provided. Cases without any form of age at event information were excluded.

Calculating variant-level ORs from population cohort data

Variant-level ORs were calculated using the aforementioned endophenotype and disorder case data, where for a nonsynonymous variant $v$, the “population-based OR” is

$$\mathrm{OR}(v) = \frac{a / b}{c / d},$$

where

$a$ = number of participants with phenotype and $v$,

$b$ = number of participants without phenotype and $v$,

$c$ = number of participants with phenotype and no variants in associated genes, and

$d$ = number of participants without phenotype and no variants in associated genes.

Haldane-Anscombe correction (adding 0.5 to all cells $a$, $b$, $c$, and $d$ in the contingency table) was applied to allow for the calculation of ORs when no false positives existed while also reducing the bias of the estimator. Variants for which $a$ was 0 and the corrected OR was ≥1 were not included in analyses.
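The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function names and the example counts are hypothetical, and only the 0.5 correction and the exclusion rule are taken from the text.

```python
def haldane_anscombe_or(a, b, c, d):
    """Odds ratio (a/b)/(c/d) after adding 0.5 to every contingency cell."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a / b) / (c / d)


def population_or(a, b, c, d):
    """Corrected OR, or None when the variant would be excluded.

    Exclusion rule from the text: no carriers with the phenotype (a == 0)
    and a corrected OR of at least 1.
    """
    corrected = haldane_anscombe_or(a, b, c, d)
    if a == 0 and corrected >= 1:
        return None
    return corrected


# Hypothetical counts: 3 affected carriers, 0 unaffected carriers,
# 1,000 affected and 99,000 unaffected non-carriers.
print(population_or(3, 0, 1000, 99000))
```

Without the correction, the example would divide by zero because no unaffected carriers were observed.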

ClinVar clinical assertions from diagnostic laboratories

ClinVar summary assessments were extracted from the tab-delimited ClinVar variant summary file released on April 6, 2023.¹³ We grouped “pathogenic,” “likely pathogenic,” and “pathogenic/likely pathogenic” classifications as P/LP collectively and grouped “benign,” “likely benign,” and “benign/likely benign” classifications as B/LB. Importantly, we used the term VUS inclusively to capture all variants with an inconclusive classification, including “uncertain significance,” “conflicting interpretations of pathogenicity,” and “not provided.”

Calibrating population evidence of pathogenicity (PS4)

Estimating the prevalence of pathogenic variants

We considered two methods for estimating the prevalence of pathogenic variants or the prior probability of pathogenicity: (1) using population data from the UK Biobank and (2) using clinical data from ClinVar. In the first approach, we estimate the prior probability of pathogenicity in a gene $g$ as

$$P(f \mid v_{g}),$$

where $f$ = participant has the phenotype of interest and $v_{g}$ = participant has a nonsynonymous variant in $g$.

Then, priors at the phenotypic level can be calculated as the weighted average

$$\frac{\sum_{g = 1}^{n} P(f \mid v_{g}) \cdot \mathrm{count}(g)}{\sum_{g = 1}^{n} \mathrm{count}(g)},$$

where $n$ = number of genes associated with the phenotype and $\mathrm{count}(g)$ = number of nonsynonymous variants in $g$.

A similar weighted average can be used to aggregate phenotype-level priors into an overall prior for all phenotypes considered or some subset of phenotypes. Often, in previous studies, a single overall prior has been calculated for calibration across all genes in a dataset; however, we contend that there is a benefit to stratifying at the phenotype and gene levels, given how widely priors can differ at these levels.¹⁴,¹⁵ We report prior probabilities of pathogenicity generated using population data in Tables S2–S4. These priors generally closely track disease prevalence, with deviations in some phenotypes. Notably, the aggregated prior across phenotypes associated with CDC Tier 1 genes was 8.6%, similar to the 10% prior assumed in Tavtigian et al.¹⁴ The aggregate prior across all phenotypes was lower, at 2.8%.
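As a small illustration of the variant-count-weighted average used for phenotype-level priors, the sketch below assumes gene-level priors have already been estimated. The gene labels and numbers are invented for the example and are not values from the study.

```python
def phenotype_prior(gene_priors, variant_counts):
    """Weighted average of gene-level priors, weighted by variant count.

    gene_priors:    gene -> P(f | v_g)
    variant_counts: gene -> number of nonsynonymous variants in that gene
    """
    total = sum(variant_counts[g] for g in gene_priors)
    weighted = sum(gene_priors[g] * variant_counts[g] for g in gene_priors)
    return weighted / total


# Hypothetical genes and values, for illustration only.
priors = {"GENE_A": 0.10, "GENE_B": 0.02}
counts = {"GENE_A": 100, "GENE_B": 300}
print(phenotype_prior(priors, counts))
```

The same function could be reused one level up by passing phenotype-level priors and per-phenotype variant counts.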

In the second approach, we estimate the prior probability of pathogenicity in a gene $g$ as

$$P(f \mid v_{g}),$$

where $f$ = variant is P/LP in ClinVar and $v_{g}$ = observed nonsynonymous variant in $g$.

Weighted averages can be calculated for phenotype-level and further aggregated prevalence values of pathogenic variants, as described earlier. We report priors generated using clinical data in Tables S2–S4, next to the priors generated using population data. Compared to priors based on population data, these priors were generally higher in low-disease-prevalence phenotypes and lower in high-disease-prevalence phenotypes. Notably, the aggregated prior across phenotypes associated with CDC Tier 1 genes was 5.4%, similar to the 4.41% calculated in Pejaver et al.¹⁵ The aggregate prior across all phenotypes was lower, at 1.9%.

We caution that both of our methods to estimate pathogenic variant prevalence may underestimate true values. When using population case data from the UK Biobank, some participants have yet to develop a phenotype; when considering the proportion of ClinVar P/LP variants among nonsynonymous variants in the UK Biobank, a large number of nonsynonymous variants observed in the UK Biobank are not represented in ClinVar. For subsequent analysis, we use population-based priors because some phenotypes we evaluated have very few ClinVar variants and because participants in the UK Biobank have been enrolled over a long period, which makes these priors less likely to underestimate the relevant values.

Statistical framework

The ACMG/AMP variant interpretation guidelines describe multiple levels of strength in favor of variant pathogenicity: supporting, moderate, strong, and very strong.⁶ These strength levels have been mapped to positive likelihood ratios (LR+) to quantify variant impact.¹⁴ To calibrate the strength of population evidence (which we model as continuous ORs) at the gene level, we use a sliding window method similar to that introduced in Pejaver et al. for the calibration of computational scores.¹⁵ For every population OR $p$, we calculate a local likelihood ratio ($lr^{+}$) using pathogenic and benign variants with ORs in the interval $[p - z, p + z]$ for the lowest value of $z$ such that $\{v \mid \mathrm{OR}(v) \in [p - z, p + z]\}$ contains at least 20% of all pathogenic and benign variants in a given gene, excluding those for which ORs could not be calculated. The local likelihood ratio calculation is then

$$lr^{+} = \frac{P(\mathrm{OR}(v) \in [p - z, p + z] \mid v \text{ is pathogenic})}{P(\mathrm{OR}(v) \in [p - z, p + z] \mid v \text{ is benign})}.$$
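The sliding-window estimate can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function name is hypothetical, and returning infinity when no benign variant falls in the window is an assumption made here, since the text does not say how that case is handled.

```python
import math


def local_lr(p, pathogenic_ors, benign_ors, fraction=0.2):
    """Local LR+ at OR value p, using the narrowest symmetric window
    [p - z, p + z] that holds at least `fraction` of all labeled variants."""
    all_ors = list(pathogenic_ors) + list(benign_ors)
    needed = math.ceil(fraction * len(all_ors))

    # The smallest valid z is the needed-th smallest distance from p.
    distances = sorted(abs(x - p) for x in all_ors)
    z = distances[needed - 1]

    frac_path = sum(abs(x - p) <= z for x in pathogenic_ors) / len(pathogenic_ors)
    frac_benign = sum(abs(x - p) <= z for x in benign_ors) / len(benign_ors)

    if frac_benign == 0:
        return math.inf  # assumption: no benign variants in the window
    return frac_path / frac_benign


# Hypothetical ORs for one gene.
pathogenic = [10, 20, 30, 40, 50]
benign = [1, 1, 2, 2, 30]
print(local_lr(30, pathogenic, benign))
```

Because the window is defined by a proportion of variants rather than a fixed width, it widens automatically in sparse regions of the OR distribution.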

Given this and the prior probability of pathogenicity $a$, we can calculate the posterior probability of pathogenicity $b$ for a variant by rearranging the following equation:

$$\underbrace{\frac{a}{1 - a}}_{\text{prior odds}} \cdot lr^{+} = \underbrace{\frac{b}{1 - b}}_{\text{posterior odds}}.$$

Next, to map supporting, moderate, strong, and very strong evidence thresholds to posterior probability thresholds for each gene and phenotype, we calculate suitable values for the odds of pathogenicity for very strong evidence ($O_{PVst}$). These values were determined using the supplementary table from Tavtigian et al.¹⁴ so that the ACMG/AMP combining rules are generally satisfied and posterior probabilities reach values of 0.9 and 0.99 for likely pathogenic and pathogenic classifications, respectively. $O_{PVst}$ values are reported in Tables S2–S4 next to their respective prior probabilities. We then used the $O_{PVst}$ values for each gene, phenotype, and further aggregation as the $lr^{+}$ variable in the equation above to calculate the very strong posterior probability threshold. For strong, moderate, and supporting evidence of pathogenicity, this process was repeated but using $\sqrt{O_{PVst}}$, $\sqrt{\sqrt{O_{PVst}}}$, and $\sqrt{\sqrt{\sqrt{O_{PVst}}}}$, respectively, based on the assumption that evidence levels scale by powers of 2.
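To make the mapping concrete, the sketch below solves the odds equation for the posterior and derives the four posterior probability thresholds from a prior and an odds of pathogenicity for very strong evidence. The function names are hypothetical. The example inputs (a 10% prior with a very strong odds value of 350) are the values commonly quoted from Tavtigian et al. and are used here only for illustration; the gene- and phenotype-specific values used in this study are those in Tables S2–S4.

```python
def posterior_probability(prior, lr_plus):
    """Solve prior_odds * lr_plus = posterior_odds for the posterior."""
    posterior_odds = prior / (1.0 - prior) * lr_plus
    return posterior_odds / (1.0 + posterior_odds)


def posterior_thresholds(prior, o_pvst):
    """Posterior thresholds for each evidence level.

    Strong, moderate, and supporting use successive square roots of the
    very strong odds, i.e. exponents 1/2, 1/4, and 1/8.
    """
    exponents = {
        "supporting": 1 / 8,
        "moderate": 1 / 4,
        "strong": 1 / 2,
        "very strong": 1,
    }
    return {
        level: posterior_probability(prior, o_pvst ** exponent)
        for level, exponent in exponents.items()
    }


# Illustrative inputs only (see lead-in).
for level, threshold in posterior_thresholds(prior=0.10, o_pvst=350.0).items():
    print(f"{level}: {threshold:.3f}")
```

Lowering the prior lowers every posterior threshold computed this way for a fixed odds value, which is why the text pairs each prior with its own odds of pathogenicity.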

Finally, to identify intervals of population-based ORs that correspond to varying levels of evidence, we identify the minimum OR at which the posterior probability threshold for an evidence level is crossed and then use linear approximation to estimate the OR at which the exact threshold would be reached.
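One way to carry out this last step is shown below, assuming the posterior has already been evaluated on a grid of increasing OR values. The function name and the grid are hypothetical; interpolating between the crossing point and the grid point just before it is one reasonable interpretation of "linear approximation."

```python
def or_at_threshold(ors, posteriors, threshold):
    """Estimate the OR at which the posterior first reaches `threshold`.

    ors:        OR values in ascending order
    posteriors: posterior probability evaluated at each OR
    Returns None if the threshold is never reached.
    """
    for i, post in enumerate(posteriors):
        if post >= threshold:
            if i == 0:
                return ors[0]
            x0, x1 = ors[i - 1], ors[i]
            y0, y1 = posteriors[i - 1], post
            # Linear interpolation between the two bracketing grid points.
            return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical posterior curve.
grid = [1.0, 5.0, 10.0, 20.0]
curve = [0.05, 0.20, 0.60, 0.90]
print(or_at_threshold(grid, curve, 0.40))
```

A return value of None corresponds to an evidence level that is not reached.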

Alternative approaches considered for calibration

Our calibration strategy follows a localized estimation approach, as described previously.¹⁵ We considered this localized approach and a global approach, which involves calculating likelihood ratios using all available variants. The primary disadvantage of a global approach is that only a single global threshold is considered, and any OR value greater than that threshold would be considered to have the same strength of evidence, though this may not always be the case.¹⁵ Due to this challenge, we used a localized estimation approach, though we note that this also has limitations. Specifically, a local approach requires a large amount of data and may suffer from imprecision when the interval around a score used to calculate $l r^{+}$ has to be very wide in order to include a sufficient number of variants for estimation. While the approach in Pejaver et al.¹⁵ maintained intervals wide enough such that 100 pathogenic or benign variants were always included, this approach was not always possible with population data at the gene level, given the low numbers of pathogenic or benign variants in some phenotypes and genes. Instead, we used intervals wide enough such that 20% of pathogenic or benign variants are included, as we found this to work particularly well, although alternative approaches may use a different proportion or a constant number.

Estimates of the prevalence of pathogenic variants, or the prior probability, can differ substantially depending on the context. For example, Tavtigian et al.¹⁴ assumed a prior of 10% given the context of identifying a pathogenic variant among a set of candidate variants from clinical genetic testing, and Pejaver et al.¹⁵ estimated a lower number, 4.41%, measured empirically using gnomAD as a reference set. For our analyses, as described previously, we used priors derived from population data from the UK Biobank due to their concordance with our approach, though we also calculated priors based on clinical data (using non-pathogenic UK Biobank variants as controls). We note that other approaches based on clinical diagnostic data may make use of other control datasets, such as gnomAD, for this purpose and that there are alternative ways to calculate prior probabilities outside of these two methods (e.g., using functional assay data).

Survival analysis

Survival analyses including Kaplan-Meier curves and log rank tests were performed using the Python lifelines survival analysis package (v.0.23.9).¹⁶

Conversion of evidence to the point system

To identify how many VUSs and variants absent from ClinVar with population evidence might have sufficient evidence to be classified as P/LP, we considered multiple evidence sources and converted them to the semi-quantitative point system adaptation of the ACMG/AMP SVI framework.¹⁷ We interpreted variants with OR greater than the strong threshold calibrated for the gene as having PS4 evidence (+4 points on the point scale). REVEL scores were converted to evidence bands (PP3/BP4) using thresholds calculated in Pejaver et al.,¹⁵ and functional scores were translated to PS3 (+4 points) or BS3 (−4 points) based on author-recommended thresholds, with the exception of LDLR, for which a threshold was not provided and which we estimated by analyzing global LR+ with ClinVar as a truth set.¹⁸,¹⁹,²⁰ Finally, contextual evidence from previously classified pathogenic variants was applied in the following manner: PVS1 (+8 points) for loss-of-function (LoF) variants annotated as high confidence by LOFTEE²¹; PS1 (+4 points) when a variant has a colocated P/LP variant encoding the same substitution and −4 points when a variant has a colocated B/LB variant encoding the same substitution (note that the benign equivalent is not an established ACMG/AMP criterion); and PM5 (+2 points) when a variant has a colocated P/LP variant encoding a different substitution and −2 points when a variant has a colocated B/LB variant encoding a different substitution (note that the benign equivalent is not an established ACMG/AMP criterion; PM5 was not applied to BRCA1, in accordance with guidance from the VCEP; https://cspec.genome.network/cspec/ui/svi/doc/GN092).
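The point-based combination can be illustrated with a short sketch. The per-criterion points below are those stated in the text. The classification cutoffs (pathogenic at 10 or more points, likely pathogenic at 6 to 9, uncertain at 0 to 5, likely benign at −1 to −6, benign at −7 or fewer) are an assumption taken from the published point-system adaptation and are not restated in this section. PP3/BP4 are omitted because their points depend on the REVEL band.

```python
# Points per criterion as described in the text.
POINTS = {
    "PVS1": 8,
    "PS4": 4,
    "PS3": 4,
    "BS3": -4,
    "PS1": 4,
    "PM5": 2,
}


def total_points(criteria):
    """Sum the points for a list of applied criteria codes."""
    return sum(POINTS[code] for code in criteria)


def classify(points):
    """Map a point total to a class (cutoffs are an assumption, see lead-in)."""
    if points >= 10:
        return "pathogenic"
    if points >= 6:
        return "likely pathogenic"
    if points >= 0:
        return "uncertain significance"
    if points >= -6:
        return "likely benign"
    return "benign"


# A hypothetical VUS with strong population evidence and a colocated
# P/LP variant encoding a different substitution.
applied = ["PS4", "PM5"]
print(total_points(applied), classify(total_points(applied)))
```

Under these cutoffs, PS4 alone (+4) leaves a variant as uncertain, consistent with the principle that no single source of evidence is sufficient for a pathogenic classification.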

Results

Population characteristics

To calculate variant-level ORs based on observed population outcomes, we draw on data from 469,803 participants in the UK Biobank with available whole-exome sequencing data. Population-level summary statistics are provided in Table S1, and a complete table of variant-level ORs for all disorders studied is available at https://doi.org/10.6084/m9.figshare.29143331. After quality control, we observed 56,144 rare, nonsynonymous variants across 18 disorders and 41 genes, affecting 409,003 participants collectively.

Population-based ORs separate pathogenic and benign variants in many phenotypes

Among the 56,144 rare, nonsynonymous variants we observed, we calculated population-based ORs for 5,737 variants (affecting 359,687 participants collectively) with sufficient biobank case data to calculate a valid corrected OR (see methods). The median OR for a ClinVar P/LP variant was 30.3 (interquartile range [IQR]: 10.5–59.3), while the median OR for a B/LB variant was 1.1 (IQR: 0.8–1.9), across all phenotypes. These figures vary across phenotypes, sometimes significantly. For cardiomyopathy-related phenotypes, ORs for pathogenic variants were particularly high, with phenotype-level median ORs for P/LP variants ranging from 125.1 (dilated cardiomyopathy) to 262.1 (hypertrophic cardiomyopathy) (Table 1). Some cancer-related phenotypes had relatively lower ORs for damaging variants: breast/ovarian cancer had a median OR for P/LP variants of 10.5, while Li-Fraumeni syndrome had a median of 8.3. For many phenotypes, B/LB variants had median ORs close to 1, with familial hypercholesterolemia and breast/ovarian cancer having a median OR of exactly 1. Notably, for 11 phenotypes, the difference between the median ORs of P/LP and B/LB variants was statistically significant (one-sided Mann-Whitney U p < 0.05). The distribution of ORs for these phenotypes stratified by ClinVar status is shown in Figure 1.

Figure 1.

Population-based odds ratios separate pathogenic and benign variants across many phenotypes

The distribution of log₁₀ odds ratios by ClinVar status in 11 phenotypes with statistically significant separation of pathogenic and benign variants (one-sided Mann-Whitney U p < 0.05).

Next, we validated the discriminatory capacity of population-based ORs to classify pathogenic and benign variants. Across all phenotypes, we found that ORs were able to accurately classify existing ClinVar pathogenic and benign variants (AUC = 0.94). At the phenotypic level, AUCs ranged between 0.92 and 1.00 among 8 phenotypes that had at least 25 ClinVar variants with either pathogenic (P/LP) or benign (B/LB) classifications with ORs (Figure S1). We also calculated the optimal OR thresholds at which specificity and sensitivity are maximized (calculated using Youden’s index, the upper-leftmost point on ROC curves, which can also be interpreted as the point at which global LR+ is maximized), which can be used to dichotomously classify variants for each phenotype (Table S5).
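For readers less familiar with these metrics, the sketch below computes an AUC and a Youden-optimal threshold directly from two lists of ORs. It is a plain-Python illustration with made-up values, not the analysis code. The AUC is computed as the probability that a randomly chosen pathogenic variant has a higher OR than a randomly chosen benign variant, with ties counted as one half.

```python
def auc(pathogenic, benign):
    """Probability that a pathogenic OR exceeds a benign OR (ties = 0.5)."""
    wins = sum((p > b) + 0.5 * (p == b) for p in pathogenic for b in benign)
    return wins / (len(pathogenic) * len(benign))


def youden_threshold(pathogenic, benign):
    """OR cutoff maximizing J = sensitivity + specificity - 1.

    A variant is called pathogenic when its OR is >= the cutoff.
    When several cutoffs tie, the lowest one is returned.
    """
    best_j, best_t = -1.0, None
    for t in sorted(set(pathogenic) | set(benign)):
        sensitivity = sum(p >= t for p in pathogenic) / len(pathogenic)
        specificity = sum(b < t for b in benign) / len(benign)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


# Hypothetical ORs.
path_ors = [8, 12, 30, 50]
benign_ors = [0.8, 1.0, 1.5, 9]
print(auc(path_ors, benign_ors))
print(youden_threshold(path_ors, benign_ors))
```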

Systematically calibrating evidence thresholds for population-based ORs

We sought to measure the strength of evidence provided by population-based ORs to align with the qualitative levels (supporting, moderate, strong, or very strong) described in the ACMG/AMP SVI guidelines in order to accelerate the use of this information in variant interpretation. Using a local posterior-probability-based approach described in the methods, we calibrate population-based ORs and define evidence thresholds for each phenotype with sufficient clinical data. Among 8 phenotypes with at least 25 pathogenic or benign ClinVar variants, the highest level of evidence reached was very strong for 4 phenotypes (colorectal cancer, maturity-onset diabetes of the young, hypertrophic cardiomyopathy, and arrhythmogenic right ventricular cardiomyopathy), strong for 3 phenotypes (familial hypercholesterolemia, breast/ovarian cancer, and Li-Fraumeni syndrome), and moderate for 1 phenotype (dilated cardiomyopathy) (Figure 2A). Notably, we found that evidence thresholds for many phenotypes varied considerably from one another, demonstrating the utility of calibration at the phenotype level. This challenges the uniform application of specific OR thresholds being treated as strong evidence across all phenotypes, as described in the ACMG/AMP PS4 criterion. Phenotypes such as hypertrophic cardiomyopathy and arrhythmogenic right ventricular cardiomyopathy required higher ORs to reach the same level of evidence, while phenotypes such as familial hypercholesterolemia and breast/ovarian cancer required lower ORs to reach the same level of evidence. Evidence thresholds at the phenotypic level are presented in Table 2, and bootstrapped results and confidence intervals are presented in Figure S2A.

Figure 2.

Posterior probability curves for select phenotypes and aggregations

(A) Posterior probability curves for variants in 8 phenotypes with at least 25 pathogenic or benign ClinVar variants available for evidence calibration.

(B) Posterior probability curve for variants in phenotypes with fewer than 25 pathogenic or benign ClinVar variants, in aggregate.

(C) Posterior probability curves for variants in phenotypes associated with CDC Tier 1 genes, in aggregate.

(D) Posterior probability curves for variants in all phenotypes, in aggregate.

(E) Posterior probability curves for variants in all phenotypes, in aggregate, using a prior probability of pathogenicity of 10%.

Three SD Gaussian smoothing was applied to all plots uniformly. Threshold levels used (supporting, moderate, strong, and very strong, from bottom to top) vary by plot based on the estimation of distinct priors for each phenotype and aggregation, as described in the methods.

Table 2.

Estimated odds ratio evidence intervals at the phenotypic level

Phenotype	N	Supporting	Moderate	Strong	Very strong
Familial hypercholesterolemia (evaluated using statin-adjusted LDL ≥ 190)	154	[5.2, 8.1]	[8.1, 49.6]	≥49.6	–
Breast or ovarian cancer	320	[4.2, 5.4]	[5.4, 13.7]	≥13.7	–
Colorectal cancer	94	[8.5, 12.0]	[12.0, 17.6]	[17.6, 20.0]	≥20.0
Maturity-onset diabetes of the young (evaluated using HbA1c ≥ 6.5%)	27	[17.6, 18.7]	[18.7, 21.8]	[21.8, 24.6]	≥24.6
Hypertrophic cardiomyopathy	28	[73.6, 74.4]	[74.4, 84.0]	[84.0, 102.6]	≥102.6
Dilated cardiomyopathy	139	[29.6, 57.9]	≥57.9	–	–
Arrhythmogenic right ventricular cardiomyopathy	30	[14.6, 18.6]	[18.6, 60.5]	[60.5, 131.3]	≥131.3
Li-Fraumeni syndrome (evaluated using the presence of any cancer)	59	[2.6, 3.4]	[3.4, 6.2]	≥6.2	–

We calculated odds ratio intervals that correspond to different ACMG/AMP evidence strength levels for 8 phenotypes that had at least 25 pathogenic or benign ClinVar variants so that there would be sufficient data for calibration. A dash indicates that the specified level of evidence was not reached. N represents the number of ClinVar pathogenic and benign variants considered for calibration in each phenotype.
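Applying Table 2 to a new variant amounts to a lookup against the lower bound of each interval. The sketch below encodes two rows of Table 2 as an example. The function name is hypothetical, and treating each lower bound as inclusive is an assumption, since the table lists shared interval endpoints.

```python
# Lower OR bounds per evidence level, copied from two rows of Table 2.
# A missing level means that level was not reached for the phenotype.
THRESHOLDS = {
    "Colorectal cancer": {
        "supporting": 8.5, "moderate": 12.0, "strong": 17.6, "very strong": 20.0,
    },
    "Breast or ovarian cancer": {
        "supporting": 4.2, "moderate": 5.4, "strong": 13.7,
    },
}

LEVELS = ("supporting", "moderate", "strong", "very strong")


def evidence_level(phenotype, odds_ratio):
    """Highest evidence level whose lower bound the OR meets, else None."""
    reached = None
    for level in LEVELS:
        lower = THRESHOLDS[phenotype].get(level)
        if lower is not None and odds_ratio >= lower:
            reached = level
    return reached


print(evidence_level("Colorectal cancer", 18.0))
print(evidence_level("Breast or ovarian cancer", 100.0))
```

The second call illustrates the dash entries in the table: no OR, however large, exceeds strong evidence for breast or ovarian cancer under this calibration.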

Beyond phenotype-level calibration, we also performed three aggregate calibrations across multiple phenotypes: (1) using phenotypes with fewer than 25 pathogenic or benign ClinVar variants, (2) using CDC Tier 1 genes, and (3) using all phenotypes. In aggregate, the 10 phenotypes with fewer than 25 pathogenic or benign variants reached very strong evidence, phenotypes associated with CDC Tier 1 genes (familial hypercholesterolemia, breast/ovarian cancer, and colorectal cancer) reached strong evidence, and all phenotypes reached moderate evidence (Figures 2B–2D). These strength estimates are based on prior probabilities of disease that were developed using population sequencing data and are consequently lower than disease priors previously used for calibration, leading to conservatively lower strengths of evidence (see methods). When applying a commonly used prior of 10%, we find that all 18 phenotypes in aggregate reach strong evidence (Figure 2E). Evidence thresholds for these aggregations are presented in Table 3, and bootstrapped results and confidence intervals are presented in Figures S2B–S2E.

Table 3.

Estimated odds ratio evidence intervals for aggregations

Aggregation	N	Supporting	Moderate	Strong	Very strong
Phenotypes with fewer than 25 pathogenic or benign ClinVar variants	50	[9.1, 9.6]	[9.6, 12.6]	[12.6, 16.1]	≥16.1
Phenotypes associated with CDC Tier 1 conditions	568	[4.8, 6.9]	[6.9, 60.3]	≥60.3	–
All phenotypes	901	[7.5, 14.5]	≥14.5	–	–
All phenotypes (10% prior)	901	[6.7, 12.0]	[12.0, 40.9]	≥40.9	–

We calculated odds ratio intervals that correspond to ACMG/AMP evidence strength levels for three aggregations across multiple phenotypes: phenotypes with fewer than 25 pathogenic or benign ClinVar variants, phenotypes associated with CDC Tier 1 conditions, and across all phenotypes. Strength levels for all phenotypes in aggregate were calculated using both a population-based prior (the default in our calculations) and a prior of 10%, a commonly used value. A dash indicates that the specified level of evidence was not reached. N represents the number of ClinVar pathogenic and benign variants considered for calibration in each aggregation of phenotypes.
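The dependence of evidence strength on the prior can be illustrated with the Bayesian adaptation of the ACMG/AMP framework (reference 14). The sketch below is not the calibration code used in this study. It assumes the published odds of pathogenicity of 350 for very strong evidence at a 10% prior, with each lower strength level taken as the square root of the level above it; the function and constant names are illustrative.

```python
# Sketch only: relates a prior probability and a likelihood ratio to a posterior.
# ODDS_VERY_STRONG = 350 is the value proposed by Tavtigian et al. (2018) for a
# 10% prior; re-deriving it for other priors is not shown here.
ODDS_VERY_STRONG = 350.0

# Each lower strength level is the square root of the level above it.
ODDS_BY_STRENGTH = {
    "very strong": ODDS_VERY_STRONG,
    "strong": ODDS_VERY_STRONG ** (1 / 2),
    "moderate": ODDS_VERY_STRONG ** (1 / 4),
    "supporting": ODDS_VERY_STRONG ** (1 / 8),
}


def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Compare a 10% prior with a hypothetical lower, population-style prior.
    for prior in (0.10, 0.01):
        for strength, odds in ODDS_BY_STRENGTH.items():
            post = posterior_probability(prior, odds)
            print(f"prior={prior:.2f} {strength:>11}: posterior={post:.3f}")
```

For a fixed likelihood ratio, a lower prior gives a lower posterior, which is consistent with the population-based priors yielding more conservative strength levels than a 10% prior.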

Using the full aggregate calibration thresholds based on a 10% prior across all 18 phenotypes we analyzed, among 53,926 VUSs (including variants that are uncertain or absent from ClinVar), we found that 456 VUSs (0.8%) affecting 11,796 participants had supporting population evidence, 691 VUSs (1.3%) affecting 9,596 participants had moderate population evidence, and 609 VUSs (1.1%) affecting 2,754 participants had strong population evidence. Collectively, we found that 12,350 (2.6%) participants harbor a rare VUS with at least moderate evidence of pathogenicity in the aggregate, highlighting the clinical value of population-based ORs. Using individual phenotype calibration thresholds instead, among 15,150 VUSs in the 8 phenotypes we calibrated individually, we found that 172 VUSs (1.1%) affecting 1,220 participants had supporting population evidence, 445 VUSs (2.9%) affecting 2,085 participants had moderate population evidence, 266 VUSs (1.8%) affecting 487 participants had strong population evidence, and 165 VUSs (1.1%) affecting 337 participants had very strong population evidence of pathogenicity. We found that 2,909 (0.6%) participants harbor a rare VUS with at least moderate evidence of pathogenicity in these phenotypes.
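Assigning a variant to an evidence level amounts to locating its OR within the calibrated intervals. As a minimal sketch, the function below uses the "All phenotypes (10% prior)" row of Table 3. The function name and the handling of ORs that fall exactly on a shared interval boundary are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of the supporting, moderate, and strong intervals for
# "All phenotypes (10% prior)" in Table 3; very strong was not reached.
LOWER_BOUNDS = [6.7, 12.0, 40.9]
LABELS = [None, "supporting", "moderate", "strong"]


def evidence_strength(odds_ratio: float) -> Optional[str]:
    """Return the evidence level for an OR, or None below the supporting range.

    Assumption: an OR equal to a shared boundary (e.g., 12.0) takes the higher level.
    """
    return LABELS[bisect_right(LOWER_BOUNDS, odds_ratio)]
```

Swapping in the bounds from another row of Table 3, or from a phenotype-level calibration, changes only the two lists.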

When calculating ORs, ultra-rare variants have fewer observations and thus potentially less certainty around estimates for individual variants. To determine whether ultra-rare variants provide the same clinical certainty as a class when compared with variants with higher frequencies, we conducted a sensitivity analysis. Using a prior of 10%, we found that singletons (in this case, variants present in one participant who has an associated disorder) as a class provide very strong evidence, as do variants present in 2 or 3 participants, though there is variance in these estimates at very high OR values (Figure S3A). More generally, we found that variants present in at most 10 participants with the associated disorder reached the same level of evidence as variants present in more than 10 such participants (Figure S3B). Furthermore, we found that 94.8% of singletons with an OR of at least 5 and a decisive ClinVar classification were pathogenic (Figure S3C). Notably, 93.9% of singletons with an OR of at least 5 also had a lower 95% confidence bound greater than 1, highlighting that even variants that appear in only one case can provide robust evidence.
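To make the singleton case concrete, the sketch below computes a variant-level OR and a Wald 95% confidence interval from a 2×2 table of carriers and non-carriers by case status. The exact estimator used in this study is described in the methods and may differ. In particular, the 0.5 continuity correction applied when a cell is zero is an assumption here, and the function names are illustrative.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(carrier_cases: int, carrier_controls: int,
                       noncarrier_cases: int, noncarrier_controls: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval computed on the log scale.

    Assumption: 0.5 is added to every cell when any cell is zero
    (Haldane-Anscombe correction), so that singletons yield a finite OR.
    """
    cells = [carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def meets_ps4_thresholds(odds_ratio: float, lower_bound: float) -> bool:
    """OR >= 5 with a lower 95% confidence bound >= 1."""
    return odds_ratio >= 5.0 and lower_bound >= 1.0
```

With a hypothetical singleton observed in one case and no controls, against 1,000 non-carrier cases and 100,000 non-carrier controls, the corrected OR is large and its lower bound stays above 1, although the interval is wide.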

Comparing outcomes for participants with VUSs with high ORs versus pathogenic variants

We sought to identify potential differences in clinical outcomes between participants with high-OR VUSs (including variants that are uncertain or absent from ClinVar) and participants with high-OR P/LP variants using survival analysis. Figure 3A compares longitudinal clinical outcomes in MSH2 and MSH6, two genes associated with colorectal cancer, at various OR thresholds (5, 10, 15, and 30). Notably, despite being related parts of the same mismatch repair complex, VUSs in MSH2 must reach a much higher OR threshold in order to reach parity with similar P/LP variants in clinical outcomes compared with VUSs in MSH6, highlighting the potential benefits of evaluating population evidence at the gene level. Beyond colorectal cancer, we observed large differences between genes related to the same phenotype, including ATM and BRCA2 for breast cancer and MYH7 and MYBPC3 for hypertrophic cardiomyopathy, further motivating calibration at the gene level. We report calibration thresholds for 9 genes with at least 25 ClinVar pathogenic and benign variants in Table S6.

Figure 3.

Comparing Kaplan-Meier curves for participants with high odds ratio VUSs versus participants with high odds ratio pathogenic variants

Survival curves were generated using a Kaplan-Meier estimator, and shaded regions represent 95% confidence intervals. “At risk” counts describe the number of participants considered in each of the three groups at different ages. Log rank p values between P/LP and VUS survival curves are noted in each plot.

(A) Survival analysis of participants with no variants, P/LP variants, and VUSs/absent variants in MSH2 and MSH6. Each column considers variants above an odds ratio threshold ranging from 5 to 30.

(B) Survival analysis of participants with no variants, P/LP variants with odds ratios ≥ 5/lower bounds ≥ 1, and VUSs/absent variants with odds ratios ≥ 5/lower bounds ≥ 1 in each of 9 genes that had at least 25 ClinVar pathogenic and benign variants.

(C) Survival analysis of participants with no variants, P/LP variants with odds ratios ≥ the optimal threshold, and VUSs/absent variants with odds ratios ≥ the optimal threshold in each of 9 genes that had at least 25 ClinVar pathogenic and benign variants. Note that optimal thresholds refer to the thresholds like those described in Table S5 but at the gene level.
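The Kaplan-Meier estimator named in the caption above can be sketched without dependencies as follows. The study cites the lifelines package (reference 16) for survival analysis; this standalone version is only an illustration of the product-limit calculation, it omits confidence intervals, and its function name and input layout are assumptions.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(ages: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the probability of remaining disease-free.

    ages[i] is the age at diagnosis (events[i] is True) or at censoring (False).
    Returns (age, survival probability) pairs at each age where an event occurs.
    """
    event_ages = sorted({a for a, e in zip(ages, events) if e})
    survival = 1.0
    curve = []
    for t in event_ages:
        # Participants still under observation just before age t.
        at_risk = sum(1 for a in ages if a >= t)
        diagnosed = sum(1 for a, e in zip(ages, events) if e and a == t)
        survival *= 1.0 - diagnosed / at_risk
        curve.append((t, survival))
    return curve
```

Censored participants contribute to the "at risk" counts until their censoring age and then drop out without lowering the curve.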

When applying the ACMG/AMP-recommended threshold values for the application of PS4 population evidence (variants with an OR ≥ 5 and a lower 95% confidence bound ≥ 1), we found that outcomes among participants with VUSs were not significantly different from those with P/LP variants in some genes. Two genes associated with colorectal cancer, MSH6 (log rank p = 0.03) and MLH1 (log rank p = 1.4 × 10⁻⁸), as well as BRCA2 (log rank p = 0.02) and TTN (log rank p = 2.1 × 10⁻³¹), still had differences between clinical outcomes for individuals with VUSs versus P/LP variants (Figure 3B). This indicates that PS4 evidence developed using this approach based on current guidelines—in aggregate—may be as strong as P/LP annotations in some, but not all, of the genes we evaluated.
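The log rank p values reported above compare two survival curves. The following is a minimal, dependency-free sketch of a two-group log rank test, not the implementation used in the study. The function name and input layout are hypothetical, and the p value uses the chi-square distribution with one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def log_rank_test(ages_a: Sequence[float], events_a: Sequence[bool],
                  ages_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log rank test; returns (chi-square statistic, p value)."""
    event_ages = sorted({a for a, e in zip(ages_a, events_a) if e} |
                        {a for a, e in zip(ages_b, events_b) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_ages:
        n_a = sum(1 for a in ages_a if a >= t)  # at risk in group A
        n_b = sum(1 for a in ages_b if a >= t)  # at risk in group B
        d_a = sum(1 for a, e in zip(ages_a, events_a) if e and a == t)
        d_b = sum(1 for a, e in zip(ages_b, events_b) if e and a == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the hypergeometric variance is undefined with one person left
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi_square, math.erfc(math.sqrt(chi_square / 2.0))
```

In the setting above, group A would hold participants with P/LP variants and group B participants with VUSs above the same OR threshold; a large p value indicates that the two curves are not distinguishable at the available sample size.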

We next analyzed outcomes in the same genes using optimal OR thresholds (values at which specificity and sensitivity are maximized) at the gene level (Table S5) and found that there is no significant difference in outcomes between participants who have VUSs and those who have P/LP variants with an OR greater than the optimal threshold in all genes except ATM (log rank p = 6.4 × 10⁻⁷) and APOB (no participants have P/LP variants with an OR ≥ the optimal threshold) (Figure 3C). When compared with the ACMG/AMP-recommended OR thresholds for PS4 evidence, optimal threshold values are sometimes higher and result in similar clinical outcomes between individuals with P/LP variants and VUSs in more genes (7 of 9 versus 5 of 9).
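One common way to formalize a threshold at which sensitivity and specificity are jointly maximized is Youden's J statistic (sensitivity + specificity − 1). Whether this exact criterion was used for the optimal thresholds is an assumption. The sketch below scans candidate thresholds over labeled pathogenic and benign variants, and its function name is illustrative.

```python
from typing import Sequence, Tuple


def optimal_or_threshold(odds_ratios: Sequence[float],
                         is_pathogenic: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, Youden's J) maximizing sensitivity + specificity - 1.

    A variant is called positive when its OR is >= the threshold.
    Assumes at least one pathogenic and one benign variant are present.
    """
    n_path = sum(1 for p in is_pathogenic if p)
    n_benign = len(is_pathogenic) - n_path
    best_threshold, best_j = float("nan"), -1.0
    for threshold in sorted(set(odds_ratios)):
        true_pos = sum(1 for o, p in zip(odds_ratios, is_pathogenic)
                       if p and o >= threshold)
        true_neg = sum(1 for o, p in zip(odds_ratios, is_pathogenic)
                       if not p and o < threshold)
        j = true_pos / n_path + true_neg / n_benign - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j
```

Applied per gene, such a scan can return a threshold above 5 when benign variants with moderately elevated ORs are present, which matches the observation that optimal thresholds are sometimes higher than the recommended value.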

Correlation between population, computational, and functional evidence

We analyzed the correlation between ORs (PS4/population evidence), REVEL scores (PP3/BP4/computational evidence), and functional scores (PS3/BS3/functional evidence) in 3 genes for which these data sources were all available (LDLR, BRCA1, and MSH2). Interestingly, we found that ORs are moderately correlated with REVEL scores in LDLR but not in BRCA1 and MSH2 (Figure 4A) and that ORs are moderately correlated with functional scores in LDLR and BRCA1 but not in MSH2 (Figure 4B). We note that MSH2 requires a much higher OR than LDLR and BRCA1 to reach the same level of evidence, which may contribute to the lack of correlation with other forms of evidence, given that the majority of variants we evaluated in MSH2 were below that threshold.
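The correlations in this analysis are Spearman rank correlations. As a dependency-free sketch (function names are illustrative), Spearman's rho can be computed as the Pearson correlation of average ranks. Because the logarithm is monotonic, the result is identical whether ORs or log₁₀ ORs are supplied, provided all ORs are positive.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed values.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

A gene in which most variants sit below the calibrated OR threshold, as noted for MSH2, would contribute many closely spaced ranks at the low end, which can weaken the rank correlation with REVEL or functional scores.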

Figure 4.

Correlation between population evidence and computational or functional evidence

(A) Spearman correlation between log₁₀ odds ratios and REVEL computational predictions of variant pathogenicity.

(B) Spearman correlation between log₁₀ odds ratios and functional estimates of variant impact from experimental assays.

(C) Number of variants with different forms of evidence (computational, functional, and contextual) among variants that already have strong population evidence of pathogenicity.

We then sought to identify how many VUSs in these genes might have sufficient evidence to be “pre-classified” as P/LP, making use of population, computational, and functional evidence, as well as contextual evidence from previously classified pathogenic variants, which has been shown to be potentially underused.²² We evaluate each of these forms of evidence for variants that meet the threshold for strong population evidence of pathogenicity (≥6.6 for LDLR, ≥23.2 for BRCA1, and ≥24.0 for MSH2), as described in the methods. The number of variants in LDLR, BRCA1, and MSH2 with each form of evidence is shown visually in Figure 4C.

We combine these forms of evidence using the Bayesian framework-based point system, noting that a comprehensive evaluation of all available forms of evidence would be required for a diagnostic variant interpretation. Using this approach, we are able to pre-classify 60 VUSs across LDLR, BRCA1, and MSH2 (80% in LDLR), affecting 245 participants collectively, as P/LP (points ≥ 6) (Figure S4). Notably, in LDLR, 72.2% of VUSs with strong population evidence of pathogenicity also have sufficient complementary evidence to be potentially classified as pathogenic. These VUSs with sufficient evidence for classification represent 12.4% of all rare VUSs in LDLR observed in participants, suggesting that there may be a high potential yield of variants that could be reclassified in some genes when well-calibrated functional and population evidence are available.
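The point-based combination described above can be sketched as follows, using the published point values of the naturally scaled system (reference 17): 1, 2, 4, and 8 points for supporting, moderate, strong, and very strong evidence, with benign evidence counted negatively. The criterion-to-strength assignments in the usage note are hypothetical examples, not variants from this study.

```python
from typing import Dict

# Points per evidence strength in the Tavtigian et al. (2020) point system.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very strong": 8}


def total_points(pathogenic: Dict[str, str], benign: Dict[str, str]) -> int:
    """Sum pathogenic evidence as positive points and benign evidence as negative.

    Each dict maps a criterion code (e.g., "PS4") to its applied strength.
    """
    return (sum(POINTS[s] for s in pathogenic.values())
            - sum(POINTS[s] for s in benign.values()))


def classify(points: int) -> str:
    """Map a point total to a class; P/LP corresponds to points >= 6."""
    if points >= 10:
        return "pathogenic"
    if points >= 6:
        return "likely pathogenic"
    if points >= 0:
        return "uncertain significance"
    if points >= -6:
        return "likely benign"
    return "benign"
```

For instance, strong population evidence alone gives 4 points and leaves a variant uncertain, whereas adding moderate computational evidence gives 6 points and crosses the likely pathogenic threshold.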

Discussion

Collectively, many individuals harbor a rare VUS in an actionable disease gene, and this framework can provide information to significantly accelerate the interpretation of these challenging variants. Our results demonstrate that case enrichments within large population cohorts can broadly provide meaningful clinical evidence for rare coding VUSs. This approach can streamline the generation of new information for variant assessment across a range of clinically actionable phenotypes, including rare and common disorders. We found that population-based ORs collectively reached strong evidence when applying the Bayesian adaptation of the ACMG/AMP framework and a commonly used prior of 10%.¹⁴ At the phenotypic level, population ORs reached at least moderate evidence in all 8 phenotypes with sufficient numbers of variants for calibration, with multiple phenotypes reaching strong and very strong evidence.

Calibration of evidence at the phenotype and gene levels

We calibrated population evidence at the phenotype and gene levels, defining OR thresholds that are considered supporting, moderate, strong, or very strong evidence of pathogenicity. Our approach involved calibrating and calculating prior probabilities of pathogenicity for each phenotype and gene from a biobank dataset, in contrast to previous calibration methods that use a single threshold and prior probability across all genes.¹⁴,¹⁵ The considerations and trade-offs in different approaches to calculating prior probabilities are discussed extensively in the methods. We used these values to calculate local likelihood ratios for each gene and phenotype. Given the wide variability we observed in evidence thresholds and the prevalence of pathogenic variants among different genes and phenotypes, this approach may be valuable in future calibration efforts when sufficient data are available more broadly. It is also likely to be useful for informing the specialized work of VCEPs to determine how to apply various evidence-strength thresholds for specific genes and sub-types of disorders.
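As a rough illustration of what a local likelihood ratio measures, the sketch below estimates, for a query OR, how much more often pathogenic variants than benign variants fall among its nearest labeled neighbors. This is a deliberately simplified stand-in and not the calibration procedure described in the methods. The fixed neighbor count, the 0.5 pseudocount, and the function name are all assumptions.

```python
from typing import Sequence


def local_likelihood_ratio(odds_ratios: Sequence[float],
                           is_pathogenic: Sequence[bool],
                           query_or: float, window: int = 3) -> float:
    """Likelihood ratio of pathogenicity from the nearest labeled variants.

    LR = P(OR in window | pathogenic) / P(OR in window | benign),
    with a pseudocount of 0.5 so that neither term is zero.
    """
    labeled = sorted(zip(odds_ratios, is_pathogenic),
                     key=lambda v: abs(v[0] - query_or))
    nearest = labeled[:window]
    path_in = sum(1 for _, p in nearest if p)
    benign_in = len(nearest) - path_in
    path_total = sum(1 for p in is_pathogenic if p)
    benign_total = len(is_pathogenic) - path_total
    return (((path_in + 0.5) / (path_total + 1.0))
            / ((benign_in + 0.5) / (benign_total + 1.0)))
```

Running such an estimate separately for each gene or phenotype, with its own labeled variants, is what allows the resulting thresholds to differ between genes. With few labeled variants, the estimate is dominated by the pseudocount and becomes unstable.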

We highlighted an example in the mismatch repair complex, where we observed that variants in MSH2 and MSH6 had different OR thresholds to reach the same clinical certainty. These differences at the gene level may be statistical artifacts related to different numbers of variants with ORs, numbers of pathogenic and benign variants, or differences in the accuracy of prior clinical classifications. They may also have a biological basis; while MSH2 and MSH6 physically interact and form the MutSα complex, each subunit contributes uniquely to the repair process. MSH2 plays a central role in stabilizing the complex, while MSH6 provides the specificity for recognizing particular types of DNA mismatches. There are also known differences in effect sizes of protein-truncating variants; for example, such variants in BRCA1 or BRCA2 are much more likely to lead to cancer when compared with protein-truncating variants in ATM.²³

Calibrating the strength of evidence at the gene and phenotype levels also has limitations, as there are currently low numbers of ClinVar pathogenic and benign variants in many genes, which can naturally increase variance around calibration thresholds. Case ascertainment and available case data may differ substantially across genes, leading to varying levels of clinical certainty for pathogenic and benign variants in different genes. This is also confounded by biological factors, such as varying phenotypic effect sizes, incomplete penetrance, or stochasticity related to phenotypic effects (e.g., for endophenotypes like LDL-C leading to myocardial infarction or DNA damage repair competency leading to cancer). There may also be specialized biological modeling about variant types (e.g., cysteine residues within LDLR class A repeats) that could inform these evidence models.

Approaches to automating components of variant interpretation

Methods to automatically generate structured sources of diagnostic evidence can expedite variant assessment and prioritize the most promising variants for re-assessment. Notably, when combining the four forms of evidence that we automatically generated (population, computational, functional, and contextual), we found that 72.2% of VUSs in LDLR with strong or very strong population evidence also have sufficient complementary evidence to be potentially classified as pathogenic and that these VUSs with sufficient evidence represent 12.4% of all rare VUSs in LDLR seen in participants. As functional assay data for additional genes are developed and computational scores evolve, combining these sources of evidence can help dramatically scale variant interpretation. We note that the broad application of this framework will require a careful review of VCEP classification criteria (e.g., evidence-type PM5 should not be applied when classifying variants in BRCA1 or BRCA2).

Limitations and future directions

In populations other than those most highly represented in the UK Biobank, there is an insufficient number of participants with rare missense variants to make robust estimates of risk. Therefore, we note that our estimates may not be generalizable to those other populations or those not ascertained during adulthood, and future work may estimate population-specific variant effects. Additionally, while our analysis focuses on a subset of actionable genes that are some of the most commonly screened in diagnostic settings, future work may analyze a broader set of genes. Given small variant counts at the gene level, we caution that gene-level calibration may be data constrained and that statistical estimates will become more powerful as biobank sizes grow. We provide higher-confidence calibration thresholds in our fully aggregated calibration across all 18 phenotypes we analyzed (Table S3).

To remain consistent with ACMG/AMP guidelines and commonly used standards developed by VCEPs, we use an OR to represent the enrichment of disease at the population level for individual variants. Future work may evaluate whether other representations of population evidence (e.g., using other statistical measures or models) can achieve better performance. We note that OR estimates can be confounded by low variant allele counts or population structure. Separately, because we have focused on rare variants in genes where coding variants are known to have a substantial effect, we presume that these variants are likely to be truly causal. In rare cases, it is possible for a variant we analyzed to be tagged by a more common coding variant with functional effect, though this is unlikely to impact our calibration efforts, which were aggregated at the gene level. Additionally, many complex disorders are attributable to other complex genetic, behavioral, or environmental factors, in addition to their established monogenic contributions.

Conclusion

In summary, this analysis presents a comprehensive approach to assess the impact of germline variants using endophenotypic and disease risk data from a national biobank. For the set of genes we analyzed, we calculated variant-level ORs and calibrated the strengths of evidence, then used these to identify VUSs that can potentially be reclassified as pathogenic. By highlighting the utility of biobank data and calibrating them, we hope that this form of population evidence can be adopted to inform variant interpretation broadly.

Data and code availability

Variant-level ORs are available at https://doi.org/10.6084/m9.figshare.29143331 for all 18 phenotypes studied.

Acknowledgments

We are grateful to the UK Biobank and its participants who provided biological samples and data for this study, performed under UK Biobank application 41250 and Mass General Brigham IRB protocol 2020P002093. We gratefully acknowledge funding from NIH R01HG010372 (V.B., T.Y., L.B., and C.A.C.), R01HG013350 (V.P.), and R56HG012681 (T.Y. and C.A.C.) and from the American Heart Association (24TPA1300072: T.Y. and C.A.C.).

Author contributions

Manuscript, V.B., C.A.C., V.P., M.L., and S.H.; data curation, V.B., T.Y., and L.B.; statistical analysis, V.B., C.A.C., and V.P.

Declaration of interests

The authors declare no competing interests.

Published: July 9, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2025.06.012.

Supplemental information

Document S1. Figures S1–S4, Tables S1–S6, and supplemental methods

Document S2. Article plus supplemental information

References

1.Green E.D., Gunter C., Biesecker L.G., Di Francesco V., Easter C.L., Feingold E.A., Felsenfeld A.L., Kaufman D.J., Ostrander E.A., Pavan W.J., et al. Strategic vision for improving human health at The Forefront of Genomics. Nature. 2020;586:683–692. doi: 10.1038/s41586-020-2817-4.
2.Rehm H.L., Berg J.S., Brooks L.D., Bustamante C.D., Evans J.P., Landrum M.J., Ledbetter D.H., Maglott D.R., Martin C.L., Nussbaum R.L., et al. ClinGen — The Clinical Genome Resource. N. Engl. J. Med. 2015;372:2235–2242. doi: 10.1056/NEJMsr1406261.
3.Cassa C.A., Tong M.Y., Jordan D.M. Large numbers of genetic variants considered to be pathogenic are common in asymptomatic individuals. Hum. Mutat. 2013;34:1216–1220. doi: 10.1002/humu.22375.
4.Fowler D.M., Rehm H.L. Will variants of uncertain significance still exist in 2030? Am. J. Hum. Genet. 2024;111:5–10. doi: 10.1016/j.ajhg.2023.11.005.
5.Murray M.F., Giovanni M.A., Doyle D.L., Harrison S.M., Lyon E., Manickam K., Monaghan K.G., Rasmussen S.A., Scheuner M.T., Palomaki G.E., et al. DNA-based screening and population health: a points to consider statement for programs and sponsoring organizations from the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 2021;23:989–995. doi: 10.1038/s41436-020-01082-w.
6.Richards S., Aziz N., Bale S., Bick D., Das S., Gastier-Foster J., Grody W.W., Hegde M., Lyon E., Spector E., et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 2015;17:405–424. doi: 10.1038/gim.2015.30.
7.Zanti M., O’Mahony D.G., Parsons M.T., Dorling L., Dennis J., Boddicker N.J., Chen W., Hu C., Naven M., Yiangou K., et al. Analysis of more than 400,000 women provides case-control evidence for BRCA1 and BRCA2 variant classification. Preprint at medRxiv. 2024. doi: 10.1101/2024.09.04.24313051.
8.Parsons M.T., de la Hoya M., Richardson M.E., Tudini E., Anderson M., Berkofsky-Fessler W., Caputo S.M., Chan R.C., Cline M.S., Feng B.J., et al. Evidence-based recommendations for gene-specific ACMG/AMP variant classification from the ClinGen ENIGMA BRCA1 and BRCA2 Variant Curation Expert Panel. Am. J. Hum. Genet. 2024;111:2044–2058. doi: 10.1016/j.ajhg.2024.07.013.
9.Backman J.D., Li A.H., Marcketta A., Sun D., Mbatchou J., Kessler M.D., Benner C., Liu D., Locke A.E., Balasubramanian S., et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599:628–634. doi: 10.1038/s41586-021-04103-z.
10.Halford J.L., Morrill V.N., Choi S.H., Jurgens S.J., Melloni G., Marston N.A., Weng L.C., Nauffal V., Hall A.W., Gunn S., et al. Endophenotype effect sizes support variant pathogenicity in monogenic disease susceptibility genes. Nat. Commun. 2022;13:5106. doi: 10.1038/s41467-022-32009-5.
11.Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O'Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
12.Khera A.V., Chaffin M., Aragam K.G., Haas M.E., Roselli C., Choi S.H., Natarajan P., Lander E.S., Lubitz S.A., Ellinor P.T., Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z.
13.Landrum M.J., Lee J.M., Riley G.R., Jang W., Rubinstein W.S., Church D.M., Maglott D.R. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980–D985. doi: 10.1093/nar/gkt1113.
14.Tavtigian S.V., Greenblatt M.S., Harrison S.M., Nussbaum R.L., Prabhu S.A., Boucher K.M., Biesecker L.G., ClinGen Sequence Variant Interpretation Working Group (ClinGen SVI). Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 2018;20:1054–1060. doi: 10.1038/gim.2017.210.
15.Pejaver V., Byrne A.B., Feng B.J., Pagel K.A., Mooney S.D., Karchin R., O’Donnell-Luria A., Harrison S.M., Tavtigian S.V., Greenblatt M.S., et al. Evidence-Based Calibration of Computational Tools for Missense Variant Pathogenicity Classification and ClinGen Recommendations for Clinical Use of PP3/BP4 Criteria. Preprint at bioRxiv. 2022. doi: 10.1101/2022.03.17.484479.
16.Davidson-Pilon C., Kalderstam J., Zivich P., Kuhn B., Fiore-Gartland A., Moneda L., Wilson D., Parij A., Stark K., Anton S., et al. CamDavidsonPilon/lifelines: v0.23.9. Published online January 28, 2020. doi: 10.5281/ZENODO.3629409.
17.Tavtigian S.V., Harrison S.M., Boucher K.M., Biesecker L.G. Fitting a naturally scaled point system to the ACMG/AMP variant classification guidelines. Hum. Mutat. 2020;41:1734–1737. doi: 10.1002/humu.24088.
18.Ryu J., Barkal S., Yu T., Jankowiak M., Zhou Y., Francoeur M., Phan Q.V., Li Z., Tognon M., Brown L., et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat. Genet. 2024;56:925–937. doi: 10.1038/s41588-024-01726-6.
19.Findlay G.M., Daza R.M., Martin B., Zhang M.D., Leith A.P., Gasperini M., Janizek J.D., Huang X., Starita L.M., Shendure J. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562:217–222. doi: 10.1038/s41586-018-0461-z.
20.Jia X., Burugula B.B., Chen V., Lemons R.M., Jayakody S., Maksutova M., Kitzman J.O. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. Am. J. Hum. Genet. 2021;108:163–175. doi: 10.1016/j.ajhg.2020.12.003.
21.Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
22.Bhat V., Adzhubei I.A., Fife J.D., Lebo M., Cassa C.A. Informing variant assessment using structured evidence from prior classifications (PS1, PM5, and PVS1 sequence variant interpretation criteria). Genet. Med. 2023;25:16–26. doi: 10.1016/j.gim.2022.09.009.
23.Dorling L., Carvalho S., Allen J., González-Neira A., Luccarini C., Wahlström C., Pooley K.A., Parsons M.T., Fortuno C., et al. Breast Cancer Association Consortium. Breast Cancer Risk Genes - Association Analysis in More than 113,000 Women. N. Engl. J. Med. 2021;384:428–439. doi: 10.1056/NEJMoa1913948.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Figures S1–S4, Tables S1–S6, and supplemental methods

mmc1.pdf^{(706.2KB, pdf)}

Document S2. Article plus supplemental information

mmc2.pdf^{(8.6MB, pdf)}

Data Availability Statement

Variant-level ORs are available at https://doi.org/10.6084/m9.figshare.29143331 for all 18 phenotypes studied.

[bib1] 1.Green E.D., Gunter C., Biesecker L.G., Di Francesco V., Easter C.L., Feingold E.A., Felsenfeld A.L., Kaufman D.J., Ostrander E.A., Pavan W.J., et al. Strategic vision for improving human health at The Forefront of Genomics. Nature. 2020;586:683–692. doi: 10.1038/s41586-020-2817-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2.Rehm H.L., Berg J.S., Brooks L.D., Bustamante C.D., Evans J.P., Landrum M.J., Ledbetter D.H., Maglott D.R., Martin C.L., Nussbaum R.L., et al. ClinGen — The Clinical Genome Resource. N. Engl. J. Med. 2015;372:2235–2242. doi: 10.1056/NEJMsr1406261. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] 3.Cassa C.A., Tong M.Y., Jordan D.M. Large numbers of genetic variants considered to be pathogenic are common in asymptomatic individuals. Hum. Mutat. 2013;34:1216–1220. doi: 10.1002/humu.22375. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Fowler D.M., Rehm H.L. Will variants of uncertain significance still exist in 2030? Am. J. Hum. Genet. 2024;111:5–10. doi: 10.1016/j.ajhg.2023.11.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] 5.Murray M.F., Giovanni M.A., Doyle D.L., Harrison S.M., Lyon E., Manickam K., Monaghan K.G., Rasmussen S.A., Scheuner M.T., Palomaki G.E., et al. DNA-based screening and population health: a points to consider statement for programs and sponsoring organizations from the American College of Medical Genetics and Genomics (ACMG) Genet. Med. 2021;23:989–995. doi: 10.1038/s41436-020-01082-w. [DOI] [PubMed] [Google Scholar]

[bib6] 6.Richards S., Aziz N., Bale S., Bick D., Das S., Gastier-Foster J., Grody W.W., Hegde M., Lyon E., Spector E., et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 2015;17:405–424. doi: 10.1038/gim.2015.30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] 7.Zanti M., O’Mahony D.G., Parsons M.T., Dorling L., Dennis J., Boddicker N.J., Chen W., Hu C., Naven M., Yiangou K., et al. Analysis of more than 400,000 women provides case-control evidence for BRCA1 and BRCA2 variant classification. medRxiv. 2024 doi: 10.1101/2024.09.04.24313051. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8.Parsons M.T., de la Hoya M., Richardson M.E., Tudini E., Anderson M., Berkofsky-Fessler W., Caputo S.M., Chan R.C., Cline M.S., Feng B.J., et al. Evidence-based recommendations for gene-specific ACMG/AMP variant classification from the ClinGen ENIGMA BRCA1 and BRCA2 Variant Curation Expert Panel. Am. J. Hum. Genet. 2024;111:2044–2058. doi: 10.1016/j.ajhg.2024.07.013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] 9.Backman J.D., Li A.H., Marcketta A., Sun D., Mbatchou J., Kessler M.D., Benner C., Liu D., Locke A.E., Balasubramanian S., et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599:628–634. doi: 10.1038/s41586-021-04103-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10.Halford J.L., Morrill V.N., Choi S.H., Jurgens S.J., Melloni G., Marston N.A., Weng L.C., Nauffal V., Hall A.W., Gunn S., et al. Endophenotype effect sizes support variant pathogenicity in monogenic disease susceptibility genes. Nat. Commun. 2022;13:5106. doi: 10.1038/s41467-022-32009-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] 11.Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O'Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] 12.Khera A.V., Chaffin M., Aragam K.G., Haas M.E., Roselli C., Choi S.H., Natarajan P., Lander E.S., Lubitz S.A., Ellinor P.T., Kathiresan S. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] 13.Landrum M.J., Lee J.M., Riley G.R., Jang W., Rubinstein W.S., Church D.M., Maglott D.R. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980–D985. doi: 10.1093/nar/gkt1113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] 14.Tavtigian S.V., Greenblatt M.S., Harrison S.M., Nussbaum R.L., Prabhu S.A., Boucher K.M., Biesecker L.G., ClinGen Sequence Variant Interpretation Working Group ClinGen SVI Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 2018;20:1054–1060. doi: 10.1038/gim.2017.210. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] 15.Pejaver V, Byrne AB, Feng BJ, Pagel, K.A., Mooney, S.D., Karchin, R., O’Donnell-Luria, A., Harrison, S.M., Tavtigian, S.V., Greenblatt, M.S., et al., Evidence-Based Calibration of Computational Tools for Missense Variant Pathogenicity Classification and ClinGen Recommendations for Clinical Use of PP3/BP4 Criteria. Bioinformatics; 2022. doi: 10.1101/2022.03.17.484479 [DOI] [PMC free article] [PubMed]

[bib16] 16.Davidson-Pilon C, Kalderstam J, Zivich P,Kuhn, B., Fiore-Gartland, A., Moneda, L., WIlson, D., Parij, A., Stark, K., Anton, S. et al. CamDavidsonPilon/lifelines: v0.23.9. Published online January 28, 2020. doi: 10.5281/ZENODO.3629409 [DOI]

[bib17] 17.Tavtigian S.V., Harrison S.M., Boucher K.M., Biesecker L.G. Fitting a naturally scaled point system to the ACMG/AMP variant classification guidelines. Hum. Mutat. 2020;41:1734–1737. doi: 10.1002/humu.24088. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] 18.Ryu J., Barkal S., Yu T., Jankowiak M., Zhou Y., Francoeur M., Phan Q.V., Li Z., Tognon M., Brown L., et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat. Genet. 2024;56:925–937. doi: 10.1038/s41588-024-01726-6.

[bib19] 19.Findlay G.M., Daza R.M., Martin B., Zhang M.D., Leith A.P., Gasperini M., Janizek J.D., Huang X., Starita L.M., Shendure J. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562:217–222. doi: 10.1038/s41586-018-0461-z.

[bib20] 20.Jia X., Burugula B.B., Chen V., Lemons R.M., Jayakody S., Maksutova M., Kitzman J.O. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. Am. J. Hum. Genet. 2021;108:163–175. doi: 10.1016/j.ajhg.2020.12.003.

[bib21] 21.Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.

[bib22] 22.Bhat V., Adzhubei I.A., Fife J.D., Lebo M., Cassa C.A. Informing variant assessment using structured evidence from prior classifications (PS1, PM5, and PVS1 sequence variant interpretation criteria). Genet. Med. 2023;25:16–26. doi: 10.1016/j.gim.2022.09.009.

[bib23] 23.Dorling L., Carvalho S., Allen J., González-Neira A., Luccarini C., Wahlström C., Pooley K.A., Parsons M.T., Fortuno C., et al.; Breast Cancer Association Consortium. Breast Cancer Risk Genes - Association Analysis in More than 113,000 Women. N. Engl. J. Med. 2021;384:428–439. doi: 10.1056/NEJMoa1913948.
