Author manuscript; available in PMC: 2018 Dec 10.
Published in final edited form as: Genet Med. 2018 Jun 8;21(1):71–80. doi: 10.1038/s41436-018-0018-4

Comprehensive annotation of BRCA1 and BRCA2 missense variants by functionally validated sequence-based computational prediction models

Steven N Hart 1, Tanya Hoskin 1, Hermela Shimelis 2, Raymond M Moore 1, Bingjian Feng 3, Abigail Thomas 1, Noralane M Lindor 4, Eric C Polley 1, David E Goldgar 3, Edwin Iversen 5, Alvaro NA Monteiro 6, Vera J Suman 1, Fergus J Couch 1,2,*
PMCID: PMC6287763  NIHMSID: NIHMS953431  PMID: 29884841

Abstract

Purpose:

To improve methods for predicting the impact of missense variants of uncertain significance (VUS) in BRCA1 and BRCA2 on protein function.

Methods:

Functional data for 248 BRCA1 and 207 BRCA2 variants from assays with established high sensitivity and specificity for damaging variants were used to recalibrate 40 in silico algorithms predicting the impact of variants on protein activity. Additional RandomForest (RF) and Naïve Voting Method (NVM) meta-predictors for both BRCA1 and BRCA2 were developed to increase predictive accuracy.

Results:

Optimized thresholds for in silico prediction models significantly improved the accuracy of predicted functional effects for BRCA1 and BRCA2 variants. In addition, new BRCA1-RF and BRCA2-RF meta-predictors showed AUC values of 0.92 (95%CI:0.88–0.96) and 0.90 (95%CI:0.84–0.95), respectively. Similarly, the BRCA1-NVM and BRCA2-NVM models had AUCs of 0.93 and 0.90. The RF and NVM models were used to predict the pathogenicity of all possible missense variants in BRCA1 and BRCA2.

Conclusion:

The recalibrated algorithms and new meta-predictors significantly improved upon current models for predicting the impact of variants in cancer risk-associated domains of BRCA1 and BRCA2. Prediction of the functional impact of all possible variants in BRCA1 and BRCA2 provides important information about the clinical relevance of variants in these genes.

Keywords: BRCA1 and BRCA2, VUS, functional evaluation, in silico prediction, meta-predictor

INTRODUCTION

Pathogenic variants in BRCA1 and BRCA2 account for 20–25% of hereditary breast and ovarian cancer [1], 5–10% of breast cancers [2], and up to 15% of ovarian cancers [3]. While most known pathogenic variants in these genes truncate the encoded proteins, missense variants can also predispose to cancer. More than 90% of missense variants in public databases [4] identified by clinical genetic testing are listed as variants of uncertain significance (VUS) [5]. Missense variants with definitive pathogenic or neutral status can inform clinical management, prevention, and treatment. Thus, accurate methods to establish variant pathogenicity are needed.

Family-based studies that yield likelihoods of pathogenicity from segregation of variants with cancer and from personal and family history of cancer are established methods for determining the pathogenicity of variants in BRCA1 and BRCA2. However, few missense variants have been clinically annotated by this method owing to the limited availability of family-based data. Similarly, functional assays [6,7] with established specificity and sensitivity for known pathogenic and neutral BRCA1 or BRCA2 variants have been used alone or in combination with family-based segregation data to infer pathogenicity [8]. However, classification of all possible variants by functional assays is unlikely. Alternatively, the clinical relevance of variants can be assessed using sequence-based in silico prediction models, which can be applied to all possible missense VUS in these genes. Given the large number of unique VUS identified in BRCA1 and BRCA2, in silico predictions will need to be incorporated into models that aim to predict the pathogenicity of VUS in these genes. Most commonly used prediction tools, such as SIFT [9], PolyPhen [10], GERP [11], Align-GVGD [12], and CADD [13], have been developed using large-scale databases such as the Human Gene Mutation Database (HGMD) [14] or ClinVar [4]. While functional assays can outperform these computational predictions of damage [6,15], development and/or calibration of in silico prediction models using well-characterized functional data from validated assays is expected to improve variant annotation.

In this study, HDR [6] functional data for 207 BRCA2 variants and transcriptional integrity [16,17] data for 248 BRCA1 variants were used to evaluate the performance of existing in silico algorithms. Sensitivity and specificity of the algorithms were optimized by defining more accurate thresholds and by developing new high-performance Random Forest (RF) and naïve voting method (NVM) predictors. We show that optimization for one gene leads to poor performance when applied to the other, highlighting the importance of gene-specific features for prediction accuracy.

MATERIALS AND METHODS

BRCA1 transcription integrity assay

Results from functional studies of variants in the BRCT domains of BRCA1 using a transcription integrity assay have been reported previously [16,17]. The sensitivity and specificity of this assay for missense variants in the BRCT domains of BRCA1 have been estimated at 100% (Sensitivity, 95%CI: 75%–100%; Specificity, 95%CI: 83%–100%) [16]. The 95% probability of pathogenicity and neutrality from the VarCall two-component mixture model for classification of BRCA1 missense variants [8] was used to define 61 pathogenic, 21 indeterminate (partial effect on function), and 166 neutral variants (total of 248). These data were used to define BRCA1 activity.

BRCA2 HDR assay

A cell-based homology-directed DNA repair (HDR) activity assay was used to assess the influence of missense variants in the DNA binding domain of BRCA2 on protein activity [6]. In brief, BRCA2 activity in brca2-deficient V-C8 cells expressing mutant forms of full-length BRCA2 was measured with a DR-GFP reporter plasmid after induction of a DNA double-strand break using the I-SceI enzyme. The V-C8 hamster lung fibroblast cell line was a gift from Dr. Margaret Zdzienicka. Cells were verified by genotyping in the Mayo Clinic Medical Research Facility and routinely tested for mycoplasma contamination. The sensitivity and specificity of this assay for damaging missense variants in the DNA binding domain (DBD) of BRCA2 have previously been estimated at 100% (Sensitivity, 95%CI: 79%–100%; Specificity, 95%CI: 93%–100%) using 21 known neutral and 13 known pathogenic variants [6,18–20]. Results from 68 variants were combined with results from 139 previously characterized variants for a total of 207.

Damaging missense prediction tools

dbNSFP version 3.0a [21] was downloaded and converted into a BioR catalogue [22] to annotate variants. Align-GVGD [12] was accessed online. CAROL and CONDEL scores were gathered from Variant Effect Predictor (VEP) [23].

Optimized thresholds

Analyses included damaging, indeterminate, and neutral variants. Indeterminate variants were included in the neutral category (Scenario 1). An alternative approach, in which indeterminate variants are included in the pathogenic category (Scenario 2), is provided in Supplemental Materials. Optimal thresholds for individual predictive algorithms that maximized sensitivity and specificity for damaging variants were derived using results from the BRCA1 transcriptional integrity assay and BRCA2 HDR assay, individually (Figures S1 and S2). Matthews correlation coefficients (MCC) were calculated for each resulting binary classification relative to the functional assay standards [24]. The areas under the curve (AUCs) were estimated and reported with 95% confidence intervals using the DeLong method. Receiver operating characteristic (ROC) analyses were performed using the package optimalCutpoints [25] for R software (v3.3.3; http://www.R-project.org).
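The threshold search described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: it assumes that "maximized sensitivity and specificity" corresponds to maximizing their sum (the Youden index), which is one of several criteria a cut-point package can apply, and it computes the AUC by direct pairwise comparison rather than with DeLong confidence intervals. The function names and the toy scores are hypothetical.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called damaging."""
    tp = fp = tn = fn = 0
    for score, is_damaging in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_damaging:
            tp += 1
        elif predicted and not is_damaging:
            fp += 1
        elif not predicted and is_damaging:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumption: every observed score is tried as a candidate cut point and
    the lowest threshold wins when several give the same J.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def auc(scores, labels):
    """AUC as the probability that a damaging variant outscores a neutral one."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores; 1 = damaging in the functional assay.
scores = [0.95, 0.90, 0.85, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(youden_threshold(scores, labels), auc(scores, labels))
```

Both labels must be present in the data, otherwise sensitivity or specificity is undefined.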

Naïve Voting Method (NVM) models.

For each gene, a training set (a random sample of ~50% of the variants for each gene) and a test set (the remaining ~50%) were constructed using the sample function in R. The training set was used to determine the optimal number of individual prediction algorithms in the NVM model based on the maximal MCC. Starting with the individual prediction algorithm with the highest MCC, the prediction algorithm with the highest individual MCC among the models not previously chosen was added iteratively until the optimal numbers of prediction algorithms were included. If both a raw score (Score) and rank score (RankScore) for an algorithm were available, then only the RankScore was utilized. The NVM models and thresholds developed in the training sets were validated in the test sets. The MCC and other performance statistics were also re-calculated across the entire data sets (training and test combined) to be consistent with reporting of other models. Lollipop plots were generated with lollipops (v1.2, http://dx.doi.org/10.5281/zenodo.46184).
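The forward-selection procedure above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the published implementation: each algorithm is assumed to be already dichotomized at its optimized threshold (1 = damaging call), the vote cut-off `k` is assumed to be chosen by exhaustive search on the training set, and the names `build_nvm`, `vote`, and the toy algorithms are hypothetical. Validation on the held-out test set is omitted.

```python
from math import sqrt


def mcc(preds, labels):
    """Matthews correlation coefficient for 0/1 predictions and 0/1 labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    tn = sum(1 for p, y in zip(preds, labels) if not p and not y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def vote(calls, names, k):
    """Call a variant damaging (1) when at least k of the named algorithms do."""
    n_variants = len(calls[names[0]])
    return [int(sum(calls[n][i] for n in names) >= k) for i in range(n_variants)]


def build_nvm(calls, labels):
    """Add algorithms in order of individual MCC; keep the best (size, k) pair."""
    ranked = sorted(calls, key=lambda n: mcc(calls[n], labels), reverse=True)
    best_score, best_names, best_k = -2.0, None, None
    for size in range(1, len(ranked) + 1):
        names = ranked[:size]
        for k in range(1, size + 1):
            score = mcc(vote(calls, names, k), labels)
            if score > best_score:
                best_score, best_names, best_k = score, names, k
    return best_score, best_names, best_k


# Hypothetical training data: three weak algorithms that err on different variants.
labels = [1, 1, 1, 0, 0, 0]
calls = {
    "algoA": [1, 1, 0, 0, 0, 1],
    "algoB": [1, 0, 1, 0, 1, 0],
    "algoC": [0, 1, 1, 1, 0, 0],
}
score, names, k = build_nvm(calls, labels)
print(names, k, round(score, 3))
```

In this toy case each algorithm alone has a low MCC, while requiring two of three votes classifies every variant correctly, which is the behaviour the voting approach is meant to exploit.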

Random Forest models.

Random forest (RF) modelling utilized scores from each of the optimized individual prediction algorithms to identify the subset of prediction algorithms that maximized the accuracy of predicting damaging and non-damaging (indeterminate and neutral) variants in BRCA1 and BRCA2. The randomForest R package [26] was used with settings of n=500 trees and the number of predictor variables sampled as candidates at each split set to the recommended default of sqrt(p), where p is the number of predictor variables included in the model. For individual prediction algorithms available as both a Score and RankScore, only the RankScore was included in the random forest models. Variable importance was assessed using the mean decrease in accuracy resulting from exclusion of a given prediction model from the RF classifiers. Out-of-sample predictions on the probability scale were derived for each model and used to estimate AUC, sensitivity, specificity, and MCC at optimized cut points for prediction of functional status.

Comparison to ClinVar

BRCA1 and BRCA2 classifications from ClinVar that were reviewed by an expert panel and had no conflicting interpretations were used. Pathogenic and likely pathogenic variants in ClinVar were grouped into the pathogenic (damaging) category, and variants annotated as benign or likely benign were grouped into the neutral category.
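The grouping rule above amounts to a small lookup, sketched here in Python. The function name and the exact label strings are assumptions for illustration; real ClinVar exports may spell or combine these fields differently and would need to be normalized first.

```python
# Assumed clinical-significance labels mapped to the two categories used here.
GROUPS = {
    "pathogenic": "damaging",
    "likely pathogenic": "damaging",
    "benign": "neutral",
    "likely benign": "neutral",
}


def clinvar_group(significance, review_status):
    """Return 'damaging', 'neutral', or None if the record is not usable."""
    # Only expert-panel-reviewed records are kept (assumed status string).
    if review_status.strip().lower() != "reviewed by expert panel":
        return None
    # Conflicting or uncertain interpretations are absent from GROUPS,
    # so they fall through to None.
    return GROUPS.get(significance.strip().lower())


print(clinvar_group("Likely pathogenic", "reviewed by expert panel"))
```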

Code availability

All code and data required to replicate all analyses are available on GitHub (https://github.com/Steven-N-Hart/NVM).

RESULTS

Functional characterization of 68 novel BRCA2 missense variants

In this study, 68 BRCA2 variants from the BRCA2 DBD were evaluated using the HDR assay. Of these, 17 showed HDR fold change <1.66, with probabilities of pathogenicity >0.99 (Table 1, Figure 1, Table S1), and 48 variants showed HDR fold change >2.41 and probabilities of neutrality >0.99. Another three variants (p.I2672T, p.D2733V and p.P3150L) displayed partial activity (HDR fold change >1.66 and <2.41) and were annotated as indeterminate variants (Figure 1, Table S1). When combined with previously classified variants [6,27], 69 were predicted deleterious (damaging), 21 were intermediate/partial (indeterminate), and 117 were predicted benign/neutral (neutral) (Table 1, Table S1).
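As a simple illustration of the fold-change cut-offs quoted above, the three-way call can be written as a small Python function. This is a sketch only: in the study the calls rest on model-based probabilities of pathogenicity and neutrality, and the handling of values exactly at 1.66 or 2.41 (treated here as indeterminate) is an assumption. The function name is hypothetical.

```python
DAMAGING_BELOW = 1.66  # fold change below this: probability of pathogenicity > 0.99
NEUTRAL_ABOVE = 2.41   # fold change above this: probability of neutrality > 0.99


def classify_hdr(fold_change):
    """Map an HDR fold change to damaging / indeterminate / neutral."""
    if fold_change < DAMAGING_BELOW:
        return "damaging"
    if fold_change > NEUTRAL_ABOVE:
        return "neutral"
    return "indeterminate"


# 0.68 and 1.63 are the Table 1 fold changes for G2748D and N2622S;
# 2.0 and 3.5 are hypothetical values for illustration.
for fc in (0.68, 1.63, 2.0, 3.5):
    print(fc, classify_hdr(fc))
```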

Table 1.

Predicted pathogenic missense variants defined by the BRCA2 HDR assay

Variant | cDNA | AGVGD class | IARC class | FC* ± SE# | p (pathogenicity) | p (neutrality) | Origin
G2748D | c.8243G>A | C65 | Class 5 | 0.68 ± 0.07 | 1.00 | 2.96E-12 | Lindor et al., 2012
L2686P | c.8057T>C | C45 | | 0.72 ± 0.04 | 1.00 | 9.50E-12 | Guidugli et al., 2017
L2653P | c.7958T>C | C65 | Class 5 | 0.75 ± 0.08 | 1.00 | 3.40E-11 | Lindor et al., 2012
E2663K | c.7987G>A | C55 | | 0.83 ± 0.01 | 1.00 | 4.00E-10 | Current study
R3052W | c.9154C>T | C65 | Class 5 | 0.86 ± 0.09 | 1.00 | 9.33E-10 | Lindor et al., 2012
L2721H | c.8162T>A | C25 | | 0.88 ± 0.10 | 1.00 | 1.70E-09 | Guidugli et al., 2017
Y2624D | c.7870T>G | C65 | | 0.88 ± 0.06 | 1.00 | 1.75E-09 | Current study
Y2624H | c.7870T>C | C65 | | 0.89 ± 0.31 | 1.00 | 2.40E-09 | Current study
L3125R | c.9374T>G | C65 | | 0.91 ± 0.00 | 1.00 | 3.41E-09 | Current study
R2784W | c.8350C>T | C65 | | 0.91 ± 0.10 | 1.00 | 3.88E-09 | Guidugli et al., 2017
A2603P | c.7807G>C | C25 | | 0.92 ± 0.60 | 1.00 | 4.78E-09 | Guidugli et al., 2017
N3124I | c.9371A>T | C65 | Class 4 | 0.92 ± 0.10 | 1.00 | 4.65E-09 | Guidugli et al., 2013
L2647P | c.7940T>C | C65 | Class 4 | 0.93 ± 0.10 | 1.00 | 5.79E-09 | Lindor et al., 2012
S2670L | c.8009C>T | C15 | | 0.93 ± 0.07 | 1.00 | 6.20E-09 | Guidugli et al., 2017
G3076E | c.9227G>A | C65 | | 0.95 ± 0.10 | 1.00 | 1.02E-08 | Guidugli et al., 2017
Y2624N | c.7870T>A | C65 | | 0.95 ± 0.00 | 1.00 | 1.20E-08 | Current study
L3125H | c.9374T>A | C65 | | 0.96 ± 0.07 | 1.00 | 1.36E-08 | Guidugli et al., 2017
L2510P | c.7529T>C | C65 | | 0.98 ± 0.11 | 1.00 | 2.61E-08 | Guidugli et al., 2017
K2630Q | c.7888A>C | C45 | | 0.98 ± 0.08 | 1.00 | 2.49E-08 | Guidugli et al., 2017
R2824G | c.8470A>G | C65 | | 1.00 ± 0.20 | 1.00 | 3.72E-08 | Current study
H2623R | c.7868A>G | C25 | | 1.00 ± 0.04 | 1.00 | 3.59E-08 | Guidugli et al., 2017
D2723H | c.8167G>C | C65 | Class 5 | 1.00 ± 0.01 | 1.00 | 3.81E-08 | Lindor et al., 2012
N2781I | c.8342A>T | C65 | | 1.00 ± 0.10 | 1.00 | 4.08E-08 | Current study
I2627F | c.7879A>T | C15 | Class 5 | 1.01 ± 0.08 | 1.00 | 4.83E-08 | Lindor et al., 2012
A2730P | c.8188G>C | C0 | | 1.01 ± 0.01 | 1.00 | 5.46E-08 | Current study
L2688P | c.8063T>C | C65 | Class 4 | 1.02 ± 0.08 | 1.00 | 5.80E-08 | Guidugli et al., 2013
G3076R | c.9227G>T | C65 | | 1.03 ± 0.08 | 1.00 | 8.66E-08 | Guidugli et al., 2017
G3076V | c.9226G>C | C65 | | 1.03 ± 0.11 | 1.00 | 8.08E-08 | Guidugli et al., 2017
D2723V | c.8168A>T | C65 | | 1.04 ± 0.08 | 1.00 | 1.01E-07 | Guidugli et al., 2017
W2788R | c.8362T>C | C25 | | 1.05 ± 0.08 | 1.00 | 1.37E-07 | Guidugli et al., 2017
D2723A | c.8168A>C | C65 | | 1.06 ± 0.11 | 1.00 | 1.49E-07 | Guidugli et al., 2017
W2788S | c.8363G>C | C35 | | 1.06 ± 0.08 | 1.00 | 1.44E-07 | Guidugli et al., 2017
N3124K | c.9372C>A | C65 | | 1.07 ± 0.06 | 1.00 | 1.89E-07 | Current study
G2609V | c.7826G>T | C65 | | 1.07 ± 0.08 | 1.00 | 2.24E-07 | Guidugli et al., 2017
F2642S | c.7925T>C | C45 | | 1.07 ± 0.08 | 1.00 | 2.02E-07 | Guidugli et al., 2017
D3095E | c.9285C>G | C35 | Class 4 | 1.07 ± 0.12 | 1.00 | 1.89E-07 | Guidugli et al., 2013
T2722R | c.8165C>G | C65 | Class 5 | 1.08 ± 0.08 | 1.00 | 2.78E-07 | Lindor et al., 2012
G2508R | c.7522G>C | C65 | | 1.09 ± 0.15 | 1.00 | 2.92E-07 | Current study
S2691F | c.8072C>T | C0 | | 1.10 ± 0.06 | 1.00 | 4.09E-07 | Guidugli et al., 2017
E3002K | c.9004G>A | C55 | | 1.10 ± 0.08 | 1.00 | 4.35E-07 | Guidugli et al., 2017
D2723G | c.8168A>G | C65 | Class 5 | 1.11 ± 0.12 | 1.00 | 5.19E-07 | Lindor et al., 2012
W2626R | c.7876T>C | C65 | | 1.12 ± 0.01 | 1.00 | 6.04E-07 | Current study
G2596E | c.7787G>A | C65 | | 1.12 ± 0.09 | 1.00 | 5.77E-07 | Guidugli et al., 2017
Q2561P | c.7682A>C | C15 | | 1.13 ± 0.06 | 1.00 | 7.81E-07 | Guidugli et al., 2017
V2687F | c.8059G>T | C0 | | 1.14 ± 0.02 | 1.00 | 8.96E-07 | Current study
H2623Y | c.7867C>T | C65 | | 1.15 ± 0.01 | 1.00 | 1.32E-06 | Current study
W2626C | c.7878G>C | C65 | Class 5 | 1.16 ± 0.13 | 1.00 | 1.56E-06 | Lindor et al., 2012
G2793R | c.8377G>A | C65 | | 1.18 ± 0.09 | 1.00 | 2.23E-06 | Guidugli et al., 2017
A3028P | c.9082G>C | C0 | | 1.18 ± 0.04 | 1.00 | 2.48E-06 | Current study
L2792P | c.8375T>C | C65 | | 1.19 ± 0.09 | 1.00 | 3.03E-06 | Guidugli et al., 2017
G2793E | c.8378G>A | C65 | | 1.19 ± 0.07 | 1.00 | 2.71E-06 | Guidugli et al., 2017
A2786P | c.8356G>C | C0 | | 1.23 ± 0.09 | 1.00 | 5.97E-06 | Guidugli et al., 2017
R2784Q | c.8351G>A | C35 | | 1.27 ± 0.14 | 1.00 | 1.46E-05 | Guidugli et al., 2017
G2596R | c.7786G>C | C65 | | 1.28 ± 0.10 | 1.00 | 1.73E-05 | Guidugli et al., 2017
G2585R | c.7753G>A | C65 | | 1.30 ± 0.10 | 1.00 | 2.47E-05 | Guidugli et al., 2017
G3003E | c.9008G>A | C65 | | 1.33 ± 0.08 | 1.00 | 4.59E-05 | Guidugli et al., 2017
G2609D | c.7826G>A | C65 | Class 4 | 1.35 ± 0.07 | 1.00 | 6.19E-05 | Guidugli et al., 2013
W2725L | c.8174G>T | C55 | | 1.35 ± 0.10 | 1.00 | 6.73E-05 | Guidugli et al., 2017
K2498E | c.7492A>G | C55 | | 1.36 ± 0.10 | 1.00 | 7.98E-05 | Guidugli et al., 2017
Q2655R | c.7964A>G | C35 | | 1.38 ± 0.09 | 1.00 | 1.13E-04 | Guidugli et al., 2017
Y2726C | c.8177A>G | C65 | | 1.49 ± 0.16 | 1.00 | 7.86E-04 | Guidugli et al., 2017
Q2925K | c.8773C>A | C45 | | 1.51 ± 0.12 | 1.00 | 1.07E-03 | Guidugli et al., 2017
D3073G | c.9218A>G | C65 | | 1.52 ± 0.12 | 1.00 | 1.17E-03 | Guidugli et al., 2017
R2659G | c.7975A>G | C65 | | 1.57 ± 0.12 | 1.00 | 2.57E-03 | Guidugli et al., 2017
Y2624C | c.7871A>G | C65 | | 1.57 ± 0.05 | 1.00 | 2.71E-03 | Current study
R2842P | c.8525G>C | C65 | | 1.59 ± 0.05 | 1.00 | 3.40E-03 | Current study
D2611G | c.7832A>G | C65 | | 1.61 ± 0.12 | 0.99 | 5.09E-03 | Guidugli et al., 2017
Y2660D | c.7978T>G | C65 | | 1.61 ± 0.12 | 0.99 | 5.03E-03 | Guidugli et al., 2017
N2622S | c.7865A>G | C45 | | 1.63 ± 0.13 | 0.99 | 6.76E-03 | Current study

* FC: fold change in GFP-positive cells in HDR assay
# SE: standard error

Figure 1. HDR activity of 207 BRCA2 missense variants.

The model-based HDR fold change with standard error (SE) is displayed on a logarithmic scale. The SE is included as a measure of the reproducibility of the HDR assay for each variant. Solid lines represent 99% probability of pathogenicity and 99% probability of neutrality (fold increase in GFP (+) cells < 1.66 for damaging and fold increase in GFP (+) cells > 2.41 for neutral). Dotted lines separate variants classified as deleterious, indeterminate, and neutral.

Computational Predictions

Sensitivity and specificity of 40 computational prediction models with previously established cut points for damaging variants were determined using the functional assay data for BRCA1 and BRCA2 missense variants (Table S2, Table S3). Depending on the gene, these default thresholds yielded either high sensitivity with low specificity (e.g. BRCA2 SIFT Score: sensitivity 100%, specificity <20%) or low sensitivity with high specificity (e.g. BRCA1 PROVEAN Score: sensitivity <0.02%, specificity 100%) (Table S3).

To optimize the predictive ability of each model, thresholds that maximized sensitivity and specificity for damaging variants were defined separately for BRCA1 and BRCA2. For the purposes of predicting damaging, clinically relevant variants, models were generated by combining indeterminate with neutral variants (Scenario 1). Performance characteristics and AUC values for optimized individual prediction models for BRCA1 and BRCA2 are shown in Table 2 and Figure S2. The best performing individual models for BRCA1 incorporated conservation measures including deep interspecies protein alignments and physicochemical changes in amino acids (MetaSVM Score and RankScore [21]; PERCH and PERCH_noMAF [28]; Align-GVGD [12]; Polyphen2Hvar Score and RankScore [10]; and VEST3 Score and RankScore [29]). These models yielded AUCs >0.87, sensitivity and specificity >80%, and MCCs up to 0.68 (VEST3 Score) (Table 2, Table S4). These results represented a major improvement in performance over results based on default thresholds (mean MCC = 0.29) (Table S3). The best performing models for BRCA2 were PERCH and PERCH_noMAF; MetaLR RankScore and Score; MetaSVM RankScore and Score; and VEST3 RankScore and Score. These yielded AUCs of 0.83–0.89, sensitivity and specificity >78% (85% for PERCH), and MCCs >0.53 (Table 2, Table S4), which were substantially improved over models using default parameters (MCC <0.42) (Table S3).

Table 2.

Performance of in silico prediction models with optimized thresholds for classification of BRCA1 and BRCA2 missense variants

Gene | Model | Optimal threshold | AUC (95%CI) | FN / FP / TP / TN | MCC
BRCA1 | NVM-Validation | ≥9 | 0.94 (0.897–0.983) | 5 / 8 / 25 / 79 | 0.719
BRCA1 | Vest3RankScore | ≥0.85546 | 0.9 (0.849–0.95) | 8 / 24 / 51 / 153 | 0.678
BRCA1 | Vest3Score | ≥0.868 | 0.9 (0.849–0.95) | 8 / 24 / 51 / 153 | 0.678
BRCA1 | RF | ≥0.298 | 0.92 (0.879–0.96) | 8 / 26 / 51 / 151 | 0.663
BRCA1 | AlignGVGDPrior | ≥0.29 | 0.88 (0.829–0.931) | 7 / 37 / 54 / 150 | 0.614
BRCA1 | PERCHnoMAF | ≥0.206316 | 0.87 (0.814–0.924) | 10 / 31 / 51 / 156 | 0.614
BRCA1 | PERCH | ≥0.239853 | 0.87 (0.819–0.922) | 10 / 32 / 51 / 155 | 0.607
BRCA1 | Polyphen2HvarRankScore | ≥0.91584 | 0.89 (0.845–0.93) | 11 / 30 / 48 / 147 | 0.593
BRCA1 | Polyphen2HvarScore | ≥0.999 | 0.89 (0.845–0.93) | 11 / 30 / 48 / 147 | 0.593
BRCA1 | MetaSVMRankScore | ≥0.9083 | 0.89 (0.844–0.928) | 11 / 34 / 48 / 143 | 0.565

BRCA2 | NVM-Validation | ≥4 | 0.89 (0.826–0.963) | 6 / 9 / 29 / 59 | 0.683
BRCA2 | PERCH | ≥0.295957 | 0.89 (0.847–0.939) | 11 / 21 / 60 / 115 | 0.672
BRCA2 | PERCHnoMAF | ≥0.272149 | 0.88 (0.832–0.929) | 12 / 23 / 59 / 113 | 0.642
BRCA2 | RFModel | ≥0.371 | 0.9 (0.843–0.947) | 12 / 24 / 59 / 111 | 0.633
BRCA2 | MetaSVMRankScore | ≥0.93181 | 0.87 (0.824–0.923) | 15 / 29 / 56 / 107 | 0.555
BRCA2 | MetaSVMScore | ≥0.7002 | 0.87 (0.824–0.923) | 15 / 29 / 56 / 107 | 0.555
BRCA2 | MetaLRRankScore | ≥0.92107 | 0.87 (0.823–0.922) | 16 / 30 / 55 / 106 | 0.535
BRCA2 | MetaLRScore | ≥0.7679 | 0.87 (0.823–0.922) | 16 / 30 / 55 / 106 | 0.535
BRCA2 | Vest3RankScore | ≥0.79963 | 0.83 (0.776–0.893) | 16 / 30 / 55 / 106 | 0.535
BRCA2 | Vest3Score | ≥0.811 | 0.83 (0.776–0.893) | 16 / 30 / 55 / 106 | 0.535

FN: false negative; FP: false positive; TP: true positive; TN: true negative

AUC: area under the curve from receiver operating characteristic analysis

MCC: Matthews correlation coefficient
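The MCC column of Table 2 can be recomputed from the FN / FP / TP / TN counts with the standard MCC formula. The short Python sketch below does this for two rows of the table; the function name is hypothetical, and the argument order follows the column order used in the table.

```python
from math import sqrt


def table2_metrics(fn, fp, tp, tn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Counts copied from Table 2 (FN / FP / TP / TN).
rows = {
    "BRCA1 NVM-Validation": (5, 8, 25, 79),
    "BRCA2 PERCH": (11, 21, 60, 115),
}
for name, counts in rows.items():
    sens, spec, mcc = table2_metrics(*counts)
    print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

Rounded to three decimals, the two MCC values reproduce the tabulated 0.719 and 0.672.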

To assess whether meta-predictor models improved prediction of the damaging variants for each gene, two new models were developed for both BRCA1 and BRCA2: (1) Random Forest (RF) classifiers of prediction methods were derived from the continuous outputs from the functional data (BRCA1-RF and BRCA2-RF); (2) naïve voting methods (NVM) were applied to optimized thresholds for each prediction model (BRCA1-NVM and BRCA2-NVM). CAROL [30] and CONDEL [31] predictors were not included in development of new BRCA2 models because prediction scores for 29 of 207 (14.0%) variants were not available. Only 12 of 248 (4.8%) BRCA1 and 1 of 207 (0.5%) BRCA2 variants were excluded from new model development due to missing data or conflicts between protein and DNA sequences (Table S2).

RF-Models

Random Forest (RF) classifiers were used to evaluate the impact of excluding individual prediction methods on the accuracy of composite prediction models. VEST3 RankScore and Align-GVGD had the greatest impact on the accuracy of BRCA1-RF, whereas Mutation Assessor RankScore and PERCH had the greatest impact on BRCA2-RF. The BRCA1-RF model (threshold ≥0.298) (Table S4) showed the second highest AUC value of all models for BRCA1 (0.92, 95%CI:0.88–0.96), with 86% sensitivity and 85% specificity. The BRCA1-RF model predicted 8 of 59 (13.6%) functionally impaired BRCA1 variants as neutral (false negatives), 12 of 21 (57.1%) functionally indeterminate variants as damaging, and 14 of 156 (9.0%) functionally intact neutral variants as damaging (false positives) (Table S2). Similarly, the BRCA2-RF model (threshold ≥0.371) (Table S4) had the highest AUC for BRCA2 (0.90, 95%CI:0.84–0.95) (Table 2, Table S4) with 83% sensitivity and 82% specificity (Table S4, Figure 2).

Figure 2. Matthews Correlation Coefficients (MCC) for 42 in silico predictors with optimized thresholds for damaging versus indeterminate/neutral variants in BRCA1 and BRCA2.

Higher values indicate increased classifier performance.

NVM-Models

NVM models based on the optimal number of individual prediction algorithms for BRCA1 and BRCA2 variants were also developed. The optimal NVM for BRCA1, following training and validation (BRCA1-NVM Combined) contained 13 prediction models (Table S5). BRCA1 variants are predicted damaging when ≥9 of the 13 models exceed their individual thresholds for damaging variants (Table S5). BRCA1-NVM yielded an AUC of 0.94 with sensitivity of 83% and specificity of 91%. The highest proportion of BRCA1 misclassifications involved variants with indeterminate function, with 9 of 21 (42.9%) annotated as damaging. In contrast, the optimal BRCA2-NVM (BRCA2-NVM Combined) model after training and validation incorporated six prediction models with a threshold of ≥4 models predicting damaging variants (Table S5). This model yielded sensitivity of 82% and specificity of 87% (Table 2, Table S4, Table S5), with 14 functionally damaging variants predicted as neutral, and 18 indeterminate/neutral variants predicted as damaging. As with BRCA1, the false positive results were disproportionately enriched for indeterminate function with 6 of 21 (28.6%) misclassified. Overall, the predictive abilities of the RF and NVM models showed substantial improvement over individual in silico prediction methods using default parameters, and modest improvements over the best performing individual in silico methods optimized at thresholds specific to BRCA1 and BRCA2.

Application of selected models to all possible missense variants in BRCA1 and BRCA2

The RF and NVM models were used to assess the damaging potential of all theoretically possible missense substitutions resulting from single nucleotide changes in BRCA1 and BRCA2, contingent on availability of prediction scores from all the individual methods contributing to each model (Table S2). Because a subset of the contributing prediction algorithms are in part based on nucleotide substitution rates, several missense variants caused by different nucleotide changes may have more than one predicted RF or NVM score. Using BRCA1-NVM, 7.1% of BRCA1 variants were predicted as damaging. Similarly, 2.6% of BRCA2 variants were predicted as damaging using BRCA2-NVM. However, marked enrichment for NVM-predicted damaging variants was observed in known functional domains (Figure 3). Analysis of the BRCA1 RING domain predicted that 30–40% of all missense changes disrupt protein function. Similarly, 46% of all possible missense variants in the C-terminal BRCT domains and >20% in the larger C-terminal region (residues 1660–1810) were predicted damaging (Table S2). Interestingly, ~10% of all possible variants between amino acids 300 and 550, which have been associated with TP53 [32], RAD50 [33], and c-MYC [32] interactions, were predicted damaging (Table S2). For BRCA2, only the region from residues 2574 to 2771 that contains the helical and OB1 domains of the DNA binding domain was predicted to have >20% damaging variants, although 10% of variants in OB3 were also predicted damaging (Figure 3, Table S2). Few damaging missense variants were predicted in the OB2 domain. Similar results were obtained using the RF model (Table S2). Damaging mutations were not predicted in the N-terminus of BRCA2, containing the PALB2 interaction domain [34], possibly because of the small size of the interaction site.
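To make concrete what "all theoretically possible missense substitutions resulting from single nucleotide changes" means, and why one amino acid change can arise from more than one nucleotide change, the following Python sketch enumerates the missense outcomes for a single codon using the standard genetic code. It is an illustration only; the function name is hypothetical and the study's own enumeration over the BRCA1 and BRCA2 coding sequences is not reproduced here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def missense_changes(codon):
    """Map each missense amino acid to the single-base mutant codons giving it."""
    ref = CODON_TABLE[codon]
    changes = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            alt = CODON_TABLE[mutant]
            # Skip synonymous changes and stop codons: neither is missense.
            if alt != ref and alt != "*":
                changes.setdefault(alt, []).append(mutant)
    return changes


# A tryptophan codon: 9 single-base changes, 7 of them missense.
print(missense_changes("TGG"))
```

For TGG, arginine is reached through two different codons (CGG and AGG), so predictors that use nucleotide-level information can assign two scores to the same Trp-to-Arg substitution.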

Figure 3. Estimates of the proportion of damaging missense variants by position in each gene.

The AAPOS x-axis represents the amino acid position, and the y-axis is the probability of a missense mutation being damaging from the NVM model. The lines were smoothed using a 50 amino acid sliding window.
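The smoothing step in the caption can be sketched as a moving average over residue positions. The caption does not state how the window is aligned or how the sequence ends are handled, so the centered window and the truncation at the ends in this Python sketch are assumptions, as is the function name.

```python
def sliding_window_mean(values, window=50):
    """Smooth per-residue values with a roughly centered moving average.

    Assumption: the window is shortened at the sequence ends instead of padded.
    """
    smoothed = []
    for i in range(len(values)):
        start = i - window // 2
        stop = start + window
        chunk = values[max(0, start):min(len(values), stop)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-residue proportions of damaging predictions, tiny window.
print(sliding_window_mean([0, 1, 1, 0], window=2))
```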

DISCUSSION

Specific measures of BRCA1 and BRCA2 functional activity have been established as reliable measures of the functional impact and the likelihood of pathogenicity of variants in certain domains of BRCA1 and BRCA2 [6,16]. However, in the absence of functional studies of individual variants, in silico models that incorporate functional or structural data are often considered useful predictors of function. Here, existing models for prediction of damaging missense variants were recalibrated based on BRCA1 and BRCA2 functional data and were combined in meta-predictor classifiers (NVM and RF). These meta-predictors combined the strengths and offset the weaknesses of the individual models, and improved upon many of them for predicting the functional implications of missense variants in the cancer risk-associated domains of BRCA1 and BRCA2. We subsequently used these highly sensitive and specific models to annotate all missense variants from the BRCA1 and BRCA2 genes as damaging or neutral. Importantly, because the BRCA1 transcriptional integrity assay and the BRCA2 HDR assay used for calibration of the various prediction models have 100% sensitivity and specificity for clinically pathogenic variants in the BRCA1 BRCT domains and the BRCA2 DNA binding domain, respectively, the models may also predict the clinical pathogenicity of missense variants in these domains. Whether prediction of functional effects in other parts of these proteins also reflects pathogenicity remains to be determined using additional pathogenic and neutral standards. Overall, these prediction models are likely to alter the interpretation of many VUS in BRCA1 and BRCA2, leading to improved clinical genetic testing, and perhaps improved risk management of patients found to carry VUS.

The current American College of Medical Genetics guidelines for variant classification recommend that in silico evidence be counted as supporting evidence for pathogenicity (or lack thereof) if all of the in silico programs tested agree on the prediction, whereas in silico evidence should not be used for classification if in silico predictions disagree. However, the guidelines do not recommend specific in silico methods, or indicate the number of methods that should be evaluated [35]. This differs from the NVM model in two key areas. First, default thresholds of predictive models are not appropriate for BRCA1 and BRCA2 because the specificity is very low. The new thresholds for predictive models derived here should provide more accurate predictions of functional impact and therefore pathogenicity. Second, while using an ensemble of models is a rational strategy, requiring all models to be in agreement is overly stringent and results in decreased performance (Figure S3 and Figure S4). Rather, the number of in silico models, the choice of specific models, and the thresholds for those models that are required for an accurate consensus with both high sensitivity and specificity can vary by gene.

Effect of grouping indeterminate variants as either damaging or neutral

Generally, the performance of individual in silico prediction models, as well as of the RF and NVM models, was similar when indeterminate variants were grouped with either damaging or neutral variants. However, the performance of some of the known prediction methods was highly sensitive to indeterminate variant classification. Interestingly, the prediction methods that had the greatest difference in thresholds, depending on the incorporation of the indeterminate variants in the damaging or neutral categories, also had higher AUCs (e.g. PERCH, NVM, RF, MetaSVM) compared to those with no change in threshold (e.g. PolyPhen2HDiv, PolyPhen2HVar, MutationTaster Score), suggesting that the former methods are better predictors of indeterminate impact on function. However, the clinical relevance of the indeterminate variants in BRCA1 and BRCA2 is not well understood. Further understanding of function, pathogenicity, cancer risk, and associated refinement of thresholds for damage and pathogenicity for each functional assay may allow recalibration of the prediction models and improved prediction of clinically relevant BRCA1 and BRCA2 variants in the future.

Extending missense prediction outside established functional domains

When cross-referencing the NVM predictions with well-annotated ClinVar classifications, the predictions clustered in well-known domains, with damaging missense variants mostly restricted to BRCT and RING domains of BRCA1 and the DNA binding domain of BRCA2 [12] (Figure S5). The BRCA1-NVM prediction model clearly delineated both regions, with as many as 40% of missense variants in the RING domain and 50% in parts of the BRCT regions annotated as damaging. Interestingly, enrichment between amino acids 400–500 was also observed, but no damaging variants in this region have been defined by functional studies and no pathogenic variants have yet been observed in the clinically tested population. According to the BRCA1-NVM model, the total proportion of all theoretically possible damaging variants in BRCA1 is ~8%, almost all of which are located in the known RING and BRCT domains. For BRCA2, family-based studies in combination with the Align-GVGD prediction method were previously used to estimate that 33% of missense variants in the BRCA2 DNA binding domain were damaging [36]. While based on small numbers of missense variants, this is consistent with predictions from the NVM and RF models for BRCA2, although the frequency based on the BRCA2-NVM and BRCA2-RF models is as high as 50% in specific regions. A notable drop in the estimated pathogenic potential was observed in the BRCA2 OB2 DNA binding domain. This was also observed when considering all pathogenic BRCA2 missense mutations listed in ClinVar.

However, it should be noted that when the NVM and RF models were applied to genes other than BRCA1 and BRCA2 (Table S6), or even when BRCA1-RF or BRCA1-NVM was applied to BRCA2 and vice versa, their performance was much lower, with MCCs <0.40, as shown for the BRCA2-NVM (Table S7). The sizable reduction in model accuracy suggested that recalibrated models are specific to the initial gene of interest and cannot be effectively extrapolated to other disease genes. Another potential explanation for this phenomenon is that not all missense variants in other genes may exert phenotypic effects through loss of activity. Because the BRCA1 and BRCA2 assays are limited to measurement of loss of function, perhaps more comprehensive assays to evaluate splicing alterations, gain-of-function mutations, and epigenetic influences on gene function are needed in order to extend the NVM and RF prediction models to other genes. Separately, disruption of functions other than transcriptional activation or homology-directed repair by missense variants could result in recalibration of the NVM and RF prediction models. Other influences on model performance may include AT versus GC content of coding sequences and codon usage, and the structural effects of observed variants. Finally, the differences could be due to evolutionary constraint, since some models such as Align-GVGD and PolyPhen2 perform well for the highly conserved BRCA1 but markedly less well for the less constrained BRCA2.

While the clinical implications of truncating mutations in the BRCA1 and BRCA2 breast cancer predisposition genes are clear, interpretation of missense variants is more challenging. Here we present an approach for predicting the functional impact and potentially the pathogenicity of missense BRCA1 and BRCA2 variants, based on functional evaluation of variants and in silico sequence-based analysis. The functional studies of BRCA2 variants, in combination with similar studies of BRCA1, now identify 130 variants in these genes that are damaging and likely pathogenic and may substantially increase risk of breast, ovarian, and other cancers. In contrast, public databases currently identify fewer than 40 such variants. In the absence of functional results, other methods for variant assessment are needed. Many in silico prediction methods exist for characterization of missense variants, but the interpretation of results from these methods and their accuracy in predicting whether variants in BRCA1 and BRCA2 are damaging or neutral are not well defined. Here we recalibrated established in silico prediction methods for missense variants using results from BRCA1 and BRCA2 functional assays and developed RF and NVM models that incorporate multiple in silico prediction methods. These classifiers outperformed the individual in silico models. Overall, this approach leverages measures of BRCA1 and BRCA2 functional activity to improve the classification of BRCA1 and BRCA2 VUS detected by clinical genetic testing and tumor sequencing.
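To make the idea of a voting meta-predictor concrete, the sketch below thresholds each predictor's score and calls a variant damaging when more than a chosen fraction of the available predictors vote damaging. This is an assumed simple-majority rule for illustration only. The predictor names, cutoffs, score directions, and the `min_fraction` parameter are hypothetical; they are not the recalibrated thresholds or the exact NVM rule developed in this study.

```python
# Hypothetical per-predictor rules: (cutoff, True if higher scores mean damaging).
EXAMPLE_RULES = {
    "predictor_a": (0.50, True),
    "predictor_b": (0.05, False),  # lower score means damaging for this predictor
    "predictor_c": (20.0, True),
}


def collect_votes(scores, rules):
    """Convert raw scores into per-predictor damaging (True) / neutral (False) votes."""
    votes = {}
    for name, (cutoff, higher_is_damaging) in rules.items():
        score = scores.get(name)
        if score is None:
            continue  # predictors without a score for this variant do not vote
        votes[name] = score >= cutoff if higher_is_damaging else score <= cutoff
    return votes


def naive_vote(scores, rules, min_fraction=0.5):
    """Return True (damaging), False (neutral), or None if no predictor voted."""
    votes = collect_votes(scores, rules)
    if not votes:
        return None
    return sum(votes.values()) / len(votes) > min_fraction


# Hypothetical variant: two of three predictors vote damaging.
example_call = naive_vote(
    {"predictor_a": 0.9, "predictor_b": 0.01, "predictor_c": 5.0}, EXAMPLE_RULES
)
```

Storing the score direction alongside each cutoff avoids a common mistake when combining predictors, since some tools assign low scores to damaging variants and others assign high scores.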

Supplementary Material

Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Figure S5
Supplementary Tables
Supplementary Table S2
Supplementary Table S6

ACKNOWLEDGEMENT

FUNDING

This work was supported by the Breast Cancer Research Foundation; National Institutes of Health grants [CA192393, CA176785, CA116167]; and a National Cancer Institute Specialized Program of Research Excellence (SPORE) in Breast Cancer to Mayo Clinic [P50 CA116201].

Footnotes

CONFLICT OF INTEREST

The authors have no relevant conflicts of interest.

REFERENCES

  • 1. Easton DF. How many more breast cancer predisposition genes are there? Breast Cancer Res 1999;1(1):14–17.
  • 2. Campeau PM, Foulkes WD, Tischkowitz MD. Hereditary breast cancer: new genetic developments, new therapeutic avenues. Hum Genet August 2008;124(1):31–42.
  • 3. Pal T, Permuth-Wey J, Betts JA, et al. BRCA1 and BRCA2 mutations account for a large proportion of ovarian carcinoma cases. Cancer December 15 2005;104(12):2807–2816.
  • 4. Landrum MJ, Lee JM, Benson M, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res January 04 2016;44(D1):D862–868.
  • 5. Eggington JM, Bowles KR, Moyes K, et al. A comprehensive laboratory-based program for classification of variants of uncertain significance in hereditary cancer genes. Clin Genet September 2014;86(3):229–237.
  • 6. Guidugli L, Pankratz VS, Singh N, et al. A classification model for BRCA2 DNA binding domain missense variants based on homology-directed repair activity. Cancer Res January 01 2013;73(1):265–275.
  • 7. Millot GA, Carvalho MA, Caputo SM, et al. A guide for functional analysis of BRCA1 variants of uncertain significance. Hum Mutat November 2012;33(11):1526–1537.
  • 8. Iversen ES Jr., Couch FJ, Goldgar DE, Tavtigian SV, Monteiro AN. A computational method to classify variants of uncertain significance using functional assay data with application to BRCA1. Cancer Epidemiol Biomarkers Prev June 2011;20(6):1078–1088.
  • 9. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res July 01 2003;31(13):3812–3814.
  • 10. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods April 2010;7(4):248–249.
  • 11. Cooper GM, Stone EA, Asimenos G, et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res July 2005;15(7):901–913.
  • 12. Tavtigian SV, Byrnes GB, Goldgar DE, Thomas A. Classification of rare missense substitutions, using risk surfaces, with genetic- and molecular-epidemiology applications. Hum Mutat November 2008;29(11):1342–1354.
  • 13. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet March 2014;46(3):310–315.
  • 14. Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN. The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics September 2012;Chapter 1:Unit1.13.
  • 15. Starita LM, Young DL, Islam M, et al. Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics June 2015;200(2):413–422.
  • 16. Woods NT, Baskin R, Golubeva V, et al. Functional assays provide a robust tool for the clinical annotation of genetic variants of uncertain significance. NPJ Genom Med 2016;1:16001.
  • 17. Lee MS, Green R, Marsillac SM, et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Cancer Res June 15 2010;70(12):4880–4890.
  • 18. Farrugia DJ, Agarwal MK, Pankratz VS, et al. Functional assays for classification of BRCA2 variants of uncertain significance. Cancer Res May 1 2008;68(9):3523–3531.
  • 19. Lindor NM, Guidugli L, Wang X, et al. A review of a multifactorial probability-based model for classification of BRCA1 and BRCA2 variants of uncertain significance (VUS). Hum Mutat January 2012;33(1):8–21.
  • 20. Goldgar DE, Easton DF, Deffenbaugh AM, et al. Integrated evaluation of DNA sequence variants of unknown clinical significance: application to BRCA1 and BRCA2. Am J Hum Genet October 2004;75(4):535–544.
  • 21. Liu X, Wu C, Li C, Boerwinkle E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum Mutat March 2016;37(3):235–241.
  • 22. Kocher JP, Quest DJ, Duffy P, et al. The Biological Reference Repository (BioR): a rapid and flexible system for genomics annotation. Bioinformatics July 1 2014;30(13):1920–1922.
  • 23. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics August 15 2010;26(16):2069–2070.
  • 24. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta October 20 1975;405(2):442–451.
  • 25. Lopez-Raton M, Cadarso-Suarez C, Rodriguez-Alvarez MX, Gude-Sampedro F. OptimalCutpoints: An R Package for Selecting Optimal Cutpoints in Diagnostic Tests. J Stat Softw October 2014;61(8):1–36.
  • 26. Liaw A, Wiener M. Classification and Regression by randomForest. R News 2002;2/3:18–22.
  • 27. Farrugia DJ, Agarwal MK, Pankratz VS, et al. Functional assays for classification of BRCA2 variants of uncertain significance. Cancer Res May 1 2008;68(9):3523–3531.
  • 28. Feng B, Goldgar D. An integrated framework for sequence variant prioritization. American Society of Human Genetics Annual Meeting 2014;Abstract 1370T.
  • 29. Carter H, Douville C, Stenson PD, Cooper DN, Karchin R. Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genomics 2013;14 Suppl 3:S3.
  • 30. Lopes MC, Joyce C, Ritchie GR, et al. A combined functional annotation score for non-synonymous variants. Hum Hered 2012;73(1):47–51.
  • 31. Gonzalez-Perez A, Lopez-Bigas N. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am J Hum Genet April 8 2011;88(4):440–449.
  • 32. Wang Q, Zhang H, Kajino K, Greene MI. BRCA1 binds c-Myc and inhibits its transcriptional and transforming activity in cells. Oncogene October 15 1998;17(15):1939–1948.
  • 33. Zhong Q, Chen CF, Li S, et al. Association of BRCA1 with the hRad50-hMre11-p95 complex and the DNA damage response. Science July 30 1999;285(5428):747–750.
  • 34. Xia B, Sheng Q, Nakanishi K, et al. Control of BRCA2 cellular and clinical functions by a nuclear partner, PALB2. Mol Cell June 23 2006;22(6):719–729.
  • 35. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med May 2015;17(5):405–424.
  • 36. Easton DF, Deffenbaugh AM, Pruss D, et al. A systematic genetic assessment of 1,433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer-predisposition genes. Am J Hum Genet November 2007;81(5):873–883.
