Table 2. Key Aspects of Methodology Used in This Study
Aspect of methodology used | Rationale (see text for details) |
---|---|
Large data sets used for both SNVs and indels | Provides confidence in the resulting criteria and helps minimize overfitting. Appropriate sizes determined by CI calculations (below). |
Both clinical and reference (GIAB) samples used | Greatly increases the size and diversity of the data sets, particularly the number of FPs. |
Same quality filtering thresholds as used in clinical practice | Confirmation criteria can depend on filtering criteria. Using lower filtering thresholds would add many FPs to the study but could result in the selection of ineffective or biased confirmation criteria. |
Separate filtering and confirmation thresholds used | Allows high sensitivity (by keeping variants of marginal quality) and high specificity (by subjecting these variants to confirmation). |
On- and off-target variant calls analyzed in GIAB samples | Further increases the data set size and diversity. Off-target calls were subject to the same quality filters as were on-target calls. |
Indels and SNVs analyzed separately | Indels and SNVs can have different quality determinants. An adequate population of each was required to achieve statistical significance. |
Partial matches considered FPs | Zygosity errors and incorrect diploid genotypes do occur and can be as clinically important as “pure” FPs. |
Algorithm selects criteria primarily by their ability to flag FPs | Other algorithms value the classification of TPs equally, which can result in biased criteria, particularly because TPs greatly outnumber FPs (a minimal sketch of this selection approach follows the table). |
Multiple quality metrics used to flag possible FPs | Various call-specific metrics (eg, quality scores, read depth, allele balance, strand bias) and genomic annotations (eg, repeats, segmental duplications) proved crucial. |
Key metric: fraction of FPs flagged (FP sensitivity) | Other metrics, including test PPV and overall classifier accuracy, can be uninformative or misleading in the evaluation of confirmation criteria (the sketch following the table includes a numeric illustration). |
Requirement of 100% FP sensitivity on training data | Clinically appropriate. Resulting criteria will be effective on any subset of the training data (clinical or GIAB, on-target or off-target, etc.). |
Statistical significance metric: CI on FP sensitivity | Rigorously indicates validity of the resulting criteria: eg, flagging 100% of 125 FPs demonstrates ≥97.6% FP sensitivity at P = 0.05 (a worked example follows the table). Smaller data sets (eg, 50 FPs) resulted in ineffective criteria. |
Separate training and test sets used (cross-validation) | In conjunction with large data sets, cross-validation is a crucial step to avoid overfitting, which can otherwise result in ineffective criteria. |
Prior confirmation of a variant was not used as a quality metric | Successful confirmations of a particular variant can indicate little about whether future calls of that same variant are true or false. |
All variants outside of GIAB-HC regions require confirmation | Outside of these regions, too few confirmatory data are available to demonstrate that the criteria are effective. |
Laboratory- and workflow-specific criteria | Effective confirmation criteria can vary based on numerous details of a test's methodology and its target genes. Changes can necessitate revalidation of confirmation criteria. |
FP, false positive; GIAB, Genome in a Bottle; GIAB-HC, regions in which high-confidence truth data are available from the GIAB specimens (unrelated to confidence in our own calls or to on/off-target regions); indel, insertion or deletion; PPV, positive predictive value; SNV, single-nucleotide variant; TP, true positive.
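
To make the selection step concrete (the algorithm and key-metric rows above), the following is a minimal sketch of FP-sensitivity-first threshold selection. It is an illustration under simplifying assumptions, not the study's actual code: the metric names, cutoff rule, and toy data are hypothetical. Per metric, it picks the most permissive cutoff that still flags 100% of the training FPs, and it also shows why overall accuracy is uninformative when TPs vastly outnumber FPs.

```python
# Hypothetical sketch of FP-sensitivity-first selection of confirmation
# criteria; metric names, cutoff rule, and data are illustrative only.

# Each training call: (quality_score, read_depth, is_false_positive).
# Truth labels would come from orthogonal data (Sanger results, GIAB).
TRAIN = [
    (520.0, 180, False), (610.0, 210, False), (455.0, 160, False),
    (980.0, 300, False), (310.0, 140, False), (700.0, 250, False),
    (130.0,  35, True),  ( 95.0,  60, True),  (240.0,  22, True),
]

def fp_safe_cutoff(calls, i):
    """Most permissive per-metric cutoff that still flags every training
    FP: calls at or below the cutoff are sent for confirmation."""
    return max(call[i] for call in calls if call[-1])

def needs_confirmation(call, cutoffs):
    """Flag a call if ANY metric is marginal; only calls that are clean
    on every metric are reported without orthogonal confirmation."""
    return any(call[i] <= cut for i, cut in cutoffs.items())

cutoffs = {i: fp_safe_cutoff(TRAIN, i) for i in (0, 1)}
fps = [c for c in TRAIN if c[-1]]
tps = [c for c in TRAIN if not c[-1]]

fp_sensitivity = sum(needs_confirmation(c, cutoffs) for c in fps) / len(fps)
tp_flag_rate = sum(needs_confirmation(c, cutoffs) for c in tps) / len(tps)

print(f"cutoffs: {cutoffs}")                    # e.g. {0: 240.0, 1: 60}
print(f"FP sensitivity: {fp_sensitivity:.0%}")  # must be 100% on training
print(f"TPs sent to confirmation anyway: {tp_flag_rate:.0%}")

# Why accuracy misleads (key-metric row): a rule that flags nothing is
# highly "accurate" when FPs are rare, yet it misses every FP.
n_tp, n_fp = 10_000, 9
print(f"flag-nothing accuracy: {n_tp / (n_tp + n_fp):.2%}; "
      "FP sensitivity: 0%")
```

A production implementation would search jointly over many metrics and annotations and then verify the resulting criteria on held-out folds (the cross-validation row); the essential point here is the asymmetry of optimizing for FP sensitivity first.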
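
The CI row's arithmetic follows from the standard exact binomial (Clopper-Pearson) bound: if all n training FPs are flagged, the one-sided lower 95% confidence bound on FP sensitivity is α^(1/n). A quick check, assuming that standard bound:

```python
# One-sided exact binomial lower bound on FP sensitivity when 100% of
# n training FPs were flagged: any hypothesized sensitivity p with
# p**n <= alpha is rejected, so sensitivity >= alpha**(1/n).
alpha = 0.05
for n_fps in (50, 125, 300):
    bound = alpha ** (1 / n_fps)
    print(f"{n_fps:>3} FPs, all flagged -> FP sensitivity >= {bound:.1%}")
```

With 125 FPs all flagged, the bound is ≥97.6%, matching the CI row; with only 50 FPs it falls to about 94%, consistent with the observation that smaller data sets yielded ineffective criteria.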