J Mol Diagn. 2019 Mar;21(2):318–329. doi: 10.1016/j.jmoldx.2018.10.009

Table 2. Key Aspects of Methodology Used in This Study

Aspect of methodology used | Rationale (see text for details)
Large data sets used for both SNVs and indels | Provides confidence in the resulting criteria and helps minimize overfitting. Appropriate sizes were determined by CI calculations (below).
Both clinical and reference (GIAB) samples used | Greatly increases the size and diversity of the data sets, particularly the number of FPs.
Same quality-filtering thresholds as used in clinical practice | Confirmation criteria can depend on filtering criteria. Using looser filters would add many FPs to the study but could result in the selection of ineffective or biased confirmation criteria.
Separate filtering and confirmation thresholds used | Allows high sensitivity (by keeping variants of marginal quality) and high specificity (by subjecting these variants to confirmation).
On- and off-target variant calls analyzed in GIAB samples | Further increases the data set size and diversity. Off-target calls were subject to the same quality filters as were on-target calls.
Indels and SNVs analyzed separately | Indels and SNVs can have different quality determinants. An adequate population of each was required to achieve statistical significance.
Partial matches considered FPs | Zygosity errors and incorrect diploid genotypes do occur and can be as clinically important as “pure” FPs.
Algorithm selects criteria primarily by their ability to flag FPs | Other algorithms equally value classification of TPs, which may result in biased criteria, particularly because TPs greatly outnumber FPs (a hypothetical sketch follows the table).
Multiple quality metrics used to flag possible FPs | Various call-specific metrics (eg, quality scores, read depth, allele balance, strand bias) and genomic annotations (eg, repeats, segmental duplications) proved crucial.
Key metric: fraction of FPs flagged (FP sensitivity) | Other metrics, including test PPV and overall classifier accuracy, can be uninformative or misleading in the evaluation of confirmation criteria (illustrated below the table).
Requirement of 100% FP sensitivity on training data | Clinically appropriate. The resulting criteria will be effective on any subset of the training data (clinical or GIAB, on-target or off-target, etc.).
Statistical significance metric: CI on FP sensitivity | Rigorously indicates the validity of the resulting criteria: eg, flagging 100% of 125 FPs demonstrates ≥98% FP sensitivity at P = 0.05 (see the CI sketch below the table). Smaller data sets (eg, 50 FPs) resulted in ineffective criteria.
Separate training and test sets used (cross-validation) | In conjunction with large data sets, cross-validation is a crucial step to avoid overfitting, which can otherwise result in ineffective criteria.
Prior confirmation of a variant was not used as a quality metric | Successful confirmations of a particular variant can indicate little about whether future calls of that same variant are true or false.
All variants outside of GIAB-HC regions require confirmation | Outside of these regions, too few confirmatory data are available to prove whether the criteria are effective.
Laboratory- and workflow-specific criteria | Effective confirmation criteria can vary based on numerous details of a test's methodology and its target genes. Changes can necessitate revalidation of confirmation criteria.

FP, false positive; GIAB, Genome in a Bottle; GIAB-HC, regions in which high-confidence truth data are available from the GIAB specimens (unrelated to confidence in our own calls or to on/off-target regions); indel, insertion or deletion; PPV, positive predictive value; SNV, single-nucleotide variant; TP, true positive.
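
To make the criteria-selection row concrete, here is a minimal greedy sketch. It is our illustration, not the authors' published algorithm: the Rule structure, the metric names, and the tie-breaking score are all assumptions. The only properties taken from the table are that rules are chosen primarily for their ability to flag FPs and that selection continues until 100% of training FPs are flagged.

```python
# Hypothetical greedy sketch of confirmation-criteria selection (not the
# authors' published algorithm): add per-metric threshold rules until all
# training FPs are flagged, preferring rules that flag few TPs.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    metric: str       # eg, "qual", "depth", "allele_balance" (assumed names)
    threshold: float  # calls whose metric value falls below this are flagged

def flags(rule: Rule, call: dict) -> bool:
    return call[rule.metric] < rule.threshold

def select_rules(fps: list, tps: list, candidates: list) -> list:
    """fps/tps: lists of dicts mapping metric name -> value for each call."""
    chosen, unflagged_fps = [], list(fps)
    while unflagged_fps:
        # Score each candidate: maximize newly flagged FPs, then minimize
        # the number of TPs it would also send to confirmation.
        best = max(
            candidates,
            key=lambda r: (sum(flags(r, c) for c in unflagged_fps),
                           -sum(flags(r, c) for c in tps)),
        )
        if not any(flags(best, c) for c in unflagged_fps):
            raise ValueError("no candidate rule flags the remaining FPs")
        chosen.append(best)
        unflagged_fps = [c for c in unflagged_fps if not flags(best, c)]
    return chosen  # calls matching any chosen rule require confirmation
```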
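The warning about PPV and overall accuracy in the "Key metric" row follows directly from class imbalance. The counts below are hypothetical, chosen only to show that a classifier flagging no calls at all can score high overall accuracy while having 0% FP sensitivity.

```python
# Hypothetical counts illustrating why overall accuracy is uninformative
# when TPs vastly outnumber FPs (numbers invented for illustration).
n_tps, n_fps = 10_000, 125          # true-positive and false-positive calls
flagged_tps, flagged_fps = 0, 0     # a classifier that flags nothing

correct = (n_tps - flagged_tps) + flagged_fps
accuracy = correct / (n_tps + n_fps)
fp_sensitivity = flagged_fps / n_fps

print(f"accuracy       = {accuracy:.3f}")        # 0.988: looks excellent
print(f"FP sensitivity = {fp_sensitivity:.3f}")  # 0.000: clinically useless
```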
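Finally, the CI calculation on FP sensitivity can be sketched as follows. We assume a binomial model and an exact (Clopper-Pearson-style) one-sided bound, which is our assumption rather than a method stated in the table; when all n training FPs are flagged, that bound reduces to alpha**(1/n).

```python
# Illustrative sketch (not the authors' code): exact one-sided lower
# confidence bound on FP sensitivity under a binomial model.
import math

def fp_sensitivity_lower_bound(n_fps: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) lower bound on FP sensitivity when all
    n_fps false positives in the training data were flagged. For
    k = n successes the Clopper-Pearson bound reduces to alpha**(1/n)."""
    return alpha ** (1.0 / n_fps)

def min_fps_required(target: float, alpha: float = 0.05) -> int:
    """Smallest number of FPs (all flagged) whose lower bound on FP
    sensitivity reaches `target`."""
    return math.ceil(math.log(alpha) / math.log(target))

if __name__ == "__main__":
    # 125/125 FPs flagged -> lower bound ~0.976 at alpha = 0.05, close to
    # the >=98% figure quoted in the table (the exact value depends on the
    # interval method used).
    print(f"{fp_sensitivity_lower_bound(125):.4f}")
    # Only 50 FPs demonstrates a much weaker bound (~0.942), consistent
    # with the table's note that small data sets gave ineffective criteria.
    print(f"{fp_sensitivity_lower_bound(50):.4f}")
    print(min_fps_required(0.98))  # ~149 FPs for a strict 98% floor
```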