Abstract
Purpose:
NCI selected a network of CLIA-certified laboratories performing routine NGS tumor testing to identify patients for the NCI-MATCH trial. This large network provided a unique opportunity to compare variant detection and reporting between a wide range of testing platforms.
Experimental Design:
Twenty-eight NGS assays from 26 laboratories within the NCI-MATCH network, including the NCI-MATCH central laboratory (CL) and 11 commercial and 14 academic designated laboratories (DL), were used for this study. DNA from 8 cell lines and 2 clinical samples was sequenced. Pairwise comparisons of variant detection and reporting between each DL and the CL were performed for the SNV, Indel, and CNV variant classes.
Results:
We observed high concordance in variant detection between CL and DL for SNVs and Indels (Average Positive Agreement, APA>95.4% for all pairwise comparisons) but lower concordance for variant reporting after analysis pipeline filtering. We observed much higher agreement between the CL and assays using amplification as the target enrichment method (84.2%<APA≤95.7%, average APA=88.7%) than with assays using hybridization capture (69.7%<APA≤93.8%, average APA=77.4%), due to blacklisting of actionable variants in low-complexity regions. For CNV reporting, we observed high agreement (APA>82%) except between the CL and 2 assays (APA=76.9% and 71.4%) due to differences in estimation of copy numbers. Notably, for all variant classes, differences in variant interpretation also contributed to reporting discrepancies.
Conclusions:
This study indicates that different NGS tumor profiling tests currently in widespread clinical use achieve high concordance between assays in variant detection. For variant reporting, observed discrepancies are mainly introduced during the bioinformatics analysis.
Keywords: Next-Generation Sequencing, SNVs, Indels, CNVs, variant calling and reporting
INTRODUCTION
Over the last decade, molecular profiling by next generation sequencing (NGS) has been widely adopted into clinical practice [1]. In the era of precision oncology, NGS has proven indispensable in identifying actionable mutations that can then be matched to effective targeted therapies for patients with cancer. Most commercial and academic laboratories that perform clinical sequencing use targeted panels that interrogate several hundred cancer-related genes. In a typical laboratory workflow, raw sequencing data are processed through a custom data analysis pipeline that calls variants of clinical interest and annotates them. The annotated variant data are then reviewed and compiled into a clinical report that is provided to the treating physician to guide treatment selection.
Early studies comparing whole exome sequencing assays found substantial disagreement between assays [2, 3] and raised concerns that similar discordance might carry over to clinical NGS testing with targeted panels. Discrepancies in variants reported might arise because of differences in performance characteristics between sequencing assays/methods or based on how different data analysis pipelines call and filter variants. Yet another potential source of discordance is in how laboratories interpret the clinical significance of identified variants.
More recently, several proficiency studies [4, 5] and interlaboratory comparisons [6, 7] have evaluated concordance between clinical NGS laboratories that sequence solid tumor or hematological malignancies. These studies have demonstrated high concordance between labs. However, the number of variants evaluated has been small (n ≤ 20), and the focus has been on single nucleotide variants (SNVs), which are the least challenging variant type to detect because of their lower data analysis complexity compared to insertions and deletions (Indels) [8–10] and copy number variants (CNVs) [11–15]. Additionally, since contrived materials that were not formalin-fixed were used in these studies, it is not clear how well the results recapitulate what might be observed with standard formalin-fixed paraffin-embedded (FFPE) clinical tissue samples.
Clinical NGS has been used to support several large “basket” clinical trials [16–18]. Basket studies match a patient’s molecular profile to an appropriate targeted therapy independent of tumor histology [19]. The largest among these was the NCI Molecular Analysis for Therapy Choice (NCI-MATCH) trial, also referred to as EAY131 or NCT02465060 [20, 21]. NCI-MATCH included participation from more than 1100 clinical sites to identify patients for enrollment to 38 treatment arms. In the first phase of the trial, biopsies from 6390 patients were screened by four central laboratories. Prior to study start, an interlaboratory reproducibility study was conducted in which a positive pairwise concordance of 96% between the four labs was achieved [22]. Critical to this high concordance rate were the implementation of a harmonized assay workflow, standard operating procedures (SOPs) that included centralized extraction of nucleic acids, and a common data analysis pipeline across the laboratory network.
In May 2017, after 19 treatment arms targeting rare variants had not met their accrual goal of 35 patients by the completion of the NCI-MATCH biopsy screening phase, NCI-MATCH transitioned to a network of “designated labs” (DL) to refer patients to the study. Candidate NGS labs were required to be Clinical Laboratory Improvement Amendments (CLIA) compliant. After an initial review of the validation report for its assay, each lab was required to perform a concordance study on a common set of formalin-fixed cancer cell lines and FFPE clinical samples before being officially admitted to the network. The purpose of this concordance study was to ensure that data generated by all DLs for SNVs, insertions/deletions (Indels), and copy number variants (CNVs) would be consistent with the central lab (CL) assay, the Oncomine Comprehensive Assay version 3 (OCAv3). This study also provided a unique opportunity to assess concordance for multiple variant types among assays having widely varying panel content, different library preparation and sequencing chemistries, and variant-calling algorithms. We present here results from a pilot study with one DL and the final concordance study among 26 academic and commercial laboratories participating in the NCI-MATCH DL network.
MATERIALS AND METHODS
Cell lines and Clinical Samples
For both the pilot study and the concordance study, a set of samples was assembled to provide representation of the 3 variant classes (i.e., SNVs, Indels, and CNVs) used to assign patients to the different treatment arms of the NCI-MATCH trial. Because we hypothesized that calling CNVs would be more challenging across designated labs than calling SNVs or Indels, these samples also included some of the clinically actionable CNVs that were part of the inclusion criteria for the NCI-MATCH rare variant arms. For the pilot study, an outside commercial laboratory (the Pilot Lab) provided 81 de-identified formalin-fixed, paraffin-embedded (FFPE) clinical tumor specimens with various histopathologic diagnoses to the NCI-MATCH CL, the Molecular Characterization (MoCha) Laboratory at the Frederick National Laboratory for Cancer Research (Frederick, MD). These samples had previously been characterized by the Pilot Lab’s analytically validated oncology NGS assay.
For the concordance study, 6 FFPE clinical tumor specimens with various histopathologic diagnoses from the Yale Tumor Profiling laboratory (Yale) and the Center for Integrated Diagnostics at Massachusetts General Hospital (MGH) were chosen to include a large number of clinically relevant amplifications. These CNVs were originally identified by analytically validated orthogonal assays (e.g., NGS or fluorescence in situ hybridization [FISH] assays) in CLIA-certified laboratories. Because of the limited amount of DNA per clinical sample and the scarcity of the other variant types (SNVs and Indels) in these clinical samples, each laboratory analyzed DNA from 2 of the 6 clinical samples received from Yale and MGH as well as DNA from a common set of 8 cell lines from ATCC acquired between 2014 and 2016. All cell lines from ATCC undergo authentication and QC testing, including short tandem repeat (STR) profiling. The complete list of specimens used for the concordance study and their description (i.e., source, orthogonal assays performed, tumor type) is provided in Supplementary Table S1.
This study was initiated after approval from the Central Institutional Review Board and was conducted in accordance with the Helsinki Declaration and Good Clinical Practice Guidelines of the International Conference on Harmonization. Written informed consent was obtained from all patients. Each site that provided clinical samples also received institutional review board approval for the analyses conducted.
Sample Preparation and DNA Extraction
For the concordance study, cell lines (Supplementary Table S1) were cultured using vendor-recommended conditions. Cultured cells were harvested and pelleted by centrifugation, fixed overnight in 10% neutral buffered formalin, and embedded in paraffin blocks. Sections were cut from tumor and cell line FFPE blocks, and the relevant regions were collected for DNA extraction. For the concordance study, DNA was extracted using the Qiagen AllPrep DNA/RNA FFPE kit (Qiagen, Valencia, CA). All nucleic acid samples were quantitated by the Qubit fluorometer (Thermo Fisher Scientific) before use in the NGS assay [22].
FISH
Fluorescence in situ hybridization (FISH) analysis was conducted on the cell line and tumor samples used for the concordance study that harbored BRAF, CCND1, CDK6, ERBB2, FGF19, FGF3, FGFR1, FLT3, MDM2, MET, MYC, and EGFR copy number gains as assessed by NGS. Probes for these genes were selected from human BAC genomic libraries by Empire Genomics (Depew, New York), where the probes were prepared and fluorescently labeled. Each gene probe came admixed with a second, differently fluorescently labeled probe for the peri-centromeric region of the chromosome containing that gene.
Cells were cultured using vendor-recommended conditions. Adherent cells were grown on chamber slides; otherwise, cells were grown in suspension. Cells were prepared for hybridization according to described methods (https://www.genecopoeia.com/wp-content/uploads/2016/04/Cell-slide-prep_2016.pdf), excluding the treatment with protease for suspension cells.
Seven-micron-thick sections of formalin-fixed, paraffin-embedded tissues containing tumor cells were analyzed with the probes described above. Procedures described by Empire Genomics (https://empiregenomics.com/docs/protocols/CISH-Protocol.pdf) were followed for FISH on cells and tissues, with slight modifications.
Slides were examined using an Olympus BX51 fluorescence microscope with a 60x oil immersion objective. For each cell line/amplified gene, two separate preparations and hybridizations were carried out. For each hybridization, the absolute fluorescence signals per cell were counted for the amplified gene and the associated centromere, and the absolute counts and the gene:centromere ratios were averaged from at least 30 cells. Two expert observers performed the counting, and their results were recorded separately. High levels of amplification (approximately ≥15×) were noted as such because of possible inaccuracy in counting copy numbers.
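For illustration, a minimal sketch of this per-hybridization scoring arithmetic is shown below. The function name and example counts are hypothetical, and averaging per-cell ratios (rather than dividing the mean counts) is our assumption about the scoring convention.

```python
def summarize_fish_scoring(gene_counts: list[int], centromere_counts: list[int]) -> dict:
    """Average absolute FISH signal counts and gene:centromere ratios over scored cells
    (the study required at least 30 cells per hybridization)."""
    n = len(gene_counts)
    mean_gene = sum(gene_counts) / n
    mean_centromere = sum(centromere_counts) / n
    # Assumption: per-cell ratios are averaged rather than dividing the mean counts.
    mean_ratio = sum(g / c for g, c in zip(gene_counts, centromere_counts)) / n
    return {"mean_gene": mean_gene, "mean_centromere": mean_centromere,
            "mean_gene_centromere_ratio": mean_ratio}

# Hypothetical counts from 5 cells (shown only to illustrate the calculation).
print(summarize_fish_scoring([12, 15, 10, 14, 11], [3, 2, 3, 2, 3]))
```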
Karyotyping
For cytogenetics, cell cultures in logarithmic growth were blocked in metaphase by adding colcemid 40 ng/mL for 2–4 hours. For adherent cultures, cells were removed from the surface of the culture flask by brief treatment with trypsin/EDTA and gentle agitation. Cells were centrifuged and the supernatant discarded. A hypotonic solution (0.075 M KCl) at ambient temperature was added to the resuspended cell pellets for 7 minutes. The cells were then collected by centrifugation and gently resuspended in ice-cold fixative (methanol:glacial acetic acid at 3:1). After 2–4 hours at 4°C, the cells were equilibrated to room temperature and centrifuged, resuspended in fresh fixative, and stored overnight at 4°C. The following day, cells suspended in fixative were dropped on humidified microscope slides. The slides were air-dried.
For G-banding of chromosome spreads, the slides were baked overnight at 60°C in a dry oven. Three Coplin jars were prepared, one each with 500 μL trypsin stock solution (17.5 mg trypsin 1:250 [Difco] in PBS, pH 6.8) in 70 mL PBS (pH 7.2) at 37°C, ice-cold PBS (pH 6.8) to stop enzymatic activity, and 5% Giemsa in PBS (pH 6.8) at room temperature. The dried slides were dipped in the trypsin solution for 10–15 seconds, followed by dipping in the ice-cold PBS for a few seconds, and then staining in Giemsa for 15 minutes. The slides were briefly rinsed in deionized water and dried on paper towels. The slides were then examined for metaphase chromosome spreads under a light transmission microscope using an oil immersion lens. Appropriate spreads were photographed and karyotypes constructed. The chromosomal contents of the spreads were examined and reported using standard nomenclature. At least 20 spreads were examined to determine cytogenetic homogeneity among the cells.
These spreads were also examined by FISH when judged to be of value, for instance to determine the chromosomal location of the signals produced by hybridization with the probes when there was any question about the specificity of the FISH probes.
DNA microarrays
To detect CNVs using an orthogonal method in the concordance study samples, microarray analysis of DNA from cells and tissues was performed at the Yale Center for Genome Analysis using the Affymetrix OncoScan FFPE assay kit, following the manufacturer’s protocol with integrated quality control measures as previously described. Briefly, 80 ng of genomic DNA from cell lines or tumor tissue specimens was used as a template to anneal a panel of molecular inversion probes. The annealing mix was split for AT and GC reactions. The gap between the probes and template was filled in by DNA polymerase and sealed by ligase to form circularized probes. Unligated probes and single-stranded DNA were destroyed by exonuclease and washed out. Circular probes were released, cleaved, and amplified using universal primers in a first polymerase chain reaction (PCR). These PCR mixtures were divided for a second PCR to incorporate either biotinylated AT or GC. The two PCR mixtures were digested with the HaeIII restriction enzyme, then denatured and hybridized to the microarray chip. The results were analyzed using the Chromosome Analysis Suite version 3.3 (Affymetrix, Santa Clara, CA), in which the log2 ratio for copy number changes and the B-allele frequency for allelic patterns were visualized. CNVs at targeted loci were analyzed for twelve tumor samples with known gene gains or amplifications. The quality metrics for the OncoScan microarray assay were MAPD ≤ 0.3 and ndSNPQC ≥ 26 [23, 24].
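As context for the log2 ratios mentioned above, the usual diploid-baseline conversion to an approximate copy number is sketched below; this reflects the general relationship only and is an assumption, not the vendor's calling algorithm (which additionally uses B-allele frequencies and quality metrics).

```python
def approx_copy_number(log2_ratio: float, baseline_copies: float = 2.0) -> float:
    """Convert a microarray log2 ratio to an approximate total copy number,
    assuming the reference baseline corresponds to a diploid (2-copy) state."""
    return baseline_copies * (2.0 ** log2_ratio)

# Illustrative values: a log2 ratio of 0 corresponds to ~2 copies, while a
# log2 ratio of ~1.8 corresponds to ~7 copies (the NCI-MATCH CNV cut-off).
print(approx_copy_number(0.0))   # 2.0
print(approx_copy_number(1.8))   # ~7.0
```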
Sequencing and Variant Calling
NCI-MATCH central laboratory/MoCha Lab
The cell lines and clinical samples used for the concordance study were sequenced at the CL using OCAv3 [22] prior to being sent to the DLs in order to confirm the variants present and to provide a baseline for comparison. The library and template preparation, as well as the bioinformatics and data analysis pipeline at the CL, have been described previously [22]. Each DL listed the variants it identified and would have included on a clinical report at the time it performed its assay on the concordance testing samples. To understand differences in variant reporting between DLs, each DL was sent a list of variants reported by at least one other DL but not reported by that particular DL. Approximately 500 additional variants that were not found in the samples provided by any lab were also included in this list. The DL’s bioinformatician(s) determined whether these variants were covered by the given DL’s assay. If the variants were included in their assay panel but not reported, they provided the reasons for not reporting them.
Our analysis focused on the percentage of agreement in variants detected and reported for pairwise comparisons between the CL and each DL for each variant type (with SNVs and Indels pooled together under the name “SNVs/Indels”) and on the reasons why some variants were “detected” by some labs but “detected and reported” by others. Each SNV/Indel in this study is described as “cell line.gene.amino acid change”; each CNV is described as “cell line.gene”.
Designated Laboratories
These labs used a variety of assay platforms to identify variants for NCI-MATCH (Table 1). The analysis for each DL’s assay was performed in a single-blinded fashion for concordance with the CL (assay H).
Table 1.
NCI-MATCH designated lab assay platforms.
| Assay ID | Number of genes on panel (N) | Instrument Platform | Target Enrichment Method |
|---|---|---|---|
| A | <100 | Ion Torrent | Amplification |
| B | <100 | Ion Torrent | Amplification |
| C | <100 | Ion Torrent | Amplification |
| D | 100≤ N <200 | Ion Torrent | Amplification |
| E | 100≤ N <200 | Ion Torrent | Amplification |
| F | 100≤ N <200 | Ion Torrent | Amplification |
| G | 100≤ N <200 | Ion Torrent | Amplification |
| H | 100≤ N <200 | Ion Torrent | Amplification |
| I | 100≤ N <200 | Ion Torrent | Amplification |
| J | <100 | Archer VariantPlex on Illumina | Anchored Multiplex PCR |
| K | 100≤ N <200 | Archer VariantPlex on Illumina | Anchored Multiplex PCR |
| L | <100 | Illumina | Amplification |
| M | 200≤ N <500 | Illumina | Amplification |
| N | 100≤ N <200 | Illumina | Amplification |
| O | <100 | Illumina | Hybrid Capture |
| P | <100 | Illumina | Hybrid Capture |
| Q | 100≤ N <200 | Illumina | Hybrid Capture |
| R | 100≤ N <200 | Illumina | Hybrid Capture |
| S | 200≤ N <500 | Illumina | Hybrid Capture |
| T | 200≤ N <500 | Illumina | Hybrid Capture |
| U | 200≤ N <500 | Illumina | Hybrid Capture |
| V | 200≤ N <500 | Illumina | Hybrid Capture |
| W | 200≤ N <500 | Illumina | Hybrid Capture |
| X | >500 | Illumina | Hybrid Capture |
| Pilot Lab | >500 | Illumina | Hybrid Capture |
| Y | >500 | Illumina | Hybrid Capture |
| Z | >500 | Illumina | Hybrid Capture |
| θ | 100≤ N <200 | Illumina | Hybrid Capture |
| ϕ | >500 | Illumina | Hybrid Capture |
Statistical Analysis
For the pilot study and the larger concordance study, the primary analysis calculated estimates of positive agreement between the CL and each DL. These analyses focused on binary CNV and SNV/Indel calling. Positive percent agreement (PPA) [25] and average positive agreement (APA) [26] were calculated for each DL compared to the CL, pooled across variants but separately for CNVs and SNVs/Indels. Additional information about the calculation of PPA and APA can be found in the Supplemental materials and methods.
For all analyses of agreement, variants were included only if they could be interrogated by both the CL and DL assays. For CNVs, both labs provided files indicating which CNV regions their assay could identify, and observed variants within the intersected regions were included in the overall CNV analysis. For SNVs/Indels in the pilot study only, the CL provided a hotspot file and the DL provided a BED file indicating which regions the assay was able to interrogate. For SNVs/Indels in the larger concordance study, SNVs/Indels were included if they were detected by the CL and reported by at least one DL.
Tile plots were generated for SNV/Indel and CNV variants to visualize concordance across variants. Different colors were used for each of the following categories: detected and reported, detected but not reported, not detected/below cutoff, sample not run, and variant not covered by assay. PPA and APA were calculated by designating a positive call for variants in two different ways: positive calls could be designated as either 1) detected and reported, or 2) detected and above the NCI-MATCH cut-off but not reported. The NCI-MATCH cut-offs are variant allele frequency (VAF) ≥ 5% and ≥ 10% for SNVs and Indels, respectively, and CN ≥ 7 copies for CNVs, except for assay U, which used a cut-off of 15 copies for CNVs. For CNVs, we chose a cut-off of 7 based on discussion with the principal investigators from the different subprotocols, who by consensus preferred this conservative cut-off over the CL’s limit of detection of 4 copies to ensure that patients referred to the study had a true gene amplification.
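As a concrete illustration of these agreement metrics, the minimal sketch below computes PPA and APA for one pairwise CL–DL comparison using the standard definitions cited above [25, 26]. The helper function, cut-off handling, and example calls are illustrative assumptions; the exact formulas used in the study are given in the Supplemental materials and methods.

```python
NCI_MATCH_CUTOFFS = {"SNV": 5.0, "Indel": 10.0, "CNV": 7.0}  # VAF % or copy number

def is_positive(variant_type: str, value: float, reported: bool) -> bool:
    """Scenario 1 positive call: detected and reported, at or above the cut-off."""
    return reported and value >= NCI_MATCH_CUTOFFS[variant_type]

def ppa(cl_calls: list[bool], dl_calls: list[bool]) -> float:
    """Positive percent agreement with the CL as reference:
    (variants positive in both assays) / (variants positive in the CL)."""
    both = sum(c and d for c, d in zip(cl_calls, dl_calls))
    return 100.0 * both / sum(cl_calls)

def apa(cl_calls: list[bool], dl_calls: list[bool]) -> float:
    """Average positive agreement (neither assay treated as the reference):
    2 * (positive in both) / (positives in the CL + positives in the DL)."""
    both = sum(c and d for c, d in zip(cl_calls, dl_calls))
    return 100.0 * 2 * both / (sum(cl_calls) + sum(dl_calls))

# Hypothetical example: 4 SNVs interrogated by both assays, as (VAF %, reported?).
cl = [is_positive("SNV", v, r) for v, r in [(12.0, True), (35.5, True), (8.1, True), (6.4, True)]]
dl = [is_positive("SNV", v, r) for v, r in [(11.4, True), (36.8, True), (7.5, False), (4.3, True)]]
print(f"PPA (CL as reference): {ppa(cl, dl):.1f}%")  # 50.0%
print(f"APA: {apa(cl, dl):.1f}%")                    # 66.7%
```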
An acceptance criterion of 80% PPA, with the CL as reference, was applied to both CNV and SNV/Indel calls for all DLs. For this criterion, a positive call was designated according to scenario 1) above, i.e., variants that were detected and reported by both assays. Additional descriptive analyses assessed the continuous VAF (SNVs/Indels) and copy number (CNVs) values, including boxplots for each variant, scatter plots with Deming regression lines, and Pearson correlation coefficients [27]. Analyses were completed using R version 4.3.0.
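Although the study analyses were performed in R, a minimal sketch of the closed-form Deming regression slope is shown below for illustration, assuming equal measurement-error variances in the two assays (variance ratio of 1); it is not necessarily the implementation used to generate the figures.

```python
import math

def deming_slope(x: list[float], y: list[float], delta: float = 1.0) -> float:
    """Closed-form Deming regression slope for error-variance ratio `delta`
    (delta = 1 assumes equal measurement error in both assays)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    return (syy - delta * sxx + math.sqrt((syy - delta * sxx) ** 2
                                          + 4 * delta * sxy ** 2)) / (2 * sxy)

# Hypothetical example: VAFs (%) for the same variants measured by the CL and a DL.
cl_vaf = [12.0, 35.5, 48.0, 5.2, 27.3]
dl_vaf = [11.4, 36.8, 46.5, 6.0, 28.1]
print(f"Deming slope: {deming_slope(cl_vaf, dl_vaf):.2f}")  # close to 1 for concordant assays
```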
Data Availability
Raw data for this study were generated by DL and CL. All BAM sequencing files from the CL are available through the Sequence Read Archive under the BioProject ID “PRJNA1250987”. These data, as well as all other processed data files from the CL, are available from the corresponding author upon request. Other derived data supporting the findings of this study, including processed data files from each DL, are available within the article and its supplementary data files.
RESULTS
The NCI-MATCH Designated Lab (DL) Network is a network of commercial and academic CLIA labs performing NGS as standard of care for treating physicians. These labs were recruited to identify patients with genetic variants making them potentially eligible for NCI-MATCH. The DL Network used a variety of NGS platforms and tumor profiling assays (Table 1). Laboratories were asked to use their own validated data analysis and annotation pipelines. Among the DL Network, three laboratories—Memorial Sloan Kettering Cancer Center (MSKCC), MD Anderson Cancer Center (MDACC), and Yale—did not participate in the concordance study, either because their assay was FDA-cleared (MSKCC) or because concordance with the central assay had been previously demonstrated (MDACC and Yale) [22].
Pilot Study
To determine the feasibility of our planned data collection and analysis approach, we conducted a pilot study between the central laboratory (CL), which used assay H, and one Pilot Lab using 81 FFPE tumor samples. In these samples, more than 100 CNVs and 60 SNVs/Indels were reported by each laboratory. Strong concordance in both variant detection and variant reporting was demonstrated (Supplementary Table S2) with an APA of 98.6% and 100% for the detection of SNVs/Indels and CNVs, respectively, and with an APA of ~92% for the reporting of both SNVs/Indels and CNVs. In addition, both laboratories detected SNVs/Indels with similar VAFs (Deming regression slope = 1.01) and CNVs with similar copy number values (Deming regression slope = 0.86) (Supplementary Figure S1).
SNV and Indel Concordance between Laboratories
We conducted the main concordance study with a substantially smaller set of samples than used for the pilot study. This was done to ensure that the main study would be operationally feasible for all 26 laboratories and to facilitate the acquisition of specimens that could be shared between all labs. Fourteen well-characterized FFPE samples were used, comprising 8 cell lines and 6 clinical samples. Each laboratory sequenced 10 samples, including all 8 cell lines; because the amount of DNA extracted from any one clinical sample was limiting, each lab was provided with 2 of the 6 clinical samples.
A total of 28 SNVs and 22 Indels, including 28 OncoKB™-curated variants (25 SNVs and 3 Indels as of 5/9/2023, OncoKB database version 4.4) annotated as oncogenic or likely oncogenic in the OncoKB™ knowledge base, were reported by at least one laboratory (Figure 1) (Supplementary Tables S3 and S4) [29]. Given the variety of assays used by the laboratories in the DL Network, the number of variants covered by each assay varied greatly between labs, from 12 SNVs/Indels covered by the <100-gene panel of assay B to 49 variants covered by the >500-gene panel of assay Y (Figure 1A). Despite the wide range in panel size, more than 75% of the assays (i.e., 22 of 28 NGS assays) covered at least 25 SNVs/Indels.
Figure 1.

Variant allele frequencies (VAF) of SNVs and Indels identified by DL Network assays.
A, Tile plot illustrating SNVs (top, above the black line) and Indels (bottom) identified in the concordance study and their reporting status by laboratory. The NCI-MATCH cut-offs were used for this study: ≥5% VAF for SNVs and ≥10% VAF for Indels. Numbers in boxes indicate VAF. B and C, Box plots with median VAF for each SNV and Indel. The lower and upper hinges indicate the 25th and 75th percentiles, respectively, and the whiskers correspond to ± 1.58 × IQR/sqrt(n) of the hinge, where IQR is the interquartile range or distance between the first and third quartiles and n is the number of data points.
Variants written in red denote OncoKB™ variants.
We observed multiple cases of discordance in reported variants between assays (Figure 1A). Variants were detected by laboratories in their BAM files but filtered out during the bioinformatic analysis process (Figures 2A and B). For example, SNU5.MSH6.K1358fs*2 was not reported by 8 labs that determined it to be a likely benign germline variant based on Genome Aggregation Database (gnomAD) annotation, 7 variants were not reported by labs using the Ion Torrent platform because they were not considered hotspots or actionable mutations of interest by the Ion Reporter™ software (IRS), and 6 variants were not reported because of a neighboring variant in close proximity. We also observed one variant (HCT-116.CDKN2A.E33fs*15) that was not reported by 4 labs because of differences in the reference transcript used, and one variant (HCT-116.NF1.I679fs*21) that was not reported by 3 labs because the variant-calling algorithm did not identify it.
Figure 2. Main sources of discordance in SNV and Indel reporting between assays.

A, A typical NGS data analysis workflow for labs performing NGS tumor testing. B, Summary of SNV/Indel discordance between assays. Abbreviations: aMOI, actionable mutation of interest.
Figures 1B and C show the variant allele frequencies of SNVs and Indels measured by each assay. For SNVs, all variants displayed minimal variability across assays: interquartile ranges were less than 8% VAF between labs except for the TERT promoter variant c.−124C>T (IQR = 13%) (Supplementary Table S5). The TERT promoter region is well known to be difficult to sequence because of its high GC content (nearly 80%), which frequently results in low coverage, lower base and mapping quality, and high strand bias that ultimately compromise the sensitivity of variant calling [30]. As expected from previous studies, we observed higher variability in VAF measurements for Indels than for SNVs (Figure 1C), with an average IQR of 3.2% for SNVs and 7.3% for Indels (Supplementary Tables S5 and S6), and more discordance between assays in reporting Indels than SNVs (Figure 1A) [9, 31, 32]. Five of the 22 Indels analyzed in this study (i.e., HCT-116.MSH6.F1088fs*5, HCT15.MSH6.D1171fs*4, HCT-116.RAD50.K722fs*14, HCT-116.CDKN2A.R24fs*20, and HCT-116.SLX4.A1461fs*2) had the broadest interquartile ranges, varying between 11% and 22% VAF. Among these, HCT-116.CDKN2A.R24fs*20 and HCT-116.MSH6.F1088fs*5 were especially challenging both biochemically and bioinformatically because they are located in homopolymeric regions [33]. Both tended to be measured at lower VAFs and were more likely to be blacklisted by assays that used an amplification-based target enrichment method (assays A to I) (Figure 1A). Indeed, the largest source of discordance in reported variants between labs occurred in homopolymeric regions, where 16 Indels were blacklisted and not reported by labs using the Ion Torrent platform (Figure 2B).
For variant detection, there was a very high level of concordance overall, with an APA of 100% for 23 of 27 DL assays compared to the CL (Figure 3A, Supplementary Table S7). There was also strong agreement in VAF measurements between assays for SNVs/Indels (median Deming regression slope = 0.98) (Figures 1B and C and Supplementary Figure S2). For variant reporting, concordance between the CL (assay H) and assays that use the same platform and target enrichment method (assays A–I) was strong, with an APA exceeding 84% and an average APA of 88.7% (Figure 3A, Supplementary Table S7). Concordance in variant reporting between the CL (assay H) and all other assays (assays J to ϕ) was lower (68.8%<APA≤93.8%, average APA=77.9%) due to removal of several variants by the automatic filters and settings of the built-in bioinformatic pipeline of the Ion Torrent platform. However, using the CL (assay H) as reference, all PPAs between CL and DL assays exceeded 88%, with a PPA of 100% for 15 of the 27 pairwise assay comparisons (Figure 3B and Supplementary Table S7). This was well above the prespecified acceptance criterion of 80% for the NCI-MATCH DL Network. All PPAs between CL and DL assays with the DL as reference were also calculated and can be found in Supplementary Table S7. Finally, there was almost perfect agreement across all labs in the reporting of OncoKB™ variants, which is not surprising given their clinical relevance, independent of the chemistry and target enrichment method used (Figure 1A).
Figure 3:

Concordance for SNV/Indel detection and reporting.
Values indicate A, the average positive agreement (APA) in SNV/Indel detection above the NCI-MATCH cut-off (VAF ≥ 5% and ≥ 10% for SNVs and Indels, respectively) and reporting between the CL (assay H) and each DL assay, and B, the positive percent agreement (PPA) using the CL as reference. The solid red line indicates the threshold criterion for acceptance into the NCI-MATCH DL network, which was 80% PPA with the CL as reference.
CNV Concordance between Laboratories
A total of 38 CNVs in 17 different genes, including 28 OncoKB™ variants (as of May 9, 2023, OncoKB database version 4.4), were reported as amplified at ≥ 7 copies by at least one laboratory (Figure 4A) [29]. The number of CNVs detected by each assay varied from 6 CNVs detected above the cut-off of 7 by assay θ to 24 detected by the CL (assay H) (Figure 4A). Some CNVs, such as the MET amplification in the cell lines C32 and SK-BR3, were detected as amplified by several labs but were not considered detected in our study and were not reported because their copy number values fell below the NCI-MATCH cut-off of 7.
Figure 4.

Copy number amplifications identified by Designated Labs.
A, Tile plot illustrating CNVs identified in the concordance study and their reporting status by laboratory. Copy number values are indicated in boxes. The cut-off for CNVs used in the NCI-MATCH trial is CN≥7. U* corresponds to assay U evaluated at a cut-off of 15 instead of 7.
B, Box plots illustrating the median of the copy number values for CNVs detected by CL (assay H) below the cut-off of 7 and C, CNVs detected by CL at or above the cut-off. Lower and upper hinges of the box plot indicate the 25th and 75th percentiles, respectively. Whiskers correspond to ± 1.58 × IQR/sqrt(n) of the hinge, where IQR is the interquartile range or distance between the first and third quartiles and n is the number of data points.
Variants written in red denote OncoKB™ variants.
Most CNVs fell within a narrow interquartile range across labs (median IQR ≤ 2.0 copies) (Figures 4A and B, Supplementary Table S8). One notable exception was MGH-3.ERBB2, which had an IQR of 53 copies, likely driven by the very high-level amplification observed in this case (median = 111 copies). Overall, there was a strong correlation between CL (assay H) and DL copy number values (Supplementary Figure S3) (median Deming regression slope = 0.89). Concordance between CL and DL assays was only slightly higher for CNV detection than for CNV reporting, as most discordance between assays was observed for amplifications close to the cut-off of 7 (Figures 4A and C). For example, the amplification MGH-1.FGFR1 (CL or assay H = 6.9 copies) was detected above the cut-off of 7 copies and reported as amplified by 5 labs but detected below the cut-off (gene copy number < 7) and not reported as amplified by 3 labs. Despite these discordances, there was strong agreement in CNV detection between DL and CL (assay H), with an APA ≥ 80% for all pairwise assay comparisons except between CL and assay U (APA = 76.9%), including an APA of 100% for 8 of the 20 assay comparisons and a median APA of 95.5% (Figure 5, Supplementary Table S9). There was also strong concordance at the variant reporting level, with a median APA of 93.7% and an APA ≥ 82% between the CL (assay H) and all other assays except assays U and Y, whose APAs with the CL were 76.9% and 71.4%, respectively (Figure 5, Supplementary Table S9).
Figure 5.

Concordance of CNV detection and reporting.
Values indicate A, the APA in CNV detection (for CNVs detected above the NCI-MATCH cut-off of 7) and reporting between the Central Lab (CL, assay H) and each Designated Lab (DL) assay, and B, the PPA using the CL as reference. The solid red line indicates the threshold criterion for acceptance into the NCI-MATCH DL network, which was 80% PPA with the CL as reference.
U* corresponds to assay U evaluated at a cut-off of 15 instead of 7.
Prespecified criteria for acceptance into the NCI-MATCH DL network were set at a threshold of ≥ 80% PPA in variant reporting with CL as reference for each variant type. All laboratories met these criteria for both SNVs/Indels and CNVs (Supplementary Tables S7 and S9) except assay Y for CNV reporting. In contrast to the reporting of SNVs/Indels (Figure 3A), agreement in CNV reporting between assays displayed no clear bias based on the target enrichment method used (Figure 5A), suggesting that most observed differences arose at the level of bioinformatic analysis. All PPAs between CL and DL assays with DL as reference were also calculated and can be found in Supplementary Table S9.
Interestingly, of the two most discordant assays for CNVs (assays U and Y), assay U repeatedly detected CNVs at a higher copy number than the CL (assay H) and other assays (Figures 4B and C). To investigate the basis for this discrepancy, we further analyzed the 38 amplified genes in the concordance study samples by multiple non-NGS modalities, including digital droplet PCR (ddPCR), fluorescence in situ hybridization (FISH), and microarray-based hybridization of DNA (Supplementary Table S10). The results were then compared to NGS copy number values determined by the CL (assay H) and assay U (Supplementary Figure S4). Among the orthogonal assays, assay U data were most strongly correlated with FISH absolute copy data, i.e., FISH copy data that are not centromere normalized (Spearman correlation coefficient: 0.87, slope: 0.65). By contrast, CL (assay H) data were more strongly correlated with the microarray data (Spearman correlation: 0.74, slope: 0.35) and with the ddPCR data (Spearman correlation: 1, slope: 0.76) than with raw FISH data (Spearman correlation: 0.68, slope: 1.89). While assay U met the NCI-MATCH prespecified acceptance criteria for CNVs and showed strong CNV correlation with FISH, the CNV cut-off for assay U patient referrals was nonetheless raised to 15 to accommodate the systematic bias observed in copy number reporting between assay U and the CL (assay H). Since DL actionable mutations needed to be confirmed by the CL for patients to be evaluable for the primary clinical outcome analysis, adjustment of the cut-off to 15 maximized the number of evaluable patients referred to the study by assay U.
DISCUSSION
In this study, we have demonstrated strong concordance at the level of both variant detection and variant reporting for both SNVs/Indels and CNVs between NGS tumor tests from 26 CLIA laboratories. These labs used different target enrichment methods, chemistries, data analysis pipelines, and variant classification algorithms, the latter consisting of both commercial software and in-house tools. The generally high concordance observed in this qualification study demonstrates that a large precision medicine clinical trial can be successfully conducted using a network of NGS laboratories without a requirement for harmonized chemistry or bioinformatics.
While overall concordance in variant reporting was high, the discordance that was observed arose at one of several steps in the laboratories’ data analysis workflows. For SNVs and Indels, most discordance in variant reporting (16/29 cases of discordance) occurred due to the blacklisting of variants in low-complexity (e.g., GC-rich) or homopolymeric regions by the labs using the Ion Torrent platform, which included the CL. This led to the reporting of fewer variants by those labs than by laboratories using Illumina-based sequencing platforms and generally resulted in APA percentages that were lower than the corresponding PPA comparisons with the CL as reference. A notable outcome of this study is that the CL and several DLs have since changed some blacklisting settings to allow reporting of certain actionable variants, such as mutations in the TERT promoter, when high VAF, read coverage above minimum thresholds, and data of adequate quality (quality score > Q30) are observed.
A second major reason for the discrepancies in the clinical reporting of variants occurred at the level of variant classification, further highlighting the need for the establishment and use of a gold standard approach to variant interpretation and reporting. Knowledgebases such as MyCancerGenome [34, 35], CIViC [36], OncoKB™ [29], Cancer Genome Interpreter [38] and ClinVar [37] have been developed to aid laboratories in the curation and interpretation of NGS data, but they are not harmonized [39]. This results in discrepancies whereby one knowledgebase annotates a variant as “likely pathogenic” while another might consider it a “variant of unknown significance (VUS)”. The Variant Interpretation for Cancer Consortium’s Meta-Knowledgebase has made progress toward resolving discrepancies in variant annotation by aggregating evidence across multiple knowledgebases to reach a consensus interpretation, albeit for a still somewhat limited number of hotspot mutations [40]. Additionally, the Association for Molecular Pathology and the College of American Pathologists have jointly published guidelines for how to establish, validate, and monitor bioinformatic NGS pipelines [41]. These advancements notwithstanding, we feel it would be of great value for CLIA NGS laboratories nationwide to participate in an in silico-based quality assessment focused on the processing and interpretation of NGS variant data to better understand the current level of inter-lab discordance in variant interpretation.
In tumor specimens, the accurate assessment of copy number amplification by NGS is complicated by the highly rearranged nature of most cancer genomes. Over 90% of tumors have some level of aneuploidy [42] and ~30% experience whole genome duplication [43]. Additionally, solid tumors are an admixture of tumor, stromal, and other diploid cellular components that pose additional challenges to estimating copy number. Still, overall agreement in CNV detection between CL and DL assays was very good (average APA = 93% and APA >80%), with the exception of assay U (APA=76.9%). This assay was optimized for clinical samples with stromal components and adjusted for tumor purity in their estimate of copy number, which the CL and other labs did not do. However, adjustment for tumor purity does not fully explain the discordance seen in the tumor cell lines used in this study, which have no stromal component. Additional discordance between assay H (or CL) and assay U can be attributed to assay U incorporating both relative read depth coverage and B-allele frequencies into its CNV calling algorithm, while most other labs use relative depth coverage alone. The use of B-allele frequencies allows the estimate of copy number to be adjusted to account for aneuploidy at the arm, chromosome, or whole genome level [44–47].
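To make the purity adjustment described above concrete, the sketch below shows the standard relationship between observed relative coverage and tumor copy number under the simplifying assumptions of a diploid non-tumor fraction and no overall ploidy correction; it is an illustration only and not the algorithm used by assay U or any other specific assay.

```python
def purity_adjusted_copy_number(coverage_ratio: float, purity: float) -> float:
    """Estimate tumor copy number from relative coverage, assuming the non-tumor
    fraction is diploid at the locus and the reference corresponds to 2 copies.

    Observed ratio R = (purity * CN + 2 * (1 - purity)) / 2
    =>  CN = (2 * R - 2 * (1 - purity)) / purity
    """
    return (2.0 * coverage_ratio - 2.0 * (1.0 - purity)) / purity

# Illustrative values: a 4-fold relative coverage gain implies ~8 copies in a pure
# tumor sample but ~14 tumor copies once 50% stromal (diploid) content is accounted for.
print(purity_adjusted_copy_number(coverage_ratio=4.0, purity=1.0))  # 8.0
print(purity_adjusted_copy_number(coverage_ratio=4.0, purity=0.5))  # 14.0
```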
We further compared copy number assessment between assay U (hybrid-capture NGS), assay H (amplicon NGS), FISH, ddPCR, and a microarray-based hybridization platform. As far as we are aware, this was the first such cross-platform evaluation published. FISH is considered by some to be a gold standard for evaluating amplifications. However, no clear truth emerged from this cross-platform/technology comparison. Indeed, assay U CNV reporting, which was more discordant with assay H (or CL) than the other DL assays, tracked more closely with FISH absolute copy data than assay H, while assay H tracked more closely with FISH gene/centromere ratio data. These results reflect how each assay accounted for the aneuploidy we observed at most of the loci tested, which was confirmed by cytogenetics (Supplementary Table S10) and often resulted in low gene/centromere ratios. These limitations notwithstanding, we saw strong correlations between FISH and both assay U and assay H. Future research could evaluate a more comprehensive cross-platform comparison for different assays.
While many advances have been made in precision medicine directed gene sequencing since the NCI-MATCH DL network was initiated, to our knowledge, this is still the most comprehensive study published to date comparing NGS tests used in the routine clinical management of patients with cancer in the United States. In contrast with previously published studies, which have generally been restricted to a relatively small number of variants (≤ 20 SNVs) [4–7], this study evaluated 28 SNVs, 22 Indels, and 38 CNVs. Additionally, we used cell lines and clinical samples that were formalin-fixed to recapitulate as much as possible what might be observed with standard FFPE clinical samples. Despite these strengths, our work had several limitations. First, all DLs were sent extracted DNA, which did not allow us to evaluate the potential impact of preanalytical variables (i.e., sample processing, tumor enrichment, and DNA extraction) on the results. We also did not evaluate rearrangements because assay H (or CL) used an RNA-based approach for detection of rearrangements while some Designated labs used a DNA-based approach. In addition, the study was not designed to assess the concordance of assays near reporting thresholds (e.g., 5% VAF). More discordance would likely have been observed if we had included more variants near thresholds. Another limitation was the length of time required to complete the study, with two years elapsing between the completion of the first concordance study by a DL and when the last DL joined the network. Some variants initially considered to be of uncertain significance were reclassified as likely actionable during this two-year period, which may have contributed to some of the reporting discordance seen between labs. Similarly, variant-calling algorithms and curation methods evolved over time, favoring laboratories that performed the concordance study later. Finally, most of the concordance data were generated using cell lines that do not truly mimic the cellular complexity of clinical samples derived from solid tumors.
Furthermore, it is noteworthy that tumor mutational burden and microsatellite instability, two clinically actionable complex biomarkers, were not part of the molecular eligibility criteria for any NCI-MATCH study arm. At the time of the start of the DL network in 2018, there were no FDA-approved companion diagnostics to measure TMB or MSI and few labs included them in their clinical report. Similarly, we did not evaluate RNA-Seq in the several DL labs that now offer it as a clinical assay. Interestingly, similar to the inclusion of MSI and TMB, laboratories are increasingly adopting RNA-Seq into their CLIA laboratory workflows to interrogate for actionable fusions and to identify gene expression patterns in the tumor microenvironment that may be predictive of response to treatment [48].
Despite these limitations, our study indicates that a national network of NGS laboratories can be successfully utilized to identify actionable genomic variants for the purpose of assigning patients to treatment in a large precision medicine trial. A designated network lab approach provides for a wider search for trial-eligible rare variants than would be possible by a single central screening laboratory. Indeed, plans are underway with NCI and the National Clinical Trials Network to leverage the DL Network for the screening and referral of patients to future NCI precision medicine initiatives. Data presented here support the continued use of a concordance assessment to qualify any new NGS laboratory-developed tests applying to the DL or other NCI laboratory networks.
Supplementary Material
TRANSLATIONAL RELEVANCE:
This study, done in the context of the NCI-MATCH trial, one of the largest precision oncology trials, is the most comprehensive comparison of NGS laboratory-developed tests used in the routine clinical management of cancer patients in the United States. Using 8 formalin-fixed cell lines and 6 clinical samples to best mimic what might be observed with standard FFPE clinical samples, we evaluated 28 SNVs, 22 Indels, and 38 CNVs and showed strong concordance at the level of both variant detection and variant reporting for both SNVs/Indels and CNVs between 29 NGS tumor tests from 26 CLIA laboratories. These data demonstrate that a national network of NGS laboratories can be successfully utilized to identify actionable genomic variants for assigning patients to treatment in a large precision medicine trial. This lab network approach provides for a wider and faster search for trial-eligible patients with rare variants and/or rare cancers.
Acknowledgments
This manuscript is dedicated to the memory of Mickey Williams, who founded the Molecular Characterization Laboratory in 2010 and led it until his retirement in February of 2024. Mickey was a superb translational scientist with a unique creative vision. In support of clinical trials sponsored by the National Cancer Institute, his lab developed several cutting-edge genomic assays that had an enduring impact on the lives of patients with cancer. Among his many scientific achievements, he stood up a network of 5 laboratories which screened the first 6000 patients for the NCI-MATCH trial, demonstrating for the first time that high concordance among NGS labs running the same harmonized workflow was indeed possible. In support of NCI’s Cancer Moonshot Biobank study, he helped bring molecular testing to underserved communities. He was also a humanist, a mentor, and a beloved friend to many at NCI, Frederick National Laboratory and beyond until his recent untimely passing. He will be deeply missed.
The authors thank Ben Kim and Dr. Brian Sorg from the Cancer Diagnosis Program at the NCI Division of Cancer Treatment at the National Institutes of Health, Dr. Lily Chen from the Molecular Characterization (MoCha) Laboratory at the Frederick National Laboratory for Cancer Research (Frederick, MD), Cynthia Winter and Dr. Jennifer Lee from the NCI-MATCH bioinformatic support team at NCI CBIIT for their assistance with the NCI-MATCH DLs (e.g., DL orientation calls and the NGS confirmation testing study between the NCI-MATCH CLs and the DLs). The authors also thank the NCI-MATCH Manuscript Committee for editorial review of this manuscript. This study was conducted by the NIH/National Cancer Institute and supported by Federal funds from the NIH/National Cancer Institute under Contract No. HHSN261201500003I and award numbers U10CA180820 and UG1CA233180. NCI-MATCH was coordinated by the ECOG-ACRIN Cancer Research Group (Peter J. O’Dwyer, MD and Mitchell D. Schnall, MD, PhD, Group Co-Chairs). Drs. Tsongalis, Iafrate, Sklar and Hamilton’s laboratories were all subcontractors and thus were supported under Contract No. HHSN261201500003I and award numbers U10CA180820 and UG1CA233180.
Footnotes
Disclosure of Potential Conflicts of Interest
A. John Iafrate reports receiving royalties from Invitae and being a scientific advisory board member for Kinnate, Repare, SequreDx and Paige.AI. No potential conflicts of interest were disclosed by the other authors.
The views presented in this article are those of the authors and should not be viewed as official opinions or positions of the National Cancer Institute, NIH, or U.S. Department of Health and Human Services.
REFERENCES:
- 1. Conley BA and Doroshow JH. Molecular analysis for therapy choice: NCI MATCH. Semin Oncol, 2014. 41(3): p. 297–9.
- 2. O’Rawe J, et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med, 2013. 5(3): p. 28.
- 3. Cornish A and Guda C. A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference. Biomed Res Int, 2015. 2015: p. 456479.
- 4. Merker JD, et al. Proficiency Testing of Standardized Samples Shows Very High Interlaboratory Agreement for Clinical Next-Generation Sequencing-Based Oncology Assays. Arch Pathol Lab Med, 2019. 143(4): p. 463–471.
- 5. Keegan A, et al. Proficiency Testing of Standardized Samples Shows High Interlaboratory Agreement for Clinical Next Generation Sequencing-Based Hematologic Malignancy Assays With Survey Material-Specific Differences in Variant Frequencies. Arch Pathol Lab Med, 2020.
- 6. Pfeifer JD, Loberg R, Lofton-Day C, and Zehnbauer BA. Reference Samples to Compare Next-Generation Sequencing Test Performance for Oncology Therapeutics and Diagnostics. Am J Clin Pathol, 2022. 157(4): p. 628–638.
- 7. Gutowska-Ding MW, et al. One byte at a time: evidencing the quality of clinical service next-generation sequencing for germline and somatic variants. Eur J Hum Genet, 2020. 28(2): p. 202–212.
- 8. Highnam G, et al. An analytical framework for optimizing variant discovery from personal genomes. Nat Commun, 2015. 6: p. 6275.
- 9. Fang H, et al. Indel variant analysis of short-read sequencing data with Scalpel. Nat Protoc, 2016. 11(12): p. 2529–2548.
- 10. Boegel S, et al. Bioinformatic methods for cancer neoantigen prediction. Prog Mol Biol Transl Sci, 2019. 164: p. 25–60.
- 11. Shen R and Seshan VE. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res, 2016. 44(16): p. e131.
- 12. Yuan X, et al. Accurate Inference of Tumor Purity and Absolute Copy Numbers From High-Throughput Sequencing Data. Front Genet, 2020. 11: p. 458.
- 13. Liu B, et al. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges. Oncotarget, 2013. 4(11): p. 1868–81.
- 14. Zhao M, et al. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics, 2013. 14 Suppl 11: p. S1.
- 15. Gong T, Hayes VM, and Chan EKF. Detection of somatic structural variants from short-read next-generation sequencing data. Brief Bioinform, 2021. 22(3).
- 16. Flaherty KT, et al. Molecular Landscape and Actionable Alterations in a Genomically Guided Cancer Clinical Trial: National Cancer Institute Molecular Analysis for Therapy Choice (NCI-MATCH). J Clin Oncol, 2020. 38(33): p. 3883–3894.
- 17. Hainsworth JD, et al. Targeted Therapy for Advanced Solid Tumors on the Basis of Molecular Profiles: Results From MyPathway, an Open-Label, Phase IIa Multiple Basket Study. J Clin Oncol, 2018. 36(6): p. 536–542.
- 18. Mangat PK, et al. Rationale and Design of the Targeted Agent and Profiling Utilization Registry (TAPUR) Study. JCO Precis Oncol, 2018. 2018.
- 19. Yee LM, et al. Biostatistical and Logistical Considerations in the Development of Basket and Umbrella Clinical Trials. Cancer J, 2019. 25(4): p. 254–263.
- 20. ECOG-ACRIN Cancer Research Group. NCI-MATCH: The Blueprint for Future Precision Medicine Trials. 2022. Available from: https://blog-ecog-acrin.org/nci-match-the-blueprint-for-future-precision-medicine-trials/.
- 21. O’Dwyer PJ, et al. The NCI-MATCH trial: lessons for precision oncology. Nat Med, 2023. 29(6): p. 1349–1357.
- 22. Lih CJ, et al. Analytical validation of the next-generation sequencing assay for a nationwide signal-finding clinical trial: Molecular Analysis for Therapy Choice clinical trial. J Mol Diagn, 2017. 19(2): p. 313–327.
- 23. Singh RR, et al. Comprehensive Screening of Gene Copy Number Aberrations in Formalin-Fixed, Paraffin-Embedded Solid Tumors Using Molecular Inversion Probe-Based Single-Nucleotide Polymorphism Array. J Mol Diagn, 2016. 18(5): p. 676–687.
- 24. Wen J, et al. Detection of cytogenomic abnormalities by OncoScan microarray assay for products of conception from formalin-fixed paraffin-embedded and fresh fetal tissues. Mol Cytogenet, 2021. 14(1): p. 21.
- 25. Statistical guidance on reporting results from studies evaluating diagnostic tests. 2007, Center for Devices and Radiological Health: US FDA, Rockville, MD.
- 26. Hewitt SM, et al. Quality Assurance for Design Control and Implementation of Immunohistochemistry Assays; Approved Guideline, Second Edition. CLSI document I/LA28-A2. Vol. 31. 2011, Wayne, PA: Clinical and Laboratory Standards Institute.
- 27. Giavarina D. Understanding Bland Altman analysis. Biochemia Medica, 2015. 25(2): p. 141–151.
- 29. Chakravarty D, et al. OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol, 2017. 2017.
- 30. Lee H, et al. Detection of TERT Promoter Mutations Using Targeted Next-Generation Sequencing: Overcoming GC Bias through Trial and Error. Cancer Res Treat, 2022. 54(1): p. 75–83.
- 31. Weissbach S, et al. Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines. BMC Genomics, 2021. 22(1): p. 62.
- 32. Narzisi G and Schatz MC. The challenge of small-scale repeats for indel discovery. Front Bioeng Biotechnol, 2015. 3: p. 8.
- 33. Lincoln SE, et al. One in seven pathogenic variants can be challenging to detect by NGS: an analysis of 450,000 patients with implications for clinical sensitivity and genetic test implementation. Genet Med, 2021. 23(9): p. 1673–1680.
- 34. Jain N, et al. The My Cancer Genome clinical trial data model and trial curation workflow. J Am Med Inform Assoc, 2020. 27(7): p. 1057–1066.
- 35. Kusnoor SV, et al. My Cancer Genome: Evaluating an Educational Model to Introduce Patients and Caregivers to Precision Medicine Information. AMIA Jt Summits Transl Sci Proc, 2016. 2016: p. 112–21.
- 36. Griffith M, et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet, 2017. 49(2): p. 170–174.
- 37. Landrum MJ, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res, 2016. 44(D1): p. D862–8.
- 38. Tamborero D, et al. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med, 2018. 10(1): p. 25.
- 39. Conway JR, Warner JL, Rubinstein WS, and Miller RS. Next-Generation Sequencing and the Clinical Oncology Workflow: Data Challenges, Proposed Solutions, and a Call to Action. JCO Precis Oncol, 2019. 3.
- 40. Wagner AH, et al. A harmonized meta-knowledgebase of clinical interpretations of somatic genomic variants in cancer. Nat Genet, 2020. 52(4): p. 448–457.
- 41. Roy S, et al. Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists. J Mol Diagn, 2018. 20(1): p. 4–27.
- 42. Taylor AM, et al. Genomic and Functional Approaches to Understanding Cancer Aneuploidy. Cancer Cell, 2018. 33(4): p. 676–689.e3.
- 43. Bielski CM, et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat Genet, 2018. 50(8): p. 1189–1195.
- 44. Carter SL, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol, 2012. 30(5): p. 413–21.
- 45. Van Loo P, et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A, 2010. 107(39): p. 16910–5.
- 46. Yau C, et al. A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data. Genome Biol, 2010. 11(9): p. R92.
- 47. Wang K, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res, 2007. 17(11): p. 1665–74.
- 48. Nuzhdina K, et al. Analytical and clinical validation of the BostonGene tumor portrait assay. ASCO, 2021. 39(15).