The Journal of Molecular Diagnostics. 2018 Nov;20(6):822–835. doi: 10.1016/j.jmoldx.2018.06.007

Analytical Validation of Clinical Whole-Genome and Transcriptome Sequencing of Patient-Derived Tumors for Reporting Targetable Variants in Cancer

Kazimierz O Wrzeszczynski, Vanessa Felice, Avinash Abhyankar, Lukasz Kozon, Heather Geiger, Dina Manaa, Ferrah London, Dino Robinson, Xiaolan Fang, David Lin, Michelle F Lamendola-Essel, Depinder Khaira, Esra Dikoglu, Anne-Katrin Emde, Nicolas Robine, Minita Shah, Kanika Arora, Olca Basturk, Umesh Bhanot, Alex Kentsis, Mahesh M Mansukhani, Govind Bhagat, Vaidehi Jobanputra
PMCID: PMC6198246  PMID: 30138725

Abstract

We developed and validated a clinical whole-genome and transcriptome sequencing (WGTS) assay that provides a comprehensive genomic profile of a patient's tumor. The ability to fully capture the mappable genome with sufficient sequencing coverage to precisely call DNA somatic single nucleotide variants, insertions/deletions, copy number variants, structural variants, and RNA gene fusions was analyzed. New York State's Department of Health next-generation DNA sequencing guidelines were expanded for establishing performance validation applicable to whole-genome and transcriptome sequencing. Whole-genome sequencing laboratory protocols were validated for the Illumina HiSeq X Ten platform and RNA sequencing protocols for the Illumina HiSeq 2500 platform for fresh or frozen and formalin-fixed, paraffin-embedded tumor samples. Various bioinformatics tools were also tested, and CIs for sensitivity and specificity thresholds in calling clinically significant somatic aberrations were determined. The validation was performed on a set of 125 tumor/normal pairs. RNA sequencing was performed to call fusions and to confirm the DNA variants or exonic alterations. Here, we present our results and WGTS standards for variant allele frequency, reproducibility, and analytical sensitivity, together with limit of detection analysis for single nucleotide variant calling, copy number identification, and structural variants. We show that the New York Genome Center WGTS clinical assay can provide a comprehensive patient variant discovery approach suitable for directed oncologic therapeutic applications.


Next-generation DNA sequencing (NGS) technologies are currently being applied in the clinical setting for the treatment of disease. The goal is to use high-throughput sequencing to identify specific variants within each tumor and to recommend personalized treatment approaches or clinical trials tailored to the individual's disease and genomic profile.1, 2, 3, 4, 5 Most of these assays currently consist of either predefined sequencing panels, in which a handpicked set of clinically significant genes is examined within each patient, or cancer type–specific targeted sequencing protocols or whole-exome platforms that cover only the coding region of the patient's genome.6, 7, 8 Whole-genome sequencing (WGS) allows hypothesis-free interrogation of both coding and noncoding regions of the genome to reveal more potential therapeutic options than examining a small set of genes or genomic loci.9, 10 The assay eliminates the sequence capture–related bias observed in whole-exome or panel sequencing. Clinical NGS assays undergo strict analytical validation protocols before their diagnostic use.11, 12, 13, 14 We therefore have performed analytical validation of whole-genome and transcriptome sequencing (WGTS) of patient-derived tumors and matched normal samples for the purposes of clinical testing and have devised a clinical reporting strategy for significant driver- and therapeutic-associated mutations. Many clinical NGS validation guidelines are directed toward targeted panel or exome sequencing at high depths of sequencing.13, 14, 15, 16, 17 Here, we expanded on New York State's Department of Health NGS Oncology guidelines for somatic genetic variant detection (updated and revised March 2015), developing them into novel standards applicable to WGTS for the purposes of clinical test validation. The optimum depth of sequencing necessary for high confidence somatic variant calling was first estimated using in silico analysis of a virtual human genome.
WGS laboratory protocols for DNA and RNA sequencing derived from fresh or frozen (FF) and formalin-fixed, paraffin-embedded (FFPE) tumor samples were then validated. A series of experiments were performed to assess the accuracy and reliability of the results based on standardized laboratory and bioinformatics protocols. WGS was performed for tumor/normal pairs on a total of 125 specimens (77 FF, 48 FFPE), a subset of which had known genomic profiles. The validation results and clinical WGTS standards for depth of sequencing, reproducibility, sensitivity, and limit of detection analysis for single nucleotide variant (SNV) calling, copy number identification, and structural variations are presented here. RNA sequencing is performed to identify fusions and to confirm DNA variants and exonic aberrations. The New York Genome Center WGTS clinical assay is intended to provide a more comprehensive patient variant discovery approach suitable for directed oncologic therapeutic applications.

Materials and Methods

DNA and RNA Sequencing Preparations

A total of 125 tumors and matched normal samples were collected from patients with consent and under institutional review board guidelines. Tumor/normal samples were gifted by the following investigators: Drs. Olca Basturk, Umesh Bhanot, Alex Kentsis, and Marc Ladanyi (Memorial Sloan Kettering Cancer Center, New York, NY) and Drs. Govind Bhagat and Mahesh Mansukhani (Columbia University Cancer Center, New York, NY). The 125 tumor/normal sample sets consisted of 16 different cancer types (Table 1). The AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA), designed for purifying both genomic DNA and total RNA from tissue samples, was used. Because there is no need to divide the sample into two for separate purification procedures, maximum yields of DNA and RNA can be achieved. The purified DNA and RNA are eluted separately and ready to use in any downstream application. Total DNA and RNA were extracted from approximately 10 mg of tumor tissue. Total genomic DNA purified from 200 μL of whole blood by using the QIAamp DNA Blood Mini Kit from Qiagen was used as a normal reference for sequencing. DNA sequencing libraries were prepared with the KAPA Library preparation kit (Kapa Biosystems, Wilmington, MA). This process included shearing the DNA, repairing the ends of the fragments, adding an A-base to the 3′ ends, ligating Illumina adapters, and amplifying the DNA to prepare samples for sequencing. The DNA is sheared to an average fragment size of 450 bp by using the Covaris LE220 instrument (Covaris, Woburn, MA) under default settings. After shearing, the DNA end repair step uses T4 DNA polymerase and Klenow DNA polymerase to remove 3′ overhangs and to fill in the 5′ overhangs. Because the adapter ligation requires the presence of a 3′ A-base on the double-stranded DNA fragments, the adenylation step uses dATPs and Exo(-) Klenow to adenylate the DNA fragments. 
NEXTflex-96 DNA adapters (Bioo Scientific, Austin, TX) are attached to the 3′ ends by using DNA ligase, followed by PCR enrichment. The final libraries are then assayed for quality with the use of the Agilent 2100 Bioanalyzer by using the DNA 1000 chip (Agilent Technologies, Santa Clara, CA). Final libraries that pass quality control have a concentration of >2 ng/μL and a library size of >200 bp with average peak height of ≥400 bp.

Table 1.

Cancer Types in Validation Sample Set

Cancer type Sample count
Brain 40
Sarcoma 15
Colon 11
Lymphoma 11
Lung 8
Pancreatic 8
Leukemia 6
Bone 5
Ovarian/cervical 4
Skin 3
Kidney 3
Breast 3
Multiple myeloma 3
Liver 2
Appendiceal 2
Unknown 1

For RNA preparation of FF samples with an RNA integrity number (RIN) score ≥7, Illumina's TruSeq stranded mRNA sample prep kit was used. This process involves a poly-A capture of the mRNA that is then fragmented. The cDNA is then ligated with adapters and enriched by PCR to create libraries for sequencing. For samples (FFPE or FF) with a RIN score <7, KAPA's Stranded RNA-Seq Kit with RiboErase (HMR) combined with Agilent's SureSelectXT Target Enrichment Kit for Illumina Multiplex Sequencing was used. Because of the degradation of FFPE RNA samples (all FFPE samples in this study had a RIN score <7), a poly-A capture was not used in this preparation. RNA samples were fragmented by using heat and magnesium and were converted into cDNA. A-tailing was performed to add dAMP to the 3′ ends of the dscDNA library fragments. dsDNA adapters with 3′ dTMP overhangs were ligated to the A-tailed library insert fragments. Library fragments that carried appropriate adapter sequences at both ends were amplified by using high-fidelity, low-bias PCR. Samples were then hybridized with biotinylated RNA library baits, and the targeted regions were selected by using magnetic streptavidin beads before amplification. The library quality was confirmed on an Agilent 2100 Bioanalyzer by using the Agilent High Sensitivity chip, and the quantity was confirmed by using Thermo Fisher's Qubit 4 Fluorometer with the dsDNA BR Assay kit (Thermo Fisher Scientific, Waltham, MA). Final libraries passed quality control with a concentration of >2 ng/μL and an average peak height of >400 bp for the mRNA protocol and an average peak height of >300 bp for the KAPA total RNA protocol.

DNA and RNA Sequencing

DNA sequencing was performed on the Illumina HiSeq X with 2 × 150 bp runs (Illumina, San Diego, CA). The final DNA library was diluted, denatured, and introduced into the lanes of the flow cell by using the Illumina cBot 2 system according to the manufacturer's protocol. The libraries were loaded at a 2:1 tumor/normal (T/N) ratio to reach coverage (average read depth) of 80× for the tumor sample and 40× for the normal sample. A total of two patient samples, or T/N pairs, can be loaded onto one HiSeq X flowcell. For RNA samples prepared with the mRNA protocol, libraries were sequenced on the Illumina HiSeq 2500 2 × 125 bp Rapid Run platform. For RNA samples prepared with the KAPA total protocol, libraries were sequenced on the Illumina HiSeq 2500 2 × 50 bp Rapid Run platform. A total of seven samples were multiplexed onto one flowcell, giving a minimum of 40 million reads per sample.
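The relationship between read count and mean depth behind the 2:1 loading ratio can be sketched with simple arithmetic: mean coverage is approximately (number of reads × read length) / genome size. The snippet below is a minimal illustration only. The genome size of 3.1 Gb and the function names are assumptions made for this example, and duplicate, unmapped, and trimmed reads are ignored.

```python
# Assumed approximate human genome size in bp (illustrative, not from the validation).
GENOME_SIZE_BP = 3.1e9


def mean_coverage(read_pairs, read_length, genome_size=GENOME_SIZE_BP):
    """Approximate mean depth from paired-end data (two reads per pair)."""
    return 2 * read_pairs * read_length / genome_size


def read_pairs_needed(target_coverage, read_length, genome_size=GENOME_SIZE_BP):
    """Number of read pairs needed to reach a target mean depth."""
    return target_coverage * genome_size / (2 * read_length)


# 2 x 150 bp reads, 80x tumor and 40x normal as described in the text.
tumor_pairs = read_pairs_needed(80, 150)
normal_pairs = read_pairs_needed(40, 150)
print(f"tumor: {tumor_pairs:.3g} pairs, normal: {normal_pairs:.3g} pairs")
```

Under these assumptions the tumor library needs twice as many read pairs as the normal library, which is what the 2:1 loading ratio is meant to deliver.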

Genotyping Chip

Genotyping was performed on all samples received for WGS as an internal quality control measure and to determine copy number. First, extracted DNA was normalized, denatured, and neutralized. After an overnight amplification step, DNA was fragmented, precipitated, resuspended, and hybridized to Illumina HumanOmni 2.5M BeadChips (WG-313-2511; Illumina). Hybridization was followed by a wash to remove unhybridized and nonspecifically hybridized DNA. Then, single-base extension of the oligos present on the BeadChip was performed by using the patient DNA as a template. This step incorporates red and green fluorophores onto the BeadChip to enable genotype calling. Finally, BeadChips were loaded onto the Illumina HiScan microarray scanner, which uses a laser to excite the fluorophores attached to the single-base extension products on the beads. The scanner yields fluorescence intensity data files that are interpreted in the context of biological information about the single nucleotide polymorphisms on the BeadChips to generate genotype calls. Up to 12 samples are hybridized on a single BeadChip.

Data Analysis

SNV/Indels

Paired-end 2 × 150 bp reads were aligned to the GRCh37 human reference by using the Burrows-Wheeler Aligner (version 0.7.8)18 and processed by using the best-practices pipeline that includes marking of duplicate reads by the use of Picard tools and realignment around insertions/deletions (indels) and base recalibration by Genome Analysis Toolkit version 2.7.4.19 The following variant callers were used: MuTect version 1.1.720 (SNVs only), Strelka version 1.0.1421 (both SNVs and indels), and Pindel version 0.2.5a8.22 SNVs and indels were annotated by snpEff version 4.2,23 snpSift version 4.0,23 and Genome Analysis Toolkit VariantAnnotator by using annotation from ENSEMBL,24 COSMIC,25 Gene Ontology,26 and 1000 Genomes.27

Structural Variation

Structural variants (SVs), such as copy number variants (CNVs) and complex genomic rearrangements, were detected by the use of the following tools: BIC-seq228 for CNV/SV calling, and Delly version 1.0,29 Crest version 0.6.1,30 and BreakDancer version 1.4.031 for SV calling. SVs found in the intersection of callers, and SVs for which additional split-read evidence could be found by using SplazerS version 1.0,32 were prioritized. SVs for which there was split-read support in the matched normal, or that were annotated as known germline variants (1000 Genomes call set; Database of Genomic Variants), were removed as likely remaining germline variants. The predicted set of somatic SVs was annotated with gene overlap (RefSeq, Cancer Gene Census), including prediction of potential effect on genes (eg, disruptive/exonic, intronic, intergenic, fusion candidate).

Tumor Purity and Ploidy

Tumor purity was calculated from WGS data by using the Titan version 1.8.033 package for copy number identification from clonal cell populations. In addition, purity and ploidy were calculated from the Illumina OMNI 2.5M Array by using the allele-specific copy number analysis of tumors (ASCAT) version 2.1 program.34

RNA-Sequencing

Read alignment to the reference genome was performed by using the RNA-Seq aligner STAR.35

Somatic variants discovered in the WGS analyses were annotated with the expression of each variant as measured in the RNA-Seq experiment. The fusion transcript discovery tool FusionCatcher36 was used to look for fusion genes expressed in the tumor samples.

Variant Annotations and Databases

All variants are annotated based on an in-house clinical classification system (Results) in which variants in targetable and COSMIC cancer census25 genes were identified and prioritized. Initial matching of each variant to drug(s) was performed by identifying the tumor-specific gene variants, relative to normal germline DNA, based on SNV, CNV, and RNA-seq data, and searching the expert-curated New York Genome Center (NYGC) drug-to-gene database. Our internal drug-to-gene database was assembled by manual curation of publicly available data from the National Comprehensive Cancer Network (https://www.nccn.org; last accessed December 20, 2017), the US Food and Drug Administration (FDA; https://www.fda.gov/Drugs/InformationOnDrugs/ApprovedDrugs; last accessed December 20, 2017), CIViC or Clinical Interpretations of Variants in Cancer (civic.genome.wustl.edu; last accessed December 20, 2017), Precision Cancer Therapy-MD Anderson (https://pct.mdanderson.org; last accessed December 20, 2017), OncoKB (https://oncokb.org; last accessed December 20, 2017), canSAR (https://cansar.icr.ac.uk; last accessed December 20, 2017), Pharmacogenomics Knowledgebase or PharmGKB (www.pharmgkb.org; last accessed December 20, 2017), ClinicalTrials.gov (https://clinicaltrials.gov; last accessed December 20, 2017), and from directed literature searches.

Results

Analytical Performance

To determine the optimal depth of coverage for proper variant analysis of T/N pair WGS, a synthetic in silico (virtual tumor) experiment was performed. A virtual tumor was generated at high coverage, and the tumor and normal BAM files were downsampled while measuring positive predictive value (PPV) and sensitivity at specific paired T/N coverage. The synthetic T/N pairs were generated by using HapMap Project human samples NA12892 and NA12891 similar to Cibulskis et al.20 Sample NA12892 was sequenced separately to 65× (NA12892-R1) and 180× (NA12892-R2), and sample NA12891 was sequenced to 110×. To create the synthetic virtual tumor, homozygous SNV reads in NA12891 were mixed into homozygous reference positions in NA12892-R2 and compared with the NA12892-R1 sample as a T/N pair (Supplemental Figure S1). The sites were sampled binomially to produce allelic frequencies of 40%, 20%, 10%, and 5% (Supplemental Figure S2). [Note: using these specific fractions created a virtual tumor that contained a greater proportion of low variant allele frequency (VAF) positions <10%.] The virtual tumor was downsampled to coverages of 140×, 90×, 80×, 60×, 40×, and 30×, and the corresponding normal sample to 60×, 40×, and 30×. Both precision and recall were determined at each coverage interval based on the true positive NA12892-R1 homozygous reference positions. Therefore, the virtual tumor was NA12892_00_NA12891_11 and the normal was NA12892-R1.
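The binomial sampling step can be sketched as follows: at a site with a given read depth and target allelic fraction, the number of alternate-allele reads is drawn from a binomial distribution. This is a toy simulation and not the pipeline used in the study. The function names, the even split of sites across the four fractions, and the fixed depth are assumptions made for illustration.

```python
import random

# Allelic fractions named in the text.
TARGET_FRACTIONS = (0.40, 0.20, 0.10, 0.05)


def sample_alt_reads(depth, fraction, rng):
    """Draw the number of alternate-allele reads from Binomial(depth, fraction)."""
    return sum(1 for _ in range(depth) if rng.random() < fraction)


def simulate_sites(n_sites, depth, rng):
    """Assign each site a target fraction (evenly, by assumption) and sample its alt count."""
    sites = []
    for i in range(n_sites):
        fraction = TARGET_FRACTIONS[i % len(TARGET_FRACTIONS)]
        alt = sample_alt_reads(depth, fraction, rng)
        sites.append({"target": fraction, "depth": depth, "alt": alt, "vaf": alt / depth})
    return sites


rng = random.Random(0)  # fixed seed so the toy run is repeatable
sites = simulate_sites(n_sites=4000, depth=80, rng=rng)
for fraction in TARGET_FRACTIONS:
    vafs = [s["vaf"] for s in sites if s["target"] == fraction]
    print(f"target {fraction:.2f}: mean observed VAF {sum(vafs) / len(vafs):.3f}")
```

Because the alternate read count is sampled rather than fixed, observed VAFs scatter around each target fraction, and the scatter widens as depth decreases.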

Total SNVs called and total true positives increased with higher coverage in both tumor and normal samples (Figure 1 and Supplemental Table S1). MuTect20 true positive capture was affected by lower sequencing coverage, with 40× normal samples consistently outperforming 30× normal samples. Sensitivity could be increased by 8% to 9% by using genomes with 80×:40× (tumor:normal) coverage compared with 60×:30×, with minimal changes in PPV. As expected, PPV increased with greater normal sequencing depth and sensitivity increased with additional tumor sequencing depth. The performance of the two SNV callers (MuTect and Strelka21) with respect to sequencing depth was highly dependent on the filtering strategies for low-quality sequencing reads and sequencing errors. Several sequencing metrics were incorporated to determine evidence for a variant beyond the expected random sequencing errors. Adding tumor reads without additional normal reads could therefore increase the number of sequencing errors observed at any given position, and these errors were less likely to be offset by normal coverage. An increase in tumor coverage without additional normal coverage would therefore produce more false positives because of the potential for more sequencing error. This can often produce low VAF false positive calls. An increase in normal coverage (maintaining the same tumor coverage) allowed for a more precise calculation of the log-odds score of a variant for determining true positives. A slight increase in PPV (elimination of false positives) was observed when adding normal coverage and maintaining consistent tumor coverage. However, the difference in PPV between 80×:40× and 60×:30× was minimal. 
The benefit of 80×:40× over 60×:30× in sensitivity was evident at different levels of variant detection, whereas adding a further 20× of normal coverage (80×:60×) did not provide significant improvement in performance, suggesting 40× normal (with 80× tumor) to be sufficient coverage for sequencing error filtering. Variant calls made by using a combination of both callers, MuTect and Strelka, increased the capture of true positives, and in turn reduced false negatives, with a minimal increase in false positives (Supplemental Table S2). The union of both callers MuTect and Strelka was therefore applied at 80× tumor and 40× normal coverage to optimize the identification of somatic variants in clinical tumor samples.
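The two metrics used throughout this comparison, and the effect of taking the union of two callers, can be illustrated with set arithmetic: PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The call sets below are invented toy data keyed by (chromosome, position, alternate allele); they are not results from the virtual tumor experiment, and the function name is hypothetical.

```python
def ppv_and_sensitivity(called, truth):
    """Return (PPV, sensitivity) for a set of called sites against a truth set."""
    tp = len(called & truth)   # called and true
    fp = len(called - truth)   # called but not true
    fn = len(truth - called)   # true but missed
    ppv = tp / (tp + fp) if called else 0.0
    sens = tp / (tp + fn) if truth else 0.0
    return ppv, sens


# Toy data only: four true variants and two hypothetical callers.
truth = {("1", 100, "A"), ("1", 200, "T"), ("2", 300, "G"), ("3", 400, "C")}
caller_a = {("1", 100, "A"), ("1", 200, "T"), ("5", 900, "G")}
caller_b = {("1", 200, "T"), ("2", 300, "G")}

call_sets = {"caller_a": caller_a, "caller_b": caller_b, "union": caller_a | caller_b}
results = {name: ppv_and_sensitivity(calls, truth) for name, calls in call_sets.items()}
for name, (ppv, sens) in results.items():
    print(f"{name}: PPV={ppv:.2f} sensitivity={sens:.2f}")
```

In this toy case the union recovers more true positives than either caller alone while inheriting the false positives of both, which is the trade-off the union strategy accepts.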

Figure 1.


Positive predictive value (PPV) and sensitivity (SENS) for tumor/normal sequencing depth. Virtual tumor experiment PPV (solid line) and SENS (dotted line) percentages for single nucleotide variant callers MuTect (version 1.1.7) (black lines) and Strelka (version 1.0.14) (red lines) over a range of tumor/normal sequencing coverage. PPV increases with increased normal sequencing depth. SENS increases with increased tumor sequencing depth.

The distributions of true positive calls and false positive calls at 80×:40× and 60×:30× for MuTect (Figure 2) and Strelka (Supplemental Figure S3) in the virtual tumor are shown for comparison. To obtain sufficient sampling size, all calls were then binned at 10× coverage intervals, and the minimum 95% CI of true positive calls within those intervals was determined (Table 2). A standard Gaussian lower bound CI was determined at 10 read count bins. This calculation was performed based on the VAF and alternate allele count true positive distributions, respectively. MuTect accurately (lower CI 95%) identified 15% VAF with ≥30 reads at a given true positive variant position. A 95% CI was obtained by MuTect at approximately four alternate allele reads for 30 total reads. Strelka had a 95% CI of 17% VAF and seven alternate alleles for 40 total reads. Sequencing T/N samples at 80×:40× with the above VAF and alternate allele count thresholds specific for each caller will be used for somatic variant detection in clinical diagnostic sequencing. At coverage of 80× tumor and 40× normal, the 95% CI thresholds of 15% VAF and 40 total reads were comparable with those of other assays. For instance, the Weill Cornell Medicine assay (EXaCT-1)37 showed a power calculation of 10% to 12% VAF at 28× total coverage (P ≤ 0.05) needed to avoid false-negative results and to report a clinical threshold for variant calling of 10% VAF and 30 total reads. The MSK-IMPACT assay14 described a power calculation for variant discovery based on a Q20 sequencing error rate of ≤1% for 15% VAF, resulting in a depth of coverage between 44× and 51× to achieve a CI between 90% and 95%.
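One way to picture the per-bin lower bound is the sketch below, which groups true positive calls into 10-read bins and reports mean − 1.96 × SD within each bin. This is an assumption about the calculation, not the authors' documented procedure: the source states only that a standard Gaussian lower bound was used, so the 1.96 multiplier, the bin edges, the minimum number of calls per bin, and all names here are illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev

Z_95 = 1.96  # assumed normal quantile; the source does not state the multiplier


def lower_bounds_by_bin(calls, bin_width=10, min_calls=3):
    """calls: iterable of (total_reads, value) for true positive calls.

    Returns {bin upper edge: mean - Z_95 * SD}, with 0.0 where a bin has too
    few calls to estimate a spread (mirroring the zero rows in Table 2).
    """
    bins = defaultdict(list)
    for total_reads, value in calls:
        upper = ((total_reads - 1) // bin_width + 1) * bin_width  # e.g. 31-40 -> 40
        bins[upper].append(value)
    bounds = {}
    for upper, values in sorted(bins.items()):
        if len(values) < min_calls:
            bounds[upper] = 0.0
        else:
            bounds[upper] = max(0.0, mean(values) - Z_95 * stdev(values))
    return bounds


# Invented toy calls: (total read count, VAF).
toy_calls = [(35, 0.40), (38, 0.30), (40, 0.50), (45, 0.45), (12, 0.50)]
bounds = lower_bounds_by_bin(toy_calls)
print(bounds)
```

The same function can be applied to alternate allele counts instead of VAFs by passing (total_reads, alt_count) pairs.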

Figure 2.


Virtual tumor variant allele frequency (VAF) and alternate allele read count versus read count (RC). A: VAF of true positive (TP; red dots) and false positive (FP; black dots) calls made by MuTect versus total RC at tumor/normal coverage (60×:30×, 80×:40×). B: Alternate allele read count of TP (red dots) and FP (black dots) calls made by MuTect versus total RC at tumor/normal coverage (60×:30×, 80×:40×).

Table 2.

Virtual Tumor 95% CI (Bottom) per VAF and Read Count 80×:40×

Total read count (bin)   MuTect VAF   MuTect ALT_COUNT   Strelka VAF   Strelka ALT_COUNT
10                       0            0                  0             0
20                       0            0                  0             0
30                       0.152        4.33               0             0
40                       0.139        5.28               0.176         6.80
50                       0.158        7.57               0.173         8.30
60                       0.165        9.58               0.180         10.42
70                       0.165        11.23              0.173         11.78
80                       0.156        12.16              0.170         13.27
90                       0.154        13.52              0.166         14.64
100                      0.145        14.22              0.157         15.34

MuTect version 1.1.7, Strelka version 1.0.14. CIs calculated from the Gaussian distribution of true positive calls. Zero values represent insufficient data to perform CI calculation.

ALT_COUNT, alternate allele count; VAF, variant allele frequency.

The VAFs and read count per variant were examined in all tier 1 to tier 3 variants called in a subset (n = 64) of the samples (Supplemental Figure S4). Only 8 of the 325 variants (2.5%) fell below the 40× coverage and 15% VAF thresholds, with two below the MSK-IMPACT power analysis threshold. Therefore, at 80× mean targeted coverage, the observed 95% CI thresholds for variant calling VAF (15%) and read count (40) were comparable with those of two other DNA NGS assays. As with these other assays, all variants discovered below the thresholds were labeled as investigational.
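The thresholding rule described above can be written as a small filter. This is a hedged sketch only: the thresholds of 15% VAF and 40 total reads come from the text, but the function name, the "reportable" label, and the use of inclusive comparisons are assumptions for illustration and do not describe the laboratory's reporting software.

```python
MIN_VAF = 0.15         # VAF threshold quoted in the text
MIN_TOTAL_READS = 40   # total read count threshold quoted in the text


def confidence_label(alt_reads, total_reads):
    """Label a call using the VAF and read count thresholds above.

    Calls below either threshold are labeled 'investigational', as in the text;
    'reportable' is a hypothetical label for calls meeting both thresholds.
    """
    if total_reads <= 0:
        return "investigational"
    vaf = alt_reads / total_reads
    if vaf >= MIN_VAF and total_reads >= MIN_TOTAL_READS:
        return "reportable"
    return "investigational"


# Invented example calls: (alternate reads, total reads).
for alt, total in [(16, 80), (4, 80), (10, 30)]:
    print(alt, total, confidence_label(alt, total))
```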

Depth of Coverage

The average read depth (mean and SD) for all variants identified for cancer census genes was calculated in a 64 sample T/N validation data subset (53 FF tissue and 11 FFPE) (Figure 3). Mean coverage per gene variant when targeting 80× ranged from 59× to 461× among 60 samples (Figure 3A). Variability in coverage was dependent on tumor ploidy per sample. The percentage of whole-genome coverage at clinical SNV calling thresholds of 30× (PCT_30X) and 40× (PCT_40X) per sample was >85% for 62 of 64 samples in this validation set; on average across all samples, 97% and 95% of the genome was covered at least 30× and 40×, respectively. Mean coverage for FF samples was 97% PCT_30X and 96% PCT_40X, and for FFPE samples it was 96% PCT_30X and 93% PCT_40X with a larger sample coverage variability (Figure 3B).

Figure 3.


Whole-genome sequencing coverage. A: Representation of the coverage (mean and SD) for all variants in cancer census genes based on our validation sample data. Red filled circles indicate mean coverage of genes in our validation sample set. Black lines indicate SD limits from mean. Mean coverage per gene when targeting 80× is shown. B: Percentage of genome sequencing coverage for all 64 samples at 30× read depth (PCT_30X) and 40× read depth (PCT_40X). FF, fresh or frozen; FFPE, formalin-fixed, paraffin-embedded.

Limit of Detection

To determine the limit of detection of tumor purity for variant identification (sensitivity), an artificial T/N dilution experiment was performed by using two HapMap Project samples. Genomic DNA from sample NA12892 (tumor) was diluted with sample NA12891 (normal) at mixing ratios of 50:50 (T/N), 20:80, and 10:90. Genomic DNA dilutions were performed before any sequencing preparation protocols. Because the VAFs were created in the artificial tumor by dilution, using a 100% homozygous position allowed for the most consistent and accurate VAF in the artificial tumor. Therefore, a 20% artificial tumor will contain 20% VAFs at the homozygous position (diluting a 100% homozygous position 80:20 will create a 20% VAF at that position). Variant calling analysis was performed to determine the effect of each dilution level on the identification of homozygous positions in NA12892. To examine the effect of coverage depth, identification of homozygous positions in NA12892 by two callers (MuTect and Strelka) was performed at 80×:40×. Results from the dilution experiments indicated that >95% sensitivity at 20% tumor content was achieved by each caller at 80×:40× coverage (Supplemental Figure S5). Furthermore, at 30% tumor content >95% sensitivity was reached at a sequencing depth of 40×, resulting in a 30% tumor content threshold to achieve optimal sensitivity of WGS at 80×:40×.
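The dilution arithmetic can be made explicit with a short sketch. It assumes both samples are diploid at the site and contribute DNA in proportion to the mixing ratio; the function names and the generalization to non-homozygous genotypes are illustrative additions, not part of the described experiment.

```python
def expected_vaf(tumor_fraction, tumor_alt_fraction=1.0, normal_alt_fraction=0.0):
    """Expected VAF after mixing tumor and normal DNA.

    tumor_alt_fraction is 1.0 for a homozygous variant site in the tumor sample
    and normal_alt_fraction is 0.0 for a homozygous reference site in the normal.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return (tumor_fraction * tumor_alt_fraction
            + (1.0 - tumor_fraction) * normal_alt_fraction)


def expected_alt_reads(tumor_fraction, depth, **genotypes):
    """Expected number of alternate-allele reads at a given total depth."""
    return expected_vaf(tumor_fraction, **genotypes) * depth


# Mixing ratios from the text: 50:50, 20:80, and 10:90 (tumor:normal).
mixes = {fraction: expected_vaf(fraction) for fraction in (0.50, 0.20, 0.10)}
for fraction, vaf in mixes.items():
    print(f"{fraction:.0%} tumor -> VAF {vaf:.0%}, "
          f"about {expected_alt_reads(fraction, 80):.0f} alt reads at 80x")
```

At 80× and 10% tumor content, only about eight supporting reads are expected even at a homozygous site, which helps explain why sensitivity falls off at low tumor content.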

In addition, one FFPE and one FF sample were each diluted three times with their corresponding normal genomic DNA. VAF distributions are presented for the original sample and the three dilutions as called by MuTect-Strelka (Supplemental Figures S6 and S7), showing appropriate dilutions of the original sample to 30%, 20%, and 10% tumor content. Evidence of proper dilution was seen in a shift to lower VAFs in the distribution (Supplemental Figures S6 and S7). Furthermore, the bimodal nature of the distribution, often associated with subclonal heterogeneity (or ploidy variations) in the tumor, could no longer be seen in the diluted sample distributions. Sample-specific SNV, indel, and copy number calls have been reported for selected cancer genes at each dilution found in each original sample. A linear decrease consistent with dilution fractions for allelic frequency and log2(T/N) CNV values was observed in all three dilutions, with copy number log2(T/N) ratios for the deletion events approaching a log2(T/N) value of 0 (or equal read coverage in the 10% tumor compared with the 100% normal). The highly amplified EGFR sample exhibited a log2 ratio of 1.5 at 10% and >4 at 75% tumor content (consistent with approximately 45 copies of amplified EGFR in the sample) (Figure 4 and Supplemental Table S3).

Figure 4.


Limit of detection. A and B: Selected sample-specific variants of single nucleotide variant (SNV) and insertion/deletion (indel) (A) and copy number variant (CNV) (B) at 30%, 20%, and 10% tumor content in two samples, respectively. A: Variant allele frequencies in two samples [frozen and formalin-fixed, paraffin-embedded (FFPE)] with original tumor content of 91% and 65%, respectively, are shown with decreasing tumor content. B: Copy number log2(tumor/normal; T/N) of one focal amplification (EGFR, FFPE sample) and three deletions [two whole arm (1p, 19q, frozen sample) and one focal deletion (CDKN2A, FFPE sample)]. Inset shows 19q and 1p CNV log2(T/N) values at 30% to 10% tumor content for better resolution.

A known oncogenic IDH1 R132H variant in the FF sample was captured at all dilution-affected allelic frequencies, with total read counts above the established confidence threshold of >40 and with VAFs below the 15% threshold in samples with <30% tumor content (Supplemental Table S3). In addition, CNV log2(T/N) ratios were well within the theoretical calculated values based on the equations for estimating tumor purity (http://cnvkit.readthedocs.io/en/stable/heterogeneity.html#adjusting-copy-ratios-and-segments-for-normal-cell-contamination; last accessed December 20, 2017) in all dilutions in both the FF and FFPE samples (Supplemental Figures S8 and S9). The 1p and 19q deletions in the FF sample (Figure 4 and Supplemental Figure S8) showed an increase in log2 value on subsequent dilutions with normal DNA. As tumor content decreased, log2 values deviated from the expected log2 value based on purity estimates, especially when calling deletions. At 10% tumor content, calling copy number deletion changes in the sample seemed impossible because of the incorporation of normal reads in any region of the tumor genome that would otherwise show lower than normal coverage because of a chromosomal or focal loss. Of interest was the FFPE sample that contained an overamplified EGFR CNV. This overamplification of >50 copies was captured in all dilutions. In addition, the known Y270C variant, often observed as subclonal, was identified by MuTect but rejected at 2% VAF, whereas it was called by Strelka in the original sample of 60% to 65% purity (Supplemental Table S3). Furthermore, the CDKN2A homozygous deletion log2 values also began to increase with subsequent tumor dilutions and resulted in log2 values not predictive of a homozygous loss at <30% tumor content (Figure 4 and Supplemental Figure S9). 
Therefore, a threshold of ≥30% tumor content may be established for all SNV, indel, and CNV calling in the clinical context; however, it is recommended that all copy number alterations (which are not considered fully validated by New York State Department of Health guidelines) used in a diagnostic setting be correlated with immunohistochemistry and/or a validated orthogonal assay if possible.
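The theoretical log2(T/N) values referred to above follow from a simple mixing relation: a sample with tumor purity p, tumor copy number n, and a diploid normal background is expected to show log2((p × n + (1 − p) × 2) / 2). The sketch below implements that relation under the assumption of a diploid normal and a single clonal copy number state; the function name is hypothetical and this is not the CNVkit code itself.

```python
import math


def expected_log2_ratio(purity, tumor_copies, normal_copies=2):
    """Expected log2(T/N) for a region at a given tumor purity.

    Assumes tumor cells carry `tumor_copies` of the region, normal cells carry
    `normal_copies`, and the matched normal is pure.
    """
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed <= 0:
        return float("-inf")  # pure tumor with a homozygous deletion
    return math.log2(mixed / normal_copies)


# Homozygous deletion (0 copies) and a high-level amplification (45 copies,
# the approximate EGFR copy number quoted in the text) at several purities.
for purity in (0.75, 0.30, 0.20, 0.10):
    deletion = expected_log2_ratio(purity, 0)
    amplification = expected_log2_ratio(purity, 45)
    print(f"purity {purity:.0%}: deletion {deletion:+.2f}, amplification {amplification:+.2f}")
```

Under these assumptions a region with 45 copies gives roughly 1.66 at 10% purity and roughly 4.1 at 75% purity, broadly in line with the EGFR values reported above, whereas a homozygous deletion at 10% purity gives only about −0.15, which is hard to distinguish from noise around 0.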

Reproducibility

To validate sequencing reproducibility, both an intra-assay and inter-assay validation was performed (Table 3). In the protocol the intra-assay and inter-assay experiments were designed to test the precision and reproducibility of one technical variable at a time. The intra-assay experiments were designed to test the precision of the library preparation and the inter-assay experiments tested the reproducibility of the sequencing runs independently. The intra-assay reproducibility included three separate library preps (p1, p2, p3), which were prepared on three different days by two different technicians. All three of these libraries were then run on the same day on different sequencing machines. The preferred experimental design would have dictated running all replicates on the same flowcell and the same instrument. However, because of the limitation of only being able to fit two T/N pairs per flowcell, libraries had to be loaded over multiple runs (which would not be needed for instance in a whole-exome or targeted panel sequencing validation). The inter-assay validation included running a single library preparation (p1) on the same sequencer on three different days, thus testing the reproducibility of the instrument. The reproducibility assays at the time of sequencing were based on designs from Washington University and Mount Sinai validation protocols for inter-run and intra-technician reproducibility.13, 38 Reproducibility was examined by evaluating genotyping markers in the four tumor and normal pairs and somatic aberrations in the four tumor samples. The genotyping concordance measurements were based on a fixed set of known germline markers39 with high population frequency, and all were homozygous or heterozygous with allelic frequencies between 40% and 100%. 
Performing tumor–tumor and normal–normal reproducibility analysis on the samples from separate sequencing runs (inter or intra sequencing comparison) revealed sample concordance to be >99% for FF samples and FFPE samples (Supplemental Figure S10, A and B). For each inter and intra sample set, concordance (intersection/union) was calculated for SNVs, indels, copy number aberrations, and SVs. SNV reproducibility was compared between intra and inter sequencing runs by using the MuTect+Strelka caller output. Unlike the germline variants, somatic variant calling was more highly dependent on sequencing quality output and total genome coverage (see above) in both the tumor and normal sample. Therefore, SNVs were analyzed in these runs that met clinical quality control metrics and variant calling filtering criteria as calculated in the analytical validation above. The concordance between two runs was measured by calculating the percentage of all calls in sample A as captured in sample B, resulting in an all versus all comparison per sample set. All high/moderate/low/modifier SNVs were first examined, as annotated by SnpEff at all VAFs and total read depth. An additional filtering criterion more closely resembling the clinical validation, examining only nonsynonymous variants at ≥15% VAF and ≥30 total read count in sample A and sample B, was then imposed. This second criterion was added because per-gene coverage variability between samples produced inconsistent VAFs across runs, and it showed the reproducibility of calling variants within clinically confident metrics. Reproducibility was 92% to 95% in inter/intra sequencing runs for each FF and FFPE sample set (Supplemental Table S4). Similar analysis was performed for calculating reproducibility of somatic indels. Eighty-five percent to 97% somatic indel calling reproducibility was achieved. 
Examining heterozygous and homozygous germline variants tested the technical sequencing reproducibility of the assay because these are known stable variants of high allelic frequency. In addition, calculating the concordance of somatic variant calling between inter and intra sequencing samples showed the ability to adequately reproduce somatic variant identification at clinical confidence percentages. Reproducibility of copy number log2(T/N) values per gene was also assessed across the inter and intra sample sets. Only copy number calls for the 570 COSMIC cancer census genes v75 were examined, and genes with multiple isoforms or ambiguous mapping were filtered, leaving a total of 559 genes. Reproducibility of copy number identification was >98% for FF samples in both the inter and intra runs and between 89% and 100% for FFPE samples. The reproducibility of variant calling was also analyzed by using eight matched FFPE and FF tumor sample pairs, each pair from the same patient. Within this small sample set, the mean reproducibility between variants called in cancer census genes in the FFPE sample and those called in the FF sample was 68%. When the clinical thresholds (≥15% VAF and ≥30 total read count) were applied, the mean reproducibility increased to 80% (Supplemental Table S5). This reproducibility was similar to that seen in a larger WGS FFPE–FF matched tumor analysis.40 In addition, in four of the matched samples for which an orthogonal assay was available for variant confirmation, FFPE and FF samples were 100% concordant; therefore, both FFPE and frozen samples produced the same overall concordance to the orthogonal assay. The lower overlap in calls between the two sample types in these data compared with the 2018 study by Robbe et al40 can likely be attributed to FFPE sample quality and to variations in sample storage, handling, and length of fixation time before extraction.
These analyses also revealed high variability in nonconcordant variants (variants unique to either the frozen sample or the FFPE sample), with four pairs exhibiting more nonconcordant variants in the FFPE sample and four pairs exhibiting more in the frozen sample. Furthermore, the median VAF of nonconcordant variants across all eight samples was slightly greater in the FFPE samples than in the FF samples (16% versus 11%), whereas the median VAF of the concordant variants was 26% (mean, 29.1% ± 15.5%). The mean VAF was 17.5% ± 7.2% for FFPE-unique calls and 14.4% ± 10.0% for FF-unique calls. However, a more stringent and complete analysis, using a larger sampling of matched FFPE and FF samples across cancer types and examining the effects of FFPE-induced artifacts on variant calling, is necessary but beyond the scope of this article.

Table 3.

Reproducibility Experiment: Sequencing Validation Design

Inter-assay comparisons run down the Sequencer 1 column (Days 1 to 3); intra-assay comparisons run across the sequencers on Day 2.

Day    Sequencer 1           Sequencer 2    Sequencer 3
Day 1  CA-0101-p1 (frozen)
       CA-0122-p1 (frozen)
       ONC16-13-p1 (FFPE)
       GBM-35-p1 (FFPE)
Day 2  CA-0101-p1 (frozen)   CA-0101-p2     CA-0101-p3
       CA-0122-p1 (frozen)   CA-0122-p2     CA-0122-p3
       ONC16-13-p1 (FFPE)    ONC16-13-p2    ONC16-13-p3
       GBM-35-p1 (FFPE)      GBM-35-p2      GBM-35-p3
Day 3  CA-0101-p1 (frozen)
       CA-0122-p1 (frozen)
       ONC16-13-p1 (FFPE)
       GBM-35-p1 (FFPE)

Sequencing was performed on the same library preparation (p1) for four samples (two frozen and two FFPE) on the same sequencing machine on three separate days (inter-run reproducibility). Two additional library preparations were performed for each sample (p2, p3) and sequenced on the same day on different sequencing machines (intra-technician reproducibility).

FFPE, formalin-fixed, paraffin-embedded.

Copy Number Validation

To assess the accuracy of CNV identification, tumor specimens with CNVs known from karyotype or microarray were used, and a copy number calling validation based on the concordance between DNA sequencing and genotyping analysis was developed. The Illumina HumanOmni 2.5M BeadChip (WG-313-2511; Illumina) whole-genome genotyping chip was used as an orthogonal method to validate the WGS CNV calls made with the BIC-seq2 algorithm.28 Copy number calling from the genotyping chip was performed by using the ASCAT tool.34 A comparison of gene copy number calls between the two platforms showed that the correlation of the mean values was r = 0.89, with a total correlation of r = 0.70 (Figure 5).

Figure 5.

DNA whole-genome sequencing (WGS) copy number correlation to genotyping chip copy number. The correlation of allele-specific copy number analysis of tumors (ASCAT) discrete copy number per gene to copy number per gene identified by DNA WGS is shown for 40 samples with tumor purity of 30%. The red squares represent the mean values in the distributions. The correlation of the mean values was r = 0.89, with total correlation at r = 0.70. The plot is shown with the y axis range limited to copy numbers from 0 to 30. T/N, tumor/normal.

PPV and sensitivity were calculated between the two methods for the three discrete types of copy number calls: gain (three copies or greater), neutral (two copies, diploid), and loss (one copy or less) (Supplemental Table S6). Only copy number calls for the 570 COSMIC cancer census genes v75 were examined. Both methods and callers reported all genes, but genes with multiple isoforms or ambiguous mapping were filtered out, leaving a total of 559 genes. The resolution of the genotyping chip was lower than that of the DNA WGS data. For instance, amplified genes such as EGFR reached a maximum of seven or nine copies by ASCAT but could be called at adjusted copy number values of 53 or 111 by DNA WGS. Therefore, only gains and losses have been reported because this was not a quantitative test. PPV and sensitivity are shown for the three reporting criteria. PPV values were 95%, 96%, and 86% for calling loss, neutral (diploid), and gain CNV results, respectively (Supplemental Table S6). When reporting CNVs from WGS, BIC-seq2 output was used to call gains and losses. Our clinical reporting protocol, as required by the New York State Department of Health, always included an orthogonal assay for variant confirmation. Each reported gain or loss was then compared with the genotyping chip ASCAT output. The WGS data were considered confirmed if the WGS and genotyping chip calls were concordant. However, if there was any discrepancy between the two, the sensitivity of WGS was set at ≥30% purity and the copy number variation was reported with a recommendation for confirmation by immunohistochemistry or another orthogonal assay.
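The per-class PPV and sensitivity calculation described above can be sketched as follows. This is an illustrative example only: it assumes per-gene copy numbers have already been rounded to integers, it treats the genotyping chip (ASCAT) call as the reference for counting true and false positives, and the function names and copy number values are hypothetical.

```python
from typing import Dict, List


def cn_class(copies: int) -> str:
    """Map an integer copy number to the three reporting classes in the text."""
    if copies >= 3:
        return "gain"     # three copies or greater
    if copies <= 1:
        return "loss"     # one copy or less
    return "neutral"      # two copies, diploid


def ppv_sensitivity(wgs: List[str], chip: List[str], label: str) -> Dict[str, float]:
    """PPV and sensitivity of WGS calls for one class, with chip calls as reference."""
    pairs = list(zip(wgs, chip))
    tp = sum(1 for w, c in pairs if w == label and c == label)
    fp = sum(1 for w, c in pairs if w == label and c != label)
    fn = sum(1 for w, c in pairs if w != label and c == label)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return {"ppv": ppv, "sensitivity": sensitivity}


# Made-up per-gene copy numbers for six genes on each platform.
wgs_copies = [4, 2, 1, 2, 3, 0]
chip_copies = [3, 2, 1, 3, 2, 1]

wgs_calls = [cn_class(c) for c in wgs_copies]
chip_calls = [cn_class(c) for c in chip_copies]

for label in ("loss", "neutral", "gain"):
    print(label, ppv_sensitivity(wgs_calls, chip_calls, label))
```

Collapsing to three classes before comparison reflects the point made in the text that the two platforms differ in resolution for highly amplified genes, so absolute copy numbers are not compared directly.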

SV Validation

Our benchmarking of SV calling from DNA relied on concordance, reproducibility, and downsampling experiments to establish the limit of detection of orthogonally confirmed SVs. We used specimens with SVs identified by fluorescence in situ hybridization, exomes, or panels from other clinical laboratories as truth sets for the validation of SVs in DNA-seq data. Limit of detection, sensitivity (orthogonal assay concordance), and reproducibility analyses were performed for selected clinically relevant variants such as EGFRvIII deletion, EWSR1-WT1 fusion in DNA, EGFR inversion in an overamplified locus of DNA, and ETV6-RUNX1 translocation. Reproducibility experiment results showed that 19 of 20 samples sequenced contained the respective variant. The sample that did not contain the selected variant had a tumor purity of 28% to 30%, which was at or below our detection limit threshold, and the variant was still found in four reproducibility runs. Benchmarking of SV callers for all variants showed the capability of calling SVs with as few as three reads at 60×/30× T/N coverage. For RNA-seq data, intra-assay and inter-assay reproducibility was assessed in four samples (two FF and two FFPE, all total RNA preparations) for known clinically significant fusion variants: CCDC6-RET, ETV6-NTRK3, EWSR1-ATF1, and ERG-EWSR1. The respective fusions were identified in all reproducibility runs. In addition, fusions such as EWSR1-WT1, NUP98-KDM5, ETV6-RUNX, and SREBF1-CIC seen in an orthogonal assay were confirmed by our assay. Finally, because of the lack of a large cohort of samples that contained all possible SVs (both DNA and RNA), the performance-based benchmarking (accuracy and specificity) reported in peer-reviewed published analyses was considered.

Variant Identification Concordance with Orthogonal Sequencing Panels

The somatic variant calling results for 71 samples were compared with results obtained by an orthogonal clinically validated sequencing assay. These specimens were obtained from other clinical laboratories (Memorial Sloan Kettering Cancer Center, Columbia University Medical Center) and/or research studies for which data from Foundation Medicine or other panels/clinical testing were available. The comparison of our assay with the results available for all 68 total samples is shown in Supplemental Table S7. Comparison of WGS SNV/indel/SV variant calls with those made by an orthogonal assay showed a total of 84% concordance. When a panel report did not contain a variant and no actionable variant was reported by our assay, the test was considered concordant with the panel. Most discordant calls were low VAF (<5%) variants that are usually filtered out by the SNV/indel callers because of the low alternate allele read count. Because these panel calls were not strictly orthogonal to WGS, it was not surprising that high coverage targeted panels identified lower VAF subclonal variants with deeper sequencing. Because many other clinical laboratory reports did not provide VAFs for the variants they reported, there was no way of knowing the allelic frequency at which these variants were identified or whether they could be considered true-positive variants. However, these low VAF variants would be below our thresholds of 95% CI and may be identified as investigational. In addition, some of the orthogonal tests reported germline variants that were identified and filtered out (panel-reported germline variants were not counted in the concordance statistics). The concordance with CNVs identified in orthogonal panel testing was 99%. Finally, for all eight gene fusions reported in these orthogonal assays, combined DNA/RNA sequencing yielded 100% concordance, and both DNA and RNA sequencing data identified these eight fusions.

RNA Sequencing

The 15 most stably expressed housekeeping genes as reported by Hsiao et al41 were chosen to determine the coverage threshold criteria for our RNA-Seq assay. The coverage requirement for an acceptable RNA run was ≥40 million mapped reads. An average coverage per sample was not used; instead, the median coverage for the 15 housekeeping genes across the validation cohort is shown (Supplemental Figure S11, A–C). The genes were annotated by using Gencode version 19, and only central exons (excluding those adjacent to untranslated regions) and exons that would be least affected by alternative isoform usage were selected. A minimum of 40 million reads per sample produced sufficient coverage in the selected set of housekeeping genes, ranging from 30× to 1000×. A minimum number of reads for the housekeeping genes was established based on the median of all 15 genes as the threshold for pass/fail. Therefore, a passing median implies that eight or more genes are at or over our read threshold. Analysis of all validation samples confirmed sufficient coverage in housekeeping genes, ranging from 30× to 1000× in more than one-half of the housekeeping genes in each sample (Supplemental Figure S11D).
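The median-based pass/fail check described above can be sketched as follows. This is a minimal illustration, assuming a cut-off of 30 reads for the median housekeeping gene coverage (the lower end of the 30× to 1000× range quoted above) together with the stated minimum of 40 million mapped reads; the gene labels and coverage values are hypothetical placeholders, not data from the validation cohort.

```python
from statistics import median
from typing import Mapping

MIN_MAPPED_READS = 40_000_000  # run-level requirement stated in the text
MIN_MEDIAN_COVERAGE = 30       # assumed cut-off for the housekeeping gene median


def rna_run_passes(mapped_reads: int, gene_coverage: Mapping[str, float]) -> bool:
    """Return True when both the library size and the median housekeeping
    gene coverage meet their thresholds."""
    if mapped_reads < MIN_MAPPED_READS:
        return False
    return median(gene_coverage.values()) >= MIN_MEDIAN_COVERAGE


# Fifteen placeholder genes (HK01..HK15) with made-up coverage values.
values = [12, 18, 25, 28, 29, 29, 29, 31, 45, 60, 120, 300, 500, 800, 1000]
coverage = {f"HK{i:02d}": v for i, v in enumerate(values, start=1)}

genes_over_threshold = sum(1 for v in coverage.values() if v >= MIN_MEDIAN_COVERAGE)

print(rna_run_passes(45_000_000, coverage))  # library size and median both checked
print(genes_over_threshold)                  # number of genes at or above the cut-off
```

With 15 genes the median is the eighth value in sorted order, which is why a passing median guarantees that at least eight genes reach the cut-off even when several genes are poorly covered.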

To confirm somatic SNVs and small indels identified in WGS, the number of RNA reads supporting the variant (with base quality >20) was measured, along with the total number of reads (coverage) at the position. If there were fewer than five total reads, “no reads” was indicated. If there were five or more total reads, it could be determined whether the variant was “confirmed by RNA.” The variant was called “confirmed by RNA” if there were five or more total reads, at least two of which were variant reads. If there were five or more total reads but fewer than two variant reads, the variant was declared “not confirmed.” The category “not confirmed” could correspond to a false-positive variant or an absence of expression of the variant allele (Supplemental Table S8 and Supplemental Figure S12). Of the tier 1 to 3 DNA variants with a VAF ≥15%, 36% could be confirmed with RNA sequencing with sufficient read coverage.
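The three-way RNA confirmation rule above can be written as a small decision function. This is a sketch only, assuming the read counts passed in have already been restricted to bases with quality >20; the function name is hypothetical.

```python
def rna_confirmation(total_reads: int, variant_reads: int) -> str:
    """Classify a DNA variant by its RNA-seq support at the same position.

    Assumes both counts already exclude bases with quality <= 20.
    """
    if total_reads < 5:
        return "no reads"          # too little coverage to decide
    if variant_reads >= 2:
        return "confirmed by RNA"  # five or more reads, at least two variant reads
    return "not confirmed"         # covered, but fewer than two variant reads


for total, variant in [(3, 2), (5, 2), (40, 1), (120, 35)]:
    print(total, variant, rna_confirmation(total, variant))
```

As the text notes, a "not confirmed" result does not by itself distinguish a false-positive DNA call from a variant allele that is simply not expressed.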

Final Variant Classification, Therapy Associations, and Reporting

Post-processing of variant classification and prioritization for reporting pathogenic variants and potential associated therapies and clinical trials followed criteria similar to those specified in previous studies.11, 42 Initial clinical classification of each identified somatic variant was performed by using a clinical tier system designed at NYGC. Tier 1 variants were clinically important variants in the cancer type being studied (eg, EGFR T790M is known to be clinically important in lung cancer43). The same variant observed in a cancer type not known to manifest this variant was classified as tier 2 (eg, the clinical importance of EGFR T790M is unknown in glioblastoma). Tier 3 variants were in targetable genes; however, the specific variant was not known to be targetable (eg, a functionally unknown or unannotated mutation in EGFR). Tier 4 variants were in genes cataloged by COSMIC cancer census and were not included in tiers 1 to 3.25 All other variants were in tier 5 and were considered variants of uncertain significance. Splice site variants were identified in 20 different known oncogenes and tumor suppressor genes (Supplemental Table S9).44 Each specimen's somatic aberrations were matched to potential drugs by identifying the most aberrant genes from a combination of SNV, indel, CNV, SV, and RNA-Seq data and by searching the expert-curated NYGC drug-to-gene database for all cancer drugs associated with the identified genes.
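The tier logic described above can be summarized as an ordered series of checks. The sketch below is illustrative only: the four boolean fields are hypothetical stand-ins for lookups against curated knowledge bases, and it does not represent the NYGC implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantContext:
    # Hypothetical flags; in practice each would come from a curated resource.
    important_in_this_cancer: bool   # variant clinically important in the cancer studied
    important_in_other_cancer: bool  # same variant clinically important in another cancer
    gene_is_targetable: bool         # gene is targetable, specific variant is not known to be
    gene_in_cosmic_census: bool      # gene cataloged by COSMIC cancer census


def assign_tier(v: VariantContext) -> int:
    """Return the first tier (1 to 5) whose condition the variant satisfies."""
    if v.important_in_this_cancer:
        return 1
    if v.important_in_other_cancer:
        return 2
    if v.gene_is_targetable:
        return 3
    if v.gene_in_cosmic_census:
        return 4
    return 5  # variant of uncertain significance


# The EGFR T790M example from the text, seen in two different cancer types.
t790m_in_lung = VariantContext(True, False, True, True)
t790m_in_glioblastoma = VariantContext(False, True, True, True)

print(assign_tier(t790m_in_lung))          # tier 1
print(assign_tier(t790m_in_glioblastoma))  # tier 2
```

Evaluating the checks in order means a variant always receives the most clinically specific tier it qualifies for, which matches the statement that tier 4 excludes variants already placed in tiers 1 to 3.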
Prioritization and rationale of associated drugs were further based on manual assessment by the NYGC clinical team, which examined several criteria in concordance with recommendations from others,15, 42 including but not limited to strength of data for variants detected, FDA approval of the drug in the cancer type, FDA approval of the drug in another cancer type, identification of the drug as currently in a cancer trial, successful use of the drug to target the variant in the tumor type, and successful use of the drug in other cancer types associated with the variant. Confidence levels for therapeutic target annotations were based on the strength and number of data evidence and rationale (eg, a known FDA-approved targetable mutation such as BRAF V600E would be rated as high and a variant of unknown significance would be none to low, depending on the level of data evidence). The NYGC's final clinical diagnostic report thereby provided the treating oncologist with the standard of care therapy for each cancer type best fitting the patient's genomic profile, as well as clinical trial and potential experimental therapeutic options associated with all significant genomic variants identified.

Discussion

WGS provides an unbiased representation of a patient's somatic tumor profile. Combined with RNA transcriptome sequencing, a more complete depiction of the patient's disease from a genomic and molecular perspective is revealed. This broader diagnostic approach can be used to improve the diagnosis and potential therapy selection within patient precision medicine that is beginning to emerge.2, 45, 46 We have developed a clinical WGTS strategy and applied it for the reporting of targetable variants in cancer47 (NCT02725684). We show that sufficient sequencing coverage and clinical sensitivity are achieved for all known targetable and cancer census genes. Furthermore, adequate identification of SVs in both DNA and RNA can be performed in samples with sufficient tumor content. Whole-genome coverage of 95% is achieved at a read depth of ≥40×. Variant calling PPV increases with increased normal sequencing depth and sensitivity increases with additional tumor sequencing depth. A T/N sequencing coverage ratio of 80×:40× was determined with a detectable limit of ≥30% tumor content. A set of clinically validated WGTS confidence thresholds comparable with other sequencing panels for five types of somatic aberrations was established (Table 4), and all variants identified below these thresholds were considered investigational or putative. In addition, 84%, 99%, and 100% concordance to high coverage clinical sequencing orthogonal panels for SNV, CNV, and gene fusions, respectively, was achieved. The observed discordance between high coverage sequencing panels and our WGTS assay is attributed to low allelic frequency mutations, variability between assay platforms, variant calling thresholds, and final reporting criteria. 
Sensitivity for low frequency variant identification in WGS assays is expected to become comparable with that of deep sequencing assays as the cost of sequencing continues to decrease and as adding read depth across the entire genome becomes more efficient. Although limitations in indel calling and SV identification are still evident across all platforms,48 constant development of improved calling algorithms49, 50 and new version releases51 of current tools, plus added sequencing coverage, should enhance accuracy and clinical reproducibility.

Table 4.

Minimum Number of Total Reads Required to Detect a Variant

Type      VAF    Total reads   Method
SNVs      15%    ≥40           WGS
Indels    20%    ≥40           WGS
CNVs      NA     ≥30           WGS
SVs       NA     ≥3            WGS
Fusions   NA     ≥5            RNA-seq

CNV, copy number variant; Indel, insertion/deletion; NA, not applicable; SNV, single-nucleotide variant; SV, structural variant; VAF, variant allele frequency; WGS, whole-genome sequencing.

Based on Xi et al.28

Spanning pairs.
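The thresholds in Table 4 can be expressed as a small lookup that separates calls meeting the clinical confidence thresholds from those treated as investigational. This is a sketch under stated assumptions: the VAF column is read as a minimum (≥15% for SNVs, ≥20% for indels), the names `THRESHOLDS` and `reporting_status` are hypothetical, and the example inputs are illustrative.

```python
from typing import Optional

# Values transcribed from Table 4; "min_vaf" of None means the VAF column is NA.
THRESHOLDS = {
    "SNV":    {"min_vaf": 0.15, "min_reads": 40},
    "Indel":  {"min_vaf": 0.20, "min_reads": 40},
    "CNV":    {"min_vaf": None, "min_reads": 30},
    "SV":     {"min_vaf": None, "min_reads": 3},
    "Fusion": {"min_vaf": None, "min_reads": 5},
}


def reporting_status(kind: str, reads: int, vaf: Optional[float] = None) -> str:
    """Return 'meets threshold' or 'investigational' for one variant call."""
    rule = THRESHOLDS[kind]
    if reads < rule["min_reads"]:
        return "investigational"
    min_vaf = rule["min_vaf"]
    if min_vaf is not None and (vaf is None or vaf < min_vaf):
        return "investigational"  # assumption: a missing VAF cannot meet a VAF threshold
    return "meets threshold"


print(reporting_status("SNV", reads=52, vaf=0.22))
print(reporting_status("SNV", reads=446, vaf=0.03))  # deep coverage but low VAF
print(reporting_status("SV", reads=3))
print(reporting_status("Fusion", reads=4))
```

Keeping the thresholds in one table-like structure makes it straightforward to see that read depth alone is not sufficient for SNVs and indels, because the VAF condition is applied independently.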

Many targeted therapies are not specific to SNVs. Copy number aberrations, SVs, and fusion genes that cause activation of oncogenic proteins and loss of tumor suppressive proteins are emerging as potential targets of improved therapies.52, 53, 54 These aberrations are not often captured by exome or targeted panel sequencing because of insufficient coverage or coverage variability over an entire affected genomic locus when using only segmented sequencing data.55, 56 Specifically, single gene, high focal amplifications and deletions are vulnerable to misidentification, and structural variation identification within genes remains a challenge when sequencing only coding regions. Furthermore, additional analysis and functional annotation of noncoding regions will make WGTS crucial to clinical genomic practice.57 Therefore, the acquisition of whole-genomic data at greater sequencing resolution can provide additional information for the treatment of disease. Moreover, the patient's entire genomic profile, including copy number variation, SVs, subclonality, mutation burden, mutation signature, and tumor stability, may help in distinguishing why certain patients responded to targeted therapy and others given the same drug did not.51, 58, 59 Incorporation of WGTS into clinical trials to provide more comprehensive genomic profiling is warranted to improve the diagnosis, therapy selection, and potentially clinical outcomes for patients.

Acknowledgment

The authors thank Dr. Marc Ladanyi (Memorial Sloan Kettering Cancer Center, New York, NY) for providing tumor samples.

Footnotes

Supported in part by NIH grant P30 CA008748 (A.K.) and the Damon Runyon–Richard Lumsden Foundation (A.K.).

Disclosures: None declared.

Supplemental material for this article can be found at https://doi.org/10.1016/j.jmoldx.2018.06.007.

Contributor Information

Kazimierz O. Wrzeszczynski, Email: kwrzeszczynski@nygenome.org.

Vaidehi Jobanputra, Email: vjobanputra@nygenome.org.

Supplemental Data

Supplemental Figure S1

Virtual tumor design. Schematic of synthetic design of virtual tumor used in caller validation and precision/recall analysis. The synthetic tumor [new Binary Alignment Map (BAM) file] is generated by incorporating homozygous variant positions from a high coverage NA12891 BAM file in silico into the NA12892-R2 BAM file by using binomial sampling. This tumor was then analyzed against the NA12892-R1 as the matched normal. VAF, variant allele frequency.

mmc1.pdf (107.4KB, pdf)
Supplemental Figure S2

Binomial variant allele frequency (VAF) distribution of VAF virtual tumor homozygous variants (true positives). VAF distribution [after binomial sampling for VAF 5% (red), 10% (blue), 20% (green), 40% (violet)] within the virtual tumor of spiked in NA12891 bases at homozygous positions in NA12892. Density was calculated by using the R package density function which is a kernel-based smoothing function. Description of the algorithm can be found at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/density.html; last accessed December 20, 2017.

mmc2.pdf (46.6KB, pdf)
Supplemental Figure S3

Virtual tumor variant allele frequency (VAF) and alternate allele read count versus read count. A: VAF of true positive (TP; red dots) and false positive (FP; black dots) calls made by Strelka versus total read count at tumor/normal coverage (60×:30× and 80×:40×). B: Alternate allele read count of TP (red dots) and FP (black dots) calls made by Strelka versus total read count at tumor/normal coverage (60×:30× and 80×:40×).

mmc3.pdf (891.2KB, pdf)
Supplemental Figure S4

Whole-genome sequencing variant calls. A and B: Variant allele frequency and read count per single nucleotide variant (SNV) and insertion or deletion (indel) variant [tier 1 to 2 (A) and tier 3 (B)] identified in the New York Genome Center (NYGC) whole-genome sequencing validation set. Red lines represent the power calculations for variant discovery based on a Q20 sequencing error rate as depicted in a previous study.14 Vertical and horizontal dot-dash lines represent the NYGC variant calling thresholds of 15% variant allele frequency and a read count of 40, respectively. Note: All but one tier 1 to 2 variant (KIT p.Asp816His, c.2446G>C, 3% VAF, 446 total reads) fell within the power analysis threshold; this was a known low frequency secondary therapy resistance variant often observed in amplified KIT. In this sample the KIT gene was also amplified and this low frequency variant was called at >400 reads; in addition, this variant would be below the Memorial Sloan Kettering Cancer Center power calculation threshold (red line). This variant was called in all runs for this sample by our assay, including the inter/intra reproducibility analysis.

mmc4.pdf (264.1KB, pdf)
Supplemental Figure S5

Tumor/normal dilution experiment at 80×:40× of HapMap Project Samples: variant detection sensitivity. DNA from HapMap Project sample NA12892 (tumor; T) was diluted with sample NA12891 (normal; N) at mixing ratios of 50:50 (T/N), 40:60, 30:70, 20:80, and 10:90. Cumulative coverage [true positive (TP)/(TP + false negative (FN))] count is shown with increasing read depth for each TP call by MuTect at 80×:40× (solid lines). Coverage drop-off was evident at the respective sequencing coverage in all cases. At mix percentages of ≥30%, cumulative coverage sensitivity at ≥40× was equal.

mmc5.pdf (674.4KB, pdf)
Supplemental Figure S6

Variant allele frequency (VAF) distribution plots for tumor sample dilution (frozen sample, CA-0061T). The frozen sample was diluted three times with its corresponding normal genomic DNA. VAF plots (using R kernel density function) are presented for the original sample and the three dilutions as called by MuTect-Strelka. The VAF distribution was used to estimate tumor purity: for heterozygous somatic variants, purity was approximated as twice the mode of the VAF distribution. A: Original tissue sample (90% tumor purity). B: Dilutions to 30% to 25% tumor content. C: 20% tumor content. D: 10% tumor content.

mmc6.pdf (50.4KB, pdf)
Supplemental Figure S7

Variant allele frequency (VAF) distribution plots for tumor sample dilutions (formalin-fixed, paraffin-embedded sample, G15-31T). The formalin-fixed, paraffin-embedded sample was diluted three times with the corresponding normal genomic DNA. VAF plots (using R kernel density function) are presented for the original sample and the three dilutions as called by MuTect-Strelka. The VAF distribution was used to estimate tumor purity: for heterozygous somatic variants, purity was approximated as twice the mode of the VAF distribution. A: Original tissue sample (60% to 65% tumor purity). B: Dilutions to 30% tumor content. C: 20% to 15% tumor content. D: 10% tumor content.

mmc7.pdf (57.5KB, pdf)

Supplemental Figure S8.

Copy number and B-allele frequency plots for tumor sample dilutions (frozen sample, CA-0061T). The frozen sample CA-0061 was diluted three times with its corresponding normal genomic DNA. Copy number log2 (tumor/normal; T/N) ratio and B-allele frequency across the genome are represented as called by NBIC-Seq for the original and dilution samples. A: Original tissue sample (90% tumor purity). B: Dilutions to 30% to 25% tumor content. C: 20% tumor content. D: 10% tumor content.

Supplemental Figure S9.

Copy number and B-allele frequency plots for tumor sample dilutions (formalin-fixed, paraffin-embedded sample, G15-31T). The formalin-fixed, paraffin-embedded sample G15-31T was diluted three times with its corresponding normal genomic DNA. Copy number log2 (tumor/normal; T/N) ratio and B-allele frequency across the genome are represented as called by NBIC-Seq for the original and dilution samples. A: Original tissue sample (60% to 65% tumor purity). B: Dilutions to 30% tumor content. C: 20% to 15% tumor content. D: 10% tumor content.

Supplemental Figure S10

Concordance reproducibility of genotyping markers. A and B: Genotype marker concordance percentage matrix of tumor (A) and normal (B) sample inter and intra reproducibility sequencing runs. Generated with Conpair.39 Concordance across samples of all sequencing runs (inter and intra) in the reproducibility assay is depicted using Conpair which applies Hardy-Weinberg equation and integrates across >7300 preselected exonic markers. The concordance intensity is indicated by the red coloring in the matrix: bright red squares depict >95% concordance among markers. Any discordance or contamination between the samples or sequencing failure would result in lower percentage values.

mmc8.pdf (82.2KB, pdf)

Supplemental Figure S11.

RNA sequencing coverage. A: Median coverage of 15 housekeeping genes plus two olfactory genes (ORA5A2 and OR11H4) used as control in all samples. B: Median coverage of 15 housekeeping genes from mRNA-prepared libraries. C: Median coverage of 15 housekeeping genes from total RNA-prepared libraries. Libraries prepared from mRNA samples exhibit greater coverage than libraries prepared from total RNA samples. D: A minimum library size of 40 million reads (vertical blue line) provides minimum median coverage of 30 reads for the housekeeping genes (horizontal red line), independent of input sample type.

Supplemental Figure S12

RNA–DNA variant correlation. A and B: Variant allele frequency (VAF) determined by DNA and RNA sequencing for tier 1 to 3 variants (targetable genes) in our validation sample set (A) and tier 4 cancer census genes (B).

mmc9.pdf (47.5KB, pdf)
Supplemental Table S1
mmc10.docx (14.5KB, docx)
Supplemental Table S2
mmc11.docx (11.7KB, docx)
Supplemental Table S3
mmc12.docx (17.7KB, docx)
Supplemental Table S4
mmc13.docx (18.4KB, docx)
Supplemental Table S5
mmc14.docx (16.1KB, docx)
Supplemental Table S6
mmc15.docx (11.6KB, docx)
Supplemental Table S7
mmc16.docx (78.4KB, docx)
Supplemental Table S8
mmc17.docx (11.2KB, docx)
Supplemental Table S9
mmc18.xlsx (7.9KB, xlsx)
Data Profile
mmc19.xml (257B, xml)

References

  • 1. AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 2017;7:818–831. doi: 10.1158/2159-8290.CD-17-0151.
  • 2. Massard C., Michiels S., Ferte C., Le Deley M.C., Lacroix L., Hollebecque A., Verlingue L., Ileana E., Rosellini S., Ammari S., Ngo-Camus M., Bahleda R., Gazzah A., Varga A., Postel-Vinay S., Loriot Y., Even C., Breuskin I., Auger N., Job B., De Baere T., Deschamps F., Vielh P., Scoazec J.Y., Lazar V., Richon C., Ribrag V., Deutsch E., Angevin E., Vassal G., Eggermont A., Andre F., Soria J.C. High-throughput genomics and clinical outcome in hard-to-treat advanced cancers: results of the MOSCATO 01 trial. Cancer Discov. 2017;7:586–595. doi: 10.1158/2159-8290.CD-16-1396.
  • 3. Zehir A., Benayed R., Shah R.H., Syed A., Middha S., Kim H.R. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med. 2017;23:703–713. doi: 10.1038/nm.4333.
  • 4. Coyne G.O., Takebe N., Chen A.P. Defining precision: the precision medicine initiative trials NCI-MPACT and NCI-MATCH. Curr Probl Cancer. 2017;41:182–193. doi: 10.1016/j.currproblcancer.2017.02.001.
  • 5. Simon R., Roychowdhury S. Implementing personalized cancer genomics in clinical trials. Nat Rev Drug Discov. 2013;12:358–369. doi: 10.1038/nrd3979.
  • 6. Lopez-Chavez A., Thomas A., Rajan A., Raffeld M., Morrow B., Kelly R., Carter C.A., Guha U., Killian K., Lau C.C., Abdullaev Z., Xi L., Pack S., Meltzer P.S., Corless C.L., Sandler A., Beadling C., Warrick A., Liewehr D.J., Steinberg S.M., Berman A., Doyle A., Szabo E., Wang Y., Giaccone G. Molecular profiling and targeted therapy for advanced thoracic malignancies: a biomarker-derived, multiarm, multihistology phase II basket trial. J Clin Oncol. 2015;33:1000–1007. doi: 10.1200/JCO.2014.58.2007.
  • 7. Borad M.J., Egan J.B., Condjella R.M., Liang W.S., Fonseca R., Ritacca N.R. Clinical implementation of integrated genomic profiling in patients with advanced cancers. Sci Rep. 2016;6:25. doi: 10.1038/s41598-016-0021-4.
  • 8. He J., Abdel-Wahab O., Nahas M.K., Wang K., Rampal R.K., Intlekofer A.M. Integrated genomic DNA/RNA profiling of hematologic malignancies in the clinical setting. Blood. 2016;127:3004–3014. doi: 10.1182/blood-2015-08-664649.
  • 9. Belkadi A., Bolze A., Itan Y., Cobat A., Vincent Q.B., Antipenko A., Shang L., Boisson B., Casanova J.L., Abel L. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci U S A. 2015;112:5473–5478. doi: 10.1073/pnas.1418631112.
  • 10. Petersen B.S., Fredrich B., Hoeppner M.P., Ellinghaus D., Franke A. Opportunities and challenges of whole-genome and -exome sequencing. BMC Genet. 2017;18:14. doi: 10.1186/s12863-017-0479-5.
  • 11. Li M.M., Datto M., Duncavage E.J., Kulkarni S., Lindeman N.I., Roy S., Tsimberidou A.M., Vnencak-Jones C.L., Wolff D.J., Younes A., Nikiforova M.N. Standards and guidelines for the interpretation and reporting of sequence variants in cancer: a joint consensus recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists. J Mol Diagn. 2017;19:4–23. doi: 10.1016/j.jmoldx.2016.10.002.
  • 12. Lih C.J., Sims D.J., Harrington R.D., Polley E.C., Zhao Y., Mehaffey M.G., Forbes T.D., Das B., Walsh W.D., Datta V., Harper K.N., Bouk C.H., Rubinstein L.V., Simon R.M., Conley B.A., Chen A.P., Kummar S., Doroshow J.H., Williams P.M. Analytical validation and application of a targeted next-generation sequencing mutation-detection assay for use in treatment assignment in the NCI-MPACT trial. J Mol Diagn. 2016;18:51–67. doi: 10.1016/j.jmoldx.2015.07.006.
  • 13. Cottrell C.E., Al-Kateb H., Bredemeyer A.J., Duncavage E.J., Spencer D.H., Abel H.J., Lockwood C.M., Hagemann I.S., O'Guin S.M., Burcea L.C., Sawyer C.S., Oschwald D.M., Stratman J.L., Sher D.A., Johnson M.R., Brown J.T., Cliften P.F., George B., McIntosh L.D., Shrivastava S., Nguyen T.T., Payton J.E., Watson M.A., Crosby S.D., Head R.D., Mitra R.D., Nagarajan R., Kulkarni S., Seibert K., Virgin H.W., IV, Milbrandt J., Pfeifer J.D. Validation of a next-generation sequencing assay for clinical molecular oncology. J Mol Diagn. 2014;16:89–105. doi: 10.1016/j.jmoldx.2013.10.002.
  • 14. Cheng D.T., Mitchell T.N., Zehir A., Shah R.H., Benayed R., Syed A., Chandramohan R., Liu Z.Y., Won H.H., Scott S.N., Brannon A.R., O'Reilly C., Sadowska J., Casanova J., Yannes A., Hechtman J.F., Yao J., Song W., Ross D.S., Oultache A., Dogan S., Borsu L., Hameed M., Nafa K., Arcila M.E., Ladanyi M., Berger M.F. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn. 2015;17:251–264. doi: 10.1016/j.jmoldx.2014.12.006.
  • 15. Jennings L.J., Arcila M.E., Corless C., Kamel-Reid S., Lubin I.M., Pfeifer J., Temple-Smolkin R.L., Voelkerding K.V., Nikiforova M.N. Guidelines for validation of next-generation sequencing-based oncology panels: a joint consensus recommendation of the Association for Molecular Pathology and College of American Pathologists. J Mol Diagn. 2017;19:341–365. doi: 10.1016/j.jmoldx.2017.01.011.
  • 16.Samorodnitsky E., Datta J., Jewell B.M., Hagopian R., Miya J., Wing M.R., Damodaran S., Lippus J.M., Reeser J.W., Bhatt D., Timmers C.D., Roychowdhury S. Comparison of custom capture for targeted next-generation DNA sequencing. J Mol Diagn. 2015;17:64–75. doi: 10.1016/j.jmoldx.2014.09.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Pritchard C.C., Salipante S.J., Koehler K., Smith C., Scroggins S., Wood B., Wu D., Lee M.K., Dintzis S., Adey A., Liu Y., Eaton K.D., Martins R., Stricker K., Margolin K.A., Hoffman N., Churpek J.E., Tait J.F., King M.C., Walsh T. Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens. J Mol Diagn. 2014;16:56–67. doi: 10.1016/j.jmoldx.2013.08.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Cibulskis K., Lawrence M.S., Carter S.L., Sivachenko A., Jaffe D., Sougnez C., Gabriel S., Meyerson M., Lander E.S., Getz G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31:213–219. doi: 10.1038/nbt.2514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Saunders C.T., Wong W.S., Swamy S., Becq J., Murray L.J., Cheetham R.K. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811–1817. doi: 10.1093/bioinformatics/bts271. [DOI] [PubMed] [Google Scholar]
  • 22.Ye K., Schulz M.H., Long Q., Apweiler R., Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25:2865–2871. doi: 10.1093/bioinformatics/btp394. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Cingolani P., Platts A., Wang le L., Coon M., Nguyen T., Wang L., Land S.J., Lu X., Ruden D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012;6:80–92. doi: 10.4161/fly.19695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Aken B.L., Achuthan P., Akanni W., Amode M.R., Bernsdorff F., Bhai J. Ensembl 2017. Nucleic Acids Res. 2017;45:D635–D642. doi: 10.1093/nar/gkw1104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Forbes S.A., Bindal N., Bamford S., Cole C., Kok C.Y., Beare D., Jia M., Shepherd R., Leung K., Menzies A., Teague J.W., Campbell P.J., Stratton M.R., Futreal P.A. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–D950. doi: 10.1093/nar/gkq929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Gene Ontology Consortium The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. 2010;38:D331–D335. doi: 10.1093/nar/gkp1018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Xi R., Lee S., Xia Y., Kim T.M., Park P.J. Copy number analysis of whole-genome data using BIC-seq2 and its application to detection of cancer susceptibility variants. Nucleic Acids Res. 2016;44:6274–6286. doi: 10.1093/nar/gkw491. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Rausch T., Zichner T., Schlattl A., Stutz A.M., Benes V., Korbel J.O. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Wang J., Mullighan C.G., Easton J., Roberts S., Heatley S.L., Ma J., Rusch M.C., Chen K., Harris C.C., Ding L., Holmfeldt L., Payne-Turner D., Fan X., Wei L., Zhao D., Obenauer J.C., Naeve C., Mardis E.R., Wilson R.K., Downing J.R., Zhang J. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods. 2011;8:652–654. doi: 10.1038/nmeth.1628. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Chen K., Wallis J.W., McLellan M.D., Larson D.E., Kalicki J.M., Pohl C.S., McGrath S.D., Wendl M.C., Zhang Q., Locke D.P., Shi X., Fulton R.S., Ley T.J., Wilson R.K., Ding L., Mardis E.R. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods. 2009;6:677–681. doi: 10.1038/nmeth.1363. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Emde A.K., Schulz M.H., Weese D., Sun R., Vingron M., Kalscheuer V.M., Haas S.A., Reinert K. Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS. Bioinformatics. 2012;28:619–627. doi: 10.1093/bioinformatics/bts019. [DOI] [PubMed] [Google Scholar]
  • 33.Ha G., Roth A., Khattra J., Ho J., Yap D., Prentice L.M., Melnyk N., McPherson A., Bashashati A., Laks E., Biele J., Ding J., Le A., Rosner J., Shumansky K., Marra M.A., Gilks C.B., Huntsman D.G., McAlpine J.N., Aparicio S., Shah S.P. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data. Genome Res. 2014;24:1881–1893. doi: 10.1101/gr.180281.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Van Loo P., Nordgard S.H., Lingjaerde O.C., Russnes H.G., Rye I.H., Sun W., Weigman V.J., Marynen P., Zetterberg A., Naume B., Perou C.M., Borresen-Dale A.L., Kristensen V.N. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A. 2010;107:16910–16915. doi: 10.1073/pnas.1009843107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Nicorici D., Satalan M., Edgren H., Kangaspeska S., Murumagi A., Kallioniemi O., Virtanen S., Kilkku O. FusionCatcher—a tool for finding somatic fusion genes in paired-end RNA-sequencing data. bioRxiv. 2014 doi: 10.1101/011650. [DOI] [Google Scholar]
  • 37.Rennert H., Eng K., Zhang T., Tan A., Xiang J., Romanel A., Kim R., Tam W., Liu Y.C., Bhinder B., Cyrta J., Beltran H., Robinson B., Mosquera J.M., Fernandes H., Demichelis F., Sboner A., Kluk M., Rubin M.A., Elemento O. Development and validation of a whole-exome sequencing test for simultaneous detection of point mutations, indels and copy-number alterations for precision cancer care. NPJ Genom Med. 2016;1:16019. doi: 10.1038/npjgenmed.2016.19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Linderman M.D., Brandt T., Edelmann L., Jabado O., Kasai Y., Kornreich R., Mahajan M., Shah H., Kasarskis A., Schadt E.E. Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med Genomics. 2014;7:20. doi: 10.1186/1755-8794-7-20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Bergmann E.A., Chen B.J., Arora K., Vacic V., Zody M.C. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics. 2016;32:3196–3198. doi: 10.1093/bioinformatics/btw389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Robbe P., Popitsch N., Knight S.J.L., Antoniou P., Becq J., He M., Kanapin A., Samsonova A., Vavoulis D.V., Ross M.T., Kingsbury Z., Cabes M., Ramos S.D.C., Page S., Dreau H., Ridout K., Jones L.J., Tuff-Lacey A., Henderson S., Mason J., Buffa F.M., Verrill C., Maldonado-Perez D., Roxanis I., Collantes E., Browning L., Dhar S., Damato S., Davies S., Caulfield M., Bentley D.R., Taylor J.C., Turnbull C., Schuh A. Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project. Genet Med. 2018 doi: 10.1038/gim.2017.241. [Epub ahead of print] [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hsiao L.L., Dangond F., Yoshida T., Hong R., Jensen R.V., Misra J., Dillon W., Lee K.F., Clark K.E., Haverty P., Weng Z., Mutter G.L., Frosch M.P., MacDonald M.E., Milford E.L., Crum C.P., Bueno R., Pratt R.E., Mahadevappa M., Warrington J.A., Stephanopoulos G., Stephanopoulos G., Gullans S.R. A compendium of gene expression in normal human tissues. Physiol Genomics. 2001;7:97–104. doi: 10.1152/physiolgenomics.00040.2001. [DOI] [PubMed] [Google Scholar]
  • 42.Yohe S.L., Carter A.B., Pfeifer J.D., Crawford J.M., Cushman-Vokoun A., Caughron S., Leonard D.G. Standards for clinical grade genomic databases. Arch Pathol Lab Med. 2015;139:1400–1412. doi: 10.5858/arpa.2014-0568-CP. [DOI] [PubMed] [Google Scholar]
  • 43.Sharma S.V., Bell D.W., Settleman J., Haber D.A. Epidermal growth factor receptor mutations in lung cancer. Nat Rev Cancer. 2007;7:169–181. doi: 10.1038/nrc2088. [DOI] [PubMed] [Google Scholar]
  • 44.Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Jr., Kinzler K.W. Cancer genome landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Griffith M., Griffith O.L., Krysiak K., Skidmore Z.L., Christopher M.J., Klco J.M. Comprehensive genomic analysis reveals FLT3 activation and a therapeutic strategy for a patient with relapsed adult B-lymphoblastic leukemia. Exp Hematol. 2016;44:603–613. doi: 10.1016/j.exphem.2016.04.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Chantrill L.A., Nagrial A.M., Watson C., Johns A.L., Martyn-Smith M., Simpson S., Mead S., Jones M.D., Samra J.S., Gill A.J., Watson N., Chin V.T., Humphris J.L., Chou A., Brown B., Morey A., Pajic M., Grimmond S.M., Chang D.K., Thomas D., Sebastian L., Sjoquist K., Yip S., Pavlakis N., Asghari R., Harvey S., Grimison P., Simes J., Biankin A.V., Australian Pancreatic Cancer Genome Initiative (APGI); Individualized Molecular Pancreatic Cancer Therapy (IMPaCT) Trial Management Committee of the Australasian Gastrointestinal Trials Group (AGITG) Precision medicine for advanced pancreas cancer: the Individualized Molecular Pancreatic Cancer Therapy (IMPaCT) Trial. Clin Cancer Res. 2015;21:2029–2037. doi: 10.1158/1078-0432.CCR-15-0426. [DOI] [PubMed] [Google Scholar]
  • 47.Wrzeszczynski K.O., Frank M.O., Koyama T., Rhrissorrakrai K., Robine N., Utro F., Emde A.K., Chen B.J., Arora K., Shah M., Vacic V., Norel R., Bilal E., Bergmann E.A., Moore Vogel J.L., Bruce J.N., Lassman A.B., Canoll P., Grommes C., Harvey S., Parida L., Michelini V.V., Zody M.C., Jobanputra V., Royyuru A.K., Darnell R.B. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma. Neurol Genet. 2017;3:e164. doi: 10.1212/NXG.0000000000000164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kroigard A.B., Thomassen M., Laenkholm A.V., Kruse T.A., Larsen M.J. Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data. PLoS One. 2016;11:e0151664. doi: 10.1371/journal.pone.0151664. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Narzisi G., Corvelo A., Arora K., Bergmann E.A., Shah M., Musunuri R., Emde A.-K., Robine N., Vacic V., Zody M.C. Genome-wide somatic variant calling using localized colored de Bruijn graphs. Commun Biol. 2018;1:20. doi: 10.1038/s42003-018-0023-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Wala J.A., Bandopadhayay P., Greenwald N.F., O'Rourke R., Sharpe T., Stewart C., Schumacher S., Li Y., Weischenfeldt J., Yao X., Nusbaum C., Campbell P., Getz G., Meyerson M., Zhang C.Z., Imielinski M., Beroukhim R. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 2018;28:581–591. doi: 10.1101/gr.221028.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Kim S., Scheffler K., Halpern A.L., Bekritsky M.A., Noh E., Källberg M., Chen X., Beyter D., Krusche P., Saunders C.T. Strelka2: fast and accurate variant calling for clinical sequencing applications. bioRxiv. 2017 doi: 10.1101/192872. [DOI] [PubMed] [Google Scholar]
  • 52.Secrier M., Li X., de Silva N., Eldridge M.D., Contino G., Bornschein J., MacRae S., Grehan N., O'Donovan M., Miremadi A., Yang T.P., Bower L., Chettouh H., Crawte J., Galeano-Dalmau N., Grabowska A., Saunders J., Underwood T., Waddell N., Barbour A.P., Nutzinger B., Achilleos A., Edwards P.A., Lynch A.G., Tavare S., Fitzgerald R.C., Oesophageal Cancer Clinical and Molecular Stratification (OCCAMS) Consortium Mutational signatures in esophageal adenocarcinoma define etiologically distinct subgroups with therapeutic relevance. Nat Genet. 2016;48:1131–1141. doi: 10.1038/ng.3659. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Robinson D.R., Wu Y.M., Lonigro R.J., Vats P., Cobain E., Everett J., Cao X., Rabban E., Kumar-Sinha C., Raymond V., Schuetze S., Alva A., Siddiqui J., Chugh R., Worden F., Zalupski M.M., Innis J., Mody R.J., Tomlins S.A., Lucas D., Baker L.H., Ramnath N., Schott A.F., Hayes D.F., Vijai J., Offit K., Stoffel E.M., Roberts J.S., Smith D.C., Kunju L.P., Talpaz M., Cieslik M., Chinnaiyan A.M. Integrative clinical genomics of metastatic cancer. Nature. 2017;548:297–303. doi: 10.1038/nature23306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Wrzeszczynski K.O., Felice V., Shah M., Rahman S., Emde A.K., Jobanputra V., Frank M.O., Darnell R.B. Whole genome sequencing-based discovery of structural variants in glioblastoma. Methods Mol Biol. 2018;1741:1–29. doi: 10.1007/978-1-4939-7659-1_1. [DOI] [PubMed] [Google Scholar]
  • 55.Magi A., Tattini L., Cifola I., D'Aurizio R., Benelli M., Mangano E., Battaglia C., Bonora E., Kurg A., Seri M., Magini P., Giusti B., Romeo G., Pippucci T., De Bellis G., Abbate R., Gensini G.F. EXCAVATOR: detecting copy number variants from whole-exome sequencing data. Genome Biol. 2013;14:R120. doi: 10.1186/gb-2013-14-10-r120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Shen R., Seshan V.E. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 2016;44:e131. doi: 10.1093/nar/gkw520. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Zhang W., Bojorquez-Gomez A., Velez D.O., Xu G., Sanchez K.S., Shen J.P., Chen K., Licon K., Melton C., Olson K.M., Yu M.K., Huang J.K., Carter H., Farley E.K., Snyder M., Fraley S.I., Kreisberg J.F., Ideker T. A global transcriptional network connecting noncoding mutations to changes in tumor gene expression. Nat Genet. 2018;50:613–620. doi: 10.1038/s41588-018-0091-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Hyman D.M., Puzanov I., Subbiah V., Faris J.E., Chau I., Blay J.Y., Wolf J., Raje N.S., Diamond E.L., Hollebecque A., Gervais R., Elez-Fernandez M.E., Italiano A., Hofheinz R.D., Hidalgo M., Chan E., Schuler M., Lasserre S.F., Makrutzki M., Sirzen F., Veronese M.L., Tabernero J., Baselga J. Vemurafenib in multiple nonmelanoma cancers with BRAF V600 mutations. N Engl J Med. 2015;373:726–736. doi: 10.1056/NEJMoa1502309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Le Tourneau C., Delord J.P., Goncalves A., Gavoille C., Dubot C., Isambert N., Campone M., Tredan O., Massiani M.A., Mauborgne C., Armanet S., Servant N., Bieche I., Bernard V., Gentien D., Jezequel P., Attignon V., Boyault S., Vincent-Salomon A., Servois V., Sablin M.P., Kamal M., Paoletti X., SHIVA Investigators Molecularly targeted therapy based on tumour molecular profiling versus conventional therapy for advanced cancer (SHIVA): a multicentre, open-label, proof-of-concept, randomised, controlled phase 2 trial. Lancet Oncol. 2015;16:1324–1334. doi: 10.1016/S1470-2045(15)00188-6. [DOI] [PubMed] [Google Scholar]

Supplementary Materials

Supplemental Figure S1

Virtual tumor design. Schematic of synthetic design of virtual tumor used in caller validation and precision/recall analysis. The synthetic tumor [new Binary Alignment Map (BAM) file] is generated by incorporating homozygous variant positions from a high coverage NA12891 BAM file in silico into the NA12892-R2 BAM file by using binomial sampling. This tumor was then analyzed against the NA12892-R1 as the matched normal. VAF, variant allele frequency.

mmc1.pdf (107.4KB, pdf)
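
The binomial spike-in described for the virtual tumor (Supplemental Figure S1) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the fixed per-site depth, and the seed are assumptions made for illustration, and the sketch draws read counts only rather than editing a BAM file.

```python
import random


def spike_in_alt_reads(depth, target_vaf, rng):
    """Draw how many of `depth` reads are replaced by donor (alternate) reads.

    Each read is replaced independently with probability `target_vaf`,
    so the count follows Binomial(depth, target_vaf).
    """
    return sum(1 for _ in range(depth) if rng.random() < target_vaf)


def simulate_site_vafs(depths, target_vaf, seed=0):
    """Return the observed VAF at each site after binomial spike-in.

    `depths` holds the total read depth at each homozygous variant site
    (hypothetical values; real depths would come from the alignment).
    """
    rng = random.Random(seed)
    return [spike_in_alt_reads(d, target_vaf, rng) / d for d in depths]


if __name__ == "__main__":
    # Hypothetical example: 1000 sites at 80x depth, target VAF of 20%.
    vafs = simulate_site_vafs([80] * 1000, 0.20, seed=1)
    print(f"mean observed VAF: {sum(vafs) / len(vafs):.3f}")
```

Because each site is sampled independently, observed VAFs scatter around the target value, which is the spread shown for the 5%, 10%, 20%, and 40% targets in Supplemental Figure S2.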
Supplemental Figure S2

Binomial variant allele frequency (VAF) distribution of virtual tumor homozygous variants (true positives). VAF distribution [after binomial sampling for VAF 5% (red), 10% (blue), 20% (green), 40% (violet)] within the virtual tumor of spiked-in NA12891 bases at homozygous positions in NA12892. Density was calculated using the R density function, a kernel-based smoothing function. A description of the algorithm can be found at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/density.html; last accessed December 20, 2017.

mmc2.pdf (46.6KB, pdf)
Supplemental Figure S3

Virtual tumor variant allele frequency (VAF) and alternate allele read count versus read count. A: VAF of true positive (TP; red dots) and false positive (FP; black dots) calls made by Strelka versus total read count at tumor/normal coverage (60×:30× and 80×:40×). B: Alternate allele read count of TP (red dots) and FP (black dots) calls made by Strelka versus total read count at tumor/normal coverage (60×:30× and 80×:40×).

mmc3.pdf (891.2KB, pdf)
Supplemental Figure S4

Whole-genome sequencing variant calls. A and B: Variant allele frequency and read count per single nucleotide variant (SNV) and insertion or deletion (indel) variant [tier 1 to 2 (A) and tier 3 (B)] identified in the New York Genome Center (NYGC) whole-genome sequencing validation set. Red lines represent the power calculations for variant discovery based on a Q20 sequencing error rate, as depicted in a previous study.14 Vertical and horizontal dot-dash lines represent the NYGC variant calling thresholds of 15% variant allele frequency and a read count of 40, respectively. Note: All but one tier 1 to 2 variant (KIT p.Asp816His, c.2446G>C, 3% VAF, 446 total reads) fell within the power analysis threshold. This is a known low-frequency secondary therapy resistance variant often observed in amplified KIT. In this sample, the KIT gene was also amplified, and this low-frequency variant was called at >400 reads; in addition, this variant would be below the Memorial Sloan Kettering Cancer Center power calculation threshold (red line). This variant was called in all runs for this sample by our assay, including the inter-run and intra-run reproducibility analyses.

mmc4.pdf (264.1KB, pdf)
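
The idea behind a power calculation at a Q20 error rate (Supplemental Figure S4) can be sketched with a simple binomial model. This is an illustrative sketch only and not the calculation from reference 14: it assumes that every sequencing error produces the same alternate base, that reads are independent, and that a call requires enough alternate reads to keep the per-site false-positive probability below a hypothetical `alpha`. A Phred score of Q20 corresponds to a per-base error probability of 0.01.

```python
from math import exp, lgamma, log


def binom_pmf(n, p, i):
    """P(X = i) for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1.0 - p))


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return min(1.0, sum(binom_pmf(n, p, i) for i in range(k, n + 1)))


def phred_to_error(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def min_alt_reads(depth, error_rate, alpha):
    """Smallest alternate read count that errors alone reach with probability < alpha."""
    for k in range(depth + 2):
        if binom_tail(depth, error_rate, k) < alpha:
            return k
    return depth + 1


def detection_power(depth, vaf, q=20, alpha=1e-6):
    """Probability that a true variant at `vaf` reaches the error-derived read threshold."""
    k = min_alt_reads(depth, phred_to_error(q), alpha)
    return binom_tail(depth, vaf, k)


if __name__ == "__main__":
    # Hypothetical grid of depths and VAFs; alpha is an assumed value.
    for depth in (40, 80, 446):
        for vaf in (0.03, 0.15):
            print(depth, vaf, round(detection_power(depth, vaf), 3))
```

Under this model, power at a fixed depth rises with VAF, which is why a curve in the VAF versus read count plane separates well-powered calls from poorly powered ones.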
Supplemental Figure S5

Tumor/normal dilution experiment at 80×:40× of HapMap Project samples: variant detection sensitivity. DNA from HapMap Project sample NA12892 (tumor; T) was diluted with sample NA12891 (normal; N) at mixing ratios of 50:50 (T/N), 40:60, 30:70, 20:80, and 10:90. Cumulative coverage [true positive (TP)/(TP + false negative (FN))] count is shown with increasing read depth for each TP call by MuTect at 80×:40× (solid lines). Coverage drop-off was evident at the respective sequencing coverage in all cases. At mix percentages of ≥30%, cumulative coverage sensitivity at ≥40× was equal.

mmc5.pdf (674.4KB, pdf)
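
The sensitivity ratio TP/(TP + FN) in Supplemental Figure S5 can be computed per read-depth cutoff as in the following Python sketch. The input layout (a depth and a called flag per known variant site) and the choice to restrict to sites at or above a minimum depth are assumptions for illustration; the figure itself may accumulate counts differently.

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN); NaN when there are no truth sites."""
    total = tp + fn
    return tp / total if total else float("nan")


def sensitivity_by_min_depth(sites, thresholds):
    """Sensitivity over truth sites whose read depth is at least each threshold.

    `sites` is a list of (depth, called) pairs, one per known variant site:
    called=True counts as a true positive, called=False as a false negative.
    """
    result = {}
    for t in thresholds:
        kept = [called for depth, called in sites if depth >= t]
        tp = sum(1 for called in kept if called)
        fn = len(kept) - tp
        result[t] = sensitivity(tp, fn)
    return result


if __name__ == "__main__":
    # Hypothetical truth sites, not data from this study.
    demo_sites = [(10, False), (20, False), (45, True),
                  (60, True), (80, True), (50, False)]
    print(sensitivity_by_min_depth(demo_sites, [0, 40]))
```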
Supplemental Figure S6

Variant allele frequency (VAF) distribution plots for tumor sample dilution (frozen sample, CA-0061T). The frozen sample was diluted three times with its corresponding normal genomic DNA. VAF plots (using the R kernel density function) are presented for the original sample and the three dilutions as called by MuTect-Strelka. The VAF distribution was used to estimate tumor purity: for a heterozygous somatic variant, the mode of the VAF distribution multiplied by 2 approximates tumor purity. A: Original tissue sample (90% tumor purity). B: Dilutions to 30% to 25% tumor content. C: 20% tumor content. D: 10% tumor content.

mmc6.pdf (50.4KB, pdf)
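
The purity estimate described for Supplemental Figure S6 (twice the mode of the VAF density for heterozygous somatic variants) can be sketched in Python as follows. This is a simplified stand-in for the R density function, not a reimplementation of it: it uses a Gaussian kernel with a rule-of-thumb bandwidth based only on the standard deviation, and it assumes clonal heterozygous variants in copy-neutral regions. The simulated VAFs and all function names are hypothetical.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(values):
    """Simplified rule-of-thumb bandwidth: 0.9 * sd * n^(-1/5)."""
    return 0.9 * statistics.stdev(values) * len(values) ** (-0.2)


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def estimate_purity(vafs, grid_points=201):
    """Estimate purity as 2 x the mode of the VAF density, capped at 1."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    density = gaussian_kde(vafs, grid, rule_of_thumb_bandwidth(vafs))
    mode = grid[max(range(grid_points), key=density.__getitem__)]
    return min(1.0, 2.0 * mode)


def simulate_het_vafs(purity, depth, n_sites, seed=0):
    """Simulate VAFs of clonal heterozygous variants: expected VAF = purity / 2."""
    rng = random.Random(seed)
    p = purity / 2.0
    return [
        sum(1 for _ in range(depth) if rng.random() < p) / depth
        for _ in range(n_sites)
    ]


if __name__ == "__main__":
    vafs = simulate_het_vafs(purity=0.40, depth=80, n_sites=500, seed=1)
    print(f"estimated purity: {estimate_purity(vafs):.2f}")
```

Subclonal variants, copy number changes, and loss of heterozygosity shift the VAF mode, so an estimate of this kind is approximate.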
Supplemental Figure S7

Variant allele frequency (VAF) distribution plots for tumor sample dilutions (formalin-fixed, paraffin-embedded sample, G15-31T). The formalin-fixed, paraffin-embedded sample was diluted three times with the corresponding normal genomic DNA. VAF plots (using the R kernel density function) are presented for the original sample and the three dilutions as called by MuTect-Strelka. The VAF distribution was used to estimate tumor purity: for a heterozygous somatic variant, the mode of the VAF distribution multiplied by 2 approximates tumor purity. A: Original tissue sample (60% to 65% tumor purity). B: Dilutions to 30% tumor content. C: 20% to 15% tumor content. D: 10% tumor content.

mmc7.pdf (57.5KB, pdf)
Supplemental Figure S10

Concordance reproducibility of genotyping markers. A and B: Genotype marker concordance percentage matrix of tumor (A) and normal (B) sample inter-run and intra-run reproducibility sequencing runs, generated with Conpair.39 Concordance across samples of all sequencing runs (inter and intra) in the reproducibility assay is depicted using Conpair, which applies the Hardy-Weinberg equation and integrates across >7300 preselected exonic markers. The concordance intensity is indicated by the red coloring in the matrix: bright red squares depict >95% concordance among markers. Any discordance or contamination between the samples, or a sequencing failure, would result in lower percentage values.

mmc8.pdf (82.2KB, pdf)
Supplemental Figure S12

RNA–DNA variant correlation. A and B: Variant allele frequency (VAF) determined by DNA and RNA sequencing for tier 1 to 3 variants (targetable genes) in our validation sample set (A) and tier 4 cancer census genes (B).

mmc9.pdf (47.5KB, pdf)
Supplemental Table S1
mmc10.docx (14.5KB, docx)
Supplemental Table S2
mmc11.docx (11.7KB, docx)
Supplemental Table S3
mmc12.docx (17.7KB, docx)
Supplemental Table S4
mmc13.docx (18.4KB, docx)
Supplemental Table S5
mmc14.docx (16.1KB, docx)
Supplemental Table S6
mmc15.docx (11.6KB, docx)
Supplemental Table S7
mmc16.docx (78.4KB, docx)
Supplemental Table S8
mmc17.docx (11.2KB, docx)
Supplemental Table S9
mmc18.xlsx (7.9KB, xlsx)
Data Profile
mmc19.xml (257B, xml)

Articles from The Journal of Molecular Diagnostics : JMD are provided here courtesy of American Society for Investigative Pathology