The Journal of Molecular Diagnostics : JMD. 2022 Jul;24(7):760–774. doi: 10.1016/j.jmoldx.2022.03.011

A Validation Framework for Somatic Copy Number Detection in Targeted Sequencing Panels

Raghu Chandramohan, Jacquelyn Reuther †, Ilavarasi Gandhi †, Horatiu Voicu †, Karla R Alvarez †, Sharon E Plon ∗,§,¶,||,∗∗, Dolores H Lopez-Terrada †,‡,§,¶,||, Kevin E Fisher †,‡,§, D Williams Parsons ∗,†,§,¶,||,∗∗, Angshumoy Roy †,‡,§,¶,||,∗∗
PMCID: PMC9302205  PMID: 35487348

Abstract

Somatic copy number alterations (SCNAs) in tumors are clinically significant diagnostic, prognostic, and predictive biomarkers. SCNA detection from targeted next-generation sequencing panels is increasingly common in clinical practice; however, detailed descriptions of optimization and validation of SCNA pipelines for small targeted panels are limited. This study describes the validation and implementation of a tumor-only SCNA pipeline using CNVkit, augmented with custom modules and optimized for clinical implementation by testing reference materials and clinical tumor samples with different classes of copy number variation (CNV; amplification, single copy loss, and biallelic loss). Using wet-bench and in silico methods, various parameters impacting CNV calling, including assay-intrinsic variables (establishment of normal reference and sequencing coverage), sample-intrinsic variables (tumor purity and sample quality), and CNV algorithm-intrinsic variables (bin size), were optimized. The pipeline was trained and tested on an optimization cohort and validated using an independent cohort with a sensitivity and specificity of 100% and 93%, respectively. Using custom modules, intragenic CNVs with breakpoints within tumor suppressor genes were uncovered. Using the validated pipeline, re-analysis of 28 pediatric solid tumors that had been previously profiled for mutations identified SCNAs in 86% (24/28) of samples, with 46% (13/28) of samples harboring findings of potential clinical relevance. Our report highlights the importance of rigorous establishment of performance characteristics of SCNA pipelines and presents a detailed validation framework for optimal SCNA detection in targeted sequencing panels.


Somatic copy number alteration (SCNA) is an important class of structural variation in cancer. SCNAs may be focal, affecting one or a few genes; or generalized, involving chromosomal segments up to full chromosomes. The amplitude of such events can range from multifold copy gain (amplification) to biallelic copy loss (deep loss or homozygous loss), or more subtle events of smaller magnitude (single copy gain and shallow or monoallelic loss). By direct disruption of tumor suppressor genes or increased dosage of oncogenes, SCNAs have the potential to contribute to tumorigenesis and represent clinically significant biomarkers in diverse tumors.

Existing SCNA detection techniques fall mainly under targeted methods, such as fluorescence in situ hybridization, or genome-wide platforms, such as DNA arrays. With the widespread adoption of next-generation sequencing (NGS) in clinical cancer testing, the potential of a single platform to detect both actionable SCNA and sequence alterations (single-nucleotide variants and small insertions and deletions) has generated interest in NGS-based SCNA detection.1,2 SCNA detection from NGS provides a more scalable platform relative to fluorescence in situ hybridization. Compared with DNA arrays, SCNA detection from NGS offers a potentially higher resolution method for gene-level SCNA detection as well as higher dynamic range. Although exome- and genome-scale NGS is feasible in clinical cancer diagnostics, the shorter turnaround time, lower cost, and more focused interpretation requirements of targeted panels3,4 have led to tumor panels becoming the most widely prevalent NGS method in current clinical use.

However, significant challenges remain in the analytical and clinical validation of SCNA detection using NGS panels. First, copy number (CN) estimation by NGS is more accurate in exome- and genome-scale testing than smaller targeted panels as the increased target footprint allows for precise assessment of baseline ploidy. Most bioinformatic methods for SCNA detection were developed for exomes and genomes; although studies have shown adequate performance of SCNA calling using panels with >200 genes, the methods used to optimize these algorithms are not well described.5, 6, 7 Second, detection of somatic copy number variants presents additional challenges intrinsic to tumor samples, including confounding roles of tumor purity, ploidy, and clonality.8, 9, 10 For these reasons, many laboratory-based tumor NGS assays either do not report SCNAs or report only unequivocal high-magnitude events, without attempting to resolve heterozygous or homozygous losses.11, 12, 13 Third, the sparsity of methods and resources for analytical validation of SCNA detection from targeted NGS panels poses a major hurdle in the clinical implementation of such methods. Although methods for CN detection in NGS panels have been more widely published for germline specimens,14, 15, 16 and guidelines exist for validating NGS bioinformatics pipelines for mutation detection,17,18 to our knowledge, similar detailed description of methods for rigorous validation of SCNA calling in NGS panels is limited.

This study describes a validation framework for SCNA detection from targeted sequencing panels. Using CNVkit,19 an open-source SCNA detection algorithm, we report on the analytical validation of an SCNA pipeline for hybridization capture-based SCNA calling, detailing practical aspects of assay design considerations, normal reference calibration, available reference materials, sample-level quality assessment, and use of parameters for extended analysis, including intragenic copy number variation (CNV) detection. Performance characteristics on reference materials and clinical tumor samples with known SCNAs were assessed to validate SCNA detection in two medium-sized tumor-only sequencing panels. We also report on the clinical diagnostic value of incorporating SCNA detection in tumor-only sequencing panels used for profiling childhood solid tumors.

Materials and Methods

Samples

All samples analyzed in the study were collected under a Baylor College of Medicine Institutional Review Board–approved protocol for clinical and genomic analysis of pediatric cancers. Validation samples consisted of well-characterized clinical samples, cancer cell lines, and normal reference samples from unaffected individuals. Clinical samples were obtained from the Texas Children's Hospital Department of Pathology laboratory, with copy number profiles obtained from gold standard orthogonal methods (truth set), including OncoScan single-nucleotide polymorphism (SNP) array (Thermo Fisher Scientific, Waltham, MA), fluorescence in situ hybridization, and karyotyping (Table 1 and Supplemental Table S1). Samples used in this study were evaluated for high-level amplifications (five or more copies) and net copy number loss. Low-level gain (less than five copies) and copy neutral loss of heterozygosity in these samples were not assessed (Table 1). Clinical samples were divided into 11 samples for fine-tuning the SCNA pipeline parameters (referred to as the optimization cohort) (Supplemental Table S1) and 15 independent samples for evaluating the optimized parameters (referred to as the validation cohort) (Supplemental Table S1). Reference cell lines [namely, HT-29 (ATCC, Manassas, VA), MOLT-4 (Coriell Institute, Camden, NJ), and HAP1 (Horizon Discovery, Cambridge, UK)] with known copy number alterations were obtained (Supplemental Table S1).20, 21, 22, 23 The near-haploid HAP1 cell line is haploid for the entire genome, except for a 30-Mb region of chromosome 15q.23 The SCNA pipeline was validated for different specimen types from diverse tumors, including fresh-frozen tissue, formalin-fixed, paraffin-embedded (FFPE) archival tissue, peripheral blood, bone marrow, and other specimen types from solid tumors and leukemias.
A set of 16 unmatched normal peripheral blood samples (referred to as the normal reference cohort) (Supplemental Table S1) were used to generate the pooled normal reference comparator. In addition, independent sets of clinical samples were processed for quality assessment (n = 31, referred to as quality assessment cohort), detection of allelic imbalance (n = 8, referred to as allelic imbalance cohort), and evaluating the diagnostic utility of the SCNA pipeline (n = 28, referred to as clinical cohort) (Supplemental Table S1).

Table 1.

Overview of Samples with Known SCNA Alterations Profiled for Validation Study

| Cohort | Sample | Tumor type | Gene | Alteration type | Gold standard method | TCH solid SCNA call | TCH solid SCNA log2(FC) |
| Optimization | T01 | Hepatoblastoma | CCND1 | Amplification | OncoScan | Yes | 3.78 |
| | T02 | Neuroblastoma | MYCN | Amplification | OncoScan | Yes | 5.66 |
| | T03 | Metastatic poorly differentiated adenocarcinoma | | | CytoScan | | |
| | T04 | Glioneuronal tumor | TSC1 | Heterozygous loss | OncoScan | Yes | −0.4 |
| | | | CDKN2A | Homozygous loss | | Yes | −1.04 |
| | T05 | Atypical teratoid rhabdoid tumor | ARID1B | Heterozygous loss | CytoScan | Yes | −0.56 |
| | T06 | Medulloblastoma | PTEN | Heterozygous loss | OncoScan | Yes | −0.9 |
| | T07 | Atypical teratoid rhabdoid tumor | SMARCB1 | Homozygous loss | FISH | Yes | −3.97 |
| | T08 | Pancreatoblastoma | | | OncoScan | | |
| | T09 | Neuroblastoma | | | FISH | | |
| | T10 | Wilms tumor | | | OncoScan | | |
| | T11 | High-grade ovarian tumor | | | OncoScan | | |
| Validation | T12 | Wilms tumor | | | OncoScan | | |
| | T13 | Wilms tumor | TSC1 | Heterozygous loss | OncoScan | Yes | −0.76 |
| | T14 | Neuroblastoma | MYCN | Amplification | OncoScan | Yes | 4.55 |
| | T15 | Metastatic medulloblastoma | EGFR | Heterozygous loss | OncoScan | Yes | −0.9 |
| | T16 | Wilms tumor | AMER1 | Homozygous loss | OncoScan | Yes | −2.37 |
| | T17 | Atypical teratoid rhabdoid tumor | SMARCB1 | Heterozygous loss | FISH | Yes | −0.87 |
| | T18 | Myeloid sarcoma | CDKN2A | Homozygous loss | OncoScan | Yes | −2.3 |
| | T19 | B-ALL | RB1 | Homozygous loss | OncoScan | Yes (intragenic event) | −0.28 |
| | T20 | Epithelial hepatoblastoma with HCC-like features | | | OncoScan | | |
| | T21 | Malignant epithelioid glioneuronal tumor | CDKN2A | Heterozygous loss | FISH | Yes | −0.45 |
| | T22 | High-grade astrocytoma | CDKN2A | Homozygous loss | OncoScan | Yes | −5 |
| | | | TP53 | Heterozygous loss | | Yes | −0.98 |
| | T23 | Neuroblastoma | MYCN | Amplification | OncoScan | Yes | 4.46 |
| | | | ALK | Amplification | | Yes | 3.4 |
| | T24 | Dedifferentiated liposarcoma | | | OncoScan | | |
| | T25 | Germline | SMARCB1 | Heterozygous loss | Clinical report | Yes | −0.98 |
| | T26 | Glioblastoma | ATRX | Hemizygous loss | OncoScan | Yes (intragenic event) | −0.87 |

B-ALL, B-cell acute lymphoblastic leukemia; FISH, fluorescence in situ hybridization; HCC, hepatocellular carcinoma; TCH, Texas Children's Hospital.

The sample had other low-level gains and/or copy neutral loss of heterozygosity that was not assessed in this study.

Panel Design

The SCNA pipeline was validated for two representative medium-size gene panels custom designed for detecting single-nucleotide variants, insertions and deletions, and CNVs in childhood cancers at Texas Children's Hospital (TCH): a 124-gene (capture size, 0.96 Mb) panel for solid tumors (TCH solid) and a 152-gene (capture size, 1.26 Mb) panel for leukemias and lymphomas (TCH heme). Hybridization capture panels were designed using the NimbleGen SeqCap EZ system (Roche NimbleGen, Madison, WI) (Supplemental Table S2) to capture all NCBI Reference Sequence Database (RefSeq) coding exons of target genes. Design elements specifically incorporated for CNV calling included expanded intronic padding (approximately 100 bp) as well as inclusion of select deep intronic SNPs of certain genes for which copy loss is a functional mechanism (tumor suppressors).

Intragenic SCNA calls made exclusively by the SCNA pipeline (absent in the truth set) were validated on a separate custom-designed hybrid capture panel (TCH-intragenic copy number panel) targeting the entire gene body of 40 genes (capture size, 3.66 Mb) (Supplemental Table S2).

Library Preparation

Genomic DNA was extracted according to the manufacturer's protocols using the QIAamp DNA FFPE Tissue Kit (Qiagen, Germantown, MD). NGS libraries were prepared using KAPA HyperPlus Library Preparation Kit (Roche, Wilmington, MA) and sequenced 4- to 6-plex on an Illumina MiSeq (2 × 150 bp; Illumina, San Diego, CA) for an average of 9.5 million reads per sample.

To assess the reproducibility of the assay and pipeline, two FFPE samples with known SCNAs [MYC amplification (HT-29) and PTEN heterozygous loss (T06)] were assayed in duplicate by two different technologists. These libraries were then sequenced on two different sequencing runs to assess for interrun variability. To assess repeatability, another sample with a known SMARCB1 heterozygous loss (T25) was assayed in duplicate by the same technologist. These libraries were then sequenced on the same sequencing run for intrarun concordance analysis.

SCNA Pipeline

CNVkit19 version 0.9.3 was chosen for use from multiple available CNV detection tools as it offers the following: i) an open-source read-depth based CN calling algorithm optimized for SCNA, ii) compatibility with either paired (tumor/normal) or nonpaired modes (with pooled reference control), iii) robustness in bias handling in capture-based sequencing, and iv) the ability to leverage off-target reads for improving CNV calling. CNVkit was augmented with additional custom modules for annotation, intragenic calling, allelic imbalance calling, and visualization (each of which is described in following sections).

Preprocessing

BAM files were generated from FASTQ files using a custom DNA bioinformatics pipeline that includes alignment by Burrows-Wheeler Aligner version 0.7.12.24 Duplicate marking and BAM quality control analysis were performed using Picard version 2.9.2 (http://broadinstitute.github.io/picard, last accessed March 22, 2021). BAM file subsampling and merging operations for in silico assessment of SCNAs were done by Sambamba version 0.6.9.25

Binning was done in preparation for the SCNA analysis. The genome was divided into regions captured and not captured by the targeted sequencing panel, referred to as on-target and off-target regions, respectively. These on-target and off-target regions were binned differently as they were split into a median bin size of 100 and 500,000 bp, respectively (Supplemental Figure S1, A and B).
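As an illustration of the binning step described above, the following Python sketch splits one captured interval into near-equal bins whose size approximates a requested average. It is a simplified stand-in and not CNVkit's own binning code; the function name `split_interval` and the example coordinates are hypothetical.

```python
def split_interval(start, end, avg_size=100):
    """Split the half-open interval [start, end) into near-equal bins.

    avg_size is the desired average bin size in bp (100 bp for on-target
    regions in the text; 500,000 bp would be used for off-target regions).
    """
    length = end - start
    # At least one bin, otherwise the number closest to length / avg_size.
    n_bins = max(1, round(length / avg_size))
    edges = [start + (length * i) // n_bins for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical 330-bp exon: three bins of 110 bp each.
bins = split_interval(1000, 1330)
```

Intervals shorter than the requested size are kept as a single bin, so short exons are not dropped.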

For tumor-only sequencing workflows, a pooled reference CNV profile that is generated initially once per assay (bait set) was used to derive SCNA calls in tumor samples. The number of samples in the normal reference cohort to be used for developing the pooled reference was determined empirically. Two sex-specific pooled reference normal samples were generated using CNVkit reference command to account for sex chromosome ploidy and corrected for read depth, GC, RepeatMasker, and coverage profile around the baits. CNVkit includes a pseudocount equivalent to one sample with a neutral copy number when generating a pooled reference.

Quality Metrics

Global metrics for assessing quality of SCNA data in a sample have been previously published,19,26 including parameters to measure dispersion or noise in the log2 fold change (FC) values and confidence in segmentation. Metrics generated by the pipeline include median absolute deviation (MAD), SD, and interquartile range. MAD was defined as the median of the absolute deviation from the bin-level log2(FC) median, given by the formula: MAD = median(|X_i-X|), where X_i is bin-level log2(FC) and X is the median of all bin-level log2(FC) for each sample.
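The MAD definition above can be written as a few lines of Python. This is a minimal sketch using only the standard library; the function name and the example values are illustrative and are not taken from the study data.

```python
import statistics


def median_absolute_deviation(log2_fc):
    """MAD = median(|X_i - median(X)|) over bin-level log2(FC) values."""
    values = list(log2_fc)
    center = statistics.median(values)
    return statistics.median(abs(x - center) for x in values)


# Hypothetical bin-level log2(FC) values for one sample.
example_bins = [-0.2, -0.1, 0.0, 0.1, 0.6]
example_mad = median_absolute_deviation(example_bins)
```

Because the median is used twice, a single extreme bin (0.6 in the example) has little influence on the result, which is why MAD is a convenient noise measure for copy number profiles.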

Gene-Level Calling and Annotation

The pipeline automates the different steps of CNVkit to perform SCNA analysis. CNVkit was used for calculation of read coverage at the predefined on-target and off-target bins, division of the calculated normalized tumor sample coverage by the sex-matched weighted average of normal samples' coverage to generate log2 transformed values [log2(FC)], segmentation of the data points generated using circular binary segmentation,27 calling gene-level events from these segments as bin-weighted average log2(FC), and calculation of SCNA metrics. Assessment of the SCNA pipeline was limited to detection of high-level amplifications and copy number losses. A custom module annotated the segments and gene-level call files. Segments were annotated with Database of Genomic Variants28 (stringent CNVs; http://dgv.tcag.ca/dgv/app/downloads?ref=GRCh37/hg19, released February 3, 2015) if there was >75% mutual overlap. Catalogue of Somatic Mutations in Cancer version 8629 was used to highlight the prevalence of the gene-level SCNA in different cancer contexts (number of times SCNA has been observed in each tumor type).
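The two arithmetic steps in this paragraph, the log2 ratio against the pooled reference and the bin-weighted gene-level average, can be sketched as follows. This is an illustration only: it assumes that "bin-weighted" means each segment overlapping a gene is weighted by its number of bins, and the function names are hypothetical rather than part of CNVkit.

```python
import math


def log2_fold_change(tumor_coverage, reference_coverage):
    """log2 ratio of normalized tumor coverage to pooled normal coverage."""
    return math.log2(tumor_coverage / reference_coverage)


def gene_level_log2(segments):
    """Bin-weighted mean log2(FC) for one gene.

    segments: list of (n_bins, segment_log2_fc) pairs covering the gene.
    """
    total_bins = sum(n for n, _ in segments)
    return sum(n * value for n, value in segments) / total_bins


# A bin with half the reference coverage has log2(FC) = -1.
half_coverage = log2_fold_change(50.0, 100.0)
# Hypothetical gene covered by a 3-bin segment at -1.0 and a 1-bin segment at 0.0.
gene_value = gene_level_log2([(3, -1.0), (1, 0.0)])
```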

Detecting Allelic Imbalance

A custom SNP variant allele fraction module was developed to identify genomic regions with allelic imbalance. The pipeline relies on the captured coding regions as well as the captured adjacent intronic regions to identify heterozygous SNPs for detection of allelic imbalance. BAM files were genotyped at polymorphic genomic loci identified in dbSNP, 1000G, ExAC, and gnomAD databases that intersect with the hybridization capture bait coordinates using Freebayes30 (-i -X -u --min-mapping-quality 10 --min-alternate-count 2 --min-alternate-fraction 0.01). Using the genotyped variant call format, heterozygous SNPs were defined as those with an alternate allele fraction between 0.05 and 0.95. Copy number segment genome coordinates were used to group together heterozygous SNPs, which were then subgrouped based on allele fraction of the genotyped SNP (<0.50 or ≥0.50). Off-target reads were utilized in the analysis to assist in extrapolating the segments genome wide. A chromosome arm was identified to have allelic imbalance if the separation between these grouped allele fraction means was >0.35 (referring to the imbalance in allele fractions) and observed in at least half the chromosome arm.
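The allele fraction test described above can be sketched in Python for the SNPs of a single segment. This is not the authors' module: it treats the 0.05 and 0.95 bounds as exclusive (an assumption), omits the requirement that the imbalance span at least half of the chromosome arm, and uses hypothetical function names and example values.

```python
from statistics import mean


def heterozygous(allele_fractions, low=0.05, high=0.95):
    """Keep SNPs whose alternate allele fraction lies between low and high."""
    return [af for af in allele_fractions if low < af < high]


def allele_fraction_separation(allele_fractions):
    """Difference between the mean of the upper (>= 0.50) and lower (< 0.50) groups."""
    het = heterozygous(allele_fractions)
    lower = [af for af in het if af < 0.50]
    upper = [af for af in het if af >= 0.50]
    if not lower or not upper:
        return None  # cannot measure separation with one empty group
    return mean(upper) - mean(lower)


def is_imbalanced(allele_fractions, min_separation=0.35):
    """Flag a segment when the group means are separated by more than 0.35."""
    separation = allele_fraction_separation(allele_fractions)
    return separation is not None and separation > min_separation


balanced_segment = [0.48, 0.52, 0.49, 0.51]
imbalanced_segment = [0.20, 0.80, 0.25, 0.75, 0.99]  # 0.99 is filtered out
```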

Detecting Intragenic Events

A custom module was developed to detect intragenic events. On-target bins with coverage within 3 SDs of variance using the panel of peripheral blood normal samples were identified as high confidence to be used for this analysis. These high-confidence bins were then segmented gene by gene using the circular binary segmentation algorithm from the DNAcopy version 1.62.027 R package. Outlier bins within a gene were smoothed before segmentation based on log2(FC) values, and segments within 5 SDs from the preceding segment mean were appended to the preceding segment. To report high-confidence calls, genes were identified to have an intragenic event when more than one segment was involved and the log2(FC) of any segment within a gene was <−0.8 or >1.5. Structural variants were detected using Manta version 1.6.031 (tumor-only and exome mode with default parameters with the exception of minCandidateVariantSize set to 1000 bp) for samples sequenced using TCH-intragenic copy number panel for validation of predicted intragenic events.
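The final reporting rule in this paragraph (more than one segment within the gene, and at least one segment beyond the log2(FC) limits) can be expressed compactly. The sketch below covers only that decision rule, not the bin filtering, smoothing, or segmentation steps; the function name and example values are hypothetical.

```python
def has_intragenic_event(segment_log2, loss_threshold=-0.8, gain_threshold=1.5):
    """Apply the reporting rule to the segment means of a single gene.

    segment_log2: log2(FC) means of the segments found within the gene.
    """
    multiple_segments = len(segment_log2) > 1
    extreme_segment = any(
        value < loss_threshold or value > gain_threshold for value in segment_log2
    )
    return multiple_segments and extreme_segment


# Hypothetical gene with a near-neutral segment and a deleted segment.
partial_deletion = has_intragenic_event([0.02, -1.1])
# A single segment is a whole-gene event, not an intragenic one.
whole_gene_loss = has_intragenic_event([-1.1])
```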

Visualizing SCNA Results

SCNA pipeline results were summarized and assessed using a custom-developed Python-based interactive CNV visualization dashboard, reconCNV version 1.0.0.32 In addition, the seg file generated by the pipeline can be loaded into Integrative Genomics Viewer version 3.033 for simultaneous visualization of SCNAs and mutations.

Results

Optimization of SCNA Pipeline

Establishing Normal Reference

The normal reference cohort (8 males and 8 females) was used to generate sex-specific pooled normal references (Supplemental Table S1). Average coverage in the normal samples was 343× (range, 275× to 391×) in targeted regions (Figure 1A and Supplemental Figure S1C) and 0.03× (range, 0.01× to 0.05×) in off-target regions (Supplemental Figure S1D).

Figure 1.


Optimizing normal reference parameters for SCNA analysis. A: Mean sequencing coverage of normal peripheral blood samples. B: Verifying fitness of normal samples by performing a sex-matched normal versus normal analysis (females: top panel; males: bottom panel). Each normal sample (gray rectangles) was compared one on one against other normal comparators (x axis). Box plots summarize the gene-level log2(FC) values. C: Box and density plots summarizing log2(FC) ratio of 15 peripheral blood normal samples (7 males and 8 females), after removing the outlier normal, show a narrow distribution of values centered around zero. Dashed red line represents log2(FC) threshold for copy number loss. D: Box plots summarizing increase in the median absolute deviation (blue dots) for tumor samples (optimization cohort) when the normal reference pool is down sampled to decreasing coverages (Kruskal-Wallis P = 1.3 × 10⁻⁷). E: Change in sensitivity (blue) and specificity (red) of the SCNA pipeline when tumor samples are compared against normal reference pool at different down-sampled coverages. n = 16 (A); n = 11 (D). ∗∗P < 0.01, ∗∗∗∗P < 0.0001 (Wilcoxon rank-sum test).

To verify that samples in the normal reference cohort (Supplemental Table S1) displayed a diploid profile in target regions, the copy number of each gene in every normal sample was compared against all control normal samples. This normal versus normal analysis (Figure 1B) demonstrated nearly all gene calls to be centered around log2(FC) = 0; however, two exceptions were noted. In the male sample N11, a gain in KRAS was observed when compared against all other normal samples. Despite the average KRAS copy gain log2(FC) of 0.32 being well below the threshold for amplification [log2(FC) > 1.5], N11 was removed from the pool. In a second exception, sample N02 displayed FOXL2 gene copy loss at a log2(FC) of −0.49 when compared against sample N08; however, this aberration was not replicated in comparison to other samples, suggesting a low false-positive rate. In summary, gene-by-gene comparison from the normal versus normal analysis identified optimal diploid profiles in seven male and eight female samples in the normal reference cohort, with only a single nondiploid gene-level log2(FC), yielding a false-positive rate of 0.01% (Figure 1C).

Further analysis to assess the impact of varying sequence coverage in normal samples on generating a valid pooled reference was undertaken by performing an in silico down sampling of the 15 normal sample BAM files (original average coverage, 343×; down sampled progressively to 200×, 100×, 50×, 25×, 10×, 5×, and 1×) in the normal reference cohort. Using MAD as a measure of noise, the dispersion of tumor specimen copy number profiles was observed to be significantly increased when average normal specimen coverage fell below 50× (Figure 1D); furthermore, although SCNA profiles of tumor samples in the optimization cohort (n = 11) were concordant with expected results at original coverage of 343×, progressive down sampling negatively affected pipeline performance below 25× (Figure 1E).
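In silico down sampling of this kind reduces to keeping a random fraction of reads equal to the ratio of target to original coverage. The sketch below computes those fractions for the coverage levels listed above; it assumes uniform random read sampling, and the function name is hypothetical. The subsampling itself would be carried out by a BAM tool and is not shown.

```python
def subsample_fraction(original_coverage, target_coverage):
    """Fraction of reads to keep so that mean coverage drops to the target."""
    if original_coverage <= 0:
        raise ValueError("original coverage must be positive")
    # Never request more reads than are available.
    return min(1.0, target_coverage / original_coverage)


# Coverage levels from the text, starting from the 343x average.
targets = [200, 100, 50, 25, 10, 5, 1]
fractions = {target: subsample_fraction(343, target) for target in targets}
```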

Benchmarking Quality

To assess feasibility of using MAD derived from CNVkit as a benchmark of sample-level quality, the distribution of the MAD metric was examined in a cohort of 31 FFPE tumor samples with known variable metrics of NGS data quality (variable uniformity in coverage and poor library complexity). Analysis revealed a bimodal distribution of MAD, separating noisy samples (with poor sequencing quality) from higher-quality samples (Supplemental Figure S2). Using bin-level log2(FC) with other sequence quality metrics, a maximum MAD threshold of 0.35 was established. Samples above this threshold for SCNA call quality may be flagged for review. The utility of a MAD threshold was assessed independently on a separate probe set (TCH heme) and on different samples.

Setting CN Thresholds

CN call thresholds were set for different SCNA classes based on theoretical calculations. The expected log2(FC) for detection of a gene-level amplification (five or more copies) in a pure tumor sample (100% purity) and in a tumor sample admixed with normal cells at 50% tumor purity is 1.32 and 0.81, respectively. Similarly, for a clonal heterozygous loss in a pure tumor sample (100% purity) and one with 50% tumor purity, expected log2(FC) is −1 and −0.42, respectively. Thresholds were set for loss at <−0.4 and for high-level amplifications at >1.5, requiring more than four contiguous bins. Using the set CN thresholds, analysis of gene-level copy number calls generated with the normal versus normal analysis yielded only 1 of 14,012 false-positive comparisons (0.01% false positive rate) (Figure 1C).
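The theoretical values quoted above follow from a simple mixture model, sketched here in Python. It assumes a clonal event, a diploid normal component, and a diploid baseline; the function name is illustrative.

```python
import math


def expected_log2_fc(tumor_copies, purity, normal_copies=2):
    """Expected log2(FC) for a clonal event in a tumor/normal admixture.

    The observed copy number is the purity-weighted mean of the tumor and
    normal copy numbers, expressed relative to the diploid baseline.
    A pure homozygous deletion (0 copies, purity 1.0) has no finite log2 value.
    """
    mixed_copies = purity * tumor_copies + (1 - purity) * normal_copies
    return math.log2(mixed_copies / normal_copies)


amp_pure = expected_log2_fc(5, 1.0)    # five copies, 100% purity
amp_mixed = expected_log2_fc(5, 0.5)   # five copies, 50% purity
loss_pure = expected_log2_fc(1, 1.0)   # heterozygous loss, 100% purity
loss_mixed = expected_log2_fc(1, 0.5)  # heterozygous loss, 50% purity
```

Rounded to two decimals these give 1.32, 0.81, −1, and −0.42, matching the figures in the text, and they show why a loss threshold of −0.4 sits just at the expected value for a heterozygous loss at 50% purity.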

Evaluating Pipeline Performance

With basic parameters of normal reference, quality control, and CN call thresholds set, pipeline performance was evaluated on known positive samples by iteratively tuning intrinsic and extrinsic variables. First, using the set thresholds above, CN calls from the pipeline and gold standard methods (array comparative genomic hybridization and OncoScan SNP array) on the same reference cell lines (Supplemental Table S1) were assessed for concordance. Comparing TCH solid panel gene-level log2(FC) values for the HT-29 cell line with array comparative genomic hybridization data (Figure 2A) and TCH heme panel gene-level log2(FC) values for the MOLT-4 cell line with OncoScan SNP array data (Figure 2B) revealed high correlation (R2 = 0.90 and R2 = 0.95, respectively). The overall copy number profiles generated by the pipeline from the TCH solid panel on HT-29 (Figure 2, C and D) and TCH heme panel on MOLT-4 cell lines (Supplemental Figure S3, A and B) also resembled those generated by array-based methods. Next, the near-haploid HAP1 cell line was analyzed to assess the baseline threshold for shallow (single-copy, heterozygous) loss. Although the pipeline initially interpreted the most prevalent ploidy state (CN = 1) as the baseline log2FC = 0 (CN = 2) (Figure 2E), diploid recentering using the MAP2K1 gene that is within the diploid segment of chromosome 15 as baseline showed gene-level log2(FC) values for all target genes below the CN threshold for loss [log2(FC) < −0.4] (Figure 2F), with the exception of the diploid MAP2K1 gene and MYC (which showed copy gain).
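Diploid recentering of the kind described for HAP1 amounts to subtracting the log2(FC) of a gene known to be diploid from every gene-level value. The sketch below illustrates the idea with hypothetical values; it is not the pipeline's implementation, and the function name is an assumption.

```python
def recenter(gene_log2, baseline_gene):
    """Shift all gene-level log2(FC) values so the baseline gene sits at 0."""
    shift = gene_log2[baseline_gene]
    return {gene: value - shift for gene, value in gene_log2.items()}


# Hypothetical profile of a near-haploid sample before recentering: the
# haploid majority is treated as the baseline (0), so the diploid MAP2K1
# region appears as a gain of log2(2/1) = 1.
before = {"MAP2K1": 1.0, "TP53": 0.0, "PTEN": 0.0}
after = recenter(before, "MAP2K1")
```

After recentering, the haploid genes sit at −1 (single copy relative to a diploid baseline), which is below the −0.4 loss threshold, while MAP2K1 returns to 0.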

Figure 2.


Establishing congruence between SCNA derived from copy number array and targeted sequencing panel data. A: Comparison of HT-29 cell line gene-level log2(FC) between copy number array [array comparative genomic hybridization (aCGH)] and Texas Children's Hospital (TCH) solid sequencing data (R2 = 0.90). B: Comparison of MOLT-4 cell line gene-level log2(FC) between copy number array (OncoScan) and TCH heme sequencing data (R2 = 0.95). C and D: Overall SCNA profile of HT-29 cell line using aCGH data from NCI-60 CellMiner (C) and TCH solid targeted sequencing data (D). E and F: SCNA profile of HAP1 cell line before recentering (E) and after recentering (F).

Finally, the set CN thresholds were tested on cases in the optimization cohort (n = 11) (Supplemental Table S1) with different CNV classes (namely, high-level amplification and copy number loss). Gene-level log2(FC) between TCH solid panel and OncoScan data, where available (n = 6), showed high correlation (R2 = 0.77) (Supplemental Figure S3C), and the overall sensitivity and specificity of the pipeline were assessed in this cohort at 1.00 and 0.94, respectively (Table 1).

Next, the impact of variable depth of tumor coverage was assessed on pipeline performance by performing in silico down sampling with the tumor BAM files in the optimization cohort (n = 11) (Supplemental Table S1). Although the cohort had been sequenced to an average coverage of 949× (range, 274× to 1967×) in targeted regions (Figure 3A) and 0.07× (range, 0.02× to 0.17×) in off-target regions, progressive down sampling of tumor coverage revealed a negative impact on MAD at mean coverage of ≤50×. The MAD of samples down sampled to 200× was significantly lower when compared with samples down sampled to even lower coverages (P < 0.05 against 100×, and P < 0.0001 against other comparisons; Wilcoxon test) but was not significant against higher coverages (Figure 3B). Similarly, the observed sensitivity and specificity appeared to drop with tumor coverage approaching <100×; specifically, gene-level log2(FC) values of samples with known CN variants and relatively high tumor purity (≥70%) were affected at <100× coverage, as exemplified for each CNV class using T01 (amplification of CCND1), T07 (homozygous deletion of SMARCB1), and T06 (heterozygous deletion of PTEN) (Table 2). Taken together, a conservative minimum threshold for tumor sample coverage was empirically set at 100× for robust assessment of SCNAs (Figure 3, B and C).

Figure 3.


Optimizing tumor sample parameters for SCNA analysis. A: Mean sequencing coverage of tumor samples in the optimization cohort. B: Box plots summarizing the increase in median absolute deviation (blue dots) for tumor samples (optimization cohort) when their coverages are down sampled to decreasing coverages (Kruskal-Wallis P = 1.4 × 10⁻¹⁵). C: Change in sensitivity (blue) and specificity (red) of the SCNA pipeline when tumor samples are down sampled to different coverages. D: Change in sensitivity (blue) and specificity (red) of the SCNA pipeline with varying bin size. E–G: Studying limit of detection by diluting samples T01 (purity, 70%; E), T06 (purity, 80%; F), and T07 (purity, 90%; G) that have confirmed CCND1 amplification, PTEN heterozygous loss, and SMARCB1 homozygous loss, respectively, by orthogonal gold standard method with normal reference in silico. H: Diluting T17 (previously confirmed SMARCB1 heterozygous loss) with normal reference in silico and wet bench. Comparing log2(FC) values between theoretical (green), in silico (dark red), and wet-bench (golden yellow) dilutions to verify minimum tumor fraction at which loss is detected using log2(FC) threshold of <−0.4 (red). Fraction of tumor sample at log2(FC) threshold for theoretical and wet-bench dilutions marked by blue and purple dashed lines, respectively. Line plots show good concordance between log2(FC) values of theoretical (green) and in silico dilutions (dark red). Copy number thresholds marked by red dashed line [amplifications, log2(FC) > 1.5; and loss, log2(FC) < −0.4], and fraction of tumor sample at the threshold has been marked by blue dashed line. n = 11 (A and B). ∗P < 0.05, ∗∗∗∗P < 0.0001 (Wilcoxon rank-sum test).

Table 2.

Effect of Tumor Sample Coverage on Gene-Level Log2(FC) and MAD

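| Sample | Gene and alteration type | Tumor purity, % | Coverage | Sample MAD | TCH solid SCNA log2(FC) | TCH solid SCNA call |
| T01 | CCND1 amplification | 70 | 834× | 0.22 | 3.78 | Yes |
| | | | 700× | 0.22 | 3.78 | |
| | | | 600× | 0.22 | 3.78 | |
| | | | 500× | 0.23 | 3.79 | |
| | | | 400× | 0.23 | 3.78 | |
| | | | 300× | 0.24 | 3.77 | |
| | | | 200× | 0.26 | 3.77 | |
| | | | 100× | 0.3 | 3.77 | |
| | | | 50× | 0.37 | 3.77 | |
| | | | 25× | 0.48 | 3.77 | |
| | | | 10× | 0.69 | 3.73 | |
| | | | | 0.99 | 3.72 | |
| | | | | 6.14 | 4.67 | |
| T07 | SMARCB1 homozygous deletion | 90 | 337× | 0.23 | −3.97 | Yes |
| | | | 200× | 0.26 | −3.99 | |
| | | | 100× | 0.31 | −4.07 | |
| | | | 50× | 0.4 | −3.94 | |
| | | | 25× | 0.52 | −3.56 | |
| | | | 10× | 0.78 | −3.05 | |
| | | | | 1.1 | −1.08 | |
| | | | | 8.43 | 0.03 | No |
| T06 | PTEN heterozygous deletion | 80 | 104× | 0.26 | −0.9 | Yes |
| | | | 50× | 0.32 | −0.9 | |
| | | | 25× | 0.42 | −0.88 | |
| | | | 10× | 0.61 | −0.93 | |
| | | | | 0.86 | −1.07 | |
| | | | | 4.25 | 1.02 | No |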

MAD, median absolute deviation; TCH, Texas Children's Hospital.

The measure fails to be within the expected limits. Tumor purity estimated by pathology review.

Pipeline performance was further optimized by evaluating performance at variable bin sizes. An optimal bin size is one that balances the resolution of copy number calling and the noise in the CNV profile. By assessing sensitivity and specificity with different bin sizes ranging from 25 to 400 bp, the best sensitivity (1.00) and specificity (0.94) were observed to result from a 100-bp bin size (Figure 3D). For larger bins, low segmentation resolution resulted in dropout of amplification calls in small oncogenes (MYC), whereas smaller bins produced oversegmentation of data. For off-target regions, bin size was optimized from 100 to 1000 kbp; the difference in MAD between on-target and off-target bins stabilized at 500 kbp (Supplemental Figure S4), reducing the number of spurious segment breaks.

Using the normal reference cohort, the bin size and coverage distribution were evaluated for on-target and off-target bins (Supplemental Figure S1). Average on-target and off-target bin sizes are 100 bp and 500 kbp, respectively, with median coverage of 336× and 0.02×, respectively. The bin size for off-target bins is significantly greater than that for on-target bins to capture reliable signals.

Validation of SCNA Pipeline

Analytical Sensitivity and Specificity

On the basis of the results of the optimization studies described above, pipeline thresholds were determined (Supplemental Table S3) and further validated using the validation cohort (Supplemental Table S1 and Table 1). The sensitivity and specificity using this independent cohort to detect amplification and loss were 1.00 and 0.93, respectively (Table 1). When studied by SCNA variant class, the sensitivity and specificity for detection of amplifications were both observed to be 1.00, and the sensitivity and specificity to detect losses were observed to be 1.00 and 0.93, respectively. An average of 8.3 calls for loss per sample were made by the pipeline.
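As an illustration of how gene-level calls and the resulting performance figures can be derived, the following sketch applies the log2(FC) thresholds given in the Figure 3 legend (amplification, > 1.5; loss, < −0.4) and tallies sensitivity and specificity against a truth set. The function names and the dictionary input format are hypothetical, and this is not the pipeline's implementation.

```python
AMP_THRESHOLD = 1.5    # log2(FC) above this is called an amplification
LOSS_THRESHOLD = -0.4  # log2(FC) below this is called a loss


def call_gene(log2_fc):
    """Classify one gene-level log2(FC) value against the two thresholds."""
    if log2_fc > AMP_THRESHOLD:
        return "amplification"
    if log2_fc < LOSS_THRESHOLD:
        return "loss"
    return "neutral"


def sensitivity_specificity(observed, truth):
    """Compare calls with orthogonal truth labels.

    Both arguments map a (sample, gene) key to one of
    "amplification", "loss", or "neutral" (hypothetical format).
    Returns (sensitivity, specificity); a value is None when its
    denominator is zero.
    """
    tp = fn = tn = fp = 0
    for key, expected in truth.items():
        called = observed[key]
        if expected == "neutral":
            if called == "neutral":
                tn += 1
            else:
                fp += 1
        elif called == expected:
            tp += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

Applied to the full-coverage values in Table 2, `call_gene` returns an amplification for T01 CCND1 (3.78) and a loss for both T07 SMARCB1 (−3.97) and T06 PTEN (−0.9).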

The analysis pipeline and thresholds determined from TCH solid panel data were also found to work robustly when applied to the TCH heme (hematologic malignancy) panel. A similar optimization and validation approach was undertaken utilizing 17 tumor samples with known copy number status, as determined by orthogonal clinical methods. The validation events used for assessment of sensitivity and specificity included 3 losses and 11 samples with normal karyotypes. The overall analytical sensitivity (1.00) and specificity (1.00) were determined to be similar to the TCH solid panel without requiring changes to the pipeline parameters.

Limit of Detection

To determine the limit of detection for different CNV classes, both in silico and wet-laboratory dilution studies were performed. For three tumor samples (T01, T06, and T07) in the optimization cohort, each with a single gene-level event (amplification, heterozygous or shallow loss, and homozygous or deep loss, respectively), varying proportions of the tumor sample reads (5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%) were supplemented in silico with sex-matched normal sample reads (1 − tumor BAM proportion). The trend of observed log2(FC) was found to correlate with the expected values computed theoretically (assuming a clonal event in 100% tumor purity and based on the integer copy number of the gene of interest derived from an orthogonal gold standard platform) (Figure 3, E–G). Using the previously established thresholds for the different CNV classes, in T01 (CCND1 amplification), the pipeline could detect the expected call down to an in silico dilution of 20% tumor purity (Figure 3E). The log2(FC) values generated by the SCNA pipeline for amplification of CCND1 in T01 and MYC in T02 translated to estimates of 27 and 101 copies, respectively, in comparison to the OncoScan estimates of 24 and 56 copies, respectively. Not surprisingly, for copy losses (which are lower-magnitude events than amplifications), the detection limits were reached at a higher tumor purity [namely, at a 50% dilution in T06 (PTEN heterozygous deletion) (Figure 3F) and at a 40% dilution in T07 (homozygous deletion of SMARCB1) (Figure 3G)]. Actual wet-laboratory serial dilution of T17 DNA (SMARCB1 heterozygous deletion identified by fluorescence in situ hybridization) with NA12878 normal reference DNA showed a similar fold change threshold (Figure 3H).
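The theoretical dilution curve can be sketched with a simple two-population mixing model. The model below is an assumption made for illustration (a clonal event, a diploid normal, and read depth proportional to copy number), and the function names are hypothetical; it is not taken from the pipeline code.

```python
import math


def expected_log2_fc(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2(FC) when tumor reads are mixed with diploid normal reads."""
    mixed = tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    if mixed <= 0:
        return float("-inf")  # pure homozygous deletion: no reads expected
    return math.log2(mixed / normal_copies)


def min_detectable_fraction(tumor_copies, threshold, normal_copies=2):
    """Tumor fraction at which the expected log2(FC) equals the threshold."""
    if tumor_copies == normal_copies:
        raise ValueError("a copy-neutral gene never reaches a nonzero threshold")
    target_copies = normal_copies * 2 ** threshold
    return (target_copies - normal_copies) / (tumor_copies - normal_copies)


def copies_from_log2_fc(log2_fc, normal_copies=2):
    """Copy number implied by a log2(FC), without purity correction."""
    return normal_copies * 2 ** log2_fc


if __name__ == "__main__":
    print(round(min_detectable_fraction(1, -0.4), 3))  # single-copy loss
    print(round(copies_from_log2_fc(3.78)))            # T01 CCND1, Table 2
```

Under these assumptions, a single-copy loss crosses the −0.4 threshold at a tumor fraction of about 0.48, which is in line with the approximately 50% limit observed for the PTEN heterozygous deletion in T06, and a log2(FC) of 3.78 corresponds to about 27 copies, matching the CCND1 estimate for T01. Observed limits can differ from these theoretical values because of sample purity and noise.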

Precision

To assess reproducibility of the assay and SCNA analysis, gene-level calls were compared between duplicate samples and showed concordant log2(FC) values for both sets (R2 = 0.95 for HT-29/HT-29R2; R2 = 0.85 for T06/T06R2), including the specific log2(FC) of the known alterations in each case (Figure 4A, Supplemental Figure S5, and Supplemental Table S4).

Figure 4.

Evaluating precision of the SCNA pipeline. A: Reproducibility measured using HT-29 cell line showed concordant gene-level log2(FC) values (R2 = 0.95) when compared against its replicate (HT-29R2). B: Repeatability measured using T25 shows concordant gene-level log2(FC) values (R2 = 0.98) when compared against its replicate (T25R2). C: Interquartile range (IQR) of gene-level log2(FC) measured per gene (blue) using replicates of the positive control sample, HT-29. D: Genes crossed the set threshold [log2(FC) < −0.4 and log2(FC) > 1.5; blue] for making gene-level SCNA calls consistently in all replicates, except for IRF2 and STAG2 (outliers; black). n = 124 genes (C); n = 18 replicates (C). TCH, Texas Children's Hospital.

Similarly, to assess repeatability, a scatterplot of gene-level log2(FC) values between the two replicates showed a high degree of concordance (Figure 4B) of all log2(FC) values (R2 = 0.98), including the known copy number alteration involving SMARCB1 heterozygous loss [log2(FC) = −0.98 and −1.02, respectively, for the two replicates] (Supplemental Table S4).

To rigorously measure interrun precision of the pipeline, gene-level log2(FC) was analyzed from 18 replicate runs using the positive control sample HT-29 cell line sequenced with the TCH solid panel. This analysis demonstrated a narrow distribution (interquartile range, 0.03) of most gene-level log2(FC) values (Figure 4C); however, it also revealed 3 of 124 genes (MDM4, MAP2K4, and DAXX) with an interquartile log2(FC) range of >0.2. Furthermore, although 28 of 30 (93.3%) genes expected to have a detectable CN call were consistently called in all 18 replicates (Figure 4D), two genes (IRF2 and STAG2) with log2(FC) values close to threshold for loss occasionally dropped out (IRF2, 2/18; and STAG2, 3/18).
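A minimal sketch of the interrun precision summary follows: a per-gene interquartile range across replicate runs, plus a count of replicates in which an expected loss fails to cross the −0.4 threshold. The function names and input format are hypothetical; the 0.2 IQR cutoff is simply the value used above to describe the three outlier genes, and the quartile method (inclusive) is an assumption.

```python
from statistics import quantiles


def gene_iqr(values):
    """Interquartile range of one gene's log2(FC) across replicate runs."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def unstable_genes(replicate_log2_fc, max_iqr=0.2):
    """Genes whose IQR across replicates exceeds max_iqr.

    replicate_log2_fc maps a gene name to its list of log2(FC)
    values, one per replicate run (hypothetical format).
    """
    return sorted(
        gene for gene, values in replicate_log2_fc.items()
        if gene_iqr(values) > max_iqr
    )


def loss_dropouts(values, loss_threshold=-0.4):
    """Number of replicates in which an expected loss is not called."""
    return sum(1 for v in values if not v < loss_threshold)
```

Genes whose true log2(FC) sits near −0.4, as described for IRF2 and STAG2, would be expected to show a nonzero `loss_dropouts` count even when their IQR is narrow.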

Detecting Allelic Imbalance

Although CNVkit is not designed to detect allelic imbalance, a custom SNP variant allele fraction module (see Materials and Methods) was designed to assess feasibility of this analysis. Evaluation using the normal reference cohort (Supplemental Table S1) showed an increased number of heterozygous SNP sites identified when using TCH solid (designed to capture additional flanking intronic regions) over standard exon panels (Supplemental Figure S6). Given the limitations of targeted panels with sparse and discontinuous genome coverage and limited capture of noncoding SNP sites, low-resolution allelic imbalance detection was assessed, with the resolution set to chromosomal arm level and including at least four genes on the arm. Eight clinical FFPE tumor samples from histologic specimens with high probability of exhibiting arm-level loss of heterozygosity (five neuroblastoma and three Wilms tumor; allelic imbalance cohort) were processed using both TCH solid panel and OncoScan array. Chromosome arm-level loss of heterozygosity detected by OncoScan (n = 12) was evaluated for allelic imbalance using the SNP variant allele fraction module, resulting in a sensitivity of 0.83 and a specificity of 0.99. Of note, some of the arm-level loss events detected with support for allelic imbalance from the SNP variant allele fraction module were chromosome 1p and 11q loss in neuroblastoma samples, previously identified to be recurrent events in high-risk patients34 (Figure 5). The single-chromosome, arm-level, false-positive allelic imbalance call was not marked as loss of heterozygosity by OncoScan but was reported to have nine copies, indicative of a copy gain-mediated allelic imbalance. Two of the false negatives involved chromosome arms 4q and 11p, where SNPs were insufficient to predict allelic imbalance for a significant portion of the arm.
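The idea behind arm-level allelic imbalance detection can be illustrated with a deliberately simplified sketch. This is not the custom module described in Materials and Methods: the function name, the minimum SNP count, and the shift cutoff are all hypothetical values chosen for illustration. It assumes that the input contains only germline heterozygous SNPs on one chromosome arm, so that a balanced arm has variant allele fractions near 0.5.

```python
from statistics import mean


def arm_allelic_imbalance(het_snp_vafs, min_snps=10, min_shift=0.1):
    """Crude arm-level allelic imbalance test.

    het_snp_vafs: variant allele fractions of heterozygous SNPs on one arm.
    Returns None when too few SNPs are available to make a call,
    otherwise True when the mean distance of the VAFs from 0.5
    exceeds min_shift (both cutoffs are illustrative assumptions).
    """
    if len(het_snp_vafs) < min_snps:
        return None
    shift = mean(abs(vaf - 0.5) for vaf in het_snp_vafs)
    return shift > min_shift
```

Returning `None` for sparsely covered arms mirrors the false negatives described above, in which too few SNPs were available across a large portion of the arm to make a prediction.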

Figure 5.

Identifying allelic imbalance and intragenic events. A: Neuroblastoma sample wild type for chromosome 11q SCNA supported by segments (orange) centered around log2(FC) = 0 and heterozygous single-nucleotide polymorphism (SNP) means centered close to 0.5 variant allele fraction (VAF; orange), as seen in the SNP allele fraction track. B: Neuroblastoma sample with allelic imbalance in chromosome 11q, supported by a segment (orange) in chromosome 11q with log2(FC) = −1.3 and heterozygous SNP means (orange) showing imbalance with SNPs clustering close to 0 and 1 VAF, as seen in the SNP allele fraction track.

Evaluating Limit of Resolution

Although gene-level copy number detection is the primary objective in NGS panels, the variable size of genes and probe density in panel designs may allow further assessment of the limit of resolution within genes. Within the 26 clinical samples (including both optimization cohort and validation cohort), the average number of genes with intragenic segmentation breaks was found to be 17 using CNVkit; however, use of an optimized custom module (see Materials and Methods) to smooth segmentation reduced this to an average of four genes with intragenic segmentation. Ten intragenic calls were identified in 26 samples, including partial loss of known tumor suppressor genes, such as PTEN (T06), KDM6A (T06), NF1 (T13), PBRM1 (T15), RB1 (T19), and ATRX (T26), as well as a complex event involving the ALK oncogene (Supplemental Figure S7). Intragenic events were verified by visual inspection of the tumor BAM in Integrative Genomics Viewer33 for breakpoints within targeted regions supported by paired-end and split reads and/or by comparison of relative change in coverage depth in adjacent capture regions between the tumor sample and an unmatched normal reference (eg, intragenic loss of ATRX exons 9 to 17 in T26 and a separate event involving amplification of all exons of ALK with the exception of exon 16, which exhibited loss). To validate events without discordant paired-read (reads with aberrant insert size) or split-read evidence, additional sequencing was performed (when sample DNA was available) using a full gene body tiled (TCH-intragenic copy number) panel that confirmed precise intronic breakpoints (eg, intragenic deep loss of RB1 exon 18 and shallow loss of RB1 exons 19 to 27 in T19) (Supplemental Figure S8). This demonstrated the ability of the intragenic event detection module to accurately detect events with high resolution.

Assessing Diagnostic Utility

As a preliminary evaluation of the potential diagnostic utility of the SCNA pipeline for pediatric cancers, the clinical cohort (an independent set of 28 histologically diverse pediatric cancer samples that had been previously characterized using the TCH solid panel for single-nucleotide variants and insertions and deletions but not for CNVs) was re-analyzed (Supplemental Table S1). The potential clinical relevance of SCNAs was annotated in diagnostic, therapeutic, and prognostic categories, as described by Li35 in 2017. SCNAs were identified in 86% of samples (24/28), resulting in detection of clinically relevant CNVs in 46% of samples (13/28), as summarized in Figure 6. Events of potential diagnostic and/or prognostic utility included AMER1 (formerly WTX) loss in Wilms tumor, MYCN gene amplification in neuroblastoma and medulloblastoma, chromosome 6 loss in WNT-subtype medulloblastoma, chromosome 1p and 16q loss in Wilms tumor, and chromosome 1p and 11q loss in neuroblastoma. Several SCNAs were also identified to be of potential relevance to therapeutic decision making (in the event of recurrent or refractory disease), including PTEN loss (a potential biomarker for response to phosphatidylinositol 3-kinase/mammalian target of rapamycin inhibitors36) and CDK4 amplification (predictive of response to cyclin-dependent kinase inhibitors, such as palbociclib, in some tumor types36).

Figure 6.

Assessing diagnostic utility of SCNA analysis. A: Oncoprint summarizing the diagnostic utility of integrating copy number analysis to the Texas Children's Hospital (TCH) solid gene mutation panel. DNA extracted from 28 available pediatric cancer samples was evaluated for copy number changes and somatic point mutations. B: Sunburst plot highlighting diagnostic (orange), therapeutic (green), and prognostic (yellow) utility of including copy number analysis as part of TCH solid. AMP, amplification; ATRT, atypical teratoid rhabdoid tumor; Chr, chromosome; CNLOH, copy-neutral loss of heterozygosity; GNT, glioneuronal tumor; LOH, loss of heterozygosity; LS, liposarcoma; MB, medulloblastoma; MEGNT, malignant epithelioid glioneuronal tumor; MUT, mutation; NB, neuroblastoma; WT, Wilms tumor.

Discussion

Although targeted sequencing panels for clinical tumor profiling are universally used for single-nucleotide variant and insertion and deletion detection, SCNA analysis from NGS data is either not routinely included in clinical diagnostics or is limited to detection of high-magnitude and/or outlier SCNA events, such as high-level amplifications37 and deep or biallelic loss. Validation of SCNA detection from targeted NGS panels, especially shallow deletions, poses several practical challenges due to clonality, purity, ploidy, and lack of available reference samples. In this study, a framework is presented for validation of gene-level SCNA detection from targeted NGS panels using an extensive analytical validation of a tumor-only SCNA pipeline as a model. The ability of the pipeline to detect SCNA in a cohort of 28 pediatric tumors was then assessed (clinical cohort) (Supplemental Table S1).

To reliably identify SCNAs, several features of the analytic pipeline were rigorously optimized, including the method to assess the normal reference, a metric for assessing quality of the SCNA profile, assessing bin sizes for optimal detection of amplifications and losses, the impact of sequencing coverage on robust SCNA detection, and the limit of detection. Our data demonstrate the importance of establishing the normal reference by using a normal versus normal analysis that highlighted the rare normal sample with a nondiploid profile. By profiling samples of variable quality, it was shown that the MAD is an excellent measure of noise in SCNA calling. Although the MAD threshold depends on both sample intrinsic (quality of DNA) and extrinsic (probe chemistry) features, once determined during validation, it can effectively provide confidence in sample-level SCNA call quality. Empirical determination of optimal bin size was also demonstrated through testing the impact of variable bin sizes on analytical performance; although the optimal bin size would be primarily dependent on the enrichment platform and chemistry, default settings may not yield optimal performance.

One of the major obstacles to validation of an SCNA pipeline is the relative lack of available reference materials. Along with the well-characterized tumor cell lines HT-29 and MOLT-4, this study introduced the use of another tumor cell line derivative, the near-haploid HAP1 cell line, which can be a highly effective resource in validating shallow loss in the vast majority of genes. Using both wet-bench and in silico dilution studies, this study presents a framework for assessing the limit of detection for different variant classes, taking into consideration clonality and tumor purity. However, the precision studies showed that detection of subclonal heterozygous loss close to the limit of detection can be challenging when compared with deep loss or amplification, as some gene-level calls drop out owing to the log2(FC) threshold for loss being set close to this limit. This was observed when the limit of detecting the PTEN heterozygous deletion was reached at approximately 50% tumor purity (Figure 3F).

Although read-depth based CN detection is commonly employed, a panel design utilizing a sufficient number of heterozygous SNP sites can be highly useful for unambiguous assessment of SCNA. As shown in the case of the HAP1 cell line before recentering (Figure 2, E and F), exclusive read-depth based strategies are less effective in detection of potentially clinically relevant events, such as heterozygous loss of tumor suppressor genes and chromosome arm-level events correlated with prognosis or diagnosis. As demonstrated in this study, this limitation can be overcome by supplementing segments derived from read depth with allelic imbalance information from heterozygous SNPs captured by the sequencing panel. The pipeline was also able to detect broader chromosome arm-level loss of heterozygosity events by leveraging the ultra-low-coverage off-target reads. Using a combination of on-target bins along with the heterozygous SNP sites captured within on- and off-target bins, it was possible to detect chromosome arm-level loss of heterozygosity with high sensitivity and specificity. The additional flanking intronic capture in the TCH solid panel captures approximately 200 additional heterozygous SNP sites (Supplemental Figure S6) compared with a standard exon-limited design, providing increased confidence in identifying heterozygous loss. Similar approaches to increase the number of available SNP sites through incorporation of several hundred to several thousand targets1,6,38,39 have been widely deployed. Strategically increasing the density of SNP probes designed throughout the genome and in key regions of interest has the potential to increase sensitivity of detecting relevant SCNA events that were not assessed in this study, such as low copy gain, copy neutral loss of heterozygosity, and whole genome duplication. It would also facilitate calculation of copy number alteration burden. This could permit more complex allele-specific SCNA analysis, providing more reliable detection of these events that are of prognostic value.40, 41, 42, 43, 44

An ambitious goal for SCNA analysis from targeted sequencing data is for it to serve as a stand-alone test for detection of small SCNAs (intragenic events) without requiring supplemental validation using methods such as multiplex ligation-dependent probe amplification. A combination of optimized panel design and custom analysis modules offers the possibility of achieving this high resolution. In this study, a custom pipeline module was developed to detect intragenic SCNA events involving a single exon (T19 RB1 exon 18) as well as events involving several exons (T26 ATRX exons 9 to 17). This highlights the potential for greater resolution of SCNA analysis when using sequencing data in comparison to clinical copy number arrays.

Through a retrospective analysis of 28 pediatric cancer samples of various histologic specimens (clinical cohort) (Supplemental Table S1), the ability of the SCNA pipeline to detect potential clinically relevant findings was highlighted. Compared with adult cancers, pediatric cancers often have a low tumor mutation burden, making it particularly critical to derive as much information as possible from each patient sample sequenced. In addition, although it is becoming more common for pediatric tumors to be biopsied at the time of recurrence, concerns about procedure-related morbidity can limit these procedures as well as the quantity of tumor samples obtained, making downstream decisions about sample prioritization particularly challenging. In this context, reducing the number of different clinical tests to be performed on a sample is especially critical. Comprehensive profiling of DNA for both mutations and SCNAs using the same platform has the potential to improve the ability to uncover biologically and clinically relevant tumor alterations even with limited sample availability. This approach also offers advantages to diagnostic laboratories as it might decrease costs as well as the turnaround time for clinical molecular tumor testing. To more rigorously evaluate the diagnostic utility of this approach, tumor testing in a larger cohort of tumor samples will be required, such as the ongoing Texas KidsCanSeq study, which is comparing the utility of multiple parallel molecular testing approaches (including copy number arrays, panel sequencing, exome sequencing, and RNA sequencing) for childhood cancer patients.45

In addition to the limitations discussed above that are intrinsic to copy number calling algorithms applied on NGS data, this particular study was limited in scope in terms of assessing different workflows that are commonly deployed in the clinical laboratory, such as comparison of performance of SCNA calling between tumor-only (with pooled normal reference) and tumor-normal matched pairs. In addition, SCNA calling has also been implemented on targeted NGS panels enriched using amplicon-based methods, which may have unique biases and advantages compared with capture-based enrichment.

In summary, the results herein confirm that addition of an optimized SCNA pipeline to a mutation-focused clinical tumor panel sequencing test is capable of detecting copy number alterations in tumor samples with high sensitivity, specificity, and reproducibility. Herein, various parameters that require optimization in such a pipeline were defined, and the use of these methods to validate a clinical copy number analysis pipeline was demonstrated. These data suggest that addition of a copy number calling analysis pipeline to a mutation panel is feasible and has the potential to increase the panel's utility and deliver a more comprehensive approach to characterizing the spectrum of genetic alterations that can guide the clinical care of childhood cancer patients.

Acknowledgments

We thank the members of the Texas Children's Cancer and Hematology Centers, and our patients and their families.

Footnotes

Supported by the Gillson Longenbaugh Foundation (D.W.P.), The Cullen Foundation (R.C.), the St. Baldrick's Foundation (D.W.P.), and the NIH National Human Genome Research Institute and National Cancer Institute grant U01HG006485 (D.W.P. and S.E.P.).

Disclosures: S.E.P. is a member of the Scientific Advisory Panel of Baylor Genetics Laboratories.

Supplemental material for this article can be found at http://doi.org/10.1016/j.jmoldx.2022.03.011.

Contributor Information

D. Williams Parsons, Email: dwparson@texaschildrens.org.

Angshumoy Roy, Email: aroy@bcm.edu.

Supplemental Data

Supplemental Figure S1

Bin size characterization. Comparison of on-target (A) and off-target (B) bin size. Comparison of on-target (C) and off-target (D) coverage using female normal peripheral blood samples. Median values marked in red. n = 8 (C and D).

Supplemental Figure S2

Threshold for median absolute deviation (MAD). A: MAD as a measure to evaluate noise in SCNA profile with the threshold for flagging samples with higher noise set at MAD = 0.35 (red line). B: A density plot shows a bimodal distribution of samples with variable quality (pass or flag/fail). n = 31 (B).

Supplemental Figure S3

A and B: Genome-wide SCNA profile of MOLT-4 cell line using single-nucleotide polymorphism array data (A) and Texas Children's Hospital (TCH) heme (B) targeted sequencing data. Segments with log2(FC) greater than zero and less than zero are plotted in red and blue, respectively. C: Comparison of gene-level log2(FC) between copy number array (OncoScan) and TCH solid sequencing data when OncoScan data were available (R2 = 0.77). n = 6 (C).

Supplemental Figure S4

Optimizing off-target bin size. Box plot of the difference between on-target median absolute deviation (MAD) and off-target MAD when off-target bin size is varied from 100 to 1000 kbp.

Supplemental Figure S5

Reproducibility measured using a tumor sample (T06), with a known PTEN heterozygous loss, shows concordant gene-level log2(FC) values (R2 = 0.85) when compared against its replicate (T06R), except for HIST1H3B.

Supplemental Figure S6

Total number of heterozygous (het.) single-nucleotide polymorphisms (SNPs) identified in 16 peripheral blood normal samples (A) and number of heterozygous SNP sites identified in these normal samples by chromosome arm compared between exon + flanking intron design (green) and exon design (orange), represented as a boxplot (B).

Supplemental Figure S7

Intragenic events identified in 26 clinical samples (optimization and validation cohorts) and HAP1 cell line. A: Loss of PTEN exon 1 in T06. B: Loss of KDM6A exons 5 to 10 in T06. C: Loss of NF1 exon 6 in T13. D: Loss of PBRM1 exons 2 to 23 in T15. E: Deep loss of RB1 exon 18 and shallow loss of RB1 exons 19 to 27 in T19. F: Loss of ATRX exons 9 to 17 in T26. G: Amplification of ALK and loss of ALK exon 16 in T23.

Supplemental Figure S8

Genome tracks highlighting the intragenic deletion in RB1 involving homozygous loss of exon 18 and heterozygous loss of exons 19 to 27. From top to bottom: Region visualized is chromosome (chr) 13:48,880,000 to 49,070,000. SCNA analysis identifies intragenic event in RB1, represented by log2(FC) of copy number bins (on-target and off-target bins represented in red and gray dots, respectively) and segments (orange). Coverage and read tracks for an unmatched normal peripheral blood sample (N03), showing baseline coverage at RB1 locus sequenced using Texas Children's Hospital (TCH) solid. Coverage and read tracks for T19, showing baseline-like coverage for exons 1 to 17, drop in coverage for exons 19 to 27, and a more dramatic drop in coverage for exon 18, sequenced using TCH solid. Track showing structural variants detected by Manta in the RB1 locus using TCH-intragenic copy number panel (TCH-ICN) data [breakpoints chr13:48,983,334-49,028,179 (exon 18) and chr13:48,984,540-49,075,121 (exons 19 to 27)]. Coverage and read tracks for T19 TCH-ICN, showing aberrant reads marking two pairs of breakpoints suggestive of two deletion events that are likely responsible for the SCNA events detected. RefSeq track shows the gene structure of RB1. Light blue boxed areas mark exon 18 and exons 19 to 27. SV, structural variant.

Supplemental Table S1
mmc9.docx (13KB, docx)
Supplemental Table S2
mmc10.docx (16.1KB, docx)
Supplemental Table S3
mmc11.docx (12.4KB, docx)
Supplemental Table S4
mmc12.docx (14.5KB, docx)

References

  • 1. Surrey L.F., MacFarland S.P., Chang F., Cao K., Rathi K.S., Akgumus G.T., Gallo D., Lin F., Gleason A., Raman P., Aplenc R., Bagatell R., Minturn J., Mosse Y., Santi M., Tasian S.K., Waanders A.J., Sarmady M., Maris J.M., Hunger S.P., Li M.M. Clinical utility of custom-designed NGS panel testing in pediatric tumors. Genome Med. 2019;11:32. doi: 10.1186/s13073-019-0644-8.
  • 2. Seed G., Yuan W., Mateo J., Carreira S., Bertan C., Lambros M., Boysen G., Ferraldeschi R., Miranda S., Figueiredo I., Riisnaes R., Crespo M., Rodrigues D.N., Talevich E., Robinson D.R., Kunju L.P., Wu Y.-M., Lonigro R., Sandhu S., Chinnaiyan A.M., de Bono J.S. Gene copy number estimation from targeted next-generation sequencing of prostate cancer biopsies: analytic validation and clinical qualification. Clin Cancer Res. 2017;23:6070–6077. doi: 10.1158/1078-0432.CCR-17-0972.
  • 3. Nagarajan R., Bartley A.N., Bridge J.A., Jennings L.J., Kamel-Reid S., Kim A., Lazar A.J., Lindeman N.I., Moncur J., Rai A.J., Routbort M.J., Vasalos P., Merker J.D. A window into clinical next-generation sequencing-based oncology testing practices. Arch Pathol Lab Med. 2017;141:1679–1685. doi: 10.5858/arpa.2016-0542-CP.
  • 4. Merker J.D., Devereaux K., Iafrate A.J., Kamel-Reid S., Kim A.S., Moncur J.T., Montgomery S.B., Nagarajan R., Portier B.P., Routbort M.J., Smail C., Surrey L.F., Vasalos P., Lazar A.J., Lindeman N.I. Proficiency testing of standardized samples shows very high interlaboratory agreement for clinical next-generation sequencing-based oncology assays. Arch Pathol Lab Med. 2019;143:463–471. doi: 10.5858/arpa.2018-0336-CP.
  • 5. Pritchard C.C., Salipante S.J., Koehler K., Smith C., Scroggins S., Wood B., Wu D., Lee M.K., Dintzis S., Adey A., Liu Y., Eaton K.D., Martins R., Stricker K., Margolin K.A., Hoffman N., Churpek J.E., Tait J.F., King M.-C., Walsh T. Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens. J Mol Diagn. 2014;16:56–67. doi: 10.1016/j.jmoldx.2013.08.004.
  • 6. Cheng D.T., Mitchell T.N., Zehir A., Shah R.H., Benayed R., Syed A., Chandramohan R., Liu Z.Y., Won H.H., Scott S.N., Brannon A.R., O'Reilly C., Sadowska J., Casanova J., Yannes A., Hechtman J.F., Yao J., Song W., Ross D.S., Oultache A., Dogan S., Borsu L., Hameed M., Nafa K., Arcila M.E., Ladanyi M., Berger M.F. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn. 2015;17:251–264. doi: 10.1016/j.jmoldx.2014.12.006.
  • 7. Kadri S., Long B.C., Mujacic I., Zhen C.J., Wurst M.N., Sharma S., McDonald N., Niu N., Benhamed S., Tuteja J.H., Seiwert T.Y., White K.P., McNerney M.E., Fitzpatrick C., Wang Y.L., Furtado L.V., Segal J.P. Clinical validation of a next-generation sequencing genomic oncology panel via cross-platform benchmarking against established amplicon sequencing assays. J Mol Diagn. 2017;19:43–56. doi: 10.1016/j.jmoldx.2016.07.012.
  • 8. Zack T.I., Schumacher S.E., Carter S.L., Cherniack A.D., Saksena G., Tabak B., Lawrence M.S., Zhang C.-Z., Wala J., Mermel C.H., Sougnez C., Gabriel S.B., Hernandez B., Shen H., Laird P.W., Getz G., Meyerson M., Beroukhim R. Pan-cancer patterns of somatic copy number alteration. Nat Genet. 2013;45:1134–1140. doi: 10.1038/ng.2760.
  • 9. Aran D., Sirota M., Butte A.J. Corrigendum: systematic pan-cancer analysis of tumour purity. Nat Commun. 2016;7:10707. doi: 10.1038/ncomms10707.
  • 10. Marusyk A., Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta. 2010;1805:105–117. doi: 10.1016/j.bbcan.2009.11.002.
  • 11. Luthra R., Patel K.P., Routbort M.J., Broaddus R.R., Yau J., Simien C., Chen W., Hatfield D.Z., Medeiros L.J., Singh R.R. A targeted high-throughput next-generation sequencing panel for clinical screening of mutations, gene amplifications, and fusions in solid tumors. J Mol Diagn. 2017;19:255–264. doi: 10.1016/j.jmoldx.2016.09.011.
  • 12. Wang S.R., Malik S., Tan I.B., Chan Y.S., Hoi Q., Ow J.L., He C.Z., Ching C.E., Poh D.Y.S., Seah H.M., Cheung K.H.T., Perumal D., Devasia A.G., Pan L., Ang S., Lee S.E., Ten R., Chua C., Tan D.S.W., Qu J.Z.Z., Bylstra Y.M., Lim L., Lezhava A., Ng P.C., Wong C.W., Lim T., Tan P. Technical validation of a next-generation sequencing assay for detecting actionable mutations in patients with gastrointestinal cancer. J Mol Diagn. 2016;18:416–424. doi: 10.1016/j.jmoldx.2016.01.006.
  • 13. Seong M.-W., Seo S.H., Yu Y.S., Hwang J.-M., Cho S.I., Ra E.K., Park H., Lee S.J., Kim J.Y., Park S.S. Diagnostic application of an extensive gene panel for Leber congenital amaurosis with severe genetic heterogeneity. J Mol Diagn. 2015;17:100–105. doi: 10.1016/j.jmoldx.2014.09.003.
  • 14. Kerkhof J., Schenkel L.C., Reilly J., McRobbie S., Aref-Eshghi E., Stuart A., Rupar C.A., Adams P., Hegele R.A., Lin H., Rodenhiser D., Knoll J., Ainsworth P.J., Sadikovic B. Clinical validation of copy number variant detection from targeted next-generation sequencing panels. J Mol Diagn. 2017;19:905–920. doi: 10.1016/j.jmoldx.2017.07.004.
  • 15. Schmidt A.Y., Hansen T.V.O., Ahlborn L.B., Jønson L., Yde C.W., Nielsen F.C. Next-generation sequencing-based detection of germline copy number variations in BRCA1/BRCA2: validation of a one-step diagnostic workflow. J Mol Diagn. 2017;19:809–816. doi: 10.1016/j.jmoldx.2017.07.003.
  • 16. Ellingford J.M., Campbell C., Barton S., Bhaskar S., Gupta S., Taylor R.L., Sergouniotis P.I., Horn B., Lamb J.A., Michaelides M., Webster A.R., Newman W.G., Panda B., Ramsden S.C., Black G.C. Validation of copy number variation analysis for next-generation sequencing diagnostics. Eur J Hum Genet. 2017;25:719–724. doi: 10.1038/ejhg.2017.42.
  • 17. Roy S., Coldren C., Karunamurthy A., Kip N.S., Klee E.W., Lincoln S.E., Leon A., Pullambhatla M., Temple-Smolkin R.L., Voelkerding K.V., Wang C., Carter A.B. Standards and guidelines for validating next-generation sequencing bioinformatics pipelines: a joint recommendation of the Association for Molecular Pathology and the College of American Pathologists. J Mol Diagn. 2018;20:4–27. doi: 10.1016/j.jmoldx.2017.11.003.
  • 18. Fujiki R., Ikeda M., Yoshida A., Akiko M., Yao Y., Nishimura M., Matsushita K., Ichikawa T., Tanaka T., Morisaki H., Morisaki T., Ohara O. Assessing the accuracy of variant detection in cost-effective gene panel testing by next-generation sequencing. J Mol Diagn. 2018;20:572–582. doi: 10.1016/j.jmoldx.2018.04.004.
  • 19. Talevich E., Shain A.H., Botton T., Bastian B.C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput Biol. 2016;12:e1004873. doi: 10.1371/journal.pcbi.1004873.
  • 20. Knutsen T., Padilla-Nash H.M., Wangsa D., Barenboim-Stapleton L., Camps J., McNeil N., Difilippantonio M.J., Ried T. Definitive molecular cytogenetic characterization of 15 colorectal cancer cell lines. Genes Chromosomes Cancer. 2010;49:204–223. doi: 10.1002/gcc.20730.
  • 21. Kawai K., Viars C., Arden K., Tarin D., Urquidi V., Goodison S. Comprehensive karyotyping of the HT-29 colon adenocarcinoma cell line. Genes Chromosomes Cancer. 2002;34:1–8. doi: 10.1002/gcc.10003.
  • 22.Varma S., Pommier Y., Sunshine M., Weinstein J.N., Reinhold W.C. High resolution copy number variation data in the NCI-60 cancer cell lines from whole genome microarrays accessible through CellMiner. PLoS One. 2014;9:e92047. doi: 10.1371/journal.pone.0092047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Essletzbichler P., Konopka T., Santoro F., Chen D., Gapp B.V., Kralovics R., Brummelkamp T.R., Nijman S.M.B., Bürckstümmer T. Megabase-scale deletion using CRISPR/Cas9 to generate a fully haploid human cell line. Genome Res. 2014;24:2059–2065. doi: 10.1101/gr.177220.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Tarasov A., Vilella A.J., Cuppen E., Nijman I.J., Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–2034. doi: 10.1093/bioinformatics/btv098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Macé A., Tuke M.A., Beckmann J.S., Lin L., Jacquemont S., Weedon M.N., Reymond A., Kutalik Z. New quality measure for SNP array based CNV detection. Bioinformatics. 2016;32:3298–3305. doi: 10.1093/bioinformatics/btw477. [DOI] [PubMed] [Google Scholar]
  • 27.Olshen A.B., Venkatraman E.S., Lucito R., Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004;5:557–572. doi: 10.1093/biostatistics/kxh008. [DOI] [PubMed] [Google Scholar]
  • 28.Zarrei M., MacDonald J.R., Merico D., Scherer S.W. A copy number variation map of the human genome. Nat Rev Genet. 2015;16:172–183. doi: 10.1038/nrg3871. [DOI] [PubMed] [Google Scholar]
  • 29.Tate J.G., Bamford S., Jubb H.C., Sondka Z., Beare D.M., Bindal N., Boutselakis H., Cole C.G., Creatore C., Dawson E., Fish P., Harsha B., Hathaway C., Jupe S.C., Kok C.Y., Noble K., Ponting L., Ramshaw C.C., Rye C.E., Speedy H.E., Stefancsik R., Thompson S.L., Wang S., Ward S., Campbell P.J., Forbes S.A. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2019;47:D941–D947. doi: 10.1093/nar/gky1015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Garrison E., Marth G. Haplotype-based variant detection from short-read sequencing. arXiv. 2012 doi: 10.48550/arXiv.1207.3907. [Preprint] [DOI] [Google Scholar]
  • 31.Chen X., Schulz-Trieglaff O., Shaw R., Barnes B., Schlesinger F., Källberg M., Cox A.J., Kruglyak S., Saunders C.T. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32:1220–1222. doi: 10.1093/bioinformatics/btv710. [DOI] [PubMed] [Google Scholar]
  • 32.Chandramohan R., Kakkar N., Roy A., Parsons D.W. reconCNV: interactive visualization of copy number data from high-throughput sequencing. Bioinformatics. 2020;37:1164–1167. doi: 10.1093/bioinformatics/btaa746. [DOI] [PubMed] [Google Scholar]
  • 33.Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Attiyeh E.F., London W.B., Mossé Y.P., Wang Q., Winter C., Khazi D., McGrady P.W., Seeger R.C., Look A.T., Shimada H., Brodeur G.M., Cohn S.L., Matthay K.K., Maris J.M., Children’s Oncology Group Chromosome 1p and 11q deletions and outcome in neuroblastoma. N Engl J Med. 2005;353:2243–2253. doi: 10.1056/NEJMoa052399. [DOI] [PubMed] [Google Scholar]
  • 35.Li M. Clinical implementation of the standards and guidelines for the interpretation and reporting of sequence variants in cancer: a joint consensus recommendation of AMP, ASO and CAP. Cancer Genet. 2017;214-215:36. [Google Scholar]
  • 36.Allen C.E., Laetsch T.W., Mody R., Irwin M.S., Lim M.S., Adamson P.C., Seibel N.L., Parsons D.W., Cho Y.J., Janeway K., Pediatric MATCH Target and Agent Prioritization Committee Target and agent prioritization for the Children’s Oncology Group-National Cancer Institute Pediatric MATCH trial. J Natl Cancer Inst. 2017;109:djw274. doi: 10.1093/jnci/djw274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Ross D.S., Zehir A., Cheng D.T., Benayed R., Nafa K., Hechtman J.F., Janjigian Y.Y., Weigelt B., Razavi P., Hyman D.M., Baselga J., Berger M.F., Ladanyi M., Arcila M.E. Next-generation assessment of human epidermal growth factor receptor 2 (ERBB2) amplification status: clinical validation in the context of a hybrid capture-based, comprehensive solid tumor genomic profiling assay. J Mol Diagn. 2017;19:244–254. doi: 10.1016/j.jmoldx.2016.09.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Wang J., Giorda K., Lai Z., Stetson D., Jarosz M. Abstract 397: whole genome copy number variation analysis using a SNP-focused targeted sequencing panel for tumor analysis. Mol Cell Biol Genet. 2017;77(13 Suppl) Abstract nr 397. [Google Scholar]
  • 39.Shen W., Paxton C.N., Szankasi P., Longhurst M., Schumacher J.A., Frizzell K.A., Sorrells S.M., Clayton A.L., Jattani R.P., Patel J.L., Toydemir R., Kelley T.W., Xu X. Detection of genome-wide copy number variants in myeloid malignancies using next-generation sequencing. J Clin Pathol. 2018;71:372–378. doi: 10.1136/jclinpath-2017-204823. [DOI] [PubMed] [Google Scholar]
  • 40.Bielski C.M., Zehir A., Penson A.V., Donoghue M.T.A., Chatila W., Armenia J., Chang M.T., Schram A.M., Jonsson P., Bandlamudi C., Razavi P., Iyer G., Robson M.E., Stadler Z.K., Schultz N., Baselga J., Solit D.B., Hyman D.M., Berger M.F., Taylor B.S. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat Genet. 2018;50:1189–1195. doi: 10.1038/s41588-018-0165-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Hieronymus H., Murali R., Tin A., Yadav K., Abida W., Moller H., Berney D., Scher H., Carver B., Scardino P., Schultz N., Taylor B., Vickers A., Cuzick J., Sawyers C.L. Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. Elife. 2018;7:e37294. doi: 10.7554/eLife.37294. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Geiersbach K.B., Willmore-Payne C., Pasi A.V., Paxton C.N., Werner T.L., Xu X., Wittwer C.T., Gulbahce H.E., Downs-Kelly E. Genomic copy number analysis of HER2 -equivocal breast cancers. Am J Clin Pathol. 2016;146:439–447. doi: 10.1093/ajcp/aqw130. [DOI] [PubMed] [Google Scholar]
  • 43.Lee K., Kim H.J., Jang M.H., Lee S., Ahn S., Park S.Y. Centromere 17 copy number gain reflects chromosomal instability in breast cancer. Sci Rep. 2019;9:1–11. doi: 10.1038/s41598-019-54471-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Nichols C.A., Gibson W.J., Brown M.S., Kosmicki J.A., Busanovich J.P., Wei H., Urbanski L.M., Curimjee N., Berger A.C., Gao G.F., Cherniack A.D., Dhe-Paganon S., Paolella B.R., Beroukhim R. Loss of heterozygosity of essential genes represents a widespread class of potential cancer vulnerabilities. Nat Commun. 2020;11:1–14. doi: 10.1038/s41467-020-16399-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Ting M.A., Reuther J., Chandramohan R., Voicu H., Gandhi I., Liu M., Cortes-Santiago N., Foster J.H., Hicks J., Nuchtern J., Scollon S., Plon S.E., Chintagumpala M., Rainusso N., Roy A., Parsons D.W. Genomic analysis and preclinical xenograft model development identify potential therapeutic targets for MYOD1-mutant soft-tissue sarcoma of childhood. J Pathol. 2021;255:52–61. doi: 10.1002/path.5736. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplemental Figure S1

Bin size characterization. Comparison of on-target (A) and off-target (B) bin size. Comparison of on-target (C) and off-target (D) coverage using female normal peripheral blood samples. Median values marked in red. n = 8 (C and D).

mmc1.pdf (28.1KB, pdf)
Supplemental Figure S2

Threshold for median absolute deviation (MAD). A: MAD as a measure to evaluate noise in SCNA profile with the threshold for flagging samples with higher noise set at MAD = 0.35 (red line). B: A density plot shows a bimodal distribution of samples with variable quality (pass or flag/fail). n = 31 (B).

mmc2.pdf (29.3KB, pdf)
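
The noise flag described for Supplemental Figure S2 can be illustrated with a minimal sketch. It assumes that MAD is the unscaled median absolute deviation of a sample's bin-level log2 ratios about their median, and that a sample is flagged when MAD strictly exceeds 0.35; the caption states only the threshold value, so both choices, along with the function names and the example values, are hypothetical.

```python
from statistics import median

# Flagging threshold taken from the Supplemental Figure S2 caption.
MAD_FLAG_THRESHOLD = 0.35


def median_absolute_deviation(values):
    """Unscaled MAD: median of absolute deviations from the median.

    Assumption: no consistency constant (e.g. 1.4826) is applied.
    """
    values = list(values)
    center = median(values)
    return median([abs(v - center) for v in values])


def flag_noisy_sample(log2_ratios, threshold=MAD_FLAG_THRESHOLD):
    """Return True when the sample's MAD exceeds the threshold.

    Assumption: the comparison is strict (MAD > threshold).
    """
    return median_absolute_deviation(log2_ratios) > threshold


# Hypothetical bin-level log2 ratios for a quiet and a noisy sample.
quiet_sample = [0.0, 0.1, -0.1, 0.05, -0.05]
noisy_sample = [-1.0, 1.0, -0.8, 0.8, 0.0]
```

In practice the input would be the per-bin log2 ratios emitted by the copy number caller, and a flagged sample would be routed to manual review rather than rejected outright.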
Supplemental Figure S3

A and B: Genome-wide SCNA profile of the MOLT-4 cell line using single-nucleotide polymorphism array data (A) and Texas Children's Hospital (TCH) heme (B) targeted sequencing data. Segments with log2(FC) greater than zero and less than zero are plotted in red and blue, respectively. C: Comparison of gene-level log2(FC) between copy number array (OncoScan) and TCH solid sequencing data when OncoScan data were available (R² = 0.77). n = 6 (C).

mmc3.pdf (990.4KB, pdf)
Supplemental Figure S4

Optimizing off-target bin size. Box plot of the difference between on-target median absolute deviation (MAD) and off-target MAD when off-target bin size is varied from 100 to 1000 kbp.

mmc4.pdf (23.2KB, pdf)
Supplemental Figure S5

Reproducibility measured using a tumor sample (T06) with a known heterozygous PTEN loss. Gene-level log2(FC) values are concordant with those of its replicate (T06R) (R² = 0.85), except for HIST1H3B.

mmc5.pdf (15.4KB, pdf)
Supplemental Figure S6

Total number of heterozygous (het.) single-nucleotide polymorphisms (SNPs) identified in 16 peripheral blood normal samples (A) and number of heterozygous SNP sites identified in these normal samples by chromosome arm, compared between the exon + flanking intron design (green) and the exon design (orange) and represented as a boxplot (B).

mmc6.pdf (32.1KB, pdf)
Supplemental Figure S7

Intragenic events identified in 26 clinical samples (optimization and validation cohorts) and the HAP1 cell line. A: Loss of PTEN exon 1 in T06. B: Loss of KDM6A exons 5 to 10 in T06. C: Loss of NF1 exon 6 in T13. D: Loss of PBRM1 exons 2 to 23 in T15. E: Deep loss of RB1 exon 18 and shallow loss of RB1 exons 19 to 27 in T19. F: Loss of ATRX exons 9 to 17 in T26. G: Amplification of ALK and loss of ALK exon 16 in T23.

mmc7.pdf (99.1KB, pdf)
Supplemental Figure S8

Genome tracks highlighting the intragenic deletion in RB1 involving homozygous loss of exon 18 and heterozygous loss of exons 19 to 27. From top to bottom: Region visualized is chromosome (chr) 13:48,880,000 to 49,070,000. SCNA analysis identifies an intragenic event in RB1, represented by log2(FC) of copy number bins (on-target and off-target bins represented as red and gray dots, respectively) and segments (orange). Coverage and read tracks for an unmatched normal peripheral blood sample (N03), showing baseline coverage at the RB1 locus sequenced using Texas Children's Hospital (TCH) solid. Coverage and read tracks for T19, showing baseline-like coverage for exons 1 to 17, a drop in coverage for exons 19 to 27, and a more dramatic drop in coverage for exon 18, sequenced using TCH solid. Track showing structural variants detected by Manta in the RB1 locus using TCH-intragenic copy number panel (TCH-ICN) data [breakpoints chr13:48,983,334-49,028,179 (exon 18) and chr13:48,984,540-49,075,121 (exons 19 to 27)]. Coverage and read tracks for T19 TCH-ICN, showing aberrant reads marking two pairs of breakpoints suggestive of two deletion events that are likely responsible for the SCNA events detected. RefSeq track shows the gene structure of RB1. Light blue boxed areas mark exon 18 and exons 19 to 27. SV, structural variant.

mmc8.pdf (7.9MB, pdf)
Supplemental Table S1
mmc9.docx (13KB, docx)
Supplemental Table S2
mmc10.docx (16.1KB, docx)
Supplemental Table S3
mmc11.docx (12.4KB, docx)
Supplemental Table S4
mmc12.docx (14.5KB, docx)

Articles from The Journal of Molecular Diagnostics : JMD are provided here courtesy of American Society for Investigative Pathology
