Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2018 Nov 28.
Published in final edited form as: Cell Rep. 2018 Nov 6;25(6):1446–1457. doi: 10.1016/j.celrep.2018.10.046

Reliability of Whole-Exome Sequencing for Assessing Intratumor Genetic Heterogeneity

Weiwei Shi 1,11, Charlotte KY Ng 2,3,4,11, Raymond S Lim 2,11, Tingting Jiang 1, Sushant Kumar 5,6, Xiaotong Li 1,6, Vikram B Wali 1, Salvatore Piscuoglio 2,3, Mark B Gerstein 5,6,7, Anees B Chagpar 8,9, Britta Weigelt 2, Lajos Pusztai 1,9, Jorge S Reis-Filho 2,10,*, Christos Hatzis 1,9,12,*
PMCID: PMC6261536  NIHMSID: NIHMS1512150  PMID: 30404001

SUMMARY

Multi-region sequencing is used to detect intratumor genetic heterogeneity (ITGH) in tumors. To assess whether genuine ITGH can be distinguished from sequencing artifacts, we performed whole-exome sequencing (WES) on three anatomically distinct regions of the same tumor with technical replicates to estimate technical noise. Somatic variants were detected with three different WES pipelines and subsequently validated by high-depth amplicon sequencing. The cancer-only pipeline was unreliable, with about 69% of the identified somatic variants being false positive. Even with matched normal DNA for which 82% of the somatic variants were detected reliably, only 36%–78% were found consistently in technical replicate pairs. Overall, 34%–80% of the discordant somatic variants, which could be interpreted as ITGH, were found to constitute technical noise. Excluding mutations affecting low-mappability regions or occurring in certain mutational contexts was found to reduce artifacts, yet detection of sub-clonal mutations by WES in the absence of orthogonal validation remains unreliable.

Graphical Abstract

graphic file with name nihms-1512150-f0001.jpg

In Brief

Shi et al. report that standard coverage whole-exome sequencing and bioinformatics pipelines cannot discriminate between genuine intratumor genetic heterogeneity and sequencing artifacts. Although aggressive minimum depth filtering would not improve the false detection rate of subclonal mutations, excluding mutations in low-mappability regions or in certain mutational contexts could help.

INTRODUCTION

Intratumor genetic heterogeneity (ITGH), typically defined as the coexistence of genetically distinct but clonally related cancer cells within the same patient (Yap et al., 2012), can manifest itself spatially within the same lesion or as genetic differences between different metastatic sites and the primary tumor from the same patient (Ding et al., 2010; Gerlinger et al., 2012; Marusyk et al., 2012; Newburger et al., 2013; Yates et al., 2015). The broad availability of massively parallel sequencing has accelerated research into ITGH, and numerous studies have applied whole-exome or targeted-exome sequencing to multiple biopsies from the same cancer, to different metastatic lesions from the same patient, and more recently to multiple single cells from the same cancer (Gerlinger et al., 2012; Hou et al., 2012; Martelotto et al., 2017; Navin et al., 2011; Newburger et al., 2013; Nik-Zainal et al., 2012; Wang et al., 2014; Xu et al., 2012). ITGH represents a snapshot of the tumor’s evolutionary path and is a clinically important phenomenon with implications in prognosis and treatment response (Fisher et al., 2013; Hiley et al., 2014; Jiang et al., 2014; Marusyk et al., 2012; Morris et al., 2016; Turner and Reis-Filho, 2012).

The assessment of ITGH, by definition, involves the detection of subclonal, low-frequency variants that are not uniformly present in all cancer cells and is made possible by the availability of bioinformatics tools to detect low-frequency somatic mutations with high sensitivity (Cibulskis et al., 2013; Koboldt et al., 2012; Saunders et al., 2012; Wilm et al., 2012). However, the presence of technical noise in sequencing data is well known (Li, 2014; Nakamura et al., 2011), and it is unclear whether genuine ITGH can be reliably distinguished from artifacts generated during library preparation, sequencing, and data processing (Qi et al., 2015; Smith et al., 2014). Understanding the signal-to-noise characteristics in these experiments is critical for the interpretation of ITGH.

Given the implications in prognosis and treatment response, ITGH is an important consideration in the clinical setting. Because the cost of WES and complex informed consent requirements, the inclusion of matching normal samples still represents a limitation in the sequencing of tumor samples in some large clinical trials (Shi et al., 2017). Whether subclonal mutations can be robustly identified in the absence of matching normal samples and whether pooled normal samples from unrelated individuals would serve as a reasonable control should be explored as alternative options for the assessment of ITGH in the clinical setting.

In this study, we aimed to assess the reliability of somatic variant detection from whole-exome sequencing (WES) in the context of ITGH (Figures 1A and S1A). To address this question, we performed WES on DNA from biopsies obtained from three anatomically distinct regions of six primary breast cancers (6 3 3 biopsies) and from matched peripheral blood leukocytes. Additionally, to determine the background noise levels as a comparator for the assessment of ITGH, aliquots of the same DNA samples from six distinct biopsies were sequenced twice to generate technical replicates. We examined the reliability of somatic variant detection using three different analysis approaches, namely, using cancer WES data only, cancer data and WES from pooled unrelated (i.e., non-matched) normal samples, and WES data from cancer and matching normal tissue (i.e., blood). Each approach used different sets of detection algorithms and filtering steps in an attempt to control for the specific biases associated with each approach. To generate the “gold standard” benchmark dataset, we re-sequenced all somatic variants identified by any of the three somatic variant detection pipelines in at least one sample using high-depth targeted amplicon sequencing. Given the higher depth obtained by amplicon sequencing, we assessed true ITGH and estimated the frequency of false positives by the different detection pipelines. Finally, we evaluated the sequence patterns and context in which the artifactual mutations occurred to improve the specificity of WES for characterizing genuine ITGH.

Figure 1. Reliability of Somatic SNVs and INDELs Detected by WES and High-Depth Amplicon Sequencing Validation of Variants.

Figure 1.

(A) Biopsies were obtained from three anatomically distinct regions of each tumor to assess spatial genomic heterogeneity. One of the three DNA samples was split in two to provide a pair of technical replicates. Somatic variants were detected by three different WES analysis pipelines and subsequently validated by high-depth amplicon sequencing.

(B) Number of somatic mutations identified by WES in each technical replicate, subclassified according to their validation status by high-depth amplicon sequencing.

(C and D) Concordance of somatic SNVs (C) and INDELs (D) defined in each pair of technical replicates using the matched-normal WES analysis pipeline (left) or high-depth amplicon sequencing (right). Discordance between the replicates quantified as the Jaccard distance shown next to each bar and the pathogenicity of variants were assessed as described in Supplemental Experimental Procedures. Putative WES variants re-sequenced with high-depth amplicon sequencing were further classified as absent (VAF < 1%), germline (tumor VAF/germline VAF < 5), or low depth (<503).

(E) Validation status of variants (SNVs and INDELs) detected by WES in technical replicate pairs, categorized as concordant (“con”) or discordant (“dis”) by WES, and the distribution of their AmpliSeq validation status is shown within each bar. See also Figure S1.

RESULTS

Limited Reliability of Somatic Mutations Defined by WES

We performed WES on DNA extracted from three distinct regions of the primary tumor and the matching normal blood cells from six breast cancer patients, including four with estrogen receptor-positive and two with triple-negative cancers (Table 1). The biopsies were obtained from three anatomically distinct regions of each tumor at least 1 cm apart (i.e., intratumor replicates; Figure 1A) in the context of a prospective institutional review board-approved study to assess intratumor molecular heterogeneity (von Wahlde et al., 2017). All tumor samples had at least 50% tumor cellularity on the basis of pathologic assessment. For the technical replicates, a second library was generated from one of the tumor DNA samples randomly selected from each cancer and sequenced by WES using the same protocol at the same facility on a different day. The mean target depth was 1603 (range 703 to 2203; Table S1), consistent with recommendations for WES (Clark et al., 2011; Sims et al., 2014). Following WES, somatic single-nucleotide variants (SNVs) and small insertion-deletions (INDELs) were identified by three different somatic variant calling pipelines that used (1) the tumor DNA alone (i.e., tumor only, a common approach in clinical practice), (2) tumor DNA and pooled unrelated normal DNA (i.e., cohort normal), or (3) tumor DNA and patient-matched normal DNA (i.e., matched normal; Figure S1A; Supplemental Experimental Procedures). Targeted amplicon sequencing using an orthogonal library generation and an independent sequencing method (AmpliSeq) was performed for all putative somatic variants identified by any of the WES variant calling pipelines on all tumor and matching normal DNA to a median depth of 6053 to define the “gold standard” mutation status for each identified somatic variant (Table S1; Supplemental Experimental Procedures).

Table 1.

Tumor Characteristics and Estimated Tumor Cellularity and Ploidy from WES Data Using FACETS

Sample Subtype Stage Grade Estimated
Purity (%)
Estimated
Ploidy
Case 1 biorep
A/techrep 1
ER+/PR+/
HER2+
IIA 2 53.8 1.92
Case 1 biorep B 46.3 2.00
Case 1 biorep C 26.0 1.92
Case 1 techrep 2 54.0 1.92
Case 2 biorep A ER−/PR−/
HER2−
IIB 3 40.0 2.14
Case 2 biorep
B/techrep 1
32.3 1.79
Case 2 biorep C NE 1.72
Case 2 techrep 2 38.5 2.03
Case 3 biorep A ER+/PR+/
HER2+
IIB 2 56.4 1.89
Case 3 biorep B 72.8 1.93
Case 3 biorep
C/techrep 1
64.9 1.90
Case 3 techrep 2 66.7 1.93
Case 4 biorep A ER+/PR−/
HER2−
IIB 2 60.4 1.98
Case 4 biorep
B/techrep 1
61.7 1.95
Case 4 biorep C 57.6 1.99
Case 4 techrep 2 57.6 1.94
Case 5 biorep A ER−/PR−/
HER2−
IIA 3 78.7 2.60
Case 5 biorep
B/techrep 1
76.7 2.62
Case 5 biorep C 80.8 2.82
Case 5 techrep 2 76.2 2.61
Case 6 biorep
A/techrep 1
ER+/PR+/
HER2+
IIA 2 47.8 1.99
Case 6 biorep B 49.4 2.05
Case 6 biorep C 52.9 2.18
Case 6 techrep 2 49.0 1.96

biorep, biological replicate; NE, could not be estimated; techrep, technical replicate; WES, whole-exome sequencing.

To quantify the technical reliability of somatic mutation detection by the matched-normal WES pipeline, the approach that is most frequently used in the research setting to assess ITGH, we compared the somatic mutations identified in the six pairs of technical replicates. In this experiment, tumor samples and normal samples were sequenced to a median coverage of 1843 (range 92–211) and 903 (range 80–138), respectively (Table S1). We identified medians of 74 (range 40–125) and 3 (range 0–13) somatic SNVs and INDELs, respectively, in each of the 12 DNA samples. Considering the high-depth amplicon sequencing results as the “gold standard,” we categorized candidate mutations detected in WES as true somatic, absent, germline-like (incorporating genuine germline variants and artifactual variant alleles caused by alignment biases and/or the sequencing technology) (Kim and Speed, 2013), or low-depth (i.e., technical failure with amplicon sequencing; Table S2; Supplemental Experimental Procedures). Excluding the low-depth (technical failure) variants, a median 82% (range 56%–90% per sample) of the somatic SNVs were confirmed as somatic, a median 5% (range 0%–16%) as germline-like, and the remaining were absent by AmpliSeq (Figure 1B; Table S3).

Given that both technical replicates used the same input DNA, we anticipated detecting nearly identical somatic mutations in each pair of replicates. However, only a subset of the somatic SNVs and INDELs were consistently identified in technical replicate pairs, with median Jaccard distances (ranging from 0, perfect agreement, to 1, absence of overlapping variants; Supplemental Experimental Procedures) of 0.39 (range 0.24–0.64) for SNVs (Figure 1C) and 0.17 (range 0–0.50) for INDELs (Figure 1D). Interestingly, the technical replicate pairs with the highest Jaccard distances were those with the lowest tumor cell content as inferred by FACETS (Shen and Seshan, 2016) (Table 1). There were also a small number of potentially pathogenic variants among the discordant variants in technical replicate pairs (range 0–3; Figure 1C, red bars) that could have been misinterpreted as ITGH. We obtained fewer somatic mutations but similarly modest reproducibility with the cohort-normal WES pipeline and far fewer somatic mutations but improved reproducibility using the tumor-only pipeline (see explanation below; Figures S1B and S1C). Comparing only the mutations that were confirmed to be somatic by AmpliSeq between the technical replicate pairs, we observed almost perfect agreement for SNVs (Figure 1C) and for INDELs (Figure 1D) (maximum Jaccard distance of 0.02 and 0, respectively).

When we examined the somatic mutations found to be concordant or discordant between pairs of technical replicates on the basis of WES, a median of 95% (range 73%–97%) of the concordant variants were confirmed to be genuinely somatic in at least one of the two technical replicates, compared with a median 33% (range 14%–48%) of the discordant variants (Figure 1E). Of the discordant WES variants, a median of 44% (range 36%–71%) were found to be absent by AmpliSeq (i.e., false positive in one of the two technical replicates), and a median of 7% (range 3%–22%) were germline(-like) variants (i.e., missed by WES in the matching normal). The validation status of 3% (range 1%–8%) of the mutations could not be ascertained, because of technical failure of low AmpliSeq coverage in the validation experiments.

Taken together, these results suggest that WES performed at typical sequencing depth may be inadequate for detecting ITGH, particularly when the tumor cell content is less than 50%, as only 62% (range 36%–76%) of the somatic mutations were detected consistently in the technical replicate pairs by WES, with the remaining mutations falsely appearing as discordant.

WES Overestimates True ITGH

Next, we quantified ITGH by comparing somatic variant calls between the geographically distinct biopsies from the same cancer. In this analysis, we also examined how the three different WES analytic approaches differed in the quantification of ITGH (Figure S1A; Supplemental Experimental Procedures). Tumor cellularity of all samples was inferred from WES using FACETS (Shen and Seshan, 2016) (Table 1). We used mixed-effects linear modeling to estimate an average cellularity of 54.7% with intratumor and technical SDs of 7.0% and 2.2%, respectively (Supplemental Experimental Procedures). The matched-normal pipeline detected a median of 150 unique somatic mutations (range 68–186) in each tumor. Compared with the matched-normal pipeline, the cohort-normal pipeline detected a median of 101 mutations (range 48–131; p > 0.05, Wilcoxon test), and the tumor-only pipeline identified a median of 62 mutations (range 36–97; p = 0.01, Wilcoxon test; Figures 2A and 2B; Table S4).

Figure 2. ITGH as Assessed by Different WES Analysis Pipelines.

Figure 2.

(A) Total number of somatic variants (SNVs and INDELs) identified in intratumor biopsies by the three WES analysis pipelines. One of the two technical replicates was randomly selected for inclusion in this analysis. Validation status by high-depth amplicon sequencing (i.e., somatic, germline, absent [VAF < 1%], low depth [< 503 coverage]) is shown according to the color key.

(B) Venn diagrams showing the overlap of putative somatic variants detected in intratumor biopsies from each tumor by the three WES analysis pipelines. The last row includes only “true” somatic variants validated by high-depth amplicon sequencing. The size of the circles is proportional to the number of somatic variants in a biopsy, with the numbers representing the total variants and those in parentheses indicating the number of pathogenic variants.

(C) Performance characteristics of the three WES analysis pipelines to identify true somatic variants. Putative somatic variants were considered as “true” if confirmed by high-depth amplicon sequencing. Precision was calculated as TP/(FP + TP) and sensitivity as TP/STP, where TP and FP are the number of true-positive and false-positive variants and STP is the total number of true somatic calls made by all three pipelines. Each circle represents one sample as analyzed by each pipeline, and the size of the circles is proportional to the number of putative somatic variants per biopsy identified by each analysis pipeline. See also Figure S2.

To assess the reliability of the different WES pipelines in detecting ITGH, we compared the WES candidate mutations with the “gold-standard” AmpliSeq-validated somatic variants from the same sample. A median of 62% (range 50%–98%) of the candidate somatic variants detected by the tumor-only pipeline were germline variants, and only 28% (range 2%–45%) were validated as true somatic mutations, highlighting the challenges posed by this commonly used approach in clinical practice. By contrast, a median of 79% (range 59%–91%) and 84% (range 61%–92%) of the variants defined by the cohort-normal and the matched-normal pipelines, respectively, were true somatic mutations (Figures 2A, S1D, and S1E). The tumor-only pipeline had the lowest sensitivity (median 18%, range 0%–29%) and precision (median 28%, range 0%–46%) for identifying true somatic variants. The matched-normal pipeline had the highest sensitivity (median 87%, range 47%–94%) and precision (median 86%, range 62%–94%; Figure 2C), whereas the cohort-normal pipeline had similar precision (median 83%, range 61%–91%) but significantly lower sensitivity (median 48%, range 26%–64%; p < 0.001, Wilcoxon test). Using the matched-normal pipeline, tumor cellularity was positively correlated with sensitivity (Pearson r = 0.7, p = 0.001) and numerically, though not statistically significantly, correlated with precision (Figure S2A). We did not observe the same correlation with the other two pipelines.

Next, we estimated the apparent ITGH on the basis of mutations detected by the three WES calling pipelines using the Jaccard distance as the metric of ITGH. Mutations identified as somatic in one or two of the three biopsies and as germ-line-like or absent in the remaining biopsies contributed to ITGH. The tumor-only pipeline had a median Jaccard distance of 0.34 (range 0.19–0.96) compared with the cohort-normal pipeline of 0.70 (range 0.53–0.91) and matched-normal pipeline of 0.60 (range 0.44–0.89) (Figure 2B). The apparently lower ITGH defined by the tumor-only pipeline (p = 0.03 for tumor-only versus cohort-normal, p > 0.05 versus matched-normal, paired Wilcoxon tests) was due to the large number of germline variants misidentified as somatic mutations by the tumor-only pipeline (Figure 2A). When ITGH was estimated on the basis of the AmpliSeq-validated somatic mutations only, the median Jaccard distance was 0.40 (range 0.19–0.61; Figure 2B), which in the context of this study was considered a true estimation of ITGH. We also observed that the proportion of private mutations in a given biopsy was positively correlated with its purity relative to the mean purity of all biopsies for the patient (Pearson r = 0.531, p = 0.023; Figure S2B), suggesting that ITGH may be overestimated in cases with large variability in tumor purity between biopsies. Compared with the apparent ITGH defined by the candidate mutations in the cohort-normal and matched-normal WES pipelines, the true ITGH was significantly smaller (p = 0.015 versus cohort-normal and p = 0.015 versus matched-normal, paired Wilcoxon tests). We noted that 5.8% of the heterogeneous variants, also called branch mutations, that were not present in all biopsies of a given case were predicted to be pathogenic, but there was no statistically significant enrichment in pathogenic mutations compared to non-pathogenic variants among the heterogeneous somatic variants (Figure 2B). Heterogeneous variants were detected in a small number of cancer genes (21 of 306 [6.9%]), and five of these variants (1.6%) were predicted pathogenic.

Taken together, these results suggest that the tumor-only WES pipeline misidentifies a substantial proportion of germline variants as somatic mutations. Even when using the matched-normal DNA for mutation detection, the extent of ITGH defined solely on the basis of WES performed at typical sequencing depth is overestimated, potentially affecting actionable cancer genes. For example, a deleterious stop-gain branch mutation in CDC27 (p.Cys71*) was identified as heterogeneous (in one of the three biopsies) by the matched-normal WES pipeline but was not validated by AmpliSeq. False-positive heterogeneous variants were mostly not actionable, however. genes. For example, a deleterious stop-gain branch mutation in CDC27 (p.Cys71*) was identified as heterogeneous (in one of the three biopsies) by the matched-normal WES pipeline but was not validated by AmpliSeq. False-positive heterogeneous variants were mostly not actionable, however.

Characteristics of Artifactual WES Somatic Mutations

To identify the characteristics of the putative mutations identified by WES that were subsequently found not to be truly somatic variants, we examined the alternative coverage (i.e., the number of reads supporting the alternate allele), the variant allele frequency (VAF), and the total depth of coverage of the candidate somatic mutations identified by the three WES pipelines from the intratumor biopsies. The tumor-only pipeline reported a median of 2 variants (range 0–5) with VAF < 10% because of reduced sensitivity of the single-sample mutation detection algorithm at low VAF and the strict filtering imposed to remove potential germline variants (Figures 3A and 3B). Despite aggressive filtering, most of the putative somatic variants from the tumor-only analysis were germline variants with VAF ~ 50%, indicating the presence of a large number of private mutations that have not been cataloged in publicly available databases (Figures 2A, 3A, and 3B). The cohort-normal analysis correctly identified somatic variants with low VAF, including many that were heterogeneous between the biopsies, but missed somatic variants with VAF > 45% because of filtering imposed to remove likely germline variants that may not be present in the pooled normal DNA used as the reference (Figures 2C, 3A, and 3B). The validated somatic mutations defined by the matched-normal pipeline covered the widest range of VAFs. The putative somatic mutations identified by the matched-normal pipeline that were found to be absent by AmpliSeq were mostly in the low-VAF range (median 7.3%, range 0.6%–44%; Figures 2A, 3A, and 3B).

Figure 3. Coverage Characteristics of True Somatic Variants and False-Positive Mutations in the WES Data.

Figure 3.

(A and B) Alternative allele coverage (i.e., number of read supporting the alternative allele) (A) and total coverage (B) are plotted against variant allele fraction (VAF; all log10 scale) for the three WES analysis pipelines (rows). The different subsets of WES putative somatic variants according to validation status by high-depth amplicon sequencing are shown as columns: validated somatic mutations, validated homogeneous somatic mutations (i.e., present in all three biopsies of the same tumor), validated heterogeneous somatic mutations (i.e., present in one or two biopsies from the same tumor), and putative somatic mutations identified by the WES pipelines but failed validation by high-depth amplicon sequencing (i.e., putative somatic mutations that were validated to be germline, absent [VAF < 1%] or low coverage [< 50x]).

(C and D) For the matched-normal WES pipeline, (C) total coverage in the tumor (bottom) is plotted against the coverage in the matched normal sample, and (D) WES alternative allele coverage is plotted against VAF and of somatic mutations identified in all the specimens. The validation status categories are the same as in (A) and (B). Density kernel plots of the marginal distributions are included above and to the right of the scatterplots for each of the four categories of mutations. See also Figure S3.

Given that the majority of the ITGH studies carried out to date (Ding et al., 2010; Gerlinger et al., 2012; Nik-Zainal et al., 2012) used matched tumor-normal samples and analysis pipelines similar to our matched-normal pipeline and that mainly low-VAF mutations contributed to ITGH (Figures 3A and 3B), we compared the true- and false-positive somatic mutations (i.e., the validated and the un validated putative somatic variants) and the validated homogeneous (i.e., present in all biopsies from a case) and heterogeneous (i.e., absent in at least one biopsy from a case) somatic mutations derived from the distinct biopsies using the matched-normal WES pipeline. We found that the total depth for the true positive (median 124, range 9–1,028; Figure 3C, green) was significantly higher than for the false-positive mutations (median 78, range 12–926; p < 0.001, Wilcoxon test; Figures 3C and S3A, red). Furthermore, the false positives, compared with true mutations, had significantly lower alternative coverage (median 7, range 1–126 versus median 22, range 4–231; p < 0.001, Wilcoxon test) and VAF (median 11%, range 0.6%–64% versus median 23%, range 3%–93%; p < 0.001, Wilcoxon test; Figures 3D and S3B). There were also significant differences in the VAF distribution of the validated homogeneous and heterogeneous mutations (median 25%, range 1%–93% versus median 6%, range 1%–57%; p < 0.001, Wilcoxon test) and between the validated homogeneous mutations and the false positives (median 25%, range 1%–93% versus median 12%, range 2%–64%; p < 0.001, Wilcoxon test; Figures 3D and S3B). Crucially, the validated mutations implicated in true ITGH had significantly lower VAF than the false positives (p < 0.001, Wilcoxon test; Figures 3D and S3BS3E), which indicates that true ITGH mutations may display similarly low or lower VAFs compared with the false-positive and false-negative mutations. These results suggest that filtering somatic variants with low VAF or low alternative coverage may improve the precision of the WES pipeline but would also eliminate many true somatic variants that contribute to ITGH. Importantly, false-positive mutations had significantly lower total depth in the matched normal DNA (median 35, range 6–492) compared with true-positive mutations (median 68, range 9–514; p < 0.001, Wilcoxon test) and validated heterogeneous mutations (median 72, range 12–343; p < 0.001, Wilcoxon test; Figure 3C). Many putative somatic mutations (47%) with total depth in the normal DNA of 10 or less were confirmed as germline-like by AmpliSeq, emphasizing the importance of having adequate sequencing depth in the normal samples.

Separating True ITGH from WES Artifacts

Because subclonal mutations are expected to be the predominant contributors to ITGH, we inferred the clonality of all mutations identified by WES using ABSOLUTE (Carter et al., 2012). Subclonal mutations were significantly overrepresented among the validated heterogeneous variants compared with the homogeneous somatic variants (83.2% versus 28.9%; p < 0.001, Fisher’s exact test) but were similarly overrepresented among the artifactual somatic variants (91.5% versus 83.2%; p > 0.05, Fisher’s exact test; Figures 4A and S4A). Importantly, subclonal mutations were also significantly enriched among discordant variants in the WES technical replicates compared with the concordant variants (72.0% versus 24.9%; p < 0.001, Fisher’s exact test; Figures 4A and S4A). These results suggest that compared with clonal variants, subclonal variants detected by WES are more likely to be erroneously attributed to ITGH.

Figure 4. Mappability and Sequence Context of True and Artifactual Somatic Variants.

Figure 4.

(A and B) Clonality as defined by ABSOLUTE (A) and mappability of somatic variants (B) identified by the matched-normal WES pipeline. In each panel, the first two sets of bars enumerate the putative somatic variants identified as concordant or discordant in the technical replicates, whereas the bottom four sets of bars enumerate the somatic variants identified in intratumor biopsies and subsequently validated by high-depth amplicon sequencing. High-mappability regions are regions with mappability score of 1 (see Supplemental Experimental Procedures).

(C) Comparison of the mutational spectra of validated somatic heterogeneous mutations and artifactual somatic mutations that failed validation in all samples. The reference base listed (C or T) includes the corresponding reverse complement (G or A).

(D) Distribution of signature weights obtained from the decomposition of mutational signatures from each tumor sample.

(E) Detailed mutational spectra of the trinucleotide context of the pool of mutations detected in all tumor samples. Trinucleotide contexts with significant enrichment in the validated somatic heterogeneous mutations or in the artifactual somatic mutations are shown with an asterisk above the corresponding bars. *p < 0.005. Statistical comparisons in (A), (B), (C), and (E) are based on Fisher’s exact tests. Statistical comparisons in (D) are based on Wilcoxon tests. All statistical tests are two-sided. A p value < 0.05 was considered to indicate statistical significance. See also Figures S4S6.

Examination of the genomic locations of mutations revealed that 41.1% of the artifactual somatic mutations occurred in regions of low mappability (Derrien et al., 2012) compared with only 6.4% for the validated somatic heterogeneous mutations (p < 0.001, Fisher’s exact test; Figures 4B and S4B). Furthermore, 30.8% of the discordant variants in the WES technical replicates occurred in regions of low mappability compared with 8.4% for the concordant variants (p < 0.001, Fisher’s exact test; Figures 4B and S4B). These results suggest that ambiguous mapping of DNA fragments directly contributes to artifactual somatic variants, even if longer reads (100 bp) were used (Figure S4C). Compared with the validated heterogeneous mutations, artifactual somatic mutations appeared to be significantly enriched in T > C transitions (p < 0.001, Fisher’s exact test; Figures 4C and S5A), particularly in the ApTpA and NpTpG trinucleotide contexts and also in T > G transversions in the GpTpG context (Figure 4E). By contrast, artifactual somatic mutations were significantly depleted in C > G transversions (p < 0.001, Fisher’s exact test; Figures 4C and S5A). Interestingly, validated somatic heterogeneous mutations were enriched for C > G substitutions in the TpCpA and TpCpT contexts (Figure 4E), the characteristic substitution patterns induced by the upregulation of APOBEC cytidine deaminases (Nik-Zainal et al., 2016). However, this enrichment appears to be driven primarily by the case with the largest variability in tumor purity (Figures S2B and S5A). Although a substantial proportion of the artifactual mutations detected were C > T transitions, which have been associated with both the aging process and with labinduced cytosine deamination during DNA library preparation (Alexandrov et al., 2013; Chen et al., 2014), their proportion was similar between the artifactual and validated somatic heterogeneous variants (34.6% versus 33.9%; p > 0.05, Fisher’s exact test; Figure 4C), with mutations occurring in the TpCpN context being significantly underrepresented in the artifactual variants (Figure 4E). We also did not observe an enrichment of C > T substitutions at low VAFs among the artifactual somatic mutations, which would have been indicative of likely FFPE fixation artifacts (Graw et al., 2015; Do and Dobrovic, 2015) (Figure S5B). Mutational signature analysis using previously defined mutational signatures (Alexandrov et al., 2013) identified signatures 5 and 29 to be overrepresented in the artifactual somatic mutations (p = 0.028, Wilcoxon test; Figure 4D), with a median of 16.1% (range 0%–44.4%) of the artifactual mutations classified as signature 5, driven mainly by the T > C transitions reported above. Signature 29 is driven by C > A mutations, predominantly in the ApCpA and GpCpA contexts, that were significantly overrepresented among the artifactual variants (Figure 4E).

Finally, we considered whether applying filtering strategies to exclude mutations that occur in low-mappability regions or within sequence contexts enriched in artifactual mutations (C > A mutations in ApCpA and GpCpA contexts, T > C mutations in ApTpA and [C/T/G]pTpG contexts, and T > G mutations in GpTpG context; Figure 4E) can improve the reliability of WES. Excluding putative somatic mutations in low-mappability regions improved the precision for mutations detected in the technical (0.862 versus 0.792 without filtering) and biological replicates (0.876 versus 0.817 without filtering), while reducing moderately the sensitivity (0.858 versus 0.913 for technical, 0.773 versus 0.824 for biological). Filtering by mutational context reduced the sensitivity without appreciably improving precision, as did filtering by the combination of filters (Figure 5).

Figure 5. Effect of Filtering of Variants Called by WES Pipelines in Technical Replicates and Intratumor Biopsies.

Figure 5.

Excluding somatic mutations in low-mappability regions improves precision (or positive predictive value) by reducing false-positive calls, while slightly reducing the sensitivity by increased false-negative calls.

DISCUSSION

WES is an appealing and increasingly affordable technology to study the extent of ITGH in an unbiased manner (i.e., without a priori selecting genes of interest for sequencing). The reliability of WES to detect low-frequency mutations, which often account for the majority of ITGH within a cancer, and therefore to resolve clonal architecture (Figure S6) depends on the experimental design, the sequencing depth, tumor purity, and the bioinformatics approaches used to define the somatic variants. To examine the influence of these factors on measuring ITGH, here we generated a dataset incorporating both technical and biological replicates sequenced to depths commonly found in clinical or translational research studies, and validated every putative somatic variant detected with orthogonal high-depth sequencing methods. All datasets generated in this study have been made publicly available to provide a resource for the community to refine analytical tools for ITGH detection from WES.

We performed six pairs of technical replicates that involved independent library preparation and sequencing of aliquots from the same DNA extractions, and experiments were performed on different days. The technical replicates revealed an unexpectedly high degree of discordance in the putative somatic variants identified, even using the current best practice matched-normal variant calling analysis approach. Subsequent validation with high-depth amplicon sequencing (6053 median coverage) of all variants identified by WES demonstrated that the majority of the false-positive somatic variants (1) displayed low VAF and were often detected as subclonal in one experiment but not in the other, (2) were in fact germline-like variants that appeared as heterogeneous somatic mutations (Kim and Speed, 2013), or (3) map to genomic regions of low mappability. The enrichment of subclonal mutations among the discordant mutations in the WES technical replicates is expected, given the well-known difficulty in identifying somatic mutations at low VAF. Indeed, comparing the putative mutations that did not validate to the “true” somatic mutations validated by high-depth amplicon sequencing demonstrated that mutations with low VAF and/or low alternative coverage were more difficult to be reliably identified by WES. On the other hand, our results revealed a not insignificant proportion of germline-like false-positive mutation calls. Although some of these germline-like variants are genuinely germline alleles not detected in the matched normal samples, a substantial proportion of these are likely attributed to alignment and sequencing biases (Kim and Speed, 2013) that manifested as false-positive variants at low VAF or alternative coverage. Importantly, our results high-lighted the often overlooked importance of adequate coverage for the matched-normal sample in the accurate identification of somatic mutations, given that mutations that failed validation were, on average, associated with lower coverage in the normal sample. In terms of mappability of genomic regions, we found that false-positive mutation calls were enriched in genomic regions of poor mappability. In fact, we demonstrated that this may represent a reasonable filter if specificity is of paramount importance and some trade-off in sensitivity can be tolerated. Although our study provides direct evidence in support of ITGH, the mutations implicated in ITGH showed substantial overlap with the alterations stemming from intrinsic technical noise in terms of VAF, alternate allele depth, total depth in tumor and normal, as well as mappability. Incorporating unique molecular identifiers into deep sequencing experiments will likely reduce false positives and enhance sensitivity in detecting subclonal mutations with greater confidence (Salk et al., 2018). Furthermore, because private mutations are less likely to be identified in biopsies of relatively low purity, ITGH should be best assessed in biopsies with uniformly high cellularity.

Our analysis of the mutational signatures between the validated mutations implicated in ITGH versus the false-positive mutations revealed striking differences. The heterogeneous mutations were enriched in a pattern typically associated with increased APOBEC activity, and this pattern has been previously shown to contribute to ITGH in breast and lung cancers (de Bruin et al., 2014; Ng et al., 2017). On the other hand, the false-positive mutations were enriched for, in particular, C > A and T > G mutations at specific sequence contexts. A recent study of rare polymorphisms determined by high-depth whole-genome sequencing in 300 individuals of diverse genetic origins identified four mutational signatures, two of which were consistent within populations and had a clear association with geographic distribution (Mathieson and Reich, 2017). The origin of the remaining two were uninterpretable, with one of these latter signatures dominated by T > G mutations in the GpTpG context and the other signature highly correlated with COSMIC signature 5, which has been found in all cancer types (Alexandrov et al., 2013) and has been suggested to display clock-like properties suggestive of an association with the aging process (Alexandrov et al., 2015). Our analysis of the sequence context of false-positive variants identified both these features as being significantly enriched in artifactual putative mutations (Figures 4C–4E), strongly suggesting that caution should be exercised in the interpretation of the reported mutational signatures.

This study has several limitations. The sample size of 6 breast cancers with 18 biopsies may be too limited to allow generalizing our results on ITGH to all breast cancer subtypes or to other cancers. Of note, the unique nested experimental design incorporating within-sample technical replication, processed and sequenced in the same manner as the intratumor biopsies, provided an estimate of background discordance against which the ITGH results could be interpreted. Additionally, the extensive orthogonal validation by high-depth amplicon sequencing on an independent sequencing platform with very different chemistry from the platform used for WES adds rigor to our study. The sequencing depth attained in this study is comparable with that in previous studies using WES (with subsequent high-depth sequencing for validation) for the genomic characterization of ITGH (Yates et al., 2015); it is plausible, however, that WES at higher depth (i.e., >2503) would mitigate in part the false positives and false negatives, in particular in samples with tumor cell content < 50%. Finally, the three WES analyses pipelines used different calling algorithms and filtering steps that reflected the best practice at the time of the analysis, but future improvements could result in reduced bias.

In summary, our study showed that WES at 184 mean depth of coverage in the tumor samples overestimates the extent of ITGH, and the technical noise associated with somatic mutation detection using WES alone can confound true ITGH. Our results also suggested that it is not possible to reduce the false-positive rate through more aggressive minimum depth filtering without affecting the sensitivity of detecting true somatic mutations in the 1%–5% VAF range, but excluding mutations that occur in low-mappability regions of the genome, or in certain mutational contexts could reduce artifactual somatic mutations and provide less biased estimates of ITGH. Nevertheless, orthogonal, high-depth validation experiments are highly desirable in the context of quantifying ITGH.

EXPERIMENTAL PROCEDURES

Tumor Sample Collection

Breast cancer samples were collected from patients with newly diagnosed invasive breast cancer with tumor size > 2 cm at the Yale Cancer Center. Tumor tissues were obtained with three punch biopsies at least 1 cm apart from three different regions of the tumor after pathologic gross examination. Six of these tumors with high enough cellularity (>50%) and high DNA quality from all three biopsies and with matched blood DNA were selected for this study.

WES and Analysis

DNA was extracted and library was prepared using standard protocols, and the exome was captured using the NimbleGen SeqCap EZ Human Exome Kit version 2.0. Sequencing was performed on the HiSeq 2000 in paired-end 75-cycle mode at the Yale Center for Genome Analysis. We used three different analytical pipelines for detecting variants. A single-sample “tumor-only” pipeline, a “cohort-normal” pipeline using an in-house normal reference obtained from ten unrelated normal blood DNA samples, and a “matched-normal” pipeline using the matched-normal DNA from each patient as reference. Further details are provided in Supplemental Experimental Procedures.

Validation of Putative Variants with High-Depth Amplicon Sequencing

Variants identified by WES were subjected to validation with high-depth amplicon sequencing using custom AmpliSeq panels on the same tumor and matched normal DNA samples. Amplicon sequencing was performed to a median depth of 6003. Further details are provided in Supplemental Experimental Procedures.

Mutational Signature Analysis and Mappability

Mutational signature analyses comparing the validated variants that contribute to ITHG and the WES false-positive calls were performed for individual tumor samples and for the pooled mutations over all samples using the R package deconstructSigs (Rosenthal et al., 2016). Mappability of SNVs was assessed using the CRG Alignability track (Derrien et al., 2012) in the UCSC Genome Browser. Additional details are provided in Supplemental Experimental Procedures.

Supplementary Material

1
2
3
4
5
6

Highlights.

  • Genuine intratumor genetic heterogeneity is hard to distinguish from sequencing artifacts

  • Cancer-only WES pipelines are unreliable (69% somatic mutations are false positive)

  • 34%–80% of somatic variants contributing to genetic heterogeneity are technical noise

  • Excluding mutations in low-mappability regions and higher coverage are needed

ACKNOWLEDGMENTS

Research reported in this publication was supported in part by the Breast Cancer Research Foundation (to L.P., J.S.R.-F., and C.H.) and by a Cancer Center Support Grant of the NIH/National Cancer Institute (grant P30CA008748). S.P. is currently funded by the Swiss National Foundation (Ambizione grant PZ00P3_168165). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Footnotes

DATA AND SOFTWARE AVAILABILITY

The accession number for the raw data from WES and AmpliSeq sequences reported in this paper is SRA: SRP070662.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

  1. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, Bignell GR, Bolli N, Borg A, Børresen-Dale AL, et al. ; Australian Pancreatic Cancer Genome Initiative; ICGC Breast Cancer Consortium; ICGC MMML-Seq Consortium; ICGC PedBrain (2013). Signatures of mutational processes in human cancer. Nature 500, 415–421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, and Stratton MR (2015). Clock-like mutational processes in human somatic cells. Nat. Genet 47, 1402–1407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, Laird PW, Onofrio RC, Winckler W, Weir BA, et al. (2012). Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol 30, 413–421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chen G, Mosier S, Gocke CD, Lin M-T, and Eshleman JR (2014). Cytosine deamination is a major cause of baseline noise in next-generation sequencing. Mol. Diagn. Ther 18, 587–593. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, and Getz G (2013). Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol 31, 213–219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Clark MJ, Chen R, Lam HYK, Karczewski KJ, Chen R, Euskirchen G, Butte AJ, and Snyder M (2011). Performance comparison of exome DNA sequencing technologies. Nat. Biotechnol 29, 908–914 [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, Jamal-Hanjani M, Shafi S, Murugaesu N, Rowan AJ, et al. (2014). Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science 346, 251–256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, and Ribeca P (2012). Fast computation and applications of genome mappability. PLoS ONE 7, e30377. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, Harris CC, McLellan MD, Fulton RS, Fulton LL, et al. (2010). Genome remodeling in a basal-like breast cancer metastasis and xenograft. Nature 464, 999–1005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Do H, and Dobrovic A (2015). Sequence artifacts in DNA from formalinfixed tissues: causes and strategies for minimization. Clin. Chem 61, 64–71. [DOI] [PubMed] [Google Scholar]
  11. Fisher R, Pusztai L, and Swanton C (2013). Cancer heterogeneity: implications for targeted therapeutics. Br. J. Cancer 108, 479–485. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Gerlinger M, Rowan AJ, Horswell S, Math M, Larkin J, Endesfelder D,Gronroos E, Martinez P, Matthews N, Stewart A, et al. (2012). Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med 366, 883–892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Graw S, Meier R, Minn K, Bloomer C, Godwin AK, Fridley B, Vlad A, Beyerlein P, and Chien J (2015). Robust gene expression and mutation analyses of RNA-sequencing of formalin-fixed diagnostic tumor samples. Sci. Rep 5, 12335. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Hiley C, de Bruin EC, McGranahan N, and Swanton C (2014). Deciphering intratumor heterogeneity and temporal acquisition of driver events to refine precision medicine. Genome Biol 15, 453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Hou Y, Song L, Zhu P, Zhang B, Tao Y, Xu X, Li F, Wu K, Liang J, Shao D, et al. (2012). Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148, 873–885. [DOI] [PubMed] [Google Scholar]
  16. Jiang T, Shi W, Natowicz R, Ononye SN, Wali VB, Kluger Y, Pusztai L, and Hatzis C (2014). Statistical measures of transcriptional diversity capture genomic heterogeneity of cancer. BMC Genomics 15, 876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Kim SY, and Speed TP (2013). Comparing somatic mutation-callers: beyond Venn diagrams. BMC Bioinformatics 14, 189. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, and Wilson RK (2012). VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22, 568–576 [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Li H (2014). Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Martelotto LG, Baslan T, Kendall J, Geyer FC, Burke KA, Spraggon L, Piscuoglio S, Chadalavada K, Nanjangud G, Ng CKY, et al. (2017). Whole-genome single-cell copy number profiling from formalin-fixed paraffin-embedded samples. Nat. Med 23, 376–385. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Marusyk A, Almendro V, and Polyak K (2012). Intra-tumour heterogeneity:a looking glass for cancer? Nat. Rev. Cancer 12, 323–334. [DOI] [PubMed] [Google Scholar]
  22. Mathieson I, and Reich D (2017). Differences in the rare variant spectrum among human populations. PLoS Genet 13, e1006581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Morris LGT, Riaz N, Desrichard A, Sxenbabaoglu Y, Hakimi AA, Makarov V, Reis-Filho JS, and Chan TA (2016). Pan-cancer analysis of intratumor heterogeneity as a prognostic determinant of survival. Oncotarget 7, 10051–10063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, et al. (2011). Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res 39, e90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, Cook K, Stepansky A, Levy D, Esposito D, et al. (2011). Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Newburger DE, Kashef-Haghighi D, Weng Z, Salari R, Sweeney RT, Brunner AL, Zhu SX, Guo X, Varma S, Troxell ML, et al. (2013). Genome evolution during progression to breast cancer. Genome Res 23, 1097–1108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Ng CKY, Bidard F-C, Piscuoglio S, Geyer FC, Lim RS, de Bruijn I, Shen R, Pareja F, Berman SH, Wang L, et al. (2017). Genetic heterogeneity in therapy-naı¨ve synchronous primary breast cancers and their metastases. Clin. Cancer Res 23, 4402–4415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, Raine K, Jones D, Marshall J, Ramakrishna M, et al. ; Breast Cancer Working Group of the International Cancer Genome Consortium (2012). The life history of 21 breast cancers. Cell 149, 994–1007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Nik-Zainal S, Davies H, Staaf J, Ramakrishna M, Glodzik D, Zou X, Martincorena I, Alexandrov LB, Martin S, Wedge DC, et al. (2016). Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Qi Y, Liu X, Liu C-G, Wang B, Hess KR, Symmans WF, Shi W, and Pusztai L (2015). Reproducibility of variant calls in replicate next generation sequencing experiments. PLoS ONE 10, e0119230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Rosenthal R, McGranahan N, Herrero J, Taylor BS, and Swanton C (2016). DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol 17, 31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Salk JJ, Schmitt MW, and Loeb LA (2018). Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet 19, 269–285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, and Cheetham RK (2012). Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 [DOI] [PubMed] [Google Scholar]
  34. Shen R, and Seshan VE (2016). FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res 44, e131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Shi W, Jiang T, Nuciforo P, Hatzis C, Holmes E, Harbeck N, Sotiriou C, Pena L, Loi S, Rosa DD, et al. (2017). Pathway level alterations rather than mutations in single genes predict response to HER2-targeted therapies in the neo-ALTTO trial. Ann. Oncol 28, 128–135 [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Sims D, Sudbery I, Ilott NE, Heger A, and Ponting CP (2014). Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet 15, 121–132. [DOI] [PubMed] [Google Scholar]
  37. Smith EN, Jepsen K, Khosroheidari M, Rassenti LZ, D’Antonio M, Ghia EM, Carson DA, Jamieson CH, Kipps TJ, and Frazer KA (2014). Biased estimates of clonal evolution and subclonal heterogeneity can arise from PCR duplicates in deep sequencing experiments. Genome Biol 15, 420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Turner NC, and Reis-Filho JS (2012). Genetic heterogeneity and cancer drug resistance. Lancet Oncol 13, e178–e185. [DOI] [PubMed] [Google Scholar]
  39. von Wahlde M-K, Timms KM, Chagpar A, Wali VB, Jiang T, Bossuyt V,Saglam O, Reid J, Gutin A, Neff C, et al. (2017). Intratumor heterogeneity of homologous recombination deficiency in primary breast cancer. Clin. Cancer Res 23, 1193–1199. [DOI] [PubMed] [Google Scholar]
  40. Wang Y, Waters J, Leung ML, Unruh A, Roh W, Shi X, Chen K, Scheet P, Vattathil S, Liang H, et al. (2014). Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512, 155–160. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Wilm A, Aw PPK, Bertrand D, Yeo GHT, Ong SH, Wong CH, Khor CC, Petric R, Hibberd ML, and Nagarajan N (2012). LoFreq: a sequencequality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 40, 11189–11201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Yap TA, Gerlinger M, Futreal PA, Pusztai L, and Swanton C (2012). Intratumor heterogeneity: seeing the wood for the trees. Sci. Transl. Med 4, 127ps10. [DOI] [PubMed] [Google Scholar]
  43. Yates LR, Gerstung M, Knappskog S, Desmedt C, Gundem G, Van Loo P, Aas T, Alexandrov LB, Larsimont D, Davies H, et al. (2015). Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med 21, 751–759. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

1
2
3
4
5
6

RESOURCES