Abstract
Next-generation sequencing shows great promise by allowing rapid mutational analysis of multiple genes in human cancers. Recently, we implemented the multiplex PCR-based Ion AmpliSeq Cancer Hotspot Panel (>200 amplicons in 50 genes) to evaluate EGFR, KRAS, and BRAF in lung and colorectal adenocarcinomas. In 10% of samples, automated analysis identified a novel G873R substitution mutation in EGFR. By examining reads individually, we found this mutation in >5% of reads in 50 of 291 samples and also found similar events in 18 additional amplicons. These apparent mutations are present only in short reads and within 10 bases of either end of the read. We therefore hypothesized that these were from panel primers promiscuously binding to nearly complementary sequences of nontargeted amplicons. Sequences around the mutations matched primer binding sites in the panel in 18 of 19 cases, thus likely corresponding to panel primers. Furthermore, because most primers did not show this effect, we demonstrated that next-generation sequencing may be used to better design multiplex PCR primers through iterative elimination of offending primers to minimize mispriming. Our results indicate the need for careful sequence analysis to avoid false-positive mutations that can arise in multiplex PCR panels. The AmpliSeq Cancer panel is a valuable tool for clinical diagnostics, provided awareness of potential artifacts.
Detecting driver mutations in cancer genomes is of increasing importance for patient care, both for prognostic significance and for allowing better utilization of targeted therapies. Determining the mutational status of specific genes, such as KRAS, BRAF, and EGFR in lung adenocarcinoma and KRAS and BRAF in colorectal adenocarcinoma, has become the standard of care in clinical oncology to direct epidermal growth factor receptor (EGFR) inhibitor therapy.1,2 To achieve this, targeted sequencing of these genes with the use of Sanger sequencing and pyrosequencing is widely available. However, Sanger sequencing is labor intensive and has a relatively poor analytic sensitivity (approximately 20% mutant alleles), requiring specimens with a significant percentage of tumor nuclei (>40%) to detect heterozygous mutations.3,4 Pyrosequencing, although less labor intensive with a better limit of detection (analytic sensitivity approximately 5%), is typically limited to short regions of DNA, requiring the clustering of mutations (eg, KRAS codons 12 and 13). With each of these approaches, a single amplicon from a single patient is analyzed in a single well or capillary.
Massively parallel, next-generation sequencing (NGS) platforms, such as the Ion Torrent Personal Genome Machine (PGM) and the Illumina MiSeq, provide limits of detection superior to pyrosequencing combined with even broader genomic coverage.5–7 Although NGS platforms have the capability to perform cancer whole genome/exome sequencing, targeted sequencing of panels of amplicons with actionable and hotspot mutations is currently more practical in a clinical laboratory setting.8–11 One panel, the Ion Torrent AmpliSeq Cancer Hotspot Panel, uses multiplex-PCR to cover >200 amplicons in 50 genes known to be involved in carcinogenesis. We recently transitioned to this platform to determine KRAS, BRAF, and EGFR mutation status in formalin-fixed, paraffin-embedded (FFPE) lung and colon cancer specimens.
A major issue with FFPE specimens is significant variability in both the quality and quantity of DNA that can be isolated. Sources of this variability include the amount of tumor in the biopsy, time from biopsy/resection to fixation, and time in formalin before processing.12,13 As a result, we often are left with relatively low DNA concentrations, which may require use of less DNA than the recommended 10- to 30-ng amount for the AmpliSeq panel.
Here, we report that off-target amplification is common in multiplex-PCR–based NGS, yielding 19 mispriming events in 207 amplicons (9%) in our study. We define the signature features to identify mispriming events and show that false-positive mutations can be avoided by using multiple bioinformatic analysis tools in the pipeline. We also show that these events are more common with lower input DNA amounts. We demonstrate that the phenomenon is due to multiplex PCR and is not seen when primers are used in monoplex reactions. Finally, because the vast majority of primers do not show significant mispriming, we hypothesize that NGS may be the ultimate multiplex PCR primer design tool by allowing for sensitive detection of off-target amplification and consequent iterative primer design.
Materials and Methods
Materials
As a part of ongoing clinical diagnosis of mutations in lung and colorectal adenocarcinomas at The Johns Hopkins Hospital, 291 consecutive FFPE tissue specimens were analyzed during a 6-month period from January to June 2013. In addition, 10 FFPE tissue specimens from a variety of tissue types were analyzed at Duke University Hospital. The tumor tissues were enriched by manual dissection of targeted areas identified by anatomical pathologists as described previously.3 DNA was isolated as described previously.3,14 Concentration of DNA was determined by Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA).
NGS
The NGS was conducted with the Ion AmpliSeq Cancer Hotspot Panel version 2 for targeted multigene amplification [207 amplicons covering approximately 2800 Catalog of Somatic Mutations in Cancer (COSMIC, http://cancer.sanger.ac.uk/cancergenome/projects/cosmic, last accessed July 14, 2014) mutations from 50 oncogenes and tumor suppressor genes], Ion AmpliSeq Library Kit 2.0 for library preparation, Ion OneTouch 200 Template Kit version 2 DL and Ion OneTouch ES Instrument for emulsion PCR and enrichment, Ion PGM 200 Sequencing Kit, Ion 318 Chips, and the PGM sequencing platform for massive parallel sequencing (Life Technologies), as recommended by the manufacturers' protocols without modification. The DNA input for targeted multigene PCR was 0.8 to 30 ng. Up to eight specimens were barcoded with Ion Xpress Barcode Adapters (Life Technologies), pooled, and run on a single Ion 318 chip. This includes multiple patient samples and one control, which we rotate among water, normal, and a mix of positive control cell lines.
Analysis Pipeline
Sequencing data of three targeted genes that we have validated in clinical practice (KRAS, BRAF, and EGFR) were analyzed with Torrent Suite version 3.2.0 (Life Technologies). All other genes were masked for this analysis in Torrent Variant Caller by using specific BED files and in Ion Reporter using filters. Mutations were identified and annotated through Torrent Variant Caller version 3.2.45211 and Ion Reporter version 1.2. Visual inspection of the BAM file from each specimen, including the three genes for clinical diagnosis and the remainder of the genes in the AmpliSeq panel, was performed with Integrative Genomics Viewer (IGV) version 2.3 (Broad Institute, Cambridge, MA). IGV was used to verify the variants called and to identify short reads with potential mispriming events (see Results).
Targeted NGS of Individual Mispriming Sites
For confirmation of the putative mutation sites within EGFR exons 20 and 21, GNAQ exon 5, ABL1 exon 7, RB1 exon 20, RET exon 15, SMAD4 exon 6, and APC exon 14, primers were designed for sequencing on the Ion Torrent PGM platform. Primer pairs were as follows: 5′-TGTCCGGGAACACAAAGACA-3′ and 5′-CTGGCTCCTTATCTCCCCTC-3′ for EGFR exon 20; 5′-CGCAGCATGTCAAGATCACA-3′ and 5′-TGTCAGGAAAATGCTGGCTG-3′ for EGFR exon 21; 5′-TAACCTTGCAGAATGGTCGATG-3′ and 5′-AACACTTACCTCATTGTCTGACT-3′ for GNAQ exon 5; 5′-TCTTGCTGCCCGAAACTG-3′ and 5′-ATGGGCTGTGTAGGTGTCC-3′ for ABL1 exon 7; 5′-TGTGAACGCCTTCTGTCTGA-3′ and 5′-TGGTCCAAATGCCTGTCTCT-3′ for RB1 exon 20; 5′-AGAAACATCCTGGTAGCTGAGG-3′ and 5′-CCTGGCTCCTCTTCACGTAG-3′ for RET exon 15; 5′-GGCAGCCATAGTGAAGGAC-3′ and 5′-TACTATGATGGTAAGTAGCTGGC-3′ for SMAD4 exon 6; and, 5′-TGAAACAGAATCAGAGCAGCC-3′ and 5′-GACTTTGTTGGCATGGCAGA-3′ for APC exon 14. For iterative primer design to minimize mispriming, additional forward primers were used as follows: 5′-GACTTGGCAGCCAGAAACAT-3′ (RET F2) and 5′-TTTTCCTCACAGCTCGTTCA-3′ (RET F3) for RET exon 15 and 5′-TGACAATGGGAATGAAACAGA-3′ (APC F2) and 5′-GGAAAATGACAATGGGAATGA-3′ (APC F3) for APC exon 14. Adaptor sequences A (5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3′) and P1 (5′-CCTCTCTATGGGCAGTCGGTGAT-3′) for the Ion Torrent system were included at the 5′ end of the forward and reverse primers, respectively. Each exon was separately amplified with Platinum PCR SuperMix High Fidelity (Life Technologies), verified by agarose gel electrophoresis, and purified with the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany). The final products were pooled, and DNA was quantified with a Qubit 2.0 Fluorometer. The Ion OneTouch 200 Template Kit version 2 DL and the Ion OneTouch ES Instrument were then used for emulsion PCR and enrichment, and the Ion PGM 200 Sequencing Kit, Ion 318 Chips, and the PGM sequencing platform for massive parallel sequencing, as recommended by the manufacturers' protocols without modification. 
Sequencing data were analyzed with Torrent Suite for mapping and IGV for evaluation of read depth and mutational status.
Additional Bioinformatic Analyses
Sequencing data for four specimens were reanalyzed separately with Burrows-Wheeler Transform-based mapping algorithms: Bowtie 2 (Johns Hopkins University, Baltimore, MD) and Burrows-Wheeler Aligner (BWA; Wellcome Trust Sanger Institute, Cambridge, UK).15,16 Raw sequencing data (FASTQ files) for each specimen were passed through each algorithm with default settings, and the resulting alignment files (BAM format) were coordinate-sorted and evaluated with IGV for read depth and mutational status.
Statistical Analysis
All statistical analyses, including Fisher exact tests, Wilcoxon rank-sum tests, and Kendall correlations, were performed with R version 3.0.1 (R Foundation for Statistical Computing, Vienna, Austria).
Results
Initial Identification of the EGFR Exon 21 G873R Mutation
While analyzing sequence data from exons of KRAS, BRAF, and EGFR, Ion Torrent Variant Caller and/or Ion Reporter reported an apparent novel mutation in exon 21 of EGFR, c.2617G>A (p.Gly873Arg, G873R), occurring in up to 15% of reads, in 23 of our 291 patient samples (15 by both Ion Torrent Variant Caller and Ion Reporter and 8 by Ion Reporter alone). This mutation was not reported in COSMIC (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic, last accessed July 14, 2014). Careful examination of the actual sequence reads in IGV in all of our cases showed that this mutation was present in >5% of total reads in 50 of 291 samples. Notably, it was present only in shorter reads. These reads had a common 5′ end, starting in the middle of the exon 21 amplicon in the AmpliSeq panel, with three single-base mutations within the first 8 bases, whereas the 3′ end matched the full-length amplicon (Figure 1, A and D). This result indicated that these apparent mutations might be due to another primer mispriming within this EGFR exon in a somewhat nonspecific reaction: the primer contained enough homology to bind, but three bases diverged between the offending primer and the EGFR off-target sequence, producing the three apparent mutations. The mispriming primer pairs with the correct EGFR reverse primer to produce the inappropriate amplicon for sequencing.
Figure 1.
Next-generation sequencing mispriming events occur at multiple amplicons within the Cancer Hotspot Panel. Representative screen capture images from the Integrative Genomics Viewer showing full-length (top of each panel) and short, misprimed reads (bottom of each panel) for EGFR exon 21 (A), RB1 exon 20 (B), APC exon 14 (C), and detail of the 5′ end of EGFR exon 21 (D).
Other Mispriming Events
We then set out to systematically determine whether the EGFR exon 21 G873R-producing short reads were unique to this amplicon, or whether similar false-positive mutations were found in other genes in the AmpliSeq panel. To screen these genes most effectively, we first started with a single specimen with the best combination of a high number of apparent G873R mutations (15.5%) and a large number of EGFR exon 21 reads (2504), reasoning that it offered the optimum chance to catch these mutations at their highest frequencies, even in amplicons with lower coverage than EGFR exon 21. By using a cutoff of 5% of total reads, we identified 17 sites, with a similar pattern of mutations generated at the end of short reads, within the exons of the 50 genes in the panel, along with two mutations in intronic regions contained within the available amplicons (Table 1). The most prevalent example, a T224N mutation in exon 5 of GNAQ, occurred in 19.5% of 2408 reads, whereas mutations in ABL1, APC, ERBB2, and RB1 each happened in >10% of total reads. The various mispriming events result in one to three base substitutions (Figure 1, B–D, respectively), as well as in one or two base insertions (Table 1). These mispriming events, when identified, occur at the same positions in every specimen (see representative examples in Supplemental Table S1).
Table 1.
False-Positive Mutations from Nonspecific Primer Binding
| Gene | Exon | hg19 Genome location | Mutation type | Mutation | cDNA position | Protein change | Mutant reads | Total reads | Percent, % |
|---|---|---|---|---|---|---|---|---|---|
| ABL1 | 7 | 9:133750318 | Silent | C>T | 1206 | G402G | 805 | 6484 | 12.42 |
| ALK | 23 | 2:29443666 | Missense | G>A | 3551 | G1184E∗ | 71 | 1358 | 5.23 |
| APC | 14 (var1)† | 5:112175940 | Insertion | insA | 4594_4595 | Frameshift/truncation | 158 | 1102 | 14.34 |
| | 17 (var2) | | | | 4648_4649 | | | | |
| | 16 (var3) | | | | 4648_4649 | | | | |
| APC | 14 (var1)† | 5:112175938 | Silent | A>G | 4593 | Q1531Q‡ | 151 | 1091 | 13.84 |
| | 17 (var2) | | | | 4647 | Q1549Q | | | |
| | 16 (var3) | | | | 4647 | Q1549Q | | | |
| ATM | 26 | 11:108155119 | Silent | A>G | 3912 | R1304R | 124 | 2024 | 6.13 |
| EGFR | 20 | 7:55249168 | Silent | A>C | 2466 | A822A | 279 | 2938 | 9.50 |
| EGFR | 20 | 7:55249174 | 3′ intronic | A>G | | | 187 | 2643 | 7.08 |
| EGFR | 21 | 7:55259559 | Missense | G>A | 2617 | G873R | 388 | 2504 | 15.50 |
| EGFR | 21 | 7:55259552 | Missense | T>G | 2610 | H870Q | 285 | 2147 | 13.27 |
| EGFR | 21 | 7:55259554 | Missense | C>A | 2612 | A871E‡ | 287 | 2192 | 13.09 |
| ERBB2 | 21 (var1)† | 17:37881388 | Silent | A>G | 2580 | K860K | 252 | 2151 | 11.72 |
| | 24 (var2) | | | | 2490 | K830K | | | |
| ERBB2 | 21 (var1)† | 17:37881384 | Missense | T>A | 2576 | V859D | 169 | 1990 | 8.49 |
| | 24 (var2) | | | | 2486 | V829D | | | |
| GNAQ | 5 | 9:80409443 | Missense | G>A | 671 | T224N | 470 | 2408 | 19.52 |
| GNAQ | 5 | 9:80409448 | Insertion | insCA | 666_668 | Frameshift/truncation | 242 | 2118 | 11.43 |
| KIT | 2 | 4:55561706 | Silent | C>A | 96 | G32G | 107 | 2134 | 5.01 |
| PDGFRA | 18 | 4:55152043 | Silent | G>A | 2475 | L825L | 100 | 1496 | 6.68 |
| RB1 | 20 | 13:49033902 | Missense | A>C | 2039 | I680T | 250 | 2176 | 11.49 |
| RET | 15 (1) | 10:43615621 | Nonsense | T>G | 2700 | Y900X | 75 | 781 | 9.60 |
| RET | 15 (2) | 10:43615572 | Missense | A>T | 2653 | E884V∗ | 29 | 567 | 5.11 |
| SMAD4 | 6 | 18:48584594 | Missense | A>T | 767 | Q256L | 112 | 1164 | 9.62 |
| SMAD4 | 9 | 18:48591897 | Missense | G>C | 1060 | V354L | 187 | 2308 | 8.10 |
| SMAD4 | 10 | 18:48593453 | Missense | CT>TC | 1204 | L402S | 40 | 508 | 7.87 |
| STK11 | 8 | 19:1223060 | Missense | C>A | 997 | R333S | 61 | 1066 | 5.72 |
| TP53 | 5 | 17:7578556 | 5′ intronic | A>T | | | 371 | 5357 | 6.93 |
| TP53 | 8 | 17:7577106 | Missense | C>G | 835 | P279A | 26 | 418 | 6.22 |
COSMIC, Catalog of Somatic Mutations in Cancer; NGS, next-generation sequencing.
∗Reported in COSMIC and in the literature via a multiplex PCR NGS panel (see Discussion).
†Variant transcripts in these genes lead to different exon numbers, cDNA positions, and amino acid numbers.
‡Mutations that have been reported in COSMIC (http://cancer.sanger.ac.uk, last accessed July 14, 2014).
Mispriming Events Correspond to Presumptive Primers in the Cancer Hotspot Panel
As discussed, we hypothesized that these short reads with mutations generated at one end represented amplification with mismatched primers (Figure 2). We therefore compared the sequences of the mutated ends of the short reads with the presumptive PCR primers (by examining the 20 bases on either side of the 5′ and 3′ ends of the full-length amplicons). We were able to identify homologous regions in 18 of the 19 short read sequences, consisting of 10 to 17 bases of sequence, primarily within genomic sequence immediately 5′ or 3′ to the ends of the full-length amplicons (Table 2). Interestingly, a single presumptive primer, at the 5′ end of exon 26 of the KDR gene, produced both the EGFR G873R events and the equally frequent silent Q1549Q mutation in exon 16 of APC. Careful searching of the FASTQ data files of the specimen also found rare short reads of PIK3CA (9 of 1600 reads) with the same KDR sequence.
Figure 2.
Model of mispriming events during library amplification. A: The primers recognize the 5′ and 3′ ends of each amplicon (green) and have 5′ adaptors for use in emulsion PCR and sequencing (orange). B: However, in certain situations, another primer in the mix (red) is partially homologous to an internal region of the amplicon (mismatches in black). The resultant product is shorter than full length and contains false-positive mutations corresponding to the primer mismatches in the middle of the true amplicon.
Table 2.
Nonspecific Primer Binding Sites and the Corresponding Cancer Hotspot Panel Primers
| Mispriming site | Target sequence | Presumptive primer∗ | Primer sequence | Resultant mutant sequence |
|---|---|---|---|---|
| ABL1 Exon 7 | 5′-GCTGATTTTGGCCTGAGCAGGT-3′ | MET Exon 19F | 5′-GCTGATTTTGGTCTTGCCAGAGAC-3′ | 5′-TTTGGTCTGAGCAGGTTGAT-3′ |
| ALK Exon 23 | 5′-CAGGCTCACCCCAATGCAGCGA-3′ | HRAS Exon 2R | 5′-CAGGCTCACCTCTATAGTGGGGTC-3′ | 5′-CACCTCAATGCAGCGAACAA-3′ |
| APC Exon 16 | 5′-CAAGAGAAAGAGGCAGAAA-3′ | KDR Exon 26F† | 5′-CAGGAAGAAAGAGGCATTTAATGAAA-3′ | 5′-CAGGAAGAAAGAGGCAGAAA-3′ |
| ATM Exon 26 | 5′-AGGGTACCAGAGACAGTGGGAT-3′ | ALK Exon 25R | 5′-AGGGTACCAGGAGATGATGTAAGGGAC-3′ | 5′-ACCAGGGACAGTGGGATGGC-3′ |
| EGFR Exon 20 | 5′-AAAGGTAATCAGTGAAGGGAG-3′ | RB1 Exon 18R | 5′-CAAGGTGATCAGTTGGTCCTTC-3′ | 5′-CAAGGTGATCAGTGAAGGGA-3′ |
| EGFR Exon 21 | 5′-CATGCAGAAGGAGGCAAAGT-3′ | KDR Exon 26F† | 5′-CAGGAAGAAAGAGGCATTTAATGAAA-3′ | 5′-CAGGAAGAAAGAGGCAAAGT-3′ |
| ERBB2 Exon 21 | 5′-CCCAACCATGTCAAAATTACAG-3′ | FBXW7 Exon 9R | 5′-CCCAACCATGACAAGATTTTCCCTTACCT-3′ | 5′-GACAAGATTACAGACTTCGG-3′ |
| GNAQ Exon 5 | 5′-TGTCAGCTCTATCATGTTTCT-3′ | KDR Exon 7F | 5′-TCAGTCAACTCTTTTTTTTCAGC-3′ | 5′-TCAGTCAACTCTATCATGTT-3′ |
| KIT Exon 2 | 5′-AGTCCAGGCGAACCGTCTCCACC-3′ | EGFR Exon 20R (1 of 2)‡ | 5′-AGTCCAGGAGGCAGC-3′ | 5′-GTCCAGGAGAACCGTCTCCA-3′ |
| PDGFRA Exon 18 | 5′-CGTCCTGCTGGCACAAGGAAA-3′ | FGFR3 Exon 14R | 5′-CGTCCTACTGGCATGACCCCCAC-3′ | 5′-ACTGGCACAAGGAAAAATTG-3′ |
| RB1 Exon 20 | 5′-TCCAGTTGATATGTTCTAATCTG-3′ | SMAD4 Exon 3R | 5′-TCCAGGTGATACAACTCGTT-3′ | 5′-CCAGGTGATATGTTCTAATC-3′ |
| RET Exon 15 (1) | 5′-TGAAGAGGATTCCTACGTGAAG-3′ | KDR Exon 27F | 5′-GGAAGAGGATTCTGGACTCTCT-3′ | 5′-GGAAGAGGATTCCTACGTGA-3′ |
| RET Exon 15 (2) | 5′-GGTAGCTGAGGGGCGGAAGAT-3′ | VHL Exon 1R | 5′-GGTAGCTGTGGATGCGGCGGCTC-3′ | 5′-AGCTGTGGGGCGGAAGAT-3′ |
| SMAD4 Exon 10 | 5′-AGGCACCTGACCCAAACATCAC-3′ | CSF1 Exon 22F | 5′-GAGCACCTGACCTGCTGCGAGC-3′ | 5′-GAGCACCTGACCCAAACATC-3′ |
| SMAD4 Exon 6 | 5′-TAGCTGGCTGACCAGTAAATCC-3′ | PTEN Exon 7R (1 of 2)‡ | 5′-TAGCTGGCAGACCAC-3′ | 5′-TGGCAGACCAGTAAATCCAT-3′ |
| SMAD4 Exon 9 | 5′-AGGGTCCACGTATCCATCAACA-3′ | APC Exon 14F (6 of 7)‡ | 5′-AGGGTCCAGGTTCTTCCAGATGCT-3′ | 5′-CCAGGTATCCATCAACAGTA-3′ |
| STK11 Exon 8 | 5′-GCGCAGCATGACTGTGGTGCCG-3′ | Not identified§ | | 5′-GAGCAGCATGACTGTGGTGC-3′ |
| TP53 Exon 5 | 5′-GGGAGTACTGTAGGAAGAGGA-3′ | SMO Exon 5F | 5′-GGGAGTACAGAGTGACCGCCTC-3′ | 5′-ACAGTAGGAAGAGGAAGAAG-3′ |
| TP53 Exon 8 | 5′-GACAGGCACAAACACGCACCTC-3′ | HNF1A Exon 4F | 5′-CACAGGCACAGGGGCTG-3′ | 5′-CACAGGCACAAACACGCACC-3′ |
Italicized sequence indicates genomic sequence 5′ to start of amplicon, bold sequence indicates overlap between presumptive primers and target amplicons, and underlined sequence indicates mismatched bases generating false-positive mutations.
∗Primer sequence presumed to be located immediately 5′ to the generated amplicon sequence because of the trimming that takes place after library amplification.
†The same primer produces both events.
‡Multiple amplicons are used in the Cancer Hotspot Panel kit to cover these exons.
§No sequence present within 20 bases of any amplicon in the Cancer Hotspot Panel corresponds to this short read event.
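The comparison summarized in Table 2 can be illustrated with a short sketch. It is not the procedure used in the study, which compared sequences by inspection; the function names are hypothetical, and only the three EGFR exon 21 sequences are taken from Table 2. The sketch reports where the misprimed read differs from the genomic target and how many leading bases the read shares with the presumptive KDR exon 26F primer.

```python
from typing import List


def mismatch_positions(reference: str, observed: str) -> List[int]:
    """Return 0-based positions at which two equal-length sequences differ."""
    if len(reference) != len(observed):
        raise ValueError("sequences must have equal length")
    return [i for i, (r, o) in enumerate(zip(reference, observed)) if r != o]


def shared_prefix_length(a: str, b: str) -> int:
    """Return the number of leading bases that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Sequences copied from the EGFR exon 21 row of Table 2.
egfr21_target = "CATGCAGAAGGAGGCAAAGT"
egfr21_misprimed = "CAGGAAGAAAGAGGCAAAGT"
kdr26f_primer = "CAGGAAGAAAGAGGCATTTAATGAAA"

mismatches = mismatch_positions(egfr21_target, egfr21_misprimed)
homology = shared_prefix_length(kdr26f_primer, egfr21_misprimed)
print(mismatches, homology)
```

For this row, the three differing positions correspond to the three apparent EGFR exon 21 substitutions listed in Table 1, and the shared prefix falls within the 10 to 17 bases of homology reported for the panel.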
Validation of G873R and Other Mispriming Events as False Positives
To confirm that the presumptive mispriming events we identified are not true mutations, we designed PCR primers to amplify 100 to 150 bases around the eight most common event sites. By using DNA from the same patient specimen as used to identify the 19 mispriming sites, we amplified each site separately, pooled the products, and sequenced them with the Ion Torrent PGM. With at least 350,000 reads for each amplicon, we identified no short reads with these mutations and few mutations overall (<0.1%) at each mispriming event site (Table 3). Therefore, the mutations seen in the Cancer Hotspot Panel are not present in the genomic DNA of the sample and represent false positives.
Table 3.
Targeted Sequencing of Misprimed Exons Demonstrates False-Positive Nature of Mutations
| Mispriming site∗ | Cancer Hotspot Panel: mutant reads | Cancer Hotspot Panel: total reads | Cancer Hotspot Panel: percent, % | Custom primer set: short reads with mutation | Custom primer set: all mutant reads | Custom primer set: total reads | Custom primer set: percent, % | P value |
|---|---|---|---|---|---|---|---|---|
| ABL1 Exon 7 | 805 | 6484 | 12.42 | 0 | 471 | 575757 | 0.082 | <1 × 10−5 |
| APC Exon 14 | 158 | 1102 | 14.34 | 0 | 1353 | 357411 | 0.38 | |
| EGFR Exon 20 | 279 | 2938 | 9.50 | 0 | 123 | 549422 | 0.022 | |
| EGFR Exon 21 | 388 | 2504 | 15.50 | 0 | 303 | 454039 | 0.067 | |
| GNAQ Exon 5 | 470 | 2408 | 19.52 | 0 | 314 | 480009 | 0.066 | |
| RET Exon 15 | 75 | 781 | 9.60 | 0 | 37 | 490998 | 0.008 | |
| RB1 Exon 20 | 250 | 2176 | 11.49 | 0 | 285 | 628279 | 0.045 | |
| SMAD4 Exon 6 | 112 | 1164 | 9.62 | 0 | 37 | 519439 | 0.007 | |
∗Monoplex PCR reactions.
Validation of Mispriming Events for Laboratory Specificity
To verify that the mispriming events we see in clinical specimens are not a phenomenon unique to a single laboratory, we analyzed data obtained from an independent laboratory that runs the Cancer Hotspot Panel on an Ion Torrent PGM. We received data from 12 FFPE specimen runs and analyzed them for three different mispriming events. We were able to detect at least a low frequency of mispriming events in 8 of the 12 specimens (Supplemental Table S2), which confirms that these events are not a laboratory-specific artifact.
Mispriming Events Are Identified as Mutations by Multiple Different Alignment Algorithms
To determine whether mispriming events are uniquely identified by the Torrent Suite as real mutations, we re-analyzed the FASTQ sequencing data for four specimens by using two independent alignment algorithms: Bowtie 2 and BWA-MEM. Bowtie 2 aligns all of the misprimed reads in a similar fashion to Torrent Suite, whereas BWA-MEM trims the misprimed ends from three of the five mispriming events where more than one mismatch is present and in both cases where a one base insertion is produced (Supplemental Tables S3 and S4). In the 15 mispriming events in which BWA-MEM does not trim the reads, it maps at least as high a percentage of reads as Torrent Suite. Thus, mispriming events remain a pitfall regardless of which alignment algorithm is chosen.
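Trimmed read ends of the kind described above can be located programmatically. The following is a minimal sketch, assuming that the trimming appears as soft clipping (`S` operations) in the CIGAR string of each SAM/BAM record; the function name and the example CIGAR strings are hypothetical and are not taken from the supplemental data.

```python
import re
from typing import Tuple

# One CIGAR operation: a length followed by a single operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_ends(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    # Hard clips (H) may flank soft clips; ignore them so S is outermost.
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (leading, trailing)


# Hypothetical records: one untrimmed read and one with 10 bases clipped at its start.
for cigar in ("71M", "10S61M"):
    print(cigar, soft_clipped_ends(cigar))
```

A read whose misprimed end has been clipped no longer contributes the mismatched bases to the pileup, which is one reason the same FASTQ data can yield different apparent mutation frequencies under different aligners.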
Mispriming Events Are More Common in Samples with Low DNA Quantity or Few Sequencing Reads
We then sought to determine the frequency of EGFR G873R events in our entire FFPE patient sample runs. In 291 consecutive patient specimens processed during a 6-month period, the frequency varied from 0% (0 of 2926 reads) to 24.1% (7 of 29 reads), quantified by using IGV. A negative correlation was found between input DNA amount and percentage of G873R reads (P < 0.001, Kendall correlation; τ = −0.38). Furthermore, a significantly lower mispriming percentage was found if 30 ng of input DNA was used (the optimum recommended DNA input by Life Technologies; n = 201; median, 0.3%; range, 0% to 17.8%) than if <30 ng was used (n = 90; median, 3.6%; range, 0% to 24.1%; P < 0.001) (Figure 3A). However, no significant difference was found between specimens with <10 ng (10 ng is the minimum recommended DNA input by Ion Torrent; n = 42; median, 1.3%; range, 0% to 19.8%) and specimens with 10 to 29 ng (n = 48; median, 4.9%; range, 0% to 24.1%; P = 0.11). If a 5% G873R cutoff was applied, only 10 of 201 cases with 30 ng had >5% G873R, whereas 40 of 90 cases with <30 ng had >5% (P < 0.001). If the same 5% G873R cutoff was applied along with the minimum recommended input DNA of 10 ng, 34 of 252 cases with >10 ng showed >5% G873R compared with 16 of 39 cases with <10 ng (P < 0.001).
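The comparison of samples above and below the 5% G873R cutoff can be reproduced from the counts given in the text. The analyses in this study were performed in R; the sketch below is a standard-library Python re-implementation offered only for illustration. The function name is hypothetical, and the two-sided P value is defined here as the sum of the probabilities of all tables that are no more likely than the observed one, which is a common convention and an assumption about how the reported value was computed.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed table.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# >5% G873R reads: 10 of 201 samples at 30 ng versus 40 of 90 samples at <30 ng.
p_value = fisher_exact_two_sided(10, 201 - 10, 40, 90 - 40)
print(f"P = {p_value:.2e}")
```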
Figure 3.

Next-generation sequencing mispriming events correlate with low input DNA and low read depth. Modified box plots of input DNA (A) and read depth (B) versus percentage of misprimed reads at EGFR exon 21 (G873R). The horizontal line in the middle of each box indicates the median, and the top and bottom borders of the box mark the 75th and 25th percentiles, respectively. The whiskers above and below the box mark the 90th and 10th percentiles. Outliers (±1.5 × interquartile range) are represented as individual dots. ∗∗P < 0.01, ∗∗∗P < 0.001, Wilcoxon rank-sum test.
The number of sequence reads, which positively correlates with input DNA (P < 0.001; τ = 0.20), also played a role in this phenomenon and inversely correlated with the percentage of G873R reads (P < 0.001; τ = −0.22). Significantly fewer mispriming events were found if >500 reads were achieved (the minimum number needed to detect a 5% mutant allele frequency at 95% confidence; n = 256; median, 0.32%; range, 0% to 17.4%) than if <500 reads were achieved (n = 35; median, 10.0%; range, 0% to 24.1%; P < 0.001) (Figure 3B). However, no significant difference was found between specimens with 150 or fewer reads (the minimum needed to detect a 10% mutant allele frequency at 95% confidence; n = 23; median, 10.3%; range, 0% to 24.1%) and specimens with between 151 and 500 reads (n = 12; median, 7.6%; range, 0% to 14.2%; P = 0.44). With the use of a 5% G873R cutoff, 17 of 35 cases with <500 reads showed >5% G873R, whereas only 13 of 256 cases with >500 reads fell into the same category (P < 0.001). With a cutoff of 150 reads, 12 of 23 cases with <150 reads had >5% G873R, whereas 18 of 268 cases with >150 reads showed the same (P < 0.001). Repeat testing of samples generating this phenomenon showed variability in the percentage of mispriming events.
Multiplex PCR Permits Mispriming Events
We then considered whether the mispriming events were intrinsic to the AmpliSeq kit or a fundamental problem of multiplex PCR reactions. To test this, we assembled a multiplex PCR reaction by using the eight primer pairs used in separate PCR reactions described above. We found mispriming events in this custom reaction, confirming that this is an intrinsic problem of multiplex PCR reactions when analyzed by NGS. We then designed two new primers to remove the homology that produced the mispriming events in the first custom reaction. A multiplex reaction, including these replacement primers, eliminated the identified mispriming events but generated a new one. When this primer was again replaced, there were no mispriming events (data not shown).
In this regard, traditional Sanger sequencing or pyrosequencing reactions are inherently forgiving because of their poorer limits of detection. If a primer produces off-target amplicons, but these amplicons are <5% of the total, they will not be seen. With NGS, each read derives from a clone of molecules (produced on the surface of a bead or a chip) amplified from a single starting molecule, so all products, including errors generated during the first round of PCR (library preparation), may be seen. Although worrisome at first, this fact should allow one to iteratively design multiplex PCR reactions by replacing primers that promiscuously bind to other amplicons in the kit (Figure 4).
Figure 4.
Model for iterative primer design for multiplex PCR reactions using NGS. Because the majority of primers do not demonstrate mispriming events, initial primers can be designed and replaced on the basis of iterative mispriming event analyses. NGS, next-generation sequencing.
Discussion
NGS platforms will revolutionize clinical practice by providing cost-effective, rapid, and sensitive sequencing of large panels of amplicons important for cancer diagnosis, prognosis, and therapeutics. However, as we implemented multiplex PCR-based NGS to detect mutations in colon and lung cancer specimens, we commonly encountered false-positive mutations, especially in patient samples with low DNA amounts or shallow read depths. This phenomenon occurs uniformly at the 5′ or 3′ end of short reads, where a primer among the hundreds in the Cancer Hotspot Panel anneals to a homologous sequence, with one to three mismatches, in the middle of one of the other panel amplicons (Figure 2). Long reads never contain these mutations, and they are not present in the genomic DNA targets. Mechanistically, we cannot distinguish whether these events happen with primer binding to genomic DNA directly or to the amplification product as it increases in concentration during PCR, although the latter seems more plausible. It is these mismatches that, on library amplification, lead to mutations that could erroneously be interpreted as existing within the patient's tumor DNA. But, because these false positives occur consistently from sample to sample, in the same locations, within the same amplicons, and in reads that are shorter than full length, we are able to easily exclude them bioinformatically as we sign out patient samples. We tabulated the critical features of mispriming events that distinguish them from true amplicons (Table 4).
Table 4.
Summary of the Characteristics of Misprimed False-Positive Events
| Read length | Always shorter than full-length amplicon, with one end identical to full length and the other in the middle of the target amplicon |
| False-positive mutation site(s) | Within 10 bases of either end of the read |
| Homology with presumed panel primers | 10–17 bases, with 1–3 mismatches |
| Mutation types | Base substitutions and short (1–2 base) insertions |
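The criteria in Table 4 lend themselves to a simple per-read check. The sketch below is illustrative only and was not part of the analysis pipeline used here; the `Read` class, the function name, and the coordinates are hypothetical, and the 10-base window is taken from Table 4.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Read:
    start: int                   # 0-based reference start of the aligned read
    end: int                     # 0-based exclusive reference end
    mismatches: Tuple[int, ...]  # reference positions that differ from the read


def has_mispriming_signature(read: Read, amp_start: int, amp_end: int,
                             end_window: int = 10) -> bool:
    """Return True if a read shows the Table 4 features of a misprimed product."""
    # Full-length reads never carried the artifacts.
    if read.start <= amp_start and read.end >= amp_end:
        return False
    # Exactly one read end must coincide with an end of the target amplicon.
    if (read.start == amp_start) == (read.end == amp_end):
        return False
    # At least one mismatch must lie within `end_window` bases of a read end.
    return any(
        min(pos - read.start, read.end - 1 - pos) < end_window
        for pos in read.mismatches
    )


# Hypothetical amplicon spanning reference positions 100 to 220.
short_read = Read(start=160, end=220, mismatches=(162, 164, 169))
full_read = Read(start=100, end=220, mismatches=(169,))
print(has_mispriming_signature(short_read, 100, 220),
      has_mispriming_signature(full_read, 100, 220))
```

Because a short read can also carry a genuine variant near its end, a check of this kind is better suited to flagging reads for review than to discarding them outright.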
We also reviewed PubMed and COSMIC for mutations reported at mispriming event sites to see whether these events had been identified by others. We identified three references to mutations at these sites: two of these either did not use NGS or did not use the AmpliSeq panel, whereas ALK G1184E and RET E884V were reported in one thymic squamous cell carcinoma case.17 The thymic carcinoma case is instructive: IGV screen shots of three mutations were published, and the two mutations that we found in mispriming events, ALK G1184E and RET E884V, are seen only at the ends of short reads, whereas the third, an intronic variant (TP53 c.782+1 G>T) that we did not identify, is displayed in full-length reads. Although we have not reviewed the primary data in this case, and the methods are not completely specified, we are concerned that two of the three reported mutations may represent false positives due to mispriming, which underscores the need to identify this phenomenon while analyzing multiplex PCR-based NGS data.
The existence of reproducible false-positive mutations in the Cancer Hotspot Panel brings up three major areas of concern: data analysis, sample DNA quantity, and the specificity of massively multiplex PCR reactions. Given the large number of amplicons in this and other targeted gene panels, it is time consuming and ultimately impractical to manually read each sequence. As such, automated software such as Ion Torrent's Variant Caller is essential to the future of this technology in a clinical laboratory. However, in our experience, the current version of Variant Caller is unable to distinguish between a true point mutation and one introduced by a mispriming event, requiring us to examine sequences manually by using the IGV. One possible way to remedy this issue would be to analyze only full-length amplicons, because most of the mispriming events result in much shorter amplicon lengths. This would eliminate these false positives, but at the same time would reduce the total number of reads, and therefore the sensitivity of mutation detection, because a significant number of shorter reads with correct priming would be filtered out. Another approach would be to design software to automatically screen out the mispriming mutations, as they occur in consistent locations in the panel, allowing for short reads to be included in read depth and sensitivity to be maintained. Finally, a third approach would be to identify and eliminate offending primers during kit development. In this regard, NGS may be the ultimate multiplex primer design tool because one could initially design primers, then use NGS to identify and replace promiscuous primers iteratively, as shown in Figure 4.
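The second approach, screening calls against the consistent mispriming locations, might look like the following sketch. It is a hypothetical illustration and not an existing feature of Variant Caller or Ion Reporter; the type alias, set name, and function name are invented for the example. The listed sites are a subset of Table 1, with alleles copied as printed there, and it is an assumption that a variant caller would report them in the same orientation.

```python
from typing import Iterable, List, Set, Tuple

# (chromosome, hg19 position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]

# Subset of the recurrent mispriming sites listed in Table 1.
KNOWN_MISPRIMING_SITES: Set[Variant] = {
    ("7", 55259559, "G", "A"),   # EGFR exon 21, G873R
    ("7", 55259552, "T", "G"),   # EGFR exon 21, H870Q
    ("7", 55259554, "C", "A"),   # EGFR exon 21, A871E
    ("9", 80409443, "G", "A"),   # GNAQ exon 5, T224N
    ("13", 49033902, "A", "C"),  # RB1 exon 20, I680T
}


def partition_calls(calls: Iterable[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Split variant calls into (kept, flagged_as_possible_mispriming)."""
    kept: List[Variant] = []
    flagged: List[Variant] = []
    for call in calls:
        (flagged if call in KNOWN_MISPRIMING_SITES else kept).append(call)
    return kept, flagged


# Illustrative calls: one at a listed mispriming site and one elsewhere in EGFR exon 21.
example_calls = [("7", 55259559, "G", "A"), ("7", 55259515, "T", "G")]
kept, flagged = partition_calls(example_calls)
print(kept, flagged)
```

Flagged calls are returned rather than dropped because a genuine mutation at a listed position would otherwise be suppressed; confirming that the supporting reads are short and carry the change near one end would still be needed.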
Our results emphasize the importance of using a sufficient quantity of DNA to ensure accurate results in Ion Torrent-based NGS. In samples in which <30 ng of DNA was available for use, the number of mispriming events was significantly higher. Interestingly, these events were not reported by a clinical validation study that used the same panel, in which only 10 ng of DNA was used in each case.11 It is unclear whether the short reads we found were screened out in the data analysis of Singh et al.11 If not, there are other possible explanations for the discrepancy: we used a different manufacturer's DNA extraction kit and used Ion Torrent's automated One Touch emulsion PCR system. Singh et al11 reported significantly higher quality reads (as determined by the average AQ20 reads per sample) by using a manual emulsion PCR method. In this regard, the instructions for the kit recommend 10 to 30 ng of input DNA, whereas we find fewer mispriming artifacts by using 30 ng.
Our analysis also included 39 patient samples in which we were unable to reach 10 ng of DNA input. A major question is whether to run these samples at all, because of the increased incidence of misprimed false-positive mutations. However, we were able to achieve read depths of >150 reads in 28 of 39 samples and >500 reads in 25 of 39 samples, allowing for adequately sensitive detection of true mutations. Because of the difficulty of obtaining more DNA in many cases, especially from fine-needle aspirations and small biopsies, we have attempted to run samples with low DNA content in the hope of obtaining usable results. In about three-quarters of cases, we have been successful. One concern with low DNA quantity, even in cases in which read depth appears adequate (or even excellent), is that a small number of DNA molecules are being preferentially amplified. One could imagine a case in which a fine-needle aspiration contained 10% tumor cells with a coding mutation in a panel amplicon, but with few cells overall. If the DNA from benign stromal cells was preferentially amplified, a false negative would be produced; conversely, if the DNA from tumor cells was preferentially amplified, a falsely high mutation percentage could be obtained, leading to its incorrect classification as a germline mutation. Moreover, the quality of the DNA used may also play a role in this assay. We have samples in which more than adequate DNA amounts were used (30 ng), but read depth was poor and/or a high percentage of mispriming events occurred. Perhaps in these cases, the DNA was more highly fragmented, leading to poor amplification.
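The sampling concern above can be made concrete with a short calculation. This is a sketch under stated assumptions rather than an analysis from this study: it assumes approximately 3.3 pg of DNA per haploid human genome, treats each amplifiable template as an independent draw, and assumes a heterozygous mutation in a diploid tumor, so that 10% tumor cells corresponds to a 5% mutant allele fraction. The function names are illustrative.

```python
HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome, in picograms


def haploid_copies(dna_ng: float) -> float:
    """Approximate number of haploid genome copies in `dna_ng` nanograms of human DNA."""
    return dna_ng * 1000.0 / HAPLOID_GENOME_PG


def prob_no_mutant_template(n_templates: int, mutant_fraction: float) -> float:
    """Probability that none of `n_templates` independently sampled templates is mutant.

    Simple binomial model: each amplifiable template carries the mutation with
    probability `mutant_fraction`, independently of the others.
    """
    return (1.0 - mutant_fraction) ** n_templates
```

Under this model, 10 ng of intact DNA corresponds to roughly 3,000 haploid copies, but if fragmentation or a low cell count leaves only 20 amplifiable templates for a given amplicon, `prob_no_mutant_template(20, 0.05)` is about 0.36, so the mutation would be missed in roughly one-third of such samples regardless of the final read depth. With 100 amplifiable templates the same probability falls below 1%.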
Finally, our data indicate a flaw in using massively multiplex PCR reactions, with hundreds of primers, to amplify many different amplicons simultaneously before DNA sequencing. Multiplex PCR requires careful primer design to avoid significant homology between amplicons and primers.18,19 In cases in which the stoichiometry of the reaction shifts toward more primer concentration in relation to template (ie, low DNA quantity), we hypothesize that primer binding to these partially homologous internal targets occurs at a high frequency to generate the short reads that we find. As more amplicons are included in larger and more comprehensive exon panels, we are concerned that this phenomenon may be even more common, because the likelihood of partial homology goes up with increasing numbers of primers and amplicons. One could use a multiplex reaction as a primer design tool of sorts, whereby primers for different exons could be tested iteratively to minimize mispriming events and to maximize read depth. Whole exome sequencing, provided that it achieves adequate depth of coverage to detect small mutation percentages in tumor samples, could resolve this problem by eliminating the amplification step, but it is likely that larger quantities of DNA (and therefore larger samples) will be needed.
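A first-pass computational screen for partial homology between primers and nontargeted amplicons could be sketched as follows. This is an illustrative sketch, not the manufacturer's primer design procedure: the names (`Hit`, `find_offtarget_sites`) and the mismatch threshold are hypothetical, and a plain mismatch count is only a crude proxy for primer binding, which in practice depends heavily on 3′-end complementarity and reaction conditions.

```python
from typing import Dict, List, NamedTuple, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


class Hit(NamedTuple):
    primer: str
    amplicon: str
    strand: str      # "+" = amplicon as given, "-" = its reverse complement
    position: int    # 0-based start within the scanned strand
    mismatches: int


def find_offtarget_sites(
    primers: Dict[str, Tuple[str, str]],  # primer name -> (sequence, intended amplicon name)
    amplicons: Dict[str, str],            # amplicon name -> sequence
    max_mismatches: int = 2,              # assumed threshold for "nearly complementary"
) -> List[Hit]:
    """List near-matches of each primer within amplicons other than its intended target."""
    hits: List[Hit] = []
    for primer_name, (primer_seq, target) in primers.items():
        k = len(primer_seq)
        for amp_name, amp_seq in amplicons.items():
            if amp_name == target:
                continue  # skip the primer's own amplicon
            for strand, template in (("+", amp_seq), ("-", reverse_complement(amp_seq))):
                for i in range(len(template) - k + 1):
                    d = count_mismatches(primer_seq, template[i:i + k])
                    if d <= max_mismatches:
                        hits.append(Hit(primer_name, amp_name, strand, i, d))
    return hits
```

Primers returned by such a screen would be candidates for redesign, with sequencing of the revised panel used to confirm that the corresponding short reads disappear.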
In summary, we have identified a consistent pattern of mispriming events that produce false-positive mutations in Ion Torrent-based NGS by using a multiplex PCR-based targeted gene panel. These events are more common in samples with shallow read depth and low DNA quantity. Such low amounts of tumor DNA are often unavoidable in clinical situations in which small biopsies and fine-needle aspirations for tumor diagnosis are used. The mispriming events occur in consistent locations from sample to sample and occur only in short reads, allowing them to be easily identified. As NGS platforms come into wider use in molecular diagnostics, phenomena such as this must be recognized to avoid significant diagnostic errors in clinical practice. The AmpliSeq cancer panel is a valuable new tool for molecular diagnostics laboratories, and we present a number of options to circumvent this issue.
Acknowledgments
We thank Drs. Sarah Wheelan, Denise Batista, Leslie Cope, and Jun Yu for helpful discussions and Ross Gagnon for technical assistance.
Footnotes
Supported by grants from the Caring Collection (C.D.G.), the Johns Hopkins Hospital Women's Board (M.T.L.), NIH grant R21CA164592 (J.R.E.), and a PanCan/AACR Innovation Award (J.R.E.).
Disclosures: None declared.
References
1. Markowitz S.D., Bertagnolli M.M. Molecular origins of cancer: molecular basis of colorectal cancer. N Engl J Med. 2009;361:2449–2460. doi: 10.1056/NEJMra0804588.
2. Herbst R.S., Heymach J.V., Lippman S.M. Lung cancer. N Engl J Med. 2008;359:1367–1380. doi: 10.1056/NEJMra0802714.
3. Tsiatis A.C., Norris-Kirby A., Rich R.G., Hafez M.J., Gocke C.D., Eshleman J.R., Murphy K.M. Comparison of Sanger sequencing, pyrosequencing, and melting curve analysis for the detection of KRAS mutations: diagnostic and clinical implications. J Mol Diagn. 2010;12:425–432. doi: 10.2353/jmoldx.2010.090188.
4. Querings S., Altmuller J., Ansen S., Zander T., Seidel D., Gabler F., Peifer M., Markert E., Stemshorn K., Timmermann B., Saal B., Klose S., Ernestus K., Scheffler M., Engel-Riedel W., Stoelben E., Brambilla E., Wolf J., Nurnberg P., Thomas R.K. Benchmarking of mutation diagnostics in clinical lung cancer specimens. PLoS One. 2011;6:e19601. doi: 10.1371/journal.pone.0019601.
5. Loman N.J., Misra R.V., Dallman T.J., Constantinidou C., Gharbia S.E., Wain J., Pallen M.J. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30:434–439. doi: 10.1038/nbt.2198.
6. Rothberg J.M., Hinz W., Rearick T.M., Schultz J., Mileski W., Davey M. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011;475:348–352. doi: 10.1038/nature10242.
7. Metzker M.L. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31–46. doi: 10.1038/nrg2626.
8. Wagle N., Berger M.F., Davis M.J., Blumenstiel B., Defelice M., Pochanard P., Ducar M., Van Hummelen P., Macconaill L.E., Hahn W.C., Meyerson M., Gabriel S.B., Garraway L.A. High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov. 2012;2:82–93. doi: 10.1158/2159-8290.CD-11-0184.
9. Harismendy O., Schwab R.B., Bao L., Olson J., Rozenzhak S., Kotsopoulos S.K., Pond S., Crain B., Chee M.S., Messer K., Link D.R., Frazer K.A. Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing. Genome Biol. 2011;12:R124. doi: 10.1186/gb-2011-12-12-r124.
10. Hadd A.G., Houghton J., Choudhary A., Sah S., Chen L., Marko A.C., Sanford T., Buddavarapu K., Krosting J., Garmire L., Wylie D., Shinde R., Beaudenon S., Alexander E.K., Mambo E., Adai A.T., Latham G.J. Targeted, high-depth, next-generation sequencing of cancer genes in formalin-fixed, paraffin-embedded and fine-needle aspiration tumor specimens. J Mol Diagn. 2013;15:234–247. doi: 10.1016/j.jmoldx.2012.11.006.
11. Singh R.R., Patel K.P., Routbort M.J., Reddy N.G., Barkoh B.A., Handal B., Kanagal-Shamanna R., Greaves W.O., Medeiros L.J., Aldape K.D., Luthra R. Clinical validation of a next-generation sequencing screen for mutational hotspots in 46 cancer-related genes. J Mol Diagn. 2013;15:607–622. doi: 10.1016/j.jmoldx.2013.05.003.
12. Srinivasan M., Sedmak D., Jewell S. Effect of fixatives and tissue processing on the content and integrity of nucleic acids. Am J Pathol. 2002;161:1961–1971. doi: 10.1016/S0002-9440(10)64472-0.
13. Hewitt S.M., Lewis F.A., Cao Y., Conrad R.C., Cronin M., Danenberg K.D., Goralski T.J., Langmore J.P., Raja R.G., Williams P.M., Palma J.F., Warrington J.A. Tissue handling and specimen preparation in surgical pathology: issues concerning the recovery of nucleic acids from formalin-fixed, paraffin-embedded tissue. Arch Pathol Lab Med. 2008;132:1929–1935. doi: 10.5858/132.12.1929.
14. Lin M.T., Tseng L.H., Beierl K., Hsieh A., Thiess M., Chase N., Stafford A., Levis M.J., Eshleman J.R., Gocke C.D. Tandem duplication PCR: an ultrasensitive assay for the detection of internal tandem duplications of the FLT3 gene. Diagn Mol Pathol. 2013;22:149–155. doi: 10.1097/PDM.0b013e31828308a1.
15. Li H., Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698.
16. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
17. Hu Z., Wang J., Yao T., Hong R.L., Zhang K., Gao H., Wu X., Li J., Bai C., Yen Y. Identification of novel mutations of TP53, ALK and RET gene in metastatic thymic squamous cell carcinoma and its therapeutic implication. Lung Cancer. 2013;81:27–31. doi: 10.1016/j.lungcan.2013.04.006.
18. Henegariu O., Heerema N.A., Dlouhy S.R., Vance G.H., Vogt P.H. Multiplex PCR: critical parameters and step-by-step protocol. Biotechniques. 1997;23:504–511. doi: 10.2144/97233rr01.
19. Markoulatos P., Siafakas N., Moncany M. Multiplex polymerase chain reaction: a practical approach. J Clin Lab Anal. 2002;16:47–51. doi: 10.1002/jcla.2058.