Accuracy of Next Generation Sequencing Platforms

Edward J Fox; Kate S Reid-Bayliss; Mary J Emond; Lawrence A Loeb

doi:10.4172/jngsa.1000106

Author manuscript; available in PMC: 2015 Feb 17.

Published in final edited form as: Next Gener Seq Appl. 2014 Jun 28;1:1000106. doi: 10.4172/jngsa.1000106

PMCID: PMC4331009 NIHMSID: NIHMS655291 PMID: 25699289

Abstract

Next-generation DNA sequencing has revolutionized genomic studies and is driving the implementation of precision diagnostics. The ability of these technologies to disentangle sequence heterogeneity, however, is limited by their relatively high error rates. Several single-molecule barcoding strategies have been proposed to reduce the overall error frequency. Duplex Sequencing additionally exploits the fact that DNA is double-stranded, with one strand reciprocally encoding the sequence information of its complement, and can eliminate nearly all sequencing errors by comparing the sequence of individually tagged amplicons derived from one strand of DNA with that of its complementary strand. This method reduces errors to fewer than one per ten million nucleotides sequenced.

Keywords: Next-generation DNA sequencing, Precision medicine, Accuracy, Duplex sequencing

Introduction

Mutation drives evolution and underlies many diseases, most prominently cancer [1]. Of the newly developed genomic technologies, next-generation DNA sequencing (NGS), in particular, has revolutionized the scale of study of biological systems [2] and has already started to enter the clinic where it is expected to enable a more personalized approach to patient care [3]. Unlike conventional sequencing techniques, which simply report the average genotype of an aggregate of molecules, NGS digitally tabulates the sequence of individual DNA fragments, thereby offering the unique ability to detect minor variants within heterogeneous mixtures [4]. Already, NGS has been used to characterize exceptional diversity within microbial [5,6], viral [7-9], and tumor cell populations [10-12], and many low frequency, drug-resistant variants of therapeutic importance have been identified [13,14]. NGS has also revealed previously underappreciated intra-organismal mosaicism in both the nuclear [15] and mitochondrial genomes [16]. This somatic heterogeneity, along with that underlying adaptive immunity [17], is an important factor in determining the phenotypic variability of disease.

In theory, DNA subpopulations of any size should be detectable via ‘deep sequencing’ of a sufficient number of molecules. However, a fundamental limitation of standard NGS is the high frequency with which bases are scored incorrectly due to artifacts introduced during sample preparation and sequencing [18]. For example, amplification bias during PCR of heterogeneous mixtures can result in skewed populations [19]. Additionally, polymerase mistakes, such as base misincorporations and rearrangements due to template switching, can result in incorrect variant calls. Furthermore, errors that arise during cluster amplification, sequencing cycles, and image analysis result in approximately 0.1–1% of bases being called incorrectly (Table 1).

Table 1.

Comparison of the primary error frequencies of DNA sequencing platforms and tag-based error correction methodologies

Commercial Platform	Most Frequent Error Type	Error Frequency
Capillary sequencing	Single nucleotide substitutions	10⁻¹
454 GS Junior	Deletions	10⁻²
PacBio RS	CG deletions	10⁻²
Ion Torrent PGM	Short deletions	10⁻²
SOLiD	A-T bias	2×10⁻²
Illumina MiSeq	Single nucleotide substitutions	10⁻³
Illumina HiSeq2000	Single nucleotide substitutions	10⁻³
Tag-based methods
SafeSeq	Single nucleotide substitutions	1.4×10⁻⁵
CircleSeq	Single nucleotide substitutions	7.6×10⁻⁶
Duplex Sequencing	Single nucleotide substitutions	5×10⁻⁸


For a genetically homogenous sample, the effects of these base miscalls can be mitigated by establishing a consensus sequence from high-coverage sequencing reads.
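The benefit of a consensus for a homogeneous sample can be illustrated with a small calculation. The sketch below is not from the original article; it assumes that errors in different reads are independent and, as a worst case, that every erroneous read reports the same wrong base. The function name `consensus_error_probability` is hypothetical.

```python
from math import comb


def consensus_error_probability(depth: int, per_base_error: float) -> float:
    """Upper bound on the chance that a majority-vote consensus is wrong.

    Assumes independent errors across reads and, pessimistically, that all
    erroneous reads agree on the same incorrect base. A strict majority of
    wrong reads is required, so ties are not counted as consensus errors.
    """
    threshold = depth // 2 + 1  # smallest number of wrong reads forming a majority
    return sum(
        comb(depth, k) * per_base_error**k * (1 - per_base_error) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


if __name__ == "__main__":
    # Compare a single read with consensus calls at increasing depth.
    for depth in (1, 3, 11, 31):
        print(depth, consensus_error_probability(depth, 0.01))
```

Under these assumptions the consensus error falls rapidly with depth, which is why miscalls are tolerable when every molecule carries the same sequence. The same reasoning does not apply to a rare variant, because a minority base is outvoted whether it is real or an artifact.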

However, when rare genetic variants are sought, this base call error frequency presents a profound barrier and has limited the use of deep sequencing in a variety of fields that require the highly accurate disentangling of subpopulations within complex (heterogeneous or mixed) biological samples, including metagenomics [20,21], forensics [22], paleogenomics [23] and human genetics [4,24]. Furthermore, for many applications, such as prenatal screening for fetal aneuploidy [25,26], detection of circulating tumor DNA [27], and monitoring response to chemotherapy with nucleic acid-based serum biomarkers [28], a level of detection well below 1 in 10,000 is highly desirable; unfortunately, the high frequency of erroneous base calls inherent to standard NGS imposes a practical limit of detection of approximately 1 in 100. These technical shortcomings have also limited the elucidation of the mechanisms by which genomes, and DNA itself, have evolved [29-31], where bioinformatics analyses have been used to reconstruct phylogenetic relationships [32-35].

Although biochemical protocols [36-39] and bioinformatics [10,40-43] have improved sequencing accuracy, the ability to confidently resolve subpopulations below 1% has remained problematic [44]. Laird and colleagues demonstrated that it was possible to significantly reduce the frequency of variant miscalls by covalently linking individual DNA molecules to unique tags prior to amplification [45,46]. This ‘barcoding’ technique allows many artifactual variations in the sequence to be identified as technical errors [47-52], as all amplicons derived from a particular starting molecule carry the same unique tag and can, thus, be collapsed to a consensus sequence representing that of the original DNA strand. An alternative to single-stranded tagging based on shear-points is the circle sequencing methodology developed by Lou et al., which utilizes the strand-displacement activity of Phi29 DNA polymerase to generate multiple copies of circularized DNA molecules in tandem prior to amplification [53]. After sequencing, these linked copies are collapsed to a consensus sequence, thereby eliminating many artifactual errors. Though these single-strand approaches are significant improvements, all of them (Table 1) still exhibit error frequencies greater than the estimated frequency of variation of many biological systems. The mutation rate of normal cells, for example, is estimated to range from 10⁻⁹ to 10⁻¹¹ mutations per nucleotide per cell division [54,55].
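The collapsing step used by single-strand barcoding methods can be sketched as follows. This is an illustrative outline rather than the procedure of any cited method: the function name `collapse_by_tag`, the thresholds `min_family_size` and `min_agreement`, and the assumption that reads sharing a tag are already aligned and of equal length are all hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_tag(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a tag into one consensus sequence per tag.

    `reads` is an iterable of (tag, sequence) pairs. Families with fewer than
    `min_family_size` reads are skipped. At each position the most common base
    is kept only if its share of the family reaches `min_agreement`; otherwise
    the position is masked as 'N'.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to outvote an amplification or read error
        bases = []
        for column in zip(*seqs):  # one column per sequence position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus


if __name__ == "__main__":
    example_reads = [
        ("AAT", "ACGT"),
        ("AAT", "ACGT"),
        ("AAT", "ACGA"),  # one read with a late error at the last position
        ("GGC", "TTTT"),  # singleton family, discarded
    ]
    print(collapse_by_tag(example_reads, min_family_size=3, min_agreement=0.6))
```

An error that arises late in amplification or during sequencing appears in only a fraction of a family and is outvoted. An error introduced in the first round of copying is inherited by the whole family and survives the consensus.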

Schmitt et al. highlighted a conceptual shortcoming of initial tag-based methods, and of next-generation sequencing platforms in general, in that use is made of sequence data derived from a single strand of DNA [56]. As a consequence, artifactual variants introduced during the initial rounds of PCR amplification become fixed and are indistinguishable from true variants, since the sequence information of the complementary strand is not taken into account. Damage to DNA from oxidative cellular processes, or generated ex vivo during tissue processing and DNA extraction [57,58], is a particular concern, as such damage can result in frequent copying errors by DNA polymerases. For example, the most thoroughly studied DNA lesion arising from oxidative damage, 8-oxoguanine, incorrectly pairs with adenine during copying with an overall efficiency greater than that of correct pairing with cytosine, and can, thus, contribute a large frequency of artifactual G:C→T:A mutations [59]. Similarly, deamination of cytosine to form uracil is a common event, which leads to inappropriate pairing with adenine during polymerase extension, thus producing artifactual C:G→T:A mutations at a frequency approaching 100% [60]. Significantly, DNA damage and the resulting sequencing artifacts occur in strand-specific patterns.

Schmitt et al. recognized that these types of errors could be resolved by exploiting the fact that DNA naturally exists as a double-stranded entity, with one molecule reciprocally encoding the sequence information of its complement. Using this insight and the resulting sequencing methodology, termed Duplex Sequencing, Schmitt et al. demonstrated that it is possible to identify and eliminate nearly all sequencing errors by comparing the sequence of individually tagged amplicons derived from one strand of DNA with that of its complementary strand; a base sequenced at a given position is scored only if the read data from each of the two strands match perfectly. The method has a theoretical background error rate of less than one artifactual error per 10⁹ nucleotides and has been used to detect variants at a frequency of 5×10⁻⁸.
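The strand-comparison rule can be expressed in a few lines. The sketch below is a simplified illustration, not the published Duplex Sequencing pipeline: it assumes that a single-strand consensus has already been built for each strand of one tagged molecule, that the bottom-strand consensus is given in its own 5'→3' orientation, and that both cover the same positions. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top_consensus: str, bottom_consensus: str) -> str:
    """Score a base only where the two strand consensuses agree.

    The bottom-strand consensus is converted to top-strand coordinates by
    reverse complementation. Positions where the strands disagree, or where
    either strand is already masked, are reported as 'N'.
    """
    bottom_as_top = reverse_complement(bottom_consensus)
    if len(top_consensus) != len(bottom_as_top):
        raise ValueError("strand consensuses must cover the same positions")
    return "".join(
        a if a == b and a != "N" else "N"
        for a, b in zip(top_consensus, bottom_as_top)
    )


if __name__ == "__main__":
    # Undamaged molecule: both strands support the same sequence.
    print(duplex_consensus("AACG", "CGTT"))
    # A G->T artifact present on the top strand only is masked, not called.
    print(duplex_consensus("AACT", "CGTT"))
```

Because a damage-induced artifact affects only one strand, it has no complementary partner on the other strand and is rejected, whereas a true mutation is present on both strands and passes the comparison.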

In principle, Duplex Sequencing can be used with any NGS platform and can call sequence variants when present in an excess of 10 million wild-type sequences [53,56,61]. In contrast, with an error rate of approximately 10⁻², the probability of accurately distinguishing a true subclonal variant from a sequencing artifact in an excess of 100 wild-type molecules with NGS is approximately 50%, using standard Q30-filtered reads (Figure 1). A real variant at or below these frequencies cannot be resolved by increasing sequencing depth at a single position, as the proportion of errors will not change. Duplex Sequencing, thus, offers an improvement of nearly five orders of magnitude over standard Q30-filtered sequencing and three orders of magnitude over other tag-based methods. Thus, by exploiting the redundant sequence information contained in the complementary strand of a double-stranded DNA molecule, Duplex Sequencing has dramatically increased the precision and power of NGS. Its application will likely improve our understanding of the substructure of biological systems, including human cancers, help to pinpoint mechanisms of mutation generation, modify the catalog of rare variants, dramatically improve our ability to accurately deconvolute complex biological admixtures, and offer the diagnostic accuracy required for the implementation of precision medicine.

Figure 1. Comparison of the probability that an observed variant is real [54] for subclonal variants using Q30-filtered reads of an Illumina HiSeq2500 (NGS) versus Duplex Sequencing. The error frequency of each approach is given in parentheses. PPV (Positive Predictive Value) = (Expected Number of True Positives)/(Expected Total Number of Positive Calls). Note that the PPV is 0.50 for NGS when the variant frequency at a single position is ~1/100, i.e., any variant call has a 50/50 chance of being real when the frequency of real variants equals the frequency of mistakes [62].
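The PPV definition in the caption can be worked through numerically. The sketch below is a simplified model, not the calculation used to draw the figure: it assumes that at a single position the expected number of true variant reads is depth × variant frequency and the expected number of erroneous variant reads is depth × error frequency. The function name and the `depth` parameter are hypothetical.

```python
def positive_predictive_value(variant_frequency: float,
                              error_frequency: float,
                              depth: int = 1) -> float:
    """PPV = expected true positives / expected total positive calls.

    Simplifying assumption: every miscall at the position is counted as a
    false variant call, so depth cancels out of the ratio.
    """
    expected_true = depth * variant_frequency
    expected_false = depth * error_frequency
    return expected_true / (expected_true + expected_false)


if __name__ == "__main__":
    # Variant at 1 in 100, error frequencies taken from the text and Table 1.
    for label, error in (("NGS, 1e-2", 1e-2), ("Duplex Sequencing, 5e-8", 5e-8)):
        for depth in (100, 10_000, 1_000_000):
            print(label, depth, positive_predictive_value(1e-2, error, depth))
```

In this model the PPV depends only on the ratio of variant frequency to error frequency, which is why deeper sequencing at one position does not rescue a variant that is as rare as the errors.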

References

1. Loeb LA. Human cancers express mutator phenotypes: origin, consequences and targeting. Nat Rev Cancer. 2011;11:450–457. doi: 10.1038/nrc3063.
2. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26:1135–1145. doi: 10.1038/nbt1486.
3. Schwartz WB, Wolfe HJ, Pauker SG. Pathology and probabilities: a new approach to interpreting and reporting biopsies. N Engl J Med. 1981;305:917–923. doi: 10.1056/NEJM198110153051604.
4. Druley TE, Vallania FL, Wegner DJ, Varley KE, Knowles OL, et al. Quantification of rare allelic variants from pooled genomic DNA. Nat Methods. 2009;6:263–265. doi: 10.1038/nmeth.1307.
5. LaTuga MS, Ellis JC, Cotton CM, Goldberg RN, Wynn JL, et al. Beyond bacteria: a study of the enteric microbial consortium in extremely low birth weight infants. PLoS One. 2011;6:e27858. doi: 10.1371/journal.pone.0027858.
6. Hyman RW, Herndon CN, Jiang H, Palm C, Fukushima M, et al. The dynamics of the vaginal microbiome during infertility therapy with in vitro fertilization-embryo transfer. J Assist Reprod Genet. 2012;29:105–115. doi: 10.1007/s10815-011-9694-6.
7. Minot S, Sinha R, Chen J, Li H, Keilbaugh SA, et al. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 2011;21:1616–1625. doi: 10.1101/gr.122705.111.
8. Nasu A, Marusawa H, Ueda Y, Nishijima N, Takahashi K, et al. Genetic heterogeneity of hepatitis C virus in association with antiviral therapy determined by ultra-deep sequencing. PLoS One. 2011;6:e24907. doi: 10.1371/journal.pone.0024907.
9. Yang J, Yang F, Ren L, Xiong Z, Wu Z, et al. Unbiased parallel detection of viral pathogens in clinical samples by use of a metagenomic approach. J Clin Microbiol. 2011;49:3463–3469. doi: 10.1128/JCM.00273-11.
10. Campbell PJ, Pleasance ED, Stephens PJ, Dicks E, Rance R, et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci U S A. 2008;105:13081–13086. doi: 10.1073/pnas.0801523105.
11. Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell. 2013;152:714–726. doi: 10.1016/j.cell.2013.01.019.
12. Costello M, Pugh TJ, Fennell TJ, Stewart C, Lichtenstein L, et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 2013;41:e67. doi: 10.1093/nar/gks1443.
13. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510. doi: 10.1038/nature10738.
14. Wang C, Mitsuya Y, Gharizadeh B, Ronaghi M, Shafer RW. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res. 2007;17:1195–1201. doi: 10.1101/gr.6468307.
15. Carlson CA, Kas A, Kirkwood R, Hays LE, Preston BD, et al. Decoding cell lineage from acquired mutations using arbitrary deep sequencing. Nat Methods. 2011;9:78–80. doi: 10.1038/nmeth.1781.
16. Ameur A, Stewart JB, Freyer C, Hagström E, Ingman M, et al. Ultra-deep sequencing of mouse mitochondrial DNA: mutational patterns and their origins. PLoS Genet. 2011;7:e1002028. doi: 10.1371/journal.pgen.1002028.
17. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med. 2009;1:12ra23. doi: 10.1126/scitranslmed.3000540.
18. Glenn TC. Field guide to next-generation DNA sequencers. Mol Ecol Resour. 2011;11:759–769. doi: 10.1111/j.1755-0998.2011.03024.x.
19. Kanagawa T. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J Biosci Bioeng. 2003;96:317–323. doi: 10.1016/S1389-1723(03)90130-7.
20. Lecroq B, Lejzerowicz F, Bachar D, Christen R, Esling P, et al. Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments. Proc Natl Acad Sci U S A. 2011;108:13177–13182. doi: 10.1073/pnas.1018426108.
21. Mackelprang R, Waldrop MP, DeAngelis KM, David MM, Chavarria KL, et al. Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature. 2011;480:368–371. doi: 10.1038/nature10576.
22. Tillmar AO, Dell'Amico B, Welander J, Holmlund G. A universal method for species identification of mammals utilizing next generation sequencing for the analysis of DNA mixtures. PLoS One. 2013;8:e83761. doi: 10.1371/journal.pone.0083761.
23. Schubert M, Ermini L, Der Sarkissian C, Jónsson H, Ginolhac A, et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc. 2014;9:1056–1082. doi: 10.1038/nprot.2014.063.
24. Out AA, van Minderhout IJ, Goeman JJ, Ariyurek Y, Ossowski S, et al. Deep sequencing to reveal new variants in pooled DNA samples. Hum Mutat. 2009;30:1703–1712. doi: 10.1002/humu.21122.
25. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A. 2008;105:16266–16271. doi: 10.1073/pnas.0808319105.
26. Chiu RW, Akolekar R, Zheng YW, Leung TY, Sun H, et al. Noninvasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. BMJ. 2011;342:c7401. doi: 10.1136/bmj.c7401.
27. Beck J, Urnovitz HB, Mitchell WM, Schutz E. Next generation sequencing of serum circulating nucleic acids from patients with invasive ductal breast cancer reveals differences to healthy and nonmalignant controls. Mol Cancer Res. 2010;8:335–342. doi: 10.1158/1541-7786.MCR-09-0314.
28. Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, et al. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med. 2010;2:20ra14. doi: 10.1126/scitranslmed.3000702.
29. Di Mauro E, Saladino R, Trifonov EN. The path to life's origins. Remaining hurdles. J Biomol Struct Dyn. 2014;32:512–522. doi: 10.1080/07391102.2013.783509.
30. Frenkel ZM, Trifonov EN. Origin and evolution of genes and genomes. Crucial role of triplet expansions. J Biomol Struct Dyn. 2012;30:201–210. doi: 10.1080/07391102.2012.677771.
31. Sobolevsky Y, Guimarães RC, Trifonov EN. Towards functional repertoire of the earliest proteins. J Biomol Struct Dyn. 2013;31:1293–1300. doi: 10.1080/07391102.2012.735623.
32. Gerhardt GJ, Takeda AA, Andrighetti T, Sartor IT, Echeverrigaray SL, et al. Triplet entropy analysis of hemagglutinin and neuraminidase sequences measures influenza virus phylodynamics. Gene. 2013;528:277–281. doi: 10.1016/j.gene.2013.06.060.
33. Yang Y, Zhang Y, Jia M, Li C, Meng L. Non-degenerate graphical representation of DNA sequences and its applications to phylogenetic analysis. Comb Chem High Throughput Screen. 2013;16:585–589. doi: 10.2174/1386207311316080001.
34. Huang T, Zhang J, Xu ZP, Hu LL, Chen L, et al. Deciphering the effects of gene deletion on yeast longevity using network and machine learning approaches. Biochimie. 2012;94:1017–1025. doi: 10.1016/j.biochi.2011.12.024.
35. Wu G, Yan S. Prediction of mutations engineered by randomness in H5N1 neuraminidases from influenza A virus. Amino Acids. 2008;34:81–90. doi: 10.1007/s00726-007-0579-z.
36. Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, et al. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods. 2009;6:291–295. doi: 10.1038/nmeth.1311.
37. Vandenbroucke I, Van Marck H, Verhasselt P, Thys K, Mostmans W, et al. Minor variant detection in amplicons using 454 massive parallel pyrosequencing: experiences and considerations for successful applications. Biotechniques. 2011;51:167–177. doi: 10.2144/000113733.
38. Vandenbroucke I, Eygen VV, Rondelez E, Vermeiren H, Baelen KV, et al. Minor Variant Detection at Different Template Concentrations in HIV-1 Phenotypic and Genotypic Tropism Testing. Open Virol J. 2008;2:8–14. doi: 10.2174/1874357900802010008.
39. Flaherty P, Natsoulis G, Muralidharan O, Winters M, Buenrostro J, et al. Ultrasensitive detection of rare mutations using next-generation targeted resequencing. Nucleic Acids Res. 2012;40:e2. doi: 10.1093/nar/gkr861.
40. Muralidharan O, Natsoulis G, Bell J, Newburger D, Xu H, et al. A cross-sample statistical model for SNP detection in short-read sequencing data. Nucleic Acids Res. 2012;40:e5. doi: 10.1093/nar/gkr851.
41. Zagordi O, Klein R, Däumer M, Beerenwinkel N. Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Res. 2010;38:7400–7409. doi: 10.1093/nar/gkq655.
42. Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, et al. A large genome center's improvements to the Illumina sequencing system. Nat Methods. 2008;5:1005–1010. doi: 10.1038/nmeth.1270.
43. Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, et al. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res. 2010;20:273–280. doi: 10.1101/gr.096388.109.
44. Klco JM, Spencer DH, Miller CA, Griffith M, Lamprecht TL, et al. Functional heterogeneity of genetically defined subclones in acute myeloid leukemia. Cancer Cell. 2014;25:379–392. doi: 10.1016/j.ccr.2014.01.031.
45. Miner BE, Stöger RJ, Burden AF, Laird CD, Hansen RS. Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR. Nucleic Acids Res. 2004;32:e135. doi: 10.1093/nar/gnh132.
46. McCloskey M, Stöger R, Hansen RS, Laird CD. Encoding PCR products with batch-stamps and barcodes. Biochem Genet. 2007;45:761–767. doi: 10.1007/s10528-007-9114-x.
47. Casbon JA, Osborne RJ, Brenner S, Lichtenstein CP. A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res. 2011;39:e81. doi: 10.1093/nar/gkr217.
48. Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc Natl Acad Sci U S A. 2011;108:20166–20171. doi: 10.1073/pnas.1110064108.
49. Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M, et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. 2011;9:72–74. doi: 10.1038/nmeth.1778.
50. Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B. Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci U S A. 2011;108:9530–9535. doi: 10.1073/pnas.1105422108.
51. Shiroguchi K, Jia TZ, Sims PA, Xie XS. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes. Proc Natl Acad Sci U S A. 2012;109:1347–1352. doi: 10.1073/pnas.1118018109.
52. Fu GK, Hu J, Wang PH, Fodor SP. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc Natl Acad Sci U S A. 2011;108:9026–9031. doi: 10.1073/pnas.1017621108.
53. Lou DI, Hussmann JA, McBee RM, Acevedo A, Andino R, et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc Natl Acad Sci U S A. 2013;110:19872–19877. doi: 10.1073/pnas.1319590110.
54. Jackson AL, Loeb LA. The mutation rate and cancer. Genetics. 1998;148:1483–1490. doi: 10.1093/genetics/148.4.1483.
55. Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328:636–639. doi: 10.1126/science.1186802.
56. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, et al. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A. 2012;109:14508–14513. doi: 10.1073/pnas.1208715109.
57. Lindahl T, Wood RD. Quality control by DNA repair. Science. 1999;286:1897–1905. doi: 10.1126/science.286.5446.1897.
58. Preston BD, Albertson TM, Herr AJ. DNA replication fidelity and cancer. Semin Cancer Biol. 2010;20:281–293. doi: 10.1016/j.semcancer.2010.10.009.
59. Shibutani S, Takeshita M, Grollman AP. Insertion of specific bases during DNA synthesis past the oxidation-damaged base 8-oxodG. Nature. 1991;349:431–434. doi: 10.1038/349431a0.
60. Stiller M, Green RE, Ronan M, Simons JF, Du L, et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc Natl Acad Sci U S A. 2006;103:13578–13584. doi: 10.1073/pnas.0605327103.
61. Kennedy SR, Salk JJ, Schmitt MW, Loeb LA. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLoS Genet. 2013;9:e1003794. doi: 10.1371/journal.pgen.1003794.
62. Greenberg RS. Medical epidemiology. 4th edn. Lange Medical Books/McGraw-Hill; New York: 2005.

[R1] 1.Loeb LA. Human cancers express mutator phenotypes: origin, consequences and targeting. Nat Rev Cancer. 2011;11:450–457. doi: 10.1038/nrc3063. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26:1135–1145. doi: 10.1038/nbt1486. [DOI] [PubMed] [Google Scholar]

[R3] 3.Schwartz WB, Wolfe HJ, Pauker SG. Pathology and probabilities: a new approach to interpreting and reporting biopsies. N Engl J Med. 1981;305:917–923. doi: 10.1056/NEJM198110153051604. [DOI] [PubMed] [Google Scholar]

[R4] 4.Druley TE, Vallania FL, Wegner DJ, Varley KE, Knowles OL, et al. Quantification of rare allelic variants from pooled genomic DNA. Nat Methods. 2009;6:263–265. doi: 10.1038/nmeth.1307. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.LaTuga MS, Ellis JC, Cotton CM, Goldberg RN, Wynn JL, et al. Beyond bacteria: a study of the enteric microbial consortium in extremely low birth weight infants. PLoS One. 2011;6:e27858. doi: 10.1371/journal.pone.0027858. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Hyman RW, Herndon CN, Jiang H, Palm C, Fukushima M, et al. The dynamics of the vaginal microbiome during infertility therapy with in vitro fertilization-embryo transfer. J Assist Reprod Genet. 2012;29:105–115. doi: 10.1007/s10815-011-9694-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Minot S, Sinha R, Chen J, Li H, Keilbaugh SA, et al. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 2011;21:1616–1625. doi: 10.1101/gr.122705.111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Nasu A, Marusawa H, Ueda Y, Nishijima N, Takahashi K, et al. Genetic heterogeneity of hepatitis C virus in association with antiviral therapy determined by ultra-deep sequencing. PLoS One. 2011;6:e24907. doi: 10.1371/journal.pone.0024907. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Yang J, Yang F, Ren L, Xiong Z, Wu Z, et al. Unbiased parallel detection of viral pathogens in clinical samples by use of a metagenomic approach. J Clin Microbiol. 2011;49:3463–3469. doi: 10.1128/JCM.00273-11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Campbell PJ, Pleasance ED, Stephens PJ, Dicks E, Rance R, et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci U S A. 2008;105:13081–13086. doi: 10.1073/pnas.0801523105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell. 2013;152:714–726. doi: 10.1016/j.cell.2013.01.019. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Costello M, Pugh TJ, Fennell TJ, Stewart C, Lichtenstein L, et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 2013;41:e67. doi: 10.1093/nar/gks1443. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510. doi: 10.1038/nature10738. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Wang C, Mitsuya Y, Gharizadeh B, Ronaghi M, Shafer RW. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res. 2007;17:1195–1201. doi: 10.1101/gr.6468307. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] 15.Carlson CA, Kas A, Kirkwood R, Hays LE, Preston BD, et al. Decoding cell lineage from acquired mutations using arbitrary deep sequencing. Nat Methods. 2011;9:78–80. doi: 10.1038/nmeth.1781. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Ameur A, Stewart JB, Freyer C, Hagström E, Ingman M, et al. Ultra-deep sequencing of mouse mitochondrial DNA: mutational patterns and their origins. PLoS Genet. 2011;7:e1002028. doi: 10.1371/journal.pgen.1002028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med. 2009;1:12ra23. doi: 10.1126/scitranslmed.3000540. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Glenn TC. Field guide to next-generation DNA sequencers. Mol Ecol Resour. 2011;11:759–769. doi: 10.1111/j.1755-0998.2011.03024.x. [DOI] [PubMed] [Google Scholar]

[R19] 19.Kanagawa T. Bias and artifacts in multitemplate polymerase chain reactions (PCR). J Biosci Bioeng. 2003;96:317–323. doi: 10.1016/S1389-1723(03)90130-7. [DOI] [PubMed] [Google Scholar]

[R20] 20.Lecroq B, Lejzerowicz F, Bachar D, Christen R, Esling P, et al. Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments. Proc Natl Acad Sci U S A. 2011;108:13177–13182. doi: 10.1073/pnas.1018426108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21. Mackelprang R, Waldrop MP, DeAngelis KM, David MM, Chavarria KL, et al. Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature. 2011;480:368–371. doi: 10.1038/nature10576.

[R22] 22. Tillmar AO, Dell'Amico B, Welander J, Holmlund G. A universal method for species identification of mammals utilizing next generation sequencing for the analysis of DNA mixtures. PLoS One. 2013;8:e83761. doi: 10.1371/journal.pone.0083761.

[R23] 23. Schubert M, Ermini L, Der Sarkissian C, Jónsson H, Ginolhac A, et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc. 2014;9:1056–1082. doi: 10.1038/nprot.2014.063.

[R24] 24. Out AA, van Minderhout IJ, Goeman JJ, Ariyurek Y, Ossowski S, et al. Deep sequencing to reveal new variants in pooled DNA samples. Hum Mutat. 2009;30:1703–1712. doi: 10.1002/humu.21122.

[R25] 25. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A. 2008;105:16266–16271. doi: 10.1073/pnas.0808319105.

[R26] 26. Chiu RW, Akolekar R, Zheng YW, Leung TY, Sun H, et al. Noninvasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. BMJ. 2011;342:c7401. doi: 10.1136/bmj.c7401.

[R27] 27. Beck J, Urnovitz HB, Mitchell WM, Schutz E. Next generation sequencing of serum circulating nucleic acids from patients with invasive ductal breast cancer reveals differences to healthy and nonmalignant controls. Mol Cancer Res. 2010;8:335–342. doi: 10.1158/1541-7786.MCR-09-0314.

[R28] 28. Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, et al. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med. 2010;2:20ra14. doi: 10.1126/scitranslmed.3000702.

[R29] 29. Di Mauro E, Saladino R, Trifonov EN. The path to life's origins. Remaining hurdles. J Biomol Struct Dyn. 2014;32:512–522. doi: 10.1080/07391102.2013.783509.

[R30] 30. Frenkel ZM, Trifonov EN. Origin and evolution of genes and genomes. Crucial role of triplet expansions. J Biomol Struct Dyn. 2012;30:201–210. doi: 10.1080/07391102.2012.677771.

[R31] 31. Sobolevsky Y, Guimarães RC, Trifonov EN. Towards functional repertoire of the earliest proteins. J Biomol Struct Dyn. 2013;31:1293–1300. doi: 10.1080/07391102.2012.735623.

[R32] 32. Gerhardt GJ, Takeda AA, Andrighetti T, Sartor IT, Echeverrigaray SL, et al. Triplet entropy analysis of hemagglutinin and neuraminidase sequences measures influenza virus phylodynamics. Gene. 2013;528:277–281. doi: 10.1016/j.gene.2013.06.060.

[R33] 33. Yang Y, Zhang Y, Jia M, Li C, Meng L. Non-degenerate graphical representation of DNA sequences and its applications to phylogenetic analysis. Comb Chem High Throughput Screen. 2013;16:585–589. doi: 10.2174/1386207311316080001.

[R34] 34. Huang T, Zhang J, Xu ZP, Hu LL, Chen L, et al. Deciphering the effects of gene deletion on yeast longevity using network and machine learning approaches. Biochimie. 2012;94:1017–1025. doi: 10.1016/j.biochi.2011.12.024.

[R35] 35. Wu G, Yan S. Prediction of mutations engineered by randomness in H5N1 neuraminidases from influenza A virus. Amino Acids. 2008;34:81–90. doi: 10.1007/s00726-007-0579-z.

[R36] 36. Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, et al. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods. 2009;6:291–295. doi: 10.1038/nmeth.1311.

[R37] 37. Vandenbroucke I, Van Marck H, Verhasselt P, Thys K, Mostmans W, et al. Minor variant detection in amplicons using 454 massive parallel pyrosequencing: experiences and considerations for successful applications. Biotechniques. 2011;51:167–177. doi: 10.2144/000113733.

[R38] 38. Vandenbroucke I, Eygen VV, Rondelez E, Vermeiren H, Baelen KV, et al. Minor Variant Detection at Different Template Concentrations in HIV-1 Phenotypic and Genotypic Tropism Testing. Open Virol J. 2008;2:8–14. doi: 10.2174/1874357900802010008.

[R39] 39. Flaherty P, Natsoulis G, Muralidharan O, Winters M, Buenrostro J, et al. Ultrasensitive detection of rare mutations using next-generation targeted resequencing. Nucleic Acids Res. 2012;40:e2. doi: 10.1093/nar/gkr861.

[R40] 40. Muralidharan O, Natsoulis G, Bell J, Newburger D, Xu H, et al. A cross-sample statistical model for SNP detection in short-read sequencing data. Nucleic Acids Res. 2012;40:e5. doi: 10.1093/nar/gkr851.

[R41] 41. Zagordi O, Klein R, Däumer M, Beerenwinkel N. Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Res. 2010;38:7400–7409. doi: 10.1093/nar/gkq655.

[R42] 42. Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, et al. A large genome center's improvements to the Illumina sequencing system. Nat Methods. 2008;5:1005–1010. doi: 10.1038/nmeth.1270.

[R43] 43. Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, et al. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res. 2010;20:273–280. doi: 10.1101/gr.096388.109.

[R44] 44. Klco JM, Spencer DH, Miller CA, Griffith M, Lamprecht TL, et al. Functional heterogeneity of genetically defined subclones in acute myeloid leukemia. Cancer Cell. 2014;25:379–392. doi: 10.1016/j.ccr.2014.01.031.

[R45] 45. Miner BE, Stöger RJ, Burden AF, Laird CD, Hansen RS. Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR. Nucleic Acids Res. 2004;32:e135. doi: 10.1093/nar/gnh132.

[R46] 46. McCloskey M, Stöger R, Hansen RS, Laird CD. Encoding PCR products with batch-stamps and barcodes. Biochem Genet. 2007;45:761–767. doi: 10.1007/s10528-007-9114-x.

[R47] 47. Casbon JA, Osborne RJ, Brenner S, Lichtenstein CP. A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res. 2011;39:e81. doi: 10.1093/nar/gkr217.

[R48] 48. Jabara CB, Jones CD, Roach J, Anderson JA, Swanstrom R. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc Natl Acad Sci U S A. 2011;108:20166–20171. doi: 10.1073/pnas.1110064108.

[R49] 49. Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M, et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. 2011;9:72–74. doi: 10.1038/nmeth.1778.

[R50] 50. Kinde I, Wu J, Papadopoulos N, Kinzler KW, Vogelstein B. Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci U S A. 2011;108:9530–9535. doi: 10.1073/pnas.1105422108.

[R51] 51. Shiroguchi K, Jia TZ, Sims PA, Xie XS. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes. Proc Natl Acad Sci U S A. 2012;109:1347–1352. doi: 10.1073/pnas.1118018109.

[R52] 52. Fu GK, Hu J, Wang PH, Fodor SP. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc Natl Acad Sci U S A. 2011;108:9026–9031. doi: 10.1073/pnas.1017621108.

[R53] 53. Lou DI, Hussmann JA, McBee RM, Acevedo A, Andino R, et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc Natl Acad Sci U S A. 2013;110:19872–19877. doi: 10.1073/pnas.1319590110.

[R54] 54. Jackson AL, Loeb LA. The mutation rate and cancer. Genetics. 1998;148:1483–1490. doi: 10.1093/genetics/148.4.1483.

[R55] 55. Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328:636–639. doi: 10.1126/science.1186802.

[R56] 56. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, et al. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A. 2012;109:14508–14513. doi: 10.1073/pnas.1208715109.

[R57] 57. Lindahl T, Wood RD. Quality control by DNA repair. Science. 1999;286:1897–1905. doi: 10.1126/science.286.5446.1897.

[R58] 58. Preston BD, Albertson TM, Herr AJ. DNA replication fidelity and cancer. Semin Cancer Biol. 2010;20:281–293. doi: 10.1016/j.semcancer.2010.10.009.

[R59] 59. Shibutani S, Takeshita M, Grollman AP. Insertion of specific bases during DNA synthesis past the oxidation-damaged base 8-oxodG. Nature. 1991;349:431–434. doi: 10.1038/349431a0.

[R60] 60. Stiller M, Green RE, Ronan M, Simons JF, Du L, et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc Natl Acad Sci U S A. 2006;103:13578–13584. doi: 10.1073/pnas.0605327103.

[R61] 61. Kennedy SR, Salk JJ, Schmitt MW, Loeb LA. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLoS Genet. 2013;9:e1003794. doi: 10.1371/journal.pgen.1003794.

[R62] 62. Greenberg RS. Medical epidemiology. 4th edn. Lange Medical Books/McGraw-Hill; New York: 2005.
