ABSTRACT
The demand for accurate, fast, and inexpensive sequencing of deoxyribonucleic acid (DNA) is increasing and is driving the emergence of next-generation sequencing (NGS) technologies. NGS can provide useful insights that help researchers and clinicians develop appropriate treatment options, and it has wide applications in novel fields of biology and medicine. These technologies help to decode the mysteries of life, to improve the quality of crops, to detect pathogens, and to improve the quality of life. Various NGS methods can sequence thousands to millions of molecules in parallel. NGS can identify and characterize microbial species more comprehensively than culture-based methods. Recently, the NGS approach has been used for oral microbial analysis.
KEYWORDS: DNA, NGS, Oral microbial analysis
HISTORY AND EVOLUTION OF NGS
Watson and Crick[1] discovered the double-helical structure of DNA. Gene sequencing can be used to identify and decode the genetic composition of living organisms, which in turn helps in understanding the pathophysiology of various genetic diseases and their treatment.[2] In the recent past, genomic technologies and the understanding of them have grown tremendously. Determining DNA sequences has become necessary in genomic studies across almost all branches of the life sciences.[3] In the late 1970s, two methods for sequencing DNA were put forth: the chain-termination method by Frederick Sanger and the chemical cleavage method by Maxam and Gilbert.[3,4,5] In 2005, 454 Life Sciences (later acquired by Roche) launched the faster “454” technology, termed “next-generation sequencing (NGS) technology” or “high-throughput (HT) sequencing technology”. NGS can sequence millions to billions of nucleotides in a single run, so large DNA molecules can be sequenced in a very short time.[2] This article briefly reviews NGS technologies.
GENERATIONS AND SEQUENCING PLATFORMS
The First Generation of Sequencing
The sequencing techniques put forth by Sanger and by Maxam and Gilbert are called first-generation sequencing technologies.[6] Sanger’s technique is also known as chain-terminator sequencing because dideoxyribonucleotide triphosphates (ddNTPs) are incorporated as terminators alongside deoxyribonucleotide triphosphates (dNTPs), producing DNA fragments of varying lengths.[2,7] The other first-generation method is Maxam–Gilbert sequencing, also known as the chemical degradation method. In this method, the DNA is cleaved at specific nucleotides with chemicals, and it is most effective with small nucleotide polymers.[2] Later sequencing platforms were built by improving on Sanger’s method, so Sanger is considered the father of sequencing technologies.[8]
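The chain-termination idea described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not a model of a real instrument: the function names are hypothetical, strand direction is ignored, every possible termination is assumed to occur, and fragments are assumed to be resolved perfectly by length.

```python
# Toy illustration of chain-terminator (Sanger) sequencing.
# Assumptions (not from the article): perfect termination at every position,
# perfect size separation, and strand direction ignored for simplicity.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragment_lengths(template):
    """Return {ddNTP base: lengths of fragments that end with that base}."""
    # The newly synthesized strand is the complement of the template.
    synthesized = "".join(COMPLEMENT[base] for base in template)
    lengths = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(synthesized, start=1):
        # A fragment of this length exists in the reaction containing
        # the dideoxy form of `base`.
        lengths[base].append(position)
    return lengths


def read_sequence(lengths):
    """Reconstruct the synthesized strand by ordering fragments by length."""
    base_at_length = {}
    for base, positions in lengths.items():
        for position in positions:
            base_at_length[position] = base
    return "".join(base_at_length[p] for p in sorted(base_at_length))


fragments = terminated_fragment_lengths("ATGC")
print(fragments)
print(read_sequence(fragments))
```

Sorting the fragments by length and noting which terminator ends each one recovers the synthesized strand, which is the complement of the template.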
The Second Generation of Sequencing
To overcome the limitations of first-generation sequencing, second-generation sequencers emerged around 2005.[2] They are also known as next-generation sequencers (NGS). Isolation of DNA and creation of single-stranded DNA libraries are the common features of second-generation DNA sequencing technologies. Various techniques are used to create single-stranded DNA libraries by fragmenting the sample DNA. Each commercial platform has its own key features and unique adaptor chemistry for amplifying DNA fragments. These modified DNA libraries are then amplified by polymerase chain reaction (PCR) methods performed either on beads or on a glass slide. The amplified single strands are then converted to double-stranded DNA by incorporating nucleotides complementary to the template bases (adenine, thymine, guanine, and cytosine; A, T, G, and C) in individual flow cycles. The sequencing instrument detects the signal released by each complementary match to the DNA template strand.[5] NGS is based on either short-read or long-read sequencing.[9] The short-read approaches are classified as “sequencing by ligation (SBL)” and “sequencing by synthesis (SBS)”.[2] Both SBL and SBS create huge numbers of reaction centers, each with a clonal DNA template generated during amplification, facilitating massive parallelization that allows millions of DNA molecules to be sequenced simultaneously.[10]
Roche/454 sequencing
The first commercial NGS system, named “454 Roche”, was launched by Roche/454 Life Sciences in 2005.[11] It relies on pyrosequencing,[8] which uses the SBS principle. Pyrosequencing is a real-time, bioluminometric sequencing technique that employs a cascade of four enzymatic reactions to produce sequence peak signals.[12]
Illumina/Solexa Genome Analyzer (GA)
The concept of sequencing a single DNA molecule bound to microspheres was conceived by two British scientists, Shankar Balasubramanian and David Klenerman, in 1997. In 1998, they founded Solexa, but because they were unable to sequence a single DNA molecule, sequencing was instead attempted using clonally amplified templates.[11] The first “short-read” sequencing analyzer, the Solexa Genome Analyzer, was launched commercially, and Solexa was acquired by Illumina. This technology uses the SBS approach.[13,14] The Illumina/Solexa platform demands precisely controlled sample loading; overloading leads to overlapping clusters and compromises sequencing quality.[2] Illumina technology and platforms are continuously evolving. In 2009, 75PE, 100PE, TruSeq V3, 150PE, and the GA IIx were introduced, with improvements to the polymerase, buffer, flow cell, software, and data output. In 2010, the HiSeq 2000 was launched with a substantial increase in read length, to paired-end reads of almost 150 bp.[8] In 2011, the “MiSeq” was launched. It is faster: sequencing 150PE and generating 1.5 Gb per run takes only about 10 h.[15]
The Applied Biosystems (ABI) Sequencing by Oligonucleotide Ligation and Detection (SOLiD) system
In 2007, Applied Biosystems (ABI) used the SBL approach to develop its new ABI/SOLiD sequencing technology.[16] Shendure and Ji, in their review, credited McKernan of Agencourt Personal Genomics with the development of the ABI/SOLiD sequencing platform.[17] With this platform, more than 50 million bead clusters can be sequenced in parallel with very high throughput, on the order of gigabases per run.[18] Its drawbacks are short reads and long run times, while its advantage is higher accuracy, since the platform reads each base twice.[2]
Compact Personal Genome Machine (PGM) sequencers
In 2010, Ion Torrent launched the Ion Personal Genome Machine (PGM), which is based on semiconductor sequencing technology. In this technology, the sequence is determined by detecting changes in pH (potential of hydrogen) during the reaction. When the polymerase incorporates a nucleotide into the growing DNA strand, a proton is released, changing the pH. This change is detected by a complementary metal-oxide-semiconductor (CMOS) sensor integrated in the PGM.[19] A modified Ion Torrent sequencer allows more samples to be run in parallel and can give a high throughput of almost 10 Gb in only 2 to 6 h. Read lengths of 200, 400, and 600 base pairs (bp) can be obtained.[2] One important advantage of this sequencer is its usefulness for identifying pathogens.[15]
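The flow-based detection described above can be sketched as a small simulation. This is an idealized illustration, not the instrument's actual signal processing: the function names and the repeating flow order "TACG" are assumptions, and each flow's signal is assumed to be exactly proportional to the number of nucleotides incorporated (so a homopolymer run gives one larger signal rather than several separate ones).

```python
# Idealized sketch of flow-based (semiconductor) sequencing signals.
# Assumptions (not from the article): a repeating "TACG" flow order and
# noise-free signals equal to the number of bases incorporated per flow.


def flow_signals(synthesized_strand, flow_order="TACG", max_flows=400):
    """Return a list of (flowed base, incorporation count) per flow."""
    signals = []
    position = 0
    flow = 0
    while position < len(synthesized_strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        # All consecutive matching bases are incorporated in a single flow,
        # releasing one proton each.
        while (position < len(synthesized_strand)
               and synthesized_strand[position] == base):
            count += 1
            position += 1
        signals.append((base, count))
        flow += 1
    return signals


def call_bases(signals):
    """Convert per-flow incorporation counts back into a base sequence."""
    return "".join(base * count for base, count in signals)


signals = flow_signals("TTAG")
print(signals)
print(call_bases(signals))
```

A flow with a count of zero corresponds to a nucleotide that did not match the next template position, so no pH change is recorded for that flow.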
The Third Generation of Sequencing
NGS technologies have some limitations: (i) the short reads of second-generation sequencing (SGS) cannot resolve many genomic features, such as long repetitive regions or copy number variations, and (ii) the short reads make genome assembly difficult. SGS technologies also require a PCR amplification step, which lengthens the procedure and increases cost and time.[8,9] To overcome these limitations, “third-generation sequencing” (TGS) techniques were developed. TGS sequencers are cost-effective and faster because the PCR amplification step is omitted. TGS can produce long reads exceeding several kilobases, which resolves problems with assembly and repetitive regions in complex genomes.[2,15] A further advantage of TGS is that the signal is captured in real time, that is, it is monitored during the enzymatic addition of each nucleotide to the complementary strand.[15] Two approaches are used in TGS: the synthetic long-read approach and the single-molecule real-time (SMRT) sequencing approach. Platforms like PacBio and the MinION by Oxford Nanopore work on the single-molecule real-time principle. The major limitation of TGS platforms is the high error rate of long-read sequencing.[2] However, this error rate can be reduced by increasing sequencing coverage through multiple passes, for example, to 30 passes (i.e., 30X coverage).[10]
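The relationship between coverage and error reduction mentioned above can be made concrete with a short sketch. It uses the standard estimate of mean coverage, C = N × L / G (number of reads times read length, divided by genome size), and a simple per-position majority vote as a stand-in for consensus calling. The function names are hypothetical, and the majority vote assumes reads that are already aligned and of equal length, which real long-read consensus tools do not require.

```python
# Sketch: mean sequencing coverage and a naive majority-vote consensus.
# Assumptions (not from the article): reads are pre-aligned, equal in length,
# and errors are substitutions only.
import math
from collections import Counter


def mean_coverage(num_reads, read_length, genome_size):
    """Mean depth of coverage, C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def majority_consensus(aligned_reads):
    """Call each position as the most frequent base across all passes."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


print(mean_coverage(1_000_000, 150, 5_000_000))
print(reads_needed(30, 10_000, 5_000_000))
print(majority_consensus(["ACGT", "ACGA", "ACGT"]))
```

With more passes over the same molecule or region, a random error in one pass is outvoted by the correct base in the others, which is why the consensus error rate falls as coverage increases.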
NGS WORKFLOW
In principle, NGS is similar to Sanger sequencing, which relies on capillary electrophoresis, but it is carried out in a massively parallel fashion. Although each NGS platform has developed its own workflow protocol, some basic steps are common to all platforms: (i) template preparation, (ii) sequencing, (iii) imaging, and (iv) data analysis.[20]
Template Preparation
Template preparation is the first step on all NGS platforms. The starting material is double-stranded DNA, whose source may vary: it may be genomic DNA, immunoprecipitated DNA, reverse-transcribed ribonucleic acid (RNA), or complementary DNA.[4] Template preparation is done in two steps: (i) library preparation and (ii) library amplification.[20]
Library preparation
In genomic sequencing, a library is a collection of DNA fragments with adaptors attached to both ends; the size of the fragment between the adaptors is known as the insert size.[12] Library preparation simply means making the DNA fragments ready for sequencing. Libraries for RNA and DNA sequencing can be prepared using approaches based on ligation, transposase, or tagging.[21]
Library amplification
In the next step, the prepared DNA libraries are clonally amplified in preparation for sequencing. Amplification ensures that a sufficiently strong signal is generated when a new nucleotide is added during sequencing. In the PCR-based amplification procedure, the DNA fragments are attached to either microbeads or the glass slide of the flow cell, according to the protocol of the sequencing platform. Amplification of the library is followed by the sequencing reactions and then detection by imaging. The Ion Torrent PGM amplifies DNA fragments on microbeads by emulsion PCR using the OneTouch system, while the Illumina sequencer uses bridge amplification to form template clusters on a flow cell.[3,20]
Sequencing and Imaging
Once amplification is over, sequencing is carried out by sequentially flooding the DNA fragments with known nucleotides and washing them away. The nucleotides incorporated in the sequencing reaction are recorded digitally as a sequence. The Ion Torrent PGM detects the hydrogen ions released when nucleotides are added to the growing DNA strand; that is, semiconductor sequencing detects the resulting change in pH. In the Illumina MiSeq sequencer, the fluorescence generated on addition of each fluorescently labeled nucleotide is detected to identify the sequence.[20]
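The fluorescence-based readout described above can be pictured as choosing, in each sequencing cycle, the base whose fluorescent channel is brightest. The sketch below is a deliberate simplification under stated assumptions: the function name and the intensity values are hypothetical, and real base callers also correct for effects such as channel crosstalk and signal decay, which are ignored here.

```python
# Simplified sketch of base calling from per-cycle fluorescence intensities.
# Assumptions (not from the article): one intensity value per base per cycle
# for a single cluster, with no crosstalk or phasing correction.


def call_bases_from_intensities(cycles):
    """Return the called sequence, one base per cycle (brightest channel)."""
    calls = []
    for intensities in cycles:
        # Pick the base with the highest recorded intensity in this cycle.
        base = max(intensities, key=intensities.get)
        calls.append(base)
    return "".join(calls)


# Hypothetical intensities for one cluster over three cycles.
example_cycles = [
    {"A": 0.90, "C": 0.10, "G": 0.05, "T": 0.02},
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.10},
    {"A": 0.05, "C": 0.07, "G": 0.10, "T": 0.95},
]
print(call_bases_from_intensities(example_cycles))
```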
Data Analysis
Analysis of the raw data generated by sequencing involves several steps and is done using bioinformatics pipelines.[22] The analysis begins with preprocessing, which removes adaptor sequences and low-quality reads; the reads are then mapped to a reference genome or realigned to compile sequences, and finally the compiled sequences are analyzed.[9] On any NGS platform, several terabytes of raw data are generated. For further analysis, the raw image data are converted to the FASTQ file format, which stores each read sequence together with its per-base quality scores. If more than one sample is processed, an additional demultiplexing step is performed. Various packages are used for preprocessing, for example, the Consensus Assessment of Sequence and Variation (CASAVA) and Bioscope packages. Various software tools, like Mapping and Assembly with Quality (MAQ), the BLAT-like Fast Accurate Search Tool (BFAST), and the Novoalign short-read aligner (Novoalign), are available for read alignment, that is, for mapping the sequence reads to the reference genome sequence.[23] A number of online tools are available to analyze sequence data, for example, Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST), Integrated Microbial Genomes/Metagenomes (IMG/M), and virus metagenome (VIROME).[22] Apart from these online tools, a few offline tools are also available, like the National Center for Biotechnology Information (NCBI) taxonomy database, for analyzing metagenomic reads against reference datasets.
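The preprocessing step described above (removing adaptor sequences and low-quality reads from FASTQ data) can be sketched in a few lines. This is a minimal illustration, not a replacement for the packages named in the text: the function names, the adaptor string, and the thresholds are hypothetical, records are assumed to span exactly four lines, quality characters are assumed to use the common Phred+33 encoding, and the adaptor is matched exactly without tolerating mismatches.

```python
# Minimal FASTQ preprocessing sketch: adaptor trimming and quality filtering.
# Assumptions (not from the article): 4-line records, Phred+33 qualities,
# exact adaptor matching, and illustrative thresholds.
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) for each FASTQ record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (ASCII code minus offset)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def trim_adapter(sequence, quality, adapter):
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    index = sequence.find(adapter)
    if index == -1:
        return sequence, quality
    return sequence[:index], quality[:index]


def preprocess(handle, adapter, min_mean_q=20, min_length=1):
    """Yield trimmed reads that pass the length and mean-quality filters."""
    for read_id, sequence, quality in parse_fastq(handle):
        sequence, quality = trim_adapter(sequence, quality, adapter)
        if len(sequence) >= min_length and mean_phred(quality) >= min_mean_q:
            yield read_id, sequence, quality


# Hypothetical two-read FASTQ file held in memory.
example = io.StringIO(
    "@r1\nACGTTTAGATCGGAAG\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nACGTACGT\n+\n!!!!!!!!\n"
)
for record in preprocess(example, adapter="AGATCGGAAG"):
    print(record)
```

In this example the first read is kept after its adaptor is trimmed, while the second read is discarded because its mean quality falls below the threshold. The surviving reads would then be passed to an aligner for mapping to the reference genome.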
APPLICATIONS OF NGS
The advanced techniques of NGS have proved useful worldwide in many aspects of medicine and biochemical research.[3] Genomic research plays an important role in understanding the genetic components of organisms in greater detail, and it has many applications in food science,[9] agriculture, medicine, drug development,[15] and animal genomics.[19]
Whole Genome Sequencing (WGS)
NGS can sequence the entire genome with high throughput, and the genetic cause of a disease can thus be determined. Whole-genome sequencing (WGS) is the most widely used application of NGS. It gives complete information on the genome, including genetic variations, the drug response of a gene set, and many other simple and complex biological processes. Since WGS allows population-level sequencing, it is becoming an essential tool in human genome megaprojects that aim to understand human diseases at the genetic level.[3,10]
Targeted Sequencing
Targeted sequencing involves the sequencing of specific genes. It is useful when a specific disease is diagnosed. Compared with WGS, targeted sequencing is more affordable and requires less time. Another advantage is that it yields higher coverage of the genomic regions of interest.[20]
Chromatin immunoprecipitation sequencing (ChIP-seq)
ChIP-seq identifies the sites on DNA where proteins bind. ChIP-seq can be used (i) to analyze protein interactions with DNA and RNA, (ii) to analyze regulatory events during regular biological processes such as DNA repair, gene regulation, and DNA synthesis, (iii) to classify various genetic diseases, and (iv) to analyze epigenetic modifications.[3]
RNA sequencing (RNA-Seq)
NGS technologies can sequence transcribed RNAs (the transcriptome). Because RNA-Seq provides information only on transcribed sequences, it can precisely identify functionally relevant mutations. It has been found useful in identifying somatic mutations involved in the development of cancers.[3,4]
NGS and Microbiome
Microbial “uncultivability” is an important challenge in microbiology. It arises because the majority of microbial species in the biosphere resist cultivation. Even the human microbiome exhibits “uncultivability.” However, with ever-advancing NGS technologies, it has become easier to identify the majority of organisms present in the environment and in the human microbiome, along with the characteristics, functions, and interactions that sustain them in a balanced ecological niche. NGS can help determine the role of microorganisms in various human diseases such as diabetes, obesity, and inflammatory bowel diseases.[24]
NGS and oral microbiome
Lederberg and McCray referred to all the microbial species inhabiting the human oral cavity, along with their genomes, as the human oral microbiome.[25] The oral microbiome forms complex multispecies bacterial communities, known as biofilms, which are known to cause dental caries and periodontal disease.[26] Traditional cultivation techniques and molecular methods cannot identify all the microorganisms in these complex biofilms.[27] NGS techniques can be used effectively to overcome the limitations of these conventional techniques.[28]
Endodontic infections are being studied intensively to decipher the polymicrobial communities involved in their pathogenesis. The microbiome associated with endodontic infections is not well characterized because conventional methods are limited in their ability to identify complex microbiota. Hence, researchers are turning to NGS technologies for deep sequencing of the endodontic microbiome.[29] NGS is also increasingly used in periodontal microbiological studies, where it is preferred for understanding microbial communities and microbial dynamics in periodontal diseases.[28]
CHALLENGES AND LIMITATIONS OF NGS
Though applications of NGS are ever-expanding, the technology is not without challenges, and each advance brings a new set of challenges that can limit its widespread use.[30] These challenges are (i) the time required to process and analyze the data,[3] (ii) management of storage space for the huge volume of data generated,[31] (iii) short read lengths, (iv) handling of the software, (v) accurate interpretation and reproducibility of results, (vi) privacy and security of protected health information and the complexity of the user experience,[32] and (vii) sequencing errors and amplification biases, which may give false-positive and false-negative results.[33]
CONCLUSION
Different NGS platforms are available. The accuracy of data assembly and the quality of the data generated largely depend on the number of sequence reads and the read lengths obtained from the particular NGS platform. Therefore, the challenge for researchers is to improve data analysis methods so that the huge amount of data generated can be managed efficiently. To make the best use of NGS data, designing state-of-the-art bioinformatics pipelines that extract meaningful biological insights will be an important topic in the coming years.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
REFERENCES
- 1. Watson JD, Crick FH. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature. 1953;171:737–8. doi: 10.1038/171737a0.
- 2. Kchouk M, Gibrat JF, Elloumi M. Generations of sequencing technologies: From first to next generation. Biol Med (Aligarh) 2017;9:395.
- 3. Raza K, Ahmed S. Recent advancements in next generation sequencing techniques and its computational analysis. Int J Bioinform Res Appl. 2019;15:191–220.
- 4. Rizzo JM, Buck MJ. Key principles and clinical applications of “Next-generation” DNA sequencing. Cancer Prev Res (Phila) 2012;5:887–900. doi: 10.1158/1940-6207.CAPR-11-0432.
- 5. Gullapalli RR, Desai KV, Santos LS, Kant JA, Becich MJ. Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics. J Pathol Inform. 2012;3:40. doi: 10.4103/2153-3539.103013.
- 6. Thudi M, Li Y, Jackson SA, May GD, Varshney RK. Current state of-art of sequencing technologies for plant genomics research. Brief Funct Genomics. 2012;11:3–11. doi: 10.1093/bfgp/elr045.
- 7. Harrington CT, Lin EI, Olson MT, Eshleman JR. Fundamentals of pyrosequencing. Arch Pathol Lab Med. 2013;137:1296–303. doi: 10.5858/arpa.2012-0463-RA.
- 8. Pillai S, Gopalan V, Lam AK-Y. Review of sequencing platforms and their applications in phaeochromocytoma and paragangliomas. Crit Rev Oncol Hemat. 2017;116:58–67. doi: 10.1016/j.critrevonc.2017.05.005.
- 9. Wiedmann M, Carroll LM. Next-generation sequencing. Encyclo. Food Chem. 2018:376.
- 10. Goodwin S, McPherson JD, McCombie WR. Coming of age: Ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17:333–51. doi: 10.1038/nrg.2016.49.
- 11. Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: From basic research to diagnostics. Clin Chem. 2009;55:641–58. doi: 10.1373/clinchem.2008.112789.
- 12. Gharizadeh B, Michael A, Nader N, Ghaderi M, Kenji Y, Nyrén P, et al. Methodological improvements of pyrosequencing technology. J Biotechnol. 2006;124:504–11. doi: 10.1016/j.jbiotec.2006.01.025.
- 13. Yohe S, Thyagarajan B. Review of clinical next-generation sequencing. Arch Pathol Lab Med. 2017;141:1544–57. doi: 10.5858/arpa.2016-0501-RA.
- 14. Mardis ER. Next-generation sequencing platforms. Annu Rev Anal Chem. 2013;6:287–303. doi: 10.1146/annurev-anchem-062012-092628.
- 15. Lin L, Li Y, Li S, Hu N, He Y, Pong R, et al. Comparison of next-generation sequencing systems. J Biomed Biotech. 2012;2012:251364. doi: 10.1155/2012/251364.
- 16. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31–46. doi: 10.1038/nrg2626.
- 17. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotech. 2008;26:135–45. doi: 10.1038/nbt1486.
- 18. Ansorge WJ. Next-generation DNA sequencing techniques. New Biotechnol. 2009;25:195–203. doi: 10.1016/j.nbt.2008.12.009.
- 19. Pareek CS, Smoczynski R, Tretyn A. Sequencing technologies and genome sequencing. J Appl Genet. 2011;52:413–35. doi: 10.1007/s13353-011-0057-x.
- 20. Grada A, Weinbrecht K. Next-generation sequencing: Methodology and application. J Invest Dermatol. 2013;133:e11. doi: 10.1038/jid.2013.248.
- 21. Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, et al. A tale of three next generation sequencing platforms: Comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012;13:341. doi: 10.1186/1471-2164-13-341.
- 22. Dudhagara P, Bhavsar S, Bhagat C, Ghelani A, Bhatt S, Patel R. Web resources for metagenomics studies. Genomics Proteomics Bioinformatics. 2015;13:296–303. doi: 10.1016/j.gpb.2015.10.003.
- 23. Lee CY, Chiu YC, Wang LB, Kuo YL, Chuang EY, Lai LC, et al. Common applications of next generation sequencing technologies in genomic research. Transl Cancer Res. 2013;2:33–45.
- 24. Clooney AG, Fouhy F, Sleator RD, Driscoll AO, Stanton C, Cotter PD, et al. Comparing apples and oranges?: Next generation sequencing and its impact on microbiome analysis. PLoS One. 2016;11:e0148028. doi: 10.1371/journal.pone.0148028.
- 25. Chen H, Jiang W. Application of high throughput sequencing in understanding human oral microbiome related with health and disease. Front Microbiol. 2014;5:1–6. doi: 10.3389/fmicb.2014.00508.
- 26. Do T, Devine D, Marsh PD. Oral biofilms: Molecular analysis, challenges, and future prospects in dental diagnostics. Clin Cosmet Investig Dent. 2013;5:11–9. doi: 10.2147/CCIDE.S31005.
- 27. Zaura E. Next-generation sequencing approaches to understanding the oral microbiome. Adv Dent Res. 2012;22:81–5. doi: 10.1177/0022034512449466.
- 28. Keijser BJ, Zaura E, Huse SM, Van Der Vossen JM, Schuren FH, Montijn RC, et al. Pyrosequencing analysis of the oral microflora of healthy adults. J Dent Res. 2008;87:1016–20. doi: 10.1177/154405910808701104.
- 29. Shin JM, Luo T, Lee KH, Guerreiro D, Botero TM, McDonald NJ, et al. Deciphering endodontic microbial communities by next-generation sequencing. J Endod. 2018;44:1080–7. doi: 10.1016/j.joen.2018.04.003.
- 30. Li Y, He J, He Z, Zhou Y, Yuan M, Xu X, et al. Phylogenetic and functional gene structure shifts of the oral microbiomes in periodontitis patients. ISME J. 2014;8:1879–91. doi: 10.1038/ismej.2014.28.
- 31. Schatz MC, Langmead B. The DNA data deluge: Fast, efficient genome sequencing machines are spewing out more data than geneticists can analyze. IEEE Spectr. 2013;50:29–33. doi: 10.1109/MSPEC.2013.6545119.
- 32. Turak JD, Courtney SM, Hazard ES, Glen WB, Silveira W, Wesselman T, et al. Genomics pipelines and data integration: Challenges and opportunities in the research setting. Expert Rev Mol Diagn. 2017;17:225–37. doi: 10.1080/14737159.2017.1282822.
- 33. Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet. 2010;19:227–40. doi: 10.1093/hmg/ddq416.