Abstract
Next generation sequencing (NGS) encompasses several powerful platforms that have revolutionized RNA and DNA analysis. The parallel sequencing of millions of DNA molecules can provide mechanistic insights into toxicology and open new avenues for biomarker discovery with growing relevance for risk assessment. NGS technologies have improved over the last decade, with increased sensitivity and accuracy fostering new biomarker assays from tissue, blood and other biofluids. NGS technologies can identify transcriptional changes and genomic targets with base pair precision in response to chemical exposure. Further, there are several exciting movements within the toxicology community that incorporate NGS platforms into new strategies for more rapid toxicological characterizations. These include the Tox21 in vitro high throughput transcriptomic screening program, development of organotypic spheroids, alternative animal models, mining archival tissues, liquid biopsy and epigenomics. This review will describe NGS-based technologies, demonstrate how they can be used as tools for target discovery in tissue and blood, and suggest how they might be applied for risk assessment.
Introduction
The Sanger sequencing method was developed in the late 1970s to analyze DNA sequence using 32P-labeled nucleotides, with fragments separated on polyacrylamide gels and detected on autoradiograms [1]. Radionuclides were later replaced by fluorescently labeled nucleotides and capillary electrophoresis, which gave rise to automated sequencing instruments based on Sanger chemistries, such as the Applied Biosystems, Inc. (ABI) model 370A sequencer [2]. Read length was 500–800 nucleotides per sample and sample throughput was limited.
Sanger-based DNA sequencing instruments are considered first-generation platforms. Instruments that perform multiple sequencing reactions simultaneously in a ‘massively parallel’ fashion have been dubbed ‘Next Generation’ or ‘NextGen’ sequencing (NGS) [3]. The emergence of NGS is interlinked with that of microarray technology (Figure 1). Both platforms can provide whole-genome approaches to research problems. Microarrays are a fluorescent probe hybridization-based technology with origins in the mid-1990s and are now a mature genomic platform with a well-established data analysis pipeline. Downsides of microarrays are that a priori genomic knowledge is needed to design probes, that the probes are species-specific, and that the dynamic range for differential expression is limited.
Figure 1.
Timeline for development of microarray and Next Generation Sequencing (NGS) technology platforms. Microarray developments are above the timeline and NGS activities are below. For microarray development, Brown’s laboratory at Stanford was one of the first to develop a multigene expression measurement system using fluorescent detection. The term toxicogenomics (Tgmx) was first defined by Nuwaysir et al. in 1999 [67]. Commercialized platforms such as Affymetrix, Agilent and NimbleGen matured through 2010. For NGS, Massively Parallel Signature Sequencing (MPSS) was developed in 2000 by the Brenner lab at Lynx Therapeutics. 454 Life Sciences developed a massively parallel pyrosequencing method in 2006, followed by a commercial instrument released by Roche. The Solexa short-read platform was acquired by Illumina in 2008 and has undergone continued development and improvement. Commercialization of NGS platforms continues with various speeds of analysis, read lengths, and sequencing capacities. More recent developments include BRB-seq, or ‘Bulk RNA Barcoding and Sequencing’, and TempO-Seq by BioSpyder, a library of bar-coded probes that hybridize to representative gene transcripts as a targeted NGS approach to transcript expression. Advantages and disadvantages are summarized and discussed further in the text.
Development of the second wave of sequencing technologies, termed NextGen sequencing (NGS) technologies, has overlapped with microarray platforms (Figure 1). NGS began in the new millennium, as exemplified by the MPSS (massively parallel signature sequencing) system that came from university research. Improvements in sequencing chemistries, detection and automation over the next decade promoted a rapid development of NGS platforms. From 2010 to 2015, many NGS instruments became commercially available that could produce millions of reads from 100 to 1000 bases in length. A read is a short piece of sequence (e.g. 100 nucleotides) that can be aligned to a transcript; when summed with other aligning reads, it also serves as a quantitative measure of that transcript. These second-generation sequencers include the Roche “454 FLX,” Life Technologies “Ion Torrent,” ABI “SOLiD,” and the Illumina family of sequencers including the HiSeq 2000 series, MiSeq, X-Ten and NovaSeq [4]. Further advances in sequencing technology such as single-molecule real-time sequencing (SMRT) have led to longer read sequencers such as the Pacific Biosciences “PacBio RS II” instrument that produces reads greater than 10,000 bases [3].
A more recent sequencing technology has been advanced by Oxford Nanopore Technologies. Nanopore instruments read bases directly from single DNA or RNA molecules through a biological nanopore, a nanoscale channel that sequences by sensing changes in ionic current as the nucleic acid molecule passes through [5]. The sequencing devices can provide rapid analysis (hours), and some units are portable (the size of a USB flash drive) and can be readily applied in teaching laboratories, medical offices and field work. Read lengths can range from tens to hundreds of kb. A primary advantage of long reads is reduced ambiguity for highly homologous genes, splice variants and repetitive regions of the genome, where alignment is inherently more difficult using short reads. The high sequence resolution of NGS instruments has come at the expense of relatively low sample throughput. This issue has been addressed by creating libraries of targeted probe sets that analyze the complete transcriptome rapidly and at relatively low cost [e.g. TempO-Seq [6] and bulk RNA barcoding and sequencing (BRB-seq), reviewed later]. A brief depiction of NGS applications is shown in Figure 2.
Figure 2.
Many applications for measuring RNA and DNA in toxicogenomics are supported by NGS platforms. Whole genome (WG) or transcriptome analysis, or targeted portions of each, can be measured by NGS. Some abbreviations are: seq – ‘sequencing’; WG – ‘whole genome’; ATAC – ‘assay for transposase-accessible chromatin’; ChIP – ‘chromatin immunoprecipitation’; Ribo – ‘ribosome’; miRNA – ‘microRNA’. RNA analysis platforms are indicated by blue arrows and DNA analysis platforms are shown by red arrows. Further description of these applications is provided in the text.
NGS and Risk Assessment
Traditional risk assessment often involves identifying hazard(s) in a dose-response manner after chemical or test article exposure in animal models, or from human data if available [7]. Test articles can include chemicals and many other agents including pharmaceuticals, drugs, natural products, particles like asbestos, nanoparticles, physical factors like radiation, metals, and many others. The type of toxicity or hazard can be widely defined as macro- or microscopic lesions and pathologies, altered pharmacologic, immunologic, functional and behavioral reactions, changes in biochemistry and physiology, or any measurable response that is considered adverse or outside of normal health. Study of the underlying molecular changes contributing to toxicity has been greatly facilitated by Omics technologies, particularly transcriptomics, while standardization of data analysis and interpretation continues to be refined [8]. There are many in vitro assays and screens (e.g. anticholinesterase activity or bacterial mutagenesis) that support mode of action in the risk assessment process, but new initiatives such as Tox21 aim to develop new assays incorporating NGS platforms for a larger role in risk determination [9].
The dynamic nature of gene expression (transcriptomics) in response to a chemical or test article exposure makes it well suited as part of the hazard identification and dose-setting process for risk assessment [10,11]. There are approximately 15,000 coding genes and probably an equal number of non-coding genes expressed at any one time in a specific cell type. Splice variants also add more complexity to response. Of those expressed genes, only a proportion may change in response to chemical exposure. A robust transcriptomic response may number in the thousands of altered transcripts while a low level of response may differ by a few hundred transcripts or less. The field has been greatly assisted by genomic dose-response analysis software (e.g. BMDExpress) to facilitate use of transcriptomic data in toxicology and risk assessment [12].
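As a rough illustration of what genomic dose-response analysis involves, the sketch below fits a straight line to one transcript's expression across doses and reports the dose at which the fitted response departs from control by one control standard deviation. This is a deliberately simplified, hypothetical example: the linear model, the one-SD benchmark response, the function name `benchmark_dose` and the data are all assumptions made here, and it is not the procedure implemented in BMDExpress, which fits and compares several model families.

```python
from statistics import mean, stdev


def benchmark_dose(doses, responses, control_dose=0.0, bmr_sd=1.0):
    """Illustrative benchmark dose from a least-squares straight line.

    doses, responses: paired measurements for ONE transcript
    (e.g. log2 expression); at least two control replicates are needed.
    bmr_sd: benchmark response, in units of control standard deviations
    (an assumption of this sketch).
    Returns the dose at which the fitted line has moved bmr_sd control SDs
    away from its value at dose 0, or None if there is no dose trend.
    """
    d_bar, r_bar = mean(doses), mean(responses)
    sxx = sum((d - d_bar) ** 2 for d in doses)
    sxy = sum((d - d_bar) * (r - r_bar) for d, r in zip(doses, responses))
    slope = sxy / sxx  # change in response per unit dose
    if slope == 0:
        return None
    controls = [r for d, r in zip(doses, responses) if d == control_dose]
    benchmark_response = bmr_sd * stdev(controls)
    return benchmark_response / abs(slope)


# Hypothetical data: two replicates at each of three doses.
doses = [0, 0, 1, 1, 2, 2]
log2_expr = [1.0, 3.0, 3.0, 5.0, 5.0, 7.0]
print(benchmark_dose(doses, log2_expr))
```

In practice such a calculation would be repeated for every responsive transcript and then summarized at the pathway level, with model choice and goodness-of-fit checks that this sketch omits.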
RNA-seq
RNA-seq is the principal NGS platform for transcriptome analysis [13,14]. Unlike microarrays, transcript sequencing can occur without prior genomic knowledge, although accurate alignment is greatly enhanced by genome data assemblies. RNA-seq performs tens or hundreds of thousands of small-scale DNA sequencing reactions (on cDNA converted from RNA) that produce relatively short sequences (reads) of 100–400 bases in length, which in aggregate represent the transcriptome after alignment to a reference genome. About 95% of a total RNA isolate is ribosomal RNA (rRNA); since rRNA provides little value, it must be removed either by poly(A) enrichment or by rRNA depletion strategies [15]. In most tissues, the transcriptome is primarily composed of mRNA that is translated into protein; however, there is a substantial portion of RNA that is not protein-coding, expressed as non-coding RNA (ncRNA), that can be detected by RNA-seq. Such ncRNA includes microRNA, long non-coding RNA (lncRNA) and other specialized small RNAs (Table 1). mRNA or ncRNA is reverse transcribed into cDNA, and then a library of cDNA fragments is constructed for each sample with short adaptor sequences attached to the fragment ends. RNA libraries can be sequenced from one end only (single-end reads) or from both ends in opposite directions (paired-end reads). Paired-end reads provide a much more accurate alignment but at greater expense.
Table 1.
Transcriptome: Transcript Classification
| Transcript type | Genomic Number | Mature Size | Examples |
|---|---|---|---|
| mRNA - coding | 20 – 25,000 | 500 – 15,000nt | TP53, GAPDH |
| ncRNA – miRNA | 2 – 5,000 | 22nt | miRNA-29, miR-122 |
| ncRNA - lncRNA | >30,000 | >200nt | HOTAIR, PVT1 |
| small ncRNA - regulatory | 1,000 | 20-100nt | tRNA, rRNA, siRNA |
The transcriptome is composed of coding and non-coding transcripts (ncRNA), including microRNAs, ‘miRNA’; long non-coding transcripts, ‘lncRNA’; and regulatory small ncRNAs (e.g. transfer RNA, ribosomal RNA, small interfering RNA). HOTAIR is ‘HOX antisense intergenic RNA’; PVT1 is ‘Plasmacytoma variant translocation 1’.
The number of reads per sequencing lane or sequencing run varies with each NGS platform, and multiple samples for RNA-seq analysis can be mixed in one run and later distinguished by a multiplexing process called “bar coding.” Statistical confidence in differential transcript expression is increased by devoting a requisite number of reads per sample for adequate ‘depth of coverage’ of the transcriptome [14]. Sequence coverage is the number of times each base within the transcriptome is sequenced; so, if each base within the transcriptome was sequenced 10 times on average, the coverage would be 10-fold. Coverage needs for differential expression using RNA-seq vary from 15 to 30-fold depending upon the organism and reference genome. However, detecting rare transcripts or inferring SNPs (single nucleotide polymorphisms) from RNA-seq data may require significantly more reads, from 100 to 200-fold coverage [16]. It should be acknowledged that expense, depth of transcriptome coverage, greater data complexity, and greater demands for computational analysis and infrastructure are all considerations in RNA-seq analysis.
One of the landmark studies comparing RNA-seq with the microarray platform for chemical exposure studies was the Sequencing Quality Control (SEQC) project, which examined thousands of microarrays and RNA-seq analyses to compare differential expression profiling and quality metrics for each platform [17]. In general, the consortium found good agreement between RNA-seq and microarray gene expression measurements, although there was some data variability in low expression genes that could be attributed to differences in expression platforms and data analysis pipelines. A follow-up study by this working group compared transcriptomes from RNA-seq and microarray data on 498 primary neuroblastomas and showed that RNA-seq outperforms microarrays in terms of overall transcript characterization but both platforms show similar results in clinical endpoint prediction [18].
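The coverage arithmetic described above can be sketched in a few lines. The example assumes single-end reads of uniform length and a hypothetical transcriptome size of 50 million bases; both function names and all input values are illustrative assumptions rather than figures from the cited studies.

```python
import math


def mean_coverage(n_reads, read_length, target_size):
    """Average number of times each base in the target is sequenced."""
    return n_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Reads required to reach a desired average fold-coverage."""
    return math.ceil(coverage * target_size / read_length)


# Hypothetical transcriptome of 50 million bases, 100-nt single-end reads.
TRANSCRIPTOME_BASES = 50_000_000

print(mean_coverage(10_000_000, 100, TRANSCRIPTOME_BASES))  # 20.0
print(reads_needed(30, 100, TRANSCRIPTOME_BASES))           # 15000000
print(reads_needed(200, 100, TRANSCRIPTOME_BASES))          # 100000000
```

These are averages only. Because transcripts differ widely in abundance, reads are distributed unevenly across the transcriptome, which is one reason rare transcripts call for much deeper sequencing than the mean coverage suggests.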
The sensitivity and discovery potential of RNA-seq has found applications in biomarker discovery and environmental monitoring that can be relevant for many stages of risk assessment. For example, nine chemical pollutants were screened in undifferentiated mouse embryonic stem cells (mESCs) in a cell-based toxicity assay in which RNA-seq identified novel RNA biomarkers, including ncRNAs, that showed substantial response to in vitro chemical exposure [19]. TBPH (bis-(2-ethylhexyl)-tetrabromophthalate), a widely used commercial chemical, was screened in vitro in a fish embryo system (Atlantic killifish) by RNA-seq, which related transcriptional and pathway changes to developmental endpoints as part of environmental risk assessment [20]. An innovative study profiled liver expression responses from mice in conventional and germ-free conditions exposed to the persistent environmental contaminants, PBDE (polybrominated diphenyl ethers), to determine the effect of presence or absence of gut microbiomes on toxicity and gene expression [21]. These authors found several protein-coding-lncRNA pairs that may serve as specific biomarkers to distinguish various PBDE congeners and shed light on chemical effects and toxicity related to changes in the microbiome. In another RNA-seq study, livers from rats subchronically exposed to aflatoxin B1 in diet showed differential expression of 25 new lncRNAs in response to exposure, which were identified as candidate predictive biomarkers of hepatocellular carcinoma [22]. RNA-seq can also detect microRNAs, as recently described in the Rat MicroRNA Body Atlas [23], including specific biomarkers like miR-122 that are released from hepatocytes into biological fluids after chemically-induced liver injury [24].
Other recent studies have similarly employed the sensitivity and base pair resolution power of RNA-seq for differential expression and pathway changes to suggest modes of action for toxicity with estradiol [25] or pharmaceutical water contaminants [26] in zebrafish, with diesel fractions [27] or the flame retardant contaminant, TBBPA (tetrabromobisphenol A) [28] in zebrafish embryos in vitro.
High Throughput Transcriptomics
Even though RNA-seq can provide a detailed measurement of the transcriptome, it is not a high throughput platform. Tox21 is an interagency program to develop and encourage high throughput in vitro screening and advanced computation methods to better predict toxicological effects of chemical exposure [29]. In the past few years, high throughput transcriptomics or ‘HTT’ has been developed to measure thousands of transcripts for differential expression after chemical exposure, in a highly multiplexed fashion that accommodates thousands of samples to establish concentration-response relationships at the gene and pathway level [6,30]. How was this accomplished? A library of transcript-specific probes can be synthesized that bind to RNA in a hybridization-ligation reaction. Indexing sequences are unique to each transcript and sample. Sample libraries can be mixed for simultaneous analysis on an NGS instrument with a sensitivity of detection in the picogram range for RNA. This sensitivity for RNA transcripts means that transcriptional changes can be measured using thousands or just hundreds of cells. RNA-seq methods have been developed to profile single-cell transcriptomes of known and novel cell types in complex tissues like kidney [31]. The sensitivity of NGS methods has been adapted for toxicant screening in 96 or 384-well plates for high resolution concentration-response assessment of chemical exposures [32]. Several papers have demonstrated its application in toxicity screening. For example, six compounds were screened in differentiated kidney RPTEC or liver HepaRG cells in a time and concentration related manner using a 2800 transcript panel to discriminate compound and cell type-specific responses [33]. Liver spheroids composed of 1,000 to 2,000 cells have been developed that have increased metabolic capacity compared with cells in two-dimensional flat culture. The high sensitivity of HTT can be exploited for rapid, high throughput concentration-response studies with multiple chemicals [32,34].
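To make the pooling and indexing idea concrete, the following minimal sketch assigns pooled reads back to samples and transcripts and tallies a count table. It assumes each read has already been split into a sample index and a probe-derived sequence, and that both match their lookup tables exactly; the sequences, sample names and the function `count_probe_reads` are hypothetical, and production pipelines typically align reads with tolerance for mismatches rather than using exact lookups.

```python
from collections import Counter, defaultdict


def count_probe_reads(reads, sample_index, probe_to_gene):
    """Tally reads per (sample, gene) from a pooled sequencing run.

    reads: iterable of (index_sequence, probe_sequence) pairs
    sample_index: dict mapping index sequence -> sample name
    probe_to_gene: dict mapping probe sequence -> gene symbol
    Reads whose index or probe is not recognized are left unassigned.
    """
    counts = defaultdict(Counter)
    for index_seq, probe_seq in reads:
        sample = sample_index.get(index_seq)
        gene = probe_to_gene.get(probe_seq)
        if sample is None or gene is None:
            continue  # unknown index or probe: skip the read
        counts[sample][gene] += 1
    return counts


# Hypothetical pooled run with two samples and two probes.
sample_index = {"AAAC": "control", "GGGT": "treated"}
probe_to_gene = {"TTGACC": "TP53", "CCATGA": "GAPDH"}
reads = [
    ("AAAC", "TTGACC"),
    ("AAAC", "TTGACC"),
    ("GGGT", "CCATGA"),
    ("AAAC", "NNNNNN"),  # unrecognized probe
    ("TTTT", "TTGACC"),  # unrecognized sample index
]

for sample, gene_counts in count_probe_reads(reads, sample_index, probe_to_gene).items():
    print(sample, dict(gene_counts))
```

The resulting per-sample counts are the raw material for the concentration-response analyses described above, after normalization for sequencing depth.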
While the whole transcriptome can be screened in each HTT analysis, considerable sequencing must be performed to deliver a statistical level of confidence for gene expression, especially for low copy number transcripts. As a result, a strategy of selecting a transcriptomic subset of genes has been developed into a platform called the S1500+, or ‘Sentinel’ 1500 [30]. This platform evolved as a hybrid approach combining 1) the L1000 platform; 2) transcripts selected as the most responsive by a toxicogenomic data-driven method applied to public databases; and 3) expert-contributed genes [30]. The S1500+ platform represents a biological space reflecting diverse pharmacologic and toxicity-related gene expression; it covers all known canonical pathways from the Molecular Signatures Database and can be used to infer changes in the remainder of the transcriptome. A study comparing the S1500+ gene set with RNA-seq and microarray data from rat liver mode of action samples demonstrated that S1500+ results are consistent with findings from genome-wide platforms (e.g. microarray, RNA-seq) for measuring transcriptional responses [35]. Another aspect of the HTT approach for risk assessment can be the comparative screening of specific cell types in vitro (e.g., liver, kidney, heart, neurons) against test articles during the same experiment. For animal testing, it will eventually be possible to monitor transcriptional changes in all tissues of test-article exposed animals to accompany histopathologic and clinical chemistry evaluations.
Archival Transcriptomics
Toxicology studies with archival specimens such as formalin-fixed and paraffin-embedded (FFPE) tissues comprise an invaluable resource for linking histopathologic diagnosis to gene expression profiles. Establishing molecular and pathologic relationships can provide a basis for risk assessment based on linking established molecular pathways with chemical pathologies and disease. Advanced procedures for deparaffinization and enzymatic digestion have been developed to release nucleic acids for extraction and purification from slices of paraffin blocks after years in storage [36].
The non-specific hybridization and higher background from microarray analysis of archival blocked samples have encouraged researchers to use RNA-seq for transcriptomic profiling. Several studies have successfully used NGS transcript profiling technologies on archival samples. One study showed that genomic signatures and gene set analysis of transcripts differentially expressed in response to aflatoxin B1 (AFB1) were highly comparable for matched fresh frozen and FFPE tissues [37]. Subsequent studies have shown a conservation of gene expression patterns in FFPE and frozen tissue samples, especially when ribosomal RNA depletion procedures were used [38]. A comprehensive analysis of archival liver sample sets involving di(2-ethylhexyl)phthalate or dichloroacetic acid, varying from 2 to 20 years in storage, showed remarkably high correlation in dose-response of differentially expressed genes, despite challenges of lower read counts from the older study [39]. In particular, the more recently (2 year) archived FFPE samples were highly similar to frozen sample transcriptional data regarding sequencing quality metrics, differential expression and dose-response relationships [39].
DNA-Sequencing
With their base-pair resolution, NGS platforms are uniquely positioned to detect mutations and single nucleotide polymorphisms (SNPs) related to genomic changes and health hazards posed by chemical exposure. Targeted sequencing of molecular sensors of genotoxic and cellular stress like TP53 [40]; multiple ‘Omic analyses involving genomic, epigenomic and transcriptomic characterization of the mutagen, 1,3-butadiene in mouse strains [41]; and research on genomic interrogation of oxidative DNA damage [42] are representative studies that have applied multiple NGS platforms to more completely describe adverse effects of chemical exposure across the genome. Unlike the dynamic nature of RNA transcription in response to hazardous substance exposures, the stability of DNA in human, rodent and other animal model systems does not as readily lend itself to rapid chemical screening for genomic changes in risk assessment. Genetic toxicology employs a range of screening tests for DNA damage including the Ames assay, the micronucleus test, the Comet assay and several chromosomal aberration and DNA damage and repair assays [43].
There is increasing interest in the role that high throughput screening and NGS platforms might play in creating a new generation of tests for genotoxicity and genetic susceptibility to disease and chemical hazards [44,45]. An important part of NGS approaches is the ability to interrogate the whole genome or the exome, where changes in coding regions of genes could alter translated protein products and damage regulatory processes. In the past, human genetic and epidemiological studies were limited to a candidate gene approach to establish genomic-disease and toxicity relationships. Now, whole genome sequencing can survey single nucleotide polymorphisms, copy number variation and chromosomal aberrations with increasing accuracy. Sequencing studies of humans and experimental species (e.g. mouse) provide publicly available data to better estimate ‘normal’ sequence variation (normal phenotype), which is critical for distinguishing those variants leading to environmental disease. Inherent in this task is discerning germline variants (heritable sequences) from somatic variants acquired in DNA of tissues during life. NGS in forward genetic screens in mice is one experimental approach to help sort out candidate genes and mutations that correspond to specific phenotypes [46]. The dbSNP public database maintained by NCBI is a catalog of single nucleotide variants, small scale deletions or insertions, and short tandem repeats like microsatellites [47,48]. Since 2017, NCBI accepts only human variant data submissions, while EBI’s European Variation Archive continues to accept and host non-human variant data. dbGaP is the NCBI public database that archives genotype-phenotype associations from many sources such as GWAS (genome-wide association study) data, SRA (Sequence Read Archive), molecular diagnostic assays and others.
Whole-Genome Sequencing
The cost of whole genome sequencing has fallen below the one thousand-dollar barrier. In the wake of decreasing costs of NGS platforms, considerable public resources are now being devoted to sequencing thousands of human genomes to benefit personalized medicine and understanding of genetic susceptibility. For example, the ‘All of Us’ project sponsored by NIH aims to fund several centers for complete genomic sequencing of one million or more people to interrelate effects of genetics, environment and lifestyle [49]. As public sequencing projects get underway, the sequencing depth required will vary with how the genomic information is to be used with confidence: distinguishing germline variants and rare variants from somatic variation; use in clinical decisions, surgery and therapeutics; use in genetic counseling; or use in advising patients with risk factors, either known or suspected [50]. Not all variants will carry risk; the risks may yet be undiscovered; variants may be multifactorial in disease risk; or variants may be protective, counteracting the disease potential posed by other variants. Pharmacogenetics and pharmacogenomics are research areas where NGS may inform both clinical and research efforts for therapeutic and chemical exposure risks [51].
Genomic material for sequencing is generally collected from blood or oral swabs, and such procedures offer a non-invasive ease of collection [52] but may have limitations. First, many disease phenotypes occur in tissues or organs far away from the collection site, so critical sequence variants may not be readily observed in samples obtained from blood or pharynx. Second, initiation of early stage disease may begin in a single cell or small cluster of cells, so that detection of sequence variants may require cell enrichment or greater depths of DNA sequencing. Despite these concerns, there is enthusiasm for use of non-invasive or minimally invasive (e.g. blood) DNA or RNA for NGS and genotyping [53,54].
Exome-Sequencing
Since coding regions comprise about 2% of genomes, designing probe sets to capture and sequence only those coding genomic regions provides an efficient way to examine sequence variants in the most consequential regions of the genome where changes may be linked to abnormal phenotypes and disease without sequencing the entire genome. The other advantages to this approach are the greater depth of sequencing possible with each sample (e.g. 50 to 100-fold or more) compared to whole genome sequencing, and the greater number of samples possible for analysis in a sequencing run [55]. Thus far, use of exome sequencing for assessment of risk has been clinically focused in prenatal and reproductive medicine [56] as well as cancer diagnosis and treatment [57]. Use of exome-sequencing in environmental risk assessment lies in the future.
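The trade-off between depth and sample number can be illustrated with simple arithmetic. The sketch below assumes a human genome of roughly 3.1 billion bases, an exome equal to 2% of that, a whole-genome depth of 30-fold, and a hypothetical instrument run yielding one trillion bases; the run yield, the 30-fold figure and the function names are assumptions for illustration, and the calculation ignores off-target reads and uneven capture.

```python
def bases_required(target_size, depth):
    """Sequenced bases needed to cover a target at a given average depth."""
    return target_size * depth


def samples_per_run(run_output_bases, target_size, depth):
    """Whole samples that fit in one run at the requested depth."""
    return run_output_bases // bases_required(target_size, depth)


GENOME_BASES = 3_100_000_000            # approximate human genome size
EXOME_BASES = GENOME_BASES * 2 // 100   # about 2% of the genome
RUN_OUTPUT = 1_000_000_000_000          # hypothetical yield of one run

# Whole genome at 30-fold versus exome at 100-fold.
print(samples_per_run(RUN_OUTPUT, GENOME_BASES, 30))   # 10
print(samples_per_run(RUN_OUTPUT, EXOME_BASES, 100))   # 161
```

Under these assumptions an exome sequenced more than three times as deeply still consumes about one fifteenth of the bases needed for a whole genome, which is the source of both advantages noted above.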
Duplex Sequencing
Duplex sequencing is a highly specific NGS method for detecting rare sequence variants and mutations with frequencies as low as 1 in 10 million [58]. Although NGS platforms have considerable sensitivity, errors introduced during sample preparation and polymerase amplification contribute substantial noise that obscures low frequency mutations. In duplex sequencing, specific adapters and tags uniquely identify reads from each strand of a DNA molecule, so that a variant is accepted only when it is found on both complementary strands of the same original molecule. For example, duplex sequencing was used to determine TP53 mutations in peritoneal fluid samples from women with ovarian carcinomas and control subjects without cancer [59]. Findings showed that nearly all patients with and without cancer (35/37 total) had low frequency TP53 mutations that were more abundant with cancer, clustered in hotspots, and increased with age. Widespread, age-associated somatic TP53 mutations in noncancerous tissue suggest an overall mutational burden even in normal individuals. Such TP53 mutations could also be detected in peripheral blood samples. Another study used duplex sequencing to identify a characteristic mutation spectrum for the liver carcinogen, aflatoxin B1 (AFB1), months before tumors were detectable in a mouse model [60]. The AFB1 spectrum proved clinically useful in accurately identifying a subset of cancers associated with AFB1 exposure from a larger set of human liver tumors. Use of duplex sequencing as a measure of mutational load in sensitive genomic regions due to environmental chemical exposure remains to be explored.
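The way strand-specific tags suppress preparation and amplification errors can be sketched as a two-step consensus. The example below is a simplified illustration, not the published duplex sequencing pipeline: it assumes reads have already been grouped by molecular tag and strand, trimmed to equal length, and oriented to the same reference strand, and the 0.7 agreement threshold and function names are assumptions of this sketch.

```python
from collections import Counter, defaultdict


def strand_consensus(seqs, min_fraction=0.7):
    """Majority base at each position among reads from one strand.

    Positions where no base reaches min_fraction are reported as 'N'.
    All sequences are assumed to have the same length.
    """
    consensus = []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        consensus.append(base if n / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(reads):
    """Combine both strand consensuses for each tagged molecule.

    reads: iterable of (tag, strand, sequence), strand being '+' or '-'.
    A base is kept only if both strands agree; otherwise it becomes 'N'.
    Molecules seen on one strand only are dropped.
    """
    families = defaultdict(lambda: defaultdict(list))
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)

    result = {}
    for tag, by_strand in families.items():
        if "+" not in by_strand or "-" not in by_strand:
            continue
        plus = strand_consensus(by_strand["+"])
        minus = strand_consensus(by_strand["-"])
        result[tag] = "".join(a if a == b else "N" for a, b in zip(plus, minus))
    return result


# Hypothetical reads; the reference sequence here would be ACGT.
reads = [
    ("T1", "+", "ACGT"), ("T1", "+", "ACGT"), ("T1", "+", "ACGT"),
    ("T1", "+", "ACTT"),                       # sporadic amplification error
    ("T1", "-", "ACGT"), ("T1", "-", "ACGT"),
    ("T2", "+", "ACAT"), ("T2", "+", "ACAT"),  # change on one strand only
    ("T2", "-", "ACGT"), ("T2", "-", "ACGT"),
    ("T3", "+", "ACAT"),                       # no partner strand
    ("T4", "+", "ACAT"), ("T4", "-", "ACAT"),  # change on both strands
]
print(duplex_consensus(reads))
```

In this toy data set the sporadic error in T1 is outvoted within its strand, the single-strand change in T2 is masked, and only the change present on both strands of T4 survives as a candidate mutation.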
Liquid Biopsy
Liquid biopsy refers to efforts to detect and monitor disease or toxicity in accessible biofluids, notably in blood, because of the relative ease of sampling that can be done repeatedly over time. Release of cell-specific miRNAs during chemically-induced toxicity can be exploited by NGS for a liquid biopsy approach to assess risk from chemical exposure over time [61]. For example, miRNA-seq analysis of urine from rats sampled after one week of exposure to the renal toxicant, gentamicin, found 227 unique miRNAs of which 146 were differentially expressed, with 9 being novel miRNAs not found on a primer-designed qPCR platform [62]. In addition to circulating miRNA, NGS analysis of plasma DNA used for tumor diagnosis in oncology might be similarly applied in toxicology. Circulating, cell free DNA (ccfDNA) is composed of short fragments of extracellular DNA (~180bp) that normally circulate in blood at low levels (e.g. 1-5 ng/ml) in healthy individuals. ccfDNA is derived from leukocytes and from apoptosis and cell turnover in all tissues. However, in many tumors the amount of ccfDNA is increased (10-100 ng/ml) and a portion may harbor diagnostic cancer mutations [63]. Use of ccfDNA has gained attention in the diagnosis, staging and biomarker discovery for many types of tumors [64], and also many other diseases [65] (e.g. autoimmune and infectious diseases), as a novel, minimally-invasive form of ‘liquid biopsy’ and an attractive alternative to needle biopsy. Another exciting development in this field is the epigenetic analysis of ccfDNA for cancer diagnostics and determining tissue of origin along with somatic mutations [66]. To date, ccfDNA has been little studied in environmental health sciences. However, for those exposures that leave a somatic mutation or epigenetic pattern, NGS analysis of ccfDNA at the exome, whole genomic or epigenomic level could provide new data on the amounts and types of environmental exposures over time in experimental and epidemiological settings for improved risk assessments.
Summary
Transitioning from the clinic to environmental carcinogenesis research, the impact of biomarker research is shifting from therapeutics and companion diagnostic development towards risk assessment and early detection of chemical exposure and disease-related changes. Biomarkers evaluating risk assessment directly relate to environmental regulation and can be used to help define amounts and type of environmental chemical exposures, biomarkers of effect, and biomarkers of susceptibility, all of which reflect the interactions between the environment and the population. Thus, NGS data can contribute to an understanding of the environmental and/or genetic factors that could lead to potential adverse health effects and have a very positive impact on the future of chemical risk assessment.
NextGen sequencing (NGS) technologies have rapidly developed and surpassed matured microarray platforms.
NGS is well suited to capture transcriptome responsiveness for hazard identification and benchmark dose setting.
Archival transcriptomics of paraffinized tissues by RNA-seq benefit from phenotypic anchoring.
High throughput transcriptomics by NGS generates rapid chemical concentration-response curves.
Duplex sequencing, exome-seq and ccfDNA may provide new NGS-based measures for risk assessment.
Acknowledgements
This work was supported by the Divisions of the National Toxicology Program at the National Institute of Environmental Health Sciences.
Footnotes
Conflicts of Interest
The author declares that there are no conflicts of interest regarding publication of this work.
Disclaimer
This article is the work product of employees of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH); however, the statements, opinions or conclusions contained therein do not necessarily represent the statements, opinions or conclusions of NIEHS, NIH or the United States government.
References
Papers of particular interest, published within the period of review have been highlighted as:
• of special interest
•• of outstanding interest
- 1. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977, 74:5463–5467.
- 2. Heather JM, Chain B: The sequence of sequencers: The history of sequencing DNA. Genomics 2016, 107:1–8.
- 3. Goodwin S, McPherson JD, McCombie WR: Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 2016, 17:333–351. • A current overview of NGS technologies and applications, including a comparison against microarrays and subgenomic platforms like NanoString, qPCR and optical mapping. Different NGS instrument and sequencing platforms are described, ranging from DNA-seq, RNA-seq, ATAC-seq and others.
- 4. Mutz KO, Heilkenbrinker A, Lonne M, Walter JG, Stahl F: Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol 2013, 24:22–30.
- 5. Feng Y, Zhang Y, Ying C, Wang D, Du C: Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics 2015, 13:4–16.
- 6. Yeakley JM, Shepard PJ, Goyena DE, VanSteenhouse HC, McComb JD, Seligmann BE: A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling. PLoS One 2017, 12:e0178302.
- 7. McCarty LS, Borgert CJ, Posthuma L: The regulatory challenge of chemicals in the environment: Toxicity testing, risk assessment, and decision-making models. Regul Toxicol Pharmacol 2018, 99:289–295.
- 8. Sauer UG, Deferme L, Gribaldo L, Hackermuller J, Tralau T, van Ravenzwaay B, Yauk C, Poole A, Tong W, Gant TW: The challenge of the application of 'omics technologies in chemicals risk assessment: Background and outlook. Regul Toxicol Pharmacol 2017, 91 Suppl 1:S14–S26.
- 9. Tice RR, Austin CP, Kavlock RJ, Bucher JR: Improving the human hazard characterization of chemicals: a Tox21 update. Environ Health Perspect 2013, 121:756–765. • A review of the Tox21 interagency program detailing the goals and accomplishments for Phase I and Phase II. High throughput screening assays in 1536 well format are described that form 15 concentration-response curves for a 10K library (10,000 environmental, pesticide and pharmaceutical compounds) in a variety of nuclear receptor and stress response reporter assay systems.
- 10. Bourdon-Lacombe JA, Moffat ID, Deveau M, Husain M, Auerbach S, Krewski D, Thomas RS, Bushel PR, Williams A, Yauk CL: Technical guide for applications of gene expression profiling in human health risk assessment of environmental chemicals. Regul Toxicol Pharmacol 2015, 72:292–309.
- 11. Wilson VS, Keshava N, Hester S, Segal D, Chiu W, Thompson CM, Euling SY: Utilizing toxicogenomic data to understand chemical mechanism of action in risk assessment. Toxicol Appl Pharmacol 2013, 271:299–308.
- 12. Phillips JR, Svoboda DL, Tandon A, Patel S, Sedykh A, Mav D, Kuo B, Yauk CL, Yang L, Thomas RS, et al.: BMDExpress 2: Enhanced transcriptomic dose-response analysis workflow. Bioinformatics 2018.
- 13. Chen G, Shi T, Shi L: Characterizing and annotating the genome using RNA-seq data. Sci China Life Sci 2017, 60:116–125.
- 14. Hrdlickova R, Toloue M, Tian B: RNA-Seq methods for transcriptome analysis. Wiley Interdiscip Rev RNA 2017, 8.
- 15. Zhao S, Zhang Y, Gamini R, Zhang B, von Schack D: Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion. Sci Rep 2018, 8:4781.
- 16. Piskol R, Ramaswami G, Li JB: Reliable identification of genomic variants from RNA-seq data. Am J Hum Genet 2013, 93:641–651.
- 17. Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, Xu J, Fang H, Hong H, Shen J, Su Z, et al.: The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol 2014, 32:926–932. •• Large consortium study comparing the RNA-seq and microarray platforms with 27 chemicals representing multiple modes of action (MOA). Cross-platform concordance was very high, but differentially expressed genes and pathways were affected by transcript abundance and biological complexity of the MOA. The bioinformatic methods and data visualization are very informative for analyzing large datasets and platform comparison.
- 18. Zhang W, Yu Y, Hertwig F, Thierry-Mieg J, Zhang W, Thierry-Mieg D, Wang J, Furlanello C, Devanarayan V, Cheng J, et al.: Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biol 2015, 16:133.
- 19. Tani H, Takeshita JI, Aoki H, Nakamura K, Abe R, Toyoda A, Endo Y, Miyamoto S, Gamo M, Sato H, et al.: Identification of RNA biomarkers for chemical safety screening in mouse embryonic stem cells using RNA deep sequencing analysis. PLoS One 2017, 12:e0182032.
- 20. Huang W, Bencic DC, Flick RL, Nacci DE, Clark BW, Burkhard L, Lahren T, Biales AD: Characterization of the Fundulus heteroclitus embryo transcriptional response and development of a gene expression-based fingerprint of exposure for the alternative flame retardant, TBPH (bis (2-ethylhexyl)-tetrabromophthalate). Environ Pollut 2019, 247:696–705.
- 21. Li CY, Cui JY: Regulation of protein-coding gene and long noncoding RNA pairs in liver of conventional and germ-free mice following oral PBDE exposure. PLoS One 2018, 13:e0201387.
- 22. Merrick BA, Chang JS, Phadke DP, Bostrom MA, Shah RR, Wang X, Gordon O, Wright GM: HAfTs are novel lncRNA transcripts from aflatoxin exposure. PLoS One 2018, 13:e0190992.
- 23. Smith A, Calley J, Mathur S, Qian HR, Wu H, Farmen M, Caiment F, Bushel PR, Li J, Fisher C, et al.: The Rat microRNA body atlas; Evaluation of the microRNA content of rat organs through deep sequencing and characterization of pancreas enriched miRNAs as biomarkers of pancreatic toxicity in the rat and dog. BMC Genomics 2016, 17:694.
- 24. Kullak-Ublick GA, Andrade RJ, Merz M, End P, Benesic A, Gerbes AL, Aithal GP: Drug-induced liver injury: recent advances in diagnosis and risk assessment. Gut 2017, 66:1154–1164.
- 25. Zheng Y, Yuan J, Meng S, Chen J, Gu Z: Testicular transcriptome alterations in zebrafish (Danio rerio) exposure to 17beta-estradiol. Chemosphere 2019, 218:14–25.
- 26. Wu M, Liu S, Hu L, Qu H, Pan C, Lei P, Shen Y, Yang M: Global transcriptomic analysis of zebrafish in response to embryonic exposure to three antidepressants, amitriptyline, fluoxetine and mianserin. Aquat Toxicol 2017, 192:274–283.
- 27. Mu X, Liu J, Yang K, Huang Y, Li X, Yang W, Qi S, Tu W, Shen G, Li Y: 0# Diesel water-accommodated fraction induced lipid homeostasis alteration in zebrafish embryos. Environ Pollut 2018, 242:952–961.
- 28. Chen J, Tanguay RL, Xiao Y, Haggard DE, Ge X, Jia Y, Zheng Y, Dong Q, Huang C, Lin K: TBBPA exposure during a sensitive developmental window produces neurobehavioral changes in larval zebrafish. Environ Pollut 2016, 216:53–63.
- 29. Merrick BA, Paules RS, Tice RR: Intersection of toxicogenomics and high throughput screening in the Tox21 program: an NIEHS perspective. Int J Biotechnol 2015, 14:7–27.
- 30. Mav D, Shah RR, Howard BE, Auerbach SS, Bushel PR, Collins JB, Gerhold DL, Judson RS, Karmaus AL, Maull EA, et al.: A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics. PLoS One 2018, 13:e0191105.
- 31. Park J, Shrestha R, Qiu C, Kondo A, Huang S, Werth M, Li M, Barasch J, Susztak K: Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 2018, 360:758–763.
- 32. Ramaiahgari SC, Waidyanatha S, Dixon D, DeVito MJ, Paules RS, Ferguson SS: Three-Dimensional (3D) HepaRG Spheroid Model With Physiologically Relevant Xenobiotic Metabolism Competence and Hepatocyte Functionality for Liver Toxicity Screening. Toxicol Sci 2017, 160:189–190. •• A human liver spheroid model that demonstrates enhanced xenobiotic metabolism (CYP1A2, CYP2B6 and CYP3A4/5) and functional capabilities. HepaRG liver spheroids can be screened in 384 well plates and show liver enzyme inducibility with activators of the hepatic receptors AhR, CAR and PXR. The 3D spheroids have the longevity in culture needed for repeated exposures, and the lab-to-lab and year-to-year repeatability needed for toxicology screening. Generated data could rapidly provide concentration-response data for risk assessment.
- 33. Limonciel A, Ates G, Carta G, Wilmes A, Watzele M, Shepard PJ, VanSteenhouse HC, Seligmann B, Yeakley JM, van de Water B, et al.: Comparison of base-line and chemical-induced transcriptomic responses in HepaRG and RPTEC/TERT1 cells using TempO-Seq. Arch Toxicol 2018, 92:2517–2531.
- 34. Ramaiahgari SC, den Braver MW, Herpers B, Terpstra V, Commandeur JN, van de Water B, Price LS: A 3D in vitro model of differentiated HepG2 cell spheroids with improved liver-like properties for repeated dose high-throughput toxicity studies. Arch Toxicol 2014, 88:1083–1095.
- 35. Bushel PR, Paules RS, Auerbach SS: A Comparison of the TempO-Seq S1500+ Platform to RNA-Seq and Microarray Using Rat Liver Mode of Action Samples. Front Genet 2018, 9:485.
- 36. Wehmas LC, Wood CE, Gagne R, Williams A, Yauk C, Gosink MM, Dalmas D, Hao R, O'Lone R, Hester S: Demodifying RNA for Transcriptomic Analyses of Archival Formalin-Fixed Paraffin-Embedded Samples. Toxicol Sci 2018, 162:535–547.
- 37. Auerbach SS, Phadke DP, Mav D, Holmgren S, Gao Y, Xie B, Shin JH, Shah RR, Merrick BA, Tice RR: RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights. J Appl Toxicol 2015, 35:766–780.
- 38. Webster AF, Zumbo P, Fostel J, Gandara J, Hester SD, Recio L, Williams A, Wood CE, Yauk CL, Mason CE: Mining the Archives: A Cross-Platform Analysis of Gene Expression Profiles in Archival Formalin-Fixed Paraffin-Embedded Tissues. Toxicol Sci 2015, 148:460–472.
- 39. Hester SD, Bhat V, Chorley BN, Carswell G, Jones W, Wehmas LC, Wood CE: Editor's Highlight: Dose-Response Analysis of RNA-Seq Profiles in Archival Formalin-Fixed Paraffin-Embedded Samples. Toxicol Sci 2016, 154:202–213. • Demonstration of dose-response from RNA-seq transcript profiles obtained after RNA extraction from paired frozen and FFPE samples from two archival studies in mice at 2 years and 20 years of storage. Extraction procedures, RNA-seq methods and data analysis are exemplary for use of archival materials that connect histopathology and NGS-based transcriptomics.
- 40. Zerdoumi Y, Kasper E, Soubigou F, Adriouch S, Bougeard G, Frebourg T, Flaman JM: A new genotoxicity assay based on p53 target gene induction. Mutat Res Genet Toxicol Environ Mutagen 2015, 789-790:28–35.
- 41. Israel JW, Chappell GA, Simon JM, Pott S, Safi A, Lewis L, Cotney P, Boulos HS, Bodnar W, Lieb JD, et al.: Tissue- and strain-specific effects of a genotoxic carcinogen 1,3-butadiene on chromatin and transcription. Mamm Genome 2018, 29:153–167.
- 42. Poetsch AR, Boulton SJ, Luscombe NM: Genomic landscape of oxidative DNA damage and repair reveals regioselective protection from mutagenesis. Genome Biol 2018, 19:215.
- 43. Cimino MC: Comparative overview of current international strategies and guidelines for genetic toxicology testing for regulatory purposes. Environ Mol Mutagen 2006, 47:362–390.
- 44. Maslov AY, Quispe-Tintaya W, Gorbacheva T, White RR, Vijg J: High-throughput sequencing in mutation detection: A new generation of genotoxicity tests? Mutat Res 2015, 776:136–143.
- 45. Gomy I, Diz Mdel P: Hereditary cancer risk assessment: insights and perspectives for the Next-Generation Sequencing era. Genet Mol Biol 2016, 39:184–188.
- 46. Schneeberger K: Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nat Rev Genet 2014, 15:662–676.
- 47. Deng JE, Sham PC, Li MX: SNPTracker: A Swift Tool for Comprehensive Tracking and Unifying dbSNP rs IDs and Genomic Coordinates of Massive Sequence Variants. G3 (Bethesda) 2015, 6:205–207.
- 48. Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z: tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018, 34:80–87.
- 49. Scherr CL, Aufox S, Ross AA, Ramesh S, Wicklund CA, Smith M: What People Want to Know About Their Genes: A Critical Review of the Literature on Large-Scale Genome Sequencing Studies. Healthcare (Basel) 2018, 6.
- 50. Chen HZ, Bonneville R, Roychowdhury S: Implementing precision cancer medicine in the genomic era. Semin Cancer Biol 2018.
- 51. Schwarz UI, Gulilat M, Kim RB: The Role of Next-Generation Sequencing in Pharmacogenetics and Pharmacogenomics. Cold Spring Harb Perspect Med 2019, 9.
- 52. Woo JG, Martin LJ, Ding L, Brown WM, Howard TD, Langefeld CD, Moomaw CJ, Haverbusch M, Sun G, Indugula SR, et al.: Quantitative criteria for improving performance of buccal DNA for high-throughput genetic analysis. BMC Genet 2012, 13:75.
- 53. Bellairs JA, Hasina R, Agrawal N: Tumor DNA: an emerging biomarker in head and neck cancer. Cancer Metastasis Rev 2017, 36:515–523.
- 54. Yin Y, Lan J, Zhang Q: Application of High-Throughput Next-Generation Sequencing for HLA Typing on Buccal Extracted DNA. Methods Mol Biol 2018, 1802:101–113.
- 55. Warr A, Robert C, Hume D, Archibald A, Deeb N, Watson M: Exome Sequencing: Current and Future Perspectives. G3 (Bethesda) 2015, 5:1543–1550.
- 56. Normand EA, Alaimo JT, Van den Veyver IB: Exome and genome sequencing in reproductive medicine. Fertil Steril 2018, 109:213–220.
- 57. Kamps R, Brandao RD, Bosch BJ, Paulussen AD, Xanthoulea S, Blok MJ, Romano A: Next-Generation Sequencing in Oncology: Genetic Diagnosis, Risk Prediction and Cancer Classification. Int J Mol Sci 2017, 18.
- 58. Fox EJ, Reid-Bayliss KS, Emond MJ, Loeb LA: Accuracy of Next Generation Sequencing Platforms. Next Gener Seq Appl 2014, 1.
- 59. Krimmel JD, Schmitt MW, Harrell MI, Agnew KJ, Kennedy SR, Emond MJ, Loeb LA, Swisher EM, Risques RA: Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues. Proc Natl Acad Sci U S A 2016, 113:6005–6010. • Use of novel NGS methods for ultradeep sequencing of TP53 mutations applied to ovarian cancer. The implications are that similar queries on environmentally important genes could be useful for determining risk in environmentally contaminated regions or lifestyles. Mutational load on critical genes may be determinants of risk and environmental disease.
- 60. Fedeles BI, Chawanthayatham S, Croy RG, Wogan GN, Essigmann JM: Early detection of the aflatoxin B1 mutational fingerprint: A diagnostic tool for liver cancer. Mol Cell Oncol 2017, 4:e1329693.
- 61. Harrill AH, McCullough SD, Wood CE, Kahle JJ, Chorley BN: MicroRNA Biomarkers of Toxicity in Biological Matrices. Toxicol Sci 2016, 152:264–272. • Biofluid-based miRNA studies reviewed as a “liquid biopsy” by sampling extracellular fluids like blood for chemically-induced toxicity in liver, kidney, heart and pancreas. Description of unique factors in miRNA analysis including biogenesis and baseline circulating expression levels, normalization, spike-in oligonucleotides and potential interference by erythrocyte lysis.
- 62. Nassirpour R, Mathur S, Gosink MM, Li Y, Shoieb AM, Wood J, O'Neil SP, Homer BL, Whiteley LO: Identification of tubular injury microRNA biomarkers in urine: comparison of next-generation sequencing and qPCR-based profiling platforms. BMC Genomics 2014, 15:485.
- 63. Parsons HA, Beaver JA, Park BH: Circulating Plasma Tumor DNA. Adv Exp Med Biol 2016, 882:259–276.
- 64. Petit J, Carroll G, Gould T, Pockney P, Dun M, Scott RJ: Cell-free DNA as a Diagnostic Blood-Based Biomarker for Colorectal Cancer: A Systematic Review. J Surg Res 2018, 236:184–197.
- 65. Ghosh RK, Pandey T, Dey P: Liquid biopsy: A new avenue in pathology. Cytopathology 2018.
- 66. Gai W, Sun K: Epigenetic Biomarkers in Cell-Free DNA and Applications in Liquid Biopsy. Genes (Basel) 2019, 10.
- 67. Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA: Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog 1999, 24:153–159.


