Abstract
Frequent exposure to chemicals in the environment, diet, and endogenous electrophiles leads to chemical modification of DNA and the formation of DNA adducts. Some DNA adducts can induce mutations during cell division and, when occurring in critical regions of the genome, can lead to the onset of disease, including cancer. The targeted analysis of DNA adducts over the past 30 years has revealed that the human genome contains many types of DNA damage. However, a long-standing limitation in conducting DNA adduct measurements has been the inability to screen for the total complement of DNA adducts derived from a wide range of chemicals in a single assay. With the advancement of high-resolution mass spectrometry (MS) instrumentation and new scanning technologies, nontargeted “omics” approaches employing data-dependent acquisition and data-independent acquisition methods have been established to simultaneously screen for multiple DNA adducts, a technique known as DNA adductomics. However, notable challenges in data processing must be overcome for DNA adductomics to become a mature technology. DNA adducts occur at low abundance in humans, and current software does not reliably detect them when using common MS data acquisition methods. In this perspective, we discuss contemporary computational tools developed for feature finding of MS data widely utilized in the disciplines of proteomics and metabolomics and highlight their limitations for conducting nontargeted DNA-adduct biomarker discovery. Improvements to existing MS data processing software and new algorithms for adduct detection are needed to develop DNA adductomics into a powerful tool for the nontargeted identification of potential cancer-causing agents.
INTRODUCTION
Hazardous chemicals from environmental pollution, the diet, lifestyle factors, some pharmaceuticals, and endogenous electrophiles contribute to DNA damage of the genome.1,2 These combinations of chemical exposures, termed the “exposome”, result in a wide range of DNA damage, altered genomic function, and disease risk, including cancer.3 Specifically, the measurement of DNA adducts provides data on exposure to hazardous chemicals, genotoxicants, and endogenous electrophiles which covalently modify DNA. Thus, DNA adducts can serve as biomarkers for interspecies comparisons of the biologically effective dose of procarcinogens and permit extrapolation of toxicity data from animal studies for human risk assessment.4–6
Highly sensitive analytical techniques are required to measure DNA adducts, which occur at extremely low levels in humans, generally in the range of ~1 adduct per 10^8 to 1 per 10^10 nucleotides (nts).7,8 The 32P-postlabeling assay, with a sensitivity of ~1 per 10^10 nts, remains a commonly used method to screen for DNA adducts in humans.9,10 The method has revealed that DNA damage can be extensive and lifestyle factors impact adduct levels in humans.11 However, 32P-postlabeling is not quantitative and does not provide structural information to confirm the identity of the adduct. More recently, liquid chromatography-electrospray ionization-tandem mass spectrometry (LC-ESI-MS/MS), with capabilities for structural identification, has supplanted 32P-postlabeling to become the dominant platform for the detection and identification of many types of DNA adducts.12 The protonated modified 2′-deoxyribonucleoside (2′-dN) ions, which are typically measured during DNA adduct LC-MS analysis, contain a weak glycosidic bond which readily breaks upon collision-induced dissociation (CID) (Figure 1).13 This nearly universal neutral loss fragmentation process is typically used to create scanning methods for both targeted MS quantitation and nontargeted MS screening of DNA adducts. LC-ESI-MS/MS monitoring of known precursor-to-fragment mass transitions, such as in multiple reaction monitoring (MRM) scanning mode in triple-quadrupole (QqQ) MS and parallel reaction monitoring (PRM) mode in high-resolution hybrid MS, is used for targeted, quantitative measurements of DNA adducts. These methods require prior knowledge of the adduct identity and therefore only a limited number of DNA adducts are measured at a time.
Over the past two decades, QqQ MS has been the most widely used MS instrument for DNA adduct measurements because of its high sensitivity and selectivity, wide dynamic range, fast duty cycle, and robustness in operation.14 However, QqQ instruments acquire nominal mass resolution spectral data and, thus, are prone to isobaric interferences when analyzing human samples in which DNA adducts are present at very low levels.15 Other MS scanning techniques have evolved to overcome the issues of selectivity for the detection of DNA adducts through the use of ion trap or Q-trap instruments with multistage (MSn) scanning, the high-resolution accurate MS instruments such as the Q-TOF and hybrid Orbitrap MS instruments,1,15–18 or a recent method using normalized retention times (iRT) to schedule detection of 36 DNA adducts using a hybrid Orbitrap mass spectrometer.19
Figure 1.
The DNA adduct fragmentation scheme of a modified 2′-deoxynucleoside. The CID of a DNA-adduct produces the characteristic neutral loss of the dR moiety. dG-C8-PhIP is used as an example.
DNA adductomic analysis is a broad term for the screening of biological samples, using a variety of mass spectrometry (MS) methodologies for the profiling and discovery of DNA adducts.2 We believe that data-dependent acquisition (DDA) and data-independent acquisition (DIA) scanning techniques, traditionally used for nontargeted detection and/or discovery of compounds in proteomics and metabolomics analyses,18,20,21 offer the most promise for the trace-level comprehensive screening of DNA adducts.1,2,15,16,22 The nontargeted discovery of DNA adducts will be greatly aided by the development of software tools designed for the relevant data types produced using these methodologies, especially for high-throughput trace-level DNA adduct screening.
As an emerging technology, MS-based DNA adductomics has the demonstrated ability to identify novel DNA adducts. Totsuka and co-workers identified N2-(3,4,5,6-tetrahydro-2H-pyran-2-yl) deoxyguanosine, a DNA adduct formed from N-nitrosopiperidine, a potential human carcinogen involved in the etiology of esophageal cancer.23 Balskus and colleagues identified a DNA adduct of colibactin, a genotoxin produced by certain strains of Escherichia coli, which may contribute to human colorectal cancer.24 Recent studies using MS have discovered putative DNA adducts important to environmental exposures through the use of an in vitro genotoxicity DNA and RNA damage assay.25,26 DNA adductomics approaches also may serve to assess chemotherapeutic efficacy of drugs used in cancer treatment.26 Recently, we utilized a new DIA approach for the sensitive and comprehensive detection of DNA adducts18,27 and identified DNA adducts formed from the cooked meat carcinogen 2-amino-1-methyl-6-phenylimidazo-[4,5-b]pyridine (PhIP) in human prostate,28 and the tobacco carcinogen 4-aminobiphenyl (4-ABP) in human bladder tissue.18,27 These findings show the great promise of nontargeted MS-based DNA adductomics in DNA adduct discovery.
Despite these advances in the use of MS for DNA adductomics, a major remaining bottleneck in efficient MS-based DNA adduct discovery is a lack of software designed to process data generated by DDA and DIA DNA adduct screening assays. In this perspective, we highlight our DDA and DIA DNA adductomic methodologies, the challenges in developing bioinformatics tools for automated DIA data analyses, and solutions for overcoming these issues.
DATA-DEPENDENT (DDA) DNA ADDUCT SCREENING
The employment of the linear ion trap (LIT) to conduct both quantitative measurements and acquire high-quality product ion spectra at the MS3 scan stage was an important technical advance in the biomonitoring of DNA adducts at trace levels in humans.17,29 Thereafter, we demonstrated that multiple classes of DNA adducts could be screened by DDA-constant neutral loss-MS3 (DDA-CNL-MS3), where the neutral loss of the dR moiety (116 ± 0.5 Da) in the MS2 scan mode triggered the acquisition of MS3 product ion spectra of the aglycone ion ([B + H2]+). DDA-CNL-MS3 scanning identified multiple DNA adducts in human hepatocytes treated with the tobacco carcinogen 4-ABP and DNA adducts in the liver of rats treated with the cooked-meat carcinogen 2-amino-3,8-dimethylimidazo[4,5-f]quinoxaline (MeIQx).16 The MS3 scan provided rich spectral data of the adducted aglycone for corroboration of the identity of the DNA adduct in this nontargeted scanning mode. The CNL-MS3 scanning technique with the LIT is a major improvement over CNL experiments conducted by QqQ, where the only information attained is that an ion of a given mass underwent the neutral loss of 116 Da in tandem MS.15,22 DDA-CNL-MS3 screening with the LIT also successfully detected DNA adducts in human cancer cell lines treated with the anticancer drugs acylfulvene and illudin S.30
The performance and selectivity of DDA-CNL-MS3 scanning were dramatically improved by the use of a high-resolution Q-LIT-hybrid Orbitrap MS,1 where the loss of dR (116.0473 Da ± 5 ppm) triggered the MS3 scan event. The accurate mass scanning for the neutral loss of dR minimized the number of false positives observed with the low-resolution LIT. Multiple DNA adducts of the tobacco-specific nitroso carcinogen 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) were identified, and the MS3 spectra collected confirmed the structures of the DNA adducts. This scanning approach was also successfully used to screen for DNA adducts of chemotherapeutics by monitoring alkylated and cross-linked adducts.31
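The difference between the two trigger criteria can be illustrated with a short sketch. It is not the instrument's firmware logic: the function names are hypothetical, ions are assumed to be singly charged, and the 5 ppm tolerance is assumed to apply to the expected fragment m/z (the text does not specify the reference mass).

```python
DR_LOSS_DA = 116.0473  # accurate mass of the dR neutral loss, from the text


def triggers_nominal(precursor_mz, fragment_mz, tol_da=0.5):
    """Low-resolution criterion: neutral loss of 116 +/- 0.5 Da."""
    observed_loss = precursor_mz - fragment_mz
    return abs(observed_loss - 116.0) <= tol_da


def triggers_accurate(precursor_mz, fragment_mz, tol_ppm=5.0):
    """High-resolution criterion: fragment within tol_ppm of precursor - dR.

    Assumption: the ppm tolerance is evaluated on the expected fragment m/z.
    """
    expected_fragment = precursor_mz - DR_LOSS_DA
    error_ppm = abs(fragment_mz - expected_fragment) / expected_fragment * 1e6
    return error_ppm <= tol_ppm


precursor = 490.1946               # [M + H]+ of dG-C8-PhIP (Figure 5)
true_aglycone = precursor - DR_LOSS_DA
interference = 373.95              # hypothetical fragment, loss of ~116.24 Da

for fragment in (true_aglycone, interference):
    print(round(fragment, 4),
          triggers_nominal(precursor, fragment),
          triggers_accurate(precursor, fragment))
```

In this example the hypothetical interfering fragment satisfies the ± 0.5 Da criterion but fails the 5 ppm criterion, which is the kind of false positive that accurate-mass scanning removes.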
DDA is a dynamic screening approach where the selection of precursor ions is usually determined by the ion abundance in the full MS scan. The performance of DDA is strongly influenced by the scanning speed of the MS, where the sampling rate limits the number of ions which undergo MS2 fragmentation. This issue is mitigated to some extent through the use of a dynamic exclusion function which restricts the reoccurrence of fragmentation events of the same precursor ions. However, DDA-CNL-MS3 is still biased toward sampling more abundant ions, and DNA adducts typically present at low levels may escape detection.18 Additionally, the DNA adducts must be enriched from the large excess of nonmodified DNA bases and other interferences in the matrix to maximize the sensitivity of this scanning technique.32
DATA-INDEPENDENT (DIA) DNA ADDUCT SCREENING
The concept of our DIA scanning strategy, termed wide-SIM/MS2, was adapted from the sequential window acquisition of all theoretical mass spectra (SWATH) method developed by Aebersold and co-workers for the analysis of proteomic samples.33,34 SWATH analysis, broadly speaking, does not rely on the ion abundance, and data-mining can be performed retrospectively to probe for any analyte without recollecting data. We improved upon the SWATH method by incorporating segmented MS1 survey scans and their corresponding MS2 scans to screen for DNA adducts of environmental genotoxicants and endogenous electrophiles and have found wide-SIM/MS2 to be more comprehensive than the DDA approach in detecting bulky DNA adducts.18
A general overview of the different MS scan methods used for proteomics and metabolomics versus our novel method developed for detection of DNA adducts is shown in Figure 2. The evidence that an analyte is a potential DNA adduct in wide-SIM/MS2 relies on the co-elution of the SIM and MS2 extracted ion current (EIC) chromatograms of the precursor ion and the corresponding aglycone fragment ion, respectively. The proof of identity of the DNA adduct is then confirmed by a subsequent analysis using targeted MS3 scanning.18 We have successfully used the wide-SIM/MS2 technology to detect several DNA adducts in human samples which were formed by known carcinogens or endogenous electrophiles. These included the bladder carcinogen 4-ABP in human bladder tissues,27 the potent urothelial carcinogen aristolochic acid-I present in Chinese herbal medicines in human kidney,18 and a DNA adduct of the cooked-meat carcinogen PhIP in human prostate that may contribute to prostate cancer.35,36 In addition, we detected multiple DNA adducts of lipid peroxidation products in human tissues. These DNA adducts of known carcinogens and endogenous electrophiles were detected by manual extraction of their precursor and aglycone masses from the wide-SIM/MS2 data. This is a powerful approach to retrospectively probe for putative DNA adducts without recollecting data. However, our goal is to process DNA adductomics data in an automated manner to discover novel or “unexpected” adducts. Thus, novel algorithms and bioinformatic workflows for wide-SIM/MS2 data must be created or existing proteomics/metabolomics software must be adapted for this purpose. It is essential for the software to identify both the precursor and aglycone ions of DNA adducts and their isotopic profiles, which are often found near the limit of detection.
Figure 2.
Different MS scanning technologies in “omics” disciplines. MS scanning techniques differ for DDA, DIA, and targeted scanning techniques for peptides, small molecules, and DNA adducts. Each graphic represents a single duty cycle. (A) Proteomics and metabolomics techniques utilize DDA and DIA technologies with a full MS1 scan prior to MS2 for structural elucidation by data-dependent fragmentation or data-independent “all ion fragmentation” in the selected m/z parent fragmentation range. (B) DDA-CNL-MS3 first fragments the more abundant ions from the MS1 scan to monitor for the loss of dR in MS2, which subsequently triggers the MS3 scan for structural identification. (C) Wide-SIM/MS2 uses a DIA approach that first detects adduct precursor ions in the 10 segmented MS1 (wide-SIM) scans, each of which covers a 30 m/z window such that all scans combined encompass a 300 m/z range. All ions detected in a single wide-SIM scan are fragmented in the following MS2 scan, in which the adducted aglycone ions can be extracted. There is no targeting of molecules during the initial screen; rather, specific precursor ions are targeted for structural identification in a subsequent experiment using MSn.
WORKFLOWS FOR PROCESSING MS DATA
The DIA wide-SIM/MS2 methodology with high-resolution Orbitrap MS has the potential to detect many DNA adducts through nontargeted data analysis. Our goal is to develop a software program that identifies precursor and aglycone ions in an automated fashion. However, the low ion abundances of the signals of DNA adducts pose challenges in automated processing of DNA adductomic data when using existing software. For example, in wide-SIM/MS2, the precursor ([M + H]+) and aglycone ([B + H2]+) ion pairs are often near the limits of detection of the MS signals which creates complications during a step called “feature finding”. Feature finding detects the monoisotopic (precursor: [M0 + H]+, aglycone: [B0 + H2]+), first isotopic ([M1 + H]+ and [B1 + H2]+), and second isotopic ([M2 + H]+ and [B2 + H2]+) peaks. The challenges of feature finding are depicted in the expanded full scan and product ion spectra of the DNA adducts N-(2′-deoxyguanosin-8-yl)-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (dG-C8-PhIP) and N-(2′-deoxyguanosin-8-yl)-4-aminobiphenyl (dG-C8–4-ABP) (Figure 3). The precursors [M0 + H]+ and aglycone [B0 + H2]+ of the dG-C8-PhIP and dG-C8–4-ABP adducts are observed in the spectra. Because of the low signal intensity, the assignment of the first isotope [M1 + H]+ (predominated by the 13C isotope) of the precursor ion can be inaccurate due to isobaric interferences, and the second isotope [M2 + H]+ is very often below the level of detection for the feature finding software. For these adducts, the [B0 + H2]+ and [B1 + H2]+ aglycone ions are much more easily observed than for the precursor. The co-elution of the [M0 + H]+ and aglycone [B0 + H2]+ ions together with their first isotopic peaks serve as critical evidence for the presence of adducts. These ions can be extracted manually to confirm their presence in the data; however, this is an inefficient process, and an improved automated detection of these multiple peaks at trace levels is needed. 
The computational methods of feature finding and their contribution toward identification of DNA adducts are further discussed below.
Figure 3.
MS scans and extracted ions of dG-C8-PhIP in human prostate DNA (1.1 adducts per 10^7 nts) and dG-C8–4-ABP in human bladder DNA (1.0 adducts per 10^7 nts). The signal intensities for the precursor ion and its isotope peaks occur at low abundances and can be obscured by isobaric interferences. (A) Mass spectrum of dG-C8-PhIP precursor ion ([M + H]+) in wide-SIM and (B) aglycone ion ([B + H2]+) in MS2 scans. The monoisotopic ([M0 + H]+, [B0 + H2]+) and first isotopic peaks ([M1 + H]+, [B1 + H2]+) are highlighted in red. (C) EIC for the monoisotopic and (D) first isotopic peaks of dG-C8-PhIP from the wide-SIM/MS2 data (precursor ions ([M + H]+) are depicted in red, and aglycone ions ([B + H2]+) are shown in blue). (E–H) Mass spectra and EIC of mono- and first isotopic peaks of dG-C8–4-ABP.
WORKFLOWS FOR PROCESSING MS DATA
Data analysis workflows for MS screening of analytes typically require several common steps, including mass centroiding, feature finding (including identification of isotope peaks), identification of co-eluting clusters (e.g., [M + Na]+), alignment of features between samples, compound identification, and statistical evaluation for differences between experiment groups. Feature finding is a data reduction and grouping technique where centroiding converts the original data to a manageable computational size, and grouping assigns various peaks (isotopes, chromatographically linked peaks) to a single “feature” with an m/z value and retention time. Features can exist in multiple forms (e.g., different charge state, [M + Na]+), and multiple chromatographically co-eluting features are often collapsed into “compounds” that contribute to the overall detected abundance of a molecule.
Feature finding in nontargeted metabolomics analysis is often followed by compound identification using targeted MSn analysis. The criteria for compound identification follow accepted routines defined by the Metabolomics Standards Initiative (MSI), ordered by decreasing levels of confidence.37 These minimum reporting standards were recently discussed with a view to improving the criteria, given the varied compliance with the reporting standards in published data, but the current criteria remain unchanged.38,39 The criteria established for identification of analytes are stratified into four tiers of confidence. The highest level of confidence is compound identification, which is performed by comparison of the unknown analyte to a sample spiked with a synthetic compound or an internal standard (IS). The second level of confidence is based on the product ion spectrum acquired by tandem MS, where characteristic fragment ions serve to assign the chemical structure. The third level of confidence employs a Lewis formula within an isotope envelope, and the fourth level of confidence relies on comparison of the m/z of the unknown compound to compounds of the same m/z. Spectral libraries are used to identify compounds at the second level, and databases are used for the third and fourth levels. The usage of different compound reference databases at this stage of analysis can greatly impact the identification false positive rates (FPR) (e.g., PubChem with millions of entries versus the Human Metabolome Database with ~114,000 entries). The larger database can produce many more putative identifications for a formula that is empirically calculated, which increases the FPR. The FPR for compound identification at this step depends highly on the ambiguity of the mass assignment and Lewis formula found within these databases of various sizes. Table 1 summarizes the utility of different MS scan levels from proteomics, metabolomics, and DNA/RNA adductomics.
MSI principles for identification quality can be applied to DNA adducts because they are small organic molecules with molecular weights and fragmentation behavior similar to those of the metabolites analyzed in metabolomic assays.
Table 1.
Comparison of MS “omics” Methods and MSn Scan Level Utility for Proteomics, Metabolomics, and DNA Adductomics
| | proteomics | metabolomics | DNA/RNA adductomics |
|---|---|---|---|
| analyte | peptides | small molecules/lipids | small molecules/nucleosides |
| MS1 | m/z, quantitation, RT^a | m/z, Lewis formula, quantitation, RT^a | m/z, Lewis formula, RT^a |
| MS2 | peptide sequence | structure, Lewis formula | neutral loss of dR/R^b |
| MS3 | structure of fragments | structure of fragments | structural information/confirmation of aglycone structures |
^a RT stands for retention time.
^b dR and R are used as the acronyms for 2′-deoxyribose and ribose.
The molecular formula of DNA adducts can be determined from the accurate MS1 mass measurements; however, there are no spectral libraries which contain product ion spectra of DNA adducts. The construction of a fragmentation spectral library for DNA adducts and their aglycone ions would allow for identification of previously observed DNA adducts and assist in the identification of putative, uncharacterized adducts. The greatest challenge for DNA adductomic analysis is the trace-level detection of DNA adducts where signal intensities approach the background ion signal noise levels. The most error-prone steps in feature finding are the identification of the precursor ions and their isotopes, and consequently the error will propagate forward to compound identification, especially if the isotopic peaks of the [M0 + H]+ or [B0 + H2]+ ions cannot be detected.40 Therefore, highly efficient and accurate feature identifications and adduct searches require optimized feature finding methods.
FEATURE FINDING IN MS DATA
Compound identification and semiquantitation, as are routinely performed with metabolomics data, involve feature finding that typically (but not always) consists of three steps: (1) centroided masses are picked from profile MS data, (2) contiguous masses within a mass tolerance window are extracted from the MS data to produce an EIC, and (3) within the EICs, chromatographic peaks, representing the elution of a compound with an observed Gaussian-like peak shape, are selected.41 Software tools have been developed to perform each of these steps and typically incorporate filters to remove low-intensity peaks that are likely background noise. However, if this noise removal filter is set too stringently, then many DNA adducts present at trace levels will be filtered out as noise. Conversely, if noise filter settings are less stringent, then putative DNA adducts can be detected at the expense of longer processing times and an increase in the FPR for the detection of features.40
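Steps (2) and (3), together with the effect of the noise filter, can be sketched on already-centroided data. This is a deliberately simplified illustration and not the algorithm of any named software package: peaks are taken as runs of consecutive EIC points above an intensity threshold rather than being matched to a Gaussian-like shape, and the data points and function names are hypothetical.

```python
def build_eic(centroids, target_mz, tol_da):
    """Step 2: sum intensities of centroids within tol_da of target_mz per scan.

    centroids: iterable of (rt_min, mz, intensity) tuples.
    Returns a list of (rt_min, intensity) sorted by retention time.
    """
    eic = {}
    for rt, mz, intensity in centroids:
        if abs(mz - target_mz) <= tol_da:
            eic[rt] = eic.get(rt, 0.0) + intensity
    return sorted(eic.items())


def pick_peaks(eic, min_intensity, min_points=3):
    """Step 3 (simplified): runs of >= min_points consecutive EIC points
    whose intensity is at or above min_intensity (the noise filter)."""
    peaks, run = [], []
    for point in eic + [(None, float("-inf"))]:  # sentinel closes the last run
        if point[1] >= min_intensity:
            run.append(point)
            continue
        if len(run) >= min_points:
            apex = max(run, key=lambda p: p[1])
            peaks.append({"rt_start": run[0][0], "rt_end": run[-1][0],
                          "apex_rt": apex[0], "apex_intensity": apex[1]})
        run = []
    return peaks


# Hypothetical trace-level signal near m/z 490.1946 plus one unrelated ion.
centroids = [
    (16.8, 490.1945, 400.0), (16.9, 490.1947, 1200.0), (17.0, 490.1946, 2500.0),
    (17.1, 490.1944, 1500.0), (17.2, 490.1948, 300.0), (17.0, 491.1968, 600.0),
]
eic = build_eic(centroids, target_mz=490.1946, tol_da=0.0025)
print(pick_peaks(eic, min_intensity=3000.0))  # stringent filter: peak is lost
print(pick_peaks(eic, min_intensity=1000.0))  # relaxed filter: peak is found
```

With the stringent threshold the trace-level peak is discarded as noise, whereas the relaxed threshold recovers it; in real data the relaxed setting would also admit more background features.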
Recent improvements in feature finding (MZ Mine 2.0-ADAP algorithm)40,47 have reduced the FPR of feature finding. Myers et al. reported a reduction in the FPR if the peak finding algorithms used an absolute tolerance (i.e., 0.005 Da instead of 5 ppm mass tolerance) window during construction of the EICs to overcome m/z assignment errors during the peak centroiding step. Once EICs are constructed for each m/z, continuous wavelet transform (CWT) is then used again to select a chromatographic peak that elutes over time and that best matches the expected peak shape of an eluting compound. Current CWT-based tools differ little in their approach for this step but can produce subtle to large differences in the final feature finding data (e.g., peak area, intensity, detected m/z).42,43,44
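The practical difference between a relative (ppm) and an absolute (Da) tolerance is simple arithmetic, sketched below with hypothetical helper names. The m/z values are those shown for dG-C8-PhIP in Figure 5.

```python
def ppm_to_da(mz, ppm):
    """Width in Da of a ppm tolerance at a given m/z."""
    return mz * ppm * 1e-6


def da_to_ppm(mz, da):
    """A mass difference in Da expressed in ppm at a given m/z."""
    return da / mz * 1e6


mz_m0 = 490.1946  # [M0 + H]+ of dG-C8-PhIP
mz_m1 = 491.1968  # assigned [M1 + H]+ m/z in Figure 5E

print(ppm_to_da(mz_m0, 5.0))    # 5 ppm at m/z 490 is about 0.00245 Da
print(da_to_ppm(mz_m0, 0.005))  # 0.005 Da at m/z 490 is about 10.2 ppm
print(da_to_ppm(mz_m1, 0.004))  # a 0.004 Da deviation is about 8.1 ppm
```

Below m/z 1000, a 0.005 Da window is therefore wider than a 5 ppm window, so a low-intensity centroid that deviates by ~0.004 Da (~8 ppm) would fall outside a 5 ppm EIC but inside a 0.005 Da EIC.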
Following initial feature finding, an isotopic deconvolution is performed to identify isotopes.45 Isotopic deconvolution pairs the monoisotopic feature of an ion with the most abundant mass that is produced by the distribution of stable isotopes in nature. This is where the [M0 + H]+ and [M1 + H]+ ions, originally separate features, are grouped together as a single feature. CAMERA is one software package used for this step and is part of the XCMS feature finding workflow.43,46 MS-DIAL and MZ Mine 2 also perform isotopic deconvolution,43,44 and vendor software typically incorporates this step in its feature processing regimes (e.g., Profinder from Agilent, Compound Discoverer from Thermo Fisher Scientific, among others). Isotopic deconvolution can be completed before or after alignment of features across samples; the latter is advantageous because additional data quality checks that correlate the feature data between samples can improve the confidence of the aligned feature identification. Further, isotope identification is an important step that aids the computation of a molecular formula. To compute a molecular formula, XCMS and MZ Mine 2 use SIRIUS,47,48 while MS-DIAL utilizes a custom algorithm.44
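A minimal sketch of the pairing idea follows. It does not reproduce CAMERA, MS-DIAL, or MZ Mine 2: it assumes singly charged ions, uses the 13C/12C mass difference of 1.003355 Da, and requires only co-elution and a lower M1 intensity. The feature records, tolerances, and function name are hypothetical.

```python
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


def pair_isotopes(features, mz_tol_da=0.005, rt_tol_min=0.1):
    """Return (m0_id, m1_id) pairs where m1 looks like the first isotope of m0.

    Criteria (simplified): m/z spacing of one 13C, co-elution within
    rt_tol_min, and the isotope peak less intense than the monoisotopic peak.
    """
    pairs = []
    for m0 in features:
        for m1 in features:
            if m1 is m0:
                continue
            spacing_ok = abs((m1["mz"] - m0["mz"]) - C13_C12_DELTA) <= mz_tol_da
            coelute = abs(m1["rt"] - m0["rt"]) <= rt_tol_min
            if spacing_ok and coelute and m1["intensity"] < m0["intensity"]:
                pairs.append((m0["id"], m1["id"]))
    return pairs


features = [
    {"id": "M0", "mz": 490.1944, "rt": 17.00, "intensity": 2500.0},
    {"id": "M1", "mz": 491.1968, "rt": 17.02, "intensity": 600.0},
    {"id": "other", "mz": 491.2300, "rt": 17.01, "intensity": 900.0},
]
print(pair_isotopes(features))
```

If the [M1 + H]+ signal falls below the feature-finding intensity threshold, it never enters the feature list and no pair can be formed, which is why trace-level adducts are problematic at this step.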
Features are then aligned between samples using both mass and retention time, and the metrics contributing to the features (isotope ratios, retention time, and peak shape) are assessed for concordance between samples. There are methods of varying complexity to accomplish this task which are included in many software packages. XCMS includes the Ordered Bijective Interpolated Warping (Obiwarp) and PeakDensity methods for RT-based alignment,43,49 MZ Mine 2 uses Random Sample Consensus (RANSAC) and Join Align methods,47 and MS-DIAL incorporates MS2 fragment spectra as a step to improve correspondence between sample features during the alignment. Parameters for these steps, like those for feature finding, require careful selection to avoid falsely splitting single features into multiple features or collapsing multiple features representing separate molecules into single features during the alignment. Many algorithms exist for this step;50 those mentioned above are used extensively in metabolomics experiments, and reviews are available.21,50 A routinely applied component of these software packages is a step called gap filling, which is used after the alignment of peaks to recover missed features in one or more samples where a definitive feature was found in other samples. Gap filling attempts to assign a peak area by searching the original raw data for the relevant signal and fitting a curve to it (although “missed” by the feature finding software) to generate a peak which is then integrated to produce a peak area useful for ensuing statistical analysis. Gap filling can also include statistical “imputation”, the process of replacing missing data with a substituted abundance value, which is commonly applied after searching the raw data for real but missed peaks.
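Gap filling can be sketched as follows. This is an illustrative simplification rather than the implementation used by XCMS, MZ Mine 2, or MS-DIAL: the raw EIC is integrated directly with the trapezoid rule instead of fitting a curve, and a fixed imputation value is substituted only when no signal is recovered. All names and numbers are hypothetical.

```python
def integrate_window(eic, rt_start, rt_end):
    """Trapezoid-rule area of (rt, intensity) points inside [rt_start, rt_end]."""
    points = [(rt, i) for rt, i in eic if rt_start <= rt <= rt_end]
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(points, points[1:]):
        area += 0.5 * (i0 + i1) * (rt1 - rt0)
    return area


def gap_fill(areas, raw_eics, rt_window, impute_value=None):
    """Fill missing (None) peak areas for one aligned feature.

    areas: {sample: area or None}; raw_eics: {sample: [(rt, intensity), ...]}
    rt_window: (rt_start, rt_end) taken from samples where the feature was found.
    """
    filled = {}
    for sample, area in areas.items():
        if area is None:
            recovered = integrate_window(raw_eics[sample], *rt_window)
            area = recovered if recovered > 0 else impute_value
        filled[sample] = area
    return filled


areas = {"level1_rep1": 120.0, "level3_rep1": None, "unspiked": None}
raw_eics = {
    "level1_rep1": [],
    "level3_rep1": [(16.9, 0.0), (17.0, 100.0), (17.1, 0.0)],
    "unspiked": [],
}
print(gap_fill(areas, raw_eics, rt_window=(16.8, 17.2), impute_value=1.0))
```

Because the retention time window comes from samples in which the feature was confidently detected, a weak signal in another sample can be recovered even though it was missed by feature finding.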
DATA PROCESSING FOR DNA ADDUCTOMICS WIDE-SIM/MS2 DATA
Previously acquired data generated by analysis of calf thymus DNA (ctDNA) (spiked-in experiment with synthetic DNA adducts),18 human bladder,27 kidney, and prostate18 using targeted high-resolution/accurate mass MSn and wide-SIM/MS2 with the Orbitrap Fusion Tribrid MS (Thermo Fisher Scientific) were reanalyzed using the targeted peak extraction wizard of MZ Mine 2.47 The ctDNA (spiked-in experiment) consisted of the analysis of 12 samples (4 groups × 3 replicates), where 20 DNA adducts were spiked at three levels (level 1, 4–8 adducts in 10^8 nts; level 2, 1.3–2.7 adducts in 10^8 nts; or level 3, 4–8 adducts in 10^9 nts) and compared with an unspiked ctDNA sample. The chemical names and structures of DNA adducts are shown in Figure 4. The use of existing feature finding tools such as MS-DIAL, MZ Mine 2, XCMS or Compound Discoverer (Thermo Fisher Scientific) was not possible due to the segmented nature of the SIM and MS2 scan data. Thus, we developed a solution to separate the individual SIM and MS2 scan event data sets into individual files to allow the existing tools to perform feature detection (“wSIMCity”, S. Walmsley, manuscript in preparation). For our solution, the 20 scan events of 30 m/z windows (10 wide-SIM, 10 MS2 per duty cycle) were extracted and segregated into 20 independent mzML data files using the MSConvert raw data conversion tool and a custom script programmed in R using the mzR package.51,52 MZ Mine 2 (Automated Data Analysis Pipeline (ADAP)) and MS-DIAL (v.3.53) were then used to perform feature finding on the individual data files.
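The segregation step can be pictured with the sketch below. It is not the wSIMCity code, which was written in R with the mzR package; it only shows the grouping logic on hypothetical in-memory scan records, and it omits reading and writing of mzML files. The starting m/z of 300 is an assumed placeholder, since the scanned m/z range is not stated here.

```python
from collections import defaultdict


def make_duty_cycle(cycle_index, first_mz=300.0, n_windows=10, width=30.0):
    """Build hypothetical scan records for one duty cycle: each wide-SIM
    (MS1) scan is followed by the MS2 scan of the same 30 m/z window."""
    scans = []
    for w in range(n_windows):
        low = first_mz + w * width  # first_mz is an assumed placeholder
        for ms_level in (1, 2):
            scans.append({"cycle": cycle_index, "ms_level": ms_level,
                          "window_low": low, "window_high": low + width})
    return scans


def split_by_scan_event(scans):
    """Group scans by (MS level, window) so each group can be written to
    its own file and processed as a conventional single-window data set."""
    groups = defaultdict(list)
    for scan in scans:
        key = (scan["ms_level"], scan["window_low"], scan["window_high"])
        groups[key].append(scan)
    return dict(groups)


scans = make_duty_cycle(0) + make_duty_cycle(1)
groups = split_by_scan_event(scans)
print(len(groups))  # one group per scan event
for key in sorted(groups)[:3]:
    print(key, len(groups[key]))
```

Each of the resulting 20 groups contains one scan per duty cycle, which is the structure that conventional feature finding tools expect from a single data file.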
Figure 4.
Adduct structures. The structures for DNA adducts that were discussed in the text. The figure includes the m/z values for the [M + H]+ and [B + H2]+ ions. Blue sphere indicates the sites of 2H-labeled atoms, and red sphere indicates the positions of the 13C-labeled atoms. The R′ indicates different chemical moieties for the aristolactam analogues. The DNA adducts and their labeled analogues (when present) are dG-C8–4-ABP, [13C10]-dG-C8–4-ABP; N-(2′-deoxyguanosin-8-yl)-2-amino-9H-pyrido[2,3-b]indole (dG-C8-AαC), [13C10]-dG-C8-AαC; 7-(2′-deoxyadenosin-N6-yl)-aristolactam I (dA-AL-I); 7-(2′-deoxyadenosin-N6-yl) aristolactam II (dA-AL-II); N-(2′-deoxyguanosin-8-yl)-2-amino-3,8-dimethylimidazo[4,5-f]quinoxaline (dG-C8-MeIQx), dG-C8-[2H3C]-MeIQx; N-(2′-deoxyguanosin-8-yl)-2-amino-3-methyl-3H-imidazo[4,5- f]-quinoline (dG-C8-IQ), [13C10]-dG-C8-IQ; 7-(2′-deoxyguanosin-N2-yl)-aristolactam-I (dG-AL-I); 7-(2′-deoxyguanosin-N2-yl)-aristolactam-II (dG-AL-II); O6-[4-oxo-4-(3-pyridyl)-butyl]-2′-deoxyguanosine (O6- POB-dG), [pyridine-2H4]-O6-POB-dG; 10-(2′-deoxyguanosin-N2-yl)-7,8,9-trihydroxy-7,8,9,10- tetrahydrobenzo[a]pyrene (dG-N2-B[a]PDE), [13C10]-dG-N2-B[a]PDE; dG-C8-PhIP, dG-C8-[2H3C]-PhIP, [13C10]-dG-C8-PhIP; 6-(1-hydroxyhexanyl)-8-hydroxy-1, N2-propano-2′-deoxyguanosine ([2H11]-HNE- dG).
The 20 separate alignment result files were combined into a single output file which contains features identified for both the SIM and MS2 data sets. A custom bioinformatics workflow and algorithm were then used to search for co-eluting features of spiked-in DNA adducts and unknowns, but were limited in their ability to detect trace-level DNA adducts due to their low-intensity signals. For example, the output for dG-C8-PhIP in the ctDNA spiked-in experiment using MZmine 2 is illustrated in Figure 5. MZmine 2 used centroided masses (Figure 5A) within a ±5 ppm tolerance to construct an EIC (Figure 5B) from which a chromatographic peak for the [M0 + H]+ was detected (Figure 5C). The assigned mass was calculated from the mean of the centroided masses detected across the chromatographic peak (Figure 5D). For any given feature ([M0 + H]+, [M1 + H]+, [B0 + H2]+, [B1 + H2]+, etc.), the lower the intensity of the signal, the greater the chance for significant deviation of the measured masses (see Figure 5D vs Figure 5E), leading to a less accurate mass assignment. For our analysis, co-eluting background signal precluded the detection of some of the spiked-in DNA adducts during feature finding. The ability of the workflow to detect these spiked-in DNA adducts was quantified using the true positive rate (TPR), calculated as the fraction of the spiked DNA adducts for which both the precursor ([M + H]+) and aglycone ([B + H2]+) ions were detected.
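The co-elution search and the TPR definition can be sketched as below. This is not the custom workflow described above, whose details are not given here; it is a minimal illustration that assumes singly charged ions, a ppm tolerance on the expected aglycone m/z, and a fixed retention time tolerance. The feature values and the detected/spiked counts are hypothetical.

```python
DR_LOSS_DA = 116.0473  # neutral loss of dR, from the text


def find_adduct_candidates(sim_features, ms2_features,
                           mz_tol_ppm=5.0, rt_tol_min=0.1):
    """Pair wide-SIM precursor features with co-eluting MS2 features whose
    m/z matches precursor - dR. Returns (precursor_mz, aglycone_mz, rt)."""
    candidates = []
    for precursor in sim_features:
        expected = precursor["mz"] - DR_LOSS_DA
        for aglycone in ms2_features:
            error_ppm = abs(aglycone["mz"] - expected) / expected * 1e6
            coelute = abs(aglycone["rt"] - precursor["rt"]) <= rt_tol_min
            if error_ppm <= mz_tol_ppm and coelute:
                candidates.append((precursor["mz"], aglycone["mz"],
                                   precursor["rt"]))
    return candidates


def true_positive_rate(n_pairs_detected, n_spiked):
    """Fraction of spiked adducts with both precursor and aglycone detected."""
    return n_pairs_detected / n_spiked


sim_features = [{"mz": 490.1944, "rt": 17.00}, {"mz": 512.2000, "rt": 12.00}]
ms2_features = [{"mz": 374.1470, "rt": 17.02}, {"mz": 396.1527, "rt": 15.00}]

print(find_adduct_candidates(sim_features, ms2_features))
print(true_positive_rate(17, 20))  # hypothetical counts, not study results
```

The second precursor has a mass-matched MS2 feature that elutes 3 min later, so it is rejected: the mass difference alone is not sufficient evidence without co-elution.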
Figure 5.
Results of feature finding workflow in MZ Mine 2.0 (ADAP) using dG-C8-PhIP in the spiked ctDNA as an example. (A) Peak centroiding reduces data points (profile mode of black trace vs centroid mode in a single vertical line in red). Red dashed line shows an arbitrary background noise level. (B) EIC of the centroided m/z value (490.1946 ± 5 ppm) and (C) a zoom-in around tR ~ 17 min after chromatographic deconvolution. (D) Trace of centroided [M0 + H]+ peaks showing which peaks were assigned to the [M0 + H]+ monoisotope and the final assigned m/z = 490.1944 (black dashed line). (E) Trace of centroided [M1 + H]+ with deviation of mass accuracy of ~0.004 Da or ~8 ppm and the final assigned m/z = 491.1968 (black dashed line). (F) MS spectrum for [M0 + H]+ and [M1 + H]+ peaks. Note the [M1 + H]+ peak is below the assigned arbitrary level of the noise.
The TPR of feature finding using MZmine 2 was 84%, 84%, and 78% for the three spiking levels, respectively (Figure 6), as compared to 94%, 87%, and 69% when performed by manual verification (Figure 7). The greater TPR for feature finding at the lowest level of spiking (level 3, 4–8 in 10⁹ nts) was due to the automated gap filling of aligned peak data based upon the detection of features in the higher level spiked-in samples. While many of the spiked adducts were detected by feature finding, the correlations between the automated and manually extracted peak areas were low, with R² (goodness-of-fit for linear regression) = 0.03 to 0.67.
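The two metrics used above, the TPR and the R² of a linear regression between automated and manual peak areas, can be computed as in the following sketch. The detection flags and peak areas are invented for illustration and are not the study data; the function names are hypothetical.

```python
def true_positive_rate(detections):
    """Fraction of spiked adducts with BOTH precursor and aglycone detected.

    detections: dict mapping adduct name -> (precursor_found, aglycone_found).
    """
    hits = sum(1 for precursor, aglycone in detections.values()
               if precursor and aglycone)
    return hits / len(detections)


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical detection flags for five spiked adducts.
detections = {
    "dG-C8-PhIP": (True, True),
    "dG-AL-I": (True, True),
    "dA-AL-I": (True, False),  # aglycone missed, so not a true positive
    "dA-AL-II": (True, True),
    "O6-POB-dG": (True, True),
}
tpr = true_positive_rate(detections)
print(f"TPR = {tpr:.2f}")

# Hypothetical automated vs manually extracted peak areas.
automated = [1.0e5, 2.1e5, 2.9e5, 4.2e5]
manual = [1.1e5, 1.9e5, 3.2e5, 3.9e5]
print(f"R^2 = {r_squared(automated, manual):.2f}")
```

Requiring both ions makes the TPR stricter than a precursor-only count: an adduct whose aglycone is lost in the background does not count as detected.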
Figure 6.
Intensities of spiked adducts in ctDNA using the ADAP feature finding algorithm in MZmine 2 with a minimum peak intensity of 3000 and a 5 ppm mass deviation. The spiking levels were (N = 3): level 1, 4–8 adducts per 10⁸ nts; level 2, 1.3–2.7 adducts per 10⁸ nts; and level 3, 4–8 adducts per 10⁹ nts. (A) The computed mean peak intensities of adduct precursor ions ([M + H]+) in wide-SIM data and (B) of aglycone ions ([B + H2]+) in MS2 data. Intensities are log scaled, and error bars represent the standard deviations across three samples. No error bar indicates that insufficient data points were present.
Figure 7.
Intensities of spiked adducts in ctDNA using the targeted extraction module of MZmine 2 at 5 ppm. The targeted extraction identified the EIC peak boundaries and computed a peak intensity for each compound. “Targeted” means the user provides an m/z value, a ppm mass tolerance window, and a retention time in which to detect an ion in the MS data file. The spiking levels were (N = 3): level 1, 4–8 adducts per 10⁸ nts; level 2, 1.3–2.7 adducts per 10⁸ nts; and level 3, 4–8 adducts per 10⁹ nts. (A) The computed mean peak intensities of adduct precursor ions ([M + H]+) in wide-SIM data and (B) of aglycone ions ([B + H2]+) in the MS2 data. Intensities are log scaled, and error bars indicate the standard deviation across three samples for each group. No error bar indicates that insufficient data points were detected.
MS-DIAL produced slightly lower TPR values (80% at all spiking levels) for the detection of the spiked-in DNA adducts but slightly higher correlations, R² = 0.56–0.77. MS-DIAL’s TPR was the same at all spiking levels because of how the recovered spike-in data were reported after gap filling.
The results produced using MS-DIAL and our DNA adduct search algorithm suggested the detection of multiple putative DNA adducts in the ctDNA sample in addition to the spiked adducts (an example of the search result from a level 1 ctDNA sample is shown in Figure 8A). The total number of features found in ctDNA was extremely high (>48,000) and, after searching for aglycone ions matched to a precursor, was reduced to ~1000 pairs. These pairs were then filtered using a quality score (the probability that the precursor and aglycone features are from the same analyte) produced by the search algorithm, which reduced the list to ~100 putative and spiked-in DNA adducts. Given the large number of features originally identified, it is likely that many false positives arising from artifacts were produced by the feature finding step when using a low-intensity threshold.53 The accuracy of the feature detection step, as defined by the receiver operating characteristic (ROC) curve, was computed using the level 1 and level 3 spiked ctDNA samples (high vs low level adducts, triplicate for each level). To compute the ROC, we first calculated the false positive rate (FPR) of feature finding by searching for significant changes in feature abundances between the level 1 (high spiked level) and level 3 (low spiked level) groups (Welch’s t test, p < 0.05).54 Any feature that was not a spiked-in DNA adduct but had p < 0.05 was flagged as a false positive, and the spiked-in DNA adducts were flagged as true positives. These criteria were used to plot the ROC curve and compute the area under the curve (AUC). The AUC was a marginal 0.84 (Figure 8B), indicating that false features were detected in the background ctDNA matrices.
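The statistical steps named above can be sketched as follows. This is one plausible reading of the procedure, not the authors' code: the function names, abundances, and scores are hypothetical. The sketch computes Welch's t statistic with its Welch–Satterthwaite degrees of freedom, and a rank-based AUC, which equals the area under the ROC curve. Converting the statistic to a p value requires the t distribution; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test directly.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen true feature
    scores higher than a randomly chosen false one (ties count as half)."""
    pos = [s for s, is_true in zip(scores, labels) if is_true]
    neg = [s for s, is_true in zip(scores, labels) if not is_true]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical triplicate abundances of one feature at level 1 vs level 3.
level1 = [100.0, 110.0, 105.0]
level3 = [10.0, 12.0, 11.0]
t_stat, dof = welch_t(level1, level3)
print(f"t = {t_stat:.1f}, df = {dof:.2f}")

# Hypothetical per-feature scores (e.g., 1 - p); label 1 marks a spiked adduct.
scores = [0.9, 0.6, 0.3, 0.7, 0.2]
labels = [1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

With only three replicates per group, the degrees of freedom are small (close to 2 in this example), so the test has limited power and individual p values should be interpreted cautiously.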
Figure 8.
Adductome map and accuracy of spiked DNA adducts. (A) Data search for precursor ([M + H]+, blue diamond) and aglycone ([B + H2]+, red circle) ion pairs using a custom algorithm indicates many putative DNA adducts over all >48,000 detected features (gray dots) in a spiked ctDNA sample (level 1, for example). (B) ROC and AUC for feature finding performance. The accuracy of feature detection in ctDNA is marginal (AUC = 0.84) due to the false detection of features in ctDNA.
In summary, neither the MS-DIAL nor the MZmine 2 (ADAP) feature finding tool identified all of the spiked adducts. The detection of adducts was dependent on the algorithm used, the abundance of the adducts, and whether isobaric interferences were present. At the same time, feature finding identifies many potential false features, which are attributed to the unexplored landscape of detected ions in the digested DNA adduct matrix and to how they are processed by feature finding software. We note that these issues in the automated detection of DNA adducts using existing software are in part due to the low levels of the DNA adducts. Myers et al. have reported similar issues regarding feature finding in their metabolomics experiments.40 Lower-abundance molecules had inconsistent mass traces across the peaks, which could lead to very large mass errors and false negatives.40 Current feature finding software tools are sufficient for the automated detection of some DNA adducts present at moderate to high levels of DNA modification; however, the robust detection of many adducts present at low levels of DNA modification, as occurs in humans, can be difficult owing to the above-mentioned isobaric interferences. An extensive comparison of the performance of these feature finding tools is warranted and will require the introduction of a standardized negative control sample so that the performance of these software tools as applied to DNA adduct analysis can be measured.
FUTURE DIRECTIONS FOR DNA ADDUCTOMIC DATA ANALYSIS
We have shown that DIA data acquisition methods can screen for multiple DNA adducts in a single assay by high-resolution MS. Some of the limitations of current software tools used for the automated analyses of DNA adducts can be resolved with straightforward bioinformatics solutions. As with DNA, RNA modification is dynamic and reversible, and there are reports of some RNA adducts being linked to disease.55,56 Efforts in RNA adduct identification have mostly focused on the impact on RNA transcription, and more than 160 RNA modifications have been identified.57,58 Thus, improvements in the scanning technology and software tools will aid the discovery of both DNA and RNA adducts.59
The desired outcome of this effort is the establishment of a rapid, automated search software tool for the detection of the [M + H]+ and [B + H2]+ ion pairs. However, the current algorithms fail to detect several of the adducts at lower levels of DNA modification (<1 adduct per 10⁸ nts), and the correlations with manually measured spiked-in abundances were low. Additionally, automated processing can introduce errors, including inaccurate estimations of the relative abundances of the [M + H]+ and [B + H2]+ pairs, which weaken the criteria used to support the presence of a putative DNA adduct. Optimizing data acquisition by employing a shallower chromatographic gradient, using higher resolution detection, decreasing the size of the isolation window in SIM, and elevating both the automatic gain control and the maximum injection time can improve peak detection and the separation of isobaric components. Algorithms that discriminate between instrument and sample noise and that dynamically adapt to the noise levels within the MS experiment may improve the performance of existing, widely used feature finding approaches. Standardized negative control samples are required to reduce false positives while allowing for the accurate detection of DNA adducts. Further, MS data analysis could benefit from the exploration of novel approaches to feature finding (e.g., artificial intelligence-based methods such as neural networks). These improvements may provide more accurate and complete detection of the DNA adductome and advance the field of DNA adductomics.
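As an illustration of the kind of ion-pair search discussed above, the sketch below pairs precursor and aglycone features that co-elute and differ by the neutral loss of 2′-deoxyribose. It is not the authors' search algorithm. The neutral loss value (C5H8O3, 116.0473 Da), the retention time tolerance, the function name, and the feature lists are assumptions introduced here; only the precursor m/z of dG-C8-PhIP (490.1946) and the 5 ppm tolerance appear in the text.

```python
# Assumed monoisotopic mass of the 2'-deoxyribose neutral loss (C5H8O3), in Da.
DEOXYRIBOSE_LOSS = 116.0473


def pair_features(precursors, aglycones, ppm=5.0, rt_tol=0.1):
    """Pair [M + H]+ features with co-eluting [B + H2]+ features.

    precursors, aglycones: lists of (mz, retention_time_min) tuples.
    A pair is accepted when the aglycone m/z matches the precursor m/z minus
    the deoxyribose loss within `ppm`, and the retention times agree within
    `rt_tol` minutes.
    """
    pairs = []
    for p_mz, p_rt in precursors:
        expected = p_mz - DEOXYRIBOSE_LOSS
        mz_tol = expected * ppm * 1e-6
        for a_mz, a_rt in aglycones:
            if abs(a_mz - expected) <= mz_tol and abs(a_rt - p_rt) <= rt_tol:
                pairs.append(((p_mz, p_rt), (a_mz, a_rt)))
    return pairs


# Hypothetical feature lists (m/z, tR in min).
precursors = [(490.1946, 17.00)]
aglycones = [
    (374.1472, 17.02),  # co-elutes and matches the expected aglycone mass
    (374.1472, 19.50),  # correct mass but elutes too late
    (300.1000, 17.00),  # co-elutes but has the wrong mass
]

pairs = pair_features(precursors, aglycones)
print(pairs)
```

A search of this form accepts any two features that satisfy the mass and retention time constraints, so at low intensity thresholds it will also pair unrelated background ions; this is why an additional quality score and negative control samples are needed.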
ACKNOWLEDGMENTS
This work was supported by the University of Minnesota Masonic Cancer Center and by R01CA122320 and R01CA220367 (R.J.T.) from the National Cancer Institute and R01ES019564 (R.J.T.) from the National Institute of Environmental Health Sciences. Salary support for P.W.V. was provided by the National Cancer Institute under award number R50CA211256. Mass spectrometry was supported by Cancer Center support grant CA077598 from the National Cancer Institute, and human biospecimens were supported by the National Center for Advancing Translational Sciences of the National Institutes of Health award number UL1TR000114. We thank Dr. Francis Johnson and Dr. Radha Bonala, Stony Brook University, who generously provided dG-AL-I, dG-AL-II, dA-AL-I, and dA-AL-II, and Dr. Pramod Upadhyaya from the University of Minnesota for kindly providing O6-POB-dG.
Biographies
Scott Walmsley received his Ph.D. in cell and molecular biology from Colorado State University under the direction of Professor Norman Curthoys. Starting in 2011, he spent 3 years as a post-doctoral researcher in the laboratory of Professor Alexey Nesvizhskii at the University of Michigan, specializing in proteome informatics. In 2014, he was a Research Instructor at the Skaggs School of Pharmacy Mass Spectrometry Facility at the University of Colorado Denver-Anschutz under the direction of Associate Professor Nichole Reisdorph. In 2018, he moved to his current position as a Computational Scientist at the Masonic Cancer Center at the University of Minnesota specializing in computational methods for mass spectrometry data.
Jingshu Guo received her Ph.D. in chemistry from the University of Toledo in 2013 under the supervision of Dr. Wendell P. Griffith. She pursued post-doctoral training in the Turesky laboratory at the University of Minnesota and became a Research Assistant Professor in 2018. Her research focuses on developing and applying high-resolution mass spectrometry-based analytical methods for biomonitoring environmental toxicants and their DNA adducts in humans.
Jinhua Wang, Ph.D., is a Professor of Bioinformatics at the UMN Masonic Cancer Center and Institute for Health Informatics and the director of MCC Cancer Center Bioinformatics. Dr. Wang was an Associate Professor at NYUSOM and a member and co-director of the biomedical informatics core at the NYU Cancer Center before he moved to UMN. Dr. Wang received his Ph.D. degree in computational biology from the Chinese Academy of Sciences. Dr. Wang’s training includes a post-doctoral fellowship in computational biology and computational genomics at Cold Spring Harbor Laboratory and a bioinformatics/computational scientist appointment at the Hartwell Center for Bioinformatics and Biotechnology at St. Jude Children’s Research Hospital.
Peter Villalta received his Ph.D. degree in Chemistry from the University of Minnesota under the direction of Professor Doreen Leopold. He spent two years as a post-doctoral associate at the National Oceanic and Atmospheric Administration in Boulder, Colorado (advisor, Dr. Carl Howard). In 1995, he became a Senior Research Scientist at Aerodyne Research in Billerica, Massachusetts. He moved to the University of Minnesota in 1999 to serve as the Coordinator of Mass Spectrometry Services of the Masonic Cancer Center’s Analytical Biochemistry Shared Resource, a mass spectrometry facility supporting cancer research at the University of Minnesota. In 2016, he became an NCI Research Specialist in support of the Masonic Cancer Center’s Carcinogenesis and Chemoprevention Program, and in 2019 became the Director of the Analytical Biochemistry Shared Resource. His research interests include developing methodologies for screening unknown DNA adducts (adductomics) and improving in vivo DNA adduct quantitation through advanced mass spectrometry and chromatography methodologies.
Rob Turesky received his Ph.D. degree in Nutrition and Food Science at M.I.T. under the supervision of Professors Marcus Karel and Steven Tannenbaum. He worked as a research scientist investigating food safety and processing at the Nestlé Research Center, Vers-chez-les-Blanc, Switzerland (1986–2000). He held positions at the National Center for Toxicology/US FDA (2000–2004) and Wadsworth Center, New York State Department of Health (2004–2013), before joining the University of Minnesota in 2013. He is a Professor in the Department of Medicinal Chemistry, holds the Masonic Chair in Cancer Causation, and served as the Director of the Masonic Cancer Center’s Analytical Biochemistry shared resource (2013–2019). His research is focused on the biochemical toxicology of hazardous chemicals in the environment, diet, and tobacco. The laboratory characterizes pathways of metabolism of genotoxicants and establishes biomarkers, including metabolites, protein, and DNA adducts. Mass spectrometric methods are applied to measure these biomarkers in molecular epidemiology studies that seek to understand the role of chemical exposures in the origin of cancer.
Footnotes
The authors declare no competing financial interest.
REFERENCES
- (1) Balbo S, Hecht SS, Upadhyaya P, and Villalta PW (2014) Application of a high-resolution mass-spectrometry-based DNA adductomics approach for identification of DNA adducts in complex mixtures. Anal. Chem. 86, 1744–1752.
- (2) Villalta PW, and Balbo S (2017) The future of DNA adductomic analysis. Int. J. Mol. Sci. 18, 1870.
- (3) Wild CP (2005) Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol., Biomarkers Prev. 14, 1847–1850.
- (4) Wiencke JK (2002) DNA adduct burden and tobacco carcinogenesis. Oncogene 21, 7376.
- (5) Jarabek AM, Pottenger LH, Andrews LS, Casciano D, Embry MR, Kim JH, Preston RJ, Reddy MV, Schoeny R, Shuker D, Skare J, Swenberg J, Williams GM, and Zeiger E (2009) Creating context for the use of DNA adduct data in cancer risk assessment: I. Data organization. Crit. Rev. Toxicol. 39, 659–678.
- (6) Himmelstein MW, Boogaard PJ, Cadet J, Farmer PB, Kim JH, Martin EA, Persaud R, and Shuker DE (2009) Creating context for the use of DNA adduct data in cancer risk assessment: II. Overview of methods of identification and quantitation of DNA damage. Crit. Rev. Toxicol. 39, 679–694.
- (7) Tretyakova N, Villalta PW, and Kotapati S (2013) Mass spectrometry of structurally modified DNA. Chem. Rev. 113, 2395–2436.
- (8) Klaene JJ, Sharma VK, Glick J, and Vouros P (2013) The analysis of DNA adducts: The transition from 32P-postlabeling to mass spectrometry. Cancer Lett. 334, 10–19.
- (9) Randerath K, Reddy MV, and Gupta RC (1981) 32P-labeling test for DNA damage. Proc. Natl. Acad. Sci. U. S. A. 78, 6126–6129.
- (10) Phillips DH, and Venitt S (2012) DNA and protein adducts in human tissues resulting from exposure to tobacco smoke. Int. J. Cancer 131, 2733–2753.
- (11) Phillips DH (2005) DNA adducts as markers of exposure and risk. Mutat. Res., Fundam. Mol. Mech. Mutagen. 577, 284–292.
- (12) Liu S, and Wang Y (2015) Mass spectrometry for the assessment of the occurrence and biological consequences of DNA adducts. Chem. Soc. Rev. 44, 7829–7854.
- (13) Wolf SM, and Vouros P (1994) Application of capillary liquid chromatography coupled with tandem mass spectrometric methods to the rapid screening of adducts formed by the reaction of N-acetoxy-N-acetyl-2-aminofluorene with calf thymus DNA. Chem. Res. Toxicol. 7, 82–88.
- (14) Tretyakova N, Goggin M, Sangaraju D, and Janis G (2012) Quantitation of DNA adducts by stable isotope dilution mass spectrometry. Chem. Res. Toxicol. 25, 2007–2035.
- (15) Kanaly RA, Hanaoka T, Sugimura H, Toda H, Matsui S, and Matsuda T (2006) Development of the adductome approach to detect DNA damage in humans. Antioxid. Redox Signaling 8, 993–1001.
- (16) Bessette EE, Goodenough AK, Langouet S, Yasa I, Kozekov ID, Spivack SD, and Turesky RJ (2009) Screening for DNA adducts by data-dependent constant neutral loss-triple stage mass spectrometry with a linear quadrupole ion trap mass spectrometer. Anal. Chem. 81, 809–819.
- (17) Goodenough AK, Schut HA, and Turesky RJ (2007) Novel LC-ESI/MS/MS(n) method for the characterization and quantification of 2′-deoxyguanosine adducts of the dietary carcinogen 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine by 2-D linear quadrupole ion trap mass spectrometry. Chem. Res. Toxicol. 20, 263–276.
- (18) Guo J, Villalta PW, and Turesky RJ (2017) Data-independent mass spectrometry approach for screening and identification of DNA adducts. Anal. Chem. 89, 11728–11736.
- (19) Cui Y, Wang P, Yu Y, Yuan J, and Wang Y (2018) Normalized retention time for targeted analysis of the DNA adductome. Anal. Chem. 90, 14111.
- (20) Patti GJ, Yanes O, and Siuzdak G (2012) Innovation: metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell Biol. 13, 263–269.
- (21) Bantscheff M, Lemeer S, Savitski MM, and Kuster B (2012) Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present. Anal. Bioanal. Chem. 404, 939–965.
- (22) Kanaly RA, Matsui S, Hanaoka T, and Matsuda T (2007) Application of the adductome approach to assess intertissue DNA damage variations in human lung and esophagus. Mutat. Res., Fundam. Mol. Mech. Mutagen. 625, 83–93.
- (23) Totsuka Y, Lin Y, He Y, Ishino K, Sato H, Kato M, Nagai M, Elzawahry A, Totoki Y, Nakamura H, Hosoda F, Shibata T, Matsuda T, Matsushima Y, Song G, Meng F, Li D, Liu J, Qiao Y, Wei W, Inoue M, Kikuchi S, Nakagama H, and Shan B (2019) DNA Adductome Analysis Identifies N-Nitrosopiperidine Involved in the Etiology of Esophageal Cancer in Cixian, China. Chem. Res. Toxicol. 32, 1515.
- (24) Wilson MR, Jiang Y, Villalta PW, Stornetta A, Boudreau PD, Carrá A, Brennan CA, Chun E, Ngo L, Samson LD, Engelward BP, Garrett WS, Balbo S, and Balskus EP (2019) The human gut bacterial genotoxin colibactin alkylates DNA. Science 363, eaar7785.
- (25) Chang YJ, Cooke MS, Hu CW, and Chao MR (2018) Novel approach to integrated DNA adductomics for the assessment of in vitro and in vivo environmental exposures. Arch. Toxicol. 92, 2665–2680.
- (26) Zimmermann M, Wang SS, Zhang H, Lin TY, Malfatti M, Haack K, Ognibene T, Yang H, Airhart S, Turteltaub KW, Cimino GD, Tepper CG, Drakaki A, Chamie K, de Vere White R, Pan CX, and Henderson PT (2017) Microdose-Induced Drug-DNA Adducts as Biomarkers of Chemotherapy Resistance in Humans and Mice. Mol. Cancer Ther. 16, 376–387.
- (27) Guo J, Villalta PW, Weight CJ, Bonala R, Johnson F, Rosenquist TA, and Turesky RJ (2018) Targeted and untargeted detection of DNA adducts of aromatic amine carcinogens in human bladder by ultra-performance liquid chromatography-high-resolution mass spectrometry. Chem. Res. Toxicol. 31, 1382.
- (28) Xiao S, Guo J, Yun BH, Villalta PW, Krishna S, Tejpaul R, Murugan P, Weight CJ, and Turesky RJ (2016) Biomonitoring DNA adducts of cooked meat carcinogens in human prostate by nano liquid chromatography-high resolution tandem mass spectrometry: Identification of 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine DNA adduct. Anal. Chem. 88, 12508–12515.
- (29) Grollman AP, Shibutani S, Moriya M, Miller F, Wu L, Moll U, Suzuki N, Fernandes A, Rosenquist T, Medverec Z, Jakovina K, Brdar B, Slade N, Turesky RJ, Goodenough AK, Rieger R, Vukelić M, and Jelaković B (2007) Aristolochic acid and the etiology of endemic (Balkan) nephropathy. Proc. Natl. Acad. Sci. U. S. A. 104, 12129–12134.
- (30) Pietsch KE, van Midwoud PM, Villalta PW, and Sturla SJ (2013) Quantification of acylfulvene- and illudin S-DNA adducts in cells with variable bioactivation capacities. Chem. Res. Toxicol. 26, 146–155.
- (31) Stornetta A, Villalta PW, Hecht SS, Sturla SJ, and Balbo S (2015) Screening for DNA alkylation mono and cross-linked adducts with a comprehensive LC-MS adductomic approach. Anal. Chem. 87, 11706.
- (32) Carra A, Guidolin V, Dator RP, Upadhyaya P, Kassie F, and Villalta PW (2019) Targeted High Resolution LC/MS3 Adductomics Method for the Characterization of Endogenous DNA Damage. Front. Chem., DOI: 10.3389/fchem.2019.00658.
- (33) Doerr A (2015) DIA mass spectrometry. Nat. Methods 12, 35.
- (34) Gillet LC, Navarro P, Tate S, Röst H, Selevsek N, Reiter L, Bonner R, and Aebersold R (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717.
- (35) Sugimura T, Wakabayashi K, Nakagama H, and Nagao M (2004) Heterocyclic amines: mutagens/carcinogens produced during cooking of meat and fish. Cancer Sci. 95, 290–299.
- (36) Bouvard V, Loomis D, Guyton KZ, Grosse Y, Ghissassi FE, Benbrahim-Tallaa L, Guha N, Mattock H, and Straif K (2015) Carcinogenicity of consumption of red and processed meat. Lancet Oncol. 16, 1599–1600.
- (37) Sansone SA, Fan T, Goodacre R, Griffin JL, Hardy NW, Kaddurah-Daouk R, Kristal BS, Lindon J, Mendes P, Morrison N, Nikolau B, Robertson D, Sumner LW, Taylor C, van der Werf M, van Ommen B, and Fiehn O (2007) The metabolomics standards initiative. Nat. Biotechnol. 25, 846–848.
- (38) Spicer RA, Salek R, and Steinbeck C (2017) A decade after the metabolomics standards initiative it’s time for a revision. Sci. Data 4, 170138.
- (39) Spicer RA, Salek R, and Steinbeck C (2017) Compliance with minimum information guidelines in public metabolomics repositories. Sci. Data 4, 170137.
- (40) Myers OD, Sumner SJ, Li S, Barnes S, and Du X (2017) One step forward for reducing false positive and false negative compound identifications from mass spectrometry metabolomics data: new algorithms for constructing extracted ion chromatograms and detecting chromatographic peaks. Anal. Chem. 89, 8696–8703.
- (41) Tautenhahn R, Böttcher C, and Neumann S (2008) Highly sensitive feature detection for high resolution LC/MS. BMC Bioinf. 9, 504.
- (42) Du P, Kibbe WA, and Lin SM (2006) Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics 22, 2059–2065.
- (43) Smith CA, Want EJ, O’Maille G, Abagyan R, and Siuzdak G (2006) XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 78, 779–787.
- (44) Tsugawa H, Cajka T, Kind T, Ma Y, Higgins B, Ikeda K, Kanazawa M, VanderGheynst J, Fiehn O, and Arita M (2015) MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 12, 523–526.
- (45) Pluskal T, Uehara T, and Yanagida M (2012) Highly accurate chemical formula prediction tool utilizing high-resolution mass spectra, MS/MS fragmentation, heuristic rules, and isotope pattern matching. Anal. Chem. 84, 4396–4403.
- (46) Kuhl C, Tautenhahn R, Bottcher C, Larson TR, and Neumann S (2012) CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 84, 283–289.
- (47) Pluskal T, Castillo S, Villar-Briones A, and Oresic M (2010) MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinf. 11, 395.
- (48) Böcker S, Letzel MC, Liptak Z, and Pervukhin A (2009) SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics 25, 218–224.
- (49) Prince JT, and Marcotte EM (2006) Chromatographic alignment of ESI-LC-MS proteomics data sets by ordered bijective interpolated warping. Anal. Chem. 78, 6140–6152.
- (50) Smith R, Ventura D, and Prince JT (2015) LC-MS alignment in theory and practice: a comprehensive algorithmic review. Briefings Bioinf. 16, 104–117.
- (51) Holman JD, Tabb DL, and Mallick P (2014) Employing ProteoWizard to convert raw mass spectrometry data. Curr. Protoc. Bioinformatics 46, 13.24.1–13.24.9.
- (52) Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman N, Shulman N, Frewen B, Baker TA, Brusniak MY, Paulse C, Creasy D, Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre B, Kuhlmann F, Roark J, Rainer P, Detlev S, Hemenway T, Huhmer A, Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW, Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, and Mallick P (2012) A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920.
- (53) Mahieu NG, and Patti GJ (2017) Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer than 1000 Unique Metabolites. Anal. Chem. 89, 10397–10406.
- (54) Ramus C, Hovasse A, Marcellin M, Hesse AM, Mouton-Barbosa E, Bouyssie D, Vaca S, Carapito C, Chaoui K, Bruley C, Garin J, Cianferani S, Ferro M, Dorssaeler AV, Burlet-Schiltz O, Schaeffer C, Coute Y, and Gonzalez de Peredo A (2016) Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods. Data Brief 6, 286–294.
- (55) Li S, and Mason CE (2014) The pivotal regulatory landscape of RNA modifications. Annu. Rev. Genomics Hum. Genet. 15, 127–150.
- (56) Jonkhout N, Tran J, Smith MA, Schonrock N, Mattick JS, and Novoa EM (2017) The RNA modification landscape in human disease. RNA 23, 1754–1769.
- (57) Lobue PA, Yu N, Jora M, Abernathy S, and Limbach PA (2019) Improved application of RNAModMapper - An RNA modification mapping software tool - For analysis of liquid chromatography tandem mass spectrometry (LC-MS/MS) data. Methods 156, 128–138.
- (58) Boccaletto P, Machnicka MA, Purta E, Piatkowski P, Baginski B, Wirecki TK, de Crecy-Lagard V, Ross R, Limbach PA, Kotter A, Helm M, and Bujnicki JM (2018) MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 46, D303–D307.
- (59) Guo J, and Turesky RJ (2019) Emerging Technologies in Mass Spectrometry-Based DNA Adductomics. High Throughput 8, 13.








