Author manuscript; available in PMC: 2022 Apr 27.
Published in final edited form as: Anal Chem. 2021 Apr 12;93(16):6491–6500. doi: 10.1021/acs.analchem.1c00362

Comprehensive Analysis of DNA Adducts Using Data-Independent wSIM/MS2 Acquisition and wSIM-City

Scott J Walmsley 1, Jingshu Guo 2, Paari Murugan 3, Christopher J Weight 4, Jinhua Wang 5, Peter W Villalta 6, Robert J Turesky 7
PMCID: PMC8675643  NIHMSID: NIHMS1696505  PMID: 33844920

Abstract

A novel software tool has been created to comprehensively characterize covalent modifications of DNA through mass spectral analysis of enzymatically hydrolyzed DNA, using the neutral loss of 2′-deoxyribose, a nearly universal MS2 fragmentation process of protonated 2′-deoxyribonucleosides. These covalent modifications, termed DNA adducts, form through xenobiotic exposures or by reaction with endogenous electrophiles and can induce mutations during cell division and initiate carcinogenesis. DNA adducts are typically present at trace levels in the human genome, requiring a very sensitive and comprehensive data acquisition and analysis method. Our software, wSIM-City, was created to process mass spectral data acquired by wide selected ion monitoring (wSIM) with gas-phase fractionation, coupled to wide-window MS2 fragmentation. This untargeted approach can detect DNA adducts at levels as low as 1.5 adducts per 10⁹ nucleotides, a sensitivity sufficient for comprehensive analysis and characterization of DNA modifications in human specimens.


INTRODUCTION

DNA adducts form through exposure to xenobiotics or their metabolites, or through endogenous processes such as oxidative stress. If not repaired, some DNA adducts can disrupt normal cell functions and initiate cancer.1–3 Exogenous DNA adducts usually occur at very low levels, ranging from ∼1 adduct per 10⁸ to 1 per 10¹⁰ nucleotides (nts). ³²P-Postlabeling methodologies have served as the major platform to screen for DNA adducts during the past 4 decades.4 This technique is extremely sensitive, approaching limits of detection of 1 adduct per 10¹⁰ nts for some adducts, but it does not provide physicochemical data to confirm the identities of the putative DNA lesions. In contrast, liquid chromatography−electrospray ionization tandem mass spectrometry (LC-ESI-MSn) methods can screen for DNA adducts and provide spectral information for structural identification,5,6 and they have supplanted the ³²P-postlabeling assay as the preferred screening method. The simultaneous screening of multiple DNA adducts is the emerging field of DNA adductomics.

Data-dependent acquisition (DDA) and data-independent acquisition (DIA) MS methodologies, adopted from metabolomics and proteomics, are commonly used for targeted and untargeted DNA adductomics. DDA selects ions based on their MS signal abundance, and the adducts' structures are characterized by MSn (Figure 1A). In contrast, DIA selects and analyzes all ions in a sample using relatively wide mass isolation windows for MS and MS2.7 Gas-phase fractionation (GPF) divides the MS1 data collection into multiple wide m/z acquisition windows, increasing the sensitivity of the assay at the expense of the number of data points collected. We recently adopted GPF for DNA adductomic analysis using a DIA-based methodology called wide selected ion monitoring tandem mass spectrometry (wSIM/MS2), as illustrated in Figure 1B, and used it to detect DNA modifications down to a level of 1.5 adducts per 10⁹ nts of the genome.7 These adduct levels are comparable to those of some adducts detected in human tissues.8 In this approach, DNA is hydrolyzed to 2′-deoxynucleosides (dNs) and screened by wSIM/MS2 for the neutral loss of 2′-deoxyribose (dR) (Figure 1A), a near-universal MS2 fragmentation mechanism of protonated nucleosides (precursor ions) that serves to detect modified nucleobase ions (aglycones).5,9

Figure 1.


GPF technique and neutral loss screening by wSIM/MS2. (A) Characteristic neutral loss of dR from modified 2′-deoxynucleosides, using dG-C8-PhIP as an example. MS1 detected the dG-C8-PhIP [M + H]+ ion, and MS2 detected the aglycone guanine-C8-PhIP [BH2]+ following higher-energy collisional dissociation (HCD) at 25%. The molecular formulae of [M + H]+ and [BH2]+ are reported in blue; the red formula reports the difference between the [M + H]+ and [BH2]+ formulae (the neutral loss of dR). (B) In wSIM, the SIM acquisition windows are set to a constant m/z width, followed by HCD of all ions in each wide SIM window. The y-axis displays the m/z ranges for the wide SIM and MS2 scans, and the x-axis reports the ∼3 s duty cycle.

MS-based targeted analyses of animal models and human studies have revealed that DNA adducts are structurally diverse. Among others, DNA adducts are formed from polar N-nitroso compounds present in tobacco smoke and cured meats; from bulky aromatic amines, heterocyclic aromatic amines (HAAs), and polycyclic aromatic hydrocarbons (PAHs) formed in tobacco smoke or grilled meats;10–13 and from lipid peroxidation products formed endogenously as a result of oxidative stress or inflammation.14–16 wSIM/MS2 has successfully detected several known DNA adducts at trace levels in human tissues.7,17 A DNA adduct of 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP), an HAA mutagen formed in grilled beef and a potential human colorectal and prostate carcinogen, was identified in the genome of prostate cancer patients.7,13 A DNA adduct of the aromatic amine 4-aminobiphenyl (4-ABP), a bladder carcinogen present in tobacco smoke, was identified in the bladder DNA of cancer patients,17 and a DNA adduct of aristolochic acid I (AA-I), a human renal carcinogen present in some traditional herbal medicines, was identified in the renal cortex DNA of kidney cancer patients.7 However, automated searching of the DNA adductome for known and unknown DNA adducts displaying the signature dR loss has not been possible because existing software cannot process the GPF data generated by the wSIM/MS2 method. Here, we report our software solution, wSIM-City, developed specifically to process wSIM/MS2 data for the rapid, comprehensive, and untargeted detection of DNA adducts in genomic DNA at trace levels.8

Popular metabolomics software, including XCMS, MZmine 2, and MS-DIAL, cannot handle the multiple wSIM/MS2 data sets acquired by GPF unless the data are split into individual files by m/z acquisition range, which makes data analysis cumbersome. This incompatibility arises from the multiple MS1 scan events used to cover the MS1 range of interest. These programs are also limited in their ability to detect many DNA adducts with asymmetric chromatographic peak shapes18–20 because they assume Gaussian peak shapes. They also apply a low-intensity cutoff to remove interferences and noise in the MS data, which can remove the low signals often seen with trace-level DNA adducts.20–22 The challenges of trace-level feature finding have been discussed, the most problematic being the increase in the number of false features detected as the minimum intensity threshold is lowered.8 wSIM-City performs accurate, trace-level feature detection in the presence of noise and combines spectral peaks with near-identical masses into series of signal intensities representing a single chromatographically eluting peak at the low signal range of DNA adducts. wSIM-City thus provides sensitive and concordant detection of DNA adducts that undergo a neutral loss of dR (or any user-provided neutral loss) at parts-per-billion nucleoside levels in human biospecimens. Other features of wSIM-City include computation of an abundance ratio between the MS1 and MS2 signals, scoring to assess the quality of detected DNA adducts, and export of plots and tables of results (Figure 2). wSIM-City provides the user with an annotated list of candidate DNA adducts for further experimental evaluation and identification.

Figure 2.


Scheme for wSIM-City and resulting qualitative plots. (A) Scheme for the algorithm (left) and corresponding visualization of the method (right). (B) Qualitative plot includes XIC (upper left), raw peak profile (upper right), identified MS1 peaks at maximum intensity (lower left) and MS2 peaks at maximum intensity (lower right). Blue and red lines indicate [M + H]+ and [BH2]+ ion signals, respectively. The title reports the DNA adduct name (if known), and detected [M + H]+ m/z and retention time in minutes. (C) Example of a DNA adductome map. A web view bubble plot of DNA adducts is produced for every analysis. High scoring DNA adducts are visualized in an interactive web plot with colors indicating the calculated scores.

In this article, we report our wSIM-City software and demonstrate its power to screen for DNA adducts of environmental and dietary carcinogens in the rat model and the prostate genome of prostate cancer patients.

MATERIALS AND METHODS

Materials.

A complete description of the chemical reagents is reported in the Supporting Information.

Sample Preparation and Mass Spectrometry.

Sample preparation, animal dosing, tissue collection, wSIM/MS2, and CNL-DDA/MS3 data acquisition and methods utilized to process data, develop, and test the software are described in the Supporting Information.

Neutral Loss Search Algorithm.

Search for Adduct Masses in the NL Data.

The algorithm first computes the global characteristic distributions of peak mass errors for each proposed precursor ([M + H]+) and aglycone ([BH2]+) ion pair and then scores each candidate DNA adduct. Each set of potential adduct ion signals must fall within user-provided retention time and mass windows, typically two duty cycles (∼6 s in our case) and 20 ppm. The wide 20 ppm m/z tolerance allows the algorithm to model the full distribution of dR-loss mass errors, from which each candidate receives a score between 0 and 1. This distribution of dR mass errors is Gaussian in shape and provides the basis for scoring each candidate DNA adduct. Details and equations describing the search and scoring scheme can be found in the Supporting Information.
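The Gaussian scoring idea can be illustrated with a minimal sketch. It assumes that each dR-loss error is expressed in ppm of the expected [BH2]+ m/z and that the score is the unnormalized Gaussian exp(−z²/2) of the error relative to the fitted global mean and SD; the exact equations used by wSIM-City are in the Supporting Information and may differ. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

DR_LOSS = 116.0473  # monoisotopic mass of the 2'-deoxyribose neutral loss (C5H8O3), Da


def dr_loss_error_ppm(precursor_mz: float, aglycone_mz: float, loss: float = DR_LOSS) -> float:
    """ppm deviation of the observed aglycone from the m/z expected after dR loss.

    Assumption: the error is expressed relative to the expected [BH2]+ m/z.
    """
    expected = precursor_mz - loss
    return (aglycone_mz - expected) / expected * 1e6


def fit_error_model(errors_ppm: list[float]) -> tuple[float, float]:
    """Estimate the global Gaussian (mean, SD) of dR-loss errors from all candidate pairs."""
    return mean(errors_ppm), stdev(errors_ppm)


def nl_score(error_ppm: float, mu: float, sigma: float) -> float:
    """Score in (0, 1]: 1 at the centre of the error distribution, Gaussian decay away from it."""
    z = (error_ppm - mu) / sigma
    return math.exp(-0.5 * z * z)


# Hypothetical errors (ppm) collected from all candidate pairs inside the 20 ppm window
errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
mu, sigma = fit_error_model(errors)

# dG-C8-PhIP [M + H]+ at m/z 490.1942 paired with an aglycone peak at m/z 374.1469
err = dr_loss_error_ppm(490.1942, 374.1469)
score = nl_score(err, mu, sigma)
```

Fitting the distribution to all candidate pairs, rather than fixing a tolerance in advance, lets the score adapt to the mass accuracy actually achieved in a given run.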

Construction of Extracted Ion Chromatograms and Selection of Peaks.

For each SIM and NL feature pair, extracted ion chromatograms (XICs) are constructed using methods adopted from MS-DIAL.21 Briefly, m/z values from each SIM and NL feature are grouped into a user-defined m/z window, typically 0.01 m/z. For every m/z detected in these windows across the entire chromatographic elution, matching [BH2]+ m/z peaks are searched in the MS2 scan within a ppm tolerance window (default 7 ppm). These peaks are then indexed and grouped as separate peaks belonging to a single feature. Only XICs with chromatographic peaks acquired with a minimum of 3 scans and a maximum of one missed scan, or "gap", are considered. XIC statistics are then generated by computing the mean m/z ± ppm deviation from all m/z peaks belonging to each XIC. The XICs are smoothed using a lowess smoothing algorithm (native R function), and the maximum peak intensity is used to assign the retention time of each XIC peak. Our approach differs from the MS-DIAL implementation in two main ways. First, there is no minimum peak intensity setting to filter noise, which is the primary advantage of our approach for trace-level detection. Second, XICs are constructed only for the originally matched SIM ([M + H]+) and NL ([BH2]+) mass peaks, whose retention times (RT) are computed from the maximum detected intensities in the XICs. For each XIC detected, the monoisotopic peak ([M + 0]) and its first isotopologue ([M + 1]) are grouped by searching for XIC m/z values that lie +1.00335 m/z ± 7 ppm (or any user-defined ppm value) from the monoisotope and have a relative abundance <0.5 times that of the monoisotope.
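The XIC acceptance rule and the isotopologue grouping described above can be sketched as follows. This is a simplified illustration, not the wSIM-City code: it assumes scan indices count consecutive duty cycles for one wSIM window, interprets "a maximum of one missed scan" as at most one missing cycle across the whole peak, and assumes singly charged ions so that the ¹³C spacing is 1.00335 m/z. Function names are hypothetical.

```python
C13_SHIFT = 1.00335  # m/z spacing between [M + 0] and [M + 1] for a singly charged ion


def is_valid_xic(scan_indices: list[int], min_scans: int = 3, max_missed: int = 1) -> bool:
    """Accept an XIC peak with >= min_scans detections and <= max_missed missing cycles.

    Assumption: missed scans are counted over the whole peak, not per gap.
    """
    scans = sorted(set(scan_indices))
    if len(scans) < min_scans:
        return False
    missed = (scans[-1] - scans[0] + 1) - len(scans)
    return missed <= max_missed


def find_m1_isotopologue(mono_mz, mono_intensity, candidates, ppm_tol=7.0, max_ratio=0.5):
    """Return the (mz, intensity) candidate that best matches the [M + 1] peak, or None.

    A match must lie at mono_mz + 1.00335 within ppm_tol and be less than
    max_ratio times as abundant as the monoisotopic peak.
    """
    target = mono_mz + C13_SHIFT
    hits = [
        (mz, inten)
        for mz, inten in candidates
        if abs(mz - target) / target * 1e6 <= ppm_tol and inten < max_ratio * mono_intensity
    ]
    return min(hits, key=lambda hit: abs(hit[0] - target)) if hits else None


print(is_valid_xic([10, 11, 12]))  # True
print(is_valid_xic([10, 12, 14]))  # False: two missed cycles
print(find_m1_isotopologue(490.1942, 1.0e5, [(491.1976, 3.0e4), (491.2100, 2.0e4)]))
```

The abundance condition mirrors the <0.5 relative-abundance rule in the text; for the nucleoside-sized molecules considered here, the [M + 1] peak is expected to be well below the monoisotope.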

Assignment of Formulae to Precursor and Aglycone Molecules and Computing Change in Atomic Formulae.

Molecular formulae are generated for each SIM and NL feature and serve as a heuristic filter to remove noise and false features. The formulae are computed for each feature using the Rdisop (v.4.42) decomposeMass function with a 2 ppm mass tolerance, applied to the m/z values minus the mass of a proton (1.00728 Da). Each formula is then filtered for validity using a minimum double bond equivalent (DBE) calculated for the modified nucleosides (for SIM features) or nucleobases (for NL features). The difference in formula between a SIM and NL feature is computed for each atom by subtracting the dR formula with the additional loss of hydrogen. Generally, for a detected SIM-NL feature pair to be an actual DNA adduct, the formula of the SIM feature (FormSIM), denoted CaHbNcOd, minus the formula of the dR loss (C5H8N0O3) should equal the formula of the NL feature (FormNL), denoted CwHxNyOz. Multiple candidate formulae can exist for each feature pair, so a Δformula metric is calculated for each paired FormSIM and FormNL to remove potential false matches: Δformula = ∑(atomSIM − atomNL), where the sum runs over each constituent atom, atomSIM is the count of that atom in CaHbNcOd − C5H8N0O3, and atomNL is its count in CwHxNyOz. A true neutral loss from a putative DNA adduct gives Δformula = 0. When multiple formula pairings exist for a putative adduct, the pair with the minimum formula-assignment mass error is chosen as the best candidate.
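The formula bookkeeping in this step can be illustrated with a short sketch. It assumes neutral formulas (after proton subtraction) written as simple element-count strings, uses the standard DBE expression for C, H, N, O formulas (DBE = C − H/2 + N/2 + 1), and implements Δformula as the signed sum defined above. It does not reproduce Rdisop's decomposeMass; the parser and function names are hypothetical.

```python
import re
from collections import Counter

# Monoisotopic element masses (Da) for CHNO formulas
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
DR = "C5H8N0O3"  # neutral loss of 2'-deoxyribose, written as in the text


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C10H13N5O4' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a CHNO formula."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def dbe(formula: str) -> float:
    """Double bond equivalent for a CHNO formula."""
    c = parse_formula(formula)
    return c["C"] - c["H"] / 2 + c["N"] / 2 + 1


def delta_formula(form_sim: str, form_nl: str, loss: str = DR) -> int:
    """Signed sum over atoms of (SIM - loss) - NL; 0 for a consistent dR-loss pair."""
    sim, nl, dr = parse_formula(form_sim), parse_formula(form_nl), parse_formula(loss)
    atoms = set(sim) | set(nl) | set(dr)
    return sum((sim[a] - dr[a]) - nl[a] for a in atoms)


# Unmodified dG (C10H13N5O4) losing dR gives guanine (C5H5N5O)
print(round(mono_mass(DR), 4))                 # 116.0473
print(dbe("C10H13N5O4"), dbe("C5H5N5O"))       # 7.0 6.0
print(delta_formula("C10H13N5O4", "C5H5N5O"))  # 0
```

Because the sum is signed, per-atom differences of opposite sign could in principle cancel; checking each atom separately is a stricter variant of the same test.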

Alignment of DNA Adducts Across Samples.

Alignment of SIM-NL feature pairs is performed using only the SIM features by applying the Join Align method described in MS-DIAL and MZmine 2.21,23 Briefly, a master feature list is assembled from all SIM features, and each sample's features are then searched against the master list and scored using eqs 5 and 6 (Supporting Information), as described by Tsugawa et al.,21 with user-defined m/z ppm error and RT tolerance windows. If a sample's feature is present in the list, its intensity is added to the data matrix for that feature. No synthetic feature values are produced (i.e., no "gap filling" is performed).
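A simplified join-align sketch follows. It assumes each feature is an (m/z, RT in min, intensity) tuple and that a feature matches a master entry only inside both the ppm and RT tolerances; among matches, the best one is chosen by a hypothetical combined Gaussian similarity, not by the exact eqs 5 and 6 of the Supporting Information. Tolerance defaults are placeholders. As in the text, entries with no observed feature stay at zero (no gap filling).

```python
import math


def _similarity(mz, rt, ref_mz, ref_rt, ppm_tol, rt_tol):
    """Gaussian-style similarity in (0, 1]; None if outside either tolerance."""
    d_ppm = abs(mz - ref_mz) / ref_mz * 1e6
    d_rt = abs(rt - ref_rt)
    if d_ppm > ppm_tol or d_rt > rt_tol:
        return None
    return math.exp(-0.5 * (d_ppm / ppm_tol) ** 2) * math.exp(-0.5 * (d_rt / rt_tol) ** 2)


def _best_match(mz, rt, master, ppm_tol, rt_tol):
    """Index of the most similar master entry, or None if nothing is within tolerance."""
    best, best_score = None, 0.0
    for i, (ref_mz, ref_rt) in enumerate(master):
        s = _similarity(mz, rt, ref_mz, ref_rt, ppm_tol, rt_tol)
        if s is not None and s > best_score:
            best, best_score = i, s
    return best


def join_align(samples, ppm_tol=10.0, rt_tol=0.1):
    """samples: {name: [(mz, rt_min, intensity), ...]} -> (master list, abundance matrix)."""
    master = []
    # 1) Build the master feature list from all SIM features.
    for feats in samples.values():
        for mz, rt, _ in feats:
            if _best_match(mz, rt, master, ppm_tol, rt_tol) is None:
                master.append((mz, rt))
    # 2) Record observed intensities only; absent features stay 0 (no gap filling).
    matrix = [{name: 0.0 for name in samples} for _ in master]
    for name, feats in samples.items():
        for mz, rt, intensity in feats:
            idx = _best_match(mz, rt, master, ppm_tol, rt_tol)
            if idx is not None:
                matrix[idx][name] = max(matrix[idx][name], intensity)
    return master, matrix


samples = {
    "A": [(490.1942, 12.00, 1.0e5)],
    "B": [(490.1945, 12.05, 8.0e4), (382.1721, 8.00, 5.0e4)],
}
master, matrix = join_align(samples)
```

Here the two m/z 490.194 features (0.6 ppm and 0.05 min apart) collapse into one master entry, while the m/z 382.1721 feature seen only in sample B keeps a zero for sample A.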

Data Imputation, Statistics, and Hierarchical Clustering.

DNA adduct abundances aligned using the Join Align method were imputed when necessary to allow statistical significance testing and hierarchical clustering. Zero abundance values were replaced with values sampled from a normal distribution using the rnorm function in R 3.2, with the mean set to the minimum observed abundance and the standard deviation set to 20% of that minimum. Significance testing used Welch's t-test for unequal variances. Hierarchical clustering used Euclidean distance with Ward's clustering method.24
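The imputation rule and the Welch statistic can be sketched with the Python standard library. This assumes the "minimum observed abundance" is the smallest nonzero value of the same feature (row), which the text does not state explicitly, and it stops at the t statistic and Welch–Satterthwaite degrees of freedom. A two-sided p-value would come from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`, or directly from `scipy.stats.ttest_ind(a, b, equal_var=False)`. Function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def impute_zeros(row, rng, sd_fraction=0.2):
    """Replace zeros with draws from Normal(min_obs, sd_fraction * min_obs).

    Assumption: min_obs is the smallest nonzero abundance of this feature (row).
    """
    observed = [v for v in row if v > 0]
    if not observed:
        return list(row)
    floor = min(observed)
    return [v if v > 0 else rng.gauss(floor, sd_fraction * floor) for v in row]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


rng = random.Random(0)  # fixed seed for reproducibility
treated = impute_zeros([0.0, 4.0e4, 5.5e4], rng)
control = [1.0e4, 1.2e4, 0.9e4]
t, df = welch_t(treated, control)
```

Imputing around the detection floor rather than with zeros keeps the variance estimates finite, which both the t-test and Euclidean clustering need.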

Plotting.

Individual DNA adduct plots are generated using the ggplot function from the ggplot2 R library (v2.3), together with a custom plotting method and a user-defined minimum score (eq 5, Supporting Information).

Web-Based Viewing.

Interactive plots (DNA adduct QC plots and DNA adduct maps) and data charts are generated using the plotly and DT R packages and saved as HTML using the htmlwidgets R library. DNA adduct maps use the NL feature intensities for bubble size, and detected known adducts are colored. Individual DNA adduct plots include XIC traces and both SIM and NL spectra with detected isotopologues, taken from the scans nearest to the maximum intensity detected in the XICs.

Data Storage.

Data produced as a result of intermediate processing steps, the final data, and alignment data are stored as a SQLite 3 database. Interoperability with wSIM-City is facilitated by using the RSQLite v 2.2.0 R package.25
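Because the results are stored in an SQLite 3 file, they can also be read outside R. The sketch below uses Python's built-in sqlite3 module with a hypothetical `features` table; the table and column names are illustrative assumptions, not wSIM-City's actual schema.

```python
import sqlite3

# In-memory database standing in for a wSIM-City results file (schema is hypothetical)
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE features (
           sample TEXT, precursor_mz REAL, aglycone_mz REAL, rt_min REAL, score REAL)"""
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
    [
        ("T1_1", 490.1942, 374.1469, 12.00, 0.97),
        ("T1_1", 382.1721, 266.1248, 8.00, 0.62),
    ],
)
# Retrieve high-scoring candidates, mirroring the S_NL > 0.9 cutoff used in the text
high = conn.execute(
    "SELECT precursor_mz, rt_min FROM features WHERE score > ? ORDER BY precursor_mz",
    (0.9,),
).fetchall()
conn.close()
print(high)
```

Keeping intermediate and final tables in a single file-based database makes a run easy to archive and to query with any SQLite client.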

RESULTS AND DISCUSSION

wSIM-City Algorithm and Informatics Workflow.

The wSIM-City algorithm searches MS1 spectra and the subsequent MS2 ion signals for the characteristic neutral loss of dR (−116.0473 Da, Figure S1). Candidate mass peaks with this characteristic neutral loss are aggregated into pairs of precursor ([M + H]+) and aglycone ([M + H]+ minus dR, denoted [BH2]+) peaks. At this point, the putative DNA adduct list can include more than 100,000 entries, most of which are false positives. Similar mass peaks are then aggregated into features using the method employed by MS-DIAL, but without a low-intensity cutoff.21 A score based on the mass error (δM) and retention time difference (δRT) between the [M + H]+ and [BH2]+ ions (Figures 2A and S1) is calculated and used to reduce the number of false positives. The detected δM and δRT values follow a Gaussian distribution and are used in a manner similar to MS-DIAL's scoring during retention time alignment of features between samples.21 After isotopologues of both [M + H]+ and [BH2]+ peaks are identified, molecular formulae are generated for both using the R package Rdisop.26 The list of putative DNA adducts is further filtered and refined using heuristics commonly applied in metabolomic analyses, including the seven golden rules for molecular formula generation, which include DBE, atomic ratios, isotopic ratios, and the Lewis atomic rules, among others.27 wSIM-City further refines the DNA adduct list using a minimum DBE cutoff, determined by the DBE of unmodified nucleobases, and a comparison of the generated formulae between the [M + H]+ and [BH2]+ peaks. XICs for the [M + H]+ and corresponding [BH2]+ ions of the refined candidate list are generated, and the integrated chromatographic peak areas are used to calculate [BH2]+/[M + H]+ abundance ratios. A user-defined [BH2]+/[M + H]+ ratio range can be set to remove likely false-positive DNA adducts.
The XICs are plotted (Figure 2B) to aid visual confirmation of candidate DNA adducts and further reduce the number of false positives. Interactive landscape maps of high-scoring putative DNA adducts are also produced (Figure 2C), which include tentative structure assignments to previously documented DNA adducts from the literature. wSIM-City was employed to identify DNA adducts formed in the rat liver genome after exposure to a cocktail of genotoxic carcinogens and in genomic DNA from tumor-adjacent normal prostate tissues of prostate cancer patients.13
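The first step of the workflow, pairing MS1 peaks with MS2 peaks that sit one dR mass lower, can be sketched as a sorted-list search. This assumes peaks are (m/z, intensity) tuples from one wSIM window and the wide MS2 scan of that window in the same duty cycle, and it uses a ppm tolerance relative to the expected [BH2]+ m/z (10 ppm here, the value quoted for the spike-in analysis). The function name is hypothetical; the `loss` argument stands in for the user-provided neutral loss.

```python
from bisect import bisect_left, bisect_right

DR_LOSS = 116.0473  # Da, neutral loss of 2'-deoxyribose


def pair_neutral_loss(ms1_peaks, ms2_peaks, loss=DR_LOSS, ppm_tol=10.0):
    """Pair MS1 peaks with MS2 peaks lying `loss` Da lower (within ppm_tol).

    ms1_peaks: (mz, intensity) from one wSIM window; ms2_peaks: (mz, intensity)
    from the wide MS2 scan of the same window in the same duty cycle.
    Returns (precursor_mz, aglycone_mz, precursor_int, aglycone_int) tuples.
    """
    ms2 = sorted(ms2_peaks)
    ms2_mz = [mz for mz, _ in ms2]
    pairs = []
    for mz1, int1 in ms1_peaks:
        target = mz1 - loss  # expected [BH2]+ m/z
        tol = target * ppm_tol * 1e-6
        lo, hi = bisect_left(ms2_mz, target - tol), bisect_right(ms2_mz, target + tol)
        pairs.extend((mz1, mz2, int1, int2) for mz2, int2 in ms2[lo:hi])
    return pairs


ms1 = [(490.1942, 1.2e5), (400.0000, 5.0e4)]
ms2 = [(374.1469, 9.0e4), (284.5000, 1.0e4)]
print(pair_neutral_loss(ms1, ms2))  # only the 490.1942 -> 374.1469 pair is returned
```

No intensity threshold is applied at this stage, which is why the raw pair list can run to more than 100,000 entries before feature grouping and scoring.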

Detection of Spike-in DNA Adducts in ctDNA.

Nine synthetic DNA adduct standards were spiked into calf thymus DNA (ctDNA) at levels of 0.8, 2.7, and 8.0 adducts per 10⁸ nts and analyzed for the loss of dR (−116.0473 Da) by wSIM/MS2.7,8 Using wSIM-City, we determined the true positive rate (TPR) for detection of the spike-in DNA adducts. The trace-level feature-finding algorithm reduced the mass peak list from (2.33 ± 0.27 RSD) × 10⁵ entries (MS1 mass peaks with corresponding MS2 mass peaks differing by 116.0473 Da ± 10 ppm) to a refined list of 9832 ± 0.22 RSD [M + H]+ and [BH2]+ pairs. A further reduction to 263 ± 0.18 RSD putative DNA adducts was achieved by applying the score cutoff for the dR loss (SNL > 0.9, Supporting Information) and filtering by heuristics. The composite score of 0.9 was chosen as the cutoff for high-confidence detection of spike-in DNA adducts; lower values resulted in false positives (S. Walmsley, unpublished observations). False-positive rates are challenging to estimate because endogenous DNA adducts and artifacts are present in all available DNA sources. Finally, the number of putative DNA adducts was reduced to 47 ± 0.69 RSD across all samples after applying the [BH2]+/[M + H]+ abundance ratio filter (ratio = 0.2× to 5×). Across all samples there were more candidate DNA adducts than spike-in standards, indicating the presence of putative endogenous DNA adducts in ctDNA. The identities of most of these adducts are unknown and will require further study for structural elucidation. However, the purpose of this first study was to optimize and validate the software by determining the detection rate of the spike-in adducts at levels of DNA modification present in the human genome.10–12
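The last two filters in this funnel, the score cutoff and the [BH2]+/[M + H]+ area-ratio window, can be sketched as below. The sketch assumes XIC areas are obtained by trapezoidal integration over retention time and treats the cutoff as strictly greater than 0.9, as in SNL > 0.9; the integration method and function names are assumptions, not taken from wSIM-City.

```python
def trapezoid_area(rts, intensities):
    """Integrate an XIC with the trapezoidal rule (RT in min)."""
    return sum(
        (rts[i + 1] - rts[i]) * (intensities[i] + intensities[i + 1]) / 2
        for i in range(len(rts) - 1)
    )


def keep_candidate(mh_xic, bh2_xic, score, min_score=0.9, ratio_range=(0.2, 5.0)):
    """Apply the score cutoff and the [BH2]+/[M + H]+ area-ratio window.

    mh_xic and bh2_xic are (rts, intensities) pairs for the two ions.
    """
    mh_area = trapezoid_area(*mh_xic)
    bh2_area = trapezoid_area(*bh2_xic)
    if score <= min_score or mh_area <= 0:
        return False
    ratio = bh2_area / mh_area
    return ratio_range[0] <= ratio <= ratio_range[1]


rts = [12.00, 12.05, 12.10]
print(keep_candidate((rts, [0, 1.0e4, 0]), (rts, [0, 8.0e3, 0]), score=0.95))  # True (ratio 0.8)
print(keep_candidate((rts, [0, 1.0e4, 0]), (rts, [0, 1.0e2, 0]), score=0.95))  # False (ratio 0.01)
```

A symmetric window (0.2× to 5×) tolerates adducts whose aglycone is either stronger or weaker than the precursor, while rejecting pairs where one ion is likely an unrelated coincident signal.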

The total number of detected putative DNA adducts and the TPR increased with increasing levels of spike-in adduct standards. The TPR for detecting the spike-in adducts using wSIM-City was 67, 89, and 100% at 0.8, 2.7, and 8.0 adducts per 10⁸ nts, respectively. The spike-in DNA adduct identities were confirmed using the wSIM-City qualitative XIC plots of the [M + H]+ and [BH2]+ peaks. An example of wSIM-City's web view qualitative plot is shown in Figure 2B for the spike-in DNA adduct N-(2′-deoxyguanosin-8-yl)-2-amino-3-methylimidazo[4,5-f]quinoline (dG-C8-IQ, an adduct of a carcinogen formed in cooked meat).28 Qualitative plots display the XICs of the [M + H]+ and [BH2]+ features, the raw detected mass peaks per scan, and the monoisotopic and [M + 1] isotopologue peaks (if above the limit of detection) for both ions. These plots assist in the detection and corroboration of likely true-positive DNA adducts. Figure 3 shows the XICs of three spike-in DNA adducts, N-(2′-deoxyguanosin-8-yl)-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (dG-C8-PhIP), N-(2′-deoxyguanosin-8-yl)-4-aminobiphenyl (dG-C8–4-ABP), and N-(2′-deoxyguanosin-8-yl)-2-amino-9H-pyrido[2,3-b]indole (dG-C8-AαC), demonstrating successful unbiased detection of the spike-in DNA adducts at levels of DNA damage that occur in humans. As reported by Guo et al.,7 some DNA adducts, such as 10-(2′-deoxyguanosin-N2-yl)-7,8,9-trihydroxy-7,8,9,10-tetrahydrobenzo[a]pyrene (dG-N2-B[a]PDE, m/z 570.1983), formed from the human lung carcinogen benzo[a]pyrene (B[a]P), are below the limit of detection because the B[a]PDE moiety generates multiple product ions at the MS2 scan stage, resulting in a low-intensity aglycone ion under the global collision energy (CE, 25%).
However, changing the wSIM-City search criteria to the neutral losses of 267.0967 Da (loss of dG, leaving the ion at m/z 303.1016, [B[a]PDE-F1]+) and 285.1073 Da (loss of dG and H2O, leaving the ion at m/z 285.0910, [B[a]PDE-F2]+), the major neutral fragments, allowed detection of dG-N2-B[a]PDE. Alternatively, lowering the global CE from 25% to 15% increased the [BH2]+ ion abundance for dG-N2-B[a]PDE and permitted its detection by wSIM/MS2 and wSIM-City (Figure S2). Most DNA adducts reported in the literature lose dR as a neutral fragment to give the aglycone ion as the base peak, if not the only peak, in the MS2 spectrum. This behavior is also observed in our DNA adductome database, in which >1000 MS2 and MS3 spectra have been collected from over 150 synthetic DNA adduct standards obtained from collaborators worldwide.31
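Switching from the dR loss to the B[a]PDE-specific losses amounts to searching with a different list of neutral-loss masses. A minimal sketch, assuming a simple ppm tolerance and using the masses quoted above; the dictionary and function names are hypothetical and are not wSIM-City's interface.

```python
NEUTRAL_LOSSES = {
    "dR": 116.0473,      # default: 2'-deoxyribose
    "dG": 267.0967,      # whole 2'-deoxyguanosine
    "dG+H2O": 285.1073,  # 2'-deoxyguanosine plus water
}


def match_losses(precursor_mz, ms2_mzs, losses=NEUTRAL_LOSSES, ppm_tol=10.0):
    """Return {loss_name: product_mz} for every loss whose product ion is observed."""
    found = {}
    for name, loss in losses.items():
        target = precursor_mz - loss
        for mz in ms2_mzs:
            if abs(mz - target) / target * 1e6 <= ppm_tol:
                found[name] = mz
                break
    return found


# dG-N2-B[a]PDE [M + H]+ at m/z 570.1983 with the two major product ions from the text
print(match_losses(570.1983, [303.1016, 285.0910]))
# → {'dG': 303.1016, 'dG+H2O': 285.091}
```

For this adduct the dR channel finds nothing, because the expected aglycone (m/z 454.1510) is weak at 25% CE, whereas both dG-based losses are matched.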

Figure 3.


Detected candidate peaks, features, and DNA adducts. (A,B) [M + H]+ and [BH2]+ XICs detected using the software for dG-C8-PhIP; (C,D) dG-C8–4-ABP; (E,F) dG-C8-AαC. XIC trace colors: gray, control (no spike-ins); green, 0.8 per 10⁸ nts; blue, 2.7 per 10⁸ nts; red, 8.0 per 10⁸ nts.

wSIM/MS2 Detection of DNA Adducts in the Liver of Rats Treated with a Cocktail of Carcinogens.

We applied wSIM/MS2 and wSIM-City to detect DNA adducts formed in the liver of male rats exposed to B[a]P, 4-ABP, PhIP, AαC, and MeIQx, which are rodent and potential human carcinogens. B[a]P and 4-ABP occur in tobacco smoke; PhIP, AαC, and MeIQx are HAAs that also form in tobacco smoke and cooked meat.10–13,28–30

wSIM-City detected many DNA adducts in the wSIM/MS2 data, including dG-C8-PhIP, dG-C8-MeIQx, dG-C8-AαC, and dG-C8–4-ABP. Minor adducts, including the isomeric N-(2′-deoxyguanosin-N2-yl)-4-aminobiphenyl (dG-N2-N4-4-ABP) adduct and a dA adduct of 4-ABP, N-(2′-deoxyadenosin-8-yl)-4-aminobiphenyl (dA-C8–4-ABP), were also identified.29 Statistical analysis comparing carcinogen-treated (N = 3) and control (N = 3) rats, together with hierarchical clustering of the DNA adduct [M + H]+ ion intensities, confirmed a global increase in DNA adducts in the treated group (Figure 4C). The DNA adduct identities were confirmed by DDA-constant neutral loss-multistage mass spectrometry (DDA-CNL/MS3) analysis (Figure 4E,G) by comparing MS3 spectra with entries in a mass spectral library.31

Figure 4.


Detection of putative DNA adducts in rats treated with a cocktail of carcinogens. (A,B) DNA adductome maps of liver DNA from control and treated rats, showing significantly altered levels of DNA adducts (p < 0.1, Welch's t-test). The alignment of DNA adduct peaks across samples was used to statistically evaluate changes in the DNA adduct profiles in the liver of rats after exposure to carcinogens. Bubble sizes indicate relative abundance, blue labels report the detected m/z values, and red text gives the names of known putative DNA adducts. (C) Hierarchical clustering of significant features (p < 0.1) in control and treated rats. "C" = control, "T" = treatment; the first number in each name is the sample number and the second is the replicate number. (D) Extracted ion chromatograms (XICs) for dG-C8–4-ABP formed in treated rats; blue lines represent the control group and red lines the treatment group. The known structure is shown with observed fragments annotated. (E) MS3 library match to a publicly available, validated spectrum using the NIST MS Search v2.3 program. (F,G) XIC and MS3 spectrum for the identification of dG-C8-PhIP.

XICs from the wSIM-City analyses were aligned by retention time and m/z across samples. Some features had missing (zero) values in some samples because of low ion abundance; these were imputed using values corresponding to the lowest observed abundance (±20% RSD). Statistically significant changes in the levels of putative adducts in rat liver after carcinogen treatment were tested using the Shapiro–Wilk test, followed by hierarchical clustering. Thirty-nine putative DNA adducts increased significantly (p < 0.1) in the liver DNA of carcinogen-treated rats, whereas the levels of 10 other putative DNA adducts decreased. Of the 2552 putative DNA adducts with a high wSIM-City final score (SNL > 0.9) across all samples, only 334 were detected in two or more samples. Thus, peak alignment and repeated detection across multiple samples refine the number of positive hits and reduce the burden of manually validating thousands of candidate DNA adducts. Some DNA adducts were detected in both control and treatment groups, indicating a common exposure source(s), such as chemicals in the rodent diet or endogenous electrophiles.32,33 Some of the putative endogenous DNA adducts were present at higher levels in treated animals than in control animals (Figure 4C).
High-dose treatment of rats with carcinogens is known to induce oxidative stress and produce reactive intermediates, such as lipid peroxidation products, that form DNA adducts.34 Many of these putative adducts are likely "true" adducts because their DDA-CNL/MS3 fragmentation spectra contain fragment ions attributable to protonated guanine, adenine, cytosine, and their related fragment ions (Figure 5).35–37 Thus, wSIM-City enabled rapid, automated detection of DNA adducts, allowed statistical evaluation of the resulting DNA adducts among treatment groups, and provided a manageable list of known and unknown putative DNA adducts for subsequent structural identification via the DDA-CNL/MS3 approach. Product ion spectra of several DNA adducts at the MS3 scan stage, with proposed annotations of major fragment ions, are shown in Supplementary Figure S3.10–13,28,30,37
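The reduction from 2552 high-scoring candidates to 334 detected in at least two samples is a simple count over the aligned abundance matrix. A sketch, assuming the matrix is a list of {sample: abundance} rows with zeros for non-detections, evaluated before any imputation so that imputed values are not counted as detections; the function name and example data are hypothetical.

```python
def detected_in_at_least(matrix, min_samples=2):
    """Indices of aligned features with nonzero abundance in at least min_samples samples."""
    return [
        i
        for i, row in enumerate(matrix)
        if sum(1 for value in row.values() if value > 0) >= min_samples
    ]


# Hypothetical aligned matrix: one row per feature, zeros where a feature was not detected
matrix = [
    {"C1": 0.0, "C2": 0.0, "T1": 3.1e4, "T2": 2.8e4},    # seen in two treated samples
    {"C1": 0.0, "C2": 0.0, "T1": 0.0, "T2": 1.5e4},      # single detection
    {"C1": 9.0e3, "C2": 1.1e4, "T1": 2.0e4, "T2": 2.2e4},
]
print(detected_in_at_least(matrix))  # [0, 2]
```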

Figure 5.


Product ion spectra of putative DNA adducts in DNA from the liver of rats treated with carcinogens, obtained by DDA-CNL/MS3. Potential DNA adducts detected by wSIM-City were characterized by MS3: (A) m/z 422.2036, a possible dA adduct; (B) m/z 424.2192, a possible dG adduct; (C) m/z 382.1973, a possible dC adduct; (D) m/z 376.1979, a possible etheno-dA adduct. For each compound, an annotated product ion spectrum at 50% HCD CE is reported, with proposed substructures (highlighted in blue) consistent with the ions of adenine, guanine, cytosine, and their related fragments.35–37 The structures of these ions are shown with their theoretical m/z values.

Untargeted Detection of DNA Adducts in Prostate Tissue of Cancer Patients.

wSIM-City profiled the DNA adductome in tumor-adjacent normal prostate tissue of the transition zone from four prostate cancer patients, two of whom had known levels of dG-C8-PhIP.13 Data were aligned, and 46 background signals detected in the enzymes and buffers used for DNA digestion were subtracted from the analysis. A database of 349 known DNA adducts, comprising 279 unique masses compiled from three previous DNA adductomic studies, was searched against the wSIM-City output to identify putative adducts that had been previously observed (Table S1).7,38,39
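Background subtraction and the known-adduct lookup can both be expressed as ppm-tolerance matches. A sketch, assuming a 5 ppm tolerance (a placeholder, not a value from the text) and features reduced to their [M + H]+ m/z. Because the database holds 349 adducts but only 279 unique masses, one feature can map to several names, so the lookup returns a list. Function names are hypothetical.

```python
def ppm_error(observed, reference):
    """Absolute mass error in ppm."""
    return abs(observed - reference) / reference * 1e6


def annotate_features(feature_mzs, background_mzs, known_adducts, ppm_tol=5.0):
    """Drop background m/z values, then list known adducts matching each remaining feature.

    known_adducts: [(name, mz), ...]; several names may share one mass.
    """
    kept = [
        mz for mz in feature_mzs
        if not any(ppm_error(mz, b) <= ppm_tol for b in background_mzs)
    ]
    return {
        mz: [name for name, ref in known_adducts if ppm_error(mz, ref) <= ppm_tol]
        for mz in kept
    }


known = [("dG-C8-PhIP", 490.1942), ("N2-dimethyldioxane-dG", 382.1721), ("HNE-dC", 382.1973)]
result = annotate_features([490.1942, 382.1973, 300.1000, 455.2000], [300.1000], known)
print(result)  # 455.2 maps to an empty list, i.e., an unknown putative adduct
```

An accurate-mass match of this kind only supports a tentative assignment; the MS3 spectra are what move an annotation from tentative to confirmed.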

After scoring and heuristic filtering, 361 putative DNA adducts were detected, including dG-C8-PhIP (m/z 490.1942) and the following lipid peroxidation adducts: N2-dimethyldioxane-dG (m/z 382.1721, derived from acetaldehyde);40 the 1,N2-propano-2′-deoxycytidine adducts of (E)-4-hydroxy-2-nonenal (HNE) and (E)-4-oxonon-2-enal (ONE) (HNE-dC, m/z 382.1973; ONE-dC, m/z 364.1867);41,42 3-(2′-deoxy-β-D-ribofuranosyl)-7-methyl-8-formyl[2,1-i]pyrimidopurine (M1AA-dA, m/z 334.1503);40 and 4-hydroxy-2-heptenal-dG (m/z 396.1870).38 DDA-CNL/MS3 analysis confirmed the presence of dG-C8-PhIP in two patients (Figure 6B). N2-Dimethyldioxane-dG, 4-OH-2-heptenal-dG, and HNE-dC were detected and tentatively identified based on their MS3 spectra (Figures 6C,D and S4). However, the signals of ONE-dC and M1AA-dA were too low to trigger quality MS3 spectra, and these adducts were tentatively assigned by accurate mass only. The identities of the other putative DNA adducts in prostate tissue are unknown, and their structures require further investigation because commercial standards are not available. Regardless, wSIM-City pared down the number of putative DNA adducts for future investigation. The large number of undocumented DNA adducts discovered by wSIM-City is consistent with previous reports revealing that the genome contains many putative DNA adducts of unknown structure.43–45

Figure 6.


Detection of DNA adducts in human prostate tissue. DNA adducts from prostate tissues of four cancer patients were profiled using wSIM-City and DDA-CNL/MS3. (A) DNA adductome map of aligned features, including background control-subtracted ions (pink), unknown DNA adducts (blue), and tentatively known DNA adducts (green). XICs of [M + H]+ and [BH2]+ ions and DDA-CNL/MS3 confirmation of (B) dG-C8-PhIP, (C) 4-OH-2-heptenal-dG, and (D) N2-dimethyldioxane-dG DNA adducts from a prostate cancer patient. The proposed annotated fragment ions of dG-C8-PhIP are reported in Figure 4F. The tentatively assigned structures for 4-OH-2-heptenal-dG and N2-dimethyldioxane-dG are shown in (C,D), with the proposed annotated fragment ions highlighted in blue.

CONCLUSIONS

Here, we describe our wSIM-City software for the untargeted detection of DNA adducts in data acquired by wSIM/MS2. The software searches for the loss of dR (or any user-defined mass difference between the paired wSIM and MS2 scans). Once a candidate analyte is identified, trace-level feature detection and XIC construction are performed. Scoring and heuristics are used to rank the quality of the candidate DNA adducts. The algorithm detected spiked-in bulky aromatic DNA adducts in ctDNA at high TPR, except at the lowest spike-in level, which was close to the detection limit. wSIM-City detected many DNA adducts in the liver of rats treated with carcinogens, and subsequent DDA-CNL/MS3 analysis confirmed the identities of DNA adducts of PhIP, 4-ABP, AαC, and MeIQx. However, many of the observed putative endogenous DNA adducts remain unidentified. We surmise that many of these adducts derive from endogenous electrophiles, possibly due to oxidative stress induced by the dosed carcinogens or by chemicals in the rodent diet.32,33 Finally, among the 361 putative DNA adducts detected in the human prostate, wSIM-City found one adduct of the cooked meat carcinogen PhIP and four tentatively identified as arising from lipid peroxidation (confirmatory analysis is needed); the remainder were of unknown structure.

Previous DNA adductomic studies also reported hundreds to thousands of putative unknown DNA adducts in the lung and esophagus of cancer patients.43−45 The early DNA adductomic studies of lung DNA conducted by Kanaly and co-workers were performed at nominal mass resolution with a triple quadrupole mass spectrometer, and the assignment of the neutral loss of the dR moiety for many putative DNA adducts was uncertain.43,44 It is not known whether many of these analytes were actual DNA adducts or simply other compounds that lose 116 Da upon collision-induced dissociation in MS/MS. Totsuka and colleagues also observed up to a thousand unknown, putative DNA adducts in the human esophagus when employing high-resolution quadrupole time-of-flight (QTOF) mass spectrometry (MS).45 One of these adducts was derived from N-nitrosopiperidine, a potential human esophageal carcinogen.
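The ambiguity of a nominal 116 Da loss can be made concrete with a short calculation. The sketch below compares the monoisotopic mass of the dR loss (C5H8O3) with that of C6H12O2, a composition picked here purely as a hypothetical isobaric example, not one reported in the studies cited above. Both round to 116 Da, so a unit-resolution instrument cannot tell them apart, but they differ by roughly 0.036 Da, which is well outside the few-ppm mass accuracy typical of Orbitrap or QTOF instruments.

```python
# Monoisotopic element masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def monoisotopic_mass(formula: dict) -> float:
    """Sum of monoisotopic element masses for an elemental composition."""
    return sum(MONO[element] * count for element, count in formula.items())


def ppm_shift(delta_da: float, mz: float) -> float:
    """Express a mass difference in Da as parts per million at a given m/z."""
    return delta_da / mz * 1e6


dr_loss = monoisotopic_mass({"C": 5, "H": 8, "O": 3})      # dR neutral loss
other_loss = monoisotopic_mass({"C": 6, "H": 12, "O": 2})  # hypothetical isobaric loss

# At unit (nominal) resolution the two losses look identical.
print(round(dr_loss), round(other_loss))

# At high resolution they differ by ~0.036 Da, i.e. >100 ppm at a fragment m/z of 300.
delta = other_loss - dr_loss
print(f"{dr_loss:.4f} {other_loss:.4f} {delta:.4f} Da, {ppm_shift(delta, 300.0):.0f} ppm")
```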

We are creating a mass spectral library of DNA adducts to assist in identifying this seemingly large number of DNA adducts.31 Future development of the data analysis workflow will include identification of additional fragment ions from the wSIM/MS2 data to generate “pseudo-MS2” spectra, which can be searched against our DNA adduct spectral library. However, structural elucidation of unknown putative DNA adducts will require alternative approaches, such as spectral similarity searching of MSn spectra, a promising approach for annotating unknown metabolic features in MS-based untargeted metabolomics.46,47 These findings suggest that the human genome contains a complex mixture of DNA adducts derived from multiple xenobiotics and endogenous electrophiles.
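Library searching of the kind mentioned above usually reduces to scoring how similar a query spectrum is to each library spectrum. One common generic score is the cosine (normalized dot product) over matched peaks; the sketch below implements a simple greedy version with raw intensities and a fixed m/z tolerance. It is only an illustration of that general technique: it is not the scoring used by wSIM-City, its planned pseudo-MS2 search, or the methods of refs 46 and 47, and the function name `cosine_similarity` and the 0.01 Da tolerance are assumptions.

```python
import math


def cosine_similarity(query, library, tol=0.01):
    """Greedy peak-matched cosine score between two centroided spectra.

    Spectra are lists of (mz, intensity) tuples. Each query peak is matched
    to the nearest not-yet-used library peak within `tol` Da; unmatched
    peaks contribute only to the norms.
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best, best_diff = None, tol
        for j, (mz_l, _) in enumerate(library):
            diff = abs(mz_q - mz_l)
            if j not in used and diff <= best_diff:
                best, best_diff = j, diff
        if best is not None:
            used.add(best)
            dot += int_q * library[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    if norm_q == 0 or norm_l == 0:
        return 0.0
    return dot / (norm_q * norm_l)
```

A score near 1 indicates closely matching fragment patterns, and a score of 0 indicates no shared peaks. Practical tools often add intensity weighting (for example, square-root scaling) or optimal rather than greedy peak assignment.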

wSIM-City software allows for rapid, unbiased detection and curation of DNA adducts using wSIM/MS2 scanning methods, greatly reducing the burden of manually mining wSIM/MS2 data for known and unknown DNA adducts. wSIM-City was tailored for MS data collected using the wSIM/MS2 method on a Tribrid Orbitrap MS, but it can be used with other high-resolution mass spectrometers that can fractionate MS1 ions in the gas phase using the same isolation ranges for MS1 and MS2. wSIM-City allows the user to profile exposures to carcinogens and endogenous electrophiles and to characterize the chemically modified genome.
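Because the neutral-loss search depends on each MS2 scan sharing its isolation range with a wSIM scan, a preprocessing step must pair the two. The sketch below pairs every MS2 scan with the most recent preceding wSIM scan that covers the same m/z window. It assumes scan metadata (scan order, MS level, isolation bounds) have already been read from the mzML files; the `Scan` class and `pair_wsim_ms2` function are hypothetical, and the R package may organize this step differently.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    index: int        # acquisition order
    ms_level: int     # 1 for a wSIM scan, 2 for a wide MS2 scan
    iso_low: float    # lower bound of the isolation window (m/z)
    iso_high: float   # upper bound of the isolation window (m/z)


def pair_wsim_ms2(scans, decimals=4):
    """Pair each MS2 scan with the most recent wSIM scan whose isolation
    range matches (bounds compared after rounding to `decimals`).

    MS2 scans with no preceding wSIM scan of the same window are skipped.
    """
    last_wsim = {}  # isolation window -> latest wSIM scan seen for it
    pairs = []
    for scan in sorted(scans, key=lambda s: s.index):
        window = (round(scan.iso_low, decimals), round(scan.iso_high, decimals))
        if scan.ms_level == 1:
            last_wsim[window] = scan
        elif scan.ms_level == 2 and window in last_wsim:
            pairs.append((last_wsim[window], scan))
    return pairs
```

Keying on the rounded window bounds rather than on scan adjacency keeps the pairing correct when an instrument cycles through several gas-phase fractions before returning to the first one.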

Supplementary Material

Supplementary Info and Table

Acknowledgments

Funding

This work was supported by the University of Minnesota Masonic Cancer Center, the National Cancer Institute (R01CA122320, R01CA220367, and R50CA211256), and the National Institute of Environmental Health Sciences (R01ES019564, U2CES026533, and R03ES031188). Mass spectrometry was supported by Cancer Center Support Grant CA077598 from the National Cancer Institute, and human biospecimens were supported by the National Center for Advancing Translational Sciences of the National Institutes of Health award number UL1TR000114. The Turesky laboratory gratefully acknowledges the support of the Masonic Chair in Cancer Causation.

Footnotes

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.1c00362.

Detailed materials and methods describing sample processing, mass spectrometry and detailed overview of algorithm, algorithm and detection of putative DNA adducts, and known DNA adducts (PDF)

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.analchem.1c00362

The authors declare no competing financial interest.

All mzML mass spectrometry raw data files are available from Figshare (https://figshare.com/s/46cecbb2ff9670c0b429, DOI: 10.6084/m9.figshare.13585262).

wSIM-City is available for download as an R package from GitHub at https://github.com/scottwalmsley/wSIMCity and is licensed using the Apache 2.0 license.

Contributor Information

Scott J. Walmsley, Masonic Cancer Center and Institute of Health Informatics, University of Minnesota, Minneapolis 55455, Minnesota, United States.

Jingshu Guo, Masonic Cancer Center and Department of Medicinal Chemistry, College of Pharmacy, University of Minnesota, Minneapolis 55455, Minnesota, United States.

Paari Murugan, Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis 55455, Minnesota, United States.

Christopher J. Weight, Glickman Urologic and Kidney Institute, Cleveland Clinic, Cleveland 44125, Ohio, United States; Case Comprehensive Cancer Center, Cleveland 44106, Ohio, United States.

Jinhua Wang, Masonic Cancer Center and Institute of Health Informatics, University of Minnesota, Minneapolis 55455, Minnesota, United States.

Peter W. Villalta, Masonic Cancer Center and Department of Medicinal Chemistry, College of Pharmacy, University of Minnesota, Minneapolis 55455, Minnesota, United States.

Robert J. Turesky, Masonic Cancer Center and Department of Medicinal Chemistry, College of Pharmacy, University of Minnesota, Minneapolis 55455, Minnesota, United States.

REFERENCES
