J Am Chem Soc. Author manuscript; available in PMC: 2011 Mar 17.
Published in final edited form as: J Am Chem Soc. 2010 Mar 17;132(10):3388–3399. doi: 10.1021/ja908282f

Structural Characterization of Formaldehyde-induced Cross-links Between Amino Acids and Deoxynucleosides and Their Oligomers

Kun Lu 1, Wenjie Ye 1, Li Zhou 2, Leonard B Collins 1, Xian Chen 2, Avram Gold 1, Louise M Ball 1, James A Swenberg 1
PMCID: PMC2866014  NIHMSID: NIHMS181814  PMID: 20178313

Abstract

Exposure to formaldehyde results in the formation of DNA-protein cross-links (DPCs) as a primary genotoxic effect. Although DPCs are biologically important and eight amino acids have been reported to form stable adducts with formaldehyde, the structures of these cross-links have not yet been elucidated. We have characterized formaldehyde-induced cross-links of Lys, Cys, His and Trp with dG, dA and dC. dT formed no cross-links, nor did Arg, Gln, Tyr or Asn. Reaction of formaldehyde with Lys and dG gave the highest yield of cross-linked products, followed by reaction with Cys and dG; yields from the other coupling reactions were lower by a factor of 10 or more. Detailed structural examination by NMR and mass spectrometry established that the cross-links between amino acids and single nucleosides involve a formaldehyde-derived methylene bridge. Lys yielded two additional products with dG in which the linking structure is a 1,N2-fused triazino ring. The Lys cross-linked products were unstable at ambient temperature. Cross-linked products were also formed in reactions of the reactive Nα-Boc-protected amino acids with the trinucleotides d(T1B2T3), where B2 is the target base G, A or C, and in reactions of dG, dA and dC with 8-mer peptides containing a single reactive target residue at position 5. The structures of these products, inferred from high resolution mass spectrometry and fragmentation patterns, are consistent with those of the cross-links between Nα-Boc-protected amino acids and single nucleosides rigorously determined by NMR. These structures will provide a basis for investigating the characteristics and properties of DPCs formed in vivo and will help in identifying biomarkers for evaluating formaldehyde exposure both at sites of contact and at distant sites.

Keywords: Formaldehyde, DNA-protein crosslink, Structure Elucidation

Introduction

Formaldehyde is a ubiquitous environmental contaminant, with human exposures occurring during its use in industrial processes such as the manufacture of resins, particle board, plywood, leather goods, paper and pharmaceuticals, and through its emission as a vapor over the lifetimes of these products1. Formaldehyde also occurs endogenously as a normal metabolic intermediate in human cells, as well as under certain pathological conditions such as oxidative stress. Although formaldehyde is a known carcinogen and mutagen and has therefore raised serious health concerns2-6, its mode of action is not well understood. Among the interactions of formaldehyde considered to have biological relevance are the induction of DNA adducts,7-14 protein modifications,15,16 inter-strand DNA cross-links,17-19 and DNA-protein cross-links (DPCs),5,6,20,21 with DPC formation considered the primary genotoxic effect of formaldehyde exposure.5,6 Two routes, shown in Scheme 1, may be responsible for DPC formation. In pathway A, formaldehyde addition at a nucleophilic site on a protein is followed by cross-linking between the resulting protein methylol adduct and a nucleophilic site on DNA, whereas in pathway B, addition at a nucleophilic site on a DNA base is followed by cross-linking between the DNA methylol adduct and a protein residue22. DPC formation is favored by intimate interactions between DNA and proteins22, and the lysine-rich DNA-binding histones have been reported to cross-link with DNA9,20,21 in the presence of formaldehyde.

Scheme 1.

The formation of formaldehyde-induced DPCs originating from the initial attack of formaldehyde on protein residues (A) and from the initial attack of formaldehyde on DNA (B).

Key to understanding the toxicity of formaldehyde is the detection and quantitation of DPCs as biomarkers of formaldehyde exposure, both at sites of contact and at sites removed from contact. Among the available techniques, those based on liquid chromatography coupled with mass spectrometry have shown great promise in recent years23. Selected ion monitoring (SIM), selected reaction monitoring (SRM) and multiple reaction monitoring (MRM) provide the high sensitivity and specificity needed for the quantitation of DNA adducts. To take advantage of these techniques, the molecular structures of the DPCs must be characterized and standards generated. In a previous communication, we described a formaldehyde-derived cross-link formed between the Cys sulfhydryl group of glutathione and dG, and suggested that this reaction may be relevant to systemic effects of formaldehyde transported as an S-hydroxymethylene conjugate of glutathione8. In the present work, we have established that cross-linking reactions occur between the amino acids Lys, Cys, His and Trp and the nucleosides dG, dA and dC. We have characterized the cross-links formed in the presence of formaldehyde between the reactive amino acids and the trinucleotides d(T1B2T3), where B2 is the target base G, A or C. We have also examined the cross-links formed in the presence of formaldehyde between dG, dA and dC and N-terminally protected 8-mer peptides containing a single target residue at position 5. Finally, we have determined by NMR and mass spectrometry the structures of the cross-links formed in the presence of formaldehyde between the potentially cross-linking amino acids and dG, dA and dC, as shown in Chart 1. These structures are expected to reflect cross-links formed in vivo in regions of single-stranded DNA and, by implication, in double-stranded DNA.

Chart 1.

Cross-linked adducts between amino acids and nucleosides identified in this study. Formaldehyde-derived linkages are shown in red.

Experimental Section

Chemicals

Potassium phosphate, ammonium bicarbonate, trifluoroacetic acid, formic acid, acetonitrile, methanol, Nα-(tert-butoxycarbonyl)-L-lysine, N-(tert-butoxycarbonyl)-L-cysteine, Nα-(tert-butoxycarbonyl)-L-histidine, Nα-(tert-butoxycarbonyl)-L-tryptophan, Nα-(tert-butoxycarbonyl)-L-arginine, Nα-(tert-butoxycarbonyl)-L-asparagine, Nα-(tert-butoxycarbonyl)-L-glutamine, Nα-(tert-butoxycarbonyl)-L-tyrosine, deoxyadenosine, deoxythymidine, deoxycytidine and deoxyguanosine were purchased from Sigma (St. Louis, MO). The N-terminally acetylated 8-mer peptides Ac-VEGGRGAA, Ac-VEGGQGAA, Ac-GEGGCGAA, Ac-GEGGYGAA, Ac-VEGGKGAA, Ac-VEGGNGAA, Ac-GEGGHGAA and Ac-GEGGWGAA were synthesized by GenScript Corporation (Piscataway, NJ). Formaldehyde (20% in water) was purchased from Tousimis (Rockville, MD), and 13CH2O from Cambridge Isotope Laboratories (Andover, MA). All chemicals were used as received unless otherwise stated.

Instrumental methods

Nuclear magnetic resonance (NMR) analysis

NMR spectra were recorded on a Varian INOVA NMR spectrometer (Varian, Inc., Palo Alto, CA) equipped with a Varian cold probe, at 500 MHz for 1H and 125 MHz for 13C.

Liquid Chromatography-Mass Spectrometry (LC-MS)

LC-MS analyses were performed on an LCQ-Deca ion trap mass spectrometer (Thermo Electron, Waltham, MA), a TSQ Quantum Ultra triple quadrupole mass spectrometer (Thermo Electron, Waltham, MA) or a Q-TOF LC/MS (Model 6240, Agilent Technologies, Santa Clara, CA), each equipped with an electrospray ionization (ESI) source. Analytes were separated by reverse phase HPLC on a 250 mm × 2.5 mm C18 analytical column (Grace Vydac, Hesperia, CA) eluted at 200 μL/min with a linear gradient from 2% to 60% solvent B in A over 15 min, where solvent A was 0.1% formic acid in water and solvent B was methanol. The ion trap mass spectrometer was operated in full scan and data-dependent scan modes. For fragmentation of precursor ions, the normalized collision energy of the ion trap was varied from 30% to 35% depending on the adduct, with an activation time of 30 ms. The triple quadrupole mass spectrometer was operated in full scan, parent scan, MS/MS scan, SIM or SRM mode, with the collision energy set at 17 V for most fragmentation experiments. High resolution Q-TOF mass spectra were obtained on an Agilent 6250 Accurate Mass Q-TOF LC/MS (Agilent Technologies, Santa Clara, CA) equipped with a dual spray ESI source. For liquid chromatography, a Hypersil Gold column (150 × 2.1 mm, 3 μm particle size; Thermo Scientific, Waltham, MA) was eluted at 200 μL/min with a linear gradient from 2% acetonitrile in 0.1% formic acid to 80% acetonitrile over 15 min. The ESI source settings were: gas temperature, 350 °C; drying gas, 10 L/min; Vcap, 4000 V; nebulizer, 35 psig; fragmentor, 100 V; skimmer, 65 V.

Fourier-transform ion cyclotron resonance mass spectra (FTICR-MS; 10 scans) were acquired on a hybrid Qe-Fourier transform ion cyclotron resonance mass spectrometer with a 12 T actively shielded magnet (Apex Qe-FTICR-MS, 12.0 T AS, Bruker Daltonics, Billerica, MA, USA) and an Apollo II microelectrospray (μESI) source. The voltages on the μESI sprayer, interface plate, heated capillary exit, deflector, ion funnel and skimmer were set at 4.2 kV, 3.9 kV, 300 V, 250 V, 175 V and 30 V, respectively. The μESI source temperature was maintained at 180 °C. Desolvation was carried out using a nebulization gas flow (2 bar) and a countercurrent drying gas flow. Before transfer, ion packets were accumulated in the collision cell for 0.02 s. Sample solutions were infused at 90 μL/h from a 250 μL syringe (Hamilton, Reno, NV, USA) driven by a syringe pump (Cole Parmer, Vernon Hills, IL, USA).

Formation of formaldehyde-induced cross-links

Typical reaction mixtures (final volume 50 μL) consisted of amino acid (1 mM) and deoxynucleoside (1 mM) dissolved in potassium phosphate buffer (10 mM, pH 7.2) to which formaldehyde was added to a final concentration of 50 mM. After 48 h, the reaction mixtures were either separated by reverse phase HPLC or analyzed by LC-MS. Lys-dG coupling reactions were carried out with 5, 50 and 100 mM formaldehyde. Reactions using 50 mM formaldehyde were run for 48, 60, 72, and 84 h. The coupling reaction was repeated with 50 mM 13C-formaldehyde for 48 h.
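If the 13C-formaldehyde experiment serves to count formaldehyde-derived carbons in each Lys-dG product (that purpose is our assumption, as is complete 13C labelling of the reagent), the expected mass shifts follow from the structures described in this paper: one methylene bridge in Lys-CH2-dG, two in TPHA-1, and two plus an N5-hydroxymethyl group in TPHA-2. A minimal Python sketch of that arithmetic, with illustrative names of our own:

```python
# Mass added per 12C -> 13C substitution (monoisotopic masses, u)
C13_MINUS_C12 = 13.0033548378 - 12.0

# Formaldehyde-derived carbons per product, counted from the proposed structures.
# The dictionary is illustrative; only the counts come from the structures.
FORMALDEHYDE_CARBONS = {"Lys-CH2-dG": 1, "TPHA-1": 2, "TPHA-2": 3}

def label_shift(n_carbons: int) -> float:
    """Expected mass increase (u) if n formaldehyde-derived carbons are 13C."""
    return n_carbons * C13_MINUS_C12

for product, n in FORMALDEHYDE_CARBONS.items():
    print(f"{product}: +{label_shift(n):.4f} u")
```

Because each formaldehyde carbon adds about 1.0034 u, the number of incorporated formaldehyde units can in principle be read directly from the shift of the labelled product relative to the unlabelled one.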

Formaldehyde-induced coupling between peptides and deoxynucleosides was accomplished in the same manner in 50 μL reaction volumes with final concentrations of 50 mM formaldehyde, 1.5 mM peptide and 5 mM deoxynucleoside. For coupling between amino acids and trinucleotides in 50 μL reaction volumes, final concentrations were 50 mM formaldehyde, 5 mM amino acid and 1 mM trinucleotide.
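The stock volumes behind these final concentrations follow from C1V1 = C2V2. The sketch below assumes that the 20% formaldehyde stock is 20% w/v (20 g per 100 mL, about 6.66 M using 30.026 g/mol for CH2O); that interpretation of the stock concentration is our assumption and is not stated in the text, and the function names are hypothetical.

```python
FORMALDEHYDE_MW = 30.026  # g/mol, CH2O

def molarity_from_percent_wv(percent: float, mw: float) -> float:
    """Convert % (w/v, g per 100 mL) to mol/L."""
    return percent * 10.0 / mw  # g/L divided by g/mol

def stock_volume_ul(final_mM: float, final_volume_ul: float, stock_mM: float) -> float:
    """Solve C1*V1 = C2*V2 for the stock volume V1 (uL)."""
    return final_mM * final_volume_ul / stock_mM

# Assumed 20% (w/v) stock, converted to mM
stock_mM = molarity_from_percent_wv(20.0, FORMALDEHYDE_MW) * 1000.0
# 50 mM formaldehyde in a 50 uL reaction
v = stock_volume_ul(50.0, 50.0, stock_mM)
print(f"stock ~ {stock_mM / 1000:.2f} M; {v:.3f} uL per 50 uL reaction")
```

Under this assumption only about 0.38 μL of stock is needed per 50 μL reaction, so in practice an intermediate dilution of the stock would make pipetting more accurate.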

Large scale reactions between amino acids and deoxynucleosides for structural characterization by NMR were run with 20 mg of deoxynucleoside and 40 mg of Nα-Boc-amino acid in 5 mL of 10 mM potassium phosphate buffer (pH 7.2) containing 100 mM formaldehyde for 12 h to 1 week, with monitoring by HPLC until the chromatographic trace remained constant. The reaction mixtures were separated by semi-preparative HPLC, and the collected products were characterized by mass spectrometry (Table 2) and NMR. For the Lys-dG reactions, HPLC fractions were collected on dry ice, lyophilized and stored at −80 °C until analysis. Exact masses of cross-linked products are tabulated in the text (Table 4). MS/MS data are presented in full as Supporting Information. 1H and 13C chemical shifts are listed below.

Table 2.

Exact masses of protonated 8-mers cross-linked to deoxynucleosides by formaldehyde determined by positive ion ESI-QTOF-MS.

Cross-link | Experimental [MH]+ | Calculated [MH]+ | Composition | Δ (ppm)
Acetyl-VEGG(K-CH2-dG)GAA | 1009.4687 | 1009.4697 | C41H65N14O16+ | −1.0
Acetyl-VEGG(TPHA-1)GAA | 1021.4702 | 1021.4697 | C42H65N14O16+ | 0.5
Acetyl-VEGG(TPHA-2)GAA | 1051.4805 | 1051.4803 | C43H67N14O17+ | 0.2
Acetyl-GEGG(W-CH2-dG)GAA | 1025.4048 | 1025.4071 | C43H57N14O16+ | −2.2
Acetyl-GEGG(C-CH2-dG)GAA | 942.3372 | 942.3370 | C35H52N13O16S+ | 0.2
Acetyl-VEGG(H-CH2-dA)GAA | 1002.4383 | 1002.4388 | C41H60N15O15+ | −0.5
Acetyl-GEGG(C-CH2-dC)GAA | 902.3304 | 902.3309 | C34H52N11O16S+ | −0.6
Acetyl-GEGG(C-CH2-dA)GAA | 926.3427 | 926.3421 | C35H52N13O15S+ | 0.6
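The calculated masses and Δ (ppm) values in Tables 1 and 2 follow from the ion compositions and the relative mass error, Δ = (m_exp − m_calc)/m_calc × 10^6. A minimal Python sketch, assuming standard monoisotopic element masses and singly charged ions whose listed composition already includes the added or removed proton (so only the electron mass needs correcting):

```python
import re

# Monoisotopic masses (u) of the elements in Tables 1 and 2
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
ELECTRON = 0.00054857990946  # electron mass (u)

def ion_mass(formula: str, charge: int) -> float:
    """m/z of a singly charged ion with the given elemental composition.

    Cations have lost an electron (subtract its mass); anions have gained one.
    """
    if charge not in (1, -1):
        raise ValueError("this sketch handles singly charged ions only")
    neutral = sum(MONO[el] * (int(n) if n else 1)
                  for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))
    return neutral - charge * ELECTRON

def ppm_error(experimental: float, calculated: float) -> float:
    """Relative mass error in parts per million."""
    return (experimental - calculated) / calculated * 1e6

# First row of Table 2: Acetyl-VEGG(K-CH2-dG)GAA, [MH]+ = C41H65N14O16+
calc = ion_mass("C41H65N14O16", +1)
print(f"calculated {calc:.4f}, error {ppm_error(1009.4687, calc):+.1f} ppm")
```

The same two functions reproduce the anion rows of Table 1 when called with `charge=-1`.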

Kinetics of cross-link formation

Solutions were made up to final concentrations of 4 mM amino acid, 4 mM deoxynucleoside and 50 mM formaldehyde in 50 mM potassium phosphate buffer (pH 7.2). The reaction mixtures were analyzed at 2, 4, 8, 16, 24, 48 and 72 h by HPLC with UV detection at 254 nm. For the Lys-dG coupling reaction, additional samples were analyzed at 10, 20, 30 and 60 min.

Stability of cross-linked Lys-dG products

To measure the kinetics of cross-link degradation, a Lys-dG reaction mixture prepared as described above was treated with 50 mM formaldehyde for 48 h and then separated by HPLC; fractions eluting at 17.2 and 26.5 min were collected on dry ice. Aliquots (50 μL) of each fraction were added to 950 μL of 50 mM phosphate buffer, maintained at 37 °C for 1, 5, 10, 20 or 30 min, and then analyzed by HPLC with ESI Q-TOF mass spectrometry.
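One way to summarize such a degradation time course is to fit a first-order decay, ln A(t) = ln A0 − kt, to the product peak areas and report t1/2 = ln 2 / k. The text does not state a rate law, so first-order behavior is our assumption, and the peak areas below are synthetic values generated for illustration, not measured data. A minimal Python sketch using ordinary least squares:

```python
import math

def first_order_fit(times, areas):
    """Fit ln(area) = ln(A0) - k*t by ordinary least squares.

    Returns (k, half_life); k is in reciprocal units of `times`.
    """
    if len(times) != len(areas) or len(times) < 2:
        raise ValueError("need at least two paired (time, area) points")
    logs = [math.log(a) for a in areas]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = -sxy / sxx  # the slope of ln(area) versus t is -k
    return k, math.log(2) / k

# Synthetic peak areas (NOT measured data) decaying with k = 0.05 per min,
# sampled at the incubation times used in the stability experiment
times_min = [1, 5, 10, 20, 30]
areas = [100.0 * math.exp(-0.05 * t) for t in times_min]
k, t_half = first_order_fit(times_min, areas)
print(f"k = {k:.3f} min^-1, t1/2 = {t_half:.1f} min")
```

Because every aliquot is diluted by the same factor (50 μL into 950 μL), the dilution cancels out of k when peak areas from the same fraction are compared.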

2-Amino-6-(10-oxo-triazino[1,2-a]purin-7-yl)hexanoic acid (TPHA-1)

1H NMR (500 MHz, DMSO-d6) δ 7.91 (s, 1H, H2), 7.87 (s, 1H, N5H, J = 2.1 Hz), 6.74 (d, 1H, Boc-NαH-, J = 4.7 Hz), 6.09 (dd, 1H, H1′, J = 7.8, 5.9 Hz), 4.89 (s, 2H, C8H2), 4.33 (m, 1H, H3′), 4.24 (bs, 2H, C6H2), 3.83 (m, 1H, H4′), 3.75 (dt, 1H, CαH, J = 8.1, 8.1, 4.7 Hz), 3.54 (dd, 2H, H5′, J = 11.7, 4.5 Hz), 3.47 (dd, 2H, H5″, J = 11.7, 4.5 Hz), ∼2.47 (CεH2 (overlaps DMSO-d6)), ∼2.50 (H2′ (overlaps DMSO-d6)), 2.18 (ddd, 1H, H2″, J = 13.1, 5.9, 3.0 Hz), 1.34-1.48 (m, 2H, CδH2), 1.24-1.37 (m, 2H, CγH2), 1.44-1.58 (m, 2H, CβH2), 1.35 (s, 9H, CH3-Boc). (13C NMR, 125 MHz, DMSO-d6) δ 173.9 COOH, 155.8 C10, 155.2 COO-Boc, 150.2 C4a, 149.0 C3a, 134.9 C2, 115.5 C10a, 87.3 C4′, 82.1 C1′, 77.4 C-Boc, 70.5 C3′, 61.4 C5′, 60.5 C8, 59.5 C6, 53.8 Cα, 49.2 Cε, 39.3 C2′, 30.8 Cβ, 27.9 CH3-Boc, 26.6 Cδ, 22.8 Cγ.

2-Amino-6-(5-hydroxymethyl-10-oxo-triazino[1,2-a]purin-7-yl)hexanoic acid (TPHA-2)

(1H NMR, 500 MHz, DMSO-d6) δ 7.96 (s, 1H, H2), 6.93-6.99 (m, 1H, NH-Lys), 6.19 (dd, 1H, H1′, J = 7.8, 5.9 Hz), 4.99 (d, 2H, N5CH2OH, J = 1.3 Hz), 4.95 (s, 2H, C8H2), 4.45 (s, 2H, C6H2), 4.35 (m, 1H, H3′), 3.78-3.83 (m, 1H, H4′), 3.77-3.84 (m, 1H, CαH), 3.55 (dd, 1H, H5′, J = 11.6, 4.8 Hz), 3.49 (dd, 1H, H5″, J = 11.6, 4.7 Hz), 2.58 (ddd, 1H, H2′, J = 13.2, 7.7, 5.9 Hz), 2.49 (m, 2H, CεH2), 2.21 (ddd, 1H, H2″, J = 13.2, 7.8, 3.1 Hz), 1.50-1.69 (m, 2H, CβH2), 1.36-1.50 (m, 2H, CδH2), 1.36 (s, 9H, CH3-t-Boc), 1.28-1.34 (m, 2H, CγH2). (13C NMR, 125 MHz, DMSO-d6) δ 173.9 COOH-Lys, 156.0 C10, 155.4 COO-t-Boc, 149.4 C4a, 148.2 C3a, 136.0 C2, 115.9 C10a, 87.4 C4′, 82.3 C1′, 77.6 C-t-Boc, 70.7 C3′, 68.5 N5CH2OH, 63.9 C6, 61.5 C5′, 61.2 C8, 53.2 Cα, 49.0 Cε, 39.2 C2′, 30.4 Cβ, 27.9 CH3-t-Boc, 26.6 Cδ, 22.9 Cγ.

Cys-CH2-dG

1H NMR (500 MHz, DMSO-d6) δ 12.73 (s, 1H, COOH-Cys), 10.82 (s, 1H, N1H-dG), 7.99 (s, 1H, H8), 7.11 (bs, 1H, N2H-dG), 6.99-7.08 (m, 1H, NH-Cys), 6.16 (ψt, 1H, H1′, J = 6.8 Hz), 4.48-4.57 (m, 2H, CH2-linker, J = 10 Hz), 4.35 (td, 1H, H3′, J = 3.2, 3.2, 6.1 Hz), 4.06-4.12 (m, 1H, CαH-Cys), 3.78-3.82 (m, 1H, H4′), 3.55 (dd, 1H, H5′, J = 11.6, 4.8 Hz), 3.48 (dd, 1H, H5″, J = 11.62, 4.8 Hz), 3.03 (dd, 1H, CβH2a, J = 13.5, 4.47 Hz), 2.84 (dd, 1H, CβH2b, J = 13.5, 9.19 Hz), 2.62-2.65 (m, 1H, H2′), 2.21 (ddd, 1H, H2″, J = 13.1, 6.8, 3.2 Hz), 1.36 (s, 9H, CH3-t-Boc). 13C NMR (125 MHz, DMSO-d6) δ 172.6 COOH-Cys, 156.7 C6, 155.3 COO-t-Boc, 151.9 C2, 150.0 C4, 136.1 C8, 120.0 C5, 87.8 C4′, 82.9 C1′, 78.2 C-t-Boc, 70.5 C3′, 61.5 C5′, 53.9 Cα, 43.2 CH2-linker, 39.3 C2′, 31.8 Cβ, 28.0 CH3-t-Boc.

Cys-CH2-dA

(1H NMR, 500 MHz, DMSO-d6) δ 8.42 (bs, 1H, N6H), 8.37 (s, 1H, H8), 8.26 (s, 1H, H2), 6.35 (ψt, 1H, H1′, J = 6.8 Hz), 6.01 (d, 1H, NH-Cys, J = 5.26 Hz), 4.66-4.77 (m, 1H, CH2a-linker), 4.50-4.61 (m, 1H, CH2b-linker), 4.40-4.44 (m, 1H, H3′), 3.85-3.89 (m, 1H, H4′), 3.61 (dd, 1H, H5′, J = 11.9, 4.3 Hz), 3.66-3.74 (m, 1H, CHα), 3.53 (dd, 1H, H5″, J = 11.9, 4.3 Hz), 2.96-3.07 (m, 2H, CβH2), 2.66-2.74 (m, 1H, H2′), 2.27 (ddd, 1H, H2″, J = 13.1, 6.8, 3.2 Hz), 1.33 (s, 9H, CH3-t-Boc). (13C NMR, 125 MHz, DMSO-d6) δ 172.2 COOH-Cys, 154.6 COO-t-Boc, 153.6 C6, 150.5 C2, 148.5 C4, 139.9 C8, 120.2 C5, 87.3 C4′, 82.7 C1′, 78.0 C-t-Boc, 70.4 C3′, 61.5 C5′, 55.2 Cα, 43.0 CH2-linker, 39.3 C2′, 34.7 Cβ, 27.8 CH3-t-Boc.

Cys-CH2-dC

(1H NMR, 500 MHz, DMSO-d6) δ 8.59 (bs, 1H, N4H), 7.83 (d, 1H, H6, J = 7.5 Hz), 6.11 (ψt, 1H, H1′, J = 6.9 Hz), 5.986 (d, 1H, NH-Cys, J = 4.2 Hz), 5.74 (d, 1H, H5, J = 7.4 Hz), 4.48 (dd, 1H, CH2a-linker, J = 13.4, 6.6 Hz), 4.26 (dd, 1H, CH2b-linker, J = 13.4, 6.5 Hz), 4.20-4.24 (m, 1H, H3′), 3.71-3.74 (m, 1H, H4′), 3.64-3.69 (m, 1H, CHα), 3.53-3.60 (m, 2H, H5′, H5″), 2.98 (ddd, 2H, CβH2, J = 42.10, 13.5, 4.3 Hz), 2.11 (ddd, 1H, H2′, J = 12.9, 6.9, 4.4 Hz), 1.93 (td, 1H, H2″, J = 12.9, 6.9, 6.4 Hz), 1.35 (s, 9H, CH3-t-Boc). (13C NMR, 125 MHz, DMSO-d6) δ 171.1 COOH-Cys, 162.5 C4, 154.7 C2, 154.3 COO-t-Boc, 140.0 C6, 94.7 C5, 87.1 C4′, 84.5 C1′, 77.2 C-t-Boc, 69.5 C3′, 60.8 C5′, 55.2 Cα, 42.3 CH2-linker, 40.1 C2′, 35.1 Cβ, 27.8 CH3-t-Boc.

His-CH2-dA

(1H NMR, 500 MHz, DMSO-d6) δ 8.83 (bs, 1H, NH-His), 8.42 (s, 1H, H8), 8.34 (s, 1H, H2), 7.61 (s, 1H, Hε2-His), 6.96 (s, 1H, Hδ1-His), 6.4 (bs, 1H, N6H), 6.36 (ψt, 1H, H1′, J = 6.7 Hz), 5.59 (bs, 2H, CH2-linker), 5.32 (bs, 1H, OH3′), 5.12 (bs, 1H, OH5′), 4.39-4.43 (m, 1H, H3′), 3.84-3.89 (m, 1H, H4′), 3.83-3.87 (m, 1H, Hα-His), 3.48-3.65 (m, 2H, H5′, H5″), 2.69-2.81 (m, 2H, CβH2-His), 2.70-2.75 (m, 1H, H2′), 2.27 (ddd, 1H, H2″, J = 12.8, 6.7, 2.8 Hz), 1.27 (s, 9H, CH3-t-Boc). (13C NMR, 125 MHz, DMSO-d6) δ 173.4 COOH-His, 154.6 COO-t-Boc, 148.7 C4, 140.23 C8, 139.3 C2, 138.2 Cγ-His, 136.1 Cε2-His, 119.5 C5, 115.5 Cδ1-His, 87.8 C4′, 83.7 C1′, 77.1 C-t-Boc, 70.6 C3′, 61.5 C5′, 54.0 Cα-His, 49.9 CH2-linker, 39.1 C2′, 30.2 Cβ-His, 27.9 CH3-t-Boc.

Trp-CH2-dG

(1H NMR, 500 MHz, DMSO-d6) δ 7.90 (s, 1H, H8), 7.06 (d, 1H, Hε3-Trp, J = 7.14 Hz), 6.91-6.98 (m, 1H, Hη2-Trp), 6.50-6.58 (m, 2H, Hζ2, Hζ3-Trp), 6.14 (ψt, 1H, H1′, J = 6.8 Hz), 6.06-6.24 (m, 1H, N2H), 5.28-5.33 (bs, 2H, CH2-linker), 4.72-5.00 (bs, OH3′ + OH5′), 4.32-4.36 (m, 1H, H3′), 4.21-4.29 (m, 1H, CHα), 3.79-3.83 (m, 1H, H4′), 3.66-3.68 (m, 1H, CβH2a), 3.44-3.58 (m, 3H, CβH2b, H5′, H5″), 2.51-2.61 (overlapping DMSO-d6), 2.17-2.25 (m, 1H, H2″), 1.40 (s, 4H, CH3-t-Boc1), 1.34 (s, 5H, CH3-t-Boc2). (13C NMR, 125 MHz, DMSO-d6) δ 156.3 C6, 152.8 CO2H, 150.0 C4, 150.0 Cδ2, 135.5 C8, 129.6 Cε2, 128.1 Cη2, 122.8 Cε3, 116.9 Cζ3, 116.8 C5, 108.4 Cζ2, 87.5 C4′, 82.8 C1′, 79.8 CH2-linker, 79.0 C-t-Boc, 70.8 C3′, 61.5 C5′, 56.4 Cα, 45.3 Cβ, 39.3 C2′, 27.9 CH3-t-Boc.

Results and discussion

Eight amino acids previously reported to form stable adducts with formaldehyde15 were investigated (as Nα-Boc derivatives) in coupling reactions with all four nucleosides to determine which would be of interest for characterizing reactions with oligonucleotides and oligopeptides. No cross-links could be detected with Arg, Gln, Tyr or Asn. Consistent with previous studies10-13, the endocyclic nitrogen of dT did not form a coupling product with any of the amino acids. Figure 1 shows the differences in extent and rate of formation: reactions between Lys and dG and between Cys and dG are essentially complete after 24 h, whereas product continued to increase over 72 h for the less reactive combinations Cys and dA, Cys and dC, His and dA, and Trp and dG.

Figure 1.

Formation of cross-links between 4 mM amino acid and 4 mM nucleoside treated with 50 mM formaldehyde: Inline graphic, sum of Lys products; Inline graphic, Cys-CH2-dG; ▲, Cys-CH2-dA; Inline graphic, Cys-CH2-dC; Inline graphic, His-CH2-dA; Inline graphic, Trp-CH2-dG. Relative yields are based on integration of HPLC peak areas in UV traces recorded at 254 nm.

The high yield of Lys cross-links is of particular interest because this residue is involved in extensive DNA-protein contacts and may therefore be considered highly likely to form cross-links in vivo. The second most abundant cross-link, between Cys and dG, may also be relevant to the active site of alkylguanine alkyltransferases24, where a Cys residue can come into proximity with a formaldehyde adduct of guanine.

Trinucleotides cross-linked to Nα-Boc-protected amino acids

Since dT did not form adducts, we selected trinucleotides having G, A or C flanked by T as targets for studies of trinucleotide-amino acid cross-linking. Elemental compositions of the deprotonated molecules [M − H]− of the cross-linked products of the trinucleotides and Nα-Boc-protected amino acids were determined by high resolution ESI-QTOF mass spectrometry and are given in Table 1. All trinucleotides yielded deprotonated molecules whose masses correspond to the sum of the trinucleotide and the Nα-Boc-protected amino acid plus 12 mass units. This increment is consistent with a methylene link between the trinucleotide and the amino acid: addition of CH2O (+30) followed by loss of H2O (−18) adds a net carbon atom (+12). The reaction of TGT with Lys and formaldehyde yielded two additional products: [T(TPHA-1)T], with a composition expected for the formation of two methylene linkages, and [T(TPHA-2)T], with a composition expected for two methylene linkages and a hydroxymethyl adduct.
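The composition bookkeeping in the paragraph above can be checked with simple formula arithmetic: each methylene bridge adds CH2O and loses H2O (net +C), and a hydroxymethyl adduct adds a whole CH2O. The minimal Python sketch below assumes that TGT carries free 5′- and 3′-OH ends (dT + dG + dT joined by two phosphodiester bonds); that end structure is our assumption, although it reproduces the Table 1 compositions. The helper names are illustrative.

```python
import re
from collections import Counter

def parse(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C10H13N5O4'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def combine(*terms) -> Counter:
    """Sum (coefficient, formula) terms; negative coefficients remove atoms."""
    total = Counter()
    for coeff, formula in terms:
        for element, n in parse(formula).items():
            total[element] += coeff * n
    return total

def to_formula(counts: Counter) -> str:
    """Write a composition in the C, H, N, O, P, S order used in Tables 1 and 2."""
    parts = []
    for element in ("C", "H", "N", "O", "P", "S"):
        n = counts.get(element, 0)
        if n > 0:
            parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)

# Assumption: dT + dG + dT joined by two phosphodiesters (+2 H3PO4, -4 H2O)
TGT = to_formula(combine((2, "C10H14N2O5"), (1, "C10H13N5O4"),
                         (2, "H3PO4"), (-4, "H2O")))
BOC_LYS = "C11H22N2O4"  # N-alpha-Boc-L-lysine

def cross_link_anion(n_bridges: int, n_hydroxymethyl: int = 0) -> str:
    """[M - H]- composition of TGT + Boc-Lys joined through formaldehyde."""
    return to_formula(combine((1, TGT), (1, BOC_LYS),
                              (n_bridges, "CH2O"), (-n_bridges, "H2O"),
                              (n_hydroxymethyl, "CH2O"), (-1, "H")))

print(cross_link_anion(1))     # single methylene bridge, T(G-CH2-Lys)T
print(cross_link_anion(2))     # two bridges, T(TPHA-1)T
print(cross_link_anion(2, 1))  # two bridges + hydroxymethyl, T(TPHA-2)T
```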

Table 1.

Exact masses for the reaction products of trinucleotides and Nα-Boc-protected amino acids cross-linked by formaldehyde, determined by negative ion ESI-QTOF-MS.

Cross-link | Experimental [M − H]− | Calculated [M − H]− | Composition | Δ (ppm)
T(G–CH2–Lys)T | 1132.3360 | 1132.3395 | C42H60N11O22P2− | −3.1
T(TPHA-1)T | 1144.3373 | 1144.3395 | C43H60N11O22P2− | −1.9
T(TPHA-2)T | 1174.3494 | 1174.3501 | C44H62N11O23P2− | −0.6
T(G–CH2–Trp)T | 1190.3264 | 1190.3239 | C47H58N11O22P2− | 2.2
T(G–CH2–Cys)T | 1107.2526 | 1107.2537 | C39H53N10O22P2S− | −1.0
T(A–CH2–His)T | 1125.3048 | 1125.3085 | C42H55N12O21P2− | 3.3
T(A–CH2–Cys)T | 1091.2556 | 1091.2588 | C39H53N10O21P2S− | −2.9
T(C–CH2–Cys)T | 1067.2464 | 1067.2476 | C38H53N8O22P2S− | −1.1

As discussed below, NMR analysis of the products with two methylene linkages establishes T(TPHA-1)T as T(10-oxo-triazino[1,2-a]purin-7-yl)T–substituted 2-aminohexanoic acid and T(TPHA-2)T as the corresponding N5-hydroxymethyl adduct. Formation of triazinane rings has precedent in the intramolecular condensation of the terminal amino nitrogens of two peptide residues with formaldehyde, as well as in the condensation of glycine and formaldehyde with the guanidino moiety of Arg15. These products are unlikely to be biologically significant, because formaldehyde concentrations in vivo are lower than those used here.

Fragmentation of the reaction products between trinucleotides and amino acids was investigated by high resolution QTOF MS/MS to confirm that the target base in the second position was indeed the site of the cross-linking reaction. The high resolution data provide elemental compositions to support structural assignments of product ions. The product ion nomenclature applied for describing backbone fragmentation patterns of the trinucleotides follows the widely used convention given in Chart 2.

Chart 2.

MS/MS spectra acquired on an ion trap mass spectrometer have been reported for all 64 possible unmodified trinucleotides25. Although extensive sequential decomposition would be more likely in an ion trap than in the QTOF used in our work, the fragmentation patterns observed for the modified trinucleotides show reaction sequences similar to those reported previously25. Without exception, the cross-linked trinucleotides formed singly charged anions. Their MS/MS spectra were characterized by initial cleavage of the cross-link, with the resulting product ions undergoing backbone fragmentation. With the exception of TAT cross-linked with His, none of the ions containing an intact base–methylene–amino acid linkage gave rise to backbone fragments. Full MS/MS spectra and assignments of product ions for all cross-linked trinucleotides are presented as Supporting Information (Figures S1–S8).

TGT derivatives

In addition to the w1 ion (dT-5′-P; m/z 321), a major ion in the MS/MS spectra of all of the TGT derivatives was observed at nominal m/z 886 ([TGT − H + 12]−). This ion corresponds in composition to the Schiff base derivative of the trinucleotide at G, which is possible only at the exocyclic N2, and confirms that the formaldehyde-induced cross-linking reactions of TGT involve the target G. The non-sequence ion from loss of neutral TH from the Schiff base adduct of the trinucleotide is also common to all the MS/MS spectra of cross-linked TGT products. In the MS/MS spectrum of T(G–CH2–Lys)T, bonds on either side of the methylene linkage cleave, so that, in addition to the ion at m/z 886, a prominent ion appears at m/z 874 from loss of the Schiff base adduct of Lys. Backbone cleavages of the product ions at m/z 874 and 886 give rise to parallel series of sequence ions w2, x2, y2 and z2 separated by 12 mass units, as expected for the source ions TGT and its Schiff base adduct at G. The MS/MS spectrum of T(G–CH2–Lys)T in Figure 2 illustrates the data obtained from QTOF analysis.
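The nominal masses of the key TGT-derived ions can be checked from their elemental compositions. The compositions below are our own derivations (assuming a trinucleotide with free 5′- and 3′-OH ends), not values reported in the paper; the G Schiff base adduct differs from the TGT anion by exactly one carbon (12.000 u). A minimal Python sketch:

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
ELECTRON = 0.00054857990946

def anion_mz(formula: str) -> float:
    """m/z of a singly charged anion whose composition is `formula`."""
    mass = sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))
    return mass + ELECTRON  # the extra electron carries the negative charge

# Derived (assumed) compositions of the deprotonated ions
ions = {
    "w1 (dT-5'-P)": "C10H14N2O8P",
    "[TGT - H]-": "C30H38N9O18P2",
    "[TGT - H + 12]- (Schiff base at G N2)": "C31H38N9O18P2",
}
for name, formula in ions.items():
    mz = anion_mz(formula)
    print(f"{name}: {mz:.4f} (nominal {round(mz)})")
```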

Figure 2.

ESI-QTOF-MS/MS of the deprotonated molecule T(G—CH2—Lys)T. Inline graphic ions containing Schiff base guanine; Inline graphic ions containing unmodified guanine; Inline graphic ions that do not contain guanine.

The triazino ring of T(TPHA-1)T fragments to yield the product ions TGT and the Schiff base adduct of the trinucleotide, from which arise the same parallel series of backbone fragments observed in the MS/MS spectrum of the singly bridged TGT-Lys product (Figure S1; note the sodium adducts of the major ions in the MS/MS spectrum of T(TPHA-1)T). In the MS/MS spectrum of T(TPHA-2)T (Figure S3), the cross-linking structure fragments sequentially to give prominent ions corresponding to [M − H − hydroxymethylene] (nominal m/z 1144) and [M − H − N-Boc − hydroxymethylene] (nominal m/z 1044), in addition to the Schiff base adduct of TGT at m/z 886, which is the progenitor of the only observed series of sequence ions w2, x2, y2 and z2.

The MS/MS spectrum of T(G–CH2–Cys)T (Figure S4) shows a single fragmentation pathway consistent with loss of Cys to give the Schiff base adduct of TGT at m/z 886, followed by formation of the sequence ions w2, x2, y2 and z2 from the expected backbone fragmentations. The MS/MS spectrum of T(G–CH2–Trp)T (Figure S5) yielded product ions in low abundance. The Schiff base adduct of TGT and the backbone cleavage ions w2, x2, y2 and z2 were present at the expected nominal mass-to-charge ratios; however, the accuracy of the mass measurements was low.

TAT and TCT derivatives

Loss of Nα-Boc (m/z 1025) and of (Nα-Boc + His) (m/z 870) gave prominent ions in the MS/MS spectrum of T(A–CH2–His)T (Figure S6). The ion at m/z 1025 gave the non-sequence ion (M − H − TH) and the sequence ions w2 and x2 from backbone fragmentation in which the cross-link is intact (the only instance of backbone fragmentation with an intact cross-link). The Schiff base derivative from loss of His (m/z 870) yielded the parallel series of sequence ions w2 and x2, fixing the point of cross-link attachment at the exocyclic N6 of adenine. The MS/MS spectrum of the deprotonated molecule T(A–CH2–Cys)T (Figure S7) features (y2 − B2), w1 and T (base peak) ions along with a prominent ion at nominal m/z 545; however, no ion containing the Schiff base adduct of A was detected. The nominal m/z 545 corresponds to a 2′,3′-dideoxy-2′,3′-dehydro-AMP or 2′-deoxyadenosine-5′-phosphenate linked through a methylene bridge to Nα-Boc-Cys. The exact mass differs by +10.8 ppm from that calculated for the composition of the proposed structures, which is within a tolerance sufficient to support the methylene bridge between Cys and the target N6 of A. However, a backbone fragmentation pathway yielding ions of either proposed structure was not reported for any of the 64 possible unmodified trinucleotides, nor was it observed for any of the other cross-linked trinucleotides in this study. Thus, support for the proposed structure of the cross-link relies on the NMR studies of the formaldehyde-derived cross-link between dA and Cys discussed below. Although product ions in the MS/MS spectrum of T(C–CH2–Cys)T (Figure S8) were of low abundance, the exact mass of the base peak lies within an acceptable tolerance (4.1 ppm) of the composition of the Schiff base adduct of the trinucleotide, which is consistent with a methylene bridge between N4 of C and the Cys sulfhydryl group. Sequence ions w2, x2 and w1 were also present at the expected nominal masses, in addition to the non-sequence ion (M − H − TH).
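Which Nα-Boc amino acid the nominal m/z 545 ion can contain is easy to check by computing candidate compositions. In the sketch below, the compositions are our own derivations, not values from the paper: dehydrated dAMP (C10H12N5O5P) plus the Nα-Boc amino acid plus one net carbon for the methylene bridge, minus H+ for the anion. The TAT Schiff base adduct (m/z 870) is included for comparison.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
ELECTRON = 0.00054857990946

def anion_mz(formula: str) -> float:
    """m/z of a singly charged anion whose composition is `formula`."""
    mass = sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))
    return mass + ELECTRON

# Derived (assumed) [M - H]- compositions
candidates = {
    # C10H12N5O5P + Boc-Cys (C8H15NO4S) + C - H
    "dehydro-dAMP-CH2-(Boc-Cys)": "C19H26N6O9PS",
    # C10H12N5O5P + Boc-His (C11H17N3O4) + C - H
    "dehydro-dAMP-CH2-(Boc-His)": "C22H28N8O9P",
    # TAT (C30H39N9O17P2) + C - H: Schiff base at A N6
    "[TAT - H + 12]-": "C31H38N9O17P2",
}
for name, formula in candidates.items():
    mz = anion_mz(formula)
    print(f"{name}: {mz:.4f} (nominal {round(mz)})")
```

Only the Cys-containing candidate falls at nominal m/z 545; the His analogue would appear near m/z 579.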

Peptides cross-linked to deoxynucleosides

We reacted dG, dA and dC with formaldehyde and N-terminal acetylated 8-mers in which position 5 contained one of the four residues which had been established as targets for cross-linking in the screening reactions. The exact masses of the protonated molecules are given in Table 2. As in the case of the trinucleotides, all of the 8-mers yielded singly charged ions with compositions corresponding to formation of a methylene cross-link, and the Lys-containing 8-mers yielded two additional products consistent with formation of the tricyclic cross-linked structures TPHA-1 and TPHA-2. In accord with a report on the formation of formaldehyde adducts of 7-mer and 9-mer peptides under conditions similar to ours15, we found no evidence of peptide-peptide cross-linking by LC-MS analysis over the range of mass-to-charge ratios expected for cross-linked peptides.

By MS/MS (Figures S9 – S16), the protonated molecules undergo initial fragmentation of the glycosidic bond or the bridging structure, and the resulting product ions undergo backbone fragmentations similar to those observed for unmodified peptides, giving predominantly yn, an and bn ions.26 Analysis of MS/MS spectra of the protonated molecules definitively established the site of formaldehyde-induced cross linking for all of the peptides with the exception of the His-containing 8-mer cross-linked with dA.

Backbone fragmentations in the MS/MS spectra of the Lys- and Trp-containing 8-mers with a single methylene link to dG (Figures S9, S10) are derived from the intermediate product ions in which the cross-link has broken to yield a peptide sequence containing the Schiff base of Lys (Scheme 2) or the 2-methylene indole derivative of Trp.

Scheme 2.

Fragmentation of AcVEGG(K-CH2-dG)GAA.

Figure 3 illustrates the MS/MS spectrum of the singly bridged Lys-containing octapeptide. The mass difference between the b5 and b4 ions corresponds to the Schiff base adduct of the target Lys (or 2-methylene-substituted Trp) at position 5, which establishes the point of attachment of the methylene bridge at the predicted target residues. The modification of the Lys and Trp residues is further confirmed by the identification of y4 ions and by the finding that the mass differences between (b5 + y4) and the intermediate product ions yielding the backbone fragmentation series (m/z 742 for the Lys 8-mer and 758 for the Trp 8-mer) correspond to the mass of the modified residue. The product ions representing loss of the Schiff base adduct of dG from the cross-linked peptides establish the exocyclic N2 of dG as the second point of attachment of the methylene bridge, as discussed above for the cross-linked trinucleotides.
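The intermediate product ions quoted above (m/z 742 and 758) can be reproduced from residue compositions if the methylene Schiff base on residue 5 is treated as a net gain of one carbon (CH2O − H2O). The sketch below rests on that assumption; residue compositions are standard values, and the helper names are illustrative.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812

# Residue (amino acid minus H2O) compositions for the residues in these 8-mers
RESIDUES = {"G": "C2H3NO", "A": "C3H5NO", "V": "C5H9NO", "E": "C5H7NO3",
            "K": "C6H12N2O", "W": "C11H10N2O"}

def mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

def acetyl_peptide_mh(sequence: str, extra_carbons: int = 0) -> float:
    """[MH]+ of an N-acetylated peptide acid.

    extra_carbons models the net +C of each methylene Schiff base.
    """
    m = sum(mass(RESIDUES[aa]) for aa in sequence)
    m += mass("H2O")    # free N- and C-termini
    m += mass("C2H2O")  # N-terminal acetylation
    m += 12.0 * extra_carbons
    return m + PROTON

for seq in ("VEGGKGAA", "GEGGWGAA"):
    mz = acetyl_peptide_mh(seq, extra_carbons=1)
    print(f"Ac-{seq} with Schiff base at residue 5: m/z {mz:.4f}")
```

Without the extra carbon, the Lys 8-mer gives nominal m/z 730, the [MH − dG]+ value referred to for the triazino-linked product.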

Figure 3.

ESI-QTOF-MS/MS spectrum of the protonated molecule Acetyl-VEGG(K-CH2-dG)GAA. Inline graphic ions containing Schiff base derivative of Lys; Inline graphic ions containing unmodified Lys; Inline graphic ions that do not contain residue 5.

In the MS/MS spectrum of the Cys-containing 8-mer cross-linked to dC (Figure S11), a series of low-abundance bn ions originates from the intermediate product ion (m/z 697) formed by loss of deoxyribose from the protonated molecule. The difference in mass between the b5 and b4 ions of this bn series corresponds to the mass of the Cys–CH2–C unit, confirming that the cross-link is attached to the peptide at Cys. Further support for the cross-link to Cys comes from a second series of low-abundance b5, b6 and b7 ions with compositions consistent with backbone fragmentation of the product ion formed by cleavage of the C–S bond of the methylene bridge, which transforms Cys into an α-amidoacrylic acid residue. As required by this scheme, the difference in mass between b5 (m/z 412) and b4 (m/z 343), which does not contain a modified residue, is 69 mass units, corresponding to α-amidoacrylic acid at position 5. The major backbone fragmentation series for the Cys-containing 8-mers cross-linked with dA (Figure S12) and dG (Figure S13) originate from the product ions formed after loss of the Schiff base derivatives of the nucleosides and are therefore not informative with respect to the site of attachment of the methylene bridge to the peptide. However, the bn series from the intermediate product ions containing α-amidoacrylic acid at position 5 are present in the MS/MS spectra of both cross-linked 8-mers, fixing the methylene bridge attachment at Cys.
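The b4 and b5 values cited above can be checked by building the b-ion ladder of the peptide left after C–S cleavage, Ac-GEGG(Dha)GAA, where Dha denotes the α-amidoacrylic acid (dehydroalanine) residue, C3H3NO. A minimal Python sketch, assuming standard monoisotopic residue masses and b ions of an N-acetylated peptide equal to the residue sum plus the acetyl cation (CH3CO+):

```python
# Monoisotopic residue masses (u); "Dha" = alpha-amidoacrylic acid, C3H3NO
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "E": 129.04259, "Dha": 69.02146}
ACETYL_CATION = 43.01784  # CH3CO+, the N-terminal group of every b ion here

def b_ions(residues):
    """Cumulative singly charged b-ion m/z values for an N-acetylated peptide."""
    ions, running = [], ACETYL_CATION
    for residue in residues:
        running += RESIDUE_MASS[residue]
        ions.append(running)
    return ions

# Ac-GEGG(Dha)GAA: the Cys 8-mer after C-S cleavage of the methylene bridge
ladder = b_ions(["G", "E", "G", "G", "Dha", "G", "A", "A"])
for n in (4, 5, 6, 7):
    print(f"b{n}: m/z {ladder[n - 1]:.3f}")
print(f"b5 - b4 = {ladder[4] - ladder[3]:.3f}")
```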

The MS/MS spectrum of the Lys-containing 8-mer linked by two methylene groups through formation of a fused triazino ring (Figure S14) is consistent with initial fragmentation of the triazino ring followed by proton transfer to give Schiff base adducts of both dG and the 8-mer (nominal mass, m/z 742) according to Scheme 3, with the charge residing predominantly on the 8-mer fragment. The Schiff base adduct of dG loses deoxyribose to give a low-abundance ion at m/z 164. The absence of an ion corresponding to [MH − dG]+ (nominal mass m/z 730), and of the product ions from its backbone fragmentation that were observed above for a single cross-linking methylene, supports the triazino structure. The difference in mass between the b5 and b4 ions corresponds to the Schiff base adduct at the terminal −NH2 of the Lys residue, as required by Scheme 3.
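The nominal values m/z 730 and m/z 742 can be reproduced from the octapeptide sequence given in the Figure 3 caption (Acetyl-VEGGKGAA). The sketch below is an illustrative check, not the authors' calculation; it assumes standard monoisotopic residue masses, an N-terminal acetyl group (net +C2H2O), a free C-terminal acid, and a Lys Schiff base that adds net +C. The function `peptide_mh` is a hypothetical helper.

```python
# Standard monoisotopic element masses (Da) and the proton mass
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def mono_mass(composition: dict[str, int]) -> float:
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in composition.items())


# Residue (in-chain) compositions for the amino acids in Ac-VEGGKGAA
RESIDUES = {
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
}
WATER = mono_mass({"H": 2, "O": 1})
ACETYL = mono_mass({"C": 2, "H": 2, "O": 1})  # net gain on N-terminal acetylation


def peptide_mh(sequence: str, extra: float = 0.0) -> float:
    """[M+H]+ of an N-acetylated peptide with a free C-terminal acid (assumed)."""
    neutral = sum(mono_mass(RESIDUES[aa]) for aa in sequence) + WATER + ACETYL + extra
    return neutral + PROTON


mh_peptide = peptide_mh("VEGGKGAA")                  # unmodified 8-mer
mh_schiff = peptide_mh("VEGGKGAA", extra=MONO["C"])  # Lys Schiff base: +CH2O - H2O

print(f"[MH]+ unmodified 8-mer:   {mh_peptide:.3f}")
print(f"[MH]+ Schiff-base 8-mer: {mh_schiff:.3f}")
```

The two values (about 730.373 and 742.373) round to the nominal masses quoted in the text.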

Scheme 3.

Fragmentation of AcVEGG(TPHA-1)GAA.

The MS/MS spectrum of the product containing the tricyclic (TPHA-2) cross-linking structure with an N5-hydroxymethyl-substituted triazinopurine (Figure S15) is dominated by two series of ions derived from peptide backbone fragmentation of the intermediate product ions shown in Scheme 4. The y4, b4 and b5 ions can be identified for both series, and their compositions establish that the terminal −NH2 of Lys has been incorporated into the linking structure. The sequential fragmentations of the protonated molecule that lead to the intermediate products in Scheme 4 are suggested by the presence of product ions both at m/z 164, for the protonated N2-Schiff base adduct of G, and at m/z 178, compatible with the protonated tricyclic base 1,N2-ethano-G.
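The two diagnostic base ions can be checked in the same way. The neutral compositions used below, C6H5N5O for N2-methylene (Schiff base) guanine and C7H7N5O for 1,N2-ethano-G, are our assumptions for these structures; adding a proton gives the expected m/z of each ion.

```python
# Standard monoisotopic element masses (Da) and the proton mass
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def mono_mass(composition: dict[str, int]) -> float:
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in composition.items())


# Assumed neutral compositions of the two base fragments
ions = {
    "N2-Schiff base G": {"C": 6, "H": 5, "N": 5, "O": 1},  # guanine + C (N2=CH2)
    "1,N2-ethano-G": {"C": 7, "H": 7, "N": 5, "O": 1},     # guanine + C2H2 bridge
}

# Protonated ions observed in positive-ion ESI
mz = {name: mono_mass(comp) + PROTON for name, comp in ions.items()}
for name, value in mz.items():
    print(f"{name:17s} m/z {value:.3f}")
```

These give about m/z 164.057 and 178.072, in line with the nominal ions at m/z 164 and 178.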

Scheme 4.

Fragmentation of AcVEGG(TPHA-2)GAA.

In the MS/MS spectrum of the His-containing 8-mer cross-linked to dA, a single series of backbone fragment ions arises from an intermediate product ion formed by loss of the Schiff base adduct of dA. This fragmentation pattern is not informative regarding the site of methylene attachment to the 8-mer, and the structure of the cross-link is inferred from the structure of the cross-linked monomeric unit dA–CH2–His determined by NMR.

Nα-Boc-protected amino acids cross-linked to deoxynucleosides

Formaldehyde-induced coupling reactions between the Nα-Boc-protected amino acids and nucleoside monomers were investigated on a scale that allowed detailed structural determination of the linkages by NMR spectrometry. Table 3 gives the exact masses and corresponding elemental compositions of the cross-linked products; MS/MS spectra are given in Figures S17–S24. LC-MS analysis revealed no cross-links between deoxynucleosides or between amino acids. As observed for the trinucleotide and 8-mer, the coupling products of Lys with dG incorporated one methylene group, two methylene groups, or two methylene groups plus a hydroxymethyl substituent. All of the Lys coupling products were labile and were isolated and stored at sub-ambient temperature. Their lability is consistent with reports that formaldehyde-induced DPCs involving the Lys- and Arg-rich major histones are hydrolytically unstable15,20.

Table 3.

Exact masses of formaldehyde-induced deoxynucleoside-amino acid cross-links by FTICR-MS.

Cross-link    Experimental [MH]+   Calculated [MH]+   Composition (neutral M)   Δ (ppm)
Lys-CH2-dG    526.2622             526.2619           C22H35N7O8                0.6
TPHA-1        538.2623             538.2619           C23H35N7O8                0.7
TPHA-2        568.2726             568.2725           C24H37N7O9                0.2
Trp-CH2-dG    584.2465             584.2463           C27H33N7O8                0.3
Cys-CH2-dG    501.1763             501.1762           C19H28N6O8S               0.2
His-CH2-dA    519.2312             519.2310           C22H30N8O7                0.4
Cys-CH2-dA    485.1813             485.1813           C19H28N6O7S               0
Cys-CH2-dC    461.1701             461.1700           C18H28N4O8S               0.2
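A minimal sketch of how the calculated [MH]+ values and Δ (ppm) in Table 3 can be reproduced from elemental composition. It assumes standard monoisotopic masses and treats each composition as the neutral molecule to which a proton is added; depending on the mass tables and rounding used, the fourth decimal place and the Δ values may differ slightly from those listed. For Cys-CH2-dC the composition used is C18H28N4O8S (Boc-Cys + dC + CH2O − H2O), which reproduces the listed calculated mass of 461.1700. The helper functions are hypothetical.

```python
import re

# Standard monoisotopic element masses (Da) and the proton mass
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
PROTON = 1.00727646


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C19H28N6O7S' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mh_plus(formula: str) -> float:
    """Monoisotopic [M+H]+ for a neutral composition."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items()) + PROTON


def ppm_error(experimental: float, calculated: float) -> float:
    """Mass measurement error in parts per million."""
    return (experimental - calculated) / calculated * 1e6


# (cross-link, experimental [MH]+, neutral composition) from Table 3
ROWS = [
    ("Lys-CH2-dG", 526.2622, "C22H35N7O8"),
    ("TPHA-1", 538.2623, "C23H35N7O8"),
    ("TPHA-2", 568.2726, "C24H37N7O9"),
    ("Trp-CH2-dG", 584.2465, "C27H33N7O8"),
    ("Cys-CH2-dG", 501.1763, "C19H28N6O8S"),
    ("His-CH2-dA", 519.2312, "C22H30N8O7"),
    ("Cys-CH2-dA", 485.1813, "C19H28N6O7S"),
    ("Cys-CH2-dC", 461.1701, "C18H28N4O8S"),
]

for name, experimental, formula in ROWS:
    calculated = mh_plus(formula)
    print(f"{name:11s} {formula:12s} calc {calculated:.4f}  "
          f"error {ppm_error(experimental, calculated):+.2f} ppm")
```

With these inputs every row falls within 1 ppm, whereas assigning C19H28N6O8S to the m/z 461 ion would miss by roughly 40 Da.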

Characterization of formaldehyde-induced cross-links between deoxynucleosides and amino acids

The structures of formaldehyde-induced cross-links formed between the amino acids and nucleosides are given in Chart 1.

Lys cross-links with dG

In the presence of formaldehyde, Lys and dG formed three coupling products with the structures given in Chart 1. All three structures were detected in coupling reactions run at the lowest formaldehyde concentration. The proportion of tricyclic and N-hydroxymethyl-substituted tricyclic nucleosides increased with increasing formaldehyde concentration and reaction time (confirmed with 13C-formaldehyde), consistent with progressive incorporation of formaldehyde molecules into the initial coupling product (Figure S25). As described above, the products were all labile at ambient temperature and required care in isolation and characterization. The tricyclic structures TPHA-1 and TPHA-2 were definitively established by mass spectrometry and NMR. Lys-CH2-dG was too labile for characterization by NMR and its assignment as a product of cross-linking is based on indirect evidence from the analyses below.

Exact mass measurements by positive-ion ESI-MS of the major ions at m/z 526, 538 and 568 in the coupling reaction mixture were compatible with the protonated molecules of Lys-CH2-dG, TPHA-1 and TPHA-2, respectively (Table 3). When 13C-formaldehyde was used in the coupling reaction, the protonated molecules were observed at m/z 527, 540 and 571, confirming formaldehyde as the origin of the methylene linkers and the hydroxymethylene group.
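The labeling result amounts to simple counting. Assuming every formaldehyde-derived carbon is fully 13C-labeled, the nominal shift equals the number of incorporated formaldehyde units: one for the single methylene bridge, two for the triazine methylenes of TPHA-1, and three for TPHA-2 with its additional hydroxymethylene. A minimal sketch using the nominal masses quoted above:

```python
C13_SHIFT = 13.0033548 - 12.0  # Da gained per 12C -> 13C substitution

# Nominal [MH]+ with unlabeled and 13C-labeled formaldehyde (from the text)
unlabeled = {"Lys-CH2-dG": 526, "TPHA-1": 538, "TPHA-2": 568}
labeled = {"Lys-CH2-dG": 527, "TPHA-1": 540, "TPHA-2": 571}

# Each formaldehyde-derived 13C raises the nominal mass by one unit
formaldehyde_units = {name: labeled[name] - unlabeled[name] for name in unlabeled}

for name, n in formaldehyde_units.items():
    print(f"{name:11s} {n} formaldehyde carbon(s), exact shift +{n * C13_SHIFT:.4f} Da")
```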

Well-resolved peaks were collected by semi-preparative HPLC (Figure 4) and characterized by NMR (Figures S26 – S31). 1H NMR spectra of the fractions collected at 17.2 and 26.5 min indicated they were dG-containing mixtures, suggesting post-column decomposition.

Figure 4.

Trace (254 nm) of semi-preparative HPLC of the reaction mixture from the formaldehyde-induced coupling of Lys and dG. The peaks at 17.2 and 26.5 min were characterized by NMR. Peaks at 7.3 and 10.2 min are dG and N2-hydroxymethyl-dG, respectively.

The peak eluting at 17.2 min contained dG/Lys/TPHA-1 in a 1:1:1 ratio, while the peak at 26.5 min contained a 1:1.4:7 mixture of dG/TPHA-1/TPHA-2, based on signals in the H1′ region of the NMR spectra. In the early-eluting fraction, Lys and dG were readily identified by comparing well-resolved signals in critical regions of the spectrum with those of authentic standards; in the late-eluting fraction, the signals of TPHA-1 and TPHA-2 were readily distinguished by their different signal intensities and peak integrals. The composition of the fractions may be explained by co-chromatography of TPHA-1 and Lys-CH2-dG at 17.2 min and by sequential uncoupling of formaldehyde units in both fractions. The absence of signals attributable to singly bridged Lys–CH2–dG in either fraction suggests that Lys–CH2–dG degrades significantly faster than the tricyclic adducts. The appearance of dG and TPHA-1 in the 26.5 min fraction suggests that both TPHA-1 and TPHA-2 are unstable. The mixtures observed in the 1H NMR spectra are consistent with the degradation of the 17.2 and 26.5 min fractions at 37 °C monitored by mass spectrometry (Figure S32). Uncoupling is further supported by a signal in the 1H NMR spectra of both the early- and late-eluting fractions attributable to formaldehyde or formaldehyde hydrate (Figures S26, S29), in accord with the reported reversibility of formaldehyde-induced DPCs20. The formaldehyde/formaldehyde hydrate assignment is based on the appearance of a proton singlet below 8 ppm with unsuppressed one-bond coupling to a carbon signal at 164.4 ppm that has no connectivity with any component in the HMBC spectra of the mixtures (Figures S26, S29).
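The H1′ ratios translate directly into approximate mole fractions if, as assumed here, each component contributes one H1′ proton per molecule and the integrals are quantitative. A minimal sketch for the late-eluting (26.5 min) fraction, with a hypothetical helper name:

```python
def mole_fractions(integrals: dict[str, float]) -> dict[str, float]:
    """Normalize per-proton integrals (one H1' proton each, assumed) to mole fractions."""
    total = sum(integrals.values())
    return {name: value / total for name, value in integrals.items()}


# dG/TPHA-1/TPHA-2 = 1:1.4:7 from the H1' region of the 26.5 min fraction
late_fraction = mole_fractions({"dG": 1.0, "TPHA-1": 1.4, "TPHA-2": 7.0})
for name, x in late_fraction.items():
    print(f"{name:7s} {x:.1%}")
```

On this basis TPHA-2 makes up roughly three quarters of the late-eluting material, with TPHA-1 and dG as the minor components expected from uncoupling.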

In each fraction, the NMR signals of the components could be resolved, allowing the structures of TPHA-1 and TPHA-2 to be unambiguously assigned. Critical in establishing the triazine ring are the presence of two formaldehyde-derived methylenes and the identification of connectivities between the methylene groups and the guanine and Lys moieties. Figures 5 and 6 show expansions of the HMBC spectra that are key to establishing the fused triazine linkages. In the HMBC spectrum of TPHA-1 (Figure 5), a broad two-proton singlet at 4.89 ppm with three-bond coupling to a 13C signal at 59.5 ppm and a second two-proton singlet at 4.24 ppm with 3JC-H coupling to a 13C signal at 60.5 ppm are assigned to the methylene groups at positions 8 and 6, respectively, of the triazino[1,2-a]purine framework. Unsuppressed one-bond couplings, confirmed by the corresponding C/H cross peaks in the HSQC spectrum (Figure S27), are observed for both methylene signals and allow assignment of the methylene carbon shifts. The protons attached to the methylene carbon assigned to C8 show the expected connectivities within the tricyclic framework to C4a at 150.2 ppm and C10 at 155.8 ppm, while the methylene protons at position 6 couple only with C4a. Attachment of the hexanoic acid moiety at N7 is confirmed by cross peaks between the methylene protons at positions 6 and 8 and a carbon signal at 49.2 ppm, which can be assigned to hexanoic acid Cε by virtue of unsuppressed one-bond coupling in the HMBC spectrum and the corresponding C/H cross peak in the HSQC spectrum. The complementary three-bond coupling between the methylene protons attached to Cε (overlapping with the DMSO and sugar H2′′ signals at ∼2.5 ppm) and C6 and C8 is also observed.

Figure 5.

HPLC peak at 17.2 min containing TPHA-1: Expansion of the HMBC spectrum between 2.0 and 5.2 ppm on the 1H-axis and 15 and 166 ppm on the 13C-axis. Key signals are identified on the marginal 1H and 13C traces. Unsuppressed 1-bond couplings are indicated by brackets.

Figure 6.

HPLC peak at 26.5 min containing TPHA-2: Expansion of the HMBC spectrum between 2.0 and 5.2 ppm on the 1H-axis and 40 and 160 ppm on the 13C-axis. Key signals are identified on the marginal 1H and 13C traces. Unsuppressed 1-bond couplings are indicated by brackets.

By a similar analysis of the HMBC spectrum in Figure 6, connectivities can be identified that confirm the triazino[1,2-a]purine framework of TPHA-2, the major component of the late-eluting peak. The critical cross peaks are between C8H2 (4.95 ppm) and C6 (63.9 ppm), C4a (149.4 ppm), C10 (156.0 ppm) and Cε (49.0 ppm); between C6H2 at 4.45 ppm and C8 (61.5 ppm), C5a (149.8 ppm) and Cε (49.2 ppm); and between CεH2 (2.5 ppm, overlapping with DMSO) and C6 and C8. A methylene proton signal at 4.99 ppm can be assigned to the hydroxymethylene group, with the position of attachment fixed at N5 by virtue of cross peaks with C6 and C4a. In the 1H NMR spectrum (Figure S29), a singlet assignable to formaldehyde by the rationale described above strongly supports the conclusion that the TPHA-1 in the late-eluting fraction results from hydrolytic elimination of the N5-hydroxymethylene substituent of TPHA-2. The dynamic nature of formaldehyde-induced Lys-dG cross-links (Scheme S1) is supported by the lability of the products and by the presence of formaldehyde in the NMR spectra of TPHA-1 and TPHA-2, which indicate that the condensations are reversible.

Cys cross-links with dG, dA and dC

A single product was isolated from the cross-linking of Cys and dG with formaldehyde. In contrast to the cross-linked products of Lys, the product with Cys was stable and readily isolated and purified. The exact mass corresponded in elemental composition to addition of one methylene group. ESI-MS/MS of the protonated molecule (Figure S20) yielded product ions from loss of deoxyribose and t-Boc. The base peak of the MS/MS spectrum appears at m/z 164 (Figure S20), in accord with fragmentation to a Schiff base derivative of guanine, which indicates that the methylene cross-link bridges N2 of guanine and the Cys sulfhydryl group.

NMR spectrometry (Figures S33, S34) confirmed guanine N2 and the Cys SH as the points of attachment of the methylene cross-link. The carbon and proton signals of the formaldehyde-derived methylene cross-link are assigned from the HMBC spectrum on the basis of unsuppressed one-bond coupling between the carbon signal at 43.2 ppm and a two-proton methylene multiplet centered at 4.52 ppm. Attachment at the Cys sulfhydryl is then fixed by coupling between the diastereotopic CβH2 protons of Cys at 2.84 and 3.03 ppm and the formaldehyde-derived bridging methylene carbon, while attachment at guanine N2 is fixed by coupling between the bridging methylene protons and C2 of guanine at 151.9 ppm. The presence of the guanine imino N1H at 10.82 ppm in the 1H NMR spectrum, unambiguously assigned from its coupling with guanine C5 at 117 ppm in the HMBC spectrum (Figure S33), rules out attachment of the cross-link at guanine N1. Attachment to Cys at Nα is ruled out by identification of Cys-NαH as one of two incompletely resolved proton signals at ∼7.10 ppm, based on cross peaks with Cys Cα (53.9 ppm) and Cβ (31.8 ppm).

Cys also formed stable cross-links with dA and dC, in minor amounts. By high-resolution mass spectrometry, both products had elemental compositions consistent with a methylene bridge linking the nucleoside and Cys. Both products were characterized by NMR (Figures S35–S38). Critical C/H connectivities in the HMBC spectra (Figures S35, S37) were observed between the proton signals of the bridging methylene and a carbon signal of the base, and between the bridging methylene and the Cys CβH2. In the case of Cys-CH2-dA, attachment to dA at the exocyclic amino group was established by coupling between the diastereotopic methylene protons of the bridging group at 4.56 and 4.73 ppm and C6 of dA at 153.7 ppm (Figure S35). Attachment to the Cys sulfhydryl group was established by coupling between the diastereotopic methylene bridge protons and the Cβ signal of Cys at 34.7 ppm. Identification of a one-proton N6H signal confirmed substitution at the exocyclic amino group of dA, while the presence of the Boc-NαH proton signal ruled out attachment at the α-amino group of Cys. The cross-link between dC and Cys was similarly determined to be between the exocyclic amino group of dC and the Cys sulfhydryl, based on the observed coupling between the bridging methylene protons at 4.48 and 4.26 ppm and C4 of dC at 162.5 ppm, and between the bridging methylene protons and Cβ of Cys at 35.1 ppm (Figure S37). As in the case of Cys-CH2-dA, the position of the cross-link determined from the HMBC spectrum is supported by one-proton signals attributable to N4H of dC and Boc-NαH.

His cross-link with dA

A low yield of a stable cross-linked product incorporating one methylene linkage, as identified by accurate mass measurement, was isolated from the reaction of His and dA with formaldehyde. Despite the absence of C,H cross peaks between the linker methylene and His, attachment at Nε1 could readily be established by the presence of the imidazole ring C–H proton signals Hδ1 at 6.96 ppm and Hε2 at 7.61 ppm, which were definitively assigned by unsuppressed one-bond coupling to carbon signals at 115.5 and 136.1 ppm, respectively, in the HMBC spectrum (Figure S39). The site of attachment to dA rests on the presence of a signal with an integral value of one proton assigned to N6H. When the cross-link is derived from 13C-formaldehyde, a broad two-proton signal at 5.59 ppm in the 1H NMR spectrum is replaced by a two-proton doublet centered at 5.59 ppm (1JC-H = 153.2 Hz; Figure S40), which can be assigned to the formaldehyde-derived methylene bridge. This signal shows the expected NOE interactions with the imidazole protons in the ROESY spectrum (Figure S41).
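The 13C-labeled methylene appears as a doublet because the one-bond C–H coupling splits the 5.59 ppm signal symmetrically. Converting the 153.2 Hz coupling into chemical-shift units requires the spectrometer frequency, which is not stated in this section; the 600 MHz value below is purely a hypothetical example, and the function name is illustrative.

```python
def doublet_positions(center_ppm: float, j_hz: float,
                      spectrometer_mhz: float) -> tuple[float, float]:
    """Chemical shifts (ppm) of the two lines of a 1J(C,H) doublet."""
    half_split_ppm = (j_hz / 2.0) / spectrometer_mhz  # Hz divided by MHz gives ppm
    return center_ppm - half_split_ppm, center_ppm + half_split_ppm


# Hypothetical 600 MHz 1H frequency; center and J taken from the text
low, high = doublet_positions(5.59, 153.2, 600.0)
print(f"lines at {low:.3f} and {high:.3f} ppm")
```

At a higher field the same coupling in hertz spans a smaller range in ppm, so the line positions depend on the instrument while the 153.2 Hz splitting does not.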

Trp cross-link with dG

The Trp coupling product incorporated one methylene cross-link, as determined by accurate mass measurement. The site of attachment to Trp can be definitively fixed at C2 of the indole ring, analogous to the reported formation of tetrahydro-β-carboline from Trp via intramolecular incorporation of a formaldehyde-derived methylene when the α-amino group is unprotected27,28. The proton signal of the formaldehyde-derived methylene forming the cross-link to dG is identified as a broad two-proton resonance at 5.30 ppm in the 1H NMR spectrum, attached to a carbon with a signal at 79.4 ppm in the HSQC spectrum (Figure S42), which is also detected by unsuppressed one-bond coupling (1JC-H = 164.4 Hz; Figure S43) in the HMBC spectrum. Attachment of the methylene at indole C2 is supported by the detection of the indole benzo-ring carbons as the only indole carbons with attached protons in the HSQC spectrum, and by NOE interactions between the bridge methylene and Trp CβH2 proton signals in the ROESY spectrum (Figure S44), which would not be observed if the methylene were attached at the indole nitrogen. While the proton-bearing carbons of the Trp indole ring were readily identified in the HSQC spectrum, overlap of the quaternary carbon signals of the indole and dG moieties precludes establishing connectivity between the methylene bridge and dG from the HMBC spectrum. Attachment of the methylene bridge at the exocyclic amino group of dG is inferred from the absence of a two-proton signal assignable to an unsubstituted guanine exocyclic amino group and from fragmentation of the protonated molecule to yield the Schiff base of dG as the base peak in the MS/MS spectrum (Figure S24).

Conclusion

We have investigated the formation of formaldehyde-induced cross-links between nucleosides and amino acids and provide the first rigorous structural characterizations of DNA-protein cross-links induced by coupling with formaldehyde. Eight cross-linked structures have been characterized. In five of these structures, the cross-link arose via a methylene bridge connecting the exocyclic amino group of a nucleoside to a nucleophilic nitrogen or sulfur of an amino acid side chain. In the case of Trp, the points of attachment were the N2 exocyclic amino group of dG and C2 of the indole ring. Lys, in addition to forming the expected dG-N2-CH2-Nε-Lys cross-link, formed two tricyclic adducts with dG: a 10-oxo-triazino[1,2-a]purinyl derivative incorporating two formaldehyde molecules and a 5-hydroxymethyl-10-oxo-triazino[1,2-a]purinyl adduct incorporating three formaldehyde molecules. While formation of the tricyclic adducts was favored at formaldehyde concentrations of 50 and 100 mM, all three adducts were identified at the lowest concentration tested (5 mM). The three Lys cross-linked products were labile in solution, and the cross-link formed by a single methylene bridge was too unstable for characterization by NMR spectrometry. The lability of the Lys-dG cross-links supports the reported reversibility of formaldehyde-induced cross-links formed in vivo between histones and DNA20. The cross-links Cys-CH2-dG, Cys-CH2-dA and Cys-CH2-dC were stable and readily isolated and characterized. The cross-links characterized in this work will contribute to a better understanding of DPC formation induced by formaldehyde and of the mode of action of this known human carcinogen. The stable structures identified in this study have potential as biomarkers for the occurrence of DPCs following formaldehyde exposure. Furthermore, by using [13CD2]-formaldehyde, exposure-related exogenous cross-links can be differentiated from endogenous cross-links. Such sophisticated methods will be necessary to carefully examine formaldehyde adducts formed at the site of contact versus those formed at distant sites, in order to determine the plausibility of inhaled formaldehyde as a causative agent for leukemia29.
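The usefulness of [13CD2]-formaldehyde for separating exogenous from endogenous cross-links rests on a predictable mass shift. Assuming that each incorporated formaldehyde unit keeps its 13C and both carbon-bound deuterons (expected for the methylene bridges, whose C–D bonds do not exchange with solvent), the shift per unit can be sketched as follows; the function name is hypothetical.

```python
# Standard monoisotopic masses (Da)
C12, C13 = 12.0, 13.0033548
H1, H2 = 1.00782503, 2.0141018  # protium and deuterium

# One [13CD2]-formaldehyde unit: 12C -> 13C plus two H -> D (assumed fully retained)
SHIFT_PER_UNIT = (C13 - C12) + 2 * (H2 - H1)


def labeled_mass_shift(formaldehyde_units: int) -> float:
    """Mass increase of a cross-link carrying n retained 13CD2 units (assumption)."""
    return formaldehyde_units * SHIFT_PER_UNIT


# Unit counts follow the structures described above
for name, n in [("Lys-CH2-dG", 1), ("TPHA-1", 2), ("TPHA-2", 3), ("Cys-CH2-dG", 1)]:
    print(f"{name:11s} +{labeled_mass_shift(n):.4f} Da")
```

Under this assumption each formaldehyde unit adds about 3.016 Da, so exposure-derived and endogenous forms of the same cross-link would appear at distinct masses.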

Supplementary Material

1_si_001

Acknowledgments

This work was supported in part by NIH grants P30-ES10126, P42-ES05948 and a grant from the Formaldehyde Council, Inc.

Footnotes

Supporting Information available: Complete HMBC and ROESY NMR spectra of Nα-Boc-amino acid-deoxynucleoside cross-links. Full HSQC spectra of TPHA-1, TPHA-2 and Trp-CH2-dG. Complete QTOF–MS/MS spectra of Nα-Boc-amino acid-trinucleotide cross-links and complete QTOF–MS/MS spectra of N-terminal acetylated octapeptide-deoxynucleoside cross-links.

Contributor Information

Avram Gold, Email: golda@email.unc.edu.

Louise M Ball, Email: lmball@email.unc.edu.

James A. Swenberg, Email: jswenber@email.unc.edu.

References

1. International Agency for Research on Cancer. IARC Monographs on the evaluation of carcinogenic risks to humans. Vol. 62. 1995. pp. 217–245.
2. Swenberg JA, Kerns WD, Mitchell RI, Gralla EJ, Pavkov KL. Cancer Res. 1980;40:3398–3402.
3. Kerns WD, Pavkov KL, Donofrio DJ, Gralla EJ, Swenberg JA. Cancer Res. 1983;43:4382–4392.
4. Monticello TM, Swenberg JA, Gross EA, Leininger JR, Kimbell JS, Seilkop S, Starr TB, Gibson JE, Morgan KT. Cancer Res. 1996;56:1012–1022.
5. Speit G, Schutz P, Merk O. Mutagenesis. 2000;15:85–90. doi: 10.1093/mutage/15.1.85.
6. Merk O, Speit G. Environ Mol Mutagen. 1998;32:260–268. doi: 10.1002/(sici)1098-2280(1998)32:3<260::aid-em9>3.0.co;2-m.
7. Cheng G, Wang M, Upadhyaya P, Villalta PW, Hecht SS. Chem Res Toxicol. 2008;21:746–751. doi: 10.1021/tx7003823.
8. Lu K, Ye W, Gold A, Ball LM, Swenberg JA. J Am Chem Soc. 2009;131:3414–3415. doi: 10.1021/ja808048c.
9. Siomin YA, Simonov VV, Poverenny AM. Biochim Biophys Acta. 1973;331:27–32. doi: 10.1016/0005-2787(73)90415-2.
10. McGhee JD, von Hippel PH. Biochemistry. 1975;14:1281–1296. doi: 10.1021/bi00677a029.
11. McGhee JD, von Hippel PH. Biochemistry. 1975;14:1297–1303. doi: 10.1021/bi00677a030.
12. McGhee JD, von Hippel PH. Biochemistry. 1977;16:3267–3276. doi: 10.1021/bi00634a001.
13. McGhee JD, von Hippel PH. Biochemistry. 1977;16:3276–3293. doi: 10.1021/bi00634a002.
14. Chang YT, Loew GH. J Am Chem Soc. 1994;116:3548–3555.
15. Metz B, Kersten GFA, Hoogerhout P, Brugghe HF, Timmermans HAM, de Jong A, Meiring H, Hove JT, Hennink WE, Crommelin DJ, Jiskoot W. J Biol Chem. 2004;279:6235–6243. doi: 10.1074/jbc.M310752200.
16. Lu K, Boysen G, Gao L, Collins LB, Swenberg JA. Chem Res Toxicol. 2008;21:1586–1593. doi: 10.1021/tx8000576.
17. Chaw YFM, Crane LE, Lange P, Shapiro R. Biochemistry. 1980;19:5525–5531. doi: 10.1021/bi00565a010.
18. Huang HF, Solomon MS, Hopkins PB. J Am Chem Soc. 1992;114:9240–9241.
19. Huang HF, Hopkins PB. J Am Chem Soc. 1993;115:9402–9408.
20. Quievryn G, Zhitkovich A. Carcinogenesis. 2000;21:1573–1580.
21. Solomon MJ, Varshavsky A. Proc Natl Acad Sci USA. 1985;82:6470–6474. doi: 10.1073/pnas.82.19.6470.
22. Brodolin K. Protein-DNA crosslinking with formaldehyde in vitro. In: Andrew T, Malcolm B, editors. DNA-Protein interaction. Oxford University Press; Oxford: 2000. pp. 141–150.
23. Koc H, Swenberg JA. J Chromatography B. 2002;778:323–343. doi: 10.1016/s1570-0232(02)00135-6.
24. Daniels DS, Woo TT, Luu KX, Noll DM, Clarke ND, Pegg AE, Tainer JA. Nat Struct Mol Biol. 2004;11:714–720. doi: 10.1038/nsmb791.
25. Vrkic AK, O'Hair RAJ, Foote S. Aust J Chem. 2000;53:307–319.
26. Paizs B, Suhai S. J Am Soc Mass Spectrom. 2004;15:103–113. doi: 10.1016/j.jasms.2003.09.010.
27. Tammler U, Quillan JM, Lehmann J, Sadee W, Kassack MU. Eur J Med Chem. 2003;38:481–493. doi: 10.1016/s0223-5234(03)00062-x.
28. Zhao M, Bi L, Bi W, Wang C, Yang Z, Ju J, Peng S. Bioorg Med Chem. 2006;14:4761–4774. doi: 10.1016/j.bmc.2006.03.026.
29. Zhang L, Steinmaus C, Eastmond DA, Xin XK, Smith MT. Mutat Res. 2009;681:150–168. doi: 10.1016/j.mrrev.2008.07.002.
