Abstract
DNA-protein cross-links (DPCs) have broad applications in mapping DNA-protein interactions and provide structural insights into macromolecular structures. However, high-resolution mapping of DNA-interacting amino acid residues with tandem mass spectrometry (MS) remains challenging due to difficulties in sample preparation and data analysis. Herein, we developed a method for identifying cross-linking amino acid residues in DNA-protein cross-links at single amino acid resolution. We leveraged the alkaline lability of ribonucleotides and designed ribonucleotide-containing DNA to produce structurally defined nucleic acid-peptide cross-links under our optimized ribonucleotide cleavage conditions. The structurally defined oligonucleotide-peptide heteroconjugates improved ionization, reduced the database search space, and facilitated the identification of cross-linking residues in peptides. We applied the workflow to identifying abasic (AP) site-interacting residues in human mitochondrial transcription factor A (TFAM)-DNA cross-links. With sub-nmol sample input, we obtained high-quality fragmentation spectra for nucleic acid-peptide cross-links and identified 14 cross-linked lysine residues with the home-built AP_CrosslinkFinder program. Semi-quantification based on integrated peak areas revealed that K186 of TFAM is the major cross-linking residue, consistent with K186 being the lysine residue closest to the AP modification in solved TFAM:DNA crystal structures. Additional cross-linking lysine residues (K69, K76, K136, K154) support the dynamic characteristics of TFAM:DNA complexes. Overall, our combined workflow using a ribonucleotide as a chemically cleavable DNA modification together with optimized sample preparation and data analysis offers a simple yet powerful approach for mapping cross-linking sites in DNA-protein cross-links. The method is amenable to other chemical or photo-cross-linking systems and can be extended to complex biological samples.
INTRODUCTION
DNA-protein interactions are essential to all aspects of genetic information transfer in living organisms, such as replication, transcription, recombination, and repair.1 Covalent DNA-protein cross-links (DPCs) mediated by chemical linkers serve as useful tools to map these interactions.2-4 DPCs also form in biological contexts as intermediates during enzymatic processing of DNA or as products when genetic materials are under chemical/physical assault.1,5,6 In particular, a class of DPCs has been shown to form between a prevalent DNA modification, abasic (AP) sites,7 and lysine residues on interacting proteins via Schiff base chemistry.8 DPCs derived from AP or oxidized AP sites have been demonstrated with DNA repair proteins,9,10 nucleosome core particles,11,12 and a transcription factor in mitochondria.13 If DPCs form excessively14 or are not removed promptly,15 they can block many DNA transactions.16,17 Therefore, DPCs play a critical role in genomic maintenance and human health.
Identifying DNA-interacting amino acid residues is of fundamental importance to understanding the basis of DNA-protein interactions. Structural biology approaches, such as X-ray crystallography18 and cryo-EM,19 can generate high-resolution data; however, they often require large amounts of purified samples and offer limited information on the dynamics of macromolecules or complexes. Solution NMR is powerful in characterizing the structure and dynamics of biomolecules but is often limited to analytes with low molecular weights (<35 kDa).20 Over the past decade, cross-linking mass spectrometry (MS) has emerged as a powerful tool in structural biology and interactome research.21 The method can deliver medium-resolution information to complement classical structural biology and computational approaches. Recently, methods to study DNA-protein interactions in reconstituted systems and at a proteome level have also been developed.22,23
Despite advances in cross-linking methodologies and mass spectrometry instrumentation, analyzing nucleic acid-protein conjugates remains challenging. The challenge is compounded by several factors in sample preparation and data analysis. First, both the protein and the oligonucleotide need to be digested into short fragments in shotgun proteomics. The complete digestion of the oligonucleotide is not trivial due to the steric hindrance imposed by oligonucleotide-peptide cross-links.24 Commonly used digestion methods involving nuclease cocktails tend to produce a mixture of mono-, di-, and oligonucleotides and, consequently, structurally heterogeneous oligonucleotide-peptide conjugates.24 Second, data analysis remains labor-intensive due to the low signal intensity of cross-links and their structural heterogeneity. Consequently, uncertainty in peptide search and difficulties in identifying corresponding cross-links are unavoidable.25
In this study, we developed a mass spectrometry-based approach to map cross-linked amino acid residues in DNA-protein cross-links at single amino acid resolution. Our approach exploits the reactivity of AP sites with lysine residues and the lability of ribonucleotides under alkaline conditions to generate structurally defined DNA-peptide cross-links. The ribonucleotides in DNA substrates facilitate the preparation of DNA-peptide cross-links with a predictable nucleic acid fragment via our optimized conditions, which we refer to as the Cleave R reaction. Compared to nuclease digestion, our method avoids digestion bias and facilitates data analysis and interpretation. We combined the optimized sample preparation with data analysis using our custom AP_CrosslinkFinder program. The optimized workflow was applied to mapping the lysine residues of human mitochondrial transcription factor A (TFAM) that interact with an AP modification on DNA. We successfully identified 14 cross-linked lysine residues, which provide insights into the AP site-interacting residues on TFAM and the dynamic characteristics of TFAM:DNA complexes.
Experimental Section
Materials and methods.
Chemicals were from Sigma Aldrich or Fisher Scientific and were analytical grade or molecular biology grade. MS grade trypsin was purchased from Fisher Scientific. Uracil DNA glycosylase (UDG) was purchased from New England Biolabs. Oligodeoxyribonucleotides (ODNs) were purchased from Integrated DNA Technologies. The AP-containing DNA oligomer was prepared following a reported procedure.13 Briefly, a deoxyuridine-containing DNA oligomer was treated with UDG to convert deoxyuridine to an AP modification. The AP-containing DNA oligomer was purified via phenol/chloroform extraction followed by annealing with a complementary ODN with no base modifications. The recombinant human mitochondrial transcription factor A (TFAM 43-246) and human AP endonuclease 1 (APE1) were expressed and purified based on our previous protocol.13
Electrophoretic mobility shift assay.
The non-denaturing gels were composed of 6% polyacrylamide (acrylamide/bis-acrylamide, 64/1) in 0.35X TBE (Tris-Borate-EDTA) buffer and pre-run for 30 min at 4°C. The TFAM:DNA complex was assembled on ice, equilibrated at room temperature for 1 h, and separated in the native gel at 4°C at 100 V. The gel was imaged with a Typhoon imager (GE Healthcare) and quantified using ImageQuant software. Data fitting was performed with GraphPad Prism v8.0.
Preparation of TFAM-DNA cross-links.
A dsDNA substrate containing an AP lesion was incubated with recombinant human TFAM for 12 h at 37°C. Reactions contained 4 μM AP-DNA, 8 μM TFAM, 20 mM HEPES (pH 7.4), 90 mM NaCl, 20 mM EDTA, and 25 mM NaBH3CN. The reaction was quenched by adding 100 mM NaBH4 followed by incubation on ice for 30 min. The yield of DPC was analyzed using an 8 × 10 cm SDS-urea (7 M)-PAGE (12%) gel. The DPC reaction mixtures were stored at −20°C for further use.
Enrichment of DNA-peptide cross-links.
The DPC reaction mixture was digested by trypsin in 100 mM Tris-HCl pH 8.0 and 20 mM CaCl2 at 37°C overnight. The trypsin/TFAM ratio was 1:5 (wt/wt). The digestion mixture was concentrated with a 3 kDa molecular weight cut-off filter (Millipore) and washed with 10 mM Tris-HCl pH 8.0 at 4°C.
Cleavage reaction of ribonucleotides by Cleave R.
The Cleave R reaction was performed on the trypsin-digested DNA-peptide cross-links by incubating them with 50 mM glycine-NaOH pH 10.0 and 15 mM MgCl2 at 55°C overnight. The efficiency of the cleavage reaction was checked with DNA-sequencing PAGE. The reaction mixture was neutralized with acetic acid to pH 8.0 before LC-MS/MS analysis.
Data analysis with MaxQuant.
Proteomic analysis and the enrichment efficiency of the workflow were evaluated with MaxQuant26 by searching against the TFAM sequence and the MaxQuant default contaminant sequences. Methionine oxidation and tryptophan oxidation were set as variable modifications. The default settings for Orbitrap instruments were applied, and up to three missed cleavages were allowed.
LC-MS/MS analysis.
Liquid chromatography was performed on a Thermo nLC1200 in single-pump trapping mode with a Thermo PepMap RSLC C18 EASY-spray column (2 μm, 100 Å, 75 μm x 25 cm) and a Pepmap C18 trap column (3 μm, 100 Å, 75 μm x 20 mm). Solvents used were A: water with 0.1% formic acid and B: 80% acetonitrile with 0.1% formic acid. Samples were separated at 300 nL/min with a 130-minute gradient starting at 3% B (held for 1 minute), then increasing to 50% B in 110 minutes, then to 100% B in 10 minutes, and held at 100% B for 9 minutes.
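The gradient program described above can be written as a piecewise-linear function of time, which makes it easy to confirm that the segments add up to the stated 130 minutes. The following is a minimal sketch in Python; the breakpoint table is read directly from the text, while the function names (`percent_b`, `percent_acetonitrile`) are illustrative assumptions and not part of any instrument software.

```python
# Gradient breakpoints as (time in min, %B), taken from the method text:
# 3% B held for 1 min, to 50% B over 110 min, to 100% B over 10 min,
# then held at 100% B for 9 min (130 min in total).
GRADIENT = [(0.0, 3.0), (1.0, 3.0), (111.0, 50.0), (121.0, 100.0), (130.0, 100.0)]


def percent_b(t_min):
    """Linearly interpolate the %B composition at time t_min (minutes)."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_acetonitrile(t_min):
    """Solvent B is 80% acetonitrile, so scale %B accordingly."""
    return 0.8 * percent_b(t_min)
```

For example, `percent_b(56)` falls in the main separation segment, and `percent_acetonitrile` converts the mobile-phase composition into the actual organic content.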
Mass spectrometry data were acquired on a Thermo Orbitrap Fusion in data-dependent mode. A full scan was conducted using 60k resolution in the Orbitrap in positive mode with a mass range of 375-1500 m/z and an AGC target of 4.0e5. Precursors for MS2 were filtered by monoisotopic peak determination for peptides, an intensity threshold of 1.0e4, charge states 2-7, and 5-second dynamic exclusion after 1 analysis with a mass tolerance of 10 ppm. MS2 spectra were collected in the ion trap with an isolation window of 1.6 Da, an AGC target of 3.0e4, and a 300 ms maximum injection time. For each precursor, MS2 scans were collected using both collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD), with CID having scan priority 1 and HCD scan priority 2. Both methods used 35% collision energy.
Data analysis with AP_CrosslinkFinder.
Raw MS data were first converted into MGF files with MSConvert GUI from ProteoWizard. These MGF files were loaded into the custom MATLAB-based program AP_CrosslinkFinder. The custom scripts were developed based on Find_XL27 by the Kalisman laboratory for the identification of peptide-peptide cross-links, with a number of changes made to adapt it to the analysis of DNA-peptide cross-links. The DNA sequence was entered as required. The in silico digestion of TFAM allowed three missed cleavages with oxidative modifications on methionine (+16 Da) and tryptophan (+4, 16, 20, 32 Da). The precursor mass tolerance was 10 ppm, and the fragment mass tolerance was 30 ppm. The peptide fragments were searched with b and y ions, while the DNA fragments were searched with a, a-b, c, y, and w ions. The DNA component was set to be the AP residue alone or the AP residue and a neighboring AMP, with or without the 5'-phosphate group. The output structure ms1 displayed the identified cross-links and a score for each cross-link based on the number of fragment ions found in MS2. MS2 spectra were annotated manually.
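The first stage of such a search (in silico digestion with missed cleavages, followed by precursor matching within a ppm tolerance) can be illustrated with a short sketch. The Python code below is not the AP_CrosslinkFinder implementation, which is MATLAB-based; it is a simplified illustration under stated assumptions: trypsin is assumed to cleave after K or R except before P, variable oxidations are omitted, the toy sequence is hypothetical, and adduct masses are supplied by the user because only nominal adduct values are reported in this article.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # mass of a proton


def tryptic_peptides(seq, max_missed=3):
    """Digest after K/R (not before P), allowing up to max_missed missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1])
             if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def peptide_mass(pep):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in pep) + WATER


def precursor_mz(mass, z):
    """m/z of a neutral mass carrying z protons."""
    return (mass + z * PROTON) / z


def match_precursor(obs_mz, z, peptides, adducts, tol_ppm=10.0):
    """Return (peptide, adduct name, ppm error) for lysine-containing candidates.

    adducts maps a label to a user-supplied monoisotopic mass shift (Da).
    """
    hits = []
    for pep in peptides:
        if "K" not in pep:  # only lysine-containing peptides can cross-link here
            continue
        for name, delta in adducts.items():
            theo = precursor_mz(peptide_mass(pep) + delta, z)
            ppm = (obs_mz - theo) / theo * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((pep, name, ppm))
    return hits
```

Because the nucleic acid moiety is short and structurally defined, the `adducts` dictionary would hold only a handful of entries, which is what keeps the candidate list small in this kind of search.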
RESULTS
Optimizing the Cleave R Reaction.
To create DNA-peptide cross-links with a predictable DNA fragment, we exploited the alkaline-labile characteristics of ribonucleotides (Figure 1a)28 and designed DNA substrates (sequence shown in Table S1) with ribonucleotides adjacent to the cross-linking deoxyribonucleotide residue (Figure 1b). All DNA substrates are double-stranded (ds) with the ribonucleotide-containing strand harboring a 5'-fluorescein label to facilitate product analysis via polyacrylamide gel electrophoresis (PAGE). The sequences of these substrates are based on the light-strand promoter sequence of human mitochondrial DNA, which has well-characterized binding properties with TFAM.29,30 To obtain the optimal condition to cleave two ribonucleotides on a model substrate (D4), we tested RNase HII31 and 0.3 M NaOH,32 which have been commonly used to cut ribonucleotide-containing DNA.33 RNase HII failed to completely cleave two ribonucleotides up to 21-h reactions (Figure S1a). The cleavage condition with 0.3 M NaOH at 55°C was complete within 2 hours (Figure S1b); however, this reaction hydrolyzed TFAM (Figure S1c), destroying the amino acid signatures of the cross-linked peptides.
Figure 1.
Generation of DNA fragments via Cleave R reaction. (a) Mechanism of alkaline transesterification reaction to generate the strand break at the 3'-side of the ribonucleotide. Under basic conditions, the cyclic phosphate product can be converted to a mixture of 3'-phosphate and 2'-phosphate.28 (b) Schematic illustration of the ribonucleotide containing DNA. Deoxyribonucleotides are in blue; ribonucleotides are in yellow; the cross-linking nucleotide residue is in red. (c) Polyacrylamide gel analysis of products from the ribonucleotide cleavage reaction. Lane 1 is D4. Lane 2 is D4 incubated with 0.3 M NaOH for 10 min. Lanes 3 and 4 are D4 subjected to Cleave R reactions for 19 and 38 hours.
Next, we designed alternative cleavage conditions inspired by a study by Breaker et al.34 The authors demonstrated that the rate of the alkali-promoted transesterification reaction increases with increasing pH, [Mg2+], and temperature.34 We tested a variety of reaction conditions with D4 (Figure S1d, Table S2) and found that a reaction in the presence of glycine-NaOH (pH 10) and 20 mM MgCl2 at 55°C for 38 h provides the best cleavage yield (>85%), as shown in Figure 1c. In addition, we examined the stability of DNA-peptide cross-links under this condition. No apparent degradation of the cross-links was observed after 20 h incubation (Figure S3). Therefore, we chose this optimized condition (referred to as the Cleave R reaction) for the subsequent ribonucleotide cleavage reactions.
Establishing a Workflow for Analyzing Cross-linking Residues in DPC.
We designed two DNA substrates (D1 and D2, sequences shown in Table S1) for cross-linking reactions. D1 is a double-stranded (ds) DNA with an AP modification located on one strand (Figure 2b). The AP-containing oligodeoxyribonucleotides (ODNs) were prepared using precursor ODNs containing a deoxyuridine at the lesion position, followed by removal of uracil using a DNA repair enzyme, uracil-DNA glycosylase (UDG).35 The dsDNA substrates were prepared by annealing the AP-ODN with a complementary strand. In D2, the two deoxynucleotides neighboring the AP lesion are substituted with ribonucleotides. Compared to D1, the ribonucleotide-containing substrate D2 showed similar DNA-binding stoichiometry, as demonstrated by the electrophoretic mobility shift assay (EMSA) (Figure 2c). Although the TFAM/D2 molar ratio (1.5) at which the highest yield of TFAM:DNA complexes was observed differed slightly from that observed with D1 (2), the overall yields were comparable under the two conditions (86% for TFAM:D2 complexes, 95% for TFAM:D1 complexes). We selected a molar ratio of 1:1.5 (D2:TFAM) to generate the highest yield of TFAM:DNA complexes containing one TFAM and one DNA molecule for subsequent reactions. The complex, presumably conforming to the TFAM:DNA crystal structures,29,30 serves as a good model to probe specific interactions between DNA and TFAM.
Figure 2.
DPC formation with ribonucleotide-containing DNA and the workflow for identifying cross-linking amino acid residues. (a) Reaction mechanism of the formation of DPC via Schiff base chemistry. Reductive amination with NaBH3CN stabilizes the reaction intermediates for mass spectrometry analysis. (b) DNA substrates used in this study. D1 is a double-stranded oligodeoxyribonucleotide with an AP modification located on the top strand. In D2, the two deoxynucleotides neighboring the AP lesion are substituted with ribonucleotides. (c) DNA-TFAM binding stoichiometry determined by electrophoretic mobility shift assay (EMSA). Lanes 1 and 2 are with substrate D1 and contain DNA(D):TFAM(T) complexes formed at molar ratios of 1:1 and 1:2 (DNA:TFAM), respectively. In lane 1, the percent yield of the 1D:1T complexes is 95%. Lane 3 contains substrate D2. Lanes 4-6 contain D2:TFAM complexes formed at molar ratios of 1:1, 1:1.5, and 1:2 (DNA:TFAM), respectively. In lane 5, the percent yield of the 1D:1T complexes is 86%. (d) Workflow for mapping AP-DNA-protein cross-linking residues with LC-MS/MS. DPC was digested with trypsin overnight to form non-cross-linked peptides and DNA-peptide conjugates. Free peptides were removed and DNA-peptide conjugates were enriched with a 3 kDa molecular weight cut-off filter, followed by the Cleave R reaction. The DNA substrate was cleaved at ribonucleotides to yield DNA-peptide cross-links with a predictable number of nucleotide residues. Reaction products were subjected to LC-MS/MS analysis. DNA-peptide cross-links were identified using AP_CrosslinkFinder followed by manual annotation. (e) SDS-urea PAGE analysis of DPC from reactions of TFAM with D1 or D2. Lanes 1 and 2 contain D1 and its corresponding strand break marker. Lanes 3-5 are the DPC reactions between D1 and TFAM for 3, 6, and 12 hours. Lanes 6-8 are the DPC reactions between D2 and TFAM for 3, 6, and 12 hours. SSB refers to DNA single-strand breaks. The percent yield of DPC is 92% for reactions with D1 and TFAM, and 86% for reactions with D2 and TFAM. Detailed analysis is shown in Figure S7. (f) DNA-sequencing PAGE analysis of the resulting DNA fragments after Cleave R reactions. Lanes 1 and 4 are D2, and lane 2 is D2 upon cleavage with AP endonuclease 1 (APE1) (Figure S4a). Lanes 3 and 6 are trypsin-digested D2-DPC after Cleave R reactions. Lane 5 is the product of D2 cleaved with 1 M NaOH. Lane 7 is the sample from lane 3 (or 6) treated with alkaline phosphatase.
We aimed to develop an experimental and computational workflow to map the cross-linking residues on TFAM within TFAM-DNA complexes (Figure 2d). We harnessed the Schiff base intermediates formed between an AP modification and lysine residues on TFAM and captured the interactions using covalent DNA-protein cross-links. In situ trapping of the Schiff base intermediate was achieved by reductive amination in the presence of NaCNBH3 (Figure 2a). We incubated TFAM with D2 in the presence of NaCNBH3 for 12 h followed by quenching of any unreacted AP sites and unreduced DPC with NaBH4. Relative to D1, D2 has a similar DPC yield after 12 h, indicating that the substitution with ribonucleotides in DNA did not perturb the cross-linking reaction (Figure 2e). The stabilized DPCs were digested with trypsin to form DNA-peptide cross-links and non-cross-linked peptides. The trypsin digestion was monitored by SDS-urea PAGE (Figure S2). The resulting sample was filtered through a 3 kDa molecular weight cut-off filter to remove the majority of the non-cross-linked peptides, which simplifies the data search and enriches DNA-peptide cross-links. We verified the removal of non-cross-linked peptides by searching with MaxQuant.26 The enriched DNA-peptide cross-links were converted to DNA-peptide cross-links with a predictable nucleic acid moiety under the Cleave R condition. The successful cleavage of DNA was confirmed by SDS-urea PAGE (Figure S2) and high-resolution DNA-sequencing PAGE (lanes 3 and 6 of Figure 2f). In addition, no apparent degradation of DNA-peptide cross-links was observed after 20-h incubation (Figure S3), reinforcing the suitability of the Cleave R reaction for converting nucleic acids into the desired products while preserving the structure of the peptides. In contrast, under 0.3 M NaOH, the peptide fragment in DNA-peptide cross-links was hydrolyzed (lanes 7-9 of Figure S3), indicating that this commonly used condition for ribonucleotide cleavage is not applicable to preparing DNA-peptide cross-links for subsequent mass spectrometry analysis.
To understand the chemistry of the DNA termini from the Cleave R reaction and to guide the data search for the nucleic acid-peptide cross-links, we treated the product with alkaline phosphatase and observed the disappearance of the lower band (lanes 6 and 7 of Figure 2f). The results indicate that products in the faster-migrating band contain a terminal phosphate group at the 3'-end and that products in the upper band do not, consistent with the phosphate mixtures produced under alkaline conditions (Figure 1a). The assignment is supported by comparing with standard products generated under NaOH (lane 5 of Figure 2f) and APE1 (lane 2 of Figure 2f).36 The NaOH treatment produces products containing a phosphate group at the 3'-end upon cleavage at ribonucleotides. Products from APE1 treatment contain a 3'-OH upon cleavage at the 5'-side of the abasic lesion37 (Figure S4a). These results prompted us to search for DNA components with and without the phosphate group when analyzing DNA-peptide cross-links.
The sample was analyzed by nanoLC-MS/MS. The mass spectrometry data were processed by the custom scripts of AP_CrosslinkFinder, which are modified based on Find_XL.27 We applied AP_CrosslinkFinder to find the AP cross-links, and the mass adducts of the cross-linked peptides were computed and added to the in silico digested TFAM. The search begins with matching the molecular weight in MS1, followed by searching the corresponding fragment ions in MS/MS and outputting the number of matches. The spectra were annotated manually. Together, our optimized workflow from sample preparation to data analysis ensures the identification of cross-linking amino acid residues in DPC.
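The two-stage logic described above (filter candidates by precursor mass, then count matching fragment ions) can be sketched as follows. This Python example is an illustration only and does not reproduce the AP_CrosslinkFinder scripts; the dictionary keys (`precursor_mz`, `peaks`, `fragments`, `name`) and the function names are assumptions made for this sketch, and the tolerances default to the 10 ppm and 30 ppm values given in the Experimental Section.

```python
from bisect import bisect_left


def count_matches(theoretical_mz, observed_mz, tol_ppm=30.0):
    """Count theoretical fragment ions that have an observed peak within tol_ppm."""
    peaks = sorted(observed_mz)
    matched = 0
    for t in theoretical_mz:
        tol = t * tol_ppm * 1e-6
        i = bisect_left(peaks, t - tol)  # first peak at or above the lower bound
        if i < len(peaks) and peaks[i] <= t + tol:
            matched += 1
    return matched


def search(spectra, candidates, ms1_tol_ppm=10.0, ms2_tol_ppm=30.0):
    """Stage 1: match precursor m/z; stage 2: score by number of fragment matches.

    spectra:    list of dicts with 'precursor_mz' and 'peaks' (list of m/z values)
    candidates: list of dicts with 'name', 'precursor_mz', and 'fragments'
    Returns (candidate name, match count) tuples, highest count first.
    """
    results = []
    for spec in spectra:
        for cand in candidates:
            ppm = (spec["precursor_mz"] - cand["precursor_mz"]) / cand["precursor_mz"] * 1e6
            if abs(ppm) <= ms1_tol_ppm:
                score = count_matches(cand["fragments"], spec["peaks"], ms2_tol_ppm)
                results.append((cand["name"], score))
    return sorted(results, key=lambda r: -r[1])
```

A simple count-based score of this kind ranks candidates but does not estimate a false discovery rate, which is one reason manual annotation of the spectra remains part of the workflow.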
Mapping AP reactive sites at single amino acid resolution in TFAM-DNA cross-links.
The short and structurally predictable nucleic acid component in DNA-peptide cross-links significantly decreased the search space and reduced the search time. The computational analysis of the LC-MS/MS data with AP_CrosslinkFinder was completed within an hour. We identified 14 unique TFAM-DNA cross-linking sites located on 15 unique peptides (Figure 3a, Table S3). Four types of mass adducts were observed for the cross-linked peptides in mass spectrometry analysis (+102, +118, +198, and +527). The mass adduct, 102 Da, is a minor product generated when the reaction proceeds with β-elimination before trapping by NaCNBH3 (Figure S4b). The proposed mechanism by which the other three major products are formed is shown in Figure 3b. We presume that the 4'-OH of the open-chain form of AP sites or the 2'-OH of the ribonucleoside can undergo nucleophilic attack at the phosphate, leading to the formation of cyclic phosphate intermediates (structures in the middle of Figure 3b). The cyclic phosphates can be hydrolyzed, resulting in products containing a mass adduct of 527 or 198 Da. The phosphate group can be hydrolyzed, as evidenced by PAGE analysis (Figure 2f), generating additional products with a mass adduct of 447 or 118 Da (Figure 3b). The MS/MS spectra of the cross-links contain both fragmented peptides and fragmented ODNs. For the cross-links with a mass adduct of 527 or 447, the most abundant product ions are generated by nucleic acid fragmentation owing to the weak N-glycosidic bond and phosphoester bonds in the nucleic acid moiety (Figure 3c). The nucleic acid fragment derived from D2 contains an AP lesion and an adenosine monophosphate residue, which readily loses an adenine residue, generating an abundant product ion with m/z=136. We considered m/z=136 as the diagnostic ion of this type of cross-links (Figures 3d and 3e). 
For the mass adducts 198 and 118, the nucleic acid component in DPC contains the reduced ring-opened AP site, which appears to be relatively stable under MS fragmentation conditions. Under collision-induced dissociation (CID), the b- and y-ions of these cross-links contain the aforementioned mass adducts relative to the native peptides (Figures 3f and 3g), facilitating the assignment of peptide sequences and mapping of the cross-linking residues at single amino acid resolution. The fragmentation patterns of nucleic acid-peptide cross-links guided us to develop the MATLAB-based program, AP_CrosslinkFinder.
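The way adduct-carrying b- and y-ions pinpoint the modified lysine can be made concrete with a small sketch. For each candidate lysine, the adduct mass is added to that residue, the singly protonated b/y series is computed, and the candidate whose series explains the most peaks is taken as the likely site. The Python code below is illustrative only: the peptide `AKGKR` is hypothetical, and the adduct value of 118.0 is the nominal mass from the text used as a placeholder, not an exact monoisotopic mass.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(pep, site, adduct):
    """Singly protonated b and y ions with the adduct on residue index `site` (0-based)."""
    masses = [MONO[aa] + (adduct if i == site else 0.0) for i, aa in enumerate(pep)]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(pep))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(pep))]
    return b, y


def localize(pep, adduct, peaks, tol_ppm=30.0):
    """Score every lysine as the candidate cross-linking site.

    Returns {lysine index: number of b/y ions matched within tol_ppm}.
    """
    scores = {}
    for site, aa in enumerate(pep):
        if aa != "K":
            continue
        b, y = fragment_ions(pep, site, adduct)
        scores[site] = sum(
            any(abs(p - t) <= t * tol_ppm * 1e-6 for p in peaks) for t in b + y
        )
    return scores


# Hypothetical example: simulate a spectrum with the adduct on the second lysine.
ADDUCT = 118.0  # nominal placeholder value, not an exact mass
true_b, true_y = fragment_ions("AKGKR", 3, ADDUCT)
scores = localize("AKGKR", ADDUCT, true_b + true_y)
best_site = max(scores, key=scores.get)
```

In this toy case, fragments that contain both lysines or neither are uninformative, so the wrong site still matches half of the ions; only the fragments that separate the two lysines discriminate between them, which is why good sequence coverage matters for single-residue resolution.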
Figure 3.
Identification of the DNA-protein cross-linking sites in DNA-TFAM cross-links. (a) Representative DNA-peptide cross-links found in this study. The cross-linked amino acid residues are in blue. (b) The proposed reaction scheme for the observed mass shift of 118, 198, 447, and 527. (c) The proposed fragmentation pattern for nucleotide derivatives observed in mass spectra in (d). (d), (e), (f), and (g) are MS/MS spectra of the nucleic acid-peptide cross-links shown in (a). In the MS/MS spectra, b ions are in green, and y ions are in red. The DNA fragments are in yellow, and the ions with fragmentations in the DNA component are in violet.
The predictable structures of the nucleic acid component facilitate database search and the assignment of the peptide sequence. The database search focuses on the tryptic peptides of TFAM cross-linked to the AP lesion in the nucleic acid moiety. Nearly all the nucleic acid-peptide cross-links have a mass accuracy within 3 ppm and MS/MS product ions matching their theoretical ions (Table S3, Figures S5 and S6), supporting confident DPC assignments.
Site and quantitative analysis of the cross-linking sites.
Our previous study demonstrated that K183, K186, and K190, three residues in the vicinity of the AP site (Figure 4a), play an important role in influencing the cross-linking rate in AP-DNA:TFAM complexes.13 The results here corroborate these earlier data and provide direct evidence for covalent cross-linking of K186 and K190 with AP-DNA under reductive amination. Semi-quantification based on the integrated peak areas of MS1 spectra reveals that K186 is the most abundant cross-linking site, consistent with K186 being closer to the AP site than the other two lysine residues in TFAM:DNA co-crystal structures (Figure 4a).29,30
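Semi-quantification of this kind amounts to integrating the MS1 chromatographic peak of each cross-linked species, summing the areas per site, and expressing each site as a fraction of the total. The sketch below shows one simple way to do this in Python with trapezoidal integration; it is not the procedure used to generate Figure 4, the function names are assumptions, and the site labels and areas in the usage note are made-up values for illustration only.

```python
def integrate_peak(times, intensities):
    """Trapezoidal area under an extracted-ion chromatogram trace."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for k in range(len(times) - 1):
        area += 0.5 * (intensities[k] + intensities[k + 1]) * (times[k + 1] - times[k])
    return area


def relative_abundance(areas_by_site):
    """Convert summed peak areas per site into percentages of the total."""
    total = sum(areas_by_site.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {site: 100.0 * area / total for site, area in areas_by_site.items()}
```

For instance, `relative_abundance({"site_A": 60.0, "site_B": 30.0, "site_C": 10.0})` returns 60%, 30%, and 10% for the three hypothetical sites. Because different cross-linked peptides may ionize with different efficiencies, such percentages are best read as semi-quantitative estimates rather than absolute stoichiometries.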
Figure 4.
High-resolution mapping of the cross-linking sites in TFAM-DNA complexes. (a) Identified cross-linking sites in TFAM-DNA complexes based on a co-crystal structure (PDB: 3TQ6). The cross-linked lysine residues are in red, and the AP lesion is illustrated in sticks and indicated by the black arrow. (b) Relative abundance of the cross-linking site identified in TFAM-DNA cross-links based on their integrated peak areas in LC-MS/MS analysis. (c) The abundance of cross-linked residues in four domains of TFAM. (HMG1 43-122 in blue, linker 122-152 in green, HMG2 152-222 in brown, and C-terminal tail 222-246 in pink). (d) Relative abundance of four types of observed mass adducts to the cross-linked peptides.
We grouped these cross-linking sites by the domains of TFAM (Figures 4b and 4c). Besides K186, additional residues such as K76, K69, and K62 in the HMG1 domain are shown to interact with the AP lesion (Figure 4c). The observation of cross-linking sites other than K186 is consistent with the dynamic characteristics of TFAM:DNA complexes in solution.38 In addition, the majority of the DNA-peptide cross-links identified have the mass adduct of 527 Da (Figure 4d), indicating that the cleavage reaction at the ribonucleoside is very efficient. The second most abundant mass adduct is 198 Da, which could form by the proposed mechanism via an alkaline transesterification reaction similar to the ribonucleoside transesterification reaction (Figure 3b).
DISCUSSION
In this study, we developed a mass spectrometry-based workflow for mapping cross-linked lysine residues in DNA-protein complexes. We leveraged the alkaline lability of ribonucleotides and designed ribonucleotide-containing DNA substrates to prepare structurally defined nucleic acid-peptide cross-links. The optimized Cleave R reaction cleaves DNA at ribonucleotides while retaining the integrity of nucleic acid-peptide conjugates. Common digestion methods for cleaving nucleic acids involve nuclease cocktails, which often yield a mixture of mononucleotides, dinucleotides, and oligonucleotides due to steric hindrance of different cross-links.24 The heterogeneous conjugates tend to decrease the ionization efficiency, complicate the database search, and hinder the identification of conjugates and cross-linking sites.25 Our approach creates structurally defined nucleic acid moieties in nucleic acid-peptide conjugates, which simplify database search and facilitate mapping of AP-interacting lysine residues at single amino acid resolution.
We exploited Schiff base chemistry to capture interactions between a reactive abasic modification and lysine residues of a protein. The 14 cross-linked lysine residues in TFAM-DNA complexes provided insights into the conformational dynamics and heterogeneity of TFAM:DNA complexes. The predominant cross-linking site at K186 is in agreement with K186 being the lysine residue closest to the AP modification in TFAM:DNA crystal structures. Additional cross-linking sites support the dynamic nature of the TFAM:DNA complexes and alternative binding conformations. For example, the heterogeneity of the TFAM:DNA complexes in solution has been demonstrated using small-angle X-ray scattering, single-molecule Förster resonance energy transfer assays, and molecular dynamics simulations.38 Alternative conformations of TFAM:DNA complexes (especially under micromolar TFAM) other than reported crystal structures could explain the observed cross-linking residues in the HMG1 and linker domains.39,40 The identified cross-linking residues K136, K139, K147, and K154 in the linker region corroborate the nonstatic characteristics of the TFAM:DNA complexes: the flexible linker is known to assist TFAM:DNA complexes in undergoing a butterfly or “breathing” movement.38 Furthermore, the higher DNA-binding affinity of the HMG1 domain relative to the HMG2 domain41 and the sliding of TFAM on DNA could also contribute to the cross-links formed with residues on HMG1.39
The reported sample preparation is simple, inexpensive, and requires no gel- or affinity-based purification, which avoids cumbersome sample workup and potential contaminations. In particular, the size-based single-step enrichment streamlines sample workup. With sub-nmol of TFAM sample, we were able to identify 14 cross-linking sites on TFAM, demonstrating that the workflow is sensitive in detecting major and minor cross-linking residues. The workflow is amenable to other cross-linking chemistry by replacing AP sites with a different reactive residue.42 A recent report using chemical digestion of RNA to map RNA-interacting proteins involves harsh chemical treatment, which may not be applicable to different cross-linking chemistries.24
To our knowledge, RNPXL is so far the only available software tool for analyzing the MS data of DPCs, and it requires the commercial Proteome Discoverer software.43 However, RNPXL cannot be applied to analyze AP site-derived DPCs due to its lack of flexibility for user-defined searches. The MATLAB-based program AP_CrosslinkFinder developed here can be readily applied to analyzing other types of DPCs using a user-defined mass of the cross-linked conjugate, such as UV-cross-linked DPCs/RNPs44 and formaldehyde-cross-linked DPCs.45 AP_CrosslinkFinder uses a simpler but more specific algorithm than RNPXL; thus, it completed the search of the current LC-MS/MS dataset within one hour.
CONCLUSIONS
In summary, we have developed a powerful tool for mapping amino acid residues that cross-link with AP sites in DNA-protein cross-links at single amino acid resolution. The method requires minimal sample input and is sensitive in identifying both major and minor cross-linking sites. Results provide semi-quantitative information on the relative abundance of cross-linking sites. The method can complement advanced structural biology techniques by providing information on the proximity of interacting functional groups, multiple conformations of nucleic acid-protein complexes in solution, and the inferred relative reactivity of varying amino acid residues. We envision that this workflow is applicable to mapping other interacting residues provided a different functional group can be installed on the oligonucleotide (e.g., a thiol group to map interacting cysteine). A limitation of the approach is that it applies mainly to DPCs formed in vitro. Nonetheless, when used together with an affinity handle, such as biotin, such a synthetic oligomer can potentially be used to probe interacting proteins and residues in complex biological samples, such as cell extracts.
Supplementary Material
ACKNOWLEDGMENT
We acknowledge Dr. Wenyan Xu for helping with preliminary studies, Drs. Laurie Kaguni and Chaoxing Liu for helpful discussions, and Jacob Perkins for critically reading the manuscript. Figures were created with BioRender.com. This work was supported by the National Institutes of Health (NIH) Grant R35 GM128854 (to L.Z.) and the University of California, Riverside. Mass spectrometry instrumentation at the Proteomics Core of the University of California, Riverside, was supported by NIH Grant S10 OD010669.
Footnotes
Supporting Information. The Supporting Information is available free of charge at http://pubs.acs.org
Optimization of ribonucleotide cleavage conditions; preparation of TFAM-DNA cross-links monitored by gel electrophoresis; stability of D2-peptide cross-links under Cleavage R conditions; reaction schemes of APE1-mediated AP-DNA cleavage and TFAM-mediated β-elimination; MS/MS spectra of the crosslinks identified in this study; quantification of gel analysis of TFAM:AP-ODN reactions; sequences of the DNA substrates; the identified peptide list.
AP_CrosslinkFinder has been uploaded to GitHub and is available at https://github.com/JinTangUCR/AP_CrosslinkFinder
The authors declare no competing interest.
REFERENCES
- (1) Tretyakova NY; Groehler A; Ji S DNA–Protein Cross-Links: Formation, Structural Identities, and Biological Outcomes. Acc. Chem. Res 2015, 48 (6), 1631–1644. 10.1021/acs.accounts.5b00056.
- (2) Solomon MJ; Varshavsky A Formaldehyde-Mediated DNA-Protein Crosslinking: A Probe for in Vivo Chromatin Structures. Proc. Natl. Acad. Sci 1985, 82 (19), 6470–6474. 10.1073/pnas.82.19.6470.
- (3) You Q; Cheng AY; Gu X; Harada BT; Yu M; Wu T; Ren B; Ouyang Z; He C Direct DNA Crosslinking with CAP-C Uncovers Transcription-Dependent Chromatin Organization at High Resolution. Nat. Biotechnol 2021, 39 (2), 225–235. 10.1038/s41587-020-0643-8.
- (4) Liu J; Cai L; Sun W; Cheng R; Wang N; Jin L; Rozovsky S; Seiple IB; Wang L Photocaged Quinone Methide Crosslinkers for Light-Controlled Chemical Crosslinking of Protein–Protein and Protein–DNA Complexes. Angew. Chemie Int. Ed 2019, 58 (52), 18839–18843. 10.1002/anie.201910135.
- (5) Ide H; Nakano T; Salem AMH; Shoulkamy MI DNA–Protein Cross-Links: Formidable Challenges to Maintaining Genome Integrity. DNA Repair (Amst). 2018, 71, 190–197. 10.1016/j.dnarep.2018.08.024.
- (6) Wei X; Peng Y; Bryan C; Yang K Mechanisms of DNA–protein Cross-Link Formation and Repair. Biochim. Biophys. Acta - Proteins Proteomics 2021, 1869 (8), 140669. 10.1016/j.bbapap.2021.140669.
- (7) Swenberg JA; Lu K; Moeller BC; Gao L; Upton PB; Nakamura J; Starr TB Endogenous versus Exogenous DNA Adducts: Their Role in Carcinogenesis, Epidemiology, and Risk Assessment. Toxicol. Sci 2011, 120 (Supplement 1), S130–S145. 10.1093/toxsci/kfq371.
- (8) Greenberg MM Abasic and Oxidized Abasic Site Reactivity in DNA: Enzyme Inhibition, Cross-Linking, and Nucleosome Catalyzed Reactions. Acc. Chem. Res 2014, 47 (2), 646–655. 10.1021/ar400229d.
- (9) DeMott MS; Beyret E; Wong D; Bales BC; Hwang J-T; Greenberg MM; Demple B Covalent Trapping of Human DNA Polymerase β by the Oxidative DNA Lesion 2-Deoxyribonolactone. J. Biol. Chem 2002, 277 (10), 7637–7640. 10.1074/jbc.C100577200.
- (10) Quiñones JL; Thapar U; Yu K; Fang Q; Sobol RW; Demple B Enzyme Mechanism-Based, Oxidative DNA–Protein Cross-Links Formed with DNA Polymerase β in Vivo. Proc. Natl. Acad. Sci 2015, 112 (28), 8602–8607. 10.1073/pnas.1501101112.
- (11) Sczepanski JT; Wong RS; McKnight JN; Bowman GD; Greenberg MM Rapid DNA-Protein Cross-Linking and Strand Scission by an Abasic Site in a Nucleosome Core Particle. Proc. Natl. Acad. Sci 2010, 107 (52), 22475–22480. 10.1073/pnas.1012860108.
- (12) Zhou C; Sczepanski JT; Greenberg MM Mechanistic Studies on Histone Catalyzed Cleavage of Apyrimidinic/Apurinic Sites in Nucleosome Core Particles. J. Am. Chem. Soc 2012, 134 (40), 16734–16741. 10.1021/ja306858m.
- (13) Xu W; Boyd RM; Tree MO; Samkari F; Zhao L Mitochondrial Transcription Factor A Promotes DNA Strand Cleavage at Abasic Sites. Proc. Natl. Acad. Sci 2019, 116 (36), 17792–17799. 10.1073/pnas.1911252116.
- (14) de Graaf B; Clore A; McCullough AK Cellular Pathways for DNA Repair and Damage Tolerance of Formaldehyde-Induced DNA-Protein Crosslinks. DNA Repair (Amst). 2009, 8 (10), 1207–1214. 10.1016/j.dnarep.2009.06.007.
- (15) Vaz B; Popovic M; Newman JA; Fielden J; Aitkenhead H; Halder S; Singh AN; Vendrell I; Fischer R; Torrecilla I; Drobnitzky N; Freire R; Amor DJ; Lockhart PJ; Kessler BM; McKenna GW; Gileadi O; Ramadan K Metalloprotease SPRTN/DVC1 Orchestrates Replication-Coupled DNA-Protein Crosslink Repair. Mol. Cell 2016, 64 (4), 704–719. 10.1016/j.molcel.2016.09.032.
- (16) Stingele J; Bellelli R; Boulton SJ Mechanisms of DNA–Protein Crosslink Repair. Nat. Rev. Mol. Cell Biol 2017, 18 (9), 563–573. 10.1038/nrm.2017.56.
- (17) Zhang H; Xiong Y; Chen J DNA–Protein Cross-Link Repair: What Do We Know Now? Cell Biosci 2020, 10 (1), 3. 10.1186/s13578-019-0366-z.
- (18) Newby ZER; O’Connell JD; Gruswitz F; Hays FA; Harries WEC; Harwood IM; Ho JD; Lee JK; Savage DF; Miercke LJW; Stroud RM A General Protocol for the Crystallization of Membrane Proteins for X-Ray Structural Investigation. Nat. Protoc 2009, 4 (5), 619–637. 10.1038/nprot.2009.27.
- (19) Guo TW; Bartesaghi A; Yang H; Falconieri V; Rao P; Merk A; Eng ET; Raczkowski AM; Fox T; Earl LA; Patel DJ; Subramaniam S Cryo-EM Structures Reveal Mechanism and Inhibition of DNA Targeting by a CRISPR-Cas Surveillance Complex. Cell 2017, 171 (2), 414–426.e12. 10.1016/j.cell.2017.09.006.
- (20) Sugiki T; Kobayashi N; Fujiwara T Modern Technologies of Solution Nuclear Magnetic Resonance Spectroscopy for Three-Dimensional Structure Determination of Proteins Open Avenues for Life Scientists. Comput. Struct. Biotechnol. J 2017, 15, 328–339. 10.1016/j.csbj.2017.04.001.
- (21) O’Reilly FJ; Rappsilber J Cross-Linking Mass Spectrometry: Methods and Applications in Structural, Molecular and Systems Biology. Nat. Struct. Mol. Biol 2018, 25 (11), 1000–1008. 10.1038/s41594-018-0147-0.
- (22) Stützer A; Welp LM; Raabe M; Sachsenberg T; Kappert C; Wulf A; Lau AM; David S-S; Chernev A; Kramer K; Politis A; Kohlbacher O; Fischle W; Urlaub H Analysis of Protein-DNA Interactions in Chromatin by UV Induced Cross-Linking and Mass Spectrometry. Nat. Commun 2020, 11 (1), 5250. 10.1038/s41467-020-19047-7.
- (23) Reim A; Ackermann R; Font-Mateu J; Kammel R; Beato M; Nolte S; Mann M; Russmann C; Wierer M Atomic-Resolution Mapping of Transcription Factor-DNA Interactions by Femtosecond Laser Crosslinking and Mass Spectrometry. Nat. Commun 2020, 11 (1), 3019. 10.1038/s41467-020-16837-x.
- (24) Bae JW; Kwon SC; Na Y; Kim VN; Kim J-S Chemical RNA Digestion Enables Robust RNA-Binding Site Mapping at Single Amino Acid Resolution. Nat. Struct. Mol. Biol 2020, 27 (7), 678–682. 10.1038/s41594-020-0436-2.
- (25) Schmidt C; Kramer K; Urlaub H Investigation of Protein–RNA Interactions by Mass Spectrometry—Techniques and Applications. J. Proteomics 2012, 75 (12), 3478–3494. 10.1016/j.jprot.2012.04.030.
- (26) Cox J; Mann M MaxQuant Enables High Peptide Identification Rates, Individualized p.p.b.-Range Mass Accuracies and Proteome-Wide Protein Quantification. Nat. Biotechnol 2008, 26 (12), 1367–1372. 10.1038/nbt.1511.
- (27) Kalisman N; Adams CM; Levitt M Subunit Order of Eukaryotic TRiC/CCT Chaperonin by Cross-Linking, Mass Spectrometry, and Combinatorial Homology Modeling. Proc. Natl. Acad. Sci 2012, 109 (8), 2884–2889. 10.1073/pnas.1119472109.
- (28) Watts JK; Katolik A; Viladoms J; Damha MJ Studies on the Hydrolytic Stability of 2′-Fluoroarabinonucleic Acid (2′F-ANA). Org. Biomol. Chem 2009, 7 (9), 1904. 10.1039/b900443b.
- (29) Rubio-Cosials A; Sydow JF; Jiménez-Menéndez N; Fernández-Millán P; Montoya J; Jacobs HT; Coll M; Bernadó P; Solà M Human Mitochondrial Transcription Factor A Induces a U-Turn Structure in the Light Strand Promoter. Nat. Struct. Mol. Biol 2011, 18 (11), 1281–1289. 10.1038/nsmb.2160.
- (30) Ngo HB; Kaiser JT; Chan DC The Mitochondrial Transcription and Packaging Factor Tfam Imposes a U-Turn on Mitochondrial DNA. Nat. Struct. Mol. Biol 2011, 18 (11), 1290–1296. 10.1038/nsmb.2159.
- (31) Hiller B; Achleitner M; Glage S; Naumann R; Behrendt R; Roers A Mammalian RNase H2 Removes Ribonucleotides from DNA to Maintain Genome Integrity. J. Exp. Med 2012, 209 (8), 1419–1426. 10.1084/jem.20120876.
- (32) Koh KD; Balachander S; Hesselberth JR; Storici F Ribose-Seq: Global Mapping of Ribonucleotides Embedded in Genomic DNA. Nat. Methods 2015, 12 (3), 251–257. 10.1038/nmeth.3259.
- (33) Williams JS; Kunkel TA Ribonucleotides in DNA: Origins, Repair and Consequences. DNA Repair (Amst). 2014, 19, 27–37. 10.1016/j.dnarep.2014.03.029.
- (34) Li Y; Breaker RR Kinetics of RNA Degradation by Specific Base Catalysis of Transesterification Involving the 2′-Hydroxyl Group. J. Am. Chem. Soc 1999, 121 (23), 5364–5372. 10.1021/ja990592p.
- (35) Savva R; McAuley-Hecht K; Brown T; Pearl L The Structural Basis of Specific Base-Excision Repair by Uracil–DNA Glycosylase. Nature 1995, 373 (6514), 487–493. 10.1038/373487a0.
- (36) Freudenthal BD; Beard WA; Cuneo MJ; Dyrkheeva NS; Wilson SH Capturing Snapshots of APE1 Processing DNA Damage. Nat. Struct. Mol. Biol 2015, 22 (11), 924–931. 10.1038/nsmb.3105.
- (37) López DJ; Rodríguez JA; Bañuelos S Molecular Mechanisms Regulating the DNA Repair Protein APE1: A Focus on Its Flexible N-Terminal Tail Domain. Int. J. Mol. Sci 2021, 22 (12), 6308. 10.3390/ijms22126308.
- (38) Rubio-Cosials A; Battistini F; Gansen A; Cuppari A; Bernadó P; Orozco M; Langowski J; Tóth K; Solà M Protein Flexibility and Synergy of HMG Domains Underlie U-Turn Bending of DNA by TFAM in Solution. Biophys. J 2018, 114 (10), 2386–2396. 10.1016/j.bpj.2017.11.3743.
- (39) Farge G; Laurens N; Broekmans OD; van den Wildenberg SMJL; Dekker LCM; Gaspari M; Gustafsson CM; Peterman EJG; Falkenberg M; Wuite GJL Protein Sliding and DNA Denaturation Are Essential for DNA Organization by Human Mitochondrial Transcription Factor A. Nat. Commun 2012, 3 (1), 1013. 10.1038/ncomms2001.
- (40) Cuppari A; Fernández-Millán P; Battistini F; Tarrés-Solé A; Lyonnais S; Iruela G; Ruiz-López E; Enciso Y; Rubio-Cosials A; Prohens R; Pons M; Alfonso C; Tóth K; Rivas G; Orozco M; Solà M DNA Specificities Modulate the Binding of Human Transcription Factor A to Mitochondrial DNA Control Region. Nucleic Acids Res. 2019, 47 (12), 6519–6537. 10.1093/nar/gkz406.
- (41) Wong TS; Rajagopalan S; Freund SM; Rutherford TJ; Andreeva A; Townsley FM; Petrovich M; Fersht AR Biophysical Characterizations of Human Mitochondrial Transcription Factor A and Its Binding to Tumor Suppressor P53. Nucleic Acids Res. 2009, 37 (20), 6765–6783. 10.1093/nar/gkp750.
- (42) Ivancová I; Leone D-L; Hocek M Reactive Modifications of DNA Nucleobases for Labelling, Bioconjugations, and Cross-Linking. Curr. Opin. Chem. Biol 2019, 52, 136–144. 10.1016/j.cbpa.2019.07.007.
- (43) Kramer K; Sachsenberg T; Beckmann BM; Qamar S; Boon K-L; Hentze MW; Kohlbacher O; Urlaub H Photo-Cross-Linking and High-Resolution Mass Spectrometry for Assignment of RNA-Binding Sites in RNA-Binding Proteins. Nat. Methods 2014, 11 (10), 1064–1070. 10.1038/nmeth.3092.
- (44) Urdaneta EC; Beckmann BM Fast and Unbiased Purification of RNA-Protein Complexes after UV Cross-Linking. Methods 2020, 178, 72–82. 10.1016/j.ymeth.2019.09.013.
- (45) Stingele J; Schwarz MS; Bloemeke N; Wolf PG; Jentsch S A DNA-Dependent Protease Involved in DNA-Protein Crosslink Repair. Cell 2014, 158 (2), 327–338. 10.1016/j.cell.2014.04.053.