Abstract
The single-stranded DNA cytosine-to-uracil deaminase APOBEC3B is an antiviral protein implicated in cancer. However, its substrates in cells are not fully delineated. Here APOBEC3B proteomics reveal interactions with a surprising number of R-loop factors. Biochemical experiments show APOBEC3B binding to R-loops in cells and in vitro. Genetic experiments demonstrate R-loop increases in cells lacking APOBEC3B and decreases in cells overexpressing APOBEC3B. Genome-wide analyses show major changes in the overall landscape of physiological and stimulus-induced R-loops with thousands of differentially altered regions, as well as binding of APOBEC3B to many of these sites. APOBEC3 mutagenesis impacts genes overexpressed in tumors and splice factor mutant tumors preferentially, and APOBEC3-attributed kataegis are enriched in RTCW motifs consistent with APOBEC3B deamination. Taken together with the fact that APOBEC3B binds single-stranded DNA and RNA and preferentially deaminates DNA, these results support a mechanism in which APOBEC3B regulates R-loops and contributes to R-loop mutagenesis in cancer.
Subject terms: Breast cancer, Molecular biology
APOBEC3B interacts with R-loops and helps mediate their resolution in a deamination-dependent way. This association also renders R-loops susceptible to enhanced APOBEC3B-dependent mutagenesis.
Main
The APOBEC3 family of single-stranded (ss)DNA cytosine deaminases functions in the overall innate immune response to viral infection1,2. Popularized initially by HIV-1 restriction activity, the seven human APOBEC3 enzymes collectively exhibit activity against a broad range of DNA-based viruses including retroviruses, hepadnaviruses, papillomaviruses, parvoviruses, polyomaviruses and herpesviruses. An important biochemical feature of this family of enzymes is an intrinsic preference for different nucleobases immediately 5′ of target cytosines. For example, APOBEC3B (A3B) and APOBEC3A (A3A) deaminate cytosines in 5′-TC motifs, and the antibody gene diversification enzyme activation-induced cytidine deaminase (AID) prefers 5′-AC/GC motifs3–5.
In addition to beneficial functions in innate and adaptive immunity, multiple DNA cytosine deaminases have detrimental roles in cancer mutagenesis1,6,7. Misprocessing of AID-catalyzed deamination events in antibody gene variable and switch regions can result in DNA breaks and chromosomal translocations in B-cell malignancies7. Off-target deamination of other genes also occurs at lower frequencies, and the resulting mutations can also contribute to B-cell cancers7. In comparison, cancer genomics projects have reported an APOBEC mutation signature in a variety of tumor types (ref. 8 and reviews above). In cancer, the APOBEC3 mutation signature is defined as C-to-T transitions and C-to-G transversions in 5′-TCA and 5′-TCT motifs (single base substitution (SBS)2 and SBS13, respectively). APOBEC3 enzymes are estimated to be the second largest mutation-generating process in cancer following spontaneous deamination by water, which associates with aging (SBS1) (ref. 8).
Despite extensive documentation of the APOBEC3 mutation signature in cancer, the precise molecular mechanisms governing this mutational process are unclear. One challenge is the likelihood that at least two enzymes, A3B and A3A, combine in different ways to generate the overall signature (for example, recent studies9–11 and references therein). However, insights have been gleaned from the physical characteristics of genomes, for instance, the association of the APOBEC3 signature with chromosomal DNA replication12–16. Other genomic structures with exposed ssDNA may be similarly prone to APOBEC3 mutagenesis, such as ssDNA loop regions of hairpins17,18 and ssDNA tracts in recombination and repair reactions, which can manifest as clusters of strand-coordinated mutations (also known as kataegis; for example, refs. 17,19–22). Together, these studies have indicated a mechanism in which expression of A3B and/or A3A leads to mutagenic encounters with exposed ssDNA followed in some instances by processive local deamination.
Another potential substrate for APOBEC3 enzymes is an R-loop, which occurs when nascent RNA re-anneals to the transcribed DNA strand, creating a three-stranded structure containing an RNA/DNA hybrid and a displaced nontranscribed ssDNA strand23–25. R-loops are substrates in AID-catalyzed antibody diversification7 and represent a prominent source of genome instability in cancer23–25. However, evidence linking APOBEC3 enzymes to R-loop-associated mutation and genome instability is lacking apart from a report postulating that U/G mismatches, which can be created by C-to-U deamination of R-loop ssDNA followed by R-loop dissolution and DNA reannealing, may be responsible for a synthetic lethal interaction between A3B activity and uracil excision repair disruption26.
A3B is strongly implicated in cancer mutagenesis based on constitutive nuclear localization, overexpression in tumors, upregulation by cancer-causing viruses such as human papillomavirus and associations with clinical outcomes1,27,28. A3B is also capable of directly inflicting APOBEC signature mutations in human genomic DNA9–11. To further investigate A3B in cancer, an unbiased affinity purification and mass spectrometry (AP–MS) approach was used to identify A3B-interacting proteins. Two dozen proteins were recovered in biologically independent experiments, and 60% of the resulting high-confidence interactors had been reported previously as R-loop-associated factors in RNA/DNA hybrid AP–MS experiments29. A comprehensive series of genetic, cell biology, biochemistry, genomic and bioinformatic studies showed that A3B functions in R-loop homeostasis, and moreover, R-loop regions impacted by A3B are enriched for APOBEC3 signature mutations including kataegis. Altogether, these results reveal an unanticipated role for A3B in R-loop biology and a distinct mechanism of transcription-associated mutation in cancer.
Results
A3B interacts with R-loops and R-loop-associated proteins
To identify A3B regulatory factors, a functional A3B-2xStrep-3xFlag construct (hereafter A3B-SF) was expressed in 293T cells, anti-Strep affinity-purified, and subjected to MS to identify interacting proteins (workflow in Extended Data Fig. 1a). This procedure included RNase A and high salt concentrations to enrich for direct and strong interactions, respectively. Immunoblots, Coomassie gels and DNA deaminase activity assays validated the presence, enrichment and activity of affinity-purified A3B (Extended Data Fig. 1b–d). An enhanced green fluorescent protein (eGFP)-SF construct and an empty 2xStrep-3xFlag vector were negative controls.
Six independent AP–MS experiments yielded 24 specific A3B-interacting proteins (Supplementary Table 1 and Extended Data Fig. 1e,f). These proteins were abundant in all six A3B-SF datasets and absent in GFP-SF or empty vector datasets. A total of 60% of these A3B interactors had been found independently in S9.6 AP–MS experiments29 (Fig. 1a,b). As the S9.6 mAb binds RNA/DNA hybrids with high affinity (Methods), this interaction overlap suggested that A3B may also interact with R-loops. To test this hypothesis, interactions between A3B and multiple R-loop-associated factors were confirmed by co-immunoprecipitation (co-IP; Fig. 1c,d and Extended Data Fig. 1e,f). For example, doxycycline (Dox)-inducible A3B-eGFP was immunoprecipitated from MCF10A cells with an anti-eGFP antibody and the R-loop-associated protein hnRNPUL1 was detected by immunoblotting (Fig. 1c,d). Parallel slot blots showed that R-loops also copurified with A3B-eGFP in an RNase H-sensitive manner, demonstrating specificity (Fig. 1d).
The S9.6 mAb was then used to IP RNA/DNA hybrids from MCF10A cells treated with phorbol 12-myristate 13-acetate (PMA) to induce endogenous A3B expression30. Immunoblotting confirmed the enrichment of an established R-loop interacting protein, TOP1 (ref. 31), and a shared R-loop and A3B interactor, hnRNPUL1, in all S9.6 IP reactions except those saturated with a synthetic RNA/DNA hybrid competitor (Fig. 1e). Lamin B1 served as a negative control. Endogenous A3B copurified with R-loops in basal noninduced conditions, and this interaction increased following PMA treatment (Fig. 1e). Notably, no A3B signal was detected in S9.6 pull-downs from A3B knockout (KO) MCF10A cells (Fig. 1e and Extended Data Fig. 2a–d).
A3B depletion triggers increased nuclear R-loop levels
To investigate a potential role for A3B in R-loop biology, R-loop levels were quantified in wild-type (WT) MCF10A and its A3B KO derivative. First, nucleoplasmic S9.6 staining intensity was measured by immunofluorescence (IF) confocal microscopy. These experiments revealed a strong increase in nucleoplasmic S9.6 fluorescence in A3B KO compared to WT cells (Fig. 2a,b). Second, S9.6 dot blots confirmed elevated R-loop levels in the A3B KO cells in comparison to WT (Fig. 2c,d). In both experiments, RNase H treatment eliminated the increase in nucleoplasmic R-loop signals observed in the absence of endogenous A3B. In comparison, nucleolar S9.6 signal was mostly insensitive to RNase H treatment, likely due to rRNA being detected by S9.6 Ab32.
To further investigate A3B and R-loops, analogous experiments were done using U2OS cells. A3B knockdown caused a strong increase in nucleoplasmic S9.6 staining by IF compared to control cells (Fig. 2e,f and Extended Data Fig. 2e–g). An increase in RNA/DNA hybrid signal was also obtained in S9.6 dot blots from A3B-depleted versus control cells (Fig. 2g,h). As mentioned above, specificity in these experiments was confirmed by RNase H treatment. R-loop imbalances are known sources of DNA damage23–25, and elevated R-loop levels in A3B KO MCF10A and A3B-depleted U2OS cells triggered concomitant increases in DNA damage as evidenced by staining of the DNA damage marker γ-H2AX (Fig. 2i–l). However, these elevated levels of R-loops and DNA damage did not alter overall rates of DNA replication or cell cycle progression (Extended Data Fig. 2h–k).
A3B deamination is required to reduce nuclear R-loop levels
Given that A3B loss increased R-loop accumulation, we next asked whether A3B overexpression has the opposite effect. These experiments used JQ1, a bromodomain and extra-terminal protein family inhibitor, to enhance global R-loop levels as shown previously33,34. As expected, JQ1-treated but not untreated or DMSO-treated cells exhibited increased R-loops, as measured by S9.6 IF staining (Fig. 3a,b). Similar results were obtained with cells expressing a bacterial mCherry-RNaseH1 D10R-E48R mutant, which binds but does not process R-loops (Fig. 3c,d; Methods). Next, U2OS cells were transfected with A3B-eGFP or eGFP plasmids, incubated for 24 h to allow for protein expression, treated for 4 h with JQ1 and then analyzed by IF for S9.6 staining. We observed that A3B-eGFP caused a substantial decrease in nucleoplasmic S9.6 levels compared to eGFP control (Fig. 3e,f). Interestingly, expression of A3A, which is more active than A3B biochemically35, had no effect on the S9.6 signal, suggesting a specific R-loop role for A3B (Fig. 3e,f).
The hallmark biochemical activity of A3B is ssDNA C-to-U deamination. To determine whether this activity is required for R-loop regulation, U2OS cells were transfected with constructs expressing A3B-eGFP or the catalytic mutant E255A. Notably, WT A3B caused a substantial reduction in nucleoplasmic R-loop levels as quantified by IF, whereas the catalytic mutant had a less pronounced effect despite similar expression levels (Fig. 3g,h). However, as U2OS cells already express high levels of endogenous A3B, and APOBEC3 enzymes including A3B are reported to oligomerize36,37, this intermediate phenotype could potentially arise from the oligomerization between the overexpressed mutant A3B and the endogenous A3B.
Therefore, a series of genetic complementation experiments was performed to compare the activities of WT A3B and the E255A catalytic mutant in cells lacking endogenous A3B. First, endogenous A3B was ablated from U2OS cells as described above, which resulted in lower A3B protein and activity levels (Fig. 3i). Second, A3B-depleted U2OS cells were stably transfected with shRNA-resistant constructs expressing HA-tagged WT A3B, A3B-E255A or an empty vector control (Fig. 3i). The WT A3B enzyme, but not A3B-E255A, restored ssDNA deaminase activity as expected (Fig. 3i, bottom). Third, R-loop levels were analyzed by S9.6 dot-blot assays. Notably, complementation with WT A3B rescued the effect of A3B depletion and caused a significant reduction in R-loops (Fig. 3j,k). In contrast, cells complemented with similar levels of A3B-E255A showed no significant change in R-loop levels (Fig. 3j,k). Finally, these results were confirmed with quantification of the nucleoplasmic S9.6 and mCherry-RNaseH1 mutant IF signals of U2OS parental and A3B KO cells complemented with WT or E255A A3B (Fig. 3l–o and Extended Data Fig. 2l–n). Taken together, these results showed that A3B-dependent suppression of R-loops requires catalytic activity.
A3B alters the genome-wide distribution of R-loops
Increased R-loops in the nucleoplasmic compartment of A3B-depleted cells suggested a role for A3B in regulating R-loop levels genome-wide. In support, this elevated signal required transcription as evidenced by treatment of A3B-depleted cells with the global transcription inhibitor triptolide (TRP)38 and the transcription elongation inhibitor, flavopiridol (FLV)39 (Fig. 4). Next, we investigated the role of A3B in regulating R-loop levels genome-wide using DNA/RNA immunoprecipitation sequencing (DRIP)–seq experiments. DRIP–seq peaks in WT and A3B KO MCF10A cells were mainly intragenic and distributed between protein-coding, long noncoding RNA and enhancer RNA genes (Fig. 5a,b), similar to R-loop distributions observed previously39–41. As anticipated by transcription dependence and DRIP–seq peak distributions, the vast majority of DRIP–seq positive regions occurred in expressed genes (Extended Data Fig. 3a).
A global comparison of DRIP–seq peaks between A3B KO and WT MCF10A revealed changes in the overall R-loop landscape with 8,296 peaks ‘increased’, 13,761 peaks ‘decreased’ and 154,036 peaks ‘unchanged’ (red versus blue traces in Fig. 5c–e). Representative individual gene results are shown for GADD45A and PHLDA1, HIST1H1B and SYT8 and HIST1H1E and DDX1 that show increased, decreased and unchanged R-loop levels, respectively, in KO compared to WT cells (Fig. 5f,h,j). These DRIP–seq results were confirmed by gene-specific DRIP–qPCR (Fig. 5g,i,k). As discussed above, DRIP–qPCR signals were reduced to background levels by RNase H treatment, confirming R-loop specificity (Fig. 5g,i,k, striped bars). Differential DRIP signals in these genes were not due to transcription differences between KO and WT cells (Extended Data Fig. 3b). As expected, negligible DRIP signals were found in nonexpressed genes and intergenic loci (for example, TFF1 in Extended Data Fig. 3c,d). Similar DRIP results were obtained in A3B-depleted HeLa cells (Extended Data Fig. 3e–g).
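The peak counts reported above can be expressed as shares of all DRIP–seq peaks. The short sketch below does only this arithmetic; the category names are taken from the text, while the variable and function names are illustrative and not part of the original analysis.

```python
# DRIP-seq peak counts reported for A3B KO versus WT MCF10A cells.
peak_counts = {"increased": 8296, "decreased": 13761, "unchanged": 154036}


def peak_fractions(counts):
    """Return each category's share of all peaks as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = peak_fractions(peak_counts)
# Peaks that differ between KO and WT in either direction.
changed_fraction = fractions["increased"] + fractions["decreased"]

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
print(f"changed (increased + decreased): {changed_fraction:.1%}")
```

Under these counts, the increased and decreased peaks together make up a little over one-tenth of all peaks.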
A3B accelerates the kinetics of R-loop resolution
Transcriptional activation by different signal transduction pathways is known to increase R-loop formation42–44. We therefore investigated whether A3B may also affect signal transduction-induced R-loops. A3B WT and KO MCF10A lines were treated with PMA to induce the protein kinase C and noncanonical nuclear factor kappa B (NF-κB) signal transduction pathways that activate the transcription of many genes including endogenous A3B (refs. 30,45). DRIP–seq analysis in these cells demonstrated that PMA caused perturbations in the overall R-loop landscape, resulting in increased and decreased R-loop peaks (Supplementary Note and Extended Data Fig. 4a–j). Interestingly, A3B, as detected by chromatin immunoprecipitation followed by sequencing (ChIP–seq) with A3B-eGFP, appeared to bind preferentially to genomic DNA regions overlapping with PMA-enriched R-loop peaks (Supplementary Note and Extended Data Fig. 4a–d,k). Furthermore, kinetic analysis by DRIP–seq and IF revealed that A3B contributes to timely resolution of PMA-induced R-loops (Supplementary Note and Fig. 6).
Biochemical activities of A3B required for R-loop resolution
To investigate the biochemical activities of A3B in R-loop resolution, WT A3B was purified from 293T cells (Extended Data Fig. 5a) and used for nucleic acid-binding and DNA deamination experiments (Fig. 7 and Extended Data Fig. 5b,c). EMSAs indicated that A3B binds R-loop structures, ssDNA and ssRNA (as expected35,36,46), and, to lesser extents, dsDNA, dsRNA and RNA/DNA hybrid (also expected35,36,46; Extended Data Fig. 5b,c). These native EMSAs were hard to quantify due to accumulation of large protein/nucleic acid complexes in the wells. We therefore quantified the release of fluorescently labeled ssRNA and ssDNA from A3B by incubating with unlabeled nucleic acid competitors. These experiments demonstrated that A3B binds strongly, and with comparable strength, to both ssRNA and ssDNA (Fig. 7b).
RNA is an inferred inhibitor of A3B based on experiments where exogenous RNase A treatment is required to detect ssDNA deaminase activity in cancer cell extracts30,47. We therefore wondered whether the RNA in R-loop structures might inhibit the deaminase activity of A3B on the unpaired ssDNA. Qualitative single timepoint reactions indicated clear activity on free ssDNA cytosines and potentially reduced activities on cytosines in bubble, short and long R-loop structures (Fig. 7c). A quantitative time course comparing A3B activity on free ssDNA versus ssDNA in the R-loop structure indicated that the latter substrate is only ~2-fold less preferred (Fig. 7d). These data showed that R-loops can be substrates for A3B-catalyzed ssDNA deamination. The twofold diminution in activity may be due to ssDNA inaccessibility caused by the relatively short nature of the synthetic R-loop (21 nucleotides) and/or competition with unpaired ssDNA or ssRNA.
To gain additional insights into A3B function in R-loop biology, we analyzed the nucleoplasmic R-loop phenotypes of A3B mutants defective in either nuclear localization (Mut1) (ref. 48) or nucleic acid binding (Mut2) (ref. 46). Both of these activities are governed by the N-terminal domain of A3B and independent of the C-terminal domain, which binds ssDNA weakly but catalyzes deamination35,46,48. We confirmed the nuclear localization defect of Mut1 and showed that Mut2 still retains this activity (Fig. 7e). Mut2 was also purified and, in contrast to WT A3B, demonstrated defective binding to ssRNA and ssDNA (Fig. 7f and Extended Data Fig. 5a). However, Mut2 still retained high levels of ssDNA deaminase activity and was similarly active on free and short R-loop-containing ssDNA substrates (Fig. 7g). This result is consistent with the possibility that unpaired nucleic acid may interfere with the deaminase activity of the WT enzyme but not Mut2, which has reduced nucleic acid-binding activity. Key biochemical results with WT and Mut2 A3B were reproduced with independent >85% pure protein preparations (Extended Data Fig. 5d–g). Most importantly, in contrast to WT A3B, neither mutant was capable of decreasing nucleoplasmic R-loop levels following JQ1 treatment in U2OS cells (Fig. 7h,i). The separation-of-function Mut2 protein also had a diminished capacity to co-IP interactors (Extended Data Fig. 1f). These results combined to indicate that both nuclear localization and nucleic acid-binding activities are required for A3B to regulate nucleoplasmic R-loop levels.
Evidence for R-loop mutagenesis by A3B
Our results suggested a model in which exposed ssDNA cytosines in R-loop regions are deaminated by A3B and resolved into mutagenic or nonmutagenic outcomes (Fig. 8a). Mutagenic outcomes are predicted to reflect the intrinsic structural preference of A3B for TC motifs5 and more broadly TCW and RTCW10,49–51. For comparison, A3A exhibits a preference for YTCW motifs10,11,49–51.
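The motif classes named above follow IUPAC codes (R = A or G; Y = C or T; W = A or T). As a minimal sketch of how a mutated cytosine could be assigned to the A3B-associated RTCW class or the A3A-associated YTCW class, the following assumes a 5-nt plus-strand window centred on the mutated base. The function names and window convention are illustrative assumptions, not the pipeline used in this study.

```python
PURINES = set("AG")       # IUPAC R
PYRIMIDINES = set("CT")   # IUPAC Y
WEAK = set("AT")          # IUPAC W
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def cytosine_context(window):
    """Return the 4-nt context (-2, -1, C, +1) on the strand carrying the cytosine.

    `window` is a 5-nt plus-strand sequence centred on the mutated base,
    which must be C (cytosine on plus strand) or G (cytosine on minus strand).
    """
    window = window.upper()
    if len(window) != 5 or window[2] not in "CG":
        raise ValueError("expected a 5-nt window centred on C or G")
    if window[2] == "G":
        window = reverse_complement(window)
    return window[:4]


def classify_context(context):
    """Classify a 4-nt context whose third base is the target cytosine."""
    context = context.upper()
    if len(context) != 4 or context[2] != "C":
        raise ValueError("expected a 4-nt context with C at the third position")
    if context[1] != "T" or context[3] not in WEAK:
        return "other"
    if context[0] in PURINES:
        return "RTCW"   # A3B-associated
    if context[0] in PYRIMIDINES:
        return "YTCW"   # A3A-associated
    return "other"      # ambiguous base such as N


print(classify_context(cytosine_context("ATCAG")))  # cytosine on the plus strand
print(classify_context(cytosine_context("CTGAT")))  # cytosine on the minus strand
```

Reverse-complementing windows centred on G lets mutations called on either genomic strand be scored in the same 5′-to-3′ orientation relative to the deaminated cytosine.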
First, we predicted that higher rates of transcription should lead to higher rates of R-loop formation and increased exposure to A3B-mediated deamination because prior work had already correlated gene expression and R-loop formation39. This idea was addressed using whole-exome sequenced (WES) breast cancers and corresponding RNA-seq data from The Cancer Genome Atlas (TCGA) project as well as whole-genome sequenced (WGS) breast cancers from the International Cancer Genome Consortium (ICGC) and normal breast tissue gene expression data from the Genotype-Tissue Expression (GTEx) project (Methods). An initial association between gene expression levels and APOBEC3-attributed mutations was intriguing but became insignificant after accounting for gene size (Extended Data Fig. 6a,b). However, a strong positive association emerged between the magnitude of gene overexpression in breast cancer compared to normal breast tissue and the proportion of mutations attributable to APOBEC3 deamination (TCW mutations in Fig. 8b; RTCW/YTCW breakdown in Extended Data Fig. 6c). Thus, the higher the degree of gene overexpression in breast cancer, the higher the proportion of mutations attributable to APOBEC3, with the highest overexpressed gene group showing an average of over 50-fold more APOBEC3 signature mutations than any of the three lowest expressed gene groups (Methods). The highest overexpressed gene group in breast cancer (>16-fold above normal breast tissue) also showed a strong bias of APOBEC3 signature mutation on the nontranscribed strand over the transcribed strand (P < 0.038 by Wilcoxon rank-sum test; Supplementary Table 1).
Second, we predicted that splice factor mutant tumors will show elevated levels of APOBEC3 signature mutations, as splicing defects are known to increase R-loop formation52–54. This idea was investigated by splitting the TCGA breast cancer WES dataset into tumors with and without mutations in splice factor genes and evaluating associations with the proportion of mutations attributable to APOBEC3 activity. Remarkably, 53% of the breast tumors with mutant splice factor genes (43/81) had substantial levels of APOBEC3 signature mutations (Fig. 8c). In contrast, only 39% of breast tumors without mutations in the same splice factor gene set (326/841) showed a detectable APOBEC3 mutation signature (Fig. 8c; P < 0.017 by Fisher’s exact test). Interestingly, the A3B-associated RTCW motif was absent from only one of the splice factor mutant tumors (1/43), compared with 52 of 326 tumors in the nonsplice factor mutant group (Extended Data Fig. 6d,e; P = 0.028 by Fisher’s exact test). Splice factor mutant tumors also had a higher mean percentage of APOBEC3-attributed mutations (39% versus 31%, respectively; P = 0.042 by Welch’s two-sample t-test) as well as a higher total number of mutations on average than nonsplice factor mutated samples (P = 0.0018 by Welch’s two-sample t-test). Even the top quartile of tumors with the strongest APOBEC3 signature had a higher total number of mutations in the splice factor mutant group (P = 0.0095 by Welch’s two-sample t-test). Similarly sized housekeeping gene sets selected randomly were not highly mutated (Methods). Thus, the observed splice factor defects are likely contributing to the higher rates of mutation. In strong support of A3B-dependent activity on R-loops resulting from aberrant splicing, A3B overexpression suppressed the increase in R-loops caused by treating U2OS with the splicing inhibitor pladienolide B (Plad B; Fig. 8d).
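For readers who wish to check the proportions behind the first comparison (43/81 versus 326/841 tumors), a two-sided Fisher's exact test can be sketched with the standard library alone. The implementation below uses the common "sum of tables no more probable than the observed one" definition; whether the original analysis used exactly this two-sided convention is an assumption, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Tumors with / without an APOBEC3 signature, from the counts in the text.
splice_mutant = (43, 81 - 43)        # 43 of 81 splice factor mutant tumors
splice_wildtype = (326, 841 - 326)   # 326 of 841 remaining tumors

p_value = fisher_exact_two_sided(*splice_mutant, *splice_wildtype)
print(f"splice factor mutant: {43 / 81:.0%}")
print(f"no splice factor mutation: {326 / 841:.0%}")
print(f"two-sided Fisher's exact P = {p_value:.3g}")
```

Exact integer binomial coefficients avoid the overflow and rounding issues that factorial-based floating-point formulas would have at these sample sizes.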
Third, because APOBEC3 signature kataegic events are due to at least one APOBEC3 enzyme17,19–22, we asked what proportion of these events occur in genes and, moreover, occur on the nontranscribed strand versus the transcribed strand. Global mapping of all kataegis events in primary breast adenocarcinomas from the Pan-Cancer Analysis of Whole Genomes (PCAWG) revealed a bimodal distribution with one peak located within 1 kbp of a structural variation (SV) breakpoint and another similarly sized peak much further away from the nearest SV breakpoint (~1 Mbp; n = 198 WGS datasets; blue bars in Fig. 8e). As expected55, the SV breakpoint-proximal subset of kataegis events is likely due to deamination of resected ssDNA ends during recombination repair. Also as expected, dispersed APOBEC3-attributed mutations occur on average >1 Mbp apart (yellow bars in Fig. 8e). In contrast, the majority of APOBEC3-attributed kataegic events (>75%) map >10 kbp away from SV breakpoints (teal bars >10 kbp in Fig. 8e), and ~17% of these events occur within R-loop regions identified above by DRIP–seq (red bars in Fig. 8e; Methods).
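The breakpoint-distance partition described above can be illustrated with a small sketch that finds the nearest SV breakpoint on the same chromosome and bins the distance. The coordinates, function names and the exact bin edges (1 kbp and 10 kbp, taken from the thresholds mentioned in the text) are assumptions for illustration; the actual PCAWG processing is described in the Methods and may differ.

```python
from bisect import bisect_left


def nearest_breakpoint_distance(position, breakpoints):
    """Distance in bp from `position` to the closest entry of sorted `breakpoints`.

    Returns None when the chromosome has no breakpoints.
    """
    if not breakpoints:
        return None
    i = bisect_left(breakpoints, position)
    candidates = []
    if i < len(breakpoints):
        candidates.append(breakpoints[i] - position)      # nearest at or to the right
    if i > 0:
        candidates.append(position - breakpoints[i - 1])  # nearest to the left
    return min(candidates)


def distance_bin(distance):
    """Assign a distance to one of three illustrative bins."""
    if distance is None:
        return "no SV on chromosome"
    if distance <= 1_000:
        return "<=1 kbp"
    if distance <= 10_000:
        return "1-10 kbp"
    return ">10 kbp"


# Hypothetical breakpoints and kataegis event positions on one chromosome.
sv_breakpoints = sorted([5_000, 2_000_000])
event_positions = [5_400, 12_000, 1_000_000]

for pos in event_positions:
    dist = nearest_breakpoint_distance(pos, sv_breakpoints)
    print(pos, dist, distance_bin(dist))
```

Keeping the breakpoints sorted allows each lookup to run in logarithmic time, which matters when many events are compared against many breakpoints.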
Finally, we investigated the sequence motifs of mutations across individual kataegic events compared to nonclustered mutations within R-loop regions partitioned into nontranscribed strand and transcribed strand regions of genes and within intergenic regions. Specifically, we investigated the overall enrichments for A3B-associated RTCA and A3A-associated YTCA tetranucleotide motifs for each mutation found in a sample (R = A or G; Y = C or T)49. This analysis indicated that APOBEC3 kataegic mutations overlapping NTS R-loop regions are skewed toward A3B-associated RTCA motifs, in contrast to dispersed APOBEC3 mutations (Fig. 8f,g and Extended Data Fig. 7). The overall RTCW skew of kataegic (>3 mutations per cluster) versus dispersed APOBEC3 mutations is elevated for mutations occurring on the nontranscribed strand and transcribed strand but not for mutations in intergenic regions (Fig. 8f). For greater stringency, this latter analysis was repeated for longer APOBEC3 kataegic tracts (≥5 mutations per cluster) and a statistically significant enrichment is only evident for RTCA events on the nontranscribed strand of genes (Fig. 8g). Specifically, this significant enrichment was driven by longer, R-loop-associated kataegic events occurring within the nontranscribed strand, which was not observed for transcribed strand or intergenic events (Extended Data Fig. 7). Furthermore, 70% of the R-loop kataegis occurring within the nontranscribed strand were enriched for A3B-associated RTCA motifs compared to a minority of events associated with A3A-like YTCA motifs (Fig. 8g and Extended Data Fig. 7b). Representative nontranscribed strand kataegic events are shown for PRKCA and LGR5 (Fig. 8h). Taken together, these bioinformatic analyses support a model in which at least a subset of R-loop structures is susceptible to C-to-U deamination events that occur on the nontranscribed strand and are most likely catalyzed by A3B.
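Partitioning mutations into nontranscribed and transcribed strand events depends on which genomic strand carried the deaminated cytosine relative to the gene's orientation. The sketch below shows one way to make this assignment, assuming mutations are reported with the plus-strand reference base and that the gene annotation gives the coding (nontranscribed) strand. The function names and example mutations are hypothetical.

```python
from collections import Counter


def deaminated_strand(ref_base, gene_strand):
    """Return which strand of a gene carried the mutated cytosine.

    ref_base: reference base on the genome plus strand ('C' or 'G').
    gene_strand: '+' or '-', the annotated coding (nontranscribed) strand.
    """
    ref_base = ref_base.upper()
    if ref_base not in ("C", "G"):
        raise ValueError("APOBEC3-type mutations occur at C:G base pairs")
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    # A plus-strand G means the cytosine lies on the minus strand.
    cytosine_strand = "+" if ref_base == "C" else "-"
    return "nontranscribed" if cytosine_strand == gene_strand else "transcribed"


def strand_counts(mutations):
    """Tally strand assignments for (ref_base, gene_strand) pairs."""
    return Counter(deaminated_strand(ref, strand) for ref, strand in mutations)


# Hypothetical strand-coordinated cluster in a minus-strand gene.
cluster = [("G", "-"), ("G", "-"), ("G", "-"), ("C", "-")]
print(strand_counts(cluster))
```

In a strand-coordinated kataegic event most mutations share the same reference base, so the tally is expected to be dominated by one category.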
Discussion
Our studies reveal an unanticipated role for the antiviral enzyme A3B in R-loop biology. We delineate a functional relationship between A3B and R-loops with higher R-loop levels occurring upon A3B deficiency and lower R-loop levels upon A3B overexpression. Genome-wide DRIP–seq experiments in physiological conditions and upon activation of a signal transduction pathway with PMA indicated that thousands of R-loops in cells are affected by A3B. This number represents over 10% of R-loops genome-wide, which is comparable to the impact of established R-loop regulatory factors40,56. These findings are also in line with the knowledge that multiple proteins contribute to R-loop regulation, including RNase H1, RNase H2, TOP1, SETX, AQR, UAP56/DDX39B, FANCD2 and BRCA1/BRCA2 (refs. 23–25). Determining the specific subsets of factors responsible for regulating individual R-loops remains a challenge for future studies.
Our studies also shed light on the molecular mechanism of R-loop resolution. A3B depletion and overexpression have opposing effects with the former causing a net increase in R-loops and the latter a net decrease. A3B complementation experiments revealed that this A3B function requires an intact catalytic glutamate (E255) consistent with a role for cytosine-to-uracil deamination. Nuclear localization is also required, which further supports a direct model and helps rule out indirect cytoplasmic effects. Our biochemical experiments showed that ssRNA- and ssDNA-binding activities are comparable in strength. Together with the fact that A3B’s strong nucleic acid-binding activity resides within the N-terminal domain and the weaker ssDNA-binding activity required for catalysis is governed by the C-terminal domain, we favor a working model in which direct binding of A3B to nascent ssRNA adjacent to R-loops and/or to ssDNA exposed in R-loop structures is critical for R-loop regulation. Based on specialized mechanisms including AID-catalyzed antibody diversification7 and Cas-mediated cytosine base editing57, exposed ssDNA cytosines in R-loop structures can be deaminated by A3B, and then the resulting uracils become substrates for multiple competing DNA repair/replication processes. This can lead to error-free repair as well as multiple error-prone/mutagenic outcomes ranging from signature mutations to DNA breaks and larger-scale chromosome aberrations.
Although we and others15,16 did not find a general association between APOBEC3 signature mutation and gene expression levels, a recent study reported higher APOBEC3 mutation densities on the nontranscribed strand of actively expressed genes in multiple cancer types58. Our studies indicate that the nontranscribed strand of R-loop regions is particularly susceptible to APOBEC3 mutagenesis including kataegis. Moreover, our studies show that transcription-associated defects in cancer such as gross overexpression and splice factor malfunction additionally increase the probability of APOBEC3 mutagenesis. These mechanistic links are further supported by data showing that A3B can suppress the increase in R-loop formation caused by treating cells with the splicing inhibitor Plad B. Despite the possibility that other APOBEC3 enzymes (most notably A3A) may also contribute to R-loop-associated mutations, a specific role for A3A in R-loop homeostasis is disfavored because its overexpression did not affect R-loop levels. In addition, most APOBEC3 kataegic events observed far away from sites of structural variation are enriched for mutations in A3B-associated 5′-RTCW motifs and not in A3A-associated 5′-YTCW motifs.
In addition to the direct mechanism discussed above, a potentially overlapping alternative is A3B-dependent recruitment of proteins known to promote R-loop resolution. Such interactions could be direct or bridged, for instance, by RNA or ssDNA. In support of this possibility, the A3B separation-of-function mutant Mut2, which is deficient in nucleic acid binding but proficient in nuclear import and DNA deamination, is less capable of interacting with several R-loop-associated factors. Moreover, although our studies here focused on strong A3B interactors, several weaker binders such as the helicase DHX9 might be relevant. This R-loop helicase was reported recently as a regulator of A3B antiviral activity59. Such factors may help explain the subset of genes that exhibit decreased R-loop levels in the absence of A3B. Further studies on A3B regulation of R-loop homeostasis will undoubtedly illuminate additional R-loop biology, provide insights into the normal physiological functions of A3B and define new drug-actionable nodes in A3B-overexpressing tumor types such as breast cancer.
Methods
Cell lines and culturing
U2OS cells were obtained from ATCC (HTB-96) and were maintained in McCoy’s 5A Medium (Thermo Fisher Scientific, 16600082) supplemented with 10% FBS (Gibco) and 0.5% Penicillin/Streptomycin (Pen/Strep; 50 units). U2OS shCtrl and shA3B cell lines were made using previously described shCtrl and shA3B lentiviral constructs, viral production and transduction methods and puromycin selection (1 µg ml−1) (ref. 47). U2OS pcDNA3.1-A3-3xHA stable lines were made by transfection of linearized (NruI-digested) plasmid and selection using 800 µg ml−1 G418. HEK 293T cells were obtained from ATCC (CRL-3216) and were maintained in RPMI (Hyclone) supplemented with 10% FBS (Gibco) and 0.5% Pen/Strep (50 units). MCF10A cells were obtained from ATCC (CRL-10317) and were maintained in DMEM/F12 (Invitrogen, 11330-032) supplemented with 5% horse serum (Invitrogen, 16050-122), 20 ng ml−1 EGF (Peprotech), 0.5 µg ml−1 hydrocortisone (Sigma-Aldrich, H-0888), 100 ng ml−1 cholera toxin (Sigma-Aldrich, C-8052), 10 µg ml−1 insulin (Sigma-Aldrich, I-1882, I-9278) and 0.5% Pen/Strep (Invitrogen, 15070-063). MCF10A-TREx-A3B-eGFP cells were maintained in the same MCF10A media described above with the addition of 100 µg ml−1 Normocin. S9.6 Hybridoma cells were obtained from ATCC (HB-8730) and were maintained in DMEM (Hyclone) supplemented with 10% FBS (Gibco) and 0.5% Pen/Strep (50 units). HeLa cells were obtained from N.J. Proudfoot (University of Oxford) and were maintained in DMEM (Sigma-Aldrich) supplemented with 10% FBS (Sigma-Aldrich) and 0.5% Pen/Strep (Invitrogen, 15070-063). The MCF10A A3B KO cell line was engineered by transduction with pLentiCRISPR expressing a gRNA targeting both the A3A and A3B genes (Supplementary Table 1). Cells were selected with puromycin and seeded for single-cell cloning. Deletion mutant lines were identified by PCR using primers amplifying unique sequences within the A3B gene and/or the A3A/B junction (primers in ref. 60; Supplementary Table 1) and confirmed by qPCR and immunoblots.
U2OS A3B KO cell line was engineered by transduction with pLentiCRISPR expressing a gRNA targeting exon 3 of A3B (Supplementary Table 1). Cells were selected with puromycin and seeded for single-cell cloning. Biallelic A3B KO was confirmed by PCR using primers spanning the gRNA target region and subsequent sequencing in addition to immunoblotting (Supplementary Table 1). HeLa RNAi was performed in six-well plates 24 h after seeding with 22 nM siRNA and Lipofectamine 2000, and after 6 h, the medium was changed. A second transfection was performed 48 h after seeding using the same experimental setting, and then cells were reseeded 24 h before the experiment. siRNAs were purchased from GE Healthcare targeting luciferase (D-001400-01) or A3B (Supplementary Table 1).
Plasmids and cloning
C-terminal eGFP epitope-tagged plasmids used in this study were described previously47,61–63. Catalytic mutant A3B-E255A and shRNA-resistant derivatives were made using standard site-directed mutagenesis. C-terminal 3x-HA epitope-tagged plasmids used in this study were described previously64, and shRNA-resistant derivatives were made using standard site-directed mutagenesis. C-terminal 2xStrep and 3xFlag-tagged eGFP and A3B constructs used for proteomics were described previously65. cDNAs for some interactor constructs were ordered from Origene (RC216648, RC204785 and RC214037), whereas the rest were cloned from 293T cDNA. 4/TO-C-terminal 3xFlag-tagged interactor constructs used for IP were generated using standard cloning techniques. A3B Mut1 (E22Y/E24R/Y28S/G29R/S31N/Y32T) (ref. 48) and Mut2 (Y13D/Y28S/Y83D/W127S/Y162D/Y191H) (ref. 46) were subcloned into 5/TO-A3B-GFP as a HindIII and KpnI fragment from a reported construct or gBlock (IDT), respectively. The pcDNA5/FRT/TO-mCherry-RNaseHI-D10R-E48R plasmid was reported previously33,66. All oligonucleotide sequences used to generate new constructs are listed in Supplementary Table 1.
AP–MS
The 293T cells were transfected with pcDNA4/TO-A3B-2xStrep-3xFlag or eGFP-2xStrep-3xFlag using Transit LT1 (Mirus). Cells were collected in 1× PBS 48 h post-transfection. Cells were washed two times in 1× PBS followed by lysis (50 mM Tris–HCl (pH 8.0), 1% Tergitol NP-40, 150 mM NaCl, 0.5% sodium deoxycholate, 0.1% SDS, 1 mM DTT, 1× protease inhibitor (Roche), RNase A and DNase). Lysates were subjected to sonication before clearing by centrifugation. Cleared lysates were then added to Strep-Tactin Superflow resin (IBA) followed by end-over-end rotation for 2 h at 4 °C. Following IP, the anti-Strep resin was washed three times in high-salt wash buffer (20 mM Tris–HCl (pH 8.0), 1.5 mM MgCl2, 1 M NaCl, 0.2% Tergitol NP-40, 0.5 mM DTT and 5% glycerol) followed by three washes in low-salt wash buffer (same as high salt but with 150 mM NaCl). To remove detergents for proteomics submission, samples were subjected to three washes of no-detergent wash buffer (20 mM Tris–HCl (pH 8.0), 1.5 mM MgCl2, 150 mM NaCl, 0.5 mM DTT and 5% glycerol). Protein was eluted from the resin in elution buffer (100 mM Tris–HCl (pH 8.0), 150 mM NaCl and 2.5 mM desthiobiotin). Samples were validated using immunoblotting, DNA deaminase activity assays (discussed below) and Coomassie staining. In-solution samples were analyzed by liquid chromatography–mass spectrometry/mass spectrometry (LC–MS/MS) at the Harvard Proteomic Core (A3B AP–MS data are in Supplementary Table 1). CRAPome repository was used to remove likely nonspecific interactions before S9.6 IP overlap analysis67.
For A3B-mycHis purification, 293T cells grown in RPMI were transfected in 15 cm plates with 20 µg of plasmid using a 3:1 ratio of polyethyleneimine (Polysciences PEI 40k, 24765) to DNA. Twenty-four hours post-transfection, the cells were collected by trypsinization, washed in PBS–EDTA and collected by centrifugation. Cell pellets were frozen at −80 °C. For purification, cells were lysed in 25 mM HEPES (pH 7.4), 300 mM sodium chloride, 20 mM imidazole, 10 mM magnesium chloride, 0.5 mM TCEP, 0.1% Triton X-100, 20% glycerol and Roche complete protease inhibitors. Lysis was performed by 2 min of sonication at a 40% duty cycle. Following sonication, RNase A was added to 100 µg ml−1 and Benzonase to 5 units per ml followed by incubation at 37 °C for an hour. Cell debris was pelleted by centrifugation at 16,000g for 30 min at 25 °C. The supernatant was collected, and sodium chloride was added to a final concentration of 1 M. APOBEC3B-mycHis was allowed to bind to 50 µl nickel-NTA resin per 10 × 15 cm plates for 2 h at 4 °C. The resin was collected in BioRad polyprep columns and washed with 25 mM HEPES (pH 7.4), 300 mM sodium chloride, 0.1% Triton X-100, 40 mM imidazole and 20% glycerol. Protein was eluted in the same buffer with the addition of TCEP to 1 mM and 300 mM imidazole. Purity and concentration were assessed by PAGE with Coomassie stain with gels imaged using a LI-COR Odyssey instrument.
As an alternative procedure for A3B-mycHis purification (Extended Data Fig. 5d), Expi293F cells grown in Expi293 Expression Medium were transfected in 60 ml cultures according to the manufacturer’s standard protocol (Thermo Fisher Scientific). Seventy-two hours post-transfection, the cells were collected by centrifugation, washed in PBS–EDTA and pelleted. Cell pellets were frozen at −80 °C. The AP procedure is the same as that described above except RNase A and Benzonase treatment was for 2 h, and APOBEC3B-mycHis was allowed to bind to 50 µl nickel-NTA resin for 2 h at room temperature.
A3B activity assays
Deamination reactions were performed at 37 °C for 2 h using whole cell lysate, 4 pmol of 3′-fluorescein-labeled oligonucleotide, 0.025 U uracil DNA glycosylase (UDG), 1× UDG buffer (NEB) and 1.75 U RNase A. Reaction mixtures were treated with 100 mM NaOH at 95 °C for 10 min to achieve complete backbone breakage. Reaction mixtures were separated on 15% Tris–borate–EDTA (TBE)-urea gels to separate the substrate from the product. Gels were scanned using a Typhoon FLA-7000 image reader.
A3B activity assays with purified A3B-mycHis or mutants were performed similarly as above in 25 mM HEPES (pH 7.4), 50 mM NaCl, 0.4 U ml−1 Roche RNase Inhibitor for the indicated amounts of time at 37 °C. Reactions were stopped at 95 °C for 5 min then UDG was added to 0.4 U per reaction and incubated for 10 min at 37 °C. Sodium hydroxide was added to 100 mM, and reactions were heated to 95 °C for 5 min. An equivalent volume of 80% formamide in 1× TBE with xylene cyanol and bromophenol blue was added, and reactions were heated again to 95 °C for 3 min to ensure the melting of double-stranded regions of DNA/RNA. Products were separated by 15% denaturing PAGE and digitally scanned using a LI-COR Odyssey imager. Quantitation was performed using LI-COR Odyssey software.
Electrophoretic mobility shift assays
For competition experiments, EMSAs were performed in 25 mM HEPES (pH 7.4), 50 mM sodium chloride and 0.4 U µl−1 Roche RNase inhibitor. For R-loop substrate EMSAs, NEB2 buffer (no BSA) was used to promote the annealing of substrates. Oligonucleotide substrates (illustrated in Fig. 7a and full sequences listed in Supplementary Table 1) were annealed by heating the components to 95 °C in a heat block and then permitted to cool to >10 °C below the predicted annealing temperature under the buffer conditions (UNAFold). Reactions were set up with labeled oligo in the tube to which A3B or mutants were added to the appropriate concentration. Reactions were incubated at room temperature for 5 min, and then either run or competitor was added with an additional 10 min incubation at room temperature. To run the gels, an equal volume of agarose gel loading dye (30% polyethylene glycol, 1× TBE and dyes) was added to each reaction mix and half of each reaction was loaded on the gel. Gels were imaged using a LI-COR Odyssey and quantitated with LI-COR Odyssey software.
Drug treatments
PMA (Sigma-Aldrich, P8139) was added to media at 25 ng ml−1 at 37 °C with 5% CO2 for denoted time. JQ1 (Tocris, 4499) was added to media at 0.5 μM at 37 °C with 5% CO2 for 4 h unless denoted otherwise. Triptolide (Tocris, 3253; Selleckchem, S3604) was added to media at 1 μM at 37 °C with 5% CO2 for 4 h unless denoted otherwise. Flavopiridol (Selleckchem, S1230) was added to media at 1 μM at 37 °C with 5% CO2 for 1 h unless denoted otherwise. Dox (MP Biomedicals, 198955) was added to media at 1 μg ml−1 at 37 °C with 5% CO2 for 24 h unless denoted otherwise. The splicing inhibitor, Plad B (ref. 54; Tocris, 6070), was added to media at 5 μM at 37 °C with 5% CO2 for 2 h unless noted otherwise.
Antibodies
Primary antibodies used in these experiments were α-Tubulin (Sigma-Aldrich, T5168; Abcam, ab6046 and ab4074), α-A3B (5210-87-13, custom68), α-Flag (Sigma-Aldrich, F1804), α-Topoisomerase I (Abcam, ab109374), α-Lamin B1 (Abcam, ab16048), α-IgG2a (Sigma-Aldrich, M5409), α-HA (Cell Signaling Technology, 3724S), α-GFP (Abcam, ab290, lots GR3251545 and GR3270983 for ChIP), α-mCherry (Abcam, ab167453), α-HNRNPUL1 (gift from S. Wilson, University of Sheffield, UK), α-rabbit IgG Isotype Control (Invitrogen, 02-6102, lot RI238244), α-RNA/DNA Hybrid S9.6 (Kerafast, ENH001 or obtained in house from a hybridoma cell line69,70), α-dsDNA (Abcam, ab27156) and α-gamma-H2AX (Novus, NB100-384). Secondary antibodies used were α-rabbit IRdye 800CW (LI-COR, 827-08365), α-mouse IRdye 680LT (LI-COR, 925-68020), α-rabbit HRP (Cell Signaling Technology, 7074P2 or Sigma-Aldrich, A0545), α-mouse HRP (Cell Signaling Technology, 7076P2 or Sigma-Aldrich, A8924), Alexa Fluor 488 goat anti-mouse IgG (Invitrogen, A-11029), Alexa Fluor 594 goat anti-mouse IgG (Invitrogen, A-11032), Alexa Fluor 488 goat anti-rabbit IgG (Invitrogen, A-11034), Alexa Fluor 647 goat anti-mouse IgG (Invitrogen, A-21236) and Alexa Fluor 594 goat anti-rabbit IgG (Invitrogen, A-11037).
Co-IP experiments
Semi-confluent 293T cells were transfected with plasmids using TransIT-LT1 (Mirus) per the manufacturer’s protocol. Cells were collected in 1× PBS 48 h post-transfection. Cells were washed two times in 1× PBS followed by lysis (150 mM NaCl, 50 mM Tris–HCl (pH 8.0), 0.5% Tergitol, 1× protease inhibitor (Roche), RNase and DNase). Cells were vortexed vigorously and incubated at 4 °C for 30 min before clearing by centrifugation. Cleared lysates were then added to anti-Flag M2 Magnetic Beads (Sigma, M8823) followed by end-over-end rotation overnight at 4 °C. Beads were then washed three times in lysis buffer followed by elution in elution buffer (lysis buffer + 0.15 mg ml−1 Flag Peptide (Sigma-Aldrich)).
EdU and PI staining
Semi-confluent MCF10A or U2OS cells were treated with 10 μM EdU for 2 h before collection. Click-iT Plus EdU Alexa Fluor 488 Flow Cytometry Assay Kit (Invitrogen, C10632) with the addition of FxCycle PI/RNase Staining Solution (Invitrogen, F10797) was used per manufacturer’s protocol, and flow cytometry of a minimum of 10,000 cells per condition was performed on LSRFortessa with subsequent analysis with Flow Jo version 10.8.1 (BD).
RNA/DNA hybrid slot blots
RNA/DNA hybrid slot-blot experiments were performed based on a standard protocol42,71. RNase H sensitivity was tested by incubation with 2 U of RNase H (NEB, M0297) per microgram of genomic DNA for 18 h at 37 °C. S9.6 and dsDNA samples were run on the same membrane, which was cut for primary antibody incubation. Images were acquired with LI-COR Odyssey Fc. Exposure settings for each antibody were consistent within experiments. S9.6 signal relative to dsDNA was quantified using Image Studio software (LI-COR Biosciences). Quantification was performed using S9.6 and dsDNA signal within the linear range and normalized to WT, untreated or control samples.
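The normalization described above (S9.6 signal relative to dsDNA, then relative to a control sample) can be sketched as follows. This is a minimal illustration only; the function name, sample labels and intensity values are hypothetical and are not taken from the Image Studio workflow used in the study.

```python
def relative_hybrid_signal(s96, dsdna, control):
    """Return S9.6/dsDNA ratios normalized to the named control sample.

    s96 and dsdna map sample names to band intensities (hypothetical
    values, assumed to lie within the linear range of detection).
    """
    # Step 1: loading correction, S9.6 signal relative to dsDNA signal.
    ratios = {name: s96[name] / dsdna[name] for name in s96}
    # Step 2: express every sample relative to the control (WT or untreated).
    baseline = ratios[control]
    return {name: ratio / baseline for name, ratio in ratios.items()}


# Hypothetical intensities for illustration only.
signals = relative_hybrid_signal(
    s96={"WT": 1200.0, "A3B_KO": 2700.0},
    dsdna={"WT": 4000.0, "A3B_KO": 4500.0},
    control="WT",
)
print(signals)
```

By construction, the control sample always has a relative value of 1.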
mRNA RT–qPCR
Isolation of polyA+ mRNA (High Pure RNA Isolation Kit; Roche Life Science, 11828665001), RT to generate cDNA (Transcriptor RTase; Roche Life Science, 3531317001) and qPCR were done according to manufacturer’s protocols. The abundance of various mRNAs was quantified by RT–qPCR relative to the stable housekeeping transcript, TBP. Gene-specific primers have been described72 and are listed in Supplementary Table 1.
DRIP
DRIP was performed using the S9.6 antibody29,69,73. Noncrosslinked nuclei were lysed in nuclear lysis buffer (50 mM Tris–HCl (pH 8.0), 5 mM EDTA, 1% SDS) and subjected to Proteinase K treatment (Sigma-Aldrich) for 3 h at 55 °C. Genomic nucleic acids were precipitated with isopropanol, washed in 75% ethanol and sonicated in IP dilution buffer (16.7 mM Tris–HCl (pH 8.0), 1.2 mM EDTA, 167 mM NaCl, 0.01% SDS, 1.1% Triton X-100) with Diagenode Bioruptor to an average length of 500 base pair (bp). Following addition of protease inhibitors (0.5 mM PMSF, 0.8 µg ml−1 pepstatin A, 1 µg ml−1 leupeptin), sonicated genomic nucleic acids were precleared with protein A Dynabeads (Invitrogen) blocked with acetylated BSA (Sigma-Aldrich, B8894). A total of 10 µg were subjected to S9.6 or no antibody IP overnight at 4 °C. RNase H sensitivity was carried out by incubation with 1.7 U RNase H (NEB, M0297) per microgram of genomic DNA for 3 h at 37 °C before IP. Retrieval of the immunocomplexes with beads, washes and elution was performed as described for ChIP. Samples were incubated with Proteinase K (Sigma-Aldrich) at 45 °C for 2 h. For qPCR analysis, DNA was purified with QIAquick PCR purification kit (QIAGEN) and analyzed by qPCR with Rotor-Gene Q and QuantiTect SYBR green (QIAGEN). The amount of immunoprecipitated material at a particular gene region was calculated as the percentage of input after subtracting the background signal (no antibody control). The primers used for DRIP are listed in Supplementary Table 1. For DRIP–seq analysis, multiple S9.6 IPs were pooled. DNA was purified with MinElute PCR purification kit (QIAGEN) and subjected to library preparation and sequencing on a NovaSeq 6000 with 150 bp paired-end reads at Oxford Genomics Center (WTCHG, University of Oxford).
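As an illustration of the percentage-of-input calculation with background subtraction, a minimal sketch is given below. It assumes Ct-based quantification and a hypothetical input fraction of 1%; the actual quantification mode on the Rotor-Gene Q is not specified here, and clamping negative values to zero is also an assumption.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered, from qPCR Ct values.

    input_fraction is the share of chromatin saved as input
    (for example 0.01 for 1%); this value is an assumption.
    """
    # Adjust the input Ct to what 100% of the material would give.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def background_subtracted_signal(ct_s96, ct_no_ab, ct_input, input_fraction=0.01):
    """S9.6 percent input minus the no-antibody control, floored at zero."""
    signal = percent_input(ct_s96, ct_input, input_fraction)
    background = percent_input(ct_no_ab, ct_input, input_fraction)
    return max(signal - background, 0.0)


# Hypothetical Ct values for one amplicon.
print(background_subtracted_signal(ct_s96=25.0, ct_no_ab=27.0, ct_input=25.0))
```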
RNA/DNA hybrid and protein co-IP
DNA/RNA hybrid co-IPs were carried out using S9.6 antibody29,69,74. Noncrosslinked nuclei were lysed in RSB buffer (10 mM Tris–HCl (pH 7.5), 200 mM NaCl, 2.5 mM MgCl2) with the addition of 0.2% sodium deoxycholate, 0.1% SDS, 0.05% sodium lauroyl sarcosinate and 0.5% Triton X-100. Nuclear extracts were then sonicated with Diagenode Bioruptor and diluted in RSB with 0.5% Triton X-100 (RSB + T). RNA/DNA hybrids were immunoprecipitated for 2 h at 4 °C with BSA-blocked protein A Dynabeads (Invitrogen) conjugated with the S9.6 antibody in the presence of 1.2 ng of RNase A (PureLink, Invitrogen). Washes of the immunocomplexes were carried out with RSB + T (four times) and RSB (two times). Immunocomplexes were then eluted by incubating at 70 °C with 1× LDS (Invitrogen) and 100 mM DTT for 10 min. Where indicated, IPs were performed in the presence of 1.3 µM DNA/RNA hybrid competitors70 (Supplementary Table 1). The same procedure was used for protein co-IP, and anti-GFP antibody (Abcam, ab290) was used instead of S9.6 antibody. Proteins were separated by SDS–PAGE and immunoblotted with α-A3B (5210-87-13; ref. 68), α-Topoisomerase I (Abcam, ab109374), α-Lamin B1 (Abcam, ab16048), α-GFP (Abcam, ab290) and α-HNRNPUL1 (gift from S. Wilson, University of Sheffield, UK) antibodies. For RNA/DNA hybrid slot-blot analysis, A3B-eGFP co-IP was performed starting from 350 μg of proteins following the same procedure without the addition of RNase A. Immunocomplexes were eluted in 1% SDS and 0.1 M NaHCO3 for 30 min at room temperature, and nucleic acids were precipitated overnight with isopropanol and glycogen (Roche) after Proteinase K digestion (Sigma-Aldrich) for 2 h at 45 °C. RNase H sensitivity was performed by incubating with 7.5 U of RNase H (NEB, M0297) for 2.5 h at 37 °C.
ChIP
ChIP experiments were done by crosslinking cells with 1% formaldehyde at 37 °C for 15 min before the reactions were quenched with 0.125 M glycine for 5 min29,73. Nuclei were isolated by lysing cells with cell lysis buffer (5 mM PIPES (pH 8.0), 85 mM KCl, 0.5% NP-40 supplemented with 0.5 mM PMSF and 1× complete EDTA-free protease inhibitors; Sigma-Aldrich). Nuclear pellets were then resuspended in nuclear lysis buffer (50 mM Tris–HCl (pH 8.0), 5 mM EDTA, 1% SDS supplemented with 0.5 mM PMSF and 1× complete EDTA-free protease inhibitors; Sigma-Aldrich) before sonication (Diagenode Bioruptor). Insoluble chromatin was removed by centrifugation. Soluble chromatin was then diluted in ChIP IP buffer (16.7 mM Tris–HCl (pH 8.0), 1.2 mM EDTA (pH 8.0), 167 mM NaCl, 0.01% SDS, 1.1% Triton X-100 supplemented with 0.5 mM PMSF and 1× complete EDTA-free protease inhibitors; Sigma-Aldrich) and precleared by incubation with protein A Dynabeads (Invitrogen) blocked with acetylated BSA (Sigma-Aldrich, B8894). Precleared chromatin was then incubated with α-GFP antibody (Abcam, ab290, lot GR3251545 and GR3270983). BSA-blocked protein A Dynabeads were then added to collect immunocomplexes and washed once with buffer A (20 mM Tris–HCl (pH 8.0), 2 mM EDTA, 0.1% SDS, 1% Triton X-100 and 0.150 M NaCl), once with buffer B (20 mM Tris–HCl (pH 8.0), 2 mM EDTA, 0.1% SDS, 1% Triton X-100 and 0.5 M NaCl), once with buffer C (10 mM Tris–HCl (pH 8.0), 1 mM EDTA, 1% NP-40, 1% sodium deoxycholate and 0.25 M LiCl) and then twice with buffer D (10 mM Tris–HCl (pH 8.0) and 1 mM EDTA). Chromatin complexes were eluted in 1% SDS and 0.1 M NaHCO3. Samples were decrosslinked by incubating at 65 °C for at least 4 h in the presence of RNase A (PureLink, Invitrogen) and NaCl (0.3 M) and digested with proteinase K (Sigma-Aldrich) for 2 h at 45 °C. DNA purification and qPCR analysis were performed as described for DRIP. The primers used for ChIP are listed in Supplementary Table 1. 
For ChIP–seq analysis, multiple ChIP IPs were pooled. DNA was purified with MinElute PCR purification kit (QIAGEN) and subjected to library preparation and sequencing on a NovaSeq 6000 with 150 bp paired-end reads at Oxford Genomics Center (WTCHG, University of Oxford).
IF for R-loop analysis
Experiments were performed similarly to reported procedures33,66, with details as follows.
S9.6 IF analysis
U2OS or MCF10A cells either WT or deficient for A3B were analyzed for S9.6 IF as indicated. Untreated cells were analyzed or treatments were performed as described. Treatment with the transcription initiation inhibitor (triptolide, final concentration 1 µM) was performed for 4 h, or cells were transfected with indicated constructs and either treated with JQ1 (final concentration 0.5 μM in DMSO) or equivalent DMSO concentration only control for 4 h or Plad B (final concentration 5 μM) for 2 h. After each indicated treatment, cells were fixed with 100% ice-cold methanol at 4 °C for 10 min, a common fixation method for S9.6 and R-loops33,75–77, followed by washing three times with PBS at room temperature. For in vitro RNase H treatment, fixed cells were washed with nuclease-free water to remove PBS and treated with 150 U ml−1 RNase H in 1× RNase H reaction buffer (NEB, M0297). Cells were incubated for 2 h at 37 °C followed by two 5 min washes with 1× PBS. Untreated samples were similarly treated except using 1× RNase H reaction buffer without enzyme. To detect S9.6, cells were then blocked with 3% BSA/PBS at room temperature for 1 h and incubated with S9.6 antibody (Kerafast, ENH001; 1:200) at 4 °C for 18 h. Some samples were costained with the DNA damage marker γH2AX (Novus, NB100-384; 1:500). Following primary antibody incubation, cells were washed with PBS three times for 5 min and incubated with appropriate secondary antibody for each primary antibody in 3% BSA/PBS blocking buffer at room temperature for 1 h. Cells were then washed in PBS three times for 5 min, and each coverslip was mounted on a 12 mm glass slide using Vectashield mounting medium containing DAPI (Vector Laboratories, H-1200). 
Samples were analyzed using a Fluoview FV 3000 confocal microscope (Olympus; Miller Laboratory) or Nikon AR1 (University of Minnesota Imaging Center), and nucleoplasmic S9.6 signal was quantified using Image J (v 1.48) as described in Quantification and statistical analysis subsection below. All constructs were expressed to similar levels.
mCherry-RNaseH1-mutant IF analysis
WT or A3B KO U2OS cells were transfected with mCherry-RNaseH1-D10R-E48R catalytic mutant (mCherry-RNaseH1 mut; refs. 33,66,75) and allowed to incubate for 48 h before treatment. Cells expressing mCherry-RNaseH1 mut were either untreated, treated with JQ1 (final concentration 0.5 μM in DMSO) or treated with the equivalent DMSO concentration as a control for 4 h. Following treatment, cells were fixed with 100% ice-cold methanol at 4 °C for 10 min followed by washing three times with PBS at room temperature. Cells on individual coverslips from each condition were mounted on a 12 mm glass slide using Vectashield mounting medium containing DAPI (Vector Laboratories, H-1200). Samples were then analyzed using a Fluoview FV 3000 confocal microscope (Olympus; Miller Laboratory), and mCherry-RNaseH1 mut signal was detected with a 561 nm diode laser and appropriate filter with high-sensitivity Peltier-cooled GaAsP spectral confocal detector. For experiments performed in U2OS A3B KO cells, WT A3B-eGFP or catalytic mutant A3B-E255A-eGFP was cotransfected with mCherry-RNaseH1 mut and cells expressing both constructs were analyzed. For GFP signal of ectopically expressed A3B, samples were analyzed with a 488 nm diode laser and appropriate filter with high-sensitivity Peltier-cooled GaAsP spectral confocal detector. DAPI signals were detected using a 405 nm diode laser and appropriate filter with high-sensitivity Peltier-cooled GaAsP spectral confocal detector. Equal expression between samples was determined by quantification of the total nuclear fluorescence signal for mCherry using Image J and western blotting for both mCherry-RNaseH1 mut and A3B WT and E255A. Quantification of nucleoplasmic mCherry-RNaseH1 mut was performed as described in the Quantification and statistical analysis subsection below.
Immunoblot analysis
For immunoblotting assays, the samples were combined with 2.5× SDS–PAGE loading buffer. Samples were separated by a 4–20% gradient SDS–PAGE gel and transferred to PVDF-FL membranes (Millipore). Membranes were blocked in blocking solution (5% milk + PBS supplemented with 0.1% Tween 20) and then incubated with primary antibody diluted in blocking solution. Secondary antibodies were diluted in blocking solution + 0.02% SDS. Membranes were imaged with a LI-COR Odyssey instrument or film.
ChIP–seq and DRIP–seq data processing
Adapters were trimmed with Cutadapt version 1.13 (ref. 78) in paired-end mode with the following parameters: -q 15,10 --minimum-length 10 -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA. Obtained sequences were mapped to the human hg38 reference genome with STAR version 2.6.1d (ref. 79) and the parameters --runThreadN 16 --readFilesCommand gunzip -c -k --alignIntronMax 1 --limitBAMsortRAM 20000000000 --outSAMtype BAM SortedByCoordinate. Properly paired and mapped reads (-f 3) were retained with SAMtools version 1.3.1 (ref. 80). PCR duplicates were removed with Picard MarkDuplicates tool. Reads mapping to the DAC Exclusion List Regions (accession: ENCSR636HFF) were removed with Bedtools version 2.29.2 (ref. 81). FPKM-normalized bigwig files were created with deepTools version 2.5.0.1 (ref. 82) bamCoverage tool with the parameters -bs 10 -p max -e --normalizeUsing RPKM. ChIP–seq and DRIP–seq peaks were called with MACS2 version 2.1.1.20160309 (ref. 83) and the following parameters: callpeak -f BAMPE -g 2.9e9 -B -q 0.01 --call-summits. Each IP and its respective input were used as treatment and control, respectively. DRIP–seq differential peak calling was performed with MACS2 bdgdiff tool.
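For orientation, the per-bin RPKM normalization applied by bamCoverage with a 10 bp bin can be written as a short calculation. This is a sketch of the arithmetic only (reads per bin divided by mapped reads in millions and bin length in kilobases); the function name and read counts are hypothetical, and read extension (-e) is not modeled.

```python
def rpkm_per_bin(reads_in_bin, total_mapped_reads, bin_size_bp=10):
    """RPKM for one coverage bin.

    reads_in_bin: number of reads (or fragments) overlapping the bin.
    total_mapped_reads: library size after filtering.
    bin_size_bp: bin width in base pairs (10 bp, matching the bin size above).
    """
    millions_mapped = total_mapped_reads / 1e6
    bin_length_kb = bin_size_bp / 1e3
    return reads_in_bin / (millions_mapped * bin_length_kb)


# Hypothetical example: 5 reads in a 10 bp bin, 20 million mapped reads.
print(rpkm_per_bin(5, 20_000_000))
```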
Transcription unit annotation
Gencode V31 annotation, based on the hg38 version of the human genome, was used to extract the location of the transcription units. All genes were taken from the most 5′ transcription start site to the most 3′ poly(A) site/transcription end site. The eRNAs annotation based on the hg38 version of the human genome was taken from the FANTOM5 database.
Metagene profiles
Metagene profiles were generated from FPKM-normalized bigwig files with the Deeptools2 computeMatrix tool using a bin size of 10 bp, and the plotting data were obtained with the plotProfile tool (--outFileNameData option). Graphs were then created with GraphPad Prism 8.3.1.
RNA-seq data processing
RNA-seq data from ref. 30 were processed as follows: adapters were trimmed with Cutadapt in single-end mode with the following parameters: -q 15,10 --minimum-length 10 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --max-n 1. The trimmed reads were mapped to the human hg38 reference genome with STAR and the parameters --runThreadN 16 --readFilesCommand gunzip -c -k --limitBAMsortRAM 20000000000 --outSAMtype BAM SortedByCoordinate. SAMtools was used to retain only properly mapped reads (-F 4). Gene expression level (transcripts per million) was calculated with Salmon version 0.13.1 (ref. 84) and the Gencode V31 annotation. For each gene, only the highest expressed transcript was retained.
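The final filtering step (retaining only the highest expressed transcript per gene) could be implemented along the following lines. This is a minimal sketch; the record layout and identifiers are hypothetical, and the tie-breaking rule (first transcript encountered wins) is an assumption.

```python
def top_transcript_per_gene(records):
    """Keep the transcript with the highest TPM for each gene.

    records: iterable of (gene_id, transcript_id, tpm) tuples, for example
    parsed from a Salmon quantification table joined to a gene annotation.
    Returns a dict mapping gene_id to (transcript_id, tpm).
    """
    best = {}
    for gene_id, transcript_id, tpm in records:
        # Strict ">" keeps the first transcript seen when TPMs are tied.
        if gene_id not in best or tpm > best[gene_id][1]:
            best[gene_id] = (transcript_id, tpm)
    return best


# Hypothetical identifiers and TPM values.
example = [
    ("geneA", "txA1", 3.2),
    ("geneA", "txA2", 11.5),
    ("geneB", "txB1", 0.0),
]
print(top_transcript_per_gene(example))
```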
APOBEC mutation and gene expression
Whole-exome sequencing and RNA-seq datasets for all primary breast tumor specimens (n = 977) and normal breast tissues (n = 111) in TCGA were downloaded from the Broad Institute analysis pipeline through the Firehose GDAC resource (http://gdac.broadinstitute.org/). Similarly, whole-genome sequencing datasets for all primary breast tumor samples (n = 794) in the ICGC were downloaded from the ICGC data portal (https://dcc.icgc.org/). Because ICGC tumors lack corresponding RNA-seq data, expression values for genes in normal breast tissues were obtained by averaging available GTEx data (n = 29,589 genes from 396 normal breast tissue samples; https://gtexportal.org/home/).
SBS mutations from TCGA and ICGC breast cancers were used for analyses here (that is, INDELs and other more complex somatic variations were filtered out)55,85. Tumor datasets were ranked initially by APOBEC mutation enrichment scores using established methods49. Enrichment score significance was assessed using a Fisher’s exact test with Benjamini–Hochberg false discovery rate correction (q < 0.05). TCGA breast tumors with significant APOBEC mutational signature enrichments (n = 154 tumors) were used to test whether mutation load per megabase associates with differential gene expression (tumor versus normal tissue). Mean normal expression values for each gene from 111 normal breast tissues from the TCGA breast cancer dataset were used to generate a baseline for determining fold changes in gene expression in tumor tissues. For each of the 154 APOBEC signature-enriched tumors, we first generated the following seven gene expression groups: (1) tumor genes with expression values of 0 (gene number range = 722–3,903 and median = 2,144); (2) tumor genes with expression values less than 0.8-fold of the normals (first quartile of all genes in all tumors; gene number range = 787–4,369 and median = 2,688); (3) tumor genes with fold changes between 0.8- and 1.2-fold of the normals (covers from first quartile to third quartile of all genes; gene number range = 8,530–14,135 and median = 11,556); (4) tumor genes with fold changes between 1.2-fold and 4-fold above the normals (gene number range = 1,297–3,248 and median = 2,018); (5) tumor genes with fold changes between fourfold and eightfold above the normals (gene number range = 57–415 and median = 150); (6) tumor genes with fold changes between 8-fold and 16-fold above the normals (gene number range = 18–304 and median = 67); (7) tumor genes with fold changes greater than 16-fold above the normal (gene number range = 11–698 and median = 53). 
Finally, we calculated the fraction of APOBEC signature mutations (TCW to TTW or TGW) per tumor per megabase using the exon lengths of the genes in each group.
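The assignment of genes to the seven expression groups and the per-megabase normalization can be sketched as below. The function names are hypothetical, and the treatment of values falling exactly on a boundary (0.8, 1.2, 4, 8 or 16) and of genes with a normal mean of zero is an assumption, as these cases are not specified above.

```python
def expression_group(tumor_expr, normal_mean):
    """Return the expression group (1 to 7) for one gene in one tumor."""
    if tumor_expr == 0:
        return 1  # group 1: not expressed in the tumor
    if normal_mean == 0:
        return None  # fold change undefined; excluded here (assumption)
    fold = tumor_expr / normal_mean
    if fold < 0.8:
        return 2
    if fold <= 1.2:
        return 3
    if fold <= 4:
        return 4
    if fold <= 8:
        return 5
    if fold <= 16:
        return 6
    return 7


def mutations_per_megabase(n_mutations, total_exon_length_bp):
    """APOBEC signature mutations per megabase of exon sequence in a group."""
    return n_mutations / (total_exon_length_bp / 1e6)


# Hypothetical example: 3 TCW mutations across 1.5 Mb of exons in a group.
print(expression_group(30.0, 10.0), mutations_per_megabase(3, 1_500_000))
```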
A similar analysis was done for ICGC tumor mutation versus GTEx expression values. Five expression groups were created: nonexpressed genes (Exp = 0) and all other genes divided into expression quartiles. Only C-to-G and C-to-T mutations in TCW trinucleotide motifs were used in these analyses and were plotted for each expression group as (1) total number of T(C>G/T)W mutations, (2) total number of T(C>G/T)W mutations divided by the total number of all SBSs in a tumor and (3) total number of T(C>G/T)W mutations as a fraction of the total nucleotide size of the genes (exons and introns) in that expression group (mutations per megabase per tumor). Gene size information was downloaded from the UCSC table browser resource (https://genome.ucsc.edu/cgi-bin/hgTables) and corresponds to the ‘UCSC Genes, knownGene’ reference set. All mutation calls and gene sizes/positions are relative to the hg19 human reference genome.
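A simple way to test whether a single-base substitution is a T(C>G/T)W mutation, on either strand, is sketched below. The function names are hypothetical; the input is assumed to be the reference trinucleotide centered on the mutated base, with the alternate allele reported on the same strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_apobec_tcw_mutation(trinucleotide, alt):
    """True for C>T or C>G in a TCW context (W = A or T), on either strand."""
    tri = trinucleotide.upper()
    alt = alt.upper()
    if tri[1] == "G":
        # G-centered calls are converted to the C-centered (pyrimidine) strand.
        tri = reverse_complement(tri)
        alt = COMPLEMENT[alt]
    return (
        tri[0] == "T"
        and tri[1] == "C"
        and tri[2] in ("A", "T")
        and alt in ("T", "G")
    )


print(is_apobec_tcw_mutation("TCA", "T"))  # TCA to TTA on the reference strand
print(is_apobec_tcw_mutation("TGA", "A"))  # same event reported on the opposite strand
```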
Splice factor and APOBEC mutation analysis
TCGA mutation data were downloaded from Broad GDAC Firehose as above. In total, 119 splicing factor genes with recurring mutations in 33 cancers were used as the analysis gene set86. In total, 107 of the 119 genes had deleterious mutations in the TCGA BRCA dataset. These deleterious mutations included stop codon mutations, splice site mutations and insertion and deletion frameshift mutations. Trinucleotide contexts were calculated using the deconstructSigs package87. The APOBEC mutation signature in this analysis included all COSMIC SBS2 and/or SBS13 mutations8,88. Statistical analyses were done with Fisher’s exact tests (with α = 0.05) and Student’s t-tests as indicated.
Housekeeping gene set analysis
We performed 100,000 random selections of 119 housekeeping genes from a previously defined set of 3,804 (ref. 89). In each iteration, we asked whether the selected 119 genes contained one or more deleterious mutations (that is, frameshift, stop codon or splice site) in each tumor of the TCGA breast cancer dataset (n = 841). From these iterations, the median number of mutated housekeeping genes was 35/119, the minimum was 0/119 and the maximum was 79/119. Similarly, from these iterations, the median number of tumors containing mutations in housekeeping genes was 15/841, the minimum was 0/841 and the maximum was 38/841. In contrast, from the 119 splice factor genes reported to be mutated across cancer, 107 of these were found to contain deleterious mutations in the TCGA breast cancer dataset; these 107 mutated splice factor genes are distributed across 81 breast tumors (that is, 81/922 TCGA tumors). For each of the 100,000 iterations, a Fisher’s exact test was done for APOBEC3 signature enrichment, and, in all instances after correcting for multiple hypothesis testing, no significant enrichment was found for the housekeeping gene sets (Benjamini–Hochberg corrected Q = 1.0).
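The resampling and multiple-testing steps above can be sketched with the standard library alone. This is an illustrative outline rather than the code used for the analysis: the function names, gene labels and mutated-gene set are hypothetical, the number of iterations is reduced for speed, and the per-iteration Fisher’s exact test is omitted.

```python
import random
from statistics import median


def resample_mutated_counts(gene_pool, mutated_genes, set_size, iterations, seed=0):
    """Count mutated genes in repeated random draws of set_size genes."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    mutated = set(mutated_genes)
    counts = []
    for _ in range(iterations):
        chosen = rng.sample(gene_pool, set_size)
        counts.append(sum(1 for gene in chosen if gene in mutated))
    return counts


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical pool of 3,804 housekeeping genes, 1,000 of them mutated.
pool = [f"HK{i}" for i in range(3804)]
counts = resample_mutated_counts(pool, pool[:1000], set_size=119, iterations=200, seed=1)
print(min(counts), median(counts), max(counts))
```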
APOBEC kataegis analysis using PCAWG WGS datasets
To analyze APOBEC-associated kataegis, the set of WGS breast adenocarcinomas was downloaded from the official PCAWG release (https://dcc.icgc.org/releases/PCAWG; n = 198). Kataegic events were detected using a sample-dependent intermutational distance (IMD) cutoff, which is unlikely to occur by chance given the mutational burden and mutational pattern of each sample21,90. SigProfilerSimulator (v 1.1.2) was used to generate a random distribution of the mutational spectra while maintaining the ±2 bp sequence context and the strand coordination within genic regions of each mutation91. This background model was used to determine the cutoff for the sample-dependent IMD by ensuring that 90% of clustered mutations occur within the original sample compared to the expected distribution (Q < 0.01). The heterogeneity of mutation rates across the genome and the confounding effects of copy number alterations and clonality were addressed by performing a 10 Mbp regional mutation density correction and by using a cutoff for the difference in variant allele frequencies between adjacent mutations in a clustered event (variant allele frequency difference <0.10) (ref. 21). Clustered events consisting of ≥3 or ≥5 mutations were classified as kataegis. Events that did not fall within 10 kbp of a detected structural variant breakpoint were used for nonstructural variation associated downstream analysis. All breakpoints were determined based on the official PCAWG release. Only base substitution mutations with TCW context were considered associated with APOBEC3 mutagenesis. A 1,000 bp window was included upstream and downstream of each DRIP–seq R-loop region to determine overlap with kataegic events. Mutation enrichment analysis was performed for each mutation by normalizing for the availability of a given motif (RTCA or YTCA) and the number of cytosines within ±20 bp (ref. 49). Additional analyses were conducted using R, Prism (v8.0), and the ggplot2 R package. 
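The core of the clustering step, grouping mutations whose intermutational distance falls at or below a cutoff, can be illustrated with the simplified sketch below. It assumes a single chromosome and a fixed, precomputed IMD cutoff; the simulation-based cutoff, regional density correction and variant allele frequency filter described above are not modeled, and the function name and positions are hypothetical.

```python
def cluster_by_imd(positions, imd_cutoff, min_mutations=3):
    """Group mutation positions into clustered events.

    positions: genomic coordinates of mutations on one chromosome.
    imd_cutoff: maximum distance (bp) between adjacent mutations in an event.
    min_mutations: minimum number of mutations for an event to be reported.
    """
    events = []
    current = []
    for pos in sorted(positions):
        # A gap larger than the cutoff closes the current event.
        if current and pos - current[-1] > imd_cutoff:
            if len(current) >= min_mutations:
                events.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_mutations:
        events.append(current)
    return events


# Hypothetical positions: one event of three mutations, the rest dispersed.
print(cluster_by_imd([100, 150, 180, 5000, 5100, 90000], imd_cutoff=1000))
```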
Statistical significance between the tetranucleotide enrichments of kataegis and dispersed APOBEC3 mutation datasets was determined using a nonparametric Fisher’s exact test with an α of 0.05 (P values reported in the text). Statistical significance for tetranucleotide mutation biases within samples containing overlaps of R-loop and kataegic events compared to dispersed mutations was assessed using a Mann–Whitney U test (Q values shown in each dot plot). The Cohen’s d effect size was calculated across all pairwise region comparisons to assess the skew of the distributions within R-loop-associated kataegis in comparison to all genome-wide kataegis.
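As a rough illustration of two of the quantities named above, the following Python sketch computes a Cohen's d effect size and a Mann–Whitney U statistic for two groups of values. The pooled-standard-deviation form of Cohen's d is an assumption (the text does not state which variant was used), the function names are hypothetical, and no P or Q values are computed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic for group a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Toy enrichment values (invented for illustration only).
rloop_kataegis = [2.0, 4.0, 6.0]
all_kataegis = [1.0, 3.0, 5.0]
d = cohens_d(rloop_kataegis, all_kataegis)
u = mann_whitney_u(rloop_kataegis, all_kataegis)
```

In practice a statistics package would also supply the P value and the multiple-testing correction behind the reported Q values.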
Quantification and statistical analysis
S9.6 and mCherry-RNaseH1-mut IF quantification was done as described in refs. 33,66. Specifically, mCherry-RNaseH1-mut or S9.6 images obtained on the confocal microscope were opened in ImageJ (v 1.48). For each image, nuclei of individual cells (≥60 cells per sample) were outlined using the selection tools function. Fluorescence intensity per area of each selection (entire nucleus) was measured using the measure function. Nucleoli for each nucleus were identified by importing DAPI-overlaid channels for each image. The fluorescence intensity of nucleoli was measured by selecting DNA-free regions and using the measure function. Nucleoli-only intensity was subtracted from the total nuclear fluorescence signal to obtain the nucleoplasmic fluorescence intensity for either S9.6 or mCherry-RNaseH1-mut. These readings were normalized to control samples to obtain the ‘relative fluorescence intensity’. For statistical analysis, one-way analysis of variance followed by Dunnett’s multiple comparison test was used when comparing more than two groups; a Mann–Whitney test or a two-tailed Student’s t-test was used where indicated. Statistical analyses for bioinformatic studies are described above.
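The arithmetic behind the relative fluorescence intensity can be sketched in Python as follows. This is only an illustration with hypothetical function names; it assumes that each cell contributes one nuclear and one nucleolar reading and that normalization is to the mean nucleoplasmic intensity of the control sample, which the text does not specify.

```python
from statistics import mean


def nucleoplasmic_intensity(nuclear, nucleolar):
    """Total nuclear signal minus the nucleoli-only signal for one cell."""
    return nuclear - nucleolar


def relative_intensity(sample_cells, control_cells):
    """Normalize per-cell nucleoplasmic intensities to the control mean.

    Each cell is a (nuclear, nucleolar) pair of intensity readings.
    """
    control_mean = mean([nucleoplasmic_intensity(n, o) for n, o in control_cells])
    return [nucleoplasmic_intensity(n, o) / control_mean for n, o in sample_cells]


# Toy readings (invented for illustration only).
control = [(100.0, 20.0), (120.0, 40.0)]
knockdown = [(200.0, 40.0), (140.0, 20.0)]
relative = relative_intensity(knockdown, control)
```

With this normalization the control sample has a mean relative fluorescence intensity of 1 by construction.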
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-023-01504-w.
Acknowledgements
We thank H. Gupta (UT Health San Antonio) and N.J. Proudfoot (Oxford University) for critically reading the manuscript, J. Becker and J. Duda (University of Minnesota) for corroborative localization data with A3B mutants, A. Taylor (UT Health San Antonio) for suggesting alternative purification procedures, the University of Minnesota Imaging Center for access to instrumentation and the Oxford Genomics Center at the Wellcome Center for Human Genetics (funded by Wellcome Trust grant 203141/Z/16/Z) for the generation and initial processing of the sequencing data. Studies in the Harris Lab were supported by NCI P01 CA234228 (to R.S.H.), NIAID R37 AI064046 (to R.S.H.) and a Recruitment of Established Investigators Award from the Cancer Prevention and Research Institute of Texas (CPRIT RR220053 to R.S.H.). The NG Lab is supported by the Royal Society University Research Fellowship (BVD07340), Royal Society Enhancement Award (RGF\EA\180023) and EPA Research Fund (Sir William Dunn School of Pathology, University of Oxford) to N.G. and CRUK development fund (CRUK DF-0119) to A.C. and N.G. M.T. and S.M. are supported by the Wellcome Trust Investigator Award (WT210641/Z/18/Z to S.M.). The KMM Lab was supported by NCI (R01 CA198279, CA201268 and CA250905), Cancer Prevention and Research Institute of Texas (RP220330) and a postdoctoral fellowship (PF-22-092-01-DMC) from the American Cancer Society to A.S. The LBA Lab was supported by US National Institutes of Health (R01 ES030993 and R01 ES032547). Salary support for J.L.M. was provided by an NSF Graduate Research Fellowship (00039202) and by HHMI. Salary support for M.C.J. was provided by T32 CA009138 and NCI F31 CA243306. Salary support for B.S. was provided by HHMI and the Ovarian Cancer Research Alliance (Mentored Investigator Grant 812337). Salary support for D.J.S. was provided by NIAID K99 AI147811. R.S.H. is the Ewing Halsell President’s Council Distinguished Chair, a CPRIT Scholar and an investigator of the Howard Hughes Medical Institute at the University of Texas Health San Antonio.
Author contributions
R.S.H., J.L.M., A.C. and N.G. conceived and designed these studies. J.L.M. and A.C. performed experiments unless otherwise noted. E.K.L. and S.L. made equal secondary contributions. E.K.L. generated U2OS knockdown and complement cell lines and assisted in tissue culture and genomic DNA isolations for dot-blot experiments. S.L., J.K., A.S. and K.M.M. designed, performed and quantified IF experiments. M.T. and S.M. conducted DRIP–seq/ChIP–seq data analysis. C.B. performed DRIP–qPCR validations and HeLa R-loop IP. B.S. assisted with cell culture experiments and R-loop quantification. M.R.B. assisted with cell culture studies. M.C.J., N.A.T., D.J.S., E.N.B. and L.B.A. performed bioinformatic analyses. M.A.C. contributed to biochemical experiments. R.S.H. and J.L.M. drafted the manuscript with input from all other authors.
Peer review
Peer review information
Nature Genetics thanks Stephan Hamperl and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The Gene Expression Omnibus accession number for the ChIP–seq and DRIP–seq datasets reported in this paper is GSE148581. Questions regarding these sequencing data can be addressed to N.G. or R.S.H. The A3B AP–MS datasets are in Supplementary Table 1. Questions regarding these proteomic results can be addressed to R.S.H. Requests for materials and/or questions regarding any of the constructs, cell lines, microscopy results or other data described here can be addressed to R.S.H. Source data are provided with this paper.
Code availability
No custom code or software was generated as part of the study. Details of all software packages used for data processing and/or analysis may be found in the Methods.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Jennifer L. McCann, Agnese Cristini.
These authors jointly supervised this work: Kyle M. Miller, Natalia Gromak, Reuben S. Harris.
Contributor Information
Kyle M. Miller, Email: kyle.miller@austin.utexas.edu
Natalia Gromak, Email: natalia.gromak@path.ox.ac.uk
Reuben S. Harris, Email: rsh@uthscsa.edu
Extended data is available for this paper at 10.1038/s41588-023-01504-w.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-023-01504-w.
References
- 1. Green AM, Weitzman MD. The spectrum of APOBEC3 activity: from anti-viral agents to anti-cancer opportunities. DNA Repair (Amst.) 2019;83:102700. doi: 10.1016/j.dnarep.2019.102700.
- 2. Harris RS, Dudley JP. APOBECs and virus restriction. Virology. 2015;479–480:131–145. doi: 10.1016/j.virol.2015.03.012.
- 3. Kohli RM, et al. Local sequence targeting in the AID/APOBEC family differentially impacts retroviral restriction and antibody diversification. J. Biol. Chem. 2010;285:40956–40964. doi: 10.1074/jbc.M110.177402.
- 4. Wang M, Rada C, Neuberger MS. Altering the spectrum of immunoglobulin V gene somatic hypermutation by modifying the active site of AID. J. Exp. Med. 2010;207:141–153. doi: 10.1084/jem.20092238.
- 5. Shi K, et al. Structural basis for targeted DNA cytosine deamination and mutagenesis by APOBEC3A and APOBEC3B. Nat. Struct. Mol. Biol. 2017;24:131–139. doi: 10.1038/nsmb.3344.
- 6. Petljak M, Maciejowski J. Molecular origins of APOBEC-associated mutations in cancer. DNA Repair (Amst.) 2020;94:102905. doi: 10.1016/j.dnarep.2020.102905.
- 7. Casellas R, et al. Mutations, kataegis and translocations in B cells: understanding AID promiscuous activity. Nat. Rev. Immunol. 2016;16:164–176. doi: 10.1038/nri.2016.2.
- 8. Alexandrov LB, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578:94–101. doi: 10.1038/s41586-020-1943-3.
- 9. Petljak M, et al. Mechanisms of APOBEC3 mutagenesis in human cancer cells. Nature. 2022;607:799–807. doi: 10.1038/s41586-022-04972-y.
- 10. Carpenter MA, et al. Mutational impact of APOBEC3A and APOBEC3B in a human cell line and comparisons to breast cancer. Preprint at bioRxiv. 2023. doi: 10.1101/2022.04.26.489523.
- 11. DeWeerd RA, et al. Prospectively defined patterns of APOBEC3A mutagenesis are prevalent in human cancers. Cell Rep. 2022;38:110555. doi: 10.1016/j.celrep.2022.110555.
- 12. Seplyarskiy VB, et al. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. Genome Res. 2016;26:174–182. doi: 10.1101/gr.197046.115.
- 13. Hoopes JI, et al. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. Cell Rep. 2016;14:1273–1282. doi: 10.1016/j.celrep.2016.01.021.
- 14. Bhagwat AS, et al. Strand-biased cytosine deamination at the replication fork causes cytosine to thymine mutations in Escherichia coli. Proc. Natl Acad. Sci. USA. 2016;113:2176–2181. doi: 10.1073/pnas.1522325113.
- 15. Haradhvala NJ, et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell. 2016;164:538–549. doi: 10.1016/j.cell.2015.12.050.
- 16. Morganella S, et al. The topography of mutational processes in breast cancer genomes. Nat. Commun. 2016;7:11383. doi: 10.1038/ncomms11383.
- 17. Nik-Zainal S, et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature. 2016;534:47–54. doi: 10.1038/nature17676.
- 18. Buisson R, et al. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science. 2019;364:eaaw2872. doi: 10.1126/science.aaw2872.
- 19. Nik-Zainal S, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149:979–993. doi: 10.1016/j.cell.2012.04.024.
- 20. Taylor BJ, et al. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. eLife. 2013;2:e00534. doi: 10.7554/eLife.00534.
- 21. Bergstrom EN, et al. Mapping clustered mutations in cancer reveals APOBEC3 mutagenesis of ecDNA. Nature. 2022;602:510–517. doi: 10.1038/s41586-022-04398-6.
- 22. Roberts SA, et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell. 2012;46:424–435. doi: 10.1016/j.molcel.2012.03.030.
- 23. Garcia-Muse T, Aguilera A. R loops: from physiological to pathological roles. Cell. 2019;179:604–618. doi: 10.1016/j.cell.2019.08.055.
- 24. Petermann E, Lan L, Zou L. Sources, resolution and physiological relevance of R-loops and RNA-DNA hybrids. Nat. Rev. Mol. Cell Biol. 2022;23:521–540. doi: 10.1038/s41580-022-00474-x.
- 25. Brickner JR, Garzon JL, Cimprich KA. Walking a tightrope: the complex balancing act of R-loops in genome stability. Mol. Cell. 2022;82:2267–2297. doi: 10.1016/j.molcel.2022.04.014.
- 26. Serebrenik AA, et al. The deaminase APOBEC3B triggers the death of cells lacking uracil DNA glycosylase. Proc. Natl Acad. Sci. USA. 2019;116:22158–22163. doi: 10.1073/pnas.1904024116.
- 27. Venkatesan S, et al. Perspective: APOBEC mutagenesis in drug resistance and immune escape in HIV and cancer evolution. Ann. Oncol. 2018;29:563–572. doi: 10.1093/annonc/mdy003.
- 28. Roelofs PA, Martens JWM, Harris RS, Span PN. Clinical implications of APOBEC3-mediated mutagenesis in breast cancer. Clin. Cancer Res. 2023;29:1658–1669. doi: 10.1158/1078-0432.CCR-22-2861.
- 29. Cristini A, Groh M, Kristiansen MS, Gromak N. RNA/DNA hybrid interactome identifies DXH9 as a molecular player in transcriptional termination and R-loop-associated DNA damage. Cell Rep. 2018;23:1891–1905. doi: 10.1016/j.celrep.2018.04.025.
- 30. Leonard B, et al. The PKC/NF-kappaB signaling pathway induces APOBEC3B expression in multiple human cancers. Cancer Res. 2015;75:4538–4547. doi: 10.1158/0008-5472.CAN-15-2171-T.
- 31. Tuduri S, et al. Topoisomerase I suppresses genomic instability by preventing interference between replication and transcription. Nat. Cell Biol. 2009;11:1315–1324. doi: 10.1038/ncb1984.
- 32. Smolka JA, Sanz LA, Hartono SR, Chedin F. Recognition of RNA by the S9.6 antibody creates pervasive artifacts when imaging RNA:DNA hybrids. J. Cell Biol. 2021;220:e202004079. doi: 10.1083/jcb.202004079.
- 33. Kim JJ, et al. Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity. Genes Dev. 2019;33:1751–1774. doi: 10.1101/gad.331231.119.
- 34. Lee SY, Kim JJ, Miller KM. Bromodomain proteins: protectors against endogenous DNA damage and facilitators of genome integrity. Exp. Mol. Med. 2021;53:1268–1277. doi: 10.1038/s12276-021-00673-0.
- 35. Ito F, Fu Y, Kao SA, Yang H, Chen XS. Family-wide comparative analysis of cytidine and methylcytidine deamination by eleven human APOBEC proteins. J. Mol. Biol. 2017;429:1787–1799. doi: 10.1016/j.jmb.2017.04.021.
- 36. Adolph MB, Love RP, Feng Y, Chelico L. Enzyme cycling contributes to efficient induction of genome mutagenesis by the cytidine deaminase APOBEC3B. Nucleic Acids Res. 2017;45:11925–11940. doi: 10.1093/nar/gkx832.
- 37. Chaurasiya KR, et al. Oligomerization transforms human APOBEC3G from an efficient enzyme to a slowly dissociating nucleic acid-binding protein. Nat. Chem. 2014;6:28–33. doi: 10.1038/nchem.1795.
- 38. Wang Y, Lu JJ, He L, Yu Q. Triptolide (TPL) inhibits global transcription by inducing proteasome-dependent degradation of RNA polymerase II (Pol II). PLoS ONE. 2011;6:e23993. doi: 10.1371/journal.pone.0023993.
- 39. Sanz LA, et al. Prevalent, dynamic, and conserved R-loop structures associate with specific epigenomic signatures in mammals. Mol. Cell. 2016;63:167–178. doi: 10.1016/j.molcel.2016.05.032.
- 40. Manzo SG, et al. DNA topoisomerase I differentially modulates R-loops across the human genome. Genome Biol. 2018;19:100. doi: 10.1186/s13059-018-1478-1.
- 41. Nojima T, et al. Deregulated expression of mammalian lncRNA through loss of SPT6 induces R-loop formation, replication stress, and cellular senescence. Mol. Cell. 2018;72:970–984. doi: 10.1016/j.molcel.2018.10.011.
- 42. Kotsantis P, et al. Increased global transcription activity as a mechanism of replication stress in cancer. Nat. Commun. 2016;7:13087. doi: 10.1038/ncomms13087.
- 43. Stork CT, et al. Co-transcriptional R-loops are the main cause of estrogen-induced DNA damage. eLife. 2016;5:e17548. doi: 10.7554/eLife.17548.
- 44. Gorthi A, et al. EWS-FLI1 increases transcription to cause R-loops and block BRCA1 repair in Ewing sarcoma. Nature. 2018;555:387–391. doi: 10.1038/nature25748.
- 45. Holden NS, et al. Phorbol ester-stimulated NF-kappa B-dependent transcription: roles for isoforms of novel protein kinase C. Cell. Signal. 2008;20:1338–1348. doi: 10.1016/j.cellsig.2008.03.001.
- 46. Xiao X, et al. Structural determinants of APOBEC3B non-catalytic domain for molecular assembly and catalytic regulation. Nucleic Acids Res. 2017;45:7494–7506. doi: 10.1093/nar/gkx362.
- 47. Burns MB, et al. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature. 2013;494:366–370. doi: 10.1038/nature11881.
- 48. Salamango DJ, et al. APOBEC3B nuclear localization requires two distinct N-terminal domain surfaces. J. Mol. Biol. 2018;430:2695–2708. doi: 10.1016/j.jmb.2018.04.044.
- 49. Chan K, et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 2015;47:1067–1072. doi: 10.1038/ng.3378.
- 50. Roberts SA, et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 2013;45:970–976. doi: 10.1038/ng.2702.
- 51. Law EK, et al. APOBEC3A catalyzes mutation and drives carcinogenesis in vivo. J. Exp. Med. 2020;217:e20200261. doi: 10.1084/jem.20200261.
- 52. Li X, Manley JL. Inactivation of the SR protein splicing factor ASF/SF2 results in genomic instability. Cell. 2005;122:365–378. doi: 10.1016/j.cell.2005.06.008.
- 53. Chen L, et al. The augmented R-loop is a unifying mechanism for myelodysplastic syndromes induced by high-risk splicing factor mutations. Mol. Cell. 2018;69:412–425. doi: 10.1016/j.molcel.2017.12.029.
- 54. Wan Y, et al. Splicing function of mitotic regulators links R-loop-mediated DNA damage to tumor cell killing. J. Cell Biol. 2015;209:235–246. doi: 10.1083/jcb.201409073.
- 55. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. doi: 10.1038/s41586-020-1969-6.
- 56. Villarreal OD, Mersaoui SY, Yu Z, Masson JY, Richard S. Genome-wide R-loop analysis defines unique roles for DDX5, XRN2, and PRMT5 in DNA/RNA hybrid resolution. Life Sci. Alliance. 2020;3:e202000762. doi: 10.26508/lsa.202000762.
- 57. Rallapalli KL, Komor AC. The design and application of DNA-editing enzymes as base editors. Annu. Rev. Biochem. 2023;92:43–79. doi: 10.1146/annurev-biochem-052521-013938.
- 58. Chervova A, et al. Analysis of gene expression and mutation data points on contribution of transcription to the mutagenesis by APOBEC enzymes. NAR Cancer. 2021;3:zcab025. doi: 10.1093/narcan/zcab025.
- 59. Chen Y, et al. DHX9 interacts with APOBEC3B and attenuates the anti-HBV effect of APOBEC3B. Emerg. Microbes Infect. 2020;9:366–377. doi: 10.1080/22221751.2020.1725398.
- 60. Kidd JM, Newman TL, Tuzun E, Kaul R, Eichler EE. Population stratification of a common APOBEC gene deletion polymorphism. PLoS Genet. 2007;3:e63. doi: 10.1371/journal.pgen.0030063.
- 61. Land AM, et al. Endogenous APOBEC3A DNA cytosine deaminase is cytoplasmic and nongenotoxic. J. Biol. Chem. 2013;288:17253–17260. doi: 10.1074/jbc.M113.458661.
- 62. Lackey L, et al. APOBEC3B and AID have similar nuclear import mechanisms. J. Mol. Biol. 2012;419:301–314. doi: 10.1016/j.jmb.2012.03.011.
- 63. Lackey L, Law EK, Brown WL, Harris RS. Subcellular localization of the APOBEC3 proteins during mitosis and implications for genomic DNA deamination. Cell Cycle. 2013;12:762–772. doi: 10.4161/cc.23713.
- 64. Hultquist JF, et al. Human and rhesus APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H demonstrate a conserved capacity to restrict Vif-deficient HIV-1. J. Virol. 2011;85:11220–11234. doi: 10.1128/JVI.05238-11.
- 65. McCann JL, et al. The DNA deaminase APOBEC3B interacts with the cell-cycle protein CDK4 and disrupts CDK4-mediated nuclear import of cyclin D1. J. Biol. Chem. 2019;294:12099–12111. doi: 10.1074/jbc.RA119.008443.
- 66. Makharashvili N, et al. Sae2/CtIP prevents R-loop accumulation in eukaryotic cells. eLife. 2018;7:e42733. doi: 10.7554/eLife.42733.
- 67. Mellacheruvu D, et al. The CRAPome: a contaminant repository for affinity purification-mass spectrometry data. Nat. Methods. 2013;10:730–736. doi: 10.1038/nmeth.2557.
- 68. Brown WL, et al. A rabbit monoclonal antibody against the antiviral and cancer genomic DNA mutating enzyme APOBEC3B. Antibodies (Basel) 2019;8:47. doi: 10.3390/antib8030047.
- 69. Boguslawski SJ, et al. Characterization of monoclonal antibody to DNA.RNA and its application to immunodetection of hybrids. J. Immunol. Methods. 1986;89:123–130. doi: 10.1016/0022-1759(86)90040-2.
- 70. Phillips DD, et al. The sub-nanomolar binding of DNA-RNA hybrids by the single-chain Fv fragment of antibody S9.6. J. Mol. Recognit. 2013;26:376–381. doi: 10.1002/jmr.2284.
- 71. Sollier J, et al. Transcription-coupled nucleotide excision repair factors promote R-loop-induced genome instability. Mol. Cell. 2014;56:777–785. doi: 10.1016/j.molcel.2014.10.020.
- 72. Refsland EW, et al. Quantitative profiling of the full APOBEC3 mRNA repertoire in lymphocytes and tissues: implications for HIV-1 restriction. Nucleic Acids Res. 2010;38:4274–4284. doi: 10.1093/nar/gkq174.
- 73. Cristini A, et al. Dual processing of R-loops and topoisomerase I induces transcription-dependent DNA double-strand breaks. Cell Rep. 2019;28:3167–3181. doi: 10.1016/j.celrep.2019.08.041.
- 74. Beghe C, Gromak N. R-loop immunoprecipitation: a method to detect R-Loop interacting factors. Methods Mol. Biol. 2022;2528:215–237. doi: 10.1007/978-1-0716-2477-7_14.
- 75. Bhatia V, et al. BRCA2 prevents R-loop accumulation and associates with TREX-2 mRNA export factor PCID2. Nature. 2014;511:362–365. doi: 10.1038/nature13374.
- 76. Crossley MP, et al. Catalytically inactive, purified RNase H1: a specific and sensitive probe for RNA-DNA hybrid imaging. J. Cell Biol. 2021;220:e202101092. doi: 10.1083/jcb.202101092.
- 77. Skourti-Stathaki K, Kamieniarz-Gdula K, Proudfoot NJ. R-loops induce repressive chromatin marks over mammalian gene terminators. Nature. 2014;516:436–439. doi: 10.1038/nature13787.
- 78. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–12.
- 79. Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 80. Li H, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 81. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 82. Ramirez F, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 83. Zhang Y, et al. Model-based analysis of ChIP–seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- 84. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods. 2017;14:417–419. doi: 10.1038/nmeth.4197.
- 85. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61–70. doi: 10.1038/nature11412.
- 86. Seiler M, et al. Somatic mutational landscape of splicing factor genes and their functional consequences across 33 cancer types. Cell Rep. 2018;23:282–296. doi: 10.1016/j.celrep.2018.01.088.
- 87. Rosenthal R, McGranahan N, Herrero J, Taylor BS, Swanton C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 2016;17:31. doi: 10.1186/s13059-016-0893-4.
- 88. Alexandrov LB, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421. doi: 10.1038/nature12477.
- 89. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29:569–574. doi: 10.1016/j.tig.2013.05.010.
- 90. Bergstrom EN, Kundu M, Tbeileh N, Alexandrov LB. Examining clustered somatic mutations with SigProfilerClusters. Bioinformatics. 2022;38:3470–3473. doi: 10.1093/bioinformatics/btac335.
- 91. Bergstrom EN, Barnes M, Martincorena I, Alexandrov LB. Generating realistic null hypothesis of cancer mutational landscapes using SigProfilerSimulator. BMC Bioinformatics. 2020;21:438. doi: 10.1186/s12859-020-03772-3.