Abstract
One of the two chromosomal breakage events in recurring translocations in B cell neoplasms is often due to the recombination-activating gene complex (RAG-complex) releasing DNA ends before end joining. The other break occurs in a fragile zone of 20 to 600 bp in a non-antigen receptor gene locus with a more complex and intriguing set of mechanistic factors underlying such narrow fragile zones. These factors include Activation-Induced Deaminase (AID), which only acts at regions of single-stranded DNA (ssDNA). Recent work leads to a model involving tethering of AID to the nascent RNA as it emerges from the RNA polymerase. This mechanism may have relevance in class switch recombination and somatic hypermutation as well as broader relevance for other DNA enzymes.
Keywords: Activation-induced deaminase (AID), Double-strand break (DSB), Non-homologous end joining (NHEJ), DNA methylation, nascent RNA, APOBEC
New Perspectives on Chromosomal Translocations
Most cancers are driven by altered gene expression that regulates cell proliferation or survival due to chromosomal translocations (see Glossary) between two chromosomes [1,2], as commonly observed in lymphoid malignancies [3-5]. The translocation breakpoint for one of the two chromosomes in lymphoid cells is often within the antigen receptor loci because the RAG-complex normally cleaves at a specific nucleotide within the V(D)J signal sequences. For the partner chromosome involved in the translocation, a gene breaks within concise translocation fragile zones or breakpoint cluster regions of 20 to 600 bp without easily discernable features (Figure 1) [4,6]. The mechanistic basis for focused translocation fragile zones (less than three nucleosomes in length) has been elusive despite some of these breakpoint cluster regions having been clearly defined for decades.
Figure 1. Clustered breakpoints at the BCL2, E2A, and BCL1 genes in B cell chromosomal translocations.
Schematics of (A) BCL2-IGH, (B) E2A-PBX1, and (C) BCL1-IGH translocations are illustrated in the left panel. The patient breakpoints within the fragile zones are shown on the right side of each panel. Each triangle above the sequences denotes the breakpoint sequenced from an individual patient. Among patients presented above the sequences, the breakpoints sequenced from the reciprocal derivative chromosome from some of the same patients are represented by the triangles below the sequences when such information is available. The CG motif within the fragile regions is highlighted in red. Directly overlapping WRCG motifs are boxed in blue. The DSB at the antigen receptor loci is generated by the RAG-complex in V(D)J recombination. The break at the translocation partner gene (non-Ig) is initiated by AID as indicated in the schematic translocation on the left. The breakpoints of BCL2, E2A (also known as TCF3), and BCL1 (also known as CCND1) genes are clustered in < 600 bp fragile zones centered around CG motifs. In (A), 88% of the BCL2 breakpoints are in the 175 bp major breakpoint region (MBR), and all of the breakpoints within the MBR are centered around CG motifs in three peaks. In (B), 80% of E2A-PBX1 translocations are within the 23 bp fragile zone of E2A, and 60% of these breakpoints are directly at CG sites. The nearest exons are abbreviated with ‘Ex.’ In (C), 64% of BCL1 breakpoints in the BCL1-IGH translocation are centered around CG in the 150 bp major translocation cluster (MTC) near the CCND1 gene. The breakpoints of BCL2, E2A, and BCL1 are from previously published data [6,25,67,68].
Here we propose a model in which AID, another DNA enzyme that acts on the B cell antigen receptor genes, is tethered to the RNA as it exits any transcribing RNA polymerase. This tethering is important for the targeting of AID in its normal functions, and critically for this proposed model, in its pathologic behavior at chromosomal translocation fragile zones. Because AID only acts on single-stranded DNA (ssDNA), our model integrates key elements of DNA structure, DNA methylation, DNA repair, and transcription in the breakage and translocation. We propose that this model likely has broader relevance to biology for other enzymes, such as other members of the APOBEC family, that can act on DNA during transcription.
Elements of a Novel Model
In this section, we briefly describe the key elements before integrating these in our proposed model for lymphoid translocations. These include background about the DNA enzymes, the DNA features at the fragile zones, and the nascent RNA during transcription at those zones.
Enzymes Important for Generating the Breaks in Lymphoid Chromosomal Translocations
For neoplastic B cell chromosomal translocations, the most common events involve a premature release of the DNA ends when the RAG-complex generates breaks at the immunoglobulin heavy chain (IgH) locus before the usual joining by nonhomologous end joining (NHEJ) [4,7,8]. The most intriguing aspect is that the break in the other participating chromosome is usually caused by the action of AID (Figure 1) [4,6]. AID is a cytidine deaminase that converts C to U as the initial step for Ig somatic hypermutation (affinity maturation, SHM) or Ig heavy chain class switch recombination (CSR) [9,10] (Figure S1). AID is also capable of converting methyl C at CG sites to T [11,12], leading to a problematic T:G mismatch in DNA that is a long-lived lesion [6,13-15]. These long-lived mismatches are vulnerable to a number of nucleases. NHEJ repair at the RAG-complex break sites and most other double-strand break (DSB) sites involves the Artemis:DNA-PKcs complex. Both the RAG-complex and Artemis are active in early B cells and are capable of converting the T:G mismatch into a DSB [6,16,17].
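The two deamination outcomes described above can be written down as a small lookup. The following is only an illustrative sketch: the `Lesion` class, the `deaminate` function, and the boolean `long_lived` flag are hypothetical names introduced here, and the flag is a deliberate simplification of the text's statement that the T:G mismatch is a long-lived lesion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lesion:
    product: str      # base left in the DNA after deamination
    mismatch: str     # product base paired with the original partner G
    long_lived: bool  # simplification: True only for the T:G case named in the text


def deaminate(base: str, methylated: bool = False) -> Optional[Lesion]:
    """Return the lesion AID would leave at a cytosine, or None for other bases."""
    if base.upper() != "C":
        return None  # AID is a cytidine deaminase; other bases are not substrates
    if methylated:
        # methyl C -> T, leaving a T:G mismatch
        return Lesion(product="T", mismatch="T:G", long_lived=True)
    # C -> U, leaving a U:G mismatch
    return Lesion(product="U", mismatch="U:G", long_lived=False)
```

For example, `deaminate("C", methylated=True)` yields the T:G case that the model treats as the precursor to a double-strand break.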
DNA Boundaries of the Translocation Fragile Zones Provide Insight into DNA Structure
AID can only act at cytosines within a region of DNA that is at least transiently single-stranded because the enzyme rotates the cytosine base out of the DNA duplex prior to enzymatically converting it from C to U or methyl C to T [18,19]. Studies of the DNA structure within and outside of the fragile zones precisely and reliably identify the regions that transiently adopt ssDNA character [20-24]. Strings of C-nucleotides, which can shift the DNA duplex conformation away from B-form into a conformation that is intermediate between A-form and B-form (B/A-intermediate), are a common DNA sequence feature in all of these fragile zones [20-24]. These fragile zones frequently form a transiently open state that is suitable for AID action, as documented with bisulfite chemical probing and by nuclease sensitivity assays [22,23,25].
Transcription Stabilizes AID on the Nascent RNA Tail Emanating from the RNA Polymerase
In defined biochemical systems using purified AID, RNA polymerase, and a long DNA template that includes the fragile zone, comparison of the AID activity with and without active transcription is revealing. AID has an increased deamination efficiency on C or methyl C when the substrate is being transcribed compared with the untranscribed state. Further, the deamination activity of AID is enhanced in the regions containing C-strings regardless of transcription, most likely due to the transient ssDNA state caused by the local B/A-intermediate conformation [25].
Most importantly for our model, AID activity decreases substantially when the nascent RNA is removed using RNase A during transcription [25]. AID has two binding motifs (assistant patch and substrate channel) for single-stranded nucleic acid [19] (Figure 2A): the substrate channel is catalytically active for deaminating C in ssDNA, and the assistant patch can bind single-stranded RNA (ssRNA) and ssDNA equally well, but it is catalytically inactive [7,19]. In our RNA tether model, we propose that AID binds to the nascent RNA emanating from the RNA polymerase, thereby allowing AID to remain in proximity to the ssDNA that remains in the wake of the moving RNA polymerase, thus allowing AID to deaminate any C in the DNA. When the nascent RNA is experimentally removed (by RNase A) during transcription, the advantage of more frequent collision is lost (Figure 2B) [25]. This indicates that the assistant patch, which is catalytically inactive, has great importance in a manner not previously recognized [25].
Figure 2. Tethering of AID to RNA during transcription enhances its deamination efficiency on the non-template DNA strand.
(A) Illustration of the two nucleic acid binding motifs of AID. The assistant patch and catalytic channel in the deaminase domain of AID are shown in yellow and red, respectively [19]. The black dot in the substrate channel denotes the cytidine deaminase catalytic center of AID. Binding of a Y-shaped substrate by the two nucleic acid binding motifs in AID is shown. The assistant patch binds ssRNA as well as it binds ssDNA [7]. (B) Models for deamination activity of the RNA-tethered AID during transcription. The nascent RNA and the single-stranded non-template DNA strand (shown on top) can form an AID-preferred Y-shaped substrate during transcription. Tethering of AID to RNA via its assistant patch (in yellow) can increase AID collision frequency with the nearby single-stranded non-template DNA strand, which results in C to U or methyl C to T deamination (shown as stars). The binding of more than one AID molecule to the same RNA molecule can further increase the collision and deamination frequency of AID with the non-template DNA strand. When the nascent RNA is removed immediately after transcription by RNase A (shown at the bottom of the figure), the AID-preferred Y-shaped substrate no longer exists.
Integrated Model for B Cell Chromosomal Translocations
The E2A gene has the smallest known fragile zone among all neoplastic B cell translocations; therefore, dissecting the essential elements for the clustered E2A breakage can better define the mechanistic requirements for B cell fragile regions. Including E2A, the majority of B cell translocations occur during the pre-B cell stage of lymphoid differentiation [6]. Like many of the genes involved in B cell translocations, the E2A gene is actively transcribed in all B cells and is critical to B cell development [26,27]. Nearly all E2A breakage occurs within its 3.3 kb intron 16, so the transactivation domain of the E2A protein fuses to the DNA binding domain of its translocation partner gene, PBX1. The resulting chimeric protein leads to the upregulation of PBX1-targeted genes, which provides a growth advantage for the cell with the translocation and thus drives leukemogenesis [28]. Similar intron to intron translocations that generate chimeric proteins occur in many lymphoid and nonlymphoid chromosomal translocations, including BCL6-IGH, MYC-IGH, BCR-ABL1, ETV6-NTRK3, and Ewsr1-Fli1 translocations [6,29-37]. While the E2A translocation could in theory occur anywhere within the 3.3 kb intron 16 to create the fusion oncogene, the breaks observed in most patients are within a very small 23 bp zone (Figure 1). This observation raises a critical question.
What DNA features predispose the 23 bp fragile zone within the E2A gene to breakage? Nearby C-string sequences (consecutive C nucleotides), transcription, and the presence of AID preferred motifs are key features (Figure 3A). More specifically, the 23 bp E2A fragile zone is located immediately upstream of a C-string rich region, and the C-string feature is also present in other major B cell translocation fragile zones (see ref. [39] for details). The C-string rich region adopts a B/A intermediate conformation, leading to local ssDNA character and permitting AID deamination action [36]. Transcription through these B/A intermediate regions further enhances AID targeting above the untranscribed level due to several factors that are alluded to above and detailed in our model as follows (Figure 3B).
Figure 3. Key features and steps contribute to the clustered DNA breakage in B cell translocations.
(A) Sequence features around E2A fragile zone. The horizontal black line denotes E2A intron 16 with two black squares, representing E2A exons 16 and 17, at its two ends. The red starburst denotes the 23 bp E2A fragile zone. C-strings on the non-template strand and template strand are marked by blue vertical bars along the top and bottom of intron 16, respectively. Densities of C-strings, 1/47, 1/37, and 1/363 bp, are indicated at the top of the black brackets marking the three regions. The 23 bp E2A fragile zone is at the very beginning of region 2, which has the highest C-string density (1/37). An enlarged view of region 2 is shown at the bottom of this panel by the orange horizontal line. Two directly overlapping WRCG sites within the 23 bp fragile zone are marked by green vertical lines. A pair of direct repeat sequences flanking the 23 bp fragile zone are represented by red arrowheads at the far left of the orange line. Thin black vertical lines represent all the CG sites within this region 2. (B) Steps for B cell chromosomal translocations. C-strings are frequently observed around the fragile zones in B cells. The increased transient ssDNA character in C-string rich regions allows AID deamination even in the native duplex DNA state (without transcription). In the presence of transcription, RNA polymerase tends to pause within the C-string rich regions. The RNA tethered AID can thus act on the single-stranded non-template DNA strand within the paused transcription bubble. Misalignment by transcription between direct repeats around the B cell fragile zones can also generate ssDNA as substrate for AID. Deamination of the overlapping methylated WRCG sites within fragile zones can lead to long-lived lesions that are vulnerable to nucleases that convert these to double strand breaks (DSBs).
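The C-string densities quoted in the caption (for example, 1/37 bp) can be illustrated with a short counting sketch. Several things here are assumptions rather than the authors' procedure: the minimum run length `min_run = 3` (the text only says "consecutive C nucleotides"), the choice to pool both strands into one density, the function names, and the toy sequence used in the usage note. C-strings on the template strand are counted as G runs on the given (non-template) strand.

```python
import re
from typing import Dict, Optional


def count_c_strings(seq: str, min_run: int = 3) -> Dict[str, int]:
    """Count runs of >= min_run consecutive C on each strand.

    `seq` is read as the non-template strand (5'->3'); a C run on the
    template strand appears as a G run in `seq`.
    """
    seq = seq.upper()
    non_template = len(re.findall(rf"C{{{min_run},}}", seq))
    template = len(re.findall(rf"G{{{min_run},}}", seq))
    return {"non_template": non_template, "template": template}


def bp_per_c_string(seq: str, min_run: int = 3) -> Optional[float]:
    """Average number of bp per C-string (both strands pooled), or None if none found."""
    counts = count_c_strings(seq, min_run)
    total = counts["non_template"] + counts["template"]
    return len(seq) / total if total else None
```

On the made-up 20 bp sequence `ATCCCTGGGGAACCTTCCCC`, this counts two C-strings on the given strand and one on the opposite strand, i.e. roughly one C-string per 6.7 bp. A density of "1/37" in the caption would correspond to a return value of about 37 under these assumptions.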
First, RNA polymerase II tends to pause at high GC regions during transcription, particularly at runs of C on either strand [40-43]. The paused RNA polymerase II complexes provide a single stranded non-template strand for AID deamination. Accumulation of several paused RNA polymerases at the C-string rich region in highly transcribed genes increases negative superhelical tension [44] and increases the propensity for double helix unwinding and transient separation of the two DNA strands into the single-stranded state upstream [45]. A DNA region immediately upstream of a C-string rich region, such as the E2A fragile zone, would have more frequent duplex DNA breathing, leading to transient ssDNA character, and thus be predisposed to higher AID deamination action during transcription.
Second, the presence of the new RNA transcript during transcription can increase the efficiency of AID deamination activity [45]. The assistant patch of AID can bind to the nascent RNA transcript, allowing the AID substrate (catalytic) channel to act on the non-template DNA strand when colliding with the single-stranded fragile zone as the RNA polymerase passes through the region with each transcription cycle [19] (Figure 2B). Furthermore, more than one AID molecule can bind to the long RNA transcript at the same time, further increasing the frequency of deamination in the non-template DNA strand [7]. Removal of the nascent RNA during transcription in a purified biochemical system markedly reduces the deamination at the fragile zone [25].
Third, DNA direct repeats are common within and around the B cell fragile zones [39]. Transcription transiently separates two DNA strands, and heteroduplex formation may occur due to misalignment of the direct repeats in the process of reannealing in the wake of the RNA polymerase [45]. Heteroduplexes generate ssDNA, which is a required substrate for AID deamination.
With all the DNA factors favoring a transient ssDNA substrate for AID, the methylated CG sites within AID preferred sequences (WRCG, W = A or T, R = A or G, and typically called AID hotspot motifs) in the single-stranded region are crucial for the occurrence of chromosomal translocations [14]. We have shown that the two CG sites within overlapping AID hotspot motifs in the E2A fragile zone are partially methylated in pre-B cells [25]. The CG sites at the fragile zones for BCL2, BCL1, CCND1, MALT1, and CRLF2 are also partially methylated in a substantial percentage of early B cells [39]. The deamination of methyl C to T by AID generates a T:G mismatch in the genome. T:G mismatches are repaired slowly and are susceptible to nucleases inside the cell, particularly RAG and Artemis nucleases in B cells, to generate DSBs [6,17]. Overlapping WRC motifs, which are presumed to further increase the chance of long-lived DNA lesions, are critical for CSR [10,46]. Overlapping WRCG sites are likewise presumed to further increase the chance of long-lived DNA lesions. Notably, directly overlapping WRCG sites are present in the major breakpoint region (MBR) of BCL2, major translocation cluster (MTC) of BCL1, and the 23 bp E2A fragile zone (Figure 1).
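The WRCG definition above (W = A or T, R = A or G) translates directly into a pattern search on both strands. The sketch below is a minimal illustration, not the authors' analysis pipeline. The function names are hypothetical, and the working definition of "directly overlapping" used here (one site on each strand sharing the same CG dinucleotide, i.e. WRCGYW on the top strand) is an assumption made for the example.

```python
import re
from typing import List, Tuple

# W = A or T, R = A or G; the lookahead keeps overlapping matches.
WRCG = re.compile(r"(?=[AT][AG]CG)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_wrcg(seq: str) -> List[Tuple[int, str]]:
    """Return (top-strand 0-based coordinate of the target C, strand) per WRCG site."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start() + 2, "+") for m in WRCG.finditer(seq)]
    # A site on the bottom strand is found in the reverse complement and
    # mapped back to the top-strand coordinate of the base it pairs with.
    hits += [(n - 1 - (m.start() + 2), "-")
             for m in WRCG.finditer(reverse_complement(seq))]
    return sorted(hits)


def directly_overlapping(hits: List[Tuple[int, str]]) -> List[int]:
    """Top-strand C positions whose CG is shared with a bottom-strand WRCG site."""
    plus = {pos for pos, strand in hits if strand == "+"}
    minus = {pos for pos, strand in hits if strand == "-"}
    return sorted(pos for pos in plus if pos + 1 in minus)
```

On the toy palindrome `AGCGCT`, `find_wrcg` returns `[(2, '+'), (3, '-')]` and `directly_overlapping` returns `[2]`, because the WRCG sites on the two strands share one CG. Real fragile-zone sequences and methylation status would have to come from the cited data.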
The rarity of chromosomal translocations can be attributed to both (a) the low occurrence of durable ssDNA regions either in the native DNA state or when coupled with transcription, and (b) the requirement of methylated WRCG motifs within single-stranded regions. In addition, the low nuclear concentration of active AID may be a factor. All of these features may also be relevant to the breakage of other fragile zones (Table 1).
Table 1.
Lymphoid and nonlymphoid common chromosomal translocations, grouped by the size of translocation zone, and the key factors for DNA breakage.
| Category | Size | Stage of occurrence | Translocations | Cancers | Key factors |
|---|---|---|---|---|---|
| Lymphoid | 20-600 bp | Early B | BCL2-IGH | follicular lymphomas, diffuse large B cell lymphoma | |
| Lymphoid | 20-600 bp | Early B | BCL1-IGH | mantle cell lymphoma | |
| Lymphoid | 20-600 bp | Early B | E2A-PBX1 | B-cell acute lymphoblastic lymphomas | |
| Lymphoid | 20-600 bp | Early B | MALT1-IGH | mucosa-associated lymphoid tissue lymphomas | |
| Lymphoid | 20-600 bp | Early B | CRLF2-IGH | B-cell precursor acute lymphoblastic leukemias | |
| Lymphoid | ~2 kb | Mature B | BCL6-IGH | follicular lymphomas, diffuse large B cell lymphoma | |
| Lymphoid | ~2 kb | Mature B | MYC-IGH | Burkitt’s lymphoma | |
| Lymphoid | >3 kb | Early T | SCL-SIL deletions | T-cell acute lymphoblastic leukemia | |
| Lymphoid | >3 kb | Early T | LMO2-TCR | T-cell acute lymphoblastic leukemia | |
| Nonlymphoid | >3 kb | Hematopoietic stem cells | BCR-ABL1 | chronic myeloid leukemia | |
| Nonlymphoid | >3 kb | Hematopoietic stem cells | MLL-AF9 | acute myeloid leukemia | |
| Nonlymphoid | >3 kb | Mesenchymal stem cells | Ewsr1-Fli1 | Ewing sarcoma | |
| Nonlymphoid | >3 kb | Stem cells | SYT-SSX | synovial sarcoma | |
Our model focuses on the earliest translocations at the etiologic inception of lymphomas and lymphoid leukemias. Recent work shows that AID is also critical for progression of B cell malignancies to increasingly aggressive forms that have multiple mutations and chromosomal rearrangements [47,48].
Comparison and Contrasts with Chromosomal Translocations in Nonlymphoid Cells
The genes involved in all neoplastic translocations are those that affect cell proliferation or survival [1,2], including those in the lymphoid translocations described above (Table 2). It is interesting to consider whether the chromosomal translocations that occur in nonlymphoid cells have any mechanistic parallel to the lymphoid translocations.
Table 2.
Key Features at Translocation Fragile Zones in B Cell Malignancies.
| Basic requirement for all neoplastic translocations | Proliferative or Survival Advantage | The translocation fragile zone must be in proximity to a gene that can confer a growth advantage after the translocation (e.g., generating a fusion protein in some cases; or upregulating a gene). |
| Factors that narrow the boundaries of fragile zones in B cell malignancies | Critical Elements | |
Most chromosomal translocations are random events, which can become detectable if the resulting fusion proteins provide a growth advantage. For instance, Ewing sarcoma and synovial sarcoma are two representative sarcomas associated with specific chromosomal translocations. The t(11;22) translocation occurs in over 90% of the Ewing family tumors and leads to production of a chimeric Ewsr1-Fli1 protein. The majority of the breakpoints in the EWSR1 gene are broadly distributed across its exons 7 to 10, and the Fli1 breakpoints are dispersed across many of its exons (exon 4 to exon 8) [49-51]. Synovial sarcoma is characterized by the t(X;18) translocation, which leads to the production of a distinct SYT-SSX fusion gene. The breakpoints in the SYT gene mainly span a 5 kb zone within its 14 kb intron 10, and the breakpoints of the SSX1 and SSX2 genes are dispersed within their 2 kb intron 4 [52]. The broad zones over which most nonlymphoid translocations occur suggest a fundamental difference between lymphoid translocation breakage mechanisms and nonlymphoid ones (Table 1). The RAG and AID enzymes carry out targeted action in lymphoid cells. The RAG enzyme cleaves DNA at precise recombination signal sequences in V(D)J recombination [8]. AID functions in Ig CSR and SHM, which are regional, but still targeted by DNA structural features [10]. The “break cluster regions” in most nonlymphoid translocations are much broader than the fragile zones present in lymphoid translocations. The breakpoint cluster region of the BCR gene in the notable BCR-ABL1 translocation (also known as the Philadelphia chromosome translocation) in chronic myeloid leukemia is 5.8 kb [53]. The MLL breakpoint region in MLL-AF9 translocations observed in acute myeloid leukemia (AML) and t-AML spans over 8 kb, and the AF9 breakpoints are located within two sites: site A, over 10 kb in its intron 4, and site B, over 5 kb within introns 7 and 8 [54-56].
The broad zones over which most nonlymphoid translocations occur (usually several kb) suggest that the breakage mechanism for most of these may be due to random oxidative damage or failed topoisomerase II reactions (Table 1) [38]. Note that the joining phase for nearly all of these translocations is NHEJ [4], and so the divergence in mechanism we are describing is only for the breakage phase and not for the joining of the broken chromosomal ends.
Broader Roles of the RNA Tethering Concept
Although catalytically inactive, the assistant patch of AID can bind to RNA and potentially enhances the deamination efficiency of AID on the non-template DNA strand during transcription. This RNA tethering model provides an explanation for the breakage of fragile zones in lymphoid translocations, and it can also explain important aspects of CSR and SHM. R-loops are formed at the Ig switch regions in CSR during transcription [10,57], and AID binding to RNA enhances its deamination efficiency on the single-stranded non-template strand in R-loop structures (Figure 4 and Figure S1). Likewise, the binding of AID on the nascent RNA ensures its targeted activity on the non-template DNA strand during transcription in SHM [7].
Figure 4. Tethering of AID to RNA allows its efficient deamination on R-loop substrate during Ig class switch recombination (CSR).
R-loop structure formed at Ig switch region during CSR is shown with three interactive strands on the top (A) and as simplified diagram at the bottom (B). AID tethering to the RNA strand (shown in blue) of the R-loop structure can potentially increase its collision frequency with the displaced non-template DNA strand (shown in green).
The APOBEC family of cytidine deaminases includes AID and 10 other members in humans. As noted above, AID has two nucleic acid binding motifs, the assistant patch and the substrate channel (Figure 2A). Four members of the APOBEC family (A3B, A3D, A3F, and A3G) contain two deaminase domains, CD1 on the N-terminal and CD2 on the C-terminal portions. While CD2 is catalytically active, CD1 is inactive in substrate deamination but retains nucleic acid binding activity [58-61]. The function of the assistant patch on AID deamination sheds light on the study of the dual deaminase domain APOBECs. It has been reported that the binding of the A3F CD1 with ssDNA substrate can enhance the deamination activity of CD2 [59]. And the CD1 of A3G can bind to different types of RNA, which is important in viral infection [62]. What additional role does the CD1 play in relevant biological processes, especially when binding to RNA during transcription? Could transcribed genes be more readily modified for a variety of purposes because of APOBEC enzymes binding to the nascent RNA via the catalytically inactive CD1? Considering the function of the double domain more broadly, could binding of CD1 to either ssRNA or ssDNA facilitate the enzyme action during viral infection? Encapsidation of APOBEC enzymes within viral particles permits mutagenesis during viral transcription [63,64], and RNA tethering would clearly accentuate the action on the local nucleic acid template [65]. The battle between viruses and host cell APOBECs may also rely on binding to the single-stranded nucleic acids [66]. Therefore, the mechanism of AID action under physiologic and pathologic circumstances may apply broadly in nature.
Concluding Remarks
Our model explains many key features that predispose fragile zones to DNA breakage within non-Ig genes. The model proposed here is likely relevant to the most difficult-to-treat human lymphomas. Specifically, it may explain why some patient lymphomas have multiple oncogenes being activated, and this correlates with AID expression level [47,48]. Future experiments will provide additional basis for evaluating the model (see Outstanding questions). Regarding non-lymphoid translocations, it will be interesting to see whether the other APOBEC family members utilize RNA tethering.
Outstanding Questions.
Is one additional role of transcription through the translocation fragile zones to evict or slide the histone octamer away to allow AID access?
Can cutting the nascent RNA prematurely within cells, if possible, reduce AID action?
Does the sequence of the nascent RNA affect its binding to the assistant patch of AID and thereby affect AID deamination of the non-template strand during transcription?
Are there any other factors, besides the sequence features and the mRNA tethered to AID, contributing to the targeting of AID to the fragile zones in human lymphoid translocations?
Do the Ig switch region sequences favor binding to the AID assistant patch?
Are the C-strings, which we find important here for fragile zones, also relevant to somatic hypermutation, as suggested by the pyrimidine tracts described in Ig V genes?
Do APOBEC3 members generate the chromosome breaks in a manner that relies on RNA tethering in nonlymphoid chromosomal translocations?
Supplementary Material
HIGHLIGHTS.
Tethering of a key mutagenic enzyme (AID) to RNA emanating from RNA polymerase provides insights into its physiologic and pathologic action.
We propose a model where tethering of AID to nascent RNA is important for its action at translocation fragile zones ranging from 20 to 600 base pairs in size.
Tethering likely supports the targeting of AID during its normal function in immunoglobulin class switch recombination and somatic hypermutation.
Broader roles of tethering by the enzymes of the APOBEC family may be relevant to APOBEC functions in anti-viral defense and other responses to cell stress.
Acknowledgements
The Lieber lab has been funded by NIH CA196671 and GM118009. Di Liu is currently supported by the China Postdoctoral Science Foundation (Project No. 2022M712546), the Fundamental Research Funds for the Central Universities (Project No. xzy012023018), and the Postdoctoral Research Project of Shaanxi Province (2023BSHEDZZ07).
GLOSSARY
- AID.
Activation-Induced Deaminase is a cytidine deaminase that is expressed in vertebrate B cells. AID only acts on single-stranded DNA (ssDNA). AID is essential for immunoglobulin (Ig) class switch recombination (CSR). The ssDNA in Ig CSR arises due to R-loop formation during transcription. AID is also essential for Ig somatic hypermutation (SHM), which is responsible for affinity maturation of the variable domain of Ig heavy and light chain genes.
- APOBEC.
Abbreviation for apolipoprotein B mRNA-editing catalytic polypeptide-like. The APOBEC family contains APOBEC1, AID, APOBEC2, APOBEC3A-H, and APOBEC4. The APOBEC3 family (A3A-A3H) acts on ssDNA. A3B, A3D, A3F, and A3G have two cytidine deaminase (CD) domains, but only one of them (CD2) is catalytically active.
- B/A DNA intermediate.
This is a duplex DNA structure that is intermediate between B form DNA and A form. A form, B form, and B/A form differ in the width and depth of the major and minor grooves and the distance of the base pairs from the longitudinal (central) axis of the DNA duplex. B/A DNA breathes open and closes more frequently than B form DNA.
- B cell.
Lymphocytes that express an immunoglobulin molecule (also called an antibody), either on their surface or in secreted form.
- Chromosomal translocation.
Breakage of two chromosomes with the generation of a new chromosome, called the derivative chromosome. The derivative chromosome 1, for example, would have the centromere of chromosome 1, but would include a segment from another chromosome. If the exchange between two chromosomes is reciprocal, with no net loss or gain of genetic material, this is called a balanced translocation. Sometimes, however, the other derivative chromosome is not formed, resulting in loss of genetic material.
- Double-strand breaks (DSB).
DSBs involve breakage of both strands of the DNA at a site. This is in contrast to single-strand breaks (SSB) that involve a disruption of only one of the two strands of DNA at a site.
- E2A protein.
The E2A protein is a transcription factor encoded by the E2A gene. The protein is required for the development of B cells and regulates B cell differentiation and activation.
- E2A gene.
The E2A gene is also called TCF3. The gene encodes two basic helix-loop-helix (bHLH) transcription factors, E12 and E47, which are alternative splice forms.
- E2A-PBX1.
Neoplastic chromosomal translocations involving the E2A gene often involve the translocation partner gene, PBX1. The E2A-PBX1 translocation appears in 25% of pediatric pre-B ALL, 4% of ALL overall. The fusion of the transactivating domain of E2A with the DNA binding domain of PBX1 results in altered gene expression that drives the proliferation of early B cells.
- Nascent RNA.
During transcription by RNA polymerases, the newly polymerized RNA exits the RNA polymerase via an exit pore. As the RNA polymerase moves along and polymerizes RNA based on the template DNA strand (TS), the nontemplate DNA strand (NTS) is single-stranded and passes along the top of the RNA polymerase.
- Non-Homologous End Joining (NHEJ).
Also called nonhomologous DNA end joining, NHEJ is the major pathway for the repair of double-strand DNA breaks (DSBs). The NHEJ pathway includes several proteins that bind to the DNA and ligate the strands back together after nucleotide removal and addition.
Footnotes
Declaration of Conflicts of Interest The authors have no conflicts of interest.
REFERENCES
- 1. Abrash EW and Calabrese JM (2022) Oncogenic transcription factors and neogenes: New opportunities for cancer immunotherapy. Mol Cell 82, 2353–2355
- 2. Harewood L. et al. (2010) The effect of translocation-induced nuclear reorganization on gene expression. Genome Res 20, 554–564
- 3. Carbone A. et al. (2019) Follicular lymphoma. Nat Rev Dis Primers 5, 83
- 4. Lieber MR (2016) Mechanisms of human lymphoid chromosomal translocations. Nat Rev Cancer 16, 387–398
- 5. Nussenzweig A and Nussenzweig MC (2010) Origin of chromosomal translocations in lymphoid cancer. Cell 141, 27–38
- 6. Tsai AG et al. (2008) Human chromosomal translocations at CpG sites and a theoretical basis for their lineage and stage specificity. Cell 135, 1130–1142
- 7. Liu D. et al. (2022) The mRNA tether model for activation-induced deaminase and its relevance for Ig somatic hypermutation and class switch recombination. DNA Repair (Amst) 110, 103271
- 8. Liu C. et al. (2022) Structural insights into the evolution of the RAG recombinase. Nat Rev Immunol 22, 353–370
- 9. Muramatsu M. et al. (2000) Class switch recombination and somatic hypermutation require activation-induced cytidine deaminase (AID), a member of the RNA editing cytidine deaminase family. Cell 102, 541–544
- 10. Yu K and Lieber MR (2019) Current insights into the mechanism of mammalian immunoglobulin class switch recombination. Crit Rev Biochem Mol Biol 1–19
- 11. Bransteitter R. et al. (2003) Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc Natl Acad Sci U S A 100, 4102–4107
- 12. Beletskii A and Bhagwat AS (1996) Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc Natl Acad Sci U S A 93, 13919–13924
- 13. Schmutte C. et al. (1995) Base excision repair of U:G mismatches at a mutational hotspot in the p53 gene is more efficient than base excision repair of T:G mismatches in extracts of human colon tumors. Cancer Res 55, 3742–3746
- 14. Pfeifer GP (2006) Mutagenesis at methylated CpG sequences. Curr Top Microbiol Immunol 301, 259–281
- 15. Walsh CP and Xu GL (2006) Cytosine methylation and DNA repair. Curr Top Microbiol Immunol 301, 283–315
- 16. Ma Y. et al. (2005) The Artemis:DNA-PKcs Endonuclease Can Cleave Gaps, Flaps, and Loops. DNA Repair (Amst) 4, 845–851
- 17. Cui X. et al. (2013) Both CpG Methylation and AID are Required for the Fragility of the Human Bcl-2 Major Breakpoint Region: Implications for the Timing of the Breaks in the t(14;18). Mol Cell Biol 33, 947–957
- 18. Pham P. et al. (2003) Processive AID-catalyzed cytosine deamination on single-stranded DNA stimulates somatic hypermutation. Nature 424, 103–107
- 19. Qiao Q. et al. (2017) AID Recognizes Structured DNA for Class Switch Recombination. Mol Cell 67, 361–373 e4
- 20. Ng HL et al. (2000) The structure of a stable intermediate in the A <--> B DNA helix transition. Proc Natl Acad Sci U S A 97, 2035–2039
- 21. Dornberger U. et al. (1999) High base pair opening rates in tracts of GC base pairs. J Biol Chem 274, 6957–6962
- 22. Tsai AG et al. (2009) Conformational variants of duplex DNA correlated with cytosine-rich chromosomal fragile sites. J Biol Chem 284, 7157–7164
- 23. Raghavan SC et al. (2004) A non-B-DNA structure at the bcl-2 major break point region is cleaved by the RAG complex. Nature 428, 88–93
- 24. Raghavan SC et al. (2004) Stability and strand asymmetry in the non-B DNA structure at the bcl-2 major breakpoint region. J Biol Chem 279, 46213–46225
- 25. Liu D. et al. (2021) Mechanistic basis for chromosomal translocations at the E2A gene and its broader relevance to human B cell malignancies. Cell Rep 36, 109387
- 26. Hystad ME et al. (2007) Characterization of early stages of human B cell development by gene expression profiling. J Immunol 179, 3662–3671
- 27. Sigvardsson M. (2023) Transcription factor networks link B-lymphocyte development and malignant transformation in leukemia. Genes Dev 37, 703–723
- 28. Duque-Afonso J. et al. (2016) E2A-PBX1 Remodels Oncogenic Signaling Networks in B-cell Precursor Acute Lymphoid Leukemia. Cancer Res 76, 6937–6949
- 29. Tsai AG et al. (2010) t(X;14)(p22;q32)/t(Y;14)(p11;q32) CRLF2-IGH translocations from human B-lineage ALLs involve CpG-type breaks at CRLF2, but CRLF2/P2RY8 intrachromosomal deletions do not. Blood 116, 1993–1994
- 30. Tsai AG et al. (2010) The t(14;18)(q32;q21)/IGH-MALT1 translocation in MALT lymphomas is a CpG-type translocation, but the t(11;18)(q21;q21)/API2-MALT1 translocation in MALT lymphomas is not. Blood 115, 3640–1; author reply 3641
- 31. Persson M. et al. (2009) Recurrent fusion of MYB and NFIB transcription factor genes in carcinomas of the breast and head and neck. Proc Natl Acad Sci U S A 106, 18740–18744
- 32. Gazendam AM et al. (2021) Synovial Sarcoma: A Clinical Review. Curr Oncol 28, 1909–1920
- 33. Braun TP et al. (2020) Response and Resistance to BCR-ABL1-Targeted Therapies. Cancer Cell 37, 530–542
- 34. Lu Z. et al. (2013) BCL6 breaks occur at different AID sequence motifs in Ig-BCL6 and non-Ig-BCL6 rearrangements. Blood 121, 4551–4554
- 35. Aplan PD (2006) Chromosomal translocations involving the MLL gene: molecular mechanisms. DNA Repair (Amst) 5, 1265–1272
- 36. Tognon C. et al. (2002) Expression of the ETV6-NTRK3 gene fusion as a primary event in human secretory breast carcinoma. Cancer Cell 2, 367–376
- 37. Fischer U. et al. (2015) Genomics and drug profiling of fatal TCF3-HLF-positive acute lymphoblastic leukemia identifies recurrent mutation patterns and therapeutic options. Nat Genet 47, 1020–1029
- 38. Pannunzio NR and Lieber MR (2017) AID and Reactive Oxygen Species Can Induce DNA Breaks within Human Chromosomal Translocation Fragile Zones. Mol Cell 68, 901–912 e3
- 39. Liu D and Lieber MR (2022) The mechanisms of human lymphoid chromosomal translocations and their medical relevance. Crit Rev Biochem Mol Biol 57, 227–243
- 40. Watts JA et al. (2019) cis Elements that Mediate RNA Polymerase II Pausing Regulate Human Gene Expression. Am J Hum Genet 105, 677–688
- 41. Bentin T. et al. (2005) Transcription arrest caused by long nascent RNA chains. Biochim Biophys Acta 1727, 97–105
- 42. Gressel S. et al. (2017) CDK9-dependent RNA polymerase II pausing controls transcription initiation. Elife 6, e29736
- 43. Pham P. et al. (2019) AID-RNA polymerase II transcription-dependent deamination of IgV DNA. Nucleic Acids Res 47, 10815–10829
- 44. Liu LF and Wang JC (1987) Supercoiling of the DNA template during transcription. Proc Natl Acad Sci U S A 84, 7024–7027
- 45. Sinden RR (1994) DNA Structure and Function. 398
- 46. Han L. et al. (2011) Overlapping activation-induced cytidine deaminase hotspot motifs in Ig class-switch recombination. Proc Natl Acad Sci U S A 108, 11584–11589
- 47. Miyaoka M. et al. (2022) AID is a poor prognostic marker of high-grade B-cell lymphoma with MYC and BCL2 and/or BCL6 rearrangements. Pathol Int 72, 35–42
- 48. Phelan JD and Jaffe ES (2023) An AID to follicular lymphoma transformation. Blood 142, 500–502
- 49. Berger M. et al. (2013) Genomic EWS-FLI1 fusion sequences in Ewing sarcoma resemble breakpoint characteristics of immature lymphoid malignancies. PLoS One 8, e56408
- 50. Dupuy M. et al. (2023) Ewing sarcoma from molecular biology to the clinic. Front Cell Dev Biol 11, 1248753
- 51. Cantile M. et al. (2013) Molecular detection and targeting of EWSR1 fusion transcripts in soft tissue tumors. Med Oncol 30, 412
- 52. Wei Y. et al. (2003) Characteristic sequence motifs located at the genomic breakpoints of the translocation t(X;18) in synovial sarcomas. Oncogene 22, 2215–2222
- 53. Groffen J and Heisterkamp NC (1989) Philadelphia chromosome translocation. Crit Rev Oncog 1, 53–64
- 54. Super HG et al. (1997) Identification of complex genomic breakpoint junctions in the t(9;11) MLL-AF9 fusion gene in acute leukemia. Genes Chromosomes Cancer 20, 185–195
- 55. Harper DP and Aplan PD (2008) Chromosomal rearrangements leading to MLL gene fusions: clinical and biological aspects. Cancer Res 68, 10024–10027
- 56. Alonso CN et al. (2008) A novel AF9 breakpoint in MLL-AF9-positive acute monoblastic leukemia. Pediatr Blood Cancer 50, 869–871
- 57. Yu K. et al. (2003) R-loops at immunoglobulin class switch regions in the chromosomes of stimulated B cells. Nat Immunol 4, 442–451
- 58. Polevoda B. et al. (2015) RNA binding to APOBEC3G induces the disassembly of functional deaminase complexes by displacing single-stranded DNA substrates. Nucleic Acids Res 43, 9434–9445
- 59. Chen Q. et al. (2016) The in vitro Biochemical Characterization of an HIV-1 Restriction Factor APOBEC3F: Importance of Loop 7 on Both CD1 and CD2 for DNA Binding and Deamination. J Mol Biol 428, 2661–2670
- 60. Xiao X. et al. (2016) Crystal structures of APOBEC3G N-domain alone and its complex with DNA. Nat Commun 7, 12193
- 61. Salter JD et al. (2016) The APOBEC Protein Family: United by Structure, Divergent in Function. Trends Biochem Sci 41, 578–594
- 62. Yang H. et al. (2022) Structural basis of sequence-specific RNA recognition by the antiviral factor APOBEC3G. Nat Commun 13, 7498
- 63. Moraes SN et al. (2022) Evidence linking APOBEC3B genesis and evolution of innate immune antagonism by gamma-herpesvirus ribonucleotide reductases. Elife 11, e83893
- 64. Wang X. et al. (2007) Biochemical differentiation of APOBEC3F and APOBEC3G proteins associated with HIV-1 life cycle. J Biol Chem 282, 1585–1594
- 65. Wong L. et al. (2022) Competition for DNA binding between the genome protector replication protein A and the genome modifying APOBEC3 single-stranded DNA deaminases. Nucleic Acids Res 50, 12039–12057
- 66. Ito F. et al. (2023) Structural basis for HIV-1 antagonism of host APOBEC3G via Cullin E3 ligase. Sci Adv 9, eade3168
- 67. Ye X. et al. (2021) Genome-wide mutational signatures revealed distinct developmental paths for human B cell lymphomas. J Exp Med 218, e20200573
- 68. Burmeister T. et al. (2023) Molecular characterization of TCF3::PBX1 chromosomal breakpoints in acute lymphoblastic leukemia and their use for measurable residual disease assessment. Sci Rep 13, 15167