Significance
We used duplex sequencing to detect low-frequency mutations in the BCL6 super-enhancer locus in normal human B cells. The landscape of preexisting mutations is remarkably conserved across different ethnicities and reveals clustered mutational hotspots that correlate with reported sites of clonal mutations and translocation breakpoints in human B cell lymphomas. This high-resolution genomic landscape revealed by duplex sequencing offers accurate and thorough profiling of low-frequency preexisting mutations in normal individuals along with the potential for early detection of neoplastic alterations.
Keywords: duplex sequencing, somatic hypermutation, BCL6, super-enhancer, mutational signatures
Abstract
The super-enhancers (SEs) of lineage-specific genes in B cells are off-target sites of somatic hypermutation. However, the inability to detect sufficient numbers of mutations in normal human B cells has precluded the generation of a high-resolution mutational landscape of SEs. Here we captured and sequenced 12 B cell SEs at single-nucleotide resolution from 10 healthy individuals across diverse ethnicities. We detected a total of approximately 9,000 subclonal mutations (allele frequencies <0.1%); of these, approximately 8,000 are present in the BCL6 SE alone. Within the BCL6 SE, we identified 3 regions of clustered mutations in which the mutation frequency is ∼7 × 10−4. Mutational spectra show a predominance of C > T/G > A and A > G/T > C substitutions, consistent with the activities of activation-induced cytidine deaminase (AID) and the A-T mutator, DNA polymerase η, respectively, in mutagenesis in normal B cells. Analyses of mutational signatures further corroborate the participation of these factors in this process. Single base substitution signatures SBS85, SBS37, and SBS39 were found in the BCL6 SE. While SBS85 is a designated signature of AID in lymphoid cells, the etiologies of SBS37 and SBS39 are unknown. Our analysis suggests the contribution of error-prone DNA polymerases to the latter signatures. The high-resolution mutation landscape has enabled accurate profiling of subclonal mutations in B cell SEs in normal individuals. Because subclonal SE mutations are clonally expanded in B cell lymphomas, our studies also offer the potential for early detection of neoplastic alterations.
In response to antigen stimulation, naïve B cells congregate in germinal centers, where they undergo multiple rounds of cell division and antigenic selection ultimately maturing into antibody-producing plasma cells and memory B cells (1–3). In the germinal center, somatic hypermutation (SHM) occurs, targeting the variable domain of the Ig receptor and resulting in nonfunctional, low-affinity, or high-affinity antibodies. This process involves the creation of multiple single-base substitutions in Ig genes by the mutagenic SHM machinery (4). SHM includes 2 stages of mutagenesis: cytidine deamination by activation-induced cytidine deaminase (AID) to generate a G:U mispair and subsequent error-prone repair synthesis by an error-prone DNA polymerase.
Although SHM is a tightly regulated process, it can act aberrantly, even in normal cells, to somatically mutate other (off-target) sites. Aberrant SHM (aSHM) sites are frequently found in noncoding DNA regions containing cis-regulatory elements, such as intragenic super-enhancers (SEs)—binding sites for multiple factors that increase transcription (5–7). These regions are proximal to the transcription start site (TSS) of actively transcribed genes and are in an “open” chromatin conformation (p300- and H3K27ac-positive).
aSHM targets were delineated by low-resolution single-cell–based polymerase chain reaction (PCR) sequencing of sites with clustered mutations (8). The first of these targets to be identified was the promoter/SE region of the BCL6 gene (8, 9). The BCL6 gene encodes a transcription corepressor protein, whose expression is tightly coordinated with entry and exit from germinal centers (10, 11). Germinal center B cells have high levels of the BCL6 protein, which regulates the expression of many genes involved in B cell differentiation. On the other hand, antibody-secreting plasma cells and memory B cells that exit the germinal center turn off BCL6 expression to facilitate the switch between differentiation and stable cell state maintenance. It has been hypothesized that mutations within the BCL6 SE dysregulate BCL6 expression to promote lymphomagenesis (12, 13). Indeed, Sanger sequencing of PCR amplicons from diffuse large B cell lymphomas has identified clonal mutations in the BCL6 SE locus (12). Furthermore, reporter assays in cultured cells showed that some of these mutations interfere with the binding of transcriptional repressor factors to up-regulate BCL6 expression (14). More recently, next-generation whole-genome sequencing studies of diffuse large B cell lymphomas reported the presence of mutation clusters within the BCL6 SE locus (7); however, only approximately 30 clonal mutations were observed in a total of 10 lymphoma samples, an inadequate number for characterizing the types and distributions of mutations.
A high-resolution landscape of SE mutations in B cells from healthy individuals will aid in unravelling the process of aSHM and enable early detection of preexisting SE mutations that might signal lymphomagenesis. We reasoned that DNA mutated by aSHM is preserved in circulating memory B cells such that deep sequencing of purified B cells would reveal details of the mutational landscape. We used duplex sequencing (DS) (15, 16) for this purpose. DS provides a mutational landscape of genomic DNA at single-nucleotide resolution to reveal mutational patterns and potential underlying mechanisms. The accuracy of DS stems from copying both strands of single DNA molecules; mutations are defined based on complementarity and presence in both strands at the same position. As a result, DS is approximately 1,000-fold more accurate than routine next-generation sequencing and allows the identification of rare mutations, those present at frequencies as low as 1 base substitution in 10^7 nucleotides sequenced.
Using targeted capture, we purified and sequenced 12 SE loci in genomic DNA samples of human CD19+ B cells isolated from 10 healthy individuals of diverse ethnic backgrounds. The mutational landscape shows clustered mutations, most elevated in the SE of BCL6 and to a lesser extent in the SEs of PAX5 and CD83. A total of ∼9,000 base substitutions were detected in the 10 samples; ∼8,000 of these were found in the BCL6 SE. The mutational profiles of the BCL6 SE are remarkably similar in individuals across various ethnic groups, suggesting that they are not the result of random events but instead may function in normal developmental processes. Mutational spectra and signatures at the BCL6 SE locus further suggest that mutations result from the activities of AID and of the error-prone DNA polymerase, Pol η.
Results
High Frequency of Mutations in the BCL6 SE.
We sequenced 12 aSHM targets (SI Appendix, Table S1) in CD19+ B cells from 10 healthy blood donors of different ethnicities (SI Appendix, Table S2). The target gene segments are located ∼1 kb 3′ of the TSS and are reported SEs of the associated genes (5–7). As a control, we also captured and sequenced an SHM target locus frequently found in memory B cells: the variable region of the Ig gene, IgHV3-23 (17, 18). Of the 12 sequenced SE loci, we found that BCL6 is the most frequently mutated. We observed more than 8,000 subclonal mutations (allele fraction <0.5%) in 10 individuals, to yield an average mutation frequency (MF; mutations/total nucleotides sequenced) of 2.2 × 10−4 (Fig. 1 and SI Appendix, Table S3), comparable to that observed at the SHM target, IgHV3-23 (MF 1.3 × 10−4; Fig. 1 and SI Appendix, Table S3). Mutant allele fractions (MAFs; number of mutations at each nucleotide position/total alleles sequenced at each nucleotide position) ranging from ∼0.002% to ∼0.5% were detected in the BCL6 SE (Fig. 2 A and B). Since circulating memory B cells in peripheral blood comprise between 5% and 20% of the B cell population (19), we estimate that there are between 0.8 and 3.2 mutations within the 1-kb BCL6 SE region in each circulating memory B cell, assuming that other circulating B cells do not contain frequent mutations in BCL6.
The SEs of PAX5, POU2AF, and CD83 also showed high mutation frequencies; however, they are more than one order of magnitude lower than that of the BCL6 SE. The mutation frequencies are 9.7 × 10−6, 9.4 × 10−6, and 6.9 × 10−6, respectively (Fig. 1 and SI Appendix, Table S3). The SE of H2AFX was the least mutated, with an average mutation frequency of 2.8 × 10−7, close to background detection levels (Fig. 1 and SI Appendix, Table S3). The mutation frequencies of the other sequenced SEs ranged between those of BCL6 and H2AFX SEs (Fig. 1 and SI Appendix, Table S3). Since large numbers of mutations lend confidence to data interpretation, we focused our studies on the mutational analyses of the BCL6 SE locus.
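The two quantities defined in this section, mutation frequency (MF) and mutant allele fraction (MAF), can be sketched in a few lines. This is a minimal illustration of the definitions only, not the analysis pipeline used in the study; the function names and the example counts are hypothetical.

```python
def mutation_frequency(n_mutations, n_nucleotides_sequenced):
    """MF = mutations / total nucleotides sequenced (definition from the text)."""
    if n_nucleotides_sequenced <= 0:
        raise ValueError("total nucleotides sequenced must be positive")
    return n_mutations / n_nucleotides_sequenced


def mutant_allele_fractions(mutant_counts, depths):
    """MAF per position = mutant alleles at a position / total alleles sequenced there.

    mutant_counts: {position: number of mutant duplex molecules}
    depths:        {position: number of duplex molecules covering the position}
    Positions with zero depth are skipped.
    """
    return {
        pos: mutant_counts.get(pos, 0) / depth
        for pos, depth in depths.items()
        if depth > 0
    }


# Hypothetical counts for three positions (illustration only, not study data).
counts = {101: 2, 105: 1}
depths = {101: 40_000, 102: 40_000, 105: 50_000}

mf = mutation_frequency(sum(counts.values()), sum(depths.values()))
mafs = mutant_allele_fractions(counts, depths)
```

MF pools all positions in a locus into one number, whereas MAF is resolved per position, which is why MF is used to compare loci and MAF is used to compare individual sites.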
Mutations in the BCL6 SE Locus Show a Remarkably Similar Pattern in All Individuals.
High-resolution duplex DNA sequencing enabled us to delineate the landscape of subclonal mutations in the BCL6 SE locus (Fig. 2 A and B). We found that close to 90% of the ∼8,000 mutations are located in a 600-bp region that corresponds to a tentative “open” chromatin region enriched for histone markers H3K4me3 and H3K27Ac in germinal center B cells (20, 21). The density of mutations is extensive, with as many as 3 mutations present in a single read (SI Appendix, Fig. S1). Within this 600-bp region, we observed 2 major mutational clusters, each spanning ∼50 nt (Fig. 2A and SI Appendix, Fig. S2). The average mutation frequencies for clusters 1 and 2 are 7.2 × 10−4 and 7.7 × 10−4, respectively. These are in contrast to a frequency of 3.5 × 10−6 in a 50-bp region downstream from the hotspot clusters (Fig. 2A). The mutation load at the 2 hotspots suggests that 20% to 80% of memory B cells have at least 1 mutation within 1 of these clusters. Notably, both the location of these mutation clusters and the mutation frequencies are remarkably conserved in healthy individuals across different ethnicities (Fig. 2B and SI Appendix, Fig. S3 and Table S4). The average mutation frequency of the BCL6 SE is 1.5 × 10−4 in Caucasians, 2.0 × 10−4 in African Americans, 4.2 × 10−4 in Asians, and 2.0 × 10−4 in Hispanics (SI Appendix, Fig. S3 and Table S4), and the average across all individuals is 2.2 × 10−4 (Fig. 1 and SI Appendix, Table S3). There is no significant difference among the 4 ethnic groups (P = 0.26) (SI Appendix, Fig. S3).
We also identified a third cluster of mutations in the TAA/WA sequences, with an average mutation frequency of 5.4 × 10−4 (Fig. 2B). Although mutations here are present at high density, they are at lower frequencies than those in clusters 1 and 2. Clustered mutations were also detected in the PAX5 and CD83 SEs (SI Appendix, Fig. S4 A and B, respectively), suggesting that SEs of B cell lineage-specific genes are targeted by a similar mutagenic mechanism.
To confirm that these clustered mutations are specific to the memory B cell population, we purified and sequenced different cells—memory B cells, naïve B cells, monocytes, and T lymphocytes, as well as cultured primary fibroblasts, transformed embryonic kidney cells (293T) and the colon cancer cell line SW480—using the same SE capture probe set. We found that among all cell types, only memory B lymphocytes have an increased frequency of mutations at the BCL6 SE locus (Table 1).
Table 1.
Sample | BCL6 SE mutation frequency |
Memory B cells | 1.8 × 10−4 |
Naïve B cells | 8.5 × 10−6 |
Monocytes | <3.7 × 10−7 |
T cells | <3.6 × 10−7 |
Fibroblasts (AG01440) | <2.8 × 10−7 |
Fibroblasts (AG09860) | <3.4 × 10−7 |
HEK 293T | 5.1 × 10−6 |
Colon cancer (SW480) | 1.4 × 10−6 |
The BCL6 SE Locus Is Targeted by AID and Error-Prone DNA Polymerases.
To gain insight into mutational processes operating at the BCL6 SE, we examined the spectrum of base substitutions. We observed that transition base substitutions occur more frequently than transversions (SI Appendix, Fig. S5A). A > G/T > C and C > T/G > A transitions account for 28% and 23% of all mutations, respectively, while C > G/G > C, A > C/T > G, A > T/T > A, and C > A/G > T transversions account for 18%, 13%, 12%, and 4% of all mutations, respectively. The frequency of transition base substitutions in the BCL6 SE is similar to that observed at the SHM target locus IgHV3-23, in which C > T/G > A and A > G/T > C transitions comprise 34% and 23%, respectively, of all mutations (SI Appendix, Fig. S5B). The C > T transitions in SHM are generated primarily by AID. AID is an initiator of SHM; it deaminates cytidine to uracil, which base pairs with adenine.
In addition to AID, error-prone DNA polymerases are also implicated in SHM. The leading candidate is Pol η. Of the known error-prone DNA polymerases, Pol η is believed to function in a noncanonical base excision repair or mismatch repair pathway following AID-catalyzed cytidine deamination (4). Pol η preferentially generates G:T or A:C mismatches at A:T base pairs during DNA synthesis to generate A > G/T > C transition mutations (22). We observed that almost every A/T site within the TAA/WA repeat region in the BCL6 SE shows a predominance of transition mutations (SI Appendix, Fig. S6).
Our mutation spectrum data suggest that, similar to the Ig genes, the BCL6 SE locus is a target of AID and Pol η, resulting in a high prevalence (>50%) of A > G and C > T transition mutations. Moreover, the C > G/G > C transversions that we observed could also result from the sequential activities of AID and error-prone DNA replication. The G:U mismatches generated by AID can be efficiently converted to G:apurinic/apyrimidinic (AP) nucleotide pairs by uracil DNA glycosylase (UDG). While replicative DNA polymerases insert A residues across unrepaired AP sites (23), error-prone polymerases such as Rev1 preferentially insert C residues to generate C > G/G > C transversions instead of C > T/G > A transitions (24, 25). Pol η may also play a role in AID-induced C > G transversions (24, 25), but the mechanism for this is unclear. Thus, the contributions of AID, Rev1, and Pol η to mutations (C > T, C > G, and A > G) within the BCL6 SE could be as high as ∼70% (SI Appendix, Fig. S5A).
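The spectrum analysis above groups each substitution with its strand complement (for example, C > T with G > A). A minimal sketch of that bookkeeping follows; it is an illustration rather than the code used in the study. It assumes substitutions are given as (reference base, mutant base) pairs and labels each class by the pyrimidine of the mutated base pair, so the class the text writes as A > G/T > C appears here as `T>C`. The function names and the example list are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # pyrimidine-referenced transition classes


def collapse(ref, alt):
    """Map a substitution onto one of six classes by referring it to the pyrimidine.

    G>A is counted with C>T, A>G with T>C, and so on.
    """
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(substitutions):
    """Return the fraction of mutations in each of the six collapsed classes."""
    counts = Counter(collapse(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}


def transition_fraction(substitutions):
    """Fraction of all substitutions that are transitions."""
    spec = spectrum(substitutions)
    return sum(frac for cls, frac in spec.items() if cls in TRANSITIONS)


# Hypothetical substitutions (illustration only, not study data).
example = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "C"), ("C", "G"), ("A", "T")]
example_spectrum = spectrum(example)
example_ti = transition_fraction(example)
```

Collapsing to six classes is needed because duplex data do not indicate which strand of a base pair was originally damaged.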
Mutant Allele Fractions Reveal the Sequence of Mutagenesis at the BCL6 SE.
Allele fractions of mutations can be used to unravel sequential mutagenic events in a cell population. It is well established that SHM of Ig genes involves multiple cycles of mutagenesis, mutation selection, and cell division to generate plasma and memory B cells expressing high-affinity antibodies (1–4). While clonal selection enriches variant Ig genes, mutations in other genes, such as BCL6, are likely unselected or much less selected in healthy individuals. Thus, mutations at the BCL6 locus preserve an unbiased history of mutagenic events that have transpired in memory B cells. For example, a mutagenic event that occurs during earlier rounds of maturation will be clonally expanded by more cell divisions and thus likely to be more frequently detected in terminally selected memory B cells, while later mutagenic events will be present at a lower frequency due to fewer rounds of cell expansion. Based on this premise, we delineated the order of mutagenic events by parsing the observed mutations into 5 bins corresponding to the following MAFs: bin 1, >1 × 10−3; bin 2, >5 × 10−4 to 1 × 10−3; bin 3, >1 × 10−4 to 5 × 10−4; bin 4, >5 × 10−5 to 1 × 10−4; and bin 5, >1.7 × 10−5 to 5 × 10−5. We then determined whether there were differences in the spectrum of base substitutions in different bins.
The base changes at C residues, resulting primarily from the activity of AID, are likely the earliest events, as C > T/G > A and C > G/G > C alterations comprise >70% of all mutation types in bin 1 (highest MAF; Fig. 3A and SI Appendix, Fig. S7A). A > G/T > C substitutions, reflective of Pol η activity, increase in abundance in bins 2 and 3, concomitant with a decrease in C substitutions. A > G/T > C substitutions then decrease in bins 4 and 5, while C > T/G > A and C > G/G > C alterations increase progressively. A similar pattern was observed when we compared substitutions at G:C versus A:T pairs. Close to 80% of the substitutions in bin 1 are at G:C pairs; this fraction decreased in bins 2 and 3 but then increased in bins 4 and 5 in coordination with an increase and decrease, respectively, of mutations at A:T pairs (Fig. 3B and SI Appendix, Fig. S7B). These data suggest that mutations at the BCL6 SE locus are not the result of random events but instead arise from the coordinated action of AID and Pol η. They further suggest that AID inscribes its mark on the mutational landscape before Pol η. In contrast, mutational footprints of both AID and Pol η appear early (bin 1: MAF >1 × 10−3, indicative of clonal expansion) in the Ig gene IgHV3-23 (SI Appendix, Fig. S8). After initial rounds of selection, however, AID catalysis becomes predominant in bin 2 (MAF >5 × 10−4 to 1 × 10−3), with approximately equal contributions of AID and Pol η thereafter (SI Appendix, Fig. S8).
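The binning scheme used in this section can be written down directly from the 5 MAF ranges given above. The sketch below is illustrative only: the function names and the example mutations are hypothetical, and each bin is treated as open at its lower bound and closed at its upper bound, as the ">" notation in the text implies.

```python
# Lower bounds of the 5 MAF bins, highest bin first (values from the text).
BIN_LOWER_BOUNDS = [(1, 1e-3), (2, 5e-4), (3, 1e-4), (4, 5e-5), (5, 1.7e-5)]


def maf_bin(maf):
    """Return the bin number (1 to 5) for a mutant allele fraction, or None if below bin 5."""
    for bin_number, lower in BIN_LOWER_BOUNDS:
        if maf > lower:
            return bin_number
    return None


def gc_fraction_by_bin(mutations):
    """Fraction of substitutions at G:C pairs in each bin.

    mutations: iterable of (reference_base, maf) tuples.
    """
    totals, at_gc = {}, {}
    for ref, maf in mutations:
        b = maf_bin(maf)
        if b is None:
            continue
        totals[b] = totals.get(b, 0) + 1
        if ref in {"G", "C"}:
            at_gc[b] = at_gc.get(b, 0) + 1
    return {b: at_gc.get(b, 0) / n for b, n in totals.items()}


# Hypothetical mutations (illustration only, not study data).
example = [("C", 2e-3), ("G", 1.5e-3), ("A", 7e-4), ("C", 7e-4), ("T", 2e-5)]
fractions = gc_fraction_by_bin(example)
```

Under the premise stated in the text, a higher bin number corresponds to a later mutagenic event, so comparing the per-bin fractions orders the mutational processes in time.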
Mutational Signatures of BCL6 SE Are Consistent with the Contribution of AID, Pol η, and Rev1 to aSHM.
We determined mutational signatures at the BCL6 SE by examining the identity of adjacent 5′ and 3′ nucleotides (26). We observed that A > G/T > C transitions are mostly present at ATA sites (Fig. 4A), while both C > T/G > A transitions and C > G/G > C transversions occur most often in the trinucleotide sequence, ACA (Fig. 4A). This spectrum is similar in all 10 B cell samples (SI Appendix, Fig. S9A). In contrast, C > T/G > A transition mutations in IgHV3-23 show a sequence preference for GCT and ACC sites, whereas C > G/G > C transversions are more frequent at GCT and ACT sequences (Fig. 4B). Moreover, mutational signatures of IgHV3-23 differ among the 10 individuals (SI Appendix, Fig. S9B), likely due to selection of different mutant IgH clones in each person.
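Assigning a substitution to its trinucleotide context, as described above, amounts to reading the 5′ and 3′ neighbors of the mutated base and, by the convention used for COSMIC SBS signatures, expressing the result relative to the pyrimidine of the mutated base pair. The following is a minimal sketch under those assumptions; the function names, the 0-based coordinate convention, and the example sequences are hypothetical and are not taken from the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trinucleotide_channel(reference, pos, alt):
    """Return (context, substitution) for a substitution at 0-based position pos.

    The context is the mutated base with its 5' and 3' neighbors. If the
    reference base is a purine, both context and substitution are reverse
    complemented so that the mutated base is always reported as C or T.
    """
    if pos < 1 or pos > len(reference) - 2:
        raise ValueError("position needs a neighbor on both sides")
    context = reference[pos - 1 : pos + 2].upper()
    alt = alt.upper()
    if context[1] in ("A", "G"):
        context = reverse_complement(context)
        alt = reverse_complement(alt)
    return context, f"{context[1]}>{alt}"


# Hypothetical sequences (illustration only).
t_to_c = trinucleotide_channel("GATAC", 2, "C")  # T flanked by A and A
g_to_a = trinucleotide_channel("ATGTC", 2, "A")  # G in TGT, reported on the C strand
```

Counting these (context, substitution) pairs over all mutations yields the 96-channel profile that signature tools take as input.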
We further used deconstructSigs (27) to deconstruct our signatures into COSMICv3 single base substitution (SBS) signatures. We identified 6 mutational signatures in IgHV3-23 (Fig. 4B), 2 of which—SBS84 (25.2%) and SBS85 (9%)—have SHM-related etiology. SBS84 shows predominant C > T transitions (likely AID-mediated), and SBS85 has equal contributions of T > A and T > C substitutions (likely the result of DNA synthesis by error-prone DNA polymerases). We extracted 2 additional signatures, SBS37 and SBS39, which also potentially could be attributed to error-prone DNA polymerase activity during SHM. SBS37 (9%) and SBS39 (23.3%) are represented by T > C transitions and C > G transversions, respectively (SI Appendix, Fig. S10). Similarly, we identified 6 mutational signatures at the BCL6 SE (Fig. 4A); 3 of these—SBS85 (8.6%), SBS37 (13.1%), and SBS39 (28.6%)—are shared with IgHV3-23. While the etiology of SBS85 is known, the etiologies of SBS37 and SBS39 are not.
Based on the spectrum of nucleotide substitutions found in SBS37 and SBS39, and their presence in IgHV3-23 (the target of SHM), we hypothesize that SBS37 and SBS39 could be signatures of error-prone DNA polymerases in SHM. A combination of signatures SBS 37, 39, 84, and/or 85 was also identified at the PAX5 and CD83 SE loci, where they accounted for as much as 75% of all signatures (SI Appendix, Fig. S11). Thus, signatures of SHM permeate the mutational landscape of B cell SEs.
Discussion
We sequenced the SE regions of 12 highly expressed genes in B cells from 10 healthy individuals. Previous studies of mutagenesis of B cell SEs were based on low-resolution single-cell PCR sequencing (8) or mutant mouse models (5). A recent study involving whole-genome sequencing of single human B cells reported mutations at the BCL6 SE locus; however, very few mutations were identified in this study, precluding a detailed analysis of aSHM (28). We used exceptionally accurate high-depth DS (15, 16) to reveal a high-resolution landscape of mutations and mutational processes in B cell SEs in normal healthy individuals.
We show that of all sequenced SEs, the BCL6 SE is the most favored target of aSHM, with ∼8,000 subclonal mutations in 10 individuals. The average mutation frequency of 2.2 × 10−4 at the BCL6 SE locus is as high as the SHM frequency of the Ig genes (Fig. 1 and SI Appendix, Table S3). The accurate identification of large numbers of single-nucleotide substitutions enabled fine mapping of mutation clusters. There are 3 clusters of mutations within the BCL6 SE, 2 of which overlap with AID target motifs (RGYW) (29) and a third lying within the TAA/WA repeat sequences (Fig. 2B and SI Appendix, Figs. S2 and S6) (30). In addition, there are multiple mutations, located <100 nt apart, in multiple sequence reads (SI Appendix, Fig. S1). While the density of mutations within these clusters is high, the allele frequency of individual mutations is <0.5%, with most even lower (Fig. 2B). Remarkably, the location and allele frequency of these clustered mutations are highly conserved in individuals across different ethnicities (Fig. 2B and SI Appendix, Table S2). BCL6 and IgH could be within the same topologically associating domain or proximal to each other in the chromatin architecture (7); therefore, the subclonal mutations in the BCL6 SE may be the result of collateral damage of SHM at the IgH locus. The unusually high numbers of mutations at the BCL6 SE locus suggest that they are not random events, but rather may be a part of the normal processes of B cell maturation and differentiation.
The identification of thousands of mutations in the BCL6 SE enabled the extraction of mutation signatures and analyses of mutational processes operative during aSHM in normal cells. We show that mutations at the BCL6 SE are dominated by A > G, C > T, and C > G substitutions (SI Appendix, Fig. S5A). Together, these 3 mutation types comprise 70% of all mutations at this locus. While C > T mutations are in accord with cytidine deamination by AID (4), A > G and C > G substitutions are consistent with synthesis by error-prone DNA polymerases Pol η and Rev1, respectively (22, 24, 25, 31–34). It is well established that AID is essential for B cell development in response to antigen stimulation and for antibody diversification in germinal centers (35, 36). Multiple error-prone DNA polymerases have been implicated in SHM (33, 37–39); of these, Pol η in particular shows increased expression in germinal center B cells (31, 40, 41) and is specifically expressed in the dark zone, where SHM occurs (42). The activities of both mutators are also evident in the extracted mutational signatures (Fig. 4A). SBS84 and SBS85 are designated as signatures of AID-induced somatic mutagenesis in lymphoid cells in COSMICv3. We suggest that SBS37 and SBS39 are signatures of error-prone DNA polymerases Pol η and Rev1, respectively, in SHM, based on their prevalence in the SHM target IgHV3-23 and the dominance of nucleotide substitutions characteristic of Pol η (22) and Rev1 (24). This notion is further supported by the prevalence of signature SBS37 in non-Hodgkin B cell lymphoma and the revelation by Supek and Lehner (43) of a putative Pol η signature, similar to SBS37 (prevalence of A > G transitions in WA motifs), at sites of clustered mutations in human B cell lymphomas. Thus, SHM accounts for the etiology of at least 50% of the mutational signatures at BCL6 SE. Approximately one-quarter of mutations are of unknown signatures (Fig. 4A), raising the possibility that mutagenic processes besides SHM also operate at the BCL6 SE locus. Interestingly, SBS37 and excess mutations in Pol η mutable motifs, TAA/WA, are also found in non-B cell cancers, where in fact there is a negative correlation between the activities of AID and Pol η (44). Perhaps deregulation of Pol η expression and activity contributes to Pol η-induced mutations in these cancers.
Mutant allele frequencies also facilitated an evaluation of the order of mutational events during aSHM. We show that AID initiates aSHM and that Pol η operates later to extend the mutational landscape (Fig. 3). While the order of AID and Pol η mutagenesis during SHM varies slightly from that during aSHM, it is evident that a balance of these 2 mutagenic activities is required for fine-tuning antibody diversity.
One of the unique features of the BCL6 SE mutation landscape is the high density of mutations in DNA sequences located between 400 and 1,000 bp downstream of the TSS. There is a 20-fold difference in mutation frequencies between sequences with low variation (0 to 400 bp downstream of TSS; MF 1.7 × 10−5) and those with clustered mutations (400 to 1,000 bp; MF 3.5 × 10−4). This demarcation in mutation burden tracks with the positioning of histones H3K27ac and H3K4me3 (21). The “mutation-hot” region shows high levels of histone H3 acetylation and methylation, indicative of open chromatin and active transcription with a high density of cis-regulatory elements (45), such as SEs. The cold region is devoid of H3K27ac, suggesting that the nucleosome is in a closed conformation and lacks regulatory function.
While an “open” chromatin conformation facilitates active transcription of BCL6 during B cell development, it also increases the accessibility of this region to the SHM machinery, which generates mutations and promotes strand breakage (46). Indeed, it has been reported that this “mutation-hot” region is also a hotbed for DNA breaks leading to genomic translocations, frequently fusing BCL6 with the Ig heavy chain genes (20, 47). In addition to chromatin conformation, AID access to nuclear DNA and AID-induced C > T mutations are also regulated by proteins such as GANP, which facilitates translocation of AID from the cytoplasm into the nucleus and targets it to sites of SHM (48).
Multiple rounds of affinity selection and clonal expansion are required for high-affinity antibody production (13). B cells expressing low-affinity immunoglobulins are rapidly eliminated by apoptosis. In the case of BCL6, specific subclonal mutations may be selected and expanded to cause aberrant cell differentiation or malignant phenotypes. In fact, clonal BCL6 SE mutations and translocations are characteristic of diffuse large B cell lymphomas (12). They can affect the binding of transcription factors, such as the IRF4 repressor, to deregulate BCL6 protein expression (14); BCL6 is frequently overexpressed in B cell lymphomas (49).
The mutational landscape of the BCL6 SE might serve as a sensitive indicator for mutation loads in healthy individuals. Recent studies have reported the presence of low-frequency coding mutations in a large number of genes, including tumor driver genes, in normal individuals (50–52). Many of these are found to be clonally expanded in the tumor genome. The B cell SE subclonal mutations represent a similar class of preexistent mutations in regulatory regions of genomic DNA. Apparent shifts in the pattern of mutation position, density (the distance between mutations), and/or intensity (the extent of mutations) could signal deregulated gene expression and an enhanced risk for lymphomagenesis.
Methods
Human B Cells.
The CD19+ B cells were purchased from AllCells and ReachBio. Peripheral blood mononuclear cells (PBMCs) were purified from whole blood drawn from 10 healthy donors (SI Appendix, Table S2). CD19+ B cells were purified by positive selection using an anti-CD19 antibody-conjugated affinity column. Purified memory B cells (CD27+), naïve B cells, monocytes, and T cells (all purchased from Bloodworks Northwest) were isolated using cell-specific isolation kits and separation on autoMACS columns (Miltenyi Biotec). All samples were deidentified before use.
DNA Isolation.
Purified CD19+ B cells were lysed with proteinase K, and DNA was extracted using the Qiagen DNeasy Blood and Tissue Kit. DNA concentrations were quantified with a Nanodrop 2000 spectrophotometer.
Capture Library.
Synthetic DNA oligonucleotides (purchased from IDT) were used for targeted gene capture. A capture-probe pool comprising 5′-biotinylated DNA oligonucleotides (120 nt) was designed and used in sequential rounds of hybridization to enrich and sequence the indicated SEs (SI Appendix, Table S1).
DS.
DS was performed using published protocols (15, 16). In brief, genomic DNA from purified CD19+ B cells was fragmented by sonication, end-repaired, and A-tailed. Purified DNA fragments (∼250 bp) were ligated with DS adaptors containing unique molecular barcodes comprising 12 random nucleotides. The ligated DNA was PCR-amplified, and purified amplicons were hybridized to 5′-biotinylated capture probes. Double capturing was performed as described previously (53) to improve on-target capture efficiency. After purification by binding to streptavidin beads, the final PCR amplified the captured DNA library using distinct index primers for each sample to enable multiplex sequencing. The prepared libraries were mixed and sequenced using an Illumina HiSeq 2500 sequencer.
Data Processing.
Sequencing data were processed as described previously (16). Sequencing reads with 12-nt molecular barcodes were assembled and aligned to RefSeq to generate single-strand consensus sequences. Pairs of single-strand consensus sequences with complementary barcodes were grouped to establish double-strand consensus sequences. Mutations were scored only if base substitutions were present at the same position in both strands and were complementary to each other.
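The consensus logic described above can be illustrated with a simplified sketch. It is not the published pipeline (16). It assumes that reads have already been aligned, trimmed to equal length, and oriented to the same reference strand, so that a substitution that is complementary on the two strands shows up as the same base in both single-strand consensus sequences. The strand labels "ab" and "ba", the function names, and the thresholds `min_reads` and `min_agreement` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def single_strand_consensus(reads, min_reads=3, min_agreement=0.7):
    """Collapse reads from one strand of one barcode family into a consensus.

    Returns None if the family is too small; positions without sufficient
    agreement are masked as 'N'. Thresholds are illustrative assumptions.
    """
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):  # assumes equal-length, aligned reads
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_agreement else "N")
    return "".join(consensus)


def duplex_consensus(sscs_ab, sscs_ba):
    """Keep a base only where both single-strand consensuses agree."""
    return "".join(a if a == b else "N" for a, b in zip(sscs_ab, sscs_ba))


def call_mutations(reference, dcs):
    """List (position, ref, alt) where the duplex consensus differs from the reference."""
    return [
        (i, r, d)
        for i, (r, d) in enumerate(zip(reference, dcs))
        if d != "N" and d != r
    ]


def duplex_calls(tagged_reads, reference):
    """tagged_reads: iterable of (barcode_tag, strand, sequence), strand in {'ab', 'ba'}."""
    families = defaultdict(lambda: {"ab": [], "ba": []})
    for tag, strand, seq in tagged_reads:
        families[tag][strand].append(seq)
    calls = {}
    for tag, family in families.items():
        ab = single_strand_consensus(family["ab"])
        ba = single_strand_consensus(family["ba"])
        if ab is None or ba is None:
            continue  # both strands are required for a duplex consensus
        calls[tag] = call_mutations(reference, duplex_consensus(ab, ba))
    return calls


# Hypothetical reads (illustration only).
reference = "ACGTAC"
reads = (
    [("T1", "ab", "ACTTAC")] * 3 + [("T1", "ba", "ACTTAC")] * 3   # change on both strands
    + [("T2", "ab", "ACGTAT")] * 3 + [("T2", "ba", "ACGTAC")] * 3  # change on one strand only
    + [("T3", "ab", "ACGTAC")] * 3                                 # partner strand missing
)
calls = duplex_calls(reads, reference)
```

In this example only family T1 yields a scored mutation; the single-strand change in T2 is masked, which is the property that suppresses PCR and sequencing errors.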
Data Availability.
Sequencing data presented in this paper can be accessed at NCBI BioProject PRJNA574179 (54).
Acknowledgments
We thank Mark Fielden, Herve Lebrec, and Hisham Hamadeh for helpful input during the course of these studies; and Clint Valentine for bioinformatics assistance. This study was supported by Amgen, NIH/National Cancer Institute Cancer Center Support Grant P30 CA015704, a pilot grant from the Core Center of Excellence in Hematology (P30 DK 56465), Seattle Translational Tumor Research, and NIH/National Cancer Institute Grants P01 CA077852 and R01 CA193649.
Footnotes
Competing interest statement: L.A.L. is a founder and equity holder at TwinStrand Biosciences.
This article is a PNAS Direct Submission.
Data deposition: The data in this paper have been uploaded to NCBI BioProject (accession no. PRJNA574179).
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1914163116/-/DCSupplemental.
References
- 1. Klein U., Dalla-Favera R., Germinal centres: Role in B cell physiology and malignancy. Nat. Rev. Immunol. 8, 22–33 (2008).
- 2. Victora G. D., Nussenzweig M. C., Germinal centers. Annu. Rev. Immunol. 30, 429–457 (2012).
- 3. De Silva N. S., Klein U., Dynamics of B cells in germinal centres. Nat. Rev. Immunol. 15, 137–148 (2015).
- 4. Peled J. U., et al., The biochemistry of somatic hypermutation. Annu. Rev. Immunol. 26, 481–511 (2008).
- 5. Liu M., et al., Two levels of protection for the B cell genome during somatic hypermutation. Nature 451, 841–845 (2008).
- 6. Meng F. L., et al., Convergent transcription at intragenic super-enhancers targets AID-initiated genomic instability. Cell 159, 1538–1548 (2014).
- 7. Qian J., et al., B cell super-enhancers and regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537 (2014).
- 8. Pasqualucci L., et al., BCL-6 mutations in normal germinal center B cells: Evidence of somatic hypermutation acting outside Ig loci. Proc. Natl. Acad. Sci. U.S.A. 95, 11816–11821 (1998).
- 9. Shen H. M., Peters A., Baron B., Zhu X., Storb U., Mutation of BCL-6 gene in normal B cells by the process of somatic hypermutation of Ig genes. Science 280, 1750–1752 (1998).
- 10. Crotty S., Johnston R. J., Schoenberger S. P., Effectors and memories: Bcl-6 and Blimp-1 in T and B lymphocyte differentiation. Nat. Immunol. 11, 114–120 (2010).
- 11. Basso K., Dalla-Favera R., Roles of BCL6 in normal and transformed germinal center B cells. Immunol. Rev. 247, 172–183 (2012).
- 12. Migliazza A., et al., Frequent somatic hypermutation of the 5′ noncoding region of the BCL6 gene in B-cell lymphoma. Proc. Natl. Acad. Sci. U.S.A. 92, 12520–12524 (1995).
- 13. Basso K., Dalla-Favera R., Germinal centres and B cell lymphomagenesis. Nat. Rev. Immunol. 15, 172–184 (2015).
- 14. Saito M., et al., A signaling pathway mediating downregulation of BCL6 in germinal center B cells is blocked by BCL6 gene alterations in B cell lymphoma. Cancer Cell 12, 280–292 (2007).
- 15. Schmitt M. W., et al., Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl. Acad. Sci. U.S.A. 109, 14508–14513 (2012).
- 16. Kennedy S. R., et al., Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protoc. 9, 2586–2606 (2014).
- 17. Wu Y. C., et al., High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood 116, 1070–1078 (2010).
- 18. Weller S., et al., CD40-CD40L independent Ig gene hypermutation suggests a second B cell diversification pathway in humans. Proc. Natl. Acad. Sci. U.S.A. 98, 1166–1170 (2001).
- 19. Morbach H., Eichhorn E. M., Liese J. G., Girschick H. J., Reference values for B cell subpopulations from infancy to adulthood. Clin. Exp. Immunol. 162, 271–279 (2010).
- 20. Lu Z., et al., BCL6 breaks occur at different AID sequence motifs in Ig-BCL6 and non-Ig-BCL6 rearrangements. Blood 121, 4551–4554 (2013).
- 21. Lu Z., et al., Convergent BCL6 and lncRNA promoters demarcate the major breakpoint region for BCL6 translocations. Blood 126, 1730–1731 (2015).
- 22. Matsuda T., Bebenek K., Masutani C., Hanaoka F., Kunkel T. A., Low-fidelity DNA synthesis by human DNA polymerase-η. Nature 404, 1011–1013 (2000).
- 23. Loeb L. A., Preston B. D., Mutagenesis by apurinic/apyrimidinic sites. Annu. Rev. Genet. 20, 201–230 (1986).
- 24. Kano C., Hanaoka F., Wang J. Y., Analysis of mice deficient in both REV1 catalytic activity and POLH reveals an unexpected role for POLH in the generation of C to G and G to C transversions during Ig gene hypermutation. Int. Immunol. 24, 169–174 (2012).
- 25. Masuda K., et al., A critical role for REV1 in regulating the induction of C:G transitions and A:T mutations during Ig gene hypermutation. J. Immunol. 183, 1846–1850 (2009).
- 26. Alexandrov L. B., et al.; Australian Pancreatic Cancer Genome Initiative; ICGC Breast Cancer Consortium; ICGC MMML-Seq Consortium; ICGC PedBrain, Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
- 27. Rosenthal R., McGranahan N., Herrero J., Taylor B. S., Swanton C., DeconstructSigs: Delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
- 28. Zhang L., et al., Single-cell whole-genome sequencing reveals the functional landscape of somatic mutations in B lymphocytes across the human lifespan. Proc. Natl. Acad. Sci. U.S.A. 116, 9014–9019 (2019).
- 29. Rogozin I. B., Kolchanov N. A., Somatic hypermutagenesis in immunoglobulin genes. Biochim. Biophys. Acta Gene Structure Expression 1171, 11–18 (1992).
- 30. Zhao Y., et al., Mechanism of somatic hypermutation at the WA motif by human DNA polymerase η. Proc. Natl. Acad. Sci. U.S.A. 110, 8146–8151 (2013).
- 31. Zeng X., et al., DNA polymerase eta is an A-T mutator in somatic hypermutation of immunoglobulin variable genes. Nat. Immunol. 2, 537–541 (2001).
- 32. Rogozin I. B., Pavlov Y. I., Bebenek K., Matsuda T., Kunkel T. A., Somatic mutation hotspots correlate with DNA polymerase eta error spectrum. Nat. Immunol. 2, 530–536 (2001).
- 33. Seki M., Gearhart P. J., Wood R. D., DNA polymerases and somatic hypermutation of immunoglobulin genes. EMBO Rep. 6, 1143–1148 (2005).
- 34. Zanotti K. J., Gearhart P. J., Antibody diversification caused by disrupted mismatch repair and promiscuous DNA polymerases. DNA Repair (Amst.) 38, 110–116 (2016).
- 35. Storb U., Stavnezer J., Immunoglobulin genes: Generating diversity with AID and UNG. Curr. Biol. 12, R725–R727 (2002).
- 36. Keim C., Kazadi D., Rothschild G., Basu U., Regulation of AID, the B-cell genome mutator. Genes Dev. 27, 1–17 (2013).
- 37. Zan H., et al., The translesion DNA polymerase ζ plays a major role in Ig and bcl-6 somatic hypermutation. Immunity 14, 643–653 (2001).
- 38. Zan H., et al., The translesion DNA polymerase θ plays a dominant role in immunoglobulin gene somatic hypermutation. EMBO J. 24, 3757–3769 (2005).
- 39. Weill J.-C., Reynaud C.-A., DNA polymerases in adaptive immunity. Nat. Rev. Immunol. 8, 302–312 (2008).
- 40. Ouchida R., et al., Genetic analysis reveals an intrinsic property of the germinal center B cells to generate A:T mutations. DNA Repair (Amst.) 7, 1392–1398 (2008).
- 41. Masuda K., et al., DNA polymerase η is a limiting factor for A:T mutations in Ig genes and contributes to antibody affinity maturation. Eur. J. Immunol. 38, 2796–2805 (2008).
- 42. McHeyzer-Williams L. J., Milpied P. J., Okitsu S. L., McHeyzer-Williams M. G., Class-switched memory B cells remodel BCRs within secondary germinal centers. Nat. Immunol. 16, 296–305 (2015).
- 43. Supek F., Lehner B., Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547.e23 (2017).
- 44. Rogozin I. B., et al., DNA polymerase η mutational signatures are found in a variety of different types of cancer. Cell Cycle 17, 348–355 (2018).
- 45. Calo E., Wysocka J., Modification of enhancer chromatin: What, how, and why? Mol. Cell 49, 825–837 (2013).
- 46. Lieber M. R., Mechanisms of human lymphoid chromosomal translocations. Nat. Rev. Cancer 16, 387–398 (2016).
- 47. Ye B. H., et al., Chromosomal translocations cause deregulated BCL6 expression by promoter substitution in B cell lymphoma. EMBO J. 14, 6209–6217 (1995).
- 48. Eid M. M. A., et al., Integrity of immunoglobulin variable regions is supported by GANP during AID-induced somatic hypermutation in germinal center B cells. Int. Immunol. 29, 211–220 (2017).
- 49. Hatzi K., Melnick A., Breaking bad in the germinal center: How deregulation of BCL6 contributes to lymphomagenesis. Trends Mol. Med. 20, 343–352 (2014).
- 50. Martincorena I., et al., Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
- 51. Yokoyama A., et al., Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
- 52. Yizhak K., et al., RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
- 53. Schmitt M. W., et al., Sequencing small genomic targets with high efficiency and extreme accuracy. Nat. Methods 12, 423–425 (2015).
- 54. Shen J.-C., Loeb L. A., A high resolution landscape of mutations in the BCL6 super-enhancer in normal B cells. BioProject. https://www.ncbi.nlm.nih.gov/sra/PRJNA574179. Deposited 25 September 2019.
Data Availability Statement
Sequencing data presented in this paper can be accessed at NCBI BioProject PRJNA574179 (54).