Proceedings of the National Academy of Sciences of the United States of America. 2009 May 14;106(21):8629–8634. doi: 10.1073/pnas.0903803106

V-region mutation in vitro, in vivo, and in silico reveal the importance of the enzymatic properties of AID and the sequence environment

Thomas MacCarthy a,1, Susan L Kalis b,1, Sergio Roa b, Phuong Pham c, Myron F Goodman c, Matthew D Scharff b,2, Aviv Bergman b,2
PMCID: PMC2682541  PMID: 19443686

Abstract

The somatic hypermutation of Ig variable regions requires the activity of activation-induced cytidine deaminase (AID) which has previously been shown to preferentially deaminate WRC (W = A/T, R = A/G) motif hot spots in in vivo and in vitro assays. We compared mutation profiles of in vitro assays for the 3′ flanking intron of VhJ558-Jh4 region to previously reported in vivo profiles for the same region in the Msh2−/−Ung−/− mice that lack base excision and mismatch repair. We found that the in vitro and in vivo mutation profiles were highly correlated for the top (nontranscribed) strand, while for the bottom (transcribed) strand the correlation is far lower. We used an in silico model of AID activity to elucidate the relative importance of motif targeting in vivo. We found that the mutation process entails substantial complexity beyond motif targeting, a large part of which is captured in vitro. To elucidate the contribution of the sequence environment to the observed differences between the top and bottom strands, we analyzed intermutational distances. The bottom strand shows an approximately exponential distribution of distances in vivo and in vitro, as expected from a null model. However, the top strand deviates strongly from this distribution in that mutations approximately 50 nucleotides apart are greatly reduced, again both in vivo and in vitro, illustrating an important strand asymmetry. While we have confirmed that AID targeting of hot and cold spots is a key part of the mutation process, our results suggest that the sequence environment plays an equally important role.

Keywords: simulation, somatic hypermutation, variable-region


Humans and mice have evolved a complex molecular and cellular process to generate a highly diverse antibody response that protects them from infectious agents and toxic substances in the environment. Initially, a large repertoire of antigen binding sites is generated through the rearrangement of different combinations of germline immunoglobulin (Ig) variable (V), diversity (D), and joining (J) elements to form the heavy (H) and light (L) chain V region genes (reviewed in ref. 1). Since these germline-encoded antibodies are often not protective, they are further diversified after exposure to antigen through somatic hypermutation (SHM) to produce high affinity antibodies (2, 3). SHM occurs in the germinal centers of secondary lymphoid organs (4) where the B cells differentiate into centroblasts and express large amounts of activation-induced cytidine deaminase (AID), which initiates and is required for SHM of the Ig H and L chain V regions (5–7). The AID-dependent mutation of Ig V regions occurs at frequencies of 10⁻⁵ to 10⁻³ per base pair (8) and requires transcription (2, 3).

In vivo, AID initiates hypermutation of both the transcribed and nontranscribed strands of Ig variable region genes by deaminating dC residues during transcription. In the first phase of SHM, the resulting uracils are either replicated over, resulting in C to T transitions, or excised by uracil DNA glycosylase (UNG) during short patch base excision repair (BER) and replaced with incorrect nucleotides, perhaps by copying abasic sites, resulting in transition or transversion mutations. This first phase of SHM is responsible for the mutations at dC bases (2, 3). In the second phase of SHM the G·U or G·abasic moiety mismatches are excised along with neighboring bases by either long patch BER or mismatch repair (MMR), and the strand that is removed is resynthesized primarily by Polη (9), and perhaps other error prone translesional polymerases, to generate the mutations in A and T bases and to repair some of the mutations generated by AID. While the effects of BER and MMR may complicate the identification of the motifs that are the primary targets of AID in vivo, studies with mice that are genetically deficient in both of these repair processes (10) confirm that AID deaminates dC residues on both DNA strands primarily in WRC (W = A or T, R = A or G) hot spot motifs and avoids SYC (S = G or C; Y = pyrimidine) cold spots. The presence of favored and disfavored motifs for deamination suggests that the primary DNA sequence itself directs the targeting of AID activity to particular sites in the V region in vivo.
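The WRC and SYC definitions above translate directly into a small classifier. The following is a minimal Python sketch, not code from this study: the function names, the toy sequence, and the decision to label a C that lacks two 5′ neighbors as neutral are assumptions made for illustration.

```python
# Illustrative sketch: label each C by the two bases on its 5' side.
W = set("AT")  # W = A or T
R = set("AG")  # R = A or G
S = set("GC")  # S = G or C
Y = set("CT")  # Y = pyrimidine (C or T)

def classify_c_sites(seq):
    """Return {0-based position: 'hot' | 'cold' | 'neutral'} for every C in seq."""
    seq = seq.upper()
    labels = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        if i < 2:
            # Assumption: without a full trinucleotide context, call the site neutral.
            labels[i] = "neutral"
            continue
        first, second = seq[i - 2], seq[i - 1]
        if first in W and second in R:
            labels[i] = "hot"      # WRC motif
        elif first in S and second in Y:
            labels[i] = "cold"     # SYC motif
        else:
            labels[i] = "neutral"
    return labels

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

toy = "AGCTGCCAACT"  # toy sequence, not the Jh4 intron
print(classify_c_sites(toy))
```

Cs on the bottom strand can be classified with the same function by passing `reverse_complement(top_strand_sequence)`, which is why bottom-strand hot spots read as GYW when written on the top strand.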

Biochemical studies with purified AID show that the substrate for AID is single stranded DNA (ssDNA) (11–13). The strong preference for the deamination of dC in WRC motifs and avoidance of SYC motifs seen in vivo is also observed with single stranded DNA substrates or when double stranded DNA is transcribed with either T7 or Escherichia coli RNA polymerases and treated with AID in vitro (14, 15). In vitro AID acts processively through a process of sliding and jumping (12, 14), as has been described for APOBEC3G (16) and restriction enzymes (17), resulting in multiple mutations in single substrate molecules once they are contacted by AID. In vivo AID undergoes posttranslational modifications (18–20), interacts with DNA binding and other proteins (reviewed in ref. 3), requires ongoing transcription that generates supercoiled and ssDNA that can serve as the substrate for AID (14, 21, 22), and encounters secondary DNA structures (23, 24) and chromatin (reviewed in ref. 25). Any of these could affect the targeting of AID to particular gene segments and to particular motifs within such gene segments, but it remains to be established how this targeting really occurs (26).

Since the same hot spot motifs are preferentially targeted in vivo and in vitro with purified AID, it has been suggested that the AID protein contains much of the information required for its selection of substrate motifs (12–14). However, since these in vitro studies have been done with synthetic ssDNA substrates or the lacZα gene, these data have not been used to determine the role of the sequences that adjoin the hot and cold spots in actual V regions so as to quantify the differences and similarities between the selectivity of purified AID and the enzyme preferences in vivo. Here we have reexamined the detailed targeting of purified AID using the VhJ558-Jh4 downstream intron as a substrate in vitro and compared the results to the dataset of mutations on the same region in vivo in mice that lack both BER and MMR (Msh2−/−Ung−/− mice) (10) and have mutations caused solely by the action of AID. We have also simulated the targeting of AID to this natural substrate using minimal assumptions suggested by the in vitro studies. We have found a remarkable similarity of the targeting and spacing of mutations in the upper nontranscribed strand of the VhJ558-Jh4 intron and a significant, but less strong correlation, in the lower transcribed strand. The good agreement between the in vitro and in vivo mutational spectra suggests that the AID protein and the overall sequence environment surrounding hot and cold spots play critical roles in determining which bases are deaminated.

Results

The Spatial Distribution of Mutations of Murine Jh4 Intron in Vitro and in Vivo.

To directly address the targeting of AID activity on a naturally occurring V region substrate, the murine heavy chain VhJ558-Jh4 3′ flanking region (referred to from now on as the Jh4 intron) was inserted downstream of the lacZα gene of M13mp2 phage in both orientations (supporting information (SI) Fig. S1). We chose the Jh4 intron as a substrate to compare the activity of AID in vitro and in vivo because Rada and Neuberger (10) generated a dataset of mutations that are attributable to the direct biochemical action of AID in vivo in MMR and BER doubly deficient mice. This particular region adjoins the V region coding exon but includes only a small part of the actual V region and is therefore close enough to the transcription start site to reflect the high rates of mutation that occur in the coding part of the V region. Since we have excluded from this analysis almost all of the sequence from the 3′ end of the coding exon of the V region, the sequences we are analyzing are noncoding and have not been subjected to selection for higher affinity antibodies in vivo (10). To identify the C residues that are deaminated by AID in vitro, gapped ssDNA substrates carrying either the top (nontranscribed) or bottom (transcribed) strand of the Jh4 intron were incubated with human recombinant GST-AID and electroporated into ung-deficient E. coli. The Jh4 intron was sequenced from DNA isolated from individual mutant phage progeny (light blue and white plaques). Deaminations were detected as C→T transition mutations in the mutant phage DNA (Fig. 1). Since AID acts processively on ssDNA (12, 14), light blue or white mutant phage, which have mutations in the lacZα region, are expected to carry mutations also in the adjacent Jh4 intron within the ssDNA gap. Under the reaction conditions used, where only 5–10% of the phage were mutated, we observed that all of the mutant phage contained mutations in the Jh4 intron. The average number of mutations in the Jh4 insert was 14.5 for the top strand and 27.9 for the bottom strand. These data are similar to those reported previously using only the lacZα insert as a substrate for AID in vitro and confirm that under these conditions, AID only acts on a small percentage of the phage (<5%) and is processive once it deaminates a C in a particular insert (12, 14).

Fig. 1.


Spatial AID-catalyzed mutation distribution in the Jh4 intron in vitro and in vivo. The distribution of mutations along the length of the sequence (horizontal axis) for the top (A) and bottom (B) strands is shown. Each graph compares in vivo (vertical axis, upwards) with in vitro (vertical axis, downwards) by showing the percentage of sequences mutated at each site. WRC hot spots and SYC cold spots (GYW and GRS on the bottom strand) are highlighted as shown. Note that both the leftmost 60 nt on the top strand and rightmost 60 nt on the bottom strand shown here were removed for the subsequent analysis.

In Fig. 1 we have compared the spatial (site-by-site) distribution of C→T mutations created by AID in vivo in the Jh4 intron of murine B cells from Msh2−/−Ung−/− mice to the spatial distribution of C→T mutations when the same Jh4 region is accessible as ssDNA in an M13 phage construct and treated with recombinant purified AID in vitro. Fig. 1 shows the spatial mutation profile as the frequency of mutation (number of mutations/number of sequences) at each site. In the top strand, there was a high coincidence in the spatial distribution of the mutations that occurred in vivo and in vitro, leading to a correlation of r = 0.704 (P = 1.87 × 10⁻¹², F-statistic). With the bottom (transcribed) strand, the coincidence between the in vivo and in vitro spatial distributions is still detectable but the correlation is far lower (r = 0.319 with P = 1.12 × 10⁻⁴, F-statistic). This asymmetry is not due to enrichment of hot or cold spots relative to the number of C sites on either strand (Table S1).
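The site-by-site comparison described above amounts to computing a per-site mutation frequency for each dataset and correlating the two profiles. The sketch below illustrates that calculation in plain Python. The function names and the toy data are hypothetical, and which sites enter the comparison is left to the caller, since the text does not spell this out.

```python
import math

def site_frequencies(mutations_per_sequence, sites):
    """Fraction of sequences mutated at each site (mutations / sequences)."""
    n_sequences = len(mutations_per_sequence)
    return [
        sum(1 for muts in mutations_per_sequence if site in muts) / n_sequences
        for site in sites
    ]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data: each set holds the mutated positions of one sequenced clone.
sites = [3, 8, 15, 21]
in_vivo = [{3, 8}, {3}, {3, 15}, {8}]
in_vitro = [{3}, {3, 8}, {3, 21}, {3, 8, 15}]

vivo_profile = site_frequencies(in_vivo, sites)
vitro_profile = site_frequencies(in_vitro, sites)
r = pearson_r(vivo_profile, vitro_profile)
print(vivo_profile, vitro_profile, round(r, 3))
```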

As can be seen in Fig. 1, the sequences of the Jh4 intron immediately adjacent to the lacZα sequences (in Fig. 1 (Left) top strand and (Right) bottom strand) are more highly mutated than the rest of the gapped substrate. This is most likely an artifact of our experimental setup in which we initially use the gap assay to select for mutations in lacZα and rely on the processivity of AID to ensure that recovered phage carry mutations within the adjacent inserted Jh4 intron (Fig. S1). Since this may create an overabundance of mutations in the 60 bases of the Jh4 intronic sequence immediately downstream of lacZα, we eliminated those 60 nts from the analysis of each strand in the correlations calculated above.

In addition, more mutations per sequenced region were generated in vitro than were present in the in vivo sequences. This could lead to saturation effects in the in vitro data that were not present in the in vivo data so we iteratively excluded the most highly mutated in vitro sequences to bring the average number of mutations/V region down to the level of the in vivo data (see Methods). This reduced dataset was used for Fig. 1 and the subsequent correlation analysis. However, when the censored in vitro sequences are included in the calculation, the high correlation between the in vivo and in vitro data on the top strand and the lower correlation on the bottom strand is still present (data not shown).
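One way to picture the iterative exclusion step is shown below. This is a hedged sketch rather than the procedure from the Methods, which is not reproduced in this section: the function name, the stopping rule (stop once the mean load is at or below the target), and the toy counts are assumptions.

```python
def reduce_to_target_load(mutation_counts, target_mean):
    """Drop the most mutated sequences one at a time until the mean
    number of mutations per sequence is at or below target_mean."""
    kept = sorted(mutation_counts)
    while len(kept) > 1 and sum(kept) / len(kept) > target_mean:
        kept.pop()  # the list is sorted, so this removes the current maximum
    return kept

in_vitro_counts = [2, 5, 9, 30, 41]  # toy per-sequence mutation counts
reduced = reduce_to_target_load(in_vitro_counts, target_mean=6)
print(reduced, sum(reduced) / len(reduced))
```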

While we believe that these modifications of the data increase the reliability of the comparison between the in vivo and in vitro data, as noted, they have only a small effect on the correlations. The important point is that, at least for the top strand, there is a remarkable site-to-site correlation between the mutations that AID generates in this natural substrate in vivo and in vitro despite the enormous differences in these 2 environments.

Hot and Cold Spot Mutability Provides a Model for the Enzymatic Targeting of AID.

The high correlation between the distribution of in vivo and in vitro mutations in the top strand of the Jh4 intron suggests that the AID protein and the sequence context of the top strand contain much of the information required for in vivo AID targeting. To quantify the role of AID and of the sequence context of the targeted DNA on the process of mutation, we used a predictive in silico stochastic model of AID activity that is based exclusively on hot and cold spot mutability in the Jh4 intron. To build this model, we began by measuring frequencies of C→T mutations at WRC hot spots, at SYC cold spots and at the remaining neutral sites (i.e., non hot or cold spot C sites) in each of the strands in the in vivo dataset and correcting for base composition (27) (measured mutation frequencies per site on the top strand, hot spots: 21.39%, neutral sites: 5.45%, cold spots 1.51%; for the bottom strand, hot spots: 14.26%, neutral sites: 3.48%, cold spots: 1.04%). To apply these in vivo parameters in silico to the unmutated germline Jh4 intronic sequence, we generated simulated mutated sequences that accumulated the same total number of mutations as observed in each sequence from the in vivo dataset. For each simulated mutation, first a mutation type (hot spot, cold spot, neutral) was chosen based on the discrete probability of mutation that had been observed in vivo. Then a random C site that corresponded to the chosen motif type was mutated to T. Using this hot/cold spot model, where the observed frequency of mutation at hot/cold spots is the only variable, we produced 10,000 simulated datasets, each one simulating the exact size (number of sequences) and the load (number of mutations per sequence) of the observed in vivo dataset.
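The two-step sampling described above (choose a motif class, then choose a random C site of that class) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The per-site frequencies are the top-strand values quoted in the text, but weighting each class by its per-site frequency times its number of sites, sampling without replacement within a sequence, and all names and toy site positions are assumptions.

```python
import random

# Per-site C->T frequencies quoted in the text for the top strand in vivo.
TOP_STRAND_FREQ = {"hot": 0.2139, "neutral": 0.0545, "cold": 0.0151}

def simulate_sequence(sites_by_class, per_site_freq, n_mutations, rng):
    """Return the set of mutated C positions for one simulated sequence."""
    mutated = set()
    while len(mutated) < n_mutations:
        # Sites of each class that have not been mutated yet.
        remaining = {
            cls: [s for s in sites if s not in mutated]
            for cls, sites in sites_by_class.items()
        }
        classes = [cls for cls, sites in remaining.items() if sites]
        if not classes:
            break  # every C site is already mutated
        # Assumption: class weight = per-site frequency x number of sites in the class.
        weights = [per_site_freq[cls] * len(sites_by_class[cls]) for cls in classes]
        chosen = rng.choices(classes, weights=weights, k=1)[0]
        mutated.add(rng.choice(remaining[chosen]))
    return mutated

def simulate_dataset(sites_by_class, per_site_freq, loads, rng):
    """One simulated dataset with the same size and per-sequence load as observed."""
    return [simulate_sequence(sites_by_class, per_site_freq, n, rng) for n in loads]

toy_sites = {"hot": [2, 5, 9], "neutral": [12, 20], "cold": [6]}  # hypothetical positions
observed_loads = [1, 2, 3]  # mutations per sequence in a toy "observed" dataset
dataset = simulate_dataset(toy_sites, TOP_STRAND_FREQ, observed_loads, random.Random(0))
print(dataset)
```

Repeating `simulate_dataset` many times (10,000 in the study) gives the collection of simulated datasets that is then compared with the observed one.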

Each simulated dataset was then compared (correlation of mutation frequency across sites) to the actual in vivo dataset. We would expect the correlations for the simulated datasets to be high (close to 1) if the model represents all of the underlying processes particularly well. To quantify the degree of correlation between the in silico hot/cold spot model and the in vivo data, we constructed a distribution of correlations, as shown by the dashed curves in Fig. 2 A and B for the top and bottom strands, respectively. For comparative purposes, we also simulated a null model, which does not assume hot/cold spot preferences (continuous lines in Fig. 2) and would be expected, as observed, to have no correlation with the in vivo results if hot spots and cold spots are important. We also constructed a “full” model that uses the actual mutation frequency (dotted lines in Fig. 2) of each site as probabilities. This latter model might be expected to have a correlation of 1 and to the extent that it does not achieve that, it reveals the intrinsic variation (or error of the method) of the in silico modeling process that results from the limited size of the in vivo dataset.
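The null and full models differ from the hot/cold spot model only in the weight given to each site, so both can be expressed with one weighted sampler. The sketch below is illustrative: the names, the toy frequencies, the 200 replicates (far fewer than the 10,000 used in the study), and the convention of returning 0 for a constant profile are all assumptions.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def simulate_frequencies(weights, loads, rng):
    """Simulate one dataset and return its per-site mutation frequency."""
    counts = [0] * len(weights)
    for n_mutations in loads:
        remaining = {i: w for i, w in enumerate(weights) if w > 0}
        for _ in range(n_mutations):
            if not remaining:
                break
            site = rng.choices(list(remaining), weights=list(remaining.values()), k=1)[0]
            counts[site] += 1
            del remaining[site]  # a site is mutated at most once per sequence
    return [c / len(loads) for c in counts]

def correlation_distribution(weights, observed_freq, loads, n_datasets, rng):
    """Correlation of each simulated dataset with the observed profile."""
    return [
        pearson_r(simulate_frequencies(weights, loads, rng), observed_freq)
        for _ in range(n_datasets)
    ]

observed = [0.60, 0.10, 0.40, 0.05, 0.30, 0.02]  # toy per-site frequencies
loads = [2] * 40                                  # toy mutations per sequence
rng = random.Random(1)
null_r = correlation_distribution([1.0] * len(observed), observed, loads, 200, rng)
full_r = correlation_distribution(observed, observed, loads, 200, rng)
print(sum(null_r) / len(null_r), sum(full_r) / len(full_r))
```

Even the full model falls short of r = 1 in such a simulation because each dataset is a finite sample, which is the intrinsic variation referred to above.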

Fig. 2.


Correlation distributions for simulated datasets. The dashed line in each graph represents correlations for the hot/cold spot model. Each curve is a normalized histogram with area under the curve equal to 1. The continuous lines (for the null model) and dotted lines (for the full model) are shown for comparative purposes. Simulation results are shown for top strand (A) and bottom strand (B) in vivo, and top strand (C) and bottom strand (D) in vitro.

In Fig. 2A, when we used the hot/cold spot model (dashed line), the mean of the distribution of correlations with the in vivo mutations on the top strand (r = 0.604) indicates that the model captures more than half of the full complexity that determines the observed in vivo spatial mutation distribution for the top strand. For the bottom strand, shown in Fig. 2B, this proportion drops to slightly below half (r = 0.482), suggesting, as in Fig. 1, that other factors play a larger role in vivo in the targeting of AID to the bottom strand. We can also weigh this comparison of the in vivo and in silico results against the comparison of the in vivo and the in vitro results reported in Fig. 1. On the top strand the in vitro assay does a better job of reflecting the in vivo data than the in silico model (r = 0.704 vs. r = 0.604, t test, P = 0.01), whereas on the bottom strand the performance of the in silico model is better than the in vitro assay (r = 0.482 vs. r = 0.319, t test, P = 5.42 × 10⁻⁴).

To test how closely the in silico model represents the in vitro assay (as opposed to the in vivo process), we carried out a new set of simulations, now using as the simulating variable the observed frequency of mutations in hot spots, cold spots and the neutral triplets from the in vitro data, and then compared the simulated datasets against the actual in vitro dataset. The results are shown in Fig. 2C for the top strand, and in Fig. 2D for the bottom strand. The mean correlation for the top strand is r = 0.566, which is comparable to the r = 0.604 obtained during the simulation of the in vivo data, although still significantly lower (t test, P < 10⁻¹⁶). For the bottom strand, however, the correlation during the simulation of the in vitro data is r = 0.362, which is lower than the r = 0.482 obtained for the in vivo data (also significant; t test, P < 10⁻¹⁶). These findings suggest that in both in vivo and in vitro scenarios, roughly half of the complexity of the targeting of purified AID activity on the top strand is attributable to hot/cold spot discrimination by AID. Compared to the top strand, the complexity observed at the bottom strand both in vivo and in vitro is less well modeled by in silico simulation of hot/cold spot targeting. All these results suggest the existence, even in vitro, of other variables such as interference between sites affecting the mutation process beyond the positioning of hot/cold spot motifs. In addition, the inherent differences between the top and the bottom strand that are reflected in both the in vivo and in vitro experimental results are retained in silico, suggesting that these differences reflect some general differences in the sequence environments of the hot and cold spots in the top and bottom strand.

Spatial Periodicity of Mutations on the Top Strand.

One explanation for the observed strand-specific asymmetry may lie in differences in the spatial distribution of mutations. To examine this we analyzed the intermutation distances between successive mutations. In each sequence with 2 or more mutations, the distance between each mutation and its nearest neighboring mutation either 5′ or 3′ was determined. Fig. 3 shows the distribution of intermutation distances for the top strand in vivo (A) and in vitro (C), as well as for bottom strand in vivo (B) and in vitro (D). Assuming a simple null model in which mutations occur randomly along the length of the sequence (a Poisson process), an exponential distribution is expected, as shown by the continuous (exponential fit) lines in each panel in Fig. 3. The top strand clearly exhibits a trough at around 50 bp, followed by a peak between 50 and 100 bp. The bottom strand, on the other hand, shows only a small tendency for this feature. The difference between the top and bottom strand distributions is statistically significant for the in vivo case and marginally significant for the in vitro case (Kolmogorov-Smirnov, P = 0.0496 in vivo, P = 0.0917 in vitro). At the same time, comparing the distributions between in vivo and in vitro (A vs. C and B vs. D) shows no significant differences (Kolmogorov-Smirnov, P = 0.236 top strand, P = 0.652 bottom strand), demonstrating that the in vitro system correctly emulates the in vivo system as far as spatial mutation determinants are concerned.
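The nearest-neighbor distance calculation and the exponential null expectation can be sketched as follows. The function names and toy positions are hypothetical, and fitting the exponential by its maximum-likelihood rate (one over the mean distance) is an assumption, since the fitting method is not stated in this section.

```python
import math

def nearest_neighbor_distances(positions):
    """Distance from each mutation to its nearest neighbor (5' or 3').
    Sequences with fewer than 2 mutations contribute nothing."""
    pos = sorted(set(positions))
    if len(pos) < 2:
        return []
    distances = []
    for i, p in enumerate(pos):
        left = p - pos[i - 1] if i > 0 else math.inf
        right = pos[i + 1] - p if i < len(pos) - 1 else math.inf
        distances.append(min(left, right))
    return distances

def exponential_bin_fractions(distances, bin_edges):
    """Expected fraction of distances per bin under an exponential null.
    Assumption: the rate is the maximum-likelihood estimate, 1 / mean distance."""
    rate = len(distances) / sum(distances)
    return [
        math.exp(-rate * lo) - math.exp(-rate * hi)
        for lo, hi in zip(bin_edges, bin_edges[1:])
    ]

toy_positions = [10, 14, 60, 130]  # mutated positions in one toy sequence
d = nearest_neighbor_distances(toy_positions)
expected = exponential_bin_fractions(d, [0, 50, 100, math.inf])
print(d, expected)
```

A trough such as the one described for the top strand would appear as an observed bin fraction well below the corresponding expected value.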

Fig. 3.


Distribution of intermutation distances. Each graph shows the distribution of distances between mutations in sequences containing 2 or more mutations. Distributions for top strand in vivo (A) and in vitro (C) and for bottom strand in vivo (B) and in vitro (D) are shown. To represent the null model, a best-fit exponential curve (solid line) is also shown.

The fact that the trough is not observed on the bottom strand suggests that this feature could be an outcome of a difference in the spatial layout of C sites in the top and bottom strands, possibly via particular localizations for hot and cold spots. To test this, we generated a simulated dataset using the in silico hot/cold spot model (see SI Methods) to determine whether the hot/cold spot distribution is sufficient to explain the trough feature observed on the top strand. We found that the trough disappears in this case (Fig. S2a), becoming more similar to the null model (Fig. S2b and solid lines in each panel). However, if the full in silico model, which uses the actual mutation frequency of each site as probabilities, is applied, then the trough is retained (Fig. S2c) albeit less markedly than for the actual in vivo dataset (Fig. 3A). Similar results were found in simulations using in vitro parameters (Fig. S3). These results suggest that the trough feature is very likely to be dependent on sequence characteristics beyond the simple susceptibility to mutation of hot and cold spots.

Discussion

Previous studies (11, 12) had established an in vitro system which consists of purified GST-AID made in insect cells acting on gapped single stranded lacZα substrate in a phage vector. While the high mutation rates at WRC hot spots and low mutation rates at SYC cold spots observed in lacZα in this in vitro system were similar to the pattern of AID-induced mutations in vivo, it was not possible to quantitatively and directly compare these results to the in vivo patterns of mutation because the substrate contained entirely different deamination target motifs and sequence contexts. In the studies described here, most of the single stranded substrate consisted of either the top (nontranscribed) or bottom (transcribed) strand of the Jh4 intron derived from the rearranged murine heavy chain VhJ558-Jh4 region that has been analyzed in many studies of genetically deficient mice (2). We have compared the in vitro pattern of mutation of the Jh4 intron to the mutations that arise in that intron in vivo in mice that lack both BER and MMR (10). In the absence of those 2 repair processes, the frequency and location of the mutations that are observed in vivo should reflect only the activity of AID. We found that the patterns of AID activity in vitro and in vivo are highly correlated (r = 0.704) for the top (nontranscribed) strand. This high correlation demonstrates that, at least in this case, the large number of in vivo factors such as AID interacting proteins, transcription, nucleosome positioning, and DNA structures that have been reported to exist in the V region in vivo (2, 3), are not dominant. Furthermore, although some of those factors are likely to be important in vivo, cumulatively they do not play a larger role in the targeting of hot spots and avoidance of cold spots on the nontranscribed strand than the inherent enzymatic properties of AID itself.

The high correlation between the patterns of in vivo and in vitro mutations in the top strand also suggested the possibility of using a computational modeling approach (28), specifically that an in silico model, based exclusively on hot spot, cold spot and neutral site mutation frequencies, might be capable of representing much of the complexity of the mutation process. However, we found that the in silico model falls considerably short of this objective. For example, if we use the in vivo data as the benchmark, then the in silico model has a lower correlation (r = 0.604) with in vivo mutation than the in vitro system (r = 0.704) for the top strand. At least for the top strand, this shows that slightly more than half of the complexity is captured by this simple in silico model (Fig. 2A). The approximately 10% difference (0.604→0.704) between the in silico and in vitro correlations must lie in other features that are either sequence-dependent, such as interaction effects between sites due to spacing between hot spots and dC sites, or which are dependent on inherent properties of the AID enzyme. The fact that the in vitro system performs better than the in silico model suggests that the in vitro system can be used to investigate some of this additional complexity.

The in silico model presented here uses 3 parameters, i.e., mutation frequencies at hot spot, cold spot and neutral sites. Previous research (12, 29, 30) evaluating mutability for all 16 possible (XXC) trinucleotides suggests that AID mutability is not so discontinuous and roughly follows a gradient from hot to cold mutability. Any in silico model which uses more parameters (for example, if the WRC motif parameter were to be divided into AAC, AGC, TAC, and TGC motifs each with its own parameter) has the capacity to fit the data better than a simpler counterpart. However, increasing the number of parameters leaves us with fewer sites to estimate the value of each parameter, especially since some of the triplets may not be present or be present only once in the limited sequence length, thus reducing the effectiveness of the model. Using a standard technique (see Table S2 and SI Methods), which measures model effectiveness on unseen validation data, we compared the 3-parameter hot/cold spot model to an alternative 16-parameter version based on all possible trinucleotide frequencies. Our results indeed show that the performance of the 16-parameter model on unseen validation data is significantly degraded, whereas for the 3-parameter model this difference is not statistically significant, making the 3-parameter model preferable. Thus, in the context of these studies, the biological relevance of the 16-parameter model is still unclear; confirming its relevance will require use of datasets spanning multiple sequences in which there is adequate representation of all of the triplets.
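The sparsity problem with a 16-parameter model can be made concrete by counting how often each XXC context occurs in a sequence. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and it does not reproduce the validation procedure of Table S2.

```python
from collections import Counter
from itertools import product

def trinucleotide_context_counts(seq):
    """Count occurrences of each of the 16 XXC contexts (C at the 3' position)."""
    seq = seq.upper()
    counts = Counter({a + b + "C": 0 for a, b in product("ACGT", repeat=2)})
    for i in range(2, len(seq)):
        if seq[i] == "C" and seq[i - 2] in "ACGT" and seq[i - 1] in "ACGT":
            counts[seq[i - 2:i + 1]] += 1
    return counts

toy = "AGCTGCCAACT"  # toy sequence, not the Jh4 intron
counts = trinucleotide_context_counts(toy)
absent = sorted(ctx for ctx, n in counts.items() if n == 0)
print(counts["AGC"], counts["GCC"], len(absent))
```

Contexts that are absent or seen only once give a 16-parameter model nothing reliable to estimate from, whereas pooling contexts into hot, cold, and neutral classes keeps every parameter supported by several sites.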

When we examined the spacing of mutations, we found that in the top strand both in vivo and in vitro there was a decrease in the frequency of mutations that were approximately 50 nucleotides apart (the trough feature, Fig. 3). This periodicity is lost in the in silico model that depends only on the hot/cold spot assumption, but is observed when the full sequence context is included by using the frequency of mutation from each individual site (full model Figs. S2 and S3). Periodicity of 175–180 mutated nucleotides separated by approximately 20 unmutated nucleotides has been reported in an ectopically integrated transgene that contained many hot spots in mice expressing AID, BER and MMR (31). These authors suggested that nucleosome phasing and spacing might be one possible explanation for the spacing they observed. By contrast, the main feature of the spacing that we observed is a strong reduction in mutations separated by approximately 50 nucleotides and which furthermore is observed both in vivo and in vitro, suggesting either a higher order sequence organization or a property of AID that allows it to reproducibly skip a fixed distance from one targeted region to another within the V region. The fact that, at least in vitro, AID binds equally well to ssDNA that does not contain any dC as to substrates containing dC and hot spots (13) makes it difficult to interpret these results because we do not know where AID makes its initial contact with the substrate. However, based on the crystal structure of APOBEC2 (32) the AID dimer has maximum length of approximately 127 Å. If the interbase distance of the DNA is 5 Å, we speculate that a dimer would cover about 25 bp and a tetramer might occupy the 50 bp. It is possible that the sliding and jumping characteristics of AID processivity observed in vitro may also explain the trough feature seen in Fig. 3. However, we see the same spacing in the Jh4 intron in vivo. There is little evidence that there is more than one AID-induced mutation per cell division in vivo, and mutations in 2 neighboring dCs often occur in successive cell divisions (31), so if AID is processive in vivo it would have to persist through multiple cell divisions.
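The footprint estimate in this paragraph is simple arithmetic and can be written out explicitly. In the expression below, the symbols n_dimer and n_tetramer are introduced here for illustration, and treating a tetramer as spanning twice the length of a dimer is an assumed reading of the speculation in the text, not a structural result (the \text command requires the amsmath package).

```latex
\[
n_{\text{dimer}} \approx \frac{127\ \text{\AA}}{5\ \text{\AA\ per base}} \approx 25.4,
\qquad
n_{\text{tetramer}} \approx 2\, n_{\text{dimer}} \approx 51 .
\]
```

Both values are close to the rounded figures of about 25 and 50 given above, the latter matching the approximately 50-nucleotide spacing at which mutations are depleted on the top strand.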

When comparing the in vitro and in vivo mutation patterns, we also found a significant but weak correlation (r = 0.319) in the bottom, transcribed strand, in contrast with the strong correlation in the top strand (r = 0.704). The trough feature is absent on the bottom strand, which may contribute to, or reflect, the observed differences between top and bottom strands. Differences in somatic hypermutation patterns between the nontranscribed and transcribed strands have been noted in previous studies both for the Jh4 intron and other regions. In a previous analysis of the same in vivo data we used in this study, Xiao et al. (33) found that, when corrected for base composition, the top strand had a significantly higher mutation rate than the bottom strand. One possible explanation for the observed differences between top and bottom strands is their differences in base composition. The bottom strand contains far more dCs (141 sites or 27% of the total) than the top strand (75 sites or 14%), although the proportion of Cs in hot spots on each strand is not significantly different (Table S1). It is possible that the relative crowding of dC residues and of hot spots (see Fig. 1) has created additional “secondary” complexity of the mutation process on the bottom strand. By contrast, the relative sparseness and periodicity of mutations (Fig. 3 A and C) on the top strand could lead to less complex behavior which is more easily reproduced both in vitro and in silico. Clearly, as this secondary complexity on the bottom strand increases in importance, the behavior deviates further from the “canonical” behavior, represented by the in silico model. The relatively poor performance (r = 0.362) of the in silico model in reproducing the in vitro mutation pattern on the bottom strand suggests that this secondary complexity has had a particularly strong effect on the bottom strand in vitro assay. The secondary complexity may arise from interference between sites (nonindependence).
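Because every G on the top strand pairs with a C on the bottom strand, the dC content of both strands can be read from the top-strand sequence alone. The following minimal sketch shows the calculation on a toy sequence; the function name and the sequence are hypothetical, and the counts reported in the text (75 and 141 sites) come from the actual Jh4 intron, not from this example.

```python
def c_content_by_strand(top_seq):
    """Return {strand: (number of C sites, fraction of sequence length)}."""
    top = top_seq.upper()
    length = len(top)
    top_c = top.count("C")
    bottom_c = top.count("G")  # each G on the top strand pairs with a C on the bottom
    return {
        "top": (top_c, top_c / length),
        "bottom": (bottom_c, bottom_c / length),
    }

toy = "AGCTGGGACT"  # toy sequence, not the Jh4 intron
composition = c_content_by_strand(toy)
print(composition)
```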

It is also possible that some higher order sequence organization of the DNA is, at least partly, responsible for some of the features we observe, for example, the decrease in mutation seen at 50 nucleotides and the strand differences. Based on computational analysis, it has been suggested that there may be stem-loop structures in the V region with a dominant length of 65 bp and that these can be correlated with the frequency of mutation (34). However, when transgenes were constructed that would favor such stem loops, either in nascent mRNA or DNA, and examined in vivo, the presence or absence of such potential stem loop structures did not affect the frequency or targeting of mutation (31). Computational analysis also suggested a correlation between potential stem loops in nascent mRNA and hot spots for mutation (35, 36). Since actual stem loops were not demonstrated to be present in these studies, it is still possible the sequence characteristics required for them to be present in chromatin in vivo have not yet been discovered. At the same time, it is possible that stem loop structures might play a nontrivial role for our in vitro dataset, especially since the in vitro system we use here involves a relatively long (approximately 1000 nt) region of ssDNA. We investigated this possibility by making stem loop structure predictions in silico using standard techniques (37) and used these predictions to analyze the effect of stem loops on the top and bottom strands (see SI Methods). Although we found that stem loop structures have no effect on hot spot mutations, we did detect significant effects for neutral and cold spots (Table S3). We also found that, at least for neutral sites, these putative stem-loop structures have a stronger effect on the bottom strand. If stem-loop structures play little or no role in vivo (31), this difference could contribute to the observed strand asymmetry in the form of a decreased in vitro vs. in vivo correlation for the bottom strand. These findings clearly call for further investigation.

In conclusion, we have used a natural substrate for AID that allowed us to confirm that there is a high correlation between the patterns of mutations that occur in vivo and in vitro, suggesting that the inherent characteristics of the AID enzyme play a major role in the increased targeting of hot spots and decreased targeting of cold spots in single-stranded DNA. Further, by comparing in vivo, in vitro, and in silico patterns of mutation, we have identified a pattern in the spacing of mutations that suggests that the sequence environment within which hot and cold spots exist contributes to the behavior of AID. The differences between the top and bottom strands also suggest that the overall sequence environment is important in how AID behaves.

Methods

Analysis of AID-Catalyzed C Deamination in Jh4 Intronic Sequence in Vitro.

A derivative of M13mp2 phage with 2 unique restriction endonuclease sites for PstI and BglII downstream of the lacZα gene (positions +281 and +293, respectively) was constructed by site-directed mutagenesis. The murine heavy chain VhJ558-Jh4 3′ flanking intron (referred to here as the Jh4 intron) was PCR amplified from genomic DNA and cloned into the PstI site in both orientations, giving M13mp2 phage derivatives with either the top (nontranscribed) or the bottom (transcribed) strand inserted downstream of the lacZα gene. Closed circular DNA gapped substrates with the lacZα-Jh4 intronic region as ssDNA were constructed by annealing purified M13 ssDNA with an approximately 6.8 kb PvuII-BglII fragment of M13 dsDNA using a protocol described previously (12). Recombinant human GST-AID protein was expressed in Sf9 cells and purified as described previously (38).

The deamination reactions (30 μl total volume), containing GST-AID (100 ng), RNase (200 ng), and a gapped DNA substrate (500 ng) dissolved in a reaction buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 1 mM DTT), were carried out at 37 °C for 10 min and terminated by twice extracting the DNA product with phenol:chloroform:isoamyl alcohol (25:24:1). The DNA products were transfected into uracil glycosylase-deficient (ung) E. coli cells and plated on α-complementation host cells in the presence of 5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside (X-gal) and isopropyl beta-D-1-thiogalactopyranoside (IPTG) as described (12). The occurrence of C deaminations within the lacZα ssDNA gap region gives rise to either white or light blue mutant M13 plaques. Since AID acts processively on ssDNA, individual substrates with deaminations in the lacZα gene are also expected to have deaminations in the downstream Jh4 intron within the ssDNA gap. DNAs from mutant M13 phages were isolated and the entire Jh4 insert region was sequenced. Deaminations were detected as C→T transition mutations.
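The final scoring step, detecting deaminations as C→T transitions, can be sketched as follows. This is a minimal illustration that assumes each sequenced clone has already been aligned to the reference insert without insertions or deletions, and that both are written in the orientation of the strand exposed as ssDNA; the function names are hypothetical.

```python
from collections import Counter

def c_to_t_positions(reference, read):
    """Return 0-based positions where the reference has C and the read has T.

    Assumes the read is pre-aligned to the reference with no indels,
    so the two strings have the same length.
    """
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference (equal length)")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper()))
        if ref == "C" and obs == "T"
    ]

def tally_deaminations(reference, reads):
    """Count, for each C position, how many clones carry a C->T change."""
    tally = Counter()
    for read in reads:
        tally.update(c_to_t_positions(reference, read))
    return dict(tally)

# Hypothetical toy reference and two sequenced clones.
reference = "ACCGTC"
clones = ["ATCGTT", "ACTGTC"]
per_site = tally_deaminations(reference, clones)
per_clone = [len(c_to_t_positions(reference, c)) for c in clones]
```

The per-site tally corresponds to a mutation profile of the kind compared between datasets, while the per-clone totals give the number of mutations per sequence.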

Balancing the Mutations per Sequence in Vitro and in Vivo.

This process ensures that the mean number of mutations per sequence (T and V, respectively, for in vitro and in vivo) is roughly equal for the 2 datasets to eliminate potential bias. We preprocessed the data by iteratively eliminating the most mutated in vitro sequences (in practice T was much greater than V, and therefore T − V > 0) up to the point where removing an additional sequence would cause T − V < 0. Note that the data shown in Fig. 1 reflects this correction (the eliminated 60 nt on each strand are shown for illustrative purposes only).
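The balancing procedure can be sketched in Python as follows. This is one possible reading of the description above, assuming each dataset is reduced to a list of mutation counts per sequence; the function name and the toy numbers are hypothetical.

```python
def balance_in_vitro(in_vitro_counts, in_vivo_counts):
    """Drop the most mutated in vitro sequences until the next removal
    would push the in vitro mean (T) below the in vivo mean (V).

    Both arguments are lists of mutations per sequence. Returns the
    retained in vitro counts in ascending order.
    """
    if not in_vitro_counts or not in_vivo_counts:
        raise ValueError("both datasets must contain at least one sequence")
    v_mean = sum(in_vivo_counts) / len(in_vivo_counts)
    kept = sorted(in_vitro_counts)  # most mutated sequence is last
    while len(kept) > 1:
        total = sum(kept)
        if total / len(kept) <= v_mean:
            break  # T - V is no longer positive; nothing left to trim
        mean_after_removal = (total - kept[-1]) / (len(kept) - 1)
        if mean_after_removal < v_mean:
            break  # one more removal would give T - V < 0
        kept.pop()
    return kept

# Hypothetical toy data: V = 3.0, initial T = 29 / 6.
balanced = balance_in_vitro([10, 8, 6, 2, 2, 1], [3, 3])
t_mean = sum(balanced) / len(balanced)
```

In this toy case only the sequence with 10 mutations is removed: the remaining mean is 3.8, and removing the next sequence would lower it to 2.75, below V = 3.0.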

Acknowledgments.

T.M. and A.B. are supported in part by The Seaver Foundation Center for Bioinformatics at the Albert Einstein College of Medicine and by National Institutes of Health Grants 1-R01-AG028872 and 1-P01-AG027734. S.L.K. is supported by National Institutes of Health Grant 5T32CA91773. S.R. is supported by Postdoctoral Fellowship EX-2006-0732 from the Spanish Ministry of Education and Science. M.D.S. is supported by National Institutes of Health Grants R01CA72649 and R01CA102705 and by the Harry Eagle Chair provided by the National Women's Division of the Albert Einstein College of Medicine. M.F.G. and P.P. are supported by National Institutes of Health Grants ES013192 and R37GM21422.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0903803106/DCSupplemental.

References

  1. Maizels N. Immunoglobulin gene diversification. Annu Rev Genet. 2005;39:23–46. doi: 10.1146/annurev.genet.39.073003.110544.
  2. Di Noia JM, Neuberger MS. Molecular mechanisms of antibody somatic hypermutation. Annu Rev Biochem. 2007;76:1–22. doi: 10.1146/annurev.biochem.76.061705.090740.
  3. Peled JU, et al. The biochemistry of somatic hypermutation. Annu Rev Immunol. 2008;26:481–511. doi: 10.1146/annurev.immunol.26.021607.090236.
  4. MacLennan IC. Germinal centers still hold secrets. Immunity. 2005;22:656–657. doi: 10.1016/j.immuni.2005.06.002.
  5. Muramatsu M, et al. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell. 2000;102:553–563. doi: 10.1016/s0092-8674(00)00078-7.
  6. Muramatsu M, et al. Specific expression of activation-induced cytidine deaminase (AID), a novel member of the RNA-editing deaminase family in germinal center B cells. J Biol Chem. 1999;274:18470–18476. doi: 10.1074/jbc.274.26.18470.
  7. Revy P, et al. Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2). Cell. 2000;102:565–575. doi: 10.1016/s0092-8674(00)00079-9.
  8. Rajewsky K, Forster I, Cumano A. Evolutionary and somatic selection of the antibody repertoire in the mouse. [Review] Science. 1987;238:1088–1094. doi: 10.1126/science.3317826.
  9. Delbos F, Aoufouchi S, Faili A, Weill JC, Reynaud CA. DNA polymerase eta is the sole contributor of A/T modifications during immunoglobulin gene hypermutation in the mouse. J Exp Med. 2007;204:17–23. doi: 10.1084/jem.20062131.
  10. Rada C, Di Noia JM, Neuberger MS. Mismatch recognition and uracil excision provide complementary paths to both Ig switching and the A/T-focused phase of somatic mutation. Mol Cell. 2004;16:163–171. doi: 10.1016/j.molcel.2004.10.011.
  11. Bransteitter R, Pham P, Scharff MD, Goodman MF. Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc Natl Acad Sci USA. 2003;100:4102–4107. doi: 10.1073/pnas.0730835100.
  12. Pham P, Bransteitter R, Petruska J, Goodman MF. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature. 2003;424:103–107. doi: 10.1038/nature01760.
  13. Larijani M, et al. AID associates with single-stranded DNA with high affinity and a long complex half-life in a sequence-independent manner. Mol Cell Biol. 2007;27:20–30. doi: 10.1128/MCB.00824-06.
  14. Bransteitter R, Pham P, Calabrese P, Goodman MF. Biochemical analysis of hypermutational targeting by wild type and mutant activation-induced cytidine deaminase. J Biol Chem. 2004;279:51612–51621. doi: 10.1074/jbc.M408135200.
  15. Besmer E, Market E, Papavasiliou FN. The transcription elongation complex directs activation-induced cytidine deaminase-mediated DNA deamination. Mol Cell Biol. 2006;26:4378–4385. doi: 10.1128/MCB.02375-05.
  16. Chelico L, Pham P, Calabrese P, Goodman MF. APOBEC3G DNA deaminase acts processively 3′→5′ on single-stranded DNA. Nat Struct Mol Biol. 2006;13:392–399. doi: 10.1038/nsmb1086.
  17. Halford SE. Hopping, jumping and looping by restriction enzymes. Biochem Soc Trans. 2001;29:363–374. doi: 10.1042/bst0290363.
  18. Basu U, et al. The AID antibody diversification enzyme is regulated by protein kinase A phosphorylation. Nature. 2005;438:508–511. doi: 10.1038/nature04255.
  19. Pasqualucci L, Kitaura Y, Gu H, Dalla-Favera R. From The Cover: PKA-mediated phosphorylation regulates the function of activation-induced deaminase (AID) in B cells. Proc Natl Acad Sci USA. 2006;103:395–400. doi: 10.1073/pnas.0509969103.
  20. McBride KM, et al. Regulation of hypermutation by activation-induced cytidine deaminase phosphorylation. Proc Natl Acad Sci USA. 2006;103:8798–8803. doi: 10.1073/pnas.0603272103.
  21. Shen HM, Storb U. Activation-induced cytidine deaminase (AID) can target both DNA strands when the DNA is supercoiled. Proc Natl Acad Sci USA. 2004;101:12997–13002. doi: 10.1073/pnas.0404974101.
  22. Chaudhuri J, Khuong C, Alt FW. Replication protein A interacts with AID to promote deamination of somatic hypermutation targets. Nature. 2004;430:992–998. doi: 10.1038/nature02821.
  23. Yu K, Lieber MR. Nucleic acid structures and enzymes in the immunoglobulin class switch recombination mechanism. DNA Repair. 2003;2:1163–1174. doi: 10.1016/j.dnarep.2003.08.010.
  24. Duquette ML, Huber MD, Maizels N. G-rich proto-oncogenes are targeted for genomic instability in B-cell lymphomas. Cancer Res. 2007;67:2586–2594. doi: 10.1158/0008-5472.CAN-06-2419.
  25. Odegard VH, Schatz DG. Targeting of somatic hypermutation. Nat Rev Immunol. 2006;6:573–583. doi: 10.1038/nri1896.
  26. Yang SY, Schatz DG. Targeting of AID-mediated sequence diversification by cis-acting determinants. Adv Immunol. 2007;94:109–125. doi: 10.1016/S0065-2776(06)94004-8.
  27. MacCarthy T, Roa S, Scharff MD, Bergman A. SHMTool: A webserver for comparative analysis of somatic hypermutation datasets. DNA Repair. 2009;8:137–141. doi: 10.1016/j.dnarep.2008.09.006.
  28. Kleinstein SH. Getting started in computational immunology. PLoS Comput Biol. 2008;4:e1000128. doi: 10.1371/journal.pcbi.1000128.
  29. Shapiro GS, Aviszus K, Murphy J, Wysocki LJ. Evolution of Ig DNA sequence to target specific base positions within codons for somatic hypermutation. J Immunol. 2002;168:2302–2306. doi: 10.4049/jimmunol.168.5.2302.
  30. Larijani M, Frieder D, Basit W, Martin A. The mutation spectrum of purified AID is similar to the mutability index in Ramos cells and in ung(−/−)msh2(−/−) mice. Immunogenetics. 2005;56:840–845. doi: 10.1007/s00251-004-0748-0.
  31. Michael N, et al. Effects of sequence and structure on the hypermutability of immunoglobulin genes. Immunity. 2002;16:123–134. doi: 10.1016/s1074-7613(02)00261-3.
  32. Prochnow C, Bransteitter R, Klein MG, Goodman MF, Chen XS. The APOBEC-2 crystal structure and functional implications for the deaminase AID. Nature. 2007;445:447–451. doi: 10.1038/nature05492.
  33. Xiao Z, et al. Known components of the immunoglobulin A:T mutational machinery are intact in Burkitt lymphoma cell lines with G:C bias. Mol Immunol. 2007;44:2659–2666. doi: 10.1016/j.molimm.2006.12.006.
  34. Wright BE, Schmidt KH, Davis N, Hunt AT, Minnick MF. II. Correlations between secondary structure stability and mutation frequency during somatic hypermutation. Mol Immunol. 2008;45:3600–3608. doi: 10.1016/j.molimm.2008.05.012.
  35. Steele EJ, Lindley RA, Wen J, Weiller GF. Computational analyses show A-to-G mutations correlate with nascent mRNA hairpins at somatic hypermutation hotspots. DNA Repair. 2006;5:1346–1363. doi: 10.1016/j.dnarep.2006.06.002.
  36. Storb U, et al. A hypermutable insert in an immunoglobulin transgene contains hotspots of somatic mutation and sequences predicting highly stable structures in the RNA transcript. J Exp Med. 1998;188:689–698. doi: 10.1084/jem.188.4.689.
  37. Markham NR, Zuker M. UNAFold: Software for nucleic acid folding and hybridization. Methods Mol Biol. 2008;453:3–31. doi: 10.1007/978-1-60327-429-6_1.
  38. Pham P, et al. Impact of phosphorylation and phosphorylation-null mutants on the activity and deamination specificity of activation-induced cytidine deaminase. J Biol Chem. 2008;283:17428–17439. doi: 10.1074/jbc.M802121200.
