Abstract
During the several-week course of an immune response, B cells undergo a process of clonal expansion, somatic hypermutation of the immunoglobulin (Ig) genes and affinity-dependent selection. Over a lifetime, each B cell may participate in multiple rounds of affinity maturation as part of different immune responses. These two time-scales for selection are apparent in the structure of B-cell lineage trees, which often contain a ‘trunk’ consisting of mutations that are shared across all members of a clone, and several branches that form a ‘canopy’ consisting of mutations that are shared by a subset of clone members. The influence of affinity maturation on the B-cell population can be inferred by analysing the pattern of somatic mutations in the Ig. While global analysis of mutation patterns has shown evidence of strong selection pressures shaping the B-cell population, the effect of different time-scales of selection and diversification has not yet been studied. Analysis of B cells from blood samples of three healthy individuals identifies a range of clone sizes with lineage trees that can contain long trunks and canopies indicating the significant diversity introduced by the affinity maturation process. We here show that observed mutation patterns in the framework regions (FWRs) are determined by an almost purely purifying selection on both short and long time-scales. By contrast, complementarity determining regions (CDRs) are affected by a combination of purifying and antigen-driven positive selection on the short term, which leads to a net positive selection in the long term. In both the FWRs and CDRs, long-term selection is strongly dependent on the heavy chain variable gene family.
Keywords: antigen-driven selection, affinity maturation, mutations, lineage trees, B-cell receptor, next generation sequencing
1. Introduction
B lymphocytes recognize pathogens through the binding of specific B-cell receptors (BCRs), also referred to as immunoglobulin (Ig), expressed on their cell surface. Receptor diversity in the B-cell population is generated in two stages. First, an initial BCR is created through recombination of different germline gene segments during B-cell maturation in the bone marrow [1]. Second, somatic hypermutation (SHM) introduces point mutations into the DNA coding for the BCR during T-cell-dependent adaptive immune responses. The SHM rate has been estimated to be approximately 10⁻³ per base-pair per cell division [2–4]. This is 10⁶-fold higher than the background mutation rate in other somatic cells. These mutations may alter the affinity or specificity of the BCR, and thus are a source of diversity within an expanding B-cell clone. B cells with affinity-increasing mutations are preferentially expanded in germinal centres, in a process known as affinity maturation, which results in an increase in average affinity in the population over time. Some of these B cells will differentiate into long-lived memory and plasma cells, which are critical to protect us from recurrent infections with the same (or a closely related) microorganism. Within the germinal centres, B cells also undergo isotype switching (e.g. from IgM to IgG) which allows for different effector functions.
Unlike naive B cells that start with a BCR in the unmutated germline state, B memory cells that are reactivated through exposure to recurrent and related infections usually begin with a mutated, affinity-matured receptor, which is then further diversified as part of the adaptive immune response. These two time-scales for selection are apparent in the structure of B-cell lineage trees, which often contain a ‘trunk’ consisting of mutations that are shared across all sampled members of a clone, and several branches that form a ‘canopy’ consisting of mutations that are shared by a subset of clone members (figure 1b). The trunk and canopy are separated by the most recent common ancestor (MRCA), which estimates the state of the B cell that initiated the most recent expansion. The MRCA also contains some of the mutations that were fixed during affinity maturation in the most recent germinal centre reaction [5]. Previous studies of selection in the B-cell repertoire have not differentiated between these two scales, and it is unclear if the selection processes are uniform over time. Our previous work suggests that selection may operate differently in the long- and short-term scales as removing the most recent mutations altered the signal for selection [6].
Figure 1.
B-cell lineages divide the affinity maturation process into long (trunk) and short (canopy) time-scales. (a) Antibodies are composed of two identical heavy chains and two identical light chains. The mRNA coding for each chain is divided into FWRs and CDRs, which are interlaced in the linear genetic code. To identify the somatic mutations that are accumulated during affinity maturation, antibody sequences are first aligned with the corresponding inferred germline sequence. (b) After clustering the repertoire into clonally related sequences (clones) and building lineage trees, the affinity maturation process was divided into two non-overlapping intervals: trunk and canopy, which were separated by the MRCA in the lineage, and correspond to long and short time-scales, respectively. The number of mutations is indicated along each branch (assumed to be unity if no number). (Online version in colour.)
B-cell affinity maturation is a micro-evolutionary stochastic process. The dynamics of B-cell clonal expansion and selection for binding pathogens represents an example of a process involving rapid asexual reproduction, where constant diversification and adaptation occurs in parallel with a high mutation rate [7–9]. The BCR appears to be tuned by evolution to optimize the impact of SHM. Mutation hot-spots are more common in the complementarity-determining regions (CDRs), which include most antigen contact residues, and less common in framework regions (FWRs), which are important for the overall receptor structure. Mutations are also more likely to be non-synonymous (NS) than synonymous (S) in the CDRs compared with the FWRs [10] (figure 1a). Mutations in the FWRs may destabilize the antibody (and thus be subject to negative selection), whereas mutations in the CDRs may improve the antibody (and thus be positively selected). Multiple computational methods have been proposed to measure selection in populations or between populations, in the sense of evolving towards a higher fitness phenotype. A common measure compares the observed frequency of NS mutations (NS/(NS + S)) to its expected value. The expected ratio is calculated based on an underlying mutation probability model (e.g. [11–13]), or based on genetic regions where no selection is assumed to occur [14–16]. An increased frequency of NS mutations is treated as an indication of positive selection, while a decreased frequency indicates negative selection. A potential drawback of such methods is their strong sensitivity to the baseline mutation model (i.e. the expected probability of each mutation type), especially when the mutation rate is position dependent [17]. SHM is significantly affected by the local nucleotide composition, resulting in sequence-specific hot- and cold-spots for mutation [11,12].
We have recently developed a background mutation model for SHM targeting and nucleotide substitution that can be used to estimate the expected NS and S mutation frequencies, and thus quantify selection more accurately than previously possible [11].
Recent advances in high-throughput sequencing technologies allow for large-scale characterization of B-cell repertoires, and construction of lineage trees from observed B-cell clones. We analysed the observed blood cDNA BCR repertoire from three human donors, and show clear differences in the distribution of mutations in different genetic regions, and between fixed (trunk) and non-fixed (canopy) mutations. The results presented here strengthen the common thinking about negative selection in the FWRs, but propose a more complex view of the positive selection argued to occur in CDRs.
2. Results
(a). B-cell lineages in the blood contain both trunks and canopies
B-cell repertoire sequencing data were obtained from blood samples of three healthy human donors [18]. As described in §4(a), these sequencing data were processed using the pRESTO pipeline [19] to generate high-quality Ig sequences, which were then submitted to IMGT/HighV-QUEST for germline V(D)J segment identification. The Change-O pipeline [20] was then used to partition the sequences into clonally related groups, and a lineage tree was constructed for each clone (see the electronic supplementary material for representative examples of these trees). Observed clone sizes ranged from 1 to 128 unique sequences and followed a scale-free distribution (figure 2a). Each reconstructed clonal lineage was split into a trunk and canopy. The trunk was defined as the branch leading from the germline Ig sequence inferred by IMGT/HighV-QUEST to the MRCA, while the canopy contained all other branches. The mutation analysis focused on the variable (V) gene segment. Differences in the nucleotide sequence comparing the inferred germline to the MRCA were defined as trunk mutations, while all V gene segment mutations from the MRCA to the observed sequences were defined as canopy mutations. Singletons (i.e. clones containing a single unique sequence) were not included in the analysis, while clones where the inferred germline Ig sequence was the MRCA were included. On average, 72% of the clones (35% of sequences) in each donor were singletons, and over 25% contained both trunks and canopies. In a substantial fraction of trees (42.7%), the MRCA was sampled rather than inferred. Somatic mutations were not distributed evenly between the trunk and canopy. The median number of trunk mutations was five per sequence, while two mutations per sequence were observed in the canopy for IgG sequences (figure 2b).
Note that only mutations occurring in the heavy chain V gene (VH), excluding the CDR3, can be taken into account, as the germline sequence in the junctional regions is estimated with significantly lower confidence. The skewing of mutations towards the trunk was observed for both FWR and CDR mutations, with a higher variance among clones in the CDRs (figure 2b).
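The trunk and canopy definitions above can be illustrated with a minimal Python sketch. It assumes aligned, equal-length V-segment sequences and, as a simplification, counts canopy mutations for each observed sequence as its differences from the MRCA, ignoring the branching order inside the canopy. The function names and toy sequences are hypothetical and are not taken from the Change-O pipeline.

```python
def count_differences(reference, query):
    """Count mismatching positions between two aligned sequences.

    Positions holding an ambiguous base or a gap in either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-", "."}
    return sum(
        1
        for r, q in zip(reference, query)
        if r != q and r not in skip and q not in skip
    )


def split_trunk_canopy(germline, mrca, observed):
    """Return (trunk mutation count, list of canopy mutation counts)."""
    trunk = count_differences(germline, mrca)          # germline -> MRCA
    canopy = [count_differences(mrca, seq) for seq in observed]  # MRCA -> leaves
    return trunk, canopy


# Toy example (hypothetical sequences, not real V segments).
germline = "ACGTACGTACGT"
mrca = "ACGAACGTACCT"
observed = ["ACGAACGTACCT", "ACGAACGGACCT", "TCGAACGTACCA"]
trunk, canopy = split_trunk_canopy(germline, mrca, observed)
print(trunk, canopy)
```

In this toy clone the first observed sequence is identical to the MRCA, which corresponds to the case in which the MRCA was sampled rather than inferred.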
Figure 2.
Trunk mutations dominate B-cell repertoires from blood samples of healthy donors. (a) Clone size distribution for each of the sequenced isotypes from subject 420IV. Axes are logarithmic and bins are ‘logarithmic’ sized (each bin is double the size of the previous one). (b) The number of mutations associated with the trunk or canopy was determined for IgG sequences of subject 420IV. Mutations from different regions (FWRs, CDRs or both) are shown in the three panels.
(b). Selection estimation measures
Selection can be estimated by comparing the observed and expected number of NS mutations. If the observed number of NS mutations is higher than expected, this is interpreted as an advantage induced by NS mutations (i.e. positive selection). Similarly, a lower than expected number of NS mutations is interpreted as a disadvantage of NS mutations (i.e. negative selection). To quantify selection, we use the previously proposed BASELINe method [21], along with Multi-locus Bayesian Selection Measure (MBSM), a novel method proposed herein (see the electronic supplementary material for full derivation). MBSM provides a clear Bayesian interpretation of the observed NS and S value in each region. In order to estimate the selection pressure affecting a single sequence using the number of NS and S mutations, the distribution of expected NS and S is required, and not only their average. While it may be correct to assume a symmetrical normal distribution around an expected number when a large number of sequences is observed, this may not be correct when a single sequence is analysed. To estimate the full distribution of expected NS values in a given region of a sequence, MBSM uses a Bayesian approach, which provides a direct interpretation of the number of S mutations as the probability distribution of generations since the onset of mutations.
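The basic comparison of observed and expected NS counts can be sketched as follows. This is a deliberately simple stand-in that uses exact binomial tail probabilities; it is neither BASELINe nor MBSM. The expected NS probability `p_expected` is assumed to be supplied by some background mutation model, and the value used in the example is purely illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def ns_selection_test(ns, s, p_expected):
    """Compare the observed NS frequency with its expected value.

    Returns (observed frequency, direction, upper-tail p, lower-tail p), where
    the upper tail is P(X >= ns) and the lower tail is P(X <= ns) for
    X ~ Binomial(ns + s, p_expected).
    """
    n = ns + s
    if n == 0:
        raise ValueError("at least one mutation is required")
    observed = ns / n
    p_upper = sum(binom_pmf(k, n, p_expected) for k in range(ns, n + 1))
    p_lower = sum(binom_pmf(k, n, p_expected) for k in range(0, ns + 1))
    if observed > p_expected:
        direction = "positive"
    elif observed < p_expected:
        direction = "negative"
    else:
        direction = "neutral"
    return observed, direction, p_upper, p_lower


# Illustrative numbers only: 2 NS and 8 S mutations, expected NS probability 0.75.
observed, direction, p_upper, p_lower = ns_selection_test(2, 8, 0.75)
print(observed, direction, p_lower)
```

A binomial tail of this kind only addresses the pooled count; it does not provide the per-sequence distribution of expected NS values that motivates the Bayesian treatment described above.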
(c). The framework regions are subject to purifying selection on both short and long time-scales
The BCR FWRs are important to maintain the structural integrity of the receptor, and it has been estimated that approximately 50% of NS mutations in these regions are subject to negative selection [22]. Indeed, all three individuals studied here exhibited strong negative selection pressure on FWR mutations. This negative selection was observed in both trunk and canopy mutations, and for all isotypes (IgA, IgG and IgM) (figure 3). Significantly, when each clone was analysed separately, practically none displayed evidence of significant positive selection in the FWRs. These results were confirmed using two different computational methods for analysing selection mentioned above (BASELINe and MBSM; see §4(c) for details). While there were quantitative differences in the selection strengths estimated by the different methods, both agreed about the direction and consistency of negative selection in the FWRs. This observation was also not dependent on specific V genes, and similar results were found when analysing each V gene separately (data not shown). Overall, these results suggest that negative selection in the FWRs is sweeping, and not the result of a net negative balance between advantageous and deleterious mutations with selection force of the same order (figure 3).
Figure 3.
Mutations in the FWRs are negatively selected in both trunk and canopy. (a,c) The fraction of clones with significant negative (filled bars) or positive (hatched bars) selection as determined by (a) BASELINe and (c) MBSM applied to the FWRs. In each panel, the upper part refers to the trunk, while the lower part refers to the canopy. (b,d) The average selection strength across all sequences as determined by (b) BASELINe and (d) MBSM applied to the FWRs. For BASELINe (b), 95% CIs are shown for each of the selection estimations.
(d). The complementarity determining regions are subject to purifying selection on short time-scales
Most of the residues that contact antigen are found in the CDRs, and it has been assumed that positive selection for affinity-increasing mutations would be focused in these regions [23]. However, many studies have failed to detect positive selection in the CDRs, and it has been suggested that methods based on the frequency of NS mutations may not be appropriate [24]. All three individuals studied here exhibited neutral selection in the CDRs, even when the analysis was restricted to class-switched (and presumably affinity-matured) sequences. However, a much more complex picture emerged when CDR selection was analysed separately for trunk and canopy mutations. CDR mutations that appear in the trunks of lineage trees displayed a strong signal for positive selection, while mutations in the canopy were negatively selected. These results were consistent across methods to quantify selection, isotypes and individuals. The strengths of positive and negative selection were similar (across individuals and isotypes) in the trunk and canopy, respectively. This may explain why neutral selection is commonly observed when considering all mutations as a group. Unlike the case for FWRs, where negative selection was dominant in virtually all clones, selection in the CDRs was variable among clones. Clones with either positive or negative selection among trunk mutations were identified (figure 4). Similar diversity was found when considering selection among canopy mutations. As expected from the population-level analysis, more clones were positively selected when analysing trunk mutations, while more clones were negatively selected when considering canopy mutations.
Figure 4.
Mutations in the CDRs are positively selected in the trunk. (a,c) The fraction of clones with significant negative (filled bars) or positive (hatched bars) selection as determined by (a) BASELINe and (c) MBSM applied to the CDRs. In each panel, the upper part refers to the trunk, while the lower part refers to the canopy. (b,d) The average selection strength across all sequences as determined by (b) BASELINe and (d) MBSM applied to the CDRs. For BASELINe (b), 95% CIs are shown for each of the selection estimations.
A high NS frequency in the trunk can be interpreted as a high fixation probability, which has been directly correlated with selection [25]. The interpretation of the same frequency in the canopy is less straightforward as, by definition, sequences with and without the mutations are observed (otherwise, the mutation would be in the trunk). In this case, positive selection may be more evident in the frequency of cells carrying NS mutations. However, the selection analysis presented above assumes that all identical sequences were derived from a single cell, no matter how many times the sequence was observed. This is because the data are derived from PCR amplification of mRNA and it is impossible to tell whether two identical sequences were derived from: (1) PCR amplification from a single mRNA molecule, (2) different mRNA molecules from the same cell, or (3) mRNA molecules in different cells carrying the same Ig sequence. Potential biases were overcome by requiring: (1) a minimum coverage of two independent reads to call a sequence to reduce sequencing errors, and (2) using only the set of unique reads to reduce PCR amplification biases. As we estimate selection for each tree twice: once for the trunk and once for the canopy using a representative sequence, the effect of PCR amplification bias is expected to be small. In order to ensure that these results were not an artefact of collapsing duplicate sequences, the same analysis was repeated with all sequences in a clone treated independently, and with random sampling of only one sequence per clone. This type of analysis produced similar results (data not shown) suggesting that, indeed, CDR mutation patterns in the trunk and canopy are shaped by different selection pressures.
(e). Selection strength is dependent upon VH family and conserved across individuals
Selection may be affected by the germline sequence of VH. To check for such an effect, we correlated the selection values for each VH separately between the three donors (figure 5). The correlations are stronger in the trunk relative to the canopy. This similar selection behaviour in different VH segments between unrelated individuals might indicate a similar selection process for clones having similar VH genes. This could be owing to similar antigens that trigger related immune responses demonstrated by similar selection strengths of the corresponding clones.
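The between-donor comparison can be sketched as a Pearson correlation computed over the V genes shared by two donors. The selection values below are invented for illustration and are not results from this study; the function names are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlate_shared_genes(donor_a, donor_b):
    """Correlate per-V-gene selection estimates over genes present in both donors."""
    shared = sorted(set(donor_a) & set(donor_b))
    r = pearson([donor_a[g] for g in shared], [donor_b[g] for g in shared])
    return r, shared


# Made-up selection estimates keyed by V gene.
donor_a = {"IGHV1-69": -0.5, "IGHV3-23": 0.1, "IGHV4-34": 0.4, "IGHV5-51": 0.2}
donor_b = {"IGHV1-69": -1.0, "IGHV3-23": 0.2, "IGHV4-34": 0.8}
r, shared = correlate_shared_genes(donor_a, donor_b)
print(round(r, 3), shared)
```

Restricting the comparison to shared genes mirrors the idea of only including V genes that are sufficiently represented in each repertoire.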
Figure 5.

Correlation of the V gene selection estimations between subjects. (a) Pearson's correlations are shown for the selection estimations calculated for each V gene using MBSM. Only V genes that were used in more than 1% of the sampled repertoire were included. (b) Scatter plot for the V gene selection estimations in the CDRs and canopy data of subject PGP1 versus subject hu420143. Colours represent different V genes as indicated in the colour key to the right. (Online version in colour.)
3. Discussion
The extent to which affinity maturation shapes the observed pattern of somatic mutations in B-cell repertoires has been debated, as several studies have failed to observe the expected excess of NS mutations in CDRs [26]. In some cases, the effects of selection can be confounded by the intrinsic biases of SHM [7], and integration of improved background models into methods for quantifying selection is critical. However, another limitation of current approaches based on analysing the frequency of NS mutations may relate to the short time-scale of B-cell affinity maturation. These methods were originally developed for the analysis of fixed mutations (i.e. those that appear in the trunk). However, many somatic mutations in BCR sequences are shared by only a subset of the cells in a clone, and these mutations may not share a strong signature of selection. Indeed, we have previously shown that removing the most recent mutations (based on a lineage analysis) can improve the signal for selection [6]. Here, we leverage recent advances in sequencing technologies [27] to carry out large-scale lineage and selection analysis of the human B-cell repertoire.
To quantify selection, we have used our previously proposed BASELINe method [24], along with a state-of-the-art SHM targeting and nucleotide substitution model to properly account for the intrinsic biases of the mutation process [7]. In addition, we have developed and used a precise Bayesian estimate to quantify selection and show that its maximum-likelihood approximation converges to our previously proposed BASELINe method (using the Focused statistic) [24]. This provides a precise estimate of the expected distribution of NS mutations in any given region, even for a single sequence. Using these methods, we obtain an estimate of the selection forces affecting the BCR. Interestingly, we find that distinct selection pressures are apparent in the trunk and canopy of B-cell clonal lineages. The trunk mutations are shared by all members of a clone, and thus may be considered fixed. Clear and consistent selection pressures are found for these trunk mutations. In particular, the FWRs show evidence of negative selection, while the CDRs show evidence of positive selection. Trunk mutation patterns are likely devoid of any strongly deleterious mutations and are likely to be enriched for advantageous mutations in the population. Such advantageous mutations are sometimes denoted ‘key mutations' [5,28]. Interesting evidence of such mutations emerges in transgenic mouse systems, where common mutations occur among multiple mice [29]. Such mutations will be rapidly fixed, and seem to induce the main effect of selection in the observed clones. By contrast, mutation patterns in the canopy show overall evidence of negative selection in both FWRs and CDRs. Nevertheless, some individual clones show evidence of positive selection in the CDRs. These may reflect mutations that provide a moderate advantage, or which have occurred recently and thus have not had sufficient time to become fixed in the population. 
The difference in selection pressures between trunk and canopy may reflect two different underlying mechanisms. First, the difference may reflect distinct time-scales. In this case, the trunk reflects the cumulative history of mutations that have been selected over the course of many immune responses; for example, as would happen from re-stimulation of memory cells with recurrent infections. An alternative explanation is that each clone is the result of a single germinal centre reaction. In this case, the trunk reflects strongly selected mutations which have become fixed in the population, whereas the canopy reflects weakly selected mutations that are not fixed. This study is unable to distinguish between these two models, and we suspect that they both contribute to the observed mutation patterns.
All of the sequencing data analysed in this study were amplified from mRNA. Thus, it is possible that sequences from plasma cells may be over-represented in the data because they contain larger amounts of mRNA and consequently have a high probability of being observed in cDNA samples. Other biases in PCR amplification could also skew the number of times that the same sequence appears in the dataset. In order to ensure that these effects did not impact our results, we repeated the analysis with and without removing redundant sequences; both approaches led to similar results. When redundant sequences are ignored, the mRNA level of each BCR does not affect its importance in the analysis.
We found a high correlation in the average selection strength in different VH genes among subjects. As negative selection is likely driven mainly by structural constraints, it is easy to see how selection could be VH gene-specific. Conservation of selection strength in the CDRs, especially on the trunk where positive selection is observed, is harder to explain. It seems that different VH germline segments are more (or less) able to optimize their binding affinity. Another possibility is that the correlated selection pressures represent the shared responses to common antigens. Although it is possible that these correlations represent biases in the background model for SHM targeting, this is unlikely as the correlation is much stronger in the trunk than in the canopy. If the correlation was an artefact of errors in the SHM model, no difference would be expected between the two regions.
In most observed evolutionary systems, the mutation rate is too low to observe diversification in real time. The humoral immune response is an exceptional case, where the BCR is subject to a mutation rate estimated to be approximately 10⁻³ per base-pair per cell division [2–4]. This process occurs within germinal centre structures in the secondary lymphoid organs where B cells are rapidly dividing (up to four divisions per day), producing an optimal observable micro-evolution system. Our results suggest that a better understanding of selection pressures can be obtained by separating the trunk of the lineage tree (mutations from the germline to the MRCA) and the canopy (mutations from the MRCA to the leaves). While mutations on the trunk show consistent selection pressures among donors and isotypes, the effect of canopy mutations is much more variable and probably represents weak/incremental selection.
4. Material and methods
(a). Repertoire sequencing data
B-cell repertoire sequencing data from blood samples of three healthy human donors were obtained from a previous study [18]. In the original publication, a time-series of samples before and after influenza vaccination was sequenced, and IgA, IgG and IgM isotypes were identified based on the constant region of the sequences. In this study, we pooled together data from samples collected at three time-points prior to the vaccination. Raw sequencing reads were filtered in several steps to identify and remove low-quality sequences. Conservative thresholds were applied in all cases to increase the reliability of the resulting mutation calls, at the potential expense of excluding some real mutations. Pre-processing was carried out using the Repertoire Sequencing Toolkit (pRESTO) [19], and involved:
— Quality filtering:
(1) Removal of reads where the primer could not be identified or had a poor alignment score (mismatch rate greater than 0.1).
(2) Removal of sequences that did not appear in a single sample at least twice.
— Assignment of germline V(D)J segments for each of the Ig sequences: initial V(D)J assignments for each sequence were obtained using IMGT/HighV-QUEST [28].
— Removal of non-functional sequences owing to the occurrence of a stop codon and/or a reading frame shift between the V gene and the J gene.
— Removal of sequences with more than 30 mutations.
— Identification of clonally related sequences: sequences were assigned into clonal groups by first partitioning sequences based on common V gene, J gene and junction region length. Within these larger groups, sequences differing from one another by a weighted distance of less than 5 (within the junction region) were then defined as clones. Distance was measured as the number of point mutations weighted by a symmetric version of the nucleotide substitution probability previously described (50). The threshold of five corresponds to up to four transition mutations or one to two transversion mutations [13].
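The clonal grouping step can be sketched in Python as below. Several details are assumptions made only for illustration: the transition and transversion weights are hypothetical placeholders for the symmetric substitution model (chosen so that four transitions or two transversions fall under the threshold of five), and sequences are joined by single linkage, which the text does not specify. This is not the Change-O implementation.

```python
from collections import defaultdict

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
TRANSITION_WEIGHT = 1.0    # hypothetical weight, not from the source model
TRANSVERSION_WEIGHT = 2.0  # hypothetical weight, not from the source model


def junction_distance(a, b):
    """Weighted count of point differences between equal-length junctions."""
    if len(a) != len(b):
        raise ValueError("junctions must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if x != y:
            total += TRANSITION_WEIGHT if (x, y) in TRANSITIONS else TRANSVERSION_WEIGHT
    return total


def assign_clones(records, threshold=5.0):
    """Group (seq_id, v_gene, j_gene, junction) records into clones.

    Records are first partitioned by V gene, J gene and junction length; within
    each partition, sequences closer than `threshold` are linked (single linkage).
    """
    partitions = defaultdict(list)
    for seq_id, v_gene, j_gene, junction in records:
        partitions[(v_gene, j_gene, len(junction))].append((seq_id, junction))

    clones = []
    for members in partitions.values():
        parent = list(range(len(members)))  # union-find forest

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(len(members)):
            for k in range(i + 1, len(members)):
                if junction_distance(members[i][1], members[k][1]) < threshold:
                    parent[find(i)] = find(k)

        by_root = defaultdict(list)
        for i, (seq_id, _) in enumerate(members):
            by_root[find(i)].append(seq_id)
        clones.extend(sorted(ids) for ids in by_root.values())
    return sorted(clones)


# Toy records with hypothetical junction sequences.
records = [
    ("s1", "IGHV3-23", "IGHJ4", "TGTGCGAAAGAT"),
    ("s2", "IGHV3-23", "IGHJ4", "TGTGCGAGAGAT"),
    ("s3", "IGHV3-23", "IGHJ4", "TGTCCCTTTGAT"),
    ("s4", "IGHV1-69", "IGHJ4", "TGTGCGAAAGAT"),
]
clones = assign_clones(records)
print(clones)
```

Here s1 and s2 differ by a single transition and are grouped together, s3 is too distant to join them, and s4 is kept apart because it uses a different V gene.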
(b). Lineage tree construction
For each clone, a lineage tree was inferred via maximum parsimony with PHYLIP v. 3.69 [30] as described in [31]. FWRs and CDRs were defined using the IMGT definitions according to their unique numbering scheme [32]. A consensus sequence for each clone was formed by a weighted sampling of each mutated (and non ‘N’) base from all the sequences in the clone.
(c). Selection estimation
BASELINe [21] was applied to the set of sequences using the focused test [17]. In order to analyse selection in the canopy, all sequences were collapsed to form a representative sequence as previously described [24]. Mutations in codons that had more than one mutation were discarded, as it is usually not possible to infer the order in which the mutations occurred (and thus the micro-sequence context of the mutations is unknown). MBSM, a novel method proposed here (see electronic supplementary material for full derivation) was used on all sequences. For trunk mutations, similar to BASELINe, the MRCA was compared with the inferred Ig germline sequence. For canopy sequences, all sequences were used and compared with the MRCA.
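The rule of discarding codons that carry more than one mutation can be sketched as follows. The example assumes in-frame, aligned sequences and the standard genetic code; the function name and toy sequences are hypothetical, and the sketch only counts NS and S mutations without applying any targeting model.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_ns_s(reference, query):
    """Count (NS, S) mutations between two in-frame aligned sequences.

    Codons with more than one nucleotide difference are discarded because the
    order of the mutations cannot be inferred. Codons containing characters
    outside A/C/G/T (for example N or gaps) are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ns = s = 0
    usable = len(reference) - len(reference) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        ref_codon, qry_codon = reference[i:i + 3], query[i:i + 3]
        if ref_codon not in CODON_TABLE or qry_codon not in CODON_TABLE:
            continue
        n_diff = sum(a != b for a, b in zip(ref_codon, qry_codon))
        if n_diff != 1:
            continue  # unchanged codon, or a multi-mutation codon to discard
        if CODON_TABLE[ref_codon] == CODON_TABLE[qry_codon]:
            s += 1
        else:
            ns += 1
    return ns, s


# Toy example: GCT>GCC is synonymous, AAA>AGA is non-synonymous,
# GGT>GCA has two changes and is discarded, TTT is unchanged.
ns, s = count_ns_s("GCTAAAGGTTTT", "GCCAGAGCATTT")
print(ns, s)
```

For trunk mutations the reference would be the inferred germline and the query the MRCA; for canopy mutations the reference would be the MRCA.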
Supplementary Material
Acknowledgements
We wish to acknowledge Mohamed Uduman for his help in organizing the data.
Authors' contributions
G.Y., S.H.K. and Y.L. designed the experiments; G.Y. and J.A.V.H. processed the data; G.Y., J.B., J.A.V.H. and Y.L. analysed the data; G.Y., S.H.K. and Y.L. wrote the manuscript. All authors read and commented on the text.
Competing interests
All authors have no competing interests.
Funding
This work was supported by United States-Israel Binational Science Foundation grants (2013395 and 2009046).
References
1. Janeway C, Travers P, Walport M, Shlomchik M. 2004. Immunobiology, pp. 49–53, 6th edn. New York, NY: Garland Science.
2. McKean D, Huppi K, Bell M, Staudt L, Gerhard W, Weigert M. 1984. Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin. Proc. Natl Acad. Sci. USA 81, 3180–3184. (doi:10.1073/pnas.81.10.3180)
3. Storb U. 1996. The molecular basis of somatic hypermutation of immunoglobulin genes. Curr. Opin. Immunol. 8, 206–214. (doi:10.1016/S0952-7915(96)80059-8)
4. Kleinstein SH, Louzoun Y, Shlomchik MJ. 2003. Estimating hypermutation rates from clonal tree data. J. Immunol. 171, 4639–4649. (doi:10.4049/jimmunol.171.9.4639)
5. Radmacher MD, Kelsoe G, Kepler TB. 1998. Predicted and inferred waiting times for key mutations in the germinal centre reaction: evidence for stochasticity in selection. Immunol. Cell Biol. 76, 373–381. (doi:10.1046/j.1440-1711.1998.00753.x)
6. Uduman M, Shlomchik MJ, Vigneault F, Church GM, Kleinstein SH. 2014. Integrating B cell lineage information into statistical tests for detecting selection in Ig sequences. J. Immunol. 192, 867–874. (doi:10.4049/jimmunol.1301551)
7. Liu Y, Joshua D, Williams G, Smith C, Gordon J, MacLennan I. 1989. Mechanism of antigen-driven selection in germinal centres. Nature 342, 929–931.
8. Berek C, Berger A, Apel M. 1991. Maturation of the immune response in germinal centers. Cell 67, 1121–1129. (doi:10.1016/0092-8674(91)90289-B)
9. Hodgkin PD, Heath WR, Baxter AG. 2007. The clonal selection theory: 50 years since the revolution. Nat. Immunol. 8, 1019–1026. (doi:10.1038/ni1007-1019)
10. Kocks C, Rajewsky K. 1988. Stepwise intraclonal maturation of antibody affinity through somatic hypermutation. Proc. Natl Acad. Sci. USA 85, 8206–8210. (doi:10.1073/pnas.85.21.8206)
11. Yaari G, et al. 2013. Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front. Immunol. 4, 358. (doi:10.3389/fimmu.2013.00358)
12. Shapiro GS, Ellison MC, Wysocki LJ. 2003. Sequence-specific targeting of two bases on both DNA strands by the somatic hypermutation mechanism. Mol. Immunol. 40, 287–295. (doi:10.1016/S0161-5890(03)00101-9)
13. Smith DS, Creadon G, Jena PK, Portanova JP, Kotzin BL, Wysocki LJ. 1996. Di- and trinucleotide target preferences of somatic mutagenesis in normal and autoreactive B cells. J. Immunol. 156, 2642–2652.
14. MacCarthy T, Kalis SL, Roa S, Pham P, Goodman MF, Scharff MD, Bergman A. 2009. V-region mutation in vitro, in vivo, and in silico reveal the importance of the enzymatic properties of AID and the sequence environment. Proc. Natl Acad. Sci. USA 106, 8629–8634. (doi:10.1073/pnas.0903803106)
15. Shlomchik MJ, Aucoin AH, Pisetsky DS, Weigert MG. 1987. Structure and function of anti-DNA autoantibodies derived from a single autoimmune mouse. Proc. Natl Acad. Sci. USA 84, 9150–9154. (doi:10.1073/pnas.84.24.9150)
16. Liberman G, Benichou J, Tsaban L, Glanville J, Louzoun Y. 2013. Multi step selection in IgH chains is initially focused on CDR3 and then on other CDR regions. Front. Immunol. 4, 274. (doi:10.3389/fimmu.2013.00274)
17. Hershberg U, Uduman M, Shlomchik MJ, Kleinstein SH. 2008. Improved methods for detecting selection by mutation analysis of Ig V region sequences. Int. Immunol. 20, 683–694. (doi:10.1093/intimm/dxn026)
18. Laserson U, et al. 2014. High-resolution antibody dynamics of vaccine-induced immune responses. Proc. Natl Acad. Sci. USA 111, 4928–4933. (doi:10.1073/pnas.1323862111)
19. Vander Heiden JA, Yaari G, Uduman M, Stern JN, O'Connor KC, Hafler DA, Vigneault F, Kleinstein SH. 2014. pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30, 1930–1932. (doi:10.1093/bioinformatics/btu138)
20. Gupta NT, Vander Heiden J, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. In press. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics btv359. (doi:10.1093/bioinformatics/btv359)
- 21.Yaari G, Uduman M, Kleinstein SH. 2012. Quantifying selection in high-throughput immunoglobulin sequencing data sets. Nucleic Acids Res. 40, e134 ( 10.1093/nar/gks457) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Shlomchik MJ, Marshak-Rothstein A, Wolfowicz CB, Rothstein TL, Weigert MG. 1987. The role of clonal selection and somatic mutation in autoimmunity. Nature 328, 805–811. ( 10.1038/328805a0) [DOI] [PubMed] [Google Scholar]
- 23.Morea V, Tramontano A, Rustici M, Chothia C, Lesk AM. 1998. Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol. 275, 269–294. ( 10.1006/jmbi.1997.1442) [DOI] [PubMed] [Google Scholar]
- 24.Uduman M, Yaari G, Hershberg U, Stern JA, Shlomchik MJ, Kleinstein SH. 2011. Detecting selection in immunoglobulin sequences. Nucleic Acids Res. 39, W499–W504. ( 10.1093/nar/gkr413) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Kimura M. 1962. On the probability of fixation of mutant genes in a population. Genetics 47, 713–719. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Bose B, Sinha S. 2005. Problems in using statistical analysis of replacement and silent mutations in antibody genes for determining antigen-driven affinity selection. Immunology 116, 172–183. ( 10.1111/j.1365-2567.2005.02208.x) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Benichou J, Ben-Hamo R, Louzoun Y, Efroni S. 2012. Rep-Seq: uncovering the immunological repertoire through next-generation sequencing. Immunology 135, 183–191. ( 10.1111/j.1365-2567.2011.03527.x) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Allen D, Simon T, Sablitzky F, Rajewsky K, Cumano A. 1988. Antibody engineering for the analysis of affinity maturation of an anti-hapten response. EMBO J. 7, 1995–2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Anderson SM, et al. 2009. Taking advantage: high-affinity B cells in the germinal center have lower death rates, but similar rates of division, compared to low-affinity cells. J. Immunol. 183, 7314–7325. ( 10.4049/jimmunol.0902452) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Felsenstein J. 2009. PHYLIP (Phylogeny Inference Package) version 3.69. Seattle, WA: Department of Genetics, University of Washington.
- 31. Stern JNH, et al. 2014. B cells populating the multiple sclerosis brain mature in the draining cervical lymph nodes. Sci. Transl. Med. 6, 248ra107. (doi:10.1126/scitranslmed.3008879)
- 32. Lefranc MP, Pommie C, Ruiz M, Giudicelli V, Foulquier E, Truong L, Thouvenin-Contet V, Lefranc G. 2003. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27, 55–77. (doi:10.1016/S0145-305X(02)00039-3)