Nature. Author manuscript; available in PMC: 2020 Sep 8.
Published in final edited form as: Nature. 2020 May 6;582(7812):421–425. doi: 10.1038/s41586-020-2262-4

BCR Selection and Affinity Maturation in Peyer’s Patches Germinal Centers

Huan Chen 1,4, Yuxiang Zhang 1,4, Adam Yongxin Ye 1,4, Zhou Du 1, Mo Xu 2, Cheng-Sheng Lee 1, Joyce K Hwang 1, Nia Kyritsis 1, Zhaoqing Ba 1, Donna Neuberg 3, Dan R Littman 2, Frederick W Alt 1,*
PMCID: PMC7478071  NIHMSID: NIHMS1613029  PMID: 32499646

SUMMARY

Antigen-binding B cell receptor (BCR)/antibody variable regions are encoded by exons assembled in developing B cells by V(D)J recombination1. Immensely diverse primary BCR repertoires derive from mechanisms that diversify V(D)J gene segment junctions that contribute to the antigen contact complementarity-determining region 3 (CDR3)1. Primary B cells undergo antigen-driven BCR affinity maturation via somatic hypermutation (SHM) and cellular selection in germinal centers (GCs)2,3. While most GCs are transient3, gut microbiota-dependent intestinal Peyer’s patch (PP) GCs are chronic4, but little is known about their BCR repertoires or SHM patterns. To elucidate physiological PP GC BCR repertoires, we developed a high-throughput repertoire/SHM assay. PP GCs from different mice expand public clonotypes that often have canonical IgH CDR3s that appear far more frequently in naïve B cell repertoires than predicted, due to junctional biases during V(D)J recombination. Some public clonotypes are gut microbiota-dependent and encode antibodies reactive to bacterial glycans, while others are not. Transfer of feces from specific-pathogen-free (SPF) mice into germ-free (GF) mice restored germ-dependent clonotypes, directly implicating BCR selection. Indeed, we identified recurrently selected SHMs in such public clonotypes, implicating affinity maturation in mouse PP GCs under homeostatic conditions. Thus, persistent gut antigens select recurrent BCR clonotypes to seed chronic PP GC responses.


Each newly generated (naïve) B lymphocyte expresses a BCR that harbors one IgH and one IgL variable region1. Collectively, however, B cells express a huge repertoire of primary BCRs with different CDR3s, owing to mechanisms that diversify variable (V), diversity (D) and joining (J) gene segment junctions during V(D)J recombination1. Junctional diversification mechanisms (deletions, P element formation, and non-templated nucleotide (N) region additions by terminal deoxynucleotidyl transferase (TdT)1) are estimated to generate 10¹¹ or more distinct BCRs5,6, greatly exceeding the approximately 10⁸ steady-state primary mouse B cells7. Not all junctions are randomly diversified. For example, in the absence of TdT during lymphocyte development in the fetal liver, V(D)J junctions frequently use short micro-homologies within two recombining gene segments to generate canonical CDR3s8–13.

Primary B cells undergo antigen-driven BCR affinity maturation via activation-induced cytidine deaminase (AID)-initiated SHM and cellular selection in “conventional” peripheral lymphoid GC structures found in lymph nodes or spleen2,3. PPs are the major lymphoid tissues involved in gut adaptive immune responses, and they harbor continuous GCs exposed to the gut microbiome and dietary contents4. Compared to conventional GCs, PP GCs are unique in two major respects4. First, they arise in the context of a homeostatic rather than an acute response. Second, the potential antigens that PPs face are enormously complex. While chronic PP GCs are T cell-dependent14,15, affinity maturation to gut antigens at steady state has not been demonstrated16,17. Mice in which the development of BCR-deficient B cells is driven by a knock-in EBV LMP2A gene form chronic PP GCs but not antigen-induced splenic GCs18. Moreover, identical pre-rearranged knock-in productive and passenger IgH VDJ exons have essentially identical, intrinsic SHM patterns in mouse PP GC B cells19, raising the question of whether PP GCs are sites of antigen-nonspecific BCR diversification, as occurs in chickens, sheep and rabbits20–22. On the other hand, oral immunization of mice can induce conventional, acute PP GC responses that generate oligoclonal, affinity-matured antibodies23, suggesting that chronic PP GC responses are likely shaped by the gut micro-environment2. In-depth elucidation of the physiological BCR repertoires of chronic PP GCs from nontransgenic mice requires an appropriate high-throughput assay.

To elucidate PP GC B cell repertoires in specific-pathogen-free (SPF) C57BL/6 WT mice, we upgraded our HTGTS-Rep-Seq assay24–26 to cover all V(D)J segment usage and also SHM profiles across the full length of IgH and IgL variable region exons. This “Rep-SHM-Seq” method (Extended Data Fig. 1a,b,c; Methods) uses optimized bait primers designed against a degenerate region at the 3’ end of all JH or JL segments to provide an unbiased assay for determining full IgH and IgL variable region exon repertoires. We further generated a downstream bioinformatic pipeline that incorporates clonotype and SHM analyses (Extended Data Fig. 1d; Methods). A clonotype is conventionally defined as a set of sequences with identical V and J segments and more than 90% CDR3 nucleotide sequence identity. Rep-SHM-Seq uses genomic DNA as template and, critical to our experiments, detects both productive and non-productive V(D)J rearrangements. Thus, by assaying SHM patterns of non-productive VHDJH sequences from many GC samples, we generated an intrinsic SHM pattern database for most mouse VHs. With this database, we employed a hierarchical Bayesian model to statistically compare, at each VH nucleotide, the SHM rates of non-productive (intrinsic) and productive sequences in GC B cells, allowing us to follow affinity maturation (see Methods). Such experiments cannot be done with existing RNA-based repertoire sequencing methods, because out-of-frame non-productive mRNAs cannot be reliably measured27.
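The productive/non-productive distinction that this design relies on can be illustrated with a minimal Python sketch. It is not the IgBLAST-based logic used in the pipeline; it assumes, hypothetically, that the input is an assembled V(D)J nucleotide sequence that starts at the first codon of the V segment and ends at the last complete J codon, so that "in frame" reduces to a length divisible by three.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(vdj: str) -> bool:
    """Return True if a V(D)J sequence is in frame and has no stop codon.

    Hypothetical helper: assumes `vdj` begins at codon 1 of the V segment and
    ends at the last complete J codon, so in frame means len(vdj) % 3 == 0.
    """
    seq = vdj.upper()
    if len(seq) % 3 != 0:
        return False  # out-of-frame junction -> non-productive
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return not any(codon in STOP_CODONS for codon in codons)


print(is_productive("ATGGCCTGG"))  # True: in frame, no stop codon
print(is_productive("ATGGCCTG"))   # False: out of frame
print(is_productive("ATGTAGTGG"))  # False: contains TAG
```

Sequences classified as non-productive in this way are the ones that carry no BCR-selection signal and can therefore serve as the intrinsic SHM background.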

To evaluate the ability of Rep-SHM-Seq to follow a well-characterized immune response, we assayed splenic naïve and GC B cell repertoires of C57BL/6 mice immunized intraperitoneally (IP) with NP-CGG. We detected significant increases in VH1–72 and Vλ1 utilization above naïve B cell levels (Extended Data Fig. 2a, b), in accord with their known dominance in the C57BL/6 NP response28–30. In all 5 mice examined, VH1–72 GC rearrangements were associated with two major clonotypes and Vλ1 rearrangements with one major clonotype. Both VH1–72 clonotypes utilized D1–1 and JH2 (Extended Data Fig. 2c), and their VH1–72 carried a G-to-T point mutation at nucleotide position 98, within CDR1 (33W->L, Extended Data Fig. 2d), all of which are known features of the C57BL/6 NP response29,31.

To understand the nature of chronic PP GCs, a key question is whether PPs are seeded by B cells harboring specific BCRs that then undergo clonal expansion4. To address this question, we analyzed naïve and GC B cell repertoires in PPs of SPF C57BL/6 mice. VH, D, and JH, as well as VL and JL, utilization patterns were essentially identical between PP and splenic naïve B cells (Extended Data Fig. 3), demonstrating that the composition of variable region gene segment utilization in the BCR repertoire is stable across different lymphoid tissues of SPF C57BL/6 mice. However, PP GC B cell repertoires had significantly enriched utilization of 10 VHs compared to naïve VH repertoires (Fig. 1a). Notably, we identified public clonotypes that recurred in multiple mice for 9 of these VHs, most of which were the largest clonotype among all clonotypes using the same VH (Fig. 1b). Among the 18 mice analyzed, these recurrent IgH clonotypes were present at the following frequencies: VH1–12RC (11/18), VH1–11RC (8/18), VH1–47RC (7/18), VH9–4RC (6/18), VH6–3RC (5/18), VH6–6RC (5/18), VH11–2RC (4/18), VH2–9RC (4/18), VH4–1RC (3/18) (Fig. 1b, Extended Data Fig. 4). To further assess the significance of the recurrent PP GC clonotypes enriched in SPF mice, we assayed for this phenomenon in 10 germ-free (GF) mouse samples, each pooled from 3 mice to obtain sufficient GC B cell numbers (Extended Data Fig. 5a–c). Notably, VH1–12RC (shown in green) and VH1–47RC (shown in red) were highly recurrent in GF mice (each found in 10/10 GF samples, Fig. 2a, Extended Data Fig. 5d), suggesting that they are bacteria-independent. PP GCs of GF mice had additional recurrent clonotypes not found in SPF mice, some of which use the same VH and are closely related, differing by only 1–2 amino acids in the CDR3 (Fig. 1, 2a, Extended Data Fig. 5e, Extended Data Table 1a).

Fig. 1 |. VH and clonotype enrichment recurs in PP GCs.


a, VH repertoire of productive V(D)J junctions in PP GC on the “+” y-axis vs naïve B cells on the “-” y-axis from 18 SPF mice. The ratio of each functional VH segment is shown as % of total productive V(D)J rearrangements. VH segments are ordered linearly with respect to relative proximity to the Ds based on their chromosome coordinates. Data is plotted as mean ± SEM. P-values for V usage are calculated by two-sided Mann-Whitney U test with FDR correction (*: p < 0.1, **: p < 0.05, ***: p < 0.01). See Supplementary Table 4 for exact p-values. b, GC productive clonotype distribution of each indicated VH is shown in long-tail distribution plot, with the y-axis representing fraction among all productive clonotypes for this VH and x-axis representing the top clonotypes (rank ordered). The dominant recurrent clonotypes are marked with the number of mice containing this clonotype. Colors indicate clonotypes found among different mouse categories as compared with Fig. 2.

Fig. 2 |. Microbial and non-microbial factors underlie recurrent PP GC clonotypes.


a-c, VH repertoire of productive V(D)J junctions in PP GC vs naïve B cells plotted as mean ± SEM from 10 samples of GF mice with 3 mice in each sample as noted by asterisk (a), 5 GF mice colonized with SFB-mono feces (b) and 10 GF mice colonized with SPF feces (c) for 3 weeks. GC productive clonotype distribution of each indicated VH is shown in long-tail distribution plot, with the y-axis representing fraction among all productive clonotypes for this VH and x-axis representing the top clonotypes (rank ordered). The dominant recurrent clonotypes are marked with the number of mice containing this clonotype. Colors indicate clonotypes found among different gnotobiotic mouse categories, including SPF mice in Fig. 1b. P-values for V usage are calculated by Mann-Whitney U test with FDR correction (*: p < 0.1, **: p < 0.05, ***: p < 0.01). See Supplementary Table 4 for exact p-values. The recovery of germ-dependent clonotypes VH1–64RC and VH6–3RC in GF+SPF mice is significant compared to GF mice. (p = 0.0023 for VH1–64RC and p = 0.0001 for VH6–3RC; Fisher’s exact test, two-sided).

We extended these studies to 5 GF mice colonized with feces from segmented filamentous bacteria (SFB)-mono-colonized (SFB-mono) mice32,33 and 10 GF mice that received SPF mouse feces. SFB grow primarily in the terminal ileum, where they induce Th17 cell responses32,33 and stimulate potent PP GC reactions34,35. Three weeks after oral gavage of GF mice with SFB-mono or SPF mouse feces, we found, as expected35, increased PP GC B cell numbers compared to those in GF mice (Extended Data Fig. 5a–c). The VH1–12RC and VH1–47RC clonotypes recurred in GF mice following either SFB colonization or SPF fecal transfer (Fig. 2b, c, Extended Data Fig. 5d). However, SFB colonization of GF mice further enriched a VH1–64RC (3/5 mice, purple) not observed in SPF mice (Fig. 1, 2b, Extended Data Fig. 5f, Extended Data Table 1b). Given that our SPF mice harbor SFB among an enormous diversity of microbiota, this clonotype might be masked in them because their PP GC responses are dominated by other microbes. In this regard, oral gavage of GF mice with SPF feces also elicited VH1–64RC (4/10 mice) and a few additional recurrent clonotypes not found in SPF mice (Fig. 1, 2c, Extended Data Fig. 5g, Extended Data Table 1c), perhaps due to a more limited pool of competing bacteria following fecal transfer36. VH6–3RC, observed in SPF mice, was also recovered as a recurrent PP GC clonotype (6/10 mice, blue) following SPF fecal transfer (Fig. 1, 2c, Extended Data Fig. 5h).

The recurrence and recovery of different PP GC clonotypes under different gnotobiotic conditions indicate that these clonotypes were selected by antigens derived from gut microbiota or from non-bacterial sources. In support of this, we identified 6 positively selected unique SHMs among 4 of the recurrent PP GC clonotypes (Fig. 3a, b, Extended Data Fig. 6a, b). Consistent with affinity maturation, all selected SHMs cause amino acid changes and occur within CDRs. Specifically, VH9–4RC was enriched for 59K->I (Fig. 3a); VH6–3RC for 61H->Y (Fig. 3b); VH1–12RC for 50A->V and 55N->D (Extended Data Fig. 6a); and VH1–64RC for 58T->P and 59N->I (Extended Data Fig. 6b). For the other recurrent clonotypes, in which we did not find mutations significantly elevated above the intrinsic pattern (Extended Data Fig. 7), highly complex gut antigens could conceivably select different mutations in different cells, diluting each mutation in the population and masking its selection. Alternatively, some recurrent clonotypes may already have high antigen affinity with germline VH sequences and/or intrinsic SHMs and, thus, not require further affinity maturation.

Fig. 3 |. Affinity maturation of recurrent PP GC clonotypes.


a-b, VH SHM profile of indicated PP GC clonotypes and intrinsic pattern, plotted as mutation rate at each nucleotide. Sequences were stratified by overall VH mutation rate19 (see Methods). Significance is determined by hierarchical Bayesian modeling with PEP0.1 (see Methods; *: PEP0.1<0.05, **: PEP0.1<0.01, ***: PEP0.1<0.005). PEP0.1 value and amino acid changes are denoted for significantly enriched SHMs. Data is presented as mean ± SEM from mice containing indicated clonotype: 6 SPF mice (a), 5 SPF mice and 6 GF+SPF mice (b). c-g, Microbial glycan microarray reactivities of indicated mAbs. See full antibody sequences in Supplementary Table 5. Data is shown as average fold enrichment over negative-control mAb (anti-human PD1) of 4 technical repeats. Major peaks are annotated with glycan number and glycan names are indicated on the right panel.

We assayed the IgL repertoires of the PP GC samples from 9 SPF mice and found three significantly enriched VLs (Vκ14–111, Vκ4–68 and Vκ6–15) and three dominantly utilized VLs (Vκ1–110, Vκ4–72 and Vκ5–43, although their enrichment did not reach significance), each with several recurrent IgL clonotypes (Extended Data Fig. 8a, Supplementary Table 1). To assess whether these recurrent IgL clonotypes pair with the recurrent PP GC IgH clonotypes to form antibodies, we inferred IgH/IgL pairing from correlated frequencies in the 9 SPF mice (see Methods) and identified potentially paired IgL clonotypes for 5 recurrent IgH clonotypes. All 5 IgH/IgL pairs recurred in multiple mice. Specifically, VH1–12RC/Vκ5–43Jκ2, VH1–47RC/Vκ1–110Jκ1, VH9–4RC/Vκ1–110Jκ5, VH11–2RC/Vκ4–72Jκ5 and VH6–6RC/Vκ4–68Jκ2 each recurred in 3 to 4 of 9 SPF mice (Extended Data Fig. 8b). To confirm IgH/IgL pairing, we performed 10x Genomics single-cell immune profiling RNA sequencing (scRNA-seq) on 4 SPF mice and found three exact pairs (VH1–12RC/Vκ5–43Jκ2, VH1–47RC/Vκ1–110Jκ1 and VH11–2RC/Vκ4–72Jκ5) (Extended Data Fig. 8b), indicating that these IgH/IgL pairs form antibodies in vivo. We also detected related IgL clonotypes with a different Jκ for VH1–12RC and VH1–47RC by scRNA-seq (Extended Data Fig. 8b), suggesting that the choice of Jκ is flexible37 for these antibodies. Similarly, scRNA-seq identified an antibody very similar to the inferred VH9–4RC/Vκ1–110Jκ5, in which the paired IgL clonotype used Jκ4 (Extended Data Fig. 8b). We did not find VH6–6RC by scRNA-seq, owing to the more limited dataset obtained by this method. Among several other recurrent IgH clonotypes in SPF mice for which we could not confidently infer recurrently paired IgLs by Rep-SHM-Seq (see Methods), VH6–3RC was of particular interest because it was recovered in GF mice after SPF fecal transfer (Fig. 2c). For this clonotype, scRNA-seq identified a paired Vκ2–109Jκ2 clonotype (Extended Data Fig. 8b), which allowed us to express an antibody with VH6–3RC in vitro for further characterization.

To further characterize recurrent PP antibodies, we cloned the VH1–12RC/Vκ5–43Jκ2, VH9–4RC/Vκ1–110Jκ4 and VH6–3RC/Vκ2–109Jκ2 pairs from observed sequencing reads that contained mutations in both IgH and IgL, including the selected VH mutations, expressed these mAbs and screened them on a microbial glycan microarray38. Consistent with its presence in GF mice, the VH1–12RC mAb did not react with any microbial glycans in the microarray (Extended Data Fig. 6c). However, the VH9–4RC mAb bound substantially to surface glycans of P. stuartii, P. aeruginosa and P. vulgaris (Fig. 3c), and the VH6–3RC mAb bound substantially to lipopolysaccharides of S. marcescens (Fig. 3d), consistent with selection by gut bacteria in SPF mice. Because the glycan microarray is derived from human bacterial pathogen species that do not normally reside in SPF mice38, the VH9–4RC and VH6–3RC mAbs were likely induced by species in the mouse gut microbiome that bear similar antigenic glycan structures39. Furthermore, reverting all IgH and IgL mutations of the VH9–4RC and VH6–3RC mAbs to germline sequences greatly reduced their reactivity to these glycans (Fig. 3e, f), implicating affinity maturation of these antibodies in mouse PP GCs. Finally, replacing the IgH CDR3 of the mutation-reverted VH6–3RC antibody with a CDR3 of the same length that utilizes the same VH and JH but differs at 50% of its amino acid positions (TGPRGFAY vs TDPAWFPY) completely abolished its glycan binding, highlighting the importance of the CDR3 sequence per se (Fig. 3g). For this antibody, it is possible that, as for many antibodies, the IgL sequence per se is not a key factor for binding40,41, or that the IgL identified by scRNA-seq actually fulfills this function.
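The "50% difference" between the two CDR3s can be checked with a position-by-position comparison of the two equal-length amino acid sequences, i.e. a Hamming distance expressed as a percentage. A minimal Python sketch:

```python
def percent_aa_difference(cdr3_a: str, cdr3_b: str) -> float:
    """Percentage of positions at which two equal-length CDR3s differ."""
    if len(cdr3_a) != len(cdr3_b):
        raise ValueError("CDR3s must have the same length")
    mismatches = sum(a != b for a, b in zip(cdr3_a, cdr3_b))
    return 100.0 * mismatches / len(cdr3_a)


print(percent_aa_difference("TGPRGFAY", "TDPAWFPY"))  # 50.0 (4 of 8 positions)
```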

AID-deficient mice, which do not undergo variable region exon SHM or BCR affinity maturation, nevertheless form spontaneous splenic GCs, which are thought to expand B cells with primary BCRs in response to aberrantly expanded gut microbiota acting system-wide42,43. To assess whether recurrent clonotypes can also appear in non-gut lymphoid tissues in response to the complex gut microbiome, we assayed spontaneous splenic GCs in AID-deficient mice and, indeed, found public clonotypes (Extended Data Fig. 9a). One such clonotype (VH1–81RC, brown) is shared between splenic and PP GCs in the AID-deficient mice (Extended Data Fig. 9a, b), indicating potential selection by the same antigen. Thus, selection of public clonotypes within induced splenic GCs applies to complex gut antigens as well as to simple hapten antigens (e.g. NP-CGG, Extended Data Fig. 2). Note that the clonotypes enriched in the hyper-expanded PP GCs42,44 (Extended Data Fig. 5a–c) of AID-deficient SPF mice were distinct from those of WT SPF mice (Fig. 1, Extended Data Fig. 9b, Supplementary Table 2), which, among other possibilities, might result from hyper-expanded gut flora with altered bacterial composition42,44.

Sequence analyses revealed that 8 of the 9 recurrent PP GC clonotypes found in SPF mice, and a subset of those in GF mice with or without fecal transfer from SFB-mono or SPF mice, were each associated with a canonical CDR3 in multiple mice (Fig. 4a, Extended Data Fig. 4, 5d, f, h, Extended Data Table 1). For example, the CDR3 of VH1–12RC was found in 32 of the 43 mice analyzed across all categories, and that of VH1–47RC in 14 of 43 (Fig. 4a). Other canonical CDR3s occurred in the mouse categories in which the corresponding clonotype was found (Fig. 4a). The small remaining fraction of non-canonical CDR3s within each clonotype represents junctions with the same VHD or DJH junction and occasional mismatches, many of which do not involve “N” or “P” regions (Supplementary Table 3) and could be explained by SHMs in the canonical CDR3 sequences. Given that the theoretical number of mouse CDR3s that can be generated by V(D)J junctional diversification exceeds mouse naïve B cell numbers by several orders of magnitude5–7, the question arises as to how the same CDR3 nucleotide sequences occurred in PP GCs of many different mice. To address this question, we used IGoR45 to estimate the intrinsic generation probability (Pgen) of the recurrent CDR3s. The Pgens of the 8 recurrent PP GC CDR3s are significantly higher than the average Pgen of the total CDR3 repertoire (Fig. 4b), and each is also among the highest of all possible CDR3s encoding the same amino acid sequence (Fig. 4c). These findings suggest that the recurrent PP GC CDR3s belong to a subset of CDR3s over-represented in primary repertoires due to biases in junctional diversification8–13,46, which, in theory, could reflect evolution of the mouse V(D)J recombination machinery to generate antibodies recognizing common gut antigens.
Thus, the recurrence of PP GC CDR3s likely results from the combined influences of strong antigen-dependent BCR selection acting on a set of V(D)J exon sequences that, owing to junctional diversification bias, occur in primary repertoires at higher than predicted frequencies.
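Comparing a canonical CDR3 with "all possible CDR3s encoding the same amino acid sequence" requires enumerating back-translations. In general, the number of nucleotide sequences encoding a given amino acid sequence is the product of the number of codons for each residue. The Python sketch below enumerates such sequences with the standard genetic code; scoring each one with IGoR, as described in Methods, is not shown, and the example peptides are purely illustrative.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS[aa].append(codon)


def count_back_translations(aa_seq: str) -> int:
    """Number of nucleotide sequences encoding aa_seq."""
    return prod(len(AA_TO_CODONS[aa]) for aa in aa_seq)


def back_translate(aa_seq: str):
    """Yield every nucleotide sequence encoding aa_seq (can be very many)."""
    for codons in product(*(AA_TO_CODONS[aa] for aa in aa_seq)):
        yield "".join(codons)


print(count_back_translations("FAY"))  # 2 * 4 * 2 = 16
print(list(back_translate("MW")))      # ['ATGTGG']
```

Each enumerated sequence would then be placed back into the clonotype's VDJ sequence and scored, and the observed CDR3 ranked within the resulting Pgen distribution.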

Fig. 4 |. Intrinsic junctional biases contribute to PP GC clonotype recurrence.


a, Table showing the observed recurrence of each PP GC clonotype and its canonical CDR3 in the mouse categories found with each clonotype. b, Violin plot comparing Pgen distribution of the 8 recurrent PP GC CDR3s with that of 578,654 CDR3s from sequenced naïve B cells. P-value is calculated by two-sided Mann-Whitney U test. c, Violin plot showing Pgen distribution of all possible CDR3 sequences encoding the same amino acids (AA) sequence as the canonical CDR3 for each PP GC recurrent clonotype. The number of CDR3 sequences for each plotted clonotype is: 3072 (VH1–12RC), 147456 (VH1–47RC), 6144 (VH9–4RC), 3072 (VH11–2RC), 2048 (VH6–3RC), 1536 (VH6–6RC), 9216 (VH1–11RC), 221184 (VH2–9RC). Red dot indicates the location of the canonical CDR3 among each Pgen distribution. Box plots are presented with median, upper and lower quartiles and whiskers showing 1.5x interquartile range.

The nature of homeostatic PP GC Ig repertoires, whether BCR selection is involved in their formation, and whether they undergo affinity maturation have been intriguing questions4 that we have now addressed (Extended Data Fig. 9c). Chronic PP GCs of C57BL/6 mice recurrently express a set of clonotype-specific antibodies, mostly with canonical CDR3 sequences, that are generated more frequently than expected in the naïve B cell repertoire. These antibodies are selected by gut microbial or non-microbial antigens to seed PP GCs, where they undergo affinity maturation. The highly selected PP GC antibodies could contribute to host-microbiome symbiosis by targeting specific bacterial populations, and could confer protective capacity against pathogens through cross-reactivity with surface glycans of microbiota.

Methods

Mice, immunization and cell lines

All mice used in this study are on the C57BL/6 background except the human VH1–2 mouse model, which was on a mixed 129/Sv and C57BL/6 genetic background. WT specific-pathogen-free (SPF) mice were purchased from Charles River Laboratories International and maintained in SPF conditions. AID−/− mice were generated in the Alt lab in previous studies and maintained in SPF conditions in the animal facility of Boston Children’s Hospital. All SPF mice carried gut SFB, as verified by 16S rRNA gene quantitative PCR analysis of fecal bacterial genomic DNA as described32. The human VH1–2 mouse model was described previously5. Germ-free (GF) mice were maintained in germ-free isolators in the Littman lab in the animal facility of the Skirball Institute of the NYU School of Medicine. For SFB-mono or SPF fecal transfer, 5-week-old GF mice were orally gavaged with SFB-mono or SPF mouse feces and kept in gnotobiotic iso-cages for 3 weeks. All animal experiments were performed under protocols approved by the Institutional Animal Care and Use Committee of Boston Children’s Hospital and New York University School of Medicine, in compliance with all relevant ethical regulations. For NP-CGG immunization, WT mice aged 8–12 weeks were immunized intraperitoneally with 100 μg of NP-CGG (N-5055A, Biosearch Technologies) in 100 μl PBS mixed with 100 μl of Imject® Alum (Thermo Scientific). Mice were sacrificed on day 10 after NP-CGG immunization. The mouse Cer/Sis-deleted v-Abl pro-B cell line was made by CRISPR/Cas9-mediated targeting of the Cer/Sis elements, which lie in the Vκ–Jκ intervening region, within an ATM-deficient, v-Abl kinase-transformed pro-B cell line that contains an Eμ-Bcl2 transgene, as described previously47. Cer/Sis deletion in the v-Abl pro-B cell line was authenticated by Southern blotting for the genomic modification. 
The Cer/Sis-deleted v-Abl pro-B cells were cultured in RPMI medium containing 15% (v/v) FBS, and were treated with 3 μM STI-571 for 4 days to induce V(D)J recombination prior to genomic DNA preparation26. The C57BL/6 ES cell line was derived from a wild-type C57BL/6 mouse. Cell lines were not tested for mycoplasma contamination.

B cell isolation from mouse spleen and PPs

Spleens and PPs were dissected from 8–12-week-old mice, prepared as single-cell suspensions and purified with the EasySep® Negative Selection B cell Enrichment Kit (Stem Cell Technologies) according to the manufacturer’s protocol. Purified B cells were stained with anti-B220-PE (1:2000) (eBiosciences), anti-CD38-APC (1:200) (eBiosciences) and anti-GL7-FITC (1:200) (Biolegend). GC (B220+GL7+CD38−) and non-GC (B220+GL7−CD38+) B cells were sorted from the same sample by fluorescence-activated cell sorting (FACS) in the Department of Hematology/Oncology flow cytometry research facility at Boston Children’s Hospital. Genomic DNA from sorted cells was prepared using a DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer’s protocol. 4 to 18 independent mice were analyzed for each category, as indicated in the respective figures.

Rep-SHM-Seq methodology

Rep-SHM-Seq was derived from HTGTS-Rep-seq24 with several modifications. To capture full-length V(D)J sequences in recovered junctions for SHM analysis, we designed bait primers close to the coding ends of JHs and JLs and used Illumina MiSeq 2 × 300-bp paired-end sequencing. Mixed JH or JL primers were designed from a highly degenerate region to minimize amplification biases caused by different primers. To test for such biases, we employed a mouse model in which human VH1–2 (hVH1–2) replaces the mouse VH81X, combined with an IGCR1 deletion, so that hVH1–2 accounts for half of the VH usage in the mature splenic B cell repertoire25. From splenic B cells of these hVH1–2-replaced mice, we made a library with an hVH1–2-specific bait to determine the ratio of each JH in hVH1–2DJ junctions, compared these ratios with those of libraries made with mixed JH1–4 baits, and found a good correlation (r = 0.96). For Igκ repertoires, we employed a v-Abl kinase-transformed pro-B cell line with the Cer and Sis elements deleted, so that proximal Vκ3 family genes are highly utilized48. We made libraries using a Vκ3 family degenerate bait (Vκ3d, annealing to Vκ3–2, 7, 10 and 12) and mixed Jκ1, 2, 4, 5 baits. The Jκ ratios in Vκ3–2,7,10,12Jκ junctions from the two sets of baits were strongly correlated (r = 0.91). For Igλ repertoires, we used a Vλ1,2 degenerate bait and mixed Jλ1,2,3 baits to compare Jλ ratios in Vλ1,2Jλ junctions, which were also strongly correlated (r = 0.88). Mixed Jκ and Jλ baits were further optimized using WT mouse splenic naïve mature B cells to give a κ:λ ratio of 94%:6%, consistent with prior findings49,50. These bait primers are listed in Supplementary Table 6.

The bioinformatics pipeline for Rep-SHM-Seq was constructed with three modules: run, mut and clonal. The “run” module was used to preprocess MiSeq reads and assign V, D and J segments, similar to the HTGTS-Rep-seq24 pipeline, but with more stringent filters: only reads in which more than 98% of bases had a sequencing Quality Score ≥20 were kept, and only joined reads (R1 and R2 overlapping by ≥10 bp with a mismatch rate ≤8%) were kept. Reads passing these filters were assigned to V, D and J segments using IgBLAST software51 with default parameters. A rearrangement was classified as productive when the V(D)J rearrangement is in frame and contains no stop codon; out-of-frame rearrangements and those with stop codons were classified as non-productive. Germline V, D and J gene sequences were obtained from the IMGT database52, manually curated, and used to generate IgBLAST sequence databases. We applied additional stringency filters to reads aligning to V, D and J segments (IgBLAST score >150, total alignment length >100, overall mismatch ratio <0.1). The “mut” module was used to identify V segment mutations and generate SHM profiles. Only reads covering more than 50% of the V length were kept for SHM profile analysis. Mutations were identified by comparing each read with the inferred germline sequence of its V segment, and the mutation frequency at each nucleotide position was calculated to profile SHM across the whole V exon. SHM profiles for productive and non-productive sequences were generated separately. The “clonal” module was used to cluster clonotypes based on CDR3 sequences using a previously described method53. Reads with the same assigned V and J segments and the same CDR3 junction length were grouped. Within each group, reads were hierarchically clustered by single linkage, using the Hamming distance between their junction sequences (distance threshold = 0.1, corresponding to >90% CDR3 sequence identity).
The number of clonotypes and input cells for each GC sample is summarized in Supplementary Table 7.
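The clonal clustering step can be sketched in Python as follows. This is a hypothetical re-implementation for illustration, not the code of the cited method53: reads are grouped by (V, J, junction length), and within each group any two junctions whose normalized Hamming distance is below 0.1 are linked, with single linkage realized as connected components through a small union-find. Reading ">90% identity" as a strict inequality is an assumption.

```python
from collections import defaultdict


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between equal-length junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_clonotypes(reads, threshold=0.1):
    """Single-linkage clonotype clustering.

    reads: list of (v_segment, j_segment, junction_nt) tuples.
    Returns a list of clusters, each a list of read indices.
    """
    groups = defaultdict(list)
    for idx, (v, j, junction) in enumerate(reads):
        groups[(v, j, len(junction))].append(idx)

    clusters = []
    for members in groups.values():
        parent = {m: m for m in members}
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                # Link reads whose junctions share >90% identity
                if hamming_fraction(reads[a][2], reads[b][2]) < threshold:
                    parent[find(parent, a)] = find(parent, b)
        by_root = defaultdict(list)
        for m in members:
            by_root[find(parent, m)].append(m)
        clusters.extend(by_root.values())
    return clusters


reads = [
    ("VH1-12", "JH2", "A" * 20),
    ("VH1-12", "JH2", "A" * 19 + "T"),  # 1/20 mismatches -> same clonotype
    ("VH1-12", "JH2", "T" * 20),        # too divergent -> separate clonotype
    ("VH1-47", "JH2", "A" * 20),        # different V -> separate clonotype
]
print(sorted(sorted(c) for c in cluster_clonotypes(reads)))  # [[0, 1], [2], [3]]
```

Because linkage is single, two junctions more than 10% apart can still share a clonotype if intermediate junctions bridge them.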

Library preparation for Rep-SHM-Seq and data analysis

All libraries were made using 400 ng genomic DNA as starting material, as described24, so that the level of PCR amplification was kept the same among samples. When less than 400 ng genomic DNA was obtained from GC or non-GC B cell samples, genomic DNA from JM8a3, a C57BL/6 ES cell line, was added to keep the starting template amount consistent among samples. Library preparation was mostly performed as previously described24,54, with a few modifications in the linear amplification-mediated PCR step: we used one 50-μl reaction instead of 8 reactions and performed 100 cycles instead of 80.

To obtain purer populations of GC and naïve B cells for analysis and to minimize potential cross-contamination during FACS sorting, we further filtered sequencing reads, keeping mutated reads from B220+GL7+CD38− samples as GC B cells and non-mutated reads from B220+GL7−CD38+ samples as naïve B cells. Duplicate junctions were included in the analyses as previously described1,24.

V segment usage analysis

For each sample, the ratio of each functional V segment was calculated as % usage among productive V(D)J junctions. For GC vs naïve repertoire comparisons, V segment usages from multiple samples were plotted in the same chart above and below the x-axis along chromosome coordinates. A complete list of functional VH and VL segments, in the order plotted along the x-axis, is shown in Supplementary Table 8. To minimize variance differences caused by differences in library size, each library was normalized by random sampling down to the size of the smallest library within each set (Supplementary Table 9). Significantly enriched V segments in GCs were identified by two-sided Mann-Whitney U test and unpaired Student’s t-test, with multiple-testing correction by FDR (p-values shown in Supplementary Table 4). Similar patterns and degrees of significance were obtained without normalization.
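The normalization and multiple-testing steps can be sketched in Python as below. The implementation details are assumptions, since the text does not specify them: libraries are down-sampled without replacement to the size of the smallest library using a fixed random seed, and FDR correction is taken to mean the Benjamini–Hochberg procedure. The per-V p-values themselves could come from a two-sided Mann-Whitney U test (for example `scipy.stats.mannwhitneyu`), which is not included here.

```python
import random


def normalize_libraries(libraries, seed=0):
    """Down-sample every library to the size of the smallest one.

    libraries: {sample_name: list of productive junction reads}.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    smallest = min(len(reads) for reads in libraries.values())
    return {name: rng.sample(reads, smallest) for name, reads in libraries.items()}


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


libs = {"GC_mouse1": list(range(5000)), "GC_mouse2": list(range(3200))}
print({k: len(v) for k, v in normalize_libraries(libs).items()})
# {'GC_mouse1': 3200, 'GC_mouse2': 3200}
print(benjamini_hochberg([0.01, 0.04, 0.03]))  # approximately [0.03, 0.04, 0.04]
```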

Recurrent clonotype and CDR3 analysis

To identify recurrent GC clonotypes across different samples, we pooled the GC productive reads from the normalized libraries and carried out clonal clustering53,55. In our study, a clonotype is considered “enriched” in a sample only if it makes up more than 0.3% of all productive junctions, and “recurrent” only if it is enriched in more than one mouse. The consensus CDR3 DNA sequence plot for each clonotype was generated by WebLogo as previously described56. To determine the frequency of the identified CDR3s in the naïve repertoire, we used IGoR as a background model45. We trained IGoR on the VDJ sequences of naïve B cells; the log likelihood converged after six expectation-maximization (EM) iterations. We then predicted the generation probability (Pgen) of each naïve B cell VDJ sequence, as well as of each recurrent CDR3 by inputting its VDJ sequence. We compared the Pgen distribution of the 8 recurrent PP GC CDR3s with that of all naïve B cell VDJ sequences by two-sided Mann-Whitney U test, and visualized the distributions as violin plots generated with ggplot257. We also generated all possible ‘back-translated’ nucleotide sequences that encode the same amino acid sequence as each recurrent CDR3, inserted each back into the VDJ sequence, and predicted its Pgen. We then examined where the Pgen of each observed recurrent CDR3 falls within the Pgen distribution of its ‘back-translated’ sequences.
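The counting rules and the back-translation step can be sketched as below. This assumes one sample per mouse and per-mouse counts of productive junctions per clonotype; the input layout and all function names are hypothetical. Back-translation enumerates every codon combination under the standard genetic code. Scoring each back-translated sequence with IGoR is not shown, and `percentile_rank` simply reports where an observed Pgen falls among Pgen values supplied by the caller.

```python
from itertools import product
from typing import Dict, Iterator, List

ENRICHED_FRACTION = 0.003  # "enriched": > 0.3% of productive junctions in a sample

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}
CODONS_FOR: Dict[str, List[str]] = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def recurrent_clonotypes(counts_by_mouse: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """Return clonotypes enriched (> 0.3%) in more than one mouse, with those mice."""
    enriched_in: Dict[str, set] = {}
    for mouse, counts in counts_by_mouse.items():
        total = sum(counts.values())
        for clone, n in counts.items():
            if total > 0 and n / total > ENRICHED_FRACTION:
                enriched_in.setdefault(clone, set()).add(mouse)
    return {c: sorted(ms) for c, ms in enriched_in.items() if len(ms) > 1}


def translate(nt: str) -> str:
    """Translate in frame 1, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def back_translate(aa_seq: str) -> Iterator[str]:
    """Yield every nucleotide sequence that encodes aa_seq (standard code)."""
    for codons in product(*(CODONS_FOR[aa] for aa in aa_seq)):
        yield "".join(codons)


def percentile_rank(observed: float, background: List[float]) -> float:
    """Percent of background values that are <= the observed value."""
    return 100.0 * sum(v <= observed for v in background) / len(background)


variants = list(back_translate("AREGFAY"))
print(len(variants))                        # 3072
print("GCAAGAGAGGGGTTTGCTTAC" in variants)  # True
```

The count is the product of codon degeneracies (A 4, R 6, E 2, G 4, F 2, A 4, Y 2), so longer CDR3s grow quickly; the generator avoids holding them all in memory when they are streamed to a scorer.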

SHM analysis with hierarchical Bayesian model

To compare SHM rates at each nucleotide site of the V gene between productive sequences in a clonotype and the intrinsic background from non-productive alleles, we developed a hierarchical Bayesian model that accounts for variable read depth among mouse samples.

For a given nucleotide site, let θi denote the mutation rate in mouse i (i = 1, …, m for m biological replicates in one group). The observed xi mutant reads among the ni total reads covering the site are modeled as a sample from a binomial distribution with parameter θi. The mutation rates θi of the m biological replicates are modeled as samples from a shared beta prior with parameters α and β. The Beta(α, β) distribution can be re-parameterized as η = α + β and μ = α / (α + β), where μ is the mean of the distribution and represents the average mutation rate for the whole group of biological replicates. The full model specification is:

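$$
\begin{aligned}
\mu &\sim \mathrm{Uniform}(0, 1) \\
\eta &\sim \mathrm{Gamma}(1, 1) \\
\alpha &= \mu\eta, \qquad \beta = (1 - \mu)\eta \\
\theta_i &\sim \mathrm{Beta}(\alpha, \beta), \qquad i = 1, \ldots, m \\
x_i &\sim \mathrm{Binomial}(n_i, \theta_i)
\end{aligned}
$$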
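To make the structure of the model concrete, the sketch below draws one synthetic data set from it by forward simulation, using only Python’s standard library. It illustrates the generative process only; it is not the posterior inference, which was run in JAGS. Gamma(1, 1) is read here as shape 1 and rate 1 (equivalently scale 1). The function name and the read depths in the example are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def simulate_site(n_reads: List[int], seed: Optional[int] = None) -> Tuple[float, float, List[float], List[int]]:
    """Forward-simulate one nucleotide site for m mice with read depths n_reads."""
    rng = random.Random(seed)
    mu = 0.0
    while mu == 0.0:          # mu ~ Uniform(0, 1); redraw an exact 0 so that alpha > 0
        mu = rng.random()
    eta = 0.0
    while eta == 0.0:         # eta ~ Gamma(shape=1, scale=1); redraw an exact 0 so alpha, beta > 0
        eta = rng.gammavariate(1.0, 1.0)
    alpha, beta = mu * eta, (1.0 - mu) * eta
    thetas = [rng.betavariate(alpha, beta) for _ in n_reads]   # theta_i ~ Beta(alpha, beta)
    xs = [sum(rng.random() < theta for _ in range(n))         # x_i ~ Binomial(n_i, theta_i)
          for theta, n in zip(thetas, n_reads)]
    return mu, eta, thetas, xs


mu, eta, thetas, xs = simulate_site([50, 120, 80], seed=1)
print(len(xs))  # 3
```

In JAGS the corresponding distributions are `dunif`, `dgamma` (shape, rate), `dbeta` and `dbin`, where `dbin` takes the success probability first and the number of trials second.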

Given the data, we used JAGS58 to generate Markov chain Monte Carlo (MCMC) samples from the posterior distribution of the parameters. We ran two parallel chains, each for 100,000 iterations after 60,000 burn-in iterations. To compare the SHM profile of a recurrent productive clonotype with that of the intrinsic non-productive allele, we generated posterior samples of μ and μintrinsic, calculated μdiff = μ − μintrinsic, and computed the posterior probability P(μdiff > Δ), with the biological effect size Δ set to 0.1. We reported as significant those sites with P(μdiff > 0.1) > 0.95 or, equivalently, a posterior error probability PEP(μdiff > 0.1) = 1 − P(μdiff > 0.1) < 0.05. To account for variation in SHM levels among mice, sequences from all samples were stratified by overall VH mutation rate19, and only those with a mutation rate lower than 2.5% were used for the analysis.
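The site-level decision rule can be sketched as follows, assuming posterior draws of μ (from the clonotype model) and μintrinsic (from the intrinsic model) are already available as equal-length lists, for example exported from the JAGS chains. Pairing the i-th draws of the two fitted models to form μdiff is our assumption about how the difference was formed. The stratification helper keeps sequences whose overall VH mutation rate is below 2.5%. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def site_call(mu_draws: Sequence[float], mu_intrinsic_draws: Sequence[float],
              delta: float = 0.1, pep_cutoff: float = 0.05) -> Tuple[float, float, bool]:
    """Return (P(mu_diff > delta), PEP, significant) from posterior draws."""
    if len(mu_draws) != len(mu_intrinsic_draws) or not mu_draws:
        raise ValueError("need equal, non-zero numbers of posterior draws")
    diffs = [a - b for a, b in zip(mu_draws, mu_intrinsic_draws)]
    p_exceed = sum(d > delta for d in diffs) / len(diffs)   # fraction of draws with mu_diff > delta
    pep = 1.0 - p_exceed                                    # posterior error probability
    return p_exceed, pep, pep < pep_cutoff


def low_mutation_stratum(seqs: List[Tuple[str, float]], max_rate: float = 0.025) -> List[str]:
    """Keep sequence IDs whose overall VH mutation rate is below max_rate (2.5%)."""
    return [seq_id for seq_id, rate in seqs if rate < max_rate]


p, pep, sig = site_call([0.3] * 96 + [0.15] * 4, [0.1] * 100)
print(round(p, 2), round(pep, 2), sig)  # 0.96 0.04 True
```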

Heavy-light chain pairing inference

To infer likely pairings between recurrent heavy-chain and light-chain clonotypes, we examined the usage percentage of each clonotype in each sample and, for each heavy-chain clonotype, ranked light-chain clonotypes by a similarity measure of usage percentage across $m$ samples, defined as follows. Denote the usage percentage of a heavy-chain clonotype as an $m$-dimensional vector $X = [x_1, x_2, \ldots, x_m]$, with average $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$ and fluctuation vector $\Delta X = X - \bar{x}$. Similarly, denote the usage vector of a light-chain clonotype as $Y = [y_1, y_2, \ldots, y_m]$, with average $\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$ and fluctuation vector $\Delta Y = Y - \bar{y}$. The similarity measure is defined as $\mathrm{cos\_similarity}(\Delta X, \Delta Y)\,|X|\,|Y|$, where the cosine similarity, which measures how closely the two fluctuation vectors point in the same direction, is $\mathrm{cos\_similarity}(\Delta X, \Delta Y) = (\Delta X \cdot \Delta Y) / (|\Delta X|\,|\Delta Y|)$, and the vector length is $|X| = \sqrt{\sum_{i=1}^{m} x_i^2}$. We multiplied the cosine similarity by the lengths of the usage vectors because pairings between clonotypes with higher usage are more reliable. To retain only confident pairings, we reported only those with a similarity measure larger than $10^{-3}$.
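The pairing score translates directly into code. The sketch below follows the definition above: the cosine similarity of the mean-centred usage vectors, multiplied by the lengths of the raw usage vectors, then filtered at 10⁻³ and ranked. Returning 0 when either fluctuation vector is all zeros (constant usage across mice, where the cosine is undefined) is our assumption, as are the function names and example usage values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(t * t for t in v))


def pairing_score(x: Sequence[float], y: Sequence[float]) -> float:
    """cos_similarity(dX, dY) * |X| * |Y| for heavy (x) and light (y) usage vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("usage vectors must have the same non-zero length")
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    dx = [xi - x_bar for xi in x]          # fluctuation vector dX = X - mean(X)
    dy = [yi - y_bar for yi in y]          # fluctuation vector dY = Y - mean(Y)
    denom = _norm(dx) * _norm(dy)
    if denom == 0.0:                       # assumption: undefined cosine scored as 0
        return 0.0
    cos = sum(a * b for a, b in zip(dx, dy)) / denom
    return cos * _norm(x) * _norm(y)


def rank_light_chains(heavy: Sequence[float], lights: Dict[str, Sequence[float]],
                      threshold: float = 1e-3) -> List[Tuple[str, float]]:
    """Light-chain clonotypes scoring above threshold, best first."""
    scores = [(name, pairing_score(heavy, y)) for name, y in lights.items()]
    return sorted((s for s in scores if s[1] > threshold), key=lambda s: s[1], reverse=True)


ranked = rank_light_chains([1.0, 2.0, 3.0], {"IgK_a": [2.0, 4.0, 6.0], "IgK_b": [3.0, 2.0, 1.0]})
print([(name, round(score, 6)) for name, score in ranked])  # [('IgK_a', 28.0)]
```

Here `IgK_a` co-varies perfectly with the heavy chain (cosine 1, score √14·√56 = 28), whereas `IgK_b` is anti-correlated (score −14) and falls below the threshold.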

Single cell RNA sequencing and data analysis

Droplet-based scRNA-seq datasets were produced using a 10x Genomics Chromium system. B cells isolated from PP GCs were sorted into 80% methanol/PBS, and 15,000 cells were loaded onto the Chromium™ Controller instrument following the manufacturer’s recommendations. Cells were partitioned into Gel Beads in Emulsion in the Chromium™ Controller, where cell lysis and barcoded reverse transcription of RNA occurred. Libraries were prepared using 10x Genomics Library Kits and sequenced on an Illumina NextSeq500 according to the manufacturer’s recommendations. Raw sequencing files were aligned to the mouse V(D)J sequences using the Cell Ranger V(D)J pipeline (10x Genomics), and V usage and clonotype profiles were generated and visualized with the Loupe V(D)J Browser. Because GC B cells have low viability after isolation in vitro, and viability is critical for cell recovery according to the 10x Genomics protocol, we were able to recover only 600–1,200 cells per sample. This low cell recovery and the low scRNA-seq yield for GC B cells may introduce some variability into the profiled GC BCR repertoire.

Production of monoclonal antibodies (mAbs)

Heavy chain sequences and their paired light chain sequences of the recurrent PP GC antibodies were determined by Rep-SHM-Seq and confirmed by scRNA-seq. gBlock gene fragments containing the heavy and light chain sequences, with or without mutations (Supplementary Table 5), together with human constant region sequences (IgG1, Igκ) and a C-terminal 6xHis tag on the heavy chain, were synthesized and cloned into the pcDNA3.1+ plasmid. Antibodies were produced using the Expi293 expression system (ThermoFisher Scientific, Waltham, MA) according to the product manual. Briefly, cell culture supernatants were harvested 5 days after transfection, cleared of cells by centrifugation at 3,000 × g for 15 min, and purified by HPLC coupled with HisTrap HP histidine-tagged protein purification columns (GE Healthcare Life Science, Chicago, IL, USA). mAbs were dialyzed using SnakeSkin dialysis tubing (10k MWCO, 22 mm) and stored at 4°C in PBS. The concentrations of the purified mAbs were determined by ELISA with goat anti-human IgG (Southern Biotech).

Microbial glycan microarrays

mAbs were diluted to 5 μg/ml and assayed on the microbial glycan microarray by the Consortium for Functional Glycomics at the Protein-Glycan Interaction Core (CFG) and the National Center for Functional Glycomics at Beth Israel Deaconess Medical Center in Boston, MA. Raw data are publicly available under accession numbers CFG_3624 and CFG_3645 (www.functionalglycomics.org). A human anti-PD1 mAb was used as the negative control for this assay, and signals for all samples were quantified as fold change over the anti-PD1 mAb.
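The quantification step reduces to a ratio against the negative-control antibody. A minimal sketch, assuming per-glycan signals are available for each technical replicate of the test mAb and of the anti-PD1 control, and that fold change is computed per replicate and then averaged. This order of operations is our reading of the “average fold enrichment … of 4 technical repeats” in Extended Data Fig. 6c; the Methods do not state it explicitly. Function name and numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mean_fold_change(test_signal: Sequence[float], control_signal: Sequence[float]) -> float:
    """Average of per-replicate fold changes of a test mAb over the negative-control mAb."""
    if len(test_signal) != len(control_signal) or not test_signal:
        raise ValueError("need matched, non-empty technical replicates")
    if any(c <= 0 for c in control_signal):
        raise ValueError("control signals must be positive")
    return mean(t / c for t, c in zip(test_signal, control_signal))


# Hypothetical signals for one glycan, 4 technical replicates each.
print(mean_fold_change([800, 1200, 1000, 1000], [100, 100, 200, 100]))  # 8.75
```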

Other statistical analyses

No statistical methods were used to predetermine sample size. All samples were randomly selected, and researchers were blinded whenever possible. Fisher’s exact test and Spearman’s correlation coefficient were used as indicated in the figure legends. Statistical tests were chosen with appropriate underlying assumptions regarding data distribution and variance. Statistical analyses were performed using R 3.5.1 and Prism (version 6, GraphPad Software).

Data availability

The next-generation sequencing data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database under the accession number GSE140795. All figures have associated raw data deposited.

Code availability

The computational pipeline of Rep-SHM-Seq and code for statistical analyses tools used in this study are available at https://github.com/Yyx2626/HTGTSrep.

Extended Data

Extended Data Fig. 1 |. Overview of Rep-SHM-Seq.

a, Schematic showing the experimental procedure of the Rep-SHM-Seq method. P, productive; NP, non-productive; MID, multiplex identifier. b, Diagram of the IgH locus in the hVH1–2 mouse model. The proportions of each JH in hVH1–2DJ junctions are compared between libraries made from splenic B cells with the hVH1–2 bait vs mixed JH1–4 bait primers. c, Diagram of the IgL locus of the v-Abl kinase-transformed pro-B cell line. The Jκ proportions in Vκ3–2,7,10,12Jκ junctions are compared between libraries made with Vκ3d bait and mixed Jκ1,2,4,5 bait primers. The Jλ proportions in Vλ1,2Jλ junctions are compared between libraries made with Vλd1,2 bait and mixed Jλ1,2,3 bait primers. Positions of bait primers are indicated. Three biological replicates are included for each set of primers. r, Spearman’s correlation coefficient. P-values are determined by two-sided Spearman’s correlation test. d, Schematic showing the bioinformatic pipeline of Rep-SHM-Seq. PE, paired-end; QC, quality control.

Extended Data Fig. 2 |. Rep-SHM-Seq detects NP-specific GC selection.

a-b, VH and VL repertoires of productive V(D)J junctions in splenic GC vs naïve B cells, plotted as mean ± SEM from 5 mice 10 days after IP immunization with NP-CGG. VH and VL segments are each ordered linearly by chromosomal coordinates, according to their relative proximity to the Ds or JLs. Vλs are plotted to the left of Vκs. The GC productive clonotype distribution of each indicated VH or VL is shown in a long-tail distribution plot, with the y-axis representing the fraction among all productive clonotypes for this VH or VL and the x-axis representing the top clonotypes (rank ordered). The dominant recurrent clonotypes are marked with the number of mice containing each clonotype. P-values for V usage are calculated by two-sided Mann-Whitney U test with FDR correction (*: p < 0.1, **: p < 0.05, ***: p < 0.01). See Supplementary Table 4 for exact p-values. Enrichment of VH14–3, the key VH in the BALB/c mouse NP response59,60, is detected here by Rep-SHM-Seq, with two recurrent clonotypes (Supplementary Table 2). c, Representative junctional structure of indicated common IgH clonotypes aligned with germline VH (red), D (blue) and JH (orange), with resected sequences shown in grey. Underline indicates microhomology. All possible segment assignments by IGoR, with probabilities, are shown in Supplementary Table 10. Canonical CDR3 sequences are shown in black, amino acid sequences in purple, and the clonotype consensus in sequence logos. The number of mice containing the canonical CDR3 sequence or the clonotype consensus is indicated in parentheses following each sequence. d, VH SHM profile of indicated clonotypes and the intrinsic pattern, plotted as mutation rate at each nucleotide. Sequences were stratified by overall VH mutation rate19 (see Methods). Data are presented as mean ± SEM from 5 mice. Significance is determined by hierarchical Bayesian modeling with PEP0.1 (see Methods; *: PEP0.1 < 0.05, **: PEP0.1 < 0.01, ***: PEP0.1 < 0.005). PEP0.1 values and amino acid changes are denoted for significantly enriched SHMs.

Extended Data Fig. 3 |. Stable composition of variable region exon segments usage in naïve B cells across tissues and mice.

VH (a), D, JH (b), VL (c) and JL (d) repertoires of productive V(D)J junctions in splenic vs PP naïve B cells plotted as mean ± SEM from the 5 NP-CGG IP immunized mice. r, Spearman’s correlation coefficient. P-values are determined by two-sided Spearman’s correlation test.

Extended Data Fig. 4 |. Representative junctional structure of indicated recurrent PP GC IgH clonotypes in SPF mice.

Junctional structures are aligned with germline VH (red), D (blue) and JH (orange), with resected sequences shown in grey. Underline indicates microhomology. All possible segment assignments by IGoR with probabilities are shown in Supplementary Table 10. For certain clonotypes, the D could not be accurately annotated because of the short, aligned D sequence. Canonical CDR3 sequences are shown in black, amino acid sequences in purple, and the clonotype consensus (CDR3 nucleotide sequences with same V and J segments, same CDR3 length, and more than 90% similarity) in sequence logo pictures. The number of mice containing the canonical CDR3 sequence or the clonotype consensus is indicated in parentheses following each sequence.

Extended Data Fig. 5 |. FACS results of PP GC B cells and recurrent PP GC IgH CDR3s in different gnotobiotic mouse categories.

a, FACS plots showing the representative proportion of GC (GL7+CD38−) cells among B220+ cells in PPs. FACS analysis was performed for 9 SPF mice, 10×3 GF mice, 5 GF+SFB mice, 10 GF+SPF mice and 15 AID−/− SPF mice; representative results are shown. b, Plot comparing %GC of PP B220+ cells across mouse categories. c, Plot comparing the number of sorted PP GC B cells per mouse across mouse categories. Data are presented as mean ± SEM from 9 SPF mice, 10×3 GF mice, 5 GF+SFB mice, 10 GF+SPF mice and 15 AID−/− SPF mice. P-values are calculated by two-sided Mann-Whitney U test. d, f, h, Representative junctional structure of common IgH clonotypes aligned with germline VH, D and JH for VH1–47RC and VH1–12RC in GF, GF+SFB and GF+SPF mice (d), VH1–64RC in GF+SFB mice (f) and VH6–3RC in GF+SPF mice (h). All possible segment assignments by IGoR, with probabilities, are shown in Supplementary Table 10. For certain clonotypes, the D could not be accurately annotated because the aligned D sequence is short. Canonical CDR3 sequences are shown in black, amino acid sequences in purple, and the clonotype consensus in sequence logos. The number of mice containing the canonical CDR3 sequence or the clonotype consensus is indicated in parentheses following each sequence. e and g, GC productive clonotype distribution of the indicated VH enriched in PP GC B cells vs naïve B cells in GF mice (e) and GF+SPF mice (g), shown in long-tail distribution plots, with the y-axis representing the fraction among all productive clonotypes for this VH and the x-axis representing the top clonotypes (rank ordered). The dominant recurrent clonotypes are marked with the number of mice containing each clonotype. Note that the VH5–17 recurrent clonotype in GF mice differs from the one in GF+SPF mice.

Extended Data Fig. 6 |. SHM selection in recurrent PP GC clonotypes.

a-b, VH SHM profile of indicated PP GC clonotypes and the intrinsic pattern, plotted as mutation rate at each nucleotide. Sequences were stratified by overall VH mutation rate19 (see Methods). Significance is determined by hierarchical Bayesian modeling with PEP0.1 (see Methods; *: PEP0.1 < 0.05, **: PEP0.1 < 0.01, ***: PEP0.1 < 0.005). PEP0.1 values and amino acid changes are denoted for significantly enriched SHMs. Data are presented as mean ± SEM from mice containing the indicated clonotype: 11 SPF mice, 9×3 GF mice, 4 GF+SFB mice and 8 GF+SPF mice (a); 3 GF+SFB mice and 4 GF+SPF mice (b). c, Microbial glycan microarray reactivities of the mutated VH1–12RC mAb. See full antibody sequences in Supplementary Table 5. Data are shown as average fold enrichment over the negative-control mAb (anti-human PD1) from 4 technical repeats.

Extended Data Fig. 7 |. Some recurrent PP GC clonotypes do not show selected SHMs.

VH SHM profile of indicated PP GC clonotypes and the intrinsic pattern, plotted as mutation rate at each nucleotide. Sequences were stratified by overall VH mutation rate19 (see Methods). Significance is determined by hierarchical Bayesian modeling with PEP0.1 (see Methods; ns: PEP0.1 ≥ 0.05). Data are presented as mean ± SEM from mice containing the indicated clonotype: 7 SPF mice, 10×3 GF mice, 4 GF+SFB mice and 6 GF+SPF mice (a); 4 SPF mice (b); 9×3 GF mice (c).

Extended Data Fig. 8 |. Recurrent PP GC IgH clonotypes are often associated with recurrent IgL clonotypes.

a, VL repertoire of productive VJ junctions in PP GC vs naïve B cells plotted as mean ± SEM from 9 SPF mice related to Fig. 1. GC productive clonotype distribution of each indicated VL is shown in long-tail distribution plot, with the y-axis representing fraction among all productive clonotypes for this VL and x-axis representing the top clonotypes (rank ordered). The dominant recurrent clonotypes are marked with the number of mice containing this clonotype. P-values for V usage are calculated by two-sided Mann-Whitney U test with FDR correction (*: p < 0.1, **: p < 0.05, ***: p < 0.01). See Supplementary Table 4 for exact p-values. b, Table showing paired IgL clonotype of recurrent PP GC IgH clonotypes, as inferred by Rep-SHM-Seq (See methods) from 9 SPF mice and/or detected by scRNA-seq from 4 SPF mice.

Extended Data Fig. 9 |. Recurrent clonotypes in chronic splenic and PP GCs in AID−/− mice.

a, VH repertoire of productive V(D)J junctions in splenic GC vs naïve B cells plotted as mean ± SEM from 9 AID−/− SPF mice. b, VH repertoire of productive V(D)J junctions in PP GC vs naïve B cells plotted as mean ± SEM from 15 AID−/− SPF mice. The GC productive clonotype distribution of each indicated VH is shown in a long-tail distribution plot, with the y-axis representing the fraction among all productive clonotypes for this VH and the x-axis representing the top clonotypes (rank ordered). The dominant recurrent clonotypes are marked with the number of mice containing each clonotype. Brown indicates the clonotype shared by splenic and PP GCs of AID−/− mice. Green indicates the clonotype also found in WT mice of different gnotobiotic categories, related to Fig. 1. P-values for V usage are calculated by two-sided Mann-Whitney U test with FDR correction (*: p < 0.1, **: p < 0.05, ***: p < 0.01). See Supplementary Table 4 for exact p-values. c, Schematic summarizing the main findings of the paper. Junctional biases during V(D)J recombination generate a diverse CDR3 repertoire for naïve B cells in PPs in which a set of CDR3s occurs at higher frequency; from these, gut microbial or non-microbial antigens select recurrent IgH clonotypes in multiple mice, mostly with canonical CDR3 sequences and recurrent IgL pairing. These recurrently selected antibodies contain selected SHMs, suggesting affinity maturation. The asterisk on VH6–3RC/Vκ2–109 indicates that this pairing was taken from scRNA-seq data to make the antibody in vitro and was not confirmed as a recurrent pairing by Rep-SHM-Seq. The frequencies of the BCRs/SHMs represented in this schematic do not correspond to their actual frequencies in the naïve or GC repertoire of PPs.

Extended Data Table 1 |. Table summarizing recurrent PP GC clonotypes found in GF (a), GF+SFB (b) and GF+SPF mice (c), related to Fig. 2.

The J segment used, CDR3 length, canonical CDR3 sequence in nucleotides or amino acids and the number of mice containing the clonotype or canonical CDR3 nucleotide sequence are indicated.

a, GF mice

| clonotype | JH | CDR3 length (nt) | canonical CDR3 (nt) | canonical CDR3 (aa) | clonotype in # mice | canonical CDR3 in # mice |
| --- | --- | --- | --- | --- | --- | --- |
| VH1-12RC | 3 | 21 | GCAAGAGAGGGGTTTGCTTAC | AREGFAY | 10 | 9 |
| VH1-12RC_GF | 2 | 21 | GCAAGAGAGGGCTTAGACTAC | AREGLDY | 5 | \ |
| VH1-47RC | 4 | 36 | GCAAGGGGGAGTAACTACGACTATGCTATGGACTAC | ARGSNYDYAMDY | 10 | 2 |
| VH1-47RC_GF1 | 4 | 33 | GCAAGGGGGAGTAACTACTATGCTATGGACTAC | ARGSNYYAMDY | 6 | \ |
| VH1-47RC_GF2 | 2 | 36 | GCAAGGGGGGGTAACTACGTGAACTACTTTGACTAC | ARGGNYVNYFDY | 6 | \ |
| VH1-72RC_GF1 | 2 | 30 | GCAAGATCGGACTATGGTAACTTTGACTAC | ARSDYGNFDY | 9 | \ |
| VH1-72RC_GF2 | 2 | 27 | GCAAGATCCGACTACTACTTTGACTAC | ARSDYYFDY | 4 | \ |
| VH5-17RC_GF | 3 | 24 | GCAAAACTGGCCTGGTTTGCTTAC | AKLAWFAY | 5 | a |
| VH2-9RC_GF1 | 4 | 39 | GCCAAACATGATGGTAACTACGACTATGCTATGGACTAC | AKHDGNYDYAMDY | 3 | \ |
| VH2-9RC_GF2 | 1 | 42 | GCCAAACATGAAGGTAACTTCGGGGACTGGTACTTCGATGTC | AKHEGNFGDWYFDV | 3 | 3 |

b, GF+SFB mice

| clonotype | JH | CDR3 length (nt) | canonical CDR3 (nt) | canonical CDR3 (aa) | clonotype in # mice | canonical CDR3 in # mice |
| --- | --- | --- | --- | --- | --- | --- |
| VH1-12RC | 3 | 21 | GCAAGAGAGGGGTTTGCTTAC | AREGFAY | 4 | 4 |
| VH1-47RC | 4 | 36 | GCAAGGGGGAGTAACTACGACTATGCTATGGACTAC | ARGSNYDYAMDY | 4 | 3 |
| VH1-64RC | 2 | 36 | GCAAGATGTACTACGGTAGTAGCCCCCTTTGACTAC | ARCTTVVAPFDY | 3 | 2 |

c, GF+SPF mice

| clonotype | JH | CDR3 length (nt) | canonical CDR3 (nt) | canonical CDR3 (aa) | clonotype in # mice | canonical CDR3 in # mice |
| --- | --- | --- | --- | --- | --- | --- |
| VH1-12RC | 3 | 21 | GCAAGAGAGGGGTTTGCTTAC | AREGFAY | 8 | 8 |
| VH1-12RC_GF | 2 | 21 | GCAAGAGAGGGGTTTGACTAC | AREGFDY | 6 | 4 |
| VH1-47RC | 4 | 36 | GCAAGGGGGAGTAACTACGACTATGCTATGGACTAC | ARGSNYDYAMDY | 6 | 6 |
| VH6-3RC | 3 | 24 | ACAGACCCGGCCTGGTTTCCTTAC | TDPAWFPY | 6 | 4 |
| VH1-26RC_GF+SPF | 1 | 24 | GCAAGATCTCGGTACTTCGATGTC | ARSRYFDV | 5 | 2 |
| VH5-17RC_GF+SPF | 4 | 30 | GCAAGGCCGTATTACTATGCTATGGACTAC | ARPYYYAMDY | 4 | 2 |
| VH1-64RC | 2 | 36 | GCAAGGCCGTATTACTATGCTATGGACTAC | ARTTTVVAPFDY | 4 | \ |
| VH1-31RC_GF+SPF | 4 | 48 | GCAAGGGCATATTACTACGGTAGTAGCTACAACTATGCTATGGACTAC | ARAYYYGSSYNYAMDY | 2 | 2 |

Supplementary Material

Acknowledgments

We thank Ming Tian for providing reagents and other Alt lab members for discussions and comments; the Protein-Glycan Interaction Resource of the CFG (supporting grant R24 GM098791) and the National Center for Functional Glycomics (NCFG) at Beth Israel Deaconess Medical Center, Harvard Medical School (supporting grant P41 GM103694) for their participation; the Department of Hematology/Oncology Flow Cytometry Research Facility at Boston Children’s Hospital for assistance with cell sorting; and Renchao Chen and Yi Zhang for assistance with scRNA-seq library preparation. This work was supported by NIH grant R01AI077595 to F.W.A., NIH grant R01DK103358 to D.R.L., and a grant from the NYU Colton Center for Autoimmunity to D.R.L. F.W.A. and D.R.L. are Howard Hughes Medical Institute Investigators. J.K.H. was supported by an NIH M.D./Ph.D. grant F30AI114179–01A1. H.C. is an NRSA Fellow (T32 AI07386) and was supported by a Leukemia and Lymphoma Society Fellow Award. M.X., C.L. and Z.B. were supported by Cancer Research Institute Fellow Awards. Y.Z. is supported by a Damon Runyon Fellowship Award. D.N. was supported by the Dana-Farber/Harvard Cancer Center Support Grant 5P30 CA006516.

Footnotes

The authors declare no competing interests.

Additional information

Extended Data and Supplementary information are available in the online version of this paper.

References

  • 1. Alt FW, Zhang Y, Meng F-L, Guo C & Schwer B Mechanisms of programmed DNA lesions and genomic instability in the immune system. Cell 152, 417–29 (2013).
  • 2. De Silva NS & Klein U Dynamics of B cells in germinal centres. Nat. Rev. Immunol. 15, 137–148 (2015).
  • 3. Victora GD et al. Germinal center dynamics revealed by multiphoton microscopy with a photoactivatable fluorescent reporter. Cell 143, 592–605 (2010).
  • 4. Reboldi A & Cyster JG Peyer’s patches: Organizing B-cell responses at the intestinal frontier. Immunol. Rev. 271, 230–245 (2016).
  • 5. Davis MM & Bjorkman PJ T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988).
  • 6. Janeway CJ, Travers P, Walport M & Shlomchik M Immunobiology: The Immune System in Health and Disease. Garland Science (2005).
  • 7. Osmond DG The turnover of B-cell populations. Immunol. Today 14, 34–37 (1993).
  • 8. Alt FW & Baltimore D Joining of immunoglobulin heavy chain gene segments: implications from a chromosome with evidence of three D-JH fusions. Proc. Natl. Acad. Sci. U. S. A. 79, 4118–4122 (1982).
  • 9. Feeney AJ Predominance of VH-D-JH junctions occurring at sites of short sequence homology results in limited junctional diversity in neonatal antibodies. J. Immunol. 149, 222–9 (1992).
  • 10. Komori T, Okada A, Stewart V & Alt FW Lack of N Regions in Antigen Receptor Variable Region Genes of TdT-Deficient Lymphocytes. Science 261, 1171–1175 (1993).
  • 11. Gilfillan S, Dierich A, Lemeur M, Benoist C & Mathis D Mice lacking TdT: mature animals with an immature lymphocyte repertoire. Science 261, 1175–8 (1993).
  • 12. Victor KD, Vu K & Feeney AJ Limited junctional diversity in kappa light chains. Junctional sequences from CD43+B220+ early B cell progenitors resemble those from peripheral B cells. J. Immunol. 152, 3467–75 (1994).
  • 13. Gu H, Förster I & Rajewsky K Sequence homologies, N sequence insertion and JH gene utilization in VHDJH joining: implications for the joining mechanism and the ontogenetic timing of Ly1 B cell and B-CLL progenitor generation. EMBO J. 9, 2133–40 (1990).
  • 14. Bergqvist P, Stensson A, Lycke NY & Bemark M T Cell-Independent IgA Class Switch Recombination Is Restricted to the GALT and Occurs Prior to Manifest Germinal Center Formation. J. Immunol. 184, 3545–3553 (2010).
  • 15. Bunker JJ et al. Innate and Adaptive Humoral Responses Coat Distinct Commensal Bacteria with Immunoglobulin A. Immunity 43, 541–553 (2015).
  • 16. Bunker JJ & Bendelac A IgA Responses to Microbiota. Immunity 49, (2018).
  • 17. Bunker JJ et al. Natural polyreactive IgA antibodies coat the intestinal microbiota. Science 358, 1–20 (2017).
  • 18. Casola S et al. B cell receptor signal strength determines B cell fate. Nat. Immunol. 5, 317–327 (2004).
  • 19. Yeap L, Hwang JK, Kepler TB, Wang JH & Alt FW Sequence-Intrinsic Mechanisms that Target AID Mutational Outcomes on Antibody Genes. Cell 163, 1–14 (2015).
  • 20. Reynaud CA, Anquez V, Grimal H & Weill JC A hyperconversion mechanism generates the chicken light chain preimmune repertoire. Cell 48, 379–388 (1987).
  • 21. Reynaud CA, Mackay CR, Müller RG & Weill JC Somatic generation of diversity in a mammalian primary lymphoid organ: The sheep ileal Peyer’s patches. Cell 64, 995–1005 (1991).
  • 22. Lanning D, Zhu X, Zhai S-KK & Knight KL Development of the antibody repertoire in rabbit: gut-associated lymphoid tissue, microbes, and selection. Immunol. Rev. 175, 214–228 (2000).
  • 23. Bergqvist P et al. Re-utilization of germinal centers in multiple Peyer’s patches results in highly synchronized, oligoclonal, and affinity-matured gut IgA responses. Mucosal Immunol. 6, 122–135 (2012).
  • 24. Lin SG et al. Highly sensitive and unbiased approach for elucidating antibody repertoires. Proc. Natl. Acad. Sci. 113, 7846–7851 (2016).
  • 25. Tian M et al. Induction of HIV Neutralizing Antibody Lineages in Mice with Diverse Precursor Repertoires. Cell 166, 1471–1484.e18 (2016).
  • 26. Jain S, Ba Z, Zhang Y, Dai HQ & Alt FW CTCF-Binding Elements Mediate Accessibility of RAG Substrates During Chromatin Scanning. Cell (2018). doi: 10.1016/j.cell.2018.04.035
  • 27. Hou XL, Wang L, Ding YL, Xie Q & Diao HY Current status and recent advances of next generation sequencing techniques in immunological repertoire. Genes Immun. 17, 153–164 (2016).
  • 28. Bothwell ALM et al. Heavy chain variable region contribution to the NPb family of antibodies: somatic mutation evident in a γ2a variable region. Cell 24, 625–637 (1981).
  • 29. Cumano A & Rajewsky K Structure of primary anti-(4-hydroxy-3-nitrophenyl)acetyl (NP) antibodies in normal and idiotypically suppressed C57BL/6 mice. Eur. J. Immunol. 15, 512–520 (1985).
  • 30. Jacob J, Przylepa J, Miller C & Kelsoe G In situ studies of the primary immune response to (4-hydroxy-3-nitrophenyl)acetyl. III. The kinetics of V region mutation and selection in germinal center B cells. J. Exp. Med. 178, 1293–307 (1993).
  • 31. Allen D, Simon T, Sablitzky F, Rajewsky K & Cumano A Antibody engineering for the analysis of affinity maturation of an anti-hapten response. EMBO J. 7, 1995–2001 (1988).
  • 32. Ivanov II et al. Induction of Intestinal Th17 Cells by Segmented Filamentous Bacteria. Cell 139, 485–498 (2009).
  • 33. Yang Y et al. Focused specificity of intestinal TH17 cells towards commensal bacterial antigens. Nature 510, 152–156 (2014).
  • 34. Talham GL, Jiang H, Bos NA & Cebra J Segmented Filamentous Bacteria Are Potent Stimuli of a Physiologically Normal State of the Murine Gut Mucosal Immune System. Infect. Immun. 67, 1992 (1999).
  • 35. Lécuyer E et al. Segmented filamentous bacterium uses secondary and tertiary lymphoid tissues to induce gut IgA and specific T helper 17 cell responses. Immunity 40, 608–620 (2014).
  • 36. Papanicolas LE et al. Bacterial viability in faecal transplants: Which bacteria survive? EBioMedicine 41, 509–516 (2019).
  • 37. Czerwinski M, Siemaszko D, Siegel DL & Spitalnik SL Only selected light chains combine with a given heavy chain to confer specificity for a model glycopeptide antigen. J. Immunol. 160, 4406–4417 (1998).
  • 38. Stowell SR et al. Microbial Glycan Microarrays Define Key Features of Host-Microbial Interactions. Nat. Chem. Biol. 10, 470–476 (2014).
  • 39. Le Gallou S et al. A splenic IgM memory subset with antibacterial specificities is sustained from persistent mucosal responses. J. Exp. Med. 215, 2035–2053 (2018).
  • 40. Mahon CM et al. Comprehensive interrogation of a minimalist synthetic CDR-H3 library and its ability to generate antibodies with therapeutic potential. J. Mol. Biol. 425, 1712–1730 (2013).
  • 41. Barbas CF et al. Molecular profile of an antibody response to HIV-1 as probed by combinatorial libraries. J. Mol. Biol. 230, 812–823 (1993).
  • 42. Fagarasan S et al. Critical roles of activation-induced cytidine deaminase in the homeostasis of gut flora. Science 298, 1424–1427 (2002).
  • 43. Muramatsu M et al. Class Switch Recombination and Hypermutation Require Activation-Induced Cytidine Deaminase (AID), a Potential RNA Editing Enzyme. Cell 102, 553–563 (2000).
  • 44. Suzuki K et al. Aberrant expansion of segmented filamentous bacteria in IgA-deficient gut. Proc. Natl. Acad. Sci. U. S. A. 101, 1981–1986 (2004).
  • 45. Marcou Q, Mora T & Walczak AM High-throughput immune repertoire analysis with IGoR. Nat. Commun. 9, (2018).
  • 46. Zhang Y et al. The role of short homology repeats and TdT in generation of the invariant γδ antigen receptor repertoire in the fetal thymus. Immunity 3, 439–447 (1995).

Methods References

  • 47.Hu J et al. Chromosomal Loop Domains Direct the Recombination of Antigen Receptor Genes. Cell 163, 947–959 (2015).
  • 48.Xiang Y, Park S-K & Garrard WT A major deletion in the Vκ-Jκ intervening region results in hyper-elevated transcription of proximal Vκ genes and a severely restricted repertoire. J. Immunol. 193, 3746–3754 (2014).
  • 49.McGuire KL & Vitetta ES kappa/lambda Shifts do not occur during maturation of murine B cells. J. Immunol. 127, 1670–1673 (1981).
  • 50.Chen J et al. B cell development in mice that lack one or both immunoglobulin kappa light chain genes. EMBO J. 12, 821–830 (1993).
  • 51.Ye J, Ma N, Madden TL & Ostell JM IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41 (2013).
  • 52.Lefranc MP et al. IMGT®, the international ImMunoGeneTics information system®. Nucleic Acids Res. 37 (2009).
  • 53.Gupta NT et al. Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data. J. Immunol. 198 (2017).
  • 54.Hu J et al. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing. Nat. Protoc. 11, 853–871 (2016).
  • 55.Yaari G & Kleinstein SH Practical guidelines for B-cell receptor repertoire sequencing analysis. Genome Med. 7, 121 (2015).
  • 56.Crooks GE, Hon G, Chandonia JM & Brenner SE WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004).
  • 57.Ginestet C ggplot2: Elegant Graphics for Data Analysis. J. R. Stat. Soc. Ser. A (Statistics in Society) 174, 245–246 (2011).
  • 58.Plummer M JAGS: A Program for Analysis of Bayesian Graphical Models. in Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) (2003).
  • 59.Loh DY, Bothwell ALM, White-Scharf ME, Imanishi-Kari T & Baltimore D Molecular basis of a mouse strain-specific anti-hapten response. Cell 33, 85–93 (1983).
  • 60.Kuraoka M et al. Complex Antigens Drive Permissive Clonal Selection in Germinal Centers. Immunity 44, 542–552 (2016).

Associated Data

Supplementary Materials

Data Availability Statement

The next-generation sequencing data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database under the accession number GSE140795. All figures have associated raw data deposited.

RESOURCES