Proceedings of the National Academy of Sciences of the United States of America. 2016 Apr 25;113(19):E2636–E2645. doi: 10.1073/pnas.1525510113

Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires

Brandon J DeKosky a,1, Oana I Lungu a,b,1, Daechan Park a,b, Erik L Johnson a, Wissam Charab a, Constantine Chrysostomou a, Daisuke Kuroda c, Andrew D Ellington d, Gregory C Ippolito b, Jeffrey J Gray c, George Georgiou a,b,e,f,2
PMCID: PMC4868480  PMID: 27114511

Significance

We applied a very recently developed experimental strategy for high-throughput sequencing of paired antibody heavy and light chains along with large-scale computational structural modeling to delineate features of the human antibody repertoire at unprecedented scale. Comparison of antibody repertoires encoded by peripheral naive and memory B cells revealed (i) preferential enrichment or depletion of specific germline gene combinations for heavy- and light-chain variable regions and (ii) enhanced positive charges, higher solvent-accessible surface area, and greater hydrophobicity at antigen-binding regions of mature antibodies. The data presented in this report provide fundamental new insights regarding the biological features of antibody selection and maturation and establish a benchmark for future studies of antibody responses to disease or to vaccination.

Keywords: antibody, B cell, immunology, high-throughput sequencing, computational modeling

Abstract

Elucidating how antigen exposure and selection shape the human antibody repertoire is fundamental to our understanding of B-cell immunity. We sequenced the paired heavy- and light-chain variable regions (VH and VL, respectively) from large populations of single B cells combined with computational modeling of antibody structures to evaluate sequence and structural features of human antibody repertoires at unprecedented depth. Analysis of a dataset comprising 55,000 antibody clusters from CD19+CD20+CD27− IgM-naive B cells, >120,000 antibody clusters from CD19+CD20+CD27+ antigen-experienced B cells, and >2,000 RosettaAntibody-predicted structural models across three healthy donors led to a number of key findings: (i) VH and VL gene sequences pair in a combinatorial fashion without detectable pairing restrictions at the population level; (ii) certain VH:VL gene pairs were significantly enriched or depleted in the antigen-experienced repertoire relative to the naive repertoire; (iii) antigen selection increased antibody paratope net charge and solvent-accessible surface area; and (iv) public heavy-chain third complementarity-determining region (CDR-H3) antibodies in the antigen-experienced repertoire showed signs of convergent paired light-chain genetic signatures, including shared light-chain third complementarity-determining region (CDR-L3) amino acid sequences and/or Vκ,λ–Jκ,λ genes. The data reported here address several longstanding questions regarding antibody repertoire selection and development and provide a benchmark for future repertoire-scale analyses of antibody responses to vaccination and disease.


Effective antigen recognition by the humoral immune system is predicated on the somatic generation of a large antibody repertoire that encompasses the sequence and conformational diversity to respond to a highly diversified set of antigens (1–3). Upon antigen challenge, naive B cells (NBCs) expressing unmutated antibodies capable of binding antigen with an affinity sufficient to initiate B-cell receptor (BCR) signaling may be stimulated to undergo somatic hypermutation (SHM) of the antibody genes. B cells expressing higher-affinity BCRs are better equipped to compete for antigen and thus receive signals that enable their preferential proliferation and further antibody sequence diversification in additional rounds of SHM. This process generates a repertoire of somatically mutated antibodies that, at the structural level, generally display decreased conformational flexibility (4, 5), slower antigen dissociation rates, and increased binding selectivity relative to the germline repertoire.

Understanding the salient features of the human antibody repertoire is critical for immunology research (6, 7). Specifically, additional information is needed regarding how a history of pathogen and environmental exposure modulates the sequence and conformational properties of naive antibodies to yield a mature antibody repertoire that confers effective protection. High-throughput DNA sequencing of the heavy-chain variable (VH) gene repertoire from human donors has begun to delineate repertoire-wide differences among antibodies encoded by different B-cell subsets (8–14). For example, antigen experience has been reported to result in decreased length and hydrophobicity of the heavy-chain third complementarity-determining region (CDR-H3) and to alter CDR-H3 amino acid content relative to the naive repertoire (8, 9). Analysis of the VH repertoire in identical twins provided evidence that germline gene use is genetically predetermined (10), but other studies demonstrated that VH gene use for a particular B-cell subset among different donors is more closely related than are different B-cell subsets within an individual (8, 9, 11). However, because of technical limitations (13), these and other studies were confined to analysis of the heavy and light chain repertoires separately. Earlier studies using limiting dilution and single-cell cloning to determine complete antibody gene sequences [i.e., both VH and light-chain variable (VL) or VH:VL] in small numbers of B cells have provided invaluable insights into aspects of B-cell and antibody repertoire development (15–20). However, because of the low throughput of B-cell cloning, it has not been possible to address a number of key questions that require more comprehensive coverage of the VH:VL repertoire. For example, whether heavy and light chains pair in a combinatorial fashion or instead whether certain germline heavy-chain genes are conformationally predisposed to pair with particular light-chain genes remains controversial (15, 21, 22).

Although repertoire-wide crystallographic studies are not feasible at present, advances in computational protein modeling have enabled the structure of antibodies having moderate-length CDR loops to be predicted accurately (23–26). In particular, RosettaAntibody models have predicted X-ray structures to within 1–1.5 Å rmsd in framework and canonical loops and 2 Å in CDR-H3 loops (27). However, the use of RosettaAntibody for repertoire analysis requires, first and foremost, the availability of high-quality paired VH and VL sequence data for each antibody; second, significant computational resources for efficient execution of large-scale computational modeling of antibody repertoires; and, finally, an appropriate suite of informatic tools and statistical metrics suitable for repertoire-level comparisons. Such a pipeline enables the study of the physicochemical properties of antibody repertoires important for disease and autoimmunity (9, 18, 28) on a 3D level at the site of antigen contact.

Here we deployed a recently developed technology designed for massive determination of the natively paired VH:VL repertoire from single B cells (29, 30) coupled with high-throughput computational modeling to obtain the first (to our knowledge) comprehensive sequence and structural examination of human naive and antigen-experienced antibody repertoires circulating in peripheral blood. Our analysis of >170,000 high-quality antibody sequence clusters and >2,000 antibody models of naive and antigen-experienced repertoires from three human donors provided comprehensive data that have enabled us to address the questions stated above in addition to several other longstanding issues regarding antibody repertoire maturation and development.

Results

Determination of Antibody Sequence and Structural Repertoires.

CD3−CD19+CD20+CD27− NBCs were isolated from peripheral blood mononuclear cells (PBMCs) of three healthy human donors (Fig. 1A and SI Appendix, Fig. S1). IgM amplicons encoding natively paired VH and VL sequences were generated and sequenced using a single-cell emulsion-based methodology (Fig. 1B). Briefly, paired heavy- and light-chain sequencing was achieved via single-cell isolation in emulsion droplets, cell lysis and mRNA capture, overlap extension RT-PCR to link heavy and light chains, and Illumina MiSeq sequencing of the VH:VL cDNA product (29, 30). We obtained 55,355 antibody sequences from NBCs following filtering for read quality, functional gene segment identification, and CDR-H3 clustering. Sequences in the naive group were filtered for >98% germline nucleotide identity in the framework 3 region (FR3), similar to previous reports (10), to eliminate antibody sequences caused by FACS artifacts (typically <1% of nonnaive cells) or by low-frequency CD27− antigen-experienced B cells (AEBCs) falling within the sort gates (Fig. 1C; also see SI Appendix, SI Methods) (31, 32). For two of these donors we determined the IgM/IgG/IgA VH:VL paired repertoire of CD3−CD19+CD20+CD27+ AEBCs from the same blood draw. A total of 123,941 distinct CDR-H3:light-chain third complementarity-determining region (CDR-L3) clusters corresponding to antigen-experienced VH:VL pairs were thus obtained (SI Appendix, Table S1). MiSeq 2 × 300 technology permits sequencing 600 bp of the ∼850-bp VH:VL amplicon per sample, and therefore full-length VH and VL genes for AEBCs were generated via bioinformatic assembly of the separate VH, VL, and VH:VL sequencing samples as described previously (29, 30, 33, 34). No gene assembly was necessary for NBC sequences because naive BCRs do not exhibit SHM and can be assumed to correspond to germline FR1–FR3 VH and VL gene segments.

Fig. 1.

High-throughput pipeline for antibody repertoire sequencing, modeling, and analysis. (A) Peripheral B-cell repertoires from healthy human donors were fractionated into naive (CD3−CD19+CD20+CD27−) and antigen-experienced (CD3−CD19+CD20+CD27+) samples via FACS. (B) B cells were isolated as single cells in emulsion droplets for single-cell mRNA capture, overlap extension linkage RT-PCR, and high-throughput sequencing as previously reported (29, 30). (C) VH:VL sequences were quality filtered for read quality, two or more reads, and 96% CDR-H3 identity clustering to remove sequence errors. Additional quality controls for the naive antibody dataset included filtering for IgM expression and SHM load to enhance data purity beyond the limitations of FACS. (D) Antibodies were selected for modeling from among sequences with the highest read counts and were filtered for CDR-H3 length and the availability of high-quality, high-sequence-similarity structural templates in the PDB. (E) Antibody repertoires were modeled using RosettaAntibody 3.0. (F) CR-paratopes were identified from antibody models and analyzed for charge (Upper), hydrophobicity (Middle), and SASA (Lower). Sequence and structural metrics were analyzed using a variety of statistical approaches to gain new biological understanding from high-throughput antibody repertoire data.

Structural antibody repertoire modeling was performed using the RosettaAntibody 3.0 antibody modeling protocol implemented on a high-performance computing cluster (Fig. 1 D and E). Briefly, structures were generated by (i) grafting homologous template framework and CDR loop regions and (ii) de novo modeling of the CDR-H3 while refining the surrounding loops and the relative orientation of heavy- and light-chain variable (VH and VL, respectively) domains. Top-scoring models were selected from thousands of candidate model structures. From the two donors, 1,014 naive and 1,015 antigen-experienced antibody models were generated (SI Appendix, Table S1) at an estimated computational expense of 570,000 CPU hours. Paratopes of antibodies are the regions that directly recognize and bind to antigens. We estimated the paratope using contact-residue regions observed in antibody–antigen crystal structures (35). We calculated the net charge, solvent-accessible surface area (SASA), and hydrophobic solvent-accessible surface area (hSASA) over the computationally estimated contact-region paratopes (CR-paratopes), and we analyzed framework regions (FR) for conformational similarities and differences at the repertoire level (Fig. 1F).

V-Gene Use and Public VH:VL Characteristics Differ in Naive and Antigen-Experienced Antibody Repertoires.

We sought to understand how paired heavy:light V-gene use differed across individuals and across B-cell subsets (Fig. 2A and SI Appendix, Fig. S2). Pearson hierarchical clustering of VH:VL repertoire V-gene use revealed that naive antibody repertoires from different individuals clustered together, whereas antigen-experienced repertoires from multiple donors clustered separately from naive repertoires (Fig. 2B, Left). In other words, full antibody (i.e., paired VH:VL) V-gene use in naive repertoires was more similar to the naive repertoires of other donors than to a donor-matched antigen-experienced repertoire. Further, subclassification of the antigen-experienced repertoire by heavy-chain isotype revealed that antigen-experienced IgM repertoires clustered as one group, whereas class-switched repertoires (IgG, IgA) clustered separately (Fig. 2B, Right; principal component analysis is reported in SI Appendix, Fig. S3). Antigen-experienced class-switched IgG and IgA subsets were more similar within an individual than across the two individuals (indicated by the height separating groups in Fig. 2B, Right, and also observed in principal component analysis and pair-wise Pearson correlation coefficients in SI Appendix, Figs. S3 and S4).

Fig. 2.

Paired V-gene use in naive and antigen-experienced B-cell repertoires. (A) Paired heavy:light V-gene use surface maps of antibody sequence repertoires from naive (CD3−CD19+CD20+CD27− IgM) (Left) and antigen-experienced (CD3−CD19+CD20+CD27+ IgG/IgA/IgM) (Right) repertoires of donor 1 (n = 13,780 and 34,692, respectively). Donor-matched NBCs and AEBCs were isolated from the same time point and blood draw. V-genes are plotted in alphanumeric order, with heights indicating percentage representation among VH:VL clusters. (B) Clustergrams resulting from Pearson hierarchical cluster analysis of paired heavy:light V-gene use across donors; relative distance is indicated by line heights connecting different groups. (Left) Clustering of donor and B-cell subset repertoires. (Right) Clustering of heavy-chain isotype repertoires (naive and antigen-experienced IgM, IgA, and IgG). (C) Volcano plot representation of differences in VH:VL gene use in the NBC and AEBC repertoires. Positive fold-change values denote VH:VL gene pairs that were more frequent in antigen-experienced datasets. Gene pairs with adjusted P values below 0.05 are displayed in red and are listed in SI Appendix, Table S2. Gene pairs with a log2(fold-change) absolute value of 1 or more and with an adjusted P value greater than 0.05 are displayed in orange. Other gene pairs are displayed in black. A total of 872 VH:VL gene pair combinations were present in all donors and were included in this analysis.

We next performed statistical analysis to understand shifting heavy:light V-gene use across B-cell subsets using linear model-based t tests with adjustment for multiple comparisons (36). We found 28 statistically significant enriched/depleted VH:VL gene pairs in AEBCs compared with the NBC repertoire, with an adjusted P value less than 0.05, comprising 3.2% of the 872 germline VH:VL gene combinations present in all experimental samples (Fig. 2C). As one salient example, previous studies provided conflicting reports on IGHV6-1 prevalence across B-cell subsets [IGHV6-1 being reported as both enriched (9, 12) and depleted (10) in antigen-experienced subsets]. Interestingly, we found IGHV6-1 was significantly depleted in antigen-experienced subsets when paired with certain light-chain V-genes (KV1-33, LV3-19, LV1-40, and LV3-1); however, IGHV6-1 was enriched when paired with other light-chain genes (e.g., KV4-1, LV2-11, and LV1-44). Differentially enriched/depleted VH:VL pairs and a comprehensive list of VH:VL genes and P values are provided in SI Appendix, Table S2 and other data in SI Appendix. We performed several statistical analyses after subdividing the repertoire into SHM bins to understand the role of SHM in gene use; no clear correlations between gene use and SHM load were identified.

A longstanding question in studies of the antibody repertoire is whether certain VH:VL V-gene combinations are favored or disfavored relative to random, or combinatorial, VH:VL gene pairings (15, 21, 22). Given p VH genes, with gene i represented at fraction x_i among all antibodies, and q VL genes, with gene j represented at fraction y_j among all antibodies, p × q different Ig VH:VL gene pairings can be formed, each at an expected fraction of x_i × y_j for the i-j VH:VL gene pair in the repertoire (defined as the VH:VL expectation value). A reduced frequency of observation for a particular VH:VL gene pair relative to its expected frequency would suggest negative selection for that VH:VL; negative VH:VL selection could arise from structural incompatibility of VH and VL domains. We applied several statistical techniques including linear model-based t tests (36), DESeq (37), and Student’s t test to identify reduced-frequency “holes” and increased-frequency “peaks” compared with the x_i × y_j expectation value (null) hypothesis that could be observed repeatedly within each B-cell subset in multiple donors. No statistically significant holes or peaks could be identified across donors among the B-cell subsets analyzed, consistent with prior small datasets (15, 21) and indicating that any disproportional pairings of human heavy- and light-chain V-genes that were observed in a single B-cell dataset were not replicated in other donors.
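
The VH:VL expectation value described above can be illustrated with a minimal Python sketch. It assumes the repertoire is available as a list of (VH gene, VL gene) tuples, one per antibody cluster; the function name `pairing_expectation` and the toy gene counts are hypothetical and are not taken from the study, and the statistical testing across donors (linear models, DESeq) is not reproduced here.

```python
from collections import Counter


def pairing_expectation(pairs):
    """Return {(vh, vl): (observed, expected)} fractions.

    `pairs` is a list of (VH gene, VL gene) tuples, one per antibody.
    The expected fraction of each pair is the product of the marginal
    VH and VL gene fractions, i.e. random (combinatorial) pairing.
    """
    n = len(pairs)
    vh_counts = Counter(vh for vh, _ in pairs)
    vl_counts = Counter(vl for _, vl in pairs)
    pair_counts = Counter(pairs)
    table = {}
    for vh, vh_n in vh_counts.items():
        for vl, vl_n in vl_counts.items():
            observed = pair_counts[(vh, vl)] / n
            expected = (vh_n / n) * (vl_n / n)
            table[(vh, vl)] = (observed, expected)
    return table


# Hypothetical toy repertoire of 10 antibodies (illustration only).
toy_pairs = (
    [("IGHV3-23", "IGKV1-39")] * 4
    + [("IGHV3-23", "IGLV2-14")] * 2
    + [("IGHV1-69", "IGKV1-39")] * 2
    + [("IGHV1-69", "IGLV2-14")] * 2
)
toy_table = pairing_expectation(toy_pairs)
for pair, (observed, expected) in sorted(toy_table.items()):
    print(pair, f"observed={observed:.2f}", f"expected={expected:.2f}")
```

A "hole" or "peak" in the sense used above would be a pair whose observed fraction falls reproducibly below or above its expected fraction in several donors; a single repertoire, as in this toy example, cannot establish that.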

Analysis of a small set (<200) of antibody sequences obtained by single B-cell cloning (15) and high-throughput sequencing of VH repertoire subsets (8) had suggested that the CDR-H3 region is shorter, has a more basic pI, and is more hydrophilic in AEBCs than in NBCs. We observed that the reduction in CDR-H3 length in antigen-experienced repertoires compared with naive repertoires was very slight, although the difference between distributions was statistically significant [naive CDR-H3: 15.23 ± 3.69 amino acids, antigen-experienced: 15.08 ± 3.64 amino acids, mean ± SD, International Immunogenetics Information System (IMGT) CDR3 length definitions, P < 10⁻¹⁴ by the Kolmogorov–Smirnov (K–S) test which compares the equality of distributions] (SI Appendix, Fig. S5A). We found no strong correlation in paired heavy- and light-chain CDR3 loop lengths (SI Appendix, Fig. S5 C and D), as has been previously inferred from limited antibody sequence data (15, 21).
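
Because the K–S test is used throughout the comparisons that follow, a short sketch of the two-sample K–S statistic may be helpful. This is a plain-Python illustration of the statistic itself (the largest gap between two empirical cumulative distributions), not the software used in the study; the CDR-H3 length lists are hypothetical, and computing a P value from the statistic is left to a statistics library.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to evaluate the difference at each distinct value.
    for value in set(a) | set(b):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical CDR-H3 lengths (amino acids) for two small samples.
naive_lengths = [13, 15, 15, 17, 19]
experienced_lengths = [12, 14, 15, 15, 18]
d_lengths = ks_statistic(naive_lengths, experienced_lengths)
print(f"D = {d_lengths:.2f}")
```

With very large samples, as in repertoire data, even a small D can yield an extremely small P value, which is consistent with the text's observation that a very slight length difference was nonetheless statistically significant.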

Promiscuous light-chain sequences (i.e., light chains paired with two or more VH sequences) have been observed previously and result from the lower diversity of light-chain Vκ,λ–Jκ,λ junctions (6, 29). We found widespread evidence of promiscuous light-chain junctions, i.e., Vκ,λ–Jκ,λ junctions observed in multiple BCRs within the same donor. In NBCs, promiscuous CDR-L3 junctions comprised 68.4 ± 4.5% of the repertoire at the nucleotide level (78.5 ± 4.0% based on protein sequence). The fraction of promiscuous Vκ,λ–Jκ,λ junctions was reduced in AEBCs (30.2 ± 3.9% nucleotide basis, 46.9 ± 5.7% amino acid basis) (29). Promiscuous Vκ,λ–Jκ,λ junctions paired with a diverse set of heavy-chain genes (29), consistent with the hypothesis that promiscuous light-chain CDR3 sequences emerge as a consequence of high-probability light-chain Vκ,λ–Jκ,λ recombination events.
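
As an illustration of how a promiscuous-junction fraction could be tallied, the sketch below treats each antibody cluster as a (CDR-H3, CDR-L3) pair from a single donor and reports the share of clusters whose CDR-L3 junction is paired with two or more distinct heavy chains. This is one plausible reading of the definition given above, not the authors' exact procedure; the function name and the toy sequences are hypothetical.

```python
from collections import defaultdict


def promiscuous_fraction(clusters):
    """Fraction of clusters whose CDR-L3 pairs with >= 2 distinct CDR-H3s.

    `clusters` is an iterable of (cdr_h3, cdr_l3) strings from one donor;
    nucleotide or amino acid junctions can be used interchangeably.
    """
    clusters = list(clusters)
    heavy_partners = defaultdict(set)
    for cdr_h3, cdr_l3 in clusters:
        heavy_partners[cdr_l3].add(cdr_h3)
    promiscuous = sum(
        1 for _, cdr_l3 in clusters if len(heavy_partners[cdr_l3]) >= 2
    )
    return promiscuous / len(clusters)


# Hypothetical toy data: the junction "QQSYSTPLT" pairs with two heavy chains.
toy_clusters = [
    ("ARDYYGSGSYFDY", "QQSYSTPLT"),
    ("ARGGWELLRFDP", "QQSYSTPLT"),
    ("AKDLGYSSGWYDY", "QQYNSYPWT"),
    ("ARVRGSYYFDY", "QSYDSSLSGVV"),
]
toy_fraction = promiscuous_fraction(toy_clusters)
print(f"promiscuous fraction = {toy_fraction:.2f}")
```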

Public Vκ,λ–Jκ,λ nucleotide and amino acid junctions (i.e., junctions observed in more than one individual) comprised a significant fraction of NBC CDR-L3s (64.6 ± 6.1% by nucleotide, 75.7 ± 6.7% by amino acid) and AEBC CDR-L3s (16.6 ± 0.3% by nucleotide, 33.6 ± 1.9% by amino acid) (6, 29, 38, 39). In contrast to the high prevalence of public Vκ,λ–Jκ,λ nucleotide sequences, we observed very few public CDR-H3 nucleotide sequences across individuals (three among naive groups, one between antigen-experienced groups), as was consistent with prior reports (10, 13, 40). At the amino acid level we observed 23 CDR-H3 sequences that were shared between two donors in the naive repertoire (0.083% frequency; no CDR-H3 was shared among all three donors) and 38 CDR-H3 amino acid sequences that were shared between two donors in the antigen-experienced repertoire (0.061% frequency). Public CDR-H3 lengths were significantly shorter than CDR-H3 lengths in the overall repertoires, presumably reflecting the lower sequence diversity inherent to shorter CDR-H3s, as is consistent with a higher probability of the same short junction occurring in different individuals (SI Appendix, Fig. S6A) (40, 41). We detected five antigen-experienced public CDR-H3 amino acid sequences that were paired with identical CDR-L3 amino acid sequences; all five public CDR-H3:CDR-L3 antibodies were encoded by different nucleotide sequences with distinct patterns of N/P addition, SHM, gene composition, and isotype use, indicating that they derived from distinct V(D)J recombinations (SI Appendix, Table S3). Convergence in light-chain gene pairing was markedly higher for antigen-experienced public CDR-H3 antibodies that had undergone antigen selection than for public CDR-H3 antibodies from NBCs (SI Appendix, Fig. S6B), reinforcing the hypothesis that public antigen-experienced CDR-H3 antibodies can be functionally selected for binding to common antigens (42–45).
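
Identifying public sequences amounts to finding sequences present in more than one donor. The sketch below is a minimal illustration under the assumption that each donor's repertoire is a collection of exact sequence strings; the donor labels, sequences, and the `public_sequences` helper are hypothetical, and exact string matching is a simplification of whatever clustering the authors applied.

```python
from collections import Counter


def public_sequences(donor_repertoires, min_donors=2):
    """Return sequences found in at least `min_donors` donors.

    `donor_repertoires` maps a donor label to an iterable of sequences
    (nucleotide or amino acid). Each donor counts once per sequence.
    """
    donor_counts = Counter()
    for sequences in donor_repertoires.values():
        donor_counts.update(set(sequences))
    return {seq for seq, n in donor_counts.items() if n >= min_donors}


# Hypothetical CDR-H3 amino acid sequences for three donors.
toy_repertoires = {
    "donor1": ["ARDYFDY", "ARGGYFDY", "ARDYFDY"],
    "donor2": ["ARDYFDY", "AKDLSGWYDY"],
    "donor3": ["ARVRGSYYFDY"],
}
toy_public = public_sequences(toy_repertoires)
print(sorted(toy_public))
```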

Structural Differences in Repertoire Charge, Surface Area, and Hydrophobicity.

We first analyzed the predictive accuracy of RosettaAntibody for naive antibodies. RosettaAntibody models have been reported to have an FR and canonical loop accuracy of 0.45- to 1.0-Å rmsd relative to X-ray structures of antigen-experienced antibodies and an average 2.1-Å rmsd in CDR-H3 loops (27). For the seven human germline antibodies in the Protein Data Bank (PDB) (100% sequence identity to germline V-gene segments), our RosettaAntibody models displayed a 0.8- to 1.4-Å rmsd in FR and canonical CDR loops and a <2.4-Å rmsd in CDR-H3 relative to X-ray structures. We produced antibody models for sequences that were unique among all repertoires and for which high-sequence-identity templates of FR and canonical CDRs were available in the PDB, as required for homology modeling. Because computational structural prediction of long CDR-H3 loops is problematic, we considered only the antibody repertoire subset with CDR-H3 lengths of <16 amino acids, by the Chothia definition (76% of total sequenced antibodies).

We sought to characterize the repertoire-wide physicochemical properties of computationally modeled CR-paratopes, including charge, hydrophobicity, SASA, and hSASA; these properties have been shown to be important in tolerance and for the avoidance of self-reactivity (8, 12, 18). We calculated physicochemical metrics holistically over all six regions of each antibody that comprise the most likely antigen-binding contacts, i.e., the CR-paratope. The CR-paratope overlaps but does not perfectly coincide with the six CDRs of an antibody, which are classically defined by hypervariable amino acid sequence (see SI Appendix, SI Methods for a detailed description). The CR-paratope charge in naive and antigen-experienced repertoires was found to be heavily influenced by V-gene use (Fig. 3A). Naive antibodies exhibited a slightly more negative CR-paratope charge than antigen-experienced antibodies, with distribution mean charges of −1.1 and −0.68, respectively (median charges −1 and 0; differences in charge distribution were statistically significant by the K–S test, P = 2.8 × 10⁻³). Antibodies using the gene segment IGKV1-33 exhibited a strong negative charge over the CR-paratope because of a −3 charge in the IGKV1-33 germline. Reanalysis of CR-paratope charge distribution data when antibodies using IGKV1-33 were excluded rendered the differences in CR-paratope charge distribution between the repertoires nonsignificant (P = 0.096 by K–S test, n = 859 for naive and n = 958 for antigen-experienced repertoires). IGKV1-33 is a common gene segment, representing 15% of V-gene use in paired VH:VL naive repertoire models but only 3.1% in antigen-experienced models (8.9% and 3.1%, respectively, in donor sequences). Indeed, IGKV1-33 is also a member of four VH:VL V-gene pairs which have statistically significant decreased expression in antigen-experienced repertoires (SI Appendix, Table S2) (10).

Fig. 3.

Charge distributions in naive and antigen-experienced repertoires. (A) CR-paratope charge. (B) Total CDR-H3 and CDR-L3 charge. (C) CDR-H3 charge. (D) CDR-L3 charge for naive and antigen-experienced BCR repertoires. In all panels, differences in charge distribution between naive and antigen-experienced repertoires were statistically significant by the K–S test (P = 3.5 × 10⁻³ for A; P < 10⁻¹⁵ for B–D). The number in each group is provided in SI Appendix, Table S1; error bars represent SD.

We compared charge distributions in computationally modeled antibody CR-paratopes with those of CDR-H3:CDR-L3 amino acid sequences. CDR-H3:CDR-L3 sequences from both naive and antigen-experienced repertoires exhibited neutral charge distributions, with 90% of the repertoire falling between +2 and −2 by total CDR3 loop charge (Fig. 3B). In agreement with our CR-paratope charge analysis, the antigen-experienced CDR-H3:CDR-L3 amino acid sequences displayed a slightly reduced negative charge as compared with the naive sequences (mean charges of −0.47 for naive and −0.099 for antigen-experienced sequences) (Fig. 3B). Antigen-experienced CDR-H3 and CDR-L3 repertoires showed statistically significant increases in net charge distributions as compared with naive repertoires [Fig. 3 C (8, 40) and D and SI Appendix, Table S4]. Increased CDR3 charge was observed in antigen-experienced CDR-H3 and CDR-L3 concurrently and was evident even when antibodies were binned for gene use (SI Appendix, Figs. S7 and S8), indicating that the shift toward a more positive charge was driven by functional BCR selection rather than by changing gene use. We also observed distinct distributions of CDR-L3 charge across kappa vs. lambda light-chain repertoires, with kappa CDR-L3 repertoires being significantly more positively charged than lambda CDR-L3 repertoires (SI Appendix, Fig. S9 and Table S5). Importantly, distinct Igκ vs. Igλ light-chain charge distributions also were evident in light-chain CR-paratopes. Kappa CR-paratopes had a median charge of 0, whereas lambda CR-paratopes had a median charge of −1 (P = 1.8 × 10⁻⁹ for naive kappa vs. lambda charge distribution and P < 10⁻¹⁵ for antigen-experienced kappa vs. lambda charge distribution by K–S test) (SI Appendix, Fig. S10).

BCRs with positive charge extremes were slightly more prevalent in antigen-experienced repertoires than in naive repertoires. Antibodies having highly positively charged CDR3s are more likely to be autoreactive, and B cells encoding positively charged CDR3s are known to be eliminated at distinct developmental checkpoints in bone marrow (18). Nonetheless, our results showed that a small but significant fraction of highly positively charged antibodies is found in NBCs in the periphery. Further, the frequency of such antibodies was statistically significantly increased by antigen exposure (+5 or +6 CDR-H3:CDR-L3: average 0.24% of naive vs. 0.33% of antigen-experienced antibodies, P = 2.7 × 10⁻⁴ by Z test).
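
A comparison of two frequencies such as those above is commonly made with a pooled two-proportion Z test, sketched here with the Python standard library. The exact form of the Z test used in the study is not stated in this section, so the pooled two-sided version is an assumption, and the counts in the example are hypothetical rather than the study's data.

```python
from math import erfc, sqrt


def two_proportion_z(successes_a, total_a, successes_b, total_b):
    """Pooled two-proportion Z test; returns (z, two-sided P value)."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts of highly charged antibodies in two samples.
z_score, p_value = two_proportion_z(30, 1000, 60, 1000)
print(f"z = {z_score:.2f}, P = {p_value:.4f}")
```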

We next analyzed the hydrophobicity of paired CDR-H3 and CDR-L3 loops by calculating the hydrophobic index (H-index) (46). Although distributions in average H-index did not show significant differences between naive and antigen-experienced repertoires, we observed major differences in the CDR-L3 average H-index between kappa and lambda repertoires (SI Appendix, Fig. S9). Lambda light-chain CDR3s were more hydrophobic than kappa CDR3s, and these patterns were consistent across donors (P < 10⁻¹⁴). Kappa CDR-L3s were under stronger H-index positive selection pressure than lambda light chains (the mean H-index from naive to antigen-experienced increased by 0.037 in kappa light chains compared with 0.016 in lambda light chains) (SI Appendix, Fig. S9 and Table S5), perhaps because of the much lower hydrophobicity of naive kappa light chains.

The availability of structural models also enabled us to characterize the SASA and the hSASA across the antibody repertoire CR-paratopes (Fig. 4A). There was a very slight increase in median SASA, from 2,600 Å² in antibodies from NBCs to 2,650 Å² in antibodies from AEBCs (mean SASA 2,611 vs. 2,633 Å², respectively, P = 6.5 × 10⁻⁵ by K–S test to compare SASA distributions) (Fig. 4B). We observed that IGHV4-59 and IGHV4-34 had a strong impact on naive and antigen-experienced antibody SASA. For example, antibodies using IGHV4-34 were characterized by a smaller-than-average SASA (2,527 Å² for antibodies using IGHV4-34 vs. 2,637 Å² average for all modeled antibodies in this study) and comprised 8% of the naive repertoire but only 3% of the antigen-experienced repertoire.

Fig. 4.

Distribution of SASA in naive and antigen-experienced repertoires. (A) CR-paratope SASA (Upper) and hSASA (Lower) of a naive antibody (Left) and antigen-experienced antibody (Right) with SASA or hSASA at the median of the respective distributions. (Upper) VH CR-paratope SASA is shown in blue; VL CR-paratope SASA is shown in green. (Lower) hSASA of the CR-paratope is rendered with each residue colored according to the Eisenberg hydrophobicity scale, from most hydrophobic (red) to least (white). (B–D) Total CR-paratope SASA (B), CDR-H1 SASA (C), and fraction of hSASA (D) for naive (blue) and antigen-experienced (red) BCR repertoires. In B–D differences between naive and antigen-experienced repertoires were statistically significant by the K–S test (P = 6.5 × 10⁻⁵ for B; P = 2.5 × 10⁻¹² for C; and P = 5.4 × 10⁻¹⁰ for D). The number for each group is provided in SI Appendix, Table S1; error bars represent SD.

Additionally, we found that the median SASA contributed by CDR-H1 was slightly smaller, at 400 Å² for naive antibodies (Fig. 4C), than the median of 425 Å² in the antigen-experienced antibodies (means: 407 Å² and 430 Å², respectively; adjusted P value = 2.5 × 10⁻¹² by K–S test to compare H1-SASA distributions). Although distribution means differed only slightly, the overall probability distributions of NBC vs. AEBC repertoire CR-paratope SASA and CDR-H1 SASA were significantly different, trending toward greater SASA in the CR-paratope of antibodies over the process of antigen experience.

The fraction of hSASA (i.e., the hSASA/SASA ratio) in antigen-experienced antibodies displayed a slight but statistically significant median increase, from 0.57 to 0.58 (means 0.57 and 0.58, respectively; adjusted P value = 2.5 × 10⁻⁴ for donor 1 and 7.6 × 10⁻⁸ for donor 2) (Fig. 4D). Antibodies using IGHV4-59, IGHV1-18, and IGHV3-33, along with IGHV1-3, IGHV3-30, and IGKV3-20 experienced small but statistically significant increases in the hSASA fraction in antigen-experienced antibody CR-paratopes of 1–3% (P = 9.3 × 10⁻⁶, 1.9 × 10⁻³, 2.5 × 10⁻², 2.6 × 10⁻², and 4.9 × 10⁻², respectively). CDR-L3 CR-paratope hydrophobicity correlated well with sequence-based hydrophobicity metrics, with lambda CR-paratopes exhibiting a 10% higher hSASA fraction than kappa CR-paratopes over the CDR-L3.

Antibody FRs dictate the conformational space accessible to the CDRs. Conserved VH:VL interactions at FR positions are critical determinants of variable domain stability (47–50). To explore the relationship between V-gene use and antibody structural similarity within FRs, we calculated the rmsd of antibody models to each other over the FR1–3 backbone within each naive or antigen-experienced computationally predicted antibody model repertoire (VH: 8–25, 36–51, and 57–94 and VL: 10–23, 35–49, and 57–88 backbone residue atoms by the Chothia definition; see SI Appendix, SI Methods). Pairwise antibody heavy-chain FR rmsds were calculated over the heavy-chain V-gene segment and compared across gene subsets in each donor (Fig. 5 A and B). We found that antibody structures were remarkably similar to each other over the VH FR regions, with a median overall rmsd of 1.02–1.09 Å in naive repertoires and 1.03–1.04 Å in antigen-experienced repertoires (Fig. 5). In naive repertoires, the FRs of antibodies having the same VH gene segment use were indistinguishable (same median rmsd of 0.13 Å in both donor 1 and donor 2) and slightly less so in antigen-experienced repertoires (median rmsds of 0.41 and 0.48 Å in donors 1 and 2, respectively, P < 10⁻¹⁵ by K–S test). A similar trend was observed for antibodies using VH gene segments from the same family (i.e., IGHV1, IGHV2, and so forth), with naive antibodies having a closer structural similarity to each other than antigen-experienced antibodies (naive antibodies: median rmsds of 0.51 Å in both donors; antigen-experienced antibodies: median rmsds of 0.56 and 0.57 Å in donors 1 and 2, respectively, P < 10⁻¹⁵ for both donors by K–S test). Interestingly, pairwise rmsds were not substantially increased for antibody pairs using VH gene segments from different VH gene families, with a median rmsd of 1.1 Å in all repertoires across both donors (P < 10⁻¹⁵ by K–S test for naive and antigen-experienced repertoires in both donors). We compared the FR1–3 rmsds among antigen-experienced predicted models in each donor with 141 nonredundant human antibody structures in the PDB (SI Appendix, SI Methods). Overall rmsd for the set of crystal structures from the PDB was 1.0 ± 0.29 Å; among antibodies using different V-gene segments the median rmsd was 1.1 ± 0.22 Å (SI Appendix, Fig. S11), in excellent agreement with our RosettaAntibody predictions. Of note, antibody sequences in the PDB were more somatically hypermutated than the antibodies modeled here, with antibodies in the PDB having 83% average germline sequence identity vs. the 88% average in our antigen-experienced repertoires. Thus, we conclude that at the repertoire level SHM does not appear to exert a significant impact on FR structural diversity.

Fig. 5.

Average rmsd of VH FR1–3 backbone atoms. (A) Superimposed models of two naive antibodies sharing the same IGHV gene segment (Left) or the same IGHV gene family (Center) or from two different IGHV gene families (Right). FR1–3 residues used to calculate pairwise rmsd values are highlighted in blue. (B) Superimposed models of two antigen-experienced antibodies sharing the same IGHV gene segment (Left) or the same IGHV gene family (Center) or from two different IGHV gene families (Right). FR1–3 residues used to calculate pairwise rmsd values are highlighted in red. (C and D) FR average pairwise rmsd for donor 1 (C) and donor 2 (D). Average pairwise rmsds are plotted for antibodies using the same V-gene segment, using the same V-gene family, and using two different V-gene families; naive repertoires are shown in blue, and antigen-experienced repertoires are shown in red; error bars indicate SD. Distribution differences between naive and antigen-experienced repertoires were statistically significant for all comparisons by the K–S test (P < 10⁻¹⁵ in C and D). The number for each group is given in SI Appendix, Table S1.

Discussion

The results presented here constitute by far the most in-depth analysis to date of naive and antigen-experienced human antibody repertoires, both at the sequence level and with respect to structural paratope features. Although the technical capabilities for tackling larger numbers of donors now exist, we focused on the analysis of a small set of donors in this study because scaling the scope of such analysis must be tempered by cost considerations, particularly regarding the computational capacity for data analysis and structural modeling (e.g., >250,000 CPU hours per 1,000 antibodies). We note that biological sampling limitations restrict our ability to analyze more than a very small fraction of the 10¹⁰ B cells in an individual; however, the 1.8 × 10⁵ antibody sequences that we recovered enabled highly sensitive analyses of the genetic, physicochemical, and structural properties of human antibody repertoires. The very rich data presented here highlight a number of key features of the antibody repertoire that either had not been observed previously or had been inferred from limited data.

First, the question of whether heavy- and light-chain gene pairing occurs in a purely combinatorial fashion or whether, instead, germline sequences impose conformational constraints on the association of certain heavy- and light-chain genes has been hotly debated for many years. Although earlier studies relying on single-cell sequencing of tens to hundreds of B cells had argued primarily for random pairing (15, 21), this conclusion has been challenged by recent data meta-analyses (22). Here, examination of >175,000 high-quality antibody sequences from three donors provided compelling evidence that there are no observable biases in VH and VL gene pairing at a population level. Instead, we found that population-scale frequencies of VH:VL V-gene combinations are proportional to the relative representation of the VH and VL genes among the population. Further, we observed no correlation between CDR-H3 and CDR-L3 loop lengths in either naive or antigen-experienced repertoires (SI Appendix, Fig. S5), a result that had been inferred from limited prior data but had not yet been evaluated at the repertoire scale (15, 21).
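
The proportionality noted above is what an independence model predicts: the expected frequency of each VH:VL pair is the product of the marginal VH and VL gene frequencies. The following Python sketch illustrates that calculation only. The function name and the toy count matrix are hypothetical and are not taken from the study's data or code.

```python
import numpy as np

def expected_pair_frequencies(pair_counts):
    """Return observed and independence-expected VH:VL pair frequencies.

    pair_counts: 2-D array-like, rows = VH genes, columns = VL genes,
    entries = number of antibody clusters using that gene pair.
    """
    counts = np.asarray(pair_counts, dtype=float)
    observed = counts / counts.sum()
    vh_freq = observed.sum(axis=1)  # marginal VH gene frequencies
    vl_freq = observed.sum(axis=0)  # marginal VL gene frequencies
    expected = np.outer(vh_freq, vl_freq)  # product of the marginals
    return observed, expected

# Toy counts for three VH genes and two VL genes (illustrative only).
toy_counts = [[40, 20], [20, 10], [6, 4]]
observed, expected = expected_pair_frequencies(toy_counts)
```

Comparing `observed` with `expected` entry by entry shows where a pair departs from purely combinatorial pairing; a formal test of such departures is a separate step.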

Second, we present evidence that combined VH:VL gene use in the naive and antigen-experienced repertoire subsets correlates much better across individuals than within a single donor (i.e., between the naive and antigen-experienced repertoires of the same donor). Earlier reports on the VH-only or VL-only repertoires in different B-cell subsets also had revealed interdonor concordance (9–12), and our data demonstrate that these trends also are evident when the complete antibody sequence is examined (Fig. 2 and SI Appendix, Table S2). Further, the clear intradonor segregation of VH:VL gene use showed that the CD27+ IgM repertoire was distinct from class-switched CD27+ IgG/IgA (Fig. 2 and SI Appendix, Fig. S3), indirectly supporting prior hypotheses that distinct developmental pathways (possibly in response to different classes of antigen, adjuvant, and/or route of exposure) drive the development of these B-cell subsets (51–53).

Third, we sought to identify public sequences, i.e., sequences shared among multiple individuals, that encompassed paired heavy- and light-chain information. Public VH sequences have been reported to arise in individuals infected with certain pathogens and also at a low frequency in healthy donors (10, 40, 42–44, 54). However, the question of whether the light chains paired with public heavy chains are also shared among individuals had not been addressed. Although we did not find any perfect nucleotide sequence matches among full-length antibodies in our datasets, at the amino acid level we identified five public CDR-H3:CDR-L3 pairs (out of 179,296 total antibody clusters), a finding that underscores the very low frequency at which public antibodies typically arise. We observed signatures of convergent light-chain genes among public antibodies expressing the same CDR-H3 amino acid sequence (SI Appendix, Fig. S6B), contributing additional evidence to the hypothesis that public antibodies can be elicited in response to common immune stimuli (SI Appendix, Table S3) (40, 42–45, 54).

Fourth, repertoire-wide examination of CR-paratope features revealed a slight shift toward less negatively charged CR-paratopes, which also was observed in CDR-H3:CDR-L3 amino acid sequences (Fig. 3). Shifts in net charge were observed even when antibodies were binned by gene use (SI Appendix, Fig. S8), providing evidence to support a functional selection mechanism for more positively charged CDR3s in both heavy (40) and light chains. This enhancement of positive charge may be driven by selection for binding to negatively charged bacterial membrane surfaces or by the generation of salt bridges between antibody CR-paratopes and negatively charged antigens. In addition to repertoire-wide differences in charge between naive and antigen-experienced antibodies, we found a slight increase in SASA that was reflected in gene use at the repertoire scale, particularly in the use of segment IGHV4-34. This gene segment may be selected against in AEBC repertoires because of its potential ability to bind mannose-binding lectins, thus explaining its implication in disease states such as follicular lymphoma (8).

Fifth, we noted key differences in CDR-L3 charge and hydrophobicity between kappa and lambda light chains that had not been previously reported (e.g., SI Appendix, Fig. S9). Recent reports delineated functional differences between kappa and lambda antibody responses (55), which may derive from the physicochemical differences between kappa and lambda isotypes reported here. The distinct physicochemical profiles of kappa and lambda light chains may serve important functional purposes in receptor editing and allelic inclusion. For example, receptor editing is known to mitigate self-targeting of autoimmunogenic antibodies (56–58). The presence of two different light-chain gene sets, each with a distinct distribution of biochemical properties, may greatly enhance the probability of altering binding specificity by editing the light-chain isotype. Differences between kappa and lambda repertoires also may have functional importance in allelic inclusion, because kappa and lambda allelically included antibodies are more likely to show different binding specificities (29, 56–59).

Finally, the close agreement between RosettaAntibody models and crystal structures in the PDB for FR rmsd analysis indicated that computational modeling is a powerful tool for predicting the conformation of FRs. The structural FR similarities across antibodies indicated that VH FR domains occupy only a limited portion of the available conformational space, which does not expand significantly because of SHM, and similarities between antibody VH domains within families were especially pronounced. FR conformational similarities across V-genes may serve as a structural mechanism to enable heavy and light V-genes to pair productively with each other in a combinatorial fashion according to their overall frequencies in the repertoire and to express successfully the tremendous diversity required for effective humoral adaptive immunity.

Methods

Sample Collection and VH:VL Sequencing.

Informed consent was obtained from anonymous donors by the Gulf Coast Regional Blood Center (Houston, TX). This study was approved by the University of Texas at Austin Institutional Biosafety Committee (2010-06-0084). PBMCs from whole blood were isolated into B-cell subsets via flow cytometric sorting. CD3−CD19+CD20+CD27− NBCs and CD3−CD19+CD20+CD27+ AEBCs were analyzed for VH:VL sequences as reported previously (29). Briefly, cells were isolated into emulsion droplets along with poly(dT) magnetic beads for mRNA capture using a flow-focusing nozzle apparatus. Droplets contained lithium dodecyl sulfate and DTT to lyse cells and inactivate proteins, and mRNA released from lysed cells was captured by the poly(dT) sequences on magnetic beads. The emulsion was broken chemically as described (29), and beads were collected, washed, and used as template for emulsion overlap extension RT-PCR, which linked heavy- and light-chain transcripts into a single cDNA construct for high-throughput sequencing via Illumina MiSeq 2 × 250 or 2 × 300 technology. See SI Appendix, SI Methods for further details regarding cell isolation, sorting, and antibody RT-PCR and sequencing.

Bioinformatic Sequence Analysis.

Illumina sequences were quality-filtered and annotated using both the IMGT (60) and National Center for Biotechnology Information IgBlast software (61) with a CDR3 motif identification algorithm (62). CDR-H3 junction nucleotide sequences were extracted and clustered to 96% nucleotide identity with terminal gaps ignored [USEARCH v. 5.2.32 (63)], with a minimum of one nucleotide mismatch permitted during CDR-H3 junction clustering regardless of sequence length; the most abundant CDR-L3 corresponding to each CDR-H3 cluster seed was chosen as an H3:L3 pair. Resulting CDR-H3:L3 pairs with two or more reads comprised the preliminary list of VH:VL clusters for each dataset. Naive antibody sequences were additionally filtered to include only sequences with >98% germline identity in the FR3 region, similar to previous reports (10). Additional details of bioinformatics analysis are provided in SI Appendix, SI Methods.
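
The pair-selection step described above (most abundant CDR-L3 per CDR-H3 cluster, then a two-read minimum) can be sketched in Python as follows. This is an illustrative sketch, not the deposited source code. It assumes that the reads have already been assigned to CDR-H3 clusters upstream, and the function name, the input layout, and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict

def build_vh_vl_clusters(reads, min_reads=2):
    """Pick one CDR-L3 per CDR-H3 cluster and apply a read-count filter.

    reads: iterable of (h3_cluster_id, cdr_l3) tuples, one per paired read
           (hypothetical input layout; clustering is assumed done upstream).
    Returns {h3_cluster_id: (cdr_l3, read_count)} for the most abundant
    CDR-L3 of each cluster, keeping only pairs seen in >= min_reads reads.
    """
    l3_counts = defaultdict(Counter)
    for h3_cluster, cdr_l3 in reads:
        l3_counts[h3_cluster][cdr_l3] += 1

    clusters = {}
    for h3_cluster, counter in l3_counts.items():
        cdr_l3, n_reads = counter.most_common(1)[0]  # most abundant CDR-L3
        if n_reads >= min_reads:
            clusters[h3_cluster] = (cdr_l3, n_reads)
    return clusters

# Example with made-up cluster identifiers and CDR-L3 sequences.
example_reads = [
    ("c1", "QQYNSYPLT"), ("c1", "QQYNSYPLT"), ("c1", "QSYDSSLSG"),
    ("c2", "AAWDDSLNG"),
]
pairs = build_vh_vl_clusters(example_reads)
```

In this example, cluster `c1` is retained with its two-read CDR-L3, whereas `c2` is dropped because its only pair is supported by a single read.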

Structural Modeling and Analysis.

Structural modeling and analyses are described in SI Appendix, SI Methods. Briefly, antibody sequences represented by the most reads from donor 1 and donor 2 (all selected antibodies were observed at >50 reads per sequence from the respective repertoire) for naive and antigen-experienced sets were analyzed. Antibody sequences were tested for uniqueness in and across repertoires, so that no sequence was modeled more than once. Antibodies with a CDR-H3 length of ≥16 amino acids (Chothia numbering) were excluded from modeling. All sequences were subsequently filtered to ensure that each FR and CDR was identifiable by the modified Chothia definitions in RosettaAntibody. Antibodies for which high-sequence-identity templates were available for CDR-H1, CDR-H2, CDR-L1, CDR-L2, and CDR-L3 were input through the RosettaAntibody 3.0 antibody modeling protocol as described (27). A total of 1,000 trajectories were modeled per antibody; the lowest-scoring models, as evaluated by the Rosetta scoring function, were chosen for visual inspection and further analysis.

The CR-paratope comprised residues that were part of the contact region of each antibody as defined by Stave et al. (35). These consisted of VH residues numbered 26–33 (CDR-H1), 50–58 (CDR-H2), and 94–101 (CDR-H3) and VL residues 27–32 (CDR-L1), 49–56 (CDR-L2), and 91–96 (CDR-L3) in the Chothia numbering scheme.
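
For readers who wish to apply the same residue definition, the ranges listed above can be encoded as a small lookup. The Python sketch below is illustrative only. The dictionary and function names are hypothetical, and the handling of Chothia insertion codes (passing the base residue number, e.g., 100 for 100A) is an assumption rather than a detail stated here.

```python
# Chothia residue ranges of the CR-paratope, copied from the text above.
CR_PARATOPE = {
    "H": {"CDR-H1": (26, 33), "CDR-H2": (50, 58), "CDR-H3": (94, 101)},
    "L": {"CDR-L1": (27, 32), "CDR-L2": (49, 56), "CDR-L3": (91, 96)},
}

def cr_paratope_region(chain, resnum):
    """Return the CDR label if the residue lies in the CR-paratope, else None.

    chain:  "H" for VH or "L" for VL.
    resnum: integer Chothia residue number. Insertion codes are assumed
            to be stripped by the caller (100A is passed as 100).
    """
    for label, (start, end) in CR_PARATOPE[chain].items():
        if start <= resnum <= end:  # ranges are inclusive
            return label
    return None
```

For instance, `cr_paratope_region("H", 30)` falls within the CDR-H1 range, whereas `cr_paratope_region("L", 48)` lies outside every listed range.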

Similarities between FRs (FR1–3) of antibodies were calculated by determining the rmsd over the backbone atoms (C, Cα, N, O) of each antibody FR1-3 region to all other antibodies in a repertoire using the McLachlan algorithm (64) as implemented in the ProFit software (A. C. R. Martin and C. T. Porter, University College London, London) (www.bioinf.org.uk/software/profit/). Antibodies then were grouped by IGHV gene use (same gene, same family, or different family), and median rmsd values, SDs, and statistical significance of distributions were determined using R version 3.1.1.
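
The pairwise calculation can be illustrated with a short Python sketch. Note that this sketch substitutes the Kabsch superposition (via singular value decomposition) for the McLachlan algorithm as implemented in ProFit, so it is not the procedure used here; both give the rmsd after optimal rigid-body fitting. The function names are hypothetical, and the inputs are assumed to be equal-length arrays of matched FR1–3 backbone atom coordinates.

```python
from itertools import combinations

import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q             # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_fit = P @ R.T         # rotate P onto Q
    return float(np.sqrt(((P_fit - Q) ** 2).sum() / len(P)))

def median_pairwise_rmsd(models):
    """Median rmsd over all unordered pairs of coordinate sets."""
    values = [kabsch_rmsd(a, b) for a, b in combinations(models, 2)]
    return float(np.median(values))
```

Grouping the pairwise values by IGHV gene use (same gene, same family, or different family) before taking the median would then follow the comparison described above.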

Statistical Analysis.

R version 3.1.1 was used for Pearson hierarchical clustering (function “hclust”). Distance between samples was measured by Pearson correlation with complete-linkage as the agglomerative method. Principal component analysis (the “princomp” function in MATLAB R2012b) was applied to processed Pearson hierarchical clustering data. R version 3.1.1 was used for the identification of differentially paired genes (package “limma” version 3.14.4) (36, 65). Before running limma, gene pairs with zero use were removed, and quantile normalization was performed to normalize the difference in distribution of values among samples. P values for multiple comparisons were corrected with the Benjamini–Hochberg procedure. Differentially paired gene cut-offs were established at a fold-change of 2 and an adjusted P value of 0.05. R version 3.1.1 was used for the K–S test (function “ks.test”). Raw values such as charge, length, and hydrophobicity were used to compare probability distributions across experimental groups. The Z score was used to compare two proportions of amino acid charges. Further details regarding statistical tests are described in SI Appendix, SI Methods.
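
The multiple-comparison correction and the cut-offs named above can be sketched in Python as follows. This sketch covers only the Benjamini–Hochberg adjustment and the thresholding step; it does not reproduce the limma linear-model analysis performed in R. The function names are hypothetical, and the choice of inclusive fold-change and strict P-value comparisons is an assumption.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)  # p * n / rank
    # Enforce monotonicity, working down from the largest P value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def differentially_paired(log2_fold_change, p_values,
                          fc_cutoff=2.0, alpha=0.05):
    """Flag gene pairs passing the fold-change and adjusted-P cut-offs.

    The comparison operators (>= and <) are assumptions for illustration.
    """
    adjusted = benjamini_hochberg(p_values)
    lfc = np.asarray(log2_fold_change, dtype=float)
    return (np.abs(lfc) >= np.log2(fc_cutoff)) & (adjusted < alpha)
```

A fold-change of 2 corresponds to an absolute log2 fold-change of 1, which is why the threshold is expressed as `np.log2(fc_cutoff)`.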

Acknowledgments

We thank Jeliazko Jeliazkov and Brian Weitzner for aid in implementing the antibody structure prediction protocol, Jessica Wheeler and Scott Hunicke-Smith for Illumina MiSeq sequencing, and the Texas Advanced Computing Center for computational resources. This work was supported by NIH Grant R56 AI106006, by Defense Threat Reduction Agency Grant HDTRA1-12-C-0105, and by a grant from the Clayton Foundation. B.J.D. was funded by graduate fellowships from the Hertz Foundation, the University of Texas Donald D. Harrington Foundation, and the National Science Foundation. O.I.L. was funded by NIH Collaborative Opportunities for Research Educators Fellowship 1 K12 GM102745. G.C.I. was funded by the World Health Organization. D.K. and J.J.G. were funded by NIH Grant 5 R01 GM078221.

Footnotes

Conflict of interest statement: G.G., B.J.D., and A.D.E. declare competing financial interests in the form of a patent filed by the University of Texas at Austin.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information Short Read Archive (accession codes PRJNA315079, SRX709625, and SRX709626). The bioinformatic source code is shared on GitHub (https://github.com/bdekosky/PNAS_2015-25510) (accession code PNAS_2015-25510).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1525510113/-/DCSupplemental.

References

  • 1. Murphy K, Travers P, Walport M, Janeway C. Janeway’s Immunobiology. 8th Ed. Garland Science; New York: 2012.
  • 2. Kirkham PM, Schroeder HW Jr. Antibody structure and the evolution of immunoglobulin V gene segments. Semin Immunol. 1994;6(6):347–360. doi: 10.1006/smim.1994.1045.
  • 3. Manser T. Evolution of antibody structure during the immune response. The differentiative potential of a single B lymphocyte. J Exp Med. 1989;170(4):1211–1230. doi: 10.1084/jem.170.4.1211.
  • 4. Schmidt AG, et al. Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody. Proc Natl Acad Sci USA. 2013;110(1):264–269. doi: 10.1073/pnas.1218256109.
  • 5. Li T, et al. Redistribution of flexibility in stabilizing antibody fragment mutants follows Le Châtelier’s principle. PLoS One. 2014;9(3):e92870. doi: 10.1371/journal.pone.0092870.
  • 6. Jackson KJL, Kidd MJ, Wang Y, Collins AM. The shape of the lymphocyte receptor repertoire: lessons from the B cell receptor. Front Immunol. 2013;4(263):263. doi: 10.3389/fimmu.2013.00263.
  • 7. Yaari G, Benichou JIC, Heiden JAV, Kleinstein SH, Louzoun Y. The mutation patterns in B-cell immunoglobulin receptors reflect the influence of selection acting at multiple time-scales. Phil Trans R Soc B. 2015;370(1676):20140242. doi: 10.1098/rstb.2014.0242.
  • 8. Wu YC, et al. High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood. 2010;116(7):1070–1078. doi: 10.1182/blood-2010-03-275859.
  • 9. Wu Y-CB, Kipling D, Dunn-Walters DK. The relationship between CD27 negative and positive B cell populations in human peripheral blood. Front Immunol. 2011;2:81. doi: 10.3389/fimmu.2011.00081.
  • 10. Glanville J, et al. Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation. Proc Natl Acad Sci USA. 2011;108(50):20066–20071. doi: 10.1073/pnas.1107498108.
  • 11. Briney BS, Willis JR, McKinney BA, Crowe JE Jr. High-throughput antibody sequencing reveals genetic evidence of global regulation of the naïve and memory repertoires that extends across individuals. Genes Immun. 2012;13(6):469–473. doi: 10.1038/gene.2012.20.
  • 12. Mroczek ES, et al. Differences in the composition of the human antibody repertoire by B cell subsets in the blood. Front Immunol. 2014;5:96. doi: 10.3389/fimmu.2014.00096.
  • 13. Georgiou G, et al. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat Biotechnol. 2014;32(2):158–168. doi: 10.1038/nbt.2782.
  • 14. Robinson WH. Sequencing the functional antibody repertoire--diagnostic and therapeutic discovery. Nat Rev Rheumatol. 2015;11(3):171–182. doi: 10.1038/nrrheum.2014.220.
  • 15. Brezinschek H-P, Foster SJ, Dörner T, Brezinschek RI, Lipsky PE. Pairing of variable heavy and variable κ chains in individual naive and memory B cells. J Immunol. 1998;160(10):4762–4767.
  • 16. Bräuninger A, Goossens T, Rajewsky K, Küppers R. Regulation of immunoglobulin light chain gene rearrangements during early B cell development in the human. Eur J Immunol. 2001;31(12):3631–3637. doi: 10.1002/1521-4141(200112)31:12<3631::aid-immu3631>3.0.co;2-l.
  • 17. Meffre E, et al. Immunoglobulin heavy chain expression shapes the B cell receptor repertoire in human B cell development. J Clin Invest. 2001;108(6):879–886. doi: 10.1172/JCI13051.
  • 18. Wardemann H, et al. Predominant autoantibody production by early human B cell precursors. Science. 2003;301(5638):1374–1377. doi: 10.1126/science.1086907.
  • 19. Tian C, et al. Evidence for preferential Ig gene usage and differential TdT and exonuclease activities in human naïve and memory B cells. Mol Immunol. 2007;44(9):2173–2183. doi: 10.1016/j.molimm.2006.11.020.
  • 20. Wardemann H, Nussenzweig MC. B-cell self-tolerance in humans. Adv Immunol. 2007;95(95):83–110. doi: 10.1016/S0065-2776(07)95003-8.
  • 21. de Wildt RM, Hoet RM, van Venrooij WJ, Tomlinson IM, Winter G. Analysis of heavy and light chain pairings indicates that receptor editing shapes the human antibody repertoire. J Mol Biol. 1999;285(3):895–901. doi: 10.1006/jmbi.1998.2396.
  • 22. Jayaram N, Bhowmick P, Martin ACR. Germline VH/VL pairing in antibodies. Protein Eng Des Sel. 2012;25(10):523–529. doi: 10.1093/protein/gzs043.
  • 23. Zhu K, et al. Antibody structure determination using a combination of homology modeling, energy-based refinement, and loop prediction. Proteins. 2014;82(8):1646–1655. doi: 10.1002/prot.24551.
  • 24. Berrondo M, Kaufmann S, Berrondo M. Automated Aufbau of antibody structures from given sequences using Macromoltek’s SmrtMolAntibody. Proteins. 2014;82(8):1636–1645. doi: 10.1002/prot.24595.
  • 25. Shirai H, et al. High-resolution modeling of antibody structures by a combination of bioinformatics, expert knowledge, and molecular simulations. Proteins. 2014;82(8):1624–1635. doi: 10.1002/prot.24591.
  • 26. Teplyakov A, et al. Antibody modeling assessment II. Structures and models. Proteins. 2014;82(8):1563–1582. doi: 10.1002/prot.24554.
  • 27. Weitzner BD, Kuroda D, Marze N, Xu J, Gray JJ. Blind prediction performance of RosettaAntibody 3.0: grafting, relaxation, kinematic loop modeling, and full CDR optimization. Proteins. 2014;82(8):1611–1623. doi: 10.1002/prot.24534.
  • 28. Marcatili P, et al. Igs expressed by chronic lymphocytic leukemia B cells show limited binding-site structure variability. J Immunol. 2013;190(11):5771–5778. doi: 10.4049/jimmunol.1300321.
  • 29. DeKosky BJ, et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat Med. 2015;21(1):86–91. doi: 10.1038/nm.3743.
  • 30. McDaniel JR, DeKosky BJ, Tanno H, Ellington AD, Georgiou G. Ultra-high-throughput sequencing of the immune receptor repertoire from millions of lymphocytes. Nat Protoc. 2016;11(3):429–442. doi: 10.1038/nprot.2016.024.
  • 31. Wang J, et al. High frequencies of activated B cells and T follicular helper cells are correlated with disease activity in patients with new-onset rheumatoid arthritis. Clin Exp Immunol. 2013;174(2):212–220. doi: 10.1111/cei.12162.
  • 32. Kaminski DA, Wei C, Qian Y, Rosenberg AF, Sanz I. Advances in human B cell phenotypic profiling. Front Immunol. 2012;3:302. doi: 10.3389/fimmu.2012.00302.
  • 33. DeKosky BJ, et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat Biotechnol. 2013;31(2):166–169. doi: 10.1038/nbt.2492.
  • 34. Lavinder JJ, et al. Identification and characterization of the constituent human serum antibodies elicited by vaccination. Proc Natl Acad Sci USA. 2014;111(6):2259–2264. doi: 10.1073/pnas.1317793111.
  • 35. Stave JW, Lindpaintner K. Antibody and antigen contact residues define epitope and paratope size and structure. J Immunol. 2013;191(3):1428–1435. doi: 10.4049/jimmunol.1203198.
  • 36. Smyth GK. 2005. limma: Linear models for microarray data. Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Statistics for Biology and Health, eds Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (Springer, New York), pp 397–420.
  • 37. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106.
  • 38. Jackson KJL, et al. Divergent human populations show extensive shared IGK rearrangements in peripheral blood B cells. Immunogenetics. 2012;64(1):3–14. doi: 10.1007/s00251-011-0559-z.
  • 39. Hoi KH, Ippolito GC. Intrinsic bias and public rearrangements in the human immunoglobulin Vλ light chain repertoire. Genes Immun. 2013;14(4):271–276. doi: 10.1038/gene.2013.10.
  • 40. Arnaout R, et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS One. 2011;6(8):e22365. doi: 10.1371/journal.pone.0022365.
  • 41. Galson JD, et al. In-depth assessment of within-individual and inter-individual variation in the B cell receptor repertoire. Front Immunol. 2015;6:531. doi: 10.3389/fimmu.2015.00531.
  • 42. Smith K, et al. Fully human monoclonal antibodies from antibody secreting cells after vaccination with Pneumovax®23 are serotype specific and facilitate opsonophagocytosis. Immunobiology. 2013;218(5):745–754. doi: 10.1016/j.imbio.2012.08.278.
  • 43. Parameswaran P, et al. Convergent antibody signatures in human dengue. Cell Host Microbe. 2013;13(6):691–700. doi: 10.1016/j.chom.2013.05.008.
  • 44. Jackson KJL, et al. Human responses to influenza vaccination show seroconversion signatures and convergent antibody rearrangements. Cell Host Microbe. 2014;16(1):105–114. doi: 10.1016/j.chom.2014.05.013.
  • 45. Galson JD, et al. Analysis of B cell repertoire dynamics following hepatitis B vaccination in humans, and enrichment of vaccine-specific antibody sequences. EBioMedicine. 2015;2(12):2070–2079. doi: 10.1016/j.ebiom.2015.11.034.
  • 46. Eisenberg D. Three-dimensional structure of membrane and surface proteins. Annu Rev Biochem. 1984;53(1):595–623. doi: 10.1146/annurev.bi.53.070184.003115.
  • 47. Tan PH, Sandmaier BM, Stayton PS. Contributions of a highly conserved VH/VL hydrogen bonding interaction to scFv folding stability and refolding efficiency. Biophys J. 1998;75(3):1473–1482. doi: 10.1016/S0006-3495(98)74066-4.
  • 48. Ewert S, Honegger A, Plückthun A. Stability improvement of antibodies for extracellular and intracellular applications: CDR grafting to stable frameworks and structure-based framework engineering. Methods. 2004;34(2):184–199. doi: 10.1016/j.ymeth.2004.04.007.
  • 49. Honegger A, Malebranche AD, Röthlisberger D, Plückthun A. The influence of the framework core residues on the biophysical properties of immunoglobulin heavy chain variable domains. Protein Eng Des Sel. 2009;22(3):121–134. doi: 10.1093/protein/gzn077.
  • 50. Wang N, et al. Conserved amino acid networks involved in antibody variable domain interactions. Proteins. 2009;76(1):99–114. doi: 10.1002/prot.22319.
  • 51. Dunn-Walters DK, Isaacson PG, Spencer J. Analysis of mutations in immunoglobulin heavy chain variable region genes of microdissected marginal zone (MGZ) B cells suggests that the MGZ of human spleen is a reservoir of memory B cells. J Exp Med. 1995;182(2):559–566. doi: 10.1084/jem.182.2.559.
  • 52. Weller S, et al. Human blood IgM “memory” B cells are circulating splenic marginal zone B cells harboring a prediversified immunoglobulin repertoire. Blood. 2004;104(12):3647–3654. doi: 10.1182/blood-2004-01-0346.
  • 53. Reynaud CA, et al. IgM memory B cells: a mouse/human paradox. Cell Mol Life Sci. 2012;69(10):1625–1634. doi: 10.1007/s00018-012-0971-z.
  • 54. Dunand CJH, Wilson PC. Restricted, canonical, stereotyped and convergent immunoglobulin responses. Philos Trans R Soc Lond B Biol Sci. 2015;370(1676):20140238. doi: 10.1098/rstb.2014.0238.
  • 55. Sajadi MM, et al. λ light chain bias associated with enhanced binding and function of anti-HIV env glycoprotein antibodies. J Infect Dis. 2016;213(1):156–164. doi: 10.1093/infdis/jiv448.
  • 56. Liu S, et al. Receptor editing can lead to allelic inclusion and development of B cells that retain antibodies reacting with high avidity autoantigens. J Immunol. 2005;175(8):5067–5076. doi: 10.4049/jimmunol.175.8.5067.
  • 57. Casellas R, et al. Igkappa allelic inclusion is a consequence of receptor editing. J Exp Med. 2007;204(1):153–160. doi: 10.1084/jem.20061918.
  • 58. Andrews SF, et al. Global analysis of B cell selection using an immunoglobulin light chain-mediated model of autoreactivity. J Exp Med. 2013;210(1):125–142. doi: 10.1084/jem.20120525.
  • 59. Giachino C, Padovan E, Lanzavecchia A. kappa+lambda+ dual receptor B cells are present in the human peripheral repertoire. J Exp Med. 1995;181(3):1245–1250. doi: 10.1084/jem.181.3.1245.
  • 60. Brochet X, Lefranc M-P, Giudicelli V. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res. 2008;36(Web Server issue) suppl 2:W503-8. doi: 10.1093/nar/gkn316.
  • 61. Ye J, Ma N, Madden TL, Ostell JM. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 2013;41(Web Server issue) W1:W34-40. doi: 10.1093/nar/gkt382.
  • 62. Ippolito GC, et al. Antibody repertoires in humanized NOD-scid-IL2Rγ(null) mice and human B cells reveals human-like diversification and tolerance checkpoints in the mouse. PLoS One. 2012;7(4):e35497. doi: 10.1371/journal.pone.0035497.
  • 63. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–2461. doi: 10.1093/bioinformatics/btq461.
  • 64. McLachlan AD. Rapid comparison of protein structures. Acta Crystallogr A. 1982;38(6):871–873.
  • 65. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments: statistical applications in genetics and molecular biology. Stat Appl Genet Mol Biol. 2004;3(1):3. doi: 10.2202/1544-6115.1027.

