ABSTRACT
Historically, antibody reactivity to pathogens and vaccine antigens has been evaluated using serological measurements of antigen-specific antibodies. However, it is difficult to evaluate all antibodies that contribute to various functions in a single assay, such as the measurement of the neutralizing antibody titer. Bulk antibody repertoire analysis using next-generation sequencing is a comprehensive method for analyzing the overall antibody response; however, it is unreliable for estimating antigen-specific antibodies due to individual variation. To address this issue, we propose a method to subtract the background signal from the repertoire data of interest. In this study, we analyzed changes in antibody diversity and inferred the heavy-chain complementarity-determining region 3 (CDRH3) sequences of antibody clones that were selected upon influenza virus infection in a mouse model using bulk repertoire analysis. A decrease in the diversity of the antibody repertoire was observed upon viral infection, along with an increase in neutralizing antibody titers. Using kernel density estimation of sequences in a high-dimensional sequence space with background signal subtraction, we identified several clusters of CDRH3 sequences induced upon influenza virus infection. Most of these repertoires were detected more frequently in infected mice than in uninfected control mice, suggesting that infection-specific antibody sequences can be extracted using this method. Such an accurate extraction of antigen- or infection-specific repertoire information will be a useful tool for vaccine evaluation in the future.
IMPORTANCE
As specific interactions between antigens and cell-surface antibodies trigger the proliferation of B-cell clones, the frequency of each antibody sequence in the samples reflects the size of each clonal population. Nevertheless, it is extremely difficult to extract antigen-specific antibody sequences from the comprehensive bulk antibody sequences obtained from blood samples due to repertoire bias influenced by exposure to dietary antigens and other infectious agents. This issue can be addressed by subtracting the background noise from the post-immunization or post-infection repertoire data. In the present study, we propose a method to quantify comprehensive repertoire data. This method allowed subtraction of the background repertoire, resulting in more accurate extraction of antibody repertoires that expanded upon influenza virus infection. This accurate extraction of antigen- or infection-specific repertoire information is a useful tool for vaccine evaluation.
KEYWORDS: antibody repertoire, influenza virus
INTRODUCTION
B cells, which play a pivotal role in humoral immunity, are characterized by B-cell receptors (BCRs) that recognize antigens. The secreted form of the BCR is referred to as an antibody. BCRs and antibodies comprise two component sets: heavy and light chains. After binding to foreign antigens, antibodies exert effector functions such as opsonization and neutralization of pathogens, activation of the complement system, and initiation of cytotoxic and phagocytic signaling in target cells (1). The antigen-binding specificity of an antibody is mainly determined by the structure of the complementarity-determining regions (CDRs) that interact with antigens in variable regions of the heavy and light chains (2). To recognize different antigens, B cells must generate diverse BCRs/antibodies. The variable regions of the heavy and light chains that give rise to diversity are generated by the recombination of germline genes encoding immunoglobulin (Ig) V (variable), D (diversity), and J (joining) segments: V, D, and J for heavy chains, and V and J for light chains (3). Among CDR1, CDR2, and CDR3, CDR3 of the heavy chains (CDRH3) is the most variable because it is translated from the nucleotide sequence containing the end of the V-gene segment, the entire D-gene segment, and the beginning of the J-gene segment (2). In addition, deletion and insertion of nucleotides at the junction of each segment occur during recombination, resulting in a theoretical diversity of >10¹³ BCRs/antibodies in humans and mice (4–6). Analysis of BCR/antibody diversity underlying humoral immunity is referred to as antibody repertoire analysis.
Antibody reactivity to the antigens of infectious pathogens and vaccines has been evaluated primarily through serological measurements of antigen-specific antibodies. Diverse antibody populations that are reactive to broad-spectrum antigens are present in serum. However, the composition of the clonal populations of B cells from which these antibodies are derived cannot be determined by measuring antibodies in blood samples. Recent advances in gene cloning technologies in single cells combined with fluorescence-activated cell sorting (FACS), enzyme-linked immunosorbent assays (ELISA), micro-neutralization assays, and hybridoma generation have contributed to the understanding of antibody genes in antigen-specific B cells, particularly memory B cells or plasma cells, which are involved in the specificity and breadth of reactivity of antibodies (7–14). Although single-cell repertoire analysis is a powerful tool for evaluating functional antigen-specific B cells, it is difficult to work with numerous specimens (low throughput) in terms of time, effort, and cost, which limits the size of the obtained data. In contrast, bulk repertoire analysis using next-generation sequencing (NGS), which provides comprehensive repertoire information on BCR/antibody genes, can be conducted with a larger number of samples simultaneously (high throughput) at a considerably lower cost than in the past. Bulk repertoire analysis has been widely applied to monitor B-cell repertoire transitions during vaccine-induced immune responses (15–20). However, bulk repertoire analysis does not provide pairing information for each set of heavy and light chains constituting a single antibody. Furthermore, bulk repertoire information cannot be linked to the functions of each clonal population of antibodies, such as the specificity and breadth of antibody reactivity.
The latest droplet-based technologies, such as the 10x Genomics Chromium system, allow repertoire analysis of large numbers of cells at the single-cell level (21, 22); however, like other single-cell repertoire analyses, such methods are expensive and not suitable for processing multiple specimens. Therefore, a new analytical method is required to obtain large data sets sufficient to understand the diversity and composition of the clonal population of antigen-specific B cells. As specific interactions between antigens and cell-surface BCRs trigger the proliferation of B-cell clones, the frequency of each BCR sequence in the samples reflects the size of each clonal population. Therefore, we hypothesized that the specific responses of B cells to antigens of pathogens and vaccines can be assessed by examining the frequency of each BCR sequence with and without stimulation as changes in the convergence and diversity of antibody clones.
In the present study, using an influenza mouse model, we established a method to comprehensively evaluate the expansion and diversity of antibodies using bulk repertoire analysis. First, bulk BCR repertoire data were obtained from mice infected once or twice with the influenza virus and from uninfected mice. Second, focusing on CDRH3, we quantified the distribution of length and differences in amino acid sequences for each sample. Finally, by comparing the data obtained among the samples, we identified CDRH3 sequences whose frequencies among all sequences were significantly increased by infection. The present results indicate that bulk repertoire information can be quantified and that the quantification of antibody expansion and diversity is useful as an index of humoral immunity induced by infection and vaccination.
RESULTS
Influenza A virus infection reduces the diversity of antibody repertoires in mouse blood B cells
Mice were divided into three groups (once-infected, twice-infected, and PBS control) and intranasally inoculated twice with 50 plaque-forming units (PFUs) of influenza A/Puerto Rico/8/1934 (H1N1) (PR8) and/or phosphate-buffered saline (PBS) (Fig. 1a). Three weeks after the second inoculation, peripheral blood samples were collected. Plasma and blood cell samples were prepared for antibody measurements and bulk repertoire analyses, respectively. Neutralizing antibody titers against the PR8 virus in the plasma were highest in the twice-infected group, followed by the once-infected group, whereas no antibody was detected in the PBS control group (Fig. 1b). These results confirm that PR8 infection induced virus-specific antibodies in mice.
Fig 1.
Sampling schedule for bulk repertoire analysis and neutralizing antibody titers at the time of sampling. (a) Schematic representation of the experimental schedule. BALB/c mice were inoculated twice intranasally with PBS and/or PR8 at 4-week intervals. Each group was referred to as PBS control (N = 4), once-infected group (N = 4), and twice-infected group (N = 5), respectively. Three weeks after the second inoculation, blood samples were collected for neutralization titers and bulk repertoire analysis. (b) Collected plasma samples were evaluated for neutralizing antibody titers against PR8. The circles, squares, and triangles indicate data from the PBS control, once-infected, and twice-infected groups, respectively. The P values were calculated using one-way ANOVA with a multiple-comparison correction.
Bulk repertoire analysis, targeting all variable regions of the heavy chains of IgM or IgG, was performed using NGS. Raw read sequences obtained using the MiSeq sequencer were subjected to Ig gene identification using MiGMAP after a quality check. An average of 14,570 IgM sequence reads was obtained from each sample (Table 1). In contrast, the IgG genes were barely detected. Because the IgG genes in the spleen samples were amplified with the same primers used here, it is assumed that the unsuccessful detection of the genes in the blood cell samples was not due to a problem with the experimental system but because the number of B cells or plasma cells expressing IgG was low in the blood compared to those expressing IgM. This is likely due to their migration to secondary lymph nodes and bone marrow (23, 24) or maintenance under relatively clean and specific pathogen-free (SPF) conditions. Therefore, we focused on the IgM genes for further experiments in this study.
TABLE 1.
Number of IgM sequences obtained in this study
| Group | Mouse ID number | Number of IgM sequences |
|---|---|---|
| PBS control | 1 | 10,771 |
| PBS control | 2 | 19,160 |
| PBS control | 3 | 15,721 |
| PBS control | 4 | 20,467 |
| Once infected | 1 | 12,121 |
| Once infected | 2 | 15,540 |
| Once infected | 3 | 7,655 |
| Once infected | 4 | 8,112 |
| Twice infected | 1 | 17,853 |
| Twice infected | 2 | 11,264 |
| Twice infected | 3 | 17,557 |
| Twice infected | 4 | 18,183 |
| Twice infected | 5 | 15,010 |
The length of the amino acid sequences of IgM CDRH3 ranged from 7 to 24 amino acids. Its distribution pattern was examined for each sample, and the extent of the correlation of the measured value with the cumulative Gaussian distribution (CGD) (25) was quantified to evaluate CDRH3 diversity (Fig. 2a). In the PBS control group, the distribution of CDRH3 lengths was relatively close to a normal distribution, with a peak at 13 or 14 amino acids and an average CGD value of 0.845 (0.78–0.91). However, the once- and twice-infected groups showed a decreasing tendency in the CGD values with average values of 0.68 (0.38–0.94) and 0.716 (0.58–0.82), respectively. There was a significant difference in the CGD values between the PBS control and the twice-infected groups (Fig. 2b). This result indicated a greater deviation from the normal distribution in the infected mouse groups than in the PBS control group, likely because of the expansion of selected B-cell antibodies. Moreover, we calculated the Shannon-Wiener diversity index from the frequency of each unique amino acid sequence in CDRH3 (Fig. 2c). The higher the number of sequences and the greater the evenness of the frequency among all sequences, the higher the Shannon-Wiener diversity index. As expected, the Shannon-Wiener diversity indices were significantly lower in the once- and twice-infected groups than in the control group (once infected vs PBS control, P = 0.029; twice infected vs PBS control, P = 0.016). These results confirmed the reduction in CDRH3 repertoire diversity in infected mice, which may have resulted from the expansion of viral antigen-specific antibodies.
Fig 2.
Reduced values of the CGD and diversity in infected mice. (a) Histograms of CDRH3 size distribution of IgM genes in peripheral blood of individual mice 3 weeks after two inoculations of influenza virus and/or PBS are shown. The best-fit CGD curve for each sample, as determined using the mean and SD values, is overlaid. The x-axis represents the CDR3 size in amino acids, and the y-axis is the proportion of the CDR3 sequences at each size. The CGD values for each individual are indicated in the upper right corner of each panel. (b and c) CGD and diversity values were compared between groups. The P values were calculated using the Mann-Whitney U-test.
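As an illustration of the Shannon-Wiener diversity index described above, the following minimal sketch computes the index from the frequency of each unique CDRH3 amino acid sequence. The function name `shannon_index`, the use of the natural logarithm, and the toy sequences are assumptions made for this example; they are not taken from the analysis pipeline of this study.

```python
import math
from collections import Counter

def shannon_index(sequences):
    """Shannon-Wiener index H = -sum(p_i * ln(p_i)) over unique sequences.

    The natural logarithm is an assumption; the log base is not stated here.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical toy repertoires (not data from the study).
even = ["CARA", "CARB", "CARC", "CARD"]      # four clones at equal frequency
skewed = ["CARA", "CARA", "CARA", "CARB"]    # one clone dominates

h_even = shannon_index(even)
h_skewed = shannon_index(skewed)
print(h_even, h_skewed)
```

In this toy case, the repertoire dominated by one expanded clone yields the lower index, which is the pattern expected when selected clones expand after infection.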
The antibody sequences selected in mouse blood B cells upon influenza A virus infection can be extracted from comprehensive sequencing data obtained by NGS
Next, we characterized the IgM CDRH3 sequences present at high frequencies in each sample. Edit distance-based evaluation of the amino acid similarity of the IgM CDRH3 sequences was performed to generate a distance matrix, which was visualized on a two-dimensional (2D) map using multidimensional scaling (MDS) (Fig. 3a). All CDRH3 sequences were combined into a single MDS map so that the amino acid similarities of CDRH3 sequences from different mice are represented as distances on the map (Fig. 3b). In the MDS map, each dot represents a unique IgM CDRH3 sequence, and similar and different sequences are placed proximally and distally, respectively. The CDRH3 sequence lengths ranged between 7 and 24 amino acids, with the shortest seven-residue CDRH3 sequences located near the center and longer CDRH3 sequences toward the outside. The relative frequency of each CDRH3 sequence among all sequences was plotted on the z-axis to make a three-dimensional (3D) MDS map (Fig. 3c). To demonstrate the differences in the IgM CDRH3 repertoire with or without viral infection, individual 3D MDS map data from each mouse were extracted (Fig. 3d). As expected from the CGD results, diverse CDRH3 sequences with low frequencies were more widely distributed in the 3D MDS maps of the PBS control group than in those of the once- and twice-infected groups. In contrast, a relatively smaller number of CDRH3 sequences were distributed with a higher frequency in the maps of infected mice, suggesting that selected antibody populations were expanded upon infection. The expansion of selected CDRH3 sequences also tended to be small in once-infected mouse #1 and twice-infected mouse #3, which is consistent with the high CGD values shown in Fig. 2a.
Fig 3.
Visualization of selected CDRH3 sequences. (a) A flowchart of the conversion from CDRH3 amino acid sequences to an MDS map is shown. (b) CDRH3 sequences from all mice were combined into a single 2D MDS map. Both the x- and y-axes represent the number of different amino acids in CDRH3. The different lengths of CDRH3 amino acids are indicated by color, as shown in the right panel. (c) The z-axis represents the proportion of each sequence in the 2D MDS map (b). (d) The combined graph (c) is divided for each mouse.
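The first step of the conversion in Fig. 3a, building an edit-distance matrix from CDRH3 amino acid sequences, can be sketched as follows. This example assumes the Levenshtein distance (substitutions, insertions, and deletions each cost 1); the exact edit-distance variant is not specified above, and the helper names are hypothetical. The resulting matrix is the kind of input that an MDS implementation would then embed in 2D or higher-dimensional space; that step is not shown.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two amino acid strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def distance_matrix(seqs):
    """Symmetric matrix of pairwise edit distances."""
    return [[levenshtein(s, t) for t in seqs] for s in seqs]

# Consensus sequences taken from Table 2, used here only as example input.
seqs = ["CARGNWYFDVW", "CARVNWDWYFDVW", "CARVWYAMDYW", "CARRGYYAMDYW"]
matrix = distance_matrix(seqs)
for row in matrix:
    print(row)
```

Sequences of similar length and composition receive small distances, which is why they are placed close together on the MDS map.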
To highlight the IgM CDRH3 sequences that increased in frequency in response to influenza A virus infection, 3D MDS maps of infected and uninfected mice were converted to subtractable kernel density estimation (KDE) maps (Fig. 4a). This KDE map indicated frequent CDRH3 sequence populations (for example, prevalent CDRH3 sequences and closely related sequences that emerged from them through mutation) as high-frequency areas. By subtracting the KDE of the uninfected group from that of the infected groups, the populations of CDRH3 sequences enlarged by influenza A virus infection were identified as several hotspots (Fig. 4b). The validation and statistical analysis of the differences between subtraction and non-subtraction scenarios are described in detail in the supplemental material. The subtracted KDE map was overlaid on the MDS map obtained from all mouse data (Fig. 4c).
Fig 4.
Highlighting expanded antibody sequences by subtracting repertoire background. (a) A flowchart of the conversion from MDS maps to KDE maps to subtract background repertoire is shown. (b) 2D KDE maps for the uninfected control group (PBS control) and the infected groups (once and twice infected) are shown on the left. A figure subtracting the 2D KDE values of the uninfected control group from the 2D KDE values of the infected group is shown on the right. Differences in KDE are indicated by color as shown on the right side of the panel. Both the x- and y-axes represent the number of different amino acids in CDRH3. (c) The subtracted 2D KDE map is overlaid with the 2D MDS map in Fig. 3b. The different lengths of CDRH3 amino acids are indicated by color, as shown in the right panel. (d) The positions of the sequences in top 10 clusters selected according to the sum of 6D KDE difference value of sequences in each cluster on the 2D KDE map are indicated by dots. Red dots indicate sequences from infected mice, and blue dots indicate sequences from control mice.
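The background subtraction outlined in Fig. 4a can be illustrated with a minimal sketch that estimates a frequency-weighted density for each group and takes the difference at a given coordinate. The Gaussian kernel, the fixed bandwidth of 1.0, the function names, and the toy coordinates are all assumptions for illustration; they do not reproduce the settings used in this study. The function works for coordinates of any dimension, so the same logic applies to 2D or 6D MDS coordinates.

```python
import math

def weighted_gaussian_kde(points, weights, bandwidth):
    """Build a density function from weighted points (any dimension)."""
    total = sum(weights)
    dim = len(points[0])
    norm = total * (2 * math.pi * bandwidth ** 2) ** (dim / 2)

    def density(x):
        acc = 0.0
        for p, w in zip(points, weights):
            sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            acc += w * math.exp(-sq_dist / (2 * bandwidth ** 2))
        return acc / norm

    return density

# Hypothetical MDS coordinates and read counts (not data from the study).
control_kde = weighted_gaussian_kde([(0, 0), (1, 1), (-1, 0.5)], [1, 1, 1], 1.0)
infected_kde = weighted_gaussian_kde([(0, 0), (1, 1), (9, 2.5)], [1, 1, 8], 1.0)

def kde_difference(x):
    """Infected density minus control density at coordinate x."""
    return infected_kde(x) - control_kde(x)

hotspot = kde_difference((9, 2.5))    # region holding an expanded clone
background = kde_difference((0, 0))   # region shared with the control group
print(hotspot, background)
```

In this toy example, the difference is positive where an expanded clone sits only in the infected sample and negative where the control sample is relatively denser, which mirrors how hot spots emerge after subtraction.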
The MDS illustrations in Fig. 3 and 4 compress multidimensional data into two dimensions for presentation on paper. For this reason, the results of the 2D analysis may be somewhat inconsistent with those of the multidimensional analysis. Theoretically, higher dimensional analysis leads to more accurate results. In the supplemental material, the results of the 2D and 6D analyses are compared in detail, confirming the advantage of the 6D analysis. We therefore analyzed the MDS and KDE in 6D space. The 6D KDE of control mice was subtracted from the 6D KDE of infected mice to obtain the KDE difference values for each sequence in 6D space. In addition, clustering analysis was employed to extract infection-specific sequences, because the results of the KDE analysis indicated the presence of areas of concentration of CDRH3 amino acid sequences in infected mice (supplemental material). All CDRH3 amino acid sequences were divided into 993 non-overlapping clusters by linking sequences that differed by one amino acid. The sum of the 6D KDE difference values of the sequences in each cluster was then calculated and ranked. The top 10 clusters are listed in Table 2; Table S7. The distribution of sequences belonging to the top 10 clusters in two dimensions is shown in Fig. 4d. Although there was some scatter in the distribution due to the compression of the 6D data into a 2D map, the distribution was generally consistent with the hot areas in the 2D KDE map.
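The clustering and ranking steps described above can be sketched as a connected-components search followed by a per-cluster sum. This example assumes that "one amino acid difference" means a single substitution between sequences of equal length (each cluster in Table 2 has a single CDRH3 length, but this interpretation is not confirmed here). The function names, the variant sequences, and the KDE difference values are hypothetical.

```python
from collections import defaultdict

def differs_by_one(a, b):
    """True if a and b have equal length and exactly one mismatch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def cluster_sequences(seqs):
    """Group sequences into non-overlapping clusters using union-find."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if differs_by_one(seqs[i], seqs[j]):
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, s in enumerate(seqs):
        groups[find(i)].append(s)
    return list(groups.values())

# Hypothetical sequences and KDE difference values (not data from the study).
kde_diff = {
    "CARGNWYFDVW": 0.40, "CARGNWYFDAW": 0.25, "CARGNWYFEAW": 0.10,
    "CARVWYAMDYW": 0.30, "CTMVTTGGYW": -0.05,
}
clusters = cluster_sequences(list(kde_diff))
ranked = sorted(clusters, key=lambda c: sum(kde_diff[s] for s in c), reverse=True)
for rank, members in enumerate(ranked, start=1):
    print(rank, round(sum(kde_diff[s] for s in members), 2), members)
```

Note that the first and third toy sequences differ at two positions but still fall into one cluster because each is one substitution away from the second sequence. The pairwise loop is quadratic and is meant only for small illustrative inputs.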
TABLE 2.
Top 10 clusters ranked by the sum of the 6D KDE difference values of the sequences in each cluster
| Rank | CDRH3 length | Total number of sequences | Consensus amino acid sequence | Total number of mice detected | Number of infected mice detected | V and J combinations |
|---|---|---|---|---|---|---|
| 1 | 19 | 2,420 | CAREEVAYYSNYLYYFDYW | 1 | 1 | IGHV3-6/IGHJ2 |
| 2 | 15 | 865 | CARRRTAQATWFAYW | 2 | 2 | IGHV1-50/IGHJ3, IGHV1-18/IGHJ3 |
| 3 | 12 | 3,030 | CARRGYYAMDYW | 7 | 4 | IGHV2-6/IGHJ4, IGHV2-9-1/IGHJ4, IGHV4-1/IGHJ4, IGHV9-4/IGHJ4, IGHV9-3/IGHJ4, IGHV5-15/IGHJ4, IGHV3-6/IGHJ4, IGHV1-18/IGHJ4, IGHV10-3/IGHJ4, IGHV8-12/IGHJ4, IGHV10-3/IGHJ2, IGHV1-19/IGHJ4, IGHV1-26/IGHJ4, IGHV1-26/IGHJ3, IGHV8-8/IGHJ4, IGHV1-69/IGHJ4, IGHV1-15/IGHJ4, IGHV1-53/IGHJ4 |
| 4 | 13 | 2,305 | CARVNWDWYFDVW | 4 | 3 | IGHV8-8/IGHJ1, IGHV1-39/IGHJ1, IGHV1-82/IGHJ1, IGHV1-64/IGHJ1, IGHV7-1/IGHJ1 |
| 5 | 15 | 634 | CTLFITTVEGYFDVW | 1 | 1 | IGHV14-1/IGHJ1 |
| 6 | 11 | 1,155 | CARGNWYFDVW | 5 | 3 | IGHV8-12/IGHJ1, IGHV3-6/IGHJ1, IGHV1-82/IGHJ1, IGHV5-16/IGHJ1, IGHV1-52/IGHJ1 |
| 7 | 13 | 849 | CASQTAQVWFAYW | 1 | 1 | IGHV1-53/IGHJ3 |
| 8 | 11 | 627 | CARVWYAMDYW | 1 | 1 | IGHV5-16/IGHJ4 |
| 9 | 10 | 761 | CTMVTTGGYW | 1 | 1 | IGHV1-15/IGHJ2 |
| 10 | 17 | 1,058 | CARSGAYYSNHYAMDYW | 2 | 2 | IGHV1-42/IGHJ4, IGHV1-42/IGHJ2, IGHV1-39/IGHJ4 |
The first-ranked cluster, containing 2,420 sequence reads, exhibited a combination of immunoglobulin heavy chain (IGH) V3-6 and IGHJ2 segments and was located within the right-side hot spot at approximate (x, y) coordinates (9, 2.5), which consisted of this single cluster. This cluster had a relatively long CDRH3 sequence of 19 amino acids, and the consensus sequence was CAREEVAYYSNYLYYFDYW. All sequences of this cluster were detected in an infected mouse (twice-infected mouse #1) and were presumably derived from a single B cell ancestor. The 2nd- and 10th-ranked clusters, consisting of 865 and 1,058 reads, had relatively long CDRH3 sequences of 15 and 17 amino acids, with CARRRTAQATWFAYW and CARSGAYYSNHYAMDYW as the consensus sequences, respectively. The sequences of the second cluster consisted of two repertoires (IGHV1-50/IGHJ3 and IGHV1-18/IGHJ3), and the sequences of the 10th cluster consisted of three repertoires (IGHV1-42/IGHJ4, IGHV1-42/IGHJ2, and IGHV1-39/IGHJ4). The third-, fourth-, and sixth-ranked clusters had relatively short CDRH3 sequences of 12, 13, and 11 amino acids, with approximate (x, y) coordinates (−2, 2.5), (0, −6), and (−0.5, −4) in Fig. 4d, respectively; the consensus sequences were CARRGYYAMDYW (third), CARVNWDWYFDVW (fourth), and CARGNWYFDVW (sixth). These clusters were found in both infected and control mice and consisted of five or more repertoires (Table 2; Table S7). Sequences from infected mice are indicated by red dots, and sequences from control mice are indicated by blue dots in Fig. 4d. These clusters were detected as multiple repertoires from multiple mice, including control mice, suggesting that these CDRH3 sequences may be relatively common sequences and frequently induced repertoires upon influenza virus infection. Because these sequences are relatively short, they may be readily generated from combinations of various V, D, and J segments.
The other clusters consisted of a single repertoire from a single infected mouse. The majority of the CDRH3 sequences in the highly ranked clusters were derived from infected mice and some were concentrated in hot spots. For example, the third and eighth clusters were located at approximate (x, y) coordinates (−2.5, 2.5). Indeed, the consensus sequences of these clusters were similar, suggesting that repertoires with relatively similar CDRH3 sequences are induced in response to influenza virus infection. This may further indicate that antibodies with these CDRH3 sequences are specific for influenza virus antigens and that selected populations of IgM-expressing B cells are commonly increased by viral infection. Collectively, these results suggest that bulk repertoire analysis can provide information on changes in the diversity of antibody sequences and their expansion to selected populations.
DISCUSSION
Several computational approaches have been developed over the past decade to process and analyze comprehensive immune repertoire data obtained from NGS (26). Some open-source software tools, including MiXCR, Vidjil, IMSEQ, and IgReC (27–30), extract the repertoire from raw sequence reads; determine CDR3 regions, V, D, and J segments, and their boundaries; and generate lists of clonotypes with the number of each clonotype. Although these tools can extract clonotypes that appear at a high frequency, this does not mean that the abundance of the detected clonotypes results from the selection and proliferation of the populations, specifically upon infection, if repertoire bias is not considered. Briefly, background data and noise must be removed from the data set to appropriately evaluate immune responses; however, such a method has yet to be established. Therefore, for functional antibody repertoires of antigen-specific B cells or plasma cells, researchers must perform single-cell repertoire analysis combined with FACS, ELISA, and micro-neutralization assays, which require significant expense, time, and effort (7–14).
In the present study, we aimed to establish a method for evaluating the diversity and expansion of antibodies against influenza viruses using bulk repertoire analysis. Focusing on the IgM-expressing B cells, which are abundantly present in the blood cell samples of mice, we performed a bulk repertoire analysis with blood cell samples from the PBS control and once- or twice-infected mice. The diversity of CDRH3 sequences was quantified in terms of (i) the correlation of the distribution of sequence length with the normal distribution and (ii) the frequency of each unique amino acid sequence. Finally, we integrated the quantified data into 6D MDS and converted them into 6D KDE values. By subtracting the 6D KDE value of the control mice from that of the infected animals, we identified antibody repertoires that increased in frequency upon influenza virus infection, compensating for background noise (Table 2). These results may have resulted from the expansion of the selected antibody sequences. However, the actual antibody diversity is produced by pairing of heavy and light chains. Therefore, the present analysis of the heavy chains alone does not reveal the true infection-specific antibody repertoires. Furthermore, it remains unclear whether these highlighted repertoires are antigen-specific antibody sequences because the specificity of the antibodies to the influenza virus has not been confirmed. In the future, this should be confirmed by analyzing single-cell repertoires from antigen-specific B cells (e.g., using the 10x Genomics Chromium system) simultaneously with bulk repertoire analysis and/or by performing affinity assays against the antigens with in vitro-expressed antibodies carrying the identified sequences.
Of the top 10 clusters, 5 clusters contained sequences from two or more infected mice, indicating that IgM repertoires with similar CDRH3 sequences were induced by influenza virus infection in multiple mice. These findings suggest that common repertoires are reactive to specific antigens in different individuals. This notion is supported by human studies showing that, among over 100 types of IGHV, antibodies encoded by IGHV1-69 or IGHV3-7 were preferentially induced in different participants by influenza virus infection and influenza vaccines (31–35). Furthermore, IGHV1-69 is a V segment often found in antibodies that neutralize a broad spectrum of viruses, such as influenza, hepatitis C, and human immunodeficiency viruses (36–38). Although no specific V segments were detected at high frequencies in this study, identifying such repertoires provides insights into the quality of antibodies induced in response to antigens.
There are technical hurdles to overcome when evaluating comprehensive immune responses based on bulk repertoire analyses. A bias exists in antibody repertoires that results from historically encountered antigens and is induced by other factors, such as exposure to dietary antigens, allergenic pollens, self-antigens, and other infections (39). Therefore, intra-individual variation in the BCR/antibody repertoire should be considered in addition to inter-individual variation, which is defined as a set of clusters common to multiple individuals (39). In this study, it was not possible to obtain blood cell samples from identical mice before and after infection to compare the bulk BCR repertoire with consideration of intra-individual variation, because whole blood needed to be collected from each mouse to obtain sufficient sample for further analyses. In the future, we plan to perform bulk BCR repertoire analyses using peripheral blood cell samples collected from the same human individuals at multiple time points. In such studies, it is necessary to minimize background noise due to intra-individual differences by subtracting the pre-infection or pre-immunization data from the post-infection or post-immunization data.
Another point of concern is that only IgM sequences were used for repertoire analysis because IgG sequences were not detected in the present study. However, both IgM and IgG sequences were obtained from the spleen samples of mice, with average read counts of 24,328 and 20,506, respectively. Because IgG genes were successfully amplified in spleen samples, the failure to detect them in blood cell samples was not due to experimental problems in the polymerase chain reaction (PCR) step. Instead, the notably lower number of B cells or plasma cells expressing IgG in the blood of laboratory mice may have resulted from the migration of memory B cells or plasma cells to secondary lymph nodes or bone marrow, respectively (23, 24). In addition, PCR-based amplification of target sequences during library preparation for NGS analysis was performed with a minimal number of cycles to reduce PCR bias. As a result of this semiquantitative amplification, the libraries were thought to reflect the actual mRNA amounts in the samples. Unlike laboratory mice, which are maintained under relatively clean and SPF conditions, the blood cells of human clinical samples are expected to contain more abundant IgG-expressing B cells due to historical exposure to diverse antigens (39). IgG is the class of antibodies produced by already primed B cells that undergo a class switch, while IgM is the class of antibodies possessed by relatively naive B cells. Therefore, the present repertoire analysis with IgM is likely to reflect primary responses (priming) by naive B cells. On the other hand, the repertoire analysis with IgG may predominantly reflect secondary responses (boosting) by memory B cells. In the future, we plan to establish an analytical method for determining the IgG repertoire using human blood cell samples.
In summary, the present study indicates that bulk repertoire analysis can provide information on changes in the diversity of antibody sequences and expansion of selected B cell populations after influenza virus infection. Because repertoire analyses with bulk samples are more cost-effective than those with single cells, our method can be easily applied in various types of studies, especially those with large sample sizes. Furthermore, our approach provides a method for quantifying the diversity of repertoires and enables the elimination of background influences to extract stimulation-associated immune responses. Although there are several problems to overcome, bulk repertoire information has the potential to be used in future studies on vaccines and infectious diseases, including the evaluation of immunogenicity.
MATERIALS AND METHODS
Cells and viruses
Madin-Darby canine kidney (MDCK) cells were grown in RP10 [Roswell Park Memorial Institute (RPMI) 1640 (Thermo Fisher Scientific, MA, USA) supplemented with 10% inactivated fetal calf serum (FCS) (GE Healthcare UK Ltd, Little Chalfont, Buckinghamshire, UK), 1 mM of sodium pyruvate (Thermo Fisher Scientific), 50 µM of 2-mercaptoethanol (Merck, Darmstadt, Germany), 100 U/mL of penicillin (Thermo Fisher Scientific), 100 µg/mL of streptomycin (Thermo Fisher Scientific), and 20 µg/mL of gentamicin (Thermo Fisher Scientific)]. These cells were used for neutralization and plaque assays. Influenza virus A/Puerto Rico/8/34 (H1N1; PR8) was provided by the National Institute of Infectious Diseases, Japan. The virus was propagated in 10-day-old embryonated chicken eggs. The collected allantoic fluids were stored at −80°C until use.
Mice
Six-week-old female BALB/c mice were purchased from Hokudo Co. Ltd. (Sapporo, Japan) and maintained in the BSL-2 laboratory at the Research Center for Zoonosis Control, Hokkaido University. Mice were intranasally inoculated twice, at a 4-week interval, with 50 PFUs of PR8 virus in 50 µL of PBS and/or with PBS alone under inhalation anesthesia with isoflurane. Three weeks after the second inoculation, blood samples were collected for repertoire analysis and determination of neutralizing antibody titers.
Micro-neutralization ELISA
Receptor-destroying enzyme (RDE II; Denka Company, Tokyo, Japan)-treated mouse sera were serially twofold diluted with PBS in 96-well microplates. The diluted sera were mixed with an equal volume of virus (100 PFUs), and the virus-serum mixtures were incubated at 37°C for 1 h. MDCK cell suspensions containing 1 × 10⁶ cells in 100 µL were added to 50 µL of virus-serum mixture and incubated in 96-well plates in the presence of 1 µg/mL tosyl phenylalanyl chloromethyl ketone-trypsin at 37°C for 24 h. Cell monolayers were washed with PBS and fixed in cold acetone (80%) for 15 min. The presence of nucleoprotein (NP) of the influenza virus was detected using ELISA with a specific monoclonal antibody (HB65/H16-L10-4RS: BioXCell, Lebanon, NH, USA).
ELISA was performed at room temperature (RT). The fixed monolayers were washed thrice with PBS containing 0.05% Tween 20 (wash buffer). After incubation for 1 h with blocking buffer containing 1% bovine serum albumin in PBS, anti-NP antibody diluted to 1/6,000 in blocking buffer was added to each well. The plates were then incubated for 1 h. The plates were washed thrice with wash buffer, and 100 µL of horseradish peroxidase-labeled goat anti-mouse IgG (Thermo Fisher Scientific) diluted to 1/6,000 in blocking buffer was added to each well. The plates were incubated for 1 h and washed thrice with wash buffer. Freshly prepared SIGMAFAST OPD substrate (Thermo Fisher Scientific; 100 µL) was added to each well, and the plates were incubated for 15 min. The reaction was stopped by the addition of 50 µL of 2 N sulfuric acid. Absorbance at 490 nm (A490) was measured using a microplate reader. The intermediate optical density (OD) value was determined from quadruplicate virus-infected and uninfected control wells, and the neutralization titer was defined as the maximum serum dilution giving an OD below the intermediate OD value.
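The titer readout described above can be illustrated with a short Python sketch. It is not the authors' code. It assumes that the intermediate OD is the midpoint between the mean A490 of the virus-infected control wells and the mean A490 of the uninfected control wells, and all function and variable names are hypothetical.

```python
from statistics import mean


def neutralization_titer(dilutions, ods, virus_ctrl, cell_ctrl):
    """Return the highest reciprocal dilution whose OD is below the cutoff.

    dilutions  : reciprocal serum dilutions (e.g. 10, 20, 40, ...)
    ods        : A490 reading for each dilution
    virus_ctrl : A490 readings of virus-infected control wells
    cell_ctrl  : A490 readings of uninfected control wells
    """
    # Assumption: "intermediate OD" = midpoint of the two control means.
    cutoff = (mean(virus_ctrl) + mean(cell_ctrl)) / 2
    neutralized = [d for d, od in zip(dilutions, ods) if od < cutoff]
    # None means no dilution neutralized the virus.
    return max(neutralized) if neutralized else None


titer = neutralization_titer(
    dilutions=[10, 20, 40, 80, 160],
    ods=[0.10, 0.15, 0.30, 0.90, 1.20],
    virus_ctrl=[1.2, 1.3, 1.1, 1.2],
    cell_ctrl=[0.1, 0.1, 0.1, 0.1],
)
print(titer)
```

With non-monotonic readings, a stricter rule (for example, the last dilution in an unbroken run of neutralized wells) may be preferable; the sketch follows the wording of the text literally.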
RNA preparation, cDNA synthesis, 5′-RACE PCR amplification, and amplicon sequencing
The methods described in this section were modified from those described previously (40, 41). Total RNA was extracted separately from PBMCs using ISOGEN (Nippon Gene Co. Ltd., Tokyo), and the extracted RNA was purified using a PureLink RNA mini kit (Thermo Fisher Scientific). The purified RNA was used for first-strand cDNA synthesis using a SMARTer RACE cDNA Amplification Kit (Takara Bio USA, Inc. Mountain View, CA, USA) with oligo-dT-containing 5′-rapid amplification of cDNA ends (RACE) CDS Primer A and SMARTer II A oligonucleotides. The cDNAs were amplified using PCR in a 20 µL reaction mixture containing 0.5 µL of cDNA, 0.5 U of Takara Ex Taq DNA Polymerase (Takara), 200 µM of each dNTP, and 250 nM of primers in 1× Ex Taq buffer. Illumina tail-tagged universal forward primers specific for the 5′-RACE adaptor sequence (Uni_F; 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNAAGCAGTGGTATCAACGCAGAGT-3′) were used with Illumina tail-tagged reverse primers specific for immunoglobulin-constant-region-1 of IgM (Cµ) (mu_IgM_R; 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNTCTCGCAGGAGACGAGGGGGAAGACATTTG-3′), or IgGs (Cγ1, Cγ2a, Cγ2b, Cγ2c, and Cγ3) (5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNGGRCCARKGGATAGACHGATGGGGSTGTYG-3′) to amplify the repertoires of each isotype through thermal cycling (94°C for 2 min, 30 cycles of 94°C for 30 s, 63°C for 30 s, 72°C for 30 s, and a final extension at 72°C for 5 min). The Illumina tail-tagged products were then amplified to add indexes by PCR in a 20 µL reaction mixture containing 0.5 µL of amplicon, 0.5 U of Takara Ex Taq DNA Polymerase (Takara), 200 µM of each dNTP, and 250 nM of index primers in 1× Ex Taq buffer through thermal cycling (94°C for 2 min, 15 cycles of 94°C for 30 s, 63°C for 30 s, 72°C for 30 s, and a final extension at 72°C for 5 min).
The index primers were Illumina-i5 primers [5′-AATGATACGGCGACCACCGAGATCTACAC(8-bp index) ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′] and Illumina-i7 primers [5′-CAAGCAGAAGACGGCATACGAGAT(8-bp index) GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′]. Each index is an 8-bp nucleotide sequence that uniquely identifies a sample. The obtained amplicons were pooled for each mouse to reflect the amount of IgM and IgG in each individual and then quantified using agarose gel electrophoresis. An equal amount of sample from each mouse was pooled into one library, and the 600–800-bp PCR products were gel-purified using a PCR purification kit (Qiagen, Venlo, the Netherlands). The PCR amplicon library was sequenced with MiSeq (42, 43) using a 300-bp paired-end sequencing protocol and the MiSeq Sequencing Reagent Kit v3 (Illumina, Hayward, CA, USA) with a 25% PhiX DNA spike-in control, according to the manufacturer’s instructions. The sequence data are available under BioProject PRJNA1046241 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1046241).
Sequence pre-processing
The quality of the raw sequences was checked using FastQC (44). Leading and trailing low-quality nucleotides with scores below 20 were removed using Trimmomatic (45). Paired-end reads were merged using the FLASH program with a maximum overlap length of 300 bp (46). Adapter sequences were removed by the Cutadapt program (47). Sequences shorter than 400 bp were removed using SeqKit (48).
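Two of the filtering rules above, end trimming at a quality threshold of 20 and removal of sequences shorter than 400 bp, can be illustrated in plain Python. This is only a sketch of the logic, not a substitute for Trimmomatic or SeqKit and not the authors' pipeline. It assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
def trim_ends(seq, qual, min_q=20, offset=33):
    """Strip leading and trailing bases whose Phred score is below min_q."""
    scores = [ord(c) - offset for c in qual]  # assumption: Phred+33 encoding
    start = 0
    while start < len(scores) and scores[start] < min_q:
        start += 1
    end = len(scores)
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def drop_short(seqs, min_len=400):
    """Keep only sequences of at least min_len bases."""
    return [s for s in seqs if len(s) >= min_len]


# '#' encodes Q2 and 'I' encodes Q40 in Phred+33.
trimmed = trim_ends("ACGTACGT", "##IIII#I")
print(trimmed)
kept = drop_short(["A" * 399, "C" * 400])
print(len(kept))
```

Only the ends are trimmed here; a low-quality base in the interior of a read is left in place, as in the example.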
VDJ gene mapping and characterization of CDRH3 sequences
For each nucleotide sequence of the variable region of the antibody heavy chain, the clonotype, consisting of the V, D, and J gene identifiers and the CDRH3 amino acid sequence, was identified with IgBLAST (49) run through MiGMAP (50) and VDJtools (51). The diversity of clonotypes in each mouse was analyzed using the mean Shannon-Wiener index (52). The length distribution of the IgM CDRH3 amino acid sequences in each mouse was calculated by counting the number of CDRH3 amino acid sequences of each length. A normal distribution was fitted to this length distribution using its mean and standard deviation. The CGD was calculated as Pearson’s correlation coefficient between the observed counts and the fitted normal distribution discretized at the observed sequence lengths (25).
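The two summary statistics in this section can be written out as a small Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the Shannon-Wiener index uses the natural logarithm, the normal distribution is fitted by the count-weighted mean and population standard deviation of the lengths, and all names are hypothetical.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H = -sum(p * ln p) over clonotype counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cgd(length_counts):
    """Correlation between observed length counts and a fitted normal curve.

    length_counts maps CDRH3 length (amino acids) to the number of sequences.
    """
    lengths = sorted(length_counts)
    counts = [length_counts[length] for length in lengths]
    n = sum(counts)
    mu = sum(l * c for l, c in zip(lengths, counts)) / n
    var = sum(c * (l - mu) ** 2 for l, c in zip(lengths, counts)) / n
    sd = math.sqrt(var)
    # Normal density evaluated at each observed length, scaled to counts.
    fitted = [
        n * math.exp(-((l - mu) ** 2) / (2 * var)) / (sd * math.sqrt(2 * math.pi))
        for l in lengths
    ]
    return pearson(counts, fitted)


bell = {8: 1, 9: 4, 10: 10, 11: 16, 12: 19, 13: 16, 14: 10, 15: 4, 16: 1}
print(shannon_wiener([1, 1, 1, 1]), cgd(bell))
```

A repertoire dominated by a few expanded clones gives a lower Shannon-Wiener index, and a length distribution skewed by such clones gives a CGD further below 1.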
Visualization of CDRH3 amino acid sequences
The CDRH3 amino acid sequences from 13 mice were merged into a single file, and duplicate amino acid sequences were collapsed so that each sequence appeared only once. From the resulting 7,046 unique amino acid sequences of CDRH3, a distance matrix of edit distances for all pairs was obtained using the stringdist package of R software (53, 54). The coordinates of two axes in the 2D map and six axes in the 6D map for each unique CDRH3 amino acid sequence were calculated using MDS analysis (55). Briefly, the sum of the differences between the edit distances in the distance matrix and the Euclidean distances of the corresponding points in the map was minimized using the SMACOF algorithm (55) implemented in a custom-made program. A 2D MDS map and a 6D MDS map of infected mice and control mice were created by selecting the MDS coordinates of the CDRH3 amino acid sequences observed in infected mice and control mice, respectively.
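The embedding step can be sketched in pure Python as follows. This is not the authors' custom program. It assumes the Levenshtein definition of edit distance, unit weights, and the raw stress (sum of squared differences between edit distances and map distances) as the quantity minimized by SMACOF; the example sequences and all names are hypothetical.

```python
import math
import random


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = levenshtein(seqs[i], seqs[j])
    return d


def euclid(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def stress(delta, X):
    """Raw stress: squared mismatch between target and map distances."""
    n = len(X)
    return sum(
        (delta[i][j] - euclid(X[i], X[j])) ** 2
        for i in range(n) for j in range(i + 1, n)
    )


def smacof(delta, dim=2, n_iter=200, seed=0):
    """Iterate the Guttman transform from a random start (unit weights)."""
    rng = random.Random(seed)
    n = len(delta)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(n_iter):
        new = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                d = euclid(X[i], X[j])
                if i != j and d > 0:
                    ratio = delta[i][j] / d
                    for k in range(dim):
                        new[i][k] += ratio * (X[i][k] - X[j][k])
        X = [[v / n for v in row] for row in new]
    return X


seqs = sorted(set(["ARDYYGSSYFDY", "ARDYYGSSYWDY", "ARGGLRRAMDY",
                   "ARGGLRRAMDH", "TRWDY", "ARDYYGSSYFDY"]))  # duplicates collapse
delta = distance_matrix(seqs)
coords = smacof(delta, dim=2)
print(len(seqs), stress(delta, coords))
```

Passing `dim=6` gives six-axis coordinates in the same way. For several thousand sequences, the quadratic cost of each iteration makes a vectorized or compiled implementation necessary in practice.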
The densities of CDRH3 amino acid sequences of infected mice at each point in the 2D and 6D MDS maps were estimated from the relative frequencies and MDS coordinates of the CDRH3 amino acid sequences found in infected mice by KDE using the ks package in R software (56). The densities of CDRH3 amino acid sequences of control mice were estimated in the same manner. The difference in kernel density between infected and uninfected mice at each point on the MDS maps was calculated by subtracting the kernel density value on the MDS map of uninfected mice from that of infected mice. The resulting KDE maps highlight the areas of the 2D MDS map and the regions of the 6D MDS map where the CDRH3 amino acid sequences of infected mice were observed more frequently than those of uninfected mice.
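The background subtraction can be illustrated with a frequency-weighted Gaussian kernel density estimate. The sketch below is a simplified stand-in for the ks package, not a reproduction of it. It assumes a single fixed, isotropic bandwidth (the value 1.0 is arbitrary), and the coordinates, frequencies, and function names are hypothetical.

```python
import math


def weighted_kde(points, weights, query, bandwidth=1.0):
    """Gaussian KDE at `query`, with each point weighted by its frequency."""
    dim = len(query)
    norm = (2 * math.pi * bandwidth ** 2) ** (dim / 2)
    total = 0.0
    for p, w in zip(points, weights):
        sq = sum((a - b) ** 2 for a, b in zip(p, query))
        total += w * math.exp(-sq / (2 * bandwidth ** 2))
    return total / (sum(weights) * norm)


def density_difference(infected, control, query, bandwidth=1.0):
    """Infected-minus-control density; positive where infection enriches."""
    inf_pts, inf_w = infected
    ctl_pts, ctl_w = control
    return (weighted_kde(inf_pts, inf_w, query, bandwidth)
            - weighted_kde(ctl_pts, ctl_w, query, bandwidth))


# Hypothetical 2D MDS coordinates with relative frequencies.
infected = ([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)], [0.5, 0.4, 0.1])
control = ([(5.0, 5.0), (5.1, 4.9), (0.0, 0.0)], [0.45, 0.45, 0.1])

near_expanded = density_difference(infected, control, (0.0, 0.0))
near_background = density_difference(infected, control, (5.0, 5.0))
print(near_expanded, near_background)
```

The same functions work unchanged for 6D coordinates, because the kernel and its normalizing constant depend only on the length of the coordinate tuples.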
Clustering of CDRH3 amino acid sequences and KDE-based ranking
All 7,046 CDRH3 amino acid sequences were grouped into non-overlapping clusters using the DBSCAN algorithm with an epsilon neighborhood parameter of one (57). For each of the resulting 993 clusters, the consensus amino acid sequence of the CDRH3 sequences in the cluster was calculated using the msa package in the R software (58). The resulting consensus amino acid sequences were sorted according to cluster size. For each cluster, the differences in kernel density values between infected and uninfected mice in the 6D KDE maps were summed over all sequences in the cluster. This summed difference between the two 6D KDE maps was used to rank the 993 clusters. The 10 clusters with the largest differences between infected and control mice were selected. See supplemental material for detailed information on the statistical analysis of the CDRH3 sequence data.
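The clustering and ranking steps can be sketched as follows. This is an illustration, not the dbscan R package or the authors' code. The minimum-points parameter is not given in the text, so the value of 1 used here is an assumption, and the toy distance matrix, density differences, and function names are hypothetical.

```python
def dbscan(dist, eps=1, min_pts=1):
    """DBSCAN on a precomputed distance matrix; returns one label per point.

    Labels are 0, 1, 2, ... for clusters and -1 for noise.
    """
    n = len(dist)
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]  # includes i
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(n) if dist[j][k] <= eps]
            if len(nb) >= min_pts:  # core point: keep expanding
                queue.extend(nb)
    return labels


def rank_clusters(labels, diffs, top=10):
    """Sum per-sequence density differences per cluster, largest first."""
    totals = {}
    for lab, d in zip(labels, diffs):
        if lab != -1:
            totals[lab] = totals.get(lab, 0.0) + d
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy edit-distance matrix: sequences 0-1-2 form a chain, 3-4 form a pair.
dist = [
    [0, 1, 2, 5, 5],
    [1, 0, 1, 5, 5],
    [2, 1, 0, 5, 5],
    [5, 5, 5, 0, 1],
    [5, 5, 5, 1, 0],
]
diffs = [0.10, 0.20, -0.05, -0.30, 0.10]  # infected minus control, per sequence

labels = dbscan(dist, eps=1, min_pts=1)
ranking = rank_clusters(labels, diffs)
print(labels, ranking)
```

With `min_pts=1` every sequence is a core point, so the clusters are the connected components of the graph linking sequences within one edit of each other. Sequences 0 and 2 share a cluster even though they differ by two edits.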
ACKNOWLEDGMENTS
The authors thank the National Institute of Infectious Diseases for providing the influenza virus A/Puerto Rico/8/34 (H1N1).
The project was supported by the Japan Initiative for Global Research Network on Infectious Diseases (J-GRID; JP19fm0108008), Japan Program for Infectious Diseases Research and Infrastructure (JIDRI; JP23wm0125008), Research Program on Emerging and Re-emerging Infectious Diseases (22fk0108142), Japan Initiative for World-leading Vaccine Research and Development Centers (JP223fa627005), Program on R&D of new generation vaccine including new modality application (233fa827012 and 233fa827021), and Advanced Research & Development Programs for Medical Innovation (AMED-CREST; 23gm1610011) from Japan Agency for Medical Research and Development (AMED), Global Institution for Collaborative Research and Education (GI-CoRE) program of Hokkaido University, Program for Leading Graduate Schools (F01) from the Japan Society for the Promotion of Science (JSPS), and Doctoral Program for World-leading Innovative & Smart Education (WISE) Program (1801) from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT). M.O. (17K15367 and 21H02376), T.S. (22K15483), K.I. (21H03490), T.K. (23K15362), and M.S. (18K07135) were supported by JSPS KAKENHI. C.H. was supported by the WISE Program (1801) and MEXT.
Contributor Information
Kimihito Ito, Email: itok@czc.hokudai.ac.jp.
Hiroshi Kida, Email: kida@vetmed.hokudai.ac.jp.
Colin R. Parrish, Cornell University Baker Institute for Animal Health, Ithaca, New York, USA
ETHICS APPROVAL
All experiments with mice were performed with the approval (Approval No. 21-0020) of the Animal Care and Use Committee of Hokkaido University, following the Fundamental Guidelines for Proper Conduct of Animal Experiments and Related Activities in Academic Research Institutions under the jurisdiction of the Ministry of Education, Culture, Sports, Science, and Technology in Japan.
DATA AVAILABILITY
The raw sequence data were submitted to the Sequence Read Archive of the National Library of Medicine under BioProject PRJNA1046241.
SUPPLEMENTAL MATERIAL
The following material is available online at https://doi.org/10.1128/jvi.01995-23.
Original data for supplemental text.
Includes 6 supplemental figures and 11 supplemental tables.
ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.
REFERENCES
- 1. Lu L, Suscovich T, Fortune S, Alter G. 2018. Beyond binding: antibody effector functions in infectious diseases. Nat Rev Immunol 18:46–61. doi: 10.1038/nri.2017.106
- 2. Xu J, Davis M. 2000. Diversity in the CDR3 region of V-H is sufficient for most antibody specificities. Immunity 13:37–45. doi: 10.1016/s1074-7613(00)00006-6
- 3. Tonegawa S. 1983. Somatic generation of antibody diversity. Nature 302:575–581. doi: 10.1038/302575a0
- 4. Elhanati Y, Sethna Z, Marcou Q, Callan CG, Mora T, Walczak AM. 2015. Inferring processes underlying B-cell repertoire diversity. Philos Trans R Soc Lond B Biol Sci 370:20140243. doi: 10.1098/rstb.2014.0243
- 5. Glanville J, Zhai W, Berka J, Telman D, Huerta G, Mehta GR, Ni I, Mei L, Sundar PD, Day GMR, Cox D, Rajpal A, Pons J. 2009. Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire. Proc Natl Acad Sci U S A 106:20216–20221. doi: 10.1073/pnas.0909775106
- 6. Nadel B, Feeney AJ. 1997. Nucleotide deletion and P addition in V(D)J recombination: a determinant role of the coding-end sequence. Mol Cell Biol 17:3768–3778. doi: 10.1128/MCB.17.7.3768
- 7. Kurosawa N, Yoshioka M, Fujimoto R, Yamagishi F, Isobe M. 2012. Rapid production of antigen-specific monoclonal antibodies from a variety of animals. BMC Biol 10:80. doi: 10.1186/1741-7007-10-80
- 8. von Boehmer L, Liu C, Ackerman S, Gitlin AD, Wang Q, Gazumyan A, Nussenzweig MC. 2016. Sequencing and cloning of antigen-specific antibodies from mouse memory B cells. Nat Protoc 11:1908–1923. doi: 10.1038/nprot.2016.102
- 9. Walker LM, Phogat SK, Chan-Hui P-Y, Wagner D, Phung P, Goss JL, Wrin T, Simek MD, Fling S, Mitcham JL, Lehrman JK, Priddy FH, Olsen OA, Frey SM, Hammond PW, Kaminsky S, Zamb T, Moyle M, Koff WC, Poignard P, Burton DR, Protocol G Principal Investigators. 2009. Broad and potent neutralizing antibodies from an African donor reveal a new HIV-1 vaccine target. Sci 326:285–289. doi: 10.1126/science.1178746
- 10. Wardemann H, Yurasov S, Schaefer A, Young JW, Meffre E, Nussenzweig MC. 2003. Predominant autoantibody production by early human B cell precursors. Sci 301:1374–1377. doi: 10.1126/science.1086907
- 11. Li G-M, Chiu C, Wrammert J, McCausland M, Andrews SF, Zheng N-Y, Lee J-H, Huang M, Qu X, Edupuganti S, Mulligan M, Das SR, Yewdell JW, Mehta AK, Wilson PC, Ahmed R. 2012. Pandemic H1N1 influenza vaccine induces a recall response in humans that favors broadly cross-reactive memory B cells. Proc Natl Acad Sci USA 109:9047–9052. doi: 10.1073/pnas.1118979109
- 12. Wrammert J, Smith K, Miller J, Langley WA, Kokko K, Larsen C, Zheng N-Y, Mays I, Garman L, Helms C, James J, Air GM, Capra JD, Ahmed R, Wilson PC. 2008. Rapid cloning of high-affinity human monoclonal antibodies against influenza virus. Nature 453:667–671. doi: 10.1038/nature06890
- 13. Wrammert J, Koutsonanos D, Li G-M, Edupuganti S, Sui J, Morrissey M, McCausland M, Skountzou I, Hornig M, Lipkin WI, et al. 2011. Broadly cross-reactive antibodies dominate the human B cell response against 2009 pandemic H1N1 influenza virus infection. J Exp Med 208:181–193. doi: 10.1084/jem.20101352
- 14. Yu X, Tsibane T, McGraw PA, House FS, Keefer CJ, Hicar MD, Tumpey TM, Pappas C, Perrone LA, Martinez O, Stevens J, Wilson IA, Aguilar PV, Altschuler EL, Basler CF, Crowe JE. 2008. Neutralizing antibodies derived from the B cells of 1918 influenza pandemic survivors. Nature 455:532–536. doi: 10.1038/nature07231
- 15. DeKosky BJ, Ippolito GC, Deschner RP, Lavinder JJ, Wine Y, Rawlings BM, Varadarajan N, Giesecke C, Dörner T, Andrews SF, Wilson PC, Hunicke-Smith SP, Willson CG, Ellington AD, Georgiou G. 2013. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat Biotechnol 31:166–169. doi: 10.1038/nbt.2492
- 16. Jiang N, He J, Weinstein JA, Penland L, Sasaki S, He X-S, Dekker CL, Zheng N-Y, Huang M, Sullivan M, Wilson PC, Greenberg HB, Davis MM, Fisher DS, Quake SR. 2013. Lineage structure of the human antibody repertoire in response to influenza vaccination. Sci Transl Med 5:171ra19. doi: 10.1126/scitranslmed.3004794
- 17. Krause JC, Tsibane T, Tumpey TM, Huffman CJ, Briney BS, Smith SA, Basler CF, Crowe JE. 2011. Epitope-specific human influenza antibody repertoires diversify by B cell intraclonal sequence divergence and interclonal convergence. J Immunol 187:3704–3711. doi: 10.4049/jimmunol.1101823
- 18. Liao H-X, Chen X, Munshaw S, Zhang R, Marshall DJ, Vandergrift N, Whitesides JF, Lu X, Yu J-S, Hwang K-K, et al. 2011. Initial antibodies binding to HIV-1 gp41 in acutely infected subjects are polyreactive and highly mutated. J Exp Med 208:2237–2249. doi: 10.1084/jem.20110363
- 19. Wu X, Zhou T, Zhu J, Zhang B, Georgiev I, Wang C, Chen X, Longo NS, Louder M, McKee K, et al. 2011. Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Sci 333:1593–1602. doi: 10.1126/science.1207532
- 20. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B, Jones CD, Simen BB, Hanczaruk B, Nguyen KD, Nadeau KC, Egholm M, Miklos DB, Zehnder JL, Fire AZ. 2009. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel V-D-J pyrosequencing. Sci Transl Med 1:12ra23. doi: 10.1126/scitranslmed.3000540
- 21. Eyer K, Doineau RCL, Castrillon CE, Briseño-Roa L, Menrath V, Mottet G, England P, Godina A, Brient-Litzler E, Nizak C, Jensen A, Griffiths AD, Bibette J, Bruhns P, Baudry J. 2017. Single-cell deep phenotyping of IgG-secreting cells for high-resolution immune monitoring. Nat Biotechnol 35:977–982. doi: 10.1038/nbt.3964
- 22. Shembekar N, Hu H, Eustace D, Merten CA. 2018. Single-cell droplet microfluidic screening for antibodies specifically binding to target cells. Cell Rep 22:2206–2215. doi: 10.1016/j.celrep.2018.01.071
- 23. Aaron TS, Fooksman DR. 2022. Dynamic organization of the bone marrow plasma cell niche. FEBS J 289:4228–4239. doi: 10.1111/febs.16385
- 24. Mesin L, Ersching J, Victora GD. 2016. Germinal center B cell dynamics. Immunity 45:471–482. doi: 10.1016/j.immuni.2016.09.001
- 25. Ademokun A, Wu YC, Martin V, Mitra R, Sack U, Baxendale H, Kipling D, Dunn-Walters DK. 2011. Vaccination-induced changes in human B-cell repertoire and pneumococcal IgM and IgA antibody at different ages. Aging Cell 10:922–930. doi: 10.1111/j.1474-9726.2011.00732.x
- 26. Miho E, Yermanos A, Weber CR, Berger CT, Reddy ST, Greiff V. 2018. Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires. Front Immunol 9:224. doi: 10.3389/fimmu.2018.00224
- 27. Duez M, Giraud M, Herbert R, Rocher T, Salson M, Thonier F. 2016. Vidjil: a web platform for analysis of high-throughput repertoire sequencing. PLoS One 11:e0166126. doi: 10.1371/journal.pone.0166126
- 28. Kuchenbecker L, Nienen M, Hecht J, Neumann AU, Babel N, Reinert K, Robinson PN. 2015. IMSEQ-a fast and error aware approach to immunogenetic sequence analysis. Bioinform 31:2963–2971. doi: 10.1093/bioinformatics/btv309
- 29. Shlemov A, Bankevich S, Bzikadze A, Turchaninova MA, Safonova Y, Pevzner PA. 2017. Reconstructing antibody repertoires from error-prone immunosequencing reads. J Immunol 199:3369–3380. doi: 10.4049/jimmunol.1700485
- 30. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, Chudakov DM. 2015. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 12:380–381. doi: 10.1038/nmeth.3364
- 31. Avnir Y, Watson CT, Glanville J, Peterson EC, Tallarico AS, Bennett AS, Qin K, Fu Y, Huang C-Y, Beigel JH, Breden F, Zhu Q, Marasco WA. 2016. IGHV1-69 polymorphism modulates anti-influenza antibody repertoires, correlates with IGHV utilization shifts and varies by ethnicity. Sci Rep 6:20842. doi: 10.1038/srep20842
- 32. Cortina-Ceballos B, Godoy-Lozano EE, Téllez-Sosa J, Ovilla-Muñoz M, Sámano-Sánchez H, Aguilar-Salgado A, Gómez-Barreto RE, Valdovinos-Torres H, López-Martínez I, Aparicio-Antonio R, Rodríguez MH, Martínez-Barnetche J. 2015. Longitudinal analysis of the peripheral B cell repertoire reveals unique effects of immunization with a new influenza virus strain. Genome Med 7:124. doi: 10.1186/s13073-015-0239-y
- 33. Forgacs D, Abreu RB, Sautto GA, Kirchenbaum GA, Drabek E, Williamson KS, Kim D, Emerling DE, Ross TM. 2021. Convergent antibody evolution and clonotype expansion following influenza virus vaccination. PLoS One 16:e0247253. doi: 10.1371/journal.pone.0247253
- 34. Jackson KJL, Liu Y, Roskin KM, Glanville J, Hoh RA, Seo K, Marshall EL, Gurley TC, Moody MA, Haynes BF, Walter EB, Liao H-X, Albrecht RA, García-Sastre A, Chaparro-Riggers J, Rajpal A, Pons J, Simen BB, Hanczaruk B, Dekker CL, Laserson J, Koller D, Davis MM, Fire AZ, Boyd SD. 2014. Human responses to influenza vaccination show seroconversion signatures and convergent antibody rearrangements. Cell Host Microbe 16:105–114. doi: 10.1016/j.chom.2014.05.013
- 35. Strauli NB, Hernandez RD. 2016. Statistical inference of a convergent antibody repertoire response to influenza vaccine. Genome Med 8:60. doi: 10.1186/s13073-016-0314-z
- 36. Avnir Y, Tallarico AS, Zhu Q, Bennett AS, Connelly G, Sheehan J, Sui J, Fahmy A, Huang C, Cadwell G, Bankston LA, McGuire AT, Stamatatos L, Wagner G, Liddington RC, Marasco WA. 2014. Molecular signatures of hemagglutinin stem-directed heterosubtypic human neutralizing antibodies against influenza A viruses. PLoS Pathog 10:e1004103. doi: 10.1371/journal.ppat.1004103
- 37. Chen F, Tzarum N, Wilson IA, Law M. 2019. V(H)1-69 antiviral broadly neutralizing antibodies: genetics, structures, and relevance to rational vaccine design. Curr Opin Virol 34:149–159. doi: 10.1016/j.coviro.2019.02.004
- 38. Lang S, Xie J, Zhu X, Wu NC, Lerner RA, Wilson IA. 2017. Antibody 27F3 broadly targets influenza A group 1 and 2 hemagglutinins through a further variation in V(H)1-69 antibody orientation on the HA. Cell Rep 20:2935–2943. doi: 10.1016/j.celrep.2017.08.084
- 39. Galson JD, Trück J, Fowler A, Münz M, Cerundolo V, Pollard AJ, Lunter G, Kelly DF. 2015. In-depth assessment of within-individual and inter-individual variation in the B cell receptor repertoire. Front Immunol 6:531. doi: 10.3389/fimmu.2015.00531
- 40. Kono N, Sun L, Toh H, Shimizu T, Xue H, Numata O, Ato M, Ohnishi K, Itamura S. 2017. Deciphering antigen-responding antibody repertoires by using next-generation sequencing and confirming them through antibody-gene synthesis. Biochem Biophys Res Commun 487:300–306. doi: 10.1016/j.bbrc.2017.04.054
- 41. Sun L, Kono N, Toh H, Xue H, Sano K, Suzuki T, Ainai A, Orba Y, Yamagishi J, Hasegawa H, Takahashi Y, Itamura S, Ohnishi K. 2019. Identification of mouse and human antibody repertoires by next-generation sequencing. J Vis Exp, no. 145. doi: 10.3791/58804
- 42. Chaudhry U, Ali Q, Rashid I, Shabbir MZ, Ijaz M, Abbas M, Evans M, Ashraf K, Morrison I, Morrison L, Sargison ND. 2019. Development of a deep amplicon sequencing method to determine the species composition of piroplasm haemoprotozoa. Ticks Tick Borne Dis 10:101276. doi: 10.1016/j.ttbdis.2019.101276
- 43. Glidden CK, Koehler AV, Hall RS, Saeed MA, Coppo M, Beechler BR, Charleston B, Gasser RB, Jolles AE, Jabbar A. 2020. Elucidating cryptic dynamics of theileria communities in African buffalo using a high-throughput sequencing informatics approach. Ecol Evol 10:70–80. doi: 10.1002/ece3.5758
- 44. Andrews S. 2010. FastQC A quality control tool for high throughput sequence data. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc. Retrieved 12 Jan 2022.
- 45. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinform 30:2114–2120. doi: 10.1093/bioinformatics/btu170
- 46. Magoč T, Salzberg SL. 2011. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinform 27:2957–2963. doi: 10.1093/bioinformatics/btr507
- 47. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j 17:10. doi: 10.14806/ej.17.1.200
- 48. Shen W, Le S, Li Y, Hu F. 2016. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11:e0163962. doi: 10.1371/journal.pone.0163962
- 49. Ye J, Ma N, Madden TL, Ostell JM. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res 41:W34–40. doi: 10.1093/nar/gkt382
- 50. Shugay M. 2014. Migmap: mapper for full-length T- and B-cell repertoire sequencing
- 51. Shugay M, Bagaev DV, Turchaninova MA, Bolotin DA, Britanova OV, Putintseva EV, Pogorelyy MV, Nazarov VI, Zvyagin IV, Kirgizova VI, Kirgizov KI, Skorobogatova EV, Chudakov DM. 2015. VDJtools: unifying post-analysis of T cell receptor repertoires. PLoS Comput Biol 11:e1004503. doi: 10.1371/journal.pcbi.1004503
- 52. Hill MO. 1973. Diversity and evenness: a unifying notation and its consequences. Ecology 54:427–432. doi: 10.2307/1934352
- 53. van der Loo MPJ. 2014. The stringdist package for approximate string matching. The R Journal 6:111. doi: 10.32614/RJ-2014-011
- 54. R Core Team. 2021. R: a language and environment for statistical computing. Available from: https://www.R-project.org
- 55. Borg I, Groenen PJ. 2005. Modern multidimensional scaling: theory and applications. Springer Science & Business Media.
- 56. Chacón JE, Duong T. 2018. Multivariate kernel smoothing and its applications. 1st ed. Chapman and Hall/CRC, New York.
- 57. Hahsler M, Piekenbrock M, Doran D. 2019. dbscan: fast density-based clustering with R. J Stat Softw 91. doi: 10.18637/jss.v091.i01
- 58. Bodenhofer U, Bonatesta E, Horejš-Kainrath C, Hochreiter S. 2015. msa: an R package for multiple sequence alignment. Bioinform 31:3997–3999. doi: 10.1093/bioinformatics/btv494