Author manuscript; available in PMC: 2014 Aug 18.
Published in final edited form as: Cell Host Microbe. 2013 Jun 12;13(6):691–700. doi: 10.1016/j.chom.2013.05.008

Convergent antibody signatures in human dengue

Poornima Parameswaran 1,2, Yi Liu 3,4, Krishna M Roskin 2, Katherine KL Jackson 2, Vaishali P Dixit 2, Ji-Yeun Lee 2, Karen Artiles 2, Simona Zompi 1, Maria José Vargas 5, Birgitte B Simen 6, Bozena Hanczaruk 6, Kim R McGowan 6, Muhammad A Tariq 7, Nader Pourmand 7, Daphne Koller 3, Angel Balmaseda 5, Scott D Boyd 2,#,*, Eva Harris 1,#,*, Andrew Z Fire 2,8,#,*
PMCID: PMC4136508  NIHMSID: NIHMS601431  PMID: 23768493

SUMMARY

Dengue is the most prevalent mosquito-transmitted viral disease in humans, and the lack of early prognostics, vaccines and therapeutics contributes to immense disease burden. To identify patterns that could be used for sequence-based monitoring of the antibody response to dengue, we examined antibody heavy-chain gene rearrangements in longitudinal peripheral blood samples from 60 dengue patients. Comparing signatures between acute dengue, post-recovery and healthy samples, we find increased expansion of B cell clones in acute dengue patients, with higher overall clonality in secondary infection. Additionally, we observe consistent antibody sequence features in acute dengue in the major antigen-binding determinant Complementarity Determining Region-3 (CDR3), with specific CDR3 sequences highly enriched in acute samples compared to post-recovery, healthy or non-dengue samples. Dengue thus provides a striking example of a human viral infection where convergent immune signatures can be identified in multiple individuals. Such signatures could facilitate surveillance of immunological memory in communities.

INTRODUCTION

Nearly three billion people worldwide are at risk for infection with dengue virus (DENV) (WHO, 2012), a mosquito-borne Flavivirus. There are four phylogenetically distinct serotypes of DENV (DENV-1 to DENV-4), all of which cause a spectrum of clinical manifestations ranging from asymptomatic infection to the debilitating acute febrile illness, Dengue Fever (DF), to the life-threatening Dengue Hemorrhagic Fever (DHF) and Dengue Shock Syndrome (DSS) (WHO, 1997). Key determinants of dengue pathogenesis include pre-existing host immunity from prior infection(s) with a distinct DENV serotype and differential virulence of the infecting DENV strain (Halstead and Yamarat, 1965; Messer et al., 2003; Rico-Hesse et al., 1997). Enhanced pathogenicity in sequential DENV infections may be due to sub-optimal immune responses contributed by pre-existing memory B and T cells that were primed by viral antigens from the first infection, rather than by naïve B and T cells primed during the current infection (Halstead, 1988, 2009; Peiris and Porterfield, 1979; Rothman and Ennis, 1999). In addition, cytokines released by immune effector cells can play dual roles in attenuating and/or exacerbating disease pathogenesis in dengue (Avirutnan et al., 2007; Rothman and Ennis, 1999). Such complex interactions between protective and enhancing components of the immune response in dengue have made it difficult to identify the precise mechanisms that trigger progression to severe disease.

Severe disease occurs in only a few percent of symptomatic DENV infections. Yet dengue imposes a major burden on health care systems. This is due to the absence of prognostic biomarkers for severe disease: because high-risk infections are difficult to identify early, many patients presenting with uncomplicated DENV infections are hospitalized. This, together with the geographic expansion of dengue transmission and the lack of licensed vaccines and therapeutics, has made dengue a major global health problem.

In this study, we employ high-throughput sequencing methodologies to investigate immunoglobulin heavy chain gene rearrangement signatures from 60 individuals enrolled in prospective dengue studies in Nicaragua. Recent advances in high-throughput sequencing of antibody repertoires have facilitated the exploration of heavy chain signatures in zebrafish (Weinstein et al., 2009), mice (Reddy et al., 2010) and humans (Arnaout et al., 2011; Boyd et al., 2009; Liao et al., 2011; Prabakaran et al., 2012; Wu et al., 2011). In particular, investigations of antibody signatures in healthy individuals and individuals with hematological pathologies have emphasized the utility of heavy chain sequencing in tracking disease in humans (Boyd et al., 2009; Logan et al., 2011). Here, we use sequence-based longitudinal monitoring of the human B cell response in peripheral blood mononuclear cells (PBMCs) that are not pre-sorted for B cells or antigen specificity to identify convergent immunoglobulin sequence correlates that are specific to acute dengue, diminished after resolution of infection and absent in non-dengue samples.

RESULTS & DISCUSSION

Description of samples and methods

PBMCs were sampled from 44 individuals exposed to DENV during acute symptomatic dengue at 2–5 days post-symptom onset (dpo), convalescence (7–47 dpo) and post-convalescence (~180 dpo) (Tables 1 and S1). The availability of longitudinal samples facilitated assessment of infection-associated signatures (acute phase), persistence of signatures post-clearance of infection (convalescent phase) and baseline profiles (post-convalescent phase) within the same individual. We also evaluated PBMCs sampled from eight individuals with non-dengue febrile illness and eight healthy individuals with no prior history of dengue (Tables 1 and S1). All individuals were enrolled in one of two Nicaraguan studies (see Experimental Procedures) (Hammond et al., 2005; Kuan et al., 2009) and were chosen to ensure an adequate, unbiased representation of primary and secondary apparent infections with DENV-2 or DENV-3. Supporting clinical information such as dengue immune status, disease severity and infecting DENV serotype is available for all samples (Tables 1 and S1).

Table 1. Samples used in the study.

See also Table S1.

Disease severity | Primary infection, DENV-2 | Primary infection, DENV-3 | Secondary infection, DENV-2 | Secondary infection, DENV-3 | Cases with 3 available time-points^ | Total samples (all time-points)
--- | --- | --- | --- | --- | --- | ---
Dengue Fever | 4 | 7 | 6 | 6 | 20 | 55
Dengue Hemorrhagic Fever | 1 | 4 | 0 | 5 | 9 | 24
Dengue Shock Syndrome | 1 | 0 | 10 | 0 | 9 | 21
Non-dengue febrile illness | N/A | N/A | N/A | N/A | N/A | 8
Healthy | N/A | N/A | N/A | N/A | N/A | 8

^ Cases with all three time-points available: 2–4 days, 14 days and 6 months post-symptom onset. N/A, not available.

Rearranged antibody heavy chain variable region (VH) segments were amplified from 100 ng of PBMC DNA using barcoded primers in six independent PCR reactions, as previously described (Boyd et al., 2009). Because genomic DNA is the template for each PCR reaction, each B cell contributes a single copy of its rearranged immunoglobulin locus to the template pool, and the multiple PCR libraries therefore sample independent subsets of the B cell population. Sequences observed in more than one PCR library derive from B cell clones large enough that their members populate the templates used to generate several libraries (Boyd et al., 2009). Replicate sequencing thus allows us to estimate the size of B cell clones, since a clone's relative abundance is reflected in the number of replicate libraries in which it is detected (Boyd et al., 2009).
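The replicate logic can be sketched as follows, assuming each read has already been assigned a clone identifier (how "identical rearrangements" are defined, for example same V and J genes and identical junction, is an assumption here, and the function name is hypothetical). Counting libraries rather than reads helps guard against PCR amplification inflating the apparent size of a clone within a single library.

```python
from collections import defaultdict

def replicates_per_clone(reads):
    """Count the distinct PCR libraries in which each clone is observed.

    reads: iterable of (library_id, clone_id) pairs from one sample.
    Returns {clone_id: number of libraries containing that clone}.
    """
    libraries = defaultdict(set)
    for library_id, clone_id in reads:
        libraries[clone_id].add(library_id)
    return {clone: len(libs) for clone, libs in libraries.items()}

# Hypothetical reads from the six libraries (IDs 1-6) of one sample;
# the repeated read of clone_A in library 2 is counted only once.
reads = [(1, "clone_A"), (2, "clone_A"), (2, "clone_A"),
         (5, "clone_A"), (3, "clone_B")]
counts = replicates_per_clone(reads)
# Clones seen in more than one library are inferred to be expanded
expanded = {clone for clone, n in counts.items() if n > 1}
```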

All VH libraries were sequenced twice by independent runs on a GS FLX system (454 Life Sciences/Roche) (Margulies et al., 2005). Sequences were segregated based on DNA barcodes and VH rearrangements were aligned to germline gene repertoires using the iHMMune-align algorithm (Gaeta et al., 2007; Jackson et al., 2010). Resulting alignments were parsed to obtain V, D, and J matches. Only alignments that were “in-frame” and could produce functional protein were further considered. Additionally, data filtering steps were implemented to remove potential PCR contaminants and improperly amplified PCR products that were inappropriately found in multiple samples (see Experimental Procedures). We also chose to exclude sequences lacking both VD and DJ junctional bases, since these are more likely to occur independently in different individuals. Sequences with unassigned D regions were retained in our analysis. Post-filtering, the read count per sample varied between 450 and 9000, with a median count of 2000 (Table S1).
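The read-level criteria in this paragraph can be expressed as a single predicate. This is a minimal sketch assuming a hypothetical parsed-alignment record; the field names are illustrative and are not the output format of iHMMune-align, and the contaminant and indel filters described elsewhere are not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rearrangement:
    """Hypothetical parsed alignment record (field names are illustrative)."""
    in_frame: bool          # V and J share a reading frame
    has_stop_codon: bool    # stop codon in the translated VH
    vd_junction: str        # junctional bases between V and D
    dj_junction: str        # junctional bases between D and J
    d_gene: Optional[str]   # None when no D gene could be assigned;
                            # assumed: the whole V-J junction is then in vd_junction

def keep(r: Rearrangement) -> bool:
    # Only in-frame rearrangements that could encode a functional protein.
    if not r.in_frame or r.has_stop_codon:
        return False
    # Drop reads lacking junctional bases at both the VD and DJ junctions,
    # since such rearrangements more often arise independently in different people.
    if not r.vd_junction and not r.dj_junction:
        return False
    # Reads with an unassigned D gene (d_gene is None) are retained.
    return True
```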

VH signatures in acute dengue

Clonal expansions of immune cells have been observed in many pathological states such as viral and bacterial infections and hematological malignancies (Thorselius et al., 2006; Wang and Palese, 2009; Zhou et al., 2002). VH sequencing has facilitated the identification of prominent clonal B cell populations in lymphocytic malignancies, without prior selection for specific classes of B cells (Cleary et al., 1984; Warnke and Levy, 1978). Less prominent B cell clones are also evident in sequence datasets from healthy individuals, and can be detected by identification of identically rearranged sequences in PCR pools generated from independent cellular DNA aliquots from the same sample (Boyd et al., 2009). Such VH ‘coincident sequences’ were found to be extremely rare between samples from different individuals due to the substantial combinatorial and junctional diversity of immunoglobulin gene rearrangements (Boyd et al., 2009; Glanville et al., 2011).

The presence of clonal populations in a sample may be estimated using various diversity indices (e.g., Hill, 1973; Jost, 2006) to calculate clonality scores from a single DNA pool per sample. In our study, we apply a replicate library sequencing approach to robustly test for the presence of clonal B cell populations in an individual by sequencing from multiple distinct DNA aliquots isolated from the same lymphocyte sample (Boyd et al., 2009). We use the probability of identifying clonally related sequences in arbitrary sequence pairs from different aliquots of a single sample (P(collision)) (Schnabel, 1938) as a metric to track clonality. P(collision) captures the probability that any two randomly chosen B cells in an individual share a clonal origin, and its mean value is independent of sequencing depth (Figure S1A–B).
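One way to compute P(collision) under this definition is as the pooled fraction of read pairs, drawn from two different PCR libraries of the same sample, that belong to the same clone. The sketch assumes reads have already been collapsed to clone identifiers and pools counts over all library pairs; the study's exact estimator (for example, how clone identity is defined or how library pairs are weighted) may differ, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def p_collision(replicates):
    """Fraction of cross-library read pairs that share a clone.

    replicates: list of lists of clone IDs, one list per PCR library.
    Pairs are only formed between different libraries of the same sample.
    """
    counts = [Counter(lib) for lib in replicates]
    shared = 0
    total = 0
    for a, b in combinations(counts, 2):
        total += sum(a.values()) * sum(b.values())
        # pairs in which both reads come from the same clone
        shared += sum(a[c] * b[c] for c in a.keys() & b.keys())
    return shared / total if total else 0.0

# Hypothetical three-library sample: only the A/A pair collides (1 of 8 pairs)
score = p_collision([["A", "B"], ["A", "C"], ["D"]])
```

Because the numerator and denominator are both sums of products of per-library counts, uniformly subsampling reads scales their expected values by the same factor, in line with the depth independence of the mean noted above.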

Consistent with an active immune response, P(collision) was higher in acute phase samples compared to convalescent or post-convalescent samples from the same individual (P = 0.0004 and P < 0.0001, respectively) (Figure 1A, left and right panels and Table S2). We also detected higher P(collision) scores in samples from individuals with non-dengue febrile illnesses in contrast to samples from healthy individuals (P = 0.0486) (Figure 1B and Table S2). These results illustrate that quantifiable measurements of VH clonality in peripheral blood samples are able to capture global (if not pathogen-specific) differences in B cell populations associated with diverse illnesses.

Figure 1. Antibody VH clonality in peripheral blood as a surrogate for B cell expansion in human dengue.


(A) Probability of observing identical VH sequences in two or more independent PCR replicates (P(collision)) in acute and convalescent (Conv) phase samples (left panel, *P=0.0004, Wilcoxon Signed Rank test) or in acute and post-convalescent (P-Conv) phase samples (right panel, *P<0.0001, Wilcoxon Signed Rank test) from the same patients. Only convalescent samples that were taken 7–21 days post-symptom onset were considered for this analysis. (B) P(collision) measures for samples from healthy individuals and from individuals with acute, non-dengue febrile illness (febrile). *P=0.0486, Wilcoxon-Mann-Whitney test. (C) P(collision) in acute (left panel), convalescent (middle panel) or post-convalescent (right panel) phase samples from individuals presenting with primary or secondary DENV infections. Annotations to each plot show the 25th–75th percentiles (box), the 10th–90th percentiles (whiskers), and the median (horizontal line). *P=0.0409, Wilcoxon-Mann-Whitney test. See also Figure S1 and Table S2.

We next compared P(collision) scores between primary and secondary acute DENV infections and observed a significant increase in secondary acute cases (P = 0.0409) (Figure 1C, left panel). No such difference was detected for convalescent or post-convalescent samples (Figure 1C, middle and right panels). P(collision) was also significantly higher in acute samples, compared to convalescent or post-convalescent samples for secondary dengue (P = 0.0046 for acute/convalescent and P < 0.0001 for acute/post-convalescent comparisons) but not for primary dengue (Figure S1C–D, left and middle panels). Taken together, our observations suggest that the clonality of the B cell response in peripheral blood is significantly higher in secondary dengue compared to primary dengue.

Convergent CDR3s in acute dengue

The VH protein tertiary structure includes three exposed loop regions that are involved in antigen recognition (Complementarity Determining Regions CDR1, CDR2 and CDR3) and four framework regions (FR1, FR2, FR3 and FR4) that form the scaffolds for the CDR loops. All FRs and CDRs are encoded entirely by germline V or J genes, except for CDR3, which is encoded by recombined sequences from V, D and J genes (Jung et al., 2006). Due to the junctional diversity created by V-D-J recombination, the CDR3 is the most diverse region in the VH peptide sequence (Tonegawa, 1983). CDR3s are suggested to be the major determinant of antibody specificity (Xu and Davis, 2000), with some contributions from residues in CDR1 or CDR2 (Ekiert et al., 2009; Padlan et al., 1989). The probability of finding identical antibody sequences in different individuals is reported to be extremely low, even in monozygotic twins (Glanville et al., 2011), and it is often assumed that individuals use distinct antibody sequences, in particular distinct CDR3s, in response to the same antigen. Several studies have investigated CDR3 usage in antigen-specific antibody populations by using the sequences of monoclonal antibodies that bind an antigen of interest to search for similar sequences in deep-sequenced PBMC repertoires (Chen et al., 2012; Prabakaran et al., 2012; Wu et al., 2011). Antigen-specific CDR3s from different individuals shared less than 50% sequence similarity (Prabakaran et al., 2012; Wu et al., 2011).

We sought to ascertain CDR3 signatures that were specific to the human immune response to dengue without pre-selecting for antigen-specific B cells. Cross-validation was used for the selection of predictive CDR3s that were highly prevalent in acute dengue samples and absent (or of low prevalence) in longitudinal samples from the same individual or in samples from healthy individuals. Prevalence (or incidence) of a CDR3 is defined as the proportion of samples containing the CDR3 of interest. The dataset was first partitioned by randomly assigning the 44 individuals into two non-overlapping groups of 22 individuals (Figure S2A). We then assessed the prevalence of all CDR3s and their one-mismatch derivatives (i.e., CDR3s that differ by one amino acid) for one of the two groups (the training set). Ten CDR3s were highly prevalent in acute dengue cases in the training set (Figure S2B), of which six CDR3s were found to be absent or of low prevalence in post-convalescent samples from the same set of individuals (Figure 2A–B). The incidence of these six CDR3s was then evaluated in the second group of individuals (1st test set). Notably, five of the six CDR3s were highly prevalent in acute dengue cases in the 1st test set (Figure 2A) and nearly absent in post-convalescent samples (Figure 2B). This bias in prevalence is not due to differences in sequencing depth as samples having or lacking convergent CDR3s had comparable sequencing coverage (Figure S2C). We next examined whether these CDR3s could be detected in an independent set of samples from 16 individuals with acute dengue (2nd test set; Table S1) that were processed using an amplification strategy similar to the original dataset. All six CDR3s were found in a substantial proportion of dengue samples in the 2nd test set (Figure 2A). 
Swapping the training and test sets (i.e., identifying CDR3s using the 1st test set and validating them on the training set and the 2nd test set) also yielded similar 10-mer and 13-mer CDR3 candidates (Figure S2D). Alternative partitioning of the dataset by DENV serotype showed that these CDR3 signatures were associated with both DENV-2 and DENV-3 infections, but were barely discernible after resolution of infection (Figure 2A–B). Furthermore, these CDR3 sequences and their one-mismatch derivatives were absent or extremely rare in ‘non-dengue’ samples from 47 healthy individuals enrolled in Nicaraguan (Hammond et al., 2005; Kuan et al., 2009) and U.S.-based (Boyd et al., 2010; Boyd et al., 2009; Kidd et al., 2012) studies and from 8 individuals who presented with non-dengue febrile illness (Figure 2B). We also examined over a thousand immunoglobulin sequence datasets derived from 640 individuals enrolled in a number of distinct studies by the authors and collaborating groups that used an identical primer design; none of these samples showed precise matches to the identified CDR3s (data not shown).
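The prevalence calculation with one-mismatch derivatives, and the random partition of individuals into two halves, can be sketched as follows. Function names, the random seed and the example sample sets are hypothetical. ARLD(Y)5GMDL is written out as ARLDYYYYYGMDL (five tyrosines), and "one mismatch" is taken to mean equal length with at most one differing residue, so insertion or deletion variants are not counted.

```python
import random

def within_one_mismatch(a: str, b: str) -> bool:
    """True if a and b have equal length and differ at no more than one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1

def prevalence(query: str, samples: dict) -> float:
    """Fraction of samples containing the query CDR3 or a one-mismatch derivative.

    samples: {sample_id: set of CDR3 amino-acid strings}.
    """
    if not samples:
        return 0.0
    hits = sum(
        any(within_one_mismatch(query, cdr3) for cdr3 in cdr3s)
        for cdr3s in samples.values()
    )
    return hits / len(samples)

def split_individuals(individual_ids, seed=0):
    """Randomly split individuals into two non-overlapping halves (train, test)."""
    ids = list(individual_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical acute-phase samples: s1 has the exact CDR3, s2 a one-mismatch derivative
samples = {"s1": {"ARLDYYYYYGMDL"}, "s2": {"ARLDYYYYYGMDV"}, "s3": {"ARDYW"}}
acute_prevalence = prevalence("ARLDYYYYYGMDL", samples)
train_ids, test_ids = split_individuals(range(44))  # two groups of 22
```

Splitting by individual rather than by sample keeps all time-points from one person in the same group, so acute and post-convalescent comparisons within the training set stay paired.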

Figure 2. CDR3 prevalence in dengue and non-dengue samples.


(A–B) Prevalence of selected CDR3s, measured as proportion of samples containing these CDR3s in various subsets of samples. (A) CDR3 prevalence in acute phase samples partitioned for cross-validation or by DENV serotype. (B) CDR3 prevalence in samples from 47 healthy individuals and 8 individuals with non-dengue febrile illness (non-dengue), and in post-convalescent phase samples partitioned either for cross-validation or by DENV serotype. See also Figure S2 and Table S3.

In addition to mismatch-based cross-validation, we evaluated four independent approaches for identifying CDR3s that were prevalent in acute dengue cases (see Experimental Procedures). Substantial similarities were observed in the 10-mer and 13-mer CDR3 subsets that were predicted to be dengue-specific using these various approaches (Figure S2F and Table S3). We have thus identified several 10-mer and 13-mer VH CDR3s in B cell repertoires from unsorted PBMCs that appear to be unique signatures of the human immune response to DENV infection.

Notably, the dengue-specific CDR3s appear to be relatively short, with minor contributions from D gene sequences. It is conceivable that convergent sequences with longer CDR3s may be less common, simply due to the greater diversity inherent in long CDR3 sequences. Short CDR3 signatures are typically seen in fetal and neonatal B cell repertoires (Souto-Carneiro et al., 2005; Zemlin et al., 2001). Yet, we found no association between the presence of these short CDR3s in the sampled population and underlying age structures (Figure S2G), suggesting that the identification of short convergent CDR3 species is not due to skewed sampling from neonates or very young infants.

Rearrangements with these CDR3s were present at abundances ranging from a relatively low 0.01% up to 1% of all sequenced VH for each sample (Figure S3A), suggesting that the convergent signatures contributed by these CDR3s are typically rare, but can represent a significant fraction of all B cells in the blood.

Diversity in CDR3 nucleotide sequences

We explored the diversity in nucleotide sequences underlying all CDR3 candidates identified by mismatch-based cross-validation (Figures 3, 4 and S3B–F), focusing particularly on ARLD(Y)5GMDL, which has the highest incidence among all identified convergent CDR3 sequences when contributions from one-mismatch derivatives are not considered (data not shown). The nucleotide sequences encoding each convergent CDR3 differed across most individuals (Figures 3 and S3B–E), with identical sequences found only in a handful of samples (Figures 3 and S3C, identical sequences indicated by symbols). Such sequence variation may be attributed to: (i) independent clonal lineages with equivalent rearrangements, (ii) sequence variation due to somatic mutation in cells belonging to a single clonal lineage derived from a single initial rearrangement, or (iii) post-extraction amplification or sequencing errors from uniform underlying sequences. Although (ii) and (iii) likely contribute to observed sequence variation, several sequence-based observations strongly support a set of independently derived rearrangements for these convergent CDR3s. First, in multiple individuals, sequences underlying the convergent CDR3s exhibited numerous deviations from inferred germline sequence (Figures 3 and S3C–E, highlighted nucleotides). Second, some of these CDR3s and their derivatives appear to be represented in several distinct sub-clonal lineages within individuals (Figure S4). Third, in all individuals we found examples wherein the same amino acid in the CDR3 was encoded by divergent synonymous codons with one or two base differences (Figures 3 and S3B–E). Fourth, we observed substantial diversity in nucleotide sequences underlying FR and CDR regions both upstream and downstream of the CDR3 region, as illustrated for ARLD(Y)5GMDL (Figure 4). 
Notably, much of this heterogeneity is explained by the usage of multiple V gene segments in encoding the CDR3 and its one-mismatch derivatives both within and across individuals (Figure 4). Six distinct V genes from the V1, V3 or V5 gene families exhibited evidence for association with ARLD(Y)5GMDL and its one-mismatch derivatives in at least two samples (Figure 4, ‘V gene usage’). Likewise, multiple V genes were used in encoding the other four CDR3s (Figure S3F). The use of diverse V genes to encode the prevalent CDR3s is unexpected, and provides strong support for the hypothesis that the associated VH regions were generated by convergent evolution. Taken together, our observations suggest that these CDR3s originated from distinct VH sequences in individuals exposed to DENV and are not likely to represent PCR contaminants or artifacts.
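The third observation, that the same CDR3 residue can be encoded by different synonymous codons, can be checked codon by codon once sequences are in frame. The sketch below builds the standard genetic code and reports positions where two in-frame sequences encode the same amino acid with different codons. The two example sequences are hypothetical encodings of the ARLD prefix, not reads from the study, and uppercase ACGT input is assumed.

```python
BASES = "TCAG"
# Standard genetic code; codon index = 16 * first + 4 * second + third (T, C, A, G order)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(nt: str) -> str:
    """Translate an in-frame sequence; any trailing partial codon is ignored."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - len(nt) % 3, 3))

def synonymous_codon_differences(nt1: str, nt2: str):
    """(codon index, amino acid, codon1, codon2) wherever two in-frame
    sequences encode the same residue with different codons."""
    diffs = []
    for i in range(0, min(len(nt1), len(nt2)) - 2, 3):
        c1, c2 = nt1[i:i + 3], nt2[i:i + 3]
        if c1 != c2 and CODON_TABLE[c1] == CODON_TABLE[c2]:
            diffs.append((i // 3, CODON_TABLE[c1], c1, c2))
    return diffs

# Two hypothetical encodings of "ARLD"; the leucine codons differ at two bases
seq1 = "GCGAGACTGGAT"
seq2 = "GCAAGGTTAGAC"
diffs = synonymous_codon_differences(seq1, seq2)
```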

Figure 3. Diversity in sequences encoding for the ARLD(Y)5GMDL CDR3.


Nucleotide sequences encoding for the ARLD(Y)5GMDL CDR3, segregated by patient identity and sample type (A, Acute; C, Convalescent; PC, Post-convalescent). #, count per sample; CON, Consensus; V, Variable region; D, Diversity region; N, non-templated additions; J, Joining region; *^identical sequences in multiple samples. See also Figure S3.

Figure 4. Sequence diversity in regions flanking the ARLD(Y)5GMDL CDR3.


Color-coded nucleotide diversity in FR regions flanking the ARLD(Y)5GMDL CDR3 and its one-mismatch derivatives, with each row representing a unique sequence read within an individual. %mut, percent mutation in the V gene (which has been truncated to include 160 nucleotides at the 3’ terminus for direct comparison between sequences); V gene usage, V genes with evidence for usage in at least two samples; sample, sample of sequence origin. See also Figure S4.

Contributions from affinity-matured B cells

B cells that have encountered antigens undergo successive rounds of affinity maturation during which B cells with optimal antigenic affinity are selected as immune responders. B cells undergoing affinity maturation accrue mutations in V, D and J germline sequences by a process known as somatic hypermutation. The number of mutations observed in V gene sequences has been commonly used for segregating sequence features associated with naïve and affinity-matured B cells (Glanville et al., 2011; Wrammert et al., 2008).
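Percent mutation against germline can be sketched as the fraction of mismatched positions, assuming the read and its closest germline V gene have already been aligned without gaps to equal length. The 160-nucleotide window at the 3' end follows the truncation described for Figure 4; whether the same window underlies every percentage in this section is an assumption, and the function name is hypothetical. For a 160-nucleotide window, 3, 7 and 11 mismatches give 1.875%, 4.375% and 6.875%, consistent with the 1.9%, 4.4% and 6.9% figures quoted in the following paragraph.

```python
def percent_mutation(v_read: str, germline: str, window: int = 160) -> float:
    """Percent of mismatched nucleotides over the 3' `window` bases of V.

    Assumes v_read and germline are already aligned, gap-free and of equal length.
    """
    if len(v_read) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    a, b = v_read[-window:], germline[-window:]
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)
```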

We investigated whether the highly prevalent CDR3 signatures were derived from affinity-matured B cells. For each primer-trimmed V segment associated with these CDR3s, we calculated the percentage of nucleotides that differed from the closest reference V gene germline sequence in the International Immunogenetics (IMGT) database. Very few of these V sequences had less than 1.9% mutation compared to germline (3 nucleotide changes) (Figures 4 and S3G), suggesting that only a minor proportion of the convergent CDR3 signatures were contributed by naïve B cells. The median somatic mutation in V gene segments associated with all convergent CDR3s varied from 4.4 to 6.9% (7 to 11 mutations; Figures 4 and S3G), which suggests that these CDR3s were derived from memory B cell populations (Glanville et al., 2011). The high mutation score was not attributable to differences in hypermutation frequencies between VH rearrangements using different V segments, since V sequences with less than 1.9% mutation (3 nucleotide changes) were significantly more abundant in other B cell populations that used the same subset of V genes for encoding other CDR3s (P ≤ 0.0001, Fisher’s exact test; Figure S3H). Antibodies with these prevalent CDR3s thus appear to be derived from B cell populations that have undergone affinity maturation and accumulated somatic mutations in response to DENV infection. Furthermore, the majority of convergent CDR3s were significantly more prevalent in acute secondary dengue than in acute primary dengue (P ≤ 0.05, Fisher’s exact test; Figure S3I); this is consistent with a higher prevalence of infection-specific affinity-matured B cell populations in secondary infections (Smith et al., 1997).
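The 2 x 2 comparisons above (for example, V sequences with and without fewer than 3 mutations, for convergent versus other CDR3s) can be tested with Fisher's exact test. The pure-Python sketch below uses the common two-sided convention of summing the probabilities of all tables with the same margins that are no more probable than the observed one; the table layout and function name are hypothetical, and scipy.stats.fisher_exact is a library alternative.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so floating-point ties with p_obs are included
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)

p_tea = fisher_exact_two_sided(3, 1, 1, 3)  # classic tea-tasting table: 34/70, about 0.486
```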

Physicochemical properties of prevalent CDR3s

Affinity maturation selects CDR regions with distinctive physicochemical profiles that favor antigen recognition. The skewed representation of the convergent 10-mer and 13-mer CDR3s in affinity-matured B cell populations suggests that the physicochemical profiles of these CDR3s may define specific indicators of the immune response in acute dengue. We hence probed for coherent physicochemical properties associated with the convergent 10-mer and 13-mer CDR3s.

We first identified other CDR3s that were similar to the convergent CDR3s in amino acid physicochemical space, using Principal Component Analysis (PCA) and standard clustering approaches to partition CDR3s based on molecular weight, isoelectric pH and hydrophilicity scores for each amino acid. We note that, as predicted for a complex mixture, the use of a large number of variables for PCA resulted in each principal component accounting for only a modest proportion of total variation between CDR3s. Nevertheless, we confirmed that clusters containing the convergent 10-mer and 13-mer CDR3s exhibited significantly higher prevalence in acute dengue subsets (Figure S5A and Table S4). Several other clusters that also appeared to be more prevalent in acute-phase samples (Table S4) were not considered for further analysis because their members were not identified as being convergent by statistically robust approaches such as mismatch-based cross validation and L1-regularized logistic regression.
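The feature construction and PCA can be sketched as follows. Several choices here are assumptions rather than details given in the text: the Hopp-Woods hydrophilicity scale, the approximate literature values for free amino-acid molecular weight and isoelectric point, standardization of each feature column before PCA, and the example CDR3s (hypothetical 13-mers). Clustering of the resulting coordinates (for example k-means or hierarchical clustering) is not shown.

```python
import numpy as np

# Approximate textbook values, for illustration only:
# (molecular weight in Da, isoelectric point, Hopp-Woods hydrophilicity)
SCALES = {
    "A": (89.1, 6.00, -0.5), "R": (174.2, 10.76, 3.0), "N": (132.1, 5.41, 0.2),
    "D": (133.1, 2.77, 3.0), "C": (121.2, 5.07, -1.0), "Q": (146.2, 5.65, 0.2),
    "E": (147.1, 3.22, 3.0), "G": (75.1, 5.97, 0.0), "H": (155.2, 7.59, -0.5),
    "I": (131.2, 6.02, -1.8), "L": (131.2, 5.98, -1.8), "K": (146.2, 9.74, 3.0),
    "M": (149.2, 5.74, -1.3), "F": (165.2, 5.48, -2.5), "P": (115.1, 6.30, 0.0),
    "S": (105.1, 5.68, 0.3), "T": (119.1, 5.60, -0.4), "W": (204.2, 5.89, -3.4),
    "Y": (181.2, 5.66, -2.3), "V": (117.1, 5.96, -1.5),
}

def featurize(cdr3s):
    """Residue-by-residue (MW, pI, hydrophilicity) vectors for equal-length CDR3s."""
    if len({len(s) for s in cdr3s}) != 1:
        raise ValueError("compare CDR3s of identical length only")
    return np.array([[v for aa in s for v in SCALES[aa]] for s in cdr3s], dtype=float)

def pca(X, n_components=3):
    """Scores on the top principal components of column-standardized X (via SVD)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns contribute no variance
    Z = (X - X.mean(axis=0)) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

# Hypothetical 13-mer CDR3s
cdr3s = ["ARLDYYYYYGMDL", "ARLDYYYYYGMDV", "AKDRGSSWYFDYW",
         "ARGPYCSGGSCYS", "AREGYSSGWYFDY"]
coords, explained = pca(featurize(cdr3s))
```

With 13 residues and three scores per residue, each CDR3 becomes a 39-dimensional vector, which is why each component captures only a modest share of the total variation, as the text notes.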

Interestingly, clusters containing the convergent CDR3s appeared to occupy exclusive coordinate sets in CDR3 amino acid physicochemical space bounded by the first three components; these coordinates were sparsely populated by other CDR3s from either dengue or non-dengue datasets (Figure 5A–B). Furthermore, several amino acids in CDR3s from these clusters exhibited physicochemical scores that were distinct from median scores for amino acids in other identical-length CDR3s (Figures 5C and S5B–C). These CDR3 clusters were principally associated with convergent CDR1 and CDR2 clusters that also exhibited several mutations relative to their germline encodings. Taken together, our observations suggest that the identified CDR3 sequences and their associated CDR2 and CDR1 sequences have coherent amino acid physicochemical profiles that uniquely position them as immune repertoire indicators in human dengue.

Figure 5. Amino acid characteristics of coherent CDR3s.


(A, B) Separation of CDR3 clusters in the spaces bounded by principal components (Prin) 1, 2 and 3, which represent residue-by-residue scores for hydrophilicity, molecular weight and isoelectric pH for all CDR3s with VH usage and length identical to the convergent (A) 13-mer and (B) 10-mer CDR3s. All CDR3s that cluster with the (A) 13-mer and (B) 10-mer CDR3s identified by mismatch cross-validation are highlighted; members from all other clusters are faded. The top right quadrants are magnified images of the boxed graph areas. (C) Sequences and residue-specific characteristics of VH CDR1 regions and CDR2 regions associated with the highlighted 13-mer (top panel) and 10-mer (bottom panel) convergent CDR3 clusters. Amino acids in black, consensus residues; amino acids in red, germline-encoded residues different from consensus; *junctional residues with undeterminable germline sequence. Colored circles indicate deviations in molecular weight, isoelectric pH or hydrophilicity scores from germline encodings for CDR1 and CDR2, or from median scores computed across all CDR3s with lengths identical to the corresponding CDR3s. See also Figure S5 and Table S4.

Concluding remarks

We have identified convergent dengue-specific immune repertoire signatures that define prevalent and specific indicators of DENV infection. Such immune signatures may be amenable to engineering protein- or nucleic acid-based diagnostic tools for acute dengue, or to surveying and tracking exposure to DENV in endemic communities. This list of predictors is by no means exhaustive, as it is conceivable that our algorithms for mismatch-based and similarity-based querying overlooked complex motifs embedded in other convergent signatures. Similar efforts to identify coherent antibody signatures in infections with other microbial pathogens could reveal novel indicators for tracking infections (particularly when other diagnostics are unavailable) and immunological histories in individuals.

EXPERIMENTAL PROCEDURES

Ethics statement

All studies were approved by the Institutional Review Boards of the Nicaraguan Ministry of Health and of the University of California, Berkeley. Parents or legal guardians of all pediatric subjects provided written consent, and assent was obtained from subjects six years of age and older.

Study population

PBMCs from subjects enrolled in one of two ongoing prospective studies of dengue in Nicaragua were used. (1) The “Hospital-based Study”, ongoing since 2005 (Hammond et al., 2005), enrolls pediatric patients who present with suspected dengue to the National Pediatric Reference Hospital in Managua. PBMCs and plasma are collected from patients during the acute phase of illness at day 2–5 post-onset of symptoms (dpo), at convalescence (14 dpo) and longitudinally at 3, 6, 12, and 18 months. (2) The Pediatric Dengue Cohort Study (2004-present) (Kuan et al., 2009) follows ~3,800 children in District II of Managua. Each year, a blood sample is drawn from all subjects during the dry season, which has low incidence of dengue, to assess prior exposures to DENV. Additionally, acute- and convalescent-phase samples are collected year-round from participants who present with suspected dengue or undifferentiated febrile illness. Extensive epidemiological and clinical data are available for all samples. Criteria for identification of dengue-positive cases and for classification by immune status are as described (Harris et al., 2000). Samples from symptomatic individuals that do not meet these criteria are classified as “non-dengue febrile.” Samples from healthy individuals enrolled in the prospective cohort study with no prior history of dengue or anti-DENV antibodies are classified as “healthy”. Classification of dengue cases as DF, DHF or DSS is according to the World Health Organization’s guidelines (WHO, 1997).

Data filtering

The multiplex biomed-2 primer combinations (Boyd et al., 2010) generate a small proportion of artefactual sequences during PCR. These sequences can be distinguished by their occurrence in amplified human DNA from nonlymphoid cell lines, by the general absence of strongly assignable D regions, and by anomalous positioning of break points relative to expected V and J recombination junctions. A Naïve Bayes Classifier was trained to discriminate such ‘spam’ sequences from bona fide ‘non-spam’ rearrangements as follows: (1) Sequences from previous analyses were aligned using iHMMune-align, and potentially artefactual cross-specimen recurrences were labeled as ‘spam’. (2) The subset of V and J segments used by these ‘spam’ sequences was combined with junction sequences that were strongly predictive of ‘spam’, identified by assigning Bayesian probabilities to all k-mers (k = 1, 2, 3, 4, 5 or 6) in each junction, to generate a comprehensive model list of ‘spam’ sequences. (3) For any clone, the product of the relative frequencies of each assigned V and J segment and junction k-mer in ‘spam’ versus ‘non-spam’ sequences provided a likelihood estimate for assigning it as spam. We note that only a small number of sequences scored as spam in these analyses (Table S1), and upon inspection, the vast majority of these clearly resemble recurrent non-lymphoid sequences (Boyd et al., 2009).
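The scoring in step (3) can be sketched as a multinomial Naive Bayes log-likelihood ratio. This is an illustrative reconstruction under stated assumptions: the features are the assigned V gene, the assigned J gene and every junction k-mer with k = 1 to 6, pooled into one bag; relative frequencies use add-one (Laplace) smoothing, which the text does not mention; and the class, gene labels and decision threshold of zero are hypothetical. Working in log space turns the product of relative frequencies into a sum and avoids numerical underflow.

```python
import math
from collections import Counter

def features(v_gene, j_gene, junction, ks=range(1, 7)):
    """V and J assignments plus every k-mer (k = 1..6) in the junction."""
    feats = [("V", v_gene), ("J", j_gene)]
    for k in ks:
        feats += [("kmer", junction[i:i + k]) for i in range(len(junction) - k + 1)]
    return feats

class SpamScorer:
    """Hypothetical Naive Bayes scorer: log P(features | spam) - log P(features | non-spam)."""

    def __init__(self, spam_records, nonspam_records, pseudocount=1.0):
        self.spam = Counter(f for r in spam_records for f in features(*r))
        self.ham = Counter(f for r in nonspam_records for f in features(*r))
        self.n_spam = sum(self.spam.values())
        self.n_ham = sum(self.ham.values())
        self.vocab = len(set(self.spam) | set(self.ham))
        self.alpha = pseudocount

    def _rel_freq(self, counts, total, f):
        # Additive smoothing keeps unseen features from zeroing the product
        return (counts[f] + self.alpha) / (total + self.alpha * self.vocab)

    def log_likelihood_ratio(self, v_gene, j_gene, junction):
        """Sum of log relative-frequency ratios; values > 0 favour 'spam'."""
        return sum(
            math.log(self._rel_freq(self.spam, self.n_spam, f))
            - math.log(self._rel_freq(self.ham, self.n_ham, f))
            for f in features(v_gene, j_gene, junction)
        )

# Hypothetical training records: (V gene, J gene, junction nucleotides)
spam = [("V_spam", "J1", "AAAA")] * 5
non_spam = [("V_real", "J1", "GCTAGG")] * 5
scorer = SpamScorer(spam, non_spam)
```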

In addition to filtering artefactual sequences, we removed reads containing common pyrosequencing errors, such as insertions and deletions (indels) concentrated in homopolymer tracts (Table S1). Indels in the V and J regions were identified by extending iHMMune-align to include a preliminary BLAST (Altschul et al., 1990) alignment against the V and J germline repertoires. If the alignment between a read and its germline V or J gene contained indels (gaps), the germline repertoire used for iHMMune-align mapping was updated to include a version of that gene carrying the indels for use in model building. This allowed iHMMune-align to map sequences with V or J indels. Sequences with indels within the junction region (N1, D, N2) were largely excluded by analyzing only in-frame rearrangements.
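The homopolymer criterion can be illustrated on a single pairwise read-versus-germline alignment such as the BLAST step produces. A sketch under stated assumptions: gaps are written as `-`, an indel is flagged when the germline homopolymer it touches is at least `min_run` = 3 bases long (an arbitrary illustrative cutoff), and "in-frame" is approximated as a junction length divisible by 3. The function names are hypothetical.

```python
def _run_around(seq, left_end, right_start, base):
    """Count consecutive copies of `base` leftward from left_end and rightward from right_start."""
    n = 0
    i = left_end
    while i >= 0 and seq[i] == base:
        n += 1
        i -= 1
    i = right_start
    while i < len(seq) and seq[i] == base:
        n += 1
        i += 1
    return n


def homopolymer_indels(aligned_read, aligned_germline, min_run=3):
    """Return (kind, germline_pos, base, run_length) for indels touching a germline
    homopolymer of at least min_run bases. Gaps are '-'; min_run=3 is an assumption."""
    if len(aligned_read) != len(aligned_germline):
        raise ValueError("aligned sequences must have equal length")
    germline = aligned_germline.replace("-", "")
    flagged = []
    p = 0  # germline bases consumed before the current alignment column
    for r, g in zip(aligned_read, aligned_germline):
        if g == "-" and r != "-":
            # Insertion in the read between germline[p - 1] and germline[p].
            run = _run_around(germline, p - 1, p, r)
            if run >= min_run:
                flagged.append(("ins", p, r, run))
        elif r == "-" and g != "-":
            # Deletion of germline[p]; the run counted includes germline[p] itself.
            run = _run_around(germline, p - 1, p, g)
            if run >= min_run:
                flagged.append(("del", p, g, run))
        if g != "-":
            p += 1
    return flagged


def is_in_frame(junction_nt):
    """Assumed proxy for an in-frame rearrangement: junction length divisible by 3."""
    return len(junction_nt) % 3 == 0
```

For example, dropping one T from a germline `TTTT` tract is flagged, whereas an isolated single-base deletion outside any homopolymer is not. A single junction indel shifts the length by one base, which is why an in-frame filter removes most such reads.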

Alternative approaches for identifying convergent CDR3s

(1) We implemented a simple scoring system using a modified BLOSUM62 similarity matrix (Figure S2E) to calculate an overall score for each CDR3. Derivatives of each CDR3 were identified using a score-similarity cutoff of ≥95%, and cross-validation was used to identify the CDR3s with the highest prevalence (Figure S2F). (2) We used an association-based analysis with Holm-Bonferroni adjusted P-values and Benjamini-Hochberg false discovery rate estimates to identify CDR3s and their one-mismatch derivatives that were present in a significantly higher proportion of acute dengue cases than of non-dengue or healthy control cases (Table S3). (3) We performed L1-regularized logistic regression (Friedman et al., 2010), with dengue status as the dependent variable, to find individual CDR3s and their one-mismatch derivatives that were disproportionately associated with dengue status (for L1-regularized odds ratios, see Table S3). The area under the receiver operating characteristic curve (AUC) is a commonly used metric for evaluating binary classifiers; in this study it can be interpreted as the probability that the classifier assigns a higher acute dengue risk to a randomly selected acute dengue case than to a randomly selected non-dengue case. The L1-regularized logistic regression classifier, which used the CDR3s as features, had a leave-pair-out cross-validation (LPOCV) AUC estimate of 0.777. (4) Lastly, we clustered CDR3s by length and sequence similarity, such that each cluster contained CDR3s of the same length with pairwise Hamming distances of ≤2 (Glanville et al., 2011). The predictive performance of the resulting 470 clusters was assessed using L1-regularized logistic regression (for predictive clusters and regularized log odds ratios, see Table S3). This optimized approach yielded an AUC of 0.834. Table S3 also lists these statistical measures for the five CDR3 candidates identified using mismatch-based cross-validation.
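Several of the statistics named above are standard and can be written out directly. The sketch below shows Holm-Bonferroni and Benjamini-Hochberg adjustment, the pairwise-comparison form of the AUC described in the text, and one possible way to form same-length CDR3 clusters with pairwise Hamming distances ≤2. The greedy, order-dependent clustering in `greedy_hamming_clusters` is an assumption for illustration and need not reproduce the 470 clusters reported; the helper names are hypothetical.

```python
from itertools import product


def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending P
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest P
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending P
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pairwise_auc(case_scores, control_scores):
    """AUC as P(random case scores higher than random control); ties count 1/2."""
    wins = 0.0
    for a, b in product(case_scores, control_scores):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_hamming_clusters(cdr3s, max_dist=2):
    """Greedy, order-dependent grouping: a CDR3 joins the first cluster whose members
    all share its length and lie within max_dist mismatches of it (hypothetical helper)."""
    clusters = []
    for s in cdr3s:
        for members in clusters:
            if len(members[0]) == len(s) and all(hamming(s, t) <= max_dist for t in members):
                members.append(s)
                break
        else:
            clusters.append([s])
    return clusters
```

In leave-pair-out cross-validation, each (case, control) pair is held out in turn, a model is fitted to the remaining samples, and the same pairwise comparison is averaged over all held-out pairs; that outer loop is omitted here.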

CDR3 clustering

Principal component analysis (PCA) was performed on 892 10-mer and 9,158 13-mer CDR3s, using the molecular weight, isoelectric point (pI) and hydrophilicity score of each amino acid. The percentage of total variance explained by the first three principal components is presented in Figure 5A–B. k-means clustering was used to partition the CDR3s using the principal components that together accounted for at least 50% of the variance, and the number of clusters was optimized with the Cubic Clustering Criterion fit statistic (data not shown), yielding 400 and 3,000 clusters for the 10-mer and 13-mer CDR3s, respectively.
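The encoding, PCA and k-means steps can be sketched with NumPy as follows. This is an illustration under several assumptions not stated in the text: each residue is encoded position by position with commonly tabulated free amino-acid molecular weights, pI values and Hopp-Woods hydrophilicity; features are z-scored before PCA; and k-means is plain Lloyd's algorithm with a fixed `k`, rather than a `k` chosen by the Cubic Clustering Criterion.

```python
import numpy as np

# Assumed per-residue scales (the paper does not list its values):
# (free amino-acid molecular weight in Da, isoelectric point, Hopp-Woods hydrophilicity)
PROPS = {
    "A": (89.09, 6.00, -0.5), "R": (174.20, 10.76, 3.0), "N": (132.12, 5.41, 0.2),
    "D": (133.10, 2.77, 3.0), "C": (121.16, 5.07, -1.0), "Q": (146.15, 5.65, 0.2),
    "E": (147.13, 3.22, 3.0), "G": (75.07, 5.97, 0.0), "H": (155.16, 7.59, -0.5),
    "I": (131.17, 6.02, -1.8), "L": (131.17, 5.98, -1.8), "K": (146.19, 9.74, 3.0),
    "M": (149.21, 5.74, -1.3), "F": (165.19, 5.48, -2.5), "P": (115.13, 6.30, 0.0),
    "S": (105.09, 5.68, 0.3), "T": (119.12, 5.60, -0.4), "W": (204.23, 5.89, -3.4),
    "Y": (181.19, 5.66, -2.3), "V": (117.15, 5.96, -1.5),
}


def encode(cdr3s):
    """One row per equal-length CDR3: position-wise (MW, pI, hydrophilicity), 3*L columns."""
    return np.array([[v for aa in s for v in PROPS[aa]] for s in cdr3s], dtype=float)


def pca(X):
    """Return (component scores, explained-variance ratio) from an SVD of z-scored X."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    var = S ** 2
    return U * S, var / var.sum()


def n_components_for(ratio, threshold=0.5):
    """Smallest number of leading components whose cumulative variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Standardizing each column keeps molecular weight (on the order of 100 Da) from dominating pI and hydrophilicity in the principal components; without it, the first component would largely track residue mass.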

Supplementary Material

Supplementary Data

HIGHLIGHTS.

  • White blood cell DNA sequences reveal clonal B cell expansion in acute dengue patients

  • Multiple dengue cases exhibit convergent dengue-specific antibody (CDR3) signatures

  • Convergent CDR3s have diverse underlying nucleotide sequences

  • Dengue-specific CDR3s have unique amino acid physicochemical profiles

ACKNOWLEDGEMENTS

We thank K. Seo, B. Moraga, E-C Park, M. Buchmeier, A. Collins, K. Wilson, members of the Boyd/Harris/Fire labs, and the PSWRCE consortium for help, ideas and suggestions, and NIAID [AI65359 (AB), AI62100 (EH/AB), U54AI065359/PSWRCE (PP/AZF)] and the Pediatric Dengue Vaccine Initiative [VE-1 (EH)] for financial support. Conceived and designed experiments: AZF, PP, EH, SZ, SB, YL. Performed experiments: PP. Analyzed data: PP, AZF, YL, KR, VPD. Contributed materials/reagents: EH, AB, SB, AZF, PP, KR, KJ, VPD, KS, KA, JYL, MJB, BBS, BH, KRG, NP. Wrote manuscript: PP, AZF, YL, EH, SB.

Footnotes

The authors declare no competing financial interests.

REFERENCES

  1. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  2. Arnaout R, Lee W, Cahill P, Honan T, Sparrow T, Weiand M, Nusbaum C, Rajewsky K, Koralov S. High-resolution description of antibody heavy-chain repertoires in humans. PLoS One. 2011;6:e22365. doi: 10.1371/journal.pone.0022365.
  3. Avirutnan P, Zhang L, Punyadee N, Manuyakorn A, Puttikhunt C, Kasinrerk W, Malasit P, Atkinson J, Diamond M. Secreted NS1 of dengue virus attaches to the surface of cells via interactions with heparan sulfate and chondroitin sulfate E. PLoS Pathog. 2007;3:e183. doi: 10.1371/journal.ppat.0030183.
  4. Boyd S, Gaeta B, Jackson K, Fire A, Marshall E, Merker J, Maniar J, Zhang L, Sahaf B, Jones C, et al. Individual variation in the germline Ig gene repertoire inferred from variable region gene rearrangements. J Immunol. 2010;184:6986–6992. doi: 10.4049/jimmunol.1000445.
  5. Boyd S, Marshall E, Merker J, Maniar J, Zhang L, Sahaf B, Jones C, Simen B, Hanczaruk B, Nguyen K, et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci Transl Med. 2009;1:12ra23. doi: 10.1126/scitranslmed.3000540.
  6. Chen W, Prabakaran P, Zhu Z, Feng Y, Streaker E, Dimitrov D. Characterization of human IgG repertoires in an acute HIV-1 infection. Exp Mol Pathol. 2012;93:399–407. doi: 10.1016/j.yexmp.2012.09.022.
  7. Cleary M, Chao J, Warnke R, Sklar J. Immunoglobulin gene rearrangement as a diagnostic criterion of B-cell lymphoma. Proc Natl Acad Sci U S A. 1984;81:593–597. doi: 10.1073/pnas.81.2.593.
  8. Ekiert D, Bhabha G, Elsliger M, Friesen R, Jongeneelen M, Throsby M, Goudsmit J, Wilson I. Antibody recognition of a highly conserved influenza virus epitope. Science. 2009;324:246–251. doi: 10.1126/science.1171491.
  9. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33:1–22.
  10. Gaeta B, Malming H, Jackson K, Bain M, Wilson P, Collins A. iHMMune-align: hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences. Bioinformatics. 2007;23:1580–1587. doi: 10.1093/bioinformatics/btm147.
  11. Glanville J, Kuo T, von Budingen H, Guey L, Berka J, Sundar P, Huerta G, Mehta G, Oksenberg J, Hauser S, et al. Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation. Proc Natl Acad Sci U S A. 2011;108:20066–20071. doi: 10.1073/pnas.1107498108.
  12. Halstead S. Pathogenesis of dengue: challenges to molecular biology. Science. 1988;239:476–481. doi: 10.1126/science.3277268.
  13. Halstead S. Antibodies determine virulence in dengue. Ann N Y Acad Sci. 2009;1171(Suppl 1):E48–E56. doi: 10.1111/j.1749-6632.2009.05052.x.
  14. Halstead S, Yamarat C. Recent Epidemics of Hemorrhagic Fever in Thailand. Observations Related to Pathogenesis of a "New" Dengue Disease. Am J Public Health Nations Health. 1965;55:1386–1395. doi: 10.2105/ajph.55.9.1386.
  15. Hammond S, Balmaseda A, Perez L, Tellez Y, Saborio S, Mercado J, Videa E, Rodriguez Y, Perez M, Cuadra R, et al. Differences in dengue severity in infants, children, and adults in a 3-year hospital-based study in Nicaragua. Am J Trop Med Hyg. 2005;73:1063–1070.
  16. Harris E, Videa E, Perez L, Sandoval E, Tellez Y, Perez M, Cuadra R, Rocha J, Idiaquez W, Alonso R, et al. Clinical, epidemiologic, and virologic features of dengue in the 1998 epidemic in Nicaragua. Am J Trop Med Hyg. 2000;63:5–11. doi: 10.4269/ajtmh.2000.63.5.
  17. Hill M. Diversity and evenness: a unifying notation and its consequences. Ecology. 1973;54:427–432.
  18. Jackson K, Boyd S, Gaeta B, Collins A. Benchmarking the performance of human antibody gene alignment utilities using a 454 sequence dataset. Bioinformatics. 2010;26:3129–3130. doi: 10.1093/bioinformatics/btq604.
  19. Jost L. Entropy and diversity. Oikos. 2006;113:363–375.
  20. Jung D, Giallourakis C, Mostoslavsky R, Alt F. Mechanism and control of V(D)J recombination at the immunoglobulin heavy chain locus. Annu Rev Immunol. 2006;24:541–570. doi: 10.1146/annurev.immunol.23.021704.115830.
  21. Kidd M, Chen Z, Wang Y, Jackson K, Zhang L, Boyd S, Fire A, Tanaka M, Gaeta B, Collins A. The inference of phased haplotypes for the immunoglobulin H chain V region gene loci by analysis of VDJ gene rearrangements. J Immunol. 2012;188:1333–1340. doi: 10.4049/jimmunol.1102097.
  22. Kuan G, Gordon A, Aviles W, Ortega O, Hammond S, Elizondo D, Nunez A, Coloma J, Balmaseda A, Harris E. The Nicaraguan pediatric dengue cohort study: study design, methods, use of information technology, and extension to other infectious diseases. Am J Epidemiol. 2009;170:120–129. doi: 10.1093/aje/kwp092.
  23. Liao H, Chen X, Munshaw S, Zhang R, Marshall D, Vandergrift N, Whitesides J, Lu X, Yu J, Hwang K, et al. Initial antibodies binding to HIV-1 gp41 in acutely infected subjects are polyreactive and highly mutated. J Exp Med. 2011;208:2237–2249. doi: 10.1084/jem.20110363.
  24. Logan A, Gao H, Wang C, Sahaf B, Jones C, Marshall E, Buno I, Armstrong R, Fire A, Weinberg K, et al. High-throughput VDJ sequencing for quantification of minimal residual disease in chronic lymphocytic leukemia and immune reconstitution assessment. Proc Natl Acad Sci U S A. 2011;108:21194–21199. doi: 10.1073/pnas.1118357109.
  25. Margulies M, Egholm M, Altman W, Attiya S, Bader J, Bemben L, Berka J, Braverman M, Chen Y, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. doi: 10.1038/nature03959.
  26. Messer W, Gubler D, Harris E, Sivananthan K, de Silva A. Emergence and global spread of a dengue serotype 3, subtype III virus. Emerg Infect Dis. 2003;9:800–809. doi: 10.3201/eid0907.030038.
  27. Padlan E, Silverton E, Sheriff S, Cohen G, Smith-Gill S, Davies D. Structure of an antibody-antigen complex: crystal structure of the HyHEL-10 Fab-lysozyme complex. Proc Natl Acad Sci U S A. 1989;86:5938–5942. doi: 10.1073/pnas.86.15.5938.
  28. Peiris J, Porterfield J. Antibody-mediated enhancement of Flavivirus replication in macrophage-like cell lines. Nature. 1979;282:509–511. doi: 10.1038/282509a0.
  29. Prabakaran P, Zhu Z, Chen W, Gong R, Feng Y, Streaker E, Dimitrov D. Origin, diversity, and maturation of human antiviral antibodies analyzed by high-throughput sequencing. Front Microbiol. 2012;3:277. doi: 10.3389/fmicb.2012.00277.
  30. Reddy S, Ge X, Miklos A, Hughes R, Kang S, Hoi K, Chrysostomou C, Hunicke-Smith S, Iverson B, Tucker P, et al. Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells. Nat Biotechnol. 2010;28:965–969. doi: 10.1038/nbt.1673.
  31. Rico-Hesse R, Harrison L, Salas R, Tovar D, Nisalak A, Ramos C, Boshell J, de Mesa M, Nogueira R, da Rosa A. Origins of dengue type 2 viruses associated with increased pathogenicity in the Americas. Virology. 1997;230:244–251. doi: 10.1006/viro.1997.8504.
  32. Rothman A, Ennis F. Immunopathogenesis of Dengue hemorrhagic fever. Virology. 1999;257:1–6. doi: 10.1006/viro.1999.9656.
  33. Schnabel Z. The estimation of the total fish population of a lake. Amer Math Mon. 1938;45:348–352.
  34. Smith K, Light A, Nossal G, Tarlinton D. The extent of affinity maturation differs between the memory and antibody-forming cell compartments in the primary immune response. EMBO J. 1997;16:2996–3006. doi: 10.1093/emboj/16.11.2996.
  35. Souto-Carneiro M, Sims G, Girschik H, Lee J, Lipsky P. Developmental changes in the human heavy chain CDR3. J Immunol. 2005;175:7425–7436. doi: 10.4049/jimmunol.175.11.7425.
  36. Thorselius M, Krober A, Murray F, Thunberg U, Tobin G, Buhler A, Kienle D, Albesiano E, Maffei R, Dao-Ung L, et al. Strikingly homologous immunoglobulin gene rearrangements and poor outcome in VH3-21-using chronic lymphocytic leukemia patients independent of geographic origin and mutational status. Blood. 2006;107:2889–2894. doi: 10.1182/blood-2005-06-2227.
  37. Tonegawa S. Somatic generation of antibody diversity. Nature. 1983;302:575–581. doi: 10.1038/302575a0.
  38. Wang T, Palese P. Universal epitopes of influenza virus hemagglutinins? Nat Struct Mol Biol. 2009;16:233–234. doi: 10.1038/nsmb.1574.
  39. Warnke R, Levy R. Immunopathology of follicular lymphomas. A model of B-lymphocyte homing. N Engl J Med. 1978;298:481–486. doi: 10.1056/NEJM197803022980903.
  40. Weinstein J, Jiang N, White R 3rd, Fisher D, Quake S. High-throughput sequencing of the zebrafish antibody repertoire. Science. 2009;324:807–810. doi: 10.1126/science.1170020.
  41. WHO. Dengue haemorrhagic fever: diagnosis, treatment, prevention and control. 1997.
  42. WHO. 2012. <http://www.who.int/mediacentre/factsheets/fs117/en/>.
  43. Wrammert J, Smith K, Miller J, Langley W, Kokko K, Larsen C, Zheng N, Mays I, Garman L, Helms C, et al. Rapid cloning of high-affinity human monoclonal antibodies against influenza virus. Nature. 2008;453:667–671. doi: 10.1038/nature06890.
  44. Wu X, Zhou T, Zhu J, Zhang B, Georgiev I, Wang C, Chen X, Longo N, Louder M, McKee K, et al. Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Science. 2011;333:1593–1602. doi: 10.1126/science.1207532.
  45. Xu J, Davis M. Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities. Immunity. 2000;13:37–45. doi: 10.1016/s1074-7613(00)00006-6.
  46. Zemlin M, Bauer K, Hummel M, Pfeiffer S, Devers S, Zemlin C, Stein H, Versmold H. The diversity of rearranged immunoglobulin heavy chain variable region genes in peripheral blood B cells of preterm infants is restricted by short third complementarity-determining regions but not by limited gene segment usage. Blood. 2001;97:1511–1513. doi: 10.1182/blood.v97.5.1511.
  47. Zhou J, Lottenbach K, Barenkamp S, Lucas A, Reason D. Recurrent variable region gene usage and somatic mutation in the human antibody response to the capsular polysaccharide of Streptococcus pneumoniae type 23F. Infect Immun. 2002;70:4083–4091. doi: 10.1128/IAI.70.8.4083-4091.2002.
