Author manuscript; available in PMC: 2015 Jan 2.
Published in final edited form as: J Immunol. 2010 May 21;184(12):6986–6992. doi: 10.4049/jimmunol.1000445

Individual Variation in the Germline Ig Gene Repertoire Inferred from Variable Region Gene Rearrangements

Scott D Boyd *, Bruno A Gaëta , Katherine J Jackson , Andrew Z Fire *,§, Eleanor L Marshall §, Jason D Merker *,, Jay M Maniar §, Lyndon N Zhang , Bita Sahaf §, Carol D Jones *, Birgitte B Simen #, Bozena Hanczaruk #, Khoa D Nguyen **, Kari C Nadeau **, Michael Egholm #, David B Miklos ††, James L Zehnder *,††, Andrew M Collins
PMCID: PMC4281569  NIHMSID: NIHMS606240  PMID: 20495067

Abstract

Individual variation in the Ig germline gene repertoire leads to individual differences in the combinatorial diversity of the Ab repertoire, but the study of such variation has been problematic. The application of high-throughput DNA sequencing to the study of rearranged Ig genes now makes this possible. The sequencing of thousands of VDJ rearrangements from an individual, either from genomic DNA or expressed mRNA, should allow their germline IGHV, IGHD, and IGHJ repertoires to be inferred. In addition, where previously mere glimpses of diversity could be gained from sequencing studies, new large data sets should allow the rearrangement frequency of different genes and alleles to be seen with clarity. We analyzed the DNA of 108,210 human IgH chain rearrangements from 12 individuals and determined their individual IGH genotypes. The number of reportedly functional IGHV genes and allelic variants ranged from 45 to 60, principally because of variable levels of gene heterozygosity, and included 14 previously unreported IGHV polymorphisms. New polymorphisms of the IGHD3-16 and IGHJ6 genes were also seen. At heterozygous loci, remarkably different rearrangement frequencies were seen for the various IGHV alleles, and these frequencies were consistent between individuals. The specific alleles that make up an individual’s Ig genotype may therefore be critical in shaping the combinatorial repertoire. The extent of genotypic variation between individuals is highlighted by an individual with aplastic anemia who appears to lack six contiguous IGHD genes on both chromosomes. These deletions significantly alter the potential expressed IGH repertoire, and possibly immune function, in this individual.


The mammalian immune system has the ability to respond to almost any Ag to which it is exposed because of the incredible diversity of lymphocyte receptor molecules. The diversity of Ab molecules is made possible by multiple sets of highly similar genes, which recombine to form functional VDJ gene rearrangements encoding the IgH chain and VJ gene rearrangements encoding the IgL chain (1). This combinatorial diversity is expanded still further by junctional diversity arising from exonuclease trimming of the recombining gene ends and from the essentially random addition of nucleotides by the enzyme TdT (2).

Although the complex IGHV gene locus was carefully mapped in the 1990s (3), the difficulties associated with the sequencing of these highly similar genes and pseudogenes have meant that the locus of a single individual has only been fully explored once (4). Even the initial versions of the human genome (5–7) were based on the previously reported locus sequence (4). A complete description of the IGHD locus has also only been published for a single individual (8). Detailed exploration of genotypic differences of the Ig gene loci between individuals is almost totally lacking.

Although the sequencing of the immunoreceptor gene loci remains a challenge, investigation of allelic variation of a single Ig gene is relatively easy. Many allelic variants were identified in the early years of investigation of the Ig gene loci. However, with the first sequencing of the H chain gene locus (4), most exploration of gene variants came to an end. Although hundreds of genes and allelic variants were identified in the 1980s and early 1990s, a mere handful of alleles have been identified since that time. We have recently re-evaluated the reported H chain V region gene sequences and have concluded that many alleles were reported in error (9–11). Using a bioinformatic approach, we also inferred the existence of a number of previously unreported polymorphisms but concluded that the description of genetic diversity in the H chain V region loci was essentially complete. The identification of alleles that are present in the population at low frequency and the identification of genes that are less commonly rearranged or expressed were not possible from the data sets of our earlier analysis.

We have recently performed high-throughput pyrosequencing of VDJ rearrangements from the genomic DNA of B cell populations in a series of human clinical specimens and have demonstrated the potential of this approach for monitoring of lymphoid malignancies and the healthy immune system (12). In this study, we extend these results to show the usefulness of deep sequencing of VDJ rearrangements to determine individual Ig locus gene segment genotypes. Sequencing thousands of VDJ gene rearrangements from the genomic DNA of an individual’s B cells allows the sets of germline IGHV, IGHD, and IGHJ genes that contributed to the rearrangements to be inferred and permits novel allelic variants to be distinguished from somatic hypermutation sequence changes. The rearrangements identified with this approach will include both functional VDJ rearrangements that contribute to the expressed RNA and Ig protein repertoire, as well as some sequences that are nonfunctional.

Large data sets of rearranged IGH sequences from peripheral blood B cells allow accurate genotyping of germline gene segments because of the opportunity to observe a given germline V, D, or J segment in many distinct, independently derived VDJ combinations, even when an individual carries previously unreported alleles in his or her genome. This is true even though pyrosequencing of VDJ genes from genomic DNA of peripheral blood will include many sequences that have accumulated somatic point mutations during an immune response. When an individual carries an unreported polymorphism, the polymorphism is typically identified in data sets by the absence of apparently unmutated rearrangements of a gene that is highly similar to the unreported polymorphism and by the presence of hundreds or even thousands of examples of rearrangements that include shared mismatches when aligned to that similar germline sequence. The analysis reported in this paper shows that the inference of the existence of previously unreported polymorphisms is possible even when the utilization frequency of the allele in VDJ rearrangements is relatively low. Relatively common copy number changes of particular gene segments can also be identified and likely contribute to variation in the IGH repertoire of different individuals.

Our finding that an individual with aplastic anemia has an apparent homozygous deletion of a major part of the IGHD repertoire suggests that investigations of Ig genotypes may provide important insights into the operation of the immune system in health and disease. Determining whether such holes in the available germline gene repertoire are common, or of clinical significance, will require widespread investigation of individual variation in the Ig gene loci.

Materials and Methods

Specimens

Specimens of human peripheral blood and a sample of tonsillar tissue were obtained under a Stanford University Institutional Review Board-approved protocol. Blood samples came from eight healthy individuals, one patient with acute myeloid leukemia, one patient with chronic myeloid leukemia, and one patient with aplastic anemia. The tonsil sample came from a healthy individual.

DNA template preparation

PBMCs were isolated by centrifugation of diluted blood layered over Hypaque 1077 (Sigma-Aldrich, St. Louis, MO). Column purification (Qiagen, Valencia, CA) or magnetic bead-based isolation (Magnapure; Roche Diagnostics, Indianapolis, IN) was used to purify DNA template. Frozen tonsil tissue was homogenized prior to genomic DNA purification.

PCR amplifications and high-throughput pyrosequencing

High-throughput amplicon pyrosequencing data obtained with a 454 instrument are presented in this paper and are derived from two independent experiments: one performed using FLX chemistry and one with Titanium chemistry, with amplicon sequencing reads beginning from the “B” primer in the manufacturer’s protocol in both cases (Roche, Branford, CT). PCR amplifications were performed using 200 ng PBMC genomic DNA as template for each reaction. Following PCR, amplicons were pooled in equal amounts and were then purified by 1.5% agarose gel electrophoresis and gel extraction as described previously (12). PCR was performed using the BIOMED-2 protocol and primers that were augmented for sequencing experiments by the addition of 5′ sequencing elements and by the addition of unique 6-, 7-, or 10-nt sample “bar codes” to the IGHJ primers as described previously (12). An additional 10-nt sample bar code was incorporated into the multiplexed IGHV gene segment primers in the analysis of specimens with the 454 Titanium chemistry (12). Sequence bar codes allowed for the pooling and bulk sequencing of many libraries and the subsequent retrieval of sequences that were derived from each sample.

Sequence data analysis

Sequences from each input specimen were sorted based on recognition of a perfect match to the sample bar code as well as a perfect match to the first 3 bases of the IGHJ common primer. Non-Ig sequences were filtered from the sequence sets prior to analysis. Alignment of rearranged IGH sequences to germline V, D, and J segments, and determination of V–D junctions and D–J junctions was performed using the iHMMune-align utility (13). Sequences containing base-pair insertions or deletions in the IGHV or IGHJ gene segments were filtered from the data set, based on the known error properties of pyrosequencing (14, 15). Duplicate sequences were identified in a postprocessing step on the basis of shared IGHV, IGHD, and IGHJ genes as well as shared N regions. Where duplicate sequences were identified that had differing levels of somatic point mutations, the least mutated sequence was retained in the data set.
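The sorting and duplicate-removal steps described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the pipeline used in the study: the bar code, the primer bases, the read layout (bar code immediately followed by the IGHJ primer at the start of the read), and the record fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rearrangement:
    """One aligned read; every field here is a hypothetical placeholder."""
    ighv: str
    ighd: str
    ighj: str
    n_regions: tuple  # non-templated (N) nucleotides at the V-D and D-J junctions
    mutations: int    # somatic point mutations counted in the alignment


def belongs_to_sample(read: str, barcode: str, primer_prefix: str) -> bool:
    """Require a perfect match to the bar code and to the first primer bases."""
    return read.startswith(barcode + primer_prefix)


def remove_duplicates(records):
    """Among records sharing V, D, J genes and N regions, keep the least mutated."""
    best = {}
    for rec in records:
        key = (rec.ighv, rec.ighd, rec.ighj, rec.n_regions)
        if key not in best or rec.mutations < best[key].mutations:
            best[key] = rec
    return list(best.values())


# Hypothetical inputs: a 6-nt bar code followed by the first 3 primer bases.
reads = ["ACGTACCTTGGA", "ACGTAACTTGGA", "ACGTACCTAGGA"]
kept = [r for r in reads if belongs_to_sample(r, "ACGTAC", "CTT")]

records = [
    Rearrangement("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02", ("GG", "AC"), 4),
    Rearrangement("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02", ("GG", "AC"), 1),
    Rearrangement("IGHV1-69*01", "IGHD6-13*01", "IGHJ6*02", ("T", ""), 0),
]
unique = remove_duplicates(records)
```

In this toy input only the first read passes both perfect-match requirements, and the two records that share genes and N regions collapse to the copy with a single mutation.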

Genotyping donors

Alignments were reviewed for each individual in the study to identify misaligned sequences. IGHV alignments were considered to be potentially in error where an individual’s data set included alignments to more than one allele of a particular gene. Where unique, unmutated alignments to a second or third allele were seen at a frequency that was >10% of the frequency of a more highly used allele, the alignments were accepted as correct if investigations could exclude PCR-mediated chimerism as an explanation for the sequences (16). Such chimerism was detected by separately aligning 5′ and 3′ portions of the IGHV sequences. Where small numbers of alignments were seen to a single allele, these were accepted as correct alignments if unmutated alignments were seen and if there was no other gene that was present in other VDJ gene rearrangements that could give rise to such a sequence by, in most cases, a single mutation. In a few cases, where hundreds or even thousands of alignments were seen to a particular gene, it was acknowledged that two and rarely three somatic point mutations could give rise to a sequence that aligned better to another gene. In this way, the IGHV genotype of each individual was progressively defined, and finally, misaligned sequences could be reassigned by reference to this genotype. A handful of sequences could not be realigned to another IGHV gene with absolute certainty, and in these cases, the sequences were removed from the analysis. This situation was confined to very rarely rearranged genes, such as IGHV4-30-2. Although this gene was undoubtedly found in a few VDJ rearrangements of some individuals, in others, a single example might be found where a VDJ sequence aligned to IGHV4-30-2 with many mismatches. This was not considered sufficient evidence of the presence of that germline gene in the individual’s genome.
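Two of the checks described above, the >10% frequency rule for a second allele and the detection of chimeras by aligning the 5′ and 3′ portions separately, can be illustrated with a short sketch. It assumes ungapped, equal-length comparison by mismatch count in place of a real aligner; the germline sequences and gene names are invented for the example.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatches between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_germline(segment: str, germline: dict, start: int) -> str:
    """Name of the germline sequence closest to `segment` over the same window."""
    end = start + len(segment)
    return min(germline, key=lambda name: hamming(segment, germline[name][start:end]))


def looks_chimeric(read: str, germline: dict) -> bool:
    """Flag a read whose 5' and 3' halves best match different germline sequences."""
    mid = len(read) // 2
    five_prime = best_germline(read[:mid], germline, 0)
    three_prime = best_germline(read[mid:], germline, mid)
    return five_prime != three_prime


def passes_frequency_rule(minor_unmutated: int, major_unmutated: int) -> bool:
    """True when the minor allele's unmutated alignments exceed 10% of the major's."""
    return 10 * minor_unmutated > major_unmutated


# Invented germline sequences, used only to exercise the functions.
GERMLINE = {"geneA*01": "AAAAAAAACCCCCCCC", "geneB*01": "GGGGGGGGTTTTTTTT"}
chimera_flag = looks_chimeric("AAAAAAAATTTTTTTT", GERMLINE)
normal_flag = looks_chimeric("AAAAAAAACCCCCCCG", GERMLINE)
```

The frequency rule is written with integer arithmetic so that the comparison at the 10% boundary does not depend on floating-point rounding.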

IGHD alignments were reviewed to identify misalignments to allelic variants. For example, where alignments were seen to both IGHD2-8*01 and IGHD2-8*02, the length of alignments, the extent of IGHD mutations, and patterns of 5′ and 3′ exonuclease removals of nucleotides from the IGHD genes were analyzed. It was assumed that data from highly similar alleles would, for example, show similar levels of exonuclease activity. Where differences were seen in comparisons of data sets from the different alleles, misalignments could be identified, and the IGHD genotype of each individual could be determined. iHMMune-align does not attempt to identify IGHD2-2 alleles, because the critical nucleotides that distinguish the three IGHD2-2 alleles are found at the ends of the sequences. Exonuclease removals make it impossible to determine the allelic source of most sequences, whereas N nucleotide addition can very easily make a sequence using one allele align perfectly to another. The large data sets analyzed in this study, however, allowed the unequivocal determination of the IGHD2-2 locus in most cases, because large sets of apparently unprocessed gene ends could be compiled and reviewed. IGHJ alignments were also reviewed, and where small numbers of alignments were seen to variant IGHJ6 alleles, these were reassigned to the alleles that were predominant in the VDJ gene rearrangements.

Identification of unreported polymorphisms

Levels of mutations in the IGHV, IGHD, and IGHJ genes were analyzed, and where large numbers of alignments to a particular gene carried shared mismatches, further investigations were carried out to identify putative unreported polymorphisms. Such sequences were accepted as putative polymorphisms where three conditions were met. First, the same IGHV sequence had to be found in association with a wide variety of IGHD and IGHJ genes. Second, the possibility that the sequences had arisen by somatic point mutation of another gene or allele had to be excluded. This was done on the assumption that no more than 5% of sequences would carry somatic point mutations of any particular nucleotide in a set of gene alignments, unless that nucleotide was a known G/C mutation hot spot, in which case no more than 10% of sequences would carry the mutation. Finally, the germline repertoire was exhaustively searched to exclude the possibility that an apparent polymorphism had been generated through PCR generation of chimeric sequences (16). As Ig genes may have evolved through gene conversion, genuine polymorphisms were also distinguished from possibly chimeric sequences by reference to expression patterns in different individuals. For example, the putative polymorphism IGHV3-11*p05 (11) appears like a chimeric sequence resulting from recombination between IGHV3-11*03 and IGHV3-21*01. The presence of alignments to this putative polymorphism in individuals who lack the IGHV3-11*03 allele is evidence in favor of the existence of this polymorphism.
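The second condition above, the 5% and 10% ceilings on somatic point mutation at a single nucleotide, lends itself to a simple screen. The sketch below covers that condition only, assumes alignments already trimmed to the same length as the germline sequence, and uses an invented germline sequence and invented counts.

```python
from collections import Counter


def exceeds_somatic_expectation(mismatch_count: int, total_alignments: int,
                                is_hotspot: bool = False) -> bool:
    """Apply the 5% ceiling, or 10% at a known G/C mutation hot spot."""
    limit = 0.10 if is_hotspot else 0.05
    return mismatch_count / total_alignments > limit


def candidate_positions(alignments, germline, hotspots=frozenset()):
    """Positions where one shared mismatch is too frequent for somatic mutation."""
    total = len(alignments)
    flagged = []
    for pos, germ_base in enumerate(germline):
        counts = Counter(seq[pos] for seq in alignments if seq[pos] != germ_base)
        if not counts:
            continue
        base, n = counts.most_common(1)[0]
        if exceeds_somatic_expectation(n, total, pos in hotspots):
            flagged.append((pos, germ_base, base, n))
    return flagged


# Invented data: 90 reads share a T->A change at position 3; 3 reads differ at position 6.
germline_seq = "ACGTACGT"
toy_alignments = ["ACGAACGT"] * 90 + ["ACGTACGT"] * 7 + ["ACGTACTT"] * 3
flagged = candidate_positions(toy_alignments, germline_seq)
```

Only position 3 is reported here, because the change at position 6 is carried by 3% of reads and therefore stays within the range attributed to somatic mutation.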

Results

A total of 210,606 sequences from 12 individuals were analyzed. After removal of duplicate sequences, 108,210 unique sequences remained. The sequences were typically ~250 nt in length. The number of unique sequences per individual ranged from 2,442 to 15,947, with a mean of 9,018. Analysis of sequences allowed the unequivocal identification of the genotype of the IGHV, IGHD, and IGHJ loci for each individual. Key features of the detected IGHV genes are summarized in Table I (17). Each individual studied had a unique IGHV genotype, and striking differences were seen between individuals. The total number of reportedly functional IGHV genes ranged from 40 to 46. Perhaps surprisingly, given the large number of IGHV allelic variants that have been reported, individuals were homozygous, on average, at 40.2 gene loci. All individuals carried some genes for which they were heterozygous, and one individual was heterozygous at 13 gene loci. There was clear evidence of relatively frequent gene copy number variants, because three alleles of IGHV1-69, IGHV2-5, IGHV2-70, IGHV3-11, IGHV3-30, IGHV3-48, and IGHV3-64 were seen in one or more individuals.

Table I.

The number of IGHV genes and allelic variants identified in VDJ gene rearrangements from 12 individuals

Mean No. per Individual ± SEM Range
Functional IGHV genes (a) 43.1 ± 0.5 40–46
Pseudogenes (a) 6.8 ± 0.3 4–8
Homozygous genes 40.2 ± 0.8 34–43
Heterozygous genes (two alleles) 8.6 ± 0.9 3–13
Heterozygous genes (three alleles) 1.1 ± 0.3 0–3
Unreported polymorphisms 1.8 ± 0.5 0–5

(a) Genes were classified as functional genes or as pseudogenes by reference to the IMGT information system (17).
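As a quick arithmetic check on Table I (an observation added here, not an analysis from the study), the mean number of genes detected per individual can be totalled in two ways: functional genes plus pseudogenes, or homozygous genes plus genes carrying two or three alleles. The values below are copied from the table; the variable names are illustrative.

```python
from math import isclose

# Mean numbers per individual, copied from Table I.
functional, pseudogenes = 43.1, 6.8
homozygous, two_alleles, three_alleles = 40.2, 8.6, 1.1

# Total detected genes, counted by functionality and by zygosity.
by_function = functional + pseudogenes
by_zygosity = homozygous + two_alleles + three_alleles

# The two totals should agree to within the rounding of the table entries.
totals_agree = isclose(by_function, by_zygosity, abs_tol=0.05)
```

Both totals come to about 49.9 genes per individual, so the two classifications in the table are mutually consistent.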

Many individuals studied were found to carry unreported polymorphisms as well as polymorphisms that we have previously inferred from bioinformatic analysis of public sequence databases (11). In many cases, hundreds of sequences were found that aligned perfectly to the new putative polymorphisms. Two of the new putative polymorphisms, designated in this study as IGHV1-2*p05 and IGHV3-20*p02, were found to align perfectly with genomic sequences (humIGHV071 and humIGHV154, respectively) in the VBASE2 database (18). Basic local alignment search tool searches of two additional putative polymorphisms, designated IGHV3-30*p20 and IGHV3-30*p21, identified a large number of previously reported VDJ rearrangements that appear to carry these alleles. All 14 putative polymorphisms are presented as partial gene sequences in Supplemental Table I. Although the existence of some of these polymorphisms is supported by hundreds of rearranged sequences, they will not be assigned gene names by the designated authority (the ImMunoGeneTics [IMGT] database) until genomic sequences are amplified from unrearranged germline genes. As we believe it is unlikely that all such germline sequences will be quickly reported, and as the availability of the most complete germline repertoire is critical for many studies, we have established a Putative Immunoglobulin Polymorphism database. All the polymorphisms identified in this study, and evidence supporting their existence, are accessible via this database (www.cse.unsw.edu.au/~ihmmune/IgPdb/).

The IGHV genes that were identified in rearrangements can be divided into core genes that were seen in all individuals studied and noncore genes that were seen in some but not all individuals. Table II shows that some core genes make a relatively trivial contribution to the repertoire, despite being a feature of all genomes studied to date. In contrast, some noncore genes make a substantial contribution to the repertoire of certain individuals. Although many genes contributed to widely varying percentages of the overall repertoire of rearranged VDJ genes, there is evidence that this may not solely be a result of selection, but rather may also be a consequence of the varying propensity for the different IGHV alleles to recombine with DJ rearrangements. For example, the allele IGHV7-4-1*01 was seen in 0.04 to 0.4% of all sequences in the individuals with this allele. The alternative allele IGHV7-4-1*02, which differs from IGHV7-4-1*01 by a single nucleotide but which encodes the identical amino acid sequence, was seen in as many as 3.6% of sequences. Preferential PCR amplification is unlikely to account for this difference in prevalence, given that the nucleotide distinguishing these alleles is distant from the binding sites of the IGHV primer used to amplify the sequences.

Table II.

Mean frequency and the range of frequencies of rearrangements of core IGHV genes, which were observed in VDJ gene rearrangements of all individuals, and of noncore genes, which were seen in some but not all individuals studied

Core Genes Mean ± SEM (%) Range (%) Noncore Genes Carriers Range (%)
IGHV1-2 1.66 ± 0.30 0.3–3.9 IGHV1-f 4 0.1–0.4
IGHV1-3 2.48 ± 0.32 1.2–4.5 IGHV3-13 5 0.01–0.04
IGHV1-8 2.16 ± 0.23 1.1–3.4 IGHV3-19 (P) 9 0.02–0.13
IGHV1-18 3.84 ± 0.27 2.1–5.1 IGHV3-30-3 5 2.6–6.8
IGHV1-24 0.58 ± 0.04 0.4–0.8 IGHV3-35 (ORF) 3 0.01–0.02
IGHV1-45 0.14 ± 0.02 0.1–0.2 IGHV3-43 8 0.04–1.0
IGHV1-46 2.65 ± 0.11 2.0–3.1 IGHV3-47 (P) 9 0.03–0.1
IGHV1-58 0.43 ± 0.04 0.3–0.6 IGHV3-52 (P) 2 0.01–0.02
IGHV1-69 6.16 ± 0.55 3.1–9.7 IGHV4-28 6 0.01–0.07
IGHV2-5 1.46 ± 0.17 0.2–2.3 IGHV4-30-2 7 0.05–1.2
IGHV2-26 0.70 ± 0.06 0.4–1.1 IGHV4-30-4 7 0.03–1.5
IGHV2-70 1.13 ± 0.10 0.7–1.6 IGHV4-61 4 0.6–2.1
IGHV3-7 3.70 ± 0.22 2.4–5.1 IGHV5-a 7 0.04–2.5
IGHV3-9 2.34 ± 0.29 1.1–3.9 IGHV7-4-1 9 0.04–3.6
IGHV3-11 3.16 ± 0.20 2.3–5.0 IGHV7-81 8 0.01–0.4
IGHV3-15 1.98 ± 0.18 1.2–3.4
IGHV3-20 0.57 ± 0.12 0.1–1.5
IGHV3-21 4.59 ± 0.21 3.5–6.3
IGHV3-22 (P) 0.15 ± 0.02 0.1–0.2
IGHV3-23 8.86 ± 0.61 6.0–13.7
IGHV3-30 5.89 ± 0.71 1.6–10.5
IGHV3-33 4.92 ± 0.45 3.0–7.8
IGHV3-48 3.78 ± 0.33 1.7–5.5
IGHV3-49 1.00 ± 0.05 0.8–1.3
IGHV3-53 1.84 ± 0.16 0.5–2.6
IGHV3-64 0.58 ± 0.10 0.1–1.1
IGHV3-66 1.01 ± 0.16 0.3–1.7
IGHV3-71 0.18 ± 0.03 0.1–0.4
IGHV3-72 0.26 ± 0.02 0.1–0.4
IGHV3-73 0.47 ± 0.04 0.3–0.7
IGHV3-74 1.78 ± 0.11 1.3–2.4
IGHV4-4 1.73 ± 0.24 0.4–2.6
IGHV4-31 2.60 ± 0.34 0.8–4.1
IGHV4-34 4.70 ± 0.52 2.7–9.7
IGHV4-39 4.82 ± 0.46 2.0–8.0
IGHV4-55 (P) 0.07 ± 0.01 0.03–0.2
IGHV4-59 5.76 ± 0.38 3.1–7.6
IGHV5-51 2.01 ± 0.22 0.5–3.0
IGHV6-1 0.80 ± 0.06 0.5–1.1
humIGHV177(P) 0.15 ± 0.02 0.1–0.2
humIGHV181(P) 0.56 ± 0.06 0.4–1.1

Data are presented for the 11 subjects from whom PBMC-derived sequences were obtained. The number of carriers of each noncore gene is also shown. Ps and ORFs are indicated.

P, pseudogene.

Where pairs of alleles were carried in the genomes of multiple individuals, the relative proportions in which the different alleles were seen were remarkably constant, as seen in Table III. This was true for individuals carrying two or even three alleles of a particular gene. In almost all cases, the proportions in which the alleles were observed were similar, and χ² analysis showed significant differences between individual levels of rearrangement for only 3 of 23 pairs and sets of three alleles. Interestingly, two of these three pairs included the IGHV3-48*02 allele. Between individuals, there were significantly different patterns of rearrangement of the IGHV3-48*01 and *02 alleles as well as the *02 and *03 alleles. This could result from the existence of variants of the *02 allele that rearrange at both high and low frequency. It is also possible that the unobserved 5′ region of the V segment upstream of the PCR primer sites used in preparing amplicons for sequencing could contain unreported polymorphisms defining a new allele that has a different propensity for rearrangement, compared with the *02 allele. Some rearrangement of pseudogenes was seen, and despite their extremely low frequency within the data sets, the relative proportions in which the pseudogenes were seen also showed a clear pattern (data not shown).

Table III.

The relative proportions of rearrangements that contain different alleles in individuals whose genotype includes multiple alleles

IGHV1-2*02: *04 75:25 72:28 62:38 74:27 83:17
IGHV1-3*01: *02 85:15 76:24 81:19 74:26
IGHV1-46*01: *03 69:31 70:30
IGHV1-69*01: *02 66:34 60:40
IGHV1-69*01: *06 70:30 72:28 69:31
IGHV2-5*01: *p11 60:40 47:53
IGHV2-5*10: *p11 14:86 17:83 15:85 15:85
IGHV2-70*01: *p14 66:34 61:39
IGHV2-70*01: *11: *p14 36:41:23 40:40:20
IGHV3-11*01: *p05 72:28 65:35 78:22
IGHV3-11*01: *03: *p05 47:39:13 52:38:9
IGHV3-15*01: *07 39:61 43:57
IGHV3-30*03: *18 21:79 32:68 29:71
IGHV3-30*03: *04: *18 13:55:32 8:69:23 13:49:38
IGHV3-33*01: *05 82:18 91:9 82:18 88:12 78:22
IGHV3-48*01: *02 (a) 67:33 25:75 64:36
IGHV3-48*01: *03 47:53 48:52
IGHV3-48*02: *03 (b) 38:62 64:35 44:56
IGHV3-64*02: *p07 28:72 24:76 21:79 12:88
IGHV3-66*01: *03 71:29 78:22 79:21
IGHV3-66*02: *03 38:62 33:67 11:89 21:79
IGHV4-4*02: *07 62:38 61:39 64:36 71:29 75:25
IGHV7-4-1*01: *02 (a) 74:26 11:89 19:81

Each column entry for each allele pair represents data derived from a single individual. Data from the tonsillar tissue-derived sequences are not included.

(a) Significantly different frequencies of rearrangement within group; p < 0.001.

(b) Significantly different frequencies of rearrangement within group; p < 0.05.
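The kind of within-group comparison flagged in the footnotes can be illustrated with a generic Pearson χ² test of homogeneity. This is a sketch only: the study does not describe how its test was set up, and because Table III reports percentages rather than read counts, the counts below are invented (100 reads per individual, in proportions resembling the IGHV3-48*01: *02 row).

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Each row is one individual; each column is one allele.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical read counts for three individuals carrying the same two alleles.
counts = [[67, 33], [25, 75], [64, 36]]
stat, dof = chi_square_homogeneity(counts)

# With exactly 2 degrees of freedom the chi-square tail probability is exp(-x/2).
p_value = math.exp(-stat / 2) if dof == 2 else None
```

The closed-form p-value holds only for 2 degrees of freedom; for other table shapes a statistics library would be needed. With these invented counts the statistic is about 44, far beyond the p < 0.001 threshold.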

We have previously suggested that 104 of the 226 reported germline IGHV genes include sequencing errors, or other faults, and should be removed from the available repertoire (11). Analysis of the 108,210 sequences in this study is generally consistent with this conclusion, although 4 of the 104 germline sequences that were absent in analysis of the earlier, smaller data set (n = 4718) were seen in this study. These sequences (IGHV3-d*01, IGHV7-81*01, IGHV3-35*01, and IGHV4-30-2*02) should therefore be retained in the available repertoire. Some doubt was raised in the earlier study regarding the existence or accuracy of other reported alleles. The existence of nine of these alleles (IGHV1-46*03, IGHV1-69*05, IGHV1-69*10, IGHV1-f*01, IGHV2-70*12, IGHV3-15*07, IGHV3-33*05, IGHV3-66*02, and IGHV4-31*01) was confirmed in this study through the identification of hundreds of perfect alignments to these alleles, as shown in Supplemental Table II. Three genes (IGHV1-c*01, IGHV3-35*01, and IGHV7-81*01) were found to rearrange in this study but have previously been defined as open reading frames (ORFs) by IMGT. IMGT defines ORFs as sequences with coding regions that are ORFs, but where defects have been reported in the recombination signal sequences or other regulatory elements. These three genes should be retained in the available repertoire. IGHV sequences are defined by IMGT as pseudogenes if the coding region includes stop codons or frameshift mutations. Few pseudogenes seem capable of rearrangement, but seven pseudogenes (humIGHV177, humIGHV181, IGHV3-22*01, IGHV3-47*02, IGHV3-52*01, IGHV3-71*01, and IGHV4-55*01) were seen to rearrange in this study and should also be included in the IGHV repertoire that contributes to combinatorial diversity. An updated version of the iHMMune-align IGHV repertoire is presented as Supplemental Table IV.

The IGHJ germline repertoire includes just six genes, of which four have multiple reported allelic variants. We have previously concluded that a number of these variants (IGHJ3*01, IGHJ4*01, IGHJ5*01, and IGHJ6*01) may have been reported in error (10). None of these alleles was seen in the current study. A probable new polymorphism of the IGHJ6 gene, which we designate as IGHJ6*p05, was seen in one individual (see Supplemental Table I). There were 902 alignments to this putative polymorphism, of which 588 were perfectly matched to the sequence. An updated version of the iHMMune-align IGHJ repertoire is presented as Supplemental Table V.

The IGHD gene repertoire is also relatively small, with few reported allelic variants. We have previously questioned the existence of IGHD3-3*02, IGHD3-10*02, and IGHD3-16*01 (9), and these alleles were not seen in the current study. We also previously inferred a new IGHD allele, IGHD3-10*p03, which in this study was seen in ~5% of the VDJ rearrangements of one individual. This individual also appears to carry another previously unreported IGHD polymorphism, which we term IGHD3-16*p03 (see Supplemental Table I). An updated version of the iHMMune-align IGHD repertoire is presented as Supplemental Table VI.

The 12 individuals in this study included a patient with aplastic anemia. This individual had a strikingly different pattern of IGHD gene rearrangement, and the rearrangement frequencies of different IGHD genes in this patient are compared with those of the 11 other individuals in Table IV. The patient with aplastic anemia has an almost complete absence of sequences that seem to use the IGHD1-7, IGHD2-8, IGHD3-3, IGHD4-4/4-11, and IGHD6-6 genes. Analysis of the relatively few alignments that were made to these genes shows that the IGHD alignments are almost invariably short and include more mismatches than alignments to the same genes in other individuals (Supplemental Table III). Virtually all these sequences were found to align equally well to alternative IGHD genes, and no unequivocal alignments could be found to any of the five genes.

Table IV.

Mean frequency and range of alignments to IGHD genes in 11 healthy individuals and in an individual with aplastic anemia

Mean ± SEM (Healthy Individuals) (%) Range (Healthy Individuals) (%) AA Sample
IGHD1-1 0.7 ± 0.1 0.5–0.8 0.5
IGHD2-2 7.4 ± 0.8 4.3–13.2 3.3
IGHD3-3 7.8 ± 0.9 2.9–11.3 0.3
IGHD4-4/4-11 (a) 1.0 ± 0.1 0.5–1.6 0.1
IGHD5-5/5-18 (b) 3.6 ± 0.3 2.4–4.4 2.9
IGHD6-6 3.6 ± 0.4 2.1–4.4 0.7
IGHD1-7 1.0 ± 0.2 0.5–1.0 0.1
IGHD2-8 1.2 ± 0.2 0.4–1.6 0.2
IGHD3-9 3.3 ± 0.4 1.8–5.2 4.7
IGHD3-10 10.5 ± 1.3 5.9–15.4 20.8
IGHD5-12 2.3 ± 0.2 1.4–3.0 2.4
IGHD6-13 7.5 ± 0.5 5.5–12.1 13.0
IGHD2-15 4.9 ± 0.3 3.0–5.6 3.6
IGHD3-16 2.0 ± 0.1 1.2–2.8 2.2
IGHD4-17 5.0 ± 0.3 3.6–6.3 4.4
IGHD6-19 7.6 ± 0.3 6.6–9.1 11.8
IGHD1-20 0.7 ± 0.1 4.2–6.9 5.6
IGHD2-21 2.4 ± 0.1 2.1–4.4 1.8
IGHD3-22 12.2 ± 0.8 11.3–14.8 13.0
IGHD4-23 1.4 ± 0.1 0.8–2.1 1.5
IGHD5-24 2.1 ± 0.1 1.8–2.6 1.6
IGHD1-26 5.1 ± 0.2 4.2–6.9 5.6
IGHD7-27 1.0 ± 0.1 0.6–1.7 1.4

Data for six IGHD genes that may not be present in the genome of the AA patient are in boldface.

(a) IGHD4-4 and IGHD4-11 have identical coding regions, but IGHD4-11 is associated with an apparently nonfunctional recombination signal sequence.

(b) IGHD5-5 and IGHD5-18 have identical coding regions, and both genes are normally functional. The alignments shown for the AA patient could all use the IGHD5-18 gene.

AA, aplastic anemia.
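The contrast between the AA sample and the other individuals can be made concrete by screening Table IV for genes whose alignment frequency in the AA sample falls far below the group mean. The sketch below uses a subset of rows copied from the table; the cutoff of one quarter of the mean is an assumption chosen for illustration and is not a criterion used in the study.

```python
# (mean % in the other 11 individuals, % in the AA sample), copied from Table IV.
TABLE_IV_SUBSET = {
    "IGHD2-2": (7.4, 3.3),
    "IGHD3-3": (7.8, 0.3),
    "IGHD4-4/4-11": (1.0, 0.1),
    "IGHD5-5/5-18": (3.6, 2.9),
    "IGHD6-6": (3.6, 0.7),
    "IGHD1-7": (1.0, 0.1),
    "IGHD2-8": (1.2, 0.2),
    "IGHD3-9": (3.3, 4.7),
    "IGHD3-10": (10.5, 20.8),
}

DEPLETION_RATIO = 0.25  # assumed cutoff, for illustration only


def depleted_genes(table, ratio=DEPLETION_RATIO):
    """Genes whose frequency in the sample is below `ratio` times the group mean."""
    return [gene for gene, (mean, sample) in table.items() if sample < ratio * mean]


depleted = depleted_genes(TABLE_IV_SUBSET)
```

Under this cutoff the screen returns IGHD3-3, IGHD4-4/4-11, IGHD6-6, IGHD1-7, and IGHD2-8, the same five genes named in the text, while IGHD5-5/5-18 is not flagged because its alignments could all derive from IGHD5-18.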

It is impossible to differentiate between the presence of IGHD4-4 and IGHD4-11 in VDJ rearrangements, because the coding regions of these genes are identical. IGHD4-11 is defined by IMGT as an ORF that is incapable of recombination because of an unusual 5′ heptamer recombination signaling sequence (19). The almost complete absence of alignments to IGHD4-4/4-11 therefore probably reflects an absence of rearrangements using the IGHD4-4 gene. It is also impossible to distinguish between the identical IGHD5-5 and IGHD5-18 genes. The alignment of sequences to the IGHD5-5/5-18 genes could be entirely the result of VDJ rearrangements using the IGHD5-18 gene, and certainly the individual concerned had the second lowest proportion of sequences that aligned to IGHD5-5/5-18. We therefore suggest that this individual may have a deletion of six contiguous genes: IGHD3-3, IGHD4-4, IGHD5-5, IGHD6-6, IGHD1-7, and IGHD2-8.
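The reasoning that leads from five depleted genes to a deletion of six contiguous genes can be checked with a few lines of code. The sketch assumes that the number after the hyphen in an IGHD gene name gives the gene's position along the locus, which matches the order in which the genes are listed in Table IV; the function names are illustrative.

```python
def locus_position(gene: str) -> int:
    """Return the trailing number of an IGHD gene name, e.g. 'IGHD3-3' -> 3."""
    return int(gene.rsplit("-", 1)[1])


def is_contiguous(genes) -> bool:
    """True when the genes occupy an unbroken run of locus positions."""
    positions = sorted(locus_position(g) for g in genes)
    return positions == list(range(positions[0], positions[-1] + 1))


# The five genes with almost no unequivocal alignments in the AA sample.
observed_absent = ["IGHD3-3", "IGHD4-4", "IGHD6-6", "IGHD1-7", "IGHD2-8"]
# The deletion proposed in the text, which adds IGHD5-5.
proposed_deletion = observed_absent + ["IGHD5-5"]

gap_without_d5 = not is_contiguous(observed_absent)
single_block = is_contiguous(proposed_deletion)
```

The five depleted genes alone leave a gap at position 5, whereas including IGHD5-5 yields a single block spanning positions 3 to 8, which is why one deletion is the simplest explanation.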

Discussion

The individual variation in the IgH chain V region gene locus that we report in this paper has been suspected (20) but is greatly clarified by examination of large numbers of independent VDJ recombinations from single individuals. IGH sequence data sets from 12 individuals presented in this study each included between 40 and 46 IGHV genes that are known to contribute to the functional, expressed repertoire. This increases to between 45 and 60 IGHV genes when allelic variants are considered. Although all individuals were heterozygous at some gene loci, they were homozygous at most gene loci. Given the extent of reported allelic variation in the H chain IGHV gene repertoire, this is perhaps surprising, but a substantial proportion of these allelic variants may have been reported in error (11). In addition, there have been no data on the frequency with which different alleles are carried in the population. It seems that most gene segments exist in the study population as single common alleles, with relatively rare allelic variants.

Most frequently rearranged IGHV genes are “core” genes that were seen in all individuals. Genetic differences between individuals usually involved “noncore” genes that each make a relatively modest contribution to the overall repertoire of VDJ rearrangements, although together they accounted for 13.5% of all IGHV sequences seen in the rearrangements of one individual (data not shown). IGHV3-30-3, IGHV4-30-2, and IGHV4-30-4 have been reported as a gene cluster that is only seen in the genome of some individuals (21). These genes were seen in 7 of the 12 individuals in this study.

Although copy number polymorphisms have not previously been reported for Ig genes, the presence of more than two alleles of certain genes in the genome of some individuals is not surprising, given the extent of such structural variation within the human genome and the repetitive nature of the IgH chain locus (22). There is certainly no reason to believe that the observation of multiple alleles is a result of sequencing errors. It is possible that some bar codes may have been misread. However, all bar codes were designed to differ at two or more base positions so that sequencing errors would be unlikely to convert one bar code into another. Rare chimeric PCR products could also result in swapped bar codes. If, however, contamination happened on the scale that would be required to account for the large numbers of triple allele alignments, it would lead to many genes being represented by more than two alleles in these individuals. In fact, only three individuals have more than one gene that was represented by more than two alleles. In one of these individuals, the IGHV1-69 gene and the IGHV2-70 gene were each represented by three alleles, suggesting a single duplication event involving these contiguous genes. It is also noteworthy that three individuals carried the same three IGHV3-30 alleles, whereas other individuals carried just two of the three alleles. Mutational hot spots and systematic errors, such as the formation of chimeric sequences, cannot therefore account for the observations.
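The bar code design constraint mentioned above, that any two bar codes differ at two or more base positions, can be checked with a short sketch. The bar code sequences below are hypothetical and are not those used in the study, and the check considers substitution errors only (insertions and deletions are not modeled).

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bar codes differ."""
    if len(a) != len(b):
        raise ValueError("bar codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two bar codes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def single_error_collisions(barcodes):
    """Pairs that a single substitution could convert into one another."""
    return [(a, b) for a, b in combinations(barcodes, 2) if hamming(a, b) < 2]


# Hypothetical bar codes, for illustration only.
example_barcodes = ["ACGTAC", "ACGTCA", "TGCATG"]

print(min_pairwise_distance(example_barcodes))    # smallest pairwise distance
print(single_error_collisions(example_barcodes))  # empty if the design holds
```

With a minimum pairwise distance of two, a single miscalled base yields a sequence that matches no bar code exactly, so the read can be discarded rather than assigned to the wrong individual.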

Individual variation was also seen in both the IGHD and IGHJ regions. In the IGHJ region, variation was confined to the IGHJ6 gene, where 7 of 12 individuals were homozygous for IGHJ6*02, one individual was homozygous for IGHJ6*03, three individuals were heterozygous for IGHJ6*02 and IGHJ6*03, and one individual produced rearranged sequences using IGHJ6*02 and the putative allele IGHJ6*p05. IGHJ6*02 and IGHJ6*03 define the previously reported “b” and “c” IGHJ haplotypes (23), where these IGHJ alleles are found in association with the IGHJ genes IGHJ1*01, IGHJ2*01, IGHJ3*02, IGHJ4*02, and IGHJ5*02. Variation of the IGHD region was seen in the IGHD2-2, IGHD2-8, IGHD2-21, IGHD3-10, and IGHD3-16 genes, although few individuals were heterozygous at any locus except IGHD2-2 and IGHD2-21.

The consequences of allelic differences are worthy of further scrutiny. The number of heterozygous loci varied substantially between individuals, leading almost certainly to variation in the diversity of the expressed H chain repertoire between individuals. And although many alleles differ from one another by just a single nucleotide, such differences can be significant. Allelic differences can lead to quite different Ag-binding properties. It has been shown, for example, that although IGHV3-23*01 may form Abs that are critical for defense against encapsulated Haemophilus influenzae, the IGHV3-23*03 allele is unable to do so (24).

This study also demonstrates that highly similar IGHV alleles may have different propensities for rearrangement and therefore for expression. These differences were even seen in alleles such as IGHV1-46*01 and IGHV1-46*03, where the different sequences encode identical amino acid sequences. It therefore seems unlikely that the variable frequencies of alleles in the data sets reflect differences in either self-reactivity or reactivity to foreign Ag. Differences could be the result of variation in the recombination signal sequences that are located at the 3′ end of each IGHV gene. Such variation is known to influence the frequency of rearrangement (25), but because recombination signal sequences have not been reported for all genes and have been reported for very few allelic variants, this cannot yet be determined.

Variations in the translational efficiency of different codons could also contribute to differences between alleles in their propensity to appear in successful VDJ rearrangements in B cells in peripheral blood. There are, however, often 2- or 3-fold differences in the rearrangement frequency, whereas the alleles in question often differ from one another by just a single codon. Translational efficiency is therefore unlikely to be the major determinant of the differences between alleles.
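As an illustration of how unequal usage of two alleles at a heterozygous locus might be quantified, the sketch below computes the fold difference between two read counts and an exact two-sided binomial p value under the null hypothesis that both alleles rearrange equally often. This is not the analysis performed in the study: the function names and the counts are hypothetical, and the test assumes that each sequence represents an independent rearrangement.

```python
from math import comb


def fold_difference(count_a, count_b):
    """Ratio of the larger count to the smaller count."""
    high, low = max(count_a, count_b), min(count_a, count_b)
    if low == 0:
        raise ValueError("fold difference is undefined when a count is zero")
    return high / low


def equal_usage_p_value(count_a, count_b):
    """Exact two-sided binomial p value for a 50:50 null hypothesis.

    Sums the probabilities of all outcomes that are no more likely than
    the observed split. Integer arithmetic keeps the tail sum exact.
    """
    n = count_a + count_b
    observed = comb(n, count_a)
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2**n


# Hypothetical counts for two alleles of one gene in one individual.
reads_allele_1, reads_allele_2 = 200, 100

print(fold_difference(reads_allele_1, reads_allele_2))
print(equal_usage_p_value(reads_allele_1, reads_allele_2))
```

A 2-fold difference across a few hundred independent rearrangements is far outside what equal usage would produce by chance, which is why differences of this size call for a mechanistic explanation rather than sampling noise.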

That particular alleles are seen to recombine at similar frequencies in different individuals is unlikely to be a consequence of similar histories of Ag exposure, for similar recombination frequencies were seen in VDJ rearrangements when analysis was confined to unmutated sequences (data not shown). These sequences must be derived primarily from naive B cells, and the frequency with which each allele is represented in the data set should therefore be unaffected by Ag selection. Those pseudogenes that are capable of recombination also showed consistency of rearrangement frequency between individuals.

The use of high-throughput sequencing to explore individual differences in the Ig locus is perhaps best illustrated by the certainty with which the genotype of the IGHD locus could be determined in this study. The possible importance of such investigations into individual genotypes is demonstrated by our identification in this study of a major deletion of IGHD genes in an individual with aplastic anemia. Such a deletion would significantly impact the possible expressed repertoire of this individual. It has been argued that variation in the expressed repertoire could lead to the emergence of self-reactive B cell clones (reviewed in Ref. 26). Whether or not an absence of IGHD genes may play a causal role in the development of aplastic anemia is deserving of further study.

Investigation of genotypic differences in the H chain V region gene locus requires a detailed understanding of the allelic variants that are found in the human population. We have previously argued that few common IGHV gene polymorphisms remain to be discovered, other than alleles of rarely expressed genes, and perhaps uncommon alleles of more frequently expressed genes (11). In this study, however, a large number of previously unreported polymorphisms were identified, and some of them were found to be relatively commonly carried in the genomes of the study population. The IGHV3-7*p03 allele, for example, was seen in 4 of 12 individuals and was found in between 1.0 and 2.0% of all VDJ rearrangements in those individuals. The IGHV2-5*p11 allele that we first inferred in an earlier study (11) was seen in all individuals examined and was found in ~1.4% of the 108,210 VDJ rearrangements analyzed. Many other putative polymorphisms were present at lower frequency in the sequence data set but were clearly detectable. Nine of the 14 new putative polymorphisms were each seen in just a single individual and were present in as few as 0.2% of the VDJ rearrangements of that individual. Nevertheless, this amounted to between 27 and 69 examples of the sequences, allowing even these putative polymorphisms to be identified with confidence.
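The relationship between allele frequency, sequencing depth, and the number of supporting sequences is simple arithmetic, sketched below. The total of 108,210 rearrangements, the 12 individuals, and the quoted frequencies come from the text; the function names are hypothetical, and the per-individual depths returned by `depth_for_count` only illustrate the calculation and are not figures reported in the study.

```python
from fractions import Fraction
from math import ceil

TOTAL_REARRANGEMENTS = 108_210  # from the text
N_INDIVIDUALS = 12              # from the text


def expected_count(frequency, depth):
    """Expected number of sequences for an allele at a given frequency.

    frequency is passed as a string (e.g. "0.014") so that the
    arithmetic is exact rather than subject to floating-point rounding.
    """
    return Fraction(frequency) * depth


def depth_for_count(frequency, n_reads):
    """Smallest depth at which the expected count reaches n_reads."""
    return ceil(Fraction(n_reads) / Fraction(frequency))


# An allele at ~1.4% of all rearrangements in the pooled data set.
print(round(expected_count("0.014", TOTAL_REARRANGEMENTS)))

# Mean number of rearrangements per individual.
print(TOTAL_REARRANGEMENTS / N_INDIVIDUALS)

# Illustrative only: depth at which a 0.2% allele would be expected
# to yield 27 or 69 supporting sequences.
print(depth_for_count("0.002", 27), depth_for_count("0.002", 69))
```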

The immunoreceptor gene loci are arguably the most complex loci in the mammalian genome, and this complexity has hampered the study of human Ig gene sequences. Nevertheless, insights into both protective and pathogenic immune responses have been gained over a period of 25 y by the reporting of thousands of sequences. Because the diversity of the peripheral blood IgH repertoire is thought to include, at a lower estimate, millions of distinct H chain sequences (12), the combined data of prior studies likely represent only a first glimpse of the available repertoire and of variation within human populations. Further application of high-throughput sequencing to the study of Ig and TCR genes, both in V(D)J rearrangements and via whole genome sequencing efforts now under way, will allow the repertoire of rearranged genes, and even individual variation in the available repertoire, to be explored in detail.

Supplementary Material

Supplementary Data

Abbreviations used in this paper

AA, aplastic anemia; IMGT, ImMunoGeneTics; ORF, open reading frame; P, pseudogene.

Footnotes

The online version of this article contains supplemental material.

Disclosures

The authors have no financial conflicts of interest.

References

  • 1. Tonegawa S. Somatic generation of antibody diversity. Nature. 1983;302:575–581. doi: 10.1038/302575a0.
  • 2. Benedict CL, Gilfillan S, Thai TH, Kearney JF. Terminal deoxynucleotidyl transferase and repertoire development. Immunol. Rev. 2000;175:150–157.
  • 3. Cook GP, Tomlinson IM, Walter G, Riethman H, Carter NP, Buluwela L, Winter G, Rabbitts TH. A map of the human immunoglobulin VH locus completed by analysis of the telomeric region of chromosome 14q. Nat. Genet. 1994;7:162–168. doi: 10.1038/ng0694-162.
  • 4. Matsuda F, Ishii K, Bourvagnet P, Kuma K, Hayashida H, Miyata T, Honjo T. The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus. J. Exp. Med. 1998;188:2151–2162. doi: 10.1084/jem.188.11.2151.
  • 5. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. The sequence of the human genome. Science. 2001;291:1304–1351. doi: 10.1126/science.1058040.
  • 6. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–945. doi: 10.1038/nature03001.
  • 7. Istrail S, Sutton GG, Florea L, Halpern AL, Mobarry CM, Lippert R, Walenz B, Shatkay H, Dew I, Miller JR, et al. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc. Natl. Acad. Sci. USA. 2004;101:1916–1921. doi: 10.1073/pnas.0307971100.
  • 8. Corbett SJ, Tomlinson IM, Sonnhammer ELL, Buck D, Winter G. Sequence of the human immunoglobulin diversity (D) segment locus: a systematic analysis provides no evidence for the use of DIR segments, inverted D segments, “minor” D segments or D-D recombination. J. Mol. Biol. 1997;270:587–597. doi: 10.1006/jmbi.1997.1141.
  • 9. Lee CEH, Gaëta B, Malming HR, Bain ME, Sewell WA, Collins AM. Reconsidering the human immunoglobulin heavy-chain locus. 1. An evaluation of the expressed human IGHD gene repertoire. Immunogenetics. 2006;57:917–925. doi: 10.1007/s00251-005-0062-5.
  • 10. Lee CEH, Jackson KJL, Sewell WA, Collins AM. Use of IGHJ and IGHD gene mutations in analysis of immunoglobulin sequences for the prognosis of chronic lymphocytic leukemia. Leuk. Res. 2007;31:1247–1252. doi: 10.1016/j.leukres.2006.10.013.
  • 11. Wang Y, Jackson KJL, Sewell WA, Collins AM. Many human immunoglobulin heavy-chain IGHV gene polymorphisms have been reported in error. Immunol. Cell Biol. 2008;86:111–115. doi: 10.1038/sj.icb.7100144.
  • 12. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B, Jones CD, Simen BB, Hanczaruk B, Nguyen KD, Nadeau KC, Egholm M, Miklos DB, Zehnder JL, Fire AZ. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 2009;1:12ra23. doi: 10.1126/scitranslmed.3000540.
  • 13. Gaëta BA, Malming HR, Jackson KJL, Bain ME, Wilson P, Collins AM. iHMMune-align: hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences. Bioinformatics. 2007;23:1580–1587. doi: 10.1093/bioinformatics/btm147.
  • 14. Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 2007;8:R143. doi: 10.1186/gb-2007-8-7-r143.
  • 15. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. doi: 10.1038/nature03959.
  • 16. Brakenhoff RH, Schoenmakers JG, Lubsen NH. Chimeric cDNA clones: a novel PCR artifact. Nucleic Acids Res. 1991;19:1949. doi: 10.1093/nar/19.8.1949.
  • 17. Lefranc MP, Giudicelli V, Ginestoux C, Jabado-Michaloud J, Folch G, Bellahcene F, Wu Y, Gemrot E, Brochet X, Lane J, et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 2009;37(Database issue):D1006–D1012. doi: 10.1093/nar/gkn838.
  • 18. Retter I, Althaus HH, Münch R, Müller W. VBASE2, an integrative V gene database. Nucleic Acids Res. 2005;33(Database issue):D671–D674. doi: 10.1093/nar/gki088.
  • 19. Giudicelli V, Lefranc MP. Ontology for immunogenetics: the IMGT-ONTOLOGY. Bioinformatics. 1999;15:1047–1054. doi: 10.1093/bioinformatics/15.12.1047.
  • 20. Li H, Cui X, Pramanik S, Chimge NO. Genetic diversity of the human immunoglobulin heavy chain VH region. Immunol. Rev. 2002;190:53–68. doi: 10.1034/j.1600-065x.2002.19005.x.
  • 21. Lefranc MP. Nomenclature of the human immunoglobulin heavy (IGH) genes. Exp. Clin. Immunogenet. 2001;18:100–116. doi: 10.1159/000049189.
  • 22. Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat. Rev. Genet. 2006;7:85–97. doi: 10.1038/nrg1767.
  • 23. Mattila PS, Schugk J, Wu H, Mäkelä O. Extensive allelic sequence variation in the J region of the human immunoglobulin heavy chain gene locus. Eur. J. Immunol. 1995;25:2578–2582. doi: 10.1002/eji.1830250926.
  • 24. Liu L, Lucas AH. IGH V3-23*01 and its allele V3-23*03 differ in their capacity to form the canonical human antibody combining site specific for the capsular polysaccharide of Haemophilus influenzae type B. Immunogenetics. 2003;55:336–338. doi: 10.1007/s00251-003-0583-8.
  • 25. Nadel B, Tang A, Escuro G, Lugo G, Feeney AJ. Sequence of the spacer in the recombination signal sequence affects V(D)J rearrangement frequency and correlates with nonrandom Vκ usage in vivo. J. Exp. Med. 1998;187:1495–1503. doi: 10.1084/jem.187.9.1495.
  • 26. Dörner T, Lipsky PE. Immunoglobulin variable-region gene usage in systemic autoimmune diseases. Arthritis Rheum. 2001;44:2715–2727. doi: 10.1002/1529-0131(200112)44:12<2715::aid-art458>3.0.co;2-l.
