Proceedings of the National Academy of Sciences of the United States of America. 2006 Feb 21;103(9):3192–3197. doi: 10.1073/pnas.0511280103

Heterogeneous but conserved natural killer receptor gene complexes in four major orders of mammals

Li Hao 1,*, Jan Klein 1, Masatoshi Nei 1,*
PMCID: PMC1413923  PMID: 16492762


The natural killer (NK) receptor gene complex (NKC) encodes a large number of C-type lectin-like receptors, which are expressed on NK and other immune-related cells. These receptors play an important role in regulating NK-cell cytolytic activity, protecting cells against virus infection and tumorigenesis. To understand the evolutionary history of the NKC, we characterized the C-type lectin-like NKC genes and their organization from four major orders of placental mammals, primates (human), rodents (mouse and rat), carnivores (dog), and artiodactyls (cattle) and then conducted phylogenetic analysis of these genes. The results indicate that the NKC of placental mammals is highly heterogeneous in terms of the gene content and rates of birth and death of different gene lineages, but the NKC is also remarkably conserved in its gene organization and persistence of orthologous gene lineages. Among the 28 identified NKC gene lineages, 4, KLRA1, KLRB1, CLEC2D, and CLEC4A/B/C, have expanded rapidly in rodents only. The high birth and death rate of these 4 gene families might be due to functional differentiation driven by positive selection. Identification of putative NKC sequences in opossum and chicken genomes implies that the expansion of the NKC gene families might have occurred before the radiation of placental mammals but after the divergence of birds from mammals.

Keywords: contraction–expansion, evolution, genomic organization

Natural killer (NK) cells are a group of lymphocytes that have intrinsic cytolytic activity against certain virus-infected and tumor cells. They are crucial in innate immunity, as demonstrated by their ability to kill target cells without prior exposure to pathogens (1). The ligands of the NK cell receptors (NKR) are mostly MHC class I molecules. Most NKRs belong to two large gene families, the Ig superfamily and the C-type lectin superfamily (CLSF) (2–4), each clustered in a different genomic region. The CLSF proteins are characterized by the possession of at least one C-type lectin-like domain (CTLD). The CLSF proteins are classified into 14 groups designated I–XIV, and each group is distinguished from other groups primarily by the type of additional domains its members share (5). The sequences of the CTLDs from the different groups are alignable, whereas the sequences outside the CTLD are generally not.

In the human species, genes encoding the CTLD-bearing NKRs are clustered in a single region, the NK receptor gene complex (NKC) on chromosome 12p13 (2, 4). The C-type lectin genes in the NKC belong to groups II and V of the 14 CLSF groups. Although many group V proteins are known to function as NKRs, the remaining group V and all group II proteins are not expressed in NK cells and appear to have different functions (4). The original function of the CTLD appears to have been binding of carbohydrates in the presence of Ca2+ ions. This function has been retained by most of the CTLDs in the 14 groups, but the CTLDs of the group V CLSFs have lost this function and have become involved in protein–protein interactions instead (5).

Previous studies (2, 4) described the conserved genomic structure of NKC in human, mouse, and rat. The gene content of the NKC from other species, the extent of variation in gene numbers among species, and the long-term evolutionary history of the NKC remain largely unknown. To address these issues, we searched six mammalian genomes and one avian genome, specifically, human, mouse, rat, dog, cattle, opossum, and chicken, for either the entire NKC genomic segment or the putative NKC sequences. By this large-scale, multispecies comparison, we have been able to trace the important evolutionary changes the NKC has undergone during the last 300 million years of its existence.

Results


To gain an insight into evolution of the NKC, in the first phase of the study, we searched the genomic databases of five mammalian species for C-type lectin-like NKC genes. The five species represent four orders of placental mammals: Primates (human), Rodentia (mouse and rat), Carnivora (dog), and Artiodactyla (cattle). We also determined the position of the individual genes in the genomes and classified them in terms of their homology to genes identified by other investigators. The results of this analysis are summarized in Figs. 1 and 2. Fig. 1 compares the organization of the NKCs in the five species, and Fig. 2 shows the phylogenetic relationships among the identified sequences. For additional information about the individual genes, see Table 1, which is published as supporting information on the PNAS web site.

Fig. 1.

Genomic structure of NKC in human, mouse, rat, dog, and cattle. Each pointed triangle represents a gene. Note that the distance between genes is not to scale. Green triangles, KLR genes; red triangles, CLEC-type genes; blue triangles, CD69 and OLR1 genes; orange triangles, CLEC15A and CLEC16 genes, which are identified in this study. The pseudogenes are labeled with ‘p’ at the end of the gene names. The light red line indicates orthologous relationships between genes of different species. Red bars indicate orthologous gene groups. The name of each gene in the rodent CLEC4A/B cluster is not shown because of space limitation. The asterisks indicate deviation from standardized nomenclature of human and mouse genes. See Table 1 for details. N, number of genes; chr, chromosome; Mb, length of NKC in megabases; Un, unassigned genomic contigs; //, segments containing non-CLSF genes in NKC. The positions of some of these non-CLSF genes [e.g., the gene GABARAPL1 (GABA-receptor-associated protein-like 1) located between OLR1 and KLRE1] are conserved in the NKCs of the five species of placental mammals, indicating that the genes were presumably present in the NKC in the most recent common ancestor of the five species.

Fig. 2.

NJ tree of CTLD sequences from 172 NKC proteins identified in five placental mammals. Color code: human, blue; mouse, red; rat, purple; cattle, black; dog, green. The topology of this tree was obtained by using p-distance; the option of pairwise deletion was used. Bootstrap values are shown for orthologous groups. The asterisks indicate deviation from standardized nomenclature of human and mouse genes. See Table 1 for details.

Overall Organization of the NKC.

In four of the five species, the NKC occupies a single chromosomal region (12p13 in human, 6 in mouse, 4 in rat, and 27 in dog). In cattle, however, it is divided between two chromosomes, 1 and 5. In all of the five species, the NKC contains genes of 2 of the 14 CLSF groups, II and V. In the three nonrodent mammals, these two groups occupy separate genomic segments, which are adjacent to each other in human and dog but on separate chromosomes in cattle (Fig. 1). In the two rodent species, one of the group V genes (KLRG1) has been transposed to a location at the other end of the group II segment. The transposition, therefore, must have occurred before the divergence of mice from rats, but presumably after the divergence of rodents from the other three orders. The genes in the group II segment are arranged in the same order in the five species, except that the segment appears to be inverted in the dog. The possibility that the inversion is an artifact of an erroneous contig assembly could not be excluded, however. The group II segment does not seem to contain any proven or implied NK cell receptor (NKR) genes in any of the five species.

The group V segment contains NKR genes [designated killer cell lectin-like receptor (KLR)] as well as other genes encoding the CTLD [designated C-type lectin (CLEC)] whose function is largely unknown, except that their products do not appear to act as NKRs. One of the CLEC genes (CLEC2D, previously called Ocil or Clr) apparently codes for the ligand of one of the KLR genes (KLRB1; see refs. 6 and 7). The KLR genes cluster at one end of the group V segment, except KLRB1 and KLRF, which are intercalated among the CLEC2 genes (Fig. 1). Although the individual genes appear to have expanded and contracted by duplications and deletions during the evolution of the five species, the overall arrangement of the genes or gene groups has remained remarkably stable. This observation suggests that the expansions have occurred largely by tandem duplications and that the duplicated genes have remained clustered without subsequent rearrangements. The general order of the genes/gene clusters in the group V segment of the five species is: KLRG1-KLRB1-CLEC2D-CD69-CLEC15A-CLEC1C-CLEC1D-CLEC1B-CLEC9A-CLEC1A-CLEC7A-OLR1-KLRD1-KLRK1-KLRC-KLRA1, with some minor variations. Exceptions to this order are found mainly in cattle, in which some of the orthologues seemingly missing in the cattle group V segment are present in the unassigned genomic contigs. It is therefore possible that, when the genome is completely assembled, the genes will find their way back into the NKC. The orphan group V genes of cattle include KLRB1, KLRF1, KLRF2, CLEC1D, CLEC1B, CLEC9A, KLRH1, and KLRA1. They are clearly orthologues of their name-sakes in the other species (Fig. 2), but their position in the genome is uncertain at this stage.

Phylogenetic Relationships.

We collected 172 C-type lectin-like NKC sequences of the five mammalian species and used their corresponding protein sequences of CTLD to construct a phylogenetic tree by the neighbor-joining (NJ) method (Fig. 2). In the collection, we included pseudogenes with a complete CTLD-encoding part, undisrupted by any stop codons. The Supporting Sequence Alignment is published as supporting information on the PNAS web site, and the full-length amino acid sequences of those putative functional NKC sequences can be found in our database. The tree served two main purposes: to identify orthologous sequences in the five species and to examine changes during evolution of the different lineages of orthologous genes. Generally, the identification of orthologues posed no great difficulty because their clustering on the tree was supported by high bootstrap values. The only confounding situations were those in which the gene apparently duplicated one or more times in a given species. In such cases, identification of a single gene in that species as an orthologue was not possible. Examples are the expansions of the CLEC4A/B/C, CLEC2D, KLRA1, and KLRB1 genes in the two rodent species. Some of these duplications took place before and others after the divergence of mice and rats. For the purpose of the present discussion, we treat each such cluster as an orthologous group of the genes in other species.
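
The distance measure behind the NJ tree in Fig. 2 (p-distance with pairwise deletion) can be illustrated with a minimal sketch. This is not the software used in the study; the function names and the choice of "-" and "?" as gap or missing-data symbols are assumptions made for illustration. For each pair of aligned CTLD sequences, sites where either sequence has a gap are dropped for that pair only, and the distance is the proportion of the remaining sites that differ.

```python
# Assumption: "-" marks an alignment gap and "?" marks missing data.
GAP_CHARS = {"-", "?"}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites present in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: skip this site for this pair only
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("the two sequences share no comparable sites")
    return different / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of {name: aligned sequence}."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy alignment with made-up names and residues, for illustration only.
toy = {"seqA": "ACDE-", "seqB": "ACDFG", "seqC": "AC--G"}
matrix = distance_matrix(toy)
```

With pairwise deletion, each pair uses as many sites as it can, so a gap in one sequence does not remove that column from comparisons between the other sequences. A matrix of this kind is the input to an NJ algorithm.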

On the tree in Fig. 2, we identified 28 lineages of orthologous genes, including 11 KLRs, 15 CLECs, 1 OLR1, and 1 CD69. In addition, there were five singleton genes, of which four were from cattle and were mostly unmapped, whereas one, CLEC16, was found in the dog and seemed to be related to the KLRA genes. For simplicity, we did not take these five genes into account in the following discussion. Of the 28 gene lineages, 15 had an orthologue in each of the five species examined. The absence of an orthologue in one or more species in the remaining 13 lineages could be explained by either the incompleteness of the genomic data in the databases (especially in the dog and cattle genomes) or gene deletions in one species or in the ancestors of some of the species. Two of the 28 gene lineages (CLEC15A and KLRF2) were identified in this study. The CLEC15A gene was found in the mouse, rat, dog, and cattle, and the human orthologue may have been lost. The KLRF2 gene was identified in the human and cattle, although the human sequence appears to be a pseudogene; the gene has apparently been lost in the two rodent species and has not been found in the dog genome thus far. The remaining 26 human and mouse genes were either identified previously or annotated by the genome projects (Table 1). Most of the orthologous genes in the dog and cattle had not been described.

The determination of phylogenetic relationships among the different orthologous lineages has proved to be more difficult than the identification of the orthologues themselves, especially among the KLR genes. Most, if not all, of the lineages apparently diverged before the divergence of the four mammalian orders sampled, but presumably at different times, because some of the lineages appear to be related more closely to one another than do others. Among the CLEC genes, two well supported monophyletic clades of orthologous lineages can be recognized, the CLEC4 clade composed of the CLEC4A-E and CLEC4N genes and the CLEC2 clade containing the CLEC2A, CLEC2B, CLEC2D, and CD69 genes. On the tree, the CLEC15A gene also groups with the CLEC2 clade, although the bootstrap support for the grouping is low in this case. Other monophyletic clades, admittedly not well supported, are the CLEC1 clade composed of the CLEC1A-D genes and possibly also the CLEC7A and OLR1 genes; and the KLRB/F clade containing the KLRB1, KLRF1, and KLRF2 genes. The CLEC4 clade encompasses all the group II CLSF genes of the NKC; all the other clades contain group V CLSF genes. The groupings into clades are reflected in the standard HUGO nomenclature of these genes with minor modification (Table 1). For instance, the CLEC12A gene has been renamed to CLEC1C, because it is more closely related to CLEC1 than to any other genes. Similarly, the CLEC6A gene in humans has been renamed to CLEC4N.

Rapid Expansion and Contraction of Rodent NKC Genes.

The lengths of the NKCs in human, dog, and cattle are comparable (2.8 Mb, 2.4 Mb, and 3.3 Mb, respectively), as are the gene numbers (29, 22, and 32, respectively). However, the mouse and rat NKCs, encompassing 8.7 and 10.3 Mb, are 2.7 and 3.6 times longer than the human NKC, respectively. Corresponding to these length differences are differences in the number of genes: The human NKC (29 genes) contains only about one half of the genes present in the mouse NKC (57 genes) and slightly more than one third of the genes present in the rat (75 genes). These observations suggest that an expansion of genes occurred in the mouse/rat lineage compared with the lineage leading to the three other species and in the rat lineage in comparison with the mouse lineage.

This suggestion is borne out by the phylogenetic analysis. The tree in Fig. 2 indicates that several genes in the complex have duplicated repeatedly in the mouse and rat lineages. Gene duplications in other species (e.g., CLEC1B in human and KLRI in cattle) have occurred to a much lesser extent. Rodent gene expansions occurred in both the KLR- and CLEC-type genes and affected some genes (or some parts of the NKC) more than others. The KLRA gene underwent, by far, the most extensive expansions in the two rodent species. To keep the NKC maps and the tree within acceptable limits, we have limited the presentations of the mouse and rat KLRA cluster to just a few genes in Figs. 1 and 2. In reality, however, the KLRA cluster of the mouse contains 15 genes and that of the rat contains 34 genes (8–10). In the CLEC category, the two gene lineages most subject to duplications are CLEC4A/B/C and CLEC2D. In the former, there are five and four genes in the mouse and rat, respectively. In the latter, there are eight and eleven genes in the mouse and rat, respectively, of which four and two, respectively, might be nonfunctional. A phylogenetic tree of the mammalian CLEC2D genes, including pseudogenes, constructed from nucleotide sequences is shown in Fig. 3, which is published as supporting information on the PNAS web site. The tree suggests the existence of two ancestral CLEC2D lineages in the mammals tested, CLEC2Da and CLEC2Db. The CLEC2Db lineage might have become extinct in the two rodent species and the cattle. All rodent CLEC2D genes are derived from the CLEC2Da lineage, and they all form a single monophyletic clade supported by a high bootstrap value, suggesting that they started to diverge after the divergence of rodents from other mammals. Because of this phylogenetic relationship, we use a different nomenclature for the CLEC2D genes in rodents than the mouse genomic nomenclature committee (MGNC; see Table 1 for details). 
From these observations, the genes most prone to amplification appear to be some of those at or near the KLRA and CLEC4 ends of the NKC. This observed polarity may be accidental. It may not be coincidental, however, that one of the rapidly expanding genes (CLEC2D) has been identified recently (6, 7) as encoding the ligand of one of the KLR genes in the NKC (KLRB1). Interestingly, KLRB1 and CLEC2D genes are intermingled within the NKC. It is tempting to speculate that the expansions might be related to the coevolution of the ligand with its receptor. The fact that the orthologies of the expanded genes in rodents are difficult to ascertain suggests that the sequences might have diverged very quickly after the duplications. On the other hand, the tree in Fig. 3 suggests that some of the duplications took place before the divergence of the mouse and rat species >30 million years ago (11). Analysis of the presence and absence of genes in the different species suggests that, corresponding to the gains, there have also been gene losses in rodents (data not shown). Overall, rodent NKCs evolve more rapidly and are less stable than NKCs of nonrodents because of the high birth and death rate.

Positive Selection During the Early Expansion of the CLEC2D Genes in Rodents.

On the phylogenetic tree in Fig. 3, the branch A leading to the clade of the expanded rodent CLEC2D genes is rather long, suggesting the possibility that positive selection might have been involved in the expansion. To test this possibility, we inferred the ancestral CLEC2D sequences from the functional CLEC2D sequences (pseudogenes excluded) and then estimated the number of nonsynonymous (aN) and synonymous (aS) substitutions per sequence per branch. The results show that, after speciation at node a, there were 13 nonsynonymous substitutions but no synonymous substitution on branch A (Fig. 3). Fisher’s exact test, which compares the ratio aN/aS with the expected ratio (N/S), gives statistically significant support (P = 0.04) for the positive-selection hypothesis. During the early divergence of the rodent CLEC2Da genes (branches A, B, and C), there are a total of 47 nonsynonymous substitutions and only four synonymous substitutions. For comparison, in the lineage leading to the cattle CLEC2Da genes, there were 17 nonsynonymous and 35 synonymous substitutions. These observations suggest that a functional differentiation driven by positive selection occurred at the early stage of the divergence of rodent CLEC2Da genes. At this stage, however, it is unclear what kinds of functional change have occurred.
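
The logic of the test on branch A can be sketched as a one-sided Fisher’s exact test on a 2 × 2 table of changed versus unchanged nonsynonymous and synonymous sites. This is an illustrative sketch, not the authors’ implementation. The text does not report the numbers of potential nonsynonymous (N) and synonymous (S) sites, so the site counts in the usage example are hypothetical and the result is not expected to reproduce the reported P value. The function name and the choice of a one-sided test are likewise assumptions.

```python
from math import comb


def fisher_one_sided(a_n, a_s, n_sites, s_sites):
    """One-sided Fisher's exact test for an excess of nonsynonymous changes.

    2 x 2 table:          changed    unchanged
      nonsynonymous        a_n       n_sites - a_n
      synonymous           a_s       s_sites - a_s

    Returns the hypergeometric probability of seeing a_n or more
    nonsynonymous changes among the a_n + a_s changes, with margins fixed.
    """
    changed = a_n + a_s
    total = n_sites + s_sites
    upper = min(changed, n_sites)
    favourable = sum(
        comb(n_sites, k) * comb(s_sites, changed - k)
        for k in range(a_n, upper + 1)
    )
    return favourable / comb(total, changed)


# Observed counts on branch A are from the text (aN = 13, aS = 0).
# The site counts below are HYPOTHETICAL placeholders, not values from the paper.
HYPOTHETICAL_N_SITES = 270
HYPOTHETICAL_S_SITES = 90
p_value = fisher_one_sided(13, 0, HYPOTHETICAL_N_SITES, HYPOTHETICAL_S_SITES)
```

Because estimated site numbers are usually not integers, they would need to be rounded before being passed to a function like this one.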

Putative NKC Sequences of Opossum and Chicken.

The demonstration that the 28 orthologous NKC gene lineages diverged from one another before the divergence of the four placental mammal orders raises the possibility that the divergence might have occurred much deeper in the evolutionary past. To explore this possibility, we extended the search for NKC genes to the opossum, a representative of marsupial mammals, which diverged from the placental mammals ≈170 million years ago (MYA), and to the domestic fowl (chicken), a bird representative, which diverged from mammals ≈310 MYA (12). The genomes of these two species are currently being sequenced and, although the data thus far available are incomplete, they nevertheless afford us an insight into the long-term evolution of the NKC genes.

The search of the currently chromosome-wise unassembled opossum genome yielded eight sequences homologous to the NKC genes of placental mammals. We used their translated amino acid sequences, together with the chicken sequences (see below) and selected sequences of placental mammals, to construct a phylogenetic tree by the NJ method (see Fig. 4, which is published as supporting information on the PNAS web site). The criteria for the selection of the placental mammal sequences were full coverage of the previously identified 28 gene lineages (Fig. 2) and representation of one rodent (mostly mouse) and one nonrodent (human, in most cases) genes for each lineage. The observed good correspondence between the topologies of the trees in Figs. 2 and 4 indicates that the selection procedure did not bias the sample. On the tree in Fig. 4, four of the eight opossum sequences ally themselves with the CLEC4 cluster and so, presumably, represent CLSF group II genes; the remaining four sequences are apparently group V genes. One of the four opossum group II sequences, CLEC4E, shows an orthologous relationship to the placental mammalian CLEC4E genes. This observation indicates that the origin of the CLEC4E gene lineage and, hence, the divergence of CLEC4E from the other CLEC4 gene lineages occurred before the divergence of marsupial and placental mammals. The remaining three opossum group II sequences are in an outgroup position to the entire CLEC4 cluster together with a singleton cattle sequence, whose chromosomal location is undetermined. The existence of this outgroup cluster suggests that it might represent a new ancestral CLEC4 gene lineage or even some other group II CLSF genes. The four group V opossum CLSF genes appear to be orthologues of the CLEC1A, CLEC1B, KLRK1, and CD69 genes in placental mammals. In each case, the opossum sequence is in an outgroup position to the placental mammal members of the gene lineage, and the clade is supported by a high bootstrap value. 
Overall, these observations suggest that at least five NKC genes are present in the opossum and that they originated before the divergence of marsupial and placental mammals.

By searching the chicken genome, we identified ten NKC-like sequences, and, by searching a chicken EST database, we have found nine additional sequences. Two of these sequences were reported in ref. 13. From the phylogenetic tree (Fig. 4), only one group II CLSF sequence, which comes from the EST database, has been identified in the chicken, whereas the remaining 18 chicken sequences apparently represent CLSF group V genes. These two groups of CLSF genes, therefore, must have diverged from each other before the divergence of mammals and birds. The two previously reported chicken NKC sequences are on chromosome 16, linked to the chicken MHC (13). The genomic sequence corresponding to the first reported sequence, B-lec, has only the first two exons of the CTLD-encoding part on chromosome 16, whereas the third exon is located on an unmapped contig. Yet, the EST database contains a full-length CTLD sequence corresponding to B-lec. These discrepancies are probably caused by genome assembly errors, although the possibility of haplotype polymorphism has not been excluded. On the tree, another reported sequence, B-NK, forms an outgroup to the KLRB/F cluster, pushing its origin to the time before the bird–mammal split. The remaining 17 sequences are affiliated with the CLEC2 cluster of genes in placental mammals, one forming an outgroup to the pair of CLEC2A and CLEC2D lineages, and the other 16 chicken sequences, forming a monophyletic clade in an outgroup position to the group of CLEC2A, CLEC2B, and CLEC2D genes of placental mammals.

Discussion


Our data indicate that an NKC similar to the human and mouse complexes exists also in three other species of placental mammals: rat, dog, and cattle (Fig. 1). Because these five species represent four very different orders of placental mammals, it is probably safe to predict that all placental mammals have an NKC, although not always as a single region on one chromosome: some species may have it split, like the cattle, into two parts residing on different chromosomes. The incompleteness of the characterization of the opossum and chicken genomes, compared with the five placental mammals, precludes us from making any firm statements about the organization of the NKC genes in nonplacental mammals and birds at this stage. For convenience, we refer to the homologues in these species as putative NKC sequences, keeping in mind that they may not be clustered together in a manner similar to that of the five species of placental mammals.

In terms of the type of genes it contains and their evolution, the NKC of the placental mammals is a heterogeneous chromosomal region, and, to discuss its evolution, it is expedient to distinguish four levels of its heterogeneity. At the first level, the NKC genes fall into 2 of the 14 groups of CLSF genes, namely, groups II and V. We have demonstrated that genes belonging to these two groups are present not only in placental mammals, but also in marsupials and in birds. The two groups, therefore, must have separated from each other before the separation of mammals and birds. How far back in evolution the separation of the two groups occurred is uncertain. Group II genes have been described in teleost fishes, as have genes belonging to several other CLSF groups (14, 15). The presence of group V genes in teleosts is, however, controversial. Some authors have failed to identify them in genomes of those fishes that have been sequenced (15), but others (16, 17) have described whole clusters of such genes in certain teleostean orders. A gene with weak phylogenetic affinity to human group V genes has even been reported to be present in the genome of a protochordate (18). In all of the placental mammals tested, the group II and V genes of the NKC reside in separate genomic segments, either adjacent to each other on the same chromosome (human, mouse, rat, and dog), or on different chromosomes (cattle). Orphan genes belonging to these two groups may also be scattered over other genomic regions. Of the two different arrangements of group II and V NKC genes, the linked one seems to be more ancient, because it is more parsimonious than the alternative. Presumably, the separation of the two segments occurred in the evolutionary lineage leading to the artiodactyls. The divergence of group II and V genes presumably took place by tandem duplication, followed by a series of intragroup duplications. 
Although some of the duplicated genes may have been transposed to other chromosomes, the bulk of them has remained clustered together, the group II duplicates in one segment and the group V genes in an adjacent segment.

The second level of NKC gene heterogeneity is reflected in the distinction of the KLR from the CLEC genes. The KLR genes are unified by their expression in the NK cells and their function. The expression of an NKC gene in NK cells alone is apparently not sufficient to make it a KLR gene, because there are members of the CLSFs that are expressed in NK cells, yet do not function as KLR genes. Some of the CLEC genes apparently belong to this category (e.g., CLEC2D genes; ref. 6); several other CLEC genes, however, are expressed in different cell types but not in NK cells (19, 20). The CLEC genes lack a positive unifying feature that would distinguish them from the KLR genes; they are defined negatively as NKC genes whose products do not function as NK cell receptors. They will probably turn out to be a heterogeneous group with different genes specialized to different functions.

The third level of NKC gene heterogeneity is manifested in the existence of gene families, groups of genes belonging to different, but closely related, orthologous lineages. In addition to their close phylogenetic relatedness, members of the same family are generally also clustered on the genetic map of the NKC. These two characteristics identify families in both the group II and group V segments as well as in KLR and CLEC genes. The correspondence between the phylogenetic and physical clustering of genes within a family can be explained by the evolutionary history of the NKC genes, specifically by their origin by tandem duplication and the retention of their positions after the duplication events.

The fourth level of NKC gene heterogeneity is the differentiation into the individual genes, most of which have orthologues in the different species of placental mammals. At this level, too, there is a considerable degree of evolutionary conservation manifested in the existence of the 28 orthologous lineages presumably retained throughout the evolution of the placental mammals. The divergence of the 28 NKC gene lineages apparently predates the radiation of placental mammals. How far back the divergence occurred is not clear. The three NKC gene lineages identified in the chicken imply that the NKC might have started to expand after the divergence of birds from mammals.

The heterogeneity at all of the four levels reveals a surprising and seemingly paradoxical feature of NKC evolution. Much of the heterogeneity implies genomic instability, yet the complex also displays a remarkable degree of conservation. Behind much of the instability is the birth and death process, in which new genes are created by repeated gene duplications, and some duplicate genes can stay in the genome for a long time, whereas others may become pseudogenes and are eventually lost (21). Evidence for the process is apparent everywhere within the complex and in all of the species (Figs. 1 and 2), but nowhere is its effect as striking as in the four orthologous gene groups in the two rodent species: KLRA1, KLRB1, CLEC2D, and CLEC4A/B/C. The KLRA gene group shows an expansion that occupies almost half of the entire NKC in the two rodents. By contrast, there is only one KLRA gene in each of the other placental mammals studied thus far (22), except horse (23). Similarly, the CLEC2D lineage is much expanded in rodents, as are the KLRB1 genes (Fig. 3), which code for the receptor of the CLEC2D-encoded proteins. Why the two rodents are so strongly affected by the birth and death process is unclear. Behind the increased rate of gene birth and gene loss could be factors peculiar to all rodents or to the individual NKC genes. If the former were the case, there would have to be an across-the-board increase in the rate of gene duplication and loss in a variety of genes over the entire rodent genome. Although there seems to be some increase in the frequency of gene duplication/deletion in some other rodent gene families (24, 25), its magnitude is insufficient to explain the values observed in the case of the NKC. Hence, rodent NKC-specific factors might also play important roles in the expansion. 
An indication that this might be so is the observation of an increased rate of nonsynonymous substitutions in the lineage leading to the rapidly duplicated rodent CLEC2D genes (Fig. 3). The increase is indicative of positive selection, which might be driven by the need of these genes to coevolve with the expanding rodent KLRB1 genes encoding their receptors. For many of the other NKC-encoded KLRs, the ligands are encoded in the MHC genes (26, 27), which are well known for their instability in rodents. In some rodent species (e.g., the mole rat), the MHC class I genes have expanded to an estimated 100 copies (24), whereas in others (e.g., Syrian hamsters), they have been reduced to only a few genes (28). Other evidence supporting the existence of rodent NKC-specific factors is the observation that KIR genes, which belong to the Ig superfamily and perform functions analogous to the KLRA in rodents, are expanded into a multigene cluster in primates, cattle, and other nonrodent mammals, whereas only one or two KIR-like genes are present in the mouse and rat (2, 29, 30). Although the genomic instability manifested by the birth and death process may seem contradictory to the observed conservation of gene organization and of orthologous lineages, in fact, the two features might have the same basis: the coevolution of the receptors with their ligands. If so, one would expect the ligands to evolve by a similar interplay of destabilizing and stabilizing processes. Indeed, there are some tantalizing hints that this might be the case.

Materials and Methods

Characterization of C-Type Lectin-Like NKC Sequences.

We searched human (Homo sapiens), mouse (Mus musculus), and rat (Rattus norvegicus) genome sequences from ENSEMBL, assembly versions v27.35a, v27.33c.1, and v28.3e.1, respectively; dog (Canis familiaris) and cattle (Bos taurus) genome sequences deposited in NCBI, build number 2.1; opossum (Monodelphis domestica); and chicken (Gallus gallus) sequences deposited in the University of California, Santa Cruz, CA, database, October and February 2004. In the case of human, mouse, rat, dog, and cattle genomes, we first identified the approximate location of the NKC using the program tblastn (31), with the known mouse NKC protein sequences as queries. We then retrieved the entire sequence of the NKC-bearing chromosome and used it as a database to perform a homology search for the C-type lectin-like NKC sequences using the tblastn program. By using a permissive expectation-value cutoff (E = 10), we ensured that all C-type lectin-like NKC sequences would be retrieved. For the human and mouse NKC analysis, the queries were all annotated CTLD (INTERPRO domain ID: IPR001304)-containing protein sequences from the NKC region retrieved from ENSEMBL by using the data mining tool biomart. For the poorly annotated genomes of rat, dog, and cattle, we used as queries not only the ENSEMBL-annotated C-type lectin-like sequences in the NKC, but also all the human and mouse sequences identified as described above. Sequence fragments that contain only part or none of the CTLD were not considered.
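The hit-screening step described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study. It assumes that the tblastn hits are available in the standard 12-column tab-separated BLAST tabular layout, and it assumes that "containing the whole CTLD" can be approximated by requiring the aligned query interval to span the CTLD coordinates of the query protein. The names `Hit`, `parse_tabular`, `covers_ctld`, and `filter_hits`, the example rows, and the CTLD coordinates are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str      # query protein identifier
    subject: str    # subject (chromosome) identifier
    q_start: int    # first aligned query residue
    q_end: int      # last aligned query residue
    s_start: int    # alignment start on the subject
    s_end: int      # alignment end on the subject
    evalue: float   # expectation value reported by BLAST


def parse_tabular(text: str) -> List[Hit]:
    """Parse BLAST tabular output with the 12 standard tab-separated columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], int(f[6]), int(f[7]),
                        int(f[8]), int(f[9]), float(f[10])))
    return hits


def covers_ctld(hit: Hit, ctld_start: int, ctld_end: int,
                min_fraction: float = 1.0) -> bool:
    """True if the aligned query interval spans enough of the CTLD."""
    overlap = min(hit.q_end, ctld_end) - max(hit.q_start, ctld_start) + 1
    return overlap >= min_fraction * (ctld_end - ctld_start + 1)


def filter_hits(hits: List[Hit], ctld_bounds: Dict[str, Tuple[int, int]],
                max_evalue: float = 10.0,
                min_fraction: float = 1.0) -> List[Hit]:
    """Keep hits under the E-value cutoff that also span the query CTLD."""
    kept = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        start, end = ctld_bounds[hit.query]
        if covers_ctld(hit, start, end, min_fraction):
            kept.append(hit)
    return kept


# Hypothetical example rows; the identifiers and numbers are made up.
EXAMPLE = (
    "Klra1\tchr6\t45.0\t120\t60\t2\t100\t220\t5000\t5360\t1e-20\t95.0\n"
    "Klra1\tchr6\t30.0\t40\t28\t0\t150\t190\t9000\t9120\t5.0\t30.0\n"
    "Klra1\tchr6\t25.0\t130\t90\t3\t95\t225\t20000\t20390\t50.0\t20.0\n"
)
CTLD_BOUNDS = {"Klra1": (110, 215)}  # assumed CTLD coordinates in the query
```

In this example only the first row passes: the second aligns to part of the CTLD only, and the third exceeds the E = 10 cutoff. A permissive cutoff such as E = 10 admits many spurious hits, so a domain-coverage screen of this kind is one plausible way to discard fragments afterward.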

Because of low sequence similarity outside of the CTLD in the opossum and chicken genomes, we limited our search to the CTLD of homologous NKC sequences only. The CTLD sequences of the representative NKC genes from human, mouse, rat, dog, and cattle genomes identified above were used as queries to search the opossum and chicken genomes. We also searched one of the largest chicken EST databases, BBSRC chickEST, for sequences homologous to the CTLD of mammalian NKC genes.

Phylogenetic Analysis and Nomenclature.

We constructed a phylogenetic tree from protein sequences aligned by using the program mafft with the option of E-INS-I. We used the NJ method (32) in the program mega3 (33), with the pairwise deletion option. The proportional amino acid differences (p-distances) were used. The tree was evaluated by 1,000 bootstrap resamplings. In the case of CLEC2D genes, which contain several pseudogenes in rodents, we constructed a NJ tree of the nucleotide sequences with Jukes–Cantor distance. The nucleotide sequences of all ancestral nodes of the phylogenetic tree were inferred from the present-day putatively functional CLEC2D sequences by using the program anc-gene (34).
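The distance measures named above are simple enough to write out. The following Python sketch is illustrative only and does not reproduce mega3. It computes the p-distance with pairwise deletion (sites gapped in either sequence are skipped for that pair only), the Jukes–Cantor correction d = −(3/4) ln(1 − 4p/3) for nucleotide sequences, and one bootstrap replicate obtained by resampling alignment columns with replacement. The function names and the set of gap symbols are assumptions made for this example.

```python
import math
import random
from typing import Dict

GAPS = frozenset("-?")  # symbols treated as missing data (an assumption)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is skipped only if it is gapped in one of
    the two sequences being compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAPS or b in GAPS:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from a nucleotide p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}
```

For example, `p_distance("ACDE-G", "ACEEFG")` compares five sites, finds one difference, and returns 0.2. In a bootstrap analysis, the distance matrix and NJ tree would be recomputed for each replicate, and the fraction of replicates recovering a given clade gives its bootstrap support.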

For the gene names, we used the HUGO gene nomenclature and MGNC with minor modification, taking into account their phylogenetic relationship. Except for two gene names, CD69 and OLR1, all of the remaining NKC genes were named starting with either KLR or CLEC. The fourth and fifth letters (or numbers) refer to different orthologous genes or gene groups. We define an orthologous gene (or gene group) as a group of sequences from different species, all derived from a single most recent common ancestor.


We thank Wayne Yokoyama, Colm O’hUigin, and Yoko Satta for their comments. This work was supported by National Institutes of Health Grant GM020293 (to M.N.).



Abbreviations

C-type lectin
C-type lectin superfamily (CLSF)
C-type lectin-like domain (CTLD)
killer cell lectin-like receptor (KLR)
natural killer (NK)
NK receptor gene complex (NKC)
NK receptor (NKR)


Conflict of interest statement: No conflicts declared.


1. Seaman W. E. Arthritis Rheum. 2000;43:1204–1217. doi: 10.1002/1529-0131(200006)43:6<1204::AID-ANR3>3.0.CO;2-I.
2. Kelley J., Walter L., Trowsdale J. PLoS Genet. 2005;1:129–139. doi: 10.1371/journal.pgen.0010027.
3. Trowsdale J., Barten R., Haude A., Stewart C. A., Beck S., Wilson M. J. Immunol. Rev. 2001;181:20–38. doi: 10.1034/j.1600-065x.2001.1810102.x.
4. Yokoyama W. M., Plougastel B. F. M. Nat. Rev. Immunol. 2003;3:304–316. doi: 10.1038/nri1055.
5. Drickamer K., Fadden A. J. Biochem. Soc. Symp. 2002:59–72. doi: 10.1042/bss0690059.
6. Iizuka K., Naidenko O. V., Plougastel B. F., Fremont D. H., Yokoyama W. M. Nat. Immunol. 2003;4:801–807. doi: 10.1038/ni954.
7. Carlyle J. R., Jamieson A. M., Gasser S., Clingan C. S., Arase H., Raulet D. H. Proc. Natl. Acad. Sci. USA. 2004;101:3527–3532. doi: 10.1073/pnas.0308304101.
8. Hao L., Nei M. Immunogenetics. 2004;56:343–354. doi: 10.1007/s00251-004-0703-0.
9. Nylenna O., Naper C., Vaage J. T., Woon P. Y., Gauguier D., Dissen E., Ryan J. C., Fossum S. Eur. J. Immunol. 2005;35:261–272. doi: 10.1002/eji.200425429.
10. Wilhelm B. T., Gagnier L., Mager D. L. Genomics. 2002;80:646–661. doi: 10.1006/geno.2002.7004.
11. Nei M., Xu P., Glazko G. Proc. Natl. Acad. Sci. USA. 2001;98:2497–2502. doi: 10.1073/pnas.051611498.
12. Kumar S., Hedges S. B. Nature. 1998;392:917–920. doi: 10.1038/31927.
13. Rogers S. L., Gobel T. W., Viertlboeck B. C., Milne S., Beck S., Kaufman J. J. Immunol. 2005;174:3475–3483. doi: 10.4049/jimmunol.174.6.3475.
14. Soanes K. H., Figuereido K., Richards R. C., Mattatall N. R., Ewart K. V. Immunogenetics. 2004;56:572–584. doi: 10.1007/s00251-004-0719-5.
15. Zelensky A. N., Gready J. E. BMC Genom. 2004;5:51. doi: 10.1186/1471-2164-5-51.
16. Kikuno R., Sato A., Mayer W. E., Shintani S., Aoki T., Klein J. Scand. J. Immunol. 2004;59:133–142. doi: 10.1111/j.0300-9475.2004.01372.x.
17. Sato A., Mayer W. E., Overath P., Klein J. Proc. Natl. Acad. Sci. USA. 2003;100:7779–7784. doi: 10.1073/pnas.1235938100.
18. Khalturin K., Becker M., Rinkevich B., Bosch T. C. Proc. Natl. Acad. Sci. USA. 2003;100:622–627. doi: 10.1073/pnas.0234104100.
19. Flornes L. M., Bryceson Y. T., Spurkland A., Lorentzen J. C., Dissen E., Fossum S. Immunogenetics. 2004;56:506–517. doi: 10.1007/s00251-004-0714-x.
20. Yamanaka S., Zhang X. Y., Miura K., Kim S., Iwao H. Genomics. 1998;54:191–199. doi: 10.1006/geno.1998.5561.
21. Nei M., Gu X., Sitnikova T. Proc. Natl. Acad. Sci. USA. 1997;94:7799–7806. doi: 10.1073/pnas.94.15.7799.
22. Gagnier L., Wilhelm B. T., Mager D. L. Immunogenetics. 2003;55:109–115. doi: 10.1007/s00251-003-0558-9.
23. Takahashi T., Yawata M., Raudsepp T., Lear T. L., Chowdhary B. P., Antczak D. F., Kasahara M. Eur. J. Immunol. 2004;34:773–784. doi: 10.1002/eji.200324695.
24. Vincek V., Nizetic D., Golubic M., Figueroa F., Nevo E., Klein J. Mol. Biol. Evol. 1987;4:483–491. doi: 10.1093/oxfordjournals.molbev.a040458.
25. Grus W. E., Shi P., Zhang Y. P., Zhang J. Proc. Natl. Acad. Sci. USA. 2005;102:5767–5772. doi: 10.1073/pnas.0501589102.
26. Natarajan K., Dimasi N., Wang J., Margulies D. H., Mariuzza R. A. Mol. Immunol. 2002;38:1023–1027. doi: 10.1016/s0161-5890(02)00031-7.
27. Kabat J., Borrego F., Brooks A., Coligan J. E. J. Immunol. 2002;169:1948–1958. doi: 10.4049/jimmunol.169.4.1948.
28. Darden A. G., Streilein J. W. Immunogenetics. 1984;20:603–622. doi: 10.1007/BF00430319.
29. McQueen K. L., Wilhelm B. T., Harden K. D., Mager D. L. Eur. J. Immunol. 2002;32:810–817. doi: 10.1002/1521-4141(200203)32:3<810::AID-IMMU810>3.0.CO;2-P.
30. Hoelsbrekken S. E., Nylenna O., Saether P. C., Slettedal I. O., Ryan J. C., Fossum S., Dissen E. J. Immunol. 2003;170:2259–2263. doi: 10.4049/jimmunol.170.5.2259.
31. Zhang Z., Schaffer A. A., Miller W., Madden T. L., Lipman D. J., Koonin E. V., Altschul S. F. Nucleic Acids Res. 1998;26:3986–3990. doi: 10.1093/nar/26.17.3986.
32. Saitou N., Nei M. Mol. Biol. Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
33. Kumar S., Tamura K., Nei M. Brief. Bioinform. 2004;5:150–163. doi: 10.1093/bib/5.2.150.
34. Zhang J., Nei M. J. Mol. Evol. 1997;44:S139–S146. doi: 10.1007/pl00000067.

Supplementary Materials

Supporting Information
pnas_0511280103_3.pdf (22.1KB, pdf)
pnas_0511280103_1.pdf (307.5KB, pdf)
pnas_0511280103_2.pdf (31.4KB, pdf)
