Proceedings of the National Academy of Sciences of the United States of America. 2025 Oct 23;122(43):e2519372122. doi: 10.1073/pnas.2519372122

Exceptional diversity of allorecognition receptors in a nonvertebrate chordate reveals principles of innate allelic discrimination

Henry Rodriguez-Valbuena a, Jorge Salcedo a, Olivier De Thier b, Jean Francois Flot b, Stefano Tiozzo c, Anthony W De Tomaso a,1
PMCID: PMC12582321  PMID: 41129228

Significance

Polymorphic self/nonself discrimination systems are found throughout the metazoa, with discriminatory capabilities reminiscent of vertebrate adaptive immunity. However, there is no conservation of allorecognition genes between phyla, and invertebrate allorecognition depends on germline encoded receptors, thus the mechanisms underlying allelic discrimination are unknown. Botryllus is an invertebrate chordate with an allorecognition system that can discriminate between up to a thousand histocompatibility alleles. Here we characterize the Botryllus allorecognition receptors, revealing an unprecedented level of diversity and evolvability, as well as a genetic linkage to canonical ITAM/ITIM signal transduction and processing pathways. These results provide critical insight into the mechanisms responsible for allelic discrimination in Botryllus and can also explain the rapid birth and turnover of allorecognition receptors throughout the metazoa.

Keywords: allorecognition, structural immunology, immunogenetics, innate immunity, evolution immunity

Abstract

Highly polymorphic allorecognition systems have been characterized in numerous invertebrate species, and exhibit discriminatory capabilities reminiscent of vertebrate adaptive immunity. As these systems utilize germline encoded receptors, the mechanisms underlying allelic discrimination are unknown. The invertebrate chordate, Botryllus schlosseri, undergoes a natural transplantation reaction controlled by a highly polymorphic, polygenic locus (called the fuhc) with over 1,000 allelic haplotypes found worldwide. Two individuals are compatible if they share one or both fuhc alleles, and we had found that polymorphic discrimination is due to the integration of signals from two allorecognition receptors encoded within the fuhc locus, called fester and uncle fester. Here we show that these two receptors are members of an extended family consisting of >35 genes, now called the Fester family (FF), and coexpressed with members of another diverse gene family, the fester coreceptors (FcoR). Both FF and FcoR are Immunoglobulin superfamily members and each FcoR encodes conserved tyrosine signal transduction motifs, including ITIMs or hemITAMs. FF and FcoR are expressed and encoded as cognate pairs in two polymorphic haplotypes: one within the fuhc locus, and another on a separate chromosome, and remarkably, copy number variation between haplotypes is of gene pairs. Furthermore, two FcoR genes can swap ITIMs and hemITAMs by alternative splicing, suggesting that dynamic tuning of activating and inhibitory signaling is required for allelic discrimination. These results indicate that conserved signal processing mechanisms are the foundation of both allelic discrimination in Botryllus, and recurring convergent evolution of allorecognition receptors observed from invertebrates to mammals.


Allorecognition is found in species throughout the metazoa, and is the basis of diverse processes, including mate choice, mediating competition for space, and immune function (1, 2). The ability to discriminate self from nonself is due to the presence of polymorphic ligand(s), and an effector system with the capacity to discriminate between alleles. In turn, this requires both the creation and maintenance of genetic polymorphism for the ligand, while preserving the ability of the receptors to maintain specificity in the face of this constant diversification. Yet despite the ubiquity of allorecognition and the complex interactions required to maintain specificity, there is no conservation of candidate ligands or receptors. For example, in three well-studied models of invertebrate allorecognition (the Poriferan, Amphimedon queenslandica; the Cnidarian Hydractinia symbiolongicarpus and the Tunicate, Botryllus schlosseri) candidate genes appear in a punctuated manner, with no obvious homologs identified outside of familial clades (3, 4). But the mechanisms which allow the rapid and recurring evolution of unique and complex recognition systems are a complete mystery.

Studies in allorecognition led to the discovery of the ligands and receptors in both the adaptive and innate immune systems of mammals, and provide further examples of convergent evolution of polymorphic discrimination systems (5, 6). Both recognize the same ligand—a highly polymorphic MHC molecule bound to a short peptide (pMHC). However, they rely on distinct receptors and recognition mechanisms to discriminate among pMHC polymorphisms. Discrimination in adaptive immunity is carried out by T-cells, and due to differential binding of the T-cell receptor (TCR) to self vs. nonself pMHC complexes. This discriminatory ability is based on the structural chemistry at the TCR/pMHC interface, a property of each TCR selected via positive and negative selection in the thymus (7, 8). By contrast, innate allorecognition is mediated by Natural Killer (NK) cells, which use germline encoded inhibitory receptors to bind to oligomorphic epitopes found on multiple MHC class I alleles (9). This allows NK cells to determine the presence or absence of MHC molecules on the cell surface, a strategy called missing-self recognition (6). In both T- and NK cells, the outcome is dependent on integration of signals generated by receptors that either activate or inhibit immune function (10).

Both innate and adaptive receptors utilize the same signaling pathways: the TCR and NK activating receptors use the ITAM pathway, while inhibitory receptors in both cells use the ITIM pathway. Discrimination is due to an interaction between kinases, phosphatases, and adaptor molecules triggered following ligand binding (11–13). Finally, during development both T- and NK cells go through an autonomous education process, which quantifies binding potential of the expressed receptor(s) and sets an activation threshold that allows cells to be self-tolerant and functional (8, 14, 15).

Similar to the invertebrates, convergent evolution is the default state for vertebrate allorecognition. For example, T-cell recognition appears as a functioning unit in the earliest jawed vertebrates. However, no homologs of the MHC, TCR, or evidence of RAG-based somatic recombination are found in jawless fish, nonvertebrate chordates, or invertebrates (16, 17). NK receptors are evolving even faster: rodents use the C-type lectin Ly49 genes, primates use Killer-cell immunoglobulin-like receptors (KIRs) (18), and orthologs of these genes are not found outside of the mammals (19). However, despite no evolutionary relationship, all innate vertebrate allorecognition receptors, as well as nonvertebrate candidate allorecognition genes, share a similar genomic organization: they belong to multigene families that are encoded in linked, polymorphic haplotypes (3, 19–21). Moreover, in the vertebrates, Hydractinia (21), and Botryllus (this study), many of these genes encode ITAM or ITIM signaling motifs, partitioning them into activating and inhibitory functions. In summary, multiple unrelated allorecognition protein families in species that diverged over 700 Mya share a similar genomic organization, and likely use the same signaling pathways. In turn, this suggests the mechanisms that integrate and process information from cell surface binding events are conserved, providing a stable framework that permits rapid evolution of immune gene families (22).

Botryllus schlosseri is a Tunicate, a chordate subphylum that is the sister group of the vertebrates (23), and provides a powerful model to investigate the cellular and molecular mechanisms underlying polymorphic discrimination and the evolution of vertebrate immunity. Botryllus undergoes a natural transplantation reaction when two individuals grow into proximity. Terminal projections of the vasculature, called ampullae, come into contact (Fig. 1A). If two individuals are compatible, the ampullae will fuse, forming a parabiosis. If they are incompatible, the vessels will reject, an inflammatory reaction that prevents vascular fusion (Fig. 1 B and C). Fusion or rejection is controlled by a single, highly polymorphic locus called the fuhc (for fusion/histocompatibility) and discrimination occurs by missing-self recognition. Specifically, individuals are compatible if they share one or both fuhc alleles, but are incompatible if no alleles are shared (16, 24). The outcome is restricted to the ampullae in contact. For instance, a colony can simultaneously fuse on one side, and reject on another, and each ampulla of an fuhc heterozygote can recognize both self fuhc alleles (16, 25, 26).
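The compatibility rule described above can be written as a small predicate. This is a minimal sketch, assuming each colony's genotype is represented as a pair of fuhc allele labels; the function name `can_fuse` and the labels `A` to `D` are hypothetical and serve only as illustration.

```python
def can_fuse(genotype_1, genotype_2):
    """Return True if two colonies share at least one fuhc allele.

    Each genotype is a pair of allele labels, e.g. ("A", "B").
    Illustrative only: the labels are hypothetical, not real fuhc alleles.
    """
    return bool(set(genotype_1) & set(genotype_2))


# A heterozygote AB tested against three partners.
print(can_fuse(("A", "B"), ("A", "B")))  # both alleles shared
print(can_fuse(("A", "B"), ("B", "C")))  # one allele shared
print(can_fuse(("A", "B"), ("C", "D")))  # no allele shared
```

Only the last pairing shares no allele, which corresponds to the rejection outcome; the predicate says nothing about how the ampullae arrive at that decision.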

Fig. 1.

Allorecognition response and AlphaFold structures of FF and FcoR proteins. (A) Initial contact of ampullae between a blue (Bottom) and orange (Top) colony. Following contact, opaque morula cells migrate to the tips of the ampullae that are in contact (white arrow). Orange arrow points to an immune synapse formed between ampullae from the blue and orange colonies. Initial contact and morula migration are equivalent in compatible and incompatible pairings. (B) Rejection response between incompatible genotypes. Ampullae become leaky, and the morula cells discharge, releasing precursors of the prophenoloxidase pathway, resulting in the local deposition of melanin, forming a structure called a point of rejection (POR). Three POR formed during this response (white arrows). (C) Fusion between two blue compatible colonies. Initial ampullae fusion can be seen (Top white arrow). Within a few hours, the ampullae transdifferentiate into a vessel (Bottom white arrow). (D) Model of an FF family protein. Ig-like and Sushi domains have high pLDDT scores indicating high confidence in the structural prediction. (E) Overlap between FF Ig-like domain and antibody Immunoglobulin variable (V) domain. (F) Overlap among FF Ig-like domains (FF1 to FF45). (G) Model of FcoR protein. Three Ig-like domains and a Sushi domain are encoded in the extracellular region. (H) Overlap between FcoR first Ig-like domain and antibody Immunoglobulin variable (V) domain. (I) Overlap among FcoR first Ig-like domains (FcoR1 to FcoR71). Alternative splice sites in both genes are indicated with red arrows.

Botryllus populations have 200 to 500 fuhc alleles, with up to a thousand worldwide. The fuhc is codominantly expressed, and most individuals are heterozygous; thus, the effector system can pinpoint both self-alleles from hundreds of competing nonself alleles (25, 27). However, the mechanisms that allow an innate recognition system to carry out this level of discrimination are not understood. Previous studies provide some insight: the fuhc locus spans ca. 800 Kb and consists of two distinct regions. On one side are four polymorphic genes (bhf, fuhc-sec, fuhc-tm, and hsp40l) in a conserved 110 Kb linkage group called the fuhc-L, one or more of which encode the fuhc ligand (28). The other side encodes a dynamic, polymorphic receptor complex. We initially identified two genes in this complex: fester which is highly polymorphic, paralogous, and required for fusion, and uncle fester which is monomorphic and required for rejection (26, 29). We found that each gene controls an independent pathway: uncle fester activates the rejection response, while fester controls the fusion response. Signals from the two pathways are integrated, and the outcome determines if the ampullae fuse or reject (26, 29). While both missing-self recognition and signal integration were analogous to NK cell reactivity, neither protein encoded known signaling domains, and it was unclear if other receptors contributed to specificity.

Here we find that rather than being encoded in only two genes, fester and uncle fester are part of a large, diverse, multigene family that we now call the Fester Family (FF), and are coexpressed with members of another diverse family we are calling the fester coreceptors (FcoR). FcoR are type I transmembrane proteins that encode ITIM, hemITAM, and two other tyrosine motifs of unknown function. The FF and FcoR genes are encoded, expressed, and evolve as gene pairs on two chromosomes, one within the fuhc locus in chr 11, and another on chr 5. A third gene cluster, composed exclusively of FcoR genes, is encoded on chr 9. These results suggest that: 1) two families of immune receptors participate in the Botryllus allorecognition response; 2) allelic discrimination does not rely on a single high-affinity interaction, instead it is based on the integration of activating and inhibitory signals triggered from multiple binding events; and, 3) this conservation of signal processing underlies convergent evolution of allorecognition receptors across the metazoa, including the adaptive immune system.

Results

fester and Uncle Fester Are Members of a Large Gene Family.

fester and uncle fester share a similar domain structure; they are type I transmembrane proteins with an extracellular Ig and Sushi domain, followed by 3 transmembrane domains and a short intracellular tail (Fig. 1D). Overall, the two genes have a pairwise identity of 55%, but this similarity is not equally distributed across the protein. The extracellular domain has a low amino acid identity (37%), while the COOH half of the protein (three transmembrane domains and a cytoplasmic region) has high amino acid identity (91%; SI Appendix, Fig. S1 A and B) (26). During the analysis of the fuhc locus of a related ascidian (Botrylloides diegensis), we identified multiple low similarity (ca. 20% amino acid identity) FF homologs with the exact same secondary structure. These FF homologs were encoded in a syntenic region of the B. diegensis genome, near the fuhc-L. This finding made us wonder if we had missed other low similarity FF loci in our previous studies, so we searched for homologs in 26 bulk transcriptomes from wild-type and lab-reared individuals, and four genome databases of B. schlosseri (SI Appendix, Table S1). While both fester and uncle fester were expressed in many individuals (SI Appendix, Fig. S15), each individual was also expressing an average of 8 other low similarity FF homologs with the equivalent domain structure (SI Appendix, Figs. S3 and S15). In total, we identified 43 unique sequences: 26 were expressed by multiple (n > 5) individuals, while the remaining 17 were only identified in one or two individuals, and six of the latter were partial sequences that only encoded the ectodomain. Pairwise identity analyses revealed that these sequences ranged from 20 to 92% identity, with an average of 32% (SI Appendix, Fig. S1A). In contrast, the average pairwise identity of the human KIR loci is 78% (SI Appendix, Fig. S1A). This suggested that most of these 43 sequences were independent loci, while the closely related sequences were likely alleles of the same locus (SI Appendix, Fig. S1A). In summary, fester and uncle fester are part of a large, diverse multigene family. We are calling these sequences the fester family (FF), and fester and uncle fester genes have been renamed FF1 and FF3, respectively.
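The pairwise identity summary reported above (range and average over all sequence pairs) can be illustrated with a short sketch. It assumes the sequences are rows of one multiple alignment and that columns with a gap in either sequence are skipped; that choice of denominator is an assumption, not necessarily the convention used in this study. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either sequence has a gap are skipped,
    so the denominator is the number of columns aligned in both rows.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_summary(aligned):
    """Return (min, max, mean) percent identity over all sequence pairs."""
    values = [percent_identity(aligned[x], aligned[y])
              for x, y in combinations(sorted(aligned), 2)]
    return min(values), max(values), sum(values) / len(values)


# Hypothetical toy alignment (not real FF sequences).
toy = {
    "seqA": "MKTLLVA-GH",
    "seqB": "MKTLIVASGH",
    "seqC": "MRSL-VASAH",
}
print(identity_summary(toy))
```

Other conventions, such as dividing by the full alignment length or by the shorter sequence, give different percentages, so values are only comparable when the same definition is used throughout.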

Polymorphism and Diversity of the FF Family.

We had previously found that FF1 is highly polymorphic, and identified over 60 alleles of this gene in populations in the United States (29, 30), 23 of which were also found in this study. FF1 is also paralogous, as we had found two haplotypes, one encoding a single FF1 locus, another with two (29, 30). To estimate the number of unique loci among the 43 new FF sequences, we used the range of pairwise identities of both protein and nucleotides from the previously characterized FF1 alleles as a threshold (30) (SI Appendix, Fig. S1A). The two most divergent FF1 alleles were 78% identical at the amino acid level (SI Appendix, Fig. S1A), and 88% at the nucleotide level, so any of the 43 new sequences with pairwise identities above those values (over the entire alignment) were classified as alleles, and any below those values as unique loci. This suggested that only four of the new loci had multiple alleles and were oligomorphic. FF12 and FF22 had 2 alleles each, FF28 had 3 alleles, and FF13 had 4 alleles (SI Appendix, Table S3). In contrast, we identified 23 alleles of FF1 in the same databases. After merging the alleles from the 43 newly identified sequences, we classify 36 of these sequences as new FF genes. Excluding the four oligomorphic genes, 22 of the new loci were clearly monomorphic, as identical nucleotide sequences were identified in >5 individuals. However, for ten of the new loci, we could not estimate polymorphism because they were only expressed in one or two individuals (SI Appendix, Fig. S15).
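The locus versus allele classification described above can be sketched as a thresholded grouping. The two cutoffs (78% amino acid, 88% nucleotide) are taken from the text. Everything else is an assumption made for illustration: that a pair must exceed both cutoffs, that groups are merged transitively (single linkage), the function name `group_into_loci`, and the toy identity values.

```python
AA_THRESHOLD = 78.0  # % amino acid identity of the two most divergent FF1 alleles
NT_THRESHOLD = 88.0  # % nucleotide identity of the same pair


def group_into_loci(names, identities):
    """Group sequences into putative loci.

    identities maps frozenset({name1, name2}) -> (aa_identity, nt_identity).
    Assumption: a pair counts as alleles only if BOTH values exceed the
    thresholds, and groups are merged transitively (single linkage).
    """
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, (aa, nt) in identities.items():
        if aa > AA_THRESHOLD and nt > NT_THRESHOLD:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    loci = {}
    for name in names:
        loci.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in loci.values())


# Hypothetical identities for four sequences (not measured values).
names = ["s1", "s2", "s3", "s4"]
identities = {
    frozenset({"s1", "s2"}): (91.0, 95.0),  # above both thresholds: alleles
    frozenset({"s1", "s3"}): (32.0, 48.0),
    frozenset({"s1", "s4"}): (30.0, 45.0),
    frozenset({"s2", "s3"}): (33.0, 47.0),
    frozenset({"s2", "s4"}): (29.0, 44.0),
    frozenset({"s3", "s4"}): (80.0, 85.0),  # nucleotide identity below threshold
}
print(group_into_loci(names, identities))
```

In this toy input, `s1` and `s2` collapse into one locus with two alleles, while `s3` and `s4` remain separate loci because their nucleotide identity falls below the cutoff.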

To summarize, in a survey of 26 transcriptomes and 4 genomic databases, we found uncle fester (now FF3), multiple alleles of fester (now FF1), and also identified 36 new FF loci. Thus far, the FF gene family comprises 38 members: one polymorphic gene (FF1), four oligomorphic genes (FF12, FF13, FF22, and FF28), 23 monomorphic genes (including FF3), and ten genes that were too rare to characterize. However, as our locus/threshold metric is an estimate, and multiple genes were only found in one or two individuals, our results are only an approximation of both the overall diversity of the FF family, and the number of polymorphic loci. Nevertheless, 22 of the 36 new loci appear to be monomorphic.

Phylogenetic analysis showed that the FF proteins are grouped into approximately seven well-supported clades (bootstrap >96%, posterior probability >88%) (SI Appendix, Fig. S2 A-C). Moreover, there are two main clades of FF genes that correlate with their genomic locations (chromosomes 5 and 11; discussed below). In addition, each individual expressed a nearly unique repertoire of FF genes, with an average of 10 FF genes per individual, with a range of 7 to 13 (SI Appendix, Fig. S3).

The FF Proteins Encode a NH2-Terminal Ig-Like Domain.

FF proteins shared on average 36% and 41% protein identity in the extracellular and intracellular regions, respectively (SI Appendix, Fig. S1A). We next compared the predicted structure of all the FF proteins using AlphaFold (31) (Fig. 1 D–F and SI Appendix, Fig. S4). We confirmed the presence of a Sushi domain in the extracellular region of all FF proteins (Fig. 1D), but the new structural prediction techniques also detected an Immunoglobulin domain (Fig. 1D and SI Appendix, Fig. S4A). We found that 34 of 38 FF genes encode Ig-like domains that showed matches to Immunoglobulin Superfamily members with significant scores (e-value range 3.48E-01 to 4.59E-06; SI Appendix, Fig. S4A) using Foldseek (SI Appendix, Fig. S4C) (32). Moreover, structural alignment of the Ig-like domains of all FF proteins showed conservation of two β-sheets (average TM-score 0.59; Fig. 1F). We further characterized the β-sheets of the Ig-like domains with the icn3d software and found that 15 of 38 FF genes have a V-type Ig-like domain (SI Appendix, Fig. S4D; genes with different alleles were counted only once) (33). This is the case for the FF2 Ig-like domain, which contains nine β-strands, including a C’’ β-strand, features observed in V-type Ig domains (SI Appendix, Fig. S4E) (34). Supporting this classification, the overlap between the FF2 Ig-like domain and an antibody Ig domain with the icn3d software showed a high score (PDB: 5ESV; TM-score 0.62; Fig. 1E). These analyses demonstrate that despite low average protein similarity, the FF proteins share an equivalent domain structure: an ectodomain consisting of an N-terminal Ig-like domain with high similarity to vertebrate V-type Ig domains followed by a Sushi domain, and three transmembrane domains.

As described above, FF1 is the only highly polymorphic FF gene identified thus far, with 60 alleles identified in the United States (30). We mapped the variable positions of the FF1 alleles on the AlphaFold model of this protein (SI Appendix, Fig. S5 A and B), and found that 52 of the 116 variable positions (44.8%) were concentrated in the Ig-like domain of this protein, suggesting that this region is involved in ligand binding.

Alternative Splicing in the Extracellular and Intracellular Regions of FF Genes.

One common characteristic of the known FF genes (fester and uncle fester) was the alternative splicing of two extracellular exons between the Sushi domain and TM domain, moving the NH2-terminal Ig and Sushi regions relative to the plasma membrane (Figs. 1D and 4A) (29). Using transcriptomic data, we found that 26 of 38 genes spliced out 1, 2, or 3 exons in the extracellular region (SI Appendix, Fig. S6A). For example, the FF7 gene has 3 exons between the Sushi and TM domains, and seven of eight possible combinations were identified (SI Appendix, Fig. S7A). Based on the transcriptomic data, we also detected alternative splicing in 12 of 38 genes that spliced out 1 or 2 exons, or 2 introns in the intracellular region (SI Appendix, Fig. S6B). These COOH terminal splice variants generate new stop codons, changing the structure of the cytoplasmic tail (SI Appendix, Fig. S7A, described below). We also quantified the expression of alternative splicing variants for the FF genes with full genome sequence (FF1, FF3, FF12, FF13, FF22, FF26) across different Botryllus individuals (Fig. 4B and SI Appendix, Fig. S6C). We found that there was genotype-specific expression of different alternative splice variants, as we had observed previously for FF1 (29).
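The count of eight possible combinations for a gene with three alternatively spliced exons follows from each exon being either retained or skipped (2^3 = 8). A minimal sketch that enumerates them, using the hypothetical exon labels `e1` to `e3` and assuming the exons are skipped independently:

```python
from itertools import product


def cassette_isoforms(exons):
    """Enumerate every include/skip combination of optional (cassette) exons.

    Returns a list of tuples, each holding the exons retained in one isoform.
    """
    isoforms = []
    for choices in product((True, False), repeat=len(exons)):
        isoforms.append(tuple(e for e, keep in zip(exons, choices) if keep))
    return isoforms


# Hypothetical labels for the three exons between the Sushi and TM domains.
isoforms = cassette_isoforms(["e1", "e2", "e3"])
print(len(isoforms))  # 2**3 combinations
for iso in isoforms:
    print(iso if iso else "(all three skipped)")
```

The enumeration lists what is combinatorially possible; it does not indicate which seven of the eight variants were observed for FF7.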

Fig. 4.

Genomic structure and alternative splicing of FF and FcoR genes. (A) Genomic structure of gene pairs FF1/FcoR7 and FF3/FcoR1. Exons encoding the signal peptide (SP), Ig-like, and Sushi domains are indicated. FF genes exhibit alternative splicing between the Sushi and transmembrane domains. In both FF and FcoR genes, the first Ig domain is encoded in either 1 or 3 exons (pink). Thus far, all FF/FcoR pairs identified follow a 3:1 rule, whereby one gene in the pair has a single exon Ig domain, and the other has a three exon Ig domain. (B) Quantification of FF1 and FF3 splicing variants reveal genotype specific patterns. Full-length and splice variants were quantified from different Botryllus transcriptomes. Exons 6 and 7 (FF1) or exons 4 and 5 (FF3) are spliced out in these genes.

FF Genes Diversify by Intergenic Recombination.

FF1 (fester) and FF3 (uncle fester) share high amino acid sequence identity (91%) in the C-terminal regions, but low protein identity (37%) in the extracellular region (SI Appendix, Fig. S1A). To characterize the evolution of these regions, we independently analyzed the extracellular and intracellular (transmembrane and cytoplasmic) regions of the full-length FF proteins (SI Appendix, Fig. S8 A–E). Phylogenetic analysis of the intracellular region identified five well supported clades (bootstrap >95%, posterior probability 100%). However, most of the ectodomains of FF proteins that shared intracellular regions split into different clades when extracellular regions were analyzed independently, suggesting that the FF genes are diversifying by intergenic recombination, mixing and matching the ectodomains and intracellular domains. There was one exception, as one clade of FF genes did not exhibit intergenic recombination with other clades (orange in SI Appendix, Fig. S8 B–E). As discussed below, this is a result of the presence of two separate clusters of FF genes, each on a different chromosome (chr 5 and 11), restricting recombination to genes in each cluster (Fig. 3 B and C). Additionally, we analyzed the transmembrane domains and cytoplasmic regions of FF proteins (SI Appendix, Fig. S9). No conserved motifs were identified.

Fig. 3.

Haplotypic variation of FF and FcoR genes. (A) Partial physical maps of the α and β haplotypes of the fuhc locus from individuals in California. (B) Comparison of the maternal and paternal haplotype of the fuhc locus (Chromosome 11) from a single individual from the Mediterranean. (C) Comparison of two haplotypes encoding FF/FcoR pairs on chromosome 5, and (D) comparison of the two FcoR haplotypes on chromosome 9 from the same individual. FF and FcoR genes are highlighted in yellow and pink, respectively. Framework genes are highlighted in blue.

Identification of the FcoR Gene Family.

Our search for new FF genes also identified another group of conserved sequences, also initially identified in the B. diegensis fuhc locus. These encoded a type I transmembrane protein with an extracellular region that consisted of a signal sequence, N-terminal Ig domain, Sushi domain, and two further Ig domains, followed by a single transmembrane domain, and an intracellular tail of varying lengths (Fig. 1G). We identified 87 unique sequences in the same transcriptome databases where the FF genes were found (SI Appendix, Table S4). As described below, 20 of these sequences are alleles of a single locus, leaving 68 unique sequences. Of these 68, 45 contained a transmembrane domain, while 23 were partial ectodomain sequences without a predicted transmembrane domain and no stop codon (SI Appendix, Table S4). In addition, each full-length sequence also encoded either canonical ITIMs, hemITAMs, or two other tyrosine-based motifs in the cytoplasmic tails (SI Appendix, Fig. S13; discussed below). As outlined next, multiple genetic and genomic characteristics of these new proteins suggest they interact with FF proteins and trigger signaling pathways, and we are tentatively naming them the FcoRs.

The characteristics of the FcoR sequences are similar to the new FF loci. These sequences are also highly diverse, with an average of 32% pairwise amino acid identity, with a range of 20 to 98% (SI Appendix, Fig. S1A). Similar to the FF family, one of these loci (FcoR7) was highly polymorphic, and we identified 20 alleles, with most of the diversity concentrated in the ectodomain (SI Appendix, Figs. S1A and S5 C and D). As this was the only highly polymorphic FcoR gene identified, with nearly equivalent characteristics as FF1, we used the same similarity values described above to estimate the number of potential loci vs. alleles in the remaining 67 unique sequences (SI Appendix, Fig. S1A). Similar to the FF family, we also found seven groups of related sequences that are oligomorphic with 2 to 5 alleles of each locus (loci designated as FcoR2, 3, 4, 9, 14, 19, and 27; SI Appendix, Fig. S1A and Table S4). Thus, of the 68 FcoR sequences, 56 were predicted to be unique genes using these cutoff values. Of those, 38 contained a TM domain and 18 were partial sequences (SI Appendix, Table S4). Thirty of the remaining 56 sequences were only identified in one or two individuals (SI Appendix, Fig. S15B). In summary, the FcoR family consists of 56 loci: one is highly polymorphic (FcoR7), seven are oligomorphic (FcoR2, 3, 4, 9, 14, 19, and 27), and there are multiple monomorphic loci, with the same caveats as for the FF family. We are likely underestimating the diversity and polymorphism of the FcoR family, but many of the newly identified loci appear to be monomorphic.

Similar to the FF family, each individual expressed an almost unique repertoire of FcoR genes (SI Appendix, Fig. S15B), with an average expression of 14 FcoR genes/individual, and a range of 4 to 28 genes expressed/individual (SI Appendix, Fig. S3). Thus, multiple characteristics are identical between the two gene families.

FcoR Protein Structures Contain Ig-Like and Sushi Domains.

Structural prediction of FcoR proteins using AlphaFold revealed four domains in the extracellular region with high scores (pLDDT >80%), and a transmembrane domain (Fig. 1G). Comparisons of these domains with structures in the PDB database using Foldseek software identified the first (e-values 0.503 to 2.65e-05), third (e-values 7.82 to 1.16E-03) and fourth (e-values 1.76 to 1.52E-04) domains as Ig-like domains, while the second domain is a Sushi domain (e-values 1.03 to 3.60E-02) (SI Appendix, Fig. S10). Structural alignment of each domain with its counterpart in the PDB database using the Foldseek software showed high similarities (SI Appendix, Fig. S11A). The first Ig-domain structurally aligned with an antibody Ig domain (TM-score 0.59), while the Sushi domain aligned with the CD46 receptor (TM-score 0.55), the second Ig-like domain aligned with the C2-type domain of the CD22 receptor (TM-score 0.59) (35), and the third Ig-like domain aligned with Ig domains of the DIP proteins of Drosophila melanogaster (TM-score 0.66) (36). Additionally, we aligned the first Ig-like domain across the 26 FcoR proteins that were full-length, and found that this domain is shared with an average TM-score of 0.62 (Fig. 1I and SI Appendix, Fig. S13). We next analyzed the structure of those Ig-like domains with the icn3d software and found that in 7 of 26 FcoR proteins the first Ig-like domain was classified as V-type with nine β-strands, including a C’’ β-strand (Fig. 1H and SI Appendix, Fig. S11B). In contrast, the second and third Ig-like domains exhibited fewer β-strands than the first Ig-like domain, and lacked the C’’ β-strand, indicating that these Ig-like domains are not V-type (SI Appendix, Fig. S11B). For example, the second Ig-like domain of FcoR23 protein lacked the D β-strand, suggesting that it is a C2-type domain (35), while the third Ig-like domain of FcoR1 protein contained A to G β-strands (SI Appendix, Fig. S11B). Thus, these results indicate that the FcoR proteins contain three Ig-like domains and a Sushi domain in the extracellular region, and some of the first Ig-like domains were classified as V-type (36).

Intergenic Recombination and Alternative Splicing of FcoR Genes.

Phylogenetic analysis revealed that the unique FcoR sequences grouped into a higher number of clades compared to the FF gene clades (SI Appendix, Fig. S2 D–F). To test whether the FcoR genes diversify by mixing extracellular and intracellular regions, as we found for the FF genes, an independent analysis of the extracellular domains and TM/intracellular region was performed with the 43 FcoR proteins that contain TM domains (SI Appendix, Fig. S8 F–J and Table S4; FcoR25 and FcoR53 sequences were excluded). We found that the TM/intracellular region of the FcoR genes had a higher number of clades (eight clades with bootstrap >78%, posterior probability >80%) than the FF genes (five clades with bootstrap >95%, posterior probability 100%) (SI Appendix, Fig. S8 C, E, H, and J). As described below, FcoR genes are encoded in haplotypes on three different chromosomes, explaining this higher diversity. Moreover, we observed that FcoR7, FcoR33, and FcoR34 were grouped in different clades in the extracellular and intracellular trees, which suggests that these genes diversify via intergenic recombination (SI Appendix, Fig. S8 G–J). However, our results suggest fewer recombination events in FcoR genes than in the FF genes (SI Appendix, Fig. S8).

We also found alternative splicing in the extracellular and intracellular regions of FcoR genes (Fig. 1G and SI Appendix, Fig. S7). The most common splicing variant consisted of a deletion between the third Ig-like domain and the transmembrane domain (e.g., a small deletion in exon 8 of FcoR33), creating a shorter variant of the ectodomain but with no effect on the predicted TM region (SI Appendix, Fig. S6D). Less common splicing variants were also identified, including one that deletes a fragment of the exon that encodes the transmembrane domain (FcoR3; blue triangle), as well as several splice variants (intron retentions) in the cytoplasmic tail (FcoR3 and FcoR4; discussed below) (SI Appendix, Fig. S7B). However, alternative splicing of the FcoR genes was minimal when compared to the FF genes, and we did not detect any genotype-specific patterns.

Because the FF and FcoR proteins share the Ig-like and Sushi domains in the extracellular region (Fig. 1 D and G), we performed a phylogenetic analysis with these two domains to determine if these genes could possibly recombine with each other (SI Appendix, Fig. S12). We found that the FF proteins grouped in a clade (bootstrap 76%, posterior probability 78%), while the FcoR proteins grouped in multiple clades. However, one clade of FcoR proteins (FcoR7, FcoR10, FcoR33, and FcoR39; bootstrap 68%, posterior probability 78%) is nested within the clade of FF proteins. This suggests that while most of the FF and FcoR genes do not recombine with each other, it is possible that a small subset could.

FcoRs Encode Canonical ITIMs, hemITAMs, and Other Tyrosine Motifs in the Intracellular Tails.

Every FcoR protein sequence with a transmembrane domain and cytoplasmic tail longer than 60 aa encoded two or more tyrosine-based motifs (Fig. 2 and SI Appendix, Figs. S15 and S16). Three genes (FcoR3, 4, 7) encoded a canonical Immunoreceptor Tyrosine-based Inhibitory Motif (ITIM: I/L/VxYxxI/L/V), one gene (FcoR4) encoded a canonical hemi-Immunoreceptor Tyrosine-based Activation Motif (hemITAM: [E/D][E/D][E/D]xYxxL), and one (FcoR3) had a hemITAM-like motif, with a substitution at the first acidic residue (37). Two additional tyrosine motifs were identified, which we are calling core-1 (YxxI/L/V) and core-2 ([A/D]VYxxV), and with one exception, a core-1 or core-2 motif was found on every protein. Interestingly, core-2 motifs were only found in pairs, with the two motifs always separated by 29–33 residues, and encoded at the C-terminal end of the protein. In contrast, core-1 motifs were found in various permutations (between 1 and 4 per gene) with no obvious spatial patterning, and most genes with an ITIM or hemITAM also had at least one core-1 motif (SI Appendix, Fig. S15).
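The consensus patterns above translate directly into regular expressions. The following is a minimal sketch, not the annotation procedure used in this study: the function name find_motifs and the dictionary MOTIFS are illustrative assumptions, and the test peptides are the short motif sequences quoted in the text rather than full cytoplasmic tails.

```python
import re

# Consensus patterns transcribed from the text; "x" (any residue) becomes ".".
# The names and this scanning approach are illustrative assumptions.
MOTIFS = {
    "ITIM": r"[ILV].Y..[ILV]",        # I/L/VxYxxI/L/V
    "hemITAM": r"[ED][ED][ED].Y..L",  # [E/D][E/D][E/D]xYxxL
    "core-1": r"Y..[ILV]",            # YxxI/L/V
    "core-2": r"[AD]VY..V",           # [A/D]VYxxV
}


def find_motifs(seq):
    """Return {motif name: [0-based start positions]} for each motif found."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        starts = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
        if starts:
            hits[name] = starts
    return hits


if __name__ == "__main__":
    for peptide in ["LAYAIV", "FAYAIV", "EDDVYAIL", "PDDVYAIL", "VIYMTI"]:
        print(peptide, find_motifs(peptide))
```

Note that the core-1 pattern is contained within both the ITIM and hemITAM consensus, so a canonical ITIM or hemITAM is also reported as a core-1 hit by this sketch. A real annotation would need to decide whether to mask such overlaps.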

Fig. 2.

Two oligomorphic FcoR proteins can switch signaling motifs. (A) Alignments of the cytoplasmic tail of the FcoR4 alleles (A and B) and alternative splice variant (AS) of 4A. (B) Cytoplasmic tails of FcoR3 alleles (A and B) and three alternative splice variants (AS). ITIM, hemITAM and hemITAM-like, and tyrosine core motifs are highlighted. AS: alternative splice variant. Illustrations of alternative cytoplasmic regions are displayed at the top of each alignment.

Although the function of the core motifs is unknown, most permutations are similar to the sequences surrounding the target tyrosine in ITAMs, hemITAMs, ITIMs, ITSM, and ITT-like motifs, as well as several of the ZAP-70 substrates on LAT (37, 38). These motifs are also found in the innate immune receptors (NILTs) of teleost fish (19). In addition, several of the FcoR intracellular regions also encoded predicted SH2 (Src Homology 2) motifs. Most genes had combinations of motifs, and SH2 domains often contained a core-1 motif within the consensus sequence (Fig. 2 and SI Appendix, Figs. S13 and S14). No tyrosine motifs are found on any FF gene (SI Appendix, Fig. S9), and other signaling motifs, such as the ITT domain (YxNM) found on DAP10, were not found on any of the FcoR genes.

Two of the oligomorphic genes (FcoR3 and FcoR4) encoded alleles that differed in the cytoplasmic tail and swapped an ITIM and hemITAM motif in nearly the same pattern (Fig. 2). For FcoR4, one of the alleles (FcoR4B) encoded a canonical ITIM motif (LAYAIV) and a hemITAM-like motif (PDDVYAIL), while the other allele (FcoR4A) had a substitution in the ITIM motif (FAYAIV), and encoded a canonical hemITAM (EDDVYAIL) (Fig. 2A). We also found an alternative splice variant of FcoR4A that retained intron 12 and generated a premature stop codon (SI Appendix, Fig. S7B). This splice variant encoded another ITIM motif (VIYMTI) and excluded the exon encoding the hemITAM motif (Fig. 2A). FcoR3 had a similar allelic pattern but even more extensive alternative splicing (Fig. 2B). The FcoR3B allele encodes a canonical ITIM (VAYAIV), while the FcoR3A allele encodes an ITIM-like sequence (FAYAIV), and both alleles contain a hemITAM-like motif (ADDVYAIL). Two alternative splice (AS) variants for each allele were also detected that retained introns 10 (FcoR3B-AS2) or 14 (FcoR3A/B-AS1) and generated premature stop codons, which deleted the core motif or hemITAM-like region, respectively (Fig. 2B and SI Appendix, Fig. S7B). In summary, two of the oligomorphic FcoR receptors can switch signaling motifs via allelic polymorphism and alternative splicing, suggesting there may be plasticity to their functional roles.

Coexpression of Cognate FF and FcoR Gene Pairs.

Interestingly, analyses of the bulk transcriptome datasets from multiple individuals (either whole animal or dissected tissues) showed that the expression of each FF gene was often accompanied by a cognate FcoR gene (SI Appendix, Fig. S15). We evaluated the expression of cognate pairs of FF and FcoR genes using three strategies: 1) mapping with Kallisto, 2) mapping with RSEM, and 3) de novo assembly of the transcriptomes with Trinity, using 26 Botryllus transcriptomes (SI Appendix, Fig. S15 A–C). Based on these analyses, we identified multiple FF and FcoR gene pairs, including 14 gene pairs supported by all strategies, and 18 gene pairs supported by one or two strategies (SI Appendix, Fig. S15C). However, it was the pairings themselves that were striking. The two polymorphic family members (FF1 and FcoR7, both with ≥ 20 alleles) were expressed as pairs; likewise, the oligomorphic genes FF12 and FcoR3, and FF13 and FcoR4 (with 2–4 alleles each), were also expressed as pairs (SI Appendix, Fig. S15). All full-length alleles of FcoR7 encode an ITIM motif, while both FcoR3 and FcoR4 have alleles with an ITIM motif (Fig. 2 and SI Appendix, Figs. S14 and S15). The pairing of the only highly polymorphic genes of each family, FF1 and FcoR7, is consistent with previous functional studies, which suggested that FF1 was part of an inhibitory receptor that discriminated between polymorphic fuhc ligands, as it was required for fusion (29). In addition, the pairing of the oligomorphic FF (FF12, FF13) with oligomorphic FcoR genes (FcoR3, FcoR4) encoding an ITIM is consistent with the idea that the polymorphic FF genes would be involved in specificity, and thus be inhibitory receptors. Five other oligomorphic FcoR loci, FcoR2, FcoR9, FcoR14, FcoR19, and FcoR27 (2–4 alleles each), encode either SH2 or core domains but are not paired with FF genes (SI Appendix, Table S5). Interestingly, the latter, unpaired FcoR genes all are encoded on chromosome 9, whereas FcoR3 and FcoR4 are encoded on chromosome 5 (discussed below; Fig. 3).
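The sorting of candidate pairs by how many detection strategies support them can be sketched as a simple tally. This is an illustration only, not the pipeline used here: the function name support_by_strategy is an assumption, and the pair labels and their assignment to strategies in the example are hypothetical placeholders, not results from this study.

```python
from collections import Counter


def support_by_strategy(calls):
    """Split candidate pairs by how many strategies detected them.

    calls: {strategy name: set of (FF gene, FcoR gene) tuples}
    Returns (pairs found by every strategy, pairs found by only some).
    """
    counts = Counter(pair for pairs in calls.values() for pair in pairs)
    n_strategies = len(calls)
    supported_by_all = {p for p, c in counts.items() if c == n_strategies}
    supported_by_some = {p for p, c in counts.items() if c < n_strategies}
    return supported_by_all, supported_by_some


if __name__ == "__main__":
    # Hypothetical placeholder calls, for illustration only.
    toy_calls = {
        "kallisto": {("FF_a", "FcoR_a"), ("FF_b", "FcoR_b")},
        "rsem": {("FF_a", "FcoR_a"), ("FF_c", "FcoR_c")},
        "trinity": {("FF_a", "FcoR_a"), ("FF_b", "FcoR_b")},
    }
    print(support_by_strategy(toy_calls))
```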

In addition, our previous functional data suggested that the monomorphic FF3 (uncle fester) was an activating receptor (26), and it is paired with the monomorphic FcoR1, which encodes a dual core-2 motif in the intracellular tail (SI Appendix, Figs. S14 and S15). In the remaining pairs, both genes are monomorphic and the FcoR partner encodes either a hemITAM, core tyrosine motifs, or both. Interestingly, no monomorphic FcoR encodes an ITIM, and most encode a dual core-2 motif (SI Appendix, Fig. S15).

Physical Mapping of the FF and FcoR Genes.

Physical mapping of the FF and FcoR genes revealed that coexpression patterns observed in the transcriptomic data correlated to the genomic linkage of the two genes (Fig. 3 and SI Appendix, Fig. S15). We initially mapped the new FF and FcoR sequences back to the partial physical map of the fuhc locus, which consisted of seven scaffolds covering two fuhc haplotypes (fuhc α2 and β; Fig. 3A) identified in genotypes from California (29). In these haplotypes, we found that FF and FcoR genes are encoded head-to-head in pairs (FF1/FcoR7; FF3/FcoR1; FF22/FcoR42), which correlated with coexpression in the transcriptomic data from multiple wild-type individuals (Fig. 3A and SI Appendix, Fig. S15). Thus, copy number variation (CNV) between these haplotypes is of FF/FcoR pairs, consistent with the expression data.

Next, we analyzed the complete genome sequence of a single individual isolated from the Mediterranean coast of France (39). In this assembly, both haplotypes of each of the 16 chromosomes were recovered, allowing us to physically map the maternal and paternal chromosomes of this individual independently. We found that there are two FF/FcoR clusters in the genome, one within the fuhc locus on chromosome 11, and another on chromosome 5 (Fig. 3 B and C). The physical locations of the FF and FcoR genes correlate with phylogenetic analyses (SI Appendix, Figs. S2 and S8), as intergenic recombination is only observed between genes on the same chromosome. Importantly, every FF/FcoR pair found in the genome of the Mediterranean individual was also identified in individual transcriptomes or genome sequence from California individuals (Fig. 3 and SI Appendix, Fig. S15). Moreover, in the Mediterranean individual, one of the FF12/FcoR3 pairs on chromosome 5 is duplicated on one haplotype (SI Appendix, Fig. S16B), similar to the FF1 gene between two haplotypes in our lab-reared strains [described above; (29)]. In summary, analyses of individuals in California and France reveal that the members of two gene families are undergoing birth and death evolution as gene pairs.

In addition, a third FcoR cluster was found on chromosome 9 (Fig. 3D). This region also exhibited CNV of single FcoR genes; for example, FcoR57 and FcoR62 genes were found on only one haplotype. The presence of FcoR clusters on three chromosomes (chr5, 9, and 11) is also consistent with phylogenetic analyses, which had split the FcoR sequences into clades that correspond with their genomic location (SI Appendix, Figs. S2 E and F and S8 G–J).

Next, we compared the sequenced genomic haplotypes of the fuhc on chromosome 11 (SI Appendix, Fig. S16A). This confirmed our previous results: the fuhc locus is bounded by conserved framework genes and consists of two distinct parts. The 3′ region of the fuhc locus encodes the fuhc-L, and the genomic organization is conserved between four independent haplotypes from California and France (Fig. 3 A and B and SI Appendix, Fig. S16A). In contrast, the region spanning from bhf to GMP synthase genes at the 5′ region of the fuhc locus, which encodes the FF/FcoR genes, is highly variable between haplotypes (Fig. 3 and SI Appendix, Fig. S16A).

Comparison of the FF/FcoR clusters on chromosome 5 and the FcoR clusters on chromosome 9 tells the same story (Fig. 3 and SI Appendix, Fig. S16 B and C). Similar to mammalian NKC or LRC, polymorphic clusters are bounded by conserved framework genes, within which these receptors are diversifying rapidly by intergenic recombination and birth and death evolution (40). In summary, both the genetic and physical mapping suggest that the majority of diversification is of the FF/FcoR pairs of chromosomes 5 and 11, along with a diversification of FcoR genes on chromosome 9.

Interestingly, during analysis of the genomic organization of each FF/FcoR pair on chromosomes 5 and 11, we found that the intron/exon structure of each gene had an unexpected characteristic: the first Ig domains of FF and FcoR proteins are encoded in either one or three exons (Fig. 4A and SI Appendix, Fig. S7, pink exons). When we compared the exon structure of each gene in the FF/FcoR pairs, we found they followed a 3:1 rule in all cases. For instance, if the FF gene had a single-exon Ig domain, the FcoR partner always had a three-exon Ig domain. The opposite was also true: if the FcoR gene had a single-exon Ig domain, the partner FF gene had a three-exon Ig domain (Fig. 4A and SI Appendix, Fig. S7). This was similar for gene pairs on chromosome 11 (FF1/FcoR7, FF3/FcoR1, FF22/FcoR42, FF43/FcoR71, FF7/FcoR23, and FF26/FcoR33) and on chromosome 5 (FF12/FcoR3 and FF13/FcoR4). We did not observe FF and FcoR gene pairs where the first Ig-like domains of both members were encoded by a single exon (1:1) or three exons (3:3), indicating the existence of a genomic mechanism that maintains certain gene pairs, with the caveat that only a few haplotypes have been analyzed thus far.
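The 3:1 rule amounts to a simple check on each annotated pair: the first Ig domain must be encoded by one exon in one partner and three exons in the other. A minimal sketch follows, assuming exon counts have already been extracted from the gene models. The function names and the example exon counts are hypothetical and do not reproduce the annotations in Fig. 4A.

```python
def obeys_three_one_rule(ff_exons, fcor_exons):
    """True when one partner uses 1 exon and the other 3 for the first Ig domain."""
    return {ff_exons, fcor_exons} == {1, 3}


def find_violations(pairs):
    """Return the names of pairs that are 1:1, 3:3, or otherwise off-pattern.

    pairs: {pair name: (FF first-Ig exon count, FcoR first-Ig exon count)}
    """
    return [
        name
        for name, (ff_exons, fcor_exons) in pairs.items()
        if not obeys_three_one_rule(ff_exons, fcor_exons)
    ]


if __name__ == "__main__":
    # Hypothetical exon counts, for illustration only.
    toy_pairs = {"pair_1": (1, 3), "pair_2": (3, 1), "pair_3": (1, 1)}
    print(find_violations(toy_pairs))
```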

Finally, these analyses revealed a completely unexpected result: neither FF1 nor FF3 is ubiquitously expressed; instead, CNV also includes the FF1/FcoR7 and FF3/FcoR1 gene pairs (Fig. 3 and SI Appendix, Fig. S15). This reveals that our previous functional studies demonstrating that these genes were necessary and sufficient for fusion and rejection were genotype specific; there is not a single activation or single inhibitory pathway used by all individuals (26, 29). As discussed below, this surprising finding both explains the diversity of these two gene families, and provides crucial insight into the mechanisms that underlie allelic discrimination in Botryllus, and likely throughout the metazoa.

Genotype-Specific Expression of the FF and FcoR Genes.

The ampullae (Fig. 1) are highly regenerative, and following surgical ablation will regrow rapidly. We found that expression levels of FF1 and FF3 were genotype specific, and maintained in the ampullae even following multiple cycles of ablation and regeneration (29, 41). Genotype-specific expression may be due to an education process, during which inhibitory and activating signals are calibrated for allelic discrimination (29, 41, 42). We next quantified expression of all the new FF and FcoR family members in ampullae isolated from five genotypes following 3–4 cycles of ablation and regeneration. All FF and FcoR genes showed repeatable, genotype-specific expression patterns in all five genotypes that could differ by an order of magnitude in some cases (Fig. 5 A and B). As described next, that diversity was not observed in other Botryllus allorecognition genes (bhf, fuhc-sec, fuhc-tm, hsp40L, E3-ligase) or signal transduction genes in the ITAM/ITIM pathways.

Fig. 5.

Expression of allorecognition genes in Botryllus ampullae. RNA expression levels (y-axis, TMM units) of (A) FF, (B) FcoR and (C) fuhc-L and signal transduction genes (x-axis) are shown from isolated vascular tissue from five individuals (colored boxes) following 3–4 rounds of ablation/regeneration (see text). Expression of FF/FcoR gene pairs and chromosomal (chr) locations are indicated. Each individual expresses a unique repertoire of FF and FcoR genes. By contrast, fuhc-L (bhf, fuhc-sec, fuhc-tm, and hsp40l) and signal transduction genes were expressed in all individuals. Means and SD were calculated between three or four replicates per individual. Adjusted P-values for the ANOVA test are indicated: ***P < 0.001, **P < 0.01, *P < 0.05.

Botryllus Vasculature Expresses Homologs of ITAM and ITIM Signal Transduction Proteins.

Allelic discrimination occurs at the interface between two apposed ampullae. To determine if the ampullar cells also express ITIM/ITAM signal transduction proteins, we characterized gene expression in bulk RNA-seq transcriptomes from FACS isolated vascular cells (42, 43), as well as quantifying expression in the transcriptomes from regenerated ampullae of five genotypes (Fig. 5C and SI Appendix, Fig. S17). In addition to the FF and FcoR genes, we identified homologs of nearly every signal transduction protein used in ITAM and ITIM signaling in T-cells and NK cells (Fig. 5C and SI Appendix, Fig. S15), including receptor tyrosine phosphatases with homology to CD45, Src, and SYK family kinases, SHP-1, SHIP, VAV, PI3K, PLC-γ, ITK, GADS, Grb2, Crk, RasGRP, PKC-θ, the entire MAPK pathway, CARMA1, calcineurin, calmodulin, NFAT, and NF-κB. However, there are two notable exceptions: homologs of LAT and SLP-76 are not found in either Botryllus or other ascidian genomes (39, 44). Importantly, the expression levels of the signal transduction genes are similar between genotypes, in contrast to those of the FF and FcoR genes (Fig. 5 A–C). Moreover, all fuhc-L genes (bhf, fuhc-sec, fuhc-tm, and hsp40l) have been shown to be expressed in the ampullae previously (45–48), and these genes were also expressed at similar levels in all genotypes.

In summary, all candidate allorecognition genes, and the repertoire of signal transduction and transcription factors for ITAM and ITIM signaling are expressed in vascular cells, the site of the allorecognition response (Fig. 1).

Discussion

Allorecognition in B. schlosseri utilizes innate recognition mechanisms to discriminate between up to a thousand alleles of the fuhc, but how this level of specificity is achieved is not understood. Previously, we found that the allorecognition response uses a missing-self recognition strategy based on the integration of two independent signals, one generated by the binding of fester (FF1), and the other by the binding of uncle fester (FF3). Fusion or rejection are not mutually exclusive outcomes, as each signal can be manipulated in vivo (26, 29). For example, a rejection can be converted to a fusion via mAb crosslinking fester, while a fusion could be converted to a rejection by mAb crosslinking of uncle fester. These results suggested that the monomorphic uncle fester initiated the rejection response, which could be overridden by recognition of a single self fuhc allele by the polymorphic inhibitory receptor, fester. These characteristics are reminiscent of ITAM/ITIM signal integration by NK cells, but there are no signaling domains on either fester or uncle fester proteins. Conversely, proteins encoding cytoplasmic ITIM and ITAM motifs, as well as all associated signal transduction molecules used in mammals are present in Botryllus and other invertebrate genomes (4, 44, 49); thus, we hypothesized that signaling occurs via pairing of fester and uncle fester proteins with adaptor molecules encoding the signaling motifs. In summary, these results suggested that binding of uncle fester activated the rejection response, and that fester was solely responsible for allelic discrimination. However, this left us with three unanswered questions: what is the activating ligand of uncle fester, how are the fusion and rejection signals transmitted and integrated, and most importantly, how can fester discriminate between 1000 histocompatibility ligands?

Here, we find that our original conclusions were oversimplified. However, this newfound complexity, coupled to two unexpected and counterintuitive results, provides insight into the mechanisms that underlie the discriminatory capabilities of Botryllus histocompatibility. First, the FF gene family has over 36 diverse members, and the average pairwise amino acid identity between FF loci is 2–3 times lower than between any two KIR (SI Appendix, Fig. S1A) or Ly49 loci (50). We have also identified the putative signal transduction partner, the even more diverse FcoR. Each FF gene is encoded adjacent to and coexpressed with a cognate FcoR gene, which encodes one or more signal transduction motifs, including canonical ITIMs, hemITAMs, and other core tyrosine motifs. Remarkably, both the polymorphism and signal transduction motifs of the FcoR partners of fester and uncle fester are consistent with our previous functional studies (SI Appendix, Figs. S11 and S12): the polymorphic inhibitory receptor fester (FF1) is paired with the polymorphic FcoR7, encoding an ITIM; while the monomorphic activating receptor uncle fester (FF3) pairs with the monomorphic FcoR1, encoding a dual core-2 domain. Thus far, the association between polymorphism and signal transduction motifs extends to all new pairings. Monomorphic FF genes are paired with monomorphic FcoR genes that encode hemITAMs and/or different tyrosine core motifs, but never ITIMs. By contrast, there are two oligomorphic FF genes (FF12 and FF13), paired with two oligomorphic FcoR genes (FcoR3 and FcoR4). Interestingly, the putative functional roles of FcoR3 and FcoR4 genes are plastic; both genes have alleles that are either activating (encoding a hemITAM) or inhibitory (encoding an ITIM). In addition, each allele of both genes can undergo alternative splicing, swapping between the hemITAM and ITIM motifs (Fig. 2), suggesting that dynamic tuning of signaling is required for specificity.

The genomic complexity of these genes is significant (Fig. 3). There are two FF/FcoR clusters, one within the fuhc locus on chromosome 11, while the other is on chromosome 5. Each locus is expressed in a genotype specific manner (Fig. 5A), and for the FF family, this includes differences in alternative splicing (Fig. 4). Notably, the copy number variation (CNV) between haplotypes on each chromosome is of the gene pairs, including duplication of entire pairs (Fig. 3). Finally, each FF/FcoR pair in the Mediterranean individual is also found in genotypes from California, indicating that selection is on the protein pairs, likely because both contribute to binding specificity. While this hypothesis is consistent with the transcriptomic data, we are currently testing if each pair forms specific heterodimers. In addition, another highly polymorphic haplotype of FcoR genes is found on chromosome 9, but lacks FF genes.

How do we interpret these results functionally? While the exact structure of the fuhc ligand is unknown, there are four polymorphic genes (bhf, fuhc-sec, fuhc-tm, and hsp40l) encoded in a 110 kb region of the fuhc locus, which we call the fuhc-L (Fig. 3) (28). In our lab-reared strains, polymorphisms of all four genes predict fusion/rejection outcomes, implying that one or more of these genes form the fuhc ligand (45–47, 51). However, fuhc-sec must be a key component of the fuhc ligand, as it is the only protein with polymorphism high enough to explain fusibility outcomes in wild-type populations (48, 51–54). fuhc-sec encodes a 550 aa protein with over 300 alleles in natural populations (48, 51–54). In contrast, these same populations harbor 60 alleles of FF1, the second most polymorphic gene in the fuhc locus (29, 30, 53). On average, two fuhc-sec alleles differ by substitutions at 40 amino acids, and 28 codons have strong evidence of positive selection (28, 52, 53). Interestingly, these polymorphic residues are spread randomly throughout the protein, with no hypervariable sites or other patterns that provide structural insight into how fuhc-sec alleles could be differentiated by fester (FF1) (48, 52). The results here provide an explanation. We hypothesize that the 28 polymorphic residues on fuhc-sec serve as epitopes for the diverse FF/FcoR heterodimers. Functionally, this is equivalent to the MHC Class I epitopes (A3/11, Bw4, C1, C2) for inhibitory KIR receptors in the primates (18), but with added complexity. In turn, this suggests that individual FF and FcoR genes are so divergent because the FF/FcoR heterodimers are binding to independent epitopes scattered throughout fuhc-sec, and any other proteins that may make up the fuhc ligand.

Two other results were completely unexpected, but provide critical insights into the mechanisms that underlie allelic discrimination. First, in the context of this newfound FF diversity, the most surprising finding in this study was that neither FF1 nor FF3 is expressed by all genotypes (Fig. 3). This means that our previous functional studies suggesting that fester (FF1) and uncle fester (FF3) were necessary and sufficient for fusion and rejection pathways, respectively, were genotype-specific (26, 29). This indicates there is not a single, conserved activation ligand or receptor used by all individuals, nor a single, conserved inhibitory ligand or receptor (26). Therefore, in genotypes that do not express FF1 or FF3, other FF/FcoR heterodimers are likely activating and inhibitory receptors, and bind to other activating and inhibitory epitopes on the fuhc ligand. Second, there are multiple FcoRs with ITIMs, multiple with hemITAMs or core motifs, and two oligomorphic FcoRs that can switch between putative activating and inhibitory roles. As each individual expresses many FF and FcoR genes, this suggests that multiple independent activating and inhibitory receptor/epitope binding events contribute to allelic discrimination, versus integration of a single activating and single inhibitory pathway. In addition, knockdown of only a single gene could block fusion (FF1) or rejection (FF3), suggesting all binding events are required for outcome (26, 29).

Together, we hypothesize that allelic discrimination in Botryllus is similar to how a mammalian NK cell integrates inputs from the binding of multiple activating and inhibitory receptors to determine whether to kill or tolerate a target cell (9). However, in Botryllus, multiple activating and inhibitory FF/FcoR receptors bind to different epitopes on the fuhc ligand, and allelic discrimination occurs at the point of signal integration. Consequently, there is no requirement for a single high affinity receptor/ligand interaction to identify a self-allele. Instead, discrimination is partitioned among many independent activating and inhibitory receptor/ligand interactions, all of which are integrated for outcome. Signal integration between two independent pathways augments the dynamic range available for discrimination, filtering out noise and boosting the ability to detect subtle changes in binding (55). In turn, this suggests that fuhc ligand plays a role in both fusion and rejection.

For fusion (Fig. 1C), we hypothesize that when two ampullae sharing a self-allele come into contact, all the FF/FcoR heterodimers on each cell bind in trans to their epitopes on the self fuhc-sec of the other individual. This would position the activating and inhibitory motifs inside each cell into a configuration that reflects the position of the bound epitopes, and initiate repeatable downstream interactions between kinases, phosphatases, and adaptor molecules that generate a signal for fusion (28). Similar spatial constraints are required for signal integration in both T- and NK-cells (56, 57). In contrast, if none, or only a subset of the FF/FcoR heterodimers bind in trans, the signaling is altered, fusion is blocked, and a rejection response occurs (Fig. 1B, discussed below).

To establish specificity, the cell must match its repertoire of receptors to its self-ligand(s), setting a fusion/rejection threshold for each self fuhc ligand allele independently. We envision this occurs via cis interactions (28), as signaling motifs must be physically adjacent to interact, and a fuhc heterozygote would need two discrete receptor structures to discriminate between both self-histocompatibility alleles. The establishment of specificity would depend on processes similar to the rheostat model of education that occurs in mammalian NK cells, and would be required for the same reason. Each NK cell stochastically expresses a group of activating and inhibitory receptors, then, using those randomly generated binding specificities, sets a cell-autonomous activation threshold that allows the cell to be functional but self-tolerant (14, 15). Similarly, each Botryllus individual inherits a repertoire of diverse receptors, which it must use to establish allelic specificity, likely using the same education process.

Evidence of education includes any genotype-specific pattern that would affect binding or signaling. One example is the maintenance of individual FF and FcoR expression levels following multiple ablation/regeneration cycles of the ampullae (Fig. 5). In addition, the FcoR3 and FcoR4 genes can switch between putative functional roles by alternative splicing. This would allow an individual receptor to toggle epitope binding between activating or inhibitory output, providing the ability to tune the integrated signal until the correct activation threshold is set (Fig. 2). However, the mechanisms that a cell uses to establish and maintain a correct activation threshold are not understood in any system. It is one of the most important unanswered questions in NK cell biology, and the foundation of immune tolerance (6, 58, 59).

In contrast to compatible interactions, the rejection response is variable, and a wide range of phenotypes have been described. These include a ‘no-reaction’ phenotype, where ampullae from different colonies touch, but there are no visible signs of rejection (41, 48, 60). There are also differences in rejection intensity based on the size, number, and time to development of the PORs (Fig. 1B) (41, 61). Finally, asymmetric rejection responses have been observed, where ampullae from one individual undergo a rejection response, while those from the other appear to be unresponsive (41, 62). Interestingly, rejection phenotypes are specific to the interacting genotypes; for example, pairing incompatible genotypes A and B can result in a strong rejection, whereas pairing incompatible genotypes A and C can result in a “no reaction” phenotype (48). In addition, two previous genetic studies showed that the severity of the rejection response mapped to the fuhc locus (48, 61). This indicates that polymorphisms of the fuhc ligand also control the rejection response.

We hypothesize that the rejection response is variable because there are >25 putative epitopes on fuhc-sec, and the FF and FcoR clusters on chr5 and chr9 are segregating independently of the ligand. Thus, some individuals will have fuhc-sec epitopes with no receptors, and receptors with no epitopes. In addition, some epitopes will be shared by multiple alleles, others will not, or one individual could have receptors for epitopes on a nonself allele, while the other does not. This suggests that the number and ratio of activating and inhibitory receptors from one individual that bind to epitopes on the nonself fuhc ligand(s) of the other can explain the range of rejection phenotypes (28). This can also be demonstrated in vivo. We had found that in two incompatible genotypes, a normal rejection can be converted to an asymmetric rejection by knocking down FF3 (uncle fester) in one of the individuals. Ampullae without FF3 expression were rendered unresponsive, but that had no effect on the rejection response of the unmanipulated individual (26). In summary, manipulating the FF/FcoR repertoire of one individual changed the rejection phenotype, consistent with our hypothesis.

The partitioning of allelic discrimination among multiple receptor/ligand interactions can also explain how new fuhc specificities arise. We envision that the affinity of each FF/FcoR heterodimer for its cognate epitopes would be variable, some highly specific, others more promiscuous. Point mutations and small recombination events would create and combine new epitope combinations in the fuhc-sec gene (52, 53). Meanwhile, the FF and FcoR genes could coevolve via diversification of individual genes and pairings, creating new specificities, with others becoming obsolete, resulting in the observed birth and death evolution of FF/FcoR haplotypes (Fig. 3). This would be similar to the stepwise evolution of the KIR2DL1/C2 specificity in humans during primate evolution (63). Consequently, generation of a novel histocompatibility type does not require an evolutionary pathway in which mutations in a ligand disrupt a single high-affinity interaction, followed by complementary mutations in the receptor to bind the new ligand, which also must occur on the same chromosome to be functional. While there are single lock-and-key systems which evolve this way (e.g., the self-incompatibility (SI) loci in plants), those populations carry 10–30 allorecognition alleles, not the >300 observed in most Botryllus populations (64–66). And even in the case of the SI loci, it is difficult to understand how the generation of new specificities could require complementary mutations in two genes on the same DNA strand (67). In Botryllus, allelic discrimination likely occurs in aggregate, with each epitope/receptor interaction evolving independently. This explains how an innate recognition system can create and discriminate between up to a thousand histocompatibility types.

This study provides insight into how an innate invertebrate histocompatibility system can achieve a discriminatory ability reminiscent of adaptive immunity, and can explain every characteristic of Botryllus allorecognition responses. Allelic discrimination uses a paired receptor system binding to epitopes on a single ligand, which is coupled to signal integration from canonical immune signal transduction pathways (10). Our results suggest this strategy appeared early in evolution, and is likely the foundation of rapid evolvability of immune receptors in the vertebrates, and throughout the metazoa. While much remains to be done to prove the predicted interactions between FF, FcoR, and fuhc-sec proteins (28), multiple characteristics of the allorecognition response position Botryllus as a powerful model organism to study the physical interactions that underlie signal integration, as well as the establishment and maintenance of immune specificity.

Materials and Methods

Identification of FF and FcoR Gene Families.

FF and FcoR genes were identified using the published fester (DQ517888.1) and uncle fester (JF806283.1) genes as queries against assembled transcriptomes (SI Appendix, Table S1). Newly detected genes were then used as queries to repeat the process. Identified genes were then searched for in the genomic scaffolds from ref. 29 and in the genome of B. schlosseri (39). The polymorphisms of the FF1 and FcoR7 genes were used to distinguish between alleles and different genes.
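The iterative query strategy described above amounts to repeating a search until no new genes are found. The following is a minimal sketch of that loop, not the pipeline used in this study: the `search` callable is a hypothetical stand-in for whichever homology search tool is run against the transcriptomes, and the toy hit table is invented for illustration.

```python
def iterative_search(seed_queries, search):
    """Repeat the search with every newly detected gene until nothing new is found.

    `search` is a hypothetical callable: it takes one query identifier and
    returns an iterable of hit identifiers.
    """
    found = set(seed_queries)
    frontier = set(seed_queries)
    while frontier:
        hits = set()
        for query in frontier:
            hits.update(search(query))
        frontier = hits - found  # only genes not seen before are re-queried
        found |= frontier
    return found


# Toy hit table standing in for real search results (made-up identifiers).
toy_hits = {
    "fester": ["gene_A"],
    "uncle_fester": ["gene_A", "gene_B"],
    "gene_A": ["fester", "gene_C"],
    "gene_B": [],
    "gene_C": [],
}
family = iterative_search(["fester", "uncle_fester"], lambda q: toy_hits.get(q, []))
```

Tracking a frontier of unseen hits guarantees termination, because each gene is used as a query at most once.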

Phylogenetic Analysis and Tyrosine-Based Motifs of FF and FcoR Proteins.

Phylogenetic trees of full-length, extracellular, and intracellular regions of FF and FcoR proteins were inferred using maximum likelihood (68) and Bayesian inference methods (69). ITIM and hemITAM motifs in FcoR proteins were annotated based on patterns described in (37).
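Motif annotation of this kind can be sketched as a pattern scan over the cytoplasmic tail. The example below is a hedged illustration, not the annotation procedure used here. The ITIM pattern is the widely used consensus (S/I/V/L)xYxx(I/V/L). The hemITAM-like pattern (two acidic residues within a few positions upstream of a YxxL core) is a simplifying assumption made for this sketch and is not taken from ref. 37. Both test sequences are invented.

```python
import re

# Widely used ITIM consensus: (S/I/V/L) x Y x x (I/V/L); the tyrosine is captured.
ITIM = re.compile(r"[SIVL].(Y)..[IVL]")
# ASSUMPTION for illustration only: acidic residues shortly upstream of a YxxL core.
HEMITAM_LIKE = re.compile(r"[DE]{2}.{0,2}(Y)..L")


def find_motifs(tail):
    """Return (label, 1-based tyrosine position, matched text), sorted by position."""
    hits = []
    for label, pattern in (("ITIM", ITIM), ("hemITAM-like", HEMITAM_LIKE)):
        for match in pattern.finditer(tail):
            hits.append((label, match.start(1) + 1, match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])


# Invented cytoplasmic tail fragments, not real FcoR sequences.
itim_hits = find_motifs("MKKVTYAQLAAA")
hemitam_hits = find_motifs("GGDEDGYITLNN")
```

Reporting the tyrosine position rather than the match start makes hits from the two patterns directly comparable when a tail carries more than one candidate motif.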

Quantification of FF and FcoR Gene Expression.

Raw reads were mapped to FF and FcoR genes using Kallisto and RSEM (70). The number of genes per individual was estimated from de novo Trinity-assembled transcriptomes. For vasculature transcriptomes with three or four replicates, average expression levels and standard deviations were calculated. Statistical and visualization analyses are detailed in SI Appendix.
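The replicate summary described above can be sketched as follows. This is a minimal illustration under assumptions, not the analysis code used here: the gene names and expression values are invented, and the sample SD (n − 1 denominator) is assumed because the text does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_replicates(expression):
    """Map each gene to (mean, sample SD) across its replicate expression values."""
    summary = {}
    for gene, values in expression.items():
        # stdev() needs at least two values; a single replicate gets an SD of 0.0
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[gene] = (mean(values), spread)
    return summary


# Invented expression values for three replicates (arbitrary units).
toy_expression = {
    "FF_example": [2.0, 4.0, 6.0],
    "FcoR_example": [10.0, 10.0, 10.0],
}
summary = summarize_replicates(toy_expression)
```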

Structure Prediction of FF and FcoR Proteins.

Structures of FF and FcoR proteins were predicted using AlphaFold2 (71) and AlphaFold3 (31). Domain identification and Ig domain characterization were performed using Foldseek (32) and iCn3D software (33), respectively.

Supplementary Material

Appendix 01 (PDF)

pnas.2519372122.sapp.pdf (23.2MB, pdf)

Dataset S01 (PDF)

pnas.2519372122.sd01.pdf (167.6KB, pdf)

Acknowledgments

We thank Greg Stoney for collecting and maintaining the animals used here, and the Center for Scientific Computing of the University of California, Santa Barbara for sharing its clusters to perform the bioinformatics analysis. This research was supported by NIH Grants GM139649 and OD030520 to A.W.D.T., and by the Agence Nationale de la Recherche (ANR-24-CE02-2277) and the Institut des Sciences Biologiques (DBM EVOCHORE) to S.T.

Author contributions

H.R.-V. and A.W.D.T. designed research; H.R.-V., J.S., and A.W.D.T. performed research; O.D.T., J.F.F., and S.T. contributed new reagents/analytic tools; H.R.-V. and A.W.D.T. analyzed data; and H.R.-V. and A.W.D.T. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

RNA-seq raw reads have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession no. GSE295655 (72). FF, FcoR, and NK homolog genes have been deposited in GenBank, and accession numbers are listed in SI Appendix, Table S6.

References

  • 1. Burnet F. M., “Self-recognition” in colonial marine forms and flowering plants in relation to the evolution of immunity. Nature 232, 230–235 (1971).
  • 2. Buss L. W., Somatic cell parasitism and the evolution of somatic tissue compatibility. Proc. Natl. Acad. Sci. U.S.A. 79, 5337–5341 (1982).
  • 3. Grice L. F., et al., Origin and evolution of the sponge aggregation factor gene family. Mol. Biol. Evol. 34, 1083–1099 (2017).
  • 4. Nicotra M. L., Invertebrate allorecognition. Curr. Biol. 29, R463–R467 (2019).
  • 5. Snell G. D., Studies in histocompatibility. Science 213, 172–178 (1981).
  • 6. Kärre K., Natural killer cell recognition of missing self. Nat. Immunol. 9, 477–480 (2008).
  • 7. Sibener L. V., et al., Isolation of a structural mechanism for uncoupling T Cell Receptor signaling from peptide-MHC binding. Cell 174, 672–687.e27 (2018).
  • 8. Ashby K. M., Hogquist K. A., A guide to thymic selection of T cells. Nat. Rev. Immunol. 24, 103–117 (2024).
  • 9. Lanier L. L., NK cell recognition. Annu. Rev. Immunol. 23, 225–274 (2005).
  • 10. Lanier L. L., Face off – the interplay between activating and inhibitory immune receptors. Curr. Opin. Immunol. 13, 326–331 (2001).
  • 11. Štefanová I., et al., TCR ligand discrimination is enforced by competing ERK positive and SHP-1 negative feedback pathways. Nat. Immunol. 4, 248–254 (2003).
  • 12. Stebbins C. C., et al., Vav1 dephosphorylation by the tyrosine phosphatase SHP-1 as a mechanism for inhibition of cellular cytotoxicity. Mol. Cell. Biol. 23, 6291–6299 (2003).
  • 13. Hui E., Vale R. D., In vitro membrane reconstitution of the T-cell receptor proximal signaling network. Nat. Struct. Mol. Biol. 21, 133–142 (2014).
  • 14. Joncker N. T., Fernandez N. C., Treiner E., Vivier E., Raulet D. H., NK cell responsiveness is tuned commensurate with the number of inhibitory receptors for self-MHC class I: The rheostat model. J. Immunol. 182, 4572–4580 (2009).
  • 15. Brodin P., Kärre K., Höglund P., NK cell education: Not an on-off switch but a tunable rheostat. Trends Immunol. 30, 143–149 (2009).
  • 16. Scofield V. L., Schlumpberger J. M., West L. A., Weissman I. L., Protochordate allorecognition is controlled by a MHC-like gene system. Nature 295, 499–502 (1982).
  • 17. Flajnik M. F., Kasahara M., Origin and evolution of the adaptive immune system: Genetic events and selective pressures. Nat. Rev. Genet. 11, 47–59 (2010).
  • 18. Guethlein L., Hilton H., Norman P., Parham P., Co-evolution of lymphocyte receptors with MHC class I. Immunol. Rev. 267, 1–5 (2015).
  • 19. Wcisel D. J., et al., A highly diverse set of novel immunoglobulin-like transcript (NILT) genes in zebrafish indicates a wide range of functions with complex relationships to mammalian receptors. Immunogenetics 75, 69 (2022). 10.1007/s00251-022-01270-9.
  • 20. Kelley J., Walter L., Trowsdale J., Comparative genomics of natural killer cell receptor gene clusters. PLoS Genet. 1, e27 (2005).
  • 21. Huene A. L., et al., A family of unusual immunoglobulin superfamily genes in an invertebrate histocompatibility complex. Proc. Natl. Acad. Sci. U.S.A. 119, e2207374119 (2022).
  • 22. De Tomaso A. W., Sea squirts and immune tolerance. Dis. Model. Mech. 2, 440–445 (2009).
  • 23. Delsuc F., Brinkmann H., Chourrout D., Philippe H., Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439, 965–968 (2006).
  • 24. Sabbadin A., Le basi genetiche della capacità di fusione fra colonie in Botryllus schlosseri (Ascidiacea). Rend Accad Naz Lincei, Series 8, 1031–1035 (1962).
  • 25. Rinkevich B., Porat R., Goren M., Allorecognition elements on a urochordate histocompatibility locus indicate unprecedented extensive polymorphism. Proc. R. Soc. Lond. Ser. B: Biol. Sci. 259, 319–324 (1995).
  • 26. McKitrick T. R., Muscat C. C., Pierce J. D., Bhattacharya D., De Tomaso A. W., Allorecognition in a basal chordate consists of independent activating and inhibitory pathways. Immunity 34, 616–626 (2011).
  • 27. Grosberg R. K., Quinn J. F., The genetic control and consequences of kin recognition by the larvae of a colonial marine invertebrate. Nature 322, 456–459 (1986).
  • 28. De Tomaso A. W., Rodriguez-Valbuena H., Histocompatibility in Botryllus schlosseri and the origins of adaptive immunity. Immunogenetics 77, 22 (2025).
  • 29. Nyholm S. V., et al., fester, a candidate allorecognition receptor from a primitive chordate. Immunity 25, 163–173 (2006).
  • 30. Nydam M. L., De Tomaso A. W., The fester locus in Botryllus schlosseri experiences selection. BMC Evol. Biol. 12, 249 (2012).
  • 31. Abramson J., et al., Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
  • 32. van Kempen M., et al., Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
  • 33. Tawfeeq C., et al., IgStrand: A universal residue numbering scheme for the immunoglobulin-fold (Ig-fold) to study Ig-proteomes and Ig-interactomes. PLoS Comput. Biol. 21, e1012813 (2025).
  • 34. Harpaz Y., Chothia C., Many of the immunoglobulin superfamily domains in cell adhesion molecules and surface receptors belong to a new structural set which is close to that containing variable domains. J. Mol. Biol. 238, 528–539 (1994).
  • 35. Ereño-Orbea J., et al., Molecular basis of human CD22 function and therapeutic targeting. Nat. Commun. 8, 764 (2017).
  • 36. Carrillo R. A., et al., Control of synaptic connectivity by a network of Drosophila IgSF cell surface proteins. Cell 163, 1770–1782 (2015).
  • 37. Bauer B., Steinle A., HemITAM: A single tyrosine motif that packs a punch. Sci. Signal. 10, eaan3676 (2017).
  • 38. Lo W.-L., et al., A single-amino acid substitution in the adaptor LAT accelerates TCR proofreading kinetics and alters T-cell selection, maintenance and function. Nat. Immunol. 24, 676–689 (2023).
  • 39. Thier O. D., et al., First chromosome-level genome assembly of the colonial tunicate Botryllus schlosseri (Tunicata). GigaScience 14, giaf097 (2025). 10.1093/gigascience/giaf097.
  • 40. Nei M., Gu X., Sitnikova T., Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U.S.A. 94, 7799–7806 (1997).
  • 41. Taketa D. A., Cengher L., Rodriguez D., Langenbacher A. D., De Tomaso A. W., Genotype-specific expression of Uncle Fester suggests a role in allorecognition education in a basal chordate. Integr. Comp. Biol. 64, icae107 (2024). 10.1093/icb/icae107.
  • 42. Braden B. P., et al., Vascular regeneration in a basal chordate is due to the presence of immobile, bi-functional cells. PLoS One 9, e95460 (2014).
  • 43. Rodriguez D., et al., In vivo manipulation of the extracellular matrix induces vascular regression in a basal chordate. Mol. Biol. Cell 28, 1883–1893 (2017).
  • 44. Azumi K., et al., Genomic analysis of immunity in a Urochordate and the emergence of the vertebrate immune system: “Waiting for Godot”. Immunogenetics 55, 570–581 (2003).
  • 45. Voskoboynik A., et al., Identification of a colonial chordate histocompatibility gene. Science 341, 384–387 (2013).
  • 46. Taketa D. A., et al., Molecular evolution and in vitro characterization of Botryllus histocompatibility factor. Immunogenetics 67, 605–623 (2015).
  • 47. Nydam M. L., Hoang T. A., Shanley K. M., De Tomaso A. W., Molecular evolution of a polymorphic HSP40-like protein encoded in the histocompatibility locus of an invertebrate chordate. Dev. Comp. Immunol. 41, 128–136 (2013).
  • 48. Nydam M. L., et al., The candidate histocompatibility locus of a basal chordate encodes two highly polymorphic proteins. PLoS One 8, e65980 (2013).
  • 49. Takahashi H., Ishikawa G., Ueki K., Azumi K., Yokosawa H., Cloning and tyrosine phosphorylation of a novel invertebrate immunocyte protein containing immunoreceptor tyrosine-based activation motifs. J. Biol. Chem. 272, 32006–32010 (1997).
  • 50. Mehta I. K., Wang J., Roland J., Margulies D. H., Yokoyama W. M., Ly49A allelic variation and MHC class I specificity. Immunogenetics 53, 572–583 (2001).
  • 51. De Tomaso A. W., et al., Isolation and characterization of a protochordate histocompatibility locus. Nature 438, 454–459 (2005).
  • 52. Nydam M. L., Taylor A. A., De Tomaso A. W., Evidence for selection on a protochordate histocompatibility locus. Evolution 67, 487–500 (2013).
  • 53. Nydam M. L., Stephenson E. E., Waldman C. E., De Tomaso A. W., Balancing selection on allorecognition genes in the colonial ascidian Botryllus schlosseri. Dev. Comp. Immunol. 69, 60–74 (2017).
  • 54. Nydam M. L., De Tomaso A. W., Creation and maintenance of variation in allorecognition loci: Molecular analysis in various model systems. Front. Immunol. 2, 79 (2011).
  • 55. Goldbeter A., Koshland D. E., An amplified sensitivity arising from covalent modification in biological systems. Proc. Natl. Acad. Sci. U.S.A. 78, 6840–6844 (1981).
  • 56. Tomaz D., et al., Nanoscale colocalization of NK cell activating and inhibitory receptors controls signal integration. Front. Immunol. 13, 868496 (2022).
  • 57. Yokosuka T., et al., Programmed cell death 1 forms negative costimulatory microclusters that directly inhibit T cell receptor signaling by recruiting phosphatase SHP2. J. Exp. Med. 209, 1201–1217 (2012).
  • 58. Orr M. T., Lanier L. L., Natural killer cell education and tolerance. Cell 142, 847–856 (2010).
  • 59. Kadri N., Thanh T. L., Höglund P., Selection, tuning, and adaptation in mouse NK cell education. Immunol. Rev. 267, 167–177 (2015).
  • 60. Rinkevich B., Weissman I. L., Incidents of rejection and indifference in Fu/HC incompatible protochordate colonies. J. Exp. Zoöl. 263, 105–111 (1992).
  • 61. Scofield V., Nagashima L., Morphology and genetics of rejecting reactions between oozooids from the tunicate Botryllus schlosseri. Biol. Bull. 165, 733–744 (1983).
  • 62. Oren M., et al., ‘Rejected’ vs. ‘rejecting’ transcriptomes in allogeneic challenged colonial urochordates. Mol. Immunol. 47, 2083–2093 (2010).
  • 63. Aguilar A. M. O., et al., Coevolution of killer cell Ig-like receptors with HLA-C to become the major variable regulators of human NK cells. J. Immunol. 185, 4238–4251 (2010).
  • 64. Nasrallah J. B., Recognition and rejection of self in plant self-incompatibility: Comparisons to animal histocompatibility. Trends Immunol. 26, 412–418 (2005).
  • 65. Maenosono T., et al., Exploring the allelic diversity of the self-incompatibility gene across natural populations in petunia (Solanaceae). Genome Biol. Evol. 16, evae270 (2024).
  • 66. Mable B. K., Schierup M. H., Charlesworth D., Estimating the number, frequency, and dominance of S-alleles in a natural population of Arabidopsis lyrata (Brassicaceae) with sporophytic control of self-incompatibility. Heredity 90, 422–431 (2003).
  • 67. Charlesworth D., How can two-gene models of self-incompatibility generate new specificities? Plant Cell 12, 309–310 (2000).
  • 68. Trifinopoulos J., Nguyen L., Haeseler A. V., Minh B. Q., W-IQ-TREE: A fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 44, 232–235 (2016).
  • 69. Huelsenbeck J. P., Ronquist F., MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
  • 70. Li B., Dewey C. N., RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 12, 323 (2011).
  • 71. Mirdita M., et al., ColabFold: Making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
  • 72. Rodriguez-Valbuena H., et al., Exceptional diversity of allorecognition receptors in a non-vertebrate chordate reveal principles of innate allelic discrimination. NCBI GEO. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE295655. Deposited 25 April 2025.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
