Journal of Virology. 2025 Mar 21;99(4):e00181-25. doi: 10.1128/jvi.00181-25

Development of ferret immune repertoire reference resources and single-cell-based high-throughput profiling assays

Evan S Walsh 1,2,3,#, Kui Yang 3,#, Tammy S Tollison 1, Sujatha Seenu 3, Nicole Adams 1,2, Guilhem Zeitoun 4, Ifigeneia Sideri 4, Geraldine Folch 4, Hayden N Brochu 1,2, Hsuan Chou 1, Sofia Kossida 4,5, Ian A York 3, Xinxia Peng 1,2,6,
Editor: Stacey Schultz-Cherry 7
PMCID: PMC11998538  PMID: 40116504

ABSTRACT

Domestic ferrets (Mustela putorius furo) are important for modeling human respiratory diseases. However, ferret B and T cell receptors have not been completely identified or annotated, limiting immune repertoire studies. Here, we performed long-read transcriptome sequencing of ferret splenocyte and lymph node samples to obtain over 120,000 high-quality full-length immunoglobulin (Ig) and T cell receptor (TCR) transcripts. We constructed a complete reference set of the constant regions of ferret Ig and TCR isotypes and chain types. We also systematically annotated germline Ig and TCR variable (V), diversity (D), joining (J), and constant (C) genes on a recent ferret reference genome assembly. We designed new ferret-specific immune repertoire profiling assays by targeting positions in constant regions without allelic diversity across 11 ferret genome assemblies and experimentally validated them using a commercially available single-cell platform. These improved resources and assays will enable future studies to fully capture ferret immune repertoire diversity.

IMPORTANCE

Domestic ferrets (Mustela putorius furo) are an increasingly common model organism to study human respiratory diseases such as influenza infections. However, researchers lack ferret-specific reagents and resources to study the immune system and immune response in ferrets. In this study, we developed comprehensive ferret immune repertoire reference resources and assays, which will enable more accurate analyses of the ferret immune system in the future.

KEYWORDS: ferret, immune repertoire, reference resource development, VDJ annotation, single-cell sequencing, repertoire profiling

INTRODUCTION

Domestic ferrets (Mustela putorius furo) are an increasingly common model organism for studying human respiratory diseases. For example, ferrets are routinely used in studies involving influenza viruses because they can be infected with human strains (1, 2), allowing vaccine (3) and therapeutic (4, 5) efficacy studies. Ferrets are also useful models for cystic fibrosis (6), and more recently, they have been used as a model organism for SARS-CoV-2 infection (7–9). Despite these important roles in translational medicine, ferret infectious disease research is limited by a lack of resources, including incomplete and missing immunogenetic resources. Developing complete and accurate genomic resources, especially for the immune system, is imperative for effective translational interpretation.

B and T cell receptor molecules recognize and bind foreign antigens (Ag) from infectious agents. This ability depends in part on how the receptors are generated and on their structure. Immunoglobulins (Igs) and T cell receptors (TCRs) have two domains: a constant (C) region and a variable (V) region. The V region is composed of a variable (V) gene, a joining (J) gene, and in some cases a diversity (D) gene. These genes are duplicated in several large loci in the genome and are somatically rearranged before transcription to establish a diverse repertoire of both receptor molecules and antigen-binding sites. Following rearrangement, an individual human may have up to 10^13 unique Ig and 10^18 unique TCR V-region domains (10). The genetic diversity of Ig and TCR molecules makes their correct assembly, annotation, and validation a major technical challenge for standard high-throughput short-read-based approaches, as illustrated by earlier studies based on artificial chromosome cloning (11, 12) and more recent ones based on long-read sequencing (13, 14).

Immune repertoire studies are rapidly becoming a mainstream tool for understanding immunity to infectious and autoimmune diseases and cancers. In these studies, the Ig and TCR repertoires of B and T lymphocytes are profiled, usually by high-throughput sequencing (Rep-seq). Currently, only humans, mice, and some non-human primates [rhesus monkey (15) and lowland gorilla (16)] have relatively complete germline Ig and TCR annotations and the established assays needed to perform and analyze high-throughput immune profiling. Although progress has been made in other species (see references 17–19), ferrets remain lacking in this area, making the analysis and interpretation of ferret immune repertoire studies challenging.

Some efforts have been made to close the gap in ferret immunogenetics and annotate these complex Ig and TCR genes. Recently, Wong et al. (20) predicted V-, D-, J-, and C-region genes from Ig loci and subsequently designed a single-cell multiplex PCR (MPCR) experiment for sequence validation. Additionally, Gerritsen et al. (21) used 5’ rapid amplification of cDNA ends (RACE) to annotate the ferret T cell receptor beta (TRB) locus. These were the only ferret sequences available in the international ImMunoGeneTics information system (IMGT) database (22). The most recent annotation of the domestic ferret genome was NCBI RefSeq annotation release 102 on the NCBI RefSeq genome assembly GCF_011764305 (23); the corresponding submitted GenBank assembly was GCA_011764305.2, which was generated using both Illumina short-read and PacBio long-read technologies (24). This RefSeq annotation reportedly contains 142 Ig and TCR genes (23), which are computational predictions based on orthologous genes in similar species and have yet to be validated. Despite these recent advances, the design of ferret-specific assays for profiling Ig and TCR repertoires still relies on very limited resources.

In this study, we aimed to systematically identify and annotate ferret Ig and TCR sequences, including the C, V, D, and J genes of each, by extending previously established strategies (18, 25) and combining multiple approaches. This allowed us to construct the first complete reference set of ferret Ig and TCR C-region genes, covering all ferret Ig/TCR isotypes and chain types, using long-read transcriptome sequencing analysis (Iso-Seq). We then utilized the Iso-Seq data, C-region reference, and recombination signal sequence (RSS) analysis to additionally identify hundreds of Ig and TCR V, D, and J loci on a recent ferret reference genome assembly. We also used this C-region reference to design new ferret-specific assays that allow for a full Ig/TCR repertoire analysis at the single-cell level. We validated these new assays with ferret peripheral blood mononuclear cell (PBMC) and splenocyte samples in conjunction with the 10× Genomics Chromium system, a common commercial single-cell sequencing platform. Taken together, this study provides comprehensive reference resources and assays that are essential for ferret immunogenetics and immune repertoire profiling analysis.

RESULTS

Overall strategy

We started by performing Iso-Seq analysis of ferret lymph node and spleen samples, as in a previous study (25), to circumvent the challenging task of correctly assembling Ig/TCR transcripts from short sequencing reads. The complete reference set of ferret Ig and TCR constant region (C-region) sequences was constructed from the resulting high-quality full-length Ig and TCR transcript sequences, and the corresponding ferret C-region genes were annotated on a recent ferret reference genome assembly (NCBI accession no. GCA_011764305.2).

Guided by these ferret Ig/TCR C-region references, we selected full-length Ig and TCR transcript sequences containing C-regions from the ferret Iso-Seq data and analyzed them for V, D, and J genes. In parallel, we performed a semi-automated analysis of ferret genome assemblies, using conserved features such as RSS sequences to identify and annotate putative germline genes. Germline V, D, and J genes were manually reviewed, curated, and classified as functional, open reading frame (ORF), or pseudogenes based on the presence of conserved features, including the RSS and full-length open reading frames. Subsequently, expression of V and J genes in transcriptomes from naïve ferrets was used to further validate the classification of genes as functional; this resulted in changing the classification of several genes (mainly those with non-canonical RSS sequences; see below) to “functional.”
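The curation logic described above (conserved features first, then expression evidence to upgrade genes with non-canonical RSSs) can be summarized as a small decision rule. The sketch below is a deliberately simplified, hypothetical version: the class, field names, and boolean inputs are illustrative assumptions, and the actual curation (see Materials and Methods) also involved manual review of other conserved features.

```python
from dataclasses import dataclass

@dataclass
class GermlineCandidate:
    name: str
    full_orf: bool        # no frameshift or in-frame stop codon in the coding region
    canonical_rss: bool   # heptamer and nonamer match the canonical motifs
    expressed: bool       # observed in transcriptomes from naive ferrets

def classify(c: GermlineCandidate) -> str:
    """Hypothetical, simplified functional / ORF / pseudogene call."""
    if not c.full_orf:
        return "pseudogene"
    if c.canonical_rss:
        return "functional"
    # Intact ORF but non-canonical RSS: expression evidence upgrades the call,
    # mirroring the reclassification described in the text.
    return "functional" if c.expressed else "ORF"
```

For example, `classify(GermlineCandidate("example_V", True, False, True))` returns `"functional"`, whereas the same candidate without expression evidence would remain an ORF.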

We then designed new ferret-specific V(D)J assays that use primers targeting the C-regions and allow single-cell Ig/TCR repertoire analysis, similar to previous studies (18, 25). Individual ferret-specific primers were validated experimentally and screened in silico across 11 ferret genome assemblies to account for observed variation in the ferret C-regions, including allelic, isoform, and subtype variation. Finally, these new ferret-specific assays were evaluated by single-cell V(D)J and transcriptome sequencing of ferret PBMC and splenocyte samples on a commercial single-cell platform.

Generation of a complete collection of reference sequences and annotations for domestic ferret Ig and TCR constant region genes

Using PacBio Iso-Seq transcriptome sequencing, we obtained over 5.9 million circular consensus sequence (CCS) reads from 11 lymph node or splenocyte samples. Each CCS read represents the consensus sequence of a single transcript. About 89.75% of these CCS reads were full length (i.e., contained the 5’ cDNA primer, the 3’ cDNA primer, and a polyadenylation tail). Using the available IMGT annotations of the V, D, and J regions of Ig and TCR germline sequences from ferret, horse, dog, and cat as an IgBLAST database, we recovered 118,485 Ig and 4,746 TCR putative transcript sequences (2% and 0.08% of all full-length CCS reads, respectively) (Table S1). The average rates of mismatch, insertion, and deletion events in the C-region sequences were estimated to be 0.11%, 0.12%, and 0.04%, respectively, comparable with our previous observations (25).
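Per-read mismatch, insertion, and deletion rates of the kind quoted above are commonly derived from the alignment's CIGAR string. The sketch below assumes an extended CIGAR that uses `=` and `X` instead of `M` (for example, minimap2 run with `--eqx`) and takes the reference bases covered by the alignment as the denominator; both choices are assumptions, since the exact calculation is described in Materials and Methods rather than here.

```python
import re

# One CIGAR operation: a run length followed by an operator code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def error_rates(cigar: str) -> dict:
    """Per-alignment mismatch, insertion and deletion rates.

    Denominator: reference bases covered by the alignment (=, X and D).
    Clips (S, H), introns (N) and padding (P) are ignored.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op == "M":
            raise ValueError("'M' hides mismatches; re-align with =/X operators")
        if op in counts:
            counts[op] += int(length)
    ref_bases = counts["="] + counts["X"] + counts["D"]
    if ref_bases == 0:
        raise ValueError("alignment covers no reference bases")
    return {
        "mismatch": counts["X"] / ref_bases,
        "insertion": counts["I"] / ref_bases,
        "deletion": counts["D"] / ref_bases,
    }
```

Averaging these per-read rates weights every transcript equally; pooling the raw counts across reads before dividing would instead weight longer alignments more heavily.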

We extracted the C-regions from these putative Ig/TCR transcript sequences and constructed C-region consensus sequences from 202 clusters of highly similar sequences. These consensus sequences represented complete cDNA sequences for the entire ferret C-regions (i.e., including the 3’ UTR). We then filtered out low-quality consensus sequences and collapsed redundant consensus sequences representing the same transcript isoforms from the same genes based on their alignments to the reference ferret genome assembly sequences (GCA_011764305.2) to generate a set of non-redundant C-region consensus sequences. Furthermore, we only included consensus sequences with complete ORFs and examined the corresponding splice site sequences after aligning them to the ferret reference genome assembly. These ferret C-region reference sequences and annotations were further manually curated and named based on the orthologous human C-region genes (20, 21, 26, 27). Additional details are described in Materials and Methods.
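Building one consensus per cluster of highly similar C-region sequences can be illustrated with a column-wise majority vote. This is a generic sketch, not the authors' pipeline: it assumes the reads in a cluster have already been multiply aligned to equal length with `-` gaps, and the 0.5 support threshold is an illustrative choice.

```python
from collections import Counter

def majority_consensus(aligned: list, min_fraction: float = 0.5) -> str:
    """Column-wise majority-vote consensus of pre-aligned, equal-length sequences.

    Gaps ('-') take part in the vote; columns won by a gap are dropped.
    Columns whose top symbol does not exceed min_fraction become 'N'.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            out.append("N")          # no clear majority at this position
        elif symbol != "-":
            out.append(symbol)       # majority base is kept
    return "".join(out)
```

For instance, `majority_consensus(["ACGT-A", "ACGTTA", "ACCT-A"])` returns `"ACGTA"`: the minority C in column 3 is outvoted, and the column dominated by gaps is removed.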

As summarized in Table 1, Fig. 1 and 2, Fig. S1, and Table S2, we identified functional ferret orthologs for IGHA, IGHD, IGHE, IGHG2, IGHG4, IGHM, IGKC, IGLC, TRBC, TRDC, and TRGC. We also identified ferret orthologs for the components of the surrogate light chain, VpreB and λ5 (Table 1 and Fig. 1D). We detected multiple subclasses of ferret IGHG, IGLC, TRBC, and TRGC genes, based on the alignment of consensus sequences to the reference ferret genome assembly (GCA_011764305.2). For each IGH isotype, we also identified both membrane-bound and secreted transcript-splicing isoforms in the transcriptome (Table 1). We also observed apparent allelic variants (i.e., genetic variation among ferrets, evidenced by sequence differences between the transcripts obtained in this study and the reference genome assembly) for each of the IGH isotypes (Table 1 and Table S2, “Allelic Variants” tab). We also observed several additional low-frequency, alternatively spliced transcript isoforms in the Iso-Seq data; these are not included here because it is unclear whether they are technical artifacts. Overall, ferret Ig and TCR C-region genes are present on the reference assembly in the same relative order and orientation as in humans (Fig. 1A through C and 2A through D).

TABLE 1.

Summary of ferret Ig and TCR constant regions^a

Subclass | Form | No. of allelic variants | CCS counts (10 ferrets)
IGHA | Secreted | 4 | 14,197
IGHA | Transmembrane | | 336
IGHD | Secreted | 2 | 266
IGHD | Transmembrane | | 466
IGHE | Secreted | 3 | 5
IGHE | Transmembrane | | 0
IGHG2 | Secreted | 3 | 332,500
IGHG2 | Transmembrane | | 10,276
IGHG4 | Secreted | 2 | 28,572
IGHG4 | Transmembrane | | 322
IGHM | Secreted | 2 | 9,227
IGHM | Transmembrane | | 7,942
IGKC | | 1 | 2,519
IGLC1 | | 1 | 3,680
IGLC2 | | 1 | 7
IGLC3 | | 1 | 1,206
IGLC4 | Pseudogene | 1 | N/A^b
IGLC5 | | 1 | 1,206
IGLC6 | Pseudogene | 1 | N/A^b
IGLC7 | | 1 | 86
TRAC1 | | 1 | 33
TRBC1 | | 2 | 79
TRBC2 | | 1 | 109
TRDC1 | | 1 | 429
TRGC1 | Pseudogene | 1 | N/A^b
TRGC2 | Pseudogene | 1 | N/A^b
TRGC3 | | 1 | 43
TRGC4 | | 1 | 1
TRGC5 | | 1 | 38
TRGC5A | Pseudogene | 1 | N/A^b
TRGC6 | Pseudogene | 1 | N/A^b
TRGC7 | | 1 | 0
TRGC8 | | 1 | 0
TRGC9 | | 1 | 0
VpreB | | 1 | N/A^b
λ5 | | 1 | N/A^b

^a Ferret Ig and TCR constant regions and allelic variants were identified as described in Materials and Methods. Transcriptomes comprising splenocytes from a total of 10 individual ferrets (SRA accession numbers SRR29376976, SRR29376975, SRR29376974, SRR26825672, and SRR26825671) were assessed for the number of exact matches to each sequence. See Table S2 for detailed sequence information and Table S2, “Allelic Variants” tab, for sequences and counts of allelic variants.

^b N/A, the CCS read counts of pseudogenes and surrogate light chain genes were not included.

Fig 1.


Genomic organization of ferret Ig regions. (A) Heavy chains and associated V, D, and J genes; (B) Kappa light chains and associated V and J genes; and (C) Lambda light chains and associated V and J genes. Constant regions are annotated in orange, V regions are in blue, D regions are in yellow, J regions are in red, and surrogate light chain components VpreB and λ5 are shown in gray. The orientation of each gene is indicated with an arrow. Kappa annotations are on the sense strand of the reference contig but depicted on the antisense strand in (C). Genome positions relative to ferret reference genome assembly GCA_011764305.2, contigs JAADYL010000038.1 (heavy chain locus), JAADYL010000784.1 (light chain lambda locus), and JAADYL010000091.1 (light chain kappa locus) are indicated below the bar. See Fig. S1 for additional details.

Fig 2.


Genomic organization of ferret TCR regions. (A) TRA and associated V, D, and J genes; (B) TRB and associated V, D, and J genes; (C) TRG and associated V and J genes; and (D) TRD and associated V and J genes. Constant regions are annotated in orange, V regions are in blue, D regions are in yellow, and J regions are in red. The orientation of each gene is indicated with an arrow. Genome positions relative to ferret reference genome assembly GCA_011764305.2, contigs JAADYL010000821.1 (TCR alpha and delta chain locus) and JAADYL010000772.1 (TCR beta and gamma chain locus), are indicated below the bar. See Fig. S1 for additional details.

Generation of a reference annotation of domestic ferret Ig and TCR V, D, and J germline genes

We further identified and annotated Ig and TCR V, D, and J germline genes on the recent ferret reference genome assembly (GCA_011764305.2) using a combined analysis of ferret Ig/TCR transcripts and genomic sequences, and manually validated and curated them based on the presence of highly conserved features such as RSS signals (see Materials and Methods and Fig. S2). The numbers of Ig and TCR V-, D-, and J-region genes are summarized in Table 2, and their genomic arrangements are shown in Fig. 1 and 2 and Fig. S1. Detailed annotations of Ig and TCR V, D, and J genes are provided in Table S2 and the Supplementary Text. A large proportion of ferret Ig V genes were classified as pseudogenes or non-functional ORFs (35/63 for IGH, 96/136 for IGL, and 47/87 for IGK). For TRA, TRB, TRD, and TRG, we identified 96, 34, 5, and 16 V genes, respectively, of which 44, 15, 2, and 10 are non-functional (Table 2; Table S2).

TABLE 2.

Summary of ferret V, D, and J region counts for Ig and TCR^a

Class | Gene type | Functional | ORF | Pseudogene | Total
IGH | V | 28 | 1 | 34 | 63
IGH | D | 9 | 1 | 0 | 10
IGH | J | 6 | 1 | 1 | 8
IGK | V | 40 | 3 | 44 | 87
IGK | J | 4 | 1 | 0 | 5
IGL | V | 40 | 6 | 90 | 136
IGL | J | 7 | 1 | 0 | 8
TRA | V | 52 | 9 | 35 | 96
TRA | J | 36 | 19 | 5 | 60
TRB | V | 19 | 3 | 12 | 34
TRB | D | 2 | 0 | 0 | 2
TRB | J | 8 | 3 | 1 | 12
TRD | V | 3 | 0 | 2 | 5
TRD | D | 2 | 0 | 0 | 2
TRD | J | 2 | 0 | 3 | 5
TRG | V | 6 | 2 | 8 | 16
TRG | J | 4 | 6 | 11 | 21

^a Ferret Ig and TCR V, D, and J regions were identified and classified as functional, ORF, or pseudogene as described in Materials and Methods. For IGHV, 19 provisional genes found on separate contigs (see Materials and Methods and Table S2, “IGHV Provisional” tab) are not included.
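The non-functional V-gene fractions quoted in the text follow directly from Table 2, as does a rough count of germline V(D)J combinations per locus. The sketch below recomputes both from the table's values. The combinatorial count multiplies only functional gene counts and ignores junctional diversity, somatic hypermutation, and any pairing constraints, so it is only an illustration and is far below repertoire-size estimates such as those cited in the Introduction.

```python
from math import prod

# (functional, ORF, pseudogene) counts copied from Table 2
TABLE2 = {
    ("IGH", "V"): (28, 1, 34), ("IGH", "D"): (9, 1, 0), ("IGH", "J"): (6, 1, 1),
    ("IGK", "V"): (40, 3, 44), ("IGK", "J"): (4, 1, 0),
    ("IGL", "V"): (40, 6, 90), ("IGL", "J"): (7, 1, 0),
    ("TRA", "V"): (52, 9, 35), ("TRA", "J"): (36, 19, 5),
    ("TRB", "V"): (19, 3, 12), ("TRB", "D"): (2, 0, 0), ("TRB", "J"): (8, 3, 1),
    ("TRD", "V"): (3, 0, 2), ("TRD", "D"): (2, 0, 0), ("TRD", "J"): (2, 0, 3),
    ("TRG", "V"): (6, 2, 8), ("TRG", "J"): (4, 6, 11),
}

def non_functional_v(locus: str) -> tuple:
    """(ORF + pseudogene, total) V-gene counts for one locus."""
    functional, orf, pseudo = TABLE2[(locus, "V")]
    return orf + pseudo, functional + orf + pseudo

def combinatorial_diversity(locus: str) -> int:
    """Product of functional V, (D), and J counts; junctional diversity ignored."""
    return prod(TABLE2[key][0] for key in TABLE2 if key[0] == locus)
```

With these values, IGH gives 28 × 9 × 6 = 1,512 germline combinations, and pairing them with the 160 IGK plus 280 IGL light-chain combinations gives 1,512 × 440 = 665,280 heavy-light pairs.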

Ferret germline genes were generally located in the expected genomic contexts, based on comparisons with other species (Fig. 1 and 2). For example, IGHV genes generally map upstream of the heavy chain constant regions on the same contig (Fig. 1A), and IGLJ genes map between the seven IGLC genes (Fig. 1C) (20, 21). Most ferret RSSs identified in this study shared the canonical heptamer (CACAGTG) and nonamer (ACAAAAACC) sequences of both mice and humans (Table S2 and Fig. S3); however, several abundantly expressed genes were flanked by non-canonical heptamers and/or nonamers (Table S2). For example, some well-expressed genes used CGATTCGGA, CCATATTGT, GTCTTTGTC, or ACTTCTTGT instead of the canonical nonamer, or CAGTGTG or CATTGTG instead of the canonical heptamer.
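The RSS-based search for germline genes can be sketched as a scan for a heptamer, a 12- or 23-bp spacer, and a nonamer. This minimal version is not the authors' pipeline: it assumes the canonical motifs quoted above and a simple per-motif mismatch cap, and it scans one strand in the orientation found 3' of V genes (heptamer first). J-type RSSs (nonamer first) are found by scanning the reverse complement.

```python
HEPTAMER = "CACAGTG"   # canonical heptamer, as read 3' of a V gene
NONAMER = "ACAAAAACC"  # canonical nonamer

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def find_rss(seq: str, max_mm: int = 1, spacers: tuple = (12, 23)) -> list:
    """Return (heptamer_start, spacer_length, heptamer_mm, nonamer_mm) tuples for
    heptamer-spacer-nonamer matches on the given strand."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(HEPTAMER) + 1):
        h_mm = hamming(seq[start:start + len(HEPTAMER)], HEPTAMER)
        if h_mm > max_mm:
            continue
        for spacer in spacers:
            n_start = start + len(HEPTAMER) + spacer
            nonamer = seq[n_start:n_start + len(NONAMER)]
            if len(nonamer) < len(NONAMER):
                continue  # nonamer would run past the end of the sequence
            n_mm = hamming(nonamer, NONAMER)
            if n_mm <= max_mm:
                hits.append((start, spacer, h_mm, n_mm))
    return hits
```

A strict mismatch cap is a blunt instrument here: the non-canonical heptamers CAGTGTG and CATTGTG reported above each differ from CACAGTG at two positions and would be missed at `max_mm=1`, which is why expression evidence was needed to rescue such genes.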

Design of ferret-specific B and T cell V(D)J assays for single-cell analysis

We designed ferret-specific primer sets based on the Ig and TCR constant regions identified above, using consensus sequences that match all isoforms of each receptor class. We also checked these primers against 11 recently released ferret genome assemblies to reduce the potential effect of allelic variation on amplification coverage (see Materials and Methods). Primer sets are described in Table S3.

We validated the ability of our primers to amplify each of the C-region references in silico (Fig. S4) and their amplification specificity experimentally (Fig. 3A), and compared them with reverse primers targeting the C-regions of Ig and TRB genes from Wong et al. (20) and Gerritsen et al. (21), respectively (Fig. S4). For every Ig and TCR C-region gene in our reference, our primers amplified all of the corresponding C-region genes identified on the 11 ferret genome assemblies in silico, with the exception of the pseudogene TRGC6*01, whose truncated exon 1 lacks the inner primer binding site (Fig. S4). This coverage was equal to or better than that of the primers of Wong et al. (20) and Gerritsen et al. (21) for all genes (Fig. S4).
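The in silico screen can be approximated by asking whether each reverse primer has an acceptable binding site on each target C-region sequence. The sketch below is a simplified stand-in, assuming targets are given as sense-strand sequences; the mismatch cap and the requirement that the primer's 3'-terminal bases match exactly are illustrative assumptions, not the criteria from Materials and Methods.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def primer_binds(primer: str, target: str, max_mm: int = 2,
                 three_prime_exact: int = 3) -> bool:
    """True if a reverse primer has a plausible binding site on a sense-strand target.

    The primer's reverse complement is slid along the target. A site is accepted
    if it has at most max_mm mismatches and none of them fall in the primer's
    last three_prime_exact bases.
    """
    site = revcomp(primer)
    target = target.upper()
    k = len(site)
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        mismatches = [j for j in range(k) if window[j] != site[j]]
        # The primer's 3' end pairs with the first bases of its reverse complement.
        if len(mismatches) <= max_mm and all(j >= three_prime_exact for j in mismatches):
            return True
    return False
```

Coverage across assemblies would then be a check such as `all(primer_binds(p, seq) for seq in c_region_seqs)`, where `c_region_seqs` is a hypothetical list holding one gene's C-region sequence from each assembly.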

Fig 3.


Validation of ferret-specific primers. (A) Individual ferret Ig- and TCR-specific primers were used to amplify the corresponding C-region genes from cDNA isolated from a pool of three PBMC samples. Corresponding forward primers, located upstream but still within the C-regions, were designed solely for this assay. PCR amplicons from these reactions were analyzed on a 1% agarose gel to confirm amplification specificity. Left gel image: primers for the first enrichment reactions. Right gel image: primers for the second enrichment reactions. On each gel image, lanes 1, 9, and 14 are 100 bp ladders. Lanes for Igs are labeled IGHA (A), IGHD (D), IGHE (E), IGHG (G), IGHM (M), IGK (K), and IGL (L). Lanes for TRs are labeled TRA (A), TRB (B), TRD (D), and TRG (G). (B) Recovery of ferret V(D)J transcripts using the single-cell V(D)J assays with the ferret C-region-specific primers. The tiled plot shows the total number of ferret V(D)J transcript contigs with the respective ferret primer and C-region matches recovered from the ferret PBMC and splenocyte samples. See also Fig. S4.

Immune repertoire sequencing of ferret PBMC and splenocyte tissue samples

We next sought to validate these ferret-specific single-cell V(D)J assays experimentally. To this end, we applied our ferret-specific primers to ferret PBMC and splenocyte samples using the 10× Genomics paired gene expression and immune repertoire workflow, similar to reference 18. These V(D)J assays recover both the transcriptomic profile and the paired VDJ and VJ chains of single B and T cells. We sequenced over 16,000 single-cell barcodes from the PBMC and splenocyte samples. De novo assembly of the Ig and TCR enrichment sequencing libraries yielded 82,418 unfiltered, unannotated contigs. As shown in Fig. 3B, 68,003 contigs had ferret Ig/TCR enrichment primer hits, and 65,203 (95.9%) of them also had ferret Ig/TCR C-region matches, all of which corresponded to the C-region targeted by the primer hit, indicating that our ferret Ig/TCR enrichment primers were efficient and specific. All ferret Ig/TCR isotypes and subtypes, including the weakly expressed IGHE isotype, were recovered. The small percentage of contigs with primer hits but no C-region matches tended to have lower UMI counts and to be shorter (Table S3), suggesting that their transcripts might not have been fully assembled de novo because of low transcript abundance.
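The primer efficiency and specificity figures above amount to a tally over annotated contigs: among contigs with a primer hit, how many also have a C-region match, and whether that match agrees with the primer's target. The sketch below assumes each contig has already been annotated with both labels at the same granularity; the `Contig` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contig:
    primer_hit: Optional[str]  # C-region targeted by the matched enrichment primer
    c_region: Optional[str]    # C-region reference the contig aligns to

def primer_specificity(contigs: list) -> tuple:
    """Return (n with primer hit, n also with C-region match,
    n whose C-region agrees with the primer target, fraction with both)."""
    with_primer = [c for c in contigs if c.primer_hit is not None]
    with_both = [c for c in with_primer if c.c_region is not None]
    concordant = sum(c.primer_hit == c.c_region for c in with_both)
    fraction = len(with_both) / len(with_primer) if with_primer else 0.0
    return len(with_primer), len(with_both), concordant, fraction
```

Applied to the counts reported here, the fraction is 65,203 / 68,003 ≈ 95.9%, and full specificity corresponds to the concordant count equaling the number of contigs with both annotations.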

Separately, we filtered single-cell transcriptomic profiles to retain high-quality cells and removed cell barcodes suspected of being technical doublets (see Materials and Methods). After filtering, we obtained 12,331 cell barcodes in total (8,016 for the PBMC sample and 4,315 for the splenocyte sample), including 5,550 barcodes with annotated Ig VDJ and/or VJ contigs and 5,776 barcodes with TCR VDJ and/or VJ contigs. In addition, 810 cell barcodes had annotated TRG or TRD contigs. Interestingly, the frequency of gamma-delta T cells differed markedly between the two samples: 400 cell barcodes had a TRG and/or TRD contig in the PBMC sample (8.1% of all cell barcodes with at least one TCR contig), compared with 410 in the splenocyte sample (49.3%).
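The gamma-delta percentages above use, as the denominator, cell barcodes with at least one TCR contig rather than all cell barcodes. A minimal sketch of that calculation, assuming a hypothetical input that maps each barcode to the set of chain types recovered for it:

```python
TCR_CHAINS = {"TRA", "TRB", "TRG", "TRD"}
GAMMA_DELTA = {"TRG", "TRD"}

def gamma_delta_fraction(barcode_chains: dict) -> tuple:
    """Among barcodes with at least one TCR contig, count those with TRG and/or TRD.

    Returns (gamma-delta barcodes, barcodes with any TCR contig, fraction).
    """
    with_tcr = [chains for chains in barcode_chains.values() if chains & TCR_CHAINS]
    gamma_delta = sum(1 for chains in with_tcr if chains & GAMMA_DELTA)
    fraction = gamma_delta / len(with_tcr) if with_tcr else 0.0
    return gamma_delta, len(with_tcr), fraction
```

Barcodes with only Ig contigs (such as `{"IGH", "IGK"}`) are excluded from the denominator, so this fraction describes the composition of the recovered T cell compartment rather than of the whole sample.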

Parallel gene expression analysis allowed us to cluster single cells and served as a ground truth for assessing the pairing efficiency of our single-cell immune repertoire assay. Pairing efficiency is defined as the fraction of captured B or T cells for which paired VDJ and VJ chains were recovered. To integrate the gene expression profiles and immune repertoires of single cells, we clustered PBMC and splenocyte cells in UMAP space and examined the VDJ-VJ pairing efficiency in each cluster with at least 50 cells. Based on the detection of Ig contigs, we captured four clusters of B cells with at least 50 cells in the PBMC sample (Fig. 4A and B; Table S4). Figure 4C shows the Ig pairing efficiencies in the ferret PBMC sample, which range from 84.8% (cluster 3) to 89.8% (cluster 5). Among the three T cell clusters in the PBMC sample, the TCR pairing efficiencies were 60.7% for cluster 1 and 72.2% for cluster 2 but only 41% for cluster 8 (Fig. 4C). To investigate the apparently low TCR pairing efficiency of cluster 8 and to check whether our Ig/TCR analysis might have missed other B and T cell clusters, we predicted the cell type of each cell using SingleR (28) (see Materials and Methods, Fig. S5, and Table S5). We also examined the expression of ferret orthologs of several canonical markers for human B (CD79A, CD79B, and CD19) and T (CD3D, CD3E, and CD3G) cells in each cell cluster (Fig. S6). We found that cluster 8 included two subclusters of cells, one of T cells and one of NK cells (Fig. S5A and S5B), and that the TCR pairing efficiency of the T cell subcluster was 63.3% (Fig. S5C). The same cell type prediction also indicated that cluster 6 was a mixture of B and T cells (Fig. S5B), but its Ig and TCR pairing efficiencies were extremely low because few Ig/TCR contigs were detected (Fig. 4A). We compared the gene expression profiles of cluster 6 with those of clusters 2 (the closest T cell cluster in Fig. 4A) and 7 (the closest B cell cluster).
Functional analysis of both lists of the differentially expressed genes showed that cluster 6 was highly enriched with cell death-related biological functions (Table S6), suggesting that cells in cluster 6 might have failed to fully rearrange their VDJ/VJ regions, leading to programmed cell death.
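Pairing efficiency, as defined above, is the fraction of cells in a cluster with both a VDJ chain (IGH for B cells; TRB or TRD for T cells) and a VJ chain (IGK or IGL; TRA or TRG). The sketch below computes it per cluster and skips clusters with fewer than 50 cells, as in the text. It assumes that cells with no recovered chains count toward the denominator, as the breakdown in the Fig. 4 legend suggests; the `(cluster, has_vdj, has_vj)` input format is hypothetical.

```python
from collections import Counter, defaultdict

def chain_status(has_vdj: bool, has_vj: bool) -> str:
    """Classify one cell by which receptor chains were recovered."""
    if has_vdj and has_vj:
        return "paired"
    if has_vdj:
        return "VDJ-only"
    if has_vj:
        return "VJ-only"
    return "none"

def pairing_efficiency(cells, min_cells: int = 50) -> dict:
    """cells: iterable of (cluster, has_vdj, has_vj) for one receptor type.

    Returns {cluster: (fraction paired, status counts)} for clusters with
    at least min_cells cells.
    """
    by_cluster = defaultdict(Counter)
    for cluster, has_vdj, has_vj in cells:
        by_cluster[cluster][chain_status(has_vdj, has_vj)] += 1
    result = {}
    for cluster, counts in by_cluster.items():
        n = sum(counts.values())
        if n >= min_cells:
            result[cluster] = (counts["paired"] / n, dict(counts))
    return result
```

Keeping the full status breakdown alongside the paired fraction is what allows the VDJ-only pattern of pre-B cell clusters, discussed below, to be distinguished from a general loss of contigs.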

Fig 4.


Validation of the ferret single-cell gene expression and immune repertoire profiling assay. (A) UMAP plots showing (left to right) cell cluster assignment, Ig chains, and TCR chains in the ferret PBMC sample. (B) Same as (A) but for the splenocyte sample. (C) Ig and TCR pairing efficiencies of the cell clusters in the PBMC sample. Percentages and numbers indicate the frequency of B and T cells with paired chains (VDJ, VJ), VDJ-only, VJ-only, or no recovered chains. (D) Same as (C) for the splenocyte sample. (E) IGH isotype usage of B cell clusters in the splenocyte sample shown in (D). (F) Expression of selected B cell development-related marker genes in individual cell clusters of the splenocyte sample. The cell types of cell clusters in the PBMC (A) and splenocyte (B) samples were annotated collectively based on the detection of Ig and TCR transcripts, SingleR-based cell type prediction, and manual review of selected canonical cell markers. Canonical markers used: common T cell markers: CD3D, CD3E, CD3G; common B cell markers: CD79A, CD79B, CD19; neutrophil in PBMC: S100A8, S100A9; monocyte: S100A4, LOC101672794 (PYCARD), LOC101687145 (FCN1); plasma cell: JCHAIN, TNFRSF17, MZB1; pre-B cell: surrogate light chains VPREB1 (LOC101692964) and IGLL5 (LOC101692660); pro-B cell: EBF1, DNTT; myelocyte/immature neutrophil: MMP9, MMP8, NCF1, NCF2, NCF4; and NK cell: NCR3, NCR1 (LOC101670751). n/a in (B) indicates that the cell type was not annotated because cluster #12 contained very few cells (20). For additional details, see Fig. S5 for the cell type prediction and Table S4 for differentially expressed genes in each cell cluster.

Similarly, there were four clusters of B cells in the splenocyte sample (Fig. 4B). The Ig pairing efficiency was 81.9% for cluster 1 and 92.4% for cluster 11 (Fig. 4D). Most B cells in cluster 11 had undergone class switching, based on the high percentage of cells with IGHG and IGHA contigs (Fig. 4E). The high expression of plasma cell marker genes such as TNFRSF17 and MZB1 also suggests that cluster 11 consisted of plasma cells (Fig. 4F), consistent with their higher Ig expression and higher Ig pairing efficiency. However, the Ig pairing efficiencies were much lower for clusters 2 (45%) and 5 (13%) (Fig. 4D). These two clusters also included much higher percentages of cells with only IGH contigs (cluster 2: 49.6%; cluster 5: 85.7%) and had high expression of the surrogate light chains VPREB1 and IGLL5 (Fig. 4F), indicating that these cells were in the pre-B cell stage of development. At this stage, the IGH germline sequence has undergone VDJ rearrangement and a pre-B cell receptor is expressed on the cell surface; because neither the IGK nor the IGL locus has yet been rearranged, the pre-B cell receptor pairs the IGHM heavy chain with a surrogate light chain (10).

Cell type prediction also suggested that cluster 8 in the splenocyte sample consisted of pro-B cells (Fig. S5D and S5E), consistent with the high expression of related markers such as DNTT (Fig. 4F) (29). This suggests that the very low detection of Ig contigs in cluster 8 (Fig. 4B) was likely due to the absence of VDJ rearrangements at the pro-B cell stage. In addition, cluster 4 in the splenocyte sample was predicted to be a cluster of B cells (Fig. S5D and S5E) but also had very low detection of Ig or TCR contigs (Fig. 4B). We therefore used the RNA-seq data to estimate whether cells in cluster 4 expressed Ig transcripts (see Materials and Methods). The percentage of cells in cluster 4 with RNA-seq reads aligned to genomic regions annotated with Ig variable genes was comparable with that of B cell cluster 1 and plasma cell cluster 11 and, like theirs, much higher than that of non-B cell clusters 3, 6, 7, 9, and 10 (Table S5), indicating that the cells in cluster 4 were very likely B cells. Interestingly, this analysis showed that pre-B cell clusters 2 and 5 had higher percentages of cells with IGH-only expression (Table S5), in line with the VDJ analysis described above (Fig. 4D). It also showed that pro-B cluster 8 had a slightly higher percentage of cells with IGH-only expression, suggesting that partial IGH rearrangement or expression might have occurred in those cells. We also compared the gene expression profiles of cluster 4 with those of clusters 1 (high Ig pairing efficiency, representing mature B cells) and 2 (high percentage of IGH-only cells, representing B cells in early development). Functional analysis of both lists of differentially expressed genes showed that cluster 4 was highly enriched for cell death-related biological functions (Table S6).
Since we were not able to assemble full-length Ig transcripts by V(D)J sequencing, these results suggest that the overall quality of RNA, including Ig transcripts, in cluster 4 cells may have been too low, owing either to low cell quality or to cell death triggered by incomplete VDJ/VJ rearrangements. Cluster 10 in the splenocyte sample was predicted to be a cluster of T and NK cells (Fig. S5D and S5E), but high expression of NK cell markers such as NCR3 and NCR1 (LOC101670751) and low expression of T cell markers such as CD3E and CD3G suggest that cluster 10 consists predominantly of NK cells (Fig. S5F).
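The RNA-seq-based check for Ig expression can be sketched as recording, per cell, which Ig loci have at least one aligned read inside an annotated V gene, and then summarizing the fraction of cells with any Ig V signal and with IGH-only signal. This is a hypothetical simplification of the procedure in Materials and Methods: the input formats and contig names are placeholders, coordinates are 0-based and half-open, and the linear interval scan would need an interval index for real data.

```python
from collections import defaultdict

def cells_with_ig_v_reads(reads, v_intervals: dict) -> dict:
    """reads: iterable of (cell_barcode, contig, position).
    v_intervals: {locus: [(contig, start, end), ...]}, 0-based half-open.
    Returns {barcode: set of loci with >= 1 read inside an annotated V gene}."""
    hits = defaultdict(set)
    for barcode, contig, pos in reads:
        for locus, intervals in v_intervals.items():
            if any(c == contig and start <= pos < end for c, start, end in intervals):
                hits[barcode].add(locus)
    return hits

def summarize(hits: dict, barcodes: list) -> tuple:
    """Fractions of the given cells with any Ig V reads and with IGH-only V reads."""
    n = len(barcodes)
    any_ig = sum(1 for b in barcodes if hits.get(b))
    igh_only = sum(1 for b in barcodes if hits.get(b) == {"IGH"})
    return any_ig / n, igh_only / n
```

Running `summarize` once per cluster, with that cluster's barcodes, gives the kind of per-cluster comparison described above (for example, B cell clusters versus non-B cell clusters).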

Thus, the observed variations in ferret Ig and TCR pairing efficiencies corresponded to the expected biological functions and developmental progression of cells in these compartments, demonstrating that our analyses are able to elucidate the Ig and TCR transcript abundances in individual ferret B and T cells and link them to their developmental stages. The resources developed here are therefore suitable for efficient single-cell-based immune repertoire sequencing analysis of ferret cells.

DISCUSSION

Ferrets are a valuable small animal model for several human diseases, including influenza, SARS-CoV-2, and cystic fibrosis. However, research with this model has been limited by a lack of reagents and genetic information, especially as immune repertoire studies have become widely used in studies of humans and other species. In this study, we have identified and annotated both constant and variable regions for ferret Igs and TCRs. We used long-read transcriptome sequencing, combined with available genome sequences, to generate the first complete reference of C-regions from all expected ferret Ig and TCR isotypes and chain types, as well as an extensive reference annotation of V, D, and J germline genes. We also developed and experimentally validated the first ferret-specific single-cell paired gene expression and immune repertoire profiling assays compatible with the high-throughput 10× Genomics platform. These results mark a major advancement in ferret immunogenetic resources and the potential for improved translational medicine using domestic ferrets as an animal model of infectious diseases.

We found two TRBC genes in ferrets, as did Gerritsen et al. (21), which is also the known number of TRBC genes in humans, dogs, rhesus macaques, and cats (26). We also identified 10 TRGC genes, of which six are functional (Table 1); dogs have six functional TRGC genes (30). IGHG is also known to have multiple subclasses, including four in dogs (31), four in mink (32), and three in cats (33); we found two IGHG subclasses in ferrets. The genomic version of one of these subclasses (IGHG2) is classified here as a pseudogene (Table S2) because of apparent frameshift mutations in the sequence; however, these may represent sequencing errors, since mRNA transcripts of this gene with the frameshifts corrected (here referred to as IGHG2*02 and IGHG2*03) are present at high levels in the transcriptomes, whereas IGHG2*01 (the frameshifted version) represents only 0.15% of the total transcripts of secreted IGHG2 (Table 1; Table S2, “Allelic Variants” tab). No exact matches for the genomic sequences of IGA were found in the transcriptomes; this may reflect an allelic variant not present in the 10 ferrets analyzed or a sequencing error in the genomic sequence. In humans, there are seven IGLC genes, of which three (IGLC4, IGLC5, and IGLC6) are pseudogenes (34). Ferrets also have seven IGLC genes, positioned in the same arrangement as in the human genome (34) (Fig. 1C); however, only two (IGLC4 and IGLC6) appear to be pseudogenes, since, unlike the human gene, ferret IGLC5 has no premature stop codon and an intact splice signal. The sequences of IGLC3 and IGLC5 are identical, and the proteins encoded by IGLC1, IGLC3, and IGLC5 are identical (Table S2).

We identified both membrane-bound and secreted forms of all IGHC genes, distinguished by distinct exon usage at the 3’ end of the transcripts. As expected, IGA and IGG transcripts were dominated by the secreted forms, whereas the transmembrane forms of IGM and IGD were more common in the transcriptomes. Interestingly, an allelic variant of IGD (IGHD*02) containing an 18-base-pair insertion in the transmembrane exon was identified in the transcriptomes (Table S2, “Allelic Variants” tab). Unsurprisingly, IGE expression in splenocytes from these young, healthy, naïve, laboratory-housed ferrets was very low. Similarly, TCR expression in the transcriptomes was generally much lower than that of immunoglobulins, with some subclasses undetected; allelic variants of TCR constant genes are very likely present in ferrets as well, but the low counts make them difficult to identify confidently. Supporting this, a previously published TRBC sequence (here termed “TRBC1*01”) differs by one base pair from the sequence identified here (“TRBC1*02”) (Table S2, “Allelic Variants” tab), and these are presumably allelic variants at this locus. Additional alternative splicing events were identified at low frequency in the lymph node and splenocyte transcriptomes, including retained introns, skipped exons, and variation in exon start positions, as well as some apparent scFv sequences. The abundance of these apparent splice variants in other tissues, and their biological function, if any, require further study.

For each of the constant regions (IGHC, IGKC, IGLC, TRAC, TRBC, TRDC, and TRGC), we identified V and J genes (and, where expected, D genes) in the expected genomic contexts (Fig. 1 and 2; Fig. S1). Genome analysis also identified variable genes in the TRB and TRD regions without corresponding constant regions (Fig. 2B and C; Fig. S1). In some cases, putative VH genes were identified on a contig that did not overlap with constant regions, so their genomic context could not be accurately determined. These genes were assigned provisional names pending more genome data and are listed in Table S2, “IGHV Provisional” tab.

The degree of overlap between the VDJ-region genes identified here and those reported by others varied among the different IG and TR regions. For example, Wong et al. (20) identified 29 IGHV, 53 IGKV, and 34 IGLV genes, whereas this study identifies 82 (including 19 provisional genes: Table S2, “IGHV Provisional” tab), 87, and 136, respectively, together with genomic context and additional sequence information (e.g., leader sequences) that was not available to previous work. Gerritsen et al. (21) identified 27 TRBV, 2 TRBD, and 12 TRBJ genes, and interestingly, we did not identify any new genes in the TRB region. In some cases, putative Ig or TCR V-region genes that did not meet our filtering criteria did overlap with the sequences reported by Wong et al. (20) or Gerritsen et al. (21); these may represent allelic variants (given the outbred nature of the ferrets used in these studies) or pseudogenes. Where genes identified here are identical to previously identified genes (e.g., 21 of the 27 TRBV genes described by Gerritsen et al. (21)), we have noted this in a “Comment” column (Table S2, “Previously Identified” tab); some non-identical genes likely also represent allelic variants of previously described genes.

Although we examined transcriptome data from only a limited number of ferrets from a single supplier, allelic variants in the C regions were readily detected in these data sets (e.g., four alleles of IGHA, three of IGHE and IGHG2, and two each of IGHD, IGHG4, and IGHM: Table 1; Table S2, “Allelic Variants” tab), including both point variants (e.g., IGHA1-1 vs IGHA1-2) and insertion/deletion variants (e.g., a 6-amino-acid insertion near the C-terminus of the membrane-anchored form of IGHD1*02 vs IGHD1*01). In addition, several sequences identified here are ~99% identical to previously reported sequences in the same genomic context (20, 21) and are therefore potentially allelic variants of them. Much more allelic variation likely exists in the worldwide population of laboratory and domestic ferrets. We did not attempt to identify V/D/J allelic variants in this study because factors such as somatic hypermutation may complicate the accurate detection of allelic variants in rearranged Ig/TCR transcript sequences; however, V/D/J allelic variants are likely also common. Consensus sequence generation for the most abundantly expressed IGHV, IGKV, and IGLV genes revealed some variation between alleles and subclasses, including at the positions of the primers designed by Wong et al. (20). Our catalog of C-region variation allowed us to account for sequence variation when designing our ferret-specific assays, which target positions within Ig and TCR C-region genes with no or minimal sequence variation across subtypes, isoforms, and alleles. We demonstrate that these primers amplify a substantial number of highly abundant V-region genes, as well as multiple less abundant V-region genes that are inefficiently amplified by previous Ig multiplex PCR (MPCR) assays. Although the primers designed by Gerritsen et al. (21) achieved 100% amplification efficiency for each TRBC sequence in our reference, their positions and melting temperatures were not compatible with the 10× Genomics immune profiling assay. We were able to target and enrich every Ig and TCR chain type and isotype and to recover B and T cells with matching transcriptomic and paired repertoire profiles. Analyzing these two data types together provides valuable insights into the functionality and development of B and T cells, as exemplified by our analysis of the IGH-only B cells detected in our splenocyte sample.

By combining genome analysis with full-length transcriptome sequencing data, we were able to greatly increase the number of annotated Ig and TCR V-, D-, and J-region genes, including over 100 new Ig V-, D-, and J-region genes. The ability to annotate the V, D, and J genes of Ig and TCR transcripts correctly and robustly is critical to future single-cell immune repertoire profiling assays. Overall, the use of these annotations and single-cell sequencing assays provides a valuable resource to infectious disease researchers utilizing the domestic ferret animal model.

Limitations of the study

This study has several limitations. First, we sequenced the lymph node and spleen samples directly using Iso-Seq, so Ig and TCR transcripts made up only a small percentage of the sequencing data. In some cases, this made it challenging to construct accurate reference sequences, and we had to rely on additional analyses such as transmembrane domain searches in CCS reads. Sequencing isolated B and T cells would substantially improve the coverage of Ig and TCR transcripts, reducing the computational challenges and likely improving the quality of the final references, but would increase the experimental complexity for a species lacking reagents. In addition, the detection and annotation of Ig and TCR genes could be biased toward genes expressed and detected in the specific samples used in this study. Because IMGT continues to curate and update its databases with newly discovered germline V, D, and J gene sequences even for well-studied genomes such as those of humans and mice, it is very likely that additional ferret V, D, and J genes will be identified. Second, the allelic variants we observed were based on the analysis of transcript sequences, which may not provide even coverage across all positions; genomic DNA sequencing would be more accurate and lead to more complete results. Third, we annotated V, D, and J genes on an unpublished draft ferret reference genome assembly. Although this assembly was built with third-generation long-read sequencing technology and is significantly more contiguous than the previously released ferret assembly MusPutFur1.0 (contig N50 increased from 44.8 kb to 23.6 Mb), it is not haplotype-resolved. Therefore, some of the annotated alleles may reflect heterozygosity or a collapse of duplicated segments. The sequences of some constant genes were not found in the analyzed cDNA data sets, which could indicate sequencing errors in the assembly. We identified 57% of the V, D, and J non-pseudogenes of the entire ferret germline repertoire as expressed in full length with 100% identity in cDNA sequences (Table S2). In the future, haplotype-resolved ferret genome assemblies generated with longer, higher-quality reads and improved assembly algorithms will further improve the ferret immune repertoire references. Fourth, for the single-cell analysis, we analyzed only PBMC and splenocyte samples from naïve animals; samples from infected and/or vaccinated animals may provide additional information about ferret Ig and TCR repertoires. Fifth, we predicted and classified ferret cell types using human references and a small number of canonical human immune cell markers, but we did not perform a detailed benchmark of the accuracy of these predictions.

MATERIALS AND METHODS

Ferret tissue sample preparation and Iso-Seq transcriptome sequencing

All ferrets originated from Triple F Farms LLC (Gillett, PA). To construct cDNA libraries for Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing, RNA was isolated from two tissue types: lymph node (LN) and splenocytes. Splenocyte samples were from 4- to 6-month-old female specific pathogen-free (SPF) ferrets that were serologically negative for influenza (used under CDC IACUC protocols #2885YORFERC and 3168YORFERC). Lymph node samples were archived samples from 16- to 19-week-old male and female ferrets in a Bartonella infection study (used under NC State University IACUC #18-175-B). Additional information on the ferrets and samples, including source, age, sex, and infection/treatment, is provided in Table S1.

Tissue from five colonic and three cranial mesenteric lymph nodes was flash frozen in liquid nitrogen. No more than 100 mg of frozen tissue was placed in at least 10 volumes of RNAlater ICE (Invitrogen) overnight at −20°C. After overnight incubation, tissue samples were transferred to 2 mL tubes containing 2.8 mm ceramic beads and 1 mL TRIzol (Invitrogen) for RNA isolation. Samples were homogenized in a FastPrep 24 (MP Biomedicals) for 2–4 cycles of 20 s at speed 4.5, with 1 min rests on ice between cycles; the exact number of cycles was determined by visual inspection of tissue homogenization. Homogenized samples were processed according to the TRIzol RNA procedure up to isolation of the aqueous phase, after which each aqueous phase was precipitated with 1.5 volumes of 100% ethanol. Samples were then applied to RNeasy (Qiagen) columns, treated with DNase, and purified according to the manufacturer’s protocol. RNA was quantified using a NanoDrop (Thermo) and assessed for RNA integrity number (RIN) using a Bioanalyzer (Agilent). Only samples with a RIN ≥ 7.5 were used for Iso-Seq library construction. Iso-Seq libraries were constructed from 300 ng input RNA per sample following the manufacturer’s specifications (Iso-Seq Express Template Preparation for Sequel and Sequel II Systems, Iso-Seq Express 2.0 Workflow) and sequenced on a Sequel II System (Delaware Biotechnology Institute at the University of Delaware). RNA samples were barcoded separately and pooled for sequencing.

Ferret splenocytes were stored in liquid nitrogen in 10% DMSO and then processed similarly to the lymph nodes. Total RNA from splenocytes was extracted with the RNeasy Micro Kit (QIAGEN). High-quality RNA (RNA integrity number over 7.0) was used for cDNA synthesis following the Iso-Seq Express template preparation protocol for the Sequel and Sequel II sequencing systems. cDNAs were amplified, and those larger than 2 kb were selected with ProNex beads (Promega). cDNA SMRTbell libraries were prepared and sequenced on a PacBio Sequel II system in the core facility at the Centers for Disease Control and Prevention. RNA samples were barcoded separately and pooled for sequencing.

Raw PacBio Iso-Seq sequencing data were first processed with the circular consensus sequence (CCS) protocol in SMRT Link v8.0 to generate CCS reads. Each CCS read represents the consensus sequence from multiple passes over a single transcript. CCS reads were classified as full length if a polyadenylation tail, the 5’ cDNA primer, and the 3’ cDNA primer were all detected, and as non-full length otherwise.

Identification and annotation of ferret Ig and TCR constant regions

Here, we aimed to generate complete cDNA sequences representing the entire ferret C region (i.e., including the 3’ UTR) from the Iso-Seq data. We first identified putative full-length ferret Ig and TCR transcript sequences by aligning all full-length ferret Iso-Seq CCS reads to a custom Ig/TCR database using IgBLAST (v1.16.0) (35), with an e-value cutoff of 0.01. Because the domestic ferret reference sequences in the IMGT database were incomplete before this project was finalized, our custom IgBLAST database was composed of Ig and TCR V, D, and J sequences from several related species, including dog, cat, horse, and TRB. Full-length CCS reads were retained as putative Ig/TCR sequences for downstream analyses if they had a variable gene hit with an e-value below 10^−50 (i.e., −log10[e-value] > 50) and a joining gene hit with an e-value below 10^−3 (i.e., −log10[e-value] > 3).
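The read-retention rule above can be sketched as a filter over IgBLAST results. This is a minimal illustration, not the authors' code: it assumes IgBLAST was run with AIRR tabular output (`-outfmt 19`), in which the `v_support` and `j_support` columns hold the V and J alignment e-values, and it uses our reading of the thresholds (V below 1e-50, J below 1e-3). The read names are hypothetical.

```python
import csv
import io

# Thresholds as read from the text: -log10(e) > 50 for V, > 3 for J.
V_EVALUE_MAX = 1e-50
J_EVALUE_MAX = 1e-3


def putative_ig_tcr_reads(airr_tsv_text):
    """Return sequence_ids of reads whose V and J hits pass both cutoffs.

    Assumes IgBLAST AIRR tab-delimited output, where v_support and
    j_support are the V and J alignment e-values (empty if no hit).
    """
    kept = []
    reader = csv.DictReader(io.StringIO(airr_tsv_text), delimiter="\t")
    for row in reader:
        v_e, j_e = row.get("v_support", ""), row.get("j_support", "")
        if not v_e or not j_e:  # no V or no J hit reported
            continue
        if float(v_e) < V_EVALUE_MAX and float(j_e) < J_EVALUE_MAX:
            kept.append(row["sequence_id"])
    return kept


example = (
    "sequence_id\tv_support\tj_support\n"
    "read1\t1e-80\t1e-05\n"  # passes both cutoffs
    "read2\t1e-30\t1e-05\n"  # V hit too weak
    "read3\t1e-90\t\n"       # no J hit
)
print(putative_ig_tcr_reads(example))  # ['read1']
```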

Next, we extracted the C-region sequence from each putative full-length ferret Ig/TCR transcript sequence, starting from the J-region end position reported by the IgBLAST alignment. Highly similar extracted C-region sequences were clustered using CD-HIT v4.6.6 (36) with the following parameters: -c 0.99 -G 0 -aL 0.95 -AL 100 -aS 0.99 -AS 30 -d 0 -T. Only clusters containing at least 10 extracted C-region sequences were kept, and a consensus C-region sequence was generated from each cluster. Alternative alleles supported by approximately 50% of the CCS reads that aligned uniquely to the corresponding position on the reference genome assembly with more than 98% identity were interpreted as allelic variants. We used the cons tool from the EMBOSS v6.6.0 suite (37) to generate consensus sequences and to identify and remove any remaining insertions at predominantly gapped positions. To ensure that these consensus sequences represented true C-region sequences captured by our Iso-Seq analysis, we assessed their quality by calculating the percentage consensus at each base position as well as the insertion and deletion rates. Consensus sequences were removed from the analysis if they had more than 10 base positions with a consensus percentage below 90%, an insertion rate above 0.0015, or a deletion rate above 0.0007.
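As a sketch of the quality filter described above, the per-position consensus percentage can be computed from a gapped multiple sequence alignment and combined with the stated thresholds. The text does not define exactly how the insertion and deletion rates were computed, so this example assumes they are supplied as precomputed numbers; the toy alignment is made up.

```python
from collections import Counter


def column_consensus_percent(msa):
    """Per-column percentage of the most common character in a gapped MSA.

    msa: list of equal-length aligned strings ('-' marks a gap).
    """
    n = len(msa)
    return [100.0 * Counter(col).most_common(1)[0][1] / n for col in zip(*msa)]


def passes_quality(consensus_pct, insertion_rate, deletion_rate,
                   min_pct=90.0, max_low_positions=10,
                   max_ins=0.0015, max_del=0.0007):
    """Apply the thresholds stated in the text to one consensus sequence.

    insertion_rate / deletion_rate are assumed to be precomputed; their
    exact definition is not given in the text.
    """
    low = sum(1 for p in consensus_pct if p < min_pct)
    return (low <= max_low_positions
            and insertion_rate <= max_ins
            and deletion_rate <= max_del)


msa = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTAC-T"]
pct = column_consensus_percent(msa)
print(pct)  # [100.0, 100.0, 100.0, 75.0, 100.0, 100.0, 75.0, 100.0]
print(passes_quality(pct, insertion_rate=0.001, deletion_rate=0.0005))  # True
```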

To remove potentially redundant C-region consensus sequences, we performed a second round of clustering and consensus sequence generation after aligning them to the recently released ferret reference assembly (NCBI accession no. GCA_011764305.2), which was generated by JCVI from the haploid genome of an outbred adult male ferret using a combination of short and long reads (25). The corresponding NCBI RefSeq genome assembly is GCF_011764305 (23). This recent ferret genome assembly appears to be much more contiguous (contig N50 of ~24 Mb vs. ~45 kb) than the previously published draft ferret genome sequences (38). We aligned the ferret C-region consensus sequences generated above to the reference genome using minimap2 (39) with the parameters -ax splice:hq -uf --secondary=no. This round of consensus generation used the collapse_isoforms_by_sam.py script from the cDNA_Cupcake Iso-Seq toolkit (https://github.com/Magdoll/cDNA_Cupcake) to produce full-length, non-redundant consensus sequences. We then remapped these updated C-region consensus sequences to the same reference assembly with minimap2 and visualized the alignments in the Integrative Genomics Viewer (IGV) (40). The aligned ferret C-region consensus sequences were manually curated against overlapping alignments of IMGT-annotated C-region sequences.
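For readers reproducing the alignment step, the minimap2 call can be wrapped as below. Only the options quoted in the text are used: `-ax splice:hq` selects the high-quality spliced-alignment preset with SAM output, `-uf` restricts canonical splice-site search to the transcript strand, and `--secondary=no` suppresses secondary alignments. The FASTA and SAM file names are hypothetical placeholders.

```python
import shutil
import subprocess


def minimap2_splice_cmd(reference_fasta, query_fasta):
    """Build the minimap2 invocation with the parameters given in the text."""
    return ["minimap2", "-ax", "splice:hq", "-uf", "--secondary=no",
            reference_fasta, query_fasta]


def run_alignment(reference_fasta, query_fasta, out_sam):
    """Run minimap2 (if installed) and write SAM output to out_sam."""
    if shutil.which("minimap2") is None:
        raise RuntimeError("minimap2 not found on PATH")
    with open(out_sam, "w") as fh:
        subprocess.run(minimap2_splice_cmd(reference_fasta, query_fasta),
                       stdout=fh, check=True)


# Hypothetical file names; only the command is printed here.
print(" ".join(minimap2_splice_cmd("ferret_genome.fa", "c_region_consensus.fa")))
```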

To check that these Iso-Seq-derived ferret C-region consensus sequences are functional, we performed two assessments. First, we translated the C-region consensus sequences using the EMBOSS transeq tool (37) and kept the frame with the longest amino acid sequence for each consensus sequence. The amino acid sequences were then analyzed against the Pfam collection of protein families (41) for functional domain prediction, particularly the presence of C1-set or DUF1968 domains. C1-set domains are found in immune system molecules such as Igs and TCRs, and DUF1968 domains are characteristic of the TRA C-region domain. Second, after aligning the C-region consensus sequences to the ferret reference genome assembly, we manually examined the corresponding donor and acceptor splice site sequences. Because we observed a high rate of predicted C-region domains (C1-set and DUF1968) in the translated sequences, together with corresponding canonical donor (GT) and acceptor (AG) splice sites, we removed consensus sequences without a predicted C1-set or DUF1968 domain from further consideration.
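The splice-site examination above was done manually; a minimal sketch of the same check is shown here for illustration only. It assumes the exon blocks of one consensus alignment are available as sorted, 0-based half-open coordinates on the plus strand, and tests whether each intron begins with GT and ends with AG. The toy genome is made up.

```python
def intron_splice_signals(genome_seq, exons):
    """Check the GT...AG rule for every intron between consecutive exons.

    genome_seq: genomic sequence (plus strand).
    exons: sorted list of (start, end) 0-based half-open exon blocks.
    Returns (donor, acceptor, is_canonical) for each intron.
    """
    results = []
    for (_s1, e1), (s2, _e2) in zip(exons, exons[1:]):
        intron = genome_seq[e1:s2].upper()
        donor, acceptor = intron[:2], intron[-2:]
        results.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return results


genome = "ATGCCC" + "GTAAGTTTTTCAG" + "GCCTGA"   # exon, intron, exon
print(intron_splice_signals(genome, [(0, 6), (19, 25)]))
# [('GT', 'AG', True)]
```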

Immunoglobulins can be secreted or membrane-bound. We therefore further classified the IGH C-region consensus sequences of each IGH isotype as membrane-bound or secreted isoforms based on the presence or absence of a transmembrane domain. To do this, we translated each consensus sequence using transeq (37) and submitted the amino acid sequences to the TransMembrane Hidden Markov Model (TMHMM) sequence analysis tool (v2.0c) (42) or DeepTMHMM (43). To identify the corresponding membrane-bound isoforms more thoroughly, we first aligned all CCS reads to their respective secreted isoforms (see below) to identify the corresponding gene and then scanned the aligned CCS reads for transmembrane domains in the same way. This analysis also identified a membrane-bound IGHE isoform, which was evaluated as described above and added to our C-region reference set.
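The secreted vs. membrane-bound split can be sketched by parsing TMHMM results. This assumes TMHMM 2.0 was run with its `-short` option, which reports one tab-separated line per sequence with `key=value` fields such as `PredHel=1` (the number of predicted transmembrane helices); DeepTMHMM output has a different format and is not handled. Sequence names and values below are invented.

```python
def classify_isoforms(tmhmm_short_output):
    """Label each sequence membrane-bound if >= 1 helix is predicted.

    Assumes TMHMM 2.0 '-short' output lines such as
    'name<TAB>len=460<TAB>...<TAB>PredHel=1<TAB>Topology=...'.
    """
    labels = {}
    for line in tmhmm_short_output.strip().splitlines():
        fields = line.split("\t")
        name = fields[0]
        kv = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        n_helices = int(kv.get("PredHel", "0"))
        labels[name] = "membrane-bound" if n_helices > 0 else "secreted"
    return labels


out = ("IGHM_c1\tlen=460\tExpAA=22.1\tFirst60=0.0\tPredHel=1\tTopology=o436-458i\n"
       "IGHM_c2\tlen=455\tExpAA=0.3\tFirst60=0.0\tPredHel=0\tTopology=o\n")
print(classify_isoforms(out))
# {'IGHM_c1': 'membrane-bound', 'IGHM_c2': 'secreted'}
```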

We also assessed the completeness of this new ferret C-region reference set by comparing how consistently putative Ig and TCR transcripts in the Iso-Seq data were identified by two approaches: the custom IgBLAST search against variable-region reference sequences from related species, and direct alignment to the new ferret C-region reference databases. For the second approach, Iso-Seq CCS reads were globally aligned to the new collection of ferret Ig and TCR C-region sequences using the USEARCH -usearch_global command (44) with a 90% identity threshold to determine chain type and, where appropriate, isotype/subclass. Only CCS reads without large gaps (>10 bp) in their alignments were kept in the final assignment of Ig and TCR transcript sequences, because it was unclear whether sequences with larger gaps were rare alternatively spliced transcripts or the result of sequencing errors. Agreement was evaluated as the overlap between the CCS reads captured by the new ferret C-region reference databases and those identified by the initial IgBLAST search.
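The gap filter and the agreement comparison can be sketched as follows. This assumes the USEARCH alignments have been exported as SAM-style CIGAR strings (one per read); the read names and CIGARs are invented, and "agreement" is summarized simply as shared and approach-specific read counts.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_gap(cigar):
    """Length of the longest single insertion or deletion run in a CIGAR."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"),
               default=0)


def assign_reads(alignments, max_gap_len=10):
    """Keep read IDs whose alignment has no gap longer than max_gap_len."""
    return {read for read, cigar in alignments.items()
            if max_gap(cigar) <= max_gap_len}


def agreement(c_region_hits, igblast_hits):
    """Overlap between reads found by the two identification approaches."""
    return {
        "shared": len(c_region_hits & igblast_hits),
        "c_region_only": len(c_region_hits - igblast_hits),
        "igblast_only": len(igblast_hits - c_region_hits),
    }


alns = {"r1": "300M", "r2": "120M4I180M", "r3": "100M25D200M"}
kept = assign_reads(alns)  # r3 has a 25-bp deletion and is dropped
print(sorted(kept))                      # ['r1', 'r2']
print(agreement(kept, {"r1", "r4"}))
# {'shared': 1, 'c_region_only': 1, 'igblast_only': 1}
```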

The final set of C-region reference annotations was further reviewed through extensive manual curation, which included the addition of two IGLC pseudogenes, and the genes were named according to the corresponding human nomenclature.

To assess the sequence quality of the putative Ig/TCR transcript sequences initially identified by IgBLAST, we extracted the C-region portions of the Ig and TCR CCS reads based on the IgBLAST results and aligned them against the final set of C-region reference sequences using the USEARCH usearch_global command with the parameters -id 0.65 -strand both -gapopen 10.0I/0.0E -gapext 1.0I/0.0E -query_cov 0.99. We then quantified the mismatch, insertion, and deletion events within these alignments.
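Counting error events from a pairwise alignment can be sketched as below. The text does not say whether indels were counted per base or per event, so this example assumes mismatches are counted per position and each contiguous insertion or deletion run counts as one event; the aligned strings are a toy example.

```python
from itertools import groupby


def alignment_events(query_aln, target_aln):
    """Count mismatches and indel events in a gapped pairwise alignment.

    query_aln / target_aln: equal-length strings, '-' marks a gap.
    Insertion = gap in target; deletion = gap in query.
    Each contiguous indel run is counted as one event (an assumption).
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned strings must have equal length")
    states = []
    for q, t in zip(query_aln.upper(), target_aln.upper()):
        if t == "-":
            states.append("I")
        elif q == "-":
            states.append("D")
        elif q != t:
            states.append("X")
        else:
            states.append("=")
    counts = {"mismatch": states.count("X"), "insertion": 0, "deletion": 0}
    for state, _run in groupby(states):
        if state == "I":
            counts["insertion"] += 1
        elif state == "D":
            counts["deletion"] += 1
    return counts


print(alignment_events("ACGTTAC-GT", "ACG--ACAGA"))
# {'mismatch': 1, 'insertion': 1, 'deletion': 1}
```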

Initial annotation of ferret Ig and TCR germline V and J genes

Similarly, we began annotating germline V and J genes by leveraging the full-length Ig and TCR transcripts in the ferret Iso-Seq data. V-region sequences were extracted from Iso-Seq CCS reads using the start and end positions reported by the custom IgBLAST search described above. We took the entire read sequence up to the V-region end position, so our putative V-region sequences contained the 5’ UTR, the first leader sequence (L1), the second leader sequence (L2), and the V-region sequence (FR1 to CDR3). To determine the genomic positions of germline V and J genes, we aligned the putative V- and J-region sequences from CCS reads to the ferret reference assembly (NCBI accession no. GCA_011764305.2) using minimap2 (39) with the parameters -ax splice:hq -uf --secondary=no. At this step, we did not allow secondary alignments, reasoning that the germline sequences located by primary alignments would have the best chance of being authentic V genes.

Given the expected genomic organization of V-region genes, we kept only alignments consistent with that structure. V genes are composed of two exons: the first exon is the L1 sequence, and the second exon is composed of the L2 and V-region sequences (45). More than 80% of our putative V-region sequence alignments had two exons, so we removed the remaining alignments, which had more or fewer than two aligned exons. We also observed a small number of V-region sequence alignments with an unusually long intron (>350 bp) and a long first exon (>200 bp); these alignments were also removed from downstream analysis. These thresholds were chosen empirically, and alignments outside them were rare outliers. When multiple extracted V-region sequences mapped to the same genomic location, the overlapping alignments were collapsed into a single locus using the bedtools merge command (46).
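The structural filter and locus collapsing can be sketched as below. Each alignment is represented as a sorted list of 0-based half-open exon blocks on a single contig (a simplification). Because the text is ambiguous on this point, the sketch assumes an alignment is dropped if either size threshold is exceeded. `merge_intervals` mimics the default behaviour of `bedtools merge`, which also joins book-ended intervals. All coordinates are made up.

```python
def keep_v_alignment(exons, max_intron=350, max_first_exon=200):
    """Keep two-exon (L1 + L2/V) alignments within the size thresholds.

    exons: sorted (start, end) half-open blocks of one alignment.
    Assumes an alignment is dropped if EITHER threshold is exceeded.
    """
    if len(exons) != 2:
        return False
    (s1, e1), (s2, _e2) = exons
    return (s2 - e1) <= max_intron and (e1 - s1) <= max_first_exon


def merge_intervals(intervals):
    """Collapse overlapping or book-ended intervals, like `bedtools merge`."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


alignments = [
    [(1000, 1046), (1130, 1450)],  # typical L1 + L2/V structure
    [(1002, 1046), (1130, 1448)],  # another read at the same locus
    [(5000, 5300)],                # single exon: removed
    [(8000, 8050), (8600, 8900)],  # 550-bp intron: removed
]
kept = [a for a in alignments if keep_v_alignment(a)]
loci = merge_intervals([(a[0][0], a[-1][1]) for a in kept])
print(loci)  # [(1000, 1450)]
```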

Furthermore, we assessed whether these putative V genes could produce functional V(D)J transcripts and receptor proteins, based on the presence of a downstream RSS and a complete ORF. The 50 bp directly downstream of the 3’ end of each putative V gene were scanned for an RSS using the RSSsite software (47). To mitigate potential false negatives in the RSS analysis, we also considered whether the putative V gene had a complete ORF. To do so, we performed a three-frame translation of each putative V gene sequence using the EMBOSS transeq tool. Because the captured L1 sequence represents the beginning of the V(D)J coding region, we required each ORF to contain a methionine and removed any amino acid sequence upstream of the first methionine. Next, in each translation frame, we removed any sequence downstream of the first stop codon (if one was detected), and we selected the frame with the longest ORF for each putative V gene. Based on these two criteria, we removed only sequences that had neither a detected downstream RSS nor a complete ORF, that is, an ORF free of stop codons from the methionine in L1 to the detected RSS.
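The ORF step can be sketched in plain Python with the standard genetic code, standing in for EMBOSS transeq (this is not transeq's own logic). For each forward frame, the peptide is trimmed to the first methionine and cut at the first stop codon after it, and the longest result is kept. The `keep_v_candidate` helper encodes our reading that a candidate is discarded only when both the RSS and ORF checks fail. The test sequence is a toy example.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(nt):
    """Translate one frame; codons with non-ACGT characters become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def longest_met_orf(nt_seq):
    """Longest Met-to-first-stop peptide over the three forward frames."""
    best = ""
    for frame in range(3):
        protein = translate(nt_seq[frame:])
        met = protein.find("M")
        if met == -1:
            continue  # frame has no methionine
        peptide = protein[met:].split("*", 1)[0]
        if len(peptide) > len(best):
            best = peptide
    return best


def keep_v_candidate(rss_found, orf_complete):
    """Discard a candidate only when BOTH checks fail."""
    return rss_found or orf_complete


print(longest_met_orf("CCATGGCTTGGTAA"))                    # MAW
print(keep_v_candidate(rss_found=False, orf_complete=True))  # True
```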

J genes were annotated similarly to V genes. We extracted J-region sequences from Iso-Seq CCS reads using the method described for V- and C-region sequences, but we limited this analysis to CCS reads that were both reported by the custom IgBLAST search and mapped to our ferret V-region and C-region references. The extracted putative J-region transcript sequences were aligned to the ferret reference assembly using GSNAP (version 2017-04-13) (48), a short-read alignment tool. The reference assembly sequence was first restricted to the regions between and including V- and C-region genes, which narrowed the search for germline J genes to genomic regions flanked by annotated V- and C-region genes. Because J regions are short, our J-region filtering criteria were less stringent: we required only one exon per J-region sequence alignment, a criterion met by over 99.9% of our J-region alignments. Overlapping alignments were merged as for V genes and filtered based on the presence of an upstream RSS. Because in some cases the upstream RSS lay within the putative J-region gene body, the RSS search range spanned from 50 bp upstream of the J gene start to 50 bp within the J gene body.
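The J-gene RSS search window (50 bp upstream plus up to 50 bp into the gene body) can be computed as follows. This is a sketch assuming 0-based half-open coordinates; on the minus strand the gene's upstream side lies at higher coordinates, and the extracted window would need to be reverse-complemented before scanning. Clipping to the contig end is omitted for brevity.

```python
def j_rss_search_window(j_start, j_end, strand="+", flank=50, inside=50):
    """Window (0-based, half-open) in which to scan for a J gene's 5' RSS.

    Covers `flank` bp upstream of the J gene and up to `inside` bp into
    the gene body (never past the gene itself).
    """
    if strand == "+":
        return max(0, j_start - flank), min(j_end, j_start + inside)
    # Minus strand: upstream of the gene means higher coordinates.
    return max(j_start, j_end - inside), j_end + flank


print(j_rss_search_window(10_000, 10_060))       # (9950, 10050)
print(j_rss_search_window(10_000, 10_060, "-"))  # (10010, 10110)
```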

Annotation of additional ferret Ig and TCR germline V and J genes

One potential limitation of the transcript alignment-based analysis above is that only a portion of the ferret Ig and TCR germline genes may be annotated, because not all V, D, and J genes are expressed and therefore captured in our Iso-Seq data. In addition, extensive somatic mutation in immunoglobulin variable regions means that the germline origin of these regions may not be located directly from transcript alignments. We therefore identified and annotated additional germline V and J genes as follows.

We first iteratively scanned the reference genome assembly for additional Ig and TCR V and J genes that may have been missed initially. In each iteration, we used the current set of annotated V and J genes as references, since we expected high sequence similarity among genes within each V(D)J region. To search for additional V and J genes in the same genomic region, we first masked the currently annotated loci using the bedtools maskfasta command. Next, we aligned the current V- and J-region genes to the reference genome assembly using the same procedures described above, but with less stringent scoring parameters. All new alignments were analyzed and filtered using the same RSS and ORF criteria described above, with one difference: we used a custom R script, rather than RSSsite (47), to score and classify putative RSSs. The presence of an RSS was determined using the same formulas and thresholds described in (49). We confirmed the appropriateness of the score thresholds by assessing the RSS score distributions of randomly selected sequences in the ferret genome that started with CA, similar to previous studies (47, 49). Based on the RSS and ORF criteria, new sequence alignments were classified as true or false positives. True positives were added to our reference set, and both true and false positives were masked in the next iteration. We repeated this masking, searching, and filtering procedure until no new true positives were detected.
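The mask, search, and filter loop can be sketched generically. In this illustration the relaxed minimap2 alignment and the RSS/ORF classification are replaced by caller-supplied functions; the toy `exact_search` (exact substring hits) and the lambda that marks one locus as a false positive are hypothetical stand-ins. As in the text, both true and false positives are masked before the next round, and the loop stops when a round yields no new true positives.

```python
def mask(seq, intervals, char="N"):
    """Overwrite half-open (start, end) intervals with N (cf. bedtools maskfasta)."""
    chars = list(seq)
    for start, end in intervals:
        chars[start:end] = char * (end - start)
    return "".join(chars)


def iterative_search(genome, seed_loci, search, is_true_positive):
    """Repeat search -> classify -> mask until no new true positives appear."""
    true_loci = list(seed_loci)
    masked_loci = set(seed_loci)
    while True:
        masked = mask(genome, masked_loci)
        refs = [genome[s:e] for s, e in true_loci]
        candidates = sorted(set(search(masked, refs)) - masked_loci)
        new_true = [c for c in candidates if is_true_positive(genome, c)]
        masked_loci.update(candidates)  # mask true AND false positives
        true_loci.extend(new_true)
        if not new_true:
            return true_loci


def exact_search(masked, refs):
    """Toy stand-in for the relaxed alignment step: exact substring hits."""
    hits = []
    for ref in refs:
        pos = masked.find(ref)
        while pos != -1:
            hits.append((pos, pos + len(ref)))
            pos = masked.find(ref, pos + 1)
    return hits


genome = "AAAA" + "CACGTG" + "TTTT" + "CACGTG" + "GGGG" + "CACGTG" + "CC"
# Pretend the locus at (24, 30) fails the RSS/ORF checks.
found = iterative_search(genome, [(4, 10)], exact_search,
                         lambda g, locus: locus != (24, 30))
print(found)  # [(4, 10), (14, 20)]
```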

In parallel, to find additional Ig germline V genes, we used the VgenExtractor tool (50) to locate potential V-exon regions on the ferret genome assembly. Using a Python script, we checked the genomic region 500 bp upstream of each putative V gene for probable leader sequences and the region 55 bp downstream for probable RSSs. The extended genomic sequences were translated in all six frames using EMBOSS sixpack (37) and manually examined. The upstream genomic regions corresponding to the Iso-Seq Ig V leader sequences were annotated as the L1 region.
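Extracting the flanking regions checked above (500 bp upstream for a leader, 55 bp downstream for an RSS) is strand-dependent. The sketch below is not the authors' script; it assumes 0-based half-open V-exon coordinates and returns both flanks oriented 5’ to 3’ on the gene's own strand. The short toy genome and reduced flank sizes are only for illustration.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def v_gene_flanks(genome, start, end, strand="+", upstream=500, downstream=55):
    """Return (upstream_seq, downstream_seq) for a V exon at [start, end),
    both oriented 5'->3' on the gene's strand."""
    if strand == "+":
        up = genome[max(0, start - upstream):start]
        down = genome[end:end + downstream]
    else:
        # On the minus strand the gene's 5' side lies at higher coordinates.
        up = revcomp(genome[end:end + upstream])
        down = revcomp(genome[max(0, start - downstream):start])
    return up, down


genome = "TTAGC" + "ATGCATGCAT" + "GGCAA"  # toy V exon at [5, 15)
print(v_gene_flanks(genome, 5, 15, "+", upstream=3, downstream=2))  # ('AGC', 'GG')
print(v_gene_flanks(genome, 5, 15, "-", upstream=3, downstream=2))  # ('GCC', 'GC')
```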

To identify additional Ig J genes, we generated a list of non-redundant reference sequences for all immunoglobulin heavy- and light-chain J-region protein sequences in IMGT/GENE-DB (51). A local BLAST (52) database was created from the contigs of the ferret reference genome assembly that contained most of the IGH, IGK, and IGL V genes along with the C-region genes, since these contigs were expected to also contain the corresponding germline J and D genes. A translated BLAST search was performed for each IMGT J gene against this local database. Genomic regions with a minimum match percentage of 60% were retained as putative J genes, and the 50 bp upstream and downstream of each putative region were checked for RSSs using the Python script.
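Post-processing of the translated BLAST hits can be sketched as follows. This assumes tblastn was run with the default tabular format (`-outfmt 6`, whose 12 columns are listed in `BLAST6_FIELDS`) and treats the "match percentage" as the `pident` column, which is our assumption. Hits on the minus strand have `sstart > send`, so coordinates are normalized before the 50-bp flanks (1-based, inclusive) are computed. Query and contig names are hypothetical.

```python
BLAST6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def putative_j_loci(blast6_text, min_pident=60.0, flank=50):
    """Filter tblastn hits by percent identity and attach RSS-scan windows."""
    loci = []
    for line in blast6_text.strip().splitlines():
        hit = dict(zip(BLAST6_FIELDS, line.split("\t")))
        if float(hit["pident"]) < min_pident:
            continue
        s, e = int(hit["sstart"]), int(hit["send"])
        lo, hi = min(s, e), max(s, e)  # minus-strand hits have s > e
        loci.append({
            "contig": hit["sseqid"], "query": hit["qseqid"],
            "strand": "+" if s <= e else "-",
            "start": lo, "end": hi,                    # 1-based, inclusive
            "left_window": (max(1, lo - flank), lo - 1),
            "right_window": (hi + 1, hi + flank),
        })
    return loci


hits = ("IGHJ4\tcontig_12\t78.57\t14\t3\t0\t1\t14\t20101\t20142\t2e-03\t30.8\n"
        "IGHJ6\tcontig_12\t45.00\t20\t11\t0\t1\t20\t33010\t32951\t0.5\t19.2\n")
loci = putative_j_loci(hits)
print(loci[0]["left_window"], loci[0]["right_window"])  # (20051, 20100) (20143, 20192)
```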

Annotation of ferret Ig and TCR germline D genes

Because D genes are extremely short and therefore difficult to align directly, we did not use the IgBLAST alignment of Iso-Seq CCS reads to annotate D-region sequences. Instead, we devised two distinct but complementary approaches to annotate ferret germline D genes. In the first approach, we used an iterative RSS-based scanning strategy similar to that described above to delineate the start and end positions of D genes based on their expected gene structure. In brief, TRBD and TRDD genes are flanked at the 5’ end (V-D) by an RSS with a 12 bp spacer on the strand opposite the D gene and at the 3’ end (D-J) by an RSS with a 23 bp spacer on the same strand. IGHD genes are flanked by RSSs with 12 bp spacers at both the 5’ and 3’ ends, in the same strand orientations as for TRBD and TRDD. We therefore used the 5′-RSS-D-RSS-3′ organization to determine D gene start and end positions in the IGH, TRB, and TRD loci. Putative RSSs were scored using the same procedure as in the iterative genomic search for V- and J-region genes described above. If two adjacent RSSs were less than 50 bp apart and in the expected orientation, the sequence between them was classified as a D gene.
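The RSS-pairing rule can be sketched as follows, assuming RSS hits have already been scored and are represented by the hypothetical `RSS` record below (0-based half-open coordinates, strand relative to the genome, spacer length). For simplicity the sketch only handles D genes on the plus strand. Defaults follow the TRBD/TRDD layout (12-bp 5’ RSS on the opposite strand, 23-bp 3’ RSS on the same strand); IGHD would use `spacer_3p=12`. The coordinates are invented (28 bp = 7 + 12 + 9 and 39 bp = 7 + 23 + 9 RSS lengths).

```python
from dataclasses import dataclass


@dataclass
class RSS:
    start: int   # 0-based half-open genomic coordinates
    end: int
    strand: str  # '+' or '-' relative to the genome
    spacer: int  # 12 or 23


def d_genes_from_rss(hits, spacer_5p=12, spacer_3p=23, max_gap=50):
    """Call plus-strand D genes between adjacent, correctly oriented RSSs."""
    hits = sorted(hits, key=lambda h: h.start)
    d_genes = []
    for five, three in zip(hits, hits[1:]):  # adjacent RSS pairs only
        if (five.strand == "-" and five.spacer == spacer_5p
                and three.strand == "+" and three.spacer == spacer_3p):
            gap = three.start - five.end
            if 0 < gap < max_gap:
                d_genes.append((five.end, three.start))
    return d_genes


hits = [RSS(100, 128, "-", 12), RSS(140, 179, "+", 23),
        RSS(500, 528, "-", 12), RSS(600, 639, "+", 23)]  # second pair 72 bp apart
print(d_genes_from_rss(hits))  # [(128, 140)]
```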

In the second approach, Ig D regions were first estimated within the VDJ-rearranged Iso-Seq CCS reads by searching for the extended CDR3 sequence pattern “Y[HY]C.{1,55}VSS”, which adds the flanking residues to the previously reported CDR3 region (53). The estimated D-region protein sequences were queried using translated BLAST against the same local BLAST database described above, whose contigs were expected to contain the D genes. Genomic regions with a minimum match percentage of 60% were retained for further analysis. Because germline D genes in IMGT (51) range from ~3 to 15 amino acids in length, the chance of random matches among the shortlisted BLAST hits is expected to be high. To reduce false positives in D gene selection, combinations of 3 bp patterns based on conserved RSS nucleotides, such as ‘CAC’ from the heptamer and ‘ACC’ or ‘AAA’ from the nonamer, were searched on both sides of the putative D genes. The results were further evaluated manually to exclude D genes with defective RSS flanking sequences (47, 49).
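The extended CDR3 search can be expressed directly with Python's `re` module. The pattern is the one quoted above, with a capture group added (our addition) so that the residues between the conserved Y[HY]C motif and the VSS motif can be pulled out. The test peptide is a made-up heavy-chain fragment, not ferret data.

```python
import re

# Extended CDR3 pattern from the text; the capture group is our addition.
CDR3_EXT = re.compile(r"Y[HY]C(.{1,55})VSS")


def extended_cdr3(protein):
    """Return residues between 'Y[HY]C' and 'VSS' in a translated read, or None."""
    m = CDR3_EXT.search(protein)
    return m.group(1) if m else None


toy = "EDTAVYYCAKDRGYSSGWYFDYWGQGTLVTVSS"  # invented heavy-chain fragment
print(extended_cdr3(toy))  # AKDRGYSSGWYFDYWGQGTLVT
```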

Finally, amplicon sequence sets were submitted directly to IMGT/HighV-QUEST (54, 55), which implements IMGT/V-QUEST program version 3.6.3 (30 January 2024) and IMGT/V-QUEST reference directory release 202430–2. To focus on Ig/TCR transcript sequences, transcriptomic reads were first assembled with MiXCR (56) v4.6.0, using a custom library created from IMGT reference directory release 202430–2 (23 July 2024) for ferret, before analysis by IMGT/HighV-QUEST. The IMGT/HighV-QUEST results are provided in 11/12 CSV files whose content has been described previously (57). Productive rearrangements were examined to analyze the expression of V and J genes with ORF functionality due to noncanonical V-RS sequences. To analyze the expression of ORF D genes, the CIGAR string for the D-REGION reported by IMGT/HighV-QUEST in the vquest_airr.tsv file was used to identify rearranged D genes that were neither trimmed nor mutated.
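The "non-trimmed and non-mutated" test on a D-REGION CIGAR string could look like the sketch below. It assumes the AIRR Rearrangement CIGAR convention, in which a leading `S` encodes the query offset and `N` encodes the germline offset (so any `N` implies 5′ germline trimming), and that the full germline D gene length is known. Because an `M` operation does not distinguish matches from mismatches, the sketch additionally requires 100% identity whenever `M` appears. Function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def is_untrimmed_unmutated(cigar: str, germline_len: int,
                           identity: Optional[float] = None) -> bool:
    ops = parse_cigar(cigar)
    kinds = {op for _, op in ops}
    # I/D = indels, X = mismatch, N = germline offset (5' trimming, AIRR convention)
    if kinds & {"I", "D", "X", "N"}:
        return False
    # Germline bases covered by the alignment must equal the full D gene length;
    # anything shorter implies trimming at one end.
    if sum(n for n, op in ops if op in {"M", "="}) != germline_len:
        return False
    # "M" cannot distinguish matches from mismatches, so require 100% identity.
    if "M" in kinds:
        return identity is not None and identity >= 100.0
    return True

print(is_untrimmed_unmutated("30S17=200S", 17))  # True
```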

Classification of germline V, D, and J genes as functional or pseudogenes

Pseudogenes for V-gene units are commonly found in genome sequences but do not contribute to functional Ig or TCR chains. We identified pseudogenes based on IMGT criteria (45). Briefly, a V gene was considered functional if it (i) lacked a stop codon or frameshift mutation in the leader sequence part 1 and V exon, (ii) contained the conserved tryptophan and cysteines, (iii) contained no defect in the splicing sites, recombination signals, and/or regulatory elements, and (iv) had an intron shorter than 500 bp between the leader sequence and V exon. V genes that met the other criteria but lacked any of the three conserved amino acids (the conserved tryptophan and the two cysteines) were classified as “ORF”; those that failed to meet the other criteria were classified as pseudogenes.

Nomenclature of V, D, and J genes

Genes were named as previously described (26, 27). Briefly, V genes were assigned to subgroups by comparison with human sequences and by 75% nucleotide identity. Each V gene was given a number for its subgroup, followed by a hyphen and a number for its position in the locus, counted from 3′ to 5′. The J and D genes were similarly assigned to subgroups according to the human sets, followed by a number based on genomic location. Constant regions were named according to homology with human sequences, whereas homology with canine sequences was used to resolve the ferret IgG2 and IgG4 constant regions.

Design of ferret-specific primers targeting Ig and TCR C-regions

When designing ferret-specific primers to target and enrich Ig and TCR mRNA transcripts, we aimed to account for all observed variation in these sequences, including allelic, isoform, and subtype variation. To account for variation in C-region genes between different animals (alleles), we used all 10 additional domestic ferret genome assemblies available under the same NCBI BioProject (accession no. PRJNA580247) as the ferret reference genome assembly. Of these 11 publicly available genome assemblies, 10 were sequenced using Oxford Nanopore PromethION technology, and one was sequenced using both Illumina NovaSeq 6000 and PacBio RS technology. Additionally, two of the 11 genome assemblies were generated from DNA samples of Marshall Farms animals, and the other nine from Triple F Farms animals.

We aligned the annotated C-region reference sequences described above to each of the 10 additional genome assemblies using minimap2 (39) with the parameters -ax splice:hq -uf --secondary=no -G20k. The resulting alignment files were converted to BED12 files using the bedtools (v2.30.0) (46) command bamToBed with the parameter -split to account for splicing in C-region genes. The BED files were then used to extract the genomic C-region sequence of each gene from each assembly using the bedtools getfasta command with the parameters -split -s to account for the strandedness and splicing of the sequence.

For each C-region gene, a consensus sequence was generated with the protocol described above from 11 reference sequences, one from each genome assembly. Rather than designing separate primers for each subtype and isoform of a C-region gene, we used all C-region sequences of the same chain type (or isotype, for IGH) to generate a consensus sequence. Only the first 500 bp of each sequence were used in consensus sequence generation and considered in primer design for the 10× Genomics single-cell immune profiling assays. Next, we remapped the 11+ reference sequences (more than 11 if a gene had more than one isoform or subtype) to our consensus sequences using USEARCH -usearch_global to measure the percentage of consensus at each position. To ensure that our primers would enrich all alleles, we masked every consensus position with less than 100% consensus (A, C, G, or T → N). The primer design software Primer3 (58) therefore considered only sections of the consensus sequences that were identical across all 11+ reference sequences. We additionally restricted primers to a melting temperature between 60 and 70°C and a GC content between 50% and 60%. In some cases, primers had to be designed manually to achieve the desired parameters and capture.
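The masking and filtering logic above can be illustrated with a short sketch. For simplicity it assumes the reference sequences are already gap-free and aligned column by column, standing in for the USEARCH remapping step; it checks only the masking and GC-content constraints and leaves melting temperature to Primer3. All names and example sequences are hypothetical.

```python
from collections import Counter
from typing import List

def masked_consensus(aligned: List[str]) -> str:
    """Column-wise consensus; any column without 100% agreement becomes N."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected non-empty, equal-length aligned sequences")
    consensus = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count == len(column) and base in "ACGT" else "N")
    return "".join(consensus)

def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)

def primer_candidate_ok(primer: str, gc_min: float = 0.50,
                        gc_max: float = 0.60) -> bool:
    """Reject primers overlapping masked (N) positions or outside the GC window.

    Melting temperature (60 to 70 C) is left to Primer3 and not modeled here.
    """
    return "N" not in primer and gc_min <= gc_fraction(primer) <= gc_max

refs = ["ACGTACGGAT", "ACGTACGGTT", "ACGTACGGAT"]  # hypothetical allele fragments
print(masked_consensus(refs))  # ACGTACGGNT
```

Masking to N guarantees that any primer Primer3 places entirely within unmasked sequence matches every input allele exactly.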

After primer design and selection, we again tested the ability of our primers to amplify each of the 11 C-region references using in silico PCR (isPCR). All Ig and TCR C-region genes identified on the 11 assemblies were amplified in silico with a 100% primer match, with the exception of TRGC1*01, TRGC2*01, TRGC6*01, and TRGC7*01. TRGC6*01 is a pseudogene with a truncated exon 1 sequence that lacks the inner primer binding site. TRGC1*01, TRGC2*01, and TRGC7*01 each contain one mismatched base at position 5 or 6 of the inner enrichment primer. This mismatch was necessary to achieve complete complementary pairing with the other TRG constant regions. Given the placement of this mismatch near the 5′ end of the primer and the presence of the mismatched primer in the second enrichment reaction, these three TRG constant regions should still be recoverable with this design.

Validation of ferret-specific primers targeting Ig and TCR C-regions

We first evaluated ferret-specific MPCR assays for amplifying IGH, IGK, and IGL transcript sequences in the Iso-Seq data in silico. The input sequences to these assays were the ferret Iso-Seq CCS reads used throughout this study that met the filtering criteria after the custom IgBLAST alignment. We considered a transcript amplified after a single round of amplification, despite the nested RT-PCR design of this approach. The reverse primers of Wong et al. (20) and Gerritsen et al. (21) and our newly designed reverse primers were also assessed by in silico PCR with the University of California, Santa Cruz isPcr tool (59), with a dummy adapter sequence prepended to the 5′ end of the sequences to test differences in reverse primer efficiency. These in silico template-switch PCR experiments were performed on the 11 ferret genome assemblies for each C-region gene. For the isPcr tool, we used parameters requiring a 14-nucleotide perfect match at the 3′ end of the primer (-tileSize=11 -minGood=14 -minPerfect=14). The required 3′ perfect match was lowered from the default of 15 to 14 nucleotides so that the pseudogene TRGC2*01 could be amplified in silico.
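The effect of requiring a perfect match at the primer 3′ end can be illustrated with a simplified sketch. This is not the isPcr algorithm (it ignores `-tileSize` and `-minGood`); it only mimics the `-minPerfect=14` requirement on a single plus-strand transcript with a hypothetical dummy adapter prepended. It also shows why a mismatch near the primer 5′ end, as described for the TRGC inner primer, need not prevent amplification.

```python
from typing import Optional

DUMMY_ADAPTER = "GATCGATCGATCGATCGATC"  # hypothetical stand-in for the dummy adapter

def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def in_silico_product(template: str, reverse_primer: str,
                      adapter: str = DUMMY_ADAPTER,
                      min_perfect: int = 14) -> Optional[str]:
    """Simplified template-switch PCR on a plus-strand transcript.

    The adapter serves as the forward priming site. The reverse primer anneals
    to the plus strand, so its 3'-terminal `min_perfect` bases correspond to the
    first `min_perfect` bases of revcomp(primer) as written on the template and
    must match exactly; the primer's 5' part may mismatch. Returns the amplicon
    (adapter through primer site) or None.
    """
    seq = adapter + template
    site = revcomp(reverse_primer)
    start = seq.find(site[:min_perfect])
    if start == -1 or start + len(site) > len(seq):
        return None
    return seq[:start] + site  # the primer sequence is incorporated into the product

site = "TTGCCGAGGTCACCTTCGAG"      # hypothetical primer-binding site
primer = revcomp(site)
template = "CCCCAAAATTTT" + site + "GGGG"
print(in_silico_product(template, primer) is not None)  # True
```

Changing base 5 of the primer (counted from its 5′ end) leaves the 14-nt 3′ core intact, so the product is still formed; changing the 3′-terminal base abolishes it.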

Ferret Ig- and TCR-specific primers were further validated experimentally against cDNA derived from a pool of three PBMC samples, each from a different animal. For each ferret Ig- and TCR-specific primer, a corresponding forward primer was designed upstream of the C-region reverse primer. These forward primers were paired with the appropriate assay reverse primers and assessed for specificity by PCR, both in separate reactions and in pools representative of their final concentrations in the 10× Genomics V(D)J enrichment protocol. PCRs were performed at an annealing temperature of 65°C using AccuPrime Taq polymerase (Invitrogen) with a 1 min extension and 30 cycles. All primers were additionally tested by qPCR using PowerUp SYBR Green (Applied Biosystems) to confirm expression levels and rule out significant primer bias in the pooled reactions. PCR products were analyzed on a 1% agarose gel pre-stained with SYBR Safe to confirm amplification specificity.

Ferret tissue processing and cDNA generation for single-cell sequencing

For single-cell sequencing analysis, one cryopreserved PBMC sample and one splenocyte sample were randomly selected from two influenza-naïve male animals between 1.3 and 1.5 years of age (CDC IACUC protocol #3139ROWFERC). Cells were washed in resuspension buffer (RPMI, 10% FBS), depleted of dead cells using a removal kit (Miltenyi Biotec), collected by centrifugation at 700 × g, and resuspended at a concentration between 700 and 1,200 cells/µL. All cells were counted by hemocytometer and assayed for viability using a Countess II automated cell counter (Thermo Fisher Scientific, Waltham, MA). From each sample, a volume containing at most 17,000 cells (>80% viable) was loaded onto the 10× Chromium controller (10× Genomics, Pleasanton, CA) for a targeted recovery of 10,000 cells (10× Genomics, Pleasanton, CA, Chromium Next GEM Single Cell 5’ v2 User Guide RevD). cDNA was amplified for 20 cycles for all samples and assessed for quality and concentration using a Bioanalyzer 2100 and High Sensitivity DNA Kit (Agilent, Santa Clara, CA).

Ferret V(D)J enriched and 5’ gene expression library construction and sequencing

Construction of V(D)J libraries required ferret-specific primers targeting the constant region of V(D)J transcripts for the two successive enrichments required by the 10× V(D)J workflow. Each pool was constructed with an equimolar ratio of gene-specific primers: Ig assays used a final concentration of 0.5 µM of each gene-specific primer paired with 1 µM of the 10× forward primer, and TCR assays used a final concentration of 1 µM of each gene-specific primer paired with 2 µM of the 10× forward primer. Primer pools were prepared in volumes of 48 µL to allow direct substitution into the 10× workflow without further changes at this step. The number of amplification cycles was equalized between Ig and TCR enrichments, such that each library was amplified for 12 cycles in the first enrichment and 10 cycles in the second. 5’ GEX libraries underwent 16 cycles of final amplification. Otherwise, the protocol followed the manufacturer’s specifications (10× Genomics, Pleasanton, CA, Chromium Next GEM Single Cell 5’ v2 User Guide RevD).

Final V(D)J and 5’ GEX libraries were run on a Bioanalyzer 2100 with a High Sensitivity DNA kit (Agilent, Santa Clara, CA) to assess library size and concentration. Library concentration was further measured using a Qubit 3 fluorometer (Thermo Fisher Scientific, Waltham, MA) and combined with the Bioanalyzer sizing data to determine the appropriate loading concentration for each library. Sequencing was performed on a NextSeq 500 sequencer (Illumina, San Diego, CA) using 150-cycle kits in paired-end mode. Runs were programmed to generate a 26 bp read 1 consisting of the 16 bp 10× barcode and the 10 bp 10× UMI, two 10 bp index reads, and a 122 bp read 2 covering the cDNA insert. Sequencing depth was targeted at a minimum of 5,000 reads per cell for V(D)J libraries and 20,000 reads per cell for 5’ GEX libraries, as recommended by the 10× workflow (10× Genomics, Pleasanton, CA, Chromium Next GEM Single Cell 5’ v2 User Guide RevD).
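The read 1 layout and the depth targets above reduce to simple arithmetic, sketched below with hypothetical helper names. In practice Cell Ranger parses barcodes and UMIs itself; this only makes the 16 + 10 bp split and the per-cell depth calculation explicit.

```python
from typing import Tuple

BARCODE_LEN = 16  # 10x cell barcode length stated in the text
UMI_LEN = 10      # 10x UMI length stated in the text

def split_read1(read1: str) -> Tuple[str, str]:
    """Split a 26 bp read 1 into (cell barcode, UMI)."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 1 shorter than barcode + UMI")
    return read1[:BARCODE_LEN], read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]

def required_reads(n_cells: int, reads_per_cell: int) -> int:
    """Minimum total reads needed to reach a target depth per cell."""
    return n_cells * reads_per_cell

# With the 10,000-cell recovery target stated for cell loading:
print(required_reads(10_000, 5_000))   # 50000000 for a V(D)J library
print(required_reads(10_000, 20_000))  # 200000000 for a 5' GEX library
```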

Single-cell immune repertoire sequencing analysis

Raw sequencing data were demultiplexed and converted to FASTQ files using the Cell Ranger (v7.0.0) command mkfastq. Next, raw sequencing reads were assembled de novo into contigs using the Cell Ranger vdj command. De novo, rather than reference-based, contig assembly was performed because no comprehensive collection of ferret germline V(D)J sequences was available. To annotate the C-regions of the assembled V(D)J transcripts, we searched the assembled contigs against the inner-enrichment primers and against the reference set of C-region sequences compiled above, using the ublast (e-value ≤ 1e−4) and usearch_global (sequence identity threshold = 0.9) commands, respectively, from the USEARCH (v10.0.240) (44) suite of sequence analysis tools. Next, to annotate the V regions of the assembled V(D)J transcripts, we used the same custom IgBLAST to search all de novo assembled V(D)J contigs against the IMGT collection of V-region references from ferret and related species, following the protocol described above for Iso-Seq data analysis. After V(D)J contig annotation, we systematically filtered all de novo assembled contigs as follows. First, we removed any contigs that represented misassembled chimeric transcripts (e.g., an IGL primer alignment together with an IGHM C-region alignment) or off-target transcripts (e.g., a primer match at the 3’ end without a corresponding C-region alignment). Second, we removed contigs that did not meet the same filtering criteria used in the annotation of Iso-Seq reads with the IgBLAST database [variable gene hit with an e-value < -log10(50) and a joining gene hit < - log10(3)].
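The first contig filter amounts to comparing the chain implied by the primer hit with the chain of the C-region hit. A minimal sketch, assuming primer hits are reported at the chain level (e.g., `IGL`) and C-region hits as IMGT gene names (e.g., `IGHM`, `TRBC1`), whose first three letters give the locus. The IgBLAST e-value filter is not included, and the category labels and function names are hypothetical.

```python
from typing import Optional

def chain_of(c_region_gene: str) -> str:
    """Locus prefix of an IMGT C-gene name, e.g. IGHM -> IGH, TRBC1 -> TRB."""
    return c_region_gene[:3]

def classify_contig(primer_chain: Optional[str],
                    c_region_gene: Optional[str]) -> str:
    if primer_chain is None:
        return "no_primer_hit"   # handling of this case is not specified in the text
    if c_region_gene is None:
        return "off_target"      # primer hit without a corresponding C-region alignment
    if chain_of(c_region_gene) != primer_chain:
        return "chimeric"        # e.g. IGL primer + IGHM C-region
    return "keep"

print(classify_contig("IGL", "IGHM"))  # chimeric
```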

Single-cell 5’ gene expression sequence analysis

Raw sequencing data were demultiplexed, aligned to the ferret reference assembly (GCA_011764305.2) with NCBI RefSeq Annotation release ID 102, and UMI-collapsed using the Cell Ranger (v7.0.0) commands mkfastq and count. We used the filtered feature-barcode matrices generated by Cell Ranger, which contain only detected cell-associated barcodes. Cell Ranger’s cell-calling algorithm, which is based on the EmptyDrops method (60), is designed to also identify populations of cells with low RNA content. UMI-count gene expression matrices were further processed with the Seurat package (61) as follows. Cells that appeared to be dead or dying were removed by filtering out cell barcodes with more than 5% of their UMIs mapping to mitochondrial genes. Next, we removed technical doublets from the gene expression matrices using the doublet detection software Scrublet (62). We did not perform additional computational analysis to identify and correct potential ambient RNA. Because the transcriptome and VDJ/VJ libraries were prepared and sequenced separately, we reasoned that cells judged to be of low quality from their transcriptome data might still contain valuable information about VDJ/VJ sequences and their diversity. For the PBMC sample, the minimum numbers of UMIs and genes among the remaining cell barcodes were 500 and 315, respectively. For the splenocyte sample, the corresponding minimums were 500 and 103.
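The mitochondrial filter can be expressed in a few lines. This sketch uses plain Python dictionaries rather than Seurat, and the caller must supply the set of mitochondrial gene identifiers, since no particular naming in the ferret annotation is assumed here (the gene names in the example are hypothetical). Doublet removal with Scrublet is not shown.

```python
from typing import Dict, Set

Counts = Dict[str, Dict[str, int]]  # barcode -> gene -> UMI count

def drop_high_mito_cells(counts: Counts, mito_genes: Set[str],
                         max_fraction: float = 0.05) -> Counts:
    """Keep barcodes whose mitochondrial UMI fraction is at most `max_fraction`."""
    kept = {}
    for barcode, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue  # no UMIs: nothing to evaluate
        mito = sum(n for gene, n in genes.items() if gene in mito_genes)
        if mito / total <= max_fraction:  # cells with > 5% mitochondrial UMIs are removed
            kept[barcode] = genes
    return kept

example = {"AAA": {"CD3E": 95, "COX1": 5}, "CCC": {"CD3E": 90, "COX1": 10}}
print(sorted(drop_high_mito_cells(example, {"COX1", "ND1"})))  # ['AAA']
```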

We normalized the remaining gene expression data by log-transforming the gene expression matrix with the “LogNormalize” method in Seurat. Using this normalized matrix and canonical cell cycle marker genes, we estimated the cell cycle phase of each cell with the CellCycleScoring function from Seurat. The list of 97 human cell cycle gene symbols was taken from reference 63, and we used their ferret orthologs based on the ortholog gene groups publicly available from NCBI. Finally, we ran the normalization and scaling procedure SCTransform (64), with vars.to.regress set to the S and G2M cell cycle scores previously assigned to each cell. Genes annotated with the “C_region” and “V_segment” biotypes in the NCBI RefSeq Annotation were excluded from the list of the 3,000 most variable genes in each data set for downstream analysis to mitigate their potential interference (65). Principal component analysis was performed using the normalized and scaled expression levels of the remaining most variable genes in each data set. Following Seurat’s recommendations, the first 30 principal components were used as input to uniform manifold approximation and projection (UMAP) and K-nearest neighbor cell clustering, with a resolution parameter of 0.1 for both samples.
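For reference, Seurat’s “LogNormalize” method divides each count by the cell’s total count, multiplies by a scale factor (10,000 by default), and takes the natural log after adding 1. A pure-Python sketch for a single cell, with a hypothetical function name; cell cycle scoring and SCTransform are not reproduced here.

```python
import math
from typing import Dict

def log_normalize(cell_counts: Dict[str, int],
                  scale_factor: float = 10_000) -> Dict[str, float]:
    """Seurat-style LogNormalize for one cell: ln(1 + count / total * scale_factor)."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("cell has no UMIs")
    return {gene: math.log1p(n / total * scale_factor)
            for gene, n in cell_counts.items()}

normalized = log_normalize({"CD79A": 1, "MS4A1": 9})
```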

Cell type predictions for individual cells were performed using SingleR (28) on Seurat’s LogNormalized UMI counts, independently of the cell clustering. We used the built-in reference Human Primary Cell Atlas (66) through the celldex package (28) and “pruned.labels” for cell type prediction. Differential expression analysis for individual comparisons was performed using a Wilcoxon rank-sum test within the FindMarkers function from Seurat, with the min.pct and logfc.threshold parameters both set to 0 to test all genes for differential expression. Differentially expressed genes with a P-value less than 0.05 were used for functional enrichment analysis with Ingenuity Pathway Analysis (Qiagen, CA). Separate tables of differentially expressed genes for all cell clusters were generated using a Wilcoxon rank-sum test within the FindAllMarkers function from Seurat with a min.pct parameter of 0.5.

To further assess the B cell type predicted for cluster 4 in the splenocyte sample, we extracted from each cell the RNA-seq reads aligned to the genomic regions annotated with ferret Ig variable genes. Because it is challenging to align individual RNA-seq reads accurately to specific variable genes, we counted the total number of unique reads aligned to each of the genomic regions in which we annotated Ig variable genes (IGHV, IGKV, and IGLV), regardless of the specific location to which each read aligned, to approximate the detection of Ig transcripts in each cell. To balance spurious alignments against the potentially low efficiency of recovering VDJ transcripts directly from 5’ gene expression sequencing, we filtered the resulting read counts using a minimum read count threshold. We then calculated the percentage of cells in each cell cluster with Ig heavy and light chain transcripts detected. The minimum read count thresholds were determined empirically by manually examining the expected differences in the resulting percentages between B and non-B cell clusters.
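The per-cluster detection percentages can be sketched as follows. The sketch assumes reads have already been assigned to the IGHV, IGKV, or IGLV regions, counts unique read IDs per cell and region, and reports heavy- and light-chain detection separately (one reading of the text). The default threshold and all names are hypothetical, since the actual thresholds were determined empirically.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def percent_cells_with_ig(reads: Iterable[Tuple[str, str, str]],
                          clusters: Dict[str, str],
                          min_reads: int = 2) -> Dict[str, Tuple[float, float]]:
    """Per cluster, % of cells with heavy-chain (IGHV) reads and % with
    light-chain (IGKV or IGLV) reads, each at >= `min_reads` unique reads.

    `reads` yields (cell_barcode, read_id, region); `clusters` maps
    cell_barcode -> cluster label.
    """
    unique = defaultdict(set)
    for cell, read_id, region in reads:
        unique[(cell, region)].add(read_id)  # duplicates of a read count once

    def detected(cell: str, regions: Tuple[str, ...]) -> bool:
        return any(len(unique.get((cell, r), ())) >= min_reads for r in regions)

    members = defaultdict(list)
    for cell, cluster in clusters.items():
        members[cluster].append(cell)

    result = {}
    for cluster, cells in members.items():
        heavy = sum(detected(c, ("IGHV",)) for c in cells)
        light = sum(detected(c, ("IGKV", "IGLV")) for c in cells)
        result[cluster] = (100 * heavy / len(cells), 100 * light / len(cells))
    return result
```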

ACKNOWLEDGMENTS

Dr. Ed Breitschwerdt at North Carolina State University provided the ferret lymph node samples sequenced and analyzed in this study. Dr. Stacey Schultz-Cherry at St. Jude Children’s Research Hospital provided the ferret PBMCs used to validate the designed ferret-specific primers. Thomas Rowe in Ted Ross’s lab at University of Georgia provided the ferret PBMCs and splenocytes used in the single cell sequencing analysis. Nedzad Music and Adrian Reber provided ferret splenocytes and helpful discussions. We thank Véronique Giudicelli for insightful scientific discussions and technical support concerning IMGT/HighV-QUEST analyses. We acknowledge interesting discussions with Corey Watson and the contribution of Fratzeska Fragkiadi in TRA/TRD locus annotation.

This project has been funded in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. 75N93019C00052. We acknowledge the support of Immun4Cure IHU “Institute for innovative immunotherapies in autoimmune diseases” (France 2030/ANR-23-IHUA-0009) as well as the support of the Institut Universitaire de France. IMGT is a member of the French Infrastructure Institut Français de Bioinformatique (IFB) as well as a member of BioCampus, MAbImprove, and IBiSA.

E.S.W., K.Y., T.S.T., S.S., H.N.B., and H.C. designed and performed experiments, analyzed data, and wrote the manuscript. S.S. also curated data. G.Z., I.S., G.F., and S.K. analyzed and curated data and wrote the manuscript. N.A. designed and performed experiments, analyzed and curated data, and wrote the manuscript. I.A.Y. and X.P. conceptualized and supervised the project, designed experiments, analyzed data, and wrote the manuscript.

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Contributor Information

Xinxia Peng, Email: xpeng5@ncsu.edu.

Stacey Schultz-Cherry, St. Jude Children's Research Hospital, Memphis, Tennessee, USA.

DATA AVAILABILITY

All ferret Iso-seq and single-cell RNA-seq data were deposited into NCBI (BioProject accession no. PRJNA939558 and GEO accession no. GSE231948). All ferret C, V, D, and J sequences are available in the IMGT database under accession numbers IMGT000135, IMGT000136, IMGT000131, IMGT000203, IMGT000195, IMGT000196, and IMGT000168 and NCBI TPA accessions BK063796, BK063797, BK068009, BK068010, BK068295, BK068011, and BK068537. Code is available on Github at https://github.com/ncsu-penglab/FerretIgTCR.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/jvi.00181-25.

Supplemental material. jvi.00181-25-s0001.pdf.

Figures S1 to S6, Table S5, supplemental text, and legends for additional supplemental tables.

jvi.00181-25-s0001.pdf (2.8MB, pdf)
DOI: 10.1128/jvi.00181-25.SuF1
Table S1. jvi.00181-25-s0002.xlsx.

Summary of the ferret Iso-Seq data and the abundance of ferret C-region isoforms.

jvi.00181-25-s0002.xlsx (30.1KB, xlsx)
DOI: 10.1128/jvi.00181-25.SuF2
Table S2. jvi.00181-25-s0003.xlsx.

Ferret Ig and TCR C-region cDNA consensus sequences, annotations and description of allelic sites, and V, D, J gene annotations on the ferret reference genome assembly.

jvi.00181-25-s0003.xlsx (420.5KB, xlsx)
DOI: 10.1128/jvi.00181-25.SuF3
Table S3. jvi.00181-25-s0004.xlsx.

Ferret Ig and TCR C-region specific primer sequences and the comparison of V(D)J contigs with primer hit and with or without the corresponding C-region match.

jvi.00181-25-s0004.xlsx (15.8KB, xlsx)
DOI: 10.1128/jvi.00181-25.SuF4
Table S4. jvi.00181-25-s0005.xlsx.

Differentially expressed genes for each cell cluster of the ferret PBMC and splenocyte samples.

jvi.00181-25-s0005.xlsx (229.8KB, xlsx)
DOI: 10.1128/jvi.00181-25.SuF5
Table S6. jvi.00181-25-s0006.xlsx.

Functional enrichment analysis of differentially expressed genes between selected cell clusters identified in the ferret PBMC and splenocyte samples.

jvi.00181-25-s0006.xlsx (207KB, xlsx)
DOI: 10.1128/jvi.00181-25.SuF6

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Bouvier NM, Lowen AC. 2010. Animal models for influenza virus pathogenesis and transmission. Viruses 2:1530–1563. doi: 10.3390/v20801530
  • 2. Smith H, Sweet C. 1988. Lessons for human influenza from pathogenicity studies with ferrets. Rev Infect Dis 10:56–75. doi: 10.1093/clinids/10.1.56
  • 3. Bodewes R, Kreijtz JHCM, Geelhoed-Mieras MM, van Amerongen G, Verburgh RJ, van Trierum SE, Kuiken T, Fouchier RAM, Osterhaus ADME, Rimmelzwaan GF. 2011. Vaccination against seasonal influenza A/H3N2 virus reduces the induction of heterosubtypic immunity against influenza A/H5N1 virus infection in ferrets. J Virol 85:2695–2702. doi: 10.1128/JVI.02371-10
  • 4. Toots M, Yoon JJ, Hart M, Natchus MG, Painter GR, Plemper RK. 2020. Quantitative efficacy paradigms of the influenza clinical drug candidate EIDD-2801 in the ferret model. Transl Res 218:16–28. doi: 10.1016/j.trsl.2019.12.002
  • 5. Cox RM, Wolf JD, Plemper RK. 2021. Therapeutically administered ribonucleoside analogue MK-4482/EIDD-2801 blocks SARS-CoV-2 transmission in ferrets. Nat Microbiol 6:11–18. doi: 10.1038/s41564-020-00835-2
  • 6. Semaniakou A, Croll RP, Chappe V. 2018. Animal models in the pathophysiology of cystic fibrosis. Front Pharmacol 9:1475. doi: 10.3389/fphar.2018.01475
  • 7. Ryan KA, Bewley KR, Fotheringham SA, Slack GS, Brown P, Hall Y, Wand NI, Marriott AC, Cavell BE, Tree JA. 2021. Dose-dependent response to infection with SARS-CoV-2 in the ferret model and evidence of protective immunity. Nat Commun 12:81. doi: 10.1038/s41467-020-20439-y
  • 8. Kim YI, Kim SG, Kim SM, Kim EH, Park SJ, Yu KM, Chang JH, Kim EJ, Lee S, Casel MAB, Um J, Song MS, Jeong HW, Lai VD, Kim Y, Chin BS, Park JS, Chung KH, Foo SS, Poo H, Mo IP, Lee OJ, Webby RJ, Jung JU, Choi YK. 2020. Infection and rapid transmission of SARS-CoV-2 in ferrets. Cell Host Microbe 27:704–709. doi: 10.1016/j.chom.2020.03.023
  • 9. Liu HL, Yeh IJ, Phan NN, Wu YH, Yen MC, Hung JH, Chiao CC, Chen CF, Sun Z, Jiang JZ, Hsu HP, Wang CY, Lai MD. 2020. Gene signatures of SARS-CoV/SARS-CoV-2-infected ferret lungs in short- and long-term models. Infect Genet Evol 85:104438. doi: 10.1016/j.meegid.2020.104438
  • 10. Janeway CA Jr, Travers P, Walport M, Shlomchik MJ. 2001. The generation of lymphocyte antigen receptors. In Immunobiology: the immune system in health and disease, 5th edition
  • 11. Watson CT, Steinberg KM, Huddleston J, Warren RL, Malig M, Schein J, Willsey AJ, Joy JB, Scott JK, Graves TA, Wilson RK, Holt RA, Eichler EE, Breden F. 2013. Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation. Am J Hum Genet 92:530–546. doi: 10.1016/j.ajhg.2013.03.004
  • 12. Matsuda F, Ishii K, Bourvagnet P, Kuma K, Hayashida H, Miyata T, Honjo T. 1998. The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus. J Exp Med 188:2151–2162. doi: 10.1084/jem.188.11.2151
  • 13. Rodriguez OL, Gibson WS, Parks T, Emery M, Powell J, Strahl M, Deikus G, Auckland K, Eichler EE, Marasco WA, Sebra R, Sharp AJ, Smith ML, Bashir A, Watson CT. 2020. A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus. Front Immunol 11:2136. doi: 10.3389/fimmu.2020.02136
  • 14. Rodriguez OL, Silver CA, Shields K, Smith ML, Watson CT. 2022. Targeted long-read sequencing facilitates phased diploid assembly and genotyping of the human T cell receptor alpha, delta, and beta loci. Cell Genom 2:100228. doi: 10.1016/j.xgen.2022.100228
  • 15. Nguefack Ngoune V, Bertignac M, Georga M, Papadaki A, Albani A, Folch G, Jabado-Michaloud J, Giudicelli V, Duroux P, Lefranc M-P, Kossida S. 2022. IMGT biocuration and analysis of the rhesus monkey IG loci. Vaccines (Basel) 10:394. doi: 10.3390/vaccines10030394
  • 16. Debbagh C, Folch G, Jabado-Michaloud J, Giudicelli V, Kossida S. 2024. Deciphering Gorilla gorilla gorilla immunoglobulin loci in multiple genome assemblies and enrichment of IMGT resources. Front Immunol 15:1475003. doi: 10.3389/fimmu.2024.1475003
  • 17. Goldstein LD, Chen YJJ, Wu J, Chaudhuri S, Hsiao YC, Schneider K, Hoi KH, Lin Z, Guerrero S, Jaiswal BS, Stinson J, Antony A, Pahuja KB, Seshasayee D, Modrusan Z, Hötzel I, Seshagiri S. 2019. Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Commun Biol 2:304. doi: 10.1038/s42003-019-0551-y
  • 18. Walsh ES, Tollison TS, Brochu HN, Shaw BI, Diveley KR, Chou H, Law L, Kirk AD, Gale M Jr, Peng X. 2022. Single-cell-based high-throughput Ig and TCR repertoire sequencing analysis in rhesus macaques. J Immunol 208:762–771. doi: 10.4049/jimmunol.2100824
  • 19. Vázquez Bernat N, Corcoran M, Nowak I, Kaduk M, Castro Dopico X, Narang S, Maisonasse P, Dereuddre-Bosquet N, Murrell B, Karlsson Hedestam GB. 2021. Rhesus and cynomolgus macaque immunoglobulin heavy-chain genotyping yields comprehensive databases of germline VDJ alleles. Immunity 54:355–366. doi: 10.1016/j.immuni.2020.12.018
  • 20. Wong J, Tai CM, Hurt AC, Tan HX, Kent SJ, Wheatley AK. 2020. Sequencing B cell receptors from ferrets (Mustela putorius furo). PLoS ONE 15:e0233794. doi: 10.1371/journal.pone.0233794
  • 21. Gerritsen B, Pandit A, Zaaraoui-Boutahar F, van den Hout MCGN, van IJcken WFJ, de Boer RJ, Andeweg AC. 2020. Characterization of the ferret TRB locus guided by V, D, J, and C gene expression analysis. Immunogenetics 72:101–108. doi: 10.1007/s00251-019-01142-9
  • 22. Lefranc MP, Giudicelli V, Duroux P, Jabado-Michaloud J, Folch G, Aouinti S, Carillon E, Duvergey H, Houles A, Paysan-Lafosse T, Hadi-Saljoqi S, Sasorith S, Lefranc G, Kossida S. 2015. IMGT, the international Immunogenetics information system 25 years on. Nucleic Acids Res 43:413. doi: 10.1093/nar/gku1056
  • 23. NCBI RefSeq genome assembly ASM1176430v1.1 and annotation. National Library of Medicine. Available from: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_011764305.1/. Accessed 10 November 2021
  • 24. Mustela putorius furo isolate JIRA1106, whole genome shotgun sequencing project. National Library of Medicine. Available from: https://www.ncbi.nlm.nih.gov/nuccore/1825500599. Accessed 10 November 2021
  • 25. Brochu HN, Tseng E, Smith E, Thomas MJ, Jones AM, Diveley KR, Law L, Hansen SG, Picker LJ, Peng X. 2020. Systematic profiling of full-length Ig and TCR repertoire diversity in rhesus macaque through long read transcriptome sequencing. J Immunol 204:3434–3444. doi: 10.4049/jimmunol.1901256
  • 26. Pégorier P, Bertignac M, Chentli I, Nguefack Ngoune V, Folch G, Jabado-Michaloud J, Hadi-Saljoqi S, Giudicelli V, Duroux P, Lefranc MP, Kossida S. 2020. IMGT biocuration and comparative study of the T cell receptor beta locus of veterinary species based on Homo sapiens TRB. Front Immunol 11:821. doi: 10.3389/fimmu.2020.00821
  • 27. Lefranc MP. 2000. Nomenclature of the human T cell receptor genes. Curr Protoc Immunol 40:A-1O. doi: 10.1002/0471142735.ima01os40
  • 28. Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A, Chak S, Naikawadi RP, Wolters PJ, Abate AR, Butte AJ, Bhattacharya M. 2019. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol 20:163–172. doi: 10.1038/s41590-018-0276-y
  • 29. Morgan D, Tergaonkar V. 2022. Unraveling B cell trajectories at single cell resolution. Trends Immunol 43:210–229. doi: 10.1016/j.it.2022.01.003
  • 30. Massari S, Bellahcene F, Vaccarelli G, Carelli G, Mineccia M, Lefranc MP, Antonacci R, Ciccarese S. 2009. The deduced structure of the T cell receptor gamma locus in Canis lupus familiaris. Mol Immunol 46:2728–2736. doi: 10.1016/j.molimm.2009.05.008
  • 31. Bergeron LM, McCandless EE, Dunham S, Dunkle B, Zhu Y, Shelly J, Lightle S, Gonzales A, Bainbridge G. 2014. Comparative functional characterization of canine IgG subclasses. Vet Immunol Immunopathol 157:31–41. doi: 10.1016/j.vetimm.2013.10.018
  • 32. Tabel H, Ingram DG. 1972. Evidence for five immunoglobulin classes of 7S type. Immunology 22:933–942.
  • 33. Strietzel CJ, Bergeron LM, Oliphant T, Mutchler VT, Choromanski LJ, Bainbridge G. 2014. In vitro functional characterization of feline IgGs. Vet Immunol Immunopathol 158:214–223. doi: 10.1016/j.vetimm.2014.01.012
  • 34. Vasicek TJ, Leder P. 1990. Structure and expression of the human immunoglobulin lambda genes. J Exp Med 172:609–620. doi: 10.1084/jem.172.2.609
  • 35. Ye J, Ma N, Madden TL, Ostell JM. 2013. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res 41:W34–W40. doi: 10.1093/nar/gkt382
  • 36. Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152. doi: 10.1093/bioinformatics/bts565
  • 37. Rice P, Longden I, Bleasby A. 2000. EMBOSS: the European molecular biology open software suite. Trends Genet 16:276–277. doi: 10.1016/s0168-9525(00)02024-2
  • 38. Peng X, Alföldi J, Gori K, Eisfeld AJ, Tyler SR, Tisoncik-Go J, Brawand D, Law GL, Skunca N, Hatta M. 2014. The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease. Nat Biotechnol 32:1250–1255. doi: 10.1038/nbt.3079
  • 39. Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094–3100. doi: 10.1093/bioinformatics/bty191
  • 40. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. 2011. Integrative genomics viewer. Nat Biotechnol 29:24–26. doi: 10.1038/nbt.1754
  • 41. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. 2021. Pfam: The protein families database in 2021. Nucleic Acids Res 49:D412–D419. doi: 10.1093/nar/gkaa913
  • 42. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305:567–580. doi: 10.1006/jmbi.2000.4315
  • 43. Hallgren J, Tsirigos KD, Pedersen MD, Almagro Armenteros JJ, Marcatili P, Nielsen H, Krogh A, Winther O. 2022. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks. bioRxiv. doi: 10.1101/2022.04.08.487609
  • 44. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461. doi: 10.1093/bioinformatics/btq461
  • 45. Lane J, Duroux P, Lefranc MP. 2010. From IMGT-ONTOLOGY to IMGT/LIGMotif: the IMGT standardized approach for immunoglobulin and T cell receptor gene identification and description in large genomic sequences. BMC Bioinformatics 11:223. doi: 10.1186/1471-2105-11-223
  • 46. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. doi: 10.1093/bioinformatics/btq033
  • 47. Merelli I, Guffanti A, Fabbri M, Cocito A, Furia L, Grazini U, Bonnal RJ, Milanesi L, McBlane F. 2010. RSSsite: a reference database and prediction tool for the identification of cryptic recombination signal sequences in human and murine genomes. Nucleic Acids Res 38:W262–W267. doi: 10.1093/nar/gkq391
  • 48. Wu TD, Reeder J, Lawrence M, Becker G, Brauer MJ. 2016. GMAP and GSNAP for genomic sequence alignment: enhancements to speed, accuracy, and functionality. Methods Mol Biol 1418:283–334. doi: 10.1007/978-1-4939-3578-9_15
  • 49. Cowell LG, Davila M, Kepler TB, Kelsoe G. 2002. Identification and utilization of arbitrary correlations in models of recombination signal sequences. Genome Biol 3:research0072. doi: 10.1186/gb-2002-3-12-research0072
  • 50. Olivieri D, Faro J, von Haeften B, Sánchez-Espinel C, Gambón-Deza F. 2013. An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates. Immunogenetics 65:691–702. doi: 10.1007/s00251-013-0715-8
  • 51. Giudicelli V, Chaume D, Lefranc MP. 2005. IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes. Nucleic Acids Res 33:D256–D261. doi: 10.1093/nar/gki010
  • 52. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2
  • 53. Mandric I, Rotman J, Yang HT, Strauli N, Montoya DJ, Van Der Wey W, Ronas JR, Statz B, Yao D, Petrova V, Zelikovsky A, Spreafico R, Shifman S, Zaitlen N, Rossetti M, Ansel KM, Eskin E, Mangul S. 2020. Profiling immunoglobulin repertoires across multiple human tissues using RNA sequencing. Nat Commun 11:3126. doi: 10.1038/s41467-020-16857-7
  • 54. Alamyar E, Duroux P, Lefranc M-P, Giudicelli V. 2012. IMGT® tools for the nucleotide analysis of immunoglobulin (IG) and T cell receptor (TR) V-(D)-J repertoires, polymorphisms, and IG mutations: IMGT/V-QUEST and IMGT/HighV-QUEST for NGS. Methods Mol Biol 882:569–604. doi: 10.1007/978-1-61779-842-9_32
  • 55. Li S, Lefranc M-P, Miles JJ, Alamyar E, Giudicelli V, Duroux P, Freeman JD, Corbin VDA, Scheerlinck J-P, Frohman MA, Cameron PU, Plebanski M, Loveland B, Burrows SR, Papenfuss AT, Gowans EJ. 2013. IMGT/HighV QUEST paradigm for T cell receptor IMGT clonotype diversity and next generation repertoire immunoprofiling. Nat Commun 4:2333. doi: 10.1038/ncomms3333
  • 56. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, Chudakov DM. 2015. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 12:380–381. doi: 10.1038/nmeth.3364
  • 57. Giudicelli V, Duroux P, Rollin M, Aouinti S, Folch G, Jabado-Michaloud J, Lefranc M-P, Kossida S. 2022. IMGT immunoinformatics tools for standardized V-DOMAIN analysis. Methods Mol Biol 2453:477–531. doi: 10.1007/978-1-0716-2115-8_24
  • 58. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG. 2012. Primer3--new capabilities and interfaces. Nucleic Acids Res 40:e115. doi: 10.1093/nar/gks596
  • 59. Kuhn RM, Haussler D, Kent WJ. 2013. The UCSC genome browser and associated tools. Brief Bioinform 14:144–161. doi: 10.1093/bib/bbs038
  • 60. Lun ATL, Riesenfeld S, Andrews T, Dao TP, Gomes T, Marioni JC. 2019. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol 20:63. doi: 10.1186/s13059-019-1662-y
  • 61. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Hao Y, Stoeckius M, Smibert P, Satija R. 2019. Comprehensive integration of single-cell data. Cell 177:1888–1902. doi: 10.1016/j.cell.2019.05.031
  • 62. Wolock SL, Lopez R, Klein AM. 2019. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst 8:281–291. doi: 10.1016/j.cels.2018.11.005
  • 63. Tirosh I, Izar B, Prakadan SM, Wadsworth MH 2nd, Treacy D, Trombetta JJ, Rotem A, Rodman C, Lian C, Murphy G. 2016. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352:189–196. doi: 10.1126/science.aad0501
  • 64. Hafemeister C, Satija R. 2019. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20:296. doi: 10.1186/s13059-019-1874-1
  • 65. Sundell T, Grimstad K, Camponeschi A, Tilevik A, Gjertsson I, Mårtensson IL. 2022. Single-cell RNA sequencing analyses: interference by the genes that encode the B-cell and T-cell receptors. Brief Funct Genomics 22:263–273. doi: 10.1093/bfgp/elac044
  • 66. Mabbott NA, Baillie JK, Brown H, Freeman TC, Hume DA. 2013. An expression atlas of human primary cells: inference of gene function from coexpression networks. BMC Genomics 14:632. doi: 10.1186/1471-2164-14-632

Supplementary Materials

Supplemental material (jvi.00181-25-s0001.pdf, 2.8 MB): Figures S1 to S6, Table S5, supplemental text, and legends for additional supplemental tables. DOI: 10.1128/jvi.00181-25.SuF1

Table S1 (jvi.00181-25-s0002.xlsx, 30.1 KB): Summary of the ferret Iso-Seq data and the abundance of ferret C-region isoforms. DOI: 10.1128/jvi.00181-25.SuF2

Table S2 (jvi.00181-25-s0003.xlsx, 420.5 KB): Ferret Ig and TCR C-region cDNA consensus sequences, annotations and descriptions of allelic sites, and V, D, and J gene annotations on the ferret reference genome assembly. DOI: 10.1128/jvi.00181-25.SuF3

Table S3 (jvi.00181-25-s0004.xlsx, 15.8 KB): Ferret Ig and TCR C-region-specific primer sequences, and a comparison of V(D)J contigs with a primer hit with and without the corresponding C-region match. DOI: 10.1128/jvi.00181-25.SuF4

Table S4 (jvi.00181-25-s0005.xlsx, 229.8 KB): Differentially expressed genes for each cell cluster of the ferret PBMC and splenocyte samples. DOI: 10.1128/jvi.00181-25.SuF5

Table S6 (jvi.00181-25-s0006.xlsx, 207 KB): Functional enrichment analysis of differentially expressed genes between selected cell clusters identified in the ferret PBMC and splenocyte samples. DOI: 10.1128/jvi.00181-25.SuF6

Data Availability Statement

All ferret Iso-Seq and single-cell RNA-seq data were deposited into NCBI (BioProject accession no. PRJNA939558 and GEO accession no. GSE231948). All ferret C, V, D, and J sequences are available in the IMGT database under accession numbers IMGT000135, IMGT000136, IMGT000131, IMGT000203, IMGT000195, IMGT000196, and IMGT000168 and as NCBI TPA accessions BK063796, BK063797, BK068009, BK068010, BK068295, BK068011, and BK068537. Code is available on GitHub at https://github.com/ncsu-penglab/FerretIgTCR.
