Data in Brief. 2017 Aug 3;14:507–514. doi: 10.1016/j.dib.2017.08.002

Data characterizing the genomic structure of the T cell receptor (TRB) locus in Camelus dromedarius

Rachele Antonacci a, Mariagrazia Bellini a, Vito Castelli a, Salvatrice Ciccarese a, Serafina Massari b
PMCID: PMC5562110  PMID: 28856181

Abstract

These data are presented in support of the structural and evolutionary analysis in the published article entitled “The occurrence of three D-J-C clusters within the dromedary TRB locus highlights a shared evolution in Tylopoda, Ruminantia and Suina” (Antonacci et al., 2017) [1]. Here we describe the genomic structure and the gene content of the T cell receptor beta chain (TRB) locus in Camelus dromedarius. As in other mammalian species, the general genomic organization of the dromedary TRB locus consists of a pool of TRBV genes located upstream of tandemly arranged TRBD-J-C clusters, followed by a TRBV gene with an inverted transcriptional orientation. A peculiarity of the dromedary TRB locus structure is the presence of three TRBD-J-C clusters, a feature shared with the sheep, cattle and pig sequences.

Keywords: T cell receptor, TRB locus, Dromedary genome, Camelus dromedarius, IMGT


Specifications Table

Subject area Biology, genetics, genomics
More specific subject area Genetics, Genomics and Molecular Biology
Type of data Tables and figures
How data was acquired A standard BLAST search (Basic Local Alignment Search Tool, http://blast.ncbi.nlm.nih.gov/Blast.cgi) of the public dromedary genomic assembly, long PCR on genomic DNA and cloning
Data format Analyzed
Experimental factors Sequence analysis and dromedary DNA extraction
Experimental features Dromedary lung genomic DNA was prepared from a single healthy animal. PCRs were performed with a high-fidelity DNA polymerase. The PCR products were purified and cloned into the TA-vector system.
Data source location Bari and Lecce, Italy
Data accessibility The whole dromedary genome shotgun sequence is available at GenBank (ID: GCA_000767585.1). Sequence data published with this article were registered in the EMBL database with the accession number LT837971.

Value of the data

  • These data provide insight into the genomic structure of the T cell receptor beta (TRB) locus in Camelus dromedarius. The result is the first, mostly complete, map of the TRB locus in a species of the Tylopoda suborder.

  • The characterization of the dromedary TRB locus can be used to improve the understanding of the evolution of Camelidae and to help resolve the placement of this species within the Artiodactyla order.

  • The availability of the sequence of the dromedary TRB locus allows researchers to concentrate on functional studies and provides a tool for using this species as a valuable model for immunological research.

1. Data

Data presented in the text include tables and figures giving information on the genomic structure and the gene content of the TRB locus of the dromedary, a mammalian species belonging to the Camelus genus. This information was obtained by integrating the sequence data deduced from the public genomic assembly [2] with sequences obtained by PCR experiments conducted in our laboratory. Table 1 describes the position, classification and functionality of the TRB genes retrieved from the dromedary public genome assembly. Table 2 describes the dromedary TRBV pseudogenes. Table 3 describes the position, classification and functionality of the unrelated TRB genes recovered from the dromedary public genome assembly. Fig. 1 shows the deduced amino acid sequences of the dromedary TRBV genes according to the IMGT unique numbering for the V-REGION [6]. Table 4 lists the genomic clones of the dromedary TRBD-J-C region with the primer pairs used and the PCR conditions. Fig. 2 shows the TRBD, TRBJ and TRBC gene sequences.

Table 1.

Description of the TRB genes in the Camelus dromedarius genome assembly. The position of all genes and their classification and functionality are reported.

Gene classification Functionality (a) NCBI Reference Sequence Position (b)
TRBV1 F NW_011591622 861263-861886
TRBV2 F NW_011591622 932263-932714
TRBV3 P NW_011591622 927952-928412
TRBV5S1 F NW_011591622 937384-937843
TRBV5S2 F NW_011591622 940879-941358
TRBV5S3 F NW_011591622 955293-955748
TRBV6 F NW_011591622 944809-945237
TRBV7S1 F NW_011591622 947134-947581
TRBV7S2 F NW_011591622 962228-962689
TRBV8 F NW_011591622 950124-950593
TRBV9 P NW_011591622 965923-966346
TRBV10 F NW_011591622 970368-970809
TRBV11 F NW_011591622 975860-976308
TRBV12S1 P NW_011591622 981727-982197
TRBV12S2 P NW_011591622 992125-992569
TRBV14 P NW_011591622 995472-995906
TRBV15S1 F NW_011591622 997569-998023
TRBV15S2 F NW_011591622 999129-999583
TRBV16 F NW_011591622 1003645-1004098
TRBV19 F NW_011591622 1018094-1018641
TRBV20 F NW_011591622 1020910-1021565
TRBV21S1 F NW_011591622 1028337-1028797
TRBV21S2 F NW_011591151 70843-70731
TRBV21S3 P NW_011591151 62738-62511
TRBV22 F NW_011591151 46518-46381
TRBV23 P NW_011591151 60590-60480
TRBV24 P NW_011591151 56428-56106
TRBV25 F NW_011591151 52347-52219
TRBV26 F NW_011591151 66428-66297
TRBV27 F NW_011591151 41158-41032
TRBV28 F NW_011591151 32762-32640
TRBV29 F NW_011591151 27109-26837
TRBD1 F NW_011591151 9932-9943
TRBJ1-1 F NW_011591151 9247-9294
TRBJ1-2 F NW_011591151 9116-9159
TRBJ1-3 F NW_011591151 8861-8910
TRBJ1-4 F NW_011591151 8258-8308
TRBJ1-5 F NW_011591151 7982-8031
TRBJ1-6 F NW_011591151 7491-7543
TRBC1 F NW_011591151 EX1 4773-5166
EX2 4311-4328
EX3 4044-4150
EX4 3711-3731
TRBC3 nd NW_011591151 EX2 2866-2883
EX3 2599-2705
EX4 2266-2286
TRBJ3-1 F NW_011620189 653-702
TRBJ3-1 F NW_011601111 2234-2283
TRBJ3-2 F NW_011601111 2426-2476
TRBJ3-3 F NW_011601111 2642-2690
TRBJ3-4 nd NW_011601111 2787-2814
TRBJ2-2 F NW_011616084 215-265
TRBJ2-3 nd NW_011616084 2-46
TRBJ2-6 nd NW_011607149 185-231
TRBC2 nd NW_011593440 EX1 1911-2149
EX2 2622-2639
EX3 2800-2906
EX4 3190-3210
TRBV30 F NW_011593440 14509-14160
(a) nd: not defined (indicates that the nt sequence of the gene is incomplete and its functionality cannot be defined).

(b) L-PART1/V-exon for TRBV genes and coding sequence for TRBD and TRBJ.
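
As a reading aid for the positions in Table 1, the following minimal Python sketch splits a position string into its two coordinates, computes the inclusive span length and reports whether the coordinates are listed in descending order, as they are for most entries on scaffold NW_011591151. The function name `parse_position` is hypothetical, and treating the coordinates as 1-based and inclusive is an assumption made for illustration; neither is part of the original analysis.

```python
def parse_position(position):
    """Split a 'start-end' string from Table 1 into (start, end, length, descending)."""
    start_text, end_text = position.split("-")
    start, end = int(start_text), int(end_text)
    length = abs(end - start) + 1  # assumes 1-based, inclusive coordinates
    return start, end, length, start > end


# Two rows copied from Table 1 (scaffold noted in the trailing comment).
rows = {
    "TRBV1": "861263-861886",    # NW_011591622
    "TRBV21S2": "70843-70731",   # NW_011591151
}

for gene, position in rows.items():
    start, end, length, descending = parse_position(position)
    order = "descending" if descending else "ascending"
    print(gene, length, order)
```

Descending coordinates may indicate that a gene lies on the reverse strand of its scaffold, but that reading is an interpretation and should be checked against the assembly.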

Table 2.

Description of the Camdro TRBV pseudogenes.

TRBV genes Defective Leader Frameshift Stop codon Defective splice sites Defective RSS
TRBV3
TRBV9
TRBV12S1
TRBV12S2
TRBV14
TRBV21S3
TRBV23
TRBV24

Table 3.

Description of the unrelated TRB genes in the Camelus dromedarius genome assembly. The position of all genes and their classification and functionality are reported.

Gene classification Functionality (a) NCBI reference sequence Position
MOXD2 F NW_011591622 850155-856730
TRY1 F NW_011591622 870036-876394
TRY2 F NW_011591622 882909-888072
TRY3 nd NW_011623391 1-2387
TRY4 F NW_011591151 13974-17714
EPHB6 F NW_011593440 46466-60647
(a) nd: not defined (indicates that the nt sequence of the gene is incomplete and its functionality cannot be defined).

Fig. 1.

The IMGT Protein display of the dromedary TRBV genes. Only functional genes and in-frame pseudogenes are shown. The description of the strands and loops and of the FR-IMGT and CDR-IMGT is according to the IMGT unique numbering for V-REGION [6]. The amino acid length of the CDR-IMGT AA is also indicated in square brackets.

Table 4.

Camelus dromedarius D-J-C region genomic clones. The primer sequences, the PCR conditions and the size of each clone are reported.

Clone Primer pairs sequence (5′-3′) Primer location T annealing Product length (bp)
pSCBJ11 JB11U: CTTTGGAGAAGGCACCAG TRBJ1-1 gene 55/58 4396
CB2L: TGGTTGCGGGGGTTGTGC TRBC gene exon 1
pSCJ22KN CB2U: GCACAACCCCCGCAACCA TRBC gene exon 1 53/55 5000
JB34L: GCCAAAGTACTGAGTGTT TRBJ3-4 gene
pSCBJ27U JB34U: AACACTCAGTACTTTGGC TRBJ3-4 gene 56/58 4077
CB2L: TGGTTGCGGGGGTTGTGC TRBC gene exon 1
pSCBD3 CB2U: GCACAACCCCCGCAACCA TRBC gene exon 1 55/56 4848
JB23L: CCGCCGAAAAACAGTGTC TRBJ2-3 gene
pSCMG1 JB23U: GACACTGTTTTTCGGCGG TRBJ2-3 gene 55/58 3160
CB2L: TGGTTGCGGGGGTTGTGC TRBC gene exon 1
pSCB2C8 CB2U: GCACAACCCCCGCAACCA TRBC gene exon 1 62 1331
3UTR:GTTGAGCTCACTTTGCAGGG TRBC2 gene 3UTR
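
The upper (U) and lower (L) primers that share a name in Table 4 appear to be reverse complements of each other, which is consistent with consecutive amplicons meeting at the same primer site. The short Python sketch below checks this for the three U/L pairs listed in the table. The helper `reverse_complement` and the dictionary layout are illustrative choices, not part of the original protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Primer sequences copied from Table 4 (5'-3').
primers = {
    "CB2U": "GCACAACCCCCGCAACCA",
    "CB2L": "TGGTTGCGGGGGTTGTGC",
    "JB34U": "AACACTCAGTACTTTGGC",
    "JB34L": "GCCAAAGTACTGAGTGTT",
    "JB23U": "GACACTGTTTTTCGGCGG",
    "JB23L": "CCGCCGAAAAACAGTGTC",
}

for name in ("CB2", "JB34", "JB23"):
    upper, lower = primers[name + "U"], primers[name + "L"]
    # True when the lower primer is the reverse complement of the upper one.
    print(name, reverse_complement(upper) == lower)
```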

Fig. 2.

Nucleotide and deduced amino acid sequences of the dromedary TRBD (a), TRBJ (b) and TRBC (c) genes. The consensus sequences of the heptamer and nonamer are provided at the top of the figure and underlined. The numbering adopted for the gene classification is reported on the left of each gene. The gene sequence retrieved from the Ca_dromedarius_V1.0 genomic assembly is highlighted in red. In (a), the inferred amino acid sequences of the TRBD genes in the three coding frames are reported. In (b), the donor splice site for each TRBJ is shown. The canonical FGXG amino acid motifs are underlined. The unusual TRBJ3.6 gene motif is in italics. In (c), the IMGT Protein display of the dromedary TRBC genes is shown. Descriptions of the strands and loops are according to the IMGT unique numbering for C-DOMAIN [7].

2. Experimental design, materials and methods

2.1. Analysis of the dromedary TRB locus retrieved from the genome assembly: identification of the related and unrelated TRB genes

We used the recent submission to NCBI (BioProject PRJNA234474) of a draft genome sequence from the Arabian camel [2] to identify the TRB locus in this species. A standard BLAST search (Basic Local Alignment Search Tool, http://blast.ncbi.nlm.nih.gov/Blast.cgi) of the dromedary genomic resource was then performed using human and sheep TRB gene sequences to assess their physical location in the dromedary genome. We directly retrieved a sequence of 457871 bp (gaps included) from the PRJNA234474_Ca_dromedarius_V1.0 assembly that corresponds to eight distinct, unplaced and non-contiguous scaffolds (Fig. 1 in [1]). The sequence comprises the MOXD2 and the EPHB6 genes, which flank the 5′ and 3′ ends, respectively, of all mammalian TRB loci studied to date. All dromedary TRB genes were recognized and annotated using both the human sequence and the sheep genomic D-J-C region as a reference [3], [4], [5] (Table 1). The functionality of the V, J and C genes was predicted through the manual alignment of sequences, adopting the following parameters: (a) identification of the leader sequence at the 5′ end of the TRBV genes; (b) determination of proper recombination signal (RS) sequences located at the 3′ end of the TRBV genes, the 5′ end of the TRBJ genes, and the 3′ and 5′ ends of the TRBD genes; (c) determination of correct acceptor and donor splicing sites; (d) estimation of the expected length of the coding regions; (e) absence of frameshifts and stop signals in the coding regions of the genes. We annotated 33 TRBV germline genes (twenty-five functional genes and eight pseudogenes) (Table 2), one TRBD, 13 TRBJ and two complete and one incomplete TRBC genes. The analysis of the 3′ part of the locus revealed the potential presence of three D-J-C clusters similar to the clusters found in sheep [4], [5].
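
The functionality criteria above were applied by manual alignment. Purely as an illustration of criterion (e) and of how the defect categories of Table 2 might be combined, the Python sketch below scans a coding sequence for stop codons in a chosen reading frame and then applies a toy rule that labels a gene "P" when any defect is flagged and "F" otherwise. The function names, the toy rule and the example sequence are assumptions; the sequence is hypothetical and is not a dromedary gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def internal_stop_positions(coding_seq, frame=0):
    """Return 0-based nucleotide offsets of stop codons read in the given frame."""
    seq = coding_seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def predict_functionality(defects):
    """Toy rule (an assumption): 'F' when no defect is flagged, otherwise 'P'."""
    return "P" if any(defects.values()) else "F"


toy_seq = "ATGGCCTGAGGA"  # hypothetical sequence, not taken from the assembly

# Keys mirror the column headers of Table 2; only the stop-codon flag is computed here.
defects = {
    "defective_leader": False,
    "frameshift": False,
    "stop_codon": bool(internal_stop_positions(toy_seq)),
    "defective_splice_sites": False,
    "defective_rss": False,
}

print(internal_stop_positions(toy_seq), predict_functionality(defects))
```

A real annotation would also need the leader, splice-site and RS checks, which depend on alignment to reference genes and are not attempted in this sketch.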

We also identified and annotated four trypsin-like serine protease (TRY) genes (Table 3). In this context, downstream of the TRBV1 gene, proceeding from 5′ to 3′, we found, as in humans, two protease genes that we tentatively recognized, according to their genomic position, as TRY1 (alias PRSS58 or TRYX3) and TRY2 (alias TRY2P), respectively. A third TRY gene, named TRY3, was found within the NW_011623391 unplaced scaffold and is homologous to a gene located after the TRY2P gene in humans. Extrapolation of the synteny with the human sequence predicts that the NW_011623391 scaffold should be placed within the dromedary TRB locus, upstream of the TRBV3 gene (Fig. 1 in [1]). An additional TRY gene, classified as TRY4, was found before the D-J-C region. Thus, unlike humans, only one TRY gene encompasses the array of the TRBV genes. All dromedary TRY genes appear putatively functional, with correct acceptor and donor splicing sites and no frameshifts or stop codons in their coding regions. The genomic structure of the MOXD2 and EPHB6 genes, which delimit the TRB locus, was also defined (Table 3).

2.2. Protein display of the dromedary TRBV genes

The deduced amino acid sequences of the germline TRBV genes were manually aligned according to the IMGT unique numbering for the V-REGION [6] to maximize the percentage of identity (Fig. 1). Only potentially functional genes and in-frame pseudogenes are shown. All sequences exhibit the typical framework regions (FR) and complementarity determining regions (CDR) as well as the four conserved amino acids: cysteine 23 (1st-CYS) in FR1-IMGT, tryptophan 41 (CONSERVED-TRP) in FR2-IMGT, hydrophobic amino acid 89, and cysteine 104 (2nd-CYS) in FR3-IMGT [6]. Conversely, the CDR-IMGT vary in amino acid composition and length. It should be noted that the TRBV21 genes differ in length by one amino acid in FR3: the C′′ strand of TRBV21S2 is shorter than that of TRBV21S1 and has a different amino acid sequence.
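
As a minimal sketch of how the four conserved positions named above could be checked programmatically, the Python function below inspects an IMGT-gapped V-REGION string in which IMGT position n is stored at index n - 1. The function names, the set of residues treated as hydrophobic and the toy input are all assumptions made for illustration; the toy string is not a real TRBV sequence, and the alignment in this work was done manually.

```python
HYDROPHOBIC = set("ACFILMVW")  # assumed hydrophobic class; adjust as needed


def check_conserved(aligned):
    """Check the four conserved positions in an IMGT-gapped V-REGION string.

    `aligned` must hold one character per IMGT position (gaps as '.'),
    so that IMGT position n sits at index n - 1.
    """
    def residue(position):
        return aligned[position - 1]

    return {
        "1st-CYS 23": residue(23) == "C",
        "CONSERVED-TRP 41": residue(41) == "W",
        "hydrophobic 89": residue(89) in HYDROPHOBIC,
        "2nd-CYS 104": residue(104) == "C",
    }


def make_toy_region():
    """Build a hypothetical 104-position string; it is not a real TRBV sequence."""
    toy = ["G"] * 104
    toy[22], toy[40], toy[88], toy[103] = "C", "W", "L", "C"
    return "".join(toy)


toy_region = make_toy_region()
print(check_conserved(toy_region))
```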

2.3. Isolation of the dromedary TRBD-J-C region and analysis of the gene content

To isolate the entire TRBD-J-C region, we set up six different PCRs to produce six consecutive amplicons that cover the region between the first TRBJ and the last TRBC gene. For most amplifications, the primer pair consisted of a gene-specific primer designed on the sequence of the TRBJ genes identified within the cDNA clones (see [1]) and a conserved primer designed on the first exon of the TRBC genes. For the isolation of the TRBC2 gene, a 3′UTR lower primer derived from the sequence of the genomic assembly was used. Amplification consisted of an initial denaturation step at 93 °C for 2 min, followed by 10 amplification cycles that each comprised a denaturation step at 93 °C for 10 s, an annealing step at a lower temperature (53–56 °C, according to the melting temperature of the primers) for 30 s and an extension step at 68 °C for 7 min. These were followed by 25 cycles with a higher annealing temperature (55–58 °C, according to the melting temperature of the primers) and an extension time that increased gradually by 20 s, and by a final incubation at 68 °C for 7 min. A 3′-deoxyadenosine overhang was added to blunt-ended amplicons by incubation with 1.0 unit of Platinum Taq DNA Polymerase (Invitrogen) at 72 °C for 10 min. These products were purified and cloned into the StrataClone TA-vector per the manufacturer's instructions. For each sample, 6 to 10 colonies were propagated and bi-directionally sequenced using M13 and T7 vector-specific primers. All plasmid sequence data were manually analysed. The clones, the primer pairs used and the PCR conditions are listed in Table 4. All the obtained amplicons were sequenced (Acc. no. LT837971). The sequenced region is schematically illustrated in Fig. 3 in [1].

The nucleotide and deduced amino acid sequences of the TRBD, TRBJ and TRBC genes classified according to the similarity to the sheep sequence are shown in Fig. 2.

Acknowledgements

The “Bilateral agreement of scientific cooperation between CNR and ASRT” for the years 2009–10 is gratefully acknowledged as well as the Italian Ministry of Foreign Affairs and Egyptian Academia of Science for supporting the “Programme of scientific and technological cooperation between Italy and Egypt”. The financial support of the University of Bari “Aldo Moro” (ex 60% delivered to RA) and University of Salento is gratefully acknowledged.

Footnotes

Transparency document

Transparency document associated with this article can be found in the online version at 10.1016/j.dib.2017.08.002.

Contributor Information

Rachele Antonacci, Email: rachele.antonacci@uniba.it.

Mariagrazia Bellini, Email: mg1192@hotmail.it.

Vito Castelli, Email: vtcast@gmail.com.

Salvatrice Ciccarese, Email: salvatricemaria.ciccarese@uniba.it.

Serafina Massari, Email: sara.massari@unisalento.it.


References

  • 1. Antonacci R., Bellini M., Pala A., Mineccia M., Hassanane M.S., Ciccarese S., Massari S. The occurrence of three D-J-C clusters within the dromedary TRB locus highlights a shared evolution in Tylopoda, Ruminantia and Suina. Dev. Comp. Immunol. 2017;76:105–119. doi: 10.1016/j.dci.2017.05.021.
  • 2. (a) Wu H., Guang X., Al-Fageeh M.B., Cao J., Pan S., Zhou H., Zhang L., Abutarboush M.H., Xing Y., Xie Z., Alshanqeeti A.S., Zhang Y., Yao Q., Al-Shomrani B.M., Zhang D., Li J., Manee M.M., Yang Z., Yang L., Liu Y., Zhang J., Altammami M.A., Wang S., Yu L., Zhang W., Liu S., Ba L., Liu C., Yang X., Meng F., Wang S., Li L., Li E., Li X., Wu K., Zhang S., Wang J., Yin Y., Yang H., Al-Swailem A.M., Wang J. Camelid genomes reveal evolution and adaptation to desert environments. Nat. Commun. 2014;5:5188. doi: 10.1038/ncomms6188. (b) Erratum in: Nat. Commun. 2015;6:6107. doi: 10.1038/ncomms7107.
  • 3. Lefranc M.-P., Giudicelli V., Duroux P., Jabado-Michaloud J., Folch G., Aouinti S., Carillon E., Duvergey H., Houles A., Paysan-Lafosse T., Hadi-Saljoqi S., Sasorith S., Lefranc G., Kossida S. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 2015;43:D413–D422. doi: 10.1093/nar/gku1056.
  • 4. Di Tommaso S., Antonacci R., Ciccarese S., Massari S. Extensive analysis of D-J-C arrangements allows the identification of different mechanisms enhancing the diversity in sheep T cell receptor beta-chain repertoire. BMC Genom. 2010;11:3. doi: 10.1186/1471-2164-11-3.
  • 5. Antonacci R., Di Tommaso S., Lanave C., Cribiu E.P., Ciccarese S., Massari S. Organization, structure and evolution of 41 Kb of genomic DNA spanning the D-J-C region of the sheep TRB locus. Mol. Immunol. 2008;45:493–509. doi: 10.1016/j.molimm.2007.05.023.
  • 6. Lefranc M.-P., Pommié C., Ruiz M., Giudicelli V., Foulquier E., Truong L., Thouvenin-Contet V., Lefranc G. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 2003;27:55–77. doi: 10.1016/s0145-305x(02)00039-3.
  • 7. Lefranc M.-P., Pommié C., Kaas Q., Duprat E., Bosc N., Guiraudou D., Jean C., Ruiz M., Da Piédade I., Rouard M., Foulquier E., Thouvenin V., Lefranc G. IMGT unique numbering for immunoglobulin and T cell receptor constant domains and Ig superfamily C-like domains. Dev. Comp. Immunol. 2005;29:185–203. doi: 10.1016/j.dci.2004.07.003.
