Table 2. Structural features of species-specific transposable element insertions in ca. 5.7 Mb of orthologous genomic sequences from the coelacanth species Latimeria chalumnae and L. menadoensis.
Type of TE | Species with insertion | Insertion identifier | Insertion length (nt) | Target Site Duplication? | ORF(s)/Domain(s)/Specific features | Copy number in genomic sequences * | Element representation in the transcriptome ** | Genomic position relative to next gene | Distance to the closest exon (kb) and corresponding gene |
CR1 | L. ch. | 1 | 1622 | AT | ORF2: RT | 5 (2 with id ≥98%) | 17 | IGR | 5.9 (HOXD12) |
CR1 | L. ch. | 2 | 1060 | AT-rich region | ORF2: RT (partial) | 1 | 0 | Intron (exon 4–exon 5) | 0.7 (exon 5) (GRID1) |
CR1 | L. ch. | 3 | 1097 | GAGTCTTGTT | ORF2: RT (partial) | 1 | 4 | IGR | 4.0 (PCDHGC) |
CR1 | L. ch. | 4 | 227 | CTA | - | 49 (5 with id ≥98%) | 1 | IGR | 9.7 (ighv14-1 (21)) |
CR1 | L. ch. | 5 | 320 | TTTAG | - | 37 (5 with id ≥98%) | 0 | IGR | 9.4 (vomeronasal 2 receptor) |
CR1 | L. ch. | 6 | 303 | TATTAGG | - | 1 | 0 | IGR | >70.9 (CALCOCO1) |
CR1 | L. me. | 7 | 2845 | ACTCA | ORF2: RT, APE (partial) | 23 (4 with id ≥98%) | 24 | IGR | >9.0 |
CR1 | L. me. | 8 | 2821 | AAT | ORF2: RT, APE (partial) | 24 | 31 | IGR | 3.2 (PCDHGC5) |
CR1 | L. me. | 9 | 1174 | AAGTA | ORF2: RT (partial) | 4 | 8 | IGR | 3.6 (PCDHGC5) |
CR1 | L. me. | 10 | 1038 | CCAT | ORF2: RT (partial) | 74 (18 with id ≥98%) | 10 | IGR | 18.3 (protocadherin gamma) |
CR1 | L. me. | 11 | 862 | GATTAA | ORF2: RT (partial) | 86 (19 with id ≥98%) | 6 | Intron (exon 2–exon 3) | 0.2 (exon 3) (SRA1) |
CR1 | L. me. | 12 | 1398 | TCTA | ORF2: RT (partial) | 57 (15 with id ≥98%) | 13 | IGR | 37.5 (HOXB13) |
CR1 | L. me. | 13 | 1019 | Poly-A region | ORF2: RT (partial) | 1 | 0 | Intron (exon 2–exon 3) | 1.3 (exon 3) (PCDHGC) |
CR1 | L. me. | 14 | 385 | ND | - | 110 (14 with id ≥98%) | 0 | Intron (exon 4–exon 5) | 0.4 (exon 4) (ighm) |
CR1 | L. me. | 15 | 387 | CTATTCC | - | 109 (12 with id ≥98%) | 3 | Intron (exon 2–exon 3) | 6.2 (exon 2) (FAT tumor suppressor homolog) |
L1 | L. me. | 16 | 2168 | ACTAATCTTATTTTAA | Endonuclease (PFAM PF02994, “Transposase_22”) (partial) | 2 (2 with id ≥98%) | 20 | IGR | 41.6 (hoxc1a) |
L1 | L. me. | 17 | 1999 | ND | RT | 4 | 19 | IGR | 0.8 (ighv14-1 (25)) |
L2 | L. ch. | 18 | 2219 | G | RT | 2 | 0 | IGR | 3.4 (PCDHGC) |
CoeG-SINE | L. ch. | 19 | 1362 | ND | - | 1 | 16 | Intron (exon 4–exon 5) | 0.6 (exon 5) (von Willebrand factor A domain containing 5A) |
CoeG-SINE | L. me. | 20 | 1018 | ATTTT | - | 1 | 0 | IGR | 18.0 (EVX2) |
LF-SINE | L. ch. | 21 (inserted within element 22) | 391 | TG | - | 48 | 0 | IGR | 33.1 (uncharacterized protein) |
Gypsy | L. ch. | 22 | 896 | CCCGCAGCGCCCCCCCCAGAGAAT | RT, no LTR | 1 | 1 | IGR | 33.1 (uncharacterized protein) |
ERV | L. me. | 23 | 5091 | AGAT | Gag, Pol, Env (partial), LTR | 1 | 41 | IGR | 10.6 (ighv14-1 (21)) |
MITE-like | L. ch. | 24 | 225 | CCT | - | 2 | 0 | IGR | 6.4 (von Willebrand factor A domain containing 5A) |
MITE-like | L. ch. | 25 | 1311 | ATTTCAAG | Derived from a hAT transposon | 1 | 5 | IGR | 2.8 (CHRNB4) |
Composite insertion | L. ch. | 26 | 2303 | T | CR1 (RT, TSD “AGT”)/SINE (TSD “AAGT”)/LF-SINE/CoeSINE | 1 | 5 | IGR | 90.7 (CRHR2) |
Composite insertion | L. me. | 27 | 1249 | ND | CoeG-SINE/LF-SINE | 1 | 0 | IGR | >12.8 |
ORF = Open Reading Frame; L. ch. = Latimeria chalumnae; L. me. = Latimeria menadoensis; IGR = Intergenic Region; RT = Reverse Transcriptase; APE = Apurinic/Apyrimidinic Endonuclease; LTR = Long Terminal Repeat; ND = Not Detected; TSD = Target Site Duplication; * Number of BlastN hits in the analyzed regions, with hit length ≥80% of insertion length and identity ≥80%, criteria that are classically used to define TE families; ** Number of BlastN hits against L. menadoensis testis transcriptome with hit length ≥80 nt and identity ≥95%.
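
The two counting criteria given in the footnotes (* and **) can be illustrated with a minimal Python sketch. It is not the pipeline used to build the table: the `Hit` record and the function names are hypothetical, and the sketch assumes BlastN hits have already been parsed into an aligned length (nt) and a percent identity. Whether the query insertion itself counts as a hit is not stated in the table, so the sketch simply counts every hit it is given.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one parsed BlastN hit."""
    length: int       # aligned hit length in nt
    identity: float   # percent identity, 0-100


def genomic_copy_number(hits: Iterable[Hit], insertion_length: int,
                        min_identity: float = 80.0,
                        min_fraction: float = 0.80) -> int:
    """Footnote *: hits with length >= 80% of the insertion length
    and identity >= 80%. Raising min_identity to 98.0 gives the
    'with id >=98%' sub-count shown in parentheses."""
    min_length = min_fraction * insertion_length
    return sum(1 for h in hits
               if h.length >= min_length and h.identity >= min_identity)


def transcriptome_representation(hits: Iterable[Hit],
                                 min_length: int = 80,
                                 min_identity: float = 95.0) -> int:
    """Footnote **: hits with length >= 80 nt and identity >= 95%."""
    return sum(1 for h in hits
               if h.length >= min_length and h.identity >= min_identity)


# Invented hits for an insertion of 1622 nt (illustrative values only,
# not data from the study).
genomic_hits = [Hit(1622, 100.0), Hit(1500, 98.5), Hit(1300, 85.0),
                Hit(1200, 99.0), Hit(1400, 79.9)]
transcript_hits = [Hit(80, 95.0), Hit(79, 99.0),
                   Hit(200, 94.9), Hit(150, 97.0)]

total = genomic_copy_number(genomic_hits, 1622)
near_identical = genomic_copy_number(genomic_hits, 1622, min_identity=98.0)
expressed = transcriptome_representation(transcript_hits)
```

Note that the genomic criterion scales with insertion length, whereas the transcriptome criterion uses a fixed 80 nt minimum, so short insertions are held to a comparatively stricter coverage requirement in the transcriptome count.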