Table 1.
Comparison between automated and human annotation of TEs
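label | repeat^a (manual) | repeat^a (REANNOTATE) | hits^b | nests^c (manual) | nests^c (REANNOTATE) | K ± s.d. (× 10^3)^d (manual) | K ± s.d. (× 10^3)^d (REANNOTATE) | time ± s.d. (Mya)^e (manual) | time ± s.d. (Mya)^e (REANNOTATE) | type |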
a | Ji-6 | PREM2_ZM | 2 | 2 | 2 | ... | - | ... | - | LTR |
b | Tekay | TEKAY_ZM | 1 | 1 | 1 | ... | - | ... | - | LTR |
c | Rle | REINA | 1 | 0 | 0 | - | - | - | - | LTR |
d | Cinful-2 | CINFUL2_ZM | 2 | 0 | 0 | - | - | - | - | LTR |
e | Milt | 00081 | 3 | 0 | 0 | 20.3 ± 5.5 | > 2.4 ± 1.4 | 1.56 ± 0.42 | > 0.18 ± 0.15 | LTR |
* | | 00081 | 1 | 0 | 0 | - | - | - | - | LTR |
f | Opie-2 | OPIE2_ZM | 3 | 1 | 1 | 2.4 ± 1.6 | 2.4 ± 1.4 | 0.18 ± 0.11 | 0.18 ± 0.15 | LTR |
g | Fourf | 00098 | 5 | 0 | 0 | 18.1 ± 4.1 | 18.2 ± 4.1 | 1.39 ± 0.32 | 1.40 ± 0.44 | LTR |
h | Huck-2 | HUCK1 | 3 | 1 | 1 | 12.3 ± 2.9 | 15.3 ± 3.1 | 0.95 ± 0.22 | 1.18 ± 0.34 | LTR |
i | Victim | 00093 | 6 | 0 | 0 | 31.4 ± 19 | 30.7 ± 18 | 2.42 ± 1.44 | 2.36 ± 1.92 | LTR |
j | Ji-2 | PREM2_ZM | 1 | 1 | 1 | - | < 31 ± 18 | - | < 2.4 ± 1.9 | LTR |
k | Ji-3 | PREM2_ZM | 5 | 1 | 1 | 24.2 ± 4.8 | 24.7 ± 4.7 | 1.86 ± 0.37 | 1.90 ± 0.51 | LTR |
l | Opie-3 | OPIE2_ZM | 3 | 2 | 2 | 6.4 ± 2.3 | 6.4 ± 2.3 | 0.49 ± 0.18 | 0.49 ± 0.25 | LTR |
m | Ji-5 | PREM2_ZM | 1 | 2 | 2 | - | < 25 ± 5 | - | < 1.9 ± 0.5 | LTR |
n | Ji-4 | PREM2_ZM | 3 | 1 | 1 | 21.1 ± 4.2 | 20.8 ± 4.1 | 1.62 ± 0.32 | 1.60 ± 0.44 | LTR |
o | Reina | REINA | 4 | 0 | 0 | 27.0 ± 9.8 | 26.4 ± 9.4 | 2.08 ± 0.75 | 2.03 ± 1.02 | LTR |
p | Cinful-1 | CINFUL1/2_ZM | 4 | 1 | 1 | 3.4 ± 2.4 | 3.4 ± 2.4 | 0.26 ± 0.18 | 0.26 ± 0.26 | LTR |
q | Kake-1 | 00243 | 2 | 1 | 1 | ... | - | ... | - | LTR |
1 | Angela_F2-2 | ANGELA1_TM | 2 | 1† | 0 | - | - | - | - | LTR |
2 | RIRE2 (rice) | SABRINA2_TM | 1 | 0 | 0 | - | - | - | - | LTR |
3 | SabrinaF_2-2 | SABRINA2_TM | 4 | 0 | 0 | 25.9 ± 4.2 | 26.6 ± 4.2 | 1.99 ± 0.32 | 2.04 ± 0.46 | LTR |
 | | SABRINA3_TM | 1 | - | 1 | - | < 27 ± 4 | - | < 2.0 ± 0.5 | LTR |
 | | SABRINA_HV | 1 | - | 1 | - | < 27 ± 4 | - | < 2.0 ± 0.5 | LTR |
4 | Nusif_F2-1 | NUSIF1_TM | 1 | 1 | 1 | - | < 27 ± 4 | - | < 2.0 ± 0.5 | LTR |
5 | RIRE2 (rice) | RIRE2 | 1 | 0 | 0 | - | - | - | - | LTR |
6 | MITE 1-4 | THALOS_HV | 1 | 0 | 0 | - | - | - | - | MITE |
7 | MITE 2-5 | TREP220 | 1 | 0 | 0 | - | - | - | - | MITE |
8 | Veju_F2-1 | VEJU1_TM | 3 | 0 | 0 | 10.8 ± 5.5 | 10.8 ± 5.4 | 0.83 ± 0.42 | 0.83 ± 0.59 | LTR |
9 | Claudia_F2-1 | CLAUDIA1_TM | 3 | 0 | 0 | - | > 41 ± 6 | - | > 3.2 ± 0.6 | LTR |
10 | Latidu F2-1 | LATIDU2_TM | 3 | 1 | 1 | 13.3 ± 5.3 | 13.1 ± 5.4 | 1.01 ± 0.41 | 1.01 ± 0.58 | LTR |
11 | Wham F2-1 | WHAM3_TM | 3 | 1 | 1 | 40.6 ± 5.6 | 41.4 ± 5.6 | 3.12 ± 0.43 | 3.18 ± 0.60 | LTR |
12 | Fatima_F2-1 | FATIMA_TM | 6 | 0 | 0 | - | > 31 ± 4 | - | > 2.4 ± 0.4 | LTR |
13 | Sukkula_F2-1 | SUKKULA3_TM | 1 | 1 | 1 | 29.9 ± 2.6 | - | 2.30 ± 0.20 | - | LTR |
 | | SUKKULA3_TM | 4 | 1 | 1 | 29.9 ± 2.6 | 30.7 ± 3.5 | 2.30 ± 0.20 | 2.36 ± 0.37 | LTR |
14 | Angela_F2-3 | ANGELA1_TM | 2 | 2 | 2 | - | < 31 ± 4 | - | < 2.4 ± 0.4 | LTR |
15 | Angela_F2-1 | ANGELA1_TM | 3 | 2 | 2 | 19.9 ± 3.4 | 19.9 ± 3.4 | 1.53 ± 0.26 | 1.53 ± 0.37 | LTR |
16 | Sabrina_F2-1 | SABRINA3_TM | 2 | 0 | 0 | - | - | - | - | LTR |
17 | Wis_F2-1 | WIS4_TM | 3 | 0 | 0 | 58.1 ± 6.0 | 57.0 ± 6.0 | 4.47 ± 0.46 | 4.38 ± 0.64 | LTR |
18 | Sabrina_G1-1 | SABRINA1_TM | 3 | 0 | 0 | 55.8 ± 6.1 | > 39 ± 5 | 4.29 ± 0.47 | > 3.0 ± 0.6 | LTR |
 | | SABRINA1_TM | 3 | 0 | 0 | 55.8 ± 6.1 | - | 4.29 ± 0.47 | - | LTR |
19 | Wham_G1-2 | WHAM2_TM | 5 | 1 | 1 | 39.1 ± 5.5 | 39.1 ± 5.4 | 3.01 ± 0.42 | 3.01 ± 0.58 | LTR |
20 | Sabrina_G1-2 | SABRINA2_TM | 4 | 2 | 2 | 34.7 ± 4.8 | 35.9 ± 4.9 | 2.67 ± 0.37 | 2.76 ± 0.52 | LTR |
21 | Wham_G1-1 | WHAM1_TM | 3 | 3 | 3 | 32.2 ± 4.9 | 31.6 ± 4.8 | 2.48 ± 0.38 | 2.43 ± 0.52 | LTR |
22 | Miuse_G1-1 | MIUSE1_TM | 1 | 2 | 2 | - | < 39 ± 5 | - | < 3.0 ± 0.6 | LINE |
23 | Latidu_G1-1 | LATIDU2_TM | 3 | 1 | 1 | - | - | - | - | LTR |
24 | Eway_G1-1‡ | EWAY1_TM | 3 | 0 | 0 | 0‡ | 73.1 ± 18 | 0‡ | 5.62 ± 1.87 | LTR |
25 | MITE 4A-10 | TREP216 | 1 | 0 | 0 | - | - | - | - | MITE |
26 | MITE 4A-4B | TREP216 | 1 | 0 | 0 | - | - | - | - | MITE |
27 | Barbara | BARBARA_TM | 2 | 0 | 0 | - | - | - | - | LTR |
28 | Angela_G1-1 | ANGELA6_TM | 2 | 0 | 1† | - | - | - | - | LTR |
Manual annotation results of maize [45] and diploid wheat [51] sequences are shown in italics. REANNOTATE results are shown in regular font style. Only elements spanning sequences that were annotated as TEs both in the manual annotation and in the input (REPEATMASKER) to the automated re-annotation are listed. In the first column, letters indicate maize elements and correspond to labels in Figure 3C; numbers indicate wheat elements and correspond to labels in Figure 4.
a Uppercase names correspond to reference element sequences in REPBASE UPDATE (RU); numbers correspond to reference sequences in the TIGR ZEA REPEAT DATABASE. Rows without an entry for the manually annotated repeat name indicate that REANNOTATE constructed multiple models (one model per row) corresponding to a single element in the manual annotation. For instance, Sabrina_F2-2 corresponds to three automated models because (parts of) the closely related RU reference elements SABRINA2_TM, SABRINA3_TM and SABRINA_HV were the best matches (annotated by REPEATMASKER) to different segments of Sabrina_F2-2.
b Number of similarity hits reported by REPEATMASKER that were defragmented into a single element model by REANNOTATE.
c Number of repetitive elements nesting a given element. (†) The first wheat element listed was annotated in [51] to be inserted into a TE sequence with no detectable similarity to reference elements in RU (absent from the input REPEATMASKER annotation); the last wheat element was annotated by REANNOTATE to be interrupting a fragment of an element homologous to CLAUDIA1_TM, which is not present in the manual annotation.
d Estimated number of nucleotide substitutions between intra-element LTRs. (*) REANNOTATE did not date Milt because the 3' LTR is in inverse orientation relative to the rest of the element: one element model was built from the Milt 5' LTR and internal region, and another from the 3' LTR. (‡) Eway_G1-1 was originally annotated as having identical LTRs, but they are in fact quite divergent. (...) Elements Ji-6, Tekay and Kake-1 were dated in the original annotation, but these elements are truncated at the ends of the available 160 kb of contiguous sequence re-annotated here.
e Estimated time of insertion (million years ago), obtained with the substitution rate for the adh loci of grasses [66]. The standard deviations computed by REANNOTATE are larger than in the manual annotation: in the latter the variance in time was propagated from the variance in K, whilst REANNOTATE additionally accounts for the Poisson variance (stochasticity) in the accumulation of nucleotide substitutions.
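As a rough illustration of footnote e, the sketch below converts an LTR divergence K into an insertion time, assuming K is expressed in substitutions per site and that both LTRs diverge independently, so that T = K / (2r). The rate r = 6.5 × 10^-9 substitutions per site per year is an assumption: it is not stated in the table, and is used here only because it reproduces the K-to-time ratios of the rows above. The function name `insertion_time_mya` is hypothetical. Only the variance in K is propagated, as described for the manual annotation; the additional Poisson term used by REANNOTATE is not reproduced.

```python
# Assumed substitution rate (per site per year). Not given in the table;
# chosen because it reproduces the listed K-to-time ratios.
RATE_PER_SITE_PER_YEAR = 6.5e-9


def insertion_time_mya(k, k_sd, rate=RATE_PER_SITE_PER_YEAR):
    """Convert LTR divergence K into an insertion time in million years (Mya).

    k and k_sd are in substitutions per site (the table lists K x 10^3).
    Both LTRs accumulate substitutions, hence the factor of 2.
    The s.d. is propagated from the s.d. of K only (no Poisson term).
    """
    mya_per_unit_k = 1.0 / (2.0 * rate) / 1e6
    return k * mya_per_unit_k, k_sd * mya_per_unit_k


# Manual-annotation values for Milt (row e): K = 20.3 ± 5.5 (x 10^-3)
t, sd = insertion_time_mya(20.3e-3, 5.5e-3)
print(f"{t:.2f} ± {sd:.2f} Mya")  # prints "1.56 ± 0.42 Mya"
```

Under these assumptions the manual-annotation time columns can be recomputed from the K columns; the REANNOTATE standard deviations come out larger than this sketch gives because of the extra Poisson variance.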