Table 2.
Data type | Number of entries<sup>a</sup> (2011-09-15) | Number of entries (2016-08-15) |
---|---|---|
GK FSTs | ∼133,000 | 143,601 |
Lines | 71,235<sup>b</sup> | 77,034<sup>c</sup> |
with segregation data | 15,289 | 20,037 |
available at NASC | 9,644 | 13,967 |
Insertion alleles (predicted genome hits) | 88,580<sup>d</sup> | 95,233<sup>d</sup> |
analyzed with final result | 16,081<sup>e</sup> | 26,319<sup>e</sup> |
delivered to individual users | 6,816 | 7,819 |
confirmed and available at NASC | 9,653<sup>f</sup> | 14,280<sup>f</sup> |
Distinct genes covered | 21,005 | 24,789 |
protein coding genes | 19,120 | 20,697 |
RNA-encoding genes | 182 | 988 |
pseudogenes | 420 | 481 |
transposable element genes | 1,283 | 1,416 |
Distinct CDSi covered | 13,037 | 14,235 |
a Numbers as of September 15, 2011, taken from Kleinboelting et al. (2012).
b Database release version 24.
c Database release version 28 from August 15, 2016.
d Insertion alleles are counted separately from lines because a line can contain several insertions. Two insertions in the same line are expected to be distinct if their predicted insertion positions are at least 20 kbp apart (Kleinboelting et al. 2015). The gain of 6,653 predicted insertion alleles, from 88,580 (September 15, 2011) to 95,233 (August 15, 2016), is due in part to data from the Ecker group (O’Malley et al. in preparation). Selected GK-lines were analyzed by TDNA-Seq using Illumina technology (NCBI accession numbers KG779961 to KG787552), and the resulting predictions have been included in SimpleSearch. In addition, 119 cases are derived from ‘composite FSTs’ as described by Huep et al. (2014).
e A final result can be ‘confirmed’, but also ‘failed to confirm’ or ‘part of a contamination group’; see Kleinboelting et al. (2012).
f For each confirmed insertion, confirmation sequences are available that were generated from the amplicon spanning the T-DNA/genome sequence junction. For about 1,400 insertions, data are available from both junctions (‘north’ and ‘south’) of the inserted T-DNA sequence (Kleinboelting et al. 2015).
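
The 20 kbp rule in footnote d, and the gain it reports, can be illustrated with a short sketch. This is not the GABI-Kat pipeline: the function name, the `(chromosome, position)` input format, and the example hits are hypothetical, and comparing each hit only with the previous hit on the same chromosome is an assumption about how the distance criterion might be applied.

```python
from itertools import groupby

MIN_DISTANCE_BP = 20_000  # distance threshold stated in footnote d


def count_distinct_insertions(hits, min_distance=MIN_DISTANCE_BP):
    """Count insertion alleles among the predicted genome hits of one line.

    hits: iterable of (chromosome, position) tuples (hypothetical format).
    A hit starts a new allele if it is the first on its chromosome or lies
    at least `min_distance` bp beyond the previous hit (assumed rule).
    """
    count = 0
    # Sorting orders hits by chromosome, then by position.
    for _, group in groupby(sorted(hits), key=lambda hit: hit[0]):
        previous = None
        for _, position in group:
            if previous is None or position - previous >= min_distance:
                count += 1
            previous = position
    return count


# Made-up hits for a single line: two close hits, one distant, one on Chr3.
example_hits = [
    ("Chr1", 100_000),
    ("Chr1", 105_000),
    ("Chr1", 130_000),
    ("Chr3", 100_000),
]
print(count_distinct_insertions(example_hits))  # 3

# Gain in predicted insertion alleles between the two table columns.
gain = 95_233 - 88_580
print(gain)  # 6653
```

In this example the hits at 100,000 and 105,000 on Chr1 collapse into one allele, while the hit at 130,000 and the hit on Chr3 each count separately.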