Table 1.
| Class | Concept | Number of Annotations | Number of Documents | % Frequency (Eq. 1) ψ | Mean (Eq. 2) | Std (Eq. 3) | VMR (Eq. 4) |
|---|---|---|---|---|---|---|---|
| Genes | relA | 3163 | 138 | 71.50 | 22.92 | 27.23 | 33.14 |
| | spoT | 1315 | 88 | 45.60 | 14.94 | 27.42 | 52.07 |
| | lac | 354 | 63 | 32.64 | 5.620 | 19.42 | 72.20 |
| | lacZ | 534 | 50 | 25.91 | 10.68 | 17.16 | 28.90 |
| | thi | 91 | 47 | 24.35 | 1.940 | 0.050 | 4.000 |
| | rel | 523 | 47 | 24.35 | 11.13 | 20.68 | 36.36 |
| | recA | 82 | 39 | 20.21 | 2.100 | 1.810 | 0.5000 |
| | rpsL | 95 | 36 | 18.65 | 2.640 | 3.530 | 4.500 |
| | thr | 84 | 36 | 18.65 | 2.330 | 3.760 | 4.500 |
| | rpsG | 103 | 34 | 17.62 | 3.030 | 7.250 | 16.33 |
| | leu | 98 | 34 | 17.62 | 2.880 | 6.800 | 18.00 |
| | rpoS | 205 | 33 | 17.10 | 6.210 | 10.83 | 16.67 |
| | kan | 308 | 33 | 17.10 | 9.330 | 16.61 | 28.44 |
| | glnV | 42 | 31 | 16.06 | 1.350 | 0.7400 | 0 |
| | rpoB | 389 | 30 | 15.54 | 12.97 | 17.60 | 24.08 |
| | ptsG | 240 | 30 | 15.54 | 8.000 | 21.54 | 55.13 |
| | trp | 144 | 25 | 12.95 | 5.760 | 14.73 | 39.20 |
| | carA | 60 | 20 | 10.36 | 3.000 | 3.810 | 3.000 |
| | hsdR | 23 | 19 | 9.840 | 1.210 | 0.5600 | 0 |
| DNAs | DNA | 1839 | 137 | 70.98 | 13.42 | 16.31 | 19.69 |
| | plasmid DNA | 193 | 36 | 18.65 | 5.360 | 12.31 | 28.80 |
| | chromosomal DNA | 63 | 24 | 12.44 | 2.630 | 2.440 | 2.000 |
| | cDNA | 125 | 23 | 11.92 | 5.430 | 5.820 | 5.000 |
| RNAs | RNA | 4193 | 140 | 72.54 | 29.95 | 38.21 | 49.79 |
| | uncharged tRNA | 1168 | 117 | 60.62 | 9.980 | 19.64 | 40.11 |
| | rRNA | 1116 | 97 | 50.26 | 11.51 | 25.97 | 56.82 |
| | a mRNA | 999 | 91 | 47.15 | 10.98 | 19.52 | 36.10 |
| | rrnA | 911 | 87 | 45.08 | 10.47 | 22.51 | 48.40 |
| | stable RNA | 430 | 87 | 45.08 | 4.940 | 8.030 | 16.00 |
| | a charged tRNA | 140 | 43 | 22.28 | 3.260 | 4.200 | 5.330 |
| | rrnB | 301 | 26 | 13.47 | 11.58 | 19.30 | 32.82 |
| | rrn | 321 | 26 | 13.47 | 12.35 | 30.42 | 75.00 |
| | 16s-rRNAs | 156 | 25 | 12.95 | 6.240 | 9.090 | 13.50 |

Individual genetic components (i.e. genes, DNAs and RNAs) were evaluated considering the number of documents where these entities were annotated and their number of annotations in the corpus. Statistical measurements are detailed in the Methods and Materials section.
ψ A threshold of 10% of the frequency of annotation was set for each genetic component category. However, lists of all annotated entities are provided in Additional file 5.
VMR: variance-to-mean ratio
Std: standard deviation
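
As an illustration of how the per-entity statistics in Table 1 relate to the raw counts, the following minimal Python sketch reproduces the % frequency and mean for the relA row. It is not the authors' implementation: Eqs. 1 to 4 are defined in the Methods and Materials section and are not reproduced here. The function names are hypothetical, the corpus size of 193 documents is an assumption inferred from the table (138 documents correspond to 71.50%), and the use of the sample variance for the VMR is likewise an assumption.

```python
from statistics import mean, variance

# Assumption: corpus size inferred from Table 1 (138 / 0.7150 is about 193);
# it is not stated in the table itself.
TOTAL_DOCS = 193


def percent_frequency(docs_with_entity, total_docs):
    """Percentage of documents in which the entity is annotated."""
    return 100.0 * docs_with_entity / total_docs


def mean_annotations(n_annotations, docs_with_entity):
    """Mean number of annotations per document containing the entity."""
    return n_annotations / docs_with_entity


def variance_to_mean_ratio(counts):
    """VMR of per-document annotation counts.

    Assumption: sample variance (n - 1 denominator); the paper's Eq. 4
    may use a different estimator.
    """
    return variance(counts) / mean(counts)


# relA row of Table 1: 3163 annotations in 138 documents.
print(round(percent_frequency(138, TOTAL_DOCS), 2))  # 71.5, as in the table
print(round(mean_annotations(3163, 138), 2))         # 22.92, as in the table

# Hypothetical per-document counts, for illustration only.
print(variance_to_mean_ratio([1, 2, 3]))
```

The Std and VMR columns cannot be recomputed from the table alone, because they depend on the per-document annotation counts, which are not listed here.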