Table 2.
Class | Concept | Number of Annotations | Number of Documents | % Frequency (Eq. 1)ψ | Mean (Eq. 2) | Std (Eq. 3) | VMR (Eq. 4) |
---|---|---|---|---|---|---|---|
Proteins | Ribosome | 1643 | 128 | 66.32 | 12.84 | 23.57 | 44.08 |
Proteins | Rel | 1021 | 62 | 32.12 | 16.50 | 36.60 | 81.00 |
Proteins | LacZ | 543 | 53 | 27.46 | 10.30 | 17.44 | 28.90 |
Proteins | Sigma 38 factor | 392 | 42 | 21.76 | 9.330 | 15.40 | 25.00 |
Proteins | Sigma factor | 112 | 35 | 18.13 | 3.200 | 5.870 | 8.330 |
Proteins | UvrD | 56 | 35 | 18.13 | 1.600 | 1.300 | 1.000 |
Proteins | RpoB | 252 | 35 | 18.13 | 7.200 | 11.50 | 17.29 |
Proteins | RecA | 99 | 31 | 16.06 | 3.190 | 4.260 | 5.330 |
Proteins | EF-Tu | 223 | 26 | 13.47 | 8.580 | 17.32 | 36.13 |
Proteins | Der | 51 | 25 | 12.95 | 2.040 | 2.140 | 2.000 |
Proteins | Sigma 70 factor | 134 | 21 | 10.88 | 6.380 | 11.19 | 20.17 |
Transcription factors | Fis | 888 | 18 | 9.330 | 49.33 | 86.88 | 150.9 |
Transcription factors | Fur | 56 | 13 | 6.740 | 4.310 | 9.260 | 20.25 |
Transcription factors | CRP | 279 | 12 | 6.220 | 23.25 | 36.28 | 56.35 |
Transcription factors | DnaA | 121 | 11 | 5.700 | 11.00 | 23.00 | 48.09 |
Transcription factors | H-NS | 73 | 11 | 5.700 | 6.640 | 10.73 | 16.67 |
Transcription factors | LexA | 101 | 10 | 5.180 | 10.10 | 18.32 | 32.40 |
Transcription factors | IHF | 54 | 9 | 4.660 | 6.000 | 5.250 | 4.170 |
Enzymes | RelA | 4138 | 152 | 78.76 | 27.22 | 31.16 | 35.59 |
Enzymes | RNAP | 1873 | 117 | 60.62 | 16.01 | 28.08 | 49.00 |
Enzymes | SpoT | 1024 | 60 | 31.09 | 17.07 | 42.19 | 103.8 |
Enzymes | EcoRI | 215 | 53 | 27.46 | 4.060 | 4.970 | 4.000 |
Enzymes | β-galactosidase | 294 | 47 | 24.35 | 6.260 | 6.550 | 6.000 |
Enzymes | BamHI | 149 | 43 | 22.28 | 3.470 | 5.870 | 8.330 |
Enzymes | HindIII | 114 | 41 | 21.24 | 2.780 | 2.160 | 2.000 |
Enzymes | RNase | 109 | 36 | 18.65 | 3.030 | 4.280 | 5.330 |
Enzymes | YbcS | 50 | 23 | 11.92 | 2.170 | 2.620 | 2.000 |
Enzymes | Reverse transcriptase | 34 | 21 | 10.88 | 1.620 | 1.050 | 1.000 |
Enzymes | tRNA synthetase | 54 | 20 | 10.36 | 2.700 | 2.630 | 2.000 |
Enzymes | Endonuclease I | 29 | 20 | 10.36 | 1.450 | 1.400 | 1.000 |
Individual gene products (i.e., enzymes, transcription factors and other proteins) were evaluated by the number of documents in which they were annotated and by their total number of annotations in the corpus. The statistical measurements are detailed in the Methods and Materials section.
ψ A frequency-of-annotation threshold of 10% was set for enzymes and other proteins, and a threshold of 5% was set for transcription factors. Complete lists of all annotated entities are provided in Additional file 6.
VMR: variance-to-mean ratio
Std: standard deviation
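
As a rough illustration of how the columns in Table 2 relate to one another, the following Python sketch computes the same kinds of summary statistics from per-document annotation counts. Eqs. 1 to 4 are defined in the Methods and are not reproduced here, so the formulas below are assumptions: % frequency as the share of corpus documents containing the concept, mean and standard deviation taken over the documents in which the concept is annotated, and VMR as variance divided by mean. The function name `summarize`, its input layout, and the choice of the sample (n - 1) variance are hypothetical and are not the authors' implementation.

```python
from statistics import mean, variance


def summarize(counts_per_doc, total_docs):
    """Summarize one concept (hypothetical helper, not from the paper).

    counts_per_doc: annotation counts, one entry per document in which
        the concept was annotated (needs at least two entries).
    total_docs: number of documents in the whole corpus.
    """
    n_docs = len(counts_per_doc)
    avg = mean(counts_per_doc)
    # Assumption: sample variance (n - 1); the paper's Eq. 3 may differ.
    var = variance(counts_per_doc)
    return {
        "annotations": sum(counts_per_doc),
        "documents": n_docs,
        "pct_frequency": 100.0 * n_docs / total_docs,  # assumed form of Eq. 1
        "mean": avg,                                   # assumed form of Eq. 2
        "std": var ** 0.5,                             # assumed form of Eq. 3
        "vmr": var / avg,                              # assumed form of Eq. 4
    }


# Toy input: a concept annotated 1, 2 and 3 times in three of ten documents.
toy = summarize([1, 2, 3], total_docs=10)
```

The tabulated values are consistent with the first two assumptions: for Ribosome, 1643 / 128 ≈ 12.84, and 128 documents at 66.32% implies a corpus of about 193 documents, a figure inferred here from the table rather than stated in it.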