PLoS Comput Biol. 2020 Oct 19;16(10):e1008319. doi: 10.1371/journal.pcbi.1008319

Table 3. The top three genes with the highest total feature importance from all k-mers used in each model.

Results are based on ten separate models built from 100 randomly sampled non-overlapping core gene sets.

| Species | Protein family | Average gene length | Cumulative feature importance | PATRIC annotation |
|---|---|---|---|---|
| K. pneumoniae | PLF_570_00002496 | 1422.3 | 818.8 | Uncharacterized 55.8 kDa protein in cps region (ORF3) |
| K. pneumoniae | PLF_570_00001044 | 1419.6 | 669.2 | Alpha,alpha-trehalose-phosphate synthase [UDP-forming] (EC 2.4.1.15) |
| K. pneumoniae | PLF_570_00003328 | 1247.9 | 533.3 | N-carbamoyl-L-amino acid hydrolase (EC 3.5.1.87) |
| M. tuberculosis | PLF_1763_00001229 | 909.9 | 1157.0 | OpcA, an allosteric effector of glucose-6-phosphate dehydrogenase, actinobacterial |
| M. tuberculosis | PLF_1763_00001419 | 697.0 | 649.9 | putative phosphoglycerate mutase |
| M. tuberculosis | PLF_1763_00001681 | 1580.9 | 518.2 | DNA methylase |
| S. enterica | PLF_590_00006292 | 11345.6 | 1828.8 | T1SS secreted agglutinin RTX |
| S. enterica | PLF_590_00001168 | 1593.0 | 616.6 | Bis-ABC ATPase YbiT |
| S. enterica | PLF_590_00001455 | 906.0 | 571.3 | GTP-binding protein Era |
| S. aureus | PLF_1279_00000411 | 477.3 | 484.8 | 23S rRNA (pseudouridine(1915)-N(3))-methyltransferase (EC 2.1.1.177) |
| S. aureus | PLF_1279_00002023 | 687.0 | 250.0 | Phage-encoded chromosome degrading nuclease YokF |
| S. aureus | PLF_1279_00001560 | 899.8 | 232.0 | Secretory antigen precursor SsaA |
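
The ranking in Table 3 rests on summing the feature importance of every k-mer belonging to a gene and then taking the three genes with the largest totals. The sketch below illustrates that aggregation step only. It assumes that each k-mer has already been assigned to a single gene and that its importance is a plain number taken from a trained model; the function name `top_genes_by_importance`, the gene ids, and the toy values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def top_genes_by_importance(kmer_importances, n=3):
    """Sum per-k-mer importances by gene and return the n largest totals.

    kmer_importances: iterable of (gene_id, importance) pairs, one per k-mer.
    Assumption: each k-mer is attributed to exactly one gene.
    """
    totals = defaultdict(float)
    for gene_id, importance in kmer_importances:
        totals[gene_id] += importance
    # Sort by descending total; break ties by gene id so output is deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical toy input: one (gene id, importance) pair per k-mer.
toy = [
    ("gene_A", 2.0), ("gene_B", 0.5), ("gene_A", 1.5),
    ("gene_C", 3.0), ("gene_D", 0.25), ("gene_B", 0.25),
]
top3 = top_genes_by_importance(toy)
for gene_id, total in top3:
    print(gene_id, total)
```

Because the totals are plain sums, a gene that contributes more k-mers can accumulate a larger total, which is one reason to read the cumulative importance column alongside the average gene length.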