2020 Nov 27;2(4):lqaa098. doi: 10.1093/nargab/lqaa098

Table 1.

Top 15 features ranked by importance for each training dataset, from completely balanced (50% positive, 50% negative) to most imbalanced (5% positive, 95% negative)

Training dataset distribution
50%–50%     40%–60%     30%–70%     20%–80%     10%–90%     5%–95%
PF00698.21  PF00698.21  GO:0008168  HGTGTQ      PF00109.26  TACSSS
PF00668.20  HGTGTQ      HGTGTQ      GO:0008152  GO:0044550  GTGTQA
ADGYCR      GO:0031177  GQGAQW      PF00550.25  LYRTGD      GYARGE
GO:0016491  GAGTGG      GYCRAD      IDTACS      VFTGQG      GO:0046148
FDGYRF      VEMHGT      GAGTGG      PF02458.15  NFSAAG      TGDLAR
GO:0016740  VFTGQG      QQRLLL      DTACSS      VEAHGT      SINSFG
MHGTGT      PF00668.20  TACSSS      VTLSGD      GO:0043041  DPQQRL
DTACSS      GO:0016874  PF02801.22  FTGQGA      GHSLGE      LFTSGS
GO:1900557  YKTGDL      GO:0009058  PF08242.12  AYEALE      NSFGFG
GO:0009058  GO:0019184  GO:0046148  AYGPTE      GO:0016491  CDTAVA
GRFFAA      GO:0043042  GO:0047462  GO:0004315  TQVKIR      FDASFF
PF14765.6   PGRFFA      GEYAAL      GO:0031177  GO:0046500  AYGPTE
MDPQQR      MHGTGT      GO:0005829  KLRGFR      DTACSS      YILFTS
FTSGST      GO:1900790  PFAFHS      GO:0016021  GO:0032259  AIVLAG
GQGAQW      VEIGPH      LHSLEA      PF00067.22  DTFVRC      AVVGHS

Highlighted features appeared in multiple datasets.
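
The highlighting itself does not survive in plain text, but the features that recur across columns can be recomputed from the table. The following is a minimal Python sketch, assuming that "appeared in multiple datasets" means a feature is listed under more than one training distribution; the names `LABELS`, `ROWS`, and `shared_features` are hypothetical and not part of the original work.

```python
from collections import defaultdict

# Column labels for the six training distributions (positive%-negative%).
LABELS = ["50%-50%", "40%-60%", "30%-70%", "20%-80%", "10%-90%", "5%-95%"]

# Table 1 body, one row per line, six whitespace-separated cells per row.
ROWS = """\
PF00698.21 PF00698.21 GO:0008168 HGTGTQ PF00109.26 TACSSS
PF00668.20 HGTGTQ HGTGTQ GO:0008152 GO:0044550 GTGTQA
ADGYCR GO:0031177 GQGAQW PF00550.25 LYRTGD GYARGE
GO:0016491 GAGTGG GYCRAD IDTACS VFTGQG GO:0046148
FDGYRF VEMHGT GAGTGG PF02458.15 NFSAAG TGDLAR
GO:0016740 VFTGQG QQRLLL DTACSS VEAHGT SINSFG
MHGTGT PF00668.20 TACSSS VTLSGD GO:0043041 DPQQRL
DTACSS GO:0016874 PF02801.22 FTGQGA GHSLGE LFTSGS
GO:1900557 YKTGDL GO:0009058 PF08242.12 AYEALE NSFGFG
GO:0009058 GO:0019184 GO:0046148 AYGPTE GO:0016491 CDTAVA
GRFFAA GO:0043042 GO:0047462 GO:0004315 TQVKIR FDASFF
PF14765.6 PGRFFA GEYAAL GO:0031177 GO:0046500 AYGPTE
MDPQQR MHGTGT GO:0005829 KLRGFR DTACSS YILFTS
FTSGST GO:1900790 PFAFHS GO:0016021 GO:0032259 AIVLAG
GQGAQW VEIGPH LHSLEA PF00067.22 DTFVRC AVVGHS
"""


def shared_features(rows_text, labels):
    """Map each feature listed under more than one column to those columns."""
    seen = defaultdict(list)
    for line in rows_text.strip().splitlines():
        cells = line.split()
        if len(cells) != len(labels):
            raise ValueError(f"expected {len(labels)} cells, got {len(cells)}")
        for label, feature in zip(labels, cells):
            # Record each column at most once per feature.
            if label not in seen[feature]:
                seen[feature].append(label)
    # Keep only features found in two or more columns, in column order.
    return {
        feature: sorted(cols, key=labels.index)
        for feature, cols in seen.items()
        if len(cols) > 1
    }


shared = shared_features(ROWS, LABELS)
for feature, cols in sorted(shared.items()):
    print(f"{feature}: {', '.join(cols)}")
```

Whether this set matches the original highlighting exactly depends on how the authors defined it, so the output is best read as a reconstruction rather than the published annotation.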