Table 2.
ID | Feature set | Dimension | Top-ranking features: mutual information ± standard error |
---|---|---|---|
1 | The compositional factor | 6 | – |
2 | The bi-transitional factor | 18 | AC: 0.0163 ± 0.0058; CA: 0.0427 ± 0.0103 |
3 | The distributional factor | 20 | DA(3/4): 0.1009 ± 0.0136; DG(0): 0.0376 ± 0.0084 |
4 | The tri-transitional factor | 66 | AAC: 0.0289 ± 0.0081; CCA: 0.0162 ± 0.0064; CGA: 0.0229 ± 0.0070; GCA: 0.0195 ± 0.0069; UAG: 0.0127 ± 0.0058; UCG: 0.0063 ± 0.0032 |
5 | The spaced bi-gram factor | 18 | – |
6 | The potential base-pairing factor | 3 | G-C: 0.0225 ± 0.0079 |
7 | The asymmetry of direct-complementary triplets | 3 | ADCT1: 0.0380 ± 0.0096 |
8 | The nucleotide proportional factor | 12 | – |
9 | The potential single-stranded factor | 3 | – |
10 | The sequence-specific score | 1 | Sequence-specific score: 0.0089 ± 0.0049 |
11 | The segmental factor | 40 | Normalized Seg5: 0.0069 ± 0.0044 |
12 | The sequence moment | 15 | η2(C): 0.0142 ± 0.0060 |
13 | The spectral properties | 20 | PC: 0.0587 ± 0.0107 |
14 | The wavelet features | 20 | q2(A): 0.0191 ± 0.0061; q2(G): 0.0123 ± 0.0050; q3(U): 0.0162 ± 0.0056; q3(ACGU): 0.0198 ± 0.0068 |
15 | The 2D-dynamic representation | 19 | μ23: 0.0093 ± 0.0034 |
16 | The protein features | 375 | RF1-P10: 0.0150 ± 0.0059; RF1-V12: 0.0277 ± 0.0082; RF1-Z2: 0.0164 ± 0.0056; RF2-C1: 0.0138 ± 0.0063; RF2-S12: 0.0179 ± 0.0069; RF2-S5: 0.0135 ± 0.0057; RF2-S8: 0.0317 ± 0.0077; RF2-Z12: 0.0350 ± 0.0095; RF3-C10: 0.0065 ± 0.0037; RF3-H1: 0.0170 ± 0.0053; RF3-H20: 0.0253 ± 0.0077; RF3-H7: 0.0531 ± 0.0098; RF3-P15: 0.0284 ± 0.0076; RF3-P18: 0.0477 ± 0.0082; RF3-S14: 0.0329 ± 0.0086; RF3-S7: 0.0275 ± 0.0081; RF3-S9: 0.0174 ± 0.0062; RF3-V12: 0.0396 ± 0.0091; RF3-V16: 0.0121 ± 0.0046 |
17 | The co-occurrence factor | 10 | – |
18 | The 2D graphical representation | 36 | MM-10: 0.0023 ± 0.0022 |
19 | The dinucleotide factor | 32 | d1(C, U): 0.0118 ± 0.0057; d2(A, G): 0.0078 ± 0.0037; d2(A, U): 0.0102 ± 0.0050; d2(C, A): 0.0161 ± 0.0065; d2(U, G): 0.0203 ± 0.0073 |
20 | The wavelet encoding of the 2D graphical representation | 24 | w4(ACUG): 0.0189 ± 0.0060; w3(AGCU): 0.0141 ± 0.0049; w2(AUCG): 0.0142 ± 0.0055; w3(AUGC): 0.0111 ± 0.0050 |
21 | The sequence length | 1 | – |
Total | | 742 | 50 |
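The table does not state how the mutual information values or their standard errors were obtained. As a rough illustration only, the sketch below assumes that each feature is scored against the class label with a plug-in (histogram) estimator in nats, that continuous features are first discretized into equal-width bins, and that the standard error comes from a nonparametric bootstrap over samples. The function names `mutual_information`, `discretize` and `bootstrap_mi`, the bin count and the number of bootstrap rounds are all hypothetical choices, not the authors' procedure.

```python
import math
import random
from collections import Counter
from statistics import stdev


def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=10):
    """Equal-width binning of a continuous feature (hypothetical choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def bootstrap_mi(xs, ys, n_boot=200, seed=0):
    """Return (MI on the full data, bootstrap standard error of the MI)."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Resample (feature, label) pairs with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(mutual_information([xs[i] for i in idx],
                                            [ys[i] for i in idx]))
    return mutual_information(xs, ys), stdev(estimates)


if __name__ == "__main__":
    feature = [0.12, 0.95, 0.20, 0.88, 0.15, 0.91, 0.40, 0.60]
    labels = [0, 1, 0, 1, 0, 1, 0, 1]
    mi, se = bootstrap_mi(discretize(feature, bins=4), labels)
    print(f"MI: {mi:.4f} ± {se:.4f}")
```

Ranking the features of each set by the resulting MI value, and keeping the highest-scoring ones, would give entries in the same "name: MI ± SE" form as the last column; plug-in estimates are biased upward for small samples and many bins, so the bin count matters.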