Table 3.
Protein region | Permutation class | Number of sequences in the class | Total occurrences of the class | Average occurrences per sequence in the class | Variance | Sequence | Occurrences of the sequence | % of class total | Z score |
---|---|---|---|---|---|---|---|---|---|
Under-represented | | | | | | | | | |
DM | LLLLS | 5 | 1103203 | 220641 | 176512 | SLLLL | 129826 | 11.8 | −216 |
DM | AAELR | 60 | 7061680 | 117695 | 115733 | LEARA | 46773 | 0.7 | −208 |
DM | AELLR | 60 | 5987177 | 99786 | 98123 | LELRA | 35795 | 0.6 | −204 |
DM | AAAAL | 5 | 2256485 | 451297 | 361038 | LAAAA | 335710 | 14.9 | −192 |
DM | GGGGI | 5 | 442963 | 88593 | 70874 | GGGGI | 41341 | 9.3 | −177 |
DM | AAAGL | 20 | 4815784 | 240789 | 228750 | LGAAA | 156132 | 3.2 | −177 |
DM | AEKLL | 60 | 4064306 | 67738 | 66609 | LKLEA | 24337 | 0.6 | −168 |
DM | ELLRR | 30 | 2152724 | 71757 | 69366 | RLELR | 29344 | 1.4 | −161 |
DM | AAALR | 20 | 4377052 | 218853 | 207910 | LRAAA | 145448 | 3.3 | −161 |
DM | EEKLL | 30 | 1999573 | 66652 | 64431 | ELKLE | 26021 | 1.3 | −160 |
Over-represented | | | | | | | | | |
DM | EGHKT | 120 | 827985 | 6900 | 6842 | HTGEK | 413267 | 49.9 | 4913 |
DM | FMNSW | 120 | 268723 | 2239 | 2221 | NMSFW | 198813 | 74.0 | 4171 |
DM | PTVWY | 120 | 351052 | 2925 | 2901 | WTVYP | 209428 | 59.7 | 3834 |
DM | GKLST | 120 | 3009236 | 25077 | 24868 | GKSTL | 595718 | 19.8 | 3619 |
DM | EGKPY | 120 | 911766 | 7598 | 7535 | GEKPY | 318604 | 34.9 | 3583 |
DM | EGKPT | 120 | 1461844 | 12182 | 12081 | TGEKP | 389313 | 26.6 | 3431 |
DM | GTVWY | 120 | 445075 | 3709 | 3678 | GWTVY | 210756 | 47.4 | 3414 |
DM | EGMWY | 120 | 217167 | 1810 | 1795 | WMGYE | 145689 | 67.1 | 3396 |
DM | FMNPR | 120 | 339635 | 2830 | 2807 | FPRMN | 177146 | 52.2 | 3290 |
DM | FLMSW | 120 | 362542 | 3021 | 2996 | MSFWL | 182120 | 50.2 | 3272 |
Under-represented | | | | | | | | | |
ND | GGPPP | 10 | 586979 | 58698 | 52828 | GPGPP | 10793 | 1.8 | −208 |
ND | AAAPP | 10 | 1239944 | 123994 | 111595 | APPAA | 71014 | 5.7 | −159 |
ND | AGGGG | 5 | 839291 | 167858 | 134287 | GAGGG | 114103 | 13.6 | −147 |
ND | DDSSS | 10 | 606514 | 60651 | 54586 | SDDSS | 26740 | 4.4 | −145 |
ND | GPPPP | 5 | 388310 | 77662 | 62130 | PGPPP | 42721 | 11.0 | −140 |
ND | DDDSS | 10 | 487092 | 48709 | 43838 | DDDSS | 19800 | 4.1 | −138 |
ND | GGGRR | 10 | 580934 | 58093 | 52284 | GRRGG | 26697 | 4.6 | −137 |
ND | AAGGG | 10 | 948949 | 94895 | 85405 | AGGGA | 58015 | 6.1 | −126 |
ND | RRRSS | 10 | 510453 | 51045 | 45941 | RSSRR | 24251 | 4.8 | −125 |
ND | RRSSS | 10 | 563675 | 56368 | 50731 | SRSSR | 29134 | 5.2 | −121 |
Over-represented | | | | | | | | | |
ND | DHKPW | 120 | 141538 | 1179 | 1170 | HPDKW | 122449 | 86.5 | 3546 |
ND | DKPTW | 120 | 168175 | 1401 | 1390 | PDKWT | 121326 | 72.1 | 3217 |
ND | KQTVW | 120 | 156893 | 1307 | 1297 | KWTVQ | 116218 | 74.1 | 3191 |
ND | ILQTW | 120 | 171694 | 1431 | 1419 | QITLW | 121321 | 70.7 | 3183 |
ND | DGKMP | 120 | 207222 | 1727 | 1712 | KPGMD | 129015 | 62.3 | 3076 |
ND | DKTVW | 120 | 175850 | 1465 | 1453 | DKWTV | 117182 | 66.6 | 3036 |
ND | GKLMP | 120 | 249781 | 2082 | 2064 | LKPGM | 128663 | 51.5 | 2786 |
ND | ILPQT | 120 | 431365 | 3595 | 3565 | PQITL | 122621 | 28.4 | 1994 |
ND | FIPPS | 60 | 241704 | 4028 | 3961 | FPISP | 110649 | 45.8 | 1694 |
ND | EIPST | 120 | 536102 | 4468 | 4430 | SPIET | 106093 | 19.8 | 1527 |
Under-represented | | | | | | | | | |
NN | AGGGG | 5 | 395836 | 79167 | 63334 | AGGGG | 52481 | 13.3 | −106 |
NN | GGGGN | 5 | 157711 | 31542 | 25234 | NGGGG | 17608 | 11.2 | −88 |
NN | AAAPP | 10 | 469392 | 46939 | 42245 | APPAA | 29107 | 6.2 | −87 |
NN | GGPPP | 10 | 100799 | 10080 | 9072 | GPGPP | 2337 | 2.3 | −81 |
NN | AAGGG | 10 | 484217 | 48422 | 43580 | AGGGA | 32303 | 6.7 | −77 |
NN | GGGNN | 10 | 126506 | 12651 | 11386 | GGGNN | 4638 | 3.7 | −75 |
NN | GGGGT | 5 | 151280 | 30256 | 24205 | GTGGG | 18821 | 12.4 | −74 |
NN | LLQQQ | 10 | 160339 | 16034 | 14431 | QQLQL | 8129 | 5.1 | −66 |
NN | DDSSS | 10 | 179338 | 17934 | 16140 | SDDSS | 9821 | 5.5 | −64 |
NN | DDDSS | 10 | 144807 | 14481 | 13033 | DDSSD | 7326 | 5.1 | −63 |
Over-represented | | | | | | | | | |
NN | CEFHK | 120 | 41470 | 346 | 343 | KHCFE | 26831 | 64.7 | 1431 |
NN | CEFHV | 120 | 44399 | 370 | 367 | HCFEV | 27459 | 61.8 | 1414 |
NN | CFHKS | 120 | 45270 | 377 | 374 | SKHCF | 26211 | 57.9 | 1336 |
NN | DESTV | 120 | 348198 | 2902 | 2877 | TDEVS | 48183 | 13.8 | 844 |
NN | CEFVV | 60 | 44179 | 736 | 724 | CFEVV | 22635 | 51.2 | 814 |
NN | HKSSV | 60 | 89154 | 1486 | 1461 | VSSKH | 31133 | 34.9 | 776 |
NN | DEFVV | 60 | 148928 | 2482 | 2441 | FEVVD | 37821 | 25.4 | 715 |
NN | CHKSS | 60 | 34640 | 577 | 568 | SSKHC | 16273 | 47.0 | 659 |
NN | DDERT | 60 | 122092 | 2035 | 2001 | DRTDE | 30087 | 24.6 | 627 |
NN | DERTV | 120 | 255182 | 2127 | 2109 | RTDEV | 30351 | 11.9 | 615 |
The ten most under-represented and ten most over-represented five-residue peptides in each of the DM, ND and NN protein regions, ranked by Z score.
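The table does not spell out how its derived columns are obtained. The sketch below is an assumption on our part: a binomial null model in which every distinct ordering of a permutation class is equally likely, so that with m orderings and N total occurrences the expected count is N/m, the variance is N(1/m)(1 − 1/m), and Z = (observed − N/m)/√variance. The class size m is the number of distinct orderings of the five residues, 5!/∏k!. Under this model the computed values reproduce the tabulated ones after rounding. The function names `class_size` and `row_stats` are hypothetical helpers, not taken from the source.

```python
from collections import Counter
from math import factorial, sqrt


def class_size(residues: str) -> int:
    """Number of distinct orderings of a multiset of residues: n! / prod(k!)."""
    size = factorial(len(residues))
    for count in Counter(residues).values():
        size //= factorial(count)
    return size


def row_stats(perm_class: str, total: int, observed: int):
    """Recompute the derived columns of one table row.

    Hypothetical helper. Assumes a binomial null in which each of the m
    orderings of the class is equally likely (p = 1/m).
    """
    m = class_size(perm_class)
    p = 1 / m
    mean = total * p                        # expected occurrences per sequence
    variance = total * p * (1 - p)          # binomial variance of that count
    pct = 100 * observed / total            # share of the class total, in %
    z = (observed - mean) / sqrt(variance)  # standardized deviation
    return m, mean, variance, pct, z


if __name__ == "__main__":
    # Three rows copied from the table: (class, total, observed sequence count)
    for row in [("LLLLS", 1103203, 129826),
                ("EGHKT", 827985, 413267),
                ("GGPPP", 586979, 10793)]:
        m, mean, var, pct, z = row_stats(*row)
        print(row[0], m, round(mean), round(var), round(pct, 1), round(z))
```

For the first DM row, `row_stats("LLLLS", 1103203, 129826)` gives m = 5, a mean of about 220641, a variance of about 176512, 11.8 % and Z ≈ −216, matching the table. Because the variance uses 1 − 1/m, classes with few orderings (m = 5 or 10) have a variance noticeably below the mean, whereas for m = 120 the two are nearly equal.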