2018 Oct 11;8:15178. doi: 10.1038/s41598-018-33433-8

Table 3.

Summary of the top outlier pentapeptides.

| Protein region | Permutation class | Number of sequences in the class | Total occurrences of the class | Average occurrences of the class | Variance | Sequence | Occurrences of the sequence | %Total | Z score |
|---|---|---|---|---|---|---|---|---|---|
| Under-represented | | | | | | | | | |
| DM | LLLLS | 5 | 1103203 | 220641 | 176512 | SLLLL | 129826 | 11.8 | −216 |
| DM | AAELR | 60 | 7061680 | 117695 | 115733 | LEARA | 46773 | 0.7 | −208 |
| DM | AELLR | 60 | 5987177 | 99786 | 98123 | LELRA | 35795 | 0.6 | −204 |
| DM | AAAAL | 5 | 2256485 | 451297 | 361038 | LAAAA | 335710 | 14.9 | −192 |
| DM | GGGGI | 5 | 442963 | 88593 | 70874 | GGGGI | 41341 | 9.3 | −177 |
| DM | AAAGL | 20 | 4815784 | 240789 | 228750 | LGAAA | 156132 | 3.2 | −177 |
| DM | AEKLL | 60 | 4064306 | 67738 | 66609 | LKLEA | 24337 | 0.6 | −168 |
| DM | ELLRR | 30 | 2152724 | 71757 | 69366 | RLELR | 29344 | 1.4 | −161 |
| DM | AAALR | 20 | 4377052 | 218853 | 207910 | LRAAA | 145448 | 3.3 | −161 |
| DM | EEKLL | 30 | 1999573 | 66652 | 64431 | ELKLE | 26021 | 1.3 | −160 |
| Over-represented | | | | | | | | | |
| DM | EGHKT | 120 | 827985 | 6900 | 6842 | HTGEK | 413267 | 49.9 | 4913 |
| DM | FMNSW | 120 | 268723 | 2239 | 2221 | NMSFW | 198813 | 74.0 | 4171 |
| DM | PTVWY | 120 | 351052 | 2925 | 2901 | WTVYP | 209428 | 59.7 | 3834 |
| DM | GKLST | 120 | 3009236 | 25077 | 24868 | GKSTL | 595718 | 19.8 | 3619 |
| DM | EGKPY | 120 | 911766 | 7598 | 7535 | GEKPY | 318604 | 34.9 | 3583 |
| DM | EGKPT | 120 | 1461844 | 12182 | 12081 | TGEKP | 389313 | 26.6 | 3431 |
| DM | GTVWY | 120 | 445075 | 3709 | 3678 | GWTVY | 210756 | 47.4 | 3414 |
| DM | EGMWY | 120 | 217167 | 1810 | 1795 | WMGYE | 145689 | 67.1 | 3396 |
| DM | FMNPR | 120 | 339635 | 2830 | 2807 | FPRMN | 177146 | 52.2 | 3290 |
| DM | FLMSW | 120 | 362542 | 3021 | 2996 | MSFWL | 182120 | 50.2 | 3272 |
| Under-represented | | | | | | | | | |
| ND | GGPPP | 10 | 586979 | 58698 | 52828 | GPGPP | 10793 | 1.8 | −208 |
| ND | AAAPP | 10 | 1239944 | 123994 | 111595 | APPAA | 71014 | 5.7 | −159 |
| ND | AGGGG | 5 | 839291 | 167858 | 134287 | GAGGG | 114103 | 13.6 | −147 |
| ND | DDSSS | 10 | 606514 | 60651 | 54586 | SDDSS | 26740 | 4.4 | −145 |
| ND | GPPPP | 5 | 388310 | 77662 | 62130 | PGPPP | 42721 | 11.0 | −140 |
| ND | DDDSS | 10 | 487092 | 48709 | 43838 | DDDSS | 19800 | 4.1 | −138 |
| ND | GGGRR | 10 | 580934 | 58093 | 52284 | GRRGG | 26697 | 4.6 | −137 |
| ND | AAGGG | 10 | 948949 | 94895 | 85405 | AGGGA | 58015 | 6.1 | −126 |
| ND | RRRSS | 10 | 510453 | 51045 | 45941 | RSSRR | 24251 | 4.8 | −125 |
| ND | RRSSS | 10 | 563675 | 56368 | 50731 | SRSSR | 29134 | 5.2 | −121 |
| Over-represented | | | | | | | | | |
| ND | DHKPW | 120 | 141538 | 1179 | 1170 | HPDKW | 122449 | 86.5 | 3546 |
| ND | DKPTW | 120 | 168175 | 1401 | 1390 | PDKWT | 121326 | 72.1 | 3217 |
| ND | KQTVW | 120 | 156893 | 1307 | 1297 | KWTVQ | 116218 | 74.1 | 3191 |
| ND | ILQTW | 120 | 171694 | 1431 | 1419 | QITLW | 121321 | 70.7 | 3183 |
| ND | DGKMP | 120 | 207222 | 1727 | 1712 | KPGMD | 129015 | 62.3 | 3076 |
| ND | DKTVW | 120 | 175850 | 1465 | 1453 | DKWTV | 117182 | 66.6 | 3036 |
| ND | GKLMP | 120 | 249781 | 2082 | 2064 | LKPGM | 128663 | 51.5 | 2786 |
| ND | ILPQT | 120 | 431365 | 3595 | 3565 | PQITL | 122621 | 28.4 | 1994 |
| ND | FIPPS | 60 | 241704 | 4028 | 3961 | FPISP | 110649 | 45.8 | 1694 |
| ND | EIPST | 120 | 536102 | 4468 | 4430 | SPIET | 106093 | 19.8 | 1527 |
| Under-represented | | | | | | | | | |
| NN | AGGGG | 5 | 395836 | 79167 | 63334 | AGGGG | 52481 | 13.3 | −106 |
| NN | GGGGN | 5 | 157711 | 31542 | 25234 | NGGGG | 17608 | 11.2 | −88 |
| NN | AAAPP | 10 | 469392 | 46939 | 42245 | APPAA | 29107 | 6.2 | −87 |
| NN | GGPPP | 10 | 100799 | 10080 | 9072 | GPGPP | 2337 | 2.3 | −81 |
| NN | AAGGG | 10 | 484217 | 48422 | 43580 | AGGGA | 32303 | 6.7 | −77 |
| NN | GGGNN | 10 | 126506 | 12651 | 11386 | GGGNN | 4638 | 3.7 | −75 |
| NN | GGGGT | 5 | 151280 | 30256 | 24205 | GTGGG | 18821 | 12.4 | −74 |
| NN | LLQQQ | 10 | 160339 | 16034 | 14431 | QQLQL | 8129 | 5.1 | −66 |
| NN | DDSSS | 10 | 179338 | 17934 | 16140 | SDDSS | 9821 | 5.5 | −64 |
| NN | DDDSS | 10 | 144807 | 14481 | 13033 | DDSSD | 7326 | 5.1 | −63 |
| Over-represented | | | | | | | | | |
| NN | CEFHK | 120 | 41470 | 346 | 343 | KHCFE | 26831 | 64.7 | 1431 |
| NN | CEFHV | 120 | 44399 | 370 | 367 | HCFEV | 27459 | 61.8 | 1414 |
| NN | CFHKS | 120 | 45270 | 377 | 374 | SKHCF | 26211 | 57.9 | 1336 |
| NN | DESTV | 120 | 348198 | 2902 | 2877 | TDEVS | 48183 | 13.8 | 844 |
| NN | CEFVV | 60 | 44179 | 736 | 724 | CFEVV | 22635 | 51.2 | 814 |
| NN | HKSSV | 60 | 89154 | 1486 | 1461 | VSSKH | 31133 | 34.9 | 776 |
| NN | DEFVV | 60 | 148928 | 2482 | 2441 | FEVVD | 37821 | 25.4 | 715 |
| NN | CHKSS | 60 | 34640 | 577 | 568 | SSKHC | 16273 | 47.0 | 659 |
| NN | DDERT | 60 | 122092 | 2035 | 2001 | DRTDE | 30087 | 24.6 | 627 |
| NN | DERTV | 120 | 255182 | 2127 | 2109 | RTDEV | 30351 | 11.9 | 615 |

The ten most under-represented and the ten most over-represented pentapeptides in each of the DM, ND and NN protein regions.
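
The derived columns of the table can be checked against one another. The sketch below is an inference from the tabulated values, not a method stated in this table: it assumes that the number of sequences in a class is the count of distinct orderings of the five residues, and that under the null model every ordering in a class is equally likely, so that a sequence's count follows a binomial distribution with p = 1/(number of sequences in the class). The names `class_size` and `outlier_stats` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from math import factorial, sqrt


def class_size(peptide: str) -> int:
    """Count the distinct orderings of the peptide's residues (multiset permutations)."""
    size = factorial(len(peptide))
    for count in Counter(peptide).values():
        size //= factorial(count)
    return size


def outlier_stats(peptide: str, occurrences: int, class_total: int) -> dict:
    """Recompute the derived columns of one table row.

    Assumption (inferred from the table, not stated in it): each ordering in a
    permutation class is equally likely, giving a binomial mean and variance.
    """
    n = class_size(peptide)
    p = 1 / n
    mean = class_total * p
    variance = class_total * p * (1 - p)
    return {
        "class": "".join(sorted(peptide)),  # class label = residues in alphabetical order
        "class_size": n,
        "mean": mean,
        "variance": variance,
        "percent_total": 100 * occurrences / class_total,
        "z": (occurrences - mean) / sqrt(variance),
    }


# First row of the table: sequence SLLLL in the DM region.
row = outlier_stats("SLLLL", occurrences=129826, class_total=1103203)
print(row)
```

After rounding, this reproduces the tabulated values for SLLLL (class LLLLS, 5 sequences, average 220641, variance 176512, 11.8% of the class total, Z score −216). Other rows can be checked the same way by passing the sequence, its occurrences, and the total occurrences of its class.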