2021 Mar 10;49(6):3139–3155. doi: 10.1093/nar/gkab139

Table 5.

The most (A) frequent and (B) avoided 5-mers in the human proteome. The peptide search feature of UniProt (https://www.uniprot.org/peptidesearch/) was used to estimate the number of distinct proteins in which each overabundant or avoided motif occurs.

A. Frequent sequences   Proteins with motif (Swiss-Prot)    Proteins with motif (TrEMBL)
CGKSF                   288                                 494
CGKTF                   250                                 397
CGKGF                   161                                 236
CGKAF                   411                                 673
HQRVH                   152                                 216
TGEKP                   525                                 965
YRDVM                   167                                 472
HERTH                   73                                  108
CGKVF                   134                                 198
HKRIH                   114                                 160
GEKPY                   481                                 915
B. Avoided sequences    Proteins with motif (Swiss-Prot)    Proteins with motif (TrEMBL)
LTGEK                   20                                  45
GEKPL                   21                                  61
GEKPS                   16                                  48
EGEKP                   4                                   9
TGEKG                   31                                  100
KGEKP                   15                                  36
GEKPK                   6                                   19
LGKAF                   11                                  18
GEKPT                   6                                   5
HTGEE                   9                                   25
PEKPY                   5                                   4
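
The counts in this table are numbers of distinct proteins, not numbers of motif occurrences, so a protein that contains a motif several times contributes only once. The distinction can be illustrated with a minimal Python sketch. It does not call the UniProt peptide search and is not the authors' procedure; the function names `count_proteins_with_motif` and `motif_table` and the sequences in `toy_proteins` are hypothetical and serve only as an illustration.

```python
from typing import Dict, Iterable


def count_proteins_with_motif(proteins: Dict[str, str], motif: str) -> int:
    """Count distinct proteins whose sequence contains the motif at least once."""
    motif = motif.upper()
    # Each protein contributes at most 1, however often the motif repeats in it.
    return sum(1 for seq in proteins.values() if motif in seq.upper())


def motif_table(proteins: Dict[str, str], motifs: Iterable[str]) -> Dict[str, int]:
    """Map each motif to the number of distinct proteins that contain it."""
    return {m: count_proteins_with_motif(proteins, m) for m in motifs}


# Hypothetical toy sequences (assumption): these are NOT real UniProt entries.
toy_proteins = {
    "toy_1": "MKCGKAFTGEKPYECGKAF",  # contains CGKAF twice, counted once
    "toy_2": "MTGEKPYHQRVH",
    "toy_3": "MAAALLLPPP",
}

counts = motif_table(toy_proteins, ["CGKAF", "TGEKP", "GEKPY", "EGEKP"])
print(counts)
```

On a real proteome the same function could be applied to sequences parsed from a downloaded FASTA file, although the resulting numbers would depend on the database release and need not match the values reported above.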