Table 5.
(A) The most frequent and (B) the most avoided 5-mers in the human proteome. The peptide search feature of UniProt (https://www.uniprot.org/peptidesearch/) was used to estimate the number of distinct proteins in which each overabundant or avoided motif occurs.
| (A) Frequent sequence | Number of proteins containing the motif (Swiss-Prot) | Number of proteins containing the motif (TrEMBL) |
|---|---|---|
| CGKSF | 288 | 494 |
| CGKTF | 250 | 397 |
| CGKGF | 161 | 236 |
| CGKAF | 411 | 673 |
| HQRVH | 152 | 216 |
| TGEKP | 525 | 965 |
| YRDVM | 167 | 472 |
| HERTH | 73 | 108 |
| CGKVF | 134 | 198 |
| HKRIH | 114 | 160 |
| GEKPY | 481 | 915 |

| (B) Avoided sequence | Number of proteins containing the motif (Swiss-Prot) | Number of proteins containing the motif (TrEMBL) |
|---|---|---|
| LTGEK | 20 | 45 |
| GEKPL | 21 | 61 |
| GEKPS | 16 | 48 |
| EGEKP | 4 | 9 |
| TGEKG | 31 | 100 |
| KGEKP | 15 | 36 |
| GEKPK | 6 | 19 |
| LGKAF | 11 | 18 |
| GEKPT | 6 | 5 |
| HTGEE | 9 | 25 |
| PEKPY | 5 | 4 |
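The counts above were obtained with the UniProt web peptide search. The following minimal sketch shows the same kind of tally run locally on a proteome in FASTA format. It is an illustration under stated assumptions, not the authors' procedure. The helper names (`parse_fasta`, `proteins_containing`, `kmer_counts`) are hypothetical, matching is an exact substring match on one-letter residue codes, and results may differ from the web tool, for example in isoform handling or database release. The sketch also makes the distinction in the table explicit: a motif that repeats within one protein adds one to the distinct-protein count but several to the total occurrence count.

```python
from collections import Counter


def parse_fasta(text: str) -> dict[str, str]:
    """Hypothetical helper: parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def proteins_containing(motifs, records):
    """Count distinct proteins whose sequence contains each motif at least once."""
    return {m: sum(1 for seq in records.values() if m in seq) for m in motifs}


def kmer_counts(records, k=5):
    """Count every overlapping k-mer across all sequences (total occurrences)."""
    counts = Counter()
    for seq in records.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy input (made-up sequences, not real UniProt entries).
toy = """>P1 toy
MCGKSFTGEKPYHQRVH
>P2 toy
CGKAFTGEKPSTGEKP
>P3 toy
LGKAF
"""
records = parse_fasta(toy)
print(proteins_containing(["CGKSF", "TGEKP", "GEKPY", "LGKAF"], records))
# {'CGKSF': 1, 'TGEKP': 2, 'GEKPY': 1, 'LGKAF': 1}
print(kmer_counts(records)["TGEKP"])
# 3 (TGEKP occurs twice in P2, so occurrences exceed distinct proteins)
```

Using a plain substring test keeps the distinct-protein count simple; ranking motifs as frequent or avoided would additionally require a background model, which the sketch does not attempt.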