Supplementary Table 3. The 50 "2 of 3" consensus 5-residue (10-letter alphabet) exceptional words that are most abundant in PfamAB. The 5-residue words shown use the 20-to-10-letter mapping: LMIV to L; C; A; G; ST to S; P; FYW to F; EDNQ to E; KR to K; H. Shown are the word, its clump frequency in PfamAB (PfamAB_clumps), its log2(over-representation) compared to the average of the three models (uniform random, shuffled, win10-shuffled) (ave_log2_odds), the word's rank (out of 100,000) by clump count (clump_rank), the word's rank based on log2(over-representation) (log2_rank), the number of times the word is present in our non-redundant CATH set (n_cath), and the fraction of the words in alpha-helix (f_helix) and beta-sheet (f_sheet) from the CATH structure.

word   PfamAB_clumps  ave_log2_odds  clump_rank  log2_rank  n_cath  f_helix    f_sheet
LEELL  14684          0.4662683      100000      87681      128     0.7890625  0.10156250
EELLE  14026          0.4706382      99999.0     87838      128     0.7281250  0.06718750
LLEEL  13866          0.3830681      99998.0     84152      100     0.7240000  0.09200000
ELEEL  13168          0.3916396      99997.0     84561      100     0.6441352  0.10139170
LEELE  13150          0.3886190      99996.0     84421      105     0.5875706  0.12052730
ELLEE  12942          0.3544825      99995.0     82710      104     0.7461538  0.07884615
SLEEL  8820           0.5978107      99976.0     91501      71      0.6901408  0.09859155
LEELK  8732           0.6720665      99974.0     93114      72      0.7194444  0.08055556
LKELL  8370           0.5412230      99973.0     90010      83      0.7469880  0.07951807
ELLKE  8329           0.6081751      99972.0     91754      80      0.7750000  0.05000000
LLEKL  8248           0.5202856      99971.0     89435      90      0.8111111  0.08666667
EELLK  8088           0.5689516      99969.0     90754      72      0.8611111  0.02222222
ELEKL  8059           0.5558008      99968.0     90404      72      0.7166667  0.06666667
LEKLL  7950           0.4658275      99967.0     87658      72      0.8351648  0.05494505
EKLLE  7870           0.5282775      99966.0     89653      69      0.8280802  0.05730659
LLKEL  7862           0.4501884      99965.0     87064      74      0.6837838  0.08918919
EELKE  7837           0.6114338      99964.0     91827      63      0.6698413  0.06349206
ELLEK  7771           0.5080011      99962.0     89052      63      0.6952381  0.15238100
EELEK  7683           0.5810580      99959.0     91089      63      0.7650794  0.00952381
KELLE  7601           0.4792067      99956.0     88110      64      0.6687500  0.12812500
KLEEL  7549           0.4648466      99955.0     87614      61      0.7213115  0.09508197
KLLEE  7505           0.4614568      99954.0     87476      67      0.7552239  0.05074627
LKELE  7432           0.4416805      99953.0     86741      62      0.4677419  0.14838710
LSEEE  7411           0.4482957      99952.0     87003      62      0.4193548  0.05161290
ELKEL  7287           0.4218724      99950.0     85938      60      0.5533333  0.13000000
LEELS  7277           0.3212178      99949.0     80803      50      0.4800000  0.11200000
LEKLE  7181           0.3993460      99948.0     84938      73      0.5896739  0.19565220
LKEEL  7122           0.3808843      99944.0     84027      55      0.6327273  0.10181820
SEEEL  7090           0.3857497      99942.0     84291      46      0.5739130  0.04782609
KELEE  6932           0.4357297      99939.0     86505      47      0.6000000  0.06808510
EKLEE  6836           0.4148833      99935.0     85615      62      0.7388535  0.05095541
LEEFL  5358           0.4794099      99870.0     88121      46      0.7913043  0.06956522
FEELL  5316           0.4661815      99865.0     87676      54      0.6814815  0.11851850
FLEEL  5225           0.4424333      99858.0     86770      51      0.6509804  0.12941180
EELKK  5191           0.8802133      99854.0     95854      40      0.8550000  0.00500000
EFLEE  5069           0.4866112      99841.0     88379      53      0.6113208  0.06792453
LEKLK  5020           0.7427142      99835.0     94244      51      0.7686275  0.07450980
EKLKE  5011           0.8288539      99834.0     95348      50      0.7640000  0.04000000
EEFLE  4956           0.4579991      99829.5     87353      44      0.7454545  0.04545455
LFEEL  4917           0.3553043      99826.0     82746      35      0.6857143  0.12571430
LLEAL  4561           0.4614197      99806.0     87474      46      0.8565217  0.03478261
KLKEL  4556           0.6068681      99805.0     91714      36      0.6388889  0.11666670
EFEEL  4540           0.3282595      99804.0     81205      43      0.5767442  0.05116279
ELLKK  4524           0.5961694      99801.5     91460      44      0.7909091  0.07272727
LKKLL  4524           0.5146060      99801.5     89255      42      0.7571429  0.10476190
LEEAL  4465           0.5546654      99796.0     90369      58      0.8931034  0.04137931
KKLLE  4459           0.5808958      99792.5     91085      44      0.7500000  0.09545455
LKEKL  4459           0.5689528      99792.5     90755      45      0.7777778  0.03111111
SLKEL  4415           0.5104032      99790.0     89134      43      0.6558140  0.15348840
ELKKL  4393           0.5515337      99789.0     90287      34      0.7000000  0.09411765
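
The 20-to-10-letter mapping given in the caption can be illustrated with a minimal Python sketch. The names GROUPS, REDUCE, reduce_sequence and reduced_words are hypothetical and introduced only for this illustration. The sketch does not implement clump counting or the three background models, and passing non-standard residues through unchanged is an assumption of the sketch, not something stated in the caption.

```python
# Reduced alphabet taken from the caption: 20 amino acids -> 10 letters.
GROUPS = {
    "L": "LMIV",
    "C": "C",
    "A": "A",
    "G": "G",
    "S": "ST",
    "P": "P",
    "F": "FYW",
    "E": "EDNQ",
    "K": "KR",
    "H": "H",
}

# Invert the groups into a per-residue lookup table.
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map a 20-letter protein sequence onto the 10-letter alphabet.

    Assumption of this sketch: residues outside the 20 standard
    amino acids (e.g. X) are passed through unchanged.
    """
    return "".join(REDUCE.get(aa, aa) for aa in seq.upper())


def reduced_words(seq, k=5):
    """Yield every overlapping k-residue word of the reduced sequence."""
    reduced = reduce_sequence(seq)
    for i in range(len(reduced) - k + 1):
        yield reduced[i:i + k]
```

For example, reduce_sequence("MDQIV") gives "LEELL", the first word in the table. With ten letters there are 10^5 = 100,000 possible 5-residue words, which matches the "out of 100,000" range of the two rank columns.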