eLife. 2018 Feb 9;7:e31486. doi: 10.7554/eLife.31486

Appendix 1—table 7. Sequence similarity comparison.

Frequencies of dipeptides (pairs of neighboring amino acid residues) were computed for phase-separating proteins and for the human proteome. Enrichment of a dipeptide in a given sequence was measured as the percentage of human proteins with a lower frequency of that dipeptide. The table lists the fifteen dipeptides that were enriched (≥99%) in the largest number of sequences within the phase separation test sets, alongside the enrichment values obtained for the phase separation training set and for three experimentally verified proteins. Values in the top fifth percentile are shown in bold.

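Dipeptide enrichment (percentage of human proteome with lower frequency)

| Protein name | GV | VG | VP | PG | FG | RG | GR | GG | YG | GS | SG | GA | GF | GD | DS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training set proteins | | | | | | | | | | | | | | | |
| Elastin | 100 | 100 | 100 | 100 | 97 | 31 | 32 | 99 | 99 | 20 | 20 | 100 | 89 | 38 | 30 |
| Nsp1 | 30 | 34 | 31 | 26 | 100 | 31 | 30 | 75 | 52 | 90 | 38 | 99 | 60 | 66 | 68 |
| TIA1 | 73 | 75 | 46 | 26 | 86 | 31 | 86 | 77 | 99 | 29 | 53 | 26 | 84 | 54 | 30 |
| LAF1 | 30 | 78 | 65 | 29 | 67 | 99 | 99 | 100 | 77 | 88 | 97 | 65 | 78 | 97 | 32 |
| EIF4H | 30 | 65 | 31 | 52 | 98 | 99 | 95 | 99 | 52 | 99 | 42 | 79 | 99 | 98 | 89 |
| Ddx3x | 51 | 70 | 34 | 43 | 89 | 98 | 97 | 96 | 93 | 93 | 95 | 68 | 96 | 59 | 78 |
| hnRNPA1 | 30 | 55 | 31 | 44 | 100 | 99 | 99 | 100 | 99 | 99 | 98 | 44 | 99 | 60 | 79 |
| DDX4 | 33 | 77 | 48 | 53 | 98 | 96 | 91 | 89 | 59 | 87 | 96 | 29 | 98 | 96 | 45 |
| FUS | 30 | 31 | 31 | 83 | 78 | 100 | 99 | 100 | 100 | 98 | 99 | 33 | 93 | 91 | 57 |
| EWS | 52 | 31 | 35 | 97 | 51 | 100 | 99 | 100 | 100 | 72 | 61 | 30 | 97 | 91 | 48 |
| TAF15 | 36 | 38 | 31 | 30 | 53 | 100 | 99 | 100 | 100 | 92 | 99 | 26 | 71 | 100 | 94 |
| Experimentally verified proteins | | | | | | | | | | | | | | | |
| FMR1 | 69 | 89 | 93 | 44 | 43 | 96 | 94 | 83 | 62 | 62 | 34 | 70 | 48 | 41 | 67 |
| SCAF pAP | 75 | 50 | 89 | 91 | 43 | 49 | 73 | 97 | 75 | 92 | 96 | 96 | 40 | 36 | 44 |
| Engrailed-2 | 30 | 31 | 31 | 97 | 70 | 78 | 90 | 100 | 52 | 99 | 91 | 99 | 40 | 95 | 97 |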
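
The enrichment measure described in the legend can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a dipeptide's frequency is its count divided by the number of overlapping neighbor pairs in the sequence, and that "lower frequency" means strictly lower; neither detail is stated in the legend. The function names and the toy sequences are hypothetical and stand in for the human proteome purely for illustration.

```python
from collections import Counter


def dipeptide_frequencies(sequence):
    """Return the frequency of each overlapping dipeptide in a sequence.

    Assumption: frequency = count / (sequence length - 1).
    """
    seq = sequence.upper()
    total = len(seq) - 1  # number of overlapping neighbor pairs
    if total < 1:
        return {}
    counts = Counter(seq[i:i + 2] for i in range(total))
    return {pair: n / total for pair, n in counts.items()}


def enrichment_percentile(query, background, dipeptide):
    """Percentage of background sequences whose frequency of `dipeptide`
    is strictly lower than its frequency in the query sequence."""
    if not background:
        raise ValueError("background must contain at least one sequence")
    query_freq = dipeptide_frequencies(query).get(dipeptide, 0.0)
    lower = sum(
        1
        for seq in background
        if dipeptide_frequencies(seq).get(dipeptide, 0.0) < query_freq
    )
    return 100.0 * lower / len(background)


# Made-up toy sequences (NOT real proteome data), for illustration only.
toy_background = ["MKTLLVAGG", "MSDEEKRRA", "MAGVGVPLK", "GVGVGV"]
toy_query = "GVGVPGVGVPGG"

for pair in ("GV", "VP", "DS"):
    print(pair, enrichment_percentile(toy_query, toy_background, pair))
```

With a real background, each table entry would correspond to one call of `enrichment_percentile` for a given protein and dipeptide, rounded to a whole percentage.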