Table 2.
| PCA dimension | PCA-derived motif | Most frequent match | Frequency ranking |
|---|---|---|---|
| 1 | ZMHKKRHZ | ZMHKKRHZ | 1 |
| 2 | ZEYGEQZ | ZEYGEQZ | 2 |
| 3 | ZRYGTZ | ZRYGTZ | 3 |
| 4 | ZGERQZ | ZGERQZ | 4 |
|  | ZGEZ |  |  |
| 5 | ZGVYGGFZ | ZGVYGGFZ | 7 |
|  | ZGVYZ | ZGVYZ | 8 |
| 6 | ZAKERHZ | ZAKERHZ | 5 |
|  | GVY | ZGVYZ | 8 |
|  | ZE(V/K/R)XZ | ZEKERHZ | 108 |
| 7 | ZAKEXH | ZAKERHZ | 5 |
|  | Z(A/G)YVZ | ZGYVZ | 6 |
| 8 | ZEEVHZ | ZEEVHZ | 9 |
| 9 | Z(A/W)(E/Y)EHRZ | ZWEGRQZ | 10 |
|  | ZWEGR(Q)Z | ZAYEHRZ | 12 |
|  | VEGRQ |  |  |
| 10 | VXZ |  |  |
|  | Z(A/G)Y(Y/E)HRZ | ZGVYZ | 8 |
|  | ZGY(Y)Z | ZAYEHRZ | 12 |
|  | ZGAYEHRZ |  |  |
Motifs were reconstructed for the first 10 PCA dimensions and used to search the NGS results for the frequency rank of the most highly enriched matching sequences (highest Z-scores). Because the second-round post-selection library was significantly smaller and heavily biased, the PCA-derived motifs correlate very well with NGS frequency. Some sequence variation and small motifs can still be seen.
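The motif lookup described above can be illustrated with a minimal sketch. It is not the authors' search procedure: it assumes that `X` stands for any residue, that `(A/G)` lists alternatives at one position, that a single residue in parentheses such as `(Q)` is optional, and that `Z` is matched literally. The helper names `motif_to_regex` and `best_match` are hypothetical, and the ranked list holds only the sequences and ranks shown in Table 2.

```python
import re

# Sequences and frequency ranks as listed in Table 2
# (ranks that do not appear in the table are omitted).
RANKED = [
    ("ZMHKKRHZ", 1), ("ZEYGEQZ", 2), ("ZRYGTZ", 3), ("ZGERQZ", 4),
    ("ZAKERHZ", 5), ("ZGYVZ", 6), ("ZGVYGGFZ", 7), ("ZGVYZ", 8),
    ("ZEEVHZ", 9), ("ZWEGRQZ", 10), ("ZAYEHRZ", 12), ("ZEKERHZ", 108),
]


def motif_to_regex(motif):
    """Translate the table's motif notation into a compiled regex.

    Assumed notation: X = any residue, (A/G) = alternatives,
    (Q) = optional residue, every other character is literal.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            end = motif.index(")", i)
            options = motif[i + 1:end].split("/")
            if len(options) == 1:
                # Single residue in parentheses: treated as optional.
                parts.append(re.escape(options[0]) + "?")
            else:
                # Several residues: any one of them at this position.
                parts.append("[" + "".join(options) + "]")
            i = end + 1
        elif ch == "X":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def best_match(motif, ranked, anchored=True):
    """Return (sequence, rank) of the best-ranked sequence matching the motif.

    anchored=True requires the motif to cover the whole sequence;
    anchored=False accepts the motif anywhere inside the sequence.
    Returns None when nothing matches.
    """
    pattern = motif_to_regex(motif)
    test = pattern.fullmatch if anchored else pattern.search
    for seq, rank in sorted(ranked, key=lambda item: item[1]):
        if test(seq):
            return seq, rank
    return None


print(best_match("Z(A/G)YVZ", RANKED))               # ('ZGYVZ', 6)
print(best_match("ZWEGR(Q)Z", RANKED))               # ('ZWEGRQZ', 10)
print(best_match("ZAKEXH", RANKED, anchored=False))  # ('ZAKERHZ', 5)
print(best_match("VXZ", RANKED))                     # None
```

Strict pattern matching of this kind does not reproduce every row of Table 2 (for example, `ZE(V/K/R)XZ` is paired with ZEKERHZ there), so the matching behind the table was presumably more tolerant than this sketch.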