. 2021 Apr 28;11:9134. doi: 10.1038/s41598-021-88708-4

Table 2.

Comparison of PCA-derived enriched sequences and NGS read frequency for the 2nd round library.

PCA dimension  PCA-derived motif  Most frequent match  Frequency ranking
1              ZMHKKRHZ           ZMHKKRHZ             1
2              ZEYGEQZ            ZEYGEQZ              2
3              ZRYGTZ             ZRYGTZ               3
4              ZGERQZ             ZGERQZ               4
               ZGEZ
5              ZGVYGGFZ           ZGVYGGFZ             7
               ZGVYZ              ZGVYZ                8
6              ZAKERHZ            ZAKERHZ              5
               GVY                ZGVYZ                8
               ZE(V/K/R)XZ        ZEKERHZ              108
7              ZAKEXH             ZAKERHZ              5
               Z(A/G)YVZ          ZGYVZ                6
8              ZEEVHZ             ZEEVHZ               9
9              Z(A/W)(E/Y)EHRZ    ZWEGRQZ              10
               ZWEGR(Q)Z          ZAYEHRZ              12
               VEGRQ
10             VXZ
               Z(A/G)Y(Y/E)HRZ    ZGVYZ                8
               ZGY(Y)Z            ZAYEHRZ              12
               ZGAYEHRZ

Motifs were reconstructed for the first 10 PCA dimensions, and each motif was used to search the NGS results for the frequency ranking of the most highly enriched matching sequences (highest Z-scores). Because the second-round post-selection library was significantly smaller and heavily biased, the PCA-derived motifs agree very well with the NGS read frequencies: the leading motifs of dimensions 1 to 4 correspond to the four most frequent sequences, in order. Some sequence variation and small motifs can still be seen.
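
The motif lookup described in this note can be illustrated with a minimal Python sketch. It is not the authors' procedure. The notation rules are assumptions made for illustration only: `(A/G)` is read as a choice between residues, a single residue in parentheses such as `(Q)` as optional, `X` as one or more arbitrary non-Z residues, and every other letter (including Z) as a literal. The helper names `motif_to_regex` and `best_match` are hypothetical; the sequence rankings are copied from the table above.

```python
import re

# Frequency rankings of the matched sequences, copied from Table 2.
RANKS = {
    "ZMHKKRHZ": 1, "ZEYGEQZ": 2, "ZRYGTZ": 3, "ZGERQZ": 4,
    "ZAKERHZ": 5, "ZGYVZ": 6, "ZGVYGGFZ": 7, "ZGVYZ": 8,
    "ZEEVHZ": 9, "ZWEGRQZ": 10, "ZAYEHRZ": 12, "ZEKERHZ": 108,
}


def motif_to_regex(motif):
    """Translate a motif string into a compiled regex (assumed notation)."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            options = motif[i + 1:close].split("/")
            if len(options) == 1:
                # Assumption: a lone residue in parentheses is optional.
                parts.append(re.escape(options[0]) + "?")
            else:
                # Assumption: slash-separated residues are alternatives.
                parts.append("[" + "".join(re.escape(o) for o in options) + "]")
            i = close + 1
        elif ch == "X":
            # Assumption: X stands for one or more non-Z residues.
            parts.append("[^Z]+")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def best_match(motif, ranks):
    """Return (sequence, rank) of the best-ranked sequence containing the motif."""
    pattern = motif_to_regex(motif)
    hits = [(rank, seq) for seq, rank in ranks.items() if pattern.search(seq)]
    if not hits:
        return None
    rank, seq = min(hits)
    return seq, rank


if __name__ == "__main__":
    for motif in ("ZE(V/K/R)XZ", "Z(A/G)YVZ", "ZAKEXH"):
        print(motif, best_match(motif, RANKS))
```

Because the motif is searched as a substring and the best-ranked hit is returned, short motifs can match several sequences; under these assumptions the result for such motifs need not coincide with the match listed in the table.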