Fig 2. Separation of sequences by machine learning-based methods.
A) Data points from all three feature sets (sequence only, AAindex, and ESM-1b) are represented as two-dimensional projections of prenylated (red x) and non-prenylated (black dot) sequences. The axes are not shown because they represent linear combinations of all features that maximize variance. B) The two modes of the bimodal distribution of sequences along the X-axis of the AAindex manifold are shown as sequence logos. The mode on the left contains a mix of non-prenylated Cxxx sequences and prenylated, non-canonical sequences, while the mode on the right consists mostly of prenylated, canonical CaaX sequences. C) A similar two-dimensional projection was used to represent cleaved (red x) and shunted (i.e., uncleaved; black dot) sequences.