(A) Amino acid bias at the C-terminus of proteins in our Glycosome Conserved Enzyme Collection (GCEC) training dataset, drawn from Trypanosoma cruzi, Trypanosoma brucei, and Leishmania donovani (L. major homologues of the L. donovani proteins were used in the amino acid bias calculations). The last three amino acids of a protein may harbor a glycosome targeting signal (glycosome PTS1). Positions are numbered from the first amino acid of the PTS1 (position 1) to the final amino acid of the protein (position 3). (B) Frequency distributions of PTS1 scores for GCEC proteins (blue, positive set) and for a dataset of non-glycosomal proteins (red, negative set). Bars beneath the graph indicate the mean and standard deviation of each dataset. Vertical dotted lines mark the cutoff values for pre-selected protein datasets (left) and whole-genome prediction (right).
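The position numbering used in panel A can be illustrated with a minimal Python sketch that tallies raw amino acid frequencies at the last three residues of each protein. This is an illustration only: the function name `cterm_position_frequencies` and the toy sequences are hypothetical, and the sketch reports plain per-position frequencies without any background correction, which may differ from how the bias in the figure was actually calculated.

```python
from collections import Counter


def cterm_position_frequencies(sequences, n=3):
    """Return a list of n dicts; entry i maps amino acid -> frequency
    at position i + 1, where position n is the final residue.

    Hypothetical helper for illustration; not the authors' method.
    """
    counts = [Counter() for _ in range(n)]
    used = 0
    for seq in sequences:
        # Normalise case and drop a trailing stop-codon marker if present.
        seq = seq.strip().upper().rstrip("*")
        if len(seq) < n:
            continue  # too short to contain a C-terminal tripeptide
        for i, aa in enumerate(seq[-n:]):
            counts[i][aa] += 1
        used += 1
    if used == 0:
        raise ValueError("no sequence is long enough")
    return [{aa: c / used for aa, c in pos.items()} for pos in counts]


# Toy sequences (made up for this example, not taken from the GCEC).
toy = ["MKTAYSKL", "MGGLAAKL", "MPLSHL", "MAAASKL"]
freqs = cterm_position_frequencies(toy)
for position, table in enumerate(freqs, start=1):
    print(position, sorted(table.items()))
```

Here `freqs[0]` corresponds to position 1 (the first amino acid of the putative PTS1) and `freqs[2]` to position 3 (the final amino acid of the protein).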