2021 Jul 12;37(Suppl 1):i245–i253. doi: 10.1093/bioinformatics/btab311

Fig. 2.

Encoding module and choice of classifier drive classification performance. Publicly available modules, trained to classify natural images, were used to encode off-the-shelf feature vectors. The exceptions are the gold standard datasets proteins, peptides3 and peptides4, which were obtained using a curated proteomics analysis pipeline. Classification performance, measured by AUC, is reported in order of descending median AUC for different classifiers and two resolutions of MS images (rasterized spectra). Only results obtained using concatenated feature vectors encoded from MS1 and all MS2 images (ms1_and_ms2) are reported here. The main driver of performance is the feature encoding: off-the-shelf features achieve median AUCs ranging from 0.623 to 0.849, while gold standard features reach a median AUC of 0.951. The variance in results across classifiers is much larger for off-the-shelf features than for gold standard features.
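
The summary statistics named in the caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`concat_features`, `auc`, `rank_by_median_auc`) and the toy numbers in the usage note are hypothetical, chosen only to show how MS1 and MS2 feature vectors might be concatenated, how AUC can be computed from labels and scores, and how encoders could be ordered by descending median AUC across classifiers.

```python
from statistics import median


def concat_features(ms1, ms2_list):
    """Concatenate one MS1 feature vector with all MS2 feature vectors.

    Hypothetical helper mirroring the 'ms1_and_ms2' setting in the caption.
    """
    combined = list(ms1)
    for vec in ms2_list:
        combined.extend(vec)
    return combined


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_by_median_auc(results):
    """Order encoders by descending median AUC across classifiers.

    `results` maps an encoder name to a list of AUC values, one per classifier.
    """
    ranked = [(name, median(values)) for name, values in results.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

As a usage illustration with made-up values, `rank_by_median_auc({"encoder_a": [0.6, 0.7, 0.8], "encoder_b": [0.9, 0.95, 0.97]})` places `encoder_b` first, since its median across classifiers is higher. The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would scale better.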