2021 Oct 14;40(3):411–421. doi: 10.1038/s41587-021-01045-9

Extended Data Fig. 9. COSMIC confidence score vs. exact FDR and ratio of annotated compounds.


Independent data (Agilent, QTOF), 20 eV, medium noise, N = 3,013. We vary the confidence score threshold and present the resulting exact FDR (a) and the ratio of annotated compounds (b). Dashed lines indicate COSMIC confidence score thresholds of 0.94, 0.64, 0.34, and 0.14, corresponding to exact FDR levels of roughly 5 %, 10 %, 20 %, and 30 %, respectively. The spike for high thresholds beyond 0.9 is an artifact of the small number of hits that pass these thresholds; with so few hits, a few incorrect hits with high confidence scores can lead to a high FDR. In practice, confidence scores depend on numerous factors, such as the overall quality of the data and the identity of the query compounds. Hence, these thresholds come with no guarantee in either direction: for example, in the CASMI 2016 dataset, a smaller confidence score threshold of 0.53 corresponded to an exact FDR of 10 %, and using the above-mentioned threshold of 0.64 would have returned fewer hits than possible. Nevertheless, these thresholds may serve as a starting point for practitioners.
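The threshold sweep described in this caption can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the caption: the exact FDR at a threshold is taken to be the fraction of incorrect hits among all hits whose score is at least the threshold, and the ratio of annotated compounds is taken to be the number of passing hits divided by the number of queries. The function name `exact_fdr_curve` and the toy scores are hypothetical and are not data from the study.

```python
from typing import List, Optional, Tuple

# Each hit is (confidence_score, is_correct). Toy values for illustration only.
Hit = Tuple[float, bool]


def exact_fdr_curve(
    hits: List[Hit], thresholds: List[float]
) -> List[Tuple[float, Optional[float], float]]:
    """Return (threshold, exact FDR, annotated ratio) for each threshold.

    Assumed definitions (not taken from the caption):
      exact FDR       = incorrect passing hits / passing hits
      annotated ratio = passing hits / all queries
    The FDR is None when no hit passes the threshold.
    """
    total = len(hits)
    rows = []
    for t in thresholds:
        passing = [ok for score, ok in hits if score >= t]
        n_pass = len(passing)
        n_wrong = sum(1 for ok in passing if not ok)
        fdr = n_wrong / n_pass if n_pass else None
        ratio = n_pass / total if total else 0.0
        rows.append((t, fdr, ratio))
    return rows


toy_hits: List[Hit] = [
    (0.95, True),
    (0.92, False),  # one confident but incorrect hit
    (0.70, True),
    (0.60, True),
    (0.40, False),
    (0.10, True),
]

for t, fdr, ratio in exact_fdr_curve(toy_hits, [0.9, 0.64, 0.34, 0.14]):
    fdr_text = "n/a" if fdr is None else f"{fdr:.2f}"
    print(f"threshold={t:.2f}  exact FDR={fdr_text}  annotated ratio={ratio:.2f}")
```

In this toy data only two hits pass the threshold of 0.9, so the single incorrect one already yields an FDR of 0.5. This is the same small-denominator effect the caption gives as the cause of the spike at high thresholds.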
