Author manuscript; available in PMC: 2024 Aug 12.
Published in final edited form as: Nat Methods. 2024 Feb 12;21(2):182–194. doi: 10.1038/s41592-023-02150-0

Figure 1:

Examples of metric-related pitfalls in image analysis validation. (A) Medical image analysis example: Voxel-based metrics are not appropriate for detection problems. Measuring the voxel-level performance of a prediction yields a near-perfect Sensitivity. However, the Sensitivity at the instance level reveals that the algorithm actually misses lesions. (B) Biological image analysis example: The task of predicting fibrillarin in the dense fibrillar component of the nucleolus should be phrased as a segmentation task, for which segmentation metrics reveal the low quality of the prediction. Phrasing the task as image reconstruction instead and validating it with metrics such as the Pearson Correlation Coefficient yields misleadingly high metric scores [4, 26, 29, 36].
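The pitfall in panel (A) can be illustrated with a small, purely synthetic sketch. It is not taken from the paper: the 16 × 16 grid, the lesion positions, the helper names (`label_components`, `voxel_sensitivity`, `instance_sensitivity`) and the hit criterion (a lesion counts as detected if at least half of its voxels are predicted) are all assumptions chosen for illustration. One large lesion is segmented perfectly while three single-voxel lesions are missed entirely.

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected foreground components as sets of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            component, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(component)
    return components


def voxel_sensitivity(truth, pred):
    """TP / (TP + FN), counted over individual voxels."""
    tp = fn = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                tp += 1
            elif t:
                fn += 1
    return tp / (tp + fn)


def instance_sensitivity(truth, pred, min_overlap=0.5):
    """Fraction of reference lesions detected.

    Assumed hit criterion (hypothetical): a lesion is detected if at least
    `min_overlap` of its voxels are covered by the prediction.
    """
    lesions = label_components(truth)
    detected = 0
    for lesion in lesions:
        covered = sum(1 for r, c in lesion if pred[r][c])
        if covered / len(lesion) >= min_overlap:
            detected += 1
    return detected / len(lesions)


# Synthetic reference: one 10 x 10 lesion plus three single-voxel lesions.
size = 16
truth = [[0] * size for _ in range(size)]
for r in range(10):
    for c in range(10):
        truth[r][c] = 1
small_lesions = [(12, 2), (12, 8), (14, 14)]
for r, c in small_lesions:
    truth[r][c] = 1

# Synthetic prediction: the large lesion is found, the small ones are missed.
pred = [row[:] for row in truth]
for r, c in small_lesions:
    pred[r][c] = 0

voxel = voxel_sensitivity(truth, pred)
instance = instance_sensitivity(truth, pred)
print(f"voxel-level Sensitivity:    {voxel:.3f}")     # 0.971
print(f"instance-level Sensitivity: {instance:.3f}")  # 0.250
```

In this toy case the large lesion contributes 100 of the 103 foreground voxels, so the voxel-level score stays close to 1 even though three of the four lesions go undetected. A different hit criterion (for example a centroid-based or IoU-based match) would change the instance-level count, so the criterion should be reported alongside the score.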