Skip to main content
. 2024 Feb 28;15:1808. doi: 10.1038/s41467-024-46000-9

Fig. 1. SUDO is a framework to evaluate AI systems without ground-truth labels.

Fig. 1

a An AI system is often deployed on data in the wild, which can vary significantly from those in the held-out set (distribution shift), and which can also lack ground-truth labels. b SUDO is a 5-step framework that circumvents the challenges posed by data in the wild. First, deploy an AI system on data in the wild to obtain probability values. Second, discretize those values into quantiles. Third, sample data points from each quantile and pseudo-label (temporarily label) them with a possible class (SUDO Class 0). Sample data points with ground-truth labels from the opposite class to form a classification task. Fourth, train a classifier to distinguish between these data points. Repeat the process with a different pseudo-label (SUDO Class 1). Finally, evaluate and compare the performance of the classifiers on the same held-out set of data with ground-truth labels, deriving the pseudo-label discrepancy. This discrepancy and the relative classifier performance indicate whether the sampled data points are more likely to belong to one class than another.