Extended Data Fig. 6. Performance comparison using different evaluation metrics.
Median performance of SATURN and baseline methods on label transfer between frog and zebrafish embryogenesis datasets, evaluated using (a) accuracy, (b) macro-F1-score, (c) macro-precision, and (d) macro-recall. Blue boxplots show zebrafish-to-frog label transfer performance, while orange boxplots show frog-to-zebrafish label transfer performance. Distributions are estimated from n = 30 runs of each method.
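For readers unfamiliar with the macro-averaged metrics named in panels (b) to (d), the following is a minimal sketch of how they are conventionally computed from true and predicted cell-type labels. It is not taken from the SATURN code base. The function name `macro_metrics` and the toy labels are hypothetical, and the sketch assumes the common convention that macro-F1 is the unweighted mean of per-class F1 scores, taken over every class that appears in either the true or the predicted labels.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; assumes macro-F1 is the
    unweighted mean of per-class F1 scores.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Assumption: classes are the union of true and predicted labels.
    labels = set(y_true) | set(y_pred)
    true_count = Counter(y_true)   # support of each class
    pred_count = Counter(y_pred)   # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Undefined ratios (zero denominator) are scored as 0.0.
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        rec = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n_classes,
        "macro_recall": sum(recalls) / n_classes,
        "macro_f1": sum(f1s) / n_classes,
    }


# Toy example with made-up labels (not data from the study).
scores = macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Because every class contributes equally to the macro averages regardless of its size, these metrics are more sensitive to errors on rare cell types than accuracy is, which is one reason to report all four side by side.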