Fig. 7. Performance evaluation for the classification of the diffusion model.
a Total accuracy and b expected calibration error (ECE)80,81 (see Supplementary Information for detailed definitions) achieved by Multi-SWAG, plotted for different trajectory lengths T, each averaged over 10^5 trajectories. The ECE describes the difference one may expect between the predicted confidence and the observed accuracy. As before, we achieve a low calibration error, between 0.3% and 0.6%. The classification accuracy improves with trajectory length, reaching results similar to those of the best-scoring models in the AnDi-Challenge62. c Expected calibration error (ECE)80,81 achieved for lower-ranked predictions, meaning those models that were not assigned the highest confidence. A prediction of rank i corresponds to the output with the ith highest confidence. Even these predictions show low calibration errors, below 0.5%. The vanishing ECE for the 4th and lower-ranked predictions of long trajectories is caused by these predictions being correctly assigned a 0% probability.
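To make the rank-wise ECE concrete, the following is a minimal sketch of the commonly used binned ECE estimator, extended to predictions of rank i. It is not necessarily the exact definition given in the Supplementary Information: the function names `expected_calibration_error` and `ranked_ece`, the equal-width binning, and the default of 10 bins are all assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    Assumption: equal-width confidence bins on [0, 1]; the paper's
    supplementary definition may use a different binning.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1.
    idx = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece


def ranked_ece(probs, labels, rank=1, n_bins=10):
    """ECE of the rank-th most confident class (rank=1 is the top prediction)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Sort classes by descending predicted probability for each trajectory.
    order = np.argsort(-probs, axis=1, kind="stable")
    cls = order[:, rank - 1]
    conf = probs[np.arange(len(probs)), cls]
    return expected_calibration_error(conf, cls == labels, n_bins)
```

Under this estimator, a rank whose outputs are all assigned 0% probability and are never the true class contributes zero gap in its only occupied bin, which is consistent with the vanishing ECE described for the 4th and lower-ranked predictions.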