Unit-normalised Matthews correlation coefficient (MCC) versus F1 score (the harmonic mean of precision and recall). A unit-normalised MCC of 0.5 on the vertical axis corresponds to the performance of a random classifier. The black dot at (1, 1) marks perfect classification (100% true positives and 0% false positives). Lines in MCC-F1 space were drawn by calculating the MCC and F1 values at 100 false-positive rates between 0% and 100%, and the point on each line closest to (1, 1) is indicated with the symbol shown in the legend. This point may not correspond to an acceptably low false-positive rate, but for the classifiers shown here it is indicative of each method's overall performance. In all panels, the condition positive is the AI simulation scenario, and the condition negative varies by panel column (indicated at top). The 'weakly misspecified' row used simulations of Model A1 for training and as the null, and evaluated each method on simulations of Model A2. The 'strongly misspecified' row used simulations of Model A1 for training and as the null, and evaluated each method on simulations of Model B.
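The curve construction described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that classifier scores are oriented so that higher means "more likely positive" and that a sample is called positive when its score strictly exceeds the threshold. It also assumes the common convention that MCC is set to 0 when its denominator vanishes. All function names (`unit_mcc`, `f1`, `threshold_for_fpr`, `mcc_f1_curve`) are hypothetical. MCC is rescaled as (MCC + 1) / 2 so that a random classifier scores 0.5. F1 is written as 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean of precision and recall.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (target FPR, F1, unit-normalised MCC)


def unit_mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """MCC rescaled from [-1, 1] to [0, 1]; a random classifier scores 0.5."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC = 0 when any marginal count is zero.
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return (mcc + 1) / 2


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def threshold_for_fpr(neg_desc: Sequence[float], fpr: float) -> float:
    """Hypothetical helper: threshold t such that calling `score > t` positive
    flags roughly `fpr` of the negatives. `neg_desc` must be sorted descending.
    Ties among negative scores can make the realised FPR lower than the target."""
    k = round(fpr * len(neg_desc))  # number of negatives to call positive
    return -math.inf if k >= len(neg_desc) else neg_desc[k]


def mcc_f1_curve(
    pos: Sequence[float], neg: Sequence[float], n_points: int = 100
) -> Tuple[List[Point], Point]:
    """Evaluate F1 and unit MCC at n_points (>= 2) evenly spaced target FPRs in
    [0, 1], and return the curve plus the point closest to (1, 1)."""
    neg_desc = sorted(neg, reverse=True)
    curve: List[Point] = []
    for i in range(n_points):
        target = i / (n_points - 1)  # evenly spaced FPRs from 0 to 1
        t = threshold_for_fpr(neg_desc, target)
        tp = sum(s > t for s in pos)
        fp = sum(s > t for s in neg)
        fn, tn = len(pos) - tp, len(neg) - fp
        curve.append((target, f1(tp, fp, fn), unit_mcc(tp, fp, fn, tn)))
    # Point closest (Euclidean distance) to perfect classification at (F1, MCC) = (1, 1).
    best = min(curve, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))
    return curve, best


if __name__ == "__main__":
    # Perfectly separable toy scores: the FPR = 0 threshold already reaches (1, 1).
    curve, best = mcc_f1_curve([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])
    print(best)  # (0.0, 1.0, 1.0)
```

Selecting the point closest to (1, 1) ignores the false-positive rate at which it occurs. This is why, as the caption notes, the chosen point may not correspond to an acceptably low false-positive rate.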