a, Comparison of the performance of three machine learning models by ROC AUC. Performance is stratified by the three classification labels (ambiguous, fail and somatic) for cross-validation data (n = 27,470 variants). b, Reliability diagrams showing how well model outputs are calibrated as probabilities (between 0 and 1), using cross-validation data (n = 27,470 variants). Model outputs are divided into 10 equal-width bins, and the bar graphs plot, for each bin, the number of model calls that agree and disagree with the manual review call. The diagonal line indicates a perfectly calibrated probabilistic prediction. The colored points show, for each bin, the fraction of predictions that agree with the manual review call. Binomial proportion confidence intervals were calculated for each bin. The Pearson correlation coefficient between the colored points and the diagonal line was calculated to assess the calibration of each model.
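The reliability-diagram quantities in panel b can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the bins are assumed to be equal-width on [0, 1], `agrees` is assumed to flag whether each model call matched the manual review call, the confidence interval uses the Wilson score method (the caption does not name the method), and "comparison to the diagonal" is operationalized as the Pearson correlation between bin midpoints and observed agreement fractions. All function names are hypothetical.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (assumed CI method)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)


def reliability_bins(scores, agrees, n_bins=10):
    """Hypothetical helper: bin model outputs into equal-width bins on [0, 1].

    Returns one row per bin: (midpoint, n_agree, n_disagree, fraction_agree, ci).
    """
    counts = [[0, 0] for _ in range(n_bins)]  # [agree, disagree] per bin
    for s, a in zip(scores, agrees):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[idx][0 if a else 1] += 1
    rows = []
    for i, (k, d) in enumerate(counts):
        n = k + d
        mid = (i + 0.5) / n_bins
        frac = k / n if n else float("nan")  # empty bins have no agreement fraction
        rows.append((mid, k, d, frac, wilson_interval(k, n)))
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input has zero variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def calibration_r(rows):
    """Correlate bin midpoints (the diagonal) with observed agreement fractions,
    skipping empty bins."""
    pts = [(mid, frac) for mid, _, _, frac, _ in rows if not math.isnan(frac)]
    xs, ys = zip(*pts)
    return pearson(xs, ys)


# Usage with toy data (not data from the figure):
rows = reliability_bins([0.05, 0.05, 0.95, 0.95, 1.0], [False, True, True, True, True])
r = calibration_r(rows)
```

Because Pearson's r measures linear association, points lying on any increasing straight line give r = 1, not only points on the diagonal itself; the per-bin counts and intervals are what show whether a model is offset from perfect calibration.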