a A deep learning architecture was designed for sample-level classification of collections of white blood cells from peripheral smears. The model takes the collection of cells from a given sample/individual and applies the same convolutional and fully connected layers described in Fig. 1 to each cell to produce per-cell predictions for APL/AML. The per-cell predictions are then averaged over all cells to produce a sample-level prediction. b Where CellaVision data were available, the model was trained on a discovery cohort of 82 patients and tested on an independent prospective validation cohort of 24 patients; performance metrics are shown. Initially, the model was trained only on immature myeloid cells, here denoted as Blasts, and performance was assessed at both the cell (i,iv) and sample/patient (iii,vi) level for the discovery and validation cohorts. Cell-level predictions come from the cell assignment layer within the network, and sample/patient-level predictions come from the aggregation layer within the network. Performance in the discovery cohort was assessed by Monte Carlo (MC) cross-validation; performance in the validation cohort was assessed by applying the 100 MC models trained on the discovery cohort to the validation cohort as an ensemble. The probability of a cell being APL is shown per CellaVision cell type (ii,v). Sample-level performance of the multiple-instance learning (MIL) model was benchmarked against the proportion of promyelocytes within a sample (iii,vi,viii,x). In addition, the model was trained on all cell types from CellaVision, here denoted as All Cells, and performance was assessed at both the cell (vii,ix) and sample/patient (viii,x) level for the discovery and validation cohorts. CellaVision cells from patients in the validation cohort were provided to 10 clinicians to compare clinician diagnostic specificity/sensitivity against the deep learning model (x) (+ denotes an individual clinician; * denotes two individuals with the same performance).
Probability of being APL is shown for all CellaVision cell types in the discovery (xi) and validation (xii) cohorts.
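
The aggregation step described in panel a, averaging per-cell predictions into a sample-level prediction, can be illustrated with a minimal sketch. The function names and the NumPy implementation are hypothetical and are not taken from the authors' code. The caption states only that the 100 MC models were applied "in ensemble", so combining them by a simple mean of their sample-level predictions is an assumption made here for illustration.

```python
import numpy as np


def sample_level_prediction(cell_probs):
    """Average per-cell APL probabilities into one sample-level probability.

    `cell_probs` is a 1-D sequence with one probability per cell in the sample.
    """
    cell_probs = np.asarray(cell_probs, dtype=float)
    if cell_probs.ndim != 1 or cell_probs.size == 0:
        raise ValueError("expected a non-empty 1-D sequence of per-cell probabilities")
    return float(cell_probs.mean())


def ensemble_sample_prediction(per_model_cell_probs):
    """Combine sample-level predictions from several models.

    ASSUMPTION: the ensemble is a plain mean over models; the caption does not
    specify how the MC models are combined.
    `per_model_cell_probs` holds one per-cell probability sequence per model,
    all computed on the same sample.
    """
    per_model = [sample_level_prediction(p) for p in per_model_cell_probs]
    if not per_model:
        raise ValueError("expected predictions from at least one model")
    return float(np.mean(per_model))
```

For example, `sample_level_prediction([0.2, 0.4, 0.9])` averages three per-cell probabilities into a single score for the sample, which could then be thresholded to call APL versus AML.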