Extended Data Fig. 5. Training and validation of a random forest on high-resolution cell types.
a, Uniform manifold approximation and projection (UMAP) of the integrated data, showing the cells sampled as the training dataset for random forest classification. Sampled cells (black) cover 33 % of the dataset; the coloured sub-plots show the six cross-validation folds, each representing the dataset evenly. b, Precision and recall from six-fold cross-validation (mean and standard deviation, k = 6), indicating reasonable model accuracy for most clusters. c, Confusion matrix between reference and predicted cell classes for the remaining 66 % of cells, which were not used in training and serve as naive test data; scores are high for most cell types.
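The sampling and evaluation scheme described in this legend can be sketched as follows. This is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' pipeline. Sampling is assumed to be stratified by cell type with folds filled round-robin. The random forest itself is left out and supplied as a hypothetical `fit_predict(train_idx, held_out_idx)` callable. Standard deviation is taken as the sample SD across folds, and all function names are illustrative.

```python
import random
import statistics
from collections import Counter, defaultdict


def sample_training_folds(labels, train_fraction=0.33, k=6, seed=0):
    """Hypothetical helper: draw a training sample per cell type and deal it
    into k folds; unsampled cells become the naive test set.
    Stratifying by cell type is an assumption, not stated in the legend."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    test = []
    position = 0  # running counter keeps fold sizes balanced across classes
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_fraction)
        for idx in idxs[:n_train]:
            folds[position % k].append(idx)
            position += 1
        test.extend(idxs[n_train:])
    return folds, test


def confusion(reference, predicted):
    """Confusion matrix as counts keyed by (reference class, predicted class)."""
    return Counter(zip(reference, predicted))


def precision_recall(reference, predicted, classes):
    """Per-class (precision, recall); undefined values are set to 0.0 (assumption)."""
    cm = confusion(reference, predicted)
    out = {}
    for c in classes:
        tp = cm[(c, c)]
        predicted_c = sum(n for (_, p), n in cm.items() if p == c)
        reference_c = sum(n for (r, _), n in cm.items() if r == c)
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / reference_c if reference_c else 0.0
        out[c] = (precision, recall)
    return out


def summarise_folds(per_fold, classes):
    """Mean and sample SD of precision and recall across folds, per class."""
    summary = {}
    for c in classes:
        precisions = [fold[c][0] for fold in per_fold]
        recalls = [fold[c][1] for fold in per_fold]
        summary[c] = (statistics.mean(precisions), statistics.stdev(precisions),
                      statistics.mean(recalls), statistics.stdev(recalls))
    return summary


def cross_validate(folds, labels, fit_predict):
    """k-fold CV: train on k-1 folds, score the held-out fold.
    fit_predict is a hypothetical callable wrapping the classifier."""
    classes = sorted(set(labels))
    per_fold = []
    for i, held_out in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        predicted = fit_predict(train_idx, held_out)
        reference = [labels[idx] for idx in held_out]
        per_fold.append(precision_recall(reference, predicted, classes))
    return summarise_folds(per_fold, classes)
```

In a real run, `fit_predict` could wrap scikit-learn's `RandomForestClassifier` (calling `fit` on the training cells, then `predict` on the held-out fold). The confusion matrix for panel c would then come from `confusion` applied to the predictions for the cells returned as the test set.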