Fig. 3. scMerge2 scales to integrate five million COVID-19 PBMC cells.
a UMAP of the COVID-19 data collection integrated by scMerge2, coloured by cell type (left) and study (right). b Evaluation metrics of PCA scores using dataset, protocol and technology as labels, comparing raw logcounts (blue) and scMerge2 normalised results. A lower score indicates better removal of unwanted technical variation. c Prediction results from 20 repeats of cross-validation of disease severity, using cell type-specific aggregated expression calculated from raw logcounts (blue) and scMerge2 normalised results (red). Each box summarises 20 points and spans the first to third quartile of classification accuracy, with the median as the horizontal line. The lower whisker extends 1.5 times the interquartile range below the first quartile, and the upper whisker extends 1.5 times the interquartile range above the third quartile. Source data are provided as a Source Data file.