Optimal number of highly variable genes for the CP and RPC algorithms. Highly variable genes were identified from the reference dataset and ranked by standardized variance using the mean-variance feature selection method with a variance-stabilizing transformation. A. Boxplot of overall accuracy, averaged over inter-dataset predictions (PBMC, pancreas, TM full, TM lung, and simulation), when the top 100, 200, 500, 1000, 2000, or 5000 highly variable genes are used as input features for the CP and RPC methods. The x-axis shows the number of highly variable genes and the y-axis the overall accuracy; box colors distinguish the methods. B. Boxplot of the condition number of the pseudo-bulk reference matrix, averaged over inter-dataset predictions, for the same gene-set sizes. The x-axis shows the number of highly variable genes and the y-axis the condition number.
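The two quantities in this figure, a gene ranking by standardized variance and the condition number of a pseudo-bulk reference built from the top-ranked genes, can be illustrated with a minimal NumPy sketch. It assumes a genes × cells raw count matrix and one cell-type label per cell. The helper names (`standardized_variance`, `top_hvg`, `pseudo_bulk_reference`, `reference_condition_number`) are hypothetical. The ranking only resembles Seurat's `vst` selection method: a quadratic fit of log10 variance on log10 mean stands in for its loess fit. The pseudo-bulk reference is assumed to be the per-cell-type mean expression. This is an illustration, not the authors' pipeline.

```python
import numpy as np


def standardized_variance(counts, degree=2):
    """Hypothetical VST-like score per gene; counts is a genes x cells matrix."""
    n_cells = counts.shape[1]
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    scores = np.zeros(counts.shape[0])
    ok = (mean > 0) & (var > 0)  # log10 is undefined for constant or all-zero genes
    # Assumption: polynomial fit of log10(var) on log10(mean) replaces Seurat's loess
    coef = np.polyfit(np.log10(mean[ok]), np.log10(var[ok]), degree)
    expected_sd = np.sqrt(10 ** np.polyval(coef, np.log10(mean[ok])))
    # Standardize each gene by its expected SD, then clip large values at sqrt(n_cells)
    z = (counts[ok] - mean[ok, None]) / expected_sd[:, None]
    z = np.clip(z, None, np.sqrt(n_cells))
    scores[ok] = z.var(axis=1, ddof=1)
    return scores


def top_hvg(counts, k):
    """Indices of the k genes with the largest standardized variance."""
    return np.argsort(standardized_variance(counts))[::-1][:k]


def pseudo_bulk_reference(counts, labels):
    """Assumed reference: mean expression of each cell type (genes x types)."""
    types = np.unique(labels)
    ref = np.column_stack([counts[:, labels == t].mean(axis=1) for t in types])
    return ref, types


def reference_condition_number(counts, labels, k):
    """Ratio of largest to smallest singular value of the top-k-gene reference."""
    idx = top_hvg(counts, k)
    ref, _ = pseudo_bulk_reference(counts[idx], labels)
    return np.linalg.cond(ref)


if __name__ == "__main__":
    # Simulated data: 500 genes, 3 cell types with 100 cells each
    rng = np.random.default_rng(0)
    type_index = np.repeat(np.arange(3), 100)
    labels = np.array(["A", "B", "C"])[type_index]
    base = rng.gamma(2.0, 2.0, size=(500, 1))
    fold = rng.lognormal(0.0, 1.0, size=(500, 3))  # per-gene, per-type fold change
    counts = rng.poisson(base * fold[:, type_index])
    for k in (100, 200, 500):
        print(k, reference_condition_number(counts, labels, k))
```

A larger condition number means a least-squares solve for cell-type proportions against this reference amplifies noise in the bulk profile more, which is one reason to monitor it alongside accuracy when choosing the number of genes.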