2025 Feb 11;16:1544. doi: 10.1038/s41467-025-56618-y

Table 2.

Evaluation metrics used and their interpretations

Category	Metric name	Full metric name	Explanation
Gene expression prediction/Model generalisability	PCC	Pearson Correlation Coefficient	Measures the linear relationship between predicted and observed gene expression, providing a value between −1 and 1, where 1 indicates a perfect positive correlation and −1 indicates a perfect negative correlation.
	MI	Mutual Information	Measures the amount of information shared between predicted and observed gene expression, capturing their statistical dependence. Higher values indicate stronger dependence and similarity between the variables.
	JS-Div	Jensen–Shannon divergence	Quantifies the dissimilarity between the predicted and true gene expression probability distributions. It ranges from 0 to 1 when base-2 logarithms are used, where 0 indicates identical distributions and 1 indicates complete dissimilarity. A lower JS-Div indicates better agreement between the distributions.
	NRMSE	Normalised Root Mean Squared Error	The RMSE (Root Mean Squared Error) between predicted and observed gene expression values, normalised by the range of the observed values. The normalisation allows prediction accuracy to be compared across different datasets or scales. A lower NRMSE indicates better prediction performance.
	SSIM	Structural Similarity Index	Evaluates the structural similarity of spatial patterns between predicted and observed gene expression by treating each spot as a ‘pixel’ in the spatial grid. It compares luminance (mean intensity), contrast and structural information. Higher SSIM values indicate better structural similarity.
	AUC	Area Under the Curve	Quantifies how well the predicted gene expression discriminates between binarised observed values, i.e. zero vs. non-zero (or small vs. large) observed gene expression. It ranges from 0 to 1: an AUC of 1 means the predicted values perfectly separate the two binarised classes, while 0.5 corresponds to random chance.
Clinical translational impact	C-index	Concordance index	Quantifies the discriminatory power of a predictive survival model by assessing how often it correctly ranks pairs of observations, typically in terms of their survival times or outcome probabilities. A C-index value of 0.5 indicates random chance, while a value of 1.0 signifies perfect discrimination.
	Log-rank p value	Log-rank test p value	Used in survival analysis to test for a difference in survival or event occurrence between two or more groups. A small p value (typically below a predefined significance level, such as 0.05) suggests that the observed differences in survival curves are unlikely to have occurred by chance alone.
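
To make the definitions in Table 2 concrete, the following is a minimal pure-Python sketch of several of the metrics. It is an illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, JS-Div assumes that non-negative expression vectors are normalised to sum to 1 and uses base-2 logarithms, AUC assumes the zero vs. non-zero binarisation, and the C-index follows the common pairwise definition in which a pair is comparable only if the shorter time corresponds to an observed event. MI and SSIM are omitted because they depend on binning and window choices that the table does not specify.

```python
import math

def pcc(pred, obs):
    """Pearson correlation between predicted and observed expression."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

def nrmse(pred, obs):
    """RMSE divided by the range of the observed values."""
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    return rmse / (max(obs) - min(obs))

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the result lies in [0, 1]).
    Assumption: inputs are non-negative and are normalised to sum to 1."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def auc_zero_vs_nonzero(pred, obs):
    """AUC as the probability that a spot with non-zero observed expression
    receives a higher prediction than a spot with zero observed expression
    (ties count as one half)."""
    pos = [p for p, o in zip(pred, obs) if o > 0]
    neg = [p for p, o in zip(pred, obs) if o == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def c_index(times, events, risk):
    """Pairwise concordance: among comparable pairs, the fraction in which
    the observation with the shorter survival time has the higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # Comparable only if i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

The AUC and C-index functions use explicit pairwise loops for readability; they scale quadratically and are intended only for small illustrative inputs.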