Table 2. Evaluation metrics used and their interpretations
| Category | Metric name | Full metric name | Explanation |
|---|---|---|---|
| Gene expression prediction/Model generalisability | PCC | Pearson Correlation Coefficient | Measures the linear relationship between predicted and observed gene expression, providing a value between −1 and 1, where 1 indicates a perfect positive correlation and −1 indicates a perfect negative correlation. |
| | MI | Mutual Information | Measures the amount of information shared between predicted and observed gene expression, capturing their statistical dependence. Higher values indicate stronger dependence and similarity between the variables. |
| | JS-Div | Jensen–Shannon divergence | Quantifies the dissimilarity between the predicted and true gene expression probability distributions. It ranges from 0 to 1 (when computed with base-2 logarithms), where 0 indicates identical distributions and 1 indicates complete dissimilarity. A lower JS-Div indicates better agreement between the distributions. |
| | NRMSE | Normalised Root Mean Squared Error | The RMSE (Root Mean Squared Error) between predicted and observed gene expression values, normalised by the range of the observed values. The normalisation allows prediction accuracy to be compared across different datasets or scales. A lower NRMSE indicates better prediction performance. |
| | SSIM | Structural Similarity Index | Evaluates the structural similarity of spatial patterns between predicted and observed gene expression by treating each spot as a ‘pixel’ in the spatial grid. It compares luminance (mean intensity), contrast and structural information. Higher SSIM values indicate better structural similarity. |
| | AUC | Area Under the Curve | Quantifies how well the predicted gene expression discriminates between the two classes obtained by binarising the observed gene expression (zero vs. non-zero, or small vs. large values). It ranges from 0 to 1: an AUC of 1 indicates that the predicted values perfectly separate the two classes, and 0.5 corresponds to random chance. |
| Clinical translational impact | C-index | Concordance index | Quantifies the discriminatory power of a predictive survival model by assessing its ability to correctly rank or classify pairs of observations, typically in terms of their survival times or outcome probabilities. A C-index value of 0.5 indicates random chance, while a value of 1.0 signifies perfect discrimination. |
| | Log-rank p value | Log-rank test p value | The p value of the log-rank test, which is commonly used in survival analysis to assess the difference in survival or event occurrence between two or more groups. A small p value (typically below a predefined significance level, such as 0.05) suggests that the observed differences between survival curves are unlikely to have occurred by chance alone. |
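
To make the definitions in Table 2 concrete, the following is a minimal Python sketch of five of the listed metrics (PCC, NRMSE, JS-Div, AUC and C-index), written with the standard library only. It is an illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the JS-Div function assumes that non-negative expression values are turned into a probability distribution by dividing by their sum, the AUC function assumes the zero vs. non-zero binarisation, and the C-index follows the usual pairwise (Harrell-style) definition with ties in predicted risk counted as 0.5.

```python
import math


def pearson(pred, obs):
    """Pearson correlation between predicted and observed values."""
    n = len(obs)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (sd_p * sd_o)


def nrmse(pred, obs):
    """RMSE divided by the range (max - min) of the observed values."""
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    return rmse / (max(obs) - min(obs))


def js_divergence(pred, obs):
    """Base-2 Jensen-Shannon divergence, bounded in [0, 1].

    Assumption: inputs are non-negative and are normalised to sum to 1.
    """
    p = [x / sum(pred) for x in pred]
    q = [x / sum(obs) for x in obs]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def auc_nonzero(pred, obs):
    """AUC for separating non-zero from zero observed values.

    Computed as the fraction of (non-zero, zero) pairs in which the
    non-zero spot has the higher prediction; ties count as 0.5.
    """
    pos = [p for p, o in zip(pred, obs) if o > 0]
    neg = [p for p, o in zip(pred, obs) if o == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def c_index(risk, time, event):
    """Concordance index over comparable pairs.

    A pair (i, j) is comparable when i has an observed event and a
    shorter time than j; it is concordant when i also has higher risk.
    """
    concordant = comparable = 0.0
    for i in range(len(time)):
        for j in range(len(time)):
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy example: four spots for one gene, three patients for survival.
observed = [0.0, 1.0, 2.0, 4.0]
predicted = [0.5, 1.0, 2.5, 3.0]
print(pearson(predicted, observed))
print(nrmse(predicted, observed))
print(js_divergence(predicted, observed))
print(auc_nonzero(predicted, observed))
print(c_index(risk=[3, 2, 1], time=[1, 2, 3], event=[1, 1, 0]))
```

The pairwise loops in `auc_nonzero` and `c_index` scale quadratically and are chosen here for readability; for real datasets, an established statistics or survival-analysis library would be the more appropriate choice.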