Table 5.
Checklist for Computational Data Harmonisation in Digital Healthcare (CHECDHA) criteria.
| Category | Item | Explanation | Example |
| --- | --- | --- | --- |
| Motivation | Background | The application field of the dataset(s) | Information fusion of DW-MRI data from different scanners |
| Motivation | Importance | Why the study is conducted and why it is important | Dramatically increases the statistical power and sensitivity of clinical studies |
| Data (Common) | Dataset | What the dataset(s) is (are) and how it is (they are) collected (details of acquisition protocols, entry and exit criteria); how many categories, cohorts, subjects, and cases are included | m healthy subjects under n protocols (… cases, n cohorts); Protocol 1: …; Protocol 2: … |
| Data (Common) | Property | Whether the dataset(s) is (are) in-house or public; provide the access link if appropriate | Public / in-house |
| Data (Common) | Pre-processing | How the dataset is pre-processed | Z-score normalisation (sketched below the table) |
| Data (Common) | Ground truth | What the ground truth is and how it is generated | Cohort x under protocol i |
| Data (Common) | Partition | For machine learning, how the dataset is partitioned into training, validation, and testing subsets in terms of the number of samples or patients | 7:2:1 for training, validation, and testing |
| Data (Common) | Augmentation | For machine learning, how the dataset is augmented | Randomised flip, rotation |
| Data (Specific) | MRI sequence | What the MRI sequence is | Diffusion-weighted |
| Data (Specific) | Region | Which region(s) of the body or the subject the dataset covers | Brain |
| Data (Specific) | Slice size | What the size of each slice is | 512 × 512 |
| Data (Specific) | Pixel/voxel size | What the physical size of a pixel/voxel is | 0.25 mm / 1 mm³ |
| Data (Specific) | WSI size | What the sizes of the whole-slide images are | 12,000 × 30,000 |
| Data (Specific) | Patch size | What the sizes of the extracted image patches are | 256 × 256 |
| Data (Specific) | mpp | What the microns per pixel of the level-0 scan are | – |
| Model | Workflow | What the training and inference procedures are, illustrated by flow chart(s) if appropriate | – |
| Model | Learning approach | What the learning paradigm is, e.g., supervised, unsupervised, or semi-supervised learning | Semi-supervised learning |
| Model | Architecture | What the structure of the proposed neural network is, if appropriate | nnUNet |
| Model | Task | The main task(s) conducted on the harmonised dataset(s), e.g., lesion segmentation or classification | Tumour segmentation |
| Model | Input domain | What the input modality of the proposed method is | 3D images / 2D feature vectors |
| Model | Input size | The input size(s) of the model | – |
| Model | Loss | What the optimisation (loss) functions used during training are | Dice and cross-entropy loss |
| Model | Open-source | Whether the source code is available; provide the link if appropriate | Open-source code: www.github.com... |
| Model | Platform | The learning library used to build the model | TensorFlow 2.5.0 |
| Evaluation | Statistical analysis | What the statistical analysis methods are | ANOVA test |
| Evaluation | Metric | What indicators are used to evaluate harmonisation performance, e.g., the ratio of reproducible features, coefficient of variation, Pearson correlation coefficient | Intra-class correlation coefficient (>0.9 considered reproducible; sketched below the table) |
| Evaluation | Comparison | What existing approaches the proposed method is compared against | stVAE |
| Evaluation | Visualisation | What approaches are used to visualise the data distribution before and after harmonisation | t-SNE / UMAP / PCA (sketched below the table) |
| Result | Result | What the quantitative values of the evaluation metrics are | – |
| Result | Time consumption | The computational time of the proposed method and the comparison methods | 30 s per case |
| Discussion | Novelty | What the innovation of the proposed method is | – |
| Discussion | Strength | The importance/significance of the issue addressed by the proposed method | – |
| Discussion | Limitation | What the remaining unsolved issues are | – |
| Discussion | Future work | Whether potential follow-up studies are planned | – |
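The Pre-processing and Metric rows can be made concrete with a short sketch. The snippet below is illustrative only and not part of the checklist: it assumes tabular feature vectors (e.g., radiomic features) with one scanner/protocol label per case, applies per-scanner Z-score normalisation as a minimal harmonisation baseline, and computes ICC(2,1) as one common form of the intra-class correlation coefficient (the checklist does not prescribe a specific ICC variant); the function names are hypothetical.

```python
import numpy as np

def zscore_normalise(features, scanner_ids):
    """Minimal per-scanner Z-score normalisation (cf. the Pre-processing row):
    each scanner's feature distribution is centred and scaled independently
    before the data are pooled. `features` is (n_cases, n_features)."""
    features = np.asarray(features, dtype=float)
    scanner_ids = np.asarray(scanner_ids)
    out = np.empty_like(features)
    for sid in np.unique(scanner_ids):
        mask = scanner_ids == sid
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0) + 1e-8  # guard against zero variance
        out[mask] = (features[mask] - mu) / sigma
    return out

def icc_2_1(measurements):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement,
    for a (subjects x protocols) matrix of one feature (cf. the Metric row);
    values > 0.9 are commonly read as reproducible."""
    m = np.asarray(measurements, dtype=float)
    n, k = m.shape
    grand = m.mean()
    row_means = m.mean(axis=1)   # per-subject means
    col_means = m.mean(axis=0)   # per-protocol means
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    sse = np.sum((m - row_means[:, None] - col_means[None, :] + grand) ** 2)
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For a feature measured on m subjects under n protocols, one would compute `icc_2_1` per feature before and after harmonisation and could then report, for instance, the ratio of features exceeding the 0.9 threshold.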
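Similarly, the Visualisation row can be illustrated with a before/after embedding plot. The sketch below uses synthetic stand-in data and a trivial per-scanner Z-score as the "harmonised" output purely so the script runs on its own; a real study would substitute its own feature matrix and harmonisation method. t-SNE is used here, but the same plot applies to UMAP or PCA embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in data: 2 scanners x 100 cases x 50 features, where scanner 1
# carries an additive offset (a simple "scanner effect").
scanner_ids = np.repeat([0, 1], 100)
features = rng.normal(size=(200, 50)) + scanner_ids[:, None] * 1.5

def plot_embedding(x, labels, title, ax):
    """Project features to 2D with t-SNE and colour points by scanner, so that
    residual scanner clustering (a harmonisation failure) is visible."""
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(x)
    for sid in np.unique(labels):
        m = labels == sid
        ax.scatter(emb[m, 0], emb[m, 1], s=8, label=f"scanner {sid}")
    ax.set_title(title)
    ax.legend()

# "Harmonised" here is simply a per-scanner Z-score; plug in the output of the
# actual harmonisation method under review instead.
harmonised = np.concatenate([
    (features[scanner_ids == s] - features[scanner_ids == s].mean(axis=0))
    / features[scanner_ids == s].std(axis=0)
    for s in (0, 1)
])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
plot_embedding(features, scanner_ids, "Before harmonisation", axes[0])
plot_embedding(harmonised, scanner_ids, "After harmonisation", axes[1])
plt.tight_layout()
plt.show()
```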