2022 Jun;82:99–122. doi: 10.1016/j.inffus.2022.01.001

Table 5.

Checklist for Computational Data Harmonisation in Digital Healthcare (CHECDHA) criteria.

| Category | Item | Explanation | Example |
|---|---|---|---|
| Motivation | Background | The application field of the dataset(s) | Information fusion of DW-MRI data from different scanners |
| | Importance | Why the study is conducted and how important it is | Dramatically increase the statistical power and sensitivity of clinical studies |
| Data (common) | Dataset | What the dataset(s) is (are) and how it is (they are) collected (details of acquisition protocols, entry and exit criteria); how many categories, cohorts, subjects, and cases are included in the studies | m healthy subjects under n protocols (m×n cases, n cohorts); Protocol 1: …; Protocol 2: … |
| | Property | Whether the dataset(s) is (are) in-house or public; provide the access link if appropriate | Public/In-house |
| | Pre-processing | How the dataset is pre-processed | Z-score normalisation |
| | Ground truth | What the ground truth is and how it is generated | Cohort x under protocol i |
| | Partition | For machine learning, how the dataset is partitioned into training, validation, and testing subsets in terms of the number of samples and patients | 7:2:1 for training, validation and test |
| | Augmentation | For machine learning, how the dataset is augmented | Randomised flip, rotation |
| Data (specific) | MRI sequence | What the MRI sequence is | Diffusion-weighted |
| | Region | Which region(s) of the body or the subject is (are) covered in the dataset | Brain |
| | Slice size | What the size of each slice is | 512×512 |
| | Pixel/Voxel size | What the physical size of a pixel/voxel is | 0.25 mm / 1 mm³ |
| | WSI size | What the size of the whole slide images is | 12,000×30,000 |
| | Patch size | What the size of the extracted image patches is | 256×256 |
| | mpp | What the microns per pixel in the level-0 scan are | |
| Model | Workflow | What the training and inference procedures are, illustrated by flow chart(s) if appropriate | |
| | Learning approaches | What the learning method is, e.g., supervised learning, un/semi-supervised learning | Semi-supervised learning |
| | Architecture | What the structure of the proposed neural network is, if appropriate | nnUNet |
| | Task | A description of the main tasks conducted on the harmonised datasets, e.g., lesion segmentation/classification | Tumour segmentation |
| | Input domain | What the input modality of the proposed method is | 3D images / 2D feature vectors |
| | Input size | The input size of the model | n×w×h×c |
| | Loss | What the loss functions optimised during training are | Dice and cross-entropy loss |
| | Open-source | Whether the source code is available or not; provide the link if appropriate | Open-source code www.github.com... |
| | Platform | The learning library used to build the model | TensorFlow 2.5.0 |
| Evaluation | Statistical analysis | What the statistical analysis methods used for evaluation are | ANOVA |
| | Metric | What indicators are used to evaluate harmonisation performance, e.g., the ratio of reproducible features, coefficient of variation, Pearson correlation coefficient | Intra-class correlation coefficient (>0.9 is considered reproducible) |
| | Comparison | What existing approaches the proposed method is compared against | stVAE |
| | Visualisation | What approaches are used to visualise the data distribution before and after harmonisation | t-SNE/UMAP/PCA |
| Result | Result | What the quantitative values of the evaluation metrics are | |
| | Computational time | The computational time of the proposed method and of the compared methods | 30 s per case |
| Discussion | Novelty | What the innovation of the proposed method is | |
| | Strength | The importance/significance of the issue addressed by the proposed method | |
| | Limitation | What issues remain unsolved | |
| | Future works | Whether there are potential future studies | |
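
Two of the example entries in the table, Z-score normalisation (Pre-processing) and a 7:2:1 split (Partition), lend themselves to a short illustration. The following is a minimal Python sketch under stated assumptions, not code from the paper: the function names `z_score` and `split_subjects` are hypothetical, the normalisation uses the population standard deviation, and the split is done at subject level so that all cases from one subject stay in the same subset.

```python
import random
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant input: there is no spread to normalise, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def split_subjects(subject_ids, ratios=(7, 2, 1), seed=0):
    """Shuffle unique subject IDs and split them by the given ratios.

    Assumption: splitting by subject (not by slice or patch) is wanted,
    so every case from one subject lands in exactly one subset.
    """
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


if __name__ == "__main__":
    intensities = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(z_score(intensities))

    subjects = [f"sub-{i:02d}" for i in range(20)]
    train, val, test = split_subjects(subjects)
    print(len(train), len(val), len(test))
```

When reporting against the checklist, the fixed seed and the unit of splitting (subject, case, or patch) are the details a reader would need in order to reproduce the partition.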