Table 5.
Checklist for Computational Data Harmonisation in Digital Healthcare (CHECDHA) criteria.
| Category | Item | Explanation | Example |
| --- | --- | --- | --- |
| Motivation | Background | The application field of the dataset(s) | Information fusion of DW-MRI data from different scanners |
| Motivation | Importance | Why the study is conducted and why it is important | Dramatically increases the statistical power and sensitivity of clinical studies |
| Data (Common) | Dataset | What the dataset(s) is (are) and how it is (they are) collected (details of acquisition protocols, entry and exit criteria); how many categories, cohorts, subjects, and cases are included | m healthy subjects under n protocols (… cases, n cohorts); Protocol 1: …; Protocol 2: … |
| Data (Common) | Property | Whether the dataset(s) is (are) in-house or public; provide the access link if appropriate | Public / in-house |
| Data (Common) | Pre-processing | How the dataset is pre-processed | Z-score normalisation (sketched below the table) |
| Data (Common) | Ground truth | What the ground truth is and how it is generated | Cohort x under protocol i |
| Data (Common) | Partition | For machine learning, how the dataset is partitioned into training, validation, and testing subsets in terms of the number of samples or patients | 7:2:1 for training, validation, and testing |
| Data (Common) | Augmentation | For machine learning, how the dataset is augmented | Randomised flip, rotation |
| Data (Specific) | MRI sequence | What the MRI sequence is | Diffusion-weighted |
| Data (Specific) | Region | Which region(s) of the body or the subject the dataset covers | Brain |
| Data (Specific) | Slice size | What the size of each slice is | 512 × 512 |
| Data (Specific) | Pixel/voxel size | What the physical size of a pixel/voxel is | 0.25 mm / 1 mm³ |
| Data (Specific) | WSI size | What the sizes of the whole-slide images are | 12,000 × 30,000 |
| Data (Specific) | Patch size | What the sizes of the extracted image patches are | 256 × 256 |
| Data (Specific) | mpp | What the microns per pixel of the level-0 scan are | – |
| Model | Workflow | What the training and inference procedures are, illustrated by flow chart(s) if appropriate | – |
| Model | Learning approach | What the learning paradigm is, e.g., supervised, unsupervised, or semi-supervised learning | Semi-supervised learning |
| Model | Architecture | What the structure of the proposed neural network is, if appropriate | nnUNet |
| Model | Task | The main task(s) conducted on the harmonised dataset(s), e.g., lesion segmentation or classification | Tumour segmentation |
| Model | Input domain | What the input modality of the proposed method is | 3D images / 2D feature vectors |
| Model | Input size | The input size(s) of the model | – |
| Model | Loss | What the optimisation (loss) functions used during training are | Dice and cross-entropy loss |
| Model | Open-source | Whether the source code is available; provide the link if appropriate | Open-source code: www.github.com... |
| Model | Platform | The learning library used to build the model | TensorFlow 2.5.0 |
| Evaluation | Statistical analysis | What the statistical analysis methods are | ANOVA test |
| Evaluation | Metric | What indicators are used to evaluate harmonisation performance, e.g., the ratio of reproducible features, coefficient of variation, Pearson correlation coefficient | Intra-class correlation coefficient (>0.9 considered reproducible; sketched below the table) |
| Evaluation | Comparison | What existing approaches the proposed method is compared against | stVAE |
| Evaluation | Visualisation | What approaches are used to visualise the data distribution before and after harmonisation | t-SNE / UMAP / PCA (sketched below the table) |
| Result | Result | What the quantitative values of the evaluation metrics are | – |
| Result | Time consumption | The computational time of the proposed method and the comparison methods | 30 s per case |
| Discussion | Novelty | What the innovation of the proposed method is | – |
| Discussion | Strength | The importance/significance of the issue addressed by the proposed method | – |
| Discussion | Limitation | What the remaining unsolved issues are | – |
| Discussion | Future work | Whether potential follow-up studies are planned | – |
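The Pre-processing and Metric rows can be made concrete with a short sketch. The snippet below is illustrative only and not part of the checklist: it assumes tabular feature vectors (e.g., radiomic features) with one scanner/protocol label per case, applies per-scanner Z-score normalisation as a minimal harmonisation baseline, and computes ICC(2,1) as one common form of the intra-class correlation coefficient (the checklist does not prescribe a specific ICC variant); the function names are hypothetical.

```python
import numpy as np

def zscore_normalise(features, scanner_ids):
    """Minimal per-scanner Z-score normalisation (cf. the Pre-processing row):
    each scanner's feature distribution is centred and scaled independently
    before the data are pooled. `features` is (n_cases, n_features)."""
    features = np.asarray(features, dtype=float)
    scanner_ids = np.asarray(scanner_ids)
    out = np.empty_like(features)
    for sid in np.unique(scanner_ids):
        mask = scanner_ids == sid
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0) + 1e-8  # guard against zero variance
        out[mask] = (features[mask] - mu) / sigma
    return out

def icc_2_1(measurements):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement,
    for a (subjects x protocols) matrix of one feature (cf. the Metric row);
    values > 0.9 are commonly read as reproducible."""
    m = np.asarray(measurements, dtype=float)
    n, k = m.shape
    grand = m.mean()
    row_means = m.mean(axis=1)   # per-subject means
    col_means = m.mean(axis=0)   # per-protocol means
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    sse = np.sum((m - row_means[:, None] - col_means[None, :] + grand) ** 2)
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For a feature measured on m subjects under n protocols, one would compute `icc_2_1` per feature before and after harmonisation and could then report, for instance, the ratio of features exceeding the 0.9 threshold.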
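Similarly, the Visualisation row can be illustrated with a before/after embedding plot. The sketch below uses synthetic stand-in data and a trivial per-scanner Z-score as the "harmonised" output purely so the script runs on its own; a real study would substitute its own feature matrix and harmonisation method. t-SNE is used here, but the same plot applies to UMAP or PCA embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in data: 2 scanners x 100 cases x 50 features, where scanner 1
# carries an additive offset (a simple "scanner effect").
scanner_ids = np.repeat([0, 1], 100)
features = rng.normal(size=(200, 50)) + scanner_ids[:, None] * 1.5

def plot_embedding(x, labels, title, ax):
    """Project features to 2D with t-SNE and colour points by scanner, so that
    residual scanner clustering (a harmonisation failure) is visible."""
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(x)
    for sid in np.unique(labels):
        m = labels == sid
        ax.scatter(emb[m, 0], emb[m, 1], s=8, label=f"scanner {sid}")
    ax.set_title(title)
    ax.legend()

# "Harmonised" here is simply a per-scanner Z-score; plug in the output of the
# actual harmonisation method under review instead.
harmonised = np.concatenate([
    (features[scanner_ids == s] - features[scanner_ids == s].mean(axis=0))
    / features[scanner_ids == s].std(axis=0)
    for s in (0, 1)
])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
plot_embedding(features, scanner_ids, "Before harmonisation", axes[0])
plot_embedding(harmonised, scanner_ids, "After harmonisation", axes[1])
plt.tight_layout()
plt.show()
```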