Table 1.
Publication | Feature extraction | Classification model | Sample size | Imaging data type | Performance | Validation method | Feature selection/input | Highlight/advantage | Shortcoming |
---|---|---|---|---|---|---|---|---|---|
McWilliams et al. [31] | NA | LR | 2961 | CT images | AUC (0.907–0.960) | Hold-out | Clinical risk factors + nodule characteristics on CT images | Using the extracted features as input, the classifier achieves a high AUC even for small nodules (< 10 mm) | The selection of nodule characteristics affects the predictive performance of the model |
Riel et al. [32] | NA | LR | 300 | CT images | AUC (0.706–0.932) | Hold-out | Clinical factors + nodule characteristics on CT images | The classifier performs comparably to human observers in classifying nodules as malignant or benign | The performance relies heavily on nodule size as the discriminator and is not robust for small nodules |
Kriegsmann et al. [34] | NA | LDA | 326 | MALDI | Accuracy (0.991) | Hold-out | Mass spectra from ROIs of MALDI image | The model maintains high accuracy on FFPE biopsies | The performance relies on the quality of the MALDI stratification |
Buty et al. [37] | Spherical harmonics [44]; DCNN [41] | RF | 1018 | CT images | Accuracy (0.793–0.824) | 10-fold cross-validation | CT imaging patches + radiologists’ binary nodule segmentations | The model reaches higher predictive accuracy by integrating shape and appearance nodule imaging features | No benchmarking comparisons were used in the study |
Hussein et al. [38] | 3D CNN-based multi-task model | 3D CNN-based multi-task model | 1018 | CT images | Accuracy (0.9126) | 10-fold cross-validation | 3D CT volume feature | The model achieves higher accuracy than other benchmarked models | The ground truth scores defined by radiologists for the benchmark might be arbitrary |
Khosravan et al. [39] | 3D CNN-based multi-task model | 3D CNN-based multi-task model | 6960 | CT images | Segmentation DSC (0.91); classification accuracy (0.97) | 10-fold cross-validation | 3D CT volume feature | The model’s integration of clustering and sparsification algorithms helps it accurately extract candidate attention regions | Segmentation might fail if the ROIs are outside the lung regions |
Ciompi et al. [40] | OverFeat [42] | SVM; RF | 1729 | CT images | AUC (0.868) | 10-fold cross-validation | 3D CT volume feature, nodule position coordinates, and maximum diameter | The first study to attempt benign-versus-malignant classification of diagnosed nodules | The model requires the nodule position and diameter as input, but many nodules cannot be located on the CT images |
Venkadesh et al. [44] | 2D-ResNet50-based [45]; 3D-Inception-V1 [46] | An ensemble model based on two CNN models | 16,429 | CT images | AUC (0.86–0.96) | 10-fold cross-validation | 3D CT volume feature and nodule coordinates | The model achieves a higher AUC than other benchmarked models | The model requires the nodule position as input, but many nodules cannot be located on the CT images |
Ardila et al. [47] | Mask-RCNN [48]; RetinaNet [49]; 3D-inflated Inception-V1 [50], [51] | Mask-RCNN [48]; RetinaNet [49]; 3D-inflated Inception-V1 [50], [51] | 14,851 | CT images | AUC (0.944) | Hold-out | Patient’s current and prior (if available) 3D CT volume features | The model achieves a higher AUC than radiologists when samples do not have prior CT images | The training cohort is from only one dataset, although the sample size is large |
AbdulJabbar et al. [52] | Micro-Net [53]; SC-CNN [54] | An ensemble model based on SC-CNN [54] | 100 | Histological images | Accuracy (0.913) | Hold-out | Image features of H&E-stained tumor section histological slides | The model can annotate cell types at the single-cell level using histological images only | The annotation accuracy depends on the reference dataset used |
Coudray et al. [55] | Multi-task CNN model based on Inception-V3 [51] | Multi-task CNN model based on Inception-V3 network [51] | 1634 | Histological images | AUC (0.733–0.856) | Hold-out | Transformed 512 × 512-pixel tiles from nonoverlapping ‘patches’ of the whole-slide images | The model can predict whether a given tissue has somatic mutations in genes STK11, EGFR, FAT1, SETBP1, KRAS, and TP53 | The accuracy of the gene mutation prediction is not very high |
Lin et al. [59] | DCGAN [58] + AlexNet [41] | DCGAN [58] + AlexNet [41] | 22,489 | CT images | Accuracy (0.9986) | Hold-out | Initial + synthetic CT images | The model uses a GAN to generate synthetic lung cancer images to reduce overfitting | No benchmarking comparisons were used |
Ren et al. [60] | DCGAN [58] + VGG-DF | DCGAN [58] + VGG-DF | 15,000 | Histopathological images | Accuracy (0.9984); F1-score (0.9984) | Hold-out | Initial + synthetic histopathological images | The model uses a GAN to generate synthetic lung cancer images and a regularization-enhanced model to reduce overfitting | The resolution of the generator’s output images (64 × 64) is insufficient for the biomedical domain |
Note: ML, machine learning; NA, not applicable; LR, logistic regression; AUC, area under the curve; CT, computed tomography; LDA, linear discriminant analysis; MALDI, matrix-assisted laser desorption/ionization; ROI, region of interest; FFPE, formalin-fixed paraffin-embedded; CNN, convolutional neural network; DSC, dice similarity coefficient; SVM, support vector machine; RF, random forest; DCNN, deep convolutional neural network; SC-CNN, spatially constrained convolutional neural network; DCGAN, deep convolutional generative adversarial network; RCNN, Region-CNN; H&E, hematoxylin and eosin; 2D, two dimensional; 3D, three dimensional. Compared with the hold-out method, cross-validation is usually more robust because it accounts for the variance across possible splits into training, validation, and test data. However, cross-validation is more time-consuming than a simple hold-out split.
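The hold-out versus cross-validation trade-off described in the note can be sketched in a few lines of plain Python (a minimal illustration of the two splitting schemes; the function names are hypothetical and not taken from any of the cited studies):

```python
import random

def holdout_split(n, test_frac=0.2, seed=0):
    """Hold-out: partition n sample indices into a single train/test pair."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - test_frac))
    return idx[:cut], idx[cut:]

def kfold_splits(n, k=10, seed=0):
    """k-fold cross-validation: yield k train/test pairs;
    each index appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, folds[i]
```

With 10-fold cross-validation, every sample is used as test data exactly once, so the ten resulting performance estimates expose the between-split variance that a single hold-out estimate hides, at roughly ten times the training cost.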