Table 1.
Publication | Feature extraction | Classification model | Sample size | Imaging data type | Performance | Validation method | Feature selection/input | Highlight/advantage | Shortcoming |
---|---|---|---|---|---|---|---|---|---|
McWilliams et al. [31] | NA | LR | 2961 | CT images | AUC (0.907–0.960) | Hold-out | Clinical risk factors + nodule characteristics on CT images | Using the extracted features as input, the classifier achieves a high AUC even for small nodules (< 10 mm) | The selection of nodule characteristics affects the predictive performance of the model |
Riel et al. [32] | NA | LR | 300 | CT images | AUC (0.706–0.932) | Hold-out | Clinical factors + nodule characteristics on CT images | The classifier performs comparably to human observers in classifying nodules as malignant or benign | The performance relies heavily on nodule size as the discriminator and is not robust for small nodules |
Kriegsmann et al. [34] | NA | LDA | 326 | MALDI | Accuracy (0.991) | Hold-out | Mass spectra from ROIs of MALDI image | The model maintains high accuracy on FFPE biopsies | The performance relies on the quality of the MALDI stratification |
Buty et al. [37] | Spherical harmonics [44]; DCNN [41] | RF | 1018 | CT images | Accuracy (0.793–0.824) | 10-fold cross-validation | CT imaging patches + radiologists’ binary nodule segmentations | The model reaches higher predictive accuracy by integrating shape and appearance nodule imaging features | No benchmarking comparisons were used in the study |
Hussein et al. [38] | 3D CNN-based multi-task model | 3D CNN-based multi-task model | 1018 | CT images | Accuracy (0.9126) | 10-fold cross-validation | 3D CT volume feature | The model achieves higher accuracy than other benchmarked models | The ground truth scores defined by radiologists for the benchmark might be arbitrary |
Khosravan et al. [39] | 3D CNN-based multi-task model | 3D CNN-based multi-task model | 6960 | CT images | Segmentation DSC (0.91); classification accuracy (0.97) | 10-fold cross-validation | 3D CT volume feature | The model’s integration of clustering and sparsification algorithms helps it accurately extract candidate attention regions | Segmentation might fail if the ROIs are outside the lung regions |
Ciompi et al. [40] | OverFeat [42] | SVM; RF | 1729 | CT images | AUC (0.868) | 10-fold cross-validation | 3D CT volume feature, nodule position coordinates, and maximum diameter | The first study to attempt benign-versus-malignant classification of diagnosed nodules | The model requires the nodule position and diameter as input, but many nodules cannot be located on the CT images |
Venkadesh et al. [44] | 2D-ResNet50-based [45]; 3D-Inception-V1 [46] | An ensemble model based on two CNN models | 16,429 | CT images | AUC (0.86–0.96) | 10-fold cross-validation | 3D CT volume feature and nodule coordinates | The model achieves a higher AUC than other benchmarked models | The model requires the nodule position as input, but many nodules cannot be located on the CT images |
Ardila et al. [47] | Mask-RCNN [48]; RetinaNet [49]; 3D-inflated Inception-V1 [50], [51] | Mask-RCNN [48]; RetinaNet [49]; 3D-inflated Inception-V1 [50], [51] | 14,851 | CT images | AUC (0.944) | Hold-out | Patient’s current and prior (if available) 3D CT volume features | The model achieves a higher AUC than radiologists when samples do not have prior CT images | The training cohort is from only one dataset, although the sample size is large |
AbdulJabbar et al. [52] | Micro-Net [53]; SC-CNN [54] | An ensemble model based on SC-CNN [54] | 100 | Histological images | Accuracy (0.913) | Hold-out | Image features of H&E-stained tumor section histological slides | The model can annotate cell types at the single-cell level using histological images only | The annotation accuracy depends on the reference dataset used |
Coudray et al. [55] | Multi-task CNN model based on Inception-V3 [51] | Multi-task CNN model based on Inception-V3 network [51] | 1634 | Histological images | AUC (0.733–0.856) | Hold-out | Transformed 512 × 512-pixel tiles from nonoverlapping ‘patches’ of the whole-slide images | The model can predict whether a given tissue has somatic mutations in genes STK11, EGFR, FAT1, SETBP1, KRAS, and TP53 | The accuracy of the gene mutation prediction is not very high |
Lin et al. [59] | DCGAN [58] + AlexNet [41] | DCGAN [58] + AlexNet [41] | 22,489 | CT images | Accuracy (0.9986) | Hold-out | Initial + synthetic CT images | The model uses a GAN to generate synthetic lung cancer images to reduce overfitting | No benchmarking comparisons were used |
Ren et al. [60] | DCGAN [58] + VGG-DF | DCGAN [58] + VGG-DF | 15,000 | Histopathological images | Accuracy (0.9984); F1-score (0.9984) | Hold-out | Initial + synthetic histopathological images | The model uses a GAN to generate synthetic lung cancer images and a regularization-enhanced model to reduce overfitting | The resolution of the generator’s output images (64 × 64) is insufficient for the biomedical domain |
Note: ML, machine learning; NA, not applicable; LR, logistic regression; AUC, area under the curve; CT, computed tomography; LDA, linear discriminant analysis; MALDI, matrix-assisted laser desorption/ionization; ROI, region of interest; FFPE, formalin-fixed paraffin-embedded; CNN, convolutional neural network; DSC, dice similarity coefficient; SVM, support vector machine; RF, random forest; DCNN, deep convolutional neural network; SC-CNN, spatially constrained convolutional neural network; DCGAN, deep convolutional generative adversarial network; RCNN, Region-CNN; H&E, hematoxylin and eosin; 2D, two dimensional; 3D, three dimensional. Compared with the hold-out method, cross-validation is usually more robust because it accounts for the variance across possible splits into training, validation, and test data. However, cross-validation is more time-consuming than a simple hold-out split.
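The hold-out versus cross-validation trade-off described in the note can be sketched in a few lines of plain Python (a minimal illustration of the two splitting schemes; the function names are hypothetical and not taken from any of the cited studies):

```python
import random

def holdout_split(n, test_frac=0.2, seed=0):
    """Hold-out: partition n sample indices into a single train/test pair."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - test_frac))
    return idx[:cut], idx[cut:]

def kfold_splits(n, k=10, seed=0):
    """k-fold cross-validation: yield k train/test pairs;
    each index appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, folds[i]
```

With 10-fold cross-validation, every sample is used as test data exactly once, so the ten resulting performance estimates expose the between-split variance that a single hold-out estimate hides, at roughly ten times the training cost.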