2022 Jul 12;24(7):e36490. doi: 10.2196/36490

Table 6.

Study analysis for journal publications on the diagnosis phase.

Reference | Objective, data set, and methodology | Performance and remarks
[60]
  • Objective: Classification of mature B-cell neoplasm

  • Data set: 20,622 routine diagnostic samples from Munich Leukemia Laboratory

  • Methodology: CNN-SOM^a transformation

Performance:
  • Accuracy: 95%

Strengths:
  • Large data set

  • High accuracy

Limitations:
  • Nonuniform distribution of misclassifications due to similarity in flow cytometric profiles

Validation:
  • 10% validation split

[54]
  • Objective: Detection of immature leukocytes and their classification into 4 types

  • Data set: Images extracted from a publicly available data set at The Cancer Imaging Archive

  • Methodology: Random forest algorithm

Performance:
  • Accuracy: 92.99%

Strengths:
  • High precision results for each class

Limitations:
  • High number of false positives leading to low precision and specificity

Validation:
  • 5-fold cross validation

[49]
  • Objective: Identification of the leukemia type based on patient genetic expression

  • Data set: Expression levels of 7129 genes from 72 people, obtained from Kaggle

  • Methodology: XGBoost, artificial neural networks, and random forest algorithm

Performance:
  • Random forest accuracy: 80.8%

  • XGBoost accuracy: 92.3%

Strengths:
  • Use of principal component analysis for dimensionality reduction and faster computation

  • Use of grid search for the best hyperparameter selection

Limitations:
  • Small data set (72 people)

Validation:
  • Internal validation (65%/35% split)

[135]
  • Objective: Classification of lymphocytic cells

  • Data set: The ALL-IDB2 Database

  • Methodology: Bare bones particle swarm optimization–based feature optimization

Performance:
  • Accuracy: 94.94%-96.25%

Strengths:
  • Good model performance in capturing prognostic chronic myeloid leukemia markers

Limitations:
  • Challenge of capturing relationships between data types with no information loss in clinical clustering

Validation:
  • Validation on an external independent clinical trial

[34]
  • Objective: Detection of leukemia and its types

  • Data set: 220 blood smear images from healthy individuals and patients with leukemia

  • Methodology: Support vector machine

Performance:
  • Accuracy: Above 80%

Strengths:
  • Use of 3 segmentation methods

  • Broader range of leukemia classification (types and subtypes)

Limitations:
  • Costly method based on imaging data

Validation:
  • Internal validation (train test split)

[122]
  • Objective: Automated detection of malignant lymphoma

  • Data set: Prepared histopathologic images (388 sections, 259 diffuse large B-cell lymphomas, 89 follicular lymphomas, and 40 reactive lymphoid hyperplasia)

  • Methodology: Deep neural network classifier

Performance:
  • Accuracy: 97%

Strengths:
  • High accuracy outperforming 7 pathologists

  • Model ensemble comprising 3 classifiers

Limitations:
  • Classifier requires manual annotation

  • Model not able to classify all the subtypes

Validation:
  • K-fold cross validation repeated 5 times

[37]
  • Objective: Multiclassification of leukemia

  • Data set: 100 blood smear images

  • Methodology: Neural network classifiers

Performance:
  • Accuracy: 97.7%

Strengths:
  • Two-step neural network classifier

Limitations:
  • Limited data set (100 blood smear images)

Validation:
  • Internal validation (90 images used for training and 10 kept for validation)

[133]
  • Objective: Leukemia and lymphoma diagnosis

  • Data set: 283 blood and bone marrow sample images from patients with leukemia and lymphoma

  • Methodology: Decision tree

Performance:
  • Correctness: 95%

Strengths:
  • Application of the LASSO algorithm for regularization

  • Model robustness and strength against false negatives

Limitations:
  • Complexity of the decision tree and the risk of overfitting when the trees grow too large

Validation:
  • 30-fold cross validation

[66]
  • Objective: Leukemia image segmentation

  • Data set: The Acute Lymphoblastic Leukemia Image Database

  • Methodology: HSCRKM^b, particle swarm optimization, and K-means

Performance:
  • Accuracy: 80% and above

Strengths:
  • Use of 7 machine learning methods

  • Application of soft covering rough approximation

Limitations:
  • Suitable for medical images only

  • Application to multiple color images increases processing time

Validation:
  • Different train/test sizes were used for model evaluation

[144]
  • Objective: Determining the most predictive features for acute lymphoblastic leukemia identification

  • Data set: 94 pediatric patient samples collected from the Department of Hematology and Oncology, Children Hospital and Institute of Child Health, Lahore

  • Methodology: Random forest, boosting machine, C5.0 decision tree, and classification and regression trees

Performance:
  • Accuracy: 87.4%

Strengths:
  • High accuracy

  • Balanced data set

Limitations:
  • Small-scale study

  • Few machine learning models

  • Socioeconomic risk factors not selected automatically

Validation:
  • Internal validation (train/validation data)

  • 10-fold cross validation

[31]
  • Objective: Diagnosis of leukemia and its subtypes

  • Data set: 200 blood smear images extracted from Vidyalankar Institute of Technology, Mumbai and online databases

  • Methodology: Support vector machine

Performance:
  • Accuracy: 97.8%

Strengths:
  • Good detection accuracy

  • Thorough image segmentation process

Limitations:
  • Challenging detection process due to the irregularity of the cancer cell’s shape and nucleus

  • Use of only support vector machine for classification

^a CNN-SOM: convolutional neural network-self-organizing map.

^b HSCRKM: histogram-based soft covering rough K-means clustering.
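
The table reports accuracy, precision, and specificity, and several studies validate with k-fold cross validation. The Python sketch below illustrates how these quantities are commonly computed; it is not taken from any of the reviewed studies. The function names `binary_metrics` and `k_fold_indices` and all confusion-matrix counts are hypothetical, and the fold split is a plain contiguous split with no shuffling or stratification, which real pipelines usually add.

```python
from typing import Dict, List, Tuple


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Specificity: share of true negatives that are predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns one (train_indices, validation_indices) pair per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


# Hypothetical counts, chosen only to show the effect of false positives.
few_fp = binary_metrics(tp=40, fp=5, tn=50, fn=5)
many_fp = binary_metrics(tp=40, fp=25, tn=30, fn=5)

# Fold sizes for a 94-sample data set under 10-fold cross validation.
splits = k_fold_indices(n_samples=94, k=10)
fold_sizes = [len(validation) for _, validation in splits]
```

With the true positives and false negatives held fixed, raising the false positives from 5 to 25 lowers both precision and specificity, which is the mechanism behind the limitation noted for [54]. For a 94-sample data set such as the one in [144], each validation fold holds only 9 or 10 samples, so per-fold estimates are coarse.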