2025 Jul 21;8:1621514. doi: 10.3389/frai.2025.1621514

Table 2.

Comparative analysis of data quality improvement strategies in healthcare.

| Study | Methods used | Accuracy/AUC | Completeness | Reproducibility/strategy |
|---|---|---|---|---|
| This study | KNN imputation, anomaly detection (IF, IQR, LOF), PCA, RF | 75.3%, AUC 0.83 | ~100% | MLflow tracking, reproducible pipeline |
| Chen et al. (2021) | Transfer learning, data quality evaluation, CNN/RNN | ~+11% accuracy improvement after cleaning | Implicitly improved | Structured pipeline with medical concept normalization |
| Kale and Pandey (2024) | KNN, SMOTE, clustering-based anomaly detection, multiple classifiers | Accuracy 70% → 91%, F1-score 0.65 → 0.89 | Substantial post-imputation | Clear before/after metric reporting |
| Emery et al. (2024) | Imputation: MICE, hot-deck, log-linear, MICT-timing | 65–74% (simulated data) | Trajectory coverage improved | Reproducible in R, used on longitudinal data |
| Azimi and Pahl (2024) | Anomaly analytics, ML quality metadata exploration | Not quantified; improved ML output reliability | Qualitative evaluation | Anomaly visualization and root cause strategy |