2024 Nov 28;10(12):1915–1929. doi: 10.3390/tomography10120139

Table 2.

Performance comparison of the proposed method and baseline models for distal ulna maturity grading.

Model	Accuracy (95% CI)	Precision (95% CI)	Recall (95% CI)	F1 score (95% CI)
Ensemble DenseNet [40]	83.4% (80.9–84.1%)	81.3% (79.6–83.0%)	83.9% (82.1–84.4%)	83.2% (81.5–84.0%)
ResNet [24]	81.0% (79.5–83.0%)	78.6% (77.9–80.4%)	81.5% (80.1–82.4%)	80.8% (79.5–81.9%)
EfficientNet-B4	82.8% (81.7–83.6%)	83.9% (82.0–84.7%)	82.1% (81.5–83.9%)	82.5% (81.6–84.1%)
Two-stage framework	85.6% (84.1–85.9%)	86.0% (84.4–86.7%)	83.2% (83.0–84.5%)	83.9% (83.3–85.0%)
U-Net with multitask model	85.9% (84.3–86.7%)	85.0% (83.9–86.2%)	86.7% (84.9–87.0%)	86.3% (84.6–86.8%)
Multi-task without pretraining	87.2% (86.4–88.6%)	85.0% (83.8–86.2%)	87.9% (86.1–88.5%)	87.2% (85.5–87.9%)
Multi-task with regression	89.1% (87.0–91.1%)	90.3% (88.7–90.9%)	88.0% (87.6–89.8%)	88.6% (87.9–90.1%)
Proposed method	90.8% (88.6–93.3%)	90.3% (89.0–92.6%)	92.4% (90.1–94.2%)	91.9% (89.8–93.8%)
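
The table reports each metric with a 95% confidence interval, but this excerpt does not state how the scores were averaged across maturity grades or how the intervals were obtained. As a minimal sketch only, the following shows one common way such numbers could be produced. It assumes macro averaging over classes and a percentile bootstrap over test cases. The function names `macro_metrics` and `bootstrap_ci`, the number of resamples, and the toy labels are illustrative assumptions, not the authors' procedure or data.

```python
import random


def macro_metrics(y_true, y_pred, labels):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumption: metrics are macro-averaged over classes; the source
    does not say which averaging scheme was used.
    """
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    return acc, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def bootstrap_ci(y_true, y_pred, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for each of the four metrics.

    Assumption: test cases are resampled with replacement; n_boot and
    seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(
            macro_metrics([y_true[i] for i in idx],
                          [y_pred[i] for i in idx],
                          labels)
        )
    intervals = []
    for m in range(4):
        vals = sorted(s[m] for s in samples)
        lo = vals[int((alpha / 2) * n_boot)]
        hi = vals[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
        intervals.append((lo, hi))
    return intervals


if __name__ == "__main__":
    # Toy labels for three hypothetical grades; not data from the study.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    names = ["accuracy", "precision", "recall", "F1"]
    point = macro_metrics(y_true, y_pred, [0, 1, 2])
    cis = bootstrap_ci(y_true, y_pred, [0, 1, 2])
    for name, value, (lo, hi) in zip(names, point, cis):
        print(f"{name}: {value:.3f} ({lo:.3f}-{hi:.3f})")
```

A percentile bootstrap need not be symmetric around the point estimate, which would be consistent with the asymmetric intervals in the table, although other interval methods can produce the same pattern.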