Author manuscript; available in PMC: 2023 Nov 14.

Published in final edited form as: IEEE Trans Pattern Anal Mach Intell. 2023 Jun 6;45(7):8081–8093. doi: 10.1109/TPAMI.2023.3234291

TABLE III.

Prediction of Stable Versus Progressive Mild Cognitive Impairment

Model	AUROC (mean)	AUROC (95% CI)	Balanced accuracy, % (mean)	Balanced accuracy, % (95% CI)	Sensitivity, % (mean)	Sensitivity, % (95% CI)	Specificity, % (mean)	Specificity, % (95% CI)

Seen sites
Conventional DFNN	0.884	0.836 – 0.931	80.8	74.6 – 87.0	81.2	68.3 – 94.1	80.3	74.7 – 86.0
Cluster input DFNN	0.866	0.819 – 0.914	81.3	75.8 – 86.8	80.2	68.6 – 91.7	82.4	77.5 – 87.3
DA-DFNN	0.811	0.745 – 0.876	75.5	68.9 – 82.2	74.9	62.3 – 87.6	76.1	69.0 – 83.2
MeNet	0.830	0.780 – 0.880	75.5	68.3 – 82.7	73.7	59.0 – 88.4	77.3	71.7 – 82.9
LMMNN	0.860	0.824 – 0.896	79.4	72.2 – 86.6	73.9	59.7 – 88.1	84.9	81.6 – 88.1
ARMED-DFNN	0.926	0.901 – 0.951	81.9	77.7 – 86.1	76.5	67.6 – 85.3	87.4	84.5 – 90.2
ARMED-DFNN w/o Adv.	0.919	0.891 – 0.946	81.4	76.8 – 86.1	74.5	64.6 – 84.4	88.4	85.4 – 91.4
ARMED-DFNN, randomized Z	0.889	0.862 – 0.916	79.1	73.9 – 84.2	73.9	64.0 – 83.9	84.2	80.2 – 88.2

Unseen sites
Conventional DFNN	0.806	0.786 – 0.825	73.9	71.9 – 76.0	76.2	73.4 – 78.9	71.7	68.5 – 74.8
Cluster input DFNN	0.796	0.776 – 0.816	74.4	72.7 – 76.2	75.4	72.5 – 78.4	73.4	71.6 – 75.2
DA-DFNN	0.723	0.665 – 0.780	67.9	63.2 – 72.6	64.7	52.7 – 76.8	71.1	67.4 – 74.7
MeNet	0.750	0.693 – 0.807	70.2	65.6 – 74.9	66.0	57.7 – 74.4	74.5	69.8 – 79.1
LMMNN	0.811	0.805 – 0.817	74.6	73.6 – 75.7	71.1	68.1 – 74.2	78.1	76.9 – 79.3
ARMED-DFNN	0.837	0.833 – 0.842	75.6	74.1 – 77.1	72.4	67.6 – 77.1	78.8	76.6 – 80.9
ARMED-DFNN w/o Adv.	0.838	0.827 – 0.848	73.5	72.5 – 74.5	65.4	62.9 – 67.8	81.7	80.7 – 83.3
ARMED-DFNN, randomized Z	0.830	0.822 – 0.837	74.6	73.3 – 75.9	69.8	65.0 – 74.5	79.5	77.0 – 82.0

DFNN: dense feedforward neural network; MLDG: meta-learning domain generalization; DA: domain adversarial; Adv.: adversary; AUROC: area under the receiver operating characteristic curve; CI: confidence interval. Note: for the Cluster input DFNN model, cluster membership for the unseen sites is inferred via our Z-predictor.

Confidence intervals were computed through 10×10-fold nested cross-validation. Sensitivity and specificity were computed at the Youden point. The best results for each metric are bolded.
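The Youden point mentioned above is the decision threshold that maximizes Youden's J statistic, J = sensitivity + specificity - 1. The following is a minimal sketch of that selection in plain Python, not the authors' implementation: the function names `youden_point` and `balanced_accuracy` and the toy labels and scores are hypothetical, and the sketch assumes a sample is predicted positive when its score is at or above the threshold. The table does not state which data split was used to select the threshold, so the sketch does not assume one.

```python
def youden_point(y_true, y_score):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    J = sensitivity + specificity - 1, evaluated with every distinct score
    used as a threshold (assumption: predict positive when score >= threshold).
    Ties in J are resolved in favor of the lowest threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best = None  # (J, threshold, sensitivity, specificity)
    for thr in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < thr)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, thr, sens, spec)

    _, thr, sens, spec = best
    return thr, sens, spec


def balanced_accuracy(sens, spec):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sens + spec) / 2.0


# Toy data (hypothetical, not taken from the study).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

thr, sens, spec = youden_point(y_true, y_score)
print(thr, sens, spec, balanced_accuracy(sens, spec))
```

On this toy data, thresholds 0.4 and 0.7 give the same J, and the sketch keeps the lower one. A different tie-breaking rule would trade sensitivity against specificity without changing J.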