. 2021 Jul 20;52(4):698–705. [Article in Chinese] doi: 10.12182/20210460201

Table 2. Evaluation metrics of the T staging model on the training set.

Model	Threshold	Sensitivity	Specificity	Accuracy	Hamming loss	F1 score	Kappa score	AUC	AP
1*	0.5	0.809	0.875	0.838	0.162	0.850	0.674	0.893	0.901
2	0.5	0.809	0.833	0.819	0.180	0.836	0.636	0.841	0.880
3	0.5	0.778	0.833	0.802	0.198	0.817	0.602	0.846	0.885
4	0.5	0.825	0.833	0.829	0.171	0.845	0.654	0.842	0.869
5	0.5	0.762	0.896	0.819	0.180	0.827	0.642	0.864	0.892

*Best performance of all models in the training set. AP: average precision; AUC: area under the ROC curve; Hamming loss: the fraction of labels that are incorrectly predicted; F1 score = 2 × (precision × recall)/(precision + recall); Kappa score: Cohen's kappa, a score that expresses the level of agreement between two annotators (here, the model's predictions and the reference labels) on a classification problem.
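The table defines its metrics only in words. The following is a minimal sketch in plain Python (standard library only) of how each column could be computed for a binary classifier at the 0.5 threshold. It is not the authors' code, and treating one T-stage group as the "positive" class is an assumption. The counts used for the threshold-based columns (TP = 51, FN = 12, TN = 42, FP = 6, i.e. 63 positive and 48 negative cases) are not reported in the article. They form a hypothetical confusion matrix chosen because it reproduces Model 1's sensitivity, specificity, accuracy, Hamming loss, F1 score and kappa to within 0.001. AUC and AP need per-sample scores, which the article does not give, so they are shown on a six-sample toy example. All helper names (`confusion_counts`, `metrics_from_counts`, `roc_auc`, `average_precision`) are hypothetical. Library versions of most of these exist in scikit-learn's `sklearn.metrics` module (`hamming_loss`, `f1_score`, `cohen_kappa_score`, `roc_auc_score`, `average_precision_score`).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class ThresholdMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    hamming_loss: float
    f1: float
    kappa: float


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP); a score >= threshold counts as positive (assumed rule)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> ThresholdMetrics:
    """Threshold-dependent columns of the table from a 2x2 confusion matrix."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall and sensitivity are the same quantity
    # Chance agreement for Cohen's kappa, from predicted and true class totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return ThresholdMetrics(
        sensitivity=recall,
        specificity=tn / (tn + fp),
        accuracy=accuracy,
        hamming_loss=(fn + fp) / n,  # one label per sample: share misclassified
        f1=2 * precision * recall / (precision + recall),
        kappa=(accuracy - p_chance) / (1 - p_chance),
    )


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP = sum over thresholds of (R_n - R_{n-1}) * P_n, no interpolation."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    total_pos = sum(y_true)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Samples tied at the same score form a single threshold.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Hypothetical counts consistent with Model 1's row (not reported in the article).
    print(metrics_from_counts(tp=51, fn=12, tn=42, fp=6))
    # Toy labels and scores for the threshold-free columns (illustrative only).
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(confusion_counts(y, s), roc_auc(y, s), average_precision(y, s))
```

This version of AP uses the non-interpolated step-wise sum, which is the convention scikit-learn documents for `average_precision_score`. Tools that interpolate the precision–recall curve can report slightly different values. The Hamming loss equals 1 − accuracy here only because each sample carries a single label. In a multi-label setting the two would differ.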