2023 Mar 30;54(1):e2035300. doi: 10.25100/cm.v54i1.5300

Table 4. Performance measures of the extraction algorithm when applied to categorical characteristics. Precision is the number of correctly classified reports divided by the total number of reports the algorithm assigned to the class. Recall is the number of correctly classified reports divided by the number of true (i.e., human-classified) reports in that class. The f-score is the harmonic mean of precision and recall. For multiclass characteristics, precision, recall, and f-score are averaged over classes (macro average). Overall accuracy is the number of correctly classified reports divided by the total number of reports evaluated.
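The definitions in the caption can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function name `macro_metrics`, the toy laterality labels, the choice to score a class as 0 when its denominator is zero, and the choice to average over every class that appears in either the human or the algorithm labels are all assumptions made for illustration.

```python
def macro_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, f-score and overall accuracy.

    y_true: classes assigned by the human reviewer (one per report).
    y_pred: classes assigned by the algorithm (same order, same length).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Assumption: average over every class seen in either label list.
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_scores = [], [], []

    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        assigned = sum(1 for p in y_pred if p == c)  # reports the algorithm put in class c
        actual = sum(1 for t in y_true if t == c)    # reports the human put in class c

        # Assumption: a zero denominator yields a score of 0 for that class.
        precision = tp / assigned if assigned else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0  # harmonic mean

        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)

    k = len(classes)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f_score": sum(f_scores) / k,
        "accuracy": correct / len(y_true),
    }


# Hypothetical toy data: six reports with a laterality label each.
y_true = ["left", "left", "right", "right", "right", "bilateral"]
y_pred = ["left", "right", "right", "right", "right", "bilateral"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Because the macro average weights every class equally, a rare class that the algorithm handles poorly can lower macro precision or recall considerably even when overall accuracy stays high.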

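| Group | Descriptor | Macro Precision (%) | Macro Recall (%) | Macro f-score (%) | Overall Accuracy, % (n/N) |
|---|---|---|---|---|---|
| Complementary descriptors | Laterality | 66.2 | 50.0 | 52.9 | 64.3 (27/42) |
| Complementary descriptors | Behavior | 57.1 | 92.7 | 58.6 | 85.7 (36/42) |
| Complementary descriptors | Grade | 70.3 | 64.8 | 79.6 | 76.2 (32/42) |
| Complementary descriptors | Method of Assessment for Solid Tumors | 78.6 | 94.8 | 78.4 | 85.7 (36/42) |
| Complementary descriptors | Method of Assessment for Hematological Tumors | 100 | 100 | 100 | 100 (42/42) |
| Complementary descriptors | Diagnostic Procedure | 95.0 | 83.7 | 87.2 | 90.5 (38/42) |
| Complementary descriptors | Lymphovascular Invasion | 82.5 | 91.2 | 83.9 | 85.7 (36/42) |
| Complementary descriptors | Surgical Margins | 94.4 | 77.2 | 82.8 | 90.5 (38/42) |
| Complementary descriptors | Pulmonary Metastasis | 100 | 100 | 100 | 100 (42/42) |
| Complementary descriptors | Osseous Metastasis | 92.8 | 50.0 | 96.3 | 92.9 (39/42) |
| Complementary descriptors | Hepatic Metastasis | 75.0 | 66.7 | 83.3 | 97.6 (41/42) |
| Complementary descriptors | Brain Metastasis | 50.0 | 50.0 | 100 | 97.6 (41/42) |
| Complementary descriptors | Distant Lymph Nodes Metastasis | 50.0 | 97.6 | 98.8 | 97.6 (41/42) |
| Complementary descriptors | Other Metastasis | 98.8 | 75.0 | 82.7 | 97.6 (41/42) |
| Special descriptors | Examined Regional Nodes | 92.3 | 100 | 96.0 | 41.7 (5/12) |
| Special descriptors | Positive Regional Nodes | 92.3 | 100 | 96.0 | 58.3 (7/12) |
| Special descriptors | Tumor Size | 85.7 | 75.0 | 80.0 | 50.0 (6/12) |
| Special descriptors | TNM-based Staging | 100 | 75.0 | 85.7 | 100 (3/3) |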
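The Overall Accuracy column gives both a percentage and the counts behind it (n/N), so the percentages can be recomputed directly. The sketch below does that for every row. The counts and reported percentages are copied from the table; the variable names and the 0.05 tolerance (half a unit in the last reported decimal) are assumptions made for illustration.

```python
# (correct reports n, evaluated reports N, reported accuracy %) per descriptor,
# copied from the Overall Accuracy column of Table 4.
reported = {
    "Laterality": (27, 42, 64.3),
    "Behavior": (36, 42, 85.7),
    "Grade": (32, 42, 76.2),
    "Method of Assessment for Solid Tumors": (36, 42, 85.7),
    "Method of Assessment for Hematological Tumors": (42, 42, 100.0),
    "Diagnostic Procedure": (38, 42, 90.5),
    "Lymphovascular Invasion": (36, 42, 85.7),
    "Surgical Margins": (38, 42, 90.5),
    "Pulmonary Metastasis": (42, 42, 100.0),
    "Osseous Metastasis": (39, 42, 92.9),
    "Hepatic Metastasis": (41, 42, 97.6),
    "Brain Metastasis": (41, 42, 97.6),
    "Distant Lymph Nodes Metastasis": (41, 42, 97.6),
    "Other Metastasis": (41, 42, 97.6),
    "Examined Regional Nodes": (5, 12, 41.7),
    "Positive Regional Nodes": (7, 12, 58.3),
    "Tumor Size": (6, 12, 50.0),
    "TNM-based Staging": (3, 3, 100.0),
}

# Assumed tolerance: half a unit in the last reported decimal place.
TOLERANCE = 0.05

mismatches = []
for descriptor, (n, total, pct) in reported.items():
    recomputed = 100 * n / total
    if abs(recomputed - pct) > TOLERANCE:
        mismatches.append((descriptor, pct, round(recomputed, 1)))

print("Rows whose percentage disagrees with n/N:", mismatches)
```

The same check cannot be applied to the precision, recall, and f-score columns, because the table does not report the per-class counts those values were computed from.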