Table 2.
Classification performance of the disease identification algorithm^a.
Disease category | Campaigns in the reference set that mention disease category, n | Precision (95% CI) | Recall (95% CI) | F1 score | Accuracy (95% CI) |
Cardiovascular diseases | 82 | 0.92 (0.86-0.99) | 0.74 (0.65-0.84) | 0.82 | 0.94 (0.91-0.96) |
Endocrine diseases | 19 | 0.75 (0.54-0.96) | 0.63 (0.41-0.85) | 0.69 | 0.97 (0.96-0.99) |
Gastrointestinal diseases | 18 | 0.56 (0.33-0.79) | 0.56 (0.33-0.79) | 0.56 | 0.96 (0.94-0.98) |
Genitourinary diseases | 35 | 0.97 (0.90-1.03) | 0.80 (0.67-0.93) | 0.88 | 0.98 (0.97-0.99) |
Infections | 30 | 0.56 (0.41-0.71) | 0.77 (0.62-0.92) | 0.65 | 0.94 (0.91-0.96) |
Injuries and external causes | 53 | 0.69 (0.58-0.80) | 0.92 (0.85-1.00) | 0.79 | 0.94 (0.91-0.96) |
Mental health disorders | 20 | 0.48 (0.30-0.66) | 0.70 (0.50-0.90) | 0.57 | 0.95 (0.93-0.97) |
Musculoskeletal diseases | 45 | 0.64 (0.48-0.80) | 0.51 (0.37-0.66) | 0.57 | 0.91 (0.88-0.94) |
Neoplasms | 162 | 0.95 (0.91-0.98) | 0.98 (0.96-1.00) | 0.96 | 0.97 (0.95-0.99) |
Nervous system diseases | 66 | 0.88 (0.76-0.99) | 0.42 (0.31-0.54) | 0.57 | 0.90 (0.86-0.93) |
Respiratory diseases | 29 | 0.92 (0.81-1.03) | 0.76 (0.60-0.91) | 0.83 | 0.98 (0.96-0.99) |
^a The average precision, recall, F1 score, and accuracy values are 0.83, 0.77, 0.78, and 0.95, respectively. These averages are weighted by the number of campaigns in the reference set that mention each disease category. Classification performance is based on a comparison with 400 campaigns that were annotated by a team of expert coders.
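
The weighted averages reported in the footnote can be reproduced from the per-category rows. The following is a minimal sketch, not the authors' code: it assumes each metric is averaged with the campaign count n as the weight, and the names `ROWS`, `METRICS`, `weighted_averages`, and `f1_score` are hypothetical, introduced only for illustration. The values are transcribed from the table as rounded to two decimals.

```python
# Per-category values transcribed from Table 2:
# (n, precision, recall, F1 score, accuracy).
ROWS = {
    "Cardiovascular diseases": (82, 0.92, 0.74, 0.82, 0.94),
    "Endocrine diseases": (19, 0.75, 0.63, 0.69, 0.97),
    "Gastrointestinal diseases": (18, 0.56, 0.56, 0.56, 0.96),
    "Genitourinary diseases": (35, 0.97, 0.80, 0.88, 0.98),
    "Infections": (30, 0.56, 0.77, 0.65, 0.94),
    "Injuries and external causes": (53, 0.69, 0.92, 0.79, 0.94),
    "Mental health disorders": (20, 0.48, 0.70, 0.57, 0.95),
    "Musculoskeletal diseases": (45, 0.64, 0.51, 0.57, 0.91),
    "Neoplasms": (162, 0.95, 0.98, 0.96, 0.97),
    "Nervous system diseases": (66, 0.88, 0.42, 0.57, 0.90),
    "Respiratory diseases": (29, 0.92, 0.76, 0.83, 0.98),
}

METRICS = ("precision", "recall", "f1", "accuracy")


def weighted_averages(rows):
    """Average each metric, weighting every category by its campaign count n."""
    total_n = sum(values[0] for values in rows.values())
    return {
        metric: sum(values[0] * values[i] for values in rows.values()) / total_n
        for i, metric in enumerate(METRICS, start=1)
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


averages = weighted_averages(ROWS)
for metric in METRICS:
    print(f"{metric}: {averages[metric]:.2f}")
# precision: 0.83
# recall: 0.77
# f1: 0.78
# accuracy: 0.95
```

Under this weighting assumption, the computed values match the averages stated in the footnote. The `f1_score` helper can be used to spot-check individual rows; for example, `f1_score(0.92, 0.74)` rounds to the 0.82 listed for cardiovascular diseases.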