Table 3.
Smoking status (data mining algorithm) | Reference: Current smoker | Reference: Former smoker | Reference: Never smoker | Reference: No information | Total |
---|---|---|---|---|---|
Current smoker | 576 | 93 | 30 | 23 | 722 |
Former smoker | 71 | 265 | 19 | 4 | 359 |
Never smoker | 2 | 18 | 230 | 0 | 250 |
Not classified | 87 | 38 | 28 | 76 645 | 76 798 |
Total | 736 | 414 | 307 | 76 672 | 78 129 |
Current smoker: precision = 79.7% (95% CI 76.6–82.6%), recall = 78.3% (95% CI 75.1–81.2%), F1-score = 0.79. Former smoker: precision = 73.8% (95% CI 68.9–78.2%), recall = 64.0% (95% CI 59.2–68.6%), F1-score = 0.69. Never smoker: precision = 92.0% (95% CI 87.7–94.9%), recall = 74.9% (95% CI 69.6–79.6%), F1-score = 0.83. Named Entity Recognition of keywords associated with smoking: precision = 98.0% (95% CI 97.1–98.7%), recall = 89.5% (95% CI 87.8–91.0%), F1-score = 0.94.
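The per-class figures in the footnote follow from the counts in Table 3. As a minimal sketch, assuming rows are the algorithm's output and columns are the reference standard, precision is the diagonal cell divided by its row total, recall is the diagonal cell divided by its column total, and F1 is their harmonic mean. The names `MATRIX`, `LABELS`, and `class_metrics` are illustrative and do not come from the original study; the confidence intervals are not reproduced because the interval method is not stated here.

```python
# Counts from Table 3: rows = data mining algorithm, columns = reference.
# Row order: current, former, never, not classified.
# Column order: current, former, never, no information.
LABELS = ["Current smoker", "Former smoker", "Never smoker"]
MATRIX = [
    [576, 93, 30, 23],
    [71, 265, 19, 4],
    [2, 18, 230, 0],
    [87, 38, 28, 76645],
]


def class_metrics(matrix, k):
    """Return (precision, recall, F1) for class index k."""
    true_positives = matrix[k][k]
    predicted_total = sum(matrix[k])               # row total
    reference_total = sum(row[k] for row in matrix)  # column total
    precision = true_positives / predicted_total
    recall = true_positives / reference_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for k, label in enumerate(LABELS):
    p, r, f1 = class_metrics(MATRIX, k)
    print(f"{label}: precision = {p:.1%}, recall = {r:.1%}, F1-score = {f1:.2f}")
```

Run as written, the point estimates should agree with the footnote to within about 0.1 percentage point; any small difference is likely due to rounding in the published values.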