Table 3.
Long COVID diagnosis: recently applied data mining and NLP techniques.
| Study | Input data | AI method | Task | Output (%) |
|---|---|---|---|---|
| Miao et al. (2022) | Tweets | NLP | Analysis of reported Long COVID symptoms in terms of demographic, geographical, and temporal parameters | Accuracy: demographic categories 89, symptom categories 95 |
| Zhu et al. (2022) | Clinical notes | Pretrained BERT | Identification of Long COVID and potential computational phenotypes | Sensitivity score 88.1 |
| Scarpino et al. (2022) | Blogs | LDA and BERT | Extraction of discussion topics in the Italian narration of the COVID-19 pandemic | Accuracy of BERT 91.97 |
| Matharaarachchi et al. (2022) | Tweets | Association rule mining | Relationships between symptoms | Confidence 77 for lung/breathing problems and loss of taste vs. loss of smell |
| Wang et al. (2022) | Clinical notes | PASCLex (NLP) model | Identification of symptoms | Precision 94, recall 84 |
| Banda et al. (2021) | Tweets | NLP and SVM | Identification of symptoms | Accuracy 75 on a 20% random held-out test set |
| Déguilhem et al. (2022) | Tweets | Biterm Topic Modeling | Identification and co-occurrence of symptoms | Three major symptom co-occurrences: asthenia-dyspnea 35.3, asthenia-anxiety 22.5, asthenia-headaches 17.3 |
Rows are divided into BERT-based approaches (the first 3 rows) and other approaches. All reported measures keep the same number of decimal digits as in the original paper.