Table 2:
Summary of the datasets used in the experiments. To detect symptoms, we merge the NER datasets by combining their ADE and Indication spans. The % Positive column gives the percentage of tweets in each dataset that contain an ADE, Indication, Symptom, or Disease span.
Corpus | Source | Annotations used | Training | Test | % Positive |
---|---|---|---|---|---|
DS-NER | DailyStrength | NER spans (ADE, Indication) | 4720 | 1559 | 32% |
Tw-NER | | NER spans (ADE, Indication) | 1340 | 443 | 56% |
SMM4H-2020 | | MedDRA (ADE) | 1786 | 1123 | 100% |
HLP-ADE-v1 | | MedDRA (ADE) | 2276 | 1559 | 100% |
CADEC | AskAPatient | NER spans and MedDRA (ADE, Symptom, Drug, Disease) | 1000 | 250 | 100% |
Micromed | | NER spans (Symptom, Drug, Disease) | 500 | 165 | 44% |
TwiMed | | NER spans and UMLS (Symptom, Drug, Disease) | 400 | 145 | 82% |
MedNorm | Twitter, AskAPatient | MedDRA (Symptom, Drug, Disease) | 27,979 | - | 100% |