[Preprint]. 2022 Mar 21:2021.02.09.21251454. Originally published 2021 Feb 12. [Version 2] doi: 10.1101/2021.02.09.21251454

Table 2:

Summary of the datasets used in the experiments. To detect symptoms, we merge the NER datasets by combining ADE and Indication spans. The % Positive column gives the percentage of tweets in each dataset that contain ADE, Indication, Symptom, or Disease spans.

| Corpus | Source | Annotations used | Training | Test | % Positive |
|---|---|---|---|---|---|
| DS-NER | DailyStrength | NER spans (ADE, Indication) | 4720 | 1559 | 32% |
| Tw-NER | Twitter | NER spans (ADE, Indication) | 1340 | 443 | 56% |
| SMM4H-2020 | Twitter | MedDRA (ADE) | 1786 | 1123 | 100% |
| HLP-ADE-v1 | Twitter | MedDRA (ADE) | 2276 | 1559 | 100% |
| CADEC | AskAPatient | NER spans and MedDRA (ADE, Symptom, Drug, Disease) | 1000 | 250 | 100% |
| Micromed | Twitter | NER spans (Symptom, Drug, Disease) | 500 | 165 | 44% |
| TwiMed | Twitter | NER spans and UMLS (Symptom, Drug, Disease) | 400 | 145 | 82% |
| MedNorm | Twitter, AskAPatient | MedDRA (Symptom, Drug, Disease) | 27,979 | - | 100% |
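
The caption describes two operations: combining ADE and Indication spans into a single symptom class, and reporting the share of posts that contain at least one annotated span. A minimal sketch of both is given below. It assumes each post is represented as a list of span dictionaries with a `label` key; that representation, the function names `merge_to_symptom` and `percent_positive`, and the target label string `"Symptom"` are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List

Span = Dict[str, object]

# Labels folded into one class, as described in the caption.
# The exact label strings are an assumption for this sketch.
MERGED_LABELS = {"ADE", "Indication"}


def merge_to_symptom(spans: List[Span]) -> List[Span]:
    """Return a copy of the spans with ADE/Indication relabeled as 'Symptom'."""
    return [
        {**span, "label": "Symptom"} if span["label"] in MERGED_LABELS else dict(span)
        for span in spans
    ]


def percent_positive(posts: List[List[Span]]) -> float:
    """Percentage of posts that contain at least one annotated span."""
    if not posts:
        return 0.0
    positive = sum(1 for spans in posts if spans)
    return 100.0 * positive / len(posts)


# Toy data, invented only to show the call pattern.
toy_posts = [
    [{"start": 0, "end": 8, "label": "ADE"}],
    [],
    [{"start": 5, "end": 12, "label": "Indication"}],
    [],
]
merged_posts = [merge_to_symptom(spans) for spans in toy_posts]
share = percent_positive(merged_posts)
```

On the toy data above, two of the four posts carry a span, so `share` is 50.0. The input lists are copied rather than modified, which keeps the original annotations available for comparison.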