[Preprint]. 2022 Mar 21:2021.02.09.21251454. Originally published 2021 Feb 12. [Version 2] doi: 10.1101/2021.02.09.21251454

Table 2:

Summary of the datasets used in the experiments. To detect symptoms, we merge the NER datasets by combining their ADE and Indication spans. The % Positive column gives the percentage of posts in each dataset (tweets, or forum posts for DailyStrength and AskAPatient) that contain at least one ADE, Indication, Symptom or Disease span.

Corpus	Source	Annotations used	Training	Test	% Positive
DS-NER	DailyStrength	NER spans (ADE, Indication)	4720	1559	32%
Tw-NER	Twitter	NER spans (ADE, Indication)	1340	443	56%
SMM4H-2020	Twitter	MedDRA (ADE)	1786	1123	100%
HLP-ADE-v1	Twitter	MedDRA (ADE)	2276	1559	100%
CADEC	AskAPatient	NER spans and MedDRA (ADE, Symptom, Drug, Disease)	1000	250	100%
Micromed	Twitter	NER spans (Symptom, Drug, Disease)	500	165	44%
TwiMed	Twitter	NER spans and UMLS (Symptom, Drug, Disease)	400	145	82%
MedNorm	Twitter, AskAPatient	MedDRA (Symptom, Drug, Disease)	27,979	-	100%
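The caption describes two simple operations: relabelling ADE and Indication spans as symptom mentions, and computing the % Positive column. The following is a minimal Python sketch of both, assuming each post is represented as a list of character-offset spans. The `Span` class, `MERGE_MAP`, `merge_labels` and `percent_positive` are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass

# Hypothetical label mapping (assumption): ADE and Indication spans
# are both relabelled as Symptom; any other label is left unchanged.
MERGE_MAP = {"ADE": "Symptom", "Indication": "Symptom"}


@dataclass(frozen=True)
class Span:
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends (exclusive)
    label: str  # entity type, e.g. "ADE", "Indication", "Drug"


def merge_labels(spans: list[Span]) -> list[Span]:
    """Relabel spans via MERGE_MAP and drop exact duplicates.

    Duplicates can appear when an ADE span and an Indication span cover
    the same text; dict.fromkeys keeps the first occurrence in order.
    """
    relabelled = (Span(s.start, s.end, MERGE_MAP.get(s.label, s.label)) for s in spans)
    return list(dict.fromkeys(relabelled))


def percent_positive(docs: list[list[Span]], labels: set[str]) -> float:
    """Percentage of documents with at least one span whose label is in `labels`."""
    if not docs:
        return 0.0
    positive = sum(1 for spans in docs if any(s.label in labels for s in spans))
    return 100.0 * positive / len(docs)


if __name__ == "__main__":
    # Four toy posts: two contain ADE/Indication spans, two contain none.
    docs = [
        [Span(0, 8, "ADE")],
        [],
        [Span(3, 10, "Indication"), Span(12, 20, "ADE")],
        [],
    ]
    merged = [merge_labels(d) for d in docs]
    print(percent_positive(merged, {"Symptom"}))  # 50.0
```

In this toy example two of the four posts contain a merged Symptom span, so the sketch reports 50.0, which plays the role of a % Positive value. Deduplicating after relabelling is a design choice of this sketch; the caption does not say how overlapping ADE and Indication spans were handled.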