2017 Mar 7;8:11. doi: 10.1186/s13326-017-0115-3

Table 1.

Decisions that are made during the process of integrating sources that can influence downstream pharmacovigilance analyses

Data Type	Feature	Option for variability	Performance questions
Product labels	Product label outcome mention	Named entity performance (PPV and sensitivity)	Do improvements in entity recognition performance improve system recall and precision?
	Product label outcome mention	Section location (e.g., anywhere vs specific sections)	Does identifying which sections are more informative than others reduce noise?
	Frequency information	Threshold variation	Does incorporation of ADE frequency improve performance? What cut-off should be used?
Pharmacovigilance DBs (e.g. FAERS, MedEffect, VigiBase)	Minimum detectable relative risk	Threshold variation	What is the appropriate cut-off for MDRR? Is it HOI specific?
	Minimum detectable relative risk	Database(s) chosen	Does the database influence the value of MDRR for this task?
	Risk identification method	Disproportionality metric	What metric (e.g. PRR, EBGM, IC) leads to the best performance? Is it HOI specific?
	Number of cases in FAERS	Threshold variation	What is the appropriate cut-off for number of case reports?
Drug Indication DB	Indication listings in FDB	Yes/no and when mentioned	Does using on-label and off-label indication knowledge improve performance?
Indexed literature	Number of relevant publications from the indexed literature	Threshold variation	Is there an appropriate cut-off for number of publications? What is its variability relative to specific HOIs and drugs?
	Source of relevant publications from the indexed literature	Varying the combination of sources	Should we be selective about the sources used or choose all of them?
	Drug and outcome mention in relevant indexed literature	Named entity performance	Do improvements in entity recognition performance improve system recall and precision?
		Main MeSH terms vs supplemental	What is the value of MeSH supplemental terms relative to the primary index terms?
		Scientific discourse tag of the location of mention (e.g., intro, methods, results, conclusions)	Does limiting identification of drug-HOI co-mention to specifically tagged text excerpts improve performance?
		Publication type label (randomized trial, case report, etc.)	Should the publication type of the drug-HOI co-mention be tracked and possibly weighted to improve performance?
		Source of publication type label (Embase, MeSH)	Is one publication type indexing system better than the other for the question answering task, or should they be combined?
		Topic of the source publication based on latent semantic indexing	Does the use of tags assigned to text sources by latent semantic indexing improve system performance if used as a feature?
Observational health data (claims + EHR)	Minimum detectable relative risk	Threshold variation	What is the appropriate cut-off for MDRR? Is it HOI specific?
	Minimum detectable relative risk	Database(s) chosen	Does the database influence the value of MDRR for this task?
	Risk identification method	Analytic method	What method (e.g. disproportionality analysis, self-controlled case series, IC temporal pattern discovery, high-dimensional propensity score) leads to the best performance? Is it HOI specific?
	Cohort selection	Patient ethnicity, age, sex, co-morbidities, concurrent medications	Does cohort selection using these features affect model performance? What is the appropriate size and diversity of the cohort to reduce noise and bias?
	Drug exposure conditions	Length of exposure, dosage	Does selecting minimum exposure duration criteria and/or drug dosage information improve performance?
	Study replicability	Number of locations for confirming results	How many replicates of the study should be performed at different institutions?
	Observation period	Observation duration threshold	Does setting minimum observation period durations improve performance?

PPV: positive predictive value, OMOP: Observational Medical Outcomes Partnership, ADE: adverse drug event, MDRR: minimum detectable relative risk, HOI: health outcome of interest, DB: database, FAERS: Food and Drug Administration Adverse Event Reporting System, PRR: proportional reporting ratio, EBGM: empirical Bayes geometric mean, IC: information component, FDB: First Data Bank (commercial drug knowledge base), EHR: electronic health record
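The "Threshold variation" rows for pharmacovigilance databases can be made concrete with a small sketch. The Python below computes the proportional reporting ratio (PRR) from a 2x2 table of spontaneous reports and shows how the same drug-HOI pair is kept or dropped as the cut-offs change. It is an illustration only, not the method used in the article: the `ReportCounts` class, the `is_signal` helper, the example counts, and the cut-off values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportCounts:
    """2x2 table of spontaneous reports for one drug-HOI pair (hypothetical)."""
    a: int  # reports mentioning the drug and the HOI
    b: int  # reports mentioning the drug without the HOI
    c: int  # reports mentioning the HOI without the drug
    d: int  # reports mentioning neither the drug nor the HOI


def prr(counts: ReportCounts) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    if counts.a + counts.b == 0 or counts.c + counts.d == 0 or counts.c == 0:
        raise ValueError("PRR is undefined for these counts")
    drug_rate = counts.a / (counts.a + counts.b)
    other_rate = counts.c / (counts.c + counts.d)
    return drug_rate / other_rate


def is_signal(counts: ReportCounts, min_cases: int, min_prr: float) -> bool:
    """Keep the pair only if it passes both cut-offs (assumed decision rule)."""
    return counts.a >= min_cases and prr(counts) >= min_prr


# Made-up counts, used only to show how the cut-offs interact.
counts = ReportCounts(a=20, b=980, c=100, d=98900)

# Vary the two thresholds named in the table: case count and metric value.
for min_cases, min_prr in [(3, 2.0), (25, 2.0), (3, 25.0)]:
    kept = is_signal(counts, min_cases, min_prr)
    print(f"min_cases={min_cases}, min_prr={min_prr}: kept={kept}")
```

With these counts the pair passes only the first setting, which is the kind of sensitivity to cut-off choice that the performance questions in the table ask about. A different disproportionality metric could replace `prr` without changing the thresholding logic.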