2017 Mar 7;8:11. doi: 10.1186/s13326-017-0115-3

Table 1.

Decisions that are made during the process of integrating sources that can influence downstream pharmacovigilance analyses

| Data Type | Feature | Option for variability | Performance questions |
|---|---|---|---|
| Product labels | Product label outcome mention | Named entity performance (PPV and sensitivity) | Do improvements in entity recognition performance improve system recall and precision? |
| | | Section location (e.g., anywhere vs specific sections) | Does identifying which sections are more informative than others reduce noise? |
| | Frequency information | Threshold variation | Does incorporation of ADE frequency improve performance? What cut-off should be used? |
| Pharmacovigilance DBs (e.g., FAERS, MedEffect, VigiBase) | Minimum detectable relative risk | Threshold variation | What is the appropriate cut-off for MDRR? Is it HOI specific? |
| | | Database(s) chosen | Does the database influence the value of MDRR for this task? |
| | Risk identification method | Disproportionality metric | What metric (e.g., PRR, EBGM, IC) leads to the best performance? Is it HOI specific? |
| | Number of cases in FAERS | Threshold variation | What is the appropriate cut-off for number of case reports? |
| Drug Indication DB | Indication listings in FDB | Yes/no and when mentioned | Does using on-label and off-label indication knowledge improve performance? |
| Indexed literature | Number of relevant publications from the indexed literature | Threshold variation | Is there an appropriate cut-off for number of publications? What is its variability relative to specific HOIs and drugs? |
| | Source of relevant publications from the indexed literature | Varying the combination of sources | Should we be selective about the sources used or choose all of them? |
| | Drug and outcome mention in relevant indexed literature | Named entity performance | Do improvements in entity recognition performance improve system recall and precision? |
| | | Main MeSH terms vs supplemental | What is the value of MeSH supplemental terms relative to the primary index terms? |
| | | Scientific discourse tag of the location of mention (e.g., intro, methods, results, conclusions) | Does limiting identification of drug-HOI co-mention to specifically tagged text excerpts improve performance? |
| | | Publication type label (randomized trial, case report, etc.) | Should the publication type of the drug-HOI co-mention be tracked and possibly weighted to improve performance? |
| | | Source of publication type label (Embase, MeSH) | Is one publication type indexing system better than the other for the question answering task, or should they be combined? |
| | | Topic of the source publication based on latent semantic indexing | Does the use of tags assigned to text sources by latent semantic indexing improve system performance if used as a feature? |
| Observational health data (claims + EHR) | Minimum detectable relative risk | Threshold variation | What is the appropriate cut-off for MDRR? Is it HOI specific? |
| | | Database(s) chosen | Does the database influence the value of MDRR for this task? |
| | Risk identification method | Analytic method | What method (e.g., disproportionality analysis, self-controlled case series, IC temporal pattern discovery, high-dimensional propensity score) leads to the best performance? Is it HOI specific? |
| | Cohort selection | Patient ethnicity, age, sex, co-morbidities, concurrent medications | Does cohort selection using these features affect model performance? What is the appropriate size and diversity of the cohort to reduce noise and bias? |
| | Drug exposure conditions | Length of exposure, dosage | Does selecting minimum exposure duration criteria and/or drug dosage information improve performance? |
| | Study replicability | Number of locations for confirming results | How many replicates of the study should be performed at different institutions? |
| | Observation period | Observation duration threshold | Does setting minimum observation period durations improve performance? |

PPV: positive predictive value, OMOP: Observational Medical Outcomes Partnership, ADE: adverse drug event, MDRR: minimum detectable relative risk, HOI: health outcome of interest, DB: database, FAERS: Food and Drug Administration Adverse Event Reporting System, PRR: proportional reporting ratio, EBGM: empirical Bayes geometric mean, IC: information component, MeSH: Medical Subject Headings, FDB: First Databank (commercial drug knowledge base), EHR: electronic health record
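
Several rows of the table ask how a cut-off ("threshold variation") on a disproportionality metric or on the number of case reports changes performance. The sketch below illustrates one way such a question could be explored. It computes the proportional reporting ratio (PRR) from a 2x2 table of report counts, flags drug-HOI pairs that pass two cut-offs, and scores the flags against a reference set using PPV and sensitivity. This is not the authors' implementation: the names `Counts`, `prr`, and `evaluate`, the drug and HOI labels, the counts, and the reference labels are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 spontaneous-report counts for one drug-HOI pair (made-up data)."""
    a: int  # reports with the drug and the HOI
    b: int  # reports with the drug, without the HOI
    c: int  # reports with the HOI, without the drug
    d: int  # reports with neither


def prr(n: Counts) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    return (n.a / (n.a + n.b)) / (n.c / (n.c + n.d))


def evaluate(pairs, reference, min_cases, min_prr):
    """Flag pairs with >= min_cases reports and PRR >= min_prr, then score
    the flags against reference labels (True = known ADE).

    Returns (PPV, sensitivity); either is NaN when its denominator is zero.
    """
    flagged = {k for k, n in pairs.items()
               if n.a >= min_cases and prr(n) >= min_prr}
    true_pos = sum(1 for k in flagged if reference[k])
    positives = sum(1 for is_ade in reference.values() if is_ade)
    ppv = true_pos / len(flagged) if flagged else float("nan")
    sensitivity = true_pos / positives if positives else float("nan")
    return ppv, sensitivity


# Hypothetical counts and reference labels, for illustration only.
pairs = {
    ("drugA", "hoi1"): Counts(30, 970, 200, 98800),
    ("drugB", "hoi1"): Counts(2, 98, 228, 99672),
    ("drugC", "hoi1"): Counts(5, 995, 225, 98775),
    ("drugD", "hoi1"): Counts(4, 1996, 226, 97774),
}
reference = {
    ("drugA", "hoi1"): True,
    ("drugB", "hoi1"): True,
    ("drugC", "hoi1"): False,
    ("drugD", "hoi1"): False,
}

if __name__ == "__main__":
    # Sweep both cut-offs to see how PPV and sensitivity trade off.
    for min_cases in (1, 3):
        for min_prr in (2.0, 5.0):
            ppv, sens = evaluate(pairs, reference, min_cases, min_prr)
            print(f"cases>={min_cases} PRR>={min_prr}: "
                  f"PPV={ppv:.2f} sensitivity={sens:.2f}")
```

On this toy data, loosening the case-count cut-off recovers more reference positives at the cost of more false flags, which is the kind of trade-off the "Performance questions" column asks about. A real evaluation would use a curated reference set and could repeat the sweep per HOI to check whether the best cut-off is HOI specific.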