Table 1.
Decisions made during source integration that can influence downstream pharmacovigilance analyses
Data Type | Feature | Option for variability | Performance questions |
---|---|---|---|
Product labels | Product label outcome mention | Named entity performance (PPV and sensitivity) | Do improvements in entity recognition performance improve system recall and precision? |
Product labels | Product label outcome mention | Section location (e.g., anywhere vs specific sections) | Does identifying which sections are more informative than others reduce noise? |
Product labels | Frequency information | Threshold variation | Does incorporation of ADE frequency improve performance? What cut-off should be used? |
Pharmacovigilance DBs (e.g. FAERS, MedEffect, VigiBase) | Minimum detectable relative risk | Threshold variation | What is the appropriate cut-off for MDRR? Is it HOI specific? |
Pharmacovigilance DBs (e.g. FAERS, MedEffect, VigiBase) | Minimum detectable relative risk | Database(s) chosen | Does the database influence the value of MDRR for this task? |
Pharmacovigilance DBs (e.g. FAERS, MedEffect, VigiBase) | Risk identification method | Disproportionality metric | What metric (e.g. PRR, EBGM, IC) leads to the best performance? Is it HOI specific? |
Pharmacovigilance DBs (e.g. FAERS, MedEffect, VigiBase) | Number of cases in FAERS | Threshold variation | What is the appropriate cut-off for number of case reports? |
Drug Indication DB | Indication listings in FDB | Yes/no and when mentioned | Does using on-label and off-label indication knowledge improve performance? |
Indexed literature | Number of relevant publications from the indexed literature | Threshold variation | Is there an appropriate cut-off for number of publications? What is its variability relative to specific HOIs and drugs? |
Indexed literature | Source of relevant publications from the indexed literature | Varying the combination of sources | Should we be selective about the sources used or choose all of them? |
Indexed literature | Drug and outcome mention in relevant indexed literature | Named entity performance | Do improvements in entity recognition performance improve system recall and precision? |
Indexed literature | Drug and outcome mention in relevant indexed literature | Main MeSH terms vs supplemental | What is the value of MeSH supplemental terms relative to the primary index terms? |
Indexed literature | Drug and outcome mention in relevant indexed literature | Scientific discourse tag of the location of mention (e.g., intro, methods, results, conclusions) | Does limiting identification of drug-HOI co-mention to specifically tagged text excerpts improve performance? |
Indexed literature | Drug and outcome mention in relevant indexed literature | Publication type label (randomized trial, case report, etc.) | Should the publication type of the drug-HOI co-mention be tracked and possibly weighted to improve performance? |
Indexed literature | Drug and outcome mention in relevant indexed literature | Source of publication type label (Embase, MeSH) | Is one publication type indexing system better than the other for the question answering task, or should they be combined? |
Indexed literature | Drug and outcome mention in relevant indexed literature | Topic of the source publication based on latent semantic indexing | Does the use of tags assigned to text sources by latent semantic indexing improve system performance if used as a feature? |
Observational health data (claims + EHR) | Minimum detectable relative risk | Threshold variation | What is the appropriate cut-off for MDRR? Is it HOI specific? |
Observational health data (claims + EHR) | Minimum detectable relative risk | Database(s) chosen | Does the database influence the value of MDRR for this task? |
Observational health data (claims + EHR) | Risk identification method | Analytic method | What method (e.g. disproportionality analysis, self-controlled case series, IC temporal pattern discovery, high-dimensional propensity score) leads to the best performance? Is it HOI specific? |
Observational health data (claims + EHR) | Cohort selection | Patient ethnicity, age, sex, co-morbidities, concurrent medications | Does cohort selection using these features affect model performance? What is the appropriate size and diversity of the cohort to reduce noise and bias? |
Observational health data (claims + EHR) | Drug exposure conditions | Length of exposure, dosage | Does selecting minimum exposure duration criteria and/or drug dosage information improve performance? |
Observational health data (claims + EHR) | Study replicability | Number of locations for confirming results | How many replicates of the study should be performed at different institutions? |
Observational health data (claims + EHR) | Observation period | Observation duration threshold | Does setting minimum observation period durations improve performance? |
PPV: positive predictive value, OMOP: Observational Medical Outcomes Partnership, ADE: adverse drug event, MDRR: minimum detectable relative risk, HOI: health outcome of interest, DB: database, FAERS: Food and Drug Administration Adverse Event Reporting System, EBGM: empirical Bayes geometric mean, IC: information component, FDB: First Databank (commercial drug knowledge base), EHR: electronic health record
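
The threshold-variation questions in the pharmacovigilance rows of Table 1 could be explored with a simple parameter sweep, sketched below. This is an illustration only, not the evaluation procedure used by the authors. It assumes the proportional reporting ratio (PRR) in its standard 2x2 form as the disproportionality metric. The names `Counts`, `prr`, `precision_recall`, and `sweep` are hypothetical, and the drug-HOI pairs, report counts, and reference set of known positives are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 spontaneous-report counts for one drug-HOI pair (hypothetical layout)."""
    a: int  # reports with the drug and the HOI
    b: int  # reports with the drug, without the HOI
    c: int  # reports without the drug, with the HOI
    d: int  # reports with neither


def prr(n: Counts) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].

    Assumes non-empty row margins (a + b > 0 and c + d > 0).
    """
    if n.c == 0:
        return float("inf")  # no comparator reports mention the HOI
    return (n.a / (n.a + n.b)) / (n.c / (n.c + n.d))


def precision_recall(flagged: set, positives: set) -> tuple:
    """Precision (PPV) and recall (sensitivity) of flagged pairs vs. a reference set."""
    true_pos = len(flagged & positives)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(positives) if positives else 0.0
    return precision, recall


def sweep(pairs: dict, positives: set, prr_cutoffs, min_case_cutoffs) -> list:
    """Score every (PRR cut-off, minimum case count) combination."""
    results = []
    for prr_min in prr_cutoffs:
        for min_cases in min_case_cutoffs:
            flagged = {
                name for name, n in pairs.items()
                if n.a >= min_cases and prr(n) >= prr_min
            }
            precision, recall = precision_recall(flagged, positives)
            results.append((prr_min, min_cases, precision, recall))
    return results


# Made-up counts for three hypothetical drug-HOI pairs.
pairs = {
    "drugA-HOI1": Counts(a=20, b=980, c=100, d=98900),
    "drugB-HOI1": Counts(a=2, b=498, c=118, d=99382),
    "drugC-HOI1": Counts(a=8, b=4992, c=112, d=94888),
}
# Made-up reference set of pairs treated as true associations.
positives = {"drugA-HOI1", "drugB-HOI1"}

results = sweep(pairs, positives, prr_cutoffs=[1.0, 2.0], min_case_cutoffs=[1, 3])
for prr_min, min_cases, precision, recall in results:
    print(f"PRR>={prr_min} cases>={min_cases}: "
          f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, raising the PRR cut-off removes a false positive, while raising the minimum case count drops a true positive that has few reports. Whether real data show the same trade-off, and whether it differs by HOI, is exactly what the table's questions ask.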