2022 May 17;45(5):459–476. doi: 10.1007/s40264-022-01155-6

Table 1.

Data sources for pharmacovigilance, analytical approaches, advantages, and biases

Analytical approaches	Pharmacovigilance tasks	Advantages and biases
Spontaneous reporting system
Association rule mining [25, 26]	Drug–event pair extraction [11]; ADE detection [12, 25–28, 31, 32]; ADE prediction (post-marketing) [71, 72]	Advantages: (1) Large volume of data collected worldwide, which creates potential for training machine learning models; (2) Provides related information such as demographic and indication data; (3) More effective at detecting rare ADEs; (4) Publicly accessible. Biases: (1) No population denominator (the number of people taking the medication), so ADE incidence rates cannot be calculated and the ability to support causal evaluation is limited; (2) Subject to under-reporting and stimulated reporting, which may bias machine learning models; (3) Lower reporting rates for older products; (4) May contain duplicate reports; (5) Reporters have diverse backgrounds (pharmaceutical companies, physicians, patients, and lawyers), which complicates data standardization and may undermine the transportability of machine learning models; (6) Data collection takes a long time, so ADE detection may be delayed.
Disproportionality [27, 28, 31, 32]
Network analysis [12]
Clustering [11]
SVM, Bayesian classifier, decision tree and/or Random Forest [71, 72]
RWD (EHRs and registries)
Disproportionality [6, 7]	Drug–event pair extraction [6, 7, 73]; ADE detection [36, 41–44, 64, 74]; ADE prediction (post-marketing) [63, 75]	Advantages: (1) Provides a population denominator of patients who have taken the same medication, which enables study designs for causal effect estimation; (2) Data quality in well-curated RWD databases is better than in SRS; (3) Fewer duplicate records and less missing data in well-curated RWD databases; (4) Lower rate of unreported adverse events; (5) RWD databases can provide more complete clinical information, such as laboratory test results, which supports better causal inference than SRS. Biases: (1) Smaller sample size than SRS, which may diminish the predictive power of machine learning models; (2) EHRs contain patients' protected health information, so they cannot be made public and are difficult to share between institutions; (3) EHRs mainly record drug use in the hospital, so they work better for inpatient than for outpatient ADE detection, which may diminish the generalizability of machine learning models.
Cohort/case-based study [41, 42, 74]
Sequence/temporal analysis [36, 43, 44]
SVM, Bayesian classifier, decision tree and/or Random Forest [75, 76]
NLP relation extraction [10, 73]
Neural network [63]
Social media
Association rule mining [46]	Drug–event pair extraction [46, 56, 69, 77, 78]; ADE detection [54]	Advantages: (1) Huge and rapidly growing volume of data, which creates potential for training machine learning models; (2) Open access; (3) Content is patient-centric; (4) Enables “real-time” ADE monitoring. Biases: (1) Content is not written by experts, which may affect data quality and reliability; (2) Extracting all ADE-related data from text with NLP is challenging, yet NLP is an essential step before any machine learning task or causal inference paradigm can be applied; (3) ADE incidence rates cannot be calculated, and the ability to support causal evaluation is limited; (4) Findings still need to be confirmed by other evidence or analysis; (5) Ethical issues may exist.
SVM, Bayesian classifier, decision tree and/or Random Forest [54, 56, 69, 77]
NLP relation extraction [69]
Neural network [56, 78]
Biomedical literature
Clustering [70]	Drug–event pair extraction [49, 50]; ADE detection [70]	Advantages: (1) Better data quality and reliability; (2) Literature is easily accessible. Biases: (1) Smaller data volume than social media, which may diminish the predictive power of machine learning models; (2) Poorer timeliness because of the peer-review and publishing process; (3) Detected ADEs still need to be confirmed by other evidence or analysis.
SVM, Bayesian classifier, decision tree and/or Random Forest [70]
NLP relation extraction [8, 65]
Neural network [49, 50]
Knowledge bases
SVM, Bayesian classifier, decision tree and/or Random Forest [57–59, 79]	ADE prediction (pre-marketing) [57–59, 66, 79]	Advantages: (1) Most of the databases are open to the public; (2) Better data structure and level of data standardization, which creates potential for training machine learning models. Biases: (1) A complicated paradigm is needed to integrate and analyze the data; (2) The graph structures in knowledge bases lack causal components, making causal interpretation difficult; (3) Many false-positive results may reduce prediction accuracy; (4) ADE prediction results are based on theoretical algorithms and need RWD or other evidence to confirm them.
Neural network [66]	ADE prediction (pre-marketing) [57–59, 66, 79]

ADE: adverse drug event; EHR: electronic health record; NLP: natural language processing; RWD: real-world data; SRS: spontaneous reporting system; SVM: support vector machine
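
To illustrate the disproportionality approach listed for spontaneous reporting systems in Table 1, the following minimal sketch computes two commonly used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2×2 table of report counts. The class and function names (`ReportCounts`, `prr`, `ror_with_ci`) and the counts themselves are hypothetical and chosen only for illustration; they are not taken from any of the cited studies. Because an SRS has no population denominator, both measures describe how disproportionately an event is reported with a drug, not how often it occurs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportCounts:
    """2x2 table of report counts for one drug-event pair (hypothetical layout)."""
    a: int  # reports mentioning the drug and the event
    b: int  # reports mentioning the drug without the event
    c: int  # reports mentioning the event without the drug
    d: int  # reports mentioning neither


def prr(t: ReportCounts) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror_with_ci(t: ReportCounts, z: float = 1.96) -> tuple[float, float, float]:
    """Reporting odds ratio (a*d)/(b*c) with a Wald interval on the log scale."""
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


# Made-up counts for illustration only (assumption, not real SRS data).
counts = ReportCounts(a=20, b=980, c=100, d=98900)
ror, lower, upper = ror_with_ci(counts)
print(f"PRR = {prr(counts):.2f}")
print(f"ROR = {ror:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A lower confidence bound above 1 is often treated as a signal worth further review. Consistent with the biases listed in the table, such a signal still needs confirmation from other evidence, and all four cells must be non-zero for these formulas to be defined.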