Abstract
Introduction
Individual case reports are essential to identify and assess previously unknown adverse effects of medicines. On these reports, information on adverse events (AEs) and drugs is encoded in hierarchical terminologies. Encoding differences may hinder the retrieval and analysis of clinically related reports relevant to a topic of interest. Recent studies have explored the use of data-driven semantic vector representations to support analysis of pharmacovigilance data.
Objective
This study aims to evaluate the stability and clinical relatedness of vigiVec, a semantic vector representation for codes of AEs and drugs.
Methods
vigiVec is a published adaptation to pharmacovigilance of the publicly available Word2Vec model, applied to structured data instead of free text. It provides vector representations for MedDRA® Preferred Terms and WHODrug Global active ingredients, learned from reporting patterns in VigiBase, the WHO global database of adverse event reports for medicines and vaccines. For this study, a 20-dimensional Skip-gram architecture with window size 250 was used. Our evaluation focused on nearest neighbors identified by the cosine similarity of vigiVec vector representations. Clinical relatedness was measured through term intruder detection, whereby a medical doctor was tasked to identify a randomly selected term—the intruder—included among the four nearest neighbors to a specific AE or drug. Stability was measured as the average overlap in the ten nearest neighbors for each AE or drug, in repeated fittings of vigiVec.
Results
Among the ten nearest neighbors, 1.8 AEs on average belonged to the same MedDRA High Level Term (HLT; e.g., coagulopathies), and 1.3 drugs belonged to the same Anatomical Therapeutic Chemical level 3 (ATC-3; e.g., opioids). In the intruder detection task, when neighbors and intruders were both chosen from the same HLT, the intruder detection rate was 46%. When selected from different HLTs, it was 79%. By random chance, we should expect 20% (1 in 5). Corresponding rates for drugs were 42% in same ATC-3 and 65% in different ATC-3. The stability of nearest neighbors was 80% for AEs and 64% for drugs.
Conclusion
Nearest neighbors identified with vigiVec are stable and show a high level of clinical relatedness. They are often from different parts of the existing hierarchies and complement these.
Supplementary Information
The online version contains supplementary material available at 10.1007/s40264-024-01509-2.
Key Points
| Pharmacovigilance relies on information encoded in structured terminologies, whose hierarchical structures can be complemented by data-driven identification of clinically related adverse events and drugs. |
| vigiVec is an artificial intelligence solution for pharmacovigilance providing semantic vector representations of adverse events and drugs based on reporting patterns in VigiBase, the WHO global database of adverse event reports for medicines and vaccines. |
| This study’s systematic evaluation demonstrates that the semantic representations of vigiVec are stable and show a high level of clinical relatedness, providing a foundation for new and improved capture and analysis of pharmacovigilance data. |
Introduction
For early detection of previously unknown adverse drug reactions, individual case reports of suspected harm from medicinal products (hereafter ‘drugs’) are the primary data source [1–3]. Individual case reports describe adverse events (AEs) and drugs in free text and associated medical codes. Specifically, AEs tend to be encoded using the Medical Dictionary for Regulatory Activities (MedDRA®) [4] and drugs are categorized using classification terminologies such as the Anatomical Therapeutic Chemical (ATC) classification system [5] or the WHODrug Global international reference for medicinal product information [6]. The use of hierarchical terminologies facilitates data retrieval, aggregation, and analysis. However, there will often be differences in the encoding of clinically related AEs, depending on the clinical presentation, capture, and encoding of each patient’s experience [7]. Some of this variability in coding practices will be systematic. There may, for example, be encoding preferences associated with specific time periods, geographic regions, or types of reporter (e.g., patients, physicians). As a result, analyses may fail to capture all relevant reports related to a specific clinical topic, unless data retrieval relies on comprehensive database queries that account for the relatedness of codes placed within distinct hierarchical branches of standardized terminologies. When analyzing individual case reports, it is important to consider the reporting context and potential ambiguities when interpreting the coded information. This is accentuated by the lack of formal definitions of AE codes and drug codes within the standard terminologies, even though some information can be inferred from their place in the hierarchies.
An active area of research is whether semantic relationships (i.e., meaningful relationships) between codes for AEs and drugs can be inferred from AE reporting patterns in a data-driven manner [8–12]. Semantic representations of coded AEs may complement the information embedded in medical terminologies and highlight AEs used in similar contexts, even when they are positioned in different parts of the terminology (e.g., blood pressure increased in the investigations class and hypertension in the diagnosis class of MedDRA). This could help support pharmacovigilance experts in several phases of signal management. In signal validation, the initial human review of identified signals, domain experts often search for additional case reports related to the medical concept of interest that may have been encoded with slightly different codes. Here, AE codes may be proposed based on semantic similarity with the AE of interest. Semantic relationships may also be combined with natural language processing of Structured Product Labels or the scientific literature to alert assessors to relevant existing knowledge about adverse drug reactions. There may also be opportunities to improve statistical signal detection by incorporating semantic representations of AEs (or drugs), for example, by rewarding consistent reporting patterns for semantically similar AEs (or drugs). Data-driven semantic similarity may even provide a basis for approaches like hierarchical Bayes [13] or tree-based scan statistics [14], which would otherwise be based on the standard MedDRA hierarchy. Beyond that, semantic representations could be combined with named entity recognition to enable improved encoding of AEs or drugs upon data entry. Here, closely related AE terms could be suggested to the reporter based on their initial input to help them select the best terms, either by replacing the initial code with a more appropriate one or by suggesting additional related codes.
Finally, semantic representations of AEs and drugs could assist domain experts in the creation of new custom groupings within the standard terminologies. Of note, all the mentioned cases are such that computational methods would support and enhance (but not replace) human decision making by individuals with in-depth insights into the breadth and structure of the medical terminologies.
Earlier initiatives seeking to derive semantic relations from AE reporting patterns include the use of Hopfield neural networks [8] and bi-clustering methods that identify sets of AEs and drugs with strong pairwise associations [9]. Cluster analysis of individual case reports based on co-reported AEs has also been proposed to identify reporting patterns for certain subsets of patients [15]. More recently, word embedding models from natural language processing have been adapted to model AE and drug codes. Specifically, Gattepaille [10] fitted a Word2Vec model to reporting patterns of AE and drug codes in VigiBase, the WHO global database of adverse event reports for medicines and vaccines [16], providing the basis for the vigiVec model that is the focus of this study. vigiVec is trained to predict AEs and drugs based on all other encoded AEs and drugs in the same report. The predictions themselves are not of interest. Instead, the resulting weight matrices of the trained neural network are used as semantic representations for AE codes and drug codes. Independently, Portanova et al. adapted Word2Vec to reporting patterns in the publicly available version of the FDA AE Reporting System (FAERS) in a model called aer2vec [11]. Unlike vigiVec, aer2vec focuses on predicting either drugs based on the co-reported AEs or AEs based on the co-reported drugs, using the resulting probabilities for signal detection, while the weight matrices are not central to their analysis.
The aim of this paper is to evaluate the stability and clinical relatedness of semantic vector representations for AEs and drugs generated by vigiVec.
Method
We performed a systematic evaluation of stability and clinical relatedness of the vigiVec model, after optimizing its hyperparameters.
Dataset
As of June 2022, VigiBase held 31 million reports, excluding suspected duplicates, from over 150 countries, with the earliest report dating back to 1967. Each case report consists of several structured fields for reporting data on the patient, the drugs used, the observed AEs, dates and/or durations for the drugs and events, etc. In the structured fields, AEs are encoded using MedDRA [4] and drugs are encoded using the WHODrug global dictionary [6]. For example, the coded AEs and drugs for a single case report in VigiBase may be: asthma [MedDRA 10003553]; paracetamol [WHODrug 000200]; chest pain [MedDRA 10008479].
In this study, the vocabulary of codes derived from VigiBase (as of June 1, 2022) comprised a total of 50,668 unique codes, 22,384 of which were MedDRA PT codes (Version 25.0) and 28,284 of which were WHODrug active ingredient level codes (March 2022 Version). In this data set, the median number of MedDRA Preferred Terms per report was 2 (with 90% inter-percentile range, IPR: 1–7) and the median number of WHODrug active ingredients was 1 (IPR: 1–7). Considering AEs and drugs together, the median number of codes per report was 3 (IPR: 2–12).
For some of our evaluations and analyses, we utilized the structures of the hierarchical terminologies. For AE analyses, we used MedDRA High Level Terms (HLTs) and Standardised MedDRA Queries (SMQs). The HLT groups related PTs by anatomy, pathology, physiology, etiology, or function (e.g., allergic conditions [10027654]). SMQs are manually curated groups of PTs related to specific medical conditions or areas of interest (e.g., anaphylactic reaction). For drug analyses, we utilized ATC Level 3 (ATC-3) and WHODrug Standardised Drug Groupings (SDGs) [6]. ATC-3 groups drugs based on chemical, pharmacological and therapeutic characteristics (e.g., N02B other analgesics and antipyretics). SDGs bring together medicinal products with shared properties (e.g., anxiolytics or drugs interacting with CYP2C8), based on factors such as indication, chemical properties, pharmacodynamic properties or pharmacokinetic properties.
vigiVec
vigiVec is an adaptation of the distributional semantic method Word2Vec [17, 18] applied to pharmacovigilance data. vigiVec was built using the Gensim Word2Vec model [19], version 4.2.0. The original vigiVec implementation is described in a conference paper by Gattepaille [10].
The original Word2Vec creates vector representations of words (sets of real numbers for each word) from an unlabeled corpus. It learns word representations by training a shallow neural network with a single hidden layer. The resulting neural network weights are used as vector representations for words. These vectors can be seen as coordinates, ideally placing words with similar meanings close together in space. Word2Vec has two possible architectures, Continuous Bag of Words (CBOW) and Skip-gram. In both architectures, Word2Vec applies a sliding window over a sentence, where the number of words in the window, i.e., the context, is decided by the hyperparameter window size. CBOW predicts the center word based on the average of the vectors of the words in the context, whereas Skip-gram creates tasks to predict the individual words in the context given the center word. The number of nodes in the hidden layer determines the dimensions of the resulting vectors, referred to as vector size. As low-frequency words have less data to be trained on, the parameter min count can be set to discard words below a set threshold from the training.
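To make the difference between the two architectures concrete, the following sketch enumerates the prediction tasks that each would derive from one tokenized sentence. It is an illustration only: the function names are hypothetical, and actual implementations add details that are omitted here (for example, subsampling of frequent words).

```python
from typing import List, Tuple


def context_of(tokens: List[str], center: int, window: int) -> List[str]:
    """Tokens within `window` positions of the center token, excluding the center."""
    start = max(0, center - window)
    stop = min(len(tokens), center + window + 1)
    return [tokens[j] for j in range(start, stop) if j != center]


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one (center word, context word) task per context word."""
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_of(tokens, i, window)
    ]


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, center word) task per center word."""
    return [(context_of(tokens, i, window), tokens[i]) for i in range(len(tokens))]


sentence = ["the", "beautiful", "ocean"]
skipgram = skipgram_examples(sentence, window=1)
cbow = cbow_examples(sentence, window=1)
```

With a window covering the whole sentence, every token becomes context for every other token, which is the situation that arises when a large window is applied to short sequences.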
vigiVec, instead, creates vector representations for AE codes and drug codes, jointly referred to as codes. These codes share a single vector space. For the modelling of vigiVec, Word2Vec was trained using VigiBase as the corpus, treating each case report’s coded AEs and drugs as a sentence and considering each respective AE and drug code as its tokens, e.g., treating MedDRA chest pain 10008479 as a token in fitting the model. For an illustration of the correspondence between Word2Vec and vigiVec, see Table 1. One important difference is that Word2Vec utilizes the order of the tokens, but as there is no generally meaningful ordering of the coded drugs and AEs on the reports, this was randomized in vigiVec for reports with greater numbers of codes than could be covered by the context.
Table 1.
Correspondence between concepts in vigiVec and the underlying Word2Vec model
| Word2Vec | vigiVec |
|---|---|
| Tokens, e.g., “beautiful” or “ocean” | Codes (AEs and drugs), e.g., “10008479” representing chest pain or “000200” representing paracetamol |
| Sentence, e.g., “The sun was setting over the beautiful ocean” | Codes from a report in random order, e.g., “10008479 000200 001092 10013968” |
| Corpus, e.g., all text in Wikipedia | All reports in VigiBase |
During model search and selection, we assessed different model configurations according to stability and correlation with the University of Minnesota Semantic Relatedness Standard (UMNSRS) and identified the Skip-gram model architecture with a vector size of 20, a window size of 250 and min count of 1 as our optimized configuration of vigiVec. For a detailed description of the model search and selection process and findings, please refer to Section 1 of the supplementary material.
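As a rough sketch of how such a configuration could be expressed with Gensim 4.x, the code below prepares report "sentences" and passes the reported hyperparameters to `Word2Vec`. The helper names, the toy reports, the seed handling, and the rule used to decide when a report is shuffled (more codes than the window size) are assumptions made for illustration; they are not taken from the vigiVec implementation.

```python
import random
from typing import Iterable, List


def reports_to_sentences(
    reports: Iterable[List[str]], window: int, seed: int = 0
) -> List[List[str]]:
    """Turn each report's codes into one 'sentence' of string tokens.

    Assumption: a report is shuffled only when it holds more codes than
    the window size, since code order carries no general meaning.
    """
    rng = random.Random(seed)
    sentences = []
    for codes in reports:
        tokens = list(codes)
        if len(tokens) > window:
            rng.shuffle(tokens)
        sentences.append(tokens)
    return sentences


def fit_embedding(sentences: List[List[str]], seed: int = 1):
    """Fit a Skip-gram model with the hyperparameters reported in the text."""
    from gensim.models import Word2Vec  # third-party; imported only when called

    return Word2Vec(
        sentences=sentences,
        sg=1,            # Skip-gram architecture
        vector_size=20,  # dimensions of the resulting vectors
        window=250,      # context window size
        min_count=1,     # keep every code that occurs at least once
        seed=seed,       # a different seed gives a different initialization
        workers=1,       # single worker thread, to limit run-to-run variation
    )


# Toy reports with MedDRA and WHODrug codes as string tokens (illustrative only).
toy_reports = [["10003553", "000200", "10008479"], ["10008479", "000200"]]
toy_sentences = reports_to_sentences(toy_reports, window=250)
```

Calling `fit_embedding(toy_sentences)` would return a fitted model whose `wv` attribute holds one vector per code; fitting it twice with different seeds gives the kind of repeated fittings that a stability analysis compares.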
Evaluation Metrics
We performed a systematic evaluation of the optimized vigiVec configuration focused on stability and clinical relatedness. Since most of our use cases involve the identification of nearest neighbors of the same kind (either AEs or drugs), our evaluations follow the same approach. Throughout the paper, 'nearest' is defined as having the highest cosine similarity, restricted to codes of the same kind (i.e., either AEs or drugs). Cosine similarity is the normalized inner product of two vector representations, measuring the extent to which they point in the same direction.
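The definition of 'nearest' used here can be illustrated with a small, dependency-free sketch. The function names, the toy two-dimensional vectors, and the `kinds` lookup are hypothetical; the vectors are assumed to be non-zero.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(
    code: str,
    vectors: Dict[str, Sequence[float]],
    kinds: Dict[str, str],
    k: int,
) -> List[Tuple[str, float]]:
    """The k codes of the same kind as `code` with the highest cosine similarity."""
    scored = [
        (other, cosine_similarity(vectors[code], vectors[other]))
        for other in vectors
        if other != code and kinds[other] == kinds[code]
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors; the vectors evaluated in this study have 20 dimensions.
vectors = {
    "AE_1": [1.0, 0.0],
    "AE_2": [0.9, 0.1],
    "AE_3": [0.0, 1.0],
    "DRUG_1": [1.0, 0.05],
}
kinds = {"AE_1": "AE", "AE_2": "AE", "AE_3": "AE", "DRUG_1": "drug"}
neighbors = nearest_neighbors("AE_1", vectors, kinds, k=2)
```

In this toy example the drug vector lies very close to `AE_1` but is excluded from its neighbors because it is of a different kind.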
For clinical relatedness we adapted the term intruder approach previously proposed for topic modelling [15, 20], and previously used in at least one evaluation of word embeddings [21].
Stability
A desired attribute of the vector space is consistency across random initializations of the neural network weights. The absolute positions of the vectors need not be the same between fittings, but their relative positions should be. With that in mind, we define the stability of the neighborhood for a single AE code as:
$$S_i = \frac{\left| N_i^{(1)} \cap N_i^{(2)} \right|}{k}$$
where $N_i^{(1)}$ and $N_i^{(2)}$ are the sets of $k$ nearest AE code neighbors to the $i$th AE code in two separate vector representations trained with the same configuration, but with different random initializations. The stability of the neighborhood of a single drug code is defined equivalently. We define our overall stability metric as the average of $S_i$ over all included AE codes, ranging from 0 to 1. We used $k=10$ in all stability evaluations reported in this study, based on our hypothesis that smaller sets of nearest neighbors will typically be preferred in our use cases.
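A minimal sketch of this neighborhood-overlap computation is given below. It assumes that the overlap between the two neighbor sets is divided by k, so that each score lies between 0 and 1; the function names and toy neighbor lists are hypothetical.

```python
from typing import Dict, List


def neighborhood_stability(
    neighbors_a: List[str], neighbors_b: List[str], k: int
) -> float:
    """Share of the k nearest neighbors of one code that two fittings agree on."""
    return len(set(neighbors_a[:k]) & set(neighbors_b[:k])) / k


def overall_stability(
    run_a: Dict[str, List[str]], run_b: Dict[str, List[str]], k: int
) -> float:
    """Average neighborhood stability over all codes present in both fittings."""
    codes = sorted(set(run_a) & set(run_b))
    return sum(neighborhood_stability(run_a[c], run_b[c], k) for c in codes) / len(codes)


# Toy neighbor lists from two fittings with different random initializations.
run_a = {"x": ["a", "b", "c"], "y": ["a", "b", "c"]}
run_b = {"x": ["a", "b", "d"], "y": ["c", "b", "a"]}
stability = overall_stability(run_a, run_b, k=3)
```

Because sets are compared, the order of the neighbors within the top k does not affect the score: code `y` above is fully stable even though its neighbors are ranked differently in the two fittings.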
A benefit of this stability metric is that since each AE and drug code has its own stability estimate we could analyze the stability for different groups of AEs and drugs separately, to further understand how their characteristics impacted the stability of the resulting vectors.
Since the same stability metric was used for both model search and selection, as well as performance evaluation, we used a random sample that included half of the AE codes and half of the drug codes with at least one report in VigiBase for model search and selection. The remaining codes were then used for performance evaluation.
Term Intruder Detection
Clinical relatedness was evaluated through a term intruder detection task. For a specific AE code (or drug code), we selected the four nearest neighbors along with a random intruder and scrambled the order. We then tasked a medical doctor (co-author JFC) to highlight the most likely intruder, quantifying clinical relatedness as the fraction of correctly highlighted intruders. The medical doctor performing the term intruder detection was blinded to the preparation of the task. The idea is that the more often the expert is able to identify the intruder, the more clinically coherent the neighbors are. To ensure that the term intruder task was not overly reliant on the perspective of a single domain expert, we had a second medical doctor complete the task. For a comparison of the term intruder detection task results between assessors, please refer to the supplementary material.
To evaluate the ability of vigiVec to reflect clinical relatedness both outside and within the existing hierarchical groups, separate tasks were created where both neighbors and intruders were selected either within or outside the hierarchical groups of the AE (or drug) of interest. See Fig. 1 for detailed description and examples.
Fig. 1.
Example of term intruder detection tasks for nearest neighbors, within and outside HLT, for a specific AE (“Peripheral coldness”). In the “within HLT” task, the four nearest neighbors to “Peripheral coldness” are selected, all belonging to its HLT: “Feelings and sensations”. Then, a random, non-nearest neighbor, intruder, “Hangover”, belonging to the same HLT, is added. In the “outside HLT” task, the four nearest neighbors to “Peripheral coldness” are selected, none of them belonging to its HLT. Then a random, non-nearest neighbor, intruder, “Psychomotor hyperactivity”, which falls outside the HLT of “Peripheral coldness”, is added
HLT High-Level Term
The term intruder detection evaluation used 150 random AEs and 150 random drugs. To ensure their diversity, we sampled from three different bins of overall report counts for the AE and drug in total in VigiBase (subjectively set to fewer than 20 reports, between 20 and 2000 reports, and above 2000 reports) and from different SOCs or parent ATC classes. We also required that the overall report count of the intruder was within the range of report counts for the four nearest neighbors and the selected AE. For drugs, the two tasks were analogously created based on ATC-3.
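The construction of a single intruder task, including the constraint on the intruder's report count, might be sketched as follows. This is an illustration under stated assumptions: the function names, toy codes, and report counts are hypothetical, the handling of bin boundaries (exactly 20 or 2000 reports) is a guess, and restricting neighbors and intruder candidates to within or outside a given HLT or ATC-3 group is left to the caller.

```python
import random
from typing import Dict, List, Optional


def frequency_bin(report_count: int) -> str:
    """Assign a code to one of three report-count bins."""
    if report_count < 20:
        return "low"
    if report_count <= 2000:
        return "medium"
    return "high"


def build_intruder_task(
    focus: str,
    neighbors: List[str],
    candidates: List[str],
    report_counts: Dict[str, int],
    rng: random.Random,
) -> Optional[Dict[str, object]]:
    """Mix the four nearest neighbors with one random intruder, in scrambled order.

    The intruder's report count must lie within the range spanned by the
    focus term and its four nearest neighbors. Returns None if no candidate qualifies.
    """
    top4 = neighbors[:4]
    counts = [report_counts[c] for c in top4 + [focus]]
    low, high = min(counts), max(counts)
    eligible = [
        c for c in candidates
        if c != focus and c not in top4 and low <= report_counts[c] <= high
    ]
    if not eligible:
        return None
    intruder = rng.choice(eligible)
    options = top4 + [intruder]
    rng.shuffle(options)  # hide which option is the intruder
    return {"focus": focus, "options": options, "intruder": intruder}


# Toy report counts (illustrative only).
report_counts = {"A": 100, "B": 50, "C": 200, "D": 80, "E": 120, "G": 1000, "H": 60, "I": 150}
task = build_intruder_task("A", ["B", "C", "D", "E"], ["G", "H", "I"], report_counts, random.Random(0))
```

Here candidate `G` can never be chosen as the intruder, because its report count lies outside the range spanned by the focus term and its neighbors; this keeps report frequency from giving the intruder away.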
Results
Model Evaluation
Figure 2 shows the stability metric for drugs and AEs with different report counts. As described in Sect. 2.3.1, this was computed for a different set of AEs and drugs from those used for model search and selection. Stability increased with code frequency (measured as the total number of reports on the code in VigiBase), and for codes with low frequency, the average stability was higher for AEs. For a comparison with the stability metric of the original vigiVec configuration, please refer to Figure S3 in supplementary material.
Fig. 2.
Stability of the nearest neighbors across different frequencies of adverse events and drugs codes
Figure 3 shows that most AEs and drugs had 0 or 1 neighbor from a shared HLT or ATC-3, a tendency that was more pronounced for terms with lower frequencies.
Fig. 3.
Histograms of adverse events (AEs) and drug codes by the number of their 10 nearest neighbors that shared a high-level term (HLT) or Anatomical Therapeutic Chemical level 3 (ATC-3). Restricted to AEs with at least 10 other AEs sharing an HLT with the AE in question (for drugs: sharing an ATC-3), to ensure that all values between 0 and 10 can be observed. The leftmost histograms (in blue) show the overall patterns whereas histograms to the right show results for codes in different frequency ranges of the AEs and drugs in VigiBase (fewer than 20 reports, between 20 and 2000 reports, and above 2000 reports); neighbors could have any frequency
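The quantity summarized in these histograms, the number of nearest neighbors that share a hierarchical group with the code of interest, can be computed along the following lines. The sketch uses hypothetical function names and placeholder group labels, and it allows a code to belong to more than one group.

```python
from collections import Counter
from typing import Dict, List, Set


def shared_group_count(code: str, neighbors: List[str], groups: Dict[str, Set[str]]) -> int:
    """Number of neighbors sharing at least one group (e.g., HLT or ATC-3) with `code`."""
    own = groups.get(code, set())
    return sum(1 for n in neighbors if own & groups.get(n, set()))


def shared_group_histogram(
    neighbor_lists: Dict[str, List[str]], groups: Dict[str, Set[str]]
) -> Counter:
    """How many codes have 0, 1, 2, ... neighbors from a shared group."""
    return Counter(
        shared_group_count(code, nbrs, groups) for code, nbrs in neighbor_lists.items()
    )


# Placeholder group memberships and neighbor lists (illustrative only).
groups = {
    "AE_1": {"HLT_A"},
    "AE_2": {"HLT_A", "HLT_B"},
    "AE_3": {"HLT_B"},
    "AE_4": {"HLT_C"},
}
neighbor_lists = {
    "AE_1": ["AE_2", "AE_3"],
    "AE_3": ["AE_2", "AE_4"],
    "AE_4": ["AE_1", "AE_2"],
}
histogram = shared_group_histogram(neighbor_lists, groups)
```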
Figure 4 presents the results of the intruder detection task. In all the sub-tasks, intruders were identified significantly more often than the 20% expected by chance, given the 1 in 5 probability of randomly choosing the intruder. When neighbors and intruders came from the same HLT(s) as the AE of interest, the detection rate was 46%. When neighbors and intruders were not from those HLTs, the detection rate was 79%. For drugs, the detection rate was 51% for neighbors and intruders belonging to a common ATC-3 grouping as the drug of interest, and 64% for drugs not belonging to the same ATC-3 groupings. All of these deviations from the rate expected by chance are statistically significant at the 1% level. Lower intruder detection rates were observed for AEs and drugs with the lowest frequencies. The intruder detection results were very similar for the two assessors, with no statistically significant differences between the estimated intruder detection rates (see supplementary material).
Fig. 4.
Intruder detection rates, shown separately for within and outside HLT/ATC level 3 and for different frequency ranges. The data labels show the fraction of tasks where the intruder was correctly identified followed by the corresponding counts (correctly identified/all tasks). The error bars represent 95% confidence intervals. The red dotted line marks the 20% detection rate that would be expected by random chance (1 in 5). ATC anatomical therapeutic chemical, HLT high-level term
For a comparison with the intruder detection rates of the original vigiVec configuration, please refer to the Figure S4 in supplementary material.
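One way to compare an observed detection rate against the 20% expected by chance is a one-sided exact binomial test, sketched below. The statistical test actually used in this study is not specified here, so the choice of test is an assumption, and the counts in the example are hypothetical rather than taken from the results.

```python
import math


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Hypothetical example: 23 correctly identified intruders in 50 tasks (46%),
# tested against the 1-in-5 chance of picking the intruder at random.
chance_rate = 1 / 5
p_value = binomial_upper_tail(23, 50, chance_rate)
significant_at_1_percent = p_value < 0.01
```

The p-value is the probability of observing at least this many correct identifications if the assessor were guessing at random among the five options.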
Examples of Nearest Neighbors
Table 2 shows the 10 nearest neighbors to sleep disorder, throat tightness, myalgia, rhabdomyolysis, drug withdrawal syndrome, and drug withdrawal syndrome neonatal, respectively. These AEs were chosen to represent a range of clinical terms and to illustrate different features of vigiVec nearest neighbors, as well as how they compare to MedDRA groupings.
Table 2.
Examples of nearest neighbors to adverse events (AEs)
| Focus term | Neighbors | Cosine similarity | Overall frequency | Sharing HLTs | Sharing SMQs |
|---|---|---|---|---|---|
| Sleep disorder^a | Poor quality sleep | 0.95 | 21,707 | ||
| Middle insomnia | 0.94 | 15,717 | |||
| Initial insomnia | 0.94 | 9686 | |||
| Insomnia | 0.94 | 346,422 | |||
| Sleep deficit | 0.93 | 1192 | |||
| Restlessness | 0.92 | 50,791 | |||
| Nervousness | 0.91 | 84,552 | |||
| Loss of personal independence in daily activities | 0.90 | 51,203 | |||
| Depressed mood | 0.90 | 51,806 | |||
| Discomfort | 0.90 | 62,641 | |||
| Throat tightness | Pharyngeal swelling | 0.98 | 21,473 | ✓ | |
| Pharyngeal paresthesia | 0.97 | 3107 | ✓ | ||
| Paresthesia oral | 0.95 | 37,568 | ✓ | ||
| Pharyngeal oedema | 0.94 | 17,350 | ✓ | ||
| Tongue pruritus | 0.94 | 1911 | ✓ | ||
| Choking sensation | 0.94 | 6435 | ✓ | ||
| Swollen tongue | 0.93 | 42,523 | ✓ | ||
| Pharyngeal hypoesthesia | 0.93 | 2174 | ✓ | ||
| Throat irritation | 0.93 | 61,100 | ✓ | ✓ | |
| Enlarged uvula | 0.92 | 766 | |||
| Myalgia | Chills | 0.95 | 805,089 | ||
| Body temperature increased | 0.95 | 76,346 | |||
| Limb discomfort | 0.95 | 84,921 | |||
| Headache | 0.92 | 1,854,402 | |||
| Lymphadenopathy | 0.92 | 196,329 | |||
| Night sweats | 0.92 | 38,704 | |||
| Muscle fatigue | 0.90 | 7352 | ✓ | ||
| Feeling cold | 0.89 | 62,969 | |||
| Influenza like illness | 0.89 | 215,962 | |||
| Malaise | 0.89 | 749,045 | |||
| Rhabdomyolysis | Myoglobin blood increased | 0.92 | 963 | ✓ | |
| Myoglobinuria | 0.91 | 781 | ✓ | ||
| Blood creatine phosphokinase increased | 0.89 | 40,613 | ✓ | ||
| Myoglobin urine present | 0.88 | 85 | ✓ | ||
| Blood creatine phosphokinase abnormal | 0.86 | 295 | ✓ | ||
| Blood osmolarity increased | 0.86 | 245 | |||
| Crush syndrome | 0.85 | 61 | |||
| Metabolic encephalopathy | 0.84 | 1905 | |||
| PCO2 decreased | 0.83 | 697 | |||
| Anion gap increased | 0.83 | 796 | |||
| Drug withdrawal syndrome | Drug dependence | 0.97 | 154,497 | ✓ | ✓ |
| Substance use disorder | 0.95 | 682 | ✓ | ✓ | |
| Drug tolerance | 0.94 | 3937 | ✓ | ✓ | |
| Dependence | 0.93 | 7733 | ✓ | ✓ | |
| Substance abuse | 0.93 | 6865 | ✓ | ✓ | |
| Substance dependence | 0.92 | 573 | ✓ | ✓ | |
| Drug detoxification | 0.91 | 573 | ✓ | ||
| Drug diversion | 0.91 | 2903 | ✓ | ||
| Drug abuse | 0.89 | 55,243 | ✓ | ✓ | |
| Overdose | 0.89 | 160,390 | ✓ | ||
| Drug withdrawal syndrome neonatal | Learning disability | 0.94 | 4911 | ||
| Developmental delay | 0.92 | 7631 | |||
| Vision abnormal neonatal | 0.91 | 422 | ✓ | ||
| Fetal exposure during pregnancy | 0.89 | 34,811 | ✓ | ||
| Neonatal hypoacusis | 0.87 | 211 | ✓ | ||
| Tremor neonatal | 0.87 | 282 | ✓ | ||
| Drug dependence | 0.86 | 154,497 | ✓ | ✓ | |
| Congenital nystagmus | 0.84 | 102 | ✓ | ||
| Congenital anomaly | 0.83 | 8199 | ✓ | ||
| Spina bifida | 0.83 | 1540 | ✓ | ||
| Drug withdrawal syndrome | 0.83 | 84,775 | ✓ | ✓ | |
| Drug diversion | 0.82 | 2903 | ✓ | ||
| Dependence | 0.82 | 7733 | ✓ | ✓ |
^a Sleep disorder does not belong to any SMQ
HLT high-level term, SMQ standardized MedDRA® query
For sleep disorder, the nearest neighbors included codes related to sleep disturbances as well as common physical, psychological, and social effects associated with sleep disorder. Notably, none of the neighbors share its HLT sleep disorders NEC, and sleep disorder is not part of an SMQ.
The nearest neighbors of throat tightness reflect a clinical context of oropharyngeal hypersensitivity, with multiple neighbors associated with allergic AEs. Only two of the nearest neighbors share its HLT upper respiratory tract signs and symptoms, but eight share its SMQ oropharyngeal disorders.
Myalgia is one of the most frequently reported AEs in VigiBase, with multiple possible causes. Its nearest neighbors reflect general systemic AEs (e.g., nonspecific AEs following immunization), none of which share its HLT muscle pains. Only muscle fatigue shares one of its SMQs, rhabdomyolysis/myopathy. As for rhabdomyolysis, most of its nearest neighbors are specific laboratory findings associated with this diagnosis. None of them share its HLT myopathies, but the top 5 nearest neighbors share its SMQ rhabdomyolysis/myopathy.
Most of the nearest neighbors to drug withdrawal syndrome are related to substance abuse, sharing its HLT substance-related and addictive disorders and its SMQ drug abuse, dependence and withdrawal. The nearest neighbors to drug withdrawal syndrome neonatal highlight the same drug abuse context but also developmental outcomes and other neonatal disorders following prenatal drug exposure. Neighbors of the former kind share its HLT substance-related and addictive disorders, whereas those of the latter kind share its SMQ pregnancy and neonatal topics.
Table 3 shows the 10 nearest neighbors to sertraline, lisinopril, and paracetamol, chosen to represent a diverse range of drug classes, mechanisms of action, and AE profiles.
Table 3.
Examples of nearest neighbors to drugs
| Focus drug | Neighbors | Cosine similarity | Overall frequency | Sharing ATC-3 | Sharing SDGs |
|---|---|---|---|---|---|
| Sertraline | Escitalopram | 0.98 | 129,003 | ✓ | ✓ |
| Citalopram | 0.98 | 140,261 | ✓ | ✓ | |
| Propranolol | 0.98 | 96,353 | ✓ | ✓ | |
| Fluoxetine | 0.97 | 167,898 | ✓ | ✓ | |
| Venlafaxine | 0.96 | 146,664 | ✓ | ✓ | |
| Trazodone | 0.96 | 105,545 | ✓ | ||
| Paroxetine | 0.95 | 136,110 | ✓ | ✓ | |
| Bupropion | 0.95 | 139,867 | ✓ | ✓ | |
| Mirtazapine | 0.95 | 83,698 | ✓ | ||
| Buspirone | 0.95 | 34,373 | ✓ | ✓ | |
| Lisinopril | Hydrochlorothiazide | 0.99 | 174,172 | ||
| Pravastatin | 0.97 | 105,355 | |||
| Losartan | 0.97 | 173,332 | |||
| Atorvastatin | 0.97 | 497,409 | |||
| Atenolol | 0.97 | 185,522 | |||
| Amlodipine | 0.96 | 433,240 | |||
| Lovastatin | 0.96 | 46,802 | |||
| Chlortalidone | 0.96 | 15,988 | |||
| Fenofibrate | 0.96 | 61,634 | |||
| Hydrochlorothiazide; Lisinopril | 0.95 | 24,308 | ✓ | ||
| Paracetamol | Ibuprofen | 0.95 | 357,617 | ||
| Gabapentin | 0.94 | 291,032 | ✓ | ✓ | |
| Diphenhydramine | 0.93 | 123,363 | |||
| Bifidobacterium lactis; Bifidobacterium longum; Lactobacillus acidophilus | 0.93 | 134 | |||
| Omeprazole | 0.93 | 503,601 | |||
| Cyanocobalamin | 0.92 | 42,514 | |||
| Loratadine | 0.92 | 105,259 | |||
| Ondansetron | 0.92 | 167,282 | |||
| Iron | 0.92 | 204,874 | |||
| Pantoprazole | 0.92 | 347,448 |
ATC-3 anatomical therapeutic chemical level 3, SDGs standardized drug groupings
The nearest neighbors to sertraline reflect a specific psychiatry context. They all share an ATC-3 (either other anxiolytics or selective serotonin reuptake inhibitors [SSRIs]) and most of them share one of sertraline’s SDGs, antidepressant SSRIs or anxiolytics.
The neighborhood around lisinopril reflects a broader clinical context of cardiovascular disease in which lisinopril is commonly prescribed, including drugs used as high blood pressure medications and drugs used for treatment of dyslipidemia and heart failure. No nearest neighbors share lisinopril’s ATC-3 (angiotensin-converting enzyme [ACE] inhibitors) and just one belongs to the same SDG (antihypertensive ACE inhibitors).
Paracetamol’s neighborhood reflects a pharmacologically broad clinical context with multiple drug classes. Most of the drugs are either for pain treatment (either inflammatory, neuropathic, or abdominal) or common over-the-counter drugs like antihistamines or supplements. Only one of the neighboring drugs shares the same ATC-3 (other analgesics and antipyretics) and the same SDG (other analgesic drugs used in pain therapies).
Figure 5 displays a projection of the AE profile for prednisolone, selected due to its well-established impact on multiple organ systems. The black ellipses reflect different clinical regions, such as cardiovascular, musculoskeletal, nervous, immune, endocrine, ocular, gastrointestinal and dermatological AEs. For example, the cardiovascular conditions region includes AEs such as cardiac arrest, atrial fibrillation, pulmonary edema, myocardial infarction, and congestive heart failure, while the region related to dermatology includes AEs such as alopecia, skin atrophy, impaired healing, rash, and swelling. The relative placement of some of the regions themselves also reflects clinical vicinity. For example, the proximity of psychiatric to neurologic AE regions and of dermatologic to hypersensitivity AEs is consistent with the common co-occurrence of such clinical manifestations.
Fig. 5.
Visualization of the adverse event (AE) profile of prednisolone. Each colored bubble represents an AE, where its size indicates the number of reports with the drug/AE combination, and the color shows the Information Component disproportionality measure [22] reflecting strengths of association in the reporting of each AE with prednisolone in VigiBase. The ellipses (in black) are manually created to indicate specific clinical regions reflected by the AEs in these parts of the space. The projection from 20 dimensions to 2 is done with t-distributed stochastic neighbor embedding (t-SNE) [24]
Discussion
The semantic representations of AEs and drugs generated by vigiVec are stable and allow clinically related AEs and clinically related drugs to be identified. We consider this to be a necessary requirement for real-world use. The vector representations produced by vigiVec provide a unique semantic representation of each AE and drug, based on VigiBase in its entirety, which is possible due to Word2Vec’s computational efficiency, scaled for massive corpora. One advantage of these vectors, and even the associated similarity scores for AEs and drugs, is their transferability, allowing them to be exported and utilized in multiple applications, independent of the original training data.
Earlier research found higher average cosine similarities of vigiVec representations for AEs and drugs belonging to the same MedDRA or ATC groups [10]. Our results show that vigiVec did not recreate the MedDRA and ATC hierarchies, since most of the nearest neighbors belong to different HLT or ATC-3 groups than the code of interest. Importantly, our term intruder detection analysis showed that vigiVec could distinguish clinically related terms, both within and outside their hierarchical groupings. For example, in 46% of the examples the human expert preferred all four vigiVec neighbors above the intruder term even when the latter was selected from within the same HLT. As such, vigiVec representations may serve as a valuable complement to MedDRA and WHODrug. Generally, the continuous nature of the vigiVec representations enables more nuanced definitions of clinical relatedness than do the discrete groupings of MedDRA and WHODrug Global/ATC. Specifically, vigiVec may allow medical professionals with limited familiarity with the hierarchies to identify clinically related AEs or drugs. Additionally, vigiVec’s representations of AEs and drugs could perhaps support the development of new SMQs and SDGs.
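The logic of the term intruder detection task, and the chance level against which rates such as 46% are read, can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the helper names (build_intruder_task, detection_rate), the toy vocabulary, and the rule for drawing the intruder are all assumptions made for the example.

```python
import random

def build_intruder_task(term, neighbors, vocabulary, rng, n_neighbors=4):
    """Return (shuffled options, intruder) for one intruder question."""
    top = neighbors[:n_neighbors]
    # Assumption: the intruder is drawn uniformly from terms that are
    # neither the target term nor any of its listed neighbors.
    candidates = [t for t in vocabulary if t != term and t not in neighbors]
    intruder = rng.choice(candidates)
    options = top + [intruder]
    rng.shuffle(options)
    return options, intruder

def detection_rate(answers):
    """Share of questions in which the picked term was the true intruder."""
    hits = sum(1 for picked, intruder in answers if picked == intruder)
    return hits / len(answers)

# Hypothetical toy data, not taken from VigiBase or vigiVec.
vocabulary = ["myalgia", "headache", "pyrexia", "nausea", "fatigue",
              "rash", "alopecia", "tremor", "insomnia", "vertigo"]
neighbors = ["headache", "pyrexia", "nausea", "fatigue", "rash"]

rng = random.Random(0)
answers = []
for _ in range(20000):
    options, intruder = build_intruder_task("myalgia", neighbors, vocabulary, rng)
    # A rater with no information guesses uniformly among the five options.
    answers.append((rng.choice(options), intruder))

chance_rate = detection_rate(answers)
print(f"simulated chance detection rate: {chance_rate:.3f}")
```

With five options per question, the uninformed rater in this simulation detects the intruder in roughly one case in five, which is the baseline that observed detection rates are compared with.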
Both stability and clinical relatedness were higher for more frequently reported drugs and AEs. This observation may partially account for the overall better performance for AEs than drugs, since a larger proportion of drugs are infrequently reported. It may also suggest that performance is likely to improve as VigiBase continues to grow. The expected stability and clinical relatedness of individual neighbors could be communicated to end users, allowing a differential consideration and use of vigiVec results, if appropriate. Changing the model architecture from CBOW to Skip-gram while optimizing the vector size and increasing the context window to include all terms on a report improved stability, especially for less common terms.
Other vector representations for biomedical terms focused on different use cases [11, 23], making direct comparison difficult. Some of them reported correlations with UMNSRS similarity and relatedness scores around 0.6 [23–26], which is similar to that of vigiVec (see supplementary results), even though vigiVec was not primarily optimized for this. Comparable studies of stability for word representations [27] have reported stability scores in ranges similar to those reported here. The term intruder detection analysis has, to our knowledge, not been implemented for biomedical vector representations, but one evaluation of word embeddings reported a term intruder detection rate of 60% [21]. In the previous study of vigiVec, Gattepaille [10] conducted an evaluation where domain experts scored the clinical relatedness of different pairs of AEs; it found a correlation between these scores and vigiVec cosine similarities both within and outside HLT.
One limitation of vigiVec as evaluated here is that computed cosine similarities are always based on the full dimensionality of the vectors. This yields a single set of nearest neighbors for each term and means that a term cannot simultaneously be close to two other terms unless these are also close to one another, due to the triangle inequality. As a result, AEs that occur in different clinical contexts may not be associated with all of these. For example, myalgia, which is the third most commonly co-reported AE with rhabdomyolysis in VigiBase, is not among its 100 nearest neighbors according to vigiVec. Rhabdomyolysis occupies a region of the space dominated by investigation results, but myalgia resides in a region with general systemic reactions such as headache and influenza-like illness. Earlier research on word embedding models has suggested that there may be algebraic features of the vector representation, which would allow specific semantic aspects to be captured [18]. Future research should explore whether subdimensions of the vigiVec vector representations may be considered separately to identify and isolate specific aspects, such as the nature of an AE term (e.g., an investigation result, sign/symptom, or diagnosis) or whether it relates to a certain body system or disease process such as inflammation. This may provide alternative sets of nearest neighbors for each AE term, depending on the selected subdimensions. For example, we may have a set of neonatal neighbors and a separate set of withdrawal neighbors for a term like neonatal withdrawal syndrome. Currently, vigiVec disregards textual similarity, considering codes as separate entities. Future research could explore the use of semantic representations of words based on medical text corpora and/or VigiBase reporting patterns as a complementary or alternative approach.
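The idea of alternative neighbor sets from selected subdimensions can be illustrated with a small Python sketch. Everything in it is assumed for illustration: the vectors are hand-made and four-dimensional (not vigiVec output), the function names are hypothetical, and the premise that individual dimensions align with interpretable aspects is exactly what future research would need to establish.

```python
import math

def cosine(u, v, dims=None):
    """Cosine similarity, optionally restricted to a sequence of dimension indices."""
    idx = range(len(u)) if dims is None else dims
    dot = sum(u[i] * v[i] for i in idx)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in idx))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in idx))
    return dot / (norm_u * norm_v)

def nearest_neighbors(term, vectors, k, dims=None):
    """The k terms most similar to `term`, most similar first."""
    scored = [(cosine(vectors[term], vec, dims), other)
              for other, vec in vectors.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Hand-made toy vectors, NOT vigiVec output.
# Assumed layout: dims 0-1 ~ nature of the term (investigation vs symptom),
#                 dims 2-3 ~ body system (muscle vs other).
toy = {
    "rhabdomyolysis":                         [1.0, 0.1, 0.9, 0.1],
    "blood creatine phosphokinase increased": [1.0, 0.0, 0.7, 0.3],
    "blood creatinine increased":             [1.0, 0.1, 0.1, 0.9],
    "myalgia":                                [0.0, 1.0, 1.0, 0.0],
    "headache":                               [0.1, 1.0, 0.0, 1.0],
}

# Neighbors on the full vector versus on the assumed body-system dimensions only.
full_neighbors = nearest_neighbors("rhabdomyolysis", toy, k=2)
muscle_neighbors = nearest_neighbors("rhabdomyolysis", toy, k=2, dims=[2, 3])
print(full_neighbors)
print(muscle_neighbors)
```

In this toy setup the full-vector neighbors of rhabdomyolysis are the two investigation terms, whereas restricting the similarity to the assumed body-system dimensions brings myalgia to the top of the list.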
Adverse event cluster analysis is a different approach to identifying distinct clinical concepts based on reporting patterns, which allows individual AEs to exist in multiple contexts. Future research may explore whether AE cluster analysis may be enhanced by vigiVec semantic representations of AEs. Additionally, the computational tractability of running vigiVec on all 30+ million reports in VigiBase may help to jump-start AE cluster analysis for a specific medicinal product, borrowing strength from overall reporting patterns in the entire database. In such research, context-dependent word embedding models like those provided by Bidirectional Encoder Representations from Transformers (BERT) may be considered as a possible alternative base model for vigiVec. They account for the polysemic nature of words and may assign unique vectors to each AE and drug code on a specific report depending on the other AEs and drugs listed on the same report, which could be an advantage in the context of cluster analysis.
The results presented here relate to sets of nearest neighbors identified with vigiVec, but future studies may also consider the stability of pairwise cosine similarities directly. Additionally, term intruder detection analyses may be restricted to a single neighbor for each term to isolate the ability of vigiVec to identify clinically relevant pairwise associations. An important next step is to complement this study’s intrinsic evaluations of performance with evaluations of the ability of vigiVec to support and enhance human decision making in real-world pharmacovigilance tasks. In view of the strengths and limitations highlighted above, many use cases may benefit from combining vigiVec with information from the medical terminologies and other analytical approaches, like those based on pairwise associations between terms. If so, the performance of the combined methods should be assessed, and for use cases with humans in the loop, the performance of the proposed human-computer team should be considered.
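The two notions of stability contrasted above can be written down compactly. The Python sketch below is an illustration under stated assumptions, not the evaluation code of this study: the function names are hypothetical, only two fittings are compared, and the toy inputs are invented. The first measure compares sets of nearest neighbors between fittings; the second compares pairwise cosine similarities directly.

```python
import math

def neighbor_overlap(neighbors_a, neighbors_b, k=10):
    """Fraction of the k nearest neighbors shared between two fittings."""
    return len(set(neighbors_a[:k]) & set(neighbors_b[:k])) / k

def mean_neighbor_stability(fit_a, fit_b, k=10):
    """Average neighbor overlap over all terms present in both fittings."""
    terms = fit_a.keys() & fit_b.keys()
    return sum(neighbor_overlap(fit_a[t], fit_b[t], k) for t in terms) / len(terms)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_similarity_shift(vectors_a, vectors_b, pairs):
    """Mean absolute change in cosine similarity for the given term pairs."""
    diffs = [abs(cosine(vectors_a[s], vectors_a[t]) - cosine(vectors_b[s], vectors_b[t]))
             for s, t in pairs]
    return sum(diffs) / len(diffs)

# Invented toy inputs: ranked neighbor lists from two hypothetical fittings.
fit_a = {"t1": ["a", "b", "c"], "t2": ["d", "e", "f"]}
fit_b = {"t1": ["a", "c", "x"], "t2": ["d", "e", "f"]}
stability = mean_neighbor_stability(fit_a, fit_b, k=3)

# Invented toy vectors for the same two terms in two hypothetical fittings.
vectors_a = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
vectors_b = {"x": [1.0, 0.0], "y": [1.0, 1.0]}
shift = mean_similarity_shift(vectors_a, vectors_b, [("x", "y")])
print(stability, shift)
```

Cosine similarities are computed within each fitting before they are compared, so the second measure does not require the two vector spaces to be aligned with one another.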
Conclusion
Nearest neighbors identified with vigiVec are stable and show a high level of clinical relatedness. They are often from different parts of the existing hierarchies and complement these. The performance demonstrated here should be sufficient for vigiVec to be used in its current form when humans are in the loop, for example to propose additional AE terms to be included in queries when domain experts search for reports related to a topic of interest during signal validation or assessment.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgments
The authors are indebted to the members of the WHO Programme for International Drug Monitoring who contribute reports to VigiBase. However, the opinions and conclusions of this study are not necessarily those of the various member organisations nor of the WHO. MedDRA®, the Medical Dictionary for Regulatory Activities terminology, is the international medical terminology developed under the auspices of the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). The MedDRA® trademark is owned by the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA) on behalf of ICH. Lucie Gattepaille originally conceptualized and developed the vigiVec methodology and highlighted the need for an extended systematic evaluation of its performance. Michele Fusaroli generously performed the additional term intruder detection task required to assess consistency of the results between domain experts. Daniele Sartori provided helpful feedback on the manuscript.
Declarations
Funding
Not applicable.
Availability of data and materials
The data that support the findings of this study are not publicly available. Access to the data is restricted based on the conditions for access within the WHO Programme for International Drug Monitoring. Subject to these conditions, data is available from the authors on reasonable request. For further inquiries, please contact Uppsala Monitoring Centre via https://who-umc.org/contact-information/.
Code availability
The code used for this study is not publicly available but may be made available upon reasonable request. For further inquiries, please contact Uppsala Monitoring Centre via https://who-umc.org/contact-information/.
Ethics approval
Not applicable. This study did not use personal data.
Consent to participate
Not applicable. This study did not involve human participants.
Consent to publish
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author contributions
All authors participated in the conceptualization and design of the study. NE and HT performed the data analysis. JFC performed the UMNSRS mapping and the term intruder detection analysis. All authors contributed to writing the manuscript. All authors have read and approved the final version.
References
- 1. Onakpoya IJ, Heneghan CJ, Aronson JK. Post-marketing withdrawal of 462 medicinal products because of adverse drug reactions: a systematic review of the world literature. BMC Med. 2016;14(1):10. 10.1186/s12916-016-0553-2.
- 2. Sartori D, Aronson JK, Norén GN, Onakpoya IJ. Signals of adverse drug reactions communicated by pharmacovigilance stakeholders: a scoping review of the global literature. Drug Saf. 2023;46(2):109–20. 10.1007/s40264-022-01258-0.
- 3. Meyboom RHB, Egberts ACG, Edwards IR, Hekster YA, De Koning FHP, Gribnau FWJ. Principles of signal detection in pharmacovigilance. Drug Saf. 1997;16(6):355–65. 10.2165/00002018-199716060-00002.
- 4. Mozzicato P. MedDRA: an overview of the medical dictionary for regulatory activities. Pharm Med. 2009;23(2):65–75. 10.1007/BF03256752.
- 5. WHO Collaborating Centre for Drug Statistics Methodology. Guidelines for ATC classification and DDD assignment. 2024. https://atcddd.fhi.no/atc_ddd_index_and_guidelines/guidelines/. Accessed May 31, 2024.
- 6. Lagerlund O, Strese S, Fladvad M, Lindquist M. WHODrug: a global, validated and updated dictionary for medicinal information. Ther Innov Regul Sci. 2020. 10.1007/s43441-020-00130-6.
- 7. Goldman SA. AE reporting and standardized medical terminologies: strengths and limitations. Drug Inf J. 2002;36(2):439–44. 10.1177/009286150203600224.
- 8. Orre R, Bate A, Norén GN, Swahn E, Arnborg S, Edwards IR. A Bayesian recurrent neural network for unsupervised pattern recognition in large incomplete data sets. Int J Neural Syst. 2005;15(03):207–22. 10.1142/S0129065705000219.
- 9. Harpaz R, Perez H, Chase HS, Rabadan R, Hripcsak G, Friedman C. Biclustering of adverse drug events in the FDA’s spontaneous reporting system. Clin Pharmacol Ther. 2011;89(2):243–50. 10.1038/clpt.2010.285.
- 10. Gattepaille LM. Using the WHO database of spontaneous reports to build joint vector representations of drugs and adverse drug reactions, a promising avenue for pharmacovigilance. In: 2019 IEEE International Conference on Healthcare Informatics (ICHI). IEEE; 2019:1–6. 10.1109/ICHI.2019.8904551
- 11. Portanova J, Murray N, Mower J, Subramanian D, Cohen T. aer2vec: distributed representations of AE reporting system data as a means to identify drug/side-effect associations. AMIA Annu Symp Proc AMIA Symp. 2019;2019:717–26.
- 12. Michele F, Stefano P, Luca M, Valentina G, Luca P, Emanuel R, Daniel W, Maurizio R, Gastone C, Fabrizio DP, Elisabetta P. Unveiling the burden of drug-induced impulsivity: a network analysis of the FDA adverse event reporting system. Drug Saf. 2024;47(12):1275–92. 10.1007/s40264-024-01471-z.
- 13. Berry SM, Berry DA. Accounting for multiplicities in assessing drug safety: a three-level hierarchical mixture model. Biometrics. 2004;60(2):418–26. 10.1111/j.0006-341X.2004.00186.x.
- 14. Kulldorff M, Dashevsky I, Avery TR, et al. Drug safety data mining with a tree-based scan statistic. Pharmacoepidemiol Drug Saf. 2013;22(5):517–23. 10.1002/pds.3423.
- 15. Norén GN, Meldau E-L, Chandler RE. Consensus clustering for case series identification and AE profiles in pharmacovigilance. Artif Intell Med. 2021;122:102199. 10.1016/j.artmed.2021.102199.
- 16. Lindquist M. VigiBase, the WHO global ICSR database system: basic facts. Drug Inf J. 2008. 10.1177/009286150804200501.
- 17. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. 2013. 10.48550/arXiv.1310.4546
- 18. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. 2013. 10.48550/arXiv.1301.3781
- 19. Rehurek R, Sojka P. Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. 2010. 10.13140/2.1.2393.1847
- 20. Chang J, Gerrish S, Wang C, Boyd-graber JL, Blei DM. Reading Tea Leaves: How Humans Interpret Topic Models. In: Bengio Y, Schuurmans D, Lafferty JD, Williams CKI, Culotta A, eds. Advances in Neural Information Processing Systems 22. Curran Associates, Inc.; 2009:288–296. http://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models.pdf. Accessed November 21, 2019.
- 21. Schnabel T, Labutov I, Mimno D, Joachims T. Evaluation methods for unsupervised word embeddings. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics; 2015:298–307. 10.18653/v1/D15-1036
- 22. Norén GN, Hopstadius J, Bate A. Shrinkage observed-to-expected ratios for robust and transparent large-scale pattern discovery. Stat Methods Med Res. 2013;22(1):57–69. 10.1177/0962280211403604.
- 23. Zhang Y, Chen Q, Yang Z, Lin H, Lu Z. BioWordVec, improving biomedical word embeddings with subword information and MeSH. Sci Data. 2019;6(1):52. 10.1038/s41597-019-0055-0.
- 24. Ding X, Cohen T. Retrofitting vector representations of adverse event reporting data to structured knowledge to improve pharmacovigilance signal detection. In: AMIA Annual Symposium Proceedings. vol. 2020. 2021, pp. 383–92. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075473/. Accessed 31 May 2024.
- 25. Chiu B, Crichton G, Korhonen A, Pyysalo S. How to Train good Word Embeddings for Biomedical NLP. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing. Berlin, Germany: Association for Computational Linguistics; 2016:166–174. 10.18653/v1/W16-2922
- 26. Pakhomov SVS, Finley G, McEwan R, Wang Y, Melton GB. Corpus domain effects on distributional semantic modeling of medical terms. Bioinformatics. 2016. 10.1093/bioinformatics/btw529.
- 27. Borah A, Barman MP, Awekar A. Are Word Embedding Methods Stable and Should We Care About It? In: Proceedings of the 32nd ACM conference on hypertext and social media. HT ’21. New York, NY, USA: Association for Computing Machinery; 2021:45–55. 10.1145/3465336.3475098