Abstract
Objective
Automated analysis of vaccine postmarketing surveillance narrative reports is important for understanding the progression of rare but severe vaccine adverse events (AEs). This study implemented and evaluated state-of-the-art deep learning algorithms for named entity recognition to extract nervous system disorder-related events from vaccine safety reports.
Materials and Methods
We collected Guillain-Barré syndrome (GBS)-related influenza vaccine safety reports from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016. VAERS reports were selected and manually annotated with major entities related to nervous system disorders, including investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of conventional machine learning and deep learning algorithms were then evaluated for the extraction of these entities. We further pretrained a domain-specific BERT (Bidirectional Encoder Representations from Transformers) model on VAERS reports (VAERS BERT) and compared its performance with that of existing models.
Results and Conclusions
Ninety-one VAERS reports were annotated, resulting in 2512 entities. The corpus has been made publicly available to promote community efforts on vaccine AE identification. Deep learning-based methods (eg, bidirectional long short-term memory and BERT models) outperformed conventional machine learning-based methods (ie, conditional random fields with extensive features). The BioBERT large model achieved the highest exact match F-1 scores on nervous_AE, procedure, social_circumstance, and temporal_expression, while the VAERS BERT large model achieved the highest exact match F-1 scores on investigation and other_AE. An ensemble of these 2 models achieved the highest exact match microaveraged F-1 score (0.6802) and the second highest lenient match microaveraged F-1 score (0.8078) among peer models.
Keywords: VAERS, deep learning, vaccine adverse events, named entity recognition
INTRODUCTION
Vaccines are biological products that provide active acquired immunity to vaccine-preventable diseases, such as influenza and measles. Vaccines not only protect the individuals who have been vaccinated but also protect the community if a high enough percentage of the population is vaccinated (ie, herd immunity).1 As the most effective method of preventing infectious diseases, vaccination has saved millions of lives and protects millions more from illness and disability each year.2 Although vaccines have been shown to be very safe by extensive, rigorous studies3 and most vaccine adverse events (AEs) are very mild, rare but severe problems can occur after vaccination.4
Spontaneous reporting systems, such as the US Vaccine Adverse Event Reporting System (VAERS), serve as valuable tools for postmarketing surveillance to monitor possible safety signals in US-licensed vaccines. VAERS is comanaged by the Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA) to collect reports about AEs after vaccination.5 VAERS collected more than 650 000 reports between 1990 and 2020. The system contains both structured data (eg, patient and vaccine information) and unstructured data (eg, symptom text). Compared with structured data, the narrative symptom text often contains richer information regarding the progression of potential vaccine AEs, including descriptions of key temporal expressions of clinical events, diagnoses and symptoms, treatments, and lab tests. However, the nature of free text (eg, ambiguous and unstructured) makes it difficult to identify key concepts efficiently, limiting its usefulness for understanding the details of postvaccination AE progression. Therefore, automated extraction of key concepts from the high volume of narrative symptom texts would greatly facilitate high-quality clinical review of potential vaccine AEs.
As VAERS is a passive reporting system, and reports can be submitted voluntarily by both experts (eg, healthcare providers and vaccine manufacturers) and nonexperts (eg, patients and family members), the quality of reports varies,6,7 which makes VAERS reports differ from other clinical or medical documentation. Thus, existing pretrained clinical Natural Language Processing (NLP) systems for AE extraction (eg, adverse drug event extraction systems8,9) may not work well on VAERS reports. A few studies have been devoted to the automated analysis of VAERS reports. Foster et al introduced an annotated corpus for vaccine adverse event reports focusing on clinical features.10 Vaccine adverse event Text Mining (VaeTM) is a system developed by Botsis et al that extracts key clinical features (eg, diagnostic features, assessment features) from VAERS reports using a lexicon and rule-based approach.11 The extracted features further contributed to the identification of possible Guillain-Barré syndrome (GBS) cases12 and anaphylaxis.6 However, to date, few machine learning-based studies have been reported for AE extraction from VAERS reports.
Fraudulent claims linking vaccination to severe diseases, such as the measles, mumps, and rubella (MMR) vaccine to autism spectrum disorder13 and the human papillomavirus vaccine to an increased incidence of cervical cancer,14 have been debunked. However, the links of vaccination with some nervous system disorder-related diagnoses and symptoms, such as GBS15 or febrile seizure,16 remain unclear. GBS, an acute immune-mediated peripheral neuropathy, is 1 of the most common acute paralytic neuromuscular disorders17,18 and is associated with substantial mortality and morbidity.19 Although the incidence of GBS is very low, influenza vaccines have long been suspected to increase the risk of GBS.15,20 Thus, the extraction of nervous system disorder-related events and information from VAERS reports would contribute to the understanding of GBS and other nervous system disorders following vaccination.
In this article, as 1 of the first efforts, we evaluated the use of machine learning, especially deep learning, models for named entity recognition from VAERS narrative reports, with a particular focus on severe nervous system disorder-related entities (eg, GBS), including temporal expressions, symptoms and diseases, and procedures (eg, treatments), in contrast to the general clinical feature annotations in Foster et al's study.10 In particular, we evaluated transfer learning (ie, the use of pretrained language representations) for entity recognition tasks, which is beneficial for clinical NLP tasks where annotation efforts are typically expensive.21 To promote community efforts on vaccine AE identification, we also make our human-curated, annotated VAERS corpus publicly available to global researchers.
MATERIALS AND METHODS
VAERS report annotation
VAERS data are accessible as downloadable comma-separated values (CSV) files from https://vaers.hhs.gov/data.html. We downloaded all the CSV files from 1990 to 2016 and imported the data into a MySQL database. Each report in the VAERS data is coded using Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms. We collected all symptom texts (ie, narrative safety reports) coded with GBS and typical GBS-related MedDRA terms following any influenza virus vaccination (eg, FLU3, FLU4, H5N1, H1N1); 1849 reports were found in total. To select the reports with relatively rich information, we filtered out reports with short texts (ie, fewer than 1100 characters). Among the remaining 282 reports, we randomly selected 100 for annotation. Nine reports were removed after manual review because they largely duplicated the text of other reports, leaving 91 annotated reports in our corpus.
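For illustration, the report selection logic can be sketched with pandas. The file and column names below follow the public VAERS CSV layout (VAERS_ID, SYMPTOM_TEXT, VAX_TYPE, and SYMPTOM1–SYMPTOM5 are assumptions based on that layout), and the vaccine type codes and MedDRA terms shown are examples rather than our full query:

```python
import pandas as pd

# Assumed file names/columns, modeled on the public VAERS CSV layout.
data = pd.read_csv("2016VAERSDATA.csv", encoding="latin-1")          # VAERS_ID, SYMPTOM_TEXT, ...
vax = pd.read_csv("2016VAERSVAX.csv", encoding="latin-1")            # VAERS_ID, VAX_TYPE, ...
symptoms = pd.read_csv("2016VAERSSYMPTOMS.csv", encoding="latin-1")  # VAERS_ID, SYMPTOM1..SYMPTOM5

FLU_TYPES = {"FLU3", "FLU4", "H5N1", "H1N1"}   # influenza vaccine type codes from the text
GBS_TERMS = {"Guillain-Barre syndrome"}        # illustrative GBS-related MedDRA Preferred Term

# Reports following an influenza vaccination ...
flu_ids = set(vax.loc[vax["VAX_TYPE"].isin(FLU_TYPES), "VAERS_ID"])
# ... that were coded with a GBS-related MedDRA Preferred Term.
pt_cols = [c for c in symptoms.columns if c.startswith("SYMPTOM") and c[-1].isdigit()]
gbs_ids = set(symptoms.loc[symptoms[pt_cols].isin(GBS_TERMS).any(axis=1), "VAERS_ID"])

reports = data[data["VAERS_ID"].isin(flu_ids & gbs_ids)]
reports = reports[reports["SYMPTOM_TEXT"].str.len() >= 1100]  # keep information-rich reports
candidates = reports.sample(n=100, random_state=42)           # random sample for annotation
```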
We focused on 6 entity types related to the understanding of the progression of GBS and other nervous system disorders: investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression, which together cover the major clinical information in VAERS reports. As we were interested in the progression to GBS following vaccination, we only annotated the text and events up to the diagnosis of GBS in each report; text appearing after the GBS diagnosis was excluded from annotation. The definitions and examples of these named entities and their annotation agreement are shown in Table 1. Except for temporal_expression, the definitions and classifications of the entity types were referenced from the MedDRA terminology. Five study team members with backgrounds in pharmacy and health informatics were involved in the annotation process. MS, MZ, YX, and CT developed the annotation guideline (see Supplementary Material S2) through iterative discussions, with consultation from medical experts. MS and MZ served as the primary annotators for the 91 VAERS reports and annotated each report independently; they worked with JD to resolve any inconsistencies in entities and entity boundaries through discussion until consensus. Clinical language annotation, modeling, and processing (CLAMP) was leveraged as the annotation tool.22 Figure 1 shows an example of an annotated report.
Table 1.
Types of named entities annotated in VAERS reports
| Entity Type | Definition | Example | Annotation Agreement |
|---|---|---|---|
| investigation | Lab tests and examinations | “neurological exam,” “lumbar puncture” | 0.7436 |
| nervous_AE | Symptoms and diseases related to nervous system disorders | “tremors,” “Guillain-Barré syndrome” | 0.7885 |
| other_AE | Other symptoms and diseases | “complete bowel incontinence,” “diarrhea” | 0.8017 |
| procedure | Clinical interventions to the patient, including vaccination, treatment and therapy, intensive care, etc. | “flu shot,” “hospitalized” | 0.7492 |
| social_circumstance | Events associated with the social environment of a patient | “smoking,” “alcohol abuse” | 0.4706 |
| temporal_expression | Temporal expressions with prepositions | “for 3 days,” “on Friday morning” | 0.8448 |
Note: The annotation agreement scores were measured by F-1 scores using 1 annotator’s annotation as the gold standard.
Figure 1.
A sample VAERS report with annotated entities.
Named entity recognition (NER)
NER is 1 of the most popular tasks in NLP; it seeks to locate and classify named entities mentioned in unstructured text into predefined classes.23 We framed the identification of nervous system disorder-related events as an NER task. Annotated data were transformed into the BIO format, where "B" marks a word at the beginning of an entity, "I" marks a word inside an entity, and "O" marks a word outside any entity. The machine learning-based NER models predict the BIO labels for a sequence of words (ie, a sentence).
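For example, the fragment "Guillain-Barre syndrome was diagnosed," with "Guillain-Barre syndrome" annotated as nervous_AE, yields the label sequence B-nervous_AE, I-nervous_AE, O, O. A minimal sketch of converting character-offset annotations to token-level BIO labels (the helper and its data layout are our own illustration, not from a specific toolkit):

```python
def to_bio(tokens, entities):
    """Convert character-offset entity annotations to token-level BIO labels.

    tokens: list of (text, start, end) tuples from the tokenizer
    entities: list of (start, end, entity_type) tuples from the annotation
    """
    labels = ["O"] * len(tokens)
    for ent_start, ent_end, ent_type in entities:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start >= ent_start and tok_end <= ent_end:
                labels[i] = ("I-" if inside else "B-") + ent_type
                inside = True
    return labels

tokens = [("Guillain-Barre", 0, 14), ("syndrome", 15, 23), ("was", 24, 27), ("diagnosed", 28, 37)]
entities = [(0, 23, "nervous_AE")]
print(to_bio(tokens, entities))  # ['B-nervous_AE', 'I-nervous_AE', 'O', 'O']
```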
Text preprocessing
We leveraged CLAMP22 to perform preprocessing steps, including tokenization, sentence boundary detection, and part-of-speech (POS) tagging.
Conditional random fields (CRF)
CRF is a statistical sequence modeling algorithm that has been commonly applied to entity recognition tasks.24,25 Before the "era of deep learning," CRF with feature engineering achieved state-of-the-art performance in many clinical NLP tasks and challenges.26,27 In CRF, each word in a sentence is treated as a time step and assigned a BIO tag derived from our named entity annotations. We leveraged CLAMP, which has achieved top performances in clinical NLP challenges,22 to implement the CRF algorithm. An extensive set of features was employed, including lexical features, syntactic features, context features, distributional word representations, and domain knowledge features.
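We used CLAMP's built-in CRF; as a self-contained illustration of the approach, the sketch below trains a CRF with a simplified feature set using the sklearn-crfsuite package (an assumed stand-in for illustration, not the toolkit used in this study):

```python
import sklearn_crfsuite

def word_features(sent, i):
    """Simplified lexical/context features for token i of a sentence given as
    [(token, pos_tag), ...]; CLAMP adds syntactic, distributional, and
    domain-knowledge features on top of features like these."""
    word, pos = sent[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "pos": pos,
    }
    if i > 0:
        feats["-1:word.lower"] = sent[i - 1][0].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["+1:word.lower"] = sent[i + 1][0].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats

def train_crf(sents, bio_labels):
    """sents: list of POS-tagged sentences; bio_labels: 1 BIO tag sequence per sentence."""
    X = [[word_features(s, i) for i in range(len(s))] for s in sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, bio_labels)
    return crf
```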
Long short-term memory (LSTM)
Recurrent neural networks (RNNs) have demonstrated remarkable achievements in modeling sequential data.28 LSTM is a variant of the RNN that alleviates the vanishing gradient problem by using a "gate" mechanism to retain "memories" of prior time steps.29 LSTM has achieved extensive success in NER tasks.30,31 As an extension of the traditional single-direction (forward) LSTM, the bidirectional LSTM, which trains 2 LSTMs simultaneously to capture both forward and backward information in the input sequence, has further improved performance on NER tasks.30,32,33 Word-level representations (ie, pretrained word embeddings34) are typically used as the input to LSTM models. To address out-of-vocabulary words and misspellings, character-level representations can be concatenated to the word embeddings to capture subword information.35 In this study, we leveraged NeuroNER, a bidirectional LSTM framework, for the NER tasks.36 NeuroNER contains a character-enhanced token-embedding layer, a label prediction layer, and a label sequence optimization layer (ie, CRF) for sequence labeling. We implemented NeuroNER using the publicly available code.37 The pretrained BioWordVec embeddings (dimension: 200)38 were used to initialize the word embedding layer. The maximum number of epochs was set at 100; other parameters were kept at their defaults.
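For illustration, the sketch below outlines a character-enhanced bidirectional LSTM tagger in PyTorch (an assumed re-implementation for clarity; NeuroNER itself is TensorFlow-based and adds a CRF label sequence optimization layer on top of the per-token scores):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM sequence labeler in the spirit of NeuroNER: a character-level
    BiLSTM enhances pretrained word embeddings, followed by a word-level BiLSTM
    and a per-token projection onto the BIO label space."""

    def __init__(self, word_vectors, n_chars, n_labels, char_dim=25, hidden=100):
        super().__init__()
        # word_vectors: tensor of pretrained embeddings, eg BioWordVec (dim 200)
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=False)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_dim, bidirectional=True, batch_first=True)
        word_dim = word_vectors.size(1) + 2 * char_dim
        self.lstm = nn.LSTM(word_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, max_word_len)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * s, c))
        _, (h, _) = self.char_lstm(chars)                # h: (2, b*s, char_dim)
        char_repr = h.transpose(0, 1).reshape(b, s, -1)  # concat fwd/bwd final states
        x = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        x, _ = self.lstm(x)
        return self.out(x)  # per-token logits over BIO labels
```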
Bidirectional encoder representations from transformers (BERT)
In transfer learning, representations are first pretrained on large volumes of unannotated data and then adapted to downstream tasks, which is especially valuable for resource-deficient datasets (using pretrained word embeddings can be seen as a simple form of transfer learning).39 A recent trend is to use self-supervised learning over large general corpora to derive a general-purpose pretrained model that captures the intrinsic structure of the data and can then be fine-tuned on a specific task and dataset. Among such models, BERT is 1 of the most popular for handling sequential inputs such as text (with numerous variations) and has obtained state-of-the-art results in many NLP tasks, including NER.40 BERT has also been applied to the biomedical and clinical domains, where models pretrained on biomedical and clinical corpora are publicly available (eg, BioBERT41 and MIMIC BERT42). We evaluated 3 pretrained BERT models on our tasks: general BERT (the original BERT model released by Google), BioBERT, and MIMIC BERT. For each BERT model, both the base (110M parameters) and large (340M parameters) cased versions were evaluated. We replicated the BERT models using publicly available code.43 The learning rate was set at 2e-5, and the max sequence length was set at 512. For the BERT base models, the batch size was set at 8 and the number of training epochs at 20; for the BERT large models, the batch size was set at 4 and the number of training epochs at 10.
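Fine-tuning BERT for NER reduces to token classification over the BIO label set. We replicated the TensorFlow implementation cited above;43 a minimal sketch of the equivalent fine-tuning step with the HuggingFace transformers API (an assumption for illustration) follows, with dataset preparation, including aligning BIO labels to WordPiece subtokens at max sequence length 512, assumed done beforehand:

```python
from transformers import AutoModelForTokenClassification, Trainer, TrainingArguments

ENTITY_TYPES = ["investigation", "nervous_AE", "other_AE", "procedure",
                "social_circumstance", "temporal_expression"]
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]

def fine_tune(train_dataset, val_dataset, model_name="bert-base-cased"):
    """train/val datasets: tokenized sentences with BIO label ids aligned
    to WordPiece subtokens (prepared beforehand)."""
    model = AutoModelForTokenClassification.from_pretrained(
        model_name, num_labels=len(LABELS))
    args = TrainingArguments(
        output_dir="vaers_ner",
        learning_rate=2e-5,             # hyperparameter reported in the study
        per_device_train_batch_size=8,  # 8/20 epochs for base; 4/10 for large
        num_train_epochs=20,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=val_dataset)
    trainer.train()
    return trainer
```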
VAERS BERT
A domain-specific pretrained language representation model might capture additional semantics for a specific task. Thus, we further pretrained BERT models on VAERS reports. We collected all symptom texts (≥1100 characters) from 1990 to 2019 and continued pretraining BioBERT models on these texts. A total of 43 240 reports with 798 029 sentences were used for pretraining. The batch size was set at 8 for the base model and 4 for the large model, and the max sequence length at 512 for both. We pretrained both the base and large models for 380 000 steps.
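Continued pretraining optimizes BERT's masked language modeling (MLM) objective on the unlabeled symptom texts. A minimal sketch with the HuggingFace transformers API, assuming one report per line in a text file (the checkpoint name and file path are illustrative assumptions; for simplicity this sketch uses MLM only, whereas the original BERT pretraining also includes next sentence prediction):

```python
import torch
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

class SymptomTexts(torch.utils.data.Dataset):
    """Tokenized VAERS symptom texts, one report per line."""
    def __init__(self, path, tokenizer, max_len=512):
        lines = [l.strip() for l in open(path, encoding="utf-8") if l.strip()]
        self.enc = tokenizer(lines, truncation=True, max_length=max_len)
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, i):
        return {k: torch.tensor(v[i]) for k, v in self.enc.items()}

name = "dmis-lab/biobert-base-cased-v1.1"           # assumed BioBERT checkpoint
tokenizer = BertTokenizerFast.from_pretrained(name)
model = BertForMaskedLM.from_pretrained(name)
dataset = SymptomTexts("vaers_symptom_texts.txt", tokenizer)  # 43 240 reports, >=1100 chars

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="vaers_bert_base",
                         per_device_train_batch_size=8,  # 8 for base, 4 for large
                         max_steps=380_000)              # 380 000 pretraining steps
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```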
Ensemble learning
As a particular model may excel on 1 or more entity types, we adopted a category-level best ensemble learning strategy for the VAERS NER tasks to integrate the advantages of individual models: the model with the highest F-1 score among its peers on a particular entity type was chosen to predict that type. Specifically, we used the BioBERT large model to predict 4 entity types (nervous_AE, procedure, social_circumstance, and temporal_expression) and the VAERS BERT large model to predict the other 2 (investigation and other_AE).
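A minimal sketch of the merging step, operating on predicted entity spans (the span tuple format and model-name keys are our own conventions):

```python
# Category-level best ensemble: each entity type is taken from the model
# that scored highest on that type on the validation set.
BEST_MODEL_FOR = {
    "nervous_AE": "biobert_large", "procedure": "biobert_large",
    "social_circumstance": "biobert_large", "temporal_expression": "biobert_large",
    "investigation": "vaers_bert_large", "other_AE": "vaers_bert_large",
}

def ensemble(predictions):
    """predictions: {model_name: list of (start, end, entity_type) spans}."""
    merged = []
    for model_name, spans in predictions.items():
        merged.extend(s for s in spans if BEST_MODEL_FOR.get(s[2]) == model_name)
    return sorted(merged)
```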
Postprocessing
In the BIO format, the first token of a predicted entity should carry a "B" tag; however, conventional machine learning and deep learning models may misassign an "I" tag to the first token of an entity, because they impose only soft, probability-based constraints on label transitions. We therefore set a postprocessing rule to refine the entity predictions: if an entity started with an "I" tag on its first token, the "I" was changed to "B." In addition, if a sentence contained no other type of entity, we removed any temporal_expression prediction in that sentence, consistent with our annotation guideline.
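Both rules are simple rewrites of a sentence's predicted label sequence; a minimal sketch:

```python
def postprocess(labels):
    """Apply the 2 postprocessing rules to one sentence's BIO labels."""
    # Rule 1: an entity must start with "B"; repair a leading "I" tag.
    fixed = []
    for i, lab in enumerate(labels):
        if lab.startswith("I-") and (i == 0 or labels[i - 1] == "O"
                                     or labels[i - 1][2:] != lab[2:]):
            lab = "B-" + lab[2:]
        fixed.append(lab)
    # Rule 2: drop temporal_expression predictions when the sentence
    # contains no other entity type (per the annotation guideline).
    types = {lab[2:] for lab in fixed if lab != "O"}
    if types == {"temporal_expression"}:
        fixed = ["O"] * len(fixed)
    return fixed

print(postprocess(["I-procedure", "O", "B-temporal_expression"]))
# ['B-procedure', 'O', 'B-temporal_expression']
```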
Evaluation
The annotated VAERS reports were split into train, validation, and test sets at a 7:1:2 ratio. We evaluated the NER algorithms using standard metrics (precision, recall, and F-measure) under both exact match (same entity boundary) and lenient match (overlapping entity boundary) criteria.
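Precision, recall, and F-1 follow the standard definitions, where a predicted entity counts as a true positive (TP) if it matches a gold standard entity under the chosen criterion and a false positive (FP) otherwise, and a gold standard entity with no matching prediction counts as a false negative (FN):

$$
\mathrm{Precision}=\frac{TP}{TP+FP},\qquad
\mathrm{Recall}=\frac{TP}{TP+FN},\qquad
F_1=\frac{2\cdot \mathrm{Precision}\cdot \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
$$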
For each model, to account for random variation, we ran the algorithm 10 times and report the average scores.
RESULTS
Across the 91 VAERS reports, 2512 entities were annotated. The 3 most common entity types were nervous_AE (651, 25.92%), temporal_expression (546, 21.74%), and other_AE (530, 21.09%); the entity social_circumstance occurred very rarely (28, 1.11%). On average, each report contained 11.15 sentences and 202.62 tokens. The statistics of reports, annotated entities, and their distributions among the train, validation, and test sets are shown in Table 2. In particular, the distribution of nervous_AE and other_AE entities is shown in Supplementary Material S1 Figure S2.
Table 2.
VAERS reports annotation statistics
| | | Train | Validation | Test | Total Entities | Total Unique Entities |
|---|---|---|---|---|---|---|
| Named entities statistics | investigation | 148 | 29 | 59 | 236 | 159 |
| nervous_AE | 406 | 83 | 162 | 651 | 418 | |
| other_AE | 301 | 62 | 167 | 530 | 373 | |
| procedure | 338 | 57 | 126 | 521 | 220 | |
| social_circumstance | 20 | 4 | 4 | 28 | 28 | |
| temporal_expression | 371 | 48 | 127 | 546 | 512 | |
| Reports statistics | Reports | 63 | 9 | 19 | 91 | |
| Total sentences | 603 | 126 | 286 | 1,015 | ||
| Total tokens | 11,196 | 2,097 | 5,145 | 18,438 | ||
| Average characters | 847.22 | 1090.33 | 1290.63 | |||
Note: As the text that appeared after the diagnosis of GBS was removed in our annotation, the average character count may be below 1100.
Tables 3 and 4 show the exact and lenient match F scores of the NER models on VAERS reports, respectively; Supplementary Material S1 Tables S1–S4 show the corresponding precision and recall scores. The deep learning models demonstrated superiority over the conventional machine learning algorithm (ie, CRF) on AE-relevant entity recognition from VAERS reports. One observation from these tables is that deep learning models greatly improved the recall of entity recognition, which drove the improvement in F scores. The LSTM model (ie, NeuroNER) outperformed the CRF model on nearly every entity type, with procedure the exception. Another observation is that large BERT models achieved better performance than base BERT models, which is consistent with prior studies. BioBERT, the BERT model further pretrained on biomedical corpora, achieved the highest exact match F-1 scores on nervous_AE, procedure, social_circumstance, and temporal_expression, while VAERS BERT achieved the highest exact match F-1 scores on investigation and other_AE. The category-level best ensemble learning strategy further improved the best microaveraged F-1 score (0.6802) over the single models.
Table 3.
NER comparison on VAERS reports entity extraction measured by exact match F score
| Entity Type | CRF | LSTM | BERT Base | BERT Large | BioBERT Base | BioBERT Large | MIMIC BERT Base | MIMIC BERT Large | VAERS BERT Base | VAERS BERT Large | Ensemble | Number of Entities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| investigation | 0.4272 | 0.4655 | 0.3896 | 0.4550 | 0.4903 | 0.5106 | 0.4763 | 0.4672 | 0.5115 | 0.5418* | 0.5347* | 59 |
| nervous_AE | 0.5935 | 0.6193 | 0.5778 | 0.6131 | 0.6146 | 0.6653* | 0.5702 | 0.5624 | 0.6268 | 0.6518* | 0.6596* | 162 |
| other_AE | 0.5352 | 0.5627 | 0.5317 | 0.5982 | 0.5670 | 0.6201* | 0.5682 | 0.5897 | 0.5575 | 0.6249* | 0.6331* | 167 |
| procedure | 0.7013 | 0.6839 | 0.6938 | 0.7262 | 0.7179 | 0.7852* | 0.7047 | 0.7363 | 0.7294 | 0.7649 | 0.7806* | 126 |
| social_circumstance | 0.0000 | 0.0000 | 0.0000 | 0.0354 | 0.0348 | 0.1792* | 0.0000 | 0.0200 | 0.1158 | 0.0167* | 0.1674* | 4 |
| temporal_expression | 0.6344 | 0.6590 | 0.7577 | 0.7689 | 0.7527 | 0.7821* | 0.7293 | 0.7416 | 0.7517 | 0.7702* | 0.7832* | 127 |
| Microaverage | 0.5919 | 0.6099 | 0.5985 | 0.6398 | 0.6333 | 0.6783* | 0.6138 | 0.6220 | 0.6396 | 0.6735* | 0.6802* | 645 |
Note: The scores were averaged over 10 runs.
*Statistically higher than other methods (CI: 0.95).
Table 4.
NER comparison on VAERS reports entity extraction measured by lenient match F score
| Entity Type | CRF | LSTM | BERT Base | BERT Large | BioBERT Base | BioBERT Large | MIMIC BERT Base | MIMIC BERT Large | VAERS BERT Base | VAERS BERT Large | Ensemble | Number of Entities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| investigation | 0.6061 | 0.5844 | 0.6406 | 0.6905 | 0.6792 | 0.6927 | 0.6712 | 0.7114 | 0.7304 | 0.6870 | 0.5844 | 59 |
| nervous_AE | 0.7480 | 0.7541 | 0.7814 | 0.7948 | 0.8180 | 0.7643 | 0.7605 | 0.7939 | 0.8070 | 0.8195 | 0.7541 | 162 |
| other_AE | 0.6536 | 0.7341 | 0.7788 | 0.7538 | 0.7928 | 0.7575 | 0.7510 | 0.7525 | 0.8048 | 0.7975 | 0.7341 | 167 |
| procedure | 0.7446 | 0.7539 | 0.7776 | 0.7668 | 0.8284 | 0.7715 | 0.7980 | 0.7751 | 0.8164 | 0.8269 | 0.7539 | 126 |
| social_circumstance | 0.0000 | 0.2959 | 0.3622 | 0.3154 | 0.3658 | 0.4110 | 0.3997 | 0.4578 | 0.3598 | 0.3557 | 0.2959 | 4 |
| temporal_expression | 0.7274 | 0.8469 | 0.8645 | 0.8661 | 0.8795 | 0.8566 | 0.8587 | 0.8685 | 0.8928 | 0.8804 | 0.8469 | 127 |
| Microaverage | 0.7061 | 0.7455 | 0.7775 | 0.7794 | 0.8064 | 0.7728 | 0.7722 | 0.7848 | 0.8137 | 0.8078 | 0.7455 | 645 |
Note: The scores were averaged over 10 runs.
For the lenient match, VAERS BERT showed superiority over peer models on F-1 scores. The VAERS BERT large model achieved the highest microaveraged F-1 score as well as the highest lenient match F-1 scores on other_AE and temporal_expression, whereas the BioBERT large model achieved the best F-1 scores on nervous_AE and procedure. The ensemble model achieved the second highest microaveraged F-1 score and the highest F-1 score on investigation. VAERS BERT large outperformed the CRF baseline by a large margin (11.98%) on microaveraged F-1 score and was 0.83% higher than the BioBERT large model.
DISCUSSION
The automatic identification of possible adverse drug events from electronic health records, such as discharge summaries, has been well studied.44 However, the effectiveness of machine learning-based adverse event identification from vaccine postmarketing safety reports remained unclear. In this study, we built a collection of annotated VAERS GBS reports covering major entities related to nervous system disorders, including investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of machine learning and deep learning algorithms were then evaluated for the automated identification of these entities from VAERS narrative texts. Deep learning-based methods were found to outperform conventional machine learning-based methods by a clear margin.
Pretrained language models have shown promising results on many clinical and biomedical NLP tasks in recent years.41,42,45,46 This study further demonstrated their efficacy in information extraction from vaccine safety reports. Domain-specific language representations (BioBERT and VAERS BERT), pretrained on large-scale biomedical corpora, achieved the best performance on our tasks. Unlike NLP tasks in the general domain, clinical and biomedical tasks are typically limited by the size of labeled datasets. The transfer learning paradigm (eg, use of pretrained word embeddings or pretrained language models) plays a significant role in enabling deep learning models to achieve superior performance on tasks with limited labeled data. Our prior study demonstrated the efficacy of cross-domain transfer learning for NER on a social media dataset.47
Although the deep learning models achieved good performance in recognizing most entity types in VAERS reports, there are some limitations as well. Deep learning models, especially BERT models, require significant computational resources (eg, a graphics processing unit and longer prediction time) compared with traditional machine learning models such as CRF (see Supplementary Material S1 Figure S1). In addition, the prediction accuracy on named entities is still suboptimal. The common types of prediction errors (from the best ensemble model) are summarized in Table 5, and their statistics are shown in Table 6. For AE (nervous_AE and other_AE) annotation, we excluded adjectives describing the extent of an AE (eg, "lots of," "deep," "severe") while keeping adjacent words describing its location (eg, "upper back"); this convention made exact entity boundaries harder for the models to identify. Annotating granular attributes of the AE, such as body location and severity, might help with entity boundary recognition. Another major type of error is that the deep learning models recognized entities that were not annotated by the human annotators. One reason is that we did not annotate entities related to the patient's medical history before vaccination, as we focused on events that happened after vaccination; we also did not annotate temporal_expression when no clinical entity appeared in the same sentence. The last major type of error occurred when the models confused other_AE with nervous_AE. Distinguishing these 2 entity types is challenging for deep learning models, as it is even for human annotators: in some cases, an other_AE (eg, an AE related to musculoskeletal disorders) might also be a symptom of a nervous system disorder.
Table 5.
Common types of prediction errors in VAERS reports

| Error Type |
|---|
| Boundary mismatch |
| Irrelevant clinical events (false positive) |
| Incorrect entity type |
| False negative |
Table 6.
Statistics of ensemble model prediction errors on different entity types
| Entity Type | Boundary Mismatch (out of human annotated entities) | False Positive (out of machine annotated entities) | False Negative (out of human annotated entities) | Incorrect Entity Type (out of machine annotated entities) |
|---|---|---|---|---|
| investigation | 9.9/59, 16.8% | 34.1/83.9, 40.6% | 10.9/59, 18.5% | 4.5/59, 7.6% |
| nervous_AE | 27.5/162, 17.0% | 53.5/205.6, 26.0% | 13.3/162, 8.2% | 20.1/162, 12.4% |
| other_AE | 26.3/167, 15.7% | 49.3/198.2, 24.9% | 25.1/167, 15.0% | 29.6/167, 17.7% |
| procedure | 6.2/126, 4.9% | 30.9/141.5, 21.8% | 15.4/126, 12.2% | 8.6/126, 6.8% |
| social_circumstance | 1.1/4, 27.5% | 8.7/11.5, 75.7% | 1.8/4, 45.0% | 2.5/4, 62.5% |
| temporal_expression | 12.4/127, 9.8% | 20/136.3, 14.7% | 11.5/127, 9.1% | 1.5/127, 1.2% |
Note: The counts of errors were averaged over 10 runs.
The present study focused on named entity recognition of nervous system disorder-related events and temporal expressions. In future work, we will link the extracted entities to corresponding concepts in medical terminologies (eg, MedDRA48); normalizing the extracted entities would benefit statistical analyses of vaccine AEs. In addition to entity normalization, we will also work on temporal relation classification and inference among the extracted entities. Temporal information representations such as the Time Event Ontology (TEO)49 and TLINK50 would help with temporal inference and timeline generation, offering insights into the progression of vaccine AEs.
CONCLUSION
In this study, we focused on information extraction from vaccine postmarketing safety reports and built an annotated corpus covering major entities related to vaccine-related nervous system disorders. A variety of conventional machine learning and deep learning algorithms were evaluated for the automatic identification of entities in VAERS narrative texts. An ensemble of pretrained language models-based deep learning methods achieved the best performance.
Rapid advances in vaccine development (eg, mRNA vaccines for COVID-1951) are very encouraging and foster hopes of protecting the public and ending the global pandemic. Clinical trials, the gold standard for evaluating the safety and efficacy of new vaccines, nevertheless have limitations in evaluating rare adverse events due to short observation periods and limited testing populations. As a result, postmarketing surveillance, such as spontaneous reporting databases, serves the critical role of continually monitoring safety signals of adverse events. Together with relevant efforts from other studies (eg, VaeTM11 and Foster et al's annotated corpus10), the corpus and NER models we built in this study will facilitate automated and accurate information extraction from vaccine safety reports and thus improve the efficiency and accuracy of large-scale vaccine AE report reviews and analyses for both newly developed and existing vaccines. We view this as a necessary step toward evaluating long-term and rare vaccine safety signals to answer public concerns and ensure the success of vaccination programs for current and future infectious disease outbreaks.
FUNDING
This research was supported by the National Institutes of Health under award numbers R01AI130460 and R01LM011829.
Disclaimer: The content of this article is the sole responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
AUTHOR CONTRIBUTIONS
Study concept and design: JD, YX, MS, and CT; corpus annotation: MS, MZ, and JD; experiments: JD, YX, JW, and YS; drafting of the manuscript: JD, YX, and CT; acquisition, analysis, and interpretation of data: JD, YX, MS, and MZ. All authors contributed critical revisions of the manuscript for important intellectual content. CT provided study supervision.
Supplementary Material
ACKNOWLEDGMENT
We thank Dr. Alokananda Ghosh and Dr. Hangyu Ji for the valuable discussions on the annotation guideline.
DATA AVAILABILITY
The annotated corpus underlying this article is available at https://github.com/UT-Tao-group/Vaccine-AE-recognition.
CONFLICT OF INTEREST STATEMENT
Dr. Xu and The University of Texas Health Science Center at Houston have research-related financial interests in Melax Technologies, Inc.
REFERENCES
- 1. Herd immunity (herd protection). Vaccine Knowledge Project. https://vk.ovg.ox.ac.uk/vk/herd-immunity Accessed June 15, 2020.
- 2. Philippe D, Jean-Marie O-B, Marta G-D, et al. Global immunization: status, progress, challenges and future. BMC Int Health Hum Rights 2009; 9 (Suppl 1): S2. doi:10.1186/1472-698x-9-s1-s2
- 3. US Centers for Disease Control and Prevention. Vaccine testing and approval process. https://www.cdc.gov/vaccines/basics/test-approve.html Accessed June 15, 2020.
- 4. US Centers for Disease Control and Prevention. Possible side effects from vaccinations. 2015. http://www.cdc.gov/vaccines/vac-gen/side-effects.htm Accessed June 15, 2020.
- 5. US Centers for Disease Control and Prevention. VAERS - About Us. https://vaers.hhs.gov/about.html Accessed June 15, 2020.
- 6. Botsis T, Nguyen MD, Woo EJ, et al. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection. J Am Med Inform Assoc 2011; 18 (5): 631–8.
- 7. US Centers for Disease Control and Prevention. VAERS - Guide to Interpreting VAERS Data. https://vaers.hhs.gov/data/dataguide.html Accessed June 15, 2020.
- 8. Uzuner Ö, Stubbs A, Lenert L. Advancing the state of the art in automatic extraction of adverse drug events from narratives. J Am Med Inform Assoc 2020; 27 (1): 1–2.
- 9. Wei Q, Ji Z, Li Z, et al. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. J Am Med Inform Assoc 2020; 27 (1): 13–21.
- 10. Foster M, Pandey A, Kreimeyer K, et al. Generation of an annotated reference standard for vaccine adverse event reports. Vaccine 2018; 36 (29): 4325–30. doi:10.1016/j.vaccine.2018.05.079
- 11. Botsis T, Buttolph T, Nguyen MD, et al. Vaccine adverse event text mining system for extracting features from vaccine safety reports. J Am Med Inform Assoc 2012; 19 (6): 1011–8.
- 12. Botsis T, Woo EJ, Ball R. The contribution of the vaccine adverse event text mining system to the classification of possible Guillain-Barré syndrome reports. Appl Clin Inform 2013; 4 (1): 88–99.
- 13. Godlee F, Smith J, Marcovitch H. Wakefield's article linking MMR vaccine and autism was fraudulent. BMJ 2011; 342: c7452. doi:10.1136/bmj.c7452
- 14. Hawkes N. Journals retract four articles criticising vaccines by author who used a pseudonym. BMJ 2018; 361: k2394. doi:10.1136/bmj.k2394
- 15. US Centers for Disease Control and Prevention. Vaccine Safety: Guillain-Barré Syndrome Concerns. https://www.cdc.gov/vaccinesafety/concerns/guillain-barre-syndrome.html Accessed June 15, 2020.
- 16. US Centers for Disease Control and Prevention. Vaccine Safety: Childhood Vaccines and Febrile Seizures Concerns. https://www.cdc.gov/vaccinesafety/concerns/febrile-seizures.html Accessed June 15, 2020.
- 17. Geier MR, Geier DA, Zahalsky AC. Influenza vaccination and Guillain Barre syndrome. Clin Immunol 2003; 107 (2): 116–21. doi:10.1016/S1521-6616(03)00046-9
- 18. Vellozzi C, Burwen DR, Dobardzic A, et al. Safety of trivalent inactivated influenza vaccines in adults: background for pandemic influenza vaccine safety monitoring. Vaccine 2009; 27 (15): 2114–20. doi:10.1016/j.vaccine.2009.01.125
- 19. Hartung HP, Willison HJ, Kieseier BC. Acute immunoinflammatory neuropathy: update on Guillain-Barré syndrome. Curr Opin Neurol 2002; 15 (5): 571–7. doi:10.1097/00019052-200210000-00008
- 20. Lasky T, Terracciano GJ, Magder L, et al. The Guillain–Barré syndrome and the 1992–1993 and 1993–1994 influenza vaccines. N Engl J Med 1998; 339 (25): 1797–802. doi:10.1056/NEJM199812173392501
- 21. Wu S, Roberts K, Datta S, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc 2020; 27 (3): 457–70.
- 22. Soysal E, Wang J, Jiang M, et al. CLAMP: a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc 2018; 25 (3): 331–6.
- 23. Ritter A, Clark S, Mausam M, Etzioni O. A survey of named entity recognition and classification. Lingvisticae Investigationes 2007; 30: 3–26. doi:10.1075/li.30.1.03nad
- 24. Settles B. Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP); August 2004; Geneva, Switzerland.
- 25. Lafferty J, McCallum A, Pereira FCN. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01); June 28–July 1, 2001; Williamstown, MA, USA.
- 26. Tang B, Cao H, Wu Y, et al. Clinical entity recognition using structural support vector machines with rich features. In: Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics; October 2012; Maui, HI, USA.
- 27. Li D, Savova G, Kipper K. Conditional random fields and support vector machines for disorder named entity recognition in clinical texts. In: Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing; June 2008; Columbus, OH, USA.
- 28. Mikolov T, Karafiát M, Burget L, et al. Recurrent neural network based language model. In: Proceedings of the Eleventh Annual Conference of the International Speech Communication Association; September 2010; Makuhari, Chiba, Japan.
- 29. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997; 9 (8): 1735–80.
- 30. Chiu JPC, Nichols E. Named entity recognition with bidirectional LSTM-CNNs. TACL 2016; 4: 357–70. doi:10.1162/tacl_a_00104
- 31. Lample G, Ballesteros M, Subramanian S, et al. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360; 2016.
- 32. Limsopatham N, Collier N. Bidirectional LSTM for named entity recognition in Twitter messages. In: Proceedings of the 2nd Workshop on Noisy User-generated Text; December 2016: 145–52; Osaka, Japan.
- 33. Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991; 2015.
- 34. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems 26 (NIPS 2013); December 2013: 3111–9; Lake Tahoe, NV, USA.
- 35. Li J, Sun A, Han J, Li C. A survey on deep learning for named entity recognition. IEEE Trans Knowl Data Eng 2020. doi:10.1109/TKDE.2020.2981314
- 36. Dernoncourt F, Lee JY, Szolovits P. NeuroNER: an easy-to-use program for named-entity recognition based on neural networks. In: Proceedings of the 2017 EMNLP System Demonstrations; September 7–11, 2017: 97–102; Copenhagen, Denmark.
- 37. Franck-Dernoncourt/NeuroNER: named-entity recognition using neural networks. https://github.com/Franck-Dernoncourt/NeuroNER Accessed June 15, 2020.
- 38. Zhang Y, Chen Q, Yang Z, Lin H, Lu Z. BioWordVec, improving biomedical word embeddings with subword information and MeSH. Sci Data 2019; 6 (1): 1–9. doi:10.1038/s41597-019-0055-0
- 39. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010; 22 (10): 1345–59. doi:10.1109/TKDE.2009.191
- 40. Devlin J, Chang M-W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805; 2018. http://arxiv.org/abs/1810.04805
- 41. Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020; 36 (4): 1234–40.
- 42. Si Y, Wang J, Xu H, et al. Enhancing clinical concept extraction with contextual embeddings. J Am Med Inform Assoc 2019; 26 (11): 1297–304.
- 43. kyzhouhzau/BERT-NER. https://github.com/kyzhouhzau/BERT-NER Accessed June 15, 2020.
- 44. Henry S, Buchan K, Filannino M, et al. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J Am Med Inform Assoc 2020; 27 (1): 3–12.
- 45. Wei Q, Ji Z, Si Y, et al. Relation extraction from clinical narratives using pre-trained language models. AMIA Annu Symp Proc 2019; 2019: 1236–45.
- 46. Ji Z, Wei Q, Xu H. BERT-based ranking for biomedical entity normalization. AMIA Summits Transl Sci Proc 2020; 2020: 269–77.
- 47. Du J, Zhang Y, Luo J, et al. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med Inform Decis Mak 2018; 18 (S2): 43. doi:10.1186/s12911-018-0632-8
- 48. Welcome to MedDRA. https://www.meddra.org/ Accessed June 18, 2020.
- 49. Li F, Du J, He Y, et al. Time Event Ontology (TEO): to support semantic representation and reasoning of complex temporal relations of clinical events. J Am Med Inform Assoc 2020; 27 (7): 1046–56.
- 50. TempEval 2010 TLINK Guidelines. http://www.timeml.org/tempeval2/tempeval2-trial/guidelines/tlink-guidelines-081409.pdf Accessed August 11, 2020.
- 51. US Centers for Disease Control and Prevention. Understanding and explaining mRNA COVID-19 vaccines. https://www.cdc.gov/vaccines/covid-19/hcp/mrna-vaccine-basics.html Accessed November 30, 2020.