Table 1.
Components and subcomponents | Functions |
1. Pre-processor | Prepares the text for the main processing. |
1.1 Sentence tokenizer | Splits the text into sentences, using the period as the delimiter. |
1.2 Word tokenizer | Splits each sentence into tokens. |
1.3 Normalizer | Removes punctuation marks and converts text to lowercase (1st normalization step). |
 | Removes the tokens that the semantic tagger has tagged as ‘Unimportant’ (2nd normalization step). |
 | Removes an irrelevant tagged token that disrupts the contiguous tokens of a feature (3rd normalization step). |
2. VAERS dictionary | Includes 55 000 entries (each entry consists of a term and its tag, which corresponds to a semantic type). |
3. Semantic tagger | Tags the tokens based on the dictionary entries. |
4. Grammar rules | Define the relationships between tags (ie, the semantic types). |
5. Rule-based parser | Parses the text by executing the grammar rules twice: (1) after the 2nd normalization step, and (2) after the 3rd normalization step. |
6. Features extractor | Extracts the predefined features. |
VAERS, vaccine adverse event reporting system; VaeTM, vaccine adverse event text mining.
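
The pre-processing and tagging steps listed in Table 1 can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the four-entry `TOY_DICTIONARY`, its tag names, and all function names are hypothetical, and the sketch assumes that any token absent from the dictionary is tagged ‘Unimportant’. The grammar rules, the rule-based parser, and the 3rd normalization step are omitted.

```python
import string

# Hypothetical toy dictionary (term -> semantic tag); the real dictionary
# is far larger and its tag names are not reproduced here.
TOY_DICTIONARY = {
    "fever": "SYMPTOM",
    "rash": "SYMPTOM",
    "vaccination": "VACCINE",
    "days": "TIME_UNIT",
}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def split_sentences(text):
    """Sentence tokenizer: split on periods and drop empty pieces."""
    return [s.strip() for s in text.split(".") if s.strip()]


def split_words(sentence):
    """Word tokenizer: split a sentence on whitespace."""
    return sentence.split()


def normalize(tokens):
    """1st normalization step: remove punctuation and lowercase."""
    cleaned = (t.translate(_PUNCT_TABLE).lower() for t in tokens)
    return [t for t in cleaned if t]


def tag(tokens, dictionary):
    """Semantic tagger: dictionary lookup per token.

    Assumption: tokens missing from the dictionary become 'Unimportant'.
    """
    return [(t, dictionary.get(t, "Unimportant")) for t in tokens]


def drop_unimportant(tagged):
    """2nd normalization step: remove tokens tagged 'Unimportant'."""
    return [(t, g) for t, g in tagged if g != "Unimportant"]


def process(text, dictionary=TOY_DICTIONARY):
    """Run the sketched steps and return one tagged token list per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = normalize(split_words(sentence))
        result.append(drop_unimportant(tag(tokens, dictionary)))
    return result


if __name__ == "__main__":
    report = "Patient developed Fever, and a rash. Symptoms began 2 days after vaccination."
    for tagged_sentence in process(report):
        print(tagged_sentence)
```

Splitting on every period is the simplest reading of the sentence tokenizer row; it would also split decimals and abbreviations, which a production tokenizer would need to handle.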