Table 1.
Handcrafted features extracted for the ADME task.
Feature name | Description |
---|---|
PoS<sup>a</sup> | PoS information generated in the preprocessing step was extracted for the current word. |
Context words<sup>a</sup> | A context window of 7 was set to extract the current word and its surrounding words. |
Chunk | The chunk information of the current word was extracted. |
Morphological features<sup>a</sup> | Morphological features such as prefixes and suffixes, as defined in our previous work,<sup>28</sup> were extracted; these were empirically shown to provide clues for classifying concept types. |
Orthographic features<sup>a</sup> | The orthographic features defined in our previous work<sup>28</sup> were extracted; these were empirically shown to help detect patterns of named entities. |
Common medical abbreviations | Whether the current word matched one of the common medical abbreviations defined in the annotation guideline.<sup>30</sup> The following list of names and the corresponding entity types was used: |
ADE, drug and disease dictionary features<sup>a</sup> | The dictionaries used in our previous work<sup>31</sup> were encoded based on the occurrence encoding presented in our previous work.<sup>29</sup> The encoded information of the current word was extracted. |
Word cluster features<sup>a</sup> | The number of the cluster to which the current word belongs was extracted as a feature. The clusters were generated by applying the k-means algorithm to the word embedding vectors. |
ADE: adverse drug event; ADME: adverse drug events and medication extraction; PoS: part of speech.
<sup>a</sup>Feature also used by other top-ranked teams.
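
Two of the features in Table 1, the context words and the word cluster number, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a window of 7 means the current word plus three words on each side, that positions outside the sentence are filled with a hypothetical `<PAD>` token, and that k-means is initialised from the first k vectors so that the result is deterministic. All function names, the feature keys, and the toy embedding values are made up for illustration.

```python
from typing import Dict, List, Sequence

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def context_window_features(tokens: Sequence[str], i: int, size: int = 7) -> Dict[str, str]:
    """Map relative positions to words in a symmetric window centred on tokens[i].

    Assumption: size=7 means the current word plus 3 words on each side.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd for a symmetric window")
    half = size // 2
    feats: Dict[str, str] = {}
    for offset in range(-half, half + 1):
        j = i + offset
        feats[f"w[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else PAD
    return feats


def _nearest(vec: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])))


def kmeans_cluster_ids(vectors: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd-style k-means; returns one cluster number per input vector.

    Assumption: centroids start at the first k vectors (deterministic toy
    initialisation); a real system would likely use a library implementation.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        assign = [_nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [_nearest(v, centroids) for v in vectors]


# Usage with a toy sentence and toy 2-dimensional "embeddings" (illustrative values only).
tokens = ["Patient", "developed", "rash", "after", "penicillin"]
feats = context_window_features(tokens, 2)  # window centred on "rash"

embeddings = {
    "rash": [0.0, 0.0],
    "aspirin": [10.0, 10.0],
    "hives": [0.0, 1.0],
    "ibuprofen": [10.0, 11.0],
}
ids = kmeans_cluster_ids(list(embeddings.values()), k=2)
word_to_cluster = dict(zip(embeddings, ids))
# Words without an embedding get the placeholder cluster number -1.
feats["cluster"] = str(word_to_cluster.get(tokens[2], -1))
```

Keying each context word by its relative offset lets a sequence labeller learn position-specific weights. The cluster number is treated as a categorical value, not a numeric one, which is why it is stored as a string.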