2019 Jul 23;27(1):47–55. doi: 10.1093/jamia/ocz120

Table 1.

Handcrafted features extracted for the ADME task.

Feature name | Description
PoS^a | The PoS tag generated for the current word in the preprocessing step.
Context words^a | The current word and its surrounding words, extracted with a context window of 7.
Chunk | The chunk information of the current word.
Morphological features^a | The morphological features, such as prefixes and suffixes, defined in our previous work [28], which were empirically shown to provide clues for classifying the type of concepts.
Orthographic features^a | The orthographic features defined in our previous work [28], which were empirically shown to be able to detect patterns of named entities.
Common medical abbreviations | Whether the current word matched one of the common medical abbreviations defined in the annotation guideline [30]. The following list of names and corresponding entity types was used:
  • Route: IV, PO, Gtt, drip(s), Inhalation, Topical
  • Drug: IVF(s), PRBC(s)
  • Frequency: PRN, QD, bid
ADE, drug and disease dictionary features^a | The dictionaries used in our previous work [31] were encoded with the occurrence encoding presented in our previous work [29]. The encoded information of the current word was extracted.
Word cluster features^a | The number of the cluster to which the current word belongs. The clusters were generated by applying the k-means algorithm to the word embedding vectors.

ADE: adverse drug event; ADME: adverse drug events and medication extraction; PoS: part of speech.

^a Feature also used by other top-ranked teams.
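Three of the features in Table 1 (context words, morphological affixes, and the common medical abbreviation lookup) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `token_features`, the padding token, the symmetric reading of the window of 7 (three words on each side of the current word), the affix lengths of 1 to 3 characters, the case-insensitive matching, and the expansion of "(s)" forms into singular and plural entries are all assumptions made for illustration. Only the abbreviation list itself is taken from the table.

```python
from typing import Dict, List

# Abbreviation list copied from Table 1. Expanding "(s)" into singular and
# plural entries and lowercasing the keys are assumptions of this sketch.
ABBREVIATIONS: Dict[str, str] = {
    "iv": "Route", "po": "Route", "gtt": "Route",
    "drip": "Route", "drips": "Route",
    "inhalation": "Route", "topical": "Route",
    "ivf": "Drug", "ivfs": "Drug", "prbc": "Drug", "prbcs": "Drug",
    "prn": "Frequency", "qd": "Frequency", "bid": "Frequency",
}

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def token_features(tokens: List[str], i: int,
                   window: int = 7, max_affix: int = 3) -> Dict[str, str]:
    """Build a feature dictionary for tokens[i] (hypothetical helper)."""
    half = window // 2  # assumption: symmetric window, 3 words on each side
    feats: Dict[str, str] = {}

    # Context words: the current word plus its neighbours inside the window.
    for offset in range(-half, half + 1):
        j = i + offset
        in_range = 0 <= j < len(tokens)
        feats[f"w[{offset}]"] = tokens[j].lower() if in_range else PAD

    # Morphological features: prefixes and suffixes up to max_affix characters
    # (the lengths are an assumption; the paper defers to its reference 28).
    word = tokens[i]
    for n in range(1, max_affix + 1):
        if len(word) >= n:
            feats[f"prefix{n}"] = word[:n].lower()
            feats[f"suffix{n}"] = word[-n:].lower()

    # Abbreviation lookup: entity type if the word is listed, otherwise "O".
    feats["abbr"] = ABBREVIATIONS.get(word.lower(), "O")
    return feats


tokens = "Started heparin IV PRN for pain".split()
feats = token_features(tokens, 2)  # features for the token "IV"
print(feats["abbr"])  # Route
```

A dictionary of string-valued features per token is the input shape commonly accepted by sequence labelers such as conditional random fields, which is why the sketch returns one rather than a fixed-length vector.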