. 2017 Jul 5;2017:4898963. doi: 10.1155/2017/4898963

Table 4.

List of various features for the drug name recognizer.

Feature set Features Description
F1-1 CWS = 1:
1-gram = $C_{i-1}, C_i, C_{i+1}$
2-gram = $C_{i-1}C_i, C_iC_{i+1}$
3-gram = $C_{i-1}C_iC_{i+1}$
The 1-gram, 2-gram, and 3-gram of the character text at CWS = 1
F1-2 CWS = 2:
1-gram = $C_{i-2}, C_{i-1}, C_i, C_{i+1}, C_{i+2}$
2-gram = $C_{i-2}C_{i-1}, C_{i-1}C_i, C_iC_{i+1}, C_{i+1}C_{i+2}$
3-gram = $C_{i-2}C_{i-1}C_i, C_{i-1}C_iC_{i+1}, C_iC_{i+1}C_{i+2}$
The 1-gram, 2-gram, and 3-gram of the character text at CWS = 2
F1-3 CWS = 3:
1-gram = $C_{i-3}, C_{i-2}, C_{i-1}, C_i, C_{i+1}, C_{i+2}, C_{i+3}$
2-gram = $C_{i-3}C_{i-2}, C_{i-2}C_{i-1}, C_{i-1}C_i, C_iC_{i+1}, C_{i+1}C_{i+2}, C_{i+2}C_{i+3}$
3-gram = $C_{i-3}C_{i-2}C_{i-1}, C_{i-2}C_{i-1}C_i, C_{i-1}C_iC_{i+1}, C_iC_{i+1}C_{i+2}, C_{i+1}C_{i+2}C_{i+3}$
The 1-gram, 2-gram, and 3-gram of the character text at CWS = 3
F1-4 CWS = 1:
1-gram = $P_{i-1}, P_i, P_{i+1}$
2-gram = $P_{i-1}P_i, P_iP_{i+1}$
3-gram = $P_{i-1}P_iP_{i+1}$
The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 1
F1-5 CWS = 2:
1-gram = $P_{i-2}, P_{i-1}, P_i, P_{i+1}, P_{i+2}$
2-gram = $P_{i-2}P_{i-1}, P_{i-1}P_i, P_iP_{i+1}, P_{i+1}P_{i+2}$
3-gram = $P_{i-2}P_{i-1}P_i, P_{i-1}P_iP_{i+1}, P_iP_{i+1}P_{i+2}$
The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 2
F1-6 CWS = 3:
1-gram = $P_{i-3}, P_{i-2}, P_{i-1}, P_i, P_{i+1}, P_{i+2}, P_{i+3}$
2-gram = $P_{i-3}P_{i-2}, P_{i-2}P_{i-1}, P_{i-1}P_i, P_iP_{i+1}, P_{i+1}P_{i+2}, P_{i+2}P_{i+3}$
3-gram = $P_{i-3}P_{i-2}P_{i-1}, P_{i-2}P_{i-1}P_i, P_{i-1}P_iP_{i+1}, P_iP_{i+1}P_{i+2}, P_{i+1}P_{i+2}P_{i+3}$
The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 3
F2-1 InDictTCM Are the current character and the surrounding characters contained in the TCM dictionary?
F2-2 InDictTCMPinyin Are the pinyins corresponding to the current character and the surrounding characters contained in the TCM dictionary?
F2-3 InDictWM Are the current character and the surrounding characters contained in the WM dictionary?
F2-4 InDictWMPinyin Are the pinyins corresponding to the current character and the surrounding characters contained in the WM dictionary?
F3-1 CurCx-HasTCMDoseUnit Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a TCM dosage unit?
F3-2 CurCx-HasWMDoseUnit Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a WM dosage unit?
F3-3 PreCx-HasTCMDoseUnit Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a TCM dosage unit?
F3-4 PreCx-HasWMDoseUnit Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a WM dosage unit?
F3-5 CurCx-HasTCMRoute Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a TCM usage term?
F3-6 CurCx-HasWMRoute Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a WM usage term?
F3-7 PreCx-HasTCMRoute Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a TCM usage term?
F3-8 PreCx-HasWMRoute Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a WM usage term?
F3-9 CurCx-HasTCMFormUnit Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a TCM drug form unit?
F3-10 CurCx-HasWMFormUnit Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a WM drug form unit?
F3-11 PreCx-HasTCMFormUnit Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a TCM drug form unit?
F3-12 PreCx-HasWMFormUnit Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a WM drug form unit?
F3-13 CurCx-HasTCMFrequency Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a TCM frequency description?
F3-14 CurCx-HasWMFrequency Do the current and subsequent characters (x = {i, i + 1, i + 2, i + 3}, CWS = 3) contain a WM frequency description?
F3-15 PreCx-HasTCMFrequency Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a TCM frequency description?
F3-16 PreCx-HasWMFrequency Do the characters preceding the current character (x = {i − 1, i − 2, i − 3}, CWS = 3) contain a WM frequency description?
F4-1 HasNum9 Do the current character and the surrounding characters include the figure “9”?
F4-2 HasToken@ Do the current character and the surrounding characters include the symbol “@”?
F4-3 HasEnglishAlphabets Do the current character and the surrounding characters include English letters?
F4-4 HasTime Do the current character and the surrounding characters contain a time description, such as an hour, week, date, or year?
F5 InListSectionName Does the name of the AN section containing the current character and the surrounding characters appear in the predefined section list?
F6 Class_x ∈ {B, I, O} These three features give the BIO labels of the three characters preceding the current character, x = {i − 1, i − 2, i − 3}
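The window n-gram templates in F1-1 to F1-6 can be generated mechanically. Below is a minimal sketch, not the authors' implementation. It assumes two things: every n-gram must lie entirely inside the window [i − CWS, i + CWS], and positions beyond the sentence are filled with a hypothetical `<PAD>` symbol. The function name `window_ngrams` and the feature-key format are illustrative only. For the pinyin features (F1-4 to F1-6), the same function is applied to a list of pinyin syllables, one per character. The character-to-pinyin conversion is not shown; the example syllables are written out by hand.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical filler for positions outside the sentence


def window_ngrams(tokens: List[str], i: int, cws: int, prefix: str = "C") -> Dict[str, str]:
    """1-, 2- and 3-gram features around position i.

    Every n-gram lies entirely inside the window [i - cws, i + cws];
    the key records the n-gram order and its start offset relative to i.
    """
    def tok(j: int) -> str:
        return tokens[j] if 0 <= j < len(tokens) else PAD

    feats: Dict[str, str] = {}
    for n in (1, 2, 3):
        # start offsets s satisfy -cws <= s and s + n - 1 <= cws
        for s in range(-cws, cws - n + 2):
            feats[f"{prefix}{n}g[{s}]"] = "".join(tok(i + s + k) for k in range(n))
    return feats


if __name__ == "__main__":
    chars = list("阿莫西林胶囊")  # "amoxicillin capsules"
    print(window_ngrams(chars, 2, 1))
    # {'C1g[-1]': '莫', 'C1g[0]': '西', 'C1g[1]': '林', 'C2g[-1]': '莫西', 'C2g[0]': '西林', 'C3g[-1]': '莫西林'}

    # Pinyin features reuse the same function on one (toneless) syllable per character.
    pinyin = ["a", "mo", "xi", "lin", "jiao", "nang"]
    print(window_ngrams(pinyin, 2, 1, prefix="P"))
```

At CWS = 3 this yields 7 + 6 + 5 = 18 features per position. Each larger window also contains every template of the smaller ones.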
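The dictionary features (F2) and the context lexicon features (F3) reduce to substring lookups over fixed spans of characters. The sketch below rests on loudly stated assumptions:

- The lexicons are tiny hypothetical examples; the real system would use curated TCM and WM term lists.
- An F2 flag fires when some substring of the window that covers position i is a dictionary entry.
- A CurCx flag fires when a term occurs anywhere in $C_i \dots C_{i+3}$.
- A PreCx flag fires when a term occurs anywhere in $C_{i-3} \dots C_{i-1}$.

All names (`LEXICONS`, `in_dict`, `context_flags`) are illustrative.

```python
from typing import Dict, List, Set

# Tiny hypothetical lexicons; a real system would load curated TCM/WM term lists.
LEXICONS: Dict[str, Set[str]] = {
    "TCMDoseUnit": {"克", "钱"},
    "WMDoseUnit": {"mg", "ml"},
    "TCMRoute": {"煎服"},
    "WMRoute": {"口服", "静滴"},
}


def span(chars: List[str], lo: int, hi: int) -> str:
    """Join chars[lo..hi] (inclusive), clipped to the sentence bounds."""
    lo, hi = max(lo, 0), min(hi, len(chars) - 1)
    return "".join(chars[lo:hi + 1]) if lo <= hi else ""


def in_dict(chars: List[str], i: int, cws: int, lexicon: Set[str]) -> bool:
    """F2-style flag: is some substring of the window that covers position i a lexicon entry?"""
    for lo in range(max(i - cws, 0), i + 1):
        for hi in range(i, min(i + cws, len(chars) - 1) + 1):
            if "".join(chars[lo:hi + 1]) in lexicon:
                return True
    return False


def context_flags(chars: List[str], i: int, cws: int = 3) -> Dict[str, bool]:
    """F3-style flags: does a term occur in C_i..C_{i+cws} (Cur) or C_{i-cws}..C_{i-1} (Pre)?"""
    cur, pre = span(chars, i, i + cws), span(chars, i - cws, i - 1)
    feats: Dict[str, bool] = {}
    for name, lexicon in LEXICONS.items():
        feats[f"CurCx-Has{name}"] = any(term in cur for term in lexicon)
        feats[f"PreCx-Has{name}"] = any(term in pre for term in lexicon)
    return feats


if __name__ == "__main__":
    chars = list("黄芪30克煎服")  # "astragalus 30 g, take as a decoction"
    print(in_dict(chars, 1, 3, {"黄芪"}))  # True
    print(context_flags(chars, 2)["CurCx-HasTCMDoseUnit"])  # True: "30克煎" contains "克"
```

The pinyin variants (F2-2, F2-4) could reuse `in_dict` on pinyin syllables against a dictionary of joined pinyin strings. Form-unit and frequency lexicons (F3-9 to F3-16) would simply be further entries in `LEXICONS`.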
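The surface-pattern features (F4) and the label-history features (F6) are similarly simple. This is a hedged sketch under stated assumptions:

- The table does not define "surrounding characters" for F4, so a ±3 window is assumed.
- HasNum9 is implemented literally as described, as a test for the digit "9".
- `TIME_WORDS` is a hypothetical, non-exhaustive list.
- Positions before the sentence start get a hypothetical `<S>` label.

For history features of this kind, the previous labels typically come from gold annotations during training and from the tagger's own earlier predictions during left-to-right decoding.

```python
import re
from typing import Dict, List

# Hypothetical, non-exhaustive time words (hour, week, day/date, month, year).
TIME_WORDS = ("小时", "周", "星期", "日", "月", "年")


def surface_flags(chars: List[str], i: int, cws: int = 3) -> Dict[str, bool]:
    """F4-style flags computed over the window C_{i-cws}..C_{i+cws}."""
    lo, hi = max(i - cws, 0), min(i + cws, len(chars) - 1)
    window = "".join(chars[lo:hi + 1])
    return {
        "HasNum9": "9" in window,
        "HasToken@": "@" in window,
        "HasEnglishAlphabets": re.search(r"[A-Za-z]", window) is not None,
        "HasTime": any(word in window for word in TIME_WORDS),
    }


def label_history(labels: List[str], i: int, depth: int = 3) -> Dict[str, str]:
    """F6-style features: BIO labels already assigned to C_{i-1}..C_{i-depth}.

    Positions before the sentence start get the placeholder "<S>".
    """
    return {f"Class[i-{k}]": labels[i - k] if i - k >= 0 else "<S>"
            for k in range(1, depth + 1)}


if __name__ == "__main__":
    chars = list("Q8h，共3日")  # "every 8 hours, 3 days in total"
    print(surface_flags(chars, 5))
    # {'HasNum9': False, 'HasToken@': False, 'HasEnglishAlphabets': True, 'HasTime': True}
    print(label_history(["B", "I", "O"], 3))
    # {'Class[i-1]': 'O', 'Class[i-2]': 'I', 'Class[i-3]': 'B'}
```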