Table 4.
Feature set | Features | Description |
---|---|---|
F1-1 | CWS = 1: 1-gram = Ci−1, Ci, Ci+1; 2-gram = Ci−1Ci, CiCi+1; 3-gram = Ci−1CiCi+1 | The 1-gram, 2-gram, and 3-gram of the character text at CWS = 1 |
F1-2 | CWS = 2: 1-gram = Ci−2, Ci−1, Ci, Ci+1, Ci+2; 2-gram = Ci−2Ci−1, Ci−1Ci, CiCi+1, Ci+1Ci+2; 3-gram = Ci−2Ci−1Ci, Ci−1CiCi+1, CiCi+1Ci+2 | The 1-gram, 2-gram, and 3-gram of the character text at CWS = 2 |
F1-3 | CWS = 3: 1-gram = Ci−3, Ci−2, Ci−1, Ci, Ci+1, Ci+2, Ci+3; 2-gram = Ci−3Ci−2, Ci−2Ci−1, Ci−1Ci, CiCi+1, Ci+1Ci+2, Ci+2Ci+3; 3-gram = Ci−3Ci−2Ci−1, Ci−2Ci−1Ci, Ci−1CiCi+1, CiCi+1Ci+2, Ci+1Ci+2Ci+3 | The 1-gram, 2-gram, and 3-gram of the character text at CWS = 3 |
F1-4 | CWS = 1: 1-gram = Pi−1, Pi, Pi+1; 2-gram = Pi−1Pi, PiPi+1; 3-gram = Pi−1PiPi+1 | The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 1 |
F1-5 | CWS = 2: 1-gram = Pi−2, Pi−1, Pi, Pi+1, Pi+2; 2-gram = Pi−2Pi−1, Pi−1Pi, PiPi+1, Pi+1Pi+2; 3-gram = Pi−2Pi−1Pi, Pi−1PiPi+1, PiPi+1Pi+2 | The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 2 |
F1-6 | CWS = 3: 1-gram = Pi−3, Pi−2, Pi−1, Pi, Pi+1, Pi+2, Pi+3; 2-gram = Pi−3Pi−2, Pi−2Pi−1, Pi−1Pi, PiPi+1, Pi+1Pi+2, Pi+2Pi+3; 3-gram = Pi−3Pi−2Pi−1, Pi−2Pi−1Pi, Pi−1PiPi+1, PiPi+1Pi+2, Pi+1Pi+2Pi+3 | The 1-gram, 2-gram, and 3-gram of the pinyin corresponding to the current character at CWS = 3 |
F2-1 | InDictTCM | Are the current character and the surrounding characters contained in the TCM dictionary? |
F2-2 | InDictTCMPinyin | Are the pinyins corresponding to the current character and the surrounding characters contained in the TCM dictionary? |
F2-3 | InDictWM | Are the current character and the surrounding characters contained in the WM dictionary? |
F2-4 | InDictWMPinyin | Are the pinyins corresponding to the current character and the surrounding characters contained in the WM dictionary? |
F3-1 | CurCx-HasTCMDoseUnit | Do the current character and subsequent characters contain the TCM dosage unit x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-2 | CurCx-HasWMDoseUnit | Do the current character and subsequent characters contain the WM dosage unit x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-3 | PreCx-HasTCMDoseUnit | Do the characters before the current character contain the TCM dosage unit x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-4 | PreCx-HasWMDoseUnit | Do the characters before the current character contain the WM dosage unit x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-5 | CurCx-HasTCMRoute | Do the current character and subsequent characters contain the TCM usage term x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-6 | CurCx-HasWMRoute | Do the current character and subsequent characters contain the WM usage term x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-7 | PreCx-HasTCMRoute | Do the characters before the current character contain the TCM usage term x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-8 | PreCx-HasWMRoute | Do the characters before the current character contain the WM usage term x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-9 | CurCx-HasTCMFormUnit | Do the current character and subsequent characters contain the TCM drug form unit x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-10 | CurCx-HasWMFormUnit | Do the current character and subsequent characters contain the WM drug form unit x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-11 | PreCx-HasTCMFormUnit | Do the characters before the current character contain the TCM drug form unit x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-12 | PreCx-HasWMFormUnit | Do the characters before the current character contain the WM drug form unit x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-13 | CurCx-HasTCMFrequency | Do the current character and subsequent characters contain the TCM frequency description x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-14 | CurCx-HasWMFrequency | Do the current character and subsequent characters contain the WM frequency description x = {i, i + 1, i + 2, i + 3} at CWS = 3? |
F3-15 | PreCx-HasTCMFrequency | Do the characters before the current character contain the TCM frequency description x = {i − 1, i − 2, i − 3} at CWS = 3? |
F3-16 | PreCx-HasWMFrequency | Do the characters before the current character contain the WM frequency description x = {i − 1, i − 2, i − 3} at CWS = 3? |
F4-1 | HasNum9 | Do the current character and the surrounding characters include the figure “9”? |
F4-2 | HasToken@ | Do the current character and the surrounding characters include the symbol “@”? |
F4-3 | HasEnglishAlphabets | Do the current character and the surrounding characters include English letters? |
F4-4 | HasTime | Do the current character and the surrounding characters contain a time description such as hour, week, date, or year? |
F5 | InListSectionName | Does the name of the AN section containing the current character and the surrounding characters appear in the predefined section list? |
F6 | Classx = [BIO] | These three features give the BIO type labels of the 3 characters before the current character, x = {i − 1, i − 2, i − 3} |
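
As an illustration of how the window n-gram templates (F1-1 to F1-6) and the label context (F6) in Table 4 might be computed, the following Python sketch enumerates every contiguous 1-, 2-, and 3-gram that fits inside the window [i − CWS, i + CWS]. The function names, the padding symbol, and the feature-name format are assumptions made for illustration; they are not taken from the original system. The same function applies to a pinyin sequence if a list of pinyin syllables is passed instead of characters.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # hypothetical filler for positions outside the sentence


def window_ngrams(seq: List[str], i: int, cws: int,
                  max_n: int = 3) -> Dict[str, Tuple[str, ...]]:
    """All contiguous 1- to max_n-grams inside the window [i - cws, i + cws]."""
    feats: Dict[str, Tuple[str, ...]] = {}
    for n in range(1, max_n + 1):
        # Start offsets are limited so that each n-gram stays inside the window.
        for start in range(-cws, cws - n + 2):
            items = tuple(
                seq[i + o] if 0 <= i + o < len(seq) else PAD
                for o in range(start, start + n)
            )
            feats[f"{n}gram[{start:+d}..{start + n - 1:+d}]"] = items
    return feats


def previous_labels(labels: List[str], i: int, k: int = 3) -> Dict[str, str]:
    """BIO labels of the k positions before i (F6), padded at the sentence start."""
    return {
        f"Class[-{d}]": labels[i - d] if i - d >= 0 else PAD
        for d in range(1, k + 1)
    }


if __name__ == "__main__":
    chars = list("阿司匹林片")
    feats = window_ngrams(chars, i=2, cws=1)
    print(len(feats))               # 6 templates at CWS = 1, as in row F1-1
    print(feats["2gram[-1..+0]"])   # ('司', '匹')
```

With this enumeration, CWS = 1, 2, and 3 yield 6, 12, and 18 n-gram templates respectively, which matches the number of entries listed in rows F1-1 to F1-3.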