Table 2.
Feature set (a) | Matching | Step 1: Boundary detection, Precision | Step 1: Boundary detection, Recall | Step 1: Boundary detection, F1-score | Steps 1 + 2: Boundary detection + Classification, Precision | Steps 1 + 2: Boundary detection + Classification, Recall | Steps 1 + 2: Boundary detection + Classification, F1-score |
---|---|---|---|---|---|---|---|
BOW | Exact | 0.8284 | 0.6661 | 0.7384 | 0.7917 | 0.6363 | 0.7054 |
BOW | Inexact | 0.9411 | 0.8137 | 0.8728 | 0.8715 | 0.7536 | 0.8083 |
BOW + POS + Lemma | Exact | 0.8687 | 0.7393 | 0.7988 | 0.8342 | 0.7100 | 0.7671 |
BOW + POS + Lemma | Inexact | 0.9480 | 0.8325 | 0.8865 | 0.8894 | 0.7811 | 0.8317 |
BOW + POS + Lemma + UMLS | Exact | 0.8644 | 0.7574 | 0.8073 | 0.8341 | 0.7309 | 0.7791 |
BOW + POS + Lemma + UMLS | Inexact | 0.9445 | 0.8541 | 0.8970 | 0.8836 | 0.7991 | 0.8392 |
BOW + POS + Lemma + UMLS + BC | Exact | 0.8682 | 0.7661 | 0.8137 | 0.8382 | 0.7400 | 0.7861 |
BOW + POS + Lemma + UMLS + BC | Inexact | 0.9491 | 0.8558 | 0.8978 | 0.8866 | 0.8037 | 0.8432 |

Entity classes | Precision (inexact) | Precision (exact) | Recall (inexact) | Recall (exact) | F1 score (inexact) | F1 score (exact) |
---|---|---|---|---|---|---|
*Baseline – CliNER (Problem class) | 0.3692 | 0.3421 | 0.4809 | 0.4140 | 0.4177 | 0.3746 |
*Baseline – EliXR (Disorder group) | 0.6402 | 0.4289 | 0.8138 | 0.7089 | 0.7176 | 0.5345 |
Condition | 0.9071 | 0.8566 | 0.8788 | 0.8209 | 0.8927 | 0.8384 |
Observation | 0.8397 | 0.8169 | 0.7378 | 0.6760 | 0.7855 | 0.7398 |
Procedure/Device | 0.8817 | 0.7951 | 0.6581 | 0.6110 | 0.7537 | 0.6910 |
Drug/Substance | 0.9027 | 0.8573 | 0.7287 | 0.7179 | 0.8064 | 0.7814 |
Qualifier/Modifier | 0.8807 | 0.8505 | 0.7412 | 0.7253 | 0.8049 | 0.7829 |
Temporal Constraints | 0.8808 | 0.8045 | 0.8239 | 0.7254 | 0.8514 | 0.7629 |
Measurement | 0.8984 | 0.8101 | 0.8401 | 0.7168 | 0.8683 | 0.7606 |
Overall | 0.8866 | 0.8382 | 0.8037 | 0.7400 | 0.8432 | 0.7861 |
(a) Feature notation: BOW: bag of words; POS: part of speech; BC: Brown clustering. The upper table describes the general performance with different feature sets. The lower table shows the detailed results for each class using the best feature set (BOW + POS + Lemma + UMLS + BC).
*We use two baselines: the performance of the “problem” entity class in CliNER, and the concepts identified by EliXR that belong to UMLS disorder semantic types. Both baselines are compared with the performance of the “Condition” entity class in EliIE. The full list of semantic types included is: T020, T190, T049, T019, T047, T050, T033, T037, T048, T191, T046, T184. The bold values for the feature set (BOW + POS + Lemma + UMLS + BC) show that the overall best performance was achieved by combining all the features. The bold values for the Procedure/Device entity class show that, because this class occurs less often in the trials, it has the worst performance of all the entity classes, with an F1 score of 0.69. The bold values for Overall indicate that, with the best setting (BOW + POS + Lemma + UMLS + BC), the system achieves an overall precision, recall, and F1 score of 0.84, 0.74, and 0.79, respectively.
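
The F1 columns in Table 2 can be cross-checked against the precision and recall columns. The sketch below assumes the standard balanced F1 score (the harmonic mean of precision and recall), which the table does not state explicitly; the function name `f1_score` and the choice of rows are illustrative, and the numbers are copied from the inexact-matching columns of the lower table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is predicted or found
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for inexact matching, lower table of Table 2
reported = {
    "Condition": (0.9071, 0.8788, 0.8927),
    "Drug/Substance": (0.9027, 0.7287, 0.8064),
    "Overall": (0.8866, 0.8037, 0.8432),
}

for name, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    # Small differences are expected because the table values are rounded.
    print(f"{name}: computed {computed:.4f}, reported {f1:.4f}")
```

Because the published precision and recall are themselves rounded to four decimals, a recomputed F1 may differ from the reported one in the last digit.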