2014 Oct 21;22(e1):e48–e66. doi: 10.1136/amiajnl-2014-002868

Table 6:

Contribution of each feature to the best system compilation using cross-validation with 120 reports for training

Removed feature \ category  Macro-averaged over ICCCO   Iccco   iCcco   icCco   iccCo   icccO      NA

Difference in precision [%]
Word                                            −1.09   −0.57   −1.77   −3.16   −1.39   −4.49   −0.07
Lemma                                           −0.93    0.00    0.31   −2.58   −0.93   −4.36   −0.38
NER                                             −0.91   −1.31    6.69   −2.26   −0.82   −7.14   −0.50
POS                                             −1.07   −1.98   −3.12   −7.11   −1.22   −9.09   −1.15
Parse tree                                      −0.43   −2.60    4.85    3.63   −1.73   −7.14   −0.13
Basic dependents                                 0.35   −0.76    0.72    0.51   −0.78    0.00   −0.19
Basic governors                                 −1.98   −0.92    1.51    0.33   −1.89  −14.64    0.34
Phrase context                                  −2.46   −0.34    0.05   −0.55   −3.60   −9.40   −0.65
Top 5 candidates                                −0.48    0.00   −2.07    0.73   −1.14   −9.09   −0.26
Top mapping                                      0.06   −1.15    5.22   −0.15   −0.24    0.00    0.63
Location percentage                              0.35   −0.79   20.47    7.36    0.32   −4.36   −0.24
Medication score                                 0.11   −0.57    4.63    1.00    0.20   −4.36   −0.08
SNOMED-CT-AU IDs                                −0.34   −0.75    0.73   −3.90   −0.42  −12.69    0.18

Difference in recall [%]
Word                                            −1.11    0.00   −4.94   −0.66   −1.42    0.00   −0.01
Lemma                                           −0.85    0.00   −4.94   −3.81   −0.93   −2.12   −0.37
NER                                             −1.52   −0.94   −0.10   −2.53   −0.73   −2.12   −0.12
POS                                             −2.54   −0.38   −4.11   −6.31   −2.44   −2.10    0.43
Parse tree                                       2.18    0.51    4.72    3.95    1.36   −5.82   −1.50
Basic dependents                                 0.02   −0.27    0.28    0.45   −0.68    0.00    0.00
Basic governors                                 −2.60    0.93   −0.04    0.60   −4.55   −1.99    0.31
Phrase context                                  −2.21   −0.53   −2.33    0.00   −3.38    0.00   −0.57
Top 5 candidates                                −1.74    0.69   −1.81   −4.51   −1.96   −6.06    0.68
Top mapping                                      0.08    0.95    1.64    1.40    0.22    0.00    0.30
Location percentage                             −2.35   −0.31   16.10    3.20   −2.01   −2.12    1.97
Medication score                                −0.11    0.69    0.49    1.07   −0.22   −2.12   −0.18
SNOMED-CT-AU IDs                                −0.31   −0.50   −5.54    0.14    0.85   −7.68   −0.03

Difference in F1 [%]
Word                                            −1.10   −0.27   −3.86   −1.43   −1.42   −1.02   −0.05
Lemma                                           −0.88    0.00   −3.14   −3.70   −0.94   −2.85   −0.38
NER                                             −1.23   −1.11    2.38   −2.61   −0.79   −3.47   −0.33
POS                                             −1.87   −1.14   −3.80   −6.93   −1.75   −3.92   −0.44
Parse tree                                       0.94   −0.98    4.85    4.06   −0.48   −6.88   −0.78
Basic dependents                                 0.17   −0.50    0.46    0.49   −0.75    0.00   −0.11
Basic governors                                 −2.31    0.05    0.55    0.55   −3.05   −5.33    0.32
Phrase context                                  −2.33   −0.44   −1.48   −0.16   −3.54   −2.29   −0.62
Top 5 candidates                                −1.16    0.37   −1.93   −3.41   −1.50   −7.48    0.16
Top mapping                                      0.07   −0.05    3.02    1.01   −0.05    0.00    0.47
Location percentage                             −1.13   −0.54   18.02    4.58   −0.68   −2.85    0.76
Medication score                                 0.00    0.09    2.04    1.11    0.02   −2.85   −0.13
SNOMED-CT-AU IDs                                −0.32   −0.61   −3.40   −1.07    0.10   −9.71    0.08

Negative values indicate that removing the feature decreases performance; the larger the absolute value, the more the feature contributes to the performance of the best compilation. Positive values indicate that the feature does not contribute to performance; the larger the value, the more harmful the feature. NA refers to the category for irrelevant text. Minimum and maximum values are shown in bold in the original table.

POS, part of speech.
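
The reading rule in the table note (negative difference: the feature helps; positive difference: the feature hurts) can be applied mechanically. The following is a minimal sketch, not the authors' analysis code: it copies the macro-averaged F1 differences from the table above and ranks the features by them. The names `f1_diff`, `rank_by_contribution`, and `split_by_sign` are hypothetical and introduced here only for illustration.

```python
# Macro-averaged F1 differences [%], copied from the "Difference in F1"
# block of Table 6 (first numeric column). ASCII minus signs are used here.
f1_diff = {
    "Word": -1.10,
    "Lemma": -0.88,
    "NER": -1.23,
    "POS": -1.87,
    "Parse tree": 0.94,
    "Basic dependents": 0.17,
    "Basic governors": -2.31,
    "Phrase context": -2.33,
    "Top 5 candidates": -1.16,
    "Top mapping": 0.07,
    "Location percentage": -1.13,
    "Medication score": 0.00,
    "SNOMED-CT-AU IDs": -0.32,
}


def rank_by_contribution(diffs):
    """Sort features from most helpful (most negative) to most harmful."""
    return sorted(diffs.items(), key=lambda item: item[1])


def split_by_sign(diffs):
    """Group feature names by the sign of their difference (hypothetical helper)."""
    helpful = [name for name, d in diffs.items() if d < 0]
    harmful = [name for name, d in diffs.items() if d > 0]
    neutral = [name for name, d in diffs.items() if d == 0]
    return helpful, harmful, neutral


if __name__ == "__main__":
    for name, d in rank_by_contribution(f1_diff):
        print(f"{name:<20}{d:+.2f}")
```

Under this reading, the features whose removal costs the most macro-averaged F1 sort to the top of the list, and those with a positive difference sort to the bottom. The same functions can be applied to any other column of the table by swapping in its values.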