2016 Nov 9;7(4):1051–1068. doi: 10.4338/ACI-2016-08-RA-0129

Table 3.

Measures of importance for the word stems and bigrams. The ten terms with the most positive and negative coefficients from the regularized logistic regression model are shown alongside the Gini importance measure from the random forest model, which approximates the permutation importance of the variable.

| Word Stem or Bigram | Coefficient (Regularized Logistic Regression) | Gini Importance (Random Forest) | Percent of Documents |
|---|---|---|---|
| longbon* | 3.00 | 61.76 | 52.1% |
| fractur | 1.79 | 18.71 | 66.0% |
| close reduct | 1.44 | 0.52 | 2.1% |
| cast | 1.06 | 7.75 | 12.4% |
| distal | 0.89 | 32.41 | 40.1% |
| distal forearm | 0.79 | 0.70 | 1.3% |
| through | 0.72 | 3.28 | 8.6% |
| metaphysi | 0.72 | 1.49 | 4.4% |
| angul | 0.66 | 16.78 | 19.4% |
| buckl | 0.62 | 1.95 | 4.0% |
| left elbow | 0.58 | 0.16 | 1.2% |
| fractur disloc | -0.61 | 1.57 | 5.8% |
| normal | -0.73 | 11.73 | 45.9% |
| handbon* | -0.73 | 2.36 | 5.9% |
| injuri | -0.77 | 0.51 | 1.4% |
| no visibl | -1.03 | 2.50 | 8.0% |
| heal | -1.22 | 0.73 | 1.8% |
| acut fractur | -1.30 | 0.58 | 2.0% |
| Nth* | -1.31 | 0.55 | 1.2% |
| no fractur | -1.75 | 3.41 | 7.3% |
| proxim handbon* | -2.31 | 1.31 | 1.7% |

*The terms “longbon,” “handbon,” and “Nth” were introduced during the text normalization process.
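
To make the Gini importance column easier to read next to the signed coefficients, the following is a minimal sketch of the quantity a random forest accumulates for a feature: the decrease in Gini impurity when documents are split on whether a term is present. It is an illustration only, not the authors' implementation. The function names and the toy labels are hypothetical, and a real random forest sums these decreases, weighted by node size, over every split on the feature and averages them across trees.

```python
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (each 0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive labels
    return 2.0 * p * (1.0 - p)


def gini_decrease(term_present: Sequence[bool], labels: Sequence[int]) -> float:
    """Impurity decrease from one split on presence/absence of a term."""
    with_term = [y for x, y in zip(term_present, labels) if x]
    without_term = [y for x, y in zip(term_present, labels) if not x]
    n = len(labels)
    # Child impurities are weighted by the share of documents in each child.
    weighted_children = (
        len(with_term) / n * gini_impurity(with_term)
        + len(without_term) / n * gini_impurity(without_term)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical toy data (not from the study): 8 reports, label 1 = positive class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
has_term_a = [True, True, True, True, False, False, False, False]  # separates classes
has_term_b = [True, False, True, False, True, False, True, False]  # uninformative

perfect = gini_decrease(has_term_a, labels)
useless = gini_decrease(has_term_b, labels)
print(perfect, useless)
```

Because an impurity decrease is never negative, Gini importance carries no direction. This is why terms with negative logistic regression coefficients, such as "normal" or "no fractur," still have positive Gini importance in the table.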