Table 2. The Accuracy of the Human-Machine Classification System: Implementation of a Strategic Filter^a Based on Agreement Between Two Naïve Bayes Algorithms.
BLS OIICS 2-Digit Event Code | Event Category | Gold Standard^c (n) | Human-Machine System Coding of All Narratives^d: npred^e | Human-Machine %pred^f,g | Human-Machine Sen^h | Sen 95% CI | Human-Machine PPV^i | PPV 95% CI | % Agreement Between 2 Manual Coders^j | Fleiss Kappa^k (Manual Coders) |
---|---|---|---|---|---|---|---|---|---|---|
1* Violence and other injuries by persons or animals | ||||||||||
11 | Intentional injury by person | 159 | 132 | 0.9 | 0.81 | 0.75, 0.87 | 0.98 | 0.95, 1.00 | 81%–97% | 0.85 |
2* Transportation incidents | ||||||||||
24 | Pedestrian vehicular incidents | 120 | 117 | 0.8 | 0.78 | 0.71, 0.86 | 0.80 | 0.73, 0.88 | 57%–78% | 0.65 |
26 | Roadway incidents motorized land vehicle | 650 | 672 | 4.5 | 0.98 | 0.97, 0.99 | 0.95 | 0.93, 0.97 | 93%–96% | 0.94 |
27 | Nonroadway incidents motorized land vehicle | 136 | 122 | 0.8 | 0.80 | 0.73, 0.87 | 0.89 | 0.84, 0.95 | 52%–84% | 0.62 |
4* Falls, slips, trips | ||||||||||
41 | Slip or trip without fall | 806 | 658 | 4.4 | 0.70 | 0.67, 0.73 | 0.86 | 0.83, 0.89 | 66%–89% | 0.71 |
42 | Falls on same level | 2,148 | 2,386 | 15.9 | 0.92 | 0.91, 0.93 | 0.83 | 0.81, 0.84 | 85%–93% | 0.86 |
43 | Falls to lower level | 1,065 | 1,176 | 7.8 | 0.89 | 0.87, 0.91 | 0.81 | 0.79, 0.83 | 78%–92% | 0.81 |
5* Exposure to harmful substances or environments | ||||||||||
53 | Exposure to temperature extremes | 141 | 130 | 0.9 | 0.86 | 0.80, 0.92 | 0.93 | 0.89, 0.97 | 82%–98% | 0.88 |
55 | Exposure to other harmful substances | 175 | 165 | 1.1 | 0.83 | 0.77, 0.88 | 0.88 | 0.83, 0.93 | 81%–96% | 0.87 |
6* Contact with objects and equipment | ||||||||||
62 | Struck by object or equipment | 1,651 | 1,749 | 11.7 | 0.90 | 0.89, 0.92 | 0.85 | 0.83, 0.87 | 82%–90% | 0.82 |
63 | Struck against object or equipment | 466 | 397 | 2.6 | 0.74 | 0.70, 0.78 | 0.87 | 0.84, 0.91 | 66%–83% | 0.68 |
64 | Caught in or compressed by equipment | 505 | 532 | 3.5 | 0.90 | 0.87, 0.93 | 0.86 | 0.83, 0.89 | 72%–83% | 0.75 |
7* Overexertion and bodily reaction | ||||||||||
70 | Overexertion and bodily reaction, unspecified | 188 | 151 | 1.0 | 0.59 | 0.51, 0.66 | 0.73 | 0.66, 0.80 | 6%–48% | 0.19 |
71 | Overexertion involving outside sources | 4,189 | 4,334 | 28.9 | 0.95 | 0.95, 0.96 | 0.92 | 0.91, 0.93 | 87%–95% | 0.87 |
72 | Repetitive motions involving micro tasks | 484 | 537 | 3.6 | 0.90 | 0.87, 0.92 | 0.81 | 0.77, 0.84 | 71%–83% | 0.75 |
73 | Other exertions or bodily reactions | 916 | 827 | 5.5 | 0.79 | 0.76, 0.82 | 0.88 | 0.85, 0.90 | 56%–85% | 0.64 |
X* All other classifiables (n<100) in training dataset | ||||||||||
xx | Other small (n<100 cases) classifiable categories^b | 632 | 467 | 3.1 | 0.68 | 0.64, 0.72 | 0.92 | 0.89, 0.94 | - | - |
Nonclassifiable | ||||||||||
9999 | Nonclassifiable | 569 | 448 | 3.0 | 0.70 | 0.66, 0.74 | 0.89 | 0.86, 0.92 | 69%–84% | 0.72 |
| ||||||||||
Overall | | 15,000 | 15,000 | 100.0 | 0.87 | 0.87, 0.88 | 0.87 | 0.87, 0.88 | 77%–90% | 0.78 |
^a A filter is a rule for deciding which narratives the computer classifies and which are left for a human to read and classify.
^b Two-digit categories with fewer than 100 cases in the training dataset.
^c Gold Standard codes were assigned to each narrative by expert manual coders.
^d Human-Machine system: the computer assigns codes to the narratives for which the two algorithms agreed on the classification (68% of the dataset); the remainder are manually coded (32% of the dataset).
^e npred = number of narratives predicted into the category.
^f %pred = percentage of cases in the whole dataset predicted into the category.
^g The distribution of two-digit classifications will be skewed towards categories with high sensitivity, biasing the final distribution of the coded dataset.
^h Sen = Sensitivity (true-positive rate): the proportion of narratives coded by the experts into a category that were also assigned to that category by the algorithm.
^i PPV = Positive Predictive Value: the proportion of narratives correctly coded into a specific category out of all narratives placed into that category by the algorithm.
^j Two-coder agreement, shown as the range over all pairwise comparisons, e.g., 6 comparisons for 4 coders: coder 1 vs. coders 2, 3, and 4; coder 2 vs. coders 3 and 4; coder 3 vs. coder 4.
^k Fleiss' kappa, between 0 and 1: > 0.6 is considered good agreement, > 0.8 very good agreement.
Naive_sw = Naïve Bayes Single Word Algorithm. Naive_seq = Naïve Bayes Sequence Word Algorithm.
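The filter described in the table notes (the computer codes a narrative only when the two Naïve Bayes algorithms agree, and everything else goes to a human) can be sketched as a small routing function. This is a minimal illustration, not the authors' implementation: the name `route_narratives`, the argument names, and the five example predictions are all hypothetical, and the two input lists simply stand in for the output of the single-word and sequence-word models.

```python
from typing import List, Optional, Tuple


def route_narratives(
    pred_sw: List[str], pred_seq: List[str]
) -> Tuple[List[Optional[str]], float]:
    """Apply an agreement filter to two lists of predicted codes.

    pred_sw and pred_seq are hypothetical names for the codes predicted by
    the single-word and sequence-word models for the same narratives.
    Returns the machine-assigned code per narrative (None means "send to a
    human coder") and the fraction of narratives the machine coded.
    """
    if len(pred_sw) != len(pred_seq):
        raise ValueError("both models must score the same narratives")
    # Keep the code only where the two models agree.
    assigned = [a if a == b else None for a, b in zip(pred_sw, pred_seq)]
    n_machine = sum(code is not None for code in assigned)
    machine_share = n_machine / len(assigned) if assigned else 0.0
    return assigned, machine_share


# Made-up predictions for five narratives (not data from the study).
codes, share = route_narratives(
    ["42", "71", "62", "43", "71"],
    ["42", "71", "63", "42", "71"],
)
print(codes)  # machine code, or None where a human must decide
print(share)  # fraction of narratives coded by the machine
```

In the table's setting the corresponding machine-coded share is the 68% reported in the notes; the remaining 32% would be the `None` entries routed to manual coders.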
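The Sen and PPV columns, with their 95% confidence intervals, can be reproduced from three counts per category: the true positives, the gold-standard total, and the predicted total. The sketch below makes two assumptions: the source does not state how the intervals were computed, so a normal-approximation (Wald) interval clipped to [0, 1] is assumed, and the true-positive count is not reported in the table, so the value 129 used for code 11 is purely illustrative. The function names are hypothetical.

```python
import math
from typing import Tuple

Estimate = Tuple[float, float, float]  # (point estimate, lower, upper)


def proportion_with_ci(successes: int, total: int, z: float = 1.96) -> Estimate:
    """Proportion with a Wald interval (an assumed method), clipped to [0, 1]."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def sensitivity_and_ppv(true_pos: int, n_gold: int, n_pred: int) -> Tuple[Estimate, Estimate]:
    """Sensitivity = TP / gold-standard total; PPV = TP / predicted total."""
    return proportion_with_ci(true_pos, n_gold), proportion_with_ci(true_pos, n_pred)


# n_gold=159 and n_pred=132 are the table values for code 11.
# true_pos=129 is an ILLUSTRATIVE assumption; the table does not report it.
sen, ppv = sensitivity_and_ppv(true_pos=129, n_gold=159, n_pred=132)
print([round(x, 2) for x in sen])
print([round(x, 2) for x in ppv])
```

Under these assumptions the rounded results agree with the row for code 11 (Sen 0.81 with 0.75, 0.87; PPV 0.98 with 0.95, 1.00), which suggests, but does not confirm, that a similar interval was used.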
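The two inter-coder columns can be illustrated in the same way: pairwise percent agreement over every pair of coders (6 pairs for 4 coders) and Fleiss' kappa across all coders at once. The sketch below uses the standard Fleiss' kappa formula; the function names and the small ratings matrix are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def pairwise_agreement(ratings: Sequence[Sequence[str]]) -> List[float]:
    """ratings[i][j] is the code coder j gave narrative i.

    Returns the fraction of narratives on which each pair of coders agreed.
    """
    n_coders = len(ratings[0])
    return [
        sum(row[a] == row[b] for row in ratings) / len(ratings)
        for a, b in combinations(range(n_coders), 2)
    ]


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for a fixed number of coders per narrative.

    Undefined (division by zero) if every rating falls in a single category.
    """
    n_items = len(ratings)
    n_coders = len(ratings[0])
    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Share of agreeing coder pairs for this narrative.
        agreement_sum += sum(c * (c - 1) for c in counts.values()) / (
            n_coders * (n_coders - 1)
        )
    p_observed = agreement_sum / n_items
    p_expected = sum(
        (total / (n_items * n_coders)) ** 2 for total in category_totals.values()
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up codes from 4 coders for 5 narratives (not data from the study).
example = [
    ["42", "42", "42", "42"],
    ["71", "71", "71", "73"],
    ["62", "62", "63", "63"],
    ["43", "43", "43", "43"],
    ["71", "71", "71", "71"],
]
pairs = pairwise_agreement(example)
kappa = fleiss_kappa(example)
print(len(pairs), min(pairs), max(pairs))  # number of pairs and the agreement range
print(round(kappa, 3))
```

Reporting `min(pairs)` and `max(pairs)` gives a range of the kind shown in the % Agreement column, while `kappa` is a single chance-corrected summary across all coders.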