Table 3. The Accuracy of the Human-Machine Classification System: Implementation of a Strategic Filter (a) Based on Agreement Between the Two Naïve Bayes Algorithms (Results for Small Categories Only, n < 100 Cases in Each Category).
BLS OIICS 2-digit event code | Event description | Gold standard (b), n | Human-machine system coding of all narratives (c): npred (d) | Sen (e) | Sen 95% CI | PPV (f) | PPV 95% CI | % agreement between 2 manual coders (g) | Fleiss kappa (h), manual coders |
---|---|---|---|---|---|---|---|---|---|
1* | Violence and other injuries by persons or animals | | | | | | | | |
12 | Injury by person - intentional or intent unknown | 96 | 78 | 0.66 | 0.56, 0.75 | 0.81 | 0.71, 0.88 | 47%–78% | 0.57 |
13 | Animal and insect related incidents | 99 | 79 | 0.80 | 0.71, 0.87 | 1.00 | 1.00, 1.00 | 79%–94% | 0.87 |
2* | Transportation incidents | | | | | | | | |
20 | Transportation incident, unspecified | 3 | 3 | 1.00 | 1.00, 1.00 | 1.00 | 1.00, 1.00 | 0%–0% | 0.00 |
21 | Aircraft incidents | 22 | 15 | 0.68 | 0.47, 0.89 | 1.00 | 1.00, 1.00 | 0%–75% | 0.17 |
22 | Rail vehicle incidents | 6 | 4 | 0.67 | 0.12, 1.00 | 1.00 | 1.00, 1.00 | 0%–100% | 0.67 |
23 | Animal & other non-motorized vehicle transport incidents | 14 | 13 | 0.86 | 0.65, 1.00 | 0.92 | 0.76, 1.00 | 0%–0% | 0.00 |
25 | Water vehicle incidents | 11 | 5 | 0.45 | 0.10, 0.81 | 1.00 | 1.00, 1.00 | 0%–88% | 0.25 |
3* | Fires and explosions | | | | | | | | |
31 | Fires | 22 | 20 | 0.91 | 0.78, 1.00 | 1.00 | 1.00, 1.00 | 55%–88% | 0.58 |
32 | Explosions | 21 | 18 | 0.86 | 0.69, 1.00 | 1.00 | 1.00, 1.00 | 44%–83% | 0.46 |
4* | Falls, slips, trips | | | | | | | | |
40 | Fall, slip, trip, unspecified | 4 | 2 | 0.50 | 0.00, 1.00 | 1.00 | 1.00, 1.00 | 0%–0% | 0.00 |
44 | Jumps to lower level | 57 | 39 | 0.61 | 0.48, 0.74 | 0.90 | 0.80, 1.00 | 51%–90% | 0.65 |
45 | Fall or jump curtailed by personal fall arrest system | 3 | 2 | 0.67 | 0.00, 1.00 | 1.00 | 1.00, 1.00 | 0%–0% | 0.00 |
5* | Exposure to harmful substances or environments | | | | | | | | |
50 | Exposure to harmful substances or environments, unspecified | 23 | 18 | 0.78 | 0.60, 0.96 | 1.00 | 1.00, 1.00 | 21%–88% | 0.33 |
51 | Exposure to electricity | 27 | 18 | 0.67 | 0.48, 0.86 | 1.00 | 1.00, 1.00 | 65%–88% | 0.81 |
52 | Exposure to radiation and noise | 38 | 36 | 0.87 | 0.76, 0.98 | 0.92 | 0.82, 1.00 | 54%–100% | 0.80 |
54 | Exposure to air and water pressure change | 1 | 0 | 0.00 | . | 0.00 | . | 0%–100% | 0.40 |
57 | Exposure to traumatic or stressful event, nec | 32 | 23 | 0.72 | 0.55, 0.88 | 1.00 | 1.00, 1.00 | 73%–85% | 0.80 |
59 | Exposure to harmful substances or environments, nec | 1 | 7 | 0.00 | . | 0.00 | . | 0%–100% | 0.12 |
6* | Contact with objects and equipment | | | | | | | | |
60 | Contact with objects and equipment, unspecified | 78 | 43 | 0.54 | 0.43, 0.65 | 0.98 | 0.93, 1.00 | 12%–63% | 0.25 |
61 | Needle stick | 1 | 1 | 1.00 | 1.00, 1.00 | 1.00 | 1.00, 1.00 | - | - |
65 | Struck, caught, or crushed in collapsing structure, equipment, or material | 5 | 3 | 0.60 | 0.00, 1.00 | 1.00 | 1.00, 1.00 | 0%–0% | 0.33 |
66 | Rubbed or abraded by friction or pressure | 16 | 12 | 0.69 | 0.43, 0.94 | 0.92 | 0.73, 1.00 | 0%–50% | 0.11 |
67 | Rubbed, abraded, or jarred by vibration | 7 | 4 | 0.57 | 0.08, 1.00 | 1.00 | 1.00, 1.00 | 0%–67% | 0.14 |
69 | Contact with objects and equipment, nec | 1 | 1 | 1.00 | 1.00, 1.00 | 1.00 | 1.00, 1.00 | - | - |
7* | Overexertion and bodily reaction | | | | | | | | |
74 | Bodily conditions nec | 20 | 10 | 0.50 | 0.26, 0.74 | 1.00 | 1.00, 1.00 | 0%–75% | 0.33 |
78 | Multiple types of overexertions and bodily reactions | 23 | 13 | 0.39 | 0.18, 0.61 | 0.69 | 0.40, 0.98 | 0%–0% | 0.00 |
79 | Overexertion and bodily reaction and exertion, nec | 1 | 0 | 0.00 | . | 0.00 | . | - | - |
Overall | | 437 | 467 | 0.68 | 0.64, 0.72 | 0.92 | 0.89, 0.94 | | |
(a) A filter is a technique for deciding which narratives the computer should classify and which should be left for a human to read and classify.
(b) Gold standard codes were assigned to each narrative by expert manual coders.
(c) In the human-machine system, humans coded 32% of the dataset and the machine coded 68%.
(d) npred = number of narratives predicted into the category.
(e) Sen = sensitivity: the proportion of narratives coded by the experts into a category that the algorithm also assigned to that category (true positives divided by n).
(f) PPV = positive predictive value: the proportion of narratives correctly coded into a specific category out of all narratives placed into that category by the algorithm (true positives divided by npred).
(g) Two-coder agreement, e.g., 6 total comparisons: coder 1 compared with coders 2, 3, and 4; coder 2 compared with coders 3 and 4; coder 3 compared with coder 4.
(h) Fleiss kappa is between 0 and 1; > 0.6 is considered good agreement and > 0.8 very good agreement.
Naive_sw = Naïve Bayes Single Word Algorithm. Naive_seq = Naïve Bayes Sequence Word Algorithm.
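The Sen and PPV definitions in the notes above can be illustrated with a minimal sketch. The function names, the toy labels, and the choice of a normal-approximation (Wald) interval clipped to [0, 1] are all assumptions made for illustration; the table does not state how its confidence intervals were computed, so the intervals below need not match the published ones.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper), or None when the denominator is zero.

    Assumption: a Wald (normal-approximation) interval clipped to [0, 1].
    The source table does not say which interval method it used.
    """
    if total == 0:
        return None
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return (p, max(0.0, p - half_width), min(1.0, p + half_width))


def sensitivity_and_ppv(gold, predicted, category):
    """Compute n, npred, Sen, and PPV for one category (hypothetical helper)."""
    pairs = list(zip(gold, predicted))
    # True positives: gold standard and system both assign this category.
    true_pos = sum(1 for g, p in pairs if g == category and p == category)
    n_gold = sum(1 for g, _ in pairs if g == category)   # column "n"
    n_pred = sum(1 for _, p in pairs if p == category)   # column "npred"
    return {
        "n": n_gold,
        "npred": n_pred,
        "sen": proportion_with_wald_ci(true_pos, n_gold),
        "ppv": proportion_with_wald_ci(true_pos, n_pred),
    }


# Toy labels, invented for illustration only (not data from the study).
gold = ["12", "12", "12", "12", "13", "13"]
predicted = ["12", "12", "12", "13", "13", "12"]

result = sensitivity_and_ppv(gold, predicted, "12")
print(result)
```

With these toy labels, category 12 has n = 4 and npred = 4 with three true positives, so Sen and PPV are both 0.75. A category with no predicted narratives returns `None` for PPV, since the denominator is zero.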