2020 Oct 21;5(6):e853. doi: 10.1097/PR9.0000000000000853

Table 2.

Reliability, recall, precision, and accuracy of OpenFace automated coding, based on manual FACS coding of 50 pain-categorized and 50 non–pain-categorized images.

Action unit	Reliability (κ)	Recall	Precision	Accuracy	Presence in pain expressions (manual FACS)	Presence in nonpain expressions (manual FACS)	Presence in pain expressions (OpenFace)	Presence in nonpain expressions (OpenFace)
AU4*	0.451	0.812	0.958	0.810	0.98_a	0.72_b	0.94_a	0.50_b
AU6*	0.270	1.000	0.500	0.590	0.60_a	0.20_b	0.94_a	0.68_b
AU7*	0.357	0.934	0.845	0.800	0.86_a	0.68_b	0.94_a	0.76_b
AU9*	0.459	0.891	0.710	0.740	0.82_a	0.28_b	0.82_a	0.56_b
AU10	0.064	0.828	0.320	0.430	0.34_a	0.24_a	0.86_a	0.66_b
AU12	0.512	0.811	0.652	0.760	0.40_a	0.36_a	0.52_a	0.40_a
AU20	−0.005	0.400	0.098	0.570	0.14_a	0.06_a	0.42_a	0.40_a
AU25	0.899	0.945	1.000	0.950	0.56_a	0.56_a	0.56_a	0.50_a
AU26	0.485	0.763	0.552	0.800	0.12_b	0.32_a	0.18_b	0.42_a
AU45*	0.358	0.870	0.671	0.690	0.92_a	0.16_b	0.86_a	0.56_b
Average	0.385	0.825	0.631	0.714	0.574_a	0.358_b	0.704_a	0.544_b
Average in selected AUs:	0.379	0.901	0.737	0.726	0.836_a	0.408_b	0.900_a	0.612_b

Asterisks indicate AUs determined to be reliable and pain relevant based on these data. Reliability is reported as Cohen’s kappa (κ). Recall (ie, sensitivity) was calculated as the number of true positives divided by the sum of true positives and false negatives. Precision (ie, positive predictive value) was calculated as the number of true positives divided by the sum of true positives and false positives. The last four columns present the proportion of pain-categorized and non–pain-categorized expressions in which a given AU was present, split by manual and automated coding. Values within a coding set that have different subscripts are significantly different from each other (P < 0.05; a > b).

AU, action unit; FACS, Facial Action Coding System.
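
The recall and precision definitions in the legend can be sketched in code as follows. This is a minimal illustration, not the authors' analysis script: the `Counts` class, the `agreement_metrics` function, and the example counts are hypothetical and are not taken from the table. Accuracy and Cohen's kappa are not defined in the legend, so the sketch assumes their standard definitions for a 2 × 2 agreement table (manual FACS coding as the reference, automated coding as the prediction).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2 x 2 agreement table for one action unit (hypothetical helper)."""
    tp: int  # AU coded present by both manual and automated coding
    fp: int  # present in automated coding only
    fn: int  # present in manual coding only
    tn: int  # coded absent by both


def agreement_metrics(c: Counts) -> dict:
    """Return recall, precision, accuracy, and Cohen's kappa for one AU."""
    n = c.tp + c.fp + c.fn + c.tn
    if n == 0:
        raise ValueError("no images were coded")
    if c.tp + c.fn == 0 or c.tp + c.fp == 0:
        raise ValueError("recall or precision is undefined for these counts")

    recall = c.tp / (c.tp + c.fn)      # as defined in the legend
    precision = c.tp / (c.tp + c.fp)   # as defined in the legend
    accuracy = (c.tp + c.tn) / n       # assumed: standard definition

    # Cohen's kappa (assumed: standard two-rater, two-category form).
    # Observed agreement equals accuracy; expected agreement comes from
    # each coder's marginal rate of coding the AU present or absent.
    manual_present = (c.tp + c.fn) / n
    auto_present = (c.tp + c.fp) / n
    expected = (manual_present * auto_present
                + (1 - manual_present) * (1 - auto_present))
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    kappa = (accuracy - expected) / (1 - expected)

    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "kappa": kappa}


# Hypothetical counts for 100 images; not data from this study.
metrics = agreement_metrics(Counts(tp=40, fp=10, fn=5, tn=45))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because kappa corrects for the agreement expected from the marginal rates alone, an AU can show high recall and still have low kappa when the automated coder marks it present in most images.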