**Table 2.** Reliability and validity.

| | Type of Analysis | Description | Statistical Test |
|---|---|---|---|
| — | Frequency distribution | Distribution of AHT, HT, and BUT scores | Descriptive statistics |
| Reliability | Inter-observer | Agreement among the four blinded observers ¹ | Fleiss' kappa and ICC |
| | Intra-observer | Agreement between scores assigned by the same observer to videos viewed twice ¹ | Kendall tau-b correlation coefficient, concordance rate, and ICC |
| | Test–retest | Agreement between results of tests conducted on the same horse at two different times ¹ | |
| | Internal consistency and item–total correlation | Agreement between individual items of the scale ¹ and between each item and the total score | Spearman's rank-order coefficient ² |
| Validity | Construct | Degree to which the BUT score correlates with other measures to which it is theoretically related ¹,³ | Spearman's rank-order coefficient and ordinal logistic regressions |
| | Criterion | Strength of the relationship between the BUT score and the 'gold standard' criterion ⁴,⁵ | Binary logistic regression ⁵, receiver operating characteristic (ROC) analysis, and Cohen's kappa ⁵ |
AHT = Approaching and Haltering Test; BUT = Broken/Unbroken Test; CI = confidence interval; HT = Handling Test; ICC = intraclass correlation coefficient; ROC = receiver operating characteristic. BUT score = sum of the scores assigned in the AHT and HT. ¹ Modified from Meagher [36]. ² Spearman's coefficient was chosen because Cronbach's alpha is inappropriate for two-item scales [39]. ³ Convergent validity. ⁴ Modified from Boateng [37] (concurrent criterion validity). ⁵ Expert's judgment used as the criterion measure.
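As a minimal illustration of two of the correlation-based statistics listed in Table 2, the sketch below computes Spearman's rank-order coefficient (item–total correlation) and Kendall's tau-b (intra-observer agreement) with SciPy. The scores are made-up toy values, not the study's data, and the variable names are assumptions for illustration only.

```python
# Illustrative sketch only (hypothetical scores, not the authors' data).
from scipy.stats import spearmanr, kendalltau

# Hypothetical AHT and HT scores for six horses.
aht = [0, 1, 1, 2, 2, 3]
ht = [0, 1, 2, 2, 3, 3]
but = [a + h for a, h in zip(aht, ht)]  # BUT score = AHT score + HT score

# Item-total correlation: Spearman's rank-order coefficient between
# one item (AHT) and the total (BUT) score.
rho, p_rho = spearmanr(aht, but)

# Intra-observer agreement: Kendall tau-b between scores assigned by the
# same observer to the same videos viewed twice (tau-b corrects for ties).
viewing1 = [0, 1, 1, 2, 3, 3]
viewing2 = [0, 1, 2, 2, 3, 3]
tau, p_tau = kendalltau(viewing1, viewing2)
```

Fleiss' kappa and the ICC are not in SciPy; in practice they are usually taken from a dedicated inter-rater package such as `statsmodels.stats.inter_rater`.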