**Table 2.** Reliability and validity.

| | Type of Analysis | Description | Statistical Test |
|---|---|---|---|
| — | Frequency distribution | Distribution of AHT, HT, and BUT scores | Descriptive statistics |
| Reliability | Inter-observer | Agreement among the four blinded observers ¹ | Fleiss' kappa and ICC |
| | Intra-observer | Agreement between scores assigned by the same observer to videos viewed twice ¹ | Kendall tau-b correlation coefficient, concordance rate, and ICC |
| | Test–retest | Agreement between results of tests conducted on the same horse at two different times ¹ | |
| | Internal consistency and item–total correlation | Agreement between individual items of the scale ¹ and between each item and the total score | Spearman's rank-order coefficient ² |
| Validity | Construct | Degree to which the BUT score correlates with other measures to which it is theoretically related ¹,³ | Spearman's rank-order coefficient and ordinal logistic regressions |
| | Criterion | Strength of the relationship between the BUT score and the 'gold standard' criterion ⁴,⁵ | Binary logistic regression ⁵, receiver operating characteristic (ROC) analysis, and Cohen's kappa ⁵ |
AHT = Approaching and Haltering Test; BUT = Broken/Unbroken Test; CI = confidence interval; HT = Handling Test; ICC = intraclass correlation coefficient; ROC = receiver operating characteristic. BUT score = sum of the scores assigned in the AHT and HT. ¹ Modified from Meagher [36]. ² Spearman's coefficient was chosen because Cronbach's alpha is inappropriate for two-item scales [39]. ³ Convergent validity. ⁴ Modified from Boateng [37] (concurrent criterion validity). ⁵ Expert's judgment used as the criterion measure.
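As a minimal illustration of two of the correlation-based statistics listed in Table 2, the sketch below computes Spearman's rank-order coefficient (item–total correlation) and Kendall's tau-b (intra-observer agreement) with SciPy. The scores are made-up toy values, not the study's data, and the variable names are assumptions for illustration only.

```python
# Illustrative sketch only (hypothetical scores, not the authors' data).
from scipy.stats import spearmanr, kendalltau

# Hypothetical AHT and HT scores for six horses.
aht = [0, 1, 1, 2, 2, 3]
ht = [0, 1, 2, 2, 3, 3]
but = [a + h for a, h in zip(aht, ht)]  # BUT score = AHT score + HT score

# Item-total correlation: Spearman's rank-order coefficient between
# one item (AHT) and the total (BUT) score.
rho, p_rho = spearmanr(aht, but)

# Intra-observer agreement: Kendall tau-b between scores assigned by the
# same observer to the same videos viewed twice (tau-b corrects for ties).
viewing1 = [0, 1, 1, 2, 3, 3]
viewing2 = [0, 1, 2, 2, 3, 3]
tau, p_tau = kendalltau(viewing1, viewing2)
```

Fleiss' kappa and the ICC are not in SciPy; in practice they are usually taken from a dedicated inter-rater package such as `statsmodels.stats.inter_rater`.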