2018 Apr 10;13(4):e0195448. doi: 10.1371/journal.pone.0195448

Table 2. Sample sizes and statistical tests used for the reliability analyses.

Type of reliability	Sample specifications	Statistical test
Internal consistency	All dogs (N = 217)	Cronbach’s alpha
Intra-observer reliability	38 videos were coded twice by the same coder (> 2 years between the two coding sessions).	ICC (2,k absolute agreement)
Inter-observer reliability	40 videos were each coded by two of the three coders.	ICC (1,k absolute agreement)
Test-retest reliability	37 dogs (43.2% males, mean age during first test ± SD = 2.76 ± 1.92 years, mean age during second test ± SD = 6.53 ± 2.05 years) were tested a second time, on average 3.77 years (range: 2.52–4.72 years) after the first test, by a different experimenter and in a different test room.	ICC (3,k consistency)
Reliability between two test locations	A sample of 72 dogs (36 dogs per room, matched by age and sex) was compared across the two test locations. Room 1: 38.8% males, mean age ± SD = 2.53 ± 2.05 years; Room 2: 38.8% males, mean age ± SD = 2.55 ± 2.02 years.	Independent t-test
Reliability between experimenters	A sample of 105 dogs (35 dogs per experimenter, matched by age and sex) was compared across the three experimenters. E 1: 42.9% males, mean age ± SD = 1.79 ± 0.90 years; E 2: 40.0% males, mean age ± SD = 1.66 ± 0.45 years; E 3: 45.7% males, mean age ± SD = 1.77 ± 0.25 years.	ANOVA

E: experimenter, ICC: Intraclass correlation coefficient
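
The coefficients named in Table 2 can be illustrated with a short sketch. The Python code below (NumPy only) computes Cronbach's alpha and, from a two-way ANOVA decomposition, the average-measures ICC(2,k) for absolute agreement and ICC(3,k) for consistency, using the standard Shrout and Fleiss mean-square formulas. The function names and the toy ratings matrix are hypothetical and are not taken from the study; the software and settings used by the authors are not stated in this table.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha; rows = subjects (e.g. dogs), columns = items."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of the sum score
    return k / (k - 1) * (1 - item_var_sum / total_var)


def icc_average_measures(ratings):
    """Return (ICC(2,k) absolute agreement, ICC(3,k) consistency).

    rows = targets (e.g. videos or dogs), columns = raters or sessions.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-target mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    icc2k = (msr - mse) / (msr + (msc - mse) / n)
    icc3k = (msr - mse) / msr
    return icc2k, icc3k


# Hypothetical toy data: the second rating is always one point higher.
toy = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc2k, icc3k = icc_average_measures(toy)
print(round(icc2k, 3), round(icc3k, 3), round(cronbach_alpha(toy), 3))
```

In the toy data the second column equals the first plus a constant, so ICC(3,k) consistency is perfect, whereas ICC(2,k) absolute agreement is penalised for the systematic offset (20/23, about 0.87). Cronbach's alpha computed on the same matrix coincides with ICC(3,k), which is why the two are often treated as equivalent for consistency.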