Author manuscript; available in PMC: 2024 May 3.

Published in final edited form as: Adv Neural Inf Process Syst. 2021 Dec;2021(DB1):1–15.

Table 4:

Class-averaged results on Task 3 over the 7 behaviors of interest (mean ± standard deviation over 5 runs). The average F1 score in parentheses is the score obtained after threshold tuning. See the appendix for per-class results.

Method	Data Used During Training			Average F1	MAP
	Task 1 (train split)	Task 3 (train split)	Unlabeled Set		
Baseline	✓	✓		.338 ± .004	.317 ± .005
Baseline w/ task prog	✓	✓	✓	.328 ± .009	.320 ± .009
MABe 2021 Task 3 Top-1		✓		.319 ± .025 (.363 ± .020)	.352 ± .023
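The two columns of Table 4 can be made concrete with a small sketch. It assumes per-frame binary labels and classifier scores for each behavior, macro-averages per-class F1 (at a fixed 0.5 threshold) and non-interpolated average precision, and treats "threshold tuning" as picking each class's threshold from a candidate grid. The helper names, the tuning scheme, the use of the sample standard deviation across runs, and the toy data are all illustrative assumptions, not the MABe 2021 evaluation code. A fair tuning procedure would also select thresholds on held-out data rather than on the scores being evaluated.

```python
from statistics import mean, stdev


def f1_score(labels, preds):
    """Binary F1 for one behavior: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(labels, scores):
    """Non-interpolated AP: precision averaged over the ranks of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precisions.append(hits / rank)
    return mean(precisions) if precisions else 0.0


def best_f1(labels, scores, thresholds):
    """Threshold tuning: the highest F1 over a set of candidate thresholds."""
    return max(f1_score(labels, [s >= t for s in scores]) for t in thresholds)


def class_averaged(per_class, threshold=0.5, candidates=None):
    """Macro-average F1 and AP over behaviors.

    per_class maps behavior name -> (labels, scores). If `candidates` is given,
    each class's F1 uses its own best threshold (hypothetical tuning scheme).
    """
    f1s, aps = [], []
    for labels, scores in per_class.values():
        if candidates is None:
            f1s.append(f1_score(labels, [s >= threshold for s in scores]))
        else:
            f1s.append(best_f1(labels, scores, candidates))
        # AP ranks by score and does not depend on any threshold.
        aps.append(average_precision(labels, scores))
    return mean(f1s), mean(aps)


def summarize_runs(values):
    """Mean and sample standard deviation across runs (assumed convention)."""
    return mean(values), stdev(values)


# Toy scores for two hypothetical behaviors (not data from the paper).
toy = {
    "behavior_a": ([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.1]),
    "behavior_b": ([0, 1, 0, 0], [0.2, 0.7, 0.3, 0.8]),
}
f1, map_score = class_averaged(toy)
tuned_f1, _ = class_averaged(toy, candidates=[0.3, 0.5, 0.7])
print(f"F1={f1:.3f} MAP={map_score:.3f} tuned F1={tuned_f1:.3f}")
```

With the toy inputs this prints `F1=0.583 MAP=0.667 tuned F1=0.733`. Because average precision is computed from the score ranking alone, tuning thresholds can change only the F1 column, which is consistent with the table reporting a tuned value for average F1 but not for MAP.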