Table 1.
Summary of the challenges of reliability estimation for online learning tasks discussed here. The general problem, our proposed recommendations, and their concrete illustration using the ASRT are shown.
Challenge | Possible solution | Concrete example with the ASRT |
---|---|---|
Not all forms of reliability can be meaningfully evaluated in all contexts | Determine appropriate reliability forms | Interference and offline consolidation effects make test–retest reliability infeasible for the ASRT; rely on internal consistency and split-half reliability instead |
Multiple performance metrics can be calculated from the same task, and their reliabilities cannot be assumed to be equivalent | Estimate reliability for each metric separately | Accuracy- and RT-based learning scores have distinct reliability profiles, with RT-based learning scores being somewhat more reliable. Learning scores calculated using two-stage averaging are generally more reliable. Triplet-based learning scores are more reliable than pattern-random trial difference scores |
Different pre-processing choices regarding splitting can lead to distinct reliability estimates | Investigate robustness of reliability estimation to splitting choices, e.g., by varying the units of splitting and carrying out trial-resampling | Splitting by sequences instead of trials leads to lower reliability, with more variance, but possibly less bias |
Task length influences reliability estimation, with longer tasks being associated with higher reliability estimates. This needs to be taken into account when interpreting published reliability estimates and designing studies | Determine the scaling of reliability estimates with increasing task length | Threshold for 'minimally acceptable' reliability of .65 is met with a task length of around 25 blocks |
Sample size influences reliability estimation, with larger samples being associated with more precise reliability estimates. This needs to be taken into account when interpreting published reliability estimates | Determine the scaling of the precision of reliability estimates with increasing sample size | Marginal gains in the precision of reliability estimates drop off noticeably around 50 subjects |
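The splitting, task-length, and sample-size rows can be made concrete with a minimal Python sketch. It assumes a triplet-style RT learning score: mean RT on low-probability trials minus mean RT on high-probability trials. Each participant's trials of each type are shuffled and halved many times (trial resampling), and the resulting split-half correlations are Spearman-Brown corrected and averaged. The sketch also shows two standard analytic aids. The first is the Spearman-Brown prophecy formula solved for the length multiplier needed to reach a target reliability. This is only an approximation, not necessarily how the length scaling in the table was determined. The second is a Fisher-z confidence interval showing how precision grows with sample size, which is itself approximate when applied to a corrected split-half coefficient. All function names (`learning_score`, `split_half_reliability`, `length_factor_for`, `fisher_ci`) and the data layout are hypothetical.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r, k=2.0):
    """Reliability projected for a task k times as long (k=2 corrects a half)."""
    return k * r / (1 + (k - 1) * r)


def length_factor_for(r, target):
    """Spearman-Brown prophecy solved for k: length multiplier to reach target."""
    return target * (1 - r) / (r * (1 - target))


def learning_score(high_rts, low_rts):
    # Hypothetical triplet-style score: low- minus high-probability mean RT.
    return mean(low_rts) - mean(high_rts)


def split_half_reliability(subjects, n_splits=200, seed=0):
    """Trial-resampled split-half reliability, Spearman-Brown corrected.

    subjects: list of (high_rts, low_rts) tuples, one per participant, each
    with at least two trials of each type. Each trial type is shuffled and
    halved separately (stratified split), so both halves contain both types.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for high, low in subjects:
            h, lo = high[:], low[:]
            rng.shuffle(h)
            rng.shuffle(lo)
            hm, lm = len(h) // 2, len(lo) // 2
            half_a.append(learning_score(h[:hm], lo[:lm]))
            half_b.append(learning_score(h[hm:], lo[lm:]))
        # Correct each split's correlation, then average over splits.
        estimates.append(spearman_brown(pearson_r(half_a, half_b)))
    return mean(estimates)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation from n participants (Fisher z)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

Take an arbitrary illustrative case, not a result from this study. A task yielding r = .50 would, under the prophecy formula, need `length_factor_for(0.5, 0.65)` ≈ 1.86 times as many blocks to reach .65. Likewise, comparing `fisher_ci(r, 30)` with `fisher_ci(r, 120)` shows the interval narrowing roughly with the square root of n. This is why the gains in precision diminish as samples grow.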