2023 Jan 5;56(1):301–317. doi: 10.3758/s13428-022-02038-5

Table 1.

Summary of the challenges of reliability estimation for online learning tasks that are discussed here. For each challenge, the table shows the general problem, our proposed recommendation, and a concrete illustration using the ASRT.

Challenge	Possible solution	Concrete example with the ASRT
Not all forms of reliability can be meaningfully evaluated in all contexts	Determine appropriate reliability forms	Interference and offline consolidation effects make test–retest reliability unfeasible for the ASRT. Rely on internal consistency and split-half reliability instead
Multiple performance metrics can be calculated from the same task, and their reliabilities cannot be assumed to be equivalent	Estimate reliability for each metric separately	Accuracy- and RT-based learning scores have distinct reliability profiles, with RT-based learning scores being somewhat more reliable. Learning scores calculated using two-stage averaging are generally more reliable. Triplet-based learning scores are more reliable than pattern-random trial difference scores
Different pre-processing choices regarding splitting can lead to distinct reliability estimates	Investigate robustness of reliability estimation to splitting choices, e.g., by varying the units of splitting and carrying out trial-resampling	Splitting by sequences instead of trials leads to lower reliability estimates with more variance, but possibly less bias
Task length influences reliability estimation, with longer tasks being associated with higher reliability estimates. This needs to be taken into account when interpreting published reliability estimates and designing studies	Determine the scaling of reliability estimates with increasing task length	The threshold for 'minimally acceptable' reliability of .65 is met with a task length of around 25 blocks
Sample size influences reliability estimation, with larger samples being associated with more precise reliability estimates. This needs to be taken into account when interpreting published reliability estimates	Determine the scaling of the precision of reliability estimates with increasing sample size	Marginal gains in the precision of reliability estimates drop off noticeably around 50 subjects
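
The split-half and task-length rows above can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that each subject's data have already been reduced to one learning score per splitting unit (for example, per block), that halves are formed by random resampling of those units, and that the Spearman-Brown formula is used both to correct the half-length correlation and to project reliability to a different task length. The function names (`spearman_brown`, `split_half_reliability`, `length_factor_for_target`) and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np


def spearman_brown(r, k=2.0):
    """Projected reliability when task length is multiplied by k.

    With k=2 this is the usual correction applied to a split-half correlation.
    """
    return k * r / (1.0 + (k - 1.0) * r)


def length_factor_for_target(r, target):
    """Factor by which the task must be lengthened to reach `target`
    reliability, obtained by solving the Spearman-Brown formula for k."""
    return target * (1.0 - r) / (r * (1.0 - target))


def split_half_reliability(scores, n_splits=1000, seed=0):
    """Resampling-based split-half reliability.

    scores: array of shape (n_subjects, n_units), one learning score per
            subject and splitting unit (assumed precomputed).
    Returns the mean and standard deviation of the Spearman-Brown corrected
    correlations across random splits of the units into two halves.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_units = scores.shape[1]
    half = n_units // 2
    estimates = np.empty(n_splits)
    for i in range(n_splits):
        perm = rng.permutation(n_units)
        # Average each subject's scores within each random half.
        half_a = scores[:, perm[:half]].mean(axis=1)
        half_b = scores[:, perm[half:]].mean(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        estimates[i] = spearman_brown(r)
    return estimates.mean(), estimates.std()


if __name__ == "__main__":
    # Simulated data (illustrative only): a stable per-subject effect
    # plus independent noise in each of 20 units.
    rng = np.random.default_rng(1)
    true_effect = rng.normal(size=(200, 1))
    data = true_effect + rng.normal(scale=2.0, size=(200, 20))
    mean_r, sd_r = split_half_reliability(data)
    print(f"split-half reliability: {mean_r:.2f} (SD across splits {sd_r:.2f})")
    print(f"length factor to reach .65 from .50: "
          f"{length_factor_for_target(0.50, 0.65):.2f}")
```

Reporting the spread across splits alongside the mean makes the dependence on splitting choices visible. Repeating the estimate on subsets of units or of subjects gives a rough view of how reliability and its precision scale with task length and sample size.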