2023 Jan 5;56(1):301–317. doi: 10.3758/s13428-022-02038-5

Table 1.

Summary of the challenges of reliability estimation for online learning tasks that are discussed here. For each challenge, the general problem, our proposed recommendation, and a concrete illustration using the ASRT are shown.

Challenge: Not all forms of reliability can be meaningfully evaluated in all contexts.
Possible solution: Determine appropriate reliability forms.
Concrete example with the ASRT: Interference and offline consolidation effects make test–retest reliability unfeasible for the ASRT. Rely on internal consistency and split-half reliability instead.

Challenge: Multiple performance metrics can be calculated from the same task, the reliabilities of which cannot be assumed to be equivalent.
Possible solution: Estimate reliability for each metric separately.
Concrete example with the ASRT: Accuracy and RT-based learning scores have distinct reliability profiles, with RT-based learning scores being somewhat more reliable. Learning scores calculated using two-stage averaging are generally more reliable. Triplet-based learning scores are more reliable than pattern-random trial difference scores.

Challenge: Different pre-processing choices regarding splitting can lead to distinct reliability estimates.
Possible solution: Investigate robustness of reliability estimation to splitting choices, e.g., by varying the units of splitting and carrying out trial-resampling.
Concrete example with the ASRT: Splitting by sequences instead of trials leads to lower reliability, with more variance, but possibly less bias.

Challenge: Task length influences reliability estimation, with longer tasks being associated with higher reliability estimates. This needs to be taken into account when interpreting published reliability estimates and designing studies.
Possible solution: Determine the scaling of reliability estimates with increasing task length.
Concrete example with the ASRT: Threshold for 'minimally acceptable' reliability of .65 is met with a task length of around 25 blocks.

Challenge: Sample size influences reliability estimation, with larger samples being associated with more precise reliability estimates. This needs to be taken into account when interpreting published reliability estimates.
Possible solution: Determine the scaling of the precision of reliability estimates with increasing sample size.
Concrete example with the ASRT: Marginal gains in the precision of reliability estimates drop off noticeably around 50 subjects.
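
As a rough illustration of the split-half and trial-resampling recommendations in Table 1, the Python sketch below estimates split-half reliability over many random splits and applies the standard Spearman-Brown formula, both to correct for halving and to project reliability for a longer task. It is a minimal sketch and not the pipeline used in the article: the function names, the (subjects × units) data layout, and the simulated scores are all assumptions made for illustration.

```python
import numpy as np


def spearman_brown(r, factor=2.0):
    """Spearman-Brown formula: predicted reliability when a measure with
    reliability `r` is lengthened by `factor` (factor=2 corrects a
    half-length correlation to full length)."""
    return factor * r / (1.0 + (factor - 1.0) * r)


def split_half_reliability(scores, n_splits=200, seed=0):
    """Resampled split-half reliability.

    `scores` is assumed to be a (n_subjects, n_units) array holding one
    learning score per subject and unit (e.g., per trial bin or block);
    this layout is hypothetical. Units are randomly split into two halves
    `n_splits` times; each half is averaged per subject, the halves are
    correlated across subjects, and the correlation is Spearman-Brown
    corrected. Returns the mean and SD of the corrected estimates.
    """
    rng = np.random.default_rng(seed)
    n_units = scores.shape[1]
    estimates = np.empty(n_splits)
    for i in range(n_splits):
        order = rng.permutation(n_units)
        half_a = scores[:, order[: n_units // 2]].mean(axis=1)
        half_b = scores[:, order[n_units // 2:]].mean(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        estimates[i] = spearman_brown(r)
    return estimates.mean(), estimates.std()


# Simulated data (an assumption, not real ASRT data): each subject has a
# stable true learning score plus independent noise on every unit.
rng = np.random.default_rng(1)
true_score = rng.normal(0.0, 1.0, size=(100, 1))
noise = rng.normal(0.0, 3.0, size=(100, 40))
scores = true_score + noise

mean_rel, sd_rel = split_half_reliability(scores)
projected = spearman_brown(mean_rel, factor=2.0)  # task twice as long
print(f"split-half reliability: {mean_rel:.2f} (SD over splits {sd_rel:.2f})")
print(f"projected reliability at double length: {projected:.2f}")
```

The spread of estimates across splits gives a sense of how sensitive the result is to the particular split, and rerunning the function on subsets of subjects or units is one simple way to probe the sample-size and task-length scaling summarized in the table. The Spearman-Brown projection assumes that added units behave like the existing ones, which may not hold for a learning task in which performance changes over time.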