The LBA assumes that, on every trial, two accumulators race deterministically toward a common bound $b$. Each accumulator $i$ starts at a point sampled from a uniform distribution on $[0, A]$ and accumulates evidence at a rate sampled from a normal distribution $\mathcal{N}(v_i, s_v)$. In the RL-ALBA model, we used Equation 4 to link Q-values to the LBA drift rates $v_1$ and $v_2$ (excluding the noise term, since the LBA assumes no within-trial noise). Instead of directly estimating the threshold $b$, we estimated the difference $B = b - A$ (which simplifies enforcing $b > A$). We used the following mildly informed priors for the hypermeans:
,
truncated at lower bound 0,
,
,
,
truncated at lower bound 0, and
, truncated at lower bound 0.025 and upper bound 1. For the hyperSDs, all priors were
. The summed BPIC was 4836, indicating that the RL-ALBA performs slightly better than the RL-DDM with between-trial variabilities (BPIC = 4844), and better than the RL-lARD (BPIC = 4849), but not as well as the RL-ARD (BPIC = 4577).
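
The race described above can be illustrated with a minimal simulation sketch. It is not the fitting code used for the reported results. The function name `simulate_lba`, the default `s_v = 1.0`, the non-decision time `t0`, and the rule of redrawing trials in which no accumulator has a positive rate are all assumptions made for illustration. The mean drift rates `v` are passed in directly; in the RL-ALBA they would be computed from Q-values via Equation 4, which is not reproduced here.

```python
import numpy as np


def simulate_lba(v, A, B, s_v=1.0, t0=0.0, n_trials=1000, seed=0):
    """Simulate choices and response times from a linear ballistic race.

    v   : mean drift rate per accumulator (hypothetical inputs)
    A   : upper limit of the uniform start-point distribution
    B   : threshold expressed as b - A, so b > A holds whenever B > 0
    s_v : between-trial SD of the drift rate (assumed default)
    t0  : non-decision time added to every RT (assumed parameter)
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    n_acc = v.size
    b = A + B  # recover the bound from the estimated difference

    # Start points ~ U[0, A]; drift rates ~ N(v_i, s_v), one draw per trial.
    starts = rng.uniform(0.0, A, size=(n_trials, n_acc))
    drifts = rng.normal(v, s_v, size=(n_trials, n_acc))

    # Assumption: redraw trials where every accumulator has a non-positive
    # rate, since no accumulator would ever reach the bound.
    bad = (drifts <= 0).all(axis=1)
    while bad.any():
        drifts[bad] = rng.normal(v, s_v, size=(int(bad.sum()), n_acc))
        bad = (drifts <= 0).all(axis=1)

    # No within-trial noise: time to bound is distance divided by rate.
    with np.errstate(divide="ignore"):
        times = np.where(drifts > 0, (b - starts) / drifts, np.inf)

    choice = times.argmin(axis=1)   # index of the winning accumulator
    rt = times.min(axis=1) + t0     # decision time plus non-decision time
    return choice, rt


choice, rt = simulate_lba([3.0, 1.0], A=1.0, B=1.5, t0=0.2, n_trials=2000)
print("P(choose accumulator 0):", (choice == 0).mean())
print("mean RT:", rt.mean())
```

Parameterizing the threshold as `B` rather than `b` means a single lower bound of 0 on `B` is enough to keep the bound above every possible start point.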