Published in final edited form as: J Am Stat Assoc. 2020 Sep 8;116(535):1114–1127. doi: 10.1080/01621459.2020.1801448

Bayesian Semiparametric Longitudinal Drift-Diffusion Mixed Models for Tone Learning in Adults

Giorgio Paulon a, Fernando Llanos b,c, Bharath Chandrasekaran c, Abhra Sarkar a

Abstract

Understanding how adult humans learn nonnative speech categories, such as lexical tones, has provided novel insights into the mechanisms underlying experience-dependent brain plasticity. Scientists have traditionally examined these questions using longitudinal learning experiments under a multi-category decision making paradigm. Drift-diffusion processes are popular in such contexts for their ability to mimic underlying neural mechanisms. Motivated by these problems, we develop a novel Bayesian semiparametric inverse Gaussian drift-diffusion mixed model for multi-alternative decision making in longitudinal settings. We design a Markov chain Monte Carlo algorithm for posterior computation. We evaluate the method’s empirical performance through synthetic experiments. Applied to our motivating longitudinal tone learning study, the method provides novel insights into how the biologically interpretable model parameters evolve with learning, differ between input-response tone combinations, and differ between well and poorly performing adults. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Keywords: Auditory category/tone learning, Drift-diffusion models, Inverse Gaussian distributions, Local clustering, Longitudinal mixed models, Perceptual decision making

1. Introduction

Understanding the cognitive and biological mechanisms underlying our ability to learn new speech categories in adulthood constitutes an important question in auditory neuroscience. Recent studies have demonstrated that adults are capable of learning features of a second language to a high degree of proficiency, showing that age need not always constrain language learning abilities. The inherent dynamic complexities underlying learning in adulthood are not yet well understood but are being studied through extensive ongoing research.

The research reported here is motivated particularly by experiments on the acquisition of Mandarin tones by native speakers of English. Native speech categories are acquired during the first year of life, within a so-called phonetic sensitivity period. There is a greater neural commitment to native-language speech sounds, and this commitment may preclude the learning of novel speech categories in adulthood (Johnson and Newport 1989; Iverson et al. 2003). In Mandarin Chinese, there are four tone categories that systematically change word meaning, similar to consonants and vowels in English. These tones are, however, linguistically irrelevant in English. English native speakers thus struggle to distinguish the four tones and generalize their differences (Wang et al. 1999; Chandrasekaran, Sampath, and Wong 2010; Maddox and Chandrasekaran 2014). In laboratory settings, combining exposure to perceptually variable tones with trial-by-trial corrective feedback can improve tone categorization skills within a few hundred trials. Reaching a native-like proficiency, however, may take several sessions of training (Xie, Reetzke, and Chandrasekaran 2017; Reetzke et al. 2018). The perceptual and sensory representation of Mandarin tones gets fundamentally refined over the course of this learning period (Feng, Yi, and Chandrasekaran 2019). Understanding this longitudinal evolution is critical to assess the cognitive dynamics of speech category learning. The statistical challenge is to make this assessment indirectly from behavioral data on tone categorization responses and response times.

To this end, we identify the Mandarin tone categorization problem with the broader class of problems of multi-category decision making under perceptual stimuli (Heekeren et al. 2004; Smith and Ratcliff 2004; Gold and Shadlen 2007; Schall 2001; Glimcher and Fehr 2013; Purcell 2013). In such contexts, drift-diffusion processes are popular models for behavioral accuracies and response times as they mimic the accumulation of sensory evidence in favor of different decision alternatives in the human brain (Ratcliff 1978; Ratcliff et al. 2016). The existing literature on drift-diffusion models is substantial (Smith and Vickers 1988; Ratcliff and Rouder 1998; Ratcliff and McKoon 2008). These classical methods, as well as their recent adaptations using reinforcement learning based ideas (Pedersen, Frank, and Biele 2017; Fontanesi et al. 2019; Peters and D’Esposito 2020), are, however, heavily focused on the two-category case with a single latent diffusion process and two boundaries, one for each of the two decision alternatives. This is despite the fact that humans often are required to learn more than two categories at once. For example, English has 14 vowels and 24 consonant phonemes; Mandarin has four tone categories, etc. The joint likelihood of accuracies and response times under models with a single diffusion process is mathematically complex and computationally expensive (Tuerlinckx et al. 2001; Tuerlinckx 2004; Navarro and Fuss 2009). Inference in such models is thus often based on approximations of the likelihood (Vandekerckhove and Tuerlinckx 2007), or on the conditional likelihood of the response times, conditioned on the decisions (Vandekerckhove, Tuerlinckx, and Lee 2008). Multi-category drift-diffusion models with separate latent processes, one for each decision category and simultaneously at play, have been developed to address some of the limitations (Usher and McClelland 2001; Brown and Heathcote 2008; Leite and Ratcliff 2010; Dufau, Grainger, and Ziegler 2012; Kim et al. 2017), but the relevant literature remains sparse and focused only on simple static designs.

Learning to distinguish Mandarin tones or, more generally, to make categorization decisions is, however, a dynamic process, driven by continuous and nuanced perceptual adjustments in our brain and behavior over time. The existing simple static models are thus severely limited in their ability to capture the true inherent complexities, including assessing the biologically relevant changes that take place over the learning period. Principled statistical approaches to multi-category dynamic drift-diffusion mixed effects models, that appropriately accommodate fixed effects of experimental factors as well as random effects due to subjects, are therefore highly needed but present daunting methodological and computational challenges.

In this article, we address these challenges by developing a novel biologically interpretable flexible Bayesian semiparametric inverse Gaussian drift-diffusion mixed model for studying multi-alternative perceptual decision making processes in longitudinal settings.

Our construction proceeds by characterizing the accumulation of evidence for different input-response tone combinations by associated independent Wiener diffusion processes, resulting in an inverse Gaussian distribution based joint probability model for the final response tone and the associated response time. To adapt this to a longitudinal mixed model setting, we then assume the model parameters to comprise input-response tone specific fixed effects and subject specific random effects, modeling them both by mixtures of locally supported B-spline bases (de Boor 1978; Eilers and Marx 1996) spanning the length of the longitudinal experiment. Both these effects are thus allowed to evolve flexibly as smooth functions over the training period (Ramsay and Silverman 2007; Morris 2015; Wang, Chiou, and Müller 2016) as the participants get more experience and training in their assigned decision tasks.

Dependence in the fixed effects model spline coefficients across adjacent temporal regions is induced via hidden Markov models (HMMs, Rabiner 1989; McDonald and Zucchini 1997; Cappé, Moulines, and Rydén 2005; Frühwirth-Schnatter 2006), one for each input-response tone combination but all sharing a common state space, as well as a novel smoothness inducing Markovian prior on the core spline coefficients. The HMMs, adapted in such novel ways, induce a local clustering of the fixed effects spline coefficients associated with different input-response tone combinations, in effect, allowing us to assess local similarities and differences between the corresponding parameter trajectories in different learning phases.

This ability to infer local similarities and differences in the cognitive dynamics is theoretically and practically relevant for tone learning applications. The underlying mechanisms are expected to be very similar when the participants are first introduced to the tones; differences may appear as they get better at identifying the tones as some tones may be easier to identify than others in this stage; these differences may start to disappear again in later stages of the experiment as the participants become highly proficient in identifying all the different tones. As for individual heterogeneity, neural measures of sensory encoding information collected prior to the learning task show no clear individual differences, even though the process of learning itself results in good and poor learners (Reetzke et al. 2018).

The literature on longitudinal data analysis models is enormous. See, for example, the books by Diggle et al. (2002), Singer and Willett (2003), Fitzmaurice et al. (2008), and the references therein. Bayesian methods for longitudinal data have also been extensively developed (Chib and Hamilton 2002; Daniels and Pourahmadi 2002; Li, Lin, and Müller 2010; Müller et al. 2013; Quintana et al. 2016, etc.). The problem of modeling locally clustered effects has, however, not garnered much attention. We can only mention Petrone, Guindani, and Gelfand (2009) and Nguyen and Gelfand (2011, 2014), all of which were designed primarily for normally distributed functional data with continuous covariates. It is not clear how these approaches can be adapted to our problem.

Overall, our proposed method takes the existing state-of-the-art many significant steps forward, including (a) introducing a novel biologically interpretable class of multi-category inverse Gaussian drift-diffusion models for decision making, (b) accommodating fixed effects of perceptual stimuli and random effects due to subject specific heterogeneity in such models in a statistically principled manner, (c) adapting these models to longitudinal study designs, studying the temporal evolution of the underlying process parameters as the subjects get trained and experienced in their assigned decision tasks, (d) allowing the process parameters to be locally clustered, enabling the assessment of their similarities and differences in various learning stages.

Applied to our motivating tone learning dataset, the proposed method provides many novel insights into the cognitive dynamics, allowing us to answer important scientific questions completely outside the scope of the previously existing literature. These include a detailed understanding of how biologically significant model parameters, that systematically relate to the underlying neural processes, evolve and interplay to enable gradual longitudinal learning in the participants, how similar or different these parameters are across different input and output tone combinations in different learning phases, how these processes differ between a good and a bad learner, etc.

The rest of this article is organized as follows. Section 2 provides additional background on tone learning and drift-diffusion models. Section 3 details our novel locally varying longitudinal drift-diffusion mixed model. Section 4 outlines computational challenges and solution strategies. Section 5 presents the results of the proposed method applied to tone learning data. Section 6 contains concluding remarks. Substantive additional details, including a Markov chain Monte Carlo (MCMC) based posterior inference algorithm and results of simulation experiments, are presented in the supplementary materials.

2. Behavioral Data and Scientific Background

The behavioral dataset that motivated our research comes from an intensive multi-day longitudinal speech category training study reported previously in Reetzke et al. (2018). In this study, n = 20 native English-speaking adults were trained to categorize Mandarin Chinese syllables into lexical tone categories as a function of their pitch contour. Mandarin Chinese has four syllabic pitch contours or tones that are used to convey different lexical meanings. For example, in Mandarin Chinese, the syllable “ma” can be interpreted as “mother,” “hemp,” “horse,” or “scold” depending on whether it is pronounced with a high-level (T1), low-rising (T2), low-dipping (T3), or high-falling (T4) tone, respectively. The stimuli consisted of these tones pronounced by four native Mandarin speakers. The trials were administered in homogeneous blocks. Each block comprised 40 categorization trials for 40 different speech exemplars, corresponding to different combinations of speakers, syllables, and input tones. Participants were trained across several days, with five blocks on each day. On each categorization trial, participants indicated the tone category they heard via a button press on a computer keyboard. Following the button press, the participants were given corrective feedback (“Correct/Incorrect”) on a computer screen, which had previously been shown to be more effective in enhancing learning than full feedback (e.g., “Incorrect, that was a category 2”) (Chandrasekaran, Yi, and Maddox 2014). Individual categorization performance was monitored across training sessions until each participant achieved and maintained accuracy levels comparable to that of native speakers of Mandarin.

The data consist of the tone responses and the associated response times for different input tones for the 20 participants. We focus here on the first two days of training (10 blocks in total) as they exhibited the steepest improvement in learning as well as the most striking individual differences relative to any other collection of blocks (Figure 1). In that sense, they provide an optimal longitudinal frame to assess the effects of learning on decision making variables.

Figure 1.

Left panel: Proportions of times an input tone was classified into different tone categories by different subjects. The thick line represents the average performance across subjects. Right panel: Associated response times averaged across subjects for clarity. In both panels, high-level tone responses are shown in red; low-rising in blue; low-dipping in green; and high-falling in purple.

Tone learning can be viewed from a broader perspective of multi-category decision making tasks, and hence can be studied using computational models developed for such tasks. We present here a brief nontechnical overview of how these models relate to the underlying neurobiology. Mathematical details and developments are deferred to Section 3.

In a typical multi-category decision task, the brain accumulates sensory evidence to make a categorical decision. This accumulation process is reflected in increasing firing rate at local neural populations associated with alternative decisions. A decision is taken when neural activity in one of these populations crosses a particular threshold level. The decision category that is finally chosen is the one whose decision threshold is crossed first (Gold and Shadlen 2007; Brody and Hanks 2016).

Changes in evidence accumulation rates and decision thresholds can be induced by task difficulty, neurostimulation, and/or individual differences in cognitive function (Cavanagh et al. 2011; Ding and Gold 2013). Decision-making is also regulated by demands on both speed and accuracy as a function of the task (Bogacz et al. 2010; Milosavljevic et al. 2010). The overall learning accuracies (“Correct/Incorrect” response proportions) in our dataset were previously analyzed in Paulon et al. (2019) using a binary logistic longitudinal mixed model. In a different context, Craigmile, Peruggia, and Van Zandt (2010) had developed a model for response times. Separate models for accuracies and response times cannot, however, provide a meaningful interpretation of the speed-accuracy trade-off.

An excellent basis for jointly modeling accuracies and response times is obtained by imitating the underlying neural evidence accumulation mechanisms via latent drift-diffusion processes racing toward their respective boundaries, the process reaching its boundary first producing the final observed decision and the time taken to reach this boundary giving the associated response time (Figure 2) (Usher and McClelland 2001). The drift and the boundary parameters jointly explain the dynamics of choice, including the speed-accuracy trade-off. Broadly speaking, decision thresholds remaining fixed, higher drift rates lead to faster and more accurate responses; for fixed drift rates, higher decision thresholds, on the other hand, increase response times as well as inaccuracies.

Figure 2.

Drift-diffusion model for perceptual decision making. After an initial $\delta_s$ amount of time required to encode an input signal s, the evidence in favor of a response category d accumulates according to a Wiener diffusion process with drift $\mu_{d,s}$. The decision d is eventually taken if the underlying process is the first to reach its decision boundary $b_{d,s}$. Here we illustrate a tone learning trial with input tone T1 (s = 1) that was eventually correctly identified. Section 2 provides additional neurobiological background. Section 3 provides additional mathematical details.

In our motivating tone learning experiment, we are interested in understanding the evolution and interplay of the drift and the boundary parameters behind the improved tone identification performances over training. Importantly, as was also discussed in the introduction, we are not just interested in estimating the overall trajectories of these parameters but also how they might differ between different input-response tone combinations locally in different longitudinal stages of the experiment. Additional interest lies in assessing subject level heterogeneity in these parameter trajectories, including particularly how they differ between good versus bad learners.

3. Longitudinal Drift-Diffusion Mixed Models

The basic Wiener diffusion process can be specified as $W(\tau) = \mu\tau + \sigma B(\tau)$, where $B(\tau)$ is the standard Brownian motion, $\mu$ is the drift rate, and $\sigma$ is the diffusion coefficient (Cox and Miller 1965; Ross 1996). The process has independent normally distributed increments, that is, $\Delta W(\tau) = \{W(\tau + \Delta\tau) - W(\tau)\} \sim \mathrm{Normal}(\mu\Delta\tau, \sigma^2\Delta\tau)$, independently of $W(\tau)$. The first passage time of crossing a threshold b, $\tau = \inf\{\tau' : W(0) = 0, W(\tau') \geq b\}$, is then distributed according to an inverse Gaussian distribution (Whitmore and Seshadri 1987; Chhikara 1988; Lu 1995) with density

$$
f(\tau \mid \mu, \sigma^2, b) = \frac{b}{\sqrt{2\pi\sigma^2\tau^3}} \exp\left\{-\frac{(b - \mu\tau)^2}{2\sigma^2\tau}\right\}, \qquad b > 0,\ \mu > 0,\ \sigma^2 > 0.
$$

With $\theta = (\mu, \sigma, b)^T$, we have $E(\tau \mid \theta) = b/\mu$ and $\mathrm{var}(\tau \mid \theta) = b\sigma^2/\mu^3$.
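As a quick numerical sanity check of the density and the moment formulas above, one can simulate first passage times directly. The following is a minimal R sketch of our own (the function names dig and r_fpt are ours, not from the paper’s code); the Euler discretization makes the agreement approximate.

```r
# First-passage-time density of a Wiener process with drift mu, diffusion
# coefficient sigma, and threshold b (the inverse Gaussian density above)
dig <- function(tau, mu, sigma, b) {
  b / sqrt(2 * pi * sigma^2 * tau^3) * exp(-(b - mu * tau)^2 / (2 * sigma^2 * tau))
}

# Simulate first passage times by Euler discretization of the Wiener process
r_fpt <- function(n, mu, sigma, b, dt = 1e-3) {
  replicate(n, {
    w <- 0; tau <- 0
    while (w < b) {
      w <- w + mu * dt + sigma * sqrt(dt) * rnorm(1)
      tau <- tau + dt
    }
    tau
  })
}

set.seed(1)
tau <- r_fpt(500, mu = 2, sigma = 1, b = 3)
c(mean(tau), 3 / 2)       # empirical vs. theoretical mean b / mu
c(var(tau), 3 * 1 / 2^3)  # empirical vs. theoretical variance b * sigma^2 / mu^3
```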

Given perceptual stimuli and a set of decision choices, the neurons in the brain accumulate evidence in favor of the different alternatives. Modeling this behavior using Wiener processes with unit variances, assuming that a response is given when the decision threshold for one of the options is crossed, a probability model for the time τd to reach the threshold for the dth decision category under the influence of the sth stimulus is obtained as

$$
f(\tau_d \mid \delta_s, \mu_{d,s}, 1, b_{d,s}) = \frac{b_{d,s}}{\sqrt{2\pi(\tau_d - \delta_s)^3}} \exp\left[-\frac{\{b_{d,s} - \mu_{d,s}(\tau_d - \delta_s)\}^2}{2(\tau_d - \delta_s)}\right], \quad (1)
$$

where $\mu_{d,s}$ denotes the rate of accumulation of evidence, $b_{d,s}$ the decision boundary, and $\delta_s$ an offset representing the collective time required to encode the sth signal before evidence accumulation begins, the time to press a computer key to record a response after a decision is reached, etc. (Figure 2). We now let $\theta_{d,s} = (\delta_s, \mu_{d,s}, b_{d,s})^T$. Since a decision d is reached at response time $\tau$ if the corresponding threshold is crossed first, that is, when $\{\tau = \tau_d\} \cap_{d' \neq d} \{\tau_{d'} > \tau_d\}$, we have $d = \arg\min_{d'} \tau_{d'}$. Assuming simultaneous accumulation of evidence for all decision categories, modeled by independent Wiener processes, and termination when the threshold for the observed decision category d is reached, the joint distribution of $(d, \tau)$ is thus given by

$$
f(d, \tau \mid s, \theta) = g(\tau \mid \theta_{d,s}) \prod_{d' \neq d} \{1 - G(\tau \mid \theta_{d',s})\}, \quad (2)
$$

where, to distinguish from the generic notation f, we now use g(· | θ) and G(· | θ) to denote, respectively, the probability density function (pdf) and the cumulative distribution function (cdf) of an inverse Gaussian distribution, as defined in (1). We refer to model (2) as the inverse Gaussian drift-diffusion model.

The marginal distribution of the response times τ under the influence of stimulus s is then obtained as

$$
f(\tau \mid s, \theta) = \sum_{d} g(\tau \mid \theta_{d,s}) \prod_{d' \neq d} \{1 - G(\tau \mid \theta_{d',s})\}. \quad (3)
$$

The marginal probability of taking decision d under the influence of stimulus s is likewise obtained as

$$
f(d \mid s, \theta) = \int_{\delta_s}^{\infty} g(\tau \mid \theta_{d,s}) \prod_{d' \neq d} \{1 - G(\tau \mid \theta_{d',s})\}\, d\tau. \quad (4)
$$
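To make the race construction in (2)-(4) concrete, here is a minimal R sketch of our own (the helper names g_ig and G_ig and all parameter values are our choices, not the authors’ code). It uses the standard closed-form inverse Gaussian cdf with unit diffusion coefficient and computes the response probabilities in (4) by numerical integration.

```r
# Shifted inverse Gaussian density g and cdf G for one accumulator
g_ig <- function(tau, mu, b, delta) {
  t <- tau - delta
  ifelse(t > 0, b / sqrt(2 * pi * t^3) * exp(-(b - mu * t)^2 / (2 * t)), 0)
}
G_ig <- function(tau, mu, b, delta) {
  t <- tau - delta
  ifelse(t > 0,
         pnorm((mu * t - b) / sqrt(t)) + exp(2 * mu * b) * pnorm(-(mu * t + b) / sqrt(t)),
         0)
}

# Joint density f(d, tau | s, theta) for d0 = 4 racing accumulators, Eq. (2):
# the winner's density times the survival functions of the losers
f_joint <- function(d, tau, mu, b, delta) {
  g_ig(tau, mu[d], b[d], delta) * prod(1 - G_ig(tau, mu[-d], b[-d], delta))
}

# Marginal response probabilities, Eq. (4), by numerical integration
mu <- c(2.0, 1.2, 1.5, 1.0); b <- rep(2, 4); delta <- 0.3
p_d <- sapply(1:4, function(d)
  integrate(Vectorize(function(tau) f_joint(d, tau, mu, b, delta)),
            lower = delta, upper = Inf)$value)
p_d; sum(p_d)  # probabilities of the four responses; should sum to ~1
```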

Interestingly, model (4) is similar to traditional multinomial probit/logit regression models (Borooah 2002; Agresti 2018) except that the latent variables are now inverse Gaussian distributed as opposed to being normal or extreme-value distributed, and the observed category is associated with the minimum of the latent variables in contrast to being identified with the maximum of the latent variables.

In an interesting recent work, Kunkel et al. (2019) have also used an inverse Gaussian distribution based hierarchical Bayesian model for decision making, albeit in a simpler binary category case, focusing primarily on individual level models with no mechanism to assess population level effects or their dynamic complexities.

For our motivating longitudinal tone learning experiment described in Section 2, for $i \in \{1, \dots, n = 20\}$, $\ell \in \{1, \dots, L = 40\}$, $t \in \{1, \dots, T = 10\}$, let $s_{i,\ell,t}$ denote the input tone for the ith individual in the $\ell$th trial in block t. Likewise, let $d_{i,\ell,t}$ and $\tau_{i,\ell,t}$ denote, respectively, the selected Mandarin tone and the time taken to reach the corresponding threshold by the ith individual in the $\ell$th trial in block t. We now have

$$
g\{\tau_{i,\ell,t} \mid s_{i,\ell,t} = s, \theta_{d,s}^{(i)}(t)\} = \frac{b_{d,s}^{(i)}(t)}{\sqrt{2\pi(\tau_{i,\ell,t} - \delta_s^{(i)})^3}} \exp\left[-\frac{\{b_{d,s}^{(i)}(t) - \mu_{d,s}^{(i)}(t)(\tau_{i,\ell,t} - \delta_s^{(i)})\}^2}{2(\tau_{i,\ell,t} - \delta_s^{(i)})}\right]. \quad (5)
$$

The drift rates $\mu_{d,s}^{(i)}(t)$ and the decision boundaries $b_{d,s}^{(i)}(t)$ now also vary with the blocks t. In addition, we accommodate random effects by allowing $\delta_s^{(i)}$, $\mu_{d,s}^{(i)}(t)$, and $b_{d,s}^{(i)}(t)$ to also depend on the subject index i. We let $y_{i,\ell,t} = (d_{i,\ell,t}, \tau_{i,\ell,t})$, $y = \{y_{i,\ell,t}\}_{i,\ell,t}$, and $d_0 = 4$ be the number of possible decision categories (T1, T2, T3, T4). The likelihood function of our longitudinal drift-diffusion mixed model thus takes the form

$$
L(y \mid s, \theta) = \prod_{d=1}^{d_0} \prod_{s=1}^{d_0} \prod_{t=1}^{T} \prod_{i=1}^{n} \prod_{\ell=1}^{L} \left( g\{\tau_{i,\ell,t} \mid \theta_{d,s}^{(i)}(t)\} \prod_{d' \neq d} \left[1 - G\{\tau_{i,\ell,t} \mid \theta_{d',s}^{(i)}(t)\}\right] \right)^{1\{d_{i,\ell,t} = d,\ s_{i,\ell,t} = s\}}.
$$
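Reusing g_ig and G_ig from the sketch above, this log-likelihood can be sketched for a hypothetical long-format data frame dat with columns i, t, s, d, rt, with drift and boundary values stored in arrays indexed as [d, s, t, i] and offsets as [s, i] (our layout for illustration, not the paper’s):

```r
# Log-likelihood of the longitudinal drift-diffusion mixed model (a sketch):
# each trial contributes the winner's density times the losers' survivals
loglik <- function(dat, mu, bb, delta) {
  sum(vapply(seq_len(nrow(dat)), function(r) {
    d <- dat$d[r]; s <- dat$s[r]; t <- dat$t[r]; i <- dat$i[r]; tau <- dat$rt[r]
    log(g_ig(tau, mu[d, s, t, i], bb[d, s, t, i], delta[s, i])) +
      sum(log(1 - G_ig(tau, mu[-d, s, t, i], bb[-d, s, t, i], delta[s, i])))
  }, numeric(1)))
}
```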

3.1. Modeling the Offsets

The offset parameters $\delta_s^{(i)}$, we recall, signify the times spent on encoding the different input tones, the time to press computer keys to record the responses, etc., and hence are not directly relevant to the actual decision making processes. These parameters are thus biologically not very interesting but may still vary between individuals and have an important effect on the estimates of drift rates and boundaries (Teichert, Grinband, and Ferrera 2016). We thus let them vary between input stimuli and participants but assume them to remain stable across blocks as in (5).

We assign uniform priors $\delta_s^{(i)} \sim \mathrm{Unif}(0, \delta_{s,i,\max})$, where $\delta_{s,i,\max}$ is the minimum of all response times under stimulus s for individual i, that is, $\delta_{s,i,\max} = \min_{\{(\ell,t):\ s_{i,\ell,t} = s\}} \tau_{i,\ell,t}$.

3.2. Modeling the Drifts and the Boundaries

Our modeling efforts concentrate henceforth on flexibly characterizing the longitudinal evolution of the mixed effects parameters $\mu_{d,s}^{(i)}(t)$ and $b_{d,s}^{(i)}(t)$. Variations in these parameters over training blocks explain perceptual learning in the participants. Variations across participants, on the other hand, explain their performance heterogeneity. Following the discussion in the introduction, of particular interest are the local similarities and differences between these parameters for different input-response tone combinations (d, s) in different learning phases.

To this end, we propose essentially identical modeling strategies for $\mu_{d,s}^{(i)}(t)$ and $b_{d,s}^{(i)}(t)$. For ease of exposition, avoiding unnecessary repetition, we describe below only these common strategies using simplified generic notations. With $x = (d, s) \in \mathcal{X} = \{(1,1), (1,2), \dots, (4,4)\} \leftrightarrow \{1, 2, \dots, x_{\max}\}$, $x_{\max} = 4 \times 4$, succinctly representing the input-response tone combinations and, with some abuse of notation, $\theta_x^{(i)}(t)$ being a generic for $\mu_{d,s}^{(i)}(t)$ and $b_{d,s}^{(i)}(t)$, we let

$$
\theta_x^{(i)}(t) = \exp\{f_x(t) + u_x^{(i)}(t)\}, \qquad u_x^{(i)}(t) \sim f_u\{u_x^{(i)}(t)\}. \quad (6)
$$

The exponentiation in (6) enforces positivity constraints; $f_x(t)$ and $u_x^{(i)}(t)$ denote, respectively, additive fixed and random effects components on the exponential scale; $f_u$ denotes the underlying random effects distribution. When needed, the fixed and random effects components for the drifts and the boundaries, as well as associated parameters and hyper-parameters, will be distinguished by reintroducing the subscripts as $f_{\mu,x}(t)$, $f_{b,x}(t)$, $u_{\mu,x}^{(i)}(t)$, $u_{b,x}^{(i)}(t)$, etc. To further simplify notation, generic data recording experimental blocks in {1, …, T} as well as other generic time points in [1, T] will both be denoted by t. Likewise, generic input-response tone combinations as well as their particular values will both be denoted by x, and so forth.

We model the components $f_x(t)$ and $u_x^{(i)}(t)$, and hence $\theta_x^{(i)}(t)$, to all be smoothly varying functions over t ∈ [1, T]. A functional approach is not strictly necessary if inference is restricted only to the T data recording blocks t ∈ {1, …, T}. Learning may, however, be viewed as a continuous process: the brain synthesizes information from relevant past experiences even when not being actively engaged in actual decision making. A functional approach to modeling $f_x(t)$ and $u_x^{(i)}(t)$ for any t ∈ [1, T], not just the experimental blocks t ∈ {1, …, T}, thus facilitates parameter interpretability. A functional approach is also practically convenient in characterizing smoothly varying longitudinal parameter trajectories.

In modeling the fixed effects components $f_x(t)$, we are not only interested in characterizing their overall trajectories over time t for different input-response combinations x = (d, s) but also in how they might vary locally between different values of x in different learning stages. Compared to the fixed effects, however, we have to rely on much less data to estimate the random effects $u_x^{(i)}(t)$ for different x = (d, s) and different participants i, especially for d ≠ s toward later stages of the experiment when most participants identify the input tones with high accuracy. Our models and inferential goals for the random effects $u_x^{(i)}(t)$ will therefore be relatively modest.

3.2.1. Locally Varying Functional Fixed Effects

We now propose a novel approach to modeling the latent functions fx(t) using basis decomposition methods that allow them to smoothly vary with the blocks t while also depending locally on the indexing variable x. To begin with, we let

$$
f_x(t) = \sum_{k=1}^{K} \beta_k^{(x)} B_k(t), \quad (7)
$$

where $B(t) = \{B_1(t), \dots, B_K(t)\}^T$ is a set of known locally supported basis functions spanning [1, T] and $\beta^{(x)} = (\beta_1^{(x)}, \dots, \beta_K^{(x)})^T$ are associated unknown coefficients to be estimated from the data. In this article, we use quadratic B-spline bases with knot points coinciding with the block locations. B-splines are nonnegative, continuous, and have desirable local supports (Figure 3). Mixtures of B-splines are highly flexible (de Boor 1978). Allowing the $\beta_k^{(x)}$’s to flexibly vary with x, the model can accommodate widely different shapes for different input-response tone combinations.
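Such a basis is easy to construct with the splines package in R. The sketch below is our own illustration of the knot placement described above (knots at the blocks, quadratic degree), giving K = T + 1 basis functions, consistent with the Figure 4 caption:

```r
# Quadratic B-spline design matrix for Eq. (7), knots at blocks 1, ..., T
library(splines)
T_blocks <- 10
tt <- seq(1, T_blocks, length.out = 200)
# Interior knots at blocks 2, ..., T - 1; with degree 2 and an intercept this
# yields K = (T - 2) + 2 + 1 = T + 1 basis functions
B <- bs(tt, knots = 2:(T_blocks - 1), degree = 2, intercept = TRUE,
        Boundary.knots = c(1, T_blocks))
dim(B)  # 200 x 11, i.e., K = T + 1 = 11

# f_x(t) = sum_k beta_k^(x) B_k(t) for an arbitrary coefficient vector
beta <- rnorm(ncol(B))
f_x <- as.vector(B %*% beta)
```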

Figure 3.

Plot of 8 quadratic B-splines on an interval [1, T] defined by 11 knot points that divide [1, T] into 6 equal subintervals.

It is difficult to assess how similar or different these functions are using such unstructured models. One potential solution is to cluster the spline coefficients $\beta^{(x)}$ associated with different input-response tone combinations x. If, for example, $\beta^{(x_1)} = \beta^{(x_2)}$ for two combinations $x_1$ and $x_2$, then we have $f_{x_1}(t) = f_{x_2}(t)$ for all t.

Such global clustering of all elements of $\beta^{(x)}$ together does not, however, allow us to straightforwardly assess the local similarities and differences between these functions in different learning phases. To induce a mechanism for such local clustering, we introduce a set of latent variables $z_k^{(x)}$ for each input-response tone combination x, with a shared state space, and associated core coefficients $\beta_{k,z}$, and let

$$
(\beta_k^{(x)} \mid z_k^{(x)} = z_k) = \beta_{k,z_k}, \ \text{implying}\ \{f_x(t) \mid z_k^{(x)} = z_k, k = 1, \dots, K\} = \sum_{k=1}^{K} \beta_{k,z_k} B_k(t). \quad (8)
$$

The set of B-spline coefficients to be estimated at the kth location now comprises the $\beta_{k,z_k}$’s that are indexed by $z_k^{(x)} = z_k$ at that location k. When $z_k^{(x_1)} = z_k^{(x_2)}$ for two different levels $x_1$ and $x_2$ of x, we have $\beta_k^{(x_1)} = \beta_k^{(x_2)}$, and the implied functions $f_{x_1}(t)$ and $f_{x_2}(t)$ will tend to be similar at location k. Indeed, for quadratic B-splines with knots at the blocks {1, …, T}, $f_{x_1}(t)$ and $f_{x_2}(t)$ will be exactly equal at block t when $z_t^{(x_1)} = z_t^{(x_2)}$ and $z_{t+1}^{(x_1)} = z_{t+1}^{(x_2)}$.

In theory, we could use B-splines of other small degrees as they all enjoy local support properties. With linear splines, however, smoothness becomes harder to control, and with cubic splines, three latent variables would be needed to determine the cluster configuration at each block t. We found quadratic B-splines to be a good compromise between the two for modeling smoothly varying curves while also maintaining easy interpretability of the latent variables.

Letting $\mathcal{Z}_k = \{z_k : z_k^{(x)} = z_k \text{ for some } x \in \mathcal{X}\}$, the case $|\mathcal{Z}_k| = 1$ then characterizes the scenario when the spline coefficients for all input-response tone combinations x are the same at location k. On the other end, when $|\mathcal{Z}_k| = x_{\max} = 4 \times 4$, the spline coefficients are all different for different x at location k. In our tone learning application, $|\mathcal{Z}_k|$ tends to be much smaller than $x_{\max}$ uniformly for all k, and the restricted support $z_k^{(x)} \in \{1, \dots, z_{\max}\}$ with $z_{\max} = 8 < x_{\max} = 16$ suffices.
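The local equality implied by (8) is easy to see numerically. In the sketch below (our own illustration; the label paths and core coefficients are made up, and B and tt come from the basis sketch above), two combinations share labels over the first five locations and then diverge, so their curves coincide exactly over the early blocks:

```r
# Local clustering via shared labels, Eq. (8)
z_max <- 8; K <- 11
beta_core <- matrix(rnorm(K * z_max), K, z_max)   # core coefficients beta_{k,z}
z_x1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)        # labels z_k^{(x1)}
z_x2 <- c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3)        # labels z_k^{(x2)}, diverging at k = 6
beta_x1 <- beta_core[cbind(1:K, z_x1)]
beta_x2 <- beta_core[cbind(1:K, z_x2)]
f_x1 <- as.vector(B %*% beta_x1)
f_x2 <- as.vector(B %*% beta_x2)
range(f_x1[tt <= 4] - f_x2[tt <= 4])  # exactly zero over the early blocks
```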

We model the temporal evolution of the latent local cluster indicators $z_k^{(x)}$, k = 1, …, K, using HMMs (Figure 4). We consider two types of dynamics for the latent states, corresponding to correct (C) and incorrect (I) identification of the tones. That is,

$$
(z_k^{(d,s)} \mid z_{k-1}^{(d,s)} = z_{k-1}) \sim \mathrm{Mult}(\pi_{z_{k-1},1}^{(C)}, \dots, \pi_{z_{k-1},z_{\max}}^{(C)}) \quad \text{when } d = s,
$$
$$
(z_k^{(d,s)} \mid z_{k-1}^{(d,s)} = z_{k-1}) \sim \mathrm{Mult}(\pi_{z_{k-1},1}^{(I)}, \dots, \pi_{z_{k-1},z_{\max}}^{(I)}) \quad \text{when } d \neq s.
$$

The latent cluster inducing variables $z_k^{(x)}$ are shared between $f_{\mu,x}(t)$ and $f_{b,x}(t)$, reducing computational complexity while also facilitating model interpretability. We assign Dirichlet priors on the transition probabilities

$$
\pi_z^{(C)} = (\pi_{z,1}^{(C)}, \dots, \pi_{z,z_{\max}}^{(C)})^T \sim \mathrm{Dir}(\alpha^{(C)}/z_{\max}, \dots, \alpha^{(C)}/z_{\max}) \quad \text{with } \alpha^{(C)} \sim \mathrm{Ga}(a_{\alpha}, b_{\alpha}),
$$
$$
\pi_z^{(I)} = (\pi_{z,1}^{(I)}, \dots, \pi_{z,z_{\max}}^{(I)})^T \sim \mathrm{Dir}(\alpha^{(I)}/z_{\max}, \dots, \alpha^{(I)}/z_{\max}) \quad \text{with } \alpha^{(I)} \sim \mathrm{Ga}(a_{\alpha}, b_{\alpha}).
$$
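A forward draw from this prior, sketched in R (our own illustration; rdirichlet is a hypothetical two-line helper and the hyper-parameter values are arbitrary), shows how the sparse Dirichlet rows concentrate transitions on a few labels, so that only a handful of states tend to be visited:

```r
# One draw of a label path z_1, ..., z_K from the correct (d = s) dynamics
z_max <- 8; K <- 11; a_alpha <- 1; b_alpha <- 1
alpha_C <- rgamma(1, a_alpha, b_alpha)
rdirichlet <- function(a) { g <- rgamma(length(a), a); g / sum(g) }
Pi_C <- t(replicate(z_max, rdirichlet(rep(alpha_C / z_max, z_max))))  # rows sum to 1
z <- numeric(K)
z[1] <- sample(z_max, 1)
for (k in 2:K) z[k] <- sample(z_max, 1, prob = Pi_C[z[k - 1], ])
z
```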
Figure 4.

Left panel: Graph of a conventional HMM. Right panel: Graph of our proposed functional HMM model (8) with quadratic B-splines (Figure 3) with knot points coinciding with the data recording time blocks (T = K − 1).

We next consider priors for the atoms $\beta_{k,z_k}$. Conditional on the $z_k^{(x)}$’s and the coefficients at the previous locations, for k = 2, …, K, we construct the priors sequentially as

$$
\beta_{k,z_k} \sim \begin{cases} \prod_{\{z_{k-1}^{(x)}:\ x \in \mathcal{X}_k(z_k)\}} \mathrm{Normal}(\beta_{k-1,z_{k-1}^{(x)}}, \sigma_{\beta,1}^2) & \text{if } |\mathcal{X}_k(z_k)| > 0, \\ \mathrm{Normal}(\mu_{\beta,0}, \sigma_{\beta,0}^2) & \text{otherwise}, \end{cases} \quad (9)
$$

where $\mathcal{X}_k(z_k) = \{x : z_k^{(x)} = z_k\}$ is the set of values of x that, at the location k, are assigned the label $z_k$. In constructing the prior in this manner, we center the core coefficients around the ones that are “expressed” at the previous location (Figure 5), penalizing their first-order differences. The coefficients that are not associated with any levels of x are assigned a normal prior with a large variance $\sigma_{\beta,0}^2$. The initial coefficients are assigned noninformative flat priors $\beta_{1,z_1} \propto 1$. Additional illustrations of these smoothness inducing priors on the core coefficients can be found in Section S.2 of the supplementary materials.
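A sketch of one prior draw under (9), under our reading of the first case as a product of normal kernels, which collapses to a single normal centered at the mean of the coefficients expressed at the previous location; the function draw_core and its default values are ours, not the authors’:

```r
# One prior draw of the core coefficients beta_{k,z} given a label matrix
# z_mat with z_mat[k, x] = z_k^{(x)} (a sketch under our assumptions)
draw_core <- function(z_mat, sig1 = 0.1, mu0 = 0, sig0 = 10) {
  K <- nrow(z_mat); z_max <- max(z_mat)
  beta <- matrix(NA, K, z_max)
  beta[1, ] <- rnorm(z_max, mu0, sig0)       # flat-ish initial coefficients
  for (k in 2:K) for (z in 1:z_max) {
    xs <- which(z_mat[k, ] == z)             # X_k(z): combinations carrying label z
    if (length(xs) > 0) {
      prev <- beta[cbind(k - 1, z_mat[k - 1, xs])]
      # product of Normal(prev_j, sig1^2) kernels = Normal(mean(prev), sig1^2 / m)
      beta[k, z] <- rnorm(1, mean(prev), sig1 / sqrt(length(prev)))
    } else beta[k, z] <- rnorm(1, mu0, sig0) # unexpressed: diffuse normal
  }
  beta
}
```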

Figure 5.

An illustration of the prior on the spline core coefficients $\beta_{k,z_k}$ at location k (marked by the dashed vertical lines) in the fixed effects model developed in Section 3.2.1 for a synthetic scenario with x ∈ {1, 2, 3}, where the curves corresponding to the three levels of x are initially equal, the curves for x = 1, 3 (in red) and x = 2 (in blue) then diverge at t = 6, merging back again at t = 15.

The smoothness of the curves is controlled by the parameter $\sigma_{\beta,1}^2$, which is assigned a prior, allowing it to be informed by the data. We let

$$
\sigma_{\beta,1}^2 \sim \mathrm{C}^{+}(0, 1),
$$

where C+(a, b) denotes a half-Cauchy distribution (Gelman 2006; Polson and Scott 2012) with location parameter a and scale parameter b. The half-Cauchy distribution, which attains its mode at zero, is capable of capturing strong smoothness, while also having heavy tails, thus being capable of capturing wiggly functions. The choice of the scale hyper-parameter is discussed in Section S.5.1 in the supplementary materials.

Importantly, although our basic building blocks for the fixed effects components comprise conventional HMMs, one for each input-response tone combination x = (d, s), for any input tone s, all four latent variables $z_k^{(1,s)}, z_k^{(2,s)}, z_k^{(3,s)}, z_k^{(4,s)}$ simultaneously appear in Equation (2). For each input tone, the graph for our tone learning model (Figure 6 and Figure S.6 in the supplementary materials) thus resembles a factorial HMM (fHMM, Ghahramani and Jordan 1997) with four hidden layers. In the posterior, a latent state $z_k^{(d,s)}$ is thus informed by all responses generated under the tone s, not just the subset corresponding to x = (d, s). This has important consequences for posterior inference, as we discuss in Section 4.

Figure 6.

Graph of the proposed fixed effects model for tone learning.

3.2.2. Locally Varying Functional Random Effects

We now focus on flexibly modeling the functional random effects components. For the reasons outlined before Section 3.2.1, estimating $u_x^{(i)}(t)$ separately for each x is a challenging task. For any participant, the random effects for correct and incorrect identification of the tones may, however, be expected to be on opposite sides of the corresponding population level curves. Taking a middle path, we thus allow different random effects $u_C^{(i)}(t)$ and $u_I^{(i)}(t)$ for correct (C) and incorrect (I) identifications, respectively, as

$$
u_{d,s}^{(i)}(t) = u_C^{(i)}(t) \quad \text{when } d = s,
$$
$$
u_{d,s}^{(i)}(t) = u_I^{(i)}(t) \quad \text{when } d \neq s.
$$

We adopt a common strategy to model both $u_C^{(i)}(t)$ and $u_I^{(i)}(t)$. Suppressing the subscripts to simplify notation and avoid repetition, we model the time-varying random effects components $u^{(i)}(t)$ as

$$
u^{(i)}(t) = \sum_{k=1}^{K} \beta_{k,u}^{(i)} B_k(t), \qquad \beta_u^{(i)} \sim \mathrm{MVN}_K\{0, (\sigma_{u,a}^{-2} I_K + \sigma_{u,s}^{-2} P_u)^{-1}\}, \quad (10)
$$

where $\beta_u^{(i)} = (\beta_{1,u}^{(i)}, \dots, \beta_{K,u}^{(i)})^T$ are subject-specific spline coefficients and $\mathrm{MVN}_K(\mu, \Sigma)$ denotes a K-dimensional multivariate normal distribution with mean $\mu$ and covariance $\Sigma$. We choose $P_u = D_u^T D_u$, where the (K − 1) × K matrix $D_u$ is such that $D_u \beta_u^{(i)}$ computes the first-order differences in $\beta_u^{(i)}$. The model thus penalizes $\sum_{k=2}^{K} (\beta_{k,u}^{(i)} - \beta_{k-1,u}^{(i)})^2 = \beta_u^{(i)T} P_u \beta_u^{(i)}$, the sum of squares of first-order differences in $\beta_u^{(i)}$ (Eilers and Marx 1996). The random effects variance parameter $\sigma_{u,s}^2$ models the smoothness of the random effects curves, smaller $\sigma_{u,s}^2$ inducing smoother $u^{(i)}(t)$’s. Additional variations from the constant zero curve are explained by $\sigma_{u,a}^2$ (Figure 7). The absence of random effects is signified by the limiting case $\sigma_{u,s}^2 = \sigma_{u,a}^2 = 0$. We assign half-Cauchy priors on the variance parameters as

$$
\sigma_{u,s}^2 \sim \mathrm{C}^{+}(0, 1), \qquad \sigma_{u,a}^2 \sim \mathrm{C}^{+}(0, 1).
$$
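One draw from the random effects model (10) can be sketched as follows (our own illustration, reading the precision matrix of $\beta_u^{(i)}$ as $\sigma_{u,a}^{-2} I_K + \sigma_{u,s}^{-2} P_u$; B and tt are from the basis sketch in Section 3.2.1 and the variance values are arbitrary):

```r
# Draw one random effects curve u^(i)(t) from model (10)
K <- 11
D_u <- diff(diag(K))                     # (K-1) x K first-difference matrix
P_u <- crossprod(D_u)                    # P_u = t(D_u) %*% D_u
sig2_a <- 0.25; sig2_s <- 0.04           # smaller sig2_s -> smoother curves
Q <- diag(K) / sig2_a + P_u / sig2_s     # precision matrix of beta_u
beta_u <- backsolve(chol(Q), rnorm(K))   # beta_u ~ MVN_K(0, Q^{-1})
u_i <- as.vector(B %*% beta_u)           # evaluate u^(i)(t) on the grid tt
```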
Figure 7.

An illustration of the functional random effects model proposed in Section 3.2.2. Each panel shows a collection of 10 random draws from the random effects distribution for a combination of values of $(\sigma_{u,s}^2, \sigma_{u,a}^2)$.

Modeled in the same space of quadratic B-splines, the fixed and the random effects curves thus share similar smoothness properties. Having different smoothness controlling parameters, they are, however, allowed to have different smoothness levels. A similar approach, but with additional assumptions on the covariance matrix of the random effects, has previously been developed in Guo (2002). To our knowledge, model (10) for the random effects is thus also novel to the literature.

Integrating out the random effects, the corresponding population level parameters $\theta_x(t)$ are obtained as

$$
\theta_x(t) = \int \exp\{f_x(t) + u_x^{(i)}(t)\}\, f_u\{u_x^{(i)}(t)\}\, du_x^{(i)}(t) = \exp\left[f_x(t) + \frac{\mathrm{var}\{u_x^{(i)}(t)\}}{2}\right],
$$

using the lognormal moment identity $E\{\exp(u)\} = \exp\{\mathrm{var}(u)/2\}$ for a mean-zero normal u.

4. Posterior Inference

Posterior inference for conventional HMMs can generally be based on samples drawn from the posterior using dynamic message passing MCMC algorithms (Rabiner 1989; Scott 2002). The nonstandard inverse Gaussian likelihood and the fHMM type model structure of our proposed longitudinal drift-diffusion mixed model, however, bring in significant additional complexities. We adapt recent advances in MCMC algorithms for discrete spaces (Neal 2003; Van Gael et al. 2008; Titsias and Yau 2014; Zanella 2020) in novel nontrivial ways, designing locally informative slice sampling moves that carefully exploit the conditional independence relationships encoded in the model to overcome the computational challenges. Due to space constraints, the details are deferred to Section S.5 in the supplementary materials.

5. Application to Tone Categorization Data

In this section, we discuss the results produced by our method applied to the tone category learning data described in Section 2. Our primary inference goals, we recall, include understanding systematic longitudinal variations in perceptual categorization decisions as the participants get better at identifying the four Mandarin tones, with additional interest in assessing individual specific trajectories, especially how they differ between good and bad learners.

Figure 8 shows the posterior mean trajectories and associated 90% credible intervals for the boundaries $b_{d,s}(t)$ and the drift rates $\mu_{d,s}(t)$ estimated by our method for different combinations of (d, s). Figure 9 reports the estimated posterior probabilities of each of the $\binom{4}{2} = 6$ pairs of success (d = s) parameters clustering together in different blocks. Figure S.16 in the supplementary materials additionally presents the drift curves for successful identifications (d = s) superimposed on each other. These results suggest that after an initial learning phase, where the underlying processes are all similar across all input tones, there are two main learning groups. Two of the tones, {T1, T3}, seem to be easier to learn, as the corresponding drift parameters are larger, and tones {T2, T4} are more challenging. These findings are corroborated by empirical evidence and have significant biological relevance. The similarity groups of the Mandarin tones are in fact {T1, T3}, which are characterized by the height of the pitch, and {T2, T4}, which are characterized by the direction of the pitch and are more challenging to learn. Tone T3, in particular, has a unique “dipping” pitch pattern that is rarely encountered in English (Song et al. 2008) and is therefore easier to categorize. Our proposed method allows similar inferential questions to be answered for the drift parameters corresponding to misclassifications, as well as for all the boundary parameters. The misclassification drift curves are mostly similar to each other, although some minor local differences can be found. Notable exceptions are $\mu_{1,3}(t)$ and $\mu_{3,1}(t)$, which are significantly smaller than all other drifts after the third block. As the participants get trained and experienced, for input tone T1, evidence in favor of tone T3 is thus collected more slowly compared to evidence in favor of T2 and T4, and vice versa. Likewise, while the boundary curve estimates mostly remain constant over the training blocks and similar to each other, $b_{1,3}(t)$ and $b_{3,1}(t)$ again differ from the rest and actually increase over the blocks. As the participants get trained and experienced, more evidence in favor of tone T3 is thus needed to misclassify tone T1 as tone T3, and vice versa. Tones T1 and T3 thus become harder to misclassify for one another as training proceeds.

Figure 8.

Results for tone learning data: Estimated posterior mean trajectories of the population level drifts $\mu_{d,s}(t)$ (left panel) and boundaries $b_{d,s}(t)$ (right panel) for the proposed longitudinal inverse Gaussian drift-diffusion mixed model. The shaded areas represent the corresponding 90% pointwise credible intervals. Parameters for the high-level tone response category T1 are shown in red; low-rising T2 in blue; low-dipping T3 in green; and high-falling T4 in purple.

Figure 9.

Results for tone learning data: Pairwise posterior co-clustering probabilities of the parameter trajectories for successful identification (d = s) of different input tones in different learning phases. The estimated posterior probability of $(\mu_{2,2}, b_{2,2})$ and $(\mu_{3,3}, b_{3,3})$ being clustered together, and hence being equal, in the third block is thus 0.64, as shown in row (2, 3) and column 3. Equivalently, the estimated posterior probability of $(\mu_{2,2}, b_{2,2})$ and $(\mu_{3,3}, b_{3,3})$ being different in the third block is 0.36.

Importantly, our proposed drift-diffusion mixed model not only allows population level inference about the underlying processes but also allows us to assess individual specific parameter trajectories. Figure 10 shows the posterior mean trajectories and associated 90% credible intervals for the drift rates $\mu_{d,s}^{(i)}(t)$ and the boundaries $b_{d,s}^{(i)}(t)$ estimated by our method for the different success combinations of (d, s) for two participants: the one with the best accuracy averaged across all blocks, and the one with the worst accuracy averaged across all blocks. These results suggest significant individual specific heterogeneity. Importantly, the differences in the performances can again be explained mostly by differences in the drift trajectories. For the well performing participant, the drift trajectories increase rapidly with the training blocks before plateauing around block 6, at which stage the participant has already attained native-like proficiency. For the poorly performing participant, on the other hand, the drift trajectories remain approximately constant across all 10 blocks.

Figure 10.

Results for tone learning data: Estimated posterior mean trajectories for individual specific drifts $\mu_{d,s}^{(i)}(t) = \exp\{f_{\mu,d,s}(t) + u_{\mu,C}^{(i)}(t)\}$ (left panel) and boundaries $b_{d,s}^{(i)}(t) = \exp\{f_{b,d,s}(t) + u_{b,C}^{(i)}(t)\}$ (right panel) for successful identification (d = s) for two different participants: one performing well (dotted line) and one performing poorly (dashed line). The shaded areas represent the corresponding 90% pointwise credible intervals. Parameters for the high-level tone response category T1 are shown in red; low-rising T2 in blue; low-dipping T3 in green; and high-falling T4 in purple.

We compare the performance of our method with that of the linear ballistic accumulator (LBA) model (Brown and Heathcote 2008). Similar to our model, the LBA uses independent evidence accumulators that, after an offset δ, race toward a response threshold b. The accumulator that first reaches the boundary corresponds to the decision outcome, and the time taken to reach this decision boundary is the observed response time. The LBA model, however, assumes that the evidence accumulates linearly at the rate μ, reaching the boundary b precisely at time τ = b/μ. Unlike in drift-diffusion models, where trial-by-trial variability is explained by stochastically different diffusion paths, the LBA model explains trial-by-trial variability by assuming the slopes μ for different trials to be drawn from a $\mathrm{Normal}(m_{d,s}, v_{d,s})$ distribution (Figure S.9 in the supplementary materials).

The LBA model has several serious limitations. The normality assumption on the slopes μ clearly violates their nonnegativity constraint. Existing LBA models are also limited in their use of a common boundary $b_s$ for all decision categories d. There is also no principled way to incorporate systematic stimulus and decision category specific fixed effects or individual specific random effects into the LBA model. The existing literature is moreover limited to static settings; there is no mechanism to estimate smoothly varying longitudinal parameter trajectories as the participants get trained and experienced in their decision tasks. In our implementation, we thus fitted the LBA model separately for each block. Finally, the likelihood function of the LBA model is nonconvex in the parameters. Parameter estimation based on optimization of the likelihood function is thus fraught with convergence issues. We used the rtdists package in R, using several random initializations and tracking the objective function to ensure convergence. A more detailed review of the LBA model can be found in Section S.7 of the supplementary materials.
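For reference, a hedged sketch of such a per-block fit via rtdists is given below; the fixed start-point bound A, the starting values, and the crude penalty for infeasible parameters are our choices, and the paper’s actual fitting procedure (Section S.7 of the supplementary materials) may differ:

```r
# Per-block LBA fit (a sketch): one common boundary b and normal drifts m_{d,s}
library(rtdists)
neg_ll_lba <- function(par, rt, resp, A = 0.5) {
  mean_v <- par[1:4]; b <- par[5]; t0 <- par[6]
  if (b <= A || t0 <= 0 || t0 >= min(rt)) return(1e10)  # infeasible parameters
  dens <- dLBA(rt, resp, A = A, b = b, t0 = t0,
               mean_v = mean_v, sd_v = 1, silent = TRUE)
  -sum(log(pmax(dens, 1e-10)))
}
# fit <- optim(c(1, 1, 1, 1, 2, 0.2), neg_ll_lba, rt = rt_block, resp = resp_block)
# Several random restarts are advisable: the LBA likelihood is nonconvex.
```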

Results produced by the LBA model applied to our motivating tone-learning data are reported in Figure 11. Owing to the limitations discussed above, the inference we can make with such models is very limited. For instance, only nonsmooth population level estimates are available, individual specific trajectories cannot be assessed, etc. Some of our findings can, however, be confirmed by the LBA method. For example, looking at the drift parameter estimates, one can see that tone T3 is consistently associated with larger drifts. As was also seen in the estimates returned by our method, tones {T2, T4} have similar values for the drift and the boundary parameters. Beyond such general overall findings, however, the LBA model cannot answer scientific questions related to the dynamics of category learning in fine detail.

Figure 11.

Results for tone learning data: Left: Estimated mean slopes $m_{d,s,t}$ for the LBA model. Right: Estimated boundaries $b_{s,t}$ for the LBA model. In the left panel, the $m_{d,s,t}$’s for the high-level tone response category T1 are shown in red; low-rising T2 in blue; low-dipping T3 in green; and high-falling T4 in purple.

Our method, on the other hand, provides a biologically interpretable, statistically principled approach to accommodating fixed effects of input stimuli and decision categories as well as random subject specific heterogeneity, and allows efficient MCMC-based estimation of longitudinally smoothly evolving parameter trajectories, borrowing information across sample subgroups, participants, and adjacent time stamps through many layers of hierarchy. Crucially, building on a novel local cluster inducing mechanism, our method also allows automated assessment of local similarities and differences in the parameter trajectories in very fine detail as the participants get trained and experienced in their decision tasks.

On the scientific side, the detailed insights obtained here point toward interesting and novel hypotheses about learning. For example, we demonstrate that a difference in drift rates, associated with the speed of sensory evidence accumulation, is critical in determining good versus poor learners. Evidence thresholds, on the other hand, remain relatively stable over training blocks as well as across participants. Recent studies have shown that the process of evidence accumulation can be selectively targeted by brain stimulation (Van der Groen et al. 2018). Novel tone learning studies are currently being designed to test if such neurostimulation primarily improves the drift rates but not the evidence thresholds.

On the practical side, the insights obtained above can have important implications for developing advanced training regimens in language learning platforms used by millions of adults. Due to poor understanding of the temporal dynamics of learning, especially in multi-category learning problems, current training regimens are neither time adaptive nor individualized. Similar to personalized medicine, next-generation speech training paradigms seek to optimize and individualize training to reduce vast inter-individual differences in learning success (Birdsong 2004; Wong, Vuong, and Liu 2017). With our ability to assess detailed longitudinal confusion patterns, we can set up efficient training paradigms that can change the dynamics of learning in specific ways. For example, learners may generally benefit from introducing greater variability in pitch height that allows them to shift their focus on pitch direction and hence can reduce disparities in tone confusions like that between T2 and T4; poor learners may additionally benefit from “perceptual fading”—beginning with easy tones like {T1,T3} and making the training more challenging afterward with the introduction of tones like {T2,T4}; etc. As mentioned before, non-invasive and safe brain stimulation approaches like transcranial random noise stimulation and vagus nerve stimulation can be leveraged to selectively improve the process of sensory accumulation that could enhance the performance in poor learners.

6. Discussion

6.1. Summary

In this article, we proposed a novel longitudinal drift-diffusion mixed model for perceptual decision making, allowing the underlying mechanisms to be similar or different at different longitudinal stages. Our research was motivated primarily by auditory neuroscience experiments where scientists are interested in understanding how the decision making mechanisms evolve as the participants get more training in the decision tasks. Our model was built on a novel statistical framework for longitudinal data that exploited local support properties of B-spline bases and (factorial) HMMs to allow automated assessment of local similarities and differences in the underlying parameter trajectories.

Application to our motivating tone categorization experiments provided interesting novel insights into the underlying learning mechanisms. Notably, we discovered that the improvements and the local variations in tone categorization performance can be explained mostly by variations in the underlying drift parameters while the boundaries mostly remain constant. We also discovered local groupings among the underlying parameter curves in various phases of the learning experiments, how they differ between well and poorly performing participants etc. Such inferences were outside the scope of the previously existing literature.

6.2. Methodological Extensions

Methodological extensions and topics of our ongoing research include adapting the proposed models to time constrained learning experiments, developing nested models to capture the dynamics within the blocks, accommodating sleep induced overnight “consolidation” effects, fully developing the inverse-probit model (4) for accuracies introduced in Section 3, etc.

6.3. Broader Scientific Impact

The proposed approach, we believe, takes the existing literature on drift-diffusion decision making models many significant steps forward, enabling neuroscientists to study the longitudinal behavior of biologically interpretable model parameters in much finer detail than what previous methods could achieve.

As reported in Section 5, the findings of our motivating speech learning experiment help formulate interesting novel scientific hypotheses about speech learning. The findings are also practically highly significant in providing exciting opportunities for developing time adaptive and individualized training regimens for language learning.

Efficient estimation of group and individual level trajectories also opens exciting avenues for potential adaptations in clinical settings, especially in conjunction with simultaneously performed imaging studies.

Finally, the scope of the proposed method is not restricted to auditory neuroscience problems; the approach can be readily applied to study decision making mechanisms in other areas of neuroscience as well.

Supplementary Material

Acknowledgments

We thank the editor, Dr. Heping Zhang, for comments leading to a significantly improved version of the initial article. We also thank Dr. Peter Mueller, Dr. Mario Peruggia, Dr. Rachel Reetzke, and Dr. Tobias Teichert for helpful discussions on the research presented here.

Funding

This work was supported by the National Science Foundation grant NSF DMS-1953712 and National Institute on Deafness and Other Communication Disorders grants R01DC013315 and R01DC015504 awarded to Sarkar and Chandrasekaran.

Footnotes

Supplementary Materials

Supplementary materials present substantive additional details. These include brief reviews of fHMMs, B-splines, locally informed Hamming ball samplers, the linear ballistic accumulator model, etc. to make the article relatively self-contained. The supplementary materials also discuss the choice of hyper-parameters for our model, the MCMC algorithm used to sample from the posterior of our model and its convergence diagnostics. The supplementary materials also present simulation studies and a comparison with a reduced model further illustrating the efficacy and the advantages of our proposed method. In separate files, the supplementary materials additionally include the tone categorization dataset described in Section 2 and analyzed in Section 5, audio recordings of the four input Mandarin tones, and R programs implementing the longitudinal drift-diffusion mixed model developed in this article.

Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA.

References

  1. Agresti A (2018), An Introduction to Categorical Data Analysis, New York: Wiley. [Google Scholar]
  2. Birdsong D (2004), “Second Language Acquisition and Ultimate Attainment,” in Handbook of Applied Linguistics, eds. Davies A and Elder C, London: Blackwell, pp. 82–105. [Google Scholar]
  3. Bogacz R, Wagenmakers E-J, Forstmann BU, and Nieuwenhuis S (2010), “The Neural Basis of the Speed-Accuracy Tradeoff,” Trends in Neurosciences, 33, 10–16. [DOI] [PubMed] [Google Scholar]
  4. Borooah VK (2002), Logit and Probit: Ordered and Multinomial Models, Thousand Oaks, CA: SAGE. [Google Scholar]
  5. Brody CD, and Hanks TD (2016), “Neural Underpinnings of the Evidence Accumulator,” Current Opinion in Neurobiology, 37, 149–157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Brown SD, and Heathcote A (2008), “The Simplest Complete Model of Choice Response Time: Linear Ballistic Accumulation,” Cognitive Psychology, 57, 153–178. [DOI] [PubMed] [Google Scholar]
  7. Cappé O, Moulines E, and Rydén T (2005), Inference in Hidden Markov Models, Berlin: Springer-Verlag. [Google Scholar]
  8. Cavanagh JF, Wiecki TV, Cohen MX, Figueroa CM, Samanta J, Sherman SJ, and Frank MJ (2011), “Subthalamic Nucleus Stimulation Reverses Mediofrontal Influence Over Decision Threshold,” Nature Neuroscience, 14, 1462–1467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Chandrasekaran B, Sampath PD, and Wong PC (2010), “Individual Variability in Cue-Weighting and Lexical Tone Learning,” The Journal of the Acoustical Society of America, 128, 456–465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Chandrasekaran B, Yi H-G, and Maddox WT (2014), “Dual-Learning Systems During Speech Category Learning,” Psychonomic Bulletin & Review, 21, 488–495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Chhikara R (1988), The Inverse Gaussian Distribution: Theory, Methodology, and Applications, Boca Raton, FL: CRC Press. [Google Scholar]
  12. Chib S, and Hamilton BH (2002), “Semiparametric Bayes Analysis of Longitudinal Data Treatment Models,” Journal of Econometrics, 110, 67–89. [Google Scholar]
  13. Cox DR, and Miller HD (1965), The Theory of Stochastic Processes, Boca Raton, FL: CRC Press. [Google Scholar]
  14. Craigmile PF, Peruggia M, and Van Zandt T (2010), “Hierarchical Bayes Models for Response Time Data,” Psychometrika, 75, 613–632. [Google Scholar]
  15. Daniels MJ, and Pourahmadi M (2002), “Bayesian Analysis of Covariance Matrices and Dynamic Models for Longitudinal Data,” Biometrika, 89, 553–566. [Google Scholar]
  16. de Boor C (1978), A Practical Guide to Splines, New York: Springer-Verlag. [Google Scholar]
  17. Diggle P, Diggle PJ, Heagerty P, Heagerty PJ, Liang K-Y, and Zeger S (2002), Analysis of Longitudinal Data, Oxford: Oxford University Press. [Google Scholar]
  18. Ding L, and Gold JI (2013), “The Basal Ganglia’s Contributions to Perceptual Decision Making,” Neuron, 79, 640–649.
  19. Dufau S, Grainger J, and Ziegler JC (2012), “How to Say ‘No’ to a Nonword: A Leaky Competing Accumulator Model of Lexical Decision,” Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 1117–1128.
  20. Eilers PH, and Marx BD (1996), “Flexible Smoothing With B-Splines and Penalties,” Statistical Science, 11, 89–102.
  21. Feng G, Yi HG, and Chandrasekaran B (2019), “The Role of the Human Auditory Corticostriatal Network in Speech Learning,” Cerebral Cortex, 29, 4077–4089.
  22. Fitzmaurice G, Davidian M, Verbeke G, and Molenberghs G (2008), Longitudinal Data Analysis, Boca Raton, FL: CRC Press.
  23. Fontanesi L, Gluth S, Spektor MS, and Rieskamp J (2019), “A Reinforcement Learning Diffusion Decision Model for Value-Based Decisions,” Psychonomic Bulletin & Review, 26, 1099–1121.
  24. Frühwirth-Schnatter S (2006), Finite Mixture and Markov Switching Models, New York: Springer.
  25. Gelman A (2006), “Prior Distributions for Variance Parameters in Hierarchical Models,” Bayesian Analysis, 1, 515–534.
  26. Ghahramani Z, and Jordan MI (1997), “Factorial Hidden Markov Models,” Machine Learning, 29, 245–273.
  27. Glimcher PW, and Fehr E (2013), Neuroeconomics: Decision Making and the Brain, London: Academic Press.
  28. Gold JI, and Shadlen MN (2007), “The Neural Basis of Decision Making,” Annual Review of Neuroscience, 30, 535–574.
  29. Guo W (2002), “Functional Mixed Effects Models,” Biometrics, 58, 121–128.
  30. Heekeren HR, Marrett S, Bandettini PA, and Ungerleider LG (2004), “A General Mechanism for Perceptual Decision-Making in the Human Brain,” Nature, 431, 859–862.
  31. Iverson P, Kuhl PK, Akahane-Yamada R, Diesch E, Tohkura Y, Kettermann A, and Siebert C (2003), “A Perceptual Interference Account of Acquisition Difficulties for Non-Native Phonemes,” Cognition, 87, 47–57.
  32. Johnson JS, and Newport EL (1989), “Critical Period Effects in Second Language Learning: The Influence of Maturational State on the Acquisition of English as a Second Language,” Cognitive Psychology, 21, 60–99.
  33. Kim S, Potter K, Craigmile PF, Peruggia M, and Van Zandt T (2017), “A Bayesian Race Model for Recognition Memory,” Journal of the American Statistical Association, 112, 77–91.
  34. Kunkel D, Potter K, Craigmile PF, Peruggia M, and Van Zandt T (2019), “A Bayesian Race Model for Response Times Under Cyclic Stimulus Discriminability,” The Annals of Applied Statistics, 13, 271–296.
  35. Leite FP, and Ratcliff R (2010), “Modeling Reaction Time and Accuracy of Multiple-Alternative Decisions,” Attention, Perception, & Psychophysics, 72, 246–273.
  36. Li Y, Lin X, and Müller P (2010), “Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data,” Biometrics, 66, 70–78.
  37. Lu J (1995), “Degradation Processes and Related Reliability Models,” Ph.D. thesis, McGill University, Montreal, Canada.
  38. Maddox WT, and Chandrasekaran B (2014), “Tests of a Dual-System Model of Speech Category Learning,” Bilingualism: Language and Cognition, 17, 709–728.
  39. MacDonald IL, and Zucchini W (1997), Hidden Markov and Other Models for Discrete-Valued Time Series, London: Chapman & Hall.
  40. Milosavljevic M, Malmaud J, Huth A, Koch C, and Rangel A (2010), “The Drift Diffusion Model Can Account for the Accuracy and Reaction Time of Value-Based Choices Under High and Low Time Pressure,” Judgment and Decision Making, 5, 437–449.
  41. Morris JS (2015), “Functional Regression,” Annual Review of Statistics and Its Application, 2, 321–359.
  42. Müller P, Quintana FA, Rosner GL, and Maitland ML (2013), “Bayesian Inference for Longitudinal Data With Non-Parametric Treatment Effects,” Biostatistics, 15, 341–352.
  43. Navarro DJ, and Fuss IG (2009), “Fast and Accurate Calculations for First-Passage Times in Wiener Diffusion Models,” Journal of Mathematical Psychology, 53, 222–230.
  44. Neal RM (2003), “Slice Sampling,” The Annals of Statistics, 31, 705–767.
  45. Nguyen X, and Gelfand AE (2011), “The Dirichlet Labeling Process for Clustering Functional Data,” Statistica Sinica, 21, 1249–1289.
  46. Nguyen X (2014), “Bayesian Nonparametric Modeling for Functional Analysis of Variance,” Annals of the Institute of Statistical Mathematics, 66, 495–526.
  47. Paulon G, Reetzke R, Chandrasekaran B, and Sarkar A (2019), “Functional Logistic Mixed-Effects Models for Learning Curves From Longitudinal Binary Data,” Journal of Speech, Language, and Hearing Research, 62, 543–553.
  48. Pedersen ML, Frank MJ, and Biele G (2017), “The Drift Diffusion Model as the Choice Rule in Reinforcement Learning,” Psychonomic Bulletin & Review, 24, 1234–1251.
  49. Peters J, and D’Esposito M (2020), “The Drift Diffusion Model as the Choice Rule in Inter-Temporal and Risky Choice: A Case Study in Medial Orbitofrontal Cortex Lesion Patients and Controls,” PLOS Computational Biology, 16, e1007615.
  50. Petrone S, Guindani M, and Gelfand AE (2009), “Hybrid Dirichlet Mixture Models for Functional Data,” Journal of the Royal Statistical Society, Series B, 71, 755–782.
  51. Polson NG, and Scott JG (2012), “On the Half-Cauchy Prior for a Global Scale Parameter,” Bayesian Analysis, 7, 887–902.
  52. Purcell BA (2013), “Neural Mechanisms of Perceptual Decision Making,” Ph.D. thesis, Vanderbilt University, Nashville, TN.
  53. Quintana FA, Johnson WO, Waetjen LE, and Gold EB (2016), “Bayesian Nonparametric Longitudinal Data Analysis,” Journal of the American Statistical Association, 111, 1168–1181.
  54. Rabiner L (1989), “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, 77, 257–286.
  55. Ramsay JO, and Silverman BW (2007), Applied Functional Data Analysis: Methods and Case Studies, New York: Springer.
  56. Ratcliff R (1978), “A Theory of Memory Retrieval,” Psychological Review, 85, 59–108.
  57. Ratcliff R, and McKoon G (2008), “The Diffusion Decision Model: Theory and Data for Two-Choice Decision Tasks,” Neural Computation, 20, 873–922.
  58. Ratcliff R, and Rouder JN (1998), “Modeling Response Times for Two-Choice Decisions,” Psychological Science, 9, 347–356.
  59. Ratcliff R, Smith PL, Brown SD, and McKoon G (2016), “Diffusion Decision Model: Current Issues and History,” Trends in Cognitive Sciences, 20, 260–281.
  60. Reetzke R, Xie Z, Llanos F, and Chandrasekaran B (2018), “Tracing the Trajectory of Sensory Plasticity Across Different Stages of Speech Learning in Adulthood,” Current Biology, 28, 1419–1427.
  61. Ross SM (1996), Stochastic Processes, New York: Wiley.
  62. Schall JD (2001), “Neural Basis of Deciding, Choosing and Acting,” Nature Reviews Neuroscience, 2, 33–42.
  63. Scott SL (2002), “Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century,” Journal of the American Statistical Association, 97, 337–351.
  64. Singer JD, and Willett JB (2003), Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, Oxford: Oxford University Press.
  65. Smith PL, and Ratcliff R (2004), “Psychology and Neurobiology of Simple Decisions,” Trends in Neurosciences, 27, 161–168.
  66. Smith PL, and Vickers D (1988), “The Accumulator Model of Two-Choice Discrimination,” Journal of Mathematical Psychology, 32, 135–168.
  67. Song JH, Skoe E, Wong PC, and Kraus N (2008), “Plasticity in the Adult Human Auditory Brainstem Following Short-Term Linguistic Training,” Journal of Cognitive Neuroscience, 20, 1892–1902.
  68. Teichert T, Grinband J, and Ferrera V (2016), “The Importance of Decision Onset,” Journal of Neurophysiology, 115, 643–661.
  69. Titsias MK, and Yau C (2014), “Hamming Ball Auxiliary Sampling for Factorial Hidden Markov Models,” in Advances in Neural Information Processing Systems, pp. 2960–2968.
  70. Tuerlinckx F (2004), “The Efficient Computation of the Cumulative Distribution and Probability Density Functions in the Diffusion Model,” Behavior Research Methods, Instruments, & Computers, 36, 702–716.
  71. Tuerlinckx F, Maris E, Ratcliff R, and De Boeck P (2001), “A Comparison of Four Methods for Simulating the Diffusion Process,” Behavior Research Methods, Instruments, & Computers, 33, 443–456.
  72. Usher M, and McClelland JL (2001), “The Time Course of Perceptual Choice: The Leaky, Competing Accumulator Model,” Psychological Review, 108, 550–592.
  73. Van der Groen O, Tang MF, Wenderoth N, and Mattingley JB (2018), “Stochastic Resonance Enhances the Rate of Evidence Accumulation During Combined Brain Stimulation and Perceptual Decision-Making,” PLOS Computational Biology, 14, 1–17.
  74. Van Gael J, Saatci Y, Teh YW, and Ghahramani Z (2008), “Beam Sampling for the Infinite Hidden Markov Model,” in Proceedings of the 25th International Conference on Machine Learning, ACM, pp. 1088–1095.
  75. Vandekerckhove J, and Tuerlinckx F (2007), “Fitting the Ratcliff Diffusion Model to Experimental Data,” Psychonomic Bulletin & Review, 14, 1011–1026.
  76. Vandekerckhove J, Tuerlinckx F, and Lee MD (2008), “A Bayesian Approach to Diffusion Process Models of Decision-Making,” in Proceedings of the 30th Annual Conference of the Cognitive Science Society, Washington, DC, pp. 1429–1434.
  77. Wang J-L, Chiou J-M, and Müller H-G (2016), “Functional Data Analysis,” Annual Review of Statistics and Its Application, 3, 257–295.
  78. Wang Y, Spence MM, Jongman A, and Sereno JA (1999), “Training American Listeners to Perceive Mandarin Tones,” The Journal of the Acoustical Society of America, 106, 3649–3658.
  79. Whitmore G, and Seshadri V (1987), “A Heuristic Derivation of the Inverse Gaussian Distribution,” The American Statistician, 41, 280–281.
  80. Wong PC, Vuong LC, and Liu K (2017), “Personalized Learning: From Neurogenetics of Behaviors to Designing Optimal Language Training,” Neuropsychologia, 98, 192–200.
  81. Xie Z, Reetzke R, and Chandrasekaran B (2017), “Stability and Plasticity in Neural Encoding of Linguistically Relevant Pitch Patterns,” Journal of Neurophysiology, 117, 1409–1424.
  82. Zanella G (2020), “Informed Proposals for Local MCMC in Discrete Spaces,” Journal of the American Statistical Association, 115, 852–865.
