Abstract
In this study, we explore parameter estimation for a joint count‐time data model with a two‐factor latent trait structure, representing accuracy and speed. Each count‐time variable pair corresponds to a specific item on a measurement instrument, where each item consists of a fixed number of tasks. The count variable represents the number of successfully completed tasks and is modeled using a Beta‐binomial distribution to account for potential over‐dispersion. The time variable, representing the duration needed to complete the tasks, is modeled using a normal distribution on a logarithmic scale. To characterize the model structure, we derive marginal moments that inform a set of method‐of‐moments (MOM) estimators, which serve as initial values for maximum likelihood estimation (MLE) via the Monte Carlo Expectation‐Maximization (MCEM) algorithm. Standard errors are estimated using both the observed information matrix and bootstrap resampling, with simulation results indicating superior performance of the bootstrap, especially near boundary values of the dispersion parameter. A comprehensive simulation study investigates estimator accuracy and computational efficiency. To demonstrate the methodology, we analyze oral reading fluency (ORF) data, showing substantial variation in item‐level dispersion and providing evidence for the improved model fit of the Beta‐binomial specification, assessed using standardized root mean square residuals (SRMSR).
Keywords: accuracy count data, Beta‐binomial distribution, item response theory, oral reading fluency, overdispersion, psychometric calibration, response time analysis
1. INTRODUCTION
Count data are increasingly common in behavioral, cognitive, and psychometric assessments. For instance, they arise in psychomotor skill testing (Spray, 1990), cognitive processing tasks (Behrens et al., 2024), decision‐making experiments (Cogoni et al., 2024), and oral reading assessments (Jansen, 1997). With the rise of computer‐based testing, count data are often accompanied by other process indicators, such as response times (RTs) and eye‐tracking data, enabling a richer understanding of test‐taker behavior. Recent models have jointly incorporated item responses with RTs (van der Linden, 2007), eye fixations (Man et al., 2022), action counts (Qiao et al., 2023), and revisit behaviors (Bezirhan et al., 2021). RTs, in particular, are useful for investigating underlying cognitive processes (De Boeck & Jeon, 2019), and their inclusion in psychometric models can enhance inferences by jointly accounting for person‐level speed and item‐level time intensity (Van Der Linden, 2009).
Many studies have used the Poisson distribution and its extensions to model unbounded count data (e.g., Bezirhan et al., 2021; Forthmann et al., 2020; Hung, 2012; Man et al., 2022; Qiao et al., 2023; Rasch, 1960; Zhan et al., 2022). The Rasch Poisson Counts Model (RPCM) (Rasch, 1960) is a popular choice due to its simplicity, but its strict equi‐dispersion assumption limits its applicability. Extensions such as the negative binomial (Hung, 2012; Man et al., 2022) and the Conway‐Maxwell‐Poisson distribution (Forthmann et al., 2020; Qiao et al., 2023) address dispersion but remain ill‐suited for settings where the data are naturally bounded.
In bounded count data, where a natural upper limit constrains the outcome, binomial‐based models that directly describe the probability of success across a fixed number of attempts are more appropriate (Masters & Wright, 1984). The Binomial Trials Model proposed by Rasch (1972) is a well‐known example. This is especially relevant for oral reading fluency (ORF) assessments, where the number of correctly read words is capped by passage length. In such cases, binomial models offer a conceptually coherent alternative to unbounded count models. Our work is motivated by a computer‐based ORF assessment system (Nese & Kamata, 2014) that records both the number of words read correctly and the time taken, making it possible to jointly model accuracy and speed. Potgieter et al. (2024) demonstrate how this dual structure can support a model‐based Words Correct Per Minute (WCPM) score.
To account for over‐dispersion in the bounded count data, we adopt a Beta‐binomial model, which generalizes the binomial distribution. Although often viewed as a binomial mixed over a Beta distribution, the Pólya urn interpretation (Johnson et al., 2005, Chapter 6) offers a more dynamic view, characterizing the distribution as the sum of correlated Bernoulli trials with evolving success probabilities. Still, the Beta‐binomial can yield U‐shaped distributions that may not suit all applications (Kim & Lee, 2017). A Normal model is used for log‐scale reading times.
We estimate model parameters using maximum likelihood estimation (MLE) via a Monte Carlo Expectation‐Maximization (MCEM) algorithm. While our implementation uses a custom rejection sampler within MCEM, other approaches are viable. For instance, Kara et al. (2020) use a fully Bayesian model under a binomial framework. Marginal likelihoods can also be evaluated via adaptive quadrature, and scalable alternatives include the Laplace approximation method of Zhang et al. (2024). Two‐step estimation procedures, such as separate logit modeling of the count component (e.g., via the PROreg package in R, Najera‐Zuloaga et al., 2019), offer other simplified alternatives.
The model is flexible enough to accommodate various units of analysis, such as passages or sentences. To emphasize the generality of this model, we use the term “item” to refer to the unit of analysis, and “tasks” to represent the specific components being counted within each item (e.g., words in a passage or sentence). Though motivated by ORF, the model is broadly applicable to any assessment yielding both bounded count and timing data.
This study contributes to the literature in several ways. First, we derive and examine the moment structure of a joint count‐time model with two latent traits, offering insight into the relationships between components. Second, we develop method‐of‐moments (MOM) and MCEM estimation algorithms for this joint model that allow for overdispersion in the count variables and remain valid under block designs and missing‐at‐random (MAR) patterns. Finally, we evaluate standard error estimation using both asymptotic and bootstrap methods, and demonstrate the model's practical utility in an empirical analysis of ORF data.
The remainder of the paper is organized as follows: Section 2 introduces the joint model and its marginal moments. Section 3 details the estimation procedures, including the MCEM algorithm and standard error methods. Section 4 presents simulation results. Section 5 applies the model to real data. Section 6 concludes with a discussion and future directions.
2. JOINT MODEL OF COUNT AND TIME DATA
2.1. Notation and model
Consider a random sample of test‐takers indexed by , each characterized by two latent traits: accuracy and speed . Let denote the vector of latent traits for test‐taker . These traits jointly determine their performance on each item in terms of both success count and completion time.
Each item consists of discrete tasks (e.g., words in a reading passage). Denote the task lengths as . The observed count outcome for person on item is , and the completion time is . For modeling purposes, we define the standardized log‐transformed time variable where is a reference unit (e.g., for log‐time per 10 words for ORF). Let and .
For the count outcomes, a Beta‐binomial distribution is assumed,
Here, the parameter governs overdispersion and is the standard normal distribution function. The parameters and represent item‐level discrimination and difficulty, respectively. Conditional on , the mean and variance are and . The limiting case recovers the binomial distribution.
This reparameterization in terms of success probability and overdispersion, rather than the usual Beta shape parameters, facilitates interpretation and inference. To streamline later computations, we express the Beta‐binomial probability mass function (PMF) using rising factorial notation. The PMF becomes
where is the number of trials, the success probability, and the over‐dispersion parameter. The intra‐class correlation coefficient is given by , quantifying the degree of response clustering within items.
Ultimately, the strength of the Beta‐binomial model lies in its increased flexibility. While it includes the binomial model as a special case, it also accounts for over‐dispersion relative to the binomial model.
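As a minimal sketch of the rising‐factorial form of the PMF, the snippet below evaluates a Beta‐binomial PMF under an assumed parameterization in terms of a success probability `p` and an intra‐class correlation `rho`, mapped to Beta shapes `alpha = p(1 - rho)/rho` and `beta = (1 - p)(1 - rho)/rho`. This mapping and all function names are illustrative assumptions and need not match the dispersion parameter used in the paper. Under this mapping the mean is `n*p` and the variance is `n*p*(1 - p)*(1 + (n - 1)*rho)`, which the code checks numerically.

```python
import math


def rising_factorial(x, k):
    """Rising factorial x^(k) = x (x + 1) ... (x + k - 1), with x^(0) = 1."""
    result = 1.0
    for j in range(k):
        result *= x + j
    return result


def beta_binomial_pmf(c, n, p, rho):
    """P(C = c) for a Beta-binomial count out of n trials.

    Assumed (illustrative) parameterization: success probability p and
    intra-class correlation rho in (0, 1).
    """
    alpha = p * (1.0 - rho) / rho
    beta = (1.0 - p) * (1.0 - rho) / rho
    return (math.comb(n, c)
            * rising_factorial(alpha, c)
            * rising_factorial(beta, n - c)
            / rising_factorial(alpha + beta, n))


n, p, rho = 20, 0.8, 0.1  # illustrative values only
pmf = [beta_binomial_pmf(c, n, p, rho) for c in range(n + 1)]
total = sum(pmf)
mean = sum(c * q for c, q in enumerate(pmf))
var = sum((c - mean) ** 2 * q for c, q in enumerate(pmf))
print(total, mean, var)
```

The rising‐factorial form avoids evaluating Beta or Gamma functions directly; for long items, accumulating the products on the log scale would be a safer variant.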
There are two complementary interpretations of the Beta‐binomial distribution, each reflecting different assumptions about the data‐generating process. In the mixture interpretation, the model assumes that an individual's probability of success is drawn once per item (i.e., group of tasks) from a Beta distribution and then held constant across all trials in that item. This captures item‐level differences across individuals.
In contrast, the Pólya urn interpretation offers a more dynamic perspective. Imagine an urn containing a certain number of “success” and “failure” tokens. On each trial, one token is drawn at random, and then replaced along with an additional token of the same type. This reinforcement mechanism increases the chance of repeating previously observed outcomes: early successes make future successes more likely (and likewise for failures). The resulting success probability evolves over time, modeling local dependencies or learning effects during tasks. From this viewpoint, the Beta‐binomial reflects reinforcement‐driven adaptation in the response process.
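The urn mechanism described above can be simulated directly. The sketch below, with illustrative starting weights and a hypothetical function name, draws counts from a Pólya urn and compares their empirical mean and variance with the moments of a Beta‐binomial distribution whose Beta shapes equal the initial token weights.

```python
import random


def polya_urn_count(n_trials, succ_weight, fail_weight, rng):
    """Number of successes in n_trials draws from a Polya urn.

    After each draw the token is returned together with one extra token of
    the same type, so earlier outcomes reinforce later ones.
    """
    s, f = succ_weight, fail_weight
    count = 0
    for _ in range(n_trials):
        if rng.random() < s / (s + f):
            s += 1.0
            count += 1
        else:
            f += 1.0
    return count


rng = random.Random(2024)
alpha, beta, n = 2.0, 3.0, 10  # illustrative initial weights and item length
draws = [polya_urn_count(n, alpha, beta, rng) for _ in range(20000)]
emp_mean = sum(draws) / len(draws)
emp_var = sum((d - emp_mean) ** 2 for d in draws) / (len(draws) - 1)

# Moments of Beta-binomial(n, alpha, beta) for comparison.
p = alpha / (alpha + beta)
theo_mean = n * p
theo_var = n * p * (1 - p) * (alpha + beta + n) / (alpha + beta + 1)
print(emp_mean, theo_mean, emp_var, theo_var)
```

The empirical and theoretical moments should agree up to Monte Carlo error, reflecting the equivalence of the urn and mixture constructions.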
Although mathematically equivalent, the two interpretations offer distinct conceptual lenses on the response process. The mixture view emphasizes between‐person heterogeneity, while the Pólya urn interpretation captures within‐person adaptation. This dual reading makes the Beta‐binomial particularly well‐suited for modeling variability in grouped count data. The choice between interpretations depends on the assumed nature of the underlying response mechanism.
However, caution is warranted. For some parameter settings, the Beta‐binomial distribution can become U‐shaped, meaning it may assign high probability to both extreme outcomes (e.g., all correct or none correct). This occurs when is large relative to , particularly when for , and symmetrically when . Although such shapes may fit marginal data well, they may yield unrealistic predictions at the individual level. Future work may consider alternative formulations to mitigate this behavior.
Following van der Linden (2007), log‐time is modeled as
where and represent the item‐level time discrimination and intensity, respectively. Higher values of imply a stronger association between latent speed and observed time. Specifically, where .
We assume the latent traits follow a bivariate normal distribution,
(1)
The accuracy trait has unit variance by identifiability. The accuracy‐speed correlation and speed variance are free parameters.
We assume local independence: given the latent traits , the count and time responses are conditionally independent within and across items. That is, , for , and . This does not imply that accuracy and speed are unrelated but rather that their joint behavior is fully explained by the latent variables.
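To make the data‐generating structure concrete, the following sketch simulates count and log‐time pairs for test‐takers under local independence. Several pieces are assumptions made only for illustration: the probit link is written as `Phi(a * theta - b)`, the log‐time is drawn with mean `beta - tau` and standard deviation `1 / alpha` in the spirit of van der Linden (2007), dispersion is expressed through an intra‐class correlation `rho_icc`, and all dictionary keys and function names are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_person(items, sigma_tau, corr, rho_icc, rng):
    """Simulate (count, log-time) pairs for one test-taker.

    items: list of dicts with hypothetical keys n, a, b, alpha, beta.
    """
    # Latent traits: accuracy has unit variance, speed has sd sigma_tau.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    theta = z1
    tau = sigma_tau * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)
    out = []
    for it in items:
        # Assumed form of the probit link (the exact form is an assumption).
        p = std_normal_cdf(it["a"] * theta - it["b"])
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        # Beta-binomial via the mixture view: one success probability per item.
        s = (1.0 - rho_icc) / rho_icc
        p_item = rng.betavariate(p * s, (1.0 - p) * s)
        count = sum(rng.random() < p_item for _ in range(it["n"]))
        # Count and time are drawn independently given (theta, tau).
        log_time = rng.gauss(it["beta"] - tau, 1.0 / it["alpha"])
        out.append((count, log_time))
    return out


rng = random.Random(1)
items = [{"n": 40, "a": 1.0, "b": -1.0, "alpha": 4.0, "beta": 3.0}] * 3
data = [simulate_person(items, sigma_tau=0.3, corr=0.5, rho_icc=0.1, rng=rng)
        for _ in range(2000)]
mean_log_time = sum(lt for person in data for _, lt in person) / (3 * len(data))
print(data[0], mean_log_time)
```

Dependence between counts and times enters only through the correlated pair `(theta, tau)`, which is what local independence requires.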
2.2. Marginal likelihood and computational considerations
Let denote the observed data for test‐taker . The set indexes the items with observed data for test‐taker . Here, and represent the observed counts and standardized log‐times (i.e., log‐time per reference unit), respectively. A randomized block design was used in the motivating study, and items not in are treated as Missing at Random (MAR). This assumption also covers incidental missingness from technical issues (e.g., equipment failure), as the missingness is independent of latent ability.
Let collect all model parameters. Here, for example, represents item discrimination parameters for accuracy, with analogous notation for the remaining item‐level parameters.
Given latent traits and an item set , the joint conditional distribution of random response is
where denotes the previously‐defined Beta‐binomial PMF, and , , with denoting the standard normal probability density function (PDF). Thus, represents the normal PDF with mean and standard deviation .
To obtain the marginal distribution, we integrate over the bivariate latent distribution,
(2)
where denotes the bivariate normal density with mean and covariance matrix (see Equation 1). This integral defines the likelihood contribution for an individual test‐taker.
This integral can be reduced to a single dimension by exploiting the conditional normality of , which has mean and variance . The inner integral over is
This evaluates to the closed‐form expression
with coefficient terms
Subscript emphasizes that these expressions involve multiple model parameters. Substituting into the marginal likelihood, the joint distribution expression becomes
(3)
Reducing likelihood evaluation to one‐dimensional integrals simplifies the structure of the problem, but the integrand remains difficult to approximate. It often features sharp peaks arising from the product of multiple Beta‐binomial PMFs, especially when many items have been completed. These characteristics pose challenges for numerical integration and typically require adaptive quadrature. Implementing maximum likelihood via numerical integration of (3) is further complicated by the fact that the integrand changes with each parameter update and varies across individuals based on the number of completed items and individual responses. As a result, quadrature grids must be repeatedly reinitialized, which undermines the computational benefits typically associated with marginalization and motivates the exploration of alternative estimation strategies.
To address these challenges, we employ a Monte Carlo Expectation‐Maximization (MCEM) algorithm based on the complete‐data log‐likelihood, thereby avoiding direct integration. Implementation details are provided in Section 3.2. Notably, the resulting objective function exhibits separability across parameter blocks, simplifying optimization. Specifically, item‐level parameters related to accuracy, speed, and the global parameters can be updated independently, reducing the computational burden.
Although MCEM is used for estimation, we return to the marginalized form in Equation (3) to evaluate the observed information matrix and approximate standard errors in Section 3.3. This dual approach offers a balance between computational tractability during estimation and precision for inferential purposes.
2.3. Marginal moments of count and time variables
We summarize the unconditional first and second moments of the random variables and for item , respectively. For derivations, refer to Section S.1 in the Appendix S1. Let and . Define as the bivariate normal distribution function with standard marginals and correlation .
The mean of is , while its variance is given by
where denotes the bivariate normal distribution function with standard marginals and correlation . Similarly, for items , define and note the covariance between the two item counts is
Next, for the log‐time variable, we have and For items , the covariance is .
Lastly, the covariance between the count and log‐time variables for item is
These moments offer a framework for understanding the behavior of the observed variables under the assumed model, including their dependence on item‐ and person‐level parameters.
3. PARAMETER ESTIMATION
3.1. Method of moments
The method of moments (MOM) offers a computationally efficient approach to estimation. Using the marginal means, variances, and covariances derived in Section 2.3, MOM estimators are obtained by equating model‐implied moments with their empirical counterparts and solving the resulting system for the parameters of interest. Let denote the collection of model‐implied moments selected for estimation, where equals the number of model parameters. Let denote the corresponding sample moments. The MOM approach solves , provided all parameters are identifiable from the chosen set of moments.
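The moment‐matching idea can be illustrated on a deliberately simplified problem: a single item with i.i.d. Beta‐binomial counts and no latent traits. This is not the joint‐model estimator of Section S.2; it is a toy sketch that assumes the dispersion is expressed as an intra‐class correlation `rho`, so that the variance equals `n*p*(1 - p)*(1 + (n - 1)*rho)`. The function name is hypothetical.

```python
def mom_beta_binomial(counts, n_trials):
    """Moment estimators (p_hat, rho_hat) for i.i.d. Beta-binomial counts."""
    m = sum(counts) / len(counts)
    v = sum((c - m) ** 2 for c in counts) / (len(counts) - 1)
    p_hat = m / n_trials
    binom_var = n_trials * p_hat * (1.0 - p_hat)
    # Solve v = binom_var * (1 + (n - 1) * rho) for rho.
    rho_hat = (v / binom_var - 1.0) / (n_trials - 1)
    # Truncate at 0: samples that look under-dispersed map to the binomial boundary.
    return p_hat, max(rho_hat, 0.0)


counts = [2, 8, 5, 5, 3, 7, 4, 6, 5, 5]  # illustrative data, 10 tasks per item
p_hat, rho_hat = mom_beta_binomial(counts, n_trials=10)
print(p_hat, rho_hat)
```

The truncation step shows why moment estimates can land exactly on a boundary in finite samples, which is one reason they are better suited as starting values than as final estimates.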
There are, of course, many variations in implementation, depending on which moments are used, how parameters are recovered through transformation, and whether certain equations are combined to leverage shared information across items. In this study, MOM estimators primarily serve to initialize the MCEM algorithm described in the next section. Full derivations and computational details are provided in Section S.2 in the Appendix S1.
3.2. MCEM algorithm
To estimate parameters in the proposed joint count‐time model, we adopt a version of the Expectation‐Maximization (EM) algorithm (Dempster et al., 1977), which is suitable when latent variables can be treated as missing data. Specifically, we treat the latent vectors as unobserved and implement a Monte Carlo EM (MCEM) algorithm (Levine & Casella, 2001; Wei & Tanner, 1990) due to the intractability of closed‐form expectations.
The complete‐data log‐likelihood contribution from the th case with observed items is given (up to additive constants) by
Here, and . The total complete‐data log‐likelihood is .
At the th iteration of the MCEM algorithm, let denote the current parameter estimates. We use the shorthand to denote conditional expectation taken with respect to the distribution of latent variables given the observed data, evaluated as if the parameter estimate were the true value. The algorithm then proceeds as follows:
- E‐step: Approximate the conditional expectation
using Monte Carlo integration. For each individual , draw samples from the conditional distribution (see Section S.3 in the Appendix S1 for a rejection sampling algorithm). Then approximate the conditional expectations by
(4)
for any function . Denote the resulting Monte Carlo approximation to the ‐function by .
- M‐step: Maximize the Monte Carlo approximation obtained in the E‐step. The maximizer yields updated estimate .
The algorithm is initialized at , which may be obtained, for example, by the method of moments (MOM) estimators described in Section S.2 in the Appendix S1. The MCEM procedure is then iterated until convergence criteria are met, yielding approximate maximum likelihood estimates for . Following Wei and Tanner (1990), we use smaller Monte Carlo sample sizes in early iterations and increase them later once the algorithm enters a neighborhood of the MLE. We define the MCEM estimator as the average of the final values from iterations with the maximum number of samples, i.e., those with .
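The iteration scheme, including the increasing Monte Carlo sample size and the averaging of the final iterates, can be sketched on a toy model that is much simpler than the joint count‐time model. Here each observation is `y_i = theta_i + e_i` with `theta_i ~ N(mu, 1)` and `e_i ~ N(0, 1)`, so the conditional distribution of `theta_i` is available in closed form and the M‐step is a sample mean. The model, schedule, and names are illustrative assumptions, chosen only because the MLE of `mu` is known to be the sample mean of `y`.

```python
import random


def mcem_normal_mean(y, mu0, schedule, rng):
    """Toy MCEM for y_i = theta_i + e_i, theta_i ~ N(mu, 1), e_i ~ N(0, 1).

    schedule: Monte Carlo sample sizes, one per iteration.
    Returns the average of the iterates that used the largest sample size.
    """
    mu = mu0
    path = []
    for m_k in schedule:
        # E-step: theta_i | y_i, mu ~ N((y_i + mu) / 2, 1/2); draw m_k values per case.
        total = 0.0
        for yi in y:
            post_mean = 0.5 * (yi + mu)
            total += sum(rng.gauss(post_mean, 0.5 ** 0.5) for _ in range(m_k))
        # M-step: the Monte Carlo Q-function is maximized by the mean of the draws.
        mu = total / (len(y) * m_k)
        path.append((m_k, mu))
    m_max = max(schedule)
    final = [mu_k for m_k, mu_k in path if m_k == m_max]
    return sum(final) / len(final)


rng = random.Random(7)
y = [rng.gauss(1.5, 2 ** 0.5) for _ in range(200)]
schedule = [20] * 25 + [200] * 5  # small samples early, larger samples at the end
mu_hat = mcem_normal_mean(y, mu0=0.0, schedule=schedule, rng=rng)
y_bar = sum(y) / len(y)
print(mu_hat, y_bar)
```

Averaging the last iterates damps the Monte Carlo noise that remains once the sequence has settled near the maximizer.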
Since the conditional expectations required in (4) lack closed‐form solutions, Monte Carlo integration provides a practical and efficient alternative. While we rely on rejection sampling (Section S.3 in the Appendix S1), alternatives like Hamiltonian Monte Carlo can also be implemented via platforms such as Stan.
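For intuition about sampling latent traits given observed data, the sketch below implements a generic rejection sampler for one latent trait and one binomial count. It is not the algorithm of Section S.3: it assumes a standard normal prior, a success probability `Phi(theta)`, proposals from the prior, and acceptance with probability equal to the likelihood divided by its maximum (attained at `c / n`). All names are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sample_theta_given_count(c, n, n_samples, rng):
    """Draw theta | c where theta ~ N(0, 1) and c ~ Binomial(n, Phi(theta))."""

    def log_lik(p):
        out = 0.0
        if c > 0:
            out += c * math.log(p)
        if c < n:
            out += (n - c) * math.log(1.0 - p)
        return out

    log_max = log_lik(c / n)  # the binomial likelihood is maximized at p = c / n
    samples = []
    while len(samples) < n_samples:
        theta = rng.gauss(0.0, 1.0)  # proposal from the prior
        p = min(max(std_normal_cdf(theta), 1e-12), 1.0 - 1e-12)
        # Accept with probability L(theta) / L_max, compared on the log scale.
        if math.log(1.0 - rng.random()) < log_lik(p) - log_max:
            samples.append(theta)
    return samples


rng = random.Random(3)
draws = sample_theta_given_count(c=18, n=20, n_samples=2000, rng=rng)
post_mean = sum(draws) / len(draws)
print(post_mean)
```

Proposing from the prior is simple but becomes inefficient when the likelihood is sharply peaked, for example when many items have been completed.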
A key computational advantage of MCEM lies in the structure of and its estimate : it is separable across blocks of parameters. That is, item‐level parameters and can be optimized independently across items, and the global parameters are updated separately. This contrasts sharply with direct maximization of the observed‐data log‐likelihood in Equation (3), which requires adaptive quadrature and joint optimization over all parameters. MCEM thus allows low‐dimensional block updates, offering both clarity and computational efficiency.
One practical concern in the count component of the model is the estimation of the dispersion parameters . The Beta‐binomial likelihood becomes numerically unstable when approaches 1, indicating minimal over‐dispersion relative to the binomial model. To mitigate this, we use the following rule: for each item , let denote the estimated dispersion from the th MCEM iteration. If for some small , set and substitute a binomial model for the Monte Carlo sampling step in the next iteration. The dispersion estimate is then updated by maximizing the Beta‐binomial complete‐data function. In our implementation, was sufficient to avoid numerical instability near the boundary.
3.3. Estimating standard errors
In this subsection, two general approaches for estimating the standard errors of the MCEM estimators are considered. The first approach relies on numerical approximation of the observed information matrix, while the second approach relies on bootstrap resampling.
For the first approach, recall given after marginalizing over one latent dimension in (3). The log‐likelihood contribution for case is . This quantity is computed using adaptive Gaussian quadrature applied to (3), specifically via global adaptive quadrature based on the QAGS and QAGI algorithms described by Piessens et al. (2012) and implemented in the integrate function in R. The observed information matrix is approximated using a quadratic form of the gradient, relying solely on first‐order derivatives. Gradients are computed using Richardson's extrapolation as implemented in the grad function from the numDeriv package in R (Gilbert et al., 2009). By default, we apply two‐sided finite differences to estimate the gradient. However, when the estimated dispersion for a given item is small, specifically when (or when ), we switch to a forward‐difference scheme to avoid boundary concerns. The expression
provides a quadratic‐form approximation to the observed information matrix, based on the outer products of individual score vectors. Although this is not the true Hessian, evaluating second derivatives directly is computationally intensive in high‐dimensional settings, and we found no gain in accuracy from doing so.
Ultimately, an approximation of the observed information matrix can be found as , where denotes the MCEM parameter estimates. The plug‐in estimated covariance matrix of the MCEM estimators is given by . Estimated standard errors are the square roots of the diagonal elements of .
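The outer‐product approximation can be illustrated with a one‐parameter toy example. The sketch below uses a Poisson rate rather than the joint model, and a plain two‐sided finite difference rather than Richardson's extrapolation; both substitutions are simplifications made for illustration, and the function names are hypothetical.

```python
import math


def loglik_case(lam, y):
    """Poisson log-likelihood contribution of one case (constant term dropped)."""
    return y * math.log(lam) - lam


def opg_standard_error(lam_hat, data, h=1e-5):
    """Standard error from the sum of squared per-case scores (one parameter)."""
    info = 0.0
    for y in data:
        # Two-sided finite difference for the score of this case.
        score = (loglik_case(lam_hat + h, y) - loglik_case(lam_hat - h, y)) / (2.0 * h)
        info += score * score
    return 1.0 / math.sqrt(info)


data = [2, 3, 4, 5, 1, 3, 4, 2]  # illustrative counts
lam_hat = sum(data) / len(data)  # Poisson MLE
se = opg_standard_error(lam_hat, data)
print(lam_hat, se)
```

With several parameters, the squared score becomes the outer product of per‐case gradient vectors, and the standard errors are the square roots of the diagonal of the inverse of their sum.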
The observed information matrix is an asymptotic construct, relying on the assumption that all parameters lie in the interior of the parameter space. In finite‐sample settings, however, this assumption may not hold. In particular, dispersion estimates in the Beta‐binomial model may lie near the boundary. To provide a more robust alternative under these conditions, we also consider the bootstrap for estimating standard errors.
To define the bootstrap‐based standard errors, let denote the observed data with . The nonparametric bootstrap draws with replacement from yielding bootstrap samples , . The nonparametric bootstrap automatically accounts for the missingness structure, as the for each case are resampled, resulting in a different overall missingness pattern in the bootstrap sample. While a stratified bootstrap can be used when missingness patterns are limited, this was not feasible in our setting due to the high variability in missingness patterns across individuals.
In the parametric bootstrap, samples are generated from the fitted model using . For each subject , latent variables are drawn from a bivariate normal distribution with zero mean and covariance as in (1) using estimates and . Observed data are then generated conditionally on the missingness pattern : count responses are drawn from , and log‐times are drawn from , . Optionally, missingness patterns may also be resampled.
In either case, let denote the vector of MCEM estimates obtained using bootstrap sample . Then, based on such resampling‐based estimates, the bootstrap covariance matrix is defined as
where denotes the average of the estimates. Standard errors are then taken as the square roots of the diagonal elements of .
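A minimal sketch of the nonparametric bootstrap follows, assuming that whole cases are resampled with replacement and that the estimator is re‐run on each resample. The estimator here is a simple sample mean standing in for the MCEM fit, the divisor `n_boot - 1` is one common convention, and the function names are hypothetical.

```python
import random


def bootstrap_se(cases, estimator, n_boot, rng):
    """Bootstrap standard error of a scalar estimator by resampling cases."""
    n = len(cases)
    estimates = []
    for _ in range(n_boot):
        # Resample whole cases, so each case keeps its own observed item set.
        resample = [cases[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    mean_est = sum(estimates) / n_boot
    var_est = sum((e - mean_est) ** 2 for e in estimates) / (n_boot - 1)
    return var_est ** 0.5


def sample_mean(cases):
    return sum(cases) / len(cases)


rng = random.Random(11)
cases = [rng.gauss(0.0, 1.0) for _ in range(100)]
se_boot = bootstrap_se(cases, sample_mean, n_boot=1000, rng=rng)
print(se_boot)
```

For a vector of estimates, the same loop would store one vector per resample and the scalar variance would be replaced by the sample covariance matrix of those vectors.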
4. SIMULATION STUDIES
We conducted two simulation studies: one comparing MOM and MCEM estimators in terms of root mean square error (RMSE), including the effect of the number of MCEM iterations, and another evaluating the accuracy of standard error estimates using plug‐in and bootstrap methods.
4.1. Data generation
Each simulation used a test with items. Item lengths were drawn from a discrete uniform, for and for , representing medium and long items. Count model parameters were generated as and , with difficulties given by . Time model parameters were set via and . These parameters were generated once and kept fixed for all simulations.
Dispersion was characterized using the intra‐class correlation , a dimensionless measure ranging from 0 (corresponding to the binomial case) to 1. We considered three settings: (i) binomial, with ; (ii) low dispersion, with ; and (iii) high dispersion, with . The corresponding Beta‐binomial dispersion parameter was computed as .
Latent pairs for subjects were sampled from a bivariate normal distribution with mean zero, variances , , and correlation . Each subject was assigned 10 items using a balanced incomplete block design: for , and to ensure full item coverage and overlap. For each subject, a random subset of items was retained, where was drawn with probabilities {0.17, 0.43, 0.03, 0.03, 0.04, 0.05, 0.09, 0.14, 0.01, 0.01}, mimicking realistic missingness patterns. The set of retained items for subject is denoted by , where . For retained items , observed data were generated as
We considered sample sizes , yielding nine simulation conditions in total (three dispersion levels crossed with three sample sizes). Parameter settings and missingness proportions were informed by empirical results from our data application.
4.2. Performance of estimators
A total of 2000 samples were generated for each of the above configurations. For each sample, estimators were computed under both binomial and Beta‐binomial count model assumptions. The binomial model is correct only when , while the Beta‐binomial is valid in all settings, though it may face challenges near the boundary when .
For each count model, we computed three sets of estimators. First, we obtained the MOM estimates, which were also used to initialize the corresponding MCEM algorithm. We then ran two MCEM configurations: MCEM1, with iterations using for and for the final two iterations; and MCEM2, with iterations using for and for . Final MCEM estimates were computed by averaging across the iterations with . This yielded six sets of estimators for the count model parameters. For the time model parameters, the MOM estimates are identical regardless of the count model used. However, MCEM estimates depend on the assumed count model. As a result, five distinct sets of estimators were produced overall: one shared MOM set and four MCEM sets.
Let denote a set of parameter estimates calculated for the th simulated dataset for the items using a particular method, and let denote the true values. We defined and report the root mean square error (RMSE) given by
We report RMSE for the item parameters , , (intra‐class correlation scale), , and , along with and from the latent model. Note that for the log‐time component, we adopt the parameterization to aid interpretation and improve numerical stability. In finite samples, MOM estimation may yield , corresponding to , an interpretable boundary value. Even under MCEM, estimates of are considerably more stable than those of .
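As a rough sketch of how an RMSE of this kind can be computed for one parameter class, the function below pools squared errors over items and simulated datasets before taking the square root. The pooling rule is an assumption made for illustration and may differ from the exact definition used for the tables; the function name is hypothetical.

```python
import math


def rmse(estimates, truth):
    """RMSE pooled over replications and items.

    estimates: list over replications, each a list of per-item estimates.
    truth: list of true per-item values.
    """
    sq = [(est - tr) ** 2 for rep in estimates for est, tr in zip(rep, truth)]
    return math.sqrt(sum(sq) / len(sq))


estimates = [[1.0, 2.0], [1.2, 1.8]]  # two replications, two items (illustrative)
truth = [1.0, 2.0]
value = rmse(estimates, truth)
print(value)
```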
Tables 1, 2, 3 summarize RMSE results for each parameter class. Across all configurations, MCEM2 (210 iterations) offers only marginal improvements over MCEM1 (52 iterations), suggesting that additional iterations provide limited benefit beyond the substantial gains already achieved over MOM estimators. This is especially evident in the count model parameters under the Beta‐binomial, where MOM shows dramatically higher RMSE. That MCEM1 performs comparably to MCEM2 is encouraging from a computational efficiency standpoint and holds across both count and time model parameters.
TABLE 1.
Count data parameter recovery (RMSE) for binomial (bin) and Beta‐binomial (bb) models.
| Estimator | Sample size 1: discrimination | Sample size 1: difficulty | Sample size 1: ICC | Sample size 2: discrimination | Sample size 2: difficulty | Sample size 2: ICC | Sample size 3: discrimination | Sample size 3: difficulty | Sample size 3: ICC |
|---|---|---|---|---|---|---|---|---|---|
| No dispersion | | | | | | | | | |
| MOMbin | 0.094 | 0.101 | 0.000 | 0.066 | 0.072 | 0.000 | 0.051 | 0.055 | 0.000 |
| MCEMbin,1 | 0.059 | 0.057 | 0.000 | 0.039 | 0.040 | 0.000 | 0.029 | 0.031 | 0.000 |
| MCEMbin,2 | 0.058 | 0.056 | 0.000 | 0.038 | 0.039 | 0.000 | 0.029 | 0.031 | 0.000 |
| MOMbb | 1.869 | 2.029 | 0.045 | 1.084 | 1.152 | 0.043 | 0.775 | 0.795 | 0.042 |
| MCEMbb,1 | 0.062 | 0.063 | 0.003 | 0.041 | 0.043 | 0.002 | 0.030 | 0.032 | 0.002 |
| MCEMbb,2 | 0.058 | 0.056 | 0.003 | 0.039 | 0.039 | 0.002 | 0.029 | 0.031 | 0.002 |
| Low dispersion | | | | | | | | | |
| MOMbin | 0.194 | 0.165 | 0.069 | 0.180 | 0.134 | 0.069 | 0.175 | 0.119 | 0.069 |
| MCEMbin,1 | 0.222 | 0.141 | 0.069 | 0.172 | 0.110 | 0.069 | 0.154 | 0.095 | 0.069 |
| MCEMbin,2 | 0.223 | 0.135 | 0.069 | 0.172 | 0.106 | 0.069 | 0.153 | 0.092 | 0.069 |
| MOMbb | 2.304 | 2.523 | 0.068 | 1.315 | 1.431 | 0.064 | 0.871 | 0.924 | 0.061 |
| MCEMbb,1 | 0.126 | 0.105 | 0.026 | 0.078 | 0.071 | 0.018 | 0.058 | 0.054 | 0.014 |
| MCEMbb,2 | 0.122 | 0.103 | 0.026 | 0.075 | 0.070 | 0.018 | 0.056 | 0.053 | 0.014 |
| High dispersion | | | | | | | | | |
| MOMbin | 0.322 | 0.253 | 0.138 | 0.314 | 0.20 | 0.138 | 0.311 | 0.206 | 0.138 |
| MCEMbin,1 | 0.400 | 0.251 | 0.138 | 0.312 | 0.194 | 0.138 | 0.281 | 0.172 | 0.138 |
| MCEMbin,2 | 0.405 | 0.234 | 0.138 | 0.312 | 0.183 | 0.138 | 0.281 | 0.164 | 0.138 |
| MOMbb | 2.084 | 2.266 | 0.106 | 1.230 | 1.306 | 0.099 | 1.101 | 1.156 | 0.093 |
| MCEMbb,1 | 0.175 | 0.140 | 0.050 | 0.104 | 0.091 | 0.036 | 0.075 | 0.068 | 0.029 |
| MCEMbb,2 | 0.169 | 0.137 | 0.049 | 0.100 | 0.089 | 0.036 | 0.073 | 0.067 | 0.029 |
TABLE 2.
Time data model parameter recovery (10 × RMSE).
| Estimator | Sample size 1: time discrimination | Sample size 1: time intensity | Sample size 2: time discrimination | Sample size 2: time intensity | Sample size 3: time discrimination | Sample size 3: time intensity |
|---|---|---|---|---|---|---|
| No dispersion | | | | | | |
| MOM | 0.601 | 0.659 | 0.438 | 0.466 | 0.350 | 0.359 |
| MCEMbin,1 | 0.275 | 0.415 | 0.186 | 0.294 | 0.143 | 0.224 |
| MCEMbin,2 | 0.282 | 0.415 | 0.185 | 0.289 | 0.142 | 0.223 |
| MCEMbb,1 | 0.275 | 0.442 | 0.186 | 0.306 | 0.143 | 0.234 |
| MCEMbb,2 | 0.282 | 0.416 | 0.185 | 0.289 | 0.142 | 0.222 |
| Low dispersion | | | | | | |
| MOM | 0.624 | 0.646 | 0.456 | 0.461 | 0.362 | 0.352 |
| MCEMbin,1 | 0.255 | 0.399 | 0.175 | 0.280 | 0.133 | 0.214 |
| MCEMbin,2 | 0.261 | 0.398 | 0.174 | 0.278 | 0.132 | 0.212 |
| MCEMbb,1 | 0.256 | 0.398 | 0.175 | 0.282 | 0.133 | 0.214 |
| MCEMbb,2 | 0.261 | 0.393 | 0.174 | 0.278 | 0.132 | 0.212 |
| High dispersion | | | | | | |
| MOM | 0.626 | 0.655 | 0.454 | 0.457 | 0.363 | 0.355 |
| MCEMbin,1 | 0.257 | 0.395 | 0.175 | 0.279 | 0.134 | 0.216 |
| MCEMbin,2 | 0.263 | 0.394 | 0.174 | 0.278 | 0.133 | 0.211 |
| MCEMbb,1 | 0.257 | 0.396 | 0.175 | 0.278 | 0.134 | 0.214 |
| MCEMbb,2 | 0.262 | 0.392 | 0.175 | 0.278 | 0.133 | 0.213 |
TABLE 3.
Latent normal model parameter recovery ().
| Estimator | Sample size 1: speed variance | Sample size 1: correlation | Sample size 2: speed variance | Sample size 2: correlation | Sample size 3: speed variance | Sample size 3: correlation |
|---|---|---|---|---|---|---|
| No dispersion | | | | | | |
| MOMbin | 0.155 | 0.495 | 0.108 | 0.335 | 0.081 | 0.268 |
| MCEMbin,1 | 0.096 | 0.398 | 0.066 | 0.274 | 0.052 | 0.216 |
| MCEMbin,2 | 0.096 | 0.402 | 0.066 | 0.270 | 0.052 | 0.216 |
| MOMbb | 0.155 | 0.594 | 0.108 | 0.458 | 0.081 | 0.362 |
| MCEMbb,1 | 0.098 | 0.417 | 0.068 | 0.280 | 0.053 | 0.222 |
| MCEMbb,2 | 0.096 | 0.405 | 0.066 | 0.271 | 0.052 | 0.217 |
| Low dispersion | | | | | | |
| MOMbin | 0.157 | 0.798 | 0.110 | 0.755 | 0.083 | 0.749 |
| MCEMbin,1 | 0.100 | 0.521 | 0.065 | 0.444 | 0.050 | 0.425 |
| MCEMbin,2 | 0.101 | 0.514 | 0.065 | 0.440 | 0.050 | 0.426 |
| MOMbb | 0.157 | 0.658 | 0.110 | 0.486 | 0.083 | 0.390 |
| MCEMbb,1 | 0.101 | 0.442 | 0.065 | 0.293 | 0.051 | 0.227 |
| MCEMbb,2 | 0.100 | 0.441 | 0.065 | 0.290 | 0.050 | 0.223 |
| High dispersion | | | | | | |
| MOMbin | 0.153 | 1.110 | 0.102 | 1.095 | 0.083 | 1.073 |
| MCEMbin,1 | 0.097 | 0.728 | 0.067 | 0.678 | 0.054 | 0.656 |
| MCEMbin,2 | 0.097 | 0.717 | 0.067 | 0.673 | 0.054 | 0.653 |
| MOMbb | 0.153 | 0.701 | 0.102 | 0.550 | 0.083 | 0.416 |
| MCEMbb,1 | 0.099 | 0.459 | 0.067 | 0.328 | 0.054 | 0.249 |
| MCEMbb,2 | 0.098 | 0.458 | 0.067 | 0.326 | 0.054 | 0.248 |
As Table 1 shows, the MOMBetaBin estimators exhibit high RMSE across all settings. While they serve as initial values for the MCEM algorithm, their variability may limit their usefulness. We note that initializing MCEM under the Beta‐binomial model using MOMBin estimates (with dispersion parameters set to 1) is a viable alternative. In test cases not reported here, both initialization strategies led to convergence within the default 52 iterations, suggesting robustness of the MCEM procedure to starting values. Nevertheless, the choice of initialization can influence early convergence behavior and warrants further study in future work.
Table 1 highlights the impact of count model specification. When there is no dispersion (dispersion parameters equal to 1), the binomial model performs slightly better than the Beta‐binomial, though differences are modest. However, under low and high dispersion settings, the Beta‐binomial yields markedly lower RMSE, indicating robust performance even when overdispersion is substantial. Its near‐equivalence to the binomial model in the no dispersion setting further demonstrates efficiency retention in the absence of dispersion.
In Table 2, time model parameters are consistently estimated with low RMSE regardless of count model specification or true dispersion level. This aligns with the assumption of conditional independence between count and time components, which allows accurate recovery of the time model intercepts and standard deviations even when the count model is misspecified.
Table 3 presents results for the latent variance and the latent correlation. Estimation of the latent variance is stable across conditions, but estimation of the latent correlation is highly sensitive to count model specification. When overdispersion is ignored, RMSE for the latent correlation increases substantially, reflecting systematic bias. Even under binomial conditions, the Beta‐binomial's RMSE for the latent correlation is only about 5% larger than the binomial's, but its advantage grows markedly when dispersion is present.
Taken together, these findings support the Beta‐binomial model as a robust default. It performs competitively when the binomial model is correct and substantially better when it is not, offering protection against misspecification with minimal efficiency loss.
To contextualize these gains, we report average run times for the MCEM algorithm and the MOM estimator under both count model specifications. Standard run times were recorded on a personal machine running macOS 15.1.1 with an Apple M3 processor (8 cores), 8 GB RAM, and R version 4.4.1. These results reflect averages over 1,000 simulated replications of the MCEM algorithm, for the sample size settings. In the binomial model, the Method of Moments estimator completed in 0.26 s on average. The MCEM algorithm, which used 52 iterations (50 with a single imputation per case and 2 with 100 imputations per case), required an average of 34.3 s. For reference, iterations using a single imputation took approximately 1.1 s each, while those using 100 imputations required about 17.2 s each. In the beta‐binomial model, the Method of Moments estimator took 0.65 s on average, while MCEM required approximately 19.4 min under the same 52‐iteration structure. In this case, single‐imputation iterations took around 16.9 s, and 100‐imputation iterations approximately 2.7 min each.
4.3. Estimating standard errors
In Section 3.3, we introduced two approaches for estimating the standard errors of the MCEM estimators: a plug‐in method based on an approximation to the observed information matrix, and a bootstrap‐based method. This section compares the resulting standard error estimates to empirical Monte Carlo standard errors, using simulated datasets generated as described in Section 4.1.
For each simulation configuration, 1000 datasets were generated. MCEM was implemented using 52 iterations, and standard errors were estimated using both the plug‐in and bootstrap methods, as described in Section 3.3. The bootstrap methods used resamples. Empirical standard errors were computed from the 2000 MCEM estimates obtained previously in Section 4.
Specifically, for an arbitrary parameter, the Monte Carlo standard error is the standard deviation of its MCEM estimates across the 2000 simulated datasets, that is, the spread of the per‐dataset estimates around their average across all simulations.
For each of the 1000 simulated datasets, standard errors were computed via the plug‐in, nonparametric bootstrap, and parametric bootstrap methods. To facilitate comparison across parameters on different scales, we define, for each of these methods and each dataset, the ratio of the estimated standard error to the Monte Carlo standard error. A ratio near 1 indicates close agreement with the empirical standard error, while values above or below 1 indicate systematic over‐ or underestimation.
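The standard error ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the toy numbers are hypothetical, and the use of the sample standard deviation (divisor of one less than the number of datasets) for the Monte Carlo standard error is an assumption.

```python
import statistics


def monte_carlo_se(estimates):
    """Sample standard deviation of point estimates across simulated datasets.

    Assumption: the n - 1 divisor is used; the paper's exact divisor is not shown here.
    """
    return statistics.stdev(estimates)


def se_ratios(estimated_ses, estimates):
    """Ratio of each dataset's estimated SE to the Monte Carlo SE."""
    mc_se = monte_carlo_se(estimates)
    return [se / mc_se for se in estimated_ses]


# Toy illustration with made-up numbers (not results from the paper).
point_estimates = [0.9, 1.1, 1.0, 1.2, 0.8]    # one estimate per simulated dataset
plug_in_ses = [0.20, 0.22, 0.19, 0.21, 0.20]   # one estimated SE per dataset
ratios = se_ratios(plug_in_ses, point_estimates)
print(round(monte_carlo_se(point_estimates), 4), [round(r, 3) for r in ratios])
```

In this toy case every ratio exceeds 1, which is how systematic overestimation of the standard error would show up in the boxplots.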
Figures 1 and 2 display the standard error ratios under the no dispersion scenario, where the binomial model correctly specifies the data‐generating process. Estimates were obtained under both the binomial and Beta‐binomial models. The latter is of particular interest, as the dispersion parameters in this setting lie on the boundary of the parameter space. As such, the plug‐in method is not expected to perform well. In fact, SE ratios for the dispersion parameters are excluded from the comparative boxplots, as their extreme values under the plug‐in approach distort the overall scale of the plots.
FIGURE 1.

Binomial model: Standard error ratio boxplots.
FIGURE 2.

Beta‐binomial model: Standard error ratio boxplots (excluding dispersion).
Inspection of Figures 1 and 2 reveals that the plug‐in approach consistently overestimates standard errors and exhibits greater variability than the bootstrap methods, regardless of the model used. Although this upward bias diminishes at , the plug‐in estimates remain more variable. In contrast, both bootstrap approaches perform well across models and sample sizes, with boxplots tightly centered around 1. The nonparametric bootstrap shows slightly less spread than the parametric bootstrap.
Figure 3 shows a scatterplot comparison of estimated versus Monte Carlo standard errors across five key parameter groups under the Beta‐binomial model with n = 2000. For clarity, only the plug‐in and nonparametric bootstrap estimates are shown; the parametric bootstrap results were nearly indistinguishable from the nonparametric bootstrap and are omitted for visual simplicity. A 45‐degree reference line is included in each panel to aid comparison.
FIGURE 3.

Beta‐binomial model: Standard error scatter for n = 2000.
While the earlier boxplots excluded the dispersion parameters (“icc”) due to their differing scale under the plug‐in method, this figure includes them to provide a more complete picture of how each method performs across parameter types. The nonparametric bootstrap closely matches the Monte Carlo benchmark across all groups, slightly overestimating standard errors for dispersion. In contrast, the plug‐in approach underestimates standard errors for some parameter groups, overestimates them for others, and performs especially poorly for dispersion, consistent with its known limitations near boundary values. These distortions likely arise from joint inversion of the observed information matrix, where instability in one block (e.g., dispersion parameters near the boundary) can adversely affect variance estimates for other, otherwise stable, components.
For the remaining simulated dispersion scenarios, only the Beta‐binomial model is appropriate (as the binomial model fails to capture the true data‐generating process, leading to biased parameter estimates). Although additional plots are not shown, results consistently demonstrate that the plug‐in approach underperforms relative to both bootstrap methods across parameter types. We therefore recommend using the Beta‐binomial model for parameter estimation, and advise reserving the binomial model for cases where its dispersion assumption is clearly justified. For standard error estimation, we advocate bootstrap methods, with a slight preference for the nonparametric bootstrap over the parametric alternative.
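To make the nonparametric bootstrap recommendation concrete, the following Python sketch resamples units with replacement, re‐applies an estimator, and reports a standard error and a percentile interval. It is a generic illustration under stated assumptions: the function name, the default of 500 resamples, and the toy data are hypothetical, and the sample mean stands in for the MCEM estimator. In the setting of this paper the resampled unit would presumably be the student, so that each student's count‐time pairs stay together.

```python
import math
import random
import statistics


def bootstrap_se_and_ci(data, estimator, n_resamples=500, level=0.95, seed=0):
    """Nonparametric bootstrap standard error and percentile interval.

    data: list of units (e.g., one record per person); units are resampled whole.
    estimator: function mapping a list of units to a single number.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    replicates.sort()
    se = statistics.stdev(replicates)
    alpha = 1.0 - level
    # Simple order-statistic percentiles of the bootstrap replicates.
    lower = replicates[int((alpha / 2) * (n_resamples - 1))]
    upper = replicates[math.ceil((1 - alpha / 2) * (n_resamples - 1))]
    return se, (lower, upper)


# Toy data (hypothetical): 200 draws from a normal distribution.
data_rng = random.Random(1)
toy_data = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
se, (lower, upper) = bootstrap_se_and_ci(toy_data, statistics.mean)
print(round(se, 3), round(lower, 3), round(upper, 3))
```

Because each resample is processed independently, the loop over resamples is the natural place to distribute work across cores or nodes.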
Note that while bootstrap methods offer clear advantages here, they are often perceived as computationally intensive. For the simulation study, the bootstrap runs were executed on a high‐performance computing (HPC) cluster featuring AMD EPYC 7763 processors (2.45 GHz, 2 CPUs per node, 128 cores per node) with 512 GB memory per node. The inherently parallel nature of the bootstrap made it well suited for this environment: resampling and estimation tasks were distributed across nodes, substantially mitigating the additional computational burden. Although some initial effort was required for scheduling and code adaptation, overall run times remained comparable to serial implementations, demonstrating that the bootstrap approach can be both statistically and computationally viable at scale.
5. EMPIRICAL ILLUSTRATION
In this section, we illustrate the joint count‐time model framework using oral reading fluency (ORF) data from the Computerized Oral Reading Evaluation (CORE) project by Nese and Kamata (2014), conducted in public school districts in the Pacific Northwest. The primary aim of the CORE project is to improve the reliability and validity of ORF assessment through careful calibration and equating of reading passages. This effort seeks to produce a psychometrically sound instrument for evaluating student reading skills. Here, we focus on demonstrating accurate parameter estimation through comprehensive calibration using the MCEM procedure.
The full CORE dataset includes 150 passages, but our analysis is restricted to a subset of 50 passages read by 1382 students. Each student was assigned a subset of passages to read, with accuracy (bounded by passage length) and reading time (in seconds) recorded for each. Missing data arose from the study's planned randomized block design, which was intended to reduce student burden, as well as from technical issues such as recorder errors or audio file corruption. Because missingness is not systematically related to student ability, it is assumed to be missing at random (MAR). Of the 69,100 possible student‐passage combinations, only 5048 responses were observed, indicating 92.7% missingness, and no student has complete data.
In terms of student participation, 232 students read only one passage, 595 read two, and 467 read five or more; the average was 3.6 passages per student. Passage lengths ranged from 46 to 103 words, with an average of 64 words. The least‐read passage had 41 readers, and the most‐read had 136, with passages read by about 100 students on average.
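The sample description above can be cross‐checked with simple arithmetic. The short Python sketch below uses only figures stated in the text (50 passages, 69,100 possible student‐passage combinations, 5048 observed responses); the variable names are ours, and the implied number of students is derived from those figures rather than quoted.

```python
n_passages = 50        # passages used in this study
n_possible = 69_100    # possible student-passage combinations
n_observed = 5_048     # observed responses

# Number of students implied by the full crossing of students and passages.
n_students = n_possible // n_passages
missing_rate = 1 - n_observed / n_possible
mean_passages_per_student = n_observed / n_students
mean_readers_per_passage = n_observed / n_passages

print(n_students)                           # 1382
print(round(100 * missing_rate, 1))         # 92.7
print(round(mean_passages_per_student, 2))  # 3.65
print(round(mean_readers_per_passage, 1))   # 101.0
```

The derived values agree with the reported 92.7% missingness, an average of roughly 3.6 passages per student, and about 100 readers per passage.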
The data consist, for each student, of the counts and reading times for the set of passages read by that student. For every observed student‐passage pair, we define a transformed time variable that represents the log‐scale reading time per 10 words for that student on that passage. This transformation expresses reading time data on a consistent scale and facilitates interpretation of the speed model difficulty parameters across passages of varying length, following Kara et al. (2020).
To visualize the data, we use Words Correct Per Minute (WCPM) scores, defined as the number of words read correctly divided by the reading time in minutes. Lower WCPM scores indicate fewer words read correctly per minute, suggesting a more difficult passage, while higher scores imply easier passages. Passage‐specific boxplots in Figure 4 show clear variation in both the location and spread of WCPM scores across passages. Average passage WCPM scores range from 71 to 109.2, with a maximum standard error of 5.54, underscoring meaningful differences in passage difficulty. These differences motivate our modeling approach, which allows each passage to have its own parameter values to account for such variation.
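The two derived quantities used here, the WCPM score and the log‐scale reading time per 10 words, can be sketched as follows. This is an illustration only: the function names and the example reading are hypothetical, and the exact form of the time transformation (reading time in seconds divided by passage length in units of 10 words, then logged) is our reading of the verbal description.

```python
import math


def wcpm(words_correct, seconds):
    """Words correct per minute from a count and a reading time in seconds."""
    return 60.0 * words_correct / seconds


def log_time_per_10_words(seconds, passage_length):
    """Log of reading time (seconds) per 10 words of passage length (assumed form)."""
    return math.log(seconds / (passage_length / 10.0))


# Hypothetical reading: 58 of 64 words read correctly in 40 seconds.
print(wcpm(58, 40))                             # 87.0
print(round(log_time_per_10_words(40, 64), 4))  # 1.8326
```

Dividing by passage length before taking logs puts short and long passages on a common scale, which is what makes the speed model intercepts comparable across passages.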
FIGURE 4.

Boxplots of WCPM scores for the passages.
To obtain maximum likelihood estimates via the MCEM algorithm, we initialized the procedure using the MOM estimates. A total of 52 MCEM iterations were performed: the first 50 used a single imputation per case, followed by two iterations with 100 imputations per case. To monitor convergence, we tracked the squared changes in MCEM parameter estimates over the first 50 iterations (see Section S.4 in Appendix S1 for details and diagnostic plots). While additional single‐imputation iterations may be warranted if convergence is unclear, the results here indicate reasonable convergence. Final parameter estimates were computed by averaging the results from the last two iterations. For standard error estimation and confidence interval construction, we implemented nonparametric bootstrap resampling with resamples for both the binomial and Beta‐binomial count models.
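The convergence monitoring and final averaging steps described above can be sketched as simple post‐processing of the iteration history. This is a hypothetical helper, not the authors' implementation: the function names are ours, the history is a made‐up list of parameter vectors, and summing the squared changes over all parameters is an assumption about how the diagnostic is aggregated.

```python
def squared_changes(history):
    """Sum of squared parameter changes between consecutive iterations."""
    return [
        sum((curr_val - prev_val) ** 2 for prev_val, curr_val in zip(prev, curr))
        for prev, curr in zip(history, history[1:])
    ]


def final_estimate(history, n_last=2):
    """Average the parameter vectors from the last n_last iterations."""
    tail = history[-n_last:]
    return [sum(values) / len(tail) for values in zip(*tail)]


# Made-up iteration history for a two-parameter model.
history = [[0.0, 1.0], [0.5, 1.5], [0.75, 1.75], [0.8, 1.8]]
print(squared_changes(history))   # shrinking changes suggest stabilization
print(final_estimate(history))    # average of the last two iterations
```

With Monte Carlo noise in each iteration, the squared changes will not shrink to exactly zero, so a plot of this sequence is read for a plateau rather than for a strict threshold.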
FIGURE 5.

Estimated count model discrimination () with confidence intervals.
FIGURE 6.

Estimated count model intercepts () with confidence intervals.
FIGURE 7.

Estimated count model dispersion () with confidence intervals.
FIGURE 8.

Estimated speed model intercepts () with confidence intervals.
FIGURE 9.

Estimated speed model standard deviation () with confidence intervals.
Figures 5 through 9 display parameter estimates from the Beta‐binomial count model, along with 95% bootstrap percentile confidence intervals based on the nonparametric bootstrap implementation described above. The intervals indicate that many passages differ meaningfully from one another in their parameter values, as evidenced by the lack of overlap across intervals. Additionally, the MCEM estimate of the latent variance was with a 95% confidence interval of , and the estimate of the latent correlation was with a 95% confidence interval of .
Consider specifically Figure 7, which presents the estimated dispersion parameters on the scale . The 95% bootstrap percentile confidence intervals naturally adhere to this constraint. For two passages, the point estimate was , and for eight additional passages, the confidence intervals included 1, suggesting that binomial‐like equi‐dispersion cannot be ruled out. However, for the remaining 40 passages, the intervals excluded 1, providing strong evidence of over‐dispersion in 80% of the passages. This highlights the advantage of the Beta‐binomial model in capturing variability beyond what the binomial model allows.
To compare parameter estimates across count model formulations, Figure 10 presents scatter plots with reference lines. In the top‐left panel, the Beta‐binomial model (which accounts for over‐dispersion) yields lower discrimination estimates () than the binomial model. Difficulty estimates () also show notable discrepancies between models. In contrast, the speed model components ( and in the bottom panels) appear largely unaffected by the choice of count model. The small differences observed in these parameters are likely due to Monte Carlo variability in the MCEM algorithm.
FIGURE 10.

Comparing Beta‐binomial and binomial parameter estimates.
The empirical demonstration and parameter comparisons are intended first and foremost to illustrate the MCEM implementation, not to advocate for a particular model as the definitive fit for the data. Model selection in latent variable frameworks with complex data structures remains an open area of research. Standard information criteria such as AIC and BIC are not appropriate here, as these rely on regularity conditions that assume all parameters lie in the interior of the parameter space, a condition often violated when dispersion parameters approach the boundary.
That being said, in response to reviewer feedback we have incorporated a goodness‐of‐fit (GOF) diagnostic to better assess model performance. Specifically, we report the standardized root mean square residual (SRMSR), a commonly used GOF measure defined as the root mean square of the differences between the observed correlations and the model‐implied correlations. Model‐implied correlations were obtained by simulating a large synthetic dataset from the estimated model and computing the empirical correlation matrix of the simulated data.
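As a rough illustration of the SRMSR computation, the sketch below takes the root mean square of the differences between observed and model‐implied correlations over the distinct off‐diagonal pairs. The function name and the small correlation matrices are hypothetical, and averaging over the p(p − 1)/2 off‐diagonal pairs is one common convention assumed here; the normalization used in the paper may differ.

```python
import math


def srmsr(observed, implied):
    """Root mean square residual between two correlation matrices.

    Assumption: only the p * (p - 1) / 2 distinct off-diagonal entries are used.
    """
    p = len(observed)
    total = 0.0
    for i in range(p):
        for j in range(i):
            total += (observed[i][j] - implied[i][j]) ** 2
    return math.sqrt(total / (p * (p - 1) / 2))


# Made-up 3 x 3 correlation matrices (not values from the paper).
observed = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
implied = [[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]]
print(round(srmsr(observed, implied), 4))  # 0.1291
```

Smaller values indicate that the fitted model reproduces the observed correlation structure more closely, which is the sense in which the two count models are compared.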
For the count‐only responses, the Beta‐binomial model yields an SRMSR of 0.164, compared to 0.309 for the binomial model. When evaluating the full dataset (50 count‐time variable pairs), the Beta‐binomial again outperforms with an SRMSR of 0.162 versus 0.209. These results underscore the improved fit of the more flexible Beta‐binomial formulation. While these SRMSR values may appear large relative to thresholds cited in the structural equation modeling (SEM) literature, such benchmarks are based on continuous, normally distributed data. Given the highly discrete and skewed nature of our count responses, we advise interpreting SRMSR values in a relative sense, emphasizing model comparisons rather than absolute cutoffs.
6. DISCUSSION
This study introduces method of moments (MOM) and Monte Carlo Expectation‐Maximization (MCEM) calibration methods for jointly analyzing item‐level bounded count data subject to dispersion and response time data in educational assessments. Motivated by computer‐based ORF data, we present a two‐factor latent trait model for reading accuracy and speed. These methods accommodate missing‐at‐random (MAR) data and are suitable for incomplete designs, such as block assignments where examinees complete only a subset of items. A key innovation is the use of a Beta‐binomial count model, which generalizes the standard binomial model by incorporating an item‐level dispersion parameter. This formulation accounts for increased variability in counts beyond what the binomial model allows and reduces to the binomial when the dispersion parameter equals 1.
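To illustrate how a Beta‐binomial count model allows more variability than a binomial with the same mean, the sketch below computes the exact mean and variance of a Beta‐binomial distribution from its probability mass function and compares the variance with the binomial variance. The shape parameters `a` and `b`, the passage length of 64 words, and the success probability of 0.9 are hypothetical, and the intraclass‐correlation form of the variance inflation is the standard textbook parameterization, which need not match the dispersion scale used in this paper.

```python
import math


def betabinom_pmf(k, n, a, b):
    """Beta-binomial probability mass function, computed on the log scale."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    )
    return math.exp(log_pmf)


def mean_and_variance(n, a, b):
    """Exact mean and variance obtained by summing over the support 0..n."""
    pmf = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    var = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, var


n, p = 64, 0.9       # hypothetical passage length and success probability
a, b = 18.0, 2.0     # hypothetical Beta shape parameters with a / (a + b) = 0.9
mean, var = mean_and_variance(n, a, b)
binomial_var = n * p * (1 - p)
icc = 1.0 / (a + b + 1.0)          # intraclass correlation of the n tasks
inflation = 1.0 + (n - 1) * icc    # variance inflation relative to the binomial
print(round(mean, 2), round(var, 2), round(binomial_var, 2), round(inflation, 2))
```

In this example the mean matches the binomial mean of 57.6, while the variance is four times the binomial variance of 5.76, which is the kind of excess variability a binomial count model cannot represent.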
Simulation studies demonstrated that the proposed estimation methods yield accurate and efficient parameter recovery. When overdispersion is present, the Beta‐binomial model substantially outperforms the binomial model. These findings were supported by an empirical demonstration using ORF data, where the joint model effectively captured variation in item characteristics and identified overdispersion in most passages, thus supporting the use of the Beta‐binomial model. In terms of standard error estimation, bootstrap methods showed superior performance to plug‐in estimates based on the observed information matrix.
We also reiterate the value of jointly modeling both count and time outcomes. For example, the work of Potgieter et al. (2024) introduces a model‐based Words Correct Per Minute (WCPM) score under a binomial framework. That model shows that the distribution of WCPM depends not only on the latent accuracy and speed traits ( and ), but also on their correlation. The model developed here enables a model‐based WCPM score that adjusts for the passage‐specific characteristics of the reference instrument (comprising 150 passages, 50 of which were used in this study). This joint modeling approach provides a principled way to account for variation in passage difficulty and dispersion across students.
Another important extension involves relaxing the conditional independence assumption between count and time outcomes within the same item. As noted by Zhang et al. (2024), response accuracy and response times may be conditionally dependent (e.g., reading errors may be associated with slower response times). To address this, a model with an item‐specific residual factor influencing both outcomes can be considered. Specifically, the residual factor for an item, taken to be independent of both latent traits, would enter the count model and the time model for that item through item‐specific coefficients that quantify its effect on each outcome. This shared latent term captures a joint effect on accuracy and response time and thereby allows residual dependence within items. Implementing this model would require substantial changes to the estimation framework, as the latent dimensionality increases beyond 2, with one additional residual factor per item. Even so, we see this as a valuable direction for future research, particularly for applications where the conditional independence assumption may be too restrictive.
AUTHOR CONTRIBUTIONS
Cornelis J. Potgieter: funding acquisition; writing – original draft; methodology; conceptualization; visualization; validation; software; writing – review and editing. Akihito Kamata: funding acquisition; conceptualization; methodology; validation; project administration; data curation; investigation. Yusuf Kara: conceptualization; investigation; methodology; validation; writing – original draft; writing – review and editing; visualization. Xin Qiao: conceptualization; investigation; methodology; visualization; formal analysis; validation; writing – review and editing.
CONFLICT OF INTEREST STATEMENT
The authors declare that there is no conflict of interest regarding the publication of this paper.
Supporting information
Appendix S1.
ACKNOWLEDGEMENTS
The work of authors CJP, AK, and YK was supported by grant R305D200038 from the U.S. Department of Education, Institute of Education Sciences. Author CJP is grateful for the support and hospitality of the Sydney Mathematical Research Institute (SMRI). The authors thank the anonymous reviewers for their thoughtful feedback, which contributed significantly to the improvement and strengthening of this research.
Potgieter, C. J., Kamata, A., Kara, Y., & Qiao, X. (2026). Joint analysis of dispersed count‐time data using a bivariate latent factor model. British Journal of Mathematical and Statistical Psychology, 79, 207–228. 10.1111/bmsp.70005
DATA AVAILABILITY STATEMENT
A subset of the ORF data is available at https://github.com/kamataak/orfr in the file passage2.rda. The code for implementing the Method of Moments (MOM) and Monte Carlo Expectation Maximization (MCEM) for both binomial and beta‐binomial models, along with a simulated dataset and a worked example, is available at https://github.com/cpotgieter/latent_betabinomial.
REFERENCES
- Behrens, T., Kühn, A., & Jäkel, F. (2024). Connecting process models to response times through Bayesian hierarchical regression analysis. Behavior Research Methods, 56, 6951–6966.
- Bezirhan, U., von Davier, M., & Grabovsky, I. (2021). Modeling item revisit behavior: The hierarchical speed–accuracy–revisits model. Educational and Psychological Measurement, 81(2), 363–387.
- Cogoni, C., Fiuza, A., Hassanein, L., Antunes, M., & Prata, D. (2024). Computer anthropomorphisation in a socio‐economic dilemma. Behavior Research Methods, 56(2), 667–679.
- De Boeck, P., & Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102.
- Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B: Methodological, 39(1), 1–22.
- Forthmann, B., Gühne, D., & Doebler, P. (2020). Revisiting dispersion in count data item response theory models: The Conway–Maxwell–Poisson counts model. British Journal of Mathematical and Statistical Psychology, 73, 32–50.
- Gilbert, P., Varadhan, R., & Gilbert, M. P. (2009). Package ‘numderiv’. Differential Equations, 3, 203–267.
- Hung, L.‐F. (2012). A negative binomial regression model for accuracy tests. Applied Psychological Measurement, 36(2), 88–103.
- Jansen, M. G. (1997). Rasch's model for reading speed with manifest explanatory variables. Psychometrika, 62(3), 393–409.
- Johnson, N. L., Kemp, A. W., & Kotz, S. (2005). Univariate discrete distributions (Vol. 444). John Wiley & Sons.
- Kara, Y., Kamata, A., Potgieter, C., & Nese, J. F. (2020). Estimating model‐based oral reading fluency: A Bayesian approach. Educational and Psychological Measurement, 80(5), 847–869.
- Kim, J., & Lee, J.‐H. (2017). The validation of a beta‐binomial model for overdispersed binomial data. Communications in Statistics: Simulation and Computation, 46(2), 807–814.
- Levine, R. A., & Casella, G. (2001). Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics, 10(3), 422–439.
- Man, K., Harring, J. R., & Zhan, P. (2022). Bridging models of biometric and psychometric assessment: A three‐way joint modeling approach of item responses, response times, and gaze fixation counts. Applied Psychological Measurement, 46(5), 361–381.
- Masters, G. N., & Wright, B. D. (1984). The essential process in a family of measurement models. Psychometrika, 49(4), 529–544.
- Najera‐Zuloaga, J., Lee, D.‐J., & Arostegui, I. (2019). A beta‐binomial mixed‐effects model approach for analysing longitudinal discrete and bounded outcomes. Biometrical Journal, 61(3), 600–615.
- Nese, J. F. T., & Kamata, A. (2014). Measuring oral reading fluency: Computerized oral reading evaluation (core) (Grant No. R305A140203, Funded by the National Center for Education Research (NCER), University of Oregon, Award Period: 8/1/2014‐7/31/2018).
- Piessens, R., de Doncker‐Kapenga, E., Überhuber, C. W., & Kahaner, D. K. (2012). Quadpack: A subroutine package for automatic integration (Vol. 1). Springer Science & Business Media.
- Potgieter, C., Qiao, X., Kamata, A., & Kara, Y. (2024). Likelihood‐based estimation of model‐derived oral reading fluency. Journal of Educational Measurement, 61, 542–559.
- Qiao, X., Jiao, H., & He, Q. (2023). Multiple‐group joint modeling of item responses, response times, and action counts with the Conway‐Maxwell‐Poisson distribution. Journal of Educational Measurement, 60(2), 255–281.
- Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Denmarks Paedagogiske Institut.
- Rasch, G. (1972). Objektivitet i samfundsvidenskaberne et metodeproblem [Objectivity in social sciences: A methodological problem]. Nationaløkonomisk Tidsskrift, 110, 161–196.
- Spray, J. A. (1990). One‐parameter item response theory models for psychomotor tests involving repeated, independent attempts. Research Quarterly for Exercise and Sport, 61(2), 162–168.
- van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287.
- Van Der Linden, W. J. (2009). Conceptual issues in response‐time modeling. Journal of Educational Measurement, 46(3), 247–272.
- Wei, G. C., & Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85(411), 699–704.
- Zhan, P., Man, K., Wind, S. A., & Malone, J. (2022). Cognitive diagnosis modeling incorporating response times and fixation counts: Providing comprehensive feedback and accurate diagnosis. Journal of Educational and Behavioral Statistics, 47(6), 736–776.
- Zhang, M., Andersson, B., & Jin, S. (2024). Fast estimation of generalized linear latent variable models for performance and process data with ordinal, continuous, and count observed variables. British Journal of Mathematical and Statistical Psychology, 77(3), 477–507.