Asymptotic Posterior Normality of Multivariate Latent Traits in an IRT Model

Mia J K Kornely; Maria Kateri

doi:10.1007/s11336-021-09838-2

Psychometrika. 2022 Feb 11;87(3):1146–1172. doi: 10.1007/s11336-021-09838-2


PMCID: PMC9433366 PMID: 35149979

Abstract

The asymptotic posterior normality (APN) of the latent variable vector in an item response theory (IRT) model is a crucial argument in IRT modeling approaches. In the case of a single latent trait and under general assumptions, Chang and Stout (Psychometrika, 58(1):37–52, 1993) proved the APN for a broad class of latent trait models for binary items. Under the same setup, they also showed the consistency of the latent trait’s maximum likelihood estimator (MLE). Since then, several modeling approaches have been developed that consider multivariate latent traits and assume their APN, a conjecture that has not been proved so far. We fill this theoretical gap by extending the results of Chang and Stout to multivariate latent traits. Further, we discuss the existence and consistency of MLEs, maximum a-posteriori and expected a-posteriori estimators for the latent traits under the same broad class of latent trait models.

Supplementary Information

The online version contains supplementary material available at 10.1007/s11336-021-09838-2.

Keywords: multidimensional item response theory, empirical Bayes, posterior distribution, ability estimation, consistency, normal approximation, Bernstein–von Mises theorem

Introduction

In the context of item response theory (IRT) methodology, statistical inference for the examinee’s ability often relies on the assumption that its posterior distribution given the test response is a normal distribution. As this is usually hard to justify and contradicts common models for the distribution of the examinees’ abilities in the population, the posterior can instead be assumed to be, for a long test, well approximated by a normal distribution. This assumption of asymptotic posterior normality (APN) is part of the famous Dutch identity conjecture of Holland (1990), who mentioned at the time that he was not aware of a thorough discussion of APN of latent variables and that this would be an interesting area for future research. Shortly after, Chang and Stout (1993) proved the APN for univariate latent traits (LTs), mentioning that APN for multivariate LTs can be proved, but without providing further details or discussing the associated regularity conditions required.

As far as we know, APN of multivariate LTs has not been proved so far for IRT models of a general context, although posterior normality or APN is assumed quite often under various IRT setups (e.g., Anderson & Vermunt, 2000; Anderson & Yu, 2017; Anderson et al., 2007; Hessen, 2012; Li, 2010; Paek, 2016). For example, Pelle et al. (2016) assume posterior multivariate normality for the latent variable vector of a log-linear multidimensional Rasch model for capture–recapture analysis of registration data.

Sometimes the APN-assumption is justified by the APN in a Bayesian framework (pointing to the “Bernstein–von Mises Theorem”) without however proceeding to further details (e.g., the computationally efficient adaptive quadrature methods for high-dimensional item factor analysis (Schilling & Bock, 2005) and for generalized linear mixed models (Rabe-Hesketh et al., 2002) are based on the APN assumption).

In this work, we study the APN for multivariate latent trait models, focusing on models for dichotomous items and targeting conditions that are tailored to IRT models and thus simpler to verify. APN of LTs, univariate or multivariate, is related to Bayesian asymptotics. In the light of this connection, we build on the approach of Ghosal et al. (1995), GGS hereafter, who discussed asymptotic posterior distributions in a very general setup that includes the regular cases and some non-regular cases as well. They also proved a general result on the asymptotic equivalence of the Bayes and maximum likelihood estimators, a well-known result for the regular cases. In particular, we generalize the approach and results of Chang and Stout (1993), CS hereafter, linking them to the semiproper centering concept of GGS and embedding them in their approach. We provide conditions for multivariate APN that correspond one to one to the conditions of CS for univariate LTs, which is the standard approach for IRT models, as alternatives to the conditions imposed in Ghosal et al. (1995). Even for the case of univariate LTs, the proposed approach could be an interesting alternative to that of CS, since it has the advantage of applying also to models with non-monotone item response functions, which is not the case in the CS setup. Furthermore, we discuss conditions under which the existence of the maximum likelihood estimators (MLEs) for latent variable vectors is ensured. The consistency of MLEs under mild conditions, which was indicated as an open issue by Sinharay (2015), follows as a natural consequence of the proof of the APN. Finally, we prove the consistency of maximum a-posteriori (MAP) and expected a-posteriori (EAP) estimators for multivariate LTs.

The paper is organized as follows. Basic notation and the adopted IRT framework are set in Sect. 2, while the CS-theory for a univariate LT is briefly reviewed in Sect. 3. The approach of Ghosal et al. (1995) is discussed and linked to the APN of LTs and the CS-results in Sect. 4. The CS-conditions are generalized to the multivariate case and commented on in Sect. 5, while they are verified for characteristic examples in Sect. 6. The main result on APN for multivariate LTs and properties of the MLEs, MAPs and EAPs of LTs are provided in Sect. 7 and supported by a simulation study in Sect. 8. Finally, the results are summarized in Sect. 9. A brief version of the proofs of the results of Sect. 7 is given in the “Appendix”, while their extended version can be found in the web-appendix. For a preliminary version of these results, see also Chapter 3 in Kornely (2021).

Preliminaries

Consider a test consisting of d binary response variables $Y_{i}$ , $i \in [d] : = \{1, \dots, d\}$ , with $Y_{i} \in \{0, 1\}$ for the i-th item, where 1 (0) denotes a correct (incorrect) response, defined over a probability space $(Ω, A, P)$ . Consider further the response vector $Y^{(d)} = {(Y_{1}, \dots, Y_{d})}^{⊺}$ , with superscript $^{⊺}$ denoting the transpose of a vector. The manifest probability for a specific response pattern $y^{(d)}$ is given by $P (y^{(d)}) = P (Y^{(d)} = y^{(d)})$ . In a multidimensional IRT (MIRT) modeling framework, manifest probabilities are derived via conditioning on an absolutely continuous latent variable vector $η = {(η_{1}, \dots, η_{q})}^{⊺} \in Θ \subseteq R^{q}$ , defined over the same probability space as the binary items, with probability density function (pdf) and cumulative distribution function (cdf) $h$ and $H$ , respectively. In particular, the conditional probability mass function (pmf) of $Y_{i} | η$ is given by

\begin{matrix} P (Y_{i} = y | η) = P_{i} {(η)}^{y} {(1 - P_{i} (η))}^{1 - y}, \quad y \in \{0, 1\}, \; i \in [d], \end{matrix} \qquad (1)

with $P_{i} (η)$ being known as the i-th item response function. In MIRT modeling, specific assumptions are usually imposed on the conditional distribution $P (Y^{(d)} = y^{(d)} | η)$ ; namely the assumption of local independence

\begin{matrix} P^{(d)} (y^{(d)} | η) : = P (Y^{(d)} = y^{(d)} | η) = \prod_{i = 1}^{d} P (Y_{i} = y_{i} | η), \quad y^{(d)} \in \{0, 1\}^{d}, \end{matrix} \qquad (2)

and that of monotonicity for the item response functions $P_{i} (η)$ , i.e., for $i \in [d]$

\begin{matrix} P_{i} (η) \text{ is strictly monotonic in every dimension of } η \text{ being measured.} \end{matrix} \qquad (3)

Note that assumption (3), which is required in the CS-approach, is relaxed in our setup. In the sequel, we denote by $\{Y_{i}\}_{i \in N} \sim P (η)$ a sequence of Bernoulli random variables that fulfill (1) and (2) for all $d \in N$ .

Due to assumption (2) and using (1), the manifest probabilities are derived through the following integral

\begin{matrix} P (y^{(d)}) = \int \dots \int (\prod_{i = 1}^{d} P_{i} {(η)}^{y_{i}} {(1 - P_{i} (η))}^{1 - y_{i}}) h (η) d η . \end{matrix}
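As a rough illustration of the manifest-probability integral above, the following sketch evaluates it numerically for a univariate latent trait ($q = 1$). The logistic item response function, the standard normal density for $h$, the item parameters and all function names are assumptions made for this example only; they are not taken from the paper. Since the conditional pattern probabilities sum to one for every $η$, the manifest probabilities of all $2^{d}$ patterns should sum to (approximately) one as well.

```python
import math
from itertools import product

def irf(eta, a, b):
    """Hypothetical logistic item response function P_i(eta) = logistic(a * eta + b)."""
    return 1.0 / (1.0 + math.exp(-(a * eta + b)))

def std_normal_pdf(eta):
    """Assumed latent density h: standard normal."""
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def manifest_probability(y, items, lo=-8.0, hi=8.0, n=4000):
    """Trapezoidal approximation of the integral of
    prod_i P_i(eta)^y_i (1 - P_i(eta))^(1 - y_i) * h(eta) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        eta = lo + k * step
        lik = 1.0
        for y_i, (a, b) in zip(y, items):
            p = irf(eta, a, b)
            lik *= p if y_i == 1 else 1.0 - p
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * lik * std_normal_pdf(eta)
    return total * step

# Illustrative (a_i, b_i) pairs for a three-item test.
items = [(1.0, 0.0), (1.5, -0.5), (0.7, 0.3)]
patterns = list(product([0, 1], repeat=len(items)))
probs = {y: manifest_probability(y, items) for y in patterns}
total_mass = sum(probs.values())
```

For $q > 1$ the same idea applies, but the one-dimensional grid would have to be replaced by a multivariate quadrature or Monte Carlo rule.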

Remark 1

For simplicity of notation, we use $η$ to denote the random latent variable vector as well as a realization of it. If not clear from the context, we write explicitly $η \in Θ$ for a realization or $η \sim H$ for the random vector with values in $Θ$ . In the sequel, we abbreviate the term latent variable vector to latent vector.

The posterior density of $η$ , given an observed response $y^{(d)} \in \{0, 1\}^{d}$ , is then given by

\begin{matrix} h (η | y^{(d)}) : = h (η | Y^{(d)} = y^{(d)}) = \frac{P (Y^{(d)} = y^{(d)} | η) h (η)}{P (Y^{(d)} = y^{(d)})} = \frac{exp (ℓ^{(d)} (η | y^{(d)})) h (η)}{P (y^{(d)})}, \end{matrix}

where $ℓ^{(d)} (\cdot | y^{(d)})$ is the log-likelihood corresponding to (1), given by

\begin{matrix} ℓ^{(d)} (η | y^{(d)}) = \sum_{i = 1}^{d} (y_{i} λ_{i} (η) - ψ (λ_{i} (η))), \end{matrix} \qquad (6)

with $λ_{i}$ denoting the item logit, i.e.,

\begin{matrix} λ_{i} (η) : = log (\frac{P_{i} (η)}{1 - P_{i} (η)}), \end{matrix}

and the function $ψ (\cdot)$ being defined as $ψ (x) = log (1 + exp (x))$ , $x \in R$ .
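The logit form of the log-likelihood can be checked numerically against the Bernoulli pmf. The sketch below assumes, purely for illustration, a linear item logit $λ_{i} (η) = α_{i}^{⊺} η + β_{i}$ with made-up parameter values; the general theory does not require this form, and the function names are hypothetical.

```python
import math

def psi(x):
    """psi(x) = log(1 + exp(x)), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def item_logit(eta, alpha, beta):
    """Assumed linear item logit: lambda_i(eta) = alpha_i^T eta + beta_i."""
    return sum(a * e for a, e in zip(alpha, eta)) + beta

def log_likelihood(eta, y, items):
    """Log-likelihood in the form sum_i (y_i * lambda_i(eta) - psi(lambda_i(eta)))."""
    total = 0.0
    for y_i, (alpha, beta) in zip(y, items):
        lam = item_logit(eta, alpha, beta)
        total += y_i * lam - psi(lam)
    return total

def log_likelihood_direct(eta, y, items):
    """The same quantity computed from the Bernoulli pmf, for comparison."""
    total = 0.0
    for y_i, (alpha, beta) in zip(y, items):
        p = 1.0 / (1.0 + math.exp(-item_logit(eta, alpha, beta)))
        total += math.log(p) if y_i == 1 else math.log(1.0 - p)
    return total

# Illustrative two-dimensional example (q = 2, d = 3).
items = [((1.0, 0.5), 0.2), ((0.3, 1.2), -0.4), ((0.8, 0.8), 0.0)]
y = (1, 0, 1)
eta = (0.4, -0.7)
ll_logit = log_likelihood(eta, y, items)
ll_direct = log_likelihood_direct(eta, y, items)
```

Both evaluations should agree up to floating-point error, because $\log P_{i} = λ_{i} - ψ (λ_{i})$ and $\log (1 - P_{i}) = - ψ (λ_{i})$ .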

Let ${\hat{η}}_{d} = \hat{η} (y^{(d)})$ denote the MLE of the true value of the latent vector $η_{0}$ , based on a test realization $y^{(d)}$ . Furthermore, the Fisher information matrix of the test at point $η$ is given by

\begin{matrix} I^{(d)} (η) : = E_{η} (\nabla ℓ^{(d)} (η ∣ Y^{(d)}) \nabla^{⊺} ℓ^{(d)} (η ∣ Y^{(d)})) = \sum_{i = 1}^{d} I_{i} (η), \end{matrix} \qquad (8)

where $I_{i} (\cdot)$ is the i-th item information matrix

\begin{matrix} I_{i} (η) : = & E_{η} (\nabla log (P_{i} {(η)}^{Y_{i}} {(1 - P_{i} (η))}^{1 - Y_{i}}) \nabla^{⊺} log (P_{i} {(η)}^{Y_{i}} {(1 - P_{i} (η))}^{1 - Y_{i}})) \\ = & P_{i} (η) (1 - P_{i} (η)) \nabla λ_{i} (η) \nabla λ_{i} {(η)}^{⊺}, η \in Θ . \end{matrix}
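A small sketch of the item and test information matrices for $q = 2$ may help to see their structure. It again assumes a linear item logit $λ_{i} (η) = α_{i}^{⊺} η + β_{i}$ , so that $\nabla λ_{i} (η) = α_{i}$ ; this model choice, the parameter values and the helper names are illustrative assumptions, not part of the paper.

```python
import math

def item_information(eta, alpha, beta):
    """I_i(eta) = P_i (1 - P_i) * grad(lambda_i) grad(lambda_i)^T,
    with grad(lambda_i) = alpha under the assumed linear logit."""
    lam = sum(a * e for a, e in zip(alpha, eta)) + beta
    p = 1.0 / (1.0 + math.exp(-lam))
    w = p * (1.0 - p)
    q = len(alpha)
    return [[w * alpha[r] * alpha[c] for c in range(q)] for r in range(q)]

def total_information(eta, items):
    """Test information as the sum of the item information matrices."""
    q = len(eta)
    total = [[0.0] * q for _ in range(q)]
    for alpha, beta in items:
        info = item_information(eta, alpha, beta)
        for r in range(q):
            for c in range(q):
                total[r][c] += info[r][c]
    return total

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

items = [((1.0, 0.5), 0.2), ((0.3, 1.2), -0.4), ((0.8, 0.8), 0.0)]
eta = (0.4, -0.7)
single_item = total_information(eta, items[:1])  # rank one, hence singular
full_test = total_information(eta, items)        # non-collinear gradients
```

A single item contributes a rank-one matrix, so its determinant vanishes, whereas the sum over items with non-collinear gradients is positive definite in this example.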

This work studies the APN of $η$ for $d \to \infty$ , based on a sequence of random variables $\{Y_{i}\}_{i \in N} \sim P (η)$ , as defined above. In particular, we shall prove that, under certain conditions, (8) is invertible at ${\hat{η}}_{d}$ and $η | Y^{(d)} = y^{(d)}$ is approximately normally distributed, $N ({\hat{η}}_{d}, {[I^{(d)} ({\hat{η}}_{d})]}^{- 1})$ , for a realization $y^{(d)}$ of $Y^{(d)}$ . This enables the approximation of probabilities of the type

\begin{matrix} P (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (η - {\hat{η}}_{d}) \in B ∣ Y^{(d)}), \quad B \in B^{q}, \end{matrix}

where $B^{q}$ denotes the Borel $σ$-algebra of $R^{q}$ . Practically speaking, a set B can be any countable union or intersection of q-dimensional real cubes.

Next, we define some functions that are useful for the subsequent derivations. For all $d \in N$ , set $Z^{(d)} : Θ \times Θ \to R$ with

\begin{matrix} Z^{(d)} (η, η^{'}) : = \prod_{i = 1}^{d} Z_{i} (η, η^{'}), \end{matrix}

where $Z_{i} : Θ \times Θ \to R$ , $i \in N$ , are defined as

\begin{matrix} Z_{i} (η, η^{'}) : = \frac{P_{i} {(η)}^{Y_{i}} {(1 - P_{i} (η))}^{1 - Y_{i}}}{P_{i} {(η^{'})}^{Y_{i}} {(1 - P_{i} (η^{'}))}^{1 - Y_{i}}} . \end{matrix}

Note that for given d and $η, η^{'} \in Θ$ , (10) is the ratio of the likelihoods at $η$ and $η^{'}$ . Furthermore,

\begin{matrix} \log Z^{(d)} (η, η^{'}) = \sum_{i = 1}^{d} \log Z_{i} (η, η^{'}) = ℓ^{(d)} (η ∣ Y^{(d)}) - ℓ^{(d)} (η^{'} ∣ Y^{(d)}), \end{matrix} \qquad (12)

while $- E_{η_{0}} (log Z_{i} (η, η_{0}))$ is the Kullback–Leibler divergence between the conditional distributions of $Y_{i}$ given $η$ and $η_{0}$ , respectively. A basic approach for deriving APN results relies on a quadratic approximation of (12).
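The relation between the item likelihood ratio and the Kullback–Leibler divergence can be made concrete with a short sketch. The linear-logit item response function, the parameter values and the function names below are assumptions for illustration only.

```python
import math

def irf(eta, alpha, beta):
    """Assumed item response function: logistic(alpha^T eta + beta)."""
    lam = sum(a * e for a, e in zip(alpha, eta)) + beta
    return 1.0 / (1.0 + math.exp(-lam))

def log_z_item(y, eta, eta_ref, alpha, beta):
    """log Z_i(eta, eta_ref) for an observed response y in {0, 1}."""
    p, p_ref = irf(eta, alpha, beta), irf(eta_ref, alpha, beta)
    if y == 1:
        return math.log(p / p_ref)
    return math.log((1.0 - p) / (1.0 - p_ref))

def kl_item(eta, eta0, alpha, beta):
    """-E_{eta0}[log Z_i(eta, eta0)], i.e. the Kullback-Leibler divergence of
    Bernoulli(P_i(eta)) from Bernoulli(P_i(eta0))."""
    p0 = irf(eta0, alpha, beta)
    expected_log_z = (p0 * log_z_item(1, eta, eta0, alpha, beta)
                      + (1.0 - p0) * log_z_item(0, eta, eta0, alpha, beta))
    return -expected_log_z

alpha, beta = (1.0, 0.5), 0.2
eta0 = (0.4, -0.7)
kl_at_truth = kl_item(eta0, eta0, alpha, beta)
kl_elsewhere = kl_item((1.4, 0.3), eta0, alpha, beta)
```

The divergence is zero at $η = η_{0}$ and strictly positive whenever $P_{i} (η) \neq P_{i} (η_{0})$ , which is the mechanism behind the identifiability-type conditions discussed later.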

Review of APN for Univariate Latent Traits

In the case of a single latent variable ( $q = 1$ , so that the latent vector reduces to a scalar $η$ ), Chang and Stout (1993) proved the APN of the univariate latent trait, adopting the approach of Walker (1969) for binary $Y_{i}$ , $i \in [d]$ , that are independent but not identically distributed (inid). We briefly review their results so that we can subsequently extend their approach to the multivariate case ( $q > 1$ ).

In addition to the general assumptions (2) and (3), they also introduced the following regularity conditions.

(CS1)
- [i]
  Let $η \in Θ$ , where $Θ \subseteq (- \infty, \infty)$ is a bounded or unbounded interval.
- [ii]
  Let the prior density $h$ be continuous and positive at the true value $η_{0}$ .

(CS2) $P_{i} (η)$ is twice continuously differentiable with the first two derivatives being uniformly bounded in absolute value with respect to both $η$ and i in some closed interval $Θ_{0} \subset Θ$ around $η_{0}$ .

(CS3) For every fixed $η \neq η_{0}$ , $η \in Θ$ , there is a $c (η) > 0$ such that
$\begin{matrix} \underset{d ⟶ \infty}{\limsup} \frac{1}{d} \sum_{i = 1}^{d} E_{η_{0}} \log Z_{i} (η, η_{0}) \leq - c (η), \end{matrix} \qquad (13)$
and ${\sup}_{i \in N} | λ_{i} (η) | < \infty$ .

(CS4) If restricted to $Θ_{0}$ , the following sets of functions are uniformly bounded:
$\begin{matrix} \{|\frac{d I_{i}}{d η}| ∣ i \in N\}, \{|\frac{d^{2} λ_{i}}{d η^{2}}| ∣ i \in N\}, \{|\frac{d^{3} λ_{i}}{d η^{3}}| ∣ i \in N\} . \end{matrix}$

(CS5) Asymptotically, the average information at $η_{0}$ is bounded away from 0, i.e.,
$\begin{matrix} \underset{d ⟶ \infty}{\liminf} \frac{I^{(d)} (η_{0})}{d} > 0 . \end{matrix}$

Remark 2

With respect to the prior of $η$ , in addition to (CS1[ii]), Chang and Stout (1993) implicitly assumed its properness, which was stated explicitly in the earlier associated technical report (Chang & Stout, 1991, p. 15).

Remark 3

Reasonable models for applications do not depend on a specific compact interval in $Θ$ , since $η_{0}$ is usually unknown. Therefore, the conditions depending on $η_{0}$ should be satisfied for almost all $η_{0} \in Θ$ , and for almost every $η_{0}$ there should be some (arbitrarily small) interval $Θ_{0}$ around it. In the usual models these conditions are satisfied.

Chang and Stout (1993) argued convincingly that conditions (CS1)–(CS5) are realistic and non-restrictive in practice for commonly used IRT models of well-designed tests. They particularly commented on condition (CS3) and inequality (13), which plays an important role in the proof of their main theorem. (CS3) is required when the item responses $\{Y_{i}\}_{i \in N}$ are independent but not identically distributed. If they are iid, (CS3) is automatically satisfied; in IRT models, however, the responses are not necessarily iid. Their main results are expressed in the three theorems given below.

Theorem 1

(Chang & Stout, 1993, Theorem 1) Suppose that conditions (CS1) through (CS5) hold for a fixed $η_{0}$ . Let ${\hat{η}}_{d}$ be the MLE of $η_{0}$ and ${\hat{σ}}_{d} = {(I^{(d)} ({\hat{η}}_{d}))}^{- 1 / 2}$ . Then, for $- \infty \leq a < b \leq \infty$ , the posterior probability of ${\hat{η}}_{d} + a {\hat{σ}}_{d} < η < {\hat{η}}_{d} + b {\hat{σ}}_{d}$ converges in $P_{η_{0}}$ -probability to the probability of $Z \in (a, b)$ for $Z \sim N (0, 1)$ , that is,

\begin{matrix} A_{d} \equiv \int_{{\hat{η}}_{d} + a {\hat{σ}}_{d}}^{{\hat{η}}_{d} + b {\hat{σ}}_{d}} h (η ∣ Y^{(d)}) d η \overset{P_{η_{0}}}{⟶} \frac{1}{\sqrt{2 π}} \int_{a}^{b} exp (- \frac{1}{2} η^{2}) d η \equiv A, d \to \infty . \end{matrix}

Theorem 2

(Chang & Stout, 1993, Theorem 2) Suppose that conditions (CS1) through (CS5) hold for fixed $η_{0}$ and let ${\hat{η}}_{d}$ and ${\hat{σ}}_{d}$ be defined as in Theorem 1. Then, for $- \infty \leq a < b \leq \infty$ , the posterior probability $A_{d}$ approaches A $P_{η_{0}}$ -almost surely, as $d \to \infty$ .

Theorem 3

(Chang & Stout, 1993, Theorem 3) Assume $Θ = Θ_{0}$ , a finite interval. Suppose that conditions (CS1) through (CS5) hold for all $η_{0} \in Θ_{0}$ and let ${\hat{η}}_{d}$ and ${\hat{σ}}_{d}$ be defined as in Theorem 1. Then, for $- \infty \leq a < b \leq \infty$ , the posterior probability $A_{d}$ approaches A in manifest probability $P$ , as $d \to \infty$ .

The result of Theorem 3 does not depend on the true value $η_{0}$ and is thus of special practical interest for estimation and prediction purposes. As Chang and Stout (1993) comment, Theorems 1 and 3 treat sampling from a fixed ability sub-population and from the whole population, respectively. An important by-product of the proof of the APN of latent variable distributions was the establishment of the weak and strong consistency of the MLE of $η$ under milder conditions than those of Lord (1983).

Due to the theorems above, the following approximation, i.e., the construction of asymptotic credible intervals, is justified for large d and any observed response pattern $y \in \{0, 1\}^{d}$ :

\begin{matrix} P (a \leq η \leq b ∣ Y^{(d)} = y) \approx Φ_{1} (b ; {\hat{η}}_{d}, {\hat{σ}}^{2}) - Φ_{1} (a ; {\hat{η}}_{d}, {\hat{σ}}^{2}), \end{matrix} \qquad (14)

for $- \infty \leq a \leq b \leq \infty$ , where ${\hat{η}}_{d}$ is the MLE of $η_{0}$ based on the sample $y$ , ${\hat{σ}}^{2} = {(I^{(d)} ({\hat{η}}_{d}))}^{- 1}$ and $Φ_{1} (\cdot ; {\hat{η}}_{d}, {\hat{σ}}^{2})$ denotes the cdf of $N ({\hat{η}}_{d}, {\hat{σ}}^{2})$ . Approximation (14) is of special practical importance in the context of long tests where the exact computation of posterior probabilities for latent variables is commonly intractable. Furthermore, (14) allows the approximation of the posterior if the exact distribution $H$ of $η$ is unavailable or uncertain.
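The normal approximation of posterior interval probabilities can be tried out on simulated data. The following sketch is only an illustration under stated assumptions: a univariate logistic model $P_{i} (η) = \mathrm{logistic} (a_{i} η + b_{i})$ , a standard normal prior, randomly drawn item parameters, a Newton iteration for the MLE and a simple grid evaluation of the exact posterior. None of these choices or names come from the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def psi(x):
    """log(1 + exp(x)), evaluated stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_lik(eta, y, items):
    return sum(y_i * (a * eta + b) - psi(a * eta + b) for y_i, (a, b) in zip(y, items))

def score_and_info(eta, y, items):
    """Score and Fisher information of the assumed logistic model at eta."""
    score = sum(a * (y_i - sigmoid(a * eta + b)) for y_i, (a, b) in zip(y, items))
    info = sum(a * a * sigmoid(a * eta + b) * (1.0 - sigmoid(a * eta + b)) for a, b in items)
    return score, info

def mle(y, items, start=0.0, iters=50):
    """Newton (Fisher scoring) iterations for the MLE of eta."""
    eta = start
    for _ in range(iters):
        score, info = score_and_info(eta, y, items)
        eta += score / info
    return eta

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def posterior_prob(a_lim, b_lim, y, items, lo=-6.0, hi=6.0, n=3000):
    """P(a <= eta <= b | y) under an assumed N(0, 1) prior, via a fine grid sum."""
    step = (hi - lo) / n
    grid = [lo + k * step for k in range(n + 1)]
    log_post = [log_lik(e, y, items) - 0.5 * e * e for e in grid]
    top = max(log_post)  # subtract the maximum to avoid underflow
    dens = [math.exp(v - top) for v in log_post]
    inside = sum(v for e, v in zip(grid, dens) if a_lim <= e <= b_lim)
    return inside / sum(dens)

rng = random.Random(1)
eta0 = 0.5  # true latent trait used for simulation
items = [(rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)) for _ in range(200)]
y = [1 if rng.random() < sigmoid(a * eta0 + b) else 0 for a, b in items]

eta_hat = mle(y, items)
sigma_hat = score_and_info(eta_hat, y, items)[1] ** -0.5
a_lim, b_lim = eta_hat - sigma_hat, eta_hat + sigma_hat
approx = normal_cdf(b_lim, eta_hat, sigma_hat) - normal_cdf(a_lim, eta_hat, sigma_hat)
exact = posterior_prob(a_lim, b_lim, y, items)
```

With 200 items, the approximate and the grid-based posterior probabilities of the interval ${\hat{η}}_{d} \pm {\hat{σ}}$ should be close; the remaining gap reflects the influence of the prior and the finite test length.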

Finally, Chang and Stout (1993) noted that their theory, under suitable regularity conditions, can be extended to prove the APN for latent vectors of general multidimensional IRT models, without, however, elaborating on the proof or the regularity conditions required. Next, we discuss the asymptotic posterior distribution of multivariate latent traits in the context of MIRT.

APN for Multivariate Latent Traits

The theory of APN of the latent variables is naturally linked to Bayesian procedures and results on the convergence of posterior distributions. Of particular interest is the fundamental contribution by GGS (Ghosal et al., 1995), who consider asymptotic multivariate posterior distributions (not necessarily normal) in a very general and flexible framework, discussing different types of convergence and relying on earlier works by Ghosh et al. (1994) and Ibragimov and Has’minskii (1981), denoted as IH hereafter. In particular, they studied posterior convergence of suitably centered and normalized posteriors. Their results provide a very general framework, which can be adopted for the APN in the IRT setup. Next, we adjust the GGS approach for MIRT models and discuss their conditions, embedding the CS approach in the GGS framework.

Following Ghosal et al. (1995, Definition 2), we distinguish two types of APN and link them to the statistic used for the centering of the posterior distribution of the latent vector.

Definition 1

Let $Z \sim N_{q} (0, I_{q})$ be a q-variate standard normally distributed random vector. An $R^{q}$ -valued statistic ${\tilde{η}}_{d}$ is called a proper centering (with limiting normal distribution) if

\begin{matrix} \sup_{A \in B^{q}} |P (I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} (η - {\tilde{η}}_{d}) \in A ∣ Y^{(d)}) - P (Z \in A)| \overset{P_{η_{0}}}{⟶} 0, \quad \text{as } d \to \infty . \end{matrix}

A statistic ${\tilde{η}}_{d}$ is called a semiproper centering (with limiting normal distribution) if, for all $A \in B^{q}$ ,

\begin{matrix} P (I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} (η - {\tilde{η}}_{d}) \in A ∣ Y^{(d)}) \overset{P_{η_{0}}}{⟶} P (Z \in A), \quad \text{as } d \to \infty . \end{matrix}

A statistic ${\tilde{η}}_{d}$ is called compatible (with the posterior), if

\begin{matrix} (I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} ({\tilde{η}}_{d} - η_{0}), \; h^{*} (\cdot ∣ Y^{(d)})), \end{matrix}

as a random element in $R^{q} \times L^{1} (R^{q})$ , converges in distribution for $d \to \infty$ , where $h^{*} (\cdot ∣ Y^{(d)})$ denotes the density of the posterior distribution of $I^{(d)} {(η_{0})}^{1 / 2} (η - η_{0})$ and $L^{1} (R^{q})$ stands for the space of all q-variate Lebesgue-integrable real functions on $R^{q}$ .

Proper and semiproper centering correspond to uniform and pointwise convergence of the posterior of the standardized latent vector $I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} (η - {\tilde{η}}_{d})$ , respectively. Hence, proper centering is a stronger property than semiproper centering, and is consequently expected to require stronger assumptions.

Under this view, one can easily recognize that Theorem 1 of Chang and Stout (1993) states that the MLE is a semiproper centering, since it can be formulated as

\begin{matrix} P (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (η - {\hat{η}}_{d}) \in [a, b] ∣ Y^{(d)}) \overset{P_{η_{0}}}{⟶} P (Z \in [a, b]), \quad d ⟶ \infty, \end{matrix}

for $Z \sim N (0, 1)$ , $A = [a, b] \subset R$ and ${\tilde{η}}_{d} = {\hat{η}}_{d}$ being the MLE of $η_{0}$ based on $Y^{(d)}$ . Thus, for the extension of the CS-theory for multivariate LTs, we focus on semiproper centering.

The asymptotic results of GGS, adjusted to our setup, primarily focus on the convergence of the posterior distribution of the standardized latent vector

\begin{matrix} η^{*} = I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} (η - η_{0}), \end{matrix}

with $η^{*} \in Θ_{d} : = I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} (Θ - η_{0})$ . We need the likelihood ratio (10) expressed in terms of $η^{*}$ , which is denoted by

\begin{matrix} Z^{* (d)} (η^{*}) : = Z^{(d)} (η_{0} + I^{(d)} {({\tilde{η}}_{d})}^{- 1 / 2} η^{*}, η_{0}) . \end{matrix}

In our setup, for binary response variables $Y_{i}$ , $i \in [d]$ , and log-likelihoods given by (6), the likelihood ratio $Z^{* (d)}$ takes the form

\begin{matrix} Z^{* (d)} (η^{*}) = & \exp (\sum_{i = 1}^{d} (Y_{i} (λ_{i} (η_{0} + I^{(d)} {({\tilde{η}}_{d})}^{- 1 / 2} η^{*}) - λ_{i} (η_{0})) \\ & - (ψ (λ_{i} (η_{0} + I^{(d)} {({\tilde{η}}_{d})}^{- 1 / 2} η^{*})) - ψ (λ_{i} (η_{0}))))), \end{matrix}

with the item logits $λ_{i}$ provided in (7).

The primary conditions of GGS for APN are given as follows:

(GGS1) For some $M > 0$ , $m_{1} \geq 0$ and $α > 0$ ,
$\begin{matrix} E_{η_{0}} ({|Z^{* (d)} {(η_{1}^{*})}^{1 / 2} - Z^{* (d)} {(η_{2}^{*})}^{1 / 2}|}^{2}) \leq M (1 + R^{m_{1}}) {‖ η_{1}^{*} - η_{2}^{*} ‖}^{α}, \end{matrix}$
for all $η_{j}^{*} \in Θ_{d}$ , satisfying $‖ η_{j}^{*} ‖ \leq R$ , $j = 1, 2$ , where $‖ \cdot ‖$ is the Euclidean norm.

(GGS2) For all $η^{*} \in Θ_{d}$ ,
$\begin{matrix} E_{η_{0}} (Z^{* (d)} {(η^{*})}^{1 / 2}) \leq \exp (- g_{d} (‖ η^{*} ‖)), \end{matrix}$
where $\{g_{d}\}_{d \in N}$ is a sequence of real-valued functions on $[0, \infty)$ satisfying the following: (a) for a fixed $d \geq 1$ , ${\lim}_{x \to \infty} g_{d} (x) = \infty$ ; (b) for any $N > 0$ ,
$\begin{matrix} \lim_{x \to \infty} \lim_{d \to \infty} x^{N} \exp (- g_{d} (x)) = \lim_{d \to \infty} \lim_{x \to \infty} x^{N} \exp (- g_{d} (x)) = 0 . \end{matrix}$

(GGS3) For all $n \in N$ and $η_{1}^{*}, \dots, η_{n}^{*} \in R^{q}$ , the vector of the likelihood ratios, defined in (18), satisfies
$\begin{matrix} (Z^{* (d)} (η_{1}^{*}), \dots, Z^{* (d)} (η_{n}^{*})) \overset{D}{⟶} (Z (η_{1}^{*}), \dots, Z (η_{n}^{*})), \end{matrix}$
for $d ⟶ \infty$ , where $\overset{D}{⟶}$ denotes convergence in distribution and $Z (η^{*}) = \exp (ξ^{⊺} η^{*} - \frac{1}{2} ‖ η^{*} ‖^{2})$ , $η^{*} \in R^{q}$ , where $ξ \sim N_{q} (0, I_{q})$ .
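The limiting form $\exp (ξ^{⊺} η^{*} - \frac{1}{2} ‖ η^{*} ‖^{2})$ in the last condition above reflects a quadratic approximation of the log-likelihood ratio around $η_{0}$ . The sketch below illustrates this numerically for $q = 1$ . It rests on several simplifying assumptions that are not taken from the paper: a logistic model with simulated item parameters, normalization by the information at $η_{0}$ rather than at a centering statistic, and the standardized score at $η_{0}$ playing the role of $ξ$ .

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def psi(x):
    """log(1 + exp(x)), evaluated stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_lik(eta, y, items):
    return sum(y_i * (a * eta + b) - psi(a * eta + b) for y_i, (a, b) in zip(y, items))

rng = random.Random(7)
eta0, d = 0.5, 5000  # true latent trait and (long) test length
items = [(rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)) for _ in range(d)]
y = [1 if rng.random() < sigmoid(a * eta0 + b) else 0 for a, b in items]

# Fisher information and score at eta0 for the assumed logistic model.
info0 = sum(a * a * sigmoid(a * eta0 + b) * (1.0 - sigmoid(a * eta0 + b)) for a, b in items)
score0 = sum(a * (y_i - sigmoid(a * eta0 + b)) for y_i, (a, b) in zip(y, items))
xi_d = score0 / math.sqrt(info0)  # standardized score, approximately N(0, 1)

def log_z_star(t):
    """Exact log-likelihood ratio at the rescaled point eta0 + t / sqrt(info0)."""
    return log_lik(eta0 + t / math.sqrt(info0), y, items) - log_lik(eta0, y, items)

def quadratic(t):
    """Quadratic approximation xi_d * t - t^2 / 2."""
    return xi_d * t - 0.5 * t * t

gaps = [abs(log_z_star(t) - quadratic(t)) for t in (-2.0, -1.0, 0.5, 1.0, 2.0)]
```

For this canonical-link model the observed and expected information coincide, so the gap is driven by third-order terms only and shrinks roughly like $d^{- 1 / 2}$ .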

Under these conditions, Ghosal et al. (1995) provided the following general result. Notice that they discussed a far more general framework, allowing further distributions for the response variable and considering cases for which the posterior may converge to another distribution than a normal. We refer to GGS for further details regarding these cases.

Theorem 4

(Ghosal et al., 1995, Theorem 1) Assume that conditions (GGS1) through (GGS3) hold. If either a proper centering or a semiproper compatible centering sequence $\{{\tilde{η}}_{d}\}_{d \in N}$ exists, then there exists a random vector $W$ such that (a) $I^{(d)} {({\tilde{η}}_{d})}^{1 / 2} ({\tilde{η}}_{d} - η_{0}) \overset{D}{⟶} W$ for $d ⟶ \infty$ and (b) for almost all $x \in R^{q}$ , $\frac{Z (x - W)}{\int_{R^{q}} Z (x^{*} - W) d x^{*}}$ is nonrandom, where Z is as defined in condition (GGS3). Conversely, if (b) holds for a random vector $W$ , then any Bayes estimator (with respect to a prior and loss considered by Ghosal et al. (1995)) is a compatible proper centering.

Applying Theorem 4 to an appropriate Bayes estimator for $η_{0}$ , the APN of an MIRT model under conditions (GGS1) to (GGS3) is derived. The extension of Theorem 4 to an MLE, i.e., to ${\tilde{η}}_{d} = {\hat{η}}_{d}$ , is based on its asymptotic equivalence to an arbitrary Bayes estimator, which has been proved by Ghosal et al. (1995, cf. Corollary 1) under (GGS2)–(GGS3) and the following strengthened form of (GGS1):

For some $M > 0$ , $m_{1} \geq 0$ and $m \geq α > q$ holds
$\begin{matrix} E_{η_{0}} ({|Z^{* (d)} {(η_{1}^{*})}^{1 / m} - Z^{* (d)} {(η_{2}^{*})}^{1 / m}|}^{m}) \leq M (1 + R^{m_{1}}) {‖ η_{1}^{*} - η_{2}^{*} ‖}^{α}, \end{matrix}$
for all $η_{j}^{*} \in Θ_{d}$ , satisfying $‖ η_{j}^{*} ‖ \leq R$ , $j = 1, 2$ .

Remark 4

As an alternative to the GGS conditions discussed above, one could consider the conditions of Ibragimov & Has’minskii (1981, Section III.4) for general regular models for independent, not necessarily identically distributed (inid) random variables. They proved that these conditions are sufficient for the set of conditions N1–N4 of IH, Section III.1, where N1 is the uniform asymptotic normality and corresponds to (GGS3), while N3 and N4 correspond to (GGS1) and (GGS2), respectively.

Regularity Conditions for Asymptotic Properties of Latent Vectors

Aiming to generalize the CS approach, we provide conditions for APN of (multivariate) LTs that correspond one to one to the conditions of CS for univariate LTs, which is the standard approach for IRT models, as alternatives to the conditions imposed in Ghosal et al. (1995). Throughout, we assume that $\{Y_{i}\}_{i \in N} \sim P (η)$ , i.e., $\{Y_{i}\}_{i \in N}$ are Bernoulli random variables fulfilling (1) and (2), and that the true latent vector $η_{0}$ lies in the interior of the parameter space, i.e., $η_{0} \in Θ \setminus \partial Θ$ , where $\partial Θ$ denotes the boundary of $Θ$ . The asymptotic results of Sect. 7 rely on the following regularity conditions.

(CS1’)
- [i]
  The set $Θ$ is closed, convex and has non-empty interior.
- [ii]
  The prior density $h$ of $η$ is proper and continuous at $η_{0}$ with $h (η_{0}) > 0$ .

(CS2’) $P_{i}$ is thrice continuously differentiable, $i \in N$ . If restricted to a compact subset $K \subseteq Θ$ , all $|\frac{\partial P_{i}}{\partial η_{k}}|$ and $|\frac{\partial^{2} P_{i}}{\partial η_{k} \partial η_{j}}|$ are uniformly bounded for all $i \in N$ , $1 \leq j, k \leq q$ . Moreover, there exist constants $0 < ζ_{0} (K) < ζ_{1} (K) < 1$ , which are independent of $i \in N$ , such that
$\begin{matrix} ζ_{0} (K) \leq \inf_{(i, η) \in N \times K} P_{i} (η) \leq \sup_{(i, η) \in N \times K} P_{i} (η) \leq ζ_{1} (K) . \end{matrix} \qquad (19)$

(CS3’) For each $η \in Θ$ , $η \neq η_{0}$ , there is a $c (η) < 0$ such that
$\begin{matrix} \underset{d \to \infty}{\limsup} \frac{1}{d} \sum_{i = 1}^{d} E_{η_{0}} (\log Z_{i} (η, η_{0})) = \underset{d \to \infty}{\limsup} \frac{1}{d} E_{η_{0}} (ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)})) \leq c (η), \end{matrix}$
and, if $Θ$ is unbounded, additionally
$\begin{matrix} \sup_{η \in Θ \setminus B_{δ} (η_{0})} c (η) < 0, \quad \text{for all } δ > 0, \end{matrix} \qquad (20)$
where $B_{δ} (η_{0}) : = \{η \in R^{q} : ‖ η - η_{0} ‖ < δ\}$ is the open ball of radius $δ$ and center $η_{0}$ .

(CS4’) If restricted to any compact set $K \subseteq Θ$ , the following set of functions is uniformly bounded:
$\begin{matrix} \{|\frac{\partial^{3} P_{i}}{\partial η_{k} \partial η_{g} \partial η_{u}}| : i \in N, 1 \leq k, g, u \leq q\} . \end{matrix}$

(CS5’) For all $η \in Θ$ ,
$\begin{matrix} \underset{d \to \infty}{\liminf} ν_{\min} (\frac{1}{d} \sum_{i = 1}^{d} \nabla λ_{i} (η) \nabla λ_{i} {(η)}^{⊺}) > 0, \end{matrix} \qquad (21)$
where $ν_{\min}$ denotes the smallest eigenvalue.

These regularity conditions correspond one to one to conditions (CS1)–(CS5), given in Sect. 3. When comparing these conditions, keep in mind that convexity and connectedness are equivalent properties in $R$ . The convexity condition in (CS1’) is at first sight stronger, but it does not impose a real practical restriction, since non-convex $Θ$ are only rarely needed in MIRT. Analogously to the CS-theory (see Remark 3), conditions involving $η_{0}$ , like $h (η_{0}) > 0$ , should be interpreted as $h > 0$ almost surely. Condition (CS1’[ii]) on $h$ seems stricter than (CS1[ii]). However, Chang and Stout (1993) additionally require a proper prior (see Remark 2). Thus, considering that $η_{0}$ is still unknown and that we work in $R^{q}$ instead of $R$ , the requirements on proper priors in (CS1’[ii]) are analogous to those in (CS1[ii]). Finally, note that in the generalization of conditions (CS3) and (CS4), some requirements have been removed, as the remaining requirements on $λ_{i}$ , $i \in N$ , and its derivatives are implied by conditions (CS2’) and (CS4’).

A common assumption in one-dimensional IRT models ( $q = 1$ ) is the strict monotonicity assumption (3) of $P_{i}$ in $η$ , for all $i \in N$ . Conceptually, this represents the notion that a more able subject has a higher probability of responding correctly to any item of an educational test. Thus, models fulfilling this strict monotonicity assumption are easier to interpret. However, models with non-generalized-linear latent variable effects can be more adequate in practice. For example, Rizopoulos and Moustaki (2008) considered IRT models with possibly non-monotonic latent variable dependencies (like polynomial effects). For this reason, and to allow for more flexible modeling options, our semiproper centering theory abandons the requirement of strict monotonicity of $η \mapsto P_{i} (η)$ , for all $i \in N$ , in each component. Since the results of Chang and Stout (1993) rely on this monotonicity assumption, the merit of the current contribution is not only the extension of the CS-results to latent vectors ( $q > 1$ ) but also to univariate latent variables in the case of a non-monotonic latent variable effect.

If all $P_{i}$ , $i \in N$ , are strictly monotonic in each component, then requirement (19) of condition (CS2’) is satisfied as in the univariate case. Otherwise, the requirement in (19) is generally not very restrictive; it is the technical formulation of the notion that the response probabilities can (but need not) approach zero or one only if $‖ η ‖ ⟶ \infty$ . Assumption (20) in (CS3’) ensures the identifiability of the latent vector in the case of $‖ η ‖ ⟶ \infty$ . Hence, this condition is quite natural for a statistical model. In the univariate case, (20) of (CS3’) is implied by the strict monotonicity, too. But in contrast to (19), (20) cannot be concluded directly from the strict monotonicity of all $P_{i}$ in each component if $q > 1$ . Moreover, while a single item can suffice for identifiability in the univariate case, at least q items are always needed in the q-dimensional case. Similarly, the average test information $\frac{1}{d} I^{(d)} (η)$ is always singular for $d < q$ , since it is a sum of d rank-one matrices. Condition (CS5’) ensures that $\frac{1}{d} I^{(d)} (η)$ becomes nonsingular for $d \to \infty$ and can be interpreted as a condition ensuring that the asymptotic posterior of $η$ is a non-degenerate q-dimensional distribution and does not have lower-dimensional support (cf. Lemma W.6 in the web-appendix).
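The role of the smallest eigenvalue in (CS5’) can be illustrated for $q = 2$ . The sketch assumes a linear item logit $λ_{i} (η) = α_{i}^{⊺} η + β_{i}$ , for which $\nabla λ_{i} (η) = α_{i}$ does not depend on $η$ ; the randomly drawn slopes and all helper names are illustrative assumptions rather than part of the paper.

```python
import math
import random

def min_eigenvalue_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix in closed form."""
    mean = 0.5 * (m[0][0] + m[1][1])
    radius = math.sqrt((0.5 * (m[0][0] - m[1][1])) ** 2 + m[0][1] ** 2)
    return mean - radius

def average_outer_product(gradients):
    """(1/d) * sum_i g_i g_i^T for a list of 2-dimensional gradients g_i."""
    d = len(gradients)
    avg = [[0.0, 0.0], [0.0, 0.0]]
    for g in gradients:
        for r in range(2):
            for c in range(2):
                avg[r][c] += g[r] * g[c] / d
    return avg

rng = random.Random(3)
# Hypothetical slope vectors alpha_i; under the linear logit they equal grad(lambda_i).
alphas = [(rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)) for _ in range(200)]

nu_single = min_eigenvalue_2x2(average_outer_product(alphas[:1]))  # d = 1 < q = 2
nu_many = min_eigenvalue_2x2(average_outer_product(alphas))        # d = 200
```

With a single item ( $d < q$ ) the averaged matrix has rank one and its smallest eigenvalue is zero up to rounding, while for many items with varying slope directions it stays bounded away from zero.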

To get a better impression of the conditions, we next discuss them, by way of example, for model (22), the multidimensional version of a model of Lee and Bolt (2018), and for a logit model with an interaction of the latent variables (Rizopoulos, 2006).

Verification of the CS Regularity Conditions for Multidimensional IRT Models

We shall verify the proposed conditions (CS1’) to (CS5’) for a multidimensional version of a model by Lee and Bolt (2018) and discuss them also for the models of Pelle et al. (2016) and a logit model with interaction of the latent variables (Rizopoulos, 2006).

Consider first the IRT model by Lee and Bolt (2018) or its multidimensional version

\begin{matrix} P_{i} (η) = Φ_{1} (\frac{α_{i}^{^{⊺}} η + β_{i}}{\sqrt{2} {(1 + exp (- δ_{i}^{^{⊺}} η))}^{- 1 / 2}}), η \in Θ, i \in N . \end{matrix}

If $H$ is one of the usual structural models or any other regular distribution, for example $N_{q} (0, I_{q})$ , a mixture of normals or a uniform distribution on some compact set, then (CS1’) is directly satisfied. Further, considering the model parameters as random variables, if we assume that the parameter sequences $\{α_{i}\}_{i \in N}$ and $\{δ_{i}\}_{i \in N}$ behave as if they were two independent iid sequences drawn from absolutely continuous regular distributions in some bounded region in $R^{q}$ , and that the sequence $\{β_{i}\}_{i \in N}$ lies in an arbitrary bounded subset of $R$ , then conditions (CS2’) and (CS4’) are directly satisfied. The assumption of regular distributions with a bounded support for the model parameters is reasonable, since in IRT practice items with arbitrarily large discrimination are not realistic and items of arbitrarily high or low difficulty are avoided. Furthermore, in almost all cases, the latent vector is identifiable if q arbitrary items are given. Hence, (CS3’) is satisfied, too. The gradient of the response probabilities is given by

\begin{matrix} \nabla P_{i} (η) = ϕ_{1} (\frac{α_{i}^{⊺} η + β_{i}}{\sqrt{2} {(1 + exp (- δ_{i}^{⊺} η))}^{- 1 / 2}}) \\ \times (\sqrt{\frac{1 + exp (- δ_{i}^{⊺} η)}{2}} α_{i} - \frac{exp (- δ_{i}^{⊺} η)}{2 \sqrt{2}} \frac{α_{i}^{⊺} η + β_{i}}{\sqrt{1 + exp (- δ_{i}^{⊺} η)}} δ_{i}), \end{matrix}

for all $i \in N$ and $η \in Θ$ , where $ϕ_{q}$ denotes the pdf of $N_{q} (0, I_{q})$ . In particular, we see from (23), that ${\nabla P_{i} (η)}_{i \in N}$ behaves in almost all cases for all $η \in Θ$ as an iid sequence drawn from a regular distribution with bounded support in $R^{q}$ , since the parameters are iid distributed for all items and every $η \in Θ$ is considered separately, i.e., $η$ is held fixed. Exceptions are pathological cases like the one in which zero belongs to the support of the distributions of all model parameters and all model parameters equal zero, i.e., $P_{i} (η) = 0.5$ for all $η \in Θ$ and $i \in N$ . However, the subset of such cases is of zero probability for regular continuous distributions, i.e., is a null-set. Thus,

\begin{matrix} \frac{1}{d} \sum_{i = 1}^{d} \nabla P_{i} (η) \nabla P_{i} {(η)}^{⊺} \end{matrix}

converges to the second moment of the distribution of $\nabla P_{1} (η)$ , as a random vector formed by the multivariate transformation of the randomly selected parameter values described directly after equation (23), and is thus positive definite.
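The model (22) and its gradient (23) can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates $P_{i} (η)$ and $\nabla P_{i} (η)$ for $q = 2$ using the parameter values of item 1 in Table 1, and compares the analytic gradient with a central finite difference. The function names and the evaluation point `eta` are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    """CDF of N(0, 1), i.e. Phi_1 in model (22)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    """PDF of N(0, 1), i.e. phi_1 in the gradient (23)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def response_prob(eta, alpha, beta, delta):
    """P_i(eta) of model (22); dividing by s^(-1/2) equals multiplying by sqrt(s)."""
    s = 1.0 + math.exp(-dot(delta, eta))
    u = (dot(alpha, eta) + beta) * math.sqrt(s / 2.0)
    return std_normal_cdf(u)


def response_grad(eta, alpha, beta, delta):
    """Analytic gradient of P_i(eta), following (23) term by term."""
    e = math.exp(-dot(delta, eta))
    s = 1.0 + e
    lin = dot(alpha, eta) + beta
    u = lin * math.sqrt(s / 2.0)
    coef_alpha = math.sqrt(s / 2.0)
    coef_delta = e / (2.0 * math.sqrt(2.0)) * lin / math.sqrt(s)
    return [std_normal_pdf(u) * (coef_alpha * a - coef_delta * dl)
            for a, dl in zip(alpha, delta)]


def finite_difference_grad(eta, alpha, beta, delta, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    grad = []
    for k in range(len(eta)):
        up = list(eta)
        down = list(eta)
        up[k] += h
        down[k] -= h
        grad.append((response_prob(up, alpha, beta, delta)
                     - response_prob(down, alpha, beta, delta)) / (2.0 * h))
    return grad


# Item 1 of Table 1: (alpha_11, alpha_12), beta_1, (delta_11, delta_12).
alpha_1, beta_1, delta_1 = [0.946, -0.398], -0.906, [0.230, 0.327]
eta = [0.3, -0.5]  # arbitrary evaluation point (assumption)

analytic = response_grad(eta, alpha_1, beta_1, delta_1)
numeric = finite_difference_grad(eta, alpha_1, beta_1, delta_1)
max_abs_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
```

Agreement of `analytic` and `numeric` up to discretization error supports the form of (23); it does not by itself verify any of the conditions (CS1’) to (CS5’).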

With respect to (CS5’), note that (19) in (CS2’) implies that (21) is equivalent to

\begin{matrix} \underset{d \to \infty}{lim inf} ν_{min} (\frac{1}{d} \sum_{i = 1}^{d} I_{i} (η)) > 0 \text{ and to } \underset{d \to \infty}{lim inf} ν_{min} (\frac{1}{d} \sum_{i = 1}^{d} \nabla P_{i} (η) \nabla P_{i} {(η)}^{⊺}) > 0, \end{matrix}

which in our case ensures that (CS5’) is satisfied (cf. Lemma W.6 in the web-appendix).

For illustrative purposes, consider an example with $d = 30$ and $q = 2$ and model parameter values, as given in Table 1, which are independently drawn from a uniform distribution on $(- 2, 2)$ for $β_{i}$ , and on $(- 0.5, 1)$ for all other parameters.

Table 1.

Hypothetical parameter values for model (22) and the first 30 items.

i	$β_{i}$	$α_{i 1}$	$α_{i 2}$	$δ_{i 1}$	$δ_{i 2}$	i	$β_{i}$	$α_{i 1}$	$α_{i 2}$	$δ_{i 1}$	$δ_{i 2}$
1	-0.906	0.946	-0.398	0.230	0.327	16	-0.104	-0.498	0.298	0.359	0.622
2	-1.337	0.893	-0.085	0.696	0.622	17	0.458	0.496	0.267	0.677	0.551
3	-0.913	-0.363	0.703	0.052	0.605	18	-1.882	-0.259	-0.484	0.101	-0.022
4	0.387	0.166	0.313	-0.352	-0.447	19	0.801	0.066	-0.488	0.282	0.697
5	0.676	0.677	0.199	0.701	0.283	20	1.953	-0.169	0.963	-0.467	0.443
6	-0.084	0.865	0.833	0.957	0.724	21	-1.712	0.326	0.979	-0.146	-0.347
7	0.138	0.688	-0.444	0.942	-0.043	22	1.725	0.146	0.875	0.843	0.092
8	1.608	0.296	0.472	-0.096	-0.101	23	1.980	0.826	0.517	0.195	0.825
9	-0.882	0.016	-0.378	0.088	0.458	24	0.171	0.197	0.722	0.069	-0.442
10	1.240	0.320	0.924	-0.250	0.386	25	0.499	-0.101	-0.319	0.943	0.071
11	-0.919	0.867	-0.391	0.051	0.202	26	-1.775	0.530	-0.188	0.819	-0.336
12	0.216	0.093	0.028	0.152	0.339	27	1.523	-0.302	0.646	0.581	0.718
13	1.900	0.726	0.007	0.842	-0.072	28	1.548	0.272	0.469	-0.207	-0.241
14	0.510	-0.068	-0.105	0.132	-0.256	29	-0.823	-0.113	-0.147	0.027	-0.282
15	-1.109	0.072	-0.017	0.370	0.623	30	0.822	0.661	-0.485	-0.277	-0.017


In Fig. 1 (top), visualizations of $E_{η_{0}} (log (Z_{5} (η, η_{0})))$ are provided for two exemplary values of $η_{0} \in Θ$ and the parameters in Table 1. In particular, we can recognize lines in $Θ$ along which $E_{η_{0}} (log (Z_{5} (η, η_{0}))) = 0$ holds. In Fig. 1 (bottom), surfaces of $\frac{1}{30} \sum_{i = 1}^{30} E_{η_{0}} (log (Z_{i} (η, η_{0})))$ are illustrated for two further exemplary $η_{0}$ values. The surfaces are drawn over ${[- 1, 1]}^{2}$ . They show a nearly parabolic shape, which illustrates that there is no reason to doubt (20) in (CS3’).

Figure 2 provides the minimum over $η \in {[- 1, 1]}^{2}$ of the smallest eigenvalue of $\frac{1}{d} \sum_{i = 1}^{d} \nabla λ_{i} (η) \nabla λ_{i} {(η)}^{⊺}$ for $d \in \{1, \dots, 30\}$ , cf. (CS5’). Overall, in this case, the regularity conditions can be considered justified, so that the APN can be applied for arbitrary response patterns on the illustrated 30 items.

Fig. 2 — Minimal eigenvalue of $\frac{1}{d} \sum_{i = 1}^{d} \nabla λ_{i} (η) \nabla λ_{i} {(η)}^{⊺}$ for $η \in {[- 1, 1]}^{2}$ .

Conditions (CS1’) to (CS5’) for other models can be verified similarly. For example, the multidimensional Rasch model implemented in Pelle et al. (2016) is the logit model

\begin{matrix} logit (P (Y_{i} = 1 ∣ η)) : = log (\frac{P (Y_{i} = 1 ∣ η)}{1 - P (Y_{i} = 1 ∣ η)}) = α_{i 0} + α_{i}^{⊺} η, η \in R^{q}, i = 1, \dots, d, \end{matrix}

where $P (Y_{i} = 1 ∣ η)$ is the probability of inclusion in registration i, given the vector of latent variables. In this case,

\begin{matrix} \nabla λ_{i} (η) = α_{i}, i = 1, \dots, d, \end{matrix}

would replace (23) for $η \in Θ$ , while the subsequent arguments are the same as above.
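For the logit-linear model, where $\nabla λ_{i} (η) = α_{i}$ does not depend on $η$ , the eigenvalue check behind (CS5’) reduces to a matrix computation. The sketch below, for $q = 2$ , computes the smallest eigenvalue of $\frac{1}{d} \sum_{i = 1}^{d} α_{i} α_{i}^{⊺}$ for increasing d. The slope vectors are hypothetical; they are borrowed from the first five rows of Table 1 purely for illustration, and the function names are assumptions.

```python
import math


def average_outer_product(vectors):
    """Entries (a, b, c) of the symmetric 2x2 matrix (1/d) * sum_i v_i v_i^T."""
    d = len(vectors)
    a = sum(v[0] * v[0] for v in vectors) / d
    b = sum(v[0] * v[1] for v in vectors) / d
    c = sum(v[1] * v[1] for v in vectors) / d
    return a, b, c


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of [[a, b], [b, c]] in closed form."""
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)


# Hypothetical slopes (alpha_i1, alpha_i2), taken from items 1 to 5 of Table 1.
alphas = [(0.946, -0.398), (0.893, -0.085), (-0.363, 0.703),
          (0.166, 0.313), (0.677, 0.199)]

min_eigs = [min_eigenvalue_2x2(*average_outer_product(alphas[:d]))
            for d in range(1, len(alphas) + 1)]
```

For $d = 1 < q$ the matrix has rank one, so its smallest eigenvalue is zero up to rounding; as soon as two non-parallel slope vectors are included it becomes strictly positive.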

Another example is the two-dimensional model

\begin{matrix} λ_{i} (η) = α_{i 0} + α_{i 1} η_{1} + α_{i 2} η_{2} + α_{i 3} η_{1} η_{2}, i = 1, \dots, d, \end{matrix}

which is a logit model that contains an interaction term between the two latent variables and was considered by Rizopoulos (2006) (see Section 4). While it is still logit-linear in the model parameters $α_{ij}$ , $i \in [d]$ , $j \in [4]$ , it is no longer linear in the latent variables. However, with

\begin{matrix} \nabla λ_{i} (η) = (\begin{matrix} α_{i 1} + α_{i 3} η_{2} \\ α_{i 2} + α_{i 3} η_{1} \end{matrix}), η \in R^{2}, i = 1, \dots, d, \end{matrix}

the same arguments still apply (compare also to Rizopoulos and Moustaki (2008), who discuss MIRT models within a more general form of the generalized latent variable model, allowing nonlinear effects of latent variables).

Main Results

Our main contribution is the generalization of Theorems 1 and 3 of Chang and Stout (1993) to $q > 1$ , under the assumptions (CS1’) to (CS5’). Furthermore, we embed the CS-approach in the GGS framework (see Theorem 5 (iii)). Similarly to Chang and Stout (1993), the consistency of the MLE is obtained as a by-product, along with an assertion on its existence. Additionally, the consistency of a penalized MLE is derived. The results are provided in the next theorem, while their proofs, along with some required preliminary lemmas, are given in the Appendix.

Theorem 5

Let $Z \sim N_{q} (0, I_{q})$ be a q-variate standard normally distributed random vector and let $\{Y_{i}\}_{i \in N} \sim P (η_{0})$ be a sequence of binary response variables for a sequence of item response functions $\{P_{i}\}_{i \in N}$ satisfying (CS1’[i]), (CS2’) and (CS3’) for $η_{0} \in Θ \setminus \partial Θ$ . Then, the following statements hold:

(i)
There is a sequence ${{\hat{η}}_{d}}_{d \in N}$ of measurable mappings so that
$\begin{matrix} lim_{d \to \infty} P_{η_{0}} (\nabla ℓ^{(d)} ({\hat{η}}_{d} | Y^{(d)}) = 0) = 1, \\ lim_{d \to \infty} P_{η_{0}} (ℓ^{(d)} ({\hat{η}}_{d} ∣ Y^{(d)}) = max_{η \in Θ} ℓ^{(d)} (η ∣ Y^{(d)})) = 1 \end{matrix}$
and ${\hat{η}}_{d} \overset{P_{η_{0}}}{⟶} η_{0}$ for $d \to \infty$ .
(ii)
Statement (i) remains valid if $ℓ^{(d)}$ is replaced by the penalized log-likelihood
$\begin{matrix} {\tilde{ℓ}}^{(d)} (η ∣ Y^{(d)}) = ℓ^{(d)} (η ∣ Y^{(d)}) + log (W (η)), η \in Θ, d \in N, \end{matrix}$
for some continuously differentiable, positive and bounded function $W$ .

(iii)
If additionally (CS1’[ii]), (CS4’) and (CS5’) are satisfied, then the following statement holds: If $η_{0}$ is held fixed, then, for all $B \in B^{q}$ ,
$\begin{matrix} P (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (η - {\hat{η}}_{d}) \in B ∣ Y^{(d)}) \overset{P_{η_{0}}}{⟶} P (Z \in B) . \end{matrix}$ (26)
That is, the MLE ${\hat{η}}_{d}$ is a semiproper centering (cf. Definition 1). If $η_{0} \sim G$ , where $G$ is an absolutely continuous proper distribution with $supp (G) \subseteq Θ$ , then furthermore
$\begin{matrix} P (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (η - {\hat{η}}_{d}) \in B ∣ Y^{(d)}) \overset{P}{⟶} P (Z \in B), \end{matrix}$ (27)
for all $B \in B^{q}$ .

Remark 5

For $W = h$ , the penalized MLE in Theorem 5 (ii) becomes the maximum a-posteriori estimator (MAP), which is an important estimator for $η$ in IRT, also because it ensures the existence of estimates in cases where the MLE becomes infinite (for example when $\sum_{i = 1}^{d} y_{i} = 0$ or d). The restriction on $h$ in part (ii) is stronger than (CS1’[ii]), but still mild.
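The contrast between an infinite MLE and a finite MAP can be seen in a small numerical sketch. It is an illustration under simplifying assumptions, not part of the theory above: a univariate logit model $P_{i} (η) = 1 / (1 + exp (- (η - b_{i})))$ with hypothetical difficulties $b_{i}$ , an all-correct response pattern, $W = h$ taken as the standard normal density, and maximization over a finite grid.

```python
import math


def log_likelihood(eta, difficulties, responses):
    """Log-likelihood of a univariate logit model (illustrative assumption)."""
    total = 0.0
    for b, y in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(eta - b)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def log_std_normal_pdf(eta):
    """log W(eta) for W = h, the N(0, 1) density."""
    return -0.5 * eta * eta - 0.5 * math.log(2.0 * math.pi)


def argmax_on_grid(objective, grid):
    return max(grid, key=objective)


difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties
responses = [1, 1, 1, 1, 1]                 # sum of y_i equals d
grid = [k / 100.0 for k in range(-600, 601)]

# The log-likelihood is strictly increasing in eta, so its maximizer
# sits on the upper boundary of any finite grid.
mle_on_grid = argmax_on_grid(
    lambda e: log_likelihood(e, difficulties, responses), grid)

# The penalized log-likelihood of Theorem 5 (ii) has an interior maximizer.
map_on_grid = argmax_on_grid(
    lambda e: log_likelihood(e, difficulties, responses) + log_std_normal_pdf(e),
    grid)
```

Widening the grid moves `mle_on_grid` further out, whereas `map_on_grid` stays put, which is the behavior the remark describes.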

As already noted in Sect. 3, Theorem 5 (iii) can be used for the construction of credible regions for $η$ . Additionally, it allows the interpretation of the MLE as a Bayesian estimator of $η_{0}$ and thus enables the use of ${\hat{η}}_{d}$ to derive a kind of objective posterior, in the sense that it is constructed without reference to a prior.

An important concept in the asymptotic analysis of Bayesian procedures is the consistency of the posterior distribution, which forms a basis for the asymptotic validity of inferential methods, and is proved in Theorem 6 (i). The consistency of the EAP is stated in Theorem 6 (ii).

Theorem 6

Under the setup and the assumptions of Theorem 5 (iii), the following statements hold:

(i)
If $η_{0}$ is held fixed, then
$\begin{matrix} P (η \in B ∣ Y^{(d)}) \overset{P_{η_{0}}}{⟶} δ_{η_{0}} (B) : = \begin{cases} 1, & η_{0} \in B \\ 0, & η_{0} \notin B \end{cases} , d \to \infty, \end{matrix}$
for all Borel-sets $B \in B^{q}$ with $η_{0} \notin \partial B$ .
(ii)
Suppose that $η_{0}$ is held fixed and that there is a continuous mapping $f : Θ \to R$ so that $\int_{Θ} f (η) h (η) d η$ exists. Then, the posterior expected value $E (f (η) ∣ Y^{(d)})$ exists for all $d \in N$ and is weakly consistent for $f (η_{0})$ , i.e.,
$\begin{matrix} E (f (η) ∣ Y^{(d)}) \overset{P_{η_{0}}}{⟶} f (η_{0}), for d \to \infty . \end{matrix}$
If in particular $E (η)$ exists, then the posterior expected value $E (η ∣ Y^{(d)})$ exists for all $d \in N$ and is weakly consistent for $η_{0}$ .
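The concentration of the posterior and the consistency of the EAP stated in Theorem 6 can be illustrated numerically. The sketch below is a simplified illustration, not the setting of the theorem in full generality: it assumes a univariate logit model with hypothetical difficulties, a standard normal prior, a fixed true value `eta_true`, and a grid approximation of the posterior on $[- 4, 4]$ .

```python
import math
import random


def posterior_mean_sd(responses, difficulties, grid):
    """Grid approximation of the posterior mean (EAP) and standard deviation."""
    log_post = []
    for eta in grid:
        lp = -0.5 * eta * eta  # log of the N(0, 1) prior, up to a constant
        for y, b in zip(responses, difficulties):
            z = eta - b
            # log P(Y = y | eta) for the logit model, written stably
            lp -= math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
        log_post.append(lp)
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, grid)) / total
    var = sum(w * (e - mean) ** 2 for w, e in zip(weights, grid)) / total
    return mean, math.sqrt(var)


rng = random.Random(0)  # fixed seed for reproducibility
eta_true = 0.7          # hypothetical true latent trait
n_items = 1000
difficulties = [-2.0 + 4.0 * (i % 21) / 20.0 for i in range(n_items)]
responses = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(eta_true - b))) else 0
             for b in difficulties]
grid = [-4.0 + 0.02 * k for k in range(401)]

# Posterior summaries for nested tests of increasing length d.
summaries = {d: posterior_mean_sd(responses[:d], difficulties[:d], grid)
             for d in (10, 100, 1000)}
```

In this sketch the posterior standard deviation shrinks as d grows while the posterior mean settles near `eta_true`, in line with parts (i) and (ii).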

Simulation Study

The simulation study that follows examines the convergence to zero of the error for the approximation of the MLE-centered normalized posterior by a standard normal distribution and its relation to the convergence of the MLE, for the case of a bivariate latent variable vector ( $q = 2$ ). Convergences are evaluated based on the following measures. For the MLE, we use the root-mean-square error

\begin{matrix} RMSE ({\hat{η}}_{d}, η_{0}) = \sqrt{\frac{1}{2} ({({\hat{η}}_{d 1} - η_{01})}^{2} + {({\hat{η}}_{d 2} - η_{02})}^{2})} . \end{matrix}

For the approximation of the normalized posterior density $h^{*}$ by a bivariate normal pdf $ϕ_{2}$ , we compute the density approximation error (also known as $L^{1}$ -distance)

\begin{matrix} DAE (h^{*}, ϕ_{2}) = \int_{R^{2}} |h^{*} (η) - ϕ_{2} (η)| d η, \end{matrix}

the Hellinger-distance

\begin{matrix} HD (h^{*}, ϕ_{2}) = \sqrt{\frac{1}{2} \int_{R^{2}} {(\sqrt{h^{*} (η)} - \sqrt{ϕ_{2} (η)})}^{2} d η}, \end{matrix}

and the Kullback–Leibler divergence

\begin{matrix} KLD (h^{*}, ϕ_{2}) = \int_{R^{2}} log (\frac{ϕ_{2} (η)}{h^{*} (η)}) ϕ_{2} (η) d η . \end{matrix}

The simulation study is based on model (22) with the same item parameters across all replications, to mimic the situation that different persons respond to the same test. These are generated as in Sect. 6. For the structural model we assume $H = N_{2} (0, I_{2})$ , resulting in $Θ = R^{2}$ . The number of items d varies from 10 to 70 in steps of ten items, to mimic the asymptotic behavior with test lengthening. All involved integrals are approximated using an importance sampling Monte Carlo (MC) approximation with $N_{2} (0, I_{2})$ being the importance distribution.
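With $N_{2} (0, I_{2})$ as importance distribution, the three divergence measures become expectations of functions of the density ratio $r = h^{*} / ϕ_{2}$ under $ϕ_{2}$ : DAE is $E | r - 1 |$ , HD is $\sqrt{\frac{1}{2} E (\sqrt{r} - 1)^{2}}$ and KLD is $- E (log (r))$ . The following sketch illustrates this under an assumption made only for checking purposes: $h^{*}$ is replaced by the density of $N_{2} (μ, I_{2})$ with a hypothetical shift `mu`, for which KLD equals $‖ μ ‖^{2} / 2$ and the squared HD equals $1 - exp (- ‖ μ ‖^{2} / 8)$ . In the study itself $h^{*}$ is the normalized posterior density, whose normalizing constant must be estimated as well.

```python
import math
import random


def divergences_by_importance_sampling(log_ratio, n_draws=20000, seed=1):
    """MC estimates of (DAE, HD, KLD) using draws from N_2(0, I_2).

    log_ratio(x) must return log h*(x) - log phi_2(x) for a point x in R^2.
    """
    rng = random.Random(seed)
    dae_sum = hd2_sum = kld_sum = 0.0
    for _ in range(n_draws):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        lr = log_ratio(x)
        r = math.exp(lr)
        dae_sum += abs(r - 1.0)               # |h* - phi_2| / phi_2
        hd2_sum += (math.sqrt(r) - 1.0) ** 2  # (sqrt(h*) - sqrt(phi_2))^2 / phi_2
        kld_sum += -lr                        # log(phi_2 / h*)
    dae = dae_sum / n_draws
    hd = math.sqrt(0.5 * hd2_sum / n_draws)
    kld = kld_sum / n_draws
    return dae, hd, kld


mu = (0.5, 0.0)  # hypothetical mean shift of the stand-in density


def shifted_normal_log_ratio(x):
    """log of N_2(mu, I_2) density minus log of N_2(0, I_2) density."""
    return x[0] * mu[0] + x[1] * mu[1] - 0.5 * (mu[0] ** 2 + mu[1] ** 2)


dae, hd, kld = divergences_by_importance_sampling(shifted_normal_log_ratio)
```

Working with the log-ratio rather than the two densities separately avoids some underflow, although extreme ratios can still produce numerically infinite KLD values.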

We replicate 1000 times ( $ℓ = 1, \dots, 1000$ ) the following procedure.

1. Draw $η_{0}^{(ℓ)} \sim H$ .
2. Draw $y^{(70, ℓ)} = (y_{1}^{(ℓ)}, \dots, y_{70}^{(ℓ)})$ from model (22) with underlying true latent variable vector $η_{0}^{(ℓ)}$ and item parameter values as described above (setting $y_{i}^{(ℓ)} = 1$ if $y_{i}^{* (ℓ)} < P_{i} (η_{0}^{(ℓ)})$ and $y_{i}^{(ℓ)} = 0$ otherwise, where the $y_{i}^{* (ℓ)}$ are drawn iid from $U (0, 1)$ , $i = 1, \dots, 70$ ). Then set $y^{(d, ℓ)} = (y_{1}^{(ℓ)}, \dots, y_{d}^{(ℓ)})$ , for $d = 10, 20, \dots, 70$ .
3. Compute the MLE ${\hat{η}}_{d}^{(ℓ)}$ and the test information matrix $I^{(d)} ({\hat{η}}_{d}^{(ℓ)})$ , based on $y^{(d, ℓ)}$ , for $d = 10, 20, \dots, 70$ .
4. Derive the posterior pdf $h^{* (ℓ)}$ of the normalized latent vector $I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (η - {\hat{η}}_{d})$ , estimating its normalization constant by MC quadrature.
5. Compute ${RMSE}_{ℓ}^{(d)} = RMSE ({\hat{η}}_{d}^{(ℓ)}, η_{0}^{(ℓ)})$ , ${DAE}_{ℓ}^{(d)} = DAE (h^{* (ℓ)}, ϕ_{2})$ , ${HD}_{ℓ}^{(d)} = HD (h^{* (ℓ)}, ϕ_{2})$ and ${KLD}_{ℓ}^{(d)} = KLD (h^{* (ℓ)}, ϕ_{2})$ , for $d = 10, 20, \dots, 70$ .
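The first two steps of the procedure above, together with the RMSE measure, can be sketched as follows. This is a minimal illustration rather than the code used for the study: the item parameters are drawn from the uniform distributions used for Table 1, the seed and all function names are assumptions, and the MLE, information matrix and posterior computations of the later steps are omitted.

```python
import math
import random


def response_prob(eta, alpha, beta, delta):
    """P_i(eta) of model (22) for q = 2."""
    s = 1.0 + math.exp(-(delta[0] * eta[0] + delta[1] * eta[1]))
    u = (alpha[0] * eta[0] + alpha[1] * eta[1] + beta) * math.sqrt(s / 2.0)
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def draw_item_parameters(n_items, rng):
    """beta_i ~ U(-2, 2); all other parameters ~ U(-0.5, 1), as for Table 1."""
    items = []
    for _ in range(n_items):
        beta = rng.uniform(-2.0, 2.0)
        alpha = (rng.uniform(-0.5, 1.0), rng.uniform(-0.5, 1.0))
        delta = (rng.uniform(-0.5, 1.0), rng.uniform(-0.5, 1.0))
        items.append((alpha, beta, delta))
    return items


def draw_responses(eta0, items, rng):
    """Set y_i = 1 if a U(0, 1) draw falls below P_i(eta0), else y_i = 0."""
    return [1 if rng.random() < response_prob(eta0, a, b, dl) else 0
            for a, b, dl in items]


def rmse(eta_hat, eta0):
    """Root-mean-square error between an estimate and the true vector (q = 2)."""
    return math.sqrt(0.5 * ((eta_hat[0] - eta0[0]) ** 2
                            + (eta_hat[1] - eta0[1]) ** 2))


rng = random.Random(2022)                         # arbitrary seed (assumption)
items = draw_item_parameters(70, rng)             # fixed across replications
eta0 = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # draw from H = N_2(0, I_2)
y70 = draw_responses(eta0, items, rng)             # full response vector

# Nested response vectors y^(d) for d = 10, 20, ..., 70.
nested = {d: y70[:d] for d in range(10, 71, 10)}
```

Using prefixes of one response vector of length 70 keeps the data for different d nested within a replication, which mimics lengthening a single test.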

Our results are visualized in Fig. 3, where the box-plots of the RMSE, DAE, HD and KLD values computed above are shown for all d values considered. As expected, all evaluation measures and their ranges decrease in d.

Table 2 provides the average values of the evaluation measures, i.e., ${\bar{RMSE}}^{(d)} = \frac{\sum_{ℓ} {RMSE}_{ℓ}^{(d)}}{1000}$ , and ${\bar{DAE}}^{(d)}$ , ${\bar{HD}}^{(d)}$ , and ${\bar{KLD}}^{(d)}$ , defined analogously. Note that for relatively small numbers of items ( $d \leq 30$ ), we observed simulation cycles for which the Kullback–Leibler divergence was numerically infinite (due to floating-point arithmetic), indicating that the divergence between the two compared distributions was extremely large in these cases. In particular, this occurred in 112 cases for $d = 10$ , 20 cases for $d = 20$ and one case for $d = 30$ (out of 1000). These cases were excluded from the calculation of the corresponding average KLD-values reported in Table 2.

Table 2.

Average values of the root-mean-square error (RMSE) for the MLE along with the density approximation error (DAE), the Hellinger distance (HD) and the Kullback–Leibler divergence (KLD) between the density of the MLE-centered normalized posterior distribution and a bivariate standard normal density, based on 1000 simulations of model (22) with $q = 2$ , for different numbers of items d.

d	${\bar{RMSE}}^{(d)}$	${\bar{DAE}}^{(d)}$	${\bar{HD}}^{(d)}$	${\bar{KLD}}^{(d)}$
10	1.4105	1.1416	0.1330	3.0803
20	0.8206	0.8267	0.0885	1.2698
30	0.6730	0.7272	0.0750	0.8601
40	0.5658	0.6314	0.0634	0.5922
50	0.4552	0.5222	0.0515	0.3850
60	0.3859	0.4510	0.0439	0.2776
70	0.3530	0.4184	0.0403	0.2303


Figure 4 visualizes the relation of the divergence measures of the normalized posterior from the standardized normal distribution to the RMSE of the MLE for the values of Table 2, pictured for $d \geq 30$ . Observe that as d increases, ${\bar{DAE}}^{(d)}$ and ${\bar{HD}}^{(d)}$ are linear in ${\bar{RMSE}}^{(d)}$ , while ${\bar{KLD}}^{(d)}$ is linear in ${\bar{RMSE}}^{(d)} / \sqrt{d}$ . This is an indication that DAE and HD have the same rate of convergence to zero as RMSE while that of KLD is scaled by $d^{- 1 / 2}$ .

Fig. 4 — Linear regression of ${\bar{DAE}}^{(d)}$ and ${\bar{HD}}^{(d)}$ on ${\bar{RMSE}}^{(d)}$ and of ${\bar{KLD}}^{(d)}$ on ${\bar{RMSE}}^{(d)} / \sqrt{d}$ for $d \geq 30$ (s. Table 2).
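The linear relations described above can be recomputed directly from the averages in Table 2. The following sketch fits ordinary least-squares lines for $d \geq 30$ and reports the slope, intercept and Pearson correlation of each fit; the function name `linear_fit` is an assumption, and the numbers are copied from Table 2.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares of y on x: returns (slope, intercept, correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / math.sqrt(sxx * syy)


# Rows of Table 2 with d >= 30.
d_values = [30, 40, 50, 60, 70]
rmse_bar = [0.6730, 0.5658, 0.4552, 0.3859, 0.3530]
dae_bar = [0.7272, 0.6314, 0.5222, 0.4510, 0.4184]
hd_bar = [0.0750, 0.0634, 0.0515, 0.0439, 0.0403]
kld_bar = [0.8601, 0.5922, 0.3850, 0.2776, 0.2303]

# KLD is regressed on RMSE / sqrt(d), DAE and HD on RMSE itself.
rmse_scaled = [r / math.sqrt(d) for r, d in zip(rmse_bar, d_values)]

fit_dae = linear_fit(rmse_bar, dae_bar)
fit_hd = linear_fit(rmse_bar, hd_bar)
fit_kld = linear_fit(rmse_scaled, kld_bar)
```

With only five points per fit, a high correlation is suggestive of the stated rates rather than a formal confirmation.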

Notice that the convergence of the DAE to zero in probability is equivalent to proper centering. Thus, our simulation results suggest that the MLE is a proper centering for the multivariate version of the model of Lee and Bolt (2018) with the parameters we consider.

Simulation studies for other IRT models can be conducted similarly, expecting analogous results.

Discussion

In this work, we proved the APN of LTs under mild conditions that are fulfilled by a broad class of MIRT models for binary items. Furthermore, we obtained as by-products the existence and consistency of the MLE and the MAP estimator. Note that though the MLE is commonly known as consistent in IRT and MIRT settings, Sinharay (2015) indicated the lack of asymptotic results under milder conditions than some of the usual ones (such as test lengthening by strictly parallel forms). Thus, Theorem 5 (i) is a contribution toward this direction.

The distribution $G$ in Theorem 5 (iii) can be different from $H$ used in the model. Hence, the asymptotic result above is robust to misspecifications of $H$ as long as the support is sufficiently large. An interesting task for further investigation, pointed out by one of the reviewers, is the study of the effect of misspecified item response functions.

Under similar mild conditions we provided results on the weak consistency of posterior distributions. In Theorem 6 (ii), we get the existence and consistency of the expected a-posteriori estimator (EAP) for estimating $η_{0}$ as well as for estimating $f (η_{0})$ . To the best of our knowledge, a proof of these properties in such a general setup and under comparably mild or milder conditions on the MIRT model does not exist in the related literature.

Our results are derived under the assumption of a proper prior $h$ . This is appropriate in IRT settings, where the prior is a model of the population distribution of the latent traits. However, in a Bayesian framework, improper priors can also be considered. In this case, provided the posterior is still proper, the proper prior assumption in (CS1’[ii]) can be replaced by the following condition:

\begin{matrix} \frac{1}{P^{(d)} (Y^{(d)} ∣ η_{0})} \int_{Θ \setminus B_{δ} (η_{0})} P^{(d)} (Y^{(d)} ∣ η) h (η) d η = o_{P_{η_{0}}} (d^{- q / 2}), \text{ for all } δ > 0 . \end{matrix}

Condition (28) is sufficient for the derivation of the results stated here and is satisfied by a proper prior.

The APN for a univariate LT for polytomous items was discussed by Chang (1996). The extension of the results for MIRT models with polytomous items is the subject of our current research.

Here, we derived conditions for APN of LTs by generalizing the contribution of Chang and Stout (1993) to $q > 1$ . The methodology of GGS/IH, discussed in Sect. 4, provides a general framework for APN in various contexts, including MIRT. The results of Ghosal (1997, 1999) are helpful in deriving alternative conditions for APN tailored for MIRT models.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 409 KB)

Acknowledgements

The authors thank the associate editor and the reviewers for their constructive and useful comments on earlier versions of the manuscript.

Open Access

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: Proofs of Theorems in Section 7

Here, we prove the main results, Theorems 5 and 6, and provide the lemmas required for their proofs. More detailed versions of all proofs, including preliminary results, are provided in the web-appendix.

For the proof of Theorem 5 (i) the following lemma is required. This lemma ensures that for $d \to \infty$ , a global maximum of the log-likelihood has to be in an arbitrarily small area around $η_{0}$ , thus being the main step to prove the consistency of the MLE.

Lemma 1

Consider ${Y_{i}}_{i \in N} \sim P (η)$ and assume that conditions (CS1’[i]), (CS2’) and (CS3’) are satisfied for a fixed $η_{0} \in Θ$ . Then, for any $δ > 0$ there is a $k (δ) < 0$ so that

\begin{matrix} lim_{d \to \infty} P_{η_{0}} (sup_{η \in Θ \setminus B_{δ} (η_{0})} \frac{1}{d} (ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)})) < k (δ)) = 1 . \end{matrix}

Proof

A more detailed version of the proof can be found in the web-appendix (p. 3).

Consider an arbitrary $δ > 0$ . One can show that the associated sequence of item response functions ${P_{i}}_{i \in N}$ is equicontinuous on each compact set (compare Lemma W.1 in the web-appendix). This implies, applying the strong law of large numbers, that

\begin{matrix} lim_{d \to \infty} P_{η_{0}} (sup_{η \in K} \frac{1}{d} (ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)})) < c_{K}) = 1, \end{matrix}

for each compact $K \subset Θ$ for which a $δ > 0$ exists such that $B_{δ} (η_{0}) ⊄ K$ with a constant $c_{K} \leq {sup}_{η \in K} c (η) / 2 < 0$ (compare the more detailed version of the proof in the web-appendix). This is in particular true for

\begin{matrix} K = Θ_{j} : = \{η \in Θ : δ + j \leq ‖ η - η_{0} ‖ \leq δ + j + 1\} \end{matrix}

for each $j \in N_{0}$ and $δ > 0$ . Finally, we get with probability tending to one for $d \to \infty$ that

\begin{matrix} sup_{η \in Θ \setminus B_{δ} (η_{0})} \frac{1}{d} (ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)})) \\ = sup_{j \in N_{0}} (sup_{η \in Θ_{j}} \frac{1}{d} (ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)}))) \\ \leq sup_{j \in N_{0}} c_{j} \leq sup_{η \in Θ \setminus B_{δ} (η_{0})} c (η) / 2 = : 2 \cdot k (δ) . \end{matrix}

$□$

Proof of Theorem 5(i)–(ii)

A more detailed version of the proof can be found in the web-appendix (p. 7).

Analogously to the proof of Corollary 3.1 of Chang and Stout (1991), notice that

\begin{matrix} ℓ^{(d)} ({\hat{η}}_{d} | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)}) = log (\frac{P^{(d)} (Y^{(d)} | {\hat{η}}_{d})}{P^{(d)} (Y^{(d)} | η_{0})}) \geq 0, \end{matrix}

if the MLE exists, due to its definition as a global maximum. It follows from Lemma 1 that, with probability tending to one for $d \to \infty$ , every global maximum of the log-likelihood lies in an arbitrarily small region around $η_{0}$ , which implies consistency. The existence of the MLE, and its derivation as a solution of the likelihood equations, can be shown analogously to the classical iid case (e.g., Lehmann & Casella, 1998, Chapter 6, Theorem 5.1, p. 463).

Considering the modified log-likelihood function

\begin{matrix} {\tilde{ℓ}}^{(d)} (η ∣ Y^{(d)}) = ℓ^{(d)} (η ∣ Y^{(d)}) + log (W (η)), η \in Θ, d \in N, \end{matrix}

of part (ii), the consistency is obtained by replacing $ℓ$ by $\tilde{ℓ}$ in Lemma 1 and part (i) of this theorem. $□$

The following lemma ensures that the log-likelihood ratio can be well approximated by a quadratic form in the test information matrix, which is an essential part of the proof of Theorem 5(iii). Lemma 3 and Corollary 1, provided in the sequel, are additionally required for the proofs of Theorem 5(iii) and Theorem 6.

Lemma 2

Suppose that conditions (CS1’) through (CS5’) hold. Denote by $H_{d} (\cdot)$ the Hessian matrix of $ℓ^{(d)} (\cdot | Y^{(d)})$ . Set $Σ_{d} : = I^{(d)} {(η_{0})}^{- 1}$ , which is estimated by ${\hat{Σ}}_{d} = I^{(d)} {({\hat{η}}_{d})}^{- 1}$ , if $I^{(d)} {({\hat{η}}_{d})}^{- 1}$ exists, and by ${\hat{Σ}}_{d} = I_{q}$ otherwise, where $I_{q}$ is the $q \times q$ identity matrix, $d \in N$ . Then, we have the following.

There is a sequence $\{a_{d}\}_{d \in N}$ , $a_{d} \in [0, 1]$ , such that for
$\begin{matrix} R_{d} (η) : = {\hat{Σ}}_{d} (I^{(d)} ({\hat{η}}_{d}) + H_{d} (η_{d}^{*})) = I_{q} + I^{(d)} {({\hat{η}}_{d})}^{- 1} H_{d} (η_{d}^{*}) \end{matrix}$
where $η_{d}^{*} : = a_{d} {\hat{η}}_{d} + (1 - a_{d}) η$ , it holds with probability tending to 1 for $d \to \infty$ , that
$\begin{matrix} ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} ({\hat{η}}_{d} | Y^{(d)}) = \frac{1}{2} {(η - {\hat{η}}_{d})}^{⊺} H_{d} (η_{d}^{*}) (η - {\hat{η}}_{d}) \end{matrix}$ (A2)

$\begin{matrix} = - \frac{1}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (I_{q} - R_{d} (η)) (η - {\hat{η}}_{d}), \end{matrix}$ (A3)
$η \in Θ$ .
For any $ε > 0$ , there is a $δ > 0$ such that
$\begin{matrix} lim_{d \to \infty} P_{η_{0}} (sup_{η \in B_{δ} (η_{0})} ‖ R_{d} (η) ‖ < ε) = 1, \end{matrix}$ (A4)
where $‖ A ‖$ denotes the spectral norm for a matrix $A$ .
For any $ε > 0$ , there is a $δ > 0$ so that for all $η \in B_{δ} (η_{0})$
$\begin{matrix} lim_{d \to \infty} P_{η_{0}} ((1 + ε) Q_{d} (η) \leq - \frac{1}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (I_{q} - R_{d} (η)) (η - {\hat{η}}_{d}) \leq (1 - ε) Q_{d} (η)) = 1, \end{matrix}$
where
$\begin{matrix} Q_{d} (η) : = - \frac{1}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (η - {\hat{η}}_{d}), η \in B_{δ} (η_{0}), d \in N . \end{matrix}$

Recall that if a matrix $A$ is symmetric and positive definite, then $‖ A ‖ = ν_{max} (A)$ and $‖ A^{- 1} ‖ = ν_{min} {(A)}^{- 1}$ hold, where $ν_{max}$ and $ν_{min}$ denote the largest and smallest eigenvalues of a matrix.

Proof

An extended proof is provided in the web-appendix (p. 14).

Equation (A2) follows directly from a second-order Taylor expansion of $ℓ^{(d)} (η | Y^{(d)})$ at ${\hat{η}}_{d}$ . Theorem 5(i) and conditions (CS2’) and (CS5’) imply the existence of $I^{(d)} {({\hat{η}}_{d})}^{- 1}$ with probability tending to one for $d \to \infty$ (compare Lemma W.6 in the web-appendix) and, therefore, (A3). Condition (CS5’) further implies for some constant $C_{0} > 0$ and $d \to \infty$ that

\begin{matrix} ‖ R_{d} (η) ‖ & = ‖ (\frac{1}{d} I^{(d)} ({\hat{η}}_{d}))^{- 1} \frac{1}{d} (I^{(d)} ({\hat{η}}_{d}) + H_{d} (η_{d}^{*})) ‖ \\ \leq ‖ (\frac{1}{d} I^{(d)} ({\hat{η}}_{d}))^{- 1} ‖ \cdot ‖ \frac{1}{d} (I^{(d)} ({\hat{η}}_{d}) + H_{d} (η_{d}^{*})) ‖ \\ \leq \frac{1}{C_{0}} ‖ \frac{1}{d} (I^{(d)} ({\hat{η}}_{d}) + H_{d} (η_{d}^{*})) ‖ . \end{matrix}

One can show that the conditions imposed imply that

\begin{matrix} \{\frac{1}{d} I^{(d)} (\cdot)\}_{d \in N}, \{\frac{1}{d} H_{d} (\cdot)\}_{d \in N} \end{matrix}

are equicontinuous in every compact and convex region in $Θ$ (compare Lemma W.4 in the web-appendix on p. 11). Kolmogorov’s strong law of large numbers leads then to

\begin{matrix} ‖ R_{d} (η) ‖ \leq C_{1} ‖ η_{0} - η_{d}^{*} ‖ + C_{2} ‖ η_{0} - {\hat{η}}_{d} ‖ + o_{P_{η}} (1), \end{matrix}

for $d \to \infty$ and some appropriate constants $C_{1}, C_{2} > 0$ . Since the MLE is consistent and for every $η \in B_{δ} (η_{0})$ it holds

\begin{matrix} ‖ η_{d}^{*} - η_{0} ‖ \leq ‖ η_{0} - {\hat{η}}_{d} ‖ + ‖ η - η_{0} ‖, \end{matrix}

(recall $η_{d}^{*}$ lies between $η$ and $\hat{η_{d}}$ ), we get for $ε : = 2 C_{1} δ$

\begin{matrix} sup_{η \in B_{δ} (η_{0})} ‖ R_{d} (η) ‖ < ε + o_{P_{η}} (1) . \end{matrix}

Notice that

\begin{matrix} | x^{⊺} A B x | \leq \sqrt{\frac{ν_{max} (A)}{ν_{min} (A)}} ‖ B ‖ x^{⊺} A x, \end{matrix}

for every $x \in R^{q}$ , symmetric and positive definite $A \in R^{q \times q}$ and any further matrix $B \in R^{q \times q}$ . The final part follows by selecting $A = I^{(d)} ({\hat{η}}_{d})$ and $B = R_{d} (η_{d}^{*})$ for $d \to \infty$ . $□$
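The matrix inequality used in the final step of the proof above can be verified as follows. This is a short supplementary derivation, assuming only that $A$ is symmetric and positive definite, so that its symmetric square root $A^{1 / 2}$ exists with $‖ A^{1 / 2} ‖ = \sqrt{ν_{max} (A)}$ and $‖ A^{- 1 / 2} ‖ = 1 / \sqrt{ν_{min} (A)}$ .

```latex
\begin{aligned}
| x^{\top} A B x |
  &= \bigl| (A^{1/2} x)^{\top} \, \bigl( A^{1/2} B A^{-1/2} \bigr) \, (A^{1/2} x) \bigr| \\
  &\leq \bigl\| A^{1/2} B A^{-1/2} \bigr\| \; \bigl\| A^{1/2} x \bigr\|_{2}^{2} \\
  &\leq \bigl\| A^{1/2} \bigr\| \, \| B \| \, \bigl\| A^{-1/2} \bigr\| \; x^{\top} A x \\
  &= \sqrt{\frac{\nu_{\max}(A)}{\nu_{\min}(A)}} \; \| B \| \; x^{\top} A x .
\end{aligned}
```

The first line inserts $A^{- 1 / 2} A^{1 / 2} = I_{q}$ between $B$ and $x$ , the second uses the definition of the spectral norm, and the third uses its submultiplicativity together with $‖ A^{1 / 2} x ‖_{2}^{2} = x^{⊺} A x$ .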

Lemma 3

Let $\tilde{Φ} (B) : = P (Z \in B)$ for all $B \in B^{q}$ with $Z \sim N_{q} (0, I_{q})$ . Consider a sequence ${Y_{i}}_{i \in N}$ for a fixed $η_{0} \in Θ$ , for which conditions (CS1’ [i]), (CS2’) through (CS5’), and either (CS1’ [ii]) or (28) are satisfied. Then, the following holds.

For every function f that is either absolutely bounded by a constant $c > 0$ or for which the integral $\int_{R^{q}} f (η) h (η) d η$ exists, and for every $δ > 0$ , it holds
$\begin{matrix} \frac{\int_{R^{q} \setminus B_{δ} (η_{0})} f (η) P^{(d)} (Y^{(d)} ∣ η) h (η) d η}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} \overset{P_{η_{0}}}{⟶} 0, d ⟶ \infty . \end{matrix}$ (A5)
Consider a sequence $\{G_{d}\}_{d \in N}$ with $G_{d} : (Θ, B (Θ)) \to (Θ, B (Θ))$ satisfying either
$\begin{matrix} lim_{d \to \infty} P_{η_{0}} (G_{d} (B) \subset B_{δ} (η_{0})) = 1, \text{ for all } δ > 0, \end{matrix}$ (A6)
or
$\begin{matrix} lim_{d \to \infty} P_{η_{0}} (G_{d} (B) \supset B_{δ} (η_{0})) = 1, \text{ for one } δ > 0, \end{matrix}$ (A7)
for all bounded $B \in B^{q}$ . Then, for $d ⟶ \infty$ , it holds
$\begin{matrix} \frac{\int_{G_{d} (B)} P^{(d)} (Y^{(d)} ∣ η) h (η) d η}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} \\ - \tilde{Φ} (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (G_{d} (B) - {\hat{η}}_{d})) h (η_{0}) {(2 π)}^{q / 2} = o_{P_{η_{0}}} (1) . \end{matrix}$
In particular, in case of (A7), it holds
$\begin{matrix} \tilde{Φ} (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (G_{d} (B) - {\hat{η}}_{d})) = P (Z \in I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (G_{d} (B) - {\hat{η}}_{d})) \overset{P_{η_{0}}}{⟶} 1 . \end{matrix}$

Proof

A more detailed version of the proof can be found in the web-appendix (p. 17).

1. With regard to the left-hand side of (A5), note that, in terms of the log-likelihood function, it can be written as

\begin{matrix} \frac{\int_{R^{q} \setminus B_{δ} (η_{0})} f (η) P^{(d)} (Y^{(d)} ∣ η) h (η) d η}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} = exp (ℓ^{(d)} (η_{0} | Y^{(d)}) - ℓ^{(d)} ({\hat{η}}_{d} | Y^{(d)})) \frac{T_{d}}{det ({\hat{Σ}}_{d}^{1 / 2})}, \end{matrix}

where

\begin{matrix} T_{d} : = \int_{R^{q} \setminus B_{δ} (η_{0})} f (η) exp (ℓ^{(d)} (η | Y^{(d)}) - ℓ^{(d)} (η_{0} | Y^{(d)})) h (η) d η, \end{matrix}

while it always fulfills

\begin{matrix} |\frac{\int_{R^{q} \setminus B_{δ} (η_{0})} f (η) P^{(d)} (Y^{(d)} ∣ η) h (η) d η}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})}| \leq |\frac{T_{d}}{det ({\hat{Σ}}_{d}^{1 / 2})}| . \end{matrix}

If $H$ is improper and f is bounded by a constant, (28) directly implies

\begin{matrix} |\frac{T_{d}}{det ({\hat{Σ}}_{d}^{1 / 2})}| = o_{P_{η_{0}}} (1) . \end{matrix}

In any other case, Lemma 1 leads to

\begin{matrix} lim_{d \to \infty} P_{η_{0}} (| T_{d} | \leq E (| f (η) |) exp (d c (δ))) = 1 . \end{matrix}

Finally, from the polynomial growth of $det ({\hat{Σ}}_{d}^{1 / 2})^{- 1}$ , it follows that

\begin{matrix} \frac{E (| f (η) |) exp (d c (δ))}{det ({\hat{Σ}}_{d}^{1 / 2})} \overset{P_{η_{0}}}{⟶} 0 . \end{matrix}

2. Let $B \in B^{q}$ be an arbitrary bounded Borel set and define for $δ > 0$ and $d \in N$ the set

\begin{matrix} M_{δ, d} : = B_{δ} (η_{0}) \cap G_{d} (B) \end{matrix}

and the integral

\begin{matrix} V_{d} : = & \int_{M_{δ, d}} P^{(d)} (Y^{(d)} | η) h (η) d η . \end{matrix}

Using the definition of $R_{d} (η)$ in Lemma 2, it holds

\begin{matrix} \begin{matrix} \frac{V_{d}}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} \\ = \frac{h (η_{0})}{det ({\hat{Σ}}_{d}^{1 / 2})} \int_{M_{δ, d}} \frac{h (η)}{h (η_{0})} exp (- \frac{1}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (I - R_{d} (η)) (η - {\hat{η}}_{d})) d η . \end{matrix} \end{matrix}

By (CS1’), i.e., the continuity of $h$ and $h (η_{0}) > 0$ , it follows that for every $ε_{1} > 0$ there exists a $δ_{1} > 0$ such that

\begin{matrix} 1 - ε_{1} \leq inf_{η \in M_{δ_{1}, d}} \frac{h (η)}{h (η_{0})} \leq sup_{η \in M_{δ_{1}, d}} \frac{h (η)}{h (η_{0})} \leq 1 + ε_{1} . \end{matrix}

A10

Furthermore, by Lemma 2 we get for any $ε_{2} > 0$ and appropriate $δ_{2} = δ_{2} (ε_{2}) > 0$ :

\begin{matrix} (1 - o_{P_{η_{0}}} (1)) \int_{M_{δ_{2}, d}} exp (- \frac{1 + ε_{2}}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (η - {\hat{η}}_{d})) d η \\ \leq \int_{M_{δ_{2}, d}} exp (- \frac{1}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (I - R_{d} (η_{d}^{*})) (η - {\hat{η}}_{d})) d η \\ \leq (1 + o_{P_{η_{0}}} (1)) \int_{M_{δ_{2}, d}} exp (- \frac{1 - ε_{2}}{2} {(η - {\hat{η}}_{d})}^{⊺} I^{(d)} ({\hat{η}}_{d}) (η - {\hat{η}}_{d})) d η . \end{matrix}

A11

Next, (A10) and (A11) imply

\begin{matrix} \tilde{Φ} (\sqrt{1 + ε_{2}} I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (M_{δ_{2}, d} - {\hat{η}}_{d})) h (η_{0}) {(2 π)}^{q / 2} \frac{1 - ε_{1}}{{(1 + ε_{2})}^{q / 2}} (1 - o_{P_{η_{0}}} (1)) \\ \leq \frac{V_{d}}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} \\ \leq \tilde{Φ} (\sqrt{1 - ε_{2}} I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (M_{δ_{2}, d} - {\hat{η}}_{d})) h (η_{0}) {(2 π)}^{q / 2} \frac{1 + ε_{1}}{{(1 - ε_{2})}^{q / 2}} (1 + o_{P_{η_{0}}} (1)) . \end{matrix}

A12

In the case of (A6), it holds ${lim}_{d \to \infty} P_{η_{0}} (G_{d} (B) = M_{δ_{2}, d}) = 1$ . Selecting $ε_{1}$ and $ε_{2}$ arbitrarily small leads to

\begin{matrix} \frac{\int_{G_{d} (B)} P^{(d)} (Y^{(d)} ∣ η) h (η) d η}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} = \tilde{Φ} (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (G_{d} (B) - {\hat{η}}_{d})) h (η_{0}) {(2 π)}^{q / 2} + o_{P_{η_{0}}} (1) . \end{matrix}

In the case of equation (A7), we get for each $δ_{2} < δ$ : ${lim}_{d \to \infty} P_{η_{0}} (B_{δ_{2}} (η_{0}) = M_{δ_{2}, d}) = 1$ . Condition (CS5’) implies

\begin{matrix} \tilde{Φ} (\sqrt{1 + ε_{2}} I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (B_{δ_{2}} (η_{0}) - {\hat{η}}_{d})) \overset{P_{η_{0}}}{⟶} 1 . \end{matrix}

Finally, selecting $ε_{1}, ε_{2} > 0$ arbitrarily small in (A12), which remains valid here, and applying Lemma 3(1.) to $f = \mathbb{1}_{G_{d} (B) \setminus B_{δ_{2}} (η_{0})}$ complete the proof. $□$
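The closing step of part 1 rests on exponential decay beating polynomial growth. The following rough numerical sketch assumes $c (δ) < 0$ (which the argument needs for the bound to vanish) and that $det ({\hat{Σ}}_{d}^{1 / 2})^{- 1}$ grows like $d^{q / 2}$. The values q = 3 and c = -0.01 are hypothetical and not taken from the paper, and the constant $E (| f (η) |)$ is dropped.

```python
import math


def log_tail_bound(d, q, c):
    """Log of d**(q/2) * exp(d*c), a stand-in for exp(d c(delta)) / det(Sigma_d^{1/2}).

    Assumption: det(Sigma_d^{1/2})**(-1) is of order d**(q/2) and c = c(delta) < 0.
    """
    return 0.5 * q * math.log(d) + d * c


def tail_bound(d, q, c):
    # Exponentiate only at the end to keep the computation numerically stable.
    return math.exp(log_tail_bound(d, q, c))


if __name__ == "__main__":
    q, c = 3, -0.01  # hypothetical dimension and decay rate
    for d in (10, 100, 1_000, 10_000):
        print(d, tail_bound(d, q, c))
```

With these illustrative values the bound first increases with d (up to about d = q / (2 |c|)) before it collapses, a reminder that the statement is purely asymptotic.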

For $G_{d} (B) : = R^{q}$ , Lemma 3(2.) leads directly to the following Corollary.

Corollary 1

Suppose that, for a fixed $η_{0} \in Θ$ , the sequence ${Y_{i}}_{i \in N}$ satisfies conditions (CS1’) through (CS5’). Then, for $d \to \infty$ , it holds that

\begin{matrix} (\frac{P (Y^{(d)})}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})})^{- 1} \overset{P_{η_{0}}}{⟶} \frac{1}{h (η_{0}) {(2 π)}^{q / 2}} . \end{matrix}
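Corollary 1 can be checked numerically in a simple special case. The sketch below assumes a unidimensional Rasch model (q = 1), a standard normal prior density h, and ${\hat{Σ}}_{d} = I^{(d)} {({\hat{η}}_{d})}^{- 1}$ , so that $det ({\hat{Σ}}_{d}^{1 / 2}) = I^{(d)} {({\hat{η}}_{d})}^{- 1 / 2}$ . All function names, the item difficulties and the seed are illustrative assumptions. Under these assumptions the ratio in the corollary should approach $h (η_{0}) \sqrt{2 π} = exp (- η_{0}^{2} / 2)$ .

```python
import numpy as np


def simulate_rasch(eta0, b, rng):
    """Draw binary responses from a Rasch model with difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(eta0 - b)))
    return (rng.random(b.size) < p).astype(float)


def rasch_loglik(eta, y, b):
    """Log-likelihood evaluated at every value in eta (scalar or 1-D array)."""
    z = np.atleast_1d(eta)[:, None] - b[None, :]
    return z @ y - np.logaddexp(0.0, z).sum(axis=1)


def mle_and_information(y, b, iters=50):
    """Newton iterations for the MLE and the Fisher information at the MLE."""
    eta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(eta - b)))
        eta += (y - p).sum() / (p * (1.0 - p)).sum()
    p = 1.0 / (1.0 + np.exp(-(eta - b)))
    return eta, (p * (1.0 - p)).sum()


def corollary_ratio(y, b):
    """P(Y) / (P(Y | eta_hat) * det(Sigma_hat^{1/2})) by grid integration."""
    grid = np.linspace(-8.0, 8.0, 3201)
    eta_hat, info = mle_and_information(y, b)
    log_h = -0.5 * grid**2 - 0.5 * np.log(2.0 * np.pi)  # standard normal prior
    log_integrand = rasch_loglik(grid, y, b) - rasch_loglik(eta_hat, y, b)[0] + log_h
    step = grid[1] - grid[0]
    integral = np.exp(log_integrand).sum() * step  # equals P(Y) / P(Y | eta_hat)
    return integral * np.sqrt(info), eta_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eta0 = 0.5
    for d in (50, 500, 2000):
        b = rng.uniform(-2.0, 2.0, size=d)
        y = simulate_rasch(eta0, b, rng)
        ratio, eta_hat = corollary_ratio(y, b)
        print(d, ratio, np.exp(-0.5 * eta0**2))
```

The integrand is evaluated on the log scale relative to the maximized likelihood, which avoids underflow of $P^{(d)} (Y^{(d)} ∣ η)$ for long tests.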

Proof of Theorem 5(iii)

An extended proof can be found in the web-appendix (p. 21).

Analogously to the proof of Lemma 2, we can assume without loss of generality that $I^{(d)} {({\hat{η}}_{d})}^{- 1 / 2}$ exists. Set

\begin{matrix} G_{d} (B) : = {{\hat{Σ}}_{d}^{1 / 2} x + {\hat{η}}_{d} : x \in B} \equiv I^{(d)} {({\hat{η}}_{d})}^{- 1 / 2} B + {\hat{η}}_{d}, B \in B^{q}, d \in N . \end{matrix}

Then, (26) for bounded B follows directly from the reformulation

\begin{matrix} P (I^{(d)} {({\hat{η}}_{d})}^{1 / 2} (η - {\hat{η}}_{d}) \in B | Y^{(d)}) \\ = \frac{\int_{G_{d} (B)} P^{(d)} (Y^{(d)} | η) h (η) d η}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} (\frac{P (Y^{(d)})}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})})^{- 1} \end{matrix}

due to Lemma 3 part 2 and Corollary 1. The case of unbounded B can be shown by considering a decomposition $B = ⋃_{m \in N} B_{m}$ for bounded and pairwise disjoint ${B_{m}}_{m \in N}$ (compare to the web-appendix, p. 21–22). Next, set for all $d \in N$ , $ε > 0$ and $η^{'} \in Θ$

\begin{matrix} H_{d, ε} (η^{'}) : = P (| \int_{G_{d} (B)} h (η | Y^{(d)}) d η - \int_{B} ϕ (x) d (x) | > ε | η_{0} = η^{'}), \end{matrix}

where $ϕ$ is the pdf of $N_{q} (0, I_{q})$ . Then, (27) follows from

\begin{matrix} lim_{d \to \infty} P (| \int_{G_{d} (B)} h (η | Y^{(d)}) d η - \int_{B} ϕ (x) d (x) | > ε) & = lim_{d \to \infty} \int_{Θ} H_{d, ε} d G \\ = \int_{Θ} lim_{d \to \infty} H_{d, ε} d G = 0, \end{matrix}

for each $ε > 0$ , which is valid due to (26) and Lebesgue’s theorem of dominated convergence. $□$
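The statement just proved can be illustrated for q = 2. The sketch below assumes a two-dimensional 2PL-type model with simple structure (each item loads on one trait), a standard bivariate normal prior, and a ball B of radius r, for which the limiting normal probability is $1 - exp (- r^{2} / 2)$ . The item parameters, function names and grid settings are hypothetical choices. Because B is a ball, a Cholesky factor of $I^{(d)} ({\hat{η}}_{d})$ can replace the symmetric square root without changing the set $G_{d} (B)$ .

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_items(d, rng):
    """Hypothetical simple-structure items: first half loads on trait 1, rest on trait 2."""
    a = np.zeros((d, 2))
    a[: d // 2, 0] = rng.uniform(0.5, 1.5, d // 2)
    a[d // 2 :, 1] = rng.uniform(0.5, 1.5, d - d // 2)
    return a, rng.uniform(-1.0, 1.0, d)


def simulate_m2pl(eta0, a, b, rng):
    return (rng.random(b.size) < sigmoid(a @ eta0 - b)).astype(float)


def fisher_information(eta, a, b):
    p = sigmoid(a @ eta - b)
    return (a * (p * (1.0 - p))[:, None]).T @ a


def mle(y, a, b, iters=50):
    """Newton iterations started at the origin."""
    eta = np.zeros(a.shape[1])
    for _ in range(iters):
        score = a.T @ (y - sigmoid(a @ eta - b))
        eta = eta + np.linalg.solve(fisher_information(eta, a, b), score)
    return eta


def posterior_mass_of_ball(y, a, b, radius, half_width=6.0, n=121):
    """Posterior probability that the standardized trait vector lies in a ball."""
    eta_hat = mle(y, a, b)
    L = np.linalg.cholesky(fisher_information(eta_hat, a, b))  # I = L L^T
    xs = np.linspace(-half_width, half_width, n)
    x = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)  # standardized grid
    eta = eta_hat + x @ np.linalg.inv(L)  # (eta - eta_hat)^T I (eta - eta_hat) = |x|^2
    z = eta @ a.T - b
    log_post = z @ y - np.logaddexp(0.0, z).sum(axis=1) - 0.5 * (eta**2).sum(axis=1)
    w = np.exp(log_post - log_post.max())  # unnormalized posterior on the grid
    inside = (x**2).sum(axis=1) <= radius**2
    return w[inside].sum() / w.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = make_items(400, rng)
    y = simulate_m2pl(np.array([0.3, -0.2]), a, b, rng)
    r = np.sqrt(2.0)
    print(posterior_mass_of_ball(y, a, b, r), 1.0 - np.exp(-0.5 * r**2))
```

The grid covers six standardized units in each direction, so the posterior mass outside it is treated as negligible. This is an approximation of the integral over $R^{q}$ , not an exact evaluation.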

Proof of Theorem 6

A more detailed proof is provided in the web-appendix (p. 23).

Part (i) follows directly from Lemma 3 and Corollary 1 by using the reformulation

\begin{matrix} H (B | Y^{(d)}) = \frac{\int_{B} P^{(d)} (Y^{(d)} | η) h (η) d η}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})} (\frac{P (Y^{(d)})}{P^{(d)} (Y^{(d)} | {\hat{η}}_{d}) det ({\hat{Σ}}_{d}^{1 / 2})})^{- 1}, \end{matrix}

A13

for an arbitrary $B \in B^{q}$ with $η_{0} \notin \partial B$ .

Next, we prove part (ii). In a first step, the existence of $E (f (η) ∣ Y^{(d)})$ for all functions $f : Θ \to R$ , which are continuous and for which the integral $\int_{Θ} f (η) h (η) d η$ exists, will be proved. In a second step its consistency for $f (η_{0})$ will be discussed.

For every $d \in N$ , it holds

\begin{matrix} \int_{Θ} | f (η) | H (d η ∣ y^{(d)}) & = \frac{\int_{Θ} | f (η) | P^{(d)} (y^{(d)} ∣ η) h (η) d η}{P^{(d)} (y^{(d)})} \leq \frac{\int_{Θ} | f (η) | h (η) d η}{P^{(d)} (y^{(d)})} < \infty, \end{matrix}

for all $y^{(d)} \in {0, 1}^{d}$ , since $P^{(d)} (y^{(d)} ∣ η) \in (0, 1)$ , $P^{(d)} (y^{(d)})$ is positive and independent of $η \in Θ$ , and $\int_{Θ} | f (η) | h (η) d η$ exists if and only if $\int_{Θ} f (η) h (η) d η$ exists. Hence, $E (f (η) ∣ Y^{(d)})$ exists. Furthermore, it remains integrable for $d \to \infty$ , as shown next. Notice that the last statement does not follow directly, since $P^{(d)} (y^{(d)}) ⟶ 0$ for any sequence ${y_{i}}_{i \in N}$ and $d ⟶ \infty$ .

Adjusting representation (A13), we have

\begin{matrix} \int_{Θ} | f (η) | H (d η ∣ Y^{(d)}) = \frac{\int_{Θ} | f (η) | P^{(d)} (Y^{(d)} ∣ η) h (η) d η}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det {({\hat{Σ}}_{d})}^{1 / 2}} {(\frac{P^{(d)} (Y^{(d)})}{P^{(d)} (Y^{(d)} ∣ {\hat{η}}_{d}) det ({\hat{Σ}}_{d})^{1 / 2}})}^{- 1} . \end{matrix}

Lemma 3 and Corollary 1 imply

\begin{matrix} lim_{d \to \infty} P_{η_{0}} (\int_{Θ} | f (η) | H (d η ∣ Y^{(d)}) < C_{1}) = 1, \end{matrix}

with ${sup}_{η \in B_{δ} (η_{0})} | f (η) | = : C_{1} < \infty$ for an arbitrary $δ > 0$ .

Further, for every $δ > 0$ it holds

\begin{matrix} | \int_{Θ} f (η) H (d η ∣ Y^{(d)}) - f (η_{0}) | = | \int_{Θ} f (η) H (d η ∣ Y^{(d)}) - f (η_{0}) H (Θ ∣ Y^{(d)}) | \\ \leq | \int_{B_{δ} (η_{0})} (f (η) - f (η_{0})) H (d η ∣ Y^{(d)}) | + | \int_{Θ \setminus B_{δ} (η_{0})} f (η) H (d η ∣ Y^{(d)}) | \\ + | f (η_{0}) | \cdot H (Θ \setminus B_{δ} (η_{0}) ∣ Y^{(d)}), \end{matrix}

where the last two terms converge to zero in probability by Lemma 3, Corollary 1 and part (i). Last, the continuity of f implies for each $ε > 0$ and appropriate $δ^{'} = δ^{'} (ε)$ :

\begin{matrix} | \int_{B_{δ^{'}} (η_{0})} (f (η) - f (η_{0})) H (d η ∣ Y^{(d)}) | \leq ε \cdot H (B_{δ^{'}} (η_{0}) ∣ Y^{(d)}) \leq ε . \end{matrix}

The second part follows directly by considering the mappings $η \mapsto η_{j}$ , $j \in {1, \dots, q}$ , in the first part, which are continuous and by assumption integrable. $□$
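The consistency of the expected a-posteriori (EAP) estimator can be illustrated with a small simulation. The sketch below assumes a unidimensional Rasch model with a standard normal prior and computes $E (η ∣ Y^{(d)})$ on a grid. The difficulty range, the seed and all function names are hypothetical. It illustrates the consistency statement and is not the simulation design used in the paper.

```python
import numpy as np


def eap_estimate(y, b, grid):
    """Posterior mean of eta under a Rasch likelihood and a standard normal prior."""
    z = grid[:, None] - b[None, :]
    log_post = z @ y - np.logaddexp(0.0, z).sum(axis=1) - 0.5 * grid**2
    w = np.exp(log_post - log_post.max())  # unnormalized posterior weights
    return (grid * w).sum() / w.sum()


def eap_errors(eta0, test_lengths, seed=0):
    """Absolute EAP error |E(eta | Y) - eta0| for one simulated test per length."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-8.0, 8.0, 1601)
    errors = {}
    for d in test_lengths:
        b = rng.uniform(-2.0, 2.0, size=d)  # hypothetical item difficulties
        p = 1.0 / (1.0 + np.exp(-(eta0 - b)))
        y = (rng.random(d) < p).astype(float)
        errors[d] = abs(eap_estimate(y, b, grid) - eta0)
    return errors


if __name__ == "__main__":
    for d, err in eap_errors(0.5, (20, 200, 2000)).items():
        print(d, err)
```

A single replication per test length is noisy, so the errors need not decrease monotonically. Averaging over many replications would give a clearer picture of the convergence.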

Funding

Open Access funding enabled and organized by Projekt DEAL.

Footnotes

Mia J. K. Kornely appreciates the financial support of the Heinrich-Böll-Stiftung e.V. in the form of a PhD scholarship (Grant No. P127357).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Anderson CJ, Li Z, Vermunt JK. Estimation of models in a Rasch family for polytomous items and multiple latent variables. Journal of Statistical Software. 2007;20(6):1–36. doi: 10.18637/jss.v020.i06.
Anderson CJ, Vermunt JK. Log-multiplicative association models as latent variable models for nominal and/or ordinal data. Sociological Methodology. 2000;30:81–121. doi: 10.1111/0081-1750.00076.
Anderson, C. J., & Yu, H.-T. (2017). Properties of Second-Order Exponential Models as Multidimensional Response Models. In L. A. van der Ark, M. Wiberg, S. Culpepper, J. A. Douglas, & W. C. Wang (Eds.), Quantitative Psychology. IMPS 2016. Springer Proceedings in Mathematics & Statistics (Vol. 196). Springer.
Chang H-H. The asymptotic posterior normality of the latent trait for polytomous IRT models. Psychometrika. 1996;61(3):445–463. doi: 10.1007/BF02294549.
Chang, H.-H., & Stout, W. (1991). The asymptotic posterior normality of the latent trait in an IRT model. Technical Report ONR Research Report 91-4, Department of Statistics, University of Illinois at Urbana-Champaign.
Chang H-H, Stout W. The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika. 1993;58(1):37–52. doi: 10.1007/BF02294469.
Ghosal S. Normal approximation to the posterior distribution for generalized linear models with many covariates. Mathematical Methods of Statistics. 1997;6:332–348.
Ghosal S. Asymptotic normality of posterior distributions in high dimensional linear models. Bernoulli. 1999;5:315–331. doi: 10.2307/3318438.
Ghosal S, Ghosh JK, Samanta T. On convergence of posterior distributions. The Annals of Statistics. 1995;23(6):2145–2152. doi: 10.1214/aos/1034713651.
Ghosh, J. K., Ghosal, S., & Samanta, T. (1994). Stability and convergence of the posterior in non-regular problems. Statistical Decision Theory and Related Topics V (pp. 183–199). Springer.
Hessen DJ. Fitting and testing conditional multinormal partial credit models. Psychometrika. 2012;77(4):693–709. doi: 10.1007/s11336-012-9277-1.
Holland PW. The Dutch identity: A new tool for the study of item response models. Psychometrika. 1990;55(1):5–18. doi: 10.1007/BF02294739.
Ibragimov, I. A., & Has’minskii, R. Z. (1981). Statistical estimation: Asymptotic theory. Springer.
Kornely, M. J. K. (2021). Multidimensional Modeling and Inference of Dichotomous Item Response Data. PhD thesis, RWTH Aachen University, Germany.
Lee S, Bolt DM. An alternative to the 3pl: Using asymmetric item characteristic curves to address guessing effects. Journal of Educational Measurement. 2018;55(1):90–111. doi: 10.1111/jedm.12165.
Lehmann, E. L., & Casella, G. (1998). Theory of point estimation (2nd ed.). Springer.
Li, Z. (2010). Loglinear models as item response models. PhD thesis, University of Illinois at Urbana-Champaign.
Lord FM. Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika. 1983;48(2):233–245. doi: 10.1007/BF02294018.
Paek, Y. (2016). Pseudo-Likelihood Estimation of Multidimensional Polytomous Item Response Theory Models. PhD thesis, University of Illinois at Urbana-Champaign.
Pelle E, Hesse D, van der Heijden PGM. A log-linear multidimensional rasch model for capture-recapture. Statistics in Medicine. 2016;35:622–634. doi: 10.1002/sim.6741.
Rabe-Hesketh S, Skrondal A, Pickles A. Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal. 2002;2(1):1–21. doi: 10.1177/1536867X0200200101.
Rizopoulos D. ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software. 2006;17(5):1–25. doi: 10.18637/jss.v017.i05.
Rizopoulos D, Moustaki I. Generalized latent variable models with non-linear effects. British Journal of Mathematical and Statistical Psychology. 2008;61(2):415–438. doi: 10.1348/000711007X213963.
Schilling S, Bock RD. High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika. 2005;70(3):533–555.
Sinharay S. The asymptotic distribution of ability estimates: Beyond dichotomous items and unidimensional IRT models. Journal of Educational and Behavioral Statistics. 2015;40(5):511–528. doi: 10.3102/1076998615606115.
Walker AM. On the asymptotic behaviour of posterior distributions. Journal of the Royal Statistical Society. Series B (Methodological) 1969;31(1):80–88. doi: 10.1111/j.2517-6161.1969.tb00767.x.


