Applied Psychological Measurement. 2021 Feb 4;45(3):159–177. doi: 10.1177/0146621621990753

Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data

Chen-Wei Liu
PMCID: PMC8042559  PMID: 33958834

Abstract

Missing not at random (MNAR) modeling for non-ignorable missing responses usually assumes that the latent variable distribution is bivariate normal. This assumption is rarely verified and is often adopted as a default in practice. Recent studies of “complete” item responses (i.e., no missing data) have shown that ignoring a nonnormal distribution of a unidimensional latent variable, especially a skewed or bimodal one, can yield biased estimates and misleading conclusions. However, bivariate nonnormal latent variable distributions in the presence of MNAR data have not been investigated. This article extends the unidimensional empirical histogram and Davidian curve methods to deal simultaneously with a nonnormal latent variable distribution and MNAR data. A simulation study is carried out to demonstrate the consequences of ignoring a bivariate nonnormal distribution for parameter estimates, followed by an empirical analysis of “don’t know” item responses. The results show that checking the bivariate normality assumption for the latent variables should be routine for MNAR data, to minimize the impact of nonnormality on parameter estimates.

Keywords: item response theory, missing not at random, nonnormal distribution


Missing data are ubiquitous in psychological studies, social surveys, and many other scientific fields. An example in the Round 4 of 2008 European Social Survey revealed a maximum rate of non-response, 20.2%, for attitude items in Bulgaria (Kuha et al., 2018). Some other examples, such as examinee-selected (ES) items in educational testing, demonstrate that half of the item responses can be missing when the examinees are allowed to choose one item from pairs of items to answer (Liu & Wang, 2017). Analyses of incomplete data studies have shown that the respondents’ attitudes or abilities are often related to their propensities for missing data. Various studies have shown that treating non-ignorable missing data as ignorable will bias parameter estimates (De Ayala et al., 2001). To delineate the fundamental mechanism of missing data, we first introduce three well-known types of missing data mechanisms—missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (Rubin, 1976). The first two types are known to lead to “ignorable” missing data for likelihood inference, whereas the third one requires a particular missingness model to account for the MNAR effect. In this article, we focus on the MNAR data.

Numerous missing data types have been noted in psychological tests and social surveys: (a) not administered items, (b) a linking test design, (c) computerized adaptive testing, (d) multi-stage testing, (e) not-reached items, (f) non-response items (e.g., omitted or “don’t know”), and (g) ES items. The first four types have been shown to be MCAR or MAR, so they require no special treatment for likelihood methods (Mislevy, 2016). In contrast, the last three types often incur MNAR data (e.g., Liu & Wang, 2017; Rose et al., 2017). Not-reached missing responses often occur when test-takers do not have enough time to complete a test and leave consecutive missing values at the end of the test. In contrast, omitted items in psychological testing arise when test-takers skip items because they do not understand the content or do not know the answer. Non-response options of items in social surveys often include “refusal,” “no opinion,” or “don’t know.” The resulting missing patterns are similar to omitted items but different from not-reached items, so the latter are usually tackled differently in statistical analysis (e.g., Rose et al., 2017). The ES items are distinct from the aforementioned missing data types in that the ES design requires the test-takers to choose a fixed number of items they want to answer. In an ES design, a test-taker is allowed to select one or more items from a block of multiple items, and a whole test can comprise multiple blocks of items. The items included in the ES design are called ES items, and a test built this way is called an ES test. The number of items to be selected is fixed for each block. For example, suppose an ES test consists of two blocks of items, where the first block has two items and the second block has three items. The examinees are allowed to select only one item from the first block and only two items from the second block. In practice, how many items can be selected is up to the ES design. As a result, the missing data are massive, and a different statistical method is required (Liu & Wang, 2017; Liu & Qui, 2019). Several kinds of MNAR modeling have been studied, such as confirmatory factor analysis (Muthén et al., 1987), structural equation modeling (Bacci & Bartolucci, 2015), latent class models (List et al., 2019), multi-group models (Rose et al., 2015), and item response theory (IRT; Holman & Glas, 2005). However, the assumption of a bivariate normal latent variable distribution does not appear to have been questioned in these approaches.

For not-reached items, the latent regression model has been proposed to avoid the violation of local independence between the binary missing-datum indicators (Rose et al., 2010). This method incorporates the person-wise average number of not-reached items as a covariate to predict the person’s latent trait of interest. This latent regression method has been a standard for not-reached items in the Programme for International Student Assessment (PISA) since 2017 (Organisation for Economic Co-Operation and Development, 2017). We also note that an alternative method such as multiple-group IRT could be used for not-reached items (Rose et al., 2010). Because neither the latent regression nor the multiple-group method posits a latent propensity (i.e., the tendency to give missing responses), the not-reached items will not be discussed further in this article.

Regarding non-response items, a standard bivariate IRT model (denoted as BM) comprising a general IRT model that assumes a latent trait of scientific interest for observed item responses and a missingness model that has a latent propensity for missing responses has been proposed (e.g., Holman & Glas, 2005). The critical feature of the BM is the use of a binary missing-datum indicator matrix. This indicator matrix might contain non-ignorable missing data information for the latent trait of interest. The significant correlation between the latent trait and latent propensity suggests that the occurrence of missing data is related to the latent trait (Holman & Glas, 2005), which violates the assumption of MCAR and leads to MNAR data (Rubin, 1976). However, note that the binary missing-datum indicator matrix is not useful for ES items because the ES design induces the violation of local independence between the binary missing-datum indicators (Liu & Wang, 2017). We will provide an explicit explanation for such violation of local independence in the next section. Fortunately, the BM can deal with the ES items when the missing patterns of selected items are utilized instead (Liu & Wang, 2017).

The bivariate normal distribution of the trait and the propensity is often taken for granted in the BM, possibly for its parametric simplicity, straightforward interpretation, or estimation stability. However, the bivariate normality assumption is rarely verified. Several simulation studies have shown that disregarding unidimensional nonnormality, especially skewness or bimodality, can result in biased estimates (Woods, 2006; Woods, 2007; Woods & Lin, 2009). For MNAR data, the impact of violating bivariate normality on parameter estimates remains an open question. Moreover, MNAR data have been found to tend to induce a nonnormal latent propensity distribution (Lange et al., 1989). The missing patterns could appear skewed because responses are systematically missing from one or many items. Although the adverse effect of nonnormality on parameter estimates due to MNAR data has been noticed (e.g., Shin et al., 2009), this issue does not seem to have been addressed yet. In this article, we extend two statistical methods to MNAR data that allow the bivariate latent variable distribution to be estimated together with the item parameters, and we demonstrate the consequences of ignoring nonnormality and MNAR data for parameter estimates. The utility of the new approaches is assessed in simulation studies and empirical data and compared with the conventional BM.

The remaining sections of this article are as follows. First, the BM for non-response items and ES items is reviewed. Second, a BM with flexible nonnormal estimation methods is proposed. Third, a simulation study is conducted to examine the impact of ignoring nonnormal latent variable distribution and MNAR data on estimates, followed by analyzing an empirical dataset of “don’t know” items. Finally, concluding remarks are given about the utilities and extensions of the new BM for future study.

BMs for Non-Response and ES Items

In MNAR modeling for non-response items (e.g., omitted or “don’t know”), the missing data are informative about respondents’ inclinations toward responding to items. A straightforward approach is to use a binary missing-datum indicator matrix created by the binary pattern of each missing response. Let Dni be the binary missing-datum indicator and its realized value dni for person n and item i, which is defined as

$$d_{ni} = \begin{cases} 1 & \text{if } y_{ni} \text{ was missing} \\ 0 & \text{if } y_{ni} \text{ was observed} \end{cases}, \tag{1}$$

where yni is the item response. However, the binary missing-datum indicator does not apply to ES items because dni and dnj for i ≠ j within a block are not locally independent (Liu & Wang, 2017). Specifically, suppose an ES test comprises a block of two items and examinees are only allowed to pick one item from the block. The binary missing-datum indicator (d1, d2) is either (0, 1) (the first item is selected and answered) or (1, 0) (the second item is selected and answered). The patterns such as (1, 1) or (0, 0) are not allowed due to ES design. Consequently, the sample space of the random variables d1 and d2 is reduced (from 4 points to 2 points), which violates the original assumption made in Equation 1. In other words, when d1 is determined, the value of d2 is also determined simultaneously. A degree of freedom is lost for d2 in this case, and vice versa for d1. Thus, the binary missing-datum indicator is no longer applicable to ES items. To avoid this issue, the patterns of respondents’ selected items are utilized instead. A selection pattern matrix is created to provide information about the selection behavior of respondents, which consists of the index of the selection patterns Mbn ∈ (1, …, k, …, Wb) for item block b and person n, where Wb denotes the number of possible selection patterns in block b. Note that a student/examinee is only allowed to pick a fixed number of items from each block. Taking an item block that needs respondents to select two out of three items as an example, the index will be mbn = 1 for selection pattern (1, 2), mbn = 2 for selection pattern (1, 3), and mbn = 3 for selection pattern (2, 3). Therefore, mbn = k and mbn = k′ for k ≠ k′ are mutually independent.
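The two codings described above can be illustrated with a minimal Python sketch. It assumes missing responses are stored as `None` and that selection patterns are indexed in the lexicographic order used in the two-out-of-three example; the function names are hypothetical and not taken from the article.

```python
from itertools import combinations

def missing_indicator(responses):
    """Binary missing-datum indicator of Equation 1: 1 if missing (None), else 0."""
    return [[1 if y is None else 0 for y in row] for row in responses]

def selection_pattern_index(answered, block_items, n_select):
    """1-based index m_bn of the selection pattern chosen within one ES block.

    Patterns are enumerated lexicographically, so for three items and two
    selections: (1, 2) -> 1, (1, 3) -> 2, (2, 3) -> 3.
    Raises ValueError if the number of answered items differs from n_select.
    """
    patterns = list(combinations(block_items, n_select))
    chosen = tuple(i for i in block_items if i in answered)
    return patterns.index(chosen) + 1

# Two persons, three non-response items (None marks a missing response).
d = missing_indicator([[1, None, 0], [None, None, 1]])

# One ES block of three items from which two must be selected.
m = selection_pattern_index({1, 3}, block_items=(1, 2, 3), n_select=2)
```

Coding the block as a single nominal index, rather than as several binary indicators, is what removes the deterministic dependence between indicators within a block.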

With D for non-response items and M for ES items, respectively, the likelihood function of parameters of interest for the BM (cf. Holman & Glas, 2005; Liu & Wang, 2017) is formulated as

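$$\Pr(\theta, \eta, \gamma, \kappa, \rho; Y_{obs}, Q) \propto \Pr(Q \mid \gamma, \kappa)\,\Pr(Y_{obs} \mid \theta, \eta)\,\Pr(\theta, \gamma \mid \rho), \tag{2}$$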

where Q ∈ (D, M), γ is the latent propensity of person, κ is a vector consisting of all the parameters in the missingness model (first term), θ is the latent trait of a person, η consists of all item parameters in the measurement model (second term), and ρ is the structural parameter in the structural model (i.e., prior distribution) of θ and γ (third term). The missingness model can be, for instance, the two-parameter logistic model (2PLM; Birnbaum, 1968) for non-responses (D). The item response probability of the 2PLM is expressed by

$$\Pr(D_{ni} = 1 \mid \eta_n, a_i, b_i) = \frac{\exp(a_i \eta_n + b_i)}{1 + \exp(a_i \eta_n + b_i)}, \tag{3}$$

where ai is the slope parameter for the ith item, bi is the intercept parameter, and ηn represents the latent propensity of the nth person to omit the item. Note that we use a plus sign before bi, which only affects the interpretation of bi. Within the non-response context, γ = η.
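As a small illustration of Equation 3, the following Python sketch evaluates the probability of a missing response under the slope-intercept 2PLM. The function name is hypothetical, and the example values a = 1 and b = −1.643 are borrowed from the simulation settings reported later in the article.

```python
import math

def prob_missing_2pl(eta, a, b):
    """Pr(D = 1 | eta, a, b) for the slope-intercept 2PLM, a * eta + b."""
    z = a * eta + b
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Probability of omitting an item for a person with average propensity.
p_average = prob_missing_2pl(0.0, a=1.0, b=-1.643)
```

Because the intercept enters with a plus sign, a larger b raises the probability of a missing response at every propensity level.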

In contrast, for ES items, the missingness model is suggested to be the nominal response model (NRM; Bock, 1972). The NRM is defined as

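$$\Pr(M_{bn} = k \mid \kappa_n, \lambda_{bk}, \tau_{bk}) = \frac{\exp(\lambda_{bk} \kappa_n + \tau_{bk})}{\sum_{h=1}^{W_b} \exp(\lambda_{bh} \kappa_n + \tau_{bh})}, \tag{4}$$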

where λbk is the slope parameter and τbk is the intercept parameter, for selection pattern k in block b, and κn represents the latent propensity of selecting the item(s) in the blocks. Within the ES context, γ = κ.
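Equation 4 is a softmax over the selection patterns of a block. A minimal Python sketch follows; the function name is hypothetical, and the slopes (0, −1.5) and intercepts (0, 0) are the values used for paired items in the simulation study reported later.

```python
import math

def nrm_probs(kappa, slopes, intercepts):
    """Selection-pattern probabilities of the nominal response model (Equation 4)."""
    z = [lam * kappa + tau for lam, tau in zip(slopes, intercepts)]
    z_max = max(z)  # subtract the maximum for numerical stability
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]

# A block of two items from which one is selected.
probs = nrm_probs(kappa=1.0, slopes=[0.0, -1.5], intercepts=[0.0, 0.0])
```

With these values, a person with a higher κ is more likely to select the first item of the pair, and at κ = 0 both patterns are equally likely.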

In terms of interpretations for the two types of missingness, the item selection behavior in the ES items may result from various underlying reasons such as item difficulty, scoring rubric, topic, motivation, item length, or familiarity (Allen et al., 2005; Fitzpatrick & Yen, 1995; Jennings et al., 1999; Powers & Bennett, 1999). In contrast, the non-responses can be due to item difficulty, time intensity, no opinion, don’t know, and so on. Thus, the underlying reasons that lead to the two types of missingness can be very different. Their differences can also be seen in terms of modeling: we must use a different missingness model for each type of missingness. For example, we adopt the 2PLM for non-responses because the missingness can only be either “missing” (D = 1) or “observed” (D = 0). In contrast, the item selection in ES items can be scored as polytomous and nominal (M); thus, the NRM is more appropriate in this situation. Interestingly, the two types of missingness have one similarity in that their likelihood functions can both be subsumed by Equation 2.

As for the measurement model, it could be any IRT model, such as the generalized partial credit model (GPCM; Muraki, 1992). The slope-intercept version of the GPCM is given by

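$$\Pr(Y_{ni} = y \mid \theta_n, \alpha_i, \delta_{ik}) = \frac{\exp[y(\alpha_i \theta_n) + \delta_{ik}]}{\sum_{j=0}^{J_i} \exp[y(\alpha_i \theta_n) + \delta_{ij}]}, \tag{5}$$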

where y is the observed item response, αi is the slope parameter, δik is the kth “easiness” parameter for item i, and Ji is the maximum score of item i.

Regarding the structural model of θ and γ, a bivariate normal distribution is often assumed. The linear correlation ρ between θ and γ is used to indicate the magnitude of the MNAR effect. However, the normality assumption has rarely been verified in the IRT literature for the BM. The main contribution of this article is to examine the detrimental effect of ignoring the nonnormal distribution and MNAR data on parameter estimates and to show that the new approaches can deal with such situations. We first review the expectation–maximization (EM; Dempster et al., 1977) algorithm that will be used for parameter estimation.

Parameter Estimation Using EM

Let X, Y, Q, and ζ be the latent variables, the observed data, the missing data indicators, and the model parameters, respectively, and let x, y, and q be the realized values of X, Y, and Q, with x = (θ, γ) and q = (d, m). The logarithm of the marginal likelihood function of ζ given y and q is given by

$$l(\zeta; y, q) = \log \int \Pr(y, q \mid x, \omega)\,\Pr(x \mid \phi)\,dx, \tag{6}$$

where ω = (η, κ); ζ = (ω, ϕ); ϕ is the collection of structural parameters of latent variables; Pr(y, q|x, ω) is the joint item response probability function of y and q, which is usually simplified as a product of item response probability and missingness probability; and Pr(x|ϕ) is a prior distribution for the latent variables. Because directly evaluating the marginal likelihood function is computationally intensive, a surrogate function based on EM is often used instead:

$$Q(\zeta \mid \zeta^{(t)}) = E_{x \mid y, q, \zeta^{(t)}}\left[\,l(y, q \mid x, \omega) + l(x \mid \phi)\,\right], \tag{7}$$

where ζ(t) is the vector of current estimates at iteration t; the expectation of the joint log-probability function of y, q, and x is taken with respect to the posterior distribution of x given y, q, and ζ(t); and the item response probability function in the Q function can be user-specified. The Q function is maximized with respect to ζ given fixed ζ(t) at each iteration, and the estimates are updated until user-defined termination criteria are satisfied.

A notable feature of the EM is that the ω and ϕ can be separately updated. The ω is estimated as usual in the EM (Woods & Lin, 2009). To estimate the distribution of the latent variables (i.e., estimate structural parameters), the log-likelihood function of ϕ is given by

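$$Q(\phi \mid \phi^{(t)}) = E_{x \mid y, q, \omega^{(t)}, \phi^{(t)}}\, l(x \mid \phi) \approx \sum_{w=1}^{W} \Pr(x_w \mid y, q, \omega^{(t)}, \phi^{(t)}) \log \Pr(x_w \mid \phi), \tag{8}$$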

which is usually approximated by numerical quadrature methods, where W is the total number of quadrature points. Often, ϕ contains the means and covariance matrix when Pr(x|ϕ) is assumed to be a normal distribution.
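For the normal case, maximizing the quadrature form of Equation 8 has a well-known closed form: the weighted mean and weighted covariance of the quadrature points, with the posterior probabilities (summed over persons) as weights. The following Python sketch illustrates this under that assumption; the function name and the toy grid are hypothetical, and no identification constraints are imposed here.

```python
def normal_m_step(points, weights):
    """Weighted mean and covariance that maximize sum_w r_w * log N(x_w | mean, cov).

    points  : list of (theta, gamma) quadrature points
    weights : nonnegative posterior weights r_w accumulated over persons
    """
    total = sum(weights)
    dims = range(len(points[0]))
    mean = [sum(w * p[d] for p, w in zip(points, weights)) / total for d in dims]
    cov = [
        [
            sum(w * (p[r] - mean[r]) * (p[c] - mean[c]) for p, w in zip(points, weights)) / total
            for c in dims
        ]
        for r in dims
    ]
    return mean, cov

# Toy example: two equally weighted points on the diagonal.
mean, cov = normal_m_step([(-1.0, -1.0), (1.0, 1.0)], [1.0, 1.0])
```

The off-diagonal element of the covariance, rescaled by the two standard deviations, gives the correlation between the latent trait and the latent propensity.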

Bivariate Nonnormal Latent Variable Models

In this study, we proposed two methods that were developed for latent variable models. The first method is the empirical histogram (EH) method (e.g., Mislevy, 1984; Woods, 2007) and the other one is the Davidian curve (DC; Woods & Lin, 2009; Zhang & Davidian, 2001). We notice that the applications of the two methods are often restricted to unidimensional IRT models for completely observed item responses (for a review, see Woods, 2014). In this study, we extended the two methods to deal with the bivariate nonnormal latent variable distribution estimation for MNAR data.

EH

Before stating the main results of the EH method, more notations are required. Let P (S×W) denote the posterior probability matrix of x given y, q, and ζ(t), which is given by

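$$P_{sw} = \frac{\Pr(\theta_w, \gamma_w \mid \phi^{(t)}) \prod_{i=1}^{I} \Pr(y_{si} \mid \theta_w, \eta^{(t)})^{\chi(y_{si})} \prod_{b=1}^{B} \Pr(q_{sb} \mid \gamma_w, \kappa^{(t)})}{\sum_{w=1}^{W} \Pr(\theta_w, \gamma_w \mid \phi^{(t)}) \prod_{i=1}^{I} \Pr(y_{si} \mid \theta_w, \eta^{(t)})^{\chi(y_{si})} \prod_{b=1}^{B} \Pr(q_{sb} \mid \gamma_w, \kappa^{(t)})}, \tag{9}$$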

where χ(ysi) is the data indicator function, with χ(ysi) = 1 if ysi is not missing and χ(ysi) = 0 otherwise; S is the number of response patterns; N is the number of people, with S ≤ N; and B is either the number of missing-datum indicators or the number of blocks. For non-response items, B = I; for ES items, B is the number of blocks. The EH works as follows. In the E-step of the EM, the current Pr(x|ϕ(t)) is replaced by the Pr(x|y,q,ω(t–1),ϕ(t–1)) retrieved from the previous E-step. At the very first E-step, Pr(x|ϕ(t=1)) is set equal to a standard normal distribution. In this case, ϕ(t) represents the posterior probabilities on the quadrature points from the previous E-step.
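The EH update can be sketched in a few lines of Python. The sketch assumes that the likelihood of each response pattern at each quadrature point (the product terms in Equation 9) has already been computed, and that the new histogram is the count-weighted average of the posterior rows; function and variable names are hypothetical.

```python
def e_step_posterior(prior, lik):
    """Posterior matrix P (S x W): prior height times pattern likelihood, row-normalized."""
    post = []
    for row in lik:
        joint = [p * l for p, l in zip(prior, row)]
        total = sum(joint)
        post.append([j / total for j in joint])
    return post

def eh_update(post, counts):
    """New histogram heights: expected proportion of persons at each quadrature point."""
    n_persons = sum(counts)
    n_points = len(post[0])
    return [
        sum(c * row[w] for row, c in zip(post, counts)) / n_persons
        for w in range(n_points)
    ]

# Toy example: two quadrature points, two response patterns observed once each.
prior = [0.5, 0.5]
lik = [[0.2, 0.8], [0.6, 0.4]]
post = e_step_posterior(prior, lik)
new_prior = eh_update(post, counts=[1, 1])
```

In a full EM run, `new_prior` would replace `prior` in the next E-step, after the standardization needed for scale identification.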

DC

The bivariate DC is presented for MNAR data. As described by Zhang and Davidian (2001), the Pr(x|ϕ) is reformulated as

$$\Pr(x \mid \phi) = \mathrm{Pr}_K^{2}(x \mid \phi)\, f(x), \tag{10}$$

where K is the order of the polynomial PrK(x|ϕ), f(x) is a standard bivariate normal distribution, and PrK(x|ϕ) is formulated as

$$\mathrm{Pr}_K(x \mid \phi) = \sum_{h_1=0}^{K} \sum_{h_2=0}^{h_1} a_{h_1-h_2,\,h_2}\, x_1^{h_1-h_2}\, x_2^{h_2}, \tag{11}$$

where $a_{h_1-h_2,\,h_2}$ is the coefficient parameter of the polynomial function. For later computational purposes, let a be a vector (G × 1) comprising all the $a_{h_1-h_2,\,h_2}$ and V be a matrix (G × 2) containing all the patterns of the two exponents (h1 – h2 and h2) of x1 and x2 over $h_1 = 0, \ldots, K$ and $h_2 = 0, \ldots, h_1$, where $G = \prod_{d=1}^{2}(K + d) \big/ \prod_{d=1}^{2} d$. To ensure that the latent variable distribution integrates to one, the a of PrK(x|ϕ) must be constrained to achieve $\int \Pr(x \mid \phi)\,dx = 1$ by

$$E\{\mathrm{Pr}_K^{2}(x \mid \phi)\} = a^{T} E(UU^{T})\, a = a^{T} A a = 1, \tag{12}$$

where U is a vector (G × 1) with gth element $x_1^{V_{g,1}} x_2^{V_{g,2}}$ for $g = 1, \ldots, G$, and A is a matrix (G × G) with (g, h)th element equal to $E(x_1^{V_{g,1}+V_{h,1}})\,E(x_2^{V_{g,2}+V_{h,2}})$ for $g = 1, \ldots, G$ and $h = 1, \ldots, G$. $E(x^j)$ denotes the expected value of the jth power of a normally distributed variable x, where j is a positive integer. Instead of estimating a directly, Zhang and Davidian (2001) proposed using a polar coordinate transformation to stabilize parameter estimation. Notice that A is positive definite, so there exists a matrix B such that $A = B^{T}B$. Let $c = Ba$ (G × 1) with $c = (c_1, \ldots, c_G)^{T}$, whose elements are given by

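$$\begin{aligned} c_1 &= \sin(\phi_1), \\ c_2 &= \cos(\phi_1)\sin(\phi_2), \\ &\;\;\vdots \\ c_{G-1} &= \cos(\phi_1)\cos(\phi_2)\cdots\cos(\phi_{G-2})\sin(\phi_{G-1}), \\ c_G &= \cos(\phi_1)\cos(\phi_2)\cdots\cos(\phi_{G-2})\cos(\phi_{G-1}), \end{aligned} \tag{13}$$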

where $-\pi/2 < \phi_g \le \pi/2$ for $g = 1, 2, \ldots, G-1$. Note that $\phi = (\phi_1, \ldots, \phi_{G-1})$ are the parameters of the DC to be estimated. Because $a = B^{-1}c$,

$$\Pr(x \mid \phi) = [Z(B^{-1}c)]^{2} \circ f(x), \tag{14}$$

where Z is a W×G matrix defined as

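$$Z = \begin{bmatrix} x_{11}^{V_{11}} x_{21}^{V_{12}} & \cdots & x_{11}^{V_{G1}} x_{21}^{V_{G2}} \\ \vdots & \ddots & \vdots \\ x_{1W}^{V_{11}} x_{2W}^{V_{12}} & \cdots & x_{1W}^{V_{G1}} x_{2W}^{V_{G2}} \end{bmatrix}. \tag{15}$$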

Inserting Equation 14 into Equation 8, we maximize Equation 8 with respect to φ in the EM.
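The construction of the bivariate DC density can be traced with a short standard-library Python sketch. Several choices here are assumptions made for illustration only: B is taken to be the transposed Cholesky factor of A (the article only requires A = BᵀB), f(x) is the standard bivariate normal density with independent components, the angle values are arbitrary, and all function names are hypothetical.

```python
import math

def exponent_pairs(K):
    """Rows of V: exponents (h1 - h2, h2) for h1 = 0..K and h2 = 0..h1."""
    return [(h1 - h2, h2) for h1 in range(K + 1) for h2 in range(h1 + 1)]

def normal_moment(j):
    """E(x^j) for a standard normal x: 0 for odd j, (j - 1)!! for even j."""
    if j % 2 == 1:
        return 0.0
    out = 1.0
    for k in range(j - 1, 0, -2):
        out *= k
    return out

def gram_matrix(V):
    """A[g][h] = E(x1^(Vg1 + Vh1)) * E(x2^(Vg2 + Vh2))."""
    return [[normal_moment(g[0] + h[0]) * normal_moment(g[1] + h[1]) for h in V] for g in V]

def cholesky_lower(A):
    """Lower-triangular L with A = L L^T, so B = L^T satisfies A = B^T B."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def polar_to_c(angles):
    """Equation 13: map G - 1 angles to a unit-length vector c of length G."""
    c, prod = [], 1.0
    for phi in angles:
        c.append(prod * math.sin(phi))
        prod *= math.cos(phi)
    c.append(prod)
    return c

def solve_upper(L, c):
    """Solve B a = c by back substitution, where B = L^T is upper triangular."""
    n = len(c)
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = c[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))
        a[i] = s / L[i][i]
    return a

def dc_density(x1, x2, a, V):
    """Equation 10: squared polynomial times the standard bivariate normal density."""
    poly = sum(ag * x1 ** v1 * x2 ** v2 for ag, (v1, v2) in zip(a, V))
    f = math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)
    return poly * poly * f

K = 2
V = exponent_pairs(K)                      # G = (K + 1)(K + 2) / 2 = 6 rows
A = gram_matrix(V)
L = cholesky_lower(A)
angles = [0.3, -0.4, 0.2, 1.0, -0.7]       # illustrative values in (-pi/2, pi/2]
a = solve_upper(L, polar_to_c(angles))
constraint = sum(a[g] * A[g][h] * a[h] for g in range(len(V)) for h in range(len(V)))

# Riemann-sum check on a 61 x 61 grid over [-5, 5]^2 (spacing 1/6).
quad = [-5.0 + i / 6.0 for i in range(61)]
mass = sum(dc_density(u, v, a, V) for u in quad for v in quad) / 36.0
```

Because c has unit length by construction, the constraint in Equation 12 holds for any choice of angles, which is what allows the angles to be optimized without an explicit normalization step.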

Scale Identification when Pr(x) is Estimated

For the estimation of the bivariate nonnormal latent variable distribution (e.g., EH and DC), the latent scale is identified by fixing the means and variances of x to 0 and 1, respectively, while the correlation between the latent trait and latent propensity is left unconstrained. One strategy for achieving model identification is to standardize the histogram of expected frequencies, Pr(x|y,q,ω(t), ϕ(t)), in the E-step. However, standardization changes the locations of the quadrature points. To keep the locations unchanged, one has to approximate the height of the standardized histogram at the original locations of the quadrature points. A straightforward method for achieving this goal is interpolation (Woods, 2007). In this article, the pointwise bivariate interpolation implemented in the akima package in R was adopted for the computation (Akima, 1978). The estimated histogram on the original locations of the quadrature points is then used for the next iteration of the EM.
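The standardize-then-interpolate idea can be shown with a deliberately simplified Python sketch. It is one-dimensional and uses plain linear interpolation, whereas the article works with a bivariate histogram and Akima's interpolation in R; the function names and the example histogram are hypothetical.

```python
import math
from bisect import bisect_right

def linear_interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x; zero outside the range of xs."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    j = bisect_right(xs, x) - 1
    if j >= len(xs) - 1:
        return ys[-1]
    t = (x - xs[j]) / (xs[j + 1] - xs[j])
    return ys[j] + t * (ys[j + 1] - ys[j])

def standardize_histogram(grid, heights):
    """Standardize a histogram to mean 0 and variance 1, then map it back onto `grid`."""
    total = sum(heights)
    p = [h / total for h in heights]
    mean = sum(x * w for x, w in zip(grid, p))
    sd = math.sqrt(sum((x - mean) ** 2 * w for x, w in zip(grid, p)))
    shifted = [(x - mean) / sd for x in grid]      # locations after standardization
    # Equal spacing means the heights stay proportional to p at the shifted locations.
    new = [linear_interp(x, shifted, p) for x in grid]
    s = sum(new)
    return [v / s for v in new]

# Example: a histogram centered at 0.3 with standard deviation 0.8.
grid = [-5.0 + i / 6.0 for i in range(61)]
heights = [math.exp(-0.5 * ((x - 0.3) / 0.8) ** 2) for x in grid]
new_heights = standardize_histogram(grid, heights)
new_mean = sum(x * w for x, w in zip(grid, new_heights))
new_var = sum((x - new_mean) ** 2 * w for x, w in zip(grid, new_heights))
```

The interpolated histogram is only approximately standardized, because interpolation and renormalization introduce small errors; in this example the mean and variance end up close to 0 and 1.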

Simulation Study

A simulation study was conducted to assess the consequences of ignoring the bivariate nonnormal latent variable distribution when MNAR data are present. The test length was 20 binary items, and 1,000 respondents were used (Woods & Lin, 2009). Two missing data types were investigated. The first type was non-response (e.g., don’t know), and the second type was ES. The first type usually involves a smaller proportion of missing data in practice; thus, in this study, a missing rate of roughly 20% was set (Kuha et al., 2018). The second type often involves a larger proportion of missing data, and a 50% missing rate was set (Liu & Wang, 2017). The “don’t know” scenario applies when practitioners would like to use “don’t know” responses as supplementary information to improve the accuracy of the measurement model, instead of treating them as ignorable missing data; in that case, the new models can be adopted. For the ES items, the application scenarios are restricted to tests that allow examinees to select items; in that case, the new models are recommended. The dichotomous item responses were simulated by the 2PLM, where αi was drawn from N(1.7, 0.8²) truncated between 0.5 and 2, and δi was drawn from N(0, 1.2²) (Woods & Lin, 2009). The missing patterns were generated by the 2PLM for the non-response items with bi = −1.643 and ai = 1. For the ES items, the respondents were asked to answer one item out of each pair, and 10 pairs were used (i.e., 20 items in the test). λb1 = 0 and λb2 = −1.5, and τb1 = 0 and τb2 = 0, were adopted for each block (Liu & Wang, 2017). The missing data of the ES items were generated by the NRM.
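The link between the intercept bi = −1.643 and the roughly 20% missing rate can be checked numerically. The sketch below assumes a standard normal latent propensity and integrates the 2PLM missingness probability over it with a simple rectangular quadrature; the function name and the quadrature settings are hypothetical.

```python
import math

def marginal_missing_rate(a, b, n_points=201, lo=-6.0, hi=6.0):
    """Approximate E[Pr(D = 1 | eta)] for eta ~ N(0, 1) on an equally spaced grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        eta = lo + k * step
        w = math.exp(-0.5 * eta * eta)            # unnormalized N(0, 1) weight
        num += w / (1.0 + math.exp(-(a * eta + b)))
        den += w
    return num / den

rate = marginal_missing_rate(a=1.0, b=-1.643)     # close to 0.20 under these assumptions
```

For the ES design with paired items, no such calculation is needed: answering exactly one item per pair leaves half of the responses missing by construction.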

Three densities were used to generate the bivariate latent variables, which extended the unidimensional setting in Woods and Lin (2009) to the two-dimensional setting:

  1. Standard bivariate normal: N(0, Σ), where variances were 1 and correlation was .7.

  2. Bimodal: 0.6N(µ1, Σ1) + 0.4N(µ2, Σ2), where µ1 = [0.6, 0.6] is the mean vector for the first normal distribution, µ2 = [–0.9, –0.9] is the mean vector for the second normal distribution:

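$$\Sigma_1 = \begin{bmatrix} 0.678233^{2} & 0.3 \\ 0.3 & 0.678233^{2} \end{bmatrix} \quad \text{and} \quad \Sigma_2 = \begin{bmatrix} 0.678233^{2} & 0.3 \\ 0.3 & 0.678233^{2} \end{bmatrix}. \tag{16}$$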
  3. Skew: The bivariate skewed normal distribution with zero means and a skew parameter value equal to 1 was used to simulate latent variables by the Vale and Maurelli (1983) method, which is implemented in the fungible package (see the monte1 function; Waller et al., 2015). To ensure that the true values of the latent variables did not deviate too much from the population values, a quality control step was used in the data generation in which the true values of the latent variables were restricted to between –5 and 5. The correlation was set to .7 to ensure that the MNAR effect was large enough to examine the consequences of ignoring the skewed distribution of latent variables.

All the distributions have zero expected value (mean) and unit variance in each dimension, which is consistent with the model identification. The number of replications was 100. The number of quadrature points was 61², and the range was between –5 and 5 for each dimension. For each simulated dataset, the BM (two-dimensional 2PLM), BM0 (two-dimensional 2PLM with correlation constrained to zero), EH, and DC (with varying k = 1, …, 15) were fitted to the data. Our prior research indicated that the DC is sometimes sensitive to starting values; thus, 10 sets of starting values were initialized randomly for each replication. In our prior experience, this approach mitigated the issue of local maxima.
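The zero-mean, unit-variance property can be verified for the bimodal mixture with a few lines of Python. The check reads the diagonal entries of Σ1 and Σ2 as 0.678233² (about 0.46), which is an assumption; it is the reading under which the stated unit variance holds. The function name is hypothetical.

```python
def mixture_moments(weights, means, variances):
    """Mean and variance of a univariate normal mixture (one margin of the bivariate mix)."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in zip(weights, means, variances))
    return mean, var

# Margins of 0.6 N(0.6, 0.678233^2) + 0.4 N(-0.9, 0.678233^2).
mix_mean, mix_var = mixture_moments([0.6, 0.4], [0.6, -0.9], [0.678233 ** 2] * 2)
```

The within-component variance of about 0.46 and the between-component variance of 0.6(0.6)² + 0.4(0.9)² = 0.54 add up to one.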

Outcome measures of interest were (a) accuracy of the Akaike information criterion (AIC), Bayesian information criterion (BIC), and Hannan–Quinn information criterion (HQIC) in identifying the “better” model; (b) accuracy of the item parameter estimates; and (c) accuracy of the person estimates by expected a posteriori (EAP). The AIC, BIC, and HQIC were calculated as –2logL + 2p, –2logL + plog(N), and –2logL + 2plog(log(N)), respectively, where logL is the log-likelihood value, p is the number of estimated parameters, and N is the number of respondents. The results of model comparison for the EH are not presented because its penalty term is large (owing to the 61² quadrature points), which makes its information criterion values too large; we therefore compared model selection only among the DC, BM, and BM0. The motivation for comparing the performance of the information criteria for the DC, BM, and BM0 is that their performance on MNAR data has not been investigated. Thus, although the EH does not feature in the model comparison, it remains important to examine the performance of the three information criteria for MNAR data.
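The three criteria are simple functions of the log-likelihood, the number of estimated parameters, and the sample size. A minimal Python sketch follows; the function name and the example values are hypothetical and do not come from the simulation.

```python
import math

def information_criteria(log_lik, p, n):
    """AIC, BIC, and HQIC as defined in the text (smaller values indicate a better model)."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * p,
        "BIC": deviance + p * math.log(n),
        "HQIC": deviance + 2.0 * p * math.log(math.log(n)),
    }

# Illustrative values only: logL = -1000, 10 parameters, 1,000 respondents.
ic = information_criteria(log_lik=-1000.0, p=10, n=1000)
```

For N = 1,000, the per-parameter penalty is 2 for the AIC, about 3.87 for the HQIC, and about 6.91 for the BIC, so the BIC is the most reluctant to select a model with additional DC parameters.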

The averages of absolute bias and root mean square error (RMSE) over items were used to assess the recovery of item parameters. EAPs were approximated by $\sum_{x} x \Pr(x \mid y, \hat{\omega}, \hat{\phi})$ given the item estimates and distribution estimates, where $\Pr(x \mid y, \hat{\omega}, \hat{\phi})$ is defined in Equation 9 and x contains the two-dimensional quadrature points.
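The EAP approximation is a posterior-weighted average of the quadrature points. A minimal Python sketch, assuming the posterior weights for one person have already been obtained from the E-step; the function name and the toy values are hypothetical.

```python
def eap(points, posterior):
    """Posterior mean of each latent dimension over the quadrature points."""
    total = sum(posterior)
    n_dims = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, posterior)) / total
        for d in range(n_dims)
    )

# Toy example with two quadrature points (theta, gamma).
theta_hat, gamma_hat = eap([(-1.0, 0.0), (1.0, 2.0)], [0.25, 0.75])
```

Dividing by the total makes the function usable with unnormalized posterior weights as well.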

Results

Table 1 shows the accuracy of AIC, BIC, and HQIC in model selection between BM0, BM, and DC. The three information criteria correctly selected the BM in the simulation replications for latent variables generated from the standard bivariate normal distribution. In contrast, for bimodal distribution and skewed distribution, the AIC generally performed better than the HQIC and BIC, whereas the BIC was the least satisfactory in performance.

Table 1.

Accuracy Number of AIC, BIC, and HQIC in Justifying “Better” Model When Latent Variable Distribution Is Bivariate Standard Normal Distribution (Normal), Bimodal Distribution (Bimodal), or Skewed Bivariate Normal Distribution (Skewed), for N = 1,000.

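| Latent trait distribution | Comparison | Non-response: HQIC | Non-response: AIC | Non-response: BIC | Examinee-selected: HQIC | Examinee-selected: AIC | Examinee-selected: BIC |
|---|---|---|---|---|---|---|---|
| Normal | DC > BM | 0 | 0 | 0 | 0 | 0 | 0 |
| Normal | DC > BM0 | 100 | 100 | 100 | 100 | 100 | 100 |
| Normal | BM > BM0 | 100 | 100 | 100 | 100 | 100 | 100 |
| Bimodal | DC > BM | 70 | 95 | 46 | 32 | 71 | 19 |
| Bimodal | DC > BM0 | 100 | 100 | 100 | 100 | 100 | 100 |
| Bimodal | BM > BM0 | 100 | 100 | 100 | 100 | 100 | 100 |
| Skewed | DC > BM | 100 | 100 | 95 | 96 | 100 | 49 |
| Skewed | DC > BM0 | 100 | 100 | 100 | 100 | 100 | 100 |
| Skewed | BM > BM0 | 100 | 100 | 100 | 100 | 100 | 100 |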

Note. AIC = Akaike information criterion; BIC = Bayesian information criterion; HQIC = Hannan–Quinn information criterion; DC = Davidian curve; BM = bivariate model; “DC > BM” = Davidian curve method is preferred to bivariate model, “BM > BM0” = bivariate model is preferred to bivariate model with correlation constrained to zero, and so on.

Table 2 presents the average absolute bias over items under different models and missing data types. For latent variables generated from the bivariate normal distribution, bias and RMSE were similar among the BM, DC, and EH, but not the BM0, which means that additionally estimating the latent variable distribution did not adversely affect the item estimates. For the bimodal latent variable distribution, there were nearly no evident effects on the item estimates for any model except the BM0. This result means that the BM was robust to the bimodal latent variable distribution considered. For the skewed normal distribution, the EH yielded the least bias for the item parameter estimates, closely followed by the DC. It is evident that the BM was not robust to the skewed normal distribution considered, and the BM0’s performance was always the least satisfactory. Similar patterns for the RMSE of the item estimators can be found in Table 3.

Table 2.

Averages of Absolute Bias Over Items for BM0, BM, DC, and EH, for Omitted Items and Examinee-Selected Items, for N = 1,000.

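| Latent trait distribution | Model | Non-response: α̂ | Non-response: δ̂ | Non-response: λ̂ | Non-response: τ̂ | Examinee-selected: α̂ | Examinee-selected: δ̂ | Examinee-selected: λ̂ | Examinee-selected: τ̂ |
|---|---|---|---|---|---|---|---|---|---|
| Normal | BM0 | .05 | .07 | .04 | .06 | .09 | .17 | .05 | .04 |
| Normal | BM | .04 | .05 | .04 | .06 | .06 | .07 | .04 | .04 |
| Normal | DC | .04 | .05 | .04 | .06 | .06 | .07 | .04 | .04 |
| Normal | EH | .04 | .05 | .05 | .06 | .07 | .07 | .04 | .04 |
| Bimodal | BM0 | .04 | .06 | .05 | .06 | .09 | .13 | .05 | .05 |
| Bimodal | BM | .04 | .06 | .05 | .06 | .09 | .07 | .05 | .04 |
| Bimodal | DC | .04 | .06 | .06 | .07 | .07 | .07 | .06 | .04 |
| Bimodal | EH | .04 | .06 | .06 | .07 | .07 | .07 | .05 | .04 |
| Skewed | BM0 | .12 | .16 | .16 | .10 | .22 | .21 | .05 | .07 |
| Skewed | BM | .10 | .10 | .14 | .09 | .17 | .16 | .05 | .07 |
| Skewed | DC | .07 | .08 | .07 | .08 | .10 | .11 | .06 | .05 |
| Skewed | EH | .06 | .06 | .07 | .07 | .07 | .08 | .04 | .04 |

Note. α̂ and δ̂ are parameter estimators of the measurement model (generalized partial credit model), and λ̂ and τ̂ are parameter estimators of the missingness model (nominal response model). BM0 = bivariate IRT model with zero correlation; BM = bivariate IRT model; IRT = item response theory; DC = Davidian curve; EH = empirical histogram.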

Table 3.

Averages of RMSE Over Items for BM0, BM, DC, and EH, for Omitted Items and Examinee-Selected Items, for N = 1,000.

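| Latent trait distribution | Model | Non-response: α̂ | Non-response: δ̂ | Non-response: λ̂ | Non-response: τ̂ | Examinee-selected: α̂ | Examinee-selected: δ̂ | Examinee-selected: λ̂ | Examinee-selected: τ̂ |
|---|---|---|---|---|---|---|---|---|---|
| Normal | BM0 | .14 | .11 | .12 | .10 | .21 | .21 | .13 | .08 |
| Normal | BM | .13 | .10 | .12 | .10 | .20 | .14 | .13 | .08 |
| Normal | DC | .14 | .10 | .12 | .10 | .20 | .14 | .13 | .08 |
| Normal | EH | .15 | .10 | .13 | .10 | .22 | .14 | .14 | .08 |
| Bimodal | BM0 | .13 | .10 | .12 | .10 | .21 | .18 | .13 | .09 |
| Bimodal | BM | .13 | .10 | .12 | .10 | .21 | .14 | .13 | .09 |
| Bimodal | DC | .14 | .10 | .14 | .11 | .21 | .14 | .15 | .09 |
| Bimodal | EH | .14 | .10 | .13 | .11 | .23 | .14 | .14 | .09 |
| Skewed | BM0 | .18 | .17 | .19 | .13 | .30 | .25 | .13 | .09 |
| Skewed | BM | .16 | .13 | .18 | .12 | .26 | .20 | .13 | .09 |
| Skewed | DC | .15 | .12 | .13 | .11 | .22 | .16 | .14 | .08 |
| Skewed | EH | .17 | .10 | .12 | .10 | .22 | .15 | .13 | .07 |

Note. α̂ and δ̂ are parameter estimators of the measurement model (generalized partial credit model), and λ̂ and τ̂ are parameter estimators of the missingness model (nominal response model). RMSE = root mean square error; BM0 = bivariate IRT model with zero correlation; BM = bivariate IRT model; IRT = item response theory; DC = Davidian curve; EH = empirical histogram.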

Table 4 displays the average absolute bias of EAP over persons. We divided the persons into four groups ordered by their true trait values and report the average absolute bias of the EAP estimates for each group. The results show that the BM, DC, and EH yielded similar results, whereas the BM0, which ignores the MNAR data and nonnormality, always produced worse results. These results also suggest that the EAP estimates were not influenced much by the nonnormal latent variable distribution for the BM, DC, and EH under the present simulation settings. The patterns of the results for RMSE were similar and are thus not presented here.

Table 4.

Average Absolute Bias of EAP Over Persons for BM0, BM, DC, and EH, for Omitted Items and Examinee-Selected Items, for N = 1,000.

Latent trait distribution   Non-response                Examinee-selected
                            25%   50%   75%   100%      25%   50%   75%   100%
Normal
  BM0                       .20   .21   .19   .22       .20   .23   .21   .23
  BM                        .19   .19   .18   .21       .18   .19   .19   .20
  DC                        .20   .21   .19   .22       .19   .22   .20   .22
  EH                        .20   .21   .19   .21       .19   .22   .20   .22
Bimodal
  BM0                       .20   .20   .23   .19       .19   .20   .23   .22
  BM                        .19   .19   .22   .19       .17   .19   .22   .20
  DC                        .20   .20   .23   .19       .19   .20   .23   .22
  EH                        .20   .20   .23   .19       .18   .20   .23   .21
Skewed
  BM0                       .22   .20   .23   .22       .22   .21   .22   .24
  BM                        .19   .19   .21   .21       .20   .20   .20   .20
  DC                        .21   .20   .23   .21       .21   .21   .21   .23
  EH                        .21   .20   .22   .21       .21   .21   .21   .23

Note. “25%,” “50%,” “75%,” and “100%” are, respectively, short for 0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100% of persons sorted based on their true latent trait values. EAP = expected a posteriori; BM0 = bivariate IRT model with zero correlation; IRT = item response theory; BM = bivariate IRT model; DC = Davidian curve; EH = empirical histogram.
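The grouping behind Table 4 can be illustrated with a minimal sketch. It is not the article's code: the function name and the toy values are hypothetical, equal-sized groups of persons sorted by true trait value are assumed, and the absolute error from a single replication stands in for the absolute bias.

```python
def abs_bias_by_group(true_values, eap_estimates, n_groups=4):
    """Mean |estimate - true value| within groups of persons sorted by true value."""
    pairs = sorted(zip(true_values, eap_estimates))  # sort persons by true trait
    n = len(pairs)
    result = []
    for g in range(n_groups):
        lo = g * n // n_groups          # first person in this group
        hi = (g + 1) * n // n_groups    # one past the last person in this group
        group = pairs[lo:hi]
        result.append(sum(abs(est - true) for true, est in group) / len(group))
    return result

# Hypothetical toy data: eight persons, one replication
true_theta = [-1.5, -1.0, -0.5, -0.2, 0.1, 0.4, 0.9, 1.6]
eap_theta = [-1.2, -1.1, -0.3, -0.2, 0.3, 0.3, 0.6, 1.2]
print(abs_bias_by_group(true_theta, eap_theta))
```

The four returned values correspond to the "25%," "50%," "75%," and "100%" columns of the table.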

Other Simulation Conditions

This section further investigates the effects of (a) sample size and (b) missing data rate on parameter estimates. For condition (a), sample sizes smaller or larger than N = 1,000 were considered: N = 250, 500, and 5,000. The smaller sample sizes were used to examine whether sample size would influence the item and person estimates and lead to estimation instability for the DC and EH. The larger sample size was adopted to check whether nonnormality would still have a noticeable influence on parameter estimates.

For condition (b), the missing rate might affect the estimation stability of the DC, especially when missing responses are sparse. Thus, we considered three missing rates smaller than the 20% used in the previous simulation studies, namely 1%, 5%, and 10%, which were generated by setting bi = −5.05, bi = −3.35, and bi = −2.55, respectively. Notice that bi = −5.05 is an extreme item location that could not be stably estimated. Non-responses were simulated from the 2PLM. Other settings were identical to those of the previous simulation studies (e.g., N = 1,000). ES items were not included here because their missing rates are often large in practice.
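How an item location translates into a marginal missing rate can be checked numerically. The sketch below rests on assumptions that this section does not state: a slope-intercept form of the 2PLM, P(missing | γ) = logistic(a·γ + bi), a unit slope (a = 1), and a standard normal propensity γ. The function name is hypothetical. Under these assumptions, the three locations give marginal rates close to the stated 1%, 5%, and 10%.

```python
import math

def marginal_missing_rate(b, a=1.0, n_points=2001, bound=8.0):
    """Approximate E[logistic(a*gamma + b)] for gamma ~ N(0, 1) by the trapezoid rule."""
    step = 2 * bound / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        g = -bound + k * step
        density = math.exp(-0.5 * g * g) / math.sqrt(2 * math.pi)  # N(0, 1) density
        prob = 1.0 / (1.0 + math.exp(-(a * g + b)))                # assumed 2PLM form
        weight = 0.5 if k in (0, n_points - 1) else 1.0            # trapezoid end weights
        total += weight * density * prob * step
    return total

for b in (-5.05, -3.35, -2.55):
    print(b, round(marginal_missing_rate(b), 3))
```

With a different slope or a non-normal propensity distribution, the same locations would imply different missing rates.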

The results for condition (a) are shown in Supplemental Appendix A (N = 250), Supplemental Appendix B (N = 500), and Supplemental Appendix C (N = 5,000). Compared with the results for N = 1,000 (see Tables 1–4), the AIC still performed best, regardless of sample size. As the sample size increased from 250 to 5,000, the accuracy rate increased, reaching 100% for most bimodal and skewed conditions. Notably, Supplemental Table A1 shows that the AIC was still sensitive to the change from a normal to a skewed distribution for N = 250. In terms of bias and RMSE for skewed conditions, the DC performed similarly to the BM for N = 250 (see Supplemental Tables A2 and A3). When the sample size increased to N = 500 and N = 1,000, the DC performed slightly better than the BM (see Supplemental Tables B2 and B3, and Tables 2 and 3). When N = 5,000, the DC performed better than the BM overall (see Supplemental Tables C2 and C3). Therefore, as observed, the larger the sample size, the easier it is to detect the MNAR effect, especially under skewed conditions. The reason is that, as more samples become available, even a small change in the shape of the nonnormal distribution can be detected. For EAP estimates, the bias and RMSE were similar across sample sizes, because a fixed test length was used (I = 20) and the accuracy of a person’s estimate depends on how much item information is available for that person. In terms of estimation stability, the results indicate that the DC can be stably estimated even with sample sizes as small as N = 250.

The results for condition (b) are shown in Supplemental Appendices D, E, and F. For the 1% missing rate condition, Supplemental Table D1 shows that the AIC performed best in selecting skewed distributions. Supplemental Table D2 shows that the DC and EH slightly improved the bias and RMSE of α̂ and δ̂ in the skewed condition. The item estimates of the missingness model, λ̂ and τ̂, were unstable because of the extremely low missing rate. Fortunately, the DC and EH were still stable for α̂ and δ̂, because the observed item response matrix was dense. When the missing rate increased to 5%, Supplemental Table E1 shows that the AIC still performed best in model selection. Supplemental Table E2 shows a similar pattern: the λ̂ and τ̂ of the DC were slightly less accurate than those of the BM, but the estimation was more stable for the DC with a 5% missing rate than with a 1% missing rate. Moreover, the estimates of interest, α̂ and δ̂, were slightly more accurate for the DC than for the BM, which means the DC can yield a slight gain in estimation accuracy. For the 10% missing rate condition, the overall patterns of the results shown in Supplemental Appendix F were similar to those of the 5% missing rate condition. For EAP estimates across conditions, there were no evident differences between methods because of the fixed test length. Overall, the results presented in this section show that the parameter estimation of the missingness model (e.g., λ̂ and τ̂) is stable with missing rates as small as 5%, and that the item estimates (e.g., α̂ and δ̂) can be improved even with a missing rate of 1% or more.

Religious Orientation Data With Don’t Know Items

The data came from a study of religious orientation and health that consisted of 565 respondents (188 males, 374 females, three missing values) in Taiwan (Liu, 2010). Twenty five-category items were designed to measure the relationship between religious belief and health, where the options were 1 = strongly disagree, 2 = disagree, 3 = don’t know, 4 = agree, and 5 = strongly agree. The survey was initially developed to indirectly measure a person’s prejudice against ethnic minorities by assessing his or her extrinsic and intrinsic orientations in religious belief. However, the information in the non-responses (don’t know) was often ignored in previous data analyses. In this study, we focused on the impact of ignoring the non-responses and the nonnormal latent variable distribution. The “don’t know” responses were regarded as missing data, and the corresponding missing-datum indicator matrix was constructed (0: not choosing “don’t know,” 1: choosing “don’t know”). The “don’t know” rates of the 20 items ranged between 6.02% and 26.73% (average = 18.73%). For example, the “don’t know” response rate of the item “Preferring to join a Bible study group rather than a social fellowship” was 26.73%. This high “don’t know” rate might arise because only respondents who engage in religious activities can tell the difference between a Bible study group and a social fellowship, whereas those who are unfamiliar with religious activities cannot. In other words, such “don’t know” responses are likely informative in distinguishing the respondents’ religious familiarity, which might correlate with their religious belief.
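The construction of the missing-datum indicator matrix described above can be sketched as follows. The function names and the toy responses are hypothetical; only the coding rule (option 3 = "don't know" is treated as missing and flagged with 1) is taken from the text.

```python
DONT_KNOW = 3  # option code for "don't know" in the five-category items

def missing_indicator(responses):
    """Return a 0/1 matrix: 1 where the response is "don't know", 0 otherwise."""
    return [[1 if r == DONT_KNOW else 0 for r in row] for row in responses]

def dont_know_rates(indicator):
    """Proportion of "don't know" responses for each item (column)."""
    n_persons = len(indicator)
    n_items = len(indicator[0])
    return [sum(row[j] for row in indicator) / n_persons for j in range(n_items)]

# Hypothetical toy data: 4 respondents x 3 items, coded 1 to 5
responses = [
    [1, 3, 5],
    [2, 3, 4],
    [3, 4, 4],
    [5, 3, 1],
]
indicator = missing_indicator(responses)
rates = dont_know_rates(indicator)
print(indicator)
print(rates)
```

The item responses themselves would then be set to missing wherever the indicator equals 1, so that the measurement model and the missingness model use complementary pieces of information.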

The BM0, BM, EH, and DC with k = 1 through k = 10 were each fitted to the data, where the item response model was the GPCM and the missingness model was the 2PLM. In line with the previous simulation results, the AIC was adopted to choose the best-fitting model. We carried out a likelihood-ratio test to compare the best-fit DC with the BM, followed by a comparison of the item estimates and person estimates (EAP) among the BM0, BM, and DC. We also inspected the three-dimensional plot of the bivariate latent variable distribution of the best-fit DC.

In addition to the illustrative three-dimensional plot, the adjusted Fisher–Pearson coefficient of skewness (denoted as g1; Joanes & Gill, 1998) was used as a numerical index to describe the amount of skewness for the θ and γ distributions, respectively. Calculating g1 requires empirical samples of θ and γ from the estimated distribution (Figure 2; DC with k = 4). The Metropolis–Hastings (MH) algorithm (Hastings, 1970; Metropolis et al., 1953) was used to draw the samples of θ and γ, with the estimated distribution as the target density. First, in what is called the adaptive phase of the MH, the covariance matrix of the proposal distribution was estimated with the “burn-in” set to 1,000, followed by 50,000 iterations to collect the samples. Second, the MH was rerun with the estimated covariance matrix for 1 million iterations to collect the samples. The “thin” was set to 20 to alleviate the autocorrelation between iterations. The final 50,000 samples were then used to calculate the skewness statistics. Regarding bivariate normality testing, Mardia’s (1970) skewness test was carried out to test the null hypothesis that the skewness of the samples’ distribution is identical to that of a bivariate normal distribution. If the null hypothesis is rejected, we conclude that the distribution of the samples is skewed.
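The sampling-and-skewness procedure can be sketched in a simplified form. Several parts are assumptions made for illustration: the target below is a stand-in bivariate normal density (not the fitted DC), the proposal is a fixed isotropic normal random walk rather than one with an adaptively estimated covariance matrix, there is no burn-in, and the chain is much shorter than in the article. All function names are hypothetical. The skewness function uses the adjusted Fisher–Pearson form, sqrt(n(n − 1))/(n − 2) · m3/m2^(3/2), from Joanes and Gill (1998).

```python
import math
import random

def adjusted_skewness(x):
    """Adjusted Fisher-Pearson skewness: sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

def log_target(theta, gamma, rho=-0.55):
    """Stand-in log density (unnormalized bivariate normal), NOT the fitted DC."""
    return -0.5 * (theta ** 2 - 2 * rho * theta * gamma + gamma ** 2) / (1 - rho ** 2)

def metropolis_hastings(log_density, n_iter, thin, step=1.0, seed=1):
    """Random-walk MH with a fixed isotropic proposal; keeps every `thin`-th state."""
    rng = random.Random(seed)
    current = (0.0, 0.0)
    current_lp = log_density(*current)
    draws = []
    for it in range(1, n_iter + 1):
        proposal = (current[0] + rng.gauss(0.0, step),
                    current[1] + rng.gauss(0.0, step))
        proposal_lp = log_density(*proposal)
        # Accept with probability min(1, target(proposal) / target(current))
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if it % thin == 0:
            draws.append(current)
    return draws

draws = metropolis_hastings(log_target, n_iter=20000, thin=20)
theta_draws = [d[0] for d in draws]
gamma_draws = [d[1] for d in draws]
print(len(draws), adjusted_skewness(theta_draws), adjusted_skewness(gamma_draws))
```

With thinning of 20, a chain of 20,000 iterations retains 1,000 draws, in the same way that 1 million iterations retain 50,000 draws in the analysis described above.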

Figure 2.

Estimated distribution of latent variables of DC (k = 4) viewed from different angles for religious orientation data.

Note. DC = Davidian curve.

Our preliminary analysis showed that the slope estimates of four items were zero (at the boundary of the parameter space). We avoided the effects of the zero slopes on computing the log-likelihood ratio by discarding these four items. As a result, 16 items were retained in the subsequent analysis.

Results

The values of the AIC for the 10 DCs decreased from 20,979.35 (k = 1) to the minimum of 20,800.09 (k = 4) and then gradually increased to 20,863.97 (k = 10). In contrast, the AIC values for the BM0, BM, and EH were 20,981.39, 20,874.52, and 28,098.24, respectively. The EH had a considerably larger AIC value because of the numerous free parameters in the distribution estimation. A likelihood-ratio test comparing the DC with k = 4 against the BM showed that the DC with k = 4 was preferred, χ2(9) = 94.43, p < .001, which is consistent with the results of the AIC.
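The model comparison can be retraced from the reported numbers with a small sketch. The dictionary keys and the helper `chi2_sf` are hypothetical names; the helper evaluates the chi-square upper-tail probability with the standard closed-form series for integer degrees of freedom, using only the Python standard library. Only the AIC values quoted in the text are included, so the DCs with other values of k are absent.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    if df % 2 == 1:
        total = math.erfc(chi / math.sqrt(2))        # two-sided normal tail
        term = chi
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)              # chi^(2r-1) / (1*3*...*(2r-1))
            total += 2 * pdf * term
        return total
    term = 1.0
    series = 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)                          # chi^(2r) / (2*4*...*(2r))
        series += term
    return math.sqrt(2 * math.pi) * pdf * series

# AIC values as reported in the text
aic = {"BM0": 20981.39, "BM": 20874.52, "EH": 28098.24,
       "DC k=1": 20979.35, "DC k=4": 20800.09, "DC k=10": 20863.97}
best = min(aic, key=aic.get)          # model with the smallest AIC
p_value = chi2_sf(94.43, 9)           # reported likelihood-ratio statistic and df
print(best, p_value)
```

The p value is far below .001, which is why it is more informative to report it as p < .001 than as p = .00.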

The item estimates are shown in Table 5. Overall, the item parameter estimates were similar for the BM and the DC with k = 4, but slightly different from those of the BM0. The EH yielded slightly different item estimates.

Table 5.

Item Parameter Estimates of Generalized Partial Credit Model for Religious Orientation Data (N = 565).

Item  α                           δ1                          δ2                          δ3
      BM0    BM     DC4    EH     BM0    BM     DC4    EH     BM0    BM     DC4    EH     BM0    BM     DC4    EH
19    0.44   0.43   0.44   0.46   1.12   1.11   1.18   1.16   4.03   4.00   4.10   4.06   3.14   3.12   3.22   3.18
3     0.78   0.76   0.75   0.8    2.20   2.14   2.21   2.15   4.18   4.09   4.18   4.11   2.81   2.73   2.82   2.77
4     1.73   1.75   1.73   1.93   3.15   3.07   3.18   3.17   5.61   5.51   5.61   5.60   3.95   3.81   3.96   4.00
6     0.23   0.22   0.21   0.23   1.37   1.37   1.37   1.37   0.91   0.90   0.90   0.91   −0.75  −0.76  −0.75  −0.75
8     0.87   0.85   0.85   0.93   1.90   1.86   1.88   1.86   3.22   3.16   3.20   3.19   1.30   1.23   1.28   1.27
10    0.99   0.98   0.96   1.06   2.21   2.16   2.18   2.17   2.78   2.71   2.74   2.74   0.48   0.38   0.43   0.44
11    1.37   1.36   1.32   1.50   3.19   3.1    3.16   3.16   5.53   5.41   5.44   5.45   4.02   3.88   3.97   3.99
13    2.36   2.34   2.39   2.71   3.78   3.65   3.71   3.80   4.90   4.71   4.80   4.91   2.46   2.20   2.34   2.49
14    2.84   2.84   2.9    3.24   3.88   3.8    3.82   3.86   4.94   4.79   4.84   4.89   1.85   1.59   1.70   1.82
5     2.94   2.94   2.96   3.55   4.38   4.26   4.26   4.56   5.72   5.54   5.57   5.94   2.25   1.96   2.08   2.34
16    1.15   1.11   1.10   1.22   2.19   2.11   2.15   2.14   3.34   3.23   3.27   3.27   1.34   1.23   1.29   1.32
7     1.52   1.58   1.55   1.72   1.65   1.68   1.62   1.62   2.31   2.33   2.27   2.26   1.31   1.24   1.25   1.26
17    1.30   1.29   1.31   1.40   1.45   1.43   1.43   1.39   1.54   1.50   1.52   1.49   −0.02  −0.09  −0.05  −0.01
20    3.41   3.45   3.55   4.18   4.51   4.45   4.49   4.73   6.05   5.94   6.03   6.33   2.57   2.28   2.43   2.68
2     1.21   1.22   1.22   1.37   1.25   1.23   1.23   1.25   0.82   0.77   0.80   0.83   −2.0   −2.2   −2.13  −2.11
12    3.27   3.23   3.24   3.75   4.85   4.65   4.62   4.78   7.52   7.24   7.21   7.45   4.84   4.49   4.58   4.86

Note. BM0 = bivariate model with zero correlation (generalized partial credit model); BM = bivariate model (generalized partial credit model); DC4 = Davidian curve (k = 4); EH = empirical histogram.

Figure 1 illustrates the person estimates (EAP) for the BM0, BM, and DC with k = 4. Figure 1A shows the consequence of ignoring the significant MNAR effect (ρ̂ = −.55, SE = 0.04, for the BM): many estimates differed between the BM and the BM0. Figure 1B shows the combined impact of the MNAR effect and the nonnormal latent variable distribution on the person parameter estimates. Figure 1C depicts the impact of ignoring only the nonnormal distribution, which suggests that the person estimates at both poles tended to be influenced. Overall, the results suggest that the effects of the MNAR data and the nonnormal distribution should be taken into account in the parameter estimation.

Figure 1.

Comparison of person estimates (θ̂; expected a posteriori) for religious orientation data (A) between BM (generalized partial credit model) and BM with zero correlation, (B) between DC (k = 4) and BM with zero correlation, and (C) between DC (k = 4) and BM (generalized partial credit model).

Note. BM = bivariate model; DC = Davidian curve.

Regarding the estimates of distribution shape, Figure 2 depicts four views of the estimated latent variable distribution of the DC with k = 4. Viewed along the θ or γ scale (Figures 2A and 2B), the distribution was slightly right-skewed for θ and slightly left-skewed for γ. Figure 2C shows the three-dimensional plot at an azimuth of −37.5 degrees and an elevation of 30 degrees, which suggests that θ and γ were negatively correlated and that the distribution was skewed. Moreover, the θ–γ view of the three-dimensional plot (Figure 2D) shows that the correlation between θ and γ was negative and that the distribution was skewed. In addition, the g1 values for θ and γ were 0.046 and −0.016, respectively, suggesting that the distribution was slightly right-skewed for θ but slightly left-skewed for γ. Moreover, Mardia’s skewness test indicated that the skewness of the bivariate distribution was significant (Mardia’s skew = 0.07, χ2 = 618.81, p < .001). Overall, the results suggest that a nonnormal latent variable distribution and an MNAR effect are present in these empirical data.
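Mardia's skewness test for bivariate samples can be sketched as follows. This is an illustration under stated assumptions rather than the article's implementation: the function name and toy data are hypothetical, the covariance matrix uses the divisor n, and the test statistic is taken to be n·b/6 with p(p + 1)(p + 2)/6 = 4 degrees of freedom for p = 2. The double sum is quadratic in the sample size, so this form suits small illustrations only. If the reported statistic follows the same n·b/6 form with n = 50,000, it implies an unrounded skewness of about 0.074, which agrees with the reported 0.07.

```python
def mardia_skewness(data):
    """Return Mardia's bivariate skewness b, the statistic n*b/6, and its df."""
    n = len(data)
    p = 2
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # Covariance matrix with divisor n
    sxx = sum(u * u for u, _ in centered) / n
    syy = sum(v * v for _, v in centered) / n
    sxy = sum(u * v for u, v in centered) / n
    det = sxx * syy - sxy * sxy
    # Inverse covariance matrix applied to each centered point
    weighted = [((syy * u - sxy * v) / det, (sxx * v - sxy * u) / det)
                for u, v in centered]
    # b = (1/n^2) * sum over all pairs of (d_i' S^-1 d_j)^3
    b = sum((ui * wj0 + vi * wj1) ** 3
            for ui, vi in centered for wj0, wj1 in weighted) / n ** 2
    stat = n * b / 6
    df = p * (p + 1) * (p + 2) // 6
    return b, stat, df

# Hypothetical toy data: one centrally symmetric set and one skewed set
symmetric = [(1, 0), (-1, 0), (0, 1), (0, -1), (2, 1), (-2, -1)]
skewed = [(0, 0), (0, 1), (1, 0), (0.5, 0.2), (5, 5)]
print(mardia_skewness(symmetric))
print(mardia_skewness(skewed))

# Skewness implied by the reported statistic under the n*b/6 assumption
implied_b = 6 * 618.81 / 50000
print(implied_b)
```

A centrally symmetric sample yields a skewness of zero up to rounding error, whereas the skewed sample yields a positive value.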

Concluding Remarks

This article investigates the consequence of ignoring the nonnormal latent variable distribution and the MNAR data on the parameter estimates of interest. By extending the EH and the DC to the bivariate case, this article demonstrates that the latent variable distribution can be estimated for non-response and ES items. Simulation studies for the “don’t know” items and ES items were included to show the performance of the HQIC, AIC, and BIC in selecting the appropriate model when the latent variable distribution is normal, bimodal, or skewed. The results suggest that the AIC performed better than the HQIC and BIC when a nonnormal latent variable distribution is present, which might be attributed to the fact that the AIC tends to select a more complex model, whereas the BIC tends to select a more parsimonious model and the HQIC is intermediate (Zhang & Davidian, 2001). Fortunately, the AIC did not select the more complex model (e.g., DC) but preferred the normal model when the data were truly generated from the normal model (Zhang & Davidian, 2001). Moreover, the AIC tended to select a more complex model when the latent variable distribution was nonnormal. Therefore, the AIC is preferred to the HQIC and BIC for the non-response data in the simulations investigated, although other factors that influence the detection performance of the AIC, BIC, and HQIC, such as other types of nonnormal distributions, should be further explored in future research (Woods & Lin, 2009). The simulation studies further show that the normal or the bimodal distribution barely influenced the item estimates and EAP estimates for the BM, DC, and EH; however, the BM and BM0 could not fit data with a skewed latent variable distribution well and yielded less accurate estimates. Although the EH sometimes yielded marginally better results than the DC (Woods, 2007), the EH must estimate numerous free parameters, which always leads to a large value of the information criteria. The EH was thus never selected in the simulations considered. Moreover, the EH always produces a rugged curve, which is hard to interpret and hardly approximates the “true” population distribution well (Woods, 2014), especially with small sample sizes. Finally, a religious orientation dataset about health was included to demonstrate the presence of a nonnormal (skewed) latent variable distribution and an MNAR effect, and their impact in practice, especially on the person estimates.

Based on the arguments and numerical examples, it seems fair to suggest that dealing with the nonnormal latent variable distribution and the MNAR effect is an essential step in analyzing MNAR data. The results presented in this article suggest that the DC is currently preferred for the following reasons. First, the DC estimates the bivariate distribution of the latent trait and propensity with far fewer parameters than the EH. Thus, the penalty terms of the information criteria are not excessively large, and these information criteria can be used to select the best-fit model. Second, the DC can be stably estimated whether the data contain a small or a large proportion of missing data (e.g., 5% in non-responses or 50% in ES items). Third, the distribution estimated by the DC is always smooth given sufficient quadrature points; in contrast, the EH tends to yield a non-smooth curve for small samples. Fourth, the framework of the DC is general, so it can be readily applied to various types of missing data, such as omitted, ES, and other non-response items, with appropriate missing-datum matrices.

Despite the utility of the DC described above, several research directions may be pursued to extend its scope. First, the DC could be extended to simultaneously accommodate multiple non-response options such as “no opinion,” “refusal,” and “don’t know.” A potential solution is the multiple decision approach (Liu & Wang, 2016), which can help model the posited response processes underlying non-responses. Second, response time information could be included to improve the accuracy of the parameters of interest. Third, developing a new estimation algorithm for multidimensional IRT-DC models in large-scale testing might be an important issue to address.

Supplemental Material

sj-pdf-2-apm-10.1177_0146621621990753 – Supplemental material for Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data

Supplemental material, sj-pdf-2-apm-10.1177_0146621621990753 for Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data by Chen-Wei Liu in Applied Psychological Measurement

sj-zip-1-apm-10.1177_0146621621990753 – Supplemental material for Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data

Supplemental material, sj-zip-1-apm-10.1177_0146621621990753 for Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data by Chen-Wei Liu in Applied Psychological Measurement

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

Data Availability Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplemental Material: Supplementary material is available for this article online.

References

  1. Akima H. (1978). A method of bivariate interpolation and smooth surface fitting for irregularly distributed data points. ACM Transactions on Mathematical Software, 4(2), 148–159.
  2. Allen N. L., Holland P. W., Thayer D. T. (2005). Measuring the benefits of examinee-selected questions. Journal of Educational Measurement, 42(1), 27–51.
  3. Bacci S., Bartolucci F. (2015). A multidimensional finite mixture structural equation model for nonignorable missing responses to test items. Structural Equation Modeling: A Multidisciplinary Journal, 22(3), 352–365.
  4. Birnbaum A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In Lord F. M., Novick M. R. (Eds.), Statistical theories of mental test scores (pp. 397–472). Addison-Wesley.
  5. Bock R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.
  6. De Ayala R., Plake B. S., Impara J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38(3), 213–234.
  7. Dempster A. P., Laird N. M., Rubin D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
  8. Fitzpatrick A. R., Yen W. M. (1995). The psychometric characteristics of choice items. Journal of Educational Measurement, 32(3), 243–259.
  9. Hastings W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
  10. Holman R., Glas C. A. W. (2005). Modelling non-ignorable missing-data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58(1), 1–17.
  11. Jennings M., Fox J., Graves B., Shohamy E. (1999). The test-takers’ choice: An investigation of the effect of topic on language-test performance. Language Testing, 16(4), 426–456.
  12. Joanes D., Gill C. (1998). Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 47(1), 183–189.
  13. Kuha J., Katsikatsou M., Moustaki I. (2018). Latent variable modelling with non-ignorable item non-response: Multigroup response propensity models for cross-national analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society), 181(4), 1169–1192.
  14. Lange K. L., Little R. J., Taylor J. M. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84(408), 881–896.
  15. List M. K., Köller O., Nagy G. (2019). A semiparametric approach for modeling not-reached items. Educational and Psychological Measurement, 79(1), 170–199.
  16. Liu C.-W., Wang W.-C. (2016). Unfolding IRT models for Likert-type items with a don’t know option. Applied Psychological Measurement, 40(7), 517–533.
  17. Liu C.-W., Wang W.-C. (2017). Non-ignorable missingness item response theory models for choice effects in examinee-selected items. British Journal of Mathematical and Statistical Psychology, 70(3), 499–524.
  18. Liu C.-W., Qiu X.-L., Wang W.-C. (2019). Item response theory modeling for examinee-selected items with rater effect. Applied Psychological Measurement, 43(6), 435–448.
  19. Liu Y.-R. (2010). The influence of religion on health and ageing in Taiwan. A. S. Institute of Sociology: Survey Research Data Archive.
  20. Mardia K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530.
  21. Metropolis N., Rosenbluth A. W., Rosenbluth M. N., Teller A. H., Teller E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087–1092.
  22. Mislevy R. J. (1984). Estimating latent distributions. Psychometrika, 49(3), 359–381.
  23. Mislevy R. J. (2016). Missing responses in item response modeling. In van der Linden W. J. (Ed.), Handbook of item response theory, volume two: Statistical tools (Vol. 21, pp. 171–194). Chapman and Hall/CRC Press.
  24. Muraki E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.
  25. Muthén B., Kaplan D., Hollis M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52(3), 431–462.
  26. Organisation for Economic Co-Operation and Development. (2017). PISA 2015 technical report.
  27. Powers D. E., Bennett R. E. (1999). Effects of allowing examinees to select questions on a test of divergent thinking. Applied Measurement in Education, 12(3), 257–279.
  28. Rose N., von Davier M., Nagengast B. (2015). Commonalities and differences in IRT-based methods for nonignorable item-nonresponses. Psychological Test and Assessment Modeling, 57(4), 472–498.
  29. Rose N., von Davier M., Nagengast B. (2017). Modeling omitted and not-reached items in IRT models. Psychometrika, 82(3), 795–819.
  30. Rose N., von Davier M., Xu X. (2010). Modeling nonignorable missing data with item response theory (IRT) (No. RR-10-11). Educational Testing Service.
  31. Rubin D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
  32. Shin T., Davison M. L., Long J. D. (2009). Effects of missing data methods in structural equation modeling with nonnormal longitudinal data. Structural Equation Modeling: A Multidisciplinary Journal, 16(1), 70–98.
  33. Vale C. D., Maurelli V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465–471.
  34. Waller N., Jones J., Giordano C. (2015). fungible: Psychometric Functions from the Waller Lab, https://cran.r-project.org/web/packages/fungible/index.html
  35. Woods C. M. (2006). Ramsay-curve item response theory (RC-IRT) to detect and correct for nonnormal latent variables. Psychological Methods, 11(3), 253–270.
  36. Woods C. M. (2007). Empirical histograms in item response theory with ordinal data. Educational and Psychological Measurement, 67(1), 73–87.
  37. Woods C. M. (2014). Estimating the latent density in unidimensional IRT to permit non-normality. In Reise S. P., Revicki D. A. (Eds.), Handbook of item response theory modeling (pp. 78–102). Routledge.
  38. Woods C. M., Lin N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102–117.
  39. Zhang D., Davidian M. (2001). Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57(3), 795–802.

Articles from Applied Psychological Measurement are provided here courtesy of SAGE Publications
