Abstract
Asymptotic robustness of polychoric correlation estimation against misspecification of the underlying distribution is studied. The asymptotic normality of the pseudo-maximum likelihood estimator is derived using the two-step estimation procedure. The t distribution assumption and the skew-normal distribution assumption are used as alternatives to the normal distribution assumption in a numerical study. The numerical results show that the polychoric correlation estimated under the normality assumption can be substantially biased, even though skewness and kurtosis are not large. The skew-normal assumption generally produces a lower bias than the normal assumption. Thus, it is worth using a non-normal distributional assumption if the normal assumption is dubious.
Electronic supplementary material
The online version of this article (doi:10.1007/s11336-016-9512-2) contains supplementary material, which is available to authorized users.
Keywords: underlying distribution, asymptotic covariance matrix, non-normality, pseudo-maximum likelihood
Introduction
Structural equation models (SEMs) are widely used in social sciences to model latent structures. Typically, normal distributions are assumed for both latent variables and error terms. However, observed measures in surveys are often ordinal. For example, a five-point Likert scale is commonly used in psychometric studies. Conceptually, categorical data should not be incorporated into a SEM by assuming they are continuous. There have been numerous advances in the literature on SEMs with respect to analysing ordinal data as they are. The observed ordinal data are usually assumed to be counterparts of some underlying continuous distributions. A typical choice of the underlying distributions is the standard normal distribution. Olsson (1979) studied the one-step maximum likelihood estimator (MLE) and the two-step MLE of the polychoric correlation coefficient. All parameters (i.e. thresholds and polychoric correlation) are estimated simultaneously for the one-step MLE, whereas the thresholds are estimated from the marginals and the polychoric correlation is computed based on the threshold estimates for the two-step MLE. Olsson showed that under the normality assumption, the one- and the two-step MLEs produce similar polychoric correlation estimates and similar variance estimates. Jöreskog (1994) derived the estimator of the asymptotic covariance matrix of the polychoric correlation estimators for the two-step maximum likelihood procedure (for a more compact expression, see Christoffersson & Gunsjö, 1996, and related references).
The underlying normality assumption is questionable. For example, the underlying normality assumption in the Life Orientation Test dataset (Scheier & Carver, 1985) was rejected by Maydeu-Olivares (2006). In yet another example, income is commonly used in socio-economic status studies (e.g. Chateau, Metge, Prior, & Soodeen, 2012; Hodge & Treiman, 1968; Scharoun-Lee, Adair, Kaufman, & Gordon-Larsen, 2009). A Pareto distribution is classically used to model income (Arnold, 2008). Using a normal distribution to model income is dubious because income is bounded below. The question regarding income, however, is commonly categorized in a questionnaire: for example, see the National Longitudinal Study of Adolescent Health dataset (Carolina Population Center, 2009) used by Scharoun-Lee et al. (2009). Thus, “income” is an ordinal indicator with a non-normal underlying distribution. The consequences of violating the underlying normality assumption have been investigated (e.g. Flora & Curran, 2004; Lee & Lam, 1988; Quiroga, 1992). Flora and Curran (2004) generated non-normal data from the Fleishman–Vale–Maurelli method (Fleishman, 1978; Vale & Maurelli, 1983) in which a standard univariate normal random variable is polynomially transformed to introduce skewness and kurtosis. The authors found that the polychoric correlation estimates are only slightly biased when the underlying distribution has a skewness of 0.75 or 1.25 and a kurtosis of 1.75 or 3.75. They found, however, that the polychoric correlation is not robust against extreme underlying non-normality (e.g. skewness = 5 and kurtosis = 50). Lee and Lam (1988) generated non-normal data from an elliptical t distribution and an elliptical contaminated normal distribution and noted that the polychoric correlation estimates based on the normality assumption are fairly robust against non-normal underlying distributions.
The study of Quiroga (1992) was conducted using non-normal data from an underlying bivariate skew-normal distribution and from the Fleishman–Vale–Maurelli method. The author also suggests that the polychoric correlation estimator is robust to non-normality. These studies share two features. First, they all estimate the polychoric correlation under the normality assumption and examine how underlying non-normality affects it, so a non-normal distributional assumption has not been systematically studied. Second, they are simulation studies. To our knowledge, there are no robustness studies on polychoric correlations from a theoretical standpoint.
Because the polychoric correlation is not distribution-free, tests of the underlying normality assumption are desired. For example, LISREL (Jöreskog & Sörbom, 1996) uses a likelihood ratio test to assess underlying normality, which is equivalent to a Pearson $\chi^2$ test. Maydeu-Olivares, Forero, Gallardo-Pujol, and Renom (2009) and Maydeu-Olivares and Joe (2005, 2006) introduced a variant of Pearson’s $\chi^2$ that is more suitable for the two-step MLE of the polychoric correlation. LISREL (Jöreskog & Sörbom, 1996) also provides the root-mean-square error of approximation (RMSEA) to assess the underlying normality assumption.
If the normality assumption fails, a new distributional assumption is needed. Quiroga (1992) studied a new underlying distributional assumption whose marginal distributions are weighted averages of a univariate skew-normal distribution and a standard univariate normal distribution. Through an empirical example, the author showed that the polychoric correlation estimates based on the new distributional assumption produce a smaller test statistic. The normality assumption has also been criticized in item response theory, and alternative distributions have been studied to account for the underlying non-normality (e.g. see Bolfarine & Bazán, 2010; Lucke, 2014; Woods & Thissen, 2006).
The purpose of this paper is twofold. First, we study robustness against misspecification of the underlying distribution from a theoretical perspective. The effect of distributional misspecification under the two-step maximum likelihood procedure is investigated. Because the two-step MLE is computationally easier (Olsson, 1979) and is implemented in LISREL, we focus only on the two-step MLE for its simplicity and popularity. Second, the underlying distribution is not restricted to a standard normal distribution. The t distribution and the skew-normal distribution are used as alternatives in the present study. In particular, the skew-normal distribution has been applied in item response theory as an alternative to the normality assumption (e.g. see Azevedo, Bolfarine, & Andrade, 2011; Bazán, Branco, & Bolfarine, 2006; Molenaar, 2015; Molenaar, Dolan, & de Boeck, 2012; Santos, Azevedo, & Bolfarine, 2013). Because the underlying distribution cannot be fully determined from ordinal data, we attempt to pinpoint potential alternatives for the bivariate normal distribution assumption.
The remainder of this paper is organized as follows. General theories are presented, followed by numerical examples to illustrate our ideas. A brief conclusion ends the paper.
General Theory
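Consider two ordinal variables U and V with $I$ and $J$ categories, respectively. The classic polychoric correlation estimation method assumes that there are two underlying continuous variables X and Y for U and V, respectively. The values of U and V are defined through X and Y as
$$U = i \ \text{ if } \ a_{i-1} < X \le a_i, \qquad V = j \ \text{ if } \ b_{j-1} < Y \le b_j,$$
where $a_i$ and $b_j$ are thresholds such that
$$-\infty = a_0 < a_1 < \cdots < a_I = +\infty, \qquad -\infty = b_0 < b_1 < \cdots < b_J = +\infty.$$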
The true joint distribution function is denoted by with two marginal distributions and , where is the correlation coefficient and is the vector of other parameters (e.g. degrees of freedom, location, and scale parameters). The corresponding joint density function is with marginal densities and . Because the true distribution family is unknown, we assume the underlying distribution to be with marginal distributions and . The joint density function is with marginal densities and , respectively. Conventionally, is taken to be the distribution function of a standard bivariate normal distribution. The normality assumption will be relaxed in our study. We also allow for different marginal distributions both in true underlying distributions and in the assumed ones.
Two-Step Estimation
Threshold Estimation
Let and be the observed frequency and proportion, respectively, of and , for and . If the true underlying distribution is different from the assumed distribution , the MLEs of thresholds will be inconsistent estimators of and , where the subscript 0 indicates true values. Consider the ordinal variable U first. Denote , where is the marginal total for . The corresponding marginal proportion is . The pseudo-maximum likelihood estimator (PMLE) of , denoted as , is obtained by maximizing
It is easy to see that is a consistent estimator of , where , because the observed cell probabilities are consistent estimators of . Similarly, is a consistent estimator of , where . Let be an matrix with (i, j)-th entry . Then
where is the total number of observations,
with being an vector of 1’s, and is a diagonal matrix with i-th element
for . The operator constructs a diagonal matrix using the enclosed vector as diagonal elements. The Taylor expansion of around is
1 |
where lies between and . Because both and are consistent estimators of is consistent for . So, Eq. (1) implies
2 |
Similar arguments applying to yield
3 |
where . Here and are defined by substituting with in and . The PMLEs and are inconsistent in the sense that and are different from the true values and .
Polychoric Correlation Coefficient Estimation
Under the distributional assumption , the assumed cell probability is
while the true cell probability is obtained by substituting with . Conditionally on and , the polychoric correlation is estimated by maximizing
Theorem 2.2 in White (1982) shows that the PMLE is a consistent estimator that minimizes the Kullback–Leibler information (Kullback & Leibler, 1951) under some regularity conditions, one of which is that the absolute value of is dominated by a variable with finite expectation. Such a regularity condition is satisfied if implies for all (i, j). Consequently, Theorem 2.2 in White (1982) shows that converges to that minimizes the Kullback–Leibler information
Theorem 1
Assume , as a function of , has a unique maximum at . If implies for all (i, j), then there exists a root of the equation
such that is a consistent estimator of .
That is, is a consistent estimator of that minimizes the probabilistic divergence between H and F (Kullback, 1959) in the sense of the Kullback–Leibler information. This minimized divergence implies similarities of H and F in terms of cell probabilities.
The assumption in Theorem 1 requires uniqueness of the maximum. In so doing, we rule out all cases with local maxima. If we have several stationary points, we can then only conclude that one of the stationary points minimizes the Kullback–Leibler information.
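The two-step procedure can be illustrated for the conventional normal assumption with the following Python sketch. It is a minimal illustration, not the authors' implementation: the function names, the one-dimensional numerical integration used for the cell probabilities, and the search interval (-0.99, 0.99) are all assumptions made here. Thresholds are first obtained from the marginal proportions, and the pseudo-log-likelihood is then maximized over the correlation alone.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import ndtr, ndtri  # standard normal CDF and quantile


def normal_thresholds(marginal_probs):
    """Step 1: thresholds as normal quantiles of cumulative proportions."""
    cum = np.cumsum(marginal_probs)[:-1]
    return np.concatenate(([-np.inf], ndtri(cum), [np.inf]))


def cell_prob(lo_x, hi_x, lo_y, hi_y, rho):
    """P(lo_x < X <= hi_x, lo_y < Y <= hi_y) for a standard bivariate normal."""
    s = np.sqrt(1.0 - rho ** 2)

    def integrand(x):
        phi = np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)
        # Y given X = x is normal with mean rho * x and standard deviation s
        return phi * (ndtr((hi_y - rho * x) / s) - ndtr((lo_y - rho * x) / s))

    return quad(integrand, lo_x, hi_x)[0]


def cell_probs(a, b, rho):
    """Matrix of assumed cell probabilities for thresholds a (rows), b (columns)."""
    return np.array([[cell_prob(a[i], a[i + 1], b[j], b[j + 1], rho)
                      for j in range(len(b) - 1)]
                     for i in range(len(a) - 1)])


def polychoric_two_step(table):
    """Step 2: maximize sum_ij p_ij * log pi_ij(rho) with thresholds held fixed."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    a = normal_thresholds(p.sum(axis=1))
    b = normal_thresholds(p.sum(axis=0))

    def neg_loglik(rho):
        pi = np.clip(cell_probs(a, b, rho), 1e-12, None)  # guard against log(0)
        return -np.sum(p * np.log(pi))

    res = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x, a, b


# Demo with a correctly specified model: exact normal cell probabilities
# generated with correlation 0.4 are passed back to the estimator.
a0 = normal_thresholds([0.2, 0.5, 0.3])
b0 = normal_thresholds([0.1, 0.3, 0.6])
P = cell_probs(a0, b0, 0.4)
rho_hat, a_hat, b_hat = polychoric_two_step(P)
print(round(rho_hat, 3))
```

Because the demo feeds the estimator cell probabilities that the assumed model can reproduce exactly, the Kullback–Leibler information is zero at the generating correlation, so the printed value should be close to 0.4. Under misspecification the same routine returns the minimizer of the Kullback–Leibler information instead.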
Asymptotic Variance of Polychoric Correlations
Let denote the first order partial derivative of with respect to . Similar symbols are used to represent other partial derivatives and higher order partial derivatives. can be expanded around for a sufficiently large n,
4 |
where lies between and . Term in Eq. (4) is equivalent to
5 |
where lies between and and lies between and . Hence, if implies for all in a neighbourhood of given the correlation is consistent for and is consistent for . Thus, Eq. (5) is equivalent to
Likewise, Term can be written as
provided that implies in a neighbourhood of . Hence, combining with Eqs. (2) and (3), (4) is equivalent to
where with being an matrix with -th element , and . Note that , where stacks the columns of the enclosed matrix and
with being an matrix with (i, j)-th entry . The arguments above establish the following theorem.
Theorem 2
Let be the consistent root of given and . Assume implies in a neighbourhood of , then
where . Here, matrix is evaluated under . The operator implies element-wise multiplication.
Estimating the Asymptotic Covariance Matrix
Following Theorem 2, the asymptotic variance of can be consistently estimated by
where is the (i, j)-th element in . The polychoric correlation between variables U and V satisfies
where the superscript (UV) emphasizes that all quantities are evaluated under the distributional assumption for U and V. Similarly, the polychoric correlation between variables K and Z satisfies
The underlying distributional assumption for U and V can either be the same as that for K and Z or different. Thus, the asymptotic covariance between and is consistently estimated by
6 |
where is the sample proportion of observing , and . Under the assumption that the underlying distribution is normal and correctly specified, Eq. (6) reduces to the estimator in Jöreskog (1994).
A Variant of Two-Step Estimation
The above two-step estimation is applicable to bivariate distributions whose marginal distributions do not depend on unknown parameters. For example, the mean and variance of a bivariate normal distribution are unknown parameters, and assuming a standard normal distribution fixes those parameters to known values. In many other distributions, unknown parameters are included in the marginal distributions. Consequently, the above two-step MLE cannot be obtained unless the unknown parameters are fixed in advance. In such a case, a variant of the two-step MLE can be obtained instead. The MLE maximizes
with respect to the vector that consists of free unknown parameters. Not all parameters in and are free parameters. The mean and variance of an ordinal variable are not identified. Thus, the scale and location parameters that do not contribute to the correlation coefficient are not identified. In some distributions (e.g. the skew-normal distribution introduced later), the correlation coefficient is also determined by and, therefore, is not a free parameter. If is differentiable with respect to ,
is solved to obtain . By standard calculation,
where and the k-th row in is with the (i, j)-th element being , provided that is invertible. Assume that the correlation coefficient satisfies in which is nonzero. The delta method (Ferguson, 1996) indicates
where is the asymptotic covariance matrix of . Hence, the asymptotic covariance between and can be consistently estimated in a similar manner to Eq. (6). Let the matrix be constructed through . Then the asymptotic covariance between and is consistently estimated by
7 |
In Olsson (1979), thresholds and are parameters for the one-step MLE. However, the thresholds are not always directly estimated for the one-step MLE for other distributions. For example, the density function of a bivariate skew-normal distribution in Azzalini and Valle (1996) is
8 |
where is the density function of the bivariate standard normal distribution with correlation coefficient is the distribution function of a standard normal distribution, and and control skewness and kurtosis. The covariance matrix of X and Y is
9 |
where
Thus, the correlation coefficient is affected by and . The marginal distributions are univariate skew-normal distributions with densities
10 |
where for X and for Y. Bazán et al. (2006), Molenaar (2015), and Molenaar et al. (2012) have applied the univariate skew-normal distribution to the item response theory. The marginal distributions are affected by , and , as are the thresholds. Therefore, the thresholds are not free parameters. The vector of free parameters in the variant of two-step estimation is .
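The dependence of the correlation coefficient on the skewness parameters can be checked numerically. The sketch below assumes the standard results for the bivariate skew-normal distribution of Azzalini and Valle (1996): with shape vector (alpha1, alpha2) and normal correlation omega, the vector delta equals Omega alpha / sqrt(1 + alpha' Omega alpha) and the covariance matrix is Omega - (2/pi) delta delta'. The argument names `alpha1`, `alpha2`, and `omega` are labels chosen here and may differ from the notation of Eqs. (8) to (10).

```python
import numpy as np


def skew_normal_correlation(alpha1, alpha2, omega):
    """Correlation of (X, Y) under a bivariate skew-normal distribution.

    Assumed parameterization: density 2 * phi2(x, y; omega) * Phi(alpha1*x + alpha2*y).
    """
    alpha = np.array([alpha1, alpha2], dtype=float)
    Omega = np.array([[1.0, omega], [omega, 1.0]])
    delta = Omega @ alpha / np.sqrt(1.0 + alpha @ Omega @ alpha)
    cov = Omega - (2.0 / np.pi) * np.outer(delta, delta)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


# With no skewness the correlation is omega itself ...
rho_symmetric = skew_normal_correlation(0.0, 0.0, 0.4)
# ... but with nonzero shape parameters it can differ markedly from omega.
rho_skewed = skew_normal_correlation(2.0, 2.0, 0.4)
print(rho_symmetric, rho_skewed)
```

This is why the correlation is obtained as a function of the three free parameters rather than estimated directly under the skew-normal assumption.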
Numerical Examples
A numerical study is conducted in this section to examine the asymptotic bias under different distributional assumptions. Asymptotic limits of PMLE for polychoric correlation coefficients are numerically computed.
Distributional Assumption
Four experiments are conducted in which different true underlying distributions are investigated.
Experiment 1: Elliptical Distribution
Elliptical distributions form a broad family of probability distributions. The bivariate joint density function of an elliptical distribution is of the form
11 |
where is a univariate function and
with being the variance of X and being the variance of Y. An elliptical distribution generalizes the normal distribution and keeps some properties (e.g. Balakrishnan & Lai, 2009; Fang, Kotz, & Ng, 1990; Kelker, 1970). Some examples of the bivariate elliptical distributions that will be used later are
Normal distributions: ;
t(v) distributions with degrees of freedom v: ;
Bivariate uniform distributions: with being an indicator function;
Bivariate Logistic distributions: ;
Bivariate exponential power distributions: .
The elliptical distribution family plays a very important role in robustness studies (e.g. Kano, Berkane, & Bentler, 1993). In the context of Pearson correlation estimation, Hampel, Ronchetti, Rousseeuw, and Stahel (1986) showed that the PMLE of the covariance matrix is proportional to the MLE under the true distributional assumption, provided that continuous data have been acquired. This result enables us to use any member of the family to estimate the correlation matrix, having the same estimates as if the true distribution were used. Likewise, Berkane, Kano, and Bentler (1994) claimed that “there is practically no cost in treating the distribution as multivariate t with specified (possibly small) degrees of freedom” (Berkane et al., 1994, p. 266) when the true distribution is normal and continuous data are observed. It only slightly inflates the variance of the resulting estimator. Thus, it is worth investigating the effect of an underlying elliptical distribution. Because we have only categorical data, the mean and variance are not identified. But then only the correlation coefficient is the parameter of interest, so we can assume and .
For some members of the elliptical distribution family, the marginal distribution is still elliptical but not of the same type (Gómez, Gómez-Villegas, & Marín, 2003). The bivariate uniform distribution, the logistic distribution, and the exponential power distribution possess such properties. The support of the bivariate uniform distribution is not the whole Cartesian plane, whereas the other distributions have the whole Cartesian plane as their support. The exponential power distribution includes the normal distribution and the Laplace distribution as special cases.
Experiment 2: Skew-Normal Distribution
An elliptical distribution is symmetric. Quiroga (1992) reported that kurtosis does not have strong effects on the polychoric correlation but that skewness increases the bias. The above elliptical distributions examine various values of kurtosis. The following distributions introduce nonzero values of skewness.
A natural generalization of a standard normal distribution is the univariate skew-normal distribution proposed by Azzalini (1985) and extended by Azzalini and Valle (1996) to a multivariate skew-normal distribution. The bivariate density function, covariance matrix and marginal density function are shown in Eqs. (8), (9), and (10), respectively. The ranges of the skewness and excess kurtosis are (-0.9953, 0.9953) and [0, 0.8692), respectively (Azzalini & Capitanio, 2014, p. 32). This range is close to the low skewness and low kurtosis case in Flora and Curran (2004). The reader can refer to Azzalini (2005) for an overview of the skew-normal distribution and to Azzalini and Capitanio (2014) for the expressions of skewness and excess kurtosis. Note that the bivariate skew-normal distribution proposed by Azzalini and Valle (1996) is different from the skew-normal distribution in Quiroga (1992). The specification in Azzalini and Valle (1996) is used in the present study for its connection with the skew-t(v) distribution in the next experiment.
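The effect of fitting a normal model to skew-normal latent variables can be approximated by simulation. The sketch below is an assumption-laden stand-in for an exact numerical computation: it draws a large sample from the skew-normal family using the representation Z = delta |U0| + sqrt(1 - delta^2) U, discretizes it at empirical quantiles, and fits the normal assumption in two steps. The values of `delta1`, `delta2`, and `psi`, the sample size, and all function names are hypothetical choices for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import ndtr, ndtri


def normal_cell_probs(a, b, rho):
    """Cell probabilities of a standard bivariate normal with correlation rho."""
    s = np.sqrt(1.0 - rho ** 2)

    def cell(lo_x, hi_x, lo_y, hi_y):
        def f(x):
            phi = np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)
            return phi * (ndtr((hi_y - rho * x) / s) - ndtr((lo_y - rho * x) / s))
        return quad(f, lo_x, hi_x)[0]

    return np.array([[cell(a[i], a[i + 1], b[j], b[j + 1])
                      for j in range(len(b) - 1)] for i in range(len(a) - 1)])


def normal_fit(p):
    """Two-step fit of the normal assumption to a matrix of cell proportions."""
    def cut(m):
        return np.concatenate(([-np.inf], ndtri(np.cumsum(m)[:-1]), [np.inf]))

    a, b = cut(p.sum(axis=1)), cut(p.sum(axis=0))

    def obj(r):
        return -np.sum(p * np.log(np.clip(normal_cell_probs(a, b, r), 1e-12, None)))

    return minimize_scalar(obj, bounds=(-0.99, 0.99), method="bounded").x


def categorize(z, probs):
    """Cut z at its empirical quantiles so the categories have the given proportions."""
    return np.searchsorted(np.quantile(z, np.cumsum(probs)[:-1]), z)


rng = np.random.default_rng(0)
n = 200_000
delta1, delta2, psi = 0.8, 0.8, 0.3          # hypothetical parameter values
u0 = np.abs(rng.standard_normal(n))          # shared half-normal component
u1 = rng.standard_normal(n)
u2 = psi * u1 + np.sqrt(1 - psi ** 2) * rng.standard_normal(n)
x = delta1 * u0 + np.sqrt(1 - delta1 ** 2) * u1
y = delta2 * u0 + np.sqrt(1 - delta2 ** 2) * u2

# True correlation of the latent pair implied by this construction
omega = delta1 * delta2 + psi * np.sqrt((1 - delta1 ** 2) * (1 - delta2 ** 2))
rho_true = (omega - 2 * delta1 * delta2 / np.pi) / np.sqrt(
    (1 - 2 * delta1 ** 2 / np.pi) * (1 - 2 * delta2 ** 2 / np.pi))

u = categorize(x, [0.2, 0.5, 0.3])
v = categorize(y, [0.1, 0.3, 0.6])
p = np.zeros((3, 3))
np.add.at(p, (u, v), 1.0)
p /= n

rho_star = normal_fit(p)                     # approximate limit under normality
print(rho_true, rho_star, 100 * (rho_star - rho_true) / rho_true)
```

The printed relative bias is subject to Monte Carlo error, so it only approximates the asymptotic limit; an exact computation would replace the simulated proportions by the true cell probabilities.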
Experiment 3: Skew-t(v) Distribution
Skewness can also be introduced to the t(v) distribution. Azzalini and Capitanio (2003) proposed a multivariate skew-t(v) distribution whose bivariate density function is
where is the density function of a standard bivariate t distribution with correlation w and degrees of freedom v and is the distribution function of a univariate t distribution with degrees of freedom . The covariance matrix of X and Y is
provided that . Both marginal distributions are univariate skew-t distributions with density function
The reader is directed to Azzalini and Capitanio (2003) for the expressions of skewness and excess kurtosis.
Experiment 4: Other Distributions
The skew-normal and t(v) distributions are special cases of the skew-t(v) distribution family. There are many distributions that are not members of the skew-elliptical distribution family. In addition, the underlying distribution cannot be truly determined from the observed ordinal data. It is therefore important to investigate the effect of distributional misspecification using the distributions that do not belong to the skew-t distribution family. A Pareto distribution is commonly used to model income (Arnold, 2008) and income is commonly used as an indicator of socio-economic status. Mardia (1962) proposed a multivariate Pareto distribution in which the bivariate density function is
and the marginal density function is , with , and . The correlation coefficient between X and Y is 1 / a, which is always positive.
Numerical Design
Three combinations of categories are used. First, both U and V have five categories with cell probabilities (0.1, 0.2, 0.4, 0.2, 0.1) and (0.1, 0.1, 0.3, 0.3, 0.2), respectively. Second, both U and V have three categories with cell probabilities (0.2, 0.5, 0.3) and (0.1, 0.3, 0.6), respectively. Third, U has three categories with cell probabilities (0.2, 0.5, 0.3) and V has five categories with cell probabilities (0.1, 0.1, 0.3, 0.3, 0.2).
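For a given assumed marginal distribution, the limits of the threshold estimates follow from these cell probabilities alone: each limit is the assumed quantile of the cumulative marginal probability. The sketch below illustrates this for the standard normal assumption; the function name and the `quantile` argument are hypothetical, and another assumed marginal could be supplied through its quantile function.

```python
import numpy as np
from scipy.stats import norm


def limiting_thresholds(marginal_probs, quantile=norm.ppf):
    """Interior thresholds implied by marginal cell probabilities."""
    cumulative = np.cumsum(marginal_probs)[:-1]  # drop the final value 1
    return quantile(cumulative)


# The marginal probabilities of the numerical design
thr_u5 = limiting_thresholds([0.1, 0.2, 0.4, 0.2, 0.1])
thr_v5 = limiting_thresholds([0.1, 0.1, 0.3, 0.3, 0.2])
thr_u3 = limiting_thresholds([0.2, 0.5, 0.3])
thr_v3 = limiting_thresholds([0.1, 0.3, 0.6])
print(np.round(thr_u5, 4))
```

The symmetric probabilities (0.1, 0.2, 0.4, 0.2, 0.1) give thresholds that are symmetric about zero, whereas the other marginals place the thresholds asymmetrically.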
In Experiment 1, in the exponential power distribution is . In Experiments 2 and 3, three values of are considered () and 20 evenly spaced values of are considered ranging from 0.5 to 10 for both skew-normal and skew-t(v) distributions. Thus, different combinations of univariate skewness and kurtosis are investigated. In Experiments 1, 2 and 3, the degrees of freedom for the t(v) and skew-t(v) distributions are 4, 6, 8, and 10. In Experiment 4, parameters for the Pareto distribution are .
For all experiments, two values of the true correlation coefficient are used: 0.4 and 0.6. For the purpose of illustration, the assumed underlying distributions are bivariate normal, skew-normal, and t(v) distributions. The normal assumption involves only one unknown parameter of interest, the correlation coefficient. The skew-normal assumption involves three parameters that jointly determine the correlation coefficient. The degrees of freedom of the t(v) assumption are fixed in advance at 4, 6, 8, and 10, so the correlation coefficient is the only parameter of interest. The expressions of the partial derivatives of the skew-normal distribution and t distribution can be found in the supplementary materials.
Numerical Results
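To assess the bias of polychoric correlation estimates, the relative bias (RB) is computed, which is defined as the difference between the asymptotic limit of the estimator and the true correlation, expressed as a percentage of the true correlation. Following the definition in Flora and Curran (2004), an RB whose magnitude is below 5%, between 5% and 10%, and above 10% indicates slight, moderate, and large bias, respectively. To assess the closeness of the fit, the limit value of RMSEA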
is computed. Owing to space limitations, here only some main results are presented and discussed in this subsection. Complete results can be found in the supplementary materials.
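The relative bias calculation is simple arithmetic and can be sketched as follows. This assumes RB is the signed percentage difference between the limit of the estimator and the true correlation, and that the cut-offs are 5% and 10% in magnitude; both are assumptions made here, and the function names are hypothetical.

```python
def relative_bias(rho_limit, rho_true):
    """Signed relative bias in percent (assumed definition)."""
    return 100.0 * (rho_limit - rho_true) / rho_true


def bias_label(rb, slight=5.0, large=10.0):
    """Classify the magnitude of RB using the assumed 5% and 10% cut-offs."""
    size = abs(rb)
    if size < slight:
        return "slight"
    if size <= large:
        return "moderate"
    return "large"


# A limit of 0.39 against a true correlation of 0.4 is a small underestimate;
# a limit of 0.30 is a large one.
rb_small = relative_bias(0.39, 0.4)
rb_big = relative_bias(0.30, 0.4)
print(bias_label(rb_small), bias_label(rb_big))
```

A negative RB under this convention corresponds to underestimation of the true correlation.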
Experiment 1
Figure 1 displays the RB and RMSEA values when the true correlation is 0.4. As expected, assuming a wrong underlying distribution generally biases the polychoric correlation. Observe that the skew-normal distribution contains the normal distribution as a special case. Thus, both the normal and skew-normal assumptions consistently estimate the polychoric correlation when the true underlying distribution is normal. When the true underlying distribution is a normal distribution or a t distribution, all distributional assumptions produce a low RB (less than ). When the true underlying distribution is a uniform distribution or logistic distribution, the normal assumption generally produces a low-biased correlation estimate. However, the t assumption may produce a high RB (Figure 1). The normal and skew-normal assumptions can produce moderately biased polychoric correlations when the underlying distribution is the exponential power distribution with that corresponds to a distribution with high kurtosis (Figure 1). As the kurtosis in the exponential power family decreases, the magnitude of RB concomitantly decreases. When the underlying distribution is non-normal, the normal and skew-normal assumptions may produce different correlation estimates. Thus, the skew-normal distribution adjusts for the underlying non-normality by introducing some degree of skewness. Consequently, the magnitude of RB may become higher but the RMSEA may become lower (Figure 1), which occurs when the number of categories is three for both ordinal variables. The polychoric correlation based on the underlying normal assumption generally underestimates the true correlation coefficient. The t(4) and t(6) assumptions sometimes outperform the normal assumption in Experiment 1.
Experiment 2
As expected, the polychoric correlation is consistently estimated when the true and assumed underlying distributions are both skew-normal (Figure 2). The normal assumption produces negatively biased correlation estimates. It can be moderately or strongly biased unless both and are small. Recall that and control the skewness and kurtosis of the underlying distribution. Small values of and only introduce a small departure from the bivariate normal distribution. All the t distribution assumptions produce similar RBs relative to the normal assumption. Under both the normal and t(v) distribution assumptions, three categories in both ordinal variables generally lead to a higher magnitude of the RB value than five categories in both variables. For example, the RB with five-category variables does not exceed when and , whereas the RB with three-category variables frequently exceeds under the same condition (Figure 2). As the true value of the correlation increases while the other conditions remain the same, the RB generally becomes smaller (see Figures 7, 8, and 9 in the supplementary materials).
In Experiment 2, the RMSEA can be misleading when the number of categories is three in both variables. Consider the normal assumption as an example. The magnitude of RB may exceed 10 when , and both ordinal variables have three categories (Figure 2), whereas the RMSEA is still below 0.05 (Figure 3). The pattern is more dramatic when . The RB is almost when and , but the RMSEA is slightly below 0.05. Thus, the estimated probabilities can be rather close to the true probabilities but the polychoric correlation can be largely biased. This event occurs because RMSEA only measures the closeness between the estimated and true category probabilities, and is not a direct measure of the correlation estimate.
On the other hand, although the skew-normal assumption consistently estimates the polychoric correlation in Experiment 2, numerical difficulties (such as non-convergence and local maxima) were encountered in the present study. The fit function can be fairly flat (see Figure 13 in the supplementary materials for an illustration). A poor choice of starting value for the numerical optimization can lead to these issues. Thus, 20 starting values are employed. As a result, the skew-normal assumption is computationally much more intensive than the normal assumption.
Experiment 3
When the true underlying distribution is a skew-t(4) distribution, the normal and t(v) underlying distributional assumptions lead to a largely biased polychoric correlation, except when both and are small (Figure 4). A small pair of only introduces a small skewness and kurtosis to the underlying distribution, which is similar to a t(4) distribution. As known from Experiment 1, the normal and t(v) underlying distributional assumptions are only slightly biased when the true underlying distribution is a t distribution. The skew-normal assumption may produce only mildly biased correlations when both ordinal variables have three categories and is small (Figure 4). In general, the skew-normal assumption is less biased than the normal and t(v) assumptions. As the degrees of freedom of the skew-t(v) distribution increase, all distributional assumptions become less biased, and the skew-normal assumption in particular is often robust (see the figures in the supplementary materials). This effect is expected because the skew-normal distribution is the limiting case of the skew-t(v) distribution as the degrees of freedom tend to infinity. Nevertheless, the normal and t(v) assumptions still produce moderately or largely biased polychoric correlations. Similar to the conclusions from the underlying skew-normal distribution, a wrong distributional assumption tends to underestimate the polychoric correlation. Three categories in both ordinal variables generally lead to a larger RB in magnitude than five categories in both ordinal variables, and a higher value of the true correlation coefficient generally leads to less biased estimates. As in Experiment 2, the RMSEA can be misleading. A low RMSEA does not necessarily indicate a low RB (e.g. see Figure 26 in the supplementary materials).
Experiment 4
Table 1 shows that all the underlying distributional assumptions tend to be extremely biased when the true underlying distribution is a Pareto distribution. Similar to Experiments 2 and 3, the polychoric correlation tends to be underestimated across all conditions in Experiment 4. The skew-normal assumption produces a lower RB than the normal and t(v) assumptions, although all assumptions generally produce a large RB. The value of RMSEA tends to be small despite the heavily biased polychoric correlation. In particular, the RMSEA produced by the skew-normal assumption is always low. Note that the Pareto distribution is skewed. Thus, the skew-normal distribution assumption mimics the skewed pattern, although the true correlation coefficient is inconsistently estimated.
Table 1.
| Measure | Categories of U | Categories of V | True correlation | Normal | t(4) | t(6) | t(8) | t(10) | Skew-normal |
|---|---|---|---|---|---|---|---|---|---|
| RB | 3 | 3 | 0.4 | 47.18 | 45.98 | 46.29 | 46.47 | 46.59 | 32.89 |
| RB | 3 | 3 | 0.6 | 51.23 | 50.17 | 50.42 | 50.57 | 50.68 | 38.78 |
| RB | 3 | 5 | 0.4 | 38.23 | 37.83 | 37.60 | 37.60 | 37.65 | 32.12 |
| RB | 3 | 5 | 0.6 | 43.50 | 43.12 | 42.92 | 42.93 | 42.97 | 37.99 |
| RB | 5 | 5 | 0.4 | 36.26 | 38.19 | 36.94 | 36.48 | 36.28 | 31.60 |
| RB | 5 | 5 | 0.6 | 41.96 | 43.35 | 42.36 | 42.01 | 41.86 | 37.41 |
| RMSEA | 3 | 3 | 0.4 | 0.04 | 0.07 | 0.05 | 0.05 | 0.04 | 0.00 |
| RMSEA | 3 | 3 | 0.6 | 0.05 | 0.08 | 0.06 | 0.06 | 0.06 | 0.01 |
| RMSEA | 3 | 5 | 0.4 | 0.04 | 0.05 | 0.04 | 0.04 | 0.04 | 0.01 |
| RMSEA | 3 | 5 | 0.6 | 0.05 | 0.06 | 0.05 | 0.05 | 0.05 | 0.01 |
| RMSEA | 5 | 5 | 0.4 | 0.03 | 0.04 | 0.04 | 0.03 | 0.03 | 0.01 |
| RMSEA | 5 | 5 | 0.6 | 0.04 | 0.05 | 0.04 | 0.04 | 0.04 | 0.01 |

The columns Normal through Skew-normal give the assumed distribution.
Asymptotic Variance
In this subsection, the asymptotic variance is illustrated in Figure 5 when the true underlying distribution is a skew-normal distribution and both ordinal variables have five categories. The skew-normal assumption produces a lower asymptotic variance than do the other assumptions of distribution. The normal assumption often produces a similar asymptotic variance to the t assumption when . Otherwise, the normal assumption tends to be slightly less variable than the t assumption. However, Figure 5 shows that the asymptotic variances under the skew-normal assumption can be substantially higher than the asymptotic variances of other assumptions of distribution when both ordinal variables have three categories. Recall that the normal assumption is asymptotically biased (Figure 2); however, a lower variance may lead to a lower mean squared error than the skew-normal assumption. Thus, although the skew-normal assumption is asymptotically unbiased, the correlation estimate is likely to have a larger departure from the true value than the normal assumption because of the large variation.
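The trade-off between bias and variance described above can be made explicit with a short derivation. The symbols below are introduced here for illustration only: $\rho_0$ is the true correlation, $\rho^*$ the limit of the estimator under a given assumption, $\sigma^2$ the asymptotic variance from Theorem 2, and $n$ the sample size. For large $n$,

```latex
\mathrm{MSE}(\hat{\rho}) \;\approx\; (\rho^* - \rho_0)^2 + \frac{\sigma^2}{n}.
```

Suppose the normal assumption has asymptotic bias $b_N = \rho^*_N - \rho_0$ and asymptotic variance $\sigma_N^2$, while the skew-normal assumption is asymptotically unbiased with asymptotic variance $\sigma_S^2 > \sigma_N^2$. Then the normal assumption has the smaller approximate mean squared error exactly when

```latex
b_N^2 + \frac{\sigma_N^2}{n} < \frac{\sigma_S^2}{n}
\quad\Longleftrightarrow\quad
n < \frac{\sigma_S^2 - \sigma_N^2}{b_N^2}.
```

Under this approximation, the biased normal assumption is preferable only up to a finite sample size, which grows with the variance gap and shrinks with the squared bias.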
Conclusion and Discussion
In this paper, we study the robustness of polychoric correlation estimation against misspecification of the underlying distribution. The asymptotic polychoric correlation and its asymptotic (co)variance are derived under conditions on the support of the assumed distributions. Unlike in the continuous case, the correlation structure is no longer asymptotically unbiased. Although the bias is sometimes small, a large bias can occur, especially when the true underlying distribution is skewed but a bivariate normal or t distribution is assumed. The numerical examples show that the skew-normal assumption performs as well as the conventional normal assumption when the true underlying distribution is a t distribution and improves on the normal assumption when skewness exists.
Both Flora and Curran (2004) and Quiroga (1992) found that the normal assumption is robust against non-normal data generated by the Fleishman–Vale–Maurelli method. For example, the largest skewness and kurtosis considered in Flora and Curran (2004) are 1.25 and 3.75, respectively. The RB is lower than 10 in most conditions and is lower than 5 when the number of categories is five and (Flora & Curran, 2004, Table 2). Our results show that the polychoric correlation can be substantially underestimated under the normal assumption when the true underlying distribution is a skew-normal distribution, whose skewness and kurtosis are bounded by small values. The bias becomes even higher when the true underlying distribution is a skew-t(4) or a Pareto distribution, in which cases the kurtosis is not well defined. Although the skew-normal assumption is also sometimes heavily biased, it greatly improves on the conventional normal assumption. Still, the skew-normal assumption has a much higher variance than the normal assumption when the number of categories is small, so its estimates are more volatile. More studies are needed to investigate small-sample volatility in order to provide suggestions for practice.
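The remark that the skewness and kurtosis of a skew-normal distribution are bounded can be checked with the standard moment formulas for the univariate skew-normal distribution (Azzalini, 1985). The sketch below is a univariate illustration only, whereas the study itself concerns bivariate underlying distributions; the function name `skew_normal_shape` and the shape symbol `alpha` are introduced here for illustration.

```python
import math


def skew_normal_shape(alpha):
    """Skewness and excess kurtosis of the univariate skew-normal
    distribution with shape parameter alpha (location and scale do not
    affect either quantity)."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    mean_z = delta * math.sqrt(2.0 / math.pi)  # mean of the standardised form
    var_z = 1.0 - mean_z ** 2                  # its variance
    skewness = (4.0 - math.pi) / 2.0 * mean_z ** 3 / var_z ** 1.5
    excess_kurtosis = 2.0 * (math.pi - 3.0) * mean_z ** 4 / var_z ** 2
    return skewness, excess_kurtosis


# Both measures grow with |alpha| but level off as alpha becomes large.
for alpha in (0.0, 1.0, 5.0, 1e6):
    skewness, excess_kurtosis = skew_normal_shape(alpha)
    print(alpha, round(skewness, 4), round(excess_kurtosis, 4))
```

As the shape parameter grows, the skewness approaches about 0.995 and the excess kurtosis about 0.869, so even the most extreme skew-normal distribution is only moderately non-normal by these two measures.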
Lee and Lam (1988) suggested using the correct underlying distributional assumption to estimate the polychoric correlation more accurately if the ordinal data are asymmetric. Because ordinal data carry less information than continuous data, the underlying distribution cannot be inspected visually. If a test of the underlying distribution rejects, the distributional assumption is questionable, and an alternative assumption should be used. In practice, several underlying distributional assumptions can be tested and the most plausible one chosen.
The normal distribution is a special case of the skew-normal distribution. We have shown that both assumptions yield consistent estimates of the polychoric correlation when the true distribution is normal. Thus, the skew-normal assumption, which is able to model skewness and kurtosis, is a natural extension of the conventional normal assumption and frequently outperforms it. However, three parameters are estimated simultaneously under the skew-normal assumption. Because the thresholds are determined through , and , the gradient and Hessian matrix involve derivatives of the thresholds with respect to , and . Accordingly, estimation is computationally more difficult than under the normal assumption. In addition, non-convergence and local optima were encountered in the present study, and multiple starting values were used to obtain the correlation estimate.
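The special-case relationship can be verified numerically in the univariate setting. The sketch below uses the standard univariate skew-normal density 2φ(x)Φ(αx) of Azzalini (1985) as an illustration; the study itself works with bivariate distributions, and the function name `skew_normal_pdf` is introduced here only for this example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def skew_normal_pdf(x, alpha):
    """Density of the standard univariate skew-normal: 2 * phi(x) * Phi(alpha * x)."""
    return 2.0 * STANDARD_NORMAL.pdf(x) * STANDARD_NORMAL.cdf(alpha * x)


# With alpha = 0 the skewing factor Phi(0) equals 1/2, so the density
# collapses to the standard normal density.
for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    difference = skew_normal_pdf(x, 0.0) - STANDARD_NORMAL.pdf(x)
    print(x, difference)

# A positive alpha moves probability mass to the right of zero.
print(skew_normal_pdf(1.0, 3.0) > skew_normal_pdf(-1.0, 3.0))
```

Setting the shape parameter to zero recovers the normal density, which is why the skew-normal assumption nests the conventional normal assumption at the cost of the additional parameters discussed above.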
Although only the t and skew-normal assumptions are illustrated as non-normal alternatives in the present study, other distributions that are differentiable with respect to the unknown parameters can be used to estimate the correlation coefficient with the aid of Theorem 1 or Eq. (6). The asymptotic variance and covariance of the estimator can be estimated using Theorem 2 or Eq. (7). For example, the logistic distribution can be assumed in the two-step estimation and the skew-t distribution can be assumed in the variant of the two-step estimation. It will be of interest to derive analytical expressions for the skew-elliptical distribution family that consists of the skew-normal and skew-t distributions. Our numerical results demonstrate that the skew-normal assumption generally improves on the conventional normal assumption in the idealized case where n is infinite. It would be worthwhile to conduct a simulation study to investigate the small-sample bias in estimating the correlation coefficient and its effects on the bias of parameters in a SEM with ordinal data.
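As a rough illustration of how the first step of a two-step procedure changes with the assumed distribution, the sketch below estimates thresholds from marginal cumulative proportions using a quantile function supplied by the caller. The counts are hypothetical, the helper names are introduced here, and the logistic quantile uses the standard logistic distribution with scale 1, which is an assumption of this sketch; a standardised logistic distribution may be preferable in practice. The second step, estimating the correlation given the thresholds, is not shown.

```python
import math
from statistics import NormalDist


def estimate_thresholds(category_counts, quantile):
    """First step of a two-step procedure: map marginal cumulative
    proportions to thresholds through the assumed quantile function.

    Cumulative proportions must lie strictly between 0 and 1, so the
    leading categories must not all be empty.
    """
    total = sum(category_counts)
    cumulative = 0
    thresholds = []
    # The last category has no finite upper threshold, so it is skipped.
    for count in category_counts[:-1]:
        cumulative += count
        thresholds.append(quantile(cumulative / total))
    return thresholds


def logistic_quantile(p):
    """Quantile function of the standard logistic distribution (scale 1)."""
    return math.log(p / (1.0 - p))


counts = [20, 30, 50]  # hypothetical marginal counts for a three-category item
print(estimate_thresholds(counts, NormalDist().inv_cdf))
print(estimate_thresholds(counts, logistic_quantile))
```

Passing the quantile function as an argument keeps the threshold step unchanged across distributional assumptions; only the mapping from cumulative proportions to the latent scale differs.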
Acknowledgments
The research reported in this article has been supported by the Swedish Research Council (VR) under the program: Structural Equation Modeling with Ordinal Variables, 421-2011-1727.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Contributor Information
Shaobo Jin, Phone: +46184711038, Email: shaobo.jin@statistik.uu.se.
Fan Yang-Wallentin, Phone: +46184715158, Email: fan.yang@statistik.uu.se.
References
- Arnold BC. Pareto and generalized Pareto distributions. In: Chotikapanich D, editor. Modeling income distributions and Lorenz curves. New York: Springer; 2008. pp. 119–145.
- Azevedo CLN, Bolfarine H, Andrade DF. Bayesian inference for a skew-normal IRT model under the centred parameterization. Computational Statistics & Data Analysis. 2011;55:353–365. doi: 10.1016/j.csda.2010.05.003.
- Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics. 1985;12:171–178.
- Azzalini A. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics. 2005;32:159–188. doi: 10.1111/j.1467-9469.2005.00426.x.
- Azzalini A, Capitanio A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65:367–389. doi: 10.1111/1467-9868.00391.
- Azzalini A, Capitanio A. The skew-normal and related families. Cambridge: Cambridge University Press; 2014.
- Azzalini A, Valle AD. The multivariate skew-normal distribution. Biometrika. 1996;83:715–726. doi: 10.1093/biomet/83.4.715.
- Balakrishnan N, Lai CD. Continuous bivariate distributions. 2nd ed. New York, NY: Springer; 2009.
- Bazán JL, Branco MD, Bolfarine H. A skew item response model. Bayesian Analysis. 2006;1:861–892. doi: 10.1214/06-BA128.
- Berkane M, Kano Y, Bentler PM. Pseudo maximum likelihood estimation in elliptical theory: Effects of misspecification. Computational Statistics & Data Analysis. 1994;18:255–267. doi: 10.1016/0167-9473(94)90175-9.
- Bolfarine H, Bazán JL. Bayesian estimation of the logistic positive exponent IRT model. Journal of Educational and Behavioral Statistics. 2010;35:693–713. doi: 10.3102/1076998610375834.
- Carolina Population Center. National Longitudinal Study of Adolescent to Adult Health (Add Health) [Data file and code book]. 2009. http://www.cpc.unc.edu/projects/addhealth
- Chateau D, Metge C, Prior H, Soodeen RA. Learning from the census: The Socio-economic Factor Index (SEFI) and health outcomes in Manitoba. Canadian Journal of Public Health. 2012;4:23–27. doi: 10.1007/BF03403825.
- Christoffersson A, Gunsjö A. A short note on the estimation of the asymptotic covariance matrix for polychoric correlations. Psychometrika. 1996;61:173–175. doi: 10.1007/BF02296965.
- Fang KT, Kotz S, Ng KW. Symmetric multivariate and related distributions. New York, NY: Chapman and Hall; 1990.
- Ferguson TS. A course in large sample theory. New York, NY: Chapman and Hall; 1996.
- Fleishman AI. A method for simulating non-normal distributions. Psychometrika. 1978;43:521–532. doi: 10.1007/BF02293811.
- Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods. 2004;9:466–491. doi: 10.1037/1082-989X.9.4.466.
- Gómez E, Gómez-Villegas MA, Marín JM. A survey on continuous elliptical vector distributions. Revista Matemática Complutense. 2003;16:345–361. doi: 10.5209/rev_REMA.2003.v16.n1.16889.
- Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust statistics: The approach based on influence functions. New York, NY: Wiley; 1986.
- Hodge RW, Treiman DJ. Social participation and social status. American Sociological Review. 1968;33:722–740. doi: 10.2307/2092883.
- Jöreskog KG. On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika. 1994;59:381–389. doi: 10.1007/BF02296131.
- Jöreskog KG, Sörbom D. LISREL 8: User’s reference guide. Chicago, IL: Scientific Software International; 1996.
- Kano Y, Berkane M, Bentler PM. Statistical inference based on pseudo-maximum likelihood estimators in elliptical populations. Journal of the American Statistical Association. 1993;88:135–143.
- Kelker D. Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhyā: The Indian Journal of Statistics, Series A. 1970;32:419–430.
- Kullback S. Information theory and statistics. New York, NY: Wiley; 1959.
- Kullback S, Leibler RA. On information and sufficiency. Annals of Mathematical Statistics. 1951;22:79–86. doi: 10.1214/aoms/1177729694.
- Lee SY, Lam ML. Estimation of polychoric correlation with elliptical latent variables. Journal of Statistical Computation and Simulation. 1988;30:173–188. doi: 10.1080/00949658808811095.
- Lucke JF. Positive trait item response models. In: Millsap RE, van der Ark LA, Bolt DM, Woods CM, editors. New developments in quantitative psychology: Presentations from the 77th Annual Psychometric Society Meeting. Vol. 66. New York: Springer; 2014. pp. 199–213.
- Mardia KV. Multivariate Pareto distributions. The Annals of Mathematical Statistics. 1962;33:1008–1015. doi: 10.1214/aoms/1177704468.
- Maydeu-Olivares A. Limited information estimation and testing of discretized multivariate normal structural models. Psychometrika. 2006;71:57–77. doi: 10.1007/s11336-005-0773-4.
- Maydeu-Olivares A, Forero CA, Gallardo-Pujol D, Renom J. Testing categorized bivariate normality with two-stage polychoric correlation estimates. Methodology. 2009;5:131–136. doi: 10.1027/1614-2241.5.4.131.
- Maydeu-Olivares A, Joe H. Limited- and full-information estimation and goodness-of-fit testing in 2^n contingency tables: A unified framework. Journal of the American Statistical Association. 2005;100:1009–1020. doi: 10.1198/016214504000002069.
- Maydeu-Olivares A, Joe H. Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika. 2006;71:713–732. doi: 10.1007/s11336-005-1295-9.
- Molenaar D. Heteroscedastic latent trait models for dichotomous data. Psychometrika. 2015;80:625–644. doi: 10.1007/s11336-014-9406-0.
- Molenaar D, Dolan CV, de Boeck P. The heteroscedastic graded response model with a skewed latent trait: Testing statistical and substantive hypotheses related to skewed item category functions. Psychometrika. 2012;77:455–478. doi: 10.1007/s11336-012-9273-5.
- Olsson U. Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika. 1979;44:443–460. doi: 10.1007/BF02296207.
- Quiroga AM. Studies of the polychoric correlation and other correlation measures for ordinal variables [unpublished doctoral dissertation]. Uppsala: Uppsala University; 1992.
- Santos JRS, Azevedo CLN, Bolfarine H. A multiple group item response theory model with centered skew-normal latent trait distributions under a Bayesian framework. Journal of Applied Statistics. 2013;40:2129–2149. doi: 10.1080/02664763.2013.807331.
- Scharoun-Lee M, Adair LS, Kaufman JS, Gordon-Larsen P. Obesity, race/ethnicity and the multiple dimensions of socioeconomic status during the transition to adulthood: A factor analysis approach. Social Science & Medicine. 2009;68:708–716. doi: 10.1016/j.socscimed.2008.12.009.
- Scheier MF, Carver CS. Optimism, coping, and health: Assessment and implications of generalized outcome expectancies. Health Psychology. 1985;4:219–247. doi: 10.1037/0278-6133.4.3.219.
- Vale CD, Maurelli VA. Simulating multivariate nonnormal distributions. Psychometrika. 1983;48:465–471. doi: 10.1007/BF02293687.
- White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50:1–25. doi: 10.2307/1912526.
- Woods CM, Thissen D. Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika. 2006;71:281–301. doi: 10.1007/s11336-004-1175-8.