Abstract
Categorical marginal models (CMMs) are flexible tools for modelling dependent or clustered categorical data, when the dependencies themselves are not of interest. A major limitation of maximum likelihood (ML) estimation of CMMs is that the size of the contingency table increases exponentially with the number of variables, so even for a moderate number of variables, say between 10 and 20, ML estimation can become computationally infeasible. An alternative method, which retains the optimal asymptotic efficiency of ML, is maximum empirical likelihood (MEL) estimation. However, we show that MEL tends to break down for large, sparse contingency tables. As a solution, we propose a new method, which we call maximum augmented empirical likelihood (MAEL) estimation and which involves augmentation of the empirical likelihood support with a number of well-chosen cells. Simulation results show good finite sample performance for very large contingency tables.
Keywords: categorical marginal model, Cronbach’s alpha, large categorical data sets, marginal homogeneity, maximum empirical likelihood estimation, maximum likelihood estimation, scalability coefficients
Categorical marginal models (CMMs; Bergsma et al., 2009; also see, e.g., Bergsma, 1997; Bergsma & Rudas, 2002; Bartolucci et al., 2007; Colombi & Forcina, 2001; Evans & Forcina, 2013; Lang & Agresti, 1994; Lang, 1996; Molenberghs & Lesaffre, 1999; Rudas & Bergsma, 2023) are flexible tools to model location, spread, and association in dependent or clustered categorical data, when the dependence itself is not of interest. CMMs require data in a table format for input; that is, for a dataset with N respondents and J categorical variables, CMMs require a (vectorized) J-variate contingency table, where each cell corresponds to a response pattern, and the frequencies within the cells represent the observed frequencies of each response pattern. The only assumption of the CMMs under consideration is that the cell frequencies in the contingency table follow a multinomial distribution, rendering a very flexible method.
CMMs can be a valuable psychometric tool since they allow for null-hypothesis significance testing (NHST) of complex coefficients without the need to specify a parametric model or impose additional assumptions. In psychometrics, NHST often occurs under the assumption of a parametric model. For example, testing measurement invariance across several groups is typically done under a structural equation model (e.g., Cheung & Rensvold, 2002). However, rather than testing H0 (the null-hypothesis of interest), we implicitly test H0*: H0 plus the assumption that the structural equation model fits the data. Rejecting H0* does not provide information about H0 because H0* should be rejected either when H0 is false or when the structural equation model does not fit the data (cf. Jorgensen et al., 2017). In other fields of psychometrics (e.g., nonparametric modeling, classical test theory) and applied statistics, there is no comprehensive parametric modeling framework. In such situations, it becomes particularly valuable if the assumptions required for NHST are easily satisfied, ensuring that the null hypothesis of interest is not excessively confounded by data failing to meet the assumptions, thus maintaining a close approximation between H0 and H0*. The CMM assumption that cell frequencies follow a multinomial distribution is very lenient, implying that every response pattern should, in principle, be observable.
The process of relaxing assumptions for NHST can be a time-consuming endeavor spanning several years. For instance, in the case of NHST for Cronbach’s alpha, there exists a history of research papers progressively relaxing the required assumptions. Feldt derived tests for three types of null-hypotheses on Cronbach’s alpha: alpha equals some criterion value (Feldt, 1969), alpha is equal across groups (Feldt, 1965), and alpha is equal across different measurements (Feldt, 1980). Feldt assumed that alpha asymptotically follows an F distribution. This assumption was subsequently relaxed by Van Zyl et al. (2000), who derived the asymptotic distribution of alpha without restricting the covariances; Maydeu-Olivares et al. (2007), who relaxed the assumptions of Feldt’s first hypothesis by deriving asymptotically distribution-free interval estimates for alpha; Maydeu-Olivares et al. (2010), who proposed testing Feldt’s hypotheses in a structural equation modeling framework; and ultimately, Kuijpers et al. (2013), who proposed using CMMs for testing Feldt’s hypotheses. Each successive paper demonstrated significant enhancements in the properties of NHST for Cronbach’s alpha when compared to its predecessors.
In some cases, no hypothesis tests are available, leaving CMMs as a possible option to derive hypothesis tests. For example, Van der Ark et al. (2008) used CMMs for developing NHST for Mokken’s (1971) scalability coefficients, which allows testing scalability coefficients for item pairs, individual items, and scales across groups and across measurement occasions. Finally, we would like to note that CMMs can be used in conjunction with latent variable models, although this needs further development. We refer to Bergsma et al. (2009) for other applications of CMMs, and to Bergsma et al. (2009, 2013), who introduced CMMs with latent variables.
CMMs can be estimated using the maximum likelihood (ML) method, which has many favorable properties, including asymptotic efficiency. A serious limitation of the ML method is that for large contingency tables estimation is infeasible, as ML requires the computation of an expected frequency for each cell in the contingency table. This curse of dimensionality may be an important reason why CMMs have failed to become popular in psychometrics. Most psychological and educational tests consist of many variables (usually referred to as items) yielding an extremely large number of possible response patterns and, therefore, extremely large contingency tables. For example, Raven’s Advanced Progressive Matrices (Raven et al., 2003), measuring general intelligence, consists of 48 binary items, which yields a contingency table of 2^48 ≈ 2.8 × 10^14 cells; and the personality inventory NEO-PI-R (Costa & McCrae, 2008), measuring five personality traits, consists of 48 five-category items per trait, which yields a contingency table of 5^48 ≈ 3.6 × 10^33 cells. Lloyd (2000) estimated that if every particle in the universe could be used as part of a huge computer, it could store approximately 10^90 bits. Hence, for contingency tables based on psychological and educational tests, the required computer capacity easily exceeds the ultimate physical limits of computation, whereas the ML estimation procedure to estimate CMMs implemented in the R-package cmm (Bergsma & Van der Ark, 2023) cannot handle more than a few million cells.
In this paper, we give a new adaptation of the ML estimation procedure to solve the above problem. Although there are alternative estimation procedures that may be used to estimate CMMs, we preferred to stay within an ML framework, as ML guarantees asymptotic efficiency, whereas alternative estimation methods for contingency tables, such as generalized estimating equations (GEEs; e.g., Qaqish & Liang, 1992) and composite likelihood (e.g., Varin et al., 2011), are not asymptotically efficient, and weighted least squares (Grizzle et al., 1969; a.k.a. the GSK method) is sensitive to sparsity in the marginal distribution (cf. Rudas & Bergsma, 2023). In addition, an adaptation of the ML approach fits easily into the existing software.
Initially, we considered the empirical likelihood method (Owen, 2001; Qin & Lawless, 1994), a data-driven, nonparametric estimation method. The core idea behind the empirical likelihood method is to construct a likelihood function directly from the observed data, without assuming any specific underlying probability distribution; that is, given vector-valued data points, an empirical likelihood is the likelihood of a probability distribution whose support is the set of observed data points (Owen, 2001). In the context of CMMs, the empirical likelihood method involves constructing the likelihood solely from cells with nonzero frequencies, while regarding cells with zero frequency as structural zeroes and setting their estimated probability to zero. Given that the number of cells with nonzero frequencies cannot exceed the sample size, and in the case of psychological and educational test data, the sample size rarely exceeds 10,000, the empirical likelihood method serves as a computationally feasible alternative to ML. We abbreviate the method of maximizing the empirical likelihood subject to model constraints by MEL.
Unfortunately, the support belonging to the empirical likelihood may be too small (i) to estimate the parameters of a CMM, or, even if this can be done, (ii) to estimate the asymptotic covariance matrix of the ML estimators of the parameters of the CMM. We will refer to these two problems as the first- and second-order estimation problems, respectively (see Appendix A for more details). The first problem has also been called the empty set problem (Grendár & Judge, 2009). As far as we are aware, the second problem has not yet been described in the literature. The solution to these problems which we propose in this paper is to augment the empirical likelihood support with a number of well-chosen points, and we will refer to the method of maximizing the resulting empirical likelihood as maximum augmented empirical likelihood (MAEL). Note that as the sample size goes to infinity, assuming no structural zeroes, the probability that all cells in a contingency table will have a positive count will go to 1, so for categorical data MEL, MAEL and ML are asymptotically equivalent.
The reason why MEL and MAEL estimators work asymptotically (as N goes to infinity) is that, with probability tending to 1, they are equivalent to the ML estimator. This justifies testing goodness of fit and making inferences for parameters in the same way as we would with ML. Two related methods, called adjusted empirical likelihood (Chen et al., 2008) and balanced augmented empirical likelihood (Emerson & Owen, 2009; also see Nguyen et al., 2015; Xia & Liu, 2019), have been considered for continuous data. These methods augment the data set with one or two additional observations. In contrast, our methodology consists of only augmenting the support of distributions corresponding to the empirical likelihood with additional points, but without adding any observations to the data.
The remainder of the paper is organized as follows. In Sect. 1, we give a brief overview of and notation for CMMs. In Sect. 2, we describe ML and MEL estimation for CMMs and introduce MAEL estimation. In Sect. 3, we present two simulation studies. Study 1 compares the convergence rate and computation time of ML, MEL, and MAEL estimation for small contingency tables, and Study 2 investigates the Type I error rate of CMMs using MAEL estimation for small and large contingency tables, and bias and variance of the model parameters. In Sect. 4, we briefly discuss the advantages and disadvantages of MAEL estimation in relation to other, non-likelihood-based estimation procedures. In Appendix A, we describe the first- and second-order estimation problems in some generality, whereas Appendix B gives details of the estimation algorithm used.
CMMs
Consider the categorical variables X1, ..., XJ. Let the data be N i.i.d. data points, where the ith data point consists of the scores of the ith respondent on the J variables. The data can be collected in a J-way contingency table of observed frequencies with L cells, one cell per response pattern. The observed frequency of a response pattern on a set of variables is denoted by n, indexed by the variables involved and their scores. The observed frequencies in the contingency table are collected in an L × 1 vector n, arranged in lexicographical order; that is, the score on the last variable of the corresponding response pattern changes fastest and the score on the first variable changes slowest. As an example, Eq. 1 shows the vector containing the observed frequencies of the response patterns pertaining to the scores of respondents on three binary variables, a, b, and c:
| 1 |
If it is clear which variables are involved, then the superscript may be omitted. Marginal frequencies are denoted by removing the appropriate variable(s) from the subscript and score(s) from the superscript. In some formulas, the subscript i in ni is used as an index; for example, Σi ni means the sum over all elements of n.
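To make the lexicographic arrangement concrete, the following Python sketch builds the frequency vector n from raw response patterns. The function name and the data are ours, purely for illustration:

```python
import itertools
import numpy as np

def vectorize_table(data, n_cats):
    """Collect frequencies of J categorical variables in a vector arranged
    in lexicographic order (the last variable's score changes fastest)."""
    patterns = list(itertools.product(*[range(c) for c in n_cats]))
    index = {p: i for i, p in enumerate(patterns)}
    n = np.zeros(len(patterns), dtype=int)
    for row in data:
        n[index[tuple(row)]] += 1          # one count per response pattern
    return n

# Ten hypothetical respondents scored on J = 3 binary variables a, b, c
data = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1),
        (1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0)]
n = vectorize_table(data, n_cats=[2, 2, 2])   # 2 * 2 * 2 = 8 cells, sums to N = 10
```

The first cell of n counts pattern (0, 0, 0), the second counts (0, 0, 1), and so on, matching the ordering described above.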
The probability that a randomly drawn respondent has a particular response pattern, given that the CMM of interest is true, is denoted by the corresponding element of π. Assuming a fixed sample size N, the vector of expected frequencies is μ = Nπ. The expected frequencies and probabilities are collected in vectors μ and π, respectively, in the same manner as the observed frequencies were collected in n. ML estimates of μ and π are denoted by μ̂ and π̂, respectively. Without any constraints imposed upon the data, μ̂ = n and π̂ = n/N.
Let A be a matrix of zeroes and ones, so that Aπ consists of the relevant marginals from the contingency table. A CMM is defined by constraints of the form
| 2 |
where g is an appropriate function, X is a design matrix of full column rank, and β is a vector of parameters. For estimation purposes, parameter β is eliminated from the equation as follows. Let B be the orthogonal complement of the column space spanned by the columns of X (i.e., B'X = 0 and the concatenated matrix (X, B) is square and non-singular). By pre-multiplying both sides of Eq. 2 by B', the CMM is written as a set of constraints:
| 3 |
Note that parameter β can be obtained from Eq. 2 by
| 4 |
The constraint formulation (cf. Eq. 3) is computationally convenient since it allows the Lagrange multiplier technique to be used, and asymptotic theory has been developed using this formulation (Aitchison & Silvey, 1958; Lang, 2005). In addition, the parameter formulation (Eq. 2) is not always possible: if the constraint matrix B' is square and of full rank, then X, the orthogonal complement of B, does not exist. Therefore, the parameter formulation of CMMs will be disregarded from here on.
For notational convenience, we can replace B'g(Aπ) by h(π). So, the shortest notation for a CMM is
| 5 |
Let D be the number of constraints in Eq. 5; that is, the length of vector h(π). The fit of the CMM can be investigated by comparing n and the ML estimate under the model, μ̂, using a likelihood ratio test statistic (G²) or Pearson’s Chi-square test statistic (X²), which have an asymptotic Chi-square distribution with D degrees of freedom if the model is true. Example 1 shows a simple CMM following the build-up in Eqs. 2, 3, 4, and 5, whereas Example 2 shows a CMM that has been used in psychometrics.
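Both fit statistics can be computed directly from the observed and expected frequencies. The following Python sketch (our own helper, not part of the cmm package) implements the standard formulas G² = 2 Σ n log(n/μ̂) and X² = Σ (n − μ̂)²/μ̂:

```python
import numpy as np

def gof_statistics(n, mu):
    """Likelihood ratio statistic G2 and Pearson's X2 for observed
    frequencies n against expected frequencies mu under the model."""
    n = np.asarray(n, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pos = n > 0                                   # 0 * log(0) taken as 0
    G2 = 2.0 * np.sum(n[pos] * np.log(n[pos] / mu[pos]))
    X2 = np.sum((n - mu) ** 2 / mu)
    return G2, X2

# Hypothetical observed and expected frequencies for a four-cell table
G2, X2 = gof_statistics([25, 25, 25, 25], [20, 30, 20, 30])
```

When the expected frequencies equal the observed frequencies, both statistics are zero, reflecting a perfect fit.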
Example 1
Consider n in Eq. 1. Suppose that we want to fit the CMM that prescribes marginal homogeneity: P(a = 1) = P(b = 1) = P(c = 1) (and consequently, P(a = 0) = P(b = 0) = P(c = 0)). First, pre-multiplying π by design matrix A (Eq. 2) yields the required margins; that is,
| 6 |
Function g (Eq. 2) is the identity function, so g(Aπ) = Aπ. To write the CMM as a set of constraints, Aπ is pre-multiplied by constraint matrix B' (cf. Eq. 3, left-hand side) and set to zero, yielding
| 7 |
As the column vector X = (1, 1, 1)' is the orthogonal complement of B, with B'X = 0, parameter β (which in this case is 1-dimensional) can be obtained by Eq. 4; that is,
| 8 |
Conventional short notation (Eq. 5) is obtained by letting h(π) = B'Aπ; that is,
| 9 |
The vector of expected frequencies that is closest (in an ML sense) to n (Eq. 1) and meets the requirement of Eq. 9 is
| 10 |
Comparing n given in Eq. 1 and μ̂ given in Eq. 10 yields a nonsignificant likelihood ratio statistic (df = 2); the hypothesis of marginal homogeneity should not be rejected.
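The construction in Example 1 can be verified numerically. The following Python sketch builds A and B' for three binary variables and checks that a distribution with homogeneous margins satisfies B'Aπ = 0; the uniform distribution is used purely as an illustration:

```python
import numpy as np

# Cells in lexicographic order for binary variables a, b, c:
# (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)
patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# Marginal matrix A: its three rows pick out P(a=1), P(b=1), and P(c=1)
A = np.array([[p[j] for p in patterns] for j in range(3)], dtype=float)

# Constraint matrix B', the orthogonal complement of the design matrix X = (1,1,1)'
Bt = np.array([[1.0, -1.0, 0.0],
               [0.0, 1.0, -1.0]])

pi = np.full(8, 1 / 8)   # uniform distribution: all three margins equal 0.5
h = Bt @ A @ pi          # h(pi) = B' A pi; the zero vector iff marginal homogeneity
```

Any distribution violating marginal homogeneity produces a nonzero h, which is exactly what the test statistics above detect.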
Example 2
Item-scalability coefficient Hj is used in Mokken scale analysis (e.g., Mokken, 1971; Sijtsma & Van der Ark, 2017) and expresses the strength of the relationship between item j and the other items in the test, comparable with a regression coefficient in a regression model. One of the criteria of a Mokken scale is that all coefficients Hj exceed some positive lower bound c (Sijtsma & Molenaar, 2002). Hence, a relevant question is whether all Hj meet this criterion. Item-scalability coefficients for dichotomous items (Mokken, 1971, p. 151) are defined as
| 11 |
Consider the observed frequencies in Eq. 1. Let H = (Ha, Hb, Hc)' be a vector containing the item-scalability coefficients of items a, b, and c. Equation 11 shows that H defines a CMM (Eq. 5); we refer to Van der Ark et al. (2008) for computational details.
The sample values of H can be computed from the vector of observed frequencies in Eq. 1. Fitting the CMM that all item-scalability coefficients equal 0.3 to the data in Eq. 1 yields a significant test result, so this hypothesis should be rejected.
Estimation of CMMs
ML and MEL Estimation
Assuming that the frequency vector n follows a multinomial distribution, the likelihood function is
| 12 |
The maximum likelihood estimate π̂ maximizes Eq. 12 subject to the model constraint
| 13 |
and the multinomial constraint
| 14 |
In Appendix B, an algorithm is presented for finding π̂ subject to Eqs. 13 and 14 and the structural-zero constraint
| 15 |
MEL estimation can be done using the same algorithm as ML estimation, with the probabilities of the cells corresponding to zero observed frequencies set to zero (Eq. 15). Example 3 shows an illustration of MEL estimation.
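The constrained maximization can be sketched with a general-purpose optimizer. The snippet below is not the algorithm of Appendix B; it is a minimal Python illustration, with hypothetical frequencies, of maximizing the multinomial log-likelihood of Eq. 12 subject to the constraints of Eqs. 13 and 14 for the marginal homogeneity model of Example 1:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical observed frequencies for the 8 cells (lexicographic order)
n = np.array([20, 10, 8, 12, 6, 14, 10, 20], dtype=float)
patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
A = np.array([[p[j] for p in patterns] for j in range(3)], dtype=float)
Bt = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])

def neg_loglik(p):
    # Kernel of the multinomial log-likelihood (Eq. 12), negated for minimization
    return -np.sum(n * np.log(np.clip(p, 1e-12, None)))

constraints = [
    {'type': 'eq', 'fun': lambda p: p.sum() - 1.0},  # multinomial constraint (Eq. 14)
    {'type': 'eq', 'fun': lambda p: Bt @ A @ p},     # model constraint h(p) = 0 (Eq. 13)
]
res = minimize(neg_loglik, n / n.sum(), method='SLSQP',
               bounds=[(0.0, 1.0)] * 8, constraints=constraints)
p_hat = res.x
margins = A @ p_hat   # equal under the fitted marginal homogeneity model
```

In practice, the Lagrange multiplier algorithm of Appendix B is preferable; this sketch only shows that the estimation problem is an ordinary constrained optimization over π.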
Example 3
This example illustrates MEL estimation of the CMM in Example 1. For the vector of observed frequencies in Eq. 1,
| 16 |
In Eq. 16, the probability of the cell with a zero observed frequency is fixed to zero and not considered in the estimation procedure. The CMM in Eq. 9 under MEL reduces to
| 17 |
Comparing n given in Eq. 16 and the MEL estimate μ̂ yields the same fit as in Example 1. In this case, ML estimation (see Example 1) and MEL estimation provide identical expected frequencies and model fit, but this is not true in general.
The First- and Second-Order Estimation Problems for CMMs
Unfortunately, the support belonging to the empirical likelihood may be too small for the CMM to be estimated and to do inference. We identify two problems, which are described more formally and in some more generality in Appendix A. We say that the first-order estimation problem occurs if the equation h(π) = 0 does not have any solutions on the empirical likelihood support. This is also known as the empty set problem (Grendár & Judge, 2009). The second-order estimation problem occurs if the empirical likelihood support is too small to be able to estimate the covariance matrix of the estimated marginal parameters. Occurrence of the first-order problem implies occurrence of the second-order problem, and absence of the second-order problem implies absence of the first-order problem. If the second-order problem occurs, inference for the model is problematic. The first- and second-order estimation problems can occur for MEL estimation with sparse observed contingency tables, as illustrated next.
Example 4
Consider a contingency table and let
Suppose we observe
Then, it can be verified that h(π) = 0 does not have any solutions; that is, the first-order estimation problem (or empty set problem) occurs, and hence so does the second-order one. If, on the other hand, we observed
then the first-order problem does not occur. Assuming Poisson sampling for simplicity, the variance of the relevant observed marginal is a function of probabilities that are estimated to be zero under empirical likelihood; that is, the variance of the marginal parameter cannot be estimated, and the second-order problem occurs.
Example 5
Consider dichotomous variables and , and let the CMM be . Let , hence . It follows that and . Under the assumption that , Eq. 11 reduces to
| 18 |
The frequency of one response pattern is not observed, so due to the structural-zero constraint (Eq. 15), MEL estimation sets its estimated probability to zero by definition. As a result, the ratio on the right-hand side of Eq. 18 equals zero, and the item-scalability coefficient is fixed at a value that cannot satisfy the model. Hence, there exists no π satisfying h(π) = 0.
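The first-order (empty set) problem can be checked mechanically: it occurs exactly when no strictly positive distribution on the empirical likelihood support satisfies the constraints. The following Python sketch (a 2 × 2 table with the single constraint p01 = p10, a toy example of our own) tests feasibility with a linear program:

```python
import numpy as np
from scipy.optimize import linprog

# 2x2 table, cells in order (0,0), (0,1), (1,0), (1,1); marginal homogeneity
# P(A=1) = P(B=1) reduces to the single constraint p01 - p10 = 0.
CONSTRAINT = {1: 1.0, 2: -1.0}   # coefficient of each cell index in p01 - p10

def mel_feasible(support, eps=1e-6):
    """True if some distribution with mass >= eps on every support cell
    satisfies the constraint, i.e., the empty set problem does NOT occur."""
    k = len(support)
    A_eq = np.vstack([[CONSTRAINT.get(cell, 0.0) for cell in support],
                      np.ones(k)])               # constraint row + sum-to-one row
    b_eq = np.array([0.0, 1.0])
    res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(eps, 1.0)] * k)
    return bool(res.success)

# Observing only cells (0,0) and (0,1) forces p01 = 0, so the empty set problem
# occurs; observing cells (0,1) and (1,0) allows p01 = p10 = 0.5, so it does not.
```

This feasibility check mirrors Examples 4 and 5: which cells happen to be observed determines whether the constrained estimate exists at all.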
MAEL Estimation
A solution to the first- and second-order estimation problems is obtained by augmenting the empirical likelihood support with a number of support cells; we call the resulting method maximum augmented empirical likelihood (MAEL) estimation. The question arises which cells to add. For CMMs, there is a fairly natural choice. In particular, suppose the order-k marginal distributions are of interest for a particular CMM. Then clearly, to avoid the first-order estimation problem, the support must contain, for every marginal cell, at least one cell in the contingency table contributing to it. Hence, this is the least augmentation of the empirical likelihood support that should be done. To avoid the second-order estimation problem, note that the covariance between observed marginals is a function of higher-order marginals, for example,
or
where a plus in the subscript denotes summation over that subscript. If the relevant higher-order marginals are estimable, the second-order estimation problem can typically be avoided.
If the second-order estimation problem occurs, it can be resolved by augmenting the empirical likelihood support so that each of the relevant higher-order marginals has one or more cells contributing to it. We found that the methodology is not affected much by which cells were chosen. In practice, we randomly added cells, which gave good results.
The notation is as follows. For ML estimation, all L cells of n are considered, and for MEL estimation, only the cells with a positive observed count are considered. MAEL can be regarded as an intermediate estimation method, considering the cells with a positive observed count plus a number of cells with zero observed count to avoid the first- and second-order estimation problems. Let the augmented support consist of L* cells, where L* is at least the number of positive-count cells and at most L, and let n*, μ*, and π* denote the augmented vectors of observed frequencies, expected frequencies, and probabilities, respectively.
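The augmentation idea can be sketched as follows: add randomly chosen zero-frequency cells until every order-k marginal cell has at least one contributing support cell. This Python sketch is ours (binary items; the random choice mirrors the random augmentation reported above), not the cmm implementation:

```python
import itertools
import random

def augment_support(observed, J, order=1, seed=0):
    """Sketch of MAEL support augmentation for J binary items: for every
    order-k marginal cell without a contributing support cell, add one
    randomly chosen zero-frequency cell that contributes to it."""
    rng = random.Random(seed)
    support = set(observed)
    cells = list(itertools.product((0, 1), repeat=J))
    for idx in itertools.combinations(range(J), order):       # variable subsets
        for vals in itertools.product((0, 1), repeat=order):  # marginal cells
            covered = any(all(c[i] == v for i, v in zip(idx, vals))
                          for c in support)
            if not covered:
                candidates = [c for c in cells
                              if all(c[i] == v for i, v in zip(idx, vals))]
                support.add(rng.choice(candidates))           # add one zero cell
    return support

observed = {(0, 0, 0)}                    # an extremely sparse 2x2x2 table
aug = augment_support(observed, J=3, order=1)
# every univariate marginal cell now has at least one contributing cell
```

Setting order = 2 covers the bivariate margins as well, which is the kind of augmentation needed to avoid the second-order estimation problem.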
Example 6 explores some possibilities to augment the empirical likelihood support for a small example, illustrating that the fit of a CMM decreases dramatically when too few cells are added to .
Example 6
Suppose that
| 19 |
and suppose the marginal homogeneity CMM in Eq. 9 is the CMM of interest. The ML estimate can be computed and has df = 2. For MEL estimation, the first- and second-order estimation problems occur. Because
| 20 |
Eq. 9 reduces to
| 21 |
The rows of the design matrix in Eq. 21 contain only nonnegative elements, and the constraints imply that certain cell probabilities must equal zero. But since the corresponding observed frequencies are positive, the likelihood function is zero whenever Eq. 21 holds; that is, the empirical likelihood cannot be maximized under the model.
The problem of a zero likelihood can be circumvented by adding one zero-frequency cell to the augmented support. Then we obtain
| 22 |
yielding a valid MAEL solution. Neither ML nor MAEL fits the data well, but G² is almost 10 times larger for MAEL than for ML. Including more cells may decrease the difference in global fit between MAEL and ML. The second-order estimation problem can be circumvented if three more zero-frequency cells are added to the augmented support. In this way, the support includes all bivariate margins:
| 23 |
yielding a G² (df = 2) that is now much closer to the G² of the ML solution.
Comparing ML, MEL, and MAEL
Two studies compared the ML, MEL, and MAEL estimation procedures for three CMMs relevant for psychology and educational sciences:
Model “Alpha”. Kuijpers et al. (2013) showed that testing whether Cronbach’s alpha equals a certain benchmark can be done using a CMM with 1 degree of freedom. Model “Alpha” is alpha = 0.8, because 0.8 is an arbitrary but commonly used benchmark to assess the quality of the test-score reliability (see, e.g., Nunnally, 1978).
Model “Hj”. For a set of J items, Van der Ark et al. (2008) showed that testing whether each item-scalability coefficient (Hj) equals a researcher-specified lower-bound value c can be done using a CMM with J degrees of freedom. Model “Hj” is H1 = ... = HJ = 0.3, as 0.3 is the default value for lower bound c provided by software programs for Mokken scale analysis.
Model “Mean”. Bergsma et al. (2009, pp. 185–188) showed that testing equality of means of J variables can be done using a CMM with J − 1 degrees of freedom. Investigating equality of means may be useful when investigating whether a set of items is parallel (e.g., Lord & Novick, 1968, pp. 47–50).
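For reference, the coefficient tested by Model “Alpha” is computed from the item scores as follows; this is a minimal Python sketch of the sample statistic, not the CMM machinery itself:

```python
import numpy as np

def cronbach_alpha(X):
    """Sample Cronbach's alpha for an N x J matrix of item scores."""
    J = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()     # sum of the item variances
    total_var = X.sum(axis=1).var(ddof=1)      # variance of the test score
    return J / (J - 1) * (1 - item_var / total_var)

# Perfectly consistent toy responses on two items give alpha = 1
X = np.array([[1, 1], [1, 1], [0, 0], [0, 0]])
alpha = cronbach_alpha(X)
```

The CMM approach of Kuijpers et al. (2013) expresses this same statistic as a function of the marginals of the contingency table, so that the benchmark hypothesis alpha = 0.8 becomes a constraint of the form of Eq. 5.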
Study 1 is an exploratory simulation study to investigate the convergence rate and computation time under various settings. The tables are small to allow ML estimation. In Study 2, we investigated the Type I error rate of CMMs estimated with MAEL for realistic numbers of items in psychological and educational test data. We considered tables ranging from small (16 cells) to enormous (2^40 ≈ 10^12 cells). In addition, we investigated bias and variance of parameter β. ML estimation was not considered because it is feasible only for small tables, and MEL estimation was not considered because in most cases the algorithm runs into singularity problems and, consequently, does not converge.
Population Models and Estimation
Both Study 1 and Study 2 required population models (i.e., the vector of probabilities π) that comply with the constraints of the CMM under consideration (i.e., “Model Alpha”, “Model Hj”, or “Model Mean” for J items). The population models were constructed as follows. First, we constructed a two-parameter logistic model (2PLM), a popular item response theory model (Birnbaum, 1968), for which the location and discrimination parameters were selected (by trial and error) such that data generated from that 2PLM were close, in a loose sense, to the requirements of the CMM under consideration. Next, we generated 1000 response patterns from the 2PLM. Then, using ML (Study 1) or MAEL (Study 2), the CMM under consideration was estimated for the generated data, and the resulting estimated probabilities were used as the probabilities π of the data-generating model. Finally, N observations were sampled from π. This data-generating procedure yields expected frequencies that meet the constraints of the CMM of interest and have a relatively close fit to the 2PLM.
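The first step of this data-generating procedure can be sketched in Python; the discrimination and location parameters below are illustrative values, not the trial-and-error values used in the studies:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_2plm(N, a, b):
    """Draw N response patterns from a two-parameter logistic model:
    P(X_j = 1 | theta) = 1 / (1 + exp(-a_j * (theta - b_j)))."""
    theta = rng.standard_normal(N)           # latent trait values
    logit = a * (theta[:, None] - b)         # N x J matrix of a_j(theta_i - b_j)
    p = 1.0 / (1.0 + np.exp(-logit))
    return (rng.random((N, len(a))) < p).astype(int)

a = np.array([1.0, 1.5, 2.0, 1.2])    # discrimination parameters (illustrative)
b = np.array([-0.5, 0.0, 0.5, 1.0])   # location parameters (illustrative)
X = sample_2plm(1000, a, b)           # 1000 generated response patterns
```

The generated patterns are then tabulated into the frequency vector n, the CMM is fitted, and the fitted probabilities serve as the population π from which the study samples are drawn.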
In Study 1, a certain percentage of the probabilities from the population model was deliberately set to zero, so as to create conditions with many zero cells. The cells in that were set to zero were randomly selected, and afterwards was rescaled. Note that setting random cells to zero is useful to investigate convergence, but makes investigation of Type I error and bias impossible.
The CMMs under consideration were estimated using the generated data as input, employing the R package cmm (Bergsma & Van der Ark, 2023), which offers MAEL estimation starting from version 1.0. All CMMs received uniform starting values and a maximum of 1,000 iterations. The code is available on the Open Science Framework at https://osf.io/yz8rm/.
Study 1: Convergence Rates and Computation Times
We investigated the effect of four independent variables on convergence rate and computation time. Estimation Procedure had three levels: ML, MEL, and MAEL. Type of CMM had three levels: “Model Alpha”, “Model Hj”, and “Model Mean”. For “Model Alpha” the criterion value was set to the sample value plus 0.2, and for “Model Hj” the criterion value was set to the average of the sample values. For convenience, the criterion values depend on the sample values. Because Study 1 investigated only computation time and convergence rate, sample-dependent criterion values are not a problem. Minimum Percentage Cells with Zero Observed Frequency (U) had three levels: 0% (none), 25% (small percentage), and 75% (large percentage). Number of Items (J) had two levels: 4 dichotomous items, yielding 2^4 = 16 possible response patterns, and 8 items, yielding 2^8 = 256 response patterns. The number of items was kept small to allow for ML estimation. Hence, we had a 3 (Estimation Method) × 3 (CMM) × 3 (U) × 2 (J) experimental design with a total of 54 cells. Each cell in the experimental design was replicated 1,000 times. For a small extra design (100 replications), we estimated CMMs with 10 items (2^10 = 1024 response patterns) to demonstrate the sharp increase in computation time.
Table 1 shows that for the smallest tables (J = 4 and J = 8), both ML and MAEL almost always converged, whereas MEL often broke down for models “Hj” and “Mean”. For J = 10, ML ran into memory problems for models “Hj” and “Mean”, whereas MEL almost always broke down. For Model “Alpha”, convergence results were satisfactory for all three estimation methods.
Table 1.
Convergence rates (percentage) and median computation times in seconds for ML, MEL, and MAEL, for three different CMMs, three numbers of items (J), and three percentages of unobservable response patterns (U), based on 1,000 (J = 4, 8) and 100 (J = 10) replications.
| CMM | J | U | Convergence rate | Median computation time | |||||
|---|---|---|---|---|---|---|---|---|---|
| ML | MEL | MAEL | ML | MEL | MAEL | ||||
| Alpha | 4 | 0% | 100.0 | 100.0 | 100.0 | 0.00 | 0.00 | 0.00 | |
| 25% | 100.0 | 100.0 | 100.0 | 0.01 | 0.00 | 0.01 | |||
| 75% | 100.0 | 96.6 | 100.0 | 0.00 | 0.00 | 0.00 | |||
| 8 | 0% | 100.0 | 100.0 | 100.0 | 1.14 | 0.00 | 0.05 | ||
| 25% | 100.0 | 100.0 | 100.0 | 1.16 | 0.00 | 0.06 | |||
| 75% | 100.0 | 100.0 | 100.0 | 1.25 | 0.00 | 0.09 | |||
| 10 | 0% | 100.0 | 100.0 | 100.0 | 107.26 | 0.10 | 0.30 | ||
| 25% | 100.0 | 100.0 | 100.0 | 114.13 | 0.05 | 0.39 | |||
| 75% | 100.0 | 100.0 | 100.0 | 108.56 | 0.10 | 0.48 | |||
| Hj | 4 | 0% | 100.0 | 75.5 | 99.5 | 0.00 | 0.00 | 0.00 | |
| 25% | 100.0 | 75.6 | 99.5 | 0.00 | 0.00 | 0.00 | |||
| 75% | 100.0 | 0.0 | 100.0 | 0.00 | NA | 0.00 | |||
| 8 | 0% | 99.8 | 47.6 | 99.5 | 0.14 | 0.02 | 0.01 | ||
| 25% | 100.0 | 52.8 | 99.8 | 0.14 | 0.03 | 0.01 | |||
| 75% | 99.7 | 40.2 | 99.7 | 0.14 | 0.82 | 0.02 | |||
| 10 | 0% | 0.0 | 36.0 | 99.0 | NA | 0.78 | 0.50 | ||
| 25% | 0.0 | 30.0 | 98.0 | NA | 1.13 | 0.60 | |||
| 75% | 0.0 | 31.1 | 99.0 | NA | 1.12 | 0.80 | |||
| Mean | 4 | 0% | 100.0 | 45.9 | 100.0 | 0.01 | 0.01 | 0.01 | |
| 25% | 100.0 | 20.8 | 100.0 | 0.01 | 0.01 | 0.01 | |||
| 75% | 100.0 | 0.0 | 100.0 | 0.01 | NA | 0.01 | |||
| 8 | 0% | 100.0 | 2.9 | 100.0 | 0.02 | 0.02 | 0.02 | ||
| 25% | 100.0 | 3.1 | 100.0 | 0.02 | 0.02 | 0.02 | |||
| 75% | 100.0 | 0.5 | 100.0 | 0.02 | 0.02 | 0.02 | |||
| 10 | 0% | 0.0 | 2.0 | 100.0 | NA | 0.37 | 0.02 | ||
| 25% | 0.0 | 0.0 | 100.0 | NA | NA | 0.03 | |||
| 75% | 0.0 | 0.0 | 100.0 | NA | NA | 0.03 | |||
The distribution of the computation time was positively skewed. Therefore, we reported the median rather than the mean computation time. Naturally, MAEL and MEL were at least as fast as ML, ranging from just as fast to more than 200 times faster. As the number of items increased, the computation time increased dramatically (Table 1, columns 4–6). This was especially true for ML estimation. For 4 and 8 items, the computation time was still reasonable in all samples (never longer than 100 s), but for 10 items some runs took up to 30 min for Model “Alpha”.
The results show that even for moderately large tables, ML may run into memory problems. Moreover, the results show that the first- and second-order estimation problems are omnipresent so that MEL often breaks down. This leaves MAEL as the viable candidate for estimating CMMs for large sparse contingency tables.
Study 2: Type I Error Rate
For MAEL estimation, we investigated the effect of the type of CMM, the number of items, and sample size on the Type I error rate and on the bias and standard deviation of model parameter β (Eq. 2). As in Study 1, Type of CMM had three levels: “Model Alpha” (the criterion value was set to 0.8), “Model Hj” (the criterion value was set to 0.3), and “Model Mean”. For “Model Alpha” and “Model Hj”, parameter β is fixed to 0.8 and 0.3, respectively. Hence, bias and standard deviation of β̂ were investigated only for Model “Mean”, where β equals the overall mean item score. Moreover, we studied four levels of number of items: 4, 8, 20, and 40; and three levels of sample size (N = 250, N = 500, and N = 1000). Hence, we had a 3 × 4 × 3 experimental design with a total of 36 cells. Each cell in the experimental design was replicated 10,000 times for 4 and 8 items and 1,000 times for 20 and 40 items. The empirical Type I error rate over the replications was compared to the nominal Type I error rate of 0.05, the mean value of β̂ over replications was used to estimate the bias, and the standard deviation of β̂ over replications was used as an estimate of the standard error of β̂.
Table 2 shows the Type I error rates for all cells in the design. In most cells, the Type I error rates are close to the nominal Type I error rate. For models with many degrees of freedom estimated using a relatively small sample size, the models are too liberal. For 40 items, models “Hj” and “Mean” have 40 and 39 degrees of freedom, respectively. For N = 250, this results in approximately 6 observations per degree of freedom. Hence, the poor performance is not so much due to the large table as due to the large number of degrees of freedom. Results are satisfactory if the sample size per degree of freedom exceeds 25 (see Fig. 1).
Table 2.
Type I error rate for MAEL estimation of three different CMMs, four different numbers of items (J), and three different sample sizes (N), based on 10,000 (J = 4 and J = 8) and 1000 (J = 20 and J = 40) replications.
| Model | N | J = 4 | J = 8 | J = 20 | J = 40 |
|---|---|---|---|---|---|
| Alpha | 250 | 0.048 | 0.060 | 0.056 | 0.060 |
|  | 500 | 0.053 | 0.063 | 0.056 | 0.043 |
|  | 1000 | 0.063 | 0.050 | 0.047 | 0.038 |
| H | 250 | 0.052 | 0.055 | **0.095** | **0.351** |
|  | 500 | 0.057 | 0.053 | 0.051 | **0.106** |
|  | 1000 | **0.067** | 0.052 | **0.078** | **0.070** |
| Mean | 250 | 0.048 | 0.052 | 0.056 | 0.053 |
|  | 500 | 0.048 | 0.054 | 0.050 | 0.055 |
|  | 1000 | 0.046 | 0.049 | 0.055 | 0.050 |
Note: A 95% confidence interval for the Type I error rate equals [0.036;0.064]. Values outside the 95% confidence interval are printed in boldface.
Fig. 1.

Type I error rates by the ratio of sample size and degrees of freedom in Study 2. Dashed lines are the limits of the 95% confidence interval of the Type I error rate due to Monte Carlo error.
For Model “Mean”, the bias of the parameter estimate (not tabulated) was negligible in all cases, and the estimated standard error (Table 3) behaved as expected; that is, if N doubles, the estimated standard error decreases by approximately a factor $\sqrt{2}$.
Table 3.
Estimated standard error of the CMM parameter estimate for Model “Mean”, for four different numbers of items (J) and three different sample sizes (N), based on 10,000 (J = 4 and J = 8) and 1000 (J = 20 and J = 40) replications.
| Model | N | J = 4 | J = 8 | J = 20 | J = 40 |
|---|---|---|---|---|---|
| Mean | 250 | 0.020 | 0.018 | 0.003 | 0.002 |
|  | 500 | 0.014 | 0.013 | 0.003 | 0.002 |
|  | 1000 | 0.010 | 0.009 | 0.002 | 0.001 |
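As a quick sanity check, the $\sqrt{2}$ scaling of the standard errors under a doubling of N can be verified directly from the Table 3 entries for J = 4 and J = 8; a minimal sketch (values copied from the table):

```python
# Estimated standard errors for Model "Mean" from Table 3, keyed by (N, J).
se = {
    (250, 4): 0.020, (500, 4): 0.014, (1000, 4): 0.010,
    (250, 8): 0.018, (500, 8): 0.013, (1000, 8): 0.009,
}

# Doubling N should shrink the standard error by roughly sqrt(2) ~ 1.41.
for J in (4, 8):
    for N in (250, 500):
        ratio = se[(N, J)] / se[(2 * N, J)]
        print(f"J={J}, N={N}->{2 * N}: ratio = {ratio:.2f}")
```

All four ratios fall between 1.38 and 1.45, close to $\sqrt{2} \approx 1.41$.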
Discussion
CMMs have potential for application to psychological data, but an important reason that this potential has so far not been realized may be that up to now ML estimation of CMMs could only be applied to contingency tables for a limited number of categorical variables (up to, say, 10–20 variables, depending on the number of categories per variable). The present paper shows that this limitation can be resolved by the newly introduced maximum augmented empirical likelihood (MAEL) estimation method, a procedure that considers all nonzero cells in the table (i.e., cells with at least one observation) and some well-chosen zero cells in the table (i.e., cells with no observations). MAEL can be thought of as lying in between maximum empirical likelihood (MEL) estimation, which considers only nonzero cells in the table and subsequently suffers from the first-order and second-order estimation problems, and maximum likelihood (ML), which considers all cells in the table and runs into memory problems if the table is large.
The asymptotic distribution of the ML estimators of marginal parameters is known (Lang, 2005) and depends only on the covariance matrix of the sample marginal distributions. In contrast to MEL, MAEL allows this covariance matrix to be estimated, owing to the augmentation step. Simulation Study 2 shows that this estimation is done sufficiently well in a number of practical settings; in particular, the asymptotic distribution of the ML estimators also provides a good approximation of the distribution of the MAEL estimators. The asymptotic distributions of the ML and MAEL estimators are identical.
MAEL estimation has advantages compared to alternative methods that can be used to estimate CMMs for large contingency tables, namely the weighted least squares method (Grizzle et al., 1969; a.k.a. the GSK method), generalized estimating equations (GEEs; e.g., Qaqish & Liang, 1992), and composite likelihood (e.g., Varin et al., 2011). A comparison of GSK and GEE with ML estimation is given in Rudas and Bergsma (2023). All four methods can be used to estimate CMMs for almost arbitrarily large contingency tables, but the only methods with guaranteed optimal asymptotic efficiency are MAEL and GSK. Unlike MAEL, however, GSK is sensitive to sparsity of the marginal distributions (Bergsma et al., 2013; see also the discussion of Berkson, 1980).
Like GEE and GSK, MAEL estimation is computationally fast, and like ML but unlike GEE, it is asymptotically efficient. Furthermore, MAEL is less sensitive to sparsity of the marginal distributions than GSK. Thus, MAEL seems to be the preferred method for estimating CMMs. Researchers should take heed that if the ratio of the sample size to the degrees of freedom becomes too small (say, less than 25), the Type I error rates may be too liberal. This is not a feature of MAEL per se, but of all models that are too complex for the number of observations. Composite likelihood estimation is a possibly attractive alternative for estimating CMMs; it was not considered in this study because estimation procedures are not yet available for CMMs, whereas MAEL fits nicely into the ML framework and the software that is already available for CMMs. In addition, composite likelihood is a quasi-likelihood method, so asymptotic efficiency is lost, whereas ML, and hence MAEL and MEL, are asymptotically efficient (Aitchison & Silvey, 1958; Lang, 2005).
Appendix A: First- and Second-Order Estimation Problems
With $X$ a random variable, MEL can be used to make inferences on a Euclidean parameter $\theta$ of the distribution of $X$, where $\theta$ is defined by an estimating equation of the form

$E\, g(X, \theta) = 0$ | 24 |

for some function $g$. For example, if $g(x, \theta) = x - \theta$, then (24) implies that $\theta = E X$. Denote the population value of $\theta$ by $\theta_0$. Suppose we have observed $X_1, \ldots, X_n$, which are i.i.d. and distributed as $X$. The MEL estimator of $\theta$ defined by (24) solves the constrained optimization problem

$\max_{\theta,\, p_1, \ldots, p_n} \prod_{i=1}^{n} p_i$

subject to

$p_i \geq 0, \qquad \sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i\, g(X_i, \theta) = 0.$ | 25 |

The first-order estimation problem occurs if (25) does not have a solution (this problem is also known as the empty-set problem; see Grendár & Judge, 2009). The best-known example is the case that $g(x, \theta) = x - \theta$ while the population mean lies outside the convex hull of $X_1, \ldots, X_n$ (Qin & Lawless, 1994).
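To make the empty-set problem concrete, consider the scalar-mean case $g(x, \theta) = x - \theta$: the constraint in (25) is feasible for a hypothesized value $\theta$ exactly when $\theta$ lies in the convex hull (here, the interval) of the observations. A minimal illustrative sketch (function name and data are ours, not from the paper):

```python
def el_feasible(xs, theta):
    """Check whether weights p_i >= 0 with sum p_i = 1 and
    sum p_i * (x_i - theta) = 0 exist, i.e., whether theta lies
    in the convex hull [min(xs), max(xs)] of the sample."""
    return min(xs) <= theta <= max(xs)

xs = [2.0, 3.0, 5.0]
print(el_feasible(xs, 4.0))  # True: 4 is inside [2, 5]
print(el_feasible(xs, 7.0))  # False: 7 is outside [2, 5], empty-set problem
```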
Let $F$ be the distribution function of $X$. Under some conditions on $g$ and $F$, the MEL estimator $\hat{\theta}_n$ has an asymptotic multivariate normal distribution; in particular,

$\sqrt{n}\, (\hat{\theta}_n - \theta_0) \xrightarrow{d} N\big(0,\, (D^\top W^{-1} D)^{-1}\big),$

where

$D = E\, \dfrac{\partial g(X, \theta_0)}{\partial \theta^\top}$

and

$W = E\, g(X, \theta_0)\, g(X, \theta_0)^\top.$

The second-order estimation problem occurs if there does not exist a distribution function $G$ with support $\{X_1, \ldots, X_n\}$ such that $E_G\, g(X, \theta_0)\, g(X, \theta_0)^\top = W$.
Example 7
To illustrate the second-order estimation problem, consider a $2 \times 2$ contingency table with cell probabilities $\pi_{ij}$ ($i, j = 1, 2$), and let $\theta$ be the log ratio of marginal odds; that is,

$\theta = \log \dfrac{\pi_{1+} / \pi_{2+}}{\pi_{+1} / \pi_{+2}}.$

(For details on how to define $g$ such that this is the solution of (24), see Owen, 2001.) If one observed marginal count is zero, then $\hat{\theta} = \pm\infty$ under empirical likelihood; that is, the first-order estimation problem occurs since (25) has no solution with finite $\theta$. If the two observed off-diagonal cell counts are zero, then $\hat{\theta} = 0$ under empirical likelihood and, as a consequence, $\theta = 0$ for any distribution $G$ with support the two diagonal cells. However, assuming no structural zeroes in the table, $E_G\, g(X, \theta_0)\, g(X, \theta_0)^\top \neq W$ for any such $G$, and therefore the second-order estimation problem occurs. In this case, the first-order estimation problem occurs in addition to the second-order one if $\theta_0 \neq 0$.
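The claim that a diagonal-only support forces $\theta = 0$ is easy to verify numerically: any distribution supported on the two diagonal cells has identical row and column marginals, so the log marginal odds ratio is zero no matter how the diagonal mass is split. A minimal sketch (function name and probabilities are ours, illustrative only):

```python
import math

def log_marginal_odds_ratio(p):
    """theta = log[(pi_1+ / pi_2+) / (pi_+1 / pi_+2)] for a 2x2 table
    p = [[p11, p12], [p21, p22]]."""
    row1 = p[0][0] + p[0][1]
    row2 = p[1][0] + p[1][1]
    col1 = p[0][0] + p[1][0]
    col2 = p[0][1] + p[1][1]
    return math.log((row1 / row2) / (col1 / col2))

# Distributions supported on the diagonal cells only: p12 = p21 = 0.
for p11 in (0.1, 0.5, 0.9):
    p = [[p11, 0.0], [0.0, 1.0 - p11]]
    print(log_marginal_odds_ratio(p))  # always 0.0
```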
A special case of the second-order estimation problem has been identified earlier by Bergsma et al. (2012), who called it the zero-likelihood problem. It occurs if the empirical likelihood is zero for all solutions of (25). In this case, for any distribution $G$ with support $\{X_1, \ldots, X_n\}$, $E_G\, g(X, \theta_0)\, g(X, \theta_0)^\top$ is a matrix of zeroes, unequal to $W$; hence, the second-order estimation problem occurs.
We propose a solution for both estimation problems by augmentation of the support of the empirical likelihood, resulting in an estimation procedure lying in a spectrum with ML at one extreme and MEL at the other, which we call maximum augmented empirical likelihood (MAEL) estimation.
Appendix B: Algorithm for Maximum Likelihood Estimation
Under some regularity conditions, the maximum likelihood estimates under model (5) are a saddle point of the Lagrangian log-likelihood

$L(m, \alpha, \lambda) = n^\top \log m + \alpha\, (N - \mathbf{1}^\top m) + \lambda^\top g(m),$ | 26 |

where $\alpha$ and $\lambda$ are Lagrange multipliers. In Eq. 26, $n^\top \log m$ is the unconstrained kernel of the log-likelihood, and the Lagrangian terms are added to satisfy the multinomial sampling constraint (Eq. 14) and the model constraint (Eq. 13). Bergsma (1997, pp. 89–95) developed a Fisher scoring algorithm to find the ML estimates $\hat{m}$ of the constrained expected frequencies in (Eq. 5) or, equivalently, the constrained cell probabilities $\hat{\pi}$. This algorithm is a modification of Lagrangian algorithms by Aitchison and Silvey (1958) and Lang and Agresti (1994).
It can be shown that $\alpha = 1$ at the saddle point, so Eq. 26 can be simplified to

$L(m, \lambda) = n^\top \log m + N - \mathbf{1}^\top m + \lambda^\top g(m).$ | 27 |

The ML estimates of $m$ and $\lambda$ are obtained by means of an iterative procedure that determines a saddle point of this Lagrangian.
We take derivatives of $L$ with respect to $\log m$ rather than $m$ because they yield simpler expressions. Note that $\partial L / \partial \log m = 0$ iff $\partial L / \partial m = 0$. Let $G(m) = \partial g(m) / \partial (\log m)^\top$ be the Jacobian of $g(m)$ with respect to $\log m$. Differentiating $L$ with respect to $\log m$ yields

$\dfrac{\partial L}{\partial \log m} = n - m + G(m)^\top \lambda.$

Under suitable regularity conditions, the ML estimator is a vector $\hat{m}$ for which there is a Lagrange multiplier vector $\hat{\lambda}$ such that the simultaneous equations

$n - m + G(m)^\top \lambda = 0$

and

$g(m) = 0$

are satisfied. Then, the expected value of the derivative matrix of the vector $\big(n - m + G(m)^\top \lambda,\; g(m)\big)$ with respect to $(\log m, \lambda)$ is

$B(m) = \begin{pmatrix} -D_m & G(m)^\top \\ G(m) & 0 \end{pmatrix},$

where $D_m = \mathrm{diag}(m)$.
Let $n_*$ be equal to the vector $n$ with zeroes replaced by a small positive constant, and define the Fisher scoring starting values

$m^{(0)} = n_*, \qquad \lambda^{(0)} = 0,$

and, for $k = 0, 1, 2, \ldots$,

$\begin{pmatrix} \log m^{(k+1)} \\ \lambda^{(k+1)} \end{pmatrix} = \begin{pmatrix} \log m^{(k)} \\ \lambda^{(k)} \end{pmatrix} - B(m^{(k)})^{-1} \begin{pmatrix} n - m^{(k)} + G(m^{(k)})^\top \lambda^{(k)} \\ g(m^{(k)}) \end{pmatrix}.$

Then, as $k \rightarrow \infty$, $m^{(k)}$ should go to $\hat{m}$. Tedious but straightforward matrix algebra yields the simplified form

$\lambda^{(k+1)} = -\big(G D_m^{-1} G^\top\big)^{-1} \big(g(m^{(k)}) + G D_m^{-1} (n - m^{(k)})\big),$

$\log m^{(k+1)} = \log m^{(k)} + D_m^{-1} \big(n - m^{(k)} + G^\top \lambda^{(k+1)}\big),$

where $G = G(m^{(k)})$ and $D_m = \mathrm{diag}(m^{(k)})$.
This algorithm does not always converge, and it can be helpful to introduce a step size $t_k$ as follows:

$\log m^{(k+1)} = \log m^{(k)} + t_k\, D_m^{-1} \big(n - m^{(k)} + G^\top \lambda^{(k+1)}\big).$ | 28 |

Note that the update of $\lambda$ is left unchanged.
The step size $t_k$ should be chosen so that the new estimate $m^{(k+1)}$ is better than the old estimate $m^{(k)}$. A criterion for deciding this is obtained by defining the following quadratic form measuring the distance from convergence:

$\epsilon^{(k)} = \big(n - m^{(k)} + G^\top \lambda^{(k)}\big)^\top D_m^{-1} \big(n - m^{(k)} + G^\top \lambda^{(k)}\big) + g(m^{(k)})^\top g(m^{(k)}).$

Convergence is reached at $m^{(k)}$ if and only if $\epsilon^{(k)} = 0$, and therefore, if possible, the step size should be chosen such that $\epsilon^{(k+1)} < \epsilon^{(k)}$ for all $k$. This is possible if the tentative solution is sufficiently close to the ML estimate. Otherwise, a recommendation which seems to work very well in practice is to jump to another region by taking a step size equal to one.
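The scoring updates can be sketched in code. The following is a minimal illustrative sketch, not the paper's implementation: it applies the simplified λ- and log m-updates to a 2 × 2 table under the single marginal-homogeneity constraint g(m) = m12 − m21 (our choice of example), for which the ML solution is known in closed form: the off-diagonal counts are averaged and the diagonal counts are unchanged.

```python
import math

def fit_marginal_homogeneity(n, iters=200):
    """Fisher scoring for a 2x2 table n = [n11, n12, n21, n22] under the
    constraint g(m) = m12 - m21 = 0, using the simplified updates
      lambda = -(G D^-1 G^T)^-1 (g + G D^-1 (n - m)),
      log m <- log m + D^-1 (n - m + G^T lambda),
    where G = dg/d(log m) = (0, m12, -m21, 0) and D = diag(m)."""
    m = [x if x > 0 else 1e-3 for x in n]  # start at n, zeroes perturbed
    for _ in range(iters):
        g = m[1] - m[2]
        G = [0.0, m[1], -m[2], 0.0]  # Jacobian of g w.r.t. log m
        GDG = sum(Gi * Gi / mi for Gi, mi in zip(G, m))
        GDr = sum(Gi * (ni - mi) / mi for Gi, ni, mi in zip(G, n, m))
        lam = -(g + GDr) / GDG
        # log-scale update, applied cell by cell
        m = [mi * math.exp((ni - mi + Gi * lam) / mi)
             for mi, ni, Gi in zip(m, n, G)]
    return m

# Known closed-form ML solution: off-diagonal counts averaged to 12.5.
m_hat = fit_marginal_homogeneity([10, 20, 5, 15])
print([round(v, 3) for v in m_hat])  # -> [10.0, 12.5, 12.5, 15.0]
```

The diagonal cells never move because their residuals and Jacobian entries are zero, which matches the closed-form solution and provides a convenient check on the update formulas.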
Footnotes
Letty Koopman received a research Grant from the Dutch Research Council (NWO): Research Talent Grant 406.16.554. The other authors declare no conflict of interest.
Letty Koopman is now at the University of Groningen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Aitchison J, Silvey SD. Maximum-likelihood estimation of parameters subject to restraints. The Annals of Mathematical Statistics. 1958;29(3):813–828. doi: 10.1214/aoms/1177706538. [DOI] [Google Scholar]
- Bartolucci F, Colombi R, Forcina A. An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica. 2007;17(2):691–711. [Google Scholar]
- Bergsma, W. P. (1997). Marginal models for categorical data. Tilburg: Tilburg University Press. Retrieved from http://stats.lse.ac.uk/bergsma/pdf/bergsma_phdthesis.pdf
- Bergsma WP, Croon MA, Hagenaars JA. Marginal models: For dependent, clustered, and longitudinal categorical data. Springer. 2009 doi: 10.1007/b12532. [DOI] [Google Scholar]
- Bergsma WP, Croon MA, Hagenaars JA. Advancements in marginal modelling for categorical data. Sociological Methodology. 2013;43(1):1–41. doi: 10.1177/0081175013488999. [DOI] [Google Scholar]
- Bergsma WP, Croon MA, Van der Ark LA. The empty-set and zero-likelihood problems in maximum empirical likelihood estimation. Electronic Journal of Statistics. 2012;6(1):2356–2361. doi: 10.1214/12-EJS750. [DOI] [Google Scholar]
- Bergsma WP, Rudas T. Marginal models for categorical data. The Annals of Statistics. 2002;30(1):140–159. doi: 10.1214/aos/1015362188. [DOI] [Google Scholar]
- Bergsma, W. P., & Van der Ark, L. A. (2023). cmm: Categorical marginal models. R package version 1.0. [Computer software] http://cran.r-project.org/web/packages/cmm/
- Berkson J. Minimum chi-square, not maximum likelihood! The Annals of Statistics. 1980;8(3):457–487. doi: 10.1214/aos/1176345003. [DOI] [Google Scholar]
- Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinees ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–480). Addison-Wesley.
- Chen J, Variyath AM, Abraham B. Adjusted empirical likelihood and its properties. Journal of Computational and Graphical Statistics. 2008;17(2):426–443. doi: 10.1198/106186008X321068. [DOI] [Google Scholar]
- Cheung GW, Rensvold RB. Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling. 2002;9(2):233–255. doi: 10.1207/S15328007SEM0902_5. [DOI] [Google Scholar]
- Colombi R, Forcina A. Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika. 2001;88(4):1007–1019. doi: 10.1093/biomet/88.4.1007. [DOI] [Google Scholar]
- Costa, P. T., & McCrae, R. R. (2008). The Revised NEO Personality Inventory (NEO-PI-R). In G. J. Boyle, G. Matthews, & H. Saklofske (Eds.), The SAGE handbook of personality theory and assessment (Vol. 2, pp. 179–198). Sage.
- Emerson, S. C., & Owen, A. B. (2009). Calibration of the empirical likelihood method for a vector mean. Electronic Journal of Statistics, 3(1), 1161–1192. 10.1214/09-EJS518
- Evans RJ, Forcina A. Two algorithms for fitting constrained marginal models. Computational Statistics & Data Analysis. 2013;66(1):1–7. doi: 10.1016/j.csda.2013.02.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feldt LS. The approximate sampling distribution of Kuder–Richardson reliability coefficient twenty. Psychometrika. 1965;30:357–370. doi: 10.1007/BF02289499. [DOI] [PubMed] [Google Scholar]
- Feldt LS. A test of the hypothesis that Cronbach’s alpha or Kuder–Richardson coefficient twenty is the same for two tests. Psychometrika. 1969;34:363–373. doi: 10.1007/BF02289364. [DOI] [Google Scholar]
- Feldt LS. A test of the hypothesis that Cronbach’s alpha reliability coefficient is the same for two tests administered to the same sample. Psychometrika. 1980;45:99–105. doi: 10.1007/BF02293600. [DOI] [Google Scholar]
- Grendár M, Judge G. Empirical set problem of maximum empirical likelihood methods. Electronic Journal of Statistics. 2009;3(1):1542–1555. doi: 10.1214/09-EJS528. [DOI] [Google Scholar]
- Grizzle JE, Starmer CF, Koch GG. Analysis of categorical data by linear models. Biometrics. 1969;25(3):489–504. doi: 10.2307/2528901. [DOI] [PubMed] [Google Scholar]
- Jorgensen, T. D., Kite, B. A., & Chen, P.-Y. (2017). Finally! A valid test of configural invariance using permutation in multigroup CFA. In L. A. van der Ark, M. Wiberg, S. A. Culpepper, J. A. Douglas, & W.-C. Wang (Eds.), Quantitative psychology: The 81st Annual Meeting of the Psychometric Society, Asheville, North Carolina, 2016. Springer. 10.1007/978-3-319-56294-0_9
- Kuijpers RE, Van der Ark LA, Croon MA. Testing hypotheses involving Cronbach’s alpha using marginal models. British Journal of Mathematical and Statistical Psychology. 2013;66(3):503–520. doi: 10.1111/bmsp.12010. [DOI] [PubMed] [Google Scholar]
- Lang JB. Maximum likelihood methods for a generalized class of log-linear models. The Annals of Statistics. 1996;24(2):726–752. doi: 10.1214/aos/1032894462. [DOI] [Google Scholar]
- Lang JB. Homogeneous linear predictor models for contingency tables. Journal of the American Statistical Association. 2005;100(469):121–134. doi: 10.1198/016214504000001042. [DOI] [Google Scholar]
- Lang JB, Agresti A. Simultaneously modeling the joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association. 1994;89(426):625–632. doi: 10.1080/01621459.1994.10476787. [DOI] [Google Scholar]
- Lloyd S. Ultimate physical limits to computation. Nature. 2000;406(1):1047–1054. doi: 10.1038/35023282. [DOI] [PubMed] [Google Scholar]
- Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
- Maydeu-Olivares A, Coffman DL, Hartmann WM. Asymptotically distribution-free (ADF) interval estimation of coefficient alpha. Psychological Methods. 2007;12:157–176. doi: 10.1037/1082-989X.12.2.157. [DOI] [PubMed] [Google Scholar]
- Maydeu-Olivares A, Coffman DL, García-Forero C, Gallardo-Pujol D. Hypothesis testing for coefficient alpha: An SEM approach. Behavior Research Methods. 2010;42:618–625. doi: 10.3758/BRM.42.2.618. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mokken, R. J. (1971). A theory and procedure of scale analysis. De Gruyter.
- Molenberghs G, Lesaffre E. Marginal modelling of multivariate categorical data. Statistics in Medicine. 1999;18(17–18):2237–2255. doi: 10.1002/(SICI)1097-0258(19990915/30)18:17/18<2237::AID-SIM252>3.0.CO;2-R. [DOI] [PubMed] [Google Scholar]
- Nguyen MK, Phelps S, Ng WL. Simulation based calibration using extended balanced augmented empirical likelihood. Statistics and Computing. 2015;25(6):1093–1112. doi: 10.1007/s11222-014-9506-9. [DOI] [Google Scholar]
- Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.
- Owen, A. B. (2001). Empirical likelihood. Chapman & Hall/CRC. 10.1201/9781420036152
- Qaqish BF, Liang KY. Marginal models for correlated binary responses with multiple classes and multiple levels of nesting. Biometrics. 1992;48(3):939–950. doi: 10.2307/2532359. [DOI] [PubMed] [Google Scholar]
- Qin J, Lawless J. Empirical likelihood and general estimating equations. The Annals of Statistics. 1994;22(1):300–325. doi: 10.1214/aos/1176325370. [DOI] [Google Scholar]
- Raven J, Raven JC, Court JH. Manual for Raven’s Progressive Matrices and Vocabulary Scales. Section 1: General Overview. New York: Harcourt Assessment; 2003. [Google Scholar]
- Rudas T, Bergsma WP. Marginal models: An overview. In: Kateri M, Moustaki I, editors. Trends and challenges in categorical data analysis: Statistical modelling and interpretation. Berlin: Springer; 2023. [Google Scholar]
- Sijtsma K, Molenaar IW. Introduction to nonparametric item response theory. Thousand Oaks: Sage; 2002. [Google Scholar]
- Sijtsma K, Van der Ark LA. A tutorial on how to do a Mokken scale analysis on your test and questionnaire data. British Journal of Mathematical and Statistical Psychology. 2017;70(3):137–158. doi: 10.1111/bmsp.12078. [DOI] [PubMed] [Google Scholar]
- Van der Ark LA, Croon MA, Sijtsma K. Mokken scale analysis for dichotomous items using marginal models. Psychometrika. 2008;73:183–208. doi: 10.1007/s11336-007-9034-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Van Zyl JM, Neudecker H, Nel DG. On the distribution of the maximum likelihood estimator of Cronbach’s alpha. Psychometrika. 2000;65:271–280. doi: 10.1007/BF02296146. [DOI] [Google Scholar]
- Varin C, Reid N, Firth D. An overview of composite likelihood methods. Statistica Sinica. 2011;21(1):5–42. [Google Scholar]
- Xia X, Liu Z. Balanced augmented empirical likelihood for regression models. Journal of the Korean Statistical Society. 2019;48(2):233–247. doi: 10.1016/j.jkss.2018.10.006. [DOI] [Google Scholar]
