Educational and Psychological Measurement
2021 Feb 15;81(6):1172–1202. doi: 10.1177/0013164421992535

KR20 and KR21 for Some Nondichotomous Data (It’s Not Just Cronbach’s Alpha)

Robert C. Foster
PMCID: PMC8451024  PMID: 34565820

Abstract

This article presents some equivalent forms of the common Kuder–Richardson Formula 21 and 20 estimators for nondichotomous data belonging to certain other exponential families, such as Poisson count data, exponential data, or geometric counts of trials until failure. Using the generalized framework of Foster (2020), an equation for the reliability of a subset of the natural exponential family having a quadratic variance function is derived for known population parameters, and both formulas are shown to be different plug-in estimators of this quantity. The equivalent Kuder–Richardson Formulas 20 and 21 are given for six different natural exponential families, and these match earlier derivations in the case of binomial and Poisson data. Simulations show performance exceeding that of Cronbach’s alpha in terms of root mean square error when the formula matching the correct exponential family is used, and a discussion of Jensen’s inequality suggests explanations for peculiarities of the bias and standard error of the simulations across the different exponential families.

Keywords: reliability, KR-20, KR-21, Cronbach’s alpha, exponential families

Introduction

Formulas 20 and 21 of Kuder and Richardson (1937), abbreviated throughout this article as KR20 and KR21, respectively, are among the earliest and best-known formulas for assessing the reliability of a test. For dichotomous data, the formulas are given by

\mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j (1 - p_j)}{\sigma_X^2}\right),
\mathrm{KR21} = \frac{k}{k-1}\left(1 - \frac{k \bar{p} (1 - \bar{p})}{\sigma_X^2}\right),

where k is the test length, σ_X^2 is the variance of sum test scores, p_j is the proportion of correct responses to test item j, and p̄ is the average proportion correct over all items. The most common estimator of reliability, Cronbach’s alpha, is often seen as a general version of KR20 (Cronbach, 1951). Beyond Cronbach’s alpha, there do not seem to have been many attempts to determine equivalent variants of KR20 and KR21 for specific types of nondichotomous data, such as count data. Allison (1978) derived a KR21 equivalent for Poisson distributed data, but most have stuck to Cronbach’s alpha as a general estimator of reliability.
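For concreteness, the following minimal R sketch (illustrative code, not from the original article; the function names are hypothetical) computes these two formulas for an n × k matrix of dichotomous 0/1 responses, with one row per examinee and one column per item.

```r
# Illustrative sketch of the classical KR20 and KR21 for a subjects-by-items
# matrix Y of 0/1 responses.  The variance of sum scores is computed here with
# the unbiased (n - 1) denominator; see the later discussion of the biased
# versus unbiased denominator and exact equivalence with Cronbach's alpha.
kr20_dichotomous <- function(Y) {
  k   <- ncol(Y)                 # test length
  p   <- colMeans(Y)             # proportion correct for each item
  sx2 <- var(rowSums(Y))         # variance of sum scores
  (k / (k - 1)) * (1 - sum(p * (1 - p)) / sx2)
}

kr21_dichotomous <- function(Y) {
  k    <- ncol(Y)
  pbar <- mean(colMeans(Y))      # average proportion correct over items
  sx2  <- var(rowSums(Y))
  (k / (k - 1)) * (1 - k * pbar * (1 - pbar) / sx2)
}
```

Because KR21 replaces the item-by-item term with its value at the average proportion correct, it requires only the average test score and the score variance, which is one reason it has been attractive as a quick approximation to KR20.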

The topic of reliability estimation, and Cronbach’s alpha in particular, has been a subject of much recent debate within the psychometric literature. Alpha has been criticized as having assumptions which are not realistic in practice (McNeish, 2018; Schmitt, 1996; Sijtsma, 2009). Criticisms have focused on alpha’s assumptions of tau equivalence and uncorrelated errors. McNeish (2018) claims that normality is an assumption of Cronbach’s alpha, but Raykov and Marcoulides (2019) rebut that no assumptions of normality are made in the derivation of alpha and consistency as an estimator does not depend on an assumption of normality. However, Zumbo (1999) notes that though the classical test theory derivation of alpha makes no assumptions of normality, estimators of alpha often do, in agreement with Bay (1973) who noted in his time that early derivations of statistical properties of estimators such as Cronbach’s alpha and KR20 as in L. S. Feldt (1965) were developed using an analysis of variance model which includes an assumption of normality. More recent research in van Zyl et al. (2000) on the sampling distribution of Cronbach’s alpha using maximum likelihood techniques also assumes normality, though nonparametric methods have been developed for psychometric calculations using techniques like the bootstrap as in Raykov (1998). The statistical properties of alpha and other estimators have been little explored outside of these assumptions of normality, with simulation studies in Sheng and Sheng (2012) and Zimmerman et al. (1993). Beyond classical test theory, Geldhof et al. (2014) explicitly state that nonnormal data is a limitation for reliability estimation in a multilevel confirmatory factor analysis framework. As Zinbarg et al. (2006) note, properties of alternatives to Cronbach’s alpha based on a factor analysis are not well known under nonnormality, despite most psychological data being nonnormal. It is clear that there is a need for analysis of reliability for nonnormal data.

The purpose of this article is to make a contribution toward the analysis of reliability under nonnormality by reversing direction from Cronbach’s alpha once again to derive formulas to estimate test reliability in special cases—to find equivalent forms of KR20 and KR21 which can be used to assess reliability of tests for specific types of nondichotomous data. For example, chapter 21 of Lord et al. (1968), discussing the work of Rasch (1960), notes that the number of an examinee’s misreadings in an oral reading test may be modeled as a Poisson random variable. If the response of interest were instead the time between misreadings, an exponential distribution would then be appropriate. Meredith (1971) also showed that a Poisson process is appropriate for tests of speed, and noted that under certain assumptions the distributions of observed scores ought to be negative binomially distributed. The key idea is this: KR20 and KR21 can be seen as specific versions of an estimator for reliability for Bernoulli-distributed item responses where the mean–variance relationship of the Bernoulli distribution is exploited for variance calculations rather than using standard sample variances. Other exponential family distributions also have mean–variance relationships. By working within the generalized framework of Foster (2020) and deriving a formula for reliability in exponential families which uses the mean–variance relationship and which matches the traditional KR20 and KR21 in the binomial case, equivalent versions of KR20 and KR21 are obtained for data from other exponential family distributions. The KR20 and KR21 equivalent formulas are shown for when sum test scores can be said to follow one of the six natural exponential family distributions with quadratic variance function (NEF-QVF): the normal, the binomial, the Poisson, the gamma, the negative binomial, or the natural exponential family generated by a convolution of generalized hyperbolic secant functions (NEF-GHS).

The “Framework” section describes the properties of the generalized framework for reliability described in Foster (2020) which form the basis for derivations of the formulas and the definition of reliability as parallel-test correlation, and states which assumptions are slightly modified for the purposes of this article. The “KR20 and KR21 as Estimators of Reliability” section shows how the mean–variance relationship can be exploited to obtain test reliability and derives the general KR20 and KR21 formulas as estimators of this relationship, giving the formulas for each NEF-QVF distribution. These are shown to match the traditional KR20 and KR21 of Kuder and Richardson (1937) in the binomial case, and the Poisson KR21 of Allison (1978) in the Poisson case. The conditions for algebraic equivalence between KR20 and Cronbach’s alpha are also discussed. The “Simulation Study” section performs a simulation study showing that these formulas do appear to converge to the population reliability as the number of subjects and test length increase, and compares the equivalent KR20 and KR21 formulas with Cronbach’s alpha and each other in terms of root mean square error (RMSE), bias, and standard deviation. Results indicate that when the formulas are used for data following the appropriate exponential family, performance is improved over Cronbach’s alpha in terms of RMSE, though whether KR20 or KR21 is superior depends on the variance function of the exponential distribution. A brief discussion of Jensen’s inequality indicates a possible explanation for why one formula is superior to another for a given distribution.

Framework

This article follows the generalized framework of Foster (2020), where “generalized” is used in the same sense of a generalized linear model which may deal with nonnormal exponential family data. A complete theoretical description of the framework is given in Foster (2020), but this article will only state without proof elements which are necessary. One major difference is that while the framework of Foster (2020) applies to the entire natural exponential family, this article focuses only on the members of the exponential family having quadratic variance function so that the variance is a polynomial function of the mean up to degree 2. For example, a Bernoulli distribution with success probability p has mean p and variance p(1 - p) = p - p^2, so the variance is a quadratic function of the mean. Such distributions and their properties are extensively described in Morris (1982) and Morris (1983), which serve as a general reference. Furthermore, while the framework of Foster (2020) allows the test length to vary between subjects, this article assumes a common test length for all subjects for the sake of simplicity. The framework of Foster (2020) is most closely related to the strong true-score theory found in chapters 21 through 24 of Lord et al. (1968) and associated articles such as Lord (1965) and Keats and Lord (1962) in deriving properties of reliability when specific distributional forms can be assumed; but rather than dealing individually with the binomial and Poisson distributions, the framework of Foster (2020) derives properties for the exponential family in general. The framework makes assumptions of true score distributions in such a way as to ensure that the regression of true score on observed score is linear, though the observed scores are nonnormal.

Let i = 1, 2, …, n index test subject and let j = 1, 2, …, k index test item. In this framework, each test item for each subject Y_ij is assumed to identically follow a natural exponential family distribution with quadratic variance function (NEF-QVF), with independence conditional on subject ability θ_i. The six NEF-QVF distributions which may be used as generator distributions in this fashion are the normal, Bernoulli, Poisson, exponential, geometric, and generalized hyperbolic secant densities. Each of these distributions is closed under convolution. For example, the normal is a sum of normals, the binomial is the sum of Bernoullis, and the gamma is a sum of exponentials. Let X_i = \sum_{j=1}^{k} Y_{ij} be the sum score for subject i, summing over all k test items. The six possible distributions for X_i are then the normal, binomial, Poisson, gamma, negative binomial, and NEF-GHS.

Furthermore, let abilities θi follow the corresponding conjugate prior g(θi|μ,M) for the natural exponential family distribution, where μ=E[θi] and M=E[V(θi)]/Var(θi). For example, if the test items Yij are Bernoulli-distributed with mean given by subject ability θi, then subject sum scores Xi are binomial distributed, and abilities θi themselves follow a beta distribution with parameters μ=α/(α+β) and M=α+β. A complete description of several common NEF-QVF distributions, their conjugate priors, and the appropriate parameterizations in terms of μ and M is given in the appendix of Foster (2020). The model can be thought of as a hierarchy, with observed scores at the top level and abilities at the bottom level.

For natural exponential family distributions with quadratic variance function, the variance is a polynomial function of the mean of up to degree 2:

\mathrm{Var}(Y_{ij} \mid \theta_i) = V(\theta_i) = v_0 + v_1\theta_i + v_2\theta_i^2. (1)

For example, the Bernoulli distribution has V(θ_i) = θ_i(1 - θ_i) = θ_i - θ_i^2, so v_0 = 0, v_1 = 1, and v_2 = -1. The normal distribution, which assumes the variance around each item response is known to be σ^2, has V(θ_i) = σ^2 constant.

In this framework, the conditional and unconditional expectations of test item Yij are given by

\mathrm{E}[Y_{ij} \mid \theta_i] = \theta_i, \qquad \mathrm{E}[Y_{ij}] = \mu, (2)

where μ is the population mean ability. The conditional expectation being equal to ability θi in Equation (2) is an implicit assumption that each item is of equal difficulty conditional on subject with ability θi, the necessity of which is discussed in the “Conclusion, Suggestions, and Future Research Directions” section. The conditional and unconditional variances are given by

\mathrm{Var}(Y_{ij} \mid \theta_i) = V(\theta_i), \qquad \mathrm{Var}(Y_{ij}) = \mathrm{E}[V(\theta_i)] + \mathrm{Var}(\theta_i), (3)

where V(θi) is the variance function in Equation (1) applied to abilities θi. Then sum scores Xi have conditional and unconditional expectations

\mathrm{E}[X_i \mid \theta_i] = k\theta_i, \qquad \mathrm{E}[X_i] = k\mu. (4)

Correspondingly, the conditional and unconditional variances of Xi are given by

\mathrm{Var}(X_i \mid \theta_i) = kV(\theta_i), \qquad \mathrm{Var}(X_i) = k\mathrm{E}[V(\theta_i)] + k^2\mathrm{Var}(\theta_i). (5)

Within this framework, the test reliability is defined as the correlation between parallel tests, where parallel means that each test consists of identical, conditionally independent items Yij following the same exponential family model with means and variances given by Equations (2) to (5). As shown in Foster (2020), when this condition is met the test reliability is equal to

\rho = \frac{k}{M + k},

where k is the test length and M=E[V(θi)]/Var(θi) is the parameter of the underlying distribution of abilities g(θi|μ,M). This is also one minus the shrinkage parameter in the Bayesian posterior distribution. Cronbach’s alpha reduces to this quantity in the framework.
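As a quick numerical illustration (the specific values here are chosen for illustration and do not appear in the article), consider the binomial-beta model with a Beta(2, 2) ability distribution, so that μ = 0.5 and M = α + β = 4, and a test of k = 12 items. The parallel-test reliability is then

\rho = \frac{k}{M + k} = \frac{12}{4 + 12} = 0.75,

and lengthening the test to k = 36 items would raise it to 36/40 = 0.90, since M is a property of the ability distribution and does not change with k.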

Because the response to each test item Yij is conditionally independent and identical on ability θi, this framework is unidimensional. As shown in Equation (3) and in Foster (2020), the unconditional variances of each test item and unconditional covariances between test items are equal. The variance–covariance matrix between test items implied in this framework is, thus, tau-equivalent with equal variances, though the variance around each test item Yij is different for each subject i because of the mean–variance relation given in Equation (1).

KR20 and KR21 as Estimators of Reliability

Conjugate priors for natural exponential families have a close relationship between their mean μ and their variance Var(θi), connecting to the variance function of the original exponential family distribution they are conjugate to. From Morris (1983), the variance of the conjugate prior for natural exponential families with a quadratic variance function is

V(\mu) = \mathrm{Var}(\theta_i)(M - v_2), (6)

where V(μ) is the variance function of Equation (1) applied to the mean μ, M is the parameter of the conjugate distribution of abilities g(θ_i|μ,M), and v_2 is the coefficient of the quadratic term of the variance function in Equation (1). For example, a beta distribution for θ_i with parameters μ = α/(α+β) and M = α+β has variance Var(θ_i) = μ(1 - μ)/(M+1). With V(θ_i) = θ_i(1 - θ_i) for the Bernoulli distribution, this gives V(μ) = (M+1)Var(θ_i).

Define the quantity R as

R = \frac{k}{k + v_2}\left(1 - \frac{kV(\mu)}{\mathrm{Var}(X_i)}\right), (7)

where Var(Xi) is the unconditional variance. Using the mean–variance relationship given in Equation (6) and the variances given in Equation (5), this quantity becomes

R = \frac{k}{k + v_2}\left(1 - \frac{kV(\mu)}{\mathrm{Var}(X_i)}\right)
= \frac{k}{k + v_2}\left(1 - \frac{k\mathrm{Var}(\theta_i)(M - v_2)}{k\mathrm{E}[V(\theta_i)] + k^2\mathrm{Var}(\theta_i)}\right)
= \frac{k}{k + v_2}\left(\frac{k\mathrm{E}[V(\theta_i)] + k^2\mathrm{Var}(\theta_i) - k\mathrm{Var}(\theta_i)(M - v_2)}{k\mathrm{E}[V(\theta_i)] + k^2\mathrm{Var}(\theta_i)}\right)
= \frac{k}{k + v_2}\left(\frac{M + k - (M - v_2)}{M + k}\right)
= \frac{k}{k + v_2}\left(\frac{k + v_2}{M + k}\right)
= \frac{k}{M + k} = \rho. (8)

The fourth expression in Equation (8) follows from dividing the numerator and denominator of the third by kVar(θ_i) and using M = E[V(θ_i)]/Var(θ_i). The last line in Equation (8) is the parallel-test reliability, as previously stated. For dichotomous, Bernoulli-distributed item responses, which have V(θ_i) = θ_i - θ_i^2 and v_2 = -1, substitution into the quantity R in Equation (7) gives

R = \frac{k}{k - 1}\left(1 - \frac{k\mu(1 - \mu)}{\mathrm{Var}(X_i)}\right),

which is strikingly similar to the original KR20 and KR21 estimators of Kuder and Richardson (1937), but depends on known population parameters rather than sample quantities.

The parameters in Equation (7) are unknown, however, or else there would be no need to administer any sort of test. The question is then, is it possible to construct a consistent estimator for Equation (7) from sample quantities? The answer is yes. Given a consistent estimator for V(μ) and a consistent estimator for Var(Xi), then by Slutsky’s theorem, using these as plug-ins for the numerator and denominator of Equation (7) will produce a consistent estimator for the population test reliability.

For Var(X_i), the standard unbiased moment-based estimator of variance is used for the raw sum scores x_i:

s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.

A note on the equivalence of KR20 and Cronbach’s alpha for dichotomous data: These are often stated to be exactly equal; however, algebraic equivalence only occurs when the biased estimator of the sample variance which divides by the number of subjects n is used for Var(X_i) rather than the unbiased estimator which divides by n - 1 (for Cronbach’s alpha itself, either estimator of variance yields identical estimates so long as it is used consistently). For all calculations in this article, the unbiased estimator of sample variance is used. The historical reason for introducing bias into the denominator of the KR20 estimator of reliability appears to be obtaining algebraic equivalence with Cronbach’s alpha. As the results of the simulation study in the “Simulation Study” section show, however, using the unbiased estimator of variance produces improved performance over alpha.

For the quantity V(μ), more than one estimator is possible. Because V(θ_i) is a continuous polynomial for all six distributions considered, any consistent estimator for μ yields a consistent estimator for V(μ) by the continuous mapping theorem. The simplest estimator is to note that from Equation (4), E[X_i] = kμ, and so 1/k times the sample mean of sum scores has expectation E[(1/k)x̄] = (1/k)(kμ) = μ. Hence, (1/k)x̄ is a consistent estimator for μ by the law of large numbers, and so V((1/k)x̄) is a consistent estimator for V(μ).

An alternative estimator is constructed by observing that the terms in the mean of sum scores x¯ can be rearranged to equal the sum of test item means:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{k} y_{ij} = \frac{1}{n}\sum_{j=1}^{k}\sum_{i=1}^{n} y_{ij} = \sum_{j=1}^{k} \bar{y}_j. (9)

From Equation (2), a single test item Y_ij has E[Y_ij] = μ. By the law of large numbers, the item sample mean ȳ_j converges to μ for each item j. Then V(ȳ_j) is also a consistent estimator for V(μ).

In Equation (7), the desired quantity is not V(μ), but kV(μ). Because the variance function applied to each item mean, V(ȳ_j), is independently consistent for V(μ), a sum of V(ȳ_j) over all k items gives the desired result. Plugging this into Equation (7) with the unbiased sample variance s_x^2 for the denominator yields the generalized form of KR20:

\mathrm{KR20} = \frac{k}{k + v_2}\left(1 - \frac{\sum_{j=1}^{k} V(\bar{y}_j)}{s_x^2}\right). (10)

For dichotomous, Bernoulli-distributed item responses with V(θ_i) = θ_i - θ_i^2 so that v_2 = -1, this gives:

\frac{k}{k - 1}\left(1 - \frac{\sum_{j=1}^{k} \bar{y}_j(1 - \bar{y}_j)}{s_x^2}\right),

which is the traditional KR20 formula, with ȳ_j = p_j as the proportion of correct responses on item j.

The alternative is to use kV((1/k)x̄) as a consistent estimator for kV(μ). Plugging this and s_x^2 into Equation (7) yields the generalized form of KR21:

\mathrm{KR21} = \frac{k}{k + v_2}\left(1 - \frac{kV\left(\frac{1}{k}\bar{x}\right)}{s_x^2}\right). (11)

For dichotomous, Bernoulli-distributed item responses, this is

\frac{k}{k - 1}\left(1 - \frac{k\left(\frac{1}{k}\bar{x}\right)\left(1 - \frac{1}{k}\bar{x}\right)}{s_x^2}\right),

which is the traditional KR21 formula, as x̄ is the average sum score and so (1/k)x̄ is the average proportion correct p̄.

The difference between KR20 and KR21, as far as this framework is concerned, is whether the variance function V(·) is applied “inside” the sum of item means on the right-hand side of Equation (9), yielding KR20, or “outside” the sum, yielding KR21. A table showing the six different NEF-QVF distributions for item responses, their variance functions, and their corresponding KR20 and KR21 estimators is given in Table 1. As shown, the binomial formulas match the formulas originally derived in Kuder and Richardson (1937). The Poisson KR21 formula here exactly matches equation (13) of Allison (1978), which originally derived KR21 for Poisson count data. This article shows that it is also the Poisson KR20 equivalent because the variance function for the Poisson distribution is simply the identity V(θi)=θi, and so applying Equation (9) gives equality. The rest of the formulas are new, so far as can be determined. For the normal distribution, the error variance σ2 around the response to each item is assumed known. As this is extremely unlikely to be the case in practice, Cronbach’s alpha is recommended instead.

Table 1.

Natural Exponential Family Distributions With Quadratic Variance Functions and Their Corresponding KR20 and KR21 Estimators, Given by Equations (10) and (11), Respectively.

Y_ij distribution | V(θ) | KR20 | KR21
Normal | \sigma^2 | 1 - \frac{\sum_{j=1}^{k}\sigma^2}{s_x^2} | 1 - \frac{k\sigma^2}{s_x^2}
Bernoulli | \theta - \theta^2 | \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k}\bar{y}_j(1-\bar{y}_j)}{s_x^2}\right) | \frac{k}{k-1}\left(1 - \frac{k\left(\frac{1}{k}\bar{x}\right)\left(1 - \frac{1}{k}\bar{x}\right)}{s_x^2}\right)
Poisson | \theta | 1 - \frac{\sum_{j=1}^{k}\bar{y}_j}{s_x^2} | 1 - \frac{\bar{x}}{s_x^2}
Exponential | \theta^2 | \frac{k}{k+1}\left(1 - \frac{\sum_{j=1}^{k}\bar{y}_j^2}{s_x^2}\right) | \frac{k}{k+1}\left(1 - \frac{\frac{1}{k}\bar{x}^2}{s_x^2}\right)
Geometric | \theta + \theta^2 | \frac{k}{k+1}\left(1 - \frac{\sum_{j=1}^{k}\left(\bar{y}_j + \bar{y}_j^2\right)}{s_x^2}\right) | \frac{k}{k+1}\left(1 - \frac{\bar{x} + \frac{1}{k}\bar{x}^2}{s_x^2}\right)
GHS | 1 + \theta^2 | \frac{k}{k+1}\left(1 - \frac{\sum_{j=1}^{k}\left(1 + \bar{y}_j^2\right)}{s_x^2}\right) | \frac{k}{k+1}\left(1 - \frac{k\left(1 + \frac{1}{k^2}\bar{x}^2\right)}{s_x^2}\right)

Note. The sum scores x_i for subject i have sample mean x̄ and sample variance s_x^2, where sample mean and variance are calculated over subjects. Each test item has response y_ij, with i indexing subject and j indexing item, and where ȳ_j is the mean response for item j, again averaging over subjects. The test length is k items and the number of subjects is n. Note that the use of the normal density assumes that the noise variance σ^2 around each test item is known, which is unlikely to be the case in practice. Cronbach’s alpha is recommended when the data are believed to be normal. Where possible, terms are canceled to simplify formulas, except in the case of the binomial so as to preserve KR20 and KR21 in their original forms. Also note that because \sum_{j=1}^{k}\bar{y}_j = \bar{x} as shown in Equation (9), the Poisson estimator is identical for both KR20 and KR21. The binomial estimators were first derived in Kuder and Richardson (1937) and the Poisson estimator was first derived in Allison (1978). GHS = generalized hyperbolic secant function.
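As a minimal sketch of how Equations (10) and (11) and Table 1 translate into computation (illustrative code, not from the article; the function name and argument names are hypothetical), the following R function takes a subjects-by-items response matrix and the coefficients v_0, v_1, v_2 of the assumed variance function V(θ) = v_0 + v_1θ + v_2θ^2.

```r
# Illustrative sketch of the generalized KR20 and KR21 of Equations (10)-(11).
# Y is an n-by-k matrix of item responses.  The (v0, v1, v2) coefficients come
# from Table 1, for example Bernoulli (0, 1, -1), Poisson (0, 1, 0),
# exponential (0, 0, 1), geometric (0, 1, 1).
generalized_kr <- function(Y, v0, v1, v2) {
  k    <- ncol(Y)
  Vfun <- function(theta) v0 + v1 * theta + v2 * theta^2   # variance function
  x    <- rowSums(Y)             # sum scores
  sx2  <- var(x)                 # unbiased sample variance of sum scores
  ybar <- colMeans(Y)            # item means
  kr20 <- (k / (k + v2)) * (1 - sum(Vfun(ybar)) / sx2)
  kr21 <- (k / (k + v2)) * (1 - k * Vfun(mean(x) / k) / sx2)
  c(KR20 = kr20, KR21 = kr21)
}
```

For example, generalized_kr(Y, 0, 1, -1) reproduces the traditional dichotomous KR20 and KR21, while generalized_kr(Y, 0, 1, 0) gives the Poisson estimators, for which the two values coincide.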

Simulation Study

Simulation Method

A reasonable question is, why use these generalized KR20 or KR21 estimators when Cronbach’s alpha is available? What advantage do these estimators have? This question is answered with a simulation study.

The purpose of this simulation study is to show that when data are simulated from the correct NEF-QVF model, the generalized KR20 and KR21 estimators corresponding to that family both converge to the population reliability as the test length and number of subjects increase, and that they do so as or more efficiently than Cronbach’s alpha. It is not a simulation study to determine all properties of the estimators or to investigate all potential sources and magnitudes of bias, though several are discussed. For data generation, the method described in the appendix of Foster (2020) is used, which is identical to the simulation method of Huynh (1979) in the case of the binomial-beta model. Test data are simulated using the following algorithm:

  1. Choose the appropriate NEF-QVF distribution for test item responses yij, the mean ability μ, the number of subjects n, the number of test items k, and desired population reliability ρ.

  2. Calculate the M required to obtain the desired population reliability ρ as

M = \left(\frac{1 - \rho}{\rho}\right)k.
  3. Simulate n abilities θi from the conjugate prior g(θi|μ,M), one for each subject i. The appropriate parameterization of the prior in terms of μ and M is given in the appendix of Foster (2020).

  4. For each θi, simulate k responses from the NEF-QVF distribution p(yij|θi). The sum score xi for each subject i is calculated as the sum of these responses over the k items. An illustrative code sketch of these steps is given below.
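The following R sketch illustrates these four steps for the Poisson-gamma model (illustrative code, not the article's own; the Gamma(shape = μM, rate = M) parameterization is derived here directly from the definitions μ = E[θ_i] and M = E[V(θ_i)]/Var(θ_i), and the appendix of Foster (2020) should be consulted for the parameterizations actually used).

```r
# Illustrative sketch of steps 1-4 for the Poisson-gamma model: abilities
# theta_i follow a gamma distribution and item responses y_ij | theta_i are
# Poisson(theta_i), so sum scores follow a Poisson distribution.
simulate_poisson_gamma <- function(n, k, mu, rho) {
  M     <- ((1 - rho) / rho) * k                      # step 2
  theta <- rgamma(n, shape = mu * M, rate = M)        # step 3: abilities
  matrix(rpois(n * k, rep(theta, each = k)),          # step 4: responses
         nrow = n, ncol = k, byrow = TRUE)
}

set.seed(1)
Y <- simulate_poisson_gamma(n = 75, k = 10, mu = 1, rho = 0.6)
```

Cronbach's alpha and the generalized KR20 and KR21 (for instance, via the illustrative generalized_kr(Y, 0, 1, 0) helper sketched earlier) can then be computed on each simulated data set and compared with the known ρ.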

This algorithm implies that the underlying distribution of talent levels g(θi|μ,M) is different for each (k,ρ) pair, as the parameter M of the distribution is calculated anew for each. Hence, it is difficult to directly compare simulation results across two identical k values if ρ differs, or across identical ρ values if k differs. Again, the primary goal is to show convergence. The alternative is to keep the distribution of talent levels constant by keeping M constant and varying the reliability through the test length k, but this was not chosen because it limits the potential values of the desired population reliability ρ and preliminary simulations indicated that the shape of the distribution of talent levels g(θi|μ,M) was less influential than the value of ρ, with the exception of extremely skewed distributions which may be of interest in other simulations. The balance of choices in producing a desired population reliability ρ is delicate and it is not possible within this framework to have complete freedom in all choices. A more complete simulation study might achieve a desired ρ through different methods in order to determine the effect of each. Though the mean ability μ does not affect the population reliability directly, it does have the potential to affect the sampling distribution of reliability through manipulating aspects of the distribution of abilities g(θi|μ,M) such as skew and kurtosis. In particular, preliminary simulations indicated that μ values near the edges of the distribution support may lead to very skewed g(θi|μ,M) distributions which could potentially have a large influence, but these were not considered for further study by simulation in this article. The mean μ is instead kept constant throughout the simulation studies to avoid yet another quantity which could potentially affect results, though it could be of interest in a further simulation study. This algorithm also produces items which are of equal difficulty conditional on subject with ability θi, though items may have different algebraic means.

For the choice of exponential family model, the Poisson-gamma, gamma-inverse gamma, negative binomial-F, and binomial-beta distributions are used, where the first distribution named is the distribution of sum scores xi and the second is the conjugate prior g(θi|μ,M). The binomial-beta distribution corresponds to the Kuder and Richardson (1937) coefficients and has been extensively studied but is also shown here for completeness and comparison to results for other estimators. The normal–normal assumes that the variance σ2 is known, and as such is not of practical use. The distribution in the last row of Table 1, the natural exponential family generated by a convolution of generalized hyperbolic secant distributions, is neither common nor easy to simulate from.

For the population reliability ρ, the values 0.3, 0.6, and 0.8 are used, representing low, moderate, and high reliability. For test length, the values k = 5, 10, and 30 are used, representing a short, medium, and long test, respectively. For number of subjects, the values n = 30, 75, and 500 are used, representing a small, moderate, and large number of subjects, respectively. The value of the mean ability μ is constant within each set of simulations at μ = 1 for the Poisson-gamma model and the gamma-inverse gamma model, μ = 1.01 for the negative binomial-F model, and μ = 0.5 for the binomial-beta model. These were simply chosen as reasonable values which worked well in simulation.

This gives a total of 27 sets of (ρ, k, n) values for each of the four exponential family models. Details of the implied variance–covariance matrix are given in the appendix. For each set of values, a total of N_sims = 1,000,000 data sets were simulated and Cronbach’s alpha and the KR20 and KR21 equivalents given by Equations (10) and (11) were calculated for each. To judge the estimator for a given set of estimated reliabilities ρ̂, the sample RMSE is used:

\mathrm{RMSE}(\hat{\rho}) = \sqrt{\frac{1}{N_{\mathrm{sims}}}\sum_{s=1}^{N_{\mathrm{sims}}}\left(\rho - \hat{\rho}_s\right)^2}.

The RMSE is further decomposed into the traditional bias and variance (squared standard deviation) of the estimator:

\mathrm{RMSE}(\hat{\rho}) = \sqrt{\mathrm{Bias}^2(\hat{\rho}) + \mathrm{Var}(\hat{\rho})} = \sqrt{\left(\frac{1}{N_{\mathrm{sims}}}\sum_{s=1}^{N_{\mathrm{sims}}}(\rho - \hat{\rho}_s)\right)^2 + \frac{1}{N_{\mathrm{sims}}}\sum_{s=1}^{N_{\mathrm{sims}}}(\hat{\rho}_s - \bar{\rho})^2},

where ρ̄ is the sample mean of the estimates ρ̂_s. Note that the decomposition requires the biased variance formula for exact algebraic equivalence, but for N_sims = 10^6 total simulations using either the biased or unbiased variance estimate gives identical results within the first three decimal places. The decomposition allows for further inspection of results in order to determine exact properties of the estimator and answer questions about why a particular estimator might be performing better or worse than another.
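A small helper of the kind that could be used to summarize each simulation cell (an illustrative sketch, not the article's code) makes the decomposition concrete.

```r
# Illustrative sketch: summarize a vector of estimates rho_hat of a known
# population reliability rho by RMSE, bias, and standard deviation, as
# reported in Tables 2 to 6.  Bias is signed as in the tables, so negative
# values indicate underestimation.  With sd() (the n - 1 form), the identity
# RMSE^2 = Bias^2 + SD^2 holds only approximately, as noted in the text.
summarize_estimates <- function(rho_hat, rho) {
  c(RMSE = sqrt(mean((rho_hat - rho)^2)),
    Bias = mean(rho_hat) - rho,
    SD   = sd(rho_hat))
}
```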

The results of the simulation study are shown in Table 2 and Figure 1 for the Poisson-gamma model, Table 3 and Figure 2 for the gamma-inverse gamma model, Table 4 and Figure 3 for the negative binomial-F model, and Table 5 and Figure 4 for the binomial-beta model. In the results, the sample standard deviation is given rather than the sample variance because the sample variance was often extremely small. Results are reported to three decimal places. The largest sample variance reported for any estimator is 0.251^2, for Cronbach’s alpha in the first row of Table 3; thus, the approximate Monte Carlo standard error is less than or equal to 0.251/√(10^6) = 0.000251, and quite often much smaller. For this reason, the reporting of three decimal places in results is appropriate. Again, note that the unbiased estimator of sample variance s_x^2 which divides by n - 1 is used in all cases. Furthermore, intervals around derived quantities are small. A 99% bootstrap interval for the RMSE of Cronbach’s alpha in the first row of Table 3, based on 10,000 bootstrap samples of the one million simulated alpha values, is (0.251, 0.252). The standard error of the point estimate of RMSE is thus smaller than the rounding amount 0.001, so confidence intervals surrounding the point estimates are not included.

Table 2.

Simulation Results for Cronbach’s Alpha and the KR20 and KR21 Equivalents for the Poisson-Gamma Model, Where Abilities θi Follow a Gamma Distribution and Sum Test Scores xi Follow a Poisson Distribution.

Poisson-gamma
Parameters Cronbach’s alpha KR20 KR21
μ ρ k n M RMSE Bias SD RMSE Bias SD RMSE Bias SD
1 0.30 5 30 11.67 0.247 −0.057 0.241 0.224 −0.057 0.216 0.224 −0.057 0.216
1 0.30 5 75 11.67 0.142 −0.022 0.140 0.128 −0.022 0.126 0.128 −0.022 0.126
1 0.30 5 500 11.67 0.052 −0.003 0.052 0.047 −0.003 0.047 0.047 −0.003 0.047
1 0.30 10 30 23.33 0.232 −0.054 0.225 0.221 −0.054 0.214 0.221 −0.054 0.214
1 0.30 10 75 23.33 0.132 −0.021 0.130 0.126 −0.021 0.124 0.126 −0.021 0.124
1 0.30 10 500 23.33 0.048 −0.003 0.048 0.046 −0.003 0.046 0.046 −0.003 0.046
1 0.30 30 30 70.00 0.223 −0.053 0.217 0.220 −0.053 0.213 0.220 −0.053 0.213
1 0.30 30 75 70.00 0.126 −0.020 0.124 0.124 −0.020 0.122 0.124 −0.020 0.122
1 0.30 30 500 70.00 0.046 −0.003 0.046 0.045 −0.003 0.045 0.045 −0.003 0.045
1 0.60 5 30 3.33 0.156 −0.041 0.151 0.141 −0.041 0.135 0.141 −0.041 0.135
1 0.60 5 75 3.33 0.090 −0.017 0.089 0.081 −0.016 0.080 0.081 −0.016 0.080
1 0.60 5 500 3.33 0.033 −0.003 0.033 0.030 −0.003 0.030 0.030 −0.003 0.030
1 0.60 10 30 6.67 0.139 −0.036 0.135 0.133 −0.036 0.128 0.133 −0.036 0.128
1 0.60 10 75 6.67 0.080 −0.014 0.079 0.076 −0.014 0.075 0.076 −0.014 0.075
1 0.60 10 500 6.67 0.029 −0.002 0.029 0.028 −0.002 0.028 0.028 −0.002 0.028
1 0.60 30 30 20.00 0.130 −0.032 0.126 0.128 −0.032 0.124 0.128 −0.032 0.124
1 0.60 30 75 20.00 0.074 −0.012 0.073 0.072 −0.012 0.071 0.072 −0.012 0.071
1 0.60 30 500 20.00 0.027 −0.002 0.027 0.026 −0.002 0.026 0.026 −0.002 0.026
1 0.80 5 30 1.25 0.096 −0.031 0.091 0.087 −0.031 0.081 0.087 −0.031 0.081
1 0.80 5 75 1.25 0.055 −0.013 0.054 0.050 −0.013 0.048 0.050 −0.013 0.048
1 0.80 5 500 1.25 0.020 −0.002 0.020 0.018 −0.002 0.018 0.018 −0.002 0.018
1 0.80 10 30 2.50 0.079 −0.023 0.075 0.075 −0.023 0.071 0.075 −0.023 0.071
1 0.80 10 75 2.50 0.045 −0.009 0.044 0.043 −0.009 0.042 0.043 −0.009 0.042
1 0.80 10 500 2.50 0.017 −0.001 0.017 0.016 −0.001 0.016 0.016 −0.001 0.016
1 0.80 30 30 7.50 0.068 −0.018 0.065 0.067 −0.018 0.064 0.067 −0.018 0.064
1 0.80 30 75 7.50 0.039 −0.007 0.038 0.038 −0.007 0.037 0.038 −0.007 0.037
1 0.80 30 500 7.50 0.014 −0.001 0.014 0.014 −0.001 0.014 0.014 −0.001 0.014

Note. The population reliability is ρ, the test length is k, the number of subjects is n, and the parameters of the gamma distribution for abilities are μ and M. The exact parameterizations of μ and M are given in the appendix of Foster (2020). The number in the first column beneath the estimator is the sample root mean square error, RMSE. The numbers in the second and third columns are the sample bias and sample standard deviation (SD) of the estimator. The relationship between the three numbers is given by the formula RMSE = √(Bias^2 + SD^2) (the formula may not be exact for results in the table due to rounding). For example, for α̂ in the first row of the table, 0.247 = √((-0.057)^2 + 0.241^2). One million simulated data sets are used for each set of parameters. The results are identical for KR20 and KR21 for the Poisson-gamma model because the two estimators are equal. The generalized KR21 for Poisson count data was first derived in Allison (1978).

Figure 1.

Difference in RMSE (alpha-KR21) for tests of different lengths for the Poisson-gamma model with ρ=0.3, grouped by number of subjects.

Note. The plot clearly shows that α̂ has a larger RMSE than KR21, but this effect decreases toward zero as both the test length k and the number of subjects n increase. The data in this plot are from Table 2. RMSE = root mean square error.

Table 3.

Simulation Results for Cronbach’s Alpha and the KR20 and KR21 Equivalents for the Gamma-Inverse Gamma Model, Where Abilities θi Follow an Inverse Gamma Distribution and Sum Test Scores xi Follow a Gamma Distribution.

Gamma-inverse gamma
Parameters Cronbach’s alpha KR20 KR21
μ ρ k n M RMSE Bias SD RMSE Bias SD RMSE Bias SD
1 0.30 5 30 11.67 0.262 −0.075 0.251 0.233 −0.098 0.211 0.221 −0.081 0.206
1 0.30 5 75 11.67 0.157 -0.033 0.154 0.134 −0.041 0.127 0.131 −0.035 0.126
1 0.30 5 500 11.67 0.061 −0.005 0.061 0.051 −0.007 0.050 0.051 −0.006 0.050
1 0.30 10 30 23.33 0.239 −0.063 0.230 0.233 −0.087 0.216 0.220 −0.066 0.210
1 0.30 10 75 23.33 0.140 −0.026 0.137 0.131 −0.035 0.126 0.127 −0.027 0.124
1 0.30 10 500 23.33 0.052 −0.004 0.052 0.048 −0.005 0.047 0.047 −0.004 0.047
1 0.30 30 30 70.00 0.226 −0.055 0.219 0.233 −0.080 0.219 0.220 −0.057 0.212
1 0.30 30 75 70.00 0.128 −0.021 0.127 0.128 −0.031 0.124 0.124 −0.022 0.123
1 0.30 30 500 70.00 0.047 −0.003 0.047 0.046 −0.004 0.046 0.046 −0.003 0.046
1 0.60 5 30 3.33 0.212 −0.099 0.187 0.191 −0.115 0.152 0.181 −0.104 0.148
1 0.60 5 75 3.33 0.135 −0.054 0.124 0.115 −0.060 0.098 0.112 −0.056 0.097
1 0.60 5 500 3.33 0.062 −0.015 0.060 0.051 −0.016 0.048 0.051 −0.015 0.048
1 0.60 10 30 6.67 0.168 −0.063 0.156 0.164 −0.078 0.144 0.155 −0.066 0.140
1 0.60 10 75 6.67 0.101 −0.029 0.097 0.095 −0.035 0.089 0.093 −0.030 0.088
1 0.60 10 500 6.67 0.041 −0.005 0.041 0.038 −0.006 0.037 0.038 −0.006 0.037
1 0.60 30 30 20.00 0.139 −0.040 0.134 0.144 −0.055 0.133 0.136 −0.041 0.129
1 0.60 30 75 20.00 0.080 −0.016 0.079 0.080 −0.022 0.077 0.078 −0.016 0.076
1 0.60 30 500 20.00 0.030 −0.002 0.030 0.029 −0.003 0.029 0.029 −0.003 0.029
1 0.80 5 30 1.25 0.199 −0.137 0.144 0.183 −0.150 0.104 0.175 −0.142 0.102
1 0.80 5 75 1.25 0.139 −0.095 0.102 0.120 −0.100 0.067 0.118 −0.097 0.067
1 0.80 5 500 1.25 0.081 −0.051 0.062 0.063 −0.052 0.035 0.062 −0.052 0.035
1 0.80 10 30 2.50 0.126 −0.072 0.103 0.124 −0.083 0.092 0.117 −0.075 0.090
1 0.80 10 75 2.50 0.080 −0.042 0.068 0.075 −0.046 0.060 0.073 −0.043 0.059
1 0.80 10 500 2.50 0.038 −0.014 0.035 0.034 −0.015 0.031 0.034 −0.014 0.031
1 0.80 30 30 7.50 0.083 −0.032 0.077 0.086 −0.040 0.077 0.081 −0.033 0.074
1 0.80 30 75 7.50 0.049 −0.014 0.047 0.049 −0.017 0.046 0.048 −0.014 0.046
1 0.80 30 500 7.50 0.020 −0.002 0.020 0.019 −0.003 0.019 0.019 −0.003 0.019

Note. The population reliability is ρ, the test length is k, the number of subjects is n, and the parameters of the inverse gamma distribution for abilities are μ and M. The exact parameterizations of μ and M are given in the appendix of Foster (2020). The number in the first column beneath the estimator is the sample root mean square error, RMSE. The numbers in the second and third columns are the sample bias and sample standard deviation (SD) of the estimator. The relationship between the three numbers is given by the formula RMSE = √(Bias^2 + SD^2) (the formula may not be exact for results in the table due to rounding). For example, for α̂ in the first row of the table, 0.262 = √((-0.075)^2 + 0.251^2). One million simulated data sets are used for each set of parameters.

Figure 2.

Difference in RMSE (alpha-KR21) for tests of different lengths for the gamma-inverse gamma model with ρ=0.3, grouped by number of subjects.

Note. The plot clearly shows that α̂ has a larger RMSE than KR21, but this effect decreases toward zero as both the test length k and the number of subjects n increase. The data in this plot are from Table 3. RMSE = root mean square error.

Table 4.

Simulation Results for Cronbach’s Alpha and the KR20 and KR21 Equivalents for the Negative Binomial-F Model, Where Abilities θi Follow an F Distribution and Sum Test Scores xi Follow a Negative Binomial Distribution.

Negative binomial-F
Parameters Cronbach’s alpha KR20 KR21
μ ρ k n M RMSE Bias SD RMSE Bias SD RMSE Bias SD
1.01 0.30 5 30 11.67 0.251 −0.067 0.242 0.220 −0.090 0.201 0.210 −0.073 0.197
1.01 0.30 5 75 11.67 0.151 −0.028 0.149 0.126 −0.037 0.120 0.123 −0.030 0.119
1.01 0.30 5 500 11.67 0.058 −0.005 0.058 0.047 −0.006 0.047 0.047 −0.005 0.047
1.01 0.30 10 30 23.33 0.235 −0.061 0.227 0.228 −0.084 0.212 0.215 −0.064 0.206
1.01 0.30 10 75 23.33 0.137 −0.024 0.135 0.127 −0.033 0.123 0.124 −0.025 0.122
1.01 0.30 10 500 23.33 0.051 −0.004 0.051 0.046 −0.005 0.046 0.046 −0.004 0.046
1.01 0.30 30 30 70.00 0.225 −0.055 0.218 0.232 −0.080 0.218 0.219 −0.057 0.212
1.01 0.30 30 75 70.00 0.128 −0.021 0.126 0.128 −0.031 0.124 0.124 −0.022 0.122
1.01 0.30 30 500 70.00 0.047 −0.003 0.047 0.046 −0.005 0.046 0.046 −0.003 0.045
1.01 0.60 5 30 3.33 0.164 −0.053 0.155 0.132 −0.068 0.113 0.125 −0.058 0.111
1.01 0.60 5 75 3.33 0.103 −0.024 0.100 0.076 −0.030 0.070 0.074 −0.026 0.069
1.01 0.60 5 500 3.33 0.042 −0.004 0.042 0.029 −0.005 0.029 0.029 −0.005 0.029
1.01 0.60 10 30 6.67 0.142 −0.043 0.136 0.135 −0.058 0.122 0.128 −0.046 0.119
1.01 0.60 10 75 6.67 0.085 −0.018 0.083 0.077 −0.024 0.073 0.075 −0.019 0.072
1.01 0.60 10 500 6.67 0.032 −0.003 0.032 0.028 −0.004 0.028 0.028 −0.003 0.028
1.01 0.60 30 30 20.00 0.132 −0.036 0.127 0.136 −0.050 0.126 0.128 −0.037 0.122
1.01 0.60 30 75 20.00 0.076 −0.014 0.074 0.075 −0.019 0.073 0.073 −0.014 0.072
1.01 0.60 30 500 20.00 0.028 −0.002 0.028 0.027 −0.003 0.027 0.027 −0.002 0.027
1.01 0.80 5 30 1.25 0.114 −0.032 0.109 0.064 −0.043 0.048 0.060 −0.038 0.046
1.01 0.80 5 75 1.25 0.080 −0.014 0.079 0.029 −0.018 0.022 0.027 −0.016 0.022
1.01 0.80 5 500 1.25 0.042 −0.003 0.042 0.009 −0.004 0.009 0.009 −0.003 0.009
1.01 0.80 10 30 2.50 0.075 −0.026 0.070 0.065 −0.035 0.055 0.061 −0.029 0.053
1.01 0.80 10 75 2.50 0.047 −0.012 0.045 0.037 −0.015 0.033 0.035 −0.013 0.033
1.01 0.80 10 500 2.50 0.019 −0.002 0.019 0.014 −0.003 0.014 0.014 −0.002 0.014
1.01 0.80 30 30 7.50 0.066 −0.020 0.063 0.068 −0.027 0.062 0.063 −0.021 0.060
1.01 0.80 30 75 7.50 0.038 −0.008 0.038 0.038 −0.011 0.036 0.037 −0.008 0.036
1.01 0.80 30 500 7.50 0.014 −0.001 0.014 0.014 −0.002 0.014 0.014 −0.001 0.014

Note. The population reliability is equal to ρ, the test length is k, the number of subjects is n, and the parameters of the F distribution for abilities are μ and M. The exact parameterizations of μ and M are given in the appendix of Foster (2020). The number in the first column beneath the estimator is the sample root mean square error, RMSE. The numbers in the second and third columns are the sample bias and sample standard deviation (SD) of the estimator. The relationship between the three numbers is given by the formula RMSE = √(Bias^2 + SD^2) (the formula may not be exact for results in the table due to rounding). For example, for α̂ in the first row of the table, 0.251 = √((-0.067)^2 + 0.242^2). One million simulated data sets are used for each set of parameters.

Figure 3.

Difference in RMSE (alpha-KR21) for tests of different lengths for the negative binomial-F model with ρ=0.3, grouped by number of subjects.

Note. The plot clearly shows that α̂ has a larger RMSE than KR21, but this effect decreases toward zero as both the test length k and the number of subjects n increase. The data in this plot are from Table 4. RMSE = root mean square error.

Table 5.

Simulation Results for Cronbach’s Alpha and KR20 and KR21 for the Binomial-Beta Model, Where Abilities θi Follow a Beta Distribution and Sum Test Scores xi Follow a Binomial Distribution.

Binomial-beta
Parameters Cronbach’s alpha KR20 KR21
μ ρ k n M RMSE Bias SD RMSE Bias SD RMSE Bias SD
0.50 0.30 5 30 11.67 0.240 −0.048 0.235 0.228 −0.015 0.228 0.238 −0.039 0.235
0.50 0.30 5 75 11.67 0.136 −0.018 0.135 0.133 −0.005 0.133 0.136 −0.015 0.135
0.50 0.30 5 500 11.67 0.050 −0.003 0.050 0.050 −0.001 0.050 0.050 −0.002 0.050
0.50 0.30 10 30 23.33 0.229 −0.050 0.223 0.217 −0.021 0.216 0.228 −0.046 0.223
0.50 0.30 10 75 23.33 0.129 −0.019 0.128 0.126 −0.008 0.126 0.129 −0.017 0.127
0.50 0.30 10 500 23.33 0.047 −0.003 0.047 0.047 −0.001 0.047 0.047 −0.002 0.047
0.50 0.30 30 30 70.00 0.221 −0.051 0.216 0.210 −0.025 0.208 0.221 −0.050 0.215
0.50 0.30 30 75 70.00 0.125 −0.019 0.124 0.122 −0.009 0.122 0.125 −0.019 0.124
0.50 0.30 30 500 70.00 0.045 −0.003 0.045 0.045 −0.001 0.045 0.045 −0.003 0.045
0.50 0.60 5 30 3.33 0.134 −0.023 0.132 0.127 −0.001 0.127 0.132 −0.015 0.131
0.50 0.60 5 75 3.33 0.077 −0.009 0.077 0.076 0.000 0.076 0.077 −0.005 0.077
0.50 0.60 5 500 3.33 0.028 −0.001 0.028 0.028 0.000 0.028 0.028 −0.001 0.028
0.50 0.60 10 30 6.67 0.127 −0.025 0.124 0.121 −0.007 0.120 0.126 −0.022 0.124
0.50 0.60 10 75 6.67 0.072 −0.009 0.072 0.071 −0.003 0.071 0.072 −0.008 0.072
0.50 0.60 10 500 6.67 0.026 −0.001 0.026 0.026 0.000 0.026 0.026 −0.001 0.026
0.50 0.60 30 30 20.00 0.125 −0.028 0.122 0.118 −0.012 0.118 0.125 −0.026 0.122
0.50 0.60 30 75 20.00 0.071 −0.010 0.070 0.069 −0.004 0.069 0.071 −0.010 0.070
0.50 0.60 30 500 20.00 0.026 −0.001 0.026 0.026 −0.001 0.026 0.026 −0.001 0.026
0.50 0.80 5 30 1.25 0.068 −0.009 0.067 0.065 0.006 0.065 0.067 −0.001 0.067
0.50 0.80 5 75 1.25 0.040 −0.003 0.040 0.039 0.003 0.039 0.040 0.000 0.040
0.50 0.80 5 500 1.25 0.015 0.000 0.015 0.015 0.000 0.015 0.015 0.000 0.015
0.50 0.80 10 30 2.50 0.061 −0.010 0.061 0.059 0.000 0.059 0.061 −0.007 0.060
0.50 0.80 10 75 2.50 0.036 −0.004 0.035 0.035 0.000 0.035 0.035 −0.002 0.035
0.50 0.80 10 500 2.50 0.013 −0.001 0.013 0.013 0.000 0.013 0.013 0.000 0.013
0.50 0.80 30 30 7.50 0.061 −0.012 0.059 0.057 −0.004 0.057 0.060 −0.011 0.059
0.50 0.80 30 75 7.50 0.034 −0.005 0.034 0.034 −0.001 0.034 0.034 −0.004 0.034
0.50 0.80 30 500 7.50 0.013 −0.001 0.013 0.013 0.000 0.013 0.013 −0.001 0.013

Note. The population reliability is equal to ρ, the test length is k, the number of subjects is n, and the parameters of the beta distribution for abilities are μ and M. The exact parameterizations of μ and M are given in the appendix of Foster (2020). The number in the first column beneath the estimator is the sample root mean squared error, RMSE. The numbers in the second and third columns are the sample bias and sample standard deviation (SD) of the estimator. The relationship between the three numbers is given by the formula RMSE = √(Bias^2 + SD^2) (the formula may not be exact for results in the table due to rounding). For example, for α̂ in the first row of the table, 0.240 = √((-0.048)^2 + 0.235^2). One million simulated data sets are used for each set of parameters. The KR20 and KR21 estimators for this model are the traditional ones derived in Kuder and Richardson (1937).

Figure 4.

Difference in RMSE (alpha-KR21) for tests of different lengths for the binomial-beta model with ρ=0.3, grouped by number of subjects.

Note. The plot clearly shows that α̂ has a larger RMSE than KR21, but this effect decreases toward zero as both the test length k and the number of subjects n increase. The data in this plot are from Table 5. RMSE = root mean square error.

Analysis of Results

The results of Tables 2 to 5 show that the primary aims of the simulation study are met. The RMSE of both Cronbach’s alpha and the estimators in Table 1 decreases and appears to converge toward zero as the test length k and the number of subjects n increase. This can be seen for ρ = 0.3 in Figures 1 to 4. Furthermore, the estimators in Table 1 are nearly universally superior to Cronbach’s alpha, answering the initial question posed of why they might be preferred over alpha. It is only in high-data situations with a relatively large number of test items k and large number of subjects n that all estimators are nearly equal and the RMSE is approximately the same no matter which estimator is chosen. In general, it appears that increasing the number of subjects n can lead to a large reduction in RMSE through reduction of both bias and variance. For shorter tests with a smaller number of subjects, the generalized estimators provided by Equations (10) and (11) often strongly outperform alpha in terms of RMSE. However, there are noticeable differences in the exact manner in which this improvement occurs. For example, the generalized KR21 estimator outperforms the generalized KR20 estimator for the gamma-inverse gamma and negative binomial-F results in Tables 3 and 4, but the reverse occurs for the binomial-beta model results given in Table 5. The bias of all three estimators is also generally negative, except for KR20 and KR21 in the binomial-beta model results given in Table 5. There are also differences in variances of the estimators across models. Though formal proof is not offered, some probability theory may help to explain the idiosyncrasies within each table and between tables.

First, it is noticeable that in every set of simulations the bias is negative for Cronbach’s alpha in all cases, though converging to zero as the number of subjects n increases. Why? From Foster (2020), Cronbach’s alpha is estimating the quantity

\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{j=1}^{k}\mathrm{Var}(Y_j)}{\mathrm{Var}(X)}\right)

using the sample variances s_{y_j}^2 and s_x^2 as plug-in estimates, though as in Foster (2020) this article assumes Var(Y_j) is identical for all test items j. Essentially, it is a ratio of sample variances, and ratios of sample variances tend to be biased upward. For independent variances, Jensen’s inequality shows that the bias is always upward, as 1/s_x^2 is a convex function of s_x^2 and so E[1/s_x^2] ≥ 1/E[s_x^2] = 1/Var(X); however, the lack of independence between s_x^2 and s_{y_j}^2 in the calculation of Cronbach’s alpha makes precise determination of the bias for alpha difficult. When the bias of the ratio is positive, then the act of subtracting the ratio from one in the calculation of Cronbach’s alpha flips the bias of the estimator as a whole to underestimation.

Also noticeable is that the bias of the KR21 and KR20 estimators is often larger than the bias of Cronbach’s alpha for the gamma-inverse gamma and negative binomial-F models, identical for the Poisson-gamma model, and smaller for the binomial-beta model. Why? Though it is difficult to state the reasons exactly, as Equations (10) and (11) differ slightly from alpha due to the presence of the k/(k + v_2) factor in front and because the two estimators are estimating the reliability using different quantities, a very strong clue is offered by Jensen’s inequality. The numerators of KR20 and KR21 in Equations (10) and (11) are still estimating variances similarly to alpha, only in an indirect manner. From Equation (7), the mean–variance relationship for exponential family random variables is being exploited to estimate the variance through estimation of the variance function V(μ). Because V(μ) is not available, however, the plug-in estimate V((1/k)x̄) is used for the KR21 estimator. By Jensen’s inequality, E[V((1/k)x̄)] ≥ V(E[(1/k)x̄]) = V(μ) for convex functions V(·). In Table 1, the variance function V(μ) for both the exponential (gamma-inverse gamma model) and geometric (negative binomial-F model) distributions is convex. Hence, additional bias is introduced to the numerator of the ratio estimate, as the expected value of the variance function of the sample mean is larger than the variance function of the population mean. This is seen in the increased bias over Cronbach’s alpha in Tables 3 and 4. A similar argument applies for the KR20 estimator. The bias for the Poisson-gamma model in Table 2, however, is equal for both Cronbach’s alpha and the KR20 and KR21 equivalents. This is because the variance function for the Poisson distribution, V(μ) = μ, is linear, and thus E[V((1/k)x̄)] = V(E[(1/k)x̄]) = V(μ). The bias for KR20 and KR21 in the binomial-beta model is smaller than the bias of Cronbach’s alpha, as seen in Table 5, because the variance function V(μ) = μ - μ^2 is concave, and so E[V((1/k)x̄)] ≤ V(E[(1/k)x̄]) = V(μ) by Jensen’s inequality. For most estimators, a downward bias would be a disadvantage. However, for the standard binomial KR20 and KR21, this downward bias appears in the numerator of a ratio estimate that is already biased upward. By all accounts, it appears that the downward bias for the traditional KR20 and KR21 from the use of the variance function applied to the sample mean acts as a counterweight to the upward bias coming from taking the ratio of sample variances, which is a rather remarkable idea.
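The direction of this plug-in bias is easy to check numerically. The short R sketch below (an illustration under assumed values, not taken from the article) draws many samples of size n from a fixed population and compares the average of the variance function evaluated at the sample mean with its value at the population mean, once for a convex variance function (V(θ) = θ^2, the exponential case) and once for a concave one (V(θ) = θ - θ^2, the Bernoulli case).

```r
# Illustrative check of the Jensen's inequality argument: for convex V the
# plug-in V(sample mean) overshoots V(mu) on average; for concave V it
# undershoots.
set.seed(1)
n <- 30; reps <- 100000

# Convex case: exponential responses with mu = 1, V(theta) = theta^2.
xbar_exp <- replicate(reps, mean(rexp(n, rate = 1)))
c(mean_V_xbar = mean(xbar_exp^2), V_mu = 1)            # first entry exceeds 1

# Concave case: Bernoulli responses with mu = 0.5, V(theta) = theta - theta^2.
xbar_ber <- replicate(reps, mean(rbinom(n, 1, 0.5)))
c(mean_V_xbar = mean(xbar_ber * (1 - xbar_ber)), V_mu = 0.25)  # first entry below 0.25
```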

It has long been noted that in the standard binomial case, KR21 serves as a lower bound for KR20 (Kuder & Richardson, 1937). This is also easily seen as an application of Jensen’s inequality in its algebraic form. Because V(θ) = θ - θ^2 is concave, Jensen’s inequality gives

kV\left(\frac{1}{k}\bar{x}\right) = kV\left(\frac{1}{k}\sum_{j=1}^{k}\bar{y}_j\right) \geq k\left(\frac{1}{k}\right)\sum_{j=1}^{k} V(\bar{y}_j) = \sum_{j=1}^{k} V(\bar{y}_j)

and once again, the act of subtracting from 1 in Equations (10) and (11) flips the inequality, giving KR21 ≤ KR20 as the standard result. Conversely, for convex variance functions, as is the case with the exponential (gamma-inverse gamma model) and geometric (negative binomial-F model) data, the inequality is reversed and KR20 serves as a lower bound to KR21. For Poisson data with identity variance function, the two estimators are equal, as previously discussed. Because KR20 and KR21 are both generally underestimating the reliability, this has the effect that KR20 is less biased than KR21 for the binomial-beta model, with a smaller variance, mirroring the traditional wisdom that KR20 is superior to KR21 for dichotomous data. KR20 is more biased than KR21 for the gamma-inverse gamma and negative binomial-F models, with a larger variance.

Last, even though the bias of the KR20 and KR21 estimators is sometimes larger, they compensate for this with a greatly decreased variance. What is the reason for this decrease in variance? A possible explanation is that it comes from using sample means in combination with the mean–variance relationship of exponential families to estimate a variance, rather than using sample variances themselves. Sample means converge to population means more quickly than sample variances converge to population variances, where n is the number of test subjects (Casella & Berger, 2002). Simply put, sample means converge faster than sample variances. Hence, estimating item variances using sample means in combination with a mean–variance relationship should tend to be more efficient than using sample variances alone.
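A small numerical illustration of this point (a sketch under assumed Poisson responses and arbitrary values, not results from the article): for Poisson data the variance equals the mean, so the same quantity can be estimated either by the sample mean or by the sample variance, and the mean-based estimate fluctuates considerably less across repeated samples.

```r
# Illustrative comparison for Poisson(10) responses from n = 30 subjects: both
# the sample mean and the sample variance estimate the common value 10, but
# the mean-based estimate has a much smaller Monte Carlo spread
# (roughly 0.58 versus roughly 2.7 here).
set.seed(1)
n <- 30; lambda <- 10; reps <- 100000
draws <- replicate(reps, {
  y <- rpois(n, lambda)
  c(mean_based = mean(y), var_based = var(y))
})
apply(draws, 1, sd)
```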

Alternative Reliability Estimators

Cronbach’s alpha, though popular, is far from the only estimator available, and its use has been criticized (McNeish, 2018; Sijtsma, 2009) and alternatively defended (Raykov & Marcoulides, 2019). One alternative based on analysis of a factorial structure is McDonald’s (1999) omega, which has been proposed as preferable to Cronbach’s alpha (Hayes & Coutts, 2020; McNeish, 2018). Different estimators called omega exist, but two in particular, omega hierarchical (ωH), which calculates the general factor saturation of a test, and omega total (ωT), which also includes specific factors, have been a focus of interest (Revelle & Zinbarg, 2009; Savalei & Reise, 2019). Another commonly proposed alternative is the greatest lower bound (Sijtsma, 2009; Trizano-Hermosilla & Alvarado, 2016), but Revelle and Zinbarg (2009) note that it is potentially smaller than ωT, and its performance has been examined in Trizano-Hermosilla and Alvarado (2016). It is, however, worth comparing the generalized estimators of Table 1 with ωH and ωT to evaluate performance. Because the simulated data are tau-equivalent, omega will equal the reliability, and the simulation study will evaluate its performance as an estimator.

To do so, a smaller simulation study similar to that of the “Simulation Method” section was performed. The Poisson-gamma (P-G), gamma-inverse gamma (G-IG), and binomial-beta (B-B) models were used to simulate data with a desired population reliability of ρ = 0.8, chosen to avoid possible numerical issues with low reliabilities. Test lengths of k = 5, 10 and numbers of subjects n = 30, 75 were chosen to present results with large RMSEs so that differences between the estimators are more easily observed. A total of N_sims = 1,000,000 data sets were generated for each model. The generalized KR21 estimator was chosen for comparison because it performed the best on the Poisson-gamma and gamma-inverse gamma models, as seen in Tables 2 and 3. Calculation of ωH and ωT was performed using the psych package in R (Revelle, 2020). The RMSE, bias, and standard deviation of KR21, ωH, and ωT are shown in Table 6.

Table 6.

Simulation Results for the KR21 Equivalent for the Poisson-Gamma (P-G), Gamma-Inverse Gamma (G-IG), and Binomial-Beta (B-B) Models Compared With Omega Hierarchical and Omega Total, ωH and ωT, as Measured by Root Mean Square Error.

Parameters ωH ωT KR21
Model μ ρ k n M RMSE Bias SD RMSE Bias SD RMSE Bias SD
P-G 1 0.80 5 30 1.25 0.201 −0.153 0.130 0.090 0.061 0.066 0.087 −0.031 0.081
P-G 1 0.80 10 30 2.50 0.310 −0.281 0.131 0.077 0.050 0.059 0.075 −0.023 0.071
P-G 1 0.80 5 75 1.25 0.145 −0.114 0.088 0.060 0.041 0.045 0.050 −0.013 0.048
P-G 1 0.80 10 75 2.50 0.250 −0.230 0.096 0.053 0.039 0.036 0.043 −0.009 0.042
G-IG 1 0.80 5 30 1.25 0.309 −0.248 0.183 0.116 0.034 0.110 0.175 −0.142 0.102
G-IG 1 0.80 10 30 2.50 0.355 −0.317 0.159 0.090 0.035 0.083 0.118 −0.076 0.090
G-IG 1 0.80 5 75 1.25 0.270 −0.221 0.156 0.091 0.030 0.085 0.118 −0.097 0.066
G-IG 1 0.80 10 75 2.50 0.331 −0.306 0.124 0.068 0.035 0.059 0.073 −0.043 0.059
B-B 0.5 0.80 5 30 1.25 0.173 −0.135 0.108 0.084 0.066 0.052 0.067 −0.001 0.067
B-B 0.5 0.80 10 30 2.50 0.296 −0.272 0.118 0.073 0.055 0.048 0.061 −0.007 0.061
B-B 0.5 0.80 5 75 1.25 0.124 −0.101 0.071 0.054 0.041 0.035 0.040 0.000 0.040
B-B 0.5 0.80 10 75 2.50 0.231 −0.215 0.084 0.049 0.040 0.029 0.035 −0.002 0.035

Note. The omega reliability calculations are performed by the psych package in R (Revelle, 2020). The number in the first column beneath the estimator is the sample root mean square error, RMSE. The numbers in the second and third columns are the sample bias and sample standard deviation (SD) of the estimator. The relationship between the three numbers is given by the formula RMSE = √(Bias^2 + SD^2) (the formula may not be exact for results in the table due to rounding). For example, for ωH in the first row of the table, 0.201 = √((-0.153)^2 + 0.130^2). One million simulated data sets are used for each set of parameters. P-G = Poisson-gamma; G-IG = gamma-inverse gamma; B-B = binomial-beta.

The results of the study generally show that ωH has the worst performance in terms of RMSE, including both elements of bias and variance, while the generalized KR21 formulas of Table 1 are competitive with ωT for superior RMSE. In some scenarios the generalized KR21 has the smaller RMSE, while in others ωT does. Noticeably, it occasionally occurs that one estimator has a larger RMSE but a lower standard deviation, suggesting that bias correction could be a worthwhile strategy to improve the estimator. Also noticeable is that the bias of ωT is generally small, but positive. One possible explanation for the small bias is that ωT may be overfitting the model, which causes an upward bias in estimation of reliability that counters the general downward bias from ratios of sample statistics seen in other estimators. However, as Savalei and Reise (2019) note, further study of reliability coefficients such as ωH and ωT is needed to determine their statistical properties.

Example

A small example shows how and why these formulas can be applied in practice. The data in Table 7 are taken from table 3 of Moore (1970) and represent a count of the number of times a specific letter was seen in a clerical speed test, for which the time has been subdivided into blocks of 10 units. Only the first nine blocks of time are used in Table 7 in order to keep test length equal for all subjects. Chi-square goodness-of-fit tests performed by Moore (1970) show that a Poisson process is a good fit to the data, and so the number of responses in each block of time may be treated as independent and identically distributed Poisson random variables with mean θ_i for subject i. The data may therefore also be treated as unidimensional and tau-equivalent. If the data were larger, either in terms of number of subjects or number of items, a regression of sample variances on sample means could reveal a linear relationship, which would indicate that the Poisson model is appropriate. For small data sets, recognizing that each observation is a count should draw attention to the Poisson distribution for responses.

Table 7.

Data From Moore (1970) Showing the Count of the Number of Responses for Each of 10 Subjects in Consecutive Intervals of Transformed Time.

Subject 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 x_i (1/k)x_i s_i^2
1 13 10 12 8 12 10 9 16 8 98 10.89 6.86
2 14 9 9 13 10 10 14 9 11 99 11.00 4.50
3 11 10 5 12 12 8 12 11 5 86 9.56 4.50
4 12 13 4 8 7 10 14 1 14 83 9.22 21.19
5 16 8 9 7 11 12 16 8 11 98 10.89 11.11
6 11 13 7 8 9 7 8 7 4 74 8.22 6.69
7 13 11 10 9 15 9 18 10 5 100 11.11 14.36
8 11 5 9 9 9 13 11 15 6 88 9.78 9.94
9 13 8 10 11 11 12 8 7 4 84 9.33 8.00
10 12 9 12 9 14 8 12 10 7 93 10.33 5.25

Note. These data may be treated as a Poisson process, where each response is a realization of a Poisson random variable with mean given by a subject’s ability θi.
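
As a concrete illustration of the mean–variance check described above, the per-subject means and variances in the last two columns of Table 7 can be regressed against one another; under a Poisson model the points should scatter around the identity line (slope near 1, intercept near 0). The following R sketch uses the values exactly as printed in Table 7 and is intended only as an illustration of the diagnostic, since with only 10 subjects the check is necessarily noisy.

# Diagnostic sketch: under a Poisson model, each subject's within-subject
# variance should be roughly equal to that subject's mean.
subject_means <- c(10.89, 11.00, 9.56, 9.22, 10.89, 8.22, 11.11, 9.78, 9.33, 10.33)
subject_vars  <- c( 6.86,  4.50, 4.50, 21.19, 11.11, 6.69, 14.36, 9.94, 8.00,  5.25)

fit <- lm(subject_vars ~ subject_means)    # slope near 1, intercept near 0 supports a Poisson model
summary(fit)
plot(subject_means, subject_vars); abline(0, 1)    # compare the points with the identity line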

The reliability of this test represents the theoretical correlation of the sum scores x_i with sum scores on a parallel test. Ideally, this correlation will be high, indicating that the sum scores accurately capture each subject's counting ability under time constraints. However, the data contain only k = 9 responses for each of n = 10 subjects, and the data are discrete and nonnegative. Cronbach's alpha for this data set is α = 0.079, which is low. A problem is that, with so few responses per subject, the item variances are not efficiently estimated. McDonald's omega for this data set gives ωH = 0.726 and ωT = 0.802, but factor analysis on such a small, markedly nonnormal data set calls for caution regarding the statistical properties of these estimates, and the results of Table 6 indicate that they could be overestimates.
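
The Cronbach's alpha value just quoted can be reproduced directly from the counts in Table 7. The following base R sketch is included only for illustration; the psych package (Revelle, 2020) provides alpha() and omega() functions that compute these quantities, although the exact omega values depend on the options chosen for the factor model.

# Rows are subjects, columns are the nine blocks of time from Table 7.
x <- matrix(c(13,10,12, 8,12,10, 9,16, 8,
              14, 9, 9,13,10,10,14, 9,11,
              11,10, 5,12,12, 8,12,11, 5,
              12,13, 4, 8, 7,10,14, 1,14,
              16, 8, 9, 7,11,12,16, 8,11,
              11,13, 7, 8, 9, 7, 8, 7, 4,
              13,11,10, 9,15, 9,18,10, 5,
              11, 5, 9, 9, 9,13,11,15, 6,
              13, 8,10,11,11,12, 8, 7, 4,
              12, 9,12, 9,14, 8,12,10, 7),
            nrow = 10, byrow = TRUE)
k <- ncol(x)
alpha <- k / (k - 1) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
alpha   # approximately 0.079, matching the value reported above
# psych::alpha(x); psych::omega(x)   # alpha and factor-based omega estimates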

The generalized KR20 and KR21 estimators of Table 1, originally derived in Allison (1978) and equal to each other in this case, resolve this issue by using the mean–variance relationship for Poisson data to estimate reliability. For Poisson data, the variance of a subject's response to each item is equal to that subject's mean ability. The generalized KR21 estimator therefore uses only sample means, giving an estimate of $\hat{\rho} = 0.626$. The simulation results of Table 2 indicate that this estimate is likely still biased downward but is a more efficient estimate than the alternatives.

Conclusion, Suggestions, and Future Research Directions

This article has introduced formulas which may be seen as equivalents of the classical KR20 and KR21 formulas of Kuder and Richardson (1937) for some nondichotomous data, extending to data which may be modeled using an NEF-QVF distribution. Simulations show that when the model is satisfied, the KR20 and KR21 equivalent formulas appear to converge to the population reliability and are generally more efficient estimators of reliability than Cronbach's alpha. Exponential families are defined by their mean–variance relationships, so these formulas may be considered whenever a quadratic relationship between subject mean and subject variance appears to be present, as in Equation (1); this can be checked by examining sample variances as a function of sample means, exactly as is done when choosing a generalized linear model. Certain types of data, such as dichotomous data (binomial), count data (Poisson), or data measuring time between events (exponential), also naturally fit this framework, especially when each response may be assumed to have roughly the same difficulty.

It is important to state the assumptions made in the derivation of these estimators: that the responses to the test items are conditionally independent given the ability θ_i of subject i and identically follow a natural exponential family distribution, implying that all item difficulties are equal for a subject, and that the abilities themselves follow the corresponding conjugate prior distribution. Not all of these assumptions are made in the derivations of the traditional binomial KR20 and KR21 in Kuder and Richardson (1937) and the Poisson KR21 in Allison (1978), suggesting that some assumptions may be superfluous, particularly the assumption that abilities follow the conjugate prior distribution. L. J. Feldt (1984) also derives the traditional binomial KR20 and KR21 formulas simply from the binomial error model. It is likely that these generalized KR20 and KR21 formulas may similarly be derived from a general error model based on the mean–variance relationship of natural exponential families, and possibly for exponential families without quadratic variance functions. Furthermore, the difference in use between KR20 and KR21 in Kuder and Richardson (1937) is taken to be whether or not all items have equal difficulties. Zimmerman (1972) showed that the traditional binomial KR21 does not equal the reliability when item difficulties are unequal, and it seems likely that some similar property holds for the generalized KR20 and KR21 estimators of Table 1. How these estimators perform with items of varying difficulty, and whether there exist theoretical guarantees of performance, remains unknown except in the case of the traditional binomial KR20 and KR21 formulas.

As stated in the “Framework” section, the data in this framework are implied to be unidimensional and tau-equivalent. If either of these assumptions is incorrect, none of KR20, KR21, or alpha are appropriate, and a reliability estimator based on analysis of a factorial structure, such as ωT of McDonald (1999), is more appropriate. Unidimensionality should be tested before reliability is calculated using these formulas. When assumptions are appropriate, the “Alternative Reliability Estimators” section shows that the KR20 and KR21 values are competitive with ωT in terms of RMSE.

The advantage of these formulas is that, when the data can be assumed to come from the correct model, the distribution-specific formulas generally provide a superior estimate of the test reliability, as measured by mean square error, compared with Cronbach's alpha. This improvement occurs primarily through a reduction in the variance of the estimator, though for some estimators it comes with an increase in bias. In general, for negative binomial and gamma data, the KR21 estimator outperforms the KR20 estimator and has less bias, though both generally outperform Cronbach's alpha. For Poisson data, the two estimators are equal, and both outperform Cronbach's alpha. For binomial data, the simulations agree with conventional wisdom that the traditional KR20 estimator is superior to KR21. These results are verified through the simulations in the “Simulation Study” section. Notably, it is only in the case of dichotomous data, assumed to come from a binomial distribution, that KR20 outperforms KR21. Though the reasons for this are not fully clear, Jensen’s inequality provides a strong clue.

Cronbach’s alpha can be used when no assumptions are made regarding the distribution of responses. Simulations show that while it is not the most efficient estimator of reliability, it is a consistent estimator that performs well over all the exponential families tested. The use of distribution-specific formulas such as KR20 and KR21 can then be seen as a riskier method of estimating reliability—when the parametric assumption is correct, the distribution-specific formula provides improved performance; however, if an incorrect distribution is assumed, performance may be disastrously worse. This raises the obvious question of how “wrong” a parametric assumption must be in order for an estimate based on it to be worse than Cronbach’s alpha, which could potentially be the subject of a future simulation study.

Bias is present in both Cronbach’s alpha and the equivalent KR20 and KR21 estimators. Noticeably, the bias is universally negative for alpha and almost universally negative for both KR20 and KR21, with only a few instances of positive bias for KR20 in the binomial-beta model. As discussed in the “Simulation Study” section, Jensen’s inequality may help to explain the direction of the bias, though further study is needed. As noted in the “Alternative Reliability Estimators” section, bias correction of the estimator could lead to improved performance. In general, one strategy would be to perform a bootstrap, parametric or otherwise, to obtain an estimate of the bias and then bias-correct the original estimate of reliability. The accuracy of this procedure would depend on whether the bias is the same at both the true and estimated values of reliability, which is not yet known; however, even an inaccurate bootstrap bias-corrected estimate may prove more useful than an uncorrected one. Because the RMSE of KR20 and KR21 for the gamma-inverse gamma and negative binomial-F models is smaller than the RMSE of alpha despite the increase in bias, a bootstrap bias-corrected version of the generalized KR20 and KR21 estimators for these models might prove particularly useful.
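
As an illustration of this strategy, a nonparametric bootstrap over subjects could be implemented along the following lines. This is only a sketch of the idea described above; estimate_reliability() is a placeholder for whichever estimator (alpha or one of the generalized KR20/KR21 formulas) is being corrected, and B is the number of bootstrap resamples.

# Sketch of a nonparametric bootstrap bias correction for a reliability estimate.
bootstrap_bias_correct <- function(data, estimate_reliability, B = 2000) {
  # data: subjects in rows, items in columns
  rho_hat <- estimate_reliability(data)
  boot_estimates <- replicate(B, {
    resampled <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]   # resample subjects
    estimate_reliability(resampled)
  })
  bias_hat <- mean(boot_estimates) - rho_hat   # bootstrap estimate of the bias
  rho_hat - bias_hat                           # bias-corrected estimate
}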

Last, the “Alternative Reliability Estimators” section showed that ωT remains a strong estimator of reliability in the presence of nonnormality, sometimes superior to the KR21 estimates and other times inferior. The exact mechanism for this remains unknown, and the framework of the “Framework” section could potentially be used to analyze the statistical properties of estimators based on factorial structures in the presence of nonnormality. It may also be possible to derive estimators of the factorial structure that use the mean–variance relationship of exponential families, with improved performance when the distribution of responses is known to take a certain form.

These formulas have been shown to be strong estimators of reliability in the scenario of perfect match to the assumed exponential model and represent a small step toward the derivation of a more complete theory for properties of reliability estimators in the presence of nonnormality. However, questions still remain about their use in the presence of items of unequal difficulty, lack of unidimensionality, and what their properties are both when the data do and do not perfectly fit the assumed exponential model. Further research is needed, and these formulas offer many fruitful avenues for such research.

Appendix

The elements of the variance–covariance matrices used in the simulations of the “Simulation Study” section are easily obtained through applying the formulas of Foster (2020). For an individual test item yj or pair of test items yj,1,yj,2, the unconditional variance and covariance in this framework are

\mathrm{Var}(y_j) = \mathrm{E}[V(\theta_i)] + \mathrm{Var}(\theta_i) = \mathrm{Var}(\theta_i)(M+1) = \frac{V(\mu)}{M - v_2}(M+1)
\mathrm{Cov}(y_{j,1}, y_{j,2}) = \mathrm{Var}(\theta_i) = \frac{V(\mu)}{M - v_2}.

These equalities hold for all natural exponential family distributions except for the final equality in each line, which depends on the mean–variance relationship given in Equation (6) and holds only for families with a quadratic variance function, as discussed in this article. The function V(μ) is the variance function of the natural exponential family evaluated at the underlying population mean, several examples of which are shown in Table 1, and v_2 is the coefficient of the quadratic term of the quadratic variance function, as shown in Equation (1).

If the parameter M is defined by the desired population value of Cronbach’s alpha ρ and test length k by the relation M = [(1 − ρ)/ρ]k, as in the “Simulation Study” section, these formulas reduce to

\mathrm{Var}(y_j) = V(\mu)\left[\frac{(1-\rho)k + \rho}{(1-\rho)k - \rho v_2}\right]
\mathrm{Cov}(y_{j,1}, y_{j,2}) = V(\mu)\left[\frac{\rho}{(1-\rho)k - \rho v_2}\right].

Within the variance–covariance matrix, all test items have identical variances and all covariances between them are equal. The variances and covariance for the simulations in the “Simulation Study” section are as follows.
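
The reduced formulas above translate directly into a short R helper, sketched below only for illustration; the function name nefqvf_var_cov and its arguments are chosen for this example, with V_mu standing for V(μ) and v2 for the quadratic coefficient v_2.

nefqvf_var_cov <- function(rho, k, V_mu, v2) {
  M   <- (1 - rho) / rho * k   # M implied by the target reliability and test length
  cov <- V_mu / (M - v2)       # Cov(y_j1, y_j2) = Var(theta_i)
  var <- cov * (M + 1)         # Var(y_j) = Var(theta_i) * (M + 1)
  c(M = M, var = var, cov = cov)
}

nefqvf_var_cov(rho = 0.8, k = 5, V_mu = 1, v2 = 0)   # Poisson-gamma: M = 1.25, var = 1.8, cov = 0.8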

Poisson-Gamma

For the Poisson-gamma model, the variance function is V(θ) = θ with v_2 = 0. This yields variance and covariance elements as

\mathrm{Var}(y_j) = \frac{\mu}{M}(M+1)
\mathrm{Cov}(y_{j,1}, y_{j,2}) = \frac{\mu}{M}.
ρ k μ M Var(y_j) Cov(y_{j,1}, y_{j,2})
0.3 5 1 11.67 1.0857 0.0857
0.3 10 1 23.33 1.0429 0.0429
0.3 30 1 70 1.0143 0.0143
0.6 5 1 3.33 1.3 0.3
0.6 10 1 6.67 1.15 0.15
0.6 30 1 20 1.05 0.05
0.8 5 1 1.25 1.8 0.8
0.8 10 1 2.5 1.4 0.4
0.8 30 1 7.5 1.1333 0.1333

Gamma-Inverse Gamma

For the gamma-inverse gamma model, the variance function is V(θ) = θ² with v_2 = 1. This yields variance and covariance elements as

\mathrm{Var}(y_j) = \frac{\mu^2}{M-1}(M+1)
\mathrm{Cov}(y_{j,1}, y_{j,2}) = \frac{\mu^2}{M-1}.
ρ k μ M Var(y_j) Cov(y_{j,1}, y_{j,2})
0.3 5 1.01 11.67 2.4107 0.1903
0.3 10 1.01 23.33 2.2119 0.0909
0.3 30 1.01 70 2.0889 0.0294
0.6 5 1.01 3.33 3.7702 0.87
0.6 10 1.01 6.67 2.7466 0.3583
0.6 30 1.01 20 2.2438 0.1068
0.8 5 1.01 1.25 18.2709 8.1204
0.8 10 1.01 2.5 4.7369 1.3534
0.8 30 1.01 7.5 2.6547 0.3123

Note that for small values of M the resulting distributions of y_j become heavy-tailed, and so sample variances and covariances may not match the theoretical values due to outliers.

Negative Binomial-F

For the negative binomial-F model, the variance function is V(θ) = θ + θ² with v_2 = 1. This yields variance and covariance elements as

\mathrm{Var}(y_j) = \frac{\mu + \mu^2}{M-1}(M+1)
\mathrm{Cov}(y_{j,1}, y_{j,2}) = \frac{\mu + \mu^2}{M-1}.

Binomial-Beta

For the binomial-beta model, the variance function is V(θ) = θ(1 − θ) with v_2 = −1. This yields variance and covariance elements as

\mathrm{Var}(y_j) = \mu(1-\mu)
\mathrm{Cov}(y_{j,1}, y_{j,2}) = \frac{\mu(1-\mu)}{M+1}.
ρ k μ M Var(y_j) Cov(y_{j,1}, y_{j,2})
0.3 5 0.5 11.67 0.25 0.0197
0.3 10 0.5 23.33 0.25 0.0103
0.3 30 0.5 70 0.25 0.0035
0.6 5 0.5 3.33 0.25 0.0577
0.6 10 0.5 6.67 0.25 0.0326
0.6 30 0.5 20 0.25 0.0119
0.8 5 0.5 1.25 0.25 0.1111
0.8 10 0.5 2.5 0.25 0.0714
0.8 30 0.5 7.5 0.25 0.0294
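
As a check, the binomial-beta entries above can be recovered from the helper sketched at the start of this Appendix, using V(μ) = 0.5(1 − 0.5) = 0.25 and v_2 = −1; for example:

nefqvf_var_cov(rho = 0.8, k = 5, V_mu = 0.25, v2 = -1)   # M = 1.25, var = 0.25, cov ≈ 0.1111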

Footnotes

Declaration of Conflicting Interests: The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD: Robert C. Foster https://orcid.org/0000-0002-0245-8264

References

1. Allison P. D. (1978). The reliability of variables measured as the number of events in an interval of time. Sociological Methodology, 9, 238-253. 10.2307/270811
2. Bay K. S. (1973). The effect of non-normality on the sampling distribution of standard error of reliability coefficient estimates under an analysis of variance model. British Journal of Mathematical and Statistical Psychology, 26(1), 45-57. 10.1111/j.2044-8317.1973.tb00505.x
3. Casella G., Berger R. (2002). Statistical inference (2nd ed.). Duxbury Press.
4. Cronbach L. J. (1951, September). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. 10.1007/BF02310555
5. Feldt L. J. (1984). Some relationships between the binomial error model and classical test theory. Educational and Psychological Measurement, 44(4), 883-891. 10.1177/0013164484444010
6. Feldt L. S. (1965, September). The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 30(3), 357-370. 10.1007/BF02289499
7. Foster R. C. (2020, June). A generalized framework for classical test theory. Journal of Mathematical Psychology, 96, Article 102330. 10.1016/j.jmp.2020.102330
8. Geldhof G. J., Preacher K. J., Zyphur M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72-91. 10.1037/a0032138
9. Hayes A. F., Coutts J. J. (2020). Use omega rather than Cronbach’s alpha for estimating reliability. But . . . Communication Methods and Measures, 14(1), 1-24. 10.1080/19312458.2020.1718629
10. Huynh H. (1979). Statistical inference for two reliability indices in mastery testing based on the beta-binomial model. Journal of Educational Statistics, 4(3), 231-246. 10.3102/10769986004003231
11. Keats J. A., Lord F. M. (1962, March). A theoretical distribution for mental test scores. Psychometrika, 27(1), 59-72. 10.1007/BF02289665
12. Kuder G. F., Richardson M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151-160. 10.1007/BF02288391
13. Lord F. M. (1965, September). A strong true-score theory, with applications. Psychometrika, 30(3), 239-270. 10.1007/BF02289490
14. Lord F. M., Novick M. R., Birnbaum A. (1968). Statistical theories of mental test scores. Addison-Wesley.
15. McDonald R. (1999). Test theory: A unified treatment. Taylor & Francis.
16. McNeish D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23(3), 412-433. 10.1037/met0000144
17. Meredith W. (1971, August). Poisson distributions of error in mental test theory. British Journal of Mathematical and Statistical Psychology, 24(1), 49-82. 10.1111/j.2044-8317.1971.tb00449.x
18. Moore W. E. (1970, November). Stochastic processes as true-score models for highly speeded mental tests (Tech. Rep.). Educational Testing Service. 10.1002/j.2333-8504.1970.tb00795.x
19. Morris C. N. (1982). Natural exponential families with quadratic variance functions. Annals of Statistics, 10(1), 65-80. 10.1214/aos/1176345690
20. Morris C. N. (1983). Natural exponential families with quadratic variance functions: Statistical theory. Annals of Statistics, 11(2), 515-529. 10.1214/aos/1176346158
21. Rasch G. (1960). Probabilistic models for some intelligence and attainment tests. Danmarks Paedagogiske Institut.
22. Raykov T. (1998). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22(4), 369-374. 10.1177/014662169802200406
23. Raykov T., Marcoulides G. A. (2019). Thanks coefficient alpha, we still need you! Educational and Psychological Measurement, 79(1), 200-210. 10.1177/0013164417725127
24. Revelle W. (2020). psych: Procedures for psychological, psychometric, and personality research. R package Version 2.0.12 [Computer software manual]. https://CRAN.R-project.org/package=psych
25. Revelle W., Zinbarg R. (2009, March). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74(1), Article 145. 10.1007/s11336-008-9102-z
26. Savalei V., Reise S. P. (2019). Don’t forget the model in your model-based reliability coefficients: A reply to McNeish (2018). Collabra: Psychology, 5(1), Article 36. 10.1525/collabra.247
27. Schmitt N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8(4), 350-353. 10.1037/1040-3590.8.4.350
28. Sheng Y., Sheng Z. (2012, February). Is coefficient alpha robust to non-normal data? Frontiers in Psychology, 3, Article 34. 10.3389/fpsyg.2012.00034
29. Sijtsma K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107-120. 10.1007/s11336-008-9101-0
30. Trizano-Hermosilla I., Alvarado J. (2016, May). Best alternatives to Cronbach’s alpha reliability in realistic conditions: Congeneric and asymmetrical measurements. Frontiers in Psychology, 7, Article 769. 10.3389/fpsyg.2016.00769
31. van Zyl J. M., Neudecker H., Nel D. G. (2000). On the distribution of the maximum likelihood estimator of Cronbach’s alpha. Psychometrika, 65(3), 271-280. 10.1007/BF02296146
32. Zimmerman D. W. (1972). Test reliability and the Kuder-Richardson formulas: Derivation from probability theory. Educational and Psychological Measurement, 32(4), 939-954. 10.1177/001316447203200408
33. Zimmerman D. W., Zumbo B. D., Lalonde C. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational and Psychological Measurement, 53(1), 33-49. 10.1177/0013164493053001003
34. Zinbarg R., Yovel I., Revelle W., McDonald R. (2006). Estimating generalizability to a latent variable common to all of a scale’s indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2), 121-144. 10.1177/0146621605278814
35. Zumbo B. (1999). A glance at coefficient alpha with an eye towards robustness studies: Some mathematical notes and a simulation model (Paper No. ESQBS-99-1). University of Northern British Columbia, Edgeworth Laboratory for Quantitative Behavioural Science.
