Applied Psychological Measurement. 2016 Feb 14;40(4):302–310. doi: 10.1177/0146621616629380

A Note on the Poisson’s Binomial Distribution in Item Response Theory

Jorge González, Marie Wiberg, Alina A. von Davier
PMCID: PMC5978503  PMID: 29881055

Abstract

The Poisson’s binomial (PB) is the probability distribution of the number of successes in independent but not necessarily identically distributed binary trials. The independent non-identically distributed case emerges naturally in the field of item response theory, where answers to a set of binary items are conditionally independent given the level of ability, but with different probabilities of success. In many applications, the number of successes represents the score obtained by individuals, and the compound binomial (CB) distribution has been used to obtain score probabilities. It is shown here that the PB and the CB distributions lead to equivalent probabilities. Furthermore, one of the proposed algorithms to calculate the PB probabilities coincides exactly with the well-known Lord and Wingersky (LW) algorithm for CBs. Surprisingly, we could not find any reference in the psychometric literature pointing to this equivalence. In a simulation study, different methods to calculate the PB distribution are compared with the LW algorithm. Providing an exact alternative to the traditional LW approximation for obtaining score distributions is a contribution to the field.

Keywords: Poisson’s binomial distribution, compound binomial distribution, Lord and Wingersky’s recursive formula, score distributions

Introduction

Lord (1980) pointed out that using item response theory (IRT), the frequency distribution of test scores, X, conditional on a given ability, θ, could be obtained. For the case when X is the sum of the correct responses, Lord recognized that an explicit formula for the conditional distribution was difficult to obtain in a simple form, except for the unrealistic scenario when all items have the same probability of being answered correctly, in which case the binomial distribution appears naturally (see Lord, 1980, Section 4.1). For the non-identically distributed case, that is, when the probabilities of a correct answer differ across items, the use of probability generating functions leads to what is called the compound binomial (CB) distribution, also known as the generalized binomial (Lord, 1980, Section 4.1; Kendall & Stuart, 1958; Lord & Novick, 1968, Section 16.12). The CB distribution can be used directly to obtain the score probabilities, but this practice is computationally demanding in some cases (as will be demonstrated later).

Wang (1993) presented an explicit form for the distribution of the number of successes in independent but not necessarily identically distributed binary trials, the Poisson’s binomial (PB) distribution, and studied many of its properties. Wang pointed out that this distribution has played an important role in probability theory, and that it dates back at least to Poisson (1837). Direct calculation of probabilities using the PB distribution suffers from the same computational problems as the CB. Nevertheless, efficient algorithms are available for the calculation of the PB distribution. To our knowledge, neither the PB distribution nor the algorithms to obtain the score probabilities seem to have been referenced in the psychometric literature.

First, we show that the CB and PB distributions lead to the same probabilities of the number of successes in independent binary trials. Second, we conduct a simulation study to evaluate the performance of these algorithms. Third, the well-known Lord and Wingersky (1984) algorithm is shown to be equivalent to one of the methods used for the estimation of the PB distribution.

Theoretical Models

PB

Let $X$ denote the total number of successes in $n$ independent Bernoulli trials, where $p_i$ is the probability of success at trial $i$. If $p_i = p$ for all $i$, the distribution of $X$ is the Binomial$(n, p)$. For the case of independent but non-identically distributed Bernoulli variables, $X$ follows the PB distribution (Wang, 1993). The PB distribution has been used in many areas such as, for example, pool screening (Gao, Aban, & Katholi, 2014), survey sampling (Chen & Liu, 1997), and bioinformatics (Niida, Imoto, Shimamura, & Miyano, 2012).

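If $p_i$ is the probability of success at the $i$th trial, the following defines the probability mass function (PMF) of $X$ (Wang, 1993).

Definition: Let $F_x = \{A : A \subseteq \{1, \ldots, n\}, |A| = x\}$, where $|A|$ denotes the number of elements of $A$. The PMF of $X$ is

$$f(x; p) = \Pr(X = x) = \sum_{A \in F_x} \Big( \prod_{i \in A} p_i \Big) \Big( \prod_{j \in A^c} (1 - p_j) \Big), \tag{1}$$

and the corresponding cumulative distribution function (CDF) is

$$F_X(x; p) = \Pr(X \le x) = \sum_{m=0}^{x} \sum_{A \in F_m} \Big( \prod_{i \in A} p_i \Big) \Big( \prod_{j \in A^c} (1 - p_j) \Big).$$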

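To use this distribution in practice, consider $n = 3$, in which case the $F_x$ sets are

$$F_0 = \{\emptyset\}, \quad F_1 = \{\{1\}, \{2\}, \{3\}\}, \quad F_2 = \{\{1,2\}, \{1,3\}, \{2,3\}\}, \quad F_3 = \{\{1,2,3\}\}.$$

To calculate the probability that two successes are obtained, we use Equation 1 to obtain

$$\begin{aligned}
f(2; p) &= \Big( \prod_{i \in \{1,2\}} p_i \Big) \Big( \prod_{j \in \{3\}} (1 - p_j) \Big) + \Big( \prod_{i \in \{1,3\}} p_i \Big) \Big( \prod_{j \in \{2\}} (1 - p_j) \Big) + \Big( \prod_{i \in \{2,3\}} p_i \Big) \Big( \prod_{j \in \{1\}} (1 - p_j) \Big) \\
&= p_1 p_2 (1 - p_3) + p_1 p_3 (1 - p_2) + p_2 p_3 (1 - p_1) \\
&= p_1 p_2 q_3 + p_1 p_3 q_2 + p_2 p_3 q_1,
\end{aligned}$$

where $q_i = 1 - p_i$. Using similar calculations, the probabilities of obtaining 0, 1, and 3 successes are $f(0; p) = q_1 q_2 q_3$; $f(1; p) = p_1 q_2 q_3 + p_2 q_1 q_3 + p_3 q_1 q_2$; and $f(3; p) = p_1 p_2 p_3$, respectively. Note that, to calculate the probabilities, all the sets satisfying $|A| = x$ in $F_x$ must be listed. For instance, with $n = 40$, $F_{10}$ would contain $\binom{40}{10} = 847{,}660{,}528$ elements, so direct calculation of probabilities is not practical for large values of $n$.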
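The direct calculation in Equation 1 can be sketched in a few lines of Python. This is only an illustration of the enumeration over the sets in $F_x$: the function name `pb_pmf_direct` and the three success probabilities are hypothetical choices, not values from the article.

```python
from itertools import combinations
from math import comb, prod


def pb_pmf_direct(x, p):
    """Pr(X = x) obtained by summing over every index set A with |A| = x."""
    n = len(p)
    total = 0.0
    for A in combinations(range(n), x):
        in_A = set(A)
        successes = prod(p[i] for i in in_A)
        failures = prod(1 - p[j] for j in range(n) if j not in in_A)
        total += successes * failures
    return total


# Hypothetical success probabilities for n = 3 trials (illustration only).
p = [0.2, 0.5, 0.7]
pmf = [pb_pmf_direct(x, p) for x in range(len(p) + 1)]
print(pmf)

# Number of index sets that would have to be listed for x = 10 when n = 40.
print(comb(40, 10))
```

The loop visits every set in $F_x$, so its cost grows with the binomial coefficient, which is why this approach is only workable for short tests.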

Algorithms for the Calculation of Probabilities

Both approximate and exact alternatives to the direct calculation of probabilities to efficiently obtain the distribution function of the PB model have been proposed and are reviewed here. A well-known approximation used in general statistics is the normal approximation (NA), which is based on the central limit theorem and approximates the CDF of the PB distribution by

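$$F_X(x) \approx \Phi\left( \frac{x + 0.5 - \mu}{\sigma} \right),$$

where $\Phi$ is the CDF of the standard normal distribution,

$$\mu = E(X) = \sum_{i=1}^{n} p_i \quad \text{and} \quad \sigma = [\mathrm{Var}(X)]^{1/2} = \Big[ \sum_{i=1}^{n} p_i (1 - p_i) \Big]^{1/2}. \tag{3}$$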

Not surprisingly, this approximation has also been used in psychometrics for the CB (e.g., Lord & Novick, 1968), which will later be shown to be equivalent to the PB.
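As a minimal sketch of the NA, the following Python code evaluates the continuity-corrected normal CDF with $\mu$ and $\sigma$ as in Equation 3. The function names and the example probabilities are assumptions made for illustration; the standard normal CDF is computed from the error function in the standard library.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """CDF of the standard normal distribution, written with erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def pb_cdf_normal(x, p):
    """Normal approximation to Pr(X <= x) with continuity correction."""
    mu = sum(p)
    sigma = sqrt(sum(p_i * (1 - p_i) for p_i in p))
    return std_normal_cdf((x + 0.5 - mu) / sigma)


# Hypothetical success probabilities (illustration only).
p = [0.2, 0.5, 0.7]
approx_cdf = [pb_cdf_normal(x, p) for x in range(len(p) + 1)]
print(approx_cdf)
```

Because the normal distribution has unbounded support, the approximate CDF at $x = n$ is close to, but not exactly, 1.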

An improved version of the NA was described by Volkova (1996; see also Neammanee, 2005) and is known as the refined normal approximation (RNA). Compared with NA, it adds a correction to the skewness of the distribution of X. Under this method, the CDF of the PB is approximated by

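$$F_X(x) \approx G\left( \frac{x + 0.5 - \mu}{\sigma} \right), \quad x = 0, 1, \ldots, n,$$

where $\mu$ and $\sigma$ are defined as in Equation 3, $G(x) = \Phi(x) + \gamma (1 - x^2) \phi(x) / 6$, $\phi(x)$ is the probability density function of the standard normal distribution, and

$$\gamma = \sigma^{-3} \sum_{i=1}^{n} p_i (1 - p_i)(1 - 2 p_i).$$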
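The RNA can be sketched in the same way. The code below assumes the form $G(x) = \Phi(x) + \gamma (1 - x^2) \phi(x) / 6$ given above; clipping the result to the interval [0, 1] is an extra safeguard added here as an assumption, and the function name is hypothetical.

```python
from math import erf, exp, pi, sqrt


def pb_cdf_rna(x, p):
    """Refined normal approximation to Pr(X <= x) with a skewness correction."""
    mu = sum(p)
    sigma = sqrt(sum(p_j * (1 - p_j) for p_j in p))
    gamma = sum(p_j * (1 - p_j) * (1 - 2 * p_j) for p_j in p) / sigma ** 3
    z = (x + 0.5 - mu) / sigma
    big_phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))        # standard normal CDF
    small_phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)    # standard normal density
    g = big_phi + gamma * (1.0 - z * z) * small_phi / 6.0
    # Assumption: keep the value inside [0, 1], since G itself is not a CDF.
    return min(1.0, max(0.0, g))


# Hypothetical success probabilities (illustration only).
p = [0.2, 0.5, 0.7]
rna_cdf = [pb_cdf_rna(x, p) for x in range(len(p) + 1)]
print(rna_cdf)
```

When all $p_i = 0.5$, the skewness term $\gamma$ vanishes and the RNA reduces to the NA.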

Another approximate method, referred to here as the Poisson approximation (PA), is based on a famous inequality established by Le Cam (Le Cam, 1960; Steele, 1994) and uses the Poisson distribution, with $\mu$ as defined in Equation 3, to approximate the PMF of the PB by

$$f(x) \approx \frac{\mu^x \exp(-\mu)}{x!}.$$
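A short sketch of the PA follows. The function name and the probabilities are illustrative assumptions. Small success probabilities are used on purpose, because the Poisson approximation is expected to work best when every $p_i$ is small.

```python
from math import exp, factorial


def pb_pmf_poisson(x, p):
    """Poisson approximation to Pr(X = x) with mean mu = sum of the p_i."""
    mu = sum(p)
    return mu ** x * exp(-mu) / factorial(x)


# Hypothetical case: ten trials, each with a small success probability.
p = [0.05] * 10
approx_zero = pb_pmf_poisson(0, p)   # Poisson value for Pr(X = 0)
exact_zero = (1 - 0.05) ** 10        # exact value, since all p_i are equal
print(approx_zero, exact_zero)
```

With success probabilities closer to those of typical test items, the Poisson variance $\mu$ overstates the PB variance $\sum p_i (1 - p_i)$, which is consistent with the weaker performance of the PA reported in the simulation.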

Among the exact methods, one algorithm for computing the distribution function of the PB was proposed in Fernandez and Williams (2010) where polynomial interpolation and the Discrete Fourier Transform (DFT) are used to derive closed-form formulas for the PB’s probability distribution function and CDF. Later, Hong (2013) derived the same closed-form expressions in a simpler way. The method is based on the application of the DFT to the characteristic function of the PB distribution, and it is accordingly called the DFT-CF method. Using this method, the CDF of the PB distribution can be obtained through

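$$F_X(x) = \frac{1}{n+1} \sum_{l=0}^{n} \sum_{m=0}^{x} \exp(-\mathrm{i} \omega l m)\, z_l, \quad x = 0, \ldots, n,$$

where $\mathrm{i}$ is the imaginary unit, $\omega = 2\pi/(n+1)$, and $z_l$ is the characteristic function of the PB random variable evaluated at $\omega l$, that is, $z_l = \prod_{j=1}^{n} [1 - p_j + p_j \exp(\mathrm{i} \omega l)]$ (Hong, 2013).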
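The DFT-CF expression can be evaluated directly with complex arithmetic. The sketch below is a plain double loop written for readability, not the implementation in the poibin package; the function names and the example probabilities are assumptions. A fast Fourier transform would be the natural choice for long tests.

```python
import cmath
from math import pi


def pb_pmf_dft(p):
    """PMF of the PB obtained by inverting its characteristic function."""
    n = len(p)
    omega = 2.0 * pi / (n + 1)
    # z[l] is the characteristic function evaluated at omega * l.
    z = []
    for l in range(n + 1):
        value = complex(1.0, 0.0)
        for p_j in p:
            value *= (1 - p_j) + p_j * cmath.exp(1j * omega * l)
        z.append(value)
    pmf = []
    for m in range(n + 1):
        s = sum(cmath.exp(-1j * omega * l * m) * z[l] for l in range(n + 1))
        pmf.append((s / (n + 1)).real)  # imaginary part is rounding noise
    return pmf


def pb_cdf_dft(p):
    """CDF obtained by accumulating the DFT-CF probabilities."""
    cdf, running = [], 0.0
    for value in pb_pmf_dft(p):
        running += value
        cdf.append(running)
    return cdf


# Hypothetical success probabilities (illustration only).
p = [0.2, 0.5, 0.7]
dft_pmf = pb_pmf_dft(p)
dft_cdf = pb_cdf_dft(p)
print(dft_pmf)
print(dft_cdf)
```

For these three probabilities the result agrees, up to rounding error, with the values obtained by enumerating the sets in $F_x$.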

Other exact methods are recursive and were initiated as approximate expressions for the calculation of the PB’s CDF. In an early work, Walsh (1955) proposed a method based on power expansions of $(p_i - \bar{p})$. Interestingly, Walsh’s method was also used in Lord and Novick (1968, Section 23.10) and later in Yen (1984). Subsequent work regarding the distribution of the number of successes in independent binary trials includes Hoeffding (1956), Darroch (1964), Samuels (1965), and Nedelman and Wallenius (1986). Thomas and Taub (1982) developed a recursive algorithm based on the probability generating function of the random variable $X$, which is capable of generating exact values of the distribution without explicitly enumerating all the sets satisfying $|A| = x$ in $F_x$.

The probability generating function of a random variable $X$ taking values $0, 1, 2, \ldots$ is defined as $P(s) = \sum_x s^x \Pr(X = x)$ (Grimmett & Welsh, 2014), and its explicit form for the case when $X$ is the number of successes in independent binary trials is known to be

$$P(s) = \prod_{i=1}^{n} (q_i + p_i s). \tag{7}$$

It follows that one can extract the values $\Pr(X = x)$ by grouping the coefficients of $s^x$ in the expansion of Equation 7. Thomas and Taub (1982) used this result and the fact that Equation 7 can be written as an $n$th degree polynomial in $s$ in the form

$$P_n(s) = \alpha_{n1} s^0 + \alpha_{n2} s^1 + \alpha_{n3} s^2 + \cdots + \alpha_{n,n+1} s^n$$

so that $\Pr_n(X = x) = \alpha_{n,x+1}$ can be calculated using the recursive formula (RF)

$$\alpha_{ij} = \alpha_{i-1,j-1}\, p_i + \alpha_{i-1,j}\, q_i, \tag{8}$$

with the conditions $\alpha_{11} = q_1$, $\alpha_{12} = p_1$, $\alpha_{i1} = \alpha_{i-1,1}\, q_i$ if $i \ge 2$, and $\alpha_{i,n+1} = \alpha_{i-1,n}\, p_i$ if $i \ge 2$. A program written in BASIC implementing this RF is presented in Barlow and Heidtmann (1984). Note that the use of the probability generating function leads directly to the calculation of probabilities. For instance, for $n = 3$, one has

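$$\begin{aligned}
\sum_{x=0}^{3} \Pr(X = x)\, s^x &= \prod_{i=1}^{3} (q_i + p_i s) = (q_1 + p_1 s)(q_2 + p_2 s)(q_3 + p_3 s) \\
&= (q_1 q_2 q_3) + (q_1 q_2 p_3 + q_1 p_2 q_3 + p_1 q_2 q_3)\, s \\
&\quad + (p_1 p_2 q_3 + q_1 p_2 p_3 + p_1 q_2 p_3)\, s^2 + (p_1 p_2 p_3)\, s^3,
\end{aligned}$$

where $s$ is used to group terms so that

$$\begin{aligned}
P(X = 0) &= q_1 q_2 q_3, \\
P(X = 1) &= q_1 q_2 p_3 + q_1 p_2 q_3 + p_1 q_2 q_3, \\
P(X = 2) &= p_1 p_2 q_3 + q_1 p_2 p_3 + p_1 q_2 p_3, \\
P(X = 3) &= p_1 p_2 p_3,
\end{aligned}$$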

which coincide with the probabilities calculated using Equation 1. This strategy is also used in Lord (1980, Section 4.1). Next, we will show that the RF actually corresponds to the Lord and Wingersky (1984) recursive algorithm.
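The RF amounts to multiplying the generating function by one factor $(q_i + p_i s)$ at a time and keeping track of the coefficients. A minimal Python sketch is given below. The function name and the example probabilities are assumptions, and the recursion is started from the constant polynomial 1 rather than from $\alpha_{11} = q_1$, $\alpha_{12} = p_1$, which gives the same coefficients after the first trial.

```python
def pb_pmf_recursive(p):
    """Coefficients of prod_i (q_i + p_i * s), that is, Pr(X = 0), ..., Pr(X = n)."""
    coeffs = [1.0]  # generating function before any trial is included
    for p_i in p:
        q_i = 1 - p_i
        new = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            new[j] += c * q_i       # trial i fails: power of s unchanged
            new[j + 1] += c * p_i   # trial i succeeds: power of s goes up by one
        coeffs = new
    return coeffs


# Hypothetical success probabilities (illustration only).
rf_pmf = pb_pmf_recursive([0.2, 0.5, 0.7])
print(rf_pmf)

# A 40-trial case needs on the order of n^2 operations, with no set enumeration.
long_pmf = pb_pmf_recursive([0.3 + 0.01 * k for k in range(40)])
print(sum(long_pmf))
```

Each pass through the outer loop applies the recursion once for every coefficient, so no set in $F_x$ is ever listed explicitly.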

IRT Models

Let $\mathbf{X} = (X_1, \ldots, X_n)$ be an individual’s response pattern to $n$ binary items, where $X_i \in \{0, 1\}$. Let $\theta$ denote the ability of the individual, $\omega_i$ an item parameter vector, and $p_i$ the probability that the individual answers item $i$ correctly. The underlying probability model is then $X_i \mid \theta, \omega_i \sim \text{Bernoulli}(p_i(\theta, \omega_i))$, where the probability of success can be defined with different IRT models. Local independence is assumed, which means that, conditionally on $\theta$, the answers of the individual to the $n$ items in the test are independent. Let $X = \sum_{i=1}^{n} X_i$ be the sum score of an individual with ability $\theta$. Assuming that individuals with a given ability $\theta$ will correctly answer each of the $n$ items of a test with probability $p_i$ ($i = 1, \ldots, n$), the conditional distribution of $X$ for a given $\theta$ is the CB (Lord & Novick, 1968) and is defined as

$$f(x \mid \theta) = \sum_{(x_1, \ldots, x_n) \in T_x} \Big( \prod_{i=1}^{n} p_i^{x_i} q_i^{1 - x_i} \Big), \tag{9}$$

where $T_x$ is the set containing all the response patterns with exactly $x$ correct answers, and $x = 0, \ldots, n$. As an example of its use to calculate the probability of a certain score, let us consider again $n = 3$ items. Then, the sets $T_x$ ($x = 0, 1, 2, 3$) contain the following elements:

$$\begin{aligned}
T_0 &= \{(0,0,0)\}, & T_1 &= \{(1,0,0), (0,1,0), (0,0,1)\}, \\
T_2 &= \{(1,1,0), (0,1,1), (1,0,1)\}, & T_3 &= \{(1,1,1)\}.
\end{aligned}$$

Using Equation 9, for a given ability, the probabilities of earning each of the $x$ values are

$$\begin{aligned}
f(0 \mid \theta) &= q_1 q_2 q_3, \\
f(1 \mid \theta) &= p_1 q_2 q_3 + q_1 p_2 q_3 + q_1 q_2 p_3, \\
f(2 \mid \theta) &= p_1 p_2 q_3 + q_1 p_2 p_3 + p_1 q_2 p_3, \\
f(3 \mid \theta) &= p_1 p_2 p_3,
\end{aligned}$$

respectively. As was the case for $F_x$, the number of elements in $T_x$ grows rapidly with the number of items, so direct use of the CB is computationally demanding. For instance, with 40 items, to calculate $f(x = 10 \mid \theta)$, the set $T_{10}$ would contain $\binom{40}{10} = 847{,}660{,}528$ elements, which complicates the numerical calculations. In the psychometric literature, these difficulties were noted by Yen (1984; see also Thissen, Pommerich, Billeaud, & Williams, 1995), who stated that calculating this distribution is laborious because all $2^n$ response patterns must be evaluated.

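As an alternative to the direct calculation of score probabilities, the CB distribution has traditionally been obtained using a recursion formula given by Lord and Wingersky (1984). The formula reads as follows:

$$f_r(x \mid \theta) = \begin{cases}
f_{r-1}(x \mid \theta)\, q_r, & x = 0, \\
q_r\, f_{r-1}(x \mid \theta) + p_r\, f_{r-1}(x - 1 \mid \theta), & 0 < x < r, \\
f_{r-1}(x - 1 \mid \theta)\, p_r, & x = r,
\end{cases} \tag{10}$$

where $f_r(x \mid \theta)$ is the distribution of sum scores over the first $r$ items for examinees of ability $\theta$, and $p_r$ (that is, $p_i$ with $i = r$) is the probability defined from the chosen IRT model.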
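The three cases of the LW recursion translate directly into code. The sketch below takes the item probabilities at a fixed ability as given, so it does not depend on a particular IRT model; the function name and the numerical values are hypothetical.

```python
def lord_wingersky(probs):
    """Conditional sum-score distribution f_n(x | theta), x = 0, ..., n.

    probs[r - 1] is p_r, the probability of a correct answer to item r
    at the ability level of interest.
    """
    f = [1 - probs[0], probs[0]]  # distribution after the first item (r = 1)
    for r in range(2, len(probs) + 1):
        p_r = probs[r - 1]
        q_r = 1 - p_r
        g = [0.0] * (r + 1)
        for x in range(r + 1):
            if x == 0:
                g[x] = f[0] * q_r                      # all r items incorrect
            elif x == r:
                g[x] = f[x - 1] * p_r                  # all r items correct
            else:
                g[x] = q_r * f[x] + p_r * f[x - 1]     # 0 < x < r
        f = g
    return f


# Hypothetical item probabilities at one ability level (illustration only).
lw_pmf = lord_wingersky([0.2, 0.5, 0.7])
print(lw_pmf)
```

For these three probabilities the output matches the score probabilities obtained from the direct formula with $n = 3$.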

Note that (a) in both the PB and CB distributions, the number of elements in the sets $F_x$ and $T_x$, respectively, grows rapidly with the number of trials, making the direct computation of probabilities inefficient. Despite this, both PB and CB can be directly used to obtain the score probabilities, and we have shown that they generate identical probabilities, so these distributions can be considered equivalent; (b) the LW algorithm in Equation 10 is actually the RF shown in Equation 8 (if $\alpha_{n,x+1}$ is replaced by $f_r(x \mid \theta)$). It is, however, surprising that none of our cited references on the RF for the PB actually appear in the psychometric literature. As PB and CB are equivalent, we will evaluate the described alternative methods to calculate score probabilities next.

Simulation Example

In this empirical example, the R (R Core Development Team, 2014) package poibin (Hong, 2013) was used to obtain the CDFs for the PB distribution using all the methods described in the previous section (i.e., NA, RNA, PA, DFT-CF) in addition to the LW recursive algorithm. The test scores were simulated under the two-parameter logistic IRT model. Individuals were sampled from a $N(0, 1)$ distribution and item parameters from $a \sim U(0.1, 1.5)$ and $b \sim N(0, 1)$. Varying factors were the number of test items, $n = 10, 30, 50, 80, 100$, and the sample size, $N = 1000, 5000, 10000$. The mean absolute error was calculated over 100 replications for each combination of factors and was defined as

$$\mathrm{MAE} = \frac{1}{100} \sum_{l=1}^{100} \Big( \sum_{x=0}^{n} \left| F(x) - F^*(x) \right| \Big)_l,$$

where $F(x)$ is calculated using either the LW, NA, RNA, or PA algorithm, and $F^*(x)$ is calculated using the DFT-CF method. This method was chosen as the criterion because it produces exact values of probabilities and, unlike the LW, is not recursive. To calculate both $F(x)$ and $F^*(x)$, item parameter estimates are obtained for each replicated data set in the simulation.
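The error criterion itself can be sketched as follows. This is not a reproduction of the simulation: there is no IRT data generation or item parameter estimation, the success probabilities are drawn from an arbitrary uniform range, and the exact reference CDF is computed with the recursion instead of the DFT-CF method. All names and settings are assumptions chosen to show how the sum of absolute differences is averaged over replications.

```python
import random
from math import erf, sqrt


def exact_cdf(p):
    """Exact CDF from the recursion on the generating-function coefficients."""
    pmf = [1.0]
    for p_i in p:
        new = [0.0] * (len(pmf) + 1)
        for j, c in enumerate(pmf):
            new[j] += c * (1 - p_i)
            new[j + 1] += c * p_i
        pmf = new
    cdf, running = [], 0.0
    for value in pmf:
        running += value
        cdf.append(running)
    return cdf


def normal_cdf_approx(p):
    """Continuity-corrected normal approximation to the CDF, x = 0, ..., n."""
    mu = sum(p)
    sigma = sqrt(sum(p_i * (1 - p_i) for p_i in p))
    return [0.5 * (1 + erf((x + 0.5 - mu) / (sigma * sqrt(2))))
            for x in range(len(p) + 1)]


def abs_error(F, F_star):
    """Sum over x of |F(x) - F*(x)| for one replication."""
    return sum(abs(a - b) for a, b in zip(F, F_star))


random.seed(1)  # fixed seed so the sketch is repeatable
errors = []
for _ in range(100):  # 100 replications, as in the definition of the MAE
    # Assumption: arbitrary probabilities stand in for model-implied ones.
    p = [random.uniform(0.1, 0.9) for _ in range(30)]
    errors.append(abs_error(normal_cdf_approx(p), exact_cdf(p)))
mae = sum(errors) / len(errors)
print(mae)
```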

In many practical applications, the marginal score distribution $f(x)$ is of interest (for instance, in IRT observed-score equating); we integrated out the value of $\theta$ by averaging over the range $\theta = -2.0, -1.9, \ldots, 1.9, 2.0$, although other methods such as quadrature points or estimates of $\theta$ can also be used (Kolen & Brennan, 2014, p. 199). A comparison among the actual conditional distributions for three values of $\theta$, $-2.0$, $0.0$, and $2.0$, is given in the supplementary material. The conditional distribution of scores is of interest, for instance, in local equating methods (van der Linden, 2011).
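One way to read the averaging step is as an equally weighted mean of the conditional score distributions over the grid of abilities. The sketch below makes that assumption explicit. It also assumes the logistic form of the two-parameter model without a scaling constant, and the three (a, b) item parameter pairs are hypothetical.

```python
from math import exp


def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct answer.

    Assumption: plain logistic form, without the 1.7 scaling constant.
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def conditional_pmf(probs):
    """Conditional sum-score distribution for given item probabilities."""
    pmf = [1.0]
    for p_i in probs:
        new = [0.0] * (len(pmf) + 1)
        for j, c in enumerate(pmf):
            new[j] += c * (1 - p_i)
            new[j + 1] += c * p_i
        pmf = new
    return pmf


def marginal_pmf(items, thetas):
    """Equally weighted average of the conditional distributions over thetas."""
    total = [0.0] * (len(items) + 1)
    for theta in thetas:
        cond = conditional_pmf([p_2pl(theta, a, b) for a, b in items])
        for x, value in enumerate(cond):
            total[x] += value
    return [t / len(thetas) for t in total]


thetas = [k / 10 for k in range(-20, 21)]         # -2.0, -1.9, ..., 1.9, 2.0
items = [(1.0, -1.0), (0.8, 0.0), (1.2, 1.0)]     # hypothetical (a, b) pairs
marginal = marginal_pmf(items, thetas)
print(marginal)
```

Replacing the equal weights with quadrature weights would give the alternative mentioned in the text.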

Results

Table 1 shows the mean absolute error for each combination of the factors in the simulation for the marginal test score distribution $F(x)$. Because increasing the sample size from $N = 1000$ to $N = 10000$ produced no significant differences or pattern, only results for $N = 1000$ are shown. In all cases, the commonly used recursive LW algorithm performs as well as the exact DFT-CF method, with results agreeing to the ninth decimal place (although only three decimals are shown here). LW can thus be considered a method that accurately calculates the score distributions.

Table 1.

Mean Absolute Error for the Estimation of Marginal Distributions for Different Test Lengths.

n      LW      NA      RNA     PA
10     0.000   0.029   0.011   0.462
30     0.000   0.020   0.005   0.644
50     0.000   0.015   0.003   0.759
80     0.000   0.012   0.002   0.832
100    0.000   0.011   0.002   0.869

Note. LW = Lord and Wingersky; NA = normal approximation; RNA = refined normal approximation; PA = Poisson approximation.

Among the approximation methods, PA does not perform well in most cases. The RNA performs better than the NA in all cases, and both methods improve with the number of items. Interestingly, when the test length increases ($n \ge 30$), the mean absolute error of the RNA method does not exceed 0.005. This last finding suggests that the RNA could be a competitive method for the estimation of the score distributions as an alternative to the LW recursive algorithm. The results for conditional distributions of test scores are in line with these and are given as supplemental material.

Discussion

We have shown that in the distribution of the number of successes in independent binary trials, the PB distribution is equivalent to the CB distribution seen in psychometrics. The LW recursive algorithm for CBs was shown to be equivalent to an RF derived for the PB distribution. Alternative methods, both approximate and exact, were introduced for the calculation of PB probabilities and evaluated in the IRT framework.

We used four different approaches for the estimation of the test score distributions. Some approximation methods were competitive in the cases where the score distributions were marginalized over θ. Because the marginal distributions of scores F(x) and F(y) of two test forms to be equated, X and Y, respectively, are used in observed-score equating, the approximation methods can be a convenient alternative to the recursive LW algorithm when performing IRT observed-score equating.

For the case of conditional distributions, approximate methods perform worse when extreme values of abilities are considered in comparison with average values of ability. This could have consequences in equating methods that are based on conditional score distribution functions as, for example, local equating. Because a family of equating functions is defined by different values of θ, the equating for individuals with extreme abilities could be distorted. It is, however, necessary to know to what extent the differences seen for the conditional distributions could have an effect on the actual equating function. This is a topic of future research.

There is an advantage in having a compact and exact mathematical definition for the PB distribution, as random variables can be generated directly from this model. This could help in making fairer simulation studies, for instance, when comparing equating methods, as the advantages and disadvantages of using one or another method to simulate score data are not always clear (Sinharay, Holland, & von Davier, 2011). This topic is currently being investigated by the authors. Another potential benefit of knowing the exact mathematical definition of the PB distribution could arise in multistage testing if the conditional test scores of the individuals are used at each stage of testing (Haberman & von Davier, 2014). In this article, local independence is assumed in the specification of IRT models. The psychometric literature, however, has questioned the independence of items in IRT models (Gibbons & Hedeker, 1992; Wainer & Kiely, 1987). Exploring the score distribution in the non-independent, non-identically distributed case is an interesting topic for future research.

Recent work improving the described algorithms for the PB and showing new applications can be found in Barrett and Gray (2014), and in the psychometric field in Cai (2014). How the presented algorithms would extend to the case of polytomous items data (e.g., Thissen et al., 1995) is also a topic of future research.

Footnotes

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Jorge González was partially funded by Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) Grant 1150233. The research in this article by Marie Wiberg was funded by the Swedish Research Council Grant 2014-578.

Supplemental Material: The online appendices are available at http://apm.sagepub.com/supplemental

References

  1. Barlow R. E., Heidtmann K. D. (1984). Computing k-out-of-n system reliability. IEEE Transactions on Reliability, 33, 322-323.
  2. Barrett B., Gray J. (2014). Efficient computation for the Poisson binomial distribution. Computational Statistics, 29, 1469-1479.
  3. Cai L. (2014). Lord–Wingersky algorithm version 2.0 for hierarchical item factor models with applications in test scoring, scale alignment, and model fit testing. Psychometrika, 2, 535-559.
  4. Chen S. X., Liu J. S. (1997). Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statistica Sinica, 7, 875-892.
  5. Darroch J. N. (1964). On the distribution of the number of successes in independent trials. The Annals of Mathematical Statistics, 35, 1317-1321.
  6. Fernandez M., Williams S. (2010). Closed-form expression for the Poisson-binomial probability density function. IEEE Transactions on Aerospace and Electronic Systems, 46, 803-816.
  7. Gao H., Aban I., Katholi C. (2014). Pool screening: An example of independent non-identical Bernoulli trial. Communications in Statistics: Simulation and Computation. Advance online publication. doi: 10.1080/03610918.2014.941486
  8. Gibbons R. D., Hedeker D. (1992). Full information item bi-factor analysis. Psychometrika, 57, 423-436.
  9. Grimmett G., Welsh D. (2014). Probability: An introduction (2nd ed.). Oxford, UK: Oxford University Press.
  10. Haberman S. J., von Davier A. A. (2014). Considerations on parameter estimation, scoring, and linking in multistage testing. In Yan D., von Davier A. A., Lewis C. (Eds.), Computerized multistage testing: Theory and applications (pp. 285-300). London, England: Chapman & Hall.
  11. Hoeffding W. (1956). On the distribution of the number of successes in independent trials. The Annals of Mathematical Statistics, 27, 713-721.
  12. Hong Y. (2013). On computing the distribution function for the Poisson binomial distribution. Computational Statistics and Data Analysis, 59, 41-51.
  13. Kendall M. G., Stuart A. (1958). The advanced theory of statistics (Vol. 1). London, England: C. Griffin.
  14. Kolen M. J., Brennan R. L. (2014). Test equating, scaling and linking: Methods and practices (3rd ed.). New York, NY: Springer.
  15. Le Cam L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 10, 1181-1197.
  16. Lord F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
  17. Lord F. M., Novick M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  18. Lord F. M., Wingersky M. (1984). Comparison of IRT true-score and equipercentile observed-score equatings. Applied Psychological Measurement, 8, 453-461.
  19. Neammanee K. (2005). A refinement of normal approximation to Poisson binomial. International Journal of Mathematics and Mathematical Sciences, 5, 717-728.
  20. Nedelman J., Wallenius T. (1986). Bernoulli trials, Poisson trials, surprising variances, and Jensen’s inequality. The American Statistician, 40, 286-289.
  21. Niida A., Imoto S., Shimamura T., Miyano S. (2012). Statistical model-based testing to evaluate the recurrence of genomic aberrations. Bioinformatics, 28, 115-120.
  22. Poisson S. D. (1837). Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités [Researches into the Probabilities of Judgements in Criminal and Civil Cases]. Paris, France: Bachelier.
  23. R Core Development Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
  24. Samuels S. M. (1965). On the number of successes in independent trials. The Annals of Mathematical Statistics, 36, 1272-1278.
  25. Sinharay S., Holland P. W., von Davier A. A. (2011). Evaluating the missing data assumptions of the chain and poststratification equating methods. In von Davier A. A. (Ed.), Statistical models for test equating, scaling and linking (pp. 281-296). New York, NY: Springer-Verlag.
  26. Steele J. M. (1994). Le Cam’s inequality and Poisson approximations. The American Mathematical Monthly, 101, 48-54.
  27. Thissen D., Pommerich M., Billeaud K., Williams V. S. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19, 39-49.
  28. Thomas M., Taub A. (1982). Calculating binomial probabilities when the trial probabilities are unequal. Journal of Statistical Computation and Simulation, 14, 125-131.
  29. van der Linden W. J. (2011). Local observed-score equating. In von Davier A. A. (Ed.), Statistical models for equating, scaling, and linking (pp. 201-222). New York, NY: Springer.
  30. Volkova A. Y. (1996). A refinement of the central limit theorem for sums of independent random indicators. Theory of Probability and Its Applications, 40, 791-794.
  31. Wainer H., Kiely G. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-202.
  32. Walsh J. E. (1955). Approximate probability values for observed number of “successes” from statistically independent binomial events with unequal probabilities. Sankhya: The Indian Journal of Statistics, 15, 281-290.
  33. Wang Y. H. (1993). On the number of successes in independent trials. Statistica Sinica, 3, 295-312.
  34. Yen W. M. (1984). Obtaining maximum likelihood trait estimates from number-correct scores for the three-parameter logistic model. Journal of Educational Measurement, 21, 93-111.

Articles from Applied Psychological Measurement are provided here courtesy of SAGE Publications
