Author manuscript; available in PMC: 2021 Aug 1.
Published in final edited form as: Sociol Methods Res. 2018 Jan 30;49(3):637–671. doi: 10.1177/0049124117747302

Optimizing Count Responses in Surveys: A Machine-learning Approach

Qiang Fu 1, Xin Guo 2, Kenneth C Land 3
PMCID: PMC8034261  NIHMSID: NIHMS1032001  PMID: 33840866

Abstract

Count responses with grouping and right censoring have long been used in surveys to study a variety of behaviors, status, and attitudes. Yet grouping or right-censoring decisions of count responses still rely on arbitrary choices made by researchers. We develop a new method for evaluating grouping and right-censoring decisions of count responses from a (semisupervised) machine-learning perspective. This article uses Poisson multinomial mixture models to conceptualize the data-generating process of count responses with grouping and right censoring and demonstrates the link between grouping-scheme choices and asymptotic distributions of the Poisson mixture. To search for the optimal grouping scheme maximizing objective functions of the Fisher information (matrix), an innovative three-step M algorithm is then proposed to process infinitely many grouping schemes based on Bayesian A-, D-, and E-optimalities. A new R package is developed to implement this algorithm and evaluate grouping schemes of count responses. Results show that an optimal grouping scheme not only leads to a more efficient sampling design but also outperforms a nonoptimal one even if the latter has more groups.

Keywords: survey methodology, optimality, experimental design, search algorithm, machine learning, Fisher information, zero inflation, right censoring, Poisson distribution


The design of count responses in surveys is a common yet understudied topic in the social sciences. Although the collection of exact counts of frequencies or incidence in social, epidemiological, and demographic surveys (e.g., number of births, frequencies of delinquent behaviors, incidents of diseases, and counts of social contacts) is analytically appealing, actual count responses in survey questions often consist of grouped counts (e.g., one response category “3–4 times” instead of two separate “3 times” and “4 times” categories) or are right-censored (e.g., the upper-end response category “6 or more times”). In fact, such grouped and right-censored (GRC) count responses have long been adopted by social scientists to study a range of behaviors, events, and attitudes (Akers et al. 1989; Bachman, Johnston, and O’Malley 1990; Bailey, Flewelling, and Valley Rachal 1992; Barnes et al. 2006; Basu and Famoye 2004; Fu, Land, and Lamb 2013, 2016; Hagan, Shedd, and Payne 2005; Marsden 2003; Reardon and Raudenbush 2006; Schaeffer and Dykema 2011; Straus, Gelles, and Smith 1990; Thoits and Hewitt 2001). Scholars often find that GRC count responses are useful to study sensitive research topics (e.g., juvenile delinquency, domestic violence, and drug use) or to solicit information from respondents with less cognitive capacity (e.g., young adolescents or the oldest old). For example, one nationally representative survey project in the United States, the Monitoring the Future (MTF) study (or the National High School Senior Survey), has used GRC count responses to track annual trends of delinquency and substance use among U.S. high school seniors since 1975. Such GRC count responses have also been used by the National Longitudinal Study of Adolescent Health (Add Health) to study adolescent behaviors at home, at school, or in the neighborhood.

As documented in the existing literature (Bradburn, Sudman, and Wansink 2004; Schwarz et al. 1985), the design of GRC count responses has a direct impact on the estimation of behavioral or cognitive frequencies. For example, an experimental study shows that the choice of right-censored count categories influences the estimation of TV-watching time (Schwarz et al. 1985). Yet the design of GRC count responses is still arbitrarily determined by survey investigators. This practice is surprising, given the abundant presence of count responses with either right censoring or grouping or both in surveys. Under certain scenarios, the optimal grouping scheme of GRC count responses has been determined by sophisticated statistical procedures and context-specific research designs, depending on the other variables of interest. For example, the intrinsic or contingent ordering of log-multiplicative association models may provide the optimal grouping scheme if conditional or joint distributions of variables in contingency tables are provided (Goodman 1987; Smith and Garnier 1986; Wong 2010). Likewise, given the extensive debates over the conceptualizations of gradations of democracy (Bollen 1990; Cheibub et al. 1996), it is found that the validity of dichotomous and graded measures of democracy can be evaluated by projecting their qualitative difference into two essential indicators related to democracy, international conflict, and regime stability (Elkins 2000). Although these innovative studies provide useful tools for scholars to assess grouping decisions for counts that are intrinsic to specific research questions at stake, their statistical procedures or research designs require additional information on the distribution of the ungrouped outcome variable and its association with other variables. Nevertheless, the use of GRC count responses often means that investigators have yet to understand the distribution of counts that are extrinsic to a specific research question, let alone its association with other variables. A search algorithm for the optimal grouping scheme focusing exclusively on the outcome variable per se rather than its research context is therefore useful and readily facilitates the evaluation of alternative grouping schemes with different a priori assumptions.

Applying the theory of optimal experimental designs (Atkinson, Donev, and Tobias 2007; De Leon and Atkinson 1991; Dette, Melas, and Pepelyshev 2004; Minkin 1987), we propose an innovative three-step algorithm for searching the parameter space and generating optimal grouping decisions for GRC count responses. In the machine-learning literature, optimal experimental design is also referred to as a special case of semisupervised machine learning (or active learning) because a learning/search algorithm interacts with users (survey investigators for the current research) to obtain optimal outputs from the parameter space (Cohn, Ghahramani, and Jordan 1996; Settles 2010). Based on a Poisson multinomial mixture distribution, this article begins with configuring the data-generating process of GRC count responses and develops related maximum likelihood estimators. Two members of the Poisson family of frequency distributions, the Poisson and zero-inflated Poisson (ZIP), are studied in detail. Combined with prior Poisson distribution parameters, the Fisher information (matrix) of the maximum likelihood estimator is then employed to implement a new M search algorithm using Bayesian A-, D-, and E-optimalities. An R package GRCdata (version 1.0), currently consisting of two functions find.scheme and grcmle, has been written to assess the grouping decisions of count responses.

GRC Count Data

Before discussing optimal designs for GRC count data, the question arises as to why such response categories have been adopted by social scientists. As examples, in various surveys, respondents are asked to list their numbers of close friends, weekly frequencies of alcohol intake, incidents of criminal victimization in the recent six months, times of illness in the last year, and lifetime history of residential moves. Admittedly, a precise enumeration of exact counts is methodologically appealing for two reasons. First, exact counts can be readily analyzed by existing statistical tools (e.g., Poisson regression models) and software packages. Second, survey investigators do not have to deal with arbitrary grouping or right-censoring decisions. Yet one major problem encountered by survey investigators is that the precise enumeration of counts can impose a cognitive burden on interviewees and sometimes leads to excessive missing data. In other words, the GRC data structure is a compromise between what survey investigators want and what respondents are willing or able to offer (Groves et al. 2011; Schaeffer and Presser 2003). For example, although medical sociologists and psychiatrists would like to know exactly how many days in the past week respondents experienced a variety of depressive symptoms, respondents, especially those with depressive symptoms, often get frustrated when required to distinguish between, for example, two days and three days. Thus, the Center for Epidemiological Studies-Depression (CES-D) scale, an established self-report depression measure, offers four grouped response categories: less than one day, one to two days, three to four days, and five to seven days (Radloff 1977). For a study of older adults aged 65 and above, a pretest showed that respondents were unwilling to answer even the four grouped response categories of the CES-D scale, so researchers had to further collapse the four grouped categories and used a dichotomous measure instead (Blazer et al. 1991).

Likewise, for research topics that are perceived as sensitive or less socially desirable, such as personal income, number of sex partners, incidents of delinquent behaviors, and history of drug use, respondents feel more comfortable in reporting grouped or right-censored categories instead of exact numbers (Sudman, Bradburn, and Schwarz 1996). It is not surprising that most, if not all, questions related to the frequency of juvenile delinquency and drug use in both the MTF study and the Add Health adopted GRC count responses.

Even if respondents are willing to collaborate, the difficulty in recalling the exact number of events that happened some time (e.g., several months) ago makes the exact number of events unreliable and introduces additional measurement errors (Groves et al. 2011; Schaeffer and Presser 2003). Similarly, if listing the total number of events requires extra effort during field interviews, interviewer fatigue can result in underreported numbers of events. For example, as interviewers were instructed to probe for more discussion partners, it has been shown that interviewer effects (e.g., the failure to elicit more private network data) contributed to the extensive debates concerning increasing social isolation in the United States (Paik and Sanchagrin 2013).

Generating GRC Count Data

In order to define the optimality for objective functions of GRC grouping schemes, we first configure a data-generating process for GRC count responses. The Poisson distribution is often used to model count data with probability mass:

f(y \mid \lambda) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots,  (1)

where y is a random count variable and λ is both the mean and the variance of the Poisson distribution. To define a Poisson-based likelihood function for GRC count data, we propose a data-generating scheme in the form of a Poisson multinomial process. Similar Poisson multinomial models were previously used to study contingency tables and traffic accidents (Lang 2004; Lord, Washington, and Ivan 2005).

We let $G = \{I_j\}_{j=1}^{N}$ denote a GRC grouping scheme with N groups (i.e., the total number of response categories) and consecutive subsets $I_1, \ldots, I_N$ of nonnegative integers {0, 1, 2, …}. For identically and independently distributed observations $x_i$ from a Poisson(λ) distribution, we have

\alpha_j(x_i) = \begin{cases} 1, & \text{when } x_i \in I_j, \\ 0, & \text{otherwise}. \end{cases}  (2)

In other words, we have an N-dimensional random vector (α1, …, αN) denoting the GRC count responses. For example, (α{0}, α{1}, α{2}, α{3−5}, α{6−9}, α{10+}) denotes the GRC count response categories never, once, twice, 3–5 times, 6–9 times, and 10 or more times. Note that for any given observation in a survey sample, there is one and only one component αj of (α1, …, αN) that equals 1. This N-dimensional vector then has a multinomial distribution M(1, θ1, …, θN), where the parameters θj depend on the parameter λ of the underlying Poisson(λ) distribution:

\theta_j(\lambda) = \sum_{y \in I_j} \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad j = 1, \ldots, N.  (3)

For example, the multinomial distribution corresponding to (α{0}, α{1}, α{2}, α{3−5}, α{6−9}, α{10+}) is

M\!\left(1,\; e^{-\lambda},\; e^{-\lambda}\lambda,\; \frac{e^{-\lambda}\lambda^{2}}{2},\; \sum_{y=3}^{5}\frac{e^{-\lambda}\lambda^{y}}{y!},\; \sum_{y=6}^{9}\frac{e^{-\lambda}\lambda^{y}}{y!},\; \sum_{y=10}^{\infty}\frac{e^{-\lambda}\lambda^{y}}{y!}\right).
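These group probabilities are easy to compute numerically. As a minimal illustration (our own sketch, not part of the original article; the value λ = 2 is an arbitrary assumption), the following R code evaluates the six multinomial probabilities of this example scheme:

# Group probabilities theta_j for the scheme {0}, {1}, {2}, {3-5}, {6-9}, {10+}
lambda <- 2                         # assumed Poisson mean
lower  <- c(0, 1, 2, 3, 6, 10)      # lower bound of each group
upper  <- c(0, 1, 2, 5, 9, Inf)     # upper bound (Inf marks the right-censored group)
theta  <- ppois(upper, lambda) - ppois(lower - 1, lambda)
round(theta, 4)                     # the six probabilities; sum(theta) equals 1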

The probability mass function of α(X) = (α1(X), …, αN(X)) is also given:

f(\alpha \mid \lambda) = \theta_1^{\alpha_1}\theta_2^{\alpha_2}\cdots\theta_N^{\alpha_N}.

If there are n independent observations $\{x_i\}_{i=1}^{n}$ drawn from the Poisson(λ) distribution, the likelihood function is defined using the probability mass function of the Poisson multinomial distribution:

L(\lambda) = \prod_{i=1}^{n} f(\alpha(x_i) \mid \lambda).  (4)
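To make the likelihood in equation (4) concrete, the following R sketch (ours, with arbitrary simulated data and an assumed λ = 2) draws Poisson counts, collapses them into the example scheme above, and maximizes the resulting grouped-data log-likelihood:

# Simulate GRC responses and maximize the grouped-data likelihood of equation (4)
set.seed(1)
lambda0 <- 2                               # assumed true parameter
lower   <- c(0, 1, 2, 3, 6, 10)
upper   <- c(0, 1, 2, 5, 9, Inf)
x       <- rpois(1000, lambda0)
counts  <- tabulate(findInterval(x, lower), nbins = length(lower))
loglik  <- function(lambda) {
  theta <- ppois(upper, lambda) - ppois(lower - 1, lambda)
  sum(counts * log(theta))
}
optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum   # MLE of lambda

The estimate should fall close to the assumed λ, in line with the consistency result discussed next.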

Because this likelihood function derives from a Poisson multinomial distribution, it is easy to show that the corresponding maximum likelihood estimator is consistent and asymptotically normal. More importantly, the variance of its asymptotic distribution is given by the inverse of the Fisher information.1 In other words, any consistent sequence $\hat{\lambda}_n$ of roots of the likelihood in equation (4) satisfies $\sqrt{n}(\hat{\lambda}_n - \lambda_0) \to N(0, 1/I(\lambda_0))$ in distribution, where I(λ0) is the Fisher information corresponding to a specific grouping scheme, and λ0 is the underlying true parameter.

The Poisson distribution assumes that its mean λ equals its variance. However, this assumption is violated if empirical frequency distributions show excess zeros relative to a Poisson distribution (Hall 2000; Klein, Kneib, and Lang 2015; Lambert 1992; Puig and Valero 2006). The ZIP distribution takes excess zeros into account and has the probability mass function:

f(y \mid \lambda, p) = \begin{cases} 1 - p + p e^{-\lambda}, & \text{when } y = 0, \\ \dfrac{p e^{-\lambda}\lambda^{y}}{y!}, & \text{when } y > 0, \end{cases}  (5)

where p is the proportion of the population exposed to the Poisson(λ) distribution.

Using the same definition of αj in equation (2), the GRC data αj(Xi) defined in the last section now have a different multinomial distribution M(1, μ1, …, μN), where

\mu_1(\lambda, p) = 1 - p + p\sum_{y \in I_1}\frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad \text{and} \qquad \mu_i(\lambda, p) = p\sum_{y \in I_i}\frac{e^{-\lambda}\lambda^{y}}{y!}, \quad \text{for } i = 2, \ldots, N.

The probability mass function of α then depends on the two parameters p and λ of the ZIP distribution, $f(\alpha \mid \lambda, p) = \mu_1^{\alpha_1}\cdots\mu_N^{\alpha_N}$. For independent observations $\{x_i\}_{i=1}^{n}$, we have the likelihood function

L(\lambda, p) = \prod_{i=1}^{n} f(\alpha(x_i) \mid \lambda, p).  (6)
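A parallel sketch for the ZIP case (again ours, with simulated data and assumed values λ = 5 and p = 0.6) builds the group probabilities μj and maximizes the likelihood in equation (6) over both parameters:

# Grouped ZIP likelihood of equation (6), maximized over (lambda, p)
set.seed(2)
lambda0 <- 5; p0 <- 0.6                    # assumed true ZIP parameters
lo <- c(0, 1, 3, 6, 10)                    # lower bounds: {0},{1-2},{3-5},{6-9},{10+}
hi <- c(0, 2, 5, 9, Inf)                   # upper bounds
x  <- ifelse(runif(1000) < p0, rpois(1000, lambda0), 0)
counts <- tabulate(findInterval(x, lo), nbins = length(lo))
loglikZIP <- function(par) {
  lambda <- par[1]; p <- par[2]
  theta  <- ppois(hi, lambda) - ppois(lo - 1, lambda)
  mu     <- p * theta
  mu[1]  <- 1 - p + p * theta[1]           # excess zeros enter the first group only
  sum(counts * log(mu))
}
optim(c(1, 0.5), loglikZIP, method = "L-BFGS-B",
      lower = c(0.01, 0.01), upper = c(30, 0.99),
      control = list(fnscale = -1))$par    # MLEs of (lambda, p)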

Again, based on Theorem 6.5.1 in Lehmann and Casella (1998), it is easy to show that the maximum likelihood estimators remain consistent and asymptotically normal for the ZIP case. The detailed proof of their asymptotic properties is given in Fu, Guo, and Land (2018). Estimators $\hat{\lambda}_n$ and $\hat{p}_n$ are asymptotically efficient in the sense that, in distribution,

\sqrt{n}(\hat{\lambda}_n - \lambda_0) \to N(0, J_{22}/|J|),
\sqrt{n}(\hat{p}_n - p_0) \to N(0, J_{11}/|J|),

where J11 and J22 are the (1, 1) and (2, 2) entries of the Fisher information matrix J corresponding to a specific grouping scheme, and λ0 and p0 are the underlying true parameters. We will further illustrate and discuss the Fisher information matrix J in the following sections.

Optimal Designs for GRC Count Data

The foregoing demonstration that the asymptotic distributions of the maximum likelihood estimators of both the Poisson and the ZIP cases are characterized by the Fisher information (matrix) is important for defining Bayesian optimality of GRC grouping schemes, given the internal link between Fisher information (matrix) and grouping choices. For example, the Fisher information of the Poisson case depends on both the true unknown parameter λ0 and the specific grouping scheme G: If we know the true parameter λ0, the corresponding asymptotic distribution of the estimator λn is entirely determined by grouping choices in the sense that a better grouping scheme is associated with a smaller variance of the asymptotic distribution or a more efficient estimator. Although the search for an optimal grouping scheme is easier if the true parameter λ0 is known or given, one task of optimal experimental designs is to take uncertainty of unknown parameters into account by incorporating prior knowledge (from experts, prior research, or pilot studies) into a general search algorithm. Next, we further investigate the relations between the Fisher information (matrix) and grouping choices. An objective function is then proposed to synthesize these relations and facilitate our subsequent discussion of a general three-step search algorithm.

Fisher Information and Grouping Choices: The Poisson Case

For the Poisson case, the Fisher information of the previous Poisson multinomial distribution with parameter λ and the grouping scheme $G = \{I_j\}_{j=1}^{N}$ is

I(\lambda) = I_G(\lambda) = -E\!\left[\frac{d^{2}}{d\lambda^{2}}\log f(\alpha \mid \lambda)\right] = -\sum_{j=1}^{N}\theta_j(\lambda)\,\frac{d^{2}}{d\lambda^{2}}\log\theta_j(\lambda) = \sum_{j=1}^{N}\frac{(\theta_j')^{2}}{\theta_j}.  (7)

Equation (7) follows from the definition of the probability mass function θj in equation (3) and the fact that $\sum_{j=1}^{N}\theta_j = 1$. We next remark on the relationship between the Fisher information and grouping choices.

Remark 4.1: When N = 1, we have θ1 ≡ 1 and thus I = 0. This corresponds to a trivial case where data provide no information for optimal designs. When N ≥ 2, it is easy to see I > 0, and the search for an optimal grouping scheme becomes possible.

Remark 4.2: While in empirical applications we restrict N to be finite, we can also let N = ∞ and make each group contain only one integer. This scenario is exactly the same as precise enumeration without any grouped counts. Under this circumstance, equation (7) shows that the Fisher information I is 1/λ, which corresponds to the asymptotic variance of the Poisson estimator.
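Equation (7) is straightforward to evaluate for any grouping scheme. The R sketch below (our own, not the GRCdata implementation) uses the closed form $\theta_j' = e^{-\lambda}\lambda^{a-1}/(a-1)! - e^{-\lambda}\lambda^{b}/b!$ for a group {a, …, b}, where the first term is dropped when a = 0 and the second when b = ∞:

# Fisher information I_G(lambda) of equation (7); a scheme is described by the
# lower and upper bounds of its groups (Inf marks the right-censored last group)
dp <- function(k, lambda) {                 # Poisson mass, 0 outside {0, 1, 2, ...}
  out <- numeric(length(k))
  ok  <- is.finite(k) & k >= 0
  out[ok] <- dpois(k[ok], lambda)
  out
}
fisher_info <- function(lower, upper, lambda) {
  theta  <- ppois(upper, lambda) - ppois(lower - 1, lambda)    # theta_j
  dtheta <- dp(lower - 1, lambda) - dp(upper, lambda)          # theta_j'
  sum(dtheta^2 / theta)
}
fisher_info(c(0, 1, 2, 3, 6, 10), c(0, 1, 2, 5, 9, Inf), 2)    # example scheme, lambda = 2

Consistent with Remark 4.2, an essentially ungrouped scheme such as fisher_info(0:20, c(0:19, Inf), 3) returns a value close to 1/3.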

Remark 4.3: If we obtain a finer grouping scheme G* by dividing one or more groups of G into subgroups, such a grouping scheme yields a larger Fisher information. To show this, let $\theta = \theta(\lambda) = \sum_{k=a+1}^{c} e^{-\lambda}\lambda^{k}/k!$ be the probability corresponding to a particular group {a + 1, …, c} with a ≥ −1 and a + 1 < c. For a grouping scheme G we see from equation (7) that this particular group contributes $(\theta')^{2}/\theta$ to the overall Fisher information. Now, we divide this group into two subgroups {a + 1, …, b} and {b + 1, …, c} with a + 1 ≤ b < c. For the new finer grouping scheme, the contribution of these two subgroups to the Fisher information is

\frac{(\theta^{*\prime})^{2}}{\theta^{*}} + \frac{(\theta^{**\prime})^{2}}{\theta^{**}},

where $\theta^{*} = \sum_{k=a+1}^{b} e^{-\lambda}\lambda^{k}/k!$ and $\theta^{**} = \sum_{k=b+1}^{c} e^{-\lambda}\lambda^{k}/k!$. Here we note that θ = θ* + θ** and $U^{2}v(u+v) + V^{2}u(u+v) \ge uv(U+V)^{2}$ for u, v, U, V > 0. Substituting $U = \theta^{*\prime}$, $u = \theta^{*}$, $V = \theta^{**\prime}$, and $v = \theta^{**}$, we have

\frac{(\theta^{*\prime})^{2}}{\theta^{*}} + \frac{(\theta^{**\prime})^{2}}{\theta^{**}} \ge \frac{(\theta')^{2}}{\theta}.  (8)

In inequality (8), we note that the equality holds if and only if $\theta^{*\prime}\theta^{**} = \theta^{**\prime}\theta^{*}$ (i.e., Uv = uV). Next, we further demonstrate that $\theta^{*\prime}\theta^{**} \ne \theta^{**\prime}\theta^{*}$ and the equality in equation (8) does not hold. If −1 < a < b < c < ∞, we have

e^{2\lambda}\left(\theta^{**\prime}\theta^{*} - \theta^{*\prime}\theta^{**}\right) = \sum_{k=a+1}^{c}\frac{\lambda^{b}\lambda^{k}}{b!\,k!} - \sum_{k=b+1}^{c}\frac{\lambda^{a}\lambda^{k}}{a!\,k!} - \sum_{k=a+1}^{b}\frac{\lambda^{c}\lambda^{k}}{c!\,k!} = \sum_{k=1}^{c-b}\left(\frac{1}{b!\,(a+k)!} - \frac{1}{a!\,(b+k)!}\right)\lambda^{a+b+k} + \sum_{k=1}^{b-a-1}\left(\frac{1}{b!\,(c-k)!} - \frac{1}{c!\,(b-k)!}\right)\lambda^{c+b-k}.

Likewise, for special cases where a = −1 or c = ∞ we also have

e^{2\lambda}\left(\theta^{**\prime}\theta^{*} - \theta^{*\prime}\theta^{**}\right) = \begin{cases} \displaystyle\sum_{k=0}^{c}\frac{\lambda^{b}\lambda^{k}}{b!\,k!} - \sum_{k=0}^{b}\frac{\lambda^{c}\lambda^{k}}{c!\,k!}, & \text{if } -1 = a < b < c < \infty, \\[2ex] \displaystyle\sum_{k=a+1}^{\infty}\frac{\lambda^{b}\lambda^{k}}{b!\,k!} - \sum_{k=b+1}^{\infty}\frac{\lambda^{a}\lambda^{k}}{a!\,k!}, & \text{if } -1 < a < b < c = \infty, \\[2ex] \displaystyle\sum_{k=0}^{\infty}\frac{\lambda^{b}\lambda^{k}}{b!\,k!}, & \text{if } -1 = a < b < c = \infty. \end{cases}

Given that a < b < c, the coefficients of the polynomial $e^{2\lambda}(\theta^{**\prime}\theta^{*} - \theta^{*\prime}\theta^{**})$ across all cases discussed above (i.e., −1 < a < b < c < ∞, a = −1, or c = ∞) must be positive. Since λ is also positive, $\theta^{**\prime}\theta^{*} - \theta^{*\prime}\theta^{**}$ cannot be zero and we have

\frac{(\theta^{*\prime})^{2}}{\theta^{*}} + \frac{(\theta^{**\prime})^{2}}{\theta^{**}} > \frac{(\theta')^{2}}{\theta}.

We previously noted that the Fisher information of the Poisson case depends on both the specific grouping scheme G and the true (unknown) parameter λ. Given the foregoing three remarks on the relationship between grouping choices and Fisher information, a probability function ρ can be defined to take prior knowledge of λ into account. In general, we define an objective function as

\Omega_P(G) = \int_{0}^{\infty} I_G(\lambda)\, d\rho(\lambda),

where ρ is a continuous or discrete distribution. The introduction of ρ allows analysts to deal with the uncertainty in estimating the true parameter λ and explore optimal grouping schemes under different prior distributions. For example, if a survey investigator assumes that the true value of λ is known, ρ becomes a degenerate distribution with a point mass of 1 at λ0 ∈ (0, ∞). For the continuous distribution case, we could specify a uniform distribution on [a, b] for ρ and obtain

\Omega_P(G) = \frac{1}{b-a}\int_{a}^{b} I_G(\lambda)\, d\lambda.

ρ can also be specified as a discrete distribution supported on positive numbers λ1, …, λn with probability masses q1, …, qn, respectively, and we have

\Omega_P(G) = \sum_{j=1}^{n} I_G(\lambda_j)\, q_j, \qquad \sum_{j=1}^{n} q_j = 1.
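Under a uniform prior on [a, b], Ω_P(G) can be approximated with one-dimensional quadrature. The sketch below (ours, reusing fisher_info() from the Poisson section above, with the 30-day drinking prior range used later as an assumed example) compares two four-group schemes:

# Objective Omega_P(G) under a uniform prior on [a, b]; reuses fisher_info()
omega_poisson <- function(lower, upper, a, b) {
  integrate(function(l) sapply(l, function(li) fisher_info(lower, upper, li)),
            a, b)$value / (b - a)
}
omega_poisson(c(0, 1, 3, 5), c(0, 2, 4, Inf), 1.34, 4.02)   # scheme {0},{1-2},{3-4},{5+}
omega_poisson(c(0, 1, 3, 6), c(0, 2, 5, Inf), 1.34, 4.02)   # scheme {0},{1-2},{3-5},{6+}

Because the first scheme is the N = 4 optimum reported later in Table 3 for this prior range, it should return the larger of the two values.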

Fisher Information and Grouping Choices: The ZIP Case

Given that the ZIP distribution has two parameters p and λ, its corresponding Fisher information is denoted by a symmetric and positive semidefinite matrix:

J(\lambda, p) = J_G(\lambda, p) = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} = \sum_{j}\frac{1}{\mu_j}\begin{bmatrix} \left(\dfrac{\partial \mu_j}{\partial \lambda}\right)^{2} & \dfrac{\partial \mu_j}{\partial \lambda}\dfrac{\partial \mu_j}{\partial p} \\[2ex] \dfrac{\partial \mu_j}{\partial \lambda}\dfrac{\partial \mu_j}{\partial p} & \left(\dfrac{\partial \mu_j}{\partial p}\right)^{2} \end{bmatrix},

where μ1 = 1 − p + pθ1, μj = pθj for j ≥ 2, and $\theta_j = \sum_{y \in I_j} e^{-\lambda}\lambda^{y}/y!$ for j = 1, …, N. This matrix can also be expressed in the form of equation (7) as

J = \begin{bmatrix} \dfrac{p(p-1)(\theta_1')^{2}}{\theta_1\mu_1} + pI(\lambda) & -\dfrac{\theta_1'}{\mu_1} \\[2ex] -\dfrac{\theta_1'}{\mu_1} & \dfrac{1-\theta_1}{p\mu_1} \end{bmatrix},  (9)

where $I(\lambda) = \sum_{j=1}^{N}(\theta_j')^{2}/\theta_j$. Next, we remark on the relationship between Fisher information and grouping choices for the ZIP case.

Remark 4.4: Optimal designs become impossible in the trivial case when N = 1. This scenario is similar to Remark 4.1, as the Fisher information matrix J becomes a zero matrix. Another trivial case appears when N = 2 and the determinant of J is zero (note that $J_{11}(\lambda, p) = p(\theta_1')^{2}/(\mu_1(1-\theta_1))$). Since the asymptotic distribution of $\sqrt{n}(\hat{\lambda}_n - \lambda_0, \hat{p}_n - p_0)$ is now a degenerate distribution, optimal designs based on prior knowledge of both λ and p become impossible. When N ≥ 3, Remark 4.3 implies that $I(\lambda) > (\theta_1')^{2}/\theta_1 + ((1-\theta_1)')^{2}/(1-\theta_1)$ and the determinant of J is calculated as:

\det(J) = \frac{1-\theta_1}{\mu_1}I(\lambda) + \frac{(p-1)(\theta_1')^{2}(1-\theta_1)}{\theta_1\mu_1^{2}} - \frac{(\theta_1')^{2}}{\mu_1^{2}} = \frac{1-\theta_1}{\mu_1}I(\lambda) - \frac{(\theta_1')^{2}}{\theta_1\mu_1} = \frac{1}{\mu_1}\left((1-\theta_1)I(\lambda) - \frac{(\theta_1')^{2}}{\theta_1}\right) > 0.

The Fisher information matrix J is therefore strictly positive definite when N ≥ 3 and thus can be used for optimal designs.
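Equation (9) translates directly into code. The following R sketch (ours; it reuses dp() and the bound conventions from the Poisson sketch above) assembles J for a given scheme, λ, and p, and returns a strictly positive definite matrix whenever N ≥ 3, as noted in Remark 4.4:

# ZIP Fisher information matrix J of equation (9); reuses dp() defined earlier
zip_info <- function(lower, upper, lambda, p) {
  theta  <- ppois(upper, lambda) - ppois(lower - 1, lambda)
  dtheta <- dp(lower - 1, lambda) - dp(upper, lambda)
  I.lam  <- sum(dtheta^2 / theta)                      # I(lambda) of equation (7)
  mu1    <- 1 - p + p * theta[1]
  J11    <- p * (p - 1) * dtheta[1]^2 / (theta[1] * mu1) + p * I.lam
  J12    <- -dtheta[1] / mu1
  J22    <- (1 - theta[1]) / (p * mu1)
  matrix(c(J11, J12, J12, J22), nrow = 2)
}
zip_info(c(0, 1, 3, 6, 10), c(0, 2, 5, 9, Inf), lambda = 5, p = 0.6)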

For both the Poisson and the ZIP cases, we have demonstrated that asymptotic variances are given by the inverse of the Fisher information (matrix). Given that in experimental designs an optimal design is often selected to yield the most efficient estimator (see, e.g., Steinberg and Hunter 1984), an optimal grouping scheme of GRC data should, according to the same principle, maximize the Fisher information (matrix) and produce more (asymptotically) efficient estimators. Because there are multiple ways of ordering square matrices, we introduce a local objective function S to compare Fisher information matrices: Optimizing S will give a locally optimal design (Chernoff 1953), where local means that the design is optimal for a specific value of an unknown parameter (or vector). To illustrate the definition of S, we follow previous research on the Loewner partial order (see, e.g., Horn and Johnson 2013) and write $J^{*} \succeq J$ if $J^{*} - J$ is positive semidefinite, where $J^{*}$ and J are both strictly positive definite matrices.

Definition 4.5 (objective function): We define a local objective function of positive definite matrices (e.g., the Fisher information matrices) as any function S satisfying S(J*) ≥ S(J) if $J^{*} \succeq J$.

To maximize the Fisher information matrix and achieve more efficient estimation, we apply local objective functions based on three common optimality criteria (Horn and Johnson 2013; Steinberg and Hunter 1984): A-optimality: maximizing SA = 1/tr(J^{-1}), where tr(J^{-1}) is the trace of J^{-1}; D-optimality: maximizing SD = det(J); and E-optimality: maximizing SE, where SE is the minimum eigenvalue of J. If we assume that the two eigenvalues of J are e1 and e2, the A-, D-, and E-optimality designs maximize e1e2/(e1 + e2), e1e2, and min(e1, e2), respectively (Nguyen and Miller 1992). Note that SA, SD, and SE satisfy the definition of local objective functions above (see, e.g., Horn and Johnson 2013:495). Among the three optimality criteria, A-optimality minimizes the average asymptotic variance of all parameter estimates, D-optimality minimizes the generalized asymptotic variance (or the volume of the confidence ellipsoid under normality) of parameter estimates, and E-optimality minimizes the maximum asymptotic variance of the estimates of (components of) parameters. Because all three optimality criteria as information functions are isotonic with respect to the Loewner ordering (Pukelsheim 1993), results from simulations (not shown) suggest that optimal grouping schemes generated by the three methods are virtually the same. Nevertheless, we recommend the use of D-optimality due to its computational simplicity. A- and E-optimalities may lead to more accurate results if there is a strong correlation between p and λ (Steinberg and Hunter 1984).
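For concreteness, the three criteria can be written as one-line functions of a positive definite matrix J (a minimal sketch of our own):

# Local objective functions for A-, D-, and E-optimality
S_A <- function(J) 1 / sum(diag(solve(J)))                  # 1 / tr(J^{-1})
S_D <- function(J) det(J)                                   # determinant of J
S_E <- function(J) min(eigen(J, symmetric = TRUE)$values)   # smallest eigenvalue of J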

Remark 4.6: For the Poisson case, Remark 4.3 shows that a finer grouping scheme gives a larger Fisher information. This conclusion does not always hold for the ZIP case. For example, the local objective function $S_2(J) := J_{22} = (1-\theta_1)/(p\mu_1)$ depends entirely on how the first group of a grouping scheme is defined. S2 remains unchanged if one divides a group other than the first group into more subgroups. Yet the conclusion $S(J_{G^*}) \ge S(J_G)$ still holds for the ZIP case if a grouping scheme G* is finer than G.

This conclusion that $S(J_{G^*}) \ge S(J_G)$ becomes obvious once the difference $\Delta J = J_{G^*} - J_G$ is shown to be positive semidefinite. To investigate whether ΔJ is positive semidefinite, we assume without loss of generality that G* is obtained by dividing one group from G into two and denote

\Delta J = \begin{bmatrix} \Delta J_{11} & \Delta J_{12} \\ \Delta J_{21} & \Delta J_{22} \end{bmatrix}.

Because both JG* and JG are symmetric matrices, we have ΔJ12 = ΔJ21. If G* is obtained by dividing the jth (j ≥ 2) group of G into two subgroups, we have ΔJ12 = ΔJ21 = ΔJ22 = 0. This conclusion follows equation (9) because the choice of the first group, which remains the same for both G* and G, determines ΔJ12, ΔJ21, and ΔJ22. Remark 4.3 indicates that ΔJ11 > 0. Therefore ΔJ has two nonnegative eigenvalues, 0 and ΔJ11, and is positive semidefinite.

If G* is obtained by dividing the first group of G into two subgroups, we have $\theta_j = \sum_{y \in I_j} e^{-\lambda}\lambda^{y}/y!$ with I1, …, IN denoting the different groups of G*. Therefore I1 ∪ I2 is the first group of G, I3 is the second group of G, and so on. Following the definition of the Poisson multinomial distribution, we still have μ1 = 1 − p + pθ1 and μj = pθj for j ≥ 2. ΔJ is then calculated as follows:

\Delta J_{11} = \frac{(p\theta_1')^{2}}{\mu_1} + \frac{(p\theta_2')^{2}}{\mu_2} - \frac{(p\theta_1' + p\theta_2')^{2}}{\mu_1 + \mu_2} \ge 0,
\Delta J_{12} = \Delta J_{21} = -\frac{\theta_1'}{\mu_1} + \frac{\theta_1' + \theta_2'}{\mu_1 + \mu_2},
\Delta J_{22} = \frac{1-\theta_1}{p\mu_1} - \frac{1-\theta_1-\theta_2}{p(\mu_1 + \mu_2)} = \frac{\theta_2}{p\mu_1(\mu_1 + \mu_2)} > 0.

Since tr(ΔJ) = ΔJ11 + ΔJ22 > 0, the sum of the two eigenvalues of ΔJ is positive. Meanwhile, ΔJ has no negative eigenvalues because det (ΔJ) = 0 (proof omitted). Hence, we conclude that ΔJ is positive semidefinite, and S(JG*)S(JG) if a grouping scheme G* is finer than G.

We use a distribution ρ(λ, p) to model prior knowledge of the parameters λ and p. Let S be a local objective function. We define another global objective function as

\Omega_{ZIP}(G) = \Omega_{ZIP,S}(G) = S\!\left(\int_{\mathbb{R}^{+}\times(0,1)} J_G(\lambda, p)\, d\rho(\lambda, p)\right).

Here, we choose to optimize $S\!\left(\int J_G\, d\rho\right)$ because this method has been justified by Chaloner and Verdinelli (1995) and is shown to be a preferred option for defining Bayesian D-optimality (Atkinson et al. 2007). For example, if ρ(λ, p) is a uniform distribution on (a, b) × (c, d), we have

\Omega_{ZIP}(G) = S\!\left(\frac{1}{(b-a)(d-c)}\int_{a}^{b}\!\!\int_{c}^{d} J_G(\lambda, p)\, dp\, d\lambda\right).

When ρ(λ, p) is a discrete distribution supported on $\{(\lambda_j, p_j)\}_{j=1}^{n}$ with probabilities q1, …, qn, respectively, we have

\Omega_{ZIP}(G) = S\!\left(\sum_{j=1}^{n} J_G(\lambda_j, p_j)\, q_j\right), \qquad \sum_{j=1}^{n} q_j = 1.
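With a discrete prior, Ω_ZIP(G) is simply a local objective function applied to a weighted sum of information matrices; a minimal sketch (ours, reusing zip_info() and S_D from above, with hypothetical support points and weights):

# Omega_ZIP(G) for a discrete prior on (lambda, p); reuses zip_info() and S_D
omega_zip_discrete <- function(lower, upper, lambdas, ps, q, S = S_D) {
  Jbar <- Reduce(`+`, Map(function(l, p, w) w * zip_info(lower, upper, l, p),
                          lambdas, ps, q))
  S(Jbar)
}
omega_zip_discrete(c(0, 1, 3, 6, 10), c(0, 2, 5, 9, Inf),
                   lambdas = c(4.5, 6, 7.5), ps = c(0.4, 0.5, 0.6),
                   q = c(0.25, 0.5, 0.25))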

A Three-step M Algorithm

Considering a global objective function Ω(·) that is either ΩP or ΩZIP in the preceding section, we propose a three-step M search algorithm for selecting an optimal grouping scheme that maximizes Ω. It should be noted that the application of this algorithm is not restricted to GRC data but could be extended to optimal designs for count responses in general if either grouping or censoring is present. From the perspective of semisupervised machine learning, this M algorithm searches all possible combinations of grouping schemes and interacts with survey investigators to yield the optimal grouping scheme.

Remarks 4.3 and 4.6 show that a finer grouping scheme increases the value of Ω. Without grouping or right censoring, Ω is thus maximized by the finest scheme where each separate response group contains and only contains one integer. This finest possible grouping scheme is obviously the optimal one. In the presence of grouping and right censoring, however, the search for an optimal grouping scheme is constrained by the total number of groups N allowed. Now, the search becomes challenging, if not impossible, since the search algorithm has to deal with infinitely many grouping schemes. To make sure that the infinitely many grouping schemes for the GRC count responses can be processed by our search algorithm, we introduce a hypothetical integer M, which is sufficiently large, to divide the infinitely many grouping schemes into two parts: a finite set where M is contained in the last groups of schemes and an infinite set where M is not contained in the last groups. With the introduction of M, the search algorithm consists of three major steps. First, we use M to produce a finite set of possible grouping schemes. An optimal grouping scheme maximizing Ω(·) is identified after a search of this finite set. The second and third steps verify whether the optimal grouping scheme returned by the first step is the global maximizer, that is, the scheme achieving the best performance among all N-group schemes. The search algorithm stops if the optimal grouping scheme returned by the first step passes the verification. Otherwise, the iteration continues with a larger M.

Step 1: Select a sufficiently large positive integer M. Among all N-group grouping schemes where M is contained in their last groups, find the scheme Gmax that maximizes Ω.

The introduction of M divides the whole set of infinitely many grouping schemes into two parts: a finite set with M contained in the last group and an infinite set with M not contained in the last group. This procedure is motivated by the idea that, if M is sufficiently large, all integers larger than M from a Poisson process cannot exert much influence on the Fisher information and thus do not affect the choice of optimal grouping schemes. To illustrate this idea, we define the last right-censored group IN of an N-group scheme $G = \{I_i\}_{i=1}^{N}$ as IN = {M, M + 1, …}. For a Poisson(λ) model, we see that the contribution of this particular group, which contains M and all larger integers, to the Fisher information is trivial:

\left(\frac{e^{-\lambda}\lambda^{M-1}}{(M-1)!}\right)^{2} \Bigg/ \sum_{k=M}^{\infty}\frac{e^{-\lambda}\lambda^{k}}{k!} \;\to\; 0, \qquad \text{as } M \to \infty.

Moreover, an implication of this property is that, to increase the Fisher information, finer grouping decisions should be applied to integers with nontrivial probabilities. If the total number of groups N is fixed, a finer grouping of large integers with trivial probabilities should be avoided, and a coarse right-censored group is preferred.

The choice of M follows a Goldilocks rule. An important assumption of the search algorithm is that M should be sufficiently large and represents the lower bound of a set of integers leading to a successful search for the global optimal scheme. Yet researchers should not choose an excessively large M either: The number of all possible grouping schemes processed by the search algorithm grows quickly with larger M, and the search takes much longer despite optimization of the algorithm (the computation time is roughly proportional to M^{N−1}). In theory, M should be the lowest integer included in the last right-censored group of the global optimal scheme, so that the search algorithm works without consuming too much time. As practical guidance, researchers may start from an integer larger than the mean of the prior distribution of λ, gradually increase M if its previous value fails the verification from steps 2 and 3, and locate a sufficiently large M in a trial-and-error learning process.

One example of the search algorithm is demonstrated in Figure 1. If we set N = 3 and M = 4, step 1 only searches six schemes, as plotted in part A of Figure 1. When M is not contained in the last group, there are infinitely many grouping schemes to search, and their overall set is denoted as F1. Examples from F1 are plotted in part B of Figure 1.

Figure 1.

An illustration of the search algorithm. Part A: all possible three-group schemes with M = 4, where M is contained in their last groups; Part B: examples of infinitely many three-group schemes from the set F1, where M is not contained in their last groups; Part C: the set F2 of two-group schemes obtained from a merging process of schemes in part B, with M still contained in their last groups; Part D: the set F3 obtained from F2 by including each integer greater than M in one and only one group.

Step 2: Compute the objective function Ω* of F3 (defined below), where $\Omega^{*} = \max_{G' \in F_3}\Omega(G')$.

The foregoing discussion in step 1 shows that our algorithm divides the set of all possible grouping schemes into two parts, and step 1 deals with the finite set with M contained in the last group. Step 2 then deals with the other, infinite set F1 with M not contained in the last group. In step 2, the algorithm will search a finite set F3 of grouping schemes and calculate the objective function based on an optimal scheme from F3. To understand the second step, we first illustrate what F2 and F3 are and then discuss the relation between F1 and F3. First, let F2 be the overall set of (N − 1)-group schemes such that M is contained in the last group. When N = 3 and M = 4, F2 only consists of four schemes and is illustrated in part C of Figure 1. Second, for each grouping scheme G in F2, we divide its tail after M to make a new scheme G′. Now, the first N − 2 groups in G′ are identical to those of the corresponding G, but each integer greater than M is now contained and only contained in a separate group in G′. We denote F3 as the total set of all grouping schemes G′ obtained in this way from F2. The case with N = 3 and M = 4 is shown in parts C (for F2) and D (for F3) in Figure 1. Due to the one-to-one match between grouping schemes from F2 and F3, they have the same number of schemes.

For any N-group scheme G from F1, where M is not contained in the last group, F3 contains at least one scheme G′ finer than G. Specifically, if M is contained in the (N − 1)th group of a grouping scheme G, there exists some G′ in F3 that has the same first N − 2 groups as G. G′ must be finer than G, given that every integer beyond M is also contained in a separate group in G′. The case where M is contained in the kth group with k ≤ N − 2 can be deduced by analogy, as now the first N − 2 groups of G′ are finer than the first k − 1 groups of G.

Step 3: If Ω* ≤ Ω(Gmax), Gmax is the global maximizer of the objective function. A larger M should be chosen otherwise.

The relationship among F1, F2, and F3 can be further conceptualized as follows. When M is not contained in the last group (as shown in F1), the search algorithm actually merges the group containing M with all the groups to its right (including the right-censored group) to form a new and bigger right-censored group. The first three grouping schemes from part B to part C in Figure 1 illustrate this merging process. Subsequently, the new grouping scheme with a bigger right-censored group (e.g., the first scheme in part C) has a smaller total number of groups and is thus coarser than its original form in F1 (e.g., the first scheme in part B). To make a fair comparison between the Fisher information of grouping schemes with M contained in the last group and with M not contained in the last group, after the merging, we must compensate for the loss in the latter's Fisher information due to this reduction in the total number of groups. To compensate for this loss, each integer greater than M in the new last group is subsequently contained and only contained in a separate group. This procedure thus forms a new (much) finer grouping scheme (i.e., from grouping schemes in part C to corresponding grouping schemes in part D). Step 3 then compares the values of the objective functions between the optimal grouping scheme with M contained in the last group and the (much finer) optimal grouping scheme with M contained in other groups. Pseudocode describing the search algorithm is listed below to facilitate readers' understanding:

  1. Input the (maximum) number of groups N, a sufficiently large integer M and the objective function of a grouping scheme Ω;

  2. Among all N-group grouping schemes where M is contained in their last groups, find the scheme Gmax that maximizes Ω and denote this maximum value as Ω(Gmax);

  3. Set F2 as the overall set of (N − 1)-group schemes in which M is contained in the last group of every scheme;

  4. For every G in F2, there exists one corresponding grouping scheme G′ in which every integer greater than M is contained and only contained in one separate group. The total set of all such grouping schemes G′ is defined as F3. Find the grouping scheme in F3 that maximizes Ω and denote this maximum value as Ω*;

  5. Return Gmax if Ω* ≤ Ω(Gmax); otherwise choose a larger M and return to the first step.
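The first step of this pseudocode can be sketched compactly for the Poisson case. The R code below (our own illustration, not the GRCdata implementation) enumerates every N-group scheme whose last group contains M via the upper endpoints of the first N − 1 groups, scores each scheme with a uniform-prior Ω_P, and reuses fisher_info() from the Poisson section; the prior range is the binge-drinking range used later in Table 3, taken here as an assumption:

# Step 1 sketch: search all N-group schemes whose last group contains M
step1_search <- function(N, M, a, b) {
  best <- list(value = -Inf, upper = NULL)
  # a scheme is fixed by the upper endpoints of its first N - 1 groups (all < M)
  for (u in combn(0:(M - 1), N - 1, simplify = FALSE)) {
    lower <- c(0, u + 1)                   # lower bounds of the N groups
    upper <- c(u, Inf)                     # the last group is right-censored
    omega <- integrate(function(l) sapply(l, function(li)
               fisher_info(lower, upper, li)), a, b)$value / (b - a)
    if (omega > best$value) best <- list(value = omega, upper = u)
  }
  best                                     # best$upper gives the cut points of Gmax
}
step1_search(N = 3, M = 4, a = 0.39, b = 1.16)

With N = 3 and M = 4 this loop visits exactly the six schemes of part A in Figure 1; steps 2 and 3 would then repeat the scoring over F3 and compare the two maxima.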

To summarize, because there are infinitely many grouping schemes with M not contained in the last group (e.g., grouping schemes from the set F1 as shown in part B), we first merge them into a finite set of schemes with M contained in a new (big) last group (F2, as shown in part C) and then refine these into much finer schemes (F3, as shown in part D) to allow a fair comparison. It is clear that these much finer grouping schemes may sometimes overcompensate for the loss in Fisher information from the merging process. For example, step 3 could falsely reject the true optimal grouping scheme if the M chosen is at or slightly above the lowest integer included in the last right-censored group of the global optimal scheme. Yet the false rejection can be easily solved by increasing the value of M, as each separate group containing one integer larger than M plays a smaller role in determining the objective function. The global optimal grouping scheme eventually accepted by the algorithm remains the same as the one falsely rejected. The search algorithm is intentionally developed in a way that prevents any false acceptance of a wrong optimal scheme at the cost of tolerating false rejection of the true optimal grouping scheme, and a larger M further resolves the false-rejection issue.

Data Simulation and Empirical Analysis

To illustrate the optimal designs for count data, we employ data from a nationally representative survey of youth in America, the MTF study. Since 1975, about 250,000 high school students from approximately 130 U.S. high schools nationwide have participated in this survey each year. In the current study, we focus on four questions from the MTF study related to 12th graders' frequencies of alcohol drinking from 1996 to 2012. The first three questions on alcohol drinking are virtually the same except for the reference period (in your lifetime, during the last 12 months, and during the last 30 days): “On how many occasions have you had alcoholic beverages to drink–more than just a few sips?” The GRC count response categories for the three questions are: 0 occasions, 1–2 occasions, 3–5 occasions, 6–9 occasions, 10–19 occasions, 20–39 occasions, and 40 or more. The fourth question is related to binge drinking: “Think back over the LAST TWO WEEKS. How many times have you had five or more drinks in a row? (A “drink” is a glass of wine, a bottle of beer, a wine cooler, a shot glass of liquor, a mixed drink, etc.).” GRC count response categories for this question are none, once, twice, 3–5 times, 6–9 times, and 10 or more times. Table 1 shows the original counts of drinking data from 1996 to 2012. Drinking behaviors are reported less frequently for shorter reference periods. Binge drinking is the rarest among the 12th graders.

Table 1.

Frequency Distributions of Adolescent Alcoholic Drinking, MTF, 1996–2012.

Year Lifetime Drinking Drinking in Last 12 Months
0 1–2 3–5 6–9 10–19 20–39 40+ 0 1–2 3–5 6–9 10–19 20–39 40+
2012 674 199 235 208 281 224 416 795 335 322 225 231 160 163
2011 675 191 271 211 294 211 422 825 384 297 237 237 136 151
2010 671 193 265 216 281 244 460 784 397 321 231 250 137 199
2009 582 191 239 223 270 245 446 701 385 292 222 262 158 171
2008 634 160 251 210 292 222 470 759 380 288 218 241 165 187
2007 676 195 227 204 306 229 531 808 354 277 243 271 157 239
2006 626 169 233 226 304 242 497 756 349 302 249 262 178 196
2005 610 197 272 243 244 246 544 745 390 308 237 265 197 212
2004 571 187 245 210 307 254 583 690 363 314 246 295 214 218
2003 535 187 232 260 295 257 595 687 373 322 282 256 171 261
2002 465 151 228 214 267 211 564 592 318 311 214 239 171 240
2001 447 145 188 185 298 287 514 574 298 269 263 272 179 217
2000 412 157 246 196 283 248 538 533 303 318 236 277 190 209
1999 420 178 215 192 311 261 632 558 364 265 247 256 238 274
1998 442 168 295 233 295 307 724 588 396 327 256 331 213 350
1997 492 171 247 233 340 310 694 638 367 338 261 325 262 285
1996 515 155 226 206 282 314 625 642 350 298 254 297 219 250
Year Drinking in Last 30 Days Binge Drinking in Last Two Weeks
0 1–2 3–5 6–9 10–19 20–39 40+ 0 1 2 3–5 6–9 10+
2012 1,264 462 254 123 76 28 24 1,665 197 150 142 29 19
2011 1,346 432 234 122 74 22 34 1,730 190 134 124 32 28
2010 1,323 457 254 132 86 25 38 1,716 220 158 129 35 27
2009 1,224 438 257 138 85 24 24 1,586 222 145 147 34 32
2008 1,262 454 240 137 78 26 41 1,630 215 160 135 44 24
2007 1,293 451 275 144 107 37 45 1,688 217 164 149 55 39
2006 1,244 469 262 154 100 35 27 1,661 211 160 145 45 32
2005 1,257 459 304 157 104 33 32 1,674 243 178 165 37 33
2004 1,201 466 296 193 103 31 40 1,635 253 171 170 61 26
2003 1,193 504 247 196 123 41 37 1,636 234 180 202 42 22
2002 1,062 446 223 151 136 30 39 1,456 200 161 166 47 25
2001 1,036 430 239 173 114 40 27 1,438 209 158 144 62 37
2000 1,022 437 288 169 94 31 30 1,425 202 186 150 40 36
1999 1,041 425 285 214 147 40 53 1,460 213 176 217 63 54
1998 1,162 453 347 220 155 58 63 1,634 228 224 228 61 57
1997 1,153 527 324 207 170 46 47 1,664 253 198 238 62 53
1996 1,134 436 321 193 142 34 53 1,572 236 191 202 55 47

Note: MTF = Monitoring the Future.

To identify appropriate prior distributions for Bayesian optimal designs, we wrote an R function grcmle to infer Poisson parameters based on the likelihood functions given in equations (4) and (6). This R function adopts maximum likelihood estimation and reports the mean, standard error, and confidence interval estimated from data. As multiple waves of data are available from the MTF study, we use the mean ± 3 standard deviations as the range of the prior distribution. In the absence of multiple data sets, researchers could also determine the range of prior distributions based on the mean and standard error (e.g., the mean ± 3 standard errors) reported by this R function. Table 2 summarizes means and standard deviations of Poisson and ZIP parameters across the 17 years investigated. As expected, the means of λ across different survey years tend to be larger with a longer reference period. Also noteworthy is that all standard deviations calculated are much smaller than their corresponding means, suggesting that the year-to-year (ZI) Poisson estimates are relatively stable.

Table 2.

Maximum Likelihood Estimates of Means and Standard Deviations (SD) of (Zero-inflated) Poisson Parameters.

Drinking in Last 30 Days Drinking in Last 12 Months Lifetime Drinking Binge Drinking in Last Two Weeks
λ (Poisson) P (ZIP) λ (ZIP) λ (Poisson) P (ZIP) λ (ZIP) λ (Poisson) P (ZIP) λ (ZIP) λ (Poisson) P (ZIP) λ (ZIP)
Mean 2.723 .477 6.062 8.750 .698 13.026 15.425 .756 21.127 0.774 .307 2.662
SD 0.461 .039 0.561 1.209 .039 1.089 1.834 .040 1.348 0.129 .029 0.226

Note: ZIP = zero-inflated Poisson.

Next, based on the aforementioned search algorithm, we developed another R function find.scheme in the R package GRCdata to search for the optimal GRC grouping scheme, which has the following parameters:

find.scheme(N, densityFUN, lambda.lwr, lambda.upr, p.lwr, p.upr, probs, lambdas, ps, is.0.isolated = TRUE, model = c("Poisson", "ZIP"), matSc = c("A", "D", "E"), M = "auto").

N defines the (maximum) number of groups, which should be greater than one for the Poisson case and greater than two for the ZIP case. densityFUN gives the probability density function of a prior distribution, if needed. [lambda.lwr, lambda.upr] and [p.lwr, p.upr] define the range of λ and p, respectively, and [p.lwr, p.upr] is not needed if a Poisson model is chosen. probs, lambdas, and ps define discrete prior distributions. is.0.isolated indicates whether zero should be contained in a separate group. This parameter is included because researchers are often interested in estimating prevalence or incidence rates, which requires zero to be contained in a separate group. model specifies whether the Poisson or ZIP case is used in the search algorithm. matSc gives the type of local objective function of the Fisher information matrix for the ZIP case. Users can choose from A-, D-, and E-optimality. M is a sufficiently large integer required to implement the search, as discussed above. If the lowest Ms needed to find the global optimal grouping schemes are specified, most examples listed in Tables 3–5 take several seconds to converge. Depending on the computer configuration, the program may take several minutes to converge if a large M (e.g., 33) is chosen. If M is set as "auto", the search algorithm will automatically determine an adequate M needed to produce the global optimal grouping scheme and then return that optimal grouping scheme. As expected, it can take longer for the program to converge when the auto option is chosen. If M is not set as "auto", the output of find.scheme includes an indicator succeed indicating whether the M chosen is sufficiently large for the search algorithm to identify the global optimal grouping scheme. Users need to choose a slightly larger value for M if succeed is false.
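For instance, a call corresponding to the N = 6 binge-drinking entry in Table 3 would take roughly the following form (a hypothetical invocation assembled from the argument list above; the uniform prior range is taken from Table 3):

library(GRCdata)
find.scheme(N = 6, densityFUN = function(x) 1,
            lambda.lwr = 0.39, lambda.upr = 1.16,
            model = "Poisson", M = 5)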

Table 3.

Estimated Optimal Grouping Schemes With Different Numbers of Groups: Uniform Prior Distributions.

N Number of Groups Drinking in 30 Days (Poisson) Ma Drinking in 12 Months (Poisson) Ma Lifetime Drinking (Poisson) Ma Binge Drinking (Poisson) Ma
λ = [1.34, 4.02] λ = [5.12, 12.38] λ = [9.92, 20.93] λ = [0.39, 1.16]
3 0 1–2 3+ 4 0 1–8 9+ I, 0 1–14 15+ 18 0 1 2+ 3
4 0 1–2 3–4 5+ 5 0 1–6 7–10 11+ 13 0 1–11 12–17 18+ 21 0 1 2 3+ 3
5 0 1 2 3–4 5+ 6 0 1–5 6–8 9–12 13+ 14 0 1–10 11–14 15–19 20+ 23 0 1 2 3 4+ 4
6 0 1 2 3 4–5 6+ 6 0 1–4 5–7 8–10 11–13 14+ 15 0 1–9 10–13 14–17 18–22 23+ 24 0 1 2 3 4 5+ 5
7 0 1 2 3 4 5–6 7+ 7 0 1–4 5–6 7–8 9–10 11–13 14+ 16 0 1–8 9–11 12–14 15–18 19–22 23+ 25 0 1 2 3 4 5 6+ 6
8 0 1 2 3 4 5 6–7 8+ 8 0 1–3 4–5 6–7 8–9 10–11 12–14 15+ 17 0 1–7 8–10 11–13 14–16 17–19 20–23 24+ 26 0 1 2 3 4 5 6 7+ 7
9 0 1 2 3 4 5 6 7–8 9+ 9 0 1–3 4–5 6–7 8–9 10–11 12–13 14–16 17+ 18 0 1–7 8–10 11–12 13–14 15–17 18–20 21–24 25+ 27 0 1 2 3 4 5 6 7 8+ 8
N Number of Groups Drinking in 30 Days (ZIP) Ma Drinking in 12 Months (ZIP) Ma Lifetime Drinking (ZIP) Ma Binge Drinking (ZIP) Ma
λ = [4.38, 7.75] and P = [.36, .59] λ = [9.76, 16.29] and P = [.58, .81] λ = [17.08, 25.17] and P = [.63, .88] λ = [1.98, 3.34] and P = [.22, .39]
3 0 1–6 7+ 8 0 1–12 13+ 15 0 1–20 21+ 24 0 1–3 4+ 4
4 0 1–4 5–7 8+ 9 0 1–10 11–15 16+ 18 0 1–18 19–24 25+ 27 0 1–2 3–4 5+ 5
5 0 1–4 5–6 7–9 10+ 11 0 1–9 10–13 14–17 18+ 19 0 1–16 17–21 22–26 27+ 29 0 1 2 3–4 5+ 6
6 0 1–3 4–5 6–7 8–10 11+ 11 0 1–8 9–11 12–14 15–18 19+ 20 0 1–15 16–19 20–23 24–28 29+ 30 0 1 2 3 4–5 6+ 6
7 0 1–2 3–4 5–6 7–8 9–10 11+ 12 0 1–7 8–10 11–13 14–16 17–19 20+ 22 0 1–14 15–18 19–21 22–24 25–28 29+ 32 0 1 2 3 4 5–6 7+ 7
8 0 1–2 3–4 5 6 7–8 9–10 11+ 13 0 1–7 8–9 10–11 12–13 14–16 17–19 20+ 22 0 1–13 14–16 17–19 20–22 23–25 26–29 30+ 32 0 1 2 3 4 5 6 7+ 8
9 0 1–2 3–4 5 6 7 8–9 10–11 12+ 13 0 1–7 8–9 10–11 12–13 14–15 16–17 18–20 21+ 23 0 1–12 13–15 16–18 19–20 21–23 24–26 27–30 31+ 33 0 1 2 3 4 5 6 7 8+ 9

Note: ZIP = zero-inflated Poisson.

a

The lowest M needed to find the global optimal grouping scheme.

Table 5.

Estimated Optimal Grouping Schemes With Different Numbers of Groups: Uniform Prior Distributions.

Number of Groups G Low λ Ma Moderate λ Ma High λ (M = 40) Ma Unspecified λ (M = 40) Ma
λ = [0.01, 3] λ = [3, 10] λ = [10, 30] λ = [0.01, 30]
3 0 1 2+ 3 0 1–5 6+ 8 0 1–16 17+ 24 0 1–2 3+ 15
4 0 1 2–3 4+ 4 0 1–4 5–8 9+ 10 0 1–13 14–22 23+ 27 0 1 2–5 6+ 20
5 0 1 2 3–4 5+ 5 0 1–3 4–6 7–9 10+ 11 0 1–11 12–17 18–25 26+ 30 0 1 2–4 5–11 12+ 23
6 0 1 2 3 4 5+ 6 0 1–3 4–5 6–7 8–10 11+ 12 0 1–10 11–15 16–20 21–27 28+ 31 0 1 2–3 4–8 9–17 18+ 25
7 0 1 2 3 4 5 6+ 7 0 1–2 3–4 5–6 7–8 9–11 12+ 13 0 1–10 11–14 15–18 19–23 24–29 30+ 33 0 1 2–3 4–7 8–13 14–22 23+ 27
8 0 1 2 3 4 5 6 7+ 7 0 1–2 3 4–5 6–7 8–9 10–12 13+ 14 0 1–9 10–12 13–15 16–19 20–24 25–30 31+ 34 0 1 2–3 4–6 7–10 11–16 17–24 25+ 29
9 0 1 2 3 4 5 6 7 8+ 8 0 1–2 3 4 5 6–7 8–9 10–12 13+ 14 0 1–8 9–11 12–14 15–17 18–21 22–25 26–31 32+ 35 0 1 2 3–4 5–7 8–11 12–17 18–25 26+ 29
Number of Groups G Moderate λ and Low P Ma Moderate λ and Moderate P Ma Moderate λ and High P Ma Moderate λ and Unspecified P Ma
λ = [3, 10] and P = [.001, .4] λ = [3, 10] and P= [.4, .6] λ = [3, 10] and P = [0.6, 1] λ = [3, 10] and P= [0.001, 1]
3 0 1–5 6+ 8 0 1–5 6+ 8 0 1–5 6+ 8 0 1–5 6+ 8
4 0 1–4 5–8 9+ 10 0 1–4 5–8 9+ 10 0 1–4 5–8 9+ 10 0 1–4 5–8 9+ 10
5 0 1–3 4–6 7–9 10+ 11 0 1–3 4–6 7–9 10+ 11 0 1–3 4–6 7–9 10+ 11 0 1–3 4–6 7–9 10+ 11
6 0 1–3 4–5 6–7 8–10 11+ 12 0 1–3 4–5 6–7 8–10 11+ 12 0 1–3 4–5 6–7 8–10 11+ 12 0 1–3 4–5 6–7 8–10 11+ 12
7 0 1–2 3–4 5–6 7–8 9–11 12+ 13 0 1–2 3–4 5–6 7–8 9–11 12+ 13 0 1–2 3–4 5–6 7–8 9–11 12+ 13 0 1–2 3–4 5–6 7–8 9–11 12+ 13
8 0 1–2 3 4–5 6–7 8–9 10–12 13+ 14 0 1–2 3 4–5 6–7 8–9 10–12 13+ 14 0 1–2 3 4–5 6–7 8–9 10–12 13+ 14 0 1–2 3 4–5 6–7 8–9 10–12 13+ 14
9 0 1–2 3 4 5 6–7 8–9 10–12 13+ 14 0 1–2 3 4 5 6–7 8–9 10–12 13+ 14 0 1–2 3 4 5 6–7 8–9 10–12 13+ 14 0 1–2 3 4 5 6–7 8–9 10–12 13+ 14
a

The lowest M needed to find the global optimal grouping scheme.

Based on results calculated by find.scheme, Table 3 shows optimal grouping schemes given combinations of the (maximum) number of groups N and prior distributions. The last entry under the lifetime drinking (ZIP) scenario is estimated by the following command:

find.scheme(M=35, N=9, density=function(x) 1, lambda.lwr=17.08, lambda.upr=25.17, p.lwr=0.63, p.upr=0.88, model="ZIP")

which yields the same optimal grouping scheme as

find.scheme(M=35, N=9, density=function(x) 1, lambda.lwr=17.08, lambda.upr=25.17)

We use a uniform distribution as the prior distribution in Table 3. It should be noted, however, that other continuous or discrete distributions can also be processed by the R function as the prior distribution. The third cell [0, 1, 2, 3–4, 5+] under the drinking in 30 days (Poisson) scenario means that the optimal grouping scheme is zero, once, twice, three and four times, and five times and more, given that the total number of groups is five and the range for λ's prior distribution is [1.34, 4.02]. Across different scenarios, the lowest M required to identify the global optimal scheme is also provided. As the search algorithm tolerates false rejection, the lowest M is often slightly larger than the lowest integer of the last right-censored group of the optimal scheme, and this difference becomes larger as λ increases. Across the eight scenarios in Table 3, the cutoff integers between two adjacent groups tend to concentrate on smaller integers if the parameter space of λ is close to zero (the binge drinking scenario). If the parameter space of λ stays close to zero (e.g., rare events), any grouping decision of small integers is not supported by the search algorithm as the maximum number of groups N increases (see N = 7 or 8 in the binge drinking scenario). This finding suggests that the GRC count response is inappropriate for collecting very rare count events. The cutoff integers tend to appear first around the mean of the parameter space of λ and then appear at other locations as N increases. The prior distributions in Table 4 are truncated (mean ± 3 standard deviations) Gaussian distributions whose means and standard deviations are provided in Table 2. Table 4 also demonstrates that the cutoff integers of grouping decisions often appear at integers around which λ has higher probability density. For the same combination of N and range of prior distributions, the optimal schemes listed in Tables 3 and 4 are virtually the same, suggesting that the search for optimal schemes is not sensitive to the choice of prior distributions. Table 5 lists optimal grouping schemes when λ is low, moderate, high, and unspecified for readers' reference. Because zero is contained in a separate group for the ZIP case in Tables 3–5, the optimal grouping scheme remains the same when λ is fixed but p varies.

Table 6.

Parameter Estimates for Different Grouping Schemes: Results From Simulation.

Parameters Used for Simulation Grouping Schemes λ̄MLE Standard Errors λ̄MLE Standard Errors λ̄MLE Standard Errors
(100)a (1,000)a (10,000)a
λ = 2.723 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 2.718 .179 2.719 .057 2.723 .018
Optimal [0, 1, 2, 3, 4, 5–6, 7+] 2.718 .166 2.719 .054 2.723 .017
Optimalb [0, 1–2, 3–4, 5+] 2.719 .176 2.72 .057 2.723 .018
λ = 8.750 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 8.746 .330 8.742 .106 8.749 .033
Optimal [0, 1–5, 6–7, 8–9, 10–11, 12–14, 15+] 8.743 .298 8.742 .096 8.749 .03
Optimalb [0, 1–7, 8–11, 12+] 8.746 .322 8.742 .105 8.749 .033
λ = 15.425 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 15.396 .502 15.409 .157 15.424 .048
Optimal [0, 1–10, 11–13, 14–15, 16–18, 19–21, 22+] 15.395 .403 15.409 .129 15.423 .039
Optimalb [0, 1–13, 14–18, 19+] 15.403 .436 15.413 .142 15.424 .043
λ = 0.774 Reference [0, 1, 2, 3–5, 6–9, 10+] 0.773 .088 0.772 .028 0.774 .009
Optimal [0, 1, 2, 3, 4, 5+] 0.773 .087 0.772 .028 0.774 .009
Optimalb [0, 1, 2+] 0.775 .090 0.773 .029 0.774 .009
λ= 1 Reference [0, 1, 2, 3–5, 6–9, 10+] 0.999 .102 0.998 .033 1 .010
Optimal [0, 1, 2, 3, 4, 5+] 0.999 .101 0.998 .032 1 .010
Optimalb [0, 1, 2+] 1.001 .106 0.998 .035 1 .010
λ = 3 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 2.996 .187 2.998 .060 3 .019
Optimal [0, 1, 2, 3, 4, 5–6, 7+] 2.996 .173 2.996 .055 3 .018
Optimalb [0, 1–2, 3–4, 5+] 3 .183 2.998 .059 3 .019
λ = 5 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 4.997 .240 4.993 .076 4.999 .024
Optimal [0, 1–2, 3–4, 5, 6–7, 8–9, 10+] 4.994 .225 4.994 .073 4.999 .023
Optimalb [0, 1–4, 5–7, 8+] 4.991 .240 4.994 .079 5 .025
λ = 10 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 9.993 .373 9.997 .122 10 .037
Optimal [0, 1–6, 7–8, 9–10, 11–12, 13–15, 16+] 9.986 .322 9.994 .105 10 .032
Optimalb [0, 1–8, 9–12, 13+] 9.99 .349 9.997 .114 10.001 .035
λ = 20 Reference [0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+] 19.988 .552 19.994 .182 19.999 .054
Optimal [0, 1–14, 15–17, 18–20, 21–23, 24–27, 28+] 19.966 .467 19.985 .146 20 .045
Optimalb [0, 1–17, 18–23, 24+] 19.971 .502 19.987 .159 20 .049
a

Each simulation is repeated 1,000 times to calculate estimates. Numbers in the parentheses are sample sizes used for simulation.

b

Optimal grouping schemes with fewer groups.

Table 4.

Estimated Optimal Grouping Schemes With Different Numbers of Groups: Normal Prior Distributions.

N Number of Groups Drinking in 30 Days (Poisson) Ma Drinking in 12 Months (Poisson) Ma Lifetime Drinking (Poisson) Ma Binge Drinking (Poisson) Ma
λ = [1.34, 4.02] λ = [5.12, 12.38] λ = [9.92, 20.93] λ = [0.39, 1.16]
3 0 1–3 4+ 4 0 1–8 9+ 11 0 1–15 16+ 18 0 1 2+
4 0 1–2 3–4 5+ 5 0 1–6 7–10 11+ 13 0 1–12 13–17 18+ 21 0 1 2 3+ 3
5 0 1 2 3–4 5+ 6 0 1–6 7–9 10–12 13+ 14 0 1–11 12–15 16–19 20+ 22 0 1 2 3 4+ 4
6 0 1 2 3 4–5 6+ 6 0 1–5 6–7 8–9 10–12 13+ 15 0 1–10 11–13 14–16 17–20 21+ 23 0 1 2 3 4 5+ 5
7 0 1 2 3 4 5–6 7+ 7 0 1–4 5–6 7–8 9–10 11–13 14+ 15 0 1–9 10–12 13–15 16–18 19–22 23+ 24 0 1 2 3 4 5 6+ 6
8 0 1 2 3 4 5 6 7+ 8 0 1–4 5–6 7–8 9–10 11–12 13–15 16+ 16 0 1–9 10–12 13–14 15–16 17–19 20–23 24+ 25 0 1 2 3 4 5 6 7+ 7
9 0 1 2 3 4 5 6 7 8+ 9 0 1–3 4–5 6–7 8 9–10 11–12 13–15 16+ 17 0 1–8 9–11 12–13 14–15 16–17 18–20 21–23 24+ 26 0 1 2 3 4 5 6 7 8+ 8
N Number of Groups Drinking in 30 Days (ZIP) Ma Drinking in 12 Months (ZIP) Ma Lifetime Drinking (ZIP) Ma Binge Drinking (ZIP) Ma
λ = [4.38, 7.75] and P = [0.36, 0.59] λ = [9.76, 16.29] and P = [.58, .81] λ = [17.08, 25.17] and P = [.63, .88] λ = [1.98, 3.34] and P = [.22, .39]
3 0 1–6 7+ 8 0 1–13 14+ 15 0 1–21 22+ 24 0 1–3 4+ 4
4 0 1–4 5–7 8+ 9 0 1–10 11–15 16+ 18 0 1–18 19–24 25+ 27 0 1–2 3–4 5+ 5
5 0 1–4 5–6 7–9 10+ 10 0 1–9 10–13 14–17 18+ 19 0 1–16 17–21 22–26 27+ 29 0 1–2 3 4–5 6+ 6
6 0 1–3 4–5 6–7 8–9 10+ 11 0 1–8 9–11 12–14 15–18 19+ 20 0 1–15 16–19 20–23 24–27 28+ 30 0 1 2 3 4–5 6+ 6
7 0 1–3 4–5 6 7–8 9–10 11+ 12 0 1–8 9–11 12–13 14–16 17–19 20+ 21 0 1–14 15–18 19–21 22–24 25–28 29+ 31 0 1 2 3 4 5–6 7+ 7
8 0 1–2 3–4 5 6 7–8 9–10 11+ 12 0 1–7 8–10 11–12 13–14 15–16 17–19 20+ 22 0 1–14 15–17 18–20 21–23 24–26 27–30 31+ 32 0 1 2 3 4 5 6 7+ 8
9 0 1–2 3–4 5 6 7 8–9 10–11 12+ 12 0 1–7 8–9 10–11 12–13 14–15 16–17 18–20 21+ 22 0 1–13 14–16 17–19 20–21 22–23 24–26 27–30 31+ 33 0 1 2 3 4 5 6 7 8+ 8

Note: ZIP = zero-inflated Poisson.

a. The lowest M needed to find the global optimal grouping scheme.

To illustrate why an optimal grouping scheme is preferable to other grouping schemes, we use the grouping scheme adopted by the MTF binge drinking question as a reference grouping scheme and compare standard errors estimated under different grouping schemes. In Table 6, the first column lists the true parameters used to simulate the Poisson distributions; both parameters inferred from the data (see Table 2) and hypothetical parameters are used. The reference schemes are those adopted by the MTF study to measure alcohol drinking. The optimal schemes are generated by find.scheme with the same number of groups as the corresponding reference scheme. For each scenario, the simulation is repeated 1,000 times to calculate estimates.
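The logic of this comparison can be reproduced with a short R sketch. This is not the authors' simulation code: the sample size n, the number of replications, and the use of optimize() are illustrative assumptions, and the two hard-coded schemes are the λ = 5 reference and optimal schemes from Table 6. The sketch draws Poisson counts, collapses them under a grouping scheme, re-estimates λ from the grouped data, and reports the simulation standard error of the estimate.

set.seed(2018)

## Grouped Poisson log-likelihood. 'cuts' gives the first count in each group;
## the last group is right-censored ("cuts[K] or more"). Empty groups are skipped.
grouped_loglik <- function(lambda, counts_per_group, cuts) {
  upper <- c(cuts[-1] - 1, Inf)
  p <- ppois(upper, lambda) - ppois(cuts - 1, lambda)   # cell probabilities
  keep <- counts_per_group > 0
  sum(counts_per_group[keep] * log(p[keep]))
}

## Simulate counts, group them, maximize the grouped likelihood, and return the
## mean and simulation standard error of the estimates.
simulate_se <- function(lambda, cuts, n = 5000, reps = 200) {
  est <- replicate(reps, {
    y <- rpois(n, lambda)
    counts <- tabulate(findInterval(y, cuts), nbins = length(cuts))
    optimize(grouped_loglik, interval = c(0.01, 50), maximum = TRUE,
             counts_per_group = counts, cuts = cuts)$maximum
  })
  c(mean = mean(est), se = sd(est))
}

reference <- c(0, 1, 3, 6, 10, 20, 40)  # MTF-style scheme [0, 1-2, 3-5, 6-9, 10-19, 20-39, 40+]
optimal   <- c(0, 1, 3, 5, 6, 8, 10)    # optimal scheme for lambda = 5 from Table 6
simulate_se(5, reference)
simulate_se(5, optimal)                 # simulated standard error should be slightly smaller

Rerunning the sketch with a tenfold larger n should shrink the simulated standard errors by roughly √10, matching the pattern discussed below.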

When λ is small, the reference schemes appear to be acceptable, as their corresponding standard errors are only slightly larger than those calculated under the optimal grouping schemes. However, the differences between standard errors estimated from the reference schemes and those of the optimal schemes grow as λ increases. As expected, the standard errors decrease by a factor of about √10 when sample sizes increase by a factor of 10. Moreover, the strength of this algorithm can be illustrated by optimal schemes with fewer groups than the corresponding reference schemes. Compared with estimation based on the reference schemes, researchers could achieve almost the same, sometimes better, efficiency of estimation by adopting optimal schemes with even smaller numbers of groups N. In other words, an optimal grouping scheme can outperform a nonoptimal one even if the latter has more groups.
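For example, in the λ = 5 reference row of Table 6, the standard errors .240, .076, and .024 shrink by factors of .240/.076 ≈ 3.2 and .076/.024 ≈ 3.2 across the three sample sizes, close to √10 ≈ 3.16 and consistent with the usual √n rate.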

Discussion and Conclusion

This research applies optimal experimental design, a branch of semisupervised machine learning, to social science research and provides a novel algorithm to find the optimal grouping scheme of GRC count responses. One of the most striking features of social science research on survey methodology is the degree to which the design of response categories in survey questions has been neglected. Count responses with grouping and right censoring have long been collected by social scientists to study a variety of behaviors, status, and attitudes. Yet there has been little research on optimal designs for discrete response categories, so grouping or right-censoring decisions often rely on arbitrary choices of survey investigators. To search for optimal grouping schemes, this article first uses Poisson multinomial mixture models to conceptualize the data-generating process of count data with grouping and right censoring and then investigates the relationship between grouping-scheme choices and asymptotic distributions of the Poisson multinomial models. Using different types of optimalities in experimental designs (De Leon and Atkinson 1991), we investigate local objective functions of the Fisher information (matrix) and further demonstrate the possibility of optimal designs for GRC count responses: The optimal grouping scheme should maximize the global objective function of the Fisher information (matrix). We also propose a new three-step general algorithm to process infinitely many grouping schemes and identify the global optimal grouping scheme. To process all possible grouping schemes, this algorithm introduces a sufficiently large integer M, which is in theory the lowest integer contained in the right-censored group of the global optimal scheme. The introduction of M not only makes the search feasible but also limits the risk of falsely rejecting the global optimal grouping scheme. A new R package, GRCdata, is developed to implement this algorithm and help survey investigators assess grouping schemes of count responses. The use of the two R functions grcmle and find.scheme in GRCdata is illustrated with empirical examples of alcohol drinking. Results from data simulation show that the optimal designs yielded by this new algorithm considerably outperform existing designs: The optimal grouping scheme, even with a smaller total number of groups, can lead to more efficient estimation.
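To make the objective of this search concrete, the R sketch below computes the Fisher information of λ under a grouped, right-censored Poisson scheme and scores every candidate scheme whose interior cut points do not exceed a user-chosen bound M. This brute-force enumeration is an illustration only, not the three-step M algorithm or the GRCdata implementation, and the prior-averaged information used here is just one convenient Bayesian objective. For a single Poisson parameter, the A-, D-, and E-criteria all reduce to maximizing I(λ), so one objective suffices for the sketch.

## Fisher information (per observation) of lambda under a grouping scheme. The
## derivative of a cell probability over counts a..b telescopes to
## dpois(a - 1, lambda) - dpois(b, lambda), with the boundary terms set to zero.
fisher_grouped <- function(lambda, cuts) {
  upper <- c(cuts[-1] - 1, Inf)
  p <- ppois(upper, lambda) - ppois(cuts - 1, lambda)
  d_low <- d_up <- numeric(length(cuts))
  d_low[cuts >= 1] <- dpois(cuts[cuts >= 1] - 1, lambda)
  d_up[is.finite(upper)] <- dpois(upper[is.finite(upper)], lambda)
  sum((d_low - d_up)^2 / p)
}

## Prior-averaged information of a scheme, approximated with draws from a prior.
bayes_score <- function(cuts, prior_draws) {
  mean(sapply(prior_draws, fisher_grouped, cuts = cuts))
}

## Enumerate all schemes with K groups whose interior cut points lie in 1..M.
find_scheme_bruteforce <- function(K, M, prior_draws) {
  schemes <- lapply(combn(M, K - 1, simplify = FALSE), function(cp) c(0, cp))
  scores  <- vapply(schemes, bayes_score, numeric(1), prior_draws = prior_draws)
  schemes[[which.max(scores)]]      # first count of each group; last group is censored
}

set.seed(1)
prior_draws <- abs(rnorm(200, mean = 0.8, sd = 0.1))   # illustrative normal prior on lambda
find_scheme_bruteforce(K = 6, M = 10, prior_draws)     # expect a scheme close to [0, 1, 2, 3, 4, 5+]

The authors' algorithm avoids this exhaustive enumeration; the sketch only makes explicit what quantity a grouping scheme is being chosen to maximize.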

The M algorithm and software programs presented in this research readily provide survey investigators with a new tool for evaluating grouping and right-censoring decisions of count responses in surveys. While survey methodologists do need to take a series of factors (e.g., the coherence of response categories over time and across questions or whether a specific count is of research interest or has substantive meaning) into account when designing response categories (Schaeffer and Dykema 2011), the new R package allows scholars to incorporate their prior knowledge into optimal designs of survey questions. Although this research only addresses Poisson and zero-inflated Poisson models of count data, the application of the M search algorithm is not restricted to these two statistical models and can be extended to other models of count data, such as negative binomial models and hurdle models. If the assumption that the Fisher information increases with a finer grouping scheme holds for other discrete or continuous data-generating processes, this M algorithm can be employed for designing survey responses in general. Such potential applications of this algorithm to broader issues in survey methodology merit further attention.
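As a rough indication of how such an extension might look (an illustration only, not a feature of GRCdata), the grouped-information calculation sketched above carries over once the Poisson cell probabilities are replaced by those of another count model. The snippet below uses negative binomial cell probabilities with a fixed dispersion parameter, so the information is a scalar for the mean; a full treatment would work with the 2 × 2 information matrix.

## Hypothetical negative binomial variant: same objective, different cell probabilities.
cell_probs_nb <- function(mu, cuts, size) {
  upper <- c(cuts[-1] - 1, Inf)
  pnbinom(upper, size = size, mu = mu) - pnbinom(cuts - 1, size = size, mu = mu)
}

fisher_grouped_nb <- function(mu, cuts, size, h = 1e-5) {
  p  <- cell_probs_nb(mu, cuts, size)
  dp <- (cell_probs_nb(mu + h, cuts, size) - cell_probs_nb(mu - h, cuts, size)) / (2 * h)
  sum(dp^2 / p)   # information for the mean, holding the dispersion fixed
}

fisher_grouped_nb(mu = 5, cuts = c(0, 1, 3, 5, 8), size = 2)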

Acknowledgment

The authors would like to thank Junhui Wang, Jiahua Chen, Sayan Mukherjee, Li Ma, Ding-Xuan Zhou, Tim Liao, Zheng Wu, Nan Lin, Linda K. George, Yanlong Zhang, Yandong Zhao, and seminar/conference participants at the University of Victoria, Shanghai University, the 2013 Joint Statistical Meetings (Montreal, Canada), and the 2015 Methodology Section Mid-year Meeting of the American Sociological Association (San Diego, USA) for their helpful comments.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors gratefully acknowledge the financial support from the Research Grants Council of Hong Kong (ECS Project No. PolyU 25301115), the Hampton New Faculty Award at The University of British Columbia, the Chiang Ching-kuo Foundation for International Scholarly Exchange, and a 2015 Major Project of the National Social Sciences Foundation in China (grant no. 15ZDB172).

Biography

Qiang Fu is an assistant professor of sociology at The University of British Columbia. His methodological research interests include the application of machine-learning tools in social sciences, demographic methods, and social network analysis, while his substantive interests focus on urban studies, social networks, health, and China.

Xin Guo is an assistant professor in the Department of Applied Mathematics at The Hong Kong Polytechnic University. His research interests include statistical learning theory (kernel methods, support vector machines, error analysis, sparsity analysis, and the implementation of algorithms) and computational social science.

Kenneth C. Land is the John Franklin Crowell Professor Emeritus of Sociology and research professor in the Social Science Research Institute at Duke University. He is an elected fellow of the American Statistical Association and was the 1997 recipient of the Paul F. Lazarsfeld Award from the Methodology Section of the American Sociological Association. His research interests are in the development of mathematical and statistical models and methods for substantive applications in demography, criminology, and social indicators/quality-of-life studies.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

1. The proof follows Theorems 6.3.7 and 6.3.10 in Lehmann and Casella (1998) and is available upon request.

References

1. Akers Ronald L., La Greca Anthony J., Cochran John, and Sellers Christine. 1989. "Social Learning Theory and Alcohol Behavior Among the Elderly." The Sociological Quarterly 30:625–38.
2. Atkinson Anthony, Donev Alexander, and Tobias Randall. 2007. Optimum Experimental Designs, With SAS. Oxford, UK: Oxford University Press.
3. Bachman Jerald G., Johnston Lloyd D., and O'Malley Patrick M. 1990. "Explaining the Recent Decline in Cocaine Use Among Young Adults: Further Evidence That Perceived Risks and Disapproval Lead to Reduced Drug Use." Journal of Health and Social Behavior 31:173–84.
4. Bailey Susan L., Flewelling Robert L., and Rachal J. Valley. 1992. "Predicting Continued Use of Marijuana Among Adolescents: The Relative Influence of Drug-specific and Social Context Factors." Journal of Health and Social Behavior 33:51–65.
5. Barnes Grace M., Hoffman Joseph H., Welte John W., Farrell Michael P., and Dintcheff Barbara A. 2006. "Effects of Parental Monitoring and Peer Deviance on Substance Use and Delinquency." Journal of Marriage and Family 68:1084–104.
6. Basu Bharati and Famoye Felix. 2004. "Domestic Violence Against Women, and Their Economic Dependence: A Count Data Analysis." Review of Political Economy 16:457–72.
7. Blazer Dan, Burchett Bruce, George Linda K., and Service Connie. 1991. "The Association of Age and Depression Among the Elderly: An Epidemiologic Exploration." Journal of Gerontology 46:M210–15.
8. Bollen Kenneth A. 1990. "Political Democracy: Conceptual and Measurement Traps." Studies in Comparative International Development 25:7–24.
9. Bradburn Norman M., Sudman Seymour, and Wansink Brian. 2004. Asking Questions: The Definitive Guide to Questionnaire Design–for Market Research, Political Polls, and Social and Health Questionnaires. San Francisco, CA: John Wiley & Sons.
10. Chaloner Kathryn and Verdinelli Isabella. 1995. "Bayesian Experimental Design: A Review." Statistical Science 10:273–304.
11. Cheibub Jose Antonio, Przeworski Adam, Limongi Neto Fernando Papaterra, and Alvarez Michael M. 1996. "What Makes Democracies Endure?" Journal of Democracy 7:39–55.
12. Chernoff Herman. 1953. "Locally Optimal Designs for Estimating Parameters." The Annals of Mathematical Statistics 24:586–602.
13. Cohn David A., Ghahramani Zoubin, and Jordan Michael I. 1996. "Active Learning With Statistical Models." Journal of Artificial Intelligence Research 4:129–45.
14. De Leon, Ponce AC, and Atkinson Anthony C. 1991. "Optimum Experimental Design for Discriminating Between Two Rival Models in the Presence of Prior Information." Biometrika 78:601–08.
15. Dette Holger, Melas Viatcheslav B., and Pepelyshev Andrey. 2004. "Optimal Designs for a Class of Nonlinear Regression Models." Annals of Statistics 32:2142–67.
16. Elkins Zachary. 2000. "Gradations of Democracy? Empirical Tests of Alternative Conceptualizations." American Journal of Political Science 44:293–300.
17. Fu Qiang, Land Kenneth C., and Lamb Vicki L. 2013. "Bullying Victimization, Socioeconomic Status and Behavioral Characteristics of 12th Graders in the United States, 1989 to 2009: Repetitive Trends and Persistent Risk Differentials." Child Indicators Research 6:1–21.
18. Fu Qiang, Land Kenneth C., and Lamb Vicki L. 2016. "Violent Physical Bullying Victimization at School: Has There Been a Recent Increase in Exposure or Intensity? An Age-period-cohort Analysis in the United States, 1991 to 2012." Child Indicators Research 9:485–513.
19. Fu Qiang, Guo Xin, and Land Kenneth C. 2018. "A Poisson-multinomial Mixture Approach to Grouped and Right-censored Counts." Communications in Statistics-Theory and Methods 47:427–447.
20. Goodman Leo A. 1987. "New Methods for Analyzing the Intrinsic Character of Qualitative Variables Using Cross-classified Data." American Journal of Sociology 93:529–83.
21. Groves Robert M., Fowler Floyd J. Jr, Couper Mick P., Lepkowski James M., Singer Eleanor, and Tourangeau Roger. 2011. Survey Methodology. Hoboken, NJ: John Wiley & Sons.
22. Hagan John, Shedd Carla, and Payne Monique R. 2005. "Race, Ethnicity, and Youth Perceptions of Criminal Injustice." American Sociological Review 70:381–407.
23. Hall Daniel B. 2000. "Zero-inflated Poisson and Binomial Regression With Random Effects: A Case Study." Biometrics 56:1030–39.
24. Horn Roger A. and Johnson Charles R. 2013. Matrix Analysis. 2nd ed. Cambridge, UK: Cambridge University Press.
25. Klein Nadja, Kneib Thomas, and Lang Stefan. 2015. "Bayesian Generalized Additive Models for Location, Scale, and Shape for Zero-inflated and Overdispersed Count Data." Journal of the American Statistical Association 110:405–19.
26. Lambert Diane. 1992. "Zero-inflated Poisson Regression, With an Application to Defects in Manufacturing." Technometrics 34:1–14.
27. Lang Joseph B. 2004. "Multinomial-Poisson Homogeneous Models for Contingency Tables." Annals of Statistics 32:340–83.
28. Lehmann Erich Leo and Casella George. 1998. Theory of Point Estimation. New York: Springer-Verlag.
29. Lord Dominique, Washington Simon P., and Ivan John N. 2005. "Poisson, Poisson-gamma and Zero-inflated Regression Models of Motor Vehicle Crashes: Balancing Statistical Fit and Theory." Accident Analysis & Prevention 37:35–46.
30. Marsden Peter V. 2003. "Interviewer Effects in Measuring Network Size Using a Single Name Generator." Social Networks 25:1–16.
31. Minkin Salomon. 1987. "Optimal Designs for Binary Data." Journal of the American Statistical Association 82:1098–103.
32. Nguyen Nam-Ky and Miller Alan J. 1992. "A Review of Some Exchange Algorithms for Constructing Discrete D-optimal Designs." Computational Statistics & Data Analysis 14:489–98.
33. Paik Anthony and Sanchagrin Kenneth. 2013. "Social Isolation in America: An Artifact." American Sociological Review 78:339–60.
34. Puig Pedro and Valero Jordi. 2006. "Count Data Distributions: Some Characterizations With Applications." Journal of the American Statistical Association 101:332–40.
35. Pukelsheim Friedrich. 1993. Optimal Design of Experiments. New York: John Wiley & Sons.
36. Radloff Lenore Sawyer. 1977. "The CES-D Scale: A Self-report Depression Scale for Research in the General Population." Applied Psychological Measurement 1:385–401.
37. Reardon Sean F. and Raudenbush Stephen W. 2006. "A Partial Independence Item Response Model for Surveys With Filter Questions." Sociological Methodology 36:257–300.
38. Schaeffer Nora Cate and Dykema Jennifer. 2011. "Questions for Surveys: Current Trends and Future Directions." Public Opinion Quarterly 75:909–61.
39. Schaeffer Nora Cate and Presser Stanley. 2003. "The Science of Asking Questions." Annual Review of Sociology 29:65–88.
40. Schwarz Norbert, Hippler Hans-Juergen, Deutsch Brigitte, and Strack Fritz. 1985. "Response Categories: Effects on Behavioural Reports and Comparative Judgements." Public Opinion Quarterly 49:388–95.
41. Settles Burr. 2010. Active Learning Literature Survey. Madison: University of Wisconsin-Madison.
42. Smith Herbert L. and Garnier Maurice A. 1986. "Association Between Background and Educational Attainment in France." Sociological Methods & Research 14:317–44.
43. Steinberg David M. and Hunter William G. 1984. "Experimental Design: Review and Comment." Technometrics 26:71–97.
44. Straus Murray Arnold, Gelles Richard J., and Smith Christine. 1990. Physical Violence in American Families: Risk Factors and Adaptations to Violence in 8,145 Families. New Brunswick, NJ: Transaction.
45. Sudman Seymour, Bradburn Norman M., and Schwarz Norbert. 1996. Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco, CA: Jossey-Bass.
46. Thoits Peggy A. and Hewitt Lyndi N. 2001. "Volunteer Work and Well-being." Journal of Health and Social Behavior 42:115–31.
47. Wong Raymond Sin-Kwok. 2010. Association Models. Thousand Oaks, CA: Sage.
