Optimizing Count Responses in Surveys: A Machine-learning Approach

Qiang Fu; Xin Guo; Kenneth C Land

doi:10.1177/0049124117747302

Author manuscript; available in PMC: 2021 Aug 1.

Published in final edited form as: Sociol Methods Res. 2018 Jan 30;49(3):637–671. doi: 10.1177/0049124117747302

PMCID: PMC8034261 NIHMSID: NIHMS1032001 PMID: 33840866

Abstract

Count responses with grouping and right censoring have long been used in surveys to study a variety of behaviors, status, and attitudes. Yet grouping or right-censoring decisions of count responses still rely on arbitrary choices made by researchers. We develop a new method for evaluating grouping and right-censoring decisions of count responses from a (semisupervised) machine-learning perspective. This article uses Poisson multinomial mixture models to conceptualize the data-generating process of count responses with grouping and right censoring and demonstrates the link between grouping-scheme choices and asymptotic distributions of the Poisson mixture. To search for the optimal grouping scheme maximizing objective functions of the Fisher information (matrix), an innovative three-step M algorithm is then proposed to process infinitely many grouping schemes based on Bayesian A-, D-, and E-optimalities. A new R package is developed to implement this algorithm and evaluate grouping schemes of count responses. Results show that an optimal grouping scheme not only leads to a more efficient sampling design but also outperforms a nonoptimal one even if the latter has more groups.

Keywords: survey methodology, optimality, experimental design, search algorithm, machine learning, Fisher information, zero inflation, right censoring, Poisson distribution

The design of count responses in surveys is a common yet understudied topic in social sciences. Although the collection of exact counts of frequencies or incidence in social, epidemiological, and demographic surveys (e.g., number of births, frequencies of delinquent behaviors, incidents of diseases, and counts of social contacts) is analytically appealing, actual count responses in survey questions often consist of grouped counts (e.g., one response category “3–4 times” instead of two separate “3 times” and “4 times” categories) or are right-censored (e.g., the upper end response category as “6 or more times”). In fact, such grouped and right-censored (GRC) count responses have long been adopted by social scientists to study a range of behaviors, events, and attitudes (Akers et al. 1989; Bachman, Johnston, and O’Malley 1990; Bailey, Flewelling, and Valley Rachal 1992; Barnes et al. 2006; Basu and Famoye 2004; Fu, Land, and Lamb 2013, 2016; Hagan, Shedd, and Payne 2005; Marsden 2003; Reardon and Raudenbush 2006; Schaeffer and Dykema 2011; Straus, Gelles, and Smith 1990; Thoits and Hewitt 2001). Scholars often find that GRC count responses are useful to study sensitive research topics (e.g., juvenile delinquency, domestic violence, and drug use) or to solicit information from respondents with less cognitive capacity (e.g., young adolescents or the oldest old). For example, one nationally representative survey project in the United States, the Monitoring the Future (MTF) study (or the National High School Senior Survey), has used GRC count responses to track annual trends of delinquency and substance use among U.S. high school seniors since 1975. Such GRC count responses have also been used by the National Longitudinal Study of Adolescent Health (Add Health) to study adolescent behaviors at home, school, or neighborhood.

As documented in existing literature (Bradburn, Sudman, and Wansink 2004; Schwarz et al. 1985), the design of GRC count responses has a direct impact on the estimation of behavioral or cognitive frequencies. For example, an experimental study shows that the choice of right-censored count categories influences the estimation of TV-watching time (Schwarz et al. 1985). Yet the design of GRC count responses is still arbitrarily determined by survey investigators. This practice is surprising, given the abundant presence of count responses with either right censoring or grouping or both in surveys. Under certain scenarios, determining the optimal grouping scheme of GRC count responses has been implemented by sophisticated statistical procedures and context-specific research designs, depending on the other variables of interest. For example, the intrinsic or contingent ordering of log-multiplicative association models may provide the optimal grouping scheme if conditional or joint distributions of variables in contingency tables are provided (Goodman 1987; Smith and Garnier 1986; Wong 2010). Likewise, given the extensive debates over the conceptualizations of gradations of democracy (Bollen 1990; Cheibub et al. 1996), it is found that the validity of dichotomous and graded measures of democracy can be evaluated by projecting their qualitative difference onto two essential indicators related to democracy: international conflict and regime stability (Elkins 2000). Although these innovative studies provide useful tools for scholars to assess grouping decisions for counts that are intrinsic to specific research questions at stake, their statistical procedures or research designs require additional information on the distribution of the ungrouped outcome variable and its association with other variables. 
Nevertheless, the use of GRC count responses often means that investigators have yet to understand the distribution of counts that are extrinsic to a specific research question, let alone its association with other variables. A search algorithm for the optimal grouping scheme focusing exclusively on the outcome variable per se rather than its research context is therefore useful and readily facilitates the evaluation of alternative grouping schemes with different a priori assumptions.

Applying the theory of optimal experimental designs (Atkinson, Donev, and Tobias 2007; De Leon and Atkinson 1991; Dette, Melas, and Pepelyshev 2004; Minkin 1987), we propose an innovative three-step algorithm for searching the parameter space and generating optimal grouping decisions for GRC count responses. In the machine-learning literature, optimal experimental design is also referred to as a special case of semisupervised machine learning (or active learning) because a learning/search algorithm interacts with users (survey investigators for the current research) to obtain optimal outputs from the parameter space (Cohn, Ghahramani, and Jordan 1996; Settles 2010). Based on a Poisson multinomial mixture distribution, this article begins with configuring the data-generating process of GRC count responses and develops related maximum likelihood estimators. Two members of the Poisson family of frequency distributions, the Poisson and zero-inflated Poisson (ZIP), are studied in detail. Combined with prior Poisson distribution parameters, the Fisher information (matrix) of the maximum likelihood estimator is then employed to implement a new M search algorithm using Bayesian A-, D-, and E-optimalities. An R package, GRCdata (version 1.0), currently consisting of two functions, find.scheme and grcmle, has been written to assess the grouping decisions of count responses.

GRC Count Data

Before discussing optimal designs for GRC count data, the question arises as to why such response categories have been adopted by social scientists. As examples, in various surveys, respondents are asked to list their numbers of close friends, weekly frequencies of alcohol intake, incidents of criminal victimization in the past six months, times of illness in the last year, and lifetime history of residential moves. Admittedly, a precise enumeration of exact counts is methodologically appealing for two reasons. First, exact counts can be readily analyzed by existing statistical tools (e.g., Poisson regression models) and software packages. Second, survey investigators do not have to deal with arbitrary grouping or right-censoring decisions. Yet one major problem encountered by survey investigators is that the precise enumeration of counts can impose a cognitive burden on interviewees and sometimes leads to excessive missing data. In other words, the GRC data structure is a compromise between what survey investigators want and what respondents are willing or able to offer (Groves et al. 2011; Schaeffer and Presser 2003). For example, although medical sociologists and psychiatrists would like to know exactly how many days in the past week respondents experienced a variety of depressive symptoms, respondents, especially those with depressive symptoms, often get frustrated when required to distinguish between, for example, two days and three days. Thus, the Center for Epidemiological Studies-Depression (CES-D) scale, an established self-report depression measure, offers four grouped response categories: less than one day, one to two days, three to four days, and five to seven days (Radloff 1977). For a study on older adults aged 65 and above, a pretest showed that respondents were unwilling to answer even the four grouped response categories of the CES-D scale, so researchers had to further collapse the four grouped categories and used a dichotomous measure instead (Blazer et al. 1991).

Likewise, for research topics that are perceived as sensitive or less socially desirable, such as personal income, number of sex partners, incidents of delinquent behaviors, and history of drug use, respondents feel more comfortable in reporting grouped or right-censored categories instead of exact numbers (Sudman, Bradburn, and Schwarz 1996). It is not surprising that most, if not all, questions related to the frequency of juvenile delinquency and drug use in both the MTF study and the Add Health adopted GRC count responses.

Even if respondents are willing to collaborate, the difficulty in recalling the exact number of events that happened some time (e.g., several months) ago makes the exact number of events unreliable and introduces additional measurement errors (Groves et al. 2011; Schaeffer and Presser 2003). Similarly, if listing the total number of events requires extra efforts during field interview, fatigue of interviewers can result in underreported numbers of events. For example, as interviewers were instructed to probe for more discussion partners, it has been shown that interviewer effects (e.g., the failure to elicit more private network data) contributed to the extensive debates concerning increasing social isolation in the United States (Paik and Sanchagrin 2013).

Generating GRC Count Data

In order to define the optimality for objective functions of GRC grouping schemes, we first configure a data-generating process for GRC count responses. The Poisson distribution is often used to model count data with probability mass:

f (y | λ) = e^{- λ} \frac{λ^{y}}{y!}, y = 0, 1, 2, \dots,

(1)

where y is a random count variable and λ is both the mean and the variance of the Poisson distribution. To define a Poisson-based likelihood function for GRC count data, we propose a data-generating scheme in the form of a Poisson multinomial process. Similar Poisson multinomial models were previously used to study contingency tables and traffic accidents (Lang 2004; Lord, Washington, and Ivan 2005).

We let $G = \{I_{j}\}_{j = 1}^{N}$ denote a GRC grouping scheme with N groups (i.e., the total number of response categories) and consecutive subsets I₁, …, I_N of nonnegative integers {0, 1, 2, …}. For independent and identically distributed observations x_i from a Poisson(λ) distribution, we define

α_{j} (x_{i}) = \begin{cases} 1, & \text{when } x_{i} \in I_{j}, \\ 0, & \text{otherwise.} \end{cases}

(2)

In other words, we have an N-dimensional random vector (α₁, …, α_N) denoting the GRC count responses. For example, (α_{0}, α_{1}, α_{2}, α_{3−5}, α_{6−9}, α_{10+}) denotes the GRC count response categories never, once, twice, 3–5 times, 6–9 times, and 10 and more times. Note that for any given observation in a survey sample, one and only one component α_j of (α₁, …, α_N) equals 1. This N-dimensional vector then has a multinomial distribution M(1, θ₁, …, θ_N), where the parameters θ_j depend on the parameter λ of the underlying Poisson(λ) distribution:

θ_{j} (λ) = \sum_{y \in I_{j}} e^{- λ} \frac{λ^{y}}{y!}, j = 1, \dots, N .

(3)

For example, the multinomial distribution corresponding to (α_{0}, α_{1}, α_{2}, α_{3−5}, α_{6−9}, α_{10+}) is

M (1, e^{- λ}, e^{- λ} λ, e^{- λ} \frac{λ^{2}}{2}, \sum_{y = 3}^{5} e^{- λ} \frac{λ^{y}}{y!}, \sum_{y = 6}^{9} e^{- λ} \frac{λ^{y}}{y!}, \sum_{y = 10}^{\infty} e^{- λ} \frac{λ^{y}}{y!}) .
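As a minimal sketch of equation (3), the group probabilities θ_j(λ) can be computed by summing Poisson masses over each group. The function name `group_probs`, the representation of a scheme by the first integer of each group (`starts`), and the truncation point `kmax` are illustrative assumptions and are not taken from the GRCdata package.

```python
import math

def group_probs(starts, lam, kmax=200):
    """theta_j(lam) of equation (3) for groups given by their first integers.

    Assumption: the right-censored last group is summed only up to kmax,
    which is adequate when lam is far below kmax.
    """
    # Poisson masses on the log scale to avoid overflow in lam**k and k!.
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    return [sum(pm[lo:nxt]) for lo, nxt in zip(edges[:-1], edges[1:])]

# Categories: never, once, twice, 3-5 times, 6-9 times, 10 or more times.
theta = group_probs([0, 1, 2, 3, 6, 10], lam=2.5)
```

The first three entries of `theta` are the single Poisson masses at 0, 1, and 2, and the last entry is the right-censored tail.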

The probability mass function of α(X) = (α₁(X), …, α_N(X)) is also given:

f (α | λ) = θ_{1}^{α_{1}} θ_{2}^{α_{2}} \dots θ_{N}^{α_{N}} .

If there are n independent observations $\{x_{i}\}_{i = 1}^{n}$ drawn from the Poisson(λ) distribution, the likelihood function is defined using the probability mass function of the Poisson multinomial distribution:

L (λ) = \prod_{i = 1}^{n} f (α (x_{i}) | λ) .

(4)

Because this likelihood function derives from a Poisson multinomial distribution, it is easy to show that the corresponding maximum likelihood estimator is consistent and asymptotically normal. More importantly, the variance of its asymptotic distribution is given by the inverse of the Fisher information.¹ In other words, any consistent sequence ${\hat{λ}}_{n}$ of roots of the likelihood in equation (4) satisfies $\sqrt{n} ({\hat{λ}}_{n} - λ_{0}) \to N (0, 1 / I (λ_{0}))$ in distribution, where I(λ₀) is the Fisher information corresponding to a specific grouping scheme, and λ₀ is the underlying true parameter.
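To make the likelihood in equation (4) concrete, the sketch below writes its logarithm in terms of the number of responses falling in each group and maximizes it over a grid of λ values. This is an illustration only: the helper names, the grid search, and the truncation point `kmax` are assumptions, and the code is not the `grcmle` function of the GRCdata package.

```python
import math

def group_probs(starts, lam, kmax=200):
    """theta_j(lam) of equation (3); the censored tail is summed up to kmax."""
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    return [sum(pm[lo:nxt]) for lo, nxt in zip(edges[:-1], edges[1:])]

def grouped_loglik(counts, starts, lam):
    """Log of likelihood (4): sum over groups of n_j * log(theta_j(lam))."""
    theta = group_probs(starts, lam)
    return sum(n * math.log(t) for n, t in zip(counts, theta) if n > 0)

def grid_mle(counts, starts, grid):
    """Grid value of lam with the largest grouped log-likelihood."""
    return max(grid, key=lambda lam: grouped_loglik(counts, starts, lam))

starts = [0, 1, 3, 6]                                   # 0, 1-2, 3-5, 6 or more
counts = [1000 * t for t in group_probs(starts, 2.5)]   # idealized expected counts
grid = [0.05 * i for i in range(1, 201)]                # 0.05, 0.10, ..., 10.0
lam_hat = grid_mle(counts, starts, grid)
```

Because the counts here are set to their expected values under λ = 2.5, the grid maximizer lands at (or within one grid step of) 2.5; with real survey counts the estimate would vary from sample to sample.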

The Poisson distribution assumes that its mean λ equals its variance. However, this assumption is violated if empirical frequency distributions show excess zeros relative to a Poisson distribution (Hall 2000; Klein, Kneib, and Lang 2015; Lambert 1992; Puig and Valero 2006). The ZIP distribution takes excess zeros into account and has the probability mass function:

f (y | λ, p) = \begin{cases} 1 - p + p e^{- λ}, & \text{when } y = 0, \\ p e^{- λ} \frac{λ^{y}}{y!}, & \text{when } y > 0, \end{cases}

(5)

where p is the proportion of population exposed to the Poisson(λ) distribution.

Using the same definition of α_j as in equation (2), the GRC data α_j(X_i) now have a different multinomial distribution M(1, μ₁, …, μ_N), where

μ_{1} (λ, p) = 1 - p + p \sum_{y \in I_{1}} e^{- λ} \frac{λ^{y}}{y!}, \text{ and } μ_{j} (λ, p) = p \sum_{y \in I_{j}} e^{- λ} \frac{λ^{y}}{y!}, \text{ for } j = 2, \dots, N .

The probability mass function of α then depends on two parameters p and λ of the ZIP distribution, $f (α | λ, p) = μ_{1}^{α_{1}} \dots μ_{N}^{α_{N}}$. For independent observations $\{x_{i}\}_{i = 1}^{n}$, we have the likelihood function

L (λ, p) = \prod_{i = 1}^{n} f (α (x_{i}) | λ, p) .

(6)
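The ZIP group probabilities μ_j(λ, p) differ from the Poisson ones only in the first group, which absorbs the excess zeros. A minimal sketch follows, under the assumptions that the first group contains zero (`starts[0] == 0`) and that the censored tail can be truncated at a hypothetical `kmax`; the function name is illustrative.

```python
import math

def zip_group_probs(starts, lam, p, kmax=200):
    """mu_j(lam, p) for a grouping scheme; assumes starts[0] == 0."""
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    theta = [sum(pm[lo:nxt]) for lo, nxt in zip(edges[:-1], edges[1:])]
    mu = [p * t for t in theta]   # Poisson part, weighted by p
    mu[0] += 1.0 - p              # excess zeros fall into the first group
    return mu

mu = zip_group_probs([0, 1, 3, 6], lam=2.5, p=0.7)
```

Setting `p=1.0` recovers the Poisson group probabilities θ_j, which is a quick consistency check on any implementation.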

Again, based on Theorem 6.5.1 in Lehmann and Casella (1998), it is easy to show that the maximum likelihood estimators remain consistent and asymptotically normal for the ZIP case. The detailed proof of their asymptotic properties is given in Fu, Guo, and Land (2018). Estimators ${\hat{λ}}_{n}$ and ${\hat{p}}_{n}$ are asymptotically efficient in the sense that in distribution,

\sqrt{n} ({\hat{λ}}_{n} - λ_{0}) \to N (0, J_{22} / | J |),

\sqrt{n} ({\hat{p}}_{n} - p_{0}) \to N (0, J_{11} / | J |),

where J₁₁ and J₂₂ are the 1–1 and 2–2 entries of the Fisher information matrix J corresponding to a specific grouping scheme, and λ₀ and p₀ are the underlying true parameters. We will further illustrate and discuss the Fisher information matrix J in the following sections.

Optimal Designs for GRC Count Data

The foregoing demonstration that the asymptotic distributions of the maximum likelihood estimators of both the Poisson and the ZIP cases are characterized by the Fisher information (matrix) is important for defining Bayesian optimality of GRC grouping schemes, given the internal link between Fisher information (matrix) and grouping choices. For example, the Fisher information of the Poisson case depends on both the true unknown parameter λ₀ and the specific grouping scheme $G$: If we know the true parameter λ₀, the corresponding asymptotic distribution of the estimator ${\hat{λ}}_{n}$ is entirely determined by grouping choices in the sense that a better grouping scheme is associated with a smaller variance of the asymptotic distribution or a more efficient estimator. Although the search for an optimal grouping scheme is easier if the true parameter λ₀ is known or given, one task of optimal experimental designs is to take uncertainty of unknown parameters into account by incorporating prior knowledge (from experts, prior research, or pilot studies) into a general search algorithm. Next, we further investigate the relations between the Fisher information (matrix) and grouping choices. An objective function is then proposed to synthesize these relations and facilitate our subsequent discussion of a general three-step search algorithm.

Fisher Information and Grouping Choices: the Poisson Case

For the Poisson case, the Fisher information of the previous Poisson multinomial distribution with parameter λ and the grouping scheme $G = \{I_{j}\}_{j = 1}^{N}$ is

I (λ) = I_{G} (λ) = - E [\frac{d^{2}}{d λ^{2}} \log f (α | λ)] = - \sum_{j = 1}^{N} θ_{j} (λ) \frac{d^{2}}{d λ^{2}} \log θ_{j} (λ) = \sum_{j = 1}^{N} \frac{{(θ_{j}^{'})}^{2}}{θ_{j}} .

(7)
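The last equality in equation (7) can be checked in two short steps, using only equation (3) and the fact that the θ_j sum to one. The endpoint symbols a_j and b_j (the first and last integers of group I_j) are notation introduced here for illustration; the first term in the final bracket is read as zero when a_j = 0, and the second as zero when the group is right-censored (b_j = ∞).

```latex
% Step 1: expand the second derivative of log(theta_j) and sum with weights theta_j.
-\sum_{j=1}^{N} θ_{j} \frac{d^{2}}{dλ^{2}} \log θ_{j}
  = -\sum_{j=1}^{N} θ_{j}^{''} + \sum_{j=1}^{N} \frac{(θ_{j}^{'})^{2}}{θ_{j}}
  = \sum_{j=1}^{N} \frac{(θ_{j}^{'})^{2}}{θ_{j}},
\qquad \text{since } \sum_{j} θ_{j} = 1 \text{ implies } \sum_{j} θ_{j}^{''} = 0 .

% Step 2: the derivative of each group probability telescopes.
\frac{d}{dλ} \left( e^{-λ} \frac{λ^{y}}{y!} \right)
  = e^{-λ} \left( \frac{λ^{y-1}}{(y-1)!} - \frac{λ^{y}}{y!} \right)
\;\Longrightarrow\;
θ_{j}^{'}(λ) = e^{-λ} \left( \frac{λ^{a_{j}-1}}{(a_{j}-1)!} - \frac{λ^{b_{j}}}{b_{j}!} \right),
\qquad I_{j} = \{a_{j}, \dots, b_{j}\} .
```

The second step means that each θ_j′ depends only on the Poisson masses just below and at the upper end of its group, which keeps numerical evaluation of the Fisher information cheap.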

Equation (7) follows the definition of probability mass function θ_j in equation (3) and $\sum_{j = 1}^{N} θ_{j} = 1$. We next remark on the relationship between the Fisher information and grouping choices.

Remark 4.1: When N = 1, we have θ₁ ≡ 1 and thus I = 0. This corresponds to a trivial case where data provide no information for optimal designs. When N ≥ 2, it is easy to see I > 0, and the search for an optimal grouping scheme becomes possible.

Remark 4.2: While in empirical applications we restrict N to be finite, we can also let N = ∞ and make each group contain only one integer. This scenario is exactly the same as precise enumeration without any grouped counts. Under this circumstance, equation (7) shows that the Fisher information I is 1/λ, so the asymptotic variance is its inverse, λ, which is the asymptotic variance of the maximum likelihood estimator based on exact Poisson counts.

Remark 4.3: If we obtain a finer grouping scheme $G^{'}$ by dividing one or more groups of $G$ into subgroups, such a grouping scheme yields a larger Fisher information. To show this, let $θ = θ (λ) = \sum_{k = a + 1}^{c} e^{- λ} λ^{k} / k!$ be the probability corresponding to a particular group {a + 1, …, c} with a ≥ −1 and a + 1 < c. For a grouping scheme $G$ we see from equation (7) that this particular group contributes ${(θ^{'})}^{2} / θ$ to the overall Fisher information. Now, we divide this group into two subgroups {a + 1, …, b} and {b + 1, …, c} with a + 1 ≤ b and b + 1 ≤ c. For the new finer grouping scheme, the contribution of these two subgroups to the Fisher information is

\frac{{(θ_{*}^{'})}^{2}}{θ_{*}} + \frac{{(θ_{* *}^{'})}^{2}}{θ_{* *}},

where $θ_{*} = \sum_{k = a + 1}^{b} e^{- λ} λ^{k} / k!$ and $θ_{* *} = \sum_{k = b + 1}^{c} e^{- λ} λ^{k} / k!$. Here we note that θ = θ_* + θ_** and U²v(u + v) + V²u(u + v) ≥ uv(U + V)² for u, v > 0 and any real U, V. Substituting $U = θ_{*}^{'}$, u = θ_*, $V = θ_{* *}^{'}$, and v = θ_**, we have

\frac{{(θ_{*}^{'})}^{2}}{θ_{*}} + \frac{{(θ_{* *}^{'})}^{2}}{θ_{* *}} \geq \frac{{(θ^{'})}^{2}}{θ} .

(8)

In inequality (8), we note that the equality holds if and only if $θ_{*}^{'} θ_{* *} = θ_{* *}^{'} θ_{*}$ (i.e., Uv = uV). Next, we further demonstrate that $θ_{*}^{'} θ_{* *} \neq θ_{* *}^{'} θ_{*}$, so the equality in inequality (8) does not hold. If −1 < a < b < c < ∞, we have

\begin{aligned} e^{2 λ} (θ_{* *}^{'} θ_{*} - θ_{*}^{'} θ_{* *}) &= \sum_{k = a + 1}^{c} \frac{λ^{b} λ^{k}}{b! k!} - \sum_{k = b + 1}^{c} \frac{λ^{a} λ^{k}}{a! k!} - \sum_{k = a + 1}^{b} \frac{λ^{c} λ^{k}}{c! k!} \\ &= \sum_{k = 1}^{c - b} (\frac{1}{b! (a + k)!} - \frac{1}{a! (b + k)!}) λ^{a + b + k} + \sum_{k = 1}^{b - a - 1} (\frac{1}{b! (c - k)!} - \frac{1}{c! (b - k)!}) λ^{c + b - k} . \end{aligned}

Likewise, for special cases where a = −1 or c = ∞ we also have

e^{2 λ} (θ_{* *}^{'} θ_{*} - θ_{*}^{'} θ_{* *}) = \begin{cases} \sum_{k = 0}^{c} \frac{λ^{b} λ^{k}}{b! k!} - \sum_{k = 0}^{b} \frac{λ^{c} λ^{k}}{c! k!}, & \text{if } - 1 = a < b < c < \infty, \\ \sum_{k = a + 1}^{\infty} \frac{λ^{b} λ^{k}}{b! k!} - \sum_{k = b + 1}^{\infty} \frac{λ^{a} λ^{k}}{a! k!}, & \text{if } - 1 < a < b < c = \infty, \\ \sum_{k = 0}^{\infty} \frac{λ^{b} λ^{k}}{b! k!}, & \text{if } - 1 = a < b < c = \infty . \end{cases}

Given that a < b < c, the coefficients of the polynomial (or power series, when c = ∞) $e^{2 λ} (θ_{* *}^{'} θ_{*} - θ_{*}^{'} θ_{* *})$ that remain after like powers of λ are collected are positive in all cases discussed above (i.e., −1 < a < b < c < ∞, a = −1, or c = ∞). Since λ is also positive, $θ_{*}^{'} θ_{* *} - θ_{* *}^{'} θ_{*}$ cannot be zero and we have

\frac{{(θ_{*}^{'})}^{2}}{θ_{*}} + \frac{{(θ_{* *}^{'})}^{2}}{θ_{* *}} > \frac{{(θ^{'})}^{2}}{θ} .
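Remarks 4.1 to 4.3 can be checked numerically. The sketch below evaluates equation (7) for a scheme described by the first integer of each group, using the fact that differentiating equation (3) term by term gives θ_j′ as the Poisson mass just below the group minus the mass at its upper end. The function name, the `starts` representation, and the truncation point `kmax` are assumptions made for illustration.

```python
import math

def fisher_info(starts, lam, kmax=200):
    """Equation (7): sum over groups of (theta_j')^2 / theta_j.

    Assumption: the censored tail is truncated at kmax (lam far below kmax).
    """
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    info = 0.0
    for lo, nxt in zip(edges[:-1], edges[1:]):
        theta = sum(pm[lo:nxt])
        left = pm[lo - 1] if lo > 0 else 0.0         # mass just below the group
        right = pm[nxt - 1] if nxt <= kmax else 0.0  # mass at the upper end (0 if censored)
        if theta > 0.0:
            info += (left - right) ** 2 / theta
    return info

lam = 2.5
one_group = fisher_info([0], lam)            # Remark 4.1: a single group
exact = fisher_info(list(range(40)), lam)    # Remark 4.2: near-exact enumeration
coarse = fisher_info([0, 1, 3], lam)         # groups 0, 1-2, 3 or more
finer = fisher_info([0, 1, 2, 3], lam)       # 1-2 split into 1 and 2 (Remark 4.3)
```

Here `one_group` is zero, `exact` agrees with 1/λ up to rounding, and `finer` exceeds `coarse` while staying below 1/λ.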

We previously noted that the Fisher information of the Poisson case depends on both the specific grouping scheme $G$ and the true (unknown) parameter λ. Given the foregoing three remarks on the relationship between grouping choices and Fisher information, a probability function ρ can be defined to take prior knowledge of λ into account. In general, we define an objective function as

Ω_{P} (G) = \int_{0}^{\infty} I_{G} (λ) d ρ (λ),

where ρ is a continuous or discrete distribution. The introduction of ρ allows analysts to deal with the uncertainty in estimating the true parameter λ and explore optimal grouping schemes under different prior distributions. For example, if a survey investigator assumes that the true value of λ is known, ρ becomes a degenerate distribution with a point mass of 1 at λ₀ ∈ (0, ∞). For the continuous distribution case, we could specify a uniform distribution on [a, b] for ρ and obtain

Ω_{P} (G) = \frac{1}{b - a} \int_{a}^{b} I_{G} (λ) d λ .

ρ can also be specified as a discrete distribution supported on positive numbers λ₁, …, λ_n with probability masses q₁, …, q_n, respectively, and we have

Ω_{P} (G) = \sum_{j = 1}^{n} I_{G} (λ_{j}) q_{j}, \sum_{j = 1}^{n} q_{j} = 1.
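For a discrete prior, the objective function Ω_P is a weighted average of Fisher information values, which makes it straightforward to compare candidate schemes. The following sketch assumes a hypothetical three-point prior and two hypothetical four-group schemes; `fisher_info`, `omega_p`, and `kmax` are illustrative names, not functions of the GRCdata package.

```python
import math

def fisher_info(starts, lam, kmax=200):
    """Equation (7) for groups given by their first integers (tail cut at kmax)."""
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    info = 0.0
    for lo, nxt in zip(edges[:-1], edges[1:]):
        theta = sum(pm[lo:nxt])
        left = pm[lo - 1] if lo > 0 else 0.0
        right = pm[nxt - 1] if nxt <= kmax else 0.0
        if theta > 0.0:
            info += (left - right) ** 2 / theta
    return info

def omega_p(starts, prior):
    """Omega_P(G) for a discrete prior given as (lam_j, q_j) pairs, sum q_j = 1."""
    return sum(q * fisher_info(starts, lam) for lam, q in prior)

prior = [(1.5, 0.25), (2.5, 0.50), (4.0, 0.25)]   # hypothetical prior on lam
omega_a = omega_p([0, 1, 3, 6], prior)            # 0, 1-2, 3-5, 6 or more
omega_b = omega_p([0, 2, 4, 6], prior)            # 0-1, 2-3, 4-5, 6 or more
```

The scheme with the larger Ω_P is preferred under this prior. A uniform prior on [a, b] could be handled the same way by replacing the weighted sum with a numerical quadrature over λ.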

Fisher Information and Grouping Choices: The ZIP Case

Given that the ZIP distribution has two parameters p and λ, its corresponding Fisher information is denoted by a symmetric and positive semidefinite matrix:

J (λ, p) = J_{G} (λ, p) = [\begin{matrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{matrix}] = \sum_{j} \frac{1}{μ_{j}} [\begin{matrix} {(\frac{\partial μ_{j}}{\partial λ})}^{2} & \frac{\partial μ_{j}}{\partial λ} \frac{\partial μ_{j}}{\partial p} \\ \frac{\partial μ_{j}}{\partial λ} \frac{\partial μ_{j}}{\partial p} & {(\frac{\partial μ_{j}}{\partial p})}^{2} \end{matrix}],

where μ₁ = 1 − p + pθ₁, μ_j = pθ_j for j ≥ 2, and $θ_{j} = \sum_{y \in I_{j}} e^{- λ} λ^{y} / y!$ for j = 1, …, N. This matrix can also be expressed in terms of the Poisson Fisher information I(λ) of equation (7) as

J = [\begin{array}{cc} \frac{p (p - 1) {(θ_{1}^{'})}^{2}}{θ_{1} μ_{1}} + p I (λ) & - \frac{θ_{1}^{'}}{μ_{1}} \\ - \frac{θ_{1}^{'}}{μ_{1}} & \frac{1 - θ_{1}}{p μ_{1}} \end{array}],

(9)

where $I (λ) = \sum_{j = 1}^{N} {(θ_{j}^{'})}^{2} / θ_{j}$ . Next, we remark on the relationship between Fisher information and grouping choices for the ZIP case.

Remark 4.4: Optimal designs become impossible in the trivial case when N = 1. This scenario is similar to Remark 4.1, as the Fisher information matrix J becomes a zero matrix. Another trivial case appears when N = 2, in which the determinant of J is zero (note that $J_{11} (λ, p) = p {(θ_{1}^{'})}^{2} / (μ_{1} (1 - θ_{1}))$ in this case). Since the asymptotic distribution of $\sqrt{n} ({\hat{λ}}_{n} - λ_{0}, {\hat{p}}_{n} - p_{0})$ is now a degenerate distribution, optimal designs based on prior knowledge of both λ and p become impossible. When N ≥ 3, Remark 4.3 implies that $I (λ) > {(θ_{1}^{'})}^{2} / θ_{1} + {({(1 - θ_{1})}^{'})}^{2} / (1 - θ_{1})$ and the determinant of J is calculated as:

\begin{aligned} det (J) &= \frac{1 - θ_{1}}{μ_{1}} I (λ) + \frac{(p - 1) {(θ_{1}^{'})}^{2} (1 - θ_{1})}{θ_{1} μ_{1}^{2}} - \frac{{(θ_{1}^{'})}^{2}}{μ_{1}^{2}} \\ &= \frac{1 - θ_{1}}{μ_{1}} I (λ) - \frac{{(θ_{1}^{'})}^{2}}{θ_{1} μ_{1}} = \frac{1}{μ_{1}} ((1 - θ_{1}) I (λ) - \frac{{(θ_{1}^{'})}^{2}}{θ_{1}}) > 0. \end{aligned}

The Fisher information matrix J is therefore strictly positive definite when N ≥ 3 and thus can be used for optimal designs.
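The closed form in equation (9) and the determinant claims of Remark 4.4 can be verified numerically against the defining sum over groups. The sketch below builds the matrix both ways; it assumes the first group contains zero, truncates the censored tail at a hypothetical `kmax`, and uses illustrative function and variable names.

```python
import math

def zip_fisher_both(starts, lam, p, kmax=200):
    """Return (J from the defining sum, J from equation (9)); assumes starts[0] == 0."""
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    theta, dtheta = [], []
    for lo, nxt in zip(edges[:-1], edges[1:]):
        theta.append(sum(pm[lo:nxt]))
        left = pm[lo - 1] if lo > 0 else 0.0
        right = pm[nxt - 1] if nxt <= kmax else 0.0
        dtheta.append(left - right)               # theta_j'(lam)
    # Defining sum: (1 / mu_j) * outer product of the gradient of mu_j.
    j11 = j12 = j22 = 0.0
    for j, (t, d) in enumerate(zip(theta, dtheta)):
        mu = 1.0 - p + p * t if j == 0 else p * t
        d_lam = p * d                             # d mu_j / d lam
        d_p = t - 1.0 if j == 0 else t            # d mu_j / d p
        j11 += d_lam * d_lam / mu
        j12 += d_lam * d_p / mu
        j22 += d_p * d_p / mu
    by_sum = [[j11, j12], [j12, j22]]
    # Closed form of equation (9).
    info = sum(d * d / t for t, d in zip(theta, dtheta))
    mu1 = 1.0 - p + p * theta[0]
    c11 = p * (p - 1.0) * dtheta[0] ** 2 / (theta[0] * mu1) + p * info
    c12 = -dtheta[0] / mu1
    c22 = (1.0 - theta[0]) / (p * mu1)
    return by_sum, [[c11, c12], [c12, c22]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

J_sum, J_closed = zip_fisher_both([0, 1, 3, 6], lam=2.5, p=0.7)
J_two, _ = zip_fisher_both([0, 2], lam=2.5, p=0.7)   # N = 2: degenerate case
```

With four groups the two constructions agree and the determinant is positive; with two groups the determinant vanishes up to rounding, as Remark 4.4 states.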

For both the Poisson and the ZIP cases, we have demonstrated that asymptotic variances are given by the inverse of Fisher information (matrix). Given that in experimental designs an optimal design is often selected to yield the most efficient estimator (see, e.g., Steinberg and Hunter 1984), an optimal grouping scheme of GRC data should, according to the same principle, maximize Fisher information (matrices) and produce more (asymptotically) efficient estimators. Because there are multiple ways of ordering square matrices, we introduce a local objective function S to compare Fisher information matrices: Optimizing S will give a locally optimal design (Chernoff 1953), where local means that the design is optimal for a specific value of an unknown parameter (or vector). To illustrate the definition of S, we follow previous research on the Loewner partial order (see, e.g., Horn and Johnson 2013) and write J_* ⪰ J if J_* − J is positive semidefinite, where J_* and J are both strictly positive definite matrices.

Definition 4.5 (objective function): We define a local objective function of positive definite matrices (e.g., the Fisher information matrices) as any function S satisfying S(J_*) ≥ S(J) if J_* ⪰ J.

To maximize the Fisher information matrix and achieve more efficient estimation, we apply local objective functions based on three common optimality criteria (Horn and Johnson 2013; Steinberg and Hunter 1984): A-optimality: maximizing S_A = 1/tr(J⁻¹), where tr(J⁻¹) is the trace of J⁻¹; D-optimality: maximizing S_D = det(J); and E-optimality: maximizing S_E, where S_E is the minimum eigenvalue of J. If we assume that the two eigenvalues of J are e₁ and e₂, the A-, D-, and E-optimality designs maximize e₁e₂/(e₁ + e₂), e₁e₂, and min(e₁, e₂), respectively (Nguyen and Miller 1992). Note that S_A, S_D, and S_E satisfy the definition of local objective functions above (see, e.g., Horn and Johnson 2013:495). Among the three optimality criteria, A-optimality minimizes average asymptotic variances of all parameter estimates, D-optimality minimizes the generalized asymptotic variance (or the volume of the confidence ellipsoid under normality) of parameter estimates, and E-optimality minimizes the maximum asymptotic variance of the estimates of (components of) parameters. Because all three optimality criteria as information functions are isotonic with respect to the Loewner ordering (Pukelsheim 1993), results from simulations (not shown) suggest that optimal grouping schemes generated by the three methods are virtually the same. Nevertheless, we recommend the use of D-optimality due to its calculation simplicity. A- and E-optimalities may lead to more accurate results if there is a strong correlation between p and λ (Steinberg and Hunter 1984).
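For a 2 × 2 symmetric positive definite matrix, the three criteria have simple closed forms in terms of the trace and determinant. A minimal sketch, with an illustrative function name and hypothetical example matrices:

```python
import math

def optimality_criteria(J):
    """Return (S_A, S_D, S_E) for a symmetric positive definite 2x2 matrix J."""
    a, b = J[0]
    d = J[1][1]
    det = a * d - b * b
    trace = a + d
    s_a = det / trace            # 1 / tr(J^{-1}) = e1*e2 / (e1 + e2)
    s_d = det                    # e1 * e2
    # Smaller eigenvalue; the discriminant is written as a sum of squares
    # so that rounding can never make it negative.
    s_e = (trace - math.sqrt((a - d) ** 2 + 4.0 * b * b)) / 2.0
    return s_a, s_d, s_e

diag_case = optimality_criteria([[2.0, 0.0], [0.0, 1.0]])   # eigenvalues 2 and 1
corr_case = optimality_criteria([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1
```

For the diagonal example the criteria are 2/3, 2, and 1; for the second example they are 3/4, 3, and 1, matching e₁e₂/(e₁ + e₂), e₁e₂, and min(e₁, e₂).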

Remark 4.6: For the Poisson case, Remark 4.3 shows that a finer grouping scheme gives a larger Fisher information. This conclusion does not always hold for the ZIP case. For example, the local objective function $S_{2} (J) : = J_{22} = \frac{1 - θ_{1}}{p μ_{1}}$ depends entirely on how the first group of a grouping scheme is defined. S₂ remains unchanged if one divides a group other than the first group into more subgroups. Yet the conclusion $S (J_{G_{*}}) \geq S (J_{G})$ still holds for the ZIP case if a grouping scheme $G_{*}$ is finer than $G$.

This conclusion that $S (J_{G_{*}}) \geq S (J_{G})$ becomes obvious once the difference $Δ J = J_{G_{*}} - J_{G}$ is shown to be positive semidefinite. To investigate whether ΔJ is positive semidefinite, we assume without loss of generality that $G_{*}$ is obtained by dividing one group from $G$ into two and denote

Δ J = [\begin{array}{l} Δ J_{11} & Δ J_{12} \\ Δ J_{21} & Δ J_{22} \end{array}] .

Because both $J_{G_{*}}$ and $J_{G}$ are symmetric matrices, we have ΔJ₁₂ = ΔJ₂₁. If $G_{*}$ is obtained by dividing the jth (j ≥ 2) group of $G$ into two subgroups, we have ΔJ₁₂ = ΔJ₂₁ = ΔJ₂₂ = 0. This conclusion follows equation (9) because the choice of the first group, which remains the same for both $G_{*}$ and $G$ , determines ΔJ₁₂, ΔJ₂₁, and ΔJ₂₂. Remark 4.3 indicates that ΔJ₁₁ > 0. Therefore ΔJ has two nonnegative eigenvalues, 0 and ΔJ₁₁, and is positive semidefinite.

If $G_{*}$ is obtained by dividing the first group of $G$ into two subgroups, we have $θ_{j} = \sum_{y \in I_{j}} e^{- λ} λ^{y} / y!$ with I₁, …, I_N denoting different groups of $G_{*}$. Therefore I₁ ∪ I₂ is the first group of $G$, I₃ is the second group of $G$, and so on. Following the definition of the Poisson multinomial distribution, we still have μ₁ = 1 − p + pθ₁ and μ_j = pθ_j for j ≥ 2. ΔJ is then calculated as follows

Δ J_{11} = \frac{{(p θ_{1}^{'})}^{2}}{μ_{1}} + \frac{{(p θ_{2}^{'})}^{2}}{μ_{2}} - \frac{{(p θ_{1}^{'} + p θ_{2}^{'})}^{2}}{μ_{1} + μ_{2}} \geq 0,

Δ J_{12} = Δ J_{21} = - \frac{θ_{1}^{'}}{μ_{1}} + \frac{θ_{1}^{'} + θ_{2}^{'}}{μ_{1} + μ_{2}},

Δ J_{22} = \frac{1 - θ_{1}}{p μ_{1}} - \frac{1 - θ_{1} - θ_{2}}{p (μ_{1} + μ_{2})} = \frac{θ_{2}}{p μ_{1} (μ_{1} + μ_{2})} > 0.

Since tr(ΔJ) = ΔJ₁₁ + ΔJ₂₂ > 0, the sum of the two eigenvalues of ΔJ is positive. Meanwhile, ΔJ has no negative eigenvalues because det (ΔJ) = 0 (proof omitted). Hence, we conclude that ΔJ is positive semidefinite, and $S (J_{G_{*}}) \geq S (J_{G})$ if a grouping scheme $G_{*}$ is finer than $G$ .

We use a distribution ρ(λ, p) to model prior knowledge of the parameters λ and p. Let S be a local objective function. We define another global objective function as

Ω_{ZIP} (G) = Ω_{ZIP, S} (G) = S (\int_{ℝ^{+} \times (0, 1)} J_{G} (λ, p) d ρ (λ, p)) .

Here, we choose to optimize S(∫ J), that is, the local objective function applied to the prior-averaged information matrix, because this method has been justified by Chaloner and Verdinelli (1995) and is shown to be a preferred option for defining Bayesian D-optimality (Atkinson et al. 2007). For example, if ρ(λ, p) is a uniform distribution on (a, b) × (c, d) we have

Ω_{ZIP} (G) = S (\frac{1}{(b - a) (d - c)} \int_{a}^{b} d λ \int_{c}^{d} J_{G} (λ, p) d p) .

When ρ(λ, p) is a discrete distribution supported on $\{(λ_{j}, p_{j})\}_{j = 1}^{n}$ with probabilities q₁, …, q_n, respectively, we have

Ω_{ZIP} (G) = S (\sum_{j = 1}^{n} J_{G} (λ_{j}, p_{j}) q_{j}), \sum_{j = 1}^{n} q_{j} = 1.
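As a sketch of the discrete-prior case with S = S_D, the code below averages the closed-form matrix of equation (9) over a hypothetical two-point prior on (λ, p) and then takes the determinant. It assumes the first group contains zero and truncates the censored tail at a hypothetical `kmax`; all function names are illustrative.

```python
import math

def zip_info(starts, lam, p, kmax=200):
    """Entries (J11, J12, J22) of equation (9); assumes starts[0] == 0."""
    pm = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
          for k in range(kmax + 1)]
    edges = list(starts) + [kmax + 1]
    theta, dtheta = [], []
    for lo, nxt in zip(edges[:-1], edges[1:]):
        theta.append(sum(pm[lo:nxt]))
        left = pm[lo - 1] if lo > 0 else 0.0
        right = pm[nxt - 1] if nxt <= kmax else 0.0
        dtheta.append(left - right)
    info = sum(d * d / t for t, d in zip(theta, dtheta) if t > 0.0)
    mu1 = 1.0 - p + p * theta[0]
    j11 = p * (p - 1.0) * dtheta[0] ** 2 / (theta[0] * mu1) + p * info
    j12 = -dtheta[0] / mu1
    j22 = (1.0 - theta[0]) / (p * mu1)
    return j11, j12, j22

def omega_zip_d(starts, prior):
    """S_D of the prior-averaged matrix; prior is a list of (lam, p, q) triples."""
    j11 = j12 = j22 = 0.0
    for lam, p, q in prior:
        a, b, d = zip_info(starts, lam, p)
        j11 += q * a
        j12 += q * b
        j22 += q * d
    return j11 * j22 - j12 ** 2

prior = [(2.0, 0.6, 0.5), (3.0, 0.8, 0.5)]     # hypothetical prior on (lam, p)
omega_coarse = omega_zip_d([0, 1, 3], prior)   # 0, 1-2, 3 or more
omega_finer = omega_zip_d([0, 1, 2, 3], prior) # 1-2 split into 1 and 2
```

Consistent with Remark 4.6, splitting a group can only increase this objective, so `omega_finer` exceeds `omega_coarse`. The A- or E-criterion could be substituted for the determinant in the last line of `omega_zip_d`.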

A Three-step M Algorithm

Considering a global objective function Ω(·) that is either Ω_P or Ω_ZIP in the preceding section, we propose a three-step M search algorithm for selecting an optimal grouping scheme that maximizes Ω. It should be noted that the application of this algorithm is not restricted to GRC data but could be extended to optimal designs for count responses in general if either grouping or censoring is present. From the perspective of semisupervised machine learning, this M algorithm searches all possible combinations of grouping schemes and interacts with survey investigators to yield the optimal grouping scheme.

Remarks 4.3 and 4.6 show that a finer grouping scheme increases the value of Ω. Without grouping or right censoring, Ω is thus maximized by the finest scheme where each separate response group contains one and only one integer. This finest possible grouping scheme is obviously the optimal one. In the presence of grouping and right censoring, however, the search for an optimal grouping scheme is constrained by the total number of groups N allowed. Now, the search becomes challenging, if not impossible, since the search algorithm has to deal with infinitely many grouping schemes. To make sure that the infinitely many grouping schemes for the GRC count responses can be processed by our search algorithm, we introduce a hypothetical integer M, which is sufficiently large, to divide the infinitely many grouping schemes into two parts: a finite set where M is contained in the last groups of schemes and an infinite set where M is not contained in the last groups. With the introduction of M, the search algorithm consists of three major steps. First, we use M to produce a finite set of possible grouping schemes. An optimal grouping scheme maximizing Ω(·) is identified after a search of this finite set. The second and third steps verify whether the optimal grouping scheme returned by the first step is the global maximizer, that is, the scheme achieving the best performance among all N-group schemes. The search algorithm stops if the optimal grouping scheme returned by the first step passes the verification. Otherwise, the iteration continues with a larger M.

Step 1: Select a sufficiently large positive integer M. Among all N-group grouping schemes where M is contained in their last groups, find the scheme $G_{\max}$ that maximizes Ω.

The introduction of M divides the infinite set of all grouping schemes into two parts: a finite set with M contained in the last group and an infinite set with M not contained in the last group. This procedure is motivated by the idea that, if M is sufficiently large, all integers larger than M from a Poisson process cannot exert much influence on the Fisher information and thus do not affect the choice of optimal grouping schemes. To illustrate this idea, we define the last right-censored group $I_{N}$ of an N-group scheme $G = \{I_{i}\}_{i = 1}^{N}$ as $I_{N} = \{M, M + 1, \ldots\}$. For a Poisson(λ) model, the contribution of this particular group, which contains M and all larger integers, to the Fisher information is trivial:

$\frac{\left(e^{- λ} \frac{λ^{M - 1}}{(M - 1)!}\right)^{2}}{\sum_{k = M}^{\infty} e^{- λ} \frac{λ^{k}}{k!}} \to 0$, as $M \to \infty$.
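The limit above can be checked numerically. The following Python sketch is an illustration only and is not part of GRCdata; the function names and the truncation of the infinite tail sum after `n_terms` terms are assumptions made here. It evaluates the contribution of the right-censored group {M, M + 1, …}, namely P(X = M − 1)² / P(X ≥ M), for a fixed λ.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(X = k), evaluated on the log scale for stability."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def tail_contribution(lam, M, n_terms=400):
    """Fisher-information contribution of the censored group {M, M+1, ...}.

    The derivative of P(X >= M) with respect to lambda equals P(X = M - 1),
    so the contribution is P(X = M - 1)^2 / P(X >= M). The infinite sum in
    the denominator is truncated after n_terms terms (an assumption that is
    harmless for moderate lambda).
    """
    tail = sum(poisson_pmf(k, lam) for k in range(M, M + n_terms))
    return poisson_pmf(M - 1, lam) ** 2 / tail


if __name__ == "__main__":
    # With lambda = 3 the contribution shrinks rapidly once M passes the bulk
    # of the distribution.
    for M in (5, 10, 20, 40):
        print(M, tail_contribution(3.0, M))
```

For small M the quantity need not be monotone (the tail still carries real information there); the decay sets in once M lies beyond the integers with nontrivial probability, which is the regime the argument relies on.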

Moreover, an implication of this property is that, to increase the Fisher information, finer grouping decisions should be applied to integers with nontrivial probabilities. If the total number of groups N is fixed, a finer grouping of large integers with trivial probabilities should be avoided, and a coarse right-censored group is preferred.

The choice of M follows a Goldilocks rule. An important assumption of the search algorithm is that M should be sufficiently large and represents the lower bound of a set of integers leading to a successful search for the global optimal scheme. Yet researchers should not choose an M that is too large either: The number of all possible grouping schemes processed by the search algorithm grows quickly with larger M, and the search takes much longer despite optimization of the algorithm (the computation time is roughly proportional to $M^{N - 1}$). In theory, M should be the lowest integer included in the last right-censored group of the global optimal scheme, so that the search algorithm works without consuming too much time. As practical guidance, researchers may start from an integer larger than the mean of the prior distribution of λ, gradually increase M if its previous value fails the verification in steps 2 and 3, and locate a sufficiently large M through trial and error.

One example of the search algorithm is demonstrated in Figure 1. If we set N = 3 and M = 4, step 1 searches only six schemes, as plotted in part A of Figure 1. When M is not contained in the last group, there are infinitely many grouping schemes to search, and their overall set is denoted as $F_{1}$. Examples of $F_{1}$ are plotted in part B of Figure 1.
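The counts in this example can be reproduced by direct enumeration. The Python sketch below is illustrative and assumes that a grouping scheme is a partition of the nonnegative integers into consecutive ranges, represented by the list of group starting points (the representation and the function names are choices made here, not taken from GRCdata). An N-group scheme has M in its last group exactly when its N − 1 cut points are chosen from {1, …, M}, so the finite set contains C(M, N − 1) schemes.

```python
import math
from itertools import combinations


def schemes_with_M_in_last_group(n_groups, M):
    """Yield every n_groups-group scheme whose censored last group contains M.

    A scheme is a list of group starting points beginning with 0; the last
    group runs from its starting point to infinity.
    """
    for cuts in combinations(range(1, M + 1), n_groups - 1):
        yield [0, *cuts]


def describe(starts):
    """Render a scheme in the notation of the tables, e.g. '0 1-2 3+'."""
    labels = []
    for low, nxt in zip(starts, starts[1:] + [None]):
        if nxt is None:
            labels.append(f"{low}+")
        elif nxt - 1 == low:
            labels.append(str(low))
        else:
            labels.append(f"{low}-{nxt - 1}")
    return " ".join(labels)


if __name__ == "__main__":
    for scheme in schemes_with_M_in_last_group(3, 4):
        print(describe(scheme))
    print(math.comb(4, 2), "schemes in total")
```

The same count explains why the running time grows roughly like $M^{N - 1}$: C(M, N − 1) is a polynomial of degree N − 1 in M.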

Step 2: Compute the objective function Ω* of $F_{3}$ (defined below), where $Ω^{*} = \max_{G \in F_{3}} Ω (G)$.

The foregoing discussion in step 1 shows that our algorithm divides the set of all possible grouping schemes into two parts, and step 1 deals with the finite set with M contained in the last group. Step 2 then deals with the other infinite set $F_{1}$ with M not contained in the last group. In step 2, the algorithm will search a finite set $F_{3}$ of grouping schemes and calculate the objective function based on an optimal scheme from $F_{3}$. To understand the second step, we first illustrate what $F_{2}$ and $F_{3}$ are and then discuss the relation between $F_{1}$ and $F_{3}$. First, let $F_{2}$ be the overall set of (N − 1)-group schemes such that M is contained in the last group. When N = 3 and M = 4, $F_{2}$ consists of only four schemes and is illustrated in part C of Figure 1. Second, for each grouping scheme $G$ in $F_{2}$, we divide its tail after M to make a new scheme $G^{'}$. Now, the first N − 2 groups in $G^{'}$ are identical to those of the corresponding $G$, but each integer greater than M is now contained and only contained in a separate group in $G^{'}$. We denote $F_{3}$ as the total set of all grouping schemes $G^{'}$ obtained in this way from $F_{2}$. The case with N = 3 and M = 4 is shown in parts C (for $F_{2}$) and D (for $F_{3}$) of Figure 1. Due to the one-to-one match between grouping schemes from $F_{2}$ and $F_{3}$, they have the same number of schemes.

For any N-group scheme $G$ from $F_{1}$, where M is not contained in the last group, $F_{3}$ contains at least one scheme $G^{'}$ finer than $G$. Specifically, if M is contained in the (N − 1)th group for a grouping scheme $G$, there exists some $G^{'}$ in $F_{3}$ that has the same first N − 2 groups as $G$. $G^{'}$ must be finer than $G$, given that every integer beyond M is also contained in a separate group in $G^{'}$. The case where M is contained in the kth group with k ≤ N − 2 can be deduced by analogy, as now the first N − 2 groups of $G^{'}$ are finer than the first k − 1 groups of $G$.
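The construction of $G^{'}$ and the "finer than" relation can be made concrete with a short Python sketch. It is an illustration under stated assumptions: schemes are lists of group starting points, the infinite run of singleton groups beyond M is truncated at a hypothetical `horizon`, and the function names are introduced here for exposition only.

```python
def refine_tail(starts, M, horizon):
    """Map a scheme G in F2 to its counterpart G' in F3, truncated at horizon.

    The last group of G must contain M. In G' that group ends at M, and every
    integer M+1, ..., horizon-1 forms its own group; horizon starts a final
    censored group (standing in for the infinitely many singletons).
    """
    if starts[-1] > M:
        raise ValueError("M must be contained in the last group of the scheme")
    return list(starts) + list(range(M + 1, horizon + 1))


def is_finer(fine, coarse):
    """A scheme refines another when it keeps all of its group boundaries."""
    return set(coarse) <= set(fine)


if __name__ == "__main__":
    M, horizon = 4, 50
    g_in_f1 = [0, 2, 6]   # groups 0-1, 2-5, 6+; M = 4 is not in the last group
    g_in_f2 = [0, 2]      # same first N - 2 = 1 group; M = 4 in the last group
    g_prime = refine_tail(g_in_f2, M, horizon)
    print(is_finer(g_prime, g_in_f1))
```

The check holds for any scheme in $F_{1}$ whose cut points lie below the truncation horizon, which is the finite analogue of the argument in the text.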

Step 3: If $Ω^{*} \leq Ω (G_{\max})$ , $G_{\max}$ is the global maximizer of the objective function. A larger M should be chosen otherwise.

The relationship among $F_{1}$, $F_{2}$, and $F_{3}$ can be further conceptualized as follows. When M is not contained in the last group (as shown in $F_{1}$), the search algorithm actually merges the group containing M with all groups to its right (including the right-censored group) to form a new and bigger right-censored group. The first three grouping schemes from part B to part C in Figure 1 illustrate this merging process. Subsequently, the new grouping scheme with a bigger right-censored group (e.g., the first scheme in part C) has fewer groups and is thus coarser than its original form in $F_{1}$ (e.g., the first scheme in part B). To make a fair comparison between the Fisher information of grouping schemes with M contained in the last group and with M not contained in the last group, after the merging, we must compensate for the loss in the latter’s Fisher information due to this reduction in the total number of groups. To compensate for the loss of the Fisher information after merging, each integer greater than M in the new last group is subsequently contained and only contained in a separate group. This procedure thus forms a new (much) finer grouping scheme (i.e., from grouping schemes in part C to corresponding grouping schemes in part D). Step 3 then compares values of the objective functions between the optimal grouping scheme with M contained in the last group and the (much finer) optimal grouping scheme with M contained in other groups. Pseudocode describing the search algorithm is listed below to facilitate readers’ understanding:

Input the (maximum) number of groups N, a sufficiently large integer M and the objective function of a grouping scheme Ω;
Among all N-group grouping schemes where M is contained in their last groups, find the scheme $G_{\max}$ that maximizes Ω and denote this maximum value as $Ω (G_{\max})$ ;
Set $F_{2}$ as the overall set of (N − 1)-group schemes where M is contained in the last group for every grouping scheme $G$ in $F_{2}$ ;
For every $G$ in $F_{2}$, there exists one corresponding grouping scheme $G^{'}$ where every integer greater than M is contained and only contained in one separate group. The total set of all such grouping schemes $G^{'}$ is defined as $F_{3}$. Find the grouping scheme in $F_{3}$ that maximizes Ω and denote this maximum value as Ω*;
Return $G_{\max}$ if $Ω^{*} \leq Ω (G_{\max})$ ; else choose a larger M and proceed to the first step.
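The pseudocode above can be sketched in Python for the plain Poisson case. This is a minimal illustration and not the implementation in GRCdata. Several choices are assumptions made here: the objective Ω is taken to be the prior-weighted average Fisher information over a discrete prior, infinite sums and the singleton tail of the schemes in $F_{3}$ are truncated at a hypothetical `HORIZON`, zero is not forced into its own group, and all function names are illustrative.

```python
import math
from itertools import combinations

HORIZON = 200  # assumed truncation point for infinite sums and singleton tails


def pmf(k, lam):
    """Poisson probability P(X = k)."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def fisher_info(starts, lam):
    """Fisher information about lambda in one grouped, right-censored response."""
    info = 0.0
    ends = [s - 1 for s in starts[1:]] + [None]  # None marks the censored group
    for low, high in zip(starts, ends):
        upper = HORIZON + 200 if high is None else high + 1
        prob = sum(pmf(k, lam) for k in range(low, upper))
        # d/d(lambda) of P(low <= X <= high) = P(X = low - 1) - P(X = high);
        # the second term vanishes for the censored group.
        slope = pmf(low - 1, lam) - (0.0 if high is None else pmf(high, lam))
        if prob > 0.0:
            info += slope * slope / prob
    return info


def omega(starts, prior):
    """Assumed objective: prior-weighted average of the Fisher information."""
    return sum(w * fisher_info(starts, lam) for lam, w in prior)


def schemes(n_groups, M):
    """All n_groups-group schemes (lists of group starts) with M in the last group."""
    for cuts in combinations(range(1, M + 1), n_groups - 1):
        yield [0, *cuts]


def find_scheme(N, prior, M=None, M_max=60):
    """Three-step search; returns (G_max, M) once the verification passes."""
    M = N if M is None else M
    while M <= M_max:
        # Step 1: best N-group scheme among those with M in the last group.
        g_max = max(schemes(N, M), key=lambda s: omega(s, prior))
        # Step 2: Omega* over F3, i.e. (N-1)-group schemes whose tail beyond M
        # is split into singleton groups (truncated at HORIZON).
        singles = list(range(M + 1, HORIZON + 1))
        omega_star = max(omega(g + singles, prior) for g in schemes(N - 1, M))
        # Step 3: accept if no scheme outside the finite set can do better.
        if omega_star <= omega(g_max, prior):
            return g_max, M
        M += 1  # false rejection is possible, so retry with a larger M
    raise RuntimeError("verification failed for every M up to M_max")


if __name__ == "__main__":
    prior = [(2.0, 0.25), (3.0, 0.5), (4.0, 0.25)]  # hypothetical discrete prior
    print(find_scheme(3, prior))
```

Because every scheme outside the finite set is coarser than some member of $F_{3}$, passing step 3 guarantees that the returned scheme is at least as good as any N-group scheme, at the price of occasionally demanding a larger M than strictly necessary.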

To summarize, because there are infinitely many grouping schemes with M not contained in the last group (e.g., grouping schemes from the set $F_{1}$ as shown in part B), we first transform them into finitely many schemes with M contained in a new (big) last group ($F_{2}$ as shown in part C) and then into much finer schemes ($F_{3}$ as shown in part D) to make a fair comparison. It is clear that these much finer grouping schemes may sometimes overcompensate for the loss in Fisher information in the merging process. For example, step 3 could falsely reject the true optimal grouping scheme if the M chosen is at or slightly higher than the lowest integer included in the last right-censored group of the global optimal scheme. Yet the false rejection can be easily solved by increasing the value of M, as each separate group containing one integer larger than M plays a smaller role in estimating the objective function. The global optimal grouping scheme successfully accepted by the algorithm remains the same as that falsely rejected. In fact, the search algorithm is intentionally developed so that it prevents any false acceptance of a wrong optimal scheme at the cost of tolerating false rejection of the true optimal grouping scheme, while a larger M resolves the false-rejection issue.

Data Simulation and Empirical Analysis

To illustrate the optimal designs for count data, we employ data from a nationally representative survey of youth in America, the MTF study. Since 1975, each year about 250,000 high school students from approximately 130 U.S. high schools nationwide participate in this survey. In the current study, we focus on four questions from the MTF study related to 12th graders’ frequencies of alcohol drinking from 1996 to 2012. The first three questions on alcohol drinking are virtually the same except for the reference period (in your lifetime, during the last 12 months, and during the last 30 days): “On how many occasions have you had alcoholic beverages to drink–more than just a few sips?” The GRC count response categories for the three questions are: 0 occasions, 1–2 occasions, 3–5 occasions, 6–9 occasions, 10–19 occasions, 20–39 occasions, and 40 or more. The fourth question is related to binge drinking: “Think back over the LAST TWO WEEKS. How many times have you had five or more drinks in a row? (A “drink” is a glass of wine, a bottle of beer, a wine cooler, a shot glass of liquor, a mixed drink, etc.).” GRC count response categories for this question are none, once, twice, 3–5 times, 6–9 times, and 10 or more times. Table 1 shows the original counts of drinking data from 1996 to 2012. Reported drinking is less frequent with shorter reference periods. Binge drinking is the rarest behavior among the 12th graders.

Table 1.

Frequency Distributions of Adolescent Alcoholic Drinking, MTF, 1996–2012.

Year	Lifetime Drinking							Drinking in Last 12 Months
Year	0	1–2	3–5	6–9	10–19	20–39	40+	0	1–2	3–5	6–9	10–19	20–39	40+
2012	674	199	235	208	281	224	416	795	335	322	225	231	160	163
2011	675	191	271	211	294	211	422	825	384	297	237	237	136	151
2010	671	193	265	216	281	244	460	784	397	321	231	250	137	199
2009	582	191	239	223	270	245	446	701	385	292	222	262	158	171
2008	634	160	251	210	292	222	470	759	380	288	218	241	165	187
2007	676	195	227	204	306	229	531	808	354	277	243	271	157	239
2006	626	169	233	226	304	242	497	756	349	302	249	262	178	196
2005	610	197	272	243	244	246	544	745	390	308	237	265	197	212
2004	571	187	245	210	307	254	583	690	363	314	246	295	214	218
2003	535	187	232	260	295	257	595	687	373	322	282	256	171	261
2002	465	151	228	214	267	211	564	592	318	311	214	239	171	240
2001	447	145	188	185	298	287	514	574	298	269	263	272	179	217
2000	412	157	246	196	283	248	538	533	303	318	236	277	190	209
1999	420	178	215	192	311	261	632	558	364	265	247	256	238	274
1998	442	168	295	233	295	307	724	588	396	327	256	331	213	350
1997	492	171	247	233	340	310	694	638	367	338	261	325	262	285
1996	515	155	226	206	282	314	625	642	350	298	254	297	219	250

Year	Drinking in Last 30 Days							Binge Drinking in Last Two Weeks
Year	0	1–2	3–5	6–9	10–19	20–39	40+	0	1	2	3–5	6–9	10+
2012	1,264	462	254	123	76	28	24	1,665	197	150	142	29	19
2011	1,346	432	234	122	74	22	34	1,730	190	134	124	32	28
2010	1,323	457	254	132	86	25	38	1,716	220	158	129	35	27
2009	1,224	438	257	138	85	24	24	1,586	222	145	147	34	32
2008	1,262	454	240	137	78	26	41	1,630	215	160	135	44	24
2007	1,293	451	275	144	107	37	45	1,688	217	164	149	55	39
2006	1,244	469	262	154	100	35	27	1,661	211	160	145	45	32
2005	1,257	459	304	157	104	33	32	1,674	243	178	165	37	33
2004	1,201	466	296	193	103	31	40	1,635	253	171	170	61	26
2003	1,193	504	247	196	123	41	37	1,636	234	180	202	42	22
2002	1,062	446	223	151	136	30	39	1,456	200	161	166	47	25
2001	1,036	430	239	173	114	40	27	1,438	209	158	144	62	37
2000	1,022	437	288	169	94	31	30	1,425	202	186	150	40	36
1999	1,041	425	285	214	147	40	53	1,460	213	176	217	63	54
1998	1,162	453	347	220	155	58	63	1,634	228	224	228	61	57
1997	1,153	527	324	207	170	46	47	1,664	253	198	238	62	53
1996	1,134	436	321	193	142	34	53	1,572	236	191	202	55	47


Note: MTF = Monitoring the Future.

To identify appropriate prior distributions for Bayesian optimal designs, we wrote an R function grcmle to infer Poisson parameters based on the likelihood functions given in equations (4) and (6). This R function adopts maximum likelihood estimation and reports the mean, standard error, and confidence interval estimated from data. As multiple waves of data are available from the MTF study, we use the mean ± 3 standard deviations as the range of the prior distribution. In the absence of multiple data sets, researchers could also determine the range of prior distributions based on the mean and standard error (e.g., the mean ± 3 standard errors) reported by this R function. Table 2 summarizes means and standard deviations of Poisson and ZIP parameters across the 17 years investigated. As expected, the means of λ across different survey years tend to be larger with a longer reference period. Also noteworthy is that all standard deviations calculated are much smaller than their corresponding means, suggesting that the year-to-year (ZI) Poisson estimates are relatively stable.

Table 2.

Maximum Likelihood Estimates of Means and Standard Deviations (SD) of (Zero-inflated) Poisson Parameters.

	Drinking in Last 30 Days			Drinking in Last 12 Months			Lifetime Drinking			Binge Drinking in Last Two Weeks
	λ (Poisson)	P (ZIP)	λ (ZIP)	λ (Poisson)	P (ZIP)	λ (ZIP)	λ (Poisson)	P (ZIP)	λ (ZIP)	λ (Poisson)	P (ZIP)	λ (ZIP)
Mean	2.723	.477	6.062	8.750	.698	13.026	15.425	.756	21.127	0.774	.307	2.662
SD	0.461	.039	0.561	1.209	.039	1.089	1.834	.040	1.348	0.129	.029	0.226


Note: ZIP = zero-inflated Poisson.
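As a worked example of how a prior range follows from Table 2, the Python sketch below computes the mean ± 3 standard deviations for the Poisson λ of three reference periods. The helper name and the clipping of the lower limit at zero are assumptions made here for illustration; the inputs are the means and standard deviations reported in Table 2.

```python
def prior_range(mean, sd, k=3.0, lower_bound=0.0):
    """Range [mean - k*sd, mean + k*sd], clipped below and rounded to 2 decimals."""
    low = max(mean - k * sd, lower_bound)
    high = mean + k * sd
    return round(low, 2), round(high, 2)


if __name__ == "__main__":
    # Poisson lambda: (mean, SD) across survey years, taken from Table 2.
    table2 = {
        "Drinking in last 12 months": (8.750, 1.209),
        "Lifetime drinking": (15.425, 1.834),
        "Binge drinking in last two weeks": (0.774, 0.129),
    }
    for label, (mean, sd) in table2.items():
        print(label, prior_range(mean, sd))
```

The resulting intervals are the ones passed to the search as lambda.lwr and lambda.upr for the corresponding scenarios.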

Next, based on the aforementioned search algorithm, we developed another R function find.scheme in the R package GRCdata to search for the optimal GRC grouping scheme, which has the following parameters:

find.scheme(N, densityFUN, lambda.lwr, lambda.upr, p.lwr, p.upr, probs, lambdas, ps, is.0.isolated = TRUE, model = c("Poisson", "ZIP"), matSc = c("A", "D", "E"), M = "auto")

N defines the (maximum) number of groups, which should be greater than one for the Poisson case and greater than two for the ZIP case. densityFUN gives the probability density function of a prior distribution, if needed. [lambda.lwr, lambda.upr] and [p.lwr, p.upr] define the range of λ and p, respectively, and [p.lwr, p.upr] is not needed if a Poisson model is chosen. probs, lambdas, and ps define discrete prior distributions. is.0.isolated indicates whether zero should be contained in a separate group. This parameter is included because researchers are often interested in estimating prevalence or incidence rates, which requires zero to be contained in a separate group. model specifies whether the Poisson or the ZIP case is used in the search algorithm. matSc gives the type of local objective function of the Fisher information matrix for the ZIP case. Users can choose from A-, D-, and E-optimality. M is a sufficiently large integer required to implement the search, as discussed above. If the lowest values of M needed to find the global optimal grouping schemes are specified, most examples listed in Tables 3–5 take several seconds to converge. Depending on the computer configuration, the program may take several minutes to converge if a large M (e.g., 33) is chosen. If M is set to "auto", the search algorithm will automatically determine an adequate M needed to produce the global optimal grouping scheme and subsequently return that scheme. As expected, the program can take longer to converge when the auto option is chosen. If M is not set to "auto", the output of find.scheme includes an indicator, succeed, showing whether the chosen M is sufficiently large for the search algorithm to identify the global optimal grouping scheme. Users need to choose a slightly larger value for M if succeed is false.

Table 3.

Estimated Optimal Grouping Schemes With Different Numbers of Groups: Uniform Prior Distributions.

N Number of Groups	Drinking in 30 Days (Poisson)	M^a	Drinking in 12 Months (Poisson)	M^a	Lifetime Drinking (Poisson)	M^a	Binge Drinking (Poisson)	M^a
N Number of Groups	λ = [1.34, 4.02]	M^a	λ = [5.12, 12.38]	M^a	λ = [9.92, 20.93]	M^a	λ = [0.39, 1.16]	M^a
3	0 1–2 3+	4	0 1–8 9+	11	0 1–14 15+	18	0 1 2+	3
4	0 1–2 3–4 5+	5	0 1–6 7–10 11+	13	0 1–11 12–17 18+	21	0 1 2 3+	3
5	0 1 2 3–4 5+	6	0 1–5 6–8 9–12 13+	14	0 1–10 11–14 15–19 20+	23	0 1 2 3 4+	4
6	0 1 2 3 4–5 6+	6	0 1–4 5–7 8–10 11–13 14+	15	0 1–9 10–13 14–17 18–22 23+	24	0 1 2 3 4 5+	5
7	0 1 2 3 4 5–6 7+	7	0 1–4 5–6 7–8 9–10 11–13 14+	16	0 1–8 9–11 12–14 15–18 19–22 23+	25	0 1 2 3 4 5 6+	6
8	0 1 2 3 4 5 6–7 8+	8	0 1–3 4–5 6–7 8–9 10–11 12–14 15+	17	0 1–7 8–10 11–13 14–16 17–19 20–23 24+	26	0 1 2 3 4 5 6 7+	7
9	0 1 2 3 4 5 6 7–8 9+	9	0 1–3 4–5 6–7 8–9 10–11 12–13 14–16 17+	18	0 1–7 8–10 11–12 13–14 15–17 18–20 21–24 25+	27	0 1 2 3 4 5 6 7 8+	8
N Number of Groups	Drinking in 30 Days (ZIP)	M^a	Drinking in 12 Months (ZIP)	M^a	Lifetime Drinking (ZIP)	M^a	Binge Drinking (ZIP)	M^a
N Number of Groups	λ = [4.38, 7.75] and P = [.36, .59]	M^a	λ = [9.76, 16.29] and P = [.58, .81]	M^a	λ = [17.08, 25.17] and P = [.63, .88]	M^a	λ = [1.98, 3.34] and P = [.22, .39]	M^a
3	0 1–6 7+	8	0 1–12 13+	15	0 1–20 21+	24	0 1–3 4+	4
4	0 1–4 5–7 8+	9	0 1–10 11–15 16+	18	0 1–18 19–24 25+	27	0 1–2 3–4 5+	5
5	0 1–4 5–6 7–9 10+	11	0 1–9 10–13 14–17 18+	19	0 1–16 17–21 22–26 27+	29	0 1 2 3–4 5+	6
6	0 1–3 4–5 6–7 8–10 11+	11	0 1–8 9–11 12–14 15–18 19+	20	0 1–15 16–19 20–23 24–28 29+	30	0 1 2 3 4–5 6+	6
7	0 1–2 3–4 5–6 7–8 9–10 11+	12	0 1–7 8–10 11–13 14–16 17–19 20+	22	0 1–14 15–18 19–21 22–24 25–28 29+	32	0 1 2 3 4 5–6 7+	7
8	0 1–2 3–4 5 6 7–8 9–10 11+	13	0 1–7 8–9 10–11 12–13 14–16 17–19 20+	22	0 1–13 14–16 17–19 20–22 23–25 26–29 30+	32	0 1 2 3 4 5 6 7+	8
9	0 1–2 3–4 5 6 7 8–9 10–11 12+	13	0 1–7 8–9 10–11 12–13 14–15 16–17 18–20 21+	23	0 1–12 13–15 16–18 19–20 21–23 24–26 27–30 31+	33	0 1 2 3 4 5 6 7 8+	9


Note: ZIP = zero-inflated Poisson.

^a The lowest M needed to find the global optimal grouping scheme.

Table 5.

Estimated Optimal Grouping Schemes With Different Numbers of Groups: Uniform Prior Distributions.

Number of Groups G	Low λ	M^a	Moderate λ	M^a	High λ (M= 40)	M^a	Unspecified λ (M = 40)	M^a
Number of Groups G	λ = [0.01, 3]	M^a	λ = [3, 10]	M^a	λ = [10, 30]	M^a	λ = [0.01, 30]	M^a
3	0 1 2+	3	0 1–5 6+	8	0 1–16 17+	24	0 1–2 3+	15
4	0 1 2–3 4+	4	0 1–4 5–8 9+	10	0 1–13 14–22 23+	27	0 1 2–5 6+	20
5	0 1 2 3–4 5+	5	0 1–3 4–6 7–9 10+	11	0 1–11 12–17 18–25 26+	30	0 1 2–4 5–11 12+	23
6	0 1 2 3 4 5+	6	0 1–3 4–5 6–7 8–10 11+	12	0 1–10 11–15 16–20 21–27 28+	31	0 1 2–3 4–8 9–17 18+	25
7	0 1 2 3 4 5 6+	7	0 1–2 3–4 5–6 7–8 9–11 12+	13	0 1–10 11–14 15–18 19–23 24–29 30+	33	0 1 2–3 4–7 8–13 14–22 23+	27
8	0 1 2 3 4 5 6 7+	7	0 1–2 3 4–5 6–7 8–9 10–12 13+	14	0 1–9 10–12 13–15 16–19 20–24 25–30 31+	34	0 1 2–3 4–6 7–10 11–16 17–24 25+	29
9	0 1 2 3 4 5 6 7 8+	8	0 1–2 3 4 5 6–7 8–9 10–12 13+	14	0 1–8 9–11 12–14 15–17 18–21 22–25 26–31 32+	35	0 1 2 3–4 5–7 8–11 12–17 18–25 26+	29
Number of Groups G	Moderate λ and Low P	M^a	Moderate λ and Moderate P	M^a	Moderate λ and High P	M^a	Moderate λ and Unspecified P	M^a
Number of Groups G	λ = [3, 10] and P = [.001, .4]	M^a	λ = [3, 10] and P= [.4, .6]	M^a	λ = [3, 10] and P = [0.6, 1]	M^a	λ = [3, 10] and P= [0.001, 1]	M^a
3	0 1–5 6+	8	0 1–5 6+	8	0 1–5 6+	8	0 1–5 6+	8
4	0 1–4 5–8 9+	10	0 1–4 5–8 9+	10	0 1–4 5–8 9+	10	0 1–4 5–8 9+	10
5	0 1–3 4–6 7–9 10+	11	0 1–3 4–6 7–9 10+	11	0 1–3 4–6 7–9 10+	11	0 1–3 4–6 7–9 10+	11
6	0 1–3 4–5 6–7 8–10 11+	12	0 1–3 4–5 6–7 8–10 11+	12	0 1–3 4–5 6–7 8–10 11+	12	0 1–3 4–5 6–7 8–10 11+	12
7	0 1–2 3–4 5–6 7–8 9–11 12+	13	0 1–2 3–4 5–6 7–8 9–11 12+	13	0 1–2 3–4 5–6 7–8 9–11 12+	13	0 1–2 3–4 5–6 7–8 9–11 12+	13
8	0 1–2 3 4–5 6–7 8–9 10–12 13+	14	0 1–2 3 4–5 6–7 8–9 10–12 13+	14	0 1–2 3 4–5 6–7 8–9 10–12 13+	14	0 1–2 3 4–5 6–7 8–9 10–12 13+	14
9	0 1–2 3 4 5 6–7 8–9 10–12 13+	14	0 1–2 3 4 5 6–7 8–9 10–12 13+	14	0 1–2 3 4 5 6–7 8–9 10–12 13+	14	0 1–2 3 4 5 6–7 8–9 10–12 13+	14


^a The lowest M needed to find the global optimal grouping scheme.

Based on results calculated by find.scheme, Table 3 shows optimal grouping schemes given combinations of the (maximum) number of groups N and prior distributions. The last entry under the lifetime drinking (ZIP) scenario is estimated by the following command:

find.scheme(M=35, N=9, density=function(...) 1, lambda.lwr=17.08, lambda.upr=25.17, p.lwr=0.63, p.upr=0.88, model="ZIP")

which yields the same optimal grouping scheme as

find.scheme(M=35, N=9, density=function(x) 1, lambda.lwr=17.08, lambda.upr=25.17)

We use a uniform distribution as the prior distribution in Table 3. It should be noted, however, that other continuous or discrete distributions can also be processed by the R function as the prior distribution. The third cell [0, 1, 2, 3–4, 5+] under the drinking in 30 days (Poisson) scenario means that the optimal grouping scheme is zero, once, twice, three to four times, and five or more times, given that the total number of groups is five and the range for λ’s prior distribution is [1.34, 4.02]. Across different scenarios, the lowest M required to identify the global optimal scheme is also provided. As the search algorithm tolerates false rejection, the lowest M is often slightly larger than the lowest integer of the last right-censored group of the optimal scheme, and this difference becomes larger as λ increases. Across the eight scenarios in Table 3, the cutoff integers between two adjacent groups tend to concentrate on smaller integers if the parameter space of λ is close to zero (the binge drinking scenario). If the parameter space of λ stays close to zero (e.g., rare events), any grouping decision of small integers is not supported by the search algorithm as the maximum number of groups N increases (see N = 7 or 8 in the binge drinking scenario). This finding suggests that the GRC count response is inappropriate for collecting very rare count events. The cutoff integers tend to appear first around the mean of the parameter space of λ and then appear at other locations as N increases. The prior distributions in Table 4 are truncated (mean ± 3 standard deviations) Gaussian distributions whose means and standard deviations are provided in Table 2. Table 4 also demonstrates that the cutoff integers of grouping decisions often fall at integers around which λ has higher probability density.
For the same combination of N and range of prior distributions, the optimal schemes listed in Tables 3 and 4 are virtually the same, suggesting that the search for optimal schemes is not sensitive to the choice of prior distributions. Table 5 lists optimal grouping schemes when λ is low, moderate, high, and unspecified for readers’ reference. Because zero is contained in a separate group for the ZIP case in Tables 3–5, the optimal grouping scheme remains the same when λ is fixed but p varies.

Table 6.

Parameter Estimates for Different Grouping Schemes: Results From Simulation.

Parameters Used for Simulation	Grouping Schemes		${\bar{λ}}_{MLE}$	Standard Errors	${\bar{λ}}_{MLE}$	Standard Errors	${\bar{λ}}_{MLE}$	Standard Errors
			(100)^a		(1,000)^a		(10,000)^a
λ = 2.723	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	2.718	.179	2.719	.057	2.723	.018
	Optimal	[0, 1, 2, 3, 4, 5–6, 7+]	2.718	.166	2.719	.054	2.723	.017
	Optimal^b	[0, 1–2, 3–4, 5+]	2.719	.176	2.72	.057	2.723	.018
λ = 8.750	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	8.746	.330	8.742	.106	8.749	.033
	Optimal	[0, 1–5, 6–7, 8–9, 10–11, 12–14, 15+]	8.743	.298	8.742	.096	8.749	.03
	Optimal^b	[0, 1–7, 8–11, 12+]	8.746	.322	8.742	.105	8.749	.033
λ = 15.425	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	15.396	.502	15.409	.157	15.424	.048
	Optimal	[0, 1–10, 11–13, 14–15, 16–18, 19–21, 22+]	15.395	.403	15.409	.129	15.423	.039
	Optimal^b	[0, 1–13, 14–18, 19+]	15.403	.436	15.413	.142	15.424	.043
λ = 0.774	Reference	[0, 1, 2, 3–5, 6–9, 10+]	0.773	.088	0.772	.028	0.774	.009
	Optimal	[0, 1, 2, 3, 4, 5+]	0.773	.087	0.772	.028	0.774	.009
	Optimal^b	[0, 1, 2+]	0.775	.090	0.773	.029	0.774	.009
λ = 1	Reference	[0, 1, 2, 3–5, 6–9, 10+]	0.999	.102	0.998	.033	1	.010
	Optimal	[0, 1, 2, 3, 4, 5+]	0.999	.101	0.998	.032	1	.010
	Optimal^b	[0, 1, 2+]	1.001	.106	0.998	.035	1	.010
λ = 3	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	2.996	.187	2.998	.060	3	.019
	Optimal	[0, 1, 2, 3, 4, 5–6, 7+]	2.996	.173	2.996	.055	3	.018
	Optimal^b	[0, 1–2, 3–4, 5+]	3	.183	2.998	.059	3	.019
λ = 5	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	4.997	.240	4.993	.076	4.999	.024
	Optimal	[0, 1–2, 3–4, 5, 6–7, 8–9, 10+]	4.994	.225	4.994	.073	4.999	.023
	Optimal^b	[0, 1–4, 5–7, 8+]	4.991	.240	4.994	.079	5	.025
λ = 10	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	9.993	.373	9.997	.122	10	.037
	Optimal	[0, 1–6, 7–8, 9–10, 11–12, 13–15, 16+]	9.986	.322	9.994	.105	10	.032
	Optimal^b	[0, 1–8, 9–12, 13+]	9.99	.349	9.997	.114	10.001	.035
λ = 20	Reference	[0, 1–2, 3–5, 6–9, 10–19, 20–39, 40+]	19.988	.552	19.994	.182	19.999	.054
	Optimal	[0, 1–14, 15–17, 18–20, 21–23, 24–27, 28+]	19.966	.467	19.985	.146	20	.045
	Optimal^b	[0, 1–17, 18–23, 24+]	19.971	.502	19.987	.159	20	.049


^a Each simulation is repeated 1,000 times to calculate estimates. Numbers in parentheses are the sample sizes used for simulation.

^b Optimal grouping schemes with fewer groups.

Table 4.

Estimated Optimal Grouping Schemes With Different Numbers of Groups: Normal Prior Distributions.

N Number of Groups	Drinking in 30 Days (Poisson)	M^a	Drinking in 12 Months (Poisson)	M^a	Lifetime Drinking (Poisson)	M^a	Binge Drinking (Poisson)	M^a
N Number of Groups	λ = [1.34, 4.02]	M^a	λ = [5.12, 12.38]	M^a	λ = [9.92, 20.93]	M^a	λ = [0.39, 1.16]	M^a
3	0 1–3 4+	4	0 1–8 9+	11	0 1–15 16+	18	0 1 2+
4	0 1–2 3–4 5+	5	0 1–6 7–10 11+	13	0 1–12 13–17 18+	21	0 1 2 3+	3
5	0 1 2 3–4 5+	6	0 1–6 7–9 10–12 13+	14	0 1–11 12–15 16–19 20+	22	0 1 2 3 4+	4
6	0 1 2 3 4–5 6+	6	0 1–5 6–7 8–9 10–12 13+	15	0 1–10 11–13 14–16 17–20 21+	23	0 1 2 3 4 5+	5
7	0 1 2 3 4 5–6 7+	7	0 1–4 5–6 7–8 9–10 11–13 14+	15	0 1–9 10–12 13–15 16–18 19–22 23+	24	0 1 2 3 4 5 6+	6
8	0 1 2 3 4 5 6 7+	8	0 1–4 5–6 7–8 9–10 11–12 13–15 16+	16	0 1–9 10–12 13–14 15–16 17–19 20–23 24+	25	0 1 2 3 4 5 6 7+	7
9	0 1 2 3 4 5 6 7 8+	9	0 1–3 4–5 6–7 8 9–10 11–12 13–15 16+	17	0 1–8 9–11 12–13 14–15 16–17 18–20 21–23 24+	26	0 1 2 3 4 5 6 7 8+	8
N Number of Groups	Drinking in 30 Days (ZIP)	M^a	Drinking in 12 Months (ZIP)	M^a	Lifetime Drinking (ZIP)	M^a	Binge Drinking (ZIP)	M^a
N Number of Groups	λ = [4.38, 7.75] and P = [0.36, 0.59]	M^a	λ = [9.76, 16.29] and P = [.58, .81]	M^a	λ = [17.08, 25.17] and P = [.63, .88]	M^a	λ = [1.98, 3.34] and P = [.22, .39]	M^a
3	0 1–6 7+	8	0 1–13 14+	15	0 1–21 22+	24	0 1–3 4+	4
4	0 1–4 5–7 8+	9	0 1–10 11–15 16+	18	0 1–18 19–24 25+	27	0 1–2 3–4 5+	5
5	0 1–4 5–6 7–9 10+	10	0 1–9 10–13 14–17 18+	19	0 1–16 17–21 22–26 27+	29	0 1–2 3 4–5 6+	6
6	0 1–3 4–5 6–7 8–9 10+	11	0 1–8 9–11 12–14 15–18 19+	20	0 1–15 16–19 20–23 24–27 28+	30	0 1 2 3 4–5 6+	6
7	0 1–3 4–5 6 7–8 9–10 11+	12	0 1–8 9–11 12–13 14–16 17–19 20+	21	0 1–14 15–18 19–21 22–24 25–28 29+	31	0 1 2 3 4 5–6 7+	7
8	0 1–2 3–4 5 6 7–8 9–10 11+	12	0 1–7 8–10 11–12 13–14 15–16 17–19 20+	22	0 1–14 15–17 18–20 21–23 24–26 27–30 31+	32	0 1 2 3 4 5 6 7+	8
9	0 1–2 3–4 5 6 7 8–9 10–11 12+	12	0 1–7 8–9 10–11 12–13 14–15 16–17 18–20 21+	22	0 1–13 14–16 17–19 20–21 22–23 24–26 27–30 31+	33	0 1 2 3 4 5 6 7 8+	8


Note: ZIP = zero-inflated Poisson.

^a The lowest M needed to find the global optimal grouping scheme.

To illustrate how an optimal grouping scheme is preferred to other grouping schemes, we use the grouping schemes adopted by the MTF study as reference grouping schemes and compare standard errors estimated under different grouping schemes. In Table 6, the first column lists the true parameters we used to simulate the Poisson distributions. Both parameters inferred from the data (see Table 2) and hypothetical parameters are used. The reference schemes are those adopted by the MTF study to measure alcohol drinking. The optimal schemes are generated by find.scheme with the same number of groups as the reference scheme. For each scenario, the simulation is repeated 1,000 times to calculate estimates.

When λ is small, the reference schemes appear to be acceptable, as their corresponding standard errors are only slightly larger than those calculated based on optimal grouping schemes. However, the differences between standard errors estimated from the reference schemes and those of the optimal schemes grow larger as λ increases. As expected, the standard errors decrease by a factor of $\sqrt{10}$ when the sample size increases tenfold. Moreover, the strength of this algorithm can be illustrated by optimal schemes with fewer groups than the corresponding reference schemes. Compared with estimation based on the reference schemes, researchers could achieve almost the same, and sometimes better, efficiency of estimation by adopting optimal schemes with even smaller numbers of groups N. In other words, an optimal grouping scheme can outperform a nonoptimal one even if the latter has more groups.
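The pattern in Table 6 can be cross-checked without simulation through the asymptotic approximation SE ≈ 1/√(n · I_G(λ)), where I_G(λ) is the Fisher information of one grouped response. The Python sketch below is an illustration only: it replaces the Monte Carlo procedure used for Table 6 with this large-sample formula, truncates the censored tail after `n_terms` terms, and uses function names introduced here. The two schemes are the reference and seven-group optimal schemes listed in Table 6 for λ = 2.723.

```python
import math


def pmf(k, lam):
    """Poisson probability P(X = k)."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def grouped_info(groups, lam, n_terms=400):
    """Fisher information of one grouped response.

    groups is a list of (low, high) integer ranges; high=None marks the
    right-censored group, whose probability is summed over n_terms terms.
    """
    info = 0.0
    for low, high in groups:
        upper = low + n_terms if high is None else high + 1
        prob = sum(pmf(k, lam) for k in range(low, upper))
        # Derivative of the group probability with respect to lambda.
        slope = pmf(low - 1, lam) - (0.0 if high is None else pmf(high, lam))
        info += slope * slope / prob
    return info


def asymptotic_se(groups, lam, n):
    """Large-sample standard error of the grouped-data MLE of lambda."""
    return 1.0 / math.sqrt(n * grouped_info(groups, lam))


REFERENCE = [(0, 0), (1, 2), (3, 5), (6, 9), (10, 19), (20, 39), (40, None)]
OPTIMAL = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 6), (7, None)]

if __name__ == "__main__":
    lam = 2.723
    for n in (100, 1000, 10000):
        print(n, asymptotic_se(REFERENCE, lam, n), asymptotic_se(OPTIMAL, lam, n))
    # Lower bound attained with exact (ungrouped) counts: sqrt(lambda / n).
    print(math.sqrt(lam / 100))
```

Because the formula depends on n only through 1/√n, the tenfold increase in sample size reduces every standard error by the same factor of $\sqrt{10}$, and no grouping scheme can fall below the ungrouped bound √(λ/n).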

Discussion and Conclusion

This research applies optimal experimental design, a branch of semisupervised machine learning, to social science research and provides a novel algorithm to find the optimal grouping scheme of GRC count responses. One of the most striking features of social science research on survey methodology is the degree to which the design of response categories in survey questions has been neglected. Count responses with grouping and right censoring have long been collected by social scientists to study a variety of behaviors, status, and attitudes. Yet there has been little research on optimal designs for discrete response categories such that grouping or right-censoring decisions often rely on arbitrary choices of survey investigators. To search for optimal grouping schemes, this article first uses Poisson multinomial mixture models to conceptualize the data-generating process of count data with grouping and right censoring and then investigates the relationship between grouping-scheme choices and asymptotic distributions of the Poisson multinomial models. Using different types of optimalities in experimental designs (De Leon and Atkinson 1991), we investigate local objective functions of the Fisher information (matrix) and further demonstrate the possibility of optimal designs for GRC count responses: The optimal grouping scheme should maximize the global objective function of the Fisher information (matrix). We also propose a new three-step general algorithm to process infinitely many grouping schemes and identify the global optimal grouping scheme. To process all possible grouping schemes, this algorithm introduces a sufficiently large integer M, which is in theory the lowest integer contained in the right-censored group of the global optimal scheme. The introduction of M not only makes the search feasible but also tolerates false rejection of the global optimal grouping scheme. 
A new R package GRCdata is developed to implement this algorithm and help survey investigators to assess grouping schemes of count responses. The use of two R functions, grcmle and find.scheme, in GRCdata is illustrated by empirical examples of alcohol drinking. Results from data simulation show that the optimal designs yielded by this new algorithm considerably outperform existing designs: The optimal grouping scheme, even with fewer groups, can lead to more efficient estimation.

The M algorithm and software programs presented in this research give survey investigators a new tool for evaluating grouping and right-censoring decisions of count responses in surveys. While survey methodologists do need to take a series of factors (e.g., the coherence of response categories over time and across questions or whether a specific count is of research interest or has substantive meaning) into account when designing response categories (Schaeffer and Dykema 2011), the new R package allows scholars to incorporate their prior knowledge in optimal designs of survey questions. Although this research only addresses (ZI) Poisson models of count data, it should be noted that the application of the M search algorithm is not restricted to the two statistical models investigated and can be extended to other models of count data such as negative binomial models and hurdle models. If the assumption that the Fisher information increases with a finer grouping scheme holds for other discrete or continuous data-generating processes, this M algorithm can be employed for designing survey responses in general. Such potential applications of this algorithm to broader issues in survey methodology merit further attention.
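For a single parameter, the assumption that a finer grouping scheme carries at least as much Fisher information can be verified directly. The following is a sketch in notation introduced here for illustration (it is not the article's notation): let p_1(θ) and p_2(θ) be the probabilities of two response categories, each positive, and let primes denote derivatives with respect to θ.

```latex
% Contribution of the merged category versus the two separate categories.
% By the Cauchy-Schwarz inequality,
\[
\bigl(p_1' + p_2'\bigr)^2
= \Bigl(\tfrac{p_1'}{\sqrt{p_1}}\sqrt{p_1} + \tfrac{p_2'}{\sqrt{p_2}}\sqrt{p_2}\Bigr)^2
\le \Bigl(\tfrac{(p_1')^2}{p_1} + \tfrac{(p_2')^2}{p_2}\Bigr)\,(p_1 + p_2),
\]
% so that
\[
\frac{\bigl(p_1' + p_2'\bigr)^2}{p_1 + p_2}
\;\le\;
\frac{(p_1')^2}{p_1} + \frac{(p_2')^2}{p_2},
\]
% with equality if and only if p_1'/p_1 = p_2'/p_2.
```

Because the Fisher information of a multinomial response is the sum of such terms over categories, merging two categories cannot increase it in this scalar case. Whether the analogous property holds for a given multiparameter model and optimality criterion has to be checked for that model.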

Acknowledgment

The authors would like to thank Junhui Wang, Jiahua Chen, Sayan Mukherjee, Li Ma, Ding-Xuan Zhou, Tim Liao, Zheng Wu, Nan Lin, Linda K. George, Yanlong Zhang, Yandong Zhao, and seminar/conference participants at University of Victoria, Shanghai University, the 2013 Joint Statistical Meetings (Montreal, Canada), and the 2015 Methodology Section Mid-year Meeting of American Sociological Association (San Diego, USA) for their helpful comments.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors gratefully acknowledge the financial support from the Research Grants Council of Hong Kong (ECS Project No. PolyU 25301115), the Hampton New Faculty Award at The University of British Columbia, the Chiang Ching-kuo Foundation for International Scholarly Exchange, and a 2015 Major Project of the National Social Sciences Foundation in China (grant no. 15ZDB172).

Biography

Qiang Fu is an assistant professor of sociology at The University of British Columbia. His methodological research interests include the application of machine-learning tools in social sciences, demographic methods, and social network analysis, while his substantive interests focus on urban studies, social networks, health, and China.

Xin Guo is an assistant professor in the Department of Applied Mathematics at The Hong Kong Polytechnic University. His research interests include statistical learning theory (kernel methods, support vector machine, error analysis, sparsity analysis, and the implementation of algorithms), and computational social science.

Kenneth C. Land is the John Franklin Crowell Professor Emeritus of Sociology and research professor in the Social Science Research Institute at Duke University. He is an elected fellow of the American Statistical Association and was the 1997 recipient of the Paul F. Lazarsfeld Award from the Methodology Section of the American Sociological Association. His research interests are in the development of mathematical and statistical models and methods for substantive applications in demography, criminology, and social indicators/quality-of-life studies.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

1. The proof follows Theorems 6.3.7 and 6.3.10 in Lehmann and Casella (1998) and is available upon request.

References

Akers Ronald L., La Greca Anthony J., Cochran John, and Sellers Christine. 1989. “Social Learning Theory and Alcohol Behavior Among the Elderly.” The Sociological Quarterly 30:625–38.
Atkinson Anthony, Donev Alexander, and Tobias Randall. 2007. Optimum Experimental Designs, With SAS. Oxford, UK: Oxford University Press.
Bachman Jerald G., Johnston Lloyd D., and O’Malley Patrick M. 1990. “Explaining the Recent Decline in Cocaine Use Among Young Adults: Further Evidence That Perceived Risks and Disapproval Lead to Reduced Drug Use.” Journal of Health and Social Behavior 31:173–84.
Bailey Susan L., Flewelling Robert L., and Rachal J. Valley. 1992. “Predicting Continued Use of Marijuana Among Adolescents: The Relative Influence of Drug-specific and Social Context Factors.” Journal of Health and Social Behavior 33:51–65.
Barnes Grace M., Hoffman Joseph H., Welte John W., Farrell Michael P., and Dintcheff Barbara A. 2006. “Effects of Parental Monitoring and Peer Deviance on Substance Use and Delinquency.” Journal of Marriage and Family 68:1084–104.
Basu Bharati and Famoye Felix. 2004. “Domestic Violence Against Women, and Their Economic Dependence: A Count Data Analysis.” Review of Political Economy 16:457–72.
Blazer Dan, Burchett Bruce, George Linda K., and Service Connie. 1991. “The Association of Age and Depression Among the Elderly: An Epidemiologic Exploration.” Journal of Gerontology 46:M210–15.
Bollen Kenneth A. 1990. “Political Democracy: Conceptual and Measurement Traps.” Studies in Comparative International Development 25:7–24.
Bradburn Norman M., Sudman Seymour, and Wansink Brian. 2004. Asking Questions: The Definitive Guide to Questionnaire Design–for Market Research, Political Polls, and Social and Health Questionnaires. San Francisco, CA: John Wiley & Sons.
Chaloner Kathryn and Verdinelli Isabella. 1995. “Bayesian Experimental Design: A Review.” Statistical Science 10:273–304.
Cheibub Jose Antonio, Przeworski Adam, Limongi Neto Fernando Papaterra, and Alvarez Michael M. 1996. “What Makes Democracies Endure?” Journal of Democracy 7:39–55.
Chernoff Herman. 1953. “Locally Optimal Designs for Estimating Parameters.” The Annals of Mathematical Statistics 24:586–602.
Cohn David A., Ghahramani Zoubin, and Jordan Michael I. 1996. “Active Learning With Statistical Models.” Journal of Artificial Intelligence Research 4:129–45.
De Leon, Ponce AC, and Atkinson Anthony C. 1991. “Optimum Experimental Design for Discriminating Between Two Rival Models in the Presence of Prior Information.” Biometrika 78:601–08.
Dette Holger, Melas Viatcheslav B., and Pepelyshev Andrey. 2004. “Optimal Designs for a Class of Nonlinear Regression Models.” Annals of Statistics 32:2142–67.
Elkins Zachary. 2000. “Gradations of Democracy? Empirical Tests of Alternative Conceptualizations.” American Journal of Political Science 44:293–300.
Fu Qiang, Land Kenneth C., and Lamb Vicki L. 2013. “Bullying Victimization, Socioeconomic Status and Behavioral Characteristics of 12th Graders in the United States, 1989 to 2009: Repetitive Trends and Persistent Risk Differentials.” Child Indicators Research 6:1–21.
Fu Qiang, Land Kenneth C., and Lamb Vicki L. 2016. “Violent Physical Bullying Victimization at School: Has There Been a Recent Increase in Exposure or Intensity? An Age-period-cohort Analysis in the United States, 1991 to 2012.” Child Indicators Research 9:485–513.
Fu Qiang, Guo Xin, and Land Kenneth C. 2018. “A Poisson-multinomial Mixture Approach to Grouped and Right-censored Counts.” Communications in Statistics-Theory and Methods 47:427–47.
Goodman Leo A. 1987. “New Methods for Analyzing the Intrinsic Character of Qualitative Variables Using Cross-classified Data.” American Journal of Sociology 93:529–83.
Groves Robert M., Fowler Floyd J. Jr, Couper Mick P., Lepkowski James M., Singer Eleanor, and Tourangeau Roger. 2011. Survey Methodology. Hoboken, NJ: John Wiley & Sons.
Hagan John, Shedd Carla, and Payne Monique R. 2005. “Race, Ethnicity, and Youth Perceptions of Criminal Injustice.” American Sociological Review 70:381–407.
Hall Daniel B. 2000. “Zero-inflated Poisson and Binomial Regression With Random Effects: A Case Study.” Biometrics 56:1030–39.
Horn Roger A. and Johnson Charles R. 2013. Matrix Analysis. 2nd ed. Cambridge, UK: Cambridge University Press.
Klein Nadja, Kneib Thomas, and Lang Stefan. 2015. “Bayesian Generalized Additive Models for Location, Scale, and Shape for Zero-inflated and Overdispersed Count Data.” Journal of the American Statistical Association 110:405–19.
Lambert Diane. 1992. “Zero-inflated Poisson Regression, With an Application to Defects in Manufacturing.” Technometrics 34:1–14.
Lang Joseph B. 2004. “Multinomial-Poisson Homogeneous Models for Contingency Tables.” Annals of Statistics 32:340–83.
Lehmann Erich Leo and Casella George. 1998. Theory of Point Estimation. New York: Springer-Verlag.
Lord Dominique, Washington Simon P., and Ivan John N. 2005. “Poisson, Poisson-gamma and Zero-inflated Regression Models of Motor Vehicle Crashes: Balancing Statistical Fit and Theory.” Accident Analysis & Prevention 37:35–46.
Marsden Peter V. 2003. “Interviewer Effects in Measuring Network Size Using a Single Name Generator.” Social Networks 25:1–16.
Minkin Salomon. 1987. “Optimal Designs for Binary Data.” Journal of the American Statistical Association 82:1098–103.
Nguyen Nam-Ky and Miller Alan J. 1992. “A Review of Some Exchange Algorithms for Constructing Discrete D-optimal Designs.” Computational Statistics & Data Analysis 14:489–98.
Paik Anthony and Sanchagrin Kenneth. 2013. “Social Isolation in America: An Artifact.” American Sociological Review 78:339–60.
Puig Pedro and Valero Jordi. 2006. “Count Data Distributions: Some Characterizations With Applications.” Journal of the American Statistical Association 101:332–40.
Pukelsheim Friedrich. 1993. Optimal Design of Experiments. New York: John Wiley & Sons.
Radloff Lenore Sawyer. 1977. “The CES-D Scale: A Self-report Depression Scale for Research in the General Population.” Applied Psychological Measurement 1:385–401.
Reardon Sean F. and Raudenbush Stephen W. 2006. “3. A Partial Independence Item Response Model for Surveys With Filter Questions.” Sociological Methodology 36:257–300.
Schaeffer Nora Cate and Dykema Jennifer. 2011. “Questions for Surveys: Current Trends and Future Directions.” Public Opinion Quarterly 75:909–61.
Schaeffer Nora Cate and Presser Stanley. 2003. “The Science of Asking Questions.” Annual Review of Sociology 29:65–88.
Schwarz Norbert, Hippler Hans-Juergen, Deutsch Brigitte, and Strack Fritz. 1985. “Response Categories: Effects on Behavioural Reports and Comparative Judgements.” Public Opinion Quarterly 49:388–95.
Settles Burr. 2010. Active Learning Literature Survey. Madison: University of Wisconsin-Madison.
Smith Herbert L. and Garnier Maurice A. 1986. “Association Between Background and Educational Attainment in France.” Sociological Methods & Research 14:317–44.
Steinberg David M. and Hunter William G. 1984. “Experimental Design: Review and Comment.” Technometrics 26:71–97.
Straus Murray Arnold, Gelles Richard J., and Smith Christine. 1990. Physical Violence in American Families: Risk Factors and Adaptations to Violence in 8,145 Families. New Brunswick, NJ: Transaction.
Sudman Seymour, Bradburn Norman M., and Schwarz Norbert. 1996. Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco, CA: Jossey-Bass.
Thoits Peggy A. and Hewitt Lyndi N. 2001. “Volunteer Work and Well-being.” Journal of Health and Social Behavior 42:115–31.
Wong Raymond Sin-Kwok. 2010. Association Models. Thousand Oaks, CA: Sage.
