Flexible Random Intercept Models for Binary Outcomes Using Mixtures of Normals

Brian Caffo; Ming-Wen An; Charles Rohde

doi:10.1016/j.csda.2006.09.031

Author manuscript; available in PMC: 2008 Jul 15.

Published in final edited form as: Comput Stat Data Anal. 2007 Jul 15;51(11):5220–5235. doi: 10.1016/j.csda.2006.09.031


PMCID: PMC2031853 NIHMSID: NIHMS24237 PMID: 18628822

Abstract

Random intercept models for binary data are useful tools for addressing between-subject heterogeneity. Unlike linear models, the non-linearity of link functions used for binary data forces a distinction between marginal and conditional interpretations. This distinction is blurred in probit models with a normally distributed random intercept because the resulting model implies a probit marginal link as well. That is, this model is closed in the sense that the distributions associated with the marginal and conditional link functions and the random effect distribution are all of the same family. It is shown that the closure property is also attained when the distributions associated with the conditional and marginal link functions and the random effect distribution are mixtures of normals. The resulting flexible family of models is demonstrated to be related to several others present in the literature and can be used to synthesize several seemingly disparate modeling approaches. In addition, this family of models offers considerable computational benefits. A diverse series of examples is explored that illustrates the wide applicability of this approach.

Keywords: Probit-normal, logit-normal, marginalized multilevel models

1. Introduction

Random intercept models for binary data are useful tools for addressing between-subject heterogeneity. Typically, random intercept models are implemented by adding a normally distributed random effect into the linear predictor of a generalized linear model (or GLM, see Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989), giving rise to a generalized linear mixed model (or GLMM, see Breslow and Clayton, 1993). Because of the non-linearity of the link functions for binary GLMMs, such models force a distinction between parameter interpretations conditional on random effects and marginal interpretations averaged over random effects.

Random intercept models for binary outcomes with a probit link function and normally distributed random intercept (probit-normal models) have the interesting property that the marginal link function is the inverse of a normal cumulative distribution function (CDF). In this case, we say the model is “closed” in the sense that the distributions associated with the marginal and conditional link functions and the random effect distribution are all of the same family.

In this manuscript we explore a general family of closed random intercept models. In particular, we consider instances when the distribution associated with the conditional link function and the random effect distribution are mixtures of normals. Simple properties of mixtures of normals then imply that the distribution function associated with the marginal link function is also a mixture of normals. We emphasize both the conceptual and practical benefits of this class of models. Notably, we explore models that yield conditional and marginal interpretations of parameters.

To summarize the results, the proposed model offers the conceptual benefit of containing a wide class of common models for binary data as either special or limiting cases. Furthermore, we highlight some interesting practical advantages of these models. In particular, the added flexibility of placing a mixture distribution on the random effects protects against misspecification of this distribution. Placing a mixture distribution on the conditional link distribution allows both for easy post-hoc approximations of marginal link functions and for easier fitting of marginalized multilevel models. Finally, we demonstrate that the latent variable representations of the mixture distributions for the conditional link and random effect distributions can result in simple and elegant Gibbs samplers for Bayesian analysis.

The manuscript is laid out as follows. In Section 2 we present the notation and the model. In Section 3 we connect the mixture of normals model with several variants of random effect models in the literature. In Section 4 we illustrate with a diverse collection of useful applications of the mixture of normals approximation. Finally, in Section 5, we provide a summary and discussion of future work.

2. Random intercept model for binary outcomes

2.1. Notation

Consider the data given in Table 1, which arose from a teratology experiment (Weil, 1970), and was subsequently analyzed in Liang and Hanfelt (1994) and Heagerty and Zeger (1996). The objective is to compare the survival of rat pups in 16 control litters with that of the pups in the 16 treated litters. The treatment was a chemical agent administered to the mothers of each treated litter. We use this data set and experiment to motivate the model.

Table 1.

Teratology data. Numbers are (number survived, number dead) in each litter by treatment arm. For example, in the first control litter, all thirteen pups survived. Source: Weil (1970).

	(number survived, number dead)
Control	(13, 0) (12, 0) (9, 0) (9, 0) (8, 0) (8, 0) (12, 1) (11, 1)
	(9, 1) (9, 1) (8, 1) (11, 2) (4, 1) (5, 2) (7, 3) (7, 3)
Treatment	(12, 0) (11, 0) (10, 0) (9, 0) (10, 1) (9, 1) (9, 1) (8, 1)
	(8, 1) (4, 1) (7, 2) (4, 3) (5, 5) (3, 3) (3, 7) (0, 7)


Assume that {Y_ij} are repeated binary responses for subject/cluster i = 1, …, I and response j = 1, …, J_i. Therefore, in the Teratology data set, Y_ij represents survival or not (1 versus 0 respectively) for pup j from litter i. Let x_ij be a vector of covariates associated with Y_ij. For the Teratology data x_ij = (1, x_ij₁)^t, containing an intercept term and a treatment indicator, respectively.

Let $F_{w}^{- 1}$ be a link function (see McCullagh and Nelder, 1989) that relates the probability of a success to a function of the covariates. As is typical for binary data, we assume that F_w (the inverse link function) is a distribution function, referred to as the “link distribution”. We assume that

Pr (Y_{i j} = 1 ∣ U_{i} = u_{i}) = F_{w} (Δ_{i j} - u_{i}),

(1)

where the {U_i} are cluster-specific random effects, used to model correlation and heterogeneity arising from unmeasured covariates specific to a cluster. The {U_i} are assumed to be independent and identically distributed random variables, having distribution function F_u. Throughout we assume that the {Y_ij} are conditionally independent given the {U_i}.

Readers familiar with GLMMs will note two departures from common notation. First, the “transfer function”, Δ_ij, is typically omitted and replaced with a linear combination of the covariates and slope parameters, such as

Δ_{i j} = x_{i j}^{t} β^{c} .

(2)

This departure is adopted to consider a broader class of marginal and conditional models, which we describe in detail. Secondly, the random effect is subtracted in (1) rather than added, a convention that will be discussed below.

2.2. Conditional models

A conditional model specifies Δ_ij as in (2). The superscript c on the slope effects is used to denote that the effects are conditional, having an interpretation on the conditional link function’s scale.

Defining the Δ_ij as such implies a marginal model. Specifically

Pr (Y_{i j} = 1) = F_{q} (Δ_{i j}),

(3)

where F_q is the distribution of the sum of independent random variables having distribution functions F_u and F_w. To prove this fact, let {W_ij} be iid draws from F_w that are independent of the {U_i}, and note that

\begin{array}{rl}
Pr (Y_{i j} = 1) & = E_{U_{i}} [Pr (Y_{i j} = 1 ∣ U_{i} = u_{i})] \\
& = \int F_{w} (Δ_{i j} - u_{i}) d F_{u} (u_{i}) \\
& = \int Pr (W_{i j} \leq Δ_{i j} - u_{i} ∣ U_{i} = u_{i}) d F_{u} (u_{i}) \\
& = E_{U_{i}} [Pr (W_{i j} + u_{i} \leq Δ_{i j} ∣ U_{i} = u_{i})] \\
& = Pr (W_{i j} + U_{i} \leq Δ_{i j}) \\
& = F_{q} (Δ_{i j}) .
\end{array}

From this proof, we hope that the reason for the somewhat unusual convention of subtracting the random intercept is now clear.

We summarize the basic properties of the conditional model as

\begin{array}{ll}
\text{Conditional model} & Pr (Y_{i j} = 1 ∣ U_{i} = u_{i}) = F_{w} (Δ_{i j} - u_{i}) \\
\text{Transfer function} & Δ_{i j} = x_{i j}^{t} β^{c} \\
\text{Random effect distribution} & Pr (U_{i} \leq u_{i}) = F_{u} (u_{i}) \\
\text{Implied marginal model} & Pr (Y_{i j} = 1) = F_{q} (x_{i j}^{t} β^{c})
\end{array}

As an example, consider again the Teratology data set. Assume that F_w is the standard normal distribution, F_u is a normal distribution with 0 mean and variance $σ_{u, 1}^{2}$ , and Δ_ij is defined as in Equation 2. This model then corresponds to a probit-normal GLMM. By the standard properties of the normal distribution, the distribution of the sum of a standard normal (F_w) and a normal with mean 0 and variance $σ_{u, 1}^{2}$ (F_u) results in F_q being a normal distribution with 0 mean and variance $1 + σ_{u, 1}^{2}$ . Thus, using Equation 3, we have the well known result (see Zeger et al., 1988, for example) that the induced marginal model is

Pr (Y_{i j} = 1) = F_{q} (Δ_{i j}) = F_{q} (β_{0}^{c} + x_{i j 1} β_{1}^{c}) = Φ {\frac{β_{0}^{c} + x_{i j 1} β_{1}^{c}}{{(1 + σ_{u, 1}^{2})}^{1 / 2}}},

where Φ denotes the standard normal distribution function. Hence, the marginal link is also a probit, with the marginal effects being scaled versions of the conditional effects, $β^{c} / {(1 + σ_{u, 1}^{2})}^{1 / 2}$ .

To illustrate, the fitted values for the Teratology data set are ${\hat{β}}_{0}^{c} = 1.474$ , ${\hat{β}}_{1}^{c} = - .585$ and ${\hat{σ}}_{u, 1} = .749$ . Hence −.585 estimates the conditional probit-scale change in the probability of survival comparing the treated to the untreated pups. Correspondingly, ${\hat{β}}_{1}^{c} / {(1 + {\hat{σ}}_{u, 1}^{2})}^{1 / 2} = - .468$ estimates the marginal probit-scale change in the probability of survival.
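The rescaling can be checked directly. The following is a minimal sketch that uses only the fitted values quoted in the text; the variable names are illustrative and not from the original analysis.

```python
import math

# Fitted probit-normal values for the Teratology data, as quoted in the text.
beta0_c = 1.474
beta1_c = -0.585
sigma_u = 0.749

# Marginal probit coefficients are the conditional ones divided by
# sqrt(1 + sigma_u^2), the standard deviation of F_q.
scale = math.sqrt(1.0 + sigma_u ** 2)
beta0_m = beta0_c / scale
beta1_m = beta1_c / scale

print(round(beta1_m, 3))  # -0.468
```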

2.3. Marginal Models

Consider again the Teratology probit-normal example from the previous section, i.e., F_w is a standard normal and F_u is a normal with mean 0 and variance $σ_{u, 1}^{2}$ . Had we defined

Δ_{i j} = (β_{0}^{m} + x_{i j 1} β_{1}^{m}) {(1 + σ_{u, 1}^{2})}^{1 / 2},

then the marginal probability of success would satisfy

Pr (Y_{i j} = 1) = F_{q} (Δ_{i j}) = F_{q} {(β_{0}^{m} + x_{i j 1} β_{1}^{m}) {(1 + σ_{u, 1}^{2})}^{1 / 2}} = Φ (β_{0}^{m} + x_{i j 1} β_{1}^{m}) .

Therefore, the estimated slope parameters would have a marginal probit interpretation without rescaling; hence the superscript m. By the invariance of the MLE, the marginal slope estimate is identical to the rescaled estimate calculated from the conditional model, ${\hat{β}}_{1}^{m} = - .468$ .

The probit-normal example has emphasized that appropriately defining Δ_ij results in parameters with marginal interpretations. In fact, Heagerty and Zeger (2000) showed that this technique can be applied more generally. Specifically, consider defining

Δ_{i j} = F_{q}^{- 1} {F_{w} (x_{i j}^{t} β^{m})} .

(4)

Under this definition for Δ_ij and using (3), the marginal probability of success satisfies

Pr (Y_{i j} = 1) = F_{q} (Δ_{i j}) = F_{q} [F_{q}^{- 1} {F_{w} (x_{i j}^{t} β^{m})}] = F_{w} (x_{i j}^{t} β^{m})

That is, under an appropriate modification of Δ_ij, the slope parameters can be given a marginal interpretation with F_w as the link distribution. We summarize the marginal model with

\begin{array}{ll}
\text{Conditional model} & Pr (Y_{i j} = 1 ∣ U_{i} = u_{i}) = F_{w} (Δ_{i j} - u_{i}) \\
\text{Transfer function} & Δ_{i j} = F_{q}^{- 1} {F_{w} (x_{i j}^{t} β^{m})} \\
\text{Random effect distribution} & Pr (U_{i} \leq u_{i}) = F_{u} (u_{i}) \\
\text{Implied marginal model} & Pr (Y_{i j} = 1) = F_{w} (x_{i j}^{t} β^{m})
\end{array}
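As a consistency check, the general transfer function (4) can be specialized to the probit-normal case. The short derivation below is a sketch using only quantities already defined; it recovers the rescaled linear predictor used at the start of this section.

```latex
\begin{aligned}
F_{w}(w) &= \Phi(w), \qquad
F_{q}(q) = \Phi\{ q / (1 + \sigma_{u,1}^{2})^{1/2} \}, \\
F_{q}^{-1}(p) &= (1 + \sigma_{u,1}^{2})^{1/2} \, \Phi^{-1}(p), \\
\Delta_{ij} &= F_{q}^{-1}\{ F_{w}(x_{ij}^{t} \beta^{m}) \}
  = (1 + \sigma_{u,1}^{2})^{1/2} \, \Phi^{-1}\{ \Phi(x_{ij}^{t} \beta^{m}) \}
  = (1 + \sigma_{u,1}^{2})^{1/2} \, x_{ij}^{t} \beta^{m} .
\end{aligned}
```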

Consider again the Teratology data set. Allowing F_w to be the logistic distribution yields ${\hat{β}}_{1}^{m} = - .86$ , which estimates the marginal log-odds ratio of survival.

Marginalized multilevel models defined as such offer several advantages over competing methods. Unlike generalized estimating equations (GEE, see Liang and Zeger, 1986), they enjoy the benefits of a completely specified model, which include the ability to plot profile likelihoods, the availability of likelihood ratio tests and Bayesian analysis, and the relaxation of assumptions about missing data. Also, these models are more parsimonious and extensible than other marginal likelihood based models (see Lang and Agresti, 1994).

2.4. Mixtures of normals

The distinction between the conditional and marginal approaches is especially interesting for the probit-normal model because that model is closed: the conditional, random effect and marginal link distributions all belong to the same family. In this manuscript we present another closed random intercept model for binary data that is considerably more flexible than the probit-normal model. In particular, when F_w and F_u are mixtures of normal distributions, then so is F_q.

To prove this, consider a model of the form

F_{w} (w) = Σ_{l = 1}^{L_{w}} π_{w, l} Φ (\frac{w - μ_{w, l}}{σ_{w, l}}) and F_{u} (u) = Σ_{l = 1}^{L_{u}} π_{u, l} Φ (\frac{u - μ_{u, l}}{σ_{u, l}}),

where the {π_w,l} and {π_u,l} are each assumed to be greater than 0 and to sum to one. Using simple properties of mixtures of normals and Equation 3, we have that

F_{q} (q) = Σ_{l = 1}^{L_{w}} Σ_{l^{'} = 1}^{L_{u}} π_{w, l} π_{u, l^{'}} Φ {\frac{q - μ_{w, l} - μ_{u, l^{'}}}{{(σ_{w, l}^{2} + σ_{u, l^{'}}^{2})}^{1 / 2}}} .

(5)

That is, under this model, the random effect, conditional and marginal link distributions are all mixtures of normals. We summarize the model as

\begin{array}{ll}
\text{Conditional model} & Pr (Y_{i j} = 1 ∣ U_{i} = u_{i}) = Σ_{l = 1}^{L_{w}} π_{w, l} Φ (\frac{Δ_{i j} - u_{i} - μ_{w, l}}{σ_{w, l}}) \\
\text{Transfer function} & Δ_{i j} \text{ defined by either (2) or (4)} \\
\text{Random effect distribution} & Pr (U_{i} \leq u_{i}) = Σ_{l = 1}^{L_{u}} π_{u, l} Φ (\frac{u_{i} - μ_{u, l}}{σ_{u, l}}) \\
\text{Implied marginal model} & Pr (Y_{i j} = 1) = Σ_{l = 1}^{L_{w}} Σ_{l^{'} = 1}^{L_{u}} π_{w, l} π_{u, l^{'}} Φ {\frac{Δ_{i j} - μ_{w, l} - μ_{u, l^{'}}}{{(σ_{w, l}^{2} + σ_{u, l^{'}}^{2})}^{1 / 2}}}
\end{array}

(6)
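The closure property in Equation 5 amounts to pairing every component of F_w with every component of F_u. The sketch below illustrates this under the assumption that a mixture is stored as a list of (weight, mean, standard deviation) tuples; the function names are hypothetical.

```python
import math
from itertools import product


def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def convolve_mixtures(w_components, u_components):
    """Components of F_q (Equation 5) from those of F_w and F_u.

    Each component is a (weight, mean, sd) tuple: weights multiply,
    means add, and variances add.
    """
    return [
        (pw * pu, mw + mu, math.sqrt(sw ** 2 + su ** 2))
        for (pw, mw, sw), (pu, mu, su) in product(w_components, u_components)
    ]


def mixture_cdf(x, components):
    """Distribution function of a normal mixture evaluated at x."""
    return sum(p * phi_cdf((x - m) / s) for p, m, s in components)


# Probit-normal special case: one component each, so F_q is N(0, 1 + sigma_u^2).
f_w = [(1.0, 0.0, 1.0)]
f_u = [(1.0, 0.0, 0.749)]
f_q = convolve_mixtures(f_w, f_u)
print(f_q)
```

With L_w and L_u components the result has L_w × L_u components, so the marginal link stays cheap to evaluate for small mixtures.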

To summarize, the model of interest in this manuscript combines the conditional and marginal approaches, while adding the constraint that the conditional link and random effect distributions are both mixtures of normals. For completeness, we add that the log-likelihood for (6) is

Σ_{i = 1}^{I} log \int_{u_{i}} \prod_{j = 1}^{J_{i}} F_{w} {(Δ_{i j} - u_{i})}^{y_{i j}} {1 - F_{w} (Δ_{i j} - u_{i})}^{1 - y_{i j}} d F_{u} (u_{i})

(7)

(an equation that holds regardless of whether F_w and F_u are mixtures of normals).
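To make the log-likelihood (7) concrete, the sketch below evaluates it for a normal random intercept, $U_{i} \sim N (0, σ_{u}^{2})$ . It deliberately uses a plain trapezoid rule on a truncated grid rather than any particular quadrature scheme, and the function names, grid size and truncation width are assumptions chosen for illustration.

```python
import math


def phi_cdf(x):
    """Standard normal distribution function (used as the default F_w)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(u, sd):
    """Density of N(0, sd^2) at u."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def cluster_likelihood(y, delta, sigma_u, link_cdf=phi_cdf, n_grid=2001, width=8.0):
    """One cluster's contribution to (7), integrating u over +/- width * sigma_u."""
    lo = -width * sigma_u
    step = 2.0 * width * sigma_u / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * step
        integrand = normal_pdf(u, sigma_u)
        for y_ij, d_ij in zip(y, delta):
            p = link_cdf(d_ij - u)  # random intercept is subtracted, as in (1)
            integrand *= p if y_ij == 1 else 1.0 - p
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * integrand
    return total * step


def log_likelihood(clusters, sigma_u, link_cdf=phi_cdf):
    """Sum of log cluster likelihoods; clusters is a list of (y, delta) pairs."""
    return sum(
        math.log(cluster_likelihood(y, delta, sigma_u, link_cdf))
        for y, delta in clusters
    )
```

For a cluster with a single observation the integral reduces to the marginal probability in Equation 3, which gives a convenient check on any numerical integration routine.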

Of course, Model 6 is excessively rich with all of the mixture probabilities, means and variances left unspecified; estimating both the conditional link distribution and the random effect distribution is a hopeless cause for most binary data sets. However, by specifying components of one or both of the free mixture distributions, one can achieve a variety of important models. We emphasize the following uses. First, specifying the mixture components for the conditional link distribution yields the ability to (approximately) fit models with other link functions, such as the logit, while retaining many of the computational benefits of probit links. Secondly, estimating a small number of mixture components on the random effects can be used to protect against canonical forms of misspecification. Finally, we find that using mixtures of normals for the conditional link and the random effect distribution leads to particularly convenient Gibbs samplers for conditional models. While many of these ideas have been explored, to our knowledge, they have not been placed in a unified modeling framework combining both conditional and marginal approaches. In the following literature review we highlight connections with existing research.

3. Literature review

In this section we demonstrate that Model 6 contains several important random intercept models for binary data as special or limiting cases. Clearly if F_u is degenerate at 0 and $Δ_{i j} = x_{i j}^{t} β^{c}$ , then the model yields a GLM for binary data. Extending this setting so that F_u is not degenerate, with L_u = 1 and $μ_{u, 1} = 0$ , yields a GLMM for binary data with a normally distributed random intercept (see Breslow and Clayton, 1993; Agresti et al., 2000).

To be technical, only those GLMs and GLMMs for binary data whose conditional link distribution, F_w, is a mixture of normals are special cases of the model we have suggested. However, all of the common link functions (logit, complementary log-log) can be obtained as limiting cases. In Appendix C we provide an algorithm to solve for π_w,l, σ_w,l and μ_w,l that yields very accurate approximations for a finite number of mixture components.

As an example, consider a mixture of normals as an approximation of the logistic distribution. Using the algorithm in Appendix C with 150 quadrature points and {μ_w,l} = {0} yields the values given in Table 2. Figure 1 shows how accurate the approximation is by plotting the exact logistic quantiles against those of the mixture of normals approximation. The mixture of normals approximation, with 5 mixture components, is nearly exact to logits of ± 10. By comparison, the plot also shows the standard normal and T quantiles, both of which are also used as approximations to the logit (see Caffo and Griswold, 2005). The linearity of the probit approximation breaks down at logits of around ± 3, and that of the T approximation at around ± 5. Furthermore, we note that the mixture of normals approximation applies generally, to links other than the logistic, and can be made more accurate by simply adding more mixture components.

Table 2.

Mixing probabilities, standard deviations and means of the mixture components for a mixture-of-normals approximation to the logistic distribution.

π₁	π₂	π₃	π₄	π₅
0.126840496	0.543170220	0.261711982	0.066181589	0.002066853
σ₁	σ₂	σ₃	σ₄	σ₅
2.8420536	1.8257138	1.1943048	1.0757749	0.5631853
μ₁	μ₂	μ₃	μ₄	μ₅
0	0	0	0	0


Fig. 1 — Quantile-quantile plot of the logistic distribution (vertical axis) by three approximations: the mixture of normals (solid), the probit (dotted), the T (dashed). A reference identity line is depicted in grey. The corresponding probability scale is given on the right and upper axes.
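The accuracy of the Table 2 approximation can also be checked numerically on the probability scale. The sketch below compares the 5-component mixture with the exact logistic distribution function on a grid of logits from −10 to 10; the comparison with a single variance-matched normal is an addition made here for contrast and is not one of the curves in Figure 1.

```python
import math

# Mixing probabilities and standard deviations from Table 2 (all means are 0).
PI_W = [0.126840496, 0.543170220, 0.261711982, 0.066181589, 0.002066853]
SD_W = [2.8420536, 1.8257138, 1.1943048, 1.0757749, 0.5631853]


def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x):
    """Exact logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def mixture_logistic_cdf(x):
    """Mixture of normals approximation to the logistic distribution function."""
    return sum(p * phi_cdf(x / s) for p, s in zip(PI_W, SD_W))


grid = [k / 10.0 for k in range(-100, 101)]
mixture_error = max(abs(mixture_logistic_cdf(x) - logistic_cdf(x)) for x in grid)

# Single normal with the logistic variance pi^2 / 3, for contrast only.
sd_logistic = math.pi / math.sqrt(3.0)
normal_error = max(abs(phi_cdf(x / sd_logistic) - logistic_cdf(x)) for x in grid)

print(mixture_error, normal_error)
```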

Approximating the logistic distribution with a single normal distribution or mixture of normals has a rich history (see Demidenko, 2004, and the references therein). Perhaps most relevant, Monahan and Stefanski (1992) used weighted Gaussian distributions to explore the logistic-normal integral.

Representing the link function by a latent variable was considered in the proof of Equation 3. In Section 4.3 we consider a much more ambitious latent variable representation of Model 6, using latent variables to represent the normal mixture distributions as well. The general latent variable approach to binary data was considered in Albert and Chib (1993), who also introduced a Gibbs sampler that motivates the one presented in Section 4.3. Relevant extensions to multivariate settings were considered in Chib and Greenberg (1998); however they focused on probit links and more general covariance structures than the random intercept models considered here.

Consider again the instance where L_u = 1 and $μ_{u, 1} = 0$ (the random intercept is normally distributed). As described in Section 2.3, Heagerty and Zeger (2000) defined the Δ_ij to be non-linear (see Equation 4), so that the slope parameters have linear interpretations on the marginal link’s scale. These marginalized multilevel models for binary data are a special case of the models presented (by appropriately defining Δ_ij). Moreover, later we demonstrate that using mixtures of normals for the link distribution can greatly facilitate computing for these models.

A potentially negative aspect of this model is that, because Δ_ij is defined non-linearly, the conditional model is non-linear. The degree to which this is true depends on how close to linear $F_{q}^{- 1} F_{w}$ is. However, this may be of no concern whatsoever if only marginal interpretations are required (though see Lee and Nelder, 2004).

A clear generalization of the marginalized model would replace F_w in Equation 4 with any other desired link distribution, thus, allowing the conditional and marginal link functions to be different. This idea was explored in Griswold (2005) and extended to ordered multinomial data in Caffo and Griswold (2005). Again, this approach easily fits into the current framework by appropriately redefining Δ_ij.

There has been a relatively small amount of research using mixtures of normals to estimate the link distribution. Geweke and Keane (1997) used a mixture of normals as a link function for dichotomous choice models. They presented an MCMC algorithm for fitting the model, including estimating the mixture components. In related work, Erkanli et al. (1993) used mixtures of normals to estimate the link function for ordinal data models and also presented an MCMC algorithm for estimating the mixture components. These approaches are conceptually related to the proposed model by forcing the random effect distribution to be degenerate at 0 and estimating {π_w,l}, {μ_w,l} and {σ_w,l}.

In contrast, using mixtures to estimate the random effect distribution has received much more attention. Perhaps most relevant, Magder and Zeger (1996) used mixtures of normals as the random effect distribution and estimated the mixture parameters with an MCMC algorithm. This corresponds to estimating the {μ_u,l}, {σ_u,l} and {π_u,l}. Aitkin (1999) and Follmann and Lambert (1989) used discrete mixtures to non-parametrically estimate the random effect distribution using maximum likelihood. Such models are obtained under the current framework as the {σ_u,l} tend to 0 while the {μ_u,l} and {π_u,l} are estimated.

4. Examples

In this section we explore a subset of practical considerations illustrated through four data sets. We explore two marginal and one conditional modeling settings, where computations are significantly simplified by using mixtures of normals. Moreover, we consider a case where mixture modeling of the random effect offers additional protection against model misspecification.

We consider four well studied data sets for illustration:

The Teratology data, introduced in Section 2.1.
The Approval Rating data set given in Table A.1. This 2×2 contingency table cross-classifies approval ratings of the British Prime Minister collected at two occasions. Here, Y_ij represents approval (1) or not (0) for individual i on occasion j, where j = 1, 2 for the two sampling occasions. The covariate vector, x_ij = (1, x_ij₁)^t, contains an intercept term and an indicator function representing occasion, taking the value 1 when j = 2.
The Crossover data, given in Table A.2, concerns a well-studied crossover study from Jones and Kenward (1987). Here, Y_ij represents an abnormal (1) or a normal (0) response for subject i during period j for j = 1, 2. The objective is to study the response in relation to the treatment and period. Thus, x_ij = (1, x_ij₁, x_ij₂)^t contains an intercept term, a treatment indicator and a period indicator, taking the value 1 for the second period.
The Item Response data, given in Table A.3, concerns subjects’ response to three scenarios (given in the table) on abortion stratified by gender. We let Y_ij be the response of subject i on question j, where a response of 1 is supportive of legalized abortion (and 0 is not). The covariate vector, x_ij = (1, x_ij₁, x_ij₂, x_ij₃)^t, contains an intercept term, an indicator for male gender, an indicator for Scenario 1, and an indicator for Scenario 2, respectively. We use the Item Response data to illustrate an instance where a mixture random effect distribution is warranted.

To focus this discussion, we assume that the principal parameter of interest for each data set is: the (marginal or conditional) log odds-ratio comparing treated to controls in the Teratology data, the log odds-ratio comparing time 2 to time 1 for the Approval Rating data, the log odds-ratio comparing treated to controls in the Crossover data and the log odds-ratio comparing males to females in the Item response data. Therefore, in each case the regressor corresponding to the effect of interest is x_ij₁.

4.1. Post-hoc calculation of marginal effects

Given results from a conditional random effect model, an obvious question is, “What are the corresponding marginal effects and link distribution?”. Such a question is especially relevant when interpreting published results, where only effect estimates (and not the original data) are available. Model 6 allows one to approximate the necessary calculations easily.

Consider the conditional logit model

logit {Pr (Y_{i j} = 1 ∣ U_{i} = u_{i})} = x_{i j}^{t} β^{c} - u_{i} and U_{i} \sim N (0, σ_{u, 1}^{2}) .

(8)

If we are willing to accept the approximation that F_w is the 5-component mixture of normals given in Table 2, then Model 8 is simply a special case of Model 6. Hence, we have that

Pr (Y_{i j} = 1) = F_{q} (x_{i j}^{t} {\hat{β}}^{c}) = Σ_{l = 1}^{L_{w}} π_{w, l} Φ {\frac{x_{i j}^{t} {\hat{β}}^{c} - μ_{w, l}}{{(σ_{w, l}^{2} + σ_{u, 1}^{2})}^{1 / 2}}},

(9)

where {π_w,l}, {μ_w,l} and {σ_w,l} are from Table 2.

Below we use this approximation to obtain marginal logit interpretations from conditional logit models. However, before doing so, we emphasize the benefits of Equation 9 over Monte Carlo and numerical integration, which can also give very accurate approximations of marginal effects. For example, unlike numerical integration or Monte Carlo approximations, the approximation (9) can be performed quickly and easily. In addition, obtaining delta method estimates of standard errors is also easy. Furthermore, the method applies to any conditional link function, provided the relevant mixture components are known. Finally, and perhaps most importantly, we note that this method leads to an accurate and simple approximation to the marginal link distribution, F_q, whereas quadrature or Monte Carlo approximations only yield F_q for specific values of the covariates.

Table 4 gives estimated marginal logit effects for the four data sets calculated using (9). To illustrate the calculations, consider the Teratology data set. The fitted values (SE) from Model 8 using the SAS procedure NLMIXED are ${\hat{β}}_{0}^{c} = 2.63 (0.48)$ , ${\hat{β}}_{1}^{c} = - 1.08 (0.63)$ and ${\hat{σ}}_{u, 1} = 1.35 (0.33)$ . Plugging the estimated parameters into (9) yields a marginal probability of survival of 0.76 for the treated and 0.88 for the untreated. Then, the marginal log odds ratio of survival (SE) comparing the treated to the control litters is logit(0.76) − logit(0.88) = −0.86 (0.51) (see Appendix D for details about obtaining standard errors).
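The Teratology calculation can be reproduced with a few lines of code. The sketch below plugs the conditional estimates quoted above and the Table 2 components into (9); it does not attempt the delta-method standard errors of Appendix D, and the function names are illustrative.

```python
import math

# Mixture components approximating the logistic distribution (Table 2; means 0).
PI_W = [0.126840496, 0.543170220, 0.261711982, 0.066181589, 0.002066853]
SD_W = [2.8420536, 1.8257138, 1.1943048, 1.0757749, 0.5631853]


def phi_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_prob(eta_c, sigma_u):
    """Equation 9: marginal success probability at conditional predictor eta_c."""
    return sum(
        p * phi_cdf(eta_c / math.sqrt(s ** 2 + sigma_u ** 2))
        for p, s in zip(PI_W, SD_W)
    )


def logit(p):
    return math.log(p / (1.0 - p))


# Conditional logit-normal estimates for the Teratology data (Table 3).
beta0_c, beta1_c, sigma_u = 2.63, -1.08, 1.35

p_control = marginal_prob(beta0_c, sigma_u)
p_treated = marginal_prob(beta0_c + beta1_c, sigma_u)
marginal_log_or = logit(p_treated) - logit(p_control)

print(round(p_treated, 2), round(p_control, 2))
```

The log odds ratio should be formed from the unrounded probabilities; using the rounded values 0.76 and 0.88 gives a slightly different figure.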

Table 4.

Marginal logit estimates (standard errors) for the examples from Section 4.1.

Data Set	Marginal Estimate ( ${\hat{β}}_{1}^{m}$ )
Teratology	−0.86 (0.51)
Approval	−0.16 (0.04)
Crossover	Period 1 0.59 (0.31)	Period 2 10.58 (0.29)
Item Response	Question 1 −0.002 (0.03)	Question 2 −0.002 (0.03)	Question 3 −0.002 (0.03)


Table 4 applies these techniques to the three other data sets as well, each time taking the conditional estimates output by SAS (Table 3). Because of the additional covariates in the Crossover and Item Response data sets, the estimated marginal logit effects are reported within strata.

Table 3.

Conditional Estimates (standard errors).

	Parameter
Data Set	${\hat{σ}}_{u, 1}$	${\hat{β}}_{0}^{c}$	${\hat{β}}_{1}^{c}$	${\hat{β}}_{2}^{c}$	${\hat{β}}_{3}^{c}$
Teratology	1.35 (0.33)	2.63 (0.48)	−1.08 (0.63)
Approval	5.16 (0.35)	1.24 (0.19)	−0.56 (0.14)
Crossover	4.94 (1.91)	2.22 (1.17)	1.86 (0.93)	−1.04 (0.82)
Item Response	8.75 (0.54)	−0.61 (0.34)	−0.013 (0.49)	0.83 (0.16)	0.29 (0.16)


4.2. Easier marginalized multilevel models

The previous section addressed the issue of obtaining marginal effects from conditional results, which is useful when interpreting published results without access to the underlying data. However, when the data are available and marginal interpretations are desired, direct fitting is preferable. This section illustrates how the mixture of normals modeling framework can ease the calculations required to directly obtain marginal estimates.

We consider the marginal Model 6 where Δ_ij is given by Equation 4. Furthermore, assume that $U_{i} \sim N (0, σ_{u, 1}^{2})$ and that F_w is the 5-component mixture of normals approximation to the logistic distribution function.

The benefit of taking F_w to be a mixture of normals rather than the exact logistic distribution is that F_q then has a closed form (see Equation 5), and its quantiles can easily be calculated using Newton’s method. Hence, representing the logistic distribution as such eliminates the difficult task of numerically approximating the convolution integral defining F_q and its inverse. It should be emphasized that while defining F_w as a mixture of normals eases the calculation of F_q and hence Δ_ij, calculation of the likelihood (7) still requires numerical integration, for which we employed Gauss-Hermite quadrature.
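The quantile calculation can be sketched as follows. This is an illustration rather than the implementation used for the reported fits: it applies Newton’s method to the closed-form F_q, with a bisection fallback added here as a safeguard, and it renormalizes the Table 2 weights because the printed values sum to slightly less than one. All function names are hypothetical.

```python
import math

# Table 2 components (means 0); weights renormalized to sum to one.
_RAW_PI = [0.126840496, 0.543170220, 0.261711982, 0.066181589, 0.002066853]
PI_W = [p / sum(_RAW_PI) for p in _RAW_PI]
SD_W = [2.8420536, 1.8257138, 1.1943048, 1.0757749, 0.5631853]


def phi_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def fq_cdf(x, sigma_u):
    """Equation 5 with F_w from Table 2 and U_i ~ N(0, sigma_u^2)."""
    return sum(p * phi_cdf(x / math.hypot(s, sigma_u)) for p, s in zip(PI_W, SD_W))


def fq_pdf(x, sigma_u):
    """Density of F_q, the derivative needed for the Newton step."""
    return sum(
        p * phi_pdf(x / math.hypot(s, sigma_u)) / math.hypot(s, sigma_u)
        for p, s in zip(PI_W, SD_W)
    )


def fq_quantile(prob, sigma_u, tol=1e-12, max_iter=200):
    """Solve F_q(x) = prob by Newton's method, kept inside a bracket [lo, hi]."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = -1.0, 1.0
    while fq_cdf(lo, sigma_u) > prob and lo > -1e6:
        lo *= 2.0
    while fq_cdf(hi, sigma_u) < prob and hi < 1e6:
        hi *= 2.0
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        err = fq_cdf(x, sigma_u) - prob
        if abs(err) < tol:
            break
        if err > 0.0:
            hi = x
        else:
            lo = x
        dens = fq_pdf(x, sigma_u)
        x_new = x - err / dens if dens > 0.0 else lo
        if not lo < x_new < hi:  # Newton step left the bracket: bisect instead
            x_new = 0.5 * (lo + hi)
        x = x_new
    return x


def transfer(eta_m, sigma_u):
    """Equation 4: Delta = F_q^{-1}{F_w(eta_m)}; F_w is F_q with sigma_u = 0."""
    return fq_quantile(fq_cdf(eta_m, 0.0), sigma_u)
```

For example, `transfer(1.16, 1.35)` returns the conditional-scale value Δ whose marginal probability under F_q equals the approximate logistic probability at a marginal linear predictor of 1.16.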

We implemented this model for the four data sets. We highlight the use of profile likelihoods, the functions obtained by maximizing the likelihood for each value of the parameter of interest. See Royall (1997) for more information regarding the benefits and interpretation of profile likelihoods.

The results of the model fits are given in Table 5. For example, for the Teratology data, −0.86 (the estimate for $β_{1}^{m}$ ) estimates the change in the marginal log-odds of survival comparing a treated pup to a control. For each of the four data sets, Figure 2 shows the profile likelihood, with 1/8 and 1/16 reference lines (see Royall, 1997), for the parameter of interest ( $β_{1}^{m}$ ) and the variance component ( $σ_{u, 1}$ ).

Table 5.

Estimates (standard errors) for marginalized multilevel models from Section 4.2.

	Parameter
Data Set	${\hat{σ}}_{u, 1}$	${\hat{β}}_{0}^{m}$	${\hat{β}}_{1}^{m}$	${\hat{β}}_{2}^{m}$	${\hat{β}}_{3}^{m}$
Teratology	1.35 (0.33)	2.03 (0.39)	−0.87 (0.51)
Approval	5.16 (0.35)	0.36 (0.05)	−0.16 (0.04)
Crossover	4.94 (1.91)	0.68 (0.28)	0.58 (0.23)	−0.32 (0.23)
Item Response	8.71 (0.54)	−0.048 (0.054)	0.004 (0.074)	0.150 (0.028)	0.053 (0.028)


Fig. 2 — Profile likelihood plots with 1/8 and 1/16 reference lines (see Royall, 1997) for $β_{1}^{m}$ and $σ_{u, 1}$ for the marginalized multilevel model from Section 4.2. The rows from top to bottom correspond to the Teratology, Approval, Crossover and Item Response data sets respectively.

4.3. Bayesian analysis

In this section, we illustrate how specific instances of Model (6) are particularly well suited for Bayesian analysis via MCMC. We note that similar methods utilizing latent variables have been proposed to simulate from the posterior distributions of parameters for binary and multinomial responses (see Albert and Chib, 1993; McCulloch and Rossi, 1994; Chib et al., 1998; Imai and van Dyk, 2005). In addition, close variants of the sampling schemes can be used for the Monte Carlo EM algorithm (see Chib et al., 1998; Natarajan et al., 2000).

We apply these methods to binary responses with random effects, using the mixture of normals link approximation (similar to Geweke and Keane, 1997; Erkanli et al., 1993). Consider the latent variable representation of Model (6) given by

1. the $\{D_{u, i}\}$ are iid discrete random variables with support 1, …, L_u so that $Pr (D_{u, i} = l) = π_{u, l}$ ,
2. the {U_i}, given that the $\{D_{u, i} = d_{u, i}\}$ , are independent $N (μ_{u, d_{u, i}}, σ_{u, d_{u, i}}^{2})$ ,
3. the $\{D_{w, i j}\}$ are iid discrete random variables with support 1, …, L_w so that $Pr (D_{w, i j} = l) = π_{w, l}$ ,
4. the {M_ij}, given that the $\{D_{w, i j} = d_{w, i j}\}$ and {U_i = u_i}, are independent normals with mean $μ_{w, d_{w, i j}} + u_{i} - Δ_{i j}$ and variance $σ_{w, d_{w, i j}}^{2}$ ,
5. each Y_ij is 1 if and only if M_ij ≤ 0, and 0 otherwise,
6. each $Δ_{i j} = x_{i j}^{t} β^{c}$ .

To summarize the model, items 1 and 2 yield the mixture model for $F_{u}$, items 3–5 yield the conditional model for the $y_{ij}$ and item 6 forces a conditional interpretation for $β^{c}$. To prove that items 3–5 induce the mixture of normals model for the $y_{ij}$, consider

\begin{array}{rl} Pr(Y_{ij} = 1 \mid U_{i} = u_{i}) & = Pr(M_{ij} \leq 0 \mid U_{i} = u_{i}) \\ & = \sum_{l=1}^{L_{w}} Pr(M_{ij} \leq 0 \mid U_{i} = u_{i}, D_{w,ij} = l) Pr(D_{w,ij} = l) \\ & = \sum_{l=1}^{L_{w}} Φ\left(\frac{Δ_{ij} - μ_{w,l} - u_{i}}{σ_{w,l}}\right) π_{w,l}. \end{array}
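
The identity above can be checked numerically. The following is a minimal sketch that simulates items 3–5 of the latent variable representation for a fixed $u_{i}$ and $Δ_{ij}$ and compares the simulated frequency of $Y_{ij} = 1$ with the closed-form mixture probability. The mixture weights, means, standard deviations and the values of `delta` and `u` are made-up illustrative numbers, not values from the paper.

```python
import math
import random

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cond_prob(delta, u, pi_w, mu_w, sigma_w):
    """Pr(Y = 1 | U = u) = sum_l pi_l * Phi((delta - mu_l - u) / sigma_l)."""
    return sum(p * norm_cdf((delta - m - u) / s)
               for p, m, s in zip(pi_w, mu_w, sigma_w))

def simulate_y(delta, u, pi_w, mu_w, sigma_w, rng):
    l = rng.choices(range(len(pi_w)), weights=pi_w)[0]  # item 3: component label
    m = rng.gauss(mu_w[l] + u - delta, sigma_w[l])      # item 4: latent M_ij
    return 1 if m <= 0 else 0                           # item 5: Y_ij = 1 iff M_ij <= 0

# Illustrative (hypothetical) two-component mixture and linear predictor.
pi_w, mu_w, sigma_w = [0.3, 0.7], [-0.5, 0.2], [1.0, 2.0]
delta, u = 0.4, -0.3

rng = random.Random(1)
n_sim = 100_000
freq = sum(simulate_y(delta, u, pi_w, mu_w, sigma_w, rng) for _ in range(n_sim)) / n_sim
exact = cond_prob(delta, u, pi_w, mu_w, sigma_w)
print(freq, exact)
```

The two printed numbers should agree up to Monte Carlo error, which is of order $n^{-1/2}$ in the number of simulated draws.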

We complete the Bayesian model by specifying that $β^{c} \sim Normal(μ_{β^{c}}, Σ)$ and $σ_{u,l}^{2} \sim IG(ν, τ)$. In the examples where the random effect mixture distribution had more than one component, the $\{μ_{u,l}\}$ were independent normals with mean η and variance θ and the $\{π_{u,l}\}$ were Dirichlet with shape parameters α. A small complication is that the mean of the random effect distribution is aliased with an intercept parameter. Therefore, throughout this section we assume that the intercept term is excluded and the random effect mean is estimated instead. A second complication could arise when the random effect mixture distribution has more than one component and the $\{π_{u,l}\}$, $\{μ_{u,l}\}$ and $\{σ_{u,l}\}$ are of primary interest, because these parameters are not identifiable owing to the permutation invariance of the likelihood. In the settings we present this is not an issue, because we focus on label-invariant parameters, such as $β^{c}$. In our investigations of the label-dependent parameters, we addressed this issue by imposing an ordering on the $\{μ_{u,l}\}$. See Jasra et al. (2005) and the references therein for further information on the “label-switching” problem.
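
One simple way to impose an ordering on the component means is to relabel each posterior draw after sampling so that the means are increasing. The sketch below shows this post hoc relabeling; the text does not say whether the ordering was enforced inside the sampler or afterwards, so this should be read as one possible approach, and the function name and toy draws are hypothetical.

```python
import numpy as np

def relabel_by_mean(pi_draws, mu_draws, sigma_draws):
    """Permute components within each draw so that mu_1 <= ... <= mu_L.

    Each argument has shape (n_draws, L); row t holds draw t.
    """
    order = np.argsort(mu_draws, axis=1)
    reorder = lambda a: np.take_along_axis(a, order, axis=1)
    return reorder(pi_draws), reorder(mu_draws), reorder(sigma_draws)

# Two toy draws of a two-component mixture; the labels are swapped in the second.
pi_draws = np.array([[0.30, 0.70], [0.72, 0.28]])
mu_draws = np.array([[-1.0, 2.0], [2.1, -0.9]])
sigma_draws = np.array([[1.0, 0.5], [0.6, 1.1]])

pi_s, mu_s, sigma_s = relabel_by_mean(pi_draws, mu_draws, sigma_draws)
print(mu_s)
```

Label-invariant quantities such as $β^{c}$ are unaffected by the permutation, which is why the relabeling only matters when the component-specific parameters are themselves summarized.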

The benefit of this model specification is that all of the full conditionals are common distributions, so an elegant Gibbs sampler that does not employ any Metropolis-Hastings steps is available for exploring the posterior. The full conditionals are as follows:

the full conditional for $D_{u,i}$ is discrete so that the probability $D_{u,i}$ takes value l is
$\frac{σ_{u,l}^{-1} \exp\{-(u_{i} - μ_{u,l})^{2} / (2 σ_{u,l}^{2})\} π_{u,l}}{\sum_{k} σ_{u,k}^{-1} \exp\{-(u_{i} - μ_{u,k})^{2} / (2 σ_{u,k}^{2})\} π_{u,k}};$
the full conditional for $U_{i}$ is normal with mean
$\left(\sum_{j} σ_{w,d_{w,ij}}^{-2} + σ_{u,d_{u,i}}^{-2}\right)^{-1} \left(\sum_{j} \frac{m_{ij} - μ_{w,d_{w,ij}} + Δ_{ij}}{σ_{w,d_{w,ij}}^{2}} + \frac{μ_{u,d_{u,i}}}{σ_{u,d_{u,i}}^{2}}\right)$
and variance
$\left(\sum_{j} σ_{w,d_{w,ij}}^{-2} + σ_{u,d_{u,i}}^{-2}\right)^{-1};$
the full conditional for $D_{w,ij}$ is discrete so that the probability $D_{w,ij}$ takes value l is
$\frac{σ_{w,l}^{-1} \exp\{-(m_{ij} - μ_{w,l} - u_{i} + Δ_{ij})^{2} / (2 σ_{w,l}^{2})\} π_{w,l}}{\sum_{k} σ_{w,k}^{-1} \exp\{-(m_{ij} - μ_{w,k} - u_{i} + Δ_{ij})^{2} / (2 σ_{w,k}^{2})\} π_{w,k}};$
the full conditional for $M_{ij}$ is truncated normal with mean $μ_{w,d_{w,ij}} + u_{i} - Δ_{ij}$ and variance $σ_{w,d_{w,ij}}^{2}$, with $M_{ij} \leq 0$ when $y_{ij} = 1$ and $M_{ij} > 0$ when $y_{ij} = 0$; that is, the distribution function is
$\frac{Φ\{(m_{ij} - μ_{w,d_{w,ij}} - u_{i} + Δ_{ij}) / σ_{w,d_{w,ij}}\}}{Φ\{(-μ_{w,d_{w,ij}} - u_{i} + Δ_{ij}) / σ_{w,d_{w,ij}}\}} I(m_{ij} \leq 0)$
when $y_{ij} = 1$ and
$\frac{Φ\{(m_{ij} - μ_{w,d_{w,ij}} - u_{i} + Δ_{ij}) / σ_{w,d_{w,ij}}\} - Φ\{(-μ_{w,d_{w,ij}} - u_{i} + Δ_{ij}) / σ_{w,d_{w,ij}}\}}{1 - Φ\{(-μ_{w,d_{w,ij}} - u_{i} + Δ_{ij}) / σ_{w,d_{w,ij}}\}} I(m_{ij} \geq 0)$
when $y_{ij} = 0$;
the full conditional for $β^{c}$ is multivariate normal with mean
$(Σ^{-1} + X^{t} W^{-1} X)^{-1} (Σ^{-1} μ_{β^{c}} + X^{t} W^{-1} ξ)$
and variance
$(Σ^{-1} + X^{t} W^{-1} X)^{-1},$
where X is the design matrix, W is a diagonal matrix of the $σ_{w,d_{w,ij}}^{2}$ and ξ is a vector with elements $μ_{w,d_{w,ij}} + u_{i} - m_{ij}$;
the full conditional for $σ_{u,l}^{2}$ is inverted gamma with shape parameter
$ν + \sum_{i} I(d_{u,i} = l) / 2$
and rate parameter
$τ + \sum_{i} I(d_{u,i} = l) (u_{i} - μ_{u,l})^{2} / 2;$
the full conditional for $μ_{u,l}$ is normal with mean
$\left\{\sum_{i} I(d_{u,i} = l) \frac{1}{σ_{u,d_{u,i}}^{2}} + \frac{1}{θ}\right\}^{-1} \left\{\sum_{i} I(d_{u,i} = l) \frac{u_{i}}{σ_{u,d_{u,i}}^{2}} + \frac{η}{θ}\right\}$
and variance
$\left\{\sum_{i} I(d_{u,i} = l) \frac{1}{σ_{u,d_{u,i}}^{2}} + \frac{1}{θ}\right\}^{-1};$
the full conditional for the $\{π_{u,l}\}$ is Dirichlet with shape parameter
$α + \sum_{i} \{I(d_{u,i} = 1), \dots, I(d_{u,i} = L_{u})\}^{t}.$
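
To make the sampling scheme concrete, here is a minimal sketch of one Gibbs sweep for the special case of a single normal random intercept ($L_{u} = 1$), so the updates for $D_{u,i}$ and the $\{π_{u,l}\}$ drop out. It is not the authors' code. The two-component mixture used for $F_{w}$, the simulated data, and all names (`gibbs_sweep`, `mix`, `prior`, `subj`) are illustrative assumptions; in particular the mixture is a placeholder, not the five-component logistic approximation used in the paper.

```python
import numpy as np
from statistics import NormalDist

_nd = NormalDist()
Phi = np.vectorize(_nd.cdf)          # standard normal cdf, elementwise
Phi_inv = np.vectorize(_nd.inv_cdf)  # standard normal quantile, elementwise

def gibbs_sweep(s, y, X, subj, mix, prior, rng):
    """One sweep for L_u = 1; F_w is a fixed mixture (pi_w, mu_w, sig_w)."""
    pi_w, mu_w, sig_w = mix
    beta, u, m, mu_u, sig2_u = s["beta"], s["u"], s["m"], s["mu_u"], s["sig2_u"]
    n, delta = u.size, X @ beta

    # D_w,ij: discrete, weights pi_l / sig_l * exp{-(m - mu_l - u + delta)^2 / (2 sig_l^2)}
    r = (m - u[subj] + delta)[:, None] - mu_w[None, :]
    logw = np.log(pi_w) - np.log(sig_w) - 0.5 * (r / sig_w) ** 2
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    cum = np.cumsum(w / w.sum(axis=1, keepdims=True), axis=1)
    d = np.minimum((rng.random(y.size)[:, None] > cum).sum(axis=1), pi_w.size - 1)
    mu_d, sig_d = mu_w[d], sig_w[d]

    # M_ij: normal(mu_d + u_i - delta_ij, sig_d^2), truncated to (-inf, 0] if y = 1
    # and to (0, inf) if y = 0, drawn by inverting the truncated cdf.
    a = mu_d + u[subj] - delta
    c = Phi(-a / sig_d)                                # Pr(M_ij <= 0)
    v = rng.random(y.size)
    p = np.where(y == 1, v * c, c + v * (1.0 - c))
    m = a + sig_d * Phi_inv(np.clip(p, 1e-12, 1.0 - 1e-12))
    m = np.where(y == 1, np.minimum(m, 0.0), np.maximum(m, 1e-12))  # rounding guard

    # U_i: normal with precision sum_j sig_d^-2 + sig_u^-2
    prec = np.bincount(subj, weights=sig_d ** -2, minlength=n) + 1.0 / sig2_u
    num = np.bincount(subj, weights=(m - mu_d + delta) / sig_d ** 2, minlength=n)
    u = (num + mu_u / sig2_u) / prec + rng.standard_normal(n) / np.sqrt(prec)

    # beta^c: multivariate normal, with xi_ij = mu_d + u_i - m_ij
    S_inv = np.linalg.inv(prior["Sigma"])
    XtWi = X.T / sig_d ** 2                            # X^t W^{-1}
    V = np.linalg.inv(S_inv + XtWi @ X)
    xi = mu_d + u[subj] - m
    beta = rng.multivariate_normal(V @ (S_inv @ prior["mu_beta"] + XtWi @ xi), V)

    # sigma_u^2: inverted gamma(shape, rate), drawn as 1 / gamma(shape, scale = 1 / rate)
    rate = prior["tau"] + 0.5 * np.sum((u - mu_u) ** 2)
    sig2_u = 1.0 / rng.gamma(prior["nu"] + n / 2.0, 1.0 / rate)

    # mu_u: normal, prior mean eta and prior variance theta
    prec_mu = n / sig2_u + 1.0 / prior["theta"]
    mean_mu = (u.sum() / sig2_u + prior["eta"] / prior["theta"]) / prec_mu
    mu_u = mean_mu + rng.standard_normal() / np.sqrt(prec_mu)
    return {"beta": beta, "u": u, "m": m, "mu_u": mu_u, "sig2_u": sig2_u}

# Hypothetical data simulated from the latent variable model itself.
rng = np.random.default_rng(0)
n, J = 30, 4                                           # 30 subjects, 4 responses each
subj = np.repeat(np.arange(n), J)
X = rng.integers(0, 2, size=(n * J, 1)).astype(float)  # one binary covariate, no intercept
mix = (np.array([0.5, 0.5]), np.zeros(2), np.array([1.2, 2.3]))  # placeholder F_w
d0 = rng.integers(0, 2, size=n * J)
u0 = rng.normal(0.0, 1.0, n)
y = (rng.normal(mix[1][d0] + u0[subj] - X @ np.array([0.8]), mix[2][d0]) <= 0).astype(int)

prior = {"nu": 1e-4, "tau": 1e-6, "eta": 0.0, "theta": 1e4,
         "mu_beta": np.zeros(1), "Sigma": 10.0 * np.eye(1)}
state = {"beta": np.zeros(1), "u": np.zeros(n), "m": np.where(y == 1, -1.0, 1.0),
         "mu_u": 0.0, "sig2_u": 1.0}
draws = []
for _ in range(200):
    state = gibbs_sweep(state, y, X, subj, mix, prior, rng)
    draws.append(state["beta"][0])
```

Every step is a draw from a standard distribution, which is the point made in the text: no accept/reject step or tuning is required. The element-wise `np.vectorize` wrappers are chosen for brevity rather than speed.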

We apply the Gibbs sampler to the four data sets employing diffuse priors with a single normal random intercept. Throughout we assume that $F_{w}$ is the five component mixture of normals approximation to the logistic distribution. Figure 3 shows the estimated posterior distributions for the parameter of interest after 20,000 simulations and 1,000 burn-in samples, for each of the data sets employing 1, 2 and 3 mixture components for the random effect distribution. For each of the examples we set ν = 10⁻⁴, τ = 10⁻⁶, η = 0, θ = 10⁴, $α = (1, \dots, 1)^{t}$, $μ_{β^{c}} = (0, \dots, 0)^{t}$, and Σ as a diagonal matrix with entries 10. Starting values were obtained in an ad-hoc manner, though using the maximum likelihood estimates when available. Diagnostic checks were made by examining trace plots and plots of some sequential quantiles for parameters of interest (see Figure B.1 for the Item Response data as an example). The hyperparameter specification had little impact for a range of reasonably diffuse priors (results not reported). To investigate the impact of occasional extremely large values of $σ_{u,l}$, the gamma priors were truncated, which again did not impact results.

Fig. 3 — Estimated posterior densities for the examples from Section 4.3 using one (solid), two (dashed) and three (dotted) component mixtures for the random effect distributions.

For the random effect distribution, using a large number of mixture components, or estimating the number of mixture components, is generally not advisable in this setting, because there is typically insufficient information to identify this distribution in such detail. Instead we emphasize the use of a small number of mixture components, specifically 2 and 3, to protect against some canonical types of misspecification in the form of skewness or a small number of modes (see Agresti et al., 2004). This is particularly interesting for the Item Response data, since a random effect distribution with three modes makes practical sense in this situation. Specifically, it is likely that three populations, one opposed to abortion under any circumstance, one in favor of abortion rights regardless of the circumstance, and a more heterogeneous group, dominate the random effect distribution.

Regardless, for the parameter of interest for these four data sets, misspecification of the random effect distribution does not appear to be impacting results. The estimated posterior densities appear to be the same regardless of the number of mixture components implemented (Figure 3).

5. Discussion

In this manuscript, we discussed the conceptual and computational benefits of using mixtures of normals as the conditional link distribution and random effect distribution for random intercept models for binary outcomes. The use of mixtures of normals could be exploited for further generalizations of the random intercept model. In particular, the extension to multivariate random effects, using mixtures of multivariate normals, is plausible. Furthermore, this mixture approach is potentially very useful for jointly modeling discrete and continuous outcomes. Finally, further work may also explore how the mixture approach facilitates description of the “bridge” random effect distribution as introduced by Wang and Louis (2003) and Wang and Louis (2004).

In closing, we note that all of the code needed to reproduce the results, together with the derivations of the Bayesian full conditionals, is available at the corresponding author’s web site.

Appendix A. Data sets

Table A.1.

Prime minister approval rating. Source Agresti (2002).

	Second Survey
First Survey	Approve	Disapprove
Approve	794	150
Disapprove	86	570


Table A.2.

Crossover data, frequency of responses by treatment regimen. Source Jones and Kenward (1987).

Response		Treatment sequence
Period 1	Period 2	Drug-Placebo	Placebo-Drug
Normal	Normal	22	18
Abnormal	Normal	0	4
Normal	Abnormal	6	2
Abnormal	Abnormal	6	9


Table A.3.

Response to questions on abortion stratified by gender from Agresti (2002). A response of “1” was in favor of legalized abortion in a specific scenario while a response of “0” was not. The scenarios are (i) if the family has a very low income, (ii) if the woman is not married and does not want to marry the man and (iii) for any reason.

	Sequence of Responses
Gender	111	110	011	010	101	100	001	000
male	342	26	6	21	11	32	19	356
female	440	25	14	18	14	47	22	457


Appendix B. Example diagnostic plots

Fig. B.1 — Diagnostic trace plots for $β_{1}^{c}$ for the Item Response data. The upper right plot displays the (.1, .2, .4, .6, .8, .9) sequential quantiles.
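
A sequential quantile plot of the kind described in the caption can be computed as follows. This is a minimal sketch that assumes "sequential quantiles" means the quantiles of the first t draws, recomputed as t grows; the function name, the `step` argument and the stand-in chain are ours.

```python
import numpy as np

def sequential_quantiles(chain, probs=(0.1, 0.2, 0.4, 0.6, 0.8, 0.9), step=100):
    """Quantiles of chain[:t] for t = step, 2 * step, ..., up to len(chain)."""
    chain = np.asarray(chain, dtype=float)
    ends = np.arange(step, chain.size + 1, step)
    return ends, np.array([np.quantile(chain[:t], probs) for t in ends])

# Stand-in for a chain of posterior draws (independent normal noise here).
rng = np.random.default_rng(0)
chain = rng.normal(size=2000)
ends, q = sequential_quantiles(chain)
print(q.shape)
```

Plotting each column of `q` against `ends` gives one curve per quantile; curves that flatten out as the number of draws grows are consistent with the chain having settled down.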

Appendix C. Approximating link functions with mixtures of normals

In this section we give an estimation procedure for approximating a distribution with a mixture of normals. For a given number of mixture elements, we chose to minimize the Kullback-Leibler divergence (Kullback and Leibler, 1951) between the mixture approximation and the true density. That is, if g is the density associated with the link function of interest and f is the mixture approximation, we minimize $E_{g}[\log\{g(X)/f(X)\}]$. The algorithm was obtained as the limit of the standard EM algorithm for estimating normal mixture components as the number of observed data points goes to infinity.

Let $π_{j}^{(t)}$, $σ_{j}^{(t)}$ and $μ_{j}^{(t)}$ be the current estimates; then

\begin{array}{rl} P_{j}^{(t)}(x) & = \frac{π_{j}^{(t)} φ\{(x - μ_{j}^{(t)}) / σ_{j}^{(t)}\} / σ_{j}^{(t)}}{\sum_{l} π_{l}^{(t)} φ\{(x - μ_{l}^{(t)}) / σ_{l}^{(t)}\} / σ_{l}^{(t)}} \\ π_{j}^{(t+1)} & = E_{g}[P_{j}^{(t)}(X)] \\ μ_{j}^{(t+1)} & = E_{g}[X P_{j}^{(t)}(X)] / π_{j}^{(t+1)} \\ σ_{j}^{(t+1)} & = \left\{E_{g}[X^{2} P_{j}^{(t)}(X)] / π_{j}^{(t+1)} - (μ_{j}^{(t+1)})^{2}\right\}^{1/2}. \end{array}

The expected values generally need to be evaluated numerically. In this manuscript we use Gauss-Hermite quadrature (see Lange, 1999).
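
The updates above can be sketched in a few lines. For simplicity this sketch evaluates the expectations with a trapezoid rule on a fine grid rather than the Gauss-Hermite quadrature used in the manuscript, takes g to be the standard logistic density, and starts from arbitrary values; the function names, grid and starting values are all assumptions made for illustration.

```python
import numpy as np

def logistic_pdf(x):
    """Standard logistic density, written as 1 / {4 cosh^2(x / 2)} for stability."""
    return 0.25 / np.cosh(0.5 * x) ** 2

def log_components(x, pi, mu, sigma):
    """log{pi_j * phi((x - mu_j) / sigma_j) / sigma_j}, shape (L, len(x))."""
    z = (x[None, :] - mu[:, None]) / sigma[:, None]
    return (np.log(pi) - np.log(sigma))[:, None] - 0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)

def kl_divergence(x, gw, pi, mu, sigma):
    """E_g[log{g(X) / f(X)}] by quadrature; gw holds weight * g(x) at each node."""
    lc = log_components(x, pi, mu, sigma)
    top = lc.max(axis=0)
    log_f = top + np.log(np.exp(lc - top).sum(axis=0))
    return float(np.sum(gw * (np.log(logistic_pdf(x)) - log_f)))

def em_limit(x, gw, pi, mu, sigma, n_iter=200):
    for _ in range(n_iter):
        lc = log_components(x, pi, mu, sigma)
        P = np.exp(lc - lc.max(axis=0))
        P /= P.sum(axis=0)                       # P_j(x): component membership
        pi = P @ gw                              # E_g[P_j(X)]
        mu = (P * x) @ gw / pi                   # E_g[X P_j(X)] / pi_j
        sigma = np.sqrt((P * x ** 2) @ gw / pi - mu ** 2)
    return pi, mu, sigma

# Trapezoid rule on [-30, 30]; the logistic mass outside this range is negligible.
x = np.linspace(-30.0, 30.0, 6001)
w = np.full(x.size, x[1] - x[0])
w[[0, -1]] *= 0.5
gw = w * logistic_pdf(x)

start = (np.full(5, 0.2), np.zeros(5), np.array([0.8, 1.2, 1.8, 2.6, 3.5]))
kl_start = kl_divergence(x, gw, *start)
pi, mu, sigma = em_limit(x, gw, *start)
kl_end = kl_divergence(x, gw, pi, mu, sigma)
print(kl_start, kl_end)
```

Because the target is symmetric and the starting means are all zero, the fitted means stay at zero and the result is a scale mixture. After every update the mixture matches the first two moments of g, which gives a convenient check: the mixture variance should equal $π^{2}/3$, the variance of the standard logistic distribution.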

Appendix D. Obtaining standard error estimates of marginal parameters using the Multivariate Delta Method

In this section, we detail how to obtain the standard error estimate for ${\hat{β}}_{1}^{m}$ when there is one binary covariate. Note that ${\hat{β}}_{1}^{m}$ is a function of ${\hat{β}}_{0}^{c}$ and ${\hat{β}}_{1}^{c}$ :

\hat{β}_{1}^{m} = g\left(\begin{array}{c} \hat{β}_{0}^{c} \\ \hat{β}_{1}^{c} \end{array}\right) = \log\left\{\frac{F_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c})}{1 - F_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c})}\right\} - \log\left\{\frac{F_{q}(\hat{β}_{0}^{c})}{1 - F_{q}(\hat{β}_{0}^{c})}\right\},

with gradient

\nabla g^{t} = \left[\begin{matrix} \frac{f_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c})}{F_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c}) \{1 - F_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c})\}} - \frac{f_{q}(\hat{β}_{0}^{c})}{F_{q}(\hat{β}_{0}^{c}) \{1 - F_{q}(\hat{β}_{0}^{c})\}} \\ \frac{f_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c})}{F_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c}) \{1 - F_{q}(\hat{β}_{0}^{c} + \hat{β}_{1}^{c})\}} \end{matrix}\right].

Since $(\hat{β}_{0}^{c}, \hat{β}_{1}^{c})^{t}$ is normally distributed with covariance matrix $V_{β}$, say, we can apply the multivariate Delta Method to obtain the standard error estimate of $\hat{β}_{1}^{m}$ as $SE(\hat{β}_{1}^{m}) = (\nabla g V_{β} \nabla g^{t})^{1/2}$.
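
The calculation can be sketched numerically as follows. The function takes the distribution function $F_{q}$ and density $f_{q}$ as arguments; for illustration only we plug in a probit-normal form, $F_{q}(x) = Φ(x / \sqrt{1 + σ_{u}^{2}})$, and a made-up covariance matrix. The names `delta_method_se`, `F_q`, `f_q` and `V_beta` and all numerical values are hypothetical.

```python
import math

def delta_method_se(b0, b1, V, F, f):
    """SE of g(b0, b1) = logit F(b0 + b1) - logit F(b0) by the delta method."""
    def h(x):
        # d/dx log[F(x) / {1 - F(x)}] = f(x) / [F(x) {1 - F(x)}]
        return f(x) / (F(x) * (1.0 - F(x)))
    grad = (h(b0 + b1) - h(b0), h(b0 + b1))
    var = sum(grad[i] * V[i][j] * grad[j] for i in range(2) for j in range(2))
    return grad, math.sqrt(var)

# Hypothetical marginal link: F_q(x) = Phi(x / s) with s = sqrt(1 + sigma_u^2).
sigma_u = 1.2
s = math.sqrt(1.0 + sigma_u ** 2)
F_q = lambda x: 0.5 * (1.0 + math.erf(x / (s * math.sqrt(2.0))))
f_q = lambda x: math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

V_beta = [[0.04, -0.01], [-0.01, 0.09]]   # made-up covariance of the conditional estimates
grad, se = delta_method_se(1.0, -0.5, V_beta, F_q, f_q)
print(grad, se)
```

A useful sanity check is the case where $F_{q}$ is itself logistic: the gradient then reduces to (0, 1) and the standard error is simply the square root of the variance of $\hat{β}_{1}^{c}$.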


References

Agresti A. Categorical Data Analysis. 2nd ed. Wiley; 2002.
Agresti A, Caffo B, Ohman-Strickland P. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Computational Statistics and Data Analysis. 2004;47(3):639–653.
Agresti A, Booth J, Hobert J, Caffo BS. Random effects modeling of categorical response data. Sociological Methodology. 2000;30:27–80.
Aitkin M. A general maximum likelihood analysis of variance components in generalized linear models. Biometrics. 1999;55:117–128. doi: 10.1111/j.0006-341x.1999.00117.x.
Albert J, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association. 1993;88(422):669–678.
Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
Caffo B, Griswold M. A user-friendly tutorial on link-probit-normal models. Tech rep. Johns Hopkins University, Department of Biostatistics; 2005.
Chib S, Greenberg E. Analysis of multivariate probit normals. Biometrika. 1998;85(2):347–361.
Chib S, Greenberg E, Chen Y. MCMC methods for fitting and comparing multinomial response models. Economics Working Paper Archive Econ WPA: Econometrics. 1998. Http://econwpa.wustl.edu:80/eps/em/papers/9802/9802001.pdf.
Demidenko E. Mixed Models Theory and Applications. Wiley; 2004.
Erkanli A, Stangl D, Mueller P. A Bayesian analysis of ordinal data using mixtures. American Statistical Association Proceedings of the Section on Bayesian Statistical Science. 1993:51–56.
Follmann DA, Lambert D. Generalizing logistic regression by nonparametric mixing. Journal of the American Statistical Association. 1989;84:295–300.
Geweke J, Keane M. Mixture of normals probit model. Tech Rep. Vol. 237. Federal Reserve Bank, Minneapolis; 1997.
Griswold M. Complex distributions, hmmmm... hierarchical mixtures of marginalized multilevel models. PhD thesis. Johns Hopkins University; 2005.
Heagerty PJ, Zeger SL. Marginal regression models for clustered ordinal measurements. Journal of the American Statistical Association. 1996;91:1024–1036.
Heagerty PJ, Zeger SL. Marginalized multilevel models and likelihood inference. Statistical Science. 2000;15(1):1–26.
Imai K, van Dyk D. A Bayesian analysis of the multinomial probit model using marginal data augmentation. Journal of Econometrics. 2005;124:311–334.
Jasra A, Holmes C, Stephens D. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science. 2005;20(1):50–61.
Jones B, Kenward M. Modelling binary data from a three point cross-over trial. Statistics in Medicine. 1987;6:555–564. doi: 10.1002/sim.4780060504.
Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22(1):79–86.
Lang JB, Agresti A. Simultaneously modeling joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association. 1994;89:625–632.
Lange K. Numerical Analysis for Statisticians. Springer-Verlag; 1999.
Lee Y, Nelder J. Conditional and marginal models: Another view. Statistical Science. 2004;19(2):219–238.
Liang K, Hanfelt J. On the use of the quasi-likelihood method in teratology experiments. Biometrics. 1994;50:872–880.
Liang K, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Magder L, Zeger S. A smooth nonparametric estimate of a mixing distribution using mixtures of Gaussians. Journal of the American Statistical Association. 1996;91:1141–1151.
McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. Chapman & Hall, London; 1989.
McCulloch R, Rossi P. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics. 1994;64:207–240.
Monahan J, Stefanski L. Normal scale mixture approximations to f* (z) and computation of the logistic-normal integral. In: Balakrishnan, editor. Handbook of the Logistic Distribution. Marcel Dekker; 1992. pp. 529–540.
Natarajan R, McCulloch C, Kiefer N. A Monte Carlo EM method for estimating multinomial probit models. Computational Statistics and Data Analysis. 2000;34:33–50.
Nelder JA, Wedderburn RWM. Generalized linear models. Journal of the Royal Statistical Society, Series A, General. 1972;135:370–384.
Royall R. Statistical Evidence: A Likelihood Paradigm. Chapman and Hall; 1997.
Wang Z, Louis T. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika. 2003;90(4):765–775.
Wang Z, Louis T. Marginalized binary mixed-effects with covariate-dependent random effects and likelihood inference. Biometrics. 2004;60(4):884–891. doi: 10.1111/j.0006-341X.2004.00243.x.
Weil C. Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food and Cosmetics Toxicology. 1970;8:177–182. doi: 10.1016/s0015-6264(70)80337-6.
Zeger S, Liang K, Albert P. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–1060.

