Dirichlet negative multinomial regression for overdispersed correlated count data

Daniel M Farewell; Vernon T Farewell

doi:10.1093/biostatistics/kxs050

. 2012 Dec 5;14(2):395–404. doi: 10.1093/biostatistics/kxs050

Dirichlet negative multinomial regression for overdispersed correlated count data

Daniel M Farewell ^1,^*, Vernon T Farewell ²

PMCID: PMC3590929 PMID: 23221819

Abstract

A generic random effects formulation for the Dirichlet negative multinomial distribution is developed together with a convenient regression parameterization. A simulation study indicates that, even when somewhat misspecified, regression models based on the Dirichlet negative multinomial distribution have smaller median absolute error than generalized estimating equations, with a particularly pronounced improvement when correlation between observations in a cluster is high. Estimation of explanatory variable effects and sources of variation is illustrated for a study of clinical trial recruitment.

Keywords: Dirichlet negative multinomial, Longitudinal count data, Regression, Sources of variation

1. Introduction

In a recent study of recruitment of patients into clinical trials, multi-disciplinary oncology teams (hereafter termed MDTs) were followed longitudinally. The number of patients approached to enter clinical trials was recorded in successive 6-month periods. Analysis of these figures highlighted the importance of understanding the nature and sources of variation in count data, and underlying approach rates.

A natural approach to modeling longitudinal count data would be through a generalized linear mixed model (GLMM) with Poisson distributions conditional on random effects. However, there are situations when a marginal modeling approach might be preferred. For example, the effect of explanatory variables at the population-averaged level (marginal effects) would be of interest if these variables were defined by genetic marker information, which is time-invariant and for which subject-specific effects would be less attractive conceptually. In our study of trial recruitment, we were interested in intervention effects on the population as a whole, rather than the recruitment performance of individual MDTs. An additional advantage of marginal modeling, especially in contrast to models with subject-specific conditional explanatory variable effects, is that it has been shown to offer some degree of robustness for the estimation of regression parameters when departure from the underlying assumed random effect structure occurs (Heagerty and Kurland, 2001).

One way of employing a marginal model would be to make use of generalized estimating equation (GEE) methods (Solis-Trapala and Farewell, 2005). However, GEE methods do not provide direct information on sources of variation, and there is a cost in efficiency compared with parametric maximum-likelihood estimationQ4. Bearing this in mind, alternatives were investigated, and an analysis based on the Dirichlet negative multinomial distribution appeared to be potentially useful.

The Dirichlet negative multinomial distribution is a discrete multivariate distribution having support on the non-negative integers, and has been well characterized by Mosimann (1963), Leeds and Gelfand(1989), and Johnson and others (1997)Q5. However, the typical formulation, in terms of a negative multinomial distribution compounded by a Dirchlet distribution, is unappealing in a regression context; we provide a more natural random effects description. A parameterization for this adaptation is given in the following sections and an illustrative analysis of the motivating data is provided subsequently. We also discuss the results of a small simulation study that explores model robustness and finite-sample behavior.

2. Dirichlet negative multinomial distribution

Consider a sequence of independent and identically distributed multinomial trials, each of which has 1+m possible outcomes. Call one of the outcomes a “success”, and suppose it has probability p₀. The other m outcomes have probabilities p₁,…,p_m and describe distinct types of “failure”. If the vector (y₁,…,y_m) counts the m types of failure until y₀>0 successes are observed, then (y₁,…,y_m) is said to have a negative multinomial distribution with parameters (y₀,p₀,…,p_m). Just as in the negative binomial (Pólya) case, y₀ need not be an integer: the negative multinomial distribution remains well defined when y₀ takes any positive real value.

Now suppose that, for each set of multinomial trials, the probabilities p₀,…,p_m are themselves sampled randomly from a Dirichlet distribution, having positive parameters α₀,…,α_m. The resulting distribution of (y₁,…,y_m) is then called Dirichlet (more generically, compound) negative multinomial. It has 2+m parameters, namely y₀,α₀,…,α_m. As shown by Mosimann (1963), the joint probability mass function of (y₁,…,y_m) so-distributed is

(2.1)

where y_•=y₀+⋯+y_m and α_•=α₀+⋯+α_m. The Dirichlet negative multinomial distribution has a finite mean when α₀>1, and a finite mean and covariance when α₀>2 (Mosimann, 1963). Heuristically, these constraints can be thought of as placing upper bounds on the variance of the underlying Dirichlet distribution.

An informative alternative to the foregoing, traditional, derivation of the Dirichlet negative multinomial distribution proceeds as follows. Let y₁,…,y_m be independent Poisson variables with mean values λ₁t,…,λ_mt. If t∼Ga(y₀,y₀) (shape=rate=y₀) and the λ_j are fixed and known, then (y₁,…,y_m) has a negative multinomial distribution. Alternatively, if the λ_j are positive random variables of a particular form, then the Dirichlet negative multinomial distribution results.

Interestingly, the required distributional form for the λ₁,…,λ_m is not the seemingly natural independent gamma construction, but instead a further hierarchical structure where each λ_j has a gamma distribution with shape α_j and a common, random rate parameter λ₀∼Ga(α₀,y₀) that induces correlation in the λ_j. This three-level formulation results in the same Dirichlet negative multinomial distribution as in (2.1), with parameters y₀,α₀,…,α_m. Marginally, the λ_j have compound-gamma (a special case of the generalized beta-prime) distributions (Dubey, 1970). We see that restrictions on α₀ correspond directly to restrictions on the skewness of the distribution of λ₀.

3. Regression formulation

Mosimann (1963) reports the mean of the Dirichlet negative multinomial distribution, but omits its derivation. Our alternative, random effects formulation of the Dirichlet negative multinomial provides a direct route: since t has unit mean, the expectation of y_j is simply the mean of λ_j, which in turn has a compound-gamma distribution whose first moment is α_jy₀/(α₀−1) when α₀>1. Mosimann (1963) also reports variances and covariances of the Dirichlet negative multinomial distribution, and these can similarly be derived from the random effects formulation. Applying the law of total covariance twice, it easily follows that for j≠k, cov(y_j,y_k)=E(y_j)E(y_k)var(t)+cov(λ_j,λ_k)(1+var(t)). The required cov(λ_j,λ_k) can either be determined directly from their bivariate compound gamma distribution (Hutchinson, 1981) or by a third application of the same law. Either way, we find (for j≠k) that

The law of total variance similarly gives var(y_j)=var(λ_j)+E(λ_j)(1+var(t)), which simplifies to

It is apparent from the form of the variance and covariance that the parameterization in terms of the α_j and y₀ is far from ideal for regression purposes. At a minimum, we would like to parameterize in terms of the means μ₁,…,μ_m of the vector random variable (y₁,…,y_m), and have the flexibility to say E(y_j)=μ_j. This is satisfied if we write α_j=μ_j×(α₀−1)/y₀. Additionally, we should like the remaining two parameters to convey something meaningful about the variance structures in the model. We suggest writing φ_NM=1/y₀ and φ_NB=y₀/(α₀−1); the notation is suggestive of the interpretation we ascribe to these parameters in the following section, linked to negative multinomial and negative binomial variation, respectively. A Dirichlet negative multinomial distribution with parameters Inline graphic is therefore a candidate regression model for correlated count data. The regression specification is completed by setting g(μ_j)=x_jβ, where g is a link function (canonically ), x_j is a row vector of explanatory variables, and β is a column vector of regression coefficients to be estimated. The parameter φ_NB provides a convenient way to ensure that E(y_j) exists: when φ_NB>0, then α₀>1 and so E(y_j)=μ_j. Under the regression parameterization, the covariance of two observations y_j and y_k within the same unit takes the form

(3.1)

where δ_jk=1 if j=k, and 0 otherwise. This covariance exists provided φ_NMφ_NB<1.

For observations on multiple individuals, the full likelihood function will be defined in the usual way as a product over all Dirichlet negative multinomial observations of terms of the form (2.1). Expressions for likelihood derivatives are provided in supplemental electronic material (available at Biostatistics online), together with R code to fit the foregoing regression model.

4. Interpretation

The Dirichlet negative multinomial distribution has been used most extensively when modeling the allocation of an unknown number of items (such as product purchases) to a known set of categories (such as individual brands), as for example in Goodhardt and others (1984). In such contexts formulating a model in terms of a negative binomial total with Dirichlet multinomial category probabilities is sensible and interpretable. However, when used as a model for longitudinal (more generically, repeated measures) count data, the concept of category probabilities has no substantive meaning: the number of “categories” could vary from subject to subject, and the total number of “items” is unlikely to reflect any important aspect of the scientific problem. Neither is it natural to extend the notion of categories to regression on other variables, and Goodhardt and others (1984) provide no such extension. In contrast, the Gamma random effect construction offers a more familiar perspective. Subjects have one shared and as many observation-specific random effects as needed, and these act on the marginal mean in the manner of a generalized linear model.

An alternative regression formulation of the Dirichlet negative multinomial distribution to that given in Section3 was developed by Hausman and others (1984) in the econometric context of studying the relationship between patents and expenditure by research & development.Q6 This was based on adopting a negative binomial model for each y_j with probability function

where the overdispersion parameter (δ in their notation) is the same for all y_j from the same individual or unit of observation. For reasons of mathematical convenience, Hausman and others (1984) then assumed that the δ for an individual or a unit is a random variable where δ/(1+δ)∼Beta(a,b). Conceptually, within-unit variation is reflected in the negative binomial assumption, which corresponds to a Poisson model with gamma random effects while between-unit variation is introduced by random variation in the δ values. However, no interpretation was presented for the parameters a and b, and it was recognized by Hausman and others (1984) that no investigation of the two variance components could be made. Once again, category probabilities, implicit in the underlying Beta distribution, offer little insight into this aspect of the model.

Comparison of the form of the resulting probability function with Equation(2.1) reveals that a=α₀ and that b=y₀. This form of the model has been implemented in the Stata package xtnbreg, where the linear predictor is linked to Inline graphic rather than the μ_j and therefore the location term would be the sum of and the location term derived from a fit of the model under our proposed formulation.

Arguably, the random effects formulation of the Dirichlet negative multinomial distribution simply makes it clear that there is no straightforward interpretation of the dispersion parameters. Since (λ_j|λ₀)∼Ga(μ_j/φ_NB,λ₀) and λ₀∼Ga(1/(φ_NMφ_NB)+1,1/φ_NM), we see that the parameter φ_NM influences the variance of observation-specific (λ_j) as well as the shared (t) random effects, while the λ_j themselves induce a small amount of correlation in the y_j.

However, the limiting behavior is informative about their respective roles: consider the limit as φ_NM→0. The random effects t and λ₀ degenerate to 1 and 1/φ_NB, respectively, meaning that the only remaining random effects are the (now independent) λ_j∼Ga(μ_j/φ_NB,1/φ_NB). We deduce that, as φ_NM→0, the distribution of (y₁,…,y_m) converges to independent negative binomial distributions, having parameters μ_j/φ_NB specifying the number of successes, 1/(1+φ_NB) the success probability and φ_NB/(1+φ_NB) the failure probability. As φ_NB→0, the λ_j degenerate to point masses on the μ_j, and hence the distribution of (y₁,…,y_m) converges to a negative multinomial distribution with parameters μ₀,μ₀/μ_•,…,μ_m/μ_•. Here, we have written μ_•=μ₀+⋯+μ_m and (since it is the correct limit) Inline graphic .

Taken together, these two limiting results indicate that φ_NB is linked mainly to within-unit variation across time. In contrast, φ_NM is related to variation between units. Aside from the boundary cases just described, the interplay between φ_NM and φ_NB will typically be fairly complex, both capturing some elements of between- and within-cluster variation. We can, however, make use of Mosimann's observation that the covariance matrix of a Dirichlet negative multinomial distribution is just a scalar multiple of that of the related negative multinomial distribution. The relevant overdispersion constant is given by the expression

In the limiting cases we have discussed, this overdispersion constant tends to 1+φ_NB (when φ_NM→0) and 1 (when φ_NB→0). For small values of φ_NMφ_NB, then, φ_NB can be seen crudely as quantifying the amount of extra-negative multinomial variation present in the data.

Another way of understanding the relative contributions of between- and within-unit variation is to compare the variance of t (which has mean 1) with the covariance matrix of the λ_j, holding the mean values of the latter also equal to 1. When μ_j=μ_k=1, the covariance of λ_j and λ_k is (δ_jk+φ_NM)φ_NB/(1−φ_NMφ_NB), while the variance of t is φ_NM irrespective of the values of the μ_j. This comparison puts t and λ_j on an equal footing; it allows us to investigate how the variability in such a set of observations is spread across the between- and within-unit random effects. On this scale, a direct comparison of variances is meaningful. This approach is illustrated in the following section.

Finally, we suggest similar examination of the estimated covariance matrix for two observations having unit or some other convenient mean. From (3.1), we have that when μ_j=μ_k=1,

Although not at all specific to our random effects formulation, we think that this offers the most direct insight into the variance structures present in the model. For more generality, we could consider the covariance matrix over a range of possible mean values, or plot the variance and covariance as a function of unspecified means μ_j and μ_k.

5. Simulation study

To explore the usefulness of regression based on the Dirichlet negative multinomial distribution, we conducted a small simulation study. We generated data from two, similar models: one was Dirichlet negative multinomial, and the other a related Gamma–Gamma–Poisson model. In both instances, we put (y_j|λ_j,t)∼Poi(λ_jt) for j=1,…,m and t∼Ga(1/φ_NM,1/φ_NM), and in both we set Inline graphic , where x₁ was an equiprobable binary covariate and x₂ a standard Gaussian covariate. The difference between the models lay in the specification of the λ_j; one employed the compound gamma setup that results in our Dirichlet negative multinomial model, the other used independent gamma variables λ_j∼Ga(μ_jφ_NB,φ_NB). These competing models allowed us to investigate the performance of Dirichlet negative multinomial regression when it is, and is not, correctly specified.

In all simulations, β₀=β₁=β₂=1. We used sample sizes n=50,100, and unit sizes m=5,10, and for each of these four combinations we considered all nine combinations of φ_NM=0.25,0.5,0.75 and φ_NB=0.25,0.5,0.75. We compared the performance of Dirichlet negative multinomial regression with a Poisson GEE using an exchangeable working correlation structure, and focus our reporting on the median absolute error. We use the median absolute error rather than root mean squared error because, in a small percentage of cases, the GEE routine either failed to converge or converged to implausible parameter estimates; since these would be noticed by any competent analyst, median absolute error constituted a fairer comparison of performance, being less sensitive to a few suspect estimates.

Table1 reports the results for n=100, m=5; other results are qualitatively similar, and tables giving details may be found in the online appendix (available at Biostatistics online). We note that, within these simulations, Dirichlet negative multinomial regression is uniformly preferable to the GEE approach in terms of median absolute error, and in almost all cases the best 50% of estimates lie within 10% of the true value. Improvements over GEEs were particularly marked for the coefficient β₂ of the continuous covariate x₂, in some cases showing a 4-fold improvement in median absolute error.

Table 1.

For n=100,m=5, median absolute error of the fixed effects estimates (β_j) over 100 simulations, under different variance structures (φ_NM,φ_NB), data generating models (columns, DNM: Dirichlet negative multinomial, GGP: Gamma–Gamma–Poisson), and regression models (rows, DNM: Dirichlet negative multinomial, GEE: generalized estimating equation)

			β₀		β₁		β₂
φ_NM	φ_NB	Model	DNM	GGP	DNM	GGP	DNM	GGP
0.25	0.25	DNM	0.051	0.047	0.031	0.031	0.018	0.019
		GEE	0.067	0.058	0.040	0.040	0.034	0.033
	0.5	DNM	0.050	0.047	0.033	0.029	0.020	0.019
		GEE	0.074	0.059	0.052	0.041	0.035	0.028
	0.75	DNM	0.048	0.036	0.041	0.033	0.022	0.023
		GEE	0.089	0.049	0.060	0.049	0.039	0.031
0.5	0.25	DNM	0.064	0.059	0.035	0.032	0.015	0.016
		GEE	0.098	0.068	0.049	0.047	0.040	0.039
	0.5	DNM	0.070	0.070	0.036	0.035	0.018	0.017
		GEE	0.114	0.093	0.065	0.044	0.066	0.050
	0.75	DNM	0.095	0.061	0.042	0.040	0.020	0.015
		GEE	0.116	0.083	0.077	0.054	0.056	0.035
0.75	0.25	DNM	0.081	0.078	0.034	0.029	0.019	0.020
		GEE	0.216	0.183	0.087	0.088	0.080	0.092
	0.5	DNM	0.115	0.091	0.039	0.040	0.019	0.019
		GEE	0.153	0.106	0.107	0.093	0.074	0.053
	0.75	DNM	0.117	0.095	0.042	0.041	0.026	0.023
		GEE	0.146	0.090	0.093	0.057	0.070	0.048

Open in a new tab

Interestingly, the biggest gains in efficiency are found when φ_NM=0.75 and φ_NB=0.25. This corresponds to the largest correlation between observations within a unit that we explored in our simulations, and it seems that in such cases modeling the variance (even incorrectly) leads to useful improvements in estimation. Note that the apparently superior performance of both models under the Gamma–Gamma–Poisson setup owes simply to the fact that the variance of the data is smaller here: the compounding gamma distribution is absent, so the only meaningful comparisons in the table are vertical ones.

Table2 summarizes the estimation of the variance parameters for Dirichlet negative multinomial regression. Recall that φ_NM and φ_NB have different meanings under the two data-generation setups; nevertheless, the proximity of the mean estimates that arise under the Dirichlet negative multinomial assumption to the true values of φ_NM and φ_NB even under the Gamma–Gamma–Poisson setup lend empirical support to our interpretation of these two parameters. As expected, the approximation becomes less satisfactory as the product φ_NMφ_NB increases. Even when the model is correctly specified, there is some underestimation of the variance components. This, too, is to be expected; bias of variance component estimates toward the null in small samples is a well-known feature of maximum-likelihood inference in random effect models. When the Dirichlet negative multinomial model was correctly specified, median absolute error in the estimation of φ_NM and φ_NB (not tabulated) varied between about 0.05 and 0.15, increasing with φ_NM and φ_NB and decreasing with sample size; this indicates that small, moderate, and large values of the variance components can be fairly reliably distinguished, even in small samples.

Table 2.

For n=100,m=5, mean estimates of variance component (columns) over 100 simulations, under different variance structures (rows) and data generating models (columns, DNM: Dirichlet negative multinomial, GGP: Gamma–Gamma–Poisson)

		φ_NM		φ_NB
φ_NM	φ_NB	DNM	GGP	DNM	GGP
0.25	0.25	0.2440	0.1977	0.2437	0.2426
	0.5	0.2466	0.1701	0.4972	0.4799
	0.75	0.2439	0.1474	0.7491	0.7424
0.5	0.25	0.4680	0.3928	0.2574	0.2406
	0.5	0.4721	0.3438	0.5049	0.4789
	0.75	0.4782	0.3064	0.7584	0.7313
0.75	0.25	0.7265	0.6140	0.2546	0.2325
	0.5	0.7207	0.5388	0.5207	0.4990
	0.75	0.7342	0.4986	0.7636	0.7419

Open in a new tab

6. Analysis of motivating example

Data available for analysis from the study outlined in the introduction consisted of number of patients approached over 3 consecutive 6-month periods for 12 MDTs and over 4 consecutive 6-month periods for another 10 MDTs. In addition, a variety of factors thought to influence approach rates were recorded, including whether the MDT worked in a specialist center or a district general hospital, whether there was more than one research nurse in the team and the number of trials in the team's portfolio during the 6-month period (which, for illustration, is used here as a binary indicator variable for more than five trials). The latter variable may vary across periods, while the first two will not.

The first five rows of Table3 present the results of fitting five different marginal regression models to these data. The regression coefficient for Hospital Type is similar across all models except the Dirichlet negative multinomial, for which the estimate is somewhat smaller. The effect of having multiple nurses increases slightly for the latter three models compared with the first two while the coefficient for number of trials is decreased in the last three models.

Table 3.

For the trial recruitment data, comparative results from a variety of regression models. Standard errors are shown in parentheses for each fixed effect estimate

	Explanatory variables
	Hospital type	Nurses>1	Number of trials>5
Poisson	1.05 (0.087)	0.64 (0.093)	0.64 (0.073)
Negative binomial	1.08 (0.189)	0.62 (0.212)	0.61 (0.185)
Negative multinomial	1.14 (0.197)	0.75 (0.197)	0.32 (0.115)
Negative multinomial (GEE)	1.14 (0.273)	0.75 (0.269)	0.32 (0.189)
Dirichlet negative multinomial	0.93 (0.246)	0.68 (0.276)	0.36 (0.189)
GLMM (Poisson)	1.08 (0.259)	0.74 (0.292)	0.44 (0.199)
GLMM (negative binomial)	1.07 (0.260)	0.72 (0.295)	0.45 (0.207)

Open in a new tab

The Poisson model, as expected, has the smallest estimated standard errors for the regression coefficients. One extra variance component is present in the negative binomial and negative multinomial models and these models show comparable increases in the estimated standard errors for the fixed explanatory variables related to hospital type and the number of research nurses. However, for the variable related to the number of trials recruiting then the standard error is larger for the negative binomial model than for the negative multinomial model. The robust standard errors for the negative multinomial model are larger than the model-based estimates and comparable with those from the Dirichlet negative multinomial model, which has two extra variance components compared with the Poisson model.

The dispersion parameter for the negative binomial model was estimated as 2.35; this corresponds to a variance of 0.43 for a gamma distribution from which random effects are generated for each observed 6-month period of observation. In contrast, the negative multinomial corresponds to random effects associated with each MDT and the variance of the assumed gamma distribution for these effects was estimated as 0.28.

The estimated values of φ_NM and φ_NB from the Dirichlet negative multinomial were 0.062 and 2.8, respectively. The much smaller value for φ_NM suggests that there may be more variation within an MDT's observations than between teams and the estimated overdispersion factor for extra-negative multinomial variation is C=4.6. Because φ_NM and φ_NB are not directly comparable, it is also informative to look at the covariance matrix of the underlying λ_j assuming unit means, as outlined in the section describing parameter interpretation. The standard deviation of t is 0.25, while the standard deviation of a single λ_j when μ_j=1 is 2.14, nearly an order of magnitude larger. To indicate that this variation is indeed largely within-team, we note that the correlation of distinct λ_j,λ_k is just 0.059.

As another illustration, consider a period for which the average number of patients approached is 13.8, the observed mean across periods in the data set. With this mean the estimated standard deviation for an observation from the Dirichlet negative multinomial model would be 10.86. In contrast, the estimated standard deviation from the negative binomial model would be 9.75 and from the negative multinomial model would be 8.16. This demonstrates that there is somewhat more variation within teams than between teams, although we emphasize again that either parameter will pick up some of both types of variation if the other is not included in the model. We note that the (model-based) correlation between numbers of patients approached by a team in distinct 6-month periods each with mean 13.8 is estimated to be just 0.13, emphasizing the extent of variability within the observations on a single MDT.

For comparison purposes, the penultimate row of Table3 gives the results of fitting a generalized linear Poisson mixed model (with both between- and within-MDT random effects) to these data, using the lme4 package in R (Bates and others, 2012; Core Team, 2012. The estimated coefficients and standard errors can be seen to be similar to the Dirichlet negative multinomial model. The between- and within-team estimated standard deviations of the Gaussian random effects were 0.461 and 0.458. This model can be thought of as λ_j having a log-Normal distribution with parameters θ_j and σ²_w and t having a log-Normal distribution with parameters 0 and σ²_b. When these both have unit mean, their variances are Inline graphic and , respectively, indicating that the breakdown of variation into these two components is slightly more comparable within this modeling framework than under a marginal model. The final row of Table3 shows the results of fitting a GLMM with Gaussian random effects between-MDTs but with negative binomial residuals, using the glmmADMB package in R (Fournier and others, 2012; Skaug and others, 2012; because the standard deviations of the within-team gamma (0.457) and between-team Gaussian (0.467) random effects are comparable with those from the Gaussian GLMM, it is unsurprising that the estimated fixed effects are also rather similar.

A further example of substantive analysis using a Dirichlet negative multinomial regression model can be found in the supplementary material (available at Biostatistics online).

7. Discussion

The purpose of this note is to highlight the availability of a convenient marginal regression model for count data with an explicit introduction of between- and within-unit variation. The choice between a marginal or conditional random effects model will typically be application-specific, although some possible advantages to the marginal approach have been discussed. If a marginal model is desired, then the Dirichlet negative multinomial model provides a convenient structure for this purpose, and can provide efficiency gains over GEEs. In particular, having separate mean parameters for each component and two variance parameters makes it suitable for use with unbalanced longitudinal count data with a time-invariant covariance structure.

The lack of separability of the variance components is a weakness of the Dirichlet negative multinomial regression model. The resulting difficulty of interpreting these parameters will often be outweighed by the ease of fitting, and comparing, multiple fixed effects structures without the need for numerical approximation of likelihoods. It is also worth reflecting that, outside of linear models, functional links between means and variances almost always result in difficulty in interpreting variance parameters; the Dirichlet negative multinomial is no different in this regard to, say, the Poisson GLMM used for comparison purposes in our motivating example.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

The authors' work was supported by the Medical Research Council (VF: U105261167) and Cancer Research UK.

Supplementary Material

Supplementary Data

supp_14_2_395__index.html^{(811B, html)}

Acknowledgments

This paper was substantially improved through the perceptive comments of two anonymous referees. Conflict of Interest: None declared.

References

Bates D., Maechler M., Bolker B. lme4: Linear Mixed-Effects Models using S4 Classes. 2012 R package version 0.999999-0. [Google Scholar]
Dubey S. D. Compound gamma, beta and F distributions. Metrika. 1970;16:27–31. [Google Scholar]
Fournier D. A., Skaug H. J., Ancheta J., Ianelli J., Magnusson A., Maunder M., Nielsen A., Sibert J. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optimization Methods & Software. 2012;27:233–249. [Google Scholar]
Goodhardt G. J., Ehrenberg A. S. C., Chatfield C. The Dirichlet: a comprehensive model of buying behaviour. Journal of the Royal Statistical Society. Series A (General) 1984;147:621–655. [Google Scholar]
Hausman J., Hall B. H., Griliches Z. Econometric models for count data with an application to the patents-R & D relationship. Econometrica. 1984;52:909–938. [Google Scholar]
Heagerty P. J., Kurland B. F. Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika. 2001;88:973–985. [Google Scholar]
Hutchinson T. P. Compound gamma bivariate distributions. Metrika. 1981;28:263–271. [Google Scholar]
Johnson N. L., Kotz S., Balakrishnan N. Discrete Multivariate Distributions. New York: Wiley; 1997. [Google Scholar]
Leeds S., Gelfand A. E. Estimation for Dirichlet mixed models. Naval Research Logistics. 1989;36:197–214. [Google Scholar]
Mosimann J. E. On the compound negative multinomial distribution and correlations among inversely sampled pollen counts. Biometrika. 1963;50:47–54. [Google Scholar]
R Core Team. R: A Language and Environment for Statistical Computing. 2012 R Foundation for Statistical Computing, Vienna, Austria. [Google Scholar]
Skaug H., Fournier D., Nielsen A., Magnusson A., Bolker B. Generalized Linear Mixed Models using AD Model Builder. 2012 R package version 0.7.2.12. [Google Scholar]
Solis-Trapala I. L., Farewell V. T. Regression analysis of overdispersed correlated count data with subject specific covariates. Statistics in Medicine. 2005;24:2557–2575. doi: 10.1002/sim.2121. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Data

supp_14_2_395__index.html^{(811B, html)}

supp_kxs050_kxs050supp.pdf^{(269.5KB, pdf)}

[KXS050C1] Bates D., Maechler M., Bolker B. lme4: Linear Mixed-Effects Models using S4 Classes. 2012 R package version 0.999999-0. [Google Scholar]

[KXS050C2] Dubey S. D. Compound gamma, beta and F distributions. Metrika. 1970;16:27–31. [Google Scholar]

[KXS050C3] Fournier D. A., Skaug H. J., Ancheta J., Ianelli J., Magnusson A., Maunder M., Nielsen A., Sibert J. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optimization Methods & Software. 2012;27:233–249. [Google Scholar]

[KXS050C4] Goodhardt G. J., Ehrenberg A. S. C., Chatfield C. The Dirichlet: a comprehensive model of buying behaviour. Journal of the Royal Statistical Society. Series A (General) 1984;147:621–655. [Google Scholar]

[KXS050C5] Hausman J., Hall B. H., Griliches Z. Econometric models for count data with an application to the patents-R & D relationship. Econometrica. 1984;52:909–938. [Google Scholar]

[KXS050C6] Heagerty P. J., Kurland B. F. Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika. 2001;88:973–985. [Google Scholar]

[KXS050C7] Hutchinson T. P. Compound gamma bivariate distributions. Metrika. 1981;28:263–271. [Google Scholar]

[KXS050C8] Johnson N. L., Kotz S., Balakrishnan N. Discrete Multivariate Distributions. New York: Wiley; 1997. [Google Scholar]

[KXS050C9] Leeds S., Gelfand A. E. Estimation for Dirichlet mixed models. Naval Research Logistics. 1989;36:197–214. [Google Scholar]

[KXS050C10] Mosimann J. E. On the compound negative multinomial distribution and correlations among inversely sampled pollen counts. Biometrika. 1963;50:47–54. [Google Scholar]

[KXS050C11] R Core Team. R: A Language and Environment for Statistical Computing. 2012 R Foundation for Statistical Computing, Vienna, Austria. [Google Scholar]

[KXS050C12] Skaug H., Fournier D., Nielsen A., Magnusson A., Bolker B. Generalized Linear Mixed Models using AD Model Builder. 2012 R package version 0.7.2.12. [Google Scholar]

[KXS050C13] Solis-Trapala I. L., Farewell V. T. Regression analysis of overdispersed correlated count data with subject specific covariates. Statistics in Medicine. 2005;24:2557–2575. doi: 10.1002/sim.2121. [DOI] [PubMed] [Google Scholar]

PERMALINK

Dirichlet negative multinomial regression for overdispersed correlated count data

Daniel M Farewell

Vernon T Farewell

Abstract

1. Introduction

2. Dirichlet negative multinomial distribution

3. Regression formulation

4. Interpretation

5. Simulation study

Table 1.

Table 2.

6. Analysis of motivating example

Table 3.

7. Discussion

Supplementary material

Funding

Supplementary Material

Acknowledgments

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Dirichlet negative multinomial regression for overdispersed correlated count data

Daniel M Farewell

Vernon T Farewell

Abstract

1. Introduction

2. Dirichlet negative multinomial distribution

3. Regression formulation

4. Interpretation

5. Simulation study

Table 1.

Table 2.

6. Analysis of motivating example

Table 3.

7. Discussion

Supplementary material

Funding

Supplementary Material

Acknowledgments

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases