Review and Recommendations for Zero-inflated Count Regression Modeling of Dental Caries Indices in Epidemiological Studies

John S Preisser; John W Stamm; D Leann Long; Megan E Kincade

doi:10.1159/000338992

Author manuscript; available in PMC: 2012 Aug 21.

Published in final edited form as: Caries Res. 2012 Jun 15;46(4):413–423. doi: 10.1159/000338992

PMCID: PMC3424072 NIHMSID: NIHMS394568 PMID: 22710271

Abstract

Over the past five to ten years, zero-inflated count regression models have been increasingly applied to the analysis of dental caries indices (e.g., DMFT, dfms). The main reason is the broad decline in children’s caries experience, such that dmf and DMF indices more frequently generate low or even zero counts. This article reviews the application of zero-inflated Poisson and zero-inflated negative binomial regression models to dental caries, with emphasis on the description of the models and the interpretation of fitted model results given the study goals. The review finds that interpretations provided in the published caries research are often imprecise or inadvertently misleading, particularly in failing to discriminate between inference for the class of susceptible persons defined by such models and inference for the sampled population in terms of overall exposure effects. Recommendations are provided to enhance the use as well as the interpretation and reporting of results of count regression models when applied to epidemiological studies of dental caries.

Keywords: dental caries, excess zeros, incidence, increment, overdispersion, prevalence

Dental caries, the most common disease of childhood, can be associated with severe health, social, and economic consequences, which can persist over a lifetime. Statistical modeling plays an important role in understanding caries risk factors and combating their development. More than fifty years ago, Grainger and Reid [1954] observed that caries counts are not generally approximated by a normal distribution [see also Lewsey et al., 2000]. They recommended the negative binomial distribution for describing dental caries indices in populations recognizing, as did Böhning et al. [1999] decades later, that caries counts tend to exhibit overdispersion, i.e., excess variation in them relative to the Poisson distribution. Subsequently, researchers [e.g., Syrjälä et al., 2003; Broffit et al. 2007; Ismail et al., 2008; Maserejian et al. 2008b; Thitasomakul et al., 2009; Wong, Lu and Lo, 2011] have often analyzed the effects of risk factors on dental caries indices using negative binomial regression [Hilbe, 2008].

As oral health has improved in populations over time [Campus et al. 2009 and references therein], epidemiological investigations often find that the traditional count data models provide poor fits to caries data. Distributions of caries counts are increasingly characterized by a large number of zero counts, with proportions in excess of what is expected under the Poisson and negative binomial distributions. To handle such “excess zeros”, Böhning et al. [1999], in a paper published in the statistics literature [see Simonoff, 2003 for comment], proposed zero-inflated Poisson (ZIP) regression for modeling the decayed, missing, and filled teeth index (DMFT). Yet while ZIP models account for large counts of zeros, they do not adequately account for data that have sizeable numbers of large caries counts. To address both excess zeros and overdispersion, Lewsey and Thomson [2004] used zero-inflated negative binomial (ZINB) regression models in examining the effect of economic status on DMF data. In the past five years there have appeared over a dozen publications with applications of both types of these zero-inflated (ZI) models to dental caries indices. Their emergence warrants a review.

To help explain the recent trend in applications of ZI models to caries, consider the following example that illustrates the potential inadequacy of traditional models for suitably describing distributions of caries counts. Figure 1a shows a representative distribution of caries counts with a moderately large number of zeros, as is commonly encountered in surveys and population-based studies of caries. For this distribution, the mean and variance of the counts denoted Y are 1.2 and 1.68, respectively (calculated as E(Y) = ΣyP(y) and Var(Y) = Σ[y − E(Y)]²P(y), where P(y) is the relative frequency of count y). Furthermore, the frequency of zero counts is 40% while the cumulative frequency of counts of size four or greater is 5%. A Poisson distribution cannot adequately describe the distribution in figure 1a because all Poisson distributions have a single parameter (i.e., the mean) to describe the distribution of counts, where their variance equals their mean. Thus, not only does a Poisson distribution with a mean of 1.2 have a variance of 1.2, but it additionally specifies a relative frequency of zero counts of 30% and a relative frequency for counts of four or greater of 3.4% (as determined from its probability function), both of which are too low to adequately describe the distribution in figure 1a. Additionally, even the negative binomial distribution, which has a second parameter allowing for extra variation (overdispersion) in Y relative to the Poisson, often fails to account for large fractions of zeros commonly observed in studies of dental caries.

Figure 1. Four representative distributions of caries counts are shown in this panel plot. Fig 1a depicts a zero-inflated Poisson (ZIP) distribution (ψ = 0.25, μ = 1.6) as a single population. It has mean ν = 1.2, variance 1.68 and a relative frequency of zero counts of 40%. Fig 1b depicts the same distribution as in fig 1a but as a mixture of two sub-populations (or non-susceptible (white bar) and susceptible (shaded bars) latent classes). Fig 1c shows a ZIP distribution (ψ = 0.10, μ = 2.0) that is defined relative to the distribution of counts in Fig 1b through a single dichotomous covariate in equations (1) and (2) having “consistent” trends in the two ZIP model parts. It has overall mean ν = 1.8, variance 2.16 and a relative frequency of all zero counts of 22%. Fig 1d shows a ZIP distribution (ψ = 0.40, μ = 2.0) that is defined relative to fig 1b through a dichotomous covariate having “inconsistent” trends in the two ZIP model parts. It has overall mean ν = 1.2, variance 2.16 and a relative frequency of all zero counts of 48%.

To overcome these limitations, the Poisson and negative binomial models have been extended to better incorporate the excess zeros, giving rise to ZIP and ZINB models. The expanded capacity for describing caries count distributions is illustrated by the ZIP distribution (defined in the appendix) with parameters ψ = 0.25 and μ = 1.60 for caries counts Y, which perfectly describes the frequency of counts in figure 1a. The fact that this example is constructed to give a perfect fit does not diminish the fact that ZIP models provide expanded families of count distributions that often give much better fits than the Poisson distribution to counts of caries indices, especially when large numbers of zeros are present. Analogous arguments exist for the utility of the ZINB model relative to the negative binomial model in accounting for both extra zeros and extra-Poisson variation.
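
The comparison between the Poisson and ZIP descriptions of figure 1a can be checked numerically. The following is a minimal sketch, not part of the original analysis; it assumes the standard ZIP probability function (an excess zero with probability ψ, otherwise a Poisson count with mean μ), and the function names `poisson_pmf`, `zip_pmf` and `summarize` are illustrative only.

```python
import math

def poisson_pmf(y, mean):
    """Poisson probability of observing count y."""
    return math.exp(-mean) * mean**y / math.factorial(y)

def zip_pmf(y, psi, mu):
    """ZIP probability: excess zero with probability psi, else Poisson(mu)."""
    base = (1.0 - psi) * poisson_pmf(y, mu)
    return psi + base if y == 0 else base

def summarize(pmf, max_count=60):
    """Return mean, variance, P(Y = 0) and P(Y >= 4), summing up to max_count."""
    probs = [pmf(y) for y in range(max_count + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var, probs[0], 1.0 - sum(probs[:4])

# ZIP distribution of figure 1a versus a Poisson distribution with the same mean
zip_stats = summarize(lambda y: zip_pmf(y, psi=0.25, mu=1.60))
pois_stats = summarize(lambda y: poisson_pmf(y, mean=1.2))

print("ZIP     (mean, var, P0, P4+):", [round(v, 3) for v in zip_stats])
print("Poisson (mean, var, P0, P4+):", [round(v, 3) for v in pois_stats])
```

Both distributions share the mean of 1.2, but only the ZIP distribution reproduces the variance of 1.68 and the 40% frequency of zero counts.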

Notwithstanding their increased usage due to providing improved model fits for counts of caries indices, analysis results based on ZIP and ZINB models may be difficult to interpret [Mwawili et al., 2008; Solinas et al., 2009]. For a fixed set of covariate values, ZI models constitute a mixture of a standard probability distribution for count data, typically Poisson or negative binomial, representing a “susceptible” subpopulation of children said to be at risk for a disease or condition (e.g., dental caries), and a subpopulation of “non-susceptible” children with only zero counts who are considered to be not-at-risk. For a single population (i.e., a model with no observed covariates), figure 1b gives an alternative representation of the relative frequencies of counts for a ZIP model with parameters ψ = 0.25 and μ = 1.60. It illustrates that a randomly selected child from the overall population is not at risk for caries (has an excess zero) with probability ψ = 0.25; otherwise, with probability 1 − ψ = 0.75 the child is susceptible to caries and is assumed to have a caries count, a zero or otherwise, from a Poisson distribution with a mean μ of 1.60. Note that the probability of an excess zero is given by the length of the white bar, and the mean caries count for the susceptible subgroup is the mean of the distribution represented by the shaded bars. The challenges that dental researchers face in understanding ZI models are related to the fact that the composition of the two respective subpopulations or groups in figure 1b is a theoretical and mathematical construct such that the specific group membership of any given subject in a study with a zero count is unknown; accordingly, these groups are referred to in the literature as latent classes.

In fact, figures 1a and 1b display identical overall distributions for Y. Specifically, figure 1a depicts the overall frequency distribution resulting from the mixture of the two subgroups of figure 1b, white and shaded, without distinguishing between them. The only difference in the figures is that figure 1a, by depicting a single overall distribution for Y, reflects the view of Mwawili et al. [2008] that the mixture distribution model representation (figure 1b) is only a convenient explanation for a distribution of counts with excess zeros.

Noting that oral health research employing ZI models often limits consideration to the model-based latent class parameters ψ and μ via interpretation of regression coefficients that describe their variation, Albert et al. [2012] argue that insufficient emphasis has been given to the effects of caries risk factors on the overall population from which the study sample was drawn. From this perspective, figure 1a displays the distribution for Y that has overall mean caries count, say ν = E(Y), and the probability of a positive caries count, denoted π = Pr(Y > 0), represented by the fraction of all subjects with counts greater than zero. In a cross-sectional study, for example, ν is caries severity or extent and π is caries prevalence in the sampled population. Accordingly, Albert et al. [2012] define “overall effects” as the contrasts (i.e., differences or ratios) of values taken by ν (or π) as they vary across subgroups defined by caries risk factors.

Although epidemiological investigations of risk factors on caries often report on the ZI model parameters ψ and μ and the corresponding subpopulations in figure 1b that they characterize [Gilthorpe et al. 2009], ZI models can be used for investigating overall effects on the caries count Y because ν and π, which we refer to as the population oral health parameters, have known relationships to ψ and μ [Lambert, 1992; Böhning et al., 1999; Albert et al., 2012]. Specifically, caries prevalence π and caries severity ν are related to the ZIP (or ZINB) model parameters as follows:

π = (1 - ψ) [1 - exp (- μ)],

and ν = μ(1 − ψ). Thus, the ZI model parameters ψ and μ provide only indirect information on the population oral health parameters π and ν. As long as ψ > 0, the prevalence π is always less than the probability of not being an excess zero, 1 − ψ. Further, ν ≤ μ so that caries severity in the overall population cannot be greater than the mean count of the susceptible population. Applying these mathematical relationships to the example in figure 1b where ψ = 0.25 and μ = 1.60, prevalence is calculated as π = 0.60 and severity is ν = 1.20 (as noted above) for the overall population represented by figure 1a. Analogous arguments can be made when π denotes caries incidence and ν is mean increment in a longitudinal study.
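
As a small numerical companion to these relationships, the sketch below evaluates π and ν for the figure 1b parameters and checks the two inequalities on a grid of values. It is an illustration only: the function names and the grid are assumptions, and the prevalence formula is coded for the ZIP case.

```python
import math

def prevalence(psi, mu):
    """pi = Pr(Y > 0) under a ZIP model with excess-zero probability psi."""
    return (1.0 - psi) * (1.0 - math.exp(-mu))

def severity(psi, mu):
    """nu = E(Y): at-risk mean mu scaled by the at-risk fraction 1 - psi."""
    return mu * (1.0 - psi)

# Figure 1b example: psi = 0.25, mu = 1.60
pi_1b = prevalence(0.25, 1.60)
nu_1b = severity(0.25, 1.60)
print(round(pi_1b, 2), round(nu_1b, 2))

# Illustrative grid (an assumption): psi in 0.1..0.9, mu in 0.5..5.0
grid = [(p / 10, m / 2) for p in range(1, 10) for m in range(1, 11)]
bounds_hold = all(
    prevalence(p, m) < 1.0 - p and severity(p, m) <= m for p, m in grid
)
print(bounds_hold)
```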

The motivation for reviewing the usage and reporting of ZI models in the dental caries literature is the belief that drawing well articulated and valid conclusions from ZI models relies on an understanding of the differences between the ZI model parameters and the population oral health parameters, a distinction made two decades ago with an illustration from manufacturing by Lambert [1992] and later for dental caries by Böhning et al. [1999] and Albert et al. [2012]. This article reviews the caries literature for details of applications of ZI models to dental caries counts and assesses, with respect to stated study goals, the quality of interpretations given to the numerical results of these analyses. Finally, recommendations for improved usage and reporting of ZI models are provided.

Materials and Methods

Overall effects in ZI models

The aims of the literature review require consideration of how the presence of excess zeros in caries counts should be taken into account in statistical analysis, interpretation and reporting when interest is in the overall effects of risk factors on caries prevalence (or incidence) π and severity (or mean increment) ν. “Overall effects” refer to the effects of risk factors on caries indices in the overall population represented by the study participants, and not to the effects within a subset of the overall population defined by an unobserved variable assumed to define subgroups (latent classes) that partition that population [Albert et al., 2012]. For simplicity, consider a single dichotomous covariate, x_i = 0 or 1, appearing in each ZI model component for the i-th child. The probability of an excess zero is typically modeled by a logistic regression, which is expressed in its probability form by

\psi_i(x_i) = \frac{\exp(\gamma_0 + \gamma_1 x_i)}{1 + \exp(\gamma_0 + \gamma_1 x_i)} \qquad (1)

and the mean caries count for at-risk children is modeled via a log linear model (equivalently, a generalized linear model with log link function) by

\mu_i(x_i) = \exp(\beta_0 + \beta_1 x_i). \qquad (2)

The regression coefficient γ₁ in (1) represents the log odds ratio of having an excess zero or being in the not-at-risk group for the effect of x_i = 1 relative to x_i = 0. The coefficient β₁ represents the log of the incidence rate ratio (IRR) for the effect of x_i = 1 relative to x_i = 0 in the at-risk group, i.e., ln[μ_i(x_i = 1)/μ_i(x_i = 0)]. Often γ₁ and β₁ are not of primary interest [Albert et al., 2012]. Rather, their importance lies in their relationship to prevalence and severity in the overall population. Substitution of (1) and (2) into ν_i(x_i) = μ_i(x_i)[1 − ψ_i(x_i)] gives the overall mean severity

E(Y_i \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\gamma_0 + \gamma_1 x_i)}.

Then, the ratio of means, ν_i(x_i = 1)/ν_i(x_i = 0), or IRR for the overall effect of x_i on caries severity is:

\frac{E(Y_i \mid x_i = 1)}{E(Y_i \mid x_i = 0)} = \exp(\beta_1) \, \frac{1 + \exp(\gamma_0)}{1 + \exp(\gamma_0 + \gamma_1)} \qquad (3)

Thus, a ratio factor on the right-hand-side of equation (3), which depends on the excess zero model parameters from equation (1), multiplies the IRR for the at-risk latent class (exp(β₁)) to produce the overall IRR giving the effect of x_i on caries severity in the overall population. Equation (3) generalizes for a continuous covariate (see appendix).

The signs of β₁ and γ₁ impact the direction of the bias when the at-risk latent class IRR is used to estimate the IRR for caries severity in the overall population. First, if γ₁ < 0 (i.e., negative sign) then the ratio factor in equation (3) will be greater than 1.0 and, thus, exp(β₁) will underestimate the IRR for caries severity in the overall population. On the other hand, if γ₁ > 0 (i.e., positive sign) then the ratio factor in equation (3) will be less than 1.0 and exp(β₁) will overestimate the IRR for caries severity in the overall population.

Second, whether β₁ and γ₁ have consistent trends (i.e., opposite signs, one positive and the other negative) or inconsistent trends (same signs, both positive or both negative) usually determines the direction of the bias in relation to the null value of no covariate effect. The scenario of consistent trends is where a covariate decreases (increases) the probability of an excess zero and increases (decreases) the at-risk class mean. The less common scenario of inconsistent trends is where a covariate decreases (increases) the probability of an excess zero and decreases (increases) the at-risk class mean. Considering equation (3), and that β₁ < 0 implies exp(β₁) < 1 while β₁ > 0 implies exp(β₁) > 1, the impact of consistent trends and inconsistent trends in samples sufficiently large for estimates to reflect the relationship of parameters is as follows:

1. When a covariate has consistent trends in the two ZI model parts (opposite signs), the at-risk latent class IRR will in most cases be biased towards the null hypothesis of no effect in the sense that the IRR (latent) is closer to 1.0 than IRR (severity).
2. When a covariate has inconsistent trends in the two ZI model parts (same sign), the at-risk latent class IRR will in most cases be biased away from the null hypothesis of no effect in the sense that the IRR (latent) is farther from 1.0 (in either direction) than IRR (severity).

Exceptions to these rules sometimes occur when the IRR (latent) and IRR (severity) have different directions (one has a value less than one, while the other has a value greater than one), but violations of these rules appear to be rare, and when they occur they are often inconsequential, with both IRRs being close to 1.0; see the online supplementary appendix for further discussion and real-life examples where exceptions to these rules occurred less than 2% of the time.

To illustrate the first scenario (consistent trends), suppose γ₀ = −1.099, β₀ = 0.470, γ₁ = −1.099, and β₁ = 0.223, corresponding to {ψ₁ = 0.25, μ₁ = 1.60} for the group with x_i = 0 (figure 1b), and {ψ₂ = 0.10, μ₂ = 2.00} for the group with x_i = 1 (figure 1c). Then the IRR for the at-risk latent class is exp(0.223) = 1.25 (which also equals μ₂/μ₁), while the IRR for the overall population calculated from equation (3) or by [μ₂(1 − ψ₂)]/[μ₁(1 − ψ₁)] equals 1.50. In this case, the IRR (latent) underestimates the IRR (severity) and is biased towards the null.

To illustrate the second scenario (inconsistent trends), suppose γ₀ = −1.099, β₀ = 0.470, γ₁ = 0.693, and β₁ = 0.223, corresponding to {ψ₁ = 0.25, μ₁ = 1.60} for the group with x_i = 0 (figure 1b), and {ψ₃ = 0.40, μ₃ = 2.00} for the group with x_i = 1 (figure 1d). Then the IRR for the at-risk latent class is μ₃/μ₁ = 1.25 while the IRR for the overall population is [μ₃(1 − ψ₃)]/[μ₁(1 − ψ₁)] = 1.0. In this case, the IRR (latent) overestimates the IRR (severity) and is biased away from the null. The general point is that some bias generally occurs when exponentiated β-coefficients, which are IRRs for the at-risk latent class, are falsely interpreted as IRRs for the overall population.

The at-risk latent class IRR is equivalent to the IRR (severity) for the overall population when γ₁ = 0 in (1) in which case the ratio term on the right-hand-side of (3) cancels out. Thus, when the probability of an ‘excess zero’ does not depend on x_i, exp(β₁) is the IRR for the overall population and its interpretation is the same as in Poisson regression and negative binomial regression. Generally, however, estimates of IRRs based on equation (3) that appropriately adjust for the zero-inflation parameters (e.g., γ₀ and γ₁) provide valid inference for the overall effect of the risk factor on the population oral health parameters.
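
The two illustrative scenarios, and the special case γ₁ = 0, can be reproduced directly from equations (1) to (3). The sketch below is a hedged illustration rather than a prescribed implementation; the function and dictionary names are assumptions, while the coefficient values are those used in the text.

```python
import math

def irr_latent(beta1):
    """IRR for the at-risk latent class: exp(beta1)."""
    return math.exp(beta1)

def irr_severity(gamma0, gamma1, beta1):
    """Overall IRR for caries severity, equation (3)."""
    ratio = (1.0 + math.exp(gamma0)) / (1.0 + math.exp(gamma0 + gamma1))
    return math.exp(beta1) * ratio

scenarios = {
    "consistent trends": {"gamma0": -1.099, "gamma1": -1.099, "beta1": 0.223},
    "inconsistent trends": {"gamma0": -1.099, "gamma1": 0.693, "beta1": 0.223},
    "gamma1 = 0": {"gamma0": -1.099, "gamma1": 0.0, "beta1": 0.223},
}

results = {}
for name, c in scenarios.items():
    latent = irr_latent(c["beta1"])
    overall = irr_severity(c["gamma0"], c["gamma1"], c["beta1"])
    results[name] = (latent, overall)
    print(f"{name}: IRR (latent) = {latent:.2f}, IRR (severity) = {overall:.2f}")
```

With consistent trends the latent class IRR (1.25) lies closer to 1.0 than the overall IRR (1.50); with inconsistent trends it lies farther from 1.0 than the overall IRR (1.00); and with γ₁ = 0 the two coincide.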

The general results for the relationship between trends and bias for ZI models (1. and 2. above) also apply to ZI models with multiple covariates. However, we wish to caution the reader that for models having multiple covariates appearing in both model parts (for μ_i and ψ_i) the IRR-severity for a covariate will depend upon the values of other covariates. Specifically, in a ZI model with three covariates, the IRR (severity) for a dichotomous factor x_{i3} is:

\frac{E(Y_i \mid x_{i1}, x_{i2}, x_{i3} = 1)}{E(Y_i \mid x_{i1}, x_{i2}, x_{i3} = 0)} = \exp(\beta_3) \, \frac{1 + \exp(\gamma_0 + \gamma_1 x_{i1} + \gamma_2 x_{i2})}{1 + \exp(\gamma_0 + \gamma_1 x_{i1} + \gamma_2 x_{i2} + \gamma_3)}. \qquad (4)

If x_{i1} and x_{i2} are also dichotomous (in this example), there will be four different values for IRR (severity), one for each combination of x_{i1} and x_{i2}. A single covariate-adjusted IRR (severity) for the effect of x_{i3} may be obtained by inserting mean values for x_{i1} and x_{i2} (whether they are dichotomous or continuous) into equation (4). Simplification of equation (4) occurs only if some of the covariates are omitted from the excess zero part of the model, or otherwise have their γ-coefficients equal to zero. For example, if x_{i3} does not appear in the excess zeros model (equivalently, γ₃ = 0), then exp(β₃) from a ZI model with three covariates is the IRR for both the at-risk latent class and the overall population relating the risk factor to caries severity, all other covariates being held fixed. The online supplementary material contains a detailed illustration with real-life data involving two dichotomous covariates and one categorical factor.
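
To make the dependence on the other covariates concrete, the following sketch evaluates equation (4) for each combination of two dichotomous covariates and at their means. All coefficient values and the covariate means are hypothetical, chosen only for illustration, and the function name is an assumption.

```python
import math
from itertools import product

def irr_severity_x3(gamma, beta3, x1, x2):
    """Overall IRR for the third covariate, equation (4), at given x1 and x2."""
    g0, g1, g2, g3 = gamma
    linear = g0 + g1 * x1 + g2 * x2
    return math.exp(beta3) * (1.0 + math.exp(linear)) / (1.0 + math.exp(linear + g3))

# Hypothetical coefficients (not estimates from any study)
gamma = (-1.0, 0.5, -0.4, -0.8)   # gamma0, gamma1, gamma2, gamma3
beta3 = 0.25

# One IRR (severity) per combination of the two dichotomous covariates
by_stratum = {
    (x1, x2): irr_severity_x3(gamma, beta3, x1, x2)
    for x1, x2 in product((0, 1), repeat=2)
}
for stratum, irr in by_stratum.items():
    print(stratum, round(irr, 3))

# Single covariate-adjusted IRR at hypothetical sample means of x1 and x2
at_means = irr_severity_x3(gamma, beta3, x1=0.3, x2=0.6)

# With gamma3 = 0 the IRR no longer depends on x1 and x2 and equals exp(beta3)
no_zero_effect = irr_severity_x3((-1.0, 0.5, -0.4, 0.0), beta3, x1=1, x2=0)
print(round(at_means, 3), round(no_zero_effect, 3))
```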

In addition to the bias arising from mis-interpreting an exponentiated β-term as a population IRR (severity) for caries increment, a second concern is that a variance estimate for an at-risk latent class IRR is likely to underestimate the corresponding variance estimate for the IRR in the overall population since an estimate of the variance for the latter should additionally account for uncertainty associated with estimating γ₀ and γ₁ in (3). The delta method for a scalar function of a random vector may be used to compute the large sample variances of the IRR estimates corresponding to equation (3) or equation (4) conditioning on means or specific covariate values [Albert et al. 2012].
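
One way the delta method might be applied to equation (3) is sketched below. Working on the log scale is an assumption made here for convenience, not a procedure stated in the text: the log of the overall IRR is β₁ + ln[1 + exp(γ₀)] − ln[1 + exp(γ₀ + γ₁)], whose gradient with respect to (γ₀, γ₁, β₁) is (ψ(0) − ψ(1), −ψ(1), 1), where ψ(x) is the excess-zero probability from equation (1). The covariance matrix is hypothetical, and the function names are illustrative.

```python
import math

def expit(z):
    """Inverse logit, as in equation (1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_overall_irr(gamma0, gamma1, beta1):
    """Log of the overall IRR in equation (3)."""
    return (beta1 + math.log1p(math.exp(gamma0))
            - math.log1p(math.exp(gamma0 + gamma1)))

def delta_method_variance(gamma0, gamma1, cov):
    """Large-sample variance of the log overall IRR.

    cov is the 3x3 covariance matrix of the estimates (gamma0, gamma1, beta1).
    """
    psi0 = expit(gamma0)
    psi1 = expit(gamma0 + gamma1)
    grad = [psi0 - psi1, -psi1, 1.0]
    return sum(grad[j] * cov[j][k] * grad[k] for j in range(3) for k in range(3))

gamma0, gamma1, beta1 = -1.099, -1.099, 0.223   # consistent-trends example
cov = [[0.04, -0.03, 0.0],                      # hypothetical covariance matrix
       [-0.03, 0.09, 0.005],
       [0.0, 0.005, 0.01]]

var_overall = delta_method_variance(gamma0, gamma1, cov)
var_latent = cov[2][2]                          # variance of beta1 alone
log_irr = log_overall_irr(gamma0, gamma1, beta1)
half_width = 1.96 * math.sqrt(var_overall)
ci_overall = (math.exp(log_irr - half_width), math.exp(log_irr + half_width))
print(round(var_latent, 4), round(var_overall, 4), ci_overall)
```

In this hypothetical example the variance for the log overall IRR exceeds the variance of the β₁ estimate alone, in line with the concern described above; with a different covariance matrix the comparison could differ.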

Hurdle models

Hurdle models [Mullahy, 1986; Cameron and Trivedi, 1998] are briefly mentioned, as they are occasionally used or cited in epidemiological studies of dental caries. The hurdle model approach, like the ZI model approach, is a two-part count regression method that deals with the phenomenon of excess zeros in the data. However, hurdle models are distinct from ZI models. The first component of a hurdle model, typically logistic regression, addresses the probability of a zero count (as opposed to an “excess zero”) so that it pertains to prevalence (or incidence) in the overall population, as it targets all zero counts. The second part of a hurdle model is for the mean count among subjects with any caries, i.e., E(Y_i|Y_i > 0). It exceeds the unconditional mean E(Y_i) that is the increment for the overall population, and it is distinct from the mean μ_i for the at-risk latent class in a ZI model. As shown in figures 1b, 1c and 1d, zeros can occur in either part of a ZI model, whereas in hurdle models they are only modeled in the first part. Thus interpretations for ZI model results are incorrect when they are based on language pertaining to hurdle models.
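
The distinction among the three means can be illustrated with the ZIP distribution of figure 1b. The sketch below is an illustration under the ZIP assumption only; it uses the identity E(Y) = Pr(Y > 0) × E(Y | Y > 0) to obtain the mean count among children with any caries, and the variable names are assumptions.

```python
import math

psi, mu = 0.25, 1.60                           # ZIP parameters of figure 1b

nu = mu * (1.0 - psi)                          # overall mean E(Y)
pi = (1.0 - psi) * (1.0 - math.exp(-mu))       # prevalence Pr(Y > 0)
mean_given_positive = nu / pi                  # E(Y | Y > 0), hurdle second part

prob_any_zero = 1.0 - pi                       # hurdle first part targets all zeros
prob_excess_zero = psi                         # ZI first part targets excess zeros only

print(round(nu, 2), round(mu, 2), round(mean_given_positive, 2))
print(round(prob_excess_zero, 2), round(prob_any_zero, 2))
```

Here the mean among children with any caries is larger than the at-risk class mean μ, which in turn is larger than the overall mean ν, and the probability of any zero count is larger than the probability of an excess zero.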

Methodology for review of published articles

The authors sought to identify and review all published research articles in the dental literature that used ZIP or ZINB models to analyze caries experience, using ISI Web of Knowledge V5.4 and PubMed as search engines. The authors were the reviewers: a biostatistician (JP), an oral health researcher (JS) and two biostatistics students (DL and MK working jointly), both holding graduate research assistantships in oral health. Each reviewer evaluated all the identified papers according to five criteria labeled “A” through “E” in Table 1, each of which involved categorical classifications. First (“A”), did the article present caries applications using ZIP models, ZINB models, or both? Second (“B”), did the article assess model goodness of fit, or otherwise provide some rationale for model choice? Assigned ratings were (i) “test” if a statistical test of goodness-of-fit, likelihood ratio statistics or information criterion statistics (e.g., AIC, BIC) were presented; (ii) “graph” if a graph displaying model fitted values with observed frequencies was provided; (iii) “not shown” if authors claimed to have examined goodness-of-fit but did not report results of their evaluation; and (iv) “none” if the article did not mention an assessment of goodness of fit for the study data. Third (“C”), did the reported ZI model(s) include covariates in both excess zeros and at-risk model parts: yes, no, or indeterminable?

Table 1.

Articles employing ZIP or ZINB models for childhood dental caries and assessment of reporting of model selection (goodness-of-fit) and whether the interpretations made in the article match the reported analysis results, where classifications are latent class (LC), overall effects, or hurdle model effects; bolded entries indicate inappropriate interpretations for the analysis results presented^*

authors & year	caries indices analyzed	ZI models used (A)	goodness-of-fit (B)^**	covariates in excess zero part? (C)^†	analysis results presented (D)^†^‡	interpretations given to results (E)^‡
Lewsey & Thomson 2004	dmfs, DMFS, DFS increm.	ZIP & ZINB	graph	Yes	LC	LC & overall
Hashim et al. 2006	dmfs	ZINB	none	Yes	LC	overall
Arora et al. 2008	dfs, DMFS	ZINB	not shown	Yes	LC	overall & hurdle
Broadbent et al. 2008	a modified DMFS	ZIP	test	Indeterminable	overall	overall
Lim et al. 2008	d₁s, d₂s, d₂mfs, d₁d₂mfs	ZINB	none	No	overall	overall
Maserejian et al. 2008	# carious teeth/surfaces^††	ZIP & ZINB	not shown	Yes	LC	overall & hurdle
Sanders, Lim, Sohn 2008	Non-cavitated lesions^†††	ZIP	not shown	Indeterminable	Indeterminable	overall
Campus et al. 2009	dmfs	ZINB	test	Yes	LC	LC & overall
Ismail et al. 2009	d3-6mfs, d1-6mfs	ZINB	none	Indeterminable	Indeterminable	overall
Solinas et al. 2009	dmfs	ZIP & ZINB	test & graph	Yes	LC	LC & overall
Tramini et al. 2009	D₃₄MFT	ZIP & ZINB	graph	Yes	LC	overall
Javali & Pandit 2010	DMFT	ZIP & ZINB	test & graph^‡‡‡	Indeterminable	Indeterminable	overall
Nelson et al. 2010	DMFT-I,M; DMFT	ZIP & ZINB	not shown	Indeterminable	Indeterminable	overall
Broadbent et al. 2011	DMFT, MT	ZINB	none	Indeterminable	Indeterminable	overall
Campus et al. 2011	DS	ZIP	test	Yes	LC^‡‡	LC & overall

^* Appropriateness of interpretation cannot be ascertained for model with indeterminate specification; when the excess zeros model part does not contain covariates, interpretations of coefficients in the Poisson or negative binomial process as overall effects (i.e., Lambert 1992 and Albert et al. 2011) are appropriate.

^** Goodness-of-fit classifications are: “test” if statistical test(s) or information criterion (AIC, BIC, etc) reported; “graph” if graphical display(s) of fitted values with observed frequencies provided; “not shown” if article claimed to have assessed goodness-of-fit but did not show results; “none” if article did not report assessment of model fit for the data.

^† The table entry is ‘indeterminable’ if covariate effect estimates for excess zeros are not reported, and there is no explicit indication as to whether covariates are included in the zero-inflation part of the model.

^‡ Numerical results (e.g., estimates of regression coefficients, odds ratio, incident rate ratios or percent mean change) presented by the article and the interpretations made for them correspond to either overall exposure effects, hurdle model effects, or ZI model latent class (LC) effects.

^†† 1° and 2° teeth.

^††† Incidence of noncavitated carious tooth surfaces in primary dentition.

^‡‡ Coefficient estimates for the extra zeros part of the ZIP model are not given in the article.

^‡‡‡ Figures 1 and 2 in this article did not include the contributions of excess zeros to the ZIP and ZINB fitted distributions.

The fourth and fifth assessments aimed to determine whether the interpretations of results from ZI models provided in the article were appropriate for the reported numerical results. The fourth criterion “D” was whether the article presented numerical results for overall exposure effects (“overall”) pertaining to the overall population from which the data were selected, e.g., equation (3), or whether results presented were based on estimated regression coefficients γ₁ and β₁ (or their exponentiated forms) corresponding to the two latent classes (“LC”) of the ZI model. The fifth and final criterion “E” assessed the quality of interpretations of reported numerical results from fitting ZI models; in particular, was language used to describe overall exposure effects or latent class-specific effects? Additionally, we noted where articles inappropriately used language pertaining to hurdle models [Mullahy, 1986]. The reviewers discussed their initial evaluations as a group to reconcile differences and reach consensus.

Finally, examples of problematic interpretations in the reviewed articles were listed. Specifically, for articles with ZI models that included covariates in the excess zeros part, we assigned to interpretations the following classifications: (i) incorrect - when regression coefficients or effects for the probability of an excess zero were falsely attributed to be effects on prevalence; (ii) misleading - when risk factor effects on mean caries counts in the at-risk group were mis-attributed (or could easily be mis-interpreted) as overall effects on severity; for example, when ‘severity’ is used without an appropriate qualifying phrase alluding to the ‘susceptible’ or at-risk subgroup for which the inference actually applies; (iii) imprecise - when results of estimated latent class effects were used directly as the basis for making unsubstantiated claims, or statements of a speculative nature, regarding overall effects on prevalence π or severity ν.

Results

We identified fifteen refereed papers published through 2011 that used ZIP or ZINB models to analyze dental caries in children or adults, either in cross-sectional or longitudinal observational studies (Table 1). As in the cross-sectional studies that analyzed caries indices, statistical models for independent observations were applied in the longitudinal studies for caries increments. One exception is the article by Broadbent et al. [2008] that employs longitudinal data models for repeated measures of DMFS counts from a life trajectory perspective. Some of the fourteen articles that used statistical methods for independent observations adjusted variance estimation for clustering in the study design.

Among the fifteen papers, six applied ZINB (but not ZIP) models to caries outcomes, three applied ZIP (but not ZINB) models, and six applied both ZIP and ZINB models. Most articles analyzed multiple outcomes. Seven articles assessed goodness-of-fit of the chosen model(s) by reporting results from either statistical tests, information criteria or graphs; four papers made the claim to have inspected their data for model selection without showing the results of their assessment; and the remaining four did not report any assessment of fit for ZI models applied to their caries outcomes. However, in this latter group, each article made a general statement that the ZI model was chosen to address excess zeros in caries data.

Eight articles reported analysis results from ZIP or ZINB models that included covariates in the zero-inflation part of the model in addition to the model for the mean count for the at-risk population. Each of these papers summarized latent class-specific covariate effects with estimated regression coefficients or exponentiated regression coefficients corresponding to a model specification for the probability of an excess zero, ψ, and the mean μ caries index for the susceptible class. Three of the eight articles gave appropriate interpretations for these latent class effects in most instances. For example, in their abstract, Lewsey and Thomson [2004] state “Being in the high-SES group during childhood was associated with a greater probability of being caries-free by age 18 years, over and above that which would be expected from the negative binomial process. Low childhood SES also had the largest coefficient in the modelling of the negative binomial process…” Next, Solinas et al. [2009] give interpretations that are appropriate for the results of the two ZI model parts presented from an analysis of Italian 4-year-olds in the National Pathfinder Survey. Consider the sentence in the abstract “The father’s educational level was significant in both parts of the ZINB regression model (P < 0.05), implying that the degree of caries experience increases in children whose fathers have a low level of education, while the excess of caries-free children decreases.” Similarly, Campus et al. [2009] who analyze data from the same study as Solinas et al. [2009] use appropriate language in their abstract such as “the probability of being an extra zero” and “caries experience” when describing results from the zero-inflation and negative binomial parts of the ZINB model, respectively. 
Note that “caries experience”, as used in these quotations, has a general meaning that must be understood in the specific modelling context as applying to the at-risk population and not to the overall population.

The difficulty of interpreting results from ZI models often resulted in imprecise, misleading or incorrect inferences. The difficulty arises because interpretations for ZI models involving “excess zeros” and “caries experience” may be cumbersome and are quite often at odds with the interpretations dental researchers wish to make regarding overall effects relating to prevalence (or incidence) and severity (or increment). The last column of Table 1 shows the types of interpretations made in the reviewed articles, which can be contrasted with the adjacent column showing the types of numerical results presented. Discrepancies, which are bolded in the last column, indicate errors of interpretation, of which a selection is listed in Table 2. For example, Lewsey and Thomson [2004] make a statement in which latent class effects are incorrectly interpreted as overall effects for caries severity and prevalence (Table 2). To investigate the resulting bias of their estimates, we compared IRR estimates for effects in the at-risk latent class with IRR estimates for effects in the overall population for the covariates reported in Table 1 of Lewsey and Thomson [2004]. As anticipated, the latent class estimates tended to underestimate the “overall” IRR estimates in the sense of having values closer to 1.0. We therefore suggest that latent class estimates not be treated as substitutes or proxies for properly computed “overall” estimates of severity. Additional computations and accompanying text are provided in the on-line Supplementary Material to this article.

Table 2.

Selected examples of incorrect use of the language of prevalence (π) when interpreting results from the zero-inflated part of a ZI model and misleading use of the language of severity (ν) when interpreting results from the Poisson or negative binomial process of a ZI model.

Authors & year	Evaluation of quotation	Quotation
Lewsey & Thomson 2004	misleading	“Thus, 5-year-old children from low-SES groups had, on average, nearly four more surfaces affected than their high-SES counterparts, and medium-SES children fell between those two groups” (p. 187)
	incorrect	“The models reveal some interesting differences in the way in which SES was associated with caries severity and prevalence in the cohort” (p. 188)
Hashim et al. 2006	incorrect	ZINB … “allows the simultaneous modelling of both the prevalence and severity of caries.” (p. 259)
	incorrect	“Children from low-income families had substantially lower probability of being caries-free.” (p. 259)
	misleading	“Males had higher dmfs scores on average..” (p. 259, referring to Table 4)
Arora et al. 2008	incorrect	“the relative odds of having no decayed or filled surfaces” (Table 2)
	incorrect^*	“among children with low ETS exposure, an IQR increment in urine cadmium (0.21 ug/g creatinine) is associated with 17% more affected surfaces in children with any decayed or filled surfaces”. (Table 2)
Maserejian et al. 2008a	incorrect	“despite the greater odds of having any permanent dentition caries among Boston children, there was no statistically significant linear association between caries rate and rural/urban setting.” (p. 10)
	incorrect	“P-values were obtained from the logistic portion of the zero-inflated model that represents the probability of having no carious permanent teeth or surfaces.” (Table 3)
	incorrect^*	“P-values were obtained from the linear portion of the zero-inflated model that represents the probability of having an additional carious tooth or surface, given that there were any permanent dentition caries.” (Table 3)
Campus et al. 2009	imprecise	“The sociodemographic pattern in the probability of being an extra zero was highly influenced by a high education level of the father, suggesting that this parameter should affect caries severity, as previously reported.” (p. 160)
Solinas et al. 2009	misleading^**	“The aim of this paper was to predict the probability of ‘caries-free’ subjects and the dependence of dmfs index on the influence of childhood sociodemographic factors, through the application of regression models.” (abstract)
Tramini et al. 2009	incorrect	“The probability of a DMFT equal to zero was associated with a lower sugar consumption.” (p. 471)
	misleading	“Except for the logistic model, where the outcome variable was dichotomized, the other models [Poisson, ZIP, ZINB] assessed the association of independent variables with disease severity.” (p. 469–70)
Campus et al. 2011	misleading	“The zero-inflated regression model showed that caries severity was significantly associated with smoking…” (abstract)
	misleading	“Caries severity was significantly associated with smoking > 3 years (p=0.02), dental check-up…” (p. 43)
	imprecise	“Smoking habit pattern (heavy smokers), self-satisfaction with teeth and gums, frequency of dental check-up and gingival status were statistically significant, in the probability to being an extra zero. This feature shows a reflection of the higher caries prevalence in subjects with heavy smoking habits.” (p. 45)


^* Incorrect hurdle model interpretation.

^** This statement is technically correct if the reader understands that ‘caries-free’ (with quotations) means excess-zero and NOT being without caries.

Furthermore, while Solinas et al. [2009] do not report overall exposure effects, a sentence in the article’s abstract could lead readers to believe that inferences are being made for prevalence and severity (Table 2). Similarly, a statement in Campus et al. [2009] linking reported results of a ZINB model for dmfs to caries severity (i.e. ν) is not substantiated by the authors, nor is it qualified by restricting interpretation to the at-risk population. The five other articles that include covariates in both parts of the ZI model have multiple instances of interpreting results for the latent classes of the two model components as overall exposure effects using language of “prevalence” or “severity” that is imprecise, misleading or incorrect (Table 2).

The remaining seven articles do not report model results for the excess zero part of the ZI model (Table 1). In the first of three papers to analyze dental caries in children participating in the Detroit Dental Health Project, Lim et al. [2008] modeled caries count indices using ZINB models where the model for the excess zeros, as indicated in a footnote to their Table 6, contained only an intercept term. The interpretations applied to the results are appropriate since, as discussed in the methods section, the exclusion of covariates from the excess-zero part of the model permits direct inference for the prevalence and caries increment in the sampled overall population. Ismail et al. [2009] and Sanders et al. [2008] interpret IRRs as overall effects for the population of children in the Detroit study, which would be appropriate assuming that covariates were not included in the excess zeros part of the model; however, like Javali and Pandit [2010], Nelson et al. [2010] and Broadbent et al. [2008, 2011], they do not explicitly state that the intercept-only model was used for excess zeros. In particular, Javali and Pandit [2010] only present estimated regression coefficients for the mean caries process of the at-risk latent class. Nelson et al. [2010] do not report any regression coefficient estimates from their ZIP and ZINB models. Rather, they report the “percent increase in mean” for the outcomes when comparing two groups, using language applicable to overall exposure effects. They do not fully describe the maximum likelihood estimation procedures they employed, and it is not clear whether statistical methodology for estimating overall effects when both model parts contain covariates [e.g., Albert et al., 2012] was used.

Conclusions and Recommendations

With the emergence of zero-inflated count regression models in caries research, authors have made imprecise, misleading and incorrect interpretations of results based on them. Eight of 15 (53%) caries articles reviewed reported ZI models that included covariates in the excess zero model part, which led to problems in interpretations. In five of these eight (63%) articles, authors gave multiple misleading or incorrect interpretations for regression coefficients corresponding to the ZI model’s latent class parameters ψ and μ by interpreting them as overall effects for caries prevalence π and severity ν. The other three articles in this group did not consistently use the terminology of ‘susceptible’ and ‘non-susceptible’ for the at-risk and not-at-risk subgroups, and contained instances of imprecise or misleading interpretations for overall effects given to analytic results for them. While the remaining seven articles did not have any similar concerns with interpretations, only one of them [Lim et al. 2008] clearly specified the ZI model being used by stating that it included only an intercept term for the excess zeros part. In totality, these results support the premise underlying this review that the effects corresponding to the regression coefficients in the two model parts are not typically the parameters of interest, but rather caries researchers usually aim, sometimes unsuccessfully, to study overall effects [Albert et al., 2012].

A research goal of studying overall effects of risk factors on caries indices leads to several recommendations regarding model choice for caries counts exhibiting extra-Poisson dispersion and/or excess zeros: 1) for some populations with high caries rates (e.g., children in developing countries) a negative binomial regression model [Hilbe, 2008] may provide a reasonable approach; 2) otherwise, in the presence of excess zeros, one could consider use of a ZI model with only an intercept in the zero-inflation part of the model; 3) else one could omit the subset of covariates that are of primary interest from the zero-inflation part of the model as this will simplify calculations and interpretations of their effects. However, the researcher should be aware that omitting covariates from the excess zeros part of the model without proper justification could result in bias. When model reduction in the excess zeros part is not warranted such that the primary exposure variables of interest are in both model parts, a fourth approach is to estimate prevalence (or incidence) and severity (or increment) for the overall population as discussed in this article and elsewhere [Lambert 1992; Albert et al. 2012]. Model choice should always be justified with a statistical assessment or, at least, with a graphical display of differences between observed and fitted counts.
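The comparison of observed and fitted counts recommended above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are hypothetical (not data from any reviewed study), the models contain no covariates, and the helper names `poisson_pmf`, `fit_zip_intercept_only` and `observed_vs_fitted` are introduced here for illustration. The ZIP fit uses a simple EM iteration, which is one common way to obtain maximum likelihood estimates and is not necessarily the procedure used in the reviewed articles.

```python
import math
from collections import Counter

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu (mu > 0)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def fit_zip_intercept_only(counts, iters=500):
    """EM estimates of (psi, mu) for a ZIP model with no covariates.

    Assumes the data contain at least one positive count.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    psi, mu = 0.5 * n_zero / n, total / n  # crude starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is an excess zero
        w = psi / (psi + (1 - psi) * math.exp(-mu))
        # M-step: update the excess-zero probability and the at-risk mean
        psi = n_zero * w / n
        mu = total / (n - n_zero * w)
    return psi, mu

def observed_vs_fitted(counts):
    """Rows of (count, observed, fitted Poisson, fitted ZIP) frequencies."""
    n = len(counts)
    observed = Counter(counts)
    mean = sum(counts) / n  # Poisson maximum likelihood estimate of the mean
    psi, mu = fit_zip_intercept_only(counts)
    rows = []
    for y in range(max(counts) + 1):
        zip_prob = (psi if y == 0 else 0.0) + (1 - psi) * poisson_pmf(y, mu)
        rows.append((y, observed.get(y, 0), n * poisson_pmf(y, mean), n * zip_prob))
    return rows

# Hypothetical caries counts for 100 children (illustrative only)
counts = [0] * 60 + [1] * 10 + [2] * 12 + [3] * 9 + [4] * 5 + [5] * 3 + [6] * 1
for y, obs, pois, zi in observed_vs_fitted(counts):
    print(f"{y:>2} {obs:>4} {pois:8.1f} {zi:8.1f}")
```

Plotting the observed column against each fitted column (for example as side-by-side bars) gives the kind of graphical display referred to above; with these hypothetical data the plain Poisson model fits far fewer zeros than were observed.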

For completeness we point out that where count distributions present with small maximum counts, alternatives to employing ZIP or ZINB models may be more appropriate. These alternative procedures are based on mixtures of distributions involving the binomial distribution, including the beta-binomial model (for overdispersion), zero-inflated binomial model (ZIB, for excess zeros) and the zero-inflated beta-binomial model (ZIBB, for both overdispersion and excess zeros) [e.g., Cheung 2006; Albert et al. 2012]. Gilthorpe et al. [2009] discuss that a model based on the Poisson or negative binomial distributions could inappropriately use long tails to describe the bounded distribution of counts, resulting in a poor fit for counts at the upper limit. A poor fit of the ZIP or ZINB model is less likely when the mean count is small relative to the maximum count. We did not find any applications of ZIB or ZIBB models in the caries literature, nor did we identify any caries indices analyzed in the papers reviewed in Table 1 where the maximum count was sufficiently small to question the use of ZIP or ZINB models.

The results of this review lead to several recommendations regarding the implementation, interpretation and reporting of results based on ZIP and ZINB models in caries research: 1) select a model, such as one described in the previous paragraph, based on model fit and in consideration of whether interpretation of its parameters facilitates addressing the research questions of interest (e.g., inferences for overall effects versus latent classes versus hurdle model interpretations); 2) clearly and completely describe the statistical models used, following Lim et al. [2008] for example, by stating for ZI models whether and which covariates were included in the zero-inflation part of the model; 3) report parameter estimates more completely, such as intercept terms when regression coefficients are reported and over-dispersion parameter estimates in the case of negative binomial and ZINB models, stating the software and version used for model estimation; and 4) use precise, consistent, and clear language for interpreting results.

In particular, the importance of discriminating between inference for the overall population versus that for the latent class of susceptible children has been emphasized. For the case of a single dichotomous covariate appearing in both ZI model parts, it was shown that the estimated latent class effect contrasting the two groups μ̂₂/μ̂₁, or equivalently exp(β̂₁), is a biased estimate of the overall effect ν₂/ν₁, which in terms of the ZI model parameters was given in equation (3). It is also the case that the large sample variance estimator of exp(β̂₁) is not equivalent to the variance estimator corresponding to equation (3). The implication is that caries researchers may have underestimated the variance of the overall effect by essentially removing contributions of the excess zeros from its variability in making exp(β₁), and not (3), the basis of inference.
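The source of the bias can be seen in a short worked derivation. This sketch assumes the marginal mean ν = (1 − ψ)μ given in the appendix, a log link for μ and a logit link for ψ with coefficients γ₀ and γ₁ (notation chosen here to match equation (6)), and a covariate coded 0 for group 1 and 1 for group 2:

```latex
\frac{\nu_2}{\nu_1}
  = \frac{(1 - \psi_2)\,\mu_2}{(1 - \psi_1)\,\mu_1}
  = \exp(\beta_1)\,\frac{1 - \psi_2}{1 - \psi_1}
  = \exp(\beta_1)\,\frac{1 + \exp(\gamma_0)}{1 + \exp(\gamma_0 + \gamma_1)}
```

As a hypothetical illustration, if exp(β₁) = 1.5, ψ₁ = 0.5 and ψ₂ = 0.3, the overall effect is 1.5 × 0.7/0.5 = 2.1, so the latent class effect of 1.5 lies closer to 1.0 than the overall effect does. The two coincide only when ψ₁ = ψ₂, that is, when γ₁ = 0.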

Our review of caries research articles using ZI models had limited scope. We were not able to determine whether the substantive conclusions reached in the dental caries articles were valid because standard errors of IRR-severity estimates cannot be determined from published data. To compute these, estimates of the variances and covariances of the regression coefficients are needed. However, misinterpreting effects for the at-risk latent class as overall effects could lead to erroneous conclusions, and, as shown in the example in the supplementary appendix, will usually result in bias towards the null hypothesis of no covariate effects. A second limitation is that the review did not address all statistical and methodological issues in the caries articles, but only those directly relating to the use and reporting of results from ZI models. Finally, the evaluation of assessment criteria involved subjectivity as well as objectivity. Nonetheless, this article concludes that the increasing use of zero-inflated count regression models in dental caries research, along with frequent misinterpretations of their results as documented in our review, calls for greater collaboration among statistical scientists and oral health researchers to advance the quality of caries research utilizing these highly versatile and useful methods.

Supplementary Material

Suppl. Material: NIHMS394568-supplement-Suppl__Material.pdf (78.8 KB, PDF)

Acknowledgments

JP wrote the first draft of the paper. JS provided edits and oversight, particularly with respect to dental caries. DL provided edits, giving particular input on statistical content. JP, JS and MK identified caries articles for review. JP, JS, DL and MK evaluated them according to the criteria. This work was partially funded by grant NIEHS T32ES007018.

Appendix

The appendix provides further statistical detail. ZI models define a Bernoulli process in which s = 1, occurring with probability ψ = P(s = 1), selects a class of subjects considered not-at-risk for caries (i.e., conditional on membership in this class, they have an observed zero with probability one); ψ is thus the probability of an “excess zero”. Otherwise, s = 0, which occurs with probability 1 − ψ, indicates that the child is susceptible to caries. In that case, the child’s caries count is generated from a Poisson (or negative binomial) distribution with mean μ. The overall (or marginal) distribution of a child’s caries count, Y, is:

Y \sim \begin{cases} 0 & \text{with probability } \psi \ (s = 1) \\ g(y \mid \mu, s = 0) & \text{with probability } 1 - \psi \ (s = 0) \end{cases}

where g(y|μ, s = 0) is the probability function, for example, g(y|μ, s = 0) = exp(−μ)μ^y/y! in the Poisson case. Let g(y|μ) = g(y|μ, s = 0), and the marginal probability function for P (Y = y) is:

\begin{array}{l} P(Y = 0) = \psi + (1 - \psi)\, g(0 \mid \mu) \\ P(Y = y) = (1 - \psi)\, g(y \mid \mu), \quad y > 0. \end{array}

(5)

The expression for P(Y = 0) shows that a zero count can be generated in either the excess-zero part or the at-risk part of a ZI model. The mixture distribution in (5) has mean E(Y) = μ(1 − ψ), for ZIP or ZINB. The population prevalence is π = 1 − P(Y = 0), where for the Poisson component of the ZIP distribution g(0|μ) = exp(−μ), and for the negative binomial part (i.e., equation (5.9) of Hilbe [2008]) of the ZINB distribution g(0|μ, α) = (1 + αμ)^(−1/α), where α is its over-dispersion parameter. For the ZIP distribution, the variance is var(Y) = E(Y)(1 + ψμ), from which it is clear that the variance exceeds the mean (when ψ > 0) as in all the figures. For ZINB, var(Y) = E(Y)[1 + μ(ψ + α)].
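The closed-form expressions above can be checked numerically. The following Python sketch is an illustration only: the function names and parameter values are hypothetical, and the negative binomial is parameterized with mean μ and variance μ + αμ², which is assumed here to match the parameterization in the text. It computes P(Y = 0), the prevalence π, the mean (1 − ψ)μ and the variance from the formulas, and compares the moments with a direct summation over the probability function in (5).

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu (mu > 0)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zi_pmf(y, psi, mu, alpha=0.0):
    """Marginal probability P(Y = y) from (5); alpha = 0 gives ZIP, alpha > 0 gives ZINB."""
    g = poisson_pmf(y, mu) if alpha == 0 else nb_pmf(y, mu, alpha)
    return (psi if y == 0 else 0.0) + (1 - psi) * g

def zi_summary(psi, mu, alpha=0.0):
    """Closed-form zero probability, prevalence, mean and variance."""
    g0 = math.exp(-mu) if alpha == 0 else (1 + alpha * mu) ** (-1 / alpha)
    p_zero = psi + (1 - psi) * g0
    mean = (1 - psi) * mu
    variance = mean * (1 + mu * (psi + alpha))
    return {"p_zero": p_zero, "prevalence": 1 - p_zero,
            "mean": mean, "variance": variance}

def numeric_moments(psi, mu, alpha=0.0, upper=500):
    """Mean and variance by direct summation of the marginal probabilities."""
    probs = [zi_pmf(y, psi, mu, alpha) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, variance

# Hypothetical parameter values: psi = 0.4, mu = 3, alpha = 0.5
for alpha in (0.0, 0.5):
    print(zi_summary(0.4, 3.0, alpha), numeric_moments(0.4, 3.0, alpha))
```

The summation is truncated at `upper`, which is adequate for the small means used here but would need to be increased for heavier-tailed settings.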

Equations (1) and (2) specified the ZIP (or ZINB) model for a single covariate, and equation (3) gave its IRR for caries severity in the overall population. The IRR in the overall population for a dichotomous covariate in a model with multiple covariates, but with no covariates besides itself in the excess zero part of the model, is also given by equation (3). In the specific case of a model with a single continuous covariate x_{i1}, the ratio of means ν_i(x_{i1} + 1)/ν_i(x_{i1}), i.e., the IRR for a one-unit increase in x_{i1} on mean caries increment, is:

\frac{E(Y_i \mid x_{i1} + 1)}{E(Y_i \mid x_{i1})} = \exp(\beta_1) \frac{1 + \exp(\gamma_0 + x_{i1} \gamma_1)}{1 + \exp(\gamma_0 + (x_{i1} + 1) \gamma_1)}

(6)

Note that evaluating equation (6) at x_{i1} = 0, which corresponds to a dichotomous covariate coded 0 and 1, gives equation (3).
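Equation (6) is straightforward to evaluate once coefficient estimates are available. The Python sketch below is illustrative only: it assumes a log link for μ and a logit link for ψ (so that 1 − ψ = 1/[1 + exp(γ₀ + γ₁x)]), the function names are introduced here, and the coefficient values are hypothetical rather than taken from any reviewed article. It computes the overall IRR from equation (6) and checks it against the ratio of marginal means (1 − ψ)μ.

```python
import math

def marginal_mean(beta0, beta1, gamma0, gamma1, x):
    """Overall mean (1 - psi) * mu at covariate value x (log and logit links assumed)."""
    mu = math.exp(beta0 + beta1 * x)
    psi = 1 / (1 + math.exp(-(gamma0 + gamma1 * x)))
    return (1 - psi) * mu

def overall_irr(beta1, gamma0, gamma1, x):
    """Equation (6): ratio of overall means for a one-unit increase from x."""
    numerator = 1 + math.exp(gamma0 + x * gamma1)
    denominator = 1 + math.exp(gamma0 + (x + 1) * gamma1)
    return math.exp(beta1) * numerator / denominator

# Hypothetical coefficients: exposure raises the at-risk mean (beta1 > 0)
# and lowers the probability of an excess zero (gamma1 < 0).
beta0, beta1, gamma0, gamma1 = 0.5, 0.4, -0.5, -0.8

latent_irr = math.exp(beta1)                       # effect in the at-risk class
overall = overall_irr(beta1, gamma0, gamma1, 0.0)  # overall effect at x = 0
ratio = (marginal_mean(beta0, beta1, gamma0, gamma1, 1.0)
         / marginal_mean(beta0, beta1, gamma0, gamma1, 0.0))
print(latent_irr, overall, ratio)
```

With these hypothetical values the latent class IRR is closer to 1.0 than the overall IRR, and the two agree only when γ₁ = 0. Unlike exp(β₁), the overall IRR for a continuous covariate depends on the value of x at which it is evaluated.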

Contributor Information

John S. Preisser, Email: jpreisse@bios.unc.edu, Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7420, Tel: (919) 966-7265/fax: (919) 966-3804

John W. Stamm, Department of Dental Ecology, School of Dentistry, University of North Carolina, Chapel Hill, NC 27599-7420

D. Leann Long, Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7420.

Megan E. Kincade, Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599-7420

References

1. Albert JM, Wang W, Nelson S. Estimating overall exposure effects for zero-inflated regression models with application to dental caries. Stat Methods Med Res. 2012. doi: 10.1177/0962280211407800. Online Sep 8, 2011.
2. Arora M, Weuve J, Schwartz J, Wright RO. Association of Environmental Cadmium Exposure with Pediatric Dental Caries. Environ Health Perspect. 2008;116:821–825. doi: 10.1289/ehp.10947.
3. Böhning D, Dietz E, Schlattmann P, Mendonca L, Kirchner U. The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Statist Soc A. 1999;162:195–209.
4. Broadbent JM, Thomson WM, Poulton R. Trajectory Patterns of Dental Caries Experience in the permanent dentition to the fourth decade of life. J Dent Res. 2008;87:69–72. doi: 10.1177/154405910808700112.
5. Broadbent JM, Thomson WM, Boyens JV, Poulton R. Dental plaque and oral health during the first 32 years of life. J Amer Dent Assoc. 2011;142:415–26. doi: 10.14219/jada.archive.2011.0197.
6. Broffit B, Levy SM, Warren JJ, Cavanaugh JE. An investigation of bottled water use and caries in the mixed dentition. Journal of Public Health Dentistry. 2007;67:151–58. doi: 10.1111/j.1752-7325.2007.00013.x.
7. Cameron AC, Trivedi PK. Regression Analysis of Count Data. New York: Cambridge University Press; 1998.
8. Campus G, Solinas G, Strohmenger L, Cagetti MG, Senna A, Minelli L, Majori S, Montagna MT, Reali D, Castiglia P. National pathfinder survey on children’s oral health in Italy: pattern and severity of caries disease in 4-year-olds. Caries Research. 2009;43:155–62. doi: 10.1159/000211719.
9. Campus G, Cagetti MG, Senna A, Blasi G, Mascolo A, Demarchi P, Strohmenger L. Does smoking increase risk for caries? A cross-sectional study in an Italian Military Academy. Caries Research. 2011;45:40–46. doi: 10.1159/000322852.
10. Cheung YB. Growth and cognition function of Indonesian children: zero-inflated proportion models. Statistics in Medicine. 2006;25:3011–3022. doi: 10.1002/sim.2467.
11. Gilthorpe MS, Frydenberg M, Cheng Y, Baelum V. Modelling count data with excessive zeros: The need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data. Statist Med. 2009;28:3539–3553. doi: 10.1002/sim.3699.
12. Grainger RM, Reid DBW. Distribution of dental caries in children. Journal of Dental Research. 1954;33:613–623. doi: 10.1177/00220345540330050501.
13. Hashim R, Thomson WM, Ayers KMS, Lewsey JD, Awad M. Dental caries experience and use of dental services among preschool children in Ajman, UAE. International Journal of Paediatric Dentistry. 2006;16:257–262. doi: 10.1111/j.1365-263X.2006.00746.x.
14. Hilbe JM. Negative Binomial Regression. New York: Cambridge University Press; 2008. p. 251.
15. Hujoel PP, Isokangas PJ, Tieko J, Davis S, Lamont RJ, DeRouen TA, Makinen KK. A re-analysis of caries rates in a preventive trial using poisson regression models. J Dent Res. 1994;73:573–579. doi: 10.1177/00220345940730021401.
16. Ismail AI, Sohn W, Tellez M, Willem JM, Betz J, Lepkowski J. Risk factors for dental caries using the International Caries Detection and Assessment System (ICDAS). Community Dent Oral Epidemiol. 2008;36:55–68. doi: 10.1111/j.1600-0528.2006.00369.x.
17. Ismail AI, Sohn W, Lim S, Willem JM. Predictors of dental caries progression in primary teeth. J Dent Res. 2009;88:270–275. doi: 10.1177/0022034508331011.
18. Javali SB, Pandit PV. Using zero inflated models to analyze dental caries with many zeroes. Indian Journal of Dental Research. 2010;21:480–485. doi: 10.4103/0970-9290.74210.
19. Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34:1–14.
20. Lewsey JD, Gilthorpe MS, Bulman JS, Bedi R. Is modelling dental caries a ‘normal’ thing to do? Community Dentistry and Oral Epidemiology. 2000;17:212–17.
21. Lewsey J, Thomson W. The utility of the zero-inflated Poisson and zero-inflated negative binomial models: a case study of cross-sectional and longitudinal DMF data examining the effect of socio-economic status. Community Dentistry and Oral Epidemiology. 2004;32:183–89. doi: 10.1111/j.1600-0528.2004.00155.x.
22. Levin KA, Davies CA, Topping GVA, Assaf AV, Pitts NB. Inequalities in dental caries of 5-year-old children in Scotland. European Journal of Public Health. 2009;19:337–342. doi: 10.1093/eurpub/ckp035.
23. Lim S, Sohn W, Burt BA, Sandretto AM, Kolker JL, Marshall TA, Ismail A. Cariogenicity of soft drinks, milk and fruit juice in low-income African-American children: a longitudinal study. JADA. 2008;139(7):959–967. doi: 10.14219/jada.archive.2008.0283.
24. Listl S. Family composition and children’s dental health behavior: evidence from Germany. Journal of Public Health Dentistry. 2011;71:91–101. doi: 10.1111/j.1752-7325.2010.00205.x.
25. Maserejian NN, Tavares MA, Hayes C, Soncini JA, Trachtenberg FL. Rural and Urban Disparities in Caries Prevalence in Children with Unmet Dental Needs: The New England Childrens Amalgam Trial. Journal of Public Health Dentistry. 2008a;68:7–13. doi: 10.1111/j.1752-7325.2007.00057.x.
26. Maserejian NN, Trachtenberg F, Hayes C, Tavares M. Oral health disparities in children of immigrants: dental caries experience at enrollment and during follow-up in the New England children’s Amalgam Trial. Journal of Public Health Dentistry. 2008b;68:14–21. doi: 10.1111/j.1752-7325.2007.00060.x.
27. Mullahy J. Specification and testing of some modified count data models. Journal of Econometrics. 1986;33:341–365.
28. Mwalili SM, Lesaffre E, Declerck D. The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Stat Methods Med Res. 2008;17:123–139. doi: 10.1177/0962280206071840.
29. Nelson S, Albert JM, Lombardi G, Wishnek S, Asaad G, Kirchner HL, Singer LT. Dental caries and enamel defects in very low birth weight adolescents. Caries Res. 2010;44:509–518. doi: 10.1159/000320160.
30. Sanders AE, Lim S, Sohn W. Resilience to urban poverty: theoretical and empirical considerations for population health. American Journal of Public Health. 2008;98:1101–1106. doi: 10.2105/AJPH.2007.119495.
31. Simonoff JS. Analyzing Categorical Data. New York: Springer; 2003. p. 496.
32. Solinas G, Campus G, Maida C, Sotgiu G, Cagetti MG, Lesaffre E, Castiglia P. What statistical method should be used to evaluate risk factors associated with dmfs index? Evidence from the National Pathfinder Survey of 4-year-old Italian children. Community Dent Oral Epidemiol. 2009;37:539–546. doi: 10.1111/j.1600-0528.2009.00500.x.
33. Syrjälä AM, Niskanen MC, Ylöstalo P, Knuuttila ML. Metabolic control as a modifier of the association between salivary factors and dental caries among diabetic patients. Caries Research. 2003;37:142–7. doi: 10.1159/000069020.
34. Thitasomakul S, Piwat S, Thearmontree A, Chankanka O, Pithpornchaiyakul W, Madyusoh S. Risks for early childhood caries analyzed by negative binomial models. J Dent Res. 2009;88:137–141. doi: 10.1177/0022034508328629.
35. Tramini P, Molinari N, Tentscher M, Demattei C, Schulte AG. Association between caries experience and body mass index in 12-year-old French children. Caries Research. 2009;43:468–473. doi: 10.1159/000264684.
36. Wong MC, Lu HX, Lo EC. Caries increment over 2 years in preschool children: a life course approach. Int J Paediatr Dent. 2011. doi: 10.1111/j.1365-263X.2011.01159.x. [epub ahead of print]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Suppl. Material

NIHMS394568-supplement-Suppl__Material.pdf^{(78.8KB, pdf)}

[R1] 1.Albert JM, Wang W, Nelson S. Estimating overall exposure effects for zero-inflated regression models with application to dental caries. Stat Methods Med Res. 2012 doi: 10.1177/0962280211407800. online Sep 8, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Arora M, Weuve J, Schwartz J, Wright RO. Association of Environmental Cadmium Exposure with Pediatric Dental Caries. Environ Health Perspect. 2008;116:821–825. doi: 10.1289/ehp.10947. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Böhning D, Dietz E, Schlattmann P, Mendonca L, Kirchner U. The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Statist Soc A. 1999;162:195–209. [Google Scholar]

[R4] 4.Broadbent JM, Thomson WM, Poulton R. Trajectory Patterns of Dental Caries Experience in the permanent dentition to the fourth decade of life. J Dent Res. 2008;87:69–72. doi: 10.1177/154405910808700112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Broadbent JM, Thomson WM, Boyens JV, Poulton R. Dental plaque and oral health during the first 32 years of life. J Amer Dent Assoc. 2011;142:415–26. doi: 10.14219/jada.archive.2011.0197. [DOI] [PubMed] [Google Scholar]

[R6] 6.Broffit B, Levy SM, Warren JJ, Cavanaugh JE. An investigation of bottled water use and caries in the mixed dentition. Journal of Public Health Dentistry. 2007;67:151–58. doi: 10.1111/j.1752-7325.2007.00013.x. [DOI] [PubMed] [Google Scholar]

[R7] 7.Cameron AC, Trivedi PK. Regression Analysis of Count Data. NewYork: Cambridge University Press; 1998. [Google Scholar]

[R8] 8.Campus G, Solinas G, Strohmenger L, Cagetti MG, Senna A, Minelli L, Majori S, Montagna MT, Reali D, Castiglia P. National pathfinder survey on children’s oral health in Italy: pattern and severity of caries disease in 4-year-olds. Caries Research. 2009;43:155–62. doi: 10.1159/000211719. [DOI] [PubMed] [Google Scholar]

[R9] 9.Campus G, Cagetti MG, Senna A, Blasi G, Mascolo A, Demarchi P, Strohmenger L. Does smoking increase risk for caries? A cross-sectional study in an Italian Military Academy. Caries Research. 2011;45:40–46. doi: 10.1159/000322852. [DOI] [PubMed] [Google Scholar]

[R10] 10.Cheung YB. Growth and cognition function of Indonesian children: zero-inflated proportion models. Statistics in Medicine. 2006;25:3011–3022. doi: 10.1002/sim.2467. [DOI] [PubMed] [Google Scholar]

[R11] 11.Gilthorpe MS, Frydenberg M, Cheng Y, Baelum V. Modelling count data with excessive zeros: The need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data. Statist Med. 2009;28:3539–3553. doi: 10.1002/sim.3699. [DOI] [PubMed] [Google Scholar]

[R12] 12.Grainger RM, Reid DBW. Distribution of dental caries in children. Journal of Dental Research. 1954;33:613–623. doi: 10.1177/00220345540330050501. [DOI] [PubMed] [Google Scholar]

[R13] 13.Hashim R, Thomson WM, Ayers KMS, Lewsey JD, Awad M. Dental caries experience and use of dental services among preschool children in Ajman, UAE. International Journal of Paediatric Dentistry. 2006;16:257262. doi: 10.1111/j.1365-263X.2006.00746.x. [DOI] [PubMed] [Google Scholar]

[R14] 14.Hilbe JM. Negative Binomial Regression. New York: Cambridge University Press; 2008. p. 251. [Google Scholar]

[R15] 15.Hujoel PP, Isokangas PJ, Tieko J, Davis S, Lamont RJ, DeRouen TA, Makinen KK. A re-analysis of caries rates in a preventive trial using poisson regression models. J Dent Res. 1994;73:573–579. doi: 10.1177/00220345940730021401. [DOI] [PubMed] [Google Scholar]

[R16] 16.Ismail AI, Sohn W, Tellez M, Willem JM, Betz J, Lepkowski J. Risk factors for dental caries using the International Caries Detection and Assessment System (ICDAS) Community. Dent Oral Epidemiol. 2008;36:55–68. doi: 10.1111/j.1600-0528.2006.00369.x. [DOI] [PubMed] [Google Scholar]

[R17] 17.Ismail AI, Sohn W, Lim S, Willem JM. Predictors of dental caries progression in primary teeth. J Dent Res. 2009;88:270–275. doi: 10.1177/0022034508331011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Javali SB, Pandit PV. Using zero inflated models to analyze dental caries with many zeroes. Indian Journal of Dental Research. 2010;21:480–485. doi: 10.4103/0970-9290.74210. [DOI] [PubMed] [Google Scholar]

[R19] 19. Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34:1–14.

[R20] 20. Lewsey JD, Gilthorpe MS, Bulman JS, Bedi R. Is modelling dental caries a ‘normal’ thing to do? Community Dentistry and Oral Epidemiology. 2000;17:212–217.

[R21] 21. Lewsey J, Thomson W. The utility of the zero-inflated Poisson and zero-inflated negative binomial models: a case study of cross-sectional and longitudinal DMF data examining the effect of socio-economic status. Community Dentistry and Oral Epidemiology. 2004;32:183–189. doi: 10.1111/j.1600-0528.2004.00155.x.

[R22] 22. Levin KA, Davies CA, Topping GVA, Assaf AV, Pitts NB. Inequalities in dental caries of 5-year-old children in Scotland. European Journal of Public Health. 2009;19:337–342. doi: 10.1093/eurpub/ckp035.

[R23] 23. Lim S, Sohn W, Burt BA, Sandretto AM, Kolker JL, Marshall TA, Ismail A. Cariogenicity of soft drinks, milk and fruit juice in low-income African-American children: a longitudinal study. JADA. 2008;139(7):959–967. doi: 10.14219/jada.archive.2008.0283.

[R24] 24. Listl S. Family composition and children’s dental health behavior: evidence from Germany. Journal of Public Health Dentistry. 2011;71:91–101. doi: 10.1111/j.1752-7325.2010.00205.x.

[R25] 25. Maserejian NN, Tavares MA, Hayes C, Soncini JA, Trachtenberg FL. Rural and Urban Disparities in Caries Prevalence in Children with Unmet Dental Needs: The New England Children’s Amalgam Trial. Journal of Public Health Dentistry. 2008a;68:7–13. doi: 10.1111/j.1752-7325.2007.00057.x.

[R26] 26. Maserejian NN, Trachtenberg F, Hayes C, Tavares M. Oral health disparities in children of immigrants: dental caries experience at enrollment and during follow-up in the New England Children’s Amalgam Trial. Journal of Public Health Dentistry. 2008b;68:14–21. doi: 10.1111/j.1752-7325.2007.00060.x.

[R27] 27. Mullahy J. Specification and testing of some modified count data models. Journal of Econometrics. 1986;33:341–365.

[R28] 28. Mwalili SM, Lesaffre E, Declerck D. The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Stat Methods Med Res. 2008;17:123–139. doi: 10.1177/0962280206071840.

[R29] 29. Nelson S, Albert JM, Lombardi G, Wishnek S, Asaad G, Kirchner HL, Singer LT. Dental caries and enamel defects in very low birth weight adolescents. Caries Res. 2010;44:509–518. doi: 10.1159/000320160.

[R30] 30. Sanders AE, Lim S, Sohn W. Resilience to urban poverty: theoretical and empirical considerations for population health. American Journal of Public Health. 2008;98:1101–1106. doi: 10.2105/AJPH.2007.119495.

[R31] 31. Simonoff JS. Analyzing Categorical Data. New York: Springer; 2003. p. 496.

[R32] 32. Solinas G, Campus G, Maida C, Sotgiu G, Cagetti MG, Lesaffre E, Castiglia P. What statistical method should be used to evaluate risk factors associated with dmfs index? Evidence from the National Pathfinder Survey of 4-year-old Italian children. Community Dent Oral Epidemiol. 2009;37:539–546. doi: 10.1111/j.1600-0528.2009.00500.x.

[R33] 33. Syrjälä AM, Niskanen MC, Ylöstalo P, Knuuttila ML. Metabolic control as a modifier of the association between salivary factors and dental caries among diabetic patients. Caries Research. 2003;37:142–147. doi: 10.1159/000069020.

[R34] 34. Thitasomakul S, Piwat S, Thearmontree A, Chankanka O, Pithpornchaiyakul W, Madyusoh S. Risks for early childhood caries analyzed by negative binomial models. J Dent Res. 2009;88:137–141. doi: 10.1177/0022034508328629.

[R35] 35. Tramini P, Molinari N, Tentscher M, Demattei C, Schulte AG. Association between caries experience and body mass index in 12-year-old French children. Caries Research. 2009;43:468–473. doi: 10.1159/000264684.

[R36] 36. Wong MC, Lu HX, Lo EC. Caries increment over 2 years in preschool children: a life course approach. Int J Paediatr Dent. 2011. doi: 10.1111/j.1365-263X.2011.01159.x. [Epub ahead of print]
