ABSTRACT
Across subfields of psychology, researchers frequently encounter count variables (i.e., non‐negative integer values, which result from counted measurements). Although count variables are common in psychological research (e.g., frequency of behaviours or symptoms), researchers may not be aware of appropriate statistical procedures for modelling and drawing inferences from count data. Specialised regression techniques (i.e., generalised linear models and zero‐augmented models) have been developed for the unique properties of count data, but they can seem inaccessible to non‐technical audiences because of their departure from more familiar methods. Assuming a basic knowledge of linear regression, this tutorial aims to demystify count regression approaches and empower researchers to apply these methods to their own count data, using free, open‐source statistical software (i.e., R). This tutorial takes researchers step‐by‐step through the implementation of count regression methods in applied research, imparting them with the knowledge to confidently implement these techniques.
Keywords: count regression, GZLM, hurdle model, overdispersion, statistics tutorial, zero inflation
1. Introduction
Researchers in various subfields of psychology frequently encounter count data (Vives, Losilla, and Rodrigo 2006). For example, a clinical researcher may examine the factors that contribute to one's frequency of panic attacks; a market researcher may explore how a store's customer service ratings relate to daily store visits; a school psychologist may assess how personality impacts a child's number of close friendships. In each of these examples, the dependent variable naturally manifests as a count and should be measured and analysed as such. However, many researchers may be unprepared to implement models for count data because of challenges in modelling decisions (e.g., distribution selection, given that counts produce nonnormal data) and interpretation (e.g., odds and incidence rate ratios) and instead inappropriately apply traditional methods (e.g., ANOVA, multiple regression) to model count outcomes. The purposes of this tutorial are to demystify common count regression techniques and provide researchers with a foundation to implement these methods in their own research using R (R Core Team 2023). 1
2. Properties of Count Data
Count data are observations of non‐negative integer values, which represent the number of event occurrences within a given time or space (see Table 1 for examples). Distributions of these data typically have extreme positive skewness, often with a concentration of values at zero or close to zero, and fewer occurrences of large values (see Figure 2 for visualisation of a count distribution). Count variables can theoretically range from zero to infinity, and in practice their upper limit is often undefined. They are always bounded below, usually (but not always) at zero. For example, there is a maximum number of children a person can have, but this limit is unknown, whereas nobody can have fewer than zero children. If we could observe the population distribution of the number of children per adult, we would expect most observations to fall between 0 and 5 children, but there would also be some valid extreme values (e.g., 15 children). If you were sampling from the population of parents, zeros would be excluded by design (i.e., the distribution would be truncated at zero), and the lower limit would be 1.
TABLE 1.
Examples of count variables.
| Types of count variables | Examples |
|---|---|
| Counts of events | Number of views on a YouTube video; Number of therapist visits; Number of pets owned |
| Counts of events within a defined period of time | Number of panic attacks in a week; Aggressive behaviours in a day at school; Number of headache days in a month |
| Counts of events within a defined space | Number of drunk driving arrests in New York City; Number of terminated employees in an organisation |
FIGURE 2.

Density plot and histogram displaying distribution of affairs.
3. Linear and Count Regression Methods
Count regression techniques are appropriate whenever the outcome (or dependent) variable is a count, although count variables can usually be used as regressors (i.e., predictors) in standard linear regression models without issue (Vives, Losilla, and Rodrigo 2006). Formally, the outcome and error terms in regression models are ‘random’ variables. Random variables are based on probability distributions representing the likelihood of the possible values occurring. With ordinary least‐squares (OLS) linear regression, the errors (i.e., the difference between the observed and predicted values of the outcome) are considered normal with constant variance across the regressors, and when estimating a linear model, one should check that these assumptions are met. Estimating a model with a count outcome via OLS violates these assumptions because of the underlying characteristics of the data (e.g., skewness, sparsity); technically, a count variable is discrete and therefore a linear regression model fitted to a count outcome cannot have normally distributed errors. As described by Gardner, Mulvey, and Shaw (1995), when the assumption of normal errors is violated, the standard error estimates are incorrect, leading to erroneous inferential conclusions. Further, they also note that OLS regression can yield negative predicted values, which are impossible for a count outcome. With OLS regression, assuming that the errors follow a normal distribution is equivalent to assuming that the outcome is normally distributed conditional on the regressors. With count regression models, in contrast, a different conditional distribution for the outcome (e.g., a Poisson distribution) is explicitly specified based on the nature of the count outcome.
4. Tutorial Overview
The following sections describe the common procedure for modelling count data (summarised in Figure 1). 2 We introduce the most notable count models, their characteristics and their implementation in R through an applied example (all materials are available in the Open Science Framework). Beginning with generalised linear models (GLMs; Poisson and negative binomial), and then their zero‐augmented extensions (hurdle and zero‐inflated), we provide a framework to guide researchers through model selection and inference. Although we introduce several tools for model‐fitting and checking, this is not a complete inventory of the available count regression tools. As such, we recommend additional resources throughout and provide an extended presentation of several topics in the Supporting Information. This tutorial adds to the limited literature on count regression methods in psychology (e.g., Atkins and Gallop 2007; Coxe, West, and Aiken 2009; Green 2021) by providing an accessible workflow process for researchers. Although we are not the first to review these methods, this tutorial integrates disparate count methods into a comprehensive system of decisions, culminating in a well‐specified model. The following serves as an introduction to researchers looking to model count data and equips them with the knowledge to pursue additional resources based on their own situations.
FIGURE 1.

Count data modelling procedure. *Indicates models that are not included in this tutorial.
4.1. Research Example
Throughout this tutorial, we use simulated data to illustrate count regression methods. The code used to create this dataset is available in OSF. These data are based on the study that Fair (1978) used to develop ‘A Theory of Extramarital Affairs’, which Greene (2003) later used for pedagogical purposes. The dataset has N = 500 observations and five columns: ID (numeric identifier), gender (woman or man), years_married (years married to their current spouse), happiness (self‐rated marital happiness, on a scale of 1 = very unhappy to 7 = very happy) and affairs (number of extramarital affair encounters in the past year). The objective is to model affairs as a function of happiness, with gender and years_married included as covariates. The functions used for these analyses come pre‐installed in R unless otherwise stated.
5. Preliminary Steps
5.1. Count Outcome
The first step is to determine whether your outcome is a count variable. Count variables consist exclusively of non‐negative integer values, and each unit difference represents the same change across the range of observations (i.e., an interval scale of measurement; e.g., the difference between one and two affairs is the same as the difference between two and three affairs).
5.2. Possible Values of the Count Variable
Once you have established that you have a count outcome variable, consider its possible values. The models outlined in this article are predicated on the count variable having the possibility of all values ≥ 0. That is, specific values (e.g., observations of 0) cannot have been systematically excluded from your data based on research design. For example, if you recorded all participants who had ≥ 10 affairs as ‘10+’ (i.e., imposed an upper bound) or only recruited participants with ≥ 1 affair (i.e., excluded observations of 0), you would need to use special classes of models—censored and truncated count models—to model the data (see the Supporting Information for further discussion). At this point, it is helpful to graph the count outcome to get a sense of its distribution:
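A minimal sketch of such a plot in base R is given here. It uses a simulated stand-in outcome (the object `sim_dat` and its parameter values are hypothetical, not the tutorial data); with the tutorial data, `sim_dat` would be replaced by `dat`.

```r
# Simulated stand-in for the affairs outcome (hypothetical values)
set.seed(123)
sim_dat <- data.frame(affairs = rnbinom(500, size = 0.2, mu = 1.75))

# Histogram on the density scale, with one bin per integer value
hist(sim_dat$affairs,
     breaks = seq(-0.5, max(sim_dat$affairs) + 0.5, by = 1),
     freq = FALSE,
     main = "Distribution of affairs",
     xlab = "Number of affairs")

# Overlay a kernel density estimate
lines(density(sim_dat$affairs), lwd = 2)

# Frequency table for counts 0 to 5 (larger counts are left out of this table)
zero_to_five <- table(factor(sim_dat$affairs, levels = 0:5))
zero_to_five
```

The frequency table complements the plot by showing exactly how many observations sit at zero and at the smallest counts.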
We can also calculate descriptive statistics:
mean(dat$affairs)
sd(dat$affairs)
var(dat$affairs)
Figure 2 shows a large concentration of zero observations and a long tail of high‐count values. The mean of affairs is M = 1.75 with variance = 17.82 (SD = 4.22).
6. Independence
Like standard linear regression, count regression models (and their extensions outlined in this paper) assume that individual observations are conditionally independent, that is, they are unrelated to one another after accounting for shared effects of predictors in the model. Observations can be non‐independent for many reasons, including research design (e.g., longitudinal) and sampling plan (e.g., nested observations). Strategies for handling non‐independent data are described in the Supporting Information. Atkins et al. (2013) provide a tutorial for researchers interested in modelling longitudinal count data.
7. Dispersion
7.1. Poisson Models
The Poisson distribution is the foundational probability distribution for count models. However, the use of Poisson regression is quite limited, especially in the social sciences. The Poisson distribution is often a poor fit for count data because it has a single parameter, $\mu$, which equals both the mean and the variance (a property referred to as ‘equidispersion’). Most data will not be consistent with equidispersion and will therefore not satisfy the assumptions of a Poisson model. Nevertheless, the Poisson regression model is a good starting point for building a count model, as most count models are extensions of the Poisson model. Specifically, the Poisson regression model assumes:
A count outcome with possible values of Y ≥ 0
Independent observations
The conditional outcome follows a Poisson distribution
The Poisson regression model equation can be represented as

$$\ln(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

where $\mu$ is the conditional mean of the count outcome given the values of the predictors $X_1, \dots, X_p$. The model is expressed in terms of a natural‐log transformation of $\mu$ to allow a linear combination of regressors on the right‐hand side of the equation.
A Poisson model can be fit in R using the glm() function:
pois_mod <- glm(affairs ~ gender + years_married + happiness,
                family = poisson, data = dat)
summary(pois_mod)
The Poisson model can also accommodate differences in ‘exposure’ among observations by adding an offset parameter. Offsets can be used in situations where the scope of measurement differs between observations (e.g., measured over different periods of time). In the example data, the affairs measurement spans over the past year for all participants, so an offset is not necessary. However, if affairs were measured across the length of marriage, we would include an offset to adjust for different marriage lengths to create an affairs rate (i.e., affairs per year of marriage) (see the Supporting Information for more information on offsets).
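As a rough illustration of the offset idea, the sketch below simulates a hypothetical version of the example in which affairs are counted over the whole marriage. All object names and parameter values are assumptions made for illustration. The log of the exposure (years married) enters the model through `offset()`, so the coefficients describe affairs per year of marriage.

```r
set.seed(42)
n <- 400
happiness <- sample(1:7, n, replace = TRUE)
years_married <- runif(n, min = 1, max = 30)

# Hypothetical data-generating process: a yearly rate that declines with happiness
true_rate <- exp(0.5 - 0.3 * happiness)
affairs_total <- rpois(n, lambda = true_rate * years_married)

# The log of the exposure enters as an offset (its coefficient is fixed at 1)
offset_mod <- glm(affairs_total ~ happiness + offset(log(years_married)),
                  family = poisson)

# Exponentiated estimates: baseline yearly rate and rate ratio for happiness
exp(coef(offset_mod))
```

Because the offset's coefficient is fixed at 1 rather than estimated, a doubling of exposure doubles the expected count while leaving the estimated rate unchanged.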
7.2. Testing Dispersion
Once a Poisson model is fitted to the data, you can assess the dispersion. There are three dispersion scenarios: (1) equidispersion: the conditional variance equals the conditional mean of the count outcome, (2) underdispersion: the conditional variance is less than the conditional mean and (3) overdispersion: the conditional variance is greater than the conditional mean. A common method of assessing dispersion is with the Pearson‐based dispersion statistic, which is evaluated with a chi‐squared test; several packages implement this overdispersion test. 3 Here, we enter our fitted Poisson model into the check_overdispersion() function from the performance package (Lüdecke et al. 2021):
library(performance)
check_overdispersion(pois_mod)
# Overdispersion test
       dispersion ratio =    4.228
  Pearson's Chi-Squared = 2097.008
                p-value =  < 0.001
Overdispersion detected.
The output provides a dispersion statistic and the associated chi‐squared test with a p‐value. A dispersion statistic equal to 1 indicates equidispersion, whereas values over 1 indicate overdispersion and values under 1 indicate underdispersion. This method assumes that poor model fit is due to violation of the Poisson distributional assumption and not another issue (e.g., missing predictors, outliers). Before attempting to remedy dispersion issues, researchers should first assess whether their model includes all relevant predictors and interactions. For our affairs model, the dispersion ratio (4.23) is much greater than 1 and the test is statistically significant, indicating overdispersion. This overdispersion was foreshadowed in our descriptive statistics for affairs, where the unconditional variance was much larger than the mean.
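For readers who want to see what the Pearson-based statistic involves, the sketch below computes it by hand in base R on simulated overdispersed data (all names and values are hypothetical). It assumes the statistic is the Pearson chi-squared divided by the residual degrees of freedom, which is consistent with the output reported above (2097.008 / 496 ≈ 4.228).

```r
set.seed(2024)
n <- 500
x <- rnorm(n)
# Overdispersed counts: negative binomial draws with mean exp(0.3 + 0.5 * x)
y <- rnbinom(n, size = 0.5, mu = exp(0.3 + 0.5 * x))

pois_fit <- glm(y ~ x, family = poisson)

# Pearson chi-squared: sum of squared Pearson residuals
pearson_chisq <- sum(residuals(pois_fit, type = "pearson")^2)

# Dispersion ratio: Pearson chi-squared divided by residual degrees of freedom
dispersion_ratio <- pearson_chisq / df.residual(pois_fit)

# Upper-tail chi-squared test of the null hypothesis of equidispersion
p_value <- pchisq(pearson_chisq, df = df.residual(pois_fit), lower.tail = FALSE)

c(ratio = dispersion_ratio, chisq = pearson_chisq, p = p_value)
```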
7.2.1. Equidispersion
If you find approximate equidispersion, it is reasonable to interpret the coefficient estimates. These estimates can be interpreted similarly to linear slope estimates, except that each one‐unit increase in the predictor corresponds to a change, equal to the coefficient, in the log of the expected count (while holding other predictors in the model constant). Because the meaning of logged counts is not intuitive, you should exponentiate the coefficient estimates (and their confidence intervals) to obtain incidence rate ratios (IRRs), which indicate the multiplicative change in the expected count per one‐unit difference in the corresponding predictor and can be converted to a percentage difference:
exp(cbind(IRR = coef(pois_mod), confint(pois_mod)))
                     IRR       2.5%      97.5%
(Intercept)   13.3247903 11.1632522 15.8653778
genderwoman    0.4117638  0.3530886  0.4784266
years_married  0.9960283  0.9915687  1.0005039
happiness      0.5508687  0.5218194  0.5805010
An exponentiated estimate (or IRR) = 1 indicates a null association between the predictor and outcome, whereas IRR > 1 corresponds to a positive association and IRR < 1 indicates a negative association. In the current example, we know that the Poisson model is not a good fit to the data, so we would not interpret these ratios in practice. However, for the purposes of the tutorial, we interpret the IRRs for the gender and happiness predictors: The IRR for genderwoman is 0.41, meaning that the predicted count for a woman is 0.41 times the predicted count for a man (i.e., 59% fewer affairs). Next, each 1‐point increase in happiness multiplies the predicted count by a factor of 0.55 (i.e., 45% fewer affairs).
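The conversion from an IRR to a percentage difference is simple arithmetic, sketched below using the two estimates reported above. The object names are illustrative only.

```r
# Exponentiated estimates reported above for the Poisson model
irr <- c(genderwoman = 0.4117638, happiness = 0.5508687)

# Percentage change in the expected count per one-unit difference
pct_change <- (irr - 1) * 100
round(pct_change, 1)

# For a k-unit difference the IRR is raised to the power k,
# e.g., a 2-point difference in happiness
irr_two_points <- unname(irr["happiness"]^2)
round(irr_two_points, 3)
```

Because the model is multiplicative, effects of multi-unit differences compound rather than add: a 2-point difference in happiness corresponds to the IRR squared, not to twice the percentage change.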
7.2.2. Underdispersion
If underdispersion is indicated, there may not be enough variability in the outcome, and it is possible that a Poisson model will not converge. If a Poisson model is successfully estimated with underdispersed data, the standard errors will likely be inflated. One way to handle underdispersed data is to fit a generalised Poisson model (Consul and Famoye 1992). Because underdispersion is relatively uncommon, generalised Poisson models are not presented in this tutorial (but see Hilbe 2014). However, handling one possible cause of underdispersion (excess zeros) is covered below.
7.2.3. Overdispersion
There are several ways to address overdispersion, with the optimal choice depending on the specific research situation. Below, we outline some of the most common methods of handling overdispersion.
7.2.3.1. Robust Standard Errors
Sometimes, the magnitude of overdispersion is small enough to be handled by obtaining robust standard error estimates for the fitted Poisson model. Specifically, the ‘sandwich estimator’ will correct the downward bias in the standard errors caused by the underestimated conditional variance. The coefficient estimates remain the same from the Poisson model, but standard errors and ensuing confidence intervals and p‐values will be corrected. If the magnitude of overdispersion is small, the robust standard errors should be similar to the original Poisson model standard errors. Robust significance tests and confidence intervals based on the sandwich estimator can be obtained with the sandwich and lmtest packages (Zeileis, Köll, and Graham 2020; Zeileis and Hothorn 2002):
library(sandwich)
library(lmtest)
coeftest(pois_mod, vcov = sandwich)
7.2.3.2. Quasi‐Poisson Model
An alternative approach that will lead to the same coefficient estimates as the Poisson model but return standard errors that are adjusted for overdispersion is to estimate a quasi‐Poisson model. You can fit a quasi‐Poisson model by changing the family option in the code used to fit the original Poisson model (see the Supporting Information for more information):
quasipois_mod <- glm(affairs ~ gender + years_married + happiness,
                     family = quasipoisson, data = dat)
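To make the relationship between the two fits concrete, the following base R sketch (simulated data, hypothetical names) checks that a quasi-Poisson fit returns the same coefficients as the Poisson fit, with standard errors multiplied by the square root of the estimated dispersion.

```r
set.seed(99)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, size = 1, mu = exp(0.2 + 0.4 * x))  # overdispersed counts

pois_fit  <- glm(y ~ x, family = poisson)
quasi_fit <- glm(y ~ x, family = quasipoisson)

# Estimated dispersion (Pearson chi-squared / residual df)
phi <- summary(quasi_fit)$dispersion

se_pois  <- summary(pois_fit)$coefficients[, "Std. Error"]
se_quasi <- summary(quasi_fit)$coefficients[, "Std. Error"]

# Same coefficients; standard errors scaled by sqrt(phi)
all.equal(coef(pois_fit), coef(quasi_fit))
all.equal(se_quasi, se_pois * sqrt(phi))
```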
7.2.3.3. Negative Binomial Model
The negative binomial (NB) model (see Cameron and Trivedi 1986) is another approach to handling overdispersed data. 4 In addition to the mean parameter of the Poisson, $\mu$, the NB model includes a dispersion parameter, although the NB specifies a different probability distribution from the Poisson. Furthermore, the NB model assumes a quadratic relationship between the conditional mean and variance, meaning that the variance exceeds the mean by an amount proportional to the mean squared. A negative binomial model can be fit using the glm.nb() function from the MASS package (Venables and Ripley 2002):
library(MASS)
nb_mod <- glm.nb(affairs ~ gender + years_married + happiness, data = dat)
The output provides the estimated dispersion parameter ‘theta’. It is important to note that the glm.nb() function in R defines theta in a slightly different way than some other software packages (e.g., SPSS, STATA). After fitting the NB model, we should compare it with the previously estimated models and assess whether the NB fits the data adequately (see the model comparison section below). Now, we can check how the NB model has handled overdispersion:
check_overdispersion(nb_mod)
# Overdispersion test
       dispersion ratio =   0.946
  Pearson's Chi-Squared = 469.210
                p-value =   0.801
No overdispersion detected.
From the dispersion ratio (close to 1) and non‐significant chi‐squared test, we conclude that the data do not exhibit greater overdispersion than the NB distribution would allow.
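The quadratic mean-variance relationship and the role of theta described above can be checked with a short simulation. This sketch assumes the glm.nb() parameterisation, in which the variance is mu + mu^2 / theta, matching the `size` argument of `rnbinom()`. Some other software instead reports the reciprocal, often labelled alpha. The values of `mu` and `theta` are hypothetical.

```r
set.seed(7)
mu    <- 2      # hypothetical conditional mean
theta <- 0.5    # hypothetical dispersion in the glm.nb() (size) parameterisation

y <- rnbinom(1e5, size = theta, mu = mu)

# Implied NB variance: quadratic in the mean
var_theory <- mu + mu^2 / theta

# With alpha = 1 / theta, the same variance is written mu + alpha * mu^2
alpha <- 1 / theta
var_alpha <- mu + alpha * mu^2

c(empirical = var(y), theory = var_theory, via_alpha = var_alpha)
```

Under this parameterisation, a smaller theta (larger alpha) means more overdispersion, which is worth keeping in mind when comparing output across software.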
7.2.3.4. Zero‐Augmented Model
Sometimes, overdispersion is caused by excess observations equal to 0, and correcting for this overdispersion using methods such as the negative binomial model may not be sufficient. Instead, one should then address overdispersion by fitting a zero‐augmented model, as described below.
8. Zeros
The presence of excess zeros is a common culprit of both overdispersion and poorly fitted Poisson and negative binomial models. Often, count data have more zeros than expected by the Poisson or negative binomial distributions. There are two main approaches for dealing with excess zeros: (1) hurdle models and (2) zero‐inflated models. Hurdle models (Cragg 1971; Mullahy 1986), sometimes referred to as ‘two‐part models’, can handle an over‐ or underrepresentation of zeros by modelling the zeros and the rest of the counts separately. These models assume that the zeros and non‐zero counts result from distinct data‐generating processes and thus separate the model into a binary component and a zero‐truncated count component. Zero‐inflated models (Lambert 1992) do not model zeros separately from the rest of the counts, but instead have an overlapping modelling process at zero, meaning that some zeros arise from their own data‐generating process and others arise from the count process.
Deciding between hurdle and zero‐inflated models is typically based on a theoretical understanding of the data‐generating process for the zeros. A critical issue is the distinction between ‘structural zeros’ and ‘random zeros’. A structural zero occurs for an observation that is never expected to have a non‐zero count in the specified time frame, whereas a random zero occurs when a non‐zero count could have been observed but was not due to random sampling variation. Hurdle models are appropriate when all zeros are structural, whereas zero‐inflated models assume a mixture of structural and random zeros. Using our extramarital affairs example, the affairs variable was defined as ‘number of affairs in the past year’, which is intended to represent one's general tendency to have affairs. Consequently, observations of 0 extramarital affairs in the affairs data would likely contain two groups of people: (1) those who have had or will have an affair but could report 0 affairs for the past year by random chance (i.e., random/sampling zeros) and (2) those who have never and will never have an affair (i.e., structural zeros). If the measurement of affairs spanned the entire marriage, and we could be reasonably certain that a portion of the observations of zero do not originate from the sampling process, a hurdle model would be more appropriate. Selecting between a hurdle model and a zero‐inflated model may not always be straightforward, and in more ambiguous situations, one can fit both models for comparison (for more information on structural versus sampling zeros, see He et al. 2014).
8.1. Testing for Excess Zeros
As a first step, we can check whether there are more zeros in the data than expected from a Poisson or negative binomial distribution. We may already suspect excess zeros based on theoretical considerations or data exploration, but it can be difficult to identify zero‐inflation from observed frequency alone. We can compare the observed frequency of zeros to the expected frequency based on the Poisson or NB distribution by entering their respective fitted models through the check_zeroinflation() function in the performance package (Lüdecke et al. 2021) 5 :
library(performance)
check_zeroinflation(pois_mod)
# Check for zero-inflation
   Observed zeros: 281
  Predicted zeros: 199
            Ratio: 0.71
Model is underfitting zeros (probable zero-inflation).
check_zeroinflation(nb_mod)
# Check for zero-inflation
   Observed zeros: 281
  Predicted zeros: 280
            Ratio: 1.00
Model seems ok, ratio of observed and predicted zeros is within the tolerance range.
The output includes the frequencies of observed and predicted zeros and an assessment of zero inflation based on the ratio of predicted to observed zeros and a given tolerance level. Ratios close to 1 indicate that the model fits the zeros well. If the frequency of observed zeros is greater than the frequency of predicted zeros (i.e., the ratio is below 1), the model is ‘underfitting’ the zeros, which indicates zero‐inflation. Should your models underfit the zeros, either a hurdle or zero‐inflated model may fit the data better than the Poisson or NB model. If, instead, the models overfit the zeros (i.e., zero‐deflation), a hurdle model may be appropriate because hurdle models can handle both zero‐inflation and zero‐deflation.
The output for the affairs example indicates that the NB model does not over‐ or underfit the zeros. Thus, a zero‐augmented model is likely not necessary; consequently, we would normally use this NB model as our final model and proceed to interpret the exponentiated coefficient estimates. However, it will not always be apparent that the NB model fits the zeros as well as a zero‐augmented model could, and it is generally best to validate model selection by comparing model fit. We illustrate these model fit comparisons later.
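The comparison of observed and predicted zeros can also be approximated by hand for a Poisson model, as in the sketch below. It uses simulated data (hypothetical names and values) and assumes that the predicted number of zeros is the sum, over observations, of the Poisson probability of a zero at each fitted mean. This assumption is consistent with the ratio reported above (199 / 281 ≈ 0.71).

```r
set.seed(11)
n <- 500
x <- rnorm(n)
# Hypothetical zero-inflated counts: 40% structural zeros mixed with Poisson counts
structural_zero <- rbinom(n, 1, 0.4)
y <- ifelse(structural_zero == 1, 0, rpois(n, exp(0.5 + 0.3 * x)))

pois_fit <- glm(y ~ x, family = poisson)

observed_zeros <- sum(y == 0)
# Expected zeros: sum over observations of P(Y = 0 | fitted mean)
predicted_zeros <- sum(dpois(0, lambda = fitted(pois_fit)))

c(observed  = observed_zeros,
  predicted = round(predicted_zeros),
  ratio     = round(predicted_zeros / observed_zeros, 2))
```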
8.2. Hurdle Models
As mentioned earlier, hurdle models have two components. The first component models the probability of observing a non‐zero (vs. a zero), and the second component models the count values among the non‐zero observations. It is possible to have different predictors for each component; changing the predictors for one component will not impact the other. For example, if you were using a hurdle model to examine the factors that influence the number of drunk driving arrests among a population of adults, a predictor of whether they have ever driven drunk may impact the zero component of the model (i.e., whether they have a drunk driving arrest or not) but not the count component (i.e., among people who have a drunk driving arrest, how many do they have). Other predictors, such as age, may be relevant for both components. A hurdle model can be fit using the hurdle() function in the pscl package in R (Jackman 2020):
library(pscl)
p.hurdle_mod <- hurdle(affairs ~ gender + years_married + happiness,
                       data = dat, dist = "poisson", zero.dist = "binomial")
In the syntax above, we have specified a hurdle model with all predictors included for both model components. If you would like to include different predictors for each component, you can separate the model formula into two parts using ‘|’, with the count portion's predictors on the left side and the zero portion's predictors on the right.
A Poisson distribution and binomial distribution are the default settings for the count and zero portions of the model, respectively. We can instead specify a negative binomial distribution for the count portion:
nb.hurdle_mod <- hurdle(affairs ~ gender + years_married + happiness,
                        data = dat, dist = "negbin", zero.dist = "binomial")
The output from fitted hurdle models will display the count model and zero model results separately. The coefficients and confidence intervals can be exponentiated to provide IRRs and odds ratios (ORs) with this code:
exp(cbind(IRR_OR = coef(p.hurdle_mod), confint(p.hurdle_mod)))   # Poisson model
exp(cbind(IRR_OR = coef(nb.hurdle_mod), confint(nb.hurdle_mod))) # Negative binomial model
A count‐component IRR is interpreted such that, among the non‐zero counts, for each one‐unit increase in x, y changes by a factor of the IRR, as explained earlier for Poisson models. A zero‐component OR is interpreted such that, for each one‐unit increase in x, the odds of observing a non‐zero change by a factor of the OR, analogous to the interpretation of an OR from a logistic regression model. As noted previously, values of 1 indicate ‘no effect’ between the predictor and outcome variables, which is important to note when interpreting the confidence intervals, as CIs overlapping 1 are analogous to CIs including 0 for a linear slope parameter. Here are the exponentiated coefficients for the NB hurdle model:
                        IRR_OR      2.5%      97.5%
count_(Intercept)   11.8163753 7.1752094 19.4596028
count_genderwoman    0.3513277 0.2401892  0.5138914
count_years_married  0.9962626 0.9839261  1.0087538
count_happiness      0.6364060 0.5665795  0.7148380
zero_(Intercept)     2.4225902 1.3913679  4.2181102
zero_genderwoman     1.0409134 0.7185209  1.5079598
zero_years_married   1.0027336 0.9905304  1.0150872
zero_happiness       0.7010454 0.6290618  0.7812660
Sometimes, the coefficients will be similar across both model components; above, those who were happier were less likely to have had an affair at all (OR = 0.70), and among those who had at least one affair, those who were happier had fewer affairs (IRR = 0.64). In contrast, gender is not significantly related to whether one has an affair (OR = 1.04), but among those who have had affairs, women are predicted to have fewer affairs than men (IRR = 0.35).
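To illustrate why the two hurdle components can be estimated independently of each other, the following is a from-scratch sketch in base R on simulated data. All names and parameter values are hypothetical, and this is not the pscl implementation, only an illustration of the same two-part structure: a logistic regression for zero versus non-zero, and a zero-truncated Poisson fitted by maximum likelihood to the positive counts.

```r
set.seed(1)
n <- 500
x <- rnorm(n)

# Part 1 of the simulation: does the observation cross the hurdle (non-zero)?
nonzero <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))

# Part 2: zero-truncated Poisson draws via the inverse CDF restricted to (P(0), 1)
lambda <- exp(0.5 + 0.3 * x)
u <- runif(n, min = dpois(0, lambda), max = 1)
y <- nonzero * qpois(u, lambda)

# Zero component: logistic regression for non-zero vs. zero
nonzero_obs <- as.integer(y > 0)
zero_fit <- glm(nonzero_obs ~ x, family = binomial)

# Count component: zero-truncated Poisson fitted to the positive counts only
pos <- y > 0
X_pos <- cbind(1, x[pos])
neg_loglik <- function(beta) {
  lam <- exp(drop(X_pos %*% beta))
  # Poisson log-density minus log(1 - P(Y = 0)), the truncation adjustment
  -sum(dpois(y[pos], lam, log = TRUE) - log1p(-exp(-lam)))
}
count_fit <- optim(c(0, 0), neg_loglik, method = "BFGS")

# Exponentiated estimates: odds ratios (zero part) and IRRs (count part)
exp(coef(zero_fit))
exp(count_fit$par)
```

Because the two likelihoods share no parameters, changing the predictors in one fit leaves the other untouched, which is the separability property described above.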
8.3. Zero‐Inflated Models
In contrast with hurdle models, zero‐inflated models allow zeros to arise from both model components, which are estimated simultaneously. The zeros attributed to the binary component represent the structural zeros, and the zeros attributed to the count component represent the random zeros. Zero‐inflated models can be estimated using the zeroinfl() function in the pscl package:
library(pscl)
A zero‐inflated Poisson (ZIP) model is estimated with:
p.zeroinfl_mod <- zeroinfl(affairs ~ gender + years_married + happiness,
                           data = dat, dist = "poisson", link = "logit")
whereas a zero‐inflated NB model is estimated with:
nb.zeroinfl_mod <- zeroinfl(affairs ~ gender + years_married + happiness,
                            data = dat, dist = "negbin", link = "logit")
The syntax for including predictors in each model component is the same as for the hurdle model, with the count predictors on the left side of the ‘|’ and the zero‐component predictors on the right. Unlike the hurdle model, zero‐inflated models estimate the zero and count components simultaneously to maximise likelihood. This means that the zero and count components of the model are interdependent, with the parameters of one component impacting the fit of the other component. Thus, changing the predictors for one part of the model (e.g., the zero component) could impact the estimates of the other part (e.g., the count component).
The coefficient estimates can be exponentiated in the same way as for the hurdle model, but they are interpreted differently. Whereas the zero‐component estimates in a hurdle model describe the odds of observing a non‐zero, those in a zero‐inflated model describe the odds of being a structural (excess) zero. The count component is interpreted like the hurdle model's count component: for each one‐unit increase in the predictor, the count changes by a factor of the exponentiated estimate. However, the count component in a zero‐inflated model includes the value of zero.
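The difference between the two zero-augmented approaches can be summarised by their probability functions. The block below is a sketch for the Poisson case, and the notation is introduced here for illustration rather than taken from the tutorial: $p$ is the probability of observing a non-zero count in the hurdle model, $\pi$ is the probability of a structural zero in the zero-inflated model, and $\lambda$ is the Poisson mean of the count component.

```latex
% Hurdle model with a zero-truncated Poisson count component
P(Y = 0) = 1 - p, \qquad
P(Y = y) = p \, \frac{e^{-\lambda} \lambda^{y} / y!}{1 - e^{-\lambda}},
\quad y = 1, 2, \dots

% Zero-inflated Poisson (ZIP) model
P(Y = 0) = \pi + (1 - \pi) \, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi) \, \frac{e^{-\lambda} \lambda^{y}}{y!},
\quad y = 1, 2, \dots
```

In the hurdle model all zeros come from the binary component, whereas in the ZIP model the probability of a zero combines structural zeros ($\pi$) with zeros generated by the count process ($(1 - \pi) e^{-\lambda}$).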
9. Model Comparison
In previous sections, we touched on model fit comparisons for model selection. This section outlines common methods for comparing models with respect to how well they fit the same data (see Table 2).
TABLE 2.
Model fit comparisons.
| Comparison | Likelihood ratio test | AIC | BIC | Assess zero‐inflation | Vuong's test a | Hanging rootogram |
|---|---|---|---|---|---|---|
| Nested models | ||||||
|
X | X | X | X | ||
|
X | X | X | X | ||
|
X | X | X | X | ||
| Non‐nested models (Poisson) | ||||||
|
X | X | X | X | ||
|
X | X | X | X | ||
|
X | X | X a | X | ||
| Non‐nested models (NB) | ||||||
|
X | X | X | X | ||
|
X | X | X | X | ||
|
X | X | X a | X | ||
Note: More information about Vuong's test is in the Supporting Information.
a Only use when theoretical rationale for selection is not available.
9.1. Likelihood Ratio Tests
The likelihood ratio test (LR test), also known as the Wilks test, compares nested models estimated with maximum likelihood. Two models are nested if one can be expressed as a special case of a second, more complex model, by placing constraints on its parameters, meaning that the more complex model must have all the parameters of the simpler model, along with one or more additional non‐zero parameters. For example, because the NB model reduces to the Poisson model when there is no extra dispersion (i.e., as the theta reported by glm.nb() approaches infinity or, equivalently, when its reciprocal equals 0), the Poisson model is nested within the NB model (assuming both models have the same predictors). In addition to comparing Poisson and NB models, the LR test can also be used to compare other nested models. For example, you can compare a model with two predictors against a more complex model that includes an interaction between the two predictors.
LR tests compare the probability of the data given the parameter estimates to assess the relative fit of two nested models. The LR test statistic is:

$$LR = -2\left[\ln L(m_0) - \ln L(m_1)\right]$$

where $m_1$ is the more complex model, $m_0$ is the simpler model and $L(\cdot)$ is the maximised likelihood of a model. The chi‐squared distribution is used for the hypothesis test, with degrees of freedom equalling the number of constrained parameters in $m_0$ (i.e., the number of additional parameters in $m_1$); the null hypothesis for this test is that there is no model fit difference between the two models (i.e., the models have equal likelihoods). Of the two nested models, the more complex model will always have a likelihood at least as high as the simpler model because it has more parameters to explain the data; the LR test determines whether this increased likelihood is significant. To limit the potential for favouring the more complex model as a result of overfitting, the LR test should be used in conjunction with other model fit criteria that account for model complexity.
To demonstrate, we can formally test whether the NB has a better fit than the Poisson model using the lrtest() function in the lmtest package (Zeileis and Hothorn 2002):
    library(lmtest)
    lrtest(pois_mod, nb_mod)

    Likelihood ratio test

    Model 1: affairs ~ gender + years_married + happiness
    Model 2: affairs ~ gender + years_married + happiness
      #Df   LogLik Df  Chisq Pr(>Chisq)
    1   4 -1090.99
    2   5  -750.42  1 681.14  < 2.2e-16 ***
The output indicates that the more complex model, the NB, fits the data significantly better than the Poisson model, χ²(1) = 681.14, p < .001.
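As a rough check on this result, the LR statistic can be reproduced by hand from the two log‐likelihoods. The sketch below copies the log‐likelihoods and parameter counts from the lrtest() output instead of refitting the models; the variable names (ll_pois, ll_nb, df_diff) are illustrative assumptions, not objects created elsewhere in the tutorial:

```r
# Log-likelihoods and parameter counts copied from the lrtest() output
ll_pois <- -1090.99  # simpler model (m0), 4 parameters
ll_nb   <- -750.42   # more complex model (m1), 5 parameters
df_diff <- 5 - 4     # number of additional parameters in m1

# LR statistic: twice the difference in log-likelihoods
lr_stat <- 2 * (ll_nb - ll_pois)

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

lr_stat  # 681.14, matching the Chisq column of the output
p_value
```

With fitted model objects available, the same log‐likelihoods can be extracted with logLik(pois_mod) and logLik(nb_mod) rather than typed in by hand.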
9.2. AIC and BIC
Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be used to compare both nested and non‐nested models. Unlike the LR test, AIC and BIC consider both model fit and model complexity, as they penalise models with more parameters. The two metrics are interpreted similarly, with lower values indicating better model fit; information criteria values only have relative meaning, so their absolute values are largely meaningless—there are no threshold values that indicate ‘good’ or ‘bad’ model fit. Because these values are calculated with an arbitrary constant, the range of AIC and BIC values one could observe is quite wide and allows for the possibility of both positive and negative values.
Often, AIC and BIC point to the same conclusion; however, when the two disagree (i.e., AIC lower for one model, BIC lower for the other), it may be because BIC applies a larger penalty for additional parameters. Although AIC and BIC can be used to compare non‐nested models, the models should include the same outcome variable and use the same data. AIC and BIC can be produced for any model using:
    AIC(nb_mod)
    BIC(nb_mod)
Results in Table 3 for models fitted to the affairs data show that the NB model produces the lowest BIC, whereas the hurdle model (with an NB distribution for the count component) produces the lowest AIC.
TABLE 3.
Estimated coefficients for models fitted to affairs data.
| | Poisson | Poisson (sandwich SE) | Quasi‐Poisson | Negative binomial | Poisson hurdle | NB hurdle | ZIP | ZINB |
|---|---|---|---|---|---|---|---|---|
| Count component | | | | | | | | |
| (Intercept) | 2.59* | 2.59* | 2.59* | 2.06* | 2.88* | 2.47* | 2.80* | 2.23* |
| | (0.09) | (0.30) | (0.18) | (0.23) | (0.28) | (0.25) | (0.09) | (0.21) |
| Gender (woman) | −0.89* | −0.89* | −0.89* | −0.56* | −1.01* | −1.05* | −0.97* | −1.06* |
| | (0.08) | (0.16) | (0.16) | (0.16) | (0.17) | (0.19) | (0.09) | (0.18) |
| Years married | 0.00 | 0.00 | 0.00 | 0.00 | −0.01 | 0.00 | −0.01* | 0.00 |
| | (0.00) | (0.01) | (0.00) | (0.01) | (0.01) | (0.01) | (0.00) | (0.01) |
| Happiness | −0.60* | −0.60* | −0.60* | −0.48* | −0.46* | −0.45* | −0.42* | −0.38* |
| | (0.03) | (0.06) | (0.06) | (0.05) | (0.07) | (0.06) | (0.03) | (0.05) |
| Zero component | | | | | | | | |
| Intercept | | | | | 0.88* | 0.88* | −0.49 | −2.22* |
| | | | | | (0.28) | (0.28) | (0.32) | (0.78) |
| Gender (woman) | | | | | 0.04 | 0.04 | −0.53* | −12.46 |
| | | | | | (0.19) | (0.19) | (0.25) | (166.98) |
| Years married | | | | | 0.00 | 0.00 | −0.01 | 0.00 |
| | | | | | (0.01) | (0.01) | (0.01) | (0.02) |
| Happiness | | | | | −0.36* | −0.36* | 0.20* | 0.42* |
| | | | | | (0.05) | (0.06) | (0.07) | (0.13) |
| AIC | 2190.0 | 2190.0 | | 1510.8 | 1773.7 | 1502.7 | 1790.9 | 1503.4 |
| BIC | 2206.8 | 2206.8 | | 1531.9 | 1807.4 | 1540.6 | 1824.7 | 1541.4 |
Note: N = 500. Values in parentheses are standard error estimates.
*p < 0.05.
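To make the two penalties concrete, the sketch below recomputes AIC and BIC for the Poisson and NB models from their log‐likelihoods, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L. The log‐likelihoods and parameter counts are taken from the lrtest() output in Section 9.1 and n = 500 from the note to Table 3; the helper info_criteria() is a hypothetical name introduced only for illustration:

```r
# Hypothetical helper: AIC and BIC from a log-likelihood,
# k estimated parameters, and n observations
info_criteria <- function(loglik, k, n) {
  c(AIC = 2 * k - 2 * loglik,
    BIC = log(n) * k - 2 * loglik)
}

n <- 500  # sample size reported in the note to Table 3

pois_ic <- info_criteria(loglik = -1090.99, k = 4, n = n)
nb_ic   <- info_criteria(loglik = -750.42,  k = 5, n = n)

# Rounded to one decimal place, these agree with the Poisson and NB columns of Table 3
round(rbind(Poisson = pois_ic, NB = nb_ic), 1)

# Penalty per additional parameter: 2 for AIC, log(n) for BIC
c(AIC_penalty = 2, BIC_penalty = log(n))
```

With n = 500, log(n) is about 6.2, so each extra parameter costs roughly three times as much under BIC as under AIC. This is consistent with BIC favouring the NB model while AIC favours the NB hurdle model, which estimates more parameters.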
9.3. Rootograms
The fit of count models can also be assessed graphically by using a hanging rootogram (Tukey 1977; Kleiber and Zeileis 2016). Rootograms can be plotted in R by passing the fitted model object through the rootogram() function in the topmodels package (Zeileis, Lang, and Stauffer 2023; see the Supplemental code file for installation instructions). Figure 3 shows the rootogram for the Poisson model:
FIGURE 3.

Rootogram of Poisson model.
    library(topmodels)
    r_pois <- rootogram(pois_mod)
The rootogram shows a model's expected counts (red line) and its observed counts (bars hanging from the red line). The count outcome is on the x‐axis and the y‐axis marks the square root of the frequencies. A model with a perfect fit would show each bar hanging down exactly to the horizontal reference line at zero. If a bar hangs below the reference line, the model underpredicts that count (observed > expected count). If the bar ends above the reference line, the model overpredicts that count (observed < expected count). The rootogram is useful for selecting among multiple fitted models, as you can quickly assess which most closely approximates the data across the entire range of observed values. Further, the rootogram is useful for identifying where the data depart from the expected counts (e.g., where the model underpredicts zeros), and which distributions succeed in remedying misfit.

Although the hanging rootogram is not the only tool for assessing model fit visually (e.g., scatterplot or barplot of expected vs. observed frequencies), we believe it is the most intuitive and flexible visualisation for comparing among candidate models, especially given how easily it highlights zero‐inflation. Even among alternative rootograms (e.g., standing rootogram), the hanging rootogram best highlights deviations from the fitted model by using the horizontal reference line as a visual guide. In the current example, we can see that the Poisson model underpredicted the zeros and the higher count values (bars hanging below the line) while overpredicting the lower count values (bars ending above the line). For additional visual model fit assessment methods, see Green (2021).
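The arithmetic behind a hanging rootogram can be sketched in base R. The example below is an illustration only, not the internal code of rootogram(): it uses simulated overdispersed counts (an assumption; these are not the affairs data) and hypothetical object names, fits a Poisson model, and computes where each bar would start and end:

```r
# Simulated overdispersed counts (hypothetical data for illustration)
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, size = 0.5, mu = exp(0.5 + 0.3 * x))

# Fit a Poisson GLM and extract the fitted means
pois_fit <- glm(y ~ x, family = poisson)
mu_hat <- fitted(pois_fit)

# Observed and expected frequencies for each count value 0, 1, ..., max(y)
counts <- 0:max(y)
observed <- as.vector(table(factor(y, levels = counts)))
expected <- sapply(counts, function(k) sum(dpois(k, lambda = mu_hat)))

# Each bar hangs from sqrt(expected) and has length sqrt(observed)
bar_top <- sqrt(expected)
bar_bottom <- sqrt(expected) - sqrt(observed)

rootogram_tab <- data.frame(
  count = counts,
  observed = observed,
  expected = round(expected, 1),
  bar_bottom = round(bar_bottom, 2)
)
head(rootogram_tab)
```

Negative values of bar_bottom correspond to bars that hang below the reference line (the model underpredicts that count), and positive values to bars that end above it (the model overpredicts that count). In these simulated data the Poisson model underpredicts the zeros, so the first value of bar_bottom is negative.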
9.4. Final Model Selection and Interpretation
When selecting the final model for interpretation, it is helpful to assess the fit of several models at once using model summarisation tools such as the modelsummary() function (Arel‐Bundock 2022). Coefficient estimates for the models fitted to the affairs data are in Table 3. The estimates are similar across Poisson, quasi‐Poisson and NB models (also notice that the robust Poisson standard errors are much larger than the Poisson standard errors).
The AIC and BIC suggest that the NB zero‐augmented models fit the data better than their Poisson counterparts. However, they are comparable with the naïve NB model. The zero‐augmented models have slightly better AIC, but the naïve model has better BIC; recall that BIC has a stronger model complexity penalty than AIC.
We can also examine the rootograms (see Figure 4):
FIGURE 4.

Rootograms for Poisson, NB, NB hurdle and ZINB models.
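    library(ggplot2)  # autoplot() is a ggplot2 generic
    r_nb <- rootogram(nb_mod)
    r_nbh <- rootogram(nb.hurdle_mod)
    r_nbz <- rootogram(nb.zeroinfl_mod)
    # r_pois is the Poisson rootogram created in Section 9.3
    autoplot(c(r_pois, r_nb, r_nbh, r_nbz))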
The rootograms show that the negative binomial models fit the data much better than the Poisson model, which underpredicted the zeros and overpredicted many of the lower values. Consistent with the fit indices, the rootograms indicate that the naïve NB model fits the data as well as the hurdle and zero‐inflated models. Taken together, we can conclude that the naïve NB model optimises model fit and parsimony, so we will use it as our final model. Now, we can interpret its exponentiated coefficient estimates and confidence intervals:
    cbind(Estimate = coef(nb_mod), confint(nb_mod),
          IRR = exp(coef(nb_mod)), exp(confint(nb_mod)))

                       Estimate        2.5%       97.5%       IRR      2.5%      97.5%
    (Intercept)    2.0611598202  1.66055506  2.48120990 7.8550750 5.2622309 11.9557209
    genderwoman   -0.5569388460 -0.87500338 -0.23626383 0.5729603 0.4168606  0.7895723
    years_married -0.0003227434 -0.01081237  0.01014821 0.9996773 0.9892459  1.0101999
    happiness     -0.4770903451 -0.56684028 -0.38920541 0.6205865 0.5673152  0.6775951
The results indicate that those who reported higher levels of happiness in their marriage had fewer affairs in the past year, B = −0.48, 95% CI [−0.57, −0.39]. Specifically, the IRR = 0.62 (95% CI [0.57, 0.68]) indicates that for each 1‐unit increase in happiness ratings, the predicted number of affairs changes by a factor of 0.62, which is a 38% decrease (i.e., 1 − IRR = 0.38). Women were predicted to have 43% fewer affairs than men, IRR = 0.57, 95% CI [0.42, 0.79], and there was a near‐zero predicted difference in affairs for each additional year married, IRR = 1.00, 95% CI [0.99, 1.01].
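The conversion from a coefficient to an IRR and then to a percentage change can be sketched as follows. The coefficients are copied from the output above rather than extracted from nb_mod, and the object names (b, irr, pct_change) are illustrative assumptions:

```r
# Coefficient estimates (log scale) copied from the output above
b <- c(genderwoman   = -0.5569388460,
       years_married = -0.0003227434,
       happiness     = -0.4770903451)

irr <- exp(b)                  # incidence rate ratios
pct_change <- (irr - 1) * 100  # percentage change in the predicted count

round(cbind(IRR = irr, pct_change), 2)

# A 2-unit increase in happiness multiplies the predicted count by IRR^2
exp(2 * b["happiness"])
```

Because the model is multiplicative on the count scale, the effect of a larger change in a predictor is obtained by exponentiating the scaled coefficient (or raising the IRR to that power), not by multiplying the percentage change. For happiness, a 2‐unit increase corresponds to a factor of roughly 0.39.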
10. Conclusion
Count variables are common in psychology, but standard linear regression is not well suited for count outcomes. This tutorial provided an overview of the practice of modelling count data with commonly used generalised linear models and zero‐augmented models. Figure 1 provides a flowchart for navigating count regression methods, which can be a useful starting point for researchers implementing these techniques.
In this tutorial, we addressed data exploration, issues with dispersion and zeros, and fitting GLMs (Poisson, NB, quasi‐Poisson) and zero‐augmented models (ZIP, ZINB, Poisson hurdle, NB hurdle). Depending on one's field, some researchers may deal with certain data characteristics (e.g., zero‐inflation) regularly, while others may never encounter them. Nevertheless, it is important to recognise these idiosyncrasies should they materialise. Those familiar with standard linear regression should recognise these models as extensions of, rather than complete departures from, the models they are accustomed to.
While we aimed to introduce the most common techniques for modelling count data, we could not cover the breadth of all methods. This tutorial is meant as an introduction to count regression methods for the uninitiated, and we encourage researchers to pursue further reading to extend their knowledge (e.g., Hilbe 2014; Coxe, West, and Aiken 2009; Cameron and Trivedi 2013; Friendly and Meyer 2015).
Ethics Statement
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1. The complete R code and output for the analyses (as a single .html file) and Supporting Information containing additional topics are available in the Open Science Framework (OSF) at https://osf.io/tuysr/?view_only=1adf6a048b094680b8c91b78ffc81bb7.
Funding: The authors received no specific funding for this work.
Endnotes
1. To our knowledge, no point‐and‐click open access statistical software programs (e.g., Jamovi or JASP) support the full range of methods used in this paper. Jamovi's GAMj module offers a subset of generalised linear models but does not support zero‐augmented models or many of the model comparison methods used in this tutorial. See Green (2021) for a review of the available functions supported across R, SPSS and Jamovi.
2. In this tutorial, we suggest a multi‐stage analytical procedure, wherein the distributional assumptions are checked prior to interpreting the fitted model, and candidate models are evaluated with goodness‐of‐fit tests. Although this approach is traditional, concerns have been raised about its potential to introduce model selection bias by using the same data for both model selection and inference (Wells and Hintze 2007; Kriegeskorte et al. 2009). Ideally, one would know what population distribution to expect and select the corresponding model a priori. However, this is an unrealistic expectation for most psychological applications. Following Campbell (2021) and Perumean‐Chaney et al. (2013), we suggest that researchers exercise caution when modelling count data with a small sample (N < 50), as some of the methods used in this paper may be incorrectly selected when sample size is not adequate.
3. The check_overdispersion() function obtains a dispersion ratio by calculating the sum of squared Pearson residuals divided by n observations minus k regressors. The p‐value is obtained by calculating the probability of observing the sum of squared Pearson residuals given a χ² distribution with n − k degrees of freedom. Pearson residuals are calculated by subtracting the predicted count (based on the Poisson distribution for a Poisson model and simulated data for an NB model) from the observed count and dividing the result by the square root of the predicted count.
4. The traditional parameterization of the NB model is the ‘NB2’, also known as the quadratic negative binomial, named for the exponent on the second term of its variance function, Var(x) = μ + θμ². There are other parameterizations of the NB, including the less‐popular ‘NB1’ model, named for its exponent of 1 on the second term, Var(x) = μ + θμ. Because NB2 is the default implementation in most software packages and considered the most flexible parameterization (e.g., MLE and IWLS can be used for estimation), we have selected the NB2 for all NB models mentioned in this tutorial. For information on other parameterizations, see Maher and Summersgill (1996) or Hilbe (2012).
5. This function compares the frequency of predicted zeros versus observed zeros for Poisson and NB models. For Poisson, predicted zeros are based on predicted values from the Poisson distribution. For NB, predicted zeros are based on simulated data.
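The dispersion ratio described in the endnote on check_overdispersion() can be sketched in base R for a Poisson model. This follows the verbal description given there and is not the package's internal code; it uses simulated data (an assumption for illustration, not the affairs data) and hypothetical object names:

```r
# Simulated overdispersed counts (hypothetical data for illustration)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rnbinom(n, size = 0.5, mu = exp(0.5 + 0.3 * x))

pois_fit <- glm(y ~ x, family = poisson)

# Pearson residuals: (observed - predicted) / sqrt(predicted)
pearson_res <- residuals(pois_fit, type = "pearson")
chisq_stat <- sum(pearson_res^2)

# Residual degrees of freedom: n observations minus the number of
# estimated coefficients (here, the intercept and one slope)
df_resid <- df.residual(pois_fit)

dispersion_ratio <- chisq_stat / df_resid
p_value <- pchisq(chisq_stat, df = df_resid, lower.tail = FALSE)

c(dispersion_ratio = dispersion_ratio, p_value = p_value)
```

A ratio well above 1 together with a small p‐value points to overdispersion relative to the Poisson model, which is what these simulated data are constructed to show.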
References
- Arel‐Bundock, V. 2022. “Modelsummary: Data and Model Summaries in R.” Journal of Statistical Software 103, no. 1: 1–23. 10.18637/jss.v103.i01.
- Atkins, D. C., S. A. Baldwin, C. Zheng, R. J. Gallop, and C. Neighbors. 2013. “A Tutorial on Count Regression and Zero‐Altered Count Models for Longitudinal Substance Use Data.” Psychology of Addictive Behaviors 27, no. 1: 166–177. 10.1037/a0029508.
- Atkins, D. C., and R. J. Gallop. 2007. “Rethinking How Family Researchers Model Infrequent Outcomes: A Tutorial on Count Regression and Zero‐Inflated Models.” Journal of Family Psychology 21, no. 4: 726–735. 10.1037/0893-3200.21.4.726.
- Cameron, A. C., and P. K. Trivedi. 1986. “Econometric Models Based on Count Data. Comparisons and Applications of Some Estimators and Tests.” Journal of Applied Econometrics 1: 29–53. 10.1002/jae.3950010104.
- Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. Cambridge University Press. 10.1017/CBO9781139013567.
- Campbell, H. 2021. “The Consequences of Checking for Zero‐Inflation and Overdispersion in the Analysis of Count Data.” Methods in Ecology and Evolution 12, no. 4: 665–680. 10.1111/2041-210X.13559.
- Consul, P. C., and F. Famoye. 1992. “Generalized Poisson Regression Model.” Communications in Statistics - Theory and Methods 21, no. 1: 89–109. 10.1080/03610929208830766.
- Coxe, S., S. G. West, and L. S. Aiken. 2009. “The Analysis of Count Data: A Gentle Introduction to Poisson Regression and Its Alternatives.” Journal of Personality Assessment 91, no. 2: 121–136. 10.1080/00223890802634175.
- Cragg, J. G. 1971. “Some Statistical Models for Limited Dependent Variables With Application to the Demand for Durable Goods.” Econometrica 39, no. 5: 829–844. 10.2307/1909582.
- Fair, R. 1978. “A Theory of Extramarital Affairs.” Journal of Political Economy 86, no. 1: 45–61. http://www.jstor.org/stable/1828758.
- Friendly, M., and D. Meyer. 2015. Discrete Data Analysis With R: Visualization and Modeling Techniques for Categorical and Count Data. 1st ed. Chapman and Hall/CRC. 10.1201/b19022.
- Gardner, W., E. P. Mulvey, and E. C. Shaw. 1995. “Regression Analyses of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models.” Psychological Bulletin 118, no. 3: 392–404. 10.1037/0033-2909.118.3.392.
- Green, J. A. 2021. “Too Many Zeros and/or Highly Skewed? A Tutorial on Modelling Health Behaviour as Count Data With Poisson and Negative Binomial Regression.” Health Psychology and Behavioral Medicine 9, no. 1: 436–455. 10.1080/21642850.2021.1920416.
- Greene, W. H. 2003. Econometric Analysis. 5th ed. Prentice Hall.
- He, H., W. Tang, W. Wang, and P. Crits‐Christoph. 2014. “Structural Zeroes and Zero‐Inflated Models.” Shanghai Archives of Psychiatry 26, no. 4: 236–242. 10.3969/j.issn.1002-0829.2014.04.008.
- Hilbe, J. M. 2012. Negative Binomial Regression. 2nd ed. Cambridge University Press. 10.1017/CBO9780511973420.
- Hilbe, J. M. 2014. Modeling Count Data. Cambridge University Press. 10.1017/cbo9781139236065.
- Jackman, S. 2020. “pscl: Classes and Methods for R Developed in the Political Science Computational Laboratory.” United States Studies Centre, University of Sydney. Sydney, New South Wales, Australia. R Package Version 1.5.5.1. https://github.com/atahk/pscl/.
- Kleiber, C., and A. Zeileis. 2016. “Visualizing Count Data Regressions Using Rootograms.” The American Statistician 70, no. 3: 296–303.
- Kriegeskorte, N., W. K. Simmons, P. Bellgowan, and C. I. Baker. 2009. “Circular Analysis in Systems Neuroscience: The Dangers of Double Dipping.” Nature Neuroscience 12: 535–540. 10.1038/nn.2303.
- Lambert, D. 1992. “Zero‐Inflated Poisson Regression, With an Application to Defects in Manufacturing.” Technometrics 34, no. 1: 1–14. 10.2307/1269547.
- Lüdecke, D., M. Ben‐Shachar, I. Patil, P. Waggoner, and D. Makowski. 2021. “Performance: An R Package for Assessment, Comparison and Testing of Statistical Models.” Journal of Open Source Software 6, no. 60: 3139. 10.21105/joss.03139.
- Maher, M. J., and I. Summersgill. 1996. “A Comprehensive Methodology for the Fitting of Predictive Accident Models.” Accident; Analysis and Prevention 28, no. 3: 281–296. 10.1016/0001-4575(95)00059-3.
- Mullahy, J. 1986. “Specification and Testing of Some Modified Count Data Models.” Journal of Econometrics 33, no. 3: 341–365. 10.1016/0304-4076(86)90002-3.
- Perumean‐Chaney, S. E., C. Morgan, D. McDowall, and I. Aban. 2013. “Zero‐Inflated and Overdispersed: What's One to Do?” Journal of Statistical Computation and Simulation 83, no. 9: 1671–1683. 10.1080/00949655.2012.668550.
- R Core Team. 2023. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/.
- Tukey, J. W. 1977. Exploratory Data Analysis. Addison‐Wesley.
- Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics With S. 4th ed. Springer. https://www.stats.ox.ac.uk/pub/MASS4/.
- Vives, J., J. M. Losilla, and M. F. Rodrigo. 2006. “Count Data in Psychological Applied Research.” Psychological Reports 98, no. 3: 821–835. 10.2466/pr0.98.3.821-835.
- Wells, C. S., and J. M. Hintze. 2007. “Dealing With Assumptions Underlying Statistical Tests.” Psychology in the Schools 44, no. 5: 495–502. 10.1002/pits.20241.
- Zeileis, A., and T. Hothorn. 2002. “Diagnostic Checking in Regression Relationships.” R News 2, no. 3: 7–10. https://CRAN.R-project.org/doc/Rnews/.
- Zeileis, A., S. Köll, and N. Graham. 2020. “Various Versatile Variances: An Object‐Oriented Implementation of Clustered Covariances in R.” Journal of Statistical Software 95, no. 1: 1–36. 10.18637/jss.v095.i01.
- Zeileis, A., M. Lang, and R. Stauffer. 2023. “topmodels: Infrastructure for Forecasting and Assessment of Probabilistic Models. R Package Version 0.3-0/r1757.” https://R-Forge.R-project.org/projects/topmodels/.
