Appetite. Author manuscript; available in PMC: 2019 Oct 5.
Published in final edited form as: Appetite. 2018 Jun 27;129:252–261. doi: 10.1016/j.appet.2018.06.030

Table 1.

Comparison of Regression-based Models

Ordinary Least Squares (OLS) Regression
Assumptions:
• Normally distributed errors (residuals)
• Independent observations
• Homoscedasticity (constant error variance)
• Continuous outcome
Benefits:
• Familiar to most researchers
• Relatively easy to use
• Can be used with continuous, non-count variables
Drawbacks:
• Normality and homoscedasticity assumptions are rarely met
• Violations of normality and homoscedasticity can distort Type I and Type II error rates and reduce power
• Affected by outliers

OLS with a Transformed Outcome
Assumptions:
• Normally distributed errors (residuals) after transformation
• Independent observations
• Homoscedasticity (constant error variance)
• Continuous outcome
Benefits:
• Familiar to most researchers
• Relatively easy to use
• Can be used with continuous, non-count variables
Drawbacks:
• Transformation does not restore normality and homoscedasticity in all cases
• Outliers can remain after the data are transformed
• Results are difficult to interpret because coefficients are on the transformed scale

Logistic Regression
Assumptions:
• Dichotomous outcome
• Independent observations
Benefits:
• Predicted values are always valid probabilities (between 0 and 1)
• Not affected by outliers in the outcome, which takes only two values
Drawbacks:
• Only appropriate for dichotomous outcomes (or outcomes recoded to be dichotomous)
• Recoding an outcome into two categories discards information and may inflate Type II error
• Sample size must be large when the outcome is infrequent

Poisson Regression
Assumptions:
• Outcome distributed as a Poisson random variable
• Variance equal to the mean
• Count outcome (non-negative integers)
Benefits:
• Can be used with highly skewed distributions
• Appropriate for count data
• Appropriate when the mean count is small
Drawbacks:
• Fitting a Poisson model to over-dispersed data underestimates standard errors and can inflate Type I error
• May not be appropriate when there is a large number of zeros
• Affected by outliers

Negative Binomial Regression
Assumptions:
• Mean and variance specified separately (the variance may exceed the mean)
• Count outcome (non-negative integers)
Benefits:
• Can be used with highly skewed distributions
• May be advantageous when the outcome is over-dispersed
Drawbacks:
• May not be appropriate when there is a large number of zeros
• Affected by outliers

Zero-inflated Regression
Assumptions:
• A logistic regression model for the zero-inflation portion (excess, structural zeros vs. the count process)
• A Poisson or negative binomial distribution for the count portion, which can also produce zeros
Benefits:
• May be most useful for outcomes with a preponderance of zeros
• Maintains adequate power and Type I error control even when normality and homoscedasticity assumptions are not met
• Can be used with highly skewed data
Drawbacks:
• Requires larger samples to achieve adequate power
• Affected by outliers

Hurdle Regression
Assumptions:
• All zeros are structural zeros (i.e., true zeros)
• Separate processes generate the zero and non-zero counts
Benefits:
• Appropriate when the zero portion and the count portion of the model are considered to arise from distinct processes
• Maintains adequate power and Type I error control even when normality and homoscedasticity assumptions are not met
• Can be used with highly skewed data
• Relatively easy to interpret
Drawbacks:
• Requires larger samples to achieve adequate power
• Affected by outliers
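The choice among the Poisson, negative binomial, zero-inflated, and hurdle rows turns on two properties of a count outcome: whether its variance exceeds its mean, and whether it has more zeros than a Poisson distribution with the same mean would produce. Below is a minimal Python sketch that computes both quantities and writes out the zero-inflated Poisson and hurdle Poisson probability functions so the two zero mechanisms can be compared. It is not the analysis used in this study. The function names, the example counts, and the use of fixed probabilities in place of fitted logistic models are illustrative assumptions.

```python
import math
from statistics import mean, variance


def dispersion_summary(counts):
    """Compare a count outcome with the Poisson assumption that variance equals the mean."""
    if len(counts) < 2:
        raise ValueError("need at least two observations")
    m = float(mean(counts))
    if m == 0:
        raise ValueError("all counts are zero; dispersion ratio is undefined")
    v = float(variance(counts))  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        # About 1 for Poisson data; well above 1 suggests over-dispersion.
        "dispersion_ratio": v / m,
        "observed_zero_share": sum(1 for c in counts if c == 0) / len(counts),
        # P(Y = 0) for a Poisson variable with the same mean.
        "poisson_expected_zero_share": math.exp(-m),
    }


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson. Assumption: pi is a fixed probability of an
    'always zero' (structural) observation; in a fitted model it would come
    from the logistic part. Zeros can also arise from the Poisson part."""
    count_part = (1 - pi) * poisson_pmf(k, lam)
    return pi + count_part if k == 0 else count_part


def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson. Assumption: p_zero is a fixed probability of a zero; in a
    fitted model it would come from the binary part. Every zero comes from that
    part, and positive counts follow a zero-truncated Poisson."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


if __name__ == "__main__":
    # Hypothetical outcome with many zeros and a long right tail.
    counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
    for name, value in dispersion_summary(counts).items():
        print(f"{name}: {value:.3f}")
    # Both mixtures are proper distributions (probabilities sum to about 1).
    print(round(sum(zip_pmf(k, 2.0, 0.3) for k in range(60)), 6))
    print(round(sum(hurdle_pmf(k, 2.0, 0.5) for k in range(60)), 6))
```

For the hypothetical counts, the variance is roughly 3.9 times the mean, and half the observations are zero against about 19% expected under a Poisson distribution with the same mean. That is the pattern the table associates with negative binomial, zero-inflated, or hurdle models rather than plain Poisson regression. In a real analysis the zero and count parts would be fitted with covariates instead of being fixed constants.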