Table 1.

Comparison of Regression-based Models

Model	Assumptions	Benefits	Drawbacks
Ordinary Least Squares (OLS) Regression	• Outcome variable normally distributed • Independence • Homoscedasticity • Continuous outcomes	• Familiar to most researchers • Relatively easy to use • Can be used with continuous, non-count variables	• Normality and homoscedasticity assumptions are rarely met • Violations of normality and homoscedasticity can distort Type I and Type II error rates and reduce power • Affected by outliers
OLS-Transformed	• Outcome variable normally distributed • Independence • Homoscedasticity • Continuous outcomes	• Familiar to most researchers • Relatively easy to use • Can be used with continuous, non-count variables	• Transformation does not restore normality and homoscedasticity in all cases • Outliers can remain after transforming data • Difficult to interpret results due to change in scale
Logistic Regression	• Dichotomous outcomes • Independence	• Predicted probabilities are constrained to the valid range of 0 to 1 • Not affected by outliers	• Only appropriate for dichotomous outcomes (or those recoded to be dichotomous) • Recoding variables into dichotomous outcomes may inflate the Type II error rate • Sample size must be large when outcomes are infrequent
Poisson Regression	• Outcome assumed to be distributed as a Poisson random variable • Variance assumed to be equal to the mean • Count outcomes (non-negative integers)	• Can be used with highly skewed distributions • Appropriate for count data • Appropriate when the mean count is a small value	• Selecting a Poisson model when the data are over-dispersed can inflate the Type I error rate • May not be appropriate when the data contain a large number of zeros • Affected by outliers
Negative Binomial Regression	• Allows for independent specification of the mean and variance • Count outcomes (non-negative integers)	• Can be used with highly skewed distributions • May be advantageous when outcomes are over-dispersed	• May not be appropriate when the data contain a large number of zeros • Affected by outliers
Zero-inflated Regression	• Assumes a logistic regression model for the zero vs. non-zero portion of the outcome • Assumes a Poisson or negative binomial distribution for the count portion of the model	• May be most successful in evaluating outcomes when there is a preponderance of zeros • Able to maintain adequate power and Type I error control even when normality and homoscedasticity assumptions are not met • Can be used with highly skewed data	• Requires a larger sample to achieve adequate power • Affected by outliers
Hurdle Regression	• All zeros are structural zeros (i.e., true zeros) • Assumes separate processes for zero and non-zero counts	• Appropriate when the zero portion of the model and the count portion of the model are considered to arise from discrete processes • Able to maintain adequate power and Type I error control even when normality and homoscedasticity assumptions are not met • Can be used with highly skewed data • Relatively easy to interpret	• Requires a larger sample to achieve adequate power • Affected by outliers
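
Two of the distinctions in Table 1 can be checked directly from the data before a model is chosen: whether the variance is close to the mean (the Poisson assumption) and whether there are more zeros than a Poisson distribution with the same mean would produce. The following is a minimal sketch of such a check, not a procedure taken from the table. The function name `count_diagnostics` and the example data are hypothetical, and the table gives no numeric cut-offs, so none are applied here.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Summarise features of a count outcome that bear on model choice.

    Hypothetical helper for illustration only; it fits no model.
    """
    n = len(counts)
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / n
    # Probability of a zero under a Poisson distribution with mean m.
    poisson_zero = math.exp(-m)
    return {
        "mean": m,
        "variance": v,
        # A ratio well above 1 suggests over-dispersion relative to Poisson.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
        "observed_zero_prop": observed_zero,
        "poisson_zero_prop": poisson_zero,
    }


# Hypothetical, made-up counts (not data from any study).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 0, 9, 0, 1, 0, 0, 12]
diagnostics = count_diagnostics(counts)
for name, value in diagnostics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the variance is several times the mean and the observed share of zeros is far above the Poisson expectation, which is the pattern the table associates with negative binomial, zero-inflated, or hurdle models rather than Poisson regression. These summaries are descriptive only; the choice between a zero-inflated and a hurdle model still depends on whether the zeros are thought to be structural.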