Table 1.
Model | Assumptions | Benefits | Drawbacks |
---|---|---|---|
Ordinary Least Squares (OLS) Regression | • Outcome variable normally distributed • Independence • Homoscedasticity • Continuous outcomes | • Familiar to most researchers • Relatively easy to use • Can be used with continuous, non-count variables | • Normality and homoscedasticity assumptions are rarely met • Violations of normality and homoscedasticity can distort Type I and Type II error rates and reduce power • Affected by outliers |
OLS-Transformed | • Outcome variable normally distributed • Independence • Homoscedasticity • Continuous outcomes | • Familiar to most researchers • Relatively easy to use • Can be used with continuous, non-count variables | • Transformation does not restore normality and homoscedasticity in all cases • Outliers can remain after transforming data • Results are difficult to interpret because of the change in scale |
Logistic Regression | • Dichotomous outcomes • Independence | • Predicted probabilities always fall between 0 and 1 • Not affected by outliers | • Only appropriate for dichotomous outcomes (or those recoded to be dichotomous) • Recoding variables into dichotomous outcomes may inflate Type II error • Sample size must be large when outcomes are infrequent |
Poisson Regression | • Outcome assumed to be distributed as a Poisson random variable • Assumes variance is equal to the mean • Count outcomes | • Can be used with highly skewed distributions • Appropriate for count data • Appropriate when the mean count is a small value | • Selecting a Poisson model when the data are overdispersed can inflate Type I error • May not be appropriate for a large number of zeros • Affected by outliers |
Negative Binomial Regression | • Allows for independent specification of the mean and variance • Count outcomes | • Can be used with highly skewed distributions • May be advantageous when outcomes are overdispersed | • May not be appropriate for a large number of zeros • Affected by outliers |
Zero-inflated Regression | • Assumes a logistic regression model for the zero vs. non-zero portion of the outcome • Assumes a Poisson or negative binomial distribution for the count portion of the model | • May be most successful in evaluating outcomes when there is a preponderance of zeros • Able to maintain adequate power and Type I error control even when normality and homoscedasticity assumptions are not met • Can be used with highly skewed data | • Requires more power • Affected by outliers |
Hurdle Regression | • All zeros are structural zeros (i.e., true zeros) • Assumes separate processes for zero and non-zero counts | • Appropriate when the zero portion of the model and the count portion of the model are considered to arise from discrete processes • Able to maintain adequate power and Type I error control even when normality and homoscedasticity assumptions are not met • Can be used with highly skewed data • Relatively easy to interpret | • Requires more power • Affected by outliers |
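
Two of the assumptions listed in Table 1 can be inspected descriptively before a count model is chosen: whether the variance is close to the mean (the Poisson assumption) and whether there are more zeros than a Poisson distribution with the same mean would produce. The following is a minimal sketch of such a check, not a formal test. The function name `count_diagnostics` and the example data are hypothetical and introduced here only for illustration.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Return simple descriptive checks for a count outcome.

    Hypothetical helper for illustration only; it describes the data
    and does not select a model or test a hypothesis.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    expected_zero = math.exp(-m)  # P(Y = 0) for a Poisson variable with mean m
    return {
        "mean": m,
        "variance": v,
        # A ratio well above 1 suggests overdispersion relative to Poisson.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
        "observed_zero_fraction": observed_zero,
        "poisson_expected_zero_fraction": expected_zero,
    }


# Made-up example data: many zeros plus one large count.
example = [0, 0, 0, 0, 0, 0, 1, 2, 3, 14]
for name, value in count_diagnostics(example).items():
    print(f"{name}: {value:.3f}")
```

A dispersion ratio far above 1 points toward the negative binomial row of the table, and an observed zero fraction far above the Poisson-expected fraction points toward the zero-inflated or hurdle rows. Both are only rough indications, and the table's other considerations (outliers, sample size, interpretability) still apply.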