Evaluating the Performance of Zero-Inflated and Hurdle Poisson Models for Modeling Overdispersion in Count Data

Aswi Aswi, Sri Ayu Astuti, Sudarmin Sudarmin

Abstract


Poisson regression is commonly used to model count data. The Poisson model assumes equidispersion, that is, the mean equals the variance, and this assumption is often violated. In count data, overdispersion (variance larger than the mean) frequently arises from excess zeros in the response variable. Zero-inflated Poisson (ZIP) and Hurdle models are commonly used to fit data with excess zeros. Although some studies have compared the ZIP and Hurdle models, their results are inconsistent. This paper evaluates the performance of the ZIP and Hurdle Poisson models for overdispersed count data through both a simulation study and a real-data application. Data were simulated under three sample sizes, six means, and three probabilities of zero, with 500 replications. Model fit was compared using the Akaike Information Criterion (AIC). Overall, the ZIP model performed about as well as or better than the Hurdle Poisson model across scenarios, and both the ZIP and Hurdle models fit overdispersed count data better than the standard Poisson model.
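The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes intercept-only models (no covariates), a single simulated data set rather than 500 replications, and illustrative values for the sample size, Poisson mean, and zero probability. All function names (`simulate_zip`, `fit_poisson`, `fit_zip`, `fit_hurdle`) are hypothetical helpers written for this example using only the Python standard library.

```python
import math
import random

def rpois(lam, rng):
    # Knuth's multiplication method; adequate for small means.
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_zip(n, lam, pi_zero, seed=0):
    # With probability pi_zero emit a structural zero, else a Poisson draw.
    rng = random.Random(seed)
    return [0 if rng.random() < pi_zero else rpois(lam, rng) for _ in range(n)]

def poisson_loglik(y, lam):
    return sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in y)

def fit_poisson(y):
    lam = sum(y) / len(y)                      # MLE is the sample mean
    return {"lambda": lam, "aic": 2 * 1 - 2 * poisson_loglik(y, lam)}

def fit_zip(y, iters=2000):
    # EM for the intercept-only ZIP model (assumes zeros and positives exist).
    n, n0, total = len(y), sum(1 for v in y if v == 0), sum(y)
    pi, lam = 0.5, total / n
    for _ in range(iters):
        w = pi / (pi + (1 - pi) * math.exp(-lam))  # P(structural zero | y = 0)
        pi = n0 * w / n
        lam = total / (n - n0 * w)
    pos = [v for v in y if v > 0]
    ll = n0 * math.log(pi + (1 - pi) * math.exp(-lam))
    ll += len(pos) * math.log(1 - pi) + poisson_loglik(pos, lam)
    return {"pi": pi, "lambda": lam, "aic": 2 * 2 - 2 * ll}

def fit_hurdle(y):
    # Binary part: P(y = 0). Count part: zero-truncated Poisson on y > 0.
    pos = [v for v in y if v > 0]
    n0, mean_pos = len(y) - len(pos), sum(pos) / len(pos)
    p0 = n0 / len(y)
    # Solve lam / (1 - exp(-lam)) = mean_pos by bisection (needs mean_pos > 1).
    lo, hi = 1e-9, mean_pos
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean_pos:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    ll = n0 * math.log(p0) + len(pos) * math.log(1 - p0)
    ll += poisson_loglik(pos, lam) - len(pos) * math.log(1 - math.exp(-lam))
    return {"p_zero": p0, "lambda": lam, "aic": 2 * 2 - 2 * ll}

# Illustrative scenario (values are assumptions, not taken from the paper).
y = simulate_zip(n=500, lam=2.0, pi_zero=0.3, seed=1)
aic = {"poisson": fit_poisson(y)["aic"],
       "zip": fit_zip(y)["aic"],
       "hurdle": fit_hurdle(y)["aic"]}
for name, value in aic.items():
    print(f"{name:8s} AIC = {value:.2f}")
```

In this intercept-only setting the ZIP and hurdle models are reparameterizations of each other whenever the data contain more zeros than the fitted Poisson component predicts, so their maximized likelihoods and AIC values coincide. Differences between the two models can only appear once covariates enter the zero and count parts or when the data are zero-deflated, so this sketch should not be read as reproducing the paper's simulation design.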

Keywords


Hurdle Poisson; Overdispersion; Zero-inflated Poisson (ZIP)


DOI: http://dx.doi.org/10.12962/j27213862.v5i1.12422

Creative Commons License
Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/inferensi.

ISSN:  0216-308X

e-ISSN: 2721-3862
