On the Partial-Geometric Distribution: Properties and Applications

Krisada Khruachalee; Winai Bodhisuwan; Andrei Volodin

2022 Mar 28;42(13):3141–3149. doi: 10.1134/S1995080222010103

PMCID: PMC8961498

Abstract

In this article we introduce a new two-parameter partial-geometric (PG) distribution that contains both the geometric and first success distributions as particular cases. Some probability and statistical properties of the proposed distribution are discussed, including the probability mass function, mean, variance, moment generating function, and probability generating function. We propose the method of maximum likelihood for estimating the model's parameters, and apply the PG distribution to two real datasets to illustrate its flexibility. We found that the PG distribution is more flexible than the geometric distribution in the sense that it can also be applied to under-dispersed data. For these two datasets, the PG distribution also performs well under a goodness of fit test and several other model selection criteria. Thus, the PG distribution can be applied as an alternative model for the analysis of discrete data.

Keywords: geometric distribution, first success distribution, maximum likelihood estimation method

INTRODUCTION

There is much interest in developing more flexible probability distributions; many generalized classes of distributions have been developed and applied to describe various natural phenomena [1]. To explain a natural phenomenon, researchers consider constructing a new generalized class of distributions, and decide whether the underlying distribution should be regarded as discrete, continuous, or of a mixed type. Discrete distributions are very useful in many applications, especially when the count phenomenon consists of non-negative integers. This happens when the number of occurrences of a discrete event is observed and examined in a specific area or period of time [2]. Examples include the number of trips per month that a person takes, the number of children a couple has, and the number of Prussian soldiers killed by horse kicks (a famous classical example related to the Poisson distribution); see [3–5]. In these situations, a continuous model is inappropriate to describe the count phenomenon. Accordingly, discrete models are as significant as continuous models.

Many researchers are interested in finding suitable probability distributions for situations where a number of trials or experiments must occur before a predetermined number of successes is reached. Examples include the number of bills that must be proposed to a legislature before 10 bills are passed and, more recently, the number of Thai citizens that must be tested for COVID-19 before 100 are confirmed to be infected.

In addition, the complications of comparing the probabilities of success for the Bernoulli trials that mostly arise in medical and biological investigations, acceptance sampling in quality control, and modeling demand for a product are considered. These real-life phenomena can be described by the geometric distribution. However, there are criticisms that the geometric and first success distributions are sometimes considered to be the same, when they are actually different.

Because the confusion between the geometric and first success distributions plays a very important role in this study, we introduce a new family of distributions called the partial-geometric (PG) distribution that contains both the geometric and first success distributions as particular cases. The idea of combining the geometric and first success distributions as members of one distribution family seems very natural, but we did not find it in the literature. The number of studies that propose modifications of the geometric distribution for various purposes is so large that we decided not to discuss them. The major advantage of the newly proposed PG distribution over the previously modified geometric distributions is its flexibility in applications to real-life data.

The remainder of the paper is organized as follows. In Section 2, we discuss the difference between geometric and first success distributions. The PG distribution and some of its mathematical properties are defined in Section 3. Then, the maximum likelihood estimates of the PG distribution parameters are discussed in Section 3.3. Finally, some practical applications of the proposed distribution are illustrated by a goodness of fit with two real datasets in Section 4.

MATERIALS AND METHODS

Based on the theoretical interpretation of the Bernoulli experiment, we note a confusion between two simple and basic distributions, the geometric and the first success distributions. In some literature, these two distributions are considered the same. But, as mentioned in Gut (2009) [6], they are different and defined in the following way.

Let $0 < p < 1$ and $q = 1 - p$. A random variable $X$ has a Geometric distribution with parameter $p$, denoted by $X \in \mathrm{Ge}(p)$, if its probability mass function (pmf) is

$$P(X = k) = p q^{k}, \quad k = 0, 1, 2, \ldots$$

A random variable $X$ has a First Success distribution with parameter $p$, denoted by $X \in \mathrm{Fs}(p)$, if its pmf is

$$P(X = k) = p q^{k-1}, \quad k = 1, 2, 3, \ldots$$

We can interpret the geometric distribution as the number of failures in Bernoulli experiments before the first success, while the first success distribution is the number of trials in Bernoulli experiments needed to reach the first success.

Referring to the properties of the probability generating function (pgf), the pgf of the geometric distribution $X \in \mathrm{Ge}(p)$ can be calculated in the following way:

$$g_X(t) = E(t^X) = \sum_{k=0}^{\infty} t^k p q^k = p \sum_{k=0}^{\infty} (qt)^k = \frac{p}{1 - qt},$$

where $|t| < 1/q$. In the same manner, the pgf of the first success distribution $X \in \mathrm{Fs}(p)$ can also be calculated as follows:

$$g_X(t) = E(t^X) = \sum_{k=1}^{\infty} t^k p q^{k-1} = pt \sum_{k=1}^{\infty} (qt)^{k-1} = \frac{pt}{1 - qt},$$

where again $|t| < 1/q$. Using the pgfs of the geometric and first success distributions, we present Proposition 1, which shows how the two distributions are connected.

Proposition 1. If a random variable $X \in \mathrm{Ge}(p)$, then the random variable $Y = X + 1$ follows the $\mathrm{Fs}(p)$ distribution. Similarly, if a random variable $X \in \mathrm{Fs}(p)$, then the random variable $Y = X - 1$ follows the $\mathrm{Ge}(p)$ distribution.

Proof. If a random variable $X \in \mathrm{Ge}(p)$, then the pgf of the random variable $Y = X + 1$ can be written

$$g_Y(t) = E(t^{X+1}) = t\, g_X(t) = \frac{pt}{1 - qt},$$

which is the pgf of the $\mathrm{Fs}(p)$ distribution. Since the pgf determines the distribution uniquely, $Y \in \mathrm{Fs}(p)$. The second part of the Proposition can be shown in a similar way. $\square$
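Proposition 1 can also be checked numerically. The sketch below assumes the standard pmfs $P(X = k) = p q^k$ for $\mathrm{Ge}(p)$ and $P(X = k) = p q^{k-1}$ for $\mathrm{Fs}(p)$; the function names `ge_pmf` and `fs_pmf` and the value $p = 0.3$ are illustrative choices, not taken from the paper.

```python
def ge_pmf(k: int, p: float) -> float:
    """Assumed Ge(p) pmf: p * q**k for k = 0, 1, 2, ... (failures before the first success)."""
    return p * (1.0 - p) ** k if k >= 0 else 0.0


def fs_pmf(k: int, p: float) -> float:
    """Assumed Fs(p) pmf: p * q**(k - 1) for k = 1, 2, ... (trials up to the first success)."""
    return p * (1.0 - p) ** (k - 1) if k >= 1 else 0.0


p = 0.3  # arbitrary illustrative value

# If X ~ Ge(p), then P(X + 1 = k) = P(X = k - 1), which should match the Fs(p) pmf at k.
max_gap = max(abs(ge_pmf(k - 1, p) - fs_pmf(k, p)) for k in range(0, 60))
print(f"largest pmf difference between X + 1 and Fs(p): {max_gap:.2e}")
```

The comparison covers the support point by point, so it illustrates the shift relation rather than proving it; the pgf argument above remains the actual proof.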

PARTIAL-GEOMETRIC DISTRIBUTION AND ITS MATHEMATICAL PROPERTIES

Partial-Geometric (PG) Distribution

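Adding an extra parameter $\theta$ leads us to propose the PG distribution. A random variable $X$ has a Partial-Geometric distribution with parameters $p$ and $\theta$, denoted by $X \in \mathrm{PG}(p, \theta)$, if its pmf is

$$P(X = k) = \begin{cases} 1 - \dfrac{\theta q}{p}, & k = 0, \\ \theta q^{k}, & k = 1, 2, 3, \ldots \end{cases}$$

where $0 < p < 1$, $q = 1 - p$ and $0 < \theta \le p/q$. Setting $\theta = p$ gives the $\mathrm{Ge}(p)$ distribution, and setting $\theta = p/q$ gives the $\mathrm{Fs}(p)$ distribution.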
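As a minimal sketch, the PG pmf can be coded directly. The form used here, $P(X = 0) = 1 - \theta q / p$ and $P(X = k) = \theta q^k$ for $k \ge 1$ with $0 \le \theta \le p/q$, is an assumption reconstructed to agree with the means and variances reported in Table 1; the function name `pg_pmf` is hypothetical.

```python
def pg_pmf(k: int, p: float, theta: float) -> float:
    """Assumed PG(p, theta) pmf: 1 - theta*q/p at k = 0 and theta * q**k for k >= 1."""
    q = 1.0 - p
    if not 0.0 < p < 1.0 or not 0.0 <= theta <= p / q:
        raise ValueError("need 0 < p < 1 and 0 <= theta <= p/q")
    if k < 0:
        return 0.0
    if k == 0:
        # Clamp tiny negative values caused by rounding when theta == p/q.
        return max(0.0, 1.0 - theta * q / p)
    return theta * q ** k


p = 0.4
q = 1.0 - p

# The probabilities should sum to one (the tail beyond k = 400 is negligible here).
total = sum(pg_pmf(k, p, 0.5) for k in range(400))

# theta = p should reproduce Ge(p); theta = p/q should reproduce Fs(p).
ge_gap = max(abs(pg_pmf(k, p, p) - p * q ** k) for k in range(60))
fs_gap = max(
    abs(pg_pmf(k, p, p / q) - (p * q ** (k - 1) if k >= 1 else 0.0))
    for k in range(60)
)
print(total, ge_gap, fs_gap)
```

Checking the two boundary choices of $\theta$ is a quick way to confirm that a candidate pmf really nests both classical distributions.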

To illustrate the appearance of the PG distribution, Figs. 1 and 2 show pmf plots of the PG distribution for various values of the parameters $p$ (0.25, 0.50 and 0.75) and $\theta$ ($p/(3q)$, $p/(2q)$, $2p/(3q)$ and $9p/(10q)$), where $q = 1 - p$. We found that the scale of the PG distribution changes with the parameter $p$, whereas its shape varies with the parameter $\theta$. The pmf decreases more rapidly as the parameter $p$ increases. In addition, the PG distribution is clearly a unimodal curve as $\theta$ increases towards $p/q$. According to Figs. 1 and 2, we conclude that the PG distribution is right skewed and unimodal.

Fig. 1. The pmf plots of the partial-geometric distribution for $p$ (0.25 and 0.50) and the values of $\theta$ listed in Table 1.

Fig. 2. The pmf plots of the partial-geometric distribution for $p$ (0.50 and 0.75) and the values of $\theta$ listed in Table 1.

To measure the dispersion of the partial-geometric (PG) distribution, the index of dispersion (ID), the ratio of variance to mean, is computed for the values of the parameters $p$ and $\theta$ used in Figs. 1 and 2. The value of ID indicates whether the distribution is over-dispersed ($\mathrm{ID} > 1$) or under-dispersed ($\mathrm{ID} < 1$) [7]. Table 1 illustrates that the partial-geometric (PG) distribution is more flexible than the geometric distribution in the sense that it can also be applied to under-dispersed data, whereas the geometric distribution is only suitable for over-dispersed data.

Table 1.

The mean, variance and index of dispersion (ID) values of the partial-geometric (PG) distribution for different values of $p$ and $\theta$

Figure	$p$	$\theta$	Mean	Variance	ID
1(a)	0.25	0.1111	1.3333	7.5556	5.6667
1(b)	0.25	0.1667	2.0000	10.0000	5.0000
1(c)	0.25	0.2222	2.6667	11.5556	4.3333
1(d)	0.25	0.3000	3.6000	12.2400	3.4000
1(e)	0.50	0.3333	0.6667	1.5556	2.3333
1(f)	0.50	0.5000	1.0000	2.0000	2.0000
2(a)	0.50	0.6667	1.3333	2.2222	1.6667
2(b)	0.50	0.9000	1.8000	2.1600	1.2000
2(c)	0.75	1.0000	0.4444	0.5432	1.2222
2(d)	0.75	1.5000	0.6667	0.6667	1.0000
2(e)	0.75	2.0000	0.8889	0.6914	0.7778
2(f)	0.75	2.7000	1.2000	0.5600	0.4667
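The rows of Table 1 can be reproduced from closed-form moments. The sketch below assumes $E(X) = \theta q / p^2$ and $\mathrm{Var}(X) = \theta q (1 + q)/p^3 - \theta^2 q^2 / p^4$, which follow from a pmf of the form $P(X = k) = \theta q^k$ for $k \ge 1$; the helper `pg_moments` is a hypothetical name. The inputs are the rounded $\theta$ values printed in the table, so agreement is only expected to about three decimals.

```python
def pg_moments(p: float, theta: float) -> tuple[float, float, float]:
    """Return (mean, variance, index of dispersion) under the assumed PG moments."""
    q = 1.0 - p
    mean = theta * q / p ** 2
    second_moment = theta * q * (1.0 + q) / p ** 3
    variance = second_moment - mean ** 2
    return mean, variance, variance / mean


# (figure panel, p, theta, mean, variance, ID) copied from Table 1
TABLE_1 = [
    ("1(a)", 0.25, 0.1111, 1.3333, 7.5556, 5.6667),
    ("1(b)", 0.25, 0.1667, 2.0000, 10.0000, 5.0000),
    ("1(c)", 0.25, 0.2222, 2.6667, 11.5556, 4.3333),
    ("1(d)", 0.25, 0.3000, 3.6000, 12.2400, 3.4000),
    ("1(e)", 0.50, 0.3333, 0.6667, 1.5556, 2.3333),
    ("1(f)", 0.50, 0.5000, 1.0000, 2.0000, 2.0000),
    ("2(a)", 0.50, 0.6667, 1.3333, 2.2222, 1.6667),
    ("2(b)", 0.50, 0.9000, 1.8000, 2.1600, 1.2000),
    ("2(c)", 0.75, 1.0000, 0.4444, 0.5432, 1.2222),
    ("2(d)", 0.75, 1.5000, 0.6667, 0.6667, 1.0000),
    ("2(e)", 0.75, 2.0000, 0.8889, 0.6914, 0.7778),
    ("2(f)", 0.75, 2.7000, 1.2000, 0.5600, 0.4667),
]

max_err = 0.0
for label, p, theta, mean_t, var_t, id_t in TABLE_1:
    mean, var, disp = pg_moments(p, theta)
    max_err = max(max_err, abs(mean - mean_t), abs(var - var_t), abs(disp - id_t))
    print(f"{label}: mean={mean:.4f} var={var:.4f} ID={disp:.4f}")

print(f"largest absolute deviation from Table 1: {max_err:.1e}")
```

Under these formulas the index of dispersion simplifies to $(1 + q)/p - \theta q / p^2$, which decreases in $\theta$; this is why the rows with the largest $\theta$ for $p = 0.75$ fall below 1.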


Probability Properties

Some probability properties of the PG distributions, especially the mean, variance, moment generating function (mgf), and pgf are provided in this section.

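Theorem 1. Let $X \in \mathrm{PG}(p, \theta)$, then the mean of $X$ is

$$E(X) = \frac{\theta q}{p^{2}}, \quad q = 1 - p.$$

Proof. The expectation of the PG distribution can be obtained from

$$E(X) = \sum_{k=1}^{\infty} k\, \theta q^{k} = \theta q \sum_{k=1}^{\infty} k q^{k-1}.$$

Since $\sum_{k=1}^{\infty} k q^{k-1} = 1/(1 - q)^{2} = 1/p^{2}$ is the derivative of the geometric series, the expectation will be

$$E(X) = \frac{\theta q}{p^{2}}.$$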

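Theorem 2. Let $X \in \mathrm{PG}(p, \theta)$, then the variance of $X$ is

$$\mathrm{Var}(X) = \frac{\theta q (1 + q)}{p^{3}} - \frac{\theta^{2} q^{2}}{p^{4}}.$$

Proof. The variance of the PG distribution can be obtained from

$$\mathrm{Var}(X) = E(X^{2}) - \left(E(X)\right)^{2}.$$

With the expectation definition,

$$E(X^{2}) = \sum_{k=1}^{\infty} k^{2}\, \theta q^{k} = \theta \sum_{k=1}^{\infty} k^{2} q^{k}.$$

Since $\sum_{k=1}^{\infty} k^{2} q^{k} = q(1 + q)/(1 - q)^{3}$ is a geometric power series, $E(X^{2})$ will be equal to $\theta q (1 + q)/p^{3}$. Therefore,

$$\mathrm{Var}(X) = \frac{\theta q (1 + q)}{p^{3}} - \left(\frac{\theta q}{p^{2}}\right)^{2} = \frac{\theta q (1 + q)}{p^{3}} - \frac{\theta^{2} q^{2}}{p^{4}}.$$

$\square$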

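Theorem 3. Let $X \in \mathrm{PG}(p, \theta)$, then the mgf of $X$ is $\psi_X(t) = 1 - \dfrac{\theta q}{p} + \dfrac{\theta q e^{t}}{1 - q e^{t}}$ where $t < -\ln q$.

Proof. The mgf of the PG distribution can be obtained from

$$\psi_X(t) = E(e^{tX}) = 1 - \frac{\theta q}{p} + \theta \sum_{k=1}^{\infty} (q e^{t})^{k}.$$

Since $\sum_{k=1}^{\infty} (q e^{t})^{k}$ is the geometric series, the mgf will be equal to

$$\psi_X(t) = 1 - \frac{\theta q}{p} + \frac{\theta q e^{t}}{1 - q e^{t}},$$

where $t < -\ln q$.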

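Theorem 4. Let $X \in \mathrm{PG}(p, \theta)$, then the pgf of $X$ is $g_X(t) = 1 - \dfrac{\theta q}{p} + \dfrac{\theta q t}{1 - q t}$ where $|t| < 1/q$.

Proof. The pgf of the PG distribution can be obtained from

$$g_X(t) = E(t^{X}) = 1 - \frac{\theta q}{p} + \theta \sum_{k=1}^{\infty} (q t)^{k}.$$

Since $\sum_{k=1}^{\infty} (q t)^{k}$ is the geometric series, the pgf will be equal to

$$g_X(t) = 1 - \frac{\theta q}{p} + \frac{\theta q t}{1 - q t},$$

where $|t| < 1/q$.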
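The generating functions lend themselves to a quick numerical sanity check. The sketch below assumes the closed forms $g_X(t) = 1 - \theta q/p + \theta q t/(1 - q t)$ for the pgf and $\psi_X(t) = g_X(e^t)$ for the mgf, which follow from a pmf with $P(X = k) = \theta q^k$ for $k \ge 1$; the function names and the parameter values are illustrative assumptions.

```python
import math


def pg_pgf(t: float, p: float, theta: float) -> float:
    """Assumed PG pgf, valid for |t| < 1/q."""
    q = 1.0 - p
    if abs(t) >= 1.0 / q:
        raise ValueError("pgf requires |t| < 1/q")
    return 1.0 - theta * q / p + theta * q * t / (1.0 - q * t)


def pg_mgf(t: float, p: float, theta: float) -> float:
    """Assumed PG mgf, obtained by evaluating the pgf at exp(t)."""
    return pg_pgf(math.exp(t), p, theta)


p, theta = 0.5, 0.9  # the parameter pair of panel 2(b) in Table 1
q = 1.0 - p

# Compare the closed form with a truncated version of sum_k t**k * P(X = k).
t = 0.8
series = (1.0 - theta * q / p) + sum(theta * (q * t) ** k for k in range(1, 200))
closed = pg_pgf(t, p, theta)

# The derivative of the pgf at t = 1 equals E(X); approximate it by a central difference.
h = 1e-6
mean_fd = (pg_pgf(1.0 + h, p, theta) - pg_pgf(1.0 - h, p, theta)) / (2.0 * h)
mean_closed = theta * q / p ** 2

print(series, closed)
print(mean_fd, mean_closed)
```

Evaluating the pgf at $t = 1$ should return 1, and its slope there should match the mean from Theorem 1; both are cheap checks on any reconstructed generating function.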

Parameter Estimation

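We consider maximum likelihood estimation (MLE), which is the most commonly used method for parameter estimation. Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed random sample of size $n$ from the partial-geometric distribution, $\mathrm{PG}(p, \theta)$, and $x_1, x_2, \ldots, x_n$ be the observed sample values. For $k = 0, 1, 2, \ldots$ denote by $n_k$ the frequencies, that is, $n_k$ is the count of observations that are equal to $k$. Note that

$$\sum_{k=0}^{\infty} n_k = n, \qquad \sum_{k=1}^{\infty} k\, n_k = \sum_{i=1}^{n} x_i.$$

The likelihood function can be written as

$$L(p, \theta) = \left(1 - \frac{\theta q}{p}\right)^{n_0} \prod_{k=1}^{\infty} \left(\theta q^{k}\right)^{n_k} = \left(1 - \frac{\theta q}{p}\right)^{n_0} \theta^{\,n - n_0}\, q^{\sum_{k \ge 1} k n_k}, \quad q = 1 - p.$$

The log-likelihood function can be written as

$$\ell(p, \theta) = n_0 \ln\left(1 - \frac{\theta q}{p}\right) + (n - n_0) \ln \theta + \left(\sum_{k=1}^{\infty} k\, n_k\right) \ln q.$$

By taking partial derivatives with respect to the parameters, we obtain

$$\frac{\partial \ell}{\partial p} = \frac{n_0\, \theta}{p\,(p - \theta q)} - \frac{1}{q} \sum_{k=1}^{\infty} k\, n_k, \qquad \frac{\partial \ell}{\partial \theta} = \frac{n - n_0}{\theta} - \frac{n_0\, q}{p - \theta q}.$$

The maximum likelihood estimators are found by equating the partial derivatives to zero; that is

$$\frac{n_0\, \theta}{p\,(p - \theta q)} = \frac{1}{q} \sum_{k=1}^{\infty} k\, n_k, \qquad \frac{n - n_0}{\theta} = \frac{n_0\, q}{p - \theta q}.$$

By rewriting the second equation as $\theta q / p = (n - n_0)/n$ and substituting it into the first equation, we obtain the maximum likelihood estimates $\hat{p}$ and $\hat{\theta}$ of the PG distribution as

$$\hat{p} = \frac{n - n_0}{\sum_{i=1}^{n} x_i}, \qquad \hat{\theta} = \frac{\hat{p}}{1 - \hat{p}} \cdot \frac{n - n_0}{n}.$$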
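The estimation step can be sketched on the claim frequencies of Table 2. The closed forms used below, $\hat{p} = (n - n_0)/\sum_i x_i$ and $\hat{\theta} = \frac{\hat{p}}{1 - \hat{p}} \cdot \frac{n - n_0}{n}$, are an assumption: they are what the score equations give for a pmf with $P(X = 0) = 1 - \theta q/p$ and $P(X = k) = \theta q^k$, and may be written differently in the original. The names `pg_mle` and `pg_loglik` are hypothetical.

```python
import math

# Observed number of claims per policy (value -> frequency), copied from Table 2.
claims = {0: 7840, 1: 1317, 2: 239, 3: 42, 4: 14, 5: 4, 6: 4, 7: 1}


def pg_mle(freq: dict[int, int]) -> tuple[float, float]:
    """Closed-form estimates (p_hat, theta_hat) under the assumed PG pmf."""
    n = sum(freq.values())
    n0 = freq.get(0, 0)
    total = sum(k * nk for k, nk in freq.items())
    if total == 0 or n - n0 == total:
        raise ValueError("estimate lies on the boundary of the parameter space")
    p_hat = (n - n0) / total
    theta_hat = p_hat / (1.0 - p_hat) * (n - n0) / n
    return p_hat, theta_hat


def pg_loglik(freq: dict[int, int], p: float, theta: float) -> float:
    """Log-likelihood of grouped data under the assumed PG pmf."""
    q = 1.0 - p
    n0 = freq.get(0, 0)
    n_pos = sum(nk for k, nk in freq.items() if k >= 1)
    total = sum(k * nk for k, nk in freq.items())
    return n0 * math.log(1.0 - theta * q / p) + n_pos * math.log(theta) + total * math.log(q)


p_hat, theta_hat = pg_mle(claims)
best = pg_loglik(claims, p_hat, theta_hat)

# Small perturbations of the estimates should never increase the log-likelihood.
nearby = [
    pg_loglik(claims, p_hat * (1.0 + dp), theta_hat * (1.0 + dt))
    for dp in (-0.01, 0.01)
    for dt in (-0.01, 0.01)
]
print(f"p_hat={p_hat:.4f} theta_hat={theta_hat:.4f} loglik={best:.2f}")
```

Under this form the fitted zero probability equals the observed share of zeros and the fitted mean equals the sample mean, so the two parameters are tied to two easily interpreted summaries of the data.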

APPLICATIONS TO REAL LIFE DATA

To evaluate the performance of the PG distribution, we fit it to two real datasets and compare it with two competing distributions, the geometric and the Poisson. We do not consider the first success distribution as a competitor because the data contain zeros.

The first dataset is accident data giving the number of claims for 9,461 automobile insurance policies [10]. The second dataset is the number of hospital stays of persons aged 66 and over, for which there are 4,406 observations. These data were acquired from the National Medical Expenditure Survey, conducted in 1987 and 1988 to give a comprehensive picture of how Americans use and pay for health services [11].

The appropriate distribution for fitting these datasets is evaluated with the Anderson-Darling (AD) goodness of fit test for discrete data [12]. In addition, the discrete AD test is obtained by applying the dgof package [13] in the R language. Moreover, there are also other model selection criteria used to determine the best fit model: the minus log-likelihood (-LL), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The results of fitting different distributions to these datasets are recorded in Tables 2 and 3.
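The likelihood-based criteria can be sketched as follows for the claim frequencies of Table 2. The sketch uses the standard definitions $\mathrm{AIC} = 2k + 2(-\mathrm{LL})$ and $\mathrm{BIC} = k \ln n + 2(-\mathrm{LL})$ with $k$ the number of parameters, the usual closed-form MLEs for the geometric and Poisson models, and an assumed PG pmf with $P(X = 0) = 1 - \theta q/p$ and $P(X = k) = \theta q^k$. The function names are hypothetical, the numbers it prints are not the paper's reported values, and the discrete AD test is not reproduced here.

```python
import math

# Observed number of claims per policy (value -> frequency), copied from Table 2.
claims = {0: 7840, 1: 1317, 2: 239, 3: 42, 4: 14, 5: 4, 6: 4, 7: 1}


def summary(freq):
    """Return (n, n0, sum of observations) for grouped count data."""
    n = sum(freq.values())
    return n, freq.get(0, 0), sum(k * nk for k, nk in freq.items())


def nll_geometric(freq):
    n, _, total = summary(freq)
    p = n / (n + total)  # MLE for P(X = k) = p * q**k
    return -(n * math.log(p) + total * math.log(1.0 - p))


def nll_poisson(freq):
    n, _, total = summary(freq)
    lam = total / n  # MLE is the sample mean
    return -sum(nk * (k * math.log(lam) - lam - math.lgamma(k + 1)) for k, nk in freq.items())


def nll_pg(freq):
    n, n0, total = summary(freq)
    p = (n - n0) / total  # assumed closed-form PG estimates
    theta = p / (1.0 - p) * (n - n0) / n
    q = 1.0 - p
    return -(n0 * math.log(1.0 - theta * q / p) + (n - n0) * math.log(theta) + total * math.log(q))


def aic(nll, k):
    return 2.0 * k + 2.0 * nll


def bic(nll, k, n):
    return k * math.log(n) + 2.0 * nll


n = sum(claims.values())
results = {}
for name, nll, k in (
    ("Geometric", nll_geometric(claims), 1),
    ("Partial-Geometric", nll_pg(claims), 2),
    ("Poisson", nll_poisson(claims), 1),
):
    results[name] = (nll, aic(nll, k), bic(nll, k, n))
    print(f"{name:18s} -LL={nll:.2f} AIC={results[name][1]:.2f} BIC={results[name][2]:.2f}")
```

Because the geometric model is the special case $\theta = p$ of the PG model, the PG value of -LL can never exceed the geometric one; AIC and BIC then decide whether the extra parameter is worth its penalty.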

Table 2.

Estimated parameters for the number of claims of the automobile insurance policies

Number of claims	Observed frequency	Expected frequency (Geometric)	Expected frequency (Partial-Geometric)	Expected frequency (Poisson)
0	7840
1	1317
2	239
3	42
4	14
5	4
6	4
7	1
Estimates
-LL
AIC
BIC
AD test
$p$-value

Table 3.

Estimated parameters for the number of hospital stays

Number of hospital stays	Observed frequency	Expected frequency (Geometric)	Expected frequency (Partial-Geometric)	Expected frequency (Poisson)
0	3541
1	599
2	176
3	48
4	20
5	12
6	5
7	1
8	4
Estimates
-LL
AIC
BIC
AD test
$p$-value

The fitted distributions for the number of claims and the number of hospital stays presented in Tables 2 and 3 show that the PG distribution provides a good fit to the data: its $p$-value based on the discrete AD test statistic is the largest among the three distributions. Moreover, the PG distribution provides the lowest values of -LL, AIC and BIC, and its expected frequencies are the closest to the observed frequencies. Therefore, the most appropriate distribution among these three for the number of claims and the number of hospital stays is the PG distribution, followed by the geometric and Poisson distributions, respectively.

Figure 3 shows the fitted frequencies of the geometric, partial-geometric, and Poisson distributions together with the observed data for the number of claims of the automobile insurance policies and the number of hospital stays. It shows that the partial-geometric (PG) distribution fits these two datasets best among the three distributions. Thus, we conclude that the PG distribution is more flexible than the geometric and Poisson distributions.

Fig. 3 — The fitted frequency of the geometric, partial-geometric and Poisson distributions to real datasets.

DISCUSSION AND CONCLUSION

Confusion between the geometric and first success distributions led to the idea of developing a new family of distributions. By adding an extra parameter to an existing distribution to capture more of the variation in natural phenomena, we proposed the partial-geometric distribution, which contains both the geometric and first success distributions as particular cases. We found that the PG distribution is right skewed and unimodal. Moreover, it can also be applied to model under-dispersed data. We also derived some essential probability properties, namely the probability mass function, mean, variance, moment generating function, and probability generating function. The maximum likelihood estimation method is employed to estimate the parameters of the PG distribution. In the practical applications to two real datasets, the PG distribution provides the highest $p$-value for the discrete AD test and the lowest values of -LL, AIC and BIC. Therefore, the PG distribution is useful as an alternative to other distributions for the analysis of discrete data.

ACKNOWLEDGMENTS

The authors would like to thank the Department of Statistics, Faculty of Science, Kasetsart University.

FUNDING

The research of the author listed last was partially supported by the subsidy allocated to Kazan Federal University for the state assignment in the sphere of scientific activities, project 1.13556.2019/13.1.

Footnotes

(Submitted by A. M. Elizarov)

Contributor Information

Krisada Khruachalee, Email: krisada.khr@ku.th.

Winai Bodhisuwan, Email: fsciwnb@ku.ac.th.

Andrei Volodin, Email: andrei@uregina.ca.

REFERENCES

1. Alzaatreh A., Lee C., Famoye F. A new method for generating families of continuous distributions. METRON. 2013;71:63–79. doi: 10.1007/s40300-013-0007-y.
2. Johnson N. L., Kotz S. Distribution in Statistics: Discrete Distributions. New York: Houghton Mifflin; 1969.
3. Bortkiewicz L. V. Das Gesetz der kleinen Zahlen. Leipzig: G. Teubner; 1898.
4. Winsor C. P. Quotations: Das Gesetz der kleinen Zahlen. Human Biology. 1947;19:154–161.
5. Weaver W. Lady Luck: The Theory of Probability. London: Heinemann; 1964.
6. Gut A. An Intermediate Course in Probability. New York: Springer Science; 2009.
7. Chakraborty S., Chakravarty D. Discrete gamma distributions: Properties and parameter estimations. Commun. Stat. - Theory Method. 2012;41:3301–3324. doi: 10.1080/03610926.2011.563014.
8. Nash J. C. On best practice optimization methods in R. J. Stat. Software. 2014;60:1–14. doi: 10.18637/jss.v060.i02.
9. R Core Team, R: A Language and Environment for Statistical Computing (2020).
10. Klugman S. A., Panjer H. H., Willmot G. E. Loss Models: From Data to Decisions. Hoboken, NJ: Wiley; 2012.
11. Deb P., Trivedi P. K. Demand for medical care by the elderly: A finite mixture approach. J. Appl. Econometr. 1997;12:313–336. doi: 10.1002/(SICI)1099-1255(199705)12:3<313::AID-JAE440>3.0.CO;2-G.
12. Choulakian V., Lockhart R. A., Stephens M. A. Cramer-von Mises statistics for discrete distributions. Canad. J. Stat. 1994;22:125–137. doi: 10.2307/3315828.
13. Arnold T. B., Emerson J. W. Nonparametric goodness-of-fit tests for discrete null distributions. R J. 2011;3:34–39. doi: 10.32614/RJ-2011-016.
