Abstract
The beta model is the most important distribution for fitting data on the unit interval. However, the beta distribution is not suitable for modelling bimodal unit-interval data. In this paper, we propose a bimodal beta distribution constructed using an approach based on the alpha-skew-normal model. We discuss several properties of this distribution, such as bimodality, real moments, entropies and identifiability. Furthermore, we propose a new regression model based on the proposed distribution and discuss residuals. Estimation is performed by maximum likelihood. A Monte Carlo experiment is conducted to evaluate the finite-sample performance of the estimators, and the results are discussed. An application illustrates the modelling ability of the proposed distribution when the data exhibit bimodality.
Keywords: Bimodal model, bimodality, bounded data, beta distribution, maximum likelihood, regression model
1. Introduction
The need to model and analyse bimodal bounded data, especially data on the unit interval, arises in many fields, such as bioinformatics [12], image classification [16] and transactions at a car dealership [26]. In such situations, probabilistic modelling under a parametric paradigm requires probability distributions supported on the unit interval. The unimodal beta model is the most widely used model in the literature for describing data on the unit interval, especially because of its flexibility and useful properties [13]. However, despite its broad applicability in many fields, the beta distribution is not suitable for modelling bimodal data on the unit interval.
In general, one uses mixtures of distributions to describe bimodal data. For example, the studies [26] and [25] consider finite mixtures of beta regression models to analyse priming effects in judgements of imprecise probabilities. However, mixtures of distributions may suffer from identifiability problems in parameter estimation; see Refs. [14,15]. Thus, new mixture-free models that can accommodate both unimodal and bimodal data are very important. Phenomena can show bimodality for many reasons, such as economic policies and the uncertainty of social movements and their effects on the economy [28,30].
Since the structure of a phenomenon depends on many factors, it is reasonable to expect non-identically distributed observations in the data collected by the experimenter. The bimodal distributions introduced by Elal-Olivero [8], Domma et al. [6] and Vila and Çankaya [28] provide probabilistic models for fitting such non-identically distributed or mixed data sets efficiently. For a mixed data set, a bimodal form of the beta or Weibull distribution, as in Vila and Çankaya [28], is useful because the mixing proportions of a finite mixture cannot always be estimated accurately in the bimodal case. The analytical expression of a mixture can also cause problems when the likelihood is maximized jointly over the parameters of the component models (beta, Weibull, etc.) and the mixing proportions of the separate populations Beta1 and Beta2; at the least, numerical errors may arise during computation. The underlying mechanism of a phenomenon may instead follow a single probabilistic model such as the bimodal beta (Bbeta) or the bimodal Weibull (BWeibull) of Vila and Çankaya [28]. Moreover, a mixture of two beta distributions, Beta1 and Beta2, has six parameters: the two shape parameters of each component and the two mixing proportions. The Bbeta distribution has only four parameters, α, β, ρ and δ, and therefore fewer parameters than the beta mixture. Further, since the cumulative distribution function of the Bbeta distribution has an exact expression, it is straightforward to generate artificial bimodal data sets, which can be used to check whether a data set can be modelled by the Bbeta distribution with the estimated parameters. In other words, results and outputs on the unit interval can be modelled and tested using the proposed distribution.
Variations of the beta model can be found in Ferrari and Cribari-Neto [9], Ospina and Ferrari [21], Bayes et al. [4] and Hahn [11], among others. However, none of the models cited above is suitable for capturing bimodality. Recently, probabilistic models for bimodality on the positive real line have been discussed by various authors. Olmos et al. [20] introduced a bimodal extension of the Birnbaum-Saunders distribution. Vila et al. [29] proposed the bimodal gamma distribution. Vila and Çankaya [28] considered a bimodal Weibull distribution. Martínez-Flórez et al. [17] proposed a family of bimodal distributions generated by distributions with positive support. Despite this, to the best of our knowledge, a specific parametric model for bimodal data observed on the unit interval has not been considered in the recent literature. Likewise, a bimodal regression model for unit-interval data with a regression structure for the parameters has never been considered in the literature. Martínez-Flórez et al. [18] considered a transformation of a random variable that follows a unit-bimodal Birnbaum-Saunders (UBBS for short) distribution only in the case of independent and identically distributed variables.
Based on the above discussion and motivated by the presence of bimodality in proportion responses, we develop a model for double-bounded response variables. In particular, we extended the usual beta distribution using a quadratic transformation technique used to generate bimodal functions [8,28]. The approach, therefore, appears to be a new development for the literature. We discuss several properties of the proposed model, such as bimodality, real moments, hazard rate, entropies and identifiability. Furthermore, we study the effects of the explanatory variables on the response variable using a regression model.
In what follows, we list some of the main contributions and advantages of the proposed model.
- We introduce a new family of distributions that is a flexible version of the usual beta distribution, so that it is capable of fitting bimodal as well as unimodal data. We provide general properties of the proposed model;
- We propose an extended version of the quadratic transformation technique used to generate bimodal functions.
The rest of the article proceeds as follows. In Sections 2 and 3, we present the new distribution and derive some of its properties. Then, in Section 4, we present further properties of the bimodal beta, which include entropies, stochastic representation and identifiability. Section 5 presents the bimodal beta regression model, together with the estimation method for the model parameters and diagnostic measures. In Section 6, some numerical results on the estimators and on the empirical distribution of the residuals are presented and discussed. A real-life application related to the proportion of votes that Jair Bolsonaro received in the second round of the 2018 Brazilian elections is analysed in Section 7. Section 8 summarizes the main findings of the paper.
2. The bimodal beta distribution
In this section, the bimodal beta (Bbeta) distribution is introduced and its density is derived. Moreover, some results on the bimodality properties are obtained. We say that a random variable (r.v.) X has a Bbeta distribution with parameter vector θ = (α, β, ρ, δ), α, β > 0, ρ ≥ 0 and δ ∈ ℝ, denoted by X ∼ Bbeta(θ), if its probability density function (PDF) is given by
$$f(x;\theta) = \frac{\rho + (1-\delta x)^2}{Z(\theta)\, B(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0<x<1, \tag{1}$$
where
$$Z(\theta) = 1 + \rho - \frac{2\delta\alpha}{\alpha+\beta} + \frac{\delta^2\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} \tag{2}$$
denotes the normalization constant and $B(\alpha,\beta)$ is the beta function. When α = β = 1, ρ = 0 and δ = 2, we have the U-quadratic distribution on (0, 1). When δ = 0, we obtain the classical beta distribution with parameter vector (α, β). The parameters α, β (which appear as exponents of the r.v.) and ρ control the shape of the distribution. The uni- or bimodality is controlled by the parameter δ. Note that for α, β and δ fixed, the parameter ρ also controls the unimodality or bimodality of the distribution; see Subsection 2.1. Figure 1 shows some different shapes of the Bbeta PDF for different combinations of parameters. Figure 1(a) shows the L shape and its bimodal form, and Figure 1(b) shows the bell-shaped case of the beta distribution.
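The definition can be checked numerically. The following is a minimal Python sketch, assuming the Bbeta density is proportional to [ρ + (1 − δx)²] x^(α−1) (1 − x)^(β−1), so that the normalization constant is the expectation of ρ + (1 − δY)² for Y following a classical beta distribution. The function names `beta_fn`, `norm_const` and `bbeta_pdf` are illustrative and do not come from the paper.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def norm_const(alpha, beta, rho, delta):
    """Z = E[rho + (1 - delta*Y)^2] for Y ~ Beta(alpha, beta) (assumed form)."""
    s = alpha + beta
    m1 = alpha / s                              # E[Y]
    m2 = alpha * (alpha + 1) / (s * (s + 1))    # E[Y^2]
    return 1 + rho - 2 * delta * m1 + delta ** 2 * m2


def bbeta_pdf(x, alpha, beta, rho, delta):
    """Assumed Bbeta density; returns 0 outside the open unit interval."""
    if not 0.0 < x < 1.0:
        return 0.0
    beta_kernel = x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_fn(alpha, beta)
    weight = rho + (1 - delta * x) ** 2
    return weight * beta_kernel / norm_const(alpha, beta, rho, delta)


# Special cases mentioned in the text.
u_quadratic = bbeta_pdf(0.25, 1, 1, 0, 2)    # alpha = beta = 1, rho = 0, delta = 2
classical = bbeta_pdf(0.3, 2, 3, 0.7, 0)     # delta = 0 gives the Beta(2, 3) density
```

Under this assumed form, the first call agrees with the U-quadratic density 12(x − 1/2)² and the second with the Beta(2, 3) density, whatever the value of ρ.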
Figure 1.
The PDF of the bimodal beta distribution for different values of parameters. In the figure on the left, the PDF presents strict decreasing monotonicity or decreasing-increasing-decreasing shapes. In the figure on the right, the PDF shows symmetry and uni- or bimodality.
Unlike Figure 1(b), Figure 2(a,b) shows that one peak can be a major peak while the other is a minor peak.
Figure 2.
The PDF of the bimodal beta distribution for different values of parameters. Both figures present asymmetry and bimodality.
The asymptotic behaviour of the PDF (1) is as follows:
| (3) |
and
| (4) |
This asymptotic behaviour of the Bbeta PDF was expected, since the bimodal beta distribution is defined in terms of the classical beta. It is clear that when ; for and for .
If , the cumulative distribution function (CDF) (see Figure 3), the survival function (SF) and the hazard rate function (HR) of X are, respectively, given by
| (5) |
| (6) |
| (7) |
where is the incomplete beta function ratio, is the incomplete beta function, and , , . For more details on the derivation of these formulas, see Section 3.
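A minimal Python sketch of the CDF, SF and HR, assuming the Bbeta density is [ρ + (1 − δx)²] times a Beta(α, β) density divided by the normalization constant. Expanding the square then gives a CDF that is a weighted sum of incomplete beta function ratios with first arguments α, α + 1 and α + 2. The helper names are illustrative, and the ratio is approximated here by Simpson's rule, which is adequate only when both shape parameters are at least one; a library routine for the regularized incomplete beta function would normally be preferred.

```python
import math


def beta_fn(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def reg_inc_beta(x, a, b, n=2000):
    """I_x(a, b) by composite Simpson's rule (n even); for a >= 1 and b >= 1 only."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    g = lambda t: t ** (a - 1) * (1 - t) ** (b - 1)
    total = g(0.0) + g(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3 / beta_fn(a, b)


def mixture_coeffs(alpha, beta, rho, delta):
    """Coefficients of I_x(alpha + k, beta), k = 0, 1, 2; their sum is Z."""
    s = alpha + beta
    return (1 + rho,
            -2 * delta * alpha / s,
            delta ** 2 * alpha * (alpha + 1) / (s * (s + 1)))


def bbeta_cdf(x, alpha, beta, rho, delta):
    c = mixture_coeffs(alpha, beta, rho, delta)
    num = sum(ck * reg_inc_beta(x, alpha + k, beta) for k, ck in enumerate(c))
    return num / sum(c)


def bbeta_sf(x, alpha, beta, rho, delta):
    return 1.0 - bbeta_cdf(x, alpha, beta, rho, delta)


def bbeta_hr(x, alpha, beta, rho, delta):
    """Hazard rate f(x) / S(x) for 0 < x < 1."""
    z = sum(mixture_coeffs(alpha, beta, rho, delta))
    pdf = ((rho + (1 - delta * x) ** 2) * x ** (alpha - 1) * (1 - x) ** (beta - 1)
           / (z * beta_fn(alpha, beta)))
    return pdf / bbeta_sf(x, alpha, beta, rho, delta)
```

In the symmetric case α = β = 2, δ = 2, the median is 1/2, which gives a quick sanity check of the sketch.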
Figure 3.
The CDF of the bimodal beta distribution for different values of parameters. Due to bimodality, as seen in the figure, it is natural to expect the CDF graph to present up to three inflection points.
2.1. Bimodality properties
To state the following result that guarantees the uni- or bimodality of the Bbeta distribution, we define the following cubic polynomial:
| (8) |
where , , and .
Theorem 2.1 Uni- or bimodality —
Let such that , and .
- (i)
If has a single positive zero then the Bbeta distribution is unimodal.
- (ii)
If has exactly three zeros in then the Bbeta distribution is bimodal.
Proof.
A simple computation shows that
(9) where is given in (8). Under the conditions stated in the theorem, we have , , and . By definition, the boundary points are never critical points, then we exclude the analysis at these points.
Since and because and , the Intermediate Value Theorem guarantees that there is at least one root in the interval . Further, by Descartes rule of signs (see, e.g. Refs. [31] and [10]), has one or three roots in the interval .
Assume that has a single zero. In this case, has a single critical point, denoted by . Since, for and , and , see limits in (3) and (4); it follows that increases on and decreases on . That is, is a global maximum point of . This proves Item (i).
On the other hand, if has exactly three zeros in then has three critical points and . Without loss of generality, let us assume that . Again, since, for and , and , it follows that increases on the intervals and , and decreases on and . In other words, and are two maximum points and is the unique minimum point. Then the statement in Item (ii) follows.
Thus we have completed the proof of the theorem.
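The computation behind (9) can be sketched as follows, assuming the density in (1) has the form $f(x;\theta) \propto [\rho + (1-\delta x)^2]\, x^{\alpha-1}(1-x)^{\beta-1}$. The coefficient labels $a_3, \ldots, a_0$ are chosen here for illustration and may differ from the notation used in (8). Differentiating $\log f$ gives

$$\frac{\partial}{\partial x}\log f(x;\theta) = -\frac{2\delta(1-\delta x)}{\rho + (1-\delta x)^2} + \frac{\alpha-1}{x} - \frac{\beta-1}{1-x},$$

and multiplying by $x(1-x)[\rho + (1-\delta x)^2] > 0$ shows that, on the open unit interval, the sign of $f'(x;\theta)$ is the sign of the cubic

$$a_3 x^3 + a_2 x^2 + a_1 x + a_0, \qquad
\begin{aligned}
a_3 &= -\delta^2(\alpha+\beta), & a_2 &= 2\delta(\alpha+\beta-1) + \delta^2(\alpha+1),\\
a_1 &= -2\delta\alpha - (1+\rho)(\alpha+\beta-2), & a_0 &= (1+\rho)(\alpha-1).
\end{aligned}$$

For $\alpha > 1$, $\beta > 1$ and $\delta > 0$ the coefficient signs are $(-,+,-,+)$, so Descartes' rule allows one or three positive roots. The cubic equals $a_0 > 0$ at $x = 0$ and $-(\beta-1)[\rho + (1-\delta)^2] < 0$ at $x = 1$, which is the sign change used with the Intermediate Value Theorem. With $\alpha = \beta = \delta = 2$ these expressions give $a_2 = 24$ and $a_0 = 1 + \rho$.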
Remark 2.1
By considering α, β, δ and ρ as in Table 1, it is clear that the conditions of Theorem 2.1 are satisfied. Then, depending on the number of roots of the polynomial in (8), Theorem 2.1 guarantees the uni- or bimodality (U or B) of the PDF. These results are compatible with Figure 1(b).
Again, using the parameter values of Figure 2(a,b), it can be verified that the conditions of Theorem 2.1 are satisfied, which allows us to conclude that the PDF is bimodal. This agrees with the shape of the PDF shown in Figure 2.
Table 1.
Roots of the polynomial in (8) and shapes of the bimodal beta PDF using the parameter values of Figure 1(b).
| α | β | δ | ρ | | | | | Real roots of the polynomial (8) in the unit interval | Shape |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 2 | 0.25 | | 24 | | 1.25 | | B |
| 2 | 2 | 2 | 1.5 | | 24 | | 2.5 | x = 0.5 | U |
| 2 | 2 | 2 | 0 | | 24 | | 1 | | B |
| 2 | 2 | 2 | 0.5 | | 24 | | 1.5 | | B |
| 2 | 2 | 2 | 2 | | 24 | | 3 | x = 0.5 | U |
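The classification in Table 1 can be reproduced numerically. The sketch below is a hedged illustration, not the authors' code: it assumes the density is proportional to [ρ + (1 − δx)²] x^(α−1) (1 − x)^(β−1), in which case the critical points in the unit interval are the zeros of a cubic whose coefficients are computed in `mode_polynomial` (a hypothetical name, as are the other helpers). Zeros are located by scanning a grid for sign changes and refining by bisection.

```python
def mode_polynomial(alpha, beta, rho, delta):
    """Coefficients (a3, a2, a1, a0) of the cubic whose zeros are critical points."""
    s = alpha + beta
    a3 = -delta ** 2 * s
    a2 = 2 * delta * (s - 1) + delta ** 2 * (alpha + 1)
    a1 = -2 * delta * alpha - (1 + rho) * (s - 2)
    a0 = (1 + rho) * (alpha - 1)
    return a3, a2, a1, a0


def roots_in_unit_interval(coeffs, n=10000, tol=1e-12):
    """Zeros in (0, 1), found by grid sign changes followed by bisection.

    Zeros at which the polynomial does not change sign may be missed.
    """
    a3, a2, a1, a0 = coeffs
    p = lambda x: ((a3 * x + a2) * x + a1) * x + a0
    roots = []
    lo, f_lo = 0.0, p(0.0)
    for i in range(1, n + 1):
        hi = i / n
        f_hi = p(hi)
        if f_hi == 0.0 and hi < 1.0:
            roots.append(hi)                    # grid point is an exact zero
        elif f_lo * f_hi < 0:
            a, b, f_a = lo, hi, f_lo
            while b - a > tol:
                mid = 0.5 * (a + b)
                f_mid = p(mid)
                if f_a * f_mid <= 0:
                    b = mid
                else:
                    a, f_a = mid, f_mid
            roots.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    return roots


def shape(alpha, beta, rho, delta):
    """'B' for three zeros in (0, 1), 'U' for one (alpha, beta > 1, delta > 0)."""
    k = len(roots_in_unit_interval(mode_polynomial(alpha, beta, rho, delta)))
    return {1: "U", 3: "B"}.get(k, "undetermined")


# Parameter rows of Table 1: alpha = beta = delta = 2 and varying rho.
shapes = [shape(2, 2, rho, 2) for rho in (0.25, 1.5, 0, 0.5, 2)]
```

For these rows the sketch returns the same sequence of shapes as the last column of Table 1, and the coefficients it computes include the values 24 and 1 + ρ that appear in the table.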
Theorem 2.2 Bimodality; case —
If , , , , and
(10) then the Bbeta distribution is bimodal with maximum points
and minimum point , where .
Proof.
Taking in (9), we have
A direct calculus shows that if and only if (excluding the boundary points) and
Hence, under condition (10), it follows that the equation has three roots and within the interval , where .
Since, for and , and , see limits in (3) and (4); the bimodality of the Bbeta distribution is guaranteed, where and are two maximum points and is the unique minimum point.
Remark 2.2
Let and . It is clear that the condition (10) is satisfied. Then, by Theorem 2.2, the Bbeta distribution is bimodal with maximum points and , and minimum point ; which is compatible with Figure 1(b).
Proposition 2.3
The Bbeta PDF is symmetric at the point whenever and .
Proof.
A simple algebraic manipulation shows that, if and then , . Then the proof follows.
Theorem 2.3
If , , and , then the Bbeta distribution is bimodal with maximum points
and minimum point , where . Moreover, the maximum values coincide, that is, .
Proof.
As a by-product of proof of the Theorem 2.1, we have if and only if x is a zero of polynomial defined in (8). Setting and in polynomial , we get if and only if
Note that the above polynomial can be written as Then, it is clear that and are critical points of , where . Note that the restriction guarantees that the discriminant of the quadratic polynomial implicit in is positive.
By using that and , and by following the same steps as in the final paragraph of proof of the Theorem 2.2, we guarantee bimodality of the Bbeta distribution.
Finally, the identity follows from Proposition 2.3.
Remark 2.4
By considering α, β, δ and ρ as in the Table 2, it is clear that the restriction is satisfied. Then, Theorem 2.3 guarantees the bimodality (B) of PDF with minimum point and points (and values) of maximum specified in this table. These results are compatible with Figures 1(b) and 2(a)–(b).
Table 2.
Modes, maximum values and shapes of the bimodal beta PDF using the parameter values in Figure 1(b) and Figure 2(a)–(b).
| α | β | δ | ρ | First mode | Second mode | Maximum value at first mode | Maximum value at second mode | Shape |
|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 2 | 0.25 | 0.19 | 0.81 | 1.30 | 1.30 | B |
| 2 | 2 | 2 | 0 | 0.15 | 0.85 | 1.87 | 1.87 | B |
| 2 | 2 | 2 | 0.5 | 0.25 | 0.75 | 1.21 | 1.21 | B |
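The entries of Table 2 can be checked by a direct grid search. The following Python sketch assumes the density is proportional to [ρ + (1 − δx)²] x^(α−1) (1 − x)^(β−1); `bbeta_pdf` and `grid_mode` are illustrative names. It searches for the maximum on each side of the point x = 1/2, which is the minimum point in this symmetric setting.

```python
import math


def bbeta_pdf(x, alpha, beta, rho, delta):
    """Assumed Bbeta density on the open unit interval."""
    s = alpha + beta
    z = 1 + rho - 2 * delta * alpha / s + delta ** 2 * alpha * (alpha + 1) / (s * (s + 1))
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(s)
    return ((rho + (1 - delta * x) ** 2) * x ** (alpha - 1) * (1 - x) ** (beta - 1)
            / (z * math.exp(log_b)))


def grid_mode(lo, hi, params, n=5000):
    """Grid argmax of the density on (lo, hi); returns (mode, density at mode)."""
    xs = [lo + (hi - lo) * i / n for i in range(1, n)]
    x_best = max(xs, key=lambda x: bbeta_pdf(x, *params))
    return x_best, bbeta_pdf(x_best, *params)


table2 = {}
for rho in (0.25, 0.0, 0.5):
    params = (2, 2, rho, 2)                      # alpha, beta, rho, delta
    left = grid_mode(0.0, 0.5, params)           # mode below x = 1/2
    right = grid_mode(0.5, 1.0, params)          # mode above x = 1/2
    table2[rho] = (left, right)
```

Each entry of `table2` holds the two (mode, maximum value) pairs for one row. Under the stated assumption they agree with the tabulated values to within rounding, and the two maximum values coincide, as Theorem 2.3 asserts.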
3. Moments
In this section, closed-form expressions for the truncated moments and real moments of the Bbeta distribution are obtained. Other properties, such as raw moments, the mean residual life function and the moment generating function, are analysed in Section I of the Supplementary Material.
Theorem 3.1
If then, for and ,
where , , , and is the incomplete beta function.
Proof.
By using definition of expectation and definition of Bbeta density, we have
Since the proof of the theorem follows.
Taking r = 0, a = 0 and b = x in Theorem 3.1, we get formula (5) for the CDF. Letting r = 0, a = x and b = 1 in Theorem 3.1, we get formula (6) for the SF. By combining formula (6) for the SF with the definition of the Bbeta density, we obtain formula (7) for the HR.
Taking r = 1, a = x and b = 1 in Theorem 3.1, we get a closed formula for the mean residual life function, see Corollary 1.1 of the Supplementary Material.
Corollary 3.1 Real moments —
If and , then
Proof.
By taking b = 1 and a = 0 in Theorem 3.1 we have the following:
where , and .
As a consequence of the above corollary, the closed expressions for the standardized moments, variance, skewness and kurtosis of the bimodal beta r.v. X are easily obtained.
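As an illustration of how such moments can be evaluated, here is a minimal Python sketch. It assumes the density is proportional to [ρ + (1 − δx)²] x^(α−1) (1 − x)^(β−1), so that E[X^r] is a weighted sum of the beta functions B(α + r + k, β), k = 0, 1, 2, divided by the same sum at r = 0. The function names, and the existence condition r > −α used in the check, are assumptions made for this sketch rather than statements taken from Corollary 3.1.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def _weighted_beta_sum(r, alpha, beta, rho, delta):
    """[(1+rho) B(a+r, b) - 2 delta B(a+r+1, b) + delta^2 B(a+r+2, b)] / B(a, b)."""
    ratio = lambda k: math.exp(log_beta(alpha + r + k, beta) - log_beta(alpha, beta))
    return (1 + rho) * ratio(0) - 2 * delta * ratio(1) + delta ** 2 * ratio(2)


def bbeta_moment(r, alpha, beta, rho, delta):
    """E[X^r] for real r > -alpha; the sum at r = 0 is the constant Z."""
    if r <= -alpha:
        raise ValueError("the moment is computed only for r > -alpha")
    return (_weighted_beta_sum(r, alpha, beta, rho, delta)
            / _weighted_beta_sum(0, alpha, beta, rho, delta))


def bbeta_variance(alpha, beta, rho, delta):
    m1 = bbeta_moment(1, alpha, beta, rho, delta)
    return bbeta_moment(2, alpha, beta, rho, delta) - m1 ** 2


mean_symmetric = bbeta_moment(1, 2, 2, 0.25, 2)   # symmetric case of Proposition 2.3
var_symmetric = bbeta_variance(2, 2, 0.25, 2)
```

Skewness and kurtosis follow in the same way from the third and fourth moments.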
4. Further properties
In this section, we consider some properties of the Bbeta distribution, such as stochastic representation and identifiability. For reasons of space, entropy measures such as Tsallis [27], quadratic [23] and Shannon [24] ones were studied in Section II of the Supplementary Material.
4.1. Stochastic representation
Let W be a discrete r.v. with the following probability function , k = 0, 1, 2, where
and is as in (2). Notice that .
Let's consider the following three r.v.'s: , and . Then we define a new r.v. X as follows:
| (11) |
where W is independent of , and .
Proposition 4.1 Stochastic representation for —
If X admits the form (11), then . Conversely, if then X is as in (11).
Proof.
Using the law of total probability and the definition of X, we get
where in the last line we used the independence of W with respect to variables , and . Since , k = 0, 1, 2, the above equality becomes
But, by (5), the right-hand side is equal to the CDF .
Then we have completed the proof.
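The representation suggests a simple sampler. The sketch below is an illustration under explicit assumptions that are not confirmed by the text available here: the three component variables are taken to be Beta(α + k, β) for k = 0, 1, 2, and the probabilities of W are taken to be the three terms of the normalization constant divided by their sum, which are non-negative when δ ≤ 0. All function names are hypothetical.

```python
import random


def mixture_weights(alpha, beta, rho, delta):
    """Assumed probabilities of W = 0, 1, 2 (non-negative when delta <= 0)."""
    s = alpha + beta
    c = (1 + rho,
         -2 * delta * alpha / s,
         delta ** 2 * alpha * (alpha + 1) / (s * (s + 1)))
    z = sum(c)                                   # normalization constant Z
    return tuple(ck / z for ck in c)


def sample_bbeta(n, alpha, beta, rho, delta, seed=None):
    """Draw n values by first drawing W and then the matching beta component."""
    if delta > 0:
        raise ValueError("this sampler needs delta <= 0 so that all weights are non-negative")
    rng = random.Random(seed)
    weights = mixture_weights(alpha, beta, rho, delta)
    draws = []
    for _ in range(n):
        k = rng.choices((0, 1, 2), weights=weights)[0]    # latent variable W
        draws.append(rng.betavariate(alpha + k, beta))    # X = Y_k given W = k
    return draws


weights = mixture_weights(2, 3, 0.5, -1)
sample = sample_bbeta(20000, 2, 3, 0.5, -1, seed=1)
```

For positive δ one of the weights is negative, so this two-stage scheme no longer applies and inversion of the CDF is the natural alternative.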
4.2. Identifiability
A simple observation shows that the bimodal beta PDF in (1), with parameter vector , can be written as a finite (generalized) mixture of three beta distributions with different shape parameters, i.e.
| (12) |
where , and are constants (that depends only on ) given in Proposition 4.1, and , 0<x<1, ( ), denotes the standard beta PDF.
Unlike Proposition 4.1, here δ can be non-negative. In principle, non-negative mixing weights are not necessary, since a mixture can be a PDF even if some of the weights are negative.
Let be the family of beta distributions, as follows:
Write as the class of all finite mixtures of . It is well-known that is not identifiable; see the main Theorem of Ahmad and Al-Hussaini [1]. Let be the class of all finite mixtures of with the restriction that the shape parameters β are pairwise different (that is, for ). As a consequence of the main result of Atienza et al. [3], it is a simple task to prove that the class is identifiable; see, e.g. Proposition 3.2.2 of de Alencar [5] or Proposition 1.2 in the Appendix of Alfaia [2].
The following result proves the identifiability of bimodal beta distribution.
Proposition 4.2
The mapping , where the β's are pairwise different, is one-to-one.
Proof.
Let us suppose that for all 0<x<1, where and . In other words, by (12),
where and , k = 0, 1, 2, are defined as in Proposition 4.1. Since is identifiable, we have , for k = 0, 1, 2, and , . Hence, from equalities , k = 0, 1, 2, immediately follows that and . Therefore, , and the proof follows.
5. Regression model, estimation and diagnostic analysis
Let be n independent random variables, where each , , follows the PDF given in (1). We assume that the parameters and satisfy the following functional relations:
| (13) |
where and are vectors of unknown regression coefficients which are assumed to be functionally independent, and , with p + q<n, and are the linear predictors, and and are observations on p and q known regressors, for . Furthermore, we assume that the covariate matrices and have rank p and q, respectively. The link functions and in (13) must be strictly monotone, positive and at least twice differentiable, such that and , with and being the inverse functions of and , respectively. There are several possible choices for the link functions and . For example, one can use the logarithmic specification , square root , or identity (with special attention to the sign of the estimates), j = 1, 2. In this paper, we consider the log link, , since it is the most used for positive parameters.
The log-likelihood function for based on a sample of n independent observations is given by
| (14) |
where and is as in (2).
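A minimal Python sketch of such a log-likelihood, assuming log links for both shape parameters and a density proportional to [ρ + (1 − δy)²] y^(α−1) (1 − y)^(β−1). The argument names `coef_alpha`, `coef_beta`, `X` and `W` are hypothetical labels for the two coefficient vectors and the two covariate matrices (given as lists of rows); they are not the paper's notation.

```python
import math


def bbeta_loglik(coef_alpha, coef_beta, rho, delta, y, X, W):
    """Log-likelihood of independent responses y_i in (0, 1) with log links."""
    total = 0.0
    for y_i, x_i, w_i in zip(y, X, W):
        alpha = math.exp(sum(c * v for c, v in zip(coef_alpha, x_i)))   # log link
        beta = math.exp(sum(c * v for c, v in zip(coef_beta, w_i)))     # log link
        s = alpha + beta
        z = (1 + rho - 2 * delta * alpha / s
             + delta ** 2 * alpha * (alpha + 1) / (s * (s + 1)))
        log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(s)
        # The first log term is undefined if rho = 0 and y_i = 1 / delta exactly.
        total += (math.log(rho + (1 - delta * y_i) ** 2)
                  + (alpha - 1) * math.log(y_i)
                  + (beta - 1) * math.log(1 - y_i)
                  - math.log(z) - log_b)
    return total


# Intercept-only example: alpha_i = beta_i = 2 for every observation.
intercept = [math.log(2.0)]
ll = bbeta_loglik(intercept, intercept, 0.5, 2.0,
                  y=[0.25, 0.5], X=[[1.0], [1.0]], W=[[1.0], [1.0]])
```

In practice the negative of this function would be passed to a numerical optimizer, with ρ kept non-negative, for example by optimizing over its logarithm.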
The maximum likelihood estimator (MLE) of is obtained by the maximization of the log-likelihood function (14). However, it is not possible to derive analytical solution for the MLE , hence we resort to numerical solution using some optimization algorithm, such as Newton-Raphson and quasi-Newton.
Under mild regularity conditions and when n is large, the asymptotic distribution of the MLE is approximately multivariate normal (of dimension p + q + 2) with mean vector and variance covariance matrix where is the expected Fisher information matrix. Unfortunately, there is no closed form expression for the matrix . Nevertheless, a consistent estimator of the expected Fisher information matrix is given by which is the estimated observed Fisher information matrix. Therefore, for large n, we can replace by .
Let be the r-th component of . The asymptotic confidence interval for is given by where is the upper quantile of the standard normal distribution and is the asymptotic standard error of . Note that is the square root of the r-th diagonal element of the matrix .
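As a small worked example of the interval, the Python sketch below computes two-sided Wald intervals from an estimate and its asymptotic standard error. The function name `wald_interval` is illustrative; the numerical inputs are the estimates and standard errors of ρ and δ reported for the Bbeta model in Table 5.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Estimate plus or minus the upper normal quantile times the standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


rho_ci = wald_interval(0.1096, 0.0090)      # rho estimate and SE from Table 5
delta_ci = wald_interval(2.4092, 0.0351)    # delta estimate and SE from Table 5
```

Such intervals rely on the normal approximation, so they are most trustworthy for large n and for parameters away from the boundary of the parameter space.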
Residuals are widely used to check the adequacy of the fitted model. To check the goodness of fit of the Bbeta model, we propose to use the randomized quantile residuals introduced by Dunn and Smyth [7]. Let be the cumulative distribution function of the Bbeta distribution, as defined in (5), in which the regression structures are assumed as in (13). The randomized quantile residual is given by
where is the standard normal distribution function. If the assumed model for the data is well adjusted, these residuals have standard normal distribution [7].
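The residual computation can be sketched in a few lines of Python. The function below takes the fitted CDF values, one per observation, and maps them through the standard normal quantile function; because the Bbeta distribution is continuous, no randomization step is needed. The example CDF is the polynomial 3x − 6x² + 4x³, which is the CDF of the U-quadratic special case (α = β = 1, ρ = 0, δ = 2) under the assumed density form; both function names are illustrative.

```python
from statistics import NormalDist


def quantile_residuals(cdf_values, eps=1e-12):
    """Map fitted CDF values u_i = F(y_i) to standard normal quantiles."""
    std_normal = NormalDist()
    residuals = []
    for u in cdf_values:
        u = min(max(u, eps), 1 - eps)       # keep u strictly inside (0, 1)
        residuals.append(std_normal.inv_cdf(u))
    return residuals


def u_quadratic_cdf(x):
    """CDF of the U-quadratic special case on the unit interval."""
    return 3 * x - 6 * x ** 2 + 4 * x ** 3


y = [0.1, 0.5, 0.9]
res = quantile_residuals([u_quadratic_cdf(v) for v in y])
```

In a regression fit, each CDF value would be evaluated with the observation-specific fitted shape parameters before being passed to `quantile_residuals`.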
6. Simulation study
In this section, Monte Carlo simulations are performed (i) to evaluate the finite-sample behaviour of the maximum likelihood estimates of the regression coefficients and (ii) to investigate the empirical distribution of the randomized quantile residuals.
The Monte Carlo experiments were carried out by considering the following regression structure
i.e. , where the true parameter values were set equal to the estimates obtained in the regression application of Section 7, i.e. and . The covariate values of and were generated from the standard uniform distribution. The sample sizes considered were n = 50, 100, 200 and 300. All simulations were conducted in R [22] using the BFGS algorithm available in the optim() function. For each scenario, the Monte Carlo experiment was repeated 5000 times.
The Bbeta distribution is easily simulated from (5) as follows: if U has a uniform distribution, the solution of the non-linear equation has the distribution, where is the inverse functions of . To simulate data from this non-linear equation, we can use the programming language R through f.inv() function [22].
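The inversion step can be sketched in Python as follows (the paper itself works in R). The sketch solves F(x) = u by bisection on the unit interval and, to stay self-contained, uses the special case α = β = 1, for which the CDF is a cubic polynomial under the assumed density form [ρ + (1 − δx)²] / Z. The function names are illustrative and are not the routines used by the authors.

```python
import random


def invert_cdf(u, cdf, tol=1e-12):
    """Solve cdf(x) = u on [0, 1] by bisection; cdf must be non-decreasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bbeta_cdf_flat(x, rho, delta):
    """Assumed Bbeta CDF when alpha = beta = 1 (a polynomial in x)."""
    z = 1 + rho - delta + delta ** 2 / 3
    return ((1 + rho) * x - delta * x ** 2 + delta ** 2 * x ** 3 / 3) / z


def sample_by_inversion(n, rho, delta, seed=None):
    """Inverse-transform sampling: X = F^{-1}(U) with U standard uniform."""
    rng = random.Random(seed)
    return [invert_cdf(rng.random(), lambda x: bbeta_cdf_flat(x, rho, delta))
            for _ in range(n)]


draws = sample_by_inversion(2000, rho=0.0, delta=2.0, seed=7)   # U-quadratic case
```

For general α and β the same bisection applies once the polynomial CDF is replaced by an implementation of the CDF in (5).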
In the rest of this section, a small simulation study is presented to observe the finite sample performance of the proposed estimators from a regression approach. For such evaluation, the estimated bias and the estimated mean squared error (MSE) were calculated. The results are presented in Table 3 and Figure 4.
Table 3.
The estimated values for bias and mean squared error of the maximum likelihood estimators of and δ, and some values of sample size n.
| n | The estimated bias | The estimated MSE | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| δ | ρ | δ | ρ | |||||||||
| 50 | 0.212 | 0.106 | 0.132 | 0.299 | 0.177 | 1.306 | 0.234 | 0.634 | 0.417 | 0.839 | 0.488 | 0.235 |
| 100 | 0.213 | 0.099 | 0.114 | 0.254 | 0.120 | 0.938 | 0.192 | 0.475 | 0.276 | 0.558 | 0.183 | 0.091 |
| 200 | 0.202 | 0.093 | 0.095 | 0.215 | 0.081 | 0.543 | 0.157 | 0.390 | 0.181 | 0.381 | 0.068 | 0.006 |
| 300 | 0.195 | 0.091 | 0.088 | 0.200 | 0.061 | 0.414 | 0.139 | 0.353 | 0.152 | 0.313 | 0.037 | 0.003 |
Figure 4.
Box plots from 5000 simulated estimates of and δ for different sample sizes.
Table 3 presents the bias and MSE for the MLEs of and δ. These results indicate that the estimates converge to the corresponding true parameter values. As expected, increasing the sample size n reduces both bias and MSE. These findings are confirmed by the box plots shown in Figure 4.
6.1. Residuals
The second simulation study was performed to examine how well the distribution of the randomized quantile residuals is approximated by the standard normal distribution. The evaluation of the randomized quantile residuals was based on normal probability plots of the mean order statistics and on descriptive statistics. The results are presented in Table 4 and Figure 4 of the Supplementary Material.
Table 4.
Descriptive measures of the randomized quantile residuals for the bimodal beta model for different sample sizes.
| n | Mean | StdDev | Skewness | Kurtosis |
|---|---|---|---|---|
| 50 | −0.001 | 0.999 | 0.028 | 2.854 |
| 100 | −0.002 | 0.999 | 0.054 | 2.976 |
| 200 | −0.003 | 0.997 | 0.077 | 3.002 |
| 300 | −0.003 | 0.997 | 0.084 | 3.025 |
In Table 4, we present the mean, standard deviation (StdDev), skewness and kurtosis of the randomized quantile residuals. In all scenarios, the residuals have approximately zero mean and unit standard deviation, skewness close to zero and kurtosis near three, as expected under the standard normal distribution.
7. Real data application
In this section, to evaluate the applicability of the proposed model, a real data set with bimodality is considered. In particular, a real-life application related to the proportion of votes that Jair Bolsonaro received in the second round of the 2018 Brazilian elections is analysed. We compare the performance of the Bbeta regression with that of the traditional beta regression model. To estimate the model parameters, we adopt the maximum likelihood method (as discussed in Section 5). The asymptotic standard errors were computed using the observed Fisher information matrix. The required numerical evaluations for the data analysis were implemented in R [22].
The goal of this data analysis is to describe the proportion of votes that Jair Bolsonaro received in the second round of the 2018 Brazilian elections in all 5565 cities; the data are available at https://dadosabertos.tse.jus.br. The response variable is the proportion of votes, and the explanatory variable is the municipal human development index (MHDI). The MHDI is used as explanatory variable since it is an important measure that guides authorities in assessing progress and social reality, in defining public policy priorities and in comparing different cities [19]. Figure 5 shows the histogram of the response variable with its estimated density, and the scatter plot of the MHDI against the proportion of votes. From Figure 5, we can see that the response variable is bimodal. Furthermore, there is evidence that the proportion of votes increases with the MHDI. The correlation coefficient between the proportion of votes and the MHDI is 0.8290.
Figure 5.
Estimated PDF and scatter plot of municipal human development against proportion of votes.
To explain this proportion of votes we consider the bimodal beta regression model, defined as
where cities and is municipal human development of cities i. For comparison purposes the beta regression model was fitted, assuming that
and the unit-bimodal Birnbaum-Saunders (UBBS) regression model was fitted, assuming that
Table 5 shows the estimated parameters and standard errors. Table 6 shows the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for the fitted models. In general, the model that fits the data better is the one with the smallest AIC and BIC values. Based on these criteria, the model that provides the best fit to this data set is the Bbeta regression model. This claim is also supported by the residual plots with simulated envelopes shown in Figure 6.
Table 5.
Maximum likelihood estimates and standard errors (SE) for the fit of the bimodal beta, beta and unit-bimodal Birnbaum-Saunders models in the proportion of votes.
| Model | Parameter | Estimate | SE |
|---|---|---|---|
| Bbeta | | −1.8999 | 0.1963 |
| | | 5.9471 | 0.3044 |
| | | 3.8341 | 0.1915 |
| | | −2.4232 | 0.2862 |
| | ρ | 0.1096 | 0.0090 |
| | δ | 2.4092 | 0.0351 |
| beta | | −7.5343 | 0.0749 |
| | | 11.1820 | 0.1105 |
| | | 1.0029 | 0.1675 |
| | | 2.5214 | 0.2528 |
| UBBS | | −0.5721 | 0.1001 |
| | | −0.1035 | 0.1436 |
| | | 5.0120 | 0.0257 |
| | | −8.0601 | 0.0381 |
| | δ | 0.6405 | 0.0990 |
Table 6.
Information criteria for the fitted models.
| Models | AIC | BIC |
|---|---|---|
| Bbeta | −8786 | −8746 |
| beta | −8238 | −8212 |
| UBBS | −8115 | −8082 |
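The two criteria in Table 6 are linked by a simple identity, which the following Python sketch uses as a consistency check. It relies on the standard definitions AIC = 2k − 2ℓ and BIC = k log n − 2ℓ, where ℓ is the maximized log-likelihood, k the number of estimated parameters and n the sample size. The parameter counts (6, 4 and 5) are read off Table 5 and n = 5565 is the number of cities; the function names are illustrative.

```python
import math


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


def bic_from_aic(aic_value, k, n):
    """BIC - AIC = k * (log(n) - 2), so one criterion determines the other."""
    return aic_value + k * (math.log(n) - 2)


n_cities = 5565
# model: (reported AIC, reported BIC, number of parameters in Table 5)
reported = {"Bbeta": (-8786, -8746, 6), "beta": (-8238, -8212, 4), "UBBS": (-8115, -8082, 5)}
implied_bic = {m: bic_from_aic(a, k, n_cities) for m, (a, b, k) in reported.items()}
```

The implied BIC values agree with the reported ones up to the rounding of the tabulated AIC values, which supports reading the sample size as 5565.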
Figure 6.
Half-normal plot of randomized quantile residuals with simulated envelope for the fit of beta and bimodal beta.
8. Concluding remarks
Despite its broad applicability in many fields, the beta distribution is not suitable for modelling bimodal responses bounded to the unit interval. In this paper, the well-known two-parameter beta distribution is extended by introducing two extra parameters, thus defining the bimodal beta (Bbeta) distribution, based on a quadratic transformation technique used to generate bimodal functions [8,28]; the new distribution generalizes the beta distribution. We provide a mathematical treatment of the new distribution, including bimodality, moments, entropies, stochastic representation and identifiability. We allow a regression structure for the parameters α and β. The model parameters are estimated by maximum likelihood, and the performance of the estimators has been evaluated by means of Monte Carlo simulations. Furthermore, we have proposed residuals for the model and conducted a simulation study to assess their empirical properties. The proposed model was fitted to the proportion of votes that Jair Bolsonaro received in the second round of the 2018 Brazilian elections. As expected, the Bbeta model outperforms the beta regression in the presence of bimodality. Further, the Bbeta model also fits these data better than the UBBS model.
Supplementary Material
Acknowledgments
The authors would like to thank the reviewers for all useful and helpful comments on an earlier version of our manuscript, which resulted in this improved version.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Ahmad K.E. and Al-Hussaini E.K., Remarks on the non-identifiability of mixtures of distributions, Ann. Inst. Stat. Math. 34 (1982), pp. 543–544.
- 2. Alfaia L.M., A Distribuição Beta Bimodal: Propriedades e Aplicações, UnB, Brasília, 2021.
- 3. Atienza N., Garcia-Heras J., and Munoz-Pichardo J.M., A new condition for identifiability of finite mixture distributions, Metrika 63 (2006), pp. 215–221.
- 4. Bayes C.L., Bazán J.L., and Catalina G., A new robust regression model for proportions, Bayesian Anal. 7 (2012), pp. 841–866.
- 5. de Alencar E.R., Discriminante não-linear Para Mistura De Distribuições Beta, UnB, Brasília, 2018.
- 6. Domma F., Popović B.V., and Nadarajah S., An extension of Azzalini's method, J. Comput. Appl. Math. 278 (2015), pp. 37–47.
- 7. Dunn P.K. and Smyth G.K., Randomized quantile residuals, J. Comput. Graph. Stat. 5 (1996), pp. 236–244.
- 8. Elal-Olivero D., Alpha-skew-normal distribution, Proyecciones J. Math. 29 (2010), pp. 224–240.
- 9. Ferrari S. and Cribari-Neto F., Beta regression for modelling rates and proportions, J. Appl. Stat. 31 (2004), pp. 799–815.
- 10. Griffiths L., Introduction to the Theory of Equations, J. Wiley, 1947.
- 11. Hahn E.D., Regression modelling with the tilted beta distribution: A Bayesian approach, Can. J. Stat. 49 (2021), pp. 262–282.
- 12. Ji Y., Wu C., Liu P., Wang J., and Coombes K.R., Applications of beta-mixture models in bioinformatics, Bioinformatics 21 (2005), pp. 2118–2122.
- 13. Johnson N.L., Kotz S., and Balakrishnan N., Continuous Univariate Distributions, Vol. 2, 2nd ed., John Wiley & Sons Inc., New York, 1995.
- 14. Lin T.I., Lee J.C., and Hsieh W.J., Robust mixture models using the skew-t distribution, Stat. Comput. 17 (2007a), pp. 81–92.
- 15. Lin T.I., Lee J.C., and Yen S.Y., Finite mixture modeling using the skew-normal distribution, Stat. Sin. 17 (2007b), pp. 909–927.
- 16. Ma Z. and Leijon A., Beta mixture models and the application to image classification, in Proceedings of the IEEE International Conference on Image Processing (ICIP), 2009, pp. 2045–2048.
- 17. Martínez-Flórez M., Martínez E., Tovar-Falón R., and Gómez H.W., A family of bimodal distributions generated by distributions with positive support, J. Appl. Stat. 49 (2022), pp. 3614–3637.
- 18. Martínez-Flórez M., Olmos N.M., and Venegas O., Unit-bimodal Birnbaum-Saunders distribution with applications, Commun. Stat. - Simul. Comput. (2022), pp. 1–20. doi:10.1080/03610918.2022.2069260.
- 19. Menezes A.F.B. and Furriel W.O., Beta and simplex regression models in the analysis of the municipal human development index 2010, Rev. Bras. Biom. 37 (2019), pp. 394–408.
- 20. Olmos N.M., Martínez-Flórez G., and Bolfarine H., Bimodal Birnbaum-Saunders distribution with applications to non-negative measurements, Commun. Stat. - Theory Methods 46 (2017), pp. 6240–6257.
- 21. Ospina R. and Ferrari S.L.P., Inflated beta distributions, Stat. Pap. 51 (2008), pp. 111–126.
- 22. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2021. Available at https://www.R-project.org/.
- 23. Rao C.R., Quadratic entropy and analysis of diversity, Sankhya Ser. A. 72 (2010), pp. 70–80.
- 24. Shannon C.E., A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948), pp. 379–423, 623–656.
- 25. Smithson M., Merkle E.C., and Verkuilen J., Beta regression finite mixture models of polarization and priming, J. Educ. Behav. Stat. 36 (2011), pp. 804–831.
- 26. Smithson M. and Segale C., Partition priming in judgments of imprecise probabilities, J. Stat. Theory Pract. 3 (2009), pp. 169–181.
- 27. Tsallis C., Possible generalization of Boltzmann-Gibbs statistics, J. Stat. Phys. 52 (1988), pp. 479–487.
- 28. Vila R. and Çankaya M.N., A bimodal Weibull distribution: properties and inference, J. Appl. Stat. 49 (2022), pp. 3044–3062.
- 29. Vila R., Ferreira L., Saulo H., Prataviera F., and Ortega E.M.M., A bimodal gamma distribution: properties, regression model and applications, Statistics 54 (2020), pp. 469–493.
- 30. Wong M.C., Bubble Value At Risk: A Countercyclical Risk Management Approach, John Wiley & Sons, Singapore, 2013.
- 31. Xue J., Loop Tiling for Parallelism, The Springer International Series in Engineering and Computer Science, Springer, New York, 2012.