Abstract
We introduce the bivariate unit-log-symmetric model based on the bivariate log-symmetric distribution (BLS) defined in Vila et al. [25] as a flexible family of bivariate distributions over the unit square. We then study its mathematical properties such as stochastic representations, quantiles, conditional distributions, independence of the marginal distributions and marginal moments. The maximum likelihood estimation method is discussed and examined through Monte Carlo simulation. Finally, the proposed model is used to analyze some soccer data sets.
Keywords: Bivariate unit-log-symmetric distribution, bivariate log-symmetric distribution, Monte Carlo simulation, proportion data, soccer data, maximum likelihood estimation
2010 Mathematics Subject Classifications: 60E05, 62Exx, 62Fxx
1. Introduction
Bivariate distributions over the unit-square have been discussed in detail in the literature; see, e.g. Barreto-Souza and Lemonte [4] and Özbilen and Genç [20]. Many of them are based on the beta distribution and its generalizations; see Arnold and Ng [2] and Nadarajah et al. [18]. Models of this type have been studied since the 1980s. Some other distributions on the unit square are based on generalized arcsine and inverse Gaussian distributions. A recent model, the bivariate unit-sinh-normal distribution, is based on the bivariate Birnbaum-Saunders distribution; see Martínez-Flórez et al. [16]. Bivariate distributions over the unit square arise naturally when comparing indices, rates or proportions in the interval (0,1).
In this paper, we study the bivariate unit-log-symmetric (BULS) distribution defined over the unit-square, obtained as a modification of the bivariate log-symmetric (BLS) distribution introduced by Vila et al. [25]. The definitions of BLS and BULS distributions are given in Section 2, along with some special cases of the BULS. In Section 3, we discuss some properties of the new model, including a stochastic representation, marginal quantiles, and the conditional distributions of BULS. We derive compact formulas for the conditional densities, using the distribution functions of normal, Student-t, hyperbolic, Laplace and slash distributions. Closed formulas for the conditional densities of the BULS model are useful, for example, in studying Heckman-type selection models (see Heckman [11]) when the selection variables have bounded support. In addition, we derive the distribution of the squared Mahalanobis distance of a random vector with BULS distribution, and present a sufficient condition for the independence of the components of such a vector, together with formulas for their marginal moments. In Section 4, the log-likelihood function and the likelihood equations for the BULS distribution are presented. In Section 5, we carry out a Monte Carlo simulation study to evaluate the performance of the ML estimators by means of their bias, root mean square error and coverage probability. In Section 6, we present two applications to soccer data. Specifically, in Section 6.1, we model the vector (W1, W2), where W1 represents the time elapsed until a first kick goal (of any team) and W2 the time elapsed until a goal of any type of the home team, and show that specific BULS distributions are suitable for modelling it. In Section 6.2, we consider the data of the 2022 FIFA World Cup, wherein the components of the vector (W1, W2) represent the pass completion proportions of medium passes (14 to 18 meters) and long passes (longer than 37 meters). We then demonstrate that these data can also be fitted well by BULS distributions.
2. Bivariate unit-log-symmetric model
In this section, we describe the bivariate unit-log-symmetric model (BULS). To define this model, we first need to describe the bivariate log-symmetric distribution (BLS) defined in Vila et al. [25].
2.1. BLS family of distributions
Following Vila et al. [25], a continuous random vector is said to have a bivariate log-symmetric (BLS) distribution if its joint probability density function (PDF) is given by
| (1) |
where , with being the parameter vector, , , i=1,2 and . Furthermore, is the partition function, that is,
| (2) |
and is a scalar function referred to as the density generator (see Fang et al. [8]). The second integral in (2) is a consequence of a change of variables; for more details, see Proposition 3.1 of Vila et al. [25]. When a random vector is BLS distributed, with parameter vector , we denote it by .
2.2. BULS family of distributions
We say that a continuous random vector has a bivariate unit-log-symmetric (BULS) distribution with parameter vector , denoted by , if its PDF is, for , given by
| (3) |
where , with , i=1,2, , and and are as given in (2). We shall prove later that the BULS PDF in (3) is obtained by taking , i=1,2, with .
Table 1 presents some examples of bivariate unit-log-symmetric distributions.
Table 1.
Partition functions and density generators for some BULS distributions.
| Distribution | Parameter | ||
|---|---|---|---|
| Bivariate unit-log-normal | – | ||
| Bivariate unit-log-Student-t | |||
| Bivariate unit-log-hyperbolic | |||
| Bivariate unit-log-Laplace | π | – | |
| Bivariate unit-log-slash | q>0 |
In Table 1, , t>0, is the complete gamma function, , u>0, is the modified Bessel function of the third kind with index λ (see Appendix of Kotz et al. [14]), and is the lower incomplete gamma function.
Let . From (3), it is clear that the random vector , with
| (4) |
has a bivariate elliptically symmetric (BSY) distribution (see p. 592 in Balakrishnan and Lai [3]); that is, the PDF of is
| (5) |
where , with being the parameter vector and is the partition function defined in (2). In this case, we shall use the notation .
It is a simple task to observe that the joint cumulative distribution function (CDF) of , denoted by , is given by
wherein and denote the CDFs of and , respectively.
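As a hedged illustration of the construction in (4) and (5), the sketch below draws from the bivariate unit-log-normal member by sampling a bivariate normal vector and mapping it to the unit square. The map W_i = 1 − exp(−T_i) with T_i = exp(X_i) is an assumption, since the explicit transformation is not legible in this text; it was chosen because it agrees with the fitted values reported in Section 6 (for instance, 1 − exp(−1.9872) is close to the sample median 0.860 of the first FIFA variable). The function name, the seed and the sample size are hypothetical.

```python
import math
import random
import statistics

def sample_unit_log_normal(n, eta1, eta2, sigma1, sigma2, rho, seed=0):
    """Draw n pairs (W1, W2) from a bivariate unit-log-normal law.

    Assumed construction (not legible in the text): X is bivariate normal with
    means log(eta_i), standard deviations sigma_i and correlation rho;
    T_i = exp(X_i); W_i = 1 - exp(-T_i).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = math.log(eta1) + sigma1 * z1
        x2 = math.log(eta2) + sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        t1, t2 = math.exp(x1), math.exp(x2)
        # -expm1(-t) equals 1 - exp(-t) but keeps precision for small t
        pairs.append((-math.expm1(-t1), -math.expm1(-t2)))
    return pairs

# Parameter values taken from the log-normal row of Table 7
sample = sample_unit_log_normal(2000, 1.9872, 0.7953, 0.1364, 0.2089, 0.7343)
median_w1 = statistics.median([w1 for w1, _ in sample])
```

Under these assumptions every simulated pair lies in the open unit square, and the sample median of the first component should sit near 1 − exp(−η1).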
Remark 2.1
As the components of are defined as increasing functions of the components of , by invariance under monotone transformations property of copulas, both will have the same associated copula (for a review of copula theory, see, e.g. Nelsen [19]). Therefore, the analysis of dependence measures for the variables and that depend only on the copula is equivalent to the analysis of those for and . For the normal copula and Student's t copula, some dependency measures such as Kendall's tau, Spearman's rho, Blest's measure of rank correlation, and coefficients of tail dependence were all studied by Roncalli [21].
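The invariance argument of Remark 2.1 can be checked numerically. The following sketch computes Kendall's tau by direct pair counting before and after a strictly increasing map to the unit interval. The specific map x -> 1 − exp(−exp(x)) is an assumption (any strictly increasing map gives the same conclusion), and the helper names, seed and sample size are hypothetical.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau(xs, ys):
    """Kendall's tau-a by direct pair counting (O(n^2), fine for small n)."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
    return total / (n * (n - 1) / 2)

def to_unit(x):
    # Assumed map onto (0, 1); strictly increasing in x
    return -math.expm1(-math.exp(x))

rng = random.Random(1)
rho = 0.5
xs, ys = [], []
for _ in range(300):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(0.5 * z1)
    ys.append(0.5 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))

tau_x = kendall_tau(xs, ys)
tau_w = kendall_tau([to_unit(x) for x in xs], [to_unit(y) for y in ys])
```

Because the map preserves the ordering of each coordinate, the concordant and discordant pairs are the same in both samples, so the two values of tau coincide.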
3. Some basic properties of the model
In this section, some mathematical properties of the bivariate unit-log-symmetric distribution are established.
3.1. Stochastic representation
Proposition 3.1
The random vector has a BULS distribution if
where and with , , R, and D being mutually independent random variables, , , and , i=1,2. The random variable D is positive and has PDF Further, the positive random variable R has its PDF as
Proof.
It is well-known that (see Proposition 3.2 of Vila et al. [25]) the random vector has a BLS distribution if
(6) Moreover, from (4), , i=1,2. Hence, the result.
The following lemma provides a slight simplification in the representation of Proposition 3.1. This result plays a role in the next subsections, since all the probabilistic characteristics that depend on the distribution of will be simplified since it has the same distribution as .
Lemma 3.2
For a Borel subset B of , we have
In other words, and have the same distribution.
Proof.
It is clear that the density of is related to the joint density by
(7) From Equation (13) of Saulo et al. [23], the joint PDF of and is given by
(8) and so the integral in (7) is
(9) Using the identity
the integral in (9) can be expressed as
Making the change of variable , the above integral becomes
where, in the last line, we have used (8). Hence,
(10) Now, from (10), it is clear that and are equal in distribution.
3.2. Marginal quantiles
Given , let be the p-quantile of , for i=1,2. By using the stochastic representation in Proposition 3.1, for , we have
and
Hence, the p-quantiles and of and , respectively, are given by
and
where in the last equality we have used the fact that and have the same distribution (see Lemma 3.2). Hence, the p-quantiles and are given by
respectively.
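For the Gaussian generator, the marginal quantiles can be evaluated directly. The sketch below is a minimal illustration that assumes the map W = 1 − exp(−T) with T log-normal with median η and log-scale σ (an assumption, since the quantile formulas are not legible here); the function names are hypothetical, and the parameter values are the first-margin estimates from the log-normal row of Table 5.

```python
import math
from statistics import NormalDist

def unit_log_normal_quantile(p, eta, sigma):
    """p-quantile of W = 1 - exp(-T), with log(T) normal(log(eta), sigma^2) (assumed)."""
    z_p = NormalDist().inv_cdf(p)
    return -math.expm1(-eta * math.exp(sigma * z_p))

def unit_log_normal_cdf(w, eta, sigma):
    """CDF of W, obtained by inverting the same assumed map."""
    t = -math.log1p(-w)  # invert w = 1 - exp(-t)
    return NormalDist().cdf(math.log(t / eta) / sigma)

eta, sigma = 0.5288, 0.8865
median_w = unit_log_normal_quantile(0.5, eta, sigma)
round_trip = [unit_log_normal_cdf(unit_log_normal_quantile(p, eta, sigma), eta, sigma)
              for p in (0.1, 0.5, 0.9)]
```

The median depends only on η, and applying the CDF to a quantile returns the original probability level, which is a quick consistency check on both functions.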
3.3. Conditional distributions
Before stating and proving the main result (Theorem 3.4) of this subsection, we establish the following technical lemma, which is used subsequently.
Lemma 3.3
If , then the PDF of is given by
(11) where , i=1,2, and are as defined in (3), and and are as given in Proposition 3.1.
Proof.
If , then . So, the conditional distribution of , given , is the same as the distribution of
Consequently,
Then, by differentiating with respect to , (11) is readily obtained.
The following result provides a simple formula for determining the conditional distribution of , given , whenever the marginal and conditional distributions of are known. This result is essential for studying Heckman-type selection models (see Heckman [11]) when the selection variables have unitary support.
Theorem 3.4
For a Borel subset B of , let us define the following Borel set:
(12) where is as in (3). If , then the PDF of is given by
in which is as in (3), is as in (12), and and are as given in Proposition 3.1.
Proof.
Let B be a Borel subset of . Note that
As and , where is as given in (12) with r=0, the term on the right-hand side of the above identity becomes
By using the formula for provided in Lemma 3.3, the above expression becomes
where and , i=1,2, are as in (3). Finally, by applying the change of variable , the above expression is
We have thus proved that
Finally, by combining the above identity with Lemma 3.2, the required result is obtained.
Using Theorem 3.4, for each generator () in Table 1, we present closed formulas for the conditional densities of corresponding to bivariate unit-log-normal (Corollary 3.5), bivariate unit-log-Student-t (Corollary 3.6), bivariate unit-log-hyperbolic (Corollary 3.7), bivariate unit-log-Laplace (Corollary 3.8) and bivariate unit-log-slash (Corollary 3.9) distributions.
Corollary 3.5 Gaussian generator —
Let and be the generator of the bivariate unit-log-normal distribution. Then, for each Borel subset B of , the PDF of is given by (for )
where and is the standard normal PDF. Further, and are as in (3), and is as in (12).
Proof.
It is well-known that the bivariate log-normal distribution has a stochastic representation as in (6), where and , and (see Abdous [1]). Hence, and Then, by applying Theorem 3.4, the required result follows.
Corollary 3.6 Student-t generator —
Let and , , be the generator of the bivariate unit-log-Student-t distribution with ν degrees of freedom. Then, for each Borel subset B of , the PDF of is given by (for )
where and is the standard Student-t PDF with ν degrees of freedom.
Proof.
It is well-known that the bivariate log-Student-t distribution has a stochastic representation as in (6), where and (Student-t with ν degrees of freedom), and (see Corollary 3.7 of Vila et al. [25])
Hence, and
By applying Theorem 3.4, the required result follows.
Corollary 3.7 Hyperbolic generator —
Let and be the generator of the bivariate unit-log-hyperbolic distribution. Then, for each Borel subset B of , the PDF of is given by (for )
where and is the generalized hyperbolic (GH) PDF (see Definition A.1 in the Appendix).
Proof.
It is well-known that the bivariate log-hyperbolic distribution has a stochastic representation as in (6), where and (see Subsection 2.1 of Deng and Yao [7]). Moreover, the distribution of , given , is (Proposition A.2). Then, and . By applying Theorem 3.4, the required result follows.
Corollary 3.8 Laplace generator —
Let and be the generator of the bivariate unit-log-Laplace distribution. Then, for each Borel subset B of , the PDF of is given by (for )
where and is the Laplace PDF with scale parameter , and is as defined in Corollary 3.7.
Proof.
It is well-known that the bivariate log-Laplace distribution has a stochastic representation as in (6), where and (see Subsection 5.1.4 of Kotz et al. [14]). Further, the distribution of , given , is (Proposition A.3). Hence, and By applying Theorem 3.4, the required result follows.
Corollary 3.9 Slash generator —
Let and , be the generator of the bivariate unit-log-slash distribution. Then, for each Borel subset B of , the PDF of is given by (for )
where and is the classical slash PDF, and , where is the extended slash (ESL) PDF (see Definition A.4 in the Appendix).
Proof.
It is well-known that the bivariate log-slash distribution has a stochastic representation as in (6), where and (see Section 2 of Wang and Genton [26]). Moreover, the distribution of , given , is (Proposition A.5). Hence, and By applying Theorem 3.4, the required result follows.
Table 2 below presents some examples of conditional PDFs corresponding to all the bivariate unit-log-symmetric distributions presented earlier in Table 1.
Table 2.
Conditional densities of and density generators for some BULS distributions.
| Distribution | ||
|---|---|---|
| Bivariate unit-log-normal | ||
| Bivariate unit-log-Student-t | ||
| Bivariate unit-log-hyperbolic | ||
| Bivariate unit-log-Laplace | ||
| Bivariate unit-log-slash |
3.4. Independence
Proposition 3.10
Let . If and the density generator in (3) is such that
(13) for some density generators and , then and are independent.
Proof.
The proof follows the same steps as the proof of Proposition 3.11 of Vila et al. [25]. For the sake of completeness, however, we present it here.
Let . From (13), the joint density (3) of is such that
(14) where , and and are as in (3). Integrating (14) in terms of and , we obtain
and consequently, . Therefore,
Moreover, it is easy to verify that and are PDFs corresponding to univariate symmetric random variables (see Vanegas and Paula [24]). Then, and are statistically independent, and even more, , for i=1,2 (see Proposition 2.5 of James [13]).
Remark 3.11
In Table 1, the density generator of the bivariate unit-log-normal is the unique one that satisfies (13).
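Remark 3.11 can be illustrated numerically. The sketch below assumes that condition (13) asks the generator to split as a product, g(x + y) = g1(x) g2(y), and that the Gaussian and Student-t generators have the standard forms exp(−u/2) and (1 + u/ν)^(−(ν+2)/2); these forms are assumptions, because the entries of Table 1 are not legible in this text. The function names and test points are hypothetical.

```python
import math

def g_normal(u):
    # Assumed Gaussian density generator
    return math.exp(-u / 2.0)

def g_student(u, nu=7.0):
    # Assumed Student-t density generator with nu degrees of freedom
    return (1.0 + u / nu) ** (-(nu + 2.0) / 2.0)

def factorization_gap(g, x, x2, y, y2):
    """Vanishes for all points whenever (x, y) -> g(x + y) is a product a(x) * b(y)."""
    return g(x + y) * g(x2 + y2) - g(x + y2) * g(x2 + y)

gap_normal = factorization_gap(g_normal, 0.5, 2.0, 0.3, 1.5)
gap_student = factorization_gap(g_student, 0.5, 2.0, 0.3, 1.5)
```

The gap is zero up to rounding for the Gaussian generator and clearly nonzero for the Student-t generator, in line with the remark that only the former factorizes.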
3.5. Marginal moments and correlation function
For , , it is clear that , for any r>0 and i=1,2. Therefore, the positive moments of always exist.
In general, for any , the moments of , i=1,2, admit the following representations:
| (15) |
where in the last equality we have used the fact that and have the same distribution (see Lemma 3.2). Here, and are as given in Proposition 3.1.
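When no closed form is at hand, the marginal moments can be approximated by one-dimensional quadrature. The sketch below does this for the Gaussian generator, assuming W = 1 − exp(−η exp(σ Z)) with Z standard normal (an assumed form of the stochastic representation); the function name, integration range and step count are hypothetical choices.

```python
import math

def unit_log_normal_moment(r, eta, sigma, half_width=8.0, steps=4000):
    """Trapezoidal approximation of E[W^r], W = 1 - exp(-eta * exp(sigma * Z))."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        w = -math.expm1(-eta * math.exp(sigma * z))
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal end points
        total += weight * (w ** r) * phi
    return total * h

# First-margin estimates from the log-normal row of Table 5
m1 = unit_log_normal_moment(1, 0.5288, 0.8865)
m2 = unit_log_normal_moment(2, 0.5288, 0.8865)
variance = m2 - m1 ** 2
```

Since W takes values in (0,1), the moments are ordered as 0 < E[W^2] < E[W] < 1, which gives a simple sanity check on the numerical output.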
Closed-form expressions for the product moments can be derived as follows. The law of total expectation gives
| (16) |
Lemma 3.3 provides a formula for the PDF of . Moreover, . By using these formulas in (16), the product moments is equal to
where and , for i=1,2, are as defined in (3).
The conditional and unconditional distributions of vector corresponding to bivariate models of Table 1 are well-known (see proofs of Corollaries 3.5–3.9). Therefore, at least numerically, the respective product moments can be determined.
For illustrative purposes, we now present the product moments and correlation function formula only for the bivariate unit-log-normal model. In this case, , and (see proof of Corollary 3.5). So, using the last formula above for , we have
Consequently, the correlation function between and , denoted by , can be expressed as
| (17) |
where and are as given in (15), and and , for i=1,2, are as defined in (3).
Table 3 shows some values of the correlation function in (17) for different choices of ρ in the special case when and .
Table 3.
Correlation values for different choices of ρ.
| ρ | −0.9000 | −0.7500 | −0.5000 | −0.2500 | −0.1000 | 0.0000 | 0.1000 | 0.2500 | 0.5000 | 0.7500 | 0.9000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Correlation | −0.9413 | −0.8406 | −0.6678 | −0.5275 | −0.4481 | −0.3929 | −0.2880 | −0.1463 | 0.1683 | 0.5339 | 0.7648 |
Remark 3.12
Given the first two moments of BULS model, some useful bounds for can be provided. For example,
3.6. Squared Mahalanobis distance
Given a probability distribution F on with mean , , and positive-definite covariance matrix , and given two points and in , the squared Mahalanobis distance between them with respect to F is
Let with location (mean) vector and covariance matrix
where , , and are the moments given in (15), and is as given in (17). The (random) squared Mahalanobis distance of from is then
The following two results provide formulas for the PDF and CDF of . The respective proofs can be found in the Appendix (Section 1).
Proposition 3.13
If , then the CDF of , denoted by , is given by
with being as in (5), , and
(18)
Proposition 3.14
If , then the PDF of , denoted by , is given by
4. Maximum likelihood estimation of model parameters
Let be a bivariate random sample of size n from the distribution with PDF as in (3), and let be the corresponding observations of . Then, the log-likelihood function for , without the additive constant, is given by
where and
In the case when a supremum exists, it must satisfy the following likelihood equations:
| (19) |
with
| (20) |
where we have used the notation
| (21) |
with
Observe that the likelihood equations in (19) can be written as
Any nontrivial root of the above likelihood equations is an ML estimator in the loose sense. When the parameter value provides the absolute maximum of the log-likelihood function, it becomes the ML estimator in the strict sense.
In the following proposition, we discuss the existence of the ML estimator when all other parameters are known.
Proposition 4.1
Let be a density generator such that
(22) for some real-valued function with , where , , are as in (21). If the parameters and are all known, then (20) has at least one root in the interval .
Proof.
The proof of this result follows by a direct application of the intermediate value theorem. For more details, see Proposition 5.1 of Vila et al. [25].
By using Morse theory, Mäkeläinen et al. [15] established that, under some regularity conditions, there is a unique MLE for . For the BULS model, no closed-form solution to the maximization problem is available, and the MLE can only be found by means of numerical optimization. Under mild regularity conditions (Cox and Hinkley [5] and Davison [6]), the asymptotic distribution of the ML estimator of is as follows: , where is the zero mean vector and is the inverse expected Fisher information matrix. The main use of the last convergence is to construct confidence regions and to perform hypothesis testing for (see Davison [6]).
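For the Gaussian generator specifically, and under the assumed map W_i = 1 − exp(−T_i), the Jacobian of the transformation does not involve the parameters, so the likelihood reduces to a bivariate normal likelihood in X_i = log(−log(1 − W_i)) and its maximizer can be written from sample moments; the other generators still require a numerical optimizer. The sketch below is a minimal illustration of that special case, not the estimation code used for the tables. The function name is hypothetical, and the input consists only of the eleven UEFA rows visible in Table A1, so the output is not expected to reproduce Table 5 (which uses n = 37).

```python
import math

def fit_unit_log_normal(pairs):
    """ML estimates (eta1, eta2, sigma1, sigma2, rho) for the Gaussian generator.

    Assumes W_i = 1 - exp(-T_i), so X_i = log(-log(1 - W_i)) is bivariate normal.
    """
    xs = [(math.log(-math.log1p(-w1)), math.log(-math.log1p(-w2))) for w1, w2 in pairs]
    n = len(xs)
    m1 = sum(x1 for x1, _ in xs) / n
    m2 = sum(x2 for _, x2 in xs) / n
    # Maximum likelihood uses divisor n, not n - 1
    s11 = sum((x1 - m1) ** 2 for x1, _ in xs) / n
    s22 = sum((x2 - m2) ** 2 for _, x2 in xs) / n
    s12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in xs) / n
    sigma1, sigma2 = math.sqrt(s11), math.sqrt(s22)
    return math.exp(m1), math.exp(m2), sigma1, sigma2, s12 / (sigma1 * sigma2)

# UEFA rows 1 to 11 as listed in Table A1 (partial sample)
uefa_rows = [(0.289, 0.222), (0.700, 0.200), (0.211, 0.211), (0.733, 0.944),
             (0.444, 0.444), (0.544, 0.544), (0.089, 0.089), (0.767, 0.789),
             (0.433, 0.433), (0.911, 0.533), (0.800, 0.800)]
eta1, eta2, sigma1, sigma2, rho = fit_unit_log_normal(uefa_rows)
```

Standard errors would still have to come from the Fisher information, as described above.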
4.1. Residual analysis
To assess the goodness of fit and departures from the assumptions of the model, we consider the stochastic relation
| (23) |
where the random vector follows a BULS distribution, with and . The corresponding CDF and PDF of (23) are given respectively by
where is as in (2).
For example, upon taking and (see Table 1), we get (chi-square with 2 degrees of freedom). Next, upon taking and (see Table 1), we have , where denotes the F-distribution with 2 and ν degrees of freedom. Therefore, we can use the relation in (23) to check the goodness of fit, contrasting the empirical distribution against the theoretical one. Specifically, quantile-quantile (QQ) plots and the Kolmogorov-Smirnov test can then be used to assess the fit.
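The residual check can be sketched as follows for the Gaussian generator. The code assumes that the quantity in (23) is the quadratic form of the standardized transformed pair, with X_i = log(−log(1 − W_i)), and compares it with the chi-square distribution with 2 degrees of freedom, whose CDF is 1 − exp(−x/2). The form of (23), the map to the unit square, the function names and the seed are assumptions; the parameter values are taken from the log-normal row of Table 7 and the data are simulated.

```python
import math
import random

def squared_distance(w1, w2, eta1, eta2, sigma1, sigma2, rho):
    """Quadratic form of the standardized transformed pair (assumed form of (23))."""
    x1 = (math.log(-math.log1p(-w1)) - math.log(eta1)) / sigma1
    x2 = (math.log(-math.log1p(-w2)) - math.log(eta2)) / sigma2
    return (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (1.0 - rho * rho)

def ks_statistic(values, cdf):
    """Two-sided Kolmogorov-Smirnov distance between a sample and a reference CDF."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered):
        f = cdf(v)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def chi2_2_cdf(x):
    return -math.expm1(-x / 2.0)  # CDF of chi-square with 2 degrees of freedom

params = (1.9872, 0.7953, 0.1364, 0.2089, 0.7343)
eta1, eta2, sigma1, sigma2, rho = params
rng = random.Random(2)
distances = []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    t1 = eta1 * math.exp(sigma1 * z1)
    t2 = eta2 * math.exp(sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    w1, w2 = -math.expm1(-t1), -math.expm1(-t2)
    distances.append(squared_distance(w1, w2, *params))

ks = ks_statistic(distances, chi2_2_cdf)
```

With real data the parameters would be replaced by their ML estimates, and a small KS distance (or points close to the diagonal of the QQ plot) would support the chosen generator.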
5. Simulation study
In this section, we carry out a Monte Carlo simulation study for evaluating the performance of the ML estimators of the parameters of BULS distribution. For illustrative purposes, we only present the results for the bivariate unit-log-normal model. The simulation scenario considered is as follows: 1,000 Monte Carlo replications, sample size , vector of true parameters , (negative values of ρ produce the same results and so are omitted). To study the performance of the ML estimators, we computed the bias, root mean square error (RMSE), and coverage probability (CP) as
where θ and are the true parameter value and its i-th ML estimate, N is the number of Monte Carlo replications, is an indicator function taking the value 1 if , and 0 otherwise, where and are the i-th upper and lower limit estimates of the 95% confidence interval. We expect that, as the sample size increases, the bias and RMSE would decrease, and the CP would approach the 95% nominal level.
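The three performance measures can be computed as in the sketch below. It uses the usual definitions (mean estimate minus the true value, root of the mean squared deviation, and the share of intervals containing the true value); whether the study reports the signed or the absolute bias cannot be read from this text, so the signed version is an assumption. The function names and the toy inputs are hypothetical.

```python
import math

def bias(estimates, theta):
    """Average estimate minus the true value (signed bias)."""
    return sum(estimates) / len(estimates) - theta

def rmse(estimates, theta):
    """Root mean square error of the estimates around the true value."""
    return math.sqrt(sum((e - theta) ** 2 for e in estimates) / len(estimates))

def coverage(lowers, uppers, theta):
    """Share of confidence intervals [lower, upper] that contain the true value."""
    hits = sum(1 for lo, up in zip(lowers, uppers) if lo <= theta <= up)
    return hits / len(lowers)

# Toy inputs for three hypothetical replications with true value 1.0
toy_bias = bias([1.1, 0.9, 1.0], 1.0)
toy_rmse = rmse([1.1, 0.9, 1.0], 1.0)
toy_cp = coverage([0.8, 1.05, 0.9], [1.2, 1.3, 1.1], 1.0)
```

In a full study these functions would be applied to the 1,000 replicated estimates of each parameter and to the corresponding 95% interval limits.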
The obtained simulation results are presented in Figure 1. We observe that the results obtained for the chosen bivariate unit-log-normal distribution are as expected in that as the sample size increases, the bias and RMSE both decrease and that the CP approaches the 95% nominal level. Finally, in general, the results do not seem to depend on the parameter ρ.
Figure 1.
Monte Carlo simulation results for the bivariate unit-log-normal model.
6. Application to soccer data
In this section, two real soccer data sets, corresponding to times elapsed until scored goals of UEFA Champions League and pass completions of 2022 FIFA World Cup, are analyzed. The UEFA Champions League data set was extracted from Meintanis [17], while the 2022 FIFA World Cup data set is new and is analyzed for the first time here.
6.1. UEFA Champions League
We consider a bivariate data set on the group stage of the UEFA Champions League for the seasons 2004/05 and 2005/06. Only matches with at least one goal scored directly from a kick by any team, and with at least one goal scored by the home team, are considered here; see Meintanis [17]. The first variable (W1) is the time (in minutes) elapsed until a first kick goal is scored by any team, and the second one (W2) is the time (in minutes) elapsed until a first goal of any type is scored by the home team. The times are divided by 90 minutes (full game time) to obtain data on the unit square (0,1) × (0,1); see Table A1.
Table 4 provides descriptive statistics for the variables W1 and W2, including minimum, median, mean, maximum, standard deviation (SD), coefficient of variation (CV), coefficient of skewness (CS), and coefficient of kurtosis (CK). For the variable W1, the mean and median are 0.454 and 0.456, respectively; that is, the mean is almost equal to the median, which indicates symmetry in the data. The CV is 49.274%, which means a moderate level of dispersion is present around the mean. Furthermore, the CS value (0.164) also confirms the approximately symmetric nature. The variable W2 has mean 0.365 and median 0.311, which indicates slight positive skewness in the distribution of the data. Moreover, the CV value is 69.475%, showing a moderate level of dispersion around the mean. The CS value (0.522) confirms the slight skewness, and the CK value indicates low kurtosis in the data.
Table 4.
Summary statistics for the UEFA Champions League data set.
| Variables | n | Minimum | Median | Mean | Maximum | SD | CV | CS | CK |
|---|---|---|---|---|---|---|---|---|---|
| W1 | 37 | 0.022 | 0.456 | 0.454 | 0.911 | 0.224 | 49.274 | 0.164 | −0.930 |
| W2 | 37 | 0.022 | 0.311 | 0.365 | 0.944 | 0.254 | 69.475 | 0.522 | −0.839 |
The ML estimates and the standard errors (in parentheses) for the bivariate unit-log-symmetric model parameters are presented in Table 5. The extra parameters, associated with the log-Student-t, log-hyperbolic and log-slash models, were estimated by using the profile log-likelihood; see Saulo et al. [22]. Table 5 also presents the log-likelihood value, and the values of the Akaike (AIC) and Bayesian (BIC) information criteria. We observe that the log-hyperbolic model provides better fit than all other models based on the values of log-likelihood, AIC and BIC.
Table 5.
ML estimates (with standard errors in parentheses), and the log-likelihood, AIC and BIC values for the indicated bivariate unit-log-symmetric models.
| Distribution | η1 | η2 | σ1 | σ2 | ρ | Extra parameter | Log-likelihood | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|
| Log-normal | 0.5288* | 0.3414* | 0.8865* | 1.1355* | 0.4956* | – | −36.693 | 83.386 | 91.441 |
| | (0.0771) | (0.0637) | (0.1031) | (0.1320) | (0.1240) | | | | |
| Log-Student-t | 0.5541* | 0.3783* | 0.7431* | 0.9734* | 0.4723* | 7 | −35.487 | 80.974 | 89.029 |
| | (0.0751) | (0.0672) | (0.1033) | (0.1308) | (0.1463) | | | | |
| Log-hyperbolic | 0.5458* | 0.3816* | 0.8456* | 1.0950* | 0.4893* | 2 | −35.470 | 80.940 | 88.996 |
| | (0.0752) | (0.0677) | (0.1162) | (0.1462) | (0.1428) | | | | |
| Log-Laplace | 0.5680* | 0.5679* | 0.9928* | 1.3231* | 0.5281* | – | −36.009 | 82.019 | 90.073 |
| | (0.0020) | (0.0021) | (0.1692) | (0.2164) | (0.1639) | | | | |
| Log-slash | 0.5629* | 0.3715* | 0.6203* | 0.8302* | 0.4472* | 5 | −35.560 | 81.120 | 89.174 |
| | (0.0749) | (0.0666) | (0.0847) | (0.1096) | (0.1472) | | | | |
Note: * significant at 5% level.
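The information criteria in Table 5 follow from the log-likelihood through the standard definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k log n. The sketch below applies them to the log-normal row; the choice k = 5 (the two medians, the two scale parameters and ρ, with any extra parameter treated as fixed by the profile step) is an assumption inferred from the reported values, and the function names are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for log-likelihood loglik and k parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion for sample size n."""
    return -2.0 * loglik + k * math.log(n)

# Log-normal row of Table 5: log-likelihood -36.693, n = 37 matches
aic_log_normal = aic(-36.693, 5)
bic_log_normal = bic(-36.693, 5, 37)
```

With these inputs the functions return approximately 83.386 and 91.441, matching the log-normal row of Table 5.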
Figure 2 shows the QQ plots of , defined in (23), for the bivariate unit-log-symmetric models considered in Table 5. The QQ plot is a plot of the empirical quantiles of against the theoretical quantiles of the respective reference distribution (see Section 4.1). Therefore, points falling along a straight line would indicate a good fit. From Figure 2, we see clearly that, with the exception of log-Student-t case, the empirical distributions of in the considered models conform relatively well with their reference distributions. We also see that, in all the cases, there is a point away from the reference line, which may be an outlier.
Figure 2.
QQ plot of for the indicated models. (a) Log-normal (b) Log-Student-t (c) Log-hyperbolic (d) Log-Laplace (e) Log-slash
We can use the Kolmogorov-Smirnov test [9] to verify the assumption of the theoretical distribution of . As the log-hyperbolic model provided the best fit in terms of log-likelihood, AIC and BIC, we only report the result for this case. The p-value is 0.9998, and therefore we can not reject the null hypothesis that the sample is drawn from the reference distribution.
6.2. 2022 FIFA World Cup
We now use the data on the 2022 FIFA World Cup to illustrate the model developed in the preceding sections. These data are available at https://www.kaggle.com/. The first variable (W1) is the medium pass completion proportion, that is, the proportion of successful passes between 14 and 18 meters. The second variable (W2) is the long pass completion proportion, namely, for passes longer than 37 meters; see Table A1.
Table 6 provides descriptive statistics for the variables W1 and W2. We observe that the variable W1 has mean equal to the median (0.860), which indicates symmetry in the data. The CV is 4.376%, which means a low level of dispersion around the mean. Furthermore, the CS value is relatively small, which also confirms the symmetric nature. The variable W2 has mean equal to 0.550 and median equal to 0.556, which indicates approximate symmetry in the distribution of the data. Moreover, the CV value is 13.713%, showing a low level of dispersion around the mean. The CS value confirms the approximately symmetric nature, and the CK value indicates low kurtosis in the data.
Table 6.
Summary statistics for the 2022 FIFA World Cup data set.
| Variables | n | Minimum | Median | Mean | Maximum | SD | CV | CS | CK |
|---|---|---|---|---|---|---|---|---|---|
| W1 | 32 | 0.769 | 0.860 | 0.860 | 0.931 | 0.038 | 4.376 | −0.373 | −0.194 |
| W2 | 32 | 0.427 | 0.556 | 0.550 | 0.751 | 0.075 | 13.713 | 0.308 | −0.425 |
Table 7 presents the estimation results for the bivariate unit-log-symmetric models, and these reveal that the log-normal model provides better fit than all other models based on the values of log-likelihood, AIC and BIC.
Table 7.
ML estimates (with standard errors in parentheses), and the log-likelihood, AIC and BIC values for the indicated bivariate unit-log-symmetric models.
| Distribution | η1 | η2 | σ1 | σ2 | ρ | Extra parameter | Log-likelihood | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|
| Log-normal | 1.9872* | 0.7953* | 0.1364* | 0.2089* | 0.7343* | – | 20.791 | −31.581 | −24.252 |
| | (0.0479) | (0.0294) | (0.0171) | (0.0261) | (0.0815) | | | | |
| Log-Student-t | 1.9954* | 0.7936* | −0.1257* | −0.1949* | 0.7423* | 9 | 20.130 | −30.260 | −22.931 |
| | (0.0485) | (0.0299) | (0.0178) | (0.0271) | (0.0868) | | | | |
| Log-hyperbolic | 1.9908* | 0.7942* | 0.3956* | 0.6088* | 0.7378* | 10 | 20.618 | −31.236 | −23.907 |
| | (0.0482) | (0.0296) | (0.0523) | (0.0800) | (0.0841) | | | | |
| Log-Laplace | 1.9908* | 0.8089* | 0.1563* | 0.2425* | 0.7471* | – | 16.915 | −23.830 | −16.501 |
| | (0.0023) | (0.0021) | (0.0278) | (0.0415) | (0.0938) | | | | |
| Log-slash | 1.9897* | 0.7935* | 0.1173* | 0.1802* | 0.7392* | 8 | 20.613 | −31.226 | −23.898 |
| | (0.0482) | (0.0295) | (0.0154) | (0.0237) | (0.0844) | | | | |
Note: * significant at 5% level.
Figure 3 shows the QQ plots of (see Section 4.1) for the bivariate unit-log-symmetric models considered in Table 7. We see clearly that the log-normal model provides better fit than all other bivariate unit-log-symmetric models. By applying the Kolmogorov-Smirnov test for the log-normal case, the corresponding p-value is found to be 0.9993, and therefore, the null hypothesis is not rejected.
Figure 3.
QQ plot of for the indicated models. (a) Log-normal (b) Log-Student-t (c) Log-hyperbolic (d) Log-Laplace (e) Log-slash.
7. Conclusions
In this paper, we have proposed a family of bivariate distributions over the unit square. By suitably defining the density generator, we can transform any distribution over the real line into a bivariate distribution over the region (0,1) × (0,1). Such a model has several potential applications, since the simultaneous modeling of quantities like proportions, rates or indices frequently arises in applied sciences like economics, medicine, engineering and social sciences. We have discussed several theoretical properties such as stochastic representation, quantiles, conditional distributions, independence and moments. We have also carried out a Monte Carlo simulation study and have demonstrated some applications in the analysis of soccer data. The present research can be extended in several possible directions. By changing the density generator, numerous special forms of the BULS distribution can be constructed. Furthermore, generalizations to higher dimensions can be studied. We are currently working in these directions and hope to report the findings in a future paper.
Appendices.
Appendix 1. Some proofs and additional results
For the convenience of readers, we present here some complementary results relating to Section 3.3 and the proofs of Propositions 3.13 and 3.14.
Definition A.1
We say that a random variable X follows a univariate generalized hyperbolic (GH) distribution, denoted by , if its PDF is given by
Here, is the modified Bessel function of the third kind with index r, and is a scale parameter.
The following result has appeared in a multivariate version in Proposition 3 of Deng and Yao [7].
Proposition A.2 Hyperbolic generator —
Let be a random vector as in Proposition 3.1. If , then the conditional distribution of , given , is and both of its unconditional distributions are .
Proof.
By using (8), the joint PDF of and is (see Table 1)
(A1) So, the marginal PDF of is given by
By using Formula 6 of Section 3.46–3.48 of (see p. 364 of Gradshteyn and Ryzhik [10]) that , the above integral becomes
(A2) Now, as , the above expression becomes
which proves that . Similarly, we can show that , as well.
On the other hand, from (A1) and (A2), and by using the well-known identity , the conditional PDF of , given , is obtained as
which completes the proof.
The following result has also appeared in a multivariate version in Theorem 6.7.1 of Kotz et al. [14].
Proposition A.3 Laplace generator —
Let be a random vector as in Proposition 3.1. If , then the conditional distribution of , given , is and both of its unconditional distributions are .
Proof.
By (8) and using the definitions of and in Table 1, we have
(A3) We then find the marginal density of to be
(A4) where is the Laplace PDF with scale parameter ; that is, . Similarly, we can show that , as well.
On the other hand, by using (A3), (A4) and the well-known identity , the conditional PDF of , given , is obtained as
Then, from Definition A.1, we simply have , as required.
Definition A.4
We say that a random variable X follows a univariate extended slash (ESL) distribution, denoted by , if its PDF is given by
where ϕ denotes the PDF of the standard normal distribution.
If we now choose a=0, the classical slash (SL) PDF is obtained, given by
In this case, we denote it by .
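For orientation, the sketch below evaluates the usual textbook slash density, that of Z/U^(1/q) with Z standard normal and U uniform on (0, 1), written as the scale mixture q ∫_0^1 t^q ϕ(tx) dt. This form and the tail parameter `q` are assumptions. The exact parametrization of the ESL and SL families in Definition A.4 is not recoverable from the text as extracted and may differ. For q = 1 the integral has the closed form (ϕ(0) - ϕ(x))/x², which the sketch uses as a check.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def slash_pdf(x, q=1.0, steps=2000):
    """Slash density q * integral_0^1 t**q * phi(t*x) dt, computed with
    the composite Simpson rule (steps must be even, q > 0)."""
    if q <= 0:
        raise ValueError("q must be positive")
    if steps % 2:
        raise ValueError("steps must be even")

    def integrand(t):
        return (t ** q) * phi(t * x)

    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return q * h * total / 3.0


def slash_pdf_closed_form_q1(x):
    """Closed form for q = 1; the limit at x = 0 is phi(0) / 2."""
    if x == 0:
        return 0.5 * phi(0.0)
    return (phi(0.0) - phi(x)) / (x * x)


if __name__ == "__main__":
    for x in (0.0, 0.7, 1.3, 3.0):
        print(x, slash_pdf(x), slash_pdf_closed_form_q1(x))
```

The mixture form makes the heavy tails visible: for large |x| the q = 1 density behaves like ϕ(0)/x², much slower than the normal decay.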
Proposition A.5 (Slash generator)
Let be a random vector as in Proposition 3.1. If , then the conditional distribution of , given , is and both of its unconditional distributions are .
Proof.
By (8) and using the definition of in Table 1, the joint PDF of and is given by
(A5) So, the marginal PDF of is
(A6) which proves that (see Definition A.4), . Similarly, we can show that .
On the other hand, by (A5) and (A6), the conditional PDF of , given , is obtained as
From Definition A.4, we then find that , as required.
Proof of Proposition 3.13.
Writing and , a simple algebraic manipulation shows that
Upon using the law of total expectation, the corresponding CDF of is given by (for 0<w<1)
(A7) If , then . So, the conditional distribution of , given , is the same as the distribution of
Hence, the PDF of , given , is given by
By using Lemma 3.3, the above density can be expressed as
(A8) where , and , and are as given in (18).
Replacing (A8) in (A7), we get the following formula for the CDF of the squared Mahalanobis distance:
Moreover, the vector has a standard elliptical (spherical) distribution (see Item 2.11 of Saulo et al. [23]), denoted by , where is the density generator in (5). Hence, the required result follows.
Proof of Proposition 3.14.
By differentiating (in Proposition 3.13) with respect to w and then by using the following well-known Leibniz integral rule
the required result follows.
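For reference, the Leibniz integral rule in its usual general form is recalled here. The symbols a, b and f are generic placeholders rather than the paper's notation, and the statement assumes that f and its partial derivative in w are continuous and that a and b are differentiable.

```latex
\frac{\mathrm{d}}{\mathrm{d}w}\int_{a(w)}^{b(w)} f(w,t)\,\mathrm{d}t
  = f\bigl(w,b(w)\bigr)\,b'(w)
  - f\bigl(w,a(w)\bigr)\,a'(w)
  + \int_{a(w)}^{b(w)} \frac{\partial f}{\partial w}(w,t)\,\mathrm{d}t .
```

When the limits of integration do not depend on w, the two boundary terms vanish and only the integral of the partial derivative remains.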
Appendix 2. Data sets
Table A1.
UEFA Champions League and 2022 FIFA World Cup data sets.
| | UEFA | | FIFA | |
|---|---|---|---|---|
| | W1 | W2 | W1 | W2 |
| 1 | 0.289 | 0.222 | 0.888 | 0.541 |
| 2 | 0.700 | 0.200 | 0.815 | 0.474 |
| 3 | 0.211 | 0.211 | 0.907 | 0.624 |
| 4 | 0.733 | 0.944 | 0.891 | 0.606 |
| 5 | 0.444 | 0.444 | 0.827 | 0.517 |
| 6 | 0.544 | 0.544 | 0.898 | 0.557 |
| 7 | 0.089 | 0.089 | 0.856 | 0.462 |
| 8 | 0.767 | 0.789 | 0.861 | 0.618 |
| 9 | 0.433 | 0.433 | 0.890 | 0.603 |
| 10 | 0.911 | 0.533 | 0.860 | 0.477 |
| 11 | 0.800 | 0.800 | 0.920 | 0.646 |
| 12 | 0.733 | 0.689 | 0.894 | 0.587 |
| 13 | 0.278 | 0.100 | 0.913 | 0.648 |
| 14 | 0.456 | 0.033 | 0.849 | 0.471 |
| 15 | 0.178 | 0.833 | 0.781 | 0.427 |
| 16 | 0.200 | 0.200 | 0.828 | 0.442 |
| 17 | 0.244 | 0.156 | 0.864 | 0.581 |
| 18 | 0.467 | 0.467 | 0.820 | 0.527 |
| 19 | 0.022 | 0.022 | 0.846 | 0.526 |
| 20 | 0.400 | 0.578 | 0.879 | 0.601 |
| 21 | 0.378 | 0.378 | 0.860 | 0.481 |
| 22 | 0.589 | 0.433 | 0.885 | 0.616 |
| 23 | 0.600 | 0.078 | 0.862 | 0.592 |
| 24 | 0.567 | 0.311 | 0.769 | 0.463 |
| 25 | 0.844 | 0.711 | 0.845 | 0.495 |
| 26 | 0.711 | 0.167 | 0.846 | 0.489 |
| 27 | 0.289 | 0.533 | 0.931 | 0.751 |
| 28 | 0.178 | 0.178 | 0.863 | 0.555 |
| 29 | 0.489 | 0.144 | 0.856 | 0.447 |
| 30 | 0.278 | 0.156 | 0.879 | 0.569 |
| 31 | 0.611 | 0.122 | 0.812 | 0.613 |
| 32 | 0.544 | 0.544 | 0.841 | 0.594 |
| 33 | 0.267 | 0.267 | – | – |
| 34 | 0.489 | 0.333 | – | – |
| 35 | 0.467 | 0.033 | – | – |
| 36 | 0.300 | 0.522 | – | – |
| 37 | 0.311 | 0.311 | – | – |
Funding Statement
Roberto Vila and Helton Saulo gratefully acknowledge financial support from CNPq, CAPES and FAPDF, Brazil.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Abdous B., Fougères A.-L., and Ghoudi K., Extreme behaviour for bivariate elliptical distributions, Canad. J. Statist. 33 (2005), pp. 317–334.
- 2. Arnold B.C. and Ng H.K.T., Flexible bivariate beta distributions, J. Multivariate Anal. 102 (2011), pp. 1194–1202.
- 3. Balakrishnan N. and Lai C.-D., Continuous Bivariate Distributions, Springer-Verlag, New York, 2009.
- 4. Barreto-Souza W. and Lemonte A., Bivariate Kumaraswamy distribution: properties and a new method to generate bivariate classes, Statistics 47 (2013), pp. 1321–1342.
- 5. Cox D.R. and Hinkley D.V., Theoretical Statistics, Chapman and Hall, London, England, 1974.
- 6. Davison A.C., Statistical Models, Cambridge University Press, Cambridge, England, 2008.
- 7. Deng X. and Yao J., On the property of multivariate generalized hyperbolic distribution and the Stein-type inequality, Comm. Statist. Theory Methods 47 (2018), pp. 5346–5356.
- 8. Fang K.T., Kotz S., and Ng K.W., Symmetric Multivariate and Related Distributions, Chapman and Hall, London, England, 1990.
- 9. Govindarajulu Z., Nonparametric Inference, World Scientific Publishers, Singapore, 2007.
- 10. Gradshteyn I.S. and Ryzhik I.M., Table of Integrals, Series and Products, Academic Press, San Diego, 2000.
- 11. Heckman J.J., Sample selection bias as a specification error, Econometrica 47 (1979), pp. 153–161.
- 12. Hössjer O. and Sjölander A., Sharp lower and upper bounds for the covariance of bounded random variables, Statist. Probab. Lett. 182 (2022), p. 109323.
- 13. James B.R., Probabilidade: Um Curso Em Nível Intermediário, Projeto Euclides, Brazil, 2004.
- 14. Kotz S., Kozubowski T.J., and Podgórski K., The Laplace Distribution and Generalizations, John Wiley & Sons, New York, 2001.
- 15. Mäkeläinen T., Schmidt K., and Styan G.P.H., On the existence and uniqueness of the maximum likelihood estimate of a vector-valued parameter in fixed-size samples, Ann. Statist. 9 (1981), pp. 758–767.
- 16. Martínez-Flórez G., Lemonte A.J., Moreno-Arenas G., and Tovar-Falón R., The bivariate unit-sinh-normal distribution and its related regression model, Mathematics 10 (2022), p. 3125.
- 17. Meintanis S.G., Test of fit for Marshall-Olkin distributions with applications, J. Statist. Plann. Inference 137 (2007), pp. 3954–3963.
- 18. Nadarajah S., Shih S.H., and Nagar D.K., A new bivariate beta distribution, Statistics 51 (2017), pp. 455–474.
- 19. Nelsen R.B., An Introduction to Copulas, 2nd ed., Springer-Verlag, New York, 2006.
- 20. Özbilen Ö. and Genç A.İ., A bivariate extension of the Omega distribution for two-dimensional proportional data, Math. Slovaca 72 (2022), pp. 1605–1622.
- 21. Roncalli T., Copulas and dependence modeling, in Handbook of Financial Risk Management, 1st ed., Chapman and Hall/CRC Press, Boca Raton, FL, 2020.
- 22. Saulo H., Dasilva A., Leiva V., Sánchez L., and Fuente-Mella H.L., Log-symmetric quantile regression models, Stat. Neerl. 76 (2022), pp. 124–163.
- 23. Saulo H., Vila R., Cordeiro S.S., and Leiva V., Bivariate symmetric Heckman models and their characterization, J. Multivariate Anal. 193 (2023), p. 105097.
- 24. Vanegas L.H. and Paula G.A., Log-symmetric distributions: statistical properties and parameter estimation, Braz. J. Probab. Stat. 30 (2016), pp. 196–220.
- 25. Vila R., Balakrishnan N., Saulo H., and Protazio A., Bivariate log-symmetric models: theoretical properties and parameter estimation, preprint (2022). Available at arXiv:2211.13839, pp. 1–23.
- 26. Wang J. and Genton M., The multivariate skew-slash distribution, J. Statist. Plann. Inference 136 (2006), pp. 209–220.



