Abstract
Sample entropy based tests, methods of sieves and Grenander estimation type procedures are known to be very efficient tools for assessing normality of underlying data distributions in one-dimensional nonparametric settings. Recently, it has been shown that the density based empirical likelihood (EL) concept extends and standardizes these methods, presenting a powerful approach for approximating optimal parametric likelihood ratio test statistics in a distribution-free manner. In this paper, we discuss the difficulties involved in constructing density based EL ratio techniques for testing bivariate normality and propose a solution to this problem. Toward this end, a novel bivariate sample entropy expression is derived and shown to be consistent with the established principles of bivariate histogram density estimation. Monte Carlo results show that the new density based EL ratio tests for bivariate normality perform very well for finite sample sizes. To illustrate the applicability of the proposed approach, we present a real data example related to a study of biomarkers associated with myocardial infarction.
Keywords: Bivariate normality, Density estimation, Empirical likelihood, Entropy, Goodness-of-fit, Histogram density estimation
1. Introduction and the statement of problem
Various statistical topics dealing with bivariate normally distributed data have broadened their appeal in recent theoretical and applied publications, which provide a wide range of new methods for multivariate statistical analysis (e.g., Balakrishnan and Lai [2]). This motivates the growing need for developing and evaluating powerful tests for bivariate normality (e.g., Balakrishnan and Lai [2]; Hawkins [7]; Kowalski [13]; Mecklin and Mundfrom [19]). Testing bivariate data for normality is much more difficult in practice than testing univariate data. Commonly, techniques for detecting departures from the bivariate normal distribution are developed as modifications of conventional test procedures known in the context of assessing univariate normality. In many cases, in order to test goodness-of-fit of two-dimensional normal distribution functions, the literature proposes one-dimensional test statistics (e.g., Balakrishnan and Lai [2]). In this framework, we note that it is not sufficient to test the corresponding univariate marginal distributions for normality, since scenarios in which the marginal distributions are normal but the joint distribution fails to be bivariate normal can occur.
In this paper we propose and examine a bivariate extension of the one-dimensional sample entropy based concept (e.g., Vasicek [30]) using the density based empirical likelihood (EL) methodology (e.g., Vexler et al. [33]). We then propose density based EL ratio tests for bivariate normality. To this end, we first outline the following material regarding the basic sources used in the new development.
1.1. Empirical likelihood and sample entropy
When functional forms of underlying data distributions are completely specified, the parametric likelihood approach is unarguably a powerful tool that provides optimal statistical inference. In such cases, by virtue of the Neyman-Pearson lemma, the likelihood ratio tests yield the most powerful decision making rules (e.g., Lehmann and Romano [16]; Vexler et al. [33]). However, the parametric likelihood methods cannot be applied properly if assumptions on the forms of data distributions do not hold. The distribution function based EL methods were introduced as nonparametric alternatives to parametric likelihood techniques (e.g., Lazar and Mykland [14]; Owen [22]). Commonly, the conventional EL function has the form $\prod_{i=1}^{n} p_i$, where the probability weights, pi, i = 1, …, n, satisfy the assumptions 0 < pi < 1, i = 1, …, n, and the values of pi, i = 1, …, n, are derived by maximizing the EL function under empirical constraints. For example, when we observe independent and identically distributed (i.i.d.) data points X1, …, Xn, under the null hypothesis that E(X1) = 0, the corresponding empirical constraint is $\sum_{i=1}^{n} p_i X_i = 0$, together with $\sum_{i=1}^{n} p_i = 1$.
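For illustration, the maximization above can be carried out explicitly in the mean-zero example: Lagrange multipliers give pi = 1/{n(1 + λXi)}, with λ solving Σ Xi/(1 + λXi) = 0 (e.g., Owen [22]). The following minimal R sketch implements this computation; the function name el_weights is ours.

```r
# Minimal sketch: EL weights p_i maximizing prod(p_i) subject to
# sum(p_i) = 1 and sum(p_i * X_i) = 0 (the mean-zero constraint).
# Lagrange multipliers yield p_i = 1 / (n * (1 + lambda * X_i)),
# where lambda solves sum(X_i / (1 + lambda * X_i)) = 0.
el_weights <- function(x) {
  n <- length(x)
  g <- function(lambda) sum(x / (1 + lambda * x))
  # Positivity of the weights requires 1 + lambda * x_i > 0 for all i,
  # so lambda lies in (-1/max(x), -1/min(x)) when 0 is inside the data range.
  lambda <- uniroot(g, c(-1 / max(x) + 1e-8, -1 / min(x) - 1e-8))$root
  1 / (n * (1 + lambda * x))
}

set.seed(1)
x <- rnorm(20)
p <- el_weights(x)
c(sum(p), sum(p * x))  # (1, ~0)
```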
The recent statistical literature has introduced the density-based EL (dbEL) approach for creating nonparametric test statistics that approximate parametric Neyman-Pearson statistics (e.g., Vexler et al. [33]). The dbEL method proposes to consider the likelihood function in the form
$$L_f = \prod_{i=1}^{n} f(X_i) = \prod_{j=1}^{n} f_j,$$
where f(⋅) is a density function of the observations X1, …, Xn, X(1) ≤ … ≤ X(n) are the order statistics based on X1, …, Xn, and fj = f(X(j)), j = 1, …, n. Then, one can estimate the values of fj, j = 1, …, n, by maximizing Lf subject to a constraint related to the empirical version of the density property ∫ f(x)dx = 1. In this case, the following lemma (Vexler and Gurevich [31]) plays a key role.
Lemma 1. Let f(x) be a density function. Then
$$\frac{1}{2m}\sum_{j=1}^{n}\int_{X_{(j-m)}}^{X_{(j+m)}} f(u)\,du \le 1,$$
where m < n/2, X(j) = X(1) if j ≤ 1, and X(j) = X(n) if j ≥ n.

In terms of constructing the dbEL ratio test for univariate normality, Lemma 1 leads to the following inference. By virtue of the inequality above,
we obtain the empirical constraint
$$\frac{1}{2m}\sum_{j=1}^{n}\left(X_{(j+m)} - X_{(j-m)}\right) f_j = 1,$$
where the expressions (X(j+m) − X(j−m))fj, j = 1, …, n, are Mean-Value Theorem type approximations to the integrals $\int_{X_{(j-m)}}^{X_{(j+m)}} f(u)\,du$, which appear in Lemma 1. Thus, the method of Lagrange multipliers provides the values of f1, …, fn, which maximize log(Lf) and satisfy the constraint above, resulting in
$$f_j = \frac{2m}{n\left(X_{(j+m)} - X_{(j-m)}\right)}, \quad j = 1, \ldots, n.$$
Therefore, taking into account the maximum likelihood function, say LN, under the null hypothesis H0: X1, …, Xn are normally distributed, where LN ∝ (2πes²)^(−n/2) with $s^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ and $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$, we obtain the dbEL ratio
$$T_{mn} = \frac{\prod_{j=1}^{n} 2m\left\{n\left(X_{(j+m)} - X_{(j-m)}\right)\right\}^{-1}}{\left(2\pi e s^2\right)^{-n/2}}$$
that is known to be an efficient test statistic based on sample entropy (e.g., Vasicek [30]; Arizono and Ohta [1]). In order to develop the sample entropy based test for normality, Vasicek [30] applied the property of the normal distribution that its entropy exceeds that of any other distribution whose density has the same variance. The dbEL approach extends this sample entropy based mechanism to general methods for univariate goodness-of-fit testing. The test for normality based on sample entropy is an exponential rate optimal procedure (see Tusnady [28] for details). This accords with the fact that likelihood ratio type tests oftentimes have optimal statistical properties.
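To make the construction concrete, the following R sketch computes log(Tmn) from the formulas above, using the boundary conventions of Lemma 1 and the maximum likelihood variance estimator s²; the function name log_Tmn is ours.

```r
# Sketch of the univariate dbEL ratio statistic on the log scale:
# log(T_mn) = sum_j log[2m / {n (X_(j+m) - X_(j-m))}] + (n/2) log(2*pi*e*s^2).
log_Tmn <- function(x, m) {
  n <- length(x)
  xs <- sort(x)
  upper <- xs[pmin(1:n + m, n)]  # convention: X_(j) = X_(n) if j >= n
  lower <- xs[pmax(1:n - m, 1)]  # convention: X_(j) = X_(1) if j <= 1
  s2 <- mean((x - mean(x))^2)    # ML variance estimator
  sum(log(2 * m / (n * (upper - lower)))) +
    (n / 2) * log(2 * pi * exp(1) * s2)
}

set.seed(1)
log_Tmn(rnorm(50), m = 5)  # large values indicate departures from normality
```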
In the construction process of the test statistic Tmn shown above, we used the Mean-Value Theorem type approximation to the constraint
$$\frac{1}{2m}\sum_{j=1}^{n}\int_{X_{(j-m)}}^{X_{(j+m)}} f(u)\,du = 1.$$
By virtue of Lemma 1, one can expect that
$$\frac{1}{2m}\sum_{j=1}^{n}\int_{X_{(j-m)}}^{X_{(j+m)}} f(u)\,du \to 1 \quad \text{as } n \to \infty.$$
In this case, the integer m should increase when n → ∞, provided that m/n → 0, since, in light of Lemma 1, the corresponding remainder terms related to the constraint above need to vanish asymptotically (e.g., Vexler et al. [32]). In general, if m has a fixed value, the approximation to the parametric likelihood Lf is not consistent (see the Supplement, Appendix A, for technical details). This is an interesting point, since one might naively anticipate that fixed values of m can provide 'good' approximations to the integrals $\int_{X_{(j-m)}}^{X_{(j+m)}} f(u)\,du$ as n → ∞, shortening the distances between these integrals and their estimators (X(j+m) − X(j−m))fj, j = 1, …, n. However, when m is fixed and n → ∞, the number of approximated integrals relative to the normalizing factor 2m is larger than in the case with m → ∞. This enlarges the total error of the applied Mean-Value Theorem type approximations in the constraint, since the intervals [X(i−m), X(i+m)], i = 1, …, n, overlap.
Note that the dbEL technique mentioned above can be employed to estimate density functions in the maximum likelihood manner, yielding a class of histogram density estimators (e.g., Izenman [9]; Prakasa Rao [23]). In this framework, we may use fixed values of m, assuming that f is a monotonically decreasing density function. This can yield procedures related to Grenander's estimation and the method of sieves in the context of nonparametric density function evaluation (Izenman [9]; Carolan and Dykstra [4]; Efromovich [5: pp. 341–343]).
A further advance of the dbEL approach based on Lemma 1's consequence addresses the fact that, in practice, we must specify the value of m in order to apply the test statistic Tmn. It turns out that, since we employ the likelihood concept to derive Tmn, the maximum likelihood principle implies the test statistic
$$T_n = \min_{m} T_{mn},$$
where the operator 'min' automatically sets up nearly optimal values of m in the test statistic Tmn (Vexler and Gurevich [31]). In general, the optimal values of m, which maximize the power of the test based on Tmn, can be calculated using information regarding the alternative distribution of the observations.
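A minimal sketch of this 'min' operation, reusing log_Tmn from the sketch above; the grid of m values below is an illustrative choice, not the exact range used in Vexler and Gurevich [31].

```r
# Sketch: the maximum likelihood principle suggests minimizing the dbEL
# ratio over m; the m-grid here is illustrative (m < n/2).
log_Tn <- function(x, m_grid = 1:max(1, floor(length(x) / 2) - 1)) {
  min(sapply(m_grid, function(m) log_Tmn(x, m)))
}
```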
The dbEL method has been successfully applied to construct various nonparametric decision-making schemes, significantly improving power as compared to the corresponding classical procedures (e.g., Vexler et al. [33]).
1.2. Bivariate extensions to the univariate density based empirical likelihood and sample entropy expressions
Let (X, Y)T denote a bivariate random vector with joint density function f(x, y). Consider a sample from f(x, y) in the form {(Xi, Yi), i = 1, …, n}. In this case, the likelihood function is
$$L_f = \prod_{i=1}^{n} f(X_i, Y_i) = \prod_{i=1}^{n} f\left(X_{(i)}, Y_{[i]}\right) = \prod_{i=1}^{n} f_i, \quad (1)$$
where the sample is ordered by the Xi's, the notation Y[i] denotes the concomitant of the i-th order statistic X(i), i.e., the Y-variate associated with X(i), and fi = f(X(i), Y[i]), i = 1, …, n.
In this case, according to the dbEL technique, we aim to maximize Lf subject to an empirical constraint corresponding to the requirement ∬f(x, y)dxdy = 1. The problem is to approximate the double integral ∬f(x, y)dxdy using only n data points. That is to say, although it would be desirable to apply n × n points in a Riemann-type manner to approximate the double integral, we cannot employ the couples (Xi, Yj), i ≠ j, which are not associated with fi, i = 1, …, n, and do not appear in Lf defined in (1). In this context, one can attempt to adapt multivariate entropy estimation algorithms (Berrett et al. [3]; Kozachenko and Leonenko [12]) using open circles around the points (X(i), Y[i]), i = 1, …, n, with radii defined via nearest neighbour distances.
The methods for estimating the entropy of a random vector can provide consistent evaluations of ∬f(x, y)log(f(x, y))dxdy. However, in terms of the target construction of approximations to ∬f(x, y)dxdy based on fi, i = 1, …, n, these approaches cannot be directly employed, due to the very complicated overlaps between the open circles around the points (X(i), Y[i]), i = 1, …, n.
In order to avoid the issue above, one can reduce the dimension of the testing problem via an application of projection pursuit techniques (Zhu et al. [38]). In this framework, it can be proposed to use the fact that (X, Y)T is bivariate normally distributed, say (X, Y)T ~ N2(μ, V) with μ = E{(X, Y)T} and variance-covariance matrix V, if and only if for every vector a ∈ R2 such that aTa = 1 and aTVa ≠ 0, we have
$$a^{T}(X, Y)^{T} \sim N_1\left(a^{T}\mu,\; a^{T}Va\right).$$
Now, we can compute estimators $\hat{\mu}$, $\hat{V}$ of the parameters μ, V and then consider the one-dimensional observations
$$Z_{a,i} = \frac{a^{T}\left\{(X_i, Y_i)^{T} - \hat{\mu}\right\}}{\left(a^{T}\hat{V}a\right)^{1/2}}, \quad i = 1, \ldots, n,$$
in order to derive the dbEL ratio depending on a via the method shown in Section 1.1. The obtained dbEL ratio, as a function of a, can then be, for example, integrated over different values of a, yielding a final test statistic. It is clear that this approach might suffer from an efficiency loss when the likelihood function Lf under the alternative hypothesis is replaced by that based on Za,i, i = 1, …, n. Properties of the decision making procedure based upon projection pursuit depend significantly on the way the final test statistic is summarized with respect to different values of a.
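For illustration, the R sketch below computes the projected observations Za,i for a unit vector a parameterized by an angle; note that the standardized form of Za,i displayed above is our reconstruction of the missing expression, so the sketch should be read under that assumption.

```r
# Sketch: one-dimensional projections Z_{a,i} of the bivariate data onto
# a unit vector a = (cos(theta), sin(theta)), standardized by the
# estimated mean vector and covariance matrix.
project_data <- function(xy, theta) {
  a <- c(cos(theta), sin(theta))
  mu_hat <- colMeans(xy)
  V_hat <- cov(xy)
  drop(sweep(xy, 2, mu_hat) %*% a) / sqrt(drop(t(a) %*% V_hat %*% a))
}

set.seed(2)
xy <- cbind(rnorm(50), rnorm(50))
z <- project_data(xy, theta = pi / 4)  # Z_{a,i}'s, to be tested for N(0,1)
```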
In a similar manner to that shown above, one can evaluate the option of transforming the original variates X and Y to independent normal variates (Balakrishnan and Lai [2: p. 509]), an operation that is, in general, correct only under the null hypothesis H0: (X, Y)T ~ N2(μ, V). Note also that algorithms to transform (X, Y)T ~ N2(μ, V) into two independent normal variates, say ZX and ZY, depend on the parameters μ, V, which are unknown. If μ and V are estimated, the target distributional properties of (ZX, ZY) hold only approximately.
In this paper, we carry out an accurate scheme to apply Lemma 1's result to approximate ∬f(x, y)dxdy based on fi, i = 1, …, n. Then, in Section 2, maximizing Lf defined in (1), we obtain estimators of fi, i = 1, …, n, in forms that can be directly associated with bivariate histogram density estimation (Kim and van Ryzin [11]; van Ryzin [29]; Prakasa Rao [23: pp. 234–235]; Izenman [9: pp. 209–210, 212–213]). This reveals a natural link between the univariate sample entropy based approach shown in Section 1.1 and the proposed methodology (see Section 3 for details). We study the asymptotic consistency of the bivariate dbEL technique in Section 4. In this context, due to the complexity of the dbEL structure, we assume a condition on f(x, y) in order to show rigorously that the bivariate dbEL is a consistent approximation of the parametric likelihood Lf. Various Monte Carlo experiments based on more than one hundred different scenarios of (X, Y)-distributions and a variety of sample sizes n showed that the theoretical condition applied in Section 4 is not critical. We therefore believe the bivariate dbEL approach is consistent in more general cases.
In Section 5, we develop the dbEL ratio tests for bivariate normality. Consider the simple example assuming that X1, …, X50 are i.i.d. observations from a standard normal distribution and Y1, …, Y50 are defined as Yi = τiXi, i = 1, …, 50, where the random variables τi = −1 or 1, i = 1, …, 50, are i.i.d. and independent of X1, …, X50 with Pr(τ1 = −1) = 0.5. This is a conventional scenario in which X1, …, X50 ~ N1(0,1) and Y1, …, Y50 ~ N1(0,1), but (X1, Y1), …, (X50, Y50) are not bivariate normal. In this case, the Shapiro-Wilk test (R procedure “mvShapiro.Test”, R Development Core Team [24]) and classical Mardia’s test for bivariate normality show powers of 0.06 and 0.38 at the significance level of 5%, respectively, whereas the new tests we propose provide powers of 0.83 and 0.92 (see Section 6 for details). One advantage of the proposed technique is that, by applying the dbEL approach, we can powerfully detect failures of underlying data to be bivariate normal. In Section 6, an extensive Monte Carlo study is employed to support this conclusion.
In Section 7 the proposed tests are applied to a biomarker study associated with myocardial infarction (MI). The epidemiological literature indicates significant associations between the biomarkers vitamin E and cholesterol and MI. We demonstrate that the new tests based on measurements of the vitamin E and cholesterol biomarkers exhibit high and stable power characteristics in comparison to well-known decision making procedures. We conclude with remarks in Section 8. Finally, the technical proofs of the theoretical results shown in this paper are given in the Supplement. The online supplementary material of this paper also presents R code to implement the proposed method.
2. The bivariate density based empirical likelihood
In this section, we introduce the algorithm for developing the bivariate dbEL approximation to the likelihood function Lf defined in (1). Toward this end, we begin by outlining the following scheme for constructing an empirical version of the constraint ∬f(x, y)dxdy = 1 based on the observations (X(i), Y[i]), i = 1, …, n. The proposed algorithm consists of two stages: (A) we use Lemma 1 with respect to the density function f(x) of X, employing X(1), …, X(n); and then (B) we use Lemma 1 with respect to the conditional density function f(y|x), employing the Y’s linked to the X’s involved in the corresponding procedures of Stage (A).
Lemma 1 applied to the marginal density function
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$
provides the inequality
$$\frac{1}{2m}\sum_{i=1}^{n}\int_{X_{(i-m)}}^{X_{(i+m)}}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx \le 1, \quad (2)$$
where m < n/2.
Next, we use Lemma 1 to approximate ∫f(y|x)dy. To this end, we rewrite the left term of (2) as
$$\frac{1}{2m}\left(\sum_{i=1}^{m} + \sum_{i=m+1}^{n-m} + \sum_{i=n-m+1}^{n}\right)\int_{X_{(i-m)}}^{X_{(i+m)}} f_X(x)\int_{-\infty}^{\infty} f(y\,|\,x)\,dy\,dx. \quad (3)$$
In order to consider the summands in (3) with 1 ≤ i ≤ m, m + 1 ≤ i ≤ n − m and n − m + 1 ≤ i ≤ n, corresponding to the sums on the right side of (3), we define the order statistics
$$Y_{(1:k,j)} \le Y_{(2:k,j)} \le \ldots \le Y_{(j-k+1:k,j)}$$
based on Y[k], …, Y[j], the concomitants of X(k), …, X(j), j ≥ k, respectively. We specify that Y(r:k,j) = Y(1:k,j), if r ≤ 1, and Y(r:k,j) = Y(j−k+1:k,j), if r ≥ j − k + 1.
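The bookkeeping behind Y(r:k,j) can be expressed compactly in R; the helper below (names ours) takes the concomitant vector, i.e., the Y-values sorted by their X-partners, and applies the clamping conventions above.

```r
# Helper sketch for Y_(r:k,j): the r-th order statistic of the
# concomitants Y_[k], ..., Y_[j], with Y_(r:k,j) = Y_(1:k,j) for r <= 1
# and Y_(r:k,j) = Y_(j-k+1:k,j) for r >= j - k + 1.
y_window_os <- function(y_conc, k, j, r) {
  w <- sort(y_conc[k:j])          # order statistics of Y_[k], ..., Y_[j]
  w[min(max(r, 1), j - k + 1)]    # clamp the index r
}

set.seed(3)
x <- rnorm(10); y <- rnorm(10)
y_conc <- y[order(x)]             # concomitants: Y-values sorted by X
y_window_os(y_conc, k = 2, j = 7, r = 3)  # Y_(3:2,7)
```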
In the case of 1 ≤ i ≤ m, we have i + m observations Y(1:1,i+m) ≤ … ≤ Y(i+m:1,i+m), since it is defined that X(j) = X(1), if j ≤ 1. Then, by virtue of Lemma 1, for k < m / 2 and a fixed x, we obtain
| (4) |
In the case of m + 1 ≤ i ≤ n − m, we observe 2m + 1 data points Y(1:i−m,i+m) ≤ … ≤ Y(2m+1:i−m,i+m). Then, we have
| (5) |
In the case of n − m + 1 ≤ i ≤ n, we observe n − i + m + 1 data points Y(1:i−m,n) ≤ … ≤ Y(n−i+m+1:i−m,n), since it is defined that X(j) = X(n), if j ≥ n. Then, we have
| (6) |
Note that, for any fixed x, hr,i,m,k(x) → 1, r = 1, 2, 3, when k / m → 0 as k, m → ∞. It is clear that Equations (2)–(6) provide the constraint
| (7) |
where Hm,k is defined as
Simple algebra shows that the statistic Hm,k can be presented in the form
Now, in accordance with the dbEL-developing procedure introduced in Section 1.1, we will use the mean-value approximations to the integrals in Hm,k. Let X[r:k,j] denote the concomitant of Y(r:k,j). The couple (X[r:k,j], Y(r:k,j)) belongs to the data points {(Xi, Yi), i = 1, …, n}, and f(X[r:k,j], Y(r:k,j)) appears in Lf defined in (1). Consider the following situations with respect to the summands that appear in the definition of Hm,k. In the cases with 1 ≤ i ≤ m, 1 ≤ j ≤ i + m, we obtain
| (8) |
Note that, in this framework, it holds that Y(j−k:1,i+m) ≤ Y(j:1,i+m) ≤ Y(j+k:1,i+m) and X(i−m) ≤ X[j:1,i+m] ≤ X(i+m). In the cases with m + 1 ≤ i ≤ n − m, 1 ≤ j ≤ 2m + 1, we have
| (9) |
In the cases with n – m + 1 ≤ i ≤ n, 1 ≤ j ≤ n – i + m + 1, we have
| (10) |
Thus, using (8)–(10), we represent constraint (7) in its empirical form
| (11) |
defining
Note that the left-hand side of (11) is a sum of, say, w different summands, and each summand involves one multiplier f(Xl, Yl) for some l ∈ [1, n]. Therefore, there are several summands in (11) with equivalent multipliers f(X(l), Y[l]) = fl, l = 1, 2, …, n. This can complicate the use of the Lagrange method for deriving the values of fi, i = 1, …, n, that maximize Lf defined in (1) while satisfying (11). Taking this issue into account, we rewrite the left-hand side of (11) as a sum of n terms with coefficients fi, i = 1, …, n. To this end, we define the rank of the observation Y[r] with respect to Y[c], Y[c+1], …, Y[d] as
$$\rho\left(Y_{[r]}, c, d\right) = \sum_{i=c}^{d} I\left(Y_{[i]} \le Y_{[r]}\right),$$
where I(⋅) is the indicator function and 1 ≤ c ≤ d ≤ n. Then, using some reorganization (see the Supplement, Appendix B, for details), we have
| (12) |
where Gi,m,k are defined in accordance with the following scheme:
- (a) for 1 ≤ i ≤ m,
- (b) for m + 1 ≤ i ≤ 2m,
- (c) for 2m + 1 ≤ i ≤ n − 2m,
- (d) for n − 2m + 1 ≤ i ≤ n − m,
- (e) for n − m + 1 ≤ i ≤ n,
Note that, corresponding to Scenarios (a)–(e) above, the statistic Gi,m,k consists of m + i, 2m + 1, 2m + 1, 2m + 1 and m + n − i + 1 different summands, respectively. Then the sum at (12) includes w different summands, which is consistent with definition (11).
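In R, the rank ρ(Y[r], c, d) used in the scheme above reduces to a one-line computation on the concomitant vector (y_conc, as in the earlier sketch):

```r
# Sketch of rho(Y_[r], c, d) = sum_{i=c}^{d} I(Y_[i] <= Y_[r]).
rho_rank <- function(y_conc, r, c, d) sum(y_conc[c:d] <= y_conc[r])
```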
According to the dbEL concept, we derive values of fi, i = 1, …, n, that maximize the logarithm of the likelihood defined in (1), subject to the constraint
obtained with respect to (11), where the left-hand side has the form (12). This procedure provides the values
| (13) |
yielding the approximation
to the likelihood function
where n^δ ≤ m ≤ n^(1−δ) and m^δ ≤ k ≤ m^(1−δ) with 0 < δ < 0.5. Employing the maximum likelihood technique described in Section 1.1, we conclude that the dbEL approximation to Lf is
| (14) |
where 0 < δ < 0.5. Note that, in contrast to the univariate dbEL approach shown in Section 1.1, it is required that m ≥ n^δ and k ≥ m^δ. Explanations regarding these restrictions are provided in the sections below.
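To fix ideas, the admissible pairs (m, k) with n^δ ≤ m ≤ n^(1−δ) and m^δ ≤ k ≤ m^(1−δ) can be enumerated as follows; rounding the bounds to integers is our assumption.

```r
# Illustrative enumeration of the (m, k) pairs over which the minimum in
# (14) is taken, for delta = 0.4 and n = 50.
delta <- 0.4
n <- 50
m_grid <- ceiling(n^delta):floor(n^(1 - delta))  # 5, 6, ..., 10
mk_pairs <- do.call(rbind, lapply(m_grid, function(m) {
  cbind(m = m, k = ceiling(m^delta):floor(m^(1 - delta)))
}))
mk_pairs
```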
3. A bivariate version of the variable partition histogram
In this section, we demonstrate that the proposed method satisfies the principle of bivariate histogram construction in the context of maximum likelihood estimation of the density function f(x, y) (Kim and van Ryzin [11]; van Ryzin [29]; Prakasa Rao [23: pp. 234–235]; Izenman [9: pp. 209–210, 212–213]).
Consider, for example, Scenario (c) in the definition (12), where, for i ∈ [2m + 1, n − 2m],
consists of 2m + 1 summands. Taking into account the formal notations used in Kim and van Ryzin [11], we denote the statistics
where An,j = i − j, Cn,j = i − j + 2m, Bn,j = ρ(Y[i], i − j, i − j + 2m) − k, Dn,j = ρ(Y[i], i − j, i − j + 2m) + k. These statistics, j = 0, …, 2m, are consistent approximations to f(X(i), Y[i]), 2m + 1 ≤ i ≤ n − 2m, uniformly in j = 0, …, 2m, if An,j, Bn,j, Cn,j and Dn,j satisfy the conditions presented in Kim and van Ryzin [11]. In this context, we note that, for n^δ ≤ m ≤ n^(1−δ) and n → ∞,
(Cn,j − An,j)/n → 0; An,j and Cn,j are invariant under permutations of (Xr, Yr), r = 1, …, n, for given X(i). Regarding the positive integer-valued indexing random variables Bn,j and Dn,j, we have
where D′n,j = Dn,j I(Dn,j ≤ 2m + 1) + (2m + 1)I(Dn,j > 2m + 1) and B′n,j = Bn,j I(Bn,j ≥ 1) + I(Bn,j < 1), corresponding to the definitions of the subscripts (Dn,j: i − j, i − j + 2m) and (Bn,j: i − j, i − j + 2m) of the Y’s, since the order statistics Y(1:i−j,i−j+2m) ≤ Y(2:i−j,i−j+2m) ≤ … ≤ Y(2m+1:i−j,i−j+2m) are based on Y[i−j], Y[i−j+1], …, Y[i−j+2m]. It is clear that, for n^δ ≤ m ≤ n^(1−δ) and m^δ ≤ k ≤ m^(1−δ), we have Dn,j − Bn,j = 2k → ∞ and (Dn,j − Bn,j)/(Cn,j − An,j) → 0, and Dn,j and Bn,j are invariant under permutations of (Xr, Yr), r = 1, …, n, for given (X(i), Y[i]).
Now, requiring m ≥ n^δ and k ≥ m^δ, we obtain that
Thus, the theoretical arguments shown in Kim and van Ryzin [11] provide that, for all j ∈ [0, 2m], . This implies
concluding that
4. An asymptotic consistency of the bivariate density based empirical likelihood
Here we restrict the density function f(x, y) to be continuous and bounded on its support, a1 < f(x, y) < a2, where 0 < a1 < a2 < ∞ are fixed constants. Then,
where the dbEL, Δn, is defined by (14).
The proof of the result above is based on a formal scheme that can be associated with the explanations given in Section 3. We use the theoretical arguments shown in [11], [29] and [37] to present the rigorous proof of the dbEL consistency in the Supplement (Appendix C).
We employed various Monte Carlo evaluations based on more than one hundred different scenarios of (X, Y)-distributions and a variety of sample sizes n in order to examine how critical the condition a1 < f(x, y) < a2 is for the asymptotic result shown in this section. These studies demonstrated that the bivariate dbEL approach remains consistent for more general forms of f(x, y).
5. The dbEL ratio tests for bivariate normality
Assume the underlying data follow the bivariate normal density function
$$f_{N_2}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right\}\right],$$
where the parameters μx = E(X), μy = E(Y), σx² = var(X), σy² = var(Y) and ρ = E((X − μx)(Y − μy))/(σxσy) ∈ (−1, 1). In this case the maximum log likelihood function is
$$\log L_{N_2} = -n\log(2\pi) - \frac{n}{2}\log\left\{\hat{\sigma}_x^2\hat{\sigma}_y^2\left(1 - \hat{\rho}^2\right)\right\} - n,$$
where $\hat{\sigma}_x^2$, $\hat{\sigma}_y^2$ and $\hat{\rho}$ denote the maximum likelihood estimators of σx², σy² and ρ (Balakrishnan and Lai [2: p. 490]). Then we state the log dbEL ratio test that rejects the null hypothesis iff
$$\log(\Delta_n) - \log L_{N_2} > C, \quad (15)$$
where the statistic Δn is defined in (14) and C is a test threshold.
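The null part of (15) is a routine computation. A minimal R sketch, assuming the standard bivariate normal maximum likelihood plug-in form shown above:

```r
# Sketch of the maximum log likelihood under H0 in (15):
# log L_N2 = -n*log(2*pi) - (n/2)*log(det(V_hat)) - n,
# where det(V_hat) = sigma_x^2 * sigma_y^2 * (1 - rho^2) for the ML
# (denominator-n) covariance estimate V_hat.
max_loglik_binorm <- function(xy) {
  n <- nrow(xy)
  V_hat <- cov(xy) * (n - 1) / n
  -n * log(2 * pi) - (n / 2) * log(det(V_hat)) - n
}
```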
It is clear that the transformation ((Xi − μx)/σx, (Yi − μy)/σy)T, i = 1, …, n, of the data does not change the value of the test statistic in (15). That is to say, the null distribution of the statistic in (15) is unaltered with respect to the parameters μx, μy, σx, σy. However, the H0-distribution of the statistic depends on ρ, and one can easily show that the probability that the test (15) rejects H0 decreases when |ρ| ∈ (0,1) increases. We evaluate this fact in the next section.
Various efficient statistics applied for testing univariate normality have forms that are distributed independently of the mean and variance of the observations under the null hypothesis. For example, the distribution of the statistic Tn defined in Section 1.1 does not depend on μx and σx when X1 ~ N1(μx, σx). Then, in order to calculate the critical values of the Tn-based test for Xi ~ N1(μx, σx), i = 1, …, n, one can use pre-tabulated critical points and/or Monte Carlo simulations without restricting the sample size n to be relatively large.
Unfortunately, in the bivariate case, this property cannot be maintained in many scenarios when constructing appropriate test statistics. In this framework, the conventional approach is to standardize the bivariate data (Xi, Yi)T, i = 1, …, n, obtaining the transformed data
$$(X'_i, Y'_i)^{T} = \hat{V}^{-1/2}\left\{(X_i, Y_i)^{T} - \hat{\mu}\right\}, \quad i = 1, \ldots, n,$$
where $\hat{V}^{-1/2}$ is the square root of the inverse of the estimated covariance matrix and $\hat{\mu}$ is the vector of sample means. Then, under the null hypothesis, (X′1, Y′1)T has approximately the bivariate standard normal distribution. In this context, in addition to the relevant literature mentioned in Section 1, we refer the reader to Looney [18], Henze and Zirkler [8], Lee et al. [15] and Villaseñor-Alva and González-Estrada [36]. For example, the widely applied R procedure (R Development Core Team [24]) “mvShapiro.Test” for Shapiro-Wilk type testing of bivariate normality is based on the principle shown above. Note that applications of the code “mvShapiro.Test” are restricted by the requirement n ∈ [12, 5000], related to the use of tabulated values of the theoretical expectations of order statistics in Shapiro-Wilk’s manner of testing normality.
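A minimal R sketch of this standardization, computing the inverse square root of the estimated covariance matrix via an eigendecomposition (whether the covariance is estimated with denominator n or n − 1 is immaterial for the sketch):

```r
# Sketch: standardize bivariate data so that, under H0, the transformed
# observations are approximately bivariate standard normal.
standardize_xy <- function(xy) {
  centered <- sweep(xy, 2, colMeans(xy))    # subtract sample means
  e <- eigen(cov(xy), symmetric = TRUE)
  V_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  centered %*% V_inv_sqrt                   # (X'_i, Y'_i), i = 1, ..., n
}
```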
We may then propose the dbEL ratio test that rejects the null hypothesis iff
$$\log(\Delta_n) - \left\{n\log\left(\frac{1}{2\pi}\right) - n\right\} > C, \quad (16)$$
where the statistic Δn given by (14) is calculated employing the data (X′i, Y′i)T, i = 1, …, n, instead of (Xi, Yi)T, i = 1, …, n, the quantity n log(1/(2π)) − n corresponds to the approximate maximum log likelihood of (X′i, Y′i)T, i = 1, …, n, under the null hypothesis, and C is a test threshold.
Remark. The dbEL literature shows that the power of dbEL tests does not depend significantly on the values of parameters that play roles similar to that of δ in definition (14) (e.g., Tsai et al. [27]; Vexler et al. [33, 35]). In this context, we note that extensive Monte Carlo simulations confirmed the robustness of the proposed tests with respect to the value of δ in (14); i.e., one can show that the power of the new tests does not depend significantly on the values of δ ∈ (0, 0.5) under various scenarios of alternative distributions. Thus, without loss of generality, we set δ = 0.4 in the proposed test statistics.
5.1. Null distributions of the dbEL ratio tests
In the one-dimensional setting, a very substantial body of literature has grown around the asymptotic distribution problems involving Vasicek entropy type statistics. It is generally recognized that proofs regarding the asymptotic distribution of the statistic Tmn defined in Section 1.1 are analytically very complicated. Note also that, when the sample size is relatively large, various tests can be anticipated to provide very powerful inference. Thus, following the recent literature related to goodness-of-fit tests (e.g., Hall and Welsh [6]; Mudholkar and Tian [20, 21]), we focus on finite sample sizes without attempting to provide an asymptotic solution for the critical values of the proposed tests in the two-dimensional setting.
The critical values for the dbEL ratio tests can be accurately approximated using Monte Carlo techniques. In order to tabulate the percentiles of the null distributions of the test statistics in (15) and (16) with δ = 0.4 in definition (14), we drew 50,000 samples under the null hypothesis, calculating the values of the test statistics at each sample size n. The generated values of the test statistics were used to determine the critical values Cα of the corresponding null distributions at the significance level α. The results of this Monte Carlo study are presented in Tables 1 and 2 (a sketch of this tabulation scheme is given after Table 2).
Table 1.
Critical Values, Cα, of the Test Statistic in (15)
| Sample size | α | ||||
|---|---|---|---|---|---|
| n | 0.1 | 0.05 | 0.04 | 0.025 | 0.01 |
| 15 | 13.78831 | 14.45392 | 14.67231 | 15.10617 | 15.89147 |
| 20 | 16.01282 | 16.82075 | 17.06102 | 17.56599 | 18.44998 |
| 25 | 17.79228 | 18.67010 | 18.93496 | 19.48494 | 20.54437 |
| 30 | 19.34573 | 20.38065 | 20.67776 | 21.28747 | 22.37941 |
| 35 | 20.88404 | 21.99563 | 22.33991 | 23.01375 | 24.19156 |
| 40 | 22.07930 | 23.26584 | 23.64351 | 24.36966 | 25.67781 |
| 45 | 23.30695 | 24.54427 | 24.90769 | 25.70937 | 27.06523 |
| 50 | 24.67166 | 26.00287 | 26.34771 | 27.08118 | 28.43204 |
| 55 | 25.53667 | 26.91676 | 27.33220 | 28.19032 | 29.63046 |
| 60 | 26.44042 | 27.77653 | 28.22558 | 29.12152 | 30.66377 |
| 70 | 28.04247 | 29.53837 | 29.96516 | 30.84174 | 32.44158 |
| 80 | 29.36747 | 31.02797 | 31.51445 | 32.44036 | 34.09259 |
| 90 | 30.75603 | 32.51081 | 33.05028 | 34.05678 | 35.57193 |
| 100 | 32.07309 | 33.76989 | 34.37907 | 35.40567 | 37.31336 |
| 120 | 34.02814 | 36.02057 | 36.52973 | 37.70851 | 39.86061 |
Table 2.
Critical Values, Cα, of the Test Statistic in (16)
| Sample size | α | ||||
|---|---|---|---|---|---|
| n | 0.1 | 0.05 | 0.04 | 0.025 | 0.01 |
| 15 | 13.96789 | 14.68150 | 14.85996 | 15.30876 | 16.03354 |
| 20 | 16.22554 | 17.06107 | 17.31668 | 17.77547 | 18.72344 |
| 25 | 17.94104 | 18.90434 | 19.19077 | 19.72435 | 20.72166 |
| 30 | 19.47291 | 20.53942 | 20.78474 | 21.38950 | 22.44172 |
| 35 | 21.31563 | 22.44544 | 22.77901 | 23.46965 | 24.70225 |
| 40 | 22.29129 | 23.48504 | 23.82955 | 24.50040 | 25.67661 |
| 45 | 23.52354 | 24.73849 | 25.09407 | 25.84452 | 27.13543 |
| 50 | 24.77544 | 26.05601 | 26.43748 | 27.21586 | 28.60631 |
| 55 | 25.65505 | 26.96317 | 27.35342 | 28.14286 | 29.63217 |
| 60 | 26.52463 | 27.98900 | 28.42992 | 29.29108 | 30.66241 |
| 70 | 28.15409 | 29.68741 | 30.12672 | 31.04799 | 32.67359 |
| 80 | 29.46820 | 31.10201 | 31.56947 | 32.42746 | 34.07674 |
| 90 | 30.92357 | 32.62724 | 33.13130 | 34.23576 | 36.06089 |
| 100 | 32.00653 | 33.72502 | 34.26569 | 35.32767 | 37.10607 |
| 120 | 34.01007 | 35.96420 | 36.54560 | 37.82717 | 39.62449 |
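A minimal sketch of the Monte Carlo tabulation scheme behind Tables 1 and 2 is as follows. The function test_stat stands in for the statistics in (15) or (16) (implemented in the supplementary R code), and simulating under independent standard normal components is our illustrative choice of null configuration.

```r
# Sketch: approximate the critical values C_alpha by simulating the test
# statistic under the null hypothesis and taking upper quantiles.
mc_critical_values <- function(n, test_stat, B = 50000,
                               alpha = c(0.1, 0.05, 0.04, 0.025, 0.01)) {
  stats <- replicate(B, test_stat(cbind(rnorm(n), rnorm(n))))
  quantile(stats, probs = 1 - alpha)
}
```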
In order to verify the results shown in Tables 1 and 2, for different values of ρ ∈ (−1, 1) and n, we calculated the Monte Carlo approximations to the Type I error rates of the tests (15) and (16),
where the Cα=0.05’s are shown in Tables 1 and 2. In this study, we also analyzed the Shapiro-Wilk test (SW), using the R procedure “mvShapiro.Test”. For each value of ρ and n, the Type I error rates were derived using 25,000 bivariate normal samples with correlation coefficient ρ. Table 3 presents the results of this Monte Carlo evaluation. According to Table 3, the validity of the critical values related to the test statistic in (16) is experimentally confirmed. Although the test (15) is very conservative when |ρ| > 0.7, we can recommend the test (15) to be applied in practice, owing to the high power of this test against the alternatives considered in Section 6.
Table 3.
The Monte Carlo Type I error probabilities of the proposed tests (15), (16) and the Shapiro-Wilk test (SW), when (Xi, Yi)T, i = 1, …, n, are bivariate normal with correlation coefficient ρ and the anticipated significance level is α = 0.05.
| n = 35 | n = 50 | n = 70 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| ρ | Test (15) | Test (16) | SW | Test (15) | Test (16) | SW | Test (15) | Test (16) | SW |
| −0.9 | <0.0001 | 0.0475 | 0.0521 | <0.0001 | 0.0496 | 0.0519 | <0.0001 | 0.0513 | 0.0506 |
| −0.8 | 0.0015 | 0.0504 | 0.0501 | 0.0014 | 0.0495 | 0.0492 | 0.0014 | 0.0477 | 0.0482 |
| −0.7 | 0.0073 | 0.0501 | 0.0501 | 0.0069 | 0.0498 | 0.0480 | 0.0067 | 0.0499 | 0.0466 |
| −0.6 | 0.0143 | 0.0505 | 0.0489 | 0.0139 | 0.0499 | 0.0521 | 0.0143 | 0.0479 | 0.0495 |
| −0.5 | 0.0235 | 0.0511 | 0.0520 | 0.0233 | 0.0495 | 0.0521 | 0.0228 | 0.0498 | 0.0481 |
| −0.4 | 0.0325 | 0.0513 | 0.0507 | 0.0319 | 0.0500 | 0.0519 | 0.0322 | 0.0495 | 0.0498 |
| −0.3 | 0.0388 | 0.0509 | 0.0521 | 0.0407 | 0.0500 | 0.0493 | 0.0394 | 0.0500 | 0.0506 |
| −0.2 | 0.0454 | 0.0509 | 0.0517 | 0.0465 | 0.0501 | 0.0516 | 0.0475 | 0.0495 | 0.0497 |
| −0.1 | 0.0506 | 0.0461 | 0.0526 | 0.0499 | 0.0499 | 0.0489 | 0.0507 | 0.0503 | 0.0496 |
| 0 | 0.0488 | 0.0501 | 0.0500 | 0.0508 | 0.0499 | 0.0493 | 0.0504 | 0.0495 | 0.0501 |
| 0.1 | 0.0501 | 0.0473 | 0.0505 | 0.0500 | 0.0454 | 0.0510 | 0.0503 | 0.0475 | 0.0466 |
| 0.2 | 0.0466 | 0.0508 | 0.0521 | 0.0470 | 0.0499 | 0.0493 | 0.0469 | 0.0490 | 0.0508 |
| 0.3 | 0.0418 | 0.0507 | 0.0519 | 0.0400 | 0.0499 | 0.0519 | 0.0380 | 0.0493 | 0.0470 |
| 0.4 | 0.0335 | 0.0512 | 0.0509 | 0.0317 | 0.0501 | 0.0497 | 0.0323 | 0.0492 | 0.0506 |
| 0.5 | 0.0237 | 0.0510 | 0.0515 | 0.0228 | 0.0496 | 0.0481 | 0.0231 | 0.0499 | 0.0514 |
| 0.6 | 0.0143 | 0.0503 | 0.0488 | 0.0138 | 0.0498 | 0.0521 | 0.0142 | 0.0498 | 0.0497 |
| 0.7 | 0.0072 | 0.0498 | 0.0495 | 0.0071 | 0.0498 | 0.0492 | 0.0068 | 0.0489 | 0.0492 |
| 0.8 | 0.0016 | 0.0502 | 0.0498 | 0.0015 | 0.0496 | 0.0519 | 0.0012 | 0.0483 | 0.0489 |
| 0.9 | <0.0001 | 0.0500 | 0.0518 | <0.0001 | 0.0497 | 0.0519 | <0.0001 | 0.0508 | 0.0488 |
Remark. In practice, in order to implement the proposed approach in a simple and rapid manner, one can apply a hybrid method for computing the p-values of the tests (15) and (16), combining Monte Carlo simulations with the critical values displayed in Tables 1 and 2. In this framework, employing Bayesian type procedures, we can derive relevant information from the Monte Carlo experiments via likelihood type functions, whereas the tabulated critical values can be used to reflect prior distributions (Vexler et al. [34]). The hybrid technique for computing p-values has been employed in STATA and R statistical packages (Vexler et al. [33]).
6. Power of the tests
It is clear that, in the nonparametric setting of testing bivariate normality, there are no uniformly most powerful decision making procedures. In this section, we exemplify several scenarios in which the power of the proposed tests is compared with that of the Shapiro-Wilk (SW) test and classical Mardia’s test (MT) for bivariate normality at the significance level of 5%. The following scenarios of source distributions were treated:
(A) X1, …, Xn ~ N1(0,1) and Yi = τiXi, i = 1, …, n, where the random variables τi = −1 or 1, i = 1, …, n, are i.i.d. and independent of X1, …, Xn with Pr(τ1 = −1) = 0.5. In this case, X1, …, Xn ~ N1(0,1) and Y1, …, Yn ~ N1(0,1), but (X1, Y1), …, (Xn, Yn) are not bivariate normal (a sketch generating this design is given after the list).
(B) X1, …, Xn are uniformly distributed over (−5,5) and independent of Y1, …, Yn that are uniformly distributed over (−5,5). This case presents a light tailed alternative distribution.
(C) Let ξ1, …, ξn and η1, …, ηn be i.i.d. random variables from N1(0,1). Define , i = 1, …, n, to examine a case of heavy tailed alternative distributions.
(D) Define , i = 1, …, n, with independent and identically U(0,1)-distributed random variables ξ’s and η’s to evaluate a case in which the central limit theorem can be applied to approximate the data distribution.
(E) (X,Y)’s follow Morgenstern’s distribution with parameter α = 0.5 (Johnson [10: p. 185]).
(F) (X,Y)’s follow Plackett’s distribution with parameter Ψ = 2 (Johnson [10: p. 193]).
(G) (X1,Y1), …, (Xn,Yn) follow Gumbel’s Type I logistic distribution (Johnson [10: p. 199]).
(H) Assume random vectors (Z1, W1)T, …, (Zn, Wn)T are from Gumbel’s bivariate exponential distribution with parameter θ = 0.9 (Johnson [10: p. 197]). In order to reduce the power of the considered tests, we define (Xi = Zi + ξi, Yi = Wi + ηi), where ξ’s and η’s are independent and identically U(−6,3)-distributed random variables.
(I) Define (Xi = ξi, Yi = ηi), i = 1, …, n, where ξ’s and η’s are independent and identically Gamma(2,1)-distributed random variables. Note that, commonly, sample entropy based tests for univariate normality do not outperform the corresponding Shapiro-Wilk test when underlying data are from a gamma distribution (e.g., Table 2 in Vasicek [30], where the case with X ~ Gamma(2,1) is evaluated).
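As an example of the data generation, design (A) can be produced in R as follows (the remaining designs are analogous):

```r
# Sketch of design (A): each margin is N(0,1), but the joint distribution
# is concentrated on the diagonals y = x and y = -x, hence is not
# bivariate normal.
gen_design_A <- function(n) {
  x <- rnorm(n)
  tau <- sample(c(-1, 1), n, replace = TRUE)  # Pr(tau_i = -1) = 0.5
  cbind(X = x, Y = tau * x)
}

set.seed(4)
xy <- gen_design_A(50)
```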
Table 4 shows the results of the power evaluations of the proposed tests ((15) and (16) with δ = 0.4 in definition (14)), the Shapiro-Wilk (SW) test and classical Mardia’s test (MT) for bivariate normality, via a Monte Carlo study based on 15,000 replications of (X1, Y1), …, (Xn, Yn) for the designs (A)–(I) given above at each sample size n = 35, 50, 70.
Table 4.
The Monte Carlo power of the tests at the significance level of 5%.
| Tests | Design (A) | Design (B) | Design (C) | ||||||
| Sample size (n) | Sample size (n) | Sample size (n) | |||||||
| | 35 | 50 | 70 | 35 | 50 | 70 | 35 | 50 | 70 |
| Test (15) | 0.5645 | 0.8327 | 0.9427 | 0.9295 | 0.9923 | 1 | 0.8228 | 0.9635 | 0.9945 |
| Test (16) | 0.6666 | 0.9249 | 0.9796 | 0.8907 | 0.9888 | 1 | 0.7771 | 0.9458 | 0.9914 |
| SW | 0.0570 | 0.0604 | 0.0613 | 0.6975 | 0.9316 | 0.9952 | 0.7744 | 0.9282 | 0.9878 |
| MT | 0.3711 | 0.3881 | 0.4108 | 0.0004 | 0.0002 | 0.0002 | 0.7392 | 0.8350 | 0.9156 |
| Design (D) | Design (E) | Design (F) | |||||||
| Test (15) | 0.0639 | 0.0643 | 0.0753 | 0.9226 | 0.9928 | 0.9998 | 0.9908 | 0.9998 | 1 |
| Test (16) | 0.0533 | 0.0639 | 0.0693 | 0.8748 | 0.9876 | 0.9994 | 0.9739 | 0.9990 | 1 |
| SW | 0.0375 | 0.0378 | 0.0381 | 0.6544 | 0.9098 | 0.9906 | 0.8987 | 0.9910 | 0.9999 |
| MT | 0.0230 | 0.0251 | 0.0257 | 0.0000 | 0.0001 | 0.0001 | 0.0033 | 0.0045 | 0.0049 |
| Design (G) | Design (H) | Design (I) | |||||||
| Test (15) | 1 | 1 | 1 | 0.4581 | 0.6234 | 0.7752 | 0.8959 | 0.9766 | 0.9969 |
| Test (16) | 0.9994 | 1 | 1 | 0.4160 | 0.6184 | 0.7796 | 0.8521 | 0.9745 | 0.9964 |
| SW | 0.8812 | 0.9801 | 0.9980 | 0.3783 | 0.5582 | 0.7535 | 0.9615 | 0.9972 | 0.9999 |
| MT | 0.9950 | 0.9997 | 0.9998 | 0.2043 | 0.3004 | 0.4178 | 0.8062 | 0.9527 | 0.9959 |
This study demonstrates that the dbEL ratio tests are superior to the considered classical tests under designs (A)–(H). The new tests have significantly improved powers as compared to the corresponding classical procedures. For example, the power of the proposed tests is roughly two times larger than that of the classical tests under scenarios (A) and (D). The proposed tests have approximately 10%–30% power gains as compared to the classical procedures when n = 35 in scenarios (E) and (H). It appears that the Shapiro-Wilk test is not efficient under designs (A) and (D). In scenario (D), the Shapiro-Wilk test is biased. Mardia’s test is biased under designs (B), (D), (E) and (F). In these scenarios, the new tests exhibit high and stable power characteristics. The proposed tests perform reasonably well, and are generally competitive with the classical tests, in case (I). In this scenario, it is anticipated that the Shapiro-Wilk test has higher power than the other considered tests. In parallel with studies regarding properties of tests for univariate normality, the shown Monte Carlo results are consistent with those related to one-dimensional sample entropy based tests (e.g., Vasicek [30]).
7. Data analysis
Myocardial infarction (MI) is commonly caused by blood clots blocking blood flow to the heart, leading to heart muscle injury. Heart disease is a leading cause of death, affecting roughly 20% or more of the population across ethnicities, according to the Centers for Disease Control and Prevention (e.g., Schisterman et al. [25, 26]).
We illustrate the application of the proposed approach based on a sample from a study evaluating biomarkers associated with myocardial infarction (MI). The study focused on residents of Erie and Niagara counties, 35–79 years of age. The New York State Department of Motor Vehicles drivers’ license rolls were used as the sampling frame for adults between 35 and 65 years of age, while the elderly sample (ages 65–79) was randomly chosen from the Health Care Financing Administration database. The biomarkers “high density lipoprotein (HDL)-cholesterol” and “vitamin E” are often used as discriminant factors between individuals with (MI=1) and without (MI=0) myocardial infarction (e.g., Schisterman et al. [25, 26]). The HDL-cholesterol levels were examined from a 12-hour fasting blood specimen for biochemical analysis at baseline. A total of 240 measurements of the biomarkers was evaluated in the study: a sample of 120 biomarker values was collected from cases who had survived an MI, and a sample of 120 measurements from controls with no previous MI.
Oftentimes, measurements related to biological processes follow a log-normal distribution (e.g., Limpert et al. [17]). The aim of this analysis is to investigate the joint distribution of the log-transformed vitamin E measurements, say X, and the log-transformed HDL-cholesterol measurements, say Y, with regard to MI disease. Toward this end, we implemented the new tests, the Shapiro-Wilk test (SW) and Mardia’s test (MT) for bivariate normality using the data described above. Figure S1 in the Supplement depicts the histograms based on the values of X and Y and the scatter plots based on (X, Y) for the case (MI=1) and control (MI=0) groups, respectively.
In this study, all four considered tests provided p-values < 0.045, rejecting the hypothesis that the observations (X1, Y1), …, (X120, Y120) are bivariate normally distributed (H0) for the case (MI=1) and control (MI=0) groups, respectively. Then, we organized a bootstrap/jackknife type study to examine the power performance of the test statistics. The strategy was that samples of sizes 35, 50 and 70 were randomly selected from the vitamin E/HDL-cholesterol data and tested for bivariate normality at the 5% level of significance. We repeated this strategy 5000 times, calculating the frequencies of the events {test (15) rejects H0}, {test (16) rejects H0}, {SW rejects H0} and {MT rejects H0}. The test statistics in (15) and (16) were computed with δ = 0.4 in definition (14). The obtained experimental powers of the four tests are shown in Table 5 (a sketch of the resampling strategy is given after the table).
Table 5.
The experimental powers of the tests at the significance level of 5%.
| Tests | MI=1 | MI=0 | ||||
|---|---|---|---|---|---|---|
| Sample size (n) | Sample size (n) | |||||
| | 35 | 50 | 70 | 35 | 50 | 70 |
| Test (15) | 0.2780 | 0.3838 | 0.4716 | 0.3904 | 0.4504 | 0.6442 |
| Test (16) | 0.1990 | 0.3108 | 0.4277 | 0.2799 | 0.4178 | 0.6189 |
| SW | 0.1152 | 0.1896 | 0.3119 | 0.2602 | 0.3812 | 0.5910 |
| MT | 0.0940 | 0.1580 | 0.2369 | 0.0510 | 0.0786 | 0.1544 |
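A minimal R sketch of the resampling strategy referenced above is given below; pval_fun is a placeholder for a function returning the p-value of the test (15) or (16) (see the supplementary R code), and mvShapiro.Test is from the CRAN package mvShapiroTest.

```r
# Sketch: experimental power via repeated subsampling of the biomarker
# data; 'pval_fun' maps an n x 2 data matrix to a p-value.
resample_power <- function(xy, n, pval_fun, B = 5000, alpha = 0.05) {
  mean(replicate(B, {
    idx <- sample(nrow(xy), n)
    pval_fun(xy[idx, , drop = FALSE]) < alpha
  }))
}

# Example with the Shapiro-Wilk type test (xy_log = log-transformed data):
# library(mvShapiroTest)
# resample_power(xy_log, n = 35,
#                pval_fun = function(d) mvShapiro.Test(as.matrix(d))$p.value)
```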
In this study, the proposed tests significantly outperform the SW and MT tests in terms of power when detecting that the log-transformed biomarker values are not jointly distributed as bivariate normal random variables. For example, when n = 35 and MI=1, the dbEL ratio tests yield experimental powers that are approximately two times larger than those of the SW and MT tests. That is, the dbEL ratio tests are more sensitive than the known methods in rejecting the null hypothesis of bivariate normality regarding the joint distribution of the log-transformed values of the vitamin E and HDL-cholesterol biomarkers.
8. Concluding remarks
In this paper, we extended the density based empirical likelihood approach to construct new goodness-of-fit tests for bivariate normality. The main idea of our method was to propose a consistent technique that employs histogram- and sample entropy-based density estimation in the bivariate framework. We compared the performance of the dbEL ratio tests to known decision making procedures, the Shapiro-Wilk test and Mardia’s test. The conducted simulation study showed that the proposed tests outperform the known tests in many important scenarios of alternative distributions, and that the new tests provide power levels similar to those of their univariate sample entropy based analogs. Finally, we applied our tests to a real data set, where the proposed technique exhibited high and stable power characteristics.
Certainly, the proposed testing strategy is computationally intensive. In this context, we note that the known principles of bivariate histogram construction involve strong computational requirements in general. In the modern age, we are generally no longer constrained by computational issues and have greater flexibility in terms of the statistical approaches that we may employ for data analysis problems, as fast and cheap computational facilities are now available to statisticians. This supports the suggestion that the dbEL methodology can be modified and extended in order to be applied to various multivariate problems encountered in statistical studies.
The main objective of this paper is twofold: (1) to show that the density based empirical likelihood technique can be a valuable tool in multivariate statistical analysis; and (2) to convince readers of the usefulness of the sample entropy based approach, which should be more widely investigated in multivariate frameworks.
Acknowledgments
Dr. Vexler’s effort was supported by the National Institutes of Health (NIH) grant 1G13LM012241–01.
Footnotes
Appendix. Supplementary data
Supplementary material related to this article can be found online.
References
- [1]. Arizono I and Ohta H. A test for normality based on Kullback-Leibler information. The American Statistician 43 (1989) 20–22.
- [2]. Balakrishnan N and Lai C-D. Continuous Bivariate Distributions. Springer, New York, 2009.
- [3]. Berrett TB, Samworth RJ and Yuan M. Efficient multivariate entropy estimation via k-nearest neighbour distances. arXiv:1606.00304 (https://arxiv.org/abs/1606.00304), 2017.
- [4]. Carolan C and Dykstra R. Asymptotic behavior of the Grenander estimator at density flat regions. The Canadian Journal of Statistics 27 (1999) 557–566.
- [5]. Efromovich S. Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer-Verlag, New York, 1999.
- [6]. Hall P and Welsh AH. A test for normality based on the empirical characteristic function. Biometrika 70 (1983) 485–489.
- [7]. Hawkins DM. A new test for multivariate normality and homoscedasticity. Technometrics 23 (1981) 105–110.
- [8]. Henze N and Zirkler B. A class of invariant consistent tests for multivariate normality. Communications in Statistics - Theory and Methods 19 (1990) 3595–3618.
- [9]. Izenman AJ. Recent developments in nonparametric density estimation. Journal of the American Statistical Association 86 (1991) 205–224.
- [10]. Johnson ME. Multivariate Statistical Simulation. John Wiley & Sons, New York, 1987.
- [11]. Kim BK and Van Ryzin J. A bivariate histogram density estimator: consistency and asymptotic normality. Statistics & Probability Letters 3 (1985) 167–173.
- [12]. Kozachenko LF and Leonenko NN. Sample estimate of the entropy of a random vector. Problems of Information Transmission 23 (1987) 95–101.
- [13]. Kowalski CJ. The performance of some rough tests for bivariate normality before and after coordinate transformations to normality. Technometrics 12 (1970) 517–544.
- [14]. Lazar N and Mykland PA. An evaluation of the power and conditionality properties of empirical likelihood. Biometrika 85 (1998) 523–534.
- [15]. Lee R, Qian M and Shao Y. On rotational robustness of Shapiro-Wilk type tests for multivariate normality. Open Journal of Statistics 4 (2014) 964–969.
- [16]. Lehmann EL and Romano JP. Testing Statistical Hypotheses. Springer-Verlag, New York, 2005.
- [17]. Limpert E, Stahel WA and Abbt M. Log-normal distributions across the sciences: keys and clues. BioScience 51 (2001) 341–352.
- [18]. Looney SW. How to use tests for univariate normality to assess multivariate normality. The American Statistician 49 (1995) 64–70.
- [19]. Mecklin CJ and Mundfrom DJ. An appraisal and bibliography of tests for multivariate normality. International Statistical Review 72 (2004) 123–138.
- [20]. Mudholkar GS and Tian L. An entropy characterization of the inverse Gaussian distribution and related goodness-of-fit test. Journal of Statistical Planning and Inference 102 (2002) 211–221.
- [21]. Mudholkar GS and Tian L. A test for homogeneity of ordered means of inverse Gaussian populations. Journal of Statistical Planning and Inference 118 (2004) 37–49.
- [22]. Owen AB. Empirical Likelihood. CRC Press, Boca Raton, Florida, 2001.
- [23]. Prakasa Rao BLS. Nonparametric Functional Estimation. Academic Press, New York, 1983.
- [24]. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2012. http://www.R-project.org
- [25]. Schisterman EF, Faraggi D, Browne R, Freudenheim J, Dorn J, Muti P, Armstrong D, Reiser B and Trevisan M. TBARS and cardiovascular disease in a population-based sample. Journal of Cardiovascular Risk 8 (2001) 219–225.
- [26]. Schisterman EF, Faraggi D, Browne R, Freudenheim J, Dorn J, Muti P, Armstrong D, Reiser B and Trevisan M. Minimal and best linear combination of oxidative stress and antioxidant biomarkers to discriminate cardiovascular disease. Nutrition, Metabolism, and Cardiovascular Diseases 12 (2002) 259–266.
- [27]. Tsai W-M, Vexler A and Gurevich G. An extensive power evaluation of a novel two-sample density-based empirical likelihood ratio test for paired data with an application to a treatment study of attention-deficit/hyperactivity disorder and severe mood dysregulation. Journal of Applied Statistics 40 (2013) 1189–1208.
- [28]. Tusnady G. On asymptotically optimal tests. The Annals of Statistics 5 (1977) 385–393.
- [29]. Van Ryzin J. A histogram method of density estimation. Communications in Statistics 2 (1973) 493–506.
- [30]. Vasicek O. A test for normality based on sample entropy. Journal of the Royal Statistical Society, Series B 38 (1976) 54–59.
- [31]. Vexler A and Gurevich G. Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy. Computational Statistics & Data Analysis 54 (2010) 531–545.
- [32]. Vexler A, Gurevich G and Hutson AD. An exact density-based empirical likelihood ratio test for paired data. Journal of Statistical Planning and Inference 143 (2013) 334–345.
- [33]. Vexler A, Hutson AD and Chen X. Statistical Testing Strategies in the Health Sciences. Chapman & Hall/CRC, New York, 2016.
- [34]. Vexler A, Kim YM, Yu J, Lazar NA and Hutson AD. Computing critical values of exact tests by incorporating Monte Carlo simulations combined with statistical tables. Scandinavian Journal of Statistics 41 (2014) 1013–1030.
- [35]. Vexler A, Tsai W-M and Hutson AD. A simple density-based empirical likelihood ratio test for independence. The American Statistician 68 (2014) 158–169.
- [36]. Villaseñor-Alva JA and González-Estrada E. A generalization of Shapiro-Wilk's test for multivariate normality. Communications in Statistics - Theory and Methods 38 (2009) 1870–1883.
- [37]. Wilks SS. Mathematical Statistics. John Wiley and Sons, New York, 1962.
- [38]. Zhu L-X, Wong HL and Fang K-T. A test for multivariate normality based on sample entropy and projection pursuit. Journal of Statistical Planning and Inference 45 (1995) 373–385.