Abstract
We develop a novel nonparametric likelihood ratio test for independence between two random variables using a technique that is free of the common constraints of defining a given set of specific dependence structures. Our methodology revolves around an exact density-based empirical likelihood ratio test statistic that approximates in a distribution-free fashion the corresponding most powerful parametric likelihood ratio test. We demonstrate that the proposed test is very powerful in detecting general structures of dependence between two random variables, including non-linear and/or random-effect dependence structures. An extensive Monte Carlo study confirms that the proposed test is superior to the classical nonparametric procedures across a variety of settings. The real-world applicability of the proposed test is illustrated using data from a study of biomarkers associated with myocardial infarction.
Keywords: Density-based empirical likelihood, Empirical likelihood, Independence test, Likelihood, Nonlinear dependence, Nonparametric test, Random effect
1. INTRODUCTION
Independence and dependence are key concepts related to the implementation and validity of many statistical procedures. Research questions of interest often focus on the degree of dependence between two paired variables. For example, in studies related to lung cancer, a complex multifactorial disease, statistical tests for relationships between genetic factors and environmental factors play a vital role in understanding the mechanisms of disease (e.g., Gu et al., 2012). The functionality and limitations of classical tests of dependence for pairs of random variables have received a great deal of attention. Our focus is on a new empirical likelihood test as it compares to tests based on the Pearson (r), Spearman (rs), and Kendall (τ) correlation coefficients. Although these measures of dependence are designed to capture only specific structures of dependence between two random variables, statistical software packages often provide only the r-, rs-, and/or τ-based tests for independence. As is well known, the Pearson correlation coefficient is a measure of the strength of the linear relationship between two random variables (e.g., Pearson, 1920; Hauke and Kossowski, 2011). The Spearman correlation coefficient is a measure of a monotonic association between two random variables (e.g., Spearman, 1904; Hauke and Kossowski, 2011). Neither the Pearson nor the Spearman correlation coefficient is well suited to analyzing non-monotone forms of dependence between two random variables (e.g., Embrechts et al., 2002). The Kendall correlation coefficient (τ) is a well-known measure of the concordance between two rankings associated with two sets of observations (Kendall, 1938, 1948). However, in many cases tests based on Kendall’s correlation coefficient show relatively low power compared with tests based on the Pearson and Spearman correlation coefficients (e.g., Mudholkar and Wilding, 2003).
Note also that when dealing with dependence between two random variables Y and X that is defined via conditional expectations, say E(Y | X) ≠ E(Y), a test of a nonparametric regression model can provide a test of independence E(Y | X) = E(Y) (e.g., Christensen, 2002). In many practical data analysis settings, it is of interest to develop a general coefficient that can efficiently detect both linear and non-linear dependence between random variables.
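To make the limitation of the classical coefficients concrete, the following minimal Python sketch (an illustration only, not part of the original study) evaluates all three on a toy data set in which Y is a deterministic but non-monotone function of X. The helper names `pearson`, `spearman`, and `kendall_tau` are hypothetical, and Kendall's coefficient is computed in its simple tau-a form.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)
    return s / (n * (n - 1) / 2)

# Y is completely determined by X, yet the relationship is non-monotone.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]
print(pearson(x, y), spearman(x, y), kendall_tau(x, y))
```

On this symmetric toy example all three coefficients vanish even though Y depends on X exactly, which is the kind of dependence a general-purpose test of independence should be able to detect.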
Our goal is to propose a test for independence that is powerful in detecting a variety of complex dependence structures. In addition to linear/nonlinear dependence structures, the present applied biostatistical literature introduces random-effect-type associations between two sets of observations. For example, in the context of a lung cancer study, Gu et al. (2012) proposed to examine the dependence relationship of two polymorphisms rs1051730 and rs8034191 in random-effect-type model formulation. We aim to develop a simple and highly efficient nonparametric test that can be applied to detect dependence, including complex random effect type associations between two random variables. Towards this end, we construct a novel exact nonparametric likelihood ratio type test.
The likelihood ratio approach provides a basis for many important procedures and methods in statistical inference. When functional forms of data distributions are completely specified under the hypothesis to be tested the parametric likelihood approach is unarguably a powerful tool that can provide optimal statistical inference. In such cases, by virtue of the Neyman-Pearson lemma, the likelihood ratio tests yield the most powerful decision rules (e.g., Lehmann and Romano, 2005; Vexler and Wu, 2009; Vexler et al., 2010a). However, the parametric likelihood methods cannot be applied properly if assumptions on the forms of distributions of data do not hold. Often, in the context of likelihood applications the use of misspecified parametric forms of data distributions may result in wrong and/or inaccurate statistical conclusions. It is also well-known that when these key assumptions are not met, the parametric approach may be extremely biased and inefficient when compared to robust nonparametric counterparts. In this paper, we provide a nonparametric strategy to approximate an optimal parametric likelihood ratio test-statistic via an empirical likelihood methodology.
Empirical likelihood (EL) concepts were introduced as nonparametric alternatives to parametric likelihood methods. The EL method for testing has been dealt with extensively across a variety of settings (e.g., Qin and Lawless, 1994; Owen, 2001; Lazar and Mykland, 1998; Lazar, 2003; Vexler et al., 2009, 2010b, 2011b, 2012b). Commonly, the EL function has the form $\prod_{i=1}^{n} p_i$, where the probability weights $p_i$, i = 1,…,n, satisfy the trivial assumptions $\sum_{i=1}^{n} p_i = 1$, $0 < p_i < 1$, i = 1,…,n, and the values of $p_i$, i = 1,…,n, are derived by maximizing the EL function under empirical constraints. For example, when we draw an i.i.d. sample X1,…,Xn under the null hypothesis that EX1 = 0, the corresponding empirical constraint is $\sum_{i=1}^{n} p_i X_i = 0$. Recent uses of the EL technique have been developed for hypothesis tests regarding various parameters of distributions, such as moments (e.g., Qin and Lawless, 1994; Owen, 2001; see Vexler and Yu, 2011, for other examples).
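As an illustration of the classical EL construction for a mean, the sketch below maximizes the product of the weights subject to the two constraints (weights summing to one, and weighted mean equal to the hypothesized value). It relies on the standard Lagrangian solution p_i = 1/(n(1 + λ(X_i − μ0))), with λ found by bisection. The function names `el_weights` and `el_log_ratio` and the toy data are assumptions made for this example.

```python
import math

def el_weights(x, mu0=0.0, iters=200):
    """EL weights for H0: E(X) = mu0, via bisection on the Lagrange multiplier."""
    n = len(x)
    z = [xi - mu0 for xi in x]
    if not (min(z) < 0 < max(z)):
        raise ValueError("mu0 must lie strictly inside the range of the data")

    def g(lam):
        # Constraint equation: sum z_i / (1 + lam * z_i) = 0 (decreasing in lam).
        return sum(zi / (1 + lam * zi) for zi in z)

    # lam must keep every 1 + lam * z_i positive.
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    p = [1.0 / (n * (1 + lam * zi)) for zi in z]
    return p, lam

def el_log_ratio(p):
    """Log EL ratio: log of prod(n * p_i); it is 0 when all p_i equal 1/n."""
    n = len(p)
    return sum(math.log(n * pi) for pi in p)

x = [-1.2, 0.4, 0.9, -0.3, 1.5, 0.1]
p, lam = el_weights(x, mu0=0.0)
print(p, lam, el_log_ratio(p))
```

The log ratio is never positive, because the unconstrained maximum of the product is attained at p_i = 1/n.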
Recently, several papers have introduced the density-based EL (dbEmpLike) approach for creating nonparametric test statistics that approximate parametric Neyman-Pearson statistics (e.g., Vexler and Gurevich, 2010; Gurevich and Vexler, 2011; Vexler et al., 2012a). The dbEmpLike method proposes to consider the likelihood in the form of
$$L_f = \prod_{i=1}^{n} f(X_i) = \prod_{j=1}^{n} f(X_{(j)}) = \prod_{j=1}^{n} f_j, \qquad f_j = f(X_{(j)}),$$
where f (·) is a density function of observations X1,…,Xn, and X(1) ≤…≤ X(n) are the order statistics based on X1,…, Xn. The dbEmpLike approach is a technique to approximate values of fj via maximization of Lf given a constraint related to the empirical version of the density property of the form ∫f (X)dX = 1. The dbEmpLike method was applied successfully to construct powerful test procedures for goodness-of-fit (e.g., Vexler and Gurevich, 2010; Vexler et al., 2011; Karagrigoriou, 2012) and two-sample problems (e.g., Gurevich and Vexler, 2011; Vexler, Tsai et al., 2012a). The dbEmpLike ratio tests have significantly improved power as compared to the corresponding classical procedures, each with various forms of alternative hypotheses. Miecznikowski et al. (2013) developed an R package (R Development Core Team) of statistical procedures based on the dbEmpLike methodology.
We introduce a simple dbEmpLike ratio test with high and stable power for detecting general cases of dependence. The proposed method is distribution-free, robust to model structures and highly efficient. Consider the simple example where we assume X1,…, X15 are i.i.d. observations from a standard normal distribution and Y1,…,Y15 are defined as ,i = 1, …,15. In this case, the classical Pearson, Spearman, and Kendall tests based on {(Xi,Yi), i = 1,…,15} for independence show powers of 0.36, 0.13, 0.21 at the significance level of 5%, whereas the proposed test provides power of 0.99. One advantage of the EL-based technique is that by applying the dbEmpLike approach we avoid having to specify a specific dependence structure, and hence we can efficiently detect linear, non-linear, and/or random-effect forms of dependence. An extensive Monte Carlo study is employed to support this conclusion. We observe that the proposed test either has relatively small power loss or is comparable to classical tests for independence when data has an exact linear dependence. On the other hand, in the cases of non-monotonic and/or random-effect type structures of the dependence, the proposed test is superior to the classical methods in terms of power gains. The asymptotic consistency of the proposed test is presented.
The paper is organized as follows. In Section 2, we introduce the dbEmpLike ratio test statistic. An extensive Monte Carlo comparison between the proposed test and the classical procedures is shown in Section 3. We discuss the performance of the tests under different alternative designs involving linear, non-linear and/or random-effect type forms of dependence. We show that the proposed test outperforms the classical tests in most of the scenarios we considered. In Section 4 the proposed test is applied to a biomarker study associated with myocardial infarction (MI) disease. The data set was collected from a sample of randomly selected residents of Erie and Niagara counties at the age of 35 to 79 years. This study examined the diagnostic ability of the thiobarbituric acid-reactive substances (TBARS) biomarker for MI disease by indirectly testing for independence between the TBARS biomarker values and the other measurements related to vitamin E and cholesterol biomarkers. The epidemiological literature indicates significant associations between the biomarkers vitamin E, cholesterol, and the MI disease. We conclude with remarks in Section 5.
2. METHOD
In this section, we develop the dbEmpLike test of association between random variables X and Y. The asymptotic consistency of the proposed test is presented. The null distribution of the test statistic is examined in detail, and we prove that the test offers exact control of the Type I error rate.
2.1 Development of the test statistic
Assume we observe a sample of i.i.d. two-dimensional random vectors (X1,Y1),…,(Xn,Yn). The joint distribution function of (X, Y) is denoted by FXY, while FX(x) and FY (y) denote the marginal distribution functions of X and Y, respectively. Consider the problem of testing the null hypothesis of bivariate independence
$$H_0: F_{XY}(x, y) = F_X(x)\,F_Y(y) \quad \text{for all } x, y \in \mathbb{R}$$
against a wide class of alternatives, where FXY, FX, and FY are absolutely continuous with corresponding density functions fXY, fX, and fY, respectively. The forms of the distribution and density functions are assumed to be unknown. In the case of completely specified forms of the density functions, the likelihood ratio test statistic is given by
$$L = \frac{\prod_{i=1}^{n} f_{XY}(X_i, Y_i)}{\prod_{i=1}^{n} f_X(X_i)\, f_Y(Y_i)} = \prod_{i=1}^{n} \frac{f_{XY}(X_{t(i)}, Y_{(i)})}{f_X(X_{t(i)})\, f_Y(Y_{(i)})},$$
where Y(1)≤…≤Y(n) are the order statistics based on the observations {Y1,…,Yn}; Xt(i) is the concomitant of the i-th order statistic (e.g., David and Nagaraja, 2003), i.e., the couple (Xt(i),Y(i)) belongs to (X1,Y1),…,(Xn,Yn). Our focus is on the nonparametric approximation to the likelihood L given above by applying the dbEmpLike methodology. We begin by evaluating the H0-likelihood , the denominator of the likelihood ratio L. In the empirical likelihood approach, values of
can be estimated by maximizing the likelihood with respect to fi,i = 1,…,n, given an empirical constraint to control the assumption ∫fY (u)du = 1 under H0. We obtain a corresponding empirical form of the constraint ∫fY (u)du = 1 into which fi,i = 1,…,n are convoluted. By virtue of Proposition 2.1 in Gurevich and Vexler (2011) one can show (for details see Appendix A) that for all positive integers, m < n/2 and r < n/2,
| (1) |
where X(1) ≤…≤ X(n) denote the order statistics based on the observations {X1,…,Xn}; si is an integer number such that X(si) = Xt(i);
| (2) |
X(si+r) = X(n), if si + r > n; X(si−r) = X(1), if si − r < 1; FXn denotes the empirical distribution function based on (X1,…, Xn), Fn (x, y) denotes the corresponding bivariate empirical distribution function based on (X1,Y1),…,(Xn,Yn), and β1 ∈ (0,0.5). In this case, following Crouse (1966), Fn (x, y) is defined as
$$F_n(x, y) = n^{-1} \sum_{i=1}^{n} I(X_i \le x,\; Y_i \le y),$$
where I(·) denotes the indicator function.
In order to find the values for each fi,i = 1,…,n, that maximize the likelihood under the empirical constraint
implied by Equation (1), we can apply the corresponding Lagrangian function
where λ is a Lagrange multiplier. Calculating roots of ∂ℓ/∂fi = 0,i = 1,…,n, one can show that
Thus, the dbEmpLike estimator of the likelihood ratio has the form of
| (3) |
Note that the term n^{−β1}, 0 < β1 < 0.5, in definition (2) ensures the consistency of the proposed test as is shown in the proof of Proposition 1 below (see also Appendix A for details). In Equation (2), FXn (X(si+r))−FXn (X(si−r)) takes on the values of 2r/n, (n−si+r)/n, 1−1/n, and (si+r−1)/n, when (si+r<n,si−r>1), (si+r>n,si−r>1), (si+r>n,si−r<1), and (si+r<n,si−r<1), respectively. The form of the test-statistic at (3) depends on values of the integer parameters m and r involved in the Δ̃i (m,r) -construction. Following the density-based EL literature (e.g., Vexler and Gurevich, 2010, p. 5; Gurevich and Vexler, 2011, pp. 5, 10–11), we can estimate m and r via the EL methodology obtaining the test statistic
| (4) |
where γn = min(n^{0.9}, n/2), 0.75 < β2 < 0.9 and Δ̃i (m,r) is defined in Equation (2). The conditions on m and r at definition (4) are sufficient to guarantee the proposed test is an asymptotic power one procedure as stated in Proposition 1.
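The boundary convention for the order statistics, X(si+r) = X(n) when si + r > n and X(si−r) = X(1) when si − r < 1, can be checked numerically. The sketch below assumes continuous data without ties, so that the empirical distribution function evaluated at the k-th order statistic equals k/n; the function name `ecdf_increment` is hypothetical.

```python
from fractions import Fraction

def ecdf_increment(s, r, n):
    """F_Xn(X_(s+r)) - F_Xn(X_(s-r)) with indices clamped to [1, n].

    Assumes no ties, so that F_Xn(X_(k)) = k / n.
    """
    upper = min(s + r, n)  # X_(s+r) is replaced by X_(n) when s + r > n
    lower = max(s - r, 1)  # X_(s-r) is replaced by X_(1) when s - r < 1
    return Fraction(upper - lower, n)

n, r = 10, 3
for s in (5, 9, 2):
    print(s, ecdf_increment(s, r, n))
```

Exact rational arithmetic is used here only so that the increments can be compared with the closed-form values without rounding concerns.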
Various Monte Carlo experiments based on more than one thousand different scenarios of (X,Y)-distributions and a variety of fixed sample sizes n showed that the statistic ṼTn (m,r) reaches its maximum with respect to m ≥ 0.5n^{β2} and r ≥ 0.5n^{β2} at m = 0.5n^{β2} and r = 0.5n^{β2} for all the experiments. Thus, we obtain the simple test statistic
| (5) |
where the function [x] denotes the nearest integer to x.
Accordingly, the decision rule of the proposed test is to reject the null hypothesis if log(VTn)>Cα, where Cα is an α-level test threshold. The proposed test is exact, as shown in Section 2.2 below. Note that extensive Monte Carlo simulations confirmed the robustness of the proposed test with respect to the values of β1 at (2) and β2, i.e. one can show that the power of the new test does not depend significantly on values of β1∈ (0,0.5) and β2∈ (0.75,0.9) under various scenarios of alternative distributions applied to the hypothesis of bivariate independence. Thus, without loss of generality, we set β1 and β2 in the test statistic to be 0.45 and 0.8, respectively.
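The choice of tuning parameters and the decision rule can be sketched as follows. The code assumes m = r = [0.5 n^{β2}] with β2 = 0.8, where [x] is the nearest integer, and uses a few 5% critical values copied from Table 1. The function names are hypothetical, and the value of log(VTn) is taken as an input because the statistic itself is not computed in this sketch.

```python
import math

BETA2 = 0.8  # value of beta_2 used in the paper

def nearest_int(x):
    """Nearest integer to a positive x (halves rounded up)."""
    return math.floor(x + 0.5)

def tuning_parameter(n, beta2=BETA2):
    """Common value of m and r, assumed to be [0.5 * n**beta2]."""
    return nearest_int(0.5 * n ** beta2)

# 5% critical values of log(VT_n), copied from Table 1 for selected n.
CRIT_05 = {20: 14.7061, 30: 20.4089, 50: 32.9714, 100: 62.0054}

def reject_independence(log_vt, n, crit=CRIT_05):
    """Reject H0 when log(VT_n) exceeds the tabulated critical value."""
    return log_vt > crit[n]

for n in CRIT_05:
    print(n, tuning_parameter(n), CRIT_05[n])
```

For sample sizes not listed in Table 1, the critical value would have to be approximated by Monte Carlo simulation under the null hypothesis, as described in Section 2.2.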
Remark
The test statistic defined at (5) has a dbEmpLike ratio type form. Note that one can simplify this test statistic by taking into account that the product of all the denominators from Equation (2) is simply a constant dependent on n.
Let EH0 denote the expectation under H0 and EH1 denote the expectation under the alternative hypothesis H1: FXY (x, y) ≠ FX (x)FY (y), for some x, y ∈ ℝ. Assuming that
the following proposition demonstrates the consistency of the proposed test.
Proposition 1
Let (X,Y) be a random bivariate vector with the absolutely continuous distribution function FXY (x, y) defined on ℝ² and with the corresponding marginal distribution functions FX (x) and FY (y). Assume that the expectations E(log(fX (X1))), E(log(fY (Y1))) and E(log(fXY (X1, Y1))) are finite. Then, under H0,
Proof
The proof of this proposition is outlined in Appendix B.
2.2 Null distribution
In this section, we show that the proposed test statistic is distribution-free under H0. We then present the critical values for the proposed test for different sample sizes. The test statistic VTn depends only on the empirical distribution functions FXn and Fn, which in turn depend only on certain indicator functions. For example, for fixed x and y, we have
$$I(X \le x,\; Y \le y) = I\big(F_X(X) \le F_X(x),\; F_Y(Y) \le F_Y(y)\big) = I(U_1 \le u_1,\; U_2 \le u_2),$$
where U1, U2 are Uniform(0, 1) distributed, u1 = FX(x) and u2 = FY(y). (For details regarding distribution free test constructions based on Fn see Crouse, 1966). Hence, it follows that
Therefore, the proposed method is distribution-free. Moreover the critical values for the dbEmpLike test can be accurately approximated using Monte Carlo techniques. In order to tabulate the percentiles of the null distribution of the test statistic log(VTn) with β1 = 0.45 and β2 = 0.8, we drew 50,000 samples of X1,…, Xn ~Uniform[0,1] and Y1,…,Yn ~Uniform[0,1] calculating values of log(VTn) at each sample size n. The generated values of the test statistic log(VTn) were used to determine the critical values Cα of the null distribution of log(VTn) at the significance level α. The results of this Monte Carlo study are presented in Table 1.
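The distribution-free property can be illustrated numerically: the bivariate empirical distribution function evaluated at the sample points depends on the data only through order comparisons, so it is unchanged by strictly increasing transformations of each coordinate. The sketch below assumes the standard form of the bivariate empirical distribution function, the proportion of sample points with X_i ≤ x and Y_i ≤ y; the function name and the toy data are hypothetical.

```python
import math

def bivariate_ecdf_at_sample(xs, ys):
    """F_n(X_k, Y_k) for each sample point, with F_n the bivariate ECDF."""
    n = len(xs)
    return [
        sum(1 for a, b in zip(xs, ys) if a <= x and b <= y) / n
        for x, y in zip(xs, ys)
    ]

xs = [0.3, -1.2, 2.5, 0.0, 1.1]
ys = [4.0, 0.5, -0.7, 2.2, 1.9]

original = bivariate_ecdf_at_sample(xs, ys)

# Strictly increasing transformations of each margin leave the values unchanged.
transformed = bivariate_ecdf_at_sample(
    [math.exp(x) for x in xs], [y ** 3 for y in ys]
)
print(original)
print(transformed)
```

Because any continuous marginal distribution can be mapped to Uniform(0, 1) by such a transformation, simulating the null distribution with uniform samples is sufficient.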
Table 1.
Critical Values of the Proposed Test Statistic
| Sample size n | α = 0.2 | α = 0.1 | α = 0.05 | α = 0.01 |
|---|---|---|---|---|
| 5 | 3.7592 | 3.7982 | 4.0709 | 4.3405 |
| 7 | 5.7211 | 5.9138 | 6.0390 | 6.3010 |
| 10 | 7.4321 | 7.6529 | 7.8549 | 8.2521 |
| 15 | 10.8383 | 11.1268 | 11.3766 | 11.8930 |
| 17 | 11.6964 | 11.9860 | 12.2427 | 12.7964 |
| 20 | 14.0863 | 14.4149 | 14.7061 | 15.3051 |
| 23 | 15.6590 | 15.9941 | 16.2962 | 16.9399 |
| 25 | 16.6106 | 16.9430 | 17.2632 | 17.9485 |
| 30 | 19.7139 | 20.0724 | 20.4089 | 21.1245 |
| 35 | 22.7744 | 23.1606 | 23.5258 | 24.2849 |
| 40 | 25.7939 | 26.1989 | 26.5672 | 27.3895 |
| 45 | 28.7824 | 29.2060 | 29.6057 | 30.4312 |
| 50 | 32.1148 | 32.5648 | 32.9714 | 33.8318 |
| 60 | 37.9011 | 38.3995 | 38.8356 | 39.7367 |
| 70 | 43.6649 | 44.1765 | 44.6390 | 45.6783 |
| 80 | 49.4025 | 49.9159 | 50.3733 | 51.3542 |
| 90 | 55.3020 | 55.8526 | 56.3487 | 57.4026 |
| 100 | 60.9451 | 61.4996 | 62.0054 | 63.0367 |
An R function (R Development Core Team, 2012) for Monte Carlo approximations for the critical values Cα of the null distribution of log(VTn) is available as online supplementary material. This function can be easily modified to execute the proposed test based on real data. Note that, since the statistic log(VTn) is distribution-free under the null hypothesis, Monte Carlo or Bootstrap type procedures can be employed to estimate different characteristics of the distribution of log(VTn) under H0, e.g., its variance.
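The Monte Carlo procedure for approximating critical values of a distribution-free statistic can be sketched in Python as follows. This is not the authors' R function: since log(VTn) is not implemented in this sketch, the absolute Spearman coefficient serves as a stand-in statistic, and the names `abs_spearman` and `mc_critical_values` are hypothetical.

```python
import math
import random

def ranks(v):
    """1-based ranks, assuming continuous data without ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def abs_spearman(x, y):
    """Stand-in distribution-free statistic (NOT the dbEmpLike statistic)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return abs(1 - 6 * d2 / (n * (n * n - 1)))

def mc_critical_values(stat, n, alphas, reps=5000, seed=1):
    """Upper-alpha quantiles of stat under H0, simulated with uniform margins."""
    rng = random.Random(seed)
    values = sorted(
        stat([rng.random() for _ in range(n)], [rng.random() for _ in range(n)])
        for _ in range(reps)
    )
    out = {}
    for alpha in alphas:
        k = min(reps - 1, math.ceil((1 - alpha) * reps) - 1)
        out[alpha] = values[k]
    return out

crit = mc_critical_values(abs_spearman, n=20, alphas=(0.2, 0.1, 0.05, 0.01), reps=2000)
print(crit)
```

Replacing `abs_spearman` with an implementation of log(VTn) and increasing the number of replications would follow the same scheme used to produce a table of critical values.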
3. SIMULATION STUDY
We carried out an extensive Monte Carlo study to evaluate the performance of the proposed test. Lehmann (1975) noted that “the study of the power and efficiency of tests of independence is complicated by the difficulty of defining natural classes of alternatives to the hypothesis of independence”. Furthermore, Kallenberg and Ledwina (1999) pointed out that power comparisons of tests of independence are rarely performed in statistical literature. These comments indicate the difficulty of implementing a Monte Carlo study for the power of tests of independence.
In this study we considered general forms of dependence that are commonly encountered in bivariate data. That is, we consider linear and nonlinear forms of dependence between X and Y, including random-effect type structures of dependence. Following Kallenberg and Ledwina (1999), we analyzed three groups of alternatives: Group 1: non-linear correlation of X and Y; Group 2: linear correlation of X and Y; and Group 3: different bivariate distributions as alternatives to independence, including Pearson Type VII, Morgenstern, Plackett, and Cauchy distributions as described in Johnson (1987). Both Groups 1 and 2 include random-effects models. Table 2 displays formal definitions of the designs applied in the Monte Carlo experiments. A Pearson Type VII distribution represents symmetric distributions of X and Y, where random variables X and Y are uncorrelated but dependent.
Table 2.
Distributions for (X,Y) used in the power study
| Group | Design | Xi, i = 1,…,n | Yi, i = 1,…,n |
|---|---|---|---|
| Group 1 (non-linear) | Design 1.1 | N(0,1) | , iid εi ~ N(0,1) |
| Group 1 (non-linear) | Design 1.2 | N(0,1) | , iid γi, εi ~ N(0,1) |
| Group 1 (non-linear) | Design 1.3 | N(0,1) | log(1 + \|Xi\|) |
| Group 1 (non-linear) | Design 1.4 | N(0,1) | log(1 + \|Xi\|)γi, γi ~ N(0,1) |
| Group 1 (non-linear) | Design 1.5 | N(0,1) | 2 + 0.1εi/Xi, εi ~ N(0,1) |
| Group 2 (linear) | Design 2.1 | Lognormal(0, 1) | 1 + γi Xi, γi ~ N(0,1) |
| Group 2 (linear) | Design 2.2 | Lognormal(0, 1) | 1 + 0.1Xi + 4γi Xi + εi, γi, εi ~ N(0,1) |
| Group 2 (linear) | Design 2.3 | N(0,1) | 2 + 0.1Xi + εi, εi ~ N(0,1) |
| Group 2 (linear) | Design 2.4 | U[0,1] | 2 + 0.5Xi + εi, εi ~ N(0,1) |
| Group 2 (linear) | Design 2.5 | U[0,1] | 2 + 0.5Xi + γi Xi + εi, εi ~ N(0,1), γi ~ N(0, 2²) |
| Group 2 (linear) | Design 2.6 | U[0,1] | 2 + Xi + εi, εi ~ N(0,1) |
| Group 2 (linear) | Design 2.7 | U[0,1] | 2 + Xi + γi Xi + εi, εi ~ N(0,1), γi ~ N(0, 2²) |
| Group 3 (bivariate distributions) | Design 3.1 | Morgenstern (α = 1) | Reference Johnson (1987), pp. 180–190 |
| Group 3 (bivariate distributions) | Design 3.2 | Plackett (ψ = 3.5) | Reference Johnson (1987), pp. 191–197 |
| Group 3 (bivariate distributions) | Design 3.3 | Pearson Type VII (m = 1.1) with μ = 0 and Σ = I | Reference Johnson (1987), pp. 117–121 |
| Group 3 (bivariate distributions) | Design 3.4 | The multivariate Cauchy distribution | Reference Johnson (1987), p. 44 |
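Several of the designs in Table 2 are simple enough to simulate directly. The sketch below generates paired samples for Designs 1.3, 1.4, 2.1, 2.3, 2.4, and 2.6 using only the Python standard library. The dictionary layout and the function name `simulate` are assumptions made for this example, and designs whose formulas are not fully legible in the table are deliberately omitted.

```python
import math
import random

# How X is drawn in each design (taken from the X column of Table 2).
X_SAMPLER = {
    "1.3": lambda rng: rng.gauss(0, 1),
    "1.4": lambda rng: rng.gauss(0, 1),
    "2.1": lambda rng: rng.lognormvariate(0, 1),
    "2.3": lambda rng: rng.gauss(0, 1),
    "2.4": lambda rng: rng.random(),
    "2.6": lambda rng: rng.random(),
}

# How Y is generated from X in each design (Y column of Table 2).
Y_GIVEN_X = {
    "1.3": lambda x, rng: math.log(1 + abs(x)),
    "1.4": lambda x, rng: math.log(1 + abs(x)) * rng.gauss(0, 1),
    "2.1": lambda x, rng: 1 + rng.gauss(0, 1) * x,
    "2.3": lambda x, rng: 2 + 0.1 * x + rng.gauss(0, 1),
    "2.4": lambda x, rng: 2 + 0.5 * x + rng.gauss(0, 1),
    "2.6": lambda x, rng: 2 + x + rng.gauss(0, 1),
}

def simulate(design, n, rng):
    """Return lists (x, y) of length n drawn from the named design."""
    x = [X_SAMPLER[design](rng) for _ in range(n)]
    y = [Y_GIVEN_X[design](xi, rng) for xi in x]
    return x, y

rng = random.Random(0)
x13, y13 = simulate("1.3", 50, rng)
x21, y21 = simulate("2.1", 50, rng)
x24, y24 = simulate("2.4", 50, rng)
print(x13[:3], y13[:3])
```

Samples generated this way can be passed to any test of independence to approximate its power by repeated simulation.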
In this section we also examine numerically the power of a test proposed by Einmahl and McKeague (2003). Those authors constructed a test statistic by localizing the empirical likelihood using one or more ‘time’ variables implicit in the null hypothesis and then forming integrals of the log-likelihood ratio statistic.
Table 3 shows the results of the power evaluations of the proposed test (“log(VTn)”), the classical tests (“Kendall”, “Pearson”, “Spearman”) and Einmahl and McKeague’s test (“EMcK”) via the Monte Carlo study based on 10,000 replications of X1,…, Xn and Y1,…,Yn for the designs corresponding to Groups 1–3 given in Table 2 at each sample size n. This study demonstrates that the dbEmpLike test is superior to the considered classical tests in most scenarios under the designs of Groups 1–3.
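The structure of such a power study can be sketched as follows: a critical value is simulated under independence, and the rejection rate is then estimated under an alternative. The example uses the absolute Kendall tau-a coefficient as a stand-in statistic and Design 2.6 as the alternative. It is an illustration under these assumptions and is not intended to reproduce the entries of Table 3; all function names are hypothetical.

```python
import random

def abs_kendall_tau(x, y):
    """Absolute Kendall tau-a (stand-in statistic, not log(VT_n))."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)
    return abs(s / (n * (n - 1) / 2))

def null_sample(n, rng):
    """Independent Uniform(0, 1) pairs, i.e. a sample under H0."""
    return [rng.random() for _ in range(n)], [rng.random() for _ in range(n)]

def design_2_6(n, rng):
    """Design 2.6 of Table 2: X ~ U[0,1], Y = 2 + X + eps, eps ~ N(0,1)."""
    x = [rng.random() for _ in range(n)]
    y = [2 + xi + rng.gauss(0, 1) for xi in x]
    return x, y

def critical_value(stat, n, alpha, reps, rng):
    """Approximate upper-alpha quantile of stat under H0."""
    values = sorted(stat(*null_sample(n, rng)) for _ in range(reps))
    k = min(reps - 1, int((1 - alpha) * reps))
    return values[k]

def mc_power(stat, crit, sampler, n, reps, rng):
    """Proportion of simulated samples in which stat exceeds crit."""
    hits = sum(stat(*sampler(n, rng)) > crit for _ in range(reps))
    return hits / reps

rng = random.Random(2024)
crit = critical_value(abs_kendall_tau, 20, 0.05, 2000, rng)
size = mc_power(abs_kendall_tau, crit, null_sample, 20, 2000, rng)
power = mc_power(abs_kendall_tau, crit, design_2_6, 20, 2000, rng)
print(crit, size, power)
```

Checking the estimated size against the nominal level, as done here with `size`, is a useful sanity check before comparing powers across tests.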
Table 3.
The Monte Carlo power of the tests.
| Test | Design 1.1, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 1.2, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.36 | 0.48 | 0.57 | 0.66 | 0.85 | 0.99 | 0.35 | 0.47 | 0.57 | 0.67 | 0.85 | 0.98 |
| Kendall | 0.13 | 0.14 | 0.15 | 0.16 | 0.18 | 0.22 | 0.11 | 0.11 | 0.12 | 0.13 | 0.13 | 0.13 |
| Pearson | 0.26 | 0.28 | 0.29 | 0.31 | 0.32 | 0.36 | 0.26 | 0.26 | 0.27 | 0.28 | 0.28 | 0.30 |
| Spearman | 0.12 | 0.13 | 0.14 | 0.15 | 0.17 | 0.19 | 0.09 | 0.10 | 0.10 | 0.10 | 0.11 | 0.11 |
| EMcK | 0.26 | 0.31 | 0.47 | 0.56 | 0.81 | 0.95 | 0.19 | 0.29 | 0.37 | 0.48 | 0.74 | 0.90 |

| Test | Design 1.3, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 1.4, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 0.33 | 0.43 | 0.54 | 0.64 | 0.86 | 0.97 |
| Kendall | 0.23 | 0.23 | 0.23 | 0.24 | 0.24 | 0.24 | 0.12 | 0.12 | 0.13 | 0.13 | 0.13 | 0.13 |
| Pearson | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.20 | 0.20 | 0.20 | 0.20 |
| Spearman | 0.14 | 0.14 | 0.14 | 0.15 | 0.15 | 0.15 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 |
| EMcK | 0.78 | 0.89 | 0.97 | 0.99 | 1.00 | 1.00 | 0.10 | 0.13 | 0.14 | 0.14 | 0.18 | 0.25 |

| Test | Design 1.5, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 2.1, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.12 | 0.15 | 0.23 | 0.36 | 0.76 | 0.94 | 0.51 | 0.65 | 0.76 | 0.87 | 0.97 | 1.00 |
| Kendall | L | L | L | L | L | L | 0.09 | 0.09 | 0.10 | 0.10 | 0.11 | 0.11 |
| Pearson | L | L | L | L | L | L | 0.46 | 0.48 | 0.50 | 0.51 | 0.55 | 0.57 |
| Spearman | L | L | L | L | L | L | 0.09 | 0.09 | 0.09 | 0.10 | 0.10 | 0.10 |
| EMcK | 0.03 | 0.04 | 0.07 | 0.08 | 0.19 | 0.61 | 0.33 | 0.52 | 0.70 | 0.77 | 0.91 | 0.97 |

| Test | Design 2.2, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 2.3, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.44 | 0.57 | 0.68 | 0.80 | 0.93 | 1.00 | 0.05 | 0.06 | 0.07 | 0.08 | 0.08 | 0.12 |
| Kendall | 0.09 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 | 0.12 |
| Pearson | 0.44 | 0.49 | 0.50 | 0.50 | 0.55 | 0.56 | 0.06 | 0.07 | 0.08 | 0.08 | 0.10 | 0.13 |
| Spearman | 0.08 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.06 | 0.07 | 0.07 | 0.08 | 0.09 | 0.12 |
| EMcK | 0.28 | 0.45 | 0.58 | 0.70 | 0.90 | 0.98 | 0.06 | 0.07 | 0.07 | 0.07 | 0.08 | 0.11 |

| Test | Design 2.4, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 2.5, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.08 | 0.09 | 0.10 | 0.11 | 0.12 | 0.14 | 0.14 | 0.15 | 0.17 | 0.20 | 0.24 | 0.35 |
| Kendall | 0.08 | 0.09 | 0.11 | 0.11 | 0.16 | 0.20 | 0.06 | 0.07 | 0.08 | 0.09 | 0.11 | 0.12 |
| Pearson | 0.09 | 0.10 | 0.11 | 0.12 | 0.17 | 0.21 | 0.07 | 0.08 | 0.09 | 0.10 | 0.12 | 0.14 |
| Spearman | 0.09 | 0.10 | 0.11 | 0.11 | 0.16 | 0.21 | 0.07 | 0.07 | 0.08 | 0.09 | 0.11 | 0.12 |
| EMcK | 0.12 | 0.14 | 0.14 | 0.15 | 0.16 | 0.17 | 0.10 | 0.11 | 0.11 | 0.14 | 0.20 | 0.30 |

| Test | Design 2.6, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 2.7, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.16 | 0.17 | 0.20 | 0.21 | 0.26 | 0.38 | 0.19 | 0.21 | 0.23 | 0.26 | 0.28 | 0.45 |
| Kendall | 0.18 | 0.23 | 0.30 | 0.33 | 0.46 | 0.62 | 0.12 | 0.13 | 0.16 | 0.18 | 0.25 | 0.33 |
| Pearson | 0.22 | 0.26 | 0.32 | 0.36 | 0.50 | 0.65 | 0.13 | 0.16 | 0.18 | 0.20 | 0.26 | 0.34 |
| Spearman | 0.20 | 0.24 | 0.31 | 0.34 | 0.47 | 0.62 | 0.13 | 0.14 | 0.16 | 0.18 | 0.24 | 0.32 |
| EMcK | 0.18 | 0.23 | 0.24 | 0.30 | 0.44 | 0.57 | 0.14 | 0.17 | 0.19 | 0.20 | 0.24 | 0.40 |

| Test | Design 3.1 (α = 1), n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 3.2 (ψ = 3.5), n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.24 | 0.31 | 0.34 | 0.43 | 0.63 | 0.75 | 0.58 | 0.68 | 0.79 | 0.88 | 0.98 | 1.00 |
| Kendall | 0.26 | 0.33 | 0.41 | 0.45 | 0.64 | 0.80 | 0.59 | 0.70 | 0.81 | 0.87 | 0.96 | 0.99 |
| Pearson | 0.30 | 0.38 | 0.45 | 0.50 | 0.68 | 0.82 | 0.54 | 0.63 | 0.72 | 0.79 | 0.91 | 0.98 |
| Spearman | 0.28 | 0.35 | 0.42 | 0.49 | 0.66 | 0.81 | 0.58 | 0.69 | 0.77 | 0.84 | 0.94 | 0.99 |
| EMcK | 0.23 | 0.30 | 0.34 | 0.45 | 0.62 | 0.73 | 0.59 | 0.68 | 0.80 | 0.89 | 0.97 | 0.98 |

| Test | Design 3.3, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 | Design 3.4, n = 20 | n = 25 | n = 30 | n = 35 | n = 50 | n = 70 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| log(VTn) | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 0.23 | 0.30 | 0.38 | 0.44 | 0.65 | 0.82 |
| Kendall | 0.19 | 0.20 | 0.20 | 0.20 | 0.21 | 0.22 | 0.11 | 0.12 | 0.12 | 0.13 | 0.13 | 0.13 |
| Pearson | 0.91 | 0.92 | 0.93 | 0.93 | 0.94 | 0.94 | 0.57 | 0.60 | 0.64 | 0.67 | 0.71 | 0.75 |
| Spearman | 0.13 | 0.13 | 0.14 | 0.14 | 0.14 | 0.14 | 0.09 | 0.09 | 0.09 | 0.10 | 0.11 | 0.11 |
| EMcK | 0.23 | 0.27 | 0.32 | 0.36 | 0.62 | 0.93 | 0.12 | 0.12 | 0.13 | 0.15 | 0.18 | 0.25 |

Note: The notation L indicates a Monte Carlo power that is less than 0.007.
The power differences between the novel test and the classical tests become more substantial as the sample size increases. For example, when the sample sizes are n=30 and n=35 the power of the proposed test is roughly two times larger than that of the classical tests given Group 1 alternatives. Under the designs 2.1 and 2.2 it is clear that the proposed test dramatically outperforms the classical tests. In these cases, the proposed test has approximately a 10%–60% power gain as compared to the classical procedures when n ≥ 25.
In the case of design 2.3, where the linear model of dependence is dominant, the proposed test is comparable to the classical tests. In particular, note that for the sample size n ≥ 35 the proposed test provides the same Monte Carlo power as that of the classical procedures. In these scenarios, it is anticipated that the Pearson correlation test has higher power than the other considered tests, since the Pearson correlation test was designed to detect linear forms of dependence between two random variables. Note that in the situations where random-effect type dependence structures are present (e.g., the designs 1.2 and 2.2) the proposed test clearly outperforms the classical tests. Designs 2.4–2.7 consider scenarios based on mixed linear models as a function of an increasing linear fixed effect. In these cases, when the models do not include the random component γ, the Pearson correlation test can be recommended as a very efficient test. However, when the random effect γ is incorporated, the dbEmpLike test has higher power than the other considered tests.
The conclusions regarding the power behavior of the tests under the designs 3.1 and 3.2 are similar to those related to the Monte Carlo simulations under the design 2.3. Under the designs given at 3.3 and 3.4 the proposed test has a considerable power gain in comparison to the Spearman and Kendall correlation tests (design 3.3) and the Spearman, Kendall correlation tests and the EMcK test (design 3.4), respectively.
Although Einmahl and McKeague’s test proved very powerful in capturing various non-linear and linear dependence structures, the proposed dbEmpLike test outperformed it in many cases.
We conducted a brief evaluation of the data-driven rank tests for independence proposed in Kallenberg and Ledwina (1999). In order to develop these tests it was assumed that the observed samples are distributed following the joint density function given as
where bj denotes the jth orthonormal Legendre polynomial, θ = (θ1, θ2, …)^T and c(θ) is a normalizing constant. Kallenberg and Ledwina proposed the score tests for testing θ = 0 against θ ≠ 0. They assumed exponential families of data distributions corresponding to a very wide class of alternatives under which it is most probable that model-free tests for independence have less power than the data-driven rank tests. Note that one can demonstrate that many familiar classes of distributions are not exponential families (e.g., Klauer, 1986). For example, we generated (Xi,Yi), i = 1,…,50 from the reciprocal-normal type distribution, using the R-command (R Development Core Team, 2012): 1/mvrnorm(n, rep(0, 2), Sigma), where . In this case the data-driven rank tests in the forms mentioned in Section 3 of Kallenberg and Ledwina (1999) gave the Monte Carlo powers 0.15 and 0.12 as compared to the Monte Carlo powers of 0.31, 0.15, 0.06, 0.12 and 0.13 obtained by using the tests “log(VTn)”, “Kendall”, “Pearson”, “Spearman” and “EMcK”, respectively. Note also that, by virtue of the tests’ structures, Kallenberg and Ledwina’s tests require EH1bj (FX (X))bs (FY (Y)) ≠ 0, for some j and s, to be consistent. Our Monte Carlo study showed that the data-driven rank tests had relatively low power when we generated, e.g., Xi ~ Unif [0,1] and Yi = arg minz Σs,jasj(bj(Xi)bs(z))2 (or Yi = arg minz(Σs,jasjbj(Xi)bs(z))2), where a11, a12,… are constants. For example, when Xi ~ Unif [0,1], , i = 1,…,50, the Monte Carlo powers had values of 0.69, 0.83 (the data-driven rank tests) and 1, 0.07, 0.37, 0.60, 1 (“log(VTn)”, “Kendall”, “Pearson”, “Spearman” and “EMcK” tests). Similarly, when Xi ~ Unif [0,1], Yi = arg minz(b1(Xi)b1(z) + 0.3b1(Xi)b2(z) + 0.5b2(Xi)b1(z) + b2(Xi)b2(z))2, 1 ≤ i ≤ 50, the results were 0.72, 0.66 (the data-driven rank tests) versus 1, 0.08, 0.26, 0.40, 1 (“log(VTn)”, “Kendall”, “Pearson”, “Spearman” and “EMcK” tests).
When $X_i \sim \mathrm{Unif}[0,1]$, $Y_i = \arg\min_z \sum_{j=1,2,4} (b_j(X_i) b_j(z))^2$, $i = 1, \ldots, 50$, the results were 0.58, 0.54 (the data-driven rank tests) versus 0.83, 0.02, 0.011, 0.03, 0.78 (“log(VTn)”, “Kendall”, “Pearson”, “Spearman” and “EMcK” tests).
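The arg-min designs above can be simulated along the following lines. This is a minimal Python sketch under stated assumptions: the Legendre polynomials are taken to be orthonormal on [0, 1], the minimization over z is restricted to [0, 1], and it is carried out by a plain grid search. None of these choices, nor the helper names, are confirmed by the source; they are illustrative only.

```python
import math
import random


def legendre_orthonormal(j, x):
    """j-th Legendre polynomial rescaled to be orthonormal on [0, 1]."""
    t = 2.0 * x - 1.0            # map [0, 1] onto [-1, 1]
    if j == 0:
        return 1.0
    p_prev, p_curr = 1.0, t      # P_0(t) and P_1(t)
    for k in range(1, j):
        # Bonnet recursion: (k + 1) P_{k+1} = (2k + 1) t P_k - k P_{k-1}
        p_prev, p_curr = p_curr, ((2 * k + 1) * t * p_curr - k * p_prev) / (k + 1)
    return math.sqrt(2 * j + 1) * p_curr


def argmin_response(x, orders=(1, 2, 4), grid_size=1001):
    """Grid-search minimizer over z in [0, 1] of sum_j (b_j(x) * b_j(z))^2."""
    weights = [legendre_orthonormal(j, x) ** 2 for j in orders]

    def objective(z):
        return sum(w * legendre_orthonormal(j, z) ** 2
                   for w, j in zip(weights, orders))

    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=objective)


def simulate_argmin_pairs(n=50, seed=None):
    """Generate (X_i, Y_i) with X_i ~ Unif[0, 1] and Y_i the arg-min response."""
    rng = random.Random(seed)
    return [(x, argmin_response(x)) for x in (rng.random() for _ in range(n))]
```

The grid resolution (`grid_size`) trades accuracy of the minimizer against run time; a numerical optimizer could be substituted if a finer solution is needed.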
Based on the Monte Carlo results, we conclude that the proposed test exhibits high and stable power characteristics in comparison to the well-known classical procedures. Specifically, the proposed test performs reasonably well, and is generally competitive with the classical tests in the cases of linear forms of dependence. On the other hand, the proposed test significantly outperforms the classical tests in terms of the power properties when detecting the nonlinear forms of bivariate dependence including random-effect type dependencies.
4. DATA ANALYSIS
In this section, we present a data example to illustrate the practical application of the proposed test. Thiobarbituric acid-reactive substances (TBARS) are commonly used in laboratory research as a summary measure of total circulating oxidative stress in individuals (Armstrong, 1994), but their use as a discriminant factor between individuals with and without myocardial infarction (MI) disease is still controversial (e.g., Schisterman et al., 2001). Some authors have found a positive association between TBARS and MI disease (e.g., Jayakumari et al., 1992; Miwa et al., 1995), while others did not find a corresponding significant association (e.g., Karmansky et al., 1996). The aim of this study is to investigate the discriminative properties of TBARS with regard to MI disease by indirectly evaluating an association between TBARS and MI disease. Towards this end, we carried out two separate groups of tests of independence: one between TBARS and high-density lipoprotein (HDL)-cholesterol, and the other between TBARS and vitamin E, where both biomarkers, HDL-cholesterol and vitamin E, are historically known to be significantly associated with MI disease (e.g., Schisterman et al., 2001). The results of these two groups of tests can therefore provide indirect evidence on the ability of TBARS, as a single biomarker, to discriminate between individuals with MI disease and healthy individuals.
A sample of randomly selected residents of Erie and Niagara counties, 35 to 79 years of age, was employed in this investigation. The New York State Department of Motor Vehicles drivers’ license rolls were used as the sampling frame for adults between the ages of 35 and 65, while the elderly sample (ages 65 to 79) was randomly selected from the Health Care Financing Administration database. The study evaluated 230 measurements of the TBARS, HDL-cholesterol and vitamin E biomarkers. Half of them were collected on cases, who had recently survived an MI, and the other half on controls, who had no previous MI. Table 4 presents the p-values obtained via the dbEmpLike, Pearson, Spearman and Kendall procedures for the case and control groups.
Table 4.
The p-values obtained via the proposed test and the classical procedures
| Test of independence | Groups | n | log(VTn) | Pearson | Spearman | Kendall |
|---|---|---|---|---|---|---|
| TBARS versus HDL-cholesterol | Control | 115 | 0.0147 | 0.0514 | 0.0709 | 0.0773 |
| TBARS versus HDL-cholesterol | Case | 115 | 0.0228 | 0.1619 | 0.05671 | 0.0730 |
| TBARS versus Vitamin E | Control | 115 | 0.0440 | 0.0508 | 0.1168 | 0.0911 |
| TBARS versus Vitamin E | Case | 115 | 0.0019 | 0.0977 | 0.0503 | 0.0844 |
All of the classical tests yield p-values above the 5% significance level, in several cases only slightly (the values range from 0.0503 to 0.1619). As a result, neither the dependence between the TBARS and HDL-cholesterol biomarkers nor the association between the TBARS and vitamin E biomarkers is detected by the classical procedures. In contrast, the proposed dbEmpLike test yields p-values below 5% in all four settings, providing evidence of an association between TBARS and HDL-cholesterol and of a significant dependence between TBARS and vitamin E. That is, in this example the dbEmpLike test is more sensitive than the classical methods in rejecting the null hypothesis of independence, both for TBARS versus HDL-cholesterol and for TBARS versus vitamin E.
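For a distribution-free statistic, critical values and p-values can be approximated by Monte Carlo simulation under independence, which is the kind of computation the article's R code footnote refers to. The sketch below is a generic Python illustration of that idea, not the authors' code. It assumes that large values of the statistic indicate dependence, and it uses |Kendall's τ| purely as a stand-in statistic, since the formula for log(VTn) is not reproduced in this section. All function names are hypothetical.

```python
import math
import random


def kendall_tau(xs, ys):
    """Kendall's tau for samples without ties (O(n^2) pair count)."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def monte_carlo_null(statistic, n, reps=2000, seed=None):
    """Sorted draws of `statistic` on independent Unif[0,1] samples of size n."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        u = [rng.random() for _ in range(n)]
        w = [rng.random() for _ in range(n)]
        draws.append(statistic(u, w))
    draws.sort()
    return draws


def critical_value(null_draws, alpha=0.05):
    """Empirical (1 - alpha) quantile of the sorted null draws."""
    k = min(len(null_draws) - 1, math.ceil((1.0 - alpha) * len(null_draws)) - 1)
    return null_draws[k]


def p_value(null_draws, observed):
    """Monte Carlo p-value with the usual +1 correction."""
    exceed = sum(d >= observed for d in null_draws)
    return (exceed + 1) / (len(null_draws) + 1)
```

Because the null distribution of a distribution-free statistic depends only on the sample size, the simulated draws can be computed once for a given n (for example n = 115) and reused across data sets.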
5. CONCLUDING REMARKS
In this article we proposed and developed a novel density-based empirical likelihood ratio test for independence of two random variables. The proposed test is distribution-free, simple and easily applied in practice. The new procedure has very favorable and robust power properties against linear and non-monotone forms of dependence, with or without random-effect type structures. Through extensive Monte Carlo simulation studies, we showed that the proposed test has significantly higher power than the classical Pearson, Spearman, and Kendall tests across a variety of scenarios. This study demonstrated that the proposed test can efficiently detect a broader class of dependence structures than the classical techniques can.
Acknowledgments
This research is supported by the NIH grant 1R03DE020851-01A1 (the National Institute of Dental and Craniofacial Research). The authors are grateful to the Editor, the Associate Editor and the referees for suggestions that led to a substantial improvement in this paper.
APPENDIX A: The empirical constraint (1)
Proposition 2.1 in Gurevich and Vexler (2011) shows that for all positive integers $m < n/2$,
where $Y_{(i+m)} = Y_{(n)}$ if $i + m > n$, and $Y_{(i-m)} = Y_{(1)}$ if $i - m < 1$. Since ,
| (A.1) |
It is clear that when $m/n \to 0$ as $m, n \to \infty$, we have . In the interest of economy of space we refer the reader to Vexler and Gurevich (2010, pp. 533–534), Gurevich and Vexler (2011), Vexler and Yu (2011), Vexler et al. (2011) and the “sample entropy” literature cited in these papers for more details regarding the equations above and the integer parameter $m$ at Equation (A.1).
Consider (A.1) in the form of
| (A.2) |
where the conditional density fY|X is denoted as
By virtue of the Mean Value Theorem we have
Thus, taking into account (A.2), we can constrain values of fi,i = 1,…, n, to satisfy
| (A.3) |
where . It follows straightforwardly that
The constraint at Equation (A.3) depends on the unknown theoretical distributions of the underlying observations. To obtain an empirical constraint corresponding to Equation (A.2), we need to estimate $\Delta_i$, $i = 1, \ldots, n$. Towards this end, let $X_{(1)} \le \cdots \le X_{(n)}$ denote the order statistics based on the observations $\{X_1, \ldots, X_n\}$ and let $s_i$ be an integer such that $X_{(s_i)} = X_{t(i)}$. Then it can be shown through the implementation of the dbEmpLike methodology that $\Delta_i$ can be approximated by
| (A.4) |
where $\tilde{\Delta}_i(m,r)$, an empirical estimator of $\Delta_i$, is defined in (2). This empirical estimation is based on sample entropy considerations; see, e.g., Vexler and Gurevich (2010) and Gurevich and Vexler (2011) for details. These papers extend the concept of estimating values of density functions using sample entropy (e.g., Vasicek, 1976), and present the resulting estimators as a consequence of the dbEmpLike approach. For example, $dP\{X \le x\}/dx|_{x = X_{t(i)}}$ in the definition of $\Delta_i$ can be approximated by $(F_X(X_{(s_i+r)}) - F_X(X_{(s_i-r)}))/(2r)$. The sample entropy and dbEmpLike literature shows that such approximations, applied to construct test statistics, are very efficient even when the observed samples have relatively small sizes.
Regarding the term $n^{-\beta_1}$, $0 < \beta_1 < 0.5$, in the definition (2) of $\tilde{\Delta}_i(m,r)$, we note that theoretically, in several situations, the values of the numerator of $\Delta_i$ can be of an order comparable with that of $n^{-0.5}$. In these cases this can lead to a bias in the estimation of $F_{XY}$ of an order greater than that of $F_{XY}$. In such scenarios the $n^{-\beta_1}$ term helps keep the numerator of $\Delta_i$ away from zero. When $F_{XY}$ is close to zero, the term $n^{-\beta_1}$ makes $\tilde{\Delta}_i(m,r)$ estimable.
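The sample-entropy idea cited here (Vasicek, 1976) rests on approximating a density at an order statistic by a ratio of spacings. The Python sketch below illustrates that standard m-spacing form together with Vasicek's entropy estimator, using the same boundary convention as Appendix A (indices beyond the sample are clamped to the extreme order statistics). It is an illustration of the general technique under the assumption of continuous data without ties and m < n/2; it is not the estimator $\tilde{\Delta}_i(m,r)$ of definition (2), and the function names are hypothetical.

```python
import math


def spacing_density(sample, m):
    """m-spacing density estimates at the order statistics.

    Uses f(X_(i)) ~ (2m / n) / (X_(i+m) - X_(i-m)), with X_(i+m) = X_(n)
    when i + m > n and X_(i-m) = X_(1) when i - m < 1.
    Assumes no ties, so every spacing is strictly positive.
    """
    xs = sorted(sample)
    n = len(xs)
    estimates = []
    for i in range(n):
        upper = xs[min(i + m, n - 1)]   # clamp at the largest observation
        lower = xs[max(i - m, 0)]       # clamp at the smallest observation
        estimates.append((2.0 * m / n) / (upper - lower))
    return xs, estimates


def vasicek_entropy(sample, m):
    """Vasicek's estimator: (1/n) * sum of log((n / (2m)) * spacing)."""
    xs, dens = spacing_density(sample, m)
    return -sum(math.log(d) for d in dens) / len(xs)
```

For an evenly spaced sample on [0, 1] the interior estimates equal 1, the uniform density, while the clamped boundary terms are somewhat larger; this boundary effect is one reason the choice of m matters in small samples.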
Thus, Equations (A.3) and (A.4) imply that the empirical constraint on the values of fYi,i = 1,…,n, is
APPENDIX B: PROOFS
This appendix comprises a proof scheme to establish Proposition 1.
Proof of Proposition 1
We first study the elements of $\tilde{\Delta}_i(m,r)$ defined in (2). Towards this end we denote $Q_{XY,i} = F_{XY}(X_{(s_i+r)}, Y_{(i+m)}) - F_{XY}(X_{(s_i-r)}, Y_{(i+m)}) - F_{XY}(X_{(s_i+r)}, Y_{(i-m)}) + F_{XY}(X_{(s_i-r)}, Y_{(i-m)}) + n^{-\beta_1}$, $Q_{n,i} = F_n(X_{(s_i+r)}, Y_{(i+m)}) - F_n(X_{(s_i-r)}, Y_{(i+m)}) - F_n(X_{(s_i+r)}, Y_{(i-m)}) + F_n(X_{(s_i-r)}, Y_{(i-m)}) + n^{-\beta_1}$ and , where $F_n$ is defined in (2). It is clear that $F_{Yn}(Y_{(i+m)}) - F_{Yn}(Y_{(i-m)}) = 2m/n$ when $i + m < n$ and $i - m > 1$. Then, by the definitions (2) and (4) and since $m/n \to 0$ as $n \to \infty$, the statistic at Equation (3) can be reformulated as
| (B.1) |
where $m, r \in (0.5n^{\beta_2}, n^{0.9})$ and $0.75 < \beta_2 < 0.9$. Regarding this equation, we show that
Towards this end, we apply the theorem of Kolmogorov (e.g., Serfling, 1980): for all ε ∈ (0,1/2),
Now we consider the case of . Using the fact that $F_{Xn}(X_{(s_i+r)}) - F_{Xn}(X_{(s_i-r)}) \ge 2r/n$, through the definition of $F_{Xn}$, and $0.5n^{\beta_2} \le r$, we obtain that for each $\varepsilon \in (0, 1/4)$,
In this case, we also have
This implies
| (B.2) |
uniformly over $m, r \in (0.5n^{\beta_2}, n^{0.9})$, as $n \to \infty$.
Similarly, one can show that
| (B.3) |
uniformly over $m, r \in (0.5n^{\beta_2}, n^{0.9})$, as $n \to \infty$.
Now we consider the last term of (B.1) to show that , as n → ∞. To this end, we apply Theorem 1 of Kiefer (1961); that is, for all ε ∈ (0,1/4),
which clearly implies that
In the case of , applying the trivial inequality $Q_{n,i} \ge n^{-\beta_1}$, we have
Thus,
| (B.4) |
uniformly over $m, r \in (0.5n^{\beta_2}, n^{0.9})$, as $n \to \infty$.
Taking into account (B.1)–(B.4), we conclude that
| (B.5) |
uniformly over $m, r \in (0.5n^{\beta_2}, n^{0.9})$.
Now we note that the test statistic at Equation (5) can be presented in the form
| (B.6) |
where m/n → 0,
(Here, for the sake of clarity and without loss of generality, we represent instead of the long notation shown in (2)).
Thus, we can consider $n^{-1} \log(VT_n)$ to be based on uniformly distributed random variables $U_i = F_X(X_i) \sim \mathrm{Uniform}[0,1]$ and $W_i = F_Y(Y_i) \sim \mathrm{Uniform}[0,1]$, $i = 1, \ldots, n$, i.e.
| (B.7) |
where
and
Let FUW (u, w), FU (u), and FW (w) denote a distribution function and corresponding marginal distribution functions of the uniformly distributed random variables (Ui,Wi),i = 1,…,n, respectively, and let fUW (u, w), fU (u), and fW (w) be the respective density functions.
By virtue of (B.5), we have
| (B.8) |
where
Consider the numerator term in the right-hand side of (B.8). We apply the Taylor argument to obtain the following result:
| (B.9) |
where and .
To analyze the second and third terms of the right-hand side of (B.9), we apply Chebyshev’s inequality. It follows that for $\eta < 1 - (2k)^{-1}$, $m, r \in (0.5n^{\beta_2}, n^{0.9})$, $0.75 < \beta_2 < 0.9$, $k = 1, 2, \ldots$,
| (B.10) |
as $n \to \infty$, where $U_{(q)}$ is the $q$th order statistic based on standard uniformly distributed random variables (for details see David and Nagaraja, 2003). Then, this leads to the asymptotic result
| (B.11) |
Using the Taylor series expansion, for $s_i > r$, $i > m$, we have
| (B.12) |
where $U^*_{(s_i)} \in (U_{(s_i-r)}, U_{(s_i)})$ and $W^*_{(i)} \in (W_{(i-m)}, W_{(i)})$.
Noting that $f_U(U_{(s_i)}) = f_W(W_{(i)}) = 1$ and using the results (B.11) and (B.12), we have
| (B.13) |
Combining (B.6), (B.8) and (B.13), we conclude
| (B.14) |
It is clear that (B.14) completes the proof of Proposition 1.
Footnotes
R Code: Code for Monte Carlo computing the critical values of the null distribution of the proposed test statistic.
References
- 1. Armstrong D. Free radicals in diagnostic medicine: a systems approach to laboratory, technology, clinical correlations, and antioxidant therapy. New York: Plenum Press; 1994.
- 2. Christensen R. Advanced Linear Modeling. New York: Springer-Verlag New York, LLC; 2002.
- 3. Crouse CF. Distribution Free Tests Based on the Sample Distribution Function. Biometrika. 1966;53:99–108.
- 4. David HA, Nagaraja HN. Order Statistics. New York: Wiley; 2003.
- 5. Einmahl JHJ, McKeague IW. Empirical likelihood based hypothesis testing. Bernoulli. 2003;9:267–290.
- 6. Embrechts P, McNeil A, Straumann D. Correlation and dependence in risk management: properties and pitfalls. In: Dempster MAH, editor. Risk Management: Value at Risk and Beyond. Cambridge University Press; Cambridge: 2002. pp. 176–223.
- 7. Gu M, Dong X, Zhang X, Wang X, Qi Y, Yu J, Niu W. Strong Association between Two Polymorphisms on 15q25.1 and Lung Cancer Risks: A Meta-Analysis. PLoS ONE. 2012;7:e37970. doi: 10.1371/journal.pone.0037970.
- 8. Gurevich G, Vexler A. A two-sample empirical likelihood ratio test based on samples entropy. Statistics and Computing. 2011;21:657–670.
- 9. Hauke J, Kossowski T. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae. 2011;3:87–93.
- 10. Jayakumari N, Ambikakumari V, Balakrishnan KG, Subramonia Lyer K. Antioxidant status in relation to free radical production during stable and unstable angina syndromes. Atherosclerosis. 1992;94:183–190. doi: 10.1016/0021-9150(92)90243-a.
- 11. Johnson ME. Multivariate Statistical Simulation. New York: Wiley; 1987.
- 12. Kallenberg WCM, Ledwina T. Data-Driven Rank Tests for Independence. Journal of the American Statistical Association. 1999;94:285–301.
- 13. Karagrigoriou A. Goodness-of-Fit Tests for Reliability Modeling. Springer; New York: 2012. pp. 253–267.
- 14. Karmansky I, Shnaider H, Palant A, Gruener N. Plasma lipid oxidation and susceptibility of low-density lipoproteins to oxidation in male patients with stable coronary artery disease. Clin Biochem. 1996;29:573–579. doi: 10.1016/s0009-9120(96)00072-0.
- 15. Kendall MG. A new measure of rank correlation. Biometrika. 1938;30:81–89.
- 16. Kendall MG. Rank Correlation Methods. London: Griffin; 1948.
- 17. Kiefer J. On large deviations of the empiric d.f. of vector chance variables and a law of the iterated logarithm. Pacific J Math. 1961;11:649–660.
- 18. Klauer KC. Non-exponential families of distributions. Metrika. 1986;33:299–305.
- 19. Lazar NA. Bayesian Empirical Likelihood. Biometrika. 2003;90:319–326.
- 20. Lazar N, Mykland PA. An evaluation of the power and conditionality properties of empirical likelihood. Biometrika. 1998;85:523–534.
- 21. Lehmann EL. Nonparametrics: Statistical Methods Based on Ranks. Oakland, CA: Holden-Day; 1975.
- 22. Lehmann EL, Romano JP. Testing Statistical Hypotheses. Springer; New York: 2005.
- 23. Miwa K, Miyagi U, Fujita M. Susceptibility of plasma low density lipoprotein to cupric ion-induced peroxidation in patients with variant angina. J Am Coll Cardiol. 1995;26:632–638. doi: 10.1016/0735-1097(95)00207-K.
- 24. Miecznikowski JC, Vexler A, Shepherd LA. dbEmpLikeGOF: An R package for nonparametric likelihood ratio tests for goodness-of-fit and two sample comparisons based on sample entropy. Journal of Statistical Software. 2013. In press.
- 25. Mudholkar GS, Wilding GE. On the conventional wisdom regarding two consistent tests of bivariate independence. Journal of the Royal Statistical Society, Series D. 2003;52:41–57.
- 26. Owen AB. Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 1988;75:237–249.
- 27. Owen AB. Empirical Likelihood Ratio Confidence Regions. The Annals of Statistics. 1990;18:90–120.
- 28. Owen AB. Empirical Likelihood. Chapman and Hall/CRC; New York: 2001.
- 29. Pearson K. Notes on the history of correlation. Biometrika. 1920;13:25–45.
- 30. Qin J, Lawless J. Empirical Likelihood and General Estimating Equations. The Annals of Statistics. 1994;22:300–325.
- 31. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2012. http://www.R-project.org.
- 32. Schisterman EF, Faraggi D, Browne R, Freudenheim J, Dorn J, Muti P, Armstrong D, Reiser N, Trevisan M. TBARS and cardiovascular disease in a population-based sample. Journal of Cardiovascular Risk. 2001;8:219–225. doi: 10.1177/174182670100800406.
- 33. Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980.
- 34. Spearman CE. The proof and measurements of association between two things. American Journal of Psychology. 1904;15:72–101.
- 35. Vasicek O. A test for normality based on sample entropy. Journal of the Royal Statistical Society, Ser B. 1976;38:54–59.
- 36. Vexler A, Gurevich G. Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy. Computational Statistics & Data Analysis. 2010;54:531–545.
- 37. Vexler A, Liu S, Kang L, Hutson AD. Modifications of the Empirical Likelihood Interval Estimation with Improved Coverage Probabilities. Communications in Statistics (Simulation and Computation). 2009;38:2171–2183.
- 38. Vexler A, Shan G, Kim S, Tsai W-M, Tian L, Hutson AD. An empirical likelihood ratio based goodness-of-fit test for Inverse Gaussian distributions. Journal of Statistical Planning and Inference. 2011;141:2128–2140.
- 39. Vexler A, Tsai W-M, Gurevich G, Yu J. Two-sample density-based empirical likelihood ratio tests based on paired data with an application to a treatment study of Attention-Deficit/Hyperactivity Disorder and Severe Mood Dysregulation. Statistics in Medicine. 2012a;31:1821–1837. doi: 10.1002/sim.4467.
- 40. Vexler A, Tsai W-M, Malinovsky Y. Estimation and testing based on data subject to measurement errors: from parametric to non-parametric likelihood methods. Statistics in Medicine. 2012b;31:2498–2512. doi: 10.1002/sim.4304.
- 41. Vexler A, Wu C. An Optimal Retrospective Change Point Detection Policy. Scandinavian Journal of Statistics. 2010;36:542–558.
- 42. Vexler A, Yu J. Two-sample density-based empirical likelihood tests for incomplete data in application to a pneumonia study. Biometrical Journal. 2011;53:628–651. doi: 10.1002/bimj.201000235.
- 43. Vexler A, Wu C, Yu KF. Optimal hypothesis testing: from semi to fully Bayes factors. Metrika. 2010a;71:125–138. doi: 10.1007/s00184-008-0205-4.
- 44. Vexler A, Yu J, Tian L, Liu S. Two-sample nonparametric likelihood inference based on incomplete data with an application to a pneumonia study. Biometrical Journal. 2010b;52:348–361. doi: 10.1002/bimj.200900131.
- 45. Yu J, Vexler A, Tian L. Analyzing Incomplete Data Subject to a Threshold Using Empirical Likelihood Methods: An Application to a Pneumonia Risk Study in an ICU Setting. Biometrics. 2011;66:123–130. doi: 10.1111/j.1541-0420.2009.01228.x.
- 46. Yu J, Vexler A, Kim S, Hutson AD. Two-sample Empirical likelihood ratio tests for medians application to biomarker evaluations. The Canadian Journal of Statistics. 2011;39:671–689.