Abstract
Often of primary interest in the analysis of multivariate data are the copula parameters describing the dependence among the variables, rather than the univariate marginal distributions. Since the ranks of a multivariate dataset are invariant to changes in the univariate marginal distributions, rank-based estimators are natural candidates for semiparametric copula estimation. Asymptotic information bounds for such estimators can be obtained from an asymptotic analysis of the rank likelihood, i.e. the probability of the multivariate ranks. In this article, we obtain limiting normal distributions of the rank likelihood for Gaussian copula models. Our results cover models with structured correlation matrices, such as exchangeable or circular correlation models, as well as unstructured correlation matrices. For all Gaussian copula models, the limiting distribution of the rank likelihood ratio is shown to be equal to that of a parametric likelihood ratio for an appropriately chosen multivariate normal model. This implies that the semiparametric information bounds for rank-based estimators are the same as the information bounds for estimators based on the full data, and that the multivariate normal distributions are least favorable.
Keywords: copula model, local asymptotic normality, multivariate rank statistics, marginal likelihood, rank likelihood, transformation model
1 Rank likelihood for copula models
Recall that a copula is a multivariate CDF having uniform univariate marginal distributions. For any multivariate CDF F(y1, . . . , yp) with absolutely continuous margins F1, . . . , Fp, the corresponding copula C(u1, . . . , up) is given by

C(u1, . . . , up) = F(F1–1(u1), . . . , Fp–1(up)).
Sklar's theorem [Sklar, 1959] shows that C is the unique copula for which F(y1, . . . , yp) = C(F1(y1), . . . , Fp(yp)).
In this article we consider models consisting of multivariate probability distributions for which the copula is parameterized separately from the univariate marginal distributions. Specifically, the models we consider consist of collections of multivariate CDFs such that ψ parameterizes the univariate marginal distributions and θ parameterizes the copula, meaning that for a random vector Y = (Y1, . . . , Yp)T with CDF F(y|θ, ψ),

F(y|θ, ψ) = C(F1(y1|ψ), . . . , Fp(yp|ψ) | θ).
We refer to such a class of distributions as a copula-parameterized model. For such a model, it will be convenient to refer to the class of copulas {C(u|θ) : θ ∈ Θ} as the copula model, and the class {(F1(y1|ψ), . . . , Fp(yp|ψ)) : ψ ∈ Ψ} as the marginal model.
As an example, the copula model for the class of p-variate multivariate normal distributions is called the Gaussian copula model, and is parameterized by letting Θ be the set of p × p correlation matrices. The marginal model for the p-variate normal distributions is the set of all p-tuples of univariate normal distributions. The copula-parameterized models we focus on in this article are semiparametric Gaussian copula models [Klaassen and Wellner, 1997], for which the copula model is Gaussian and the marginal model consists of the set of all p-tuples of absolutely continuous univariate CDFs.
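Sampling from a Gaussian copula model makes the separation between θ and the margins concrete: draw multivariate normal vectors with correlation matrix C, push each coordinate through the standard normal CDF to obtain uniform margins, and then through any strictly increasing map to obtain other absolutely continuous margins. The sketch below is our own illustration; the matrix C and the marginal transformations are arbitrary choices.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
Phi = np.vectorize(NormalDist().cdf)  # standard normal CDF, applied elementwise

# correlation matrix for the Gaussian copula (p = 3, exchangeable, theta = 0.5)
C = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)

n = 10_000
Z = rng.multivariate_normal(np.zeros(3), C, size=n)
U = Phi(Z)  # uniform margins, Gaussian copula with correlation C

# strictly increasing marginal transformations give arbitrary continuous margins
Y = np.column_stack([-np.log(1 - U[:, 0]),  # exponential margin
                     U[:, 1] ** 2,          # another absolutely continuous margin
                     Z[:, 2]])              # normal margin

print(U.mean(axis=0))  # each column mean close to 1/2
```

Since the transformations applied to the columns are strictly increasing, Y has the same copula (and hence the same rank distribution) as Z.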
Let Y be an n × p random matrix whose rows Y1, . . . , Yn are i.i.d. samples from a p-variate population. We define the multivariate rank function so that Ri,j, the (i, j)th element of R(Y), is the rank of Yi,j among {Y1,j, . . . , Yn,j}. Note that the ranks R(Y) are invariant to strictly increasing transformations of the columns of Y, and therefore the probability distribution of R(Y) does not depend on the univariate marginal distributions of the p variables. As a result, for any copula parameterized model and data matrix with ranks R(y) = r, the likelihood L(θ, ψ : y) can be decomposed as
L(θ, ψ : y) = p(y|θ, ψ) = Pr(R(Y) = r|θ) × p(y|θ, ψ, r),  (1)
where p(y|θ, ψ) is the joint density of Y and p(y|θ, ψ, r) is the conditional density of Y given R(Y) = r. The function L(θ : r) = Pr(R(Y) = r|θ) is called the rank likelihood function. In situations where θ is the parameter of interest and ψ a nuisance parameter, inference for θ can be obtained from the rank likelihood function without having to estimate the margins or specify a marginal model. A univariate rank likelihood function was proposed by Pettitt [1982] for estimation in monotonically transformed regression models. Asymptotic properties of the rank likelihood for this regression model were studied by Bickel and Ritov [1997], and a parameter estimation scheme based on Gibbs sampling was provided in Hoff [2008]. Rank likelihood estimation of copula parameters was studied in Hoff [2007], who also extended the rank likelihood to accommodate multivariate data with mixed continuous and discrete marginal distributions.
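The invariance of R(Y) to strictly increasing transformations of the columns of Y, which is what makes the decomposition in (1) useful, can be checked directly; in this sketch the `ranks` helper and the particular transformations are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ranks(Y):
    """Column-wise ranks: entry (i, j) is the rank of Y[i, j] among column j."""
    return Y.argsort(axis=0).argsort(axis=0) + 1

n, p = 50, 3
Y = rng.standard_normal((n, p))

# apply a different strictly increasing transformation to each column
Z = np.column_stack([np.exp(Y[:, 0]), Y[:, 1] ** 3, np.arctan(Y[:, 2])])

assert np.array_equal(ranks(Y), ranks(Z))  # the ranks are unchanged
```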
The rank likelihood is constructed from the marginal probability of the ranks and can therefore be viewed as a type of marginal likelihood. Marginal likelihood procedures are often used for estimation in the presence of nuisance parameters (see Section 8.3 of Severini [2000] for a review). Ideally, the statistic that generates a marginal likelihood is “partially sufficient” in the sense that it contains all of the information about the parameter of interest that can be quantified without specifying the nuisance parameter. Notions of partial sufficiency include G-sufficiency [Barnard, 1963] and L-sufficiency [Rémon, 1984], which are motivated by group invariance and profile likelihood, respectively. Hoff [2007] showed that the ranks R(Y) are both a G- and L-sufficient statistic in the context of copula estimation.
Although rank-based estimators of the copula parameter θ may be appealing for the reasons described above, one may wonder to what extent they are efficient. The decomposition given in (1) indicates that rank-based estimates do not use any information about θ contained in L(θ, ψ : [y|r]), i.e. the conditional density of the data given the ranks. For at least one copula model, this information is asymptotically negligible: Klaassen and Wellner [1997] showed that for the bivariate normal copula model, a rank-based estimator is semiparametrically efficient and has asymptotic variance equal to the Cramér-Rao information bound in the bivariate normal model, i.e. the bivariate normal model is the least favorable submodel. Genest and Werker [2002] studied the efficiency properties of pseudo-likelihood estimators for two-dimensional semiparametric copula models and showed that the pseudo-likelihood estimators (which are functions of the bivariate ranks) are not in general semiparametrically efficient for non-Gaussian copulas. Chen et al. [2006] proposed estimators in general multivariate copula models that achieve semiparametric asymptotic efficiency but are not based solely on the multivariate ranks. It remains unclear whether estimators based solely on the ranks can be asymptotically efficient in general semiparametric copula models. In particular, it is not yet known if maximum likelihood estimators based on rank likelihoods for Gaussian semiparametric copula models are semiparametrically efficient.
The potential efficiency loss of rank-based estimators can be investigated via the limiting distribution of an appropriately scaled rank likelihood ratio. Generally speaking, the local asymptotic normality (LAN) of a likelihood ratio plays an important role in the asymptotic analysis of testing and estimation procedures. For semiparametric models, the asymptotic variance of a LAN likelihood ratio can be related to efficient tests [Choi et al., 1996] and information bounds for regular estimators [Begun et al., 1983, Bickel et al., 1993]. In particular, the variance of the limiting normal distribution of a LAN rank likelihood ratio provides information bounds for locally regular rank-based estimators of copula parameters.
In this article we obtain the limiting normal distributions of the rank likelihood ratio for Gaussian copula models with structured and unstructured correlation matrices. In the next section we give sufficient conditions under which the rank likelihood is LAN. The basic result is that the rank likelihood is LAN if there exists a good rank-measurable approximation to a LAN submodel. For Gaussian copulas, the natural candidate submodels are multivariate normal models, for which the log likelihood is quadratic in the observations. In Section 3, we identify sufficient conditions for a normal quadratic form to have a good rank-measurable approximation. This result allows us to identify multivariate normal submodels with likelihood ratios that asymptotically approximate the rank likelihood ratio. In Section 4 we show that for any smoothly parameterized Gaussian copula, the rank likelihood ratio is LAN with an asymptotic variance equal to that of the likelihood ratio for the corresponding multivariate normal model with unequal marginal variances. Since the parametric multivariate normal model is a submodel of the semiparametric Gaussian copula model, and in general the semiparametric information bound based on the full data is higher than that of any parametric submodel, our results imply that the bounds for rank-based estimators are equal to the semiparametric bounds for estimators based on the full data, and that the multivariate normal models are least favorable. These bounds can be compared to the asymptotic variance of an estimator to assess its asymptotic efficiency. Via two examples, in Section 5 we show that pseudo-likelihood estimators are asymptotically efficient for some but not all Gaussian copula models. This is discussed further in Section 6.
2 Approximating the rank likelihood ratio
The local log rank likelihood ratio is defined as

λr(s) = log[ L(θ + s/√n : r) / L(θ : r) ],
where L(θ : r) is defined in (1). Studying λr is difficult because L(θ : r) is the integral of a copula density over a complicated set defined by multivariate order constraints. However, in some cases it is possible to obtain the asymptotic distribution of λr by relating it to the local log likelihood ratio λy of an appropriate parametric multivariate model, where
λy(s, t) = log[ L(θ + s/√n, ψ + t/√n : y) / L(θ, ψ : y) ].  (2)
This method of identifying the asymptotic distribution of λr is analogous to the approach taken by Bickel and Ritov [1997] in their investigation of the rank likelihood ratio for a univariate semiparametric regression model.
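For very small n, the rank likelihood L(θ : r) = Pr(R(Y) = r|θ) can be approximated by brute-force Monte Carlo, which also illustrates its invariance to the margins: two populations with the same Gaussian copula but different margins give the same rank probabilities. A sketch, with n, θ, the rank configuration and the margins all chosen by us:

```python
import numpy as np

rng = np.random.default_rng(2)
C = np.array([[1.0, 0.7], [0.7, 1.0]])  # bivariate Gaussian copula, theta = 0.7
n, N = 3, 200_000                       # sample size n, Monte Carlo replicates N
r = np.array([[1, 1], [2, 3], [3, 2]])  # a fixed rank configuration

def rank_prob(draws):
    """Monte Carlo estimate of Pr(R(Y) = r) from an (N, n, 2) array of samples."""
    R = draws.argsort(axis=1).argsort(axis=1) + 1  # column-wise ranks per replicate
    return np.mean(np.all(R == r, axis=(1, 2)))

Z1 = rng.multivariate_normal(np.zeros(2), C, size=(N, n))
Z2 = rng.multivariate_normal(np.zeros(2), C, size=(N, n))
p_normal = rank_prob(Z1)       # normal margins
p_lognorm = rank_prob(np.exp(Z2))  # lognormal margins, same copula

print(p_normal, p_lognorm)  # nearly equal: the rank probability ignores the margins
```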
In this section, we will show that if we can find a sufficiently good rank-measurable approximation to λy, then the limiting distribution of λr will match that of λy. Specifically, we prove the following theorem:
Theorem 2.1. Let {F(y|θ, ψ) : θ ∈ Θ, ψ ∈ Ψ} be an absolutely continuous copula-parameterized model where for given values of θ and s there exist values of ψ and t such that, under i.i.d. sampling from F(y|θ, ψ),

1. λy(s, t) is LAN, so that λy(s, t) converges in distribution to λ∞, a normal random variable, and

2. there exists a rank-measurable approximation λŷ(s, t) such that λŷ(s, t) – λy(s, t) converges in probability to zero.

Then λr(s) converges in distribution to λ∞ as n → ∞ under i.i.d. sampling from any population with copula C(u|θ) equal to that of F(y|θ, ψ) and arbitrary absolutely continuous marginal distributions.
Proof. Let L(θ, ψ : y) be the (parametric) likelihood function for a given dataset y. The lack of dependence of the rank likelihood on the marginal distributions leads to the following identity relating λr(s) to λy(s, t):

λr(s) = log Eθ,ψ[ exp{λy(s, t)} | R(Y) = r ].
Now suppose we would like to describe the statistical properties of λr(s) when the matrix r is replaced by the ranks R(Y), where the rows of Y are i.i.d. samples from a population with copula C(u|θ). Since the distribution of the ranks of Y is invariant with respect to the univariate marginal distributions, the particular marginal model and values of ψ and t are immaterial and can be chosen to facilitate analysis. For each θ and s, our strategy will be to select ψ and t such that the replacement of y by a rank-measurable approximation ŷ in Equation 2 results in an accurate rank-based approximation λŷ(s, t) of λy(s, t). Because the resulting λŷ is rank-measurable, we can write

λr(s) = λŷ(s, t) + log Eθ[ exp{λy(s, t) – λŷ(s, t)} | R(Y) ].

If the approximation of λy(s, t) by λŷ(s, t) is sufficiently accurate to make the remainder term, log Eθ[exp{λy(s, t) – λŷ(s, t)} | R(Y)], converge in probability to zero as n → ∞, then the asymptotic distribution of λr(s) is determined by that of λŷ(s, t). Note that λr(s) does not depend on t, which implies that the value of t for which such an approximation is available will depend on s and θ.
Let λy be LAN and Y1, . . . , Yn ~ i.i.d. F(y|θ, ψ). For given s and t, we will show that if λŷ(s, t) – λy(s, t) converges in probability to zero, then log Eθ[exp{λy(s, t) – λŷ(s, t)} | R(Y)] converges in probability to zero, where here and in what follows, limits are as n → ∞ and probabilities and expectations are calculated under θ and ψ unless otherwise noted. We note that this result was essentially proven at the end of the proof of Theorem 1 of Bickel and Ritov [1997] in the context of the regression transformation model, although details were omitted. We include the proof here for completeness.
Let Un = exp{λy}, Vn = exp{λŷ} and Rn = R(Y1, . . . , Yn), so that the exponential of the remainder term can be written as E[Un/Vn | Rn]. For any M > 1 we can write E[Un/Vn | Rn] – 1 as the sum of three terms an, bn and cn, obtained by splitting the conditional expectation according to whether or not Un/Vn exceeds M.
We now show that each of an, bn and cn converge in probability to zero. To do so, we make use of the following facts:
Un/Vn = exp{λy – λŷ} converges in probability to one, by the continuous mapping theorem;

Un = exp{λy}, Vn = exp{λŷ} and their reciprocals are bounded in probability, as λy and λŷ converge in distribution;

{Un} is uniformly integrable, since log Un = λy is LAN [Hall and Loynes, 1977];

if E[|Xn|] → 0 and Zn is a random sequence, then E[|Xn| | Zn] converges in probability to zero.
To see that an and cn converge in probability to zero, note that each is the conditional expectation given Rn of a bounded random variable that converges in probability to zero, so these conditional expectations converge in probability to zero as well. For the sequence bn, note that Un is Op(1) as it converges in distribution, and 1{Un/Vn > M} is op(1) as Un/Vn converges in probability to one, so Ũn = Un1{Un/Vn > M} is op(1). Now 0 ≤ Ũn ≤ Un for each n, and {Un} is uniformly integrable, so {Ũn} is uniformly integrable as well. This and Ũn = op(1) imply that E[|Ũn|] = E[Ũn] → 0, and so E[Ũn|Rn] converges in probability to zero. Since bn = E[Ũn|Rn]/Vn, and 1/Vn is Op(1), bn is op(1).
Recall our original identity relating λr(s) to λy(s, t) and λŷ(s, t):

λr(s) = λŷ(s, t) + log Eθ[ exp{λy(s, t) – λŷ(s, t)} | R(Y) = r ].
We have shown that if λy is LAN and under i.i.d. sampling from F(y|θ, ψ), then the remainder term goes to zero, and so λy, λŷ and λr all converge to the same normal random variable. If the data are being sampled from a population with the same copula as F(y|θ, ψ) but different margins, then there exists a transformation of the data such that F(y|θ, ψ) is the distribution of the transformed population, and the result follows.
For a given copula model, Theorem 2.1 essentially says that the asymptotic distribution of the log rank likelihood ratio will be the same as that of the log likelihood ratio of any multivariate model with the same copula, as long as the latter admits an asymptotically accurate rank-measurable approximation. The task of identifying the limiting distribution of λr then becomes one of identifying a suitable marginal model for which such an approximation to the log likelihood ratio holds. For multivariate normal models, the log likelihood ratio is quadratic in the observations, and so the existence of a good rank-measurable approximation depends on the accuracy of rank-based approximations to normal quadratic forms. In the next section, we identify a class of quadratic forms that admit sufficiently accurate rank-measurable approximations. In Section 4, we relate these forms to multivariate normal models for which the conditions of Theorem 2.1 hold.
3 Rank approximations to normal quadratic forms
Let Y1, . . . , Yn be i.i.d. random column vectors from a member of a class of mean-zero p-variate normal distributions indexed by a correlation parameter θ ∈ Θ and a variance parameter ψ ∈ Ψ. As discussed further in the next section, the local likelihood ratio λy can be expressed as a quadratic function of Y1, . . . , Yn, taking the form

λy = cn + n–1/2 Σi YiTAYi + op(1)

for some matrix A which could be a function of s, t, θ and ψ, and a term cn that does not depend on the data. A natural rank-based approximation to λy is

λŷ = cn + n–1/2 Σi ŶiTAŶi,
where {Ŷi,j : i ∈ {1, . . . , n}, j ∈ {1, . . . , p}} are the (approximate) normal scores, defined by R = R(Y) and Ŷi,j = Φ–1(Ri,j/(n + 1)). Whether or not λŷ – λy → 0 therefore depends on the convergence to zero of the difference between the quadratic terms of λŷ and λy. In this section we show that this difference converges to zero under certain conditions on A and the covariance matrix C = Cov[Yi]. Specifically, we prove the following theorem:
Theorem 3.1. Let Y1, . . . , Yn ~ i.i.d. Np(0, C) where C is a correlation matrix, and let Ŷi,j = Φ–1(Ri,j/(n + 1)), where Ri,j is the rank of Yi,j among Y1,j, . . . , Yn,j. Let A be a matrix such that the diagonal entries of AC + ATC are zero. Then

Sn ≡ n–1/2 Σi (ŶiTAŶi – YiTAYi) converges in probability to zero.
Proof. Let Di = Ŷi – Yi and let Ã = (A + AT)/2, so that yTÃy = yTAy for all y ∈ ℝp. Then

ŶiTÃŶi – YiTÃYi = DiTÃDi + 2DiTÃYi,

the latter equality holding since Ã is symmetric. From this, we can write Sn = Qn + 2Ln where

Qn = n–1/2 Σi DiTÃDi and Ln = n–1/2 Σi DiTÃYi.
We can write Qn as

Qn = Σj,k Ãj,k (n–1/2 Σi Di,jDi,k).
The squared terms converge in probability to zero by Theorem 1 of de Wet and Venter [1972], and the cross term converges in probability to zero by the Cauchy-Schwarz inequality.
We now find conditions on A under which Ln converges in probability to zero. Note that

DiTÃYi = Σj Di,j (ãjTYi),

where ã1, . . . , ãp are the rows of Ã. This gives

Ln = Σj Ln,j, where Ln,j = n–1/2 Σi Di,j (ãjTYi).
Let cj be the jth row of C, the correlation matrix of Y. We will show that Ln,j converges in probability to zero if ãjTcj = 0, using an argument based on conditional expectations. Considering Ln,1 for example, recall that E[Y|Y1] = c1Y1 and so

E[Ln,1|Y1,1, . . . , Yn,1] = n–1/2 Σi Di,1 ã1TE[Yi|Yi,1] = (n–1/2 Σi Di,1Yi,1) ã1Tc1 = 0

if ã1Tc1 = 0. The conditional expectation of Ln,1² is given by

E[Ln,1²|Y1,1, . . . , Yn,1] = n–1 Σi Di,1² E[(ã1TYi)²|Yi,1] + n–1 Σi≠i′ Di,1Di′,1 E[ã1TYi|Yi,1] E[ã1TYi′|Yi′,1].

The expectations in the second sum are both proportional to ã1Tc1 = 0, leaving

E[Ln,1²|Y1,1, . . . , Yn,1] = n–1 Σi Di,1² E[(ã1TYi)²|Yi,1].
The conditional expectation can be obtained by noting that if Y ~ Np(0, C), then the conditional distribution of Y given Y1 can be expressed as

Y = c1Y1 + (C – c1c1T)1/2 ε,

where ε is p-variate standard normal. The desired second moment is then

E[(ã1TY)²|Y1] = (ã1Tc1)²Y1² + ã1T(C – c1c1T)ã1,

which is equal to ã1TCã1 under the condition that ã1Tc1 = 0. Letting c̃n = (n–1 Σi Di,1²) ã1TCã1, the conditional variance of Ln,1 given the observations for the first variate is then c̃n.
Applying Chebyshev's inequality gives

Pr(|Ln,1| > δ | Y1,1, . . . , Yn,1) ≤ c̃n/δ².

Now n–1 Σi Di,1² converges in probability to zero as a result of Theorem 1 of de Wet and Venter [1972], and therefore so does c̃n. But as c̃n is bounded, we have E[c̃n] → 0, giving

Pr(|Ln,1| > δ) = E[Pr(|Ln,1| > δ | Y1,1, . . . , Yn,1)] ≤ E[c̃n]/δ² → 0,

and so Ln,1 converges in probability to zero. The same argument can be applied to Ln,j for each j, and so Ln converges in probability to zero as long as ãjTcj = 0 for each j = 1, . . . , p, or equivalently, if the diagonal elements of AC + ATC are zero.
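Theorem 3.1 lends itself to a quick numerical check. In the sketch below (p = 2; the matrix A is our own choice satisfying the zero-diagonal condition) we compare the quadratic form at the approximate normal scores with its value at the data; the tolerance in the test is deliberately loose, since the theorem only asserts convergence in probability.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
n, rho = 100_000, 0.5
C = np.array([[1.0, rho], [rho, 1.0]])

# symmetric A chosen so that diag(AC + A^T C) = 2 diag(AC) = 0
A = np.array([[-rho, 1.0], [1.0, -rho]])
assert np.allclose(np.diag(A @ C + A.T @ C), 0.0)

Y = rng.multivariate_normal(np.zeros(2), C, size=n)
R = Y.argsort(axis=0).argsort(axis=0) + 1   # column-wise ranks
nd = NormalDist()
scores = np.array([nd.inv_cdf(k / (n + 1)) for k in range(1, n + 1)])
Yhat = scores[R - 1]                        # approximate normal scores

# S_n = n^{-1/2} sum_i (Yhat_i^T A Yhat_i - Y_i^T A Y_i)
Sn = (np.einsum('ij,jk,ik->', Yhat, A, Yhat)
      - np.einsum('ij,jk,ik->', Y, A, Y)) / np.sqrt(n)
print(Sn)  # small in magnitude; shrinks further as n grows
```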
4 LAN for general Gaussian copulas
In this section we use Theorems 2.1 and 3.1 to prove that the limiting distribution of the rank likelihood ratio λr for smoothly parameterized Gaussian copula models is the same as that of the likelihood ratio for the corresponding normal model with unequal marginal variances. Specifically, we prove the following theorem:
Theorem 4.1. Let {C(θ) : θ ∈ Θ} be a collection of positive definite correlation matrices such that C(θ) is twice differentiable. If Y1, . . . , Yn are i.i.d. from a population with absolutely continuous marginal distributions and copula C(θ) for some θ ∈ Θ, then the distribution of the rank likelihood ratio λr(s) converges to a N(–sT Iθθ·ψs/2, sT Iθθ·ψs) distribution, where Iθθ·ψ is the information for θ in the normal model with correlation C(θ) and marginal precisions ψ.
We note that Iθθ·ψ is a function of θ and not of ψ, as will become clear in the proof.
Proof. Consider the class of mean-zero multivariate normal models with inverse-covariance matrix Var[Y|θ, ψ]–1 = D(ψ)1/2B(θ)D(ψ)1/2, where B(θ) = C(θ)–1 and D(ψ) is the diagonal matrix with diagonal elements ψ1, . . . , ψp. The log probability density for a member of this class is given by

log p(y|θ, ψ) = –(p/2)log(2π) + (1/2)Σj log ψj + (1/2)log|B(θ)| – yTD(ψ)1/2B(θ)D(ψ)1/2y/2.
The log-likelihood derivatives are

l̇θk(y) = [tr(CBθk) – yTD(ψ)1/2BθkD(ψ)1/2y]/2,  l̇ψj(y) = [1/ψj – ψj–1/2yj(BD(ψ)1/2y)j]/2,

and straightforward calculations (evaluating at ψ = 1p) show that

Iθkθl = tr(BCθkBCθl)/2,  Iθψ has kth row –diag(BCθk)T/2,  Iψψ = (I + B ○ C)/4,

where “○” is the Hadamard product denoting element-wise multiplication.
where “○” is the Hadamard product denoting element-wise multiplication. The local log likelihood ratio for this model can be expressed as
which, under independent sampling from Np(0, D(ψ)1/2C(θ)D(ψ)1/2), converges in distribution to a N(–uT Iu/2, uT Iu) random variable, where uT = (sT, tT) and I is the information matrix for (θ, ψ).
We take our rank-based approximation λŷ to be equal to λy absent the op(1) term and with each Yi replaced by its approximate normal scores Ŷi. Clearly, we have λŷ(s) – λy(s) = op(1) if

n–1/2 Σi { sT[l̇θ(Ŷi) – l̇θ(Yi)] + tT[l̇ψ(Ŷi) – l̇ψ(Yi)] }

converges in probability to zero.
Given θ and s, we now identify a value of t for which the above asymptotic result holds. Let t = Hs, where H ∈ ℝp×q, so that

sTl̇θ(y) + tTl̇ψ(y) = Σk sk [l̇θk(y) + hkTl̇ψ(y)],

where {hk, k = 1, . . . , q} are the columns of H. Now l̇θk(y) and l̇ψ(y) are both quadratic in y. Evaluating at ψ = 1, we have l̇θk(y) = [tr(BθkC) – yTBθky]/2 and l̇ψ(y) = (1p – y ○ By)/2, and so

l̇θk(y) + hkTl̇ψ(y) = [tr(BθkC) + hkT1p]/2 – yT[Bθk + D(hk)B]y/2.

Therefore, we can write sT[l̇θ(y) + HTl̇ψ(y)] as

sTl̇θ(y) + tTl̇ψ(y) = c(s, H, θ) + Σk sk yTAky,

where c(s, H, θ) does not depend on y, and Ak is given by

Ak = –[Bθk + D(hk)B]/2.

Substituting this representation of sTl̇θ + tTl̇ψ into λŷ and λy gives

λŷ – λy = n–1/2 Σi Σk sk (ŶiTAkŶi – YiTAkYi) + op(1).
Theorem 3.1 implies that this difference will converge in probability to zero if the diagonal elements of AkC + AkTC are zero for each k = 1, . . . , q. The value of AkC + AkTC can be calculated as

AkC + AkTC = –[2BθkC + D(hk) + BD(hk)C]/2 = BCθk – [D(hk) + BD(hk)C]/2,

using the fact that BθkC = –BCθk. The vector diag(D(hk) + BD(hk)C) can be written as

diag(D(hk) + BD(hk)C) = hk + (B ○ C)hk = (I + B ○ C)hk,

and so our condition on hk becomes

diag(BCθk) – (I + B ○ C)hk/2 = 0.

Therefore, setting hk = 2(I + B ○ C)–1diag(BCθk), or equivalently H = –Iψψ–1Iψθ, yields a quadratic form that satisfies the conditions of Theorem 3.1. The result then follows via Theorem 2.1. The value of uTIu that determines the asymptotic mean and variance of λy(s), λŷ(s) and λr(s) is given by

uTIu = sTIθθs + 2sTIθψHs + sTHTIψψHs = sT(Iθθ – IθψIψψ–1Iψθ)s = sTIθθ·ψs.
This result shows that the least favorable submodel of a semiparametric Gaussian copula model is the multivariate normal model with unequal variances, and that the information bound for any regular estimator of θ is given by Iθθ·ψ. However, for some correlation models the value of Iθθ·ψ is equal to the corresponding information for θ in a model with equal marginal variances. In such cases, the least favorable submodel simplifies to the multivariate normal model with equal marginal variances. To identify conditions under which this result holds, consider the log likelihood ratio for a multivariate normal model with equal marginal variances, Var[Y|θ, ψ] = C(θ)/ψ for a scalar precision ψ:

λy(s, t) = n–1/2 Σi [sTl̇θ(Yi) + t l̇ψ(Yi)] – uTIu/2 + op(1).
Under i.i.d. sampling from Np(0, C(θ)/ψ), λy(s, t) converges in distribution to a N(–uT Iu/2, uT Iu) random variable, where uT = (sT, t) and I is the information matrix for (θ, ψ), for which

Iθkθl = tr(BCθkBCθl)/2,  Iθkψ = –tr(BCθk)/(2ψ),  Iψψ = p/(2ψ²).
Our candidate rank-measurable approximation to λy(s, t) is given by

λŷ(s, t) = n–1/2 Σi [sTl̇θ(Ŷi) + t l̇ψ(Ŷi)] – uTIu/2.
Recall that if for our given s and θ we can find a t and ψ such that λŷ – λy = op(1), then the conditions of Theorem 2.1 will be met and the asymptotic distribution of λr(s) will be that of λy(s, t). With this in mind, let t = hTs for some h ∈ ℝq, and write λy(s, hTs) ≡ λy(s). We will find conditions on C(θ) such that there exists an h for which λŷ(s) – λy(s) = op(1), and will show that any such h must have elements hk = tr(BCθk)/p. With t = hTs and ψ = 1, we have

sTl̇θ(y) + tTl̇ψ(y) = c(θ, s, h) + Σk sk yTAky,
where Ak = –(Bθk + hkB)/2 = (BCθkB – hkB)/2 and c(θ, s, h) does not depend on y. The difference between λŷ and λy is then

λŷ – λy = n–1/2 Σi Σk sk (ŶiTAkŶi – YiTAkYi) + op(1).
Since Ak is symmetric, Theorem 3.1 implies that this difference will converge in probability to zero if the diagonal elements of AkC are zero for each k = 1, . . . , q. This condition can equivalently be written as follows:

diag(AkC) = diag(BCθk – hkI)/2 = 0, that is, (BCθk)j,j = hk for each j = 1, . . . , p.
The above condition can only be met if, for each k, the diagonal elements of BCθk all take on a common value. If they do, then the convergence in probability of λŷ(s, t) – λy(s, t) to zero can be obtained by setting t = hT s, where hk = tr(BCθk)/p.
Setting ψ = 1, we have Iψψ = p/2 and Iθkψ = –tr(BCθk)/2, and so setting hk = tr(BCθk)/p = –2Iθkψ/p results in λy, λŷ and λr each converging in distribution to a N(–sT Iθθ·ψs/2, sT Iθθ·ψs) random variable, where Iθθ·ψ = Iθθ – IθψIψψ–1Iψθ is the information for θ in this parametric model. We summarize this result in the following corollary:
Corollary 4.2. Let {C(θ) : θ ∈ Θ} be a collection of positive definite correlation matrices such that C(θ) is twice differentiable, and for each k, the diagonal entries of BCθk are equal to some common value. If Y1, . . . , Yn are i.i.d. from a population with absolutely continuous marginal distributions and copula C(θ) for some θ ∈ Θ, then the distribution of the rank likelihood ratio λr(s) converges to a N(–sT Iθθ·ψs/2, sT Iθθ·ψs) distribution, where Iθθ·ψ is the information for θ in the normal model with correlation C(θ) and equal marginal precisions ψ.
5 Asymptotic efficiency in some simple examples
Obtaining the maximum likelihood estimator of a copula parameter θ from the rank likelihood is problematic due to the complicated nature of the likelihood. An easy-to-compute alternative estimator is the maximizer in θ of the pseudo-likelihood, which is essentially the probability of the observed data with the unknown marginal CDFs replaced with empirical estimates. Genest et al. [1995] studied the asymptotic properties of this pseudo-likelihood estimator (PLE) and obtained a formula for its asymptotic variance.
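For the bivariate Gaussian copula, the PLE can be sketched in a few lines: replace each margin by its rescaled empirical CDF (ranks divided by n + 1) and maximize the resulting Gaussian copula log-likelihood in the correlation ρ. The grid-search below is our own illustration, not the estimator's original formulation; it also computes the normal scores correlation coefficient, to which the PLE is asymptotically equivalent in this model.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
n, rho_true = 2_000, 0.6
C = np.array([[1.0, rho_true], [rho_true, 1.0]])
Y = rng.multivariate_normal(np.zeros(2), C, size=n)

# pseudo-observations: margins replaced by rescaled empirical CDFs, then Phi^{-1}
R = Y.argsort(axis=0).argsort(axis=0) + 1
nd = NormalDist()
scores = np.array([nd.inv_cdf(k / (n + 1)) for k in range(1, n + 1)])
X = scores[R - 1]
x, y = X[:, 0], X[:, 1]

def pseudo_loglik(rho):
    """Gaussian copula log pseudo-likelihood at correlation rho."""
    return (-0.5 * n * np.log(1 - rho ** 2)
            - np.sum(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y)
              / (2 * (1 - rho ** 2)))

grid = np.linspace(-0.99, 0.99, 397)
ple = grid[np.argmax([pseudo_loglik(r_) for r_ in grid])]

ns_corr = np.mean(x * y) / np.mean(scores ** 2)  # normal scores correlation
print(ple, ns_corr)  # both close to rho_true = 0.6
```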
For Gaussian copula models, we can compare this asymptotic variance to the information bound obtained from Theorem 4.1 to evaluate the asymptotic efficiency of the PLE. This is most easily done in the case of a one-parameter copula model for which the conditions of Corollary 4.2 hold, as in this case the least favorable submodel is a simple two-parameter multivariate normal model with equal marginal variances. For such models, the value of Iθθ·ψ can be computed from the variance of the efficient influence function ľθ(y):

Iθθ·ψ = Var[ľθ(Y)], where ľθ(y) = l̇θ(y) – Iθψl̃ψ(y)

and l̃ψ(y) = Iψψ–1l̇ψ(y) is the efficient influence function for ψ (see, for example, Bickel et al. [1993], Chapter 2). This can be compared to the influence function for the PLE, which is given by
where the likelihood derivative and information matrix are based on the multivariate normal likelihood, and Wj(yj) is defined as
By inspection, the two influence functions are equal if , in which case the PLE is asymptotically efficient. To compute Wj(yj) for j = 1, . . . , p, note that for a Gaussian copula model, we have
where y = (Φ–1(u1), . . . , Φ–1(up)), C is the correlation matrix under θ and B = C–1. Straightforward calculations [Shorack, 2000, page 116] give
where D(y ○ y) is the diagonal matrix with elements y1², . . . , yp², and the last line follows from the fact that BθC = –BCθ. Recall that for the models we are considering here, the diagonal elements of BCθ are assumed to all be equal, and so we can write
On the other hand, Iθψ = –tr(BCθ )/2, and so our condition for asymptotic efficiency becomes
| (3) |
We emphasize that this criterion for asymptotic efficiency only applies to one-parameter Gaussian copula models for which the conditions of Corollary 4.2 hold. Such models include the one-parameter exchangeable correlation model {C(θ) : θ ∈ (–(p – 1)–1, 1)}, for which all off-diagonal elements are equal to θ, as well as any model in which the rows of C(θ) are permutations of one another. To see this, note that if ci, the ith row of C(θ), is a permutation of cj, then bi, the ith row of B, is the same permutation of bj. Therefore (BCθ)i,i = (BCθ)j,j for each i and j, and so the conditions of Corollary 4.2 are satisfied. Subclasses of such correlation matrices include circular correlation models, often used for seasonal data [Olkin and Press, 1969, Khattree and Naik, 1994], and any model in which the rows of C are permutations of circular matrices.
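The condition that the diagonal of BCθ be constant is easy to verify numerically for the families just described. In the sketch below (p = 4; both parameterizations are our own concrete choices) we check it for an exchangeable matrix and for a circulant matrix whose rows are permutations of one another.

```python
import numpy as np

theta = 0.3

# exchangeable: C = (1 - theta) I + theta 11^T, so dC/dtheta = 11^T - I
C_ex = (1 - theta) * np.eye(4) + theta * np.ones((4, 4))
Cdot_ex = np.ones((4, 4)) - np.eye(4)

def circulant(row):
    """Circulant matrix whose k-th row is the first row rotated k places."""
    return np.array([np.roll(row, k) for k in range(len(row))])

# circulant correlation model, first row (1, theta, theta^2, theta)
C_circ = circulant([1.0, theta, theta ** 2, theta])
Cdot_circ = circulant([0.0, 1.0, 2 * theta, 1.0])  # elementwise d/dtheta

for C, Cdot in [(C_ex, Cdot_ex), (C_circ, Cdot_circ)]:
    d = np.diag(np.linalg.inv(C) @ Cdot)  # diagonal of B C_theta
    assert np.allclose(d, d[0])           # all diagonal entries equal
```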
Exchangeable correlation model: Consider the p = 4 exchangeable correlation matrix, for which C(θ) = (1 – θ)I + θ11T with θ ∈ (–1/3, 1). This gives

Cθ = 11T – I and B = C(θ)–1 = (1 – θ)–1[I – θ11T/(1 + 3θ)],
and
so that when ψ = 1, we have
and so finally
and so our criterion (3) for asymptotic efficiency is met.
Circular correlation model: Consider the correlation model such that
For this model, we have
Letting and t2 = 2(y1y3 + y2y4), we have
Further calculations give
and so our criterion for asymptotic efficiency is not met. Additional calculations (available from the authors) show that the asymptotic variance of the PLE is given by
The first panel of Figure 1 plots the asymptotic variance of the PLE together with the information bound, and the second panel plots their difference. The PLE is very nearly asymptotically efficient in this example, but the small discrepancy indicates that the PLE is not generally asymptotically efficient for Gaussian copula models.
Figure 1.
Asymptotic variances for the circular copula model. The left panel gives the information bound (dashed black line) and the asymptotic variance of the PLE (gray line), and the right panel gives the difference between these two quantities as a function of θ.
6 Discussion
In this article, we have shown that the existence of a sufficiently accurate rank-measurable approximation to the localized log likelihood of a copula-parameterized model implies the local asymptotic normality of the log rank likelihood. We have also shown that such approximations exist for every smoothly parameterized Gaussian copula model. For such a copula model, the asymptotic information bound implied by the rank likelihood matches that of the corresponding parametric multivariate normal submodel. This result suggests the possibility of semiparametrically efficient rank-based estimators for Gaussian copula models: Generally speaking, the information Ir based on the ranks is less than or equal to the semiparametric information If based on the full data, as the ranks are functions of the full data [Le Cam and Yang, 1988]. Furthermore, the semiparametric information based on the full data is less than or equal to Ip, the infimum of information functions over all parametric submodels, and so Ir ≤ If ≤ Ip in general. On the other hand, for Gaussian copula models we have shown that Ir is equal to the information for a particular parametric submodel, the corresponding multivariate normal model. This implies that for a given Gaussian copula model, the corresponding multivariate normal model is least favorable, that Ir = Ip and therefore Ir = If = Ip.
Based on this result, and the partial sufficiency of the multivariate ranks in semiparametric copula models in general, we conjecture that maximum likelihood estimators based on rank likelihoods are asymptotically efficient for Gaussian copula models, and possibly more generally whenever information bounds based on the complete data for the semiparametric model in question exist. However, the rank likelihood involves a multivariate integral over a set of order constraints, the number of which grows with the sample size, making it difficult to use or study. An alternative to the rank likelihood estimator is the pseudo-likelihood estimator [Genest et al., 1995], which is a very explicit function of the copula density, making optimization and asymptotic analysis tractable. For the one-parameter bivariate Gaussian copula model, the rank-based pseudo-likelihood estimator is asymptotically equivalent to the normal scores correlation coefficient, which Klaassen and Wellner [1997] showed to be asymptotically efficient. However, Genest and Werker [2002] showed with a non-Gaussian example that the pseudo-likelihood estimator is not generally asymptotically efficient, and in this article we have shown that this estimator is not generally asymptotically efficient for the restricted class of Gaussian copula models. However, this does not rule out the possibility that other rank-based estimators, such as the maximizer of the rank likelihood, are asymptotically efficient.
Acknowledgments
Peter Hoff's research was supported in part by NI-CHD grant 1R01HD067509-01A1. Jon Wellner's research was supported in part by NSF Grants DMS-0804587 and DMS-1104832, by NI-AID grant 2R01 AI291968-04, and by the Alexander von Humboldt Foundation.
Contributor Information
Peter D. Hoff, Professor of Statistics and Biostatistics University of Washington Seattle, WA 98195-4322
Xiaoyue Niu, Research Assistant Professor of Statistics Penn State University University Park, PA 16802.
Jon A. Wellner, Professor of Statistics and Biostatistics University of Washington Seattle, WA 98195-4322
References
- Barnard GA. Logical aspects of the fiducial argument. Bull. Inst. Internat. Statist. 1963;40:870–883.
- Begun JM, Hall WJ, Huang W-M, Wellner JA. Information and asymptotic efficiency in parametric–nonparametric models. Ann. Statist. 1983;11(2):432–452. doi: 10.1214/aos/1176346151.
- Bickel PJ, Ritov Y. Local asymptotic normality of ranks and covariates in transformation models. In: Festschrift for Lucien Le Cam. Springer; New York: 1997. pp. 43–54.
- Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press; Baltimore, MD: 1993.
- Chen X, Fan Y, Tsyrennikov V. Efficient estimation of semiparametric multivariate copula models. J. Amer. Statist. Assoc. 2006;101(475):1228–1240. doi: 10.1198/016214506000000311.
- Choi S, Hall WJ, Schick A. Asymptotically uniformly most powerful tests in parametric and semiparametric models. Ann. Statist. 1996;24(2):841–861. doi: 10.1214/aos/1032894469.
- de Wet T, Venter JH. Asymptotic distributions of certain test criteria of normality. South African Statist. J. 1972;6:135–149.
- Genest C, Ghoudi K, Rivest L-P. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika. 1995;82(3):543–552.
- Genest C, Werker BJM. Conditions for the asymptotic semiparametric efficiency of an omnibus estimator of dependence parameters in copula models. In: Distributions with given marginals and statistical modelling. Kluwer Acad. Publ.; Dordrecht: 2002. pp. 103–112.
- Hall WJ, Loynes RM. On the concept of contiguity. Ann. Probability. 1977;5(2):278–282.
- Hoff PD. Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat. 2007;1(1):265–283.
- Hoff PD. Rank likelihood estimation for continuous and discrete data. ISBA Bulletin. 2008;15(1):8–10.
- Khattree R, Naik DN. Estimation of interclass correlation under circular covariance. Biometrika. 1994;81(3):612–617. doi: 10.1093/biomet/81.3.612.
- Klaassen CAJ, Wellner JA. Efficient estimation in the bivariate normal copula model: normal margins are least favourable. Bernoulli. 1997;3(1):55–77.
- Le Cam L, Yang GL. On the preservation of local asymptotic normality under information loss. Ann. Statist. 1988;16(2):483–520. doi: 10.1214/aos/1176350817.
- Olkin I, Press SJ. Testing and estimation for a circular stationary model. Ann. Math. Statist. 1969;40:1358–1373.
- Pettitt AN. Inference for the linear model using a likelihood based on ranks. J. Roy. Statist. Soc. Ser. B. 1982;44(2):234–243.
- Rémon M. On a concept of partial sufficiency: L-sufficiency. Internat. Statist. Rev. 1984;52(2):127–135.
- Severini TA. Likelihood methods in statistics. Oxford Statistical Science Series, vol. 22. Oxford University Press; Oxford: 2000.
- Shorack GR. Probability for statisticians. Springer Texts in Statistics. Springer-Verlag; New York: 2000.
- Sklar M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris. 1959;8:229–231.

