On specification tests for composite likelihood inference

JING HUANG; YANG NING; NANCY REID; YONG CHEN

doi:10.1093/biomet/asaa039

Author manuscript; available in PMC: 2021 Jun 25.

Published in final edited form as: Biometrika. 2020 Jun 14;107(4):907–917. doi: 10.1093/biomet/asaa039

PMCID: PMC8232013 NIHMSID: NIHMS1693468 PMID: 34176951

Summary

Composite likelihood functions are often used for inference in applications where the data have a complex structure. While inference based on the composite likelihood can be more robust than inference based on the full likelihood, the inference is not valid if the associated conditional or marginal models are misspecified. In this paper, we propose a general class of specification tests for composite likelihood inference. The test statistics are motivated by the fact that the second Bartlett identity holds for each component of the composite likelihood function when these components are correctly specified. We construct the test statistics based on the discrepancy between the so-called composite information matrix and the sensitivity matrix. As an illustration, we study three important cases of the proposed tests and establish their limiting distributions under both null and local alternative hypotheses. Finally, we evaluate the finite-sample performance of the proposed tests in several examples.

Keywords: Bartlett identity, Information matrix, Misspecification test, Model specification

1. Introduction

The composite likelihood function (Besag, 1974; Lindsay, 1988; Cox & Reid, 2004) is an inference function constructed as the product of a set of conditional or marginal density functions, whether or not the events defined in each component are mutually independent. It has been used in longitudinal studies (Molenberghs & Verbeke, 2005; Chandler & Bate, 2007), for the analysis of panel data (Wellner & Zhang, 2007), with spatial modelling (Heagerty & Lele, 1998; Guan, 2006), with missing data (He & Yi, 2011), with graphical models (Xue et al., 2012; Chen et al., 2015; Yang et al., 2015) and with change point detection in multivariate time series (Ma & Yau, 2016), among many other settings. One of the main reasons for its widespread application is that only part of the data-generating mechanism needs to be specified, which can reduce the computational cost and provide some robustness to model misspecification. For a comprehensive discussion of composite likelihood and its applications, see the review paper by Varin et al. (2011).

Although inference based on the composite likelihood has been well developed (Kent, 1982; Lindsay, 1988; Pagui et al., 2014), a corresponding specification test does not seem to exist. Aside from technical difficulties, on which we will elaborate later, the lack of development in this area may be due to the perception that composite likelihood inference is robust to partial model misspecification (Xu & Reid, 2011). However, composite likelihood inference still requires the specification of a set of lower-dimensional conditional or marginal models, and the corresponding results can be misleading if these are misspecified. For instance, Ogden (2016) showed that the maximum composite likelihood estimator is inconsistent in a generalized linear mixed model with a misspecified random effect distribution, and the misspecification bias of the maximum composite likelihood estimator is significantly larger than the misspecification bias in the maximum likelihood estimator.

In this paper we propose a class of specification tests for composite likelihood functions. A general test for model specification with ordinary likelihood functions is the information matrix test proposed by White (1982). This relies on the second Bartlett identity, which holds when the model is correctly specified: it compares the variability matrix to the sensitivity matrix, both defined in § 2 below. A Wald-type test statistic is constructed from the difference between these two matrices. Presnell & Boos (2004) proposed an ‘in-and-out sample’ likelihood ratio test, which is asymptotically equivalent to a multiplicative contrast between the sensitivity matrix and the variability matrix. More recently, Zhou et al. (2012) developed an information ratio test which can effectively test the specifications of variance and covariance functions in generalized estimating equations (Liang & Zeger, 1986).

These specification tests rely on the fact that the second Bartlett identity holds under the correct model. For a composite likelihood, the second Bartlett identity does not hold, even when the component densities are correctly specified (Varin et al., 2011). Thus, direct application of the tests based on the information matrix is invalid. To circumvent this difficulty, we define the composite information matrix, and show that this can recover a counterpart of the second Bartlett identity. We propose a general class of specification tests for composite likelihood inference based on the discrepancy between the composite information matrix and the sensitivity matrix. We illustrate the usefulness of the proposed class of tests with three important special cases: the composite information matrix test, composite information ratio test and composite information max test. We establish their asymptotic distributions under the null hypothesis, and asymptotic power under local alternatives.

2. Specification tests for composite likelihood functions

2.1. Composite information matrix and the Bartlett identity

Let f (y; θ) be the joint probability density function of a multi-dimensional random vector Y, indexed by a p-dimensional parameter θ = (θ₁, … , θ_p) in the parameter space Ω. Let { $A_{1} (y), \dots, A_{K} (y)$ } denote a sequence of sets. The log probability of the event { $Y \in A_{k} (y)$ } is $ℓ_{k} (θ; y) = \log \int_{u \in A_{k} (y)} f (u; θ) d u$ , where k = 1, 2, … , K and K is the total number of events. Assume that n independently and identically distributed random variables Y₁, … , Y_n are observed from the model f(y; θ). A composite loglikelihood function based on this sequence of events is

c ℓ (θ) = n^{- 1} \sum_{i = 1}^{n} \sum_{k = 1}^{K} ω_{i k} ℓ_{k} (θ; y_{i}),

where ω_ik is a nonnegative weight associated with the event { $Y_{i} \in A_{k} (y_{i})$ }. The variability and sensitivity matrices associated with cℓ(θ) are

J (θ) = E \left[ \left\{ \sum_{k = 1}^{K} ω_{i k} \frac{\partial}{\partial θ} ℓ_{k} (θ; y_{i}) \right\}^{\otimes 2} \right], \quad H (θ) = - E \left\{ \sum_{k = 1}^{K} ω_{i k} \frac{\partial^{2}}{\partial θ \partial θ^{T}} ℓ_{k} (θ; y_{i}) \right\} .

Since defining a composite likelihood only requires models for the events { $Y \in A_{k} (y)$ } for 1 ⩽ k ⩽ K, specification tests should be tailored to the modelling of { $Y \in A_{k} (y)$ }. For instance, in the analysis of clustered data, the independence likelihood in Chandler & Bate (2007) only involves modelling the conditional mean function. Thus, the corresponding specification tests should verify the validity of the conditional mean model rather than, for example, the correlation structure of the full model.

Formally, the null hypothesis for a specification test of the composite likelihood can be formulated as

H_{0} : there exists a θ \in Ω such that pr {Y \in A_{k} (y)} \propto exp {ℓ_{k} (θ; y)}, for all k = 1, \dots, K .

As noted in § 1, the second Bartlett identity, which is the basis for all existing information based specification tests, does not hold for the composite likelihood even when the assumed model is correct. This statement is illustrated by an example of the multivariate normal model in the Supplementary Material. To develop specification tests, we construct a new identity by defining the composite information matrix I_c(θ) as follows:

I_{c} (θ) = E \left[ \sum_{k = 1}^{K} ω_{i k} \left\{ \frac{\partial}{\partial θ} ℓ_{k} (θ; y_{i}) \right\}^{\otimes 2} \right] .

When each loglikelihood function ℓ_k(θ; y) is correctly specified, under mild regularity conditions, we have

I_{c} (θ) - H (θ) = 0 . \qquad (1)

Based on this identity, we propose a general class of specification tests for the composite likelihood.
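The reasoning behind identity (1) can be sketched as follows. This is an informal outline rather than the authors' proof: it assumes that the weights are fixed constants and that differentiation and integration can be interchanged for each event probability.

```latex
% Each \ell_k is the log probability of a correctly specified event,
% so the second Bartlett identity holds for that component on its own:
E\left[\left\{\frac{\partial}{\partial\theta}\ell_k(\theta; y_i)\right\}^{\otimes 2}\right]
  = -E\left\{\frac{\partial^2}{\partial\theta\,\partial\theta^T}\ell_k(\theta; y_i)\right\},
  \qquad k = 1, \dots, K.

% Multiplying by \omega_{ik} and summing over k gives identity (1):
I_c(\theta)
  = \sum_{k=1}^{K}\omega_{ik}\,
    E\left[\left\{\frac{\partial}{\partial\theta}\ell_k(\theta; y_i)\right\}^{\otimes 2}\right]
  = -\sum_{k=1}^{K}\omega_{ik}\,
    E\left\{\frac{\partial^2}{\partial\theta\,\partial\theta^T}\ell_k(\theta; y_i)\right\}
  = H(\theta).

% The variability matrix differs from I_c through squared weights and cross terms:
J(\theta)
  = \sum_{k=1}^{K}\omega_{ik}^{2}\,
    E\left[\left\{\frac{\partial}{\partial\theta}\ell_k(\theta; y_i)\right\}^{\otimes 2}\right]
  + \sum_{k \neq m}\omega_{ik}\,\omega_{im}\,
    E\left[\frac{\partial}{\partial\theta}\ell_k(\theta; y_i)
           \left\{\frac{\partial}{\partial\theta}\ell_m(\theta; y_i)\right\}^{T}\right].
```

The cross terms in the last display do not vanish when the events are dependent, which is why J(θ) and H(θ) generally differ even under a correct model.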

2.2. A general class of specification tests for the composite likelihood

Define the maximum composite likelihood estimator as ${\hat{θ}}_{n} = {argmax}_{θ \in Ω} c ℓ (θ)$ , and the observed composite information matrix and sensitivity matrix as

{\hat{I}}_{c} ({\hat{θ}}_{n}) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} ω_{i k} \left\{ \frac{\partial}{\partial θ} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) \right\}^{\otimes 2}, \quad \hat{H} ({\hat{θ}}_{n}) = - \frac{1}{n} \sum_{i = 1}^{n} \sum_{k = 1}^{K} ω_{i k} \frac{\partial^{2}}{\partial θ \partial θ^{T}} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) .

Our family of specification tests is constructed by comparing ${\hat{I}}_{c} ({\hat{θ}}_{n})$ and $\hat{H} ({\hat{θ}}_{n})$ . Specifically, we consider a discrepancy function $d : R^{p \times p} \times R^{p \times p} \to [0, \infty)$ which satisfies d(M₁, M₂) ⩾ 0 and d(M₁, M₁) = 0 for any $M_{1}, M_{2} \in R^{p \times p}$ . It can be seen that d is a premetric, more general than a standard metric in the sense that we do not require the symmetry or subadditivity of d. As will be seen later, some special cases of d are indeed only a premetric. Given a discrepancy function d, we consider the corresponding test statistic

Q = d \{ {\hat{I}}_{c} ({\hat{θ}}_{n}), \hat{H} ({\hat{θ}}_{n}) \}, \qquad (2)

which measures the distance between the observed composite information matrix and the sensitivity matrix. Under H₀, (1) holds and therefore d{I_c(θ), H(θ)} = 0. Under mild regularity conditions, the composite information matrix and the sensitivity matrix, I_c(θ) and H(θ), can be consistently estimated by ${\hat{I}}_{c} ({\hat{θ}}_{n})$ and $\hat{H} ({\hat{θ}}_{n})$ respectively. Thus, we expect Q to be small when the composite likelihood is correctly specified. On the other hand, a large value of Q implies the potential misspecification of the composite likelihood. In the following we consider three special cases of d and study their asymptotic null distributions and local asymptotic powers.
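To make the construction concrete, the following minimal sketch computes the observed composite information and sensitivity matrices from per-event scores and Hessians and plugs them into a discrepancy function. The function names, the array layout and the toy model (two correlated unit-variance normal margins with a common mean, unit weights) are illustrative assumptions, not part of the paper.

```python
import numpy as np

def composite_information(scores, weights):
    """I_c-hat = n^{-1} sum_i sum_k w_ik s_ik s_ik^T.
    scores: (n, K, p) per-event score vectors; weights: (n, K)."""
    n = scores.shape[0]
    return np.einsum("ik,ikj,ikt->jt", weights, scores, scores) / n

def sensitivity(hessians, weights):
    """H-hat = -n^{-1} sum_i sum_k w_ik (Hessian of l_k).
    hessians: (n, K, p, p); weights: (n, K)."""
    n = hessians.shape[0]
    return -np.einsum("ik,ikjt->jt", weights, hessians) / n

def frobenius_discrepancy(M1, M2):
    """One possible choice of d: squared Frobenius distance (hypothetical)."""
    return float(np.sum((M1 - M2) ** 2))

# Toy model (assumption): l_k(theta; y) = -(y_k - theta)^2 / 2 + const, k = 1, 2.
rng = np.random.default_rng(0)
n, K = 5000, 2
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
y = rng.multivariate_normal(np.zeros(K), cov, size=n)   # shape (n, K)
theta_hat = y.mean()                    # maximiser of the composite loglikelihood
weights = np.ones((n, K))
scores = (y - theta_hat)[:, :, None]    # d l_k / d theta, shape (n, K, 1)
hessians = -np.ones((n, K, 1, 1))       # d^2 l_k / d theta^2

I_c_hat = composite_information(scores, weights)
H_hat = sensitivity(hessians, weights)
Q = frobenius_discrepancy(I_c_hat, H_hat)

# For comparison, the variability matrix picks up the cross-event correlation.
comp_scores = scores.sum(axis=1)        # (n, 1) composite score per observation
J_hat = comp_scores.T @ comp_scores / n
```

In this toy setting the margins are correctly specified, so the estimate of I_c stays close to H even though the two components are correlated, whereas the estimate of J does not.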

2.3. Special case 1: composite information matrix test

In this subsection we take d as a scaled L₂ metric, which leads to a composite information matrix test. Since ${\hat{I}}_{c} ({\hat{θ}}_{n})$ and $\hat{H} ({\hat{θ}}_{n})$ are symmetric matrices, we only need to consider their upper triangular entries when constructing Q. That is,

d (M_{1}, M_{2}) = (m_{1} - m_{2})^{T} W (m_{1} - m_{2}), \qquad (3)

where m₁ and m₂ are p(p + 1)/2-dimensional vectors obtained by vectorizing the upper triangular elements of the matrices M₁ and M₂, and W is a positive definite matrix used to standardize the test statistic. In this case, d is indeed a metric. When cℓ(θ) corresponds to the loglikelihood of the full model, the test statistic Q in (2) with the metric d defined in (3) reduces to the information matrix test in White (1982).

Formally, consider the p(p + 1)/2 upper triangular elements of ${\hat{I}}_{c} ({\hat{θ}}_{n}) - \hat{H} ({\hat{θ}}_{n})$ , and denote the contribution of the ith observation to the (j, t) element (1 ⩽ j ⩽ t ⩽ p) of the matrix ${\hat{I}}_{c} (θ) - \hat{H} (θ)$ by

e_{j t} (y_{i}, θ) = \sum_{k = 1}^{K} ω_{i k} \left\{ \frac{\partial}{\partial θ_{j}} ℓ_{k} (θ; y_{i}) \frac{\partial}{\partial θ_{t}} ℓ_{k} (θ; y_{i}) + \frac{\partial^{2}}{\partial θ_{j} \partial θ_{t}} ℓ_{k} (θ; y_{i}) \right\},

where θ_j and θ_t are the jth and tth elements of θ. With q = p(p + 1)/2, we stack the q elements e_jt(y_i, θ) for 1 ⩽ j ⩽ t ⩽ p to obtain a q × 1 vector e(y_i, θ). Define $T_{M} (θ) = n^{- 1} \sum_{i = 1}^{n} e (y_{i}, θ)$ and

V_{M} (θ) = E \left( \left[ e (y_{i}, θ) + E \left\{ \frac{\partial}{\partial θ} e (y_{i}, θ) \right\} H^{- 1} (θ) \sum_{k = 1}^{K} ω_{i k} \frac{\partial}{\partial θ} ℓ_{k} (θ; y_{i}) \right]^{\otimes 2} \right) .

By Taylor expansion and the properties of influence functions, V_M (θ) is the asymptotic variance matrix of $n^{1 ∕ 2} T_{M} ({\hat{θ}}_{n})$ under H₀, and it can be consistently estimated by

{\hat{V}}_{M} = n^{- 1} \sum_{i = 1}^{n} \left[ e (y_{i}, {\hat{θ}}_{n}) + \frac{\partial}{\partial θ} T_{M} ({\hat{θ}}_{n}) {\hat{H}}^{- 1} ({\hat{θ}}_{n}) \sum_{k = 1}^{K} ω_{i k} \frac{\partial}{\partial θ} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) \right]^{\otimes 2} .

The proposed composite information matrix test Q^matrix is defined as

Q^{matrix} = n T_{M}^{T} ({\hat{θ}}_{n}) {\hat{V}}_{M}^{- 1} T_{M} ({\hat{θ}}_{n}),

where we set $W = {\hat{V}}_{M}^{- 1}$ in (3) to construct an asymptotic χ² test of Wald type. Throughout the paper we consider the scenario that the sample size n increases, with the total number of events K fixed. The following theorem states the asymptotic distribution of Q^matrix under the null hypothesis.

Theorem 1. Given regularity conditions R1–R7 in the Supplementary Material, as n increases the composite information matrix test statistic Q^matrix converges in distribution to $χ_{q}^{2}$ under H₀.

The above theorem suggests the use of the (1 − α) quantile of the $χ_{q}^{2}$ distribution as the cut-off value for the test Q^matrix. Since the degrees of freedom q increase as a quadratic function of the number of parameters in the composite likelihood, and the composite information matrix test aims to pick up a wide range of deviations from the null, this test may have low power when p is relatively large. This phenomenon is also confirmed in our simulation studies. In the following subsections we propose alternative choices of the discrepancy function d, which attempt to address this issue.
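The Wald-type construction of Q^matrix can be sketched as below. The function `information_matrix_test` and the helper `toy_inputs` are hypothetical names; the toy model (two unit-variance normal margins with a common mean, unit weights) is an assumption chosen so that every derivative is available in closed form. The user is expected to supply the stacked contributions e(y_i, θ̂), the derivative of T_M, the sensitivity matrix and the per-observation composite scores.

```python
import numpy as np
from scipy.stats import chi2

def information_matrix_test(e, grad_T, H_hat, comp_scores):
    """Q^matrix = n T_M^T V_M-hat^{-1} T_M and its chi-squared p-value.
    e: (n, q) stacked e(y_i, theta_hat); grad_T: (q, p) derivative of T_M;
    H_hat: (p, p); comp_scores: (n, p) rows sum_k w_ik dl_k/dtheta."""
    n, q = e.shape
    T = e.mean(axis=0)
    # Row i of the correction is {grad_T H^{-1} s_i}^T, using symmetry of H.
    correction = comp_scores @ np.linalg.solve(H_hat, grad_T.T)
    infl = e + correction
    V_hat = infl.T @ infl / n
    Q = float(n * T @ np.linalg.solve(V_hat, T))
    return Q, float(chi2.sf(Q, df=q))

def toy_inputs(y):
    """Inputs for the assumed model l_k = -(y_k - theta)^2 / 2, so p = q = 1."""
    theta_hat = y.mean()
    r = y - theta_hat
    e = (r ** 2 - 1.0).sum(axis=1, keepdims=True)       # score^2 + Hessian, summed over k
    grad_T = np.array([[-2.0 * r.sum(axis=1).mean()]])  # d T_M / d theta at theta_hat
    H_hat = np.array([[float(y.shape[1])]])
    comp_scores = r.sum(axis=1, keepdims=True)
    return e, grad_T, H_hat, comp_scores

rng = np.random.default_rng(1)
y_ok = rng.normal(0.0, 1.0, size=(2000, 2))    # margins as assumed
y_bad = rng.normal(0.0, 2.0, size=(2000, 2))   # marginal variance misspecified
Q_ok, p_ok = information_matrix_test(*toy_inputs(y_ok))
Q_bad, p_bad = information_matrix_test(*toy_inputs(y_bad))
```

With the variance misspecified, the statistic becomes very large and the p-value is essentially zero; with the correct margins the statistic behaves like a draw from the χ² distribution with one degree of freedom.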

2.4. Special case 2: composite information ratio test

Instead of using the L₂ metric in (3), we consider a discrepancy function which takes a ratio contrast between two matrices, defined as

d (M_{1}, M_{2}) = \{ tr (M_{1} M_{2}^{- 1}) - p \}^{2} / W, \qquad (4)

where W is a positive number used to standardize the test statistic. It can be shown that d in (4) is a premetric rather than a metric. Motivated by (4), we can construct a parsimonious summary statistic by comparing the trace of ${\hat{I}}_{c} ({\hat{θ}}_{n}) {\hat{H}}^{- 1} ({\hat{θ}}_{n})$ to p. This type of test was proposed by Zhou et al. (2012) for the quasilikelihood function with application to the selection of the covariance structure in generalized estimating equations. Unlike quasilikelihood, where the second Bartlett identity holds, the construction of this test is built on the identity in (1). Inspired by Zhou et al. (2012), we call this the composite information ratio test.
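A small worked example, constructed here for illustration with p = 2 and W = 1, shows why the ratio discrepancy is only a premetric: it is not symmetric, and it can vanish for two distinct matrices.

```latex
% Asymmetry: take M_1 = diag(2, 1) and M_2 = I_2.
d(M_1, M_2) = \{\mathrm{tr}(M_1 M_2^{-1}) - 2\}^2 = (3 - 2)^2 = 1,
\qquad
d(M_2, M_1) = \{\mathrm{tr}(M_2 M_1^{-1}) - 2\}^2 = (3/2 - 2)^2 = 1/4.

% Zero discrepancy between distinct matrices: take M_3 = diag(3/2, 1/2).
d(M_3, I_2) = \{\mathrm{tr}(M_3) - 2\}^2 = (2 - 2)^2 = 0,
\qquad M_3 \neq I_2.
```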

The test statistic is

Q^{ratio} = \left[ n^{1/2} \{ T_{R} ({\hat{θ}}_{n}) - p \} \right]^{2} / {\hat{V}}_{R},

where $T_{R} ({\hat{θ}}_{n}) = tr \{ {\hat{I}}_{c} ({\hat{θ}}_{n}) {\hat{H}}^{- 1} ({\hat{θ}}_{n}) \}$ , and ${\hat{V}}_{R}$ is the estimated asymptotic variance of $n^{1/2} T_{R} ({\hat{θ}}_{n})$ . In the Supplementary Material we derive the estimator

{\hat{V}}_{R} = n^{- 1} \sum_{i = 1}^{n} \left[ \sum_{k = 1}^{K} ω_{i k} \left\{ \frac{\partial}{\partial θ} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) \right\}^{T} {\hat{H}}^{- 1} ({\hat{θ}}_{n}) \frac{\partial}{\partial θ} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) + \frac{\partial}{\partial θ} T_{R} ({\hat{θ}}_{n}) {\hat{H}}^{- 1} ({\hat{θ}}_{n}) \sum_{k = 1}^{K} ω_{i k} \frac{\partial}{\partial θ} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) + tr \left\{ \sum_{k = 1}^{K} ω_{i k} \frac{\partial^{2}}{\partial θ \partial θ^{T}} ℓ_{k} ({\hat{θ}}_{n}; y_{i}) {\hat{H}}^{- 1} ({\hat{θ}}_{n}) \right\} \right]^{2} .

Theorem 2. Given regularity conditions R1–R7 in the Supplementary Material, the composite information ratio test statistic Q^ratio converges in distribution to $χ_{1}^{2}$ under H₀.

In the simulation studies, the composite information ratio test tends to yield more accurate Type I errors than the composite information matrix test.
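A minimal sketch of the ratio statistic and its variance estimator follows. The function name `information_ratio_test`, the array layout and the toy model (two normal margins with a common mean and assumed unit variance, unit weights) are illustrative assumptions. The derivative of T_R with respect to θ must be supplied by the user, analytically or numerically; in the toy model it is zero at the estimate because the composite score vanishes there and the Hessian is constant.

```python
import numpy as np
from scipy.stats import chi2

def information_ratio_test(scores, hessians, weights, grad_TR):
    """Q^ratio = n {T_R - p}^2 / V_R-hat and its chi-squared(1) p-value.
    scores: (n, K, p); hessians: (n, K, p, p); weights: (n, K);
    grad_TR: (p,) derivative of T_R at theta_hat (user supplied)."""
    n, K, p = scores.shape
    I_c = np.einsum("ik,ikj,ikt->jt", weights, scores, scores) / n
    H = -np.einsum("ik,ikjt->jt", weights, hessians) / n
    H_inv = np.linalg.inv(H)
    T_R = float(np.trace(I_c @ H_inv))
    # sum_k w_ik s_ik^T H^{-1} s_ik for each observation i
    quad = np.einsum("ik,ikj,jt,ikt->i", weights, scores, H_inv, scores)
    # grad_TR H^{-1} (sum_k w_ik s_ik) for each observation i
    comp_scores = np.einsum("ik,ikj->ij", weights, scores)
    corr = comp_scores @ H_inv @ grad_TR
    # tr{(sum_k w_ik Hessian_ik) H^{-1}} for each observation i
    tr_term = np.einsum("ik,ikjt,tj->i", weights, hessians, H_inv)
    V_R = float(np.mean((quad + corr + tr_term) ** 2))
    Q = n * (T_R - p) ** 2 / V_R
    return Q, float(chi2.sf(Q, df=1))

# Toy data (assumption): the working model takes unit marginal variances,
# but the data are generated with marginal standard deviation 2.
rng = np.random.default_rng(2)
n, K = 2000, 2
y_bad = rng.normal(0.0, 2.0, size=(n, K))
theta_hat = y_bad.mean()
scores = (y_bad - theta_hat)[:, :, None]
hessians = -np.ones((n, K, 1, 1))
weights = np.ones((n, K))
Q_bad, p_bad = information_ratio_test(scores, hessians, weights, np.zeros(1))
```

Because the statistic is a scalar contrast, its reference distribution has one degree of freedom regardless of p, which is the parsimony the text refers to.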

2.5. Special case 3: composite information max test

Here we propose another test by considering the maximum discrepancy of the diagonal elements of the linear contrast between the observed composite information matrix and the sensitivity matrix. We call this test the composite information max test. This corresponds to the general test statistic Q in (2) with

d (M_{1}, M_{2}) = ‖ W (m_{1}^{*} - m_{2}^{*}) ‖_{\max},

where W is a p × p positive definite matrix used to standardize the test statistic, and $m_{1}^{*} = \{ (M_{1})_{11}, \dots, (M_{1})_{p p} \}^{T}$ is the p × 1 vector of diagonal elements of M₁, with $m_{2}^{*}$ defined analogously from M₂.

Denote the diagonal elements of the linear contrast, contributed by the ith observation, by e_jj(y_i, θ), j = 1, … , p, and let e*(y_i, θ) = {e₁₁(y_i, θ), … , e_pp(y_i, θ)}^T. Define $T_{M}^{*} ({\hat{θ}}_{n}) = n^{- 1} \sum_{i = 1}^{n} e^{*} (y_{i}, {\hat{θ}}_{n})$ . Denote by $V_{M}^{*} (θ)$ the asymptotic variance matrix of $n^{1 ∕ 2} T_{M}^{*} ({\hat{θ}}_{n})$ and by ${\hat{V}}_{M}^{*}$ its empirical counterpart. Define $S = n^{1 ∕ 2} {\hat{U}}_{M}^{*} T_{M}^{*} ({\hat{θ}}_{n})$ , where S_j is the jth element of S, j = 1, … , p, and ${\hat{U}}_{M}^{*}$ is a p × p matrix such that ${\hat{U}}_{M}^{* T} {\hat{U}}_{M}^{*} = {\hat{V}}_{M}^{* - 1}$ . Our asymptotic results hold for any choice of ${\hat{U}}_{M}^{*}$ . The composite information max test is given by

Q^{\max} = \max_{1 ⩽ j ⩽ p} ∣ S_{j} ∣ .

The following theorem establishes the asymptotic distribution of the composite information max test under the null hypothesis.

Theorem 3. Given the regularity conditions in the Supplementary Material, we have

Q^{\max} \to \max_{1 ⩽ j ⩽ p} ∣ N_{j} ∣

in distribution under H₀, where N₁, … , N_p are independent and identically distributed standard normal variables. In other words, for any t > 0, $\lim_{n \to \infty} ∣ pr (Q^{\max} ⩽ t) - F (t)^{p} ∣ = 0$ , where F (·) is the cumulative distribution function of the folded standard normal distribution.

By Theorem 3, we reject the null hypothesis at level α if Q^max > F⁻¹{(1 − α)^{1/p}}, where F⁻¹(·) is the inverse function of F(·) defined in Theorem 3; this cut-off follows because N_j and N_k are independent standard normal random variables for any j ≠ k. In the construction of S, we transform $T_{M}^{*} ({\hat{θ}}_{n})$ by the matrix ${\hat{U}}_{M}^{*}$ so that the elements of S are decorrelated.
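The cut-off and the decorrelation step can be sketched numerically as follows. The folded standard normal distribution function is F(t) = 2Φ(t) − 1 for t ⩾ 0, so F⁻¹(u) = Φ⁻¹{(1 + u)/2}. The function names are hypothetical, and the Cholesky factor is only one of the admissible choices of the decorrelating matrix.

```python
import numpy as np
from scipy.stats import norm

def max_test_critical_value(alpha, p):
    """Cut-off F^{-1}{(1 - alpha)^{1/p}} for the folded standard normal F."""
    u = (1.0 - alpha) ** (1.0 / p)
    return float(norm.ppf((1.0 + u) / 2.0))

def decorrelating_matrix(V_star_hat):
    """One choice of U with U^T U = V^{-1}: if V^{-1} = L L^T, take U = L^T."""
    L = np.linalg.cholesky(np.linalg.inv(V_star_hat))
    return L.T

def max_test_statistic(T_star, V_star_hat, n):
    """Q^max = max_j |S_j| with S = n^{1/2} U T*."""
    S = np.sqrt(n) * decorrelating_matrix(V_star_hat) @ T_star
    return float(np.max(np.abs(S)))

# Illustrative numbers (not from the paper).
V = np.array([[2.0, 0.5], [0.5, 1.0]])
U = decorrelating_matrix(V)
c1 = max_test_critical_value(0.05, 1)   # reduces to the two-sided normal cut-off
c3 = max_test_critical_value(0.05, 3)   # larger, to account for taking a maximum
Q_identity = max_test_statistic(np.array([0.1, -0.3]), np.eye(2), n=100)
```

With p = 1 the cut-off coincides with the usual two-sided standard normal critical value, and it grows with p because the maximum of more independent folded normals is stochastically larger.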

The main difference among Q^max, Q^matrix and Q^ratio lies in how the individual statistics constructed from some components of the matrix equation (1) are combined. Specifically, the test statistic Q^max combines individual statistics by searching for their maximum across the diagonal, whereas Q^matrix and Q^ratio exploit different quadratic forms of individual statistics. In the Supplementary Material we investigate the asymptotic local power of the three composite information tests under a sequence of locally misspecified models (Copas & Eguchi, 2005).

3. Examples

3.1. Extended Mantel–Haenszel method for stratified case-control studies

The Mantel–Haenszel estimator is widely used to estimate the common odds ratio in a series of 2 × 2 tables in epidemiological studies. Liang (1987) extended the Mantel–Haenszel approach to logistic regression models for stratified case-control studies, to allow simultaneous estimation of effect sizes of multivariate risk factors.

Consider a stratified case-control study, where x_i1, … , x_{id_i} denote the p × 1 vectors of potential risk factors of d_i cases, and let x_{id_i+1}, … , x_{ih_i} denote the potential risk factors of m_i controls in the ith stratum, with m_i = h_i − d_i and i = 1, … , n. A logistic regression model allowing for stratum-specific effects is

logit pr (y_{i j} = 1 ∣ x_{i j}) = α_{i} + β^{T} x_{i j} (i = 1, \dots, n),

(5)

where the coefficient β quantifies the effects of risk factors x_ij on the disease status y_ij.

Liang (1987) proposed a composite likelihood method where the nuisance parameters α_i are eliminated by conditioning. Specifically, for the (j, l) case-control pair of subjects in the ith stratum (j = 1, … , d_i; l = d_i + 1, … , h_i), the conditional probability that x_ij is from the case given that x_ij and x_il are observed is

pr (y_{i j} = 1, y_{i l} = 0 ∣ y_{i j} + y_{i l} = 1, x_{i j}, x_{i l}; α_{i}, β) = \frac{e^{β^{T} x_{i j}}}{e^{β^{T} x_{i j}} + e^{β^{T} x_{i l}}} .

Thus, a composite likelihood can be formulated by considering all d_im_i pairs in the ith stratum,

L_{i} (β) = \prod_{j = 1}^{d_{i}} \prod_{l = d_{i} + 1}^{h_{i}} \frac{e^{β^{T} x_{i j}}}{e^{β^{T} x_{i j}} + e^{β^{T} x_{i l}}} .

A weighted composite loglikelihood function for β, combining data from all n strata, is then constructed as

c ℓ (β) = n^{- 1} \sum_{i = 1}^{n} ω_{i} \log {L_{i} (β)} = n^{- 1} \sum_{i = 1}^{n} ω_{i} \sum_{j = 1}^{d_{i}} \sum_{l = d_{i} + 1}^{h_{i}} \log (\frac{e^{β^{T} x_{i j}}}{e^{β^{T} x_{i j}} + e^{β^{T} x_{i l}}}) .

We consider the weights $ω_{i} = h_{i}^{- 1}$ ; with this choice the maximum composite likelihood estimator reduces to the Mantel–Haenszel estimator when only a binary covariate is considered (Liang, 1987). The composite information matrix and the sensitivity matrix are

I_{c} (β) = E \left[ h_{i}^{- 1} \sum_{j = 1}^{d_{i}} \sum_{l = d_{i} + 1}^{h_{i}} \left\{ x_{i j} - \frac{x_{i j} \exp (β^{T} x_{i j}) + x_{i l} \exp (β^{T} x_{i l})}{\exp (β^{T} x_{i j}) + \exp (β^{T} x_{i l})} \right\}^{\otimes 2} \right],

H (β) = E \left( h_{i}^{- 1} \sum_{j = 1}^{d_{i}} \sum_{l = d_{i} + 1}^{h_{i}} \left[ - \frac{\{ x_{i j} \exp (β^{T} x_{i j}) + x_{i l} \exp (β^{T} x_{i l}) \}^{\otimes 2}}{\{ \exp (β^{T} x_{i j}) + \exp (β^{T} x_{i l}) \}^{2}} + \frac{x_{i j} x_{i j}^{T} \exp (β^{T} x_{i j}) + x_{i l} x_{i l}^{T} \exp (β^{T} x_{i l})}{\exp (β^{T} x_{i j}) + \exp (β^{T} x_{i l})} \right] \right) .

The proposed composite information tests can be constructed to test the specification of the model. The empirical performance of these tests is evaluated in § 4.
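The empirical versions of these two matrices can be sketched as below, evaluated at a user-supplied β such as the maximum composite likelihood estimate. The sketch uses an algebraic rearrangement that is not written out in the text: with δ = x_ij − x_il and π = expit(β^T δ), the pairwise score equals (1 − π)δ and the negative pairwise Hessian equals π(1 − π)δδ^T. The function names and the random inputs are illustrative assumptions.

```python
import numpy as np
from scipy.special import expit

def pair_terms(cases, controls, beta):
    """All case-control differences delta = x_case - x_control in one stratum,
    with the conditional probability pi that the case is the case."""
    delta = cases[:, None, :] - controls[None, :, :]      # (d, m, p)
    delta = delta.reshape(-1, delta.shape[-1])            # (d*m, p)
    pi = expit(delta @ beta)
    return delta, pi

def information_matrices(strata, beta):
    """Empirical I_c and H with weights 1/h_i.
    strata: list of (cases, controls) arrays of shapes (d_i, p) and (m_i, p)."""
    p = beta.size
    I_c = np.zeros((p, p))
    H = np.zeros((p, p))
    for cases, controls in strata:
        w = 1.0 / (len(cases) + len(controls))            # omega_i = 1 / h_i
        delta, pi = pair_terms(cases, controls, beta)
        # Pairwise score is (1 - pi) * delta; its outer product enters I_c.
        I_c += w * (delta * ((1.0 - pi) ** 2)[:, None]).T @ delta
        # Negative pairwise Hessian is pi * (1 - pi) * delta delta^T.
        H += w * (delta * (pi * (1.0 - pi))[:, None]).T @ delta
    n = len(strata)
    return I_c / n, H / n

# Illustrative inputs (not from the paper): 50 strata, 5 cases and 5 controls each.
rng = np.random.default_rng(3)
strata = [(rng.normal(size=(5, 2)), rng.normal(size=(5, 2))) for _ in range(50)]
I0, H0 = information_matrices(strata, np.zeros(2))
I1, H1 = information_matrices(strata, np.array([0.5, -0.2]))
```

At β = 0 every π equals 1/2, so the two empirical matrices coincide exactly; away from zero they differ in any finite sample, and the tests of § 2 assess whether that difference is larger than sampling variation allows.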

3.2. Undirected graphical model

The undirected graphical model has been widely used to describe the dependence structures of multivariate variables. For simplicity, most of the literature assumes only pairwise interactions in the joint distribution of the multivariate variables (Chen et al., 2015; Yang et al., 2015). Assume that there are p random variables Y₁, … , Y_p represented as nodes of the graph G = (V, E), with the vertex set V = {1, … , p} and the edge set E ⊆ V × V. An edge between vertices i and j indicates that Y_i and Y_j are conditionally dependent given the remaining variables. The joint distribution of a pairwise graphical model corresponding to the graph G = (V, E) takes the form

pr (y) = \exp \left\{ \sum_{s \in V} f_{s} (y_{s}; α_{s}) + \frac{1}{2} \sum_{(s, t) \in E} θ_{s t} y_{s} y_{t} - Φ (Θ, α) \right\}, \qquad (6)

where y = (y₁, … , y_p)^T, Θ = (θ_{st})_{p×p} is a symmetric square matrix of parameters associated with edges, with the diagonal elements equal to zero, α = (α₁, … , α_p) is a matrix of parameters with the sth column α_s involved in the node potential function f_s(y_s; α_s), and Φ (Θ, α) is the log partition function. In many examples of exponential family graphical models, the log partition function is intractable. To circumvent this challenge, a composite likelihood function constructed by combining conditional likelihoods has been proposed by Xue et al. (2012), Chen et al. (2015) and Yang et al. (2015), among others. Specifically, denote y_{−s} = (y₁, … , y_{s−1}, y_{s+1}, … , y_p)^T. The conditional distribution is pr(y_s ∣ y_{−s}) = exp{f_s(y_s; α_s) + Σ_{t≠s} θ_{st} y_s y_t − D_s(η_s)}, where η_s is a function of α_s, y_{−s} and Θ_s, Θ_s is the sth column of Θ without the diagonal element, and D_s(·) is a function whose form depends on the conditional distribution.

As a concrete example, consider the following Ising graphical model. Let Y be a vector of binary variables, taking values {−1, 1}. Assume that Y satisfies the joint distribution (6) with f_s(y_s; α_s) = α_{1s} y_s. Given n independent and identically distributed copies Y⁽ⁱ⁾ of Y, the composite loglikelihood function is $c ℓ (Θ, α) = n^{- 1} \sum_{i = 1}^{n} \sum_{s = 1}^{p} ℓ_{s}^{(i)} (Θ, α)$ , where

ℓ_{s}^{(i)} (Θ, α) = α_{1 s} y_{s}^{(i)} + \sum_{t \neq s} θ_{s t} y_{s}^{(i)} y_{t}^{(i)} - D_{s} (α_{1 s} + \sum_{t \neq s} θ_{s t} y_{t}^{(i)}),

with D_s(η) = log{exp(η) + exp(−η)}. The composite information and sensitivity matrices are $I_{c} (Θ, α) = \sum_{s = 1}^{p} E [ \{ \partial ℓ_{s}^{(i)} (Θ, α) / \partial (Θ, α) \}^{\otimes 2} ]$ and $H (Θ, α) = \sum_{s = 1}^{p} E \{ - \partial^{2} ℓ_{s}^{(i)} (Θ, α) / \partial (Θ, α)^{2} \}$ . The proposed composite information tests can be used to test the specification of the composite likelihood for the Ising model.
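The node-wise conditional loglikelihoods of the Ising model are simple to evaluate, as the following minimal sketch shows. The function names and the randomly drawn parameters are illustrative assumptions; the code relies on Θ being symmetric with a zero diagonal, as stated in the text.

```python
import numpy as np

def ising_conditional_loglik(y, Theta, alpha):
    """l_s^{(i)} for every observation i and node s.
    y: (n, p) with entries in {-1, 1}; Theta: (p, p) symmetric, zero diagonal;
    alpha: (p,) node parameters alpha_{1s}. Returns an (n, p) array."""
    # eta[i, s] = alpha_s + sum_{t != s} theta_st y_t (the zero diagonal drops t = s)
    eta = alpha + y @ Theta
    # D_s(eta) = log{exp(eta) + exp(-eta)}, computed stably
    return y * eta - np.logaddexp(eta, -eta)

def composite_loglik(y, Theta, alpha):
    """cl(Theta, alpha) = n^{-1} sum_i sum_s l_s^{(i)}."""
    return float(ising_conditional_loglik(y, Theta, alpha).sum(axis=1).mean())

# Illustrative parameters and data (not from the paper).
rng = np.random.default_rng(4)
p = 4
A = rng.normal(scale=0.3, size=(p, p))
Theta = (A + A.T) / 2.0
np.fill_diagonal(Theta, 0.0)
alpha = rng.normal(size=p)
y = rng.choice([-1, 1], size=(10, p))

# Sanity check: the two conditional probabilities of node 0 sum to one.
y_plus, y_minus = y.copy(), y.copy()
y_plus[:, 0], y_minus[:, 0] = 1, -1
total = (np.exp(ising_conditional_loglik(y_plus, Theta, alpha)[:, 0])
         + np.exp(ising_conditional_loglik(y_minus, Theta, alpha)[:, 0]))
cl = composite_loglik(y, Theta, alpha)
```

The sanity check confirms that each component is a properly normalized conditional probability, which is what allows the intractable log partition function Φ to be avoided.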

4. Simulation

We consider the example in § 3.1 with two scenarios: when the specified composite likelihood model correctly includes all the predictors, and when a predictor is falsely excluded.

We simulate data using (5) with three continuous predictors, x₁, x₂ and x₃. The first predictor x₁ is generated from the uniform distribution from 0 to 5. The second and third predictors, x₂ and x₃, are the quadratic and cubic terms of x₁, i.e., $x_{2} = x_{1}^{2}$ and $x_{3} = x_{1}^{3}$ , respectively. We set β₁ and β₂ to 2 and −0.5, respectively, and let β₃ decrease from 0 to −0.5 to evaluate the size and power of the proposed tests. The stratum-specific intercepts α_i are generated from a uniform distribution from −2.5 to −2. We calculate the individual probability of Y_ij = 1 using (5) and generate the binary response from the corresponding Bernoulli distribution. We randomly sample five cases (y = 1) and five controls (y = 0) from each stratum. Assume that only two predictors x₁ and x₂ are available and included in the composite likelihood method described in § 3.1. We apply the proposed tests of model specification for the composite likelihood with only x₁ and x₂. In practice, we cannot use the classical composite likelihood based tests for β₃ = 0, since we assume the value of x₃ is unobserved. For each of the scenarios, we consider n = 200, 400 and 600 strata.
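The data-generating mechanism can be sketched as follows. The text does not state the size of the population from which cases and controls are drawn in each stratum, so this sketch assumes an effectively unlimited population and keeps drawing subjects until five cases and five controls are available; the function name and the batch size are likewise illustrative.

```python
import numpy as np

def simulate_stratum(rng, beta, n_cases=5, n_controls=5, batch=50):
    """Draw one stratum under model (5); returns (cases, controls), each with
    columns x1, x1^2, x1^3. Subjects are generated in batches until enough
    cases and controls have been seen (an assumption of this sketch)."""
    alpha = rng.uniform(-2.5, -2.0)               # stratum-specific intercept
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        x1 = rng.uniform(0.0, 5.0, size=batch)
        X = np.column_stack([x1, x1 ** 2, x1 ** 3])
        prob = 1.0 / (1.0 + np.exp(-(alpha + X @ beta)))
        y = rng.random(batch) < prob              # Bernoulli disease status
        cases.extend(X[y])
        controls.extend(X[~y])
    return np.array(cases[:n_cases]), np.array(controls[:n_controls])

rng = np.random.default_rng(5)
beta = np.array([2.0, -0.5, -0.25])               # one of the settings for beta_3
strata = [simulate_stratum(rng, beta) for _ in range(200)]

# The working model only sees x1 and x2, so x3 is dropped before fitting.
observed = [(cases[:, :2], controls[:, :2]) for cases, controls in strata]
```

Dropping the third column before fitting mimics the scenario in which x₃ is unavailable, so that any misspecification has to be detected by the specification tests rather than by a direct test of β₃ = 0.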

Table 1 summarizes the empirical Type I error and power of the proposed tests at nominal levels of 5% and 10% based on 5000 replicates. The composite information ratio test controls Type I error reasonably well at both nominal levels. The composite information max test is slightly liberal at the 5% nominal level when n is small but the Type I error tends to be more accurate as n increases. The composite information matrix test is relatively conservative, which agrees with the literature on the standard information matrix test.

Table 1.

Empirical rejection rates (%) in 5000 simulations of three proposed tests for H₀, under varying numbers of strata n = 200, 400 and 600, and log odds ratios β₁ = 2, β₂ = −0.5, and β₃ varying from 0 to −0.5

		Rejection (%) at α = 0.05			Rejection (%) at α = 0.10
n	β₃	Matrix test	Ratio test	Max test	Matrix test	Ratio test	Max test
200	0.00	3.6	4.9	7.8	5.8	9.7	11.4
	−0.10	19.2	16.6	27.9	25.6	24.1	35.5
	−0.25	29.2	24.2	39.8	37.3	32.0	47.6
	−0.50	33.6	27.1	43.8	41.7	35.9	51.4
400	0.00	3.4	4.6	6.9	5.3	9.5	10.3
	−0.10	29.8	21.5	41.3	38.1	30.3	50.0
	−0.25	47.5	32.3	58.9	56.2	41.1	67.0
	−0.50	53.9	37.5	63.8	61.9	46.9	71.2
600	0.00	4.1	5.2	6.4	6.5	10.2	9.9
	−0.10	43.4	28.6	55.4	52.2	37.7	63.8
	−0.25	62.9	40.5	73.6	71.3	50.5	80.9
	−0.50	69.5	47.2	78.9	77.0	57.1	84.8


As β₃ decreases from 0 to −0.5, so that the effect of the omitted cubic term grows in magnitude, all three tests show increasing power. The composite information max test has the largest power among the three tests. This gain in power may be due to the fact that it detects model misspecification by searching for the maximum discrepancy between the two matrices, and is more sensitive than the other two tests, which average the discrepancies. In our simulations, the composite information ratio test has lower power than the other two tests.

In the Supplementary Material we conduct additional simulation studies in four different scenarios, including the Ising model described in § 3.2, the independence likelihood for correlated data and the pairwise likelihood for multivariate outcomes. We find that the empirical Type I error rates of the proposed composite information tests are effectively controlled below the nominal levels and all three tests have good power for detecting model misspecifications. The matrix and max tests perform similarly in all examples, and their empirical Type I error rates converge to the nominal level more slowly than the ratio test, especially in the Ising model example. We find that the power of the three tests depends on the underlying data-generating process. The information ratio test has relatively higher statistical power in two of the four scenarios. We refer to the Supplementary Material for further discussion on the choice of the tests.

5. Discussion

In applications with time series, data are no longer independent and our results do not apply. Davis & Yau (2011) proposed a pairwise likelihood approach for a broad class of linear time series models. A future research direction is to generalize the proposed tests to evaluate the specification of a composite likelihood with dependent data.

In this paper we assumed that the total number of events K is fixed while the sample size n increases. Although the constructions of the new composite information matrix and the new identity in (1) do not require such an assumption, derivations of the limiting distributions of the three proposed tests do assume that the maximum composite likelihood estimator ${\hat{θ}}_{n}$ is a consistent estimator of θ with an n^{1/2} convergence rate under the null hypothesis. Such an assumption may not hold in some scenarios where the number of components also increases as the number of independent replicates grows. For example, Cox & Reid (2004) discussed the situation where a small number of individually large sequences is available for pairwise likelihood; in that case the maximum composite likelihood estimator may not be consistent, or the convergence rate may be very slow as the number of components increases, depending on the internal correlation among the components. In such scenarios, the limiting distribution of the proposed test needs to be derived differently.

The validity of inference based on composite likelihoods usually requires both the correct specification of lower-dimensional conditional or marginal models and the existence of a joint model corresponding to the assumed composite likelihood. The latter is known as the compatibility of composite likelihoods; see Yi (2014) for further discussion. The proposed tests aim to investigate the specification of lower-dimensional models in composite likelihoods assuming the compatibility condition holds.

The information test (White, 1982) can be viewed as an omnibus test, whereas the proposed tests are tailored for the model specification required in the composite likelihood function. Thus, our tests can be more powerful than the existing information test based on the full likelihood, when the composite likelihood function is misspecified. If our tests indicate misspecification, it would be of interest to understand what modelling assumptions are violated. A future research direction is to design goodness-of-fit tests to detect specific departures from the assumed composite likelihood.

Supplementary Material

NIHMS1693468-supplement-1.pdf (210.6 KB, pdf)

Acknowledgement

The authors thank the referees, associate editor and editor for comments that substantially improved this work.

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes regularity conditions, proofs of Theorems 1–3, analysis of local asymptotic power and three additional simulation studies, including the Ising model described in § 3.2, inference using the independence likelihood for dependent data, and inference using pairwise likelihood for multivariate outcomes.

Contributor Information

JING HUANG, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.

YANG NING, Department of Statistics and Data Science, Cornell University, Comstock Hall 1188, Ithaca, New York 14853, U.S.A.

NANCY REID, Department of Statistical Sciences, University of Toronto, Toronto M5S 3G3, Canada.

YONG CHEN, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.

References

Besag J (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Statist. Soc. B 36, 192–239.
Chandler R & Bate S (2007). Inference for clustered data using the independence loglikelihood. Biometrika 94, 167–83.
Chen S, Witten DM & Ali S (2015). Selection and estimation for mixed graphical models. Biometrika 102, 47–64.
Copas J & Eguchi S (2005). Local model uncertainty and incomplete-data bias (with discussion). J. R. Statist. Soc. B 67, 459–513.
Cox D & Reid N (2004). A note on pseudolikelihood constructed from marginal densities. Biometrika 91, 729–37.
Davis RA & Yau CY (2011). Comments on pairwise likelihood in time series models. Statist. Sinica 21, 255–77.
Guan Y (2006). A composite likelihood approach in fitting spatial point process models. J. Am. Statist. Assoc. 101, 1502–12.
He W & Yi G (2011). A pairwise likelihood method for correlated binary data with/without missing observations under generalized partially linear single-index models. Statist. Sinica 21, 207–29.
Heagerty P & Lele S (1998). A composite likelihood approach to binary spatial data. J. Am. Statist. Assoc. 93, 1099–111.
Kent J (1982). Robust properties of likelihood ratio tests. Biometrika 69, 19–27.
Liang K (1987). Extended Mantel–Haenszel estimating procedure for multivariate logistic regression models. Biometrics 43, 289–99.
Liang K & Zeger S (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Lindsay B (1988). Composite likelihood methods. Contemp. Math. 80, 221–39.
Ma TF & Yau CY (2016). A pairwise likelihood-based approach for changepoint detection in multivariate time series models. Biometrika 103, 409–21.
Molenberghs G & Verbeke G (2005). Models for Discrete Longitudinal Data. New York: Springer.
Ogden HE (2016). A caveat on the robustness of composite likelihood estimators: the case of a mis-specified random effect distribution. Statist. Sinica 26, 639–51.
Pagui K, Clovis E, Salvan A & Sartori N (2014). Combined composite likelihood. Can. J. Statist. 42, 525–43.
Presnell B & Boos DD (2004). The IOS test for model misspecification. J. Am. Statist. Assoc. 99, 216–27.
Varin C, Reid N & Firth D (2011). An overview of composite likelihood methods. Statist. Sinica 21, 5–42.
Wellner J & Zhang Y (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann. Statist. 35, 2106–42.
White H (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
Xu X & Reid N (2011). On the robustness of maximum composite likelihood estimate. J. Statist. Plan. Infer. 141, 3047–54.
Xue L, Zou H & Cai T (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. Ann. Statist. 40, 1403–29.
Yang E, Ravikumar P, Allen GI & Liu Z (2015). Graphical models via univariate exponential family distributions. J. Mach. Learn. Res. 16, 3813–47.
Yi GY (2014). Composite likelihood/pseudolikelihood. In Wiley StatsRef: Statistics Reference Online, Ed. Balakrishnan N, Colton T, Everitt B, Piegorsch W, Ruggeri F and Teugels JL.
Zhou QM, Song PX-K & Thompson ME (2012). Information ratio test for model misspecification in quasi-likelihood inference. J. Am. Statist. Assoc. 107, 205–13.
