Author manuscript; available in PMC: 2021 Jun 25.
Published in final edited form as: Biometrika. 2020 Jun 14;107(4):907–917. doi: 10.1093/biomet/asaa039

On specification tests for composite likelihood inference

JING HUANG 1, YANG NING 2, NANCY REID 3, YONG CHEN 4
PMCID: PMC8232013  NIHMSID: NIHMS1693468  PMID: 34176951

Summary

Composite likelihood functions are often used for inference in applications where the data have a complex structure. While inference based on the composite likelihood can be more robust than inference based on the full likelihood, the inference is not valid if the associated conditional or marginal models are misspecified. In this paper, we propose a general class of specification tests for composite likelihood inference. The test statistics are motivated by the fact that the second Bartlett identity holds for each component of the composite likelihood function when these components are correctly specified. We construct the test statistics based on the discrepancy between the so-called composite information matrix and the sensitivity matrix. As an illustration, we study three important cases of the proposed tests and establish their limiting distributions under both null and local alternative hypotheses. Finally, we evaluate the finite-sample performance of the proposed tests in several examples.

Keywords: Bartlett identity, Information matrix, Misspecification test, Model specification

1. Introduction

The composite likelihood function (Besag, 1974; Lindsay, 1988; Cox & Reid, 2004) is an inference function constructed as the product of a set of conditional or marginal density functions, whether or not the events defined in each component are mutually independent. It has been used in longitudinal studies (Molenberghs & Verbeke, 2005; Chandler & Bate, 2007), for the analysis of panel data (Wellner & Zhang, 2007), with spatial modelling (Heagerty & Lele, 1998; Guan, 2006), with missing data (He & Yi, 2011), with graphical models (Xue et al., 2012; Chen et al., 2015; Yang et al., 2015) and with change point detection in multivariate time series (Ma & Yau, 2016), among many other settings. One of the main reasons for its widespread application is that only part of the data-generating mechanism needs to be specified, which can reduce the computational cost and provide some robustness to model misspecification. For a comprehensive discussion of composite likelihood and its applications, see the review paper by Varin et al. (2011).

Although inference based on the composite likelihood has been well developed (Kent, 1982; Lindsay, 1988; Pagui et al., 2014), a corresponding specification test does not seem to exist. Aside from technical difficulties, on which we will elaborate later, the lack of development in this area may be due to the perception that composite likelihood inference is robust to partial model misspecification (Xu & Reid, 2011). However, composite likelihood inference still requires the specification of a set of lower-dimensional conditional or marginal models, and the corresponding results can be misleading if these are misspecified. For instance, Ogden (2016) showed that the maximum composite likelihood estimator is inconsistent in a generalized linear mixed model with a misspecified random effect distribution, and the misspecification bias of the maximum composite likelihood estimator is significantly larger than the misspecification bias in the maximum likelihood estimator.

In this paper we propose a class of specification tests for composite likelihood functions. A general test for model specification with ordinary likelihood functions is the information matrix test proposed by White (1982). This relies on the second Bartlett identity, which holds when the model is correctly specified: it compares the variability matrix to the sensitivity matrix, both defined in § 2 below. A Wald-type test statistic is constructed from the difference between these two matrices. Presnell & Boos (2004) proposed an ‘in-and-out sample’ likelihood ratio test, which is asymptotically equivalent to a multiplicative contrast between the sensitivity matrix and the variability matrix. More recently, Zhou et al. (2012) developed an information ratio test which can effectively test the specifications of variance and covariance functions in generalized estimating equations (Liang & Zeger, 1986).

These specification tests rely on the fact that the second Bartlett identity holds under the correct model. For a composite likelihood, the second Bartlett identity does not hold, even when the component densities are correctly specified (Varin et al., 2011). Thus, direct application of the tests based on the information matrix is invalid. To circumvent this difficulty, we define the composite information matrix, and show that this can recover a counterpart of the second Bartlett identity. We propose a general class of specification tests for composite likelihood inference based on the discrepancy between the composite information matrix and the sensitivity matrix. We illustrate the usefulness of the proposed class of tests with three important special cases: the composite information matrix test, composite information ratio test and composite information max test. We establish their asymptotic distributions under the null hypothesis, and asymptotic power under local alternatives.

2. Specification tests for composite likelihood functions

2.1. Composite information matrix and the Bartlett identity

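Let f(y; θ) be the joint probability density function of a multi-dimensional random vector Y, indexed by a p-dimensional parameter θ = (θ1, … , θp) in the parameter space Ω. Let $\{A_1(y), \ldots, A_K(y)\}$ denote a sequence of sets. The log probability of the event $\{Y \in A_k(y)\}$ is $\ell_k(\theta; y) = \log \int_{u \in A_k(y)} f(u; \theta)\,du$, where k = 1, 2, … , K and K is the total number of events. Assume that n independently and identically distributed random variables Y1, … , Yn are observed from the model f(y; θ). A composite loglikelihood function based on this sequence of events is

$$c\ell(\theta) = n^{-1} \sum_{i=1}^{n} \sum_{k=1}^{K} \omega_{ik}\, \ell_k(\theta; y_i),$$

where $\omega_{ik}$ is a nonnegative weight associated with the event $\{Y_i \in A_k(y_i)\}$. The variability and sensitivity matrices associated with cℓ(θ) are

$$J(\theta) = E\Bigl[\Bigl\{\sum_{k=1}^{K} \omega_{ik} \frac{\partial}{\partial\theta} \ell_k(\theta; y_i)\Bigr\}^{\otimes 2}\Bigr], \qquad H(\theta) = -E\Bigl\{\sum_{k=1}^{K} \omega_{ik} \frac{\partial^2}{\partial\theta\,\partial\theta^{T}} \ell_k(\theta; y_i)\Bigr\},$$

where $a^{\otimes 2} = a a^{T}$ for a column vector $a$.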

Since defining a composite likelihood only requires models for the events $\{Y \in A_k(y)\}$ for 1 ⩽ k ⩽ K, specification tests should be tailored to the modelling of $\{Y \in A_k(y)\}$. For instance, in the analysis of clustered data, the independence likelihood in Chandler & Bate (2007) only involves modelling the conditional mean function. Thus, the corresponding specification test should verify the validity of the conditional mean model rather than, for example, the correlation structure of the full model.

Formally, the null hypothesis for a specification test of the composite likelihood can be formulated as

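$$H_0: \text{there exists a } \theta \in \Omega \text{ such that } \operatorname{pr}\{Y \in A_k(y)\} \propto \exp\{\ell_k(\theta; y)\}, \text{ for all } k = 1, \ldots, K.$$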

As noted in § 1, the second Bartlett identity, which is the basis for all existing information based specification tests, does not hold for the composite likelihood even when the assumed model is correct. This statement is illustrated by an example of the multivariate normal model in the Supplementary Material. To develop specification tests, we construct a new identity by defining the composite information matrix Ic(θ) as follows:

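$$I_c(\theta) = E\Bigl[\sum_{k=1}^{K} \omega_{ik} \Bigl\{\frac{\partial}{\partial\theta} \ell_k(\theta; y_i)\Bigr\}^{\otimes 2}\Bigr].$$

When each loglikelihood function $\ell_k(\theta; y)$ is correctly specified, under mild regularity conditions, we have

$$I_c(\theta) - H(\theta) = 0. \tag{1}$$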

Based on this identity, we propose a general class of specification tests for the composite likelihood.
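As a quick numerical check of (1), the following sketch uses a toy setting chosen here purely for illustration; it is not the multivariate normal example of the Supplementary Material. It assumes a bivariate normal vector with common mean μ, unit variances and correlation ρ, fitted by the independence likelihood with K = 2 marginal components N(μ, 1) and unit weights. In that setting H(μ) = 2 and Ic(μ) = 2, whereas J(μ) = 2 + 2ρ, so the usual identity J = H fails unless ρ = 0 while (1) still holds.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, rho, n = 1.0, 0.6, 200_000   # illustrative values, not taken from the paper

# Bivariate normal with common mean mu, unit variances and correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
y = rng.multivariate_normal([mu, mu], cov, size=n)

# Component scores of the marginal N(mu, 1) loglikelihoods: d/dmu l_k = y_k - mu.
scores = y - mu

J_hat = np.mean(scores.sum(axis=1) ** 2)       # variability: square of the summed score
Ic_hat = np.mean((scores ** 2).sum(axis=1))    # composite information: sum of squared scores
H = 2.0                                        # sensitivity: minus the summed second derivatives

print(J_hat, Ic_hat, H)
```

The Monte Carlo estimate of Ic should sit close to H, while the estimate of J should sit close to 2 + 2ρ.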

2.2. A general class of specification tests for the composite likelihood

Define the maximum composite likelihood estimator as $\hat\theta_n = \arg\max_{\theta \in \Omega} c\ell(\theta)$, and the observed composite information matrix and sensitivity matrix as

$$\hat I_c(\hat\theta_n) = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \omega_{ik} \Bigl\{\frac{\partial}{\partial\theta} \ell_k(\hat\theta_n; y_i)\Bigr\}^{\otimes 2}, \qquad \hat H(\hat\theta_n) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \omega_{ik} \frac{\partial^2}{\partial\theta\,\partial\theta^{T}} \ell_k(\hat\theta_n; y_i).$$

Our family of specification tests is constructed by comparing $\hat I_c(\hat\theta_n)$ and $\hat H(\hat\theta_n)$. Specifically, we consider a discrepancy function $d: \mathbb{R}^{p \times p} \times \mathbb{R}^{p \times p} \to [0, \infty)$ which satisfies d(M1, M2) ⩾ 0 and d(M1, M1) = 0 for any $M_1, M_2 \in \mathbb{R}^{p \times p}$. It can be seen that d is a premetric, more general than a standard metric in the sense that we do not require the symmetry or subadditivity of d. As will be seen later, some special cases of d are indeed only a premetric. Given a discrepancy function d, we consider the corresponding test statistic

$$Q = d\{\hat I_c(\hat\theta_n), \hat H(\hat\theta_n)\}, \tag{2}$$

which measures the distance between the observed composite information matrix and the sensitivity matrix. Under H0, (1) holds and therefore d{Ic(θ), H(θ)} = 0. Under mild regularity conditions, the composite information matrix and the sensitivity matrix, Ic(θ) and H(θ), can be consistently estimated by $\hat I_c(\hat\theta_n)$ and $\hat H(\hat\theta_n)$ respectively. Thus, we expect Q to be small when the composite likelihood is correctly specified. On the other hand, a large value of Q suggests potential misspecification of the composite likelihood. In the following we consider three special cases of d and study their asymptotic null distributions and local asymptotic powers.

2.3. Special case 1: composite information matrix test

In this subsection we take d as a scaled L2 metric, which leads to a composite information matrix test. Since $\hat I_c(\hat\theta_n)$ and $\hat H(\hat\theta_n)$ are symmetric matrices, we only need to consider their upper triangular entries when constructing Q. That is,

$$d(M_1, M_2) = (m_1 - m_2)^{T} W (m_1 - m_2), \tag{3}$$

where m1 and m2 are p(p + 1)/2-dimensional vectors obtained by vectorizing the upper triangular elements of the matrices M1 and M2, and W is a positive definite matrix used to standardize the test statistic. In this case, d is indeed a metric. When cℓ(θ) corresponds to the loglikelihood of the full model, the test statistic Q in (2) with the metric d defined in (3) reduces to the information matrix test in White (1982).

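Formally, consider the p(p + 1)/2 upper triangular elements of $\hat I_c(\hat\theta_n) - \hat H(\hat\theta_n)$, and denote the contribution of the ith observation to the (j, t) element (1 ⩽ j ⩽ t ⩽ p) of the matrix $\hat I_c(\theta) - \hat H(\theta)$ by

$$e_{jt}(y_i, \theta) = \sum_{k=1}^{K} \omega_{ik} \Bigl\{\frac{\partial}{\partial\theta_j} \ell_k(\theta; y_i)\, \frac{\partial}{\partial\theta_t} \ell_k(\theta; y_i) + \frac{\partial^2}{\partial\theta_j\,\partial\theta_t} \ell_k(\theta; y_i)\Bigr\},$$

where θj and θt are the jth and tth elements of θ. With q = p(p + 1)/2, we stack the q elements $e_{jt}(y_i, \theta)$ for 1 ⩽ j ⩽ t ⩽ p to obtain a q × 1 vector e(yi, θ). Define $T_M(\theta) = n^{-1} \sum_{i=1}^{n} e(y_i, \theta)$ and

$$V_M(\theta) = E\Bigl[e(y_i, \theta) + E\Bigl\{\frac{\partial}{\partial\theta} e(y_i, \theta)\Bigr\} H^{-1}(\theta) \sum_{k=1}^{K} \omega_{ik} \frac{\partial}{\partial\theta} \ell_k(\theta; y_i)\Bigr]^{\otimes 2}.$$

By Taylor expansion and the properties of influence functions, VM(θ) is the asymptotic variance matrix of $n^{1/2} T_M(\hat\theta_n)$ under H0, and it can be consistently estimated by

$$\hat V_M = n^{-1} \sum_{i=1}^{n} \Bigl[e(y_i, \hat\theta_n) + \frac{\partial}{\partial\theta} T_M(\hat\theta_n)\, \hat H^{-1}(\hat\theta_n) \sum_{k=1}^{K} \omega_{ik} \frac{\partial}{\partial\theta} \ell_k(\hat\theta_n; y_i)\Bigr]^{\otimes 2}.$$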

The proposed composite information matrix test Qmatrix is defined as

$$Q_{\mathrm{matrix}} = n\, T_M^{T}(\hat\theta_n)\, \hat V_M^{-1}\, T_M(\hat\theta_n),$$

where we set $W = \hat V_M^{-1}$ in (3) to construct an asymptotic $\chi^2$ test of Wald type. Throughout the paper we consider the scenario in which the sample size n increases, with the total number of events K fixed. The following theorem states the asymptotic distribution of Qmatrix under the null hypothesis.

Theorem 1. Given regularity conditions R1–R7 in the Supplementary Material, as n increases the composite information matrix test statistic Qmatrix converges in distribution to $\chi^2_q$ under H0.

The above theorem suggests the use of the (1 − α) quantile of the $\chi^2_q$ distribution as the cut-off value for the test Qmatrix. Since the degrees of freedom q increase as a quadratic function of the number of parameters in the composite likelihood, and the composite information matrix test aims to pick up a wide range of deviations from the null, this test may have low power when p is relatively large. This phenomenon is also confirmed in our simulation studies. In the following subsections we propose alternative choices of the discrepancy function d, which attempt to address this issue.
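The construction of Qmatrix can be sketched in the simplest case p = q = 1. The setting below is an assumption made only for illustration: K count components per observation, each modelled as Poisson(θ) through the independence likelihood with unit weights, so that ∂ℓk/∂θ = y/θ − 1 and ∂²ℓk/∂θ² = −y/θ². The function name, sample sizes and data-generating choices are hypothetical and do not come from the paper.

```python
import numpy as np

def matrix_test_poisson(y):
    """Q_matrix for Poisson(theta) margins with unit weights; y has shape (n, K)."""
    n, K = y.shape
    theta = y.mean()                                   # maximizer of the independence likelihood
    score = (y / theta - 1.0).sum(axis=1)              # sum over k of first derivatives
    hess = (-y / theta**2).sum(axis=1)                 # sum over k of second derivatives
    e = ((y / theta - 1.0) ** 2).sum(axis=1) + hess    # e(y_i, theta): squared scores plus Hessians
    # Derivative of e(y_i, theta) with respect to theta.
    de = (-2.0 * (y / theta - 1.0) * y / theta**2 + 2.0 * y / theta**3).sum(axis=1)
    H_hat = -hess.mean()
    T_M = e.mean()
    V_M = np.mean((e + de.mean() * score / H_hat) ** 2)
    return n * T_M**2 / V_M

rng = np.random.default_rng(2)
n, K = 2000, 3
# Correct margins: a shared Poisson shock induces dependence between components,
# yet each component is marginally Poisson(3).
y_null = rng.poisson(1.0, size=(n, 1)) + rng.poisson(2.0, size=(n, K))
# Misspecified margins: negative binomial counts with mean 3 and variance 7.5.
y_alt = rng.negative_binomial(2, 0.4, size=(n, K))

q_null = matrix_test_poisson(y_null)
q_alt = matrix_test_poisson(y_alt)
print(q_null, q_alt)   # compare with 3.841, the 0.95 quantile of chi-squared with 1 df
```

Note that the dependence between components is left unmodelled in both data sets; only the overdispersed margins in the second data set contradict H0.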

2.4. Special case 2: composite information ratio test

Instead of using the L2 metric in (3), we consider a discrepancy function which takes a ratio contrast between two matrices, defined as

$$d(M_1, M_2) = \{\operatorname{tr}(M_1 M_2^{-1}) - p\}^2\, W, \tag{4}$$

where W is a positive number used to standardize the test statistic. It can be shown that d in (4) is a premetric rather than a metric. Motivated by (4), we can construct a parsimonious summary statistic by comparing the trace of $\hat I_c(\hat\theta_n) \hat H^{-1}(\hat\theta_n)$ to p. This type of test was proposed by Zhou et al. (2012) for the quasilikelihood function with application to the selection of the covariance structure in generalized estimating equations. Unlike quasilikelihood, where the second Bartlett identity holds, the construction of this test is built on the identity in (1). Inspired by Zhou et al. (2012), we call this the composite information ratio test.

The test statistic is

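$$Q_{\mathrm{ratio}} = \frac{[n^{1/2}\{T_R(\hat\theta_n) - p\}]^2}{\hat V_R},$$

where $T_R(\hat\theta_n) = \operatorname{tr}\{\hat I_c(\hat\theta_n) \hat H^{-1}(\hat\theta_n)\}$, and $\hat V_R$ is the estimated asymptotic variance of $T_R(\hat\theta_n)$. In the Supplementary Material we derive the estimator

$$\hat V_R = n^{-1} \sum_{i=1}^{n} \Bigl[\sum_{k=1}^{K} \omega_{ik} \Bigl\{\frac{\partial}{\partial\theta} \ell_k(\hat\theta_n; y_i)\Bigr\}^{T} \hat H^{-1}(\hat\theta_n) \frac{\partial}{\partial\theta} \ell_k(\hat\theta_n; y_i) + \frac{\partial}{\partial\theta} T_R(\hat\theta_n)\, \hat H^{-1}(\hat\theta_n) \sum_{k=1}^{K} \omega_{ik} \frac{\partial}{\partial\theta} \ell_k(\hat\theta_n; y_i) + \operatorname{tr}\Bigl\{\sum_{k=1}^{K} \omega_{ik} \frac{\partial^2}{\partial\theta\,\partial\theta^{T}} \ell_k(\hat\theta_n; y_i)\, \hat H^{-1}(\hat\theta_n)\Bigr\}\Bigr]^2.$$

Theorem 2. Given regularity conditions R1–R7 in the Supplementary Material, the composite information ratio test statistic Qratio converges in distribution to $\chi^2_1$ under H0.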

In the simulation studies, the composite information ratio test tends to yield more accurate Type I errors than the composite information matrix test.
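A minimal sketch of Qratio with p = 2 follows. It assumes, purely for illustration, K components per observation that are each modelled as N(μ, v) through the independence likelihood with unit weights, so θ = (μ, v). The gradient of TR is approximated by central finite differences, which is a convenience adopted here rather than the derivation in the Supplementary Material; all function names are hypothetical.

```python
import numpy as np

def pieces(theta, y):
    """Scores, shape (n, K, 2), and Hessians, shape (n, K, 2, 2), of N(mu, v) margins."""
    mu, v = theta
    r = y - mu
    s = np.stack([r / v, -0.5 / v + r**2 / (2 * v**2)], axis=-1)
    h = np.empty(y.shape + (2, 2))
    h[..., 0, 0] = -1.0 / v
    h[..., 0, 1] = h[..., 1, 0] = -r / v**2
    h[..., 1, 1] = 0.5 / v**2 - r**2 / v**3
    return s, h

def trace_ratio(theta, y):
    """T_R(theta) = tr{I_c_hat(theta) H_hat(theta)^{-1}}."""
    s, h = pieces(theta, y)
    Ic = np.einsum("nkp,nkq->pq", s, s) / len(y)
    H = -h.sum(axis=(0, 1)) / len(y)
    return np.trace(Ic @ np.linalg.inv(H))

def ratio_test(y, eps=1e-5):
    n, p = len(y), 2
    # Maximizer of the independence likelihood: pooled mean and pooled variance.
    theta = np.array([y.mean(), ((y - y.mean()) ** 2).mean()])
    s, h = pieces(theta, y)
    H_inv = np.linalg.inv(-h.sum(axis=(0, 1)) / n)
    # Central finite-difference gradient of T_R at the estimate.
    grad = np.array([(trace_ratio(theta + eps * d, y) - trace_ratio(theta - eps * d, y)) / (2 * eps)
                     for d in np.eye(p)])
    term1 = np.einsum("nkp,pq,nkq->n", s, H_inv, s)   # sum_k score' H^{-1} score
    term2 = s.sum(axis=1) @ H_inv @ grad              # estimation effect of theta_hat
    term3 = np.einsum("nkpq,qp->n", h, H_inv)         # tr(sum_k Hessian_k H^{-1})
    V_R = np.mean((term1 + term2 + term3) ** 2)
    return n * (trace_ratio(theta, y) - p) ** 2 / V_R

rng = np.random.default_rng(3)
n = 2000
cov = 2.0 * np.array([[1.0, 0.5], [0.5, 1.0]])
y_null = rng.multivariate_normal([1.0, 1.0], cov, size=n)   # normal margins, correlated components
y_alt = rng.uniform(0.0, 1.0, size=(n, 2))                  # uniform margins, so H0 fails
q_null, q_alt = ratio_test(y_null), ratio_test(y_alt)
print(q_null, q_alt)   # compare with 3.841, the 0.95 quantile of chi-squared with 1 df
```

In this toy model TR tends to (κ + 1)/2, where κ is the kurtosis of the margins, so the test is in effect sensitive to non-normal kurtosis.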

2.5. Special case 3: composite information max test

Here we propose another test by considering the maximum discrepancy of the diagonal elements of the linear contrast between the observed composite information matrix and the sensitivity matrix. We call this test the composite information max test. This corresponds to the general test statistic Q in (2) with

$$d(M_1, M_2) = \| W (m_1 - m_2) \|_{\max},$$

where W is a p × p positive definite matrix used to standardize the test statistic, and $m_1 = \{(M_1)_{11}, \ldots, (M_1)_{pp}\}^{T}$ is a p × 1 vector.

Denote the diagonal elements of the linear contrast, contributed by the ith observation, by ejj(yi, θ), j = 1, … , p, and let e*(yi, θ) = {e11(yi, θ), … , epp(yi, θ)}T. Define $T_M(\hat\theta_n) = n^{-1} \sum_{i=1}^{n} e^{*}(y_i, \hat\theta_n)$. Denote by VM(θ) the asymptotic variance matrix of $n^{1/2} T_M(\hat\theta_n)$ and by $\hat V_M$ its empirical counterpart. Define $S = n^{1/2}\, \hat U_M T_M(\hat\theta_n)$, where Sj is the jth element of S, j = 1, … , p, and $\hat U_M$ is a p × p matrix such that $\hat U_M^{T} \hat U_M = \hat V_M^{-1}$. Our asymptotic results hold for any choice of $\hat U_M$. The composite information max test is given by

$$Q_{\max} = \max_{1 \leqslant j \leqslant p} |S_j|.$$

The following theorem establishes the asymptotic distribution of the composite information max test under the null hypothesis.

Theorem 3. Given the regularity conditions in the Supplementary Material, we have

$$Q_{\max} \to \max_{1 \leqslant j \leqslant p} |N_j|$$

in distribution under H0, where N1, … , Np are independent and identically distributed standard normal variables. In other words, for any t > 0, $\lim_{n \to \infty} |\operatorname{pr}(Q_{\max} \leqslant t) - F(t)^{p}| = 0$, where F(·) is the cumulative distribution function of the folded standard normal distribution.

By Theorem 3, we reject the null hypothesis at level α if $Q_{\max} > F^{-1}\{(1 - \alpha)^{1/p}\}$, where $F^{-1}(\cdot)$ is the inverse function of F(·) defined in Theorem 3; the exponent 1/p arises because Nj and Nk are independent standard normal random variables for any j ≠ k. In the construction of S, we transform $T_M(\hat\theta_n)$ by the matrix $\hat U_M$ so that the elements of S are decorrelated.
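The rejection rule can be sketched as follows. Since the folded standard normal has distribution function F(t) = 2Φ(t) − 1, the cut-off is Φ^{-1}[{1 + (1 − α)^{1/p}}/2]. The transpose of a Cholesky factor of the inverse variance estimate is used below as one admissible decorrelating matrix; this is a choice made here for illustration, and the numerical inputs are made up.

```python
from statistics import NormalDist

import numpy as np

def max_test_cutoff(alpha, p):
    """Cut-off F^{-1}{(1 - alpha)^{1/p}} with F(t) = 2 * Phi(t) - 1."""
    u = (1.0 - alpha) ** (1.0 / p)
    return NormalDist().inv_cdf((1.0 + u) / 2.0)

def max_statistic(T, V, n):
    """Q_max = max_j |S_j| with S = n^{1/2} U T and U'U = V^{-1}."""
    L = np.linalg.cholesky(np.linalg.inv(V))   # lower triangular, L @ L.T equals V^{-1}
    U = L.T                                    # hence U.T @ U equals V^{-1}
    S = np.sqrt(n) * U @ T
    return np.max(np.abs(S)), U

# Made-up inputs: a p = 2 contrast vector, its variance estimate and the sample size.
T = np.array([0.05, -0.08])
V = np.array([[2.0, 0.6], [0.6, 1.0]])
n = 400

q_max, U = max_statistic(T, V, n)
cutoff = max_test_cutoff(0.05, p=2)
print(q_max, cutoff, q_max > cutoff)
```

Because the limiting result holds for any matrix satisfying the factorization, a symmetric inverse square root of the variance estimate could be used in place of the Cholesky factor.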

The main difference among Qmax, Qmatrix and Qratio lies in how the individual statistics constructed from some components of the matrix equation (1) are combined. Specifically, the test statistic Qmax combines individual statistics by searching for their maximum across the diagonal, whereas Qmatrix and Qratio exploit different quadratic forms of individual statistics. In the Supplementary Material we investigate the asymptotic local power of the three composite information tests under a sequence of locally misspecified models (Copas & Eguchi, 2005).

3. Examples

3.1. Extended Mantel–Haenszel method for stratified case-control studies

The Mantel–Haenszel estimator is widely used to estimate the common odds ratio in a series of 2 × 2 tables in epidemiological studies. Liang (1987) extended the Mantel–Haenszel approach to logistic regression models for stratified case-control studies, to allow simultaneous estimation of effect sizes of multivariate risk factors.

Consider a stratified case-control study, where $x_{i1}, \ldots, x_{id_i}$ denote the p × 1 vectors of potential risk factors of $d_i$ cases, and let $x_{i,d_i+1}, \ldots, x_{ih_i}$ denote the potential risk factors of $m_i$ controls in the ith stratum, with $m_i = h_i - d_i$ and i = 1, … , n. A logistic regression model allowing for stratum-specific effects is

$$\operatorname{logit}\, \operatorname{pr}(y_{ij} = 1 \mid x_{ij}) = \alpha_i + \beta^{T} x_{ij} \quad (i = 1, \ldots, n), \tag{5}$$

where the coefficient β quantifies the effects of risk factors xij on the disease status yij.

Liang (1987) proposed a composite likelihood method where the nuisance parameters αi are eliminated by conditioning. Specifically, for the (j, l) case-control pair of subjects in the ith stratum (j = 1, … , di; l = di + 1, … , hi), the conditional probability that xij is from the case given that xij and xil are observed is

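$$\operatorname{pr}(y_{ij} = 1, y_{il} = 0 \mid y_{ij} + y_{il} = 1, x_{ij}, x_{il}; \alpha_i, \beta) = \frac{e^{\beta^{T} x_{ij}}}{e^{\beta^{T} x_{ij}} + e^{\beta^{T} x_{il}}}.$$

Thus, a composite likelihood can be formulated by considering all $d_i m_i$ pairs in the ith stratum,

$$L_i(\beta) = \prod_{j=1}^{d_i} \prod_{l=d_i+1}^{h_i} \frac{e^{\beta^{T} x_{ij}}}{e^{\beta^{T} x_{ij}} + e^{\beta^{T} x_{il}}}.$$

A weighted composite loglikelihood function for β, combining data from all n strata, is then constructed as

$$c\ell(\beta) = n^{-1} \sum_{i=1}^{n} \omega_i \log\{L_i(\beta)\} = n^{-1} \sum_{i=1}^{n} \omega_i \sum_{j=1}^{d_i} \sum_{l=d_i+1}^{h_i} \log\Bigl(\frac{e^{\beta^{T} x_{ij}}}{e^{\beta^{T} x_{ij}} + e^{\beta^{T} x_{il}}}\Bigr).$$

We consider the weights $\omega_i = h_i^{-1}$; with this choice the maximum composite likelihood estimator reduces to the Mantel–Haenszel estimator when only a binary covariate is considered (Liang, 1987). The composite information matrix and the sensitivity matrix are

$$I_c(\beta) = E\Bigl[h_i^{-1} \sum_{j=1}^{d_i} \sum_{l=d_i+1}^{h_i} \Bigl\{x_{ij} - \frac{x_{ij} \exp(\beta^{T} x_{ij}) + x_{il} \exp(\beta^{T} x_{il})}{\exp(\beta^{T} x_{ij}) + \exp(\beta^{T} x_{il})}\Bigr\}^{\otimes 2}\Bigr],$$

$$H(\beta) = E\Bigl[h_i^{-1} \sum_{j=1}^{d_i} \sum_{l=d_i+1}^{h_i} \Bigl( -\frac{\{x_{ij} \exp(\beta^{T} x_{ij}) + x_{il} \exp(\beta^{T} x_{il})\}^{\otimes 2}}{\{\exp(\beta^{T} x_{ij}) + \exp(\beta^{T} x_{il})\}^{2}} + \frac{x_{ij} x_{ij}^{T} \exp(\beta^{T} x_{ij}) + x_{il} x_{il}^{T} \exp(\beta^{T} x_{il})}{\exp(\beta^{T} x_{ij}) + \exp(\beta^{T} x_{il})} \Bigr)\Bigr].$$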

The proposed composite information tests can be constructed to test the specification of the model. The empirical performance of these tests is evaluated in § 4.
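The reduction to the Mantel–Haenszel estimator under the weights 1/hi can be checked numerically. With a single binary covariate, only pairs in which the case and the control differ in exposure contribute to the composite score, and solving the score equation gives exp(β) = Σ(ai di/hi) / Σ(bi ci/hi), where ai, bi are the exposed and unexposed case counts and ci, di the exposed and unexposed control counts (notation introduced here for the sketch; di in this ratio is a cell count, not the number of cases). The simulated strata and the bisection solver below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# 50 hypothetical strata, each with 5 cases and 5 controls and one binary exposure.
strata = [(rng.binomial(1, 0.6, size=5), rng.binomial(1, 0.3, size=5)) for _ in range(50)]

def weighted_score(beta, strata):
    """Derivative of the weighted composite loglikelihood (up to the factor 1/n)."""
    total = 0.0
    for x_case, x_ctrl in strata:
        h = len(x_case) + len(x_ctrl)
        xj, xl = x_case[:, None], x_ctrl[None, :]        # all case-control pairs by broadcasting
        ej, el = np.exp(beta * xj), np.exp(beta * xl)
        total += np.sum(xj - (xj * ej + xl * el) / (ej + el)) / h
    return total

# The score is decreasing in beta, so bisection locates its root.
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if weighted_score(mid, strata) > 0:
        lo = mid
    else:
        hi = mid
beta_hat = 0.5 * (lo + hi)

# Mantel-Haenszel common odds ratio computed from the 2 x 2 tables.
num = den = 0.0
for x_case, x_ctrl in strata:
    h = len(x_case) + len(x_ctrl)
    a, c = x_case.sum(), x_ctrl.sum()
    b, d = len(x_case) - a, len(x_ctrl) - c
    num += a * d / h
    den += b * c / h
mh = num / den

print(np.exp(beta_hat), mh)
```

The two printed numbers should agree up to the tolerance of the root finder.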

3.2. Undirected graphical model

The undirected graphical model has been widely used to describe the dependence structures of multivariate variables. For simplicity, most of the literature assumes only pairwise interactions in the joint distribution of the multivariate variables (Chen et al., 2015; Yang et al., 2015). Assume that there are p random variables Y1, … , Yp represented as nodes of the graph G = (V, E), with the vertex set V = {1, … , p} and the edge set $E \subseteq V \times V$. An edge between vertices i and j indicates that Yi and Yj are conditionally dependent given the remaining variables. The joint distribution of a pairwise graphical model corresponding to the graph G = (V, E) takes the form

$$\operatorname{pr}(y) = \exp\Bigl\{\sum_{s \in V} f_s(y_s; \alpha_s) + \frac{1}{2} \sum_{(s,t) \in E} \theta_{st} y_s y_t - \Phi(\Theta, \alpha)\Bigr\}, \tag{6}$$

where y = (y1, … , yp)T, Θ = (θst)p×p is a symmetric square matrix of parameters associated with edges, with the diagonal elements equal to zero, α = (α1, … , αp) is a matrix of parameters with the sth column αs involved in the node potential function fs(ys; αs), and Φ(Θ, α) is the log partition function. In many examples of exponential family graphical models, the log partition function is intractable. To circumvent this challenge, a composite likelihood function constructed by combining conditional likelihoods has been proposed by Xue et al. (2012), Chen et al. (2015) and Yang et al. (2015), among others. Specifically, denote y−s = (y1, … , ys−1, ys+1, … , yp)T. The conditional distribution is $\operatorname{pr}(y_s \mid y_{-s}) = \exp\{f_s(y_s; \alpha_s) + \sum_{t \neq s} \theta_{st} y_s y_t - D_s(\eta_s)\}$, where ηs is a function of αs, y−s and Θs, Θs is the sth column of Θ without the diagonal element, and Ds(·) is a function whose form depends on the conditional distribution.

As a concrete example, consider the following Ising graphical model. Let Y be a vector of binary variables, taking values {−1, 1}. Assume that Y satisfies the joint distribution (6) with $f_s(y_s; \alpha_s) = \alpha_{1s} y_s$. Given n independent and identically distributed copies Y(i) of Y, the composite loglikelihood function is $c\ell(\Theta, \alpha) = n^{-1} \sum_{i=1}^{n} \sum_{s=1}^{p} \ell_s^{(i)}(\Theta, \alpha)$, where

$$\ell_s^{(i)}(\Theta, \alpha) = \alpha_{1s} y_s^{(i)} + \sum_{t \neq s} \theta_{st} y_s^{(i)} y_t^{(i)} - D_s\Bigl(\alpha_{1s} + \sum_{t \neq s} \theta_{st} y_t^{(i)}\Bigr),$$

with Ds(η) = log{exp(η) + exp(−η)}. The composite information and sensitivity matrices are $I_c(\Theta, \alpha) = \sum_{s=1}^{p} E\{\partial \ell_s^{(i)}(\Theta, \alpha) / \partial(\Theta, \alpha)\}^{\otimes 2}$ and $H(\Theta, \alpha) = -\sum_{s=1}^{p} E\{\partial^2 \ell_s^{(i)}(\Theta, \alpha) / \partial(\Theta, \alpha)^2\}$. The proposed composite information tests can be used to test the specification of the composite likelihood for the Ising model.
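The conditional loglikelihood components of the Ising model can be checked against the joint distribution by brute force on a small graph. The sketch below assumes that the edge sum in (6) runs over ordered pairs, so that the interaction term equals yᵀΘy/2 for symmetric Θ with zero diagonal; the parameter values are random and purely illustrative, and the function names are hypothetical.

```python
import itertools

import numpy as np

def conditional_loglik(y, Theta, alpha1):
    """Vector of l_s for s = 1, ..., p at one configuration y in {-1, 1}^p."""
    eta = alpha1 + Theta @ y                      # alpha_1s + sum_{t != s} theta_st y_t
    return y * eta - np.log(np.exp(eta) + np.exp(-eta))

def joint_unnormalised(y, Theta, alpha1):
    """exp of the exponent in (6) without the log partition function."""
    return np.exp(alpha1 @ y + 0.5 * y @ Theta @ y)

rng = np.random.default_rng(4)
p = 4
A = rng.normal(scale=0.5, size=(p, p))
Theta = np.triu(A, 1) + np.triu(A, 1).T           # symmetric with zero diagonal
alpha1 = rng.normal(scale=0.5, size=p)

max_gap = 0.0
for config in itertools.product([-1.0, 1.0], repeat=p):
    y = np.array(config)
    ell = conditional_loglik(y, Theta, alpha1)
    for s in range(p):
        y_flip = y.copy()
        y_flip[s] = -y[s]
        num = joint_unnormalised(y, Theta, alpha1)
        # Conditional probability of y_s given the rest, obtained from the joint model.
        brute = num / (num + joint_unnormalised(y_flip, Theta, alpha1))
        max_gap = max(max_gap, abs(np.exp(ell[s]) - brute))

print(max_gap)
```

The intractable log partition function cancels in the ratio, which is why the conditional components can be evaluated without it.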

4. Simulation

We consider the example in § 3.1 with two scenarios: when the specified composite likelihood model correctly includes all the predictors, and when a predictor is falsely excluded.

We simulate data using (5) with three continuous predictors, x1, x2 and x3. The first predictor x1 is generated from the uniform distribution on (0, 5). The second and third predictors, x2 and x3, are the quadratic and cubic terms of x1, i.e., $x_2 = x_1^2$ and $x_3 = x_1^3$, respectively. We set β1 and β2 to 2 and −0.5, respectively, and let β3 decrease from 0 to −0.5 to evaluate the size and power of the proposed tests. The stratum-specific intercepts αi are generated from a uniform distribution on (−2.5, −2). We calculate the individual probability of Yij = 1 using (5) and generate the binary response from the corresponding Bernoulli distribution. We randomly sample five cases (y = 1) and five controls (y = 0) from each stratum. Assume that only two predictors x1 and x2 are available and included in the composite likelihood method described in § 3.1. We apply the proposed tests of model specification for the composite likelihood with only x1 and x2. In practice, we cannot use the classical composite likelihood based tests for β3 = 0, since we assume the value of x3 is unobserved. For each of the scenarios, we consider n = 200, 400 and 600 strata.
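The data-generating step can be sketched as follows. The text does not state how many individuals are generated per stratum before the five cases and five controls are drawn, so the pool size of 200 and the redraw when a pool contains too few cases or controls are assumptions made here, as are the function and argument names.

```python
import numpy as np

def simulate_stratum(rng, beta, pool=200, n_case=5, n_ctrl=5):
    """Return the (x1, x2) covariates of sampled cases and controls for one stratum."""
    alpha = rng.uniform(-2.5, -2.0)                   # stratum-specific intercept
    while True:
        x1 = rng.uniform(0.0, 5.0, size=pool)
        X = np.column_stack([x1, x1**2, x1**3])       # x2 and x3 are powers of x1
        prob = 1.0 / (1.0 + np.exp(-(alpha + X @ beta)))
        y = rng.binomial(1, prob)
        cases, ctrls = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
        if len(cases) >= n_case and len(ctrls) >= n_ctrl:
            break                                     # assumed rule: redraw the pool otherwise
    keep_case = rng.choice(cases, n_case, replace=False)
    keep_ctrl = rng.choice(ctrls, n_ctrl, replace=False)
    # Only x1 and x2 are passed to the analysis; x3 is treated as unobserved.
    return X[keep_case, :2], X[keep_ctrl, :2]

rng = np.random.default_rng(5)
beta = np.array([2.0, -0.5, -0.25])                   # beta3 = -0.25 is one of the settings
strata = [simulate_stratum(rng, beta) for _ in range(200)]
print(len(strata), strata[0][0].shape, strata[0][1].shape)
```

Each element of the resulting list holds the covariates of the five cases and the five controls of one stratum, in the form needed by the composite loglikelihood of § 3.1.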

Table 1 summarizes the empirical Type I error and power of the proposed tests at nominal levels of 5% and 10% based on 5000 replicates. The composite information ratio test controls Type I error reasonably well at both nominal levels. The composite information max test is slightly liberal at the 5% nominal level when n is small but the Type I error tends to be more accurate as n increases. The composite information matrix test is relatively conservative, which agrees with the literature on the standard information matrix test.

Table 1. Empirical rejection rates (%) in 5000 simulations of three proposed tests for H0, under varying numbers of strata n = 200, 400 and 600, and log odds ratios β1 = 2, β2 = −0.5, and β3 varying from 0 to −0.5

| n | β3 | Matrix test (α = 0.05) | Ratio test (α = 0.05) | Max test (α = 0.05) | Matrix test (α = 0.10) | Ratio test (α = 0.10) | Max test (α = 0.10) |
|---|---|---|---|---|---|---|---|
| 200 | 0.00 | 3.6 | 4.9 | 7.8 | 5.8 | 9.7 | 11.4 |
| 200 | −0.10 | 19.2 | 16.6 | 27.9 | 25.6 | 24.1 | 35.5 |
| 200 | −0.25 | 29.2 | 24.2 | 39.8 | 37.3 | 32.0 | 47.6 |
| 200 | −0.50 | 33.6 | 27.1 | 43.8 | 41.7 | 35.9 | 51.4 |
| 400 | 0.00 | 3.4 | 4.6 | 6.9 | 5.3 | 9.5 | 10.3 |
| 400 | −0.10 | 29.8 | 21.5 | 41.3 | 38.1 | 30.3 | 50.0 |
| 400 | −0.25 | 47.5 | 32.3 | 58.9 | 56.2 | 41.1 | 67.0 |
| 400 | −0.50 | 53.9 | 37.5 | 63.8 | 61.9 | 46.9 | 71.2 |
| 600 | 0.00 | 4.1 | 5.2 | 6.4 | 6.5 | 10.2 | 9.9 |
| 600 | −0.10 | 43.4 | 28.6 | 55.4 | 52.2 | 37.7 | 63.8 |
| 600 | −0.25 | 62.9 | 40.5 | 73.6 | 71.3 | 50.5 | 80.9 |
| 600 | −0.50 | 69.5 | 47.2 | 78.9 | 77.0 | 57.1 | 84.8 |

As β3 decreases from 0 to −0.5, so that the omitted effect grows in magnitude, all three tests show increasing power. The composite information max test has the largest power among the three tests. This gain in power may be due to the fact that it detects model misspecification by searching for the maximum discrepancy between the two matrices, and is more sensitive than the other two tests, which average the discrepancies. In our simulations, the composite information ratio test has relatively less power than the other two tests.

In the Supplementary Material we conduct additional simulation studies in four different scenarios, including the Ising model described in § 3.2, the independence likelihood for correlated data and the pairwise likelihood for multivariate outcomes. We find that the empirical Type I error rates of the proposed composite information tests are effectively controlled below the nominal levels and all three tests have good power for detecting model misspecifications. The matrix and max tests perform similarly in all examples, and their empirical Type I error rates converge to the nominal level more slowly than the ratio test, especially in the Ising model example. We find that the power of the three tests depends on the underlying data-generating process. The information ratio test has relatively higher statistical power in two of the four scenarios. We refer to the Supplementary Material for further discussion on the choice of the tests.

5. Discussion

In applications with time series, data are no longer independent and our results do not apply. Davis & Yau (2011) proposed a pairwise likelihood approach for a broad class of linear time series models. A future research direction is to generalize the proposed tests to evaluate the specification of a composite likelihood with dependent data.

In this paper we assumed that the total number of events K was fixed and the sample size n increased. Although the constructions of the new composite information matrix and the new identity in (1) do not require such an assumption, derivations of the limiting distributions of the three proposed tests do assume that the maximum composite likelihood estimator $\hat\theta_n$ is a consistent estimator of θ with $n^{1/2}$ convergence rate under the null hypothesis. Such an assumption may not hold in some scenarios where the number of components also increases as the number of independent replicates grows. For example, Cox & Reid (2004) discussed the situation where a small number of individually large sequences is available for pairwise likelihood; in that situation the maximum composite likelihood estimator may not be consistent, or the convergence rate may be very slow as the number of components increases, depending on the internal correlation among the components. In such scenarios, the limiting distribution of the proposed test needs to be derived differently.

The validity of inference based on composite likelihoods usually requires both the correct specification of lower-dimensional conditional or marginal models and the existence of a joint model corresponding to the assumed composite likelihood. The latter is known as the compatibility of composite likelihoods; see Yi (2014) for further discussion. The proposed tests aim to investigate the specification of lower-dimensional models in composite likelihoods assuming the compatibility condition holds.

The information test (White, 1982) can be viewed as an omnibus test, whereas the proposed tests are tailored for the model specification required in the composite likelihood function. Thus, our tests can be more powerful than the existing information test based on the full likelihood, when the composite likelihood function is misspecified. If our tests indicate misspecification, it would be of interest to understand what modelling assumptions are violated. A future research direction is to design goodness-of-fit tests to detect specific departures from the assumed composite likelihood.

Acknowledgement

The authors thank the referees, associate editor and editor for comments that substantially improved this work.

Footnotes

Supplementary material

Supplementary material available at Biometrika online includes regularity conditions, proofs of Theorems 1–3, analysis of local asymptotic power and three additional simulation studies, including the Ising model described in § 3.2, inference using the independence likelihood for dependent data, and inference using pairwise likelihood for multivariate outcomes.

Contributor Information

JING HUANG, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.

YANG NING, Department of Statistics and Data Science, Cornell University, Comstock Hall 1188, Ithaca, New York 14853, U.S.A.

NANCY REID, Department of Statistical Sciences, University of Toronto, Toronto M5S 3GS, Canada.

YONG CHEN, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, U.S.A.

References

  1. Besag J (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Statist. Soc. B 36, 192–239.
  2. Chandler R & Bate S (2007). Inference for clustered data using the independence loglikelihood. Biometrika 94, 167–83.
  3. Chen S, Witten DM & Ali S (2015). Selection and estimation for mixed graphical models. Biometrika 102, 47–64.
  4. Copas J & Eguchi S (2005). Local model uncertainty and incomplete-data bias (with discussion). J. R. Statist. Soc. B 67, 459–513.
  5. Cox D & Reid N (2004). A note on pseudolikelihood constructed from marginal densities. Biometrika 91, 729–37.
  6. Davis RA & Yau CY (2011). Comments on pairwise likelihood in time series models. Statist. Sinica 21, 255–77.
  7. Guan Y (2006). A composite likelihood approach in fitting spatial point process models. J. Am. Statist. Assoc. 101, 1502–12.
  8. He W & Yi G (2011). A pairwise likelihood method for correlated binary data with/without missing observations under generalized partially linear single-index models. Statist. Sinica 21, 207–29.
  9. Heagerty P & Lele S (1998). A composite likelihood approach to binary spatial data. J. Am. Statist. Assoc. 93, 1099–111.
  10. Kent J (1982). Robust properties of likelihood ratio tests. Biometrika 69, 19–27.
  11. Liang K (1987). Extended Mantel–Haenszel estimating procedure for multivariate logistic regression models. Biometrics 43, 289–99.
  12. Liang K & Zeger S (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
  13. Lindsay B (1988). Composite likelihood methods. Contemp. Math. 80, 221–39.
  14. Ma TF & Yau CY (2016). A pairwise likelihood-based approach for changepoint detection in multivariate time series models. Biometrika 103, 409–21.
  15. Molenberghs G & Verbeke G (2005). Models for Discrete Longitudinal Data. New York: Springer.
  16. Ogden HE (2016). A caveat on the robustness of composite likelihood estimators: the case of a mis-specified random effect distribution. Statist. Sinica 26, 639–51.
  17. Pagui K, Clovis E, Salvan A & Sartori N (2014). Combined composite likelihood. Can. J. Statist. 42, 525–43.
  18. Presnell B & Boos DD (2004). The IOS test for model misspecification. J. Am. Statist. Assoc. 99, 216–27.
  19. Varin C, Reid N & Firth D (2011). An overview of composite likelihood methods. Statist. Sinica 21, 5–42.
  20. Wellner J & Zhang Y (2007). Two likelihood-based semiparametric estimation methods for panel count data with covariates. Ann. Statist. 35, 2106–42.
  21. White H (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
  22. Xu X & Reid N (2011). On the robustness of maximum composite likelihood estimate. J. Statist. Plan. Infer. 141, 3047–54.
  23. Xue L, Zou H & Cai T (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. Ann. Statist. 40, 1403–29.
  24. Yang E, Ravikumar P, Allen GI & Liu Z (2015). Graphical models via univariate exponential family distributions. J. Mach. Learn. Res. 16, 3813–47.
  25. Yi GY (2014). Composite likelihood/pseudolikelihood. In Wiley StatsRef: Statistics Reference Online, Ed. Balakrishnan N, Colton T, Everitt B, Piegorsch W, Ruggeri F and Teugels JL.
  26. Zhou QM, Song PX-K & Thompson ME (2012). Information ratio test for model misspecification in quasi-likelihood inference. J. Am. Statist. Assoc. 107, 205–13.
