Summary
The large number of parameters and the positive-definiteness constraint are two major obstacles in estimating and modelling a correlation matrix for longitudinal data. In addition, when longitudinal data are incomplete, incorrectly modelling the correlation matrix often results in biased estimates of the mean regression parameters. In this paper, we introduce a flexible and parsimonious class of regression models for a covariance matrix parameterized via the marginal variances and partial autocorrelations. The partial autocorrelations can vary freely in the interval (-1, 1) while maintaining positive definiteness of the correlation matrix, so the regression parameters in these models are unconstrained. We propose a class of priors for the regression coefficients and, via simulations, examine the importance of correctly modeling the correlation structure for estimating longitudinal (mean) trajectories, as well as the performance of the DIC in choosing the correct correlation model. The regression approach is illustrated on data from a longitudinal clinical trial.
Keywords: Markov Chain Monte Carlo, Generalized linear model, Uniform prior
1 Introduction
Longitudinal data, repeated measurements on the same subjects over time, arise in many areas, from clinical trials to environmental studies. To draw valid inferences in such studies, the covariance between repeated observations on the same individual needs to be modelled properly. Moreover, with incomplete longitudinal data, mis-modeling the covariance matrix can result in biased estimates of the (fixed effect) mean parameters (Little and Rubin, 2002; Daniels and Hogan, 2008). Two major obstacles for modeling covariance matrices are 1) the number of parameters and 2) positive definiteness.
Many approaches have been proposed for estimating a covariance matrix more efficiently, whether by shrinking eigenvalues to obtain more stability (Yang and Berger, 1994; Efron and Morris, 1976) or by reducing the dimension via structure (Leonard and Hsu, 1992; Chiu et al., 1996; Pourahmadi, 1999, 2000; Daniels and Zhao, 2003). Some regression approaches that introduce structure and/or unit-specific covariates (Chiu et al., 1996; Hoff and Niu, 2012) can make the regression coefficients hard to interpret due to complex (but unconstrained) parameterizations; this is not our focus here, and neither the parameterization in Pourahmadi (1999) nor the one given here suffers from this issue. There has also been research on shrinkage to introduce stability, either in structured ways (Daniels and Kass, 1999, 2001; Daniels and Pourahmadi, 2002) or without structure (Wong et al., 2003; Liechty et al., 2004).
These approaches can often be viewed in terms of specific decompositions of a covariance matrix. Our approach focuses on the variance/correlation decomposition, used recently by Barnard, McCulloch and Meng (2000), which decomposes the covariance matrix as Σ = DRD, where R is the correlation matrix and D is a diagonal matrix of standard deviations. Our approach relies on this decomposition together with a further decomposition of the correlation matrix R into partial autocorrelations, which we review next.
Consider a p × p correlation matrix R with (j, j + k)th element the marginal correlation ρ_{j,j+k} ≡ Cor(Y_j, Y_{j+k}). The matrix R can be re-parameterized using the partial autocorrelations, π_{j,j+k} ≡ Cor(Y_j, Y_{j+k} | Y_l, j < l < j + k). To express the partial autocorrelations as a function of the marginal correlations, we first define some notation (similar to Joe (2006)).
Let R(j, j+k) be the (k + 1) × (k + 1) submatrix of R formed from rows j through j + k and columns j through j + k. We partition R(j, j+k) as follows,

$$ R(j, j+k) = \begin{pmatrix} 1 & r_1^T(j,k) & \rho_{j,j+k} \\ r_1(j,k) & R_2(j,k) & r_3(j,k) \\ \rho_{j,j+k} & r_3^T(j,k) & 1 \end{pmatrix}, $$

where r_1(j, k) = (ρ_{j,j+1}, …, ρ_{j,j+k-1})^T, r_3(j, k) = (ρ_{j+k,j+1}, …, ρ_{j+k,j+k-1})^T, and R_2(j, k) contains the middle k - 1 rows and columns of R(j, j+k).
The partial autocorrelations have the following form as a function of the marginal correlations,

$$ \pi_{j,j+k} = \frac{\rho_{j,j+k} - r_1^T(j,k)\, R_2^{-1}(j,k)\, r_3(j,k)}{\sqrt{D_1(j,k)\, D_3(j,k)}}, \qquad (1) $$

where D_1(j,k) = 1 - r_1^T(j,k) R_2^{-1}(j,k) r_1(j,k) and D_3(j,k) = 1 - r_3^T(j,k) R_2^{-1}(j,k) r_3(j,k). The marginal correlations ρ_{j,j+k} can in turn be written as a simple function of the partial autocorrelations,

$$ \rho_{j,j+k} = r_1^T(j,k)\, R_2^{-1}(j,k)\, r_3(j,k) + \pi_{j,j+k} \sqrt{D_1(j,k)\, D_3(j,k)}. $$
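To make the mapping concrete, the following sketch (Python with numpy; the function name and code are ours, not part of the paper) converts an arbitrary matrix of partial autocorrelations into the implied correlation matrix using the recursion above; any values in (-1, 1) yield a positive-definite R.

```python
import numpy as np

def pac_to_corr(Pi):
    """Map a matrix of partial autocorrelations (upper triangle of Pi)
    to the implied correlation matrix R via
    rho_{j,j+k} = r1' R2^{-1} r3 + pi_{j,j+k} * sqrt(D1 * D3)."""
    p = Pi.shape[0]
    R = np.eye(p)
    for j in range(p - 1):                 # lag-1 partials = lag-1 marginals
        R[j, j + 1] = R[j + 1, j] = Pi[j, j + 1]
    for k in range(2, p):                  # lag
        for j in range(p - k):             # starting time (0-based)
            idx = np.arange(j + 1, j + k)  # intermediate variables
            R2 = R[np.ix_(idx, idx)]
            r1, r3 = R[j, idx], R[j + k, idx]
            b = np.linalg.solve(R2, r3)
            D1 = 1.0 - r1 @ np.linalg.solve(R2, r1)
            D3 = 1.0 - r3 @ b
            rho = r1 @ b + Pi[j, j + k] * np.sqrt(D1 * D3)
            R[j, j + k] = R[j + k, j] = rho
    return R

# Any partial autocorrelations in (-1, 1) give a positive-definite R:
rng = np.random.default_rng(0)
Pi = np.eye(4)
Pi[np.triu_indices(4, 1)] = rng.uniform(-1, 1, 6)
assert np.all(np.linalg.eigvalsh(pac_to_corr(Pi)) > 0)
```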
One of the advantages of this parameterization is that the π_{jk} can vary independently in (-1, 1) while maintaining positive definiteness of R, unlike the ρ_{jk} (see, e.g., Joe, 2006). Based on reparameterizing the marginal correlations into partial autocorrelations, Daniels and Pourahmadi (2009) introduced a prior for R induced by independent uniform priors on the partial autocorrelations, i.e., p(π) = 2^{-p(p-1)/2}.
After reparameterizing the off-diagonal elements of R = (ρ_{jk}) in terms of the partial autocorrelations {π_{jk}}, we transform the latter to the entire real line using Fisher's z-transformation. Moving from the constrained marginal correlations to the unconstrained transformed partial autocorrelations provides a link-function framework similar to the theory of generalized linear models (McCullagh and Nelder, 1989). The models proposed here extend recent models from the literature for correlation matrices, including the multivariate probit (Czado, 2000) and related models (Daniels and Normand, 2006).
This article expands on previous work by describing how to do Bayesian inference in the proposed models, including an appropriate choice of priors. Under an improper prior on the mean parameters and monotone (ignorable) missingness, we provide conditions under which the posterior is proper. We also provide simulation results that demonstrate the importance of specifying the correct correlation structure under ignorable missingness and evaluate the ability of the DIC to select the correct model in these situations.
This article is arranged as follows. In Section 2, we introduce regression models for the partial autocorrelations and marginal variances. We derive and investigate priors for the regression parameters for the partial autocorrelation and marginal variance parameters in Section 3. We provide details on posterior computations in Section 4. Results of a simulation study to investigate correlation structure misspecification are given in Section 5. Application of the models to a schizophrenia clinical trial is presented in Section 6. Section 7 provides conclusions and extensions.
2 Models for the covariance matrix
Let Y_i, i = 1, . . . , n, be p × 1 vectors of longitudinal responses measured (without loss of generality) at times 1, . . . , p, with distribution

$$ Y_i \sim N_p(x_i^T \beta, \Sigma_i), \qquad (2) $$

where β is a p_β × 1 vector of (mean) regression parameters, x_i is a p_β × p covariate matrix, and Σ_i = D_i R_i D_i. We build regression models for R_i via the partial autocorrelations and for D_i via the marginal variances in the following subsections.
2.1 Partial autocorrelations
Consider the following regression model for π_{i,jk}, the jk-th partial autocorrelation for subject i,

$$ z(\pi_{i,jk}) = w_{i,jk}\, \gamma, \qquad (3) $$

where z(·) is Fisher's z-transform, z(π) = ½ log{(1 + π)/(1 - π)}, w_{i,jk} is a 1 × q vector of covariates used to model structure and subject-level covariates, and γ is unconstrained in q-dimensional real space ℝ^q. Given that the partial autocorrelations are correlations between longitudinal observations, conditional on the intermediate ones, we might expect the higher order ones to be zero. For example, we might specify w_{i,jk} = I(k - j = 1), corresponding to an AR(1)-type structure with all partial autocorrelations of lag greater than one equal to zero. The design vector w_{i,jk} = (1, k - j) implies that the corresponding partial autocorrelation matrix, which has 1's on the main diagonal and (j, k)th element π_{jk} (for j ≠ k), has a Toeplitz form, with the z-transform of the elements on each subdiagonal having a linear relationship in lag. The design vector w_{i,jk} = (I(k - j = 1), j I(k - j = 1), I(k - j > 1), (k - j) I(k - j > 1)) implies that the vector of π's has a nonstationary structure, with the z-transform of the lag-one correlations a linear function of time and a stationary form for the rest of the matrix, with the z-transform of the elements for each lag (after the first) having a linear relationship in lag. For related structures for the parameters of the modified Cholesky decomposition, see Pourahmadi (1999) and Pourahmadi and Daniels (2002).
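As an illustration of these design choices, a small sketch (with hypothetical γ values) builds the Toeplitz-type design w_{jk} = (1, k - j) and maps the resulting z's back to a correlation matrix using the pac_to_corr function from the sketch in Section 1:

```python
import numpy as np

# Toeplitz-type model z(pi_jk) = gamma_1 + gamma_2 (k - j) for p = 6;
# gamma values are hypothetical. Row order matches
# z = (z_12, ..., z_1p, z_23, ..., z_{p-1,p}).
p = 6
pairs = [(j, k) for j in range(p) for k in range(j + 1, p)]
w = np.array([[1.0, k - j] for j, k in pairs])   # T x 2 design, T = 15
gamma = np.array([1.2, -0.6])                    # hypothetical coefficients
z = w @ gamma                                    # transformed partials
Pi = np.eye(p)
for (j, k), zjk in zip(pairs, z):
    Pi[j, k] = Pi[k, j] = np.tanh(zjk)           # inverse Fisher z-transform
R = pac_to_corr(Pi)   # pac_to_corr from the earlier sketch; R is PD
```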
2.2 Marginal Variances
We assume the logarithms of the marginal standard deviations σ_{i,j} (i.e., the jth diagonal element of D_i) follow the regression models

$$ \log(\sigma_{i,j}) = A_{i,j}\, \eta, \qquad (4) $$

where A_{i,j} is a 1 × q_0 vector of covariates used to model structure and unit-level covariates. For example, A_{i,j} = (I(j = 1), I(j > 1)) induces a structure of equal variances except at time 1; A_{i,j} = (1, j) corresponds to marginal variances that are log-linear in time. Verbyla (1993) proposed models for the marginal (residual) variances in terms of unit-level covariates (i.e., heterogeneity) in the setting of independent responses.
3 Priors for γ
Standard diffuse priors for γ in (3), e.g., an improper uniform prior on ℝ^q or a diffuse normal prior, result in most of the mass for the partial autocorrelations π_{i,jk} being placed near -1 and +1. This happens in many settings with diffuse priors on transformed spaces, e.g., for coefficients in logistic regression (see Agresti and Hitchcock, 2005). These are not sensible prior beliefs. In the next two subsections, we review the prior for the partial autocorrelations proposed in Daniels and Pourahmadi (2009) and propose an alternative; both avoid this behavior for an unstructured vector of π's. We then propose a way to use these priors, both of which are within the class of independent transformed Beta priors, to construct priors for γ, and point out their connection to g-priors. We also construct a similar prior for η in (4).
3.1 Review of priors for unstructured partial autocorrelations
Independent uniform priors on the partial autocorrelations can be expressed as independent Beta priors on the interval (-1, 1), Beta_{(-1,1)}(a, b), with parameters a = 1 and b = 1. These priors induce desirable behavior for longitudinal (ordered) data by shrinking higher-lag marginal correlations toward zero (Daniels and Pourahmadi, 2009), a priori favoring the serial correlation often seen in longitudinal data (Munoz et al., 1992). The behavior can be understood by examining the determinant of the Jacobian of the transformation from ρ to π (Joe, 2006),

$$ J(\rho \rightarrow \pi) = \prod_{k=1}^{p-2} \prod_{j=1}^{p-k} \left(1 - \pi_{j,j+k}^2\right)^{(p-1-k)/2}. $$

As the lag k increases, the induced marginal prior for ρ_{j,j+k} places more mass toward zero. This is not surprising, since most priors on R = (ρ_{jk}) do not use information on the potential ordering of the responses and would induce this prior form on the partial correlations to obtain identical marginal priors for the marginal correlations (Barnard et al., 2000). However, the Beta_{(-1,1)}(1, 1) prior does not favor the positive correlations typically seen in longitudinal data.
3.2 An alternative prior for unstructured partial autocorrelations
Here we introduce a prior on the partial autocorrelations that favors positive correlations over negative ones. We propose independent priors on the π_{jk} with pdf f(π_{jk}) = (1 + π_{jk})/2 on (-1, 1), i.e., a Beta_{(-1,1)}(2, 1) distribution; we refer to these as triangular priors, given their shape. The implied marginal priors for the ρ_{jk} are given in Figure 1a; they place decreasing mass close to 1 as the lag |j - k| increases. This is consistent with the serial correlation often seen in longitudinal data and places more mass on positive correlations than the Beta_{(-1,1)}(1, 1) priors.
Figure 1.
Marginal priors for the marginal correlations induced by (a) the triangular prior and (b) the uniform prior. In (b), the upper triangle shows the marginal priors on the ρ's induced by the original priors; the lower triangle shows the marginal priors on the ρ's after the normal approximation.
In the following section, we use these two priors as a starting point for constructing a prior on the regression coefficients γ in (3). In the remainder, all Beta priors are specified on the interval (-1, 1), but we denote them simply as Beta(a, b).
3.3 Proposed prior on γ
In the following, we drop the subscripts on π_{jk} when they are not needed. We start by deriving the distribution of z(π_{jk}) when the π_{jk} follow independent Beta(1, 1) priors. For this prior on π, z = z(π) = ½ log{(1 + π)/(1 - π)} has pdf

$$ f(z) = \frac{2 e^{2z}}{(1 + e^{2z})^2}, $$

where z ∈ (-∞, +∞). This is the pdf of a logistic distribution, z ~ logistic(0, ½), with variance π²/12. It is well known that the logistic distribution can be approximated by a t-distribution (Albert and Chib, 1993). However, the convenient construction of the multivariate t-distribution as a gamma mixture of normals has t marginals that are not independent, as required by our original specification of independent Betas. As a result, we use a normal approximation to the logistic distribution, whose multivariate version does have independent marginals: z(π_{jk}) ~ N(0, π²/12); that is, the random vector z = (z(π_{12}), . . . , z(π_{p-1,p}))^T ~ N(0, σ²I), where σ² = π²/12.
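A quick numerical check of this approximation (a sketch using scipy, not the paper's code): the Fisher z-transform of uniform draws has variance π²/12 ≈ 0.822, and the logistic(0, ½) and N(0, π²/12) densities are close over the relevant range.

```python
import numpy as np
from scipy import stats

# Fisher's z of Uniform(-1, 1) draws is logistic(0, 1/2); compare its
# variance and density with the N(0, pi^2/12) approximation.
rng = np.random.default_rng(1)
z = np.arctanh(rng.uniform(-1, 1, 100_000))
print(z.var(), np.pi ** 2 / 12)                         # both ~ 0.822
grid = np.linspace(-3, 3, 7)
print(stats.logistic.pdf(grid, scale=0.5))              # exact density
print(stats.norm.pdf(grid, scale=np.pi / np.sqrt(12)))  # normal approximation
```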
Figure 1b shows how well the normal prior approximates the original uniform prior on the hypercube (i.e., the independent Beta(1,1) priors) in terms of the marginal correlations. The upper triangular elements represent the marginal priors of ρjk from the original uniform prior and the lower triangular elements represent the marginal priors of ρjk from the prior based on the normal approximation. The approximate prior appears to behave sufficiently similarly.
Now we show how this prior can be used to construct a prior for γ in (3). We first focus on the case z(π_{i,jk}) = z(π_{jk}) (no unit-specific covariates) and, for ease of notation, let z_{jk} = z(π_{jk}) and z = (z_{12}, . . . , z_{1p}, z_{23}, . . . , z_{2p}, . . . , z_{p-1,p})^T, which has length T = p(p - 1)/2. Consider the full rank linear transformation

$$ z = w\gamma, \qquad w = (w^*, w^{\perp}), \qquad \gamma = \left( (\gamma^*)^T, (\gamma^{\perp})^T \right)^T, $$

where w* is a T × q (T ≥ q) full column rank matrix corresponding to the regression in (3), and w^⊥ is a T × (T - q) full column rank matrix such that (w*)^T w^⊥ = 0_{q×(T-q)} and (w^⊥)^T w^⊥ = I_{(T-q)×(T-q)}.
Therefore

$$ \gamma = w^{-1} z = (w^T w)^{-1} w^T z, $$

with

$$ w^T w = \begin{pmatrix} (w^*)^T w^* & 0 \\ 0 & I_{(T-q)\times(T-q)} \end{pmatrix}. $$

We define E(z) = μ 1_{T×1} and Var(z) = σ² I_{T×T} based on the multivariate normal prior on z. Under the Beta(1, 1) prior on π, μ = 0 and σ² = π²/12; under the Beta(2, 1) (triangular) prior, μ = ½ and σ² = 0.5722. The corresponding prior for γ is multivariate normal with mean and variance

$$ E(\gamma) = \mu\, (w^T w)^{-1} w^T 1_{T\times 1} $$

and

$$ Var(\gamma) = \sigma^2 (w^T w)^{-1}. $$
The resulting prior for γ* is also multivariate normal, with expectation μ{(w*)^T w*}^{-1}(w*)^T 1_{T×1} and variance σ²{(w*)^T w*}^{-1}. The dimension reduction from z to γ* results in the prior variance being too small. To see this, note that the ith component of z in z = wγ has variance

$$ var(z_i(\gamma)) = \sigma^2 \left[ w (w^T w)^{-1} w^T \right]_{ii} = \sigma^2, $$

since w(w^T w)^{-1} w^T = I_{T×T}. The ith component of z = w*γ* has variance

$$ var(z_i(\gamma^*)) = \sigma^2 \left[ w^* \{(w^*)^T w^*\}^{-1} (w^*)^T \right]_{ii}. $$

Clearly, var(z_i(γ)) ≥ var(z_i(γ*)). It is easy to adjust for this by noting that the average variance of the z_i(γ*) is

$$ \frac{1}{T} \sum_{i=1}^{T} var(z_i(\gamma^*)) = \frac{\sigma^2}{T}\, tr\left[ w^* \{(w^*)^T w^*\}^{-1} (w^*)^T \right] = \frac{q}{T}\, \sigma^2, $$

where σ² is the desired variance. Hence we can inflate var(γ*) by a factor of T/q. The resulting prior for γ* is

$$ \gamma^* \sim N\!\left( \mu \{(w^*)^T w^*\}^{-1} (w^*)^T 1_{T\times 1},\; \frac{T}{q}\, \sigma^2 \{(w^*)^T w^*\}^{-1} \right). $$
See the supplementary materials for additional details on the above derivations.
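A minimal sketch of this construction, assuming a design matrix w* as in Section 2.1 (the function and variable names below are ours, not the paper's):

```python
import numpy as np

def gamma_star_prior(w_star, mu=0.0, sigma2=np.pi ** 2 / 12):
    """Prior mean and (inflated) covariance for gamma*; the defaults give
    the Beta(1,1) weights, and mu=0.5, sigma2=0.5722 give the triangular."""
    T, q = w_star.shape
    wtw_inv = np.linalg.inv(w_star.T @ w_star)
    mean = mu * wtw_inv @ w_star.T @ np.ones(T)
    cov = (T / q) * sigma2 * wtw_inv        # variance inflated by T/q
    return mean, cov

# Example: Toeplitz design from Section 2.1 with p = 6 (T = 15, q = 2)
p = 6
w_star = np.array([[1.0, k - j] for j in range(p) for k in range(j + 1, p)])
mean_u, cov_u = gamma_star_prior(w_star)                         # uniform-based
mean_t, cov_t = gamma_star_prior(w_star, mu=0.5, sigma2=0.5722)  # triangular
```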
3.4 Extension to unit-specific covariates
We can easily extend this prior to unit-specific covariates. Suppose, for i = 1, . . . , n,

$$ z(\pi_{i,jk}) = w_{i,jk}\, \gamma. $$

Let w_i be the T × q matrix with rows w_{i,jk}, so that z_i = w_i γ. We first stack z_1, . . . , z_n and w_1, . . . , w_n together, i.e., z = (z_1^T, . . . , z_n^T)^T and w* = (w_1^T, . . . , w_n^T)^T, so that z = w*γ, where w* is an nT × q full column rank matrix. Similar to the previous case, we obtain

$$ \gamma \sim N\!\left( \mu \{(w^*)^T w^*\}^{-1} (w^*)^T 1_{nT\times 1},\; \frac{nT}{q}\, \sigma^2 \{(w^*)^T w^*\}^{-1} \right), \qquad (5) $$
which is our recommended prior in the general case.
3.5 Connection to g-priors
Our priors on γ have a form similar to the g-priors introduced by Zellner (1986). However, our derivation begins with a prior on an unconstrained parameter space, as opposed to Zellner's construction of a prior based on the posterior distribution of imaginary data y_0, y_0 = x^T β + ε with ε ~ N(0, σ_0² I) (with independent priors p(β) ∝ 1 and p(σ_0) ∝ 1/σ_0). The Zellner prior for β | σ_0 has the form β | σ_0 ~ N(β̂_0, g σ_0² (x x^T)^{-1}), where β̂_0 is the least squares estimate based on the imaginary data and g is a penalty parameter; in practice, the mean is typically set to zero, so no imaginary data are actually required. Our prior has a similar form, but it is based on the projection of z(π_i) onto w*, with weights based on the original prior for π on the unconstrained space (here a hypercube). The 'weights' in the prior (5) enter through μ in the prior mean and σ² in the prior variance, σ²{(w*)^T w*}^{-1}. As a result, with these priors we do not have to deal with the choice of g (for some discussion, see George and Foster (2000) and Clyde and George (2000)).
3.6 Prior for η
The most commonly used prior on the marginal variances σ² is the inverse gamma prior, which facilitates computations due to conditional conjugacy. Daniels (2006) used uniform priors on transformed innovation variance (IV) parameters, with or without structure similar to model (4) in Section 2.2 for the marginal standard deviations. Barnard et al. (2000) discussed independent normal priors on the log-transformed standard deviations. In particular, they proposed the prior

$$ \log \sigma = (\log \sigma_1, \ldots, \log \sigma_p)^T \sim N(\xi, \Lambda), \qquad (6) $$

with Λ diagonal. We derive a prior for η similar to that for γ, based on Barnard et al.'s prior for the marginal standard deviations. The resulting prior is

$$ \eta \sim N\!\left( \lambda \{(A^*)^T A^*\}^{-1} (A^*)^T 1_{np\times 1},\; \frac{np}{q_0}\, \tau^2 \{(A^*)^T A^*\}^{-1} \right), $$

where A* is the np × q_0 matrix obtained by stacking the A_{i,j}.
Note that in the derivation we have assumed ξ = λ 1_{p×1} and Λ = τ² I_{p×p} in (6), where λ and τ² are fixed a priori.
4 Posterior distribution and computations
The full data likelihood, L(η, β, γ | y), is proportional to

$$ \prod_{i=1}^{n} |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y_i - x_i^T \beta)^T \Sigma_i^{-1} (y_i - x_i^T \beta) \right\}. $$
We specify the following priors for β, η, and γ,

$$ p(\beta) \propto 1, \qquad (7) $$

$$ \eta \sim N\!\left( \lambda \{(A^*)^T A^*\}^{-1} (A^*)^T 1,\; \frac{np}{q_0}\, \tau^2 \{(A^*)^T A^*\}^{-1} \right), \qquad (8) $$

$$ \gamma \sim N\!\left( \mu \{(w^*)^T w^*\}^{-1} (w^*)^T 1,\; \frac{nT}{q}\, \sigma^2 \{(w^*)^T w^*\}^{-1} \right), \qquad (9) $$

where (A*)^T A* and (w*)^T w* are non-singular. In the setting of incomplete longitudinal responses, under an assumption of ignorable dropout (monotone missingness), we only need to specify the full data response model, and the likelihood of interest is the observed data likelihood, L(β, γ, η | y_obs, x), where y_obs denotes the observed data response (Daniels and Hogan, 2008); the form of the observed data likelihood is given in the supplementary materials.
Since we specify an improper prior on β, we need to prove the posterior distribution of (β, γ, η) is proper. In the next section, we provide a theorem which gives simple sufficient conditions under which the posterior is proper. The supplementary materials contain details on the MCMC algorithm to sample from the posterior distribution.
4.1 Posterior propriety
In the following theorem, we state conditions that are sufficient for the posterior to be proper. First, we introduce some notation. Suppose the full data Y_i, i = 1, . . . , n, are independently distributed with Y_i ~ N(x_i^T β, Σ_i), where x_i is a p_β × p covariate matrix, β is a p_β × 1 (mean) regression parameter vector, and Σ_i = D_i R_i D_i, with D_i specified by (4) and R_i(γ) specified by (1) and (3); define Ω_β, Ω_γ, Ω_η to be the sample spaces of β, γ, η, respectively. Let (Q_{i1}, . . . , Q_{ip})^T be a vector of observed-data indicators, where Q_{ik} = 1 if Y_{ik} is observed (0 otherwise). Let Y_i^{obs} be the vector of observed data (of dimension k_i) for subject i, and let S_k = {i : Q_{ik_i} = 1, Q_{ik_i+1} = 0, and k_i = k}, 1 ≤ k ≤ p - 1, be the set of subjects with observed data of dimension k.
Theorem 1:
We assume the observed data distribution for the ith subject (i = 1, . . . , n) is Y_i^{obs} ~ N((x_i^{obs})^T β, Σ_i^{obs}), where x_i^{obs} is the p_β × k_i submatrix of x_i corresponding to the observed components and Σ_i^{obs} is the k_i × k_i principal submatrix of Σ_i. We also assume the priors on the parameters are given by (7)-(9) and that missingness is monotone and ignorable. Then the posterior of (β, γ, η) is proper under the following three (easy to check) conditions:
1. ∑_{i ∈ S_k} x_i^{obs} (x_i^{obs})^T is non-singular for all k ∈ {1, 2, . . . , p - 1}.
2. (A*)^T A* is non-singular.
3. (w*)^T w* is non-singular.
The proof is given in the supplementary materials. Note that the three conditions correspond to the three design matrices in our model (for the mean, the variances, and the correlations, respectively); the latter two guarantee that the priors (8) and (9) are proper.
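The three conditions are straightforward to check numerically. Below is a hedged sketch, assuming monotone dropout so that x_i^{obs} consists of the first k_i columns of x_i; the names (x_list, k_list, A_star, w_star) are illustrative, not from the paper.

```python
import numpy as np

def check_propriety(x_list, k_list, A_star, w_star, p):
    """Check the three sufficient conditions of Theorem 1; x_list[i] is
    the p_beta x p design x_i, k_list[i] the number of observed times."""
    p_beta = x_list[0].shape[0]
    for k in range(1, p):                  # condition 1, pattern S_k
        M = np.zeros((p_beta, p_beta))
        for x_i, k_i in zip(x_list, k_list):
            if k_i == k:
                xo = x_i[:, :k]            # x_i^obs under monotone dropout
                M += xo @ xo.T
        if np.linalg.matrix_rank(M) < p_beta:
            return False
    # conditions 2 and 3: (A*)'A* and (w*)'w* non-singular,
    # i.e., the priors (8) and (9) are proper
    return (np.linalg.matrix_rank(A_star) == A_star.shape[1]
            and np.linalg.matrix_rank(w_star) == w_star.shape[1])
```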
5 Simulation
To assess the importance of the correlation structure for estimating the (mean) longitudinal trajectories with incomplete data, we conducted a simulation study. The true model was (2) with p = 6. For each individual, the rows of the mean design matrix were specified as a second-order orthogonal polynomial. We set β = (27, -2.0, 0.50)^T. We considered three sample sizes (30, 100, and 400) and simulated 200 data sets for each scenario.
The true models for the marginal variances and partial autocorrelations were given by

$$ z(\pi_{jk}) = \gamma_1 I(k - j = 1)I(j = 1) + \gamma_2 I(k - j = 1)I(j > 1) + \gamma_3 I(k - j = 2) \qquad (10) $$

and

$$ \sigma_j^2 = \eta_1 I(j = 1) + \eta_2 I(j > 1), $$

with γ = (0.65, 0.21, 0.85)^T and η = (150, 200)^T. The structure on π represents a second-order model with the lag-one partial autocorrelations constant except at time 1, the lag-two partial autocorrelations constant over time, and the higher-lag partial autocorrelations equal to zero. The structure on the variances corresponds to a constant variance over time after time one.
After simulating the complete data, we induced ignorable (monotone) missingness via the following missing data mechanism,

$$ \mathrm{logit}\, P(Q_{jk} = 1 \mid Q_{j,k-1} = 1, Y_j) = \alpha_1 + \alpha_2\, Y_{j,k-1}, $$

where Q_{jk} = I{Y_{jk} is observed} and α = (3.86, -0.05).
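For concreteness, a sketch of this dropout mechanism as reconstructed above (the exact functional form is an assumption consistent with the description; the function name is ours):

```python
import numpy as np

def simulate_dropout(Y, alpha=(3.86, -0.05), seed=2):
    """Monotone dropout: the probability of remaining at visit k depends
    on the previous (observed) response through a logistic model."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    Q = np.ones((n, p), dtype=int)         # Q[i, k] = 1 if Y[i, k] observed
    for i in range(n):
        for k in range(1, p):
            eta = alpha[0] + alpha[1] * Y[i, k - 1]
            if rng.uniform() > 1.0 / (1.0 + np.exp(-eta)):
                Q[i, k:] = 0               # monotone: missing thereafter
                break
    return Q
```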
We fit four models to the simulated data. For each model, we use the same true mean and marginal variance models, but different partial autocorrelation models. Our objective is to evaluate the impact of mis-specifying the partial autocorrelation model on inference on the marginal mean regression coefficients, β. Specifically, the models we compare are:
1. True structured model for π, given in (10)
2. Independence model: π = 0
3. AR(1) model: z(π_{jk}) = γ_1 I(|k - j| = 1)
4. Unstructured model (no structure on π)
For models 1, 3, and 4, we use a Beta(1,1) distribution in constructing the prior in (5); we note that the simulation results were similar when using a Beta(2,1). For the prior on η, we set (λ, τ2 ) = (0, 100). For each model, we compute the DIC (Spiegelhalter et al., 2002) based on the observed data likelihood (Wang and Daniels, 2011).
For each simulated dataset, we ran 20,000 iterations for each of the four models. For each dataset, we computed the DIC for the four correlation models and ranked them (1 = best to 4 = worst) based on their fit. To compare inference on the mean under the four models, we computed two quantities: 1) total MSE, the sum of the mean squared errors of the components of β; and 2) change from baseline, the change in estimated mean response from time one to time six. We also compared the mean trajectories graphically.
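A sketch of the observed-data log-likelihood and DIC computation under monotone missingness (the plug-in at the posterior means is one conventional choice; a common Σ across subjects is assumed, as in the simulation, and all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def obs_loglik(y_obs_list, x_list, beta, Sigma):
    """Observed-data log-likelihood under monotone dropout; y_obs_list[i]
    holds the first k_i responses for subject i."""
    ll = 0.0
    for y_obs, x in zip(y_obs_list, x_list):
        k = len(y_obs)
        ll += multivariate_normal.logpdf(y_obs, x[:, :k].T @ beta,
                                         Sigma[:k, :k])
    return ll

def dic(y_obs_list, x_list, beta_draws, Sigma_draws):
    devs = [-2 * obs_loglik(y_obs_list, x_list, b, S)
            for b, S in zip(beta_draws, Sigma_draws)]
    d_bar = np.mean(devs)                  # posterior mean deviance
    d_hat = -2 * obs_loglik(y_obs_list, x_list,
                            np.mean(beta_draws, axis=0),
                            np.mean(Sigma_draws, axis=0))
    return 2 * d_bar - d_hat               # DIC = D-bar + p_D
```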
5.1 Results
The simulation results are given in Tables 1 and 2 and Figure 2. As the sample size increases, the estimates of β approach the true values quickly under the true structured correlation model, more slowly under the unstructured correlation model, and converge to the wrong values under the AR(1) and independence correlation models, the latter with considerable bias (Table 1). The results in Table 2 are similar, with bias in the estimated change from baseline and larger MSEs for the estimates of β under the incorrect models. The fitted trajectories in Figure 2 illustrate the bias in the fitted trajectory when the correlation structure is incorrect.
Table 1.
Posterior means of β. The first, second, and third rows correspond to the posterior means of β1, β2, and β3, respectively. The true value of β is (27.0, -2.0, 0.50).
|  | Sample Size 30 |  |  |  | Sample Size 100 |  |  |  | Sample Size 400 |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Unstr | True | AR(1) | Indep. | Unstr | True | AR(1) | Indep. | Unstr | True | AR(1) | Indep. |
| β1 | 26.5 | 26.9 | 26.7 | 25.3 | 26.7 | 26.9 | 26.6 | 25.3 | 26.9 | 27.0 | 26.7 | 25.4 |
| β2 | -2.1 | -2.0 | -2.1 | -2.5 | -2.0 | -2.0 | -2.1 | -2.5 | -2.0 | -2.0 | -2.1 | -2.5 |
| β3 | 0.52 | 0.51 | 0.50 | 0.58 | 0.51 | 0.51 | 0.50 | 0.58 | 0.50 | 0.50 | 0.49 | 0.57 |
‘Unstr’, ‘True’, ‘AR(1)’, and ‘Indep.’ represent unstructured model, true model, AR(1) model, and independence model, respectively.
Table 2.
Summary measures from the simulation: The values in rows correspond to sample size 30, 100, and 400, respectively.
|  | Total MSE |  |  |  | Change from Baseline |  |  |  |
|---|---|---|---|---|---|---|---|---|
|  | Unstr | True | AR(1) | Indep. | Unstr | True | AR(1) | Indep. |
| n = 30 | 6.9 | 6.5 | 6.6 | 10.1 | 12.6 | 12.0 | 12.7 | 15.1 |
| n = 100 | 1.8 | 1.7 | 2.0 | 4.8 | 12.2 | 12.0 | 12.7 | 14.9 |
| n = 400 | 0.42 | 0.41 | 0.52 | 3.4 | 12.1 | 12.0 | 12.8 | 15.0 |
‘Unstr’, ‘True’, ‘AR(1)’, and ‘Indep.’ represent unstructured model, true model, AR(1) model, and independence model, respectively. ‘Total MSE’ and ‘Change from Baseline’ correspond to the mean square errors of β and the change of the mean responses from the beginning to the end of study (True change is 12.0), respectively.
Figure 2.
Posterior mean of the trajectories for the Unstructured model, True model, AR(1) model, and Independence model with sample size 30, 100, and 400
The DIC chose the true model with high probability, with this probability generally increasing with sample size (see Table 3). For example, for n = 30, 100, and 400, the true structured model is chosen by the DIC with probabilities .66, .99, and .95, respectively. It is interesting that the main competitor of the true model at the smaller sample sizes (30, 100) is the (parsimonious) AR(1) structure, while at the larger sample size (400) it is the unstructured model; this is why the probability of choosing the true structured model decreases between n = 100 and n = 400. At the larger sample size, the AR(1) model is no longer a reasonable competitor, but the unstructured model is (though it is chosen with probability only .06 for n = 400). Note that the unstructured model is, of course, also correctly specified; it simply has more free parameters than needed.
Table 3.
Percentage of times each model is chosen as best (row 1), second best (row 2), third best (row 3), and worst (row 4).
|  | Sample Size 30 |  |  |  | Sample Size 100 |  |  |  | Sample Size 400 |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | AR(1) | Indep. | True | Unstr | AR(1) | Indep. | True | Unstr | AR(1) | Indep. | True | Unstr |
| Best | .35 | .00 | .66 | .00 | .01 | .00 | .99 | .00 | .00 | .00 | .95 | .06 |
| Second | .66 | .00 | .35 | .00 | .73 | .00 | .01 | .26 | .00 | .00 | .06 | .95 |
| Third | .00 | .15 | .00 | .85 | .26 | .00 | .00 | .74 | 1.00 | .00 | .00 | .00 |
| Worst | .00 | .85 | .00 | .15 | .00 | 1.00 | .00 | .00 | .00 | 1.00 | .00 | .00 |
'AR(1)', 'Indep.', 'True', and 'Unstr' correspond to the AR(1) model, independence model, true model, and unstructured model, respectively.
6 Data Example: Schizophrenia trial
The data were collected as part of a randomized, double-blind clinical trial of a 'new' pharmacologic treatment for schizophrenia (Lapierre et al., 1990). The trial compared three doses of the new treatment (low, medium, high) to the standard dose of haloperidol, an effective antipsychotic with known side effects. The trial was designed to find the appropriate dosing level, since the experimental therapy was thought to have similar antipsychotic effectiveness with fewer side effects. Two hundred forty-five patients were enrolled and randomized to one of the four treatment arms. The intended length of follow-up was 6 weeks, with measures taken weekly except at week 5. Schizophrenia severity was assessed using the Brief Psychiatric Rating Scale (BPRS), a sum of scores on 18 items reflecting behaviors, mood, and feelings. Scores range from 0 to 108, with higher scores indicating greater severity. To enter the study, a patient's BPRS score had to be at least 20. We illustrate our approach using only the medium-dose arm. Of main inferential interest is the change in BPRS from the beginning to the end of the study.
The dropout rate in the medium-dose arm was high, with only 40 of the 61 participants (about 66%) having a measurement at week 7 (the sixth measurement time). Reasons for dropout included adverse events (e.g., side effects), lack of treatment effect, and withdrawal for unspecified reasons. The trajectories of completers vs. non-completers are shown in Figure 3a. Clearly, those who dropped out were doing worse (higher BPRS) prior to dropping out.
Figure 3.
Trajectories of the observed data and posterior mean of the trajectories for the models considered.
6.1 Models
Let the longitudinal vector of outcomes for subject i be Y_i = (Y_{i1}, . . . , Y_{i6})^T, measured at weeks t = (t_1, . . . , t_6) = (1, 2, 3, 4, 5, 7). We assume Y_i follows (2) with mean x_i^T β, where x_i specifies an orthogonal quadratic polynomial in the measurement times. We assume missingness is ignorable.
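One way to construct such an orthogonal quadratic polynomial design (a sketch; the paper's exact scaling convention is not specified) is via a QR decomposition of the raw polynomial basis over the visit times:

```python
import numpy as np

# Orthogonal quadratic polynomial design over the visit weeks, built via
# QR decomposition of the raw basis (one possible scaling convention).
t = np.array([1, 2, 3, 4, 5, 7], dtype=float)
raw = np.vander(t, 3, increasing=True)   # columns: 1, t, t^2
Qmat, _ = np.linalg.qr(raw)              # orthonormal columns
X = Qmat * np.sqrt(len(t))               # 6 x 3 design; X'X = 6 I
```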
We fit the partial autocorrelation models given below:
Independence Model: z(π_{j,j+k}) = 0; log(σ_j) = I(j = 1)η_1 + I(j > 1)η_2.
AR(1) Model: z(π_{j,j+k}) = I(k = 1)γ_1; log(σ_j) = I(j = 1)η_1 + I(j > 1)η_2.
Unstructured Covariance Model: no structure imposed on the π's or the σ's.
Structured Model 1: z(π_{j,j+k}) = I(k = 1)I(j = 1)γ_1 + I(k = 1)I(j > 1)γ_2 + I(k = 2)γ_3; log(σ_j) = I(j = 1)η_1 + I(j > 1)η_2.
Structured Model 2: a separate coefficient for the z-transform of each lag-one and each lag-two partial autocorrelation, and a common coefficient for each of lags three, four, and five; log(σ_j) = I(j = 1)η_1 + I(j > 1)η_2.
Structured Model 1 is the same model considered in the simulation. Structured Model 2 is more flexible than Structured Model 1 for the partial autocorrelations, allowing nonstationary lag-one and lag-two partial autocorrelations and stationary lag-three, lag-four, and lag-five partial autocorrelations (with no structural zeros); the structure on the variances is the same as in Structured Model 1. The marginal variance structure for all the models and the partial autocorrelation structures for Structured Models 1 and 2 were chosen after examining the unstructured estimates in Table 4.
Table 4.
MLE of variances (on main diagonal), partial autocorrelations (in upper triangle), and Fisher's z-transformation of partial autocorrelations (lower triangle).
| 126.25 | 0.6578 | -0.0738 | 0.0804 | -0.0253 | -0.5230 |
| 0.7889 | 210.35 | 0.8543 | -0.0593 | -0.3328 | 0.0292 |
| -0.0740 | 1.2718 | 224.42 | 0.8559 | 0.2648 | 0.4375 |
| 0.0806 | -0.0594 | 1.2779 | 240.84 | 0.8961 | 0.3506 |
| -0.0253 | -0.3460 | 0.2713 | 1.4522 | 221.98 | 0.8433 |
| -0.5805 | 0.0292 | 0.4692 | 0.3661 | 1.2325 | 243.08 |
We use the priors specified in (7), (8), and (9) for β, η, and γ, respectively. For (9), we consider both the Beta(1, 1) (uniform) and Beta(2, 1) (triangular) specifications of (μ, σ²). For (8), we set λ = 0 and τ² = 100.
6.2 Results
For all models, we ran 200,000 iterations with minimal burn-in, since the chains converged after a few iterations. The fitted mean trajectories are shown in Figure 3b (only under the triangular prior). The mean BPRS initially decreased but began to increase again by week 5; this reflects those who dropped out doing more poorly than those remaining in the study. Table 5 contains the posterior means of β, the change from baseline to week 7 with 95% credible intervals, and the DIC based on the observed data likelihood. The table shows that the models fit similarly under the uniform and triangular priors. The change from baseline was negative in all models, with 95% credible intervals excluding 0, indicating that the medium dose significantly reduced the BPRS score; this agrees with an earlier analysis in Daniels and Hogan (2008). The change from baseline varied from -14 to -11 across covariance models. According to the DIC, Structured Model 2 with the triangular prior provided the best fit; its change from baseline differed from that of the unstructured model by almost a full point.
Table 5.
Posterior summaries of the models for the schizophrenia trial.
| Model | (β0, β1, β2) | Change from Baseline (95% CI) | DIC |
|---|---|---|---|
| Independence | (25.6, -2.35, 0.68) | -14.1 (-18.7, -9.4) | 1924.6 |
| Unstructured | (26.9, -2.03, 0.62) | -12.2 (-16.3, -7.9) | 1663.0 |
| AR(1) (Uniform) | (27.5, -1.97, 0.69) | -11.8 (-15.8, -7.9) | 1681.3 |
| AR(1) (Triangular) | (27.5, -1.97, 0.69) | -11.8 (-15.7, -7.9) | 1680.6 |
| Structured Model 1 (Uniform) | (27.9, -1.80, 0.58) | -10.8 (-14.6, -7.0) | 1669.6 |
| Structured Model 1 (Triangular) | (27.9, -1.81, 0.58) | -10.8 (-14.7, -7.0) | 1670.4 |
| Structured Model 2 (Uniform) | (27.9, -1.89, 0.54) | -11.3 (-15.3, -7.3) | 1660.4 |
| Structured Model 2 (Triangular) | (27.9, -1.89, 0.55) | -11.3 (-15.4, -7.3) | 1658.1 |
7 Discussion
In this paper, we first extended the priors in Daniels and Pourahmadi (2009) for unstructured partial autocorrelations by introducing a set of (triangular) priors that favor positive marginal correlations. Using Fisher's z-transformation of the partial autocorrelations, we introduced a regression framework to induce structure and/or unit-specific covariates in the correlation matrix. Based on the priors proposed for the partial autocorrelations in the non-regression setting, we introduced a prior for the coefficients of the partial autocorrelation regressions (and for the coefficients of the marginal variance regressions). We conducted simulations that illustrated the importance of correctly specifying the correlation structure in the setting of ignorable missingness in longitudinal data and showed the ability of the DIC to choose the true correlation model. We also fit the models to data from a longitudinal schizophrenia clinical trial.
There are a variety of extensions of the modeling proposed here. Clearly, it can be difficult to 'find' a good parametric model that imposes structure on the correlation matrix, so extending approaches developed under different parameterizations (Smith and Kohn, 2002; Wong, Carter, and Kohn, 2003) to our setting is an important next step. Correlation matrices (instead of covariance matrices) arise commonly in models for longitudinal data based on Gaussian copulas (Nelsen, 1999); efficient computation using the partial autocorrelations in these settings will be challenging due to the lack of conjugacy. However, the partial autocorrelation models provide an opportunity for flexible dependence in longitudinal categorical data via multivariate probit models. To offer some robustness to a selected correlation structure, an alternative would be to shrink the partial autocorrelations toward the structure using independent Beta priors, as has been done previously using normal priors on other parameterizations of a covariance matrix (Daniels and Kass, 2001; Daniels and Pourahmadi, 2002). Finally, we are considering extensions to irregular longitudinal data using a partial autocorrelation function and to time-dependent covariates.
Supplementary Material
Acknowledgments
This work was partially supported by NIH grant CA-85295.
Contributor Information
Y. Wang, Department of Biostatistics University of Florida Dauer Hall Gainesville, Florida 32611 yanpin@ufl.edu
M. J. Daniels, Section of Integrative Biology University of Texas at Austin 141MC Patterson Hall Austin, TX 78712 mjdaniels@austin.utexas.edu
References
- Agresti A, Hitchcock DB. Bayesian Inference for Categorical Data Analysis. Statistical Methods and Applications. 2005;14:297–330.
- Albert JH, Chib S. Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association. 1993;88:669–679.
- Barnard J, McCulloch R, Meng XL. Modeling Covariance Matrices in Terms of Standard Deviations and Correlations. Statistica Sinica. 2000;10:1281–1312.
- Chiu TYM, Leonard T, Tsui KW. The Matrix-Logarithmic Covariance Model. Journal of the American Statistical Association. 1996;91:198–210.
- Clyde M, George EI. Flexible Empirical Bayes Estimation for Wavelets. Journal of the Royal Statistical Society, Series B. 2000;62:681–698.
- Czado C. Multivariate Regression Analysis of Panel Data with Binary Outcomes Applied to Unemployment Data. Statistical Papers. 2000;41:281–304.
- Daniels MJ. Bayesian Modeling of Several Covariance Matrices and Some Results on Propriety of the Posterior for Linear Regression with Correlated and/or Heterogeneous Errors. Journal of Multivariate Analysis. 2006;97:1185–1207.
- Daniels MJ, Hogan JW. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman and Hall/CRC; 2008.
- Daniels MJ, Kass RE. Nonconjugate Bayesian Estimation of Covariance Matrices and Its Use in Hierarchical Models. Journal of the American Statistical Association. 1999;94:1254–1263.
- Daniels MJ, Kass RE. Shrinkage Estimators for Covariance Matrices. Biometrics. 2001;57:1173–1184.
- Daniels MJ, Normand SL. Longitudinal Profiling of Health Care Units Based on Mixed Multivariate Patient Outcomes. Biostatistics. 2006;7:1–15.
- Daniels MJ, Pourahmadi M. Bayesian Analysis of Covariance Matrices and Dynamic Models for Longitudinal Data. Biometrika. 2002;89:553–566.
- Daniels MJ, Pourahmadi M. Modeling Covariance Matrices via Partial Autocorrelations. Journal of Multivariate Analysis. 2009;100:2352–2363.
- Daniels MJ, Zhao YD. Modeling the Random Effects Covariance Matrix in Longitudinal Data. Statistics in Medicine. 2003;22:1631–1647.
- Dempster AP. Covariance Selection. Biometrics. 1972;28:157–175.
- Efron B, Morris C. Multivariate Empirical Bayes and Estimation of Covariance Matrices. The Annals of Statistics. 1976;4:22–32.
- George EI, Foster DP. Calibration and Empirical Bayes Variable Selection. Biometrika. 2000;87:731–747.
- Hoff PD, Niu X. A Covariance Regression Model. Statistica Sinica. 2012;22:729–753.
- Joe H. Generating Random Correlation Matrices Based on Partial Correlations. Journal of Multivariate Analysis. 2006;97:2177–2189.
- Lapierre YD, Nair NPV, Chouinard G, Awad AG, Saxena B, Jones B, McClure DJ, Bakish D, Max P, Manchanda R, Beaudry P, Bloom D, Rotstein E, Ancill R, Sandor P, Sladen-Dew N, Durand C, Chandrasena R, Horn E, Elliot D, Das M, Ravindran A, Matsos G. A Controlled Dose-Ranging Study of Remoxipride and Haloperidol in Schizophrenia: A Canadian Multicentre Trial. Acta Psychiatrica Scandinavica. 1990;82:72–77.
- Leonard T, Hsu JSJ. Bayesian Inference for a Covariance Matrix. The Annals of Statistics. 1992;20:1669–1696.
- Liechty JC, Liechty MW, Muller P. Bayesian Correlation Estimation. Biometrika. 2004;91:1–14.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. John Wiley; New York: 2002.
- Magnus JR, Neudecker H. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley and Sons; 1984.
- McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. Chapman and Hall; 1989.
- Munoz A, Carey V, Schouten JP, Segal M, Rosner B. A Parametric Family of Correlation Structures for the Analysis of Longitudinal Data. Biometrics. 1992;48:733–742.
- Nelsen RB. An Introduction to Copulas. Springer; 1999.
- Pourahmadi M. Joint Mean-Covariance Models with Applications to Longitudinal Data: Unconstrained Parameterisation. Biometrika. 1999;86:677–690.
- Pourahmadi M. Maximum Likelihood Estimation of Generalised Linear Models for Multivariate Normal Covariance Matrix. Biometrika. 2000;87:425–435.
- Pourahmadi M, Daniels MJ. Dynamic Conditionally Linear Mixed Models for Longitudinal Data. Biometrics. 2002;58:225–231.
- Smith M, Kohn R. Parsimonious Covariance Matrix Estimation for Longitudinal Data. Journal of the American Statistical Association. 2002;97:1141–1153.
- Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, Series B. 2002;64:583–639.
- Verbyla AP. Modelling Variance Heterogeneity: Residual Maximum Likelihood and Diagnostics. Journal of the Royal Statistical Society, Series B. 1993;55:493–508.
- Wang C, Daniels MJ. A Note on MAR, Identifying Restrictions, Model Comparison, and Sensitivity Analysis in Pattern Mixture Models with and without Covariates for Incomplete Data. Biometrics. 2011;67:810–818. (Correction: 2012;68:994.)
- Wong F, Carter CK, Kohn R. Efficient Estimation of Covariance Selection Models. Biometrika. 2003;90:809–830.
- Yang R, Berger JO. Estimation of a Covariance Matrix Using the Reference Prior. The Annals of Statistics. 1994;22:1195–1211.
- Zellner A. On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions. In: Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. 1986:233–243.