Common predictor effects for multivariate longitudinal data

Juan Jia; Robert E Weiss

doi:10.1002/sim.3589

Author manuscript; available in PMC: 2014 Jan 20.

Published in final edited form as: Stat Med. 2009 Jun 15;28(13):1793–1804. doi: 10.1002/sim.3589

PMCID: PMC3896128 NIHMSID: NIHMS536947 PMID: 19360840

SUMMARY

Multivariate outcomes measured longitudinally over time are common in medicine, public health, psychology and sociology. The typical (saturated) longitudinal multivariate regression model has a separate set of regression coefficients for each outcome. However, multivariate outcomes are often quite similar, and many outcomes can be expected to respond similarly to changes in covariate values. Given a set of outcomes likely to share common covariate effects, we propose the Clustered Outcome COmmon Predictor Effect (COPE) model and offer a two-step iterative algorithm to fit the model using available software for univariate longitudinal data. Outcomes that share predictor effects need not be chosen a priori; we propose model selection tools that let the data select outcome clusters. We apply the proposed methods to psychometric data from adolescent children of HIV+ parents.

Keywords: Clustering, Common Effect, Dimension Reduction, Hierarchical Linear Model, HIV, Multivariate Regression, Model Selection

1. Introduction

Longitudinal studies in medicine, public health, psychology and sociology measure multiple outcomes repeatedly over time. Multivariate longitudinal outcomes are often highly correlated and may be affected similarly by covariates. We might find that as the covariates X1, X2 and X3 increase, so do outcomes Y1, Y2 and Y3. Rather than fit separate models with separate linear predictors for all three outcomes, we propose that outcomes with similar relationships to covariates be combined into a single model with a single linear predictor that predicts all outcomes.

Numerous multivariate longitudinal data models have been proposed in recent years [1, 2, 3, 4, 5, 6, 7, 8, 9]. For univariate longitudinal data, main interest lies in covariate effects. For multivariate outcomes, interest often lies in the covariances that measure the interrelation among outcomes [10, 11, 12, 13] and testing global effects and estimating common effects on multivariate outcomes [14, 15, 16]. O’Brien [17] proposed a global test for a common dose effect for multiple outcomes. Sammel, Ryan, and Legler [18] proposed a latent variable model where the multiple outcomes were summarized by a single latent variable and a univariate linear regression model was used to assess common covariate effects. Lin and coauthors [19] considered a scaled linear mixed model for multiple outcomes that standardized each outcome by its standard deviation to estimate the common effect of a single covariate of interest. In our paper, we are interested in a middle area where we analyze multiple outcomes, yet primary interest lies in the relationship of outcomes to predictors.

A linear predictor is a linear combination of covariates. We define two or more outcomes as similar if they relate to covariates similarly, by which we mean that a single linear predictor is all that is needed to predict the similar outcomes. Dissimilar outcomes need different linear predictors. Similar outcomes may be identified a priori from expert knowledge or, preferably in many circumstances, the data can determine which outcomes are similar. Similar outcomes belong to a single cluster of outcomes in our model.

We define the Clustered Outcome COmmon Predictor Effect (COPE; CO stands for both Clustered Outcome and COmmon) model for multivariate outcomes as a model where similar outcomes share a single linear predictor while dissimilar outcomes have different linear predictors. Outcome-specific link functions connect the several similar outcomes to the one linear predictor.

The saturated model is the usual multivariate model where each outcome has its own outcome-specific coefficients. When the COPE model holds, it estimates parameters more efficiently than the saturated model because it has fewer of them. Suppose there are 5 outcomes and 10 predictors; then the saturated model estimates 5 × 10 = 50 fixed effect parameters and 5 intercepts, for a total of 55 parameters. If all outcomes are similar, we need to estimate 19 parameters: 10 fixed effects, 5 intercepts and 4 scale parameters. To avoid identifiability problems, we fix one of the scale parameters equal to one in each cluster. If outcomes are grouped into two clusters, we need to estimate 28 parameters: 2 × 10 = 20 fixed effects, 5 intercepts and 3 scale parameters. In general, with K outcomes, G clusters and L covariates not including the intercept, there are K intercepts, K − G scale parameters and GL coefficients, for a total of 2K + G(L − 1) parameters, which can be far fewer than the K(L + 1) coefficients of the saturated model, particularly when G is small.
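The parameter counts above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the counts cover only the mean structure (intercepts, free scale parameters and regression coefficients), not the covariance parameters θ.

```python
def cope_param_count(K: int, G: int, L: int) -> int:
    """Mean-structure parameters of a COPE model.

    K intercepts, K - G free scale parameters (one scale per cluster is
    fixed at 1) and G * L common regression coefficients.
    """
    return K + (K - G) + G * L


def saturated_param_count(K: int, L: int) -> int:
    """Saturated model: each outcome has its own intercept and L slopes."""
    return K * (L + 1)


K, L = 5, 10
for G in (1, 2, 5):
    # 2K + G(L - 1) is the same count written in closed form.
    assert cope_param_count(K, G, L) == 2 * K + G * (L - 1)
    print(G, cope_param_count(K, G, L))
print("saturated", saturated_param_count(K, L))
```

With G = K every outcome is its own cluster, every scale is fixed at 1, and the count equals the saturated K(L + 1) = 55.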

The COPE model increases scientific understanding of the outcomes by identifying sets of outcomes that change similarly as covariates change, and it eases the difficulty of interpreting large numbers of parameters. If a covariate x_l is a significant predictor for the g^th cluster, then that covariate is a significant predictor for all outcomes in the cluster, and establishing this takes only a single hypothesis test.

This paper is motivated by a study of parents living with Human Immunodeficiency Virus (HIV) and their adolescent children. HIV+ parents were recruited from New York City’s Division of AIDS Services along with their adolescent children and were followed for up to 7 years. Follow-up interviews were conducted every three months in the first two years and every six months thereafter. Age, female (coded yes=1/no=0), time of visit, parental drug status, parental alcohol use, season, brief symptom inventory (BSI) sub-scales, of which we analyze Anxiety and Somatization, and several coping style variables, of which we analyze Positive Action and Social Support, were recorded at each interview along with many other variables. We find the best way to cluster the outcomes and estimate common predictor (female, age, parental drug status, season and so on) effects for each outcome cluster.

This paper is organized as follows. Section 2 develops the Clustered Outcome COmmon Predictor Effects (COPE) model, first when all outcomes belong to one cluster and then more generally for G clusters. Section 3 explains the two-step iterative algorithm to fit the model and the model selection tools for specifying the outcome clusters. Section 4 applies the method to the data on adolescent children of HIV+ parents. The paper closes with a discussion in Section 5.

2. The Clustered Outcome COmmon Predictor Effects (COPE) model

We observe K continuous outcomes and L covariates on subject i, i = 1, …, n, at times t_ij, j = 1, …, J_i. Let y_ijk denote the k^th outcome for subject i at time t_ij, k = 1, …, K, and define y_ij = (y_ij1, …, y_ijK)′; x_ijl is the value of the l^th predictor for subject i at time t_ij, l = 1, …, L; x_ij = (x_ij1, …, x_ijL)′, and x_ij does not include an intercept. Covariates are the same for all outcomes. The saturated multiple outcomes longitudinal model is

$$y_{ijk} = \gamma_{0k} + x_{ij}'\alpha_k + \varepsilon_{ijk} \qquad (1)$$

where γ_0k is the intercept for outcome k, α_k is an L × 1 vector of fixed effect coefficients for the k^th outcome, ε_ijk are random errors, ε_ij = (ε_ij1, …, ε_ijK)′, $ε_{i} = (ε_{i 1}^{'}, \dots, ε_{i J_{i}}^{'})'$ , ε_i ~ N(0,Σ(θ)), Σ(θ) models covariances among the multiple outcomes and repeated measures, and θ is a vector of unknown parameters for the components of the covariance matrix.

First we specify a single-cluster COPE model, meaning that a single linear predictor x_ij′α predicts all outcomes. However, outcomes are measured on different scales, so each outcome needs its own link function to connect it with the linear predictor. The model is

$$E(y_{ijk}) = u_{ijk}, \qquad u_{ijk} = \psi_k(x_{ij}'\alpha) = \gamma_{0k} + \gamma_{1k}\, x_{ij}'\alpha \qquad (2)$$

where u_ijk is the expectation of the k^th outcome of subject i at time t_ij and the link function ψ_k(·), here assumed linear, links the mean to the linear predictor using intercept γ_0k and scale parameter γ_1k. In (2), α = (α₁, …, α_L)′ is the L × 1 vector of fixed common predictor effects.

In practice, each outcome will belong to one of G clusters with outcomes in each cluster sharing a common predictor effect. Here we assume the clustering is known; model selection tools to identify similarity among outcomes are discussed in the next section. There is an additional nested subscript compared to (2); we introduce subscript g to index clusters, with k now indexing outcomes within cluster. In cluster g, g = 1, …, G, there are k_g ≥ 1 outcomes, $\sum_{g = 1}^{G} k_{g} = K$ . The k^th outcome in cluster g for subject i at time t_ij is y_ijgk and we model

$$E(y_{ijgk}) = u_{ijgk} = \psi_{gk}(x_{ij}'\alpha_g) = \gamma_{0gk} + \gamma_{1gk}\, x_{ij}'\alpha_g \qquad (3)$$

where γ_0gk is the intercept for the k^th outcome, k = 1, …, k_g in the g^th cluster, g = 1, …,G, γ_1gk is the scale parameter for the k^th outcome in the g^th cluster, α_g is the L × 1 vector of fixed common effects for all outcomes in the g^th cluster, α_g = (α_g1, …, α_gL)′ and α_gl is the regression coefficient for the l^th predictor in the g^th cluster.
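As a concrete illustration of model (3), the sketch below evaluates every outcome's mean for two covariate vectors. The function name and all parameter values are hypothetical; outcomes are indexed from 0, and the last outcome listed in each cluster has its scale fixed at 1 for identifiability. Within a cluster, each outcome's change equals γ_1gk times the same change in x_ij′α_g, so the changes are proportional across the cluster.

```python
import numpy as np


def cope_means(x, clusters, gamma0, gamma1, alpha):
    """Means of all K outcomes under model (3) for one covariate vector x."""
    mu = np.empty(len(gamma0))
    for g, members in enumerate(clusters):
        lin = float(x @ alpha[g])                  # common linear predictor x' alpha_g
        for k in members:
            mu[k] = gamma0[k] + gamma1[k] * lin    # outcome-specific linear link
    return mu


# Hypothetical values: outcomes 0, 2 and 3 share alpha[0]; outcome 1 is alone.
clusters = [[0, 2, 3], [1]]
gamma0 = np.array([1.0, 0.5, 2.0, 1.5])
gamma1 = np.array([0.4, 1.0, 0.9, 1.0])  # last outcome in each cluster fixed at 1
alpha = [np.array([0.3, -0.2]), np.array([0.1, 0.5])]

x_a, x_b = np.array([0.0, 1.0]), np.array([1.0, 1.0])
change = (cope_means(x_b, clusters, gamma0, gamma1, alpha)
          - cope_means(x_a, clusters, gamma0, gamma1, alpha))
print(change)  # approximately [0.12, 0.1, 0.27, 0.3]
```

Here the linear predictor of the first cluster rises by 0.3, and outcomes 0, 2 and 3 move by 0.4, 0.9 and 1 times that amount.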

3. Model fitting and selection

We separate estimation into two parts, (i) estimation given a set of clusters and (ii) identification of a best model, that is, identification of the best way to cluster outcomes.

3.1. Estimation given a set of clusters

Given a clustering of outcomes, we estimate the parameters (γ, α, θ). The link function parameters are $γ = (γ_{0}^{'}, γ_{1}^{'})'$, comprising K intercepts γ₀ and K scale parameters γ₁, where γ_s = (γ_s11, …, γ_s1k₁, …, γ_sG1, …, γ_{sGk_G})′ for s = 0, 1. The fixed effects parameters are $α = (α_{1}^{'}, \dots, α_{G}^{'})'$, where α_g = (α_g1, …, α_gL)′ holds the regression coefficients for the g^th cluster, g = 1, …, G.

We develop a two-step iterative algorithm for fitting (3), given a particular clustering of the outcomes, that uses existing algorithms for maximum likelihood fitting of multivariate longitudinal data. We fit (3) by first conditioning on $γ = (γ_{0}^{'}, γ_{1}^{'})'$ and maximizing the likelihood with respect to (α, θ). Next we condition on α and maximize the likelihood with respect to (γ, θ). We iterate until convergence by repeatedly calling existing univariate longitudinal statistical software, in our case the MIXED procedure in SAS (SAS Institute, Cary, NC).

To start the algorithm, we need an initial value α̂⁽⁰⁾ for α. We obtain it by fitting the usual multivariate longitudinal linear model y_ijgk = γ_0g + x_ij′α_g + ε_ijgk with G clusters but with a simplified link function in which the intercepts within a cluster are all equal, γ_0gk ≡ γ_0g, and all γ_1gk ≡ 1.

Suppose we have completed t − 1 iterations and are starting the t^th iteration. The steps are:

(A) Update the link. Define ${V̂}_{i j g}^{(t - 1)} = x_{i j}' {α̂}_{g}^{(t - 1)}$. Estimate ${γ̂}_{0 g k}^{(t)}$ and ${γ̂}_{1 g k}^{(t)}$ by fitting the multivariate longitudinal linear model

$$y_{ijgk} = \gamma_{0gk} + \gamma_{1gk}\, \hat V_{ijg}^{(t-1)} + \varepsilon_{ijgk}, \qquad (4)$$

holding ${α̂}_{g}^{(t - 1)}$, and hence ${V̂}_{i j g}^{(t - 1)}$, fixed. In each cluster, to avoid a lack of identifiability, we fix one scale parameter equal to 1. Without loss of generality, we set the scale parameter for the last outcome in each cluster to 1, where last is defined alphabetically or by some other arbitrary ordering.

(B) Update the common α̂_g. Define $Ŵ_{i j g k}^{(t)} = {γ̂}_{1 g k}^{(t)} x_{i j}'$ and $z_{i j g k}^{(t)} = y_{i j g k} - {γ̂}_{0 g k}^{(t)}$. Estimate ${α̂}_{g}^{(t)}$ by fitting the multivariate longitudinal model with no intercept

$$z_{ijgk}^{(t)} = \hat W_{ijgk}^{(t)}\, \alpha_g + \varepsilon_{ijgk}. \qquad (5)$$

The maximized likelihood does not decrease after any iteration, or after any step within an iteration. Each call to SAS PROC MIXED also updates the variance parameters θ. Many methods for checking convergence are possible; we record the maximized likelihood and repeat steps (A) and (B) until the difference between the maximized likelihoods from two consecutive iterations is less than a prespecified small number.
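The two-step iteration in steps (A) and (B) can be sketched in Python with NumPy. This is a simplified illustration, not the authors' SAS PROC MIXED implementation: it assumes independent errors with a common variance, so each step is an ordinary least-squares fit, and under that assumption the residual sum of squares plays the role of the maximized likelihood and cannot increase from step to step. The function `fit_cope`, its arguments and the simulated data are hypothetical.

```python
import numpy as np


def fit_cope(Y, X, clusters, tol=1e-10, max_iter=500):
    """Alternating two-step fit of the COPE mean model (illustrative sketch).

    Y: (N, K) outcomes, X: (N, L) covariates without an intercept,
    clusters: list of lists of outcome column indices. The last index in
    each cluster has its scale parameter fixed at 1. Errors are assumed
    independent with a common variance, so every step is ordinary least
    squares rather than a mixed-model fit.
    """
    N, K = Y.shape
    ones = np.ones(N)
    gamma0, gamma1 = np.zeros(K), np.ones(K)
    alpha = {}
    # Starting values: one shared intercept per cluster, all scales = 1.
    for g, members in enumerate(clusters):
        design = np.vstack([np.column_stack([ones, X]) for _ in members])
        stacked = np.concatenate([Y[:, k] for k in members])
        coef, *_ = np.linalg.lstsq(design, stacked, rcond=None)
        alpha[g] = coef[1:]
    rss_old = np.inf
    for _ in range(max_iter):
        # Step (A): update intercepts and scales with alpha held fixed.
        for g, members in enumerate(clusters):
            V = X @ alpha[g]
            for k in members[:-1]:
                coef, *_ = np.linalg.lstsq(np.column_stack([ones, V]), Y[:, k], rcond=None)
                gamma0[k], gamma1[k] = coef
            last = members[-1]
            gamma1[last] = 1.0                      # identifiability constraint
            gamma0[last] = np.mean(Y[:, last] - V)  # intercept only
        # Step (B): update the common effects alpha_g, no intercept.
        for g, members in enumerate(clusters):
            W = np.vstack([gamma1[k] * X for k in members])
            z = np.concatenate([Y[:, k] - gamma0[k] for k in members])
            alpha[g], *_ = np.linalg.lstsq(W, z, rcond=None)
        # Convergence check on the total residual sum of squares.
        fitted = np.empty_like(Y)
        for g, members in enumerate(clusters):
            for k in members:
                fitted[:, k] = gamma0[k] + gamma1[k] * (X @ alpha[g])
        rss = float(np.sum((Y - fitted) ** 2))
        if rss_old - rss < tol:
            break
        rss_old = rss
    return gamma0, gamma1, alpha, rss


# Simulated single-cluster example with known scales 0.4, 0.9 and 1.
rng = np.random.default_rng(0)
N = 2000
X = rng.normal(size=(N, 2))
lin = X @ np.array([1.0, -0.5])
Y = np.column_stack([
    2.0 + 0.4 * lin + 0.1 * rng.normal(size=N),
    -1.0 + 0.9 * lin + 0.1 * rng.normal(size=N),
    0.5 + 1.0 * lin + 0.1 * rng.normal(size=N),
])
g0, g1, a, rss = fit_cope(Y, X, [[0, 1, 2]])
print(np.round(g1, 2), np.round(a[0], 2))
```

Replacing the least-squares calls with a mixed-model fit that also re-estimates θ at each step would bring the sketch closer to the procedure described above.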

We calculate standard errors of the estimated α and γ as the square roots of the diagonal elements of the inverse of the negative second-derivative matrix of the log likelihood. We do not use the standard errors reported by either step of the algorithm, since each step treats the other set of parameters as known. Formulas are given in the Appendix.

3.2. Choosing a best set of clusters

A best partition of outcomes into disjoint clusters can be determined by the data. We look for a clustering of similar outcomes that fits the data well and is not significantly worse than the saturated model. For K outcomes, there are b(K) ways to partition the K outcomes, where b(K) is the Bell number [20]. There are b(4) = 15 candidate models for four outcomes. We use commas to separate clusters within a model and semicolons to separate models, and use the numbers 1–4 to represent the four outcomes; the 15 models are 1,2,3,4; 12,3,4; 13,2,4; 14,2,3; 1,23,4; 1,24,3; 1,2,34; 123,4; 124,3; 12,34; 134,2; 13,24; 14,23; 1,234; 1234. Model 1,2,3,4 is the saturated model, where each outcome is its own cluster. Model 1234 is the single-cluster COPE model, where all outcomes are assumed similar to each other. The other 13 models have 2 or 3 clusters.

The number of models is $b (K) = \sum_{r = 1}^{K} a (K, r)$, where a(K, r), the Stirling number of the second kind, is the number of ways to partition K outcomes into r clusters. The recursion a(n + 1, r) = r × a(n, r) + a(n, r − 1) [21] also shows how to generate all partitions of n + 1 outcomes into r clusters by induction from the partitions of n outcomes. In each of the a(n, r) partitions of n outcomes into r clusters, the (n + 1)^st outcome can join any one of the r existing clusters; in each of the a(n, r − 1) partitions of n outcomes into r − 1 clusters, the (n + 1)^st outcome forms a new, r^th cluster by itself. We illustrate model choice using AIC, backed up by formal hypothesis tests against the saturated model.
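The inductive construction can be written as a short recursive generator. This Python sketch (function names hypothetical) lists the candidate clusterings in the comma notation used above and checks the count against b(4) computed from the Stirling recursion.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty clusters.

    The last item either joins one of the existing clusters of a partition
    of the remaining items or starts a new cluster of its own, mirroring
    a(n + 1, r) = r * a(n, r) + a(n, r - 1).
    """
    if not items:
        yield []
        return
    *rest, new = items
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [partition[i] + [new]] + partition[i + 1:]
        yield partition + [[new]]


def stirling2(n, r):
    """Stirling number of the second kind from the same recursion."""
    if n == r:
        return 1
    if r == 0 or r > n:
        return 0
    return r * stirling2(n - 1, r) + stirling2(n - 1, r - 1)


def bell(n):
    """Bell number b(n) = sum over r of a(n, r)."""
    return sum(stirling2(n, r) for r in range(1, n + 1))


models = list(set_partitions([1, 2, 3, 4]))
labels = [",".join("".join(map(str, c)) for c in p) for p in models]
print(len(models), bell(4))  # 15 15
print("; ".join(labels))
```

For K outcomes the generator yields b(K) clusterings, each of which would be fitted with the two-step algorithm; b(K) grows quickly (b(5) = 52, b(6) = 203), so exhaustive search is practical only for a handful of outcomes.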

4. Application

We analyze data on the psychometric outcomes of adolescent children of HIV+ parents introduced in Section 1. Adolescents' data were collected every three months in the first two years and every six months thereafter, for up to 7 years in total. We analyze four outcomes: two coping style subscales, 1=Social Support and 2=Positive Action, and two BSI subscales, 3=Anxiety and 4=Somatization. Numbering the variables from one to four provides a shorthand for identifying the different clusterings. The four outcome variables have long right tails, so we transform the outcomes as log₂(x + c), where c is the smallest non-zero outcome value.
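A minimal sketch of the log₂(x + c) transformation, assuming a non-negative outcome and computing c separately for each outcome over all of its observations (both are assumptions, since the text does not say how c is pooled); `log2_shift` is a hypothetical name.

```python
import math


def log2_shift(values):
    """Return log2(x + c) for each value, with c the smallest positive value.

    Assumes the outcome is non-negative, so "non-zero" means "positive".
    """
    c = min(v for v in values if v > 0)
    return [math.log2(v + c) for v in values], c


transformed, c = log2_shift([0, 0.5, 1, 3])
print(c, transformed)  # zeros map to log2(c); here log2(0.5) = -1
```

Adding c keeps zeros finite while shifting the smallest positive values only modestly.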

The predictors are female; age (years); season, trichotomized as spring (Mar–Jun), summer (Jul–Oct) and winter (Nov–Feb); parental hard drug use, trichotomized as non-user, using-user and non-using user; parental alcohol use; and time. Time is modeled as a linear spline with knots at 18 and 36 months; the three spline variables are titled Month, Month18 and Month36. A parent is a hard drug non-user if the parent never reported hard drug use during the study; a hard drug user is a parent who reported hard drug use at some visit during the study. Each adolescent visit was classified as (i) a non-user visit if the parent was a non-user; (ii) a using-user visit if the parent was a hard drug user and reported hard drug use in the three months before the observation; or (iii) a non-using user visit if the parent was a hard drug user who did not report hard drug use in the previous three months.
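The predictor coding can be sketched as below. The truncated-line basis for the spline (Month18 = max(0, Month − 18), Month36 = max(0, Month − 36)) and the use of winter and non-user visits as reference levels are assumptions consistent with the predictor names in Table III, not details stated in the text; all function names are hypothetical.

```python
def time_spline(month):
    """(Month, Month18, Month36) under an assumed truncated-line basis."""
    return month, max(0.0, month - 18.0), max(0.0, month - 36.0)


def season(calendar_month):
    """Spring = Mar-Jun, Summer = Jul-Oct, Winter = Nov-Feb (1 = January)."""
    if 3 <= calendar_month <= 6:
        return "spring"
    if 7 <= calendar_month <= 10:
        return "summer"
    return "winter"


def drug_visit(parent_ever_used, used_past_3_months):
    """Classify a visit by parental hard drug use as described in the text."""
    if not parent_ever_used:
        return "non-user"
    return "using-user" if used_past_3_months else "non-using user"


def dummies(month, calendar_month, parent_ever_used, used_past_3_months):
    """Indicator and spline columns; winter and non-user are reference levels."""
    s = season(calendar_month)
    d = drug_visit(parent_ever_used, used_past_3_months)
    return {
        "Spring": int(s == "spring"),
        "Summer": int(s == "summer"),
        "Drug NUU": int(d == "non-using user"),
        "Drug UU": int(d == "using-user"),
        **dict(zip(("Month", "Month18", "Month36"), time_spline(month))),
    }


print(dummies(24, 4, True, True))
```

Under this basis, the Month coefficient is the slope over the first 18 months, and the Month18 and Month36 coefficients are changes in slope after each knot.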

Correlations among the four outcomes are given in Table I. The two BSI variables 3=Anxiety and 4=Somatization are highly correlated (ρ = .75), and the two coping style variables 1=Social Support and 2=Positive Action are moderately correlated (ρ = .49). The two coping style variables have low cross correlations with the two BSI variables (all ρ ≤ .25).

Table I.

Baseline correlations among the four outcome variables.

	Social Support	Positive Action	Anxiety	Somatization
Social support	1.00	.49	.25	.24
Positive action	.49	1.00	.19	.17
Anxiety	.25	.19	1.00	.75
Somatization	.24	.17	.75	1.00

We model the residuals ε_i with a multivariate random intercept model [1, chapter 13], with one random intercept per outcome per subject. We write ε_ijgk = β_igk + δ_ijgk, where β_i ~ N_K(0, D) is a K-vector with elements β_igk, δ_ij = (δ_ij11, …, δ_{ijGk_G})′ ~ N_K(0, V), and both D and V are unstructured covariance matrices.
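The random intercept structure implies a simple block form for each subject's covariance matrix. The sketch below assumes, as is usual for this model, that β_i and δ_ij are independent and that δ_ij is independent across visits, and it stacks residuals outcome by outcome as in the Appendix; `random_intercept_cov` and the numerical values are hypothetical.

```python
import numpy as np


def random_intercept_cov(D, V, J, observed=None):
    """Covariance of one subject's residual vector, stacked outcome by outcome.

    With eps_ijk = beta_ik + delta_ijk, Cov(eps_ijk, eps_ij'k') equals
    D[k, k'] + V[k, k'] when j == j' and D[k, k'] otherwise, which is
    kron(D, ones((J, J))) + kron(V, eye(J)). The optional boolean mask
    `observed` (length K * J) keeps the rows and columns of observed
    responses, i.e. the matrix U_i^* V_i U_i^*' of the Appendix.
    """
    Sigma = np.kron(D, np.ones((J, J))) + np.kron(V, np.eye(J))
    if observed is not None:
        keep = np.asarray(observed, dtype=bool)
        Sigma = Sigma[np.ix_(keep, keep)]
    return Sigma


# Hypothetical K = 2 outcomes observed at J = 3 visits.
D = np.array([[2.0, 1.0], [1.0, 3.0]])
V = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = random_intercept_cov(D, V, J=3)
Sigma_obs = random_intercept_cov(D, V, J=3, observed=[1, 1, 0, 1, 1, 1])
print(Sigma.shape, Sigma_obs.shape)  # (6, 6) (5, 5)
```

Dropping rows and columns for a missing response, rather than imputing it, is what lets the likelihood use every observed outcome at every visit.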

We used the two-step iterative algorithm to fit each model and compared its likelihood with that of the saturated model. We seek models that have the fewest clusters and at the same time are not significantly worse than the saturated model. Results are summarized in Table II. The column Diff is the likelihood ratio test statistic comparing each model with the saturated model 1,2,3,4; the degrees of freedom and p-values from the nominal χ² reference distribution are listed in the neighboring columns. The four columns labeled γ₁ to γ₄ give the estimated scale parameters γ_1gk for each model, in the same order as the outcomes are listed in the Model column. For example, in model 134,2, γ₁ is the scale parameter for outcome 1, γ₂ for outcome 3, γ₃ for outcome 4 and γ₄ for outcome 2.

Table II.

Summary statistics for all 15 COPE models for four outcomes, ordered by Akaike’s Information Criterion (AIC) from best to worst. Model identifies the clustering among outcomes; Outcomes 1–4 represent Social Support= 1, Positive Action= 2, Anxiety= 3 and Somatization= 4. AIC, Bayes Information Criterion (BIC) and −2 log L are in smaller-is-better form. Diff is the −2log likelihood ratio test (LRT) statistic between each model and the saturated model 1, 2, 3, 4; df is the degrees of freedom for the test and the p-value comes from comparing the LRT statistic to the χ²(df) reference distribution. Columns labeled γ₁ to γ₄ are the estimates of the scale γ₁s from the link function; the γ₁s correspond to the outcomes in the order shown in the Model column.

Order	Model	AIC	BIC	−2 log L	Diff	df	p-value	γ₁	γ₂	γ₃	γ₄
1	134, 2	12,325.5	12,477.3	12,245.5	23.7	18	.17	.38	.88	1.00	1.00
2	1, 2, 34	12,330.6	12,520.4	12,230.6	8.8	9	.46	1.00	1.00	.87	1.00
3	13, 2, 4	12,333.7	12,523.5	12,233.7	11.9	9	.22	.48	1.00	1.00	1.00
4	14, 2, 3	12,337.8	12,527.6	12,237.8	16.0	9	.07	.37	1.00	1.00	1.00
5	1, 234	12,339.2	12,491.0	12,259.2	37.4	18	.00	1.00	.38	.90	1.00
6	1234	12,339.3	12,453.2	12,279.3	57.5	27	.00	.39	.23	.88	1.00
7	1, 2, 3, 4	12,341.8	12,569.6	12,221.8	-	-	-	1.00	1.00	1.00	1.00
8	13, 24	12,344.9	12,496.8	12,264.9	43.1	18	.00	.53	1.00	.35	1.00
9	12, 34	12,345.1	12,497.0	12,265.1	43.3	18	.00	1.05	1.00	.88	1.00
10	14, 23	12,346.8	12,498.6	12,266.8	45.0	18	.00	.51	1.00	.59	1.00
11	123, 4	12,347.8	12,499.7	12,267.8	46.0	18	.00	.47	.28	1.00	1.00
12	1, 23, 4	12,348.4	12,538.2	12,248.4	26.6	9	.00	1.00	.53	1.00	1.00
13	1, 24, 3	12,350.8	12,540.6	12,250.8	29.0	9	.00	1.00	.37	1.00	1.00
14	124, 3	12,351.6	12,503.4	12,271.6	49.8	18	.00	.38	.23	1.00	1.00
15	12, 3, 4	12,356.4	12,546.2	12,256.4	34.6	9	.00	1.06	1.00	1.00	1.00

Model 134,2, with two clusters, is best according to AIC. It is not significantly worse than the saturated model (p-value = .17). Models 13,2,4; 14,2,3; and 1,2,34 are also good models that are not significantly worse than the saturated model (p-values = .22, .07 and .46), but they have one more cluster, and hence more parameters, than model 134,2. Model 134,2 is nested in all three, and none of the three is significantly better than model 134,2; thus we settle on model 134,2.
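The likelihood ratio tests in Table II can be reproduced from the −2 log L column. From the parameter count 2K − G + GL, each additional cluster adds L − 1 mean parameters, so a G-cluster model is tested against the saturated model on (K − G)(L − 1) degrees of freedom (18 for two clusters with K = 4 and L = 10, matching Table II). The sketch below, with hypothetical function names and an exact closed-form χ² tail for integer degrees of freedom, also compares model 134,2 with the larger model 1,2,34, in which 134,2 is nested; that comparison is not shown in Table II and gives a p-value above .05.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1 and x > 0.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def lrt_df(G_small, G_large, L):
    """Difference in the mean-parameter count 2K - G + GL between clusterings.

    Each extra cluster adds L coefficients but fixes one more scale at 1,
    so df = (G_large - G_small) * (L - 1); the saturated model has G = K.
    """
    return (G_large - G_small) * (L - 1)


# -2 log L values from Table II; K = 4 outcomes, L = 10 predictors (Table III).
neg2logL = {"1,2,3,4": 12221.8, "134,2": 12245.5, "1,2,34": 12230.6}
K, L = 4, 10

diff_sat = neg2logL["134,2"] - neg2logL["1,2,3,4"]      # about 23.7
p_sat = chi2_sf(diff_sat, lrt_df(2, K, L))              # df = 18
diff_nested = neg2logL["134,2"] - neg2logL["1,2,34"]    # about 14.9
p_nested = chi2_sf(diff_nested, lrt_df(2, 3, L))        # df = 9
print(round(p_sat, 3), round(p_nested, 3))
```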

We also report BIC, although it tends to select models with fewer parameters than AIC does. BIC picks model 1234, but this model is significantly worse than the saturated model 1,2,3,4, so we discard it from further consideration.

The model selection results suggest that 1=Social Support, 3=Anxiety and 4=Somatization share a common set of predictor effects but 2=Positive Action has its own set of predictor effects. Based on the correlations in Table I, we might have expected the model 12,34 to have been best; actually model 12,34 is significantly worse than the saturated model. In fact, every model that clusters outcome 2 with any other outcome is significantly worse than the saturated model by the likelihood ratio test. This illustrates that outcome correlation and outcome similarity are different concepts and can lead to different clustering of outcomes even in the same data set.

The scale parameters γ_1gk for 1=Social Support and 3=Anxiety are estimated as .38 (se = .087) and .88 (se = .099), with 4=Somatization and 2=Positive Action having γ ≡ 1. Predictor changes associated with a one unit change in 4=Somatization are associated with a .38 unit change in 1=Social Support and a .88 unit change in 3=Anxiety. Both estimated γ_1gks are significantly different from zero, indicating that both outcomes are indeed related to the linear predictor. The estimate .88 for 3=Anxiety is not significantly different from 1, suggesting that 3=Anxiety and 4=Somatization may be affected equally by changes in covariates.
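The significance statements about the scale parameters can be checked with two-sided Wald z tests. This sketch uses the rounded estimates and standard errors quoted above and assumes a standard normal reference distribution; `wald` is a hypothetical helper.

```python
import math


def wald(est, se, null=0.0):
    """Two-sided Wald z test of H0: parameter = null, normal reference."""
    z = (est - null) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Scale estimates and standard errors for model 134,2 reported in the text.
z_ss, p_ss = wald(0.38, 0.087)            # Social Support, H0: gamma = 0
z_anx0, p_anx0 = wald(0.88, 0.099)        # Anxiety, H0: gamma = 0
z_anx1, p_anx1 = wald(0.88, 0.099, 1.0)   # Anxiety, H0: gamma = 1
print(round(z_ss, 2), round(z_anx0, 2), round(z_anx1, 2), round(p_anx1, 2))
```

With these inputs, both scales lie more than four standard errors above 0, while the Anxiety scale is only about 1.2 standard errors below 1.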

Table III shows coefficient estimates, standard errors and p-values from the saturated model 1,2,3,4, and the corresponding quantities for γ_1gk α_g from our preferred model 134,2, with estimates, standard errors and p-values in three separate sections of the table. Because we calculate γα separately for each outcome, the p-values differ slightly across the outcomes in a cluster. To report significance for α_g itself, as we would in a substantive paper, use the columns for 2=Positive Action and 4=Somatization, whose γ_1gk are constrained to equal 1.

Table III.

Comparison of coefficient estimates, standard errors and p-values from the saturated model (Sat) and for α_g * γ_1gk from the COPE model 134,2, where Social Support= 1, Anxiety= 3 and Somatization= 4 are in the same cluster, and Positive Action= 2 is in a separate cluster.

	Predictor	Social Support		Positive Action		Anxiety		Somatization
		Sat	COPE	Sat	COPE	Sat	COPE	Sat	COPE
Point Estimate	Month	−.010	−.010	−.016	−.016	−.026	−.023	−.024	−.026
	Female	.171	.163	.036	.034	.348	.379	.444	.429
	Age	.005	.002	.042	.040	.011	.006	−.005	.006
	Spring	−.031	.024	−.080	−.050	.054	.055	.096	.063
	Summer	−.103	−.029	−.092	−.053	−.052	−.066	−.039	−.075
	Month18	.016	.013	.024	.023	.031	.030	.030	.034
	Month36	−.014	−.005	−.014	−.010	−.010	−.011	−.002	−.012
	Drug NUU	.063	.027	.073	.053	.007	.064	.052	.072
	Drug UU	.033	.086	.092	.117	.223	.198	.223	.225
	Parental Alcohol	−.036	−.051	−.026	−.036	−.082	−.119	−.158	−.135

Std Error	Month	.004	.003	.005	.004	.005	.005	.005	.005
	Female	.044	.032	.055	.040	.093	.075	.099	.086
	Age	.004	.007	.005	.009	.007	.015	.007	.017
	Spring	.036	.017	.042	.038	.053	.040	.051	.045
	Summer	.038	.019	.044	.040	.053	.040	.051	.045
	Month18	.006	.004	.007	.006	.008	.007	.008	.007
	Month36	.005	.003	.006	.006	.009	.007	.009	.007
	Drug NUU	.047	.028	.058	.041	.096	.064	.102	.073
	Drug UU	.066	.036	.080	.062	.112	.080	.116	.089
	Parental Alcohol	.040	.022	.047	.039	.063	.047	.062	.054

P-Value	Month	.014	.000	.001	.000	.000	.000	.000	.000
	Female	.000	.000	.517	.398	.000	.000	.000	.000
	Age	.196	.717	.000	.000	.097	.713	.464	.710
	Spring	.396	.167	.059	.184	.309	.166	.058	.168
	Summer	.007	.133	.037	.179	.330	.100	.446	.096
	Month18	.007	.001	.000	.000	.000	.000	.000	.000
	Month36	.007	.145	.016	.082	.265	.108	.803	.100
	Drug NUU	.177	.335	.204	.198	.938	.320	.608	.321
	Drug UU	.620	.018	.250	.061	.046	.013	.054	.012
	Parental Alcohol	.359	.018	.573	.353	.196	.012	.011	.014

In the saturated model, coefficient estimates for 3=Anxiety and 4=Somatization are close for the predictors that are statistically significant for both outcomes (month, female and month18), as well as for summer, and these estimates are close to the estimated α for cluster 134. This is consistent with the scale parameter estimates γ = .88 for 3=Anxiety and γ ≡ 1 for 4=Somatization. 1=Social Support has γ = .38, so its estimates and standard errors (SEs) are smaller than those for 3=Anxiety.

For outcome 2=Positive Action, age is important; however, age is not significantly associated with outcomes 1, 3 or 4. Conversely, female gender is associated with the clustered outcomes 1, 3 and 4 but is not significantly associated with outcome 2. Thus the difference between the clusters appears to be driven heavily by their relationships to age and female gender.

The standard errors for the COPE model are no larger than the corresponding SEs from the saturated model for every outcome and coefficient except age, and most are smaller. The average ratio of COPE to saturated-model standard errors is .90, a 10% reduction in standard error; ignoring age, the average ratio is .76, a 24% reduction. We expect such gains in general when the clustering is appropriate, because information from several outcomes contributes to estimating the coefficients shared by a cluster of similar outcomes.
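The average SE ratios are simple arithmetic on Table III. The sketch below recomputes them as unweighted means over predictor-outcome pairs (an assumption about how the averages were formed) from the three-decimal entries shown there; because those entries are rounded, the results (about .89 overall and .77 excluding age) differ slightly from the .90 and .76 quoted in the text, which were presumably computed from unrounded values.

```python
from statistics import mean

# (Sat SE, COPE SE) pairs from Table III, outcomes in the order
# Social Support, Positive Action, Anxiety, Somatization.
se = {
    "Month":            [(.004, .003), (.005, .004), (.005, .005), (.005, .005)],
    "Female":           [(.044, .032), (.055, .040), (.093, .075), (.099, .086)],
    "Age":              [(.004, .007), (.005, .009), (.007, .015), (.007, .017)],
    "Spring":           [(.036, .017), (.042, .038), (.053, .040), (.051, .045)],
    "Summer":           [(.038, .019), (.044, .040), (.053, .040), (.051, .045)],
    "Month18":          [(.006, .004), (.007, .006), (.008, .007), (.008, .007)],
    "Month36":          [(.005, .003), (.006, .006), (.009, .007), (.009, .007)],
    "Drug NUU":         [(.047, .028), (.058, .041), (.096, .064), (.102, .073)],
    "Drug UU":          [(.066, .036), (.080, .062), (.112, .080), (.116, .089)],
    "Parental Alcohol": [(.040, .022), (.047, .039), (.063, .047), (.062, .054)],
}

# Ratio of COPE SE to saturated SE for every predictor-outcome pair.
ratios = {name: [cope / sat for sat, cope in pairs] for name, pairs in se.items()}
avg_all = mean(r for rs in ratios.values() for r in rs)
avg_no_age = mean(r for name, rs in ratios.items() if name != "Age" for r in rs)
print(round(avg_all, 2), round(avg_no_age, 2))
```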

5. Discussion

We have described methodology for parameter estimation and model selection for the COPE model using maximum likelihood. Our model can be applied to both longitudinal and cross sectional data. Our original inspiration for this model is the longitudinal data set described in this example. A useful feature of the methodology is that inferences do not depend on the parameterization of the predictors; linear transformations of the predictors lead to the same choices and conclusions. While we use maximum likelihood for estimation, users who prefer could use generalized estimating equations [22], quasi-least squares [23] or other procedures. Like many statistical procedures, our approach is exploratory; a confirmatory approach might use one data set to select a clustering and a second data set to test formally the outcome clustering found in the first. The second data set could come from an independent sample, or it could be created by splitting an original larger sample into an estimation data set and a formal test data set.

While we did not emphasize the point, our model and likelihood-based approach accommodates data that are missing at random [24]. Individual outcomes may be missing at a given time, or all outcomes at a given time may be missing. Missing data are appropriately accounted for in our formulas for estimating the information matrix given in the Appendix.

Lin et al [19] propose a somewhat related random effects model for cross sectional data. Our models differ in several major features. We use the data to choose our clusters; they assume that the outcomes under analysis all belong to a single cluster. In their model, a single covariate has an effect that is constant across outcomes up to a scaling parameter, and all other covariates are assumed to have different effects on different outcomes; within a cluster, we assume that all covariates have similar impacts on the outcomes up to the scaling parameter. Lin et al treat cross sectional data, while we treat longitudinal data. This has a major impact on our approaches to scaling the impacts of the covariates; for example, we conceptualize outcome-specific link functions to connect the linear predictor to each outcome in a cluster. In general models for longitudinal data there is no single marginal variance for an outcome; the marginal variance is a function of time in any non-trivial model with a random time effect. We place our scaling parameters solely inside the linear predictor and estimate them as part of it; they are not related to the covariance parameters. This simplifies the algebra in that, unlike in Lin et al (2000), the covariance parameters are asymptotically independent of the linear predictor parameters for normal error distributions.

ACKNOWLEDGEMENT

The authors thank Mary Jane Rotheram for permission to use the Project TALC data.

Contract/grant sponsor: Weiss was supported by the Center for HIV Identification, Prevention, and Treatment Services, NIH/NIMH; contract/grant number: P30MH58107

APPENDIX

Estimating the covariance matrix for α̂ and γ̂

We calculate the observed information matrix to estimate the covariance matrix for all fixed effect parameters in a K outcome G cluster COPE model with L predictors. We first define notation, then we take the second derivative of the likelihood function to calculate the information matrix, and lastly we use the delta method to calculate the variance of γ̂ α̂.

As before, let y_ijgk denote the k^th outcome in the g^th cluster for subject i at time t_ij, k = 1, …, k_g, g = 1, …, G, $\sum_{g = 1}^{G} k_{g} = K$, i = 1, …, n and j = 1, …, J_i. Let Y_igk = (y_i1gk, …, y_{iJ_igk})′ be the vector of J_i observations for subject i and outcome k in cluster g; then the KJ_i-vector of all observations for subject i is Y_i = (Y_i11′, …, Y_{i1k₁}′, …, Y_iG1′, …, Y_{iGk_G}′)′. Let x_ijl be the l^th predictor value for subject i at time t_ij, l = 1, …, L, x_ij = (x_ij1, …, x_ijL)′ and define X_i = (x_i1, …, x_{iJ_i})′. Including an intercept for each outcome, we set up the KJ_i × K(L + 1) predictor matrix M_i = I_K ⊗ (1, X_i), in which 1 is a J_i-vector of 1s, I_K is the K × K identity matrix and ⊗ denotes the Kronecker (direct) product. We split M_i into 2K sets of columns corresponding to the intercept 1 and the predictors X_i for each of the K outcomes, $M_{i} = (M_{i 1}^{(1, 1)}, M_{i 2}^{(1, 1)}, \dots, M_{i 1}^{(1, k_{1})}, M_{i 2}^{(1, k_{1})}, \dots, M_{i 1}^{(G, 1)}, M_{i 2}^{(G, 1)}, \dots, M_{i 1}^{(G, k_{G})}, M_{i 2}^{(G, k_{G})})$. Each $M_{i 1}^{(g, k)}$ is a column vector corresponding to the intercept for the k^th outcome in cluster g, while $M_{i 2}^{(g, k)}$ is an L-column matrix containing the predictors for the k^th outcome in cluster g.
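The Kronecker structure of M_i and the stacking of ϕ can be checked numerically. In this NumPy sketch (helper names and values hypothetical), outcomes are stacked cluster by cluster, so cluster 0 holds the first two stacked outcomes; the product M_i ϕ reproduces γ_0gk + γ_1gk X_i α_g for each outcome, and M_i has K(L + 1) columns.

```python
import numpy as np


def design_matrix(X_i, K):
    """M_i = I_K (Kronecker) (1, X_i): K*J_i rows and K*(L + 1) columns."""
    J_i = X_i.shape[0]
    return np.kron(np.eye(K), np.column_stack([np.ones(J_i), X_i]))


def stack_phi(clusters, gamma0, gamma1, alpha):
    """Stack phi_gk = (gamma_0gk, gamma_1gk * alpha_g')' cluster by cluster."""
    blocks = [np.concatenate([[gamma0[k]], gamma1[k] * alpha[g]])
              for g, members in enumerate(clusters) for k in members]
    return np.concatenate(blocks)


# Hypothetical subject: J_i = 4 visits, L = 2 covariates, K = 3 outcomes
# stacked cluster by cluster (outcomes 0 and 1 in cluster 0, outcome 2 alone).
rng = np.random.default_rng(1)
X_i = rng.normal(size=(4, 2))
clusters = [[0, 1], [2]]
gamma0 = np.array([1.0, -0.5, 2.0])
gamma1 = np.array([0.6, 1.0, 1.0])
alpha = [np.array([0.3, -0.1]), np.array([0.8, 0.2])]

M_i = design_matrix(X_i, K=3)
mean_i = (M_i @ stack_phi(clusters, gamma0, gamma1, alpha)).reshape(3, 4)
print(M_i.shape)  # (12, 9)
```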

Model (3) can now be written in matrix form as

$$Y_i = M_i\phi + \varepsilon_i. \qquad (6)$$

The coefficient vector ϕ is a K(L + 1)-vector, $ϕ = (ϕ_{11}^{'}, \dots, ϕ_{1 k_{1}}^{'}, \dots, ϕ_{G 1}^{'}, \dots, ϕ_{G k_{G}}^{'})'$, where $ϕ_{g k} = (γ_{0 g k}, γ_{1 g k} α_{g}^{'})'$; γ_sgk is the intercept for s = 0 and the scale parameter for s = 1 for the k^th outcome in cluster g; α_g = (α_g1, …, α_gL)′ is an L-vector of fixed effect coefficients for cluster g; and α_gl is the regression coefficient for the l^th predictor in cluster g.

We have K intercepts γ₀ = (γ₀₁₁, …, γ_01k₁, …, γ_0G1, …, γ_{0Gk_G})′, K scaling parameters γ₁ = (γ₁₁₁, …, γ_11k₁, …, γ_1G1, …, γ_{1Gk_G})′, of which G are fixed to 1 and K − G are to be estimated, and GL coefficients $α = (α_{1}^{'}, \dots, α_{G}^{'})'$. In each cluster we fix the scale parameter of the last outcome to 1, γ_{1gk_g} ≡ 1, and let $γ_{1}^{*}$ be the (K − G)-subvector of γ₁ that omits the G fixed γs. The covariance matrix of $({γ̂}_{0}, {γ̂}_{1}^{*}, α̂)$ is a (2K − G + GL) × (2K − G + GL) matrix.

The residual ε_i = (ε_i11, ε_i12, …, ε_i1k₁, …, ε_iG1, …, ε_{iGk_G})′ is a KJ_i-vector, and we model ε_i ~ N(0, V_i(θ)), where θ is a vector of unknown parameters for the components of the covariance matrix. Up to an additive constant, the log likelihood function of (γ₀, γ₁, α, θ) is

$$\ell = -\frac{1}{2}\sum_{i=1}^{n} \log\left|U_i^{*} V_i U_i^{*\prime}\right| - \frac{1}{2}\sum_{i=1}^{n} (Y_i - M_i\phi)' U_i V_i^{-1} U_i (Y_i - M_i\phi) \qquad (7)$$

where U_i is a KJ_i × KJ_i diagonal matrix whose diagonal element corresponding to y_ijgk equals 1 when y_ijgk is observed and 0 when it is missing, and $U_{i}^{*}$ is U_i with its all-zero rows omitted. The second derivative of the log likelihood is block diagonal, with θ orthogonal to (γ₀, γ₁, α), so we omit calculations involving θ from the information matrix [25].

We differentiate (7) twice with respect to (γ₀, γ₁, α). We lay out the estimated information matrix Î of (γ₀, γ₁, α) as a 3 × 3 grid of blocks, where Î_pq is the second derivative of the log likelihood with respect to parameter groups p and q, with p, q = 1, 2, 3 indexing γ₀, γ₁ and α respectively, and with parameter estimates substituted for the unknown parameters. Because the information matrix is symmetric, we need only give the blocks on and above the diagonal:

$$
\begin{aligned}
\hat I_{11}[(g,k),(g',k')] &= \frac{\partial^2 \ell}{\partial\gamma_{0gk}\,\partial\gamma_{0g'k'}} = -\sum_{i=1}^{n} M_{i1}^{(g,k)\prime} U_i \hat V_i^{-1} U_i M_{i1}^{(g',k')},\\
\hat I_{12}[(g,k),(g',k')] &= \frac{\partial^2 \ell}{\partial\gamma_{0gk}\,\partial\gamma_{1g'k'}} = -\sum_{i=1}^{n} M_{i1}^{(g,k)\prime} U_i \hat V_i^{-1} U_i \bigl(M_{i2}^{(g',k')}\hat\alpha_{g'}\bigr),\\
\hat I_{13}[(g,k),g'] &= \frac{\partial^2 \ell}{\partial\gamma_{0gk}\,\partial\alpha_{g'}} = -\sum_{i=1}^{n} M_{i1}^{(g,k)\prime} U_i \hat V_i^{-1} U_i \Bigl(\sum_{d=1}^{k_{g'}} \hat\gamma_{1g'd} M_{i2}^{(g',d)}\Bigr),\\
\hat I_{22}[(g,k),(g',k')] &= \frac{\partial^2 \ell}{\partial\gamma_{1gk}\,\partial\gamma_{1g'k'}} = -\sum_{i=1}^{n} \bigl(M_{i2}^{(g,k)}\hat\alpha_{g}\bigr)' U_i \hat V_i^{-1} U_i \bigl(M_{i2}^{(g',k')}\hat\alpha_{g'}\bigr),\\
\hat I_{23}[(g,k),g'] &= \frac{\partial^2 \ell}{\partial\gamma_{1gk}\,\partial\alpha_{g'}} = -\sum_{i=1}^{n} \bigl(M_{i2}^{(g,k)}\hat\alpha_{g}\bigr)' U_i \hat V_i^{-1} U_i \Bigl(\sum_{d=1}^{k_{g'}} \hat\gamma_{1g'd} M_{i2}^{(g',d)}\Bigr) + \delta(g=g') \sum_{i=1}^{n} (Y_i - M_i\hat\phi)' U_i \hat V_i^{-1} U_i M_{i2}^{(g,k)},
\end{aligned}
$$

and

$$
\hat I_{33}[g,g'] = \frac{\partial^2 \ell}{\partial\alpha_{g}\,\partial\alpha_{g'}} = -\sum_{i=1}^{n} \Bigl(\sum_{k=1}^{k_{g}} \hat\gamma_{1gk} M_{i2}^{(g,k)}\Bigr)' U_i \hat V_i^{-1} U_i \Bigl(\sum_{d=1}^{k_{g'}} \hat\gamma_{1g'd} M_{i2}^{(g',d)}\Bigr),
$$

where $\hat V_i = V_i(\theta)|_{\theta=\hat\theta}$, and δ(g = g′) = 1 if g = g′ and 0 otherwise. Let $k_g^* = 0$ for g = 1 and $k_g^* = \sum_{h=1}^{g-1} k_h$ for g ≥ 2 be the cumulative number of outcomes in the clusters before cluster g. Then, for p ≤ q ≤ 2, Î_pq((g, k), (g′, k′)) is the element in row $k_g^* + k$ and column $k_{g'}^* + k'$ of the K × K block Î_pq; for p ≤ 2, Î_p3((g, k), g′) is the 1 × L segment of row $k_g^* + k$ of block Î_p3 lying in the columns for α_g′, namely columns (g′ − 1)L + 1 through g′L; and Î₃₃(g, g′) is the (g, g′)th L × L sub-block of Î₃₃. The covariance matrix of the MLE $(\hat\gamma_0, \hat\gamma_1^*, \hat\alpha)$ is the inverse of −Î after deleting the G rows and G columns corresponding to the fixed scale parameters $\gamma_{1gk_g}$, g = 1, …, G.
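The index bookkeeping in the preceding paragraph is easy to get wrong in code. A minimal sketch, with hypothetical helper names and 0-based indices (so the (k_g^* + k)th row becomes row k_g^* + k − 1), assuming the full matrix is ordered (γ₀, γ₁, α) with outcomes in the same (g, k) order used for Y_i; the matrix entries are placeholders, and the final inversion of −Î is left to any linear-algebra routine.

```python
from typing import List


def cluster_offsets(cluster_sizes: List[int]) -> List[int]:
    """k_g^*: number of outcomes in clusters 1, ..., g-1 (stored at list position g-1)."""
    offsets: List[int] = []
    total = 0
    for size in cluster_sizes:
        offsets.append(total)
        total += size
    return offsets


def outcome_row(cluster_sizes: List[int], g: int, k: int) -> int:
    """0-based row k_g^* + k - 1 of outcome (g, k) in the K x K blocks; g, k are 1-based."""
    return cluster_offsets(cluster_sizes)[g - 1] + k - 1


def alpha_columns(g: int, n_pred: int) -> range:
    """0-based columns of alpha_g inside a block I_p3 (L columns per cluster)."""
    return range((g - 1) * n_pred, g * n_pred)


def drop_fixed_scales(info: List[List[float]], cluster_sizes: List[int],
                      n_pred: int) -> List[List[float]]:
    """Delete the rows/columns of the full (gamma_0, gamma_1, alpha) matrix that belong
    to the fixed scales gamma_{1 g k_g}; the result has 2K - G + GL rows."""
    n_out = sum(cluster_sizes)
    fixed = {n_out + outcome_row(cluster_sizes, g, size)
             for g, size in enumerate(cluster_sizes, start=1)}
    keep = [p for p in range(len(info)) if p not in fixed]
    return [[info[r][c] for c in keep] for r in keep]


# Hypothetical layout: clusters of sizes 2 and 3 (K = 5, G = 2), L = 2 predictors.
sizes, n_pred = [2, 3], 2
dim = 2 * sum(sizes) + len(sizes) * n_pred                        # 14 = 2K + GL
full = [[100.0 * r + c for c in range(dim)] for r in range(dim)]  # placeholder entries
reduced = drop_fixed_scales(full, sizes, n_pred)
print(outcome_row(sizes, 2, 1), list(alpha_columns(2, n_pred)), len(reduced))  # 2 [2, 3] 12
```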

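We are also interested in inference on the products γ̂_1gk α̂_g, g = 1, …, G and k = 1, …, k_g − 1, the estimated predictor effects for the outcomes whose scale parameters are not fixed. Define h ≡ h(α_g, γ_1gk) = γ_1gk α_g. By the delta method [26, page 45], in large samples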

$$h(\hat\alpha_g, \hat\gamma_{1gk}) = \hat\gamma_{1gk}\hat\alpha_g \;\overset{\cdot}{\sim}\; N\bigl(\gamma_{1gk}\alpha_g,\ \dot h \hat\Psi \dot h'\bigr), \qquad (8)$$

where Ψ̂ is the estimated covariance matrix of (γ̂_1gk, α̂_g), ordered with γ̂_1gk first, and

$$\dot h_{L\times(L+1)} = \left.\left(\frac{\partial h}{\partial \gamma_{1gk}}, \frac{\partial h}{\partial \alpha_g}\right)\right|_{\hat\alpha_g,\,\hat\gamma_{1gk}} = \bigl(\hat\alpha_g,\ \hat\gamma_{1gk} I_L\bigr). \qquad (9)$$
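The full delta-method covariance ḣΨ̂ḣ′ can be formed directly from (9). A minimal pure-Python sketch; the function name and the estimates are hypothetical, and `psi_hat` is assumed to be ordered with γ̂_1gk first, followed by α̂_g.

```python
from typing import List

Matrix = List[List[float]]


def delta_method_cov(gamma_hat: float, alpha_hat: List[float], psi_hat: Matrix) -> Matrix:
    """Approximate covariance h_dot Psi_hat h_dot' of the L-vector gamma_hat * alpha_hat.

    psi_hat is the (L+1) x (L+1) covariance of (gamma_hat, alpha_hat), gamma first.
    """
    n_pred = len(alpha_hat)
    size = n_pred + 1
    # h_dot = (alpha_hat, gamma_hat * I_L): row i is the gradient of gamma * alpha_i
    # with respect to (gamma, alpha_1, ..., alpha_L).
    h_dot = [[alpha_hat[i]] + [gamma_hat if j == i else 0.0 for j in range(n_pred)]
             for i in range(n_pred)]
    # h_dot @ psi_hat, then (h_dot @ psi_hat) @ h_dot'
    h_psi = [[sum(h_dot[i][p] * psi_hat[p][q] for p in range(size)) for q in range(size)]
             for i in range(n_pred)]
    return [[sum(h_psi[i][q] * h_dot[j][q] for q in range(size)) for j in range(n_pred)]
            for i in range(n_pred)]


# Hypothetical estimates for one outcome (g, k) with L = 2 predictors.
psi_hat = [[0.04, 0.01, 0.00],
           [0.01, 0.09, 0.02],
           [0.00, 0.02, 0.16]]
cov = delta_method_cov(2.0, [1.0, -0.5], psi_hat)
print(cov)  # approximately [[0.44, 0.05], [0.05, 0.65]]
```

The off-diagonal entries give the approximate covariances between the scaled coefficients for different predictors, which the elementwise formula for a single coefficient does not show.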

Partition

$$\hat\Psi = \begin{pmatrix} \hat\Psi_{\gamma\gamma} & \hat\Psi_{\gamma\alpha} \\ \hat\Psi_{\alpha\gamma} & \hat\Psi_{\alpha\alpha} \end{pmatrix}. \qquad (10)$$

Substituting ḣ from (9) into ḣΨ̂ḣ′ gives, for the l^th element,

$$\operatorname{Var}(\hat\gamma_{1gk}\hat\alpha_{gl}) = \hat\alpha_{gl}^{2}\,\hat\Psi_{\gamma\gamma} + 2\hat\gamma_{1gk}\hat\alpha_{gl}\,\hat\Psi_{\alpha\gamma,l} + \hat\gamma_{1gk}^{2}\,\hat\Psi_{\alpha\alpha,ll}, \qquad (11)$$

where Ψ̂_αα,ll is the l^th diagonal element of Ψ̂_αα and Ψ̂_αγ,l is the l^th element of Ψ̂_αγ.
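Equation (11) translates directly into code for a single coefficient. A minimal sketch with hypothetical function names and made-up estimates; it also forms a large-sample Wald interval from the normal approximation (8). The 95% level and the use of Python's `statistics.NormalDist` for the normal quantile are illustrative choices, not part of the original method.

```python
import math
from statistics import NormalDist
from typing import Tuple


def var_scaled_coef(gamma_hat: float, alpha_hat_l: float, psi_gg: float,
                    psi_ag_l: float, psi_aa_ll: float) -> float:
    """Equation (11): delta-method variance of gamma_hat_1gk * alpha_hat_gl."""
    return (alpha_hat_l ** 2 * psi_gg
            + 2.0 * gamma_hat * alpha_hat_l * psi_ag_l
            + gamma_hat ** 2 * psi_aa_ll)


def wald_interval(gamma_hat: float, alpha_hat_l: float, variance: float,
                  level: float = 0.95) -> Tuple[float, float]:
    """Large-sample interval for gamma_1gk * alpha_gl from the normal approximation (8)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimate = gamma_hat * alpha_hat_l
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical values: gamma_hat = 2.0, alpha_hat_gl = 1.0, Psi_gg = 0.04,
# Psi_ag,l = 0.01, Psi_aa,ll = 0.09.
var = var_scaled_coef(2.0, 1.0, 0.04, 0.01, 0.09)   # 0.04 + 0.04 + 0.36 = 0.44
print(var, wald_interval(2.0, 1.0, var))
```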

REFERENCES

1. Weiss RE. Modeling Longitudinal Data. New York: Springer-Verlag; 2005.
2. Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. New York: Springer-Verlag; 2005.
3. Diggle PJ, Heagerty PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd Ed. New York: Oxford University Press; 2002.
4. Davis CS. Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag; 2002.
5. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001.
6. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag; 2000.
7. Brown H, Prescott R. Applied Mixed Models in Medicine. New York: Wiley; 1999.
8. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. The SAS System for Mixed Models. Cary, NC: SAS Institute Inc; 1996.
9. Chaganty NR, Naik DN. Analysis of multivariate longitudinal data using quasi-least squares. Journal of Statistical Planning and Inference. 2002;103:421–436.
10. Beckett LA, Tancredi DJ, Wilson RS. Multivariate longitudinal models for complex change processes. Statistics in Medicine. 2004;23:231–239. doi: 10.1002/sim.1712.
11. Nummi T, Mottonen J. On the analysis of multivariate growth curve. Metrika. 2000;52:77–89.
12. Mickey RM, Shema SJ, Vacek PM, Bell DY. Analysis of multiple outcome variables measured longitudinally. Computational Statistics and Data Analysis. 1994;17:17–33.
13. Dubin JA, Müller HG. Dynamical correlation for multivariate longitudinal data. Journal of the American Statistical Association. 2005;100(471):872–881.
14. Travison TG, Brookmeyer R. Global effects estimation for multidimensional outcomes. Statistics in Medicine. 2007;26:4845–4859. doi: 10.1002/sim.2983.
15. Gray SM, Brookmeyer R. Estimating a treatment effect from multidimensional longitudinal data. Biometrics. 1998;54:976–988.
16. Gray SM, Brookmeyer R. Multidimensional longitudinal data: Estimating a treatment effect from continuous, discrete or time-to-event response variables. Journal of the American Statistical Association. 2000;95:396–406.
17. O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
18. Sammel M, Ryan LM, Legler JM. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society, Series B. 1997;59:667–678.
19. Lin X, Ryan L, Sammel M, Zhang D, Padungtod C, Xu X. A scaled linear model for multiple outcomes. Biometrics. 2000;56:593–601. doi: 10.1111/j.0006-341x.2000.00593.x.
20. Rota GC. The number of partitions of a set. American Mathematical Monthly. 1964;71(5):498–504.
21. Young PT. Congruences for Bernoulli, Euler, and Stirling numbers. Journal of Number Theory. 1999;78:204–227.
22. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
23. Chaganty NR. An alternative approach to the analysis of longitudinal data via generalized estimating equations. Journal of Statistical Planning and Inference. 1997;63:39–54.
24. Little RJA, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2002.
25. Lange KL, Little RJA, Taylor JMG. Robust statistical modeling using the t distribution. Journal of the American Statistical Association. 1989;84:881–896.
26. Ferguson TS. A Course in Large Sample Theory. Florida: CRC Press; 2002.
