SUMMARY
Multivariate outcomes measured longitudinally are common in medicine, public health, psychology and sociology. The typical (saturated) longitudinal multivariate regression model has a separate set of regression coefficients for each outcome. However, multivariate outcomes are often quite similar and many outcomes can be expected to respond similarly to changes in covariate values. Given a set of outcomes likely to share common covariate effects, we propose the Clustered Outcome COmmon Predictor Effect (COPE) model and offer a two-step iterative algorithm to fit the model using available software for univariate longitudinal data. Outcomes that share predictor effects need not be chosen a priori; we propose model selection tools to let the data select outcome clusters. We apply the proposed methods to psychometric data from adolescent children of HIV+ parents.
Keywords: Clustering, Common Effect, Dimension Reduction, Hierarchical Linear Model, HIV, Multivariate Regression, Model Selection
1. Introduction
Longitudinal studies in medicine, public health, psychology and sociology measure multiple outcomes repeatedly over time. Multivariate longitudinal outcomes are often highly correlated and may be affected similarly by covariates. We might find that as the covariates X1, X2 and X3 increase, so do outcomes Y1, Y2 and Y3. Rather than fit separate models with separate linear predictors for all three outcomes, we propose that outcomes with similar relationships to covariates be combined into a single model with a single linear predictor that predicts all outcomes.
Numerous multivariate longitudinal data models have been proposed in recent years [1, 2, 3, 4, 5, 6, 7, 8, 9]. For univariate longitudinal data, main interest lies in covariate effects. For multivariate outcomes, interest often lies in the covariances that measure the interrelation among outcomes [10, 11, 12, 13] and testing global effects and estimating common effects on multivariate outcomes [14, 15, 16]. O’Brien [17] proposed a global test for a common dose effect for multiple outcomes. Sammel, Ryan, and Legler [18] proposed a latent variable model where the multiple outcomes were summarized by a single latent variable and a univariate linear regression model was used to assess common covariate effects. Lin and coauthors [19] considered a scaled linear mixed model for multiple outcomes that standardized each outcome by its standard deviation to estimate the common effect of a single covariate of interest. In our paper, we are interested in a middle area where we analyze multiple outcomes, yet primary interest lies in the relationship of outcomes to predictors.
A linear predictor is a linear combination of covariates. We define two or more outcomes as similar if they relate to covariates similarly, by which we mean that a single linear predictor is all that is needed to predict the similar outcomes. Dissimilar outcomes need different linear predictors. Similar outcomes may be identified a priori from expert knowledge or, preferably in many circumstances, we can let the data determine which outcomes are similar. Similar outcomes belong to a single cluster of outcomes in our model.
We define the Clustered Outcome COmmon Predictor Effect (COPE; CO stands for both Clustered Outcome and COmmon) model for multivariate outcomes as a model where similar outcomes share a single linear predictor while dissimilar outcomes have different linear predictors. Outcome-specific link functions connect the several similar outcomes to the one linear predictor.
The saturated model is the usual multivariate model where each outcome has its own outcome-specific coefficients. The COPE model will have greater efficiency in parameter estimation than the saturated model. Suppose there are 5 outcomes and 10 predictors. The saturated model then estimates 5 × 10 = 50 fixed effect parameters and 5 intercepts for a total of 55 parameters. If all outcomes are similar, we need to estimate 19 parameters in total: 10 fixed effects, 5 intercepts and 4 scale parameters. To avoid identifiability problems, we force one of the scale parameters equal to one in each cluster. If outcomes are grouped into two clusters, we need to estimate 28 parameters: 2 × 10 = 20 fixed effects parameters, 5 intercepts and 3 scale parameters. In general, with K outcomes, G clusters and L covariates not including the intercept, there are K intercepts, K − G scale parameters and GL coefficients for a total of 2K + G(L − 1) parameters, which can be much less than the K(L + 1) coefficients for the saturated model, particularly when G is small.
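The parameter counts above can be checked with a few lines of Python. This is only an illustrative sketch; the function names `saturated_params` and `cope_params` are hypothetical and simply encode the formulas K(L + 1) and 2K + G(L − 1) from the text.

```python
def saturated_params(K, L):
    """K intercepts plus K * L outcome-specific coefficients."""
    return K * (L + 1)


def cope_params(K, G, L):
    """K intercepts, K - G free scale parameters and G * L coefficients."""
    if not 1 <= G <= K:
        raise ValueError("need 1 <= G <= K")
    return K + (K - G) + G * L


# Counts for the 5-outcome, 10-predictor example in the text.
for G in (1, 2, 5):
    print(G, cope_params(5, G, 10), saturated_params(5, 10))
```

With G = K every outcome is its own cluster, no scale parameter is free, and the COPE count equals the saturated count.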
The COPE model increases scientific understanding of the outcomes by identifying sets of outcomes that change similarly as covariates change and it eases the difficulty of interpreting large numbers of parameters. If a covariate xl is a significant predictor for the gth cluster then that covariate is a significant predictor for all outcomes in the cluster and this takes only a single hypothesis test.
This paper is motivated by a study of parents living with Human Immunodeficiency Virus (HIV) and their adolescent children. HIV+ parents were recruited from New York City’s Division of AIDS Services along with their adolescent children and were followed for up to 7 years. Follow-up interviews were conducted every three months in the first two years and every six months thereafter. Age, female (coded yes=1/no=0), time of visit, parental drug status, parental alcohol use, season, brief symptom inventory (BSI) sub-scales, of which we analyze Anxiety and Somatization, and several coping style variables, of which we analyze Positive Action and Social Support, were recorded at each interview along with many other variables. We find the best way to cluster the outcomes and estimate common predictor (female, age, parental drug status, season and so on) effects for each outcome cluster.
This paper is organized as follows. Section 2 develops the Clustered Outcome COmmon Predictor Effects (COPE) model for the situation where all outcomes belong to one cluster, then more generally for G clusters. Section 3 explains the two-step iterative algorithm to fit the model and the model selection tools for specifying the outcome clusters. Section 4 applies the method to the data on adolescent children of HIV+ parents. The paper closes with a discussion in Section 5.
2. The Clustered Outcome COmmon Predictor Effects (COPE) model
We observe K continuous outcomes and L covariates on subject i, i = 1, …, n, at times tij, j = 1, …, Ji. Let yijk denote the kth outcome for subject i at time tij, k = 1, …,K, and define yij = (yij1, …, yijK)′; xijl is the value of the lth predictor for subject i at time j, l = 1, …,L; xij = (xij1, …, xijL)′ and xij does not include an intercept. Covariates are the same for all outcomes. The saturated multiple outcomes longitudinal model is
$$y_{ijk} = \gamma_{0k} + x_{ij}'\alpha_k + \varepsilon_{ijk}, \tag{1}$$
where γ0k is the intercept for outcome k, αk is an L × 1 vector of fixed effect coefficients for the kth outcome, εijk are random errors, εij = (εij1, …, εijK)′, εi = (ε′i1, …, ε′iJi)′, εi ~ N(0,Σ(θ)), Σ(θ) models covariances among the multiple outcomes and repeated measures, and θ is a vector of unknown parameters for the components of the covariance matrix.
First we specify a single cluster COPE model, meaning that a single linear predictor x′ijα predicts all outcomes. However, outcomes are measured on different scales, so each outcome needs its own link function to connect it with the linear predictor. The model is
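$$y_{ijk} = u_{ijk} + \varepsilon_{ijk}, \qquad u_{ijk} = \psi_k(x_{ij}'\alpha) = \gamma_{0k} + \gamma_{1k}\, x_{ij}'\alpha, \tag{2}$$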
where uijk is the expectation of the kth outcome of subject i at time tij and the link function ψk(·), here assumed linear, links the mean to the linear predictor using intercept γ0k and scale parameter γ1k. In (2), α = (α1, …, αL)′ is the L × 1 vector of fixed common predictor effects.
In practice, each outcome will belong to one of G clusters with outcomes in each cluster sharing a common predictor effect. Here we assume the clustering is known; model selection tools to identify similarity among outcomes are discussed in the next section. There is an additional nested subscript compared to (2); we introduce subscript g to index clusters, with k now indexing outcomes within cluster. In cluster g, g = 1, …, G, there are kg ≥ 1 outcomes, with ∑g kg = K. The kth outcome in cluster g for subject i at time tij is yijgk and we model
$$y_{ijgk} = \gamma_{0gk} + \gamma_{1gk}\, x_{ij}'\alpha_g + \varepsilon_{ijgk}, \tag{3}$$
where γ0gk is the intercept for the kth outcome, k = 1, …, kg in the gth cluster, g = 1, …,G, γ1gk is the scale parameter for the kth outcome in the gth cluster, αg is the L × 1 vector of fixed common effects for all outcomes in the gth cluster, αg = (αg1, …, αgL)′ and αgl is the regression coefficient for the lth predictor in the gth cluster.
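To make the structure of model (3) concrete, the following sketch computes the mean of every outcome for one covariate vector. The data layout (a list of cluster dictionaries) and the name `cope_mean` are assumptions made for illustration only; the numbers in the usage lines are made up.

```python
def cope_mean(x, clusters):
    """Mean of each outcome under model (3) for one covariate vector x.

    clusters: list of dicts with keys
      'alpha'    -> list of L common coefficients for the cluster
      'outcomes' -> {outcome name: (gamma0, gamma1)} link parameters
    """
    means = {}
    for cluster in clusters:
        # One linear predictor per cluster, shared by all of its outcomes.
        eta = sum(a * v for a, v in zip(cluster["alpha"], x))
        for name, (gamma0, gamma1) in cluster["outcomes"].items():
            means[name] = gamma0 + gamma1 * eta
    return means


# Illustrative values only: two clusters, last scale in each cluster set to 1.
clusters = [
    {"alpha": [0.5, 0.25], "outcomes": {"A": (1.0, 0.4), "B": (0.0, 1.0)}},
    {"alpha": [2.0, 0.0], "outcomes": {"C": (3.0, 1.0)}},
]
print(cope_mean([1.0, 2.0], clusters))
```

Outcomes in the same cluster differ only through their intercept and scale, which is what allows a single test on αg to speak for the whole cluster.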
3. Model fitting and selection
We separate estimation into two parts, (i) estimation given a set of clusters and (ii) identification of a best model, that is, identification of the best way to cluster outcomes.
3.1. Estimation given a set of clusters
Given a clustering of outcomes, we estimate parameters (γ, α, θ). The link function parameters γ consist of K intercepts γ0 and K scaling parameters γ1, where γs = (γs11, …, γs1k1, …, γsG1, …, γsGkG)′, s = 0, 1. The fixed effects parameters are α = (α′1, …, α′G)′, where the regression coefficients in the gth cluster are αg = (αg1, …, αgL)′, g = 1, …,G.
We develop a two-step iterative algorithm for fitting (3) using existing algorithms for maximum likelihood fitting of multivariate longitudinal data given a particular clustering of the outcomes. We fit (3) by first conditioning on γ and maximizing the likelihood with respect to (α, θ). Next we condition on α and maximize the likelihood with respect to (γ, θ). We iterate until convergence by repeatedly calling existing univariate longitudinal statistical software, in our case the MIXED procedure in SAS (SAS Institute, Cary, NC).
To start the algorithm, we first obtain an initial value α̂(0) for α by fitting the usual multivariate longitudinal linear model yijgk = γ0g + x′ijαg + εijgk with G clusters but with a simplified link function where the intercepts γ0gk ≡ γ0g are all equal and all γ1gk ≡ 1.
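Suppose we have executed t − 1 iterations and are now starting the tth iteration. The steps are

- (A) Update link. Define $z_{ijg}^{(t)} = x_{ij}'\hat\alpha_g^{(t-1)}$. Estimate $\hat\gamma_{0gk}^{(t)}$ and $\hat\gamma_{1gk}^{(t)}$ by fitting the multivariate longitudinal linear model
  $$y_{ijgk} = \gamma_{0gk} + \gamma_{1gk}\, z_{ijg}^{(t)} + \varepsilon_{ijgk}, \tag{4}$$
  holding the common effects fixed at $\hat\alpha_g^{(t-1)}$. In each cluster, to avoid a lack of identifiability, we force one scale parameter equal to 1. Without loss of generality, we set the scale parameter for the last outcome in each cluster to be 1, where last is defined alphabetically or with other arbitrary ordering.
- (B) Update the common α̂g. Define $y_{ijgk}^{*} = y_{ijgk} - \hat\gamma_{0gk}^{(t)}$ and $x_{ijgk}^{*} = \hat\gamma_{1gk}^{(t)}\, x_{ij}$. Estimate $\hat\alpha_g^{(t)}$ by fitting the multivariate longitudinal model
  $$y_{ijgk}^{*} = x_{ijgk}^{*\prime}\alpha_g + \varepsilon_{ijgk} \tag{5}$$
  with no intercept.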
The maximized likelihood does not decrease after any iteration or after any step within an iteration. In calling SAS Proc Mixed, we update the variance parameters θ at each step of the algorithm. Many methods for checking convergence are possible. We record the maximized likelihood and repeat steps (A) and (B) until the difference of maximized likelihoods in two consecutive iterations is less than a pre-specified small number.
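The alternating structure of the algorithm can be sketched in plain Python for a single cluster. This sketch is not the SAS implementation described above: it assumes independent, equal-variance errors, so each step reduces to ordinary least squares and the residual sum of squares replaces the likelihood in the convergence check. All function names (`solve`, `ols`, `dot`, `fit_single_cluster`) are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_single_cluster(X, Y, tol=1e-14, max_iter=200):
    """X: n covariate rows without intercept; Y: K outcome columns of length n."""
    n, K = len(X), len(Y)
    # Start: pooled fit with a common intercept and all scales equal to 1.
    pooled = ols([[1.0] + row for _ in range(K) for row in X],
                 [v for col in Y for v in col])
    alpha, prev_rss = pooled[1:], float("inf")
    for _ in range(max_iter):
        # Step (A): update the link parameters given the current alpha.
        z = [dot(row, alpha) for row in X]
        gam = [ols([[1.0, zi] for zi in z], Y[k]) for k in range(K - 1)]
        # Last outcome: scale fixed at 1, so only its intercept is estimated.
        gam.append([sum(y - zi for y, zi in zip(Y[K - 1], z)) / n, 1.0])
        # Step (B): update alpha given the link parameters, with no intercept.
        Xs = [[gam[k][1] * v for v in row] for k in range(K) for row in X]
        ys = [Y[k][i] - gam[k][0] for k in range(K) for i in range(n)]
        alpha = ols(Xs, ys)
        rss = sum((yi - dot(row, alpha)) ** 2 for row, yi in zip(Xs, ys))
        if prev_rss - rss < tol:
            break
        prev_rss = rss
    return [g[0] for g in gam], [g[1] for g in gam], alpha
```

A faithful implementation would replace the two `ols` calls with mixed-model fits that also update the covariance parameters θ, and would handle G clusters by stacking the cluster-specific designs.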
We calculate standard errors of the estimated α and γ as the square roots of the diagonal elements of the negative inverse of the observed information matrix. We do not use the standard errors output by either step of our algorithm, because each step treats the parameters estimated in the other step as fixed. Formulas are given in the Appendix.
3.2. Choosing a best set of clusters
A best partition of outcomes into disjoint clusters can be determined by the data. We look for a clustering of similar outcomes that gives the largest likelihood and that is not significantly worse than the saturated model. For K outcomes, there are b(K) ways to partition the K outcomes, where b(K) is the Bell number [20]. There are b(4) = 15 candidate models for four outcomes. We use commas to separate clusters within a model and semicolons to separate models, and use the numbers 1–4 to represent the four outcomes: the 15 models are 1,2,3,4; 12,3,4; 13,2,4; 14,2,3; 1,23,4; 1,24,3; 1,2,34; 123,4; 124,3; 12,34; 134,2; 13,24; 14,23; 1,234; 1234. Model 1,2,3,4 is the saturated model where each outcome is its own cluster. Model 1234 is the single cluster COPE model where all outcomes are assumed similar to each other. The other 13 models have 2 or 3 clusters.
The number of models is $b(K) = \sum_{r=1}^{K} a(K, r)$, where a(K, r), the Stirling number of the second kind, is the number of ways to partition K outcomes into r clusters. Based on the recursion a(n + 1, r) = r × a(n, r) + a(n, r − 1) [21], knowing all partitions of n outcomes into r clusters and into r − 1 clusters, we can generate all partitions of n + 1 outcomes into r clusters by induction. Either we have partitioned n outcomes into r clusters and the (n + 1)st outcome goes into one of the r existing clusters of each of the a(n, r) partitions, or we have partitioned n outcomes into r − 1 clusters and the (n + 1)st outcome becomes a new rth cluster, added to the r − 1 clusters of each of the a(n, r − 1) partitions. We illustrate model choice using AIC backed up by formal hypothesis testing against the saturated model.
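The enumeration of candidate clusterings follows directly from the recursion. The sketch below generates every partition of a list of outcomes and counts them with Stirling numbers of the second kind; the function names are hypothetical and chosen for illustration.

```python
def partitions(items):
    """Yield every partition of items as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or let it start a new cluster of its own.
        yield [[first]] + part


def stirling2(n, r):
    """Number of ways to partition n outcomes into r non-empty clusters."""
    if n == r:
        return 1
    if r == 0 or r > n:
        return 0
    return r * stirling2(n - 1, r) + stirling2(n - 1, r - 1)


def bell(n):
    """Total number of partitions of n outcomes."""
    return sum(stirling2(n, r) for r in range(n + 1))


models = list(partitions([1, 2, 3, 4]))
print(len(models), bell(4))
```

Because the Bell numbers grow quickly (b(5) = 52, b(6) = 203), exhaustive fitting of all candidate models is only practical for a modest number of outcomes.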
4. Application
We analyze data on the psychometric outcomes of adolescent children of HIV+ parents introduced in section 1. Adolescents’ data were collected every three months in the first two years and every six months thereafter for a total of up to 7 years. We analyze four outcomes, two are coping style subscales: 1=Social Support and 2=Positive Action, while 3=Anxiety and 4=Somatization, are BSI subscales. We number the variables from one to four to enable shorthand for identifying the different clusterings. The four outcome variables have long right tails, and we transform the outcomes as log2(x + c) where c is the smallest non-zero outcome.
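The transformation can be sketched as follows. The sketch assumes non-negative scores and that c is computed separately for each outcome from all of its observed values; the text does not say whether c is per outcome or pooled, so that choice, like the name `log2_shift`, is an assumption.

```python
import math


def log2_shift(values):
    """Return log2(x + c), where c is the smallest non-zero value observed."""
    c = min(v for v in values if v != 0)
    return [math.log2(v + c) for v in values]


print(log2_shift([0, 0.5, 1.5]))
```

Adding c before taking logs keeps zero scores finite while still compressing the long right tail.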
The predictors are female; age (years); season trichotomized as spring (Mar–Jun), summer (Jul–Oct) and winter (Nov–Feb); parental hard drug use trichotomized as non-user, using-user and non-using user; parental alcohol use; and time. Time is modeled as a linear spline with knots at 18 and 36 months; the three variables are titled Month, Month18 and Month36. A parent is a hard drug non-user if the parent never reported hard drug use during the study; a hard drug user is a parent who reported hard drug use at some visit during the study. Adolescents had each visit classified as (i) a non-user visit if their parent was a non-user; (ii) a using-user visit if the parent was a hard drug user and the parent reported hard drug use in the three months previous to the observation; or (iii) a non-using user visit if the parent was a hard drug user who did not report hard drug use in the previous three months.
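The time and drug-use predictors can be constructed as in the sketch below. Two assumptions are made: the linear spline is coded with the usual truncated-line terms max(0, month − knot), which the text does not spell out, and each visit carries one report covering the previous three months. The function names are hypothetical.

```python
def spline_terms(month, knots=(18, 36)):
    """Return (Month, Month18, Month36) for a linear spline in time."""
    return (month,) + tuple(max(0, month - k) for k in knots)


def classify_visits(used_last_3_months):
    """Label each visit from per-visit reports (True = hard drug use reported)."""
    if not any(used_last_3_months):
        # The parent never reported use during the study.
        return ["non-user"] * len(used_last_3_months)
    return ["using-user" if used else "non-using user"
            for used in used_last_3_months]


print(spline_terms(24), spline_terms(40))
print(classify_visits([False, True, False]))
```

Under this coding the coefficient of Month is the slope before 18 months, and the coefficients of Month18 and Month36 are the changes in slope at each knot.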
Correlations among the four outcomes are given in Table I. The two BSI variables 3=Anxiety and 4=Somatization are highly correlated (ρ = .75) and the two coping style variables 1=Social Support and 2=Positive Action are moderately correlated (ρ = .49). The two coping style variables have low cross correlations with the two BSI variables (all ρ ≤ .25).
Table I.
Baseline correlations among the four outcome variables.
| | Social Support | Positive Action | Anxiety | Somatization |
|---|---|---|---|---|
| Social support | 1.00 | .49 | .25 | .24 |
| Positive action | .49 | 1.00 | .19 | .17 |
| Anxiety | .25 | .19 | 1.00 | .75 |
| Somatization | .24 | .17 | .75 | 1.00 |
We model the residuals εi with a multivariate random intercept model [1, chapter 13], one random intercept per outcome per subject. We write εijgk = βigk + δijgk, with βi ~ NK(0,D) a K-vector with elements βigk and δij ~ NK(0, V) where δij = (δij11, …, δijGkG)′ and both D and V are unstructured covariance matrices.
We used the two-step iterative algorithm to fit each model and compared its likelihood to that of the saturated model. We seek models that have the fewest clusters and at the same time are not significantly different from the saturated model. Results are summarized in Table II. The column Diff is the likelihood ratio test statistic between each model and the saturated model, model 1,2,3,4; the degrees of freedom and p-values using the nominal χ2 comparison distribution are listed in neighboring columns. The four columns labeled γ1 to γ4 give the estimated scale parameters γ1gk for each model; the γs are in the same order as the outcomes are listed in the Model column. For example, in model 134,2, γ1 is the scale parameter for outcome 1, γ2 is the scale parameter for outcome 3, γ3 is the scale parameter for outcome 4 and γ4 is the scale parameter for outcome 2.
Table II.
Summary statistics for all 15 COPE models for four outcomes, ordered by Akaike’s Information Criterion (AIC) from best to worst. Model identifies the clustering among outcomes; Outcomes 1–4 represent Social Support= 1, Positive Action= 2, Anxiety= 3 and Somatization= 4. AIC, Bayes Information Criterion (BIC) and −2 log L are in smaller-is-better form. Diff is the −2log likelihood ratio test (LRT) statistic between each model and the saturated model 1, 2, 3, 4; df is the degrees of freedom for the test and the p-value comes from comparing the LRT statistic to the χ2(df) reference distribution. Columns labeled γ1 to γ4 are the estimates of the scale γ1s from the link function; the γ1s correspond to the outcomes in the order shown in the Model column.
| Order | Model | AIC | BIC | −2 log L | Diff | df | p-value | γ1 | γ2 | γ3 | γ4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 134, 2 | 12,325.5 | 12,477.3 | 12,245.5 | 23.7 | 18 | .17 | .38 | .88 | 1.00 | 1.00 |
| 2 | 1, 2, 34 | 12,330.6 | 12,520.4 | 12,230.6 | 8.8 | 9 | .46 | 1.00 | 1.00 | .87 | 1.00 |
| 3 | 13, 2, 4 | 12,333.7 | 12,523.5 | 12,233.7 | 11.9 | 9 | .22 | .48 | 1.00 | 1.00 | 1.00 |
| 4 | 14, 2, 3 | 12,337.8 | 12,527.6 | 12,237.8 | 16.0 | 9 | .07 | .37 | 1.00 | 1.00 | 1.00 |
| 5 | 1, 234 | 12,339.2 | 12,491.0 | 12,259.2 | 37.4 | 18 | .00 | 1.00 | .38 | .90 | 1.00 |
| 6 | 1234 | 12,339.3 | 12,453.2 | 12,279.3 | 57.5 | 27 | .00 | .39 | .23 | .88 | 1.00 |
| 7 | 1, 2, 3, 4 | 12,341.8 | 12,569.6 | 12,221.8 | - | - | - | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 13, 24 | 12,344.9 | 12,496.8 | 12,264.9 | 43.1 | 18 | .00 | .53 | 1.00 | .35 | 1.00 |
| 9 | 12, 34 | 12,345.1 | 12,497.0 | 12,265.1 | 43.3 | 18 | .00 | 1.05 | 1.00 | .88 | 1.00 |
| 10 | 14, 23 | 12,346.8 | 12,498.6 | 12,266.8 | 45.0 | 18 | .00 | .51 | 1.00 | .59 | 1.00 |
| 11 | 123, 4 | 12,347.8 | 12,499.7 | 12,267.8 | 46.0 | 18 | .00 | .47 | .28 | 1.00 | 1.00 |
| 12 | 1, 23, 4 | 12,348.4 | 12,538.2 | 12,248.4 | 26.6 | 9 | .00 | 1.00 | .53 | 1.00 | 1.00 |
| 13 | 1, 24, 3 | 12,350.8 | 12,540.6 | 12,250.8 | 29.0 | 9 | .00 | 1.00 | .37 | 1.00 | 1.00 |
| 14 | 124, 3 | 12,351.6 | 12,503.4 | 12,271.6 | 49.8 | 18 | .00 | .38 | .23 | 1.00 | 1.00 |
| 15 | 12, 3, 4 | 12,356.4 | 12,546.2 | 12,256.4 | 34.6 | 9 | .00 | 1.06 | 1.00 | 1.00 | 1.00 |
Model 134,2 with two clusters is best according to AIC. It is not significantly worse than the saturated model (p-value= .17). Models 13,2,4; 14,2,3; and 1,2,34 are also good models that are not significantly worse than the saturated model (p-values= .22, .07, .46), but they have more parameters and one more cluster than model 134,2. Model 134,2 is nested in all three and none of them is significantly better than model 134,2; thus we settle on model 134,2.
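The likelihood ratio p-values in Table II can be reproduced from the −2 log L column. The sketch below uses only the standard library and evaluates the χ2 upper tail with the finite-series formulas for integer degrees of freedom; `chi2_sf` and `lrt_p` are hypothetical helper names.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^(k - 1/2) / (2k - 1)!!
    term, total = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


def lrt_p(m2ll_small, m2ll_big, df):
    """P-value for a smaller model nested in a bigger one, from -2 log L."""
    return chi2_sf(m2ll_small - m2ll_big, df)


# Model 134,2 and model 1,2,34 against the saturated model (Table II).
print(round(lrt_p(12245.5, 12221.8, 18), 2), round(lrt_p(12230.6, 12221.8, 9), 2))
# Model 134,2 against the three-cluster model 1,2,34 in which it is nested.
print(lrt_p(12245.5, 12230.6, 9))
```

The first two values agree with the .17 and .46 reported in Table II; the same function gives the nested comparisons between model 134,2 and the three-cluster models.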
We also report BIC, though it tends to pick models with fewer parameters than AIC does. BIC picks the model 1234, but this model is significantly worse than the model 1,2,3,4 and we discard it from further consideration.
The model selection results suggest that 1=Social Support, 3=Anxiety and 4=Somatization share a common set of predictor effects but 2=Positive Action has its own set of predictor effects. Based on the correlations in Table I, we might have expected the model 12,34 to have been best; actually model 12,34 is significantly worse than the saturated model. In fact, every model that clusters outcome 2 with any other outcome is significantly worse than the saturated model by the likelihood ratio test. This illustrates that outcome correlation and outcome similarity are different concepts and can lead to different clustering of outcomes even in the same data set.
The scale parameters γ1gk for 1=Social Support and 3=Anxiety are .38 (se= .087) and .88 (se= .099), with 4=Somatization and 2=Positive Action having γ1gk ≡ 1. Predictor changes associated with a one unit change in 4=Somatization are associated with a .38 unit change in 1=Social Support and a .88 unit change in 3=Anxiety. Both estimated γ1gks are significantly different from zero, indicating that both outcomes are indeed related to the linear predictor. The estimate of γ1gk for 3=Anxiety is not significantly different from 1, suggesting that it is possible that 3=Anxiety and 4=Somatization are affected equally by changes in covariates.
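These comparisons can be illustrated with Wald z-statistics built from the reported estimates and standard errors. The normal reference distribution and the helper name `wald_p` are assumptions; the text does not state which test was used.

```python
import math


def wald_p(estimate, se, null=0.0):
    """Return the Wald z-statistic and its two-sided normal p-value."""
    z = (estimate - null) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


for name, est, se in (("Social Support", 0.38, 0.087), ("Anxiety", 0.88, 0.099)):
    print(name, wald_p(est, se, null=0.0), wald_p(est, se, null=1.0))
```

Under this approximation both scales are far from zero, while only the Anxiety scale is compatible with 1.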
Table III shows parameter estimates, standard errors and p-values for the coefficients from the saturated model 1,2,3,4 and for γ1gkαg from our preferred model 134,2. We separate estimates, standard errors and p-values into three sub-sections of the table. Because we calculate γ1gkαg for each outcome, the p-values differ slightly across the outcomes in a cluster. If we report significance based on αg alone, as we would in a paper, those results are given in the columns for outcomes 2=Positive Action and 4=Somatization since their γ1gk are constrained to equal 1.
Table III.
Comparison of coefficient estimates, standard errors and p-values from the saturated model (Sat) and for αg * γ1gk from the COPE model 134,2, where Social Support= 1, Anxiety= 3 and Somatization= 4 are in the same cluster, and Positive Action= 2 is in a separate cluster.
| | Predictor | Social Support (Sat) | Social Support (COPE) | Positive Action (Sat) | Positive Action (COPE) | Anxiety (Sat) | Anxiety (COPE) | Somatization (Sat) | Somatization (COPE) |
|---|---|---|---|---|---|---|---|---|---|
| Point Estimate | Month | −.010 | −.010 | −.016 | −.016 | −.026 | −.023 | −.024 | −.026 |
| | Female | .171 | .163 | .036 | .034 | .348 | .379 | .444 | .429 |
| | Age | .005 | .002 | .042 | .040 | .011 | .006 | −.005 | .006 |
| | Spring | −.031 | .024 | −.080 | −.050 | .054 | .055 | .096 | .063 |
| | Summer | −.103 | −.029 | −.092 | −.053 | −.052 | −.066 | −.039 | −.075 |
| | Month18 | .016 | .013 | .024 | .023 | .031 | .030 | .030 | .034 |
| | Month36 | −.014 | −.005 | −.014 | −.010 | −.010 | −.011 | −.002 | −.012 |
| | Drug NUU | .063 | .027 | .073 | .053 | .007 | .064 | .052 | .072 |
| | Drug UU | .033 | .086 | .092 | .117 | .223 | .198 | .223 | .225 |
| | Parental Alcohol | −.036 | −.051 | −.026 | −.036 | −.082 | −.119 | −.158 | −.135 |
| Std Error | Month | .004 | .003 | .005 | .004 | .005 | .005 | .005 | .005 |
| | Female | .044 | .032 | .055 | .040 | .093 | .075 | .099 | .086 |
| | Age | .004 | .007 | .005 | .009 | .007 | .015 | .007 | .017 |
| | Spring | .036 | .017 | .042 | .038 | .053 | .040 | .051 | .045 |
| | Summer | .038 | .019 | .044 | .040 | .053 | .040 | .051 | .045 |
| | Month18 | .006 | .004 | .007 | .006 | .008 | .007 | .008 | .007 |
| | Month36 | .005 | .003 | .006 | .006 | .009 | .007 | .009 | .007 |
| | Drug NUU | .047 | .028 | .058 | .041 | .096 | .064 | .102 | .073 |
| | Drug UU | .066 | .036 | .080 | .062 | .112 | .080 | .116 | .089 |
| | Parental Alcohol | .040 | .022 | .047 | .039 | .063 | .047 | .062 | .054 |
| P-Value | Month | .014 | .000 | .001 | .000 | .000 | .000 | .000 | .000 |
| | Female | .000 | .000 | .517 | .398 | .000 | .000 | .000 | .000 |
| | Age | .196 | .717 | .000 | .000 | .097 | .713 | .464 | .710 |
| | Spring | .396 | .167 | .059 | .184 | .309 | .166 | .058 | .168 |
| | Summer | .007 | .133 | .037 | .179 | .330 | .100 | .446 | .096 |
| | Month18 | .007 | .001 | .000 | .000 | .000 | .000 | .000 | .000 |
| | Month36 | .007 | .145 | .016 | .082 | .265 | .108 | .803 | .100 |
| | Drug NUU | .177 | .335 | .204 | .198 | .938 | .320 | .608 | .321 |
| | Drug UU | .620 | .018 | .250 | .061 | .046 | .013 | .054 | .012 |
| | Parental Alcohol | .359 | .018 | .573 | .353 | .196 | .012 | .011 | .014 |
In the saturated model, coefficient estimates for 3=Anxiety and 4=Somatization are close for statistically significant predictors including month, female and month18, and these estimates are close to the estimated α for cluster 134. This is consistent with the scale parameter estimate γ1 = .88 for 3=Anxiety and γ1 ≡ 1 for 4=Somatization. 1=Social Support has γ1 = .38, and its estimates and standard errors (SEs) are therefore smaller than those for 3=Anxiety.
For outcome 2=Positive Action, age is important; however, age is not significantly associated with outcomes 1, 3 or 4. Conversely, female gender is associated with the clustered outcomes 1, 3 and 4 but is not significantly associated with outcome 2. Thus the differences between the clusters appear to be driven heavily by their relationships to the covariates age and female gender.
The standard errors (SEs) for the COPE model are smaller than or equal to the corresponding SEs from the saturated model for all outcomes and coefficients except age. The average ratio of COPE model SEs to saturated model SEs is .90, a reduction in standard error of 10%; ignoring age, the average ratio of SEs is .76, a reduction of 24%. We expect efficiency gains of this kind in general, because information from multiple outcomes contributes to estimating the coefficients for a single cluster of similar outcomes.
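The average SE ratios can be recomputed approximately from the standard errors printed in Table III. Because the table is rounded to three decimals, this sketch gives roughly .89 and .77 rather than exactly the .90 and .76 quoted above, which were presumably computed from unrounded values. The dictionary layout and the name `mean_ratio` are illustrative choices.

```python
# (saturated SE, COPE SE) pairs from Table III, in the order
# Social Support, Positive Action, Anxiety, Somatization.
se = {
    "Month": [(.004, .003), (.005, .004), (.005, .005), (.005, .005)],
    "Female": [(.044, .032), (.055, .040), (.093, .075), (.099, .086)],
    "Age": [(.004, .007), (.005, .009), (.007, .015), (.007, .017)],
    "Spring": [(.036, .017), (.042, .038), (.053, .040), (.051, .045)],
    "Summer": [(.038, .019), (.044, .040), (.053, .040), (.051, .045)],
    "Month18": [(.006, .004), (.007, .006), (.008, .007), (.008, .007)],
    "Month36": [(.005, .003), (.006, .006), (.009, .007), (.009, .007)],
    "Drug NUU": [(.047, .028), (.058, .041), (.096, .064), (.102, .073)],
    "Drug UU": [(.066, .036), (.080, .062), (.112, .080), (.116, .089)],
    "Parental Alcohol": [(.040, .022), (.047, .039), (.063, .047), (.062, .054)],
}


def mean_ratio(predictors):
    """Average COPE-to-saturated SE ratio over the given predictors."""
    ratios = [cope / sat for name in predictors for sat, cope in se[name]]
    return sum(ratios) / len(ratios)


everything = list(se)
without_age = [name for name in everything if name != "Age"]
print(round(mean_ratio(everything), 2), round(mean_ratio(without_age), 2))
```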
5. Discussion
We have described methodology for parameter estimation and model selection for the COPE model using maximum likelihood. Our model can be applied to both longitudinal and cross sectional data. Our original inspiration for this model is the longitudinal data set described in this example. A useful feature of the methodology is that inferences do not depend on the parameterization of the predictors; linear transformations of the predictors lead to the same choices and conclusions. While we use maximum likelihood for our estimation procedure, users who prefer to could use generalized estimating equations [22], quasi-least squares [23] or another procedure. Like many statistical procedures, our approach is exploratory; a confirmatory approach might use one data set to select a clustering and a second data set to test the outcome clustering found in the first. The second data set could result from an independent sample or it could be created by splitting an original larger sample into an estimation data set and a formal test data set.
While we did not emphasize the point, our model and likelihood based approach accommodates missing data that is missing at random [24]. Single outcomes at a time may be missing or all outcomes at a given time may be missing. Missing data is appropriately accounted for in our formulas for estimating the information matrix given in the appendix.
Lin et al [19] propose a somewhat related random effects model for cross sectional data. Our models differ in several major features. We use the data to choose our clusters; they assume the outcomes under analysis are all in a single cluster. In their model, a single covariate has an effect that is constant across outcomes up to a scaling parameter and all other covariates are assumed to have different effects on different outcomes; within a cluster, we assume all covariates have similar impacts on the outcome up to the scaling parameter. Lin et al treat cross sectional data while we treat longitudinal data; this has a major impact on our approaches to scaling the impacts of the covariates. For example, we conceptualize outcome specific link functions to connect the linear predictor to each outcome in a cluster. In general models for longitudinal data, there is not a single marginal variance for any outcome; the marginal variance is a function of time in any non-trivial model with a random time effect. We place our scaling parameters solely inside the linear predictor and we estimate them as part of the linear predictor; they are not related to the covariance parameters. This simplifies the algebra in the sense that, unlike in Lin et al [19], the covariance parameters are asymptotically independent of the linear predictor parameters for normal error distributions.
ACKNOWLEDGEMENT
The authors thank Mary Jane Rotheram for permission to use the Project TALC data.
Contract/grant sponsor: Weiss was supported by the Center for HIV Identification, Prevention, and Treatment Services, NIH/NIMH; contract/grant number: P30MH58107
APPENDIX
Estimating the covariance matrix for α̂ and γ̂
We calculate the observed information matrix to estimate the covariance matrix for all fixed effect parameters in a K outcome G cluster COPE model with L predictors. We first define notation, then we take the second derivative of the likelihood function to calculate the information matrix, and lastly we use the delta method to calculate the variance of γ̂ α̂.
As before, let yijgk denote the kth outcome in the gth cluster for subject i at time tij, k = 1, …, kg, g = 1, …,G, ∑g kg = K, i = 1, …, n and j = 1, …, Ji. Let Yigk = (yi1gk, …, yiJigk)′ be the vector of Ji outcomes for subject i and outcome k in cluster g, then the KJi-vector of all observations for subject i is Yi = (Y′i11, …, Y′i1k1, …, Y′iG1, …, Y′iGkG)′. Let xijl be the lth predictor value for subject i at time tij, l = 1, …,L, xij = (xij1, …, xijL)′ and define Xi = (xi1, …, xiJi)′. Including an intercept for each outcome, we set up the KJi × K(L + 1) predictor matrix Mi = IK ⊗ (1,Xi), in which 1 is a Ji-vector of 1s, IK is a K×K identity matrix and ⊗ denotes direct product. We split Mi into 2K sets of columns corresponding to the intercepts 1 and predictors Xi for each of the K outcomes, $M_i = (M^{0}_{i11}, M^{1}_{i11}, \ldots, M^{0}_{iGk_G}, M^{1}_{iGk_G})$. Each $M^{0}_{igk}$ is a column vector corresponding to the intercept for the kth outcome in cluster g, while $M^{1}_{igk}$ is an L columned matrix with predictors for the kth outcome in cluster g.
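The direct product construction of Mi can be sketched without any numerical library. The helper names `kron` and `design_matrix` are hypothetical, and the small Xi in the usage lines is made up for illustration.

```python
def kron(A, B):
    """Direct (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def design_matrix(X_i, K):
    """Build M_i = I_K (x) (1, X_i) for one subject with J_i rows in X_i."""
    base = [[1.0] + list(row) for row in X_i]  # the J_i x (L + 1) block (1, X_i)
    identity = [[1.0 if r == c else 0.0 for c in range(K)] for r in range(K)]
    return kron(identity, base)


# Illustrative subject with J_i = 3 visits, L = 2 predictors and K = 2 outcomes.
X_i = [[0.5, 1.0], [1.5, 0.0], [2.5, 1.0]]
M_i = design_matrix(X_i, K=2)
print(len(M_i), len(M_i[0]))
```

The result has K·Ji rows and K(L + 1) columns, with one copy of (1, Xi) on the diagonal for each outcome and zeros elsewhere.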
Model (3) can now be written in matrix form as
$$Y_i = M_i\phi + \varepsilon_i. \tag{6}$$
The coefficient vector ϕ is a K(L + 1)-vector with elements ϕ = (ϕ′11, …, ϕ′1k1, …, ϕ′G1, …, ϕ′GkG)′, where ϕgk = (γ0gk, γ1gkα′g)′, γsgk is the intercept for s = 0 and scale parameter for s = 1 for the kth outcome in cluster g, αg = (αg1, …, αgL)′ is an L-vector of fixed effect coefficients for cluster g, and αgl is the regression coefficient for the lth predictor in cluster g.
We have K intercepts γ0 = (γ011, …, γ01k1, …, γ0G1, …, γ0GkG)′, K scaling parameters γ1 = (γ111, …, γ11k1, …, γ1G1, …, γ1GkG)′, of which G are fixed to 1 and K − G are to be estimated, and GL coefficients α = (α′1, …, α′G)′. In each cluster we fix the scale parameter of the last outcome in a cluster to 1, γ1gkg ≡ 1, and let $\tilde\gamma_1$ be the K − G sub-vector of γ1 that omits the G fixed γs. The covariance matrix of $(\hat\gamma_0, \hat{\tilde\gamma}_1, \hat\alpha)$ is a (2K − G + GL)×(2K − G + GL) matrix.
The residual εi = (εi11, εi12, …, εi1k1, …, εiG1, …, εiGkG)′ is a KJi-vector, and we model εi ~ N(0, Vi(θ)), θ is a vector of unknown parameters for the components of the covariance matrix. The log likelihood function of (γ0, γ1, α, θ) is proportional to
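$$-\frac{1}{2}\sum_{i=1}^{n}\left\{\log\left|\tilde U_i V_i(\theta)\tilde U_i'\right| + (Y_i - M_i\phi)'\,\tilde U_i'\left(\tilde U_i V_i(\theta)\tilde U_i'\right)^{-1}\tilde U_i\,(Y_i - M_i\phi)\right\} \tag{7}$$
where Ui is a diagonal matrix with jth diagonal element equal to 1 when yijgk is observed and 0 when yijgk is missing and Ũi is Ui omitting the rows that are all 0. The second derivative of the log likelihood is block diagonal with θ orthogonal to (γ0, γ1, α), so we omit calculations involving θ in the information matrix [25].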
We differentiate (7) twice with respect to (γ0, γ1, α). We lay out the estimated information matrix Î of (γ0, γ1, α) as a 3 × 3 grid matrix where Îpq is the second derivative of the log likelihood with respect to p and q, p and q go from 1 to 3, representing γ0, γ1 and α respectively and substituting parameter estimates for the unknown parameters. Because the information matrix is symmetric, we need give only the upper triangle set of blocks
and
where V̂i = Vi(θ)|θ=θ̂ and δ(g = g′) = 1 if g = g′, otherwise δ(g = g′) = 0. Define $c_g = 0$ if g = 1 and let $c_g = \sum_{g'' < g} k_{g''}$ for g ≥ 2 be the cumulative number of outcomes prior to block g; then Îpq((g, k), (g′, k′)) is the element in row $c_g + k$ and column $c_{g'} + k'$ of block Îpq for q ≤ 2 and p ≤ q. Îp3((g, k), g′) is the g′th set of L columns in row $c_g + k$ of block Îp3 for p ≤ 2, and Î33(g, g′) is the (g, g′)th sub-block of Î33; each sub-block is an L × L matrix. The covariance matrix of the MLE is the inverse of the negative information matrix deleting the G rows and G columns corresponding to the fixed scaling parameters γ1gkg, g = 1, …,G.
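One way to write the six blocks is sketched here by differentiating the quadratic form of the Gaussian log likelihood directly. The notation is assumed for illustration and need not match the printed formulas: $M^{0}_{igk}$ and $M^{1}_{igk}$ denote the intercept column and the L predictor columns of $M_i$ for outcome k in cluster g, $\hat W_i = \tilde U_i'(\tilde U_i \hat V_i \tilde U_i')^{-1}\tilde U_i$, and $\hat r_i = Y_i - M_i\hat\phi$. The mean of $Y_i$ is $\sum_{g,k}\{\gamma_{0gk}M^{0}_{igk} + \gamma_{1gk}M^{1}_{igk}\alpha_g\}$, so its derivatives with respect to $\gamma_{0gk}$, $\gamma_{1gk}$ and $\alpha_g$ are $M^{0}_{igk}$, $M^{1}_{igk}\alpha_g$ and $\sum_k \gamma_{1gk}M^{1}_{igk}$ respectively.

```latex
\begin{aligned}
\hat I_{11}((g,k),(g',k')) &= -\sum_i M^{0\prime}_{igk}\,\hat W_i\, M^{0}_{ig'k'},\\
\hat I_{12}((g,k),(g',k')) &= -\sum_i M^{0\prime}_{igk}\,\hat W_i\, M^{1}_{ig'k'}\hat\alpha_{g'},\\
\hat I_{13}((g,k),g') &= -\sum_i M^{0\prime}_{igk}\,\hat W_i \sum_{k'}\hat\gamma_{1g'k'} M^{1}_{ig'k'},\\
\hat I_{22}((g,k),(g',k')) &= -\sum_i \hat\alpha_g'\, M^{1\prime}_{igk}\,\hat W_i\, M^{1}_{ig'k'}\hat\alpha_{g'},\\
\hat I_{23}((g,k),g') &= \sum_i \Big\{\delta(g=g')\,\hat r_i'\,\hat W_i\, M^{1}_{igk}
  - \hat\alpha_g'\, M^{1\prime}_{igk}\,\hat W_i \sum_{k'}\hat\gamma_{1g'k'} M^{1}_{ig'k'}\Big\},\\
\hat I_{33}(g,g') &= -\sum_i \Big(\sum_{k}\hat\gamma_{1gk} M^{1}_{igk}\Big)'\,\hat W_i\,
  \Big(\sum_{k'}\hat\gamma_{1g'k'} M^{1}_{ig'k'}\Big).
\end{aligned}
```

Only the cross derivative between a scale parameter and the coefficients of its own cluster involves the residual, because the mean is bilinear in (γ1gk, αg) and linear in everything else; this is where the indicator δ(g = g′) enters.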
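We are also interested in inference on γ̂1gk α̂g, g = 1, …, G and k = 1, …, kg − 1. Define h ≡ h(αg, γ1gk) = αgγ1gk. Using the delta method [26, page 45],
$$\widehat{\operatorname{Var}}\{h(\hat\alpha_g, \hat\gamma_{1gk})\} \approx \dot h\, \hat\Psi\, \dot h' \tag{8}$$
where Ψ̂ is the covariance matrix for the parameter subset (γ̂1gk, α̂g) and
$$\dot h = \frac{\partial h}{\partial(\gamma_{1gk}, \alpha_g')}\bigg|_{(\hat\gamma_{1gk},\, \hat\alpha_g)} = \left(\hat\alpha_g,\ \hat\gamma_{1gk} I_L\right) \tag{9}$$
is an L × (L + 1) matrix. Partition
$$\hat\Psi = \begin{pmatrix} \Psi_{\gamma\gamma} & \Psi_{\gamma\alpha} \\ \Psi_{\alpha\gamma} & \Psi_{\alpha\alpha} \end{pmatrix} \tag{10}$$
Evaluating ḣ from (9) in the variance ḣΨ̂ḣ′ we get, for the lth element of γ̂1gk α̂g,
$$\widehat{\operatorname{Var}}(\hat\gamma_{1gk}\hat\alpha_{gl}) \approx \hat\alpha_{gl}^2\, \Psi_{\gamma\gamma} + 2\,\hat\gamma_{1gk}\hat\alpha_{gl}\, \Psi_{\alpha\gamma,l} + \hat\gamma_{1gk}^2\, \Psi_{\alpha\alpha,ll}, \quad l = 1, \ldots, L \tag{11}$$
where Ψγγ is the variance of γ̂1gk, Ψαα,ll is the lth diagonal element of Ψαα and Ψαγ,l is the lth element of Ψαγ.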
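The delta-method calculation can be checked numerically. The sketch below assumes Ψ̂ is ordered as (γ̂1gk, α̂g), forms the Jacobian of h = γ1gk αg, and compares the diagonal of ḣΨ̂ḣ′ with the element-wise expression. The function names and the numerical values are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np


def scaled_effect_covariance(gamma_hat, alpha_hat, psi_hat):
    """Delta-method covariance of gamma_hat * alpha_hat.

    psi_hat is the (1 + L) x (1 + L) covariance of (gamma_hat, alpha_hat).
    """
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    L = alpha_hat.size
    # Jacobian of h = gamma * alpha with respect to (gamma, alpha'): L x (1 + L).
    h_dot = np.hstack([alpha_hat.reshape(L, 1), gamma_hat * np.eye(L)])
    return h_dot @ psi_hat @ h_dot.T


def scaled_effect_variances(gamma_hat, alpha_hat, psi_hat):
    """Element-wise variances: the diagonal of the matrix above, written out."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    psi_gg = psi_hat[0, 0]                 # variance of gamma_hat
    psi_ag = psi_hat[1:, 0]                # covariances of alpha_hat with gamma_hat
    psi_aa_diag = np.diag(psi_hat[1:, 1:]) # variances of alpha_hat elements
    return (alpha_hat ** 2 * psi_gg
            + 2.0 * gamma_hat * alpha_hat * psi_ag
            + gamma_hat ** 2 * psi_aa_diag)


# Hypothetical estimates for illustration only.
psi = np.array([[0.04, 0.01, -0.02],
                [0.01, 0.09, 0.00],
                [-0.02, 0.00, 0.16]])
full = scaled_effect_covariance(2.0, [1.0, 3.0], psi)
elementwise = scaled_effect_variances(2.0, [1.0, 3.0], psi)
```

Standard errors for the scaled effects would be the square roots of these variances; the off-diagonal entries of `full` give the covariances between elements of γ̂1gk α̂g.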
REFERENCES
1. Weiss RE. Modeling Longitudinal Data. New York: Springer-Verlag; 2005.
2. Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. New York: Springer-Verlag; 2005.
3. Diggle PJ, Heagerty PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd Ed. New York: Oxford University Press; 2002.
4. Davis CS. Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag; 2002.
5. McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001.
6. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer-Verlag; 2000.
7. Brown H, Prescott R. Applied Mixed Models in Medicine. New York: Wiley; 1999.
8. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. The SAS System for Mixed Models. Cary, N.C: SAS Institute Inc; 1996.
9. Chaganty NR, Naik DN. Analysis of multivariate longitudinal data using quasi-least squares. Journal of Statistical Planning and Inference. 2002;103:421–436.
10. Beckett LA, Tancredi DJ, Wilson RS. Multivariate longitudinal models for complex change processes. Statistics in Medicine. 2004;23:231–239. doi: 10.1002/sim.1712.
11. Nummi T, Mottonen J. On the analysis of multivariate growth curve. Metrika. 2000;52:77–89.
12. Mickey RM, Shema SJ, Vacek PM, Bell DY. Analysis of multiple outcome variables measured longitudinally. Computational Statistics and Data Analysis. 1994;17:17–33.
13. Dubin JA, Müller HG. Dynamical correlation for multivariate longitudinal data. Journal of the American Statistical Association. 2005;100(471):872–881.
14. Travison TG, Brookmeyer R. Global effects estimation for multidimensional outcomes. Statistics in Medicine. 2007;26:4845–4859. doi: 10.1002/sim.2983.
15. Gray SM, Brookmeyer R. Estimating a treatment effect from multidimensional longitudinal data. Biometrics. 1998;54:976–988.
16. Gray SM, Brookmeyer R. Multidimensional longitudinal data: Estimating a treatment effect from continuous, discrete or time-to-event response variables. Journal of the American Statistical Association. 2000;95:396–406.
17. O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
18. Sammel M, Ryan LM, Legler JM. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society, Series B. 1997;59:667–678.
19. Lin X, Ryan L, Sammel M, Zhang D, Padungtod C, Xu X. A scaled linear model for multiple outcomes. Biometrics. 2000;56:593–601. doi: 10.1111/j.0006-341x.2000.00593.x.
20. Rota GC. The number of partitions of a set. American Mathematical Monthly. 1964;71(5):498–504.
21. Young PT. Congruences for Bernoulli, Euler, and Stirling numbers. Journal of Number Theory. 1999;78:204–227.
22. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
23. Chaganty NR. An alternative approach to the analysis of longitudinal data via generalized estimating equations. Journal of Statistical Planning and Inference. 1997;63:39–54.
24. Little RJA, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2002.
25. Lange KL, Little RJA, Taylor JMG. Robust statistical modeling using the t distribution. Journal of the American Statistical Association. 1989;84:881–896.
26. Ferguson TS. A Course in Large Sample Theory. Florida: CRC Press; 2002.
