Abstract
When using repeated measures linear regression models to make causal inference in laboratory, clinical and environmental research, it is typically assumed that the within-subject association of differences (or changes) in predictor variable values across replicates is the same as the between-subject association of differences in those predictor variable values. However, this is often false. For example, with body weight as the predictor variable and blood cholesterol (which increases with higher body fat) as the outcome: (i) a 10-lb. weight increase in the same adult indicates a greater increase in that adult's cholesterol than (ii) one adult weighing 10 lbs. more than a second adult indicates higher cholesterol in the heavier adult. A 10-lb. weight gain in the same adult more likely reflects a build-up of body fat in that person, while one person being 10 lbs. heavier than another could be influenced by other factors, such as the heavier person being taller. Hence, to make causal inferences, different within- and between-subject slopes should be separately modeled. A related misconception commonly made when using generalized estimating equations (GEE) and mixed models on repeated measures (i.e., for fitting cross-sectional regression) is that the working correlation structure only influences the variance of the parameter estimates. However, only independence working correlation guarantees that the modeled parameters are interpretable. We illustrate this with an example where changing the working correlation from independence to equicorrelation qualitatively biases parameters of GEE models and show that this happens because within- and between-subject slopes for the outcomes regressed on the predictor variables differ.
We then systematically describe several common mechanisms that cause within- and between-subject slopes to differ: change effects, lag/reverse-lag and spillover causality, shared within-subject measurement bias or confounding, and predictor variable measurement error. The misconceptions we describe should be better publicized. Repeated measures analyses should compare within- and between-subject slopes of predictors and when they do differ, investigate the causal reasons for this.
Keywords: within-/between-subject associations, repeated measures, cross-sectional regression, generalized estimating equations, mixed models, working correlation structure
1. Introduction
We focus on two common misconceptions that are made in research while fitting repeated measures regression with generalized estimating equations (GEE) and mixed models (MM). Misconception-A: The association between the predictor variable and the outcome across different measures from the same subject (within-subject) is the same as the association of that variable with the outcome between measures from different subjects (between-subject). In fact, these associations often differ, which should be considered when making causal inference. For example, consider weight as the predictor and cholesterol as the outcome, given the well-known association of higher serum cholesterol with greater body fat: (i) a 10-lb. increase in the same adult more likely indicates a greater difference in serum cholesterol than does (ii) one adult being 10 lbs. heavier than a second adult. A 10-lb. weight gain in the same adult more likely reflects a build-up of body fat in that person, while the first adult being 10 lbs. heavier than the second could be influenced by other factors, such as the first adult being taller than the second. Misconception-B: The working correlation structure used in GEE and MM models is only a nuisance factor that impacts precision of model parameter estimates. As illustrated and explained in the next Section (and Table 1), the wrong choice of working correlation structure biases parameter estimates.
Table 1.
Variable | Independence working correlation: Point Estimate | Independence: 95% CI | Independence: Z-Value (p) | Equicorrelation 2 working correlation: Point Estimate | Equicorrelation 2: 95% CI | Equicorrelation 2: Z-Value (p)
---|---|---|---|---|---|---
HIV Infection (βHIV,CS) | −2.04 | (−5.07, 0.98) | −1.32 (0.19) | −3.96 | (−6.90, −1.03) | −2.65 (0.0081)
Albumin Per g/dL (βALB,CS) | −6.21 | (−8.95, −3.47) | −4.44 (<0.0001) | −9.84 | (−12.01, −7.68) | −8.93 (<0.0001)
BUN Per mg/dL (βBUN,CS) | −1.87 | (−2.12, −1.62) | −14.45 (<0.0001) | −1.22 | (−1.46, −0.99) | −10.30 (<0.0001)
Quasi-Likelihood Information Criteria (QIC) | 10,847.14 | | | 10,836.27 | |
1 Mixed models gave essentially similar point estimates; see Appendix A. 2 Intraclass correlation of residuals from GEE-E was 0.45, indicating non-independence correlation was structurally correct.
Both of these misconceptions are related, but the analytical details are complicated. To explore this further, Section 2 begins with an illustration of Misconception-B in real data. Section 2 also explains how this relates to Misconception-A and why independence working correlation must be used for creation of predictive models using “cross-sectional regression” on repeated measures. Section 2B then details how separation into within- and between-subject associations is needed for using repeated measures regression to make causal inference. Section 3 describes epidemiological mechanisms that can cause within- and between-subject slopes to differ. Section 6 summarizes and explores further implications for statistical practice in applied research.
2. Cross-Sectional and Between/Within-Subject Linear Models with Repeated Measures
We begin here with some notation. Consider repeated measures on n subjects denoted by i = 1, 2, …, n. The “subjects” can either be persons with longitudinal repeated measures, or, as is common in environmental epidemiology, can be cities, schools, neighborhoods, census tracts, hospitals, etc. Each subject has Ji different observations enumerated by j = 1, …, Ji. For example, these Ji different observations could be taken at times ti1 < ti2 < … < tiJi on the same person when the “subject” is a person, or from Ji different persons living in the same neighborhood when the “subject” is a neighborhood. For Ji constant across i (i.e., always the same number of repeat measures for a subject), we drop the “i” subscript and denote it J. Let us consider that the observations have continuous outcomes Yij and K predictor (or exposure) variables X1,ij, …, XK,ij. When K = 1, we drop the “k” enumeration, using Xij for the only predictor. Linear regression models for E[Yij|X1,ij, …, XK,ij] or E[Yij|Xij] are fit in the analyses described here. However, the overall conclusions we obtain on these linear regression models can be generalized to discrete outcomes (e.g., logistic regression) and survival analyses.
2A Cross-Sectional (CS) Regression. The most commonly fitted linear regression model on repeated measures does not separate within- and between-subject associations and is usually written out as Yij = α + β1X1,ij + β2X2,ij + … + βKXK,ij + εij. This is denoted as “cross-sectional (CS) regression” particularly for longitudinal repeated measures. We add a subscripted “CS” to the β’s to distinguish these slopes from between-subject (BS) and within-subject (WS) slopes defined in Section 2B. The CS regression model here is thus denoted as
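$$Y_{ij} = \alpha + \beta_{1,CS}X_{1,ij} + \beta_{2,CS}X_{2,ij} + \cdots + \beta_{K,CS}X_{K,ij} + \varepsilon_{ij} \qquad (1)$$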
where α, β1,CS, …, βK,CS are parameters (fixed effects), while εij is error with E[εij] = 0 that is independent between different subjects i and i’, but may be correlated for j ≠ j’ within the same subject. It should be noted that the intercept is fixed at the same α for each subject. Should the actual intercepts differ between subjects (i.e., be αi) as random intercepts, then for both MM and GEE, the difference is incorporated into the error term εij of (1) and the within-subject correlation of that error [1]. Using (vs. not using) random intercepts does not influence the point estimates of β1,CS, …, βK,CS or the variance of these estimates for mixed models [1]. However, for GEE, using a different intercept for each subject (with each intercept adding a new parameter) creates too many parameters for the asymptotic properties of the GEE model to hold in our examples (and in general), which destabilizes parameter estimates [2].
Again for K = 1, the subscript k is dropped and the model is Yij = α + βCS Xij + εij. The main goal of CS regression is to first obtain estimates $\hat{\alpha}, \hat{\beta}_{1,CS}, \ldots, \hat{\beta}_{K,CS}$ and then input these into (1) in order to estimate future unobserved Y’s from observed X’s as $\hat{Y} = \hat{\alpha} + \hat{\beta}_{1,CS}X_{1} + \cdots + \hat{\beta}_{K,CS}X_{K}$. Cross-sectional regression is also used to make adjusted (causal) inference on the covariate associations in β1,CS, …, βK,CS, but, as we show later, doing this may be problematic.
Table 1 presents parameter estimates from repeated measures cross-sectional regression (1) fit to a clinical measure of glomerular filtration rate (EGFR) from the Modification of Diet in Renal Disease Study (MDRD) [3]. Formula (1) with EGFR as the outcome Y and three predictor variables (X1, X2, X3) = (HIV infection, serum albumin, blood urea nitrogen (BUN)) was fit to 10,782 semi-annual measures of 584 women at the Bronx site of the Women’s Interagency HIV Study (WIHS) [4]. Higher EGFR values indicate better renal function. The models assume that the within- and between-subject associations of the predictor variables are the same. We later show this assumption is incorrect. The parameter estimates of Table 1 were calculated using GEE [1] with both independence (GEE-IND, columns 2–4) and equicorrelation (GEE-E, columns 5–7) as the working correlation structure of model residuals from repeated measures in the same person. We again note that model (1) has a fixed intercept across all subjects, with the error term being independent between different subjects. Otherwise, in Table 1 (and elsewhere in the paper) the within-subject correlation structure of the error is allowed to be either (i) independent within the same subject (GEE-IND) or (ii) the same correlation for all outcomes within the same subject (GEE-E). The second condition (i.e., equicorrelation) is equivalent to fitting a random subject intercept model [1].
Most of today’s literature providing guidance on fitting repeated measures linear regression (i.e., [5,6,7,8,9,10,11,12,13]) qualitatively describes working correlation as a “nuisance factor” that does not alter model parameters and states that “the working correlation that minimizes variance of parameter estimates should be chosen”. However, in Table 1, the parameter estimate for BUN (per mg/dL) from GEE-E of −1.22; 95% confidence interval (CI) (−1.46, −0.99) is both qualitatively and statistically higher than the corresponding GEE-IND estimate of −1.87; 95% CI (−2.12, −1.62). For HIV, the parameter estimate of −3.96, p = 0.0081 from GEE-E is qualitatively lower than that from GEE-IND of −2.04, p = 0.19. Clearly, changing the working correlation from independence to equicorrelation qualitatively and statistically changes the parameter estimates. Thus, this correlation structure is not a nuisance factor.
When faced with such a dilemma of qualitatively and statistically different parameter estimates from the same model fit to the same data with only the working correlation structure changed (as is shown in Table 1), investigators typically go to published guidance on which correlation structure to use. Here, almost all articles providing model fitting guidance [5,6,7,8,9,10,11,12,13] point towards using equicorrelation as the working correlation structure, based on: (i) the within-subject correlation of residuals being 0.45 in GEE-E (and in MM-E); (ii) the quasi-likelihood under the independence model criterion (QIC) goodness of fit statistic of 10,836.27 for GEE-E being smaller than the QIC of 10,847.14 for GEE-IND; or (iii) the Akaike information criterion (AIC) goodness of fit statistic of 94,934.5 from a mixed model using equicorrelation (MM-E) being smaller than the AIC of 99,374.5 from a mixed model using independence (MM-IND), as shown in Table A1 in Appendix A. However, as the rest of Section 2 describes in detail, this guidance is problematic, as only the parameter estimates obtained by using independence working correlation can have any meaning for cross-sectional regression.
But first we make two brief asides. First, we note that if MM rather than GEE is used for Table 1, the corresponding parameter point estimates using independence correlation (MM-IND) and equicorrelation (MM-E) are essentially unchanged [1]. (See Appendix A for details on parameter estimates from MM fit to these data with independence and equicorrelation correlation structures.) However, due to non-robustness of MM, GEE is preferable for this specific example. Second, we note that the differences observed in Table 1 occur not only between independence and equicorrelation. Any different choice of correlation structure, such as AR(1), Toeplitz, unstructured, etc., will result in different parameter estimates (results not shown). For simplicity, we focus this article on only two structures: independence and equicorrelation.
2B Between-/Within-Subject Slope (BS/WS) Regression. While investigators almost never consider this in practice, it has long been noted that slopes on changes of Xij within the same subject i differ from cross-sectional slopes on between subject-measure differences in Xij [14,15,16,17]. To illustrate this, consider the cross-sectional model of a laboratory measure, cholesterol (Yij), which is well known to be higher in people with more body fat. To that end, the predictor is body weight (Xij) with E[Yij] = α + βCS Xij. As described in the Introduction, the cross-sectional slope βCS for association of a 10-lb. weight difference between two different adults with cholesterol is less than the slope for association of a 10-lb. within-subject weight change for the same adult with cholesterol, which we denote as βWS. Again, the reason βCS is less than βWS is that: (i) a 10-lb. cross-sectional weight difference between two adults often reflects greater height in one of the persons, but (ii) a 10-lb. weight increase in the same adult is not influenced by height difference and thus is more likely due to more body fat after the 10-lb. weight gain. Thus, since greater body fat is what is directly associated with more cholesterol, the within-person association of a 10-lb. weight increase with cholesterol is greater than the cross-sectional repeated measures association with a 10-lb. weight difference between two persons.
Common within-person height creates a shared within-subject measurement bias from this extraneous factor for subject i (denoted Ei) on weight as a predictor of cholesterol. To that end, many investigators adjust weight for height using body mass index = wt/ht² to remove this effect of height on weight. As Figure 1a illustrates, if TXij = body mass index (wt/ht²) were the true predictor of Yij, and Hi = height (which does not change with j in the same i), then Xij = TXij × (Hi)² contains this shared within-subject measurement bias from common Hi, which again we denote as Ei in Figure 1a to convey that it is an extraneous within-subject bias. Section 3 describes more settings where βWS ≠ βBS.
While for weight it is possible to remove the common shared within-subject bias from height by dividing by ht², this is not the case for less well-understood causal relationships. Therefore, to model and account for a bias such as this, linear regression models fit for making causal inference can decompose the associations into “within-subject” slopes (βk,WS), described above, and “between-subject” slopes (βk,BS), described below, which capture associations of subjects’ central tendencies of the exposure. To do this, subject means of the predictor variables are calculated, where $\bar{X}_{k,i} = \frac{1}{J_i}\sum_{j=1}^{J_i} X_{k,ij}$. Then Yij is modeled as a combination of “between-subject” slopes from $\bar{X}_{k,i}$ (that could be influenced by the common person measurement bias in Figure 1) and “within-subject” slopes from deviations of Xk,ij about $\bar{X}_{k,i}$, which will be free of such a bias, since the comparison is within person.
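$$Y_{ij} = \alpha + \sum_{k=1}^{K} \beta_{k,BS}\,\bar{X}_{k,i} + \sum_{k=1}^{K} \beta_{k,WS}\,(X_{k,ij} - \bar{X}_{k,i}) + \varepsilon_{ij} \qquad (2)$$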
As described for (1), this is a fixed intercept model that is functionally equivalent to a random intercept model for MM. When K = 1, we have $Y_{ij} = \alpha + \beta_{BS}\bar{X}_{i} + \beta_{WS}(X_{ij} - \bar{X}_{i}) + \varepsilon_{ij}$. To illustrate this for our earlier example with Yij = cholesterol and Xij = weight, let α = 30, βBS = 0.9 and βWS = 3, such that $E[Y_{ij}] = 30 + 0.9\,\bar{X}_{i} + 3\,(X_{ij} - \bar{X}_{i})$. If person i had an average value of $\bar{X}_{i}$ = 210 across all Ji measures with the jth measure being Xij = 200, then for the person-visit at time tij, E[Yij] = 30 + 0.9(210) + 3(200 − 210) = 189.
Now we make some technical asides. First, the choice of the observed $\bar{X}_{k,i}$ as the “central tendency” of Xk,ij for subject i is necessary because a person’s “true average weight” over the entire time period (denoted here $\mu_{k,i}$) is unknown, but for Ji large enough, $\bar{X}_{k,i}$ should be close to $\mu_{k,i}$. Thus, while βk,WS only captures association with within-subject change in Xk,ij, βk,BS inherently contains some βk,WS from the deviation of $\bar{X}_{k,i}$ from $\mu_{k,i}$, especially for small Ji. This situation is described for occupational epidemiology research, where often an average of personal exposure measurements is computed as an estimate of the true exposure of a “subject”, defined as either an individual or a group of individuals that share a job [18]. Second, the implicit assumption that βk,WS is well defined may also not always be true. For example, “βk,WS” could differ by time separation tij − tij’. Perhaps for k = weight, a weight gain of 10 lbs. in one month creates a shock that hyper-elevates cholesterol, but a 10-lb. weight gain over 12 months does not, in which case βk,WS for a one-month separation differs from βk,WS for a 12-month separation. Third, if the investigator is only interested in the within-subject slopes, he/she can substitute a different subject intercept (αi, as a fixed effect) for the between-subject slopes in (2), with the model reducing to $Y_{ij} = \alpha_i + \sum_{k=1}^{K} \beta_{k,WS}(X_{k,ij} - \bar{X}_{k,i}) + \varepsilon_{ij}$.
Despite these technical caveats, the within- vs. between-subject decomposition in (2) is used to test whether βk,WS = βk,BS so that, as shown in Section 2C, they also equal βk,CS and thus the separated WS vs. BS decomposition can be collapsed to (1). Due to the orthogonal decomposition of Xk,ij about $\bar{X}_{k,i}$, this test for collapsing the within- vs. between-subject decomposition is a two-sample z-test of parameter estimates from fitted models, comparing $|\hat{\beta}_{k,WS} - \hat{\beta}_{k,BS}| \,/\, \sqrt{\widehat{SE}(\hat{\beta}_{k,WS})^2 + \widehat{SE}(\hat{\beta}_{k,BS})^2}$ to Z1−α/2 [17]. The within- vs. between-subject decomposition is mostly used for inference on adjusted (causal) associations of the Xk,ij’s with the Yij’s. It is typically not used to produce models to estimate future unknown Yij from known Xk,ij, as such estimation often only happens in settings where just one observation per subject is available, hence $\bar{X}_{k,i} = X_{k,ij}$.
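The between-/within-subject decomposition described above can be sketched numerically for a single predictor. The following is a minimal illustration under stated assumptions, not the procedure used for the tables: the data are simulated, every function and variable name is hypothetical, and ordinary least squares is used for the point estimates (for a linear model this matches what an independence working correlation would give for point estimates, but it provides no robust standard errors).

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def between_within_slopes(subject, x, y):
    """Return (cross-sectional, between-subject, within-subject) slopes."""
    total, count = {}, {}
    for s, value in zip(subject, x):
        total[s] = total.get(s, 0.0) + value
        count[s] = count.get(s, 0) + 1
    x_bar = [total[s] / count[s] for s in subject]      # subject mean, one per row
    x_dev = [value - m for value, m in zip(x, x_bar)]   # deviation from subject mean
    # Deviations sum to zero within each subject, so they are orthogonal to the
    # intercept and to x_bar; the simple slopes below equal the joint OLS slopes.
    return ols_slope(x, y), ols_slope(x_bar, y), ols_slope(x_dev, y)

def simulate(n_subjects=300, n_repeats=5, alpha=30.0,
             beta_bs=0.9, beta_ws=3.0, seed=1):
    """Simulated weight/cholesterol-like data (all values hypothetical)."""
    rng = random.Random(seed)
    subject, x, y = [], [], []
    for i in range(n_subjects):
        centre = rng.gauss(180.0, 20.0)
        xs = [centre + rng.gauss(0.0, 10.0) for _ in range(n_repeats)]
        mean_i = sum(xs) / n_repeats
        for value in xs:
            subject.append(i)
            x.append(value)
            y.append(alpha + beta_bs * mean_i
                     + beta_ws * (value - mean_i) + rng.gauss(0.0, 5.0))
    return subject, x, y

subject, x, y = simulate()
beta_cs, beta_bs, beta_ws = between_within_slopes(subject, x, y)
print(f"CS slope {beta_cs:.2f}, BS slope {beta_bs:.2f}, WS slope {beta_ws:.2f}")
```

In this sketch the cross-sectional slope falls between the between-subject and within-subject slopes, so neither causal quantity is recovered by the undecomposed fit.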
We refit the analyses of Table 1 to illustrate that the impact of the choice of correlation structure (i.e., GEE-IND vs. GEE-E working correlation structure) is eliminated in our example after making a within- vs. between-subject decomposition. Please note that there were no new HIV infections after study entry, so $X_{HIV,ij} = \bar{X}_{HIV,i}$ for all j, meaning that the within-subject association of change of HIV infection status cannot be modeled. For within-subject associations of BUN and albumin, GEE-IND and GEE-E gave identical point estimates, because centering about $\bar{X}_{k,i}$ makes comparisons entirely within-subject and invariant to these correlation structure choices (although within-subject estimates could differ slightly if autoregressive (AR(1)) or other formulations for intra-subject correlation of residuals had been used). There were small GEE-IND vs. GEE-E differences in the between-subject slopes, as was observed elsewhere [19]. For example, the point estimate for between-subject HIV status is −1.16; 95% CI (−4.21, 1.88) in the GEE-IND of Table 2 versus −1.57; 95% CI (−4.47, 1.33) with GEE-E.
Table 2.
Variable | Compartment | Independence working correlation: Point Estimate | Independence: 95% CI | Independence: Z-Value (p) | Equicorrelation working correlation: Point Estimate | Equicorrelation: 95% CI | Equicorrelation: Z-Value (p)
---|---|---|---|---|---|---|---
HIV Infection | Between-subject (βHIV,BS) | −1.16 | (−4.21, 1.88) | −0.75 (0.45) | −1.57 | (−4.47, 1.33) | −1.06 (0.29)
HIV Infection | Within-subject (βHIV,WS) | NA 2 | --- | --- | NA 2 | --- | ---
Albumin Per g/dL | Between-subject (βALB,BS) | −3.27 | (−7.88, 1.33) | −1.39 (0.16) | −2.71 | (−7.00, 1.57) | −1.24 (0.21)
Albumin Per g/dL | Within-subject (βALB,WS) | −10.70 | (−12.99, −8.40) | −9.16 (<0.0001) | −10.70 | (−12.99, −8.40) | −9.16 (<0.0001)
BUN Per mg/dL | Between-subject (βBUN,BS) | −2.72 | (−3.10, −2.33) | −13.89 (<0.0001) | −2.65 | (−3.01, −2.08) | −14.21 (<0.0001)
BUN Per mg/dL | Within-subject (βBUN,WS) | −1.11 | (−1.34, −0.88) | −9.31 (<0.0001) | −1.11 | (−1.34, −0.88) | −9.31 (<0.0001)
Quasi-Likelihood Information Criteria (QIC) | | 10,866.64 | | | 10,857.62 | |
1 Mixed models gave essentially similar point estimates; see Appendix A. 2 There is no within-subject variation for HIV infection status.
From now on, we only examine GEE-IND results for within- vs. between-subject decomposition models, as GEE-E results are similar. For BUN and GEE-IND, the within-subject slope $\hat{\beta}_{BUN,WS}$ = −1.11, 95% CI (−1.34, −0.88) is qualitatively and statistically closer to 0 than is the corresponding between-subject slope $\hat{\beta}_{BUN,BS}$ = −2.72, 95% CI (−3.10, −2.33). However, serum albumin goes the other way: the within-subject slope $\hat{\beta}_{ALB,WS}$ = −10.70, 95% CI (−12.99, −8.40) is statistically further from 0 than is the corresponding between-subject GEE-IND slope $\hat{\beta}_{ALB,BS}$ = −3.27, with a 95% CI (−7.88, 1.33) that overlaps 0. The QIC is lower (10,857.62) for equicorrelation than for independence (10,866.64), which perhaps now indicates an advantage to the former correlation structure in this setting where the slopes have been correctly decomposed.
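As a rough check of the statement that these slopes differ statistically, a two-sample z-statistic can be formed as the difference of the within- and between-subject estimates divided by the square root of the sum of their squared standard errors. The sketch below is an illustration only and rests on assumptions not stated in the source: standard errors are back-calculated from the GEE-IND 95% CIs of Table 2 (assuming symmetric normal-theory intervals), the two estimates are treated as independent, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value

def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * Z975)

def slope_difference_z(est_ws, ci_ws, est_bs, ci_bs):
    """z-statistic and two-sided p-value for H0: within slope = between slope."""
    se_ws = se_from_ci(*ci_ws)
    se_bs = se_from_ci(*ci_bs)
    z = (est_ws - est_bs) / sqrt(se_ws ** 2 + se_bs ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# GEE-IND estimates and CIs as reported in Table 2
z_bun, p_bun = slope_difference_z(-1.11, (-1.34, -0.88), -2.72, (-3.10, -2.33))
z_alb, p_alb = slope_difference_z(-10.70, (-12.99, -8.40), -3.27, (-7.88, 1.33))
print(f"BUN: z = {z_bun:.2f}, p = {p_bun:.2g}")
print(f"Albumin: z = {z_alb:.2f}, p = {p_alb:.2g}")
```

Because the inputs are rounded published values, the resulting statistics are approximate and should not be read as results reported by the study.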
One might wonder how to interpret differences in the within- and between-subject slopes for causal inference, including the reasons that these slopes were different. This in part will depend on the hypotheses of interest (and we did not have any for this illustrative example). However, general rules also apply, although we are unaware of any systematic exploration of reasons why the between-subject slopes βk,BS (or βBS for K = 1) could differ from within-subject slopes βk,WS (or βWS for K = 1) and the resultant implications for causal inference. Before outlining these rules, it is important to note a relationship among cross-sectional, within-subject and between-subject slopes.
2C Relationship between βk,CS, βk,BS and βk,WS. Now βk,CS averages βk,BS and βk,WS according to the relative variances of the subject means (i.e., the $\bar{X}_{k,i}$) vs. the variance of the repeated measures about those sample means (i.e., the $X_{k,ij} - \bar{X}_{k,i}$) [17]. For example, with K = 1, if $\sigma^2_{BS}$ is the population variance of the within-person mean $\bar{X}_{i}$ and $\sigma^2_{WS}$ is the population variance of the deviations of the repeat measures Xij from their $\bar{X}_{i}$, then
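$$\beta_{CS} = \beta_{BS}\,\frac{\sigma^2_{BS}}{\sigma^2_{BS} + \sigma^2_{WS}} + \beta_{WS}\,\frac{\sigma^2_{WS}}{\sigma^2_{BS} + \sigma^2_{WS}} \qquad (3)$$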
In the previous example of weight and cholesterol with βBS = 0.9, βWS = 3 and α = 30, if $\sigma^2_{BS}$ = 400 and $\sigma^2_{WS}$ = 100, then from (3) βCS = 0.9*400/(100+400) + 3*100/(100+400) = 1.32. If the between-person sample means are more homogeneous in weight with $\sigma^2_{BS}$ = 200 but the within-person $\sigma^2_{WS}$ is still 100, then again using (3), βCS moves closer to βWS; βCS = 0.9*200/(100+200) + 3*100/(100+200) = 1.60.
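The arithmetic of this variance-weighted average is easy to reproduce. The short sketch below recomputes the two worked values from the text; the function and argument names are hypothetical and chosen only for illustration.

```python
def cross_sectional_slope(beta_bs, beta_ws, var_between, var_within):
    """Variance-weighted average of the between- and within-subject slopes."""
    total = var_between + var_within
    return beta_bs * var_between / total + beta_ws * var_within / total

# Worked weight/cholesterol example from the text
first = cross_sectional_slope(0.9, 3.0, var_between=400.0, var_within=100.0)
second = cross_sectional_slope(0.9, 3.0, var_between=200.0, var_within=100.0)
print(round(first, 2))   # 1.32
print(round(second, 2))  # 1.6
```

Shrinking the between-subject variance while holding the within-subject variance fixed moves the cross-sectional slope toward the within-subject slope, as the second call shows.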
2D Working Correlation Structures for Model Residuals Other than Independence Can Lead to Unusable Results for Cross-Sectional Regression. As noted earlier, fitting both MM and GEE repeated measures regression models involves specification of the correlation (or working correlation) structure of εij within the same subject i. We denote the working correlation structure by matrix Vi. Typical choices for Vi are the ones we used in the illustrative examples of Table 1 and Table 2: equicorrelation (E), with the correlation of εij and εij’ for j ≠ j’ always the same value ρ (this common value of ρ is estimated from the residuals during model fitting), and independence (IND), with the correlation of εij and εij’ ≡ 0. However, other structures are used, such as AR(1), where the correlation of εij and εij’ is $\rho^{|j-j'|}$ with the value of ρ being estimated from the residuals [1]. Again, current guidance [5,6,7,8,9,10,11,12,13] emphasizes choosing the Vi that most closely fits the true covariance structure of the residuals within i and/or that is favored by model fit criteria, such as having the lowest QIC for GEE and AIC for MM, because doing so often improves precision of the model parameter estimates. However, we just observed that this approach may be wrong for CS regression, because using any correlation structure other than IND can introduce structural bias into $\hat{\beta}_{k,CS}$ [20,21] and, unfortunately, AIC and QIC do not account for this bias.
To that end, Pepe and Anderson (1994) [20] developed a general rule, which we now present, for when IND is (and is not) the only correlation structure that should be used for CS regression. Specifically, they show that if a predictor varies (i.e., takes on different values) within the same subject i and,
$$E[Y_{ij} \mid X_{1,ij}, \ldots, X_{K,ij}] \neq E[Y_{ij} \mid X_{1,ij}, \ldots, X_{K,ij},\; X_{1,ij'}, \ldots, X_{K,ij'}] \quad \text{for some } j' \neq j \qquad (4)$$
then, no matter what the true correlation structure of εij among repeated measures within a subject is, GEE-IND gives unbiased estimates for βk,CS, but any MM or GEE model not using Vi = IND gives biased estimates of βk,CS. Thus, the only working correlation structure that should be used to estimate βk,CS is Vi = IND. However, if (4) does not hold, then any working correlation structure obtains unbiased estimates for βk,CS, in which case choosing the Vi that most accurately fits the correlation structure of εij minimizes the variance of $\hat{\beta}_{k,CS}$.
Our paper only focuses on equicorrelation as the alternate to independence in order to keep the presentation from becoming too cumbersome, given the large number of possible correlation structures. However, the previous paragraph and (4) apply to any non-independence correlation structure.
As one (of many) examples of where (4) holds, let K = 1 and let Yij and Xij be the degree of airway obstruction and inhalation of tobacco smoke of subject i at time j, respectively. One would expect that, because the effect of smoking on the lung is cumulative, historical smoking in a current smoker or non-smoker would lead to poorer lung function. Thus, E[Yij|Xij’] for a smoker at time j’ < j would be poorer irrespective of Xij.
We now present an easier way to visualize (4). If repeated measures j and j’ are thought of as “siblings” and the predictors as “exposures”, then (4) means that even after considering the “self-exposure” of the current measure j through Xk,ij, the outcome Yij has “Conditional Dependence On Sibling Exposures” (Co-DOSE) (i.e., on Xk,ij’ for j’ ≠ j). Thus, the sibling exposure could be thought of as a Co-DOSE beyond the “dose” from the “self-exposure”. Hence, from now on we use the term Co-DOSE to denote that (4) occurs.
Also, while this point has not been very well made, for CS regression, Co-DOSE in (4) largely occurs if and only if within- and between-subject slopes differ. If within- and between-subject slopes differ for any predictor (i.e., βk,WS ≠ βk,BS for some k), then Co-DOSE (4) happens. However, if the within- and between-subject slopes are equal for all predictors (i.e., βk,WS = βk,BS for all k), then Co-DOSE (4) does not occur. More details on this and an illustration are given in Appendix B, but one trivial case arises if the predictors are invariant within the same subject (i.e., $X_{k,ij} = \bar{X}_{k,i}$ for all j) such that the within-subject slopes are not defined (since $X_{k,ij} - \bar{X}_{k,i} \equiv 0$) and for the same reason Co-DOSE in (4) cannot occur. While the mathematical details are beyond this paper, if βk,WS ≠ βk,BS and Vi ≠ IND, then the non-zero covariance ρ > 0, besides adjusting for within-i collinearity of εij, also over-weights βk,WS relative to βk,BS in (3), thereby pushing CS regression parameter estimates away from βk,CS towards βk,WS [17]. Since robust covariance methods exist to adjust for the impact on variance estimates of misspecifying Vi = IND when the residuals εij are collinear, in particular for GEE [1], Vi = IND can eliminate bias in estimating βk,CS while providing conservative variances for the parameter estimates.
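The direction of this bias can be illustrated with a simplified calculation. The sketch below is not derived in the paper; it assumes a balanced design (the same J for every subject), a single predictor, and a fixed working equicorrelation ρ, under which generalized least squares weights between-subject variation by (1 − ρ)/(1 + (J − 1)ρ) relative to within-subject variation (a standard result for exchangeable correlation). The function names and the choice J = 5 are hypothetical; ρ = 0.45 is borrowed from the residual correlation reported with Table 1, and the slopes and variances are those of the weight and cholesterol example.

```python
def between_weight(rho, n_repeats):
    """Relative weight given to between-subject variation (1.0 when rho = 0)."""
    return (1.0 - rho) / (1.0 + (n_repeats - 1) * rho)

def weighted_cs_slope(beta_bs, beta_ws, var_between, var_within, rho, n_repeats):
    """Limit of the cross-sectional slope under working equicorrelation rho."""
    w = between_weight(rho, n_repeats)
    numerator = w * var_between * beta_bs + var_within * beta_ws
    return numerator / (w * var_between + var_within)

# Independence (rho = 0) reproduces the variance-weighted average of the slopes
slope_ind = weighted_cs_slope(0.9, 3.0, 400.0, 100.0, rho=0.0, n_repeats=5)
# Equicorrelation down-weights between-subject variation
slope_equi = weighted_cs_slope(0.9, 3.0, 400.0, 100.0, rho=0.45, n_repeats=5)
print(f"independence: {slope_ind:.2f}, equicorrelation: {slope_equi:.2f}")
```

Under these assumptions the equicorrelation value lies between the independence value and the within-subject slope, which is the same direction of movement seen for BUN and albumin when comparing Table 1 with Table 2.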
2E Implications for Applied Research and Statistical Practice. Much of what has been presented above is not commonly understood and implemented in applied research and statistical practice. CS models are typically fit, with βk,CS interpreted to also be βk,BS and βk,WS, without checking if these slopes are equal. Non-independence Vi is often used for CS regression without checking if Co-DOSE (in (4)) exists. Perhaps in part this occurs because systematic epidemiological descriptions of causal mechanisms for why between- and within-subject slopes can differ are lacking, which hinders awareness of this possibility. We endeavor to fill this gap in Section 3.
3. Epidemiological Reasons for Between- and Within-Subject Slopes to Differ
To make it easier for investigators to identify what could cause βk,WS ≠ βk,BS (or equivalently Co-DOSE) in a given setting, we classify major reasons why this can happen. For simplicity, let K = 1 unless otherwise noted, as the following principles extend to multivariate settings.
3A. Change Effects. We propose that a longitudinal within-subject change in the predictor X could have a greater (or lesser) direct impact on Y than a long-standing difference in X between two different subjects (hence βWS ≠ βBS) and define this as a (c.f. short term) “change effect”. Returning to the example of weight and cholesterol, consider two identical twins: “A” has lived his adult life at $\bar{X}_{A}$ = 190 lbs. and “B” at $\bar{X}_{B}$ = 180 lbs. If “B” undergoes a short-term weight gain of 10 lbs. to 190 ($X_{Bj} - \bar{X}_{B}$ = 10), assuming $\bar{X}_{B}$ is not impacted by the rapid change, while A remains at 190 lbs. ($X_{Aj} - \bar{X}_{A}$ = 0), the shock or corollaries of this rapid change in B may raise his cholesterol level above that of A even though they both now weigh 190 lbs., meaning that βWS > βBS and Co-DOSE in (4) occurs. However, as was mentioned in Section 2B, in this setting βWS would be somewhat undefined if, e.g., a 10-lb. gain in a shorter time period (e.g., 1 month) increases βWS more than does a 10-lb. gain over a longer time period (e.g., 12 months).
3B Lag Causality of X on Future Y. The effect of historical levels of X on Y may independently project into the future (i.e., beyond the effect of the current level of X). For example, consider an HIV-infected person and two time points t1 < t2; let Xij be HIV viral load and Yij be CD4 count. High HIV levels destroy CD4 blood cells into the future. Therefore, as illustrated in Figure 2a, high HIV viral load at t1 may affect CD4 loss from t1 to t2, so that even if the person’s HIV viral load is low at t2, the high viral load at t1 is predictive of lower CD4 at t2 through having created more CD4 destruction between t1 and t2 (i.e., lag causality of X at t1 on Y at t2). Thus, Yi2|Xi2 at t2 is not independent of Xi1 at t1; Co-DOSE in (4) occurs and the within- and between-subject slopes differ (βWS ≠ βBS). In Figure 2a,b, Ei2 denotes that Xi2 differs from Xi1 due to an extraneous process that is causing Xi to change over time. Lag causality is often considered when serial measures are obtained of X representing long-term environmental exposures (such as air pollution and cigarette smoke) that affect chronic conditions Y (such as lung function) [1,18,22].
3C Reverse-Lag Causality of X on Future Y. The setting in Section 3B also manifests in the opposite direction if X is being used to estimate Y that is causal for future X. Reversing the previous example, with X now being CD4 count used to predict HIV viral load as Y, as Figure 2b illustrates, high viral load (Yi1) at t1 may have degraded the CD4 count from t1 to t2. Thus, Yi1|Xi1 at t1 is not independent of Xi2 at t2: Co-DOSE in (4) occurs and within- and between-subject slopes differ (βWS ≠ βBS).
3D Spillover Causality of X on Adjacent Y. An analogous setting to those of 3B and 3C can also manifest in repeated measures cross-sectional settings based on geographical proximity. Let the subjects i now be cities and let j enumerate different neighborhoods in these cities. The repeated measures are average air pollution (Xij) of neighborhood j in city i and average lung function of all residents living within neighborhood j of city i (Yij). A resident living in neighborhood j may work in a different neighborhood j’ of the same city i and thus have “spillover exposure” to air in the neighborhood they work in. Thus, Yij|Xij is not independent of Xij’ and hence Co-DOSE in (4) occurs.
3E Common Within-Subject Measurement Bias. Shared within-subject measurement bias occurs if all repeat measures from the same subject have the same correlated measurement bias. This was the setting described in Section 2B and Figure 1a with weight as exposure for cholesterol. Here with weight as a surrogate for body fat, the measurement bias was mediated by height with taller adults being heavier independently of body fat than shorter adults, which leads to βWS > βCS and Co-DOSE in (4) when weight was a predictor of cholesterol. In this setting, height is a measurement bias not a confounder as height itself is not associated with cholesterol. We now present a similar setting where the un-modeled variable is a confounder.
3F Common Within-Subject Confounding. Figure 1b shows common within-subject confounding, which causes βWS ≠ βCS and Co-DOSE in (4). This phenomenon is diagrammatically similar to the common measurement bias that was described in Section 2B. However, rather than a common measurement bias, the extraneous factor shared by the repeated measures of the same subject is a confounder that is associated with both X and Y. For example, let the confounder variable Ci be the sex of subject i (which does not change with j) and not be in the model, and let the outcome Yij be a linear score for male pattern baldness at time j, with Xij again being weight at time j. Adult men are both on average heavier and, independently of weight, have greater male pattern baldness than do adult women. So Ci is associated with both the exposure and the outcome. Here a 10-lb. weight difference between two adults, but not a within-adult increase of 10 lbs., could be informative of the heavier adult more likely being male. Hence for this example, βWS = 0 (assuming within-adult weight change does not influence baldness), but βBS > 0, as males are more likely to be both heavier and bald compared to women. Hence also βCS > 0, reflecting unaddressed between-subject confounding from heavier adults more likely being men.
Similarly, Mancl, Leroux and DeRouen proposed that in a study with repeated dental predictor and outcome pairs (Xij,Yij) measured on teeth (i.e., enumerated by j) in the same persons (i.e., enumerated by i), better compliance with dental treatment by some persons was a confounder that could lead to differences in slopes within and between subjects [19]. In a non-longitudinal setting where i denotes clusters (for example schools) and j denotes repeated subjects within that cluster (for example students), common within-subject confounding is referred to as “contextual effects” [23,24]. For example, as Robinson (1950) [14] observed, when X was race of the student and Y was achievement-score, a higher X̄i (here: the proportion of a school’s students who were non-White) indicated weaker financial support for that school (weaker financial support being the confounder) and thus worse achievement-scores overall for that school: βBS was negative. However, within the same school, race had no impact on the achievement score (βWS = 0). Begg and Parides [25] identify a similar setting in birthweight and intelligence quotient in families.
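The common within-subject confounding setting of Section 3F can be checked with a small simulation. The sketch below is illustrative only: the effect sizes, variances, sample sizes and function names are assumptions chosen for the example, not values from the paper. A subject-level binary confounder Ci shifts both the subject's usual X and the outcome Y, while Y never depends on X itself, so one would expect a within-subject slope near 0 and positive between-subject and cross-sectional slopes.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Simple-regression slope of y on x: Cov(x, y) / Var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_shared_confounder(n_subjects=2000, n_reps=4, seed=1):
    """Simulate X_ij and Y_ij that are linked only through an unmodeled C_i."""
    rng = random.Random(seed)
    x_raw, x_bar, x_dev, y_all = [], [], [], []
    for _ in range(n_subjects):
        c = 1 if rng.random() < 0.5 else 0            # hypothetical confounder C_i
        usual = 160.0 + 30.0 * c + rng.gauss(0, 15)   # C_i shifts the usual level of X
        xs = [usual + rng.gauss(0, 5) for _ in range(n_reps)]
        ys = [2.0 * c + rng.gauss(0, 1) for _ in xs]  # Y depends on C_i only, not on X
        m = mean(xs)
        for x, y in zip(xs, ys):
            x_raw.append(x)
            x_bar.append(m)
            x_dev.append(x - m)
            y_all.append(y)
    return {
        "WS": ols_slope(x_dev, y_all),  # Y on (X_ij - subject mean)
        "BS": ols_slope(x_bar, y_all),  # Y on the subject mean
        "CS": ols_slope(x_raw, y_all),  # Y on X_ij, ignoring subject
    }


slopes = simulate_shared_confounder()
print({name: round(value, 4) for name, value in slopes.items()})
```

Because the within-subject deviations are orthogonal to the subject means, the pooled slope computed this way is a weighted average of the within- and between-subject slopes, so it always falls between them.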
3G Measurement Error in Xij Makes E[Yij|Xij] Dependent on Xij’. In many settings, the predictor we observe is X = TX + M, where TX is the true value of the predictor and M is measurement error that is independent of TX (i.e., classical measurement error). It has been shown that measurement error in X that is either independent of Y [26] or correlated with Y [27] biases estimates for the slope that relates TX with Y. Measurement error can arise either from imprecision in an analysis instrument, such as a machine quantifying components of serum, or in the data collection process, such as the chemical composition of blood samples being non-informatively influenced by diurnal and other nuisance processes. If Xij is incorrectly quantified due to such measurement error, then Co-DOSE in (4) occurs and the observed within- and between-subject slopes differ because, as illustrated in Appendix C, the biases created by the measurement error distribute differentially to the different slopes. As Figure 3 shows, and the paragraph below it describes using an illustrative example, if Xi1 incompletely measures the true state TXi1 (i.e., true BUN) due to classical measurement error as the extraneous influence, then Xi2 is informative for TXi1 even after considering Xi1. Please note that in Figure 3 there are two time subscripts on the extraneous influence, because Ei1 and Ei2 are two independent measurement errors.
For example, going back to the analysis of Table 1, let Xij be BUN and Yij be EGFR. Consider two persons who each have BUN of Xi1 = 10 mg/dL measured with error today. Also assume that the true BUN state changes slowly. If, after 6 months, one of these persons measures Xi2 = 20 mg/dL while the other measures Xi2 = 5 mg/dL, we can surmise that, since BUN changes slowly, it is more likely that the true BUN today (TXi1) of the former person is > 10 mg/dL and that of the latter is < 10 mg/dL. Thus, since (i) EGFR (Yi1) directly depends on TXi1, not Xi1, and (ii) Xi2 is informative on TXi1 after considering Xi1, then (iii) Yi1|Xi1 is not independent of Xi2, and similarly Yi2|Xi2 is not independent of Xi1, meaning Co-DOSE in (4) occurs and the observed within- and between-subject slopes differ. Appendix C shows that measurement error in the exposure that is independent of the outcome pushes both βWS and βBS towards 0, but more so for βWS. Such tempering from averaged measurement error has been proposed as a reason |βWS| < |βBS| was observed in dental research [19] and occupational epidemiology [28,29].
However, if Mij is correlated with Yij (most likely by being correlated with measurement error on Yij [27]), the tempering of the β’s from Mij will not be towards 0. For example, consider TX = CD8 and TY = CD4 cells, which together are the almost exclusive components of serum lymphocytes (TZ) (i.e., TZ ≈ TX + TY). Physiologically, TZ is constrained so as to create a negative βBS, βWS and βCS for TYij on TXij: subjects with a higher CD8 component of serum lymphocytes must conversely have a lower CD4 component. However, the measured lymphocyte count (Z) is subject to a correlated measurement error that spreads equally onto X and Y. For example, if a person is dehydrated, the entire measured lymphocyte portion of blood (meaning both CD8 = X and CD4 = Y) becomes artificially higher due to reduction of the percentage of water in the blood. If a person has a high (or low) measured lymphocyte count Zij = TZij + Mij due to such measurement error, then Mij contributes to both CD8 (Xij) and CD4 (Yij), making both simultaneously artificially higher (or lower). Consequently, within person, a higher measured CD8 count due to positive Mij is associated with higher measured CD4. Because in this case the measurement error is shared, naïve regression analysis tends to draw βWS towards being positive. On the other hand, βBS, which tempers down Mij on both X and Y through averaging as shown in Appendix C, is less affected by the shared bias due to measurement error.
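The shared-error mechanism in the CD8/CD4 example can be sketched numerically. In the simulation below everything is an assumption made for illustration: the variables are in arbitrary centered units, the true slope of TY on TX is set to exactly −1, and the same error Mij is simply added to both X and Y. Under these assumptions the observed within-subject slope is pulled to a positive value, while the between-subject slope stays close to the true negative slope because averaging over replicates shrinks the contribution of Mij.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Simple-regression slope of y on x: Cov(x, y) / Var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_shared_error(n_subjects=2000, n_reps=5, seed=2):
    """Observed slopes when one error term M_ij is added to both X and Y."""
    rng = random.Random(seed)
    x_bar, x_dev, y_all = [], [], []
    for _ in range(n_subjects):
        centre = rng.gauss(0, 2.0)            # subject's usual true X (variance 4)
        xs, ys = [], []
        for _ in range(n_reps):
            tx = centre + rng.gauss(0, 0.5)   # true X at this visit (within variance 0.25)
            ty = -tx                          # assumed true relation: slope of -1
            m = rng.gauss(0, 1.0)             # shared measurement error (variance 1)
            xs.append(tx + m)                 # observed X
            ys.append(ty + m)                 # observed Y carries the same error
        mx = mean(xs)
        for x, y in zip(xs, ys):
            x_bar.append(mx)
            x_dev.append(x - mx)
            y_all.append(y)
    return {"WS": ols_slope(x_dev, y_all), "BS": ols_slope(x_bar, y_all)}


shared = simulate_shared_error()
print({name: round(value, 3) for name, value in shared.items()})
```

With these assumed variances, the large-sample within-subject slope is (−0.25 + 1)/(0.25 + 1) = 0.6, and the between-subject slope is (−4 − 0.05 + 0.2)/(4 + 0.05 + 0.2) ≈ −0.91, so the two slopes have opposite signs even though the true relation is negative at both levels.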
We have only considered classical measurement error so far. The other common type of measurement error is known as Berkson error [30]. It is approximated by some exposure assessment procedures commonly used in environmental and occupational epidemiology (see semi-ecological design and group-based exposure assessment) [18]. While this is an aside to the main points of this paper, when Berkson measurement error exists, only the between-subject slope, βBS, is estimable. More details are in Appendix D.
4. Predictors Having Co-DOSE Will Bias Adjusted Parameter Estimates of Other Predictors Not Having Co-DOSE When Included Together in Cross-Sectional Regression When Vi ≠ IND Is Used
Going back to Table 1, it was shown earlier that the point estimate from GEE-IND for the adjusted cross-sectional association of HIV with EGFR is still consistent for βHIV,CS. However, HIV infection status was constant over all replicates within the same subject, and therefore cannot have Co-DOSE in (4), as the entire effect of HIV is mediated between-subject, not within-subject. Consequently, the question arises whether the adjusted estimate from a non-independence correlation structure (say, for example, β̂HIV,CS-E) can be biased for βHIV,CS. Please note that for this section, we use β̂XXX,CS and β̂XXX,CS-E to denote estimates of the adjusted cross-sectional association for variable XXX from models using independence and equicorrelation structures, respectively. The added designation of “E” (CS-E) in the subscript for equicorrelation, but none for independence correlation, is made because the equicorrelation estimate (but not the independence estimate) can be asymptotically biased. The specific question addressed here is: could including BUN and albumin, which each have Co-DOSE, in the model bias the corresponding estimate for the cross-sectional adjusted HIV association from using equicorrelation (β̂HIV,CS-E) so that it is no longer consistent for βHIV,CS in the multivariate model, even though HIV itself does not have Co-DOSE? This is important because, in Table 1, β̂HIV,CS of −2.04, 95% CI (−5.07, 0.98), qualitatively differs from β̂HIV,CS-E of −3.96 (−6.90, −1.03), with only β̂HIV,CS-E statistically (p < 0.01) differing from 0.
We believe that β̂HIV,CS-E for HIV in Table 1 is biased away from βHIV,CS. To help make this point, Table 3 presents normative data broken down by HIV status of the subjects. First, we note from Table 1 that β̂BUN,CS-E is biased higher for βBUN,CS (with GEE-E β̂BUN,CS-E = −1.22 > β̂BUN,CS = −1.87, p < 0.0001 from GEE-IND), while from Table 3, those who are HIV+ have higher mean BUN (12.94 vs. 12.10, p < 0.0001 from GEE-IND). Thus, the full apparent “negative effect” of the higher BUN in HIV+ subjects from β̂BUN,CS is underestimated by β̂BUN,CS-E, and this pushes β̂HIV,CS-E down to compensate. Second, similarly, also from Table 1, β̂ALB,CS-E is biased lower for βALB,CS (with GEE-E β̂ALB,CS-E = −9.84 < β̂ALB,CS = −6.21), while from Table 3, HIV+ individuals have lower mean albumin (3.97 vs. 4.14, p < 0.0001 from GEE-IND). Thus, the apparent “positive effect” of the lower albumin in HIV+ subjects from β̂ALB,CS is overestimated by β̂ALB,CS-E, which pushes β̂HIV,CS-E further down to compensate. Now we consider these two biases together, as illustrated in Figure 4. These two deficits act jointly to push β̂HIV,CS-E downwards from the true adjusted βHIV,CS. Therefore, non-independence Vi can bias multivariate cross-sectional parameter estimates of variables that do not carry Co-DOSE in (4) when other variables in the model carry Co-DOSE.
Table 3.
Variable | HIV+ Subjects (496 Persons, 7326 Replicates) | HIV− Subjects (178 Persons, 3456 Replicates) |
---|---|---|
EGFR | 90.3 ± 27.2 | 92.4 ± 25.0 |
BUN | 12.94 ± 5.71 | 12.10 ± 5.30 |
Albumin | 3.97 ± 0.44 | 4.14 ± 0.36 |
5. Discussion
Numerous published papers fit GEE and MM cross-sectional regression models with repeated measures having time-varying predictors that either use non-independence working correlation structures or do not state the correlation structure. These papers, which continue to be published, do not show awareness of the points presented in Sections 1–4 above. Specifically, they:
(a) Neither specify whether the coefficients of interest are βWS, βBS or βCS, nor check whether βWS = βBS;
(b) Make potentially invalid interpretations of β̂CS from MM and GEE using non-independence correlation Vi’s; and/or
(c) Do not justify the choice of non-independence working correlation structures Vi in light of potential differences between βWS, βBS and βCS.
We have identified almost 45 such papers including some authored by us prior to becoming aware of these issues. This is almost certainly only a fraction of the total number of such papers.
Yet papers published up to 65 years ago either warn against using non-independence working correlation structures in cross-sectional regression with repeated measures [19,20], or instruct to decompose the associations into within-subject (βWS) and between-subject (βBS) slopes to make causal inference [14,15,16,17]. Numerous examples where βWS ≠ βBS have been presented [14,15,16,17,18,19,20,22,23,24,25]. While it was not covered in our paper, this includes fitting GEE models of binary outcomes, where the issues discussed here also apply [19,31]. However, these points are still not well known or emphasized in statistical software documentation and papers providing guidance on GEE and MM analyses (e.g., [5,6,7,8,9,10,11,12,13]).
One problem that impedes acceptance of within- and between-subject decomposition is that it necessitates much more complicated models that are difficult to explain. Still, some air pollution epidemiologic studies have attempted within- and between-subject decompositions using cities as the subject and neighborhoods as the repeated measures within the city [32,33,34]. Most often in these studies, the magnitude was greater for the within-subject slope (|βWS| > |βBS|), but sometimes |βBS| > |βWS| was observed, meaning that possibly multiple causes for slope differences are involved. Those papers that did attempt to explain the reasons for the differences described only “common within-subject confounding” (Section 3F) as a potential reason, such as un-modeled pollutants that were correlated between (but not within) cities with the modeled pollutants of interest. Other studies in environmental research have considered the mechanism described in Section 3B, namely, lag causality in longitudinal analyses of the association of air pollution with health measures [1]. Nevertheless, having to explain complicated and unknown mechanisms for biases such as these can appear to detract from the main purpose of the research and cast doubt on the overall findings, making the paper harder to publish. In other words, applied researchers appear to have neither incentive nor guidance on how to engage with these issues.
We concur with others [19,20] that cross-sectional regression with repeated measures should use independence as the default working correlation unless justification is given to use another Vi. While non-independence Vi can improve precision and thus be desirable [21], they can considerably bias estimates for cross-sectional parameters, βCS, including perhaps towards what the investigator wants to see. For example, in Table 1, p < 0.01 was observed for the association of HIV with worse EGFR in GEE-E, compared to the more appropriate p = 0.19 from GEE-IND. An investigator who was expecting HIV to be associated with worse EGFR might thus be tempted to use the results from GEE-E for this reason.
While showing this is beyond the scope of our paper, when Vi is not independence, factors such as the values of Ji and the magnitude/structure of εij strongly influence parameter estimate values for βCS from the misfitted cross-sectional models, allowing the misfitted estimate to range arbitrarily from βWS to βBS [17]. Standardization is important and, as such factors will arbitrarily vary between studies, parameter estimates of βCS become harder to compare across studies when Vi differs at the discretion of investigators. Therefore, the working correlation structure used in cross-sectional regressions using repeated measures should always be justified and reported.
We also concur with others [14,15,16,17,18,19,23,24,25] that, despite the difficulties in identifying why within- and between-subject slopes differ, causal inference analyses with repeated measures should initially make such decompositions. Investigators should then be wary if there are qualitative differences between βWS and βBS. For example, Table 2 with 584 subjects and 10,782 measurements demonstrated the need for βWS, βBS decomposition to make causal inference (as well as for using GEE-IND in cross-sectional regression). However, a smaller study could have been less clear-cut. If the same point estimates for βWS and βBS seen in Table 2 were observed but did not statistically differ, one would be tempted to merge βWS and βBS into a combined βCS, at least for some variables, because standard model-fitting practice promotes parsimony when statistical significance is not observed. This would be particularly true if, for a given variable k, neither βk,WS nor βk,BS statistically differed from 0, but βk,CS did. If such collapsing is done, it may still be important to report βWS and βBS for comparison to future studies and to target potential mechanisms for within- and between-subject slope differences as described in Section 3.
Unfortunately, the within- and between-subject slope decomposition expands the required analyses and presentation. Statistical software mostly does not have standard subroutines to do this. Decomposition can be tedious if X̄k,i must be recalculated to maintain the orthogonal decomposition of Xk,ij as new models are fit and observations are excluded from the Ji due to missing values of newly included variables. The fact that the X̄k,i are ill-defined, being averages of the observed Xk,ij rather than true means for subject i, creates confusion about the interpretation of βBS, which can also be influenced by within-subject slopes, as was noted in Section 2B.
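The bookkeeping described above can be sketched in a few lines. The helper below is a minimal illustration, not the procedure used for Table 2: the function name, the dictionary-based row layout and the `_bs`/`_ws` suffixes are all hypothetical. It first restricts to rows that are complete for every predictor in the current model and only then computes each subject's mean and the deviations from it, so the between- and within-subject terms stay orthogonal for the observations actually used in the fit.

```python
from collections import defaultdict


def decompose(rows, subject_key, predictor_keys):
    """Add <k>_bs (subject mean) and <k>_ws (deviation) using complete cases only."""
    complete = [r for r in rows
                if all(r.get(k) is not None for k in predictor_keys)]
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for r in complete:
        counts[r[subject_key]] += 1
        for k in predictor_keys:
            sums[r[subject_key]][k] += r[k]
    out = []
    for r in complete:
        new = dict(r)
        for k in predictor_keys:
            subject_mean = sums[r[subject_key]][k] / counts[r[subject_key]]
            new[k + "_bs"] = subject_mean          # between-subject term
            new[k + "_ws"] = r[k] - subject_mean   # within-subject term
        out.append(new)
    return out


# Made-up rows: subject 1 is missing albumin at the second visit.
rows = [
    {"id": 1, "bun": 10.0, "alb": 4.0},
    {"id": 1, "bun": 14.0, "alb": None},
    {"id": 1, "bun": 12.0, "alb": 4.2},
    {"id": 2, "bun": 20.0, "alb": 3.5},
]
bun_only = decompose(rows, "id", ["bun"])             # uses all three visits of subject 1
bun_and_alb = decompose(rows, "id", ["bun", "alb"])   # drops the incomplete visit
print(bun_only[0]["bun_bs"], bun_and_alb[0]["bun_bs"])
```

In this toy data the subject mean of BUN for subject 1 changes once albumin enters the model, because the visit with missing albumin is dropped. That is the recalculation the paragraph above refers to.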
When βWS and βBS differ, the causal mechanisms as to why this happens should be explored. For example, in our analysis presented in Table 2 with EGFR as the outcome, for BUN the between-subject slope βBUN,BS (from GEE-IND) was statistically further from 0 in the expected direction of association than was the within-subject slope βBUN,WS. However, albumin went the other way: the between-subject slope βALB,BS was statistically closer to 0 than was the within-subject slope βALB,WS, with again both slopes being in the expected direction from zero. So what are the potential reasons for this? While lag/reverse-lag causality (Section 3B,C) between BUN and creatinine (the main component of calculated EGFR) could reduce the magnitude of βBUN,WS vs. βBUN,BS, this was unlikely given that the separation of visits was 6 months and internal biochemistry operates over shorter time periods. However, independent measurement error on BUN (Section 3G) would temper |βBUN,WS| towards 0 relative to |βBUN,BS|. To that end, several articles find greater coefficient of variation [35,36], within-person change [35,36], assay error [36], and sample degradation for BUN vs. albumin measures [37], all of which could reflect BUN having larger independent measurement error than does albumin, which would selectively attenuate βBUN,WS towards 0 (i.e., more than it did βBUN,BS). Conversely, serum creatinine and albumin are both constrained into the intravascular fluid compartment and will non-informatively increase together with greater hydration and decrease with less hydration of this compartment, inducing positively correlated measurement error, as in the case for measured CD4 and CD8 cells in Section 3G. As creatinine factors inversely into the EGFR calculation, this would constitute negative correlation of measurement error between albumin and EGFR and selectively bias βALB,WS to be more negative than βALB,BS. However, BUN, which crosses all body compartments, is less subject to such correlation in measurement error with creatinine and thus with EGFR.
As is illustrated in the previous paragraph, we believe that the systematic epidemiological description in Section 3 of reasons for within- and between-subject slopes to differ will provide some basis for future studies to explore this. That may lead to greater recognition and understanding of this phenomenon. However, our list of reasons for these slopes to differ may not be exhaustive. Furthermore, these mechanisms are quite complicated, and only limited resources may be available to investigate them in a given study, given the other tasks that need to be done and limited funding/personnel.
When between- and within-subject slopes differ, βWS ≠ βBS, it is unclear which is the “least confounded or biased”, including the possibility that “averaging” the different biases in each would make βCS the least biased. There may be a heuristic perception that, by “matching within the same subject”, βWS is superior to βBS and βCS, but this is not necessarily true, as measurement error in X (Section 3G) and lag/reverse-lag and spillover causality (Section 3B–D) can in fact bias βWS to a larger degree than they do βBS and βCS.
6. Conclusions
It has been known for decades by some that when exposures vary within subjects in repeated measures regression then, (i) cross-sectional regression using Vi = independence working correlation should be the default for building a model to estimate a future unknown Y as the goal, and (ii) within- and between-subject decompositions of slopes should at least initially be fit when building models for causal inference. Yet this advice rarely makes it into published guidelines and hence is not heeded, perhaps in part due to complexity of the settings where within- and between-subject slopes differ and limited substantive study of the mechanisms that cause such differences. In general, analysts should explore and quantify reasons for biases that can occur in such study designs. To that end, analyses using repeated measures regression should investigate if within- and between-subject slopes differ and when they do, try to identify the reasons for this.
Acknowledgments
Data in this manuscript were collected by the Women’s Interagency HIV Study (WIHS) Collaborative Study Group at New York City/Bronx Consortium, which was funded by the National Institute of Allergy and Infectious Diseases UO1-AI-35004. We are indebted to the participants of this study, many of whom have now devoted over 15 years of their life to this effect.
Abbreviations
AIC | Akaike Information Criterion |
AR(1) | Autoregressive Order 1 |
BS | Between-Subject |
BUN | Blood Urea Nitrogen |
Co-DOSE | Conditionally Dependent On Sibling Exposure |
CS | Cross-sectional |
E | Equicorrelation |
EGFR | Estimated Glomerular Filtration Rate |
GEE | Generalized Estimating Equations |
IND | Independent |
MM | Mixed Models |
QIC | Quasi-likelihood Information Criterion |
WIHS | Women’s Interagency HIV Study |
WS | Within-Subject |
Appendix A. Results of Our Example Obtained Using Mixed Models
While mixed models are not appropriate for the cross-sectional regression of this example (and often are not appropriate for cross-sectional regression using repeated measures in general), they are often used for this purpose in practice. We have made the case that the biases described in this paper as applying to GEE for CS regression also apply to mixed models CS regression. Thus, Table A1 below presents the parameters for CS regression of the Bronx WIHS example in Table 1 as estimated by mixed models using independence and equicorrelation working correlation structures. The reader can confirm that the parameter estimates from the mixed models in Table A1 are almost identical to those from the GEE model with the same correlation structure in Table 1. This includes that the estimates obtained using independence correlation here are also qualitatively different from those obtained using equicorrelation. We caution the reader, however, that the confidence intervals and p-values reported in these tables are meaningless irrespective of the biases reported on in this paper, since, unlike GEE, mixed models are not robust to misspecification of the correlation structure. Misspecification of the correlation structure in this example is clearly the case for independence, although such a claim is more debatable for equicorrelation.
Table A1.
Variable | Independence: Point Estimate | Independence: 95% CI ¹ | Independence: Z-Value (p) ¹ | Equicorrelation: Point Estimate | Equicorrelation: 95% CI ¹ | Equicorrelation: Z-Value (p) ¹ |
---|---|---|---|---|---|---|
HIV Infection (βHIV,CS) | −2.04 | (−5.04, −1.04) | −4.02 (<0.0001) | −3.99 | (−7.04, −0.93) | −2.55 (0.01) |
Albumin Per g/dL (βALB,CS) | −6.21 | (−7.30, −5.11) | −11.04 (<0.0001) | −9.89 | (−11.03, −8.73) | −16.90 (<0.0001) |
BUN Per mg/dL (βBUN,CS) | −1.87 | (−1.95, −1.79) | −44.68 (<0.0001) | −1.22 | (−1.30, −1.13) | −29.08 (<0.0001) |
Akaike Information Criteria (AIC) | 99,374.5 | | | 94,934.5 | | |
1 The confidence interval and p-values for independence working correlation structure in particular but also arguably for equicorrelation as well overestimate the precision of the parameter estimates. Unlike GEE, mixed models are not robust to misspecification of the correlation structure.
Mixed models may be more appropriate for within- and between-subject decomposition models than they are for CS regression, provided the correct correlation structure of the residuals is used. Table A2 below thus presents the parameters for within- and between-subject decomposition regression of the Bronx WIHS example in Table 2 as estimated by mixed models using independence and equicorrelation. As was the case with Table A1 above compared to Table 1, the reader can again confirm here that the parameter estimates from the mixed models in Table A2 are almost identical to those from the GEE model with the same correlation structure in Table 2. This also includes that the mixed model parameter estimates under independence and equicorrelation are at worst qualitatively similar and often close to identical for the two correlation structures. We caution the reader, however, that, as with Table A1, the confidence intervals and p-values in Table A2 should be interpreted cautiously, because mixed models are not robust to misspecification of the correlation structure of the residuals. While independence correlation is clearly not correct (as the within-subject residuals for this example had a large positive correlation), it can be argued that equicorrelation might be correct. However, looking into that is beyond the scope of this paper.
Table A2.
Variable | Compartment | Independence: Point Estimate | Independence: 95% CI ¹ | Independence: Z-Value (p) ¹ | Equicorrelation: Point Estimate | Equicorrelation: 95% CI ¹ | Equicorrelation: Z-Value (p) ¹ |
---|---|---|---|---|---|---|---|
HIV Infection | Between-subject (βHIV,BS) | −1.16 | (−4.21, 1.88) | −2.30 (0.02) | −1.57 | (−4.47, 1.32) | −1.06 (0.29) |
HIV Infection | Within-subject | NA ² | --- | --- | NA ² | --- | --- |
Albumin Per g/dL | Between-subject (βALB,BS) | −3.28 | (−4.78, −1.77) | −4.26 (0.16) | −2.72 | (−7.00, 1.57) | −1.32 (0.19) |
Albumin Per g/dL | Within-subject (βALB,WS) | −10.70 | (−12.24, −9.15) | 13.60 (<0.0001) | −10.67 | (−11.86, −9.48) | −17.57 (<0.0001) |
BUN Per mg/dL | Between-subject (βBUN,BS) | −2.72 | (−2.84, −2.60) | −44.87 (<0.0001) | −2.65 | (−2.95, −2.35) | −17.10 (<0.0001) |
BUN Per mg/dL | Within-subject (βBUN,WS) | −1.11 | (−1.22, −0.99) | −18.59 (<0.0001) | −1.11 | (−1.20, −1.03) | −25.72 (<0.0001) |
Akaike Information Criteria (AIC) | | 98,451.8 | | | 94,824.7 | | |
1 The confidence interval and p-values for independence working correlation structure in particular but also arguably for equicorrelation overestimate the precision of the parameter estimates. Unlike GEE, mixed models are not robust to misspecification of the correlation structure. 2 There is no Within-subject Variation for HIV Infection Status.
Appendix B. Homology between Co-DOSE in (4) Occurring with between- and within-Subject Slopes Being the Same or Differing
Figure A1 illustrates using the example of Section 2B (with K = 1) that Co-DOSE in (4) occurs if . Remember that in this example, βBS = 0.9, βWS = 3, βCS = 1.60. Now let J = 2. So for the between / within-subject decomposition model; . If the overall mean of Xij for all repeat measures in the sample was 180 (i.e., = 180) then the full cross-sectional model is . If a subject’s two weight measures are Xi1 = 200 and Xi2 = 220, then for the first measure, the cross-sectional model estimates . However, since Xi2 = 220 and = 210, as we saw earlier within- between-subject decomposition gives; . Thus, E[Yi1|Xi1] is not independent of Xi2 since Xi2 is informative of where falls and the slope for (Xij − ) is different than the slope for when . However, if , then Xi2 is non-informative on Yi1|Xi1 as E[Yij] = = = since βWS = βBS = βCS.
As Ji ≡ 2 in the prior example, the second observation was deterministic for X̄i. However, for Ji > 2, while additional Xij’ go into the computation of X̄i, these are still informative on the relative contributions of X̄i and (Xij − X̄i) to E[Yij|Xij].
Whether or not Co-DOSE in (4) occurs also informs if . If for a given j, E[Yij|Xij] is independent of all other Xij’, then E[Yij|Xij] is independent of , which only happens if . However, if E[Yij|Xij] is not independent of other Xij’, then; (i) if , βWS (if well defined) ≠ βBS, (ii) otherwise if then βWS is not well defined.
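The dependence of E[Yi1|Xi1] on the sibling exposure Xi2 can be made concrete with the slopes of the Section 2B example (βBS = 0.9, βWS = 3). The sketch below is an illustration under stated assumptions: the function name is hypothetical and the intercept is set to 0, since only differences between predictions matter for the argument. It evaluates the between/within-subject decomposition for the first measure, Xi1 = 200, under two different values of Xi2.

```python
from statistics import mean


def expected_y(x_j, x_all, beta_bs, beta_ws, alpha=0.0):
    """Decomposition prediction: alpha + beta_bs * mean(X_i) + beta_ws * (X_ij - mean(X_i)).

    alpha is a placeholder intercept (an assumption of this sketch).
    """
    x_bar = mean(x_all)
    return alpha + beta_bs * x_bar + beta_ws * (x_j - x_bar)


# Slopes differ: the prediction for X_i1 = 200 moves with the sibling value X_i2.
with_sibling_220 = expected_y(200, [200, 220], beta_bs=0.9, beta_ws=3.0)
with_sibling_200 = expected_y(200, [200, 200], beta_bs=0.9, beta_ws=3.0)

# Slopes equal (1.6 is used here only as an arbitrary common value): no dependence.
equal_a = expected_y(200, [200, 220], beta_bs=1.6, beta_ws=1.6)
equal_b = expected_y(200, [200, 200], beta_bs=1.6, beta_ws=1.6)

print(with_sibling_220, with_sibling_200, equal_a, equal_b)
```

With an intercept of 0, the first two predictions are 0.9(210) + 3(−10) = 159 and 0.9(200) + 3(0) = 180, so the same Xi1 = 200 carries a different expected outcome depending on Xi2. When the two slopes are equal, the subject mean cancels out and both predictions coincide.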
Appendix C. Illustration That Classical Measurement Error Which Is Independent of the Outcome Pushes βWS and βBS to Zero with Greater Impact on βWS
To illustrate this for the classical measurement error setting with K = 1, let there always be the same number of replicates, J, per subject and assume that the true data-generating mechanism, i.e., in the absence of measurement error, involves β = βWS = βBS = βCS. For example, let E[Yij] = βTXij = β(TXij − µi) + βµi, where µi is the true mean exposure of the ith subject and, for simplicity, the intercept is 0. However, we only observe Xij = TXij + Mij, where Mij is measurement error with E[Mij] = 0 and variance σ²M, independent across all i’s and j’s and also independent of Yij. It would also often be assumed that Mij ~ N(0, σ²M), but we do not invoke this assumption here. When we use Xij instead of TXij in regression, the observed estimates of βWS, βBS, and βCS will not be equal to their true values (i.e., as obtained with TXij), but instead will equal different values β*WS, β*BS, and β*CS because Xij is watered down by the independent measurement error, as shown below. In this special case where β = βWS = βBS = βCS, we show that we expect β*WS ≠ β*BS ≠ β*CS. We also reproduce a known result that, under classical measurement error, the observed β*’s are attenuated towards 0 with respect to the true β’s. Furthermore, let TXij vary with j within i as follows: TXij = TCi + TRij, where TCi is a central tendency of TX for subject i, while TRij is the within-subject i repeated-visit variation in TXij across the j’s. Let σ²C and σ²R be the variances of TCi and TRij, respectively. Now, using the identity that the slope of the regression line for Y = α + βX is the covariance of X and Y divided by the variance of X (i.e., Cov(X,Y)/Var(X)), we derive:
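$$\mathrm{Var}(X_{ij}) = \sigma_C^2 + \sigma_R^2 + \sigma_M^2, \qquad \mathrm{Cov}(X_{ij}, Y_{ij}) = \beta_{CS}\,(\sigma_C^2 + \sigma_R^2),$$
so that
$$\beta^*_{CS} = \frac{\beta_{CS}\,(\sigma_C^2 + \sigma_R^2)}{\sigma_C^2 + \sigma_R^2 + \sigma_M^2};$$
$$\mathrm{Var}(\bar{X}_i) = \sigma_C^2 + \frac{\sigma_R^2}{J} + \frac{\sigma_M^2}{J}, \qquad \mathrm{Cov}(\bar{X}_i, Y_{ij}) = \beta_{BS}\,\sigma_C^2 + \beta_{WS}\,\frac{\sigma_R^2}{J},$$
so that
$$\beta^*_{BS} = \frac{\beta_{BS}\,\sigma_C^2 + \beta_{WS}\,\sigma_R^2/J}{\sigma_C^2 + \sigma_R^2/J + \sigma_M^2/J};$$
$$\mathrm{Var}(X_{ij} - \bar{X}_i) = (\sigma_R^2 + \sigma_M^2)\,\frac{J-1}{J}, \qquad \mathrm{Cov}(X_{ij} - \bar{X}_i, Y_{ij}) = \beta_{WS}\,\sigma_R^2\,\frac{J-1}{J},$$
so that
$$\beta^*_{WS} = \frac{\beta_{WS}\,\sigma_R^2}{\sigma_R^2 + \sigma_M^2}.$$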
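Thus, for example, if βWS = βBS = βCS = 5, we have from the above formulas
$$\beta^*_{CS} = \frac{5\,(\sigma_C^2 + \sigma_R^2)}{\sigma_C^2 + \sigma_R^2 + \sigma_M^2}, \qquad \beta^*_{BS} = \frac{5\,(\sigma_C^2 + \sigma_R^2/J)}{\sigma_C^2 + \sigma_R^2/J + \sigma_M^2/J}, \qquad \beta^*_{WS} = \frac{5\,\sigma_R^2}{\sigma_R^2 + \sigma_M^2}.$$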
Continuing with this numeric example, let σ²C = 8, σ²R = 2, σ²M = 10 and J = 5. The entire variance of Xij is 20, of which half, 10, is from measurement error, 8 is variation of the central tendency of X between subjects and 2 is variation of X within subject. Then β*CS = 5(10)/20 = 2.5, β*BS = 5(8.4)/10.4 = 4.04 and β*WS = 5(2)/12 = 0.83. Considering that, without measurement error, the true between- and within-person slopes are both 5, measurement error has greatly attenuated β*WS = 0.83 towards 0, while β*BS = 4.04 is the least tempered. This happens because β*BS most fully retains the common signal in X but tempers M through averaging, while β*WS more fully retains M while excluding the between-subject signal in X.
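The attenuation formulas of this appendix are easy to evaluate for other variance splits. The short function below is a sketch of that arithmetic for the special case of a common true slope β = βWS = βBS = βCS. The function and argument names are hypothetical; `var_c`, `var_r` and `var_m` stand for the variances of TCi, TRij and Mij, and `n_reps` stands for J.

```python
def attenuated_slopes(beta, var_c, var_r, var_m, n_reps):
    """Observed slopes under classical measurement error with a common true slope."""
    cs = beta * (var_c + var_r) / (var_c + var_r + var_m)
    bs = (beta * (var_c + var_r / n_reps)
          / (var_c + var_r / n_reps + var_m / n_reps))
    ws = beta * var_r / (var_r + var_m)
    return {"CS": cs, "BS": bs, "WS": ws}


# Values of the numeric example: variances 8, 2 and 10 with J = 5 and beta = 5.
example = attenuated_slopes(beta=5.0, var_c=8.0, var_r=2.0, var_m=10.0, n_reps=5)
print({name: round(value, 2) for name, value in example.items()})
```

Raising `n_reps` in this sketch moves the between-subject value closer to the true slope, because the error term in its denominator is divided by J, while the within-subject value does not depend on J at all.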
Appendix D. Only βBS Is Estimable When Under Berkson-type Measurement Error
Since we have brought up classical measurement error, we should also discuss the other common type of measurement error, known as Berkson error. For Berkson error, the measurement error is independent of the observed value (i.e., X) but is not independent of the true value (TX). Such a situation is approximated when a common value is reported for all Ji replicates in the same subject i. For example, consider a study of radiation contamination of the milk supply with i = community and j = child within the community. The true value of daily milk consumption for each child, TXij, is unknown, but the average daily consumption of milk across all children in that community, µi, is known (or estimated with a high degree of certainty as X̄i, with a caveat noted below) and is thus substituted for Xij. With Berkson-type error, the common within-subject mean, µi (or X̄i), rather than the different Xij, is observed across all j replicates. Thus, when Berkson-type error exists, the fitted model estimates βBS by default, and both βWS and βCS (which require knowledge of TXij) are not identifiable. However, in the practice of observational rather than laboratory studies, Berkson-type error may coexist with classical measurement error in different ratios. This was described by Berkson (1950) [30] as “modified controlled experimentation”. If so, then the estimate for βBS is likely attenuated (i.e., as a β*BS similar to what has been described in Appendix C for classical measurement error) due to the classical measurement error component of X̄i that is computed from Xij. More formal exploration of this hybrid quasi-Berkson-type error is given in the context of occupational epidemiology in Kim et al. [18].
Author Contributions
Conceptualization, D.R.H.; Methodology, D.R.H., Q.S. and I.B.; Software, D.R.H. and Q.S.; Validation, D.R.H., Q.S., and I.B.; Formal Analysis, D.R.H. and Q.S.; Investigation, D.R.H. and I.B.; Resources, K.A.; Data Curation, K.A. and Q.S.; Writing—Original Draft Preparation, D.R.H.; Writing—Review & Editing, D.R.H., and I.B.; Funding Acquisition, K.A.
Funding
This research was funded by the Women’s Interagency HIV Study (WIHS) Collaborative Study Group at New York City/Bronx Consortium, which was funded by the National Institute of Allergy and Infectious Diseases UO1-AI-35004.
Conflicts of Interest
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- 1. Diggle P.J., Heagerty P., Liang K.Y., Zeger S.L. Analysis of Longitudinal Data. Oxford University Press; New York, NY, USA: 2002.
- 2. Yang Y., Xie M. Asymptotics for generalized estimating equations with large cluster sizes. Ann. Stat. 2003;31:310–347. doi: 10.1214/aos/1046294467.
- 3. Levey A.S., Bosch J.P., Lewis J.B., Greene T., Rogers N., Roth D. A more accurate method to estimate glomerular filtration rate from serum creatinine: A new prediction equation. Modification of Diet in Renal Disease Study Group. Ann. Intern. Med. 1999;130:461–470. doi: 10.7326/0003-4819-130-6-199903160-00002.
- 4. Barkan S.E., Melnick S.L., Preston-Martin S., Weber K., Kalish L.A., Miotti P., Young M., Greenblatt R., Sacks H., Feldman J. The Women’s Interagency HIV Study. WIHS Collaborative Study Group. Epidemiology. 1998;9:117–125. doi: 10.1097/00001648-199803000-00004.
- 5. Littell R.C., Pendergast J., Natarajan R. Modelling covariance structure in the analysis of repeated measures data. Stat. Med. 2000;19:B1793–B1819. doi: 10.1002/1097-0258(20000715)19:13<1793::AID-SIM482>3.0.CO;2-Q.
- 6. SPSS Inc. Linear Mixed-Effects Modeling in SPSS: Introduction to the MIXED Procedure. Available online: http://www.spss.ch/upload/1126184451_Linear%20Mixed%20Effects%20Modeling%20in%20SPSS.pdf (accessed on 11 January 2019).
- 7. Cui J. QIC program and model selection in GEE analyses. Stata J. 2007;7:209–220. doi: 10.1177/1536867X0700700205.
- 8. Gardiner J.C., Luo Z., Roman L.A. Fixed effects, random effects and GEE: What are the differences? Stat. Med. 2009;28:221–239. doi: 10.1002/sim.3478.
- 9. Shults J., Sun W., Tu X., Kim H., Amsterdam J., Hilbe J.M., Ten-Have T.A. Comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data. Stat. Med. 2009;28:2338–2355. doi: 10.1002/sim.3622.
- 10. Cheng J., Edwards L.J., Maldonado-Molina M.M., Komro K.A., Muller K.E. Real Longitudinal Data Analysis for Real People: Building a Good Enough Mixed Model. Stat. Med. 2010;29:504–520. doi: 10.1002/sim.3775.
- 11. Gosho M. Criteria to Select a Working Correlation Structure for the Generalized Estimating Equations Method in SAS. J. Stat. Software. 2014;57:1–10. doi: 10.18637/jss.v057.c01.
- 12. Tiwar P., Shukla G. Approach of Linear Mixed Model in Longitudinal Data Analysis Using SAS. J. Reliabil. Statist. Stud. 2011;4:73–84.
- 13. Robinson W.S. Ecological correlations and the behavior of individuals. Am. Sociol. Rev. 1950;15:351–357. doi: 10.2307/2087176.
- 14. Cronbach L.J. Research on Classifications and Schools. Formulations of Questions Designs and Analyses. Occasional Paper. Stanford Evaluation Consortium; Stanford, CA, USA: 1976.
- 15. Firebaugh G. A rule for inferring individual level relationships from aggregate data. Am. Sociol. Rev. 1978;43:557–572. doi: 10.2307/2094779.
- 16. Scott A.J., Holt D. The effect of two-stage sampling on ordinary least square methods. J. Am. Stat. Assoc. 1982;77:848–854. doi: 10.1080/01621459.1982.10477897.
- 17. Kim H.M., Richardson D., Loomis D., Van Tongeren M., Burstyn I. Bias in the estimation of exposure effects with individual- or group-based exposure assessment. J. Expo. Sci. Environ. Epidemiol. 2011;21:212–221. doi: 10.1038/jes.2009.74.
- 18. Mancl L.A., Leroux B.G., DeRouen T.A. Between-subject and within-subject statistical information in dental research. J. Dent. Res. 2000;79:1778–1781. doi: 10.1177/00220345000790100801.
- 19. Pepe M.S., Anderson G.L. A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Commun. Stat. Simul. 1994;23:939–951. doi: 10.1080/03610919408813210.
- 20. Mancl L.A., Leroux B.G. Efficiency of regression estimates for clustered data. Biometrics. 1996;52:500–511. doi: 10.2307/2532890.
- 21. Schildcrout J.S., Heagerty P.J. Regression analysis of longitudinal binary data with time-dependent environmental covariates: bias and efficiency. Biostat. 2005;6:633–652. doi: 10.1093/biostatistics/kxi033.
- 22. Burstein L. The analysis of multilevel data in educational research and evaluation. J. Educ. Stat. 1980;3:347–383. doi: 10.3102/10769986003004347.
- 23. Raudenbush S.W., Bryk A.S. Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed. Sage Publications; London, UK: 2002.
- 24. Begg M.D., Parides M.K. Separation of individual-level and cluster-level covariate effects in regression analysis of correlated data. Stat. Med. 2003;22:2591–2602. doi: 10.1002/sim.1524.
- 25. Fuller W.A. Measurement Error Models. Wiley; New York, NY, USA: 1987.
- 26. Rifkin R.D. Effects of Correlated and Uncorrelated Measurement Error on Linear Regression and Correlation in Medical Method Comparison Studies. Stat. Med. 1995;14:789–798. doi: 10.1002/sim.4780140808.
- 27. Preller L., Kromhout H., Heederik D., Tielen M.J. Modeling long-term average exposure in occupational exposure-response analysis. Scand. J. Work Environ. Health. 1995;21:504–512. doi: 10.5271/sjweh.67.
- 28. Tielemans E., Kupper L.L., Kromhout H., Heederik D., Houba R. Individual-based and group-based occupational exposure assessment: some equations to evaluate different strategies. Ann. Occup. Hyg. 1998;42:115–119. doi: 10.1016/S0003-4878(97)00051-3.
- 29. Berkson J. Are There Two Regressions? J. Am. Stat. Assoc. 1950;45:164–180. doi: 10.1080/01621459.1950.10483349.
- 30. Neuhaus J.M., Kalbfleisch J.D. Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics. 1998;54:638–645. doi: 10.2307/3109770.
- 31. Zhang J., Hu W., Wei F., Wu G., Korn L.R., Chapman R.S. Children’s Respiratory morbidity prevalence in relation to air pollution in four Chinese cities. Environ. Health Perspect. 2002;110:961–967. doi: 10.1289/ehp.02110961.
- 32. Miller K.A., Siscovick D.S., Sheppard L., Sheppard K., Sullivan J.H., Anderson G.L., Kaufman J.D. Long term exposure to Air Pollution and Incidence of Cardiovascular Events in Women. N. Eng. J. Med. 2007;350:447–458. doi: 10.1056/NEJMoa054409.
- 33. Pan G., Zhang S., Feng Y., Takahashi K., Kagawa J., Yu L., Wang P., Liu M., Liu Q., Hou S., et al. Air pollution and children’s respiratory symptoms in six cities of Northern China. Respir. Med. 2010;104:1903–1911. doi: 10.1016/j.rmed.2010.07.018.
- 34. Crouse D.L., Peters P.A., Villeneuve P.J., Proux M.A., Shih H.H., Goldberg M.S., Johnson M., Wheeler A.J., Allen R.W., Atari D.O., et al. Within- and between-city contrasts in nitrogen dioxide and mortality in 10 Canadian cities; a subset of the Canadian Census Health and Environment Cohort (CanCHEC). J. Expo. Sci. Environ. Epidemiol. 2015;25:482–489. doi: 10.1038/jes.2014.89.
- 35. Morrison B., Shenklin A., McLelland A., Robertson D.A., Barrowman M., Graham S., Wuga G., Cunningham K.J.M. Intra-individual variation in commonly analyzed serum constituents. Clin. Chem. 1979;25:1799–1805.
- 36. Lacher D.A., Hughes J.P., Carroll M.P. Biological variation of laboratory analytes based on the 1999–2002 National Health and Nutrition Examination Survey. National Center for Health Statistics; Hyattsville, MD, USA: 2010. National Health Statistics Reports.
- 37. Cuhadar S., Atay A., Koseoglu M., Dirican A., Har A. Stability studies of common biochemical analytes in serum separator tubes with or without gel barriers subjected to various storage conditions. Biochemia Medica. 2012;22:202–214. doi: 10.11613/BM.2012.023.