Introduction
In our companion paper, we reported health economic data from a clinical trial with the primary outcome of cumulative opioid intake measured weekly postoperatively.1,2 The randomized trial compared two groups of patients, one receiving care as usual and the other receiving an educational eHealth intervention.1,2,3 For the health-economic evaluation, we measured quality of life during the four weeks after discharge.1,2 Both endpoints had serial correlation.1,2 The data from our companion paper, Stata statistical code, and output are available online (https://doi.org/10.25820/data.007798).1 The statistical goal was to estimate the mean change in quality of life associated with reductions in each patient’s opioid consumption after hospital discharge.2 We estimated the association (i.e., correlation) between the two endpoints, doing so within, not between, patients.1,2
Opioid consumption and quality of life are “paired longitudinal data,” paired in the sense that these observations measured within subject are being analyzed pairwise by time.1,2 In anesthesiology journals, analyses are often described as based on “repeated measures,” “time series,” or “panel data” (Supplemental S1). Serial correlation (i.e., correlation between successive measurements within subjects) also arises for intraoperative and postoperative monitoring (e.g., measurements of heart rate and blood pressure). Next, consider asking whether an association (e.g., between morphine milligram equivalents and quality of life) differs between the randomized groups. Alternatively, consider asking a completely different question, whether an association (e.g., between heart rate and blood pressure) differs among patients based on American Society of Anesthesiologists’ physical status classification. Methods used for comparisons can include random effects linear regression or pooled pairwise comparisons (Supplemental S1). Our tutorial reviews why, for most clinical trials and retrospective cohort studies, it is statistically misleading and usually unnecessary to measure the direct effect of a fixed patient-specific characteristic (e.g., sex or surgical specialty) when analyzing the pairwise relationship between two patient variables that change over time. The tutorial also reviews the features and advantages of fixed effects models that account for serial correlation (e.g., with first-order autoregressive error processes) for quantifying these pairwise relationships.
Methods
Throughout the article, we rely on a public dataset1 (https://doi.org/10.25820/data.007798) from our companion paper, reporting on a prospective US-based observational study,2 performed as part of a randomized trial. We studied the association between postoperative opioids (morphine milligram equivalents) reported as consumed by patients in the four weeks after hospital discharge and their health-related quality of life as measured by EuroQol – 5 Dimensions, 5 Levels (EQ-5D-5L) valuations.4,5,6 The site includes the data posted as a comma delimited file and as a Stata DTA file (StataCorp, College Station, Texas). On the webpage, there is also Stata output in a PDF file, with each section of the PDF labeled (e.g., “Stata A” for these Methods, with data descriptions). The data were collected longitudinally by patient (“study_id”) over five successive weeks (“week”), the first being during the period before discharge, and then for four weeks after hospital discharge.2 For these panel data, each panel represents one patient. These data are repeated measures in non-random succession. A substantive covariate influencing the quality of life was the surgical specialty of the procedure the patient underwent ($\eta^2$ = 0.218, lower 97.5% one-sided confidence limit = 0.151). The surgical specialty was also associated with the independent variable of morphine milligram equivalents ($\eta^2$ = 0.204, lower 97.5% confidence limit = 0.138). For that reason, the surgical specialty is included as a covariate in the online data. We urge readers to follow along with this worked example using the Supplemental Stata1 PDF file, or Stata code DO file if preferred, as they both have the same listed sections.
Supplemental S1 includes a PubMed search, performed July 23, 2025. The search shows that many (430) abstracts of anesthesiology articles report comparing multiple groups each with longitudinal data and serial correlation. Phrases used often included “repeated measures” and “time series”. Medical Subject Headings (MeSH) often used in the articles include “Pain, Postoperative/drug therapy,” “Heart Rate/drug effects,” and “Blood Pressure/drug effects.” These search results informed our choice of examples for the current tutorial. Nevertheless, there were also many articles devoted to operating room management and economics. These endpoints frequently have serial correlation (e.g., surgeons’ adjusted utilizations of their allocated time and their successive patients’ days waiting for surgery).7 There are more operating room management articles published in Anesthesia & Analgesia than in any other journal, both clinical and engineering.8 However, these topics lack a corresponding MeSH entry.9
Mathematical background
Imagine that we are measuring patient $i$’s quality of life, $y_{it}$, in 1-week increments, $t$. Quality of life is unitless. There are $N$ patients, $i = 1, 2, \ldots, N$. The time-varying covariates, $x_{it}$, are the square root of the total oral morphine milligram equivalents measured weekly. For our online data file from our companion paper,1,2 there were five weekly times, $t = 1, 2, \ldots, T$, where $T = 5$ weeks. The first time point represents observations soon before hospital discharge. The square root transformation was used to achieve a closer to linear association with the quality-of-life valuations. We expect a negative association between these two, paired, longitudinally measured variables:
$$y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it} \tag{1}$$
Here, $\alpha_i$ represents the fixed, patient-specific intercept capturing time-invariant differences. The $\alpha_i$ show the variation among patients in the unitless quality of life. The term $\varepsilon_{it}$ represents random, serially correlated, unexplained fluctuations in the quality of life, being variation among measurements for the same patient (i.e., within patient variability).
Before formal testing, we return to the physiological example introduced in the Introduction. We measure the patient’s heart rates, $y_{it}$, in 5-minute increments. The time-varying covariates, $x_{it}$, are the mean arterial pressures measured at 5-minute increments. We again expect a negative association within subject between the paired variables, equation (1). For this example, the $\alpha_i$ represents the patient-specific intercept, in units of beats per minute. The term $\varepsilon_{it}$ represents random, unexplained fluctuations in the heart rate (i.e., within patient variability).
In Supplemental Stata1 B, the Wooldridge test rejects the null hypothesis of no first-order correlation between successive error terms, $\varepsilon_{it}$, P < 0.00001.10 Therefore, model estimation for quality of life was corrected for first-order autoregressive errors using Nagar’s procedure.11 There was positive serial correlation, $\hat{\rho} > 0$. The estimated slope, $\hat{\beta}$, from equation (1), between quality of life and morphine milligram equivalents, has a standard error of 0.0011, P < 0.0001. The slope coefficient’s interpretation would be that, on average, during weeks when each patient was using more opioids, they had a lower quality of life valuation.
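As a sketch of what “first-order autoregressive errors” means here, the error term of equation (1) for patient $i$ at week $t$, written $\varepsilon_{it}$, can be taken to follow the standard first-order autoregressive form. The innovation symbol $u_{it}$ is our notation for illustration and is not taken from the companion paper:

$$\varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + u_{it}, \qquad |\rho| < 1,$$

where the $u_{it}$ are assumed independent with mean zero and constant variance. Under this form, a positive $\rho$ means that a week in which a patient’s quality of life was above what the model predicted tends to be followed by another such week.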
Now, we get to the principal topic of the tutorial, the addition of a time-invariant variable to the model of equation (1). These could be patients’ ages, sex, race, or physical status. For simplicity, based on the Supplemental data,1 we consider surgical specialties, $z_i$, entering the model as $\gamma z_i$. Combining equation (1) and $\gamma z_i$ yields:
$$y_{it} = \alpha_i + \beta x_{it} + \gamma z_i + \varepsilon_{it} \tag{2}$$
The time-invariant characteristics $z_i$ are being absorbed into the patient-specific intercept $\alpha_i$ of equation (1). Formally, the $\alpha_i$ in equation (2) represent the remaining unobserved patient-specific effects after adding $\gamma z_i$ as separate terms. We are writing them out explicitly here as we show in the next two equations how they are being handled by fixed-effects models with serial correlation. To isolate the effects of the variables that change over time, a fixed-effects model calculates the mean of each variable over time for each patient and subtracts it from the original equation. Averaging equation (2) over time for each patient yields equation (3), where the bar denotes the mean over time for patient $i$:
$$\bar{y}_i = \alpha_i + \beta \bar{x}_i + \gamma z_i + \bar{\varepsilon}_i \tag{3}$$
Continuing, we subtract the averaged equation (3) from the original equation (2). The difference shows the model that functionally is being estimated for each patient:
$$y_{it} - \bar{y}_i = \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \tag{4}$$
The xtregar command in Supplemental Stata1 B estimates the coefficients for this equation while accounting for the serial correlation of the error terms. The patient-specific intercepts $\alpha_i$ cancel out, as do the surgical specialties $\gamma z_i$. The algebra shows that adding $\gamma z_i$ to equation (1) giving equation (2) has functionally had no effect. The algebra of equations (1–4) has shown an implication of using the fixed-effects model for longitudinal (panel) data. Specifically, time-invariant variables (e.g., specialty) should not be included as covariates because they are perfectly collinear with the fixed effects. Not only is removing the specialty from the model essential to make the statistical model identifiable, but Supplemental Stata1 B also shows that the removal is not a weakness but a strength of the study design. In a fixed-effects model, each patient’s unique characteristics are fully accounted for by including the patient-specific intercept $\alpha_i$. The fact that the time-invariant term(s) drop out is an advantage because this means that both measured (i.e., observed) factors and unmeasured patient-specific factors drop out. Clinical interventions (e.g., a phone app with analgesic recommendations)3 cannot change the time-invariant characteristics (e.g., specialty). The fixed effects model focuses on the estimand of interest, the effect on the dependent variable of the within-subject change in the independent variable. The panel model (equations 2, 4) adjusts for the specified variable of surgical specialty as well as any other stable patient characteristic, such as genetic predispositions or psychological resilience, even if these were unknown. Put another way, analogous to how properly randomized trials provide high levels of evidence because unknown but important factors cannot affect results, fixed effects panel models can provide similar control for time-invariant characteristics.
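The cancellation described above can be checked numerically. The following is a minimal sketch in Python with NumPy, not the tutorial’s Stata code. It uses simulated data rather than the companion dataset. All variable names and parameter values are hypothetical, and the errors are simulated as independent rather than autoregressive to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated panel: N patients, T weekly measurements each.
N, T = 200, 5
beta, gamma = -0.01, 0.05                       # assumed slope and covariate effect

patient = np.repeat(np.arange(N), T)            # patient index for every row
z_i = rng.integers(0, 2, size=N).astype(float)  # time-invariant covariate (0 or 1)
alpha_i = rng.normal(0.8, 0.1, size=N)          # patient-specific intercepts
x = rng.normal(5.0, 2.0, size=N * T) + 2.0 * z_i[patient]
eps = rng.normal(0.0, 0.02, size=N * T)         # independent errors (simplification)
y = alpha_i[patient] + beta * x + gamma * z_i[patient] + eps


def demean_by_patient(v):
    """Subtract each patient's mean over time from that patient's rows."""
    means = np.bincount(patient, weights=v) / np.bincount(patient)
    return v - means[patient]


x_w = demean_by_patient(x)
y_w = demean_by_patient(y)
z_w = demean_by_patient(z_i[patient])           # time-invariant, so all zeros

# Least-squares slope of demeaned y on demeaned x (the within estimator).
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Dummy-variable form: one indicator column per patient, plus x and z.
dummies = (patient[:, None] == np.arange(N)[None, :]).astype(float)
design = np.column_stack([dummies, x, z_i[patient]])
rank = np.linalg.matrix_rank(design)

print(f"largest |demeaned z| = {np.abs(z_w).max():.1e}")
print(f"within slope = {beta_within:.4f} (simulated true slope {beta})")
print(f"design columns = {design.shape[1]}, rank = {rank}")
```

In this sketch the demeaned covariate is zero for every row, the within slope is close to the simulated slope even though the covariate shifts both variables, and the design matrix with patient indicators plus the time-invariant covariate is one short of full column rank, which is the perfect collinearity referred to in the text.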
Supplemental Stata1 B shows that a time-invariant variable can be included as an interaction term:
$$y_{it} = \alpha_i + \beta x_{it} + \delta (x_{it} \times z_i) + \varepsilon_{it} \tag{5}$$
The interaction term estimates whether $z_i$ (patients’ specialties) modifies the within-patient slope but does not address the (large) main effects of $z_i$. If an investigator interprets this interaction term in isolation as the treatment effect, that interpretation would be wrong. From Supplemental Stata1 A, the $\bar{x}_i$ (patients’ average opioid use) differ systematically among $z_i$ (patients’ specialties). Also from Supplemental Stata1 A, the $\bar{y}_i$ (patients’ average quality of life) differ systematically among $z_i$ (patients’ specialties), a relationship hidden by being absorbed into the patient-specific intercept, $\alpha_i$. While neurological surgery and orthopedics both have significant associations with the independent (opioid consumption) and dependent (quality of life) variables (Bonferroni adjusted P ≤ 0.0004), Supplemental Stata1 B shows that equation (5) detects no associations, both adjusted P > 0.99.
Could we instead validly treat the patients as a random effect? From equation (2), the $\alpha_i$ would be revised to $\alpha$ and $\nu_i$ added:
$$y_{it} = \alpha + \beta x_{it} + \gamma z_i + \nu_i + \varepsilon_{it} \tag{6}$$
The $\alpha$ is treated as a common intercept among patients. The $\nu_i$ represents the random, patient-specific deviation from that common intercept. Thus, the fixed $\alpha_i$ of equation (1) is replaced by the random quantity $\alpha + \nu_i$. The patient-specific random effect $\nu_i$ is assumed to follow a normal distribution with a mean of zero and a constant variance $\sigma_\nu^2$. If the statistical assumptions were satisfied, then the random effects model would be advantageous, because it is a more efficient use of the available data. Supplemental Stata1 C shows that, indeed, the standard error of the estimated slope coefficient is appropriately smaller for the random-effects model. There is thus greater statistical efficiency because equation (2) has more parameters to be estimated than does equation (6), a separate intercept $\alpha_i$ for each of the $N$ patients versus one common intercept $\alpha$ and one variance $\sigma_\nu^2$. However, the random effects model has a fundamental and generally unsuitable assumption. We add in the time-averaged value of the independent variable, $\bar{x}_i$, as seen in equation (3):
$$y_{it} = \alpha + \beta x_{it} + \gamma z_i + \theta \bar{x}_i + \nu_i + \varepsilon_{it} \tag{7}$$
For the random effect model of equation (6) to yield consistent estimates, the coefficient $\theta$ in equation (7) should be statistically indistinguishable from zero. (This is conceptually like equation (4), which shows that, for the fixed-effects model, the contribution of the time-averaged $\bar{x}_i$ literally equals zero.) This condition, that $\theta = 0$, will rarely hold for random-effect observational models between variables common in anesthesia research (Methods). To explain why, we consider heart rate predicted by blood pressure. The assumption $\theta = 0$ would imply that the association between patients’ average heart rates and blood pressures (between-subject slope) is the same as the association for a single patient over time (within-subject slope).12 That condition does not hold from basic pathophysiology. For trauma, as an example, deviation from the assumption is fundamental to the shock index, the ratio of the heart rate to systolic blood pressure.13 When comparing among patients, not longitudinally, higher shock indices indicate a greater degree of hypovolemia and are associated with more blood units transfused.13 Trauma management guidelines recommend using the shock index.14 A trauma patient having experienced a blood loss of 1.5 liters and presenting with a heart rate of 126 beats per minute and systolic blood pressure of 90 mmHg has a shock index of 1.40.13 Patients with preexisting hypertension who experience trauma have significantly lower shock indices,15,16 falsely suggesting on a cross-sectional basis that there is absence of clinically significant hemorrhage.
The patient’s time-averaged systolic blood pressure, $\bar{x}_i$, is correlated with the patient-specific effect, $\nu_i$, because hypertensive patients have above-average blood pressures and average or below-average heart rates (i.e., $\mathrm{Cov}(\bar{x}_i, \nu_i) \neq 0$).17 An investigator using such a random effects model, assuming incorrectly that the between-patient and within-patient correlations are identical ($\theta = 0$), would incorporate this misleading cross-sectional signal and report a wrong estimate of the acute hemorrhage response. This example shows that, when using longitudinal (i.e., panel) data for two or more time-varying variables with serial correlation, use of random-effects models should be limited to the relatively uncommon situations wherein the random-effects assumption is valid. In contrast, the fixed-effects model (equations 1 and 4) would successfully adjust not only for baseline blood pressure but also for other unknown or unmeasured time-invariant covariates. In the next paragraph, we apply this logic to our companion article’s dataset1,2 to show the exact difference in conclusions yielded by the two approaches.
Supplemental Stata1 C shows an example of testing the random-effects assumption, using the opioid consumption and quality of life data.1,2 Estimation was performed twice, once using the fixed-effects specification and once using random-effects. If the random-effects independence assumption holds, the fixed effects and random effects estimates of the slope coefficients should be similar, with only the standard errors differing,18 but that was not so. The point estimates differed significantly between the fixed-effects and random-effects models (Hausman test, significant with and without robust standard errors, both P < 0.00001).19 The random effect model overestimated the magnitude of the association by 55% compared to the fixed effects model. The Mundlak test was also significant (P < 0.00001).18 In other words, as expected, the random effects model violated its key assumption, that the unobserved component of each patient’s quality of life was uncorrelated with their opioid consumption, including with their mean opioid use. An investigator relying on these random effect estimates would draw the wrong conclusion about both the magnitude of the main effect and its precision.
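The idea behind adding each patient’s time-averaged independent variable to the model can be illustrated with simulated data. The following is a minimal sketch in Python with NumPy, not the tutorial’s Stata code and not the companion dataset. It fits the augmented regression by ordinary least squares, which is a simplification of a formal Mundlak test (no random-effects estimation and no test statistic). All names and parameter values are hypothetical. The patient-level effect is deliberately simulated to be correlated with mean exposure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulated panel (balanced): N patients, T measurements each.
N, T = 300, 5
slope_within_true = -0.01

patient = np.repeat(np.arange(N), T)
# Patient-level effect, deliberately correlated with the patient's mean exposure.
nu_i = rng.normal(0.0, 0.1, size=N)
x_center_i = 5.0 - 8.0 * nu_i + rng.normal(0.0, 0.5, size=N)
x = x_center_i[patient] + rng.normal(0.0, 1.0, size=N * T)
y = 0.8 + slope_within_true * x + nu_i[patient] + rng.normal(0.0, 0.02, size=N * T)


def patient_means(v):
    """Mean over time for each patient (one value per patient)."""
    return np.bincount(patient, weights=v) / np.bincount(patient)


xbar = patient_means(x)
ybar = patient_means(y)

# Within-patient slope: demeaned y regressed on demeaned x.
x_w = x - xbar[patient]
y_w = y - ybar[patient]
b_within = (x_w @ y_w) / (x_w @ x_w)

# Between-patient slope: patient means of y regressed on patient means of x.
xb = xbar - xbar.mean()
yb = ybar - ybar.mean()
b_between = (xb @ yb) / (xb @ xb)

# Augmented regression: y on an intercept, x, and the patient mean of x.
design = np.column_stack([np.ones(N * T), x, xbar[patient]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
theta_hat = coef[2]

print(f"within slope          = {b_within:.4f}")
print(f"between slope         = {b_between:.4f}")
print(f"coefficient on x      = {coef[1]:.4f}")
print(f"coefficient on mean x = {theta_hat:.4f}")
```

For a balanced panel fitted by ordinary least squares, the coefficient on the time-varying variable equals the within-patient slope and the coefficient on the patient mean equals the between-patient slope minus the within-patient slope. A coefficient on the patient mean that is far from zero therefore signals that the two slopes differ, which is the situation in which a random-effects estimate would mix them.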
Because the random effect model regularly is used for analyzing anesthesia data, consider how equations (6) and (7) differ from their many appropriate applications. Suppose that the $x_{it}$ represent 0’s and 1’s for different periods (e.g., describing a before and after study). Then, the random effect assumption would generally hold. Suppose further that the $z_i$ were 0’s and 1’s signifying randomized groups. Then, the random effects assumption of independence would be expected to hold. Finally, suppose that the independent time-varying variable were exogenous (e.g., computer-controlled infusion assigned randomly without relation to either the dependent variable or the time-invariant patient characteristic). Then, the random effect assumption of independence would also be expected to hold. The problem with our formulation of equations (6) and (7) is that $x_{it}$ and $y_{it}$ are endogenous, correlated variables within subjects (i.e., not independent). Quoting the title of our tutorial, we are considering “paired longitudinal data with serial correlation.”
Statistical methods sections’ warning signs for incomplete or incorrect analyses
One potentially problematic scenario would be a study on the association between time-varying variables (e.g., heart rate and blood pressure), and the results also include a test for the effect of a time-invariant covariate on the dependent variable, either (a) using a fixed effects model, (b) using a random effects model without a confirmation of its assumptions, or (c) without specification of fixed or random effects. There is nothing wrong with reporting the standardized differences between non-randomized or randomized groups at baseline for each observed variable (e.g., heart rate and blood pressure). However, there should not be a claim of the significant or non-significant effect of the group on the marginal mean associations between variables. Supplemental Stata1 D provides examples of warning notifications and error messages. For example, when analyzing paired longitudinal data, repeated measures analysis of variance cannot estimate the effect of a time-invariant variable like specialty because of its perfect collinearity with the patient as a fixed effect. The statistical method also cannot model first-order autoregressive errors.
Investigators may instead try to compare longitudinal variables between time-invariant groups by using a fixed effects model and applying pooled ordinary least squares while using cluster-robust standard errors, with the clusters being the patients. These would be like the so-called “naïve pooled method” in pharmacokinetics.20,21 The statistical modeling recognizes that multiple observations have been made for each patient, but neglects both the patient-specific effects and the serial correlation in the longitudinal data. Robust clustered variance does not distinguish between variation within and among patients. Supplemental Stata1 D shows that the estimates of $\beta$ differ markedly and the confidence intervals do not even overlap with those of the fixed effects model with first-order autoregressive errors (equation 1). In other words, for problems of equation (1), pooling data yields estimates that are biased relative to the endpoint of clinical and economic interest, the marginal arithmetic mean change (e.g., how quality of life changes for a patient as their opioid consumption changes).2 If an investigator relies on these estimates or their narrow confidence intervals, their clinical or economic interpretations will be wrong. Equation (4) shows why there is bias. The model with autoregressive errors subtracts out the effect of the means to isolate the within-patient effect, estimating the relationship between $(y_{it} - \bar{y}_i)$ and $(x_{it} - \bar{x}_i)$. That differs from comparing $y_{it}$ and $x_{it}$ without regard to whether variation is within or between patients. Similarly, a summary measures approach can be used to estimate the relationship between $\bar{y}_i$ and $\bar{x}_i$. Supplemental Stata1 D shows this discrepancy, because the slope of the linear relationship between $\bar{y}_i$ and $\bar{x}_i$ differs from the appropriate relationship, shown in equation 4, between $(y_{it} - \bar{y}_i)$ and $(x_{it} - \bar{x}_i)$.
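Why pooling mixes the two sources of variation can be seen from an algebraic identity: the pooled least-squares slope is a weighted average of the within-patient slope and the between-patient (summary measures) slope, with weights equal to the within and between sums of squares of the independent variable. The following is a minimal sketch in Python with NumPy that checks this identity on simulated data. It is not the tutorial’s Stata code or dataset, all names and parameter values are hypothetical, and it illustrates point estimates only (no standard errors, no autoregressive errors).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical simulated panel: N patients, T measurements each.
N, T = 300, 5
patient = np.repeat(np.arange(N), T)
nu_i = rng.normal(0.0, 0.1, size=N)            # patient-level effect
x = (5.0 - 8.0 * nu_i)[patient] + rng.normal(0.0, 1.0, size=N * T)
y = 0.8 - 0.01 * x + nu_i[patient] + rng.normal(0.0, 0.02, size=N * T)


def split(v):
    """Return the centered between-patient and within-patient parts of v."""
    means = np.bincount(patient, weights=v) / np.bincount(patient)
    between = means[patient] - v.mean()
    within = v - means[patient]
    return between, within


x_b, x_w = split(x)
y_b, y_w = split(y)

ss_within = x_w @ x_w                          # within-patient sum of squares of x
ss_between = x_b @ x_b                         # between-patient sum of squares of x

slope_within = (x_w @ y_w) / ss_within         # fixed-effects (within) slope
slope_between = (x_b @ y_b) / ss_between       # summary measures (between) slope

x_c = x - x.mean()
y_c = y - y.mean()
slope_pooled = (x_c @ y_c) / (x_c @ x_c)       # pooled ordinary least squares slope

weighted = (ss_within * slope_within + ss_between * slope_between) / (
    ss_within + ss_between
)

print(f"within  slope = {slope_within:.4f}")
print(f"between slope = {slope_between:.4f}")
print(f"pooled  slope = {slope_pooled:.4f}")
print(f"weighted average of within and between = {weighted:.4f}")
```

Because the weights are positive, the pooled slope always lies between the within and between slopes. Whenever those two differ, as in this simulation, the pooled estimate cannot equal the within-patient effect, regardless of which variance estimator is attached to it.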
The reporting of statistical methods in anesthesia manuscripts can also have warning signs of errors in the analysis of paired longitudinal data by lacking the key information needed for reproducibility of the statistical model. Following the Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines, investigators should specify the software and module or provide the code for the model.22 Their statistical methods section should specify and explain the choice of the method of variance estimation (e.g., one of the Huber-White Sandwich robust estimators).23 In Supplemental Stata1 E, we show examples of the large effect of the variance estimation methods on estimated mean change in quality of life with changes in opioid consumption. If a random effect is used, the article should explain if the random effect assumption was evaluated and confirmed using Hausman’s or Mundlak’s test.18,19 Finally, if first-order autoregressive errors were treated as small or too little to be included, their absence should be tested (e.g., using Wooldridge’s test).10
Conclusions
We provided a tutorial with Supplemental Stata1 data and statistical code on making comparisons between groups when analyzing the relationship between time-varying longitudinal data with serial correlation (e.g., blood pressure and heart rate changing over time). We included a fully worked example from our companion paper.2 Fixed effects models accounting for serial correlation (e.g., using first-order autoregressive errors) should be used for the vast majority of similar prospective or retrospective cohort analyses. Both observed and unobserved (i.e., unmeasured) time-invariant variables drop out from the estimated statistical model, markedly simplifying the interpretation of results.
Supplementary Material
S1 Literature search
Funding:
Supported by US Agency for Healthcare Research and Quality, 5R01HS027795-04
Footnotes
Conflicts of Interest: None of the authors have related disclosures.
References
1. Dexter F, Bartels K. Dataset of weekly postoperative EQ-5D-5L valuations and morphine milligram equivalents for four weeks after major surgery. University of Iowa [Dataset]; 2025. doi: 10.25820/data.007798.
2. Dexter F, Rolfzen ML, Hoffman J, Rodriguez ES, Bartels K. Opioid intake and quality of life after hospital discharge from major surgery – a health economic evaluation. Anesth Analg. 2026; ePub. doi: 10.1213/ANE.0000000000007914.
3. Rolfzen ML, Shah K, Sanchez Rodriguez E, Hoffman JT, Clauw D, Mascha E, Graff V, Bartels K. Postsurgical Medication Awareness, Recovery, and Tracking using a Phone-Based App (SMART-APP): a randomized clinical trial. Reg Anesth Pain Med. 2025; ePub. doi: 10.1136/rapm-2025-106783.
4. Devlin N, Pickard S, Busschbach J. The development of the EQ-5D-5L and its value sets. 2022 Mar 24. In: Devlin N, Roudijk B, Ludwig K, editors. Value Sets for EQ-5D-5L: A Compendium, Comparative Review & User Guide [Internet]. Cham (CH): Springer; 2022. Chapter 1. doi: 10.1007/978-3-030-89289-0_1.
5. Pickard AS, Law EH, Jiang R, Pullenayegum E, Shaw JW, Xie F, Oppe M, Boye KS, Chapman RH, Gong CL, Balch A, Busschbach JJV. United States valuation of EQ-5D-5L health states using an international protocol. Value Health. 2019;22:931–941. doi: 10.1016/j.jval.2019.02.009.
6. EuroQOL. Computing EQ-5D-5L index values with STATA using the United States (US) Pickard value set, Version 2.1 (Updated 01/12/2020). https://euroqol.org/wp-content/uploads/2024/01/US_valueset_STATA.txt. Accessed 23 February 2025.
7. Dexter F, Macario A, Traub RD, Hopwood M, Lubarsky DA. An operating room scheduling strategy to maximize the use of operating room block time: computer simulation of patient scheduling and survey of patients’ preferences for surgical waiting time. Anesth Analg. 1999;89:7–20. doi: 10.1097/00000539-199907000-00003.
8. Dexter F, Scheib S, Xie W, Epstein RH. Bibliometric analysis of contributions of anesthesiology journals and anesthesiologists to operating room management science. Anesth Analg. 2024;138:1120–1128. doi: 10.1213/ANE.0000000000006694.
9. Wachtel RE, Dexter F. Difficulties and challenges associated with literature searches in operating room management, complete with recommendations. Anesth Analg. 2013;117:1460–1479. doi: 10.1213/ANE.0b013e3182a6d33b.
10. Drukker DM. Testing for serial correlation in linear panel-data models. Stata J. 2003;3:168–177.
11. Horn JA Jr. A comparison of four estimators of a first order autoregressive process. Masters Thesis, US Naval Postgraduate School, Monterey, California. September 1986. https://apps.dtic.mil/sti/tr/pdf/ADA175144.pdf. Accessed 14 September 2025.
12. Twisk JWR, de Vente W. Hybrid models were found to be very elegant to disentangle longitudinal within- and between-subject relationships. J Clin Epidemiol. 2019;107:66–70. doi: 10.1016/j.jclinepi.2018.11.021.
13. Mutschler M, Nienaber U, Münzberg M, Wölfl C, Schoechl H, Paffrath T, Bouillon B, Maegele M; TraumaRegister DGU. The Shock Index revisited - a fast guide to transfusion requirement? A retrospective analysis on 21,853 patients derived from the TraumaRegister DGU. Crit Care. 2013;17:R172. doi: 10.1186/cc12851.
14. Rossaint R, Afshari A, Bouillon B, Cerny V, Cimpoesu D, Curry N, Duranteau J, Filipescu D, Grottke O, Grønlykke L, Harrois A, Hunt BJ, Kaserer A, Komadina R, Madsen MH, Maegele M, Mora L, Riddez L, Romero CS, Samama CM, Vincent JL, Wiberg S, Spahn DR. The European guideline on management of major bleeding and coagulopathy following trauma: sixth edition. Crit Care. 2023;27:80. doi: 10.1186/s13054-023-04327-7.
15. Rau CS, Wu SC, Kuo SC, Pao-Jen K, Shiun-Yuan H, Chen YC, Hsieh HY, Hsieh CH, Liu HT. Prediction of massive transfusion in trauma patients with shock index, modified shock index, and age shock index. Int J Environ Res Public Health. 2016;13:683. doi: 10.3390/ijerph13070683.
16. Park SJ, Lee MJ, Kim C, Jung H, Kim SH, Nho W, Seo KS, Park J, Ryoo HW, Ahn JY, Moon S, Cho JW, Son SA. The impact of age and receipt antihypertensives to systolic blood pressure and shock index at injury scene and in the emergency department to predict massive transfusion in trauma patients. Scand J Trauma Resusc Emerg Med. 2021;29:26. doi: 10.1186/s13049-021-00840-2.
17. Niu W, Qi Y. A meta-analysis of randomized controlled trials assessing the impact of beta-blockers on arterial stiffness, peripheral blood pressure and heart rate. Int J Cardiol. 2016;218:109–117. doi: 10.1016/j.ijcard.2016.05.017.
18. Mundlak Y. On the pooling of time series and cross section data. Econometrica. 1978;46:69–85. doi: 10.2307/1913646.
19. Amini S, Delgado MS, Henderson DJ, Parmeter CF. Fixed vs random: The Hausman test four decades later. In: Baltagi BH, Hill RC, Newey WK, White HF, editors. Essays in honor of Jerry Hausman. Emerald Group Publishing Limited; 2012. 29:479–513. doi: 10.1108/S0731-9053(2012)0000029021.
20. Egan TD, Lemmens HJ, Fiset P, Hermann DJ, Muir KT, Stanski DR, Shafer SL. The pharmacokinetics of the new short-acting opioid remifentanil (GI87084B) in healthy adult male volunteers. Anesthesiology. 1993;79:881–892. doi: 10.1097/00000542-199311000-00004.
21. Kataria BK, Ved SA, Nicodemus HF, Hoy GR, Lea D, Dubois MY, Mandema JW, Shafer SL. The pharmacokinetics of propofol in children using three different data analysis approaches. Anesthesiology. 1994;80:104–122. doi: 10.1097/00000542-199401000-00018.
22. Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: the “Statistical Analyses and Methods in the Published Literature” or the SAMPL Guidelines. Int J Nurs Stud. 2015;52:5–9. doi: 10.1016/j.ijnurstu.2014.09.006.
23. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–838. doi: 10.2307/1912934.
