Abstract
The correct specification of covariance structures in linear mixed models (LMMs) is critical for accurate longitudinal data analysis. These data, characterised by repeated measurements on subjects over time, demand careful handling of inherent correlations to avoid biased estimates and invalid inferences. Incorrect covariance structure specification can lead to inflated type I error rates, reduced statistical power, and inefficient estimation, ultimately compromising the reliability of statistical inferences. Traditional methods for selecting appropriate covariance structures, such as AIC and BIC, often fall short, particularly as model complexity increases or sample sizes decrease. Studies have shown that these criteria can misidentify the correct structure, resulting in suboptimal parameter estimates and poor assessment of standard errors for fixed effects. Additionally, relying on trial-and-error comparisons in LMMs can lead to overfitting and arbitrary decisions, further undermining the robustness of model selection and inference. To address this challenge, we introduce LiMMCov, an interactive app that uniquely integrates time-series concepts into the process of covariance structure selection. Unlike existing tools, LiMMCov allows researchers to explore and model complex structures using autoregressive models, a novel feature that enhances the accuracy of model specification. The app provides interactive visualisations of residuals, offering insights into underlying patterns that traditional methods may overlook. LiMMCov facilitates a systematic approach to covariance structure selection with a user-friendly interface and integrated theoretical guidance. This paper details the development and features of LiMMCov, demonstrates its application with an example dataset, and discusses its potential impact on research. The app is freely accessible at https://zq9mvv-vub0square.shinyapps.io/LiMMCov-research-tool/.
1. Introduction
Linear mixed models (LMMs) have been extensively studied and utilised to analyse longitudinal data in various fields, including medical research and social sciences [1,2]. Longitudinal data are collected on subjects or sampling units repeatedly over time [3,4]. Since these repeated measurements are positively correlated, ignoring this correlation in the analysis can lead to underestimated standard errors and inflated type I error rates, as well as potentially incorrect type II error rates [5]. LMMs address the dependency in the data by incorporating both fixed effects (consistent across individuals) and random effects (which vary among individuals or clusters) [3]. Verbeke and Molenberghs [2] represent the mixed model for observed outcome data ($y_i$) as:
$$y_i = X_i\beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i) \tag{1}$$
$X_i$ is the design matrix for the fixed effects $\beta$, $Z_i$ is the design matrix for the random effects $b_i$, and $\varepsilon_i$ is the residual error vector. In this model, $b_i$ and $\varepsilon_i$ are assumed independent, $D$ represents the random-effects covariance matrix, and $\Sigma_i$ represents the covariance matrix of the residual errors. The model captures correlations using two approaches: 1) the hierarchical formulation (i.e., the conditional or subject-specific model) and 2) the marginal formulation (i.e., the population-average or marginal model). In the first approach, given the random effects, the measurements of each subject are independent (conditional independence assumption). This means that the more random effects we include, the more flexibly we capture the correlations [2,6]. In the second approach, the measurements of each subject are correlated, and these correlations between repeated measurements are modelled by the marginal covariance matrix [2,3,7]. A critical component of this marginal formulation is the correct specification of the covariance matrix [8]. This matrix ($\Sigma_i$) can be explicitly modelled to reflect any remaining correlation among residuals after specifying time-varying fixed effects, or it can be left unstructured. Common choices for modelling $\Sigma_i$ include the autoregressive model of order 1, AR(1), which assumes that correlations decay geometrically over time (or repeated measures), and compound symmetry (CS), which assumes a uniform correlation among all pairs of observations within the same subject [2,4,9]. In contrast, leaving $\Sigma_i$ unstructured means that the data determine all variances and covariances without imposed constraints. Ignoring residual correlation when it exists can lead to biased estimates and misinformed inference, highlighting the importance of proper evaluation of $\Sigma_i$ [1,2]. In this study, we focus on the marginal formulation, where the covariance structure is modelled by $\Sigma_i$.
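To make the two formulations concrete, the sketch below shows how each could be specified with the nlme package; the data frame df and its columns y, time, group, and id are hypothetical, and corAR1 here assumes an integer-valued time covariate.

```r
# A minimal sketch, assuming a hypothetical long-format data frame 'df' with
# columns y (outcome), time (integer occasion), group and id (subject).
library(nlme)

# Hierarchical formulation: a random intercept per subject induces
# within-subject correlation via the random-effects covariance matrix D.
fit_hier <- lme(y ~ time + group, random = ~ 1 | id, data = df)

# Marginal formulation: the residual covariance matrix Sigma_i is modelled
# directly, here with an AR(1) structure for the within-subject errors.
fit_marg <- gls(y ~ time + group,
                correlation = corAR1(form = ~ time | id),
                data = df)
```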
Despite the critical importance of an appropriate structure for $\Sigma_i$, many researchers remain unaware of its impact and often rely on default settings, making the selection process challenging. As Molenberghs & Verbeke [2] emphasise, “while many residual covariance structures are available, there is a lack of general simple techniques for model comparisons”. The current body of literature has explored various methods for covariance structure selection, including model comparison techniques and diagnostic tools [8]. Researchers typically use the likelihood ratio test (LRT) to compare a pair of nested models, i.e., “full” and “reduced” models, where the “reduced” model’s covariance structure is a special case of the “full” model, e.g., when comparing the unstructured with the CS [5,10]. However, LRTs are limited by their requirement for nested models and their sensitivity to sample size, leading to challenges in model comparison and a higher risk of overfitting or failing to detect true differences [3]. For non-nested models, information criteria, such as Akaike’s Information Criterion (AIC; [11]), AIC Corrected Criterion (AICc; [12]), Consistent AIC (CAIC; [13]) and the Bayesian Information Criterion (BIC; [14]) are employed. The effectiveness of these information criteria in selecting the correct residual covariance structure has been studied by Ferron et al. [15], Gómez et al. [16], and Keselman et al. [17]. Unfortunately, these criteria do not always succeed in selecting the proper structure. Ferron et al. [15] found that AIC selected the correct covariance structure about 79% of the time, while BIC did so 66% of the time. In contrast, Keselman et al. [17] reported that AIC and BIC were successful only 47% and 35% of the time, respectively. However, while the AIC is useful for comparing models, it offers limited insight into why one model outperforms another, effectively making the selection process a black box for identifying the model structure. Selection accuracy further decreases as the covariance structure becomes complex and the sample size decreases. Although Vallejo et al. [8] recommend that researchers may benefit from using criteria designed for small samples, such as AICc, the inferential properties of such criteria remain underexplored. Other studies have shown that information criteria like CAIC and BIC can be competitive in identifying the correct covariance structure when paired with simulation-based approaches [18,19]. These findings are context-dependent and limited to specific simulation settings. The conclusion that these criteria perform well under certain conditions does not address the broader challenges faced in real-world applications, where model complexity and data variability often undermine their effectiveness. Thus, one must consider the possible effects of misspecification of the covariance structure on the statistical properties of the inferences.
Misspecifying the marginal covariance structure can have significant consequences when analysing repeated measurement data [8,15,20–22]. Incorrect selection leads to inefficient estimation and poor assessment of standard errors for fixed effects [2,20,23–25], leading to biased statistical inferences, inflated type I error rates, and reduced statistical power [8,20,21]. Although the fixed effect estimates often remain unbiased [15], selecting an inadequate covariance structure can compromise the control over type I error rates associated with hypothesis tests for fixed effects, such as group differences or time trends [16,26–28]. This issue becomes pronounced with smaller sample sizes, fewer repeated measures, and more complex covariance structures [8]. Additionally, properly modelling the marginal covariance structure captures the remaining temporal dependencies and ensures more accurate estimates of the variability in the outcome over time after accounting for fixed effects [29].
There is a clear need for an accessible and detailed approach to selecting covariance structures in LMMs for longitudinal data to avoid the consequences of misspecifications. Although commercial software procedures such as SAS Proc Mixed and SPSS Mixed offer a menu of covariance structures, these platforms are not accessible to everyone. Traditional methods described by Littell et al. [9] provide limited guidance and visualisation options, often leaving users to rely on trial and error or suboptimal default settings. Despite research on graphical procedures such as the draftsman’s display and parallel axis plots [30], there is still a significant gap in understanding and accurately identifying complex covariance structures in longitudinal data, particularly when these structures are not easily discernible through traditional visualisation methods. Moreover, these methods often require advanced statistical knowledge and are not always user-friendly. Existing research tools for LMMs do not include features to explicitly model the marginal covariance structure [31]. Therefore, researchers require a free tool that provides robust statistical methods and offers an intuitive interface, theoretical guidance, and interactive visualisations to aid in selecting the correct marginal covariance structures.
A key insight of this study is leveraging time-series analysis concepts to address the challenge of modelling correlation structures in longitudinal data. Specifically, when random effects or a marginal covariance matrix are not included, the correlation remains in the residuals [32]. This observation provides a foundation for visualising and understanding residual correlation patterns, enabling researchers to select appropriate residual covariance structures more effectively. Another novel feature of this study is the application of Yule-Walker equations to construct correlation matrices [33], which were employed to simulate datasets and validate the proposed method. These insights form the basis of our interactive tool, LiMMCov, developed to make these advanced concepts accessible to a broader audience.
LiMMCov is an interactive research tool built using the R Shiny package [34] to intuitively guide researchers in selecting appropriate covariance structures for longitudinal data analysis. It integrates interactive visualisations of residuals and applies time-series concepts, such as autoregressive models, to model complex correlation structures. The app’s user-friendly design, point-and-click interface, and detailed theoretical text allow researchers to load their own or example datasets, visualise data properties, and apply preprocessing techniques such as variable transformations and data type changes. By bridging the gap between statistical theory and practical application, LiMMCov has the potential to enhance the analytical capabilities of researchers, ensuring more accurate and efficient longitudinal data analyses.
In the Methods section, we summarise the time-series concepts that motivated our work, including all statistical details. We describe the design and features of LiMMCov, and the correlation structures included. Readers primarily interested in the practical applications can skip to section 2.4 and read the Results section, where we demonstrate the capabilities of LiMMCov through a step-by-step guide using an example dataset from published research. The Discussion section then highlights the tool’s strengths and limitations, followed by concluding remarks.
2. Methods
2.1 Process for evaluating the proposed method
2.1.1 Deriving correlation matrices using Yule-Walker equations.
Time series analysis is used to analyse data points collected or observed at successive points in time [35]. These analyses aim to understand underlying patterns, forecast future values, or model the dependency structure within the data. A key concept in time series analysis is the autocorrelation function, which quantifies the degree of correlation between observations at different time lags [32]. Within this framework, autoregressive (AR) models are frequently used to capture dependencies between current and past observations in a time series.
The AR(p) model is given by the equation:
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t \tag{2}$$
where $X_t$ represents the time series at time $t$, $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients, $p$ is the order of the AR process, and $\varepsilon_t$ is a white noise error term with variance $\sigma^2$.
As shown in model (2), AR models assume that each observation can be expressed as a linear combination of its preceding values, along with a random error term [32,35].
The Yule-Walker equations are fundamental in the estimation of AR model parameters (i.e., the autoregressive coefficients, $\phi_i$). These equations establish a relationship between the autoregressive coefficients and the autocorrelation function of the time series, allowing the AR parameters to be derived directly from the data [33].
In this study, we utilised the Yule-Walker equations to estimate the autoregressive coefficients required to construct correlation matrices for AR processes.
The Yule-Walker equations for an AR(p) process are given by:
$$\gamma_k = \sum_{i=1}^{p} \phi_i\, \gamma_{k-i} + \sigma^2 \delta_{k,0} \tag{3}$$
where $k = 0, \ldots, p$, yielding $p + 1$ equations [36]. Each term in the equation is defined as follows:
• $\gamma_k$: The autocovariance at lag $k$, which quantifies the similarity between the time series values separated by $k$ time steps. It reflects the degree of dependency between observations at different points in time.
• $\phi_i$: The autoregressive (AR) coefficients, which represent the contribution of the $i$-th lagged value to the current value in the time series. These are the parameters to be estimated in the AR model.
• $\sigma^2$: The variance of the input noise process. It captures the variability in the time series that is not explained by its past values.
• $\delta_{k,0}$: The Kronecker delta, which is 1 for $k = 0$ and 0 otherwise. It ensures that the noise variance contributes only to the autocovariance at lag $k = 0$ (i.e., the variance of the series).
• $p$: The order of the AR process, indicating how many past time steps are considered when modelling the current value.
Since the autocorrelation function (ACF) is defined as the autocovariance at lag $k$ normalised by the variance $\gamma_0$:
$$\rho_k = \frac{\gamma_k}{\gamma_0} \tag{4}$$
the Yule-Walker equations can be rewritten in terms of the autocorrelation function to give model (5):
$$\rho_k = \sum_{i=1}^{p} \phi_i\, \rho_{k-i}, \qquad k = 1, \ldots, p \tag{5}$$
Matrix form of Yule-Walker equations
To enhance the theoretical clarity of parameter estimation, the Yule-Walker equations for an AR(p) process can be expressed in matrix form as:
$$\begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{pmatrix}$$
or concisely
$$R\,\boldsymbol{\phi} = \boldsymbol{\rho} \tag{6}$$
where $R$ is the $p \times p$ autocorrelation matrix, $\boldsymbol{\phi}$ is the $p \times 1$ vector of AR coefficients, and $\boldsymbol{\rho}$ is the $p \times 1$ vector of autocorrelations. Yule-Walker equations for AR(1) and AR(2) processes are provided in the S1 File.
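For instance, the $p = 2$ system can be solved in a few lines of R; the autocorrelation values below are illustrative, not taken from our simulations.

```r
# Solving the Yule-Walker system R phi = rho for an AR(2) process.
# rho1 and rho2 are illustrative lag-1 and lag-2 autocorrelations.
rho1 <- 0.60
rho2 <- 0.45
R_mat <- toeplitz(c(1, rho1))   # 2 x 2 autocorrelation matrix R
rho   <- c(rho1, rho2)          # right-hand side vector of autocorrelations
phi   <- solve(R_mat, rho)      # AR coefficients (phi_1, phi_2)
phi

# For an observed series x, the same estimates are available directly via
# stats::ar.yw(x, order.max = 2, aic = FALSE)$ar
```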
To derive the correlation matrices in our study, we first estimated autoregressive (AR) parameters from weakly stationary time series via the Yule-Walker equations, confirming that all characteristic roots lay outside the unit circle. Once these AR parameters were estimated, we constructed correlation matrices for different AR processes with dimensions matching the number of repeated measurements. For AR(2) processes, we utilised a modified version of the mat.ar2() function from the dae package [37] to generate the corresponding correlation matrix. The modified version includes additional checks for AR(2) model causality and provides a more precise computation for the third lag, ensuring greater model stability and accuracy. The complete R code used to implement this process can be found in the S2 File.
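The sketch below illustrates the underlying recursion for an AR(2) correlation matrix; it mirrors the idea behind the modified mat.ar2() function rather than reproducing its exact code (see S2 File for the implementation we used).

```r
# Build an n x n AR(2) correlation matrix from coefficients phi1 and phi2
# using the Yule-Walker recursion rho_k = phi1 * rho_{k-1} + phi2 * rho_{k-2}.
ar2_cor_matrix <- function(phi1, phi2, n) {
  stopifnot(n >= 2)
  # Stationarity (causality) conditions for an AR(2) process
  stopifnot(phi1 + phi2 < 1, phi2 - phi1 < 1, abs(phi2) < 1)
  rho <- numeric(n)
  rho[1] <- 1                      # lag 0
  rho[2] <- phi1 / (1 - phi2)      # lag 1, from the Yule-Walker equations
  if (n >= 3) {
    for (k in 3:n) rho[k] <- phi1 * rho[k - 1] + phi2 * rho[k - 2]
  }
  toeplitz(rho)                    # correlation matrix for n repeated measures
}

ar2_cor_matrix(phi1 = 0.5, phi2 = 0.3, n = 5)
```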
2.2 Numerical validation
We simulated longitudinal data incorporating specified correlation structures (i.e., CS, AR(1) and AR(2)) to create a controlled yet realistic environment for evaluation. The simulated datasets were based on previously published longitudinal studies on HIV-positive adults [38–40] to reflect standard demographic and clinical variables encountered in real-world longitudinal studies. This ensured that the correct sample covariance structure was known, allowing us to systematically evaluate the accuracy of different modelling approaches. These datasets, with realistic longitudinal patterns, provided a robust testing environment for our methods and are included in the app as examples. Details of the R code used to generate these datasets can be found in the S2 File.
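As a simplified illustration of the simulation idea (the full code is in the S2 File), correlated residuals can be drawn from a multivariate normal distribution with a chosen correlation matrix; all parameter values below are arbitrary.

```r
# Simulate longitudinal data with a known AR(1) residual correlation structure.
library(MASS)
set.seed(123)
n_subj <- 50; n_time <- 5; rho <- 0.6; sigma <- 2

R <- rho^abs(outer(1:n_time, 1:n_time, "-"))  # AR(1) correlation matrix
eps <- mvrnorm(n_subj, mu = rep(0, n_time), Sigma = sigma^2 * R)

sim <- data.frame(
  id   = factor(rep(1:n_subj, each = n_time)),   # subject identifier
  time = rep(1:n_time, times = n_subj),          # measurement occasion
  y    = 10 + 0.5 * rep(1:n_time, times = n_subj) + as.vector(t(eps))
)
```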
2.3 Fitting linear models and visualising residuals
2.3.1 Importance of residual analysis.
One of the key insights from time series analysis is that when a model does not account for the correlation inherent in structured data, this correlation manifests in the residuals [33]. Residual patterns can reveal temporal dependencies, clustering, or other correlation structures that a given model may have missed. This makes examining residuals a powerful diagnostic tool to ensure the adequacy of the model’s assumptions and to guide the choice of appropriate correlation structures.
2.3.2 Initial model fitting.
In this study, we initially fitted linear models to the simulated data using the generalised least squares method without specifying any random effects or correlation structures. This step was deliberate: by focusing solely on the fixed effects and leaving the clustering and correlation in the data unmodelled, we ensured that any remaining dependence would surface in the residuals. Residuals from these fixed-effects models were extracted, and their covariance and correlation matrices were computed, with the aim of identifying the structure depicted in the correlation matrices.
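A sketch of this step, continuing the simulated data above: fit the fixed effects only, then compute the correlation matrix of the residuals arranged as one row per subject.

```r
# Fit fixed effects only (no random effects, no correlation structure).
library(nlme)
fit0 <- gls(y ~ time, data = sim)

# Residuals are ordered by subject then time, so reshape them into a
# subjects x occasions matrix and compute the residual correlation matrix.
res_mat <- matrix(residuals(fit0), ncol = n_time, byrow = TRUE)
round(cor(res_mat), 2)
```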
2.3.3 Visualising residual correlation.
To visualise the temporal correlation structure, we plotted, for each lag, the mean of the corresponding off-diagonal elements of the residual correlation matrix against that lag. This approach enabled us to uncover the remaining correlation patterns that the initial linear model did not account for. Ideally, residuals in a well-specified fixed-effects model should be independent and randomly distributed. However, patterns or trends in the correlation matrices of the residuals indicated that the model had failed to capture some aspects of the underlying structure (see plots in Fig 1).
Fig 1. Residual plots.
Residuals illustrating correlation structures from linear models: (a) compound symmetry, (b) autoregressive order 1 (AR1) and (c) autoregressive order 2 (AR2).
Visualising these residuals against the lag provided valuable insights into the nature of the correlation, for example, identifying whether the pattern was autoregressive or compound symmetric. By incorporating these insights, subsequent models could more accurately reflect the true correlation structure, avoiding reliance on purely theoretical assumptions or a trial-and-error approach. This step ensured that the analysis remained robust and grounded in empirical evidence.
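In code, this amounts to averaging the off-diagonal elements of the residual correlation matrix at each lag; a minimal version, continuing the example above:

```r
# Mean residual correlation at each lag, plotted against the lag.
C <- cor(res_mat)
lags <- 1:(n_time - 1)
mean_by_lag <- sapply(lags, function(k) mean(C[row(C) - col(C) == k]))
plot(lags, mean_by_lag, type = "b",
     xlab = "Lag", ylab = "Mean residual correlation")
```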
2.3.4 Using PACF to detect autoregression order.
In addition to residual plots, we used the Partial Autocorrelation Function (PACF) plots to detect autocorrelation and determine the appropriate order of the autoregressive (AR) correlation structures (Fig 2). The PACF isolates the direct relationship between residuals at specific lags, filtering out the influence of shorter lags.
Fig 2. PACF plots.
An illustration of autocorrelation structures from linear models: (a) autoregressive order 1 and (b) autoregressive order 2.
Residual plots provide initial insights into autoregressive patterns, such as a decaying trend (often indicative of an odd order) or oscillations (suggesting an even order). However, they do not explicitly determine the true AR order. For instance, a decaying trend in residuals might result from the combined effects of three AR(1) components, effectively forming an AR(3) structure. This ambiguity highlights the need for PACF plots, which provide a more precise indication of the correct AR order by focusing exclusively on direct correlations at each lag.
In our context, PACF plots were particularly valuable compared to the Autocorrelation Function (ACF), which measures overall correlations at various lags. The ACF can mislead by suggesting longer lag correlations due to the cumulative effects of intermediate lags. By filtering out these confounding effects, the PACF simplified the process of identifying the correct AR order (i.e., AR(1), AR(2), or higher), ensuring that our models more accurately reflected the underlying correlation structure [33, 36].
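A basic version of this diagnostic can be produced with stats::pacf(); note that pooling residuals across subjects into a single series, as below, is a simplification of what the app does internally.

```r
# PACF of the fixed-effects residuals; for an AR(p) structure the spikes
# should be significant up to lag p and near zero afterwards.
pacf(residuals(fit0), lag.max = n_time - 1,
     main = "PACF of fixed-effects residuals")
```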
2.4 User interface overview
LiMMCov was developed in R [41] and implemented as a web application using the R Shiny package [34]. It is available as an open-source application at https://zq9mvv-vub0square.shinyapps.io/LiMMCov-research-tool/. A complete list of the R packages that were utilised can be found in the source code which is available on GitHub at https://github.com/vub-square/LiMMCov-Shiny-app.
The Home Tab introduces the app and outlines its purpose and functionality (Fig 3). The app contains several tabs dedicated to a specific part of the analysis workflow under the Analysis navigation menu.
Fig 3. LiMMCov user interface.
The Home page.
The Dataset Tab allows users to select an example dataset for analysis. In addition to the simulated datasets, we included a dataset named “COVID data,” drawn from a published prospective observational study of patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection admitted to the intensive care unit (ICU) at a tertiary hospital in Cape Town, South Africa [42]. The dataset includes 413 patients admitted between 27 March and 4 November 2020. Clinical and laboratory data were captured daily and stored electronically, including demographic and lifestyle characteristics, clinical disease characteristics, pre-existing comorbidities, arterial blood gases, and routinely collected laboratory data. The primary outcomes analysed in the study were mortality rates and length of stay in the ICU, with time to death or discharge as secondary outcomes. Users can also upload their own datasets by specifying the data format (e.g., CSV, Excel) and file location. Once data has been loaded, there are options to transform variables or change data types under “Data manipulation and preprocessing”. The “Data Summary” includes basic descriptive statistics and visualisations to understand the data distribution.
The General Linear Model Tab enables users to fit a general linear model (GLM) to the dataset. Users can specify the outcome variable and the mean structure appropriate for their data. The mean structure will include the fixed effects of the model. A summary of the fitted GLM model is then provided on the main panel.
The Covariance Analysis Tab then provides tools for analysing the correlation structure of the residuals from the GLM. Users can visualise 1) residual plots and 2) partial autocorrelation function (PACF) plots which can easily be downloaded for further exploration. This tab also includes the interpretation of ideal residual and PACF plots for common correlation structures. In a compound symmetry (CS) structure, all residuals are equally correlated, irrespective of lag, resulting in a horizontal line at the value rho (ρ) in the residual plot. The PACF shows a single significant spike at lag 1, indicating a constant correlation across all time points without a decay pattern. In AR(1), the correlation between observations decreases exponentially with lag distance. Therefore, residuals show a declining line in the residual plot as correlation decreases exponentially with increasing lag. The PACF displays a significant spike at lag 1, with subsequent lags showing values close to zero, reflecting the single-level decay of correlation over time. In AR(2), two autoregressive (AR) parameters influence the correlation pattern, reflecting the interaction between two lag terms. Residuals oscillate around zero, reflecting the interplay between lag terms without systematic bias. The PACF typically shows significant spikes at lags 1 and 2, capturing the influence of both lag terms in the model. For an AR(p) process, the PACF will have significant spikes up to lag p, after which the values drop to zero, highlighting the true order of the autoregressive model [33].
The Linear Mixed Model Tab allows users to specify and fit LMMs to their data. Users can define fixed effects, including interaction terms, and choose from various correlation structures to accurately capture the dependencies in their data. The app uses the nlme package [43] to fit these models, which is particularly advantageous as it provides extensive options for the flexible specification of the residual covariance structure, enabling more precise modelling of the residuals in the data. Thus, LiMMCov offers various covariance structures for the residuals to accommodate different types of data:
• AR(1) and AR(p) are designed for discrete and equally spaced time points, as their correlation structures rely on the spacing between observations. Compound symmetry (CS), while often applied to equally spaced time points, assumes a constant correlation between all observations regardless of time or spacing; this makes CS less sensitive to time spacing and suitable for some discrete, unequally spaced data if the correlation structure remains constant.
• Continuous AR(1), exponential, linear and Gaussian structures are defined for continuous time and are well suited to irregularly spaced or unbalanced data.
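These options correspond broadly to the correlation classes in nlme; the sketch below uses hypothetical covariate names (visit, days, id), and the app's exact internal calls may differ.

```r
library(nlme)
corAR1(form = ~ visit | id)                  # AR(1), discrete equally spaced times
corARMA(p = 2, q = 0, form = ~ visit | id)   # AR(p), here p = 2
corCompSymm(form = ~ 1 | id)                 # compound symmetry
corCAR1(form = ~ days | id)                  # continuous-time AR(1)
corExp(form = ~ days | id)                   # exponential
corLin(form = ~ days | id)                   # linear
corGaus(form = ~ days | id)                  # Gaussian
```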
LiMMCov also offers an option for heteroscedastic residual variances via the nlme package’s varIdent() function. In the Linear Mixed Model tab, users can check the “Include heteroscedastic variances” box after selecting a correlation structure (e.g., AR(1)). Once checked, a warning alerts the user that the fitting process may take longer. If the heteroscedastic option is selected, LiMMCov will automatically fit an additional model using the same correlation structure but allowing the residual variance to vary, then present a side-by-side comparison of both the homoscedastic and heteroscedastic fits in the results table. This dynamic workflow ensures that the heteroscedastic model is only estimated when explicitly requested, thereby streamlining the process for users who do not require heteroscedastic variance structures. By including this feature, LiMMCov extends its applicability to real-world longitudinal data where variance may change over time or across measurement occasions. Additionally, summaries of the results, including model fit comparisons, are provided to help users assess the chosen covariance structure and make informed decisions based on the characteristics of their dataset.
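The comparison performed by this option can be reproduced outside the app roughly as follows, continuing the simulated example above.

```r
# Homoscedastic fit with an AR(1) residual correlation structure.
fit_hom <- gls(y ~ time, data = sim,
               correlation = corAR1(form = ~ time | id))

# Heteroscedastic fit: allow a separate residual variance per occasion.
fit_het <- update(fit_hom, weights = varIdent(form = ~ 1 | time))

anova(fit_hom, fit_het)   # side-by-side AIC, BIC and log-likelihood
```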
The Reports Tab lets users generate and download reports of the output summarising the analyses. Available formats include HTML, PDF, and Word, with HTML recommended for its interactive features. The Manual Tab offers a detailed user manual guiding users through each step of using the app, from uploading data, model fitting, and covariance structure selection to report generation. This section helps users understand the functionality of each tab and how to interpret the results. The Contact Us Tab provides the contact information of the developers.
3. Results
3.1 Example: Longitudinal analysis of oxygen saturation in COVID-19 ICU patients
The objective in our example was to model changes in oxygen saturation over time using an LMM with a correctly specified residual covariance structure.
3.1.1 Load data.
The COVID example dataset was selected from the Dataset tab, under the “Choose example data” heading as shown in Fig 4.
Fig 4. Dataset tab.
The loaded dataset is displayed with options to preprocess before running the analysis.
3.1.2 Fit the linear model.
The objective of this initial model was to explore the relationship between oxygen saturation and the fixed effects variables, such as time, age, gender, diastolic BP, and systolic BP. These variables were selected from the General Linear Model (GLM) tab, which is part of the Analysis menu. On the dropdown menus, users can specify the subject/patient ID, outcome, time and fixed effects before fitting the fixed effects model via the “Run GLM” button (Fig 5).
Fig 5. GLM tab.
The tab shows the fitted linear model and a summary of the coefficients.
3.1.3 Residuals diagnostics.
Residuals were extracted from the fixed effects linear model to analyse the underlying correlation structure. Under the Covariance Analysis tab, two key plots were generated (as described in the Methods section):
(1) Residual plot (Fig 6a): The plot shows a downward trend in the mean residual correlations, indicating an autoregressive structure.
(2) Partial Autocorrelation Function (PACF) plot: It provides a more precise picture by showing the direct correlation of the residuals with their lagged values, excluding the indirect correlations (Fig 6b). The first three significant lags indicate an AR(3) correlation structure.
Fig 6. Covariance Analysis tab.
(a) Residual and (b) PACF plots from the linear model to explore the underlying covariance structure.
3.1.4 Fit Linear Mixed Model (LMM).
Once the appropriate covariance structure is selected, the linear mixed model can be fitted. Under the Linear Mixed Model (LMM) tab (Fig 7), we fitted the model by including the variables from the GLM in Step 2 and selecting the AR(3) covariance structure under the “Select covariance structure” dropdown.
Fig 7. LMM tab.
The tab shows the fitted model on the sidebar (the summary has been omitted), and the information criteria highlight the best model.
3.1.5 Final selected model.
Residual plots and the partial autocorrelation function (PACF) in Step 3 suggested an AR(3) covariance structure as the most appropriate for the data. To confirm this, we compared the AR(3) model with alternative covariance structures using AIC, BIC and AICc (Table 1). The AR(3) model had the lowest BIC (13924.97), indicating the best overall fit among the models evaluated. Although the general symmetric structure attained a slightly lower AIC, its BIC was substantially higher, and the BIC strongly favoured AR(3), aligning with the visual diagnostics. This result supports the choice of AR(3) as the optimal structure for capturing the correlation patterns in the data.
Table 1. Model fit comparison.
| Covariance structure | AIC | BIC | AICc |
|---|---|---|---|
| AR(3) (selected correlation structure) | 13869.49 | 13924.97 | 13869.60 |
| Exponential | 13947.69 | 13992.08 | 13947.77 |
| General symmetric | 13802.65 | 13996.85 | 13804.00 |
| Compound symmetry | 13958.77 | 14003.15 | 13958.84 |
| Rational quadratic | 13997.36 | 14041.75 | 13997.44 |
| Spherical spatial | 14063.63 | 14108.02 | 14063.71 |
| Gaussian spatial | 14145.34 | 14189.72 | 14145.41 |
| Linear spatial | 14245.48 | 14289.87 | 14245.55 |
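For readers who wish to reproduce this kind of comparison outside the app, a sketch with nlme is shown below; the variable names for the COVID dataset are hypothetical, and AICc is computed by hand since nlme does not report it.

```r
library(nlme)

# AR(3) residual correlation structure via corARMA with p = 3, q = 0.
fit_ar3 <- gls(spo2 ~ time + age + gender + diastolic_bp + systolic_bp,
               data = covid,
               correlation = corARMA(p = 3, q = 0, form = ~ time | id))
fit_cs  <- update(fit_ar3, correlation = corCompSymm(form = ~ 1 | id))

# Small-sample corrected AIC.
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

data.frame(structure = c("AR(3)", "Compound symmetry"),
           AIC  = c(AIC(fit_ar3), AIC(fit_cs)),
           BIC  = c(BIC(fit_ar3), BIC(fit_cs)),
           AICc = c(aicc(fit_ar3), aicc(fit_cs)))
```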
3.1.6 Generate Report.
Finally, the app’s comprehensive report functionality can be utilised to generate and save the analysis under the Reports tab described in section 2.4.
4. Discussion
In this study, we present LiMMCov, an interactive tool designed to assist researchers in selecting covariance structures for longitudinal data analyses using linear mixed models. The tool’s process includes fitting models with fixed effects, extracting the residuals, and analysing them through residual plots and Partial Autocorrelation Function (PACF) plots. LiMMCov simplifies the user’s understanding of residual patterns through interactive visualisations and a user-friendly interface. The tool includes detailed explanations of the common sample covariance structures to guide and support beginner and intermediate users. By leveraging time-series concepts, particularly autoregressive models, LiMMCov effectively identifies complex serial correlation structures. The app allows users to download plot outputs and reports for further analysis or sharing to promote transparency and reproducibility. Using a previously published longitudinal study, we highlighted the app’s key features and robust modelling capabilities [42]. Our findings indicate that LiMMCov facilitates the identification of the most appropriate marginal covariance structure, leading to improved model fit and inference. Including interactive visualisations proved crucial in uncovering patterns in the residuals, thereby effectively guiding the selection process. We validated our results by comparing the model with the selected covariance structure to those fitted with alternative structures available in the nlme package using AIC and BIC. In addition, LiMMCov also provides AICc for smaller samples, as recommended in the literature for model selection [12]. Cross-validation methods, while valuable for predictive modelling, fall outside the scope of LiMMCov’s primary purpose of identifying appropriate covariance structures. Researchers who need robust variance estimates or wish to evaluate predictive performance may complement LiMMCov analyses with additional external methods. As an open-source application, all code is available on GitHub.
When LiMMCov is compared to existing tools and methodologies, it becomes evident that our approach offers unique advantages, particularly in its user-centric design and focus on interactive visualisation. As Molenberghs and Verbeke [2] highlighted, there is a lack of straightforward methods for comparing various residual covariance structures. Their recommendation to rely on information criteria for model comparison underscores the complexity and manual effort involved in this process, which can sometimes obscure the underlying patterns in the data. LiMMCov directly addresses this challenge by offering a platform that automates and enhances these comparisons with interactive visualisations. Although widely used statistical software such as SAS and SPSS can handle linear mixed models for correlated data, these solutions do not offer guidance and are neither free nor intuitive for less experienced users. They also require a solid understanding of model specifications and lack interactive features. Furthermore, LiMMCov focuses specifically on temporal correlations in longitudinal data rather than spatial correlations, which often require different analytical approaches. Spatial correlation methods, such as semi-variograms, have been extended to model random effects [2,44,45]. However, these are outside the scope of LiMMCov, which offers a specialised method that complements (rather than duplicates) spatial analysis techniques. Other Shiny applications, such as rmcorrShiny and SISSREM [31,46], are tools for specific statistical tasks like repeated measures ANOVA. They do not provide an option for selecting the covariance structure, which is a crucial feature of LiMMCov. This distinction highlights LiMMCov’s unique capability in addressing the complexities of longitudinal data analysis through visual diagnostic tools, allowing users to observe residual patterns and make informed decisions based on interactive plots. This emphasis on visual and dynamic exploration sets LiMMCov apart from other tools, providing a more accessible and intuitive way for users to engage with complex modelling processes. Other tools often require manual coding or are less suited for exploratory analysis, which LiMMCov addresses effectively by offering a streamlined, no-code interface.
Despite its strengths, we should acknowledge certain limitations of LiMMCov. While the research tool simplifies the selection of the residual covariance structure, it assumes that users have already identified an appropriate mean structure for their linear models. The accuracy of the final model depends not only on the correct specification of the covariance structure but also on the mean structure [2]. Because the time-series concepts applied in LiMMCov require complete data, we recommend that users handle missing data before importing their dataset into the application. For instance, multiple imputation (MI) or other robust techniques can be used to address missing values, ensuring the temporal structure is preserved when necessary [47–50]. In addition, we have added a note and references to the relevant literature [47–50] in the ‘Dataset’ and ‘Manual’ tabs of the LiMMCov app to remind users to address missing data before using the app. Although LiMMCov has demonstrated reliable performance in datasets with at least three repeated measures per subject and a sample size of 50 participants or more, its use in studies with fewer time points (e.g., two or three) remains less thoroughly tested. In these smaller designs, time-series diagnostics (particularly the PACF) may lack stability, and compound symmetry (CS) may be the most practical correlation structure. Consequently, further validation is recommended for datasets with very few repeated measures to ensure robust model selection. We also acknowledge that more complex correlation patterns, such as moving-average (MA) or ARMA(p,q) processes, are currently not available in LiMMCov, as the app focuses on commonly used covariance structures (e.g., AR(p), CS). Although we have begun addressing heteroscedasticity by integrating the varIdent() function for certain covariance structures in LiMMCov, more advanced or generalised variance function classes remain potential avenues for future development. In addition, future work could explore conditional heteroscedastic models, such as ARCH or GARCH, to accommodate volatility clustering, allowing LiMMCov to handle cases where residual variance evolves over time. Evaluating how best to integrate these models into the LMM framework is a promising direction for further research.
Our validation study relied on the Yule-Walker equations to obtain autoregressive coefficients from stationary AR(p) processes; these coefficients then defined the correlation matrices used to simulate longitudinal data. One limitation that deserves mention is that, because the Yule-Walker method relies on the sample autocorrelation function, parameter estimates can be imprecise in small samples. Although our simulations employed large series to minimise this error, recognising this constraint ensures that the simulated scenarios used to benchmark LiMMCov remain realistic, while highlighting considerations for future work that uses empirically derived covariance structures.
A further open issue involves stochastic (unit-root) trends, where the series follows an integrated process (e.g., random walk) rather than the weakly stationary behaviour assumed by the covariance structures currently implemented in LiMMCov. If such a trend is present, researchers should first model it by including time-related fixed effects, so that the residuals analysed by LiMMCov are approximately stationary. At present, the app does not model non-stationary residual behaviour (i.e., the “I” component of an ARIMA(p,d,q) process with d > 0). Nevertheless, incorporating state-space or other methods that treat integrated components directly is a promising avenue for future development, especially for data in which long-term stochastic trends cannot be captured adequately in the mean structure alone. Evaluating their added value in practice will first require a dedicated feasibility or sensitivity study within the current LMM framework, which we identify as future work.
Finally, specifying both a sophisticated random-effects structure and a serial correlation term in the residual covariance matrix can pose computational challenges and risk non-convergence, particularly with smaller or more complex datasets. Random effects inherently capture some degree of correlation among repeated measurements, so including extensive random effects while also modelling detailed serial correlations may lead to overparameterization or local optima. Consequently, model selection in linear mixed models remains a process of balancing interpretability, convergence, and fit. Although LiMMCov utilises empirical criteria (e.g., AIC, BIC) combined with diagnostic plots to aid model building, it does not guarantee the “true” or most complex model; rather, it helps researchers arrive at a reasonable and justifiable specification grounded in the observed data. Notably, LiMMCov follows a marginal formulation perspective rather than a broader hierarchical model-building approach. This aligns with standard model-building principles, wherein the final choice often reflects both data-driven insights and practical constraints, such as sample size and computational feasibility.
5. Conclusion
In summary, the current manuscript presents a solution to the challenges researchers face when using linear mixed models in longitudinal data analysis. LiMMCov represents a significant advancement in the tools available for selecting covariance structures in LMM analysis. By providing an interactive, user-friendly platform, it empowers researchers to make more informed decisions, ultimately improving the quality and reliability of their analyses.
Supporting information
S1 File. Yule-Walker equations for AR(1) and AR(2) processes.
(DOCX)
S2 File. R code for longitudinal data simulation and estimation of AR parameters.
(DOCX)
Acknowledgments
We thank Peter S. Nyasulu from the Division of Epidemiology and Biostatistics, Faculty of Medicine and Health Sciences, Stellenbosch University (Cape Town, South Africa), for providing the example dataset.
Data Availability
All relevant data are within the manuscript and its Supporting information files.
Funding Statement
The author(s) received no specific funding for this work.
References
- 1.Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G. Longitudinal data analysis. Boca Raton, FL: CRC Press; 2008.
- 2.Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.
- 3.Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken, NJ: Wiley; 2012.
- 4.Littell RC, Stroup WW, Freund RJ. SAS for linear models. Cary, NC: SAS Institute; 2002.
- 5.Singer JD. Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. J Educ Behav Stat. 1998;23(4):323–55. doi: 10.3102/10769986023004323
- 6.Diggle PJ, Heagerty P, Liang K-Y, Zeger S. Analysis of longitudinal data. Oxford: Oxford University Press; 2002.
- 7.Gałecki A, Burzykowski T. Linear mixed-effects models using R. New York: Springer; 2013.
- 8.Vallejo G, Ato M, Valdés T. Consequences of misspecifying the error covariance structure in linear mixed models for longitudinal data. Methodology. 2008;4(1):10–21. doi: 10.1027/1614-2241.4.1.10
- 9.Littell RC, Pendergast J, Natarajan R. Modelling covariance structure in the analysis of repeated measures data. Stat Med. 2000;19(13):1793–819.
- 10.Liu S, Rovine MJ, Molenaar PCM. Selecting a linear mixed model for longitudinal data: repeated measures analysis of variance, covariance pattern model, and growth curve approaches. Psychol Methods. 2012;17(1):15–30. doi: 10.1037/a0026971
- 11.Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19(6):716–23. doi: 10.1109/tac.1974.1100705
- 12.Hurvich CM, Tsai C-L. Regression and time series model selection in small samples. Biometrika. 1989;76(2):297–307. doi: 10.1093/biomet/76.2.297
- 13.Bozdogan H. Model selection and Akaike’s Information Criterion (AIC): the general theory and its analytical extensions. Psychometrika. 1987;52(3):345–70. doi: 10.1007/bf02294361
- 14.Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6(2):461–4.
- 15.Ferron J, Dailey R, Yi Q. Effects of misspecifying the first-level error structure in two-level models of change. Multivariate Behav Res. 2002;37(3):379–403. doi: 10.1207/S15327906MBR3703_4
- 16.Gomez EV, Schaalje GB, Fellingham GW. Performance of the Kenward–Roger method when the covariance structure is selected using AIC and BIC. Commun Stat Simul Comput. 2005;34(2):377–92. doi: 10.1081/sac-200055719
- 17.Keselman HJ, Algina J, Kowalchuk RK, Wolfinger RD. A comparison of two approaches for selecting covariance structures in the analysis of repeated measurements. Commun Stat Simul Comput. 1998;27(3):591–604. doi: 10.1080/03610919808813497
- 18.Al-Marshadi AH. Selecting the covariance structure in mixed model using statistical methods calibration. J Math Stat. 2014;10(3):309–15.
- 19.Al-Marshadi AH. The new approach to guide the selection of the covariance structure in mixed model. Res J Med Med Sci. 2007;2(2):88–97.
- 20.Kwok O, West SG, Green SB. The impact of misspecifying the within-subject covariance structure in multiwave longitudinal multilevel models: a Monte Carlo study. Multivariate Behav Res. 2007;42(3):557–92. doi: 10.1080/00273170701540537
- 21.Murphy DL, Pituch KA. The performance of multilevel growth curve models under an autoregressive moving average process. J Exp Educ. 2009;77(3):255–84. doi: 10.3200/jexe.77.3.255-284
- 22.Lee Y-H. Longitudinal data analysis using multilevel linear modeling (MLM): fitting an optimal variance-covariance structure. 2010.
- 23.Davis CS. Statistical methods for the analysis of repeated measurements. New York: Springer; 2002. doi: 10.1007/b97287
- 24.Diggle P, Heagerty P, Liang K-Y, Zeger S. Analysis of longitudinal data. New York, NY: Oxford University Press; 2002.
- 25.Singer JD, Willett JB. Applied longitudinal data analysis: modeling change and event occurrence. New York: Oxford University Press; 2003.
- 26.Keselman HJ, Algina J, Kowalchuk RK, Wolfinger RD. The analysis of repeated measurements: a comparison of mixed-model Satterthwaite F tests and a nonpooled adjusted degrees of freedom multivariate test. Commun Stat Theory Methods. 1999;28(12):2967–99. doi: 10.1080/03610929908832460
- 27.Vallejo G, Livacic-Rojas P. Comparison of two procedures for analyzing small sets of repeated measures data. Multivariate Behav Res. 2005;40(2):179–205. doi: 10.1207/s15327906mbr4002_2
- 28.Kowalchuk RK, Keselman HJ, Algina J, Wolfinger RD. The analysis of repeated measurements with mixed-model adjusted F tests. Educ Psychol Meas. 2004;64(2):224–42. doi: 10.1177/0013164403260196
- 29.Hedeker D, Mermelstein RJ. Mixed-effects regression models with heterogeneous variance: analyzing ecological momentary assessment (EMA) data of smoking. In: Modeling contextual effects in longitudinal studies. 2007.
- 30.Dawson KS, Gennings C, Carter WH. Two graphical techniques useful in detecting correlation structure in repeated measures data. Am Stat. 1997;51(3):275–83.
- 31.Alonso PH. Data analysis based on SISSREM: Shiny Interactive, Supervised and Systematic report from REpeated Measures data. 2019.
- 32.Shumway RH, Stoffer DS. Time series: a data analysis approach using R. Boca Raton, FL: CRC Press; 2019.
- 33.Box GEP, Jenkins GM, Reinsel GC, Ljung GM. Time series analysis: forecasting and control. 5th ed. Hoboken, NJ: John Wiley & Sons; 2015.
- 34.Chang W, Cheng J, Allaire JJ, Sievert C, Schloerke B, Xie Y, et al. shiny: Web Application Framework for R. 2023. Available from: https://cran.r-project.org/package=shiny
- 35.Chatfield C, Xing H. The analysis of time series: an introduction with R. 7th ed. Boca Raton, FL: CRC Press; 2019.
- 36.Theodoridis S. Machine learning: a Bayesian and optimization perspective. Academic Press; 2015.
- 37.Brien C. dae: Functions Useful in the Design and ANOVA of Experiments. 2024. Available from: https://cran.r-project.org/package=dae
- 38.Tegegne AS. Predictors associated with the variation of CD4 cell count and body mass index (BMI) for HIV positive adults under ART. Sci Afr. 2021;12:e00820. doi: 10.1016/j.sciaf.2021.e00820
- 39.Muhie NS, Tegegne AS. Predictors for CD4 cell count and hemoglobin level with survival time to default for HIV positive adults under ART treatment at University of Gondar Comprehensive and Specialized Hospital, Ethiopia. BMC Res Notes. 2023;16(1):357. doi: 10.1186/s13104-023-06625-3
- 40.Gebrerufael GG, Asfaw ZG. Predictors of change in CD4 cell count over time for HIV/AIDS patients on ART follow-up in northern Ethiopia: a retrospective longitudinal study. BMC Immunol. 2024;25(1):64. doi: 10.1186/s12865-024-00659-3
- 41.R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2023.
- 42.Nyasulu PS, Ayele BT, Koegelenberg CF, Irusen E, Lalla U, Davids R, et al. Clinical characteristics associated with mortality of COVID-19 patients admitted to an intensive care unit of a tertiary hospital in South Africa. PLoS One. 2022;17(12):e0279565. doi: 10.1371/journal.pone.0279565
- 43.Pinheiro J, Bates D, R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. 2023. Available from: https://cran.r-project.org/package=nlme
- 44.Lesaffre E, Asefa M, Verbeke G. Assessing the goodness-of-fit of the Laird and Ware model: an example: the Jimma infant survival differential longitudinal study. Stat Med. 1999;18(7):835–54.
- 45.Lesaffre E, Todem D, Verbeke G, Kenward M. Flexible modelling of the covariance matrix in a linear random effects model. Biom J. 2000;42(7):807–22.
- 46.Marusich LR, Bakdash JZ. rmcorrShiny: a web and standalone application for repeated measures correlation. F1000Res. 2021;10:697. doi: 10.12688/f1000research.55027.2
- 47.Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A, Verbeke G. Handbook of missing data methodology. Boca Raton, FL: CRC Press; 2014.
- 48.Nooraee N, Molenberghs G, Ormel J, Van den Heuvel ER. Strategies for handling missing data in longitudinal studies with questionnaires. J Stat Comput Simul. 2018;88(17):3415–36. doi: 10.1080/00949655.2018.1520854
- 49.Ji L, Chow S-M, Schermerhorn AC, Jacobson NC, Cummings EM. Handling missing data in the modeling of intensive longitudinal data. Struct Equ Modeling. 2018;25(5):715–36. doi: 10.1080/10705511.2017.1417046
- 50.Heymans MW, Twisk JWR. Handling missing data in clinical research. J Clin Epidemiol. 2022;151:185–8. doi: 10.1016/j.jclinepi.2022.08.016