Abstract
Interrupted time series (ITS) are often meta‐analysed to inform public health and policy decisions but examination of the statistical methods for ITS analysis and meta‐analysis in this context is limited. We simulated meta‐analyses of ITS studies with continuous outcome data, analysed the studies using segmented linear regression with two estimation methods [ordinary least squares (OLS) and restricted maximum likelihood (REML)], and meta‐analysed the immediate level‐ and slope‐change effect estimates using fixed‐effect and (multiple) random‐effects meta‐analysis methods. Simulation design parameters included varying series length; magnitude of lag‐1 autocorrelation; magnitude of level‐ and slope‐changes; number of included studies; and, effect size heterogeneity. All meta‐analysis methods yielded unbiased estimates of the interruption effects. All random effects meta‐analysis methods yielded coverage close to the nominal level, irrespective of the ITS analysis method used and other design parameters. However, heterogeneity was frequently overestimated in scenarios where the ITS study standard errors were underestimated, which occurred for short series or when the ITS analysis method did not appropriately account for autocorrelation. The performance of meta‐analysis methods depends on the design and analysis of the included ITS studies. Although all random effects methods performed well in terms of coverage, irrespective of the ITS analysis method, we recommend the use of effect estimates calculated from ITS methods that adjust for autocorrelation when possible. Doing so will likely lead to more accurate estimates of the heterogeneity variance.
Keywords: interrupted time series, meta‐analysis, segmented regression, simulation, statistical methods
Highlights.
What is already known
An interrupted time series (ITS) study is a non‐randomised design in which data are collected repeatedly over time before and after an interruption (such as the introduction of a bicycle helmet law). The results from multiple ITS studies may be statistically combined using meta‐analysis methods; the findings of which underpin conclusions informing public health or policy decisions.
The performance of the statistical methods for analysing single ITS studies has been shown to depend on the length of the series and the underlying correlation between consecutive data points (i.e., autocorrelation). As well, the performance of meta‐analysis methods is known to depend on the number of included studies and the underlying variability in the study intervention effects.
What is new
We undertook a numerical simulation study to examine the performance of meta‐analysis methods in the context of multiple ITS studies. We found that all meta‐analysis methods yielded unbiased estimates of the interruption effects. Furthermore, we found that all random effects methods yielded coverage close to the nominal level, irrespective of the ITS analysis method used and other design features (e.g., the magnitude of heterogeneity). However, heterogeneity was frequently overestimated in scenarios where the ITS study standard errors were underestimated, which is more likely to arise when ITS analysis methods do not appropriately account for autocorrelation. We therefore recommend that meta‐analysts should strive to use effect estimates and standard errors that have been calculated from ITS methods that adjust for autocorrelation.
Potential impact for RSM readers outside the authors' field
ITS studies and systematic reviews of ITS studies are used across disciplines and topics (e.g., public health, crime, economics, war and psychology) to investigate the impact of interruptions. Our findings and recommendations are therefore likely to apply across disciplines.
1. INTRODUCTION
Healthcare policy decision‐making is often informed by systematic reviews examining the impact of policy or public health interventions, or the impact from exposures such as natural disasters (both referred to as ‘interruptions’ hereafter). These reviews may need to consider evidence beyond randomised trials, as it is not always possible to randomise interventions targeted at populations (e.g., when evaluating the impact of a media campaign broadcast to an entire country). 1 , 2 A quasi‐experimental non‐randomised design that is often used to evaluate the impact of interruptions targeted at populations is the interrupted time series (ITS) design. 3 , 4 , 5 This design is less susceptible to common threats to internal validity than other non‐randomised designs (e.g., uncontrolled before‐after design), and as such, is often included in systematic reviews. 6 , 7 , 8 The results from multiple ITS studies within systematic reviews may be statistically combined using meta‐analysis methods; the findings of which underpin review conclusions. 9 , 10
Before proceeding with meta‐analysis of ITS studies, there is a range of issues for analysis of a single ITS study that requires consideration. In a single ITS study, data are often collected continuously over time pre‐ and post‐interruption. Commonly the data are aggregated using summary statistics (such as means or proportions) over regular time intervals (e.g., weekly or monthly) for analysis. 11 A commonly fitted model structure is a segmented linear model, 12 , 13 , 14 which allows estimation of separate underlying time trends in the pre‐interruption period and the post‐interruption period. The estimated time trend in the pre‐interruption period can be used to predict what would have occurred in the absence of the interruption, thus providing a counterfactual for comparison with what was observed, using the estimated post‐interruption time trend. Several effect metrics can then be calculated to quantify the impact of the interruption; commonly these include an immediate level change, and a change in slope from pre‐interruption period to post‐interruption period 15 (see, e.g., Figure 1a). Researchers aiming to include ITS in a meta‐analysis may need to re‐analyse the original data (which is often possible when data are presented in figures in primary publications 12 , 16 ) to calculate interruption effects using desired effect metrics, and appropriate statistical methods. 14 , 17
FIGURE 1.

An analysis of ITS data from Lane et al (2019). 38 (A) Plots of interrupted time series (ITS) data examining the effect of state laws to legalise recreational cannabis sales on the traffic fatality rate (per million residents) for four of the 11 states. 38 The crosses represent data points, the solid lines represent the pre‐ and post‐interruption trend lines and the dashed line represents the counterfactual trend line. The green dashed line indicates the time of the interruption. The four states' datapoints are coloured red (State 2), orange (State 3), blue (State 5), and purple (State 9) for matching with their respective level‐change and slope‐change effect estimates in B. See Appendix 1 for an ITS graph annotated with the effect measures of interest and plots of all 11 states' ITS data. (B) Forest plots depicting state‐level and meta‐analysis estimates of immediate level‐change (left) and slope‐change (right). Statistics associated with hypothesis tests for whether the underlying level‐ or slope‐change are equal to zero are presented (“Test of θ = 0”), along with statistics quantifying heterogeneity. [Colour figure can be viewed at wileyonlinelibrary.com]
A range of statistical methods are available for estimating the regression parameters and effect estimates from a segmented linear model. 12 , 18 , 19 While Ordinary Least Squares (OLS) is often used, it fails to account for the potential correlation of consecutive time points (known as autocorrelation or serial correlation), which is a key characteristic of time series data. 20 , 21 Failing to account for autocorrelation may lead to incorrect estimates of the standard errors of the regression parameters. 22 , 23 Several methods that attempt to account for potential autocorrelation include, for example, generalised least squares methods (e.g., Prais‐Winsten (PW) 24 ), and Restricted Maximum Likelihood (REML). 25 A numerical simulation study has provided insight on the performance of these statistical methods (and others) for analysing single ITS studies with continuous outcomes using segmented linear models. 17 The authors found that performance differed across the methods, but that REML was often preferable to the other methods; however, its performance was dependent on the length of the series and the underlying magnitude of autocorrelation.
Univariate meta‐analysis may be used to estimate a combined effect across ITS studies 26 , 27 (see, e.g., Figure 1b). Commonly a two‐stage meta‐analysis approach is used, 8 whereby the interruption effect estimates (e.g., level‐change or slope‐change) are calculated for each ITS study, and then statistically combined. 27 , 28 The effects are commonly combined assuming the fixed (or common) effect model, or the random effects model. 8 , 29 , 30 The fixed‐effect approach requires estimates of the interruption effect and its standard error only, while the random‐effects approach additionally requires the estimation of the between‐study variance. 10 , 30 Numerous between‐study variance estimators exist (e.g., the DerSimonian and Laird (DL) and REML estimators) 31 and, in addition, numerous methods are available for calculating the confidence interval of the combined effect (e.g., Wald‐type (WT) and Hartung‐Knapp/Sidik‐Jonkman (HKSJ) methods). 32
The performance of the between‐study variance estimators and the confidence interval methods has been reviewed and compared in numerical simulation studies and empirically using real‐world data. 31 , 32 , 33 The DL estimator, commonly used to estimate the between‐study variance, is well known to have suboptimal statistical properties in circumstances where there are few studies and the underlying statistical heterogeneity is large. 21 , 34 The REML estimator has been proposed as an alternative because it has been shown to yield less biased estimates of the between‐study variance compared to DL. 12 The WT method, a commonly used method for calculating confidence intervals, has been shown to yield less than nominal coverage levels when there are few studies or when the underlying between‐study variance is large, or both. 32 The HKSJ method has been shown to yield wider confidence intervals than the WT method, although it may yield narrower confidence intervals when the number of included studies is small or the true between‐study variance is small. 35
While ITS analysis methods have been evaluated at the individual study level, 17 , 22 and the meta‐analysis methods have been evaluated generally, 31 , 32 , 36 , 37 neither has been evaluated in the context of multiple ITS studies. This context necessitates consideration of both individual study (re‐)analysis and meta‐analysis simultaneously. Hence, in this simulation study, we aimed to examine the performance of different univariate meta‐analysis methods to combine results from ITS studies, and how characteristics of the meta‐analysis, ITS design, and ITS analysis methods, modify the performance. Specifically, we examined how the performance was altered when the magnitude of first order [also known as lag‐1 or AR (1)] autocorrelation, series length, degree of heterogeneity in the interruption effects and number of included ITS studies were varied. We limited the meta‐analyses examined to those that included ITS studies with continuous outcomes, a fixed number of data points, an equal number of data points pre‐ and post‐interruption, and the same pre‐interruption level and slope. We did not consider scenarios or statistical methods that include control series. We begin by describing an illustrative example, to which we later return to demonstrate the impact of applying the methods evaluated in the simulation study.
1.1. Illustrative example
Lane et al undertook a study to examine the effects of cannabis legislation on traffic fatalities. 38 Using routinely collected traffic fatality data from 11 states in the United States of America, they examined whether legalising recreational cannabis had an impact on traffic fatality rates in legalising and neighbouring states. The outcome was monthly traffic fatality rates per million residents. The study also included 19 location‐based control states which had not legalised recreational cannabis and were not neighbours of the legalising states; these are ignored for this illustrative example. Each series had 96 monthly datapoints and there were 11 series available for meta‐analysis. Fitting segmented linear regression models to each state's time series allows estimation of the impact of the legislation immediately (level‐change) as well as any change in trend (slope‐change; Figure 1a). These effect estimates can then be combined using meta‐analysis methods, which also allows investigation of the consistency of the effects across the series (in this example, states). In Section 2, we describe several statistical methods available for estimating the interruption effects for ITS studies and for meta‐analysing the resulting estimates. In Section 3.8, we apply these methods to this example and compare the results.
2. METHODS
This simulation study was designed according to the “ADEMP” structure proposed by Morris et al. 39 The background and Aims were described above, while in subsequent sections we outline the Data generation mechanisms (Sections 2.1 and 2.2.1), the Estimands (and their estimation procedures of interest, Section 2.2.2), Methods and Performance measures (Sections 2.2.3, 2.2.6).
2.1. Statistical models
2.1.1. Statistical model for an ITS study
An ITS with a single interruption is commonly modelled using segmented linear regression as follows 1 :
$$Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - T_I) D_t + \varepsilon_t \qquad (1)$$

$Y_t$ is a continuous outcome at time $t$ and the interruption time is indicated by $T_I$. $D_t$ is an indicator variable that represents the post‐interruption period ($D_t = 1$ if $t \ge T_I$, and $D_t = 0$ otherwise). $\beta_0$ represents the intercept in the pre‐interruption period, $\beta_1$ the pre‐interruption slope, and $\beta_2$ and $\beta_3$ represent the interruption effects; respectively, change in level and change in slope. The error term, $\varepsilon_t$, is constructed from two components ($\varepsilon_t = \rho\,\varepsilon_{t-1} + w_t$). The first, $\rho$ ($|\rho| < 1$), represents the degree of the correlation between the error at time $t$ and the error of the previous time point, and the second, $w_t$, represents ‘white noise’, which is assumed to be normally distributed ($w_t \sim N(0, \sigma^2)$). Here, the error term accommodates lag‐1 (AR (1)) autocorrelation, but can be extended to accommodate longer lags.
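As a minimal sketch of how data might be generated from a segmented linear model with lag‐1 autocorrelated errors, the following Python function simulates a single series with an equal number of pre‐ and post‐interruption points. The function name, the parameter names (`beta0` to `beta3`, `rho`, `sigma`) and the white‐noise standard deviation of 1 are illustrative assumptions, not values or code taken from the study.

```python
import random


def simulate_its(n_points, beta0=0.0, beta1=0.0, beta2=1.0, beta3=0.1,
                 rho=0.4, sigma=1.0, seed=None):
    """Simulate one ITS from a segmented linear model with AR(1) errors.

    Returns a list of (t, d, y) tuples. sigma, the white-noise standard
    deviation, is an assumed value chosen only for illustration.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    t_int = n_points // 2 + 1  # first post-interruption time point
    # Start the error process from its stationary distribution.
    err = rng.gauss(0.0, sigma / (1.0 - rho ** 2) ** 0.5)
    series = []
    for t in range(1, n_points + 1):
        if t > 1:
            err = rho * err + rng.gauss(0.0, sigma)  # AR(1) recursion
        d = 1 if t >= t_int else 0  # post-interruption indicator
        y = beta0 + beta1 * t + beta2 * d + beta3 * (t - t_int) * d + err
        series.append((t, d, y))
    return series


series = simulate_its(48, rho=0.2, seed=2023)
```

Setting `sigma=0.0` removes the noise, which makes it easy to check that the level change appears at the interruption and that the slope change accumulates afterwards.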
2.1.2. Estimation methods for ITS analysis
There are several estimation methods that can be used to estimate the parameters of the segmented linear regression model. Here, we focus on three estimation methods—OLS, PW and REML.
Ordinary least squares (OLS) estimators, commonly used in practice, 8 , 12 can be used to estimate the regression parameters and their standard errors. 21 A key assumption of OLS is that the model errors are uncorrelated between observations, which may be violated with time series data. In the presence of autocorrelation, estimates of the regression parameters will be unbiased, however, their standard errors may be biased. In the presence of (likely 22 ) positive autocorrelation, they will be too small. 18 , 40
Prais‐Winsten (PW), a generalised least‐squares approach, provides an extension of OLS that allows for lag‐1 autocorrelation (AR (1)). 41 In brief, the estimation procedure involves first fitting the segmented linear regression model (Equation 1) using OLS, from which an estimate of autocorrelation is calculated from the residuals. The data are then transformed using the estimated autocorrelation, aiming to remove the autocorrelation from the errors. The regression parameters are then re‐estimated using OLS. Further iteration of these steps may be required until the estimated autocorrelation converges. 42
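The OLS and Prais‐Winsten steps described above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the software used in the study: the helper names (`solve`, `ols`, `design_matrix`, `prais_winsten`), the lag‐1 regression used to estimate the autocorrelation, the clamping of that estimate to ±0.99 and the numerical guard for a near‐perfect fit are all choices made here for brevity. Standard errors are omitted.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def design_matrix(n_points):
    """Columns: intercept, time, post-interruption indicator, time since interruption."""
    t_int = n_points // 2 + 1
    return [[1.0, float(t), float(t >= t_int), float((t - t_int) * (t >= t_int))]
            for t in range(1, n_points + 1)]


def prais_winsten(X, y, tol=1e-6, max_iter=50):
    """Iterated Prais-Winsten point estimates; returns (coefficients, rho)."""
    n = len(y)
    beta, rho = ols(X, y), 0.0
    for _ in range(max_iter):
        res = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
        denom = sum(e * e for e in res[:-1])
        # Regress each residual on its predecessor; guard against a near-perfect fit.
        new_rho = (sum(res[i] * res[i - 1] for i in range(1, n)) / denom
                   if denom > 1e-12 else 0.0)
        new_rho = max(-0.99, min(0.99, new_rho))  # keep the transform well defined
        c = (1.0 - new_rho ** 2) ** 0.5
        # Prais-Winsten transform: rescale the first row, quasi-difference the rest.
        Xs = [[c * v for v in X[0]]] + [
            [a - new_rho * b for a, b in zip(X[i], X[i - 1])] for i in range(1, n)]
        ys = [c * y[0]] + [y[i] - new_rho * y[i - 1] for i in range(1, n)]
        beta = ols(Xs, ys)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return beta, rho
```

In this sketch the third and fourth coefficients returned correspond to the level change and slope change, given the column order used in `design_matrix`.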
Restricted Maximum Likelihood (REML) estimators can be used to estimate the regression parameters and their standard errors. REML is a form of maximum likelihood estimation in which the log‐likelihood is partitioned into two terms. The first term, comprised of only variance components, is maximised first to obtain estimates of the error variance and correlation parameters, accounting for the appropriate degrees of freedom. The second term, comprised of both regression and error variance parameters, is then maximised using estimates from the first term. By contrast, standard maximum likelihood variance estimators do not appropriately account for the loss in degrees of freedom that results from estimating the regression parameters, which leads to negatively biased variance components for small samples. 25
2.1.3. Statistical models for meta‐analysis
Meta‐analysis may be used to estimate a combined effect from at least two ITS studies. 9 , 10 Here, we focus on the two‐stage meta‐analysis approach. The two most common meta‐analysis models include the fixed‐effect (also known as common‐effect) and random‐effects models.
In a fixed‐effect meta‐analysis model, it is assumed that the included ITS studies estimate a single true (common) interruption effect, and any variability in the observed effects is only due to sampling variability. The model can be specified by:
$$\hat{\beta}_{mk} = \beta_m + e_{mk} \qquad (2)$$

where $\beta_m$ represents the underlying true interruption effect of the $m$th regression parameter from Equation 1, and of interest here are $\beta_2$ (immediate level‐change) and $\beta_3$ (slope‐change); $\hat{\beta}_{mk}$ is an estimate of the $m$th regression parameter from the $k$th ITS study ($k = 1, \ldots, K$), and the error in estimating a particular ITS study $k$'s true effect from a sample of participants, assumed to be normally distributed, is represented by $e_{mk}$ ($e_{mk} \sim N(0, \sigma_{mk}^2)$).
In a random‐effects meta‐analysis model, it is assumed that the true interruption effects follow a (conventionally assumed normal) distribution, and the observed ITS study effects are a random sample from this distribution. 10 The random‐effects model can be specified by:
$$\hat{\beta}_{mk} = \beta_m^{*} + \delta_{mk} + e_{mk} \qquad (3)$$

where $\beta_m^{*}$ represents the mean of the distribution of true interruption effects (where, as above, $m$ represents the regression parameter (and effect measure) of interest); $\delta_{mk}$ represents a random effect that allows a separate interruption effect in the $k$th ITS study, where these effects are assumed to be normally distributed about the average interruption effect ($\beta_m^{*}$), with between‐study variance $\tau_m^2$ (i.e., $\delta_{mk} \sim N(0, \tau_m^2)$); and within‐study error $e_{mk} \sim N(0, \sigma_{mk}^2)$.
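To make the two layers of the random‐effects model concrete, the sketch below draws a study‐specific true effect around a mean effect and then an observed estimate around that true effect. It illustrates the model only; in the simulation study itself the observed estimates arise from analysing simulated series. The function and argument names (`mean_effect`, `tau2`, `within_variances`) are hypothetical.

```python
import random


def simulate_study_estimates(mean_effect, tau2, within_variances, seed=None):
    """Draw one observed effect estimate per study under a random-effects model."""
    rng = random.Random(seed)
    estimates = []
    for variance in within_variances:
        # Between-study layer: study-specific true effect around the mean effect.
        true_effect = rng.gauss(mean_effect, tau2 ** 0.5)
        # Within-study layer: observed estimate around the study's true effect.
        estimates.append(rng.gauss(true_effect, variance ** 0.5))
    return estimates


# Five hypothetical studies with a mean level change of 1 and tau^2 = 0.3^2.
observed = simulate_study_estimates(1.0, 0.3 ** 2, [0.04] * 5, seed=7)
```

Setting `tau2=0.0` reduces the sketch to the fixed‐effect model, in which all studies share one true effect.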
These univariate meta‐analysis models can be extended to allow joint modelling of the regression parameters, known as bivariate meta‐analysis. 43 In fitting separate univariate meta‐analysis models, any within‐study correlation between regression parameters is ignored.
2.1.4. Estimation methods for meta‐analysis
For a given effect measure, $m$, the meta‐analytic effect is estimated as the weighted average of the ITS study effect estimates. For a fixed‐effect model, the estimator for the meta‐analytic effect is $\hat{\beta}_m = \sum_{k} W_{mk}\hat{\beta}_{mk} / \sum_{k} W_{mk}$ (with a variance of $1/\sum_{k} W_{mk}$), where the weight given to the $k$th ITS study is simply the reciprocal of the within‐study variance, $W_{mk} = 1/\hat{\sigma}_{mk}^2$. For a random‐effects model, the same estimator is used, but the weights are modified to incorporate the additional source of between‐study variation, $W_{mk}^{*} = 1/(\hat{\sigma}_{mk}^2 + \hat{\tau}_m^2)$. A common assumption is that the within‐study variances are known, when in practice they are estimated from the observed study data. For large studies, this assumption is generally reasonable; however, for small studies, it can bias the estimates of the model parameters. 31 Different between‐study variance estimators are available, 31 as well as methods to calculate the confidence interval for the meta‐analytic effect. 32 Here we consider two between‐study variance estimators and two confidence interval methods.
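The inverse‐variance weighted average described above can be written in a few lines. This is a minimal sketch with hypothetical names; passing `tau2=0.0` gives fixed‐effect weights, while a positive `tau2` gives random‐effects weights.

```python
def pooled_effect(estimates, variances, tau2=0.0):
    """Inverse-variance weighted average and its variance.

    tau2 = 0 corresponds to fixed-effect weights; tau2 > 0 adds the
    between-study variance to each within-study variance.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, 1.0 / total


fixed = pooled_effect([1.0, 2.0], [1.0, 1.0])             # (1.5, 0.5)
random_fx = pooled_effect([1.0, 2.0], [1.0, 1.0], 1.0)    # (1.5, 1.0)
```

With equal within‐study variances the point estimate is unchanged by `tau2`, but its variance grows, which is why random‐effects intervals are wider when heterogeneity is present.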
Between‐study variance estimators
DerSimonian and Laird (DL) is a moment‐based between‐study variance estimator derived from Cochran's Q‐statistic, 44 chosen for inclusion in this study as it is commonly used 8 and is implemented as the default estimator in many software packages (e.g., RevMan, 45 metan in Stata 46 ). The estimator is given by:
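$$\hat{\tau}_{m,DL}^2 = \max\left\{0,\; \frac{Q - (K - 1)}{\sum_{k} W_{mk} - \sum_{k} W_{mk}^2 / \sum_{k} W_{mk}}\right\} \qquad (4)$$

where the weights $W_{mk} = 1/\hat{\sigma}_{mk}^2$ are from a fixed‐effect meta‐analysis model, and $Q$ is calculated based on the fixed‐effect meta‐analysis estimate $\hat{\beta}_m$,

$$Q = \sum_{k=1}^{K} W_{mk}\left(\hat{\beta}_{mk} - \hat{\beta}_m\right)^2 \qquad (5)$$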
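A short sketch of the DerSimonian and Laird calculation may help fix ideas: it computes the fixed‐effect estimate, Cochran's Q, and then the truncated moment estimator. The function name and return format are illustrative choices.

```python
def dl_tau2(estimates, variances):
    """DerSimonian-Laird between-study variance; returns (tau2, Q)."""
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    scale = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (k - 1)) / scale), q


tau2, q = dl_tau2([0.0, 2.0], [1.0, 1.0])  # tau2 = 1.0, Q = 2.0
```

When Q falls below its degrees of freedom (K − 1), the estimate is truncated at zero, so the random‐effects analysis then coincides with the fixed‐effect analysis.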
An alternative between‐study variance estimator can be derived using REML, 31 chosen for inclusion in this study as it has been recommended as a preferable estimator compared with DL. 31 , 47 , 48 The estimator is given by:
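$$\hat{\tau}_{m,REML}^2 = \max\left\{0,\; \frac{\sum_{k} \left(W_{mk}^{*}\right)^2 \left[\left(\hat{\beta}_{mk} - \hat{\beta}_m^{*}\right)^2 - \hat{\sigma}_{mk}^2\right]}{\sum_{k} \left(W_{mk}^{*}\right)^2} + \frac{1}{\sum_{k} W_{mk}^{*}}\right\} \qquad (6)$$

where $W_{mk}^{*} = 1/(\hat{\sigma}_{mk}^2 + \hat{\tau}_{m,REML}^2)$ and $\hat{\beta}_m^{*}$ is the random‐effects meta‐analytic estimate. The estimate of the between‐study variance is calculated through a process of iteration, whereby the initial value of $\hat{\tau}_m^2$ is the maximum likelihood estimate, from which initial weights and an initial meta‐analytic estimate are computed, then $\hat{\tau}_{m,REML}^2$ is updated and the process repeated until convergence. However, the algorithm can occasionally fail to converge.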
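The iterative calculation can be sketched as a simple fixed‐point loop that recomputes the weights and pooled estimate from the current between‐study variance and then updates the variance. This is an assumption‐laden illustration rather than the algorithm used by any particular package: it starts from zero instead of the maximum likelihood estimate, uses the standard REML updating formula for random‐effects meta‐analysis, requires strictly positive within‐study variances, and reports non‐convergence through a flag.

```python
def reml_tau2(estimates, variances, start=0.0, tol=1e-10, max_iter=500):
    """Fixed-point iteration for the REML between-study variance.

    Returns (tau2, converged). The starting value of 0 is a simplification.
    """
    tau2 = start
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in variances]  # random-effects weights
        sw = sum(w)
        pooled = sum(wi * b for wi, b in zip(w, estimates)) / sw
        sw2 = sum(wi ** 2 for wi in w)
        num = sum(wi ** 2 * ((b - pooled) ** 2 - v)
                  for wi, b, v in zip(w, estimates, variances))
        new_tau2 = max(0.0, num / sw2 + 1.0 / sw)  # truncate at zero
        if abs(new_tau2 - tau2) < tol:
            return new_tau2, True
        tau2 = new_tau2
    return tau2, False  # did not converge within max_iter


tau2, converged = reml_tau2([0.0, 2.0], [1.0, 1.0])
```

For two studies with equal within‐study variances of 1 and estimates of 0 and 2, the iteration settles at a between‐study variance of approximately 1, matching the sample variance of the estimates minus the within‐study variance.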
Confidence interval calculation
A range of confidence interval methods for the meta‐analytic (summary) estimate are available. 32 The two outlined here can be used with both the DL and REML between‐study variance estimators.
The method chosen for inclusion in this study for its wide use in practice, 8 , 34 , 44 the Wald‐type normal distribution (WT) confidence interval, 49 is calculated as:
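$$\hat{\beta}_m^{*} \pm z_{1-\alpha/2} \sqrt{\frac{1}{\sum_{k} W_{mk}^{*}}} \qquad (7)$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution (note that $W_{mk}^{*}$ is replaced with $W_{mk}$ for a fixed‐effect meta‐analysis). This method assumes $\hat{\beta}_m^{*}$ is normally distributed, even though the within‐study and between‐study variances are estimated rather than known. 32 , 50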
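Hartung and Knapp, and independently, Sidik and Jonkman, [henceforth referred to as the Hartung‐Knapp/Sidik‐Jonkman (HKSJ)] 51 derived an alternative confidence interval method in an attempt to deal with meta‐analyses with few studies, selected here for its better performance when there are few included studies. 33 , 52 , 53 , 54 Rather than assuming normality of $\hat{\beta}_m^{*}$, the method assumes the t‐distribution (with K‐1 degrees of freedom), and includes a small sample standard error adjustment, q, and is calculated as:

$$\hat{\beta}_m^{*} \pm t_{K-1,\,1-\alpha/2} \sqrt{q\,\widehat{\mathrm{Var}}\left(\hat{\beta}_m^{*}\right)} \qquad (8)$$

where

$$q = \frac{1}{K-1} \sum_{k=1}^{K} W_{mk}^{*}\left(\hat{\beta}_{mk} - \hat{\beta}_m^{*}\right)^2 \qquad (9)$$

and

$$\widehat{\mathrm{Var}}\left(\hat{\beta}_m^{*}\right) = \frac{1}{\sum_{k} W_{mk}^{*}} \qquad (10)$$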
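The two interval methods can be compared side by side with a small sketch. The function names are hypothetical. Because the Python standard library has no t‐distribution quantile function, the HKSJ function below takes the critical value as an argument supplied by the caller (for example, roughly 4.303 for a 95% interval with three studies, i.e., two degrees of freedom).

```python
from statistics import NormalDist


def _pooled(estimates, variances, tau2):
    """Random-effects weights, pooled estimate and sum of weights."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    return w, sum(wi * b for wi, b in zip(w, estimates)) / sw, sw


def wald_ci(estimates, variances, tau2, level=0.95):
    """Wald-type interval based on the standard normal distribution."""
    _, est, sw = _pooled(estimates, variances, tau2)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    se = (1.0 / sw) ** 0.5
    return est - z * se, est + z * se


def hksj_ci(estimates, variances, tau2, t_crit):
    """HKSJ interval; t_crit is the t quantile with K - 1 degrees of freedom."""
    k = len(estimates)
    w, est, sw = _pooled(estimates, variances, tau2)
    # Small-sample adjustment factor applied to the usual variance.
    q = sum(wi * (b - est) ** 2 for wi, b in zip(w, estimates)) / (k - 1)
    se = (q / sw) ** 0.5
    return est - t_crit * se, est + t_crit * se


effects, within = [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]
wt = wald_ci(effects, within, tau2=0.0)
hk = hksj_ci(effects, within, tau2=0.0, t_crit=4.303)
```

In this three‐study example the adjustment factor equals 1, so the wider HKSJ interval is driven entirely by the t quantile replacing the normal quantile.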
2.2. Simulation methods
Before providing full details, we briefly outline our simulation approach. We generated ITS studies, analysed these using segmented linear regression using two estimation methods (Section 2.1.2) and meta‐analysed the resulting level‐change and slope‐change effect estimates using a fixed‐effect and multiple random‐effects meta‐analysis methods (Section 2.1.4). The ITS studies and meta‐analyses were generated using a range of design parameters (e.g., varying levels of autocorrelation, varying number of studies per meta‐analysis). These design parameters were combined using a fully factorial approach (1620 simulation scenarios), with 1000 replicate meta‐analyses generated per scenario. Various criteria (e.g., bias, 95% confidence interval coverage, Appendix 2) were used to assess the performance of the meta‐analysis methods.
2.2.1. Data generation
The design parameters, which were informed by reviews of ITS studies 12 , 17 , 22 and of meta‐analyses of ITS studies, 8 are provided in Table 1. For each combination of these parameters, ITS studies with continuous outcomes were generated by randomly sampling from the model in Equation 1. We limited our focus to continuous outcomes only, as more research is required to understand the statistical performance of ITS analysis methods for binary (proportion), count or rate outcomes (which are common in ITS studies 8 , 12 ) prior to their assessment in combination with meta‐analysis methods.
TABLE 1.
Design parameters used in the simulation study.
| Parameter | Symbol | Values |
|---|---|---|
| Interrupted time series characteristics | | |
| Series length | | 12, 48, 100 |
| Interruption time point | $T_I$ | Midpoint of the series (equal number of datapoints pre‐ and post‐interruption) |
| Intercept | $\beta_0$ | 0 |
| Pre‐interruption slope | $\beta_1$ | 0 |
| Immediate level‐change | $\beta_2$ | 0, 1 |
| Slope‐change post‐interruption | $\beta_3$ | 0, 0.1 |
| Autocorrelation coefficient, fixed | $\rho$ | 0, 0.2, 0.4, 0.6 |
| Autocorrelation coefficient, variable a | $\rho$ | $N(0.4, 0.15^2)$ |
| Meta‐analysis characteristics | | |
| Number of ITS studies per meta‐analysis | $K$ | 3, 5, 20 |
| Between‐study variance of the level‐change | $\tau_2^2$ | 0, $0.1^2$, $0.3^2$ |
| Between‐study variance of the slope‐change | $\tau_3^2$ | 0, $0.01^2$, $0.05^2$ |

a The autocorrelation coefficient was resampled when the selected value fell outside the allowable values.
Models were constructed assuming level‐changes ($\beta_2$) of 0 and 1, and slope‐changes ($\beta_3$) of 0 and 0.1. We limited the number of level‐ and slope‐changes investigated based on findings of a simulation study 17 that demonstrate that the magnitude of these parameters did not impact the performance (across a range of metrics, excluding power) for the ITS estimation methods considered (Section 2.1.2). Lag‐1 autocorrelation was varied between fixed values of 0 and 0.6 in increments of 0.2, and values drawn from a distribution $N(0.4, 0.15^2)$. The values of autocorrelation were selected to reflect magnitudes of autocorrelation seen in real‐world data sets (a median of 0.21 across 181 ITS studies, IQR: −0.01 to 0.56). 22 We did not select negative values of autocorrelation, because although estimated in real‐world datasets, this is more likely to be due to sampling error rather than truly negative autocorrelation. Sampling autocorrelation from a distribution aimed to reflect the likely scenario that autocorrelation will vary across the ITS studies.
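Drawing a study‐specific autocorrelation with resampling can be sketched as a simple rejection loop. The mean of 0.4 and standard deviation of 0.15 come from the design parameters; the bounds used here (0 inclusive to 1 exclusive) are an assumption made purely for illustration, since the allowable range is not reproduced in this text, and the function name is hypothetical.

```python
import random


def draw_autocorrelation(rng, mean=0.4, sd=0.15, lower=0.0, upper=1.0):
    """Draw rho from a normal distribution, redrawing values outside [lower, upper).

    The bounds are illustrative assumptions, not the study's stated limits.
    """
    while True:
        rho = rng.gauss(mean, sd)
        if lower <= rho < upper:
            return rho


rng = random.Random(2023)
rhos = [draw_autocorrelation(rng) for _ in range(5)]  # one value per ITS study
```

Passing different `lower` and `upper` values adapts the sketch to whichever range is considered allowable.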
Three different series lengths were generated (12, 48 and 100 datapoints) to reflect lengths seen in real‐world data sets (median of 48 datapoints per series, IQR: 23–157 12 ), and for which the performance of the ITS analysis methods (OLS and REML) has been shown to differ. 17 , 22 We restricted our examination to an equal number of datapoints pre‐ and post‐interruption, both because the impact of unequal numbers on the ITS analysis results is not well understood and to maintain a manageable number of simulation scenarios.
Each meta‐analysis was comprised of multiple ITS studies, and therefore the data generation process involved generating data for each of the multiple ITS studies within each meta‐analysis. Meta‐analyses were generated with 3, 5 or 20 included ITS studies, to reflect the size of meta‐analyses seen in real‐world datasets [a median of 7 ITS (IQR: 6–10)], 55 and to include scenarios where estimation of between‐study variance is likely to be suboptimal (few studies, 3 ITS) and likely to be more accurate (many studies, 20 ITS). Furthermore, to incorporate heterogeneity, level changes of the ITS studies within each meta‐analysis were generated with a between‐study variance of 0, $0.1^2$ or $0.3^2$, and the slope changes with a between‐study variance of 0, $0.01^2$ or $0.05^2$. 17 , 28
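As a check on the size of the fully factorial design, the design parameter values can be crossed programmatically. The dictionary keys below are illustrative labels, and the string `"variable"` stands in for the scenario in which autocorrelation is sampled from a distribution.

```python
from itertools import product

design = {
    "series_length": [12, 48, 100],
    "level_change": [0, 1],
    "slope_change": [0, 0.1],
    "autocorrelation": [0, 0.2, 0.4, 0.6, "variable"],
    "n_studies": [3, 5, 20],
    "tau2_level": [0, 0.1 ** 2, 0.3 ** 2],
    "tau2_slope": [0, 0.01 ** 2, 0.05 ** 2],
}

# One dictionary per scenario, covering every combination of parameter values.
scenarios = [dict(zip(design, combo)) for combo in product(*design.values())]
print(len(scenarios))  # 1620
```

The count follows from 3 × 2 × 2 × 5 × 3 × 3 × 3 = 1620, which matches the number of simulation scenarios reported.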
2.2.2. Estimands and other targets
The estimands of interest in this simulation study were the meta‐analytic effects for the immediate level‐change and slope‐change parameters from Equation 1 (fixed‐effect $\beta_2$ and $\beta_3$, and random‐effects $\beta_2^{*}$ and $\beta_3^{*}$), and the between‐study variance for both parameters ($\tau_2^2$ and $\tau_3^2$).
2.2.3. Statistical methods to analyse ITS studies
The ITS studies were analysed with both OLS and REML (Figure 2). When REML failed to converge, we used PW, and if this failed, we used OLS. The choice of ITS estimation methods was based on methods that are commonly used and those shown to have better performance. 8 , 12 , 17 , 22 We did not include autoregressive integrated moving average (ARIMA), despite being commonly used, 12 , 56 as ARIMA and REML have been shown to perform similarly (in terms of confidence interval coverage) for analysing single ITS studies. 17 Furthermore, REML yielded less biased estimates of lag‐1 autocorrelation.
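The order of preference among estimation methods can be expressed as a small fallback loop. This sketch assumes, purely for illustration, that a method signals non‐convergence by raising `RuntimeError`; the fitting functions shown are stand‐ins and do not implement REML, PW or OLS.

```python
def fit_with_fallback(series, estimators):
    """Try each (name, fit) pair in order and return the first successful fit."""
    for name, fit in estimators:
        try:
            return name, fit(series)
        except RuntimeError:  # assumed signal for non-convergence
            continue
    raise RuntimeError("no estimation method produced a fit")


def failing_fit(series):
    """Stand-in for a method that fails to converge."""
    raise RuntimeError("did not converge")


def placeholder_fit(series):
    """Stand-in for a method that always returns a result (here, the mean)."""
    return sum(series) / len(series)


method, result = fit_with_fallback(
    [1.0, 2.0, 3.0],
    [("REML", failing_fit), ("PW", failing_fit), ("OLS", placeholder_fit)],
)
```

Returning the name of the method that succeeded makes it straightforward to tabulate how often each fallback was needed.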
FIGURE 2.

Simulation procedure and analysis methods. *The estimation methods for ITS analysis are listed in order of preference, that is, REML is used whenever it converges, while PW followed by OLS are used in the case of non‐convergence. DL, DerSimonian and Laird; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; OLS, ordinary least squares; PW, Prais‐Winsten; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
2.2.4. Statistical methods for meta‐analysis
Level‐change and slope‐change effect estimates were combined assuming both fixed‐effect and random‐effects univariate models (Figure 2). We did not examine the performance of bivariate meta‐analysis models because our analysis of real‐world ITS datasets yielded negligible correlation between estimates of immediate level‐ and slope‐change (n = 473, correlation = −0.01, 95% CI: −0.03 to 0.01; further details available from the corresponding author). In this circumstance, bivariate meta‐analysis would not lead to improved precision of the regression parameters. For random‐effects meta‐analysis, we examined two between‐study variance estimators (DL and REML) and two confidence interval methods (WT and HKSJ) (Section 2.1.4). All meta‐analyses were implemented using meta in Stata version 16.1. 57
2.2.5. Performance measures
The performance of the meta‐analysis methods was assessed by examining bias, 95% confidence interval coverage, empirical standard error, model‐based standard error, type I error rate, and statistical power (see Table S1 for definitions). For each of the 1620 combinations of design parameters (scenarios), we used 1000 replicates to keep the Monte Carlo Standard Error (MCSE) for confidence interval coverage and type I error rate below 0.7%. 39 The non‐convergence rates of the REML and PW estimation methods were tabulated.
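The Monte Carlo standard error of an estimated proportion such as coverage follows the usual binomial formula, which can be used to confirm the stated bound. The sketch below also shows how bias and coverage might be computed from replicate results; the function names are illustrative and the definitions are the standard ones, which may differ in detail from those in Table S1.

```python
def mcse_proportion(p, n_rep):
    """Monte Carlo standard error of an estimated proportion (e.g., coverage)."""
    return (p * (1.0 - p) / n_rep) ** 0.5


def coverage(intervals, truth):
    """Proportion of (lower, upper) intervals that contain the true value."""
    return sum(lo <= truth <= hi for lo, hi in intervals) / len(intervals)


def bias(estimates, truth):
    """Mean of the estimates minus the true value."""
    return sum(estimates) / len(estimates) - truth


print(round(100 * mcse_proportion(0.95, 1000), 2))  # 0.69 (per cent)
```

With 1000 replicates and true coverage of 95%, the MCSE is about 0.69%, consistent with the bound of 0.7%.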
2.2.6. Simulation procedures
Prior to running the simulations, data generation mechanisms were checked by initially simulating series with 100,000 datapoints and meta‐analyses with 50 included studies, to ensure the estimated level‐change, slope‐change, autocorrelation and estimates of heterogeneity matched their input values, that is, that these estimators were all consistent in the statistical sense in large samples. Scatter and box plots were used to visualise the performance for all metrics.
The simulation was conducted using Stata version 16.1 57 and results were visualised using R version 4.1.0 (dplyr, 58 foreign, 59 ggplot2 60 ). All code for generating and analysing the simulated data is available in the Monash University repository known as Bridges. 61
3. RESULTS
For simplicity of presentation, we restrict the descriptions of our findings to a limited number of the simulation scenarios, meta‐analysis methods and ITS effect measures. We focus on scenarios in which the data were generated with an immediate level‐change ($\beta_2$) of 1, a slope‐change ($\beta_3$) of 0.1, and where autocorrelation was fixed at specific magnitudes; autocorrelation variability had no impact on performance (Appendix 3.7). The simulation performance measures (aside from power) were not impacted by combinations of level‐ and slope‐changes. Furthermore, given minimal difference in performance between the random‐effects methods with DL and REML between‐study variance estimators, we restrict presentation of findings to (i) DL between‐study variance estimator with WT confidence intervals (DL + WT) and (ii) REML between‐study variance estimator with HKSJ confidence intervals (REML+HKSJ), with the former representing the most common method combination. Results from all random‐effects methods are presented in Appendix 3. Finally, we focus on the performance of the meta‐analysis methods for combining immediate level‐change estimates, given the patterns for slope‐change were similar across the scenarios (see Appendix 4 for all findings).
3.1. Bias of meta‐analytic level‐change
All meta‐analysis method and ITS analysis combinations yielded approximately unbiased estimates of meta‐analytic level‐change across all simulated scenarios (Appendix 3.2).
3.2. Confidence interval coverage of meta‐analytic level‐change
When data were generated under a fixed‐effect model (i.e., no underlying level‐change heterogeneity), fixed‐effect meta‐analysis of level‐change effects estimated from OLS ITS yielded coverage less than the nominal 95% level, except when autocorrelation was zero and the number of datapoints was 48 or greater (Figure 3a). Coverage decreased further with increasing autocorrelation. When the number of ITS datapoints was 12 (Figure 3b), fixed‐effect meta‐analysis of level‐change effects estimated from REML ITS yielded coverage less than the nominal 95% level, and coverage was notably lower than when the ITS were analysed using OLS. However, when the number of datapoints was greater than 12, fixed‐effect meta‐analysis of REML ITS level‐change estimates reached coverage close to the nominal 95% level. Random‐effects meta‐analysis (irrespective of the method) of level‐change effects estimated from OLS ITS or REML ITS yielded coverage close to the nominal 95% level. Exceptions to this occurred when the number of included ITS studies was small (i.e., ≤5) and the estimates were meta‐analysed with DL + WT; in this circumstance, coverage for some scenarios (e.g., increasing magnitude of autocorrelation and OLS ITS) was less than the nominal 95% level (but still at least 83%).
FIGURE 3.

Plots of 95% confidence interval coverage of immediate level‐change (y‐axis) when the data were generated under a fixed‐effect model and the ITS studies were analysed with OLS (A) and REML (B) using fixed‐effect (red circles), DL + WT (green diamonds) and REML+HKSJ (blue crosses) meta‐analysis methods versus autocorrelation (x‐axis). Plots are presented separately by combinations of the number of included studies (rows) and the number of datapoints (columns). The solid red line depicts the nominal 95% coverage level. Simulation scenarios presented include a level‐change of 1, level‐change between‐study heterogeneity of 0, slope‐change of 0.1, slope‐change between‐study heterogeneity of 0, and autocorrelation constant across included ITS studies. CI, confidence interval; DL, DerSimonian and Laird; dps, datapoints; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
When data were generated under a random‐effects model (Figure 4), random‐effects meta‐analysis of level‐change effects estimated from OLS ITS or REML ITS yielded coverage close to the nominal 95% level, irrespective of the meta‐analysis method used and the magnitude of heterogeneity. The only exceptions to this were DL + WT meta‐analysis of REML ITS when the number of included ITS studies was small (i.e., ≤5) and the number of ITS datapoints was 12.
FIGURE 4.

Plots of 95% confidence interval coverage of immediate level‐change (y‐axis) when the ITS studies were analysed with OLS (A) and REML (B) using fixed‐effect (red circles), DL + WT (green diamonds) and REML+HKSJ (blue crosses) meta‐analysis methods versus level‐change heterogeneity (x‐axis). Plots are presented separately by combinations of the number of included studies (rows) and number of datapoints (columns). The solid red line depicts the nominal 95% coverage level. Simulation scenarios presented include a level‐change of 1, slope‐change of 0.1, slope‐change between‐study heterogeneity of 0, and autocorrelation of 0. CI, confidence interval; DL, DerSimonian and Laird; dps, datapoints; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
3.3. Standard errors of meta‐analytic level‐change
When data were generated under a fixed‐effect model, fixed‐effect meta‐analysis of level‐change effects estimated from OLS ITS yielded model‐based standard errors that were smaller than empirical standard errors (i.e., the ratio was less than one), except when autocorrelation was zero and the number of datapoints was 48 or greater (Figure 5a). The underestimation was exacerbated by increasing autocorrelation. Fixed‐effect meta‐analysis of level‐change effects estimated from REML ITS yielded model‐based standard errors that were smaller than the empirical standard errors when the number of ITS datapoints was 12 (Figure 5b) (ratios ranging from 0.37 to 0.64), and the ratio was importantly less than when the ITS were analysed using OLS (ratios ranging from 0.54 to 0.83). However, when the number of datapoints was greater than 12, fixed‐effect meta‐analysis of REML ITS level‐change estimates yielded model‐based standard errors that were generally similar to the empirical standard errors (ratios ranging from 0.77 to 0.97). Random‐effects meta‐analysis (irrespective of the method) of level‐change effects estimated from OLS ITS or REML ITS yielded model‐based standard errors similar to the empirical standard errors (i.e., all ratios were generally close to one, with ratios ranging from 0.83 to 1.14 for OLS ITS and 0.81 to 1.13 for REML ITS).
FIGURE 5.

Plots of the ratio of model‐based standard error (modSE) to the empirical standard error (empSE) of immediate level‐change (y‐axis) when the data were generated under a fixed‐effect model and the ITS studies were analysed with OLS (A) and REML (B) using fixed‐effect (red circles), DL + WT (green diamonds) and REML+HKSJ (blue crosses) meta‐analysis methods versus autocorrelation (x‐axis). Plots are presented separately by the number of included studies (rows) and number of datapoints (columns). The solid red line depicts a ratio of one, where the model‐based standard error and empirical standard error are equal, indicating that the model‐based standard error accurately estimates the true standard error. Simulation scenarios include a level‐change of 1, level‐change heterogeneity of 0, slope‐change of 0.1, slope‐change heterogeneity of 0, and fixed autocorrelation. CI, confidence interval; DL, DerSimonian and Laird; dps, datapoints; empSE, empirical standard error; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; modSE, model‐based standard error; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
When data were generated under a random‐effects model (Figure 6), random‐effects meta‐analysis (irrespective of the method) of level‐change effects estimated from OLS ITS or REML ITS yielded ratios close to one (ratios ranging from 0.82 to 1.14 for OLS ITS and 0.79 to 1.13 for REML ITS) (i.e., appropriate estimation of standard errors in the presence of heterogeneity).
FIGURE 6.

Plots of the ratio of model‐based standard error (modSE) to the empirical standard error (empSE) of immediate level‐change (y‐axis) when the ITS studies were analysed with OLS (A) and REML (B) using fixed‐effect (red circles), DL + WT (green diamonds) and REML+HKSJ (blue crosses) meta‐analysis methods versus level‐change heterogeneity (x‐axis). Plots are presented separately by combinations of the number of included studies (rows) and number of datapoints (columns). The solid red line depicts a ratio of one, where the model‐based standard error and empirical standard error are equal, indicating that the model‐based standard error accurately estimates the true standard error. Simulation scenarios include a level‐change of 1, slope‐change of 0.1, slope‐change heterogeneity of 0, and autocorrelation of 0. CI, confidence interval; DL, DerSimonian and Laird; dps, datapoints; empSE, empirical standard error; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; modSE, model‐based standard error; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
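The ratio plotted in Figures 5 and 6 can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since the formulae in Appendix S2 are not reproduced in this section: the empirical standard error is taken as the sample standard deviation of the replicate estimates, and the model‐based standard error as the square root of the mean squared reported standard error. The function name `se_ratio` and the toy replicates are hypothetical.

```python
import math
import statistics


def se_ratio(estimates, model_ses):
    """Ratio of model-based SE to empirical SE across simulation replicates.

    Assumed definitions (not confirmed against Appendix S2):
      empSE = sample standard deviation of the replicate estimates
      modSE = square root of the mean squared model-based standard error
    """
    if len(estimates) != len(model_ses) or len(estimates) < 2:
        raise ValueError("need at least two replicates of equal length")
    emp_se = statistics.stdev(estimates)
    mod_se = math.sqrt(statistics.fmean([se ** 2 for se in model_ses]))
    return mod_se / emp_se


# Toy replicates: the estimates scatter more than the reported SEs suggest,
# so the ratio falls below one (standard errors are underestimated).
estimates = [0.4, 1.6, 1.0, 0.2, 1.8]
model_ses = [0.35, 0.35, 0.35, 0.35, 0.35]
ratio = se_ratio(estimates, model_ses)
print(f"modSE/empSE = {ratio:.3f}")
```

A ratio below one corresponds to the pattern described for fixed‐effect meta‐analysis of OLS ITS estimates in the presence of autocorrelation.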
3.4. Statistical power to detect a level‐change
To avoid misleading interpretations of statistical power, we limit presentation of results to only scenarios in which coverage was at least 90%, acknowledging that with this threshold, there will be some artificial inflation of power when coverage is less than the nominal 95% level (due to the inflated type I error rate, i.e., 100% minus coverage). When the number of ITS was large (i.e., 20), power was reasonable, irrespective of the number of datapoints, ITS analysis method, or meta‐analysis method (Figure 7). When there were few ITS studies per meta‐analysis (i.e., 5 or fewer), power was importantly reduced with a decreasing number of datapoints and with increasing autocorrelation. Furthermore, power was affected by the meta‐analysis method used, with REML+HKSJ yielding less power than DL + WT.
FIGURE 7.

Plots of statistical power (the percentage of simulations that have a 95% confidence interval that did not include zero) of immediate level‐change (y‐axis) when the data were generated under a fixed‐effect model and the ITS studies were analysed with OLS (A) and REML (B) using fixed‐effect (red circles), DL + WT (green diamonds) and REML+HKSJ (blue crosses) meta‐analysis methods versus autocorrelation (x‐axis). Plots are presented separately by the number of included studies (rows) and number of datapoints (columns). Simulation scenarios include a level‐change of 1, level‐change heterogeneity of 0, slope‐change of 0.1, slope‐change heterogeneity of 0, and fixed autocorrelation. Only scenarios with a confidence interval coverage greater than 90% have been plotted. CI, confidence interval; DL, DerSimonian and Laird; dps, datapoints; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
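The two interval‐based performance measures reported in Sections 3.2 and 3.4 can be sketched as follows. Coverage is taken as the percentage of simulated 95% confidence intervals that contain the true effect, and power as the percentage that exclude zero (as defined in the caption of Figure 7). The function names and the toy intervals are hypothetical and are not taken from the study code in Appendix S5.

```python
def coverage_percent(lower, upper, true_effect):
    """Percentage of confidence intervals that contain the true effect."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("lower and upper must be non-empty and of equal length")
    hits = sum(lo <= true_effect <= hi for lo, hi in zip(lower, upper))
    return 100.0 * hits / len(lower)


def power_percent(lower, upper):
    """Percentage of confidence intervals that do not include zero."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("lower and upper must be non-empty and of equal length")
    excludes_zero = sum(lo > 0 or hi < 0 for lo, hi in zip(lower, upper))
    return 100.0 * excludes_zero / len(lower)


# Toy example: four simulated intervals for a true level-change of 1.
lower = [0.2, 0.5, 1.1, -0.3]
upper = [1.4, 1.6, 2.0, 0.9]
cov = coverage_percent(lower, upper, true_effect=1.0)
pwr = power_percent(lower, upper)
print(f"coverage = {cov:.0f}%, power = {pwr:.0f}%")
```

In this toy example the third interval excludes zero but also misses the true effect, which illustrates why power is difficult to interpret when coverage is below the nominal level.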
3.5. Convergence of estimation methods
When analysing the ITS datasets using REML, if the analysis failed to converge, we used PW, followed by OLS. Of the 15,120,000 ITS studies analysed using REML, 6% (899,970) failed to converge. When analysing these 899,970 ITS studies using PW, all converged, precluding the need to use OLS. Non‐convergence of REML was more common for short series (17.21% for 12 datapoints, 0.57% for 48 datapoints, and 0.08% for 100 datapoints). However, performance across all measures was similar when comparing ITS studies analysed using REML with those analysed using PW (results shown for some simulation scenarios for coverage, Appendix 3.9). Among the 3,240,000 meta‐analyses of level‐change performed using REML, 2161 (0.07%) failed to converge.
3.6. Estimation of heterogeneity
The magnitude of heterogeneity was overestimated in most scenarios by both REML and DL estimators. The DL estimates of heterogeneity were comparable with those from REML (Appendix 3.8). In scenarios where the ITS study standard errors were underestimated [i.e., when there were a small number of datapoints (i.e., 12 datapoints), or when autocorrelation was present but not accounted for in the analysis (i.e., OLS)], the between‐study variance was overestimated (Figure 8). As autocorrelation increased, the overestimation of heterogeneity when analysed with OLS increased, and this relationship was not modified by the number of included ITS studies. When there was no underlying heterogeneity, often the estimated between‐study variance was greater than zero (in 25,850/45,000, 57%, in scenarios with 3 studies, 31,055/45,000, 69%, with 5 studies and 39,859/45,000, 89%, with 20 studies, when analysed with OLS).
FIGURE 8.

Plots of level‐change heterogeneity estimated using random‐effects meta‐analysis with the REML between‐study variance estimator (y‐axis) when the (A) 3 ITS, (B) 5 ITS and (C) 20 ITS studies were analysed with OLS (orange) and REML (purple) ITS analysis methods. Plots are presented separately by combinations of the true level‐change heterogeneity (rows) and the number of datapoints (columns). The solid red lines indicate the true level‐change heterogeneity. Simulation scenarios presented include a level‐change of 1, slope‐change of 0.1, slope‐change heterogeneity of 0, and fixed levels of autocorrelation. CI, confidence interval; DL, DerSimonian and Laird; dps, datapoints; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type. [Colour figure can be viewed at wileyonlinelibrary.com]
3.7. Performance in estimating the meta‐analytic slope‐change
All meta‐analysis and ITS analysis method combinations yielded approximately unbiased estimates of meta‐analytic slope‐change in all simulated scenarios. The patterns observed for slope‐change reflected those of level‐change, although the specific performance values differed for coverage, empirical standard error, and power (Appendix 3). Furthermore, the patterns of between‐study variance overestimation were also observed when estimating between‐study variance in meta‐analyses of slope‐change (Appendix 3.8).
3.8. Analysis of illustrative example
The series in the example (introduced in Section 1.1) were relatively long, with 96 monthly datapoints per series, and the number of series available for the meta‐analysis (11) was greater than is seen on average. 8 We analysed the ITS data from each of the 11 States using two ITS estimation methods. The level‐ and slope‐change effect estimates (and standard errors) were calculated and combined using fixed‐effect and four random‐effects meta‐analysis methods. The point estimates for both level‐ and slope‐changes were similar when using OLS and REML ITS analysis methods; however, the standard errors for the level‐change effects were always smaller when using OLS compared with REML (Table 2). The magnitude of autocorrelation ranged from 0.16 to 0.63 (estimated using REML). The meta‐analytic point estimates for level‐change did not vary by meta‐analysis method (Table 3). This was a result of the heterogeneity variance being estimated as zero, or close to zero, for both DL and REML (irrespective of the ITS analysis method). The confidence intervals for level‐change were wider when HKSJ confidence intervals were used, compared with WT, when the ITS studies were analysed using OLS. The reverse pattern was observed when the ITS studies were analysed using REML. The same patterns were observed for meta‐analysis of slope‐change.
TABLE 2.
Level‐ and slope‐change effect estimates (SEs), and estimate of lag‐1 autocorrelation from traffic fatality data using OLS and REML ITS analysis methods.
| State | Level‐change (SE): OLS | Level‐change (SE): REML | Slope‐change (SE): OLS | Slope‐change (SE): REML | Autocorrelation a |
|---|---|---|---|---|---|
| State 1 | 0.96 (0.43) | 0.72 (0.60) | 0.01 (0.02) | 0.01 (0.02) | 0.37 |
| State 2 | 0.73 (0.54) | 1.09 (0.73) | 0.11 (0.06) | 0.11 (0.06) | 0.42 |
| State 3 | −0.07 (0.42) | −0.01 (0.64) | 0.03 (0.02) | 0.03 (0.02) | 0.49 |
| State 4 | −0.36 (0.42) | −0.34 (0.58) | 0.04 (0.02) | 0.04 (0.02) | 0.36 |
| State 5 | 0.56 (0.42) | 0.56 (0.66) | 0.00 (0.02) | 0.00 (0.02) | 0.55 |
| State 6 | 0.68 (0.54) | 0.76 (0.62) | −0.03 (0.06) | −0.03 (0.06) | 0.17 |
| State 7 | 0.08 (0.42) | 0.10 (0.48) | 0.02 (0.02) | 0.02 (0.02) | 0.16 |
| State 8 | −0.38 (0.42) | −0.27 (0.54) | 0.00 (0.02) | 0.00 (0.02) | 0.28 |
| State 9 | 0.25 (0.54) | 0.25 (0.71) | 0.06 (0.06) | 0.06 (0.06) | 0.38 |
| State 10 | 0.25 (0.42) | 0.36 (0.70) | 0.01 (0.02) | 0.01 (0.02) | 0.63 |
| State 11 | 0.05 (0.43) | −0.12 (0.56) | 0.02 (0.02) | 0.02 (0.02) | 0.29 |
Abbreviations: OLS, ordinary least squares; REML, restricted maximum likelihood; SE, standard error.
a Autocorrelation estimated using REML.
TABLE 3.
Meta‐analytic level‐ and slope‐change effect estimates (95% confidence intervals), and estimates of between‐study variance, from the meta‐analysis of traffic fatality data from 11 States, using combinations of two ITS analysis methods and five meta‐analysis methods.
| Meta‐analysis method | Level‐change, OLS: estimate (95% CI) | Level‐change, OLS: p‐value | Level‐change, OLS: between‐study variance | Level‐change, REML: estimate (95% CI) | Level‐change, REML: p‐value | Level‐change, REML: between‐study variance | Slope‐change, OLS: estimate (95% CI) | Slope‐change, OLS: p‐value | Slope‐change, OLS: between‐study variance | Slope‐change, REML: estimate (95% CI) | Slope‐change, REML: p‐value | Slope‐change, REML: between‐study variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fixed | 0.209 (−0.055, 0.472) | 0.121 | 0.000 | 0.22 (−0.139, 0.579) | 0.23 | 0.000 | 0.017 (0.004, 0.03) | 0.008 | 0.000 | 0.015 (−0.004, 0.033) | 0.116 | 0.000 |
| DL + WT | 0.209 (−0.055, 0.472) | 0.121 | 0.000 | 0.22 (−0.139, 0.579) | 0.23 | 0.000 | 0.017 (0.004, 0.03) | 0.008 | 0.000 | 0.015 (−0.004, 0.033) | 0.116 | 0.000 |
| DL + HKSJ | 0.209 (−0.09, 0.507) | 0.151 | 0.000 | 0.22 (−0.082, 0.523) | 0.136 | 0.000 | 0.017 (0.004, 0.03) | 0.017 | 0.000 | 0.015 (0.002, 0.028) | 0.03 | 0.000 |
| REML + WT | 0.21 (−0.058, 0.477) | 0.125 | 0.006 | 0.22 (−0.139, 0.579) | 0.23 | 0.000 | 0.017 (0.004, 0.03) | 0.008 | 0.000 | 0.015 (−0.004, 0.033) | 0.116 | 0.000 |
| REML + HKSJ | 0.21 (−0.089, 0.508) | 0.149 | 0.006 | 0.22 (−0.082, 0.523) | 0.136 | 0.000 | 0.017 (0.004, 0.03) | 0.017 | 0.000 | 0.015 (0.002, 0.028) | 0.03 | 0.000 |
Abbreviations: CI, confidence interval; DL, DerSimonian and Laird; HKSJ, Hartung‐Knapp/Sidik‐Jonkman; ITS, interrupted time series; OLS, ordinary least squares; REML, restricted maximum likelihood; WT, Wald‐type.
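To make the link between Table 2 and Table 3 concrete, the following Python sketch pools the OLS level‐change estimates from Table 2 using inverse‐variance fixed‐effect weights, the DL between‐study variance estimator, and WT and HKSJ confidence intervals. It is a minimal illustration using textbook formulae, not the authors' code (Appendix S5); the function names are hypothetical, the HKSJ variant shown applies no truncation of the scaling factor (an assumption), and the t quantile is hard‐coded to avoid external dependencies.

```python
import math
from statistics import NormalDist

# OLS level-change estimates and standard errors for the 11 States (Table 2).
estimates = [0.96, 0.73, -0.07, -0.36, 0.56, 0.68, 0.08, -0.38, 0.25, 0.25, 0.05]
ses = [0.43, 0.54, 0.42, 0.42, 0.42, 0.54, 0.42, 0.42, 0.54, 0.42, 0.43]


def pool(y, se, tau2=0.0):
    """Inverse-variance pooled estimate, its SE, and the weighted sum of squares."""
    w = [1.0 / (s ** 2 + tau2) for s in se]
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, y))
    return est, math.sqrt(1.0 / sum(w)), q


def dl_tau2(y, se):
    """DerSimonian and Laird between-study variance, truncated at zero."""
    w = [1.0 / s ** 2 for s in se]
    _, _, q = pool(y, se)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)


k = len(estimates)
fixed_est, fixed_se, _ = pool(estimates, ses)
tau2 = dl_tau2(estimates, ses)
re_est, re_se, q_re = pool(estimates, ses, tau2)

# Wald-type (WT) interval: standard normal quantile.
z = NormalDist().inv_cdf(0.975)
wt_ci = (re_est - z * re_se, re_est + z * re_se)

# HKSJ interval: scale the variance by q_re / (k - 1) and use a t quantile.
T_975_DF10 = 2.228  # 0.975 quantile of t with k - 1 = 10 degrees of freedom
hksj_se = re_se * math.sqrt(q_re / (k - 1))
hksj_ci = (re_est - T_975_DF10 * hksj_se, re_est + T_975_DF10 * hksj_se)

print(f"fixed-effect estimate = {fixed_est:.3f}, DL tau2 = {tau2:.3f}")
print(f"DL + WT   95% CI: ({wt_ci[0]:.3f}, {wt_ci[1]:.3f})")
print(f"DL + HKSJ 95% CI: ({hksj_ci[0]:.3f}, {hksj_ci[1]:.3f})")
```

Because Table 2 is rounded to two decimal places, the output is expected to agree with the OLS level‐change columns of Table 3 only approximately: a pooled estimate of about 0.21, a DL between‐study variance of zero, and an HKSJ interval slightly wider than the WT interval.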
4. DISCUSSION
4.1. Summary and discussion of key findings
Systematic reviews including meta‐analyses of results from ITS studies are important for examining the effects of population‐level interventions. 8 To date, there has been limited evaluation of the performance of meta‐analysis methods when combining results from ITS studies, which often have characteristics that might compromise performance (e.g., short series 8 , 12 , 18 , 62 ). Our simulation study provides insight on the performance of meta‐analysis methods in conjunction with ITS analysis methods, and factors that impact their performance (e.g., series length, magnitude of autocorrelation and between‐study variance). The statistical estimation methods that we examined (both the ITS analysis and meta‐analysis methods) are those that are commonly used in practice and those that have been recommended for their improved performance. 8 , 32 , 33 , 34 , 44 , 47 , 48 , 52 , 53 , 54
All meta‐analysis methods yielded unbiased estimates of level‐change and slope‐change effects. This was unsurprising given the ITS analysis methods we examined have all been shown to yield unbiased estimates of level‐ and slope‐change. 17 However, the choice of meta‐analysis method did impact the 95% confidence interval coverage, standard error, and power. We discuss these findings firstly in the context of scenarios with no underlying heterogeneity (generated under a fixed‐effect model), followed by scenarios in which heterogeneity was present (generated under a random‐effects model).
In scenarios with no true underlying heterogeneity, fixed‐effect meta‐analysis (of level‐ and slope‐change estimates) yielded coverage below the nominal 95% level for short series (i.e., 12 data points) or when the ITS method did not account for autocorrelation (when autocorrelation was present). In a numerical simulation study examining the performance of statistical methods for single ITS studies, Turner et al. 17 found that both OLS and REML analysis methods underestimated the effect estimate standard errors for short series (i.e., 12 data points). As the number of data points increased, REML ITS analysis yielded estimates of standard errors closer to the true values, even in the presence of autocorrelation. However, as expected, improvements in the estimates of standard errors were not observed when an OLS ITS analysis was used. Using OLS in the presence of autocorrelation yielded greater underestimation of the standard errors as the number of data points increased. Given that the standard error of a meta‐analytic effect estimate in a fixed‐effect model is a function of the within‐study standard errors only, the patterns observed in the present simulation study directly reflect the patterns of Turner et al. 17
In the same scenarios as above, with no true underlying heterogeneity, random‐effects meta‐analysis generally yielded coverage that was close to the nominal 95% level. An artefact of underestimating the within‐study standard errors was that it induced observed between‐study heterogeneity, even when there was no true underlying heterogeneity. Greater underestimation of the within‐study standard errors led to greater overestimation of the between‐study variance. For example, greater underestimation occurs when there are many, long series (e.g., 20 ITS studies with 100 datapoints) and autocorrelation is present (0.6) but unaccounted for in the ITS analysis (i.e., OLS is used), while no underestimation of within‐study standard errors occurs when there are many, long series and autocorrelation is present but accounted for (i.e., REML is used for ITS analysis). Fortuitously, the underestimation of the within‐study variances and the overestimation of the between‐study variance counterbalance one another when combined in the calculation of the meta‐analytic effect estimate standard errors in a random‐effects meta‐analysis, yielding generally unbiased standard errors.
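The counterbalancing described above can be illustrated with a small Monte Carlo sketch in Python. It is a deliberate simplification rather than a reproduction of the simulation study: no time series are generated, every study shares the same true effect, and the understatement of the within‐study standard errors is imposed directly through the hypothetical constants `TRUE_SE` and `REPORTED_SE` instead of arising from unmodelled autocorrelation. All names and values are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(2023)  # fixed seed so the sketch is reproducible

K = 20              # ITS studies per meta-analysis
TRUE_EFFECT = 1.0   # true level-change, identical in every study
TRUE_SE = 0.5       # actual sampling SD of each study's estimate
REPORTED_SE = 0.3   # understated SE reported by each study (assumed value)
N_SIM = 2000        # number of simulated meta-analyses


def meta_analyse(y, s):
    """Pooled estimate, fixed-effect variance, DL tau2 and random-effects
    variance when every study reports the same standard error s."""
    k = len(y)
    w = 1.0 / s ** 2
    pooled = statistics.fmean(y)            # equal weights give the simple mean
    q = w * sum((yi - pooled) ** 2 for yi in y)
    c = (k - 1) * w                         # sum(w) - sum(w^2)/sum(w), equal w
    tau2 = max(0.0, (q - (k - 1)) / c)      # DL estimator, truncated at zero
    return pooled, s ** 2 / k, tau2, (s ** 2 + tau2) / k


pooled_ests, fe_vars, tau2s, re_vars = [], [], [], []
for _ in range(N_SIM):
    y = [random.gauss(TRUE_EFFECT, TRUE_SE) for _ in range(K)]
    est, fe_var, tau2, re_var = meta_analyse(y, REPORTED_SE)
    pooled_ests.append(est)
    fe_vars.append(fe_var)
    tau2s.append(tau2)
    re_vars.append(re_var)

emp_se = statistics.stdev(pooled_ests)
fe_ratio = math.sqrt(statistics.fmean(fe_vars)) / emp_se
re_ratio = math.sqrt(statistics.fmean(re_vars)) / emp_se
mean_tau2 = statistics.fmean(tau2s)

print(f"fixed-effect modSE/empSE   = {fe_ratio:.2f}")
print(f"random-effects modSE/empSE = {re_ratio:.2f}")
print(f"mean DL tau2 (true value 0) = {mean_tau2:.3f}")
```

Under these assumptions the fixed‐effect ratio should sit near 0.6 (the factor by which the standard errors were understated), the random‐effects ratio should be close to one, and the mean estimated between‐study variance should be roughly 0.16 (that is, 0.5² minus 0.3²) despite there being no true heterogeneity.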
In scenarios where true underlying heterogeneity was present, random‐effects meta‐analysis generally yielded coverage close to the nominal 95% level, irrespective of the ITS analysis method used. This occurred for the same reason as above: in scenarios in which the within‐study standard errors were underestimated, the between‐study variance was overestimated, which resulted in generally unbiased standard errors of the meta‐analytic effect estimates. Because the resulting meta‐analysis standard errors were generally unbiased, even with few studies (i.e., ≤ 5 ITS), the HKSJ confidence interval method offered no (or limited) advantage compared with the WT method. The HKSJ method has been shown to yield better coverage than the WT confidence interval in other simulation studies. 31 , 33 Our findings may have differed if the scenarios we investigated resulted in greater underestimation of the within‐study standard errors, thus affecting the estimated between‐study variances, and in turn, the standard errors of the meta‐analytic effect.
While the parameter of interest when fitting a random‐effects meta‐analysis is often the average interruption effect (i.e., the meta‐analytic level‐ or slope‐change), reporting the average alone provides an incomplete and potentially misleading summary of the impact of the interruption. 29 Understanding the consistency of the interruption effects (e.g., through the calculation of a prediction interval), and the factors that may explain observed heterogeneity should be of equal importance. 63 Crucially, however, this relies on accurate estimation of the between‐study variance, which was most accurately estimated when REML was used for the ITS analysis, as opposed to OLS.
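As an illustration of why the between‐study variance matters for a prediction interval, the following Python sketch uses the commonly cited approximation in which the interval is the pooled estimate plus or minus a t quantile (k − 2 degrees of freedom) multiplied by the square root of the sum of the between‐study variance and the squared standard error of the pooled estimate. The inputs are taken from the REML + WT row of Table 3 (OLS level‐change), with the standard error back‐calculated from the reported confidence interval; the resulting interval is illustrative only and is not a result reported in this study. The function name is hypothetical.

```python
import math


def prediction_interval(pooled, pooled_se, tau2, t_quantile):
    """Approximate 95% prediction interval for the effect in a new study.

    Computes pooled +/- t * sqrt(tau2 + pooled_se^2), where t_quantile is the
    0.975 quantile of a t distribution with k - 2 degrees of freedom.
    """
    half_width = t_quantile * math.sqrt(tau2 + pooled_se ** 2)
    return pooled - half_width, pooled + half_width


# Inputs from Table 3 (REML + WT, OLS level-change); 11 States, so k - 2 = 9.
POOLED = 0.21
CI_LOWER, CI_UPPER = -0.058, 0.477
TAU2 = 0.006
T_975_DF9 = 2.262  # 0.975 quantile of t with 9 degrees of freedom

# Back-calculate the SE of the pooled estimate from the Wald-type interval.
pooled_se = (CI_UPPER - CI_LOWER) / (2 * 1.96)

low, high = prediction_interval(POOLED, pooled_se, TAU2, T_975_DF9)
print(f"95% prediction interval: ({low:.3f}, {high:.3f})")
```

Because the between‐study variance enters the interval directly, any overestimation of it (for example, when OLS is used in the presence of autocorrelation) would be expected to widen the prediction interval accordingly.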
4.2. Strengths and limitations
Strengths of this numerical simulation study include the large number of design parameters (and their factorial combination) examined, and the use of a wide range of performance metrics. This allowed us to understand how the design parameters and their interactions affected key parameters required for interpreting meta‐analysis results. Our design parameters were informed by those observed in practice 8 , 12 , 18 , 22 , 40 in an attempt to create scenarios reflective of practice. We also included scenarios that would test the meta‐analysis methods when the underlying assumptions were unlikely to be met.
While our simulation scenarios were extensive, there are many other outcome types (e.g., proportion, count, rate), statistical methods (e.g., ITS analysis methods such as ARIMA, between‐study variance estimators and meta‐analytic effect confidence interval methods), design parameters (e.g., proportion of datapoints pre‐ and post‐interruption) and their combinations that could be investigated. One particular example, pertinent to simulation studies of meta‐analysis methods, is to vary the design parameters across the individual ITS studies within the meta‐analyses, 36 for example, by assuming a varying number of datapoints, pre‐interruption levels and/or slopes, and methods used for ITS analysis across the ITS studies. Although we caution against generalising our findings beyond the outcome type, statistical methods and design factor configurations examined in the present study, our study provides a broad understanding of the factors that affect performance, which may be helpful for informing the choice of statistical methods in scenarios beyond the configurations examined here.
4.3. Implications for practice
For meta‐analysts, our findings suggest that fitting a random‐effects model generally yields coverage close to the nominal 95% level, in scenarios with and without underlying heterogeneity, and irrespective of the number of ITS studies or the method used for their analysis. Random‐effects models (as opposed to fixed‐effect models) may be a more appropriate model choice in the context of systematic reviews including ITS studies, as these study designs are likely to have more diversity in their characteristics (compared with randomised trials), potentially inducing statistical heterogeneity. While use of random‐effects meta‐analysis may mitigate some of the consequences of suboptimal ITS analysis methods in the estimation of the average effect, wherever possible, we recommend meta‐analysts use effect estimates calculated from ITS methods that attempt to adjust for autocorrelation (e.g., REML). As noted above, this will lead to more accurate estimation of the between‐study variance (see Section 4.1). This may require re‐analysis of the ITS studies prior to their inclusion in meta‐analysis. ITS data are often available in figures in publications, 12 from which data can be digitally extracted and effect estimates accurately calculated. 64 However, caution is required in relying on the estimated heterogeneity when there are few ITS studies and the ITS have few datapoints.
For researchers undertaking the analysis of primary ITS studies, our results suggest the length of the time series and method used to analyse ITS studies have important implications for meta‐analysis. To facilitate inclusion of eligible ITS studies in potential future systematic reviews, it is critical that their design and analysis methods are completely and accurately reported. Reporting should include a clear description of the ITS design (e.g., number of datapoints in the series), the model and statistical estimation method, including any adjustments made for autocorrelation, the interruption effect measure (e.g., immediate level‐change), and the estimate and measure of precision. 8 Further, provision of the aggregate‐level time series data (e.g., in tables or figures) would be beneficial as it would allow systematic reviewers to re‐analyse time series data across the studies using their preferred method, consistently across studies within a meta‐analysis and to calculate the impact of the interruption using the desired effect measure. 14 , 16 , 22 , 62 , 65 , 66
4.4. Implications for future research
We examined the performance of meta‐analysis methods using a two‐stage meta‐analysis approach; however, in certain circumstances it is possible to fit a single model that includes all the ITS to estimate the parameters in Equation 1, known as the one‐stage approach. 26 Gebski et al. 26 demonstrated this with a single model fitted to ITS from three hospital units, allowing the level‐ and slope‐changes to vary via the addition of fixed‐effect interaction terms between the interruption effects and the hospital units. Other one‐stage models could incorporate random effects for level‐ and slope‐change, to parallel the two‐stage random‐effects approach. Examination of whether there are scenarios in which the one‐stage approach may offer improved efficiency would be of value. 26 Further avenues for research include examining the impact of the different analysis methods on prediction intervals, as well as more complex scenarios such as where the ITS analysis methods differ between studies in the meta‐analysis, or where the included ITS studies have lags of greater than 1 or exhibit seasonal patterns.
4.5. Conclusions
Systematic reviews including meta‐analyses of results from ITS studies are important for informing public health policy. Our simulation study provides evidence on the performance of meta‐analysis methods when combining results from ITS studies. We found that all meta‐analysis methods yielded unbiased estimates of the interruption effects. All random‐effects meta‐analysis methods yielded coverage close to the nominal level, irrespective of the ITS analysis method used. However, the between‐study heterogeneity variance was overestimated in scenarios where the ITS study standard errors were underestimated. Therefore, meta‐analysts should strive to use effect estimates and standard errors that have been calculated from ITS methods that attempt to adjust for autocorrelation (such as REML).
AUTHOR CONTRIBUTIONS
Elizabeth Korevaar: Data curation; formal analysis; investigation; methodology; project administration; visualization; writing – original draft; writing – review and editing. Simon Lee Turner: Investigation; methodology; writing – review and editing. Andrew Forbes: Funding acquisition; investigation; methodology; supervision; writing – review and editing. Amalia Karahalios: Funding acquisition; investigation; methodology; supervision; writing – review and editing. Monica Taljaard: Funding acquisition; writing – review and editing. Joanne E McKenzie: Conceptualization; funding acquisition; investigation; methodology; project administration; supervision; writing – original draft; writing – review and editing.
FUNDING INFORMATION
Elizabeth Korevaar is supported through an Australian Government Research Training Program (RTP) Scholarship administered by Monash University, Australia. Joanne E. McKenzie is supported by an NHMRC Investigator Grant (GNT2009612). The project is funded by the Australian National Health and Medical Research Council (NHMRC) project grant GNT1145273, ‘How should we analyse, synthesize, and interpret evidence from interrupted time series studies? Making the best use of available evidence’, Joanne E. McKenzie, Andrew B. Forbes, Monica Taljaard, Allen C. Cheng, Jeremy M. Grimshaw, Lisa Bero, Amalia Karahalios. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
CONFLICT OF INTEREST STATEMENT
The authors have no competing interests to disclose.
Supporting information
Appendix S1. – Example meta‐analysis, with annotated ITS graphs.
Appendix S2. – Performance measure formulae.
Appendix S3. – Additional results for scenarios with a level‐change of 1 and slope‐change of 1.
Appendix S4. – Results for alternative scenarios; other level‐ and slope‐change combinations.
Appendix S5. – Code and data.
ACKNOWLEDGMENT
Open access publishing facilitated by Monash University, as part of the Wiley ‐ Monash University agreement via the Council of Australian University Librarians.
Korevaar E, Turner SL, Forbes AB, Karahalios A, Taljaard M, McKenzie JE. Evaluation of statistical methods used to meta‐analyse results from interrupted time series studies: A simulation study. Res Syn Meth. 2023;14(6):882‐902. doi: 10.1002/jrsm.1669
DATA AVAILABILITY STATEMENT
The datasets generated and/or analysed during the current study, in addition to the code to replicate the simulation study in its entirety, are available in the Monash University repository known as Bridges, https://doi.org/10.26180/20999185.v1
REFERENCES
- 1. Higgins JPT, Ramsay C, Reeves BC, et al. Issues relating to study design and risk of bias when including non‐randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4:12‐25. doi: 10.1002/jrsm.1056 [DOI] [PubMed] [Google Scholar]
- 2. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasi‐experimental approach when randomisation is not an option: interrupted time series analysis. BMJ. 2015;350:h2750. doi: 10.1136/bmj.h2750 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Eccles M, Grimshaw J, Campbell M, Ramsay C. Research designs for studies evaluating the effectiveness of change and improvement strategies. Qual Saf Health Care. 2003;12:47‐52. doi: 10.1136/qhc.12.1.47 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Penfold RB, Zhang F. Use of interrupted time series analysis in evaluating health care quality improvements. Acad Pediatr. 2013;13:S38‐S44. doi: 10.1016/j.acap.2013.08.002 [DOI] [PubMed] [Google Scholar]
- 5. Gruenewald PJ. Analysis approaches to community evaluation. Eval Rev. 1997;21:209‐230. doi: 10.1177/0193841X9702100205 [DOI] [PubMed] [Google Scholar]
- 6. Goodacre S. Uncontrolled before‐after studies: discouraged by Cochrane and the EMJ. Emerg Med J. 2015;32:507‐508. doi: 10.1136/emermed-2015-204761 [DOI] [PubMed] [Google Scholar]
- 7. Soumerai SB, Starr D, Majumdar SR. How do you know which health care effectiveness research you can trust? A guide to study Design for the Perplexed. Prev Chronic Dis. 2015;12:E101. doi: 10.5888/pcd12.150187 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Korevaar E, Karahalios A, Turner SL, et al. Methodological systematic review recommends improvements to conduct and reporting when meta‐analysing interrupted time series studies. J Clin Epidemiol. 2022;145:55‐69. doi: 10.1016/j.jclinepi.2022.01.010
- 9. Deeks J, Higgins J, Altman D. Chapter 10: Analysing data and undertaking meta‐analyses. In: Higgins J, Thomas J, Chandler J, et al., eds. Cochrane Handbook for Systematic Reviews of Interventions. Cochrane; 2019.
- 10. McKenzie JE, Beller EM, Forbes AB. Introduction to systematic reviews and meta‐analysis. Respirology. 2016;21:626‐637. doi: 10.1111/resp.12783
- 11. Shadish WR, Cook TD, Campbell DT. Experimental and Quasi‐Experimental Designs for Generalized Causal Inference. Houghton Mifflin Company; 2002.
- 12. Turner SL, Karahalios A, Forbes AB, et al. Design characteristics and statistical methods used in interrupted time series studies evaluating public health interventions: a review. J Clin Epidemiol. 2020;122:1‐11. doi: 10.1016/j.jclinepi.2020.02.006
- 13. Taljaard M, McKenzie JE, Ramsay CR, Grimshaw JM. The use of segmented regression in analysing interrupted time series studies: an example in pre‐hospital ambulance care. Implement Sci. 2014;9:77. doi: 10.1186/1748-5908-9-77
- 14. Lopez Bernal J, Soumerai S, Gasparrini A. A methodological framework for model selection in interrupted time series studies. J Clin Epidemiol. 2018;103:82‐91. doi: 10.1016/j.jclinepi.2018.05.026
- 15. Wagner AK, Soumerai SB, Zhang F, Ross‐Degnan D. Segmented regression analysis of interrupted time series studies in medication use research. J Clin Pharm Ther. 2002;27:299‐309. doi: 10.1046/j.1365-2710.2002.00430.x
- 16. Turner SL, Karahalios A, Forbes AB, et al. Creating effective interrupted time series graphs: review and recommendations. Res Synth Methods. 2021;12:106‐117. doi: 10.1002/jrsm.1435
- 17. Turner SL, Forbes AB, Karahalios A, Taljaard M, McKenzie J. Evaluation of statistical methods used in the analysis of interrupted time series studies: a simulation study. BMC Med Res Methodol. 2021;21:181. doi: 10.1186/s12874-021-01364-0
- 18. Jandoc R, Burden AM, Mamdani M, Lévesque LE, Cadarette SM. Interrupted time series analysis in drug utilization research is increasing: systematic review and recommendations. J Clin Epidemiol. 2015;68:950‐956. doi: 10.1016/j.jclinepi.2014.12.018
- 19. Polus S, Pieper D, Burns J, et al. Heterogeneity in application, design, and analysis characteristics was found for controlled before‐after and interrupted time series studies included in Cochrane reviews. J Clin Epidemiol. 2017;91:56‐69. doi: 10.1016/j.jclinepi.2017.07.008
- 20. Huitema BE, McKean JW. Identifying autocorrelation generated by various error processes in interrupted time‐series regression designs – a comparison of AR1 and portmanteau tests. Educ Psychol Meas. 2007;67:447‐459. doi: 10.1177/0013164406294774
- 21. Kutner MH, Nachtsheim CJ, Neter J, et al. Applied Linear Statistical Models. McGraw‐Hill; 1996.
- 22. Turner SL, Karahalios A, Forbes AB, Taljaard M, Grimshaw JM, McKenzie J. Comparison of six statistical methods for interrupted time series studies: empirical evaluation of 190 published series. BMC Med Res Methodol. 2021;21:134. doi: 10.1186/s12874-021-01306-w
- 23. Chatterjee S, Simonoff JS. Time series data and autocorrelation. In: Chatterjee S, Simonoff JS, eds. Handbook of Regression Analysis. Wiley; 2012:81‐109.
- 24. Prais SJ, Winsten CB. Trend estimators and serial correlation. Cowles Commission Discussion Paper Chicago. 1954.
- 25. Cheang W‐K, Reinsel GC. Bias reduction of autoregressive estimates in time series regression model through restricted maximum likelihood. J Am Stat Assoc. 2000;95:1173‐1184. doi: 10.2307/2669758
- 26. Gebski V, Ellingson K, Edwards J, Jernigan J, Kleinbaum D. Modelling interrupted time series to evaluate prevention and control of infection in healthcare. Epidemiol Infect. 2012;140:2131‐2141. doi: 10.1017/S0950268812000179
- 27. Ramsay C, Grimshaw JM, Grilli R. Meta‐analysis of interrupted time series designs: what is the effect size? 9th Annual Cochrane Colloquium. Cochrane; 2001.
- 28. Ramsay C, Grilli R, Grimshaw JM. Robust methods for analysis of interrupted time series designs for inclusion in systematic reviews. 9th Annual Cochrane Colloquium. Cochrane; 2001.
- 29. Higgins JP, Thompson SG, Spiegelhalter DJ. A re‐evaluation of random‐effects meta‐analysis. J R Stat Soc Ser A Stat Soc. 2009;172:137‐159. doi: 10.1111/j.1467-985X.2008.00552.x
- 30. Borenstein M, Hedges LV, Higgins JP, et al. A basic introduction to fixed‐effect and random‐effects models for meta‐analysis. Res Synth Methods. 2010;1:97‐111. doi: 10.1002/jrsm.12
- 31. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to estimate the between‐study variance and its uncertainty in meta‐analysis. Res Synth Methods. 2016;7:55‐79. doi: 10.1002/jrsm.1164
- 32. Veroniki AA, Jackson D, Bender R, et al. Methods to calculate uncertainty in the estimated overall effect size from a random‐effects meta‐analysis. Res Synth Methods. 2019;10:23‐43. doi: 10.1002/jrsm.1319
- 33. Langan D, Higgins JPT, Jackson D, et al. A comparison of heterogeneity variance estimators in simulated random‐effects meta‐analyses. Res Synth Methods. 2019;10:83‐98. doi: 10.1002/jrsm.1316
- 34. Cornell JE, Mulrow CD, Localio R, et al. Random‐effects meta‐analysis of inconsistent effects: a time for change. Ann Intern Med. 2014;160:267‐270. doi: 10.7326/M13-2886
- 35. Wiksten A, Rucker G, Schwarzer G. Hartung‐Knapp method is not always conservative compared with fixed‐effect meta‐analysis. Stat Med. 2016;35:2503‐2515. doi: 10.1002/sim.6879
- 36. McKenzie JE, Herbison GP, Deeks JJ. Impact of analysing continuous outcomes using final values, change scores and analysis of covariance on the performance of meta‐analytic methods: a simulation study. Res Synth Methods. 2016;7:371‐386. doi: 10.1002/jrsm.1196
- 37. Villar J, Mackey ME, Carroli G, et al. Meta‐analyses in systematic reviews of randomized controlled trials in perinatal medicine: comparison of fixed and random effects models. Stat Med. 2001;20:3635‐3647. doi: 10.1002/sim.1096
- 38. Lane TJ, Hall W. Traffic fatalities within US states that have legalized recreational cannabis sales and their neighbours. Addiction. 2019;114:847‐856. doi: 10.1111/add.14536
- 39. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38:2074‐2102. doi: 10.1002/sim.8086
- 40. Hudson J, Fielding S, Ramsay CR. Methodology and reporting characteristics of studies using interrupted time series design in healthcare. BMC Med Res Methodol. 2019;19:137. doi: 10.1186/s12874-019-0777-x
- 41. Judge GG. The Theory and Practice of Econometrics. 2nd ed. Wiley; 1985:1019.
- 42. StataCorp. Stata 16 Base Reference Manual. College Station, TX: StataCorp LLC; 2019.
- 43. Riley RD, Abrams KR, Lambert PC, et al. An evaluation of bivariate random‐effects meta‐analysis for the joint synthesis of two correlated outcomes. Stat Med. 2007;26:78‐97. doi: 10.1002/sim.2524
- 44. DerSimonian R, Laird N. Meta‐analysis in clinical trials. Control Clin Trials. 1986;7:177‐188. doi: 10.1016/0197-2456(86)90046-2
- 45. Cochrane Collaboration. Review Manager (RevMan) [computer program]. Version 5.2.3. Copenhagen: The Nordic Cochrane Centre; 2012.
- 46. Harris R, Bradburn M, Deeks J, et al. Metan: fixed‐ and random‐effects meta‐analysis. Stata J. 2008;8:3‐28.
- 47. Novianti PW, Roes KCB, van der Tweel I. Estimation of between‐trial variance in sequential meta‐analyses: a simulation study. Contemp Clin Trials. 2014;37:129‐138. doi: 10.1016/j.cct.2013.11.012
- 48. Viechtbauer W. Bias and efficiency of meta‐analytic variance estimators in the random‐effects model. J Educ Behav Stat. 2005;30:261‐293. doi: 10.3102/10769986030003261
- 49. Page MJ, Altman DG, McKenzie JE, et al. Flaws in the application and interpretation of statistical analyses in systematic reviews of therapeutic interventions were common: a cross‐sectional analysis. J Clin Epidemiol. 2018;95:7‐18. doi: 10.1016/j.jclinepi.2017.11.022
- 50. Brockwell SE, Gordon IR. A comparison of statistical methods for meta‐analysis. Stat Med. 2001;20:825‐840. doi: 10.1002/sim.650
- 51. Knapp G, Hartung J. Improved tests for a random effects meta‐regression with a single covariate. Stat Med. 2003;22:2693‐2710. doi: 10.1002/sim.1482
- 52. Hartung J, Knapp G. A refined method for the meta‐analysis of controlled clinical trials with binary outcome. Stat Med. 2001;20:3875‐3889. doi: 10.1002/sim.1009
- 53. Sanchez‐Meca J, Marin‐Martinez F. Confidence intervals for the overall effect size in random‐effects meta‐analysis. Psychol Methods. 2008;13:31‐48. doi: 10.1037/1082-989x.13.1.31
- 54. Sidik K, Jonkman JN. A simple confidence interval for meta‐analysis. Stat Med. 2002;21:3153‐3159. doi: 10.1002/sim.1262
- 55. Korevaar E, Karahalios A, Forbes AB, et al. Methods used to meta‐analyse results from interrupted time series studies: a methodological systematic review protocol. F1000Res. 2020;9:110. doi: 10.12688/f1000research.22226.3
- 56. Schaffer AL, Dobbins TA, Pearson S‐A. Interrupted time series analysis using autoregressive integrated moving average (ARIMA) models: a guide for evaluating large‐scale health interventions. BMC Med Res Methodol. 2021;21:58. doi: 10.1186/s12874-021-01235-8
- 57. StataCorp. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC; 2019.
- 58. Wickham H, François R, Henry L, et al. dplyr: A Grammar of Data Manipulation. 2022.
- 59. R Core Team. foreign: Read Data Stored by ‘Minitab’, ‘S’, ‘SAS’, ‘SPSS’, ‘Stata’, ‘Systat’, ‘Weka’, ‘dBase’, … 2022.
- 60. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer‐Verlag; 2016.
- 61. Korevaar E, Turner SL, Forbes AB, et al. Evaluation of Statistical Methods Used to Meta‐Analyse Results from Interrupted Time Series Studies: A Simulation Study – Code and Data. Monash Bridges; 2022. doi: 10.26180/20999185.v1
- 62. Ramsay CR, Matowe L, Grilli R, Grimshaw JM, Thomas RE. Interrupted time series designs in health technology assessment: lessons from two systematic reviews of behavior change strategies. Int J Technol Assess Health Care. 2003;19:613‐623. doi: 10.1017/s0266462303000576
- 63. IntHout J, Ioannidis JP, Rovers MM, Goeman JJ. Plea for routinely presenting prediction intervals in meta‐analysis. BMJ Open. 2016;6:e010247. doi: 10.1136/bmjopen-2015-010247
- 64. Turner SL, Korevaar E, Cumpston M, et al. Effect estimates can be accurately calculated with data digitally extracted from interrupted time series graphs. medRxiv. 2022:2022.09.12.22279878. doi: 10.1101/2022.09.12.22279878
- 65. Lopez Bernal J, Cummins S, Gasparrini A. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int J Epidemiol. 2017;46:348‐355. doi: 10.1093/ije/dyw098
- 66. Lopez Bernal J, Cummins S, Gasparrini A. Corrigendum to: interrupted time series regression for the evaluation of public health interventions: a tutorial. Int J Epidemiol. 2020;49:1414. doi: 10.1093/ije/dyaa118
Supplementary Materials
Appendix S1. – Example meta‐analysis, with annotated ITS graphs.
Appendix S2. – Performance measure formulae.
Appendix S3. – Additional results for scenarios with a level‐change of 1 and slope‐change of 1.
Appendix S4. – Results for alternative scenarios; other level‐ and slope‐change combinations.
Appendix S5. – Code and data.
