Author manuscript; available in PMC: 2023 Jun 1.
Published in final edited form as: Stat Biosci. 2022 May 25;14(3):582–610. doi: 10.1007/s12561-022-09346-6

A generalized interrupted time series model for assessing complex health care interventions

Maricela Cruz 1, Hernando Ombao 2, Daniel L Gillen 3
PMCID: PMC10208393  NIHMSID: NIHMS1884816  PMID: 37234509

Abstract

Assessing the impact of complex interventions on measurable health outcomes is a growing concern in health care and health policy. Interrupted time series (ITS) designs borrow from traditional case-crossover designs and function as quasi-experimental methodology able to retrospectively analyze the impact of an intervention. Statistical models used to analyze ITS designs primarily focus on continuous-valued outcomes. We propose the “Generalized Robust ITS” (GRITS) model appropriate for outcomes whose underlying distribution belongs to the exponential family of distributions, thereby expanding the available methodology to adequately model binary and count responses. GRITS formally implements a test for the existence of a change point in discrete ITS. The methodology proposed is able to test for the existence of and estimate the change point, borrow information across units in multi-unit settings, and test for differences in the mean function and correlation pre- and post-intervention. The methodology is illustrated by analyzing patient falls from a hospital that implemented and evaluated a new care delivery model in multiple units.

Keywords: Change point detection, Complex interventions, Discrete outcomes, Interrupted time series, Patient-centered data, Segmented regression

1. Introduction

Evaluating the effectiveness of complex interventions is a growing concern in health care and health policy. The Centers for Medicare & Medicaid Services (CMS) in the United States financially incentivize health care quality reform through a value-based purchasing program that ties reimbursement for health system care services to performance (Kavanagh et al., 2012). The program has spurred a proliferation of health care interventions aimed at improving quality-of-care measures, including mortality and complications, patient safety, and patient experience (Centers for Medicare and Medicaid Services, 2018). Randomized controlled trials of health care reforms are often difficult or infeasible to implement in health systems (West et al., 2008). According to the 2018 Annual Review of Public Health, interrupted time series (ITS) designs may be the only feasible recourse for studying the impacts of large-scale public health policies (Handley et al., 2018). ITS designs borrow from traditional case-crossover designs and function as quasi-experimental methodology able to retrospectively analyze the impact of an intervention and account for data dependencies (Bernal et al., 2017).

Statistical models used to analyze ITS data are primarily based on segmented regression, a powerful methodology requiring outcomes measured at regular and evenly spaced intervals (Linden, 2015; Penfold and Zhang, 2013; Wagner et al., 2002). Segmented regression requires a clear differentiation of the pre- and post-intervention time periods (Taljaard et al., 2014), neglects plausible differences in temporal dependence and volatility, restricts the analysis to a single unit, and — though presented as able to model counts, rates and proportions (Wagner et al., 2002) — is not well specified for discrete outcomes in the presence of a mean-variance relationship. These substantial limitations mean that segmented regression does not take advantage of all available data that may provide information on the change associated with the intervention. Methodology that estimates the time point at which the outcome begins to change, allows for a change in temporal dependence and volatility, and allows for multi-unit analyses has been developed for continuous-valued outcomes (Cruz et al., 2017, 2019). ITS methods for discrete responses remain an area of open research. As health care ITS are often composed of discrete outcome measurements (e.g., patient falls, unretrieved device fragment count), methodology able to assess the impact of an intervention on these outcome types is needed.

The methodology proposed in this paper is motivated by our interest in estimating the lagged effect of a care delivery intervention on patient falls, recorded monthly at six clinical care units over a five-year period. The intervention was the implementation of Clinical Nurse Leader (CNL) integrated care delivery, a nursing model that embeds a master's-prepared nurse into the front lines of care; in this study, the nurses were introduced to their units six months prior to the formal intervention (Bender et al., 2017). ITS of the log of patient falls are included in Figure 1 for two clinical units (the Cardiac and Acute Care units). Our overall aim is to determine whether a change in patient falls exists over a predetermined set of possible change points and, if a change exists, to estimate the time point at which patient falls exhibit the change and to quantify that change in terms of the mean function and temporal dependence.

Fig. 1. Plots the log of the time series of observed patient falls for the Cardiac and Acute Care units. Note that, for the purpose of depicting the time series, we add 0.5 to patient falls when patient falls is equal to zero, making the log of patient falls equal to −0.69 and giving rise to the negative points in the plots.

In the subsequent sections, we develop the ‘Generalized Robust ITS’ (GRITS) model appropriate for outcomes whose underlying distribution belongs to the exponential family of distributions, thereby expanding the available methodology to adequately model binary and count responses. We describe our proposed GRITS model in detail and provide estimation and inference procedures. Then, we present empirical simulations that assess type one error and power for detecting specified change point alternatives, along with accuracy of our change point estimation procedure. Next, we determine the impact of the CNL integrated care delivery intervention on patient falls via our GRITS model. To conclude, we summarize our developed model and its impact on the broader ITS literature and describe future work.

2. Methods

We propose the Generalized Robust Interrupted Time Series (GRITS) model to analyze multi-unit discrete ITS. The GRITS model generalizes the Robust Multiple Interrupted Time Series model, proposed in Cruz et al. (2019), to handle discrete outcomes. As such, the GRITS model estimates an overall change point across units when appropriate and allows the mean function and correlation structure to differ pre- and post-change point. The change point is defined as the time point at which the underlying pattern of an outcome exhibits a change that may be associated with the intervention of interest. To properly account for the mean-variance relationship of discrete outcomes, we embed the traditional segmented regression approach within a broader generalized estimating equation framework.

The GRITS model makes a distinction between the change point in the outcome and the formal intervention implementation time point. Denote the change point as τ and the time point at which the intervention is formally introduced as t. The change point τ is not necessarily equal to t. In fact, often τ > t (due to a delayed intervention effect) or τ < t (due to an anticipated intervention effect). For example, patient experience scores may exhibit a change point after the formal intervention of a new care delivery model, as patients may not immediately feel the effects of the change due to an adjustment period. If the clinical staff is trained regarding the care delivery model prior to the formal intervention, the change point may occur between the onset of the training and the official start of the formal intervention. GRITS formally tests for the existence of a change point, rather than simply assuming a change, over a predetermined set of possible change points, by implementing the supremum Wald test (Cruz et al., 2019). If the supremum Wald test concludes a change point exists, estimation is carried out via GRITS using a model with a change point at the most likely location. Otherwise, the mean and correlation parameters are estimated without a change point.

2.1. The Generalized Robust ITS (GRITS) Model

Let $y_{ij}$ denote the response of interest for unit $i$ and measurement $j$, with $i \in \{1, \ldots, N\}$, $j \in \{1, \ldots, n_i\}$, $N$ the total number of units, and $n_i$ the number of measurements for unit $i$. For computational ease we transform the measurement number to the unit interval [0, 1] by dividing $j$ by $n_i$ for all $i$. The transformation is done to appropriately model large-valued time series and to ensure that our parameter estimation procedure converges. Denote the transformed $j$th measurement number for unit $i$ by $t_{ij}$. Define $Y_i = [y_{i1}, \ldots, y_{in_i}]'$ as the vector of all response measurements and $X_i = [x_{i1}, \ldots, x_{in_i}]'$ as the design matrix, where $x_{ij} = [1, t_{ij}, I(j \geq \tau), t_{ij} I(j \geq \tau)]'$ for all $i$. Let $\beta_i(\tau) = [\beta_{i0}^{\tau}, \beta_{i1}^{\tau}, \delta_i^{\tau}, \Delta_i^{\tau}]'$ denote the vector of mean function parameters. We include τ as a superscript or in parentheses to illustrate the dependency of the mean parameters on the change point, τ, but drop it henceforth for simplicity. Observe that $\beta_i$ and $X_i$ depend on the unit. Then, denote the conditional expectation of the response given $x_{ij}$ as $\mu_{ij} = E[y_{ij} \mid x_{ij}]$ for all $i$. We suppose

$$g(\mu_{ij}) = \beta_{i0} + \beta_{i1} t_{ij} + (\delta_i + \Delta_i t_{ij})\, I(j \geq \tau) = x_{ij}' \beta_i, \qquad (1)$$

with g(·) an assumed link function.

Remark: The measures used to quantify the impact of an intervention on a health outcome in the ITS literature are level change and trend change. If g(·) is the identity link, the level change for unit $i$ is defined as the change in anchored intercept (anchored at $t_{i\tau}$) for that unit, denoted by $\delta_i + \Delta_i t_{i\tau}$, and the trend (slope) change is denoted by $\Delta_i$.
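To make the model concrete, the following sketch (in R, with hypothetical parameter values) constructs the segmented design matrix of this section and evaluates the mean under a log link; when g(·) = log(·), the level change at the anchor point becomes a rate ratio, $\exp(\delta_i + \Delta_i t_{i\tau})$.

```r
## Minimal sketch of the segmented mean in (1) under a log link.
## All parameter values are hypothetical and for illustration only.
n_i  <- 60                        # number of measurements for unit i
tau  <- 31                        # assumed change point (measurement index)
t_ij <- (1:n_i) / n_i             # measurement number rescaled to (0, 1]
post <- as.numeric(1:n_i >= tau)  # I(j >= tau)

# Rows of X_i are x_ij = [1, t_ij, I(j >= tau), t_ij * I(j >= tau)]
X_i <- cbind(1, t_ij, post, t_ij * post)

beta_i <- c(2, -0.2, 0.1, -0.5)   # (beta_i0, beta_i1, delta_i, Delta_i)
mu_ij  <- exp(X_i %*% beta_i)     # g = log, so mu_ij = exp(x_ij' beta_i)

# Level change at the change point: a rate ratio when g is the log link
level_change <- exp(beta_i[3] + beta_i[4] * t_ij[tau])
```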

We model the conditional working variance of $y_{ij}$ given $x_{ij}$ as $V(\mu_{ij}) = \mathrm{Var}[y_{ij} \mid x_{ij}]$ for all $i$ via a quasi-partial likelihood framework. We assume a working correlation structure that follows an auto-regressive process of order one pre- and post-change point conditional on the covariates and τ, i.e.,

$$\mathrm{corr}(y_{i,j-h},\, y_{ij} \mid X_i, \tau) = \begin{cases} (\rho_1(\tau))^h, & j < \tau, \\ (\rho_2(\tau))^h, & j \geq \tau, \end{cases} \qquad (2)$$

with $\rho_1(\tau), \rho_2(\tau) \in (-1, 1)$, for all $i \in \{1, \ldots, N\}$ and $j \in \{2, \ldots, n_i\}$. We include τ in parentheses to illustrate the dependency of the correlation parameters on the change point, τ, but drop it henceforth for simplicity. We assume the adjacent correlations, $\rho_1, \rho_2$, are the same across units. Other working correlation structures are feasible, but we chose an AR(1) structure based on our data setting. Thus, the conditional working covariance matrix of $Y_i$ given $X_i$ can be written as $V_i \equiv V_i(\beta_i, \rho_1, \rho_2) = S_i(\beta_i)^{1/2} R(\rho_1, \rho_2) S_i(\beta_i)^{1/2}$, with $S_i(\beta_i) = \mathrm{diag}\{V(\mu_{ij})\}$ and $R(\rho_1, \rho_2)$ an AR(1) block diagonal working correlation matrix, provided in the Appendix. The quasi-score function for unit $i$ is then given by

$$U_i(\beta_i, \rho_1, \rho_2) = D_i' V_i^{-1} (Y_i - \mu_i), \qquad (3)$$

where $D_i$ denotes the matrix of partial derivatives of the mean function vector, $\mu_i \equiv [\mu_{i1}, \ldots, \mu_{in_i}]'$, with respect to $\beta_i$. The change point and mean function parameters are estimated simultaneously by iteratively solving the quasi-score equation (obtained by setting (3) equal to zero). The adjacent correlation parameters are estimated via method of moments.

2.2. Supremum Wald Test (SWT)

A primary goal of our method is to test for the existence of a change in the outcome of interest over a predetermined set of possible change points. Let $\mathcal{Q} = \{t - m, \ldots, t, \ldots, t + k\}$ denote the set of possible change points, where $m$ and $k$ are non-negative integers predetermined by the researchers. Then, we wish to determine whether a change point exists for any $q \in \mathcal{Q}$. To this end, we propose a supremum Wald test (SWT) for generalized linear models based on the SWT developed in Cruz et al. (2019) for Gaussian-distributed outcomes. Our SWT, therefore, calculates the multivariate Wald test statistic for every $q \in \mathcal{Q}$, implements the Benjamini-Hochberg method to adjust for the multiple comparisons, and results in a binary decision of whether a change point exists or not. Specifically, for each $q \in \mathcal{Q}$ we test:

$$H_0: \delta_i^q = \Delta_i^q = 0 \;\;\forall i \;\text{ and }\; \rho_1(q) = \rho_2(q) \quad \text{(no change point)}$$
$$\text{vs.} \quad H_a: \delta_i^q \neq 0 \text{ and/or } \Delta_i^q \neq 0 \text{ for some } i, \text{ and } \rho_1(q) \neq \rho_2(q) \quad \text{(a change point at } q\text{)}.$$

The alternative hypothesis, Ha, assumes a change point in the mean function for at least one of the units and a change in the correlation structure at q. The multivariate Wald statistic is therefore

$$W = \sum_{i=1}^{N} \big(C\hat{\beta}_i(q)\big)' \big[C \hat{V}_i(\hat{\beta}_i(q)) C'\big]^{-1} \big(C\hat{\beta}_i(q)\big) \;\overset{H_0}{\sim}\; \chi^2_{(2N+1)}, \qquad (4)$$
$$\text{where } C = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad \hat{V}_i(\hat{\beta}_i(q)) = \big[\hat{D}_i' \hat{V}_i^{-1} \hat{D}_i\big]^{-1}.$$

Note, $C$ is a contrast matrix and $\hat{V}_i(\hat{\beta}_i(q))$ is the estimated covariance matrix of $\hat{\beta}_i(q)$ under the alternative, assuming the working correlation structure is correctly modeled. An empirical sandwich estimator may be used to estimate $\hat{V}_i(\hat{\beta}_i(q))$, but empirical results indicate poor small sample performance.
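As a rough illustration of how (4) aggregates evidence across units, the sketch below (R; toy inputs, not values from the patient falls analysis) computes the mean-function portion of the Wald statistic from per-unit estimates and model-based covariance matrices, and indicates where the Benjamini-Hochberg adjustment over the candidate set 𝒬 would enter.

```r
## Sketch of the mean-function contribution to the Wald statistic in (4).
## beta_hat[[i]] and V_hat[[i]] would come from fitting GRITS at a candidate
## change point q; the toy values below are purely illustrative.
C <- rbind(c(0, 0, 1, 0),
           c(0, 0, 0, 1))              # contrasts selecting delta_i and Delta_i

wald_stat <- function(beta_hat, V_hat) {
  W <- 0
  for (i in seq_along(beta_hat)) {
    cb <- C %*% beta_hat[[i]]
    W  <- W + t(cb) %*% solve(C %*% V_hat[[i]] %*% t(C)) %*% cb
  }
  drop(W)
}

beta_hat <- list(c(2.0, -0.2, 0.10, -0.50),
                 c(1.9, -0.2, 0.05, -0.45))   # two hypothetical units
V_hat    <- list(diag(0.01, 4), diag(0.01, 4))
W <- wald_stat(beta_hat, V_hat)

## Repeating this for every q in Q gives a p-value per candidate change point;
## p.adjust(p_values, method = "BH") then applies the Benjamini-Hochberg step.
```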

The version of the SWT discussed detects whether a change point exists in both the mean functions and the correlation structure. It may be of interest to detect a change point solely in the mean functions. In this case, the multivariate Wald test statistic can be altered accordingly by calculating $\hat{V}_i(\hat{\beta}_i(q))$ assuming one overall correlation structure.

Other tests, such as the pseudo-score test (Agresti and Ryu, 2010) or the pseudo-likelihood ratio test (Liang and Self, 1996), may be used in place of our supremum Wald test but would need to be adapted to test for the existence of a change point across a set of possible change points. This would mean using the Pearson-type pseudo-score statistic proposed in Agresti and Ryu (2010) or the pseudo-likelihood ratio statistic (a weighted sum of independent chi-squared variables) proposed in Liang and Self (1996) in place of the Wald statistic, which uses the ratio of the parameter estimate(s) to their standard error(s), while still accounting for multiplicity. Under the null, when testing for differences in mean function parameters, the pseudo-score statistic is asymptotically equivalent to the score and likelihood ratio statistics for multinomial models, the pseudo-likelihood ratio statistic tends to be more conservative and less powerful than the likelihood ratio statistic, and the Wald statistic is asymptotically equivalent to the score and likelihood ratio statistics.

2.3. Parameter Estimation

Post-test parameter estimation depends on the conclusion of the SWT. If the SWT concludes that no change point exists, then GRITS assumes the mean function parameters and the adjacent correlation are the same pre- and post-intervention. That is, GRITS assumes

$$g(\mu_{ij}) = \beta_{i0} + \beta_{i1} t_{ij} \qquad (5)$$

for all $j$, all $i$, some link function g(·), and a working correlation matrix that follows an AR(1) structure, provided in the Appendix. To estimate $\beta_{i0}$, $\beta_{i1}$ and ρ, we implement Algorithm 1, an iterative Newton-Raphson algorithm akin to estimation via generalized estimating equations (GEEs), to obtain estimates of the mean function parameters and the adjacent correlation. Otherwise, if the SWT rejects the null hypothesis of no change point, GRITS is expressed as described in The Generalized Robust ITS Model section and the mean function parameters and adjacent correlations are estimated for each $q \in \mathcal{Q}$. That is, for each $q \in \mathcal{Q}$, we implement Algorithm 1.

Algorithm 1.

Estimating mean function and correlation parameters iteratively for all i

  1: set $\zeta = 1$
  2: set $\hat{\beta}_i^{0} = [0.1, \ldots, 0.1]'$
  3: while $\zeta >$ tol do
  4:   set $k$ to the current iteration number
  5:   set $\hat{\mu}_i = \exp\{X_i \hat{\beta}_i^{k-1}\}$
  6:   obtain Pearson residuals
  7:   from the Pearson residuals calculate adjacent correlation(s)
  8:   obtain $\hat{D}_i$ and $\hat{V}_i$
  9:   calculate $[\hat{D}_i' \hat{V}_i^{-1} \hat{D}_i]^{-1}$
10:   set $\hat{\beta}_i^{k} = \hat{\beta}_i^{k-1} + [\hat{D}_i' \hat{V}_i^{-1} \hat{D}_i]^{-1} U_i(\hat{\beta}_i^{k-1})$
11:   set $\zeta$ to the sum of the Euclidean distances between $\hat{\beta}_i^{k}$ and $\hat{\beta}_i^{k-1}$
12: end while
13: obtain estimated covariances of $\hat{\beta}_i$

The variance-covariance matrix of $Y_i$ conditional on $X_i$, $V_i$, is completely specified by the mean function parameters and the adjacent correlations. The estimator of the mean parameters is provided in Algorithm 1. Estimators of the adjacent correlations are provided in the Appendix.
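For intuition, the fragment below sketches one pass of the Algorithm 1 update for a single unit, assuming a Poisson variance function, a log link, and a fixed working correlation matrix R_w; it is a simplified illustration of the quasi-score step, not the authors' implementation.

```r
## One Newton-type update of the quasi-score equation (3) for unit i,
## assuming V(mu) = mu (Poisson) and g = log. R_w is the working correlation
## matrix (e.g., the block-diagonal AR(1) matrix of Appendix A).
gee_update <- function(beta, y, X, R_w) {
  mu <- drop(exp(X %*% beta))            # inverse log link
  S  <- diag(mu)                         # S_i = diag{V(mu_ij)} with V(mu) = mu
  V  <- sqrt(S) %*% R_w %*% sqrt(S)      # working covariance S^{1/2} R S^{1/2}
  D  <- S %*% X                          # d mu / d beta' = diag(mu) X under a log link
  A  <- t(D) %*% solve(V, D)             # model-based information [D' V^{-1} D]
  U  <- t(D) %*% solve(V, y - mu)        # quasi-score U_i(beta)
  drop(beta + solve(A, U))               # step 10 of Algorithm 1
}

## Iterating gee_update() until the change in beta falls below a tolerance,
## while re-estimating the adjacent correlations from Pearson residuals between
## passes, mirrors the while-loop of Algorithm 1.
```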

Next, we obtain an estimate of the change point by minimizing the quasi-likelihood information criterion under the independence model (QIC) (Pan, 2001). As an alternative, one could maximize the partial likelihood (Cruz et al., 2017, 2019) or the independence quasi-likelihood. We choose to maintain a ‘likelihood free’ estimation procedure, and thus, minimize the QIC. For each $q \in \mathcal{Q}$, define QIC as

$$\mathrm{QIC}(R; q) = -2\sum_{i=1}^{N} Q\big(\hat{\beta}_i(q); I, D_i\big) + 2\sum_{i=1}^{N} \mathrm{trace}\big(\Omega_{I,i}\, \hat{V}_i(\hat{\beta}_i(q))\big), \qquad (6)$$

with $Q(\hat{\beta}_i(q); I, D_i)$ the quasi-likelihood, $\Omega_{I,i}$ the observed Fisher information under the independence working correlation structure for unit $i$, $I$ the independence working correlation matrix (i.e., a matrix with ones on the diagonal and zeros off the diagonal), and $R = R(\rho_1, \rho_2)$ an AR(1) block diagonal working correlation matrix. Then, the estimated change point is:

$$\hat{\tau} = \underset{q \in \mathcal{Q}}{\arg\min}\; \mathrm{QIC}(R; q). \qquad (7)$$

Estimates of the mean function parameters and the adjacent correlations are obtained based on Algorithm 1, conditional on the estimated change point, $\hat{\tau}$.

We use all of the units jointly to estimate parameters ρ1 and ρ2, as well as the change point, thereby borrowing information in the estimation of the adjacent correlations and change point.
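A skeleton of the change point search in (6) and (7) is sketched below: the QIC contribution for one unit under a Poisson variance function, followed by the argmin over the candidate set. The per-candidate fits would come from Algorithm 1; placeholder QIC values stand in for them here, purely for illustration.

```r
## QIC contribution for one unit under a Poisson variance function: the
## quasi-likelihood under independence plus the trace penalty of (6).
qic_unit <- function(y, mu, omega_I, V_hat) {
  Q <- sum(y * log(mu) - mu)              # Poisson quasi-likelihood
  -2 * Q + 2 * sum(diag(omega_I %*% V_hat))
}

candidate_q <- 25:34                      # predetermined set of possible change points
## In practice each entry of qic_values comes from refitting Algorithm 1 at q
## and summing qic_unit() over units; placeholder numbers are used here only
## to show the final argmin step in (7).
qic_values <- c(410, 408, 405, 402, 399, 401, 403, 406, 409, 412)
tau_hat <- candidate_q[which.min(qic_values)]
```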

3. Empirical Studies

We go on to study the operating characteristics of our proposed methodology. Particularly, we examine the type one error rate of the SWT, power to detect specified change point alternatives for the SWT, and accuracy of our proposed change point estimation procedure. As in the Supremum Wald Test section, we test:

$$H_0: \delta_i^q = \Delta_i^q = 0 \;\;\forall i \;\text{ and }\; \rho_1(q) = \rho_2(q) \quad \text{(no change point)}$$
$$\text{vs.} \quad H_a: \delta_i^q \neq 0 \text{ and/or } \Delta_i^q \neq 0 \text{ for some } i, \text{ and } \rho_1(q) \neq \rho_2(q) \quad \text{(change point at } q\text{)},$$

for each $q \in \mathcal{Q}$. The Appendix provides empirical studies for the case when the alternative hypothesis assumes a change point solely in the mean functions.

We set the simulated data parameters to values based on our patient falls data and generated correlated count ITS via the GenOrd package in R (Barbiero and Ferrari, 2015). We assumed the canonical link function for a Poisson distribution, g(·) = log(·), and the same mean function and number of measurements ($n_i = n$) for all units in all simulated data settings. Importantly, we considered four different values of the total number of units, N ∈ {1, 3, 5, 10}, in order to compare the gains in efficiency obtained by borrowing information across various units.
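The simulations use the GenOrd package to generate correlated counts; a simple alternative data-generating sketch with the same flavor is shown below, drawing a latent AR(1) Gaussian series and mapping it through a Gaussian copula to Poisson margins. This is an assumption-laden stand-in, not the authors' exact simulation code, and the induced count correlation is only approximately the latent ρ.

```r
## Hedged sketch: AR(1)-correlated Poisson counts via a Gaussian copula.
## (The paper uses the GenOrd package; this is a simplified stand-in.)
set.seed(1)
n    <- 60
rho  <- 0.2                                   # latent AR(1) correlation
t_j  <- (1:n) / n
mu   <- exp(2 - 0.2 * t_j)                    # Poisson means under the null model

z <- as.numeric(arima.sim(list(ar = rho), n = n))  # latent AR(1), innovation sd 1
u <- pnorm(z * sqrt(1 - rho^2))               # standardize to Uniform(0, 1)
y <- qpois(u, lambda = mu)                    # counts with approximate AR(1) dependence
```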

3.1. Empirical Type One Error of the SWT

To examine the type one error rate of the SWT we generated 10,000 correlated count ITS of length n ∈ {60, 120} under the null hypothesis of no change point for three values of the adjacent correlation, ρ ∈ {0.1, 0.2, 0.4}. We assumed $\beta_{i0} = 2$ and $\beta_{i1} = -0.2$ for all $i$. When n = 60 the set of possible change points was {25, 26, …, 34} and when n = 120 the set of possible change points was {50, 51, …, 69}. We compared two values of n to illustrate the impact of doubling the time series length on the type one error rate. With regard to the adjacent correlation values, 0.1 was a hypothesized upper bound for ρ in our patient falls data and 0.4 a large adjacent correlation value for count data. Type one error rates for the six scenarios are included in Table 1. For the cases when N = 1 or ρ = 0.4 and n = 60, the empirical type one error rates were large. This is likely due to a small effective sample size. For all other scenarios the type one error rates were relatively well behaved, albeit better behaved as the number of units and the length of the time series increased.

Table 1.

Type one error rates for the SWT testing the existence of a change point in the mean function and correlation structure.

Empirical Type One Error Rate
n = 60 n = 120
ρ 1 Unit 3 Units 5 Units 10 Units 1 Unit 3 Units 5 Units 10 Units
0.1 0.0730 0.0533 0.0507 0.0472 0.0450 0.0410 0.0358 0.0364
0.2 0.0788 0.0588 0.0535 0.0510 0.0473 0.0330 0.0399 0.0388
0.4 0.0859 0.0596 0.0576 0.0543 0.0567 0.0440 0.0391 0.0418

3.2. Empirical Power of the SWT

We generated 10,000 correlated count ITS of length n ∈ {60, 120} under the alternative hypothesis of a change point in the mean function and correlation structure. The change point was placed in the middle of the time series, at time point 31 if n = 60 and at 61 if n = 120, and the sets of possible change points were assumed to be $\mathcal{Q}_{60} = \{25, 26, \ldots, 34\}$ and $\mathcal{Q}_{120} = \{50, 51, \ldots, 69\}$. As in the previous section, we considered N ∈ {1, 3, 5, 10} to illustrate the gains in power obtained by borrowing information across units. We set the adjacent correlations to $(\rho_1, \rho_2) \in \{(0.1, 0.2), (0.2, 0.3), (0.4, 0.5)\}$, and assumed $\beta_{i0} = 2$, $\beta_{i1} = -0.2$ and $\delta_i = 0$ for all $i$.

For brevity, we examined power as a function of the change in slope, provided in Figure 2. We expect similar results for power as a function of the change in intercept. We note that empirical power decreased as the adjacent correlations increased and increased as the length of the time series increased, as expected. Additionally, empirical power increased as the number of units increased. Therefore, there was a significant gain in power obtained by borrowing information across units and a lesser yet substantial gain in power as the length of the time series increased.

Fig. 2. Plots empirical power of the SWT as a function of the change in slope for n = 60 in the first column and n = 120 in the second column. The values of the change in slope ranged between −0.8 and 0.8.

3.2.1. Accuracy of Change Point Estimation Procedure

In addition to power, we are interested in the ability of our change point estimation procedure to correctly estimate the true change point when the SWT concludes a change point does indeed exist. The proportion of simulations that correctly estimated the change point within one unit of the truth when the SWT concluded a change point did exist is included in Figure 3 for all scenarios considered. Similar to the empirical power results, accuracy of our change point estimation procedure increased as the adjacent correlation decreased and as the number of units increased. Thus, a gain in accuracy occurred when information was borrowed across units.

Fig. 3. The first column plots accuracy of our change point estimation procedure as a function of the change in slope for n = 60 and the second column for n = 120. The values of the change in slope ranged between −0.8 and 0.8. Note that accuracy is defined as the proportion of simulations that estimate the change point to be within one time point of the true change point after rejecting the null hypothesis that a change point does not exist via the SWT. For Δ = 0 (the model with no change point), we did not calculate change point accuracy.

We note that, as the length of the time series increased, accuracy seemed to decrease. This anomaly may be explained by the size of the set of possible change points. In our simulation studies, we doubled the cardinality of the set of possible change points along with the time series length, thus the change point search space grew as n increased. The larger change point search space may have in turn decreased accuracy by increasing the number of plausible change points.

3.2.2. Coverage Probabilities of Bootstrap Confidence Intervals

We were also interested in quantifying the variability of the change point, and as such, conducted simulations that implemented a bootstrap to obtain 95% confidence intervals for the change point, following the process proposed in Hušková and Kirch (2008) and adapting it for more than one unit or time series. In the case with one unit, we implemented a circular block bootstrap (Politis and Romano, 1991) of the centered estimated errors to reconstruct a bootstrap time series using the parameter estimates of the original generated time series. In the case with more than one unit, we first sampled units with replacement and then carried out the circular block bootstrap of the centered estimated errors for each sampled unit. Then, in both cases, we utilized the difference between the change point estimate of the bootstrap time series and the change point estimate of the original generated time series to approximate the difference between the true change point and its estimate. The empirical distribution of the difference allowed us to construct confidence intervals for the change point, taking into account that the change point is not known and had to be estimated (Hušková and Kirch, 2008).
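A minimal sketch of the single-unit circular block bootstrap step is given below (in R); the inputs are hypothetical centered residuals, and rebuilding and refitting the bootstrap series are described only in comments.

```r
## Circular block bootstrap of centered residuals for one unit
## (Politis and Romano, 1991); a simplified sketch, not the exact implementation.
circular_block_bootstrap <- function(resid, block_len) {
  n        <- length(resid)
  wrapped  <- c(resid, resid[seq_len(block_len - 1)])   # wrap the series circularly
  n_blocks <- ceiling(n / block_len)
  starts   <- sample.int(n, n_blocks, replace = TRUE)   # random block start points
  boot     <- unlist(lapply(starts, function(s) wrapped[s:(s + block_len - 1)]))
  boot[seq_len(n)]                                      # trim to the original length
}

set.seed(1)
resid      <- rnorm(60)                                 # placeholder centered residuals
boot_resid <- circular_block_bootstrap(resid, block_len = 3)
## A bootstrap series is then reconstructed from the fitted means plus the
## resampled residuals, refit with GRITS, and tau_hat* - tau_hat approximates
## the sampling error tau_hat - tau.
```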

We set the block length to $n^{1/5}$, approximately 3 for both time series lengths n = 60 and n = 120. For estimating a symmetrical distribution, the asymptotically optimal block length (in terms of minimizing asymptotic mean-square error) is proportional to $n^{1/5}$ (Härdle et al., 2003). In preliminary simulations not included here, we set the block length to $2 \times n^{1/5}$, but coverage probabilities were worse. We only considered the case with $\rho_1 = 0.1$ and $\rho_2 = 0.2$, as that is the closest to our data application adjacent correlation values.

We selected six values of the change in slope for which to conduct these simulations and generated 1,000 time series under the alternative for the six change in slope values. For each generated time series, we created 100 bootstrap samples to obtain 95% confidence intervals. These simulations were computationally time-intensive and as such we were unable to conduct them for more generated time series, bootstrap samples, or change in slope values. We considered $\Delta_i = \Delta \in \{-0.7, -0.5, -0.2, 0.2, 0.5, 0.7\}$ for all $i$. All other parameters were set based on the power simulation parameter values.

The coverage probabilities for the bootstrap change point 95% confidence intervals increased as the number of units increased and as the number of time points increased. For the cases with 10 units and the larger slope change values (e.g., −0.7 and 0.7), coverage probabilities of the change point confidence intervals were relatively well behaved; see Table 2. This is expected, as the empirical power was higher for more units and larger change in slope values. We also provide coverage probabilities for the other model parameters in Table 2. We note that coverage probabilities were reasonable (given that the bootstrap sample size was set to 100) for the other parameters except the baseline intercept, β0, and the adjacent correlation post-intervention, ρ2. The bootstrap procedure used would likely capture the change in correlation more adequately with more time points pre- and post-intervention or if we had separately sampled the blocks in the pre- and post-periods. In terms of the baseline intercept, we believe the coverage probabilities were so low because we applied the circular block bootstrap directly on the residuals of our time series without centering the time series. Coverage probabilities for the baseline intercept may have improved if the time series were centered around zero.

Table 2.

Coverage probabilities of the circular block bootstrap 95% confidence intervals for the case with the same mean function across units and a change in the mean functions and correlation structure. Note, $\Delta_i = \Delta \in \{-0.7, -0.5, -0.2, 0.2, 0.5, 0.7\}$ for all $i$ in this scenario.

Coverage Probabilities for Bootstrap Confidence Intervals

   n = 60 n = 120

Δ Parameter Truth Units: 1 3 5 10 Units: 1 3 5 10
−0.7 τ 31 if n = 60
61 if n = 120
0.539 0.708 0.809 0.92 0.635 0.851 0.918 0.965
β 0 2 0.186 0.001 0 0 0.002 0 0 0
β 1 −0.2 0.998 0.998 0.999 1 0.997 1 0.996 1
δ 0 0.996 0.998 0.999 0.999 0.993 0.993 0.992 0.974
Δ −0.7 0.999 0.998 1 1 0.998 0.999 0.999 1
ρ 1 0.1 0.989 0.995 0.989 0.97 0.995 0.995 0.994 0.995
ρ 2 0.2 0.943 0.828 0.639 0.208 0.952 0.824 0.618 0.147

−0.5 τ 31 if n = 60
61 if n = 120
0.496 0.602 0.703 0.827 0.551 0.721 0.8 0.912
β 0 2 0.173 0 0 0 0 0 0 0
β 1 −0.2 0.997 0.999 0.998 0.999 0.998 0.996 0.997 0.999
δ 0 0.992 0.997 0.998 0.998 0.989 0.989 0.991 0.989
Δ −0.5 0.994 0.998 0.999 0.999 0.995 0.998 0.997 0.999
ρ 1 0.1 0.988 0.987 0.988 0.953 0.999 1 0.999 0.995
ρ 2 0.2 0.921 0.719 0.503 0.116 0.938 0.715 0.429 0.069

−0.2 τ 31 if n = 60
61 if n = 120
0.424 0.447 0.459 0.558 0.438 0.459 0.551 0.65
β 0 2 0.147 0.001 0 0 0 0 0 0
β 1 −0.2 0.992 0.998 0.998 1 0.995 0.995 0.999 1
δ 0 0.986 0.997 0.996 0.999 0.981 0.995 0.997 0.997
Δ −0.2 0.998 0.995 0.999 0.999 0.992 0.996 0.995 0.998
ρ 1 0.1 0.993 0.985 0.978 0.927 0.994 0.996 0.995 0.988
ρ 2 0.2 0.895 0.62 0.34 0.039 0.884 0.558 0.263 0.009

0.2 τ 31 if n = 60
61 if n = 120
0.412 0.44 0.477 0.563 0.437 0.492 0.559 0.653
β 0 2 0.191 0.001 0 0 0.003 0 0 0
β 1 −0.2 0.991 0.993 0.994 0.999 0.991 0.993 0.993 0.992
δ 0 0.981 0.984 0.99 0.998 0.984 0.996 0.994 0.99
Δ 0.2 0.99 0.993 0.997 0.998 0.987 0.998 1 1
ρ 1 0.1 0.987 0.993 0.969 0.925 0.996 0.993 0.986 0.977
ρ 2 0.2 0.883 0.595 0.289 0.023 0.878 0.497 0.191 0.005

0.5 τ 31 if n = 60
61 if n = 120
0.495 0.652 0.758 0.872 0.594 0.791 0.844 0.926
β 0 2 0.242 0.003 0 0 0.006 0 0 0
β 1 −0.2 0.991 0.984 0.989 0.985 0.982 0.98 0.98 0.965
δ 0 0.986 0.992 0.989 0.991 0.981 0.983 0.984 0.966
Δ 0.5 0.986 0.993 0.994 0.992 0.987 0.993 0.992 0.993
ρ 1 0.1 0.995 0.975 0.963 0.904 0.995 0.988 0.991 0.97
ρ 2 0.2 0.894 0.589 0.296 0.026 0.87 0.472 0.204 0.006

0.7 τ 31 if n = 60
61 if n = 120
0.594 0.768 0.874 0.953 0.684 0.849 0.927 0.973
β 0 2 0.288 0.007 0 0 0.022 0 0 0
β 1 −0.2 0.989 0.981 0.983 0.977 0.968 0.978 0.968 0.967
δ 0 0.987 0.992 0.983 0.976 0.975 0.973 0.945 0.867
Δ 0.7 0.991 0.988 0.989 0.975 0.986 0.982 0.987 0.984
ρ 1 0.1 0.99 0.98 0.966 0.904 0.995 0.991 0.991 0.97
ρ 2 0.2 0.908 0.656 0.4 0.036 0.886 0.547 0.223 0.005

4. Results

We assessed the impact of the CNL integrated care delivery model on patient falls in six clinical care units. Our primary goal was to determine whether the CNL intervention was associated with a change in patient falls. To that end, our proposed GRITS model tested for the existence of a change point in patient falls between the nurses' introduction into their respective hospital units (January 2010) and three months after the formal intervention (October 2010). GRITS concluded, based on the SWT, that there was a change point in patient falls between January 2010 and October 2010 for the clinical care units at the α = 0.05 level. Thus, we modeled patient falls with a change point.

We were therefore interested in determining the time lag between the onset of the intervention and the intervention's effect. GRITS estimated a preemptive CNL intervention effect on patient falls. The estimated change point occurred one month after the nurses' introduction into their respective hospital units, in February 2010. This indicates that the nurses were implementing their CNL training prior to the formal intervention and is critical knowledge with regard to future study planning.

As patient falls are count data, we assumed the canonical link for a Poisson distribution, g(·) = log(·), throughout our modeling procedure and supposed $S_i^{1/2} = \mathrm{diag}(\mu_i)^{1/2}$ for all $i$ in our working covariance matrix. We, thus, discuss mean function parameters in terms of rates. Table 3 and Table 4 provide exponentiated estimates, 95% confidence intervals and p-values for the intercept and slope pre-estimated change point and for the level change, trend change and slope post-estimated change point, respectively. The 95% confidence intervals were obtained via a normal approximation with variance estimated by the inverse of the observed Fisher information. An empirical sandwich estimator could have been used to estimate the variance, but, as stated in the Methods section, empirical results indicate poor small sample performance. We did not include bootstrap confidence intervals due to the inconsistent coverage probabilities obtained in our empirical studies.

Table 3.

Provides estimates of the exponentiated intercept and slope pre-estimated change point, as well as corresponding 95% confidence intervals and p-values. The pre-estimated change point slope is given (scaled) in terms of six-month comparisons, thus the inclusion of the $6/n_i$ term.

Intercept Pre-Change Point, $\exp(\hat{\beta}_{i0}^{\hat{\tau}})$ (95% CI)
Slope Pre-Change Point, $\exp(\hat{\beta}_{i1}^{\hat{\tau}} \cdot 6/n_i)$ (95% CI)
Stroke 1.42 (0.83, 2.42) 12.94 (1.9, 88.35)**
Surgical 2.53 (1.63, 3.94)*** 2.67 (0.49, 14.62)
Acute Care 4.44 (3.15, 6.27)*** 1.53 (0.39, 5.99)***
Pulmonary 6.65 (5.15, 8.59)*** 7.27 (2.83, 18.68)
Cardiac 3.11 (1.95, 4.97)*** 0.16 (0.02, 1.27)
Medical Surgical 2.49 (1.5, 4.13)*** 0.29 (0.03, 2.55)
** p-value < .01; *** p-value < .001

Table 4.

Provides estimates of the exponentiated level change, trend change and slope post-estimated change point, as well as corresponding 95% confidence intervals and p-values. The pre- and post-estimated change point slopes are given (scaled) in terms of six-month comparisons, thus the inclusion of the $6/n_i$ term.

Level Change, $\exp(\hat{\delta}_i^{\hat{\tau}} + \hat{\Delta}_i^{\hat{\tau}} t_{i\hat{\tau}})$ (95% CI)
Trend Change, $\exp(\hat{\Delta}_i^{\hat{\tau}})$ (95% CI)
Slope Post-Change Point, $\exp([\hat{\beta}_{i1}^{\hat{\tau}} + \hat{\Delta}_i^{\hat{\tau}}] \cdot 6/n_i)$ (95% CI)
Stroke 1.49 (0.87, 2.56) 0.19 (0.02, 1.68) 2.52 (0.94, 6.74)
Surgical 1.62 (0.92, 2.84) 0.5 (0.06, 3.97) 1.33 (0.41, 4.35)
Acute Care 2.14 (1.29, 3.54)** 1.26 (0.22, 7.28) 1.94 (0.64, 5.82)
Pulmonary 2.08 (1.5, 2.88)*** 0.04 (0.01, 0.13)*** 0.27 (0.12, 0.63)**
Cardiac 0.69 (0.34, 1.4) 9.39 (0.81, 108.52) 1.47 (0.41, 5.24)
Medical Surgical 1.48 (0.64, 3.4) 7.09 (0.43, 116.5) 2.02 (0.35, 11.6)
** p-value < .01; *** p-value < .001

We estimated that the rate of patient falls at the beginning of the observational period (January 2008) ranged from 1.42 to 6.65 per 1000 patient days per month for the six clinical units. For the Stroke, Surgical and Pulmonary units, we estimated that the rate of patient falls comparing time points six months apart was larger in the pre-estimated change point phase than in the post-estimated change point phase. The Stroke and Pulmonary units' estimated rates of patient falls comparing time points six months apart were, respectively, 12.98 and 7.27 in the pre-intervention phase and were statistically significant, suggesting a decrease in the rate of patient falls post-intervention. For the Pulmonary unit, the estimated rate of patient falls comparing time points six months apart changed from 7.27 pre-intervention to 0.27 post-intervention. In both phases, the estimated rate of patient falls was statistically significant, suggesting that the rate of patient falls increased pre-estimated change point and decreased post-estimated change point. Figure 4 plots the time series for the six clinical care units along with the unit-specific estimated mean functions.

Fig. 4. Provides estimated mean functions for the log of patient falls for the six clinical care units. Note that, for the purpose of depicting the time series, we add 0.5 to patient falls when patient falls is equal to zero, making the log of patient falls equal to −0.69 and giving rise to the negative points in the plots. We did not use the log of patient falls in the estimation procedure, and so, this jittering does not affect model parameter estimates.

Recall, the measures used in the ITS literature are level change and trend change. The level change in this setting was defined as $\exp(\delta_i + \Delta_i t_{i\tau})$ for unit $i$, which is a rate ratio. Quantifying the slope change in an informative manner in this setting is difficult, i.e., $\exp(\Delta_i)$ does not translate to a tangible quantity. Nevertheless, we included an estimate of $\exp(\Delta_i)$ in Table 4. With regard to level change: the estimated rate of patient falls was between 1.48 and 2.14 times as high at the estimated change point, comparing the post-intervention mean function to the projected pre-intervention mean function, for five out of the six units, indicating an immediate increase in the rate of patient falls. The increase in the estimated rate for the five units may be attributed to the disruption of the underlying care processes in the clinical care units. In the Cardiac unit, we estimated that the rate of patient falls at the change point was approximately 31% lower than that of the projected pre-intervention mean function.

GRITS estimated that the adjacent correlation prior to the estimated change point was $\hat{\rho}_1(\hat{\tau}) = -0.090$ [95% confidence interval (−0.317, 0.146)] and $\hat{\rho}_2(\hat{\tau}) = -0.035$ [95% confidence interval (−0.278, 0.212)] post-estimated change point. The estimated adjacent correlations were small and relatively close to zero, suggesting minimal temporal dependency in both phases. This is consistent with the manner in which patient falls was collected; different patients were likely sampled every month.

5. Discussion

Health care ITS are often composed of non-continuous outcome measurements; i.e., many health care outcomes of interest are binary, counts, or rates (e.g., nurse turnover, number of beds, and patient falls). While segmented regression is presented as being able to model these outcomes, to the best of our knowledge, no formal statistical process is provided. The GRITS model was developed to address this deficiency in the literature. GRITS is appropriate for outcomes whose underlying distribution belongs to the exponential family of distributions, thereby expanding the available methodology to adequately model binary outcomes, counts and rates. In addition, GRITS is able to formally test for the existence of and estimate the change point, borrow information across units in multi-unit settings, and test for differences in the mean function and correlation pre- and post-intervention. Researchers using GRITS can adequately model many of the discrete outcomes used by the Centers for Medicare and Medicaid Services for reimbursement purposes, by taking into account the proper mean-variance relationship of the outcome.

The methodology proposed estimates a global over-all-units change point via a grid search over a pre-specified set of possible change points. Researchers must specify the set of possible change points with care, as we must be cautious of competing intervention effects in ITS designs. Identification of a change point via our proposed procedure relies upon detection of a difference in the mean functions or adjacent correlation of the outcome post-change point. If no change point exists, this would indicate that there is no difference in the mean functions or the correlation structure of the outcome over time and across units.

The number of mean function parameters in GRITS increases as the number of units included in the analysis increases. Consistency and asymptotic normality of the model parameters hold if the lengths of the time series are large enough for each unit individually. For unit-specific parameters, consistency depends upon the length of the time series, as in traditional single-unit ITS models; see Zhang, Wagner and Ross-Degnan (2011) for sample size considerations for continuous outcomes and Liu et al. (2020) for sample size considerations for count outcomes, for a single unit. For shared parameters, consistency will depend upon the number of units, the length of the time series, and their ratio. That is, as the number of units goes to infinity, the ratio of the length of the time series to the number of units needs to converge to a constant. If each individual unit has a large enough time series length – see Zhang, Wagner and Ross-Degnan (2011) or Liu et al. (2020) for guidance – then consistency will hold even as the number of units increases, provided the total number of observations (or the time series length) per unit increases at a much faster rate.

GRITS exists in the ITS framework, and as such cannot attribute an observed effect to the intervention of interest, as is the case with most ITS designs (Bernal et al., 2018), but it can inform future study development and measures. Through our GRITS analysis of the CNL intervention we were able to discern that nurses in the six clinical care units may have begun implementing their CNL training prior to the formal intervention; future CNL studies may be altered to reflect this finding. As noted in the Results section, interpreting the level change and trend change when g(·) is not the identity link may be difficult, depending on the choice of g(·). Measures able to discern the intervention effects more clearly are needed for discrete ITS, so that GRITS (and other methods beyond simple linear segmented regression) can properly capture existing relationships.

The methodology proposed in this paper does not account for heterogeneity of change points across units in situations where the data warrant such treatment. In the future, we will develop ITS mixed effects models as alternatives to these methods, able to detect unit-specific change points and borrow information across units while allowing for change point heterogeneity. With these models, researchers will be able to make inference for the overall population of hospital units and quantify unit-specific deviations from the population trajectories.

Acknowledgements

We thank Dr. Miriam Bender (University of California Irvine) for providing us with the patient falls data and allowing us to use it in this article.

University of California Irvine Eugene Cota-Robles Fellowship; National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1321846; National Science Foundation MMS 1461534 and DMS 1509023 grants; National Institute on Aging of the National Institutes of Health R01AG053555 and P50AG16573; National Institute of Mental Health of the National Institutes of Health MH115697

A. Working Correlation Matrices for GRITS

The working correlation matrix of Section 2.1, R, that assumes a change point in the correlation structure at τ, is:

$$R = \begin{bmatrix} \begin{bmatrix} 1 & \rho_1 & \cdots & (\rho_1)^{\tau-2} \\ \rho_1 & 1 & \cdots & (\rho_1)^{\tau-3} \\ \vdots & & \ddots & \vdots \\ (\rho_1)^{\tau-2} & (\rho_1)^{\tau-3} & \cdots & 1 \end{bmatrix} & \mathbf{0} \\ \mathbf{0} & \begin{bmatrix} 1 & \rho_2 & \cdots & (\rho_2)^{n-\tau} \\ \rho_2 & 1 & \cdots & (\rho_2)^{n-\tau-1} \\ \vdots & & \ddots & \vdots \\ (\rho_2)^{n-\tau} & (\rho_2)^{n-\tau-1} & \cdots & 1 \end{bmatrix} \end{bmatrix},$$

with ρ1 and ρ2 denoting the adjacent correlations in the pre- and post-change point phases, respectively.

If the SWT concludes no change point exists, the working correlation matrix GRITS assumes is:

$$\begin{bmatrix} 1 & \rho & \cdots & \rho^{n-1} \\ \rho & 1 & \cdots & \rho^{n-2} \\ \vdots & & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & 1 \end{bmatrix},$$

with ρ denoting the adjacent correlation.
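A short sketch of constructing these working correlation matrices in R (dimensions and parameter values hypothetical) follows.

```r
## AR(1) block: entry (j, k) equals rho^|j - k|
ar1_block <- function(rho, m) rho^abs(outer(1:m, 1:m, "-"))

## Block-diagonal working correlation matrix R with a change point at tau:
## an AR(1) block with rho1 for j < tau and an AR(1) block with rho2 for j >= tau.
build_R <- function(n, tau, rho1, rho2) {
  R <- matrix(0, n, n)
  R[1:(tau - 1), 1:(tau - 1)] <- ar1_block(rho1, tau - 1)
  R[tau:n, tau:n]             <- ar1_block(rho2, n - tau + 1)
  R
}

R_change    <- build_R(n = 60, tau = 31, rho1 = 0.1, rho2 = 0.2)  # change point case
R_no_change <- ar1_block(rho = 0.1, m = 60)                       # no change point case
```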

B. Estimators of the Adjacent Correlations for GRITS

Let $r_{ij} = \dfrac{y_{ij} - \hat{\mu}_{ij}}{\sqrt{V(\hat{\mu}_{ij})}}$ denote the Pearson residuals. If the SWT concludes a change point does not exist, the estimator of the adjacent correlation, ρ, is:

$$\hat{\rho}_i = \frac{\sum_{j=2}^{n}\big[(r_{i,j} - \bar{r}_{2:n})(r_{i,j-1} - \bar{r}_{1:n-1})\big]}{\sum_{j=1}^{n}\big[(r_{i,j} - \bar{r}_{1:n})^2\big]}, \qquad \hat{\rho} = \frac{1}{N}\sum_{i=1}^{N} \hat{\rho}_i.$$

Otherwise, when the SWT concludes a change point does exist, the estimators of ρ1 and ρ2 are:

$$\hat{\rho}_{1,i} = \frac{\sum_{j=2}^{\tau-1}\big[(r_{i,j} - \bar{r}_{2:\tau-1})(r_{i,j-1} - \bar{r}_{1:\tau-2})\big]}{\sum_{j=1}^{\tau-1}\big[(r_{i,j} - \bar{r}_{1:\tau-1})^2\big]}, \qquad \hat{\rho}_1 = \frac{1}{N}\sum_{i=1}^{N}\hat{\rho}_{1,i},$$
$$\hat{\rho}_{2,i} = \frac{\sum_{j=\tau}^{n}\big[(r_{i,j} - \bar{r}_{\tau:n})(r_{i,j-1} - \bar{r}_{\tau-1:n-1})\big]}{\sum_{j=\tau}^{n}\big[(r_{i,j} - \bar{r}_{\tau:n})^2\big]}, \qquad \hat{\rho}_2 = \frac{1}{N}\sum_{i=1}^{N}\hat{\rho}_{2,i}.$$
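The sketch below computes these moment estimators for a single unit from its Pearson residuals (in R; hypothetical inputs); unit-level estimates are then averaged across units as in the expressions above.

```r
## Moment estimators of the adjacent correlations for one unit, matching the
## expressions above; r is the vector of Pearson residuals and tau the change point.
adjacent_corr <- function(r, tau) {
  n    <- length(r)
  pre  <- r[1:(tau - 1)]
  rho1 <- sum((pre[2:(tau - 1)] - mean(pre[2:(tau - 1)])) *
              (pre[1:(tau - 2)] - mean(pre[1:(tau - 2)]))) /
          sum((pre - mean(pre))^2)
  post <- r[tau:n]
  lag1 <- r[(tau - 1):(n - 1)]
  rho2 <- sum((post - mean(post)) * (lag1 - mean(lag1))) /
          sum((post - mean(post))^2)
  c(rho1 = rho1, rho2 = rho2)
}
## Averaging the unit-level values over i gives the pooled estimators used by GRITS.
```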

C. Empirical Studies II

In Section 3, we provide empirical studies examining the operating characteristics of our proposed methodology when testing whether a change point exists in both the mean functions and correlation structure. It may be of interest to test whether a change point exists solely in the mean functions. We, therefore, go on to provide empirical studies for the case when the alternative hypothesis assumes a change point in the mean functions but not the correlation structure. That is, for each $q \in \mathcal{Q}$, we test:

$$H_0: \delta_i^q = \Delta_i^q = 0 \;\;\forall i \quad \text{(no change point)} \qquad \text{vs.} \qquad H_a: \delta_i^q \neq 0 \text{ and/or } \Delta_i^q \neq 0 \text{ for some } i \quad \text{(a change point at } q\text{)}.$$

We again set the simulated data parameters to values based on our patient falls data, generated correlated count ITS via the GenOrd package in R (Barbiero and Ferrari, 2015), assumed the canonical link function for a Poisson distribution, g(·) = log(·), and used the same mean function for all units. Since the focus is on detecting a change point in the mean functions, we assumed one overall AR(1) correlation structure for the entire observational period. As in Section 3, we considered four different values of the total number of units, N ∈ {1, 3, 5, 10}, to compare the gains in efficiency obtained by borrowing information across units.

C.1. Empirical Type One Error of the SWT

To examine the type one error rate of the SWT, we once again generated 10,000 correlated count ITS of length n ∈ {60, 120} under the null hypothesis of no change point. We considered three values of the adjacent correlation, ρ ∈ {0.1, 0.2, 0.4}, and assumed $\beta_{i0} = 2$ and $\beta_{i1} = -0.2$ for all $i$. When n = 60 the set of possible change points was {25, 26, …, 34} and when n = 120 the set of possible change points was {50, 51, …, 69}. Type one error rates for the six scenarios are included in Table 5. Results were consistent with those obtained in Section 3: empirical type one error rates were large when N = 1 or ρ = 0.4 and n = 60, and relatively well behaved otherwise.

Table 5.

Type one error rates for the SWT testing the existence of a change point solely in the mean functions.

Empirical Type One Error Rate
n = 60 n = 120
ρ 1 Unit 3 Units 5 Units 10 Units 1 Unit 3 Units 5 Units 10 Units
0.1 0.0674 0.0481 0.0440 0.0420 0.0426 0.0386 0.0341 0.0341
0.2 0.0705 0.0511 0.0478 0.0450 0.0449 0.0313 0.0363 0.0362
0.4 0.0750 0.0514 0.0504 0.0465 0.0532 0.0431 0.0391 0.0385

C.2. Empirical Power of the SWT

We generated 10,000 correlated count ITS of length n ∈ {60, 120} under the alternative hypothesis of a change point in the mean function. The change point was again placed in the middle of the time series, at time point 31 if n = 60 and at 61 if n = 120, and the sets of possible change points were assumed to be $\mathcal{Q}_{60} = \{25, 26, \ldots, 34\}$ and $\mathcal{Q}_{120} = \{50, 51, \ldots, 69\}$. We also considered three scenarios for the adjacent correlations, $(\rho_1(\tau), \rho_2(\tau)) \in \{(0.1, 0.2), (0.2, 0.3), (0.4, 0.5)\}$, and assumed $\beta_{i0} = 2$, $\beta_{i1} = -0.2$ and $\delta_i = 0$ for all $i$.

Empirical power as a function of the change in slope is provided in Figure 5. Results were consistent with those from Section 3: empirical power decreased as the adjacent correlations increased and increased as the length of the time series and the number of units increased. Once again, there was a significant gain in power obtained by borrowing information across units and a lesser yet substantial gain in power as the length of the time series increased.

Fig. 5. Plots empirical power of the SWT (testing for a change solely in the means) as a function of the change in slope for n = 60 in the first column and n = 120 in the second column. The values of the change in slope ranged between −0.8 and 0.8.

C.2.1. Accuracy of Change Point Estimation Procedure

We are additionally interested in the ability of our change point estimation procedure to correctly estimate the true change point when the SWT concludes a change point exists. Figure 6 plots the proportion of simulations that correctly estimated the change point within one unit of the true change point when the SWT concluded that a change point did indeed exist, for all scenarios considered. As in Section 3.2.1, accuracy of our change point estimation procedure increased as the adjacent correlation decreased and as the number of units increased, and accuracy decreased as the number of response measurements increased. The latter finding may be explained by the fact that the cardinality of the set of possible change points doubled when we doubled the length of the time series; the set of possible change points was about 15% of the time series, regardless of time series length. The larger search space may have in turn decreased accuracy. Nonetheless, a gain in accuracy occurred when information was borrowed across units.

Fig. 6. The first column plots accuracy of our change point estimation procedure (when a change solely in the means is assumed) as a function of the change in slope for n = 60 and the second column for n = 120. The values of the change in slope ranged between −0.8 and 0.8. Note that accuracy is defined as the proportion of simulations that estimate the change point to be within one time point of the true change point after rejecting the null hypothesis that a change point does not exist via the SWT. For Δ = 0 (the model with no change point), we did not calculate change point accuracy.

C.2.2. Coverage Probabilities of Bootstrap Confidence Intervals

We again conducted simulations that implemented a circular block bootstrap to obtain 95% confidence intervals for the change point following the process proposed in Hušková and Kirch (2008) for six values of the change in slope. Here, we consider the case when there is no change point in the correlation structure and assume ρ = 0.1. We still set the block length to 3, let the change in slope be $\Delta_i = \Delta \in \{-0.7, -0.5, -0.2, 0.2, 0.5, 0.7\}$ for all $i$, generated 1,000 time series, and created 100 bootstrap samples for each generated time series to obtain 95% confidence intervals. All other parameters were set based on the corresponding power simulations.

The coverage probabilities behave similarly to those in Section 3.2.2 and in fact tend to be closer to the desired 0.95 mark, with the exception of the baseline intercept and adjacent correlations. The coverage probabilities of the bootstrap confidence intervals seem even worse for the baseline intercept here than in Section 3.2.2 and the coverage probabilities for the 95% confidence intervals of the adjacent correlation decrease as the number of units increases.

Table 6.

Coverage probabilities of the circular block bootstrap change point 95% confidence intervals for the case with the same mean function across units and a change solely in the mean functions. Note, $\Delta_i = \Delta \in \{-0.7, -0.5, -0.2, 0.2, 0.5, 0.7\}$ for all $i$ in this scenario.

Coverage Probabilities for Change Point Confidence Intervals

   n = 60 n = 120

Δ Parameter Truth Units: 1 3 5 10 Units: 1 3 5 10
−0.7 τ 31 if n = 60
61 if n = 120
0.577 0.761 0.845 0.921 0.666 0.848 0.918 0.953
β 0 2 0.161 0 0 0 0.001 0 0 0
β 1 −0.2 0.998 0.997 0.999 0.997 0.997 0.995 0.997 0.999
δ 0 0.993 0.999 0.997 1 0.992 0.985 0.988 0.976
Δ −0.7 0.996 1 0.999 1 0.997 1 0.999 1
ρ 0.1 0.959 0.965 0.967 0.957 0.979 0.99 0.986 0.978

−0.5 τ 31 if n = 60
61 if n = 120
0.499 0.609 0.702 0.833 0.554 0.756 0.828 0.913
β 0 2 0.159 0 0 0 0.001 0 0 0
β 1 −0.2 0.992 0.998 0.998 1 0.992 0.997 0.998 0.999
δ 0 0.995 0.999 0.999 1 0.993 0.99 0.992 0.995
Δ −0.5 0.999 0.999 1 1 0.998 1 0.997 1
ρ 0.1 0.936 0.949 0.918 0.847 0.961 0.968 0.961 0.925

−0.2 τ 31 if n = 60
61 if n = 120
0.408 0.454 0.533 0.561 0.41 0.478 0.569 0.669
β 0 2 0.146 0.001 0 0 0.001 0 0 0
β 1 −0.2 0.999 0.999 0.998 1 0.991 0.998 1 1
δ 0 0.994 0.994 0.998 0.998 0.983 0.991 0.996 0.992
Δ −0.2 0.996 0.998 0.999 0.999 0.992 0.998 0.998 1
ρ 0.1 0.903 0.869 0.848 0.695 0.912 0.907 0.854 0.719

0.2 τ 31 if n = 60
61 if n = 120
0.432 0.465 0.53 0.584 0.442 0.521 0.584 0.672
β 0 2 0.177 0.001 0 0 0 0 0 0
β 1 −0.2 0.99 0.992 0.992 0.993 0.994 0.994 0.991 0.99
δ 0 0.991 0.995 0.994 0.997 0.981 0.993 0.993 0.998
Δ 0.2 0.987 0.994 0.996 0.999 0.993 0.997 0.998 1
ρ 0.1 0.877 0.83 0.786 0.645 0.894 0.86 0.803 0.653

0.5 τ 31 if n = 60
61 if n = 120
0.511 0.662 0.761 0.883 0.61 0.786 0.824 0.928
β 0 2 0.267 0.001 0 0 0.006 0 0 0
β 1 −0.2 0.982 0.981 0.987 0.981 0.98 0.973 0.966 0.971
δ 0 0.987 0.993 0.994 0.99 0.979 0.976 0.985 0.959
Δ 0.5 0.986 0.991 0.989 0.994 0.989 0.991 0.988 0.988
ρ 0.1 0.916 0.883 0.845 0.714 0.928 0.904 0.858 0.742

0.7 τ 31 if n = 60
61 if n = 120
0.634 0.778 0.864 0.957 0.694 0.855 0.911 0.958
β 0 2 0.315 0.019 0 0 0.032 0 0 0
β 1 −0.2 0.985 0.974 0.968 0.973 0.971 0.962 0.966 0.96
δ 0 0.99 0.991 0.991 0.98 0.973 0.971 0.958 0.849
Δ 0.7 0.99 0.986 0.978 0.982 0.977 0.971 0.983 0.975
ρ 0.1 0.922 0.915 0.9 0.794 0.942 0.946 0.928 0.878

D. Empirical Studies III

In this section, we provide empirical studies examining the operating characteristics of our proposed methodology when testing whether a change point exists in the mean function(s) and correlation structure, as in Section 3. That is, for each $q \in \mathcal{Q}$, we test:

$$H_0: \delta_i^q = \Delta_i^q = 0 \;\;\forall i \quad \text{(no change point)} \qquad \text{vs.} \qquad H_a: \delta_i^q \neq 0 \text{ and/or } \Delta_i^q \neq 0 \text{ for some } i \quad \text{(a change point at } q\text{)}.$$

The main difference is that here we generate time series data based on unit-specific mean functions (and unit-specific mean function parameters, with the exception of the change point, τ) for each unit, while in Section 3 we assume the same mean function parameters across units. We again set the data parameters to values based on our patient falls data, generated correlated count ITS via the GenOrd package in R (Barbiero and Ferrari, 2015), and assumed the canonical link function for a Poisson distribution, g(·) = log(·). We again considered four different values of the total number of units, N ∈ {1, 3, 5, 10}, to compare the gains in efficiency obtained by borrowing information across units.

D.1. Empirical Type One Error of the SWT

To examine the type one error rate of the SWT, we generated 10,000 correlated count ITS of length n ∈ {60, 120} under the null hypothesis of no change point. We considered three values of the adjacent correlation, ρ ∈ {0.1, 0.2, 0.4}, and assumed $\beta_{i0} \overset{iid}{\sim} N(1.12, 0.2)$ and $\beta_{i1} \overset{iid}{\sim} N(0.47, 0.2)$ for all $i$. When n = 60 the set of possible change points was {25, 26, …, 34} and when n = 120 the set of possible change points was {50, 51, …, 69}. Type one error rates for the six scenarios are included in Table 7. In most cases the empirical type one error rates were relatively well behaved but were large when N = 10 and n = 60.

Table 7.

Type one error rates for the SWT testing the existence of a change point in the mean functions and adjacent correlations for the case with unit-specific mean functions.

Empirical Type One Error Rate
n = 60 n = 120
ρ 1 Unit 3 Units 5 Units 10 Units 1 Unit 3 Units 5 Units 10 Units
0.1 0.0384 0.0486 0.0592 0.0736 0.0206 0.0316 0.0368 0.0444
0.2 0.0302 0.0462 0.0522 0.0750 0.0218 0.0286 0.0334 0.0452
0.4 0.0306 0.0414 0.0574 0.0800 0.0188 0.0220 0.0264 0.0386

D.2. Empirical Power of the SWT

We again generated 10,000 correlated count ITS of length n ∈ {60, 120} under the alternative hypothesis of a change point in the mean function. The change point was placed in the middle of the time series, at time point 31 if n = 60 and at 61 if n = 120, and the sets of possible change points were assumed to be $\mathcal{Q}_{60} = \{25, 26, \ldots, 34\}$ and $\mathcal{Q}_{120} = \{50, 51, \ldots, 69\}$. We also considered three scenarios for the adjacent correlations, $(\rho_1(\tau), \rho_2(\tau)) \in \{(0.1, 0.2), (0.2, 0.3), (0.4, 0.5)\}$, and assumed $\beta_{i0} \overset{iid}{\sim} N(1.12, 0.2)$, $\beta_{i1} \overset{iid}{\sim} N(0.47, 0.2)$, and $\delta_i \overset{iid}{\sim} N(0.31, 0.05)$ for all $i$.

Empirical power as a function of the change in slope is provided in Figure 7. We let $\Delta_i \overset{iid}{\sim} N(\Delta_x, 0.01)$ for all $i$, with $\Delta_x \in \{-1.8, \ldots, 1.8\}$. Results were consistent with those from Section 3: empirical power decreased as the adjacent correlations increased and increased as the length of the time series and the number of units increased. Once again, there was a significant gain in power obtained by borrowing information across units and a lesser yet substantial gain in power as the length of the time series increased. Note that the curves are no longer centered at 0; this is because the $\delta_i$ are not equal to zero in these simulations.

Fig. 7. Plots empirical power of the SWT (testing for a change in the mean functions and correlation structure) as a function of the change in slope for n = 60 in the first column and n = 120 in the second column in the case where the mean functions vary by unit. The values of the change in slope ranged between −1.8 and 1.8.

D.2.1. Accuracy of Change Point Estimation Procedure

Figure 8 plots the proportion of simulations that correctly estimated the change point within one unit of the true change point when the SWT concluded that a change point did indeed exist, for all scenarios considered. As in Section 3.2.1, accuracy of our change point estimation procedure increased as the adjacent correlation decreased and as the number of units increased, and accuracy decreased as the number of response measurements increased. As with power, these curves are no longer centered at zero because the $\delta_i$ are not equal to zero.

Fig. 8. The first column plots accuracy of our change point estimation procedure (when a change solely in the means is assumed) as a function of the change in slope for n = 60 and the second column for n = 120. The values of the change in slope ranged between −1.8 and 1.8. Note that accuracy is defined as the proportion of simulations that estimate the change point to be within one time point of the true change point after rejecting the null hypothesis that a change point does not exist via the SWT.

D.2.2. Coverage Probabilities of Bootstrap Change Point Confidence Intervals

As in Section 3.2.2, we implement a bootstrap to obtain confidence intervals for the change point following the process proposed by Hušková and Kirch (2008). See Section 3.2.2 for details on the bootstrap implementation.

We again generated 1,000 correlated count time series for each slope change value, simulated 100 bootstrap time series for each of the 1,000 generated time series, and set the block length to 3 (approximately $n^{1/5}$ for both time series lengths n = 60 and n = 120). We only considered the case with $\rho_1 = 0.1$ and $\rho_2 = 0.2$ due to computational constraints and let $\Delta_i \overset{iid}{\sim} N(\Delta, 0.01)$ for all $i$, with $\Delta \in \{-1.8, -1.6, -1, 1, 1.2, 1.4\}$. All other parameters were set based on the power simulation parameter values.

Change point coverage probabilities are provided in Table 8. We do not provide coverage probabilities of the other model parameters for succinctness, as they increase with the number of units and behave similarly to the coverage probabilities in Sections 3.2.2 and C.2.2. For a time series length of 60, we saw that the coverage probabilities increased as the number of units increased, with a coverage probability close to 0.95 when there were 10 units and a change in slope of −1.8 or −1.6. When n = 120, however, the coverage probabilities decreased as the number of units increased. This discrepancy between time series lengths may be because of the chosen block length or because more bootstrap samples are needed.

Table 8.

Coverage probabilities of the circular block bootstrap change point confidence intervals for the case with unit-specific mean functions and a change in both the mean functions and correlation structure. Note, $\Delta_i \overset{iid}{\sim} N(\Delta, 0.01)$ for all $i$, with $\Delta \in \{-1.8, -1.6, -1, 1, 1.2, 1.4\}$ in this scenario. All other mean parameters, with the exception of the change point, vary across units.

Coverage Probabilities for Change Point Confidence Intervals

n = 60 n = 120

Δ 1 Unit 3 Units 5 Units 10 Units 1 Unit 3 Units 5 Units 10 Units
−1.8 0.72 0.879 0.937 0.928 0.817 0.791 0.573 0.216
−1.6 0.693 0.889 0.948 0.934 0.794 0.824 0.722 0.374
−1 0.487 0.563 0.608 0.747 0.533 0.646 0.727 0.899
1 0.444 0.47 0.484 0.585 0.468 0.538 0.602 0.745
1.2 0.447 0.612 0.653 0.644 0.536 0.647 0.414 0.1
1.4 0.528 0.705 0.793 0.81 0.633 0.716 0.432 0.079

Footnotes

Conflict of interest

The authors declare that they have no conflict of interest.

Contributor Information

Maricela Cruz, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA.

Hernando Ombao, Biostatistics Group, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

Daniel L. Gillen, Department of Statistics, University of California Irvine, Irvine, CA, USA

References

  1. Agresti A, Ryu E (2010) Pseudo-score confidence intervals for parameters in discrete statistical models. Biometrika 97(1):215–222
  2. Barbiero A, Ferrari P (2015) GenOrd: Simulation of discrete random variables with given correlation matrix and marginal distributions. R package version 1.4.0
  3. Bender M, Williams M, Su W, Hites L (2017) Refining and validating a conceptual model of clinical nurse leader integrated care delivery. Journal of Advanced Nursing 73(2):448–464
  4. Bernal JL, Cummins S, Gasparrini A (2017) Interrupted time series regression for the evaluation of public health interventions: a tutorial. International Journal of Epidemiology 46(1):348–355
  5. Bernal JL, Soumerai S, Gasparrini A (2018) A methodological framework for model selection in interrupted time series studies. Journal of Clinical Epidemiology 103:82–91
  6. Centers for Medicare and Medicaid Services (2018) The hospital value-based purchasing (VBP) program. https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/Value-Based-Programs/HVBP/Hospital-Value-Based-Purchasing.html, online; accessed 13 February 2019
  7. Cruz M, Bender M, Ombao H (2017) A robust interrupted time series model for analyzing complex health care intervention data. Statistics in Medicine 36(29):4660–4676
  8. Cruz M, Gillen DL, Bender M, Ombao H (2019) Assessing health care interventions via an interrupted time series model: Study power and design considerations. Statistics in Medicine 38(10):1734–1752
  9. Handley MA, Lyles CR, McCulloch C, Cattamanchi A (2018) Selecting and improving quasi-experimental designs in effectiveness and implementation research. Annual Review of Public Health 39:5–25
  10. Härdle W, Horowitz J, Kreiss JP (2003) Bootstrap methods for time series. International Statistical Review 71(2):435–459
  11. Hušková M, Kirch C (2008) Bootstrapping confidence intervals for the change-point of time series. Journal of Time Series Analysis 29(6):947–972
  12. Kavanagh KT, Cimiotti JP, Abusalem S, Coty MB (2012) Moving healthcare quality forward with nursing-sensitive value-based purchasing. Journal of Nursing Scholarship 44(4):385–395
  13. Liang KY, Self SG (1996) On the asymptotic behaviour of the pseudolikelihood ratio test statistic. Journal of the Royal Statistical Society: Series B (Methodological) 58(4):785–796
  14. Linden A (2015) Conducting interrupted time-series analysis for single- and multiple-group comparisons. Stata Journal 15(2):480–500
  15. Pan W (2001) Akaike's information criterion in generalized estimating equations. Biometrics 57(1):120–125
  16. Penfold RB, Zhang F (2013) Use of interrupted time series analysis in evaluating health care quality improvements. Academic Pediatrics 13(6):S38–S44
  17. Politis DN, Romano JP (1991) A circular block-resampling procedure for stationary data. Purdue University, Department of Statistics
  18. Taljaard M, McKenzie JE, Ramsay CR, Grimshaw JM (2014) The use of segmented regression in analysing interrupted time series studies: an example in pre-hospital ambulance care. Implementation Science 9(1):77
  19. Wagner AK, Soumerai SB, Zhang F, Ross-Degnan D (2002) Segmented regression analysis of interrupted time series studies in medication use research. Journal of Clinical Pharmacy and Therapeutics 27(4):299–309
  20. West SG, Duan N, Pequegnat W, Gaist P, Des Jarlais DC, Holtgrave D, Szapocznik J, Fishbein M, Rapkin B, Clatts M, et al. (2008) Alternatives to the randomized controlled trial. American Journal of Public Health 98(8):1359–1366
