Summary:
Epidemiologic studies of the short-term effects of ambient particulate matter (PM) on the risk of acute cardiovascular or cerebrovascular events often use data from administrative databases in which only the date of hospitalization is known. A common study design for analyzing such data is the case-crossover design, in which exposure at a time when a patient experiences an event is compared to exposure at times when the patient did not experience an event within a case-control paradigm. However, the time of true event onset may precede hospitalization by hours or days, which can yield attenuated effect estimates. In this article, we consider a marginal likelihood estimator, a regression calibration estimator, and a conditional score estimator, as well as parametric bootstrap versions of each, to correct for this bias. All considered approaches require validation data on the distribution of the delay times. We compare the performance of the approaches in realistic scenarios via simulation, and apply the methods to analyze data from a Boston-area study of the association between ambient air pollution and acute stroke onset. Based on both simulation and the case study, we conclude that a two-stage regression calibration estimator with a parametric bootstrap bias correction is an effective method for correcting bias in health effect estimates arising from delayed onset in a case-crossover study.
Keywords: Air pollution epidemiology, Bootstrap, Conditional score, Marginal likelihood, Regression calibration
1. Introduction
The case-crossover design, first proposed by Maclure (1991), has proven useful for assessing the health effects of environmental risk factors, such as air pollution. The design constructs matched sets of days, in which the exposure at a time when a patient experiences an event is contrasted to exposure for that patient at a set of "reference" times at which the event did not occur. One can then use standard conditional logistic regression to estimate the association between exposure and event onset within the matched sets.
In some situations, the exact time of event onset is not available. For instance, in studies of stroke, patients may present to the hospital hours, days, or even weeks after the onset of symptoms. In such cases, this time lag between true onset and hospitalization leads to mis-timed exposure assignment, and subsequent analysis using this error-prone covariate leads to biased estimates of association. For instance, using validation data on the distribution of delay times for stroke hospitalization in the greater Boston area, Lokken et al. (2009) showed that, given the PM2.5 time series observed at a central site monitor, misclassification of the time of event onset can yield health effect estimates biased towards the null by more than 30% when exposure assessment is based on date of hospital presentation, and by more than 60% when exposure is based on date of hospital admission. In general, there is an extensive literature on measurement error methods that adjust for exposure measurement error in both linear and nonlinear regression models (Gustafson, 2003; Carroll et al., 2006; Buonaccorsi, 2010; Yi, 2017), including several methods for retrospective case-control designs (e.g., McShane et al., 2001; Guolo and Brazzale, 2008). However, to the best of our knowledge, no one has considered measurement error corrections directly applicable to case-crossover designs with a mis-measured exposure.
In this article we analyze data on the association between stroke onset and air pollution levels in the greater Boston area. In this study, investigators reviewed charts and electronic medical records of all patients aged 21 years and older admitted to the Beth Israel Deaconess Medical Center (BIDMC) between 1 April 1999 and 31 December 2004 with a primary discharge diagnosis related to ischemic cerebrovascular disease (ICD-9 codes 433–438). Patients who were admitted for a neurologist-confirmed acute ischemic stroke were identified. The study obtained hourly measures of ambient particles with aerodynamic diameter of 2.5 μm or less (PM2.5) from the Boston/Harvard Countway Library PM Center, located less than 1 km from the BIDMC. In this work we focus on the association between stroke onset and the 24-hour average PM2.5 concentration prior to onset.
Wellenius et al. (2012) presented the primary findings from the study using exposure derived from the exact onset times resulting from this careful review, which for the purposes of the current work we consider the "gold standard". However, in other, larger studies focusing on this question, it is not always possible to conduct a manual review of all medical records to adjudicate the exact event times, although it is often feasible to validate onset timing in a sub-sample of study participants. Therefore, in this work we evaluate the performance of multiple potential approaches to adjust for delay of onset in such settings, when only a validation sample of the delay time distribution is available. The Boston study, with exact onset times available for all subjects, provides a rare opportunity to do this: we can compare the results of each proposed correction against what would have been obtained had we used the exact onset times, a comparison that is ordinarily not possible.
We consider several possible measurement error correction methods for the case-crossover design. We first apply the existing conditional score approach of McShane et al. (2001) developed for matched case-control designs. It is not immediately clear that this correction method will perform well, because the traditional measurement error assumptions upon which it depends are not met in the case-crossover design. We therefore propose two alternatives, a marginal likelihood approach and a regression calibration approach, that use data on the distribution of delay times from an external source. Simulations (see Section 5) show that, while these methods remove much of the large bias observed in our example study, some residual bias remains, particularly when the effect size is large. We employ a second-stage parametric bootstrap to remove the bias remaining after each of these three estimators, yielding approximately unbiased health effect estimators for this design in the presence of measurement error. Interestingly, applying the bootstrap bias correction to the naive estimator alone does not yield an unbiased estimator, suggesting that both stages of bias correction are necessary.
2. The Case-Crossover Design
In the Boston stroke study, as in many other case-crossover studies of air pollution and risk of clinical events, a single exposure time series recorded at a central monitoring site is assigned to all patients on a given day. We assume all time-specific measures of confounders (e.g., temperature and relative humidity) are likewise measured once at a given time. Therefore let X_t be a vector of covariates measured at time t. In the motivating Boston data, the first element of X_t represents the primary exposure (PM2.5) of interest, and the remaining elements represent functions of time-specific temperature and barometric pressure, as described further in Section 6.
Now consider the case-crossover design in which, for patient i, we define t_{i1} to be the time associated with the event, with the subscript "1" reflecting that the event day is the first observation in the matched set of days for that patient. In this design, we define a binary response variable Y_{i1} = 1 to reflect that this first observation in the matched set corresponds to the event (the "case"). The case-crossover design then defines a set of reference times t_{i2}, …, t_{i(K_i+1)}, for which we set Y_{ik} = 0. There now exists a large literature on appropriate strategies for constructing the set of reference days in a case-crossover design (e.g., Lumley and Levy, 2000; Maclure and Mittleman, 2008). The most common approach in environmental epidemiological applications, the time-stratified matched design, selects reference times to be all days that fall on the same day-of-the-week within the same month as the event (e.g., all Mondays in June 2002). This design is especially useful for removing seasonal confounding and day-of-week effects in epidemiologic studies of the association between ambient air pollution (e.g., PM2.5) exposure and the onset of acute cardiovascular or cerebrovascular events. Let M(·) denote the operator that defines the matched set for a given onset time. That is, letting K_i denote the number of controls selected for a given event time, M(t_{i1}) is the set consisting of the event time t_{i1} and the K_i associated reference times for patient i. We emphasize that the number of control days, K_i, can vary depending on when a given event occurs.
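As a concrete illustration of the time-stratified design, the following sketch (in Python rather than the R software typically used for such analyses; the function name is ours) enumerates the reference days for a given event day at daily resolution:

```python
from datetime import date, timedelta

def time_stratified_referents(event_day):
    """All days in the same month and year as the event that fall on the
    same day-of-week, excluding the event day itself (the referent days)."""
    referents = []
    d = event_day.replace(day=1)              # walk through the event's month
    while d.month == event_day.month:
        if d.weekday() == event_day.weekday() and d != event_day:
            referents.append(d)
        d += timedelta(days=1)
    return referents

# A Monday in June 2002 is matched to all other Mondays in June 2002.
print(time_stratified_referents(date(2002, 6, 10)))
```

Because months contain four or five occurrences of each weekday, the size of the returned set, and hence K_i, varies with the event date, as noted above.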
Following Lu and Zeger (2007), we conduct our case-crossover analysis assuming that the event is rare and that the probability that patient i has an event on day t is given by the relative risk model λ_{it} = λ_i exp(X′_t β), where λ_i represents a constant baseline frailty for person i. The probability that an event occurs on the event day t_{i1}, conditional on the fact that only one event occurs in the set of days M(t_{i1}), is exp(X′_{t_{i1}} β) / Σ_{t ∈ M(t_{i1})} exp(X′_t β), where β is a vector of log odds ratios corresponding to a unit increase of each covariate, controlling for all others. The log conditional likelihood given the data for all patients, ℓ(β), is

  ℓ(β) = Σ_{i=1}^{n} [ X′_{t_{i1}} β − log Σ_{t ∈ M(t_{i1})} exp(X′_t β) ]    (1)
and inference on β can be performed using any software package that implements conditional logistic regression.
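To make likelihood (1) concrete, here is a minimal numerical sketch (in Python; the function name and data layout are ours) that evaluates the log conditional likelihood over matched sets, with the event day stored as the first row of each set:

```python
import numpy as np

def cond_loglik(beta, matched_sets):
    """Log conditional likelihood (1). Each element of `matched_sets` is a
    (K_i + 1) x p array of covariate vectors; row 0 is the event (case) day."""
    ll = 0.0
    for X in matched_sets:
        eta = X @ beta                        # linear predictors X_t' beta
        ll += eta[0] - np.log(np.exp(eta).sum())
    return ll

# With beta = 0, each matched set of 4 days contributes log(1/4).
sets = [np.random.default_rng(0).normal(size=(4, 1)) for _ in range(3)]
print(cond_loglik(np.array([0.0]), sets))
```

In practice this objective is maximized by standard conditional logistic regression routines rather than coded by hand.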
3. Exposure Measurement Error in the Case-Crossover Design
Now suppose that patient i experiences an event at time t_{i1} but this true time of onset is not observed; instead the event is recorded to have occurred after some delay d_i, at the delayed onset time t*_{i1} = t_{i1} + d_i. Analogous to the case with no error, denote Y*_{i1} = 1. When the time of onset is mis-timed, a naive analysis selects mis-timed control days, resulting in the incorrect matched set M(t*_{i1}). In addition to the timing of the control days, the number of control days, K*_i, can be misspecified in the presence of delay of onset. For example, if an event occurs in February (which contains exactly four of each day of the week) with reporting delayed until a day in March, the number of controls can change from K_i = 3 to K*_i = 4. The naive analysis maximizes the misspecified log likelihood

  ℓ*(β) = Σ_{i=1}^{n} [ X′_{t*_{i1}} β − log Σ_{t ∈ M(t*_{i1})} exp(X′_t β) ]    (2)
A particular challenge of the problem under consideration is that the exposure errors generated by the delayed onset often do not follow the most commonly used measurement error models. For instance, a classical measurement error model assumes that the mis-measured exposure is equal to the true exposure plus independent error, whereas a Berkson error model assumes that the true exposure is equal to the mis-measured exposure plus independent error (Carroll et al., 2006). Figure 1, which presents a scatterplot of the exposure errors versus the true exposures for the Boston stroke data, demonstrates that the traditional assumption that the errors are independent of the true exposure is violated for this exposure in this particular population. The plot shows a negative association between the true exposure and the error, defined as the difference between the exposure based on the mis-timed onset and the true exposure. This association arises from the nature of the error process: when the exposure at the true onset time is high, the exposure at the delayed onset time will on average be lower than the true exposure, yielding negative values for the error, and vice versa.
Figure 1.

Scatterplot of the error due to delay of onset against the true exposure values in the Boston-area stroke study, based on the prior 24-hour PM2.5 exposure for each stroke event and the corresponding control days in matched sets defined by the delayed exposure time.
In general, the error-generating mechanism will depend on both the properties of the exposure time series and the association between exposure and response, and while for specific scenarios it may be possible to carefully construct a valid measurement error model for this process, it appears difficult to do so in any generality. In particular, instead of an additive classical or Berkson error structure, the error arises from the interplay between the structure of the underlying exposure time series and the distribution of delay times. Therefore, we seek to develop measurement error corrections that reflect the manner in which the errors are generated (delayed onset), and compare the performance of these methods to that of one, the conditional score method, that naively assumes a classical measurement error model.
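The dependence between the delay-induced error and the true exposure can be reproduced in a small simulation. The sketch below is our own toy setup, not the study data: an AR(1) series stands in for the hourly PM2.5 series, and delays are drawn uniformly over 1 to 72 hours. It exhibits the negative correlation pattern of Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stationary AR(1) series standing in for the hourly PM2.5 time series.
T, rho = 20_000, 0.9
x = np.empty(T)
x[0] = rng.normal()
for t in range(1, T):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.normal()

# Uniform onset delays of 1-72 hours; error = mis-timed minus true exposure.
onsets = rng.integers(336, T - 100, size=4_000)
delays = rng.integers(1, 73, size=onsets.size)
error = x[onsets + delays] - x[onsets]

# The error is negatively correlated with the true exposure, as in Figure 1.
r = np.corrcoef(x[onsets], error)[0, 1]
print(round(r, 2))
```

For a stationary series, the covariance between x[t + d] − x[t] and x[t] is negative for any d > 0, which is why neither the classical nor the Berkson model describes this error.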
4. Error Correction Methods for Case-Crossover Design
The first correction approach we consider, a marginal likelihood (ML) estimator, treats the true onset times, or equivalently the delay time for each individual, as an unknown latent random variable. We view the second approach, a type of regression calibration (RC), as a computationally simpler approximation to the marginal likelihood approach. Importantly, this approach conditions on the matched sets constructed using the mis-timed onset times. This strategy of conditioning on the matched sets based on the mis-timed onset times opens up the possibility of using existing methods for matched case-control designs, although such existing methods rely on assumptions that do not hold in the case-crossover setting. To illustrate this point and investigate the impact of the violation of these assumptions on inference, we also apply a conditional score approach proposed by McShane et al. (2001). Finally, because of the relationship between the true and mis-measured exposures in the case-crossover setting, all three of these methods correct for much but not all of the bias, with the relative improvement varying across the different methods. Therefore, we also investigate the ability of a second bias correction based on a parametric bootstrap to remove the remaining bias of all methods.
For all of the methods considered in this section, we assume that we have a validation sample of delay times, d^v_1, …, d^v_m. This sample of delay times could represent either an internal or an external sample. In this work, we require a sample of sufficient size m such that the distribution of the delay times in the population can be estimated empirically, although we discuss potential extensions that develop models for the delay times in Section 7.
4.1. Marginal Likelihood Estimator
The marginal likelihood approach treats the delay times as latent random variables with probability distribution f_D. For the marginal likelihood estimator, we write the likelihood contribution from patient i given the delay time for that patient, and marginalize this likelihood contribution over the estimated distribution of the delay times obtained from the validation data.
Specifically, the event for patient i occurs at time t*_{i1} − d_i, with corresponding exposure X_{t*_{i1} − d_i}. We assume the delay times d_i, i = 1, …, n, follow distribution f_D defined on the non-negative domain. The likelihood contribution from patient i, conditional on d_i, is

  L_i(β | d_i) = exp(X′_{t*_{i1} − d_i} β) / Σ_{t ∈ M(t*_{i1} − d_i)} exp(X′_t β).
After integrating over the distribution of the delay times f_D, the log marginal likelihood is

  ℓ_ML(β) = Σ_{i=1}^{n} log ∫ L_i(β | d) f_D(d) dd    (3)
Several ways forward are possible. We could make a parametric assumption for the distribution f_D and maximize (3), given the observed events and validation data on the delay times, as a function of β and the parameters of f_D. In this work, we instead use the empirical distribution of the delay times observed in the validation data, such that f̂_D places mass 1/m on each observed validation delay d^v_j. This nonparametric empirical estimate of f_D yields the approximation to the marginal likelihood

  ℓ̂_ML(β) = Σ_{i=1}^{n} log [ (1/m) Σ_{j=1}^{m} L_i(β | d^v_j) ]    (4)
While many optimization routines are available in standard software to maximize (4) with respect to β, we selected the quasi-Newton method implemented in the optim() function in the statistical software R. This implementation successfully maximized objective function (4) for all datasets generated in the simulation study as well as for the Boston stroke data.
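To illustrate objective (4), the following toy sketch (in Python; the white-noise exposure series and the weekly-offset matched set are simplifying assumptions of ours, not the study design) averages the conditional-likelihood contribution over the empirical delay distribution before taking logs:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)            # toy hourly exposure series

def lik_contrib(beta, t):
    """Conditional-likelihood contribution for an event at hour t against
    three weekly-offset referent hours (a stand-in for the matched set)."""
    ref = np.array([t, t - 168, t - 336, t + 168])
    eta = beta * x[ref]
    return np.exp(eta[0]) / np.exp(eta).sum()

def marginal_loglik(beta, recorded_onsets, delay_sample):
    """Approximate marginal log-likelihood (4): for each recorded onset t*,
    average lik_contrib(beta, t* - d) over the validation delays d."""
    return sum(
        np.log(np.mean([lik_contrib(beta, t - d) for d in delay_sample]))
        for t in recorded_onsets
    )
```

In practice one would maximize this function over β, for example with a quasi-Newton routine, as the authors do with optim() in R.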
4.2. Regression Calibration
Calculation of (3) for a given value of β requires the construction of the matched set for each possible delay time associated with an event, which can be computationally burdensome. We therefore also consider a computationally simple approximation to this marginalized likelihood using a regression calibration (RC) type approach. This RC approach conditions on the set of days selected given the mis-timed events, and computes the expected value of the true exposure, for both case and control days, given the observed event times and the estimated distribution of the random delay times. Specifically, conditioning on the matched set M(t*_{i1}) constructed based on the mis-timed onset for patient i, we compute Ê[X_{t − d}] = (1/m) Σ_{j=1}^{m} X_{t − d^v_j} for all t ∈ M(t*_{i1}). This approach then uses these expected values as covariates in a standard conditional logistic regression analysis, which has the advantage that it can be implemented in standard software for conditional logistic regression.
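The RC imputation step can be sketched as follows (Python; the function name is ours): for each day t in the delayed matched set, the error-prone exposure is replaced by the average of the exposures at t − d over the validation delays d.

```python
import numpy as np

def calibrated_exposure(x, t, delay_sample):
    """RC imputation: estimate E[exposure at the true time | recorded day t]
    by averaging x[t - d] over the validation sample of delays d."""
    return float(np.mean([x[t - d] for d in delay_sample]))

# Example with a linear-in-time series: the imputed value at day 50 with
# delays {1, 2, 3} is the mean of x[49], x[48], x[47].
x = np.arange(100.0)
print(calibrated_exposure(x, 50, [1, 2, 3]))  # 48.0
```

These imputed values, computed for the case day and every control day, then enter a standard conditional logistic regression in place of the error-prone exposures.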
4.3. Conditional Score Estimator
McShane et al. (2001) proposed the conditional score (CS) method to correct for exposure measurement error in a matched case-control study. The approach treats the unknown true exposures as unknown parameters and constructs unbiased conditional scores by conditioning on the sufficient statistics for these parameters. Under the assumptions of classical measurement error and an error-generating process that does not change the construction of the matched sets, this construction yields consistent estimation of β. We apply this approach by fixing the matched sets constructed using the delayed onsets, M(t*_{i1}), and viewing the mis-timed exposures as error-prone surrogates for the true covariates.
For notational convenience, we denote the true exposures in matched set i as X_{ik}, k = 1, …, K*_i + 1, denote the mis-timed exposures W_{ik}, and write W_{ik} = X_{ik} + U_{ik}. The conditional score method assumes the errors U_{ik} are i.i.d. multivariate normal with mean zero and variance Σ_u. Further, it assumes these errors are independent of the true exposures X_{ik}, and that W is non-differential, such that it contains no information about the response conditional on X.
Let ΔW_{ik} = W_{ik} − W_{i1}, k = 2, …, K*_i + 1, be the difference between the vector of observed covariates at the kth control time and those at the event time for matched set i, and let ΔX_{ik} denote the analogous differences for the true values of the covariates. Let Y_i denote the vector of binary response variables associated with the time points in the reference set M(t*_{i1}). Finally, let ΔW_i and ΔX_i collect these differences for matched set i, and write the conditional probability of the observed case-control configuration given the true covariate differences as P(Y_i | ΔX_i).
McShane et al. (2001) showed that Δ_i = ΔW_i + Σ_Δ β Y_i, where Σ_Δ is the variance-covariance matrix of ΔW_i given ΔX_i and Y_i, is sufficient for ΔX_i. We present the analogous derivation using the specific case-crossover notation of this article in the supporting information. These authors showed that the full conditional log likelihood is

  ℓ_CS(β) = Σ_{i=1}^{n} [ Z′_{i1} β − log Σ_{k=1}^{K*_i + 1} exp(Z′_{ik} β) ]    (5)

with a new covariate Z_{ik} = W_{ik} + Σ_Δ β for Y_{ik} = 1 and Z_{ik} = W_{ik} for Y_{ik} = 0. In contrast to (1), equation (5) has a covariate that depends on the estimator of interest. To maximize (5), we solve the conditional logistic regression problem iteratively using standard statistical software, where the naive estimator obtained by maximizing (2) is used as the initial value for β.
The conditional score estimator assumes a classical error model for the exposure error, under which the true exposure is independent of the error induced by the delay in onset. As noted above, Figure 1 shows that this assumption does not hold for PM2.5 exposure in the Boston stroke study. We include this method in our simulation studies and in the Boston stroke case study to assess how this model misspecification affects inference in practice. Because measurement error due to delay in symptom onset in the case-crossover design does not fit neatly into the additive error framework, it is not immediately clear how one should obtain an estimate of Σ_Δ even if one were willing to ignore this mismatch. To consider this method as a comparator to the marginal likelihood and regression calibration methods we propose in Sections 4.1 and 4.2, respectively, we adopt an ad-hoc approach that (1) constructs resamples of delays from the empirical distribution of the validation sample of delay times, (2) constructs the errors implied by each resampled set of delays, (3) computes the empirical error covariance for each resample, and (4) computes the final estimate of Σ_Δ as the average of these resampled covariances.
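The ad-hoc variance estimate can be sketched as follows (Python, with a scalar exposure for simplicity; the resampling scheme follows steps (1) through (4) above, but the function and variable names are ours):

```python
import numpy as np

def estimate_error_var(x, recorded_onsets, delay_sample, R=200, seed=0):
    """Ad-hoc error-variance estimate for the conditional score: resample a
    delay for each event, form pseudo-errors (mis-timed minus implied true
    exposure), take their variance, and average over R resamples."""
    rng = np.random.default_rng(seed)
    onsets = np.asarray(recorded_onsets)
    est = []
    for _ in range(R):
        d = rng.choice(delay_sample, size=onsets.size, replace=True)  # (1)
        u = x[onsets] - x[onsets - d]                                 # (2)
        est.append(u.var(ddof=1))                                     # (3)
    return float(np.mean(est))                                        # (4)
```

With a multivariate exposure, the per-resample variance in step (3) would be replaced by an empirical covariance matrix.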
4.4. Second-stage Bias Correction Using a Parametric Bootstrap
In our simulation studies presented in Section 5, the proposed marginal likelihood, regression calibration, and conditional score estimators overcorrect the downward bias incurred by the naive estimator, with this overcorrection increasing for larger true effect sizes. For the regression calibration approach, intuition is most easily gained from Figure 1, which shows that the errors around the estimated expected value of the true exposure given delayed onset are not truly independent Berkson errors; we therefore view the regression calibration as an approximation to the marginal likelihood in the nonlinear conditional logistic regression model (Carroll et al., 2006). To remove this remaining source of bias, we employ a second-stage parametric bootstrap procedure. The parametric bootstrap in our formulation uses the equivalence of the case-crossover method and the time series method, in which the expected total number of events at each time is modeled by a log-linear regression (Lu and Zeger, 2007). We adopt a Poisson model for data generation in the parametric bootstrap.
Let N_t be the total number of events at time t across all subjects, and let β be the log relative risk representing the association between exposure and outcome in the conditional logistic regression setting. The N_t then follow a Poisson distribution with mean exp(α_t + X′_t β) (Lu and Zeger, 2007). For the parametric bootstrap, we use this Poisson model with the estimate β̂ from a given measurement error correction method, then generate bootstrap samples from this model. Although α_t is not of interest here, it must be estimated for generation of the bootstrap resamples; we estimate it from the Poisson regression fit with offset variable X′_t β̂. From α̂_t and β̂, we generate B parametric bootstrap samples, N^(b)_t, b = 1, …, B, from the Poisson distribution with mean exp(α̂_t + X′_t β̂). The bootstrap estimator of β for resample b, denoted β̂^(b), is obtained by applying the given correction method to that resampled data set, yielding β̂^(1), …, β̂^(B). Applying a standard bootstrap bias correction (Efron and Tibshirani, 1993), we obtain the second-stage bias-corrected estimator β̂_BC = 2β̂ − β̄*, where β̄* is the average of the B bootstrap estimates.
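The second-stage correction can be written generically as follows (a Python sketch; `simulate_fn` and `fit_fn` are hypothetical stand-ins for the Poisson data generator and a given correction method, not functions from the study code):

```python
import numpy as np

def bootstrap_bias_correct(beta_hat, simulate_fn, fit_fn, B=200, seed=0):
    """Parametric bootstrap bias correction: simulate B datasets from the
    fitted model at beta_hat, re-estimate beta on each, and subtract the
    estimated bias (mean of bootstrap estimates minus beta_hat)."""
    rng = np.random.default_rng(seed)
    boot = [fit_fn(simulate_fn(beta_hat, rng)) for _ in range(B)]
    return 2.0 * beta_hat - float(np.mean(boot))
```

If the estimator applied to data generated at β̂ overshoots by c on average, the corrected value is β̂ − c, which is the standard additive bootstrap bias correction.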
4.5. Variance Estimation
We also explored variance estimation for the methods considered in this paper. For the naive and regression calibration estimators, we evaluated the performance of the naive variance estimators that ignore the uncertainty in the exposure covariates used in the algorithms, as reported by the software as the standard errors from a standard conditional logistic regression at the last iteration of the algorithm. For the conditional score method, following McShane et al. (2001), we calculated the standard error of the resulting estimator using a nonparametric bootstrap in which the re-sampling unit is the matched set. For the marginal likelihood approach, we used the inverse of the observed information matrix based on approximation (4), calculated using numerical differentiation. For the two-stage bootstrapped estimators (bootstrapped naive, bootstrapped conditional score, bootstrapped regression calibration, bootstrapped marginal likelihood), formal variance estimation requires a double bootstrap procedure in which the variability of the bootstrapped estimator is itself estimated via the bootstrap. However, this procedure, which requires a full set of bootstrap re-samples within each bootstrap re-sample, is computationally demanding. Therefore, we evaluate the performance of an approach that approximates the double bootstrap by using the same set of bootstrap resamples for both the bias correction step and variance estimation, as has been done in other environmental epidemiology settings (Szpiro et al., 2011).
5. Simulation Study
To assess the finite-sample operating characteristics of the estimators described in Section 4, we generated 2000 simulated datasets for each simulation scenario. To mimic the data structure of the motivating Boston-area stroke study, we simulated each data set to contain a mean of 1,100 cases of acute ischemic stroke over a 5-year period. We simulated the number of cases observed each hour from a Poisson distribution with mean exp(α + X′_t β), where X_t is the vector of actual (PM2.5,t, Tempt) values in the Boston data. We fixed the temperature coefficient across scenarios, with different scenarios for the PM2.5 effect corresponding to β_PM ∈ {0, 0.015, 0.030, 0.045}.
For each dataset, we then sampled the event-specific delay times, d_i, from the empirical distribution of the delay times to hospital presentation in the Boston data. For each event, we recorded the true exposure at onset as well as the mis-timed exposure based on that event's sampled delay. We then computed the matched sets for each event time as needed for each estimator. The reference times for any given time t are defined as the set of the same hour on the same day-of-week in the same month and year as the case time t. For example, if a case occurs at 2pm on the Tuesday of the second week of March 2009, then the corresponding reference times are 2pm on the Tuesdays of the other weeks of March 2009. We then sampled a set of 300 delay times to serve as a validation sample for each simulated dataset. To each simulated dataset we applied the naive ("Error"), conditional score (CS), regression calibration (RC), marginal likelihood (ML), bootstrap naive (B-Error), bootstrap conditional score (B-CS), bootstrap regression calibration (B-RC), and bootstrap marginal likelihood (B-ML) estimators. The bootstrap-corrected estimators were based on B = 200 bootstrap resamples.
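The event-generation step can be mimicked as follows (a Python sketch; the Gaussian exposure series, intercept, and coefficient values are illustrative stand-ins of ours for the actual Boston PM2.5 and temperature series):

```python
import numpy as np

rng = np.random.default_rng(2)

hours = 5 * 365 * 24                              # five years of hourly times
pm = rng.normal(loc=10.0, scale=4.0, size=hours)  # stand-in PM2.5 series

beta_pm = 0.03                                    # one of the simulated effects
alpha = np.log(1_100 / hours) - beta_pm * 10.0    # intercept targeting ~1,100 events

# Hourly event counts from the Poisson (log-linear) model.
counts = rng.poisson(np.exp(alpha + beta_pm * pm))
print(counts.sum())   # close to the 1,100 cases in the Boston study
```

Delays for each simulated event would then be drawn from the validation sample, and matched sets constructed at both the true and delayed onset times.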
Table 1 presents the simulated mean, bias, relative bias expressed as a percentage of β_PM (% bias), standard deviation, and root mean square error, as well as the mean of the standard errors and the coverage of the associated normal theory 95% confidence intervals based on these standard errors, for each method. Figure S.1 in the supporting information presents boxplots of the simulated distributions of each estimator under each simulated scenario. Bias results for the naive estimator reinforce the findings presented by Lokken et al. (2009), showing large amounts of downward bias due to the error induced by onset delay. These findings show this bias leads to severe under-coverage of the resulting 95% confidence intervals.
Table 1.
Simulation results based on 500 simulated data sets for the naive ("Error"), conditional score (CS), regression calibration (RC), marginal likelihood (ML), bootstrap naive (B-Error), bootstrap conditional score (B-CS), bootstrap regression calibration (B-RC), and bootstrap marginal likelihood (B-ML) estimators. Bootstrap estimators are based on B = 200 resamples. Values for β, Mean, Bias, SE, SD, and RMSE are reported as value ×10³.
| β_PM | | True | Error | CS | RC | ML | B-Error | B-CS | B-RC | B-ML |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Mean | −0.1 | −1.0 | −1.4 | −0.1 | −0.6 | −0.4 | 0.8 | 0.4 | 0.5 |
| | Bias | −0.1 | −1.0 | −1.4 | −0.1 | −0.6 | −0.4 | 0.8 | 0.4 | 0.5 |
| | % Bias† | – | – | – | – | – | – | – | – | – |
| | SE | 6.6 | 6.7 | 13.4 | 8.9 | 9.0 | 6.9 | 13.4 | 9.1 | 9.3 |
| | SD | 6.7 | 6.9 | 12.8 | 9.1 | 9.2 | 9.6 | 10.8 | 9.0 | 9.0 |
| | RMSE | 6.7 | 7.0 | 12.8 | 9.1 | 9.2 | 9.7 | 10.9 | 9.0 | 9.0 |
| | Coverage* | 94.8 | 94.0 | 97.3 | 94.7 | 95.0 | 84.6 | 97.7 | 95.2 | 95.8 |
| 15.0 | Mean | 15.0 | 7.8 | 14.3 | 15.5 | 15.0 | 12.1 | 14.8 | 15.2 | 15.4 |
| | Bias | 0.0 | −7.2 | −0.7 | 0.5 | 0.0 | −2.9 | 0.1 | 0.2 | 0.4 |
| | % Bias† | 0.1 | −47.9 | −4.4 | 3.4 | −0.1 | −19.5 | −1.4 | 1.6 | 2.4 |
| | SE | 6.1 | 6.3 | 11.5 | 8.3 | 8.3 | 6.5 | 11.5 | 8.5 | 8.6 |
| | SD | 6.2 | 6.3 | 11.2 | 8.4 | 8.4 | 8.6 | 9.5 | 7.9 | 7.8 |
| | RMSE | 6.2 | 9.5 | 11.2 | 8.4 | 8.4 | 9.1 | 9.5 | 7.9 | 7.8 |
| | Coverage* | 94.5 | 79.0 | 97.4 | 93.7 | 94.3 | 84.3 | 99.6 | 96.0 | 96.8 |
| 30.0 | Mean | 30.1 | 17.7 | 31.8 | 32.0 | 31.3 | 25.8 | 28.3 | 30.0 | 29.9 |
| | Bias | 0.1 | −12.3 | 1.8 | 2.0 | 1.3 | −4.2 | −1.7 | 0.0 | −0.1 |
| | % Bias† | 0.3 | −41.0 | 5.9 | 6.6 | 4.3 | −14.1 | −5.8 | −0.2 | −0.2 |
| | SE | 5.6 | 5.8 | 11.9 | 7.6 | 7.8 | 6.1 | 11.9 | 7.7 | 8.0 |
| | SD | 5.7 | 5.7 | 10.8 | 7.7 | 7.9 | 7.6 | 7.3 | 6.8 | 6.8 |
| | RMSE | 5.7 | 13.6 | 11.0 | 7.9 | 8.0 | 8.7 | 7.5 | 6.8 | 6.8 |
| | Coverage* | 94.8 | 42.5 | 98.7 | 94.3 | 94.9 | 84.0 | 99.3 | 97.3 | 97.8 |
| 45.0 | Mean | 45.2 | 28.1 | 52.7 | 49.1 | 48.8 | 39.8 | 24.3 | 44.3 | 44.1 |
| | Bias | 0.2 | −16.9 | 7.7 | 4.1 | 3.8 | −5.2 | −20.7 | −0.7 | −0.9 |
| | % Bias† | 0.4 | −37.5 | 17.1 | 9.2 | 8.6 | −11.6 | −45.9 | −1.5 | −2.0 |
| | SE | 5.2 | 5.3 | 21.7 | 7.0 | 7.6 | 5.6 | 21.7 | 7.1 | 7.7 |
| | SD | 5.3 | 5.2 | 13.1 | 7.2 | 7.6 | 6.9 | 29.7 | 5.9 | 5.8 |
| | RMSE | 5.3 | 17.7 | 15.2 | 8.3 | 8.5 | 8.6 | 36.2 | 5.9 | 5.9 |
| | Coverage* | 94.6 | 11.3 | 99.2 | 91.0 | 93.3 | 79.6 | 95.8 | 98.4 | 99.1 |

† Relative bias as a percentage of β_PM.

\* Coverage of the normal theory 95% confidence interval.
Table 1 also shows that, potentially due to the violation of the independence assumption between the exposure and the exposure error in this setting, the conditional score estimator is substantially biased, particularly when the effect size is large. The regression calibration and marginal likelihood estimators remove almost all of the bias incurred by the naive estimator, but tend to overcorrect slightly when the effect size is large. The second-stage parametric bootstrap step effectively removes the remaining bias in all three (CS, RC, ML) approaches. Interestingly, the bootstrapped version of the naive estimator (B-Error) does not remove all of the bias incurred by the naive estimator, suggesting that both stages of correction are necessary to yield approximately unbiased estimators.
The two conditional score estimators are the most variable of the estimators, while the bootstrapped regression calibration and bootstrapped marginal likelihood estimators are more variable than the gold standard (as expected) but much less variable than their conditional score counterparts. Despite this increase in variance, the bootstrap regression calibration and bootstrap marginal likelihood estimators improve upon the naive estimator with respect to RMSE due to their large reductions in bias, which is the goal of any measurement error correction method. The coverage of the confidence intervals from these two approaches is slightly conservative relative to the nominal 95% level, due to slight overestimation of the uncertainty of each estimator (mean SE larger than empirical SD). Variance estimates for the conditional score method substantially underestimate the relatively large variance of that estimator.
6. Analysis of the Boston Stroke Data
The Boston-area stroke onset and air pollution study contained data from 1763 patients aged 21 years and older admitted to the Beth Israel Deaconess Medical Center (BIDMC) with a primary discharge diagnosis related to ischemic cerebrovascular disease between 1 April 1999 and 31 December 2004. Patients with in-hospital strokes or transient ischemic attacks were excluded. Patients for whom onset dates were available but onset times were unknown were assumed to have had strokes at 9:00 AM, as discussed previously (Wellenius et al., 2012). One of the novel aspects of this study is the adjudication of the true onset times. Although the correction methods can handle observations in which the true onset time is not known, we excluded 58 patients for whom stroke onset dates were unavailable, because we wished to compare the estimates from each correction method to the estimate obtained using the true onset times. Delay time to hospital presentation for the remaining 1705 patients ranged from 1 hour to over 2 weeks, with a median delay of 16 hours. We obtained hourly measures of ambient particles with aerodynamic diameter of 2.5 μm or less (PM2.5) from our central monitoring site on the roof of the Harvard Countway Library in downtown Boston, less than 1 km from the BIDMC study site. The study was approved by the Institutional Review Board of the Beth Israel Deaconess Medical Center.
For our case study, we used the 24-hour mean PM2.5 level prior to an event as the exposure metric of interest in the case-crossover analyses. The goal of the analysis was to correct the naive estimate and to investigate how well the various correction methods recover what would have been obtained using exposures measured at the true time of onset. To ensure that our validation delay time distribution is not exactly equal to the true delay time distribution in this population, we drew a sample of 300 delays (18%) from the distribution of 1705 delay times (between onset and admission) to form our validation sample. Following Wellenius et al. (2012), we fit all models estimating the association between 24-hour average PM2.5 and stroke onset, adjusting for a linear effect of 24-hour average barometric pressure and for ambient temperature (in °C) using natural cubic splines with three degrees of freedom. We analyzed the data using all of the methods considered in the simulation study.
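For intuition, the exposure metric and validation draw described above can be sketched as follows. The hourly series and delay distribution here are simulated stand-ins for illustration only, not the study data, and all names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hourly PM2.5 series (micrograms per cubic meter),
# one value per hour over a year.
pm25_hourly = rng.gamma(shape=2.0, scale=5.0, size=24 * 365)

def mean_exposure_24h(onset_hour):
    """24-hour mean PM2.5 over the 24 hours preceding onset."""
    return pm25_hourly[onset_hour - 24:onset_hour].mean()

# Validation sample: a simple random subsample of observed delay
# times (onset to admission, in hours), mirroring the 300-of-1705
# draw described above; the exponential shape is purely illustrative.
delays = rng.exponential(scale=24.0, size=1705)
validation = rng.choice(delays, size=300, replace=False)
```

Under delayed onset, the naive analysis would evaluate `mean_exposure_24h` at the admission hour rather than the (earlier) onset hour, which is the source of the attenuation being corrected.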
Figure 2 presents the point estimates and 95% confidence intervals obtained from each method applied to the data. As in the simulation study, parametric bootstrap resamples were drawn for bias correction. The 95% confidence interval associated with the gold standard estimate based on the true onset times, shown at the top of the figure, indicates that this effect estimate is significantly different from 0 at the 0.05 level.
Figure 2. Results of the naive estimator (Error), conditional score (CS), regression calibration (RC), bootstrap-corrected naive estimator (B.Error), bootstrap-corrected conditional score (B.CS), and bootstrap-corrected regression calibration (B.RC) when applied to the motivating Boston-area stroke data. The bootstrap-corrected estimators are based on parametric bootstrap resamples.
The naive estimate that ignores errors associated with delay of onset is attenuated towards zero, and its associated 95% confidence interval contains zero, demonstrating the bias and loss of power associated with this approach. The next estimates and confidence intervals in the figure are, in order, those obtained by the conditional score, regression calibration, marginal likelihood, bootstrapped naive, bootstrapped conditional score, bootstrapped regression calibration, and bootstrapped marginal likelihood estimators. All of these estimators remove the bias associated with the naive estimator. However, there are noticeable differences in the estimated uncertainties around these point estimates, reflected in the widths of the confidence intervals. First, the confidence interval for the bootstrapped naive estimator is narrower than that of the gold standard, agreeing with the simulation result that this approach yields intervals that are too narrow. Second, while the conditional score approaches (with and without bootstrap) yield results similar to those from the other bias corrections, the resulting estimates are slightly inflated relative to the gold standard estimate and the estimated uncertainties are noticeably larger than those from the other methods. Finally, the regression calibration and marginal likelihood estimates, and their bootstrap counterparts, are relatively close to the gold standard estimate. The confidence intervals for these four estimates are slightly wider than those from the naive approach, which is appropriate since one would expect correction for measurement error to yield estimators with larger variance, and the intervals for the bootstrapped versions are slightly wider than those for their non-bootstrapped counterparts.
In this case study, unlike the naive approach that does not correct for measurement error, all of the bias-corrected estimates except the conditional score estimates (original and bootstrapped versions) are significant at the 0.05 level.
7. Discussion
While investigators of the motivating Boston-area stroke study validated the onset times of all events, so that these measurement error corrections are not necessary there, many studies, such as those based on administrative data, do not have access to the true onset times. Therefore, we believe a comparison of possible approaches to adjust for this error, which can have a serious impact on inference, is valuable. Having the true onset times in the Boston stroke study is particularly illuminating because we can assess how close the inferences based on the various correction methods are to those we obtain using the true onset times.
The regression calibration and marginal likelihood approaches require validation data to estimate the distribution of the delay times. The conditional score method also requires validation data to estimate the variance-covariance matrix of the exposure errors. A key assumption of any approach relying on validation data is the transportability of the distribution of the delay times in the validation sample to the population of interest. When this assumption is violated, these estimators will likely incur additional bias.
Further, the methods we consider assume that the distribution of delay times does not depend on subject characteristics. For instance, one could envision a scenario in which higher levels of exposure lead to more severe events, and therefore shorter delays between onset and hospital presentation or admission. Alternatively, the distribution of delay times may vary by patient demographics, such as gender, race/ethnicity, age at time of onset, or other factors. We explored this possibility in the Boston stroke study, but did not find any evidence that this heterogeneity exists in this population. Therefore, we did not pursue extensions of the methods to accommodate this complication. In practice, in situations in which validation samples are obtained by adjudicating true onset times in a subsample of the study population, one may prefer a stratified sampling scheme in which delay times are randomly sampled within population strata. In such a design, an interesting future research direction is the development of correction methods that incorporate non-saturated regression models for the delay time distributions as a function of patient characteristics.
Because our interest focused on evaluating the performance of potential correction methods for delay of onset in case-crossover studies, we ran all of our analyses using a pre-specified exposure metric: the 24-hour average of PM2.5 prior to onset. This choice built on the primary finding of this study reported by Wellenius et al. (2012), who evaluated the association between exposure and outcome for a range of exposure averaging times and reported strong associations between onset and 24-hour exposure. In many environmental epidemiologic settings, however, the relevant exposure timing is unknown and it is of interest to identify the times during which exposure is most strongly associated with onset. Several recent studies have employed distributed lag models (Gasparrini et al., 2009; Gasparrini, 2014) to address this question. An interesting direction for future research would be to extend the analytic frameworks proposed in this work to the setting in which the identification of exposure timing using distributed lag models is of primary interest.
It is well-known that traditional regression calibration corrections for measurement error are not strictly unbiased for nonlinear regression models (Carroll et al., 2006), but can work well in practice for logistic regression (Thoresen and Laake, 2000). In the case-crossover design, we also found that the regression calibration approach reduced a large amount, but not all, of the bias incurred by the naive approach under the simulation scenarios considered in Section 5. However, for larger effect sizes, the resulting estimator appeared to over-correct, in that the resulting biases were small but positive. Although overestimation of health effects provides stronger protections for public health, it can also lead to regulations that are difficult to defend, particularly to stakeholders who bear the costs of those regulations based on overstated health risks. Therefore, we recommend the second bootstrap stage to yield an approximately unbiased estimator in this setting. Given its computational simplicity, this estimator appears to be a promising approach to correct for the severe attenuation of effect estimates in case-crossover designs with delayed time of onset.
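In generic form, the recommended second bootstrap stage amounts to subtracting an estimated bias from the first-stage (e.g. regression calibration) estimate. A minimal sketch under our own notation follows; `simulate` and `refit` are hypothetical stand-ins for generating data under the fitted model and re-running the first-stage estimator, and are not functions from the study's software.

```python
import numpy as np

def bootstrap_bias_correct(beta_hat, refit, simulate, B=200, seed=0):
    """Second-stage parametric bootstrap bias correction.

    beta_hat: first-stage estimate.
    simulate(beta, rng): generate a dataset under the fitted model.
    refit(data): first-stage estimate recomputed on a dataset.

    Estimated bias is mean_b(beta*_b) - beta_hat, so the corrected
    estimate is beta_hat - bias = 2 * beta_hat - mean_b(beta*_b).
    """
    rng = np.random.default_rng(seed)
    boot = np.array([refit(simulate(beta_hat, rng)) for _ in range(B)])
    return 2.0 * beta_hat - boot.mean()
```

If the first-stage estimator over-corrects by a roughly constant amount, the bootstrap replicates inherit that shift, and subtracting the estimated bias recenters the estimate, which is consistent with the near-unbiasedness of the bootstrapped regression calibration estimator observed in the simulations.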
Acknowledgements
This work was supported by NIH grants R01ES020871, P30ES000002, U2CES026555 and a US EPA grant (RD-83587201). Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the USEPA. Further, USEPA does not endorse the purchase of any commercial products or services mentioned in the publication.
Supporting Information
Web Appendices and Figures referenced in Sections 4.3 and 5 are available with this paper at the Biometrics website on Wiley Online Library. Software for implementing the methodology considered in this article and for running the simulation study described in Section 5 is available at https://github.com/glenmcgee/casecrossoverME.
References
- Buonaccorsi JP (2010). Measurement Error: Models, Methods, and Applications. Boca Raton, FL: Chapman and Hall.
- Carroll RJ, Ruppert D, Stefanski LA and Crainiceanu CM (2006). Measurement Error in Nonlinear Models, 2nd edition. Boca Raton, FL: Chapman and Hall.
- Efron B and Tibshirani R (1993). An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC.
- Gasparrini A, Armstrong B and Kenward MG (2009). Distributed lag non-linear models. Statistics in Medicine 29, 2224–2234.
- Gasparrini A (2014). Modeling exposure-lag-response associations with distributed lag nonlinear models. Statistics in Medicine 33, 881–899.
- Gustafson P (2010). Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Boca Raton, FL: Chapman and Hall.
- Guolo A and Brazzale AR (2008). A simulation-based comparison of techniques to correct for measurement error in matched case-control studies. Statistics in Medicine 27, 3755–3775.
- Lokken RP, Wellenius GA, Coull BA, Burger MR, Schlaug G, Suh HH, et al. (2009). Air pollution and risk of stroke: underestimation of effect due to misclassification of time of event onset. Epidemiology 20, 137–142.
- Lu Y and Zeger SL (2007). On the equivalence of case-crossover and time series methods in environmental epidemiology. Biostatistics 8, 337–344.
- Lumley T and Levy D (2000). Bias in the case-crossover design: implications for studies of air pollution. Environmetrics 11, 689–704.
- Maclure M (1991). The case-crossover design: a method for studying transient effects on the risk of acute events. American Journal of Epidemiology 133, 144–153.
- Maclure M and Mittleman MA (2008). Case-crossover designs compared with dynamic follow-up designs. Epidemiology 19, 176–178.
- McShane LM, Midthune DN, Dorgan JF, Freedman LS and Carroll RJ (2001). Covariate measurement error adjustment for matched case-control studies. Biometrics 57, 62–73.
- Thoresen M and Laake P (2000). A simulation study of measurement error correction methods in logistic regression. Biometrics 56, 868–872.
- Wellenius GA, Burger MR, Coull BA, Schwartz J, Suh HH, Koutrakis P, et al. (2012). Ambient air pollution and the risk of acute ischemic stroke. Archives of Internal Medicine 172, 229–234.
- Yi G (2017). Statistical Analysis with Measurement Error or Misclassification. New York: Springer-Verlag.